I've suggested that a certain writing style favors replication. Can we learn something about style by studying the replications we find by search? Not yet. But maybe.
We start by looking for item ids that are replicated on multiple pages. Often these are twins, but not always. Here is one I have copied often, as have others.
SEARCH ITEMS dcf2b5140b411c4b
We can list all of the item ids search has encountered, here with the most frequent first. items
ALL ITEMS INPUT
I assembled this list with a short script that counts uniques among all sites indexed.
(cd sites; ls | while read -r i; do cat "$i/items.txt"; done) | sort | uniq -c | sort -nr > most/items.txt
Here I've applied it to other tallies, slugs being most informative.
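The same pipeline can be wrapped in a small function so that one command produces each tally. This is only a sketch: the function name `tally` is hypothetical, and it assumes every site directory holds a file such as `slugs.txt` alongside `items.txt`, which the text above does not confirm.

```shell
# tally NAME: count how often each line of NAME.txt occurs across all
# site directories, most frequent first.
# Assumed layout (not confirmed by the source): sites/<site>/NAME.txt
tally() {
  name=$1
  mkdir -p most
  for f in sites/*/"$name".txt; do
    # Skip sites that lack the file (also covers an unmatched glob).
    if [ -f "$f" ]; then
      cat "$f"
    fi
  done | sort | uniq -c | sort -nr > "most/$name.txt"
}
```

Running `tally items` should reproduce the list above, and `tally slugs` would give the slug counts, provided the assumed file names match what the index actually stores.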
The analysis I've done here can be improved. See Search Index Downloads for a tgz file of all indices.