Item Distribution

We wonder how widely specific items are distributed throughout the federation, so we compute the distribution from our scraped site history. See Search Index Downloads.

We find the distribution across sites.

cat sites/ward.asia.wiki.org/pages/*/items.txt | \
  sort | \
  uniq -c | \
  sort -n | \
  perl -ne 'print if not /^\s*1 /' > dist/sites.txt

We find the distribution across pages.

ls sites | \
  while read s; do
    cat sites/$s/pages/*/items.txt
  done | \
  sort | uniq -c | \
  sort -n | \
  perl -ne 'print if not /^\s*1 /' > dist/pages.txt
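Both pipelines end with the same counting idiom: sort the ids, count duplicates, order by count, and drop ids seen only once. A minimal Ruby sketch of that step follows, assuming item ids arrive one per entry. The helper name `distribution` and the sample ids are hypothetical, and `tally` needs Ruby 2.7 or later.

```ruby
# Count how often each id occurs and keep only ids seen more than once,
# mirroring `sort | uniq -c | sort -n` followed by the perl filter.
def distribution(ids)
  ids.tally                                 # id => count
     .reject { |_id, count| count == 1 }    # drop singletons
     .sort_by { |id, count| [count, id] }   # ascending by count, then by id
end

# Hypothetical sample ids standing in for the lines of items.txt.
sample = %w[a1 b2 a1 c3 b2 a1]
distribution(sample).each { |id, count| puts format('%7d %s', count, id) }
```

The padded `%7d` output imitates the layout of `uniq -c`, so lines printed this way could be parsed the same way as the files in dist.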

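We join the two distributions by item id and print each item whose site count differs from its page count. The script reads pages.txt and sites.txt from the current directory, so we run it from dist.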

p = {}
s = {}
File.readlines('pages.txt').each do |line|
  t = line.split(/\s+/)
  p[t[2]] = t[1].to_i
end
File.readlines('sites.txt').each do |line|
  t = line.split(/\s+/)
  s[t[2]] = t[1].to_i
end
s.each do |id,ss|
  pp = p[id]
  puts "#{ss} #{pp} #{id}" if pp && pp != ss
end
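The join can also be sketched as two small functions, which makes it easy to try on inline data before running it against the real files. The names `parse_counts` and `join_counts` and the sample lines are hypothetical, and `filter_map` needs Ruby 2.7 or later.

```ruby
# Parse `uniq -c` style lines ("   count id") into an id => count hash.
def parse_counts(lines)
  lines.each_with_object({}) do |line, counts|
    count, id = line.split            # split with no argument skips leading whitespace
    counts[id] = count.to_i if id     # ignore blank or malformed lines
  end
end

# Report ids present in both tables whose counts differ.
def join_counts(sites, pages)
  sites.filter_map do |id, site_count|
    page_count = pages[id]
    "#{site_count} #{page_count} #{id}" if page_count && page_count != site_count
  end
end

# Hypothetical sample lines standing in for sites.txt and pages.txt.
sites = parse_counts(["      2 a1", "      3 b2"])
pages = parse_counts(["      5 a1", "      3 b2", "      4 c3"])
puts join_counts(sites, pages)
```

To run this against the real output, the two arrays could be replaced with `File.readlines('sites.txt')` and `File.readlines('pages.txt')`.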