This year, for the first time, we have enough activity in the federation to make scraping a sound search strategy.
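We can learn what a site holds by fetching a summary of its pages via sitemap or every page in full via export. Sitemap is widely available and usually fast. Export is offered only by the node version of the server, but it can be simulated by using sitemap to list the pages and then fetching each one.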
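The two approaches can be sketched in Ruby as follows. This is a minimal illustration, not the code behind the scrapers listed on this page. It assumes a site serves its sitemap at `/system/sitemap.json` as a JSON array of entries with a `slug` field, and each page at `/<slug>.json`. The helper names `sitemap` and `simulated_export`, the injectable `fetch` lambda, and the `example.wiki` host are all hypothetical.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Default fetcher: GET a URL and return the response body as a string.
HTTP_GET = ->(url) { Net::HTTP.get(URI(url)) }

# Fetch the site summary: one small entry per page.
# Assumes the sitemap lives at /system/sitemap.json.
def sitemap(site, fetch = HTTP_GET)
  JSON.parse(fetch.call("http://#{site}/system/sitemap.json"))
end

# Simulate export where it is not offered: use the sitemap to
# enumerate slugs, then fetch every page individually.
# Returns a hash of slug => parsed page.
def simulated_export(site, fetch = HTTP_GET)
  sitemap(site, fetch).each_with_object({}) do |entry, pages|
    slug = entry['slug']
    pages[slug] = JSON.parse(fetch.call("http://#{site}/#{slug}.json"))
  end
end

# Offline stand-in for a one-page site, so the sketch runs without a network.
FAKE_SITE = {
  'http://example.wiki/system/sitemap.json' =>
    '[{"slug":"welcome-visitors","title":"Welcome Visitors"}]',
  'http://example.wiki/welcome-visitors.json' =>
    '{"title":"Welcome Visitors","story":[]}'
}
stub_fetch = ->(url) { FAKE_SITE.fetch(url) }

pages = simulated_export('example.wiki', stub_fetch)
puts pages.keys.inspect
```

Note that the simulated export costs one request per page on top of the sitemap request, which is exactly the kind of traffic a scraper should think about before running against many sites.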
We discourage widespread application of scrapers until we develop algorithms that avoid needless duplication of server and network traffic.
Ruby Sitemap Scrape with search application.
Ruby Export Scrape and subsequent tallies.
Distributed Search looking for more.
Find More pages related to any page, in many ways.