How Scrape Works

The scrape runs every six hours on a schedule that shifts with daylight saving time. The scrape is built from scripts that manipulate files in directories. Some files are rolled up from similarly named files in subdirectories. github
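The rollup mentioned above can be pictured with a small sketch. This is not the scrape's own code: the function name roll_up, the example file name "words", and the choice to sort and de-duplicate lines are all assumptions made for illustration.

```python
from pathlib import Path


def roll_up(parent: Path, name: str) -> list[str]:
    """Merge every file called `name` found in the immediate subdirectories of `parent`.

    The sorted, de-duplicated lines are written to parent/name and returned.
    (Hypothetical helper; the real scripts may combine files differently.)
    """
    lines: set[str] = set()
    for child in sorted(parent.iterdir()):
        source = child / name
        # Only subdirectories that actually hold a file of this name contribute.
        if child.is_dir() and source.is_file():
            lines.update(source.read_text().splitlines())
    merged = sorted(lines)
    (parent / name).write_text("".join(line + "\n" for line in merged))
    return merged
```

Calling roll_up(Path("sites"), "words") would, under these assumptions, gather each site's words file into one sites/words file.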

The cron script sequences a series of scripts that collect data from federation sites.
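Sequencing of this kind can be sketched in a few lines. The function run_sequence is hypothetical, and stopping at the first failing step is an assumption; the actual cron script may well carry on after an error.

```python
import subprocess


def run_sequence(steps: list[list[str]]) -> int:
    """Run each command in order and return how many completed successfully.

    Assumption for this sketch: a non-zero exit status halts the sequence.
    """
    completed = 0
    for command in steps:
        result = subprocess.run(command)
        if result.returncode != 0:
            break
        completed += 1
    return completed
```

Each step is an argument list such as ["ruby", "some-script.rb"], where the script name is a placeholder rather than one of the real scripts.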

The sites directory accumulates flat file indices of data collected from scraped sites.

The activity directory holds new information discovered fresh with each scrape.
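One way to read the split between accumulated indices and fresh activity is as a set difference. The sketch below assumes that reading; record_scrape and the use of in-memory sets are illustrative only, since the real scripts work on files.

```python
def record_scrape(index: set[str], scraped: set[str]) -> set[str]:
    """Return the items seen for the first time, then add them to the index.

    `index` stands in for what the sites directory has accumulated;
    the returned set stands in for what the activity directory would hold.
    """
    fresh = scraped - index
    index |= fresh  # the accumulated index grows in place
    return fresh
```

Run twice with the same scraped items, the second call returns an empty set because nothing is new.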

The public directory serves composite files used by various downstream reports.

Here we depart from title case for the names of these Unix elements. We use suffix words rather than extensions, but actual file names are mentioned on the pages that explain them.

See Merge All Graphs for a single diagram of all pages.

See How Search Works where data collected here is used.