Sitemap Scrape Statistics

Two of seven charts rendered following plot.ly's simple instructions for getting started.

We collect various counts while scraping and report them as a text file.

Here we plot the most recent counts available.

We fetch the counts.txt file and parse it line by line. Each line is its own JSON record.
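A minimal sketch of the line-by-line parse, written in Python rather than the browser JavaScript this page runs. The field names `items` and `links` in the sample are illustrative assumptions; the real records in counts.txt may carry other fields.

```python
import json


def parse_counts(text):
    """Parse newline-delimited JSON: one record per non-blank line."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        records.append(json.loads(line))
    return records


# Hypothetical sample standing in for the fetched counts.txt body.
sample = '{"items": 100, "links": 40}\n{"items": 104, "links": 43}\n'
records = parse_counts(sample)
```

Parsing each line on its own means a file that is still being appended to can be read up to its last complete line.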

We render each field as a separate time series using the recently open-sourced plotly.js following the advice provided in their quick-start documentation.
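The step from records to one series per field might look like the sketch below. It is in Python for illustration only and stops short of drawing anything. The `x`/`y`/`name` keys mirror the general shape of a plotly trace, and the sample field names are assumptions.

```python
def to_series(records):
    """Pivot a list of records into one x/y series per field name."""
    series = {}
    for index, record in enumerate(records):
        for field, value in record.items():
            trace = series.setdefault(field, {"name": field, "x": [], "y": []})
            trace["x"].append(index)   # sample number on the horizontal axis
            trace["y"].append(value)   # the count recorded for that sample
    return series


# Hypothetical records; real field names may differ.
records = [
    {"items": 100, "links": 40},
    {"items": 104, "links": 43},
    {"items": 110, "links": 43},
]
series = to_series(records)
```

A field that is missing from some records simply gets a shorter series, since each point keeps its own sample number.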

The horizontal axis is in days. The scraper runs every six hours. Our first sample, sample number zero, was recorded on Sep 5, 2015 at 22:30 GMT.
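As a sketch of how a sample number relates to the axis, assuming the samples are spaced exactly six hours apart (real runs may drift or be skipped):

```python
from datetime import datetime, timedelta, timezone

# Sample zero, as stated above: Sep 5, 2015 at 22:30 GMT.
FIRST_SAMPLE = datetime(2015, 9, 5, 22, 30, tzinfo=timezone.utc)
INTERVAL = timedelta(hours=6)


def sample_time(n):
    """Nominal timestamp of sample number n."""
    return FIRST_SAMPLE + n * INTERVAL


def sample_day(n):
    """Position of sample n on a horizontal axis measured in days."""
    return n * INTERVAL / timedelta(days=1)
```

Under this assumption four samples make one day, so `sample_day(4)` is `1.0` and `sample_time(4)` falls on Sep 6, 2015 at 22:30 GMT.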

Now with 6-hour growth rates for items and links.
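One way to read "6-hour growth rate" is the change in a count between consecutive samples. The sketch below makes that assumption; the page may instead compute a relative rate.

```python
def growth(values):
    """Change between each pair of consecutive samples (one per 6-hour step)."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical item counts from four consecutive scrapes.
items = [100, 104, 110, 109]
items_growth = growth(items)
```

The result has one fewer point than the input, and a negative value marks a scrape that found fewer items than the one before.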

Now with x-axis dates that snap to weeks.
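A sketch of snapping axis ticks to weeks, assuming a week boundary means Monday at 00:00 GMT. That choice of boundary is ours for illustration; the page may snap to a different day.

```python
from datetime import datetime, timedelta, timezone


def week_ticks(start, end):
    """Return the Monday-midnight timestamps that fall within [start, end]."""
    # Midnight on the Monday at or before `start`.
    monday = (start - timedelta(days=start.weekday())).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    if monday < start:
        monday += timedelta(days=7)  # first boundary inside the range
    ticks = []
    while monday <= end:
        ticks.append(monday)
        monday += timedelta(days=7)
    return ticks


start = datetime(2015, 9, 5, 22, 30, tzinfo=timezone.utc)  # sample zero
end = start + timedelta(days=21)
ticks = week_ticks(start, end)
```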

Now with a runtime range of 20 to 80 minutes, tolerating timezone and daylight-saving shifts.
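We do not spell out how the range tolerates those shifts. One possible reading, offered purely as an assumption: the range is exactly 60 minutes wide, so a runtime that is off by a whole number of hours can be folded back into it.

```python
def fold_runtime(minutes, low=20, high=80):
    """Shift a runtime by whole multiples of the range width into [low, high)."""
    width = high - low  # 60 minutes with the defaults
    return (minutes - low) % width + low


# A hypothetical 35-minute run, reported correctly and with one-hour offsets.
folded = [fold_runtime(m) for m in (35, 95, -25)]
```

With the defaults all three inputs fold to 35 minutes. The cost of this approach is that a run genuinely longer than 80 minutes would be misread as a shorter one.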

# Recovery

Our implementation, running on a retired MacBook, completed its last scrape on Tuesday, December 17, 2024 at 8:58:05 PM. The machine has not completed a boot since.

Most of the services we once provided are now offered at federatedwiki.org.

Replace `http://search.fed.wiki.org:3030` with `http://search.federatedwiki.org:3030`.
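For anyone with many saved links, the substitution can be scripted. A minimal sketch; the path in the example is hypothetical.

```python
OLD_ORIGIN = "http://search.fed.wiki.org:3030"
NEW_ORIGIN = "http://search.federatedwiki.org:3030"


def update_url(url):
    """Point a link at the new host, leaving unrelated URLs untouched."""
    if url.startswith(OLD_ORIGIN):
        return NEW_ORIGIN + url[len(OLD_ORIGIN):]
    return url


# Hypothetical path, for illustration only.
updated = update_url("http://search.fed.wiki.org:3030/example/path")
```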

We have recovered the historical sitemap scrape statistics from the limping laptop and merged them with newer data so that both can be viewed together here.
