vmcrawl

vmcrawl is a Mastodon-focused version reporting crawler. It is written in Python, with a Postgres database backend. It performs periodic polling of known Mastodon instances.

Screenshot of a Grafana dashboard with vmcrawl data

Crawled Endpoints

vmcrawl may attempt to access the following endpoints of your Fediverse server:

  • /robots.txt
  • /.well-known/webfinger
  • /.well-known/host-meta
  • /.well-known/nodeinfo

Based on those results, if it's determined that your server is running Mastodon, it may access:

  • /api/v2/instance for Mastodon 4.x
  • /api/v1/instance for Mastodon 3.x

vmcrawl may periodically attempt to access the following endpoint from known deployments:

  • /api/v1/instance/peers

User Agent

The vmcrawl user agent can be identified as: vmcrawl/0.x (https://docs.vmst.io/vmcrawl)

robots.txt

If you would like to stop vmcrawl from checking your server, you can add the following to your robots.txt file:

User-agent: vmcrawl
Disallow: /

It will also respect a disallow of all bots/crawlers in this file, or an HTTP 410 reply to any endpoint.

Collected Data

vmcrawl collects the following data only from servers that it identifies as Mastodon, or a related fork:

  • Software Version
  • Total & Active Users Count
  • Administrator Address
  • Code Repository

vmcrawl may periodically request a list of peers from a server, but only to discover new servers to request the data outlined above. It does not store specific peer information for any server.

Uncrawled Servers

Some servers may not appear in our results based on many factors:

  • Servers may opt-out of vmcrawl by blocking our user agent specifically, or that block any crawler
  • Servers which appear on the IFTAS DNI list, are not interacted with
  • Servers under known junk domains, which can be found here, are skipped
  • Servers which do not allow unauthenticated access to their public instance API (as shown above) cannot be included
  • Servers running versions of Mastodon prior to 3.0 lack the proper APIs, and cannot be included
  • Servers which report suspiscious user counts (ex: active users larger than total users) are not included
  • Servers which report distorted version information