vmcrawl
vmcrawl is a Mastodon-focused version reporting crawler. It is written in Python, with a Postgres database backend. It performs periodic polling of known Mastodon instances.
Crawled Endpoints
vmcrawl may attempt to access the following endpoints of your Fediverse server:
/robots.txt
/.well-known/webfinger
/.well-known/host-meta
/.well-known/nodeinfo
If those results indicate that your server is running Mastodon, vmcrawl may then access:
/api/v2/instance for Mastodon 4.x
/api/v1/instance for Mastodon 3.x
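As a rough illustration, the version-to-endpoint mapping listed above could be expressed in Python like this. The function name `instance_endpoint` and the way the version string is parsed are assumptions made for this sketch, not vmcrawl's actual code.

```python
from typing import Optional

# Mapping taken from the list above: major version -> instance API path.
ENDPOINT_BY_MAJOR = {
    4: "/api/v2/instance",
    3: "/api/v1/instance",
}


def instance_endpoint(version: str) -> Optional[str]:
    """Return the instance API path for a reported Mastodon version.

    Hypothetical helper: returns None when the version cannot be parsed
    or is not one of the major versions listed in this document.
    """
    try:
        major = int(version.split(".", 1)[0])
    except ValueError:
        return None
    return ENDPOINT_BY_MAJOR.get(major)
```

For example, `instance_endpoint("4.2.1")` yields the v2 path, while a version string that does not start with a number yields `None`.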
vmcrawl may periodically attempt to access the following endpoint from known deployments:
/api/v1/instance/peers
User Agent
The vmcrawl user agent can be identified as: vmcrawl/0.x (https://docs.vmst.io/vmcrawl)
robots.txt
If you would like to stop vmcrawl from checking your server, you can add the following to your robots.txt file:
User-agent: vmcrawl
Disallow: /
It will also respect a disallow of all bots/crawlers in this file, or an HTTP 410 reply to any endpoint.
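If you want to confirm that your rules are interpreted as a block for vmcrawl, one option is to check them with Python's standard `urllib.robotparser` module. This is only a sketch of how a standards-following parser reads the file; the helper name `is_allowed` is hypothetical, and vmcrawl's own parsing may differ in detail.

```python
from urllib.robotparser import RobotFileParser

# The rule from this section, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: vmcrawl
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # vmcrawl is blocked; an unrelated crawler is not.
    print(is_allowed(ROBOTS_TXT, "vmcrawl/0.x", "/api/v2/instance"))
    print(is_allowed(ROBOTS_TXT, "otherbot/1.0", "/api/v2/instance"))
```

A blanket rule (`User-agent: *` with `Disallow: /`) is reported as blocking vmcrawl by the same check.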
Collected Data
vmcrawl collects the following data only from servers that it identifies as Mastodon, or a related fork:
- Software Version
- Total and Active User Counts
- Administrator Address
- Code Repository
vmcrawl may periodically request a list of peers from a server, but only to discover new servers to request the data outlined above. It does not store specific peer information for any server.
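The discovery-only use of peer lists could look something like the following sketch. The function name `discover_new_domains` and the normalization steps are assumptions for illustration; the point is that only previously unknown domains are kept, with no record of which server listed them.

```python
from typing import Iterable, Set


def discover_new_domains(known: Set[str], peers: Iterable[str]) -> Set[str]:
    """Return peer domains that are not already known.

    Hypothetical helper: the result carries no information about which
    server reported each peer, only the new domains themselves.
    """
    # Normalize case and whitespace, and drop empty entries.
    normalized = {p.strip().lower() for p in peers if p and p.strip()}
    return normalized - known
```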
Uncrawled Servers
Some servers may not appear in our results, for any of the following reasons:
- Servers may opt out of vmcrawl by blocking our user agent specifically, or by blocking all crawlers
- Servers which appear on the IFTAS DNI list are not interacted with
- Servers under known junk domains, which can be found here, are skipped
- Servers which do not allow unauthenticated access to their public instance API (as shown above) cannot be included
- Servers running versions of Mastodon prior to 3.0 lack the proper APIs, and cannot be included
- Servers which report suspicious user counts (e.g., active users larger than total users) are not included
- Servers which report distorted version information are not included
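Some of the data-quality exclusions above can be sketched as a simple filter. This is an illustration under stated assumptions, not vmcrawl's actual logic: the name `is_reportable` is hypothetical, and the document does not define "distorted" version information, so the version pattern here (a leading `major.minor.patch`) is a guess.

```python
import re

# Assumption: a well-formed version starts with "major.minor.patch".
# Suffixes used by forks or pre-releases (e.g. "+glitch") are tolerated.
VERSION_RE = re.compile(r"^\d+\.\d+\.\d+")


def is_reportable(version: str, total_users: int, active_users: int) -> bool:
    """Apply the version and user-count checks listed above."""
    if not VERSION_RE.match(version):
        return False  # version does not look like a release number
    if int(version.split(".", 1)[0]) < 3:
        return False  # Mastodon prior to 3.0 lacks the required APIs
    if total_users < 0 or active_users < 0:
        return False  # assumption: negative counts are also suspicious
    if active_users > total_users:
        return False  # example given above: active exceeds total
    return True
```

With these assumptions, a server reporting version `4.2.1` with 1000 total and 200 active users passes, while one reporting 100 total and 500 active users does not.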