vmcrawl
vmcrawl is a Mastodon-focused version of a reporting crawler. It is written in Python and uses a SQLite database as its backend. It periodically polls known Mastodon instances.
Crawled Endpoints
vmcrawl may attempt to access the following endpoints of your Fediverse server:
/robots.txt
/.well-known/webfinger
/.well-known/host-meta
/.well-known/nodeinfo
(and alternative locations referenced in this document)
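As a rough illustration of how the discovery documents above can lead to a software check, the sketch below parses an already-fetched /.well-known/nodeinfo response and the NodeInfo document it points to. It follows the public NodeInfo layout (a `links` array, then `software.name`). The function names and the `accepted` parameter are hypothetical and are not taken from vmcrawl's source.

```python
from typing import Optional

# Prefix of the "rel" values used by the NodeInfo discovery document.
NODEINFO_SCHEMA_PREFIX = "http://nodeinfo.diaspora.software/ns/schema/"


def nodeinfo_url(well_known: dict) -> Optional[str]:
    """Return the NodeInfo document URL advertised by /.well-known/nodeinfo."""
    links = [
        link
        for link in well_known.get("links", [])
        if isinstance(link, dict)
        and str(link.get("rel", "")).startswith(NODEINFO_SCHEMA_PREFIX)
        and link.get("href")
    ]
    if not links:
        return None
    # Simplification: pick the lexicographically highest schema, e.g. 2.1 over 2.0.
    return max(links, key=lambda link: link["rel"])["href"]


def looks_like_mastodon(nodeinfo: dict, accepted=frozenset({"mastodon"})) -> bool:
    """Check the reported software name against a set of accepted names.

    Assumption: the names of related forks would be added to `accepted`.
    """
    software = nodeinfo.get("software")
    if not isinstance(software, dict):
        return False
    return str(software.get("name", "")).lower() in accepted
```

A server that advertises no NodeInfo link yields `None` here, which a crawler could treat as "software unknown".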
Based on those results, if it's determined that your server is running Mastodon, it may contact:
/api/v2/instance for Mastodon 4.x
/api/v1/instance for Mastodon 3.x
/about for Mastodon 3.x
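The version-to-endpoint mapping above can be sketched as a small lookup. This is only an illustration: the function name is hypothetical, and whether vmcrawl tries both 3.x paths or falls back from one to the other is an assumption, since the list does not say.

```python
# Paths from the list above, keyed by Mastodon major version.
PATHS_BY_MAJOR = {
    4: ["/api/v2/instance"],
    3: ["/api/v1/instance", "/about"],
}


def instance_paths(version: str) -> list:
    """Return the candidate paths for a reported version string such as '4.2.1'."""
    try:
        major = int(version.strip().split(".", 1)[0])
    except ValueError:
        # Unparseable version string: nothing to request.
        return []
    # Versions not covered by the list above yield no paths in this sketch.
    return list(PATHS_BY_MAJOR.get(major, []))
```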
vmcrawl may periodically attempt to access the following endpoint from known deployments:
/api/v1/instance/peers
User Agent
vmcrawl identifies itself with the following user agent:
vmcrawl/0.1 (https://docs.vmst.io/projects/crawler)
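For illustration, a request carrying this user agent could be built with Python's standard library as shown below. The helper name is hypothetical, and the HTTP client that vmcrawl actually uses is not stated here.

```python
from urllib.request import Request

USER_AGENT = "vmcrawl/0.1 (https://docs.vmst.io/projects/crawler)"


def build_request(host: str, path: str) -> Request:
    """Build (but do not send) an HTTPS request that identifies as vmcrawl."""
    return Request(f"https://{host}{path}", headers={"User-Agent": USER_AGENT})
```

Server operators can match on the `vmcrawl` token in their access logs to see these requests.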
robots.txt
If you would like to stop vmcrawl from checking your instance, you can add the following to your robots.txt file:
User-agent: vmcrawl
Disallow: /
vmcrawl also respects a rule in this file that disallows all bots and crawlers, as well as an HTTP 410 response to a request for this file.
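The rules above can be sketched with Python's `urllib.robotparser`. The function name is hypothetical, and treating any other non-200 status as "no restriction" is an assumption made for this example. The 410 case is checked explicitly rather than left to the parser.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(status: int, body: str, agent: str = "vmcrawl") -> bool:
    """Decide whether crawling is allowed, given a robots.txt response."""
    if status == 410:
        # An HTTP 410 reply to robots.txt is treated as a request to stop.
        return False
    if status != 200:
        # Assumption: no usable robots.txt means no restriction.
        return True
    parser = RobotFileParser()
    parser.parse(body.splitlines())
    # Covers both a vmcrawl-specific rule and a blanket "User-agent: *" rule.
    return parser.can_fetch(agent, "/")
```

With this sketch, both `User-agent: vmcrawl` and `User-agent: *` followed by `Disallow: /` cause the function to return `False`.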
Collected Data
vmcrawl collects the following data, and only from instances that it identifies as running Mastodon or a related fork:
- Domain Name
- Software Version
- Total Users Count
- Active Users Count
- Administrator Email Address
- Source Code Repository
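To make the fields above concrete, here is a minimal sketch of how they could map onto a SQLite table. The table name, column names, and sample values are hypothetical. vmcrawl's real schema is not described here.

```python
import sqlite3

# One row per instance, with one column per collected field.
SCHEMA = """
CREATE TABLE IF NOT EXISTS instances (
    domain           TEXT PRIMARY KEY,
    software_version TEXT,
    total_users      INTEGER,
    active_users     INTEGER,
    admin_email      TEXT,
    source_url       TEXT
)
"""


def save_instance(conn: sqlite3.Connection, record: dict) -> None:
    """Insert a record, or overwrite the existing row for the same domain."""
    conn.execute(
        "INSERT OR REPLACE INTO instances VALUES "
        "(:domain, :software_version, :total_users, "
        ":active_users, :admin_email, :source_url)",
        record,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
save_instance(conn, {
    "domain": "example.social",
    "software_version": "4.2.1",
    "total_users": 120,
    "active_users": 45,
    "admin_email": "admin@example.social",
    "source_url": "https://github.com/mastodon/mastodon",
})
```

Keying the table on the domain means a repeat poll updates the existing row instead of adding a new one.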
vmcrawl may periodically request a list of peer instances from a server, but only to discover new servers to request the data outlined above. It does not store peer information for any server.
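The discovery step described above can be sketched as a set difference. The peers endpoint returns a JSON array of domain names; the function name and the normalization steps below are assumptions for illustration. Only the new domains are returned, and nothing records which server listed them.

```python
def discover_new_domains(peers: list, known: set) -> set:
    """Return peer domains that are not already known to the crawler."""
    candidates = {
        peer.strip().lower()
        for peer in peers
        if isinstance(peer, str) and peer.strip()
    }
    # The peer relationship itself is discarded; only unseen domains remain.
    return candidates - known
```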