methodology

How We Track the MCP Ecosystem

The observatory is not a single npm scraper. It ingests from six surfaces, collapses duplicates into one identity, verifies trust, reads each server's published source statically (unpacked, never executed), and derives a stack of security signals — all on a fixed cadence, all from the public record.

what counts as an mcp server

Discovery spans six surfaces. From npm we ingest any package whose name contains mcp or sits under the @modelcontextprotocol scope, plus a hand-maintained allowlist of known servers that don't match the pattern. PyPI uses the same naming heuristics. Smithery (smithery.ai) and mcp.so contribute their published directory listings. From GitHub we follow repos tagged with the mcp or model-context-protocol topics, surfacing the most recently active first and skipping repos quiet for over 18 months. From the official MCP registry we ingest every listed server with a public repo or package. Slice the resulting set by inferred capability and permission in the servers browser.
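A minimal Python sketch of the npm / PyPI naming heuristic and the GitHub freshness filter described above. `ALLOWLIST` and its entry are hypothetical placeholders. Repos are plain dicts with assumed `topics` and `pushed_at` fields. "Over 18 months" is approximated here as 540 days.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical allowlist of known servers whose names miss the heuristic.
ALLOWLIST = {"example-context-server"}

# "Quiet for over 18 months", approximated as 18 * 30 days (an assumption).
GITHUB_STALE_AFTER = timedelta(days=18 * 30)
GITHUB_TOPICS = {"mcp", "model-context-protocol"}


def is_candidate_package(name: str) -> bool:
    """npm / PyPI heuristic: 'mcp' in the name, the official scope, or the allowlist."""
    lowered = name.lower()
    return (
        "mcp" in lowered
        or lowered.startswith("@modelcontextprotocol/")
        or lowered in ALLOWLIST
    )


def github_candidates(repos: list[dict], now: datetime) -> list[dict]:
    """Topic-tagged repos pushed within the window, most recently active first."""
    fresh = [
        repo for repo in repos
        if GITHUB_TOPICS & set(repo["topics"])
        and now - repo["pushed_at"] <= GITHUB_STALE_AFTER
    ]
    return sorted(fresh, key=lambda repo: repo["pushed_at"], reverse=True)
```

The scope check is not redundant: `@modelcontextprotocol/...` does not contain the substring `mcp`, so without it the official packages would be missed.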

Beyond discovery, dedicated workers deepen and watch the set: they track MCP client releases, infer capabilities from each README, statically analyze the published source artifact, and ingest CVE / advisory feeds. Every worker and its cadence is in the poll-cadence table below; live health for each is on the feeds page.

one project, one identity

The same project routinely appears on several registries under slightly different names. A resolver collapses those into one canonical record, in priority order:

  • explicit override — a maintainer-curated map keyed by source + slug, loaded from overrides.json at boot.
  • cross-source repository url match — two records from different sources that point at the same git remote are merged into one canonical identity. Records from the same source that share a repo are treated as monorepo siblings and are not merged.
  • exact source-record fold — a normalized (source, slug) pair that the source_records provenance ledger has seen before resolves to the existing identity, making re-resolution deterministic.

Failing all three, a record keeps its source-qualified id (e.g. npm:foo). This is why the tracked-server count is a count of identities, not of registry rows — a server on npm, GitHub, and Smithery is one server here.
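The three-step priority order can be sketched as a small in-memory resolver. Everything here is hypothetical: `Resolver`, `normalize_slug` and `normalize_repo` are illustrative names, one plain dict stands in for overrides.json and another for the source_records ledger, and the normalization rules are assumptions rather than the observatory's actual ones.

```python
from typing import Optional


def normalize_slug(slug: str) -> str:
    # Assumed normalization: lowercase, '_' and '.' folded to '-'.
    return slug.lower().replace("_", "-").replace(".", "-")


def normalize_repo(url: str) -> str:
    # Assumed git-remote normalization: lowercase, drop trailing '/', '.git', scheme.
    u = url.lower().strip().rstrip("/")
    if u.endswith(".git"):
        u = u[:-4]
    for prefix in ("git+https://", "https://", "http://", "git://"):
        if u.startswith(prefix):
            return u[len(prefix):]
    return u


class Resolver:
    def __init__(self, overrides: dict[tuple[str, str], str]):
        self.overrides = overrides                     # (source, slug) -> canonical id
        self.ledger: dict[tuple[str, str], str] = {}   # seen (source, slug) -> id
        self.by_repo: dict[str, dict[str, str]] = {}   # remote -> {source: id}

    def resolve(self, source: str, slug: str, repo: Optional[str] = None) -> str:
        key = (source, normalize_slug(slug))
        remote = normalize_repo(repo) if repo else None
        owners = self.by_repo.setdefault(remote, {}) if remote else {}
        if key in self.overrides:                  # 1. explicit override
            ident = self.overrides[key]
        elif owners and source not in owners:      # 2. cross-source repo match
            ident = next(iter(owners.values()))
        elif key in self.ledger:                   # 3. exact source-record fold
            ident = self.ledger[key]
        else:                                      # fallback: source-qualified id
            ident = f"{source}:{key[1]}"
        self.ledger[key] = ident
        if remote:
            owners.setdefault(source, ident)       # same-source sharers never merge
        return ident
```

Because the repo index records which sources already own a remote, a second npm package pointing at the same monorepo gets its own id instead of being folded into the first.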

Levenshtein / fuzzy merging was explicitly removed: it chained distinct same-author tools together (e.g. two unrelated tools by the same publisher would collapse into one). The Levenshtein threshold now only feeds the typosquat radar (naming flags), which flags and never merges. Retired ids are forwarded via identity_alias, so old URLs return a 301 redirect rather than a 404.
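The radar needs only an edit distance and a distance window. This sketch uses the textbook two-row Levenshtein dynamic program. `typosquat_flags` and its `max_distance=2` default mirror the 1–2 window described under naming & impersonation, but the function names and the protected-target set are illustrative. The function returns flags and never touches identity.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                 # deletion
                cur[j - 1] + 1,              # insertion
                prev[j - 1] + (ca != cb),    # substitution (0 if equal)
            ))
        prev = cur
    return prev[-1]


def typosquat_flags(candidate: str, targets: set[str], max_distance: int = 2) -> list[str]:
    """Protected targets within distance 1..max_distance; distance 0 is the target itself."""
    return sorted(t for t in targets if 1 <= levenshtein(candidate, t) <= max_distance)
```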

verification

A server is verified when at least one independent strong trust signal vouches for it. Each signal is recorded separately, so the badge always carries a reason:

  • official-registry — listed and active in the official MCP registry.
  • npm-provenance / pypi-attestation — a cryptographic build provenance attestation binding the published artifact to its source repo.
  • smithery — Smithery's own verified flag.

These differ in how far we can stand behind them, and we label each accordingly. npm-provenance / pypi-attestation is attested — a cryptographic signature we check ourselves. official-registry and smithery are reported: a third party vouches and we relay it, but the observatory hasn't independently confirmed it. "Verified" is never our own endorsement — it always names whose claim it is.

A weak signal (a publisher owning the declared repo) is recorded but never sets the badge alone — otherwise nearly every server would qualify and the word would lose meaning. Losing a strong signal (an official entry going deprecated) drops the badge, which is itself a tracked change.
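The badge rule can be sketched as follows, assuming each signal is stored with a kind and an active flag. The attested / reported labels come from the text above. `Signal`, `verification`, `badge_change` and the weak-signal kind `"repo-owner"` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Strong signals and the label each one carries.
STRONG_SIGNALS = {
    "npm-provenance": "attested",     # signature checked by the observatory
    "pypi-attestation": "attested",
    "official-registry": "reported",  # third-party claim, relayed
    "smithery": "reported",
}


@dataclass(frozen=True)
class Signal:
    kind: str     # e.g. "official-registry", or a weak kind like "repo-owner"
    active: bool  # False once, say, the official entry is deprecated


def verification(signals: list[Signal]) -> tuple[bool, list[str]]:
    """Only active strong signals set the badge; each one becomes a named reason."""
    reasons = sorted(
        f"{s.kind} ({STRONG_SIGNALS[s.kind]})"
        for s in signals
        if s.active and s.kind in STRONG_SIGNALS
    )
    return bool(reasons), reasons


def badge_change(before: list[Signal], after: list[Signal]) -> Optional[str]:
    """A tracked change when the badge flips, otherwise None."""
    was, _ = verification(before)
    now, _ = verification(after)
    if was == now:
        return None
    return "verified" if now else "unverified"
```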

reading the code

Two layers populate each server's capability surface. Registry introspection reads the declared tool / resource / prompt counts and transport from Smithery and the official registry. Static code analysis goes deeper: for npm, PyPI, and GitHub servers we download the published artifact (npm tarball, PyPI sdist, or GitHub source tarball) and unpack it into a throwaway directory to read statically — no code is ever run (not the server, its tests, or its install hooks). Every finding quotes the exact file and line:

  • permissions — evidence-backed filesystem, network, shell, secrets, database, and untrusted-content access. Untrusted content is the injection-entry axis: web scrapes, issue/PR/comment bodies, fetched documents.
  • tools — SDK tool / resource registrations and decorators.
  • dependencies & install hooks — parsed manifests, plus pre/post-install scripts (read, never executed).
  • transports — stdio, SSE, streamable-HTTP, HTTP binds.
  • prompt surface — shipped skill / agent-instruction files (SKILL.md, .cursorrules, prompts/*) scanned for hidden-channel injection: invisible unicode, Unicode-Tag smuggling, and comment-buried model directives.
  • danger signals — committed secrets, dynamic-exec sinks (eval, pickle.loads), and suspicious call-home endpoints.
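A sketch of the read-only pass for one danger signal, dynamic-exec sinks, assuming a regex per sink. It differs from the text in one respect: it reads tarball members in memory instead of unpacking them into a throwaway directory. Like the real analyzer, it never executes anything. `DANGER_PATTERNS`, `Finding` and `scan_artifact` are hypothetical names, and the patterns are illustrative rather than the observatory's detectors.

```python
import io
import re
import tarfile
from dataclasses import dataclass

# Illustrative patterns for the two sinks named above.
DANGER_PATTERNS = {
    "dynamic-exec:eval": re.compile(r"\beval\s*\("),
    "dynamic-exec:pickle.loads": re.compile(r"\bpickle\.loads\s*\("),
}


@dataclass(frozen=True)
class Finding:
    rule: str
    path: str   # file inside the artifact
    line: int   # 1-based line number quoted as evidence
    text: str


def scan_artifact(data: bytes) -> list[Finding]:
    """Read every regular file of a (possibly compressed) tarball as text."""
    findings = []
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, symlinks, devices
            handle = tar.extractfile(member)
            if handle is None:
                continue
            text = handle.read().decode("utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                for rule, pattern in DANGER_PATTERNS.items():
                    if pattern.search(line):
                        findings.append(Finding(rule, member.name, lineno, line.strip()))
    return findings
```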

Coverage: 37% of 17,249 servers analyzed (6,022 current · 300 re-analysis due · 9,486 not yet analyzed · 1,441 not analyzable). Re-analysis is gated on a new version, a new push, or an analyzer bump, so the flag on each server page is always live. A further 707 listed repos have since vanished upstream (deleted, renamed, or made private) and are excluded from every count.

The analyzer is versioned: each detector change bumps a version that marks the back-catalogue stale and re-scans it. The scanner changelog lists what every version detects and when it shipped.
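The gating rule and the coverage figure can be sketched together. This assumes each server records the version or commit it was last analyzed at and the analyzer version used. It also assumes "analyzed" counts current plus re-analysis-due records over all four buckets, which reproduces the published 37%. `AnalysisState`, `status`, `coverage` and the `ANALYZER_VERSION` value are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

ANALYZER_VERSION = 7  # hypothetical current analyzer version


@dataclass
class AnalysisState:
    analyzable: bool
    analyzed_at: Optional[str]    # version / commit last analyzed, None if never
    analyzed_with: Optional[int]  # analyzer version used for that run
    latest: str                   # current published version / pushed commit


def status(s: AnalysisState, analyzer_version: int = ANALYZER_VERSION) -> str:
    if not s.analyzable:
        return "not analyzable"
    if s.analyzed_at is None:
        return "not yet"
    # A new version/push or an analyzer bump marks the result stale.
    if s.analyzed_at != s.latest or s.analyzed_with != analyzer_version:
        return "re-analysis due"
    return "current"


def coverage(counts: dict[str, int]) -> float:
    """Share of all servers with at least one analysis (current or due)."""
    analyzed = counts["current"] + counts["re-analysis due"]
    return analyzed / sum(counts.values())
```

With the published counts, (6,022 + 300) / 17,249 ≈ 0.3665, which rounds to 37%.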

the security signals we derive

On top of the capability surface, the observatory derives a stack of independent security signals. Each is collected on its own cadence and surfaced on the security hub:

  • vulnerabilities — OSV.dev query for every npm / PyPI server, with CVSS severity, fix availability, and EPSS exploit-probability scoring.
  • capability drift — we snapshot each server's permission mask, tool count, transport, and verification, and raise an advisory when they change. Silent permission escalations on high-adoption servers are flagged for review as possible rug-pulls — a prompt to look, not an accusation; routine version bumps expand capabilities too.
  • naming & impersonation — servers whose names sit within Levenshtein distance 1–2 of a verified or official-scope target are flagged as possible typosquats.
  • abandonment — scored from release age and cadence; long-silent or content-less repos surface so dependents can plan around them.
  • supply chain — install-hook presence, build-provenance coverage, and registry deprecation flags.
  • license posture — SPDX classification flagging missing, non-commercial, or strong-copyleft terms.
  • hijack exposure — the intersection of abandoned, widely-depended-on, and CVE-carrying servers: the highest-leverage takeover targets.
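Capability drift amounts to a diff between two snapshots. The sketch below assumes the permission mask is a bit field in a hypothetical `PERMS` order. It raises a review prompt, not a verdict, when a high-adoption server gains a permission. The "silent" qualifier (no accompanying announcement) is not modeled. `Snapshot`, `perms` and `drift` are illustrative names.

```python
from dataclasses import dataclass

# Bit order of the permission mask is an assumption for this sketch.
PERMS = ("filesystem", "network", "shell", "secrets", "database", "untrusted-content")


@dataclass(frozen=True)
class Snapshot:
    perm_mask: int   # one bit per entry in PERMS
    tool_count: int
    transport: str
    verified: bool


def perms(mask: int) -> set[str]:
    return {p for i, p in enumerate(PERMS) if mask & (1 << i)}


def drift(old: Snapshot, new: Snapshot, high_adoption: bool) -> dict:
    changes = {}
    gained = perms(new.perm_mask) - perms(old.perm_mask)
    lost = perms(old.perm_mask) - perms(new.perm_mask)
    if gained or lost:
        changes["permissions"] = {"gained": sorted(gained), "lost": sorted(lost)}
    for name in ("tool_count", "transport", "verified"):
        before, after = getattr(old, name), getattr(new, name)
        if before != after:
            changes[name] = (before, after)
    # A prompt to look: routine releases also gain permissions.
    return {"changes": changes, "review": bool(gained) and high_adoption}
```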

scoring risk

A background job folds those signals into a single 0–100 risk score and an A–E grade per server. Vulnerabilities (weighted by severity, halved when a fix exists, scaled by EPSS), inferred permission exposure, recent drift, CVEs inherited through the dependency graph, supply-chain flags, abandonment, and tool-safety findings all add weight; mitigators subtract it — being verified, shipping build provenance, and staying actively released. The score is a read cache, recomputed whenever a signal changes, and drives the risk leaderboard.
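One way to fold the listed signals into a clamped 0–100 score and an A–E grade is shown below. The inputs and their direction come from the text: severity weighting, halving when a fix exists, EPSS scaling, and the three mitigators. Every numeric weight, the form of the EPSS factor and the grade cut-offs are invented for illustration and are not the observatory's values.

```python
from dataclasses import dataclass, field

# Hypothetical weights; only their direction follows the text.
SEVERITY_WEIGHT = {"low": 2.0, "medium": 5.0, "high": 10.0, "critical": 15.0}


@dataclass
class Vuln:
    severity: str
    fix_available: bool
    epss: float  # exploit probability, 0..1


@dataclass
class Signals:
    vulns: list[Vuln] = field(default_factory=list)
    permission_exposure: float = 0.0  # 0..1
    recent_drift: bool = False
    inherited_cves: int = 0
    supply_chain_flags: int = 0
    abandoned: bool = False
    tool_safety_findings: int = 0
    verified: bool = False
    has_provenance: bool = False
    actively_released: bool = False


def risk_score(s: Signals) -> int:
    score = 0.0
    for v in s.vulns:
        w = SEVERITY_WEIGHT[v.severity]
        if v.fix_available:
            w /= 2                          # halved when a fix exists
        score += w * (0.5 + 0.5 * v.epss)   # assumed EPSS scaling form
    score += 20 * s.permission_exposure
    score += 8 if s.recent_drift else 0
    score += 3 * s.inherited_cves
    score += 4 * s.supply_chain_flags
    score += 10 if s.abandoned else 0
    score += 3 * s.tool_safety_findings
    # Mitigators subtract weight.
    score -= 10 if s.verified else 0
    score -= 5 if s.has_provenance else 0
    score -= 5 if s.actively_released else 0
    return max(0, min(100, round(score)))


def grade(score: int) -> str:
    """Hypothetical cut-offs: lower risk earns a better letter."""
    for bound, letter in ((20, "A"), (40, "B"), (60, "C"), (80, "D")):
        if score < bound:
            return letter
    return "E"
```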

OWASP MCP Top 10 coverage

How our in-house detectors map to the OWASP MCP Top 10 (2025). We grade ourselves honestly — 8 covered, 1 partial, and 1 out of scope for a static, no-execution observatory. "Covered" means a heuristic detector points at the risk — not that every instance is caught. Detection is inferred throughout; this matrix is a coverage map, not a guarantee.

  • MCP01 Token Mismanagement & Secret Exposure: covered

    Committed credentials / .env, and credential values fed into log sinks.

    detectors: secret · token-log · dangerous_code

  • MCP02 Privilege Escalation via Scope Creep: covered

    Over-broad OAuth scopes, capability gained after first observation, benign-named tools carrying dangerous perms.

    detectors: oauth-scope · rug_pull · purpose_mismatch

  • MCP03 Tool Poisoning: covered

    Name + description + full input schema (Full-Schema Poisoning), plus shipped prompt/skill files.

    detectors: tool_poisoning · prompt_injection_file · skill-exfil

  • MCP04 Software Supply Chain Attacks & Dependency Tampering: covered

    Dependency manifest, install hooks, OSV CVEs (incl. inherited), abandonment, build-provenance coverage.

    detectors: dep · install-hook · cve · abandonment · provenance

  • MCP05 Command Injection & Execution: covered

    Dynamic-exec sinks, unconstrained command/path params, shell capability.

    detectors: dynamic-exec · dangerous_code · loose_schema · perm:shell

  • MCP06 Intent Flow Subversion: covered

    Puppet attacks (untrusted tool steering toward a verified server's tool) and in-description tool steering.

    detectors: cross_server_steering · tool_poisoning

  • MCP07 Insufficient Authentication & Authorization: partial

    Gap: we infer scope/secret-handling from source but do not model a server's auth/authz configuration directly (much of it is deploy-time, not in the artifact).

    detectors: oauth-scope · token-log

  • MCP08 Lack of Audit and Telemetry: out of scope

    Gap: whether a server logs/audits its own actions is a runtime property a static, no-execution observatory cannot observe. Tracked for completeness.

  • MCP09 Shadow MCP Servers: covered

    Unverified servers whose tool names or identities collide with a verified/official target (typosquat radar).

    detectors: tool_shadow · naming/impersonation

  • MCP10 Context Injection & Over-Sharing: covered

    Hidden-channel injection (zero-width / bidi / tag-smuggling / HTML-comment directives) in tool schemas and shipped prompt files, plus the lethal trifecta (toxic_flow): private-data access + untrusted-content ingestion + a network exfil channel reachable in one session (per-server and cross-server via the dependency graph).

    detectors: hidden-prompt · prompt_injection_file · tool_poisoning · toxic_flow
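The hidden-channel checks and the lethal-trifecta test can both be sketched in a few lines. The code-point sets are standard: zero-width characters, the bidi embedding/override/isolate controls, and the Unicode Tags block U+E0000–U+E007F, whose U+E0020–U+E007E range mirrors printable ASCII. The directive vocabulary for HTML comments and the mapping of "private data" to the filesystem / secrets / database permissions are assumptions. `hidden_channel_findings` and `toxic_flow` are hypothetical names.

```python
import re

ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}
BIDI_CONTROLS = set(range(0x202A, 0x202F)) | set(range(0x2066, 0x206A))
TAG_FIRST, TAG_LAST = 0xE0000, 0xE007F              # Unicode Tags block
HTML_COMMENT = re.compile(r"<!--(.*?)-->", re.S)
# Hypothetical directive vocabulary; a real detector would be broader.
DIRECTIVE = re.compile(r"\b(ignore|disregard|you must|do not tell)\b", re.I)


def hidden_channel_findings(text: str) -> list[tuple[int, str, str]]:
    """(line, kind, detail) for each hidden channel found in a prompt or schema file."""
    findings = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        cps = [ord(c) for c in line]
        if any(cp in ZERO_WIDTH for cp in cps):
            findings.append((lineno, "zero-width", ""))
        if any(cp in BIDI_CONTROLS for cp in cps):
            findings.append((lineno, "bidi-control", ""))
        tags = [cp for cp in cps if TAG_FIRST <= cp <= TAG_LAST]
        if tags:
            # Decode the smuggled ASCII carried by tag characters.
            hidden = "".join(chr(cp - TAG_FIRST) for cp in tags if 0xE0020 <= cp <= 0xE007E)
            findings.append((lineno, "tag-smuggling", hidden))
    for match in HTML_COMMENT.finditer(text):
        if DIRECTIVE.search(match.group(1)):
            lineno = text.count("\n", 0, match.start()) + 1
            findings.append((lineno, "comment-directive", match.group(1).strip()))
    return sorted(findings)


PRIVATE_DATA = {"filesystem", "secrets", "database"}  # assumed "private data" permissions


def toxic_flow(*servers: set[str]) -> bool:
    """Private data + untrusted input + network exfil, across everything in one session."""
    combined = set().union(*servers)
    return bool(combined & PRIVATE_DATA) and {"untrusted-content", "network"} <= combined
```

Passing several permission sets to `toxic_flow` models the cross-server case, where no single server holds all three legs.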

poll cadences

Each source has its own ingest worker on a fixed interval. Discovery workers find and refresh servers; enrichment & security workers deepen and watch them. Live status for every worker — last fetch, next due, errors — is on the feeds page.

discovery

  worker     interval     notes
  npm        60 seconds
  pypi       60 seconds
  github     5 minutes    topic search, rate-limit aware
  smithery   15 minutes   capability enrichment
  mcpso      15 minutes
  official   6 hours      official MCP registry: discovery + verification

enrichment & security

  worker     interval     notes
  clients    30 minutes   MCP client release tracking
  static     30 minutes   README capability inference
  code       5 minutes    static artifact analysis, no code run
  cve        4 hours      OSV + GitHub advisories + EPSS scoring
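A fixed-interval scheduler over these cadences reduces to "last fetch plus interval". This sketch copies the intervals from the table. It does not model GitHub's rate-limit backoff or error handling. `CADENCES`, `next_due` and `due_workers` are hypothetical names.

```python
from datetime import datetime, timedelta

# Intervals copied from the cadence table.
CADENCES = {
    "npm": timedelta(seconds=60),
    "pypi": timedelta(seconds=60),
    "github": timedelta(minutes=5),
    "smithery": timedelta(minutes=15),
    "mcpso": timedelta(minutes=15),
    "official": timedelta(hours=6),
    "clients": timedelta(minutes=30),
    "static": timedelta(minutes=30),
    "code": timedelta(minutes=5),
    "cve": timedelta(hours=4),
}


def next_due(worker: str, last_fetch: datetime) -> datetime:
    return last_fetch + CADENCES[worker]


def due_workers(last_fetch: dict[str, datetime], now: datetime) -> list[str]:
    """Workers that have never run, or whose interval has elapsed."""
    return sorted(
        worker for worker in CADENCES
        if worker not in last_fetch or now >= next_due(worker, last_fetch[worker])
    )
```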

related servers

The relatedness graph derives edges from two signals: a shared maintainer (same npm user, PyPI author, or GitHub owner) and shared runtime dependencies. The dependency graph filters to the most active nodes, those with at least one release in the last 30 days, so dormant projects don't crowd the view. This is a deliberate selection bias: the graph reads as "who's active and clustered", not as a census of the whole ecosystem, and stable, mature, infrequently released packages are under-represented by design. The same dependency edges power the inherited-CVE walk and the hijack-exposure model above.
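A sketch of how the same dependency edges could serve the 30-day view filter, the inherited-CVE walk and the hijack-exposure intersection. It applies the activity filter only to the view and walks the full edge set; whether the production walk does the same is not stated here. The `min_dependents` threshold for "widely depended-on" is invented. All function names are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta


def active_nodes(last_release: dict[str, datetime], now: datetime,
                 window: timedelta = timedelta(days=30)) -> set[str]:
    """Nodes with at least one release inside the window (the view filter)."""
    return {node for node, released in last_release.items() if now - released <= window}


def inherited_cves(server: str, deps: dict[str, set[str]],
                   direct: dict[str, set[str]]) -> set[str]:
    """Breadth-first walk over runtime-dependency edges; `seen` makes it cycle-safe."""
    seen = {server}
    queue = deque(deps.get(server, ()))
    found: set[str] = set()
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        found |= direct.get(node, set())
        queue.extend(deps.get(node, ()))
    return found


def hijack_targets(abandoned: set[str], dependents: dict[str, int],
                   cve_carrying: set[str], min_dependents: int = 10) -> set[str]:
    """Abandoned, widely depended-on, and CVE-carrying all at once."""
    widely_used = {n for n, count in dependents.items() if count >= min_dependents}
    return abandoned & widely_used & cve_carrying
```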

what we don't do

  • no user-submitted servers — every record comes from a polled upstream.
  • no comments, votes, or rankings — the observatory measures, it doesn't curate.
  • no runtime execution — code analysis is purely static. We never run a server, its tests, or its install hooks; we only read the published source.

For the tech stack behind all of this, see the colophon.