ToolShed

The Problem

Agents are capable.
The infrastructure isn't.

AI agents are increasingly capable, but when they need specialized tools — fraud detection, geospatial analysis, compliance checking — they hit a wall.

No Discovery

Tools are hardcoded or manually configured. There's no Yellow Pages for agent capabilities.

No Payment

There's no machine-native way to pay for a tool call. It's all API keys, billing dashboards, and enterprise contracts.

No Reputation

An agent can't know which tool provider is reliable, fast, or accurate without a human pre-vetting everything.

No Portability

Switch your MCP server or API provider and you're rewiring everything from scratch.

No Audit Trail

When an agent makes a decision based on a tool's output, there's no versioned, reproducible record of what happened.

SaaS pricing models assume a human buyer. Per-seat licenses, annual contracts, "schedule a demo" funnels — the entire commercial infrastructure was designed for humans. An agent can't sit through a sales call.

The protocol debate (MCP vs. skills vs. REST vs. gRPC) is a distraction. The real gap is: how do agents find, trust, pay for, and audit tool usage across organizational boundaries?

Inspirations

Built on the shoulders of four big ideas

Data Model: AT Protocol / "A Social Filesystem"

Dan Abramov's article reframes AT Protocol as a distributed filesystem for social computing. The ToolShed borrows its design patterns — but doesn't depend on Bluesky's infrastructure.

Everything is a record — JSON records in repos, organized into namespaced collections
Lexicons — machine-readable schemas for record formats, published by anyone
DIDs — decentralized identifiers for portable, self-sovereign identity
"Third party is first party" — anyone builds discovery, reputation, or alt registries over the same data

"Our memories, our thoughts, our designs should outlive the software we used to create them." Replace "software" with "agent frameworks" and the same principle applies to tools.

Architecture: Gas Town (Steve Yegge)

A multi-agent orchestration system for coordinating 20-30+ Claude Code agents working simultaneously.

Dolt as the backbone — every piece of agent state lives in Dolt with full commit history
Mol Mall (planned) — a marketplace for reusable agent workflow templates, like npm for agents
Federation — multiple instances reference each other via Dolt remotes
Agent identity — every operation is attributed with actor information and provenance

Storage: Dolt DB

A SQL database you can fork, clone, branch, merge, push, and pull — just like Git. MySQL-compatible with full version history.

Feature	Application in ToolShed
`dolt_history_*`	Full row-level history of every tool registration and invocation
`AS OF` queries	"What tools were available at time T? What schema was version N?"
`dolt_diff()`	"What changed between schema versions?"
Branch & merge	A/B test new pricing, preview schema changes before publishing
`dolt clone / push`	Distribute the registry, federate across organizations
Signed commits	Tamper-evident audit trail

Versioning: Unison Programming Language

A language where every definition is identified by a hash of its syntax tree, not by its name. Names are just metadata — pointers to hashes.

No version conflicts — two versions of a function are just two different hashes that coexist
No builds break — dependencies are pinned by hash, not by name
Immutable definitions — once a definition exists at a hash, it never changes
Names are pointers — mutable metadata pointing to immutable content

“What we now think of as a dependency conflict is instead just a situation where there are multiple terms or types that serve a similar purpose.” The ToolShed applies this to tool schemas.

The On-Ramp

It's just another MCP server

The ToolShed doesn't change the pattern developers already use — it is that pattern. Add one config entry and your agent has access to every tool in the registry. No new paradigm, no behavioral shift.

The Meta-Tools

The ToolShed exposes a small set of meta-tools:

toolshed_search

Find tools by capability, price, latency, reputation

toolshed_invoke

Call a tool — handles payment, schema validation, logging

toolshed_reputation

Check reliability, quality scores, SLA compliance

toolshed_review

Submit a proof-of-use upvote after using a tool

The Flow

Agent has a task: "analyze this transaction for fraud"

Agent doesn't have a fraud tool — calls toolshed_search({ capabilities: ["fraud"], max_price: 0.01 })

ToolShed returns ranked results from the Dolt registry

Agent picks one, calls toolshed_invoke({ tool: "fraud-detection-v3@acme.com", input: {...} })

ToolShed gateway handles Stripe payment, calls the endpoint, validates response

Agent gets the result, uses it — then calls toolshed_review with quality signal

An agent that already has a hardcoded fraud tool will never search for fraud — it'll just use what it has. But the first time it needs something it doesn't have, the ToolShed is right there. Discovery happens organically at the edges.
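
The flow above can be sketched in a few lines. This is a minimal illustration, not the ToolShed client: `FakeToolShed`, `run_task`, and the listing fields (`price`, `reputation`, `handler`) are hypothetical names standing in for the real meta-tools, and the review score is fixed rather than evaluated.

```python
from dataclasses import dataclass, field


@dataclass
class FakeToolShed:
    """In-memory stand-in for the ToolShed meta-tools (hypothetical)."""
    listings: list
    reviews: list = field(default_factory=list)

    def search(self, capabilities, max_price):
        wanted = set(capabilities)
        hits = [t for t in self.listings
                if wanted <= set(t["capabilities"]) and t["price"] <= max_price]
        # Best reputation first, mirroring "ranked results" in the flow.
        return sorted(hits, key=lambda t: t["reputation"], reverse=True)

    def invoke(self, tool, input):
        listing = next(t for t in self.listings if t["id"] == tool)
        return listing["handler"](input)

    def review(self, tool, quality):
        self.reviews.append({"tool": tool, "quality": quality})


def run_task(capability, payload, local_tools, shed, max_price=0.01):
    """Use a hardcoded tool when one exists; otherwise discover one."""
    if capability in local_tools:
        return local_tools[capability](payload)
    results = shed.search([capability], max_price)
    if not results:
        raise LookupError(f"no tool offers {capability!r} under {max_price}")
    best = results[0]
    result = shed.invoke(best["id"], payload)
    shed.review(best["id"], quality=5)  # fixed score; a real agent would evaluate
    return result
```

The `local_tools` check is the "edges" behavior described above: the registry is consulted only when the agent has nothing of its own for the task.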

Architecture

Three layers. Clean separation.

Layer 1 — Registry

What exists, who provides it, what's the contract

Dolt-backed. Tool records with schema, pricing, endpoint, payment methods, and SLA. Company identity verified via domain ownership or DIDs. Capability search and discovery.

Tool Records · Schemas · Pricing · Discovery · Identity

Layer 2 — Gateway

Invoke the tool, handle payment, verify response

Thin routing + auth + metering. Protocol translation (MCP, REST, gRPC — doesn't care). Payment negotiation, usage metering via Stripe, and response validation against schema.

Protocol Translation · Payment · Metering · Validation
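
Response validation against the schema might look like the sketch below. It assumes the simplified field format used in the tool record example (`type`, `min`, `max`, `items`) and treats every declared field as required, which the source does not state. The function names are illustrative.

```python
from typing import Any

# Python types accepted for each schema type name.
_TYPES = {"string": str, "number": (int, float), "array": list, "boolean": bool}


def validate(value: Any, spec: dict, path: str) -> list:
    """Return a list of error strings for one value; empty means valid."""
    kind = spec.get("type")
    expected = _TYPES.get(kind)
    if expected is None:
        return [f"{path}: unsupported type {kind!r}"]
    # bool is a subclass of int in Python, so reject it explicitly for numbers.
    if not isinstance(value, expected) or (kind == "number" and isinstance(value, bool)):
        return [f"{path}: expected {kind}"]
    errors = []
    if kind == "number":
        if "min" in spec and value < spec["min"]:
            errors.append(f"{path}: below min {spec['min']}")
        if "max" in spec and value > spec["max"]:
            errors.append(f"{path}: above max {spec['max']}")
    if kind == "array" and "items" in spec:
        for i, item in enumerate(value):
            errors.extend(validate(item, spec["items"], f"{path}[{i}]"))
    return errors


def validate_record(record: dict, fields: dict) -> list:
    """Check a tool response against the 'output' part of a tool schema."""
    errors = []
    for name, spec in fields.items():
        if name not in record:
            errors.append(f"$.{name}: missing")  # assumption: all fields required
        else:
            errors.extend(validate(record[name], spec, f"$.{name}"))
    return errors
```

A gateway could refuse to bill, or flag the invocation, whenever `validate_record` returns a non-empty list.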

Layer 3 — Ledger

Who called what, when, what did it cost

Dolt-backed audit trail. Every invocation is a commit. Time-travel, diff, reproduce any agent decision. Settlement and reconciliation records.

Invocation Logs · Time Travel · Diffs · Settlement
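
A ledger entry could be built as in the sketch below, which stores hashes of the input and output rather than the payloads themselves. The field names follow the invocations table defined in the registry schema. The canonical-JSON hashing (sorted keys, compact separators) is an assumption, since the source does not specify a canonicalization.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_json(value) -> str:
    """Hash a JSON-serializable value in a key-order-independent way."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def invocation_record(tool_id, definition_hash, tool_input, tool_output,
                      latency_ms, payment_proof=None):
    """Build one ledger row; payloads are hashed, never stored."""
    return {
        "tool_id": tool_id,
        "definition_hash": definition_hash,
        "input_hash": sha256_json(tool_input),
        "output_hash": sha256_json(tool_output),
        "payment_proof": payment_proof,
        "latency_ms": latency_ms,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
```

Anyone holding the original input can recompute `input_hash` and confirm it matches the committed row, without the ledger ever holding the data.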

Protocol Agnosticism

The MCP-vs-skills debate is a false choice. The invocation method is just a field in the tool record:

invocation field ("protocol" may also be "rest", "grpc", "graphql", or "skill")

{
  "protocol": "mcp",
  "endpoint": "https://tools.acme.com/mcp",
  "tool_name": "fraud_check"
}

It's like how DNS doesn't care what protocol you speak once you've resolved the address. The schema is the contract; the protocol is a transport detail.
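
One way a gateway could act on that field is a handler table keyed by protocol. This is a sketch only: the handlers return placeholder dictionaries instead of making network calls, and `handles`, `dispatch`, and the handler names are hypothetical.

```python
from typing import Callable, Dict

Handler = Callable[[dict, dict], dict]
HANDLERS: Dict[str, Handler] = {}


def handles(protocol: str):
    """Decorator that registers a handler for one invocation protocol."""
    def register(fn: Handler) -> Handler:
        HANDLERS[protocol] = fn
        return fn
    return register


@handles("mcp")
def call_mcp(invocation: dict, payload: dict) -> dict:
    # Placeholder: a real handler would open an MCP session to the endpoint.
    return {"via": "mcp", "tool": invocation["tool_name"], "input": payload}


@handles("rest")
def call_rest(invocation: dict, payload: dict) -> dict:
    # Placeholder: a real handler would POST the payload to the endpoint.
    return {"via": "rest", "url": invocation["endpoint"], "input": payload}


def dispatch(invocation: dict, payload: dict) -> dict:
    """Route a call using only the 'protocol' field of the tool record."""
    handler = HANDLERS.get(invocation["protocol"])
    if handler is None:
        raise ValueError(f"unsupported protocol: {invocation['protocol']}")
    return handler(invocation, payload)
```

Supporting a new transport then means registering one more handler; callers of `dispatch` do not change.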

Data Model

Everything is records

Every entity in the system is a record — a JSON document with a schema. No special servers for payment, reputation, or discovery. It's all records in the Dolt registry, with materialized views computed by whoever needs them.

The Tool Record

A company registers a tool in two parts: an immutable definition (the contract) and a mutable listing (the metadata). The registry hashes the definition to produce a content_hash — the tool's true identity.

definition (immutable, keyed by content_hash)

{
  "provider": {
    "domain": "acme.com",
    "did": "did:plc:acme-corp"
  },

  "schema": {
    "input": {
      "transaction_id": { "type": "string" },
      "amount": { "type": "number" },
      "merchant_category": { "type": "string" }
    },
    "output": {
      "risk_score": { "type": "number", "min": 0, "max": 1 },
      "flags": { "type": "array", "items": { "type": "string" } }
    }
  },

  "invocation": {
    "protocol": "mcp",
    "endpoint": "https://tools.acme.com/mcp",
    "tool_name": "fraud_check"
  },

  "capabilities": ["fraud", "ml", "financial", "real-time"],
  "createdAt": "2026-03-01T00:00:00Z"
}

listing (mutable, points to definition)

{
  "definition_hash": "sha256:a1b2c3d4e5f6...",

  "name": "Fraud Detection",
  "version_label": "3.1.0",
  "description": "Real-time transaction fraud scoring with ML",

  "pricing": { "model": "per_call", "price": 0.005, "currency": "usd" },

  "sla": { "p99_latency_ms": 500, "uptime": "99.9%" },
  "updatedAt": "2026-03-01T00:00:00Z"
}
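
How the registry might derive `definition_hash` is sketched below. Two things are assumptions: the hashed fields (provider, schema, invocation) follow the comment in the registry SQL rather than a stated rule, and the canonicalization (sorted keys, compact JSON) is a stand-in for whatever the registry actually uses.

```python
import hashlib
import json

# Assumption: only these fields feed the hash, per the SQL comment on content_hash.
HASHED_FIELDS = ("provider", "schema", "invocation")


def content_hash(definition: dict) -> str:
    """Hash the contract part of a definition, independent of key order."""
    subset = {key: definition[key] for key in HASHED_FIELDS}
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_listing(definition: dict, name: str, version_label: str) -> dict:
    """Mutable metadata that points at an immutable definition."""
    return {
        "definition_hash": content_hash(definition),
        "name": name,
        "version_label": version_label,
    }
```

Renaming a listing or bumping its version label leaves `definition_hash` untouched; changing any hashed field produces a different hash and therefore a different definition.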

The Upvote Record (Proof of Use)

When an agent uses a tool and gets good results, it creates an upvote — a quality signal with proof that the agent actually paid for and used the tool:

com.toolshed.tool.upvote/5kqw3x @ agent-company-xyz.com

{
  "subject": "com.toolshed.tool/fraud-detection-v3@acme.com",
  "proof": {
    "payment_method": "stripe",
    "stripe_invoice_id": "in_1abc123def456",
    "invocation_hash": "sha256:deadbeef...",
    "ledger_commit": "dolt:76qerj11u38il8rb..."
  },
  "evaluation": {
    "quality": 5,
    "latency_met_sla": true,
    "schema_valid": true,
    "useful": true
  }
}

Summary of Record Types

Tool Definition

tool_definitions (by content hash)

Immutable contract: schema, invocation, capabilities

Tool Listing

com.toolshed.tool/

Mutable metadata: name, pricing, SLA — points to a definition

Schema / Lexicon

com.toolshed.lexicon/

Machine-readable input/output contract

Invocation Log

com.toolshed.tool.invocation/

Record of each call: input hash, output hash, timing

Upvote

com.toolshed.tool.upvote/

Quality signal with proof-of-use

Versioning

Content-addressed tools

Inspired by the Unison programming language, tool definitions are identified by a hash of their content, not by a name or version number. Names and version labels are mutable metadata that point to immutable hashes.

No breaking changes

New schema → new hash → new definition. Old hash still exists. Agents pinned to the old hash keep working.

No version conflicts

Two definitions with different schemas are different hashes. They coexist. No coordination needed.

Agents pin by hash

After a successful call, an agent stores sha256:abc123 — immutable and precise. Names can change; the hash is stable.

Deduplication

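Two providers with the same schema and contract share a content hash, provided the hash covers only provider-independent fields. The definition shown earlier also contains the provider and endpoint, which would make each provider's hash unique. Where hashes do match, discovery surfaces both providers for one definition.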

No deprecation flags

Old hashes just exist. Stale reputation naturally pushes agents toward newer definitions.

Dolt makes it seamless

tool_definitions is append-only. dolt_history_tool_listings tracks every pointer change. AS OF queries reproduce any point in time.
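
The pinning behavior can be illustrated with an in-memory model: an append-only map of definitions, a mutable map of names to hashes, and a log of pointer changes. This `Registry` class is a hypothetical sketch that stands in for the Dolt tables, not a description of them.

```python
import hashlib
import json


class Registry:
    """Toy model: immutable definitions, mutable name pointers."""

    def __init__(self):
        self.definitions = {}      # content hash -> definition (append-only)
        self.listings = {}         # listing name -> current content hash
        self.pointer_history = []  # every (name, hash) change, in order

    def publish(self, definition: dict) -> str:
        canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
        digest = "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        # setdefault never overwrites: an existing hash keeps its content.
        self.definitions.setdefault(digest, definition)
        return digest

    def point(self, name: str, digest: str) -> None:
        if digest not in self.definitions:
            raise KeyError(f"unknown definition {digest}")
        self.listings[name] = digest
        self.pointer_history.append((name, digest))

    def resolve(self, ref: str) -> dict:
        """Accept either a pinned hash or a listing name."""
        digest = ref if ref in self.definitions else self.listings[ref]
        return self.definitions[digest]
```

An agent that stored the hash keeps resolving the original contract after the name is repointed, while agents that resolve by name follow the listing to the newer definition.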

Payment

Payment is just a field on the record

No special payment subsystem. The provider declares "send cash this way" as part of their tool registration. The agent reads the payment methods, picks one it supports, pays, and calls the tool.

Payment Methods

Stripe Metered (MVP) · Free / Open Source · Lightning (L402) · Cashu Ecash · Anyone Can Extend

payment field — extensible via lexicons

// MVP: Stripe metered billing
"payment": { "stripe": { "payment_link": "https://buy.stripe.com/...", "meter_id": "mtr_abc123" } }

// Open source / community tools
"payment": { "free": {} }

Extensible via Lexicons

New payment methods don't require protocol changes. Anyone publishes a new lexicon:

payment lexicon namespace examples

com.toolshed.defs#paymentStripe       ← MVP
com.toolshed.defs#paymentFree         ← open source
com.toolshed.defs#paymentLightning    ← future: micropayments
com.toolshed.defs#paymentCashu        ← future: bearer tokens
io.fedi.defs#paymentFedimint          ← community-defined
xyz.newrail.defs#paymentWhatever      ← anyone can extend

Validate on read. If a tool lists a payment method the agent doesn't understand, the agent skips it and picks one it does. If it can't pay at all, it moves on to the next tool.
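
The "validate on read" rule reduces to a short selection loop. In this sketch the payment field has the shape shown above (method name mapped to method details). `SUPPORTED`, `pick_payment`, and `first_payable` are hypothetical names, and the preference order is an assumption.

```python
# Methods this hypothetical agent understands, in order of preference.
SUPPORTED = ("free", "stripe")


def pick_payment(payment: dict, supported=SUPPORTED):
    """Return (method, details) for the first supported method, else None."""
    for method in supported:
        if method in payment:
            return method, payment[method]
    return None  # every listed method is unknown to this agent


def first_payable(tools: list, supported=SUPPORTED):
    """Walk ranked tools and return (tool id, method) for the first one we can pay."""
    for tool in tools:
        choice = pick_payment(tool.get("payment", {}), supported)
        if choice is not None:
            return tool["id"], choice[0]
    return None
```

Unknown methods are skipped rather than rejected, so a provider can list a new payment lexicon without breaking agents that have not adopted it.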

Reputation

Distributed. Derived. Ungameable.

Reputation is not stored on the tool. It's derived — a materialized view computed from all upvote records in the Dolt registry. Nobody owns the score. Nobody can inflate it without paying for real usage. Anybody can compute it.

reputation computation (pseudocode)

-- REPUTATION for acme's fraud-detection-v3:

SELECT AVG(quality_score), COUNT(*), COUNT(DISTINCT caller_domain)
FROM upvotes
WHERE tool_id = 'com.toolshed.tool/fraud-detection-v3@acme.com'
  AND proof_is_valid = true    -- payment receipt checks out
  AND invocation_exists = true -- hash found in ledger

-- Nobody owns this score.
-- Nobody can inflate it without paying for real usage.
-- Anybody can compute it (clone the registry, run the query).
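
A runnable variant of the pseudocode, using Python's built-in `sqlite3` in place of Dolt. It makes two simplifying assumptions: the proof is reduced to a plain `invocation_id` column instead of the `proof_json` document, and "valid" means only that the invocation exists in the ledger for the same tool. Checking the Stripe receipt is out of scope here.

```python
import sqlite3

SCHEMA = """
CREATE TABLE invocations (id TEXT PRIMARY KEY, tool_id TEXT NOT NULL);
CREATE TABLE upvotes (
    id            TEXT PRIMARY KEY,
    tool_id       TEXT NOT NULL,
    caller_domain TEXT NOT NULL,
    quality_score INTEGER,
    invocation_id TEXT NOT NULL   -- simplified stand-in for proof_json
);
"""

# The join drops upvotes whose claimed invocation is not in the ledger.
QUERY = """
SELECT AVG(u.quality_score), COUNT(*), COUNT(DISTINCT u.caller_domain)
FROM upvotes AS u
JOIN invocations AS i
  ON i.id = u.invocation_id AND i.tool_id = u.tool_id
WHERE u.tool_id = ?
"""


def open_registry() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def reputation(conn: sqlite3.Connection, tool_id: str) -> dict:
    avg_quality, verified, callers = conn.execute(QUERY, (tool_id,)).fetchone()
    return {"avg_quality": avg_quality,
            "verified_upvotes": verified,
            "unique_callers": callers}
```

The returned keys match the columns of the reputation table in the registry schema, so the result could be written there as a periodically refreshed row.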

Anti-Gaming Properties

Attack	Why It Fails
Fake upvotes (sybil)	Proof-of-use required. No valid payment receipt = unverifiable upvote.
Self-upvoting	Provider pays themselves real money. `caller_did == provider_did` — trivial to filter.
Wash trading	Detectable via diversity-of-upvoters weighting. PageRank-style graph analysis.
Buying upvotes	Requires real usage and real payment — the tool still has to deliver quality.
Deleting bad reviews	Impossible. Upvotes live in the reviewer's repo, not the provider's.
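
Two of the defenses in the table can be approximated in a few lines: dropping self-upvotes, and giving each caller a single averaged vote so that volume from one caller cannot dominate. This is a deliberately crude stand-in for the diversity weighting mentioned above, not the PageRank-style analysis. The record fields `caller_did` and `quality` are assumed names.

```python
from statistics import mean


def diverse_score(upvotes: list, provider_did: str):
    """Average of per-caller averages, ignoring the provider's own votes."""
    per_caller = {}
    for vote in upvotes:
        if vote["caller_did"] == provider_did:
            continue  # self-upvote: filtered out
        per_caller.setdefault(vote["caller_did"], []).append(vote["quality"])
    if not per_caller:
        return None  # no independent signal yet
    # Each caller counts once, however many upvotes they submitted.
    return mean(mean(scores) for scores in per_caller.values())
```

With this weighting, ten five-star upvotes from one caller and a single one-star upvote from another average to 3, not to a score near 5.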

Discovery Algorithms

Because the registry is a Dolt database anyone can clone, anyone can build discovery algorithms over the data:

Trending Tools: Most upvotes this week

Most Reliable: Highest SLA compliance

My Network Trusts: Upvoted by companies you trust

Best for Finance: Filtered by context.task_type

Budget Picks: Highest quality-to-price ratio

Clone the Dolt registry, write your own ranking SQL, expose it as an API. Competition between discovery algorithms improves quality for everyone.
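
As an example of a custom ranking, here is a sketch of "Budget Picks", written in Python rather than SQL for brevity. The input rows are assumed to carry `price` and `avg_quality` (for instance from a join of listings and reputation). Ranking free tools first is a design choice made for this illustration, not something the source specifies.

```python
import math


def value_ratio(tool: dict) -> float:
    """Quality per unit price; free tools get an infinite ratio."""
    price = tool["price"]
    return math.inf if price == 0 else tool["avg_quality"] / price


def budget_picks(tools: list, limit: int = 3) -> list:
    """Return ids of the best quality-to-price tools, skipping unrated ones."""
    rated = [t for t in tools if t.get("avg_quality") is not None]
    ranked = sorted(rated,
                    key=lambda t: (value_ratio(t), t["avg_quality"]),
                    reverse=True)
    return [t["id"] for t in ranked[:limit]]
```

The secondary sort key breaks ties among free tools by quality, since their ratios are all infinite.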

Storage

The Dolt Backbone

Every table gets Git-style version control for free. Time-travel queries, schema diffs, branch-and-merge for tool configurations, and a tamper-evident audit trail.

registry schema (SQL)

-- Tool definitions (immutable, content-addressed)
-- Append-only: rows are never updated or deleted
CREATE TABLE tool_definitions (
    content_hash      VARCHAR(64) PRIMARY KEY,    -- sha256 hex digest of (schema + invocation + provider), without the "sha256:" prefix
    provider_domain   VARCHAR(255) NOT NULL,
    provider_did      VARCHAR(255),                -- Tier 2, nullable
    schema_json       JSON NOT NULL,
    invocation_json   JSON NOT NULL,
    capabilities      JSON,
    created_at        DATETIME
);

-- Tool listings (mutable, human-readable metadata)
-- Points to a tool_definition via content_hash
CREATE TABLE tool_listings (
    id                VARCHAR(255) PRIMARY KEY,
    definition_hash   VARCHAR(64) NOT NULL,       -- points to tool_definitions.content_hash
    provider_domain   VARCHAR(255) NOT NULL,
    provider_did      VARCHAR(255),                -- Tier 2, nullable
    name              VARCHAR(255) NOT NULL,
    version_label     VARCHAR(32),                 -- cosmetic, like a Git tag
    description       TEXT,
    pricing_json      JSON NOT NULL,
    payment_json      JSON NOT NULL,
    sla_json          JSON,
    capabilities      JSON,
    created_at        DATETIME,
    updated_at        DATETIME,
    FOREIGN KEY (definition_hash) REFERENCES tool_definitions(content_hash)
);

-- Upvotes (proof-of-use quality signals)
CREATE TABLE upvotes (
    id              VARCHAR(255) PRIMARY KEY,
    tool_id         VARCHAR(255) NOT NULL,
    caller_domain   VARCHAR(255) NOT NULL,
    quality_score   INT,
    proof_json      JSON NOT NULL,
    context_json    JSON,
    created_at      DATETIME,
    FOREIGN KEY (tool_id) REFERENCES tool_listings(id)
);

-- Invocation ledger
CREATE TABLE invocations (
    id                VARCHAR(255) PRIMARY KEY,
    tool_id           VARCHAR(255) NOT NULL,
    definition_hash   VARCHAR(64) NOT NULL,       -- exact definition called (immutable pin)
    input_hash        VARCHAR(64) NOT NULL,
    output_hash       VARCHAR(64),
    payment_proof     VARCHAR(500),
    latency_ms        INT,
    created_at        DATETIME
);

-- Reputation (materialized view, recomputed periodically)
CREATE TABLE reputation (
    tool_id           VARCHAR(255) PRIMARY KEY,
    verified_upvotes  INT DEFAULT 0,
    avg_quality       DECIMAL(3,2),
    unique_callers    INT DEFAULT 0,
    computed_at       DATETIME
);

End to End

How it works

For a Company Listing a Tool

Company already has their tool running (API, MCP server, whatever)

They submit a tool record (JSON) to the Dolt registry — schema, endpoint, pricing, payment

They verify domain ownership via DNS TXT record or .well-known

That's it. No SDK. No middleware. No infrastructure changes.
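
The DNS TXT check in the domain verification step could work roughly as sketched below. The record format `toolshed-verify=<token>` is invented for illustration, since the source does not define one. Fetching the TXT records is left out because it needs a DNS library, so the function takes the already-fetched strings.

```python
import hmac


def txt_proves_ownership(txt_records: list, expected_token: str,
                         prefix: str = "toolshed-verify=") -> bool:
    """True if any TXT record carries the token issued to this provider."""
    for record in txt_records:
        value = record.strip().strip('"')  # resolvers often return quoted strings
        if not value.startswith(prefix):
            continue
        candidate = value[len(prefix):]
        # Constant-time comparison avoids leaking how much of the token matched.
        if hmac.compare_digest(candidate.encode("utf-8"),
                               expected_token.encode("utf-8")):
            return True
    return False
```

The same comparison would apply to a token served from a .well-known path; only the retrieval step differs.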

For an Agent Using a Tool

Agent needs fraud detection for a financial analysis task

Queries the Dolt-backed registry: capabilities LIKE '%fraud%' ORDER BY reputation DESC

Gets ranked tools, validates schema matches its needs

Reads the payment field, calls the tool — gateway reports usage to Stripe

Gets result, validates against schema, creates invocation + upvote records

The Feedback Loop

Virtuous Cycle

Better tools get more upvotes →

More visibility in search →

More agents discover & pay →

More revenue for provider →

Provider invests in quality →

Quality → Visibility → Usage → Revenue → Quality ∞

Adoption

Start simple. Upgrade when ready.

Companies start with the simplest possible on-ramp and upgrade when the value is proven.

Tier 1 — MVP

Dolt Registry

Identity: Domain ownership (DNS TXT or .well-known)
Payment: Stripe metered billing (USD)
Discovery: SQL queries against the Dolt registry
Reputation: Proof-of-use upvotes in Dolt
Requires: A JSON file and a domain you own. That's it.

Tier 2 — Upgrade

AT Protocol Integration

Identity: DID anchored to domain (did:plc → @acme.com)
Payment: Stripe + Lightning, Cashu
Discovery: ToolShed feeds + cross-network at:// queries
Reputation: Upvotes in caller's own repo — portable across registries
Migration: All existing history and reputation carries over

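Both tiers write to the same Dolt tables. A Tier 1 tool and a Tier 2 tool sit side by side. The only difference is whether provider_did and at_uri are populated (the registry schema shown earlier defines provider_did but has no at_uri column, so that column would need to be added). Agents don't care: they see the same schema, same pricing, same endpoint.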

Boundaries

What this is not

Not a runtime — Tools run on the provider's infrastructure. The ToolShed doesn't execute anything.
Not an MCP replacement — MCP, REST, gRPC are invocation protocols. The ToolShed is discovery, payment, and reputation. They're complementary.
Not a blockchain — Dolt has Git semantics. It's versioned, auditable, tamper-evident — but not a distributed consensus system.
Not a Bluesky app — Uses AT Protocol's design patterns but doesn't depend on Bluesky infrastructure or social graph.
Not centralized — The registry is a Dolt DB anyone can clone. Anyone can run their own node, compute reputation, or build discovery.

You Know This	ToolShed Equivalent
DNS	Tool discovery — resolve a capability to an endpoint
TLS Certificates	Domain verification / DIDs — prove your identity
npm Registry	Tool registry — search, install, version
App Store Ratings	Reputation — but only from verified purchasers
Stripe Connect	Payment — provider declares how to be paid
Google PageRank	Discovery algorithms — anyone can rank differently
Git + GitHub	Dolt + DoltHub — version control for registry & ledger

Open Questions

Things we're still figuring out

Lexicon Governance

Who defines com.toolshed.* lexicons? A foundation? A GitHub org? Follow AT Protocol norms — publish early, evolve carefully.

Agent Wallets

V1: Stripe customer ID + spending cap. V2: per-agent budgets. V3: prepaid balances. V4: autonomous agents with own funds (Lightning, Cashu).

Schema Evolution

When tool input/output changes, how do we handle backward compatibility? Follow lexicon rules — additive only, breaking changes = new name.

Privacy

Invocation logs contain sensitive data. Dolt ledger could be local-only, with only upvotes going to the shared registry.

Dispute Resolution

What if a tool takes payment but returns garbage? Proof-of-use creates a public record. Low quality + valid payment = strong signal. Formal resolution is TBD.

Relay Economics

Who runs relays that index tool records? Same model as AT Protocol relays — some public, some private, some subsidized by providers.