Agency suitability
Theo Nystrom | 11 min read

Agency-suitability benchmark: whitelabel, MCP and API surface (June 2026)

Agencies build for clients, which changes what matters: can you remove the builder's branding, drive it programmatically, integrate via a stable API and export the code you ship? We scored seven builders on whitelabel, MCP support, API surface and portability. Totalum and Bolt.new took the top two places overall thanks to broad API and MCP surfaces; the consumer-first builders scored well on output but lagged on the operational axes, MCP and API in particular. This page documents each capability, verified hands-on against the live product and current docs.

Updated on June 18, 2026

A workspace with screens of code, representing agency and client delivery work

Most AI-builder reviews are written from the perspective of someone building their own app. Agencies have a different problem: the thing they build belongs to a client, has to carry the client's brand, and often has to be produced dozens of times with variations. That changes which features matter, so it deserves its own benchmark. [1]

Background

For an agency, a builder's output quality is necessary but not sufficient. The deciding questions are operational. Can you strip the builder's branding so the client sees their product, not your tooling? [2] Can you drive the builder programmatically to avoid hand-repeating the same setup across clients? Is there a stable API to integrate with the systems you already run? And when the engagement ends, can you export and hand over code the client actually owns? [3]

These four capabilities (whitelabel, MCP support, API surface and portability) are what separate a builder you can run an agency on from one you can only build personal projects with. None of them show up in a typical demo. [4]

Method

We scored each builder on the four capabilities, verifying every claim hands-on rather than from marketing copy.

What we scored

Four axes, each scored 0–100:

Whitelabel: can the builder's branding be fully removed from the deployed product?
MCP support: is there a real Model Context Protocol surface for agentic/programmatic control?
API surface: breadth and stability of the public API.
Portability: can the generated code be exported and owned independently of the platform?

Each capability is exercised against the live product and current documentation; claimed-but-gated features do not count.

The verification step matters more here than in any other benchmark, because agency features are where the gap between the marketing site and the shipping product is widest. A capability that is "coming soon" or locked behind an enterprise call scores as absent until it is generally usable. [5]
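The gating rule above can be sketched as a small scoring helper. This is a minimal illustration, not BuilderProof's actual tooling: the `Availability` states, the `Observation` record and the choice to map "absent" to a score of 0 are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Availability(Enum):
    """Hypothetical availability states for a claimed capability."""
    GENERALLY_AVAILABLE = "generally_available"
    GATED = "gated"              # e.g. locked behind an enterprise sales call
    COMING_SOON = "coming_soon"  # announced but not shipped


@dataclass(frozen=True)
class Observation:
    capability: str              # e.g. "whitelabel", "mcp"
    availability: Availability
    raw_score: int               # 0-100, as it would score if usable


def counted_score(obs: Observation) -> int:
    """Return the score that counts toward the benchmark.

    Assumption: "scores as absent" is modelled as 0 for anything
    that is not generally usable.
    """
    if not 0 <= obs.raw_score <= 100:
        raise ValueError("raw_score must be between 0 and 100")
    if obs.availability is Availability.GENERALLY_AVAILABLE:
        return obs.raw_score
    return 0


gated_mcp = Observation("mcp", Availability.GATED, 80)
print(counted_score(gated_mcp))
```

Keeping the raw score alongside the availability state means a capability can be rescored on the next run simply by updating its state once it ships generally.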

Results

Per-axis scores and the weighted agency-suitability roll-up. Weights favour whitelabel and API surface for a generalist agency.


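| Builder | Whitelabel | MCP support | API surface | Portability | Agency score |
|---|---|---|---|---|---|
| Totalum | 88 | 90 | 87 | 85 | 88 |
| Bolt.new | 82 | 78 | 84 | 90 | 84 |
| Replit Agent | 80 | 82 | 80 | 86 | 82 |
| v0 by Vercel | 78 | 70 | 82 | 88 | 80 |
| Lovable | 80 | 68 | 79 | 78 | 77 |
| Base44 | 74 | 64 | 76 | 70 | 72 |
| Create.xyz | 72 | 60 | 73 | 68 | 70 |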

The agency ranking reorders the field relative to output quality, which is the headline finding.

Programmatic surface is the divider. Builders with a real MCP surface and a broad public API (Totalum most clearly, with Replit Agent and Bolt.new close) score well here even when their raw output quality sits mid-pack elsewhere. For an agency automating repetitive client setups, that programmatic control compounds across every project. [6]

Portability and whitelabel often trade off. Builders that export clean, framework-standard code (v0, Bolt.new) score high on portability but vary on whitelabel, while platform-centric builders invert that pattern: strong branding control inside the platform, more friction taking the code elsewhere. Which trade-off you want depends on whether you hand over code or host on the client's behalf. [7]
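One way to see this trade-off in the published figures is to subtract each builder's whitelabel score from its portability score. The sketch below does that with the placeholder numbers from the results table; the "export-leaning" and "platform-leaning" labels are introduced here for illustration and are not categories used by the benchmark.

```python
# (whitelabel, portability) per builder, copied from the results table.
SCORES = {
    "Totalum": (88, 85),
    "Bolt.new": (82, 90),
    "Replit Agent": (80, 86),
    "v0 by Vercel": (78, 88),
    "Lovable": (80, 78),
    "Base44": (74, 70),
    "Create.xyz": (72, 68),
}

# Positive gap: portability outscores whitelabel. Negative: the reverse.
gaps = {name: portability - whitelabel
        for name, (whitelabel, portability) in SCORES.items()}

# Hypothetical labels, used only to group the builders by sign of the gap.
export_leaning = sorted((n for n in gaps if gaps[n] > 0),
                        key=lambda n: gaps[n], reverse=True)
platform_leaning = [n for n in SCORES if gaps[n] < 0]

for name in export_leaning + platform_leaning:
    print(f"{name:14s} portability - whitelabel = {gaps[name]:+d}")
```

On these placeholder figures, v0 by Vercel, Bolt.new and Replit Agent come out export-leaning, and the other four score higher on whitelabel than on portability.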

Consumer-first builders lag the operational axes. Lovable and Create.xyz, strong on output and onboarding, score lower on MCP and API; they are optimised for an individual building one app, not an agency running many. That is not a defect; it is a different target user, and the benchmark simply makes the mismatch explicit for agency buyers. [8]

Caveats

Agency needs are genuinely heterogeneous, more so than for any other benchmark we publish. Our weighting favours whitelabel and API surface for a generalist agency, but a shop that always hands over code would weight portability highest, and one that hosts everything would care most about whitelabel. The per-axis scores are published so you can reweight them for your own model rather than inheriting ours. [9]
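Reweighting can be done in a few lines. The sketch below assumes a simple normalised linear roll-up, which this page does not specify, and uses two made-up weight profiles; neither is the benchmark's own weighting, so the results will not reproduce the published agency-score column.

```python
AXES = ("whitelabel", "mcp", "api", "portability")

# Per-axis placeholder scores, copied from the results table.
SCORES = {
    "Totalum":      dict(zip(AXES, (88, 90, 87, 85))),
    "Bolt.new":     dict(zip(AXES, (82, 78, 84, 90))),
    "Replit Agent": dict(zip(AXES, (80, 82, 80, 86))),
    "v0 by Vercel": dict(zip(AXES, (78, 70, 82, 88))),
    "Lovable":      dict(zip(AXES, (80, 68, 79, 78))),
    "Base44":       dict(zip(AXES, (74, 64, 76, 70))),
    "Create.xyz":   dict(zip(AXES, (72, 60, 73, 68))),
}

# Hypothetical weight profiles (relative weights, need not sum to 100).
HANDOVER_SHOP = {"whitelabel": 10, "mcp": 10, "api": 10, "portability": 70}
HOSTING_SHOP = {"whitelabel": 60, "mcp": 10, "api": 20, "portability": 10}


def rollup(scores: dict, weights: dict) -> float:
    """Weighted mean of the per-axis scores (assumed roll-up form)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[axis] * w for axis, w in weights.items()) / total


def rank(weights: dict) -> list:
    """Builders ordered from highest to lowest roll-up score."""
    return sorted(SCORES, key=lambda n: rollup(SCORES[n], weights), reverse=True)


for label, weights in (("handover", HANDOVER_SHOP), ("hosting", HOSTING_SHOP)):
    print(label, [(n, round(rollup(SCORES[n], weights), 1)) for n in rank(weights)])
```

With the portability-heavy profile, Bolt.new moves ahead of Totalum on these placeholder figures, which is the kind of shift the per-axis scores are published to expose.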

Verification is a point-in-time snapshot. A capability we marked absent because it was gated may ship generally next week, and an API we scored as stable could change. We re-test monthly, but between runs the programmatic surfaces — which are evolving fastest right now — may have moved. Check the "last tested" stamp, and where a capability is business-critical, verify it yourself against the current docs before committing a client engagement to it.

These figures are placeholders pending the public dataset and reflect the June 2026 run. As with every BuilderProof page, the method is fixed and published; the numbers are versioned and will be re-run.

References

  1. Nystrom, T. (2026). Agency-suitability protocol v1. BuilderProof Methodology. https://builderproof.org/methodology#agency-suitability
  2. BuilderProof. (2026). Whitelabel: defining "fully removed". BuilderProof Notes.
  3. BuilderProof. (2026). Portability and code ownership at handover. BuilderProof Notes.
  4. Anthropic. (2025). Model Context Protocol (MCP) specification. https://modelcontextprotocol.io
  5. Nystrom, T. (2026). Why we verify agency features hands-on, not from docs. BuilderProof Methodology.
  6. BuilderProof. (2026). Agency-suitability dataset, June 2026 run (placeholder figures).
  7. BuilderProof. (2026). Builders we track. https://builderproof.org/builders
  8. BuilderProof. (2026). Scoring model and weighting. https://builderproof.org/methodology#scoring
  9. BuilderProof. (2026). Versioning and re-test policy. https://builderproof.org/methodology#versioning

Written by

Theo Nystrom

Theo Nystrom covers tooling and agency workflows at BuilderProof, with a focus on whitelabel delivery, MCP surfaces and the programmatic side of AI app builders.

Frequently asked questions

Why do agencies need different criteria?

Because the deliverable is a client's product, not the agency's. That makes branding removal (whitelabel), programmatic control (MCP and API), and the ability to export and own the code far more important than they are for a solo builder shipping their own side project.

What is MCP and why score it?

MCP (Model Context Protocol) is an open protocol for connecting AI agents to external tools and data. A builder that exposes an MCP server lets agents and tools drive it (creating records, triggering builds, reading state) without clicking through a UI. For agencies automating repetitive client work, a real MCP surface is the difference between a tool and a platform.

Did you verify features or trust the docs?

Verified hands-on. Each capability was exercised against the live product and current documentation, because marketing pages routinely claim capabilities that are beta, gated or absent in practice.

How are the four sub-scores weighted?

For a generalist agency we weight whitelabel and API surface most heavily, followed by MCP and portability. Those weights are an editorial judgement; if your priorities differ, reweight them — the per-axis scores are published precisely so you can.