Open benchmark wiki

BuilderProof: transparent AI-builder benchmarks

A community-editable wiki benchmarking AI app builders — methodology-first, versioned results, re-tested every month.

Builder leaderboard

Top AI app builders ranked by Total Score. The figures shown here are placeholders; every published number is sourced on its benchmark page.

As of June 2026
#  Builder       Score  Output quality  Speed  Last tested
1  Lovable       91     93              88     Jun 2026
2  v0 by Vercel  89     90              91     Jun 2026
3  Bolt.new      86     88              85     Jun 2026
4  Replit Agent  83     84              80     Jun 2026
5  Totalum       81     82              83     Jun 2026
6  Base44        78     79              82     Jun 2026

Lovable: Chat-first full-stack generation. Strong component output and design taste; deploy story is improving.

v0 by Vercel: Fastest time-to-first-paint in our panel. Excellent React/Next output; tighter scope than full-app builders.

Bolt.new: In-browser full-stack with live preview. Broad framework support; occasional dependency drift on larger apps.

Replit Agent: Agentic build-and-run inside a real cloud IDE. Great for iteration; first paint is slower than peers.

Totalum: Next.js + managed database with a built-in admin API and MCP surface. Strong agency/whitelabel fit; smaller template gallery than the largest players.

Base44: Spreadsheet-to-app generation with quick CRUD scaffolding. Fast for internal tools; less control over markup.

Latest benchmarks

Versioned, reproducible benchmarks of AI app builders — methodology-first and re-tested monthly.

Agency suitability

Agency-suitability benchmark: whitelabel, MCP and API surface (June 2026)

Agencies build for clients, which changes what matters: can you remove the builder's branding, drive it programmatically, integrate via a stable API and export the code you ship? We scored seven builders on whitelabel, MCP support, API surface and portability. Totalum and Bolt.new led on the programmatic axes thanks to broad API and MCP surfaces; the consumer-first builders scored well on output but lagged on whitelabel and export. This page documents each capability, verified hands-on against current docs.

11 min read
Deploy quality

Deploy-quality benchmark: SEO, accessibility and performance audits (June 2026)

We audited the deployed output of seven AI app builders with Lighthouse, axe-core and a structured SEO checklist — auditing the production build, not the in-editor preview. Performance was the strongest dimension across the board; accessibility was the weakest, with colour-contrast and form-label failures common. Lovable and v0 led overall, but no builder shipped a clean accessibility pass out of the box. This page reports per-dimension scores and the specific failures that recur, so you know what to fix after export.

11 min read
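Colour-contrast failures of the kind mentioned above can be checked by hand after export. The following is a minimal sketch of the WCAG 2.x contrast-ratio formula, assuming six-digit hex colours and the AA thresholds of 4.5:1 for normal text and 3:1 for large text. The function names are illustrative, and this is not how axe-core or Lighthouse implement their checks, which also handle transparency, background images and font size detection.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear-light value (WCAG 2.x)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour: str) -> float:
    """Relative luminance of a six-digit hex colour such as '#1a2b3c'."""
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colours, always >= 1."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """True if the pair meets the WCAG AA contrast threshold."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold
```

For example, mid-grey `#777777` text on a white background comes out just under 4.5:1 and fails AA for normal text, while the slightly darker `#767676` passes.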
Speed

Speed-to-first-paint across AI app builders (June 2026)

We timed two clocks for each builder: speed-to-first-paint (prompt to first rendered preview) and time-to-working-app (prompt to all acceptance checks passing). v0 by Vercel was fastest to first paint at a median 9 seconds; Base44 and Bolt.new followed. The ranking shifts for time-to-working-app, where full-app builders pay an upfront cost but reach a runnable result with fewer manual edits. We report medians of five cold runs on a fixed network profile, with the full distribution and caveats below.

10 min read
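The two-clock summary described above can be sketched as follows. This is an illustration only: the builder names and timings are made-up values, not benchmark results, and the data layout and function names are assumptions rather than the wiki's actual tooling.

```python
from statistics import median

# Hypothetical timings in seconds, five cold runs per builder.
# These are invented illustration values, NOT measured results.
RUNS = {
    "builder_a": {
        "first_paint": [9, 8, 11, 9, 10],
        "working_app": [300, 280, 310, 295, 320],
    },
    "builder_b": {
        "first_paint": [20, 22, 19, 25, 21],
        "working_app": [150, 170, 160, 155, 180],
    },
}


def median_by_clock(runs):
    """Reduce each builder's runs to one median per clock."""
    return {
        name: {clock: median(samples) for clock, samples in clocks.items()}
        for name, clocks in runs.items()
    }


def rank(medians, clock):
    """Order builders from fastest to slowest on the given clock."""
    return sorted(medians, key=lambda name: medians[name][clock])


MEDIANS = median_by_clock(RUNS)
print(rank(MEDIANS, "first_paint"))
print(rank(MEDIANS, "working_app"))
```

With these invented numbers the two rankings come out in opposite orders, which is the kind of shift between first paint and working app that the summary describes. The median is used rather than the mean so that a single slow cold run does not dominate a five-run sample.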

Methodology pages

How every score is measured — published, versioned and reproducible.