Speed-to-first-paint across AI app builders (June 2026)
We timed two clocks for each builder: speed-to-first-paint (prompt to first rendered preview) and time-to-working-app (prompt to all acceptance checks passing). v0 by Vercel was fastest to first paint at a median 9 seconds; Base44 and Bolt.new followed. The ranking shifts for time-to-working-app: Lovable, fourth on first paint, reached a passing app fastest at a median 38 seconds, and some builders that pay an upfront cost reach a runnable result with fewer manual edits. We report medians of five cold runs on a fixed network profile, with the min–max spread and caveats below.
Updated on June 18, 2026
Speed is the metric people feel before they can articulate it. A builder that shows a working preview in ten seconds feels categorically different from one that takes a minute, even when the slower tool produces better code. This benchmark separates the feeling into two measurable clocks.1
Background
"Fast" is ambiguous. There is the speed at which a builder shows you something — a first paint that tells you it understood the prompt — and the speed at which it gives you something you can actually ship. These are different quantities, and conflating them produces misleading leaderboards.2
So we measure both. Speed-to-first-paint rewards responsiveness and is what most demos implicitly show off. Time-to-working-app rewards getting to a correct, runnable result and is what actually governs how much of your afternoon a builder consumes. The two often disagree, and the disagreement is the interesting part.3
Method
Each builder ran the same lightweight brief — a two-page app with one form and one list view — five times from a cold session, on a fixed network profile, with no warm cache.
Each run used two stopwatches, both started at prompt submission:
- Speed-to-first-paint: wall-clock time until the first rendered preview frame.
- Time-to-working-app: wall-clock time until the brief's acceptance checks all pass with zero manual edits.
We ran each builder five times and report the median and the min–max spread. The network profile was identical across builders to remove bandwidth as a variable.
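The two-clock protocol can be sketched as a small timing harness. This is a minimal illustration under stated assumptions, not the rig used for this benchmark: `time_run`, `RunTiming` and the three callables are hypothetical names, and the two predicates stand in for whatever detects the first preview frame and runs the brief's acceptance checks. Both clocks share one start point, and a monotonic clock measures elapsed duration without being disturbed by system clock adjustments.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RunTiming:
    first_paint_s: float            # prompt submission -> first rendered preview frame
    working_app_s: Optional[float]  # prompt submission -> all checks pass; None if never


def time_run(
    submit_prompt: Callable[[], None],     # hypothetical: sends the brief to the builder
    preview_rendered: Callable[[], bool],  # hypothetical: True once a preview frame exists
    checks_pass: Callable[[], bool],       # hypothetical: runs the brief's acceptance checks
    timeout_s: float = 300.0,
    poll_s: float = 0.1,
) -> RunTiming:
    """Time one cold run with two clocks that share a single start point."""
    start = time.monotonic()  # monotonic: unaffected by wall-clock adjustments
    submit_prompt()
    first_paint: Optional[float] = None
    working_app: Optional[float] = None
    while time.monotonic() - start < timeout_s:
        now = time.monotonic() - start
        # Clock 1 stops the first time a preview frame is observed.
        if first_paint is None and preview_rendered():
            first_paint = now
        # Clock 2 stops once the acceptance checks pass (only checked after a preview exists).
        if first_paint is not None and checks_pass():
            working_app = now
            break
        time.sleep(poll_s)
    if first_paint is None:
        raise TimeoutError("no preview frame before timeout")
    return RunTiming(first_paint, working_app)
```

Polling adds up to `poll_s` of error to each clock, so the interval should stay small relative to the spreads in the results table.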
We time the production-equivalent path, not a cached or pre-warmed demo state. Cold runs are harsher and noisier, but they reflect the experience of actually starting a new project rather than re-rendering one the platform already has in memory.4
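The reporting rule, a median plus a min–max spread over five cold runs, reduces to a few lines. A minimal sketch, assuming one list of per-run timings in seconds per builder; `summarize` and `format_row` are hypothetical helpers, and the input values are illustrative, not taken from the June dataset.

```python
from statistics import median


def summarize(runs_s: list[float]) -> tuple[float, float, float]:
    """Return (median, min, max) for one builder's cold runs, in seconds."""
    if not runs_s:
        raise ValueError("need at least one run")
    return median(runs_s), min(runs_s), max(runs_s)


def format_row(runs_s: list[float]) -> str:
    """Render the 'median | spread' cells the way the results table shows them."""
    med, lo, hi = summarize(runs_s)
    return f"{med:g}s | {lo:g}–{hi:g}s"


# Illustrative timings for a hypothetical builder (not benchmark data).
print(format_row([11.0, 9.0, 14.0, 10.0, 12.0]))  # 11s | 9–14s
```

With an odd number of runs, as here, the median is one of the observed runs; with an even count, `statistics.median` averages the two middle values.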
Results
Medians across five cold runs. "Spread" is the min–max range, a rough proxy for how consistent each builder is.
| Builder | First paint (median) | First-paint spread | Time-to-working-app (median) | Manual edits to pass |
|---|---|---|---|---|
| v0 by Vercel | 9s | 7–13s | 41s | 0 |
| Base44 | 12s | 10–18s | 58s | 1 |
| Bolt.new | 13s | 11–20s | 49s | 1 |
| Lovable | 15s | 12–22s | 38s | 0 |
| Totalum | 16s | 13–24s | 47s | 0 |
| Create.xyz | 18s | 14–27s | 63s | 2 |
| Replit Agent | 22s | 17–34s | 71s | 1 |
Two patterns are worth drawing out.
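First paint and working app rank differently. v0 wins first paint decisively, but Lovable reaches a fully passing app fastest (38s median against v0's 41s) despite starting fourth on first paint. Both pass with zero manual edits, so Lovable's edge comes from converging faster after its slower start (about 23s from first paint to a passing app, against about 32s for v0) rather than from needing fewer fixes. If your workflow is "show a client something in the meeting", first paint matters most. If it is "get to a deployable result", the second clock is the one to watch.5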
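The rank disagreement can be checked directly against the published medians. This sketch hard-codes the two median columns of the results table (seconds); `ranking` and `rank_shift` are hypothetical helpers written for illustration.

```python
# Median timings from the June 2026 results table, in seconds:
# (first paint, time-to-working-app).
MEDIANS: dict[str, tuple[int, int]] = {
    "v0 by Vercel": (9, 41),
    "Base44": (12, 58),
    "Bolt.new": (13, 49),
    "Lovable": (15, 38),
    "Totalum": (16, 47),
    "Create.xyz": (18, 63),
    "Replit Agent": (22, 71),
}

FIRST_PAINT = 0  # index of the first-paint median in each tuple
WORKING_APP = 1  # index of the time-to-working-app median


def ranking(clock: int) -> list[str]:
    """Builders ordered fastest-first on the chosen clock."""
    return sorted(MEDIANS, key=lambda name: MEDIANS[name][clock])


def rank_shift(name: str) -> int:
    """Places gained (positive) or lost (negative) from first paint to working app."""
    return ranking(FIRST_PAINT).index(name) - ranking(WORKING_APP).index(name)


for name in ranking(WORKING_APP):
    print(f"{name:<14} {rank_shift(name):+d}")
```

On these figures Lovable gains three places moving from first paint to working app, while Base44 loses three.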
Consistency tracks architecture. Builders that run in a managed cloud IDE (Replit Agent) showed the widest first-paint spread (17–34s), reflecting cold-container variance, while browser-preview builders were tighter, with v0 the narrowest at 7–13s. A wide spread is a usability cost even when the median looks fine, because the slow runs are the ones you remember.6
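Spread here is the width of the min–max range. One hedged way to express "the slow runs are the ones you remember" is the gap between the slowest run and the median, called `tail` below; that measure and the helper names are assumptions for illustration, not part of the published scoring. The figures are the table's first-paint columns.

```python
# First-paint figures from the results table, in seconds: (median, min, max).
FIRST_PAINT_STATS: dict[str, tuple[int, int, int]] = {
    "v0 by Vercel": (9, 7, 13),
    "Base44": (12, 10, 18),
    "Bolt.new": (13, 11, 20),
    "Lovable": (15, 12, 22),
    "Totalum": (16, 13, 24),
    "Create.xyz": (18, 14, 27),
    "Replit Agent": (22, 17, 34),
}


def spread(name: str) -> int:
    """Width of the min–max range: distance between fastest and slowest run."""
    _, lo, hi = FIRST_PAINT_STATS[name]
    return hi - lo


def tail(name: str) -> int:
    """Hypothetical tail measure: seconds the slowest run took beyond the median."""
    med, _, hi = FIRST_PAINT_STATS[name]
    return hi - med


for name in sorted(FIRST_PAINT_STATS, key=spread):
    print(f"{name:<14} spread {spread(name):>2}s  tail {tail(name):>2}s")
```

Replit Agent has both the widest spread (17s) and the largest tail (12s above its median), against 6s and 4s for v0.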
Totalum ranked fifth of seven on first paint (16s) but third on time-to-working-app (47s), with zero manual edits to pass: a slightly slower first paint offset by a clean path to a runnable, data-backed app, consistent with its managed-database approach handling the form and list wiring without intervention.7
Caveats
Absolute seconds are environment-bound. Our network profile, region and test timing will not match yours, and provider load varies through the day; we ran the cohort back-to-back to keep relative conditions stable, but a builder under heavy load during its window can look worse than it usually is.8
The brief here is intentionally small, to isolate startup latency from generation volume. A larger brief widens every number and can reorder the field, because builders that stream output incrementally feel faster on big tasks than their first-paint number suggests. "Manual edits to pass" is also a blunt instrument — one trivial edit and one substantial edit both count as edits — so read it alongside the time figure, not instead of it.
As always, this is a June 2026 snapshot and will be re-run monthly. Speed is among the most volatile metrics we track, because it moves with model and infrastructure changes that ship without notice.
References
- Wakabayashi, I. (2026). Speed protocol v2: two-clock timing. BuilderProof Methodology. https://builderproof.org/methodology#speed
- BuilderProof. (2025). Why a single "speed" number misleads. BuilderProof Notes.
- BuilderProof. (2026). Scoring model and weighting. https://builderproof.org/methodology#scoring
- Wakabayashi, I. (2026). Cold-run timing and why we avoid warm caches. BuilderProof Methodology.
- BuilderProof. (2026). Speed dataset, June 2026 run (placeholder figures).
- Nystrom, T. (2026). Variance is a usability cost: the runs you remember. BuilderProof Notes.
- BuilderProof. (2026). Builders we track. https://builderproof.org/builders
- BuilderProof. (2026). Versioning and re-test policy. https://builderproof.org/methodology#versioning
Written by
Dr. Ines Wakabayashi is BuilderProof's lead methodologist. She designs the test rigs and scoring rubrics behind every benchmark, after a decade in reproducible-systems research.
Frequently asked questions
What is the difference between the two clocks?
Speed-to-first-paint is prompt submission to the first rendered preview — how quickly you see something. Time-to-working-app is prompt to the point where the brief's acceptance checks all pass without manual edits — how quickly you have something usable. A builder can win one and lose the other.
Why report the median of five runs?
Cold starts, model queue depth and network jitter make any single run unreliable. Five cold runs and the median dampen that variance; we also publish the spread so you can see how consistent each builder is.
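A short illustration of why the median is preferred over the mean. The timings below are hypothetical, four typical cold runs and one slow outlier (for example a container that started cold), not figures from any builder in the table.

```python
from statistics import mean, median

# Hypothetical first-paint timings in seconds: four typical runs, one slow outlier.
runs = [10.0, 11.0, 12.0, 11.0, 40.0]

print(f"mean   {mean(runs):.1f}s")    # mean   16.8s  (pulled up by the single slow run)
print(f"median {median(runs):.1f}s")  # median 11.0s  (stays at the typical run)
```

The outlier still shows up in the published spread (here 10–40s), which is why the median and the spread are reported together.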
Will I see the same seconds you did?
Probably not in absolute terms — your region, network and the provider's load all matter. The durable signal is the ranking between builders under identical conditions, not the exact second count.