Deploy-quality benchmark: SEO, accessibility and performance audits (June 2026)

BuilderProof editorial team

June 12, 2026 · 5 min read

We audited the deployed output of seven AI app builders with Lighthouse, axe-core and a structured SEO checklist - auditing the production build, not the in-editor preview. Performance was the strongest dimension across the board; accessibility was the weakest, with colour-contrast and form-label failures common. Lovable and v0 led overall, but no builder shipped a clean accessibility pass out of the box. This page reports per-dimension scores and the specific failures that recur, so you know what to fix after export.

Updated on July 3, 2026

Quick Answer

This June 2026 deploy-quality benchmark audits the deployed output of seven AI app builders with Lighthouse (performance), axe-core (accessibility) and a structured SEO checklist, all run against the production build rather than the in-editor preview. Performance was uniformly strong across the cohort (86 to 96), while no builder cleared 85 on accessibility: low contrast on muted text and missing form labels recur. SEO hygiene is uneven but mechanically fixable. Lovable and v0 led overall on the published axes; per-dimension scores and the most common failure per builder are reported below so a reader can reweight for their own deploy targets.

An AI builder's job is not finished when the preview looks right. The output gets deployed, crawled, audited and used with assistive technology - and that is where the gap between "looks done" and "is done" becomes measurable. This benchmark audits what actually ships.¹

Background

Output quality and deploy quality are different questions. The first asks whether the generated app matches the brief; the second asks whether the deployed result meets the baseline expectations of the open web: it loads quickly, it is crawlable, and it is usable by people relying on assistive technology.²

These standards are not aspirational. Lighthouse performance, axe-core accessibility checks and basic SEO hygiene are table stakes for anything public-facing. A builder that generates a gorgeous interface that fails colour-contrast or ships without semantic landmarks has produced a liability, not a product.³

Method

We took the deployed output of brief OQ-7 from each builder and ran three independent audits against the production URL.

What we audited

Three dimensions on the deployed build. Performance: Lighthouse score (Core Web Vitals, bundle weight, render-blocking resources). Accessibility: axe-core automated checks plus a manual keyboard-and-landmark pass. SEO: a structured checklist - title and meta description, semantic headings, crawlability, structured data, canonical tags. We audit the production build, never the in-editor preview, because that is what users receive.
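The SEO checklist items named above are mechanical enough to script. The following is a minimal sketch using only Python's standard library. It is not the BuilderProof audit tooling: the names `SeoChecklistParser` and `audit_seo`, the exact pass conditions and the sample page are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class SeoChecklistParser(HTMLParser):
    """Collects the signals a basic SEO checklist looks at (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.in_title = False
        self.meta_description = None
        self.has_canonical = False
        self.has_json_ld = False
        self.heading_levels = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.meta_description = (attr.get("content") or "").strip()
        elif tag == "link" and "canonical" in (attr.get("rel") or "").lower().split():
            self.has_canonical = bool(attr.get("href"))
        elif tag == "script" and (attr.get("type") or "").lower() == "application/ld+json":
            self.has_json_ld = True
        elif len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.heading_levels.append(int(tag[1]))

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title_parts.append(data)


def audit_seo(html):
    """Return a pass/fail flag per checklist item for one HTML document."""
    parser = SeoChecklistParser()
    parser.feed(html)
    parser.close()
    levels = parser.heading_levels
    # A skip is any step that descends more than one level, e.g. h1 -> h3.
    heading_skips = [(a, b) for a, b in zip(levels, levels[1:]) if b > a + 1]
    return {
        "title": bool("".join(parser.title_parts).strip()),
        "meta_description": bool(parser.meta_description),
        "canonical": parser.has_canonical,
        "structured_data": parser.has_json_ld,
        "single_h1": levels.count(1) == 1,
        "heading_order": not heading_skips,
    }


# Hypothetical page: no meta description, no JSON-LD, and an h1 -> h3 skip.
SAMPLE_PAGE = """<html><head><title>Pricing</title>
<link rel="canonical" href="https://example.com/pricing">
</head><body><h1>Pricing</h1><h3>Plans</h3></body></html>"""

failed = [item for item, ok in audit_seo(SAMPLE_PAGE).items() if not ok]
print(failed)  # ['meta_description', 'structured_data', 'heading_order']
```

A check like this only reads static markup. To match the audit described here it would need to be pointed at the HTML served from the production URL, not the preview, and crawlability is left out because it depends on robots rules and server responses rather than page markup.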

Each dimension is scored independently and reported separately. We deliberately do not fold them into a single "deploy" number on the page, because the whole point is to expose which dimension a given builder neglects.⁴

Results

Per-dimension scores against the default deployed output. Higher is better; all three are 0–100.

Builder	Performance	Accessibility	SEO	Most common failure
Lovable	95	84	90	Contrast on muted text
v0 by Vercel	96	82	88	Missing form labels
Bolt.new	92	78	85	No structured data
Totalum	90	80	89	Heading-order skips
Replit Agent	88	76	82	Render-blocking assets
Base44	89	72	80	Unlabelled controls
Create.xyz	86	71	79	Missing meta description

Three things are consistent across the cohort.

Performance is solved; accessibility is not. Every builder scored between 86 and 96 on performance - modern frameworks and sensible defaults have made fast output the norm. Accessibility tells the opposite story: not one builder cleared 85 (the range was 71 to 84), and the failures are mechanical and repetitive, dominated by low colour contrast on muted text and missing form labels.⁵
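Contrast failures on muted text can be checked numerically before a design token ever ships. The sketch below implements the WCAG contrast-ratio formula (relative luminance of the lighter colour plus 0.05, divided by that of the darker colour plus 0.05) and the AA thresholds of 4.5:1 for normal text and 3:1 for large text. The function names and the sample greys are illustrative assumptions, not values taken from any builder's output.

```python
def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel to linear light, per the WCAG definition."""
    c = channel_8bit / 255
    # Older WCAG text uses 0.03928 here; for 8-bit channels the result is identical.
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour):
    """Relative luminance of a six-digit hex colour such as '#767676'."""
    hex_colour = hex_colour.lstrip("#")
    r, g, b = (int(hex_colour[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1.0 (identical colours) to 21.0 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground, background, large_text=False):
    """True when the pair meets the WCAG AA contrast threshold."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


# Two near-identical greys on white sit on either side of the AA line.
print(passes_aa("#767676", "#ffffff"))  # True  (about 4.54:1)
print(passes_aa("#777777", "#ffffff"))  # False (about 4.48:1)
```

The two sample greys differ by a single step per channel, which is why contrast regressions are easy to introduce when a muted-text token is tuned by eye.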

SEO hygiene is uneven but fixable. Most builders get titles and meta descriptions right and stumble on the less visible items - structured data, canonical tags, heading order. These are precisely the things that do not show up in a preview, so they survive into production unnoticed.⁶

The failures are predictable. Because they recur, you can plan for them. If you ship on v0, budget time to add form labels; on Create.xyz, check the meta description. Totalum's recurring issue was heading-order skips - cosmetic to fix but easy to miss - against otherwise strong SEO and solid performance.⁷
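Because the three dimensions are reported separately, a reader can apply their own weighting to the results table. The following is a minimal sketch with the scores copied from that table. The weights are chosen purely for illustration, and the composite is a reader-side convenience, not a score BuilderProof publishes.

```python
# (performance, accessibility, SEO) per builder, copied from the results table.
SCORES = {
    "Lovable": (95, 84, 90),
    "v0 by Vercel": (96, 82, 88),
    "Bolt.new": (92, 78, 85),
    "Totalum": (90, 80, 89),
    "Replit Agent": (88, 76, 82),
    "Base44": (89, 72, 80),
    "Create.xyz": (86, 71, 79),
}


def weighted_score(scores, weights):
    """Weighted mean of the three dimension scores."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(s * w for s, w in zip(scores, weights)) / total


def rank(weights):
    """Builders ordered best-first under the given (perf, a11y, seo) weights."""
    return sorted(
        SCORES,
        key=lambda builder: weighted_score(SCORES[builder], weights),
        reverse=True,
    )


equal = rank((1, 1, 1))          # every dimension counts the same
a11y_heavy = rank((1, 2, 1))     # hypothetical: accessibility counts double
print(equal[:2], a11y_heavy[:2])
```

With the table's figures, both the equal weighting and the accessibility-heavy weighting place Lovable first and v0 second; a performance-only weighting puts v0 first.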

On translating these axes into contract acceptance

Lighthouse, Core Web Vitals and axe-core scores are useful as deploy-quality signals; they become more useful inside a contract when the agency turns them into measurable acceptance criteria. DevShopVault's 2026 SoW guide for fixed-price AI app builds covers the clause structure that ties deploy-quality thresholds to project acceptance and to the post-launch hotfix window. Editorial cross-reference; benchmark scoring is unchanged.
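As an illustration of what a measurable acceptance criterion can look like, the sketch below compares one set of audit scores against per-dimension thresholds and lists the dimensions that fall short. The thresholds, the `AcceptanceCriteria` type and the `failed_criteria` helper are hypothetical; they are not drawn from the DevShopVault guide or from any BuilderProof protocol.

```python
from dataclasses import dataclass

DIMENSIONS = ("performance", "accessibility", "seo")


@dataclass(frozen=True)
class AcceptanceCriteria:
    """Minimum score per dimension agreed for sign-off (illustrative values only)."""
    performance: int
    accessibility: int
    seo: int


def failed_criteria(scores, criteria):
    """Return the dimensions whose score is below its agreed threshold."""
    return [name for name in DIMENSIONS if scores[name] < getattr(criteria, name)]


# Hypothetical thresholds applied to the Lovable row of the results table.
criteria = AcceptanceCriteria(performance=90, accessibility=90, seo=85)
lovable = {"performance": 95, "accessibility": 84, "seo": 90}
print(failed_criteria(lovable, criteria))  # ['accessibility']
```

Keeping one threshold per dimension, rather than a single blended number, preserves the point made elsewhere in this benchmark: the failing dimension stays visible.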

Caveats

These scores reflect the default output, not a ceiling. A competent developer can raise any of these dimensions after export, and accessibility in particular is largely remediable with mechanical fixes. The benchmark estimates how much remediation to expect out of the box; it does not claim a builder is incapable of accessible output.⁸

Automated audits also miss things. axe-core catches a large share of accessibility defects but not all of them - it cannot judge whether alt text is meaningful, only whether it exists - so our manual keyboard-and-landmark pass supplements it but does not make the audit exhaustive. Treat the accessibility score as a floor: the real-world figure for a screen-reader user could be lower.
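The alt-text limitation is easy to demonstrate. The sketch below is a simplified presence check written for illustration, not axe-core's actual rule implementation; the class name, function name and sample markup are assumptions. It flags an image with no `alt` attribute but has no way to tell that `alt="image"` is useless to a screen-reader user.

```python
from html.parser import HTMLParser


class AltTextAudit(HTMLParser):
    """Sorts <img> tags by whether an alt attribute is present at all."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []   # src of images with no alt attribute
        self.present_alt = []   # (src, alt text) for images that have one

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src") or ""
        if "alt" not in attr:
            self.missing_alt.append(src)
        else:
            self.present_alt.append((src, attr["alt"] or ""))


def audit_alt_text(html):
    audit = AltTextAudit()
    audit.feed(html)
    audit.close()
    return audit


SAMPLE = """
<img src="chart.png">
<img src="team.jpg" alt="image">
<img src="founder.jpg" alt="Founder speaking at the 2025 launch event">
"""

result = audit_alt_text(SAMPLE)
print(result.missing_alt)  # ['chart.png']
# Both remaining images pass the presence check, although only one alt is meaningful.
print([src for src, _ in result.present_alt])  # ['team.jpg', 'founder.jpg']
```

Judging whether the second image's alt text is adequate needs a human reviewer, which is the gap the manual pass is meant to narrow.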

The audits ran against brief OQ-7's output in June 2026 and will be re-run monthly. Deploy quality tends to move more slowly than speed, because it is governed by framework defaults rather than model behaviour, but it does move - and a builder that fixes its contrast defaults will jump on the next run.

References

1. BuilderProof editorial team. (2026). Deploy-quality audit protocol v2. BuilderProof Methodology. builderproof.org/methodology#deploy-quality
2. BuilderProof. (2026). Output quality vs deploy quality: two questions. builderproof.org/methodology#output-quality
3. W3C. (2024). Web Content Accessibility Guidelines (WCAG) 2.2. Authoritative source for the accessibility failure categories cited in the cohort results.
4. Deque Systems. (2025). axe-core automated accessibility rules. Open-source rule engine used for the automated accessibility pass.
5. BuilderProof. (2026). Deploy-quality dataset, June 2026 run (v0.1 preview figures; first independently reproduced cycle publishes July 2026).
6. Google. (2025). Lighthouse scoring and Core Web Vitals. Reference for the performance, SEO and Lighthouse-side accessibility metrics audited.
7. BuilderProof. (2026). Builders we track. builderproof.org/builders
8. BuilderProof. (2026). Versioning and re-test policy. builderproof.org/methodology#versioning
9. Google Chrome team. (2026). Lighthouse score variability documentation. Reproducibility caveats expanded in the BuilderProof deploy-quality reproducibility lab note (June 21, 2026).

Published by the BuilderProof editorial team - the maintainers of the public, versioned benchmark methodology.

Cite this benchmark

Plain text

BuilderProof editorial team. "Deploy-quality benchmark: SEO, accessibility and performance audits (June 2026)". BuilderProof, June 2026. https://www.builderproof.org/benchmarks/deploy-quality-benchmark-seo-accessibility-performance-june-2026.

BibTeX

@misc{builderproof-deploy-quality-benchmark-seo-accessibility-performance-june-2026,
  title  = {{Deploy-quality benchmark: SEO, accessibility and performance audits (June 2026)}},
  author = {{BuilderProof editorial team}},
  year   = {2026},
  month  = {jun},
  howpublished = {\url{https://www.builderproof.org/benchmarks/deploy-quality-benchmark-seo-accessibility-performance-june-2026}},
  note   = {BuilderProof, builderproof.org}
}

Frequently asked questions

Do you audit the preview or the deployed site?

We audit the production deployment, never the in-editor preview, because the preview environment can hide problems that show up only once the build is shipped to a real domain.

Why is accessibility scored separately from performance?

Because they fail for different reasons. Performance is a framework-defaults question; accessibility is a markup-quality and design-token question. Folding them into a single score obscures which dimension the builder neglected.

Can these scores be improved after export?

Yes for every dimension, and accessibility in particular is largely remediable through mechanical fixes (labels, contrast, landmarks). The benchmark measures the default output, not the ceiling.

How reliable is a single Lighthouse run?

Less reliable than a median of five. Google's own variability documentation reports that a five-run median is roughly twice as stable as a single run. The June 2026 cells in this table are v0.1 single-run figures and will move to median-plus-IQR in the next iteration, per the deploy-quality reproducibility lab note (June 21, 2026).
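A median-plus-IQR summary of repeated runs can be computed with Python's standard library. This is a minimal sketch: the five run scores are made-up illustrative numbers, not benchmark data, and `summarize_runs` is a hypothetical helper rather than the procedure from the reproducibility lab note.

```python
from statistics import median, quantiles


def summarize_runs(scores):
    """Median and interquartile range of repeated audit scores for one page."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to compute a spread")
    # 'inclusive' treats the runs as the whole population being summarised.
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return {"median": median(scores), "iqr": q3 - q1}


# Illustrative only: five hypothetical Lighthouse performance scores for one URL.
runs = [93, 88, 97, 91, 92]
print(summarize_runs(runs))  # {'median': 92, 'iqr': 2.0}
```

Reporting the IQR next to the median shows how much of a score difference between two builders is within run-to-run noise.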

Why is no builder above 85 on accessibility?

Because the default templates ship without remediation. Contrast tokens are tuned for visual style rather than WCAG 2.2 thresholds, and form components are wired before labels are added. Both are mechanical to fix and the cohort gap is uniform - a default-output property rather than a per-builder defect.
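Missing form labels are among the more mechanical of these defects to detect. The sketch below is a simplified check using Python's standard library, written for illustration and not equivalent to axe-core's label rule. It treats a control as labelled if it sits inside a `<label>`, is the target of a `<label for>`, or carries `aria-label` or `aria-labelledby`. The names and the sample form are assumptions.

```python
from html.parser import HTMLParser

LABELLABLE = {"input", "select", "textarea"}
# Input types that do not need a visible text label in this simplified check.
SKIPPED_INPUT_TYPES = {"hidden", "submit", "button", "reset", "image"}


class FormLabelAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.label_targets = set()   # ids referenced by <label for="...">
        self.label_depth = 0         # > 0 while inside a <label> element
        self.controls = []           # (id, name, labelled_without_for)

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "label":
            self.label_depth += 1
            if attr.get("for"):
                self.label_targets.add(attr["for"])
        elif tag in LABELLABLE:
            input_type = (attr.get("type") or "text").lower()
            if tag == "input" and input_type in SKIPPED_INPUT_TYPES:
                return
            labelled = bool(
                self.label_depth > 0
                or attr.get("aria-label")
                or attr.get("aria-labelledby")
            )
            self.controls.append((attr.get("id"), attr.get("name"), labelled))

    def handle_endtag(self, tag):
        if tag == "label" and self.label_depth:
            self.label_depth -= 1


def unlabelled_controls(html):
    """Identifiers of form controls with no label association found."""
    audit = FormLabelAudit()
    audit.feed(html)
    audit.close()
    return [
        control_id or name or "<anonymous>"
        for control_id, name, labelled in audit.controls
        if not labelled and control_id not in audit.label_targets
    ]


SAMPLE_FORM = """<form>
<label for="email">Email</label><input id="email" type="email">
<input id="phone" type="tel" placeholder="Phone">
<label>Name <input name="full_name"></label>
<input type="search" aria-label="Search">
<input type="submit" value="Send">
</form>"""

print(unlabelled_controls(SAMPLE_FORM))  # ['phone']
```

The `phone` field in the sample relies on a placeholder alone, a common pattern in generated forms that leaves the control without an accessible name once the user starts typing.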