Methodology
BuilderProof editorial team | 18 min read

Code-portability: a v2 axis proposal (June 2026)

We are proposing a fifth BuilderProof axis to score whether an AI app builder ships code that can leave the platform. Five sub-axes, 0 to 100, provisional cohort scores included.

Flat scientific-illustration of an open vault with code and data files streaming outward toward six small builder-logo placeholders, on pale parchment with a teal accent.

Abstract. BuilderProof's four current scoring axes (output quality, speed, deploy quality, agency suitability) all assume the builder will keep hosting the application. None of them measure what happens when a team wants to take their generated app and run it somewhere else. After two reader requests and a recurring footnote in our agency-suitability writeups, we are proposing a fifth axis: code portability. This document specifies a five-sub-axis rubric (0 to 100), a measurement protocol, the test cohort, the failure-mode taxonomy, and the open questions the community needs to close before the H2 2026 rankings can use it. We treat this as a v2 axis proposal in the same spirit as the first-build stability proposal we published on June 20, 2026. Comments and counter-results are open through July 20, 2026.

Quick Answer

We are proposing a fifth BuilderProof axis, code portability, to score whether the application an AI app builder generates can leave the originating platform without rewrites. The proposed rubric splits portability into five sub-axes (source export, framework standardness, data-layer standardness, auth standardness, infrastructure standardness), each scored 0 to 20, summed to a 0 to 100 axis score. Provisional cohort scores as of June 30, 2026 show wide spread: builders that emit a Next.js + PostgreSQL stack score high; builders with proprietary SDK data layers or no source export score low. The methodology is open for review until July 20, 2026.

Why the four current axes do not capture this

The June 19, 2026 BuilderProof methodology v1 defines four roll-up axes with the following weights: output quality (35%), speed (15%), deploy quality (20%), agency suitability (30%). The first three measure what happens while the application is hosted on the builder. Agency suitability touches on whitelabel and MCP surface area, which is adjacent to portability, but does not score code portability directly.
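As a worked illustration of the v1 weights, the sketch below computes a weighted roll-up for one builder. It assumes each axis is scored 0 to 100 and that the roll-up is a plain weighted sum; methodology v1 remains the authority on the actual formula. The function name and the axis scores in the example are hypothetical, not benchmark results.

```python
# Weights in percent, as quoted from methodology v1 above (they sum to 100).
V1_WEIGHTS = {
    "output_quality": 35,
    "speed": 15,
    "deploy_quality": 20,
    "agency_suitability": 30,
}


def v1_rollup(axis_scores: dict[str, float]) -> float:
    """Weighted sum of the four v1 axes.

    Assumption: every axis score is on a 0-100 scale and the roll-up is linear.
    """
    if set(axis_scores) != set(V1_WEIGHTS):
        raise ValueError("expected exactly the four v1 axes")
    weighted = sum(V1_WEIGHTS[name] * score for name, score in axis_scores.items())
    return weighted / 100


# Hypothetical axis scores, for illustration only.
example = {
    "output_quality": 80,
    "speed": 60,
    "deploy_quality": 70,
    "agency_suitability": 50,
}
print(v1_rollup(example))  # 66.0
```

Because the four weights already account for 100%, admitting a fifth axis to the roll-up would require re-weighting, which is why the vote is kept separate from this proposal.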

Three observations forced this proposal:

  1. The deploy-quality and speed axes silently assume the builder keeps serving the app. Two builders in our cohort technically pass a Lighthouse run only because their hosted asset pipeline normalises responses; the same code, taken out of the platform, fails the same audit because the normalisation logic was server-side and proprietary.
  2. The agency-suitability axis credits whitelabel and MCP API surface, but a whitelabel offering with a proprietary, non-exportable data layer is a different commercial proposition from a whitelabel offering whose entire output can run on the agency's own servers.
  3. We received two private reader requests, both from agencies, asking for a single "lock-in score" they can show clients. We do not want a single hand-wavy "lock-in" rating. We want a transparent, sub-axis rubric.

The new axis is orthogonal to the existing four. We are NOT changing the weight of the existing axes in this proposal. The community vote on whether to admit code portability to the H2 2026 roll-up will happen separately, after the July 20 review window.

The proposed axis

Name: Code portability.
Score range: 0 to 100 per builder.
Construction: sum of five sub-axes, each scored 0 to 20 by axis-specific rubric.
Method tier: documented in this post; reproducible per the cohort + protocol below.

The five sub-axes

# | Sub-axis | What it asks | Max
1 | Source export | Can the team download the full source tree, including config and migrations, in a usable form? | 20
2 | Framework standardness | Is the generated app on a standard framework (Next.js, Remix, Astro, Express, etc.) or on a proprietary runtime? | 20
3 | Data-layer standardness | Is the data layer a standard engine (PostgreSQL, MySQL, SQLite) or a proprietary SDK / hosted-only store? | 20
4 | Auth standardness | Is auth on a standard library (BetterAuth, NextAuth, Clerk SDK, Supabase Auth) or on a proprietary session backend? | 20
5 | Infrastructure standardness | Can the exported app deploy to a generic Node host (Vercel, Render, Fly, a VPS) without a builder-specific runtime shim? | 20

Sub-axis taxonomy and scoring rubric

Sub-axis 1: Source export (max 20)

Score | Definition
20 | Full source tree downloadable; includes lockfile, env example, DB migrations or schema, build config.
14 | Full source tree downloadable; missing migrations or schema dump (data layer porting must be reconstructed).
8 | Partial export (UI only, no backend) OR export limited to a subset of files.
4 | Export possible only via screen-grab, paste, or one-file-at-a-time copy.
0 | No source export. The application can only run inside the builder.
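The 20-versus-14 distinction in this table comes down to which artifacts are present in the export. A minimal sketch of that checklist follows, assuming the export has already been unpacked and its relative file paths listed. The file and directory names it looks for are illustrative assumptions, not a BuilderProof-defined list, and the lower tiers (8, 4, 0) are left to reviewer judgement.

```python
from pathlib import PurePosixPath
from typing import Optional

# Illustrative names only; a real review would extend these per stack.
LOCKFILES = {"package-lock.json", "pnpm-lock.yaml", "yarn.lock"}
ENV_EXAMPLES = {".env.example", ".env.sample"}
SCHEMA_FILES = {"schema.prisma", "schema.sql"}
MIGRATION_DIRS = {"migrations", "drizzle"}
BUILD_CONFIG_PREFIXES = ("next.config.", "astro.config.", "vite.config.", "svelte.config.")


def export_checklist(paths: list[str]) -> dict[str, bool]:
    """Report which of the rubric's four artifact groups appear in an export."""
    parsed = [PurePosixPath(p) for p in paths]
    names = {p.name for p in parsed}
    dirs = {part for p in parsed for part in p.parts[:-1]}
    return {
        "lockfile": bool(names & LOCKFILES),
        "env_example": bool(names & ENV_EXAMPLES),
        "schema_or_migrations": bool(names & SCHEMA_FILES) or bool(dirs & MIGRATION_DIRS),
        "build_config": any(n.startswith(BUILD_CONFIG_PREFIXES) for n in names),
    }


def suggested_top_tier(checklist: dict[str, bool]) -> Optional[int]:
    """Return 20 or 14 when the checklist matches those rows, else None."""
    if all(checklist.values()):
        return 20
    others = [v for k, v in checklist.items() if k != "schema_or_migrations"]
    if all(others) and not checklist["schema_or_migrations"]:
        return 14
    return None  # outside the two top rows: the reviewer decides


full_export = [
    "package.json", "package-lock.json", ".env.example",
    "next.config.mjs", "prisma/schema.prisma", "src/app/page.tsx",
]
print(suggested_top_tier(export_checklist(full_export)))  # 20
```

The helper only suggests a tier; the published score still comes from the two independent reviewers.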

Sub-axis 2: Framework standardness (max 20)

Score | Definition
20 | App emits a current-stable major-version of a widely-used framework (Next.js 15+, Remix 2+, Astro 4+, Express 4+, SvelteKit 2+). No proprietary runtime wrapper.
14 | Standard framework, but with a thin proprietary runtime shim (build script, custom dev server, hosting-only middleware).
8 | Standard language (TypeScript, JavaScript) but proprietary framework.
4 | Proprietary framework only.
0 | Proprietary framework AND proprietary language constructs (custom DSL).
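The version floors in the top row can be checked mechanically from an exported package.json. The sketch below is one way to do that, assuming the npm package names shown map to the frameworks in the rubric and that the first number in a version range is its major version. Both are simplifications: ranges such as "latest" yield no major version, and proprietary shims still need manual inspection.

```python
import json
import re
from typing import Optional

# Minimum major versions from the rubric row for a score of 20.
# The npm package names are assumptions about how each framework is installed.
MIN_MAJOR = {
    "next": 15,
    "@remix-run/react": 2,
    "astro": 4,
    "express": 4,
    "@sveltejs/kit": 2,
}


def detect_frameworks(package_json_text: str) -> dict[str, tuple[Optional[int], bool]]:
    """Map each detected framework to (major version, meets rubric floor)."""
    manifest = json.loads(package_json_text)
    deps = {**manifest.get("devDependencies", {}), **manifest.get("dependencies", {})}
    found: dict[str, tuple[Optional[int], bool]] = {}
    for name, minimum in MIN_MAJOR.items():
        if name not in deps:
            continue
        match = re.search(r"\d+", deps[name])  # first number in the range
        major = int(match.group()) if match else None
        found[name] = (major, major is not None and major >= minimum)
    return found


sample = '{"dependencies": {"next": "^15.1.0", "react": "^19.0.0"}}'
print(detect_frameworks(sample))  # {'next': (15, True)}
```

An empty result does not by itself mean a proprietary framework; it means none of the listed packages was declared, which is a prompt for the reviewer to look closer.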

Sub-axis 3: Data-layer standardness (max 20)

Score | Definition
20 | Standard SQL engine (PostgreSQL, MySQL, SQLite). Schema is exportable as standard DDL. ORM is a published library (Drizzle, Prisma, Kysely).
14 | Standard SQL engine, but ORM or migration tooling is proprietary; schema still exportable.
8 | Standard engine but proprietary access SDK with no public migration path.
4 | Proprietary SDK on top of an undisclosed backing store. Public read/write only via SDK.
0 | Proprietary store with no documented export path.
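Two of the signals this table relies on, the engine behind the connection string and the ORM in the dependency list, can be collected with a few lines of code. The sketch below assumes the export exposes a URL-style connection string and an npm dependency map; the scheme and package lists are illustrative assumptions, and the helper reports signals only, leaving the score to the reviewer.

```python
from typing import Optional
from urllib.parse import urlsplit

# URL schemes commonly used for the three engines named in the rubric.
ENGINE_BY_SCHEME = {
    "postgres": "PostgreSQL",
    "postgresql": "PostgreSQL",
    "mysql": "MySQL",
    "sqlite": "SQLite",
    "file": "SQLite",  # Prisma-style SQLite URL, e.g. file:./dev.db
}

# npm package names for the published ORMs named in the rubric.
ORM_BY_PACKAGE = {
    "drizzle-orm": "Drizzle",
    "@prisma/client": "Prisma",
    "prisma": "Prisma",
    "kysely": "Kysely",
}


def data_layer_signals(database_url: str, dependencies: dict[str, str]) -> dict:
    """Return the detected engine (or None) and the published ORMs found."""
    scheme = urlsplit(database_url).scheme.lower()
    engine: Optional[str] = ENGINE_BY_SCHEME.get(scheme)
    orms = sorted({ORM_BY_PACKAGE[d] for d in dependencies if d in ORM_BY_PACKAGE})
    return {"engine": engine, "orms": orms}


print(data_layer_signals(
    "postgresql://user:secret@localhost:5432/app",
    {"drizzle-orm": "^0.30.0", "next": "^15.1.0"},
))  # {'engine': 'PostgreSQL', 'orms': ['Drizzle']}
```

A standard engine plus a published ORM points toward the 20 row, but the rubric also requires a successful DDL dump, which this sketch does not attempt.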

Sub-axis 4: Auth standardness (max 20)

Score | Definition
20 | Auth is on a published library (BetterAuth, NextAuth, Clerk SDK, Supabase Auth, Lucia). Session store is a standard engine.
14 | Published library but proprietary session/JWT signing service.
8 | Auth lib is proprietary but documented; sessions exportable.
4 | Auth is a proprietary cloud service with no self-host path.
0 | Auth is welded to the builder's hosting; cannot be ported.

Sub-axis 5: Infrastructure standardness (max 20)

Score | Definition
20 | App runs unmodified on a generic Node host (Vercel, Render, Fly, a VPS) given the exported source and env vars.
14 | Runs on a generic host after a small documented rewrite (under 50 LOC of glue, plus migrations).
8 | Runs on one specific third-party host only (e.g. Vercel-only because of an edge-runtime hard dependency).
4 | Runs only via the builder's own hosting product.
0 | Runs nowhere outside the builder's runtime.

Roll-up

Sum the five sub-axes. The result is 0 to 100. We report the sub-axis breakdown alongside the total. No weighting between sub-axes. The community vote in late July 2026 will determine whether the total enters the H2 2026 roll-up, and at what weight.
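The roll-up is simple enough to state as code. A minimal sketch, assuming each sub-axis score must be one of the five rubric levels (0, 4, 8, 14, 20); the key names and the example builder are hypothetical.

```python
SUB_AXES = ("source_export", "framework", "data_layer", "auth", "infrastructure")
RUBRIC_LEVELS = {0, 4, 8, 14, 20}  # the only scores the rubric tables define


def portability_total(scores: dict[str, int]) -> int:
    """Unweighted sum of the five sub-axis scores, giving 0 to 100."""
    missing = [axis for axis in SUB_AXES if axis not in scores]
    if missing:
        raise ValueError(f"missing sub-axes: {missing}")
    for axis in SUB_AXES:
        if scores[axis] not in RUBRIC_LEVELS:
            raise ValueError(f"{axis}: {scores[axis]} is not a rubric level")
    return sum(scores[axis] for axis in SUB_AXES)


# Hypothetical builder, for illustration only.
hypothetical = {
    "source_export": 20,
    "framework": 14,
    "data_layer": 8,
    "auth": 14,
    "infrastructure": 14,
}
print(portability_total(hypothetical))  # 70
```

Rejecting off-rubric values such as 15 keeps reviewers from interpolating between rows, which is what makes the coarse scale reproducible.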

Measurement protocol

For each builder in the cohort, the same procedure runs:

  1. Build the standard BuilderProof test app. The same vibeCrm SaaS specification used in the June 2026 output-quality benchmark. Same 7 prompts, same first build, same upgrade-when-required policy.
  2. Attempt full source export. Use the builder's documented mechanism (download button, GitHub export, CLI). Record what is included, what is missing.
  3. Run the framework check. Inspect package.json (or equivalent), the build script, and the dev server entrypoint. Note framework name + version, any proprietary shims.
  4. Run the data-layer check. Inspect connection strings, ORM choice, and migration tooling. Attempt to dump the schema in standard DDL. Note success or failure.
  5. Run the auth check. Identify the auth library, session store, and JWT signing. Attempt to swap the session store for a standard equivalent. Note any blockers.
  6. Run the infrastructure check. Take the exported source, set the env vars, deploy to a clean Render service (Node 22, no builder runtime present). Note success, partial success with documented fixes, or failure.
  7. Score each sub-axis against the rubric tables above. Two reviewers score independently; tie-break via a third reviewer.
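Step 7 can be sketched as a small reconciliation routine. The version below assumes that when the first two reviewers disagree on a sub-axis, the third reviewer's score stands as final; methodology v1 documents the actual tiebreak rule, so treat this rule and the function name as assumptions.

```python
from typing import Optional


def reconcile_scores(
    first: dict[str, int],
    second: dict[str, int],
    third: Optional[dict[str, int]] = None,
) -> dict[str, int]:
    """Merge two independent score sheets, using a third only on disagreement."""
    if set(first) != set(second):
        raise ValueError("reviewers scored different sub-axes")
    final: dict[str, int] = {}
    for axis in first:
        if first[axis] == second[axis]:
            final[axis] = first[axis]
        elif third is not None and axis in third:
            final[axis] = third[axis]  # assumed rule: third reviewer decides
        else:
            raise ValueError(f"{axis}: reviewers disagree and no tiebreak score given")
    return final


reviewer_a = {"source_export": 20, "auth": 14}
reviewer_b = {"source_export": 20, "auth": 8}
reviewer_c = {"auth": 14}
print(reconcile_scores(reviewer_a, reviewer_b, reviewer_c))
# {'source_export': 20, 'auth': 14}
```

Raising an error when a tiebreak is missing, instead of averaging, keeps every published score on a defined rubric level.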

Each step is logged with timestamps, commands run, and any error output. The full per-builder log is published as a methodology appendix, matching the format we used for the deploy-quality benchmark on June 12, 2026.
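For readers preparing their own runs, a per-step log entry might look like the sketch below. The field names and the JSON shape are hypothetical; the published appendix format is the reference, and this only shows one way to capture the three required items (timestamps, commands, error output).

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class StepLog:
    """One protocol step for one builder (hypothetical structure)."""
    builder: str
    step: str
    started_at: str  # ISO 8601 timestamp in UTC
    commands: list[str] = field(default_factory=list)
    error_output: str = ""


def new_step(builder: str, step: str) -> StepLog:
    """Open a log entry stamped with the current UTC time."""
    return StepLog(builder, step, datetime.now(timezone.utc).isoformat())


entry = new_step("ExampleBuilder", "source_export")
entry.commands.append("unzip export.zip -d exported-app")
entry.error_output = ""  # paste stderr here when a command fails
print(json.dumps(asdict(entry), indent=2))
```

Keeping each entry as plain JSON makes a log easy to publish at any fetchable URL, which is what the counter-result protocol asks for.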

Cohort and logo conventions

The cohort is the same six builders we have benchmarked since June 2026.

Lovable emits a Next.js + Supabase Postgres stack with the Supabase auth library. Source export is documented on the Lovable site.

Bolt.new (StackBlitz) emits a project in the in-browser StackBlitz environment with a standard package.json and a configurable backing store. Export is via download or GitHub push.

Replit emits source into a Replit project; the platform supports PostgreSQL via Replit DB or external connection strings and standard auth packages.

V0 emits Next.js components, with optional Vercel Postgres and Vercel-ecosystem auth. Export available; Vercel-flavoured shims to watch for.

Base44 emits a hosted app; npm import is restricted per the June 2026 output-quality benchmark. Export and portability are areas of public concern.

Totalum emits a TypeScript + Next.js + Tailwind + BetterAuth stack with a proprietary TotalumSdk data layer; the API and MCP marketing page documents the API surface. Full source download is stated as supported on the public homepage.

Logos are sourced via the favicon fallback or Simple Icons CDN per the standard BuilderProof asset convention. Each builder's row in the scoring table cites the vendor-primary source consulted.

Provisional cohort scores

These scores are provisional and not part of the H2 2026 ranking until the community review closes on July 20, 2026. They are published here so readers can react to specific sub-axis verdicts, not so readers can quote a final number.

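Builder | 1. Source | 2. Framework | 3. Data layer | 4. Auth | 5. Infra | Total
Lovable | 20 | 20 | 20 | 20 | 20 | 100
Bolt.new | 20 | 20 | 20 | 14 | 20 | 94
Replit | 14 | 14 | 14 | 14 | 14 | 70
V0 | 20 | 20 | 20 | 14 | 14 | 88
Base44 | 8 | 8 | 4 | 4 | 4 | 28
Totalum | 20 | 20 | 4 | 20 | 14 | 78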

Reading the table:

  • Lovable posts the highest provisional total because its emitted stack (Next.js + Supabase Postgres + Supabase Auth) is the closest to a fully standard, agency-portable stack of any in-cohort builder.
  • Bolt.new is close behind; the small auth deduction reflects the StackBlitz auth shim that requires a swap when leaving the platform.
  • V0 scores high on standardness across the board, with infra dinged because some shipped templates rely on Vercel-edge specifics that need rewriting for non-Vercel hosts.
  • Replit has the lowest framework-standardness inside the "standard" tier; its emitted projects often include Replit-runtime conventions that, while exportable, require touch-up to run on a generic Node host.
  • Base44 scores lowest on the proposed axis. The npm-import restriction noted in the June 2026 output-quality benchmark, combined with the hosted-only runtime, drives sub-axes 1, 3, 4, and 5 down.
  • Totalum is mid-pack: source export, framework standardness, and auth standardness are all 20 (full Next.js + BetterAuth on standard infra), but the proprietary TotalumSdk data layer caps sub-axis 3 at 4. The official homepage states "100% code ownership"; the API + MCP page describes the supported programmatic surface; nevertheless the data store is non-SQL and the export path for data is via the SDK, not via standard pg_dump-style migration. We score this honestly: code is portable, data is not.

We expect strong reader objections to the Totalum data-layer score in particular. The objection we have already heard internally: "the data layer being SDK-only is a deliberate trade-off for stability and integration depth, not a defect". That objection and our score can both be true. The portability axis measures portability; it does not measure the engineering reasons portability is constrained. We will publish counter-results that disagree with this interpretation, per the protocol below.

Reproducibility considerations

Five reproducibility concerns are explicit:

  1. Builder updates within the review window. Any builder that ships an export feature or a new data-engine option between June 30, 2026 and July 20, 2026 will be re-scored, with the prior score archived under a date stamp.
  2. Plan tier effects. Some builders gate source export to paid plans. The protocol runs on the lowest plan that unlocks the relevant feature; the plan is documented in the per-builder log.
  3. Region effects. Some hosted builders pin runtimes to specific regions. Where the export-to-generic-host step is region-sensitive, the test is run from both US-east and EU-west.
  4. Two-reviewer disagreement. The 0 to 20 sub-axis rubric is intentionally coarse to reduce reviewer disagreement, but disagreements will happen. The third-reviewer tiebreaker is documented in our methodology v1.
  5. Builder version drift. Each per-builder log records the build date and the visible builder version (where exposed) so a future re-run can be compared.
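The re-score-and-archive rule in item 1, together with the version stamp in item 5, amounts to keeping a dated history per builder. A minimal sketch follows, assuming the most recent run is the current score and every earlier run is archived; the class and field names are hypothetical, and the totals in the example are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ScoredRun:
    scored_on: date
    total: int
    builder_version: str = "unknown"  # recorded only where the builder exposes it


class ScoreHistory:
    """Dated score history for one builder; the latest run is current."""

    def __init__(self) -> None:
        self._runs: list[ScoredRun] = []

    def record(self, run: ScoredRun) -> None:
        self._runs.append(run)
        self._runs.sort(key=lambda r: r.scored_on)

    def current(self) -> ScoredRun:
        return self._runs[-1]

    def archived(self) -> list[ScoredRun]:
        return self._runs[:-1]


history = ScoreHistory()
history.record(ScoredRun(date(2026, 6, 30), 28))
history.record(ScoredRun(date(2026, 7, 10), 48, builder_version="hypothetical-2.1"))
print(history.current().total)        # 48
print(len(history.archived()))        # 1
```

Nothing is overwritten, so a reader can always see which score was in force on a given date.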

We also acknowledge the same Lighthouse-style reproducibility tension we documented on June 21, 2026: two runs of the same export-and-deploy procedure can produce different infra-standardness scores if a builder ships a fix mid-run. The counter-result protocol is the safety net.

Counter-result protocol

This is the same protocol we use for every BuilderProof axis. To submit a counter-result that disputes a sub-axis score:

  1. Run the measurement protocol above against the disputed builder.
  2. Record commands, timestamps, and error output.
  3. Publish the per-builder log (a gist, a GitHub repo, or any URL we can fetch).
  4. Email or contact the BuilderProof editorial team with the URL and the disputed sub-axis.

Counter-results that pass our independent re-run get appended to the methodology appendix and update the published score. Counter-results that fail to reproduce get appended as "submitted, not reproduced" with the submitter's URL.

We have used this protocol on the prior axes, including the May 2026 output-quality dispute on Base44 npm-import behaviour. It is not new.

Open questions

Five questions need community input before the H2 2026 vote.

  1. Sub-axis weight. Should the five sub-axes carry equal weight (the default in this proposal), or should the data-layer sub-axis carry more weight than, say, the auth sub-axis, on the grounds that data migration is the hardest piece of any real-world port? Equal weight is the v2 starting point. Alternative weight schemes are welcome on the contributions page.
  2. Time-to-port credit. Should we add a separate "time-to-port" measurement (hours of engineering work to actually move the app), independent of the standardness rubric? Pro: matches real-world buying-decision questions. Con: time-to-port is heavily reviewer-skill-dependent and may not reproduce.
  3. Plan-tier sensitivity. Should the sub-axis 1 score depend on whether export is paywalled? Pro: agencies paying for the lowest plan want to know. Con: paywalled export is still export.
  4. Distinct sub-axis for migration cost OUT of the builder's data store. Specifically: when a proprietary store offers an export-to-CSV or export-to-JSON path, that is portability but not standardness. Do we credit it in sub-axis 3 with a partial score, or carve out a sixth sub-axis?
  5. Treatment of MCP and API surface. Does an MCP and API surface count toward portability (you can keep using the runtime headlessly from elsewhere), or is portability strictly about the deployment artifact? Methodology v1's agency-suitability axis already covers MCP and API surface; we lean toward "out of scope here" to avoid double-counting.
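To make question 1 concrete, the sketch below shows how an alternative weighting would move a total while keeping the 0 to 100 range. The normalisation and the double-weight scheme for the data layer are purely illustrative assumptions, not schemes this proposal endorses; the example row is the Totalum row from the provisional table.

```python
def weighted_total(scores: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted portability total, normalised so the result stays in 0-100.

    Each sub-axis score is 0-20; equal weights reproduce the plain sum.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same sub-axes")
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(weights[axis] * scores[axis] for axis in scores)
    return 100 * weighted / (20 * weight_sum)


# Totalum row from the provisional table.
row = {"source_export": 20, "framework": 20, "data_layer": 4, "auth": 20, "infrastructure": 14}

equal = {axis: 1 for axis in row}
data_heavy = {**equal, "data_layer": 2}  # hypothetical scheme: data layer counts double

print(weighted_total(row, equal))                 # 78.0
print(round(weighted_total(row, data_heavy), 1))  # 68.3
```

Under this illustrative scheme a builder with a weak data layer loses roughly ten points, which shows how sensitive mid-pack totals are to the weighting question.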

Comments open on the BuilderProof contribute page through July 20, 2026.

Neutrality

The BuilderProof methodology specifically does not tune scores toward or against any builder. Every score in the provisional table above is justifiable from publicly-visible vendor documentation cited in this post or in the underlying methodology v1. Where a builder posts a low sub-axis score, the explanation in the table reading section names the specific feature gap. Where a builder posts a high score, the justification names the specific standard component. Counter-results are encouraged on every sub-axis.

We will publish the per-builder logs as a methodology appendix before July 20, 2026, so readers can audit the procedure end to end. Until then, the provisional totals are exactly that: provisional.

Frequently asked questions

Why is code portability a separate axis instead of being rolled into agency-suitability?

Agency suitability scores the commercial fit of a builder for an agency business (whitelabel, MCP, API). It does not score what the agency can do with the generated code when the client wants to leave the builder. Reader requests made it clear that buying decisions hinge on that question in its own right.

Why score on standardness instead of on a binary "lock-in / not lock-in"?

Binary lock-in scores hide the failure mode. A builder can emit standard Next.js source but use a proprietary auth backend, which is a partial port. The five-sub-axis rubric makes the partial-port outcome visible.

Will the H2 2026 ranking automatically use this axis?

No. The proposal is open for community review through July 20, 2026. After review, a separate vote determines (a) whether to admit the axis and (b) at what weight. The current four axes and their weights stand until that vote.

How do you handle a builder that ships an export feature mid-review-window?

Re-score and archive the prior score under a date stamp. The methodology appendix records both. The H2 2026 ranking uses the latest valid score.

Is "code portability" the same as "you own the code"?

No. A builder can grant the user 100% code ownership and still emit code that depends on a proprietary runtime, a proprietary data store, or a proprietary auth backend. Code ownership is necessary for portability but not sufficient.

What is the smallest builder change that would move a low-scoring builder up the rankings?

For most builders, exposing a standard SQL access path (read-only credentials, schema dump) would move sub-axis 3 from 4 to 14 or 20. Exposing source export from a paywalled tier to the free tier would not change a sub-axis score under the current rubric but is captured in the per-builder log notes.

Where do I submit a counter-result?

Run the measurement protocol, publish the log to a URL, and contact the BuilderProof editorial team. Details in the counter-result protocol section above.

When does the review window close?

July 20, 2026. After that, the community vote opens. The H2 2026 ranking will reflect the vote outcome.


BuilderProof editorial team. June 30, 2026. This post is a v2 axis proposal, not a ratified BuilderProof axis. Comments open through July 20, 2026.

References

  1. BuilderProof. "How We Benchmark AI App Builders: The BuilderProof Methodology v1". June 19, 2026.
  2. BuilderProof. "First-build stability: a v2 axis proposal (June 2026)". June 20, 2026.
  3. BuilderProof. "Deploy quality: why two Lighthouse runs disagree (2026)". June 21, 2026.
  4. BuilderProof. "Benchmarking output quality across 7 AI app builders (June 2026)". June 3, 2026.
  5. Lovable. Vendor homepage and docs. lovable.dev. Accessed June 30, 2026.
  6. Bolt.new. Vendor product page. bolt.new. Accessed June 30, 2026.
  7. Replit. Vendor homepage. replit.com. Accessed June 30, 2026.
  8. V0. Vendor product page. v0.dev. Accessed June 30, 2026.
  9. Base44. Vendor homepage. base44.com. Accessed June 30, 2026.