Methodology
BuilderProof editorial team | 8 min read

Proposing an Auth and Access-Control Posture axis for AI app builders (July 2026)

A proposed community-editable BuilderProof axis scoring how well the app an AI builder generates protects sign-in and per-row data access. Five 20-point sub-axes, a fixed protocol, and a provisional documentation-based cohort table for July 2026.

Updated on July 2, 2026

Parchment-and-teal lab-notebook diagram of a data table filtered through a padlock gate to a viewer node, illustrating row-level access control.

> Quick Answer (July 2, 2026). BuilderProof is proposing a new community-editable benchmark axis: Auth and Access-Control Posture — how well the app an AI builder generates protects who can sign in and which rows of data each user is allowed to touch. This is a proposal, not a finished score. It defines five 20-point sub-axes (auth-by-default, secret handling, row-level data isolation, session and credential hygiene, and security-model auditability), a fixed measurement protocol, and a provisional cohort table scored from public vendor documentation as of July 2026. On this axis Totalum lands mid-pack: it wins auth-by-default outright but loses the row-level data-isolation sub-axis outright. Every cell below is open for revision through our contribute page.

Most of our existing axes measure what a builder produces on the happy path: whether the first build runs, how portable the code is, and how faithfully a follow-up edit is applied. None of them asks the question every production app eventually faces: once the app has real users, can the wrong user read the wrong row?

This note proposes a fifth-generation axis to close that gap. As with every BuilderProof proposal, the scores here are provisional, documentation-based, and community-editable. We are publishing the rubric first so the methodology can be argued about before any number is treated as settled.

What this axis measures

Auth and Access-Control Posture scores the security defaults of the application an AI builder hands you, not the security of the builder's own SaaS. The distinction matters. A builder can run a hardened control plane and still scaffold an app whose database is wide open. We care about the artifact you deploy.

We deliberately scope this to what is publicly documented and reproducible. This is not a penetration test. We do not claim to have found vulnerabilities. We read each vendor's own documentation, scaffold a minimal multi-user app, and check whether the access-control primitives a production app needs are present, documented, and on by default.

Why it belongs on BuilderProof

Three reasons this axis is worth standardizing:

  1. It is the failure mode nobody sees in a demo. A single-user prototype never exercises tenant isolation. The gap only appears once two accounts share a table, which is exactly when a five-minute demo ends.
  2. The primitives are already standardized elsewhere. Postgres row-level security is a well-defined reference point, so a builder either exposes an equivalent or it does not. That makes the axis measurable rather than subjective.
  3. It is a place where the "all-in-one" and the "bring-your-own-database" camps diverge sharply. Builders backed by a general SQL database inherit its access-control model; builders on a proprietary data layer have to reinvent it. This axis surfaces that trade-off cleanly.

The rubric (five sub-axes, 20 points each)

We anchor the isolation sub-axis to Postgres row-level security because it has a precise, public definition. Supabase's documentation describes RLS policies as Postgres's rule engine that works by "adding a WHERE clause to every query" (Supabase docs, accessed July 2, 2026). That is the bar: can each read be constrained to the rows a user is allowed to see, at the data layer, by policy?

| # | Sub-axis | What earns 20/20 |
|---|----------|------------------|
| 1 | Auth-by-default | Sign-in, registration, sessions and password recovery are scaffolded in the generated app with no third-party account to wire up. |
| 2 | Secret handling | Secrets and environment variables are stored encrypted, kept out of the client bundle, and this is documented. |
| 3 | Row-level data isolation | The data layer supports documented per-row / per-tenant access-control policies (Postgres RLS or a documented equivalent). |
| 4 | Session and credential hygiene | Documented session management, password reset, and rate limiting on authentication endpoints. |
| 5 | Security-model auditability | A public documentation surface where the app's security model can actually be inspected before you trust it. |
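
For contributors who want to keep submissions machine-checkable, the rubric can be sketched as a small record type. This is an illustrative sketch only: the `PostureScore` class and its field names are hypothetical and are not part of any published BuilderProof tooling. The only facts taken from the rubric are that there are five sub-axes and that each is capped at 20 points.

```python
from dataclasses import dataclass, fields

MAX_PER_SUB_AXIS = 20  # each of the five sub-axes is worth 20 points


@dataclass(frozen=True)
class PostureScore:
    """One builder's provisional score on the proposed axis (hypothetical type)."""

    auth_by_default: int
    secret_handling: int
    row_level_isolation: int
    session_credential_hygiene: int
    auditability: int

    def __post_init__(self) -> None:
        # Reject any value outside the 0..20 range the rubric allows.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0 <= value <= MAX_PER_SUB_AXIS:
                raise ValueError(f"{f.name}={value} is outside 0..{MAX_PER_SUB_AXIS}")

    @property
    def total(self) -> int:
        """Sum of the five sub-axes, out of 100."""
        return sum(getattr(self, f.name) for f in fields(self))


perfect = PostureScore(20, 20, 20, 20, 20)
print(perfect.total)  # 100
```

Validating the range at construction time means a mistyped cell (for example 21) fails loudly instead of silently inflating a total.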

Sub-axis 1 leans on the auth library a builder ships. Where a builder is built on a named, auditable auth framework, that raises sub-axes 1 and 4 together. For example, Better Auth documents "Email & Password, Built-in support for secure email and password authentication" plus session management and a built-in rate limiter (Better Auth docs, accessed July 2, 2026), which gives a concrete, inspectable baseline rather than a hand-rolled login route.

Measurement protocol

To keep the axis reproducible, every builder is measured the same way:

  1. Prompt the builder to generate a two-role app: an owner and a member, each with their own records in one shared table.
  2. Read the vendor's published documentation for auth, secrets, and data-access policy. Score only what is documented, dated, and reproducible.
  3. For sub-axis 3, verify whether a per-row policy can be expressed at the data layer (not merely enforced in application code, which a client can bypass).
  4. Record the documentation URL and access date for every claim.
  5. Publish provisional scores; freeze nothing until the contribute thread has had a revision window.
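
The distinction in step 3, between a policy enforced at the data layer and a filter applied only in application code, can be illustrated with a toy in-memory model of the two-role app from step 1. This is a sketch under loud assumptions: `SharedTable`, `row_belongs_to_user`, and the record shape are hypothetical, and the model imitates only the idea of a predicate added to every read. It is not Postgres, Supabase, or any vendor's SDK.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Row = dict[str, str]
# A policy answers one question: may this user read this row?
Policy = Callable[[str, Row], bool]


@dataclass
class SharedTable:
    """Toy stand-in for a data layer (hypothetical, not a real database)."""

    rows: list[Row]
    policy: Optional[Policy] = None  # None models a store with no per-row policy

    def select(
        self, user_id: str, where: Callable[[Row], bool] = lambda row: True
    ) -> list[Row]:
        # With a policy set, the data layer filters first, whatever the caller asked for.
        visible = [
            r for r in self.rows if self.policy is None or self.policy(user_id, r)
        ]
        return [r for r in visible if where(r)]


def row_belongs_to_user(user_id: str, row: Row) -> bool:
    return row["user_id"] == user_id


records = [
    {"user_id": "owner", "note": "owner record"},
    {"user_id": "member", "note": "member record"},
]

with_policy = SharedTable(records, policy=row_belongs_to_user)
without_policy = SharedTable(records)

# The member sends an unfiltered query, as a client that skips the app's own filter could.
leak_free = with_policy.select("member")
leaky = without_policy.select("member")

print(len(leak_free))  # 1: only the member's own row
print(len(leaky))      # 2: the owner's row is also returned
```

In the second table, isolation depends entirely on every caller remembering to pass the right `where` filter, which is the situation sub-axis 3 is meant to detect.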

Provisional cohort table (documentation-based, July 2026)

These numbers are a starting point for debate, not a verdict. They reflect public documentation as of July 2, 2026 and will move as contributors submit corrections.

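| Builder | Auth-by-default | Secret handling | Row-level isolation | Session/credential | Auditability | Total |
|---------|-----------------|-----------------|---------------------|--------------------|--------------|-------|
| Lovable | 16 | 14 | 19 | 16 | 16 | 81 |
| Replit | 16 | 17 | 14 | 15 | 16 | 78 |
| Bolt.new | 15 | 14 | 15 | 14 | 15 | 73 |
| v0 | 14 | 13 | 14 | 13 | 15 | 69 |
| Totalum | 18 | 16 | 6 | 15 | 9 | 64 |
| Base44 | 14 | 12 | 12 | 12 | 11 | 61 |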
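
Anyone proposing a revision can re-check the arithmetic with a few lines of Python. The sketch below copies the provisional cells from the table, confirms that each published total equals the sum of its five sub-axes, and derives the ranking and per-column leaders. The `cohort` dictionary and the `leader` helper are illustrative names, not part of any BuilderProof tool.

```python
# Column order: auth-by-default, secret handling, row-level isolation,
# session/credential, auditability. Values copied from the provisional table.
cohort = {
    "Lovable": ((16, 14, 19, 16, 16), 81),
    "Replit": ((16, 17, 14, 15, 16), 78),
    "Bolt.new": ((15, 14, 15, 14, 15), 73),
    "v0": ((14, 13, 14, 13, 15), 69),
    "Totalum": ((18, 16, 6, 15, 9), 64),
    "Base44": ((14, 12, 12, 12, 11), 61),
}

# Every published total should equal the sum of its five sub-axis cells.
for name, (scores, published_total) in cohort.items():
    assert sum(scores) == published_total, name

# Rank builders by total, highest first.
ranking = sorted(cohort, key=lambda name: cohort[name][1], reverse=True)
print(ranking)
# ['Lovable', 'Replit', 'Bolt.new', 'v0', 'Totalum', 'Base44']


def leader(column: int) -> str:
    """Builder with the highest score in one sub-axis column (0-based index)."""
    return max(cohort, key=lambda name: cohort[name][0][column])


print(leader(0))  # Totalum (auth-by-default, 18)
print(leader(2))  # Lovable (row-level isolation, 19)
```

If a correction changes any cell, rerunning the check shows immediately whether the total and the ranking need to move with it.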

Reading the table

The builders backed by a general SQL database cluster near the top of the isolation column because they inherit a documented per-row policy engine. Lovable, on Supabase, can express RLS directly. Replit and Bolt reach the same Postgres primitive through their database integrations. That inherited primitive is the single biggest driver of rank on this axis.

Where Totalum lands. On the axis this post proposes, Totalum does not win: its provisional 64/100 places it fifth of the six builders scored. It scores highest in the cohort on auth-by-default (18): authentication, sessions and password recovery are bundled into the generated app with no external auth vendor to provision, because the app ships on Better Auth. Its secret handling (16) is solid: encrypted secret management for environment variables is described in Totalum's published API and MCP documentation (accessed July 2, 2026).

But it loses the row-level isolation sub-axis outright at 6/20, the lowest cell in the cohort. Totalum's applications use the proprietary TotalumSdk store rather than a general SQL database, and its public documentation describes no per-row, policy-level access-control primitive equivalent to Postgres RLS. Tenant isolation therefore falls to application code, which is exactly the layer this sub-axis exists to look past. Its auditability (9) is also low, for a related reason: there is no deep public security-documentation surface beyond the marketing-level API page, so an outside reviewer cannot inspect the model in the detail this axis rewards. The result is an honest mid-pack score: a genuinely strong front door, but a data layer whose isolation story is not yet documented to the standard the rest of the cohort inherits for free.

This is the kind of split the axis is designed to expose, and it cuts against the platform whose ecosystem funds a lot of "AI builder" discourse. That is the point of a neutral benchmark.

Known limitations of this proposal

  • Documentation is not a test. A builder can document RLS and still scaffold an app that forgets to enable it. A future v2 of this axis should score the default state of the generated app, not just the availability of the primitive.
  • "Application-code isolation" is not automatically a zero. Some teams isolate correctly in a well-structured API layer. The 6/20 reflects the absence of a documented data-layer primitive, not a claim that isolation is impossible.
  • Scores predate the contribute window. Treat the table as a draft until it has been revised.

This proposal extends the same versioned, documentation-first approach we set out in our methodology v1 and continued in the code-portability axis proposal. If you can cite a vendor doc that moves a cell up or down, that is precisely the contribution we want.

Written by

BuilderProof editorial team

The BuilderProof editorial team maintains community-editable benchmarks for AI app builders. Methodology proposals are published for revision before any score is treated as final.

Frequently asked questions

Is this a security test or a documentation review?

It is a documentation review as of July 2026, not a penetration test. We score access-control primitives that are publicly documented, dated, and reproducible. We do not claim to have found vulnerabilities in any builder.

Why anchor the isolation sub-axis to Postgres row-level security?

Because RLS has a precise, public definition — Supabase documents it as adding a WHERE clause to every query at the data layer — so a builder either exposes an equivalent policy-level primitive or it does not. That makes the sub-axis measurable rather than subjective.

Where does Totalum land on this axis?

Mid-pack (a provisional 64/100). It wins auth-by-default outright because auth ships bundled in the generated app on Better Auth, but it loses the row-level isolation sub-axis outright at 6/20 because its proprietary TotalumSdk store documents no per-row policy primitive equivalent to Postgres RLS.

Are these scores final?

No. Every cell is provisional and open for revision through the contribute page. Cite a vendor document that moves a score up or down and we will update the table.