The data that lets healthcare build — without waiting on PHI.
Verism Health builds synthetic patients, then renders their entire data trail: claims, eligibility, encounters, labs, quality measures, and revenue, across every line of business, all from the same AI-generated archetypes. Real data takes months and as much as six figures a year to license. Ours downloads in a minute, with no privacy review.
The problem
Everyone in healthcare needs data. Almost no one can get it.
To build a risk model, benchmark a population, validate a measure, or demo a product, you need realistic healthcare data. But the real thing is buried under HIPAA, DUAs, and licenses that cost $50k–$250k a year and take months of legal review to touch.
So consultants can't benchmark, startups can't prototype, and data teams can't test until a deal closes. The data exists. The friction is the product we're replacing.
Synthetic data removes the friction, but only if you can trust that it matches reality. That's the bar we hold ourselves to.
The archetype engine
One synthetic population. Every record renders from it.
We don't generate disconnected tables. We generate synthetic people — archetypes — with full clinical lives, then render whatever data domain you need from that single coherent source. The same diabetic member produces the claim, the lab, the gap, and the revenue — and they all agree.
Generate people
AI builds a library of clinically coherent patient archetypes (conditions, comorbidities, demographics), grounded in real-world clinical knowledge and published prevalence.
Generate journeys
Each member lives through time: enrollment, condition progression, acute episodes, care patterns, and persistence. A diabetic with CHF looks like one, month over month.
Render any domain
Those journeys emit whatever you need — claims, eligibility, encounters, labs, quality measures, revenue — for whatever line of business. One source of truth, every record consistent.
Calibrated to published benchmarks today; trained on licensed real data next. See the full methodology →
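The three steps above can be pictured with a toy sketch. This is only an illustration of the idea of rendering several domains from one source, not Verism Health's engine: every class, function, and field name below (`Archetype`, `Event`, `generate_journey`, `render_claims`, `render_labs`) is a hypothetical stand-in, and the journey logic is deliberately trivial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Archetype:
    """A synthetic person. Field names are hypothetical."""
    member_id: str
    conditions: tuple


@dataclass(frozen=True)
class Event:
    """One dated step in a member's journey (hypothetical shape)."""
    member_id: str
    month: str   # e.g. "2024-03"
    kind: str    # "office_visit" or "lab_draw" in this toy example
    detail: str


def generate_journey(person):
    """Toy journey: a diabetic member gets a visit and an A1c draw."""
    events = []
    if "diabetes" in person.conditions:
        events.append(Event(person.member_id, "2024-03", "office_visit", "diabetes follow-up"))
        events.append(Event(person.member_id, "2024-03", "lab_draw", "HbA1c"))
    return events


def render_claims(journey):
    """Render the claims view: one row per billable visit."""
    return [
        {"member_id": e.member_id, "service_month": e.month, "reason": e.detail}
        for e in journey
        if e.kind == "office_visit"
    ]


def render_labs(journey):
    """Render the labs view: one row per lab draw."""
    return [
        {"member_id": e.member_id, "result_month": e.month, "test": e.detail}
        for e in journey
        if e.kind == "lab_draw"
    ]


member = Archetype("M0001", ("diabetes", "chf"))
journey = generate_journey(member)
claims = render_claims(journey)
labs = render_labs(journey)

# Both views come from the same journey, so they agree on member and month.
print(claims)
print(labs)
```

Because both render functions read the same list of events, the claim and the lab cannot disagree about who the member is or when the care happened. That is the property the single-source design is meant to guarantee.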
One engine, every render target
The data domains the archetypes produce.
Everything joins on member_id. Take the domains you need today — more render from the same people as we expand.
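As a rough sketch of what joining on member_id can look like, the example below links a member-month eligibility table to medical claims using Python's built-in sqlite3 module. Only `member_id` comes from the text above. The table names, the other column names, and the rows are made-up placeholders, so check the sample's actual schema before adapting it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables and rows; only the member_id join key is from the source.
conn.executescript("""
CREATE TABLE eligibility (member_id TEXT, month TEXT, plan TEXT);
CREATE TABLE medical_claims (member_id TEXT, service_month TEXT, allowed REAL);
INSERT INTO eligibility VALUES ('M0001', '2024-03', 'MA'), ('M0002', '2024-03', 'MA');
INSERT INTO medical_claims VALUES ('M0001', '2024-03', 120.0), ('M0001', '2024-03', 80.0);
""")

# Eligibility is the spine: LEFT JOIN keeps members with no claims in the month.
rows = conn.execute("""
SELECT e.member_id, e.month, COALESCE(SUM(c.allowed), 0) AS allowed
FROM eligibility AS e
LEFT JOIN medical_claims AS c
  ON c.member_id = e.member_id AND c.service_month = e.month
GROUP BY e.member_id, e.month
ORDER BY e.member_id
""").fetchall()

for member_id, month, allowed in rows:
    print(member_id, month, allowed)
```

Starting from eligibility rather than claims keeps zero-utilization members in the denominator, which matters for any per-member-per-month calculation.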
Eligibility & enrollment
Available now
Member-month enrollment, demographics, plan/program, and benefit status: the spine every other domain links to.
Medical claims
Available now
Line-level institutional and professional claims: diagnoses, procedures, settings, allowed/paid, and realistic adjustment chains.
Pharmacy
Available now
NCPDP-grade drug fills with refill chains, benefit phases, formulary tiers, and net-of-rebate economics.
Revenue & payment
Available now
Payer-side revenue the way a plan receives it: capitation, risk scores, and the factors behind every dollar.
Encounters
Expanding
Encounter-level utilization independent of billing: the visit-and-service record that health systems and value-based programs run on.
Labs & results
Expanding
Ordered tests with realistic result values trended to each member's conditions: A1c that tracks the diabetic, eGFR that tracks the CKD.
Quality measures
Expanding
Measure-ready numerators, denominators, and gaps (HEDIS-style / Stars) rendered from each member's actual care.
Lines of business
Medicare Advantage today · the rest expanding
Why you can trust it
We treat fidelity like an eval — and publish the score.
Synthetic data is only worth buying if it matches reality. Every release ships with a credibility audit comparing dozens of metrics against published benchmarks — with citations. If a number drifts, you see it.
Illustrative. Each dataset's real audit ships in its report bundle.
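One way to picture a single audit check is a relative-gap comparison between a synthetic metric and its published benchmark. The sketch below is an assumption about the general shape of such a check, not the audit that ships in the report bundle: `AuditRow`, the metric names, the tolerance, and all the numbers are placeholders chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRow:
    """One metric compared against a benchmark (hypothetical structure)."""
    metric: str
    synthetic: float
    benchmark: float
    tolerance: float  # allowed relative deviation, e.g. 0.05 means 5%

    @property
    def relative_gap(self):
        # Absolute difference as a share of the benchmark value.
        return abs(self.synthetic - self.benchmark) / abs(self.benchmark)

    @property
    def passed(self):
        return self.relative_gap <= self.tolerance


# Placeholder values only; these are not real audit results.
rows = [
    AuditRow("placeholder_metric_a", synthetic=102.0, benchmark=100.0, tolerance=0.05),
    AuditRow("placeholder_metric_b", synthetic=88.0, benchmark=100.0, tolerance=0.05),
]

drifted = [r.metric for r in rows if not r.passed]

for r in rows:
    status = "ok" if r.passed else "DRIFT"
    print(f"{r.metric}: gap {r.relative_gap:.1%} ({status})")
```

Here the first metric sits 2% from its benchmark and passes, while the second sits 12% away and is flagged. A real audit would also carry the citation for each benchmark alongside the numbers.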
Releases, like models
Each version is a better model of reality.
Like language models, our datasets improve with every release. Newer versions model more of the messy truth; older ones are cheaper and lighter. Pick the fidelity your use case needs.
Persistence, seasonality, and the social-determinants MLR fix.
Benchmarking, model training, and anything sensitive to member-level persistence or seasonality.
The first calibrated release. Solid control totals, simpler dynamics.
Schema validation, pipeline development, and cost-sensitive prototyping.
Real-claims pattern learning, SNP cohorts, provider continuity.
Coming next — the highest-fidelity release.
Who it's for
Built for the people blocked by data access.
Benchmark without a license
Stand up benchmarks, test methodologies, and prep client work without a six-figure data deal you can't expense to a prospect.
Build before the data deal closes
Develop and demo on realistic data with zero PHI exposure. Ship the product, then swap in the customer's real data later.
Train, validate, backtest
A labeled, controllable, reproducible dataset for risk models and pipelines — with the ground truth you never get from real data.
Our vision
Healthcare's transition to value, AI, and better outcomes is bottlenecked by who can get data. We're a team from actuarial science, data science, the payer–provider world, and AI — building the synthetic data layer that puts that fuel in everyone's hands, across every line of business.
See if it holds up to your scrutiny.
Download the 1,000-member sample — full schema, full report bundle, no signup, no card. Query it, check a member journey, judge the fidelity yourself.
Starter panels from $1,000 per domain · full bundle $3,000