The product
One synthetic population. Every domain renders from it.
We don't sell loose tables of fake rows. We generate synthetic patient archetypes, then render their entire data trail — eligibility, claims, Rx, revenue, and more — across every line of business, all from the same coherent people. Medicare Advantage is available now; everything else is the same engine pointed at a new render target.
One engine, every render target
The archetype is the source. The data is what it renders.
A synthetic member isn't a row — it's a clinical life. Once we've generated the person, we can emit whatever data domain you need from that single coherent source, all joining on member_id. Four domains ship today; the rest render from the same people as we expand.
Eligibility & enrollment
Available now
Member-month enrollment, demographics, plan/program, and benefit status: the spine every other domain links to.
Medical claims
Available now
Line-level institutional + professional claims: diagnoses, procedures, settings, allowed/paid, and realistic adjustment chains.
Pharmacy
Available now
NCPDP-grade drug fills with refill chains, benefit phases, formulary tiers, and net-of-rebate economics.
Revenue & payment
Available now
Payer-side revenue the way a plan receives it: capitation, risk scores, and the factors behind every dollar.
Encounters
Expanding
Encounter-level utilization independent of billing: the visit-and-service record health systems and value-based programs run on.
Labs & results
Expanding
Ordered tests with realistic result values trended to each member's conditions: A1c that tracks the diabetic, eGFR that tracks the CKD.
Quality measures
Expanding
Measure-ready numerators, denominators, and gaps (HEDIS-style / Stars) rendered from each member's actual care.
Lines of business
Medicare Advantage today · the rest expanding
The product below (the files, sizes, releases, and pricing) is the Medicare Advantage line, available to download today. Every other line of business renders from the same archetype engine.
Medicare Advantage — available now
Available now
One population, rendered four ways.
Today the Medicare Advantage line renders four linked domains — eligibility, medical, Rx, and revenue. Each file is a faithful representation of how a plan actually receives that data stream. They are not independent samples — they are the same synthetic members seen from four angles, and they all join on member_id.
Eligibility
one row per member-month
The enrollment spine. Demographics, plan/contract, dual & LIS & ESRD status, and the HCC condition flags that drive everything downstream.
- Member-month grain across 36 months
- Realistic age-ins, disenrollment, mortality
- Dual / LIS / ESRD flags that flip mid-year
- Links to every other file via member_id
Revenue (MMR)
one row per member-month
CMS payment the way a plan actually receives it. Part C & Part D capitation, V24/V28 blended risk scores, and the demographic + dual factors behind each dollar.
- V24, V28, and blended risk scores
- Part C + Part D capitation lines
- Coding-intensity + normalization applied
- Reconciles to eligibility member-months
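A minimal sketch of what a blended risk score means arithmetically, assuming a simple weighted average of the two model scores. The column names `v24_score` and `v28_score` and the weight used here are hypothetical illustrations, not the file's documented schema or any specific payment-year policy, and the sketch leaves out the normalization and coding-intensity adjustments listed above.

```python
def blended_risk_score(v24: float, v28: float, v28_weight: float) -> float:
    """Weighted average of two model scores; v28_weight is the share given to V28."""
    if not 0.0 <= v28_weight <= 1.0:
        raise ValueError("v28_weight must be between 0 and 1")
    return (1.0 - v28_weight) * v24 + v28_weight * v28


# Hypothetical member-month row: scores, column names, and weight are made up.
row = {"member_id": "M0001", "v24_score": 1.20, "v28_score": 0.90}
blended = blended_risk_score(row["v24_score"], row["v28_score"], v28_weight=0.67)
print(round(blended, 3))
```

Keeping the weight as a parameter lets the same function cover any phase-in year: a weight of 0 returns the V24 score unchanged and a weight of 1 returns the V28 score.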
Medical claims
one row per claim line
Line-level institutional + professional claims. DRGs, CPT/HCPCS, revenue codes, place of service, 25 diagnosis slots, allowed/paid, and realistic adjustment chains.
- Institutional + professional, multi-line claims
- Up to 25 ICD-10 diagnoses per claim
- Acute episode bundles (CHF, AMI, sepsis…)
- Adjustments, denials, reversals, paid-date lag
Rx claims (Part D)
one row per fill line
NCPDP-grade pharmacy claims. NDC-level fills with refill chains, benefit phases (deductible → coverage gap → catastrophic), formulary tiers, and net-of-rebate economics.
- Refill chains with realistic adherence
- Benefit phases + IRA Part D cap effects
- Formulary tiers, DAW codes, pharmacy NPIs
- Gross, member, plan, and net-of-rebate paid
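One common way to check adherence in a refill chain is proportion of days covered (PDC). The sketch below is an assumption about how a buyer might measure it, not a description of how the generator models adherence; the `(fill_date, days_supply)` tuple layout is hypothetical, and overlapping fills are simply de-duplicated rather than shifted forward.

```python
from datetime import date, timedelta


def proportion_of_days_covered(fills, period_start, period_end):
    """Share of days in [period_start, period_end] with drug on hand.

    fills: iterable of (fill_date, days_supply) tuples (hypothetical layout).
    """
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if period_start <= day <= period_end:
                covered.add(day)  # a set ignores days covered twice
    period_days = (period_end - period_start).days + 1
    return len(covered) / period_days


# Made-up refill chain: three 30-day fills with small gaps between them.
fills = [
    (date(2025, 1, 1), 30),
    (date(2025, 2, 5), 30),
    (date(2025, 3, 10), 30),
]
pdc = proportion_of_days_covered(fills, date(2025, 1, 1), date(2025, 3, 31))
print(f"{pdc:.3f}")
```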
Referential integrity is guaranteed: every claim line, every fill, and every MMR row points back to a member that exists in eligibility for that month. You can join the full picture — diagnosis to spend to risk score to payment — without a single orphaned key.
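A buyer can verify that guarantee independently. The sketch below is a minimal orphan check in plain Python; `member_id` is the join key named above, while the `month` column and the row layout are assumptions made for illustration (for claims and fills, the month would have to be derived from a service or fill date).

```python
def orphaned_keys(eligibility, child_rows):
    """Return sorted (member_id, month) keys in child_rows missing from eligibility."""
    enrolled = {(r["member_id"], r["month"]) for r in eligibility}
    referenced = {(r["member_id"], r["month"]) for r in child_rows}
    return sorted(referenced - enrolled)


# Made-up rows; "month" is a hypothetical column name.
eligibility = [
    {"member_id": "M0001", "month": "2025-01"},
    {"member_id": "M0001", "month": "2025-02"},
    {"member_id": "M0002", "month": "2025-01"},
]
claims = [
    {"member_id": "M0001", "month": "2025-01", "paid": 120.0},
    {"member_id": "M0002", "month": "2025-01", "paid": 75.5},
]

orphans = orphaned_keys(eligibility, claims)
print(orphans)  # an empty list means every claim row has an eligibility row
```

The same function can be run against the Rx and MMR rows; a non-empty result lists exactly which member-months are missing.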
Packaging
Buy one file, or take the bundle and save.
Pricing is à la carte. Need only revenue to backtest a risk model, or only Rx for a Part D study? Buy that one file. Need the whole joined population? The full bundle of all four costs less than the sum of its parts.
Starter panel
100,000 members · per release
À la carte
All four linked files
Production panel
500,000 members · per release
À la carte
All four linked files
Prices are per release, in USD. The same à-la-carte and bundle structure applies on every available version — you choose the fidelity separately from the files and the size. See full pricing →
Size tiers
Prototype on a starter panel. Ship on a production one.
Both tiers are the same generator, the same schema, and the same fidelity — they differ only in how many synthetic members you get. Pick the size by the job, not the quality.
Starter panel
100,000 members
A 100,000-member population: large enough to be statistically meaningful, small enough to download, query on a laptop, and iterate on fast. Built for prototyping: schema validation, pipeline development, demos, methodology checks, and cost-sensitive experiments where you need realistic structure more than population scale.
- Develop and unit-test ingestion + transforms
- Demo a product on realistic, PHI-free data
- Sanity-check a methodology before you scale it
- From $1,000 per file
Production panel
500,000 members
A 500,000-member population: enough density to model rarer conditions, stabilize HCC prevalence, and trust tail behavior in the cost curve. Built for production: training and validating risk-adjustment models, benchmarking a real book of business, and any analysis where small-cell noise would otherwise dominate.
- Train and validate risk-adjustment models at scale
- Benchmark a population with stable rates and tails
- Study rare cohorts without tiny-sample noise
- From $2,500 per file
Versions = model family
Choose a release the way you'd choose a model.
Each version is a distinct model of reality. Newer releases capture more of the messy truth and carry a higher fidelity score; older ones are lighter and cheaper and still hit national control totals. One more release is on the roadmap. Pick the fidelity your use case actually needs.
The releases below (v1, v2) are the Medicare Advantage line — more lines of business will follow on the same versioned engine.
v2 · Persistence, seasonality, and the social-determinants MLR fix.
Best for: Benchmarking, model training, and anything sensitive to member-level persistence or seasonality.
Changelog · 2026-06
- 1,000 AI-generated patient archetypes (up from 50)
- Member-level utilization persistence (sticky high-utilizers)
- Monthly seasonality on ER / IP / surgery / office visits
- Dual MLR now correctly exceeds non-dual (was inverted in v1)
- Explicit well-cohort carve-out for a realistic 5/50 spend curve
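Reading "5/50" in the changelog above as "roughly 5% of members account for roughly 50% of spend" (our interpretation of the shorthand), the curve can be checked with a short concentration calculation. This is a sketch under that assumption; the `spend_by_member` mapping and its values are made up.

```python
def top_share(spend_by_member, top_fraction=0.05):
    """Share of total spend attributable to the highest-cost top_fraction of members."""
    costs = sorted(spend_by_member.values(), reverse=True)
    total = sum(costs)
    if total == 0:
        return 0.0
    n_top = max(1, round(len(costs) * top_fraction))  # at least one member
    return sum(costs[:n_top]) / total


# Made-up population: 1 high-cost member and 19 low-cost members.
spend_by_member = {"M0000": 1900.0}
spend_by_member.update({f"M{i:04d}": 100.0 for i in range(1, 20)})

share = top_share(spend_by_member)
print(f"top 5% of members drive {share:.0%} of spend")
```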
v1 · The first calibrated release. Solid control totals, simpler dynamics.
Best for: Schema validation, pipeline development, and cost-sensitive prototyping.
Changelog · 2026-05
- Four linked parquet files, full schema
- Calibrated to MA national control totals (risk, MLR, PMPM, util/1000)
- HCC-driven member journeys + acute episode bundles
- Automated credibility audit shipped with every batch
v3 · Real-claims pattern learning, SNP cohorts, provider continuity.
Best for: Coming next — the highest-fidelity release.
Planned
- Pattern-learning pipeline trained on licensed real claims
- SNP cohort modeling (D-SNP / C-SNP / I-SNP)
- Provider continuity + referral networks
- Readmission + post-acute pathways to clinical targets
Fidelity scores are our composite credibility metric — the same eval, run release over release, so the trend is honest. v3 is calibrated to published benchmarks today; the real-claims pattern-learning that earns its score is the next thing we're building. How we measure fidelity →
Every dataset is auditable
Each purchase ships with its own credibility audit.
You should never have to take our word for it. Every dataset you buy — any file, any size, any release — arrives with an actuarial summary and a credibility audit generated against that exact batch. The metrics below are from the current Medicare Advantage release.
The report bundle covers the metrics an actuary or data scientist would check first: PMPM, medical loss ratio, average risk score, utilization per 1,000, HCC prevalence, and cost concentration — each compared to a published CMS, MedPAC, or public-insurer benchmark, with citations. If a number drifts outside its expected band, the audit flags it rather than hiding it.
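The headline metrics are simple ratios, so they are easy to recompute from the files. The sketch below uses the standard definitions (dollars per member-month, claims over revenue, annualized events per 1,000 members) plus a band check; every input value and every band here is a made-up placeholder, not a published benchmark and not the audit's actual thresholds.

```python
def pmpm(total_dollars, member_months):
    """Per-member-per-month dollars."""
    return total_dollars / member_months


def medical_loss_ratio(claims_paid, revenue):
    """Claims paid as a share of revenue."""
    return claims_paid / revenue


def per_thousand(events, member_months):
    """Annualized events per 1,000 members."""
    return events * 12_000 / member_months


def audit(metrics, bands):
    """Map each metric name to (value, within_band)."""
    return {
        name: (value, bands[name][0] <= value <= bands[name][1])
        for name, value in metrics.items()
    }


# Hypothetical batch totals and hypothetical bands, for illustration only.
member_months = 12_000
metrics = {
    "pmpm": pmpm(10_200_000, member_months),
    "mlr": medical_loss_ratio(10_200_000, 12_000_000),
    "admits_per_1000": per_thousand(250, member_months),
}
bands = {"pmpm": (700, 1000), "mlr": (0.80, 0.90), "admits_per_1000": (150, 220)}

report = audit(metrics, bands)
for name, (value, ok) in report.items():
    print(f"{name}: {value:.2f} {'ok' if ok else 'FLAGGED'}")
```

In this made-up batch the admission rate falls outside its band, so it is the one metric reported as flagged.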
Figures are illustrative. Each dataset's real audit — computed on the batch you purchase — ships in its report bundle alongside the data.
Judge the fidelity yourself.
Start with the free 1,000-member sample — full schema, full report bundle, no signup. Then browse what's ready to download and pick your files, size, and release.