Peer-reviewed science.
Not prompt engineering.

Every synthetic mind is born from validated psychometric models, national census microdata, and licensed industry datasets. The language model enters only after the mind already exists.


Architecture

Eleven-Layer Cognitive Genome

Eleven independent science layers — from census foundation through decision routing, behavioral style, and persistent memory. Each layer adds a dimension that prompt engineering cannot replicate.

L1
Census Demographics
Official national microdata — gender, age, SES, education, region, urbanization, ethnicity. 100K+ cells per country. The mathematical foundation.
Foundation
L2
Psychometric Architecture
OCEAN Big Five with full covariance matrices and K-Means++ archetype clustering. Not a label — a mathematical distribution with Monte Carlo sampling.
Calibrated
L3
Cultural Genome
Intra-country diversity encoded as psychometric covariances. A synthetic Nikkei-Brazilian and a synthetic Nordestino have fundamentally different decision architectures.
Differentiator
L4
Market Behavior
175+ industry attributes across financial, health, food, media, retail, transport, and values domains. Sourced from licensed studies — not LLM training data.
175+ attributes
L5
Household & Life Context
Family structure, household size, marital status, children, income. Decisions don't happen in isolation — they happen at home, shaped by real life constraints.
Context
L6
Cognitive Decision Router
Centaur dual-process engine: System 1 (fast intuition) vs System 2 (deliberate analysis). Routed per individual by OCEAN personality, stakes, and context. Kahneman, operationalized.
Decision science
L7
Behavioral Style
DISC probability distribution computed from OCEAN, modulated by role, vertical, company culture, and governance. The same person behaves differently in different rooms.
Context-dependent
L8
Migration & Lifestyle
Language at home, acculturation level (Berry Framework), internal/international mobility. Populations in movement — a São Paulo professional in Miami gradually shifts lifestyle, consumption, and decision patterns. We model that drift mathematically.
Acculturation
L9
Proprietary Enrichment
Your CRM data, custom clusters, proprietary market attributes — injected as a versioned layer with rollback capability. Never contaminates the shared genome.
Client-owned
L10
Tribes & Scenarios
Micro-populations from what-if splits. "If 30% of your clients adopted GLP-1 and stopped drinking." "If your competitor launched this." A/B testing at population scale.
What-if engine
L11
Memory & Evolution
Persistent episodic memory + collective segment anchor. Minds evolve through simulations but never hallucinate a life inconsistent with their genome. The safety brake.
Evolution
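To make the layer stack concrete, here is a minimal sketch of how layers L1–L2 might combine in practice: drawing OCEAN trait profiles for one census cell via Monte Carlo sampling from a multivariate normal with a full covariance matrix. All numbers below are invented placeholders, not actual calibration data.

```python
import numpy as np

# Hypothetical illustration: draw OCEAN trait vectors for one census cell.
# The mean vector and covariance matrix are placeholder values only.
TRAITS = ["O", "C", "E", "A", "N"]

mean = np.array([0.55, 0.48, 0.52, 0.60, 0.45])   # cell-level trait means
cov = np.array([                                   # full 5x5 covariance
    [0.040,  0.006,  0.010,  0.004, -0.008],
    [0.006,  0.035,  0.005,  0.008, -0.010],
    [0.010,  0.005,  0.045,  0.012, -0.006],
    [0.004,  0.008,  0.012,  0.030, -0.004],
    [-0.008, -0.010, -0.006, -0.004, 0.050],
])

rng = np.random.default_rng(11)
profiles = rng.multivariate_normal(mean, cov, size=10_000)  # Monte Carlo draw
profiles = np.clip(profiles, 0.0, 1.0)  # keep traits on a 0..1 scale

print(profiles.mean(axis=0).round(2))
```

Because the off-diagonal terms are nonzero, traits co-vary rather than being sampled independently, which is what separates a distribution-based population from a list of labels.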
11:11

11:11 is an alignment — the moment when census data, personality science, cultural encoding, decision theory, behavioral modeling, migration patterns, proprietary intelligence, scenario engineering, and persistent memory all converge into something that could not have existed before this exact moment in technology.

The blind spot

LLMs don't understand culture. We encode it.

Prompt-engineered personas flatten 200 million people into a stereotype. Describe a “35-year-old Brazilian woman” to a language model and it produces an American default with a Brazilian label.

🌍
Same country, different worlds
Brazil has Afro-Brazilian communities in Bahia, Japanese-descended families in São Paulo, German-heritage towns in the South, and evangelical movements reshaping urban peripheries. A single “Brazilian persona” captures none of this.
🧬
Culture shapes decisions
The same product generates opposite reactions across cultural groups within the same country. Our cultural genome layer captures these differences mathematically — through validated psychometric covariances, not stereotypes.
📐
Not stereotypes — statistics
Cultural groups are derived from real demographic data and normative studies. Trait distributions are co-varied with ethnicity, region, and religion. The result is precision, not generalization.

“A synthetic Nikkei-Brazilian and a synthetic Nordestino don't just have different names. They have fundamentally different decision architectures.”

The computational-expressive boundary
Above — Linguistic expression
The language model is responsible only for expression. It gives voice to a mind that already has opinions, preferences, and a cognitive architecture.
Below — Mathematical substrate
The behavioral and attitudinal content is determined by the scientific framework before the language model is involved. Demographics, psychometrics, culture, market behavior — all computed. Zero AI.

This separation — which we term the computational-expressive boundary — is what produces culturally specific, psychometrically coherent outputs that don't simply reflect the statistical center of LLM training data.
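The boundary can be sketched as a simple pattern: the behavioral content lives in an immutable data structure computed upstream, and the expression step may only phrase it, never change it. Everything here, including the `Mind` fields and the prompt wording, is a hypothetical illustration of the pattern, not the production system.

```python
from dataclasses import dataclass

# Hypothetical sketch of the computational-expressive boundary.
# Everything in `Mind` is computed below the boundary (census, psychometrics,
# market behavior) with no language model involved; it is read-only here.

@dataclass(frozen=True)          # frozen: the expression layer cannot mutate the mind
class Mind:
    demographics: dict           # e.g. {"age": 41, "region": "Nordeste"}
    ocean: dict                  # trait scores from the psychometric layer
    stance: str                  # precomputed behavioral outcome, e.g. "reject"

def express(mind: Mind, question: str) -> str:
    # Above the boundary: only the wording is generated (stubbed here as the
    # prompt itself); the content of the answer is already fixed in `mind`.
    return (
        f"You are {mind.demographics}. Your personality is {mind.ocean}. "
        f"Your position on '{question}' is '{mind.stance}'. "
        "Express that position in your own voice, without changing it."
    )

m = Mind({"age": 41, "region": "Nordeste"}, {"O": 0.4, "N": 0.7}, "reject")
print(express(m, "new subscription pricing"))
```

The `frozen=True` flag is the design choice doing the work: the linguistic layer receives opinions, it does not form them.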

Scientific foundation

Standing on peer-reviewed research.

Three independent research programs — from Stanford, Nature, and applied industry science — validate the core principles our architecture operationalizes.

85%
accuracy
Park et al., 2024
Generative agents grounded in rich individual profiles replicate real human responses with 85% normalized accuracy — comparable to human test-retest reliability.
We operationalize this at scale across 30 countries — purely from census-grounded synthetic populations.
10.6M
human choices
Centaur — Foundation Model of Human Cognition, 2025
Trained on 10.6 million human choices from 160 psychological experiments. Outperforms all domain-specific cognitive models. Acknowledged limitation: strong WEIRD population bias.
Our Synthetic Population Matrix provides exactly the culturally calibrated substrate that Centaur cannot.
90→50%
without grounding
Maier et al., 2025 — 57 consumer surveys, 9,300 respondents
LLMs reproduce human purchase intent with 90% of maximum achievable correlation — but only with rich demographic conditioning. Without it, accuracy collapses to 50%.
Our genome provides that demographic grounding at census-level precision. No guesswork.
Positioning

What we are not.

Not Agent-Based Modeling
Traditional ABM uses behavioral rules and calibrates at the aggregate level through macro-output matching. Our synthetic minds are defined by empirically calibrated psychometric profiles — enabling qualitative outputs (opinions, reasoning, stated preferences) alongside quantitative distributions. Calibration operates at the individual level against documented population characteristics.
Not LLM-as-Simulator
The academic literature (Argyle et al., Santurkar et al., Törnberg et al.) establishes both the potential and limitations of using language models as behavioral simulators: training-data bias, minority underrepresentation, within-agent inconsistency. These are precisely the problems our architecture addresses — by separating behavioral content from linguistic expression.
Validation

Accuracy is not claimed. It is earned.

Every synthetic population passes a pre-deployment validation protocol. Persistent identities earn trust through continuous performance measurement.

Control Test
Synthetic vs. real-world behavioral benchmarks
Consistency
Same specification → statistically equivalent outputs
Discrimination
Different segments → measurably different responses
Calibration
Predicted confidence matches observed hit rates
Sensitivity
Trait variations produce directionally consistent behavioral shifts
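As a minimal sketch of what the Consistency check above could look like in code: run the same population specification twice and verify the two output distributions are statistically equivalent. The `simulate` function and the tolerance are stand-ins, not the real engine or acceptance criterion.

```python
import random
import statistics

# Hypothetical Consistency check: same specification, two independent runs,
# statistically equivalent outputs. `simulate` is a placeholder engine that
# emits a per-individual score (e.g. purchase intent) for the population.

def simulate(spec_seed: int, run: int, n: int = 5000) -> list[float]:
    rng = random.Random(spec_seed * 1000 + run)   # same spec, independent run
    return [rng.gauss(0.62, 0.1) for _ in range(n)]

def equivalent(a: list[float], b: list[float], tol: float = 0.02) -> bool:
    # Simplest possible equivalence criterion: population means within `tol`.
    return abs(statistics.fmean(a) - statistics.fmean(b)) < tol

run1 = simulate(spec_seed=42, run=1)
run2 = simulate(spec_seed=42, run=2)
print(equivalent(run1, run2))
```

A production protocol would compare full distributions (e.g. with a two-sample test) rather than means alone; the structure, re-run and compare, is the point.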
Personal Similarity Index (PSI)
For persistent synthetic identities, ongoing validity is tracked through PSI. Each time a synthetic mind participates in a simulation that is subsequently validated against real-world outcome data, the PSI is updated. Identities that consistently produce accurate predictions gain PSI. Those that diverge are reviewed for recalibration. Below a minimum threshold, they are removed from active use.
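The PSI lifecycle described above can be sketched as a simple update loop. The exponential-moving-average rule, the learning rate, and the floor value below are illustrative assumptions; the actual formula and thresholds are not specified here.

```python
# Hypothetical sketch of a PSI update rule. Each validated simulation nudges
# the index toward 1 on an accurate prediction and toward 0 on a miss;
# identities that fall below the floor are retired from active use.

PSI_FLOOR = 0.4     # illustrative removal threshold
ALPHA = 0.1         # weight given to the newest validation outcome

def update_psi(psi: float, prediction_correct: bool) -> float:
    target = 1.0 if prediction_correct else 0.0
    return (1 - ALPHA) * psi + ALPHA * target   # exponential moving average

def is_active(psi: float) -> bool:
    return psi >= PSI_FLOOR

psi = 0.5
for hit in [True, True, False, True]:   # validated simulation outcomes
    psi = update_psi(psi, hit)

print(round(psi, 3), is_active(psi))
```

The moving-average form captures the two behaviors the text describes: consistent accuracy raises PSI over time, and sustained divergence drags it below the removal threshold.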
Validation Study
Peru — Datum Internacional
Synthetic Peruvians produced responses that researchers with deep in-country expertise recognized as culturally authentic — including patterns of household decision-making, family structure representation, and emotional register that had been systematically misrepresented in LLM-only approaches.
Partnership with Datum Internacional, specialist Latin American intelligence firm. Full documentation in progress.
Independent Observation
Brazil — Real-Time Crisis Signal
While a major global sportswear brand faced a real-time public crisis in Brazil, our synthetic population — with zero exposure to the real-world reaction — independently surfaced the same negative signals: emotional rejection, brand dissonance, and cultural misalignment that the simulations flagged as high-risk across multiple consumer segments.
Independent observation — the synthetic signal and the real-world crisis ran in parallel, without one informing the other. Formal documentation in progress.
Scope & Commitments
We are transparent about what this framework is and is not. Ethnographic work, unarticulated need discovery, and genuinely novel cultural phenomena remain domains where human expertise is irreplaceable. Our framework is designed as a complement, not a substitute. No real individual's data is used without appropriate consent. Census data is public. Published science is cited. Outputs are clearly identified as synthetic — never misrepresented as human-generated data.

“The same precision used in climate models and epidemiological forecasting — applied to understanding how people think, choose, and act.”

Thinkers.pro Science Team

See the science in action.

Explore our live populations or request early access to build with genome-grade synthetic minds.

Explore Populations
Request Early Access
The Science — Peer-Reviewed Foundations for Synthetic Minds | Thinkers.pro