Peer-reviewed science.
Not prompt engineering.

Every synthetic mind is born from validated psychometric models, national census microdata, and licensed industry datasets. The language model enters only after the mind already exists.


Architecture

Eleven-Layer Cognitive Genome

Eleven independent science layers — from census foundation through decision routing, behavioral style, and persistent memory. Each layer adds a dimension that prompt engineering cannot replicate.

L1
Census Demographics
Official national microdata — gender, age, SES, education, region, urbanization, ethnicity. 100K+ cells per country. The mathematical foundation.
Foundation
L2
Psychometric Architecture
OCEAN Big Five with full covariance matrices and K-Means++ archetype clustering. Not a label — a mathematical distribution with Monte Carlo sampling.
Calibrated
L3
Cultural Genome
Intra-country diversity encoded as psychometric covariances. A synthetic Nikkei-Brazilian and a synthetic Nordestino have fundamentally different decision architectures.
Differentiator
L4
Market Behavior
175+ industry attributes across financial, health, food, media, retail, transport, and values domains. Sourced from licensed studies — not LLM training data.
175+ attributes
L5
Household & Life Context
Family structure, household size, marital status, children, income. Decisions don't happen in isolation — they happen at home, shaped by real life constraints.
Context
L6
Cognitive Decision Router
Centaur dual-process engine: System 1 (fast intuition) vs System 2 (deliberate analysis). Routed per individual by OCEAN personality, stakes, and context. Kahneman, operationalized.
Decision science
L7
Behavioral Style
DISC probability distribution computed from OCEAN, modulated by role, vertical, company culture, and governance. The same person behaves differently in different rooms.
Context-dependent
L8
Migration & Lifestyle
Language at home, acculturation level (Berry Framework), internal/international mobility. Populations in movement — a São Paulo professional in Miami gradually shifts lifestyle, consumption, and decision patterns. We model that drift mathematically.
Acculturation
L9
Proprietary Enrichment
Your CRM data, custom clusters, proprietary market attributes — injected as a versioned layer with rollback capability. Never contaminates the shared genome.
Client-owned
L10
Tribes & Scenarios
Micro-populations from what-if splits. "If 30% of your clients adopted GLP-1 and stopped drinking." "If your competitor launched this." A/B testing at population scale.
What-if engine
L11
Memory & Evolution
Persistent episodic memory + collective segment anchor. Minds evolve through simulations but never hallucinate a life inconsistent with their genome. The safety brake.
Evolution
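To make the layer stack concrete, here is a minimal sketch of how layers L1–L2 might combine in practice: drawing OCEAN trait profiles for one census cell via Monte Carlo sampling from a multivariate normal with a full covariance matrix. All numbers below are invented placeholders, not actual calibration data.

```python
import numpy as np

# Hypothetical illustration: draw OCEAN trait vectors for one census cell.
# The mean vector and covariance matrix are placeholder values only.
TRAITS = ["O", "C", "E", "A", "N"]

mean = np.array([0.55, 0.48, 0.52, 0.60, 0.45])   # cell-level trait means
cov = np.array([                                   # full 5x5 covariance
    [0.040,  0.006,  0.010,  0.004, -0.008],
    [0.006,  0.035,  0.005,  0.008, -0.010],
    [0.010,  0.005,  0.045,  0.012, -0.006],
    [0.004,  0.008,  0.012,  0.030, -0.004],
    [-0.008, -0.010, -0.006, -0.004, 0.050],
])

rng = np.random.default_rng(11)
profiles = rng.multivariate_normal(mean, cov, size=10_000)  # Monte Carlo draw
profiles = np.clip(profiles, 0.0, 1.0)  # keep traits on a 0..1 scale

print(profiles.mean(axis=0).round(2))
```

Because the off-diagonal terms are nonzero, traits co-vary rather than being sampled independently, which is what separates a distribution-based population from a list of labels.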
11:11

11:11 is an alignment — the moment when census data, personality science, cultural encoding, decision theory, behavioral modeling, migration patterns, proprietary intelligence, scenario engineering, and persistent memory all converge into something that could not have existed before this exact moment in technology.

The blind spot

LLMs don't understand culture. We encode it.

Prompt-engineered personas flatten 200 million people into a stereotype. Describe a “35-year-old Brazilian woman” to a language model and it produces an American default with a Brazilian label.

🌍
Same country, different worlds
Brazil has Afro-Brazilian communities in Bahia, Japanese-descended families in São Paulo, German-heritage towns in the South, and evangelical movements reshaping urban peripheries. A single “Brazilian persona” captures none of this.
🧬
Culture shapes decisions
The same product generates opposite reactions across cultural groups within the same country. Our cultural genome layer captures these differences mathematically — through validated psychometric covariances, not stereotypes.
📐
Not stereotypes — statistics
Cultural groups are derived from real demographic data and normative studies. Trait distributions are co-varied with ethnicity, region, and religion. The result is precision, not generalization.

“A synthetic Nikkei-Brazilian and a synthetic Nordestino don't just have different names. They have fundamentally different decision architectures.”

The computational-expressive boundary
Above — Linguistic expression
The language model is responsible only for expression. It gives voice to a mind that already has opinions, preferences, and a cognitive architecture.
Below — Mathematical substrate
The behavioral and attitudinal content is determined by the scientific framework before the language model is involved. Demographics, psychometrics, culture, market behavior — all computed. Zero AI.

This separation — which we term the computational-expressive boundary — is what produces culturally specific, psychometrically coherent outputs that don't simply reflect the statistical center of LLM training data.
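The boundary can be sketched as a simple pattern: the behavioral content lives in an immutable data structure computed upstream, and the expression step may only phrase it, never change it. Everything here, including the `Mind` fields and the prompt wording, is a hypothetical illustration of the pattern, not the production system.

```python
from dataclasses import dataclass

# Hypothetical sketch of the computational-expressive boundary.
# Everything in `Mind` is computed below the boundary (census, psychometrics,
# market behavior) with no language model involved; it is read-only here.

@dataclass(frozen=True)          # frozen: the expression layer cannot mutate the mind
class Mind:
    demographics: dict           # e.g. {"age": 41, "region": "Nordeste"}
    ocean: dict                  # trait scores from the psychometric layer
    stance: str                  # precomputed behavioral outcome, e.g. "reject"

def express(mind: Mind, question: str) -> str:
    # Above the boundary: only the wording is generated (stubbed here as the
    # prompt itself); the content of the answer is already fixed in `mind`.
    return (
        f"You are {mind.demographics}. Your personality is {mind.ocean}. "
        f"Your position on '{question}' is '{mind.stance}'. "
        "Express that position in your own voice, without changing it."
    )

m = Mind({"age": 41, "region": "Nordeste"}, {"O": 0.4, "N": 0.7}, "reject")
print(express(m, "new subscription pricing"))
```

The `frozen=True` flag is the design choice doing the work: the linguistic layer receives opinions, it does not form them.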

Scientific foundation

Standing on peer-reviewed research.

Three independent research programs — from Stanford, Nature, and applied industry science — validate the core principles our architecture operationalizes.

85%
accuracy
Park et al., 2024
Generative agents grounded in rich individual profiles replicate real human responses with 85% normalized accuracy — comparable to human test-retest reliability.
We operationalize this at scale across 30 countries — purely from census-grounded synthetic populations.
10.6M
human choices
Centaur — Foundation Model of Human Cognition, 2025
Trained on 10.6 million human choices from 160 psychological experiments. Outperforms all domain-specific cognitive models. Acknowledged limitation: strong WEIRD population bias.
Our Synthetic Population Matrix provides exactly the culturally calibrated substrate that Centaur cannot.
90→50%
without grounding
Maier et al., 2025 — 57 consumer surveys, 9,300 respondents
LLMs reproduce human purchase intent with 90% of maximum achievable correlation — but only with rich demographic conditioning. Without it, accuracy collapses to 50%.
Our genome provides that demographic grounding at census-level precision. No guesswork.
Positioning

What we are not.

Not Agent-Based Modeling
Traditional ABM uses behavioral rules and calibrates at the aggregate level through macro-output matching. Our synthetic minds are defined by empirically calibrated psychometric profiles — enabling qualitative outputs (opinions, reasoning, stated preferences) alongside quantitative distributions. Calibration operates at the individual level against documented population characteristics.
Not LLM-as-Simulator
The academic literature (Argyle et al., Santurkar et al., Törnberg et al.) establishes both the potential and limitations of using language models as behavioral simulators: training-data bias, minority underrepresentation, within-agent inconsistency. These are precisely the problems our architecture addresses — by separating behavioral content from linguistic expression.
Validation

Accuracy is not claimed. It is earned.

Every synthetic population passes a pre-deployment validation protocol. Persistent identities earn trust through continuous performance measurement.

Control Test
Synthetic vs. real-world behavioral benchmarks
Consistency
Same specification → statistically equivalent outputs
Discrimination
Different segments → measurably different responses
Calibration
Predicted confidence matches observed hit rates
Sensitivity
Trait variations produce directionally consistent behavioral shifts
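As a minimal sketch of what the Consistency check above could look like in code: run the same population specification twice and verify the two output distributions are statistically equivalent. The `simulate` function and the tolerance are stand-ins, not the real engine or acceptance criterion.

```python
import random
import statistics

# Hypothetical Consistency check: same specification, two independent runs,
# statistically equivalent outputs. `simulate` is a placeholder engine that
# emits a per-individual score (e.g. purchase intent) for the population.

def simulate(spec_seed: int, run: int, n: int = 5000) -> list[float]:
    rng = random.Random(spec_seed * 1000 + run)   # same spec, independent run
    return [rng.gauss(0.62, 0.1) for _ in range(n)]

def equivalent(a: list[float], b: list[float], tol: float = 0.02) -> bool:
    # Simplest possible equivalence criterion: population means within `tol`.
    return abs(statistics.fmean(a) - statistics.fmean(b)) < tol

run1 = simulate(spec_seed=42, run=1)
run2 = simulate(spec_seed=42, run=2)
print(equivalent(run1, run2))
```

A production protocol would compare full distributions (e.g. with a two-sample test) rather than means alone; the structure, re-run and compare, is the point.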
Personal Similarity Index (PSI)
For persistent synthetic identities, ongoing validity is tracked through PSI. Each time a synthetic mind participates in a simulation that is subsequently validated against real-world outcome data, the PSI is updated. Identities that consistently produce accurate predictions gain PSI. Those that diverge are reviewed for recalibration. Below a minimum threshold, they are removed from active use.
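The PSI lifecycle described above can be sketched as a simple update loop. The exponential-moving-average rule, the learning rate, and the floor value below are illustrative assumptions; the actual formula and thresholds are not specified here.

```python
# Hypothetical sketch of a PSI update rule. Each validated simulation nudges
# the index toward 1 on an accurate prediction and toward 0 on a miss;
# identities that fall below the floor are retired from active use.

PSI_FLOOR = 0.4     # illustrative removal threshold
ALPHA = 0.1         # weight given to the newest validation outcome

def update_psi(psi: float, prediction_correct: bool) -> float:
    target = 1.0 if prediction_correct else 0.0
    return (1 - ALPHA) * psi + ALPHA * target   # exponential moving average

def is_active(psi: float) -> bool:
    return psi >= PSI_FLOOR

psi = 0.5
for hit in [True, True, False, True]:   # validated simulation outcomes
    psi = update_psi(psi, hit)

print(round(psi, 3), is_active(psi))
```

The moving-average form captures the two behaviors the text describes: consistent accuracy raises PSI over time, and sustained divergence drags it below the removal threshold.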
Validation Study
Peru — Datum Internacional
Synthetic Peruvians produced responses that researchers with deep in-country expertise recognized as culturally authentic — including patterns of household decision-making, family structure representation, and emotional register that had been systematically misrepresented in LLM-only approaches.
Partnership with Datum Internacional, specialist Latin American intelligence firm. Full documentation in progress.
Independent Observation
Brazil — Real-Time Crisis Signal
While a major global sportswear brand faced a real-time public crisis in Brazil, our synthetic population — with zero exposure to the real-world reaction — independently surfaced the same negative signals: emotional rejection, brand dissonance, and cultural misalignment that the simulations flagged as high-risk across multiple consumer segments.
Independent observation — the synthetic signal and the real-world crisis ran in parallel, without one informing the other. Formal documentation in progress.
Scope & Commitments
We are transparent about what this framework is and is not. Ethnographic work, unarticulated need discovery, and genuinely novel cultural phenomena remain domains where human expertise is irreplaceable. Our framework is designed as a complement, not a substitute. No real individual's data is used without appropriate consent. Census data is public. Published science is cited. Outputs are clearly identified as synthetic — never misrepresented as human-generated data.

“The same precision used in climate models and epidemiological forecasting — applied to understanding how people think, choose, and act.”

Thinkers.pro Science Team

See the science in action.

Explore our live populations or request early access to build with genome-grade synthetic minds.

Explore Populations
Request Early Access
The Science — Peer-Reviewed Foundations for Synthetic Minds | Thinkers.pro