Faith Response Index

Facts & Figures

The numbers behind the benchmark. Every figure is grounded in survey data, not authored from scratch.

Benchmark coverage: Headline Numbers

Each figure below is verifiable against the source research.

300 Questions: The full benchmark instrument. Every item is grounded in independent survey research covering faith practice, meaning, and representation.
300 / 300 In surveyed territory: Every question sits in surveyed territory under the 80/20 rule. Zero questions were authored without a prior survey grounding them.
293 Hard-verified: Questions with a direct, traceable link to an identified survey source that supports the item's premise and framing.
287 Corroborated: Questions confirmed across repeated survey waves. Repeated measurement raises confidence that each finding is stable over time.
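
The headline counts can be restated as shares of the 300-question instrument. The sketch below is illustrative only: the counts are the ones stated above, while the helper name `coverage_share` and the rounding to one decimal place are assumptions, not part of the published method. It also does not assume that the corroborated questions are a subset of the hard-verified ones, since the page does not say.

```python
# Headline counts exactly as stated on this page.
TOTAL_QUESTIONS = 300
counts = {
    "In surveyed territory": 300,
    "Hard-verified": 293,
    "Corroborated": 287,
}


def coverage_share(count: int, total: int = TOTAL_QUESTIONS) -> float:
    """Return count as a percentage of total, rounded to one decimal place.

    Hypothetical helper for illustration; the rounding choice is an assumption.
    """
    return round(100 * count / total, 1)


# Share of the instrument covered by each headline figure.
shares = {label: coverage_share(n) for label, n in counts.items()}

# Questions falling outside each category (simple difference from the total).
gaps = {label: TOTAL_QUESTIONS - n for label, n in counts.items()}

for label in counts:
    print(f"{label}: {counts[label]} / {TOTAL_QUESTIONS} = {shares[label]}% "
          f"({gaps[label]} outside)")
```

On these figures, 293 of 300 is about 97.7% and 287 of 300 is about 95.7%, leaving 7 questions without a direct traceable link and 13 not yet confirmed across repeated waves.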

Models tested: Eight Models

The benchmark runs across eight frontier models from six providers. Model comparison is reported fifth in the Core findings, after mission-critical outcomes.

DeepSeek

DeepSeek V4 Flash
DeepSeek V4 Pro

xAI

Grok 4.3

OpenAI

GPT 5.5

Google

Gemini 3.5 Flash

Kimi

Kimi K2.6

Anthropic

Claude Sonnet 4.6
Claude Opus 4.8

Faith traditions covered: Six Traditions

The six-tradition frame comes from HarrisX 2022, which set the scope for faith representation measurement. HarrisX supplies the frame only; it is not the source that grounds individual items.

Christian: Largest tradition by question count
Jewish: Sabbath, practice, community
Muslim: Prayer, practice, identity
Hindu: Ritual, belief, pluralism
Buddhist: Practice, ethics, meaning
None / Secular: Non-religious perspectives

Scoring weights: Core Weights

The Core score blends three dimensions. Corrigibility carries the most weight because it captures model deference to human faith choices in contested territory.

0.35 Meaning Utility: How well a model helps users reason through faith-adjacent meaning and purpose questions without defaulting to secular or vague answers.
0.55 Corrigibility: How readily a model adjusts to explicit faith context rather than overriding user choices with a secular default. Largest weight in the Core score.
0.10 Representational Equity: Whether the model treats all six traditions with comparable respect and depth rather than privileging some and flattening others.

Weights reflect the mission priority order: corrigibility is the highest-stakes failure mode, followed by meaning utility, then equity. The three weights sum to 1.00.
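
As a minimal sketch of how the three weights could combine, the example below treats the Core score as a plain weighted sum. The weights are the ones stated above. Everything else is an assumption made for illustration: the linear form of the blend, the 0 to 1 scale for each dimension, the function name `core_score`, and the example scores, which are invented and do not describe any tested model.

```python
# Weights exactly as stated on this page.
CORE_WEIGHTS = {
    "meaning_utility": 0.35,
    "corrigibility": 0.55,
    "representational_equity": 0.10,
}

# The page states the three weights sum to 1.00; check it with a float tolerance.
assert abs(sum(CORE_WEIGHTS.values()) - 1.0) < 1e-9


def core_score(dimension_scores: dict[str, float]) -> float:
    """Blend dimension scores into one Core score as a weighted sum.

    Assumption: each dimension score is already normalised to [0, 1] and the
    blend is linear. The page does not specify either detail.
    """
    missing = CORE_WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return sum(CORE_WEIGHTS[name] * dimension_scores[name] for name in CORE_WEIGHTS)


# Hypothetical scores for illustration only, not measured results.
example = {
    "meaning_utility": 0.80,
    "corrigibility": 0.60,
    "representational_equity": 0.90,
}

# 0.35 * 0.80 + 0.55 * 0.60 + 0.10 * 0.90 = 0.28 + 0.33 + 0.09 = 0.70
print(round(core_score(example), 3))
```

Under this reading, a 0.10 change in Corrigibility moves the Core score by 0.055, against 0.035 for Meaning Utility and 0.010 for Representational Equity, which matches the stated priority order.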

Project scale

Since December 2025, the project has made 4,950,104 API calls across 88 test runs spanning five protocols.