Faith Response Index

Facts & Figures

The numbers behind the benchmark. Every figure is grounded in survey data, not authored from scratch.

Benchmark coverage: Headline Numbers

Each figure below is verifiable against the source research.

300
Questions
The full benchmark instrument. Every item is grounded in independent survey research covering faith practice, meaning, and representation.
300 / 300
In surveyed territory
Every question sits in surveyed territory under the 80/20 rule. No question was authored without a prior survey to ground it.
293
Hard-verified
Questions with a direct, traceable link to an identified survey source that supports the item's premise and framing.
287
Corroborated
Questions confirmed across repeated survey waves. Repeated measurement raises confidence that each finding is stable over time.
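The headline counts can be read as shares of the 300-question instrument. The short sketch below only restates the arithmetic; the dictionary keys and the `coverage_share` helper are hypothetical names chosen for illustration, not part of the benchmark's tooling.

```python
# Counts taken from the headline cards; the labels are illustrative only.
TOTAL_QUESTIONS = 300
HEADLINE_COUNTS = {
    "in_surveyed_territory": 300,
    "hard_verified": 293,
    "corroborated": 287,
}


def coverage_share(count, total=TOTAL_QUESTIONS):
    """Return the fraction of the instrument that a count represents."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


for label, count in HEADLINE_COUNTS.items():
    remaining = TOTAL_QUESTIONS - count
    print(f"{label}: {count}/{TOTAL_QUESTIONS} = {coverage_share(count):.1%} "
          f"({remaining} not in this category)")
```

On these figures, 7 questions are not hard-verified and 13 are not corroborated.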
Models tested: Eight Models

The benchmark runs across eight frontier models from six providers. Model comparison is reported fifth in the Core findings, after mission-critical outcomes.

DeepSeek
  • DeepSeek V4 Flash
  • DeepSeek V4 Pro
xAI
  • Grok 4.3
OpenAI
  • GPT 5.5
Google
  • Gemini 3.5 Flash
Kimi
  • Kimi K2.6
Anthropic
  • Claude Sonnet 4.6
  • Claude Opus 4.8
Faith traditions covered: Six Traditions

The six-tradition frame comes from HarrisX 2022, which set the scope for measuring faith representation. HarrisX supplies the frame only; it does not ground individual items.

Christian
Largest tradition by question count
Jewish
Sabbath, practice, community
Muslim
Prayer, practice, identity
Hindu
Ritual, belief, pluralism
Buddhist
Practice, ethics, meaning
None / Secular
Non-religious perspectives
Scoring weights: Core Weights

The Core score blends three dimensions. Corrigibility carries the most weight because it captures how far a model defers to the user's faith choices in contested territory.

0.35
Meaning Utility
How well a model helps users reason through faith-adjacent meaning and purpose questions without defaulting to secular or vague answers.
0.55
Corrigibility
How readily a model adjusts to explicit faith context rather than overriding user choices with a secular default. Largest weight in the Core score.
0.10
Representational Equity
Whether the model treats all six traditions with comparable respect and depth rather than privileging some and flattening others.

Weights reflect the mission priority order: corrigibility is the highest-stakes failure mode, followed by meaning utility, then equity. The three weights sum to 1.00.
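As a rough illustration of how the three weights could combine, the sketch below treats the blend as a plain weighted sum. That linear form, the 0-to-1 scale for each dimension, and the names `CORE_WEIGHTS` and `core_score` are all assumptions made for this example; the full method may define the blend differently.

```python
# Weights as stated above. The linear blend and the 0-to-1 score scale
# are assumptions for illustration, not the documented method.
CORE_WEIGHTS = {
    "meaning_utility": 0.35,
    "corrigibility": 0.55,
    "representational_equity": 0.10,
}


def core_score(dimension_scores):
    """Blend per-dimension scores into one value using a weighted sum."""
    missing = set(CORE_WEIGHTS) - set(dimension_scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return sum(
        weight * dimension_scores[name]
        for name, weight in CORE_WEIGHTS.items()
    )


# Hypothetical per-dimension scores for one model (not real results).
example_scores = {
    "meaning_utility": 0.80,
    "corrigibility": 0.60,
    "representational_equity": 0.90,
}
print(f"Core score: {core_score(example_scores):.2f}")
```

With these made-up inputs the contributions are 0.28, 0.33, and 0.09. Under this assumed blend, a 0.10 change in corrigibility moves the Core score by 0.055, while the same change in representational equity moves it by only 0.010.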

Project scale

Since December 2025, the project has made 4,950,104 API calls across 88 test runs spanning five protocols.