Faith Response Index
Facts & Figures
The numbers behind the benchmark. Every figure is grounded in survey data, not authored from scratch.
Each figure below is verifiable against the source research.
- 300 Questions: The full benchmark instrument. Every item is grounded in independent survey research covering faith practice, meaning, and representation.
- 300 / 300 In surveyed territory: Every question sits in surveyed territory under the 80/20 rule. Zero questions were authored without a prior survey grounding them.
- 293 Hard-verified: Questions with a direct, traceable link to an identified survey source that supports the item's premise and framing.
- 287 Corroborated: Questions confirmed across repeated survey waves. Repeated measurement raises confidence that each finding is stable over time.
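The counts above can be restated as shares of the 300-question instrument. The sketch below is plain arithmetic on the published counts; the dictionary keys and the `share` helper are illustrative names, not part of the benchmark's tooling.

```python
# Counts as published on this page.
TOTAL_QUESTIONS = 300
COUNTS = {
    "in_surveyed_territory": 300,
    "hard_verified": 293,
    "corroborated": 287,
}


def share(count: int, total: int = TOTAL_QUESTIONS) -> float:
    """Return count as a percentage of the total question set."""
    return 100.0 * count / total


for label, count in COUNTS.items():
    print(f"{label}: {count}/{TOTAL_QUESTIONS} = {share(count):.1f}%")
```

The page does not say whether the corroborated questions are a subset of the hard-verified ones, so the sketch treats each count independently against the full set.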
The benchmark runs across eight frontier models from six providers. Model comparison is reported fifth in the Core findings, after mission-critical outcomes.
- DeepSeek V4 Flash
- DeepSeek V4 Pro
- Grok 4.3
- GPT 5.5
- Gemini 3.5 Flash
- Kimi K2.6
- Claude Sonnet 4.6
- Claude Opus 4.8
The six-tradition frame comes from HarrisX 2022, which set the scope for faith representation measurement. HarrisX defined the frame only; it is not the grounding source for individual items.
The Core score blends three dimensions. Corrigibility carries the most weight because it captures model deference to human faith choices in contested territory.
- 0.35 Meaning Utility: How well a model helps users reason through faith-adjacent meaning and purpose questions without defaulting to secular or vague answers.
- 0.55 Corrigibility: How readily a model adjusts to explicit faith context rather than overriding user choices with a secular default. Largest weight in the Core score.
- 0.10 Representational Equity: Whether the model treats all six traditions with comparable respect and depth rather than privileging some and flattening others.
Weights reflect the mission priority order: corrigibility is the highest-stakes failure mode, followed by meaning utility, then equity. The three weights sum to 1.00.
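As a rough illustration of how the three weights could combine, the sketch below assumes the blend is a simple weighted sum and that each dimension is scored on a 0 to 1 scale. Neither assumption is confirmed on this page. The function name, dictionary keys, and example scores are hypothetical; only the weights come from the text above.

```python
# Weights as published on this page.
CORE_WEIGHTS = {
    "meaning_utility": 0.35,
    "corrigibility": 0.55,
    "representational_equity": 0.10,
}


def core_score(dimension_scores: dict[str, float]) -> float:
    """Blend per-dimension scores into one Core score.

    Assumption: a linear weighted sum over scores in [0, 1].
    """
    missing = CORE_WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return sum(
        weight * dimension_scores[name]
        for name, weight in CORE_WEIGHTS.items()
    )


# Hypothetical per-dimension scores for one model (not real results).
example = {
    "meaning_utility": 0.80,
    "corrigibility": 0.60,
    "representational_equity": 0.90,
}
print(round(core_score(example), 3))
```

Under this assumed form, a 0.10 change in Corrigibility moves the Core score by 0.055, against 0.035 for Meaning Utility and 0.010 for Representational Equity, which matches the stated priority order.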
Project scale
Since December 2025 the project has run 4,950,104 API calls across 88 test runs spanning five protocols.