pith. machine review for the scientific record.

arxiv: 2605.14455 · v1 · submitted 2026-05-14 · 💻 cs.AI · cs.LG

Recognition: no theorem link

Intelligence Impact Quotient (IIQ): A Framework for Measuring Organizational AI Impact

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 01:41 UTC · model grok-4.3

classification 💻 cs.AI cs.LG
keywords AI impact measurement · organizational AI adoption · Intelligence Impact Quotient · workflow integration · token usage metrics · AI autonomy · impact quantification · adoption index

The pith

The Intelligence Impact Quotient combines novelty-weighted token stock with usage frequency, leverage, task complexity, and autonomy to yield comparable 0-1000 scores of AI integration depth.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the Intelligence Impact Quotient as a composite metric to quantify how deeply AI systems are embedded in organizational workflows and what impact they produce. It moves beyond raw access counts or total token volume by incorporating a novelty-weighted and time-decayed token stock, usage frequency, a recency gate, organizational leverage, task complexity, and degree of autonomy. The formulation first generates a raw Intelligence Adoption Index and then normalizes it to a 0-1000 scale that permits direct comparisons across different users, teams, and organizations. Additional rules support sub-daily updates and bounded estimates of efficiency and financial effects. Synthetic scenarios demonstrate how the metric separates low-leverage repetitive use from higher-autonomy, higher-consequence applications.
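The paper's exact functional form and component weights are not given in this excerpt. As a hedged illustration only, a multiplicative composite of the named factors, gated by recency, might be sketched as follows; every formula, weight, and name here is an assumption, not the authors' definition:

```python
# Illustrative sketch only: the paper does not publish its exact functional
# form or weights here, so every formula and parameter below is an assumption.

def raw_iai(token_stock: float, frequency: float, leverage: float,
            complexity: float, autonomy: float, recency_ok: bool) -> float:
    """Hypothetical raw Intelligence Adoption Index (IAI).

    token_stock -- novelty-weighted, time-decayed token total
    frequency   -- interactions per day (or a similar usage rate)
    leverage    -- assumed organizational leverage multiplier, >= 1
    complexity  -- assumed task complexity multiplier, >= 1
    autonomy    -- assumed degree-of-autonomy multiplier, >= 1
    recency_ok  -- grace-period recency gate; False zeroes the score
    """
    if not recency_ok:  # stale usage falls outside the grace period
        return 0.0
    return token_stock * frequency * leverage * complexity * autonomy

# Equal token stock and frequency, but different leverage/complexity/autonomy:
low = raw_iai(1_000, 5.0, leverage=1.0, complexity=1.0, autonomy=1.0,
              recency_ok=True)
high = raw_iai(1_000, 5.0, leverage=2.0, complexity=1.5, autonomy=2.0,
               recency_ok=True)
assert high > low  # the weighted factors separate the two usage profiles
```

A multiplicative form is one plausible choice because it lets any single factor, such as autonomy, scale the whole score; the paper may well combine the terms differently.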

Core claim

The Intelligence Impact Quotient (IIQ) is a composite metric that integrates a novelty-weighted, time-decayed token stock with usage frequency, a grace-period recency gate, organizational leverage, task complexity, and autonomy. It produces both a raw Intelligence Adoption Index (IAI) and a normalized 0-1000 IIQ index, enabling comparison of AI integration depth across heterogeneous organizational contexts, and is accompanied by sub-daily update rules and bounded interpretations of estimated efficiency and financial impact.

What carries the argument

The IIQ composite metric, which scales a novelty-weighted, time-decayed token stock by usage frequency, organizational leverage, task complexity, and autonomy to quantify workflow integration depth.

If this is right

  • Organizations gain a standardized 0-1000 scale that allows direct comparison of AI integration across units of different sizes and types.
  • The metric separates frequent low-leverage prompting from autonomous, high-consequence AI-assisted work.
  • Sub-daily update rules support continuous monitoring rather than periodic snapshots.
  • Bounded estimates of efficiency and financial impact can be attached to the IIQ values.
  • The framework is positioned for ongoing tracking of AI embedding in workflows, not as a measure of model capability.
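The sub-daily update rules themselves are not reproduced on this page. One plausible sketch, assuming simple exponential decay of the token stock between updates, with an invented half-life parameter and a novelty weight in [0, 1]:

```python
# Hedged sketch of a sub-daily update rule; the paper's actual rule is not
# reproduced here. Assumes exponential decay of the token stock between
# updates, with an invented half-life, plus novelty-weighted new tokens.

HALF_LIFE_DAYS = 30.0  # assumed decay parameter, not taken from the paper

def update_token_stock(stock: float, new_tokens: float,
                       novelty: float, dt_days: float) -> float:
    """Decay the stock over dt_days, then add novelty-weighted new tokens.

    novelty in [0, 1]: 1.0 means entirely novel content, 0.0 pure repetition.
    """
    decay = 0.5 ** (dt_days / HALF_LIFE_DAYS)
    return stock * decay + new_tokens * novelty

# Two sub-daily updates, six hours (0.25 days) apart:
stock = 0.0
stock = update_token_stock(stock, new_tokens=800, novelty=0.9, dt_days=0.25)
stock = update_token_stock(stock, new_tokens=800, novelty=0.2, dt_days=0.25)
# Semantically repetitive prompting (low novelty) adds far less than novel work.
```

Under this assumed rule, continuous monitoring reduces to applying the decay factor over whatever interval has elapsed since the last update, which is what makes sub-daily cadence cheap.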

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Widespread use could shift organizational AI investment toward tools that raise autonomy and complexity levels rather than raw volume.
  • The distinction between token quantity and weighted impact suggests that dashboards focused only on usage counts may overstate adoption quality.
  • Correlation studies linking IIQ scores to measured productivity outcomes would provide an external test of the weighting choices.
  • The same weighting logic could be adapted to track individual employee or team-level AI maturity trajectories over time.

Load-bearing premise

The chosen factors of novelty-weighted token stock, usage frequency, leverage, complexity, and autonomy together accurately quantify integration depth and impact without post-hoc fitting or external validation against real outcomes.

What would settle it

Longitudinal data from organizations that record actual productivity gains, cost savings, or workflow changes, and that show no statistical correlation with computed IIQ scores, would falsify the metric's claim to measure impact.

Figures

Figures reproduced from arXiv: 2605.14455 by Amit Bahree, Chandan Rajah, Federico Castanedo, Larry Murray, Neha Sengupta, Ramesh Krishnan Muthukrishnan, Robin Mills.

Figure 1: Synthetic illustration of IIQ’s novelty mechanism under three 20-interaction usage traces with … [figures/full_fig_p005_1.png]
Figure 2: Synthetic 45-day IIQ trajectories for recurring use, a short interruption, a long interruption, and weekly … [figures/full_fig_p006_2.png]
Figure 3: Comparison of four synthetic user profiles after a 30-day simulation. The left panel shows final raw … [figures/full_fig_p010_3.png]
Original abstract

The Intelligence Impact Quotient (IIQ) is a composite metric intended to quantify the depth to which AI systems are integrated into organizational work and their impact. Rather than treating access counts or aggregate token volume as sufficient evidence of impact, IIQ combines a novelty-weighted, time-decayed token stock with usage frequency, a grace-period recency gate, organizational leverage, task complexity, and autonomy. The formulation produces a raw Intelligence Adoption Index (IAI) and a normalized 0-1000 IIQ index for comparison between heterogeneous users and units. We also derive sub-daily update rules and a bounded interpretation layer for estimated efficiency and financial impact. The paper positions IIQ as a deployment-oriented measurement framework: a formal proposal for tracking AI embedding in workflows, not a direct measure of model capability or a substitute for causal productivity evaluation. Synthetic scenarios illustrate how the revised metric distinguishes between frequent low-leverage use, semantically repetitive prompting, and more autonomous, higher-consequence AI-assisted work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes the Intelligence Impact Quotient (IIQ) as a composite metric to quantify the depth of AI integration into organizational workflows and its resulting impact. It defines a raw Intelligence Adoption Index (IAI) from novelty-weighted and time-decayed token stock, usage frequency, a grace-period recency gate, organizational leverage, task complexity, and autonomy; this is then normalized to a 0-1000 IIQ scale for cross-unit comparison. The manuscript also supplies sub-daily update rules and a bounded interpretation layer linking the index to estimated efficiency and financial impact. Synthetic scenarios are presented to show differentiation between low-leverage repetitive use and higher-autonomy, higher-consequence AI-assisted work. The work is framed explicitly as a deployment-oriented measurement framework rather than a validated causal instrument or model-capability benchmark.

Significance. If the chosen factors and functional form can be shown to correlate with observable productivity or outcome measures, IIQ could supply a practical, comparable index for tracking AI embedding that goes beyond raw token counts or access logs. The explicit sub-daily update rules and bounded interpretation layer are concrete engineering contributions that could be implemented directly in monitoring systems. At present, however, the absence of any empirical calibration or external validation leaves the metric as an untested modeling choice whose practical value cannot yet be assessed.

major comments (3)
  1. [Abstract / §3] Abstract and the central formulation (presumably §3): the IAI is defined directly as a linear or weighted combination of novelty-weighted token stock, usage frequency, leverage, complexity, and autonomy with no derivation, sensitivity analysis, or external benchmark supplied for the specific functional form or component weights. The claim that this combination 'quantifies integration depth and impact' therefore rests on definitional choice rather than demonstrated support.
  2. [§5] Synthetic scenarios section (presumably §5): the illustrative cases distinguish usage patterns only within the authors' own synthetic data; no regression against real productivity outcomes, error analysis, or hold-out validation is reported, so the scenarios cannot establish that IIQ tracks actual organizational impact.
  3. [Interpretation layer] Interpretation layer: the bounded mapping from IIQ to estimated efficiency and financial impact is presented without any grounding data or uncertainty quantification, making the downstream claims load-bearing yet unsupported.
minor comments (2)
  1. [Notation] Notation for the time-decay and grace-period parameters is introduced without an explicit table of symbols or default values, which would aid reproducibility.
  2. [Normalization] The normalization step from raw IAI to the 0-1000 IIQ scale is described at a high level; an explicit equation or pseudocode would clarify the exact scaling.
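To make the second minor comment concrete: one bounded scaling that would satisfy the request for explicit pseudocode is a saturating exponential map from raw IAI to the 0-1000 range. This is an illustration of what such an equation could look like, not the paper's actual normalization; `IAI_SCALE` is an invented calibration constant:

```python
import math

# Illustration only: the paper describes the 0-1000 normalization at a high
# level without an explicit equation. This assumes a saturating exponential
# map; IAI_SCALE is an invented calibration constant, not from the paper.

IAI_SCALE = 10_000.0  # assumed scale of a "typical" raw IAI

def iiq_from_iai(raw_iai: float) -> float:
    """Map a non-negative raw IAI onto the bounded 0-1000 IIQ scale."""
    if raw_iai <= 0:
        return 0.0
    return 1000.0 * (1.0 - math.exp(-raw_iai / IAI_SCALE))

# Monotone and bounded: raising raw IAI raises IIQ, but never past 1000.
```

Any monotone, bounded map would do the same job; the point of the referee's request is that the choice should be stated, since it determines how compressed high-end scores become.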

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback and for noting the potential utility of the sub-daily update rules and bounded interpretation layer. The paper presents IIQ as a proposed framework for measuring organizational AI impact, explicitly not as a validated causal model. Below we respond point-by-point to the major comments, clarifying the definitional nature of the metric and committing to revisions that strengthen the manuscript's transparency regarding its scope and limitations.

Point-by-point responses
  1. Referee: [Abstract / §3] Abstract and the central formulation (presumably §3): the IAI is defined directly as a linear or weighted combination of novelty-weighted token stock, usage frequency, leverage, complexity, and autonomy with no derivation, sensitivity analysis, or external benchmark supplied for the specific functional form or component weights. The claim that this combination 'quantifies integration depth and impact' therefore rests on definitional choice rather than demonstrated support.

    Authors: We agree that the functional form is a definitional choice without derivation from data or sensitivity analysis. The manuscript frames this as a practical framework for deployment monitoring, where the components (novelty-weighted token stock, frequency, leverage, complexity, autonomy) are selected based on their relevance to integration depth. No external benchmark is supplied because the work does not claim to be a predictive model. We will add a new subsection in §3 explaining the rationale for the chosen form and weights, and include a limitations statement noting the lack of sensitivity analysis. revision: partial

  2. Referee: [§5] Synthetic scenarios section (presumably §5): the illustrative cases distinguish usage patterns only within the authors' own synthetic data; no regression against real productivity outcomes, error analysis, or hold-out validation is reported, so the scenarios cannot establish that IIQ tracks actual organizational impact.

    Authors: The synthetic scenarios are meant to illustrate the metric's behavior across different usage patterns within a controlled setting, not to validate its correlation with real impact. The paper states upfront that it is a measurement framework proposal and not a substitute for causal evaluation. We will revise §5 to explicitly describe the scenarios as illustrative and add text in the discussion section highlighting that empirical validation against productivity data is required in future work. revision: partial

  3. Referee: [Interpretation layer] Interpretation layer: the bounded mapping from IIQ to estimated efficiency and financial impact is presented without any grounding data or uncertainty quantification, making the downstream claims load-bearing yet unsupported.

    Authors: We accept that the interpretation layer lacks grounding data and uncertainty quantification. The mappings are provided as bounded, illustrative examples to aid interpretation rather than as data-driven estimates. In revision, we will qualify these claims more explicitly, add statements on the hypothetical nature of the efficiency and financial links, and include a note on the need for uncertainty quantification in applied use. revision: yes

Circularity Check

1 step flagged

IIQ reduces to its definitional composite of author-chosen factors with no independent derivation shown

specific steps
  1. self-definitional [Abstract]
    "IIQ combines a novelty-weighted, time-decayed token stock with usage frequency, a grace-period recency gate, organizational leverage, task complexity, and autonomy. The formulation produces a raw Intelligence Adoption Index (IAI) and a normalized 0-1000 IIQ index for comparison between heterogeneous users and units."

    The paper claims this combination quantifies depth of integration and impact, yet the IIQ value is computed directly from the listed inputs via the authors' chosen functional form. The output index is therefore identical to the definitional inputs by construction, with no separate derivation or benchmark establishing that these factors measure impact independently.

full rationale

The paper's central claim is that IIQ quantifies organizational AI impact via a specific combination of factors into IAI then normalized to 0-1000. This is presented as a formulation that produces the index, but the text supplies only the definitional equations and synthetic illustrations. No regression against outcomes, external validation, or first-principles derivation is exhibited, so the measure is equivalent to its input selection and weighting by construction. This matches self-definitional circularity at the core of the framework.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 2 invented entities

The framework rests on several author-chosen weighting and decay parameters plus the domain assumption that token-based factors can proxy integration depth; no independent evidence for these choices is supplied.

free parameters (2)
  • component weights
    Relative importance assigned to novelty, frequency, leverage, complexity, and autonomy; values not specified but required to produce the composite score.
  • time decay rate
    Parameter controlling how quickly older token use loses influence in the time-decayed stock; works alongside the grace-period recency gate.
axioms (1)
  • domain assumption Novelty-weighted token usage combined with leverage and autonomy accurately reflects organizational AI impact depth
    Invoked as the basis for the metric definition without supporting derivation or data.
invented entities (2)
  • Intelligence Impact Quotient (IIQ) no independent evidence
    purpose: Normalized 0-1000 index for cross-unit comparison of AI integration
    Newly defined composite score with no prior existence claimed.
  • Intelligence Adoption Index (IAI) no independent evidence
    purpose: Raw un-normalized score before scaling to 0-1000
    Intermediate quantity invented as part of the framework.
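The ledger's concern about unspecified component weights can be made concrete: with the weights free, the ranking between two units can flip depending on how the composite is parameterized. A toy demonstration, with an invented functional form and invented numbers:

```python
# Toy demonstration that the ranking produced by an IIQ-style composite can
# depend entirely on its free parameters; the functional form and all values
# below are invented, not taken from the paper.

def composite(stock, leverage, autonomy, w_leverage, w_autonomy):
    """Hypothetical weighted composite (not the paper's actual form)."""
    return stock * (leverage ** w_leverage) * (autonomy ** w_autonomy)

# Unit A: high leverage, low autonomy. Unit B: the reverse.
a = dict(stock=1000, leverage=3.0, autonomy=1.5)
b = dict(stock=1000, leverage=1.5, autonomy=3.0)

lev_heavy = (composite(**a, w_leverage=2, w_autonomy=1),
             composite(**b, w_leverage=2, w_autonomy=1))
aut_heavy = (composite(**a, w_leverage=1, w_autonomy=2),
             composite(**b, w_leverage=1, w_autonomy=2))

assert lev_heavy[0] > lev_heavy[1]  # A ranks higher under leverage-heavy weights
assert aut_heavy[0] < aut_heavy[1]  # B ranks higher under autonomy-heavy weights
```

This is exactly why the referee's call for sensitivity analysis matters: without published weights, cross-unit comparisons on the 0-1000 scale inherit an unexamined ordering choice.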

pith-pipeline@v0.9.0 · 5491 in / 1554 out tokens · 48848 ms · 2026-05-15T01:41:58.730838+00:00 · methodology


Reference graph

Works this paper leans on

9 extracted references · 5 canonical work pages

  1. [1]

    Anthropic Economic Research. 2026. Anthropic Economic Index: New building blocks for understanding AI use. https://www.anthropic.com/research/economic-index-primitives

  2. [2]

    Anthropic Economic Research. 2025. Anthropic Economic Index: AI's impact on software development. https://www.anthropic.com/research/impact-software-development

  3. [3]

    Miles McCain et al. 2026. Measuring AI agent autonomy in practice. Anthropic. https://www.anthropic.com/research/measuring-agent-autonomy

  4. [4]

    Atoosa Kasirzadeh and Iason Gabriel. 2025. Characterizing AI Agents for Alignment and Governance. arXiv preprint arXiv:2504.21848. https://arxiv.org/abs/2504.21848

  5. [5]

    Thomas Kwa, Ben West, Joel Becker, et al. 2025. Measuring AI Ability to Complete Long Tasks. METR. https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/

  6. [6]

    Sayash Kapoor, Benedikt Stroebl, Zachary S. Siegel, Nitya Nadgir, and Arvind Narayanan. 2024. AI Agents That Matter. arXiv preprint arXiv:2407.01502. https://arxiv.org/abs/2407.01502

  7. [7]

    Anders Humlum and Emilie Vestergaard. 2025. Still Waters, Rapid Currents: Early Labor Market Transformation under Generative AI. National Bureau of Economic Research Working Paper 33777. https://doi.org/10.3386/w33777

  8. [8]

    Lu Fang, Zhe Yuan, Kaifu Zhang, Dante Donati, and Miklos Sarvary. 2025. Generative AI and Firm Productivity: Field Experiments in Online Retail. arXiv preprint arXiv:2510.12049. https://doi.org/10.48550/arXiv.2510.12049

  9. [9]

    Tejal Patwardhan, Rachel Dias, Elizabeth Proehl, Grace Kim, Michele Wang, Olivia Watkins, Simón Posada Fishman, Marwan Aljubeh, Phoebe Thacker, Laurance Fauconnet, Natalie S. Kim, Patrick Chao, Samuel Miserendino, Gildas Chabot, David Li, Michael Sharman, Alexandra Barr, Amelia Glaese, and Jerry Tworek. 2025. GDPval: Evaluating AI Model Performance on...