Bounded by Risk, Not Capability: Quantifying AI Occupational Substitution Rates via a Tech-Risk Dual-Factor Model

Shuyao Gao, Minghao Huang (aSSIST University, Seoul, South Korea)

arxiv: 2604.04464 · v2 · pith:RGZJ3PUNnew · submitted 2026-04-06 · 💻 cs.CY · econ.GN · q-fin.EC

Pith reviewed 2026-05-10 20:06 UTC · model grok-4.3

classification 💻 cs.CY · econ.GN · q-fin.EC

keywords AI occupational substitution · Tech-Risk Dual-Factor Model · Occupational Automation Index · Cognitive Risk Asymmetry · Compliance Premium · professional liability · labor market analysis · LLM ensemble evaluation

The pith

AI occupational substitution is bounded by institutional risk and liability rather than technical capability alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that AI encroaches on occupations gradually, one work activity at a time, and that real-world adoption is constrained by business risks, including professional liability and compliance, that pure capability measures ignore. It applies a dual-factor scoring approach to thousands of detailed work activities across U.S. occupations, using LLM ensembles and expert validation to derive automation indices. The results show high exposure for symbolic cognitive roles such as data scientists alongside what the authors call absolute resilience in physical trades and high-stakes care, which challenges the routine-biased technological change hypothesis. A sympathetic reader would care because the findings imply asymmetric labor impacts and the emergence of wage premiums tied to risk-absorption capacity.

Core claim

By deconstructing 923 occupations into 2,087 detailed work activities and scoring each for technical feasibility alongside business risk via a multi-agent LLM ensemble with variance-based expert validation, we compute Relative Occupational Automation Indices for the U.S. labor market. This produces the finding that non-routine cognitive roles highly dependent on symbolic manipulation face an OAI of around 0.70, while unstructured physical trades and high-stakes caretaking roles exhibit absolute resilience. The result quantifies a cognitive risk asymmetry and motivates the hypothesis of a compliance premium, in which wages increasingly reward risk-absorption capacity.

What carries the argument

Tech-Risk Dual-Factor Model: a framework that separately scores technical feasibility and institutional business risk for each detailed work activity before aggregating into Relative Occupational Automation Indices.
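A minimal sketch of how the two factor scores could roll up into an index, following the aggregation quoted in the Lean section of this page (a piecewise map f(T, R), a minimum over the activities in a task, and a weighted sum over tasks). The particular piecewise rule in `dwa_automation`, the [0, 1] score range, the task weights, and every name and number below are illustrative assumptions, not the paper's calibrated values.

```python
from __future__ import annotations

from dataclasses import dataclass

# Levels taken from the thresholds quoted on this page (0/0.3/0.5/0.7/1.0).
# How (T, R) pairs map onto these levels is not given here; the rule below is a guess.
LEVELS = (0.0, 0.3, 0.5, 0.7, 1.0)


def dwa_automation(tech: float, risk: float) -> float:
    """Hypothetical piecewise f(T, R): feasibility discounted by risk, snapped down to a level."""
    if not (0.0 <= tech <= 1.0 and 0.0 <= risk <= 1.0):
        raise ValueError("scores are assumed to lie in [0, 1]")
    raw = tech * (1.0 - risk)  # assumption: risk scales feasibility down
    # Snap to the highest level that does not exceed the raw value.
    return max(level for level in LEVELS if level <= raw)


@dataclass
class Task:
    weight: float                      # w_t, assumed to sum to 1 within an occupation
    dwas: list[tuple[float, float]]    # (T_i, R_i) for each activity in the task


def task_automation(task: Task) -> float:
    """Bottleneck: a task is only as automatable as its least automatable activity."""
    return min(dwa_automation(t, r) for t, r in task.dwas)


def occupation_oai(tasks: list[Task]) -> float:
    """Weighted sum of task-level scores."""
    return sum(task.weight * task_automation(task) for task in tasks)


# Hypothetical occupation: mostly symbolic work, with one high-liability activity.
analyst = [
    Task(weight=0.6, dwas=[(0.9, 0.1), (0.8, 0.05)]),
    Task(weight=0.4, dwas=[(0.9, 0.1), (0.9, 0.8)]),
]
analyst_oai = occupation_oai(analyst)
```

In this toy case the single high-risk activity pulls its whole task to the lowest level, which is the sense in which the index is bounded by risk rather than by capability.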

If this is right

Non-routine cognitive roles face substantially higher substitution exposure than previously estimated under capability-only models.
Unstructured physical trades and high-stakes caretaking roles remain resilient due to risk factors.
The traditional routine-biased technological change hypothesis does not fully explain observed patterns.
Wage resilience will increasingly correlate with an occupation's capacity to absorb compliance and liability risks.
The indices provide a cross-sectional diagnostic usable for subsequent dynamic modeling of labor reallocation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Sectors with heavy regulation may see slower AI adoption regardless of technical readiness.
Retraining programs could prioritize risk-management and compliance skills alongside technical ones.
The model implies that liability insurance markets may become a key bottleneck or enabler for AI deployment.

Load-bearing premise

The multi-agent LLM ensemble plus expert panel validation can reliably quantify both technical feasibility and the institutional premium of professional liability across work activities without systematic scoring bias.
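One plausible reading of "variance-based" validation is that activities on which the ensemble disagrees most are routed to the expert panel. The sketch below illustrates that reading only: the routing rule, the threshold value, and the activity names and scores are assumptions, and the paper's actual procedure is not described on this page.

```python
from __future__ import annotations

from statistics import pvariance


def flag_for_expert_review(scores_by_dwa: dict[str, list[float]],
                           variance_threshold: float = 0.02) -> list[str]:
    """Return the activity ids whose ensemble scores disagree beyond the threshold."""
    flagged = []
    for dwa_id, scores in scores_by_dwa.items():
        if len(scores) < 2:
            raise ValueError(f"{dwa_id}: need at least two agent scores")
        # Population variance across agents, used here as the disagreement measure.
        if pvariance(scores) > variance_threshold:
            flagged.append(dwa_id)
    return flagged


# Hypothetical risk scores from three agents for two made-up activities.
ensemble_risk_scores = {
    "review_contract_clauses": [0.2, 0.7, 0.9],   # agents disagree: send to experts
    "format_spreadsheet": [0.10, 0.12, 0.11],     # agents agree: keep algorithmic score
}
needs_review = flag_for_expert_review(ensemble_risk_scores)
```

If expert scores then sit consistently above the ensemble mean on the flagged items, that gap would be one candidate measurement of the institutional premium the premise refers to.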

What would settle it

Longitudinal data on actual employment declines, wage stagnation, or hiring patterns for data scientists versus construction workers or nurses that show no greater AI-driven displacement for the former group over the next several years.

Figures

Figures reproduced from arXiv: 2604.04464 by Shuyao Gao and Minghao Huang (aSSIST University, Seoul, South Korea).

**Figure 2.** Visualizing the Cognitive Gap in Risk Perception. In the …

**Figure 3.** The Tech-Risk Dual-Factor Automation Matrix. The color gradient …

**Figure 4.** Macroeconomic Labor Market Vulnerability Distribution of the …

Original abstract

The deployment of Large Language Models (LLMs) has ignited concerns about technological unemployment. Existing task-based evaluations predominantly measure theoretical "exposure" to AI capabilities, ignoring critical frictions of real-world commercial adoption: liability, compliance, and physical safety. We argue occupations are not eradicated instantaneously, but gradually encroached upon via atomic actions. We introduce a Tech-Risk Dual-Factor Model to re-evaluate this. By deconstructing 923 occupations into 2,087 Detailed Work Activities (DWAs), we utilize a multi-agent LLM ensemble to score both technical feasibility and business risk. Through variance-based Human-in-the-Loop (HITL) validation with an expert panel, we demonstrate a profound cognitive gap: isolated algorithmic probabilities fail to encapsulate the "institutional premium" imposed by experts bounded by professional liability. Applying a strictly algorithmic baseline via mathematical bottleneck aggregation, we calculate Relative Occupational Automation Indices ($OAI$) for the U.S. labor market. Our findings challenge the traditional Routine-Biased Technological Change (RBTC) hypothesis. Non-routine cognitive roles highly dependent on symbolic manipulation (e.g., Data Scientists) face unprecedented exposure ($OAI \approx 0.70$). Conversely, unstructured physical trades and high-stakes caretaking roles exhibit absolute resilience, quantifying a profound "Cognitive Risk Asymmetry." We hypothesize the emergent necessity of a "Compliance Premium," indicating wage resilience increasingly tied to risk-absorption capacity. We frame these findings as a cross-sectional diagnostic of systemic vulnerability, establishing a foundation for subsequent Computable General Equilibrium (CGE) econometric modeling involving dynamic wage elasticity and structural labor reallocation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper's dual-factor model usefully separates technical feasibility from liability risk in AI substitution, but its OAI numbers rest on unvalidated LLM scoring, with no reliability statistics or external checks reported.

This work tries to improve on standard task-exposure studies by adding a risk layer for professional liability and compliance, then aggregating the scores into new OAI values that flip some expectations about which jobs are most exposed. The decomposition of 923 occupations into 2,087 detailed work activities is a concrete step forward, and it lets the authors apply a bottleneck method that treats the hardest-to-automate activity as the limiter for each role. That produces the reported pattern in which symbolic cognitive jobs such as data science score around 0.70 while physical trades sit near zero, which directly questions the routine-biased technological change story by highlighting an institutional premium on risk absorption. The multi-agent LLM ensemble plus variance-based expert adjustment is a practical way to generate the two-factor scores at scale.

The soft spot is the absence of any calibration evidence. There are no inter-rater agreement numbers, no sensitivity tests on prompt or model choice, and no cross-checks against observables such as actual adoption rates, insurance premia, or regulatory delays. Without those, the cognitive risk asymmetry claim stays unanchored and could simply reflect how the scoring system weights symbolic versus physical tasks.

The paper is aimed at labor economists and AI policy people who want to model real adoption frictions rather than pure capability. A reader wanting solid quantitative forecasts will find the setup promising but incomplete. It deserves a serious referee because the model and the DWA-level data work are new enough to be worth testing, even if the current results still need validation.

Referee Report

4 major / 2 minor

Summary. The manuscript introduces the Tech-Risk Dual-Factor Model, which decomposes 923 U.S. occupations into 2,087 Detailed Work Activities (DWAs). It employs a multi-agent LLM ensemble to score technical feasibility and business risk for each DWA, followed by variance-based Human-in-the-Loop (HITL) validation with an expert panel. Using mathematical bottleneck aggregation, it derives Relative Occupational Automation Indices (OAI) that purportedly reveal a 'Cognitive Risk Asymmetry,' with high OAI values (≈0.70) for non-routine cognitive roles like Data Scientists and near-zero values for physical trades and high-stakes caretaking, challenging the Routine-Biased Technological Change (RBTC) hypothesis and positing a 'Compliance Premium' in wages.

Significance. If the OAI values prove robust after validation, this work would supply a useful cross-sectional diagnostic of AI labor-market exposure that incorporates institutional frictions (liability, compliance) absent from prior capability-only metrics. The atomic DWA decomposition and dual-factor scoring could serve as input for subsequent CGE models of wage elasticity and structural reallocation, while the hypothesized Compliance Premium offers a testable link between risk-absorption capacity and occupational resilience.

major comments (4)

[Abstract] Abstract: The reported OAI values (e.g., ≈0.70 for Data Scientists) are presented without accompanying quantitative validation metrics, error bars, inter-rater agreement statistics, or sensitivity analyses for the LLM ensemble scores and variance-based expert adjustments. This absence leaves the central claim of Cognitive Risk Asymmetry unsupported by visible evidence.
[Methods] Methods (Tech-Risk Dual-Factor Model and HITL validation): The variance-based HITL validation is asserted to isolate the 'institutional premium' of professional liability, yet no agreement metrics (Fleiss' kappa, ICC) or calibration against external observables (adoption rates, insurance premia, regulatory barriers) are reported. Without these, systematic LLM bias in weighting symbolic versus physical tasks cannot be ruled out.
[Results] Results (OAI calculation via bottleneck aggregation): The aggregation is described as strictly algorithmic and baseline, but the upstream LLM-generated ratings and expert adjustments introduce unquantified dependence; the paper provides no robustness checks to prompt variations, ensemble composition, or alternative aggregation rules that would confirm the reported asymmetry is not an artifact of the scoring pipeline.
[Discussion] Discussion (challenge to RBTC): The claim that findings overturn Routine-Biased Technological Change rests on the OAI differential between cognitive and physical roles; however, absent falsification tests or correlation with real-world substitution data, the asymmetry remains an unvalidated modeling output rather than an empirical refutation.
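To make the agreement statistic requested in the Methods comment concrete, here is a small self-contained implementation of Fleiss' kappa. The panel data are invented for illustration (four activities, four experts, three risk bands); nothing about the paper's actual panel size or rating scale is known from this page.

```python
from __future__ import annotations


def fleiss_kappa(counts: list[list[int]]) -> float:
    """counts[i][j] = number of raters who placed item i in category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])
    # Share of all ratings that fall in each category.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(n_categories)]
    # Per-item agreement: fraction of rater pairs that agree on the item.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]
    p_bar = sum(p_item) / n_items          # observed agreement
    p_exp = sum(p * p for p in p_cat)      # agreement expected by chance
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_exp) / (1.0 - p_exp)


# Hypothetical panel: rows are activities, columns are (low, medium, high) risk bands.
panel_counts = [
    [4, 0, 0],
    [0, 4, 0],
    [1, 3, 0],
    [0, 1, 3],
]
kappa = fleiss_kappa(panel_counts)
```

Fleiss' kappa treats the bands as unordered categories; for continuous or ordinal risk scores an intraclass correlation, which the referee also names, would be the more natural choice.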

minor comments (2)

[Abstract] Abstract: The mathematical definition of the bottleneck aggregation used to obtain OAI is not supplied, nor is the precise formula for combining the two risk factors; a compact equation should appear on first use.
[Notation] Notation: Ensure all acronyms (OAI, DWA, HITL, RBTC, CGE) are defined at first appearance and used consistently; the term 'Compliance Premium' is introduced as a hypothesis but lacks an operational definition.
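For orientation, the aggregation quoted in the Lean section of this page can be written compactly as below. The index sets (activities $d$ within task $t_j$, tasks $t$ within occupation $o_k$) and the normalisation of the weights are assumptions inferred from that one-line summary, and the piecewise form of $f$ is not given here.

```latex
\begin{align}
AI(DWA_i) &= f(T_i, R_i), \\
AI(t_j)   &= \min_{d \in t_j} AI(d), \\
OAI(o_k)  &= \sum_{t \in o_k} w_t \, AI(t),
  \qquad \sum_{t \in o_k} w_t = 1 \ \text{(assumed)}.
\end{align}
```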

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for these constructive comments, which help strengthen the transparency of the Tech-Risk Dual-Factor Model. We address each major point below and indicate revisions to the manuscript.

Referee: [Abstract] Abstract: The reported OAI values (e.g., ≈0.70 for Data Scientists) are presented without accompanying quantitative validation metrics, error bars, inter-rater agreement statistics, or sensitivity analyses for the LLM ensemble scores and variance-based expert adjustments. This absence leaves the central claim of Cognitive Risk Asymmetry unsupported by visible evidence.

Authors: We agree the abstract omits these supporting statistics. The Methods section details the variance-based HITL process, but the revised manuscript will add a concise summary of key metrics (e.g., average inter-rater agreement and ensemble sensitivity ranges) directly into the abstract to better substantiate the OAI values and Cognitive Risk Asymmetry.

revision: yes

Referee: [Methods] Methods (Tech-Risk Dual-Factor Model and HITL validation): The variance-based HITL validation is asserted to isolate the 'institutional premium' of professional liability, yet no agreement metrics (Fleiss' kappa, ICC) or calibration against external observables (adoption rates, insurance premia, regulatory barriers) are reported. Without these, systematic LLM bias in weighting symbolic versus physical tasks cannot be ruled out.

Authors: The variance-based HITL is designed to surface institutional factors beyond LLM scores. In revision we will report Fleiss' kappa and ICC for the expert panel. Full calibration to external observables such as insurance premia is not feasible with currently available public data and will be noted as a limitation; we will also discuss steps to test for LLM bias in future extensions.

revision: partial

Referee: [Results] Results (OAI calculation via bottleneck aggregation): The aggregation is described as strictly algorithmic and baseline, but the upstream LLM-generated ratings and expert adjustments introduce unquantified dependence; the paper provides no robustness checks to prompt variations, ensemble composition, or alternative aggregation rules that would confirm the reported asymmetry is not an artifact of the scoring pipeline.

Authors: We will add explicit robustness checks to the Results section, including re-runs with varied prompts, altered ensemble sizes, and alternative aggregation rules (e.g., mean pooling). These will show that the reported Cognitive Risk Asymmetry remains stable, confirming it is not an artifact of the pipeline.

revision: yes

Referee: [Discussion] Discussion (challenge to RBTC): The claim that findings overturn Routine-Biased Technological Change rests on the OAI differential between cognitive and physical roles; however, absent falsification tests or correlation with real-world substitution data, the asymmetry remains an unvalidated modeling output rather than an empirical refutation.

Authors: The manuscript frames OAI as a cross-sectional diagnostic rather than a completed empirical refutation of RBTC. The differential is produced by the dual-factor DWA scoring. We will expand the Discussion with explicit caveats on the modeling basis and the requirement for future empirical tests against substitution data, while retaining the contrast with capability-only metrics.

revision: partial
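The aggregation-rule check discussed in the exchange above can be sketched in a few lines: recompute the index under minimum (bottleneck) pooling and under mean pooling, then compare both the ordering of occupations and the levels. The occupations, weights, and activity scores below are invented, and a real check would use the paper's full score table.

```python
from statistics import mean


def oai(tasks, pool):
    """tasks: list of (weight, [activity scores]); pool combines scores within a task."""
    return sum(weight * pool(scores) for weight, scores in tasks)


def rank_order(values):
    """Occupation names sorted from most to least exposed (ties broken by name)."""
    return sorted(values, key=lambda name: (-values[name], name))


# Hypothetical occupations; scores use the levels quoted on this page.
occupations = {
    "occupation_a": [(0.7, [0.7, 0.7]), (0.3, [1.0, 0.5])],
    "occupation_b": [(0.5, [1.0, 0.0]), (0.5, [1.0, 0.3])],
    "occupation_c": [(1.0, [0.3, 0.3, 0.0])],
}

by_min = {name: oai(tasks, min) for name, tasks in occupations.items()}
by_mean = {name: oai(tasks, mean) for name, tasks in occupations.items()}

# Same ordering under both rules does not imply the same levels.
ranking_stable = rank_order(by_min) == rank_order(by_mean)
largest_shift = max(abs(by_mean[name] - by_min[name]) for name in occupations)
```

In this toy data the ordering survives the change of rule while individual index values move substantially, which is why a robustness section would need to report both rank agreement and level shifts.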

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper defines OAI via deconstruction of occupations into DWAs, LLM-ensemble scoring of feasibility and risk factors, variance-based expert validation, and subsequent bottleneck aggregation. This sequence constructs a new index from scored inputs rather than reducing the reported Cognitive Risk Asymmetry or OAI values back to those inputs by definition or self-citation. No equations are shown that equate the final index to its scoring step tautologically, no parameters are fitted to a data subset and relabeled as predictions, and no load-bearing uniqueness theorems or ansatzes are imported from the authors' prior work. The central claims rest on the empirical distribution of the derived scores across occupations, which remains independent of the aggregation formula itself.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 3 invented entities

The central claim depends on the validity of LLM-based risk scoring and the representativeness of the DWA decomposition; no independent empirical benchmarks for the new indices are supplied.

axioms (3)

domain assumption · Occupations can be decomposed into 2,087 representative Detailed Work Activities without loss of critical context
Basis for scoring 923 occupations

domain assumption · Multi-agent LLM ensembles produce unbiased scores for both technical feasibility and business risk
Core input to the dual-factor model

ad hoc to paper · Variance-based HITL validation with an expert panel captures the institutional premium of professional liability
Used to correct algorithmic probabilities

invented entities (3)

Tech-Risk Dual-Factor Model · no independent evidence
purpose: To quantify occupational substitution by jointly modeling capability and risk
New framework introduced in the paper

Relative Occupational Automation Indices (OAI) · no independent evidence
purpose: To produce a numeric index of substitution exposure for each occupation
Derived output of the model

Compliance Premium · no independent evidence
purpose: To explain wage resilience through risk-absorption capacity
Hypothesized emergent property

pith-pipeline@v0.9.0 · 5618 in / 1647 out tokens · 58959 ms · 2026-05-10T20:06:15.818157+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Tech-Risk Dual-Factor Mapping Matrix... $AI(DWA_i) = f(T_i, R_i)$, piecewise with thresholds 0/0.3/0.5/0.7/1.0; bottleneck $AI(t_j) = \min_{d} AI(d)$; $OAI(o_k) = \sum_{t} w_t \cdot AI(t)$
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Cognitive Risk Asymmetry... Institutional Premium from liability and loss aversion

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.