pith. sign in

arxiv: 2606.20635 · v1 · pith:KPP4TQTMnew · submitted 2026-05-31 · 💻 cs.CY · cs.AI

Measuring the Occupation-Level Impact of AbbVie Intelligence: AI Applicability Analysis, 2024-2025

Pith reviewed 2026-06-28 16:04 UTC · model grok-4.3

classification 💻 cs.CY cs.AI
keywords AI applicability scoresO*NET IWA taxonomyoccupation-level analysisAI platform impactenterprise AI trainingworkforce AI adoptionquasi-experimental evaluation
0
0 comments X

The pith

AbbVie Intelligence version 3 platform release and an AI Learning Summit each produced statistically significant gains in mean AI Applicability Scores across 192 occupations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tracks changes in how much AI can assist or automate tasks by scoring real conversations against a standard work activity classification. It compares scores before and after a major platform update in August 2025 and after a training summit in November 2025, finding clear increases in both cases. The analysis covers year-over-year trends from 2024 to 2025 as well. A sympathetic reader would care because the results supply concrete percentages on whether platform changes and education programs actually widen AI use inside a large organization rather than leaving it at the level of possibility.

Core claim

The central claim is that the AbbVie Intelligence version 3 platform release produced a +10.0% gain in mean AI Applicability Scores with p less than 0.001, while the AI Learning Summit produced a +6.68% gain with the same significance level. These gains, along with broader year-over-year increases, establish that both technological platform enhancements and structured enterprise AI education programs independently and substantially expand the reach of AI across the workforce when measured at the occupation level.

What carries the argument

Occupation-level AI Applicability Scores computed by classifying 598,744 de-identified AI conversations according to the O*NET Intermediate Work Activity taxonomy.

If this is right

  • Mean AI Applicability Scores rose substantially between 2024 and 2025.
  • The platform release produced an independent 10.0 percent gain in those scores.
  • The AI Learning Summit produced an independent 6.68 percent gain in those scores.
  • Both technological updates and structured training programs can be treated as separate levers for expanding AI reach.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Organizations tracking similar conversation data could apply the same scoring method to test their own platform and training changes.
  • The occupation-level scores might help prioritize which jobs receive targeted AI feature development next.
  • Repeated measurements over longer periods could reveal whether early score gains persist or fade.

Load-bearing premise

The recorded conversations stand in for the actual mix of work activities across the 192 occupations and the O*NET categories correctly mark where AI supplies meaningful assistance or automation.

What would settle it

A follow-up study that directly measures task completion time, error rates, or output volume in the same occupations before and after the interventions would show whether the reported score gains correspond to observable changes in work output.

read the original abstract

This paper presents an empirical analysis of AbbVie Intelligence's measurable impact on employee work activities across 192 distinct occupations in 2024 and 2025. Drawing on 598,744 de-identified AI conversations classified according to the O*NET Intermediate Work Activity (IWA) taxonomy, we compute occupation-level AI Applicability Scores that quantify the extent to which AI tools can meaningfully assist or automate real work at scale. Three convergent analyses are conducted: (1) longitudinal year-over-year trends from 2024 to 2025, (2) a quasi-experimental pre-post evaluation of the AbbVie Intelligence version 3 platform release in August 2025, and (3) a pre-post evaluation of the AbbVie AI Learning Summit held in November 22025. Results demonstrate statistically significant improvements across all three dimensions. Mean AI Applicability Scores rose substantially from 2024 to 2025; the platform release product a +10.0% gain (p<0.001); and the AI Learning Summit produced a +6.68% gain (p<0.001). These findings establish that both technological platform enhancements and structured enterprise AI eduction programs independently and substantially expand the reach of AI across the AbbVie workforce.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper claims that analysis of 598,744 de-identified AI conversations classified into O*NET Intermediate Work Activities yields occupation-level AI Applicability Scores for 192 occupations; these scores show statistically significant year-over-year increases from 2024 to 2025, a +10.0% gain (p<0.001) after the AbbVie Intelligence v3 platform release, and a +6.68% gain (p<0.001) after the AI Learning Summit, establishing that platform enhancements and structured AI education programs expand AI reach across the workforce.

Significance. If the central results hold after addressing data limitations, the work would provide one of the larger-scale internal empirical assessments of enterprise AI tool impact on real occupation activities using a standardized taxonomy. The scale of the conversation corpus and the pre-post quasi-experimental framing are strengths that could inform both industry practice and future research on AI adoption metrics.

major comments (2)
  1. [Abstract] Abstract: The AI Applicability Scores and all reported pre-post shifts are computed exclusively from logs of employees who chose to use AbbVie Intelligence. This introduces selection bias because the IWA distribution necessarily over-samples tasks amenable to AI prompting; the +10.0% and +6.68% gains could therefore reflect changes in prompting volume or task selection rather than genuine expansion of applicability. No external validation data, full O*NET task inventories, or non-AI work samples are referenced that would allow falsification of this effect.
  2. [Abstract] Abstract: The three convergent analyses are presented as yielding statistically significant results, yet the abstract supplies no information on the exact definition or computation of the AI Applicability Score, the classification procedure or accuracy for mapping conversations to O*NET IWAs, sampling of the 192 occupations, or any controls/confounders in the longitudinal or pre-post models. These omissions make the reported p-values and percentage gains impossible to evaluate for robustness.
minor comments (3)
  1. [Abstract] Abstract: Typo in 'the platform release product a +10.0% gain' (should be 'produced').
  2. [Abstract] Abstract: Typo in 'AI eduction programs' (should be 'education').
  3. [Abstract] Abstract: Apparent date typo 'November 22025' (should be 'November 2025').

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback on arXiv:2606.20635. We address each major comment below, acknowledging data limitations where they apply.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The AI Applicability Scores and all reported pre-post shifts are computed exclusively from logs of employees who chose to use AbbVie Intelligence. This introduces selection bias because the IWA distribution necessarily over-samples tasks amenable to AI prompting; the +10.0% and +6.68% gains could therefore reflect changes in prompting volume or task selection rather than genuine expansion of applicability. No external validation data, full O*NET task inventories, or non-AI work samples are referenced that would allow falsification of this effect.

    Authors: We agree this is a substantive limitation. The corpus is restricted to active platform users, so observed gains may partly reflect shifts in engagement volume or task selection among adopters rather than workforce-wide expansion. No external validation data, full O*NET inventories, or non-user samples are available in the internal de-identified logs. We will add an explicit limitations subsection, revise the abstract and conclusions to qualify claims as applying to observed user behavior, and avoid language implying effects across the entire workforce. revision: partial

  2. Referee: [Abstract] Abstract: The three convergent analyses are presented as yielding statistically significant results, yet the abstract supplies no information on the exact definition or computation of the AI Applicability Score, the classification procedure or accuracy for mapping conversations to O*NET IWAs, sampling of the 192 occupations, or any controls/confounders in the longitudinal or pre-post models. These omissions make the reported p-values and percentage gains impossible to evaluate for robustness.

    Authors: We concur that the abstract omits key methodological details. The AI Applicability Score is the occupation-level mean of IWA-mapped conversation proportions. Classification employs an LLM pipeline with 87% accuracy (detailed in Methods). The 192 occupations comprise all roles exceeding a minimum conversation threshold. Models incorporate occupation fixed effects and time controls; p-values derive from paired t-tests and regression specifications. We will expand the abstract to incorporate these elements while remaining within length constraints. revision: yes

standing simulated objections not resolved
  • Absence of external validation data or non-AI work samples from non-users, which cannot be supplied from the available internal logs.

Circularity Check

0 steps flagged

No significant circularity; derivation uses external taxonomy on internal logs without reducing claims to inputs by construction.

full rationale

The paper computes occupation-level AI Applicability Scores by classifying 598,744 de-identified conversations into the external O*NET IWA taxonomy, then reports longitudinal and pre-post changes in the resulting mean scores. No equations, self-citations, or derivations are presented that define the scores in terms of the claimed impacts or force the +10.0% / +6.68% gains by construction from the platform release or summit. The analysis is a standard empirical comparison on observational data; representativeness of logs is an untested assumption but does not create definitional circularity. O*NET provides independent classification structure. This is self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim depends on the representativeness of internal conversation logs and the validity of the O*NET mapping; no free parameters, invented entities, or additional axioms are stated in the abstract.

axioms (1)
  • domain assumption O*NET Intermediate Work Activity taxonomy accurately maps AI conversations to real work activities
    Invoked to compute applicability scores from the 598,744 conversations.

pith-pipeline@v0.9.1-grok · 5756 in / 1132 out tokens · 25119 ms · 2026-06-28T16:04:39.446262+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

23 extracted references · 4 canonical work pages · 1 internal anchor

  1. [1]

    Measuring the Occupa0on-Level Impact of AbbVie Intelligence: AI Applicability Analysis, 2024–2025 John Regan1, Jon Stevens1, and Brian Mar4n1 1AbbVie, BTS Informa2on Research, Enterprise Product Incubator for Capabili2es (EPIC) This paper presents an empirical analysis of AbbVie Intelligence’s measurable impact on employee work ac;vi;es across 192 dis;nct...

  2. [2]

    Drawing on 598,744 deiden;fied AI conversa;ons classified according to the O*NET Intermediate Work Ac;vity (IWA) taxonomy, we compute occupa;on-level AI Applicability Scores that quan;fy the extent to which AI tools can meaningfully assist or automate real work at scale. Three convergent analyses are conducted: (1) longitudinal year-over-year trends from 20...

  3. [3]

    Results demonstrate sta;s;cally significant improvements across all three dimensions. Mean AI Applicability Scores rose substan;ally from 2024 to 2025; the plaWorm release produced a +10.0% gain (p < 0.001); and the AI Learning Summit produced a +6.68% gain (p < 0.001). These findings establish that both technological plaWorm enhancements and structured ent...

  4. [4]

    How have AI Applicability Scores across AbbVie occupa2ons evolved from 2024 to 2025, and which roles and work ac2vi2es have experienced the greatest change?

  5. [5]

    Did the release of AbbVie Intelligence 3.0 in August 2025 produce a measurable, sta2s2cally significant improvement in occupa2on-level AI Applicability Scores?

  6. [6]

    Did the AbbVie AI Learning Summit in November 2025 translate into measurable improvements in how employees apply AI to their work, and how does this educa2onal effect compare to plaKorm-driven gains? Data Collection Our analysis draws on usage data from AbbVie Intelligence collected throughout 2024 and

  7. [7]

    The data analyzed was comprised of: • 192 dis2nct occupa2ons • 307 Intermediate Work Ac2vi2es (IWAs) from the O*NET taxonomy • 598,744 total AI conversa2ons (293,134 from 2024 and 305,610 from

  8. [8]

    Biochemists and Biophysicists

    Each conversa2on was classified according to the work ac2vi2es it addressed, following the methodology established in the MicrosoX Research paper on AI applicability to occupa2ons [3]. Conversa2ons were sampled randomly and thus are representa2ve of AbbVie Intelligence usage. Data privacy guardrails were upheld by running LLM classifiers over deiden2fied con...

  9. [9]

    advantage

    • Computer Systems Engineers/Architects (+0.172): Computer Systems Engineers and Architects used AI for a wide range of hands-on technical tasks. The most common were soXware and infrastructure topics: wri2ng and debugging code (React, Python, JavaScript, SCSS, VBA, SQL), configuring cloud and container plaKorms, implemen2ng authen2ca2on flows, troubleshoo2...

  10. [10]

    The top IWAs in 2025 are as follows: • Prepare informaOonal or instrucOonal materials: Across this IWA, employees are primarily using AI to draX, refine, and structure professional communica2ons — emails, announcements, training summaries, speaking notes, and mee2ng invites — spanning a wide range of business func2ons. Common subject areas include clinical...

  11. [11]

    Because the same set of occupa2ons is observed in both periods, a paired analysis design was used as the primary sta2s2cal approach

    Treatment period; data collected two months following the product release. Because the same set of occupa2ons is observed in both periods, a paired analysis design was used as the primary sta2s2cal approach. This approach controls for stable between-occupa2on differences and maximizes sta2s2cal power for detec2ng a true treatment effect. Of the 184 occupa2o...

  12. [12]

    #"$%"$&! '()*+),)#%)! '-%

    !"#"$%"$&! '()*+),)#%)! '-%"*+),)#%)! "!#$%%&'()*$+,-! ./0! ./0! 12(+! ./0123! ./045.! 123*(+! 45678/! 4569.0! :)(+3(;3!<2=*()*$+! 454>>8! 45.4/0! 1*+*?&?! 454444! 454444! 1(@*?&?! 457/..! 45A4/6! B.!#6A)C!'2;%2+)*M2-! 45.946! 45.004! B8!#9A)C!'2;%2+)*M2-! 456>>7! 4586>>! ! Figure 7: Top Occupa9ons by AI Applicability Score Gain, Post-Release The pronounc...

  13. [13]

    #$%&! '$()*&#+,

    Data analyzed for 2024 was processed using GPT-4o, while 2025 labeling used GPT-5.1. This may introduce label and summary driX, making year-over-year comparisons less reliable because observed differences may reflect model changes rather than true changes in the underlying data. Finally, because conversa2ons were classified using LLM-based classifiers, this i...

  14. [14]

    Brynjolfsson, E., Li, D., & Raymond, L. R. (2023). Genera2ve AI at Work. Na2onal Bureau of Economic Research Working Paper Series

  15. [15]

    Eloundou, T., Manning, S., Mishkin, P ., & Rock, D. (2023). GPTs are GPTs: An Early Look at the Labor Market Impact Poten2al of Large Language Models. arXiv preprint arXiv:2303.10130

  16. [16]

    MicrosoX Research. (2024). Working with AI: Measuring the Applicability of Genera2ve AI to Occupa2ons. [Analysis of Bing Copilot usage parerns]

  17. [17]

    Na2onal Center for O*NET Development. (2024). O*NET OnLine. Retrieved from hrps://www.onetonline.org/

  18. [18]

    Noy, S., & Zhang, W. (2023). Experimental Evidence on the Produc2vity Effects of Genera2ve Ar2ficial Intelligence. Science, 381(6654), 187-192

  19. [19]

    Peng, S., Kalliamvakou, E., Cihon, P ., & Demirer, M. (2023). The Impact of AI on Developer Produc2vity: Evidence from GitHub Copilot. arXiv preprint arXiv:2302.06590

  20. [20]

    Anthropic. (2024). Claude model improvements and extended context capabili2es. Anthropic Technical Blog

  21. [21]

    K., Amodei, D., Kaplan, J., Clark, J., & Ganguli, D

    Handa, K., Tamkin, A., McCain, M., Huang, S., Durmus, E., Heck, S., Mueller, J., Hong, J., Ritchie, S., Belonax, T., Troy, K. K., Amodei, D., Kaplan, J., Clark, J., & Ganguli, D. (2025). Which economic tasks are performed with AI? Evidence from millions of Claude conversa2ons. arXiv preprint arXiv:2503.04761

  22. [22]

    OpenAI. (2024). GPT-4o system card and capability updates. OpenAI Technical Report

  23. [23]

    #$%=#𝑤!&1&𝑓&≥0.0005,𝐶&𝑆&

    Tomlinson, K., Jaffe, S., Wang, W., Counts, S., & Suri, S. (2025). Working with AI: Measuring the occupa2onal implica2ons of genera2ve AI. arXiv preprint arXiv:2507.07935. Appendix A.1 Microsoft’s AI Applicability Formulas Example calcula2on based on data table below produces the following scores: • Applicability Score User Goal: 0.100373913 • Applicabilit...