Which Voices Move Markets? Speaker Identity and the Cross-Section of Post-Earnings Returns
Pith reviewed 2026-05-10 13:26 UTC · model grok-4.3
The pith
Not all voices in earnings calls move markets equally: weighting sentiment by speaker identity substantially improves predictions of post-earnings stock returns.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Post-earnings stock returns are not equally affected by all speakers in a conference call. Applying a domain-specific transformer model to 6.5 million sentences from S&P 500 transcripts, the authors derive speaker weights of 49% for analysts, 30% for CFOs, 16% for other executives, and 5% for others. The resulting section-weighted sentiment achieves an out-of-sample Spearman IC of 0.142, generates monthly long-short alpha of 2.03% unexplained by the Fama-French five-factor model, and remains significant after controlling for standardized unexpected earnings. The weighted approach fully subsumes traditional dictionary-based sentiment in joint tests, while cumulative return patterns indicate a gradual price adjustment consistent with sluggish assimilation of soft information.
What carries the argument
Section-weighted sentiment formed by multiplying speaker-specific weights, derived empirically from return data, by sentiment scores extracted from distinct sections of earnings call transcripts.
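As a minimal sketch of this construction (the weights are the paper's reported values, but the input sentiment scores and the renormalization rule for missing sections are illustrative assumptions, not confirmed details of the authors' method):

```python
# Hypothetical sketch: combine per-section FinBERT-style sentiment into one
# call-level score using the paper's reported speaker weights.
SPEAKER_WEIGHTS = {"analyst": 0.49, "cfo": 0.30, "executive": 0.16, "other": 0.05}

def section_weighted_sentiment(section_scores: dict[str, float]) -> float:
    """Weighted average of per-speaker-section sentiment in [-1, 1].

    Missing sections are skipped and the remaining weights renormalized;
    that handling is an assumption, not something the paper specifies.
    """
    total_w = sum(SPEAKER_WEIGHTS[s] for s in section_scores)
    return sum(SPEAKER_WEIGHTS[s] * v for s, v in section_scores.items()) / total_w

# Illustrative inputs: modestly positive analyst tone, slightly negative CFO tone.
score = section_weighted_sentiment(
    {"analyst": 0.20, "cfo": -0.10, "executive": 0.05, "other": 0.0}
)
```

Because analysts carry nearly half the weight, a positive analyst section dominates the aggregate even when management tone is mixed.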
If this is right
- Uniform aggregation of sentiment across an entire call understates the predictive content because speakers differ in influence.
- The weighted sentiment adds explanatory power beyond standardized unexpected earnings alone.
- Transformer-based sentiment fully accounts for the information in older dictionary methods when both are tested together.
- Prices incorporate the weighted information gradually, producing the observed pattern of cumulative abnormal returns.
- The temporal out-of-sample split yields stronger rather than weaker results, consistent with stable relationships.
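The out-of-sample IC claim in the last bullet can be checked by computing a rank correlation between the signal and forward returns on post-split observations only. A dependency-free sketch of the Spearman statistic (ignoring rank ties, which a production version would need to handle):

```python
def spearman_ic(signal: list[float], fwd_returns: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks.

    No tie handling, which is fine for the illustrative inputs below.
    """
    def ranks(xs: list[float]) -> list[float]:
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0.0] * len(xs)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r

    rx, ry = ranks(signal), ranks(fwd_returns)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

An IC of 0.142 means the cross-sectional ranking of the signal and of subsequent returns agree weakly but persistently, which is a meaningful edge at monthly rebalancing frequencies.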
Where Pith is reading between the lines
- Real-time monitoring tools for investors could prioritize analyst comments during calls to anticipate return patterns.
- Similar speaker-weighting methods applied to other corporate events might identify additional channels of information flow.
- The dominance of analyst speech suggests markets treat external perspectives as more credible signals than internal ones in some contexts.
- Industry or size differences in the optimal weights could be tested to refine application across firm types.
Load-bearing premise
The empirically derived speaker weights remain stable and continue to predict returns in new time periods and market conditions.
What would settle it
Observing that a long-short portfolio sorted on the section-weighted sentiment produces no statistically significant alpha after risk adjustment when tested on data after 2025.
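The risk-adjustment step of that test is a time-series regression of the long-short portfolio's returns on factor returns, with the intercept as alpha. A sketch with plain OLS standard errors (the paper's t = 6.49 uses Newey-West errors, which this omits; the synthetic data below are invented purely to exercise the function):

```python
import numpy as np

def factor_alpha(ls_returns: np.ndarray, factors: np.ndarray) -> tuple[float, float]:
    """OLS alpha (intercept) of a long-short return series on factor returns.

    factors is a (T, k) matrix, e.g. the five Fama-French factors.
    Returns (alpha, t_stat) with homoskedastic OLS standard errors.
    """
    T = len(ls_returns)
    X = np.column_stack([np.ones(T), factors])
    beta, *_ = np.linalg.lstsq(X, ls_returns, rcond=None)
    resid = ls_returns - X @ beta
    dof = T - X.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(X.T @ X)
    return float(beta[0]), float(beta[0] / np.sqrt(cov[0, 0]))

# Synthetic check: 120 months of returns built with a known 2% monthly alpha.
rng = np.random.default_rng(0)
F = rng.normal(0.0, 0.03, size=(120, 5))
loadings = np.array([0.5, 0.2, -0.1, 0.3, 0.0])
r = 0.02 + F @ loadings + rng.normal(0.0, 0.001, 120)
alpha, t = factor_alpha(r, F)
```

On post-2025 data, the settling observation would be an estimated alpha statistically indistinguishable from zero after this adjustment.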
Original abstract
We utilize FinBERT, a domain-specific transformer model, to parse 6.5 million sentences from 16,428 S&P 500 quarterly earnings call transcripts (2015-2025) and demonstrate that post-earnings stock returns are not equally affected by all speakers in a conference call. Our section-weighted sentiment, with empirically derived speaker weights (Analyst 49%, CFO 30%, Executive 16%, Other 5%), achieves an out-of-sample Spearman IC of 0.142 versus 0.115 in-sample, generates monthly long-short alpha of 2.03% unexplained by the Fama-French five-factor model (t = 6.49), and remains significant after controlling for standardized unexpected earnings (SUE). FinBERT section-weighted sentiment entirely subsumes the Loughran-McDonald dictionary approach (FinBERT t = 5.90; LM t = 0.86 in the combined specification). Signal decay analysis and cumulative abnormal return charts confirm gradual price adjustment consistent with sluggish assimilation of soft information. All results undergo rigorous out-of-sample validation with an explicit temporal split, yielding improved rather than deteriorated predictive power.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript analyzes 6.5 million sentences from 16,428 S&P 500 earnings call transcripts (2015-2025) using FinBERT to extract sentiment. It argues that speaker identity matters for post-earnings returns and proposes a section-weighted sentiment measure with empirically derived weights (Analyst 49%, CFO 30%, Executive 16%, Other 5%). This measure achieves an out-of-sample Spearman IC of 0.142 (vs. 0.115 in-sample), produces a monthly long-short alpha of 2.03% (t=6.49) unexplained by Fama-French five factors, remains significant after controlling for SUE, and subsumes the Loughran-McDonald dictionary approach. The results are supported by signal decay analysis and cumulative abnormal return charts, with all claims validated via temporal out-of-sample split.
Significance. If the speaker weights were derived without look-ahead bias, the paper would make a meaningful contribution by showing that soft information in earnings calls is not uniformly informative across speakers and that weighting by speaker identity enhances predictive power for returns. The large-scale application of a domain-specific LLM, combined with rigorous out-of-sample testing, SUE controls, and evidence of alpha generation, positions this as a useful refinement to sentiment-based trading signals in quantitative finance. The finding that FinBERT subsumes LM is particularly notable for advancing beyond dictionary methods.
major comments (1)
- [Abstract] The abstract states that the speaker weights are 'empirically derived' but does not indicate whether this derivation was performed exclusively on the in-sample portion of the temporal split. Since the weights directly determine the section-weighted sentiment used for the reported out-of-sample IC of 0.142 and the alpha of 2.03% (t = 6.49), any use of post-split data in weight optimization would invalidate the out-of-sample claims due to look-ahead bias. The manuscript must specify the optimization procedure (e.g., return regression or IC maximization) and confirm the in-sample restriction.
minor comments (2)
- [Abstract] Abstract: The observation that out-of-sample IC (0.142) exceeds in-sample IC (0.115) is counter to typical expectations for an empirically fitted model and would benefit from explicit discussion or robustness checks in the results section to rule out data peculiarities or overfitting artifacts.
- [Data and Methods] The manuscript should provide the exact temporal split dates (e.g., training end year) and the precise method for assigning sentences to speaker categories to support reproducibility of the 6.5 million sentence corpus.
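The second minor comment matters because speaker-category assignment is upstream of everything else. A hypothetical rule-based mapping from a transcript speaker header to the paper's four categories (the actual assignment rules are not disclosed, so the keywords here are illustrative assumptions):

```python
import re

def classify_speaker(header: str) -> str:
    """Map a transcript speaker line, e.g. 'Jane Doe - Chief Financial
    Officer', to one of the paper's four categories.

    Order matters: the CFO check precedes the generic executive check
    because 'Chief Financial Officer' also contains 'chief'.
    """
    h = header.lower()
    if "analyst" in h:
        return "analyst"
    if re.search(r"\bcfo\b", h) or "chief financial officer" in h:
        return "cfo"
    if re.search(r"\b(ceo|coo|cto)\b", h) or "chief" in h or "president" in h:
        return "executive"
    return "other"
```

Ambiguous headers (operators, IR staff, unattributed lines) fall into "other", which is consistent with that category's small 5% weight.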
Simulated Author's Rebuttal
We thank the referee for their careful review and constructive feedback. We address the concern about potential look-ahead bias in the derivation of speaker weights below and commit to revising the manuscript for greater clarity.
Point-by-point responses
- Referee: [Abstract] The abstract states that the speaker weights are 'empirically derived' but does not indicate whether this derivation was performed exclusively on the in-sample portion of the temporal split. Since the weights directly determine the section-weighted sentiment used for the reported out-of-sample IC of 0.142 and the alpha of 2.03% (t = 6.49), any use of post-split data in weight optimization would invalidate the out-of-sample claims due to look-ahead bias. The manuscript must specify the optimization procedure (e.g., return regression or IC maximization) and confirm the in-sample restriction.
Authors: We agree that the abstract is currently ambiguous on this point and that explicit confirmation is necessary to support the out-of-sample claims. We will revise the abstract to state that the speaker weights are 'empirically derived from in-sample data' and expand the methodology section to describe the optimization procedure (IC maximization on in-sample observations only) while confirming that no post-split data entered the weight derivation. This revision will be incorporated in the next version of the manuscript. revision: yes
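The discipline the authors commit to can be made concrete in a few lines. A hypothetical sketch of look-ahead-safe weight derivation, in which candidate weight sets are scored only on pre-split observations and then frozen (the split date and the candidate/scoring machinery are illustrative; the manuscript states neither):

```python
# Illustrative split date; the manuscript does not state one.
SPLIT_DATE = "2022-01-01"

def derive_weights(observations, candidates, score_fn):
    """Select the candidate weight set with the best in-sample score.

    observations: iterable of (date, features, fwd_return) tuples with
    ISO-format date strings, so string comparison orders them correctly.
    score_fn(weights, rows): e.g. an in-sample Spearman IC of the
    weighted signal against forward returns.

    Only pre-split rows enter the selection, so the frozen weights carry
    no information from the out-of-sample window.
    """
    in_sample = [o for o in observations if o[0] < SPLIT_DATE]
    return max(candidates, key=lambda w: score_fn(w, in_sample))
```

Any out-of-sample IC or alpha reported afterward is then a genuine prediction rather than an echo of the fit.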
Circularity Check
Empirically derived speaker weights risk look-ahead bias in OOS IC and alpha claims
specific steps
- Fitted input called prediction [Abstract]:
"Our section-weighted sentiment, with empirically derived speaker weights (Analyst 49%, CFO 30%, Executive 16%, Other 5%), achieves an out-of-sample Spearman IC of 0.142 versus 0.115 in-sample, generates monthly long-short alpha of 2.03% unexplained by the Fama-French five-factor model (t = 6.49), and remains significant after controlling for standardized unexpected earnings (SUE)."
Speaker weights are obtained by empirical fitting on the transcript/return data. The paper then reports OOS performance metrics and alpha as validation of the weighted signal. Without an explicit restriction that weight derivation used only the pre-split in-sample period, the OOS IC, alpha, and LM-subsumption statistics are not independent predictions but are statistically influenced by the fitting step itself.
full rationale
The paper's central result is the section-weighted sentiment signal using fixed speaker weights (Analyst 49%, CFO 30%, Executive 16%, Other 5%). These weights are described as 'empirically derived' and then applied to produce the headline OOS Spearman IC of 0.142, monthly alpha of 2.03% (t=6.49), and subsumption of LM. The abstract and claims invoke an explicit temporal split for 'rigorous out-of-sample validation,' but provide no statement that weight optimization (by regression, IC maximization, or similar) was confined to the in-sample window. This matches the fitted-input-called-prediction pattern: the load-bearing parameter is tuned on the data, after which the tuned output is presented as independent predictive power. No other circular steps (self-definitional, self-citation load-bearing, ansatz smuggling, or renaming) appear in the provided text. The FinBERT vs. LM comparison and factor-adjusted alpha are downstream of the weights and inherit the same dependence.
Axiom & Free-Parameter Ledger
free parameters (1)
- Speaker weights = Analyst 49%, CFO 30%, Executive 16%, Other 5%
axioms (2)
- domain assumption FinBERT accurately captures financial sentiment in earnings call transcripts
- domain assumption Temporal data split prevents information leakage
Reference graph
Works this paper leans on
- [1] Huang, A. H., Wang, H., and Yang, Y. (2023). "FinBERT: A Large Language Model for Extracting Information from Financial Text." Contemporary Accounting Research, 40(2), 806–841.
- [2] Loughran, T. and McDonald, B. (2011). "When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks." The Journal of Finance, 66(1), 35–65.
- [3] Tetlock, P. C. (2007). "Giving Content to Investor Sentiment: The Role of Media in the Stock Market." The Journal of Finance, 62(3), 1139–1168.
- [4] Tetlock, P. C., Saar-Tsechansky, M., and Macskassy, S. (2008). "More Than Words: Quantifying Language to Measure Firms' Fundamentals." The Journal of Finance, 63(3), 1437–1467.
- [5] Jegadeesh, N. and Wu, D. (2013). "Word Power: A New Approach for Content Analysis." Journal of Financial Economics, 110(3), 712–729.
- [6] Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." Proceedings of NAACL-HLT, 4171–4186.
- [7] Bowen, R. M., Davis, A. K., and Matsumoto, D. A. (2002). "Do Conference Calls Affect Analysts' Forecasts?" The Accounting Review, 77(2), 285–316.
- [8] Matsumoto, D., Pronk, M., and Roelofsen, E. (2011). "What Makes Conference Calls Useful? The Information Content of Managers' Presentations and Analysts' Discussion Sessions." The Accounting Review, 86(4), 1383–1414.
- [9] Ball, R. and Brown, P. (1968). "An Empirical Evaluation of Accounting Income Numbers." Journal of Accounting Research, 6(2), 159–178.
- [10] Bernard, V. L. and Thomas, J. K. (1989). "Post-Earnings-Announcement Drift: Delayed Price Response or Risk Premium?" Journal of Accounting Research, 27, 1–36.
- [11] Jiang, F., Lee, J., Martin, X., and Zhou, G. (2019). "Manager Sentiment and Stock Returns." Journal of Financial Economics, 132(1), 126–149.
- [12] Fama, E. F. and French, K. R. (2015). "A Five-Factor Asset Pricing Model." Journal of Financial Economics, 116(1), 1–22.
- [13] Newey, W. K. and West, K. D. (1987). "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix." Econometrica, 55(3), 703–708.
- [14] Fama, E. F. and MacBeth, J. D. (1973). "Risk, Return, and Equilibrium: Empirical Tests." Journal of Political Economy, 81(3), 607–636.
- [15] Chen, T. and Guestrin, C. (2016). "XGBoost: A Scalable Tree Boosting System." Proceedings of the 22nd ACM SIGKDD, 785–794.
- [16] Lundberg, S. M. and Lee, S.-I. (2017). "A Unified Approach to Interpreting Model Predictions." Advances in Neural Information Processing Systems, 30, 4766–4777.