FedMental: Evaluating Federated Learning for Mental Health Detection from Social Media Data

Anjali Ratnam; Nuredin Ali Abdelkadir; Stevie Chancellor; Zeerak Talat

arxiv: 2605.18936 · v2 · pith:BDBFO3BDnew · submitted 2026-05-18 · 💻 cs.LG · cs.CL



Pith reviewed 2026-05-20 12:59 UTC · model grok-4.3

classification 💻 cs.LG cs.CL

keywords federated learning · differential privacy · mental health detection · social media analysis · depression prediction · privacy trade-offs · non-IID data


The pith

Federated learning achieves nearly the same accuracy as centralized training for detecting depression from social media posts, but adding differential privacy causes major drops in performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors test whether federated learning can train models to detect mental health risks such as depression and suicide crisis on social media without sharing users' raw posts. In simulations where each user is a separate data holder, standard federated learning performs close to a model trained on all the data pooled together. Adding differential privacy noise, however, sharply degrades results even at a loose privacy budget (ε = 50). The drop happens because the noise distorts the infrequent but highly informative words and topics that mark mental health concerns. This highlights both the promise and the current limits of privacy tools for sensitive prediction tasks.

Core claim

While federated learning achieves comparable performance to centralized training on depression identification from X posts, differentially private federated learning suffers a large performance-privacy trade-off due to the distortion of highly informative yet sparse mental health linguistic markers such as health topics and emotion words.

What carries the argument

Treating each user as a client in a non-IID data partition, with differential privacy noise added to model updates during federated aggregation.
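The mechanism described above can be sketched in a few lines. This is a minimal illustration, assuming the common clip-then-add-Gaussian-noise recipe for differentially private federated averaging. The summary does not say which exact mechanism, clipping norm, or noise scale the authors used, so `clip_norm`, `noise_multiplier`, and both function names are hypothetical.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale an update so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def dp_fedavg_round(global_weights, client_updates, clip_norm, noise_multiplier, rng):
    """One aggregation round: clip each client's update, average, add Gaussian noise.

    Each entry of client_updates stands for one user's model delta
    (assumption: one user = one client, as in the setup described above).
    """
    n = len(client_updates)
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    dim = len(global_weights)
    averaged = [sum(u[i] for u in clipped) / n for i in range(dim)]
    # Noise std on the sum is noise_multiplier * clip_norm; dividing by n
    # gives the std on the averaged update.
    sigma = noise_multiplier * clip_norm / n
    noisy = [a + rng.gauss(0.0, sigma) for a in averaged]
    return [w + d for w, d in zip(global_weights, noisy)]


rng = random.Random(0)
new_weights = dp_fedavg_round(
    global_weights=[0.0, 0.0],
    client_updates=[[3.0, 4.0], [0.0, 0.0]],
    clip_norm=1.0,
    noise_multiplier=1.0,
    rng=rng,
)
print(new_weights)
```

Setting `noise_multiplier=0.0` recovers plain federated averaging of the clipped updates, which makes it easy to isolate how much of any degradation comes from the noise rather than from clipping.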

If this is right

Standard federated learning can support mental health model training with minimal accuracy loss compared to centralized approaches.
Differentially private federated learning introduces substantial accuracy costs for tasks that rely on sparse linguistic features.
The sparse nature of mental health indicators in text makes them vulnerable to privacy-preserving noise.
Evaluation across varying client fractions and privacy budgets reveals consistent patterns in performance trade-offs for these tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar performance issues may arise in other prediction tasks that depend on rare but diagnostic text features, such as certain types of content moderation.
Practitioners might need to explore privacy methods that better preserve sparse signals or combine federated learning with other protections.
Testing on additional mental health datasets could clarify how much the observed drops depend on the specific characteristics of Twitter and Reddit data.

Load-bearing premise

That modeling each user as an isolated client in a non-IID partition reflects realistic privacy-preserving data sharing for mental health inference and that the results extend to other datasets and client settings.

What would settle it

Running the same differentially private federated learning experiments on a mental health detection task where the key predictive features are dense rather than sparse, and observing no significant performance drop.
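The dense-versus-sparse contrast behind this test can be illustrated with a back-of-the-envelope signal-to-noise calculation. The sketch below is a simplification, assuming that a feature's coordinate in the averaged update scales with the fraction of clients whose text contains it, and that the privacy noise has the same standard deviation on every coordinate. All numbers and function names are illustrative and not taken from the paper.

```python
def averaged_signal(feature_client_fraction, per_client_magnitude):
    """Expected size of one coordinate of the averaged update.

    Only clients whose text contains the feature contribute to that
    coordinate, so the average shrinks with the feature's rarity.
    """
    return feature_client_fraction * per_client_magnitude


def signal_to_noise(feature_client_fraction, per_client_magnitude, noise_std):
    """Ratio of the averaged signal to the per-coordinate noise std."""
    return averaged_signal(feature_client_fraction, per_client_magnitude) / noise_std


# Illustrative values only, not taken from the paper.
dense = signal_to_noise(0.80, 0.5, noise_std=0.05)
sparse = signal_to_noise(0.02, 0.5, noise_std=0.05)
print(f"dense SNR ~ {dense:.1f}, sparse SNR ~ {sparse:.1f}")
```

Under these made-up numbers the dense feature sits well above the noise while the sparse one falls below it, which is the pattern the proposed experiment would confirm or rule out.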

Figures

Figures reproduced from arXiv: 2605.18936 by Anjali Ratnam, Nuredin Ali Abdelkadir, Stevie Chancellor, Zeerak Talat.

**Figure 3.** The violin plot illustrates the word count dis

**Figure 4.** The violin plot illustrates the word count dis

**Figure 2.** The violin plot illustrates the word count

**Figure 5.** The violin plot illustrates the word count

Original abstract

Social media text data are often used to train Machine Learning (ML) models to identify users exhibiting high-risk mental health behaviors. However, sharing this sensitive data poses privacy risks and limits the growth of benchmark datasets. We comprehensively evaluate whether privacy-preserving ML techniques can enable safer data sharing while preserving performance. Specifically, we apply federated learning (FL) and Differentially Private FL for two widely-studied mental health prediction tasks: depression detection on X (Twitter) and suicide crisis detection on Reddit. We simulate realistic data-sharing scenarios by treating each user as a client in a non-IID setting, evaluating across different client fractions, aggregation strategies, and privacy budgets. While FL achieves comparable performance to centralized training (centralized F1 = 85.63; best FL model F1 = 83.16) on depression identification, we find that Differentially Private FL has a large performance-privacy trade-off (up to F1 = 27.01 drop) even with low levels of noise (epsilon = 50). This is due to the distortion of highly informative yet sparse mental health linguistic markers related to mental health, like health topics and emotion words. This research empirically demonstrates the potential and limitations of current privacy preservation techniques for mental health inference tasks.
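For readers less familiar with the privacy budget quoted in the abstract, the standard definition of (ε, δ)-differential privacy is reproduced here as background. It is textbook material rather than a formula from the paper, and the abstract does not state which δ was used. A randomized mechanism $\mathcal{M}$ satisfies the definition if, for all datasets $D$ and $D'$ that differ in one record (or, in user-level settings, one user's data) and all sets of outputs $S$:

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\, \Pr[\mathcal{M}(D') \in S] + \delta
```

With ε = 50 the factor $e^{\varepsilon}$ is on the order of $10^{21}$, so the bound is very loose. That is why the abstract describes this setting as a low level of noise.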

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

FL stays close to centralized training on these tasks but DP-FL takes a big hit even at loose privacy budgets, with the drop tied to sparse markers.


The paper shows that federated learning can get close to centralized performance on depression detection from X posts, but adding differential privacy leads to big drops in F1 scores even with relatively high epsilon values. They link this to how the added noise distorts sparse but key linguistic markers in the text.

What stands out is the empirical evaluation across two tasks, with users treated as clients in non-IID partitions. They vary the fraction of clients and try different aggregation methods, which gives a sense of how the results hold up under different conditions. This provides new measurements for these specific mental health inference problems that weren't in the earlier work they cite. The setup does a decent job of simulating privacy-preserving scenarios without sharing raw data. The numbers are concrete and the attribution to sparse markers is plausible given the nature of the signals.

That said, the simulation of each user as an independent client might not fully reflect real participation in mental health data sharing. Actual users could have correlated behaviors or uneven activity levels that a standard non-IID split doesn't capture, which could affect how much the privacy mechanism impacts performance. Without more details on variance or multiple runs, it's also a bit hard to gauge how stable those 27-point drops really are.

This work is aimed at people studying privacy in machine learning for sensitive applications like mental health. Anyone thinking about whether federated approaches can scale to sparse text data will get value from the trade-off numbers. It deserves peer review because the question is practical and the results add to the evidence base, though the generalizability of the simulation is something referees should check.

Referee Report

3 major / 2 minor

Summary. The manuscript evaluates federated learning (FL) and differentially private FL (DP-FL) for mental health detection tasks on social media data: depression identification on X and suicide crisis detection on Reddit. By simulating each user as an independent client under non-IID partitions and testing varying client fractions, aggregation methods, and privacy budgets, the authors report that FL achieves near-centralized performance (best FL F1=83.16 vs centralized 85.63) while DP-FL incurs large drops (up to 27 F1 points) even at ε=50, which they attribute to distortion of sparse but informative linguistic markers such as health topics and emotion words.

Significance. If the empirical results prove robust, the work usefully quantifies the performance-privacy tension for DP-FL on sparse-feature mental-health tasks and could guide more targeted privacy research or deployment decisions in sensitive social-media inference settings.

major comments (3)

[§4 Experimental Setup] The reported F1 scores (e.g., centralized 85.63, best FL 83.16, DP-FL drops of up to 27.01 points) are given as point estimates without error bars, standard deviations, or results across multiple random seeds or data shuffles. This weakens the ability to assess whether the claimed 27-point DP-FL degradation is statistically reliable or sensitive to partitioning stochasticity.
[§3.2 Data Partitioning and Client Simulation] The central performance-privacy conclusion rests on treating each user as a client in a non-IID split. No sensitivity analysis or alternative partitioning schemes (e.g., incorporating temporal posting correlations or platform-specific activity rates) are presented, leaving open whether the observed DP-FL drops are intrinsic or artifacts of this particular simulation.
[§5 Results and Discussion] The attribution of DP-FL degradation specifically to “distortion of highly informative yet sparse mental health linguistic markers” is stated qualitatively. No supporting quantitative evidence (such as pre-/post-DP feature importance rankings, marker frequency shifts, or ablation on marker subsets) is provided to substantiate the causal link.

minor comments (2)

[Abstract] Specify the exact configuration (client fraction, aggregation strategy, task) that produces the maximum reported F1 drop of 27.01.
Figure and table captions should explicitly state the number of runs or seeds used so readers can interpret variance.
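The multi-seed reporting requested above could look like the following minimal sketch. It uses only the Python standard library; the function names and the placeholder scores are hypothetical and are not results from the paper.

```python
import statistics


def summarize_runs(f1_scores):
    """Return (mean, sample standard deviation) of F1 scores across seeds."""
    if len(f1_scores) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    return statistics.mean(f1_scores), statistics.stdev(f1_scores)


def format_result(f1_scores):
    """Format scores as 'mean ± std (n=runs)' for a table cell or caption."""
    mean, std = summarize_runs(f1_scores)
    return f"{mean:.2f} ± {std:.2f} (n={len(f1_scores)})"


# Placeholder scores for illustration only; these are not results from the paper.
placeholder_scores = [80.0, 82.0, 84.0]
print(format_result(placeholder_scores))
```

Reporting the number of runs next to the mean and spread lets a reader judge whether a gap between two configurations is larger than the seed-to-seed variation.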

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help improve the clarity and robustness of our empirical evaluation. We address each major point below and indicate the corresponding revisions.


Referee: [§4 Experimental Setup] The reported F1 scores (e.g., centralized 85.63, best FL 83.16, DP-FL drops of up to 27.01 points) are given as point estimates without error bars, standard deviations, or results across multiple random seeds or data shuffles. This weakens the ability to assess whether the claimed 27-point DP-FL degradation is statistically reliable or sensitive to partitioning stochasticity.

Authors: We agree that single-run point estimates limit assessment of statistical reliability. In the revised manuscript we will report means and standard deviations over at least five independent random seeds for the main centralized, FL, and DP-FL configurations, and we will add error bars to the primary result tables and figures.
Revision: yes

Referee: [§3.2 Data Partitioning] The central performance-privacy conclusion rests on treating each user as a client in a non-IID split. No sensitivity analysis or alternative partitioning schemes (e.g., incorporating temporal posting correlations or platform-specific activity rates) are presented, leaving open whether the observed DP-FL drops are intrinsic or artifacts of this particular simulation.

Authors: The per-user client model directly reflects the privacy constraint that each individual’s posts cannot be shared. We will add a limitations paragraph acknowledging that alternative groupings (e.g., by posting frequency or temporal windows) could be explored in future work; however, re-running the full experimental suite under new partitions is beyond the scope of the current revision.
Revision: partial

Referee: [§5 Results and Discussion] The attribution of DP-FL degradation specifically to “distortion of highly informative yet sparse mental health linguistic markers” is stated qualitatively. No supporting quantitative evidence—such as pre-/post-DP feature importance rankings, marker frequency shifts, or ablation on marker subsets—is provided to substantiate the causal link.

Authors: We accept that the current explanation is qualitative. In the revision we will add a short quantitative subsection that reports (i) the top-20 features ranked by importance before and after noise injection and (ii) the change in frequency of health- and emotion-related tokens, thereby providing concrete support for the claimed mechanism.
Revision: yes
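
One possible way to quantify the before-and-after ranking comparison promised in the last response is a top-k overlap score. The sketch below is an assumption about how such an analysis might be set up, not the authors' planned method; the feature names and weights are hypothetical.

```python
def top_k(importance, k):
    """Return the set of the k features with the largest absolute importance."""
    ranked = sorted(importance, key=lambda name: abs(importance[name]), reverse=True)
    return set(ranked[:k])


def top_k_overlap(before, after, k):
    """Fraction of the top-k features before noise that remain in the top-k after."""
    return len(top_k(before, k) & top_k(after, k)) / k


# Hypothetical feature names and weights, for illustration only.
before = {"feature_a": 3.0, "feature_b": 2.0, "feature_c": 1.0}
after = {"feature_a": 3.0, "feature_b": 1.0, "feature_c": 2.0}
print(top_k_overlap(before, after, k=2))
```

An overlap near 1 would suggest the noise leaves the most important features in place, while a low overlap would be consistent with the claim that informative markers are being displaced.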

Circularity Check

0 steps flagged

No circularity: empirical measurements of FL vs. DP-FL performance


The paper is a pure empirical evaluation study. It reports measured F1 scores from running centralized training, standard FL, and DP-FL on fixed Reddit/X datasets under explicit non-IID user-as-client partitions. The key numbers (centralized F1 = 85.63, best FL F1 = 83.16, DP-FL drops up to 27.01 at ε=50) are direct experimental outputs, not quantities derived from fitted parameters, self-referential definitions, or prior self-citations. No equations, uniqueness theorems, or ansatzes are invoked; the attribution to “sparse mental health linguistic markers” is a post-hoc interpretation of observed results rather than a load-bearing derivation. The non-IID simulation is a methodological choice whose consequences are measured, not presupposed.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The evaluation rests on standard federated learning assumptions about non-IID user data partitions and the informativeness of the chosen linguistic features under noise; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: User-level data partitions produce realistic non-IID distributions that reflect privacy constraints in social media mental health data.
Invoked when treating each user as a client to simulate data-sharing scenarios.

