Federated Learning for Mobile Keyboard Prediction

Andrew Hard; Chloé Kiddon; Daniel Ramage; Françoise Beaufays; Hubert Eichner; Kanishka Rao; Rajiv Mathews; Sean Augenstein; Swaroop Ramaswamy

arxiv: 1811.03604 · v2 · pith:ZDLKFQVTnew · submitted 2018-11-08 · 💻 cs.CL

classification 💻 cs.CL

keywords: federated, training, learning, client, devices, prediction, algorithm, data

Abstract

We train a recurrent neural network language model using a distributed, on-device learning framework called federated learning for the purpose of next-word prediction in a virtual keyboard for smartphones. Server-based training using stochastic gradient descent is compared with training on client devices using the Federated Averaging algorithm. The federated algorithm, which enables training on a higher-quality dataset for this use case, is shown to achieve better prediction recall. This work demonstrates the feasibility and benefit of training language models on client devices without exporting sensitive user data to servers. The federated learning environment gives users greater control over the use of their data and simplifies the task of incorporating privacy by default with distributed training and aggregation across a population of client devices.
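
The abstract names the Federated Averaging algorithm without showing its mechanics. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it swaps the recurrent language model for a toy one-dimensional linear model, uses synthetic data, and simulates clients in a single process. The function names (`local_sgd`, `federated_averaging`, `make_clients`) and all hyperparameters are hypothetical. Each round, a sampled subset of clients runs SGD locally from the current global weights, and the server averages the returned weights in proportion to each client's example count, so raw data never leaves the client.

```python
import random


def local_sgd(weights, data, lr=0.1, epochs=1):
    """Run plain SGD on one client's data for the toy model y = w*x + b."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y   # gradient of 0.5 * err**2 w.r.t. the prediction
            w -= lr * err * x
            b -= lr * err
    return (w, b)


def federated_averaging(clients, rounds=100, clients_per_round=3, lr=0.1, seed=0):
    """Toy Federated Averaging loop (all parameter values are assumptions).

    `clients` is a list of per-client datasets; only model weights,
    never the (x, y) pairs, are passed back to the simulated server.
    """
    rng = random.Random(seed)
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        selected = rng.sample(clients, clients_per_round)
        total_examples = sum(len(data) for data in selected)
        avg_w, avg_b = 0.0, 0.0
        for data in selected:
            w, b = local_sgd(global_weights, data, lr=lr)
            share = len(data) / total_examples   # weight by local example count
            avg_w += share * w
            avg_b += share * b
        global_weights = (avg_w, avg_b)
    return global_weights


def make_clients(num_clients=10, seed=1):
    """Build synthetic client datasets drawn from y = 2x + 1 (illustrative only)."""
    rng = random.Random(seed)
    clients = []
    for _ in range(num_clients):
        n = rng.randint(5, 20)   # clients hold different amounts of data
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        clients.append([(x, 2.0 * x + 1.0) for x in xs])
    return clients


if __name__ == "__main__":
    w, b = federated_averaging(make_clients())
    print(f"learned w={w:.3f}, b={b:.3f} (data generated with w=2, b=1)")
```

In this simulation the server-side counterpart would be a single call to `local_sgd` on the pooled data; the federated loop differs only in where the updates are computed and in the averaging step.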

Forward citations

Cited by 22 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

When More Parameters Hurt: Foundation Model Priors Amplify Worst-Client Disparity Under Extreme Federated Heterogeneity
cs.LG 2026-05 unverdicted novelty 7.0

Foundation model priors amplify worst-client disparity under extreme federated heterogeneity, creating a fairness paradox where larger models perform worse for disadvantaged clients.
Unified Compression Algorithm for Distributed Nonconvex Optimization: Generalized to 1-Bit, Saturation, and Bounded Noise
math.OC 2026-04 unverdicted novelty 7.0

A unified compression algorithm for distributed nonconvex optimization achieves O(1/sqrt(T)) convergence for locally-bounded compressors, matching centralized 1-bit methods, with an improved O(1/T^{2/3}) rate after on...
XFED: Non-Collusive Model Poisoning Attack Against Byzantine-Robust Federated Classifiers
cs.CR 2026-04 unverdicted novelty 7.0

XFED is the first aggregation-agnostic non-collusive model poisoning attack that bypasses eight state-of-the-art defenses on six benchmark datasets without attacker coordination.
Distributed Online Convex Optimization with Compressed Communication: Optimal Regret and Applications
cs.LG 2026-04 unverdicted novelty 7.0

Optimal regret bounds O(δ^{-1/2}√T) for convex and O(δ^{-1} log T) for strongly convex losses are achieved in distributed online convex optimization under compressed communication.
Beyond Corner Patches: Semantics-Aware Backdoor Attack in Federated Learning
cs.CR 2026-03 unverdicted novelty 7.0

SABLE shows that semantics-aware natural triggers enable effective backdoor attacks in federated learning against multiple aggregation rules while preserving benign accuracy.
Simulating Word Suggestion Usage in Mobile Typing to Guide Intelligent Text Entry Design
cs.HC 2026-02 unverdicted novelty 7.0

WSTypist is a new RL-based simulation model that reproduces human-like word suggestion strategies, individual differences, and adaptation to design changes in mobile text entry.
Tighter Performance Theory of FedExProx
math.OC 2024-10 unverdicted novelty 7.0

New analysis framework yields tighter linear convergence for FedExProx on non-strongly convex quadratics and PL functions, proving outperformance over GD once communication costs are counted.
Factor Augmented High-Dimensional SGD
stat.ML 2026-05 unverdicted novelty 6.0

Proposes Factor-Augmented SGD that runs on streaming high-dimensional data and supplies the first convergence analysis explicitly accounting for latent-factor estimation error.
Rescaled Asynchronous SGD: Optimal Distributed Optimization under Data and System Heterogeneity
cs.LG 2026-05 unverdicted novelty 6.0

Rescaled ASGD recovers convergence to the true global objective by rescaling worker stepsizes proportional to computation times, matching the known time lower bound in the leading term under non-convex smoothness and ...
Analytically Characterized Optimal Power Control for Signal-Level-Integrated Sensing, Computing and Communication in Federated Learning
cs.IT 2026-04 unverdicted novelty 6.0

An optimal convex-reformulated power control algorithm is derived for signal-level integrated sensing, computing and communication in AirComp-based federated learning under a joint target detection constraint.
FedACT: Concurrent Federated Intelligence across Heterogeneous Data Sources
cs.LG 2026-03 unverdicted novelty 6.0

FedACT schedules devices across concurrent FL jobs via alignment scoring and fairness to reduce average job completion time by up to 8.3x and raise accuracy by up to 44.5% versus baselines.
DeepFedNAS: Efficient Hardware-Aware Architecture Adaptation for Heterogeneous IoT Federations via Pareto-Guided Supernet Training
cs.LG 2026-01 unverdicted novelty 6.0

DeepFedNAS delivers up to 1.21% higher accuracy and 61x faster architecture search for federated learning on heterogeneous IoT by replacing random supernet sampling with Pareto-optimal elite architectures and using a ...
Prompt Estimation from Prototypes for Federated Prompt Tuning of Vision Transformers
cs.CV 2025-10 unverdicted novelty 6.0

PEP-FedPT achieves generalization and personalization in federated ViT prompt tuning via adaptive mixing of class-specific prompts weighted by global class prototypes and client priors, without per-client trainable pa...
Privacy Against Agnostic Inference Attacks in Vertical Federated Learning
cs.CR 2023-02 unverdicted novelty 6.0

Active party in VFL performs agnostic inference attacks via independent models on logistic regression and counters them with tunable distortion of passive-party parameters.
Adaptive Federated Optimization
cs.LG 2020-02 unverdicted novelty 6.0

Proposes federated adaptive optimizers (FedAdagrad, FedAdam, FedYogi) with convergence analysis for non-convex objectives under data heterogeneity and reports empirical gains over FedAvg.
FedEDAuth -- Federated Embedding Distribution Authentication for Counterfeit IC Detection
cs.CR 2026-05 unverdicted novelty 5.0

FedEDAuth filters malicious clients in federated learning for counterfeit IC detection by analyzing embedding distributions from a golden reference, achieving 100% detection of poisoned clients and 94.17% model accura...
HUOZIIME: An On-Device LLM-enhanced Input Method for Deep Personalization
cs.CL 2026-03 unverdicted novelty 5.0

HUOZIIME is an on-device LLM-powered input method with post-training on synthesized data and hierarchical memory that achieves efficient execution and memory-driven personalization.
REVERB-FL: Server-Side Adversarial and Reserve-Enhanced Federated Learning for Robust Audio Classification
eess.AS 2025-12 unverdicted novelty 5.0

REVERB-FL uses a server-side reserve set with retraining and adversarial training to reduce poisoning effects and speed convergence in federated audio classification under non-IID data.
LADSG: Label-Anonymized Distillation and Similar Gradient Substitution for Label Privacy in Vertical Federated Learning
cs.CR 2025-06 unverdicted novelty 5.0

LADSG is a unified defense framework that reduces success rates of passive, active, and direct label inference attacks in VFL by 30-60% via label anonymization, gradient substitution, and norm-based filtering.
BoBa: Boosting Backdoor Detection through Data Distribution Inference in Federated Learning
cs.LG 2024-07 unverdicted novelty 5.0

BoBa uses data distribution inference and overlapping clustering with voting to detect backdoor attacks in non-IID federated learning, claiming attack success rates below 0.001.
Federated Learning by Utility-Constrained Stochastic Aggregation for Improving Rational Participation
cs.LG 2026-05 unverdicted novelty 4.0

FedUCA formalizes the server as an optimizer that uses utility-constrained stochastic aggregation to maximize client retention and global performance in heterogeneous federated learning.
Split and Aggregation Learning for Foundation Models Over Mobile Embodied AI Network (MEAN): A Comprehensive Survey
cs.IT 2026-05 unverdicted novelty 3.0

The paper surveys split and aggregation learning for foundation models in 6G networks to improve efficiency, resource use, and data privacy in distributed AI.