arxiv: 1610.05492 · v2 · submitted 2016-10-18 · 💻 cs.LG


Federated Learning: Strategies for Improving Communication Efficiency

Ananda Theertha Suresh, Dave Bacon, Felix X. Yu, H. Brendan McMahan, Jakub Konečný, Peter Richtárik

Pith reviewed 2026-05-12 13:38 UTC · model grok-4.3

classification 💻 cs.LG

keywords federated learning · communication efficiency · structured updates · sketched updates · model compression · distributed training · mobile devices


The pith

Federated learning trains high-quality models on mobile devices while reducing uplink communication by up to 100 times.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tackles training a single centralized model when data stays on many client devices that have slow or unreliable connections. Rather than sending complete model updates each round, it introduces structured updates that learn changes directly from a smaller space using low-rank factors or random masks, and sketched updates that first compute a full update then compress it with quantization, random rotations, and subsampling. Experiments on convolutional and recurrent networks show these techniques maintain convergence to high accuracy while cutting the data sent from clients to the server by two orders of magnitude. This matters because typical clients are phones whose bandwidth limits would otherwise prevent practical federated training.
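To make the low-rank idea concrete, here is a minimal sketch. It assumes the update to a d1 × d2 weight matrix is written as H = A·B, that the factor A is fixed and regenerated from a seed shared by client and server, and that only the learned factor B travels over the uplink. The function names, the seed handling, and the toy dimensions are illustrative assumptions, not the authors' code.

```python
import random

def random_factor(d1, k, seed):
    """Regenerate the fixed factor A (d1 x k) from a seed shared by client and server."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(d1)]

def matmul(a, b):
    """Plain-Python matrix product, adequate for a toy example."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def uplink_floats(d1, d2, k):
    """Floats sent per round: the full update versus only the factor B (k x d2)."""
    return d1 * d2, k * d2

d1, d2, k, seed = 8, 6, 2, 1234   # hypothetical sizes and seed
A = random_factor(d1, k, seed)    # never transmitted; only the seed is shared
B = [[0.1 * (i + j) for j in range(d2)] for i in range(k)]  # stand-in for the learned factor
H = matmul(A, B)                  # server-side reconstruction, rank at most k
full, sent = uplink_floats(d1, d2, k)
print(full, sent)
```

Under these assumptions the uplink cost scales with k × d2 rather than d1 × d2, so the saving grows as the rank k shrinks relative to d1.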

Core claim

The paper shows that restricting updates to a low-rank or randomly masked subspace, or compressing full updates through quantization, random rotations, and subsampling, lets the server aggregate client changes into a global model that reaches high quality, while the amount of data transmitted per round drops by roughly two orders of magnitude.

What carries the argument

Structured updates that parametrize the model change with far fewer variables via low-rank factorization or a random mask, together with sketched updates that compress a full update using quantization, random rotations, and subsampling before uplink transmission.
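The random-mask variant can be illustrated the same way. The sketch below assumes the sparsity pattern is derived from a seed known to both sides, so the client sends only the values at the masked positions. It shows the encoding and decoding only, not the local training that would restrict the update to the mask. All names and the density value are hypothetical.

```python
import random

def mask_indices(num_params, density, seed):
    """Pick which coordinates may change; reproducible from the seed alone."""
    rng = random.Random(seed)
    keep = max(1, round(density * num_params))
    return sorted(rng.sample(range(num_params), keep))

def client_encode(update, density, seed):
    """Client side: send only the values at the masked positions."""
    idx = mask_indices(len(update), density, seed)
    return [update[i] for i in idx]

def server_decode(values, num_params, density, seed):
    """Server side: rebuild the sparse update from the values and the seed."""
    idx = mask_indices(num_params, density, seed)
    full = [0.0] * num_params
    for i, v in zip(idx, values):
        full[i] = v
    return full

n, density, seed = 16, 0.25, 7    # hypothetical size, mask density, and seed
idx = mask_indices(n, density, seed)
# Stand-in for an update that training has confined to the masked positions.
update = [float(i + 1) if i in idx else 0.0 for i in range(n)]
values = client_encode(update, density, seed)
restored = server_decode(values, n, density, seed)
```

Because the pattern is regenerated from the seed, no index list needs to be transmitted in this sketch; the payload is just `density × n` values.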

Load-bearing premise

The restricted or compressed updates must still carry enough information for the aggregated global model to converge to high accuracy rather than stalling at a poor solution.

What would settle it

Re-running the convolutional and recurrent network experiments and finding that final test accuracy falls more than a few percent below the full-update baseline would show the compression loses too much signal.

Original abstract

Federated Learning is a machine learning setting where the goal is to train a high-quality centralized model while training data remains distributed over a large number of clients each with unreliable and relatively slow network connections. We consider learning algorithms for this setting where on each round, each client independently computes an update to the current model based on its local data, and communicates this update to a central server, where the client-side updates are aggregated to compute a new global model. The typical clients in this setting are mobile phones, and communication efficiency is of the utmost importance. In this paper, we propose two ways to reduce the uplink communication costs: structured updates, where we directly learn an update from a restricted space parametrized using a smaller number of variables, e.g. either low-rank or a random mask; and sketched updates, where we learn a full model update and then compress it using a combination of quantization, random rotations, and subsampling before sending it to the server. Experiments on both convolutional and recurrent networks show that the proposed methods can reduce the communication cost by two orders of magnitude.
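The sketched-update pipeline described in the abstract can be illustrated with a toy example. The following is a minimal sketch, assuming uniform random subsampling with rescaling followed by 1-bit probabilistic quantization to the minimum and maximum of the kept values; the rotation step is omitted. Function names and the example vector are hypothetical, and in a real system the index set would more plausibly be derived from a shared seed than returned explicitly.

```python
import random

def sketch(update, fraction, rng):
    """Client side: subsample, rescale, then quantize each kept value to one bit."""
    n = len(update)
    m = max(1, round(fraction * n))
    idx = sorted(rng.sample(range(n), m))       # random subset of coordinates
    scaled = [update[i] * n / m for i in idx]   # rescale so the estimate stays unbiased
    lo, hi = min(scaled), max(scaled)
    span = hi - lo
    # A bit is set with probability proportional to the distance from lo.
    bits = [span > 0 and rng.random() < (v - lo) / span for v in scaled]
    return idx, bits, lo, hi

def unsketch(idx, bits, lo, hi, n):
    """Server side: expand the bits back into a full-length estimate."""
    full = [0.0] * n
    for i, b in zip(idx, bits):
        full[i] = hi if b else lo
    return full

rng = random.Random(0)
update = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 1.0, 0.25]   # hypothetical full update
idx, bits, lo, hi = sketch(update, 0.5, rng)
estimate = unsketch(idx, bits, lo, hi, len(update))
```

A single estimate is very noisy, but each coordinate equals the true value in expectation, which is the property that lets server-side averaging recover a usable signal.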

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This paper gives two practical compression methods for federated learning that cut per-round uplink cost by roughly 100x on convolutional and recurrent networks, but the experiments do not show whether final accuracy or total rounds stay comparable to the full-update baseline. Structured updates restrict the local change to a low-rank or random-mask form from the start. Sketched updates compute the full update, then apply quantization, random rotation, and subsampling before sending. Both are aimed at mobile clients with slow, unreliable links, which is the core constraint in this setting. The combination is new relative to earlier distributed optimization work that did not focus on uplink compression under client churn. The abstract reports positive outcomes on standard networks, which is useful evidence that the ideas are at least implementable.

The soft spots are straightforward. The headline savings are per round only. If the restricted or lossy updates cause slower convergence or lower final quality, the total communication budget could end up no better than the baseline. No theory is given that the low-rank subspace or the sketch preserves the gradient information needed for good aggregation. The paper also leaves the exact baselines, per-layer compression ratios, and statistical significance details out of the abstract, so the strength of the empirical claim is hard to judge without the full tables.

Readers building federated systems on phones or other edge devices will get concrete, testable ideas they can try. The work is honest about the problem and the methods are simple enough to reproduce. It deserves a serious referee because the setting is real and the proposed fixes address a clear bottleneck. I would send it for peer review.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes two families of methods to reduce uplink communication in federated learning: structured updates that directly optimize a low-rank or randomly masked update, and sketched updates that first compute a full update and then apply quantization, random rotation, and subsampling before transmission. Experiments on convolutional and recurrent networks are reported to achieve up to two orders of magnitude reduction in per-round communication cost while reaching comparable model quality.
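The summary lists random rotation among the sketching steps without showing what it does. A minimal sketch, assuming the rotation is an orthonormal Walsh-Hadamard transform preceded by random sign flips (one common structured choice; this page does not specify the construction), shows how a spiky vector is spread evenly across coordinates so that a coarse quantizer has a narrower range to cover. All names are hypothetical.

```python
import math
import random

def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform; length must be a power of two."""
    out = list(vec)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]

def random_signs(n, seed):
    """Diagonal of +1/-1 entries, reproducible from a shared seed."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def rotate(vec, signs):
    """Apply sign flips, then the Hadamard transform (norm-preserving)."""
    return fwht([s * x for s, x in zip(signs, vec)])

def unrotate(vec, signs):
    """Invert the rotation: the orthonormal transform is its own inverse."""
    return [s * x for s, x in zip(signs, fwht(vec))]

signs = random_signs(8, seed=42)
spiky = [8.0] + [0.0] * 7          # one large coordinate, the rest zero
rotated = rotate(spiky, signs)     # every entry now has the same magnitude
recovered = unrotate(rotated, signs)
```

The server applies `unrotate` after dequantizing, so the rotation itself costs no extra uplink bits beyond agreeing on the seed.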

Significance. If the empirical claims hold after clarification of total communication cost and accuracy, the work is significant for practical federated learning on mobile devices, where uplink bandwidth is the dominant constraint. The methods are simple to implement and the reported savings on standard architectures provide useful evidence that restricted or lossy updates can be viable.

major comments (3)

Experiments section (and associated tables/figures): the headline claim of two orders of magnitude communication reduction is stated only in terms of per-round uplink bits; the manuscript does not report the number of rounds required to reach target accuracy relative to the uncompressed FedAvg baseline, so it is impossible to verify that total communication (bits × rounds) improves by 100×.
Section 3.2 (sketched updates): the combination of quantization, rotation, and subsampling is presented without any analysis or bound on the distortion introduced to the gradient; because the central claim rests on the aggregated updates still driving convergence to high-quality models, the absence of even a simple error-propagation argument is a load-bearing gap.
Table 1 / Figure 2 (or equivalent result tables): reported accuracy numbers lack error bars, number of independent runs, or statistical significance tests; without these, it is unclear whether the observed accuracy is statistically indistinguishable from the full-update baseline or whether occasional degradation occurs.
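The first major comment turns on simple arithmetic: total uplink cost is bits per round times number of rounds, so a per-round saving is diluted by any extra rounds the compressed method needs. The sketch below works through that relationship. The bit counts and round counts are hypothetical values chosen for illustration, not figures from the paper.

```python
def total_uplink_bits(bits_per_round, rounds):
    """Cumulative uplink cost for one client over a whole training run."""
    return bits_per_round * rounds

def overall_saving(per_round_saving, extra_round_fraction):
    """Ratio of baseline to compressed total bits when compression needs extra rounds."""
    return per_round_saving / (1.0 + extra_round_fraction)

# Hypothetical run: 100x fewer bits per round, but 20% more rounds to converge.
baseline = total_uplink_bits(bits_per_round=32_000_000, rounds=1_000)
compressed = total_uplink_bits(bits_per_round=320_000, rounds=1_200)
print(baseline / compressed)        # ratio of the two hypothetical totals
print(overall_saving(100, 0.2))     # same ratio from the closed form
```

In the extreme case where a 100× per-round saving came with 100× as many rounds, `overall_saving(100, 99)` would be 1.0, meaning no net benefit, which is why the referee asks for rounds-to-target to be reported.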

minor comments (2)

[Abstract] The abstract states the reduction is achieved “on both convolutional and recurrent networks” but does not name the specific architectures or datasets; adding these details would improve reproducibility.
[Section 3.1] Notation for the random mask in structured updates is introduced informally; an explicit equation defining the mask density and how it is applied to the weight matrices would clarify the method.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects of clarity in reporting communication costs, analysis of compression effects, and statistical presentation of results. We address each point below and have made revisions to strengthen the manuscript.


Referee: Experiments section (and associated tables/figures): the headline claim of two orders of magnitude communication reduction is stated only in terms of per-round uplink bits; the manuscript does not report the number of rounds required to reach target accuracy relative to the uncompressed FedAvg baseline, so it is impossible to verify that total communication (bits × rounds) improves by 100×.

Authors: We agree that total communication (cumulative bits across rounds) is the relevant metric for practical impact. In the revised manuscript we have added a new figure in the Experiments section that plots test accuracy versus cumulative uplink bits for the baseline and all compressed methods. The plots confirm that both structured and sketched updates reach target accuracy with roughly two orders of magnitude less total communication. We also report the number of rounds to target accuracy for each method; the compressed variants require at most 20% more rounds, so the per-round savings still yield substantial overall reductions.
Revision: yes

Referee: Section 3.2 (sketched updates): the combination of quantization, rotation, and subsampling is presented without any analysis or bound on the distortion introduced to the gradient; because the central claim rests on the aggregated updates still driving convergence to high-quality models, the absence of even a simple error-propagation argument is a load-bearing gap.

Authors: We acknowledge that a formal distortion bound would be desirable. The paper is primarily empirical; however, we have added a short discussion paragraph in Section 3.2 that explains why the combination preserves convergence in practice: random rotation makes quantization errors approximately isotropic, subsampling is unbiased, and server-side averaging across many clients reduces variance. We also cite related convergence results for quantized and sketched gradients from the distributed optimization literature. A complete end-to-end error-propagation analysis for the federated setting remains future work.
Revision: partial

Referee: Table 1 / Figure 2 (or equivalent result tables): reported accuracy numbers lack error bars, number of independent runs, or statistical significance tests; without these, it is unclear whether the observed accuracy is statistically indistinguishable from the full-update baseline or whether occasional degradation occurs.

Authors: We agree that statistical reporting improves credibility. All experiments have been rerun with 10 independent random seeds. The revised Table 1 and Figure 2 now include error bars (one standard deviation) and a caption stating the number of runs. A t-test comparison shows that the accuracy differences between compressed methods and the full-update baseline are not statistically significant (p > 0.05).
Revision: yes
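The informal argument in the second response, that unbiased subsampling plus server-side averaging keeps the aggregate close to the true update, can be checked numerically on a toy problem. The sketch below makes a strong simplifying assumption: every client holds the same true update, so any error in the average comes from compression alone. The names, the 10% keep fraction, and the client counts are hypothetical.

```python
import math
import random

def unbiased_subsample(update, fraction, rng):
    """Keep each coordinate with probability `fraction`, rescaled to stay unbiased."""
    return [x / fraction if rng.random() < fraction else 0.0 for x in update]

def server_average(updates):
    """Coordinate-wise mean of the client sketches."""
    n = len(updates)
    return [sum(col) / n for col in zip(*updates)]

def rms_error(estimate, truth):
    """Root-mean-square distance between the aggregate and the true update."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth))

def averaged_error(truth, num_clients, fraction, seed):
    """Error of the server average when every client sends a subsampled copy."""
    rng = random.Random(seed)
    sketches = [unbiased_subsample(truth, fraction, rng) for _ in range(num_clients)]
    return rms_error(server_average(sketches), truth)

truth = [1.0] * 200                                  # toy update shared by all clients
err_few = averaged_error(truth, num_clients=10, fraction=0.1, seed=0)
err_many = averaged_error(truth, num_clients=1000, fraction=0.1, seed=0)
print(err_few, err_many)
```

The error should shrink roughly with the square root of the number of clients. This toy check says nothing about heterogeneous client data or about how compression error interacts with optimization over many rounds, which is the gap the referee identifies.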

Circularity Check

0 steps flagged

No significant circularity; methods defined independently and validated via direct experiments.


The paper proposes structured updates (low-rank or random mask parametrization) and sketched updates (quantization + rotation + subsampling) as algorithmic techniques to reduce uplink communication. These are defined explicitly in terms of their parametrization and compression operations, then evaluated through experiments on convolutional and recurrent networks that measure per-round cost reduction. No derivation chain reduces a claimed result to its own inputs by construction, no parameters are fitted to the target metric and then relabeled as predictions, and no load-bearing self-citations or uniqueness theorems are invoked. The central claim rests on empirical outcomes rather than tautological equivalence, consistent with a score of 0.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The work rests on standard federated learning assumptions plus tunable compression parameters; no new physical entities are postulated.

free parameters (2)

rank or mask density for structured updates
Controls the number of variables used to parametrize each client update; chosen to trade communication for model quality.
quantization bits, rotation dimension, and subsampling fraction for sketched updates
Compression hyperparameters that determine how much information is retained after sketching.
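How these free parameters translate into uplink savings is simple arithmetic. The sketch below computes the per-round compression ratio of a sketched update from the quantization bit width and the subsampling fraction, assuming the uncompressed update uses 32-bit floats and that any side information (such as quantization range endpoints) is counted as a fixed overhead. The parameter values are hypothetical, and the rotation is left out because it does not change the number of bits sent.

```python
def sketched_uplink_bits(num_params, quant_bits, subsample_fraction, overhead_bits=0):
    """Uplink bits for one sketched update: kept coordinates times bits each, plus overhead."""
    kept = round(num_params * subsample_fraction)
    return kept * quant_bits + overhead_bits

def compression_ratio(num_params, quant_bits, subsample_fraction,
                      float_bits=32, overhead_bits=0):
    """Uncompressed bits divided by sketched bits for the same update."""
    full_bits = num_params * float_bits
    return full_bits / sketched_uplink_bits(
        num_params, quant_bits, subsample_fraction, overhead_bits)

# Hypothetical setting: one million parameters, 2-bit quantization, keep 25%.
ratio = compression_ratio(1_000_000, quant_bits=2, subsample_fraction=0.25)
print(ratio)
```

The two factors multiply: going from 32 bits to 2 bits gives 16×, keeping a quarter of the coordinates gives another 4×, and lowering either hyperparameter further pushes the ratio toward the two-orders-of-magnitude range at the cost of a noisier update.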

axioms (2)

domain assumption: Each client can compute a local gradient or update from its private data without server intervention.
Fundamental to the federated learning protocol described.
domain assumption: Server-side averaging of compressed client updates yields a usable global model.
Core aggregation step assumed to work after compression.


Forward citations

Cited by 27 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Scalable Distributed Stochastic Optimization via Bidirectional Compression: Beyond Pessimistic Limits
math.OC 2026-05 unverdicted novelty 7.0

Inkheart SGD and M4 use bidirectional compression to achieve time complexities in distributed SGD that improve with worker count n and surpass prior lower bounds under a necessary structural assumption.
Quantizing With Randomized Hadamard Transforms: Efficient Heuristic Now Proven
cs.LG 2026-05 unverdicted novelty 7.0

Two randomized Hadamard transforms suffice to make coordinate marginals O(d^{-1/2})-close to Gaussian for most quantization methods, with three needed for vector quantization to match uniform random rotations asymptotically.
Scaling Federated Linear Contextual Bandits via Sketching
cs.LG 2026-05 unverdicted novelty 7.0

FSCLB scales federated linear contextual bandits with sketching to achieve over 90% lower computation and communication costs while preserving a near-optimal regret bound of O(sqrt(l d T)).
XFED: Non-Collusive Model Poisoning Attack Against Byzantine-Robust Federated Classifiers
cs.CR 2026-04 unverdicted novelty 7.0

XFED is the first aggregation-agnostic non-collusive model poisoning attack that bypasses eight state-of-the-art defenses on six benchmark datasets without attacker coordination.
Scalar Federated Learning for Linear Quadratic Regulator
eess.SY 2026-04 unverdicted novelty 7.0

A scalar-projection federated zeroth-order method for model-free LQR policy learning that reduces per-agent communication from O(d) to O(1) with convergence rate improving in the number of agents.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
cs.LG 2019-10 unverdicted novelty 7.0

T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colo...
Provable Sparse Inversion and Token Relabel Enhanced One-shot Federated Learning with ViTs
cs.LG 2026-05 unverdicted novelty 6.0

FedMITR uses sparse model inversion and token relabeling to improve one-shot federated learning with ViTs under non-IID conditions, delivering a tighter generalization bound via algorithmic stability analysis and bett...
Adversary-Robust Learning from Fully Asynchronous Directional Derivative Estimates
cs.LG 2026-05 unverdicted novelty 6.0

FAR-SIGN achieves adversary-resilient fully asynchronous optimization via signed directional projections and two-timescale correction, with almost-sure convergence to stationary points at rates O(n^{-1/4+ε}) first-ord...
Response Time Enhances Alignment with Heterogeneous Preferences
cs.LG 2026-05 unverdicted novelty 6.0

Response times modeled as drift-diffusion processes enable consistent estimation of population-average preferences from heterogeneous anonymous binary choices.
Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Vision-Language Models
cs.AI 2026-05 unverdicted novelty 6.0

MoR lets clients train local reward models on private preferences and uses a learned Mixture-of-Rewards with GRPO on the server to align a shared base VLM without exchanging parameters, architectures, or raw data.
Multi-Server Secure Aggregation with Arbitrary Collusion and Heterogeneous Security Constraints
cs.IT 2026-04 unverdicted novelty 6.0

The paper derives tight information-theoretic bounds on communication and key rates for secure multi-server aggregation under heterogeneous security constraints and arbitrary collusion, with matching schemes in most r...
On the Capacity of Hierarchical Secure Aggregation with Groupwise Keys
cs.IT 2026-04 unverdicted novelty 6.0

For hierarchical secure aggregation with groupwise keys of size G>1, the optimal rate region is fully characterized with user and relay rates at least 1 and minimum groupwise key rate max of two combinatorial terms.
Exploiting Correlations in Federated Learning: Opportunities and Practical Limitations
cs.IT 2026-04 unverdicted novelty 6.0

A correlation-based taxonomy unifies existing FL compression methods, experiments show correlation strengths vary by task and architecture, and adaptive mode-switching designs are proposed to exploit this.
Jellyfish: Zero-Shot Federated Unlearning Scheme with Knowledge Disentanglement
cs.CR 2026-04 unverdicted novelty 6.0

Jellyfish enables zero-shot federated unlearning through synthetic proxy data generation, channel-restricted knowledge disentanglement, and a composite loss with repair to forget target data while retaining model utility.
Stabilized Proximal Point Method via Trust Region Control
math.OC 2026-04 unverdicted novelty 6.0

A trust-region stabilized proximal point method enforces a displacement condition to achieve linear descent for general nonsmooth convex problems.
Self-Play Enhancement via Advantage-Weighted Refinement in Online Federated LLM Fine-Tuning with Real-Time Feedback
cs.LG 2026-05 unverdicted novelty 5.0

SPEAR enables online federated LLM fine-tuning by using feedback-guided self-play to create contrastive pairs trained with maximum likelihood on correct completions and confidence-weighted unlikelihood on incorrect on...
Modulated learning for private and distributed regression with just a single sample per client device
cs.LG 2026-05 unverdicted novelty 5.0

Single-sample clients add one calibrated noisy perturbation to their data point and share transformed representations, allowing the server to recover unbiased gradients for private distributed regression.
Subspace Optimization for Efficient Federated Learning under Heterogeneous Data
cs.LG 2026-04 unverdicted novelty 5.0

SSF enables efficient federated learning under heterogeneous data by optimizing in a low-dimensional subspace with projected corrections and backfill updates, achieving a non-asymptotic convergence rate of order O~(1/...
FED-FSTQ: Fisher-Guided Token Quantization for Communication-Efficient Federated Fine-Tuning of LLMs on Edge Devices
cs.LG 2026-04 unverdicted novelty 5.0

Fed-FSTQ reduces uplink traffic by 46x and improves time-to-accuracy by 52% in federated LLM fine-tuning using Fisher-guided token quantization and selection.
Enhanced Privacy and Communication Efficiency in Non-IID Federated Learning with Adaptive Quantization and Differential Privacy
cs.CV 2026-04 unverdicted novelty 5.0

Adaptive bit-length schedulers plus Laplacian DP in non-IID FL reduce communicated data by up to 52.64% on MNIST and 45% on CIFAR-10 while keeping competitive accuracy and privacy.
PubSwap: Public-Data Off-Policy Coordination for Federated RLVR
cs.LG 2026-04 unverdicted novelty 5.0

PubSwap uses a small public dataset for selective off-policy response swapping in federated RLVR to improve coordination and performance over standard baselines on math and medical reasoning tasks.
Representation-Aligned Multi-Scale Personalization for Federated Learning
cs.LG 2026-04 unverdicted novelty 5.0

FRAMP generates client-specific models from compact descriptors in federated learning, trains tailored submodels, and aligns representations to balance personalization with global consistency.
Communication-Efficient Gluon in Federated Learning
cs.LG 2026-04 unverdicted novelty 5.0

Compressed Gluon variants using unbiased/contraction compressors and SARAH-style variance reduction achieve convergence guarantees and lower communication costs in federated learning under layer-wise smoothness.
Forgetting to Witness: Efficient Federated Unlearning and Its Visible Evaluation
cs.LG 2026-04 unverdicted novelty 5.0

A complete pipeline for federated unlearning via knowledge distillation for efficient removal and a GAN-integrated classifier for visual evaluation of forgetting capacity.
Understanding Communication Backends in Cross-Silo Federated Learning
cs.DC 2026-04 unverdicted novelty 4.0

Benchmarks of MPI, gRPC, and PyTorch RPC in cross-silo FL plus a new gRPC+S3 hybrid backend deliver up to 3.8x speedup for large-model transmission under realistic network conditions.
Privacy-Preserving Federated Learning: Integrating Zero-Knowledge Proofs in Scalable Distributed Architectures
cs.DC 2026-05 unverdicted novelty 3.0

A hybrid federated learning architecture using zero-knowledge proofs for computation verification retains 94.2% accuracy under adversarial conditions across 1,000 nodes.
Split and Aggregation Learning for Foundation Models Over Mobile Embodied AI Network (MEAN): A Comprehensive Survey
cs.IT 2026-05 unverdicted novelty 3.0

The paper surveys split and aggregation learning for foundation models in 6G networks to improve efficiency, resource use, and data privacy in distributed AI.
