Recognition: no theorem link
Concordia: Self-Improving Synthetic Tables for Federated LLMs
Pith reviewed 2026-05-12 04:21 UTC · model grok-4.3
The pith
A tri-level loop lets federated clients refine their own synthetic tables using local scorers and shared ensembles without sharing data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Concordia is a tri-level optimization framework that aligns synthetic data generation with federated validation utility despite strict data isolation. At the client level, models adapt via LoRA on synthetic tables, and lightweight utility scorers are learned from private validation feedback to reweight synthetic samples; at the outer level, each client refines its synthetic-table generator via group-relative policy optimization (GRPO), guided by an ensemble of heterogeneous scorers shared across clients, without aggregating generator parameters or exposing validation data.
What carries the argument
Tri-level optimization that couples local LoRA training and scorer learning with GRPO-based generator refinement driven by a shared ensemble of client-specific utility scorers.
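The provided text gives no equations for this coupling, so it can only be sketched. Below is a toy, purely illustrative analogue of the outer GRPO-style refinement: a scalar stands in for the generator, a single function stands in for the shared scorer ensemble, and the update weights each group member by its score minus the group mean. Every name and number here is an assumption, not the paper's method.

```python
import random

random.seed(0)

TARGET = 2.0  # stand-in for what a client's private validation data rewards

def ensemble_score(sample):
    """Toy surrogate for the shared scorer ensemble: higher is better."""
    return -abs(sample - TARGET)

def grpo_style_step(theta, group_size=8, sigma=0.5, lr=0.5):
    """Sample a group of candidate generator outputs around theta, score
    them, and move theta toward candidates with above-mean (group-relative)
    score. A crude analogue of GRPO, not the paper's actual update rule."""
    group = [theta + random.gauss(0.0, sigma) for _ in range(group_size)]
    scores = [ensemble_score(g) for g in group]
    baseline = sum(scores) / group_size          # group-relative baseline
    step = sum((s - baseline) * (g - theta)      # advantage-weighted move
               for g, s in zip(group, scores))
    return theta + lr * step / group_size

theta = 0.0  # the (scalar) "generator parameter" each client keeps local
for _ in range(300):
    theta = grpo_style_step(theta)
```

Even this toy version exhibits the property the pith emphasizes: the generator parameter improves using only scores, never the raw validation records behind them.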
If this is right
- Synthetic tables become progressively more useful for local training as the generators receive repeated feedback from private validation signals.
- Federated performance on tabular tasks improves without any client exposing raw records or full model parameters.
- Cross-client stability increases because the shared scorer ensemble provides consistent guidance even when client distributions differ.
- Robustness to distribution shift grows because the refinement process continuously adjusts synthetic data to match evolving local utility.
- The method keeps all generator parameters local while still benefiting from collective scorer information.
Where Pith is reading between the lines
- The same self-refinement pattern could be tested on non-tabular tasks if comparable local utility signals can be defined without data sharing.
- If the scorer ensemble remains effective at scale, the approach may lower the data-quality barrier that currently limits federated LLM deployment in regulated domains.
- A natural next measurement would be how many refinement rounds are needed before synthetic-table utility plateaus on a given client distribution.
- The framework implies that privacy-preserving adaptation need not trade off against adaptability, provided the utility signal stays local.
Load-bearing premise
Lightweight utility scorers trained on each client's private validation feedback can accurately reweight synthetic samples and steer generator updates without leaking validation data or model details.
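As a concreteness check on this premise (not the paper's implementation), a "lightweight scorer" could be as simple as a logistic model fit to binary validation feedback, whose predicted usefulness becomes a normalized sample weight. The 1-D feature, the labels, and all names below are hypothetical.

```python
import math
import random

random.seed(1)

# Toy private validation feedback: each row's feature x, labeled 1 if the
# row "helped" local training and 0 otherwise (an invented criterion).
val_feedback = [(x, 1 if x > 0.5 else 0)
                for x in (random.uniform(-1, 2) for _ in range(200))]

def train_scorer(data, epochs=300, lr=0.1):
    """Fit a 1-D logistic scorer sigmoid(w*x + b) by plain SGD."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            w += lr * (y - p) * x
            b += lr * (y - p)
    return w, b

def reweight(samples, w, b):
    """Use the scorer's predicted usefulness as a normalized sample weight."""
    raw = [1.0 / (1.0 + math.exp(-(w * x + b))) for x in samples]
    total = sum(raw)
    return [r / total for r in raw]

w, b = train_scorer(val_feedback)
synthetic = [-0.5, 0.0, 1.0, 1.5]
weights = reweight(synthetic, w, b)
```

The scorer only ever emits weights, so under this sketch nothing about individual validation rows needs to leave the client, which is exactly the leakage question the premise turns on.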
What would settle it
A disconfirming result: on the same privacy-sensitive finance and healthcare tabular benchmarks, the Concordia pipeline produces no gain, or a drop, in federated accuracy, cross-client stability, or robustness to distribution shift relative to static or decoupled synthetic-data baselines.
Original abstract
Federated learning (FL) enables training large language models (LLMs) without sharing raw data, but adapting LLMs under strict data isolation and non-IID client distributions remains challenging in practice. Synthetic data offers a natural privacy-preserving surrogate for local training, yet existing federated pipelines typically treat synthetic generation as static or loosely coupled with downstream optimization, leading to rapidly diminishing utility under heterogeneous clients. We study federated adaptation of LLMs on tabular tasks where raw records and validation data cannot be shared, and local training must rely entirely on synthetic tables. We propose Concordia, a tri-level optimization framework that aligns synthetic data generation with federated validation utility despite these constraints. At the client level, models are adapted via parameter-efficient LoRA training on synthetic tables. Clients additionally learn lightweight utility scorers from private validation feedback to reweight synthetic samples during local training. At the outer level, each client refines its own synthetic table generator using group-relative policy optimization (GRPO), guided by an ensemble of heterogeneous scorers shared across clients, without aggregating generator parameters or exposing validation data. Experiments on privacy-sensitive tabular benchmarks from finance and healthcare demonstrate that Concordia consistently improves federated performance, cross-client stability, and robustness to distribution shift compared to static and decoupled synthetic-data baselines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Concordia, a tri-level optimization framework for federated adaptation of LLMs on tabular tasks under strict data isolation. Clients adapt models via LoRA on synthetic tables, learn lightweight utility scorers from private validation feedback to reweight synthetic samples, and refine their own table generators via GRPO guided by a shared ensemble of heterogeneous scorers. No generator parameters or validation data are aggregated or exposed. Experiments on privacy-sensitive tabular benchmarks from finance and healthcare are reported to show consistent gains in federated performance, cross-client stability, and robustness to distribution shift relative to static and decoupled synthetic-data baselines.
Significance. If the claimed improvements are substantiated, the work would be significant for privacy-preserving federated LLM training on tabular data. It offers a concrete mechanism to align synthetic data generation with downstream utility in non-IID regimes without violating isolation constraints, using local scorers and GRPO-based self-refinement. This could be particularly relevant for regulated domains such as finance and healthcare where static synthetic data pipelines often degrade under heterogeneity.
major comments (2)
- [Abstract] Abstract: the central claim that 'Concordia consistently improves federated performance, cross-client stability, and robustness to distribution shift' is asserted without any quantitative results, baseline definitions, statistical tests, ablation details, or effect sizes. This absence makes it impossible to assess whether the tri-level framework delivers measurable gains over the static/decoupled baselines it contrasts against.
- [Method overview / §3] The framework (described in the method overview) rests on the assumption that locally trained utility scorers produce reliable reweightings and that the shared heterogeneous ensemble supplies a sufficiently strong, non-overfitting signal for each client's GRPO-based generator refinement. In non-IID tabular regimes typical of the target domains, local scorers risk capturing client-specific artifacts; without ablations isolating the contribution of the scorers versus the ensemble versus the GRPO loop, the load-bearing mechanism for self-improvement remains unverified.
minor comments (1)
- [Abstract] The abstract refers to 'lightweight utility scorers' and 'group-relative policy optimization (GRPO)' without defining their architectures, loss formulations, or hyper-parameters at first mention; a short definitional sentence or forward reference to the relevant subsection would improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight opportunities to strengthen the presentation of empirical results and to more explicitly verify the contributions of individual framework components. We address each point below and will revise the manuscript accordingly.
Point-by-point responses
- Referee: [Abstract] Abstract: the central claim that 'Concordia consistently improves federated performance, cross-client stability, and robustness to distribution shift' is asserted without any quantitative results, baseline definitions, statistical tests, ablation details, or effect sizes. This absence makes it impossible to assess whether the tri-level framework delivers measurable gains over the static/decoupled baselines it contrasts against.
Authors: We agree that the abstract would benefit from concrete quantitative highlights to allow immediate assessment of the claimed gains. In the revised version we will incorporate specific effect sizes (e.g., average relative improvements in federated accuracy and stability metrics across the finance and healthcare benchmarks), explicit baseline names (static synthetic tables and decoupled generation pipelines), and references to the statistical tests performed. The experiments section already contains the supporting tables and significance results; the abstract will now summarize the key numbers and effect sizes without altering the original claim. revision: yes
- Referee: [Method overview / §3] The framework (described in the method overview) rests on the assumption that locally trained utility scorers produce reliable reweightings and that the shared heterogeneous ensemble supplies a sufficiently strong, non-overfitting signal for each client's GRPO-based generator refinement. In non-IID tabular regimes typical of the target domains, local scorers risk capturing client-specific artifacts; without ablations isolating the contribution of the scorers versus the ensemble versus the GRPO loop, the load-bearing mechanism for self-improvement remains unverified.
Authors: We appreciate the referee's emphasis on isolating the contribution of each optimization level, particularly given the non-IID nature of the target domains. While the current experiments demonstrate end-to-end gains relative to the static and decoupled baselines, we acknowledge that dedicated component-wise ablations would provide stronger verification of the utility scorers, the shared ensemble, and the GRPO refinement. In the revision we will add a new ablation subsection (and corresponding appendix tables) that evaluates performance when each element is removed in turn: (i) uniform sample weighting without learned scorers, (ii) local-only signals without the shared ensemble, and (iii) static generators without the GRPO loop. These results will be reported on the same privacy-sensitive benchmarks to quantify incremental contributions and address potential client-specific artifact concerns. revision: yes
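The remove-each-element-in-turn protocol proposed in this response can be written down as a tiny enumeration harness. The component flag names below are invented for illustration, mirroring the rebuttal's items (i)-(iii), and do not come from the paper.

```python
def ablation_grid(full_config):
    """Yield the full configuration plus one variant per single-component
    removal: a remove-each-element-in-turn ablation protocol."""
    yield "full", dict(full_config)
    for component in full_config:
        variant = dict(full_config)
        variant[component] = False          # disable exactly one component
        yield f"no_{component}", variant

# Hypothetical components: (i) learned scorers vs. uniform weighting,
# (ii) shared ensemble vs. local-only signal, (iii) GRPO loop vs. static generator.
FULL = {"learned_scorer": True, "shared_ensemble": True, "grpo_loop": True}
runs = list(ablation_grid(FULL))
```

Running each configuration on the same benchmarks would yield the incremental-contribution table the referee asks for; single-component removals isolate each level but do not cover interaction effects, which would need paired removals.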
Circularity Check
No circularity: empirical framework driven by external private validation feedback
Full rationale
The paper introduces Concordia as a tri-level optimization framework for federated LLM adaptation on tabular tasks: client-level LoRA training on synthetic tables, local lightweight utility scorers trained from private validation feedback to reweight samples, and outer GRPO refinement of each client's generator guided by a shared ensemble of heterogeneous scorers without parameter aggregation or data exposure. No equations, derivations, or first-principles results are presented in the provided text. The claimed improvements in performance, stability, and shift robustness are asserted via experiments on external privacy-sensitive benchmarks, not reduced by construction to any fitted parameter, self-defined quantity, or self-citation chain. The central mechanism explicitly depends on independent validation feedback signals, which are external to the synthetic generation process itself. This is the most common honest non-finding for a method paper whose value is demonstrated empirically rather than through tautological renaming or internal fitting.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Federated clients cannot share raw records or validation data and must train exclusively on synthetic tables.