pith. machine review for the scientific record.

arxiv: 2605.09855 · v1 · submitted 2026-05-11 · 💻 cs.LG

Recognition: no theorem link

Concordia: Self-Improving Synthetic Tables for Federated LLMs

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:21 UTC · model grok-4.3

classification 💻 cs.LG
keywords federated learning · synthetic data · large language models · tabular data · privacy-preserving training · policy optimization · self-improving systems · distribution shift

The pith

A tri-level loop lets federated clients refine their own synthetic tables using local scorers and shared ensembles without sharing data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that synthetic data for federated LLM training on tabular tasks can be made adaptive rather than static by aligning its generation directly to private validation signals. Clients train locally with parameter-efficient updates on synthetic tables while learning lightweight scorers that reweight those samples according to their utility on held-out private data. An outer optimization step then uses group-relative policy optimization to improve each client's generator, guided by an ensemble of scorers pooled across clients but without ever moving raw records or generator weights. If this alignment holds, federated performance, stability across clients, and robustness to distribution shifts rise above what fixed synthetic baselines achieve on isolated finance and healthcare tables.
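The group-relative policy optimization used in the outer step replaces a learned value baseline with within-group normalization of rewards. A minimal sketch of that advantage computation, illustrative only (the paper's exact formulation may differ):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: each candidate's reward is centered and
    scaled by its own group's statistics, so no learned value baseline
    is required."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# One group of 4 candidate synthetic tables, scored by a utility ensemble:
adv = grpo_advantages([0.2, 0.5, 0.5, 0.8])
# Candidates above the group mean get positive advantage, below negative.
assert adv[3] > 0 > adv[0] and abs(adv.sum()) < 1e-6
```

Generator updates then push probability mass toward candidates with positive group-relative advantage, which is what lets scorer feedback steer generation without a critic network.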

Core claim

Concordia is a tri-level optimization framework that aligns synthetic data generation with federated validation utility despite strict data isolation. At the client level, models adapt via LoRA on synthetic tables while lightweight utility scorers, learned from private validation feedback, reweight those synthetic samples. At the outer level, each client refines its synthetic table generator via group-relative policy optimization (GRPO), guided by an ensemble of heterogeneous scorers shared across clients, without aggregating generator parameters or exposing validation data.

What carries the argument

Tri-level optimization that couples local LoRA training and scorer learning with GRPO-based generator refinement driven by a shared ensemble of client-specific utility scorers.
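The three coupled levels can be read as pseudocode. The sketch below is illustrative only: the class and method names (`generator.sample`, `scorer.fit`, `grpo_step`, and so on) are hypothetical placeholders, not the paper's API.

```python
# Pseudocode sketch of one Concordia-style round (hypothetical names).

def concordia_round(clients, shared_scorers):
    for c in clients:
        # Level 1: parameter-efficient adaptation on reweighted synthetic rows.
        synth = c.generator.sample(n=512)
        weights = c.scorer.score(synth)          # utility-based reweighting
        c.model.lora_update(synth, sample_weights=weights)

        # Level 2: refresh the lightweight scorer from private validation
        # feedback; the validation data itself never leaves the client.
        feedback = c.validate_privately(c.model, synth)
        c.scorer.fit(synth, feedback)

    # Only scorers are pooled: no raw records, no generator weights.
    shared_scorers[:] = [c.scorer for c in clients]

    for c in clients:
        # Level 3: GRPO refinement of the local generator, rewarded by the
        # cross-client ensemble of heterogeneous scorers.
        groups = [c.generator.sample(n=64) for _ in range(8)]
        rewards = [sum(s.score_table(g) for s in shared_scorers)
                   for g in groups]
        c.generator.grpo_step(groups, rewards)
```

The structural point of the sketch is the information flow: private validation signals reach the generator only through scorer outputs, which is what permits collective guidance without parameter or data aggregation.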

If this is right

  • Synthetic tables become progressively more useful for local training as the generators receive repeated feedback from private validation signals.
  • Federated performance on tabular tasks improves without any client exposing raw records or full model parameters.
  • Cross-client stability increases because the shared scorer ensemble provides consistent guidance even when client distributions differ.
  • Robustness to distribution shift grows because the refinement process continuously adjusts synthetic data to match evolving local utility.
  • The method keeps all generator parameters local while still benefiting from collective scorer information.
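One simple way the shared ensemble could supply consistent guidance across heterogeneous clients is rank-normalizing each scorer's outputs before averaging, so clients with different score scales contribute equally. A sketch under that assumption (the paper may aggregate differently):

```python
import numpy as np

def ensemble_reward(score_matrix):
    """score_matrix: (n_scorers, n_samples) raw utility scores from
    heterogeneous client scorers. Each scorer's row is rank-normalized
    to [0, 1], then the rows are averaged into one reward per sample."""
    s = np.asarray(score_matrix, dtype=float)
    ranks = s.argsort(axis=1).argsort(axis=1)   # per-scorer ranks (no ties)
    norm = ranks / max(s.shape[1] - 1, 1)       # scale ranks to [0, 1]
    return norm.mean(axis=0)

# Three scorers on wildly different scales all agree sample 2 is best:
r = ensemble_reward([[0.1, 0.2, 0.9],
                     [10., 20., 90.],
                     [-3., -2., -1.]])
assert r.argmax() == 2
```

Rank normalization is one of several scale-free choices; z-scoring per scorer would serve the same purpose.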

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same self-refinement pattern could be tested on non-tabular tasks if comparable local utility signals can be defined without data sharing.
  • If the scorer ensemble remains effective at scale, the approach may lower the data-quality barrier that currently limits federated LLM deployment in regulated domains.
  • A natural next measurement would be how many refinement rounds are needed before synthetic-table utility plateaus on a given client distribution.
  • The framework implies that privacy-preserving adaptation need not trade off against adaptability, provided the utility signal stays local.

Load-bearing premise

Lightweight utility scorers trained on each client's private validation feedback can accurately reweight synthetic samples and steer generator updates without leaking validation data or model details.
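The premise can be made concrete with a toy reweighter: a lightweight scorer fit on private validation feedback assigns each synthetic sample a weight for the local loss. A minimal numpy sketch (the feature setup, scorer form, and labels are all hypothetical, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: synthetic rows are 4-dim feature vectors; private validation
# feedback marks rows with a large first feature as locally useful.
X_synth = rng.normal(size=(200, 4))
utility = (X_synth[:, 0] > 0).astype(float)   # stand-in validation signal

# Lightweight scorer: logistic regression fit by gradient descent.
w = np.zeros(4)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X_synth @ w))
    w -= 0.1 * X_synth.T @ (p - utility) / len(X_synth)

# Scores become normalized sample weights for the local (LoRA) loss.
scores = 1.0 / (1.0 + np.exp(-X_synth @ w))
weights = scores / scores.sum()

# Useful rows end up carrying more total weight than non-useful ones.
assert weights[utility == 1].sum() > weights[utility == 0].sum()
```

Only `weights` (or the scorer itself) needs to leave the client; the validation labels that produced it stay private, which is the isolation property the premise depends on.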

What would settle it

The claim would be undercut if, on the same privacy-sensitive finance and healthcare tabular benchmarks, the Concordia pipeline produced no gain, or a drop, in federated accuracy, cross-client stability, or robustness to distribution shift relative to static or decoupled synthetic-data baselines.

Figures

Figures reproduced from arXiv: 2605.09855 by Alejandro Lopez-Lira, Duanyu Feng, Guojun Xiong, Jimin Huang, Mingquan Lin, Nuo Chen, Prayag Tiwari, Sophia Ananiadou, Xiaoyu Wang, Xueqing Peng, Zhiqiang Zhang.

Figure 1
Figure 1. The Overall Framework for Concordia. view at source ↗
Figure 2
Figure 2. Travel: reward-aligned refinement expands utility. view at source ↗
original abstract

Federated learning (FL) enables training large language models (LLMs) without sharing raw data, but adapting LLMs under strict data isolation and non-IID client distributions remains challenging in practice. Synthetic data offers a natural privacy-preserving surrogate for local training, yet existing federated pipelines typically treat synthetic generation as static or loosely coupled with downstream optimization, leading to rapidly diminishing utility under heterogeneous clients. We study federated adaptation of LLMs on tabular tasks where raw records and validation data cannot be shared, and local training must rely entirely on synthetic tables. We propose Concordia, a tri-level optimization framework that aligns synthetic data generation with federated validation utility despite these constraints. At the client level, models are adapted via parameter-efficient LoRA training on synthetic tables. Clients additionally learn lightweight utility scorers from private validation feedback to reweight synthetic samples during local training. At the outer level, each client refines its own synthetic table generator using group-relative policy optimization (GRPO), guided by an ensemble of heterogeneous scorers shared across clients, without aggregating generator parameters or exposing validation data. Experiments on privacy-sensitive tabular benchmarks from finance and healthcare demonstrate that Concordia consistently improves federated performance, cross-client stability, and robustness to distribution shift compared to static and decoupled synthetic-data baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Concordia, a tri-level optimization framework for federated adaptation of LLMs on tabular tasks under strict data isolation. Clients adapt models via LoRA on synthetic tables, learn lightweight utility scorers from private validation feedback to reweight synthetic samples, and refine their own table generators via GRPO guided by a shared ensemble of heterogeneous scorers. No generator parameters or validation data are aggregated or exposed. Experiments on privacy-sensitive tabular benchmarks from finance and healthcare are reported to show consistent gains in federated performance, cross-client stability, and robustness to distribution shift relative to static and decoupled synthetic-data baselines.

Significance. If the claimed improvements are substantiated, the work would be significant for privacy-preserving federated LLM training on tabular data. It offers a concrete mechanism to align synthetic data generation with downstream utility in non-IID regimes without violating isolation constraints, using local scorers and GRPO-based self-refinement. This could be particularly relevant for regulated domains such as finance and healthcare where static synthetic data pipelines often degrade under heterogeneity.

major comments (2)
  1. [Abstract] The central claim that 'Concordia consistently improves federated performance, cross-client stability, and robustness to distribution shift' is asserted without any quantitative results, baseline definitions, statistical tests, ablation details, or effect sizes. This absence makes it impossible to assess whether the tri-level framework delivers measurable gains over the static/decoupled baselines it contrasts against.
  2. [Method overview / §3] The framework (described in the method overview) rests on the assumption that locally trained utility scorers produce reliable reweightings and that the shared heterogeneous ensemble supplies a sufficiently strong, non-overfitting signal for each client's GRPO-based generator refinement. In non-IID tabular regimes typical of the target domains, local scorers risk capturing client-specific artifacts; without ablations isolating the contribution of the scorers versus the ensemble versus the GRPO loop, the load-bearing mechanism for self-improvement remains unverified.
minor comments (1)
  1. [Abstract] The abstract refers to 'lightweight utility scorers' and 'group-relative policy optimization (GRPO)' without defining their architectures, loss formulations, or hyper-parameters at first mention; a short definitional sentence or forward reference to the relevant subsection would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight opportunities to strengthen the presentation of empirical results and to more explicitly verify the contributions of individual framework components. We address each point below and will revise the manuscript accordingly.

point-by-point responses
  1. Referee: [Abstract] The central claim that 'Concordia consistently improves federated performance, cross-client stability, and robustness to distribution shift' is asserted without any quantitative results, baseline definitions, statistical tests, ablation details, or effect sizes. This absence makes it impossible to assess whether the tri-level framework delivers measurable gains over the static/decoupled baselines it contrasts against.

    Authors: We agree that the abstract would benefit from concrete quantitative highlights to allow immediate assessment of the claimed gains. In the revised version we will incorporate specific effect sizes (e.g., average relative improvements in federated accuracy and stability metrics across the finance and healthcare benchmarks), explicit baseline names (static synthetic tables and decoupled generation pipelines), and references to the statistical tests performed. The experiments section already contains the supporting tables and significance results; the abstract will now summarize the key numbers and effect sizes without altering the original claim. revision: yes

  2. Referee: [Method overview / §3] The framework (described in the method overview) rests on the assumption that locally trained utility scorers produce reliable reweightings and that the shared heterogeneous ensemble supplies a sufficiently strong, non-overfitting signal for each client's GRPO-based generator refinement. In non-IID tabular regimes typical of the target domains, local scorers risk capturing client-specific artifacts; without ablations isolating the contribution of the scorers versus the ensemble versus the GRPO loop, the load-bearing mechanism for self-improvement remains unverified.

    Authors: We appreciate the referee's emphasis on isolating the contribution of each optimization level, particularly given the non-IID nature of the target domains. While the current experiments demonstrate end-to-end gains relative to the static and decoupled baselines, we acknowledge that dedicated component-wise ablations would provide stronger verification of the utility scorers, the shared ensemble, and the GRPO refinement. In the revision we will add a new ablation subsection (and corresponding appendix tables) that evaluates performance when each element is removed in turn: (i) uniform sample weighting without learned scorers, (ii) local-only signals without the shared ensemble, and (iii) static generators without the GRPO loop. These results will be reported on the same privacy-sensitive benchmarks to quantify incremental contributions and address potential client-specific artifact concerns. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical framework driven by external private validation feedback

full rationale

The paper introduces Concordia as a tri-level optimization framework for federated LLM adaptation on tabular tasks: client-level LoRA training on synthetic tables, local lightweight utility scorers trained from private validation feedback to reweight samples, and outer GRPO refinement of each client's generator guided by a shared ensemble of heterogeneous scorers without parameter aggregation or data exposure. No equations, derivations, or first-principles results are presented in the provided text. The claimed improvements in performance, stability, and shift robustness are asserted via experiments on external privacy-sensitive benchmarks, not reduced by construction to any fitted parameter, self-defined quantity, or self-citation chain. The central mechanism explicitly depends on independent validation feedback signals, which are external to the synthetic generation process itself. This is the most common honest non-finding for a method paper whose value is demonstrated empirically rather than through tautological renaming or internal fitting.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract relies on standard federated learning assumptions (data isolation, non-IID distributions) and existing techniques (LoRA, GRPO) without introducing new free parameters, axioms, or invented entities beyond typical machine-learning hyperparameters.

axioms (1)
  • domain assumption Federated clients cannot share raw records or validation data and must train exclusively on synthetic tables
    Stated directly in the problem setup of the abstract.

pith-pipeline@v0.9.0 · 5558 in / 1213 out tokens · 46229 ms · 2026-05-12T04:21:38.155361+00:00 · methodology

discussion (0)

