Active Tabular Augmentation via Policy-Guided Diffusion Inpainting
Recognition: 3 theorem links
Pith reviewed 2026-05-12 03:50 UTC · model grok-4.3
The pith
A learner-conditioned policy steers diffusion inpainting to generate tabular samples that reduce a downstream model's held-out loss.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We formalize a fidelity-utility gap and propose TAP, which couples diffusion inpainting with a lightweight learner-conditioned policy to steer generation toward high-utility regions and controls safe injection via explicit gating and conservative windowed commitment.
What carries the argument
TAP, the Tabular Augmentation Policy: a lightweight learner-conditioned policy that directs diffusion inpainting and manages when to inject the resulting samples.
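One way to make the fidelity-utility gap concrete in symbols (the notation below is ours for illustration; the paper's own formalization may differ):

```latex
% Fidelity: a generative objective rewards distributional plausibility,
% i.e. a small divergence between data and model distributions.
\min_{G}\; D\!\left(p_{\mathrm{data}} \,\|\, p_{G}\right)

% Utility: augmentation succeeds only if an injected batch S reduces the
% current learner f_{\theta_t}'s held-out evaluation loss.
U_t(S) \;=\; \mathcal{L}_{\mathrm{val}}\!\left(f_{\theta_t}\right)
       \;-\; \mathcal{L}_{\mathrm{val}}\!\left(f_{\theta_t \leftarrow S}\right) \;>\; 0

% The gap: a small divergence does not imply U_t(S) > 0, which is why TAP
% steers generation by estimated utility and gates injection on it.
```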
If this is right
- Under severe data scarcity the method improves classification accuracy by up to 15.6 percentage points over strong generative baselines.
- Regression RMSE drops by up to 32 percent compared with the same baselines on the same seven real-world datasets.
- Generation is directed toward regions that help the evolving learner rather than solely replicating the training distribution.
- Explicit gating and windowed commitment keep injected samples from degrading performance during training.
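The last two bullets can be sketched with a toy learner. Everything here is illustrative: the mean-predictor "learner", the `propose` stand-in for policy-steered inpainting, and the commit rule (accept a window's surviving batches only if they also help jointly) are our assumptions, not the paper's implementation.

```python
import random

def val_loss(data, val):
    """Held-out MSE of a trivial mean-predictor 'learner' (toy stand-in)."""
    pred = sum(data) / len(data)
    return sum((y - pred) ** 2 for y in val) / len(val)

def gated_augmentation(train, val, propose, steps=20, window=5, tau=0.0):
    """Toy gated injection with conservative windowed commitment: candidate
    batches accumulate utility estimates dU (held-out loss reduction); at the
    end of each window, batches that clear the gate are re-checked jointly
    and committed only if they still reduce the held-out loss."""
    committed, pending = [], []
    for _ in range(steps):
        batch = propose(train + committed)             # stand-in for policy-steered inpainting
        base = val_loss(train + committed, val)
        trial = val_loss(train + committed + batch, val)
        pending.append((batch, base - trial))          # dU estimate for this batch
        if len(pending) == window:
            keep = [b for b, du in pending if du > tau]    # explicit gate
            joint = [x for b in keep for x in b]
            # conservative commit: survivors must also help jointly
            if joint and val_loss(train + committed + joint, val) < base - tau:
                committed += joint
            pending = []
    return committed

random.seed(0)
train = [random.gauss(0.0, 1.0) for _ in range(10)]
val = [random.gauss(1.0, 1.0) for _ in range(50)]      # held-out mean shifted from train
helpful = gated_augmentation(train, val, lambda d: [random.gauss(1.0, 0.5)])
harmful = gated_augmentation(train, val, lambda d: [100.0])   # always gated out
```

By construction the commit step only fires when it strictly reduces held-out loss, so injected samples can never degrade this toy learner; that is the property the gating and commitment window are claimed to provide.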
Where Pith is reading between the lines
- The same learner-conditioned steering principle could be tested on image or text data where distributional fidelity likewise fails to guarantee task improvement.
- The policy might be combined with active-learning loops to decide both which real points to label and which synthetic points to generate next.
- The conservative commitment window offers a starting point for preventing augmentation drift in continual or streaming learning settings.
Load-bearing premise
The policy can reliably select regions whose generated samples will reduce held-out loss, and the gating mechanism can block harmful injections, without the policy itself overfitting to the training state.
What would settle it
A repeated trial on a new dataset in which policy-steered samples produce no greater loss reduction on held-out data than samples drawn uniformly from the same diffusion model.
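Such a trial could be scored with a paired permutation test over repeated runs: does policy steering reduce held-out loss more than uniform sampling from the same diffusion model? This is a sketch; the function name and the gain values in the usage example are hypothetical, not the paper's.

```python
import random

def paired_permutation_pvalue(policy_gains, uniform_gains, n_perm=10000, seed=0):
    """One-sided paired permutation test. Inputs are per-trial held-out loss
    reductions for policy-steered vs uniform samples; small p-values mean
    steering beats uniform, large p-values mean the claim fails to settle."""
    diffs = [p - u for p, u in zip(policy_gains, uniform_gains)]
    observed = sum(diffs) / len(diffs)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # under the null, the sign of each paired difference is arbitrary
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if sum(flipped) / len(flipped) >= observed:
            hits += 1
    return hits / n_perm

# hypothetical per-trial gains, for illustration only
p = paired_permutation_pvalue(
    policy_gains=[0.21, 0.18, 0.25, 0.22, 0.19, 0.24, 0.20, 0.23],
    uniform_gains=[0.05, 0.07, 0.04, 0.06, 0.08, 0.05, 0.06, 0.07])
```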
original abstract
Generative tabular augmentation is appealing in data-scarce domains, yet the prevailing focus on distributional fidelity does not reliably translate into better downstream models. We formalize a fidelity-utility gap: common generative objectives prioritize distributional plausibility, whereas augmentation succeeds only when injected samples reduce the current learner's held-out evaluation loss. This gap motivates learning not just how to generate, but what to generate and when to inject as training evolves. We propose TAP (Tabular Augmentation Policy), which couples diffusion inpainting with a lightweight, learner-conditioned policy to steer generation toward high-utility regions and controls safe injection via explicit gating and conservative windowed commitment. Under severe data scarcity, TAP consistently outperforms strong generative baselines on seven real-world datasets, improving classification accuracy by up to 15.6 percentage points and reducing regression RMSE by up to 32%.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes TAP (Tabular Augmentation Policy), which couples diffusion inpainting with a lightweight learner-conditioned policy to steer tabular sample generation toward regions that reduce the current learner's held-out loss, using explicit gating and windowed commitment for safe injection. It claims this closes the fidelity-utility gap and yields consistent gains over generative baselines on seven real-world datasets under severe data scarcity, with classification accuracy improvements up to 15.6 percentage points and regression RMSE reductions up to 32%.
Significance. If the empirical results and policy safeguards hold under scrutiny, the work could meaningfully advance tabular data augmentation by prioritizing downstream utility over pure distributional fidelity, a distinction that is often load-bearing in low-data regimes. The explicit mechanisms for controlling injection timing and safety represent a practical contribution that could be adopted more broadly if supported by stronger diagnostics.
major comments (3)
- [Methods (policy objective and training)] Methods section on policy training: the learner-conditioned policy is trained on the same scarce data as the downstream model, creating a potential feedback loop where the policy may overfit to training-set noise or transient artifacts rather than true held-out utility. The manuscript must clarify whether policy updates use held-out data and provide an ablation of policy-guided selection versus random gating to show the safeguards function as claimed.
- [Experiments and Results] Results section (empirical gains): the headline improvements of 15.6 pp accuracy and 32% RMSE are reported without error bars, without ablations isolating the contribution of the gating mechanism or commitment window, and without a direct diagnostic (e.g., policy accuracy against a held-out utility oracle). These omissions make it impossible to verify that the gains arise from utility steering rather than experimental protocol artifacts.
- [Experimental setup] Experimental protocol: the abstract and methods do not specify how the policy avoids circularity with the learner it conditions on, nor do they report whether the reported improvements remain when the policy is trained independently of the final evaluation split. This is load-bearing for the central claim that TAP reliably identifies high-utility injections.
minor comments (2)
- [Abstract] The abstract would benefit from naming the seven datasets and briefly stating the data-scarcity regime (e.g., number of samples per class) to allow readers to assess the scope of the claims without reading the full experiments.
- [Preliminaries] Notation for the policy input (learner state features) and the commitment window length should be defined once in a dedicated notation paragraph rather than introduced inline.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive report. The comments correctly identify areas where additional clarity, ablations, and diagnostics would strengthen the manuscript. We address each major comment below and will revise the paper accordingly to incorporate the requested details, experiments, and safeguards.
point-by-point responses
-
Referee: Methods section on policy training: the learner-conditioned policy is trained on the same scarce data as the downstream model, creating a potential feedback loop where the policy may overfit to training-set noise or transient artifacts rather than true held-out utility. The manuscript must clarify whether policy updates use held-out data and provide an ablation of policy-guided selection versus random gating to show the safeguards function as claimed.
Authors: We agree that this distinction is important. The policy is trained to predict utility (reduction in held-out loss) using a validation split that is held out from both the learner's training data and the final test set. In the revised manuscript we will explicitly state this protocol in the Methods section. We will also add an ablation that replaces the learned policy with random gating (while keeping the same diffusion inpainting and commitment window) and report the resulting downstream performance to isolate the contribution of utility-guided selection. revision: yes
-
Referee: Results section (empirical gains): the headline improvements of 15.6 pp accuracy and 32% RMSE are reported without error bars, without ablations isolating the contribution of the gating mechanism or commitment window, and without a direct diagnostic (e.g., policy accuracy against a held-out utility oracle). These omissions make it impossible to verify that the gains arise from utility steering rather than experimental protocol artifacts.
Authors: We acknowledge these omissions in the current draft. The revision will include standard error bars computed over five independent runs for all reported metrics. We will add targeted ablations that disable the gating mechanism and the commitment window individually, and we will introduce a diagnostic that measures how often the policy selects samples whose true held-out utility (computed on a separate oracle split) exceeds a random baseline. These additions will appear in the Experiments section. revision: yes
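The promised error bars and oracle diagnostic could be computed along these lines. This is a sketch; the function names and the run values in the test are placeholders, not the paper's numbers.

```python
import statistics

def mean_and_stderr(runs):
    """Mean and standard error over independent runs (e.g. five seeds per
    metric), for reporting error bars on accuracy or RMSE."""
    m = statistics.mean(runs)
    se = statistics.stdev(runs) / len(runs) ** 0.5
    return m, se

def oracle_hit_rate(policy_utils, random_utils):
    """Diagnostic: per matched round, how often the policy-selected sample's
    true held-out utility (from a separate oracle split) beats a random draw."""
    wins = sum(p > r for p, r in zip(policy_utils, random_utils))
    return wins / len(policy_utils)
```

A hit rate near 0.5 would indicate the policy is no better than random gating, directly testing the referee's concern.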
-
Referee: Experimental protocol: the abstract and methods do not specify how the policy avoids circularity with the learner it conditions on, nor do they report whether the reported improvements remain when the policy is trained independently of the final evaluation split. This is load-bearing for the central claim that TAP reliably identifies high-utility injections.
Authors: We will revise the Methods and Experimental Setup sections to describe the data partitioning explicitly: the policy is conditioned on the current learner but is trained and validated on a split that is disjoint from the final test evaluation. We will also report an additional experiment in which the policy is trained on an entirely independent validation fold (never seen by the final learner) and show that the accuracy and RMSE gains remain statistically significant, thereby confirming that the improvements are not artifacts of circular evaluation. revision: yes
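A minimal sketch of the disjoint partitioning the rebuttal describes, assuming a simple index shuffle; the split fractions and names are ours:

```python
import random

def disjoint_splits(n, frac_train=0.5, frac_policy_val=0.25, seed=0):
    """Index partition for the described protocol: the learner trains on
    `train`, the policy's utility signal comes only from `policy_val`, and
    `test` is touched once for final evaluation. The three index sets are
    pairwise disjoint, ruling out circular evaluation by construction."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    a = int(n * frac_train)
    b = a + int(n * frac_policy_val)
    return idx[:a], idx[a:b], idx[b:]
```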
Circularity Check
No significant circularity; derivation introduces independent policy and gating components
full rationale
The provided abstract and description formalize a fidelity-utility gap and introduce TAP as a coupling of diffusion inpainting with a learner-conditioned policy plus explicit gating. No equations, self-citations, or fitted parameters are quoted that would reduce the central claims (e.g., utility steering or performance gains) to the inputs by construction. The policy and gating are presented as new mechanisms rather than renamings or self-referential fits, so the derivation stands on its own against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Diffusion inpainting can be conditioned on a learner state to produce high-utility samples.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  "We formalize a fidelity-utility gap: common generative objectives prioritize distributional plausibility, whereas augmentation succeeds only when injected samples reduce the current learner's held-out evaluation loss."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  "TAP uses diffusion inpainting to produce manifold-local proposals... A lightweight policy then selects generation conditions based on a compact summary of the learner's state"
- IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  "We use TabDiff as the diffusion backbone... commit only when dΔU > τ + εt"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
discussion (0)