TabTransformer: Tabular Data Modeling Using Contextual Embeddings
Pith reviewed 2026-05-16 21:28 UTC · model grok-4.3
The pith
TabTransformer applies self-attention to categorical feature embeddings to create contextual representations that raise prediction accuracy on tabular data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Applying Transformer self-attention layers to the embeddings of categorical variables produces contextual embeddings that improve accuracy for both fully supervised and semi-supervised tabular modeling, outperforming earlier deep networks and matching tree-based ensembles on public benchmarks.
What carries the argument
Self-attention Transformer layers that convert per-feature categorical embeddings into contextual embeddings carrying cross-feature information.
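The mechanism can be sketched in a few lines of numpy: look up one embedding per categorical column, run a self-attention layer over the feature axis, then concatenate the resulting contextual embeddings with the continuous features before the final MLP. All sizes, weights, and the single-head simplification here are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all sizes hypothetical): 4 categorical columns, embedding dim 8.
n_cat, d = 4, 8
vocab_sizes = [5, 3, 7, 4]                       # cardinality of each column
tables = [rng.normal(size=(v, d)) for v in vocab_sizes]

def embed(row):
    """Look up one embedding per categorical value -> (n_cat, d)."""
    return np.stack([tables[j][row[j]] for j in range(n_cat)])

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over the feature axis (simplified)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])      # (n_cat, n_cat)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                           # contextual embeddings

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
row = [1, 0, 6, 2]                               # one sample's category indices
ctx = self_attention(embed(row), Wq, Wk, Wv)     # (n_cat, d)

# The contextual embeddings are flattened and concatenated with the
# continuous features before the final MLP head.
x_cont = rng.normal(size=3)
mlp_input = np.concatenate([ctx.ravel(), x_cont])
print(ctx.shape, mlp_input.shape)                # (4, 8) (35,)
```

Each contextual embedding now mixes information from every other column, which is what distinguishes this from a plain per-column embedding table.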
If this is right
- Tabular tasks can use attention mechanisms without custom feature engineering.
- Predictions remain accurate even when input features contain missing values or noise.
- The learned embeddings support direct inspection of which feature combinations drive each prediction.
- Unsupervised pre-training on unlabeled tables produces useful starting embeddings for downstream labeled tasks.
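The pre-training point in the list above follows an MLM-style recipe: mask a fraction of categorical cells and train the network to recover the original values. A schematic sketch of the masking step, with every name and the masking rate chosen for illustration rather than taken from the paper:

```python
import random

MASK = "<mask>"

def mask_row(row, p=0.3, rng=random.Random(0)):
    """Replace each cell with MASK with probability p; return the masked row
    plus the (index, original value) pairs the model must predict."""
    masked, targets = [], []
    for j, v in enumerate(row):
        if rng.random() < p:
            masked.append(MASK)
            targets.append((j, v))
        else:
            masked.append(v)
    return masked, targets

row = ["blue", "large", "north", "A"]
masked, targets = mask_row(row)
print(masked, targets)
```

The loss is then a per-column classification loss over the masked positions only, which lets the Transformer layers learn cross-feature structure from unlabeled rows.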
Where Pith is reading between the lines
- The same contextual-embedding approach may transfer to tables that mix numeric, categorical, and text columns.
- Larger-scale versions could serve as general-purpose backbones for enterprise tabular pipelines.
- Interpretability gains might reduce the need for separate post-hoc explanation tools.
Load-bearing premise
The fifteen public datasets used for testing represent the range of distributions and noise patterns found in real tabular prediction tasks.
What would settle it
A new tabular benchmark on which, after identical tuning of all methods, TabTransformer shows no AUC improvement over the strongest deep learning baseline.
read the original abstract
We propose TabTransformer, a novel deep tabular data modeling architecture for supervised and semi-supervised learning. The TabTransformer is built upon self-attention based Transformers. The Transformer layers transform the embeddings of categorical features into robust contextual embeddings to achieve higher prediction accuracy. Through extensive experiments on fifteen publicly available datasets, we show that the TabTransformer outperforms the state-of-the-art deep learning methods for tabular data by at least 1.0% on mean AUC, and matches the performance of tree-based ensemble models. Furthermore, we demonstrate that the contextual embeddings learned from TabTransformer are highly robust against both missing and noisy data features, and provide better interpretability. Lastly, for the semi-supervised setting we develop an unsupervised pre-training procedure to learn data-driven contextual embeddings, resulting in an average 2.1% AUC lift over the state-of-the-art methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes TabTransformer, a Transformer-based architecture for supervised and semi-supervised tabular data modeling. It uses self-attention layers to produce contextual embeddings from categorical feature embeddings, claiming this yields higher accuracy than prior deep learning methods. Through experiments on 15 public datasets, it reports that TabTransformer outperforms state-of-the-art deep tabular models by at least 1.0% mean AUC while matching tree-based ensembles (e.g., XGBoost), demonstrates robustness to missing and noisy features, provides improved interpretability via attention, and shows a 2.1% AUC gain from unsupervised pre-training in the semi-supervised case.
Significance. If the empirical claims hold under rigorous controls, the work would be significant for the tabular modeling literature: it provides concrete evidence that attention mechanisms can close the gap with tree ensembles on standard benchmarks while adding robustness and interpretability benefits. The use of 15 public datasets and the semi-supervised pre-training procedure are positive elements that could influence follow-on research on hybrid DL-tree approaches.
major comments (2)
- [§4] §4 (Experiments): The description of baseline implementations provides no details on the hyperparameter search space, number of trials, or compute budget allocated to tree-based methods such as XGBoost and LightGBM. Because tabular performance is known to be highly sensitive to these choices, the central claim that TabTransformer 'matches the performance of tree-based ensemble models' cannot be evaluated without this information.
- [§4.3] §4.3 and Table 2: Mean AUC differences (1.0% over DL baselines) are reported without standard deviations across the 15 datasets, without per-dataset statistical tests, and without explicit confirmation that all baselines received equivalent tuning effort. This omission directly affects the reliability of the performance claims.
minor comments (1)
- [§3.2] §3.2: The notation for the multi-head attention output and the subsequent feed-forward layers could be made more explicit by including the exact dimensionality transformations.
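The dimensionality transformations the referee asks for follow the standard multi-head recipe: split the model dimension d into h heads of size d/h, attend per head, concatenate, and project. A shape-annotated sketch under assumed toy sizes (this mirrors generic Transformer attention, not any configuration specific to the paper):

```python
import numpy as np

# Hypothetical sizes: n features, model dim d, h heads, head dim d_h = d // h.
n, d, h = 4, 8, 2
d_h = d // h
rng = np.random.default_rng(1)
X = rng.normal(size=(n, d))                       # input feature embeddings

Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) for _ in range(4))

def split_heads(M):
    """(n, d) -> (h, n, d_h)"""
    return M.reshape(n, h, d_h).transpose(1, 0, 2)

Q, K, V = split_heads(X @ Wq), split_heads(X @ Wk), split_heads(X @ Wv)
scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_h)  # (h, n, n)
A = np.exp(scores - scores.max(-1, keepdims=True))
A /= A.sum(-1, keepdims=True)
heads = A @ V                                     # (h, n, d_h)
out = heads.transpose(1, 0, 2).reshape(n, d) @ Wo # concat heads -> (n, d)

# The position-wise feed-forward layer then maps (n, d) -> (n, d_ff) -> (n, d).
W1, W2 = rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d))
ffn = np.maximum(out @ W1, 0) @ W2                # ReLU is one common choice
print(out.shape, ffn.shape)                       # (4, 8) (4, 8)
```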
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major point below and will incorporate revisions to improve the clarity and rigor of the experimental reporting.
read point-by-point responses
-
Referee: [§4] §4 (Experiments): The description of baseline implementations provides no details on the hyperparameter search space, number of trials, or compute budget allocated to tree-based methods such as XGBoost and LightGBM. Because tabular performance is known to be highly sensitive to these choices, the central claim that TabTransformer 'matches the performance of tree-based ensemble models' cannot be evaluated without this information.
Authors: We agree that additional details on hyperparameter tuning are necessary for reproducibility and to substantiate the performance comparison. In the revised version, we will expand Section 4 with explicit hyperparameter search spaces for XGBoost, LightGBM, and all other baselines, the number of random or grid search trials conducted, and the compute budget (e.g., number of CPU/GPU hours) allocated to each method. This will allow direct evaluation of tuning equivalence. revision: yes
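The kind of disclosure being promised can be made concrete: an explicit search space per method plus a fixed, reproducibly sampled trial budget. The specific parameter names and grids below are hypothetical placeholders, not the paper's actual protocol:

```python
import random

# Hypothetical XGBoost search space and a fixed 50-trial random-search budget.
xgb_space = {
    "max_depth": [3, 4, 6, 8, 10],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "n_estimators": [100, 300, 500, 1000],
    "subsample": [0.6, 0.8, 1.0],
}

def sample_trials(space, n_trials, seed=0):
    """Draw a reproducible list of hyperparameter configurations."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(n_trials)]

trials = sample_trials(xgb_space, n_trials=50)
print(len(trials), trials[0])
```

Publishing the space, the seed, and the trial count per method is what makes a "comparable tuning effort" claim checkable.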
-
Referee: [§4.3] §4.3 and Table 2: Mean AUC differences (1.0% over DL baselines) are reported without standard deviations across the 15 datasets, without per-dataset statistical tests, and without explicit confirmation that all baselines received equivalent tuning effort. This omission directly affects the reliability of the performance claims.
Authors: We acknowledge that reporting variability and statistical significance strengthens the claims. We will revise Table 2 to include standard deviations of the AUC values across the 15 datasets. We will also add per-dataset paired statistical tests (Wilcoxon signed-rank) between TabTransformer and each baseline, along with an explicit statement confirming that all methods received comparable tuning effort via the same search protocol. These additions will be included in the updated Section 4.3. revision: yes
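The Wilcoxon signed-rank statistic the rebuttal commits to is simple enough to compute from paired per-dataset AUC differences. A from-scratch sketch (no stats library assumed; the AUC numbers are toy values, not from the paper):

```python
# Paired Wilcoxon signed-rank statistic across datasets.
def wilcoxon_w(diffs):
    d = [x for x in diffs if x != 0]          # drop zero differences
    ranked = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(ranked):                    # average ranks over ties
        j = i
        while j + 1 < len(ranked) and abs(d[ranked[j + 1]]) == abs(d[ranked[i]]):
            j += 1
        avg = (i + j) / 2 + 1                 # ranks are 1-based
        for k in range(i, j + 1):
            ranks[ranked[k]] = avg
        i = j + 1
    w_pos = sum(r for x, r in zip(d, ranks) if x > 0)
    w_neg = sum(r for x, r in zip(d, ranks) if x < 0)
    return min(w_pos, w_neg)                  # compare against a critical value

auc_diffs = [0.012, -0.004, 0.020, 0.008]     # toy TabTransformer-minus-baseline
print(wilcoxon_w(auc_diffs))
```

A small W relative to the critical value for n non-zero pairs rejects the hypothesis that the two methods' AUCs are exchangeable across datasets.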
Circularity Check
No circularity in derivation chain; all claims empirical
full rationale
The paper introduces TabTransformer as a new architecture using self-attention on categorical embeddings for tabular data and supports its claims solely through direct experimental comparisons on 15 public datasets against DL and tree baselines. No equations, uniqueness theorems, or ansatzes are derived or invoked that reduce by construction to fitted inputs from the same evaluation data. Performance claims (AUC lifts) are presented as measured outcomes rather than predictions forced by the model's own parameterization, leaving the argument self-contained without circular reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Self-attention applied to categorical feature embeddings produces robust contextual representations that improve downstream prediction accuracy on tabular data.
Forward citations
Cited by 20 Pith papers
-
From Schema to Signal: Retrieval-Augmented Modeling for Relational Data Analytics
RAM augments relational graph models with attribute-semantic retrieval via random-walk documents and two contrastive augmentations (ATRA, ETRA) to achieve state-of-the-art results on five real-world databases.
-
GeoViSTA: Geospatial Vision-Tabular Transformer for Multimodal Environment Representation
GeoViSTA learns unified geospatial embeddings from co-registered imagery and tabular data via bilateral cross-attention and joint masked autoencoding, yielding better linear probing performance on mortality and fire h...
-
LUCAS-MEGA: A Large-Scale Multimodal Dataset for Representation Learning in Soil-Environment Systems
LUCAS-MEGA fuses 68 heterogeneous soil datasets into a 70k-sample multimodal collection and demonstrates its value by pretraining a tabular transformer whose representations align with established soil processes.
-
LUCAS-MEGA: A Large-Scale Multimodal Dataset for Representation Learning in Soil-Environment Systems
LUCAS-MEGA fuses 68 soil-environment datasets into a 70k-sample multimodal resource that supports self-supervised pretraining of SoilFormer, whose representations align with known soil processes.
-
AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics
AniMatrix generates anime videos using a structured taxonomy of artistic production variables, dual-channel conditioning, a style-motion curriculum, and deformation-aware optimization to prioritize art over physics.
-
AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics
AniMatrix generates anime videos by structuring artistic production rules into a controllable taxonomy and training the model to prioritize those rules over physical realism, achieving top scores from professional ani...
-
AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics
AniMatrix generates anime videos using a production knowledge taxonomy, dual-channel conditioning, style-motion curriculum, and deformation-aware preference optimization, outperforming baselines in animator evaluation...
-
VT-Bench: A Unified Benchmark for Visual-Tabular Multi-Modal Learning
VT-Bench is the first unified benchmark aggregating 14 visual-tabular datasets with over 756K samples and evaluating 23 models to expose challenges in this multi-modal area.
-
ASD-Bench: A Four-Axis Comprehensive Benchmark of AI Models for Autism Spectrum Disorder
ASD-Bench evaluates 17 ML and deep learning models on 4,068 AQ-10 records across child, adolescent, and adult cohorts, showing high adult performance, harder adolescent classification, shifting feature importance, and...
-
Mitigating Label Shift in Tabular In-Context Learning via Test-Time Posterior Adjustment
DistPFN is a test-time posterior adjustment that rescales TabPFN class probabilities to reduce overfitting to the training class distribution under label shift.
-
DynaTab: Dynamic Feature Ordering as Neural Rewiring for High-Dimensional Tabular Data
DynaTab dynamically reorders features in tabular data via neural rewiring and reports statistically significant gains over 45 baselines on 36 high-dimensional datasets.
-
ReSS: Learning Reasoning Models for Tabular Data Prediction via Symbolic Scaffold
ReSS uses decision-tree scaffolds to fine-tune LLMs for faithful tabular reasoning, reporting up to 10% gains over baselines on medical and financial data.
-
Weight-Informed Self-Explaining Clustering for Mixed-Type Tabular Data
WISE unifies representation via BEP, feature weighting via LOFO, two-stage clustering, and intrinsic explanations via DFI for mixed-type tabular data, outperforming baselines on six datasets.
-
From Uniform to Learned Knots: A Study of Spline-Based Numerical Encodings for Tabular Deep Learning
Spline encodings for numerical features show task-dependent performance in tabular deep learning, with piecewise-linear encoding robust for classification and variable results for regression depending on spline family...
-
Focused PU learning from imbalanced data
A focused empirical risk estimator for PU learning achieves state-of-the-art results on imbalanced datasets under SCAR and SAR labeling mechanisms.
-
Evaluating Tabular Representation Learning for Network Intrusion Detection
Tabular representation learning for network intrusion detection exhibits strong dataset-model dependency, with supervised methods outperforming unsupervised anomaly detection and limited but possible cross-dataset gen...
-
ZAYAN: Disentangled Contrastive Transformer for Tabular Remote Sensing Data
ZAYAN introduces feature-level zero-anchor contrastive pretraining that produces disentangled embeddings and improves classification accuracy on remote sensing tabular datasets over standard deep learning baselines.
-
Evaluating Deep Learning Models for Multiclass Classification of LIGO Gravitational-Wave Glitches
Benchmark finds some deep learning models match gradient-boosted trees on LIGO glitch classification with fewer parameters and partially consistent feature importance across architectures.
-
PRAGMA: Revolut Foundation Model
PRAGMA pre-trains a Transformer on heterogeneous banking events with a tailored self-supervised masked objective, yielding embeddings that support strong downstream performance on credit scoring, fraud detection, and ...
-
Integrating SAINT with Tree-Based Models: A Case Study in Employee Attrition Prediction
Standalone tree-based models outperform both SAINT and SAINT-embedding hybrids for employee attrition prediction on tabular HR data.