Recognition: no theorem link
HuggingFace's Transformers: State-of-the-art Natural Language Processing
Pith reviewed 2026-05-11 14:49 UTC · model grok-4.3
The pith
An open-source library supplies a unified API and pretrained models for state-of-the-art Transformer architectures in natural language processing.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Transformers is an open-source library that provides state-of-the-art Transformer architectures under a single unified API, together with a curated collection of pretrained models contributed by and available to the community. The library is engineered to be extensible for researchers, straightforward for practitioners, and sufficiently robust and efficient for industrial use.
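A minimal sketch of what that unified interface looks like in practice (not taken from the paper; the checkpoint name is an illustrative, publicly hosted model and the high-level pipeline helper is one of several entry points):

```python
# Minimal sketch: one import, one call, and a pretrained model from the
# community hub is ready for inference behind a task-level interface.
from transformers import pipeline

# Checkpoint name is illustrative; pipeline() also selects a default if omitted.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("The unified API makes pretrained models easy to reuse."))
# -> [{'label': 'POSITIVE', 'score': ...}]
```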
What carries the argument
The unified API that wraps multiple Transformer architectures while preserving their original performance and allowing consistent access to pretrained weights.
If this is right
- New models can be added by researchers without rewriting core training or inference loops.
- Practitioners gain immediate access to high-performing models for downstream tasks without reimplementing architectures.
- Industrial deployments benefit from a single, maintained codebase that supports multiple frameworks and hardware targets.
- Community contributions expand the set of available pretrained models and task-specific fine-tunes.
- Standardized interfaces reduce the engineering overhead of comparing or combining different Transformer variants, as sketched below.
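A hedged sketch of that last point: the evaluation code below never names a specific architecture, so switching from a DistilBERT to a RoBERTa checkpoint is a one-string change. The checkpoint names and the label-index convention are illustrative assumptions, not prescriptions from the paper.

```python
# Sketch: the same scoring function serves different Transformer variants
# because every architecture sits behind the same Auto* classes.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def positive_probability(checkpoint: str, text: str) -> float:
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Assumes a binary head where index 1 is the positive class; check
    # model.config.id2label before drawing conclusions from a real comparison.
    return torch.softmax(logits, dim=-1)[0, 1].item()

# Checkpoint names are illustrative examples of public sentiment models.
for ckpt in ["distilbert-base-uncased-finetuned-sst-2-english",
             "textattack/roberta-base-SST-2"]:
    print(ckpt, positive_probability(ckpt, "Swapping architectures needs no code changes."))
```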
Where Pith is reading between the lines
- Widespread use of the shared codebase could shift research focus from reimplementation details to new modeling ideas or data strategies.
- If the library remains actively maintained, it may serve as a de-facto reference implementation that influences how future papers release code.
- The same API pattern could be extended to other modalities, such as vision or speech, once corresponding Transformer models mature.
Load-bearing premise
The library's implementations must match the accuracy and behavior reported in the original papers that introduced each Transformer model.
What would settle it
A side-by-side benchmark on a standard task such as GLUE or SQuAD: if a model loaded from the library underperforms the numbers published in its source paper, the claim of faithful reproduction is falsified.
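A minimal sketch of such a check, assuming the companion datasets and evaluate packages are installed; the checkpoint name and the published reference accuracy are placeholders to be filled in from the source paper rather than values asserted here.

```python
# Sketch of a reproduction check on a GLUE task (SST-2 validation split).
import torch
import evaluate
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder
published_accuracy = 0.91  # placeholder: copy the number from the source paper

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint).eval()
dataset = load_dataset("glue", "sst2", split="validation")
metric = evaluate.load("glue", "sst2")

for example in dataset:
    inputs = tokenizer(example["sentence"], return_tensors="pt", truncation=True)
    with torch.no_grad():
        prediction = model(**inputs).logits.argmax(dim=-1).item()
    metric.add(prediction=prediction, reference=example["label"])

accuracy = metric.compute()["accuracy"]
print(f"library: {accuracy:.3f}  published: {published_accuracy:.3f}")
# A materially lower library number would count against faithful reproduction.
```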
Read the original abstract
Recent progress in natural language processing has been driven by advances in both model architecture and model pretraining. Transformer architectures have facilitated building higher-capacity models and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks. Transformers is an open-source library with the goal of opening up these advances to the wider machine learning community. The library consists of carefully engineered state-of-the art Transformer architectures under a unified API. Backing this library is a curated collection of pretrained models made by and available for the community. Transformers is designed to be extensible by researchers, simple for practitioners, and fast and robust in industrial deployments. The library is available at https://github.com/huggingface/transformers.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes the Hugging Face Transformers library, an open-source Python package that implements a range of state-of-the-art Transformer architectures for natural language processing under a single, consistent API. It is backed by a curated collection of community-contributed pretrained models and is positioned as extensible for researchers, simple for practitioners, and robust for industrial deployment. The library is hosted at https://github.com/huggingface/transformers.
Significance. If the described implementations and pretrained weights are faithful to the original papers, the work is significant because it lowers the barrier to using high-capacity Transformer models, promotes reproducibility through open weights and code, and accelerates both research and deployment in NLP. The emphasis on a unified API and community contributions is a concrete strength that directly supports the paper's stated goals.
Minor comments (1)
- [Abstract] The phrase 'state-of-the art' is missing a hyphen and should read 'state-of-the-art'.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the manuscript, recognition of the library's role in lowering barriers to Transformer models, and recommendation to accept. We appreciate the emphasis on the unified API and community contributions as key strengths.
Circularity Check
No significant circularity; factual software documentation
Full rationale
The paper is an announcement and documentation of the Hugging Face Transformers open-source library. It describes goals, design principles, and availability of a software package with pretrained models under a unified API. No mathematical derivations, equations, fitted parameters, predictions of new quantities, or self-referential claims appear. The central claim is the existence and features of publicly available code, externally verifiable via the GitHub URL and community contributions. No load-bearing steps reduce to inputs by construction, and the document contains no self-citation chains or uniqueness theorems invoked to justify internal results.
Axiom & Free-Parameter Ledger
Forward citations
Cited by 47 Pith papers
-
Sieve: Dynamic Expert-Aware PIM Acceleration for Evolving Mixture-of-Experts Models
Sieve dynamically schedules MoE experts across GPU and PIM hardware to handle bimodal token distributions, achieving 1.3x to 1.6x gains in throughput and interactivity over static prior PIM systems on three large models.
-
VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?
VibeServe demonstrates that AI agents can synthesize bespoke LLM serving systems end-to-end, remaining competitive with vLLM in standard settings while outperforming it in six non-standard scenarios involving unusual ...
-
RULER: What's the Real Context Size of Your Long-Context Language Models?
RULER shows most long-context LMs drop sharply in performance on complex tasks as length and difficulty increase, with only half maintaining results at 32K tokens.
-
Editing Models with Task Arithmetic
Task vectors from weight differences allow arithmetic operations to edit pre-trained models, improving multiple tasks simultaneously and enabling analogical inference on unseen tasks.
-
TokAlign++: Advancing Vocabulary Adaptation via Better Token Alignment
TokAlign++ learns token alignments between LLM vocabularies from monolingual representations to enable faster adaptation, better text compression, and effective token-level distillation across 15 languages with minimal steps.
-
EdgeFlowerTune: Evaluating Federated LLM Fine-Tuning Under Realistic Edge System Constraints
EdgeFlowerTune is a real-device benchmark that jointly assesses model quality and system costs for federated LLM fine-tuning on edge hardware using three protocols: Quality-under-Budget, Cost-to-Target, and Robustness.
-
How Many Iterations to Jailbreak? Dynamic Budget Allocation for Multi-Turn LLM Evaluation
DAPRO provides the first dynamic, theoretically guaranteed way to allocate interaction budgets across test cases for bounding time-to-event in multi-turn LLM evaluations, achieving tighter coverage than static conform...
-
Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior
Manifold steering along activation geometry induces behavioral trajectories matching the natural manifold of outputs, while linear steering produces off-manifold unnatural behaviors.
-
Auto-FlexSwitch: Efficient Dynamic Model Merging via Learnable Task Vector Compression
Auto-FlexSwitch achieves efficient dynamic model merging by decomposing task vectors into sparse masks, signs, and scalars, then making the compression learnable via gating and adaptive bit selection with KNN-based retrieval.
-
SecureRouter: Encrypted Routing for Efficient Secure Inference
SecureRouter accelerates secure transformer inference by 1.95x via an encrypted router that selects input-adaptive models from an MPC-optimized pool with negligible accuracy loss.
-
Too Nice to Tell the Truth: Quantifying Agreeableness-Driven Sycophancy in Role-Playing Language Models
Agreeableness in AI personas reliably predicts sycophantic behavior in 9 of 13 tested language models.
-
VertAX: a differentiable vertex model for learning epithelial tissue mechanics
VertAX supplies a differentiable JAX implementation of vertex models for confluent epithelia that enables forward simulation, mechanical parameter inference, and inverse design of tissue-scale behaviors.
-
Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation
Visual attention in MLLMs shows inertia that hinders cognitive inference on object relations, addressed by a training-free Inertia-aware Visual Excitation method that selects dynamically emerging tokens and applies an...
-
QLoRA: Efficient Finetuning of Quantized LLMs
QLoRA finetunes 4-bit quantized LLMs via LoRA adapters to match full-precision performance while using far less memory, enabling 65B-scale training on single GPUs and producing Guanaco models near ChatGPT level.
-
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.
-
High-Resolution Image Synthesis with Latent Diffusion Models
Latent diffusion models achieve state-of-the-art inpainting and competitive results on unconditional generation, scene synthesis, and super-resolution by performing the diffusion process in the latent space of pretrai...
-
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
RAG models set new state-of-the-art results on open-domain QA by retrieving Wikipedia passages and conditioning a generative model on them, while also producing more factual text than parametric baselines.
-
Large Spectrum Models (LSMs): Decoder-Only Transformer-Powered Spectrum Activity Forecasting via Tokenized RF Data
Decoder-only transformers trained on tokenized RF spectrum data from 22 TB of measurements achieve 3.25 dB RMSE in spectrum activity forecasting across 33 bands.
-
Query-efficient model evaluation using cached responses
DKPS-based methods leverage cached model responses to achieve equivalent benchmark prediction accuracy with substantially fewer queries than standard evaluation.
-
ModelLens: Finding the Best for Your Task from Myriads of Models
ModelLens learns a performance-aware latent space from 1.62M leaderboard records to rank unseen models on unseen datasets without forward passes on the target.
-
Why Does Agentic Safety Fail to Generalize Across Tasks?
Agentic safety fails to generalize across tasks because the task-to-safe-controller mapping has a higher Lipschitz constant than the task-to-controller mapping alone, as proven in linear-quadratic control and demonstr...
-
BAMI: Training-Free Bias Mitigation in GUI Grounding
BAMI mitigates precision and ambiguity biases in GUI grounding via coarse-to-fine focus and candidate selection, raising accuracy on ScreenSpot-Pro without training.
-
Scaling Pretrained Representations Enables Label-Free Out-of-Distribution Detection Without Fine-Tuning
Scaling pretrained representations improves label-free OOD detection on frozen backbones, causing performance gaps between global and local detectors to vanish across vision and language tasks.
-
On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference
An attack aligns differently shuffled intermediate activations from secure Transformer inference queries to recover model weights with low error using roughly one dollar of queries.
-
When Errors Can Be Beneficial: A Categorization of Imperfect Rewards for Policy Gradient
Certain errors in proxy rewards for policy gradient methods can be benign or beneficial by preventing policies from stalling on outputs with mediocre ground truth rewards, enabling improved RLHF metrics and reward des...
-
Leveraging LLMs for Multi-File DSL Code Generation: An Industrial Case Study
Fine-tuning 7B code LLMs on a custom multi-file DSL dataset achieves structural fidelity of 1.00, high exact-match accuracy, and practical utility validated by expert survey and execution checks.
-
R-CoV: Region-Aware Chain-of-Verification for Alleviating Object Hallucinations in LVLMs
R-CoV is a six-step region-aware chain-of-verification technique that elicits coordinate and description outputs from LVLMs themselves to detect and reduce object hallucinations without external models or retraining.
-
RePrompT: Recurrent Prompt Tuning for Integrating Structured EHR Encoders with Large Language Models
RePrompT uses recurrent prompt tuning to inject prior-visit latent states and cohort-derived population prompt tokens into LLMs, yielding better performance than pure EHR or pure LLM baselines on MIMIC clinical predic...
-
Causal Drawbridges: Characterizing Gradient Blocking of Syntactic Islands in Transformer LMs
Causal interventions reveal that coordination islands block filler-gap mechanisms in Transformers in a gradient way matching humans, yielding the hypothesis that 'and' encodes relational dependencies differently in ex...
-
SeLaR: Selective Latent Reasoning in Large Language Models
SeLaR selectively applies latent soft reasoning in LLMs via entropy gating and contrastive regularization, outperforming standard CoT on five benchmarks without training.
-
Rethinking Residual Errors in Compensation-based LLM Quantization
Redefining residual errors to include compensation-aware discrepancies and realigning calibration to full-precision outputs improves GPTQ and GPTAQ performance on LLMs.
-
Beyond End-to-End: Dynamic Chain Optimization for Private LLM Adaptation on the Edge
ChainFed achieves memory-efficient private LLM fine-tuning on edge devices through sequential layer-by-layer adapter training with dynamic co-tuning, perceptive optimization, and adaptive starting point selection, imp...
-
Jump Start or False Start? A Theoretical and Empirical Evaluation of LLM-initialized Bandits
LLM warm-starts for bandits remain better than cold-starts up to roughly 30% random label noise but increase regret under systematic misalignment, with a derived sufficient condition on prior error that predicts when ...
-
MemFactory: Unified Inference & Training Framework for Agent Memory
MemFactory is a new unified modular framework for memory-augmented LLM agent inference and training that integrates GRPO and reports up to 14.8% relative gains on MemAgent evaluations.
-
HybridFlow: A Flexible and Efficient RLHF Framework
HybridFlow combines single- and multi-controller paradigms with a 3D-HybridEngine to deliver 1.53x to 20.57x higher throughput for various RLHF algorithms compared to prior systems.
-
OpenVLA: An Open-Source Vision-Language-Action Model
OpenVLA achieves 16.5% higher task success than the 55B RT-2-X model across 29 tasks with 7x fewer parameters while enabling effective fine-tuning and quantization without performance loss.
-
Steering Llama 2 via Contrastive Activation Addition
Contrastive Activation Addition steers Llama 2 Chat by adding averaged residual-stream activation differences from contrastive example pairs to control targeted behaviors at inference time.
-
AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
AdaLoRA uses SVD-based pruning to allocate the parameter budget for low-rank fine-tuning updates according to per-matrix importance scores, yielding better performance than uniform allocation especially under tight budgets.
-
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Grounding DINO fuses language and vision via feature enhancer, language-guided query selection, and cross-modality decoder in a DINO backbone, achieving 52.5 AP zero-shot on COCO and a new record of 26.1 AP mean on ODinW.
-
Latency Analysis and Optimization of Alpamayo 1 via Efficient Trajectory Generation
Redesigning Alpamayo 1 to single-reasoning and optimizing diffusion action generation cuts inference latency by 69.23% while preserving trajectory diversity and prediction quality.
-
Reasoning Compression with Mixed-Policy Distillation
Mixed-Policy Distillation transfers concise reasoning behavior from larger to smaller LLMs by having the teacher compress student-generated trajectories, cutting token usage up to 27% while raising benchmark scores.
-
EGAD: Entropy-Guided Adaptive Distillation for Token-Level Knowledge Transfer
EGAD adaptively distills LLM knowledge at the token level by using entropy to create a curriculum from low- to high-entropy tokens, adjust temperature, and switch between logits-only and feature-based branches.
-
GiVA: Gradient-Informed Bases for Vector-Based Adaptation
GiVA uses gradients to initialize vector adapters so they match LoRA performance at eight times lower rank while keeping extreme parameter efficiency.
-
Towards Better Static Code Analysis Reports: Sentence Transformer-based Filtering of Non-Actionable Alerts
STAF applies sentence embeddings from transformers to classify SCA findings, reaching 89% F1 and beating prior filters by 11% within projects and 6% across projects.
-
Reconstruction of a 3D wireframe from a single line drawing via generative depth estimation
A latent diffusion model conditioned on line drawings estimates dense depth to reconstruct 3D wireframes, reporting 5.3% average depth error after training on over one million pairs.
-
FedSpy-LLM: Towards Scalable and Generalizable Data Reconstruction Attacks from Gradients on LLMs
FedSpy-LLM uses gradient decomposition and iterative alignment to reconstruct larger batches and longer sequences of training data from LLM gradients in federated settings, including with PEFT methods.
-
OpenSOC-AI: Democratizing Security Operations with Parameter Efficient LLM Log Analysis
LoRA fine-tuning of TinyLlama-1.1B on 450 SOC examples produces 68% threat classification accuracy and 58% severity accuracy on 50 held-out logs, with full code, weights, and data released.