pith. machine review for the scientific record.

arxiv: 2201.08239 · v3 · submitted 2022-01-20 · 💻 cs.CL · cs.AI

Recognition: no theorem link

LaMDA: Language Models for Dialog Applications


Pith reviewed 2026-05-12 03:09 UTC · model grok-4.3

keywords: language models, dialog systems, safety, factual grounding, fine-tuning, external knowledge, transformer models

The pith

Fine-tuning LaMDA models on annotated human values plus access to external tools markedly raises safety and factual grounding in dialog responses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

LaMDA consists of large transformer language models pre-trained on public dialog and web text for conversational use. Scaling the models alone lifts overall response quality but leaves safety and factuality largely unchanged. The work demonstrates that fine-tuning on a modest set of crowdworker annotations for values such as avoiding harm and bias, combined with an interface allowing the model to query information retrieval, translation, and calculation tools, produces measurable gains on both challenges. These gains are quantified with a safety classifier that filters candidate replies and a groundedness metric that checks whether answers rest on verifiable sources rather than plausible invention. The approach is further tested in education and recommendation settings for helpfulness and role consistency.

Core claim

LaMDA models achieve stronger safety by routing candidate replies through a classifier trained on annotated examples of human values and achieve stronger factual grounding by consulting external knowledge sources during generation rather than relying solely on internal parameters.

What carries the argument

A safety classifier fine-tuned on crowd-annotated dialog data that filters responses for alignment with selected human values, paired with an external-tool interface that lets the model call information retrieval, translation, or calculation systems to ground its outputs.
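The filtering step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Candidate` type, the `quality` field, the 0.5 threshold, and the keyword-based `toy_safety_score` are all hypothetical stand-ins, and ranking the surviving candidates by a quality score is an assumed selection rule.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    text: str
    quality: float  # hypothetical quality score, higher is better


def pick_response(
    candidates: List[Candidate],
    safety_score: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Optional[Candidate]:
    """Drop candidates scored below `threshold`, then return the best survivor."""
    safe = [c for c in candidates if safety_score(c.text) >= threshold]
    if not safe:
        return None  # the caller decides on a fallback reply
    return max(safe, key=lambda c: c.quality)


def toy_safety_score(text: str) -> float:
    # Keyword stand-in for a fine-tuned safety classifier (assumption).
    return 0.0 if "harmful" in text.lower() else 1.0


candidates = [
    Candidate("A harmful suggestion", quality=0.9),
    Candidate("A helpful, sourced answer", quality=0.7),
    Candidate("A bland reply", quality=0.2),
]
best = pick_response(candidates, toy_safety_score)
print(best.text if best else "no safe candidate")
```

Filtering before ranking means a high-quality but unsafe candidate can never be selected, which is the property the core claim relies on.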

If this is right

  • Dialog systems can filter outputs for consistency with chosen values before they reach users.
  • Models can produce answers that cite or derive from retrieved sources instead of generating from memory alone.
  • Targeted fine-tuning and tool access can outperform further scaling for safety and factuality.
  • Education and recommendation applications show gains in helpfulness and consistency when these methods are applied.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same fine-tuning and tool-use pattern could be applied to other open-ended generation tasks where alignment and verifiability matter.
  • Expanding the set of external tools might allow the model to handle additional reasoning steps not covered by the current three.
  • Safety metrics built on a limited illustrative value set leave room for later expansion or crowdsourced refinement.

Load-bearing premise

The selected human values for annotation and the three chosen external tools are sufficient to cover safety and factuality needs across open-ended real-world conversations.

What would settle it

Run the model on prompts involving values outside the annotated set or facts absent from the retrieval, translation, and calculator tools; if the rate of unsafe or ungrounded replies stays as high as in the base model, the claimed improvements do not hold.

Original abstract

We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of Transformer-based neural language models specialized for dialog, which have up to 137B parameters and are pre-trained on 1.56T words of public dialog data and web text. While model scaling alone can improve quality, it shows less improvements on safety and factual grounding. We demonstrate that fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements towards the two key challenges of safety and factual grounding. The first challenge, safety, involves ensuring that the model's responses are consistent with a set of human values, such as preventing harmful suggestions and unfair bias. We quantify safety using a metric based on an illustrative set of human values, and we find that filtering candidate responses using a LaMDA classifier fine-tuned with a small amount of crowdworker-annotated data offers a promising approach to improving model safety. The second challenge, factual grounding, involves enabling the model to consult external knowledge sources, such as an information retrieval system, a language translator, and a calculator. We quantify factuality using a groundedness metric, and we find that our approach enables the model to generate responses grounded in known sources, rather than responses that merely sound plausible. Finally, we explore the use of LaMDA in the domains of education and content recommendations, and analyze their helpfulness and role consistency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript introduces LaMDA, a family of Transformer-based language models with up to 137B parameters, pre-trained on 1.56T words of public dialog data and web text. It claims that scaling improves overall dialog quality but yields limited gains on safety and factual grounding. The authors demonstrate that fine-tuning a classifier on a small set of crowdworker-annotated data to filter unsafe responses, combined with enabling the model to consult external tools (information retrieval, translator, calculator), produces improvements on a safety metric derived from an illustrative set of human values and on a groundedness metric. The work also includes qualitative explorations of LaMDA in education and content-recommendation domains.

Significance. If the reported gains hold under broader testing, the paper supplies a practical, scalable recipe for mitigating two persistent limitations of large dialog models. The explicit integration of external knowledge sources rather than sole reliance on parametric memory is a clear methodological contribution that later systems have adopted. The scale of the pre-training corpus and the separation of safety fine-tuning from tool-augmented decoding are additional strengths that provide a concrete baseline for subsequent research.

major comments (2)
  1. [Safety and factual grounding sections] Safety and factual-grounding sections: the safety metric is defined over an illustrative set of human values and the groundedness metric depends on the fixed trio of external tools. Because both the training signal and the evaluation metric are constructed from the same limited annotation set and tool interfaces, the measured improvements may be artifacts of the chosen scope rather than robust advances on the broader challenges of safety and factual grounding. An out-of-distribution test set or independently sourced value specification is needed to substantiate the central claim.
  2. [Results sections] Results sections: the manuscript asserts 'significant improvements' yet supplies no error bars, confidence intervals, or statistical significance tests for the safety and groundedness scores. Without these quantities it is impossible to judge whether the observed deltas exceed what could be obtained by alternative fine-tuning regimes or are reliable across random seeds.
minor comments (3)
  1. [Abstract] Abstract: the phrase 'less improvements' is grammatically imprecise and should be replaced by 'smaller improvements' or 'limited improvements'.
  2. [Methods and figures] Figure captions and tool-integration diagrams: the description of how tool calls are interleaved with generation is terse; a short pseudocode snippet or expanded caption would improve reproducibility.
  3. [Related work] Related-work section: several contemporaneous papers on tool-augmented language models and safety fine-tuning are not cited; adding them would better situate the contribution.
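The interleaving of tool calls and generation that minor comment 2 asks about might look roughly like the loop below. This is a hedged sketch, not the protocol used in the paper: the `Step` tuple format, `run_turn`, `max_calls`, and the scripted `toy_model_step` and `toy_calculator` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, List, Tuple

# A model step returns ("call", tool_name, query) or ("reply", "", text).
# This step format is an assumption made for the sketch.
Step = Tuple[str, str, str]

FALLBACK = "Sorry, I could not find a grounded answer."


def run_turn(
    user_msg: str,
    model_step: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_calls: int = 4,
) -> str:
    """Alternate model steps and tool calls until the model emits a reply."""
    context = [f"user: {user_msg}"]
    for _ in range(max_calls):
        kind, tool_name, payload = model_step(context)
        if kind == "reply":
            return payload
        # Feed the tool result back so the next step can ground on it.
        result = tools[tool_name](payload)
        context.append(f"{tool_name}: {payload} -> {result}")
    return FALLBACK


def toy_calculator(expr: str) -> str:
    # Handles only "a + b" with integers; a real tool would be more general.
    left, right = expr.split("+")
    return str(int(left) + int(right))


def toy_model_step(context: List[str]) -> Step:
    # Scripted stand-in for the language model (assumption).
    if len(context) == 1:
        return ("call", "calculator", "135 + 7")
    answer = context[-1].split("-> ")[1]
    return ("reply", "", f"135 + 7 = {answer}")


reply = run_turn("What is 135 + 7?", toy_model_step, {"calculator": toy_calculator})
print(reply)
```

Capping the number of calls keeps a turn from looping indefinitely when the model never produces a final reply.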

Simulated Author's Rebuttal

2 responses · 1 unresolved

Thank you for the constructive feedback on our LaMDA manuscript. We respond to each major comment below, providing clarifications and indicating where revisions can be made to address the concerns.

point-by-point responses
  1. Referee: [Safety and factual grounding sections] Safety and factual-grounding sections: the safety metric is defined over an illustrative set of human values and the groundedness metric depends on the fixed trio of external tools. Because both the training signal and the evaluation metric are constructed from the same limited annotation set and tool interfaces, the measured improvements may be artifacts of the chosen scope rather than robust advances on the broader challenges of safety and factual grounding. An out-of-distribution test set or independently sourced value specification is needed to substantiate the central claim.

    Authors: The manuscript explicitly describes the safety values as 'illustrative' and the tools as representative examples of external knowledge sources. The improvements demonstrated are specific to this setup, showing that the fine-tuning and tool-use approach can enhance performance on these metrics. We agree that the claims are scoped to the chosen annotations and tools, and we can revise the text to emphasize the illustrative nature and discuss how the framework generalizes to other value sets or tools. However, conducting new out-of-distribution evaluations would require additional crowdworker annotations and experiments not included in the current work. revision: partial

  2. Referee: [Results sections] Results sections: the manuscript asserts 'significant improvements' yet supplies no error bars, confidence intervals, or statistical significance tests for the safety and groundedness scores. Without these quantities it is impossible to judge whether the observed deltas exceed what could be obtained by alternative fine-tuning regimes or are reliable across random seeds.

    Authors: We acknowledge the absence of statistical measures in the reported results. The evaluations were performed using fixed test sets derived from the annotations, and the improvements are presented as direct comparisons. In a revision, we can include error bars estimated via bootstrap resampling or multiple evaluation runs where applicable, and clarify the evaluation methodology to allow assessment of reliability. This will strengthen the presentation without altering the core findings. revision: yes
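The bootstrap error bars mentioned in the second response could be computed along these lines. The sketch assumes paired binary labels (1 = rated safe) for the same prompts under a base and a fine-tuned model; the function name, the percentile method, and the label lists are illustrative assumptions, and the numbers are made up rather than taken from the paper.

```python
import random
from typing import List, Tuple


def bootstrap_diff_ci(
    base: List[int],
    tuned: List[int],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Percentile bootstrap CI for mean(tuned) - mean(base) on paired 0/1 labels."""
    if not base or len(base) != len(tuned):
        raise ValueError("need two non-empty label lists of equal length")
    rng = random.Random(seed)
    n = len(base)
    diffs = []
    for _ in range(n_resamples):
        # Resample prompt indices with replacement, keeping pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(tuned[i] - base[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    point = (sum(tuned) - sum(base)) / n
    return point, lo, hi


# Made-up labels for illustration only, not results from the paper.
base_labels = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0] * 20
tuned_labels = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1] * 20
point, lo, hi = bootstrap_diff_ci(base_labels, tuned_labels)
print(f"delta = {point:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

An interval that excludes zero would support reading the delta as more than resampling noise on the fixed test set; it would not address variation across random seeds, which needs separate training runs.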

standing simulated objections not resolved
  • We cannot provide out-of-distribution test sets or independently sourced value specifications, as this would necessitate new data collection efforts beyond the scope of the presented experiments.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's improvements in safety and factual grounding are demonstrated via fine-tuning on externally sourced crowdworker annotations for an illustrative set of human values and via consultation of independent external tools (IR system, translator, calculator). The safety metric and groundedness metric are defined against these separate annotations and known sources rather than quantities derived from the model's own outputs or fitted parameters. No load-bearing step reduces by construction to self-defined inputs, fitted subsets renamed as predictions, or self-citation chains; the experimental results remain falsifiable against the external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract contains no explicit mathematical derivations, free parameters, or invented entities; the work is framed as empirical engineering on top of standard transformer pre-training.

pith-pipeline@v0.9.0 · 5794 in / 1273 out tokens · 73830 ms · 2026-05-12T03:09:49.108188+00:00 · methodology


Forward citations

Cited by 44 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

    cs.CL 2022-01 accept novelty 9.0

    Chain-of-thought prompting, by including intermediate reasoning steps in few-shot examples, elicits strong reasoning abilities in large language models on arithmetic, commonsense, and symbolic tasks.

  2. AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents

    cs.CR 2024-06 unverdicted novelty 8.0

    AgentDojo introduces an extensible evaluation framework populated with realistic agent tasks and security test cases to measure prompt injection robustness in tool-using LLM agents.

  3. MusicLM: Generating Music From Text

    cs.SD 2023-01 conditional novelty 8.0

    MusicLM produces coherent multi-minute 24 kHz music from text prompts using hierarchical sequence-to-sequence modeling and outperforms prior systems in quality and text adherence.

  4. Combining On-Policy Optimization and Distillation for Long-Context Reasoning in Large Language Models

    cs.CL 2026-05 unverdicted novelty 7.0

    dGRPO merges outcome-based policy optimization with dense teacher guidance from on-policy distillation, yielding more stable long-context reasoning on the new LongBlocks synthetic dataset.

  5. IRIS: Interpolative Rényi Iterative Self-play for Large Language Model Fine-Tuning

    cs.LG 2026-04 unverdicted novelty 7.0

    IRIS unifies self-play fine-tuning under an interpolative Rényi objective with adaptive alpha scheduling and reports better benchmark scores than baselines while surpassing full supervised fine-tuning with only 13% of...

  6. QLoRA: Efficient Finetuning of Quantized LLMs

    cs.LG 2023-05 conditional novelty 7.0

    QLoRA finetunes 4-bit quantized LLMs via LoRA adapters to match full-precision performance while using far less memory, enabling 65B-scale training on single GPUs and producing Guanaco models near ChatGPT level.

  7. Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks

    cs.CL 2022-11 unverdicted novelty 7.0

    PoT prompting improves numerical reasoning by having language models write programs executed by a computer instead of performing calculations in natural language chains of thought, with an average 12% gain over CoT.

  8. A Generalist Agent

    cs.AI 2022-05 accept novelty 7.0

    Gato is a multi-modal, multi-task, multi-embodiment generalist policy using one transformer network to handle text, vision, games, and robotics tasks.

  9. OPT: Open Pre-trained Transformer Language Models

    cs.CL 2022-05 unverdicted novelty 7.0

    OPT releases open decoder-only transformers up to 175B parameters that match GPT-3 performance at one-seventh the carbon cost, along with code and training logs.

  10. Flamingo: a Visual Language Model for Few-Shot Learning

    cs.CV 2022-04 unverdicted novelty 7.0

    Flamingo models reach new state-of-the-art few-shot results on image and video tasks by bridging frozen vision and language models with cross-attention layers trained on interleaved web-scale data.

  11. Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

    cs.RO 2022-04 accept novelty 7.0

    SayCan combines an LLM's high-level semantic knowledge with robot skill value functions to select only feasible actions, enabling completion of abstract natural-language instructions on a real mobile manipulator.

  12. Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing

    cs.CR 2026-05 unverdicted novelty 6.0

    DR-Smoothing introduces a disrupt-then-rectify prompt processing scheme into smoothing defenses, delivering tight theoretical bounds on success probability against both token- and prompt-level jailbreaks.

  13. Response Time Enhances Alignment with Heterogeneous Preferences

    cs.LG 2026-05 unverdicted novelty 6.0

    Response times modeled as drift-diffusion processes enable consistent estimation of population-average preferences from heterogeneous anonymous binary choices.

  14. A Meta Reinforcement Learning Approach to Goals-Based Wealth Management

    cs.LG 2026-05 unverdicted novelty 6.0

    MetaRL pre-trained on GBWM problems delivers near-optimal dynamic strategies in 0.01s achieving 97.8% of DP optimal utility and handles larger problems where DP fails.

  15. CleanBase: Detecting Malicious Documents in RAG Knowledge Databases

    cs.CR 2026-05 unverdicted novelty 6.0

    CleanBase identifies malicious documents in RAG databases by detecting cliques in a semantic similarity graph constructed using embedding models and a statistical threshold.

  16. RaTA-Tool: Retrieval-based Tool Selection with Multimodal Large Language Models

    cs.CV 2026-04 unverdicted novelty 6.0

    RaTA-Tool retrieves suitable external tools for multimodal queries by matching generated task descriptions against tool metadata, supported by a new Hugging Face-derived dataset and DPO optimization.

  17. ADAPTive Input Training for Many-to-One Pre-Training on Time-Series Classification

    cs.LG 2026-04 unverdicted novelty 6.0

    ADAPT is a new pre-training paradigm that aligns physical properties of time-series data to allow simultaneous training on 162 diverse classification datasets, achieving new state-of-the-art performance.

  18. Towards an AI co-scientist

    cs.AI 2025-02 unverdicted novelty 6.0

    A multi-agent AI system generates novel biomedical hypotheses that show promising experimental validation in drug repurposing for leukemia, new targets for liver fibrosis, and a bacterial gene transfer mechanism.

  19. Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

    cs.CV 2024-01 unverdicted novelty 6.0

    Grounded SAM integrates Grounding DINO and SAM to support text-prompted open-world detection and segmentation, achieving 48.7 mean AP on SegInW zero-shot with the base detector and huge segmenter.

  20. MiniLLM: On-Policy Distillation of Large Language Models

    cs.CL 2023-06 conditional novelty 6.0

    MiniLLM distills large language models into smaller ones via reverse KL divergence and on-policy optimization, yielding higher-quality responses with lower exposure bias than standard KD baselines.

  21. The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only

    cs.CL 2023-06 unverdicted novelty 6.0

    Properly filtered web data from CommonCrawl alone trains LLMs that significantly outperform models trained on The Pile, with 600 billion tokens and 1.3B/7.5B parameter models released.

  22. Gorilla: Large Language Model Connected with Massive APIs

    cs.CL 2023-05 conditional novelty 6.0

    Gorilla is a fine-tuned LLM that surpasses GPT-4 in accurate API call generation and uses retrieval to handle documentation updates.

  23. Improving Factuality and Reasoning in Language Models through Multiagent Debate

    cs.CL 2023-05 unverdicted novelty 6.0

    Multiagent debate among LLMs improves mathematical reasoning, strategic reasoning, and factual accuracy while reducing hallucinations.

  24. CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society

    cs.AI 2023-03 conditional novelty 6.0

    CAMEL proposes a role-playing framework with inception prompting that enables autonomous multi-agent cooperation among LLMs and generates conversational data for studying their behaviors.

  25. BloombergGPT: A Large Language Model for Finance

    cs.LG 2023-03 conditional novelty 6.0

    BloombergGPT is a 50B parameter LLM trained on a 708B token mixed financial and general dataset that outperforms prior models on financial benchmarks while preserving general LLM performance.

  26. PaLM-E: An Embodied Multimodal Language Model

    cs.LG 2023-03 conditional novelty 6.0

    PaLM-E is a single 562B-parameter multimodal model that performs embodied reasoning tasks like robotic manipulation planning and visual question answering by interleaving vision, state, and text inputs with positive t...

  27. Multimodal Chain-of-Thought Reasoning in Language Models

    cs.CL 2023-02 accept novelty 6.0

    Multimodal-CoT achieves state-of-the-art on ScienceQA by using a two-stage process that incorporates vision into chain-of-thought rationale generation for models under 1 billion parameters.

  28. Improving alignment of dialogue agents via targeted human judgements

    cs.LG 2022-09 unverdicted novelty 6.0

    Sparrow uses targeted rule-based human feedback and evidence provision to outperform baselines in preference while violating rules only 8% of the time under adversarial probing.

  29. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned

    cs.CL 2022-08 accept novelty 6.0

    RLHF-aligned language models show increasing resistance to red teaming with scale up to 52B parameters, unlike prompted or rejection-sampled models, supported by a released dataset of 38,961 attacks.

  30. Language Models (Mostly) Know What They Know

    cs.CL 2022-07 unverdicted novelty 6.0

    Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.

  31. Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

    cs.CV 2022-06 unverdicted novelty 6.0

    Scaling an autoregressive Transformer to 20B parameters for text-to-image generation using image token sequences achieves new SOTA zero-shot FID of 7.23 and fine-tuned FID of 3.22 on MS-COCO.

  32. Emergent Abilities of Large Language Models

    cs.CL 2022-06 unverdicted novelty 6.0

    Emergent abilities are capabilities present in large language models but absent in smaller ones and cannot be predicted by extrapolating smaller model performance.

  33. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

    cs.CL 2022-04 unverdicted novelty 6.0

    RLHF alignment training on language models boosts NLP performance, supports skill specialization, enables weekly online updates with fresh human data, and shows a linear relation between RL reward and sqrt(KL divergen...

  34. PaLM: Scaling Language Modeling with Pathways

    cs.CL 2022-04 accept novelty 6.0

    PaLM 540B demonstrates continued scaling benefits by setting new few-shot SOTA results on hundreds of benchmarks and outperforming humans on BIG-bench.

  35. Fast NF4 Dequantization Kernels for Large Language Model Inference

    cs.LG 2026-04 unverdicted novelty 5.0

    A lightweight shared-memory technique for NF4 dequantization kernels yields 2.0-2.2x kernel speedup and 1.54x end-to-end gains on models up to 70B parameters while using only 64 bytes of shared memory per block.

  36. PaLM 2 Technical Report

    cs.CL 2023-05 unverdicted novelty 5.0

    PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.

  37. StarCoder: may the source be with you!

    cs.CL 2023-05 accept novelty 5.0

    StarCoderBase matches or beats OpenAI's code-cushman-001 on multi-language code benchmarks; the Python-fine-tuned StarCoder reaches 40% pass@1 on HumanEval while retaining other-language performance.

  38. Constitutional AI: Harmlessness from AI Feedback

    cs.CL 2022-12 unverdicted novelty 5.0

    Pith review generated a malformed one-line summary.

  39. Galactica: A Large Language Model for Science

    cs.CL 2022-11 unverdicted novelty 5.0

    Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.

  40. Useful for Exploration, Risky for Precision: Evaluating AI Tools in Academic Research

    cs.AI 2026-05 unverdicted novelty 4.0

    AI Q&A tools give useful overviews but fail at precise information extraction and source tracing, while literature review tools aid exploration yet lack reproducibility and transparency, making them unsuitable for sys...

  41. Useful for Exploration, Risky for Precision: Evaluating AI Tools in Academic Research

    cs.AI 2026-05 unverdicted novelty 4.0

    AI tools deliver useful overviews for research exploration but prove unreliable for precise information extraction and systematic reviews due to low explainability, reproducibility, and transparency.

  42. On The Application of Linear Attention in Multimodal Transformers

    cs.CV 2026-04 unverdicted novelty 4.0

    Linear attention delivers significant computational savings in multimodal transformers and follows the same scaling laws as softmax attention on ViT models trained on LAION-400M with ImageNet-21K zero-shot validation.

  43. Large Language Models: A Survey

    cs.CL 2024-02 accept novelty 3.0

    The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.

  44. A Survey of Large Language Models

    cs.CL 2023-03 accept novelty 3.0

    This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.

Reference graph

Works this paper leans on

120 extracted references · 120 canonical work pages · cited by 43 Pith papers · 8 internal anchors

  1. [1]

    Skip-thought vectors

    Ryan Kiros, Yukun Zhu, Ruslan R Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. Skip-thought vectors. In Advances in Neural Information Processing Systems, pages 3294–3302, 2015

  2. [2]

    Semi-supervised sequence learning

    Andrew M Dai and Quoc V Le. Semi-supervised sequence learning. In Advances in Neural Information Processing Systems, 2015

  3. [3]

    Deep contextualized word representations

    Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettle- moyer. Deep contextualized word representations. In NAACL, 2018

  4. [5]

    Improving language understanding by generative pre-training

    Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving language understanding by generative pre-training. https://blog.openai.com/language-unsupervised, 2018

  5. [6]

    BERT: Pre-training of deep bidirectional transformers for language understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT, 2019

  6. [7]

    XLNet: Generalized autoregressive pretraining for language understanding

    Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V Le. XLNet: Generalized autoregressive pretraining for language understanding. In NeurIPS, 2019

  7. [8]

    Albert: A lite bert for self-supervised learning of language representations

    Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. Albert: A lite bert for self-supervised learning of language representations. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=H1eA7AEtvS

  8. [9]

    RoBERTa: A Robustly Optimized BERT Pretraining Approach

    Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach.arXiv preprint arXiv:1907.11692, 2019

  9. [10]

    Le, and Christopher D

    Kevin Clark, Minh-Thang Luong, Quoc V . Le, and Christopher D. Manning. ELECTRA: Pre-training text encoders as discriminators rather than generators. In ICLR, 2020

  10. [11]

    Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 2020

  11. [12]

    Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-V oss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwi...

  12. [13]

    Scaling Laws for Neural Language Models

    Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020

  13. [14]

    Neural responding machine for short-text conversation

    Lifeng Shang, Zhengdong Lu, and Hang Li. Neural responding machine for short-text conversation. In ACL, 2015. 19

  14. [15]

    A neural network approach to context-sensitive generation of conversational responses

    Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Margaret Mitchell, Jian-Yun Nie, Jianfeng Gao, and Bill Dolan. A neural network approach to context-sensitive generation of conversational responses. arXiv preprint arXiv:1506.06714, 2015

  15. [16]

    Oriol Vinyals and Quoc V . Le. A neural conversational model. In ICML Workshop, 2015

  16. [17]

    arXiv preprint arXiv:2001.09977 , year=

    Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, and Quoc V . Le. Towards a human-like open-domain chatbot. arXiv preprint arXiv:2001.09977, 2020

  17. [18]

    Smith, Y-Lan Boureau, and Jason Weston

    Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, and Jason Weston. Recipes for building an open-domain chatbot. arXiv preprint arXiv:2004.13637, 2020

  18. [19]

    Recurrent neural network based language model

    Tomas Mikolov, Martin Karafiát, Lukas Burget, Jan Cernock`y, and Sanjeev Khudanpur. Recurrent neural network based language model. In INTERSPEECH, 2010

  19. [20]

    Generating text with recurrent neural networks

    Ilya Sutskever, James Martens, and Geoffrey E Hinton. Generating text with recurrent neural networks. In ICML, 2011

  20. [21]

    Exploring the Limits of Language Modeling

    Rafal Józefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu. Exploring the limits of language modeling. arXiv preprint arXiv:1602.02410, 2016

  21. [22]

    Universal language model fine-tuning for text classification

    Jeremy Howard and Sebastian Ruder. Universal language model fine-tuning for text classification. InACL, 2018

  22. [23]

    Unsupervised representation learning with deep convolutional generative adversarial networks

    Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR, 2016

  23. [24]

    Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, ...

  24. [25]

    The second conversational intelligence challenge (convai2)

    Emily Dinan, Varvara Logacheva, Valentin Malykh, Alexander H. Miller, Kurt Shuster, Jack Urbanek, Douwe Kiela, Arthur Szlam, Iulian Serban, Ryan Lowe, Shrimai Prabhumoye, Alan W. Black, Alexander I. Rudnicky, Jason Williams, Joelle Pineau, Mikhail S. Burtsev, and Jason Weston. The second conversational intelligence challenge (convai2). The NeurIPS ’18 Com...

  25. [26]

    Personalizing dialogue agents: I have a dog, do you have pets too?

    Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, and Jason Weston. Personalizing dialogue agents: I have a dog, do you have pets too? ACL, 2018

  26. [27]

    A Diversity-Promoting Objective Function for Neural Conversation Models

    Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. A diversity-promoting objective function for neural conversation models. arXiv preprint arXiv:1510.03055, 2015

  27. [28]

    Generative deep neural networks for dialogue: A short review

    Iulian Vlad Serban, Ryan Lowe, Laurent Charlin, and Joelle Pineau. Generative deep neural networks for dialogue: A short review. arXiv preprint arXiv:1611.06216, 2016

  28. [29]

    Transfertransfo: A transfer learning approach for neural network based conversational agents

    Thomas Wolf, Victor Sanh, Julien Chaumond, and Clement Delangue. Transfertransfo: A transfer learning approach for neural network based conversational agents. In NeurIPS Workshop on Conversational AI, 2019

  29. [30]

    Dialogpt: Large-scale generative pre-training for conversational response generation

    Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, and Bill Dolan. Dialogpt: Large-scale generative pre-training for conversational response generation. arXiv preprint arXiv:1911.00536, 2019

  30. [31]

    Retrieval augmentation reduces hallucination in conversation

    Kurt Shuster, Spencer Poff, Moya Chen, Douwe Kiela, and Jason Weston. Retrieval augmentation reduces hallucination in conversation. arXiv preprint arXiv:2104.07567, 2021

  31. [32]

    Adam Roberts, Colin Raffel, and Noam Shazeer. How much knowledge can you pack into the parameters of a language model? In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pages 5418–5426, November 2020

  32. [33]

    Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, Yanqi Zhou, Adams Wei Yu, Orhan Firat, Barret Zoph, Liam Fedus, Maarten Bosma, Zongwei Zhou, Tao Wang, Yu Emma Wang, Kellie Webster, Marie Pellat, Kevin Robinson, Kathy Meier-Hellstern, Toju Duke, Lucas Dixon, Kun Zhang, Quoc V Le, Yonghui Wu, Zhifeng Chen, a...

  33. [34]

    Generalization through memorization: Nearest neighbor language models

    Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. Generalization through memorization: Nearest neighbor language models. arXiv preprint arXiv:1911.00172, 2019

  34. [35]

    Retrieval-augmented generation for knowledge-intensive nlp tasks

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive nlp tasks. NeurIPS, 2020

  35. [36]

    Realm: Retrieval-augmented language model pre-training

    Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. Realm: Retrieval-augmented language model pre-training. arXiv preprint arXiv:2002.08909, 2020

  36. [37]

    Leveraging passage retrieval with generative models for open domain question answering

    Gautier Izacard and Edouard Grave. Leveraging passage retrieval with generative models for open domain question answering. arXiv preprint arXiv:2007.01282, 2021

  37. [38]

    Retrieving and reading: A comprehensive survey on open-domain question answering

    Fengbin Zhu, Wenqiang Lei, Chao Wang, Jianming Zheng, Soujanya Poria, and Tat-Seng Chua. Retrieving and reading: A comprehensive survey on open-domain question answering. arXiv preprint arXiv:2101.00774, 2021

  38. [39]

    Dense passage retrieval for open-domain question answering

    Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. arXiv preprint arXiv:2004.04906, 2020

  39. [40]

    A modern perspective on query likelihood with deep generative retrieval models

    Oleg Lesota, Navid Rekabsaz, Daniel Cohen, Klaus Antonius Grasserbauer, Carsten Eickhoff, and Markus Schedl. A modern perspective on query likelihood with deep generative retrieval models. arXiv preprint arXiv:2106.13618, 2021

  40. [41]

    Improving language models by retrieving from trillions of tokens

    Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Ori...

  41. [42]

    Tickettalk: Toward human-level performance with end-to-end, transaction-based dialog systems

    Bill Byrne, Karthik Krishnamoorthi, Saravanan Ganesh, and Mihir Sanjay Kale. Tickettalk: Toward human-level performance with end-to-end, transaction-based dialog systems. arXiv preprint arXiv:2012.12458, 2020

  42. [43]

    Reason first, then respond: Modular generation for knowledge-infused dialogue

    Leonard Adolphs, Kurt Shuster, Jack Urbanek, Arthur Szlam, and Jason Weston. Reason first, then respond: Modular generation for knowledge-infused dialogue. arXiv preprint arXiv:2111.05204, 2021

  43. [44]

    WebGPT: Browser-assisted question-answering with human feedback

    Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, and John Schulman. Webgpt: Browser-assisted question-answering with human feedback. arXiv preprint arX...

  44. [45]

    Internet-augmented dialogue generation

    Mojtaba Komeili, Kurt Shuster, and Jason Weston. Internet-augmented dialogue generation. arXiv preprint arXiv:2107.07566, 2021

  45. [46]

    Usr: An unsupervised and reference free evaluation metric for dialog generation

    Shikib Mehri and Maxine Eskenazi. Usr: An unsupervised and reference free evaluation metric for dialog generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 681–707, 2020

  46. [47]

    BLEU: a method for automatic evaluation of machine translation

    Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: a method for automatic evaluation of machine translation. In ACL, 2002

  47. [48]

    How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation

    Chia-Wei Liu, Ryan Lowe, Iulian Serban, Mike Noseworthy, Laurent Charlin, and Joelle Pineau. How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

  48. [49]

    What makes a good conversation? how controllable attributes affect human judgments

    Abigail See, Stephen Roller, Douwe Kiela, and Jason Weston. What makes a good conversation? how controllable attributes affect human judgments. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, 2019

  49. [50]

    Acute-eval: Improved dialogue evaluation with optimized questions and multi-turn comparisons

    Margaret Li, Jason Weston, and Stephen Roller. Acute-eval: Improved dialogue evaluation with optimized questions and multi-turn comparisons. In NeurIPS workshop on Conversational AI, 2019

  50. [51]

    Treating dialogue quality evaluation as an anomaly detection problem

    Rostislav Nedelchev, Jens Lehmann, and Ricardo Usbeck. Treating dialogue quality evaluation as an anomaly detection problem. In Proceedings of the 12th Conference on Language Resources and Evaluation, pages 508–512, 2020

  51. [52]

    On evaluating and comparing conversational agents

    Anu Venkatesh, Chandra Khatri, Ashwin Ram, Fenfei Guo, Raefer Gabriel, Ashish Nagar, Rohit Prasad, Ming Cheng, Behnam Hedayatnia, Angeliki Metallinou, Rahul Goel, Shaohua Yang, and Anirudh Raju. On evaluating and comparing conversational agents. NeurIPS, 2017.

  52. [53]

    Anticipating safety issues in e2e conversational ai: Framework and tooling

    Emily Dinan, Gavin Abercrombie, A. Stevie Bergman, Shannon Spruit, Dirk Hovy, Y-Lan Boureau, and Verena Rieser. Anticipating safety issues in e2e conversational ai: Framework and tooling. arXiv preprint arXiv:2107.03451, 2021

  53. [54]

    Ethical and social risks of harm from Language Models

    Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, Zac Kenton, Sasha Brown, Will Hawkins, Tom Stepleton, Courtney Biles, Abeba Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William Isaac, Sean Legassick, Geoffrey Irving, and Iason Gabriel. Ethical and ...

  54. [55]

    Dropout distillation

    Samuel Rota Bulò, Lorenzo Porzi, and Peter Kontschieder. Dropout distillation. In ICLR, 2016

  55. [56]

    The radicalization risks of GPT-3 and advanced neural language models

    Kris McGuffie and Alex Newhouse. The radicalization risks of GPT-3 and advanced neural language models. arXiv preprint arXiv:2009.06807, 2020

  56. [57]

    Persistent anti-muslim bias in large language models

    Abubakar Abid, Maheen Farooqi, and James Zou. Persistent anti-muslim bias in large language models. arXiv preprint arXiv:2101.05783, 2021

  57. [58]

    Man is to computer programmer as woman is to homemaker? debiasing word embeddings

    Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. Man is to computer programmer as woman is to homemaker? debiasing word embeddings. In Advances in Neural Information Processing Systems, 2016

  58. [59]

    Evaluating the underlying gender bias in contextualized word embeddings

    Christine Basta, Marta R. Costa-jussà, and Noe Casas. Evaluating the underlying gender bias in contextualized word embeddings. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, August 2019

  59. [60]

    Measuring bias in contextualized word representations

    Keita Kurita, Nidhi Vyas, Ayush Pareek, Alan W Black, and Yulia Tsvetkov. Measuring bias in contextualized word representations. In Proceedings of the First Workshop on Gender Bias in Natural Language Processing, August 2019

  60. [61]

    Hurtful words: Quantifying biases in clinical contextual word embeddings

    Haoran Zhang, Amy X. Lu, Mohamed Abdalla, Matthew McDermott, and Marzyeh Ghassemi. Hurtful words: Quantifying biases in clinical contextual word embeddings. In Proceedings of the ACM Conference on Health, Inference, and Learning, 2020

  61. [62]

    The woman worked as a babysitter: On biases in language generation

    Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, and Nanyun Peng. The woman worked as a babysitter: On biases in language generation. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019

  62. [63]

    Gender bias in contextualized word embeddings

    Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell, Vicente Ordonez, and Kai-Wei Chang. Gender bias in contextualized word embeddings. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), June 2019

  63. [64]

    Detecting emergent intersectional biases: Contextualized word embeddings contain a distribution of human-like biases

    Wei Guo and Aylin Caliskan. Detecting emergent intersectional biases: Contextualized word embeddings contain a distribution of human-like biases. arXiv preprint arXiv:2006.03955, 2020

  64. [65]

    Perturbation sensitivity analysis to detect unintended model biases

    Vinodkumar Prabhakaran, Ben Hutchinson, and Margaret Mitchell. Perturbation sensitivity analysis to detect unintended model biases. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2019

  65. [66]

    On measuring social biases in sentence encoders

    Chandler May, Alex Wang, Shikha Bordia, Samuel R. Bowman, and Rachel Rudinger. On measuring social biases in sentence encoders. arXiv preprint arXiv:1903.10561, 2019

  66. [67]

    Maarten Sap, Dallas Card, Saadia Gabriel, Yejin Choi, and Noah A. Smith. The risk of racial bias in hate speech detection. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019

  67. [68]

    Shikha Bordia and Samuel R. Bowman. Identifying and reducing gender bias in word-level language models. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, 2019

  68. [69]

    On the dangers of stochastic parrots: Can language models be too big?

    Emily M Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 2021

  69. [70]

    Social bias frames: Reasoning about social and power implications of language

    Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi. Social bias frames: Reasoning about social and power implications of language. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

  70. [71]

    Social biases in NLP models as barriers for persons with disabilities

    Ben Hutchinson, Vinodkumar Prabhakaran, Emily Denton, Kellie Webster, Yu Zhong, and Stephen Denuyl. Social biases in NLP models as barriers for persons with disabilities. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

  71. [72]

    Large language models associate muslims with violence

    Abubakar Abid, Maheen Farooqi, and James Zou. Large language models associate muslims with violence. Nature Machine Intelligence, 2021.

  72. [73]

    Extracting training data from large language models

    Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, and Colin Raffel. Extracting training data from large language models. arXiv preprint arXiv:2012.07805, 2020

  73. [74]

    Counterfactual fairness in text classification through robustness

    Sahaj Garg, Vincent Perot, Nicole Limtiaco, Ankur Taly, Ed H. Chi, and Alex Beutel. Counterfactual fairness in text classification through robustness. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 2019. ISBN 9781450363242

  74. [75]

    Reducing sentiment bias in language models via counterfactual evaluation

    Po-Sen Huang, Huan Zhang, Ray Jiang, Robert Stanforth, Johannes Welbl, Jack Rae, Vishal Maini, Dani Yogatama, and Pushmeet Kohli. Reducing sentiment bias in language models via counterfactual evaluation. In EMNLP (Findings), 2020

  75. [76]

    A scalable approach to reducing gender bias in google translate

    Melvin Johnson. A scalable approach to reducing gender bias in google translate. https://ai.googleblog.com/2020/04/a-scalable-approach-to-reducing-gender.html, 2020

  76. [77]

    Reducing gender bias in word-level language models with a gender-equalizing loss function

    Yusu Qian, Urwa Muaz, Ben Zhang, and Jae Won Hyun. Reducing gender bias in word-level language models with a gender-equalizing loss function. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, July 2019

  77. [78]

    Towards debiasing sentence representations

    Paul Pu Liang, Irene Mengze Li, Emily Zheng, Yao Chong Lim, Ruslan Salakhutdinov, and Louis-Philippe Morency. Towards debiasing sentence representations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, July 2020

  78. [79]

    Recipes for safety in open-domain chatbots

    Jing Xu, Da Ju, Margaret Li, Y-Lan Boureau, Jason Weston, and Emily Dinan. Recipes for safety in open-domain chatbots. arXiv preprint arXiv:2010.07079, 2020

  79. [80]

    Dexperts: Decoding-time controlled text generation with experts and anti-experts

    Alisa Liu, Maarten Sap, Ximing Lu, Swabha Swayamdipta, Chandra Bhagavatula, Noah A. Smith, and Yejin Choi. On-the-fly controlled text generation with experts and anti-experts. arXiv preprint arXiv:2105.03023, 2021

  80. [81]

    Bot-adversarial dialogue for safe conversational agents

    Jing Xu, Da Ju, Margaret Li, Y-Lan Boureau, Jason Weston, and Emily Dinan. Bot-adversarial dialogue for safe conversational agents. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Showing first 80 references.