StarCoder: may the source be with you!
Pith reviewed 2026-05-10 23:27 UTC · model grok-4.3
The pith
A 15.5 billion parameter code model trained on a trillion tokens outperforms open multilingual alternatives and matches proprietary performance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
StarCoderBase is a 15.5B parameter model with 8K context length, trained on 1 trillion tokens sourced from The Stack, that outperforms every open Code LLM supporting multiple programming languages and matches or outperforms OpenAI's code-cushman-001. StarCoder, produced by fine-tuning StarCoderBase on 35B Python tokens, outperforms every Python-fine-tuned model, can be prompted to 40% pass@1 on HumanEval, and retains performance across other languages.
What carries the argument
The 15.5B parameter StarCoderBase transformer with multi-query attention and 8K context, trained on the large collection of permissively licensed repositories, which supplies the scale and data quality needed for strong cross-language code generation and infilling.
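Infilling here refers to fill-in-the-middle (FIM) prompting, where the model generates the span that belongs between a given prefix and suffix. A minimal sketch of how such a prompt is assembled; the sentinel token names follow the StarCoder tokenizer convention and should be treated as an assumption for any other model:

```python
# Sketch of fill-in-the-middle (FIM) prompt assembly for StarCoder-family
# models. The sentinel token strings below follow the StarCoder tokenizer
# convention; verify them against the tokenizer of the model you use.

FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble an infilling prompt: the model is asked to generate the
    code that belongs between `prefix` and `suffix`."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

prompt = build_fim_prompt(
    prefix="def mean(xs):\n    return ",
    suffix=" / len(xs)\n",
)
print(prompt)
```

The model's completion is then inserted between the prefix and suffix to form the final document.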
If this is right
- Developers can access and run a competitive code model locally or adapt it without API fees or usage limits.
- Specializing the model on one language through fine-tuning preserves capability on the remaining languages.
- Responsible release practices such as improved personal data removal and origin tracing become feasible at this scale.
- Prompt engineering can raise performance on targeted tasks like HumanEval without sacrificing breadth.
Where Pith is reading between the lines
- Larger open code datasets may become easier to assemble if more projects adopt permissive licenses.
- Integration into actual software development environments could expose practical limits not visible in isolated benchmarks.
- The same training approach might apply to other domains that produce large volumes of structured, publicly licensed text such as mathematical proofs or configuration files.
Load-bearing premise
The chosen benchmarks and prompting methods accurately reflect real code-generation utility across languages and tasks without hidden advantages from training data overlap or evaluation setup.
What would settle it
Evaluating the models on a fresh collection of coding problems assembled after the training data period and directly comparing pass rates against other open models and code-cushman-001 would confirm or refute the performance claims.
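Any such head-to-head pass-rate comparison would presumably use the standard unbiased pass@k estimator introduced with HumanEval (Chen et al., 2021), which estimates the probability that at least one of k samples passes given n generations per problem. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): probability that at
    least one of k samples passes, given that c of the n generated samples
    pass all unit tests."""
    if n - c < k:
        # Fewer failing samples than k draws: some draw must succeed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples per problem, 80 passing -> pass@1 = 80/200
print(pass_at_k(200, 80, 1))  # → 0.4
```

For k = 1 this reduces to the fraction of passing samples; for larger k it corrects the bias of naively taking the best of k draws.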
Original abstract
The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large collection of permissively licensed GitHub repositories with inspection tools and an opt-out process. We fine-tuned StarCoderBase on 35B Python tokens, resulting in the creation of StarCoder. We perform the most comprehensive evaluation of Code LLMs to date and show that StarCoderBase outperforms every open Code LLM that supports multiple programming languages and matches or outperforms the OpenAI code-cushman-001 model. Furthermore, StarCoder outperforms every model that is fine-tuned on Python, can be prompted to achieve 40\% pass@1 on HumanEval, and still retains its performance on other programming languages. We take several important steps towards a safe open-access model release, including an improved PII redaction pipeline and a novel attribution tracing tool, and make the StarCoder models publicly available under a more commercially viable version of the Open Responsible AI Model license.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces StarCoderBase, a 15.5B parameter Code LLM trained on 1 trillion tokens from The Stack (permissively licensed GitHub data with opt-out tools), and StarCoder, obtained by fine-tuning StarCoderBase on 35B Python tokens. The central claims are that StarCoderBase outperforms every open multi-language Code LLM and matches or exceeds OpenAI's code-cushman-001, while StarCoder reaches 40% pass@1 on HumanEval, retains cross-language performance, and supports infilling with 8K context. The work also describes responsible release steps including improved PII redaction and an attribution tracing tool, with public model availability under a commercially viable open license.
Significance. If the performance claims hold under the reported evaluation protocol, this constitutes a substantial contribution by delivering a strong, openly available multi-language Code LLM that rivals proprietary models while emphasizing data transparency, reproducibility, and safety measures. The public release of models, data inspection tools, and the comprehensive benchmark suite can serve as a reference point for future open Code LLM research and responsible AI practices.
Minor comments (4)
- [Abstract] The claim of 'the most comprehensive evaluation of Code LLMs to date' would be strengthened by a short explicit comparison (in the introduction or the evaluation section) to the scope of prior Code LLM benchmarks such as those in the CodeGen or InCoder papers.
- [Model Architecture] Multi-query attention is credited with fast large-batch inference, but the description lacks implementation specifics (e.g., head grouping factor or kernel details) that would aid exact reproduction of the reported inference speeds.
- [Evaluation] While benchmark numbers are provided, variance estimates, number of runs, or statistical significance tests for key comparisons (e.g., vs. code-cushman-001 on HumanEval) would improve the robustness of the outperformance claims.
- [Data] The version or commit hash of The Stack used for the 1T-token training run should be stated explicitly to support exact reproducibility of the pre-training corpus.
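On the multi-query attention point: the idea is that all query heads share a single key/value head, which shrinks the KV cache and per-token memory traffic at inference time. A toy sketch of the mechanism; shapes and names are illustrative only, not the paper's implementation:

```python
import numpy as np

def multi_query_attention(x, Wq, Wk, Wv, num_heads):
    """Toy multi-query attention: `num_heads` query heads attend over a
    single shared key/value head, shrinking the KV cache by a factor of
    num_heads. Shapes: x (T, d_model), Wq (d_model, num_heads*d_head),
    Wk and Wv (d_model, d_head)."""
    T, _ = x.shape
    d_head = Wk.shape[1]
    q = (x @ Wq).reshape(T, num_heads, d_head)   # per-head queries
    k = x @ Wk                                   # one shared key head
    v = x @ Wv                                   # one shared value head
    outs = []
    for h in range(num_heads):
        scores = q[:, h, :] @ k.T / np.sqrt(d_head)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        outs.append(weights @ v)                 # every head reuses k, v
    return np.concatenate(outs, axis=-1)         # (T, num_heads*d_head)

rng = np.random.default_rng(0)
T, d_model, num_heads, d_head = 4, 8, 4, 2
x = rng.standard_normal((T, d_model))
out = multi_query_attention(
    x,
    rng.standard_normal((d_model, num_heads * d_head)),
    rng.standard_normal((d_model, d_head)),
    rng.standard_normal((d_model, d_head)),
    num_heads,
)
print(out.shape)  # → (4, 8)
```

Standard multi-head attention would instead give each head its own Wk and Wv, multiplying KV-cache size by num_heads.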
Simulated Author's Rebuttal
We thank the referee for their positive review, recognition of the significance of the work, and recommendation to accept the manuscript. We are pleased that the contributions around multi-language code modeling, responsible release practices, and comprehensive evaluation have been acknowledged.
Circularity Check
No significant circularity detected
Full rationale
The paper reports empirical training of StarCoderBase on 1T tokens from The Stack and fine-tuning to StarCoder, followed by benchmark evaluations on HumanEval and other standard code tasks. No mathematical derivations, first-principles predictions, fitted parameters renamed as outputs, or self-referential definitions appear in the abstract or described methodology. Performance claims rest on direct experimental results and external benchmarks rather than any reduction to inputs by construction. Self-citations, if present, are not load-bearing for any central claim and do not create circularity under the specified criteria.
Forward citations
Cited by 45 Pith papers
-
LiveBench: A Challenging, Contamination-Limited LLM Benchmark
LiveBench is a contamination-limited LLM benchmark with auto-scored challenging tasks from recent sources across math, coding, reasoning and more, where top models score below 70%.
-
Reward-Weighted On-Policy Distillation with an Open Property-Equivalence Verifier for NL-to-SVA Generation
Reward-Weighted On-Policy Distillation with an open property-equivalence verifier produces a 7B model that surpasses prior SOTA on NL-to-SVA generation across pass@1/5/10 metrics.
-
POSTCONDBENCH: Benchmarking Correctness and Completeness in Formal Postcondition Inference
POSTCONDBENCH is a new multilingual benchmark that evaluates LLM postcondition generation on real code using defect discrimination to assess completeness beyond surface matching.
-
Social Bias in LLM-Generated Code: Benchmark and Mitigation
LLMs show up to 60.58% social bias in generated code; a new Fairness Monitor Agent cuts bias by 65.1% and raises functional correctness from 75.80% to 83.97%.
-
ClassEval-Pro: A Cross-Domain Benchmark for Class-Level Code Generation
ClassEval-Pro benchmark shows frontier LLMs achieve at most 45.6% Pass@1 on class-level code tasks, with logic errors (56%) and dependency errors (38%) as dominant failure modes.
-
RepoDoc: A Knowledge Graph-Based Framework to Automatic Documentation Generation and Incremental Updates
RepoDoc uses a repository knowledge graph with module clustering and semantic impact propagation to generate more complete documentation 3x faster with 85% fewer tokens and handle incremental updates 73% faster than p...
-
Constraint-Guided Multi-Agent Decompilation for Executable Binary Recovery
A constraint-guided multi-agent system turns raw decompiler output into re-executable code at 84-97% success rates, outperforming prior LLM decompilation methods on real binaries.
-
Precise Debugging Benchmark: Is Your Model Debugging or Regenerating?
Frontier LLMs pass unit tests over 76% of the time on debugging tasks but achieve edit precision below 45%, indicating regeneration rather than precise debugging.
-
AdverMCTS: Combating Pseudo-Correctness in Code Generation via Adversarial Monte Carlo Tree Search
AdverMCTS frames code generation as a minimax game where an attacker evolves tests to expose flaws in solver-generated code, yielding more robust outputs than static-test baselines.
-
CodeComp: Structural KV Cache Compression for Agentic Coding
CodeComp uses Joern-extracted Code Property Graph priors for training-free structural KV cache compression, outperforming attention-only baselines on bug localization and code generation while matching full-context pa...
-
AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation
A three-agent loop of code generation, test creation, and execution feedback lifts pass@1 to 96.3% on HumanEval and 91.8% on MBPP for GPT-4 while using roughly half the tokens of prior state-of-the-art.
-
C-Pack: Packed Resources For General Chinese Embeddings
C-Pack releases a new Chinese embedding benchmark, large training dataset, and optimized models that outperform priors by up to 10% on C-MTEB while also delivering English SOTA results.
-
Reflexion: Language Agents with Verbal Reinforcement Learning
Reflexion lets LLM agents improve via stored verbal reflections on task feedback, reaching 91% pass@1 on HumanEval and outperforming prior GPT-4 results.
-
Bridging Generation and Training: A Systematic Review of Quality Issues in LLMs for Code
A review of 114 studies creates taxonomies for code and data quality issues, formalizes 18 propagation mechanisms from training data defects to LLM-generated code defects, and synthesizes detection and mitigation techniques.
-
Programmatic Context Augmentation for LLM-based Symbolic Regression
Programmatic context augmentation lets LLM-based symbolic regression perform code-driven data analysis during search, yielding superior efficiency and accuracy over baselines on LLM-SRBench.
-
DocSync: Agentic Documentation Maintenance via Critic-Guided Reflexion
DocSync fuses AST-aware retrieval with an iterative critic loop to update documentation, outperforming CodeT5-base on semantic alignment and automated judge scores in a proxy code-to-text task.
-
BlenderRAG: High-Fidelity 3D Object Generation via Retrieval-Augmented Code Synthesis
BlenderRAG improves LLM-generated Blender code for 3D objects by retrieving semantically similar examples from a curated multimodal dataset of 500 expert-validated cases.
-
REBENCH: A Procedural, Fair-by-Construction Benchmark for LLMs on Stripped-Binary Types and Names (Extended Version)
REBench is a new benchmark that consolidates existing datasets into a large collection of binaries with knowledge-base-driven ground truth to enable fair LLM evaluation on stripped-binary type and name recovery.
-
Optimas: An Intelligent Analytics-Informed Generative AI Framework for Performance Optimization
Optimas deploys a multi-agent LLM workflow to convert performance diagnostics into correct code transformations, delivering 100% valid code and performance gains in 98.82% of 3,410 experiments across benchmarks and HP...
-
No Test Cases, No Problem: Distillation-Driven Code Generation for Scientific Workflows
MOSAIC generates executable scientific code without I/O test cases by combining student-teacher distillation with a consolidated context window to reduce hallucinations across subproblems.
-
Train Separately, Merge Together: Modular Post-Training with Mixture-of-Experts
BAR trains independent domain experts via separate mid-training, SFT, and RL pipelines then composes them with a MoE router to match monolithic retraining performance at lower cost and without catastrophic forgetting.
-
CodePivot: Bootstrapping Multilingual Transpilation in LLMs via Reinforcement Learning without Parallel Corpora
CodePivot uses Python as a pivot language plus an Aggressive-Partial-Functional RL reward to train a 7B model that outperforms much larger LLMs on multilingual code transpilation without parallel corpora.
-
MATRIX: Multi-Layer Code Watermarking via Dual-Channel Constrained Parity-Check Encoding
MATRIX embeds multi-layer watermarks in LLM-generated code via dual-channel constrained parity-check encoding, achieving 99.2% detection accuracy with 0-0.14% functionality loss and 7.7-26.67% better attack robustness...
-
On the Effectiveness of Context Compression for Repository-Level Tasks: An Empirical Investigation
Continuous latent-vector compression improves BLEU scores on repository-level code tasks by up to 28.3% at 4x compression while cutting inference latency.
-
Runtime Execution Traces Guided Automated Program Repair with Multi-Agent Debate
TraceRepair deploys a probe agent for runtime snapshots and a committee of agents for cross-verification to fix 392 defects on Defects4J, outperforming prior LLM-based automated program repair methods.
-
A Taxonomy of Programming Languages for Code Generation
The researchers provide a systematic 4-tier classification of 646 programming languages, quantifying the extreme data scarcity facing over 70% of the world's programming languages in the age of LLMs.
-
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
MiniCPM 1.2B and 2.4B models reach parity with 7B-13B LLMs via model wind-tunnel scaling and a WSD scheduler that yields a higher optimal data-to-model ratio than Chinchilla scaling.
-
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.
-
StarCoder 2 and The Stack v2: The Next Generation
StarCoder2-15B matches or beats CodeLlama-34B on code tasks despite being smaller, and StarCoder2-3B outperforms prior 15B models, with open weights and exact training data identifiers released.
-
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Bootstrapping math questions via rewriting creates MetaMathQA; fine-tuning LLaMA-2 on it yields 66.4% on GSM8K for 7B and 82.3% for 70B, beating prior same-size models by large margins.
-
Textbooks Are All You Need
A 1.3B-parameter code model trained on 7B tokens of curated textbook and synthetic data achieves 50.6% on HumanEval, indicating data quality can enable strong performance at small scale.
-
Gorilla: Large Language Model Connected with Massive APIs
Gorilla is a fine-tuned LLM that surpasses GPT-4 in accurate API call generation and uses retrieval to handle documentation updates.
-
Teaching Large Language Models to Self-Debug
Self-Debugging teaches LLMs to identify and fix their own code errors through rubber-duck-style natural language explanations and execution feedback, delivering 2-12% gains over baselines on Spider, TransCoder, and MBPP.
-
The Readability Spectrum: Patterns, Issues, and Prompt Effects in LLM-Generated Code
LLM-generated code matches human-written code in overall readability but exhibits different issue patterns, and prompt engineering has limited impact on improving it.
-
ReAD: Reinforcement-Guided Capability Distillation for Large Language Models
ReAD applies a contextual bandit to allocate fixed-token distillation budget across interdependent LLM capabilities, yielding higher task utility and fewer negative spillovers than standard methods.
-
A Validated Prompt Bank for Malicious Code Generation: Separating Executable Weapons from Security Knowledge in 1,554 Consensus-Labeled Prompts
The paper releases a 1,554-prompt consensus-labeled bank separating executable malicious code requests from security knowledge requests, validated by five-model majority labeling with Fleiss' kappa of 0.876.
-
From Perception to Autonomous Computational Modeling: A Multi-Agent Approach
A multi-agent LLM framework autonomously completes the full computational mechanics pipeline from a photograph to a code-compliant engineering report on a steel L-bracket example.
-
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
SmolLM2 is a 1.7B-parameter language model that outperforms Qwen2.5-1.5B and Llama3.2-1B after overtraining on 11 trillion tokens using custom FineMath, Stack-Edu, and SmolTalk datasets in a multi-stage pipeline.
-
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
DeepSeek-Coder open-source models trained on 2T code tokens with fill-in-the-blank pretraining achieve SOTA results among open models and surpass closed-source Codex and GPT-3.5 on code benchmarks.
-
Prompt-Driven Code Summarization: A Systematic Literature Review
A systematic review that categorizes prompting strategies for LLM-based code summarization, assesses their effectiveness, and identifies gaps in research and evaluation practices.
-
An Empirical Study on Influence-Based Pretraining Data Selection for Code Large Language Models
Data-influence-score filtering using validation-set loss on downstream coding tasks improves Code-LLM performance, with the most beneficial training data varying significantly across different programming tasks.
-
Qwen2.5-Coder Technical Report
Qwen2.5-Coder models claim state-of-the-art results on over 10 code benchmarks, outperforming larger models of similar size.
-
A Survey on Large Language Models for Code Generation
A systematic literature review that organizes recent work on LLMs for code generation into a taxonomy covering data curation, model advances, evaluations, ethics, environmental impact, and applications, with benchmark...
-
Large Language Models: A Survey
The paper surveys key large language models, their training methods, datasets, evaluation benchmarks, and future research directions in the field.
-
A Survey of Large Language Models
This survey reviews the background, key techniques, and evaluation methods for large language models, emphasizing emergent abilities that appear at large scales.
Reference graph
Works this paper leans on
-
[1]
Unified pre-training for program understanding and generation
Wasi Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. Unified pre-training for program understanding and generation. In Proceedings of NAACL, 2021. URL https://aclanthology.org/2021.naacl-main.211
2021
-
[4]
Andersen et al. v. Stability AI et al. 3:23-cv-00201 N.D. Cal. 2023
2023
-
[7]
A maximum likelihood approach to continuous speech recognition
Lalit Bahl, Frederick Jelinek, and Robert Mercer. A maximum likelihood approach to continuous speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-5:179--190, 1983. doi:10.1109/TPAMI.1983.4767370
-
[9]
ChatGPT accessible again in Italy
BBC. ChatGPT accessible again in Italy. https://www.bbc.com/news/technology-65431914, 2023
2023
-
[10]
A framework for the evaluation of code generation models
Loubna Ben Allal, Niklas Muennighoff, Logesh Kumar Umapathi, Ben Lipkin, and Leandro Von Werra. A framework for the evaluation of code generation models. https://github.com/bigcode-project/bigcode-evaluation-harness, December 2022
2022
-
[11]
SantaCoder: don't reach for the stars!
Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García ...
2023
-
[12]
A neural probabilistic language model
Yoshua Bengio, Réjean Ducharme, and Pascal Vincent. A neural probabilistic language model. In T. Leen, T. Dietterich, and V. Tresp (eds.), Advances in Neural Information Processing Systems, volume 13. MIT Press, 2000. URL https://proceedings.neurips.cc/paper_files/paper/2000/hash/728f206c2a01bf572b5940d7d9a8fa4c-Abstract.html
2000
-
[14]
BLOOM (revision 4ab0472), 2022
BigScience Workshop . BLOOM (revision 4ab0472), 2022. URL https://huggingface.co/bigscience/bloom
2022
-
[15]
GPT-NeoX-20B: An open-source autoregressive language model
Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, and Samuel Weinbach. GPT-NeoX-20B: an open-source autoregressive language model. arXiv preprint arXiv:2204.06745, 2022
-
[18]
Large language models in machine translation
Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, and Jeffrey Dean. Large language models in machine translation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 858--867, Prague, Czech Republic, June 2007. Association for Computati...
2007
-
[19]
Identifying and filtering near-duplicate documents
Andrei Z. Broder. Identifying and filtering near-duplicate documents. In Annual Symposium on Combinatorial Pattern Matching, pp. 1--10. Springer, 2000
2000
-
[21]
N-gram counts and language models from the Common Crawl
Christian Buck, Kenneth Heafield, and Bas van Ooyen. N-gram counts and language models from the Common Crawl. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pp. 3579--3584, Reykjavik, Iceland, May 2014. European Language Resources Association (ELRA). URL http://www.lrec-conf.org/proceedings/lrec...
2014
-
[22]
This CoPilot is stupid and wants to kill me
Matthew Butterick. This CoPilot is stupid and wants to kill me. https://matthewbutterick.com/chron/this-copilot-is-stupid-and-wants-to-kill-me.html, 2022
2022
-
[24]
Evaluating large language models trained on code
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian...
2021
-
[27]
FlashAttention: Fast and memory-efficient exact attention with IO-awareness
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, and Christopher Ré. FlashAttention: Fast and memory-efficient exact attention with IO-awareness. In Advances in Neural Information Processing Systems, 2022
2022
-
[29]
DOE 1 v. GitHub, Inc.
DOE 1 v. GitHub, Inc. 4:22-cv-06823 N.D. Cal. 2022
2022
-
[30]
GPTs are GPTs: An early look at the labor market impact potential of large language models
Tyna Eloundou, Sam Manning, Pamela Mishkin, and Daniel Rock. GPTs are GPTs: An early look at the labor market impact potential of large language models. arXiv preprint arXiv:2303.10130, 2023
-
[31]
Microsoft attracting users to its code-writing, generative AI software
Euronews. Microsoft attracting users to its code-writing, generative AI software. https://www.euronews.com/next/2023/01/25/microsoft-results-ai, 2023
2023
-
[32]
The general data protection regulation
European Council. The general data protection regulation. https://www.consilium.europa.eu/en/policies/data-protection/data-protection-regulation/, 2018
2018
-
[37]
PAL: Program-aided Language Models
Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, and Graham Neubig. PAL: Program-aided language models. arXiv preprint arXiv:2211.10435, 2022
2022
-
[39]
Scalable modified Kneser-Ney language model estimation
Kenneth Heafield, Ivan Pouzyrevsky, Jonathan H. Clark, and Philipp Koehn. Scalable modified Kneser-Ney language model estimation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 690--696, Sofia, Bulgaria, August 2013. Association for Computational Linguistics. URL https://aclantholo...
2013
-
[42]
On the naturalness of software
Abram Hindle, Earl T. Barr, Zhendong Su, Mark Gabel, and Premkumar Devanbu. On the naturalness of software. In 2012 34th International Conference on Software Engineering (ICSE), pp. 837--847. IEEE, 2012
2012
-
[44]
The curious case of neural text degeneration
Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=rygGQyrFvH
2020
-
[45]
CodeSearchNet Challenge: Evaluating the State of Semantic Code Search
Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, and Marc Brockschmidt. CodeSearchNet challenge: Evaluating the state of semantic code search. arXiv preprint arXiv:1909.09436, 2019
2019
-
[48]
Learning and evaluating contextual embedding of source code
Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, and Kensen Shi. Learning and evaluating contextual embedding of source code. In Proceedings of the 37th International Conference on Machine Learning, ICML'20. JMLR.org, 2020
2020
-
[51]
Adam: A Method for Stochastic Optimization
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In Yoshua Bengio and Yann LeCun (eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015. URL http://arxiv.org/abs/1412.6980
2015
-
[52]
The Stack: 3 TB of permissively licensed source code
Denis Kocetkov, Raymond Li, Loubna Ben Allal, Jia Li, Chenghao Mou, Carlos Muñoz Ferrandis, Yacine Jernite, Margaret Mitchell, Sean Hughes, Thomas Wolf, Dzmitry Bahdanau, Leandro von Werra, and Harm de Vries. The Stack: 3 TB of permissively licensed source code. Preprint, 2022. URL https://arxiv.org/abs/2211.15533
-
[53]
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. Large language models are zero-shot reasoners. arXiv preprint arXiv:2205.11916, 2022
2022
-
[54]
If software is my copilot, who programmed my software?
Bradley M. Kuhn. If software is my copilot, who programmed my software? https://sfconservancy.org/blog/2022/feb/03/github-copilot-copyleft-gpl/, 2022
2022
-
[57]
DS-1000: a natural and reliable benchmark for data science code generation
Yuhang Lai, Chengxi Li, Yiming Wang, Tianyi Zhang, Ruiqi Zhong, Luke Zettlemoyer, Scott Wen-tau Yih, Daniel Fried, Sida Wang, and Tao Yu. DS-1000: a natural and reliable benchmark for data science code generation. ArXiv, abs/2211.11501, 2022
-
[58]
Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks
Dong-Hyun Lee. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, number 2, pp. 896, 2013
2013
-
[59]
Comparing code explanations created by students and large language models, 2023
Juho Leinonen, Paul Denny, Stephen MacNeil, Sami Sarsa, Seth Bernstein, Joanne Kim, Andrew Tran, and Arto Hellas. Comparing code explanations created by students and large language models, 2023
2023
-
[60]
Fair learning
Mark A. Lemley and Bryan Casey. Fair learning. Tex. L. Rev., 99:743, 2020. URL https://texaslawreview.org/fair-learning/
2020
-
[61]
How copyright law can fix artificial intelligence's implicit bias problem
Amanda Levendowski. How copyright law can fix artificial intelligence's implicit bias problem. Wash. L. Rev., 93:579, 2018
2018
-
[64]
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019
2019
-
[65]
Unpicking the rules shaping generative AI
Natasha Lomas. Unpicking the rules shaping generative AI. https://techcrunch.com/2023/04/13/generative-ai-gdpr-enforcement/, 2023
2023
-
[66]
CodeXGLUE: A machine learning benchmark dataset for code understanding and generation
Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, and Shujie Liu. CodeXGLUE: A machine learning benchmark dataset for code understanding ...
-
[70]
Using in-context learning to improve dialogue safety, February 2023
Nicholas Meade, Spandana Gella, Devamanyu Hazarika, Prakhar Gupta, Di Jin, Siva Reddy, Yang Liu, and Dilek Hakkani-Tür. Using in-context learning to improve dialogue safety, February 2023. URL http://arxiv.org/abs/2302.00871. arXiv:2302.00871 [cs]
-
[71]
Recurrent neural network based language model
Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černocký, and Sanjeev Khudanpur. Recurrent neural network based language model. In Takao Kobayashi, Keikichi Hirose, and Satoshi Nakamura (eds.), INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010...
2010
-
[78]
CodeGen: an open large language model for code with multi-turn program synthesis
Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. CodeGen: an open large language model for code with multi-turn program synthesis. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=iaYcJKpY2B_
2023
-
[79]
In-context learning and induction heads
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, a...
2022
-
[81]
GPT-4 system card
OpenAI. GPT-4 system card. https://cdn.openai.com/papers/gpt-4-system-card.pdf, 2023
2023
-
[83]
Asleep at the keyboard? Assessing the security of GitHub Copilot's code contributions
Hammond Pearce, Baleegh Ahmad, Benjamin Tan, Brendan Dolan-Gavitt, and Ramesh Karri. Asleep at the keyboard? Assessing the security of GitHub Copilot's code contributions. In IEEE Symposium on Security and Privacy, San Francisco, CA, 2022. URL https://arxiv.org/abs/2108.09293
-
[87]
Language models are unsupervised multitask learners
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019
2019
[88] Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, et al. Scaling language models: Methods, analysis & insights from training Gopher. arXiv preprint arXiv:2112.11446, 2021.
[89] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551, 2020.
[91] John A. Rothchild and Daniel Rothchild. Copyright implications of the use of code repositories to train a machine learning model. https://www.fsf.org/licensing/copilot/copyright-implications-of-the-use-of-code-repositories-to-train-a-machine-learning-model, 2022.
[92] Gustavo Sandoval, Hammond Pearce, Teo Nys, Ramesh Karri, Siddharth Garg, and Brendan Dolan-Gavitt. Lost at C: A user study on the security implications of large language model code assistants, 2023.
[95] Arfon Smith. Making open source data more available. https://github.blog/2016-06-29-making-open-source-data-more-available/, 2016.
[96] Shaden Smith, Mostofa Patwary, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, Elton Zhang, Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, and Bryan Catanzaro. Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model. arXiv preprint arXiv:2201.11990, 2022.
[97] Irene Solaiman. The gradient of generative AI release: Methods and considerations. arXiv preprint arXiv:2302.04844, 2023.
[99] Clive Thompson. How an AI became my code-writing genie, March 2022. URL https://www.wired.com/story/openai-copilot-autocomplete-for-code/
[101] Julian Togelius and Georgios N. Yannakakis. Choose your weapon: Survival strategies for depressed AI academics. arXiv preprint arXiv:2304.06035, 2023.
[102] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
[103] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008, 2017.
[105] Alexander Wan, Eric Wallace, Sheng Shen, and Dan Klein. Poisoning language models during instruction tuning, 2023.
[106] Ben Wang and Aran Komatsuzaki. GPT-J-6B: A 6 billion parameter autoregressive language model, 2021.
[109] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. Chain of thought prompting elicits reasoning in large language models. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho (eds.), Advances in Neural Information Processing Systems, 2022.
[111] Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2020.
[112] World Economic Forum. Future of jobs report. https://www3.weforum.org/docs/WEF_Future_of_Jobs_2023.pdf, 2023.
[114] Ming-Ho Yee and Arjun Guha. Do machine learning models produce TypeScript types that type check? In European Conference on Object-Oriented Programming (ECOOP), 2023.
[115] Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, Weng Lam Tam, Zixuan Ma, Yufei Xue, Jidong Zhai, Wenguang Chen, Peng Zhang, Yuxiao Dong, and Jie Tang. GLM-130B: An open bilingual pre-trained model. arXiv preprint arXiv:2210.02414, 2022.
[116] Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, and Luke Zettlemoyer. OPT: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068, 2022.
[117] Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, Teng Su, Zhilin Yang, and Jie Tang. CodeGeeX: A pre-trained model for code generation with multilingual evaluations on HumanEval-X. arXiv preprint arXiv:2303.17568, 2023. doi:10.48550/arXiv.2303.17568.
[118] Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, and Ed Chi. Least-to-most prompting enables complex reasoning in large language models. arXiv preprint arXiv:2205.10625, 2022.
[119] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019. doi:10.18653/v1/N19-1423.
[120] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
[121] Tianyu Gao, Xingcheng Yao, and Danqi Chen. SimCSE: Simple contrastive learning of sentence embeddings. arXiv preprint arXiv:2104.08821, 2021.
[122] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, 2021.
[123] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021.
[124] Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. CodeGen: An open large language model for code with multi-turn program synthesis. In The Eleventh International Conference on Learning Representations, 2023.
[125] Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, Teng Su, Zhilin Yang, and Jie Tang. CodeGeeX: A pre-trained model for code generation with multilingual evaluations on HumanEval-X. arXiv preprint arXiv:2303.17568, 2023. doi:10.48550/arXiv.2303.17568.
[126] Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, and Shujie Liu. CodeXGLUE: A machine learning benchmark dataset for code understanding and generation. arXiv preprint arXiv:2102.04664, 2021.
[127] Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, et al. BLOOM: A 176B-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100, 2022.
[128] Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, and Luke Zettlemoyer. OPT: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068, 2022.
[129] Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, Weng Lam Tam, Zixuan Ma, Yufei Xue, Jidong Zhai, Wenguang Chen, Peng Zhang, Yuxiao Dong, and Jie Tang. GLM-130B: An open bilingual pre-trained model. arXiv preprint arXiv:2210.02414, 2022.
[130] Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer, and Mike Lewis. InCoder: A generative model for code infilling and synthesis. arXiv preprint arXiv:2204.05999, 2022. doi:10.48550/arXiv.2204.05999.
[131] Fenia Christopoulou, Gerasimos Lampouras, Milan Gritta, Guchun Zhang, Yinpeng Guo, Zhongqi Li, Qi Zhang, Meng Xiao, Bo Shen, Lin Li, Hao Yu, Li Yan, Pingyi Zhou, Xin Wang, Yuchi Ma, Ignacio Iacobacci, Yasheng Wang, Guangtai Liang, Jiansheng Wei, Xin Jiang, Qianxiang Wang, et al. PanGu-Coder: Program synthesis with function-level language modeling. arXiv preprint arXiv:2207.11280, 2022.
[132] Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Rémi Leblond, et al. Competition-level code generation with AlphaCode. arXiv preprint arXiv:2203.07814, 2022.
[133] Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri S. Chatterji, Annie S. Chen, Kathleen Creel, Jared Quincy Davis, et al. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258, 2021.