StarCoder 2 and The Stack v2: The Next Generation
Pith reviewed 2026-05-12 17:22 UTC · model grok-4.3
The pith
StarCoder2's 3B model outperforms the prior 15B StarCoderBase model, while its 15B model matches or outperforms a model more than twice its size on code tasks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors train models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens from the expanded dataset. They find that the 3B model outperforms other code language models of similar size on most benchmarks and also outperforms the prior 15B base model. The 15B model significantly outperforms other models of comparable size. In addition, it matches or outperforms a model more than twice its size. It further outperforms competing models on math and code reasoning benchmarks as well as several low-resource languages.
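For context on how results like these are typically scored: HumanEval- and MBPP-style benchmarks usually report the unbiased pass@k estimator of Chen et al. (2021). The sketch below is background on that metric, not a reproduction of the authors' evaluation harness.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples, drawn
    without replacement from n generations of which c pass the unit tests,
    is correct (Chen et al., 2021)."""
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Example: 200 samples per problem, 37 of which pass the tests.
print(pass_at_k(200, 37, 1))   # ~0.185 (pass@1)
print(pass_at_k(200, 37, 10))  # ~0.88  (pass@10)
```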
What carries the argument
The Stack v2: a curated training dataset four times larger than the original Stack, spanning 619 programming languages plus additional high-quality sources, and supplying the tokens on which the models are trained.
If this is right
- Smaller models achieving strong results lowers the resources needed to deploy capable code assistants.
- Stronger performance on low-resource languages expands the reach of automated code tools to more programming contexts.
- Gains on reasoning benchmarks suggest the models can handle hybrid coding and mathematical tasks more effectively.
- Open release of weights and data identifiers enables independent verification and building on the results; a minimal loading sketch follows this list.
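As an illustration of that last point, here is a minimal sketch of loading a released checkpoint with the Hugging Face transformers library. The hub id "bigcode/starcoder2-3b", the prompt, and the generation settings are assumptions for illustration, not prescribed by the paper.

```python
# Minimal sketch: load an openly released StarCoder2 checkpoint and complete a
# snippet. Assumes the weights are published on the Hugging Face Hub under the
# bigcode organisation; adjust the model id to whichever release you verify.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigcode/starcoder2-3b"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "def fibonacci(n: int) -> int:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```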
Where Pith is reading between the lines
- If data quality drives the gains, future code models may achieve more by refining curation rather than scaling parameters alone.
- The efficiency improvements could support wider use of code models in settings with limited compute or on personal devices.
- Direct comparisons on tasks drawn from actual developer projects would test whether benchmark gains translate outside controlled evaluations.
Load-bearing premise
The chosen benchmarks and data curation rules produce results that generalize to real developer workflows, and no significant data contamination or overlap exists between the training corpus and the evaluation sets.
What would settle it
Running the models on a new set of code completion and reasoning tasks drawn exclusively from private or post-training sources, guaranteeing no overlap with the training corpus, and checking whether the reported advantages over prior models persist or disappear.
read the original abstract
The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data sources, such as GitHub pull requests, Kaggle notebooks, and code documentation. This results in a training set that is 4x larger than the first StarCoder dataset. We train StarCoder2 models with 3B, 7B, and 15B parameters on 3.3 to 4.3 trillion tokens and thoroughly evaluate them on a comprehensive set of Code LLM benchmarks. We find that our small model, StarCoder2-3B, outperforms other Code LLMs of similar size on most benchmarks, and also outperforms StarCoderBase-15B. Our large model, StarCoder2-15B, significantly outperforms other models of comparable size. In addition, it matches or outperforms CodeLlama-34B, a model more than twice its size. Although DeepSeekCoder-33B is the best-performing model at code completion for high-resource languages, we find that StarCoder2-15B outperforms it on math and code reasoning benchmarks, as well as several low-resource languages. We make the model weights available under an OpenRAIL license and ensure full transparency regarding the training data by releasing the SoftWare Heritage persistent IDentifiers (SWHIDs) of the source code data.
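Since the transparency claim rests on released SWHIDs, it is worth noting that a content SWHID is computable from the file bytes alone (the SWHID scheme reuses the git blob hash), so anyone can check whether a given file appears among the published identifiers. A minimal sketch, not the authors' tooling:

```python
import hashlib

def content_swhid(data: bytes) -> str:
    """Content SWHID (swh:1:cnt:...): sha1 over the git-style blob header
    b"blob <length>\\0" followed by the raw file bytes."""
    header = b"blob %d\x00" % len(data)
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()

# Example: compare a local file against the released identifier list.
# with open("some_source_file.py", "rb") as f:
#     print(content_swhid(f.read()))
```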
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces StarCoder2, a family of Code LLMs (3B, 7B, 15B parameters) trained on The Stack v2, a dataset 4x larger than the original Stack, constructed from Software Heritage archives across 619 languages plus curated GitHub PRs, Kaggle notebooks, and documentation. It reports that StarCoder2-3B outperforms other Code LLMs of similar size and even StarCoderBase-15B on most benchmarks, while StarCoder2-15B significantly outperforms comparable models and matches or exceeds CodeLlama-34B (more than twice its size), with additional strengths on math, reasoning, and low-resource languages. Model weights are released under OpenRAIL and training data transparency is provided via SWHIDs.
Significance. If the results hold, this advances open Code LLMs by showing that scaled, high-quality data curation enables smaller models to outperform larger predecessors, with full data transparency as a key strength. The release of SWHIDs and model weights supports reproducibility, and the consistent cross-benchmark outperformance (including against models like DeepSeekCoder-33B on specific tasks) provides concrete evidence for the value of the expanded corpus and training approach.
major comments (1)
- [Data Curation and Evaluation] The manuscript describes careful selection of additional data sources (GitHub PRs, Kaggle notebooks, documentation) and releases SWHIDs for the Software Heritage portion, but provides no explicit decontamination protocol, overlap statistics, or verification that benchmark problems (HumanEval, MBPP, DS-1000, etc.) are absent from the 4x larger training corpus. This is load-bearing for the central performance claims, such as StarCoder2-3B outperforming StarCoderBase-15B and StarCoder2-15B matching CodeLlama-34B, because even modest leakage could undermine the generalization interpretation.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and positive assessment of the significance of our work. We address the major comment below and will revise the manuscript to incorporate the requested details.
read point-by-point responses
- Referee: [Data Curation and Evaluation] The manuscript describes careful selection of additional data sources (GitHub PRs, Kaggle notebooks, documentation) and releases SWHIDs for the Software Heritage portion, but provides no explicit decontamination protocol, overlap statistics, or verification that benchmark problems (HumanEval, MBPP, DS-1000, etc.) are absent from the 4x larger training corpus. This is load-bearing for the central performance claims, such as StarCoder2-3B outperforming StarCoderBase-15B and StarCoder2-15B matching CodeLlama-34B, because even modest leakage could undermine the generalization interpretation.
  Authors: We agree that explicit documentation of the decontamination protocol is essential to substantiate the generalization claims. The current manuscript prioritizes transparency via SWHIDs for the Software Heritage data, enabling external verification, but does not detail the steps taken to exclude benchmark contamination. We will add a dedicated subsection under Data Curation describing our decontamination procedure (including the methods used to detect and remove overlaps with HumanEval, MBPP, DS-1000, and related benchmarks) along with the resulting overlap statistics. This revision will directly address the load-bearing nature of the concern.
  Revision: yes
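To make the requested protocol concrete, the sketch below shows the kind of n-gram overlap scan such a decontamination step could use. The function names, n-gram length, and threshold are illustrative assumptions, not the authors' actual pipeline.

```python
import re

def ngrams(text: str, n: int = 10) -> set:
    """Lowercased word-token n-grams of a document."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def flag_contaminated(train_docs, benchmark_problems, n=10, threshold=0.5):
    """Return ids of training documents sharing at least `threshold` of any
    benchmark problem's n-grams (e.g. HumanEval/MBPP prompts plus solutions).

    train_docs: iterable of (doc_id, text); benchmark_problems: list of str.
    """
    bench = [ngrams(p, n) for p in benchmark_problems]
    flagged = []
    for doc_id, text in train_docs:
        grams = ngrams(text, n)
        if not grams:
            continue
        if any(bg and len(grams & bg) / len(bg) >= threshold for bg in bench):
            flagged.append(doc_id)
    return flagged
```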
Circularity Check
No circularity in empirical dataset construction, training, and benchmarking
full rationale
The paper describes building The Stack v2 from Software Heritage archives plus selected sources, training 3B/7B/15B models on 3.3-4.3T tokens, and reporting benchmark scores on HumanEval, MBPP, DS-1000 and similar suites. No equations, derivations, or 'predictions' are claimed. Performance statements are direct outcomes of large-scale training runs evaluated on external benchmarks, not reductions of fitted parameters or self-citations. The central claims rest on independent empirical measurement rather than any self-referential logic or ansatz smuggled via prior work.
Axiom & Free-Parameter Ledger
free parameters (2)
- Model parameter counts (3B, 7B, 15B)
- Training token volume (3.3-4.3 trillion)
axioms (1)
- Domain assumption: the curated mix of Software Heritage repositories, GitHub pull requests, Kaggle notebooks, and documentation constitutes high-quality training data representative of real code.
Forward citations
Cited by 30 Pith papers
- HWE-Bench: Benchmarking LLM Agents on Real-World Hardware Bug Repair Tasks
  HWE-Bench is the first repository-level benchmark for LLM agents on real hardware bug repair, where the best agent fixes 70.7% of 417 tasks but drops below 65% on complex SoC projects.
- Edit, But Verify: An Empirical Audit of Instructed Code-Editing Benchmarks
  The two main benchmarks for LLM instructed code editing over-represent Python, miss common real-world domains and edit types, and have test coverage issues that limit what they measure.
- An Empirical Study of Speculative Decoding on Software Engineering Tasks
  Speculative decoding accelerates LLM inference on SE tasks without accuracy loss, with model-based methods suiting code generation and model-free methods suiting repository-level repair and editing.
- When Prompt Under-Specification Improves Code Correctness: An Exploratory Study of Prompt Wording and Structure Effects on LLM-Based Code Generation
  Structurally rich task descriptions make LLMs robust to prompt under-specification, and under-specification can enhance code correctness by disrupting misleading lexical or structural cues.
- SynthFix: Adaptive Neuro-Symbolic Code Vulnerability Repair
  SynthFix adaptively routes LLM code repairs to supervised fine-tuning or symbolic-reward fine-tuning, yielding up to 32% higher exact match on JavaScript and C vulnerability benchmarks.
- From Where Words Come: Efficient Regularization of Code Tokenizers Through Source Attribution
  SA-BPE regularizes standard BPE training for code by incorporating source diversity to skip problematic merges, substantially cutting unused tokens without altering inference.
- AdverMCTS: Combating Pseudo-Correctness in Code Generation via Adversarial Monte Carlo Tree Search
  AdverMCTS frames code generation as a minimax game where an attacker evolves tests to expose flaws in solver-generated code, yielding more robust outputs than static-test baselines.
- Evaluating the Environmental Impact of using SLMs and Prompt Engineering for Code Generation
  Chain-of-Thought prompting balances high accuracy with low energy use in small language models for code generation, while multi-sampling strategies add high energy costs for small accuracy gains.
- Think Anywhere in Code Generation
  Think-Anywhere lets LLMs invoke on-demand reasoning at any token during code generation via cold-start imitation followed by outcome-based RL, reaching state-of-the-art results on LeetCode, LiveCodeBench, HumanEval, and MBPP.
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
  A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.
- SynConfRoute: Syntax-Aware Routing for Efficient Code Completion with Small CodeLLMs
  SynConfRoute routes code completions using syntax validation and token confidence, improving pass@1 by up to 31% on hard tasks and reducing accelerator usage by 58% versus always using the largest model.
- Bridging Generation and Training: A Systematic Review of Quality Issues in LLMs for Code
  A review of 114 studies creates taxonomies for code and data quality issues, formalizes 18 propagation mechanisms from training data defects to LLM-generated code defects, and synthesizes detection and mitigation techniques.
- Defective Task Descriptions in LLM-Based Code Generation: Detection and Analysis
  SpecValidator detects lexical vagueness, under-specification, and syntax-formatting defects in LLM code-generation prompts with F1 0.804, outperforming GPT-5-mini and Claude Sonnet 4, and shows that under-specificatio...
- A Metamorphic Testing Approach to Diagnosing Memorization in LLM-Based Program Repair
  Metamorphic testing on Defects4J and GitBug-Java reveals substantial performance drops in seven LLMs that correlate with NLL, indicating data leakage in LLM-based program repair.
- Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation
  Co-locating tests with implementation code yields substantially higher preservation and correctness in foundation-model-generated programs than separated test syntax.
- LeGo-Code: Can Modular Curriculum Learning Advance Complex Code Generation? Insights from Text-to-SQL
  Modular curriculum learning with tier-specific adapters outperforms standard fine-tuning on complex Text-to-SQL queries in Spider and BIRD benchmarks by avoiding catastrophic forgetting.
- CodePivot: Bootstrapping Multilingual Transpilation in LLMs via Reinforcement Learning without Parallel Corpora
  CodePivot uses Python as a pivot language plus an Aggressive-Partial-Functional RL reward to train a 7B model that outperforms much larger LLMs on multilingual code transpilation without parallel corpora.
- DPC: Training-Free Text-to-SQL Candidate Selection via Dual-Paradigm Consistency
  DPC selects correct text-to-SQL outputs by enforcing execution consistency between SQL and Python on an adversarially constructed minimal distinguishing database.
- Learned or Memorized? Quantifying Memorization Advantage in Code LLMs
  A perturbation method shows memorization advantage in code LLMs varies widely by model and task, remaining low on CVEFixes and Defects4J benchmarks.
- Attention Editing: A Versatile Framework for Cross-Architecture Attention Conversion
  Attention Editing converts pre-trained LLMs to new attention architectures through layer-wise teacher-forced optimization and model-level distillation, preserving performance with efficiency gains.
- Automated Attention Pattern Discovery at Scale in Large Language Models
  AP-MAE reconstructs masked attention patterns in LLMs with high accuracy, generalizes across models, predicts generation correctness at 55-70%, and enables 13.6% accuracy gains via targeted interventions.
- TestDecision: Sequential Test Suite Generation via Greedy Optimization and Reinforcement Learning
  By proving test suite coverage is monotone submodular and training LLMs with RL to maximize marginal gains, TestDecision improves branch coverage 38-52% and bug detection up to 95% over base models on ULT and LiveCodeBench.
- A Taxonomy of Programming Languages for Code Generation
  The researchers provide a systematic 4-tier classification of 646 programming languages, quantifying the extreme data scarcity facing over 70% of the world's programming languages in the age of LLMs.
- Boosting Automatic Java-to-Cangjie Translation with Multi-Stage LLM Training and Error Repair
  Multi-stage LLM training plus compiler-guided error repair boosts functional equivalence in Java-to-Cangjie translation by 6.06% over prior methods despite scarce parallel data.
- A Validated Prompt Bank for Malicious Code Generation: Separating Executable Weapons from Security Knowledge in 1,554 Consensus-Labeled Prompts
  The paper releases a 1,554-prompt consensus-labeled bank separating executable malicious code requests from security knowledge requests, validated by five-model majority labeling with Fleiss' kappa of 0.876.
- Learning Generalizable Multimodal Representations for Software Vulnerability Detection
  MultiVul uses multimodal contrastive learning to align code and comment representations, yielding up to 27% F1 gains on vulnerability detection benchmarks over prompting and code-only baselines.
- PLMGH: What Matters in PLM-GNN Hybrids for Code Classification and Vulnerability Detection
  Controlled experiments show PLM-GNN hybrids improve code tasks over GNN-only baselines, with PLM source having larger impact than GNN backbone.
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
  SmolLM2 is a 1.7B-parameter language model that outperforms Qwen2.5-1.5B and Llama3.2-1B after overtraining on 11 trillion tokens using custom FineMath, Stack-Edu, and SmolTalk datasets in a multi-stage pipeline.
- An Empirical Study on Influence-Based Pretraining Data Selection for Code Large Language Models
  Data-influence-score filtering using validation-set loss on downstream coding tasks improves Code-LLM performance, with the most beneficial training data varying significantly across different programming tasks.
- A Survey on Large Language Models for Code Generation
  A systematic literature review that organizes recent work on LLMs for code generation into a taxonomy covering data curation, model advances, evaluations, ethics, environmental impact, and applications, with benchmark...