

CodeSearchNet Challenge: Evaluating the State of Semantic Code Search

Mixed citation behavior. Most common role is background (67%).

70 Pith papers citing it
abstract

Semantic code search is the task of retrieving relevant code given a natural language query. While related to other information retrieval tasks, it requires bridging the gap between the language used in code (often abbreviated and highly technical) and natural language more suitable to describe vague concepts and ideas. To enable evaluation of progress on code search, we are releasing the CodeSearchNet Corpus and are presenting the CodeSearchNet Challenge, which consists of 99 natural language queries with about 4k expert relevance annotations of likely results from CodeSearchNet Corpus. The corpus contains about 6 million functions from open-source code spanning six programming languages (Go, Java, JavaScript, PHP, Python, and Ruby). The CodeSearchNet Corpus also contains automatically generated query-like natural language for 2 million functions, obtained from mechanically scraping and preprocessing associated function documentation. In this article, we describe the methodology used to obtain the corpus and expert labels, as well as a number of simple baseline solutions for the task. We hope that CodeSearchNet Challenge encourages researchers and practitioners to study this interesting task further and will host a competition and leaderboard to track the progress on the challenge. We are also keen on extending CodeSearchNet Challenge to more queries and programming languages in the future.
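The abstract mentions that query-like natural language was obtained by mechanically scraping and preprocessing function documentation. As a rough illustration only, the sketch below pairs each documented Python function with a query-like string taken from its docstring. The helper name `extract_pairs` and the choice to keep only the first docstring paragraph are assumptions made for this example; the abstract does not specify the actual preprocessing.

```python
import ast


def extract_pairs(source: str):
    """Return (function_name, query_text) pairs for documented functions.

    Hypothetical helper for illustration; not the CodeSearchNet pipeline.
    """
    pairs = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)  # None when the function is undocumented
            if not doc:
                continue
            # Assumption: treat the first paragraph of the docstring as the query.
            summary = doc.split("\n\n", 1)[0]
            # Collapse internal whitespace so the query is a single line.
            query = " ".join(summary.split())
            pairs.append((node.name, query))
    return pairs


if __name__ == "__main__":
    example = (
        "def add(a, b):\n"
        '    """Add two numbers.\n\n    Longer details."""\n'
        "    return a + b\n"
    )
    print(extract_pairs(example))
```

Undocumented functions are skipped, so a corpus built this way would have fewer (function, query) pairs than functions, consistent with the abstract's roughly 2 million documented functions out of about 6 million.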


citation-role summary

background 7 · dataset 2


representative citing papers

SWE-Explore: Benchmarking How Coding Agents Explore Repositories

cs.SE · 2026-06-05 · unverdicted · novelty 7.0

SWE-Explore is a new benchmark evaluating repository exploration by coding agents on 848 issues across 203 repositories, using line-level ground truth from successful agent trajectories and showing agentic methods outperform classical retrieval on coverage and ranking.

VISTA: An End-to-End Benchmark for Visual Spec-to-Web-App Coding Agents

cs.SE · 2026-05-22 · unverdicted · novelty 7.0

VISTA is a new benchmark for end-to-end visual spec-to-web-app generation by LLM agents, featuring five prompt conditions, manual UI annotations, multi-metric evaluation, and results on four agent systems showing partial decoupling of visual and functional performance.

InCoder: A Generative Model for Code Infilling and Synthesis

cs.SE · 2022-04-12 · unverdicted · novelty 7.0

InCoder is the first generative model to directly perform zero-shot code infilling via bidirectional context from a masked-then-appended training scheme, matching left-to-right models on synthesis while improving on type inference, comment generation, and variable renaming.

citing papers explorer

Showing 12 of 12 citing papers after filters.

  • Test-Time Speculation cs.CL · 2026-05-10 · unverdicted · none · ref 42 · 2 links · internal anchor

    TTS adapts speculator models online via target model verifications to improve acceptance lengths by up to 72% over prior methods, with gains increasing for longer generations.

  • Nomic Embed: Training a Reproducible Long Context Text Embedder cs.CL · 2024-02-02 · conditional · none · ref 22 · internal anchor

    Nomic AI produced and open-sourced a reproducible 8192-context English text embedder that exceeds OpenAI Ada-002 and text-embedding-3-small performance on MTEB short-context and LoCo long-context benchmarks.

  • CodeT5+: Open Code Large Language Models for Code Understanding and Generation cs.CL · 2023-05-13 · conditional · none · ref 13 · internal anchor

    CodeT5+ is a flexible encoder-decoder LLM family for code pretrained with diverse objectives on multilingual corpora and initialized from existing LLMs, achieving state-of-the-art results on code generation, completion, math programming, and retrieval tasks including new SoTA on HumanEval with the 1

  • Text and Code Embeddings by Contrastive Pre-Training cs.CL · 2022-01-24 · unverdicted · none · ref 7 · internal anchor

    Contrastive pre-training on unsupervised data at scale creates text and code embeddings that set new state-of-the-art results on classification and semantic search benchmarks.

  • CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation cs.CL · 2021-09-02 · conditional · none · ref 12 · internal anchor

    CodeT5 adds identifier-aware pre-training and bimodal dual generation to a T5-style encoder-decoder, yielding better results on defect detection, clone detection, and code-to-text, text-to-code, and code-to-code tasks than prior encoder-only or decoder-only models.

  • CodeBERT: A Pre-Trained Model for Programming and Natural Languages cs.CL · 2020-02-19 · unverdicted · none · ref 39 · internal anchor

    CodeBERT pre-trains a bimodal model on code and text pairs plus unimodal data to achieve state-of-the-art results on natural language code search and code documentation generation.

  • Search-R3: Unifying Reasoning and Embedding in Large Language Models cs.CL · 2025-10-08 · unverdicted · none · ref 28 · internal anchor

    Search-R3 trains LLMs to output search embeddings as a direct product of step-by-step reasoning via supervised pre-training and a specialized RL environment that avoids full corpus re-encoding.

  • Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference cs.CL · 2024-12-18 · unverdicted · none · ref 147 · internal anchor

    ModernBERT is a new bidirectional encoder model achieving SOTA performance on diverse classification and retrieval benchmarks while offering superior speed and memory efficiency for long-context inference.

  • Towards General Text Embeddings with Multi-stage Contrastive Learning cs.CL · 2023-08-07 · unverdicted · none · ref 82 · internal anchor

    GTE_base is a compact text embedding model using multi-stage contrastive learning on diverse data that outperforms OpenAI's API and 10x larger models on massive benchmarks and works for code as text.

  • StarCoder: may the source be with you! cs.CL · 2023-05-09 · accept · none · ref 45 · internal anchor

    StarCoderBase matches or beats OpenAI's code-cushman-001 on multi-language code benchmarks; the Python-fine-tuned StarCoder reaches 40% pass@1 on HumanEval while retaining other-language performance.

  • A Survey on Large Language Models for Code Generation cs.CL · 2024-06-01 · unverdicted · none · ref 112 · internal anchor

    A systematic literature review that organizes recent work on LLMs for code generation into a taxonomy covering data curation, model advances, evaluations, ethics, environmental impact, and applications, with benchmark comparisons.

  • Retrieval-Augmented Generation for Large Language Models: A Survey cs.CL · 2023-12-18 · unverdicted · none · ref 157 · internal anchor

    A survey of RAG paradigms, components, benchmarks, and challenges for improving LLMs on knowledge-intensive tasks.