{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2022:53ID4WUY5AAS75FEMZXSW4LOMZ","short_pith_number":"pith:53ID4WUY","schema_version":"1.0","canonical_sha256":"eed03e5a98e8012ff4a4666f2b716e6658dc8194c6f9a0785d56a623555d48f0","source":{"kind":"arxiv","id":"2208.03299","version":3},"attestation_state":"computed","paper":{"title":"Atlas: Few-shot Learning with Retrieval Augmented Language Models","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Atlas, a retrieval-augmented language model, reaches over 42 percent accuracy on Natural Questions with only 64 examples while using 50 times fewer parameters than a 540 billion parameter model.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Armand Joulin, Edouard Grave, Fabio Petroni, Gautier Izacard, Jane Dwivedi-Yu, Lucas Hosseini, Maria Lomeli, Patrick Lewis, Sebastian Riedel, Timo Schick","submitted_at":"2022-08-05T17:39:22Z","abstract_excerpt":"Large language models have shown impressive few-shot results on a wide range of tasks. However, when knowledge is key for such results, as is the case for tasks such as question answering and fact checking, massive parameter counts to store knowledge seem to be needed. Retrieval augmented models are known to excel at knowledge intensive tasks without the need for as many parameters, but it is unclear whether they work in few-shot settings. In this work we present Atlas, a carefully designed and pre-trained retrieval augmented language model able to learn knowledge intensive tasks with very few"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2208.03299","kind":"arxiv","version":3},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.CL","submitted_at":"2022-08-05T17:39:22Z","cross_cats_sorted":[],"title_canon_sha256":"7a9b5aa924eb0af2ec244f3a0898da87f21266d4b237aba1b6ff582e8c4b60bc","abstract_canon_sha256":"bde059df50fd45afa6a2e95cfc49613066bcd5bc2ceafcdfb683eaec6bc5515a"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:47.704652Z","signature_b64":"9O4Q33EgpXrvB8C4r9taDLlWtzDJvI1vzTDTj6Rvv39tzUjnORPnAP1ZbTvKpZe7VbkpyGrypAEGeJVB7cYhCw==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"eed03e5a98e8012ff4a4666f2b716e6658dc8194c6f9a0785d56a623555d48f0","last_reissued_at":"2026-05-17T23:38:47.704051Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:47.704051Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Atlas: Few-shot Learning with Retrieval Augmented Language Models","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Atlas, a retrieval-augmented language model, reaches over 42 percent accuracy on Natural Questions with only 64 examples while using 50 times fewer parameters than a 540 billion parameter model.","cross_cats":[],"primary_cat":"cs.CL","authors_text":"Armand Joulin, Edouard Grave, Fabio Petroni, Gautier Izacard, Jane Dwivedi-Yu, Lucas Hosseini, Maria Lomeli, Patrick Lewis, Sebastian Riedel, Timo Schick","submitted_at":"2022-08-05T17:39:22Z","abstract_excerpt":"Large language models have shown impressive few-shot results on a wide range of tasks. However, when knowledge is key for such results, as is the case for tasks such as question answering and fact checking, massive parameter counts to store knowledge seem to be needed. Retrieval augmented models are known to excel at knowledge intensive tasks without the need for as many parameters, but it is unclear whether they work in few-shot settings. In this work we present Atlas, a carefully designed and pre-trained retrieval augmented language model able to learn knowledge intensive tasks with very few"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Atlas reaches over 42% accuracy on Natural Questions using only 64 examples, outperforming a 540B parameters model by 3% despite having 50x fewer parameters.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the retrieval index supplies accurate, relevant knowledge and that the pre-training plus few-shot setup reliably transfers this knowledge without the model needing to store facts internally.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Atlas reaches over 42% accuracy on Natural Questions with only 64 examples, outperforming a 540B-parameter model by 3% with 50x fewer parameters.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Atlas, a retrieval-augmented language model, reaches over 42 percent accuracy on Natural Questions with only 64 examples while using 50 times fewer parameters than a 540 billion parameter model.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"ee2094d37738c912d8467f3a4bb983e2ebf79b0dcced81a9ad543c886929614f"},"source":{"id":"2208.03299","kind":"arxiv","version":3},"verdict":{"id":"47724be6-7d83-4292-a418-6b278b269e18","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T13:44:37.774736Z","strongest_claim":"Atlas reaches over 42% accuracy on Natural Questions using only 64 examples, outperforming a 540B parameters model by 3% despite having 50x fewer parameters.","one_line_summary":"Atlas reaches over 42% accuracy on Natural Questions with only 64 examples, outperforming a 540B-parameter model by 3% with 50x fewer parameters.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the retrieval index supplies accurate, relevant knowledge and that the pre-training plus few-shot setup reliably transfers this knowledge without the model needing to store facts internally.","pith_extraction_headline":"Atlas, a retrieval-augmented language model, reaches over 42 percent accuracy on Natural Questions with only 64 examples while using 50 times fewer parameters than a 540 billion parameter model."},"references":{"count":232,"sample":[{"doi":"10.48550/arxiv.2207.06300","year":null,"title":"Re2g: Retrieve, rerank, generate, 2022","work_id":"825fb564-a83f-4edf-95fd-08af8b841641","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.48550/arxiv.2108.11357","year":null,"title":"Proofver: Natural logic theorem proving for fact verification, 2021","work_id":"4139a343-5372-48e8-a01d-cb6507a14983","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.3233/sw-170273","year":2018,"title":"2018 , volume =","work_id":"491a4b00-500c-4c3d-aaab-ce3014e245d8","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2011,"title":"Robust Disambiguation of Named Entities in Text","work_id":"aa97ffbe-c246-4adb-ac75-77b160679303","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"T - RE x: A Large Scale Alignment of Natural Language with Knowledge Base Triples","work_id":"79a03089-a11e-402a-9ef5-0c648f52f4e1","ref_index":6,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":232,"snapshot_sha256":"d53d5fc2cc4706c4ffd88b7fa33d0fdc5eaef5c4e5c2ded5e75c00a3d5ad0200","internal_anchors":46},"formal_canon":{"evidence_count":2,"snapshot_sha256":"f044238e3553f249c915d253f14cd111624b5fd202c61b536c3d74370b21dfcc"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2208.03299","created_at":"2026-05-17T23:38:47.704164+00:00"},{"alias_kind":"arxiv_version","alias_value":"2208.03299v3","created_at":"2026-05-17T23:38:47.704164+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2208.03299","created_at":"2026-05-17T23:38:47.704164+00:00"},{"alias_kind":"pith_short_12","alias_value":"53ID4WUY5AAS","created_at":"2026-05-18T12:33:33.725879+00:00"},{"alias_kind":"pith_short_16","alias_value":"53ID4WUY5AAS75FE","created_at":"2026-05-18T12:33:33.725879+00:00"},{"alias_kind":"pith_short_8","alias_value":"53ID4WUY","created_at":"2026-05-18T12:33:33.725879+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":29,"internal_anchor_count":29,"sample":[{"citing_arxiv_id":"2312.10997","citing_title":"Retrieval-Augmented Generation for Large Language Models: A Survey","ref_index":42,"is_internal_anchor":true},{"citing_arxiv_id":"2505.06907","citing_title":"A Survey on Foundation Models for Personalized Federated Intelligence","ref_index":122,"is_internal_anchor":true},{"citing_arxiv_id":"2605.16347","citing_title":"HPC-LLM: Practical Domain Adaptation and Retrieval-Augmented Generation for HPC Support","ref_index":31,"is_internal_anchor":true},{"citing_arxiv_id":"2307.06435","citing_title":"A Comprehensive Overview of Large Language Models","ref_index":25,"is_internal_anchor":true},{"citing_arxiv_id":"2507.10722","citing_title":"Bridging Brains and Machines: A Unified Frontier in Neuroscience, Artificial Intelligence, and Neuromorphic Systems","ref_index":159,"is_internal_anchor":true},{"citing_arxiv_id":"2301.12652","citing_title":"REPLUG: Retrieval-Augmented Black-Box Language Models","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2303.09014","citing_title":"ART: Automatic multi-step reasoning and tool-use for large language models","ref_index":169,"is_internal_anchor":true},{"citing_arxiv_id":"2503.19470","citing_title":"ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2506.20670","citing_title":"MMSearch-R1: Incentivizing LMMs to Search","ref_index":24,"is_internal_anchor":true},{"citing_arxiv_id":"2309.03883","citing_title":"DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models","ref_index":83,"is_internal_anchor":true},{"citing_arxiv_id":"2304.08244","citing_title":"API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs","ref_index":6,"is_internal_anchor":true},{"citing_arxiv_id":"2402.19473","citing_title":"Retrieval-Augmented Generation for AI-Generated Content: A Survey","ref_index":30,"is_internal_anchor":true},{"citing_arxiv_id":"2401.18059","citing_title":"RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval","ref_index":151,"is_internal_anchor":true},{"citing_arxiv_id":"2603.27253","citing_title":"Mitigating Hallucination on Hallucination in RAG via Ensemble Voting","ref_index":27,"is_internal_anchor":true},{"citing_arxiv_id":"2604.03826","citing_title":"Context Matters: Evaluating Context Strategies for Automated ADR Generation Using LLMs","ref_index":15,"is_internal_anchor":true},{"citing_arxiv_id":"2309.07597","citing_title":"C-Pack: Packed Resources For General Chinese Embeddings","ref_index":26,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10032","citing_title":"PlantMarkerBench: A Multi-Species Benchmark for Evidence-Grounded Plant Marker Reasoning","ref_index":58,"is_internal_anchor":true},{"citing_arxiv_id":"2308.14508","citing_title":"LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding","ref_index":91,"is_internal_anchor":true},{"citing_arxiv_id":"2310.11511","citing_title":"Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection","ref_index":95,"is_internal_anchor":true},{"citing_arxiv_id":"2604.23750","citing_title":"The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2308.03281","citing_title":"Towards General Text Embeddings with Multi-stage Contrastive Learning","ref_index":19,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10032","citing_title":"PlantMarkerBench: A Multi-Species Benchmark for Evidence-Grounded Plant Marker Reasoning","ref_index":58,"is_internal_anchor":true},{"citing_arxiv_id":"2604.23750","citing_title":"The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2310.03714","citing_title":"DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2604.20300","citing_title":"FSFM: A Biologically-Inspired Framework for Selective Forgetting of Agent Memory","ref_index":49,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/53ID4WUY5AAS75FEMZXSW4LOMZ","json":"https://pith.science/pith/53ID4WUY5AAS75FEMZXSW4LOMZ.json","graph_json":"https://pith.science/api/pith-number/53ID4WUY5AAS75FEMZXSW4LOMZ/graph.json","events_json":"https://pith.science/api/pith-number/53ID4WUY5AAS75FEMZXSW4LOMZ/events.json","paper":"https://pith.science/paper/53ID4WUY"},"agent_actions":{"view_html":"https://pith.science/pith/53ID4WUY5AAS75FEMZXSW4LOMZ","download_json":"https://pith.science/pith/53ID4WUY5AAS75FEMZXSW4LOMZ.json","view_paper":"https://pith.science/paper/53ID4WUY","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2208.03299&json=true","fetch_graph":"https://pith.science/api/pith-number/53ID4WUY5AAS75FEMZXSW4LOMZ/graph.json","fetch_events":"https://pith.science/api/pith-number/53ID4WUY5AAS75FEMZXSW4LOMZ/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/53ID4WUY5AAS75FEMZXSW4LOMZ/action/timestamp_anchor","attest_storage":"https://pith.science/pith/53ID4WUY5AAS75FEMZXSW4LOMZ/action/storage_attestation","attest_author":"https://pith.science/pith/53ID4WUY5AAS75FEMZXSW4LOMZ/action/author_attestation","sign_citation":"https://pith.science/pith/53ID4WUY5AAS75FEMZXSW4LOMZ/action/citation_signature","submit_replication":"https://pith.science/pith/53ID4WUY5AAS75FEMZXSW4LOMZ/action/replication_record"}},"created_at":"2026-05-17T23:38:47.704164+00:00","updated_at":"2026-05-17T23:38:47.704164+00:00"}