Non-negative Elastic Net Decoding for Information Retrieval

Koki Okajima; Tsukasa Yoshida; Yasuaki Nakamura; Yasutoshi Ida

arXiv: 2606.17910 · v1 · pith:7R6KV5HZnew · submitted 2026-06-16 · cs.IR · cs.AI · cs.CL

Pith reviewed 2026-06-26 22:21 UTC · model grok-4.3

classification: cs.IR · cs.AI · cs.CL

keywords: information retrieval, dense retrieval, elastic net decoding, sparse reconstruction, query embedding, non-negative combination, corpus context

The pith

NNN decoding recovers every query that dense retrieval handles and additional queries when documents are correlated.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Non-Negative elastic Net (NNN) decoding, which selects documents by finding a sparse non-negative linear combination of their embeddings that reconstructs the query embedding. This treats retrieval as a joint problem over the whole corpus rather than scoring documents independently via inner products. The central theoretical result proves that NNN decoding succeeds on every query that dense retrieval handles, and on strictly more queries when the corpus contains correlated documents. Experiments show that NNN decoding applied to frozen embeddings already improves results on several benchmarks, while training embeddings specifically for NNN decoding yields further gains in all metrics.

Core claim

NNN decoding selects documents whose embeddings jointly reconstruct the query embedding as a sparse non-negative linear combination. For any corpus, every query correctly handled by dense retrieval is also handled by NNN decoding, while on corpora containing correlated documents, NNN decoding additionally handles queries that dense retrieval cannot.

What carries the argument

Non-negative elastic net decoding: the selection of a sparse non-negative linear combination of document embeddings that reconstructs the query embedding.
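
This page does not reproduce the paper's objective, so the following is only the standard penalized form of a non-negative elastic net, written as an assumption. Here $q$ is the query embedding, $D$ is the matrix whose $N$ columns are the document embeddings, $\alpha$ is the coefficient vector, and $\lambda_1, \lambda_2 \ge 0$ are the two regularization weights that the Figure 2 caption refers to; the scaling of each term is a guess and may differ from the paper's.

```latex
\hat{\alpha}(q) \in \arg\min_{\alpha \in \mathbb{R}^{N},\ \alpha \ge 0}\;
  \frac{1}{2}\lVert q - D\alpha \rVert_2^2
  + \lambda_1 \lVert \alpha \rVert_1
  + \frac{\lambda_2}{2} \lVert \alpha \rVert_2^2,
\qquad
\text{retrieved documents} = \operatorname{supp}\bigl(\hat{\alpha}(q)\bigr).
```

Under this form, setting $\lambda_2 = 0$ gives a non-negative lasso and setting $\lambda_1 = 0$ gives a non-negative ridge regression, which would correspond to the "ℓ1 alone" and "squared ℓ2 regularization alone" variants mentioned in the hyperparameter discussion.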

If this is right

NNN decoding applied to frozen inner-product embeddings yields consistent improvements on several retrieval benchmarks.
End-to-end training that optimizes embeddings for NNN decoding produces significant performance gains over dense retrieval in all metrics.
Retrieval results become less redundant because documents are chosen jointly with regard to the rest of the corpus.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

NNN decoding could be applied after any existing embedding model without retraining to increase result diversity.
The reconstruction view may extend to tasks such as passage re-ranking or multi-hop retrieval where corpus context matters.
If the cone-spanning condition fails for some queries, hybrid methods that fall back to inner-product scoring could be needed.

Load-bearing premise

The query embedding lies in the cone spanned by the document embeddings such that a sparse non-negative combination is both feasible and identifies relevant documents.

What would settle it

A query for which the highest inner-product document is the correct answer but no sparse non-negative linear combination of document embeddings reconstructs the query embedding.

Figures

Figures reproduced from arXiv: 2606.17910 by Koki Okajima, Tsukasa Yoshida, Yasuaki Nakamura, Yasutoshi Ida.

**Figure 2.** Comp@5 of NNN-FIX and NNN-TR evaluated on a grid of (λ1, λ2) for ToolLens. Compared to NNN-FIX, NNN-TR is more robust.

Hyperparameter sensitivity. Tables 1 and 2 show that with frozen embeddings, squared ℓ2 regularization alone underperforms NNN-FIX substantially, while ℓ1 alone incurs a milder drop except on MultiHop-RAG. Once the embeddings are trained for NNN decoding, both variants are competitive. Con…

**Figure 3.** Comp@5 of DENSE, NNN-FIX, and NNN-TR against FISTA iterations during inference.

[Plot panels: NumpyBank, PandasBank, AWSBank, ToolLens, …; x-axis: # ground-truth items per query; y-axis: Comp@5]

**Figure 4.** Comp@5 of DENSE, NNN-FIX, and NNN-TR stratified over the number of ground truth items per query.

… documents, NNN decoding has little room to contribute. As |S| grows, DENSE deteriorates sharply, while both NNN decoding variants degrade far more mildly. This is particularly pronounced in the ToolBank datasets. This phenomenon can be interpreted through the mechanism explained in Section 2.3 by the following.…

Abstract

Dense retrieval has become the dominant paradigm in information retrieval, in which each document is scored against a query by the inner product of their vector embeddings, and the top-$k$ documents by score are retrieved for this query. However, since each document's score depends solely on the embedding of the query and itself, the retrieval process is oblivious to the content of the entire corpus. Therefore, dense retrieval cannot avoid selecting semantically similar documents from the corpus, which may result in a non-diverse, redundant set of retrieved documents. To this end, we approach retrieval as a joint decoding problem, in which documents are selected as a set with regard to the context of the rest of the corpus. To achieve this, we propose Non-Negative elastic Net (NNN) decoding, which selects documents whose embeddings jointly reconstruct the query embedding as a sparse non-negative linear combination. Our main theoretical result establishes a strict separation between dense retrieval and NNN decoding. For any corpus, every query correctly handled by dense retrieval is also handled by NNN decoding, while on corpora containing correlated documents, NNN decoding additionally handles queries that dense retrieval cannot. Experimental results indicate that applying NNN decoding to frozen embeddings trained for inner-product scoring yields consistent improvements across several benchmarks. Moreover, we introduce an end-to-end training procedure which optimizes the embeddings for NNN decoding, producing significant performance gains over dense retrieval across all metrics and benchmarks. Our work establishes a new paradigm for leveraging dense embeddings in information retrieval, beyond the standard practice of inner-product scoring.
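
To make the redundancy argument concrete, here is a minimal sketch on a hypothetical three-document corpus; it is not the paper's implementation. It assumes the usual penalized form (squared reconstruction error plus an ℓ1 and a squared ℓ2 penalty, with non-negative coefficients) and solves it with plain projected proximal gradient. The corpus, the query, the function name `nnn_decode`, and the values of `lam1` and `lam2` are all illustrative assumptions.

```python
import numpy as np

def nnn_decode(D, q, lam1, lam2, n_iter=500):
    """Projected proximal gradient for the assumed objective
    0.5*||q - D a||^2 + lam1*sum(a) + 0.5*lam2*||a||^2, subject to a >= 0."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + lam2)  # 1 / Lipschitz constant
    alpha = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ alpha - q) + lam2 * alpha   # gradient of the smooth part
        alpha = np.maximum(0.0, alpha - step * (grad + lam1))  # shrink, then clip at 0
    return alpha

# Hypothetical corpus; columns are unit-norm document embeddings.
# Docs 0 and 2 are the relevant pair, doc 1 is correlated with doc 0.
D = np.array([[1.0, 0.8, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.6, 0.0]])
q = np.array([0.8, 0.6, 0.0])  # unit-norm query built from docs 0 and 2

scores = D.T @ q                                   # inner products: 0.8, 0.64, 0.6
dense_top2 = sorted(np.argsort(-scores)[:2].tolist())

alpha = nnn_decode(D, q, lam1=0.2, lam2=0.01)
nnn_top2 = sorted(np.argsort(-alpha)[:2].tolist())

print("dense top-2:", dense_top2)   # [0, 1]
print("NNN top-2:", nnn_top2)       # [0, 2]
```

In this toy case dense retrieval returns doc 0 together with its correlated neighbour doc 1, because doc 1's inner product (0.64) exceeds that of doc 2 (0.6). The reconstruction instead assigns doc 1 a zero coefficient once doc 0 is in the support, so docs 0 and 2 are selected.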

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The separation theorem is the headline claim but the cone requirement for NNN reconstruction creates an obvious gap with dense retrieval that the abstract does not close.

The new piece here is the NNN decoding step: instead of scoring documents independently by inner product, solve a non-negative elastic-net problem to find a sparse combination of document vectors that reconstructs the query vector. That framing is distinct from standard dense retrieval and leads to the reported gains on frozen embeddings plus bigger lifts when the embeddings are trained end-to-end for the new objective.

The experiments are the part that looks usable right away. Consistent improvements across benchmarks when swapping in NNN decoding, and further gains from joint training, suggest the approach is at least worth trying in production pipelines that already have embeddings.

The theoretical claim is the soft spot. The abstract states a strict separation—NNN handles every query dense retrieval handles, plus more on correlated corpora—but supplies no proof sketch and no discussion of what occurs when the query vector lies outside the non-negative cone spanned by the document matrix. Dense retrieval can still return a high inner-product document in that case; NNN cannot produce a feasible reconstruction. Without a definition of “correctly handled” for NNN in the infeasible regime or an argument that dense success implies cone membership, the inclusion does not obviously follow. The stress-test concern lands on the abstract as written.

Experimental details are also thin: no solver description, no information on how baselines or metrics were computed. That makes it hard to judge how much of the reported lift is real versus setup-dependent.

This is for groups already running dense retrieval who want a lightweight decoding layer for redundancy control. The idea is coherent enough and the empirical numbers are positive enough that a serious editor should send it to referees rather than desk-reject; the main questions will be whether the separation holds and how robust the gains are under standard evaluation protocols.

Referee Report

3 major / 0 minor

Summary. The paper proposes Non-Negative Elastic Net (NNN) decoding as an alternative to dense retrieval: documents are selected by solving for a sparse non-negative coefficient vector α such that the document embedding matrix D satisfies Dα ≈ q (the query embedding). It claims a strict separation theorem: for any corpus, NNN handles every query that dense retrieval correctly handles, and additionally handles queries on corpora with correlated documents that dense retrieval cannot. Experiments report consistent gains when applying NNN to frozen inner-product embeddings and larger gains from an end-to-end training procedure that optimizes embeddings directly for the NNN objective.
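
The Figure 3 caption reports results against FISTA iterations, so an accelerated proximal-gradient loop is presumably what solves this problem at inference time. The sketch below follows that reading. The penalized objective, the function name `fista_nnn`, the step-size rule, and the default iteration count are assumptions rather than the authors' code.

```python
import numpy as np

def fista_nnn(D, q, lam1, lam2, n_iter=200):
    """FISTA for min_{a >= 0} 0.5*||q - D a||^2 + lam1*sum(a) + 0.5*lam2*||a||^2."""
    L = np.linalg.norm(D, 2) ** 2 + lam2   # Lipschitz constant of the smooth part
    alpha = np.zeros(D.shape[1])
    y, t = alpha.copy(), 1.0
    for _ in range(n_iter):
        grad = D.T @ (D @ y - q) + lam2 * y
        # Proximal step for lam1*sum(a) plus the constraint a >= 0.
        alpha_next = np.maximum(0.0, y - (grad + lam1) / L)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        # Momentum (extrapolation) step.
        y = alpha_next + ((t - 1.0) / t_next) * (alpha_next - alpha)
        alpha, t = alpha_next, t_next
    return alpha

# Sanity check on an orthonormal corpus, where the problem separates per
# coordinate and the minimizer is max(0, q_j - lam1) / (1 + lam2).
D = np.eye(3)
q = np.array([0.9, 0.05, 0.5])
lam1, lam2 = 0.1, 0.5
alpha = fista_nnn(D, q, lam1, lam2)
closed_form = np.maximum(0.0, q - lam1) / (1.0 + lam2)
support = np.flatnonzero(alpha > 0)   # indices of the retrieved documents
```

The orthonormal case is used only because it has a closed-form answer to compare against; real corpora have correlated columns, and the iteration budget then becomes a speed and accuracy knob.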

Significance. If the separation theorem is rigorously established and the experimental protocol is reproducible, the work would introduce a new decoding paradigm that incorporates corpus-wide context via non-negative sparse reconstruction, potentially improving diversity and handling of redundant documents. The end-to-end training procedure is a concrete strength that could be adopted more broadly.

major comments (3)

[Abstract] Abstract / main theoretical result: The strict separation claim (every dense-retrieval success is an NNN success, plus additional successes on correlated corpora) is load-bearing, yet the abstract states the result without a proof sketch, without defining 'correctly handled' for NNN when no feasible non-negative α exists, and without addressing whether dense-retrieval success (maximizing ⟨q, d_i⟩) implies q lies in the cone spanned by the columns of D. The skeptic correctly notes that if a counter-example corpus and query exist where the relevant document maximizes the inner product but q ∉ cone(D), the claimed inclusion fails; this must be resolved with an explicit argument or counter-example analysis.
[Abstract] Formulation of NNN decoding: The optimization is feasible only when q lies in the non-negative cone of D. The manuscript provides no definition or fallback procedure for the case when the elastic-net problem is infeasible, nor any demonstration that dense-retrieval success guarantees cone membership. This directly affects whether the separation theorem can hold for arbitrary embeddings.
[Abstract] Experimental protocol: The abstract reports 'consistent improvements' and 'significant performance gains' but supplies no information on the elastic-net solver, how feasibility is handled, how baselines are implemented, or how the reported metrics are computed. Without these details the experimental claims cannot be verified or reproduced.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment point by point below, indicating planned revisions to the abstract where appropriate.

Referee: [Abstract] Abstract / main theoretical result: The strict separation claim (every dense-retrieval success is an NNN success, plus additional successes on correlated corpora) is load-bearing, yet the abstract states the result without a proof sketch, without defining 'correctly handled' for NNN when no feasible non-negative α exists, and without addressing whether dense-retrieval success (maximizing ⟨q, d_i⟩) implies q lies in the cone spanned by the columns of D. The skeptic correctly notes that if a counter-example corpus and query exist where the relevant document maximizes the inner product but q ∉ cone(D), the claimed inclusion fails; this must be resolved with an explicit argument or counter-example analysis.

Authors: The full manuscript contains the formal theorem, definitions, and complete proof in Section 3. 'Correctly handled' by dense retrieval means the relevant document achieves strictly maximum inner product. For NNN it means the optimization admits a feasible non-negative α whose support contains the relevant document. The proof shows dense success implies cone membership via the optimality conditions of the inner-product maximizer and constructs an explicit non-negative combination; it proceeds by contradiction to rule out separating hyperplanes. No counter-example exists. We will add a concise proof sketch and the definitions of 'correctly handled' to the abstract. revision: yes
Referee: [Abstract] Formulation of NNN decoding: The optimization is feasible only when q lies in the non-negative cone of D. The manuscript provides no definition or fallback procedure for the case when the elastic-net problem is infeasible, nor any demonstration that dense-retrieval success guarantees cone membership. This directly affects whether the separation theorem can hold for arbitrary embeddings.

Authors: We agree the abstract omits these clarifications. Section 3 proves that dense-retrieval success guarantees cone membership, so infeasibility does not arise for queries correctly handled by dense retrieval. When infeasibility occurs for other queries the procedure falls back to the maximum-inner-product document. We will revise the abstract to include this definition and fallback note. revision: yes
Referee: [Abstract] Experimental protocol: The abstract reports 'consistent improvements' and 'significant performance gains' but supplies no information on the elastic-net solver, how feasibility is handled, how baselines are implemented, or how the reported metrics are computed. Without these details the experimental claims cannot be verified or reproduced.

Authors: All requested details (solver, feasibility handling, baseline implementations, and metric computation) appear in Sections 4 and 5. We will add one sentence to the abstract summarizing the protocol and pointing to those sections. revision: yes
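
The success definitions and the fallback described in these responses can be written down directly. The sketch below is illustrative only: the function names, the tolerance `tol`, and the example numbers are hypothetical. The dense criterion follows the Φ_DR region in the appendix setup (relevant documents outscore all others, with positive scores). The fallback is generalized from "the maximum-inner-product document" to the top-k by inner product, which is an assumption.

```python
import numpy as np

def dense_handles(scores, relevant):
    """Dense success: every relevant doc strictly outscores every other doc,
    and all relevant scores are positive."""
    mask = np.zeros(len(scores), dtype=bool)
    mask[list(relevant)] = True
    lowest_relevant = scores[mask].min()
    if lowest_relevant <= 0:
        return False
    return bool(mask.all() or scores[~mask].max() < lowest_relevant)

def nnn_handles(alpha, relevant, tol=1e-10):
    """NNN success as worded in the rebuttal: the support of alpha contains
    every relevant document."""
    support = set(np.flatnonzero(alpha > tol).tolist())
    return set(relevant) <= support

def select_with_fallback(alpha, scores, k, tol=1e-10):
    """Rank by coefficient on the support; if the support is empty,
    fall back to inner-product ranking (assumed generalization to top-k)."""
    support = np.flatnonzero(alpha > tol)
    if support.size == 0:
        return np.argsort(-scores)[:k].tolist()
    return support[np.argsort(-alpha[support])][:k].tolist()

# Hypothetical numbers: docs 0 and 2 are relevant, doc 1 is a near-duplicate of doc 0.
scores = np.array([0.80, 0.64, 0.60])
alpha = np.array([0.59, 0.00, 0.40])
relevant = {0, 2}

print(dense_handles(scores, relevant))                 # False
print(nnn_handles(alpha, relevant))                    # True
print(select_with_fallback(alpha, scores, k=2))        # [0, 2]
print(select_with_fallback(np.zeros(3), scores, k=2))  # [0, 1]
```

The appendix excerpt states the stronger condition supp(w⋆) = S, so an exact-support check may be closer to the paper's definition than the containment check used here.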

Circularity Check

0 steps flagged

No circularity: theoretical separation derived from method definitions without reduction to fitted inputs or self-citations.

The paper defines NNN decoding directly as non-negative elastic-net reconstruction (Dα ≈ q with α ≥ 0 sparse) and contrasts it with dense retrieval via inner products. The central claim of strict separation (NNN handles all dense-correct queries plus more on correlated corpora) is presented as a mathematical consequence of these definitions rather than a statistical prediction or self-referential fit. No load-bearing self-citations, ansatzes smuggled via prior work, or renaming of known results appear in the provided text. The derivation chain is self-contained against the stated optimization and scoring rules, with the reader's noted score of 2 reflecting only minor definitional assumptions rather than circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the modeling choice that query embeddings can be meaningfully expressed as non-negative sparse linear combinations of document embeddings; no free parameters, axioms, or invented entities are explicitly introduced beyond standard convex optimization assumptions.

Reference graph

Works this paper leans on

56 extracted references · 11 linked inside Pith

[1]

Retrieval-augmented generation for knowledge-intensive NLP tasks

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. InAdvances in Neural Information Processing Systems, volume 33, pages 9459–9474, 2020

2020
[2]

REALM: Retrieval-augmented language model pre-training

Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. REALM: Retrieval-augmented language model pre-training. InInternational Conference on Machine Learning, pages 3929–3938, 2020

2020
[3]

Improving language models by retrieving from trillions of tokens

Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, et al. Improving language models by retrieving from trillions of tokens. InInternational Conference on Machine Learning, pages 2206–2240, 2022

2022
[4]

Toolformer: Language models can teach themselves to use tools

Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. InAdvances in Neural Information Processing Systems, volume 36, 2023

2023
[5]

ToolLLM: Facilitating large language models to master 16000+ real-world APIs.arXiv preprint arXiv:2307.16789, 2023

Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, et al. ToolLLM: Facilitating large language models to master 16000+ real-world APIs.arXiv preprint arXiv:2307.16789, 2023

Pith/arXiv arXiv 2023
[6]

Patil, Tianjun Zhang, Xin Wang, and Joseph E

Shishir G. Patil, Tianjun Zhang, Xin Wang, and Joseph E. Gonzalez. Gorilla: Large language model connected with massive apis. InAdvances in Neural Information Processing Systems, volume 37, pages 126544–126565, 2024

2024
[7]

Learning deep structured semantic models for web search using clickthrough data

Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. Learning deep structured semantic models for web search using clickthrough data. InProceedings of the 22nd ACM International Conference on Information and Knowledge Management, pages 2333–2338, 2013

2013
[8]

Dense passage retrieval for open-domain question answering

Vladimir Karpukhin, Barlas O ˘guz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. InProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pages 6769–6781, 2020

2020
[9]

Text embeddings by weakly-supervised contrastive pre-training

Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei. Text embeddings by weakly-supervised contrastive pre-training. arXiv preprint arXiv:2212.03533, 2022

Pith/arXiv arXiv 2022
[10]

Sentence-T5: Scalable sentence encoders from pre-trained text-to-text models

Jianmo Ni, Gustavo Hernandez Abrego, Noah Constant, Ji Ma, Keith Hall, Daniel Cer, and Yinfei Yang. Sentence-T5: Scalable sentence encoders from pre-trained text-to-text models. In Findings of the Association for Computational Linguistics: ACL 2022, pages 1864–1874, 2022

2022
[11]

Okapi at TREC-3

Stephen E Robertson, Steve Walker, Susan Jones, Micheline M Hancock-Beaulieu, and Mike Gatford. Okapi at TREC-3. InNIST Special Publication, volume 109, page 109, 1995

1995
[12]

Approximate nearest neighbor negative contrastive learning for dense text retrieval

Lee Xiong, Chenyan Xiong, Ye Li, Kwok-Fung Tang, Jialin Liu, Paul N Bennett, Junaid Ahmed, and Arnold Overwijk. Approximate nearest neighbor negative contrastive learning for dense text retrieval. InInternational Conference on Learning Representations, 2021

2021
[13]

Unsupervised dense information retrieval with contrastive learning

Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, and Edouard Grave. Unsupervised dense information retrieval with contrastive learning. arXiv preprint arXiv:2112.09118, 2021

Pith/arXiv arXiv 2021
[14]

The use of MMR, diversity-based reranking for reordering documents and producing summaries

Jaime Carbonell and Jade Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. InProceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 335–336, 1998. 10

1998
[15]

Towards completeness-oriented tool retrieval for large language models

Changle Qu, Sunhao Dai, Xiaochi Wei, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Jun Xu, and Ji-Rong Wen. Towards completeness-oriented tool retrieval for large language models. InProceedings of the 33rd ACM International Conference on Information and Knowledge Management, pages 1930–1940, 2024

1930
[16]

Vendi-RAG: Adaptively trading-off diversity and quality significantly improves retrieval augmented generation with LLMs.arXiv preprint arXiv:2502.11228, 2025

Mohammad Reza Rezaei and Adji Bousso Dieng. Vendi-RAG: Adaptively trading-off diversity and quality significantly improves retrieval augmented generation with LLMs.arXiv preprint arXiv:2502.11228, 2025

arXiv 2025
[17]

Smith, Luke Zettlemoyer, and Tao Yu

Hongjin Su, Weijia Shi, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Wen-tau Yih, Noah A. Smith, Luke Zettlemoyer, and Tao Yu. One embedder, any task: Instruction-finetuned text embeddings. InFindings of the Association for Computational Linguistics: ACL 2023, pages 1102–1121, 2023

2023
[18]

Generative representational instruction tuning

Niklas Muennighoff, Hongjin Su, Liang Wang, Nan Yang, Furu Wei, Tao Yu, Amanpreet Singh, and Douwe Kiela. Generative representational instruction tuning. InICLR 2024 Workshop: How Far Are We From AGI, 2024

2024
[19]

Gemini embedding: Generalizable embeddings from Gemini.arXiv preprint arXiv:2503.07891, 2025

Jinhyuk Lee, Feiyang Chen, Sahil Dua, Daniel Cer, Madhuri Shanbhogue, Iftekhar Naim, Gustavo Hernández Abrego, Zhe Li, Kaifeng Chen, Henrique Schechter Vera, et al. Gemini embedding: Generalizable embeddings from Gemini.arXiv preprint arXiv:2503.07891, 2025

Pith/arXiv arXiv 2025
[20]

Global minimizers of sigmoid contrastive loss

Kiril Bangachev, Guy Bresler, Iliyas Noman, and Yury Polyanskiy. Global minimizers of sigmoid contrastive loss. InAdvances in Neural Information Processing Systems, 2025

2025
[21]

On the theoretical limitations of embedding-based retrieval

Orion Weller, Michael Boratko, Iftekhar Naim, and Jinhyuk Lee. On the theoretical limitations of embedding-based retrieval. InInternational Conference on Learning Representations, 2026

2026
[22]

R2k is theoretically large enough for embedding-based top- k retrieval.arXiv preprint arXiv:2601.20844, 2026

Zihao Wang, Hang Yin, Lihui Liu, Hanghang Tong, Yangqiu Song, Ginny Wong, and Simon See. R2k is theoretically large enough for embedding-based top- k retrieval.arXiv preprint arXiv:2601.20844, 2026

Pith/arXiv arXiv 2026
[23]

Is dimensionality a barrier for retrieval models?arXiv preprint arXiv:2605.23556, 2026

Kiril Bangachev, Guy Bresler, Jonathan Kogan, and Yury Polyanskiy. Is dimensionality a barrier for retrieval models?arXiv preprint arXiv:2605.23556, 2026

Pith/arXiv arXiv 2026
[24]

What limits does quantization place on dense top- k retrieval? a theoretical study.arXiv preprint arXiv:2606.11780, 2026

Koki Okajima and Tsukasa Yoshida. What limits does quantization place on dense top- k retrieval? a theoretical study.arXiv preprint arXiv:2606.11780, 2026

Pith/arXiv arXiv 2026
[25]

Passage re-ranking with BERT.arXiv preprint arXiv:1901.04085, 2019

Rodrigo Nogueira and Kyunghyun Cho. Passage re-ranking with BERT.arXiv preprint arXiv:1901.04085, 2019

Pith/arXiv arXiv 1901
[26]

ColBERT: Efficient and effective passage search via con- textualized late interaction over BERT

Omar Khattab and Matei Zaharia. ColBERT: Efficient and effective passage search via con- textualized late interaction over BERT. InProceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 39–48, 2020

2020
[27]

Is ChatGPT good at search? investigating large language models as re-ranking agents

Weiwei Sun, Lingyong Yan, Xinyu Ma, Shuaiqiang Wang, Pengjie Ren, Zhumin Chen, Dawei Yin, and Zhaochun Ren. Is ChatGPT good at search? investigating large language models as re-ranking agents. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 14918–14937, 2023

2023
[28]

Zero-shot cross- lingual reranking with large language models for low-resource languages

Mofetoluwa Adeyemi, Akintunde Oladipo, Ronak Pradeep, and Jimmy Lin. Zero-shot cross- lingual reranking with large language models for low-resource languages. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 650–656, 2024

2024
[29]

GLEN: Generative retrieval via lexical index learning

Sunkyung Lee, Minjin Choi, and Jongwuk Lee. GLEN: Generative retrieval via lexical index learning. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 7693–7704, 2023

2023
[30]

Learning to rank in generative retrieval

Yongqi Li, Nan Yang, Liang Wang, Furu Wei, and Wenjie Li. Learning to rank in generative retrieval. InProceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, volume 38, 2024. 11

2024
[31]

Regression shrinkage and selection via the lasso.Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996

Robert Tibshirani. Regression shrinkage and selection via the lasso.Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996

1996
[32]

Regularization and variable selection via the elastic net.Journal of the Royal Statistical Society Series B: Statistical Methodology, 67(2):301–320, 2005

Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net.Journal of the Royal Statistical Society Series B: Statistical Methodology, 67(2):301–320, 2005

2005
[33]

Candès, Justin K

Emmanuel J. Candès, Justin K. Romberg, and Terence Tao. Stable signal recovery from incomplete and inaccurate measurements.Communications on Pure and Applied Mathematics, 59(8):1207–1223, 2006

2006
[34]

David L. Donoho. Compressed sensing.IEEE Transactions on Information Theory, 52(4):1289– 1306, 2006

2006
[35]

A typical reconstruction limit for compressed sensing based on Lp-norm minimization.Journal of Statistical Mechanics: Theory and Experiment, 2009(09):L09003, 2009

Yoshiyuki Kabashima, Tadashi Wadayama, and Toshiyuki Tanaka. A typical reconstruction limit for compressed sensing based on Lp-norm minimization.Journal of Statistical Mechanics: Theory and Experiment, 2009(09):L09003, 2009

2009
[36]

Wainwright

Martin J. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (lasso).IEEE Transactions on Information Theory, 55(5):2183–2202, 2009

2009
[37]

The distribution of the lasso: Uniform control over sparse balls and adaptive parameter tuning.The Annals of Statistics, 49(4):2313–2335, 2021

Léo Miolane and Andrea Montanari. The distribution of the lasso: Uniform control over sparse balls and adaptive parameter tuning.The Annals of Statistics, 49(4):2313–2335, 2021

2021
[38]

Average case analysis of lasso under ultra sparse conditions

Koki Okajima, Xiangming Meng, Takashi Takahashi, and Yoshiyuki Kabashima. Average case analysis of lasso under ultra sparse conditions. InProceedings of The 26th International Conference on Artificial Intelligence and Statistics, volume 206 ofProceedings of Machine Learning Research, pages 11317–11330, 2023

2023
[39]

Friedman, Trevor Hastie, and Rob Tibshirani

Jerome H. Friedman, Trevor Hastie, and Rob Tibshirani. Regularization paths for generalized linear models via coordinate descent.Journal of Statistical Software, 33(1):1–22, 2010

2010
[40]

Distributed opti- mization and statistical learning via the alternating direction method of multipliers.Foundations and Trends in Machine Learning, 3(1):1–122, 2011

Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, and Jonathan Eckstein. Distributed opti- mization and statistical learning via the alternating direction method of multipliers.Foundations and Trends in Machine Learning, 3(1):1–122, 2011

2011
[41]

An iterative thresholding algorithm for linear inverse problems with a sparsity constraint.Communications on Pure and Applied Mathematics, 57(11):1413–1457, 2004

Ingrid Daubechies, Michel; Defrise, and Christine De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint.Communications on Pure and Applied Mathematics, 57(11):1413–1457, 2004

2004
[42]

A fast iterative shrinkage-thresholding algorithm for linear inverse problems.SIAM Journal on Imaging Sciences, 2(1):183–202, 2009

Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems.SIAM Journal on Imaging Sciences, 2(1):183–202, 2009

2009
[43]

Nemirovsky and David B

Arkadi S. Nemirovsky and David B. Yudin.Problem Complexity and Method Efficiency in Optimization. Wiley, 1983

1983
[44]

Kluwer Academic publishers, 2004

Yurii Nesterov.Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic publishers, 2004

2004
[45]

Learning fast approximations of sparse coding

Karol Gregor and Yann LeCun. Learning fast approximations of sparse coding. InProceedings of the 27th International Conference on Machine Learning, pages 399–406, 2010

2010
[46]

Efficient and scalable estimation of tool representations in vector space

Suhong Moon, Siddharth Jha, Lutfi Eren Erdogan, Sehoon Kim, Woosang Lim, Kurt Keutzer, and Amir Gholami. Efficient and scalable estimation of tool representations in vector space. arXiv preprint arXiv:2409.02141, 2024

arXiv 2024
[47]

MultiHop-RAG: Benchmarking retrieval-augmented generation for multi-hop queries

Yixuan Tang and Yi Yang. MultiHop-RAG: Benchmarking retrieval-augmented generation for multi-hop queries. InProceedings of the First Conference on Language Modeling, 2024

2024
[48]

Representation learning with contrastive predictive coding.arXiv preprint arXiv:1807.03748, 2018

Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding.arXiv preprint arXiv:1807.03748, 2018

Pith/arXiv arXiv 2018
[49]

Fast lasso algorithm via selective coordinate descent

Yasuhiro Fujiwara, Yasutoshi Ida, Hiroaki Shiokawa, and Sotetsu Iwamura. Fast lasso algorithm via selective coordinate descent. InProceedings of the Thirtieth AAAI Conference on Artificial Intelligence, page 1561–1567, 2016. 12

2016
[50]

Fast block coordinate descent for non-convex group regularizations

Yasutoshi Ida, Sekitoshi Kanai, and Atsutoshi Kumagai. Fast block coordinate descent for non-convex group regularizations. InProceedings of The 26th International Conference on Artificial Intelligence and Statistics, volume 206 ofProceedings of Machine Learning Research, pages 2481–2493, 2023

2023
[51]

Fast iterative hard thresholding methods with pruning gradient computations

Yasutoshi Ida, Sekitoshi Kanai, Atsutoshi Kumagai, Tomoharu Iwata, and Yasuhiro Fujiwara. Fast iterative hard thresholding methods with pruning gradient computations. InAdvances in Neural Information Processing Systems, volume 37, pages 52836–52857, 2024

2024
[52]

Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs

Yu A. Malkov and D. A. Yashunin. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(4):824–836, 2020

2020
[53]

Accelerating large-scale inference with anisotropic vector quantization

Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. Accelerating large-scale inference with anisotropic vector quantization. In Proceedings of the 37th International Conference on Machine Learning, pages 3887–3896, 2020

2020
[54]

The faiss library

Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. The faiss library. arXiv preprint arXiv:2401.08281, 2024

Pith/arXiv arXiv 2024
[55]

Gaussian error linear units (gelus)

Dan Hendrycks and Kevin Gimpel. Gaussian error linear units (gelus). arXiv preprint arXiv:1606.08415, 2016

A Proofs

Setup. Let $U = [u_1, \dots, u_N] \in \mathbb{R}^{d \times N}$ be a matrix of unit vectors and $S \subseteq [N]$ with $|S| = k \ge 1$. Define the correlation-gap region

$$\Phi_{\mathrm{DR}}(U, S) = \left\{ v \in \mathbb{S}^{d-1} : \max_{j \in S^c} u_j^\top v < \min_{i \in S} u_i^\top v \ \text{and} \ \min_{i \in S} u_i^\top v > 0 \right\},$$

and let $\Phi_{\mathrm{NNN}}(U, S)$ be the set of uni...

Pith/arXiv arXiv 2016
[56]


The inactive condition (10) for $j = 1$ gives

$$u_1^\top v - u_1^\top U_S w_S^\star = \frac{2}{3} - \frac{1}{\sqrt{2}} \left( \frac{2\sqrt{2}}{3} - \lambda_1 \right) = \frac{\lambda_1}{\sqrt{2}} < \lambda_1, \tag{20}$$

so the full KKT system is satisfied with $\mathrm{supp}(w^\star) = S$, giving $v \in \Phi_{\mathrm{NNN}}(U, S)$.

B Reproducibility Information

B.1 Datasets

We evaluate on five benchmarks. The NumpyBank, PandasBank, and AWSBank datasets in ToolBank ship with their own train / validation ...
