Non-negative Elastic Net Decoding for Information Retrieval
Pith reviewed 2026-06-26 22:21 UTC · model grok-4.3
The pith
Non-Negative elastic Net (NNN) decoding handles every query that dense retrieval handles, plus additional queries when the corpus contains correlated documents.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NNN decoding selects documents whose embeddings jointly reconstruct the query embedding as a sparse non-negative linear combination. For any corpus, every query correctly handled by dense retrieval is also handled by NNN decoding, while on corpora containing correlated documents, NNN decoding additionally handles queries that dense retrieval cannot.
What carries the argument
Non-negative elastic net decoding: the selection of a sparse non-negative linear combination of document embeddings that reconstructs the query embedding.
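The selection step can be sketched in a few lines of pure Python. This is a minimal illustration, not the paper's implementation: the objective scaling (0.5 times the squared reconstruction error, plus `lam1` times the sum of coefficients, plus 0.5 times `lam2` times their squared sum), the coordinate-descent solver, the toy 2-D corpus, and the names `nn_elastic_net` and `dense_top_k` are all assumptions made for this example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def nn_elastic_net(docs, q, lam1, lam2, sweeps=2000):
    """Cyclic coordinate descent for the ASSUMED objective
    0.5*||q - sum_j a_j*d_j||^2 + lam1*sum_j a_j + 0.5*lam2*sum_j a_j^2, a_j >= 0."""
    alpha = [0.0] * len(docs)
    residual = list(q)  # invariant: residual == q - sum_j alpha[j] * docs[j]
    for _ in range(sweeps):
        for j, d in enumerate(docs):
            sq_norm = dot(d, d)
            # Correlation of d_j with the residual that leaves out d_j's own term.
            rho = dot(d, residual) + alpha[j] * sq_norm
            new = max(0.0, (rho - lam1) / (sq_norm + lam2))
            step = new - alpha[j]
            if step != 0.0:
                residual = [r - step * x for r, x in zip(residual, d)]
                alpha[j] = new
    return alpha


def dense_top_k(docs, q, k):
    """Baseline: rank documents by inner product with the query."""
    scores = [dot(d, q) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])[:k]


# Hypothetical toy corpus: documents 0 and 1 are strongly correlated (cosine 0.96).
docs = [(1.0, 0.0), (0.96, 0.28), (0.0, 1.0)]
q = (0.8, 0.6)

dense = dense_top_k(docs, q, 2)
alpha = nn_elastic_net(docs, q, lam1=0.05, lam2=0.001)
nnn_support = [j for j, a in enumerate(alpha) if a > 1e-9]
print("dense top-2:", dense)
print("NNN support:", nnn_support)
```

On this toy corpus the inner-product ranking returns the two correlated documents (1 and 0), while the sparse non-negative fit puts its weight on documents 1 and 2, which together reconstruct the query. The outcome depends on the penalty values: a larger `lam2` spreads weight across correlated documents and can bring document 0 back into the support.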
If this is right
- NNN decoding applied to frozen inner-product embeddings yields consistent improvements on several retrieval benchmarks.
- End-to-end training that optimizes embeddings for NNN decoding produces significant performance gains over dense retrieval in all metrics.
- Retrieval results become less redundant because documents are chosen jointly with regard to the rest of the corpus.
Where Pith is reading between the lines
- NNN decoding could be applied after any existing embedding model without retraining to increase result diversity.
- The reconstruction view may extend to tasks such as passage re-ranking or multi-hop retrieval where corpus context matters.
- If the cone-spanning condition fails for some queries, hybrid methods that fall back to inner-product scoring could be needed.
Load-bearing premise
The query embedding lies in the cone spanned by the document embeddings such that a sparse non-negative combination is both feasible and identifies relevant documents.
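Stated formally, the premise is a cone-membership condition. The notation below is an assumption for illustration ($D$ is the $d \times N$ matrix whose columns are the document embeddings, $q$ the query embedding); the second part is the standard Farkas alternative, which says what a certificate of failure would look like.

```latex
% Cone spanned by the document embeddings (columns of D)
\operatorname{cone}(D) = \{\, D\alpha : \alpha \in \mathbb{R}^{N}_{\ge 0} \,\}

% Farkas' lemma: exactly one of the following holds
\text{(i)}\quad \exists\, \alpha \ge 0 \ \text{such that}\ D\alpha = q
\qquad
\text{(ii)}\quad \exists\, h \in \mathbb{R}^{d} \ \text{such that}\ D^{\top} h \ge 0 \ \text{and}\ h^{\top} q < 0
```

In case (ii) the vector $h$ defines a hyperplane that separates the query from every non-negative combination of documents, so exact reconstruction is impossible however the coefficients are chosen.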
What would settle it
A query for which the highest inner-product document is the correct answer but no sparse non-negative linear combination of document embeddings reconstructs the query embedding.
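A small probe shows what such a test could look like on a hand-built case. This is a sketch under assumptions: the two orthogonal documents and the query are hypothetical, exact reconstruction is checked by setting both penalties to zero (plain non-negative least squares), and the penalized objective is the same assumed form as in any elastic-net sketch, not necessarily the paper's.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def nn_elastic_net(docs, q, lam1, lam2, sweeps=200):
    """Coordinate descent for 0.5*||q - D a||^2 + lam1*sum(a) + 0.5*lam2*||a||^2, a >= 0.
    With lam1 = lam2 = 0 this reduces to non-negative least squares."""
    alpha = [0.0] * len(docs)
    residual = list(q)
    for _ in range(sweeps):
        for j, d in enumerate(docs):
            sq_norm = dot(d, d)
            rho = dot(d, residual) + alpha[j] * sq_norm
            new = max(0.0, (rho - lam1) / (sq_norm + lam2))
            step = new - alpha[j]
            if step != 0.0:
                residual = [r - step * x for r, x in zip(residual, d)]
                alpha[j] = new
    return alpha


def residual_norm(docs, q, alpha):
    recon = [sum(a * d[i] for a, d in zip(alpha, docs)) for i in range(len(q))]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(q, recon)))


# Hypothetical case: document 0 clearly wins on inner product,
# but the query has a negative second coordinate.
docs = [(1.0, 0.0), (0.0, 1.0)]
q = (0.8, -0.6)

scores = [dot(d, q) for d in docs]
top1 = max(range(len(docs)), key=scores.__getitem__)

exact = nn_elastic_net(docs, q, 0.0, 0.0)           # best non-negative fit
gap = residual_norm(docs, q, exact)                 # > 0 means q is outside the cone

penalized = nn_elastic_net(docs, q, 0.05, 0.001)    # approximate reconstruction
support = [j for j, a in enumerate(penalized) if a > 0.0]
print(top1, round(gap, 6), support)
```

Here the best non-negative fit leaves a residual, so no exact non-negative reconstruction exists, yet the penalized problem still has a solution whose support is the top inner-product document. Whether a case like this counts against the claim therefore depends on whether the paper defines success through exact or through penalized reconstruction, which this page does not state.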
Original abstract
Dense retrieval has become the dominant paradigm in information retrieval, in which each document is scored against a query by the inner product of their vector embeddings, and the top-$k$ documents by score are retrieved for this query. However, since each document's score depends solely on the embedding of the query and itself, the retrieval process is oblivious to the content of the entire corpus. Therefore, dense retrieval cannot avoid selecting semantically similar documents from the corpus, which may result in a non-diverse, redundant set of retrieved documents. To this end, we approach retrieval as a joint decoding problem, in which documents are selected as a set with regard to the context of the rest of the corpus. To achieve this, we propose Non-Negative elastic Net (NNN) decoding, which selects documents whose embeddings jointly reconstruct the query embedding as a sparse non-negative linear combination. Our main theoretical result establishes a strict separation between dense retrieval and NNN decoding. For any corpus, every query correctly handled by dense retrieval is also handled by NNN decoding, while on corpora containing correlated documents, NNN decoding additionally handles queries that dense retrieval cannot. Experimental results indicate that applying NNN decoding to frozen embeddings trained for inner-product scoring yields consistent improvements across several benchmarks. Moreover, we introduce an end-to-end training procedure which optimizes the embeddings for NNN decoding, producing significant performance gains surpassing in all metrics and benchmarks compared to dense retrieval. Our work establishes a new paradigm for leveraging dense embeddings in information retrieval, beyond the standard practice of inner-product scoring.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Non-Negative Elastic Net (NNN) decoding as an alternative to dense retrieval: documents are selected by solving for a sparse non-negative coefficient vector α such that the document embedding matrix D satisfies Dα ≈ q (the query embedding). It claims a strict separation theorem: for any corpus, NNN handles every query that dense retrieval correctly handles, and additionally handles queries on corpora with correlated documents that dense retrieval cannot. Experiments report consistent gains when applying NNN to frozen inner-product embeddings and larger gains from an end-to-end training procedure that optimizes embeddings directly for the NNN objective.
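For reference, the two decoding rules in this summary can be written side by side. The symbols $D$, $\alpha$, and $q$ follow the summary; the penalty weights $\lambda_1, \lambda_2$ and the exact scaling are assumptions based on the standard elastic-net parameterization, since this page does not reproduce the paper's objective.

```latex
% Dense retrieval: score each document independently, keep the k best
\mathrm{DR}_k(q) = \operatorname*{top\text{-}k}_{i \in [N]} \ \langle q, d_i \rangle

% NNN decoding (assumed form): joint sparse non-negative reconstruction
\hat{\alpha} = \operatorname*{arg\,min}_{\alpha \ge 0} \
  \tfrac{1}{2} \lVert q - D\alpha \rVert_2^2
  + \lambda_1 \lVert \alpha \rVert_1
  + \tfrac{\lambda_2}{2} \lVert \alpha \rVert_2^2,
\qquad
\mathrm{NNN}(q) = \operatorname{supp}(\hat{\alpha})
```

The contrast is that each dense score depends on one document only, whereas every coefficient $\hat{\alpha}_i$ depends on all columns of $D$ through the shared residual $q - D\alpha$.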
Significance. If the separation theorem is rigorously established and the experimental protocol is reproducible, the work would introduce a new decoding paradigm that incorporates corpus-wide context via non-negative sparse reconstruction, potentially improving diversity and handling of redundant documents. The end-to-end training procedure is a concrete strength that could be adopted more broadly.
Major comments (3)
- [Abstract] Abstract / main theoretical result: The strict separation claim (every dense-retrieval success is an NNN success, plus additional successes on correlated corpora) is load-bearing, yet the abstract states the result without a proof sketch, without defining 'correctly handled' for NNN when no feasible non-negative α exists, and without addressing whether dense-retrieval success (maximizing ⟨q, d_i⟩) implies q lies in the cone spanned by the columns of D. The skeptic correctly notes that if a counter-example corpus and query exist where the relevant document maximizes the inner product but q ∉ cone(D), the claimed inclusion fails; this must be resolved with an explicit argument or counter-example analysis.
- [Abstract] Formulation of NNN decoding: The optimization is feasible only when q lies in the non-negative cone of D. The manuscript provides no definition or fallback procedure for the case when the elastic-net problem is infeasible, nor any demonstration that dense-retrieval success guarantees cone membership. This directly affects whether the separation theorem can hold for arbitrary embeddings.
- [Abstract] Experimental protocol: The abstract reports 'consistent improvements' and 'significant performance gains' but supplies no information on the elastic-net solver, how feasibility is handled, how baselines are implemented, or how the reported metrics are computed. Without these details the experimental claims cannot be verified or reproduced.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment point by point below, indicating planned revisions to the abstract where appropriate.
Point-by-point responses
-
Referee: [Abstract] Abstract / main theoretical result: The strict separation claim (every dense-retrieval success is an NNN success, plus additional successes on correlated corpora) is load-bearing, yet the abstract states the result without a proof sketch, without defining 'correctly handled' for NNN when no feasible non-negative α exists, and without addressing whether dense-retrieval success (maximizing ⟨q, d_i⟩) implies q lies in the cone spanned by the columns of D. The skeptic correctly notes that if a counter-example corpus and query exist where the relevant document maximizes the inner product but q ∉ cone(D), the claimed inclusion fails; this must be resolved with an explicit argument or counter-example analysis.
Authors: The full manuscript contains the formal theorem, definitions, and complete proof in Section 3. 'Correctly handled' by dense retrieval means the relevant document achieves strictly maximum inner product. For NNN it means the optimization admits a feasible non-negative α whose support contains the relevant document. The proof shows dense success implies cone membership via the optimality conditions of the inner-product maximizer and constructs an explicit non-negative combination; it proceeds by contradiction to rule out separating hyperplanes. No counter-example exists. We will add a concise proof sketch and the definitions of 'correctly handled' to the abstract. revision: yes
-
Referee: [Abstract] Formulation of NNN decoding: The optimization is feasible only when q lies in the non-negative cone of D. The manuscript provides no definition or fallback procedure for the case when the elastic-net problem is infeasible, nor any demonstration that dense-retrieval success guarantees cone membership. This directly affects whether the separation theorem can hold for arbitrary embeddings.
Authors: We agree the abstract omits these clarifications. Section 3 proves that dense-retrieval success guarantees cone membership, so infeasibility does not arise for queries correctly handled by dense retrieval. When infeasibility occurs for other queries the procedure falls back to the maximum-inner-product document. We will revise the abstract to include this definition and fallback note. revision: yes
-
Referee: [Abstract] Experimental protocol: The abstract reports 'consistent improvements' and 'significant performance gains' but supplies no information on the elastic-net solver, how feasibility is handled, how baselines are implemented, or how the reported metrics are computed. Without these details the experimental claims cannot be verified or reproduced.
Authors: All requested details (solver, feasibility handling, baseline implementations, and metric computation) appear in Sections 4 and 5. We will add one sentence to the abstract summarizing the protocol and pointing to those sections. revision: yes
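The fallback mentioned in the second response (use the maximum-inner-product document when reconstruction is infeasible) could be wired up roughly as follows. This is a hedged sketch: the feasibility test (residual of a non-negative least-squares fit compared with a tolerance `tol`), the function name `decode_with_fallback`, the penalty values, and the toy corpus are all hypothetical, and the solver assumes the usual elastic-net scaling.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(docs, q, lam1, lam2, sweeps=2000):
    """Coordinate descent for 0.5*||q - D a||^2 + lam1*sum(a) + 0.5*lam2*||a||^2, a >= 0."""
    alpha = [0.0] * len(docs)
    residual = list(q)
    for _ in range(sweeps):
        for j, d in enumerate(docs):
            sq_norm = dot(d, d)
            rho = dot(d, residual) + alpha[j] * sq_norm
            new = max(0.0, (rho - lam1) / (sq_norm + lam2))
            step = new - alpha[j]
            if step != 0.0:
                residual = [r - step * x for r, x in zip(residual, d)]
                alpha[j] = new
    return alpha


def residual_norm(docs, q, alpha):
    recon = [sum(a * d[i] for a, d in zip(alpha, docs)) for i in range(len(q))]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(q, recon)))


def decode_with_fallback(docs, q, lam1=0.05, lam2=0.001, tol=1e-6):
    """Return (ranked document indices, used_fallback)."""
    # Feasibility probe (assumption): q is treated as inside the cone when the
    # best non-negative least-squares fit reproduces it up to tol.
    fit = solve(docs, q, 0.0, 0.0)
    if residual_norm(docs, q, fit) > tol:
        scores = [dot(d, q) for d in docs]
        best = max(range(len(docs)), key=scores.__getitem__)
        return [best], True
    alpha = solve(docs, q, lam1, lam2)
    active = [j for j, a in enumerate(alpha) if a > 0.0]
    return sorted(active, key=lambda j: -alpha[j]), False


# Hypothetical corpus whose cone covers directions between the two documents.
docs = [(1.0, 0.0), (0.6, 0.8)]
print(decode_with_fallback(docs, (0.0, 1.0)))  # query outside the cone
print(decode_with_fallback(docs, (0.7, 0.6)))  # query inside the cone
```

The first query lies outside the cone, so the sketch returns the single best inner-product document and flags the fallback. The second query is a non-negative combination of both documents, so the support is returned ranked by coefficient.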
Circularity Check
No circularity: theoretical separation derived from method definitions without reduction to fitted inputs or self-citations.
Full rationale
The paper defines NNN decoding directly as non-negative elastic-net reconstruction (Dα ≈ q with α ≥ 0 sparse) and contrasts it with dense retrieval via inner products. The central claim of strict separation (NNN handles all dense-correct queries plus more on correlated corpora) is presented as a mathematical consequence of these definitions rather than a statistical prediction or self-referential fit. No load-bearing self-citations, ansatzes smuggled via prior work, or renaming of known results appear in the provided text. The derivation chain is self-contained against the stated optimization and scoring rules, with the reader's noted score of 2 reflecting only minor definitional assumptions rather than circular reduction.
A Proofs

Setup. Let $U = [u_1, \dots, u_N] \in \mathbb{R}^{d \times N}$ be a matrix of unit vectors and $S \subseteq [N]$ with $|S| = k \ge 1$. Define the correlation-gap region

$$\Phi_{\mathrm{DR}}(U, S) = \Bigl\{ v \in \mathbb{S}^{d-1} : \max_{j \in S^c} u_j^\top v < \min_{i \in S} u_i^\top v \ \text{and}\ \min_{i \in S} u_i^\top v > 0 \Bigr\},$$

and let $\Phi_{\mathrm{NNN}}(U, S)$ be the set of uni...
The inactive condition (10) for $j = 1$ gives

$$u_1^\top v - u_1^\top U_S w_S^\star = \frac{2}{3} - \frac{1}{\sqrt{2}} \left( \frac{2\sqrt{2}}{3} - \lambda_1 \right) = \frac{\lambda_1}{\sqrt{2}} < \lambda_1, \tag{20}$$

so the full KKT system is satisfied with $\operatorname{supp}(w^\star) = S$, giving $v \in \Phi_{\mathrm{NNN}}(U, S)$.

B Reproducibility Information

B.1 Datasets

We evaluate on five benchmarks. The NumpyBank, PandasBank, and AWSBank datasets in ToolBank ship with their own train / validation ...