TwinBERT: Distilling knowledge to twin-structured compressed BERT models for large-scale retrieval

Wenhao Lu, Jian Jiao, Ruofei Zhang · 2020 · arXiv 0531.341274

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Tencent Advertising Algorithm Challenge 2025: All-Modality Generative Recommendation

cs.IR · 2026-04-04 · accept · novelty 7.0

Releases TencentGR-1M and TencentGR-10M datasets with baselines for all-modality generative recommendation in advertising, including weighted evaluation for conversions.

Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations

cs.LG · 2024-02-27 · unverdicted · novelty 7.0

HSTU-based generative recommenders with 1.5 trillion parameters scale as a power law with compute up to GPT-3 scale, outperform baselines by up to 65.8% NDCG, run 5-15x faster than FlashAttention2 on long sequences, and improve online A/B metrics by 12.4%.

HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLMs in Sponsored Search Retrieval

cs.IR · 2026-05-22 · unverdicted · novelty 4.0

HARNESS-LM uses teacher fine-tuning, L2 query alignment, and contrastive refinement to distill large SLM retrievers into compact models that recover 98% precision with up to 27x lower latency on Bing Ads benchmarks.

citing papers explorer

Showing 3 of 3 citing papers.

Tencent Advertising Algorithm Challenge 2025: All-Modality Generative Recommendation

cs.IR · 2026-04-04 · accept · none · ref 47

Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations

cs.LG · 2024-02-27 · unverdicted · none · ref 24

HARNESS-LM: A Three-Phase Training Recipe for Harnessing SLMs in Sponsored Search Retrieval

cs.IR · 2026-05-22 · unverdicted · none · ref 15
