DHEN: A Deep and Hierarchical Ensemble Network for Large-Scale Click-Through Rate Prediction
11 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
roles
background 3
citation-polarity summary
polarities
background 3
representative citing papers
Releases TencentGR-1M and TencentGR-10M datasets with baselines for all-modality generative recommendation in advertising, including weighted evaluation for conversions.
LoKA enables practical FP8 use in numerically sensitive large recommendation models via online profiling of activations, reusable model modifications for stability, and dynamic kernel dispatching.
A jointly learned hierarchical index with cross-attention and residual quantization scales exact retrieval in foundational recommendation models, deployed at Meta with additional performance from test-time training on index nodes.
FLAME condenses ensemble diversity into a single network via modular ensemble simulation and guided mutual learning during training, delivering ensemble-level performance with single-network inference speed on sequential recommendation tasks.
PyTorch Fully Sharded Data Parallel enables training of significantly larger models than Distributed Data Parallel with comparable speed and near-linear TFLOPS scaling.
GR2 applies mid-training on semantic IDs, reasoning distillation, RL with conditional verifiable rewards, and a context compressor to re-ranking in industrial recsys, reporting +18.7% R@1 over baselines.
DeMix diagnoses mixed error types in training data via influence-vector-based multi-label classification with an intervention strategy, reporting 22.61% F1 gain and 9.32% downstream improvement on 11 tasks.
Memento applies personalized RAG-style retrieval to long user history for Meta ads models, delivering 5-10x efficiency, sub-10ms latency, and 1% CTR / 1.2% CVR lifts in production.
Empirical scaling of backbone, embeddings, and data shows largely independent additive gains, enabling a deployed model with 2.5x data and 8x compute that delivers +2.6% CVR improvement with minimal latency change.
citing papers explorer
- LoopCTR: Unlocking the Loop Scaling Power for Click-Through Rate Prediction
  LoopCTR trains CTR models with recursive layer reuse and process supervision so that zero-loop inference outperforms baselines on public and industrial datasets.
- LoKA: Low-precision Kernel Applications for Recommendation Models At Scale
  LoKA enables practical FP8 use in numerically sensitive large recommendation models via online profiling of activations, reusable model modifications for stability, and dynamic kernel dispatching.
- Efficient Retrieval Scaling with Hierarchical Indexing for Large Scale Recommendation
  A jointly learned hierarchical index with cross-attention and residual quantization scales exact retrieval in foundational recommendation models, deployed at Meta with additional performance from test-time training on index nodes.
- PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel
  PyTorch Fully Sharded Data Parallel enables training of significantly larger models than Distributed Data Parallel with comparable speed and near-linear TFLOPS scaling.
- GR2 Technical Report
  GR2 applies mid-training on semantic IDs, reasoning distillation, RL with conditional verifiable rewards, and a context compressor to re-ranking in industrial recsys, reporting +18.7% R@1 over baselines.
- DeMix: Debugging Training Data with Mixed Data Error Types by Investigating Influence Vectors
  DeMix diagnoses mixed error types in training data via influence-vector-based multi-label classification with an intervention strategy, reporting 22.61% F1 gain and 9.32% downstream improvement on 11 tasks.
- Memento: Personalized RAG-Style Long-Retention Data Scaling for META Ads Recommendation
  Memento applies personalized RAG-style retrieval to long user history for Meta ads models, delivering 5-10x efficiency, sub-10ms latency, and 1% CTR / 1.2% CVR lifts in production.
- On the Practice of Scaling Search Conversion Rate Prediction
  Empirical scaling of backbone, embeddings, and data shows largely independent additive gains, enabling a deployed model with 2.5x data and 8x compute that delivers +2.6% CVR improvement with minimal latency change.