Wukong: Towards a scaling law for large-scale recommendation
18 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- LoopCTR: Unlocking the Loop Scaling Power for Click-Through Rate Prediction
LoopCTR trains CTR models with recursive layer reuse and process supervision so that zero-loop inference outperforms baselines on public and industrial datasets.
- TokenFormer: Unify the Multi-Field and Sequential Recommendation Worlds
TokenFormer unifies multi-field and sequential recommendation modeling via bottom-full-top-sliding attention and non-linear interaction representations to avoid sequential collapse and deliver state-of-the-art performance.
- IAT: Instance-As-Token Compression for Historical User Sequence Modeling in Industrial Recommender Systems
IAT compresses each historical interaction instance into a unified embedding token via temporal-order or user-order schemes, allowing standard sequence models to learn long-range preferences with better performance and transferability.
- Compute Only Once: UG-Separation for Efficient Large Recommendation Models
The UG-Separation framework disentangles user-side and item-side flows in TokenMixer dense-interaction models to enable reusable user computations, cutting inference latency by up to 20% in ByteDance production scenarios.
- FLUID: From Ephemeral IDs to Multimodal Semantic Codes for Industrial-Scale Livestreaming Recommendation
FLUID introduces LUCID semantic codes from a multimodal encoder to retire item IDs in livestreaming rankers, with staged warmup yielding online gains of +0.55% watch duration and +2.05% cold-start views.
- LoKA: Low-precision Kernel Applications for Recommendation Models At Scale
LoKA enables practical FP8 use in numerically sensitive large recommendation models via online profiling of activations, reusable model modifications for stability, and dynamic kernel dispatching.
- Understanding DNNs in Feature Interaction Models: A Dimensional Collapse Perspective
DNNs mitigate dimensional collapse of embeddings in feature interaction models, shown via parallel and stacked experiments plus gradient analysis.
- When Less is More: The LLM Scaling Paradox in Context Compression
Larger LLM compressors in lossy setups often yield less faithful context reconstructions due to knowledge overwriting and semantic drift, with mid-sized models outperforming larger ones across 27 tested configurations.
- SilverTorch: A Unified Model-based System to Democratize Large-Scale Recommendation on GPUs
SilverTorch replaces standalone ANN indexing and filtering with a unified GPU model using a model-based Bloom index and fused Int8 ANN kernel, delivering up to 23.7x throughput and 13.35x cost efficiency gains on industry data.
- PACEvolve++: Improving Test-time Learning for Evolutionary Search Agents
PACEvolve++ uses a phase-adaptive reinforcement learning advisor to decouple hypothesis selection from execution in LLM-driven evolutionary search, delivering faster convergence than prior frameworks on load balancing, recommendation, and protein tasks.
- Harmonizing Generative Retrieval and Ranking in Chain-of-Recommendation
RecoChain unifies generative candidate generation via hierarchical semantic IDs and SIM-based ranking in a single Transformer to improve top-K recommendation performance.
- Sample Is Feature: Beyond Item-Level, Toward Sample-Level Tokens for Unified Large Recommender Models
SIF encodes entire historical raw samples as tokens via hierarchical group-adaptive quantization and token/sample-level mixing to overcome partial encoding and feature heterogeneity limits in scaled recommender models.
- Beyond Dense Connectivity: Explicit Sparsity for Scalable Recommendation
SSR uses static random filters and iterative competitive sparse mechanisms to explicitly enforce sparsity in recommendation models, outperforming dense baselines on public and billion-scale industrial datasets.
- Revisiting Content-Based Music Recommendation: Efficient Feature Aggregation from Large-Scale Music Models
TASTE dataset and MuQ-token aggregation enable effective use of audio features from large music models to improve content-based music recommendations over collaborative filtering alone.
- CMSL: Constructive Multi-Sequence Learning for Recommendation Systems
CMSL uses a learnable module to disentangle user history into multiple pure sequences modeled with linear attention to improve recommendation performance over single-sequence approaches.
- FreeScale: Distributed Training for Sequence Recommendation Models with Minimal Scaling Cost
FreeScale reduces computational bubbles by up to 90.3% in distributed training of sequence recommendation models on 256 H100 GPUs via load balancing, prioritized embedding overlap, and SM-Free communication.
- Joint Model Parameter Scaling and Universal-Domain Data Integration for E-commerce Search Ranking
UniScale couples entire-space data construction with a hierarchical fusion transformer to improve scaling behavior and deliver 1.70% purchase and 2.04% GMV lifts in large-scale e-commerce search A/B tests.
- SOLARIS: Speculative Offloading of Latent-bAsed Representation for Inference Scaling