{"total":15,"items":[{"citing_arxiv_id":"2606.26398","ref_index":38,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"DinoLink: A Token-Centric Representation Compression Framework for Bandwidth-Constrained Collaborative V2X Perception","primary_cat":"cs.CV","submitted_at":"2026-06-24T21:31:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"DinoLink uses saliency-aware token pruning plus residual vector quantization to cut V2X bitrate by 139x while reporting 32.8% mAP on nuScenes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.21033","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MoECodec: Image Compression for joint human and machine perception via Mixture-of-Experts","primary_cat":"eess.IV","submitted_at":"2026-06-19T01:56:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MoECodec replaces FFN layers with token-wise MoE plus stable routing and GShMLP experts to support multiple downstream tasks in a single image compression model.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.19574","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"FrequencyFormer: A Co-Designed Sensor-to-Processor Pipeline for Frequency-Domain Vision Transformer Inference","primary_cat":"eess.IV","submitted_at":"2026-06-17T20:28:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"FrequencyFormer co-designs a multi-scale DCT tokenizer, LUT-based near-sensor hardware, and modified MIPI communication to enable frequency-domain ViT inference with up to 128x data reduction and 230x lower communication 
energy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.11631","ref_index":36,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Benchmarking Neural Speech Compression from a Rate-Distortion Perspective","primary_cat":"eess.AS","submitted_at":"2026-06-10T03:49:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ECC integrates hyperprior side information, channel-wise context, latent residual prediction, temporal modeling, and entropy skip into a learned entropy model, yielding 39.9% and 76.3% average BD-rate reductions on ViSQOL and PESQ over baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.10450","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Few-step Generative Models as Lossy Compression","primary_cat":"cs.CV","submitted_at":"2026-06-09T05:56:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Few-step generative models can be reformulated as lossy codecs in the reverse channel coding framework without retraining, yielding faster encoding/decoding on low-resolution image benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.02385","ref_index":281,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"How Optimality Structures Sparse Dictionaries: A Theory for Understanding SAE Representations","primary_cat":"q-bio.NC","submitted_at":"2026-06-01T15:34:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Derives optimality constraints for nonnegative joint dictionary learning that explain observed SAE behaviors such as feature splitting, 
absorption, and dense antipodal features.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05148","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"What Matters in Practical Learned Image Compression","primary_cat":"cs.CV","submitted_at":"2026-05-06T17:17:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A practical learned image codec delivers 2.3-3x bitrate savings over AV1/VVC and 20-40% over prior learned codecs while encoding 12MP images in 230ms on iPhone.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.04560","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SAMIC: A Lightweight Semantic-Aware Mamba for Efficient Perceptual Image Compression","primary_cat":"cs.CV","submitted_at":"2026-05-06T07:03:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SAMIC introduces semantic-aware Mamba blocks and SVD-based redundancy reduction to achieve efficient perceptual image compression with improved rate-distortion-perception tradeoffs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.02849","ref_index":57,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Active Sampling for Ultra-Low-Bit-Rate Video Compression via Conditional Controlled Diffusion","primary_cat":"cs.CV","submitted_at":"2026-05-04T17:25:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ActDiff-VC partitions video into segments, transmits adaptive keyframes and budget-aware point trajectories, and reconstructs frames via conditional diffusion, reporting up to 64.6% bitrate 
reduction at matched NIQE on UVG and MCL-JCV.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.25330","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Generalizable 3D Gaussian Splatting enabled Semantic Coding for Real-Time Immersive Video Communications","primary_cat":"eess.IV","submitted_at":"2026-04-28T07:46:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"GS-SCNet unifies 3D Gaussian Splatting with a disparity-guided semantic codec and direct Gaussian parameter prediction for efficient real-time 3D video communications with strong generalization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.21984","ref_index":33,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Soft Anisotropic Diagrams for Differentiable Image Representation","primary_cat":"cs.CV","submitted_at":"2026-04-23T18:07:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SAD is a new explicit differentiable image representation based on soft anisotropic additively weighted Voronoi partitions that achieves higher PSNR and 4-19x faster training than Image-GS and Instant-NGP at matched bitrate.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.12625","ref_index":2,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Neural Dynamic GI: Random-Access Neural Compression for Temporal Lightmaps in Dynamic Lighting Environments","primary_cat":"cs.GR","submitted_at":"2026-04-14T11:52:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"NDGI compresses temporal lightmaps via neural feature maps 
and lightweight networks, delivering high-quality dynamic global illumination with low storage and modest real-time decompression cost.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"method supports random access, enabling on demand decoding of individual regions without processing the rest. 3.4. Feature Map Compression As shown in Figure 3, we adopt two strategies to further compress the four feature maps. For F^{2D}_{ut} and F^{2D}_{vt}, we apply 8-bit post-training quantization and emulate quantization during training by injecting uniform noise [2, 27]. Concretely, sampling at (u, v, t) yields latent vectors V_{ut} and V_{vt}, to which we add noise as follows: V′_{ut} = V_{ut} + U(−0.5, 0.5)·α_{ut}, V′_{vt} = V_{vt} + U(−0.5, 0.5)·α_{vt}. (6) Here, α_{ut} and α_{vt} are set to be 1/256 in all our experiments, which corresponds to the 8-bit representation. For F^{3D}_{uvt} and F^{2D}_{uv}, unlike F^{2D}_{ut} and F^{2D}_{vt}, these two feature maps are"},{"citing_arxiv_id":"2604.10546","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Differentiable Vector Quantization for Rate-Distortion Optimization of Generative Image Compression","primary_cat":"cs.CV","submitted_at":"2026-04-12T09:25:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RDVQ enables joint rate-distortion optimization for vector-quantized generative image compression via differentiable codebook distribution relaxation and an autoregressive entropy model.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"at low bitrates with a lightweight architecture, achieving competitive or superior perceptual quality with significantly fewer parameters. Beyond empirical results, RDVQ introduces an entropy-constrained VQ formulation connecting representation learning and compression. 2. Related work 2.1. 
SQ vs VQ in image compression In transform coding-based lossy image compression [4, 5, 20, 22, 34, 40, 50, 58], quantization converts continuous latent features into discrete symbols for entropy coding, thereby critically shaping the rate-distortion (RD) trade-off. Most existing methods adopt Scalar Quantization (SQ) [9, 27, 38, 41, 42, 44, 49, 64, 73], which quantizes each latent element independently (e.g., via rounding) or formu-"},{"citing_arxiv_id":"2309.15505","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Finite Scalar Quantization: VQ-VAE Made Simple","primary_cat":"cs.CV","submitted_at":"2023-09-27T09:13:40+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Finite scalar quantization simplifies VQ-VAE latents by independently rounding a few dimensions to fixed levels, producing an equivalent-sized implicit codebook with competitive performance and no collapse.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.02665","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Blind Image Quality Assessment Using A Deep Bilinear Convolutional Neural Network","primary_cat":"eess.IV","submitted_at":"2019-07-05T03:35:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A bilinear CNN that fuses features from a distortion-type classifier and an image classifier achieves superior BIQA performance on both synthetic and authentic distortion databases.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}