Quantamination: Dynamic Quantization Leaks Your Data Across the Batch

Dynamic quantization creates side channels allowing partial or full recovery of other users' batched data in at least four popular ML frameworks.
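The batch-wide side channel can be illustrated with a minimal, hypothetical sketch (this is not the paper's actual attack; `quantize_batch` and all values below are invented for illustration): under dynamic absmax quantization, the scale factor is computed over the whole batch, so one user's quantized values shift with a co-batched user's inputs.

```python
import numpy as np

def quantize_batch(batch):
    # Dynamic absmax quantization: the scale is computed over the WHOLE
    # batch, so it depends on every user's data.
    scale = np.abs(batch).max() / 127.0
    q = np.round(batch / scale).astype(np.int8)
    return q, scale

victim = np.array([0.5, -0.3, 0.8])

# Same victim row, batched with two different rows from another user.
q1, s1 = quantize_batch(np.stack([victim, np.array([1.0, 0.0, 0.0])]))
q2, s2 = quantize_batch(np.stack([victim, np.array([4.0, 0.0, 0.0])]))

# The victim's int8 codes change even though the victim's input did not:
# the shared scale is an observable function of other rows in the batch.
print(s1 != s2)  # True
```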
6 Pith papers cite this work.
citing papers explorer
-
DharmaOCR: Specialized Small Language Models for Structured OCR that outperform Open-Source and Commercial Baselines
DharmaOCR models reach 0.925 and 0.911 extraction scores with 0.40% and 0.20% degeneration rates on a new benchmark covering printed, handwritten, and legal documents, outperforming open-source and commercial baselines via SFT plus DPO.
-
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.
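The decomposition the summary describes can be sketched in a few lines (illustrative NumPy only, not the bitsandbytes implementation; the threshold value and the function name are invented): feature dimensions containing outliers take a 16-bit path, and the rest go through int8 matmul with vector-wise scales.

```python
import numpy as np

def mixed_precision_matmul(X, W, outlier_threshold=6.0):
    # Simplified LLM.int8()-style sketch: columns of X with outliers stay
    # in 16-bit; remaining features use int8 with vector-wise scales.
    outlier_cols = np.abs(X).max(axis=0) > outlier_threshold

    # 16-bit path for the outlier feature dimensions.
    hi = X[:, outlier_cols].astype(np.float16) @ W[outlier_cols].astype(np.float16)

    # Int8 path: per-row scales for X, per-column scales for W.
    Xs = np.abs(X[:, ~outlier_cols]).max(axis=1, keepdims=True) / 127.0
    Ws = np.abs(W[~outlier_cols]).max(axis=0, keepdims=True) / 127.0
    Xq = np.round(X[:, ~outlier_cols] / Xs).astype(np.int8)
    Wq = np.round(W[~outlier_cols] / Ws).astype(np.int8)
    lo = (Xq.astype(np.int32) @ Wq.astype(np.int32)) * Xs * Ws

    return hi.astype(np.float32) + lo.astype(np.float32)
```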
-
QuIDE: Mastering the Quantized Intelligence Trade-off via Active Optimization
QuIDE defines the Intelligence Index I = (C × P) / log₂(T+1) as a unified score for the compression-accuracy-latency trade-off in quantized neural networks, with experiments showing task-dependent optimal bit widths.
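The index as stated in the summary can be computed directly; the C, P, and T values below are made-up numbers for illustration, not results from the paper.

```python
import math

def intelligence_index(C, P, T):
    # QuIDE's Intelligence Index I = (C * P) / log2(T + 1): compression
    # ratio C and accuracy P in the numerator, latency T penalized in
    # the denominator.
    return (C * P) / math.log2(T + 1)

# Hypothetical model: 4x compression, 0.92 accuracy, 3 ms latency.
print(intelligence_index(4.0, 0.92, 3.0))  # → 1.84
```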
-
Edge AI for Automotive Vulnerable Road User Safety: Deployable Detection via Knowledge Distillation
Knowledge distillation trains a 3.9x smaller YOLO student to retain 14.5% higher precision than direct training under INT8 quantization on BDD100K, exceeding the large teacher's FP32 precision while cutting false alarms.
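A generic distillation objective of the kind the summary refers to looks like the following (this is the standard soft/hard-label blend, not the paper's exact YOLO training loss, which the summary does not specify):

```python
import numpy as np

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Blend of soft cross-entropy against temperature-scaled teacher
    # outputs and hard cross-entropy against the ground-truth labels.
    def softmax(z, t=1.0):
        z = z / t
        e = np.exp(z - z.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    soft = -(p_teacher * log_p_student).sum(axis=-1).mean() * T * T
    hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]).mean()
    return alpha * soft + (1 - alpha) * hard
```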
-
Enhanced Privacy and Communication Efficiency in Non-IID Federated Learning with Adaptive Quantization and Differential Privacy
Adaptive bit-length schedulers plus Laplacian DP in non-IID FL reduce communicated data by up to 52.64% on MNIST and 45% on CIFAR-10 while keeping competitive accuracy and privacy.
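The quantize-plus-DP pipeline named in the summary can be sketched generically (illustrative only; the paper's adaptive bit-length scheduler is replaced here by a fixed `bits` argument, and every name below is invented): clip the client update, apply the Laplace mechanism, then communicate only small integers.

```python
import numpy as np

def private_quantized_update(grad, bits=4, epsilon=1.0, sensitivity=1.0, rng=None):
    # Clip the update, add Laplace noise for differential privacy, then
    # uniformly quantize to b bits so only small integers are sent.
    if rng is None:
        rng = np.random.default_rng()
    g = np.clip(grad, -sensitivity, sensitivity)
    g = g + rng.laplace(scale=sensitivity / epsilon, size=g.shape)
    levels = 2 ** bits - 1
    q = np.round((g + sensitivity) / (2 * sensitivity) * levels)
    return np.clip(q, 0, levels).astype(np.uint8)

def dequantize(q, bits=4, sensitivity=1.0):
    # Server-side reconstruction of the noisy, quantized update.
    levels = 2 ** bits - 1
    return q.astype(np.float64) / levels * 2 * sensitivity - sensitivity
```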