Bloom Filter Encoding for Machine Learning
1 Pith paper cites this work.
abstract
We present a method that uses a Bloom filter transform to preprocess data for machine learning. Each sample is encoded into a compact bit-array representation using hash-based encoding, producing a fixed-length feature space that reduces memory usage and obfuscates original feature values. The encoding does not rely on keyed hashing; however, a key can optionally be used to control the mapping and would be required to reproduce the representation. We evaluate the approach on six datasets spanning text, time-series, tabular, and image domains: SMS Spam Collection, ECG200, Adult 50K, CDC Diabetes, MNIST, and Fashion MNIST. Four classifiers are considered: Extreme Gradient Boosting, Deep Neural Networks, Convolutional Neural Networks, and Logistic Regression. Results show that models trained on Bloom filter encodings achieve performance comparable to models trained on raw data or standard dimensionality reduction techniques across several datasets, while providing consistent memory savings. These findings suggest that Bloom filter encodings can serve as an efficient, general-purpose pre-processing representation that preserves useful similarity structure for learning tasks while providing a degree of data obfuscation.
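The encoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `bloom_encode`, the parameters `num_bits` and `num_hashes`, the whitespace tokenization, and the choice of BLAKE2b as the hash function are all assumptions made for illustration. The optional `key` reflects the abstract's statement that a key can control the mapping and would be needed to reproduce the representation.

```python
import hashlib
from typing import Iterable, List


def bloom_encode(
    tokens: Iterable[str],
    num_bits: int = 256,    # assumed length of the fixed-size bit array
    num_hashes: int = 3,    # assumed number of hash functions per token
    key: bytes = b"",       # optional key; empty means unkeyed hashing
) -> List[int]:
    """Encode one sample (a collection of tokens) as a fixed-length bit vector."""
    bits = [0] * num_bits
    for token in tokens:
        for i in range(num_hashes):
            # A distinct salt per index i simulates num_hashes independent hashes.
            digest = hashlib.blake2b(
                token.encode("utf-8"),
                digest_size=8,
                key=key,
                salt=i.to_bytes(2, "little"),
            ).digest()
            position = int.from_bytes(digest, "little") % num_bits
            bits[position] = 1
    return bits


# Hypothetical usage: one short text message, tokenized on whitespace.
sample = "free entry in a weekly prize draw".split()
features = bloom_encode(sample, num_bits=256, num_hashes=3)
keyed_features = bloom_encode(sample, num_bits=256, num_hashes=3, key=b"secret")
```

Every sample maps to a vector of the same length regardless of how many tokens it contains, which is what gives a fixed-length feature space. Samples that share tokens also share the corresponding set bits, so some similarity structure carries over into the encoded representation.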
fields: cs.CR (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
Privacy-Preserving Distributed Learning in IoT Systems: A Unified Threat Model and Evaluation Framework
A unified threat model and evaluation framework is developed to compare privacy-preserving methods for distributed learning in IoT, showing trade-offs in privacy robustness and system efficiency with Bloom filter encodings highlighted for low overhead.