Feature Hashing for Large Scale Multitask Learning

Alex Smola; Anirban Dasgupta; John Langford; Josh Attenberg; Kilian Weinberger

arXiv: 0902.2206 · v5 · submitted 2009-02-12 · cs.AI



keywords hashing, feature, learning, multitask, approach, bounds, case, demonstrate


Empirical evidence suggests that hashing is an effective strategy for dimensionality reduction and practical nonparametric estimation. In this paper we provide exponential tail bounds for feature hashing and show that the interaction between random subspaces is negligible with high probability. We demonstrate the feasibility of this approach with experimental results for a new use case -- multitask learning with hundreds of thousands of tasks.
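A minimal sketch of the idea the abstract describes, not the authors' implementation: each token is mapped to one of a fixed number of buckets, with a second hash supplying a sign so that collisions tend to cancel rather than accumulate. The names `hashed_features`, `_hash`, `dot`, `num_buckets`, and `task` are hypothetical, MD5 stands in for whatever hash function a real system would use, and prefixing the token with a task identifier is an assumed way of letting many tasks share one hashed space.

```python
import hashlib


def _hash(key: str, seed: str) -> int:
    """Deterministic 64-bit hash of `key`; `seed` selects an independent hash."""
    digest = hashlib.md5(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def hashed_features(tokens, num_buckets=2**18, task=None):
    """Map tokens to a sparse signed vector with `num_buckets` dimensions.

    If `task` is given (an assumption for illustration), every token is
    hashed twice: once as a shared feature and once as a task-specific one.
    """
    vec = {}
    for tok in tokens:
        keys = [tok] if task is None else [tok, f"{task}_{tok}"]
        for key in keys:
            idx = _hash(key, "index") % num_buckets              # bucket index
            sign = 1 if _hash(key, "sign") % 2 == 0 else -1      # +1 or -1
            vec[idx] = vec.get(idx, 0) + sign
    return vec


def dot(u, v):
    """Inner product of two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u
    return sum(val * v.get(idx, 0) for idx, val in u.items())


shared = hashed_features(["cheap", "pills", "cheap"])
personal = hashed_features(["cheap", "pills", "cheap"], task="user42")
```

Because the output dimension is fixed by `num_buckets`, memory use does not grow with the vocabulary or with the number of tasks; the cost is occasional collisions between unrelated keys.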


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation
cs.LG · 2026-04 · unverdicted · novelty 6.0

RISE applies CountSketch to dual lexical and semantic channels derived from output-layer gradient outer products, cutting data attribution storage by up to 112x and enabling retrospective and prospective influence ana...

Applying Graph Analysis for Unsupervised Fast Malware Fingerprinting
cs.CR · 2025-10 · conditional · novelty 5.0

TrapNet applies PCA-based FloatHash vectors and graph community detection to enable unsupervised malware fingerprinting and family attribution from static analysis.

Semantic Product Search
cs.IR · 2019-07 · unverdicted · novelty 5.0

A neural semantic matcher for product search uses a custom loss on behavior data, n-gram pooling, and hashing to beat prior methods by 4.7% Recall@100 and 14.5% MAP.