
arxiv: 1709.02327 · v1 · pith:52USBR36 · submitted 2017-09-07 · 💻 cs.DC · cs.LG · stat.ML

Feature selection in high-dimensional dataset using MapReduce

classification 💻 cs.DC · cs.LG · stat.ML
keywords datasets, feature, implementation, mapreduce, selection, algorithm, approach, bioinformatics

This paper describes a distributed MapReduce implementation of the minimum Redundancy Maximum Relevance algorithm, a popular feature selection method in bioinformatics and network inference problems. The proposed approach handles both tall/narrow and wide/short datasets. We further provide an open source implementation based on Hadoop/Spark, and illustrate its scalability on datasets involving millions of observations or features.
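The core computation can be illustrated with a small single-process sketch. It is not the authors' Hadoop/Spark code. It assumes discrete-valued features and uses the common difference form of mRMR: at each step, pick the feature maximizing relevance I(f; y) minus its mean mutual information with the features already selected. Joint value counts are computed per row partition ("map") and summed ("reduce"), which mimics only the tall/narrow, row-partitioned case. The function names `map_partition`, `mapreduce_counts`, `mutual_information` and `mrmr` are hypothetical.

```python
from collections import Counter
from functools import reduce
from math import log

TARGET = -1  # hypothetical convention: index -1 refers to the class label


def map_partition(rows, labels, pairs):
    """Map step (illustrative): count joint values of each (a, b) variable pair
    over one partition of rows. Index TARGET selects the label."""
    counts = Counter()
    for row, y in zip(rows, labels):
        full = list(row) + [y]  # full[-1] is the label
        for a, b in pairs:
            counts[(a, b, full[a], full[b])] += 1
    return counts


def mapreduce_counts(partitions, pairs):
    """Run the map step on every partition, then reduce by summing the counters."""
    partials = [map_partition(rows, labels, pairs) for rows, labels in partitions]
    return reduce(lambda c1, c2: c1 + c2, partials, Counter())


def mutual_information(counts, a, b):
    """I(a; b) in nats, estimated from aggregated joint counts of the pair (a, b)."""
    joint = {(va, vb): n for (ka, kb, va, vb), n in counts.items() if (ka, kb) == (a, b)}
    total = sum(joint.values())
    pa, pb = Counter(), Counter()
    for (va, vb), n in joint.items():
        pa[va] += n
        pb[vb] += n
    # sum over p(a,b) * log(p(a,b) / (p(a) p(b))), written in terms of raw counts
    return sum((n / total) * log(n * total / (pa[va] * pb[vb]))
               for (va, vb), n in joint.items())


def mrmr(partitions, n_features, k):
    """Greedy mRMR (difference criterion): relevance minus mean redundancy."""
    candidates = list(range(n_features))
    rel_counts = mapreduce_counts(partitions, [(i, TARGET) for i in candidates])
    relevance = {i: mutual_information(rel_counts, i, TARGET) for i in candidates}
    redundancy = {i: 0.0 for i in candidates}  # running sum of I(f_i; f_s), s selected
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            mean_red = redundancy[i] / len(selected) if selected else 0.0
            return relevance[i] - mean_red
        best = max(candidates, key=score)  # ties resolve to the lowest index
        selected.append(best)
        candidates.remove(best)
        if candidates:
            # one extra map/reduce pass: MI of each remaining feature with the new pick
            counts = mapreduce_counts(partitions, [(i, best) for i in candidates])
            for i in candidates:
                redundancy[i] += mutual_information(counts, i, best)
    return selected


labels = [0, 1, 2, 3, 0, 1, 2, 3]
# feature 0 = high bit of the label, feature 1 = copy of feature 0, feature 2 = low bit
rows = [(v // 2, v // 2, v % 2) for v in labels]
partitions = [(rows[:4], labels[:4]), (rows[4:], labels[4:])]
print(mrmr(partitions, n_features=3, k=2))  # → [0, 2]
```

All three toy features are equally relevant (ln 2 nats each). After feature 0 is chosen, its exact copy (feature 1) is penalized as fully redundant, so the independent low bit (feature 2) is selected next. Each greedy step costs one pass over the data, which is why distributing the counting step matters for large datasets.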
