AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias
6 Pith papers cite this work.
abstract
Fairness is an increasingly important concern as machine learning models are used to support decision making in high-stakes applications such as mortgage lending, hiring, and prison sentencing. This paper introduces a new open source Python toolkit for algorithmic fairness, AI Fairness 360 (AIF360), released under an Apache v2.0 license (https://github.com/ibm/aif360). The main objectives of this toolkit are to help facilitate the transition of fairness research algorithms to use in an industrial setting and to provide a common framework for fairness researchers to share and evaluate algorithms. The package includes a comprehensive set of fairness metrics for datasets and models, explanations for these metrics, and algorithms to mitigate bias in datasets and models. It also includes an interactive Web experience (https://aif360.mybluemix.net) that provides a gentle introduction to the concepts and capabilities for line-of-business users, as well as extensive documentation, usage guidance, and industry-specific tutorials to enable data scientists and practitioners to incorporate the most appropriate tool for their problem into their work products. The architecture of the package has been engineered to conform to a standard paradigm used in data science, thereby further improving usability for practitioners. Such architectural design and abstractions enable researchers and developers to extend the toolkit with their new algorithms and improvements, and to use it for performance benchmarking. A built-in testing infrastructure maintains code quality.
citing papers explorer
- FML-bench: A Controlled Study of AI Research Agent Strategies from the Perspective of Search Dynamics
  FML-Bench shows that a simple greedy hill-climber performs nearly as well as complex tree-search agents on ML research tasks, with an adaptive strategy that switches exploration modes outperforming all tested agents.
- Towards Reliable Testing of Machine Unlearning
  Causal fuzzing with budgeted interventions can detect residual direct and indirect influence of unlearned data that standard attribution methods miss due to proxies, cancellations, and masking.
- Differential Parity: Relative Fairness Between Two Sets of Decisions
  Differential parity is proposed as a relative fairness metric between decision sets independent of sensitive attributes, usable with or without a reference set and extendable via ML for mismatched data.
- FairLogue: A Toolkit for Intersectional Fairness Analysis in Clinical Machine Learning Models
  FairLogue provides modular tools to quantify intersectional fairness gaps in clinical ML using extended demographic parity, equalized odds, and counterfactual methods, shown on a glaucoma surgery prediction task from All of Us data.
- Exploring a Behavioral Model of "Positive Friction" in Human-AI Interaction
  Proposes a behavioral model of positive friction to characterize beneficial obstacles in AI user experiences and developer processes, diagnose needs, and suggest design solutions.
- InsightBoard: An Interactive Multi-Metric Visualization and Fairness Analysis Plugin for TensorBoard
  InsightBoard integrates synchronized multi-metric plots, correlation analysis, and group fairness indicators into TensorBoard to reveal subgroup disparities that aggregate metrics hide during model training.