ComputeFest 2019: Symposium on Data Science, Machine Learning, and Fairness in Computational Science
Table of Contents
Computefest 2019
Hosted by the Institute for Applied Computational Science (IACS), ComputeFest is an annual winter event of knowledge and skill-building activities in computational science, engineering and data science. The workshop content complements the curriculum taught in DataFest.
IACS Symposium: "Data Science at the Frontier of Discovery: Machine Learning in the Physical World"
Tuesday, January 22nd, 2019 Harvard University Science Center, Hall B, 1 Oxford Street, Cambridge MA 02138
Notes
https://christophm.github.io/interpretable-ml-book/
https://www.fatconference.org/2019/program.html
Most of these tools won't work with MLaaS but could be wired into a framework.
On Fairness and Interpretability
Model Agnostic Methods for Interpretability and Fairness
- look at local perturbations
- decision boundaries
- Shapley values
Local Perturbations
- LIME perturbs the input locally to show which feature values drive the prediction (see the sketch at the end of this section)
https://github.com/marcotcr/lime
- input gradients around spend and volume over a 4-month sliding window
- used to clarify the impact of a feature
https://arxiv.org/abs/1611.07634
- hold all other factors constant
- example was probability of default relative to debt-to-income ratio
- the plot holds the remaining features fixed while one feature varies
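A minimal LIME sketch of the local-perturbation idea above, assuming a fitted scikit-learn classifier `clf`, a numpy training matrix `X_train`, and a `feature_names` list (all hypothetical names):

```python
import lime.lime_tabular

explainer = lime.lime_tabular.LimeTabularExplainer(
    X_train,
    feature_names=feature_names,
    class_names=["no_default", "default"],
    mode="classification",
)

# Explain one prediction: LIME perturbs the row locally and fits a sparse
# linear model to approximate the classifier around it.
exp = explainer.explain_instance(X_train[0], clf.predict_proba, num_features=5)
print(exp.as_list())  # (feature condition, weight) pairs for this instance
```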
BILE Decision Boundary
- spend and lift -> SpendLiftV6
Shapley Values
- exact values require retraining 2^F models (one per feature subset) to attribute a prediction across features; see the sketch below
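A sketch of what the 2^F retraining means in practice: exact Shapley values computed by retraining one model per feature subset. `X`, `y`, and `x_row` are hypothetical numpy arrays; logistic regression is only an illustrative model family.

```python
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.linear_model import LogisticRegression

def subset_prediction(X, y, x_row, cols):
    """Train on a feature subset and return P(y=1) for x_row (base rate if empty)."""
    if not cols:
        return y.mean()
    model = LogisticRegression(max_iter=1000).fit(X[:, cols], y)
    return model.predict_proba(x_row[cols].reshape(1, -1))[0, 1]

def shapley_values(X, y, x_row):
    """Exact Shapley attribution of the prediction for x_row over all 2^F subsets."""
    F = X.shape[1]
    phi = np.zeros(F)
    for i in range(F):
        others = [j for j in range(F) if j != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                S = list(S)
                weight = factorial(len(S)) * factorial(F - len(S) - 1) / factorial(F)
                gain = (subset_prediction(X, y, x_row, S + [i])
                        - subset_prediction(X, y, x_row, S))
                phi[i] += weight * gain
    return phi
```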
Workshop
- load the training data
- perform the train/test splitting based on the features and the labels
- consider the heatmap as a 3d map where one could apply the facet plot
- LICE plot keeps the training-data values fixed, then sweeps each feature value through its range (see the sketch after this list)
- consider looking at points near the decision boundary
- ensure that fairness testing is part of the visualization, not the pipeline
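A minimal sketch of the fixed-values-then-sweep plot described above (an ICE-style curve per training point), assuming a fitted classifier `clf`, a numpy feature matrix `X`, and a column index `feat_idx` (hypothetical names):

```python
import numpy as np
import matplotlib.pyplot as plt

grid = np.linspace(X[:, feat_idx].min(), X[:, feat_idx].max(), 50)

for row in X[:20]:                       # one curve per training point
    sweep = np.tile(row, (len(grid), 1))
    sweep[:, feat_idx] = grid            # vary only the feature of interest
    plt.plot(grid, clf.predict_proba(sweep)[:, 1], color="gray", alpha=0.3)

plt.xlabel("feature value")
plt.ylabel("P(positive class)")
plt.title("Individual conditional expectation curves")
plt.show()
```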
Measuring Fairness
- Statistical parity (same % of positive predictions across groups)
- Conditional parity (same %, but within groups defined by a conditioning variable)
- False positive rate (compared across groups); see the metric sketch after this list
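A minimal sketch of the three metrics above. `y_pred`, `y_true`, `group`, and `strata` are hypothetical numpy arrays: binary predictions, true labels, a binary protected-group indicator, and a conditioning variable.

```python
import numpy as np

def statistical_parity_gap(y_pred, group):
    """Difference in the rate of positive predictions between the two groups."""
    return y_pred[group == 1].mean() - y_pred[group == 0].mean()

def conditional_parity_gaps(y_pred, group, strata):
    """Statistical parity computed separately within each conditioning stratum."""
    return {s: statistical_parity_gap(y_pred[strata == s], group[strata == s])
            for s in np.unique(strata)}

def false_positive_rate_gap(y_pred, y_true, group):
    """Difference in FPR (positive predictions among true negatives) between groups."""
    def fpr(g):
        mask = (group == g) & (y_true == 0)
        return y_pred[mask].mean()
    return fpr(1) - fpr(0)
```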
What-If Tool
https://ai.google/research/teams/brain/pair
- change data (e.g., LIME and LICE)
- grouping
https://fairmlclass.github.io/
https://pypi.org/project/witwidget/
https://github.com/PAIR-code/what-if-tool
https://github.com/PAIR-code/what-if-tool/blob/master/computefest.md
COMPAS
Fairness Race
WIT Toxicity Text Model Comparison
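A minimal sketch of loading the What-If Tool in a notebook via the witwidget package linked above, assuming a fitted scikit-learn classifier `clf`, a pandas DataFrame `df_test` whose feature columns `feature_cols` are numeric, and TensorFlow installed (the names are hypothetical):

```python
import tensorflow as tf
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

def df_to_examples(df):
    """Convert each DataFrame row into a tf.train.Example proto."""
    examples = []
    for _, row in df.iterrows():
        ex = tf.train.Example()
        for col, val in row.items():
            # Assumes numeric features; strings are stored as bytes.
            if isinstance(val, str):
                ex.features.feature[col].bytes_list.value.append(val.encode())
            else:
                ex.features.feature[col].float_list.value.append(float(val))
        examples.append(ex)
    return examples

examples = df_to_examples(df_test)

def predict_fn(examples_to_score):
    """Return per-example class probabilities for the tool."""
    rows = [[ex.features.feature[c].float_list.value[0] for c in feature_cols]
            for ex in examples_to_score]
    return clf.predict_proba(rows).tolist()

config = WitConfigBuilder(examples).set_custom_predict_fn(predict_fn)
WitWidget(config, height=800)  # renders the interactive tool in Jupyter
```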
AI Fairness 360 toolkit (AIF360)
How does one increase trust in ML algorithms? Make them 1) fair, 2) repeatable, and 3) explainable.
Example: https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29
- Bias unit tests
- Create new explainers on the source dataset
- Create a pre-processing algorithm (see the AIF360 sketch after this list)
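A minimal AIF360 sketch on the German Credit data linked above, following the structure of the bullets (a bias "unit test" metric, then a pre-processing algorithm). It assumes the raw german.data file has been placed where AIF360's GermanDataset loader expects it; 'age' is used as the protected attribute as in the AIF360 tutorial.

```python
from aif360.datasets import GermanDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

dataset = GermanDataset(protected_attribute_names=['age'],
                        privileged_classes=[lambda x: x >= 25],
                        features_to_drop=['personal_status', 'sex'])
privileged, unprivileged = [{'age': 1}], [{'age': 0}]

# Bias "unit test": gap in favorable-label rates between the two groups.
metric = BinaryLabelDatasetMetric(dataset,
                                  unprivileged_groups=unprivileged,
                                  privileged_groups=privileged)
print("mean difference before:", metric.mean_difference())

# Pre-processing algorithm: reweigh examples to reduce the disparity.
rw = Reweighing(unprivileged_groups=unprivileged, privileged_groups=privileged)
transformed = rw.fit_transform(dataset)
metric_after = BinaryLabelDatasetMetric(transformed,
                                        unprivileged_groups=unprivileged,
                                        privileged_groups=privileged)
print("mean difference after:", metric_after.mean_difference())
```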
Toolbox
- metrics
- data set
- algorithm
Glossary
- favorable label
- protected attribute
- group vs. individual fairness
- bias
- fairness metric
- explainers
Data Exploration Tools
- Data Points
- Describe DF
- Describe feature + hist
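A minimal sketch of these exploration steps, assuming a pandas DataFrame `df` with a numeric column named "spend" (hypothetical names):

```python
import matplotlib.pyplot as plt

print(df.describe())           # summary statistics for the whole DataFrame
print(df["spend"].describe())  # summary for a single feature
df["spend"].hist(bins=30)      # histogram of that feature
plt.show()
```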