ComputeFest 2019: Symposium on Data Science, Machine Learning, and Fairness in Computational Science

1. ComputeFest 2019

Hosted by the Institute for Applied Computational Science (IACS), ComputeFest is an annual winter event of knowledge- and skill-building activities in computational science, engineering, and data science. The workshop content complements the curriculum taught in DataFest.

IACS Symposium: "Data Science at the Frontier of Discovery: Machine Learning in the Physical World"

Tuesday, January 22nd, 2019, Harvard University Science Center, Hall B, 1 Oxford Street, Cambridge, MA 02138

1.3. Model-Agnostic Methods for Interpretability and Fairness

  • look at local perturbations
  • decision boundaries
  • Shapley values

1.3.1. Local Perturbations

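  • LIME (Local Interpretable Model-agnostic Explanations) perturbs the inputs around a single instance and fits a simple local model to show which input values drive the predicted target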

https://github.com/marcotcr/lime
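The LIME idea can be sketched in plain Python. This is not the `lime` package's API and omits its interpretable representations and feature selection: it samples Gaussian perturbations around one instance, weights each sample with an exponential proximity kernel, and fits a weighted linear surrogate whose slopes act as local feature effects. The model `black_box`, the perturbation scale, the kernel width, and the sample count are all hypothetical choices.

```python
import math
import random


def black_box(x):
    # Hypothetical nonlinear model standing in for any trained classifier.
    return 1.0 / (1.0 + math.exp(-(2.0 * x[0] - x[1] ** 2)))


def solve(a, b):
    # Gauss-Jordan elimination with partial pivoting for a small system a @ w = b.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def lime_explain(f, x0, n_samples=2000, scale=0.1, kernel_width=0.25, seed=0):
    # Perturb around x0, weight samples by proximity, fit a weighted linear surrogate.
    rng = random.Random(seed)
    d = len(x0)
    xtx = [[0.0] * (d + 1) for _ in range(d + 1)]  # accumulates X^T W X
    xty = [0.0] * (d + 1)                          # accumulates X^T W y
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, scale) for xi in x0]
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x0))
        weight = math.exp(-dist2 / kernel_width ** 2)     # exponential proximity kernel
        row = [1.0] + [zi - xi for zi, xi in zip(z, x0)]  # intercept + offsets from x0
        y = f(z)
        for i in range(d + 1):
            xty[i] += weight * row[i] * y
            for j in range(d + 1):
                xtx[i][j] += weight * row[i] * row[j]
    coef = solve(xtx, xty)
    return coef[1:]  # local per-feature slopes; intercept dropped


effects = lime_explain(black_box, [0.0, 0.5])
```

Around the point (0.0, 0.5) the surrogate reports a positive effect for the first feature and a negative one for the second, matching the signs of the model's local slopes.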

  • compute input gradients around the spend and volume features (4-month sliding window)
  • use them to clarify the impact of each feature

https://arxiv.org/abs/1611.07634
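Input gradients can be approximated for any model with central finite differences, nudging one feature at a time. The sketch below assumes a hypothetical logistic model over two features named after the notes' example (`spend`, `volume`); the 4-month sliding-window aggregation is not reproduced, and the step size `eps` is an illustrative choice.

```python
import math


def model(x):
    # Hypothetical model over two features named after the notes' example.
    return 1.0 / (1.0 + math.exp(-(0.03 * x["spend"] - 0.5 * x["volume"])))


def input_gradients(f, x, eps=1e-4):
    # Central finite differences: (f(x + eps*e_i) - f(x - eps*e_i)) / (2 * eps).
    grads = {}
    for name in x:
        up = dict(x)
        up[name] += eps
        down = dict(x)
        down[name] -= eps
        grads[name] = (f(up) - f(down)) / (2 * eps)
    return grads


grads = input_gradients(model, {"spend": 100.0, "volume": 4.0})
```

The sign and size of each entry indicate how sensitive the score is to that feature at this particular input; the values change from point to point.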

  • hold all other factors constant while varying one feature
  • example: probability of default as a function of the debt-to-income ratio
  • plot holds
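Holding the other features constant and sweeping one of them yields one curve per instance (an ICE curve); averaging those curves gives a partial-dependence curve. In this sketch the default model, its coefficients, and the two applicant rows are hypothetical.

```python
import math


def prob_default(row):
    # Hypothetical model: default probability rises with debt-to-income (dti).
    z = -3.0 + 6.0 * row["dti"] - 0.00002 * row["income"]
    return 1.0 / (1.0 + math.exp(-z))


def ice_curves(model, rows, feature, grid):
    # For each row, keep every other feature fixed and sweep `feature` over `grid`.
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)
            modified[feature] = value
            curve.append(model(modified))
        curves.append(curve)
    return curves


def partial_dependence(curves):
    # Pointwise average of the ICE curves across rows.
    return [sum(col) / len(col) for col in zip(*curves)]


rows = [{"dti": 0.2, "income": 40000.0}, {"dti": 0.5, "income": 90000.0}]  # hypothetical
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
curves = ice_curves(prob_default, rows, "dti", grid)
pd_curve = partial_dependence(curves)
```

Plotting each entry of `curves` against `grid` shows how each applicant's predicted default probability would move if only the debt-to-income ratio changed.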

1.3.2. BILE Decision Boundary

  • spend and lift -> SpendLiftV6
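One way to look at a decision boundary over two features such as spend and lift is to evaluate the classifier on a grid and mark the cells where the predicted class changes. The classifier, its coefficients, and the grid ranges below are hypothetical; the `SpendLiftV6` feature from the notes is not reproduced.

```python
def classify(spend, lift):
    # Hypothetical classifier: positive class when a linear score exceeds zero.
    return 1 if 0.5 * spend + 2.0 * lift - 10.0 > 0 else 0


def boundary_cells(clf, xs, ys):
    # Label every grid point, then keep points whose right or upper neighbor
    # receives a different label, i.e. points adjacent to the decision boundary.
    labels = [[clf(x, y) for x in xs] for y in ys]
    cells = []
    for i in range(len(ys)):
        for j in range(len(xs)):
            right = j + 1 < len(xs) and labels[i][j] != labels[i][j + 1]
            up = i + 1 < len(ys) and labels[i][j] != labels[i + 1][j]
            if right or up:
                cells.append((xs[j], ys[i]))
    return labels, cells


xs = [float(v) for v in range(0, 21, 2)]  # spend grid: 0, 2, ..., 20
ys = [float(v) for v in range(0, 6)]      # lift grid: 0, 1, ..., 5
labels, cells = boundary_cells(classify, xs, ys)
```

`labels` can be drawn as a heatmap, with `cells` overlaid to trace the boundary.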

1.3.3. Shapley Values

  • exact computation requires retraining 2^F models (one per subset of the F features) to determine each feature's contribution
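The 2^F cost comes from the exact Shapley formula, which averages a feature's marginal contribution over every subset S of the other features, weighted by |S|!(F - |S| - 1)!/F!. In the sketch below, a hypothetical lookup table of accuracies stands in for actually retraining one model per feature subset; the feature names and scores are illustrative.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    # Exact Shapley values: for each feature, enumerate every subset of the others.
    # value(S) plays the role of "score of a model retrained on features S".
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical accuracy of a model retrained on each subset (2^3 = 8 entries).
scores = {
    frozenset(): 0.50,
    frozenset({"income"}): 0.70,
    frozenset({"debt"}): 0.60,
    frozenset({"age"}): 0.52,
    frozenset({"income", "debt"}): 0.80,
    frozenset({"income", "age"}): 0.71,
    frozenset({"debt", "age"}): 0.62,
    frozenset({"income", "debt", "age"}): 0.82,
}
phi = shapley_values(["income", "debt", "age"], scores.__getitem__)
```

The values add up to the score of the full model minus the score of the empty model (0.82 - 0.50), and the table grows exponentially with the number of features, which is why approximations are used in practice.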

1.3.4. Workshop

  • load the training data
  • perform the core split of the data into features and labels
  • treat the heatmap as a 3D map, to which one could apply a facet plot
  • the LICE plot holds the training-data values fixed, then moves each value in turn through its range
  • consider looking at points near the decision boundary
  • ensure that fairness testing is part of the visualization, not the pipeline
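Points near the decision boundary can be found by keeping those whose predicted probability is within a small margin of the classification threshold. The model, the threshold of 0.5, the margin of 0.1, and the sample points are hypothetical.

```python
import math


def predict_proba(point):
    # Hypothetical model: logistic score of a single feature.
    return 1.0 / (1.0 + math.exp(-point[0]))


def near_boundary(points, proba, threshold=0.5, margin=0.1):
    # Keep points whose probability lies within `margin` of the threshold,
    # sorted so the most borderline points come first.
    scored = [(abs(proba(p) - threshold), p) for p in points]
    close = [(gap, p) for gap, p in scored if gap <= margin]
    close.sort(key=lambda item: item[0])
    return [p for _, p in close]


points = [(-3.0,), (-0.2,), (0.05,), (0.3,), (2.5,)]
borderline = near_boundary(points, predict_proba)
```

These borderline cases are where small input changes flip the decision, so they are a natural place to inspect fairness by group.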

1.3.5. Measuring Fairness

  • Statistical parity (same % of positive outcomes across groups)
  • Conditional parity (same %, but with grouped classifiers)
  • False positive rate (compared across groups)
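The three measures can be computed directly from group memberships, labels, and predictions. Here conditional parity is read as statistical parity computed within each stratum of a conditioning attribute, which is an assumption about what "grouped" means in the notes; all data are hypothetical.

```python
from collections import defaultdict


def rate(numer, denom):
    return numer / denom if denom else float("nan")


def statistical_parity(groups, preds):
    # Share of positive predictions per group; parity means equal shares.
    pos, n = defaultdict(int), defaultdict(int)
    for g, p in zip(groups, preds):
        n[g] += 1
        pos[g] += p
    return {g: rate(pos[g], n[g]) for g in n}


def conditional_parity(groups, strata, preds):
    # Statistical parity computed separately inside each stratum.
    by_stratum = defaultdict(lambda: ([], []))
    for g, s, p in zip(groups, strata, preds):
        by_stratum[s][0].append(g)
        by_stratum[s][1].append(p)
    return {s: statistical_parity(gs, ps) for s, (gs, ps) in by_stratum.items()}


def false_positive_rate(groups, labels, preds):
    # Per group: predicted positives among the true negatives.
    fp, neg = defaultdict(int), defaultdict(int)
    for g, y, p in zip(groups, labels, preds):
        if y == 0:
            neg[g] += 1
            fp[g] += p
    return {g: rate(fp[g], neg[g]) for g in neg}


groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
strata = ["low", "low", "high", "high", "low", "low", "high", "high"]
labels = [0, 1, 0, 1, 0, 0, 1, 1]
preds = [1, 1, 0, 1, 0, 0, 1, 1]
sp = statistical_parity(groups, preds)
cp = conditional_parity(groups, strata, preds)
fpr = false_positive_rate(groups, labels, preds)
```

On this toy data group "a" gets more positive predictions overall (0.75 vs. 0.5) and a higher false positive rate (0.5 vs. 0.0), while the stratified view shows the gap reversing in the "high" stratum.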

https://github.com/pblankley/interp-workshop-2019

1.5. AI Fairness 360 toolkit (AIF360)

https://github.com/ibm/aif360

How does one increase trust in ML algorithms? Make them 1) fair, 2) repeatable, and 3) explainable.

Example: https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29

  • Bias unit tests
  • Create new explainers on the source dataset
  • Create a pre-processing algorithm

1.5.1. Toolbox

  • metrics
  • datasets
  • algorithms

1.5.2. Glossary

  • favorable label
  • protected attribute
  • group vs. individual fairness
  • bias
  • fairness metric
  • explainers

1.6. Data Exploration Tools

  • Data Points
  • Describe the DataFrame
  • Describe each feature with a histogram
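Describing a single feature amounts to summary statistics plus a histogram. The sketch below computes statistics similar in spirit to pandas' `describe()` (pandas itself is not used) with the standard `statistics` module and draws a text histogram with equal-width bins; the `ages` values and the bin count are hypothetical.

```python
import statistics


def describe(values):
    # Count, mean, sample std, min, quartiles (inclusive method), max.
    ordered = sorted(values)
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "std": statistics.stdev(ordered),
        "min": ordered[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": ordered[-1],
    }


def text_histogram(values, bins=5, width=20):
    # Equal-width bins; the maximum value falls into the last bin.
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / step), bins - 1)] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        bar = "#" * round(width * c / peak)
        lines.append(f"[{lo + i * step:8.2f}, {lo + (i + 1) * step:8.2f}) {bar} {c}")
    return lines


ages = [23, 35, 31, 52, 46, 29, 61, 38, 27, 44]  # hypothetical feature values
stats = describe(ages)
hist = text_histogram(ages)
```

Printing each line of `hist` gives a quick look at the distribution before any modeling.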