Table of Contents

1 Workflow

1.1 scikit-learn

  • Load
  • Data analysis (numerics vs. categorical)
  • Preprocess Pipeline (features union, impute, fill, scale)
  • Features, Target
  • Estimator (fit)
  • Pickle
  • Version and Deploy
  • Input
  • Preprocess Pipeline
  • Predict
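The steps above can be sketched end to end as follows. This is a minimal illustration, not the author's actual pipeline. The column names and toy values are hypothetical, `LinearRegression` is a placeholder estimator, and `ColumnTransformer` is used here in place of an explicit feature union. The pickle round trip happens in memory; a deployed version would write the bytes to a versioned file instead.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Load: hypothetical toy data with numeric and categorical columns.
df = pd.DataFrame({
    "rooms": [3.0, 2.0, np.nan, 4.0, 3.0, 5.0],
    "area": [70.0, 55.0, 60.0, np.nan, 80.0, 120.0],
    "city": ["a", "b", "a", "b", "b", "a"],
    "price": [200.0, 150.0, 170.0, 260.0, 230.0, 340.0],
})

# Data analysis: decide which columns are numeric and which are categorical.
numeric_cols = ["rooms", "area"]
categorical_cols = ["city"]

# Preprocess pipeline: impute and scale numerics, fill and encode categoricals.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical = Pipeline([
    ("fill", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric, numeric_cols),
    ("cat", categorical, categorical_cols),
])

# Features, target, estimator (fit).
X, y = df[numeric_cols + categorical_cols], df["price"]
model = Pipeline([("preprocess", preprocess), ("estimator", LinearRegression())])
model.fit(X, y)

# Pickle: preprocessing and estimator are serialized together.
blob = pickle.dumps(model)

# Input -> preprocess pipeline -> predict, using the restored object.
restored = pickle.loads(blob)
new_input = pd.DataFrame({"rooms": [3.0], "area": [np.nan], "city": ["c"]})
prediction = restored.predict(new_input)
```

Keeping the preprocessing inside the pickled pipeline means the prediction path applies the same imputation and scaling that were learned at training time.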

1.2 AWS (Machine Learning)

https://docs.aws.amazon.com/machine-learning/latest/dg/the-machine-learning-process.html

  • Analyze your data
  • Split data into training and evaluation datasources
  • Shuffle your training data
  • Process features
  • Train the model
  • Select model parameters
  • Evaluate the model performance
  • Feature selection
  • Set a score threshold for prediction accuracy
  • Use the model
  • Prediction input to S3
  • Boto3 batch prediction
  • Waiter poll
  • Prediction response (S3)
  • Clean data source, model, and batch prediction resources
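The batch prediction, waiter, and cleanup steps above might look roughly like the sketch below. It assumes `client` is a Boto3 Amazon Machine Learning client (`boto3.client("machinelearning")`), and that the model and the batch datasource already exist. All IDs and the S3 output URI are hypothetical placeholders, and the helper function names are illustrative only.

```python
def run_batch_prediction(client, batch_id, model_id, datasource_id, output_uri):
    """Start a batch prediction and block until it is reported as finished.

    `client` is assumed to be boto3.client("machinelearning").
    """
    client.create_batch_prediction(
        BatchPredictionId=batch_id,
        BatchPredictionName=batch_id,  # the name is reused so the waiter can filter on it
        MLModelId=model_id,
        BatchPredictionDataSourceId=datasource_id,
        OutputUri=output_uri,  # S3 location where the prediction response is written
    )
    # Waiter poll: repeatedly describes batch predictions matching the filter.
    waiter = client.get_waiter("batch_prediction_available")
    waiter.wait(FilterVariable="Name", EQ=batch_id)
    return client.get_batch_prediction(BatchPredictionId=batch_id)


def clean_up(client, batch_id, model_id, datasource_ids):
    """Delete the batch prediction, the model, and the datasources."""
    client.delete_batch_prediction(BatchPredictionId=batch_id)
    client.delete_ml_model(MLModelId=model_id)
    for datasource_id in datasource_ids:
        client.delete_data_source(DataSourceId=datasource_id)
```

Passing the client in as a parameter keeps the functions easy to exercise with a stub, without AWS credentials.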

1.3 AWS (SageMaker)

https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-notebooks-instances.html

  • Explore and Preprocess Data
  • Model Training
  • Model Deployment
  • Validating Models
  • Programming Model

1.4 Google (ML Engine)

https://github.com/GoogleCloudPlatform/cloudml-samples

  • Model
  • Version
  • Framework (e.g., scikit-learn)
  • Notebook: joblib: model.pkl

1.5 Azure (Machine Learning)

https://github.com/Azure/MachineLearningNotebooks

  • Experiments
  • Pipelines
  • Compute
  • Models
  • Images
  • Deployments
  • Activities

1.5.1 Models

  • Azure notebooks
  • Load data
  • Cleanse data
  • Convert types and filter
  • Split and rename columns
  • Transform data
  • Clean up resources

1.6 IBM

1.7 Oracle

2 Project Pipeline

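| Task                | Features v2 | Target v2 | Clustering | Model 2 | Model 3 |
|---------------------+-------------+-----------+------------+---------+---------|
| Data Collection     |             |           |            |         |         |
| Data Integration    |             |           |            |         |         |
| Data Cleaning       |             |           |            |         |         |
| Analysis Tools      |             |           |            |         |         |
| Data Analysis       |             |           |            |         |         |
| Feature Engineering |             |           |            |         |         |
| Pipeline Management |             |           |            |         |         |
| Model Training      |             |           |            |         |         |
| Tuning              |             |           |            |         |         |
| Model Evaluation    |             |           |            |         |         |
| Configuration       |             |           |            |         |         |
| Deployment          |             |           |            |         |         |
| A/B Testing         |             |           |            |         |         |
| Resource Management |             |           |            |         |         |
| Feature Extraction  |             |           |            |         |         |
| Target Management   |             |           |            |         |         |
| Model Deprecation   |             |           |            |         |         |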

3 Training and Prediction Input Pipelines

4 Versioning

5 Validation

  • MSE
  • Training Error
  • Resubstitution
  • Hold-out
  • K-fold cross-validation
  • LOOCV
  • Random subsampling
  • Bootstrapping
  • Over-Fit
  • Confidence
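Several of the terms above (MSE, training error by resubstitution, k-fold cross-validation, LOOCV) can be compared on one small example. The sketch below uses only the standard library. The data is synthetic, and the one-variable least-squares fit is a stand-in for whatever model is being validated.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on one feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def mse(params, xs, ys):
    """Mean squared error of the fitted line on the given points."""
    a, b = params
    return mean((a * x + b - y) ** 2 for x, y in zip(xs, ys))


def k_fold_mse(xs, ys, k, seed=0):
    """Average held-out MSE over k folds; k == len(xs) gives LOOCV."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        params = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(params, [xs[i] for i in fold], [ys[i] for i in fold]))
    return mean(scores)


# Synthetic data: a noisy line (assumption for illustration only).
rng = random.Random(42)
xs = [float(i) for i in range(30)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

training_error = mse(fit_line(xs, ys), xs, ys)   # resubstitution
cv_error = k_fold_mse(xs, ys, k=5)               # 5-fold cross-validation
loocv_error = k_fold_mse(xs, ys, k=len(xs))      # leave-one-out
```

The resubstitution error is measured on the same points the model was fitted to, so it tends to be optimistic. The cross-validated figures estimate error on unseen data, which is what exposes over-fitting.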

6 Exploration

7 Deployment

8 Algorithms

8.1 AWS Machine Learning

For our purposes, we are simply using linear regression (squared loss function and SGD).
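To make "squared loss and SGD" concrete, here is a minimal from-scratch sketch of linear regression trained by stochastic gradient descent. It illustrates the technique only and is not a description of how AWS implements it internally. The learning rate, epoch count, and toy data are assumptions chosen for the example.

```python
import random


def sgd_linear_regression(xs, ys, lr=0.05, epochs=500, seed=0):
    """Fit y = w*x + b by minimizing squared loss one sample at a time."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            err = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * err**2 with respect to w and b.
            w -= lr * err * xs[i]
            b -= lr * err
    return w, b


# Noise-free toy data generated from y = 3x + 0.5.
xs = [i / 10 for i in range(20)]
ys = [3.0 * x + 0.5 for x in xs]
w, b = sgd_linear_regression(xs, ys)
```

On this noise-free data the estimates approach the generating slope and intercept. With noisy data, a decaying learning rate is commonly used so that the updates settle.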

9 Models

10 Frameworks

11 Training / Conferences

Author: Jason Walsh

Created: 2019-03-15 Fri 16:12
