1. Workflow

1.1. scikit-learn

  • Load
  • Data analysis (numeric vs. categorical)
  • Preprocess Pipeline (features union, impute, fill, scale)
  • Features, Target
  • Estimator (fit)
  • Pickle
  • Version and Deploy
  • Input
  • Preprocess Pipeline
  • Predict
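The steps above can be sketched roughly as follows. This is a minimal illustration, not the author's actual pipeline: the toy data, the column layout, and the choice of `LinearRegression` are assumptions, and `ColumnTransformer` stands in for the "features union" step.

```python
import pickle

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Load: made-up data (assumption). Column 0 is numeric, column 1 is categorical.
X_train = np.array(
    [[1.0, "a"], [2.0, "b"], [np.nan, "a"], [4.0, "b"], [5.0, "a"], [6.0, "b"]],
    dtype=object,
)
# Features, Target: y_train is the target column.
y_train = np.array([1.5, 3.0, 3.5, 5.0, 5.5, 7.0])

# Preprocess Pipeline: impute and scale numeric columns,
# fill and one-hot encode categorical columns.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical = Pipeline([
    ("fill", SimpleImputer(strategy="constant", fill_value="missing")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric, [0]),
    ("cat", categorical, [1]),
])

# Estimator (fit): preprocessing and estimator live in one pipeline, so the
# same transformations are applied at training and at prediction time.
model = Pipeline([("preprocess", preprocess), ("estimator", LinearRegression())])
model.fit(X_train, y_train)

# Pickle: serialize the fitted pipeline (written to a versioned file in practice).
blob = pickle.dumps(model)

# Input, Preprocess Pipeline, Predict: restore the model and score new rows.
restored = pickle.loads(blob)
X_new = np.array([[3.0, "a"], [np.nan, "c"]], dtype=object)
predictions = restored.predict(X_new)
print(predictions)
```

Pickling the whole pipeline, rather than only the estimator, keeps the prediction-time preprocessing identical to what was used for training.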

1.2. AWS (Machine Learning)

https://docs.aws.amazon.com/machine-learning/latest/dg/the-machine-learning-process.html

  • Analyze your data
  • Split data into training and evaluation datasources
  • Shuffle your training data
  • Process features
  • Train the model
  • Select model parameters
  • Evaluate the model performance
  • Feature selection
  • Set a score threshold for prediction accuracy
  • Use the model
  • Prediction input to S3
  • Boto3 batch prediction
  • Waiter poll
  • Prediction response (S3)
  • Clean data source, model, and batch prediction resources
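The batch prediction, waiter, and clean-up steps above might look like the sketch below. It assumes the model and the prediction datasource (pointing at input already uploaded to S3) exist; all IDs and the S3 URI are hypothetical, and the client is passed in so the functions can be exercised without AWS credentials.

```python
def run_batch_prediction(client, prediction_id, model_id, datasource_id, output_uri):
    """Start a batch prediction, wait for it to finish, and return its S3 output URI.

    `client` is expected to behave like boto3.client("machinelearning").
    """
    client.create_batch_prediction(
        BatchPredictionId=prediction_id,
        BatchPredictionName=prediction_id,  # name reused as ID for simplicity (assumption)
        MLModelId=model_id,
        BatchPredictionDataSourceId=datasource_id,
        OutputUri=output_uri,
    )
    # Waiter poll: blocks until the matching batch prediction reports completion.
    waiter = client.get_waiter("batch_prediction_available")
    waiter.wait(FilterVariable="Name", EQ=prediction_id)
    # Prediction response: the results are written under this S3 location.
    result = client.get_batch_prediction(BatchPredictionId=prediction_id)
    return result["OutputUri"]


def clean_up(client, prediction_id, model_id, datasource_id):
    """Delete the batch prediction, model, and datasource created for the run."""
    client.delete_batch_prediction(BatchPredictionId=prediction_id)
    client.delete_ml_model(MLModelId=model_id)
    client.delete_data_source(DataSourceId=datasource_id)
```

With real credentials, the client would come from `boto3.client("machinelearning")`, for example `run_batch_prediction(client, "bp-example", "ml-example", "ds-example", "s3://example-bucket/output/")`, where every identifier is a placeholder.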

1.3. AWS (SageMaker)

https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-notebooks-instances.html

  • Explore and Preprocess Data
  • Model Training
  • Model Deployment
  • Validating Models
  • Programming Model

1.4. Google (ML Engine)

https://github.com/GoogleCloudPlatform/cloudml-samples

  • Model
  • Version
  • Framework (e.g., scikit-learn)
  • Notebook: joblib: model.pkl

1.5. Azure (Machine Learning)

https://github.com/Azure/MachineLearningNotebooks

  • Experiments
  • Pipelines
  • Compute
  • Models
  • Images
  • Deployments
  • Activities

1.5.1. Models

  • Azure notebooks
  • Load data
  • Cleanse data
  • Convert types and filter
  • Split and rename columns
  • Transform data
  • Clean up resources
  • Train the automatic regression model
  • Test the best model accuracy

1.6. IBM

1.7. Oracle

2. Project Pipeline

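| Task                | Features v2 | Target v2 | Clustering | Model 2 | Model 3 |
|---------------------+-------------+-----------+------------+---------+---------|
| Data Collection     |             |           |            |         |         |
| Data Integration    |             |           |            |         |         |
| Data Cleaning       |             |           |            |         |         |
| Analysis Tools      |             |           |            |         |         |
| Data Analysis       |             |           |            |         |         |
| Feature Engineering |             |           |            |         |         |
| Pipeline Management |             |           |            |         |         |
| Model Training      |             |           |            |         |         |
| Tuning              |             |           |            |         |         |
| Model Evaluation    |             |           |            |         |         |
| Configuration       |             |           |            |         |         |
| Deployment          |             |           |            |         |         |
| A/B Testing         |             |           |            |         |         |
| Resource Management |             |           |            |         |         |
| Feature Extraction  |             |           |            |         |         |
| Target Management   |             |           |            |         |         |
| Model Deprecation   |             |           |            |         |         |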

3. Training and Prediction Input Pipelines

4. Versioning

5. Validation

  • MSE
  • Training Error
  • Resubstitution
  • Hold-out
  • K-fold cross-validation
  • LOOCV
  • Random subsampling
  • Bootstrapping
  • Overfitting
  • Confidence
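Several of the items above (MSE, training or resubstitution error, K-fold cross-validation, LOOCV) can be illustrated with a small standard-library sketch. The one-variable least-squares model and the helper names are assumptions chosen for illustration; `k` is assumed to be no larger than the number of samples.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_splits(n, k, seed=0):
    """Yield (train_indices, test_indices) for k shuffled folds over n samples."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def cross_val_mse(xs, ys, k):
    """Average held-out MSE over k folds; k == len(xs) gives LOOCV."""
    scores = []
    for train, test in k_fold_splits(len(xs), k):
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in test]
        scores.append(mse([ys[i] for i in test], preds))
    return sum(scores) / len(scores)


def training_error(xs, ys):
    """Resubstitution error: fit and evaluate on the same data."""
    slope, intercept = fit_line(xs, ys)
    return mse(ys, [slope * x + intercept for x in xs])
```

Comparing `training_error(xs, ys)` with `cross_val_mse(xs, ys, k)` is one simple way to spot overfitting: a training error far below the cross-validated error suggests the model does not generalize.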

6. Exploration

7. Deployment

8. Algorithms

8.1. AWS Machine Learning

For our purposes we simply use linear regression (squared loss function, trained with stochastic gradient descent, SGD).
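To make the idea concrete, here is a minimal standard-library sketch of linear regression with a squared loss minimized by SGD. It is not AWS's implementation; the learning rate, epoch count, and toy data are assumptions chosen for illustration.

```python
import random


def sgd_linear_regression(xs, ys, lr=0.1, epochs=1000, seed=0):
    """Fit y = w * x + b by SGD on the per-sample loss 0.5 * (pred - y) ** 2."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error ** 2 with respect to w and b.
            w -= lr * error * xs[i]
            b -= lr * error
    return w, b


# Noise-free toy data generated from y = 3x + 2 (assumption).
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [3.0 * x + 2.0 for x in xs]
w, b = sgd_linear_regression(xs, ys)
print(w, b)
```

Each update uses a single sample, which is what distinguishes SGD from batch gradient descent; keeping the inputs on a small scale, as here, helps a fixed learning rate stay stable.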

9. Models

10. Frameworks

11. Training / Conferences

Author: Jason Walsh

Created: 2023-10-24 Tue 12:05
