Table of Contents

Workflow

scikit-learn

  • Load
  • Data analysis (numeric vs. categorical)
  • Preprocess Pipeline (feature union, impute, fill, scale)
  • Features, Target
  • Estimator (fit)
  • Pickle
  • Version and Deploy
  • Input
  • Preprocess Pipeline
  • Predict
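The scikit-learn steps above can be sketched end to end as follows. This is a minimal illustration, not a production recipe: the DataFrame, its column names (`sqft`, `rooms`, `city`, `price`) and the choice of `LinearRegression` are hypothetical stand-ins. `ColumnTransformer` plays the feature-union role here by running separate numeric (impute, scale) and categorical (fill, one-hot encode) branches and concatenating their outputs. `sklearn.pipeline.FeatureUnion` is the lower-level alternative when every branch sees the same columns.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Load: hypothetical in-memory data standing in for a real CSV or database read.
df = pd.DataFrame({
    "sqft": [850, 1200, np.nan, 1500, 2000, 1100],
    "rooms": [2, 3, 3, np.nan, 4, 2],
    "city": ["a", "b", "a", np.nan, "b", "a"],
    "price": [200.0, 280.0, 260.0, 330.0, 450.0, 250.0],
})

# Features, target; data analysis: split columns into numeric vs. categorical by dtype.
target = "price"
features = df.drop(columns=[target])
numeric_cols = features.select_dtypes(include="number").columns.tolist()
categorical_cols = features.select_dtypes(exclude="number").columns.tolist()

# Preprocess pipeline: each branch handles its own columns; outputs are concatenated.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # impute missing numbers
        ("scale", StandardScaler()),                   # scale to zero mean, unit variance
    ]), numeric_cols),
    ("cat", Pipeline([
        ("fill", SimpleImputer(strategy="constant", fill_value="missing")),  # fill missing categories
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

# Estimator (fit): preprocessing and model are fitted together as one pipeline.
model = Pipeline([("preprocess", preprocess), ("estimator", LinearRegression())])
model.fit(features, df[target])

# Pickle: serialize the whole fitted pipeline (in practice, write it to a versioned file).
blob = pickle.dumps(model)

# Deployed side: load the pipeline, take new input, and predict.
restored = pickle.loads(blob)
new_input = pd.DataFrame({"sqft": [1300], "rooms": [np.nan], "city": ["b"]})
prediction = restored.predict(new_input)
print(prediction)
```

Pickling the fitted `Pipeline` rather than the bare estimator means the deployed service applies exactly the same preprocessing to new input. Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.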

AWS (Machine Learning)

https://docs.aws.amazon.com/machine-learning/latest/dg/the-machine-learning-process.html

  • Analyze your data
  • Split data into training and evaluation datasources
  • Shuffle your training data
  • Process features
  • Train the model
  • Select model parameters
  • Evaluate the model performance
  • Feature selection
  • Set a score threshold for prediction accuracy
  • Use the model
  • Prediction input to S3
  • Boto3 batch prediction
  • Waiter poll
  • Prediction response (S3)
  • Clean up the datasource, model, and batch prediction resources
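The last few AWS steps (batch prediction with Boto3, polling until it finishes, and cleanup) might look roughly like this. It is a sketch under assumptions: the function names `run_batch_prediction` and `clean_up` are hypothetical, all IDs and the S3 URI are placeholders, and the datasource for the prediction input is assumed to already exist. The polling loop calls `get_batch_prediction` directly; boto3 also provides a `batch_prediction_available` waiter for the `machinelearning` client that can replace the hand-written loop.

```python
import time


def run_batch_prediction(ml, model_id, datasource_id, output_uri, prediction_id,
                         poll_seconds=30, timeout_seconds=3600):
    """Start an Amazon ML batch prediction and poll until it reaches a final state.

    `ml` is assumed to be a boto3 "machinelearning" client; the IDs and the
    S3 output URI are placeholders supplied by the caller.
    """
    ml.create_batch_prediction(
        BatchPredictionId=prediction_id,
        BatchPredictionName=prediction_id,
        MLModelId=model_id,
        BatchPredictionDataSourceId=datasource_id,
        OutputUri=output_uri,  # results are written under this S3 prefix
    )
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = ml.get_batch_prediction(BatchPredictionId=prediction_id)["Status"]
        if status == "COMPLETED":
            return status
        if status in ("FAILED", "DELETED"):
            raise RuntimeError(f"batch prediction {prediction_id} ended as {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch prediction {prediction_id} still {status}")
        time.sleep(poll_seconds)  # PENDING or INPROGRESS: wait and poll again


def clean_up(ml, prediction_id, model_id, datasource_ids):
    """Delete the batch prediction, the model, and every datasource that was created."""
    ml.delete_batch_prediction(BatchPredictionId=prediction_id)
    ml.delete_ml_model(MLModelId=model_id)
    for datasource_id in datasource_ids:
        ml.delete_data_source(DataSourceId=datasource_id)


# Example wiring (needs AWS credentials; every ID below is hypothetical):
# import boto3
# ml = boto3.client("machinelearning", region_name="us-east-1")
# run_batch_prediction(ml, "ml-model-id", "ds-batch-input", "s3://my-bucket/predictions/", "bp-001")
# clean_up(ml, "bp-001", "ml-model-id", ["ds-train", "ds-eval", "ds-batch-input"])
```

Passing the client in as a parameter keeps the functions easy to exercise with a stub before pointing them at a real account.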

AWS (SageMaker)

https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-notebooks-instances.html

  • Explore and Preprocess Data
  • Model Training
  • Model Deployment
  • Validating Models
  • Programming Model

Google (ML Engine)

https://github.com/GoogleCloudPlatform/cloudml-samples

  • Model
  • Version
  • Framework (e.g., scikit-learn)
  • Notebook: export the trained model with joblib as model.pkl
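For the Google ML Engine scikit-learn flow, the notebook step amounts to training a model and exporting it with joblib under the file name `model.pkl`, which is then uploaded to the Cloud Storage directory a model version is created from. A minimal sketch, using the bundled Iris dataset and `LogisticRegression` purely as hypothetical stand-ins for your own data and estimator:

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a small model on a dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Export with joblib using the file name the version directory will contain.
joblib.dump(model, "model.pkl")

# Sanity check: reload the exported file and confirm it predicts identically.
restored = joblib.load("model.pkl")
print(restored.predict(X[:3]))
```

Keep the scikit-learn version used for training aligned with the runtime version chosen for the model version, because pickled estimators are not guaranteed to load across scikit-learn releases.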

Azure (Machine Learning service)

https://github.com/Azure/MachineLearningNotebooks

  • Experiments
  • Pipelines
  • Compute
  • Models
  • Images
  • Deployments
  • Activities

Models

  • Azure notebooks
  • Load data
  • Cleanse data
  • Convert types and filter
  • Split and rename columns
  • Transform data
  • Clean up resources
  • Train the automated machine learning regression model
  • Test the best model accuracy
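The Azure tutorial performs these steps with Azure Machine Learning's data-preparation and automated ML tooling. As a local stand-in only, the same sequence (cleanse, convert types and filter, split and rename columns, transform, train, test) can be sketched with pandas and scikit-learn. The taxi-style column names and values are hypothetical, and a single `LinearRegression` substitutes for automated ML, which searches over many candidate models instead of fitting one:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load data: hypothetical raw records with everything stored as strings.
raw = pd.DataFrame({
    "pickup_datetime": ["2013-01-01 08:15:00", "2013-01-01 09:30:00", None,
                        "2013-01-02 17:45:00", "2013-01-03 12:00:00",
                        "2013-01-03 22:10:00", "2013-01-04 07:05:00",
                        "2013-01-04 18:30:00"],
    "trip_distance": ["1.2", "3.4", "2.0", "0", "5.1", "2.7", "4.0", "1.8"],
    "fare_amount": ["6.5", "13.0", "9.0", "3.0", "18.5", "11.0", "15.0", "8.0"],
})

# Cleanse: drop rows with no pickup timestamp.
df = raw.dropna(subset=["pickup_datetime"]).copy()

# Convert types and filter: strings to numbers, then drop zero-distance trips.
df["trip_distance"] = pd.to_numeric(df["trip_distance"])
df["fare_amount"] = pd.to_numeric(df["fare_amount"])
df = df[df["trip_distance"] > 0].copy()

# Split and rename columns: one datetime string becomes date and time columns.
parts = df["pickup_datetime"].str.split(" ", expand=True)
df["pickup_date"] = parts[0]
df["pickup_time"] = parts[1]
df = df.drop(columns=["pickup_datetime"]).rename(
    columns={"trip_distance": "distance", "fare_amount": "cost"})

# Transform: derive numeric features from the split columns.
df["pickup_hour"] = pd.to_datetime(df["pickup_time"], format="%H:%M:%S").dt.hour
df["pickup_weekday"] = pd.to_datetime(df["pickup_date"]).dt.weekday

# Train a regression model, then test it on held-out rows (R^2 score).
features = df[["distance", "pickup_hour", "pickup_weekday"]]
target = df["cost"]
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.25, random_state=0)
model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)
print(r2)
```

With this few rows the R^2 value is not meaningful; the point is the order of the preparation steps.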

IBM

Oracle

Project Pipeline

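Task                | Features v2 | Target v2 | Clustering | Model 2 | Model 3
--------------------|-------------|-----------|------------|---------|--------
Data Collection     |             |           |            |         |
Data Integration    |             |           |            |         |
Data Cleaning       |             |           |            |         |
Analysis Tools      |             |           |            |         |
Data Analysis       |             |           |            |         |
Feature Engineering |             |           |            |         |
Pipeline Management |             |           |            |         |
Model Training      |             |           |            |         |
Tuning              |             |           |            |         |
Model Evaluation    |             |           |            |         |
Configuration       |             |           |            |         |
Deployment          |             |           |            |         |
A/B Testing         |             |           |            |         |
Resource Management |             |           |            |         |
Feature Extraction  |             |           |            |         |
Target Management   |             |           |            |         |
Model Deprecation   |             |           |            |         |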

Training and Prediction Input Pipelines

Versioning

Validation

Exploration

Deployment

Algorithms

AWS Machine Learning

For our purposes, we are simply using linear regression (squared loss function and SGD).
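To make "linear regression with squared loss and SGD" concrete, here is a minimal NumPy sketch of the idea. It is not Amazon ML's implementation, which exposes its own settings such as the number of passes over the data and regularization; the function name, learning rate, epoch count and synthetic data are illustrative assumptions. Each step moves the weights against the gradient of 0.5 * (prediction - y)^2 for a single example:

```python
import numpy as np


def sgd_linear_regression(X, y, learning_rate=0.01, epochs=50, seed=0):
    """Fit y ~ X @ w + b by stochastic gradient descent on the squared loss."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        # Visit the examples in a fresh random order on every pass (shuffling).
        for i in rng.permutation(n_samples):
            error = X[i] @ w + b - y[i]
            # Gradient of 0.5 * error**2 is error * x for w and error for b.
            w -= learning_rate * error * X[i]
            b -= learning_rate * error
    return w, b


# Synthetic data: y = 3*x0 - 2*x1 + 1 plus a little Gaussian noise.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 1.0 + rng.normal(scale=0.1, size=200)
w, b = sgd_linear_regression(X, y)
print(w, b)  # close to [3, -2] and 1
```

Per-example updates keep memory use flat regardless of dataset size, which is why SGD suits large training sets; the cost is noisier steps, controlled here by the small fixed learning rate.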

Models

Frameworks

Training / Conferences

https://www.eventbrite.com/e/odsc-east-2019-open-data-science-conference-save-50-for-limited-time-tickets-50666130761?aff=ebdssbdestsearch

STAY15