With Automated ML,
Is Everyone an ML Engineer?
Dan Sullivan – DevFest Northeast 2020 – October 31, 2020
Bio
• Principal Engineer, PEAK6 Technologies
• Author
• Instructor
• Udemy
• Google Cloud
• LinkedIn Learning
• Data Science
• Machine Learning
• Databases & Data Modeling
Overview
• Machine Learning Workflow
• Formulating an ML Problem
• Building ML Models in GCP
• Data Engineering
• Monitoring and Evaluating Fairness
0. Machine Learning Workflow
• Formulate problem
• Identify data sources
• Prepare data
• Train, evaluate, and tune model
• Deploy model
• Use model in production
• Monitor and Evaluate Fairness
https://thenounproject.com/term/workflow/2409348/
1. Formulating an ML Problem
Define the Problem to Be Solved
• Informal description
• What is the value of solving the problem?
• How can the problem be solved?
• Regression (e.g., predict a numeric value, such as a price)
• Classification (e.g., predict a category, such as spam vs. not spam)
https://static.thenounproject.com/png/230138-200.png
Identify Data Sources
• Amount of data available
• Quality
• Rate of generation
• Requirements to access
• Limitations on the use of data
https://commons.wikimedia.org/wiki/File:Data_types_-_en.svg
2. Building ML Models in GCP
Services for Building Models
• Cloud AutoML
• AI Platform Training
• Kubeflow
• Dataproc with Spark ML
• BigQuery ML
Cloud AutoML
• Designed for model builders with limited ML experience
• GUI for training, evaluating, and tuning
• Services for sight, language, and structured data
• AutoML Tables uses structured data to build regression and classification models
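Although the GUI is the primary interface, AutoML Tables can also be driven from Python. A rough sketch, assuming the v1beta1 TablesClient from the google-cloud-automl library; the project ID, dataset name, column, and budget are hypothetical placeholders:

```python
# A rough sketch of training an AutoML Tables model programmatically.
# Assumes the google-cloud-automl library's v1beta1 TablesClient;
# 'my-project', 'sales_data', and the BigQuery URI are placeholders.
from google.cloud import automl_v1beta1 as automl

client = automl.TablesClient(project="my-project", region="us-central1")

# Create a dataset and import rows from BigQuery.
dataset = client.create_dataset(dataset_display_name="sales_data")
client.import_data(
    dataset=dataset,
    bigquery_input_uri="bq://my-project.sales.training_rows",
).result()  # import_data returns a long-running operation

# Train a model with a one-node-hour budget; AutoML handles feature
# engineering, architecture search, and hyperparameter tuning.
client.set_target_column(dataset=dataset, column_spec_display_name="revenue")
response = client.create_model(
    model_display_name="revenue_model",
    dataset=dataset,
    train_budget_milli_node_hours=1000,
)
model = response.result()
```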
AI Platform Training
• Trains and runs models built in
• TensorFlow
• scikit-learn
• XGBoost
• Hosted frameworks but can run custom containers
• Service provisions the compute resources needed for a job and then executes the job
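The training code itself is ordinary Python that gets packaged and submitted as a job (e.g., with gcloud ai-platform jobs submit training). A minimal sketch of a scikit-learn script of the kind the service would run; the Cloud Storage paths are hypothetical, and reading gs:// URLs with pandas assumes gcsfs is installed:

```python
# task.py -- a minimal scikit-learn training script of the kind
# packaged and submitted to AI Platform Training.
# The gs:// paths are hypothetical placeholders (pandas needs gcsfs).
import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("gs://my-bucket/data/train.csv")
X, y = df.drop(columns=["label"]), df["label"]
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("validation accuracy:", model.score(X_val, y_val))

# Export the trained model so it can be deployed for prediction.
joblib.dump(model, "model.joblib")
```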
Kubeflow
• Kubeflow is a machine learning toolkit for Kubernetes
• Packages models like applications
• Compose, deploy, and manage ML workflows
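Workflows are typically defined with the Kubeflow Pipelines (kfp) Python SDK and compiled into a spec the cluster can run. A minimal sketch assuming the kfp v1 SDK; the container images are hypothetical:

```python
# A minimal Kubeflow Pipelines definition (kfp v1 SDK).
# The container images are hypothetical placeholders.
import kfp
from kfp import dsl

@dsl.pipeline(name="train-pipeline", description="Preprocess then train")
def train_pipeline():
    preprocess = dsl.ContainerOp(
        name="preprocess",
        image="gcr.io/my-project/preprocess:latest",
    )
    train = dsl.ContainerOp(
        name="train",
        image="gcr.io/my-project/train:latest",
    )
    train.after(preprocess)  # express the dependency between steps

if __name__ == "__main__":
    # Compile to a spec that can be uploaded to a Kubeflow cluster.
    kfp.compiler.Compiler().compile(train_pipeline, "train_pipeline.yaml")
```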
Dataproc and Spark ML
• Dataproc is a managed Spark and Hadoop service
• Spark ML is Spark's machine learning library
• ML algorithms
• Feature engineering
• Pipelines
• Persistence
• Utilities
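A minimal PySpark sketch showing those pieces together: a feature-engineering stage and an algorithm composed into a pipeline, fit, and persisted. The input path and column names are hypothetical:

```python
# Spark ML in PySpark: feature engineering + algorithm in a Pipeline.
# The input path and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("sparkml-demo").getOrCreate()
df = spark.read.parquet("gs://my-bucket/training_data/")  # Dataproc reads GCS natively

# Feature engineering stage: pack raw columns into a feature vector.
assembler = VectorAssembler(inputCols=["age", "income"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")

# Compose, fit, and persist the whole pipeline.
pipeline = Pipeline(stages=[assembler, lr])
model = pipeline.fit(df)
model.write().overwrite().save("gs://my-bucket/models/lr_pipeline")
```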
BigQuery ML
• BigQuery is a serverless analytical database
• BigQuery ML brings machine learning functions to SQL
• Key advantages:
• Ability to train and run models in BigQuery
• Use SQL, not Python or ML frameworks
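In other words, training and prediction are just SQL statements. A minimal sketch, here submitted through the google-cloud-bigquery Python client (the same SQL runs unchanged in the BigQuery console); the dataset, table, and column names are hypothetical:

```python
# Training a BigQuery ML model: the model is a SQL statement.
# Dataset, table, and column names are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

create_model_sql = """
CREATE OR REPLACE MODEL `my_dataset.churn_model`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
SELECT age, plan_type, monthly_spend, churned
FROM `my_dataset.customers`
"""
client.query(create_model_sql).result()  # training runs inside BigQuery

# Predictions are also just SQL.
predict_sql = """
SELECT * FROM ML.PREDICT(MODEL `my_dataset.churn_model`,
                         (SELECT age, plan_type, monthly_spend
                          FROM `my_dataset.new_customers`))
"""
for row in client.query(predict_sql).result():
    print(dict(row))
```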
3. Data Engineering
Cloud Composer
• Managed Apache Airflow service
• Executes workflows defined in directed acyclic graphs (DAGs)
• Accessed through the console or command line (gcloud composer environments)
Apache Airflow DAGs
• A workflow is a collection of tasks with dependencies
• DAGs are stored in Cloud Storage
• Supports custom plugins for operators, hooks, and interfaces
• Python dependencies (packages)
DAGs are Python Programs
Source: https://cloud.google.com/composer/docs/how-to/using/writing-dags
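A minimal sketch of such a DAG, using the Airflow 1.x import paths that Cloud Composer ran at the time; the task commands are hypothetical:

```python
# A minimal Airflow DAG: two tasks with a dependency between them.
# Airflow 1.x import paths (as used by Cloud Composer circa 2020);
# the bash commands are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_etl",
    default_args=default_args,
    start_date=datetime(2020, 10, 1),
    schedule_interval="@daily",  # run once per day
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = BashOperator(task_id="load", bash_command="echo loading")

    extract >> load  # 'load' runs only after 'extract' succeeds
```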
Cloud Composer Environments
• Deployed in environments, which are collections of GCP service components based on Kubernetes Engine
• Uses a combination of tenant and customer project resources
Architecture
Source: https://cloud.google.com/composer/docs/concepts/overview
Cloud Data Fusion
• Managed service based on the open source CDAP data analytics platform
• Code-free ETL/ELT development tool
• Over 160 connectors and transformations
• Drag-and-drop ETL/ELT construction
Execution Environment
• Cloud Data Fusion is deployed as an instance
• Two editions
• Basic – visual designer, transformations, SDK, etc.
• Enterprise – Basic plus streaming pipelines, integration metadata repository, high availability, triggers, schedules, etc.
Visual Interface
Source: https://cloud.google.com/data-fusion/docs/quickstart
4. Monitoring and Evaluating Fairness
Objective
• Understand model performance
• Metrics to monitor:
• Traffic patterns
• Error rates
• Latency
• Resource utilization
• Configure alerts in Cloud Monitoring
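A rough sketch of creating such an alert with the google-cloud-monitoring (monitoring_v3) client. The metric type is AI Platform's prediction error count; the project ID and threshold are hypothetical, and exact field names can vary across client-library versions:

```python
# A rough sketch: alert when AI Platform prediction errors spike.
# Assumes the google-cloud-monitoring (monitoring_v3) client library;
# the project ID and threshold are hypothetical placeholders.
from google.cloud import monitoring_v3

client = monitoring_v3.AlertPolicyServiceClient()

policy = monitoring_v3.AlertPolicy(
    display_name="High prediction error count",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.AND,
    conditions=[
        monitoring_v3.AlertPolicy.Condition(
            display_name="error_count > 100 for 5 minutes",
            condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
                filter='metric.type = "ml.googleapis.com/prediction/error_count"',
                comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
                threshold_value=100,
                duration={"seconds": 300},
            ),
        )
    ],
)

created = client.create_alert_policy(
    name="projects/my-project", alert_policy=policy
)
print("created", created.name)
```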
Monitoring AI Platform
• AI Platform exports metrics to
Cloud Monitoring
• Metrics
• Error count
• Latencies
• Accelerator utilization
• Memory utilization
• CPU utilization
• Network
• Prediction counts
Monitoring ML Models: Best Practices
• Monitor for data skew (see the sketch after this list)
• Watch for changes in dependencies
• Refresh models as needed
• Assess model prediction quality
• Test for unfairness
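One lightweight way to monitor for data skew is to compare serving-time feature distributions against the training distribution. A minimal sketch using a two-sample Kolmogorov-Smirnov test from SciPy; the arrays are hypothetical:

```python
# A minimal data-skew check: compare serving-time feature distributions
# against training distributions with a two-sample KS test.
# The arrays here are hypothetical placeholders.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(seed=0)
train_ages = rng.normal(40, 10, size=10_000)   # distribution at training time
serving_ages = rng.normal(45, 10, size=1_000)  # distribution seen in production

stat, p_value = ks_2samp(train_ages, serving_ages)
if p_value < 0.01:
    # In practice this would raise a Cloud Monitoring alert
    # or trigger a retraining pipeline.
    print(f"Possible data skew detected (KS={stat:.3f}, p={p_value:.2g})")
```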
Fairness
• Anti-classification
• Protected attributes are not used in the model
• Example: gender
• Classification parity
• Measures of predictive performance are equal across groups
• Calibration
• Among instances with the same risk score, outcomes are independent of protected attributes
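Classification parity can be checked by computing the same performance metric per protected group and comparing. A minimal sketch with pandas and scikit-learn; the labels, predictions, and group values are hypothetical:

```python
# A minimal classification-parity check: compare a performance metric
# (here, recall / true positive rate) across protected groups.
# The data is a hypothetical placeholder.
import pandas as pd
from sklearn.metrics import recall_score

df = pd.DataFrame({
    "y_true": [1, 0, 1, 1, 0, 1, 0, 1],
    "y_pred": [1, 0, 0, 1, 0, 1, 1, 1],
    "group":  ["a", "a", "a", "a", "b", "b", "b", "b"],
})

# Recall per group; large gaps suggest a classification-parity violation.
per_group = df.groupby("group").apply(
    lambda g: recall_score(g["y_true"], g["y_pred"])
)
print(per_group)
print("max recall gap between groups:", per_group.max() - per_group.min())
```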
Fairness Resources
• Google’s Machine Learning Fairness
• https://developers.google.com/machine-learning/fairness-overview
• AI Fairness 360 https://github.com/IBM/AIF360
• FairML https://github.com/adebayoj/fairml
• What-If Tool https://pair-code.github.io/what-if-tool/
Quick Summary
• Machine learning workflows are multi-step
• Automated ML addresses some, but not all, steps
• Lots of data engineering and monitoring still required
