Agile Experiments in Machine Learning
About me
• Mathias @brandewinder
• F# & Machine Learning
• Based in San Francisco
• I do have a tiny accent 
Why this talk?
• Machine learning competition as a team
• Team work requires process
• Code, but “subtly different”
• Statically typed functional with F#
These are unfinished thoughts
Code on GitHub
• JamesSDixon/Kaggle.HomeDepot
• mathias-brandewinder/Presentations
Plan
• The problem
• Creating & iterating Models
• Pre-processing of Data
• Parting thoughts
Kaggle Home Depot
Team & Results
• Jamie Dixon (@jamie_Dixon), Taylor Wood (@squeekeeper), et alii
• Final ranking: 122nd/2125 (top 6%)
The question
Search: “6 inch damper”
Product: “Battic Door Energy Conservation Products Premium 6 in. Back Draft Damper”
Is this any good?
The data
"Simpson Strong-Tie 12-Gauge Angle","l bracket",2.5
"BEHR Premium Textured DeckOver 1-gal. #SC-141 Tugboat Wood and
Concrete Coating","deck over",3
"Delta Vero 1-Handle Shower Only Faucet Trim Kit in Chrome (Valve Not
Included)","rain shower head",2.33
"Toro Personal Pace Recycler 22 in. Variable Speed Self-Propelled Gas
Lawn Mower with Briggs & Stratton Engine","honda mower",2
"Hampton Bay Caramel Simple Weave Bamboo Rollup Shade - 96 in. W x 72
in. L","hampton bay chestnut pull up shade",2.67
"InSinkErator SinkTop Switch Single Outlet for InSinkErator
Disposers","disposer",2.67
"Sunjoy Calais 8 ft. x 5 ft. x 8 ft. Steel Tile Fabric Grill
Gazebo","grill gazebo",3
...
The problem
• Given a Search, and the Product that was recommended,
• Predict how Relevant the recommendation is,
• Rated from terrible (1.0) to awesome (3.0).
The competition
• 70,000 training examples
• 20,000 search + product to predict
• Smallest RMSE* wins
• About 3 months
*RMSE (root mean square error) ~ average distance between predicted and correct values
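As a back-of-the-envelope reference, a minimal F# sketch of the metric (not the competition’s scoring code) could look like this:

// a minimal sketch of RMSE: square the prediction errors,
// average them, take the square root
let rmse (predicted: float []) (actual: float []) =
    Array.map2 (fun p a -> (p - a) ** 2.0) predicted actual
    |> Array.average
    |> sqrt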
Machine Learning: Experiments in Code
An obvious solution
// domain model
type Observation = {
    Search: string
    Product: string
}

// prediction function
let predict (obs:Observation) = 2.0
So… Are we done?
Code, but…
• Domain is trivial
• No obvious tests to write
• Correctness is (mostly) unimportant
What are we trying to do here?
We will change the function predict, over and over and over again, trying to be creative, and come up with a predict function that fits the data better.
Observation
• Single feature
• Never complete, no binary test
• Many experiments
• Possibly in parallel
• No “correct” model - any model could work. If it performs better, it is better.
Experiments
We care about “something”
What we want
Observation → Model → Prediction
What we really mean
Observation → Model → Prediction
x1, x2, x3 → f(x1, x2, x3) → y
We formulate a model
What we have
Observation → Result (many labeled examples)
We calibrate the model
“Prediction is very difficult, especially if it’s about the future.”
We validate the model
… which becomes the “current best truth”
Overall process
Formulate model → Calibrate model → Validate model
ML: experiments in code
Formulate model: features
Calibrate model: learn
Validate model
Modelling
• Transform Observation into Vector
• Ex: Search length, % matching words, …
• [17.0; 0.35; 3.5; …]
• Learn f, such that f(vector)~Relevance
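As a minimal sketch, assuming feature functions like the ones catalogued later, the transformation is just applying each feature to the observation:

// hypothetical sketch: apply each feature function to an observation
// to produce its numeric vector representation
let featurize (features: (Observation -> float) list) (obs: Observation) =
    features |> List.map (fun f -> f obs)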
Learning with Algorithms
Validating
• Leave some of the data out
• Learn on part of the data
• Evaluate performance on the rest
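A minimal sketch of such a hold-out split, assuming examples is an array of labeled examples (and ignoring shuffling):

// minimal sketch: keep the first 70% of the examples for training,
// the remaining 30% for validation
let holdout ratio (examples: 'a []) =
    let cutoff = float examples.Length * ratio |> int
    examples.[.. cutoff - 1], examples.[cutoff ..]

// hypothetical usage: let training, validation = holdout 0.7 examples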
Recap
• Traditional software: incrementally build solutions by completing discrete features
• Machine Learning: create experiments, hoping to improve a predictor
• Traditional process likely inadequate
Practice
How the Sausage is Made
How does it look?
// load data
// extract features as vectors
// use some algorithm to learn
// check how good/bad the model does
An example
What are the problems?
• Hard to track features
• Hard to swap algorithm
• Repeat same steps
• Code doesn’t reflect what we are after
wasteful
/ˈweɪstfʊl, -f(ə)l/
adjective
1. (of a person, action, or process) using or expending something of value carelessly, extravagantly, or to no purpose.
To avoid waste, build flexibility where there is volatility, and automate repeatable steps.
Strategy
• Use types to represent what we are doing
• Automate everything that doesn’t change: data loading, algorithm learning, evaluation
• Make what changes often (and is valuable) easy to change: creation of features
Core model
type Observation = {
    Search: string
    Product: string }

type Relevance = float
type Predictor = Observation -> Relevance
type Feature = Observation -> float

type Example = Relevance * Observation
type Model = Feature []
type Learning = Model -> Example [] -> Predictor
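Given these types, even the evaluation step nearly writes itself; a minimal sketch, using RMSE as in the competition:

// minimal sketch: RMSE of a predictor over labeled examples
let evaluate (predictor: Predictor) (examples: Example []) =
    examples
    |> Array.averageBy (fun (relevance, obs) ->
        let err = predictor obs - relevance
        err * err)
    |> sqrt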
“Catalog of Features”
let ``search length`` : Feature =
    fun obs -> obs.Search.Length |> float

let ``product title length`` : Feature =
    fun obs -> obs.Product.Length |> float

let ``matching words`` : Feature =
    fun obs ->
        let w1 = obs.Search.Split ' ' |> set
        let w2 = obs.Product.Split ' ' |> set
        Set.intersect w1 w2 |> Set.count |> float
Experiments
// shared/common data loading code
let model = [|
    ``search length``
    ``product title length``
    ``matching words``
    |]

let predictor = RandomForest.regression model training
let quality = evaluate predictor validation
[Diagram: a catalog of features (Feature 1, 2, 3, …) and algorithms (Algorithm 1, 2, 3, …); an experiment/model picks a subset (e.g. Feature 1, Feature 3, Algorithm 2), while data loading and validation are shared/reusable.]
Example, revisited
Food for thought
• Use types for modelling
• Model the process, not the entity
• Cross-validation replaces tests
Domain modelling?
// Object-oriented style
type Observation = {
    Search: string
    Product: string }
    with member this.SearchLength =
            this.Search.Length

// Properties as functions
type Observation = {
    Search: string
    Product: string }

let searchLength (obs:Observation) =
    obs.Search.Length

// "object" as a bag of functions
let model = [
    fun obs -> searchLength obs
]
Did it work?
Recap
• F# types to model the Domain with a common “language” across scripts
• Separate code elements by role, to enable focusing on the high-value activity: the creation of features
The unbearable heaviness of data
Reproducible research
• Anyone must be able to re-compute everything, from scratch
• Model is meaningless without the data
• Don’t tamper with the source data
• Script everything
Analogy: Source Control + Automated Build
If I check out code from source control, it should work.
One simple main idea: does the Search query look like the Product?
Dataset normalization
• “ductless air conditioners”, “GREE Ultra Efficient 18,000 BTU (1.5Ton) Ductless (Duct Free) Mini Split Air Conditioner with Inverter, Heat, Remote 208-230V”
• “6 inch damper”, “Battic Door Energy Conservation Products Premium 6 in. Back Draft Damper”
• “10000 btu windowair conditioner”, “GE 10,000 BTU 115-Volt Electronic Window Air Conditioner with Remote”
Pre-processing pipeline
let normalize (txt:string) =
    txt
    |> fixPunctuation
    |> fixThousands
    |> cleanUnits
    |> fixMisspellings
    |> etc…
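The individual steps above are placeholders; as an illustration, a hypothetical fixThousands could be a single regex that strips thousands separators, so that “10,000 BTU” and “10000 BTU” compare equal:

open System.Text.RegularExpressions

// hypothetical sketch of one normalization step:
// drop the thousands separator inside numbers ("10,000" -> "10000")
let fixThousands (txt: string) =
    Regex.Replace(txt, @"(\d),(\d{3})", "$1$2")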
Lesson learnt
• Pre-processing data matters
• Pre-processing is slow
• Also, Regex. Plenty of Regex.
Tension
Keep data intact & regenerate outputs vs. cache intermediate results
There are only two hard problems in computer science: cache invalidation, and being willing to relocate to San Francisco.
Observations
• If re-computing everything is fast, then re-compute everything, every time.
• Can you isolate causes of change?
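One cheap way to isolate a slow, rarely-changing step is to cache its output to disk; a minimal sketch, with a hypothetical file path and computation:

open System.IO

// minimal sketch: re-use a cached result if the file exists,
// otherwise compute it, save it, and return it
let cached (path: string) (compute: unit -> string) =
    if File.Exists path then File.ReadAllText path
    else
        let result = compute ()
        File.WriteAllText(path, result)
        result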
[Diagram: the same shared structure as before, with a pre-processing step added on the shared side and a cache for its intermediate results.]
Conclusion
General
• Don’t be religious about process
• Why do you follow a process?
• Identify where you waste energy
• Build flexibility around volatility
• Automate the repeatable parts
Statically typed functional
• Super clean scripts / data pipelines
• Types help define clear domain models
• Types prevent dumb mistakes
Open questions
• Better way to version features?
• Experiment is not an entity?
• Is pre-processing a feature?
• Something missing in overall versioning
• Better understanding of data/code dependencies (reuse computation, …)
Shameless plug
I have a book out: “Machine Learning Projects for .NET Developers”, Apress
Thank you!
@brandewinder / brandewinder.com
• Come chat if you are interested in the topic!
• Check out fsharp.org…