CHAPTER 1:
Introduction
Why “Learn”?
• Machine learning is programming computers to optimize a performance criterion using example data or past experience.
• There is no need to “learn” to calculate payroll.
• Learning is used when:
  - human expertise does not exist (navigating on Mars),
  - humans are unable to explain their expertise (speech recognition),
  - the solution changes over time (routing on a computer network),
  - the solution needs to be adapted to particular cases (user biometrics).
What We Talk About When We Talk About “Learning”
• Learning general models from data of particular examples.
• Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
• Example in retail, from customer transactions to consumer behavior: people who bought “Da Vinci Code” also bought “The Five People You Meet in Heaven” (www.amazon.com).
• Build a model that is a good and useful approximation to the data.
Data Mining/KDD
Definition: “KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data” (Fayyad)
Applications:
• Retail: market basket analysis, customer relationship management (CRM)
• Finance: credit scoring, fraud detection
• Manufacturing: optimization, troubleshooting
• Medicine: medical diagnosis
• Telecommunications: quality-of-service optimization
• Bioinformatics: motifs, alignment
• Web mining: search engines
• ...
What is Machine Learning?
• Machine learning: the study of algorithms that improve their performance at some task with experience.
• Optimize a performance criterion using example data or past experience.
• Role of statistics: inference from a sample.
• Role of computer science: efficient algorithms to
  - solve the optimization problem, and
  - represent and evaluate the model for inference.
Growth of Machine Learning
• Machine learning is the preferred approach to
  - speech recognition and natural language processing,
  - computer vision,
  - medical outcomes analysis,
  - robot control,
  - computational biology.
• This trend is accelerating because of
  - improved machine learning algorithms,
  - improved data capture, networking, and faster computers,
  - software too complex to write by hand,
  - new sensors / IO devices,
  - demand for self-customization to user and environment,
  - the difficulty of extracting knowledge from human experts, which contributed to the failure of expert systems in the 1980s.
Applications
• Association analysis
• Supervised learning
  - Classification
  - Regression/prediction
• Unsupervised learning
• Reinforcement learning
Learning Associations
• Basket analysis: P(Y | X) is the probability that somebody who buys X also buys Y, where X and Y are products/services.
• Example: P(chips | beer) = 0.7
Market-basket transactions:

  TID  Items
   1   Bread, Milk
   2   Bread, Diaper, Beer, Eggs
   3   Milk, Diaper, Beer, Coke
   4   Bread, Milk, Diaper, Beer
   5   Bread, Milk, Diaper, Coke
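A minimal Python sketch of estimating such a conditional probability directly from the five transactions above; the helper name conditional_prob is purely illustrative:

```python
# Transactions copied from the market-basket table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def conditional_prob(x, y, data):
    """Estimate P(y | x) as count(baskets with x and y) / count(baskets with x)."""
    with_x = [t for t in data if x in t]
    if not with_x:
        return 0.0
    return sum(1 for t in with_x if y in t) / len(with_x)

# Of the 3 baskets containing Beer, all 3 also contain Diaper -> 1.0
print(conditional_prob("Beer", "Diaper", transactions))
```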
Classification
• Example: credit scoring
• Differentiating between low-risk and high-risk customers from their income and savings
• Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
[Figure: customers plotted in the income/savings plane, with the discriminant thresholds defining the model]
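A minimal sketch of applying such a threshold discriminant; the threshold values below are hypothetical stand-ins for values that would be learned from labeled customer data:

```python
# Hypothetical thresholds; in practice these would be learned from data.
THETA1 = 30_000  # income threshold
THETA2 = 10_000  # savings threshold

def credit_risk(income: float, savings: float) -> str:
    """IF income > theta1 AND savings > theta2 THEN low-risk ELSE high-risk."""
    if income > THETA1 and savings > THETA2:
        return "low-risk"
    return "high-risk"

print(credit_risk(45_000, 12_000))  # low-risk
print(credit_risk(45_000, 5_000))   # high-risk
```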
Classification: Applications
• Also known as pattern recognition.
• Face recognition: pose, lighting, occlusion (glasses, beard), make-up, hair style.
• Character recognition: different handwriting styles.
• Speech recognition: temporal dependency.
  - Use of a dictionary or the syntax of the language.
  - Sensor fusion: combine multiple modalities, e.g., visual (lip image) and acoustic, for speech.
• Medical diagnosis: from symptoms to illnesses.
• Web advertising: predict whether a user will click on an ad on the Internet.
Face Recognition
[Figure: training examples of a person and corresponding test images]
AT&T Laboratories, Cambridge UK: http://www.uk.research.att.com/facedatabase.html
Prediction: Regression
• Example: price of a used car
• x: car attributes; y: price
• Model: y = g(x | θ), where g(·) is the model and θ its parameters
• Linear case: y = wx + w0
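A minimal sketch of fitting the linear model y = wx + w0 by ordinary least squares with NumPy; the car data below is purely illustrative:

```python
import numpy as np

# Illustrative data: car age in years vs. price in $1000s.
x = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
y = np.array([18.0, 14.5, 11.0, 8.0, 4.5])

# np.polyfit with degree 1 returns [w, w0] minimizing squared error.
w, w0 = np.polyfit(x, y, 1)
print(f"g(x) = {w:.2f} * x + {w0:.2f}")
print("predicted price at x = 4:", w * 4 + w0)
```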
Regression Applications
• Navigating a car: angle of the steering wheel (CMU NavLab)
• Kinematics of a robot arm: given a target position (x, y), learn the joint angles
  α1 = g1(x, y)
  α2 = g2(x, y)
[Figure: two-joint robot arm with joint angles α1, α2 reaching the point (x, y)]
Supervised Learning: Uses
• Prediction of future cases: use the rule to predict the output for future inputs.
• Knowledge extraction: the rule is easy to understand.
• Compression: the rule is simpler than the data it explains.
• Outlier detection: exceptions that are not covered by the rule, e.g., fraud.
• Example: decision trees are tools that create such rules (see the sketch below).
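A minimal sketch of extracting human-readable rules from a decision tree, assuming scikit-learn is available; the toy credit data is purely illustrative:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Columns: [income, savings] (in $1000s); labels: 0 = high-risk, 1 = low-risk.
X = [[20, 5], [50, 12], [60, 3], [45, 15], [25, 9]]
y = [0, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
# export_text prints the learned tree as nested IF/THEN threshold rules.
print(export_text(tree, feature_names=["income", "savings"]))
```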
Unsupervised Learning
• Learning “what normally happens”
• No output labels
• Clustering: grouping similar instances
• Other applications: summarization, association analysis
• Example applications (a color-quantization sketch follows this list):
  - Customer segmentation in CRM
  - Image compression: color quantization
  - Bioinformatics: learning motifs
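A minimal sketch of color quantization with k-means clustering, assuming scikit-learn; the random pixels stand in for a real image:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for an image: 1000 RGB pixels with values in [0, 255].
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(1000, 3)).astype(float)

# Cluster the colors; each pixel is then replaced by its cluster's
# centroid, compressing the palette to just 8 colors.
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(pixels)
quantized = km.cluster_centers_[km.labels_]
print(quantized.shape)  # (1000, 3), but only 8 distinct colors remain
```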
Reinforcement Learning
• Topics:
  - Policies: what actions should an agent take in a particular situation
  - Utility estimation: how good a state is (used by the policy)
• No supervised output, but delayed reward
• Credit assignment problem: what was responsible for the outcome (see the Q-learning sketch after this list)
• Applications:
  - Game playing
  - A robot in a maze
  - Multiple agents, partial observability, ...
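A minimal sketch of tabular Q-learning, one standard way to learn utilities and a policy from delayed reward; the action set and hyperparameter values are illustrative assumptions:

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1      # learning rate, discount, exploration
ACTIONS = ["up", "down", "left", "right"]  # e.g., moves for a robot in a maze
Q = defaultdict(float)                     # Q[(state, action)] -> utility estimate

def choose_action(state):
    """Epsilon-greedy policy: usually exploit the best-known action, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One Q-learning step: nudge Q toward reward + discounted best future value.
    Propagating value backward like this addresses the credit assignment problem."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

update("s0", "up", 1.0, "s1")
print(Q[("s0", "up")])  # 0.1 after one step toward the reward of 1.0
```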
Resources: Datasets
• UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
• UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html
• Statlib: http://lib.stat.cmu.edu/
• Delve: http://www.cs.utoronto.ca/~delve/
Resources: Journals
• Journal of Machine Learning Research: www.jmlr.org
• Machine Learning
• IEEE Transactions on Neural Networks
• IEEE Transactions on Pattern Analysis and Machine Intelligence
• Annals of Statistics
• Journal of the American Statistical Association
• ...
Resources: Conferences
• International Conference on Machine Learning (ICML)
• European Conference on Machine Learning (ECML)
• Neural Information Processing Systems (NIPS)
• Conference on Computational Learning Theory (COLT)
• International Joint Conference on Artificial Intelligence (IJCAI)
• ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)
• IEEE International Conference on Data Mining (ICDM)
Summary: COSC 6342
• An introductory course that covers a wide range of machine learning techniques, from basic to state-of-the-art.
• More theoretically and statistically oriented than the other courses I teach; continuous work may be needed not “to get lost”.
• You will learn about the methods you have heard about: naïve Bayes, belief networks, regression, nearest neighbor (kNN), decision trees, support vector machines, learning ensembles, overfitting, regularization, dimensionality reduction and PCA, error bounds, parameter estimation, mixture models, comparing models, density estimation, clustering (centering on k-means, EM, and DBSCAN), and active and reinforcement learning.
• Covers algorithms, theory, and applications.
• It's going to be fun and hard work.
Which Topics Deserve More Coverage, If We Had More Time?
• Graphical models/belief networks (we just ran out of time)
• More on adaptive systems
• Learning theory
• More on clustering and association analysis (covered by the Data Mining course)
• More on feature selection and feature creation
• More on prediction
• Possibly: more in-depth coverage of optimization techniques, neural networks, hidden Markov models, how to conduct a machine learning experiment, comparing machine learning algorithms, ...