What is Machine Learning?
     Types of Problems and Situations
                Content of the Course
                      Documentation




Statistical Machine Learning from Data
              Introduction to Machine Learning


                             Samy Bengio

      IDIAP Research Institute, Martigny, Switzerland, and
 École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
                       bengio@idiap.ch
                http://www.idiap.ch/~bengio




                        November 30, 2005
Samy Bengio      Statistical Machine Learning from Data         1




1   What is Machine Learning?

2   Types of Problems and Situations

3   Content of the Course

4   Documentation








1   What is Machine Learning?






What is Machine Learning? (Graphical View)






What is Machine Learning?



     Learning is an essential human property
     Learning means changing in order to do better (according to a
     given criterion) when a similar situation arises
     Learning IS NOT learning by heart
     Any computer can learn by heart; the difficulty is to generalize
     a behavior to a novel situation
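     A minimal sketch of the difference between learning by heart and generalizing, in Python (the language used in the course laboratories). The toy data and the nearest-neighbour rule are illustrative assumptions and do not come from the slides. The memorizer can only answer for inputs it has already seen. The nearest-neighbour rule extends its behavior to novel inputs.

```python
# Hypothetical toy training set: input -> desired output.
TRAIN = {1.0: "small", 2.0: "small", 8.0: "large", 9.0: "large"}


def memorizer(x):
    """Learning by heart: return the stored answer, or None for any unseen input."""
    return TRAIN.get(x)


def nearest_neighbour(x):
    """A simple generalizing rule: copy the label of the closest training input."""
    closest = min(TRAIN, key=lambda t: abs(t - x))
    return TRAIN[closest]


for x in (2.0, 2.5, 7.0):
    print(x, memorizer(x), nearest_neighbour(x))
# 2.0 small small
# 2.5 None small
# 7.0 None large
```

     Nearest neighbour is only one of many possible generalizing rules. It is used here because it is the shortest one to write down.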






Why Learning is Difficult?

     Given a finite amount of training data, you have to derive a
     relation for an infinite domain
     In fact, there is an infinite number of such relations




     How should we draw the relation?





     Which relation is the most appropriate?





     ... the hidden test points...
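     A small sketch of why the training data alone cannot pick a relation, using assumed toy data. The training points are taken from y = x at integer inputs. Every function x + c·sin(πx) passes through all of them, so infinitely many relations fit the training set exactly. The hidden test points are assumed here to lie at non-integer inputs and to follow y = x as well, and evaluating on them is what separates the candidates. The functions and data are illustrative and are not taken from the slides.

```python
import math

# Hypothetical training set: five points on the line y = x.
TRAIN_X = [0.0, 1.0, 2.0, 3.0, 4.0]
TRAIN_Y = [0.0, 1.0, 2.0, 3.0, 4.0]

# Hypothetical hidden test points, assumed to follow the same relation y = x.
TEST_X = [0.5, 1.5, 2.5, 3.5]
TEST_Y = [0.5, 1.5, 2.5, 3.5]


def make_relation(c):
    """Return x -> x + c*sin(pi*x); the sine term vanishes at every integer input."""
    return lambda x: x + c * math.sin(math.pi * x)


def mean_squared_error(h, xs, ys):
    """Average squared difference between h(x) and the target y."""
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


for c in (0.0, 1.0, -2.0):
    h = make_relation(c)
    train_err = mean_squared_error(h, TRAIN_X, TRAIN_Y)
    test_err = mean_squared_error(h, TEST_X, TEST_Y)
    # Training error is (numerically) zero for every c; test error is approximately c**2.
    print(f"c={c:+.1f}  train MSE={train_err:.1e}  test MSE={test_err:.3f}")
```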



Occam’s Razor

      William of Occam: a monk who lived in the 14th century
      Principle of Parsimony:
                One should not increase, beyond what is necessary,
                the number of entities required to explain anything

      When many solutions are available for a given problem, we
      should select the simplest one
      But what do we mean by simple?
      We will use prior knowledge about the problem to be solved to
      define what a simple solution is
                Example of a prior: smoothness
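      One possible way to turn a smoothness prior into a selection rule, sketched under assumptions. Each candidate's roughness is measured as the integral of its squared second derivative, approximated by finite differences on a grid. The candidate with the smallest value is then selected. The two candidates are hypothetical relations that agree at x = 0, 1, 2, 3, 4. This roughness measure is a common choice but not the only one.

```python
import math


def f(x):
    """Candidate 1: a straight line."""
    return x


def g(x):
    """Candidate 2: agrees with f at every integer but oscillates in between."""
    return x + math.sin(math.pi * x)


def roughness(h, lo=0.0, hi=4.0, n=400):
    """Approximate the integral of h''(x)**2 over [lo, hi] with central second differences."""
    step = (hi - lo) / n
    total = 0.0
    for i in range(1, n):
        x = lo + i * step
        second_diff = (h(x - step) - 2.0 * h(x) + h(x + step)) / step ** 2
        total += second_diff ** 2 * step
    return total


def simplest(candidates):
    """Occam-style choice: the candidate with the lowest roughness."""
    return min(candidates, key=roughness)


print(roughness(f), roughness(g))  # about 0 for the line, about 195 for g
print(simplest([f, g]).__name__)   # f
```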




Learning as a Search Problem

    [Figure labels: Initial solution; Set of solutions chosen a priori; Set of solutions compatible with training set; Optimal solution]
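    The slide frames learning as a search that moves from an initial solution toward an optimal one. Gradient descent is one common way to carry out such a search, although the slides do not prescribe it here. In this hypothetical sketch, the set of solutions chosen a priori is the lines y = w·x through the origin. The training data come from y = 2x, and the search starts from w = 0.

```python
# Hypothetical training set generated by y = 2x.
XS = [1.0, 2.0, 3.0]
YS = [2.0, 4.0, 6.0]


def loss(w):
    """Mean squared error of the model y = w * x on the training set."""
    return sum((w * x - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def grad(w):
    """Derivative of loss(w) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(XS, YS)) / len(XS)


def search(w_init=0.0, learning_rate=0.05, steps=200):
    """Gradient descent: repeatedly step from the current solution against the gradient."""
    w = w_init
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


w_star = search()
print(w_star, loss(w_star))  # w_star is close to 2.0, loss close to 0
```

    The learning rate of 0.05 is an assumption. For this data the loss is quadratic with curvature 28/3, so any learning rate above 6/28 (about 0.21) makes the search diverge instead of converging.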








2   Types of Problems and Situations






Types of Problems

     There are 3 kinds of problems:
          regression, classification, density estimation



          [Figure: density estimation, P(X) plotted against X]
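     A minimal sketch of the three kinds of problems on one-dimensional toy data. It assumes the simplest textbook model for each: a least-squares line for regression, a nearest-class-mean rule for classification, and a Gaussian fitted by maximum likelihood for density estimation. These model choices and data are illustrative and are not the models developed in the course.

```python
import math


# Regression: predict a real-valued target. Ordinary least squares for y = a*x + b.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx


# Classification: predict a discrete label. Nearest class mean on 1-D inputs.
def fit_class_means(xs, labels):
    means = {}
    for c in set(labels):
        pts = [x for x, lab in zip(xs, labels) if lab == c]
        means[c] = sum(pts) / len(pts)
    return means


def classify(means, x):
    return min(means, key=lambda c: abs(x - means[c]))


# Density estimation: model P(X) itself. A 1-D Gaussian fitted by maximum likelihood.
def fit_gaussian(xs):
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # ML estimate divides by n, not n - 1
    return mu, var


def gaussian_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
means = fit_class_means([0.0, 1.0, 9.0, 10.0], ["a", "a", "b", "b"])
mu, var = fit_gaussian([1.0, 2.0, 3.0])
print(a, b)                                  # 2.0 1.0
print(classify(means, 2.0), classify(means, 8.0))  # a b
print(mu, var, gaussian_pdf(mu, mu, var))
```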





Types of Learning

  Supervised learning:
      The training data contains the desired behavior
      (desired class, outcome, etc.)

  Reinforcement learning:
      The training data contains partial targets
      (for instance, simply whether the machine did well or not)

  Unsupervised learning:
      The training data is raw: no class or target is given
      There is often a hidden goal in the task (compression,
      maximum likelihood, etc.)
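  The three situations differ in what the training data contains. The hypothetical records below sketch that difference. The small 2-means routine illustrates one possible hidden goal of unsupervised learning, compression: each raw input is summarized by the nearest of two code values. The data and the choice of 2-means are assumptions made for illustration.

```python
from typing import List, Tuple

# Supervised: each example pairs an input with its desired output (here, a class label).
SUPERVISED: List[Tuple[float, str]] = [(0.1, "low"), (0.2, "low"), (0.9, "high")]

# Reinforcement: only a partial target, a reward saying whether the action was good (1.0) or not (0.0).
REINFORCEMENT: List[Tuple[float, str, float]] = [(0.1, "high", 0.0), (0.9, "high", 1.0)]

# Unsupervised: raw inputs only, no class or target.
UNSUPERVISED: List[float] = [0.1, 0.15, 0.2, 0.85, 0.9, 0.95]


def two_means(xs: List[float], iters: int = 10) -> Tuple[float, float]:
    """1-D k-means with k = 2; the two centres act as a two-symbol code for the data."""
    c0, c1 = min(xs), max(xs)
    for _ in range(iters):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        # Keep a centre unchanged if its group happens to be empty.
        c0 = sum(g0) / len(g0) if g0 else c0
        c1 = sum(g1) / len(g1) if g1 else c1
    return c0, c1


def compress(xs: List[float], centres: Tuple[float, float]) -> List[int]:
    """Replace each input by the index (0 or 1) of its nearest centre."""
    c0, c1 = centres
    return [0 if abs(x - c0) <= abs(x - c1) else 1 for x in xs]


centres = two_means(UNSUPERVISED)
print(centres)                          # approximately (0.15, 0.9)
print(compress(UNSUPERVISED, centres))  # [0, 0, 0, 1, 1, 1]
```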



Applications

      Vision Processing
           Face detection/verification
           Handwriting recognition
      Speech Processing
           Phoneme/Word/Sentence/Person recognition
      Others
           Indexing: Google, text mining, information retrieval
           Finance: asset prediction, portfolio and risk management
           Telecom: traffic prediction
           Data mining: exploiting the huge datasets held by large
           corporations
           Games: backgammon, Go
           Control: robots
      ... and plenty of others of course!





3   Content of the Course






Content of the Course

     Theoretical Issues
          What are the theoretical foundations for statistical learning?
          How can we measure the expected performance of a model?
     Modeling Issues
          Models specialized for classification, regression, distributions,
          sequences, images, etc.
          For each model, we need to devise a training algorithm
     Others
          Other practical issues, such as feature selection, parameter
          sharing, etc.
     Laboratories
          About one third of the course will consist of practical
          laboratories using the Python programming language






4   Documentation






Journals and Conferences


     Journals:
          Journal of Machine Learning Research
          Neural Computation
          IEEE Transactions on Neural Networks
          IEEE Transactions on Pattern Analysis and Machine
          Intelligence
     Conferences:
          NIPS: Neural Information Processing Systems
          COLT: Computational Learning Theory
          ICML: International Conference on Machine Learning






Books and Lecture Notes

     Books:
          C. Bishop. Neural Networks for Pattern Recognition, 1995.
          V. Vapnik. The Nature of Statistical Learning Theory, 1995.
          T. Hastie, R. Tibshirani, J. Friedman. The Elements of
          Statistical Learning, 2001.
          B. Schölkopf, A. J. Smola. Learning with Kernels, 2002.
     Other lecture notes (some are in French):
          Bengio, Y.: http://www.iro.umontreal.ca/~pift6266/A03/
          Kegl, B.: http://www.iro.umontreal.ca/~kegl/ift6266/
          Jordan, M.:
          http://www.cs.berkeley.edu/~jordan/courses/281A-fall04/
          LeCun, Y.:
          http://www.cs.nyu.edu/~yann/2005f-G22-2565-001/




Electronic Resources



      Search engines:
           NIPS online: http://nips.djvuzone.org
           Citeseer: http://citeseer.ist.psu.edu/
           Google scholar: http://scholar.google.com/
      Machine learning libraries:
           Torch: http://www.Torch.ch
           Lush: http://lush.sf.net



