Agile Deployment of Predictive Analytics on Hadoop

Faster Insights through Open Standards
Hadoop Summit 2012



     © 2012 Datameer, Inc. All rights reserved.

Today's Session

    Ulrich Rueckert, Data Scientist, Datameer
    Michael Zeller, CEO, Zementis



    After this session, you will be able to…

    1.  Effectively deliver predictive solutions combining:
             a.  R, KNIME & Others               [Model Development]
             b.  Zementis Universal PMML Plug-in [Model Deployment & Execution]
             c.  Datameer                        [Scalable Hadoop Infrastructure]

    2.  Identify PMML as a vendor-neutral & open standard to:
             a.  Incorporate predictive models from virtually any commercial vendor or open source tool
             b.  Apply such models on Big Data

    3.  Leverage a lightweight, agile deployment process for predictive analytics to:
             a.  Accelerate time-to-market
             b.  Lower cost and complexity
             c.  Reuse existing predictive assets

Who is Datameer?

     §  “Business Intelligence on top of Hadoop”
     §  Established 2009 by Hadoop and enterprise software veterans
     §  Offices in Silicon Valley, New York and Germany




     §  Some customers:




Who is Zementis?

     §  Focus on Operational Predictive Analytics
     §  Offices in San Diego and Hong Kong
     §  Predictive Analytics Software Technology:
              •    ADAPA® Decision Engine (Predictive Models and Rules)
              •    ADAPA Add-in for Excel
              •    PMML Converter
              •    Universal PMML Plug-in (UPPI)


     §  Global Partner Network




Big Data and Analytics


        §  People and Sensor Data
             •  Transaction records
             •  Social media
             •  Climate information
             •  Mobile GPS signals
             •  Healthcare
             •  Smart Grid

        §  90% of today's data was created in the last 2 years

        §  Benefits from Analytics
             •  Descriptive Analytics answers “What happened?”
             •  Predictive Analytics answers “What will happen next?”


Operational Predictive Analytics

[Figure: Score Distribution, 1st Lien Stand-Alone Loans. Y axis: % Within Class (0% to 14%). X axis: Score (50 to 1000). Series: Goods, Bads, with polynomial trend lines Poly. (Goods) and Poly. (Bads).]

[Figure: % of Delinquent Loans per Month. Y axis: % of Delinquent Loans (0 to 90). X axis: Months (Jan to Nov). Legend (Score): 700, 750, 800, 850, 900, 950.]




From Model Building to Deployment

[Diagram: Model Building produces PMML (models); Model Deployment (Integration / Execution) runs them on the Datameer Server through UPPI.]



                                                          Simple Deployment & Execution
                                                          1.  Upload PMML file(s) in DAS
                                                          2.  PMML turns into custom function
                                                          3.  Seamlessly score data in Datameer
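
The three steps above can be pictured with a small sketch. It is only an illustration of the idea of turning a PMML file into a scoring function, not how UPPI or Datameer implement it. The PMML document, the field names (age, income, score) and the helper pmml_to_function are all hypothetical, and only a single linear RegressionTable is handled.

```python
import xml.etree.ElementTree as ET

# Namespace used by PMML 4.1 documents.
NS = {"p": "http://www.dmg.org/PMML-4_1"}

# Hypothetical, minimal PMML file: a linear regression on two inputs.
PMML_TEXT = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Illustrative response-score model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="score" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="age" coefficient="0.5"/>
      <NumericPredictor name="income" coefficient="0.001"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def pmml_to_function(pmml_text):
    """Step 2 (assumed): turn an uploaded PMML document into a callable."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    if table is None:
        raise ValueError("this sketch only handles one RegressionTable")
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("p:NumericPredictor", NS)
    }

    def score(record):
        # Linear regression: intercept plus the weighted sum of the inputs.
        return intercept + sum(c * record[name] for name, c in coefficients.items())

    return score


# Step 1: "upload" the file (here, an in-memory string).
score = pmml_to_function(PMML_TEXT)

# Step 3: score rows of data with the resulting function.
rows = [{"age": 30.0, "income": 50000.0}, {"age": 60.0, "income": 20000.0}]
scores = [score(row) for row in rows]
print(scores)
```

In a Hadoop setting the same function would be applied to every record in parallel; the sketch scores two in-memory rows only.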

PMML
Predictive Model Markup Language



     •  PMML is an XML-based language used to define statistical and data mining models and to share these between compliant applications.

     •  Mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models.

     •  Supported by all leading data mining tools, commercial and open-source.

     •  Allows for the clear separation of tasks: Model development vs. model deployment.

     •  Eliminates the need for custom code and proprietary model deployment solutions.

     •  Uniform deployment platform ensures scalability and reliability of model execution.

     PMML book available on Amazon.com
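
To make the "XML-based language" point concrete, here is a minimal sketch of the model-development side: a fitted linear regression written out as a PMML 4.1 document. The function regression_to_pmml, the field names and the coefficient values are hypothetical; real tools such as R or KNIME ship their own PMML exporters, and this sketch does not reproduce them.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_1"


def regression_to_pmml(intercept, coefficients, target="score"):
    """Serialize a linear regression as a small PMML document (illustrative)."""
    # Emit the PMML namespace as the default namespace, without a prefix.
    ET.register_namespace("", PMML_NS)

    def tag(name):
        return f"{{{PMML_NS}}}{name}"

    root = ET.Element(tag("PMML"), {"version": "4.1"})
    ET.SubElement(root, tag("Header"), {"description": "Illustrative model"})

    # DataDictionary: every field the model reads or predicts.
    fields = list(coefficients) + [target]
    dictionary = ET.SubElement(
        root, tag("DataDictionary"), {"numberOfFields": str(len(fields))}
    )
    for name in fields:
        ET.SubElement(
            dictionary,
            tag("DataField"),
            {"name": name, "optype": "continuous", "dataType": "double"},
        )

    # The model itself: mining schema plus one regression table.
    model = ET.SubElement(root, tag("RegressionModel"), {"functionName": "regression"})
    schema = ET.SubElement(model, tag("MiningSchema"))
    for name in coefficients:
        ET.SubElement(schema, tag("MiningField"), {"name": name})
    ET.SubElement(schema, tag("MiningField"), {"name": target, "usageType": "predicted"})

    table = ET.SubElement(
        model, tag("RegressionTable"), {"intercept": repr(float(intercept))}
    )
    for name, value in coefficients.items():
        ET.SubElement(
            table,
            tag("NumericPredictor"),
            {"name": name, "coefficient": repr(float(value))},
        )
    return ET.tostring(root, encoding="unicode")


# Hypothetical coefficients, as if produced by a model-building tool.
pmml_text = regression_to_pmml(10.0, {"age": 0.5, "income": 0.001})
print(pmml_text)
```

Because the output is plain XML, the tool that builds the model and the system that executes it only need to agree on the standard, which is the separation of tasks the slide describes.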
PMML: Predictive Model Management
  Integrating across all systems and processes



[Diagram: PMML at the center, connecting Business Process; Applications (CRM, ERP, EXCEL, etc.); and cloud platforms (IBM SmartCloud, Amazon EC2).]


PMML: One Standard, One Process


[Diagram: PMML at the center, linking Divisions, Service Providers, External Vendors, and Applications.]
Demo Setup

    §  End-to-end Model Development Lifecycle
    §  PMML Standard as the Glue

[Diagram: model development lifecycle with the Universal PMML Plug-In at the center. Stages: Data Analysis (Understand Client's Data); Model Design (Build Model(s) to Unlock Hidden Value); Development and Test (Demonstrate Model Performance); Model Deployment (Real-time Process Improvement and ROI).]


Demo: Annual Marketing Campaign

   §  Which customers should we target?
   §  Split 2011 results into a training set and a test set
   §  Learn model on training set
   §  Apply model on test set
   §  Fine-tune model until evaluation shows success
   §  Apply final model on 2012 customer list

[Diagram: the 2011 Campaign Results are split into a Subset for Training (yielding the Prediction Model) and a Subset for Testing (used for Model Evaluation); the Fine-Tuned Prediction Model is applied to the 2012 Customer List to produce Campaign Candidates.]
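
The demo workflow can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the data is synthetic, the single feature "purchases" is invented, and the model is a one-feature threshold rule rather than whatever model the demo actually used.

```python
# Synthetic stand-in for the 2011 campaign results (hypothetical data).
# A customer responds when purchases >= 6, with every 13th label flipped as noise.
results_2011 = [
    {
        "id": i,
        "purchases": (i * 7) % 11,
        "responded": ((i * 7) % 11 >= 6) != (i % 13 == 0),
    }
    for i in range(200)
]

# Split the 2011 results into a training set (70%) and a test set (30%).
training = [r for i, r in enumerate(results_2011) if i % 10 < 7]
testing = [r for i, r in enumerate(results_2011) if i % 10 >= 7]


def accuracy(threshold, records):
    """Share of records where the rule 'purchases >= threshold' matches the label."""
    hits = sum((r["purchases"] >= threshold) == r["responded"] for r in records)
    return hits / len(records)


def learn_threshold(records):
    """Learn the model: pick the threshold with the best training accuracy."""
    return max(range(0, 12), key=lambda t: accuracy(t, records))


# Learn on the training set, then evaluate on the held-out test set.
threshold = learn_threshold(training)
test_accuracy = accuracy(threshold, testing)
print("threshold:", threshold, "test accuracy:", round(test_accuracy, 3))

# Only apply the model to the 2012 list once the evaluation looks good enough.
TARGET_ACCURACY = 0.9  # assumed success criterion
customers_2012 = [{"id": 1000 + i, "purchases": (i * 3) % 11} for i in range(20)]
if test_accuracy >= TARGET_ACCURACY:
    candidates = [c["id"] for c in customers_2012 if c["purchases"] >= threshold]
else:
    candidates = []  # in practice: revise the model and evaluate again
print("campaign candidates:", candidates)
```

Keeping the test set out of the learning step is what makes the evaluation meaningful: the accuracy reported before deployment is measured on customers the model has not seen.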




Summary


Avoid Vendor Lock-in
•    Open Standards vs. Proprietary Code
•    Best-of-Breed Tool Set

Hadoop-based Scoring Paradigm
•    Minimize Data Movement
•    Massively Parallel Execution
•    Scale with Business Demand

Ease of Use, Fast ROI
•    Leverage Datameer UI
•    Deploy in Minutes vs. Months
•    No Coding Skills Required
Online Resources




 §  Learn More About PMML
       •  Data Mining Group website: http://www.dmg.org
       •  Join LinkedIn PMML Discussion Group: http://www.linkedin.com/groupRegistration?gid=2328634
       •  Articles, on-line videos, blogs: http://www.zementis.com/community.htm

 §  Product Info
       •  On Demand Webinar: http://data.datameer.com/power-of-big-data-insights-of-predictive-analytics/
       •  UPPI for Datameer: http://www.zementis.com/DAS-plugin.htm


