Big data + data science startup focus points
BIG DATA & DATA SCIENCE
START-UP FOCUS POINTS
+ BUSINESS AND TECHNOLOGY
REFERENCE ARCHITECTURE
@TomZorde
I HAVE AN IDEA FOR A DATA SCIENCE START-UP
• Use these slides to focus conversation
• What stage are you at?
• What is the problem you’re trying to solve?
• What type of business model would work?
• Tools? – A rapidly evolving space.
• Reference Architecture helps identify what level of the stack
we’re talking about.
AREAS OF EARLY FOCUS
SEED STAGE - Research & Development
1. Research & Define Concept, business model, internal & sourced capabilities
2. Define customer value proposition and identify target market
ANGEL – Business Planning & Product Development
1. Identify services and products required and evaluate gaps for go-to-market readiness
2. Source funding partner to build minimum viable product and get commitment for round 2 funding
3. Assemble team and build MVP prototype exceeding expectations
ROUND 1 / SERIES A FUNDING – Commercially operational
ROUND 2 / SERIES B FUNDING – Fully Operational
ROUND 3 / SERIES C FUNDING – Expansion
IPO / ACQUISITION
BUSINESS PLANNING & DEVELOPMENT - LOGICAL STEPS
1. Full business needs and information requirements
analysis. Business Drivers
• Revenue generation? Cost reduction? Customer
retention? Compliance?
• Process Improvement? Fraud detection?
Analytics? Dashboard?
• Solving a tough problem? Retiring/replacing
assets, technologies and systems?
2. Technology Evaluation and Selection
• Define requirements and objectives first
• Evaluate a variety of technology stacks, but
develop an evaluation framework first
3. Board Support for Start-up Resources
4. Prototyping, Discovery, and Planning
• Rent Infrastructure in Cloud – VMWare, AWS, MS
Azure and others
• Use Spare Hardware and Network Bandwidth
• Assessment, proposal, and project/program plan for
next steps
• Start small and keep delivering
5. Architecture Design, Estimation, Business Case
6. Obtain funding and executive sponsorships,
owners, etc.
7. SDLC: don’t forget hardware, security, testing,
data governance, etc.
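Step 2 above asks for an evaluation framework before any stack is compared. One common way to do that is a weighted scoring matrix; the sketch below is a minimal illustration, not a method taken from these slides. The criteria names, the weights, and the stack names ("Stack A", "Stack B") are all hypothetical placeholders to be replaced with your own requirements.

```python
from dataclasses import dataclass

# Hypothetical criteria and weights (assumptions, not from the slides).
# Weights sum to 1.0 so the result stays on the same 1-5 scale as the scores.
WEIGHTS = {
    "fits_requirements": 0.4,
    "maturity": 0.2,
    "team_skills": 0.2,
    "integration_effort": 0.2,
}


@dataclass
class Candidate:
    name: str
    scores: dict  # criterion -> score from 1 (poor) to 5 (excellent)


def weighted_score(candidate, weights=WEIGHTS):
    """Return the weighted sum of a candidate's criterion scores."""
    missing = set(weights) - set(candidate.scores)
    if missing:
        raise ValueError(f"{candidate.name} is missing scores for: {sorted(missing)}")
    return sum(weights[c] * candidate.scores[c] for c in weights)


def rank(candidates, weights=WEIGHTS):
    """Sort candidates from highest to lowest weighted score."""
    return sorted(candidates, key=lambda c: weighted_score(c, weights), reverse=True)


if __name__ == "__main__":
    # Illustrative scores only; real scores come from your own evaluation.
    stacks = [
        Candidate("Stack A", {"fits_requirements": 5, "maturity": 3,
                              "team_skills": 2, "integration_effort": 4}),
        Candidate("Stack B", {"fits_requirements": 3, "maturity": 5,
                              "team_skills": 4, "integration_effort": 3}),
    ]
    for stack in rank(stacks):
        print(f"{stack.name}: {weighted_score(stack):.2f}")
```

Agreeing on the criteria and weights before scoring any vendor keeps the comparison tied to the business drivers from step 1 rather than to whichever tool is most familiar.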
FORESEEABLE CHALLENGES
Business urgency, time to market pressures
• A Big Data / Data Science start-up needs careful planning
• Big Data needs infrastructure, software stacks, people, and a start-up plan
Lack of Big Data Resources, Lack of Sponsorships (except in some companies)
• Big Data is complex and requires multiple skill sets (mostly new to many companies) – Infrastructure, Administration,
Security, Programming, Testing, etc.
• Skepticism about Big Data
Integration with Existing Technologies and Systems
• Cannot develop isolated big data solutions
• Integration with existing systems will be a top challenge (requires both sides to do additional work)
Open Source: Stability, Maturity, and Security
INFORMATION AS A PRODUCT/SERVICE
TYPES OF RELEVANT BUSINESS MODELS
Differentiation
New Services
Customer Experience
Contextual Relevance
Brokering
Raw Data
Benchmarking
Analysis and Insight
(Meta Data)
Delivery
Market Place
Facilitator
Advertising
REFERENCE ARCHITECTURE
Decisions & Insight
Analytics & Discovery
Data Access and Distribution
Data Collection & Organisation
Infrastructure Platform
Monitoring, Alerts, Tools,
Security, Governance
• The technology stack is rapidly evolving with all traditional as well as new vendors providing offerings
• Open source tools remain at the foundation layers.
• Different use cases will require different technology tools.
REFERENCE ARCHITECTURE
Decisions & Insight
• IBM Watson
• Industry Specific
Analytics & Discovery
• SAP Business Objects
• IBM Cognos
• SAS Analytics
• Dell Statistica
• Oracle Hyperion
• Microsoft BI
• KNIME
• Pentaho
• Informatica
REFERENCE ARCHITECTURE
Data Access and Distribution
• Document: MongoDB, CouchDB
• Graph: Neo4j, Titan
• Key Value Pair: Riak, Redis
• Columnar: Cassandra, HBase
• Search: Lucene, Solr, Elasticsearch
Monitoring, Alerts, Tools, Security, Governance:
• Hadoop: Apache, Cloudera, Hortonworks,
MapR, IBM
• SQL Mapping: Hive
• Big Data Transformation: Pig
• Hadoop Load: Sqoop
• Realtime-ETL: Storm
• Cluster Computing: Apache Spark
• Languages: Python, Java, R, Scala
REFERENCE ARCHITECTURE
Data Collection & Organisation (Batch & Real-Time)
• Hadoop
• Hadoop MapReduce
• Mahout
Infrastructure Platform
• AWS
• Azure
• Mortar
• Google BigQuery
• Qubole
• Dell
• HP
• IBM
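The reference architecture is meant to show which level of the stack a conversation is about. As a rough sketch of that idea, the layers can be held in an ordered lookup table. The layer names and the sample tools below are copied from the slides; the function names `layer_of` and `layers_touched` are hypothetical helpers introduced only for this illustration, and the tool lists are deliberately partial.

```python
# Layers ordered top to bottom, as on the reference architecture slides.
# Only a few of the tools listed on the slides are included here.
REFERENCE_ARCHITECTURE = {
    "Decisions & Insight": ["IBM Watson"],
    "Analytics & Discovery": ["SAS Analytics", "KNIME", "Pentaho"],
    "Data Access and Distribution": ["MongoDB", "Neo4j", "Redis", "Cassandra", "Solr"],
    "Data Collection & Organisation": ["Hadoop", "Hadoop MapReduce", "Mahout"],
    "Infrastructure Platform": ["AWS", "Azure", "Google BigQuery"],
}


def layer_of(tool):
    """Return the layer a tool is listed under, or None if it is not in the table."""
    wanted = tool.strip().lower()
    for layer, tools in REFERENCE_ARCHITECTURE.items():
        if wanted in (t.lower() for t in tools):
            return layer
    return None


def layers_touched(tools):
    """Return the layers covered by the given tools, in top-to-bottom stack order."""
    found = {layer_of(t) for t in tools}
    return [layer for layer in REFERENCE_ARCHITECTURE if layer in found]


if __name__ == "__main__":
    # Hypothetical shortlist for a start-up idea.
    shortlist = ["AWS", "Hadoop", "KNIME"]
    for layer in layers_touched(shortlist):
        print(layer)
```

Listing which layers a shortlist covers also shows which layers it leaves empty, and that is often the more useful outcome of the exercise.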
BIG DATA & DATA SCIENCE
START-UP FOCUS POINTS
@TomZorde
Thank you
