GTU GeekDay Data Science and Applications
Data Science and
Applications
(An Introduction)
Kürşat İNCE
kince@havelsan.com.tr
Kürşat İNCE
1996 – Today HAVELSAN, Inc.
• 1996 - Development of HVL Firewall (#1 in Turkey)
• 2001 - Developer in various projects: TuAF IS, MELTEM, etc.
• 2010 - YGO Project/Product manager
• 2014 - Move to HVL Istanbul Office 
• 2014 - Systems Engineer
• 2016 - R&D Coordinator
June 2016 - Organizer at www.DataIstanbul.org
BSc, Bilkent University Computer Engineering, 1996
MSc, Bilkent University Computer Engineering, 1999
PhD, Gebze Technical University Computer Engineering (in progress)
FACILITIES
HAVELSAN HEADQUARTERS (ANKARA)
SIMULATION CENTER
NAVAL COMBAT SYSTEMS CENTER - İSTANBUL
R&D CENTER (METU Technopolis)
TEST & INTEGRATION FACILITIES
HAVELSAN KEY FACTS
SİSATEM
BUSINESS AREAS
• COMMAND, CONTROL & COMBAT SYSTEMS: Command & Control Solutions House of Turkey
• TRAINING TECHNOLOGIES & SIMULATION SYSTEMS: A Global Brand in Simulation & Training
• MANAGEMENT INFORMATION SYSTEMS: Leading E-Transformation Company of Turkey
• HOMELAND & CYBER SECURITY SOLUTIONS: Center of Excellence in Security Solutions
• Meetup Community
• Established: March 2016
• Members: ~1500
• Latest Events:
• Data Structures and Algorithms for Big Data
• Web Analytics and Conversion Rate Optimization
• Data Science and the Protection of Personal Data
• Planning hands-on Data Science course
/data_istanbul/dataistanbul
Agenda
• Data Science
• Roles, Skill Sets, and Process
• Applications
• Final Word
• Resources, etc.
Data Science
Evolution of Sciences
• Before 1600, empirical science
• Direct observations
• 1600-1950s, theoretical science
• Each discipline has grown a theoretical component. Theoretical models often motivate
experiments and generalize our understanding.
• 1950s-1990s, computational science
• Over the last 50 years, most disciplines have grown a third, computational branch (e.g.
empirical, theoretical, and computational ecology, or physics, or linguistics.)
• Computational Science traditionally meant simulation. It grew out of our inability to find
closed-form solutions for complex mathematical models.
• 1990-now, data science
• The flood of data from new scientific instruments and simulations
• The ability to economically store and manage petabytes of data online
• The Internet and computing Grid that makes all these archives universally accessible
Jim Gray and Alex Szalay, The World Wide Telescope: An Archetype for Online Science, Comm. ACM, 45(11): 50-54, Nov. 2002
Data Science is…
• ...the art of turning data into actions.
The Field Guide to Data Science by Booz Allen Hamilton
http://www.wired.co.uk/article/art-algorithm-recreates-paintings
J.M.W. Turner’s “The Wreck of a Transport Ship”
Van Gogh’s “The Starry Night”
Data Science is…
• …the exploration and quantitative analysis of all
available structured and unstructured data to
develop understanding, extract knowledge, and
formulate actionable results.
Where is all the data coming from?
Data
Value
From Data to Actions
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
The Sexiest Job Cartoon
Marion van de Wiel, DSC/e Workshop with industry in 2014.
Data Science is…
The Roles
Layers, from highest to lowest potential to support business decisions:
• Decision Making
• Data Presentation: visualization techniques
• Modelling and Algorithms: machine learning and statistical models
• Data Exploration: statistical analysis, data visualization…
• Data Preprocessing / Integration: missing values, duplicate values…
• Data Sources: paper, files, web documents, scientific experiments, database systems
Roles, listed in the same order:
• Customer / End User
• Business Analyst
• Data Scientists
• Data Engineer / DBA
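The preprocessing layer (missing values, duplicate values) can be illustrated with a minimal sketch. The record layout, the sample data, and the choice to fill a missing age with the rounded mean are all assumptions made for illustration; real projects choose an imputation strategy per field.

```python
from typing import List, Optional, Tuple

# Hypothetical raw records: (user_id, age, city). None marks a missing value.
Record = Tuple[str, Optional[int], str]

def drop_duplicates(records: List[Record]) -> List[Record]:
    """Keep the first occurrence of each identical record, preserving order."""
    seen = set()
    unique = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique

def fill_missing_age(records: List[Record]) -> List[Record]:
    """Replace a missing age with the rounded mean of the known ages."""
    known = [age for _, age, _ in records if age is not None]
    mean_age = round(sum(known) / len(known)) if known else None
    return [(uid, age if age is not None else mean_age, city)
            for uid, age, city in records]

raw = [
    ("u1", 34, "Istanbul"),
    ("u2", None, "Ankara"),    # missing value
    ("u1", 34, "Istanbul"),    # duplicate record
    ("u3", 40, "Izmir"),
]
clean = fill_missing_age(drop_duplicates(raw))
```

Duplicates are removed before imputation so that a repeated record does not skew the mean.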
Data Science Skill Set
Data Scientist: The Engineer of the Future
Technical Skills
Business Skills
Social Skills
Data Science Process
• Frame the Problem
• Collect Raw Data
• Process Data
• Explore Data
• Perform in-depth analysis
• Communicate the Results
Applications of Data
Science
Story of Target
https://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/
https://www.top500.org/news/watson-proving-better-than-doctors-in-diagnosing-cancer/
http://fortune.com/2016/10/11/ibm-watson-empoyees-cancer-drugs/
Web Analytics
• Web analytics is the measurement, collection, analysis
and reporting of web data for purposes of
understanding and optimizing web usage.
• Used to measure metrics / key performance indicators
such as
• Hit
• Page View Event
• Visitor
• Impression
• Bounce Rate
• Exit Rate
• Session Duration
• Click path
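Several of the metrics above can be derived from a plain log of page-view events. The sketch below is illustrative only: the event format `(session_id, seconds, page)` and the sample data are assumptions, sessions stand in for visitors, and commercial tools define these metrics with more nuance.

```python
from collections import Counter, defaultdict

# Hypothetical page-view events: (session_id, seconds_since_start, page)
events = [
    ("s1", 0, "/home"), ("s1", 30, "/pricing"), ("s1", 90, "/signup"),
    ("s2", 5, "/home"),
    ("s3", 10, "/pricing"), ("s3", 70, "/home"),
]

def summarize(events):
    """Compute simple web metrics from a list of page-view events."""
    # Group page views by session, ordered by time: this is the click path.
    sessions = defaultdict(list)
    for session_id, ts, page in sorted(events, key=lambda e: (e[0], e[1])):
        sessions[session_id].append((ts, page))

    page_views = Counter(page for _, _, page in events)
    # The exit page is the last page viewed in a session.
    exits = Counter(views[-1][1] for views in sessions.values())
    # A bounce is a session with exactly one page view.
    bounces = sum(1 for views in sessions.values() if len(views) == 1)
    durations = [views[-1][0] - views[0][0] for views in sessions.values()]

    return {
        "page_views": sum(page_views.values()),
        "sessions": len(sessions),
        "bounce_rate": bounces / len(sessions),
        "exit_rate": {p: exits[p] / page_views[p] for p in page_views},
        "avg_session_duration": sum(durations) / len(durations),
        "click_paths": {sid: [p for _, p in v] for sid, v in sessions.items()},
    }

report = summarize(events)
```

Here the bounce rate is the share of single-page sessions, and the exit rate of a page is the share of its views that ended a session.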
Web Analytics Software
• Google Analytics
• Yandex Metrica
• Count.ly (originated in Turkey)
• Rakam.io (originated in Turkey)
Count.ly
Collaborative Filtering
• Collaborative filtering is a method of making
automatic predictions (filtering) about the interests
of a user by collecting preferences or taste
information from many users (collaborating).
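A minimal sketch of user-based collaborative filtering follows. The users, items, and ratings are made up, and cosine similarity over co-rated items is only one of several common similarity choices; production recommenders are considerably more elaborate.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "ana": {"A": 5, "B": 3, "C": 4},
    "ben": {"A": 4, "B": 2, "C": 5, "D": 4},
    "cem": {"A": 1, "B": 5, "D": 2},
}

def cosine(u, v):
    """Cosine similarity computed over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[item]
        den += abs(sim)
    return num / den if den else None

# Predict how "ana" would rate item "D", which she has not rated yet.
predicted = predict("ana", "D", ratings)
```

Because "ana" rates items more like "ben" than like "cem", the prediction lands closer to ben's rating of D than to cem's.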
Collaborative Filtering
Collaborative Filtering
Linkedin People You May Know
Health Care Analytics
• Health care analytics covers the analysis activities made
possible by data collected in healthcare services, namely
• clinical data (electronic medical records),
• patient behavior and sentiment data,
• pharmaceutical and research and development data,
• claims and cost data.
Health Care Analytics
• Electronic Health Records (EHRs)
• Infrastructure and use cases to store, retrieve, and share
EHRs securely.
• Real-time Alerting
• Clinical Decision Support via wearables
• Predictive Analytics in Healthcare
• Increase the accuracy of diagnoses, support preventive medicine
and public health, detect risk of diabetes, etc.
• Telemedicine
• Delivery of remote clinical services such as remote patient
monitoring, initial diagnosis
• Telesurgery with the use of robots, etc.
http://www.datapine.com/blog/big-data-examples-in-healthcare/
Predictive Maintenance
• Reactive Point Processes: A New Approach to Predicting Power
Failures in Underground Electrical Systems. Seyda Ertekin, Cynthia
Rudin, Tyler McCormick. Annals of Applied Statistics, 2015.
• A new statistical model designed for predicting discrete events in
time (e.g. fires, explosions & power failures) based on past history.
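The idea of predicting an event rate from past history can be sketched roughly as below. This is not the model from the paper: the exponential decay kernels, the parameter values, and the use of inspections as a damping factor are assumptions chosen purely for illustration.

```python
from math import exp

def intensity(t, base_rate, event_times, inspection_times,
              excite=0.5, regulate=0.3, decay=0.1):
    """Illustrative event rate at time t, shaped by past history.

    Past failures temporarily raise the rate and past inspections
    temporarily lower it. Both effects fade exponentially over time.
    All parameter values are hypothetical.
    """
    boost = sum(excite * exp(-decay * (t - s)) for s in event_times if s < t)
    damp = sum(regulate * exp(-decay * (t - s)) for s in inspection_times if s < t)
    # A rate cannot be negative, so clip at zero.
    return max(0.0, base_rate * (1.0 + boost - damp))

# With no history the rate equals the base rate.
quiet = intensity(10.0, 2.0, [], [])
# A recent failure raises the rate; a recent inspection lowers it.
after_failure = intensity(10.0, 2.0, [9.0], [])
after_inspection = intensity(10.0, 2.0, [], [9.0])
```

Ranking assets by such a rate is one way to decide where to send maintenance crews first.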
Final Words
Resources and Datasets
• Kaggle Competitions http://kaggle.com
• UCI Machine Learning Repository
http://archive.ics.uci.edu/ml/
• KDnuggets http://www.kdnuggets.com/datasets/
• DataQuest https://www.dataquest.io/
• Massive Open Online Courses
• Coursera, edX, etc.
• …
Final Words
• Data science is the art of turning data into actions.
• As data volumes grow, data scientists will become a scarce
resource.
Data Science Process
http://www.kdnuggets.com/2016/03/data-science-process.html
Data is Everywhere
Thank you
