Smart Data for Smart Labs:
Utilizing Semantic Technologies for Improved
Integration and Sharing of Laboratory Data
Eric Little, PhD
VP Data Science
eric.little@osthus.com
Slide 2
Outline
The Current Laboratory Data Situation
The Growing Importance of Data As A Corporate Asset
What is Semantic Technology and How Can It Help?
Moving Beyond Semantics – Big Data & Analytics
Smart Labs for the 21st Century
Slide 3
The Current Lab Situation
Many challenges exist for
data to be captured,
integrated and shared
• Data Silos
• Incompatible
instruments and
software systems
• Legacy architectures are
brittle and rigid
• SME knowledge resides
in people’s heads
• Data schemas are not
explicitly understood
• Lack of common vision
between business units
and scientists
Slide 4
Pharma Example in Action
Documentation
Initial Step
Local
Regulatory
Affiliate
Calibration
SME
Instrumentation
Marketing
R&D Data
R&D Tech
R&D Data Stores
Production Data
External
Regulatory
Affiliate
Manual Data
Verification
Process
Verify
OK?
NO
YES
Finalized
Report
• This process can take weeks to complete
because it often has to be repeated several
times due to errors.
• Relations must be built by hand on the
user side from flat files or spreadsheets.
The relations can therefore not be retained
over time or automatically generated later.
• The DBs are not built for retrieval of
different information types – the joins are
not always there.
Slide 5
Why Data Matters
Enterprise systems are increasingly
“hybrid” in their design and architectures
• Legacy Data Sources combined with new
tech
Integrating data is becoming more
complex
• The size of data sources continues to grow
• Different user groups within organizations
• Answers need to reflect increasingly
complex patterns
Finding and utilizing key data within an
organization is of increasing importance
• Data is a valuable corporate asset
• The fundamentals of data management
have changed. Basic storage & retrieval has
given way to analytics and
responsiveness.
Slide 6
Analytics and Data Science for the 21st Century
The rate of change in digital information is growing exponentially
• Cloud Computing is now critical for scaling an enterprise
• New data types are being created and hold significant value
• Data is becoming more personalized and context-based
The effect of data is changing the business landscape
• 90% of the world’s data was produced in the last 2 years – how well can
you mine/leverage this data? What is this worth to a company?
• $900 Billion/year: cost of lowered employee productivity and reduced
innovation from information overload – how can we avoid these costs?
“Increasing volume and detail of enterprise information, multimedia, social media, and the
Internet of Things will fuel exponential growth in data for the foreseeable future.”
“The use of big data will become a key basis of competition and growth for individual firms.”
McKinsey: “Big data: The next frontier for innovation, competition, and productivity”, May 2011
Semantic Technologies:
What Are They & How Are They Used?
Slide 8
The Value of Semantics
Has its origins in philosophy - generally understood as the abstract
study of meaning
Distinguished from syntax – which is the rules-based grammar of a
language
“Washington”
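The "Washington" example can be sketched in code: the same string (syntax) can denote several different things (semantics), and only context picks one. This is a minimal illustration; the `ex:` identifiers, the type names, and the `resolve` helper are hypothetical and not taken from the slides.

```python
# Hypothetical senses of one string; identifiers and types are illustrative only.
SENSES = {
    "Washington": [
        {"id": "ex:WashingtonState", "type": "USState"},
        {"id": "ex:WashingtonDC", "type": "City"},
        {"id": "ex:GeorgeWashington", "type": "Person"},
    ]
}


def resolve(label, expected_type):
    """Return the identifiers for a label whose type matches the context."""
    return [s["id"] for s in SENSES.get(label, []) if s["type"] == expected_type]


# The string is identical in every call; only the expected type (context) differs.
print(resolve("Washington", "Person"))
print(resolve("Washington", "City"))
```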
Slide 9
Semantic Web and IT Evolution: Evolving from
Code-Centric to Data-Centric IT
Semantic technologies: IT evolution from code to data centricity
• In the Code-Centric years, data was often stored in flat files
• The creation of databases, specifically Network and RDBMS, was
one of the first steps leading to Data-Centric evolution
• The last decade has seen standards such as XML, RDF, Web
Services, and now OWL, that further evolve IT to a Data-Centric
environment
2016
Slide 10
Utilizing Taxonomies for Reference Data
Management
Taxonomies provide important
structure to data as acyclic
tree graphs
2 Types of Applications:
• Captures sub-class and super-class
relationships
• Captures broad/narrow
relationships between terms
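A taxonomy of this kind can be sketched as a child-to-parent map. The terms below are hypothetical examples, not Allotrope terms, and the single-parent restriction is an assumption that keeps the structure a tree.

```python
# Hypothetical child -> parent (broader term) map. Each term has at most one
# parent, which is what makes the structure a tree rather than a general graph.
BROADER = {
    "mass spectrometer": "spectrometer",
    "spectrometer": "instrument",
    "balance": "instrument",
}


def ancestors(term):
    """Walk up the tree, collecting every broader term from nearest to root."""
    chain = []
    seen = {term}
    while term in BROADER:
        term = BROADER[term]
        if term in seen:
            # A taxonomy must be acyclic; fail loudly if the data violates that.
            raise ValueError(f"cycle detected at {term!r}")
        seen.add(term)
        chain.append(term)
    return chain


def is_narrower_than(term, candidate):
    """True if `candidate` is a (transitive) broader term of `term`."""
    return candidate in ancestors(term)


print(ancestors("mass spectrometer"))
print(is_narrower_than("balance", "spectrometer"))
```

The same lookup serves both applications on the slide: read the edges as sub-class/super-class links or as broader/narrower links between terms.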
Slide 11
Allotrope Foundation Taxonomies (AFT)
mass
intensity
af-m:AFM_0000350
af-r:AFR_0000495
Slide 12
Utilizing the Semantic Spectrum
(Moving Beyond Taxonomies)
Code (Lists) Terms (Soil, Plant, etc.)
Controlled Vocabulary
(Agreed Upon Terms)
Taxonomy
(Hierarchy)
Thesaurus
(Preferred Labels, Synonyms, etc.)
RDF Models
(Triples as Graphs)
OWL Ontologies
(RDF + Axioms)
Reasoning
(Rule-based Logics:
Discover New Patterns)
Ontologies and Reasoning add
Axioms and Advanced Logic
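The upper end of the spectrum (triples as graphs, plus rule-based reasoning) can be sketched in a few lines. This is a toy forward-chaining loop over two RDFS-style rules (sub-class transitivity and type inheritance), not a full OWL reasoner; the `ex:` resources are hypothetical.

```python
# Each statement is a (subject, predicate, object) triple; together they form a graph.
TRIPLES = {
    ("ex:run42", "rdf:type", "ex:MassSpectrometryRun"),
    ("ex:MassSpectrometryRun", "rdfs:subClassOf", "ex:AnalyticalRun"),
    ("ex:AnalyticalRun", "rdfs:subClassOf", "ex:LabProcess"),
}


def infer(triples):
    """Apply two RDFS-style rules repeatedly until no new triples appear."""
    facts = set(triples)
    while True:
        new = set()
        for (s, p, o) in facts:
            for (s2, p2, o2) in facts:
                # Only chain through a sub-class edge that starts where (s, p, o) ends.
                if o != s2 or p2 != "rdfs:subClassOf":
                    continue
                if p == "rdfs:subClassOf":
                    # Rule 1: sub-class relationships are transitive.
                    new.add((s, "rdfs:subClassOf", o2))
                elif p == "rdf:type":
                    # Rule 2: an instance of a class is an instance of its super-classes.
                    new.add((s, "rdf:type", o2))
        if new <= facts:
            return facts
        facts |= new


closure = infer(TRIPLES)
# This statement was never asserted; it is derived by the rules.
print(("ex:run42", "rdf:type", "ex:LabProcess") in closure)
```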
Slide 13
Levels of Semantic Expressivity
Semantics can be modeled at many levels
• Finding the right level is a tradeoff of expressivity, performance,
decidability, and other factors
• The weakest representation is basic syntax matching
• The strongest representation is higher order logic
• Semantic representation in RDF and ontologies is roughly in the
middle
Using knowledge representation one can separate schema
level from data level
• Data becomes much more flexible and reusable
• Allows easier transformation of data to knowledge creation
• Raises computational value (now data can be more easily
extracted from legacy systems, shared, and used across an
enterprise).
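Separating the schema level from the data level can be illustrated as follows. The property names, units, and the `describe` helper are hypothetical. The point is only that the code consults the schema rather than hard-coding each field.

```python
# Schema level: which properties exist and how to interpret them (hypothetical names).
SCHEMA = {
    "hasMass": {"domain": "Sample", "unit": "mg"},
    "measuredBy": {"domain": "Sample", "unit": None},
}

# Data level: plain statements that refer to schema terms by name.
DATA = [
    ("sample-1", "hasMass", 12.5),
    ("sample-1", "measuredBy", "balance-A"),
    ("sample-2", "hasMass", 9.1),
]


def describe(subject):
    """Render every statement about a subject, using the schema for units."""
    lines = []
    for s, prop, value in DATA:
        if s != subject:
            continue
        if prop not in SCHEMA:
            raise KeyError(f"unknown property {prop!r}")
        unit = SCHEMA[prop]["unit"]
        lines.append(f"{prop}: {value}" + (f" {unit}" if unit else ""))
    return lines


print(describe("sample-1"))
```

Under this layout, supporting a new property means adding one schema entry and new data rows; `describe` itself does not change.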
Slide 14
Benefits of Semantic Technology
Interoperability
Searching/
Browsing
Reuse
Architectural
Intent
Automated
Reasoning
Development
Lifecycle
Moving From Semantics to
Big Data Analytics
Slide 16
The power of analytics is now just
beginning to be felt
• Moore’s Law pertaining to
processing is not the problem
Focus on the growth of Analysis:
• From 1988-2003 computer
processing speed grew by
1000x
• In the same period algorithm
development grew by 43,000x
• What does this tell you about
the direction in which we are
headed?
As data grows, so too will the need
to utilize it more effectively
The Rise of Analytics is Changing the Game
ANALYTICS
Slide 17
Understanding the 4V’s of Big Data
Normally the focus –
Big Data Analysis is
more than just size
Performance is
Critical to Success
Data complexity is
increasing – Model
complexity
Uncertainty abounds
– requires statistics
and probabilities
Majority of Big Data analytics
approaches treat these two V’s
Semantic
technologies provide
clear advantages
Mathematical
Clustering
Techniques
provide clear
advantages
Slide 18
Why Semantics Matters for Data Analytics
Big Data approaches
require proper metadata
and terminologies to
integrate information well
Relationships matter in the
data
Understanding perspective
(context) is crucial for
success in today’s world
Semantics provides better
data models/schemas
Slide 19
Smart Labs for the 21st Century
Smart labs in the future will provide
customers with:
Integrated Data – common reference
data structures (vocabularies)
Sharable Data – easier interaction
across teams and business units
Scalability – Big data applications
that can be highly elastic
Conceptual Representations –
context and perspective are captured
Advanced Analytics – complex &
automated problem-solving
capabilities
