Data Science
Objectives
• Describe the data-related implications for data and database
systems within the data to wisdom continuum.
• Explain database models.
• Describe the purpose, structures, and functions of database
management systems (DBMSs).
• Describe generating, storing, curating, retrieving, and
interpreting data and related issues.
• Explore concepts and issues related to data warehouses, data
marts, data stores, big data, dashboards, and data analytics.
• Explain knowledge discovery in databases (KDD) including data
mining, data analytics, and benchmarking and their
relationship to evidence-based practice and value-based,
patient-centric care.
Information Management
• Information Management: applying management
techniques to the collection, storage, analysis,
dissemination, archiving, and destruction of internal
and external data in order to manage operations,
make decisions, plan projects, develop policies,
plan resources, etc.
Association for Project Management (APM) (2020)
IM Steps/Plan/Process
1. Define goal and aims
2. Identify needed data
3. Data collection
4. Storage
5. Analysis
6. Dissemination
7. Archiving
8. Destruction
THE NELSON DATA TO WISDOM CONTINUUM
• The Data, Information, Knowledge and Wisdom Model (Nelson
D-W) was included for the first time in the 2008 American
Nurses Association (ANA) Scope and Standards of Practice for
Nursing Informatics (American Nurses Association, 2008).
• The first version of the model was published in 1989 and
included only a brief definition of the concepts (Nelson & Joos,
1989). Since that initial publication there have been three
additional versions of the model published. Each revision of
the model attempted to better illustrate the overlapping
nature of the four concepts of data, information, knowledge,
and wisdom and the complex interaction between and within
each of these four concepts as well as the environment
(Nelson, 2018; Ronquillo, Currie, & Rodney, 2016).
THE NELSON DATA TO WISDOM CONTINUUM
DIKW Model
[Figure: DIKW model levels, top to bottom: Wisdom (actionable), Knowledge (context), Information (meaning), Data (raw).]
Data Science
Data Science: the amalgamation of classical disciplines like statistics, data
mining, databases, domain knowledge, and computer systems.
Van Der Aalst (2016)
Data Sources
Data Types
1. Structured Data
2. Unstructured Data
• Which type of data is more common in health care?
• Natural Language Processing (NLP).
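To make the two data types concrete, the sketch below contrasts a structured record with an unstructured free-text note. The field names, the sample note, and the pattern are all invented for illustration, and a regular expression is only a crude stand-in for real NLP.

```python
import re

# Structured data: every value sits in a named field (hypothetical record).
structured_record = {"patient_id": 101, "systolic_bp": 142, "diastolic_bp": 90}

# Unstructured data: the same facts buried in free text (hypothetical note).
unstructured_note = "Pt seen today, complains of headache. BP 142/90, advised rest."


def extract_blood_pressure(note):
    """Pull a 'BP systolic/diastolic' pattern out of free text, or return None."""
    match = re.search(r"BP\s+(\d{2,3})/(\d{2,3})", note)
    if match is None:
        return None
    return {"systolic_bp": int(match.group(1)), "diastolic_bp": int(match.group(2))}


extracted = extract_blood_pressure(unstructured_note)
print(structured_record["systolic_bp"])  # structured: direct field access
print(extracted)                         # unstructured: values must be extracted first
```

The structured value can be read directly, while the note has to be parsed before it can be queried, which is the gap NLP tools aim to close.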
Big Data
• Volume: The amount of data created on a given day. It is estimated that
2.5 quintillion bytes of data are being created each day.
• Variety: A second aspect of big data is the variety of data being produced and
combined in order to gain insights. In terms of healthcare, this variety of data could
be handwritten doctor’s notes that have been digitized, lab results, medical
imaging, social media posts, etc.
• Velocity: Describes the trend toward gathering data from sensors or other real-
time data sources, such as Fitbits, that are streaming information directly into our
data repository.
• Veracity: One of the potential pitfalls of relying on big data is that the veracity of
the data is often not verified. As will be discussed in the next section, massive
amounts of data are often being collected, but these data are not being cleaned or
curated over time.
• Value: The fifth aspect of big data is clinically relevant data that bring value to both
the patient and healthcare systems. The value of big data is that it can lead to
value-based, patient-centric care and reduced costs.
• Variability: Addresses the extent and speed at which the structure of the data is
changing, as well as the frequency of the change. In healthcare, seasonal variations
in flu strains and outbreaks of epidemics demonstrate the variability of illnesses.
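As a rough sense of scale for the Volume estimate quoted above, the arithmetic below converts 2.5 quintillion bytes per day into more familiar units. It assumes decimal (SI) units, where 1 exabyte is 10^18 bytes and 1 terabyte is 10^12 bytes, and it assumes the data arrive at a constant rate over the day.

```python
BYTES_PER_DAY = 2.5e18          # 2.5 quintillion bytes, the estimate quoted above
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Assumption: decimal units (1 EB = 10**18 bytes, 1 TB = 10**12 bytes).
exabytes_per_day = BYTES_PER_DAY / 1e18
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 1e12

print(f"{exabytes_per_day:.1f} EB per day")         # 2.5 EB per day
print(f"{terabytes_per_second:.1f} TB per second")  # 28.9 TB per second
```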
DATA REPOSITORIES (STORING DATA)
• Database Management System (DBMS): software that contains the
database as well as a collection or set of programs for
accessing and processing the data in the database, thereby
identifying relationships between the data.
• It is important to realize that different DBMSs can manage the
same database.
• A common example of this in healthcare is the many
different library-based DBMSs used to access the data in the
MEDLINE database. Another obvious example is the variety of
electronic health record (EHR) systems.
From Servers to End products
[Figure: flow from servers to end products. Recoverable labels: Servers, DBMS, Storing, Curating, Warehouse.]
Backend Servers/Databases
• Cloud vs. In-house: One of the biggest developments in the recent
past is cloud computing. In a cloud-hosted DBMS the back end is
accessed through the Internet, while in an in-house hosted system
the server that houses the database is on site.
• Distributed vs. Centralized: One of the decisions that needs to be
made is whether the database is going to be distributed or
centralized. A centralized system is one where there is a single,
central computer that hosts a database and the DBMS. Many
hospitals today are examples of this type of system. The hospital is
the “hub” and hosts the system where many users on the network
access this database. A distributed system is one where there are
multiple database files located at different sites. The main difference
between these two options is one of control. In a centralized system,
there is a central control mechanism. Conversely, in a distributed
system there is no centralized control structure.
Database Models
• Relational Database Models: The relational database model is still the
most popular form of DBMS, but non-relational databases (e.g., NoSQL) are
on the rise. In the relational database model, tables are related to each
other through a system of keys. Each table has a primary key, which allows
the system to request one record at a time. Tables can be combined in such
a way as to allow the system to generate reports based on all of the tables. The
main features of this type of system are tables, attributes, and keys,
where attributes are the columns in the tables and keys are what allow us
to find one record in the table. The functions it provides include creating,
updating or changing, deleting, and querying data, generally by means
of Structured Query Language (SQL) statements. Examples of widely used
RDBMSs include Oracle, MySQL, Microsoft SQL Server, and DB2.
• NoSQL Database Models: NoSQL is an agile system that easily processes
unstructured and semi-structured data. It is cloud-friendly and a new
way of thinking about databases. NoSQL doesn’t adhere to the traditional
RDBMS structure, has a rich query language, and is easily scalable
(MongoDB, 2019).
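The two models above can be sketched side by side. The example uses Python's built-in sqlite3 module as a stand-in for a full RDBMS; the table names, columns, and sample rows are hypothetical. The final dictionary only imitates how a document-style NoSQL store might hold the same patient and is not the API of any particular NoSQL product.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE lab_result ("
    " result_id INTEGER PRIMARY KEY,"
    " patient_id INTEGER NOT NULL REFERENCES patient(patient_id),"
    " test_name TEXT NOT NULL,"
    " value REAL NOT NULL)"
)
conn.executemany("INSERT INTO patient VALUES (?, ?)", [(1, "Patient A"), (2, "Patient B")])
conn.executemany(
    "INSERT INTO lab_result VALUES (?, ?, ?, ?)",
    [(10, 1, "glucose", 5.4), (11, 1, "sodium", 140.0), (12, 2, "glucose", 7.9)],
)

# The primary key lets the system request exactly one record.
one_patient = conn.execute(
    "SELECT name FROM patient WHERE patient_id = ?", (2,)
).fetchone()

# Joining on the shared key combines both tables into one report.
report = conn.execute(
    "SELECT p.name, r.test_name, r.value"
    " FROM patient AS p JOIN lab_result AS r ON r.patient_id = p.patient_id"
    " ORDER BY r.result_id"
).fetchall()

# Document-style (NoSQL-like) view: one nested record instead of two joined tables.
patient_document = {
    "patient_id": 1,
    "name": "Patient A",
    "lab_results": [
        {"test_name": "glucose", "value": 5.4},
        {"test_name": "sodium", "value": 140.0},
    ],
}

print(one_patient)
print(report)
print(json.dumps(patient_document, indent=2))
```

In the relational version the lab results live in their own table and are linked by patient_id; in the document version they are nested inside the patient record, which is one reason semi-structured data fit that model easily.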
Data Warehousing
• While a DBMS provides a structure to data, a data warehouse
provides specificity. Many organizations have developed
specific systems to meet their needs: these are data
warehouses.
• Data warehousing is “the process of extracting, integrating,
transforming, and cleansing data and storing it in a
consolidated database” (Mullins, 2013, p. 638).
• The purpose of a data warehouse is to provide a place to store
multiple forms of data in a lightly summarized way.
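The extract, integrate, transform, cleanse, and store steps in the definition above can be sketched as a very small pipeline. The two source systems, their field names, and the cleansing rules are hypothetical, and sqlite3 stands in for the consolidated database.

```python
import sqlite3

# Extract: two hypothetical source systems with inconsistent formats.
admissions_source = [
    {"mrn": " 001 ", "unit": "ICU", "los_days": "4"},
    {"mrn": "002", "unit": "icu", "los_days": "2"},
    {"mrn": "003", "unit": "Surgery", "los_days": ""},  # missing length of stay
]
clinic_source = [("004", "SURGERY", 1), ("001", "ICU", 4)]  # second row duplicates a record


def transform(mrn, unit, los_days):
    """Cleanse one record; return None when it cannot be repaired."""
    los_text = str(los_days).strip()
    if not los_text.isdigit():
        return None
    return (mrn.strip(), unit.strip().upper(), int(los_text))


# Integrate both sources into one stream, then transform and cleanse each row.
raw_rows = [(r["mrn"], r["unit"], r["los_days"]) for r in admissions_source] + clinic_source
clean_rows = {row for row in (transform(*raw) for raw in raw_rows) if row is not None}
# Using a set drops the exact duplicate that appears in both sources.

# Store: load the cleansed rows into the consolidated database.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE stay (mrn TEXT, unit TEXT, los_days INTEGER)")
warehouse.executemany("INSERT INTO stay VALUES (?, ?, ?)", sorted(clean_rows))

# A lightly summarized view: stays and average length of stay per unit.
summary = warehouse.execute(
    "SELECT unit, COUNT(*), AVG(los_days) FROM stay GROUP BY unit ORDER BY unit"
).fetchall()
print(summary)
```

Here the unrepairable record is simply dropped; a real warehouse load would more likely log or quarantine it for review.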
Other Types of Data Repository
• Data Marts: A data mart is a DBMS that serves a single unit of work and
may contain a subset of the data stored in a warehouse. For
instance, a hospital may have a data warehouse where all
information is housed, and a single department may have a
data mart.
• Data Lakes: A data lake is a freer form of DBMS in which the structure of
the data is loose and varied, including structured, semi-structured,
and unstructured data. Input processing can be in
batch, real-time, or one-time loads.
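The hospital example above, a warehouse holding everything and a department holding its own data mart, can be sketched as follows. The table, columns, and figures are hypothetical. A database view is used only to keep the sketch short; in practice a data mart is often a separate physical store.

```python
import sqlite3

# Hypothetical hospital-wide warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE encounter (encounter_id INTEGER PRIMARY KEY, department TEXT, charge REAL)"
)
warehouse.executemany(
    "INSERT INTO encounter VALUES (?, ?, ?)",
    [(1, "Cardiology", 1200.0), (2, "Oncology", 3400.0), (3, "Cardiology", 800.0)],
)

# The data mart: only the subset of warehouse data that one department needs.
warehouse.execute(
    "CREATE VIEW cardiology_mart AS"
    " SELECT encounter_id, charge FROM encounter WHERE department = 'Cardiology'"
)

mart_rows = warehouse.execute(
    "SELECT encounter_id, charge FROM cardiology_mart ORDER BY encounter_id"
).fetchall()
print(mart_rows)
```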
Which One Best Fits?
• Network configuration: What type of network will the system
be running on? For example, local area network (LAN), wide
area network (WAN), and wireless local area network (WLAN).
• Type of data being stored: If there will be a lot of medical
images, videos, or sounds, it is important to realize that these
need a lot of space.
• Amount of data: How much data are there? If there is a large
amount of data, a system that allows for faster retrieval from
the system may be necessary.
• Systems interoperability: Are there requirements that the
system interface with another system?
• Budget considerations: How much money is being dedicated
to the database project?
Analytics and Data Mining
• Analytics: Once the data have been stored, curated, and
retrieved, it is then the responsibility of the end user to
perform analytics on the data.
• Descriptive Analytics: Describing Current Status.
• Prescriptive Analytics: “prescribing” solutions.
• Predictive Analytics: Forecasting.
• Data Mining: The process of extracting information and
knowledge from large-scale databases is known as knowledge
discovery and data mining (KDD).
• In more detail, data mining is the computational process that
allows us to use our data in order to “mine” insights that we
may not have seen without the assistance of a computer.
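The three kinds of analytics listed above can be illustrated on one small data set. The weekly visit counts and the capacity threshold are invented, the counts are deliberately perfectly linear so the arithmetic is easy to follow, and an ordinary least-squares line is only one of many possible forecasting methods.

```python
from statistics import mean

# Hypothetical weekly counts of emergency department visits.
weeks = [1, 2, 3, 4, 5]
visits = [100, 110, 120, 130, 140]

# Descriptive analytics: summarize the current status.
average_visits = mean(visits)

# Predictive analytics: fit a least-squares line and forecast week 6.
mean_week = mean(weeks)
slope = sum((w - mean_week) * (v - average_visits) for w, v in zip(weeks, visits)) / sum(
    (w - mean_week) ** 2 for w in weeks
)
intercept = average_visits - slope * mean_week
forecast_week_6 = slope * 6 + intercept

# Prescriptive analytics: turn the forecast into a recommended action.
STAFFED_CAPACITY = 145  # hypothetical threshold, chosen for illustration
if forecast_week_6 > STAFFED_CAPACITY:
    recommendation = "add a nurse shift"
else:
    recommendation = "keep current staffing"

print(average_visits, forecast_week_6, recommendation)
```

The descriptive step reports an average of 120 visits, the predictive step forecasts 150 visits for week 6, and the prescriptive step maps that forecast to an action.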
Dashboards
• Dashboards: The presentation of data may also take the form of a
dashboard.
• The main characteristic of a dashboard is a snapshot view of several
key metrics at once.
• The data for the dashboard are usually from diverse data sources and
the system is updating the data in real time.
• The purpose of a dashboard is to make it easy to get a snapshot of the
database or key performance indicators in one glance.
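A dashboard of the kind described above can be sketched as a function that gathers several key metrics from different sources into one snapshot. The three feeds, their values, and the chosen indicators are hypothetical, and a text table stands in for a graphical display.

```python
# Hypothetical feeds from three different source systems.
bed_feed = {"occupied": 182, "total": 200}
ed_wait_minutes = [12, 35, 48, 20]
lab_turnaround_hours = [1.5, 2.0, 2.5]


def build_dashboard():
    """Collect one snapshot of key performance indicators from all feeds."""
    return {
        "Bed occupancy (%)": round(100 * bed_feed["occupied"] / bed_feed["total"], 1),
        "Longest ED wait (min)": max(ed_wait_minutes),
        "Mean lab turnaround (h)": round(
            sum(lab_turnaround_hours) / len(lab_turnaround_hours), 1
        ),
    }


def render(dashboard):
    """Format the snapshot as aligned text, one metric per line."""
    width = max(len(label) for label in dashboard)
    return "\n".join(f"{label:<{width}}  {value}" for label, value in dashboard.items())


snapshot = build_dashboard()
print(render(snapshot))
```

Calling build_dashboard() again after a feed changes yields an updated snapshot; a real dashboard would refresh on a schedule or from a data stream.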
Expert Systems
• Expert Systems: Expert systems represent the present and
future vanguard of nursing informatics. These systems aim to
help make the nurse “more intelligent” in providing quality
care based on evidence.
• Expert systems use artificial intelligence (AI) to model the
decisions an expert nurse would make.
• They provide the “best decision” recommendation based on
what an expert nurse would do, unlike decision support
systems, which provide several options from which the nurse
selects.
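The contrast between the two kinds of system can be sketched with a toy rule base. The rules, priorities, and patient fields below are hypothetical and are not clinical guidance, and a fixed priority ranking is only a simple stand-in for the AI techniques a real expert system would use.

```python
# Hypothetical rule base: (condition, recommendation, priority). Not clinical guidance.
RULES = [
    (lambda p: p["pain_score"] >= 7, "reassess pain and notify the provider", 3),
    (lambda p: p["hours_since_turn"] >= 2, "reposition the patient", 2),
    (lambda p: p["fall_risk"], "apply fall precautions", 1),
]


def decision_support(patient):
    """Return every recommendation whose rule fires; the nurse selects among them."""
    return [action for condition, action, _ in RULES if condition(patient)]


def expert_system(patient):
    """Return only the single highest-priority recommendation, or None."""
    fired = [(priority, action) for condition, action, priority in RULES if condition(patient)]
    return max(fired)[1] if fired else None


patient = {"pain_score": 8, "hours_since_turn": 3, "fall_risk": False}
options = decision_support(patient)
best = expert_system(patient)
print(options)
print(best)
```

For this patient the decision support function returns two options, while the expert system function returns the single recommendation it ranks highest.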