Lecture-8
Database Management and Issues
Database
Networking Infrastructure
Database Management
Systems
Time to Think
Databases and Business Decision Making
Tools for Business Intelligence
Time to Think
How is the data organized?
File organization concepts
–Database: Group of related files
–File: Group of records of same type
–Record: Group of related fields
–Field: Group of characters forming a word, a group of words, or a number
•A record describes an entity (a person, place, or thing about which we store
information)
•Attribute: Each characteristic or quality describing an entity
–E.g., the attributes Date and Grade belong to the entity COURSE
DBMS
Issues with traditional Data Organization
The Data Hierarchy
Attributes of an entity
A computer system organizes data in a hierarchy that starts with the bit, which
represents either a 0 or a 1. Bits can be grouped to form a byte to represent
one character, number, or symbol. Bytes can be grouped to form a field, and
related fields can be grouped to form a record. Related records can be collected
to form a file, and related files can be organized into a database.
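The hierarchy described above can be sketched in a few lines of Python. This is only an illustration: the COURSE and STUDENT records, their field names, and their values are made-up assumptions, not data from the lecture.

```python
# Illustrative sketch of the data hierarchy; all names and values are assumptions.

# Bit -> byte: eight bits form one byte, which can encode one character.
bits = "01000001"
byte_value = int(bits, 2)      # the eight bits read as a number
character = chr(byte_value)    # the character that byte represents

# Bytes -> field: a group of characters forms a field.
course_field = "IS 101".encode("ascii")   # one byte per character

# Fields -> record: related fields describe one entity (here, one course enrollment).
record = {"student_id": 39044, "course": "IS 101", "date": "F16", "grade": "B+"}

# Records -> file: records of the same type form a file.
course_file = [
    record,
    {"student_id": 59432, "course": "IS 101", "date": "F16", "grade": "A"},
]

# Files -> database: related files form a database.
database = {
    "COURSE": course_file,
    "STUDENT": [{"student_id": 39044, "name": "Example Student"}],
}
```

Each level is built from the one below it, which is why a change in how a field is stored can ripple up to every file that contains it.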
DBMS: Issues with traditional Data Organization
Data redundancy and inconsistency
Program-data dependence
Lack of flexibility
Poor security
Lack of data sharing and availability
DBMS
Database and DBMS
Database
Collection of data organized to serve many applications by
centralizing data and controlling redundant data
Database management system
Software that permits an organization to centralize data, manage it
efficiently, and provide access to the stored data by application
programs
Interfaces between application programs and physical data files
Separates logical and physical views of data
Solves problems of traditional file environment
Controls redundancy
Eliminates inconsistency
Uncouples programs and data
Enables the organization to centrally manage data and data security
DBMS
Database approach to Data Management
DBMS: RDBMS Systems
Relational DBMS
Represents data as two-dimensional tables called relations or files
Each table contains data on an entity and its attributes
Table: grid of columns and rows
Rows (tuples): Records for different entities
Fields (columns): Each represents an attribute of the entity
Key field: Field used to uniquely identify each record
Primary key: The key field that serves as the unique identifier for all
records in a table
Foreign key: Primary key used in a second table as a look-up field to
identify records from the original table
DBMS: RDBMS Systems
A relational database organizes data in the form of two-dimensional tables.
Illustrated here are tables for the entities SUPPLIER and PART showing how they
represent each entity and its attributes. Supplier Number is a primary key for
the SUPPLIER table and a foreign key for the PART table.
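The SUPPLIER and PART tables described above might be declared as follows. This is a minimal sketch using Python's built-in sqlite3 module; the column names, the sample rows, and the helper `supplier_of` are assumptions made for illustration, not the contents of the original figure.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (
    supplier_number INTEGER PRIMARY KEY,   -- primary key of SUPPLIER
    supplier_name   TEXT NOT NULL
);
CREATE TABLE part (
    part_number     INTEGER PRIMARY KEY,   -- primary key of PART
    part_name       TEXT NOT NULL,
    -- foreign key: looks up the matching record in SUPPLIER
    supplier_number INTEGER REFERENCES supplier(supplier_number)
);
-- Sample rows are invented for illustration.
INSERT INTO supplier VALUES (8259, 'Supplier A'), (8261, 'Supplier B');
INSERT INTO part VALUES (137, 'Door latch', 8259),
                        (150, 'Door seal', 8261),
                        (152, 'Compressor', 8259);
""")

def supplier_of(part_number):
    """Follow the foreign key from a part to its supplier's name (hypothetical helper)."""
    row = conn.execute(
        "SELECT s.supplier_name FROM part p "
        "JOIN supplier s ON s.supplier_number = p.supplier_number "
        "WHERE p.part_number = ?",
        (part_number,),
    ).fetchone()
    return row[0] if row else None
```

Calling `supplier_of(137)` follows the foreign key stored in the PART row to the single SUPPLIER row that holds the name, so the supplier's details are stored only once.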
DBMS: RDBMS Systems
Operations of a Relational DBMS
Three basic operations used to develop useful sets of data
SELECT: Creates subset of data of all records that meet stated
criteria
JOIN: Combines relational tables to provide user with more
information than available in individual tables
PROJECT: Creates subset of columns in table, creating tables with
only the information specified
Evolution of DBMS
RDBMS Systems
The select, project, and join operations enable data from two different tables
to be combined and only selected attributes to be displayed.
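The three operations can be sketched in plain Python on lists of dictionaries, treating each dictionary as a row. The sample parts and suppliers below are invented, and the function names simply mirror the operation names; a real RDBMS performs these operations internally.

```python
# Invented sample data; each dict is one row (tuple) of a table.
parts = [
    {"part_number": 137, "part_name": "Door latch", "supplier_number": 8259},
    {"part_number": 150, "part_name": "Door seal", "supplier_number": 8261},
    {"part_number": 152, "part_name": "Compressor", "supplier_number": 8259},
]
suppliers = [
    {"supplier_number": 8259, "supplier_name": "Supplier A"},
    {"supplier_number": 8261, "supplier_name": "Supplier B"},
]

def select(rows, predicate):
    """SELECT: keep only the rows that meet the stated criteria."""
    return [row for row in rows if predicate(row)]

def join(left, right, key):
    """JOIN: combine rows from two tables that share the same key value."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

def project(rows, columns):
    """PROJECT: keep only the named columns of each row."""
    return [{c: row[c] for c in columns} for row in rows]

wanted = select(parts, lambda row: row["part_number"] in (137, 150))
combined = join(wanted, suppliers, "supplier_number")
result = project(combined, ["part_number", "part_name", "supplier_name"])
```

Here `result` holds one row per selected part, carrying the part number and name from PART and the supplier name from SUPPLIER.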
DBMS: RDBMS Systems – MS Access
Microsoft Access has a rudimentary data dictionary capability that displays
information about the size, format, and other characteristics of each field in
a database. Displayed here is the information maintained in the SUPPLIER table.
The small key icon to the left of Supplier Number indicates that it is a key
field.
Illustrated here are the SQL statements for a query to select suppliers for
parts 137 or 150.
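Since the figure itself is not reproduced here, the following is one way such a query could be written, run through Python's sqlite3 module. The table layout, column names, and sample rows are assumptions; only the part numbers 137 and 150 come from the caption.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE supplier (supplier_number INTEGER PRIMARY KEY, supplier_name TEXT);
CREATE TABLE part (part_number INTEGER PRIMARY KEY, part_name TEXT,
                   supplier_number INTEGER REFERENCES supplier(supplier_number));
-- Invented sample rows.
INSERT INTO supplier VALUES (8259, 'Supplier A'), (8261, 'Supplier B');
INSERT INTO part VALUES (137, 'Door latch', 8259),
                        (150, 'Door seal', 8261),
                        (152, 'Compressor', 8259);
""")

# Join PART to SUPPLIER, then keep only parts 137 and 150.
QUERY = """
SELECT part.part_number, part.part_name,
       supplier.supplier_number, supplier.supplier_name
FROM part
JOIN supplier ON part.supplier_number = supplier.supplier_number
WHERE part.part_number IN (137, 150)
ORDER BY part.part_number
"""
rows = db.execute(QUERY).fetchall()
```

The `WHERE` clause performs the selection, the `JOIN` combines the two tables, and the column list after `SELECT` performs the projection.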
Evolution of DBMS
RDBMS Systems – MS Access
Illustrated here is how the query would be constructed using query-building
tools in the Access Query Design View. It shows the tables, fields, and
selection criteria used for the query.
Designing Databases
–To create a database, you must understand the relationships among the data,
–the type of data that will be maintained in the database,
–how the data will be used, and how the organization will need to change to
manage data from a company-wide perspective.
–Conceptual (logical) design: Abstract model from business perspective
–Physical design: How database is arranged on direct-access storage
devices
–Normalization
•Streamlining complex groupings of data to minimize redundant data
elements and awkward many-to-many relationships.
•Most efficient way to group data elements to meet business requirements,
needs of application programs
–Referential integrity
•Rules to ensure that relationships between coupled tables remain
consistent
–Entity-relationship diagram
•Used by database designers to document the data model
•Illustrates relationships between entities
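Normalization and referential integrity can be illustrated with a small sqlite3 sketch. Supplier details are kept in one table and referenced from the parts table, and the database rejects changes that would break that relationship. The schema, the sample rows, and the helper `try_sql` are assumptions for illustration; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
con.executescript("""
-- Normalized design: each supplier's name is stored once, not on every part row.
CREATE TABLE supplier (supplier_number INTEGER PRIMARY KEY, supplier_name TEXT);
CREATE TABLE part (
    part_number     INTEGER PRIMARY KEY,
    part_name       TEXT,
    supplier_number INTEGER NOT NULL REFERENCES supplier(supplier_number)
);
INSERT INTO supplier VALUES (8259, 'Supplier A');
INSERT INTO part VALUES (137, 'Door latch', 8259);
""")

def try_sql(statement):
    """Run one statement; report whether the integrity rules accepted it."""
    try:
        with con:  # commits on success, rolls back on error
            con.execute(statement)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

# A part pointing at a supplier that does not exist breaks the relationship.
orphan_part = try_sql("INSERT INTO part VALUES (200, 'Hinge', 9999)")
# Deleting a supplier that still has parts would leave those parts orphaned.
delete_used = try_sql("DELETE FROM supplier WHERE supplier_number = 8259")
# A part that references an existing supplier is fine.
valid_part = try_sql("INSERT INTO part VALUES (201, 'Hinge', 8259)")
```

The first two statements are refused and the third succeeds, which is the behavior the referential integrity rules are meant to guarantee.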
THE CHALLENGE OF BIG DATA
•Big data: data sets whose volume is beyond the ability of a typical DBMS to
capture, store, and analyze.
•Billions to trillions of records, all from different sources.
•Businesses are interested in big data because it can reveal
more patterns and interesting anomalies than smaller data
sets, with the potential to provide new insights into customer
behavior, weather patterns, financial market activity, or other
phenomena.
•To derive business value from these data, organizations need new
technologies and tools capable of managing and analyzing non-traditional
data along with their traditional enterprise data.
Databases and Business Decision
Making
Operational databases typically hold current transaction data rather than
consolidated historical data, so information about trends and changes
across the entire company is hard to obtain from a single database
Functional silos in an organization prevent data connectivity
between various departments in an organization
Databases and Business Decision
Making
Data warehouse:
Stores current and historical data from many core operational
transaction systems
Consolidates and standardizes information for use across
enterprise, but data cannot be altered
Data warehouse system will provide query, analysis, and
reporting tools
Data marts:
Subset of data warehouse
Summarized or highly focused portion of firm’s data for use by
specific population of users
Typically focuses on single subject or line of business
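The relationship between operational systems, a data warehouse, and a data mart can be sketched in plain Python. Everything here is hypothetical: the two source systems, their field names, and the figures are invented to show consolidation, standardization, and the extraction of a focused subset.

```python
# Two hypothetical operational systems that record sales in different formats.
store_sales = [
    {"sku": "W-100", "region": "east", "amount_usd": 450.0, "year": 2023},
    {"sku": "D-200", "region": "west", "amount_usd": 300.0, "year": 2024},
]
web_orders = [
    {"product": "w-100", "area": "EAST", "total_cents": 47000, "yr": 2024},
    {"product": "d-200", "area": "West", "total_cents": 12500, "yr": 2024},
]

def standardize_store(row):
    """Map a store-system row onto the common warehouse format."""
    return {"product": row["sku"].upper(), "region": row["region"].title(),
            "amount": row["amount_usd"], "year": row["year"]}

def standardize_web(row):
    """Map a web-system row onto the common warehouse format (cents to dollars)."""
    return {"product": row["product"].upper(), "region": row["area"].title(),
            "amount": row["total_cents"] / 100, "year": row["yr"]}

# The warehouse consolidates current and historical rows from both systems.
# A tuple is used to suggest that warehouse data is read, not altered.
warehouse = tuple(
    [standardize_store(r) for r in store_sales]
    + [standardize_web(r) for r in web_orders]
)

def data_mart(rows, region):
    """A data mart: a focused subset of the warehouse for one group of users."""
    return [r for r in rows if r["region"] == region]

east_mart = data_mart(warehouse, "East")
east_total = sum(r["amount"] for r in east_mart)
```

Users of `east_mart` see only East-region rows in one consistent format, regardless of which source system recorded them.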
Hadoop
–is an open source software framework managed by the Apache Software
Foundation
–handles unstructured and semi-structured data in vast quantities, as well
as structured data.
–enables distributed parallel processing of huge amounts of data across
inexpensive computers (servers).
–breaks a big data problem down into sub-problems, distributes them among
up to thousands of inexpensive computer processing nodes, and then combines
the results into a smaller data set that is easier to analyze.
–Hadoop consists of several key services:
•Hadoop Distributed File System (HDFS) for data storage.
•MapReduce for high-performance parallel data processing
–Facebook announced that the data gathered in the warehouse grows by roughly
half a PB per day (1 PB = 1000⁵ bytes).
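The idea behind MapReduce, splitting a problem into sub-problems and then combining the partial results, can be sketched with a word count in plain Python. This runs in a single process and does not use Hadoop or its API; the sample documents and function names are assumptions for illustration.

```python
from collections import defaultdict

# Invented input; in Hadoop each document could sit on a different node.
documents = [
    "big data needs new tools",
    "new tools for big data",
    "data data data",
]

def map_phase(document):
    """Map: turn one document into (word, 1) pairs, independently of the others."""
    return [(word, 1) for word in document.split()]

def shuffle(mapped_pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Because each call to `map_phase` depends on only one document, the map work could be spread across many machines; only the much smaller grouped results need to be brought together in the reduce step.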
In-Memory Computing
–Another way of facilitating big data analysis.
–Relies primarily on a computer’s main memory (RAM) for data
storage, whereas conventional DBMS use disk storage systems.
Databases and Business Decision
Making
Tools for Business Intelligence
Tools for consolidating, analyzing, and providing access to
vast amounts of data to help users make better business
decisions
E.g., Harrah’s Entertainment analyzes customers to develop
gambling profiles and identify most profitable customers
Principal tools include:
Software for database query and reporting
Online analytical processing (OLAP)
Data mining
Tools for Business Intelligence
A series of analytical tools works with data stored in databases to find
patterns and insights for helping managers and employees make better decisions
to improve organizational performance.
Tools for Business Intelligence
Online analytical processing (OLAP)
Supports multidimensional data analysis
Viewing data using multiple dimensions
Each aspect of information (product, pricing, cost, region, time
period) is different dimension
E.g., how many washers sold in East in June compared with other
regions?
OLAP enables rapid, online answers to ad hoc queries
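The washer question above can be answered by summarizing the same sales facts along different dimensions. The sketch below is plain Python rather than an OLAP product; the sales figures, the `DIMENSIONS` tuple, and the `rollup` function are invented for illustration.

```python
from collections import defaultdict

# Invented facts: (product, region, month, units sold).
sales = [
    ("Washer", "East", "June", 120),
    ("Washer", "West", "June", 95),
    ("Washer", "East", "July", 80),
    ("Dryer", "East", "June", 60),
    ("Washer", "South", "June", 70),
]
DIMENSIONS = ("product", "region", "month")

def rollup(facts, by, **filters):
    """Total units grouped by the dimensions in `by`, after applying filters."""
    totals = defaultdict(int)
    for *dims, units in facts:
        row = dict(zip(DIMENSIONS, dims))
        if all(row[name] == value for name, value in filters.items()):
            totals[tuple(row[d] for d in by)] += units
    return dict(totals)

# How many washers were sold in each region in June?
washers_in_june = rollup(sales, by=("region",), product="Washer", month="June")

# A different view of the same data: product versus region, all months.
by_product_region = rollup(sales, by=("product", "region"))
```

Changing the `by` argument corresponds to looking at a different face of the data cube, while the filters slice it along the remaining dimensions.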
Tools for Business Intelligence
The view that is showing is product versus region. If you rotate the cube 90
degrees, the face that will show is product versus actual and projected sales.
If you rotate the cube 90 degrees again, you will see region versus actual and
projected sales. Other views are possible.
Multi-dimensional Data Model
Tools for Business Intelligence
Data Mining
More discovery driven than OLAP
Finds hidden patterns, relationships in large databases and
infers rules to predict future behavior
E.g., Finding patterns in customer data for one-to-one
marketing campaigns or to identify profitable customers.
Tools for Business Intelligence
Data Mining-Types of Information Available
Associations: Occurrences linked to a single event
E.g., When corn chips are purchased, a cola drink is purchased
65% of the time
Sequences: Events are linked over time
E.g., If a house is purchased, a new refrigerator will be purchased
within two weeks 65% of the time
Classification: Recognizes patterns that describe the group to
which an item belongs by examining existing items that have
been classified and by inferring a set of rules
E.g., helps discover the kinds of customers who are likely to leave
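A figure like the one in the corn chips example is the confidence of an association rule: of the purchases that contain the first item, the share that also contain the second. The sketch below computes it for a handful of invented shopping baskets; the data and the function name are assumptions, and real data mining tools search for such rules automatically.

```python
# Invented shopping baskets; each set is one purchase.
baskets = [
    {"corn chips", "cola"},
    {"corn chips", "cola", "salsa"},
    {"corn chips"},
    {"cola", "bread"},
    {"corn chips", "cola"},
    {"bread"},
]

def confidence(transactions, antecedent, consequent):
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    both = sum(1 for t in with_antecedent if consequent in t)
    return both / len(with_antecedent)

chips_to_cola = confidence(baskets, "corn chips", "cola")
```

In this toy data, four baskets contain corn chips and three of those also contain cola, so the rule holds 75% of the time.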
Tools for Business Intelligence
Data Mining-Types of Information Available
Clustering: Similar to classification, but the groups are not
predefined
Discovers groupings within data, such as finding affinity groups for bank
cards
Partitions a database into groups of customers based on
demographics and types of personal investments
Forecasting: Uses a series of existing values to forecast what
other values will do
E.g., Finding patterns in data to help managers estimate the future
value of continuous variables, such as sales figures
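One simple way to forecast from a series of existing values is to fit a straight-line trend and extend it. The sketch below uses an ordinary least-squares line; the sales figures and the function name are invented, and real forecasting tools use more sophisticated models.

```python
# Invented monthly sales figures, oldest first.
monthly_sales = [100.0, 110.0, 120.0, 130.0]

def linear_forecast(values, steps_ahead=1):
    """Fit a least-squares line through the values and extend it forward."""
    n = len(values)
    xs = range(n)                      # time index 0, 1, 2, ...
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)

next_month = linear_forecast(monthly_sales)
```

The sample series rises by 10 each month, so the fitted line projects 140 for the next month.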
Tools for Business Intelligence
Predictive Analysis and Text Mining
Predictive analysis
Uses data mining techniques, historical data, and assumptions
about future conditions to predict outcomes of events
E.g., Probability a customer will respond to an offer or purchase
a specific product
Text mining
Extracts key elements from large unstructured data sets (e.g.,
stored e-mails)
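A very small version of text mining is to count which terms appear most often across a set of messages. The sketch below does this for three invented e-mails; the stop-word list and the function name are assumptions, and commercial text mining tools go much further (for example, detecting sentiment or named entities).

```python
import re
from collections import Counter

# Invented customer e-mails.
emails = [
    "The delivery was late and the package was damaged.",
    "Late delivery again. Please refund my order.",
    "Great service, but the delivery was late.",
]
# Common words that carry little meaning (a tiny, hand-picked list).
STOPWORDS = {"the", "was", "and", "my", "but", "a", "again", "please"}

def key_terms(texts, top_n=3):
    """Return the most frequent non-stop-words across all texts."""
    words = []
    for text in texts:
        words.extend(
            w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS
        )
    return Counter(words).most_common(top_n)

top_terms = key_terms(emails, top_n=2)
```

The two most frequent terms in this sample are "delivery" and "late", which points to the complaint that recurs across the messages.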