Chapter 4:
Data Management
DR. SARAH NAIEM
SOURCE BOOK: INTRODUCTION TO INFORMATION SYSTEMS, 1ST EDITION [EARLY RELEASE]
BY: PROF. MANAL ABDEL-KADER ABDEL-FATTAH
Introduction
▪A database is a structured data repository essential to information
systems that help organizations achieve their goals by providing accurate
and timely information.
▪ For example, a movie database with details about Tom Cruise's career can help an
agent quickly connect clients with relevant products or films. Databases support
business efficiency by reducing costs, boosting profits, tracking history, and exploring
new markets.
▪ Some databases are collaborative and international, like the one used by
organizations such as OPEC and the UN to manage global oil supply.
▪ Accurate databases are critical for decision-making; for instance, Albuquerque, New
Mexico’s public database enables residents to access data on water usage, crime
statistics, and campaign contributions, reducing the city's administrative burden.
▪Databases form the foundation of most systems development projects, and poor design
can lead to system failure.
▪A database management system (DBMS) manages databases, ensuring secure and
organized data handling.
▪These systems, purchased from vendors, centralize control and are maintained by a
database administrator (DBA) responsible for data security and quality.
▪Security incidents, such as data breaches at educational institutions or paycheck errors
due to database issues, highlight the importance of vigilant database management.
▪DBAs must remain vigilant regarding data quality and accuracy.
▪ For example, a database error in the United Kingdom left 400,000 individuals without paychecks in March
2007.
Overview of Data Management
▪An organization's ability to carry out essential business operations hinges on
both data and its processing capabilities.
▪These functions are indispensable for tasks like compensating employees,
generating invoices, replenishing inventory, and providing decision-making
support to managers.
▪Data consists of raw, unprocessed facts, such as employee IDs and sales
figures.
▪To become valuable information, data must first be organized in a meaningful
way.
Overview of Data Management: Data Hierarchy
▪Data is organized hierarchically, beginning with the smallest unit, the "bit," and
culminating in a "database."
▪ A "bit" represents a binary digit that can be either on or off.
▪Bits combine to form "bytes," each made up of eight bits and used to represent a
"character" (letters, numbers, or symbols), the basic unit of information.
▪Characters are grouped into "fields," which describe specific attributes of business
objects (like employees or locations) or activities (like sales) and can include derived
fields, such as totals or averages.
▪A set of related fields creates a "record," offering a complete description of an entity or
activity, like an "employee record" with fields for name, address, and salary.
▪Records are compiled into "files" (or "tables"
in some software), such as an "employee file"
containing all employee records.
▪At the top of the hierarchy is the "database,"
which integrates and interconnects multiple
files, representing the complete structure:
bits, characters, fields, records, files, and
finally, the database itself.
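The hierarchy can be traced in a few lines of Python. This is only a minimal sketch: the employee names, field names, and the `database` dictionary are illustrative assumptions, not taken from the source book.

```python
# Bit -> byte -> character: eight bits encode one character.
bits = "01000001"                 # eight binary digits
byte_value = int(bits, 2)         # the byte as a number: 65
character = chr(byte_value)       # the character that byte represents: "A"

# Characters -> field: a field holds one attribute value.
name_field = "Ada"                # hypothetical employee name

# Fields -> record: related fields describe one entity.
employee_record = {"employee_number": 1, "name": name_field, "salary": 5000}

# Records -> file (table): all records of the same kind.
employee_file = [
    employee_record,
    {"employee_number": 2, "name": "Omar", "salary": 6200},
]

# Files -> database: related files kept together.
database = {"employee": employee_file, "department": []}

print(character, len(database["employee"]))
```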
Overview of Data Management: Data
Entities, Attributes, and Keys
▪Database concepts include entities, attributes, and keys, each playing a crucial role in organizing
data.
▪An entity represents a broad category—such as individuals, locations, or objects—about which
data is gathered, stored, and managed.
▪ Common entities include employees, inventory items, and customers; they form the foundation
on which organizations structure their data.
▪Attributes are specific characteristics of each entity.
▪ For example, an employee entity may include attributes like employee number, name, hire date, and
department number, while an inventory item may have attributes such as inventory number,
description, quantity, and warehouse location.
▪ Likewise, customer entities include attributes like customer number, name, address, and credit rating.
▪These attributes are selected to represent the most relevant details about each entity type.
▪In databases, the value of an attribute, called a data item, is stored within a record’s fields that
describe an entity.
▪Attributes and data items are widely used in data management by both organizations and
government agencies.
▪ For instance, the FBI’s Next Generation Identification database will store digital images of faces,
fingerprints, and palm prints of citizens and visitors, with each person as an entity, each biometric as an
attribute, and each image as a data item. This database will aid in forensic analysis and homeland
security.
▪A record consists of related fields about a particular object, while a key identifies that record.
▪ A primary key is a field or set of fields, like an employee number, that uniquely identifies each record.
Secondary keys help locate records based on alternative criteria and, unlike a primary key, need not be unique.
▪ For instance, if a mail-order customer forgets their customer number, a clerk can use the last name as a
secondary key to locate their record and complete the order.
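The difference between the two kinds of key can be sketched as follows. The customer numbers, names, and helper functions here are hypothetical; the point is only that a primary-key lookup returns at most one record, while a secondary-key search may return several.

```python
# Hypothetical customer records; the dictionary key (customer number) is the primary key.
customers = {
    1001: {"last_name": "Hassan", "city": "Cairo"},
    1002: {"last_name": "Lopez", "city": "Madrid"},
    1003: {"last_name": "Hassan", "city": "Giza"},
}

def find_by_primary_key(customer_number):
    """Return the single record with this customer number, or None."""
    return customers.get(customer_number)

def find_by_last_name(last_name):
    """Secondary-key search: may match zero, one, or many records."""
    return [
        number
        for number, record in customers.items()
        if record["last_name"] == last_name
    ]

print(find_by_primary_key(1002))
print(find_by_last_name("Hassan"))
```

Because two customers share the last name, the clerk in the mail-order example would still need a second detail, such as the city, to pick the right record.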
▪In the traditional data management approach, each application
maintains its own separate data files tailored specifically to its function.
▪ For instance, a payroll application would have its own payroll file, an
inventory system would have its own inventory file, and so on.
▪This siloed structure often leads to data redundancy (duplicate data
across different files) and data inconsistency (different values for the same
data across files), as each application operates independently without a
shared data source.
▪Maintenance can also become burdensome, as any change in data
formats or values must be applied to every affected file to keep the files
consistent.
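A small sketch of how the traditional approach drifts into inconsistency. The two dictionaries stand in for separate application files, and the employee data is invented for illustration.

```python
# Each application keeps its own copy of the employee's data (hypothetical values).
payroll_file = {"E17": {"name": "Mona Ali", "address": "12 Nile St"}}
hr_file = {"E17": {"name": "Mona Ali", "address": "12 Nile St"}}  # redundant copy

# The employee moves, but only the HR application is updated.
hr_file["E17"]["address"] = "8 Tahrir Sq"

def inconsistent_fields(employee_id):
    """List the fields whose values differ between the two files."""
    payroll, hr = payroll_file[employee_id], hr_file[employee_id]
    return [field for field in payroll if payroll[field] != hr[field]]

print(inconsistent_fields("E17"))
```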
▪In contrast, the database approach centralizes all data within a unified database managed by a
database management system (DBMS).
▪Here, data is stored once and made accessible to all authorized applications, avoiding duplication and
ensuring consistency.
▪ For example, employee data is stored in a single, centralized database, and both the payroll and HR
applications access this same data.
▪ This approach also enables data integrity and security, as the DBMS enforces consistent data rules
and permissions.
▪Centralization in the database approach promotes efficiency by enabling controlled data sharing
across applications, improving data accuracy, reducing redundancy, and simplifying management.
▪As a result, applications within an organization can work together seamlessly, accessing and updating
the same data without risking inconsistencies, ultimately providing a more reliable and manageable
data environment.
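The database approach can be sketched with Python's built-in `sqlite3` module, using an in-memory SQLite database as a stand-in for the organization's DBMS. The table layout, the employee row, and the two function names are assumptions made for this example.

```python
import sqlite3

# One shared in-memory database stands in for the centralized DBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "employee_number INTEGER PRIMARY KEY, name TEXT, address TEXT, salary REAL)"
)
conn.execute("INSERT INTO employee VALUES (17, 'Mona Ali', '12 Nile St', 5000.0)")

def hr_update_address(employee_number, new_address):
    """HR application: change the single stored copy of the address."""
    conn.execute(
        "UPDATE employee SET address = ? WHERE employee_number = ?",
        (new_address, employee_number),
    )

def payroll_mailing_address(employee_number):
    """Payroll application: read the same row that HR maintains."""
    row = conn.execute(
        "SELECT address FROM employee WHERE employee_number = ?",
        (employee_number,),
    ).fetchone()
    return row[0]

hr_update_address(17, "8 Tahrir Sq")
print(payroll_mailing_address(17))
```

Because the address is stored once, the payroll application sees the HR update immediately and no second copy can fall out of date.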
▪Organizations increasingly adopt the database approach to manage data, enabling multiple
applications to access a shared pool of interconnected data.
▪This shared database facilitates data and information sharing.
▪For example, federal databases might store DNA test results for convicted criminals, accessible
nationwide to law enforcement agencies.
▪Implementing this approach requires specialized software called a database management
system (DBMS), which consists of programs that mediate between the database and its users,
including application programs.
▪The DBMS typically serves as a buffer, ensuring seamless interaction between application
programs and the database.
Data Modeling
▪When organizing a database, key considerations include determining the data to collect,
defining access permissions, and outlining potential use cases.
▪After addressing these factors, an organization can begin the database creation process, which
involves two main phases: logical design and physical design.
▪The logical design phase focuses on structuring data in an abstract way to meet the
organization’s information needs.
▪It involves identifying relationships among data elements and ensuring that the design
addresses the needs of all stakeholders across functional areas.
▪Collaboration in this phase ensures that the database supports various business functions
effectively.
▪The physical design phase follows, refining the database for better performance and cost
efficiency.
▪This includes optimizing response times, reducing storage, and minimizing operating costs.
▪Adjustments to the logical design may involve consolidating data entities, including summary
totals, or duplicating data attributes for improved performance, even if it results in some
planned data redundancy.
▪To create a data model, database designers analyze business challenges and the data required
to address them.
▪Enterprise data modeling, which starts with strategic data needs and moves to specific
functional areas, is used to understand the overall data requirements and how different
departments use it.
▪Several models have been developed to help managers and database designers analyze data and
information needs.
▪One such model is the entity-relationship (ER) diagram, which uses standardized graphical symbols to
represent data structure and relationships.
▪In ER diagrams, boxes typically represent data items or entities within tables, while diamonds
symbolize the relationships between these entities.
▪These diagrams visually show how data entities are structured in tables and how they are
interconnected.
▪ER diagrams are vital for ensuring that relationships among data entities are accurately designed,
which in turn supports the alignment of application programs with business operations and user
requirements.
▪They also serve as valuable reference tools for making future modifications to the database design
once it is in use.
▪In this particular design, one salesperson serves
multiple customers, representing a one-to-many
relationship, denoted by the "crow's-foot"
symbol.
▪The diagram also depicts that each customer can
place multiple orders, each order can contain
multiple line items, and many line items can be
associated with the same product—a many-to-
one relationship.
▪This database design can accommodate one-to-
one relationships, such as one order generating
one invoice.
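The relationships described above could be expressed as a relational schema along the following lines. This is a sketch under assumptions: the table and column names are invented, and SQLite (via Python's `sqlite3`) is used only because it needs no setup. Each "many" side carries a foreign key to the "one" side, and a `UNIQUE` foreign key models the one-to-one link between order and invoice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE salesperson (salesperson_id INTEGER PRIMARY KEY, name TEXT);
-- one salesperson serves many customers
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name TEXT,
    salesperson_id INTEGER NOT NULL REFERENCES salesperson(salesperson_id)
);
-- one customer places many orders
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, description TEXT);
-- one order has many line items; many line items refer to one product
CREATE TABLE line_item (
    order_id INTEGER NOT NULL REFERENCES orders(order_id),
    line_number INTEGER NOT NULL,
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity INTEGER,
    PRIMARY KEY (order_id, line_number)
);
-- UNIQUE on order_id makes order-to-invoice one-to-one
CREATE TABLE invoice (
    invoice_id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL UNIQUE REFERENCES orders(order_id)
);
""")

conn.execute("INSERT INTO salesperson VALUES (1, 'Salma')")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, 1)", [(10, "Acme"), (11, "Globex")]
)
conn.execute("INSERT INTO orders VALUES (100, 10)")
conn.execute("INSERT INTO invoice VALUES (500, 100)")

customers_served = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE salesperson_id = 1"
).fetchone()[0]

# A second invoice for the same order violates the one-to-one rule.
try:
    conn.execute("INSERT INTO invoice VALUES (501, 100)")
    second_invoice_allowed = True
except sqlite3.IntegrityError:
    second_invoice_allowed = False

print(customers_served, second_invoice_allowed)
```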
Relational Database Model
▪The relational model describes data using a standardized tabular format, where all data
elements are arranged in two-dimensional tables referred to as relations, which are the
logical equivalent of files.
▪ In relational databases, data is organized into rows and columns within these tables,
simplifying data retrieval and manipulation.
▪It is generally easier for managers to understand than other database models.
▪In the relational model, each row in a table signifies a data entity, commonly known as
a record, while each column represents an attribute, which is equivalent to a field.
▪Each attribute has a predefined domain that limits the permissible values it can accept.
▪This domain concept ensures data accuracy; for instance, an attribute like
gender would only accept values "male" or "female," while a pay rate
attribute would exclude negative numbers.
▪Defining domains in this manner enhances data precision.
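One common way to enforce a domain is a column constraint. The sketch below uses a SQLite `CHECK` constraint through Python's `sqlite3` module to reject negative pay rates; the table, the sample rows, and the `try_insert` helper are assumptions for illustration, and other DBMS products offer their own domain mechanisms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "employee_number INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "pay_rate REAL NOT NULL CHECK (pay_rate >= 0))"  # domain: no negative pay
)

def try_insert(employee_number, name, pay_rate):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO employee VALUES (?, ?, ?)",
            (employee_number, name, pay_rate),
        )
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(1, "Ada", 25.0))    # inside the domain
print(try_insert(2, "Omar", -5.0))   # outside the domain
```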
▪Prominent databases built on the relational model encompass IBM DB2,
Oracle, Sybase, Microsoft SQL Server, Microsoft Access, and MySQL.
▪Oracle leads the market in general-purpose databases, commanding
approximately half of the multibillion-dollar database industry.
▪Oracle Database 11g, for example, features database
grids that enable a single database to run across a cluster of computers.
▪After data is input into a relational database, users can perform various data manipulation operations
to query and analyze the data.
▪Key operations include selecting, projecting, and joining.
▪"Selecting" refers to filtering rows based on specific criteria.
▪ For instance, in a project table containing data about multiple projects, a user can select the row
for Project 226, the sales manual project, and retrieve its corresponding department number, which in
this case is 598.
▪"Projecting" involves reducing the number of columns in a table to focus on specific data. For
example, a department table might include the department number, department name, and Social
Security number (SSN) of project managers.
▪ A sales manager may want to create a new table with only the department number and the SSN of managers
responsible for the sales manual project. Using projection, they can remove the department name column
and generate a table containing only the department number and SSN.
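In SQL, selecting corresponds to a `WHERE` clause and projecting to the column list of a `SELECT`. The sketch below uses Python's `sqlite3` module; Project 226 and department 598 come from the example above, while the second project, the department names, and the SSN values are placeholders invented for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (
    project_number INTEGER PRIMARY KEY, description TEXT, dept_number INTEGER);
CREATE TABLE department (
    dept_number INTEGER PRIMARY KEY, dept_name TEXT, manager_ssn TEXT);
INSERT INTO project VALUES (226, 'Sales manual', 598), (155, 'Payroll', 257);
-- placeholder department names and SSNs
INSERT INTO department VALUES
    (598, 'Marketing', '000-00-0001'), (257, 'Accounting', '000-00-0002');
""")

# Selecting: keep only the rows that meet a criterion.
selected = conn.execute(
    "SELECT * FROM project WHERE project_number = 226"
).fetchall()

# Projecting: keep only some columns (the department name is dropped).
projected = conn.execute(
    "SELECT dept_number, manager_ssn FROM department ORDER BY dept_number"
).fetchall()

print(selected)
print(projected)
```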
▪"Joining" refers to merging two or more tables to combine related information.
▪For instance, a user can join the project table with the department table to create a new table
that includes project details (such as project number and description), department details (such
as department number and name), and the SSN of the manager overseeing each project.
▪In a relational database, as long as tables share at least one common data attribute, they can be
interconnected to provide valuable information and generate reports.
▪The ability to link tables via common data attributes is a crucial aspect of the flexibility and
power inherent in relational databases.
▪Suppose the company president wishes to ascertain the name of the manager for the sales
manual project and the manager's tenure with the company.
▪The crow's-foot symbols in the diagram indicate that a
department can have many projects and that a manager can
supervise several departments.
▪To retrieve the name of the manager
of a certain project, the DBMS
begins with the project description, searches
the project table to retrieve the project's
department number, uses this number to
locate the manager's SSN in the department
table, and then uses the SSN to look up the manager's name in the
manager table.
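The lookup chain from project to department to manager is what a join expresses. A minimal sketch with Python's `sqlite3` module follows; the manager's name, hire date, SSN, and the department name are made-up placeholders, and only Project 226 and department 598 come from the running example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE manager (ssn TEXT PRIMARY KEY, name TEXT, hire_date TEXT);
CREATE TABLE department (
    dept_number INTEGER PRIMARY KEY, dept_name TEXT, manager_ssn TEXT);
CREATE TABLE project (
    project_number INTEGER PRIMARY KEY, description TEXT, dept_number INTEGER);
-- placeholder manager and department details
INSERT INTO manager VALUES ('000-00-0001', 'Nadia Fahmy', '2015-03-01');
INSERT INTO department VALUES (598, 'Marketing', '000-00-0001');
INSERT INTO project VALUES (226, 'Sales manual', 598);
""")

def manager_for_project(description):
    """Follow project -> department -> manager through the shared attributes."""
    return conn.execute(
        """
        SELECT m.name, m.hire_date
        FROM project AS p
        JOIN department AS d ON d.dept_number = p.dept_number
        JOIN manager AS m ON m.ssn = d.manager_ssn
        WHERE p.description = ?
        """,
        (description,),
    ).fetchone()

print(manager_for_project("Sales manual"))
```

The hire date returned alongside the name is what would let the president work out the manager's tenure with the company.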
A primary advantage of relational
databases is their capacity to link
tables, as demonstrated in Figure
4.7. This linkage reduces data
redundancy and facilitates a more
logical organization of data.
The ability to link to the
manager's SSN
stored once in the manager table
obviates the need for multiple
entries in the project table.
The relational database model is
the most widely
adopted choice due to its ease of
control, flexibility, and intuitive
organization of data into tables.
Relational Database Management
System
Figure 4.8 showcases a relational database
management system, such as Access, offering
guidance and tools for creating and employing
database tables.
The system provides information on data types
and indicates the availability of additional
assistance.
The capability to link relational tables also
enables users to explore data relationships
without the need to redefine intricate
connections.
Given these advantages, many companies employ the relational model for extensive corporate
databases, including those dedicated to marketing and accounting.
Furthermore, the relational model can be implemented on both personal computers and
mainframe systems.
For instance, a travel reservation company can develop a fare-pricing system utilizing relational
database technology capable of handling millions of daily queries from online travel companies
like Expedia, Travelocity, and Orbitz.
Big Data
We are amassing an ever-expanding reservoir of data and information from a wide array of sources,
including company documents, emails, web pages, credit card transactions, phone messages, stock
trades, memos, address books, and radiology scans.
Added to this is the data generated through blogs, YouTube, and many other sources, including
all the data organizations collect through events.
Organizations and individuals alike face the overwhelming task of processing this vast and
ever-accelerating volume of data.
According to IDC, a technology research firm, the world generates exabytes of data annually (with an
exabyte equal to one million terabytes).
This superabundance of data is referred to as "Big Data," capitalized to distinguish it from
traditional data in large quantities.
▪At its essence, Big Data revolves around making predictions. These predictions do not
arise from teaching computers to emulate human thinking; rather, they result from the
application of mathematical principles to massive datasets, allowing us to infer
probabilities.
▪The effectiveness of Big Data systems derives from their access to vast datasets upon which
to base predictions.
▪Furthermore, these systems are designed to enhance their performance over time by
identifying the most valuable signals and patterns as additional data is fed into them.
Consider the following examples:
• Estimating the probability that an email message is spam.
• Assessing the likelihood that the typed letters "teh" should be corrected to "the."
• Determining the probability that a jaywalker's trajectory and velocity suggest
they will safely cross the street, indicating that a self-driving car only needs to make
a minor adjustment in speed.
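To make "inferring probabilities from data" concrete, here is a deliberately tiny sketch of the spam example. The six labeled messages and the relative-frequency estimate are assumptions chosen for illustration; production spam filters use far larger datasets and more sophisticated statistical models.

```python
# Tiny labeled sample (hypothetical); real systems learn from millions of messages.
messages = [
    ("win a free prize now", True),
    ("free tickets claim now", True),
    ("meeting moved to monday", False),
    ("lunch is free today", False),
    ("claim your prize", True),
    ("project report attached", False),
]

def spam_probability(word):
    """Estimate P(spam | message contains word) as a relative frequency."""
    labels = [is_spam for text, is_spam in messages if word in text.split()]
    if not labels:
        return None  # word never seen: no evidence either way
    return sum(labels) / len(labels)

print(spam_probability("free"))
print(spam_probability("prize"))
```

Each additional labeled message changes the counts, which is the sense in which such a system can improve as more data is fed into it.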
Defining Big Data
Firstly, according to Gartner, Big Data is characterized by its diversity, high volume, and rapid velocity.
It comprises information assets that demand novel processing methods to facilitate improved
decision-making, the discovery of insights, and the optimization of processes.
Secondly, the Big Data Institute (TBDI; www.the-bigdatainstitute.com) defines Big Data as expansive
datasets that possess the following attributes:
• Diverse in nature,
• Comprising structured, unstructured, and semi-structured data,
• Generated at a high velocity with an unpredictable pattern,
• Not neatly fitting into conventional, structured, relational databases,
• Requiring sophisticated information systems for effective capture, processing, transformation, and
analysis within a reasonable timeframe.
Big Data typically encompasses the following categories, though it's important to note that this list is
not exhaustive and may expand with the emergence of new data sources:
▪Traditional enterprise data: Examples include customer information from customer relationship
management systems, transactional enterprise resource planning data, web store transactions,
operational data, and general ledger data.
▪Machine-generated/sensor data: This category includes data from sources like smart meters,
manufacturing sensors, sensors integrated into smartphones, automobiles, airplane engines,
industrial machines, equipment logs, and trading systems.
▪Social data: Social data comprises information like customer feedback comments, microblogging site
content (e.g., Twitter), and content from social media platforms such as Facebook, YouTube, and
LinkedIn.
▪Images from various devices: These images are captured by countless devices worldwide, ranging
from digital cameras and camera phones to medical scanners and security cameras.
Big Data Examples
The Sloan Digital Sky Survey in New Mexico, which commenced in 2000, amassed more data
within its initial weeks than had been collected in the entire history of astronomy. By 2013,
its archive contained hundreds of terabytes of data.
Facebook users upload more than 10 million new photos every hour and click "like" or
leave comments nearly 3 billion times each day.
Google's YouTube service, with over 800 million monthly users, received more than
an hour of uploaded video every second.
On Twitter, the number of messages grew at a staggering rate of 200 percent each year,
surpassing 450 million tweets per day by mid-2013. Twitter, acquired by Elon Musk in 2022,
is now known as X.
Netflix and Spotify use big data to recommend movies, shows, or songs based on user history
and preferences.
Characteristics of Big Data
Big Data possesses three distinct characteristics: volume, velocity, and variety, setting it apart
from conventional data.
Volume
Definition: Volume refers to the vast amounts of data generated every second across the globe.
This data comes from diverse sources such as social media, IoT devices, sensors, business
transactions, and digital communications.
Example: Social media platforms like Facebook generate petabytes of data daily, including posts,
images, and videos. Similarly, companies like Walmart collect terabytes of sales data from
thousands of stores.
Challenge: Storing, managing, and processing such massive datasets require scalable storage
solutions and distributed computing frameworks, such as Hadoop or cloud platforms.
Velocity
Definition: Velocity describes the speed at which data is generated, collected, and processed. It
highlights the need for real-time or near-real-time data handling to make timely decisions.
Example: Streaming platforms like Netflix and stock trading systems generate data in
milliseconds, requiring instant processing to deliver personalized recommendations or
execute trades.
Challenge: Traditional systems cannot keep up with the high rate of data inflow. Technologies
like Apache Kafka or Spark Streaming are often used to handle this continuous data flow.
Variety
Definition: Variety refers to the diverse formats and types of data that big data encompasses.
This includes structured data (e.g., databases), semi-structured data (e.g., XML, JSON), and
unstructured data (e.g., images, videos, social media posts).
Example: A single organization might deal with customer reviews (text), sales data (structured),
and website analytics (semi-structured). Combining these to extract insights is challenging.
Challenge: Integrating and analyzing such diverse datasets requires flexible data models and
advanced tools capable of handling heterogeneous data.
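The three kinds of data in the variety example call for different handling even in a toy setting. The sketch below uses only Python's standard library; the sales rows, web events, and review text are invented sample data.

```python
import csv
import io
import json

# Structured: fixed columns, as in a relational table (hypothetical sales rows).
sales_csv = "order_id,amount\n1,19.99\n2,5.00\n"
sales = list(csv.DictReader(io.StringIO(sales_csv)))
total_sales = sum(float(row["amount"]) for row in sales)

# Semi-structured: self-describing JSON whose fields can vary per record.
events_json = (
    '[{"page": "/home", "ms": 120},'
    ' {"page": "/cart", "ms": 340, "referrer": "ad"}]'
)
events = json.loads(events_json)
pages = [event["page"] for event in events]

# Unstructured: free text with no predefined fields.
review = "Great product, fast delivery, but the box was damaged."
mentions_damage = "damaged" in review.lower()

print(round(total_sales, 2), pages, mentions_damage)
```

The structured rows can be summed directly, the JSON records must tolerate optional fields such as `referrer`, and the review can only be searched as text, which is why combining all three takes more flexible tooling than a single fixed table.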
Big Data Sources
Big data sources can be
broadly categorized into two
main types: internal data
sources and external data
sources. Examples of big
data sources are depicted in
Figure 4.9.
Internal Data Sources:
Internal data refers to data that is generated, owned, and controlled by a company or
organization. This data is typically generated through the company's day-to-day operations,
transactions, and interactions with customers, suppliers, and other stakeholders.
Examples of internal data sources include customer transaction records, sales data, employee
records, production data, financial statements, and any other data that is collected and stored
by the company as part of its business activities.
Companies have full control over internal data and can use it for various purposes, such as
business analytics, decision-making, and improving internal processes.
External Data Sources:
External data refers to data that is not generated, owned, or controlled by the company. This data is
typically sourced from outside the organization and can come from a wide range of external providers
and public sources.
Examples of external data sources include:
▪ Publicly available data: Information from government agencies, public databases, social media, news feeds,
and other publicly accessible sources.
▪ Third-party data providers: Companies that specialize in collecting and aggregating data, such as market
research firms, data brokers, and data syndication services.
▪ Internet of Things (IoT) devices: Data generated by sensors and devices connected to the internet, such as
weather sensors, GPS devices, and smart appliances.
▪ Social media and web data: User-generated content, online reviews, social media posts, and website analytics.
Companies do not have direct control over external data, but they can acquire and use it to gain
insights, enhance decision-making, and supplement their internal data.
Both internal and external data sources are valuable for organizations in the era of big data, as
they provide a wealth of information that can be analyzed to uncover insights, trends, and
opportunities.
Effective data management and integration strategies are often needed to harness the full
potential of both types of data sources for business intelligence and strategic decision-making.