CIT3641 - LECTURE NOTES 3
ADVANCED DATABASES
Data Modelling Processes
        (Part 1)
    Data modeling, Entity-Relationship (E-R)
      modeling, and Relational Modeling
Data modeling, E-R modeling, and relational modeling are
all related concepts / techniques used in the process of
designing and organizing databases.
However, they focus on different aspects and stages of the
database design process.
The slides below present the key differences among them:
Data Modeling:
 Scope: Data modeling is a broader term that
  encompasses various techniques and methodologies
  for designing and organizing data structures in a
  database.
 Purpose: The primary goal of data modeling is to
  create a blueprint or roadmap for the database, guiding
  the implementation process and ensuring that the
  database meets the organization's requirements.
 Levels: Data modeling can be used at different levels
  of abstraction, including conceptual, logical, and
  physical models.
 Techniques: Data modeling techniques include Entity-
  Relationship modeling, UML (Unified Modeling
  Language), dimensional modeling (used in data
  warehousing), and others.
 Structure: Data Modeling involves defining the
  structure of the data, including entities, attributes,
  relationships, and constraints, to ensure that the
  database can accurately represent and store the
  information required by an organization.
Entity-Relationship (E-R) Modeling:
Focus: E-R modeling is a specific type of data modeling
technique that focuses on representing the entities,
attributes, and relationships in a system in a visual manner.
Representation: In E-R modeling, entities are represented
as rectangles, attributes as ovals, and relationships as lines
connecting entities.
Usage: E-R modeling is particularly useful for conceptual and
logical modeling stages, where the emphasis is on
understanding the entities and relationships in the problem
domain without concern for the specific implementation
details.
 Constraints: E-R diagrams can include cardinality
  constraints (e.g., one-to-one, one-to-many, many-to-
  many) and participation constraints (e.g., total
  participation, partial participation) to define how entities
  are related to each other.
 Communication: E-R diagrams provide a clear and
  intuitive visualization of the data model, which can
  facilitate communication between stakeholders, including
  business users, analysts, and database designers.
 Structure: While E-R modeling is a powerful tool for
  understanding the structure of the data and relationships
  in a system, it is often complemented by other modeling
  techniques, such as relational modeling for logical and
  physical design phases.
Relational Modeling:
 Focus: Relational modeling specifically deals with the
  design of relational databases, which store data in
  tables organized in rows and columns.
 Representation: Relational modeling focuses on
  representing the structure of data using relational
  concepts such as tables, keys, and relationships.
 Normalization: Relational modeling often involves the
  process of normalization, which aims to minimize
  redundancy and dependency in the database schema
  by decomposing tables into smaller, more manageable
  structures.
 Implementation: Relational modeling is closely tied to the
  relational model of data, proposed by E.F. Codd, and
  involves translating the conceptual and logical models into
  a physical database design using SQL (Structured Query
  Language) or other database management system-
  specific tools.
Data Modeling is a broad term that encompasses various
techniques for designing database structures, including E-
R modeling and relational modeling.
E-R Modeling is a specific type of data modeling
technique focused on representing entities, attributes, and
relationships in a visual manner. It is typically used during
conceptual and logical modeling stages.
Relational Modeling specifically deals with the design of
relational databases using tables, keys, and relationships.
Each technique serves different purposes and is used at
different stages of the database design process. Ideally,
both the E-R Modeling and Relational Modeling are said to
be subsets of Data Modeling.
  Conceptual vs Logical vs Physical Database
                   Design
Below is a brief explaining for each of the three phases /
concepts associated with the building of databases:
Conceptual, Logical, and Physical.
Conceptual Model:
 Purpose: The conceptual model represents the high-level view
  of the database, focusing on the entities, their attributes, and
  the relationships between them, without concern for
  implementation details.
 Abstraction: It provides a conceptual abstraction of the real-
  world domain that the database aims to represent, helping
  stakeholders understand the structure and semantics of the
  data.
 Representation: Conceptual models are typically expressed
  using techniques such as Entity-Relationship (E-R) diagrams or
  Unified Modeling Language (UML) diagrams.
 Stakeholder Communication: The conceptual model serves
  as a communication tool between business stakeholders,
  analysts, and database designers to ensure a shared
  understanding of the requirements and scope of the database.
Logical Model:
 Purpose: The logical model translates the conceptual model
  into a more detailed and structured representation that can be
  implemented in a database management system (DBMS).
 Normalization: It involves the process of normalization to
  minimize redundancy and dependency in the database schema,
  ensuring data integrity and efficiency.
 Entity-Relationship (E-R) Mapping: Entities, attributes, and
  relationships defined in the conceptual model are mapped to
  tables, columns, and foreign key constraints in the logical
  model.
 Database Independence: The logical model is typically
  independent of the specific features and constraints of any
  particular DBMS, allowing it to be translated into various
  physical implementations.
Physical Model:
 Purpose: The physical model represents the actual
  implementation of the database on a specific database
  management system, taking into account the technical
  considerations and constraints of the target environment.
 Storage Structures: It defines the storage structures,
  indexing schemes, and optimization strategies used to
  efficiently store and retrieve data.
 Data Types: The physical model specifies the data types,
  constraints, and other DBMS-specific features needed to
  enforce data integrity and ensure optimal performance.
 Normalization Optimization: While the logical model
  focuses on normalization for logical consistency, the physical
  model may denormalize certain aspects for performance
  optimization.
 Indexing and Partitioning: The physical model may include
  decisions on indexing strategies, partitioning schemes, and
  other performance-enhancing techniques tailored to the
  specific workload and access patterns of the database.
In summary, the conceptual, logical, and physical
phases/concepts associated with building databases represent
progressively more detailed and concrete views of the
database.
The processes start from a high-level conceptual understanding
of the data domain,
moving to a structured and normalized representation suitable
for implementation, and
finally translating into the specific technical implementation
details necessary for deployment on a database management
system.
A general understanding to the three data models is that a
business analyst uses a conceptual and logical model to model
the business objects that exist in the system.
The database designer or database engineer elaborates the
conceptual and logical models to produce the physical model
that presents the physical database structure ready for
database creation.
While all the three levels of data modeling contain entities with
attributes and relationships, they differ in the purposes they are
created for and the audiences they are meant to target.
Conceptual model vs Logical model vs Data model
Conceptual Data Model
 Conceptual design models the business objects that should
  exist in a system and the relationships between them.
 A conceptual model is developed to present an overall picture of
  the system by recognizing the business objects involved.
 It defines what entities exist, which are not necessarily tables.
 Thinking of tables is too detailed for this type of data model. At
  this level, the main interest is the relationship between entities
  with little interest on the cardinality.
 The cardinality of a relationship is the number of related rows
  for each of the two entities in the relationship.
 An example of a conceptual data model is shown here. It shows
  student entity, course entity, and how they relate.
Logical Data Model
 Logical data model is a detailed version of a Conceptual design.
  A logical design model is developed to enrich a conceptual
  model by defining explicitly the columns in each entity.
 At this stage the database model type, such as relational model,
  would also be determined.
 In other words, attributes are added to each entity, and further
  entities can be added that represent areas to capture data in the
  system.
 Although a logical data model is still independent of the actual
  database system in which the database will be created, you can
  still consider how it may affect the design.
 Here’s an example of the student and course data model
  created as a logical data model.
Physical Data Model
Physical data model represents the actual design blueprint of a
relational database.
It is the most detailed data model in this process. It defines a set of
tables and columns and how they relate to each other.
It includes primary and foreign keys, as well as the data types for
each column.
A physical data model elaborates on the logical data model by
assigning each column with type, length, nullable, etc.
Since a physical design represents how data should be structured
and related in a specific DBMS it is important to consider the
convention and restriction of the actual database system in which the
database will be created.
Make sure the column types are supported by the DBMS and
reserved words are not used in naming entities and columns.
Here’s an example of the student and course physical data
model.
            Entity Relationship Diagram
What is an Entity Relationship Diagram?
 An Entity Relationship Diagram (ERD) is a type of
  diagram that lets you see how different entities (e.g.
  people, customers, or other objects) relate to each other
  in an application or a database.
 An ERD may also be said to be a pictorial representation
  of the information that can be captured by a database.
 It allows database professionals to describe an overall
  design concisely yet accurately.
 This is because an ER Diagram can easily be transformed
  into the relational database.
 ERDs are created when a new system is being designed so
  that the development team can understand how to structure
  the database.
 They can also be created on an existing system to help the
  team understand how the system works and to find and
  resolve any issues.
Below is an example of an ERD.
Entities
 An entity is a thing that can have data collected and stored
  about it.
 It can be a physical object (e.g. car, person), a concept (e.g.
  address, lodging) or an event (e.g. student enrolment in a
  course, birthday party). They represent nouns.
 Entities are basic objects of an Entity Relationship Diagram
  which are usually represented as rectangles with an entity
  name inside.
 These are the tables of your database, i.e. students, courses,
  books, campus, employees, payment, projects.
 A specific example of an entity is called an instance. Each
  instance becomes a record or a row in a table.
An entity can be a strong entity or a weak entity. What’s the
difference?
 A strong entity is an entity that DOES NOT depend on
  another entity for its existence, e.g. Student or Employee.
 Strong entities are also known as parent, owner or dominant
  entities.
 A weak entity is an entity that DOES depend on another
  entity for its existence, e.g. Grades or Bonuses.
 Weak entities are also known as child, dependent, or
  subordinate entities.
 A strong entity is an entity that has its own primary key
  attribute. The primary key of a strong entity uniquely identifies
  each instance of that entity.
 A weak entity does not have a primary key attribute that can
  uniquely identify its instances on its own.
 Instead, it has a partial key which, in combination with the
  primary key of the associated strong entity, forms a
  composite primary key.
 A weak entity is defined by its inability to exist independently
  without being related to another entity (the strong entity).
 For example, an enrolment of a student is a weak entity, as
  an enrolment cannot exist without a student (strong entity).
 The primary key of the strong entity is NOT always required
  to be part of the primary key of the associated weak entity.
 However, it is common practice for the primary key of the
  strong entity to be included as part of the primary key of the
  associated weak entity.
 The primary key of the weak entity needs to be unique within
  the scope of its relationship with the strong entity.
 How that uniqueness is achieved can vary depending on the
  specific requirements of the data model.
 In some cases, the relationship between the weak entity and
  the strong entity may be established by a combination of
  attributes other than the primary key of the strong entity.
 This might occur if the weak entity's existence is dependent
  on a different set of attributes in the strong entity.
However,
including the primary key of the strong entity in the primary key
of the weak entity is often a logical choice because
it helps ensure the uniqueness of instances of the weak entity
and
maintains the integrity of the relationship between the entities.
Attributes
An attribute is a property of an entity or something that can be
used to describe an entity.
Attributes are facts or descriptions of entities. They are often
represented as ovals, or as entries inside an entity.
They are also often nouns and become the columns of the
table.
For example, for entity students, the attributes can be first
name, last name, email, address, and phone numbers.
There are several different types of attributes represented on an
Entity Relationship Diagram:
 Simple: an attribute that cannot be split into other attributes,
  such as a first name.
 Composite: an attribute that can be split into other attributes,
  such as name being split into first, middle, and last name;
  address being split into street/road name, plot/house number,
  P.O. Box number, township, and town.
 Derived: an attribute that is calculated or determined from
  another attribute, such as the age of a student being
  calculated from a Date of Birth (DOB) Attribute.
An attribute can also be single-value or multi-value:
 Single-Value: an attribute that is only captured once for an
  entity.
 Multi-Value: an attribute that can be captured more than
  once for an entity, such as multiple mobile phone numbers.
Other related terminologies:
 Primary Key – Is an attribute Or A Set of attributes that
  uniquely identifies an instance of the entity. For example, for
  a Student Entity, a Student Number is the Primary Key since
  no two students have the same Student Number. We can
  have only ONE Primary Key in a table. It uniquely identifies
  every row and it can not be Null.
 Foreign Key – Is a Key used to link two tables together.
  Typically, you take the primary key field from one table and
  insert it into the other table where it becomes a foreign key (It
  remains a primary key in the original table). We can have
  more than one foreign key in a table.
Relationships
A relationship in an ERD defines how two entities are
related to each other.
They can be derived from verbs when speaking about a
database or a set of entities.
Relationships in ERDs are represented as lines between
two entities, and often have a label on the line to further
describe the relationship (such as “enrols”, “registers”,
“completes”).
There are several types of relationships that are
represented on an ERD:
 One to one: One record of an entity is directly related to
  another one record of an entity
 One to many: One record of an entity is related to zero,
  one or more records of another entity.
 Many to many: Many records of one entity can be
  related to many records of another entity.
Cardinality
 It defines the possible number of occurrences in one entity
  which is associated with the number of occurrences in
  another.
 This is often expressed as a number but could also be a
  symbol, depending on the style of diagram used.
 Common cardinality values are zero, one, or many.
 For example, ONE team has MANY players. When present
  in an ERD, the entity Team and Player are inter-connected
  with a one-to-many relationship.
 In an ER diagram, cardinality may be represented as a
  crow’s foot at the connector’s ends.
 As indicated above, the three common cardinal
  relationships are one-to-one, one-to-many, and many-to-
  many.
 The next slide shows some cardinality examples of
  relationship in ERD using the crow’s foot notation:
Suppose we have the following business scenario:
 One Customer May Be Making One Or More Appointments
 One Appointment Must Be Made By One And Only One
  Customer
 The Cardinality Linked From Customer To Appointments Is 0
  To Many
      The ERD above using the Crow’s Foot notation:
 Entities Are Shown In A Box With Attributes Listed Below The
  Entity Name.
 Relationships Are Shown As Solid Lines Between Two
  Entities.
 The Minimum And Maximum Cardinalities Of The
  Relationship Linked Between Customer And Appointment Are
  Shown With Either A Straight Line And Hash Marks, Or A
  Crow’s Foot As Shown In The Figure.
A Parent Table - The entity on the "one" (/1) side of a relation
with another table
A Child Table - The entity on the "many" (/N/*) side of a relation
with another table
 In entity relationship modelling, solid lines usually represent
  strong relationships and dashed lines usually represent
  weak relationships.
 When the relationship is strong, i.e., the primary_key of the
  parent becomes part of the composite primary_key in the
  child entities, we use a solid line to depict that.
 Similarly, for the case when the relationship is weak, which
  means the primary_key of the parent is not used as part of a
  primary_key in the child entity, then we use a dashed line to
  show that.
Strong Relationship Vs Weak Relationship
Strong Relationship – Definition:
 A strong relationship in data modeling implies a significant
  dependency or association between two entities.
 This relationship is typically enforced through constraints,
  such as foreign keys, which ensure data integrity and
  consistency.
 Strong relationships are commonly seen in scenarios where
  one entity is directly dependent on another, and the existence
  of one entity is crucial for the other.
Strong Relationship - Example:
 The relationship between a customer and an order is typically
  strong because an order is directly tied to a customer's
  action.
 Each order is placed by a specific customer, and each
  customer can have multiple orders associated with them.
 This relationship is often enforced through constraints such
  as foreign keys in the database.
 For instance, the "Order" table might have a foreign key
  referencing the "Customer" table, ensuring that each order is
  associated with a valid customer.
Weak Relationship - Definition:
 A weak relationship in data modeling implies a looser or less
  significant association between two entities.
 Unlike strong relationships, weak relationships do not
  necessarily enforce strict constraints or dependencies
  between entities.
 Entities in a weak relationship may exist independently of
  each other, and the relationship may not be crucial for the
  integrity of the data model.
Weak Relationship - Example:
 Consider a database modeling a library system. In this
  system, there is a weak relationship between the "Books" and
  "Authors" entities.
 Each book is associated with one or more authors, but
  authors can exist independently of any specific book.
 While books rely on authors for attribution, the absence of an
  author or a change in authors does not necessarily impact
  the existence or integrity of the books themselves.
 This relationship is typically represented by a foreign key in
  the "Books" table referencing the "Authors" table, but the
  absence of an author does not prevent a book from being
  created.
  Strong relationships in data modeling involve significant
  dependencies and constraints between entities, often
  enforced through mechanisms like foreign keys,
  while
  weak relationships imply looser associations where entities
  may exist independently of each other, and constraints may
  be less strict or essential for data integrity.
In summary, the choice between strong and weak relationships
in data modeling involves trade-offs between data integrity,
flexibility, complexity, and performance.
The decision should be based on the specific requirements of
the application, considering factors such as data consistency
needs, scalability, and the ability to accommodate changes over
time.
Symbols and Notations
There are several symbols and notations that are used in the
Entity Relationship Diagrams.
Examples are indicated below:
1. Chen
Here’s an example of the symbols used with the Chen notation
style.
Here’s an example ERD using Chen notation:
2. Crow’s Foot
Here’s an example of the symbols used with the Crow’s foot
   notation style. It’s called “crow’s foot’ as the symbol for a
   many relationship looks like the foot of a crow with its three
   prongs.
Here’s an example ERD using Crow’s Foot notation:
3. Bachman
Here’s an example of the symbols used with the Bachman
   notation style.
Here’s an example ERD using Bachman notation:
4. IDEF1X
Here’s an example of the symbols used with the IDEF1X
   notation style.
Here’s an example ERD using IDEF1X notation:
5. Barker
Here’s an example of the symbols used with the Barker notation
   style.
Here’s an example ERD using Bachman notation:
Relationships in Relational Databases
Relational databases enable users to store and retrieve data
from related tables. This involves the use of tables, rows,
columns, keys, atomic data, and relationships between tables.
There are three types of relationships in these databases:
One to One (1:1)
Example: In nature, each element has one formula and each
formula represents one element.
One to Many (1:M)
Example: In business, each staff works in one department and
in each department, many staff can be working.
Many to Many (M:N)
Example: In university, each student can attend many courses
and in each course, there can be many students.
Many to Many(M:N) Relationship
Many to many relationships create uncertainty and duplications
on data that would in some cases result in wrong statements for
queries. In the example below, Each person can use many
banks and each bank can have many customers.
In the Entity-Relationship (ER) Diagram, ‘person_id’ is specified
as Primary Key, and Primary Keys should be unique. There are
rows that have the same Primary Key in people table. SQL
Server will be raising an error if such a table is created.
Violation of PRIMARY KEY constraint
‘PK__customer__543848DFE23A1B1A’. Cannot insert duplicate
key in object ‘dbo.customer’. The duplicate key value is (1).
A common way to avoid problems while setting many to many
relationships is creating a new table which is called ‘join table’,
’junction table’, ’cross-reference table’, ‘bridging table’,
‘intersection table’ in various resources.
Join table has to be composed of ‘person_id’, ‘bank_id’ from
related tables’ Primary Keys and it is possible to add a new
column in Join table (i.e. ‘account_type’). ‘person_id’ and
‘bank_id’ are both Primary and Foreign Keys in Join table.
Primary Keys can’t be NULL values. Because of that every
‘person_id’ and ‘bank_id’ values should be filled
from people and banks tables.
People and banks tables have ‘many optional’ to ‘many
optional’ relationship. It means that a customer can be working
with many banks or he/she can’t be working with any bank. Also,
a bank can have many customers or it can’t be having any
customers.
In people_banks table, each ‘person_id’ should match up with
only one ‘person_id’ in people table. So that, each row
in people_banks table has ‘one mandatory’ relationship
with people table.
This rule is also valid for the relationship
between banks and people_banks tables.
Summarizing
An Entity Relationship Diagram is a great tool to help you
define, understand, and communicate the requirements of a
system. It can be modelled at a high level (conceptual data
model), a detailed level (physical data model) or a level in
between (logical data model). There is a range of modelling
notations or types of symbols which define how entities,
attributes, and relationships are captured.
The main goal of designing a data model is to make certain that
data objects offered by the functional team are represented
accurately. We should first start from the conceptual data
model.
As more and more information become available, we add more
details to refine it from conceptual to the logical model.
Finally, when we know exactly how to implement the database
of our system, we can refine our logical model into the physical
data model which can directly map between the diagram and
the actual database system.
                  Relational Modeling
           (Entities, Attributes, Values and Keys)
 E-R Model and Relational Model are both types of a
  Data Model.
 Data Model describes a way to design database at
  physical, logical and Conceptual (view) level.
 The main difference between E-R Model and Relational
  Model is that E-R Model is entity specific, and
  Relational Model is table specific.
We look at similar structures from the Relational Model’s point
of view.
Entities
Many organisations (such as businesses, government
departments, supermarkets, universities and hospitals) have a
number of branches, divisions or sections in order to deal with a
variety of functions or different geographical areas.
Each branch, division or section may itself be split up into
smaller units. It is possible to regard each branch, division or
section (or each unit within these) as an organisation in its own
right.
Organisations require information in order to carry out the tasks
and activities for which they are responsible. The information
that these organisations need could be categorised in a number
of ways, for example:
People
 Payroll
 Pensions
 Annual leave
 Sick leave
Things
 Furniture
 Equipment
 Stationery
 Fire extinguishers
Locations
 Offices
 Warehouses
 Stock rooms.
Events
 Sale is made
 Purchase order is raised
 Item is hired
 Invoice is issued
Concepts
 Image of product
 Advertising
 Marketing
 Research and development
Each of these indicated above can be regarded as an entity. An
entity may represent a category of people, things, events,
locations or concepts within the area under consideration. An
entity instance is a specific example of an entity. For example,
John Smith is an entity instance of an employee entity.
Attributes
Entities have attributes. The following are typical of the
attributes that an entity might possess:
Entity: House
Attributes:
Entity: Book
Attributes:
• Entity: Employee
• Attributes:
Values
Using the entities and attributes shown above, the following are
examples of one set of values for a particular instance of each
entity.
Every occurrence of an entity will have its own set of values for
attributes it possesses.
Entity: House
Attributes:
Values
Entity: Book
Attributes:
Values:
Entity: Employee
Attributes
Values:
QUESTIONS