An Introduction To Entity Relationship Diagrams
An Introduction To Entity Relationship Diagrams
Diagrams (ERDs)
Version: 5
Health Informatics
Entity Relationship Diagrams (ERDs)
How th i s d oc um en t sh ou l d be u sed:
This document has been designed to be suitable for web based and face-to-face teaching. The text has been made
to be as interactive as possible with exercises, Multiple Choice Questions (MCQs) and web based exercises.
If you are using this document as part of a web-based course you are urged to use the online discussion board to
discuss the issues raised in this document and share your solutions with other students.
Contents
1. Before you start .......................................................................................................................................... 5
1.1 Prerequisites ............................................................................................................................................. 5
1.2 Required Resources.................................................................................................................................. 5
2. Learning Outcomes .................................................................................................................................... 6
3. Introduction ................................................................................................................................................ 7
3.1 Why Learn About ERDs? .......................................................................................................................... 7
3.2 What is an ERD?....................................................................................................................................... 8
4. Entity types, instances and attributes ....................................................................................................... 9
4.1 Showing Attributes in ERDs..................................................................................................................... 13
4.2 Summary ................................................................................................................................................ 13
4.3 Identifying Entity Types ........................................................................................................................... 15
4.3.1 Analysing a Narrative Description of Requirements .............................................................................................. 15
4.3.2 Identifying Appropriate Entity Types ...................................................................................................................... 17
4.3.3 Good and Bad Entity Types ................................................................................................................................... 18
4.3.4 Warning about the „is a Kind of‟ Situation .............................................................................................................. 19
4.3.5 Workshops............................................................................................................................................................. 19
4.3.6 Drawing ERDs ....................................................................................................................................................... 19
4.4 Summary ................................................................................................................................................ 20
5. Relationships? .......................................................................................................................................... 22
5.1 What Does it Mean?................................................................................................................................ 22
5.2 Foreign Keys - How it all Works ............................................................................................................... 23
5.3 What Name do you give a Foreign Key Attribute? .................................................................................... 23
5.4 What are Dependency and Referential Integrity? ..................................................................................... 23
5.5 What is a One to One Relationship? ........................................................................................................ 24
5.6 What is a One to Many Relationship? ...................................................................................................... 25
5.7 Optionality ............................................................................................................................................... 25
5.8 How Many is Many? ................................................................................................................................ 26
5.9 What is a Many to Many Relationship? .................................................................................................... 26
6. Some Common Mistakes in ERDs ........................................................................................................... 28
6.1 Confusing Instances and Types............................................................................................................... 28
6.2 Including Lookup/Descriptor Entities ........................................................................................................ 29
6.3 Unnecessary Complexity ......................................................................................................................... 30
6.4 Poor Aesthetics ....................................................................................................................................... 31
7. The Relationship between Narrative Descriptions and ERDs ................................................................ 32
8. Recursion in ERDs ................................................................................................................................... 38
9. Conceptual, Logical and Physical Data Models ...................................................................................... 39
9.1 Electronic Message Board Example ........................................................................................................ 40
10. Where do ERDs fit Into the Database Design Process? ......................................................................... 41
10.1 The Small Personal Database.............................................................................................................. 41
10.2 Departmental and Hospital Systems .................................................................................................... 42
1.1 Prerequisites
This document assumes that you have the following knowledge and skills:
Skills:
That you have used the following features of a Database Management System (DBMS) such as Access to:
Create Tables
Create Relationships (and therefore know about the relationship window)
Create simple Queries in the query design window
Create a simple form
Knowledge:
You should also be able to describe what the following concepts mean:
Tables, indexes and Fields
Relationships (not a detailed description)
Forms
Queries
If you have completed the ECDL (European Computer Driving Licence) you will have covered these topics. If not I
recommend that you do so now. You can take the ECDL either at a local college (in the UK) or as a distance
Learning course. There are also some very good books guiding you through the ECDL.
2. Learning Outcomes
This document aims to provide you with the following skills and information. After you have completed it you should
come back to these points, ticking off those you feel happy with.
Be able to list the reasons why the ability to develop ERDs is important for database developers and those involved in database
development
Be able to explain the relationship between entity type, entity instance and attribute
Be able to discuss the various criteria you may use to help refine an initial list of entity types
Be aware of the use of workshops and informal ERDs to develop ERDs in a group setting
Be able to provide an example of how the relationship concept is implemented within a relational DBMS (e.g. Access)
Be able to discuss the main issues in the confusion concerning conceptual, logical and physical data models
Be able to evaluate the quality of an ERD from the perspective of undue complexity and aesthetics
Be able to describe the pragmatic approach which is usually adopted when developing successive ERDs
Be able to describe the additional functions a CASE tool offers over drawing programs
Be able to provide examples of various models that already exist and are available to the public for a number of situations
Be able to describe how recursion is often removed from ERDs as they are refined
Be able to indicate that the process of normalising to fifth business normal form removes recursive relationships
Be able to contrast the main differences between small scale and large scale database projects
Be able to demonstrate the central part ERDs play in all database development methods
3. Introduction
This document will introduce you to a technique used to provide a diagrammatic description of certain aspects of the
data requirements for a database. The technique is called entity relationship modelling and uses Entity Relationship
Diagrams (ERDs) as the main method. It is a particular kind of data modelling and is one of the oldest techniques
around having been developed in the 1970's
by Peter Chen who is very much still alive and
continues to produce important research
(http://www.csc.lsu.edu/~chen/chen.html).
You can read his original paper at
http://www.csc.lsu.edu/~chen/pdf/erd.pdf. One
of my online students was motivated enough
to go and see him when he was working
through a former version of this document (he
did live close by).
You may have come across such diagrams
already using a variety of databases including
Microsoft Access. The example below shows
an example from a database to collect
research data from those who suffer from
diabetes and are also pregnant (don‟t worry
about understanding the diagram at this
stage):
Exercise 1. MCQs
1. From the list below choose two reasons why it is important for a „domain expert‟ such as you to learn about the
ERD method:
a. Provides insight into the mindset of database developers
b. Is the only method available to describe the data requirements
c. Provides credibility to IT personnel
d. Forms the basis of most data modelling techniques
e. Has been proved to be the most cost effective method of specifying data requirements
2. From the list below choose the one option that describes the most desirable „domain expert‟ from the medical
profession:
a. Someone who has developed several databases but knows little of database modelling or current issues in
medicine
b. Someone who has little interest in how information may help the department
c. Someone who has problems working in a collaborative environment
d. Someone who has previously managed IT projects
e. Someone who has knowledge of data modelling techniques and currently works in the appropriate situation
Exercise 2. MCQs
1. ERDs provide a description of (one correct answer):
Another way of thinking about the bond between an entity type and instance is to think of the type as being the
column headings (= attribute names) for a number of rows each of which is an instance of the entity type.
Doctor
surname forename date of birth GMC number salary gender Doctor entity type with
no entity instances of
Doctor
Doctor
surname forename date of birth GMC number salary gender Doctor entity type with 4
entity instances of
smyth Dodger 19/12/1956 39200456 89K male
Doctor
Coates Jill 03/05/1966 5748337 67K female
Worsley Alan 11/07/1970 578493 80K Male
Brown John 29/10/1055 4958576 45K Male
Now can you see why an entity type is like a template for entity instances.
Entity type Entity instance example(s) providing values for some of the attributes
Advice leaflet Chronic back pain leaflet given to Mrs Smith last friday
Entity type Entity instance example(s) providing values for some of the attributes
Entity type Entity instance example(s) providing values for some of the attributes
Spend five minutes filling in the tables below. List some entity types along with examples of their entity
instances. Some of your entity types may have no or several instances. The important thing is that each
instance possesses the same set of attributes as the entity type.
Attributes names:
Attributes values:
Attributes names:
Attributes values:
Attributes names:
Attributes values:
Attributes names:
Attributes values:
You may have begun to feel that there is some similarity between the concepts we have been discussing above and
various concepts you will have come across when using a database management system such as Access.
In fact there are two very basic similarities:
Table definitions (in a database) = Entity types (in an ERD)
Records in a table (in a database) = Entity instances (in an ERD)
It must be realised that this is only a rough guide as there are numerous exceptions to the rule, but it does help to
think of the concepts in the above manner.
4.2 Summary
An ERD consists of entity types and relationships.
An entity type may be bound to zero or more entities instances (you can therefore have an entity type with
no instances).
An entity type possesses a number of attributes (e.g. a GP possesses name, date of birth, GMC number,
salary, etc).
Each entity instance possesses the same set of attributes as the associated entity type, where now each
attribute possesses a particular value (e.g. GP = Smith; date of birth = 21/09/78, etc).
We will now look in more depth at the very important task of identifying entity types for a particular context, but
before that here are some MCQs for you to answer.
Exercise 4. MCQs
1. Which of the following provides the best description of an entity type (select one)?
a. A specific concrete object with a defined set of processes (e.g. John Brown with diabetes)
b. A value given to a particular attribute (e.g. height - 230 cm)
c. A thing that we wish to collect data about where zero or more, possibly
real world examples of it may exist
d. A template for a group of things with the same set of characteristics that may exist in the real world
e. An undefined concept that needs further clarification
2. Which of the following provides the best description of an entity instance (select one)?
a. A specific concrete object with a defined set of processes (e.g. John Brown with diabetes)
b. A value given to a particular attribute (e.g. height - 230 cm)
c. A thing that we wish to collect data about where zero or more, possibly
real world examples of it may exist
d. A template for a group of things with the same set of characteristics that may exist in the real world
e. An undefined concept that needs further clarification
5. Select from the following the best list of attributes for the entity SHOE for someone working in a shoe shop (select
one answer):
a. 1, 2, 6
b. 1, 2, 6, 3
c. 1, 2
d. 1, 6
e. 1, 2, 3, 4, 5, 6
1=size
2=number of lace holes
3=propulsion
6. Select from the following the best list of attributes for the 4=cooking time
entity RECIPE for someone following a recipe at home (select 5=number of portions
one): 6=material colour
7=volume (loudness)
a. 4, 5, 7, 8
8=cooking temperature
b. 4, 5
c. 4, 5, 6
d. 4, 5, 8
e. 4, 5, 6, 8
The first stage is to pick out all the nouns (names) in the above passage; this provides a good baseline from which
to consider possible entity types.
Mark in the above narrative all the nouns (names) you see.
Gathering together all the above nouns produces the following list, not in any specific order:
Patients System
Doctors Users
Healthcare Cost
assistants
Care assistants Time
Ward Day
Drug treatment Lengths
Drugs Details
Patient treatment Overtime
Treatment Rates
Staff payment Grade
Staff Week
Insurance Results
company
Screen Use
Information
The next stage is to consider which of these nouns are appropriate entity types to include in the ERD.
Rumbaugh et al (1991 p152-3, repeated in Blaha & Rumbaugh 2005 p 185 -6) provides some guidance as to how
to identify appropriate entity types. He suggests that you consider the following issues:
Redundant entity types Two entity types may express the same information (i.e. two names for the same
concept, or synonyms). In the above example DOCTORS and HEALTHCARE ASSISTANTS might be
considered to be just entity instances of the entity type called STAFF; however, this is unlikely in the above
example because we know from our own expert domain knowledge that doctors are concerned with
prescribing whereas healthcare assistants are not. In addition the information we plan to collect is
specifically concerned with prescribing. We therefore consider the entity type STAFF to be redundant, at
least for the time being; possibly when the system is developed further such considerations can be re-
visited.
Irrelevant entity types If an entity type has little or nothing to do with the problem, it should possibly be left
out. In the above example one would need to clarify with the person requesting the system if they need
information stored about INSURANCE COMPANIES. Also if the system is initially only concerned with drug
treatments, is there any point collecting information about other treatments?
Vague entity types In a narrative description, often words are used indiscriminately. In the above example
the description states that “Initially the system will be concerned solely with drug treatment” yet the following
paragraphs refer to PATIENT TREATMENT and also TREATMENT. Are these three different entity types or
not? Again we can only find out by discussing this issue with the person who requested the database. We
would probably suggest that we have two entity types called DRUG TREATMENT and TREATMENT where
TREATMENT is concerned with information about any non-drug intervention and probably not included in
the initial ERD.
Entity types that are really attributes Often an initial entity type is an attribute. In the above, COST has
been classified as an entity type yet it is probably an attribute of TREATMENT or DRUG TREATMENT or
PATIENT (renaming it BILL).
Multiple roles become entity types "The name of a entity type should reflect its intrinsic nature and not a
role that it plays. For example, OWNER would be a poor name for a entity type in a car manufacturer's
database. What if a list of drivers is added latter? What about persons who lease cars? The proper class is
PERSON (or possibly CUSTOMER), which assumes various different roles, such as owner, driver, lessee."
(Blaha & Rumbaugh 2005 p 186).
Implementation information The ERD is concerned with defining the data that needs to be stored. It is not
concerned with the hardware or processing of the data. Therefore, in the above example SYSTEM,
SCREEN and USER can be ignored. Similarly RESULTS are an outcome of processing data. Perhaps
from your knowledge of Access or other databases, you will realise that such things as RESULTS and
reports are just a process of using a query or report within the database management system.
Another type of problem occurs resulting from the existence of homonyms. This is where two entity types with the
same name actually mean different things. For example the entity TREATMENT may mean very different things to
different healthcare professionals. In this instance it may be necessary to add one or more additional entity types to
express the different concepts more clearly (e.g. DRUG_TREATMENT and ART_THERAPY etc).
So what have we ended up with after all the above deliberations? The following is the revised list of entity types:
Patients, Doctors, Drug treatment, Drugs
By concentrating on the drug aspect of treatment and considering each of Rumbaugh‟s guidelines we have reduced
the list from 27 to 4 for a possible initial ERD.
Exercise 6. Identifying Entity Types from a Narrative
Time: 60 minutes
Look through the DopeHead scenario, found in the scenarios handout, and:
2. Using Rumbaugh‟s guidelines revise your list to identify possible entity types.
While the previous page was concerned with identifying entity types, we will now look at a few of the many
guidelines suggested by Reingruber & Gregory 1994 (p65-77) for what makes good entity types.
Entity type names:
Should be unique for the particular model. (This does not usually apply to attribute names which only need
to be unique to a particular entity type.)
Should be a singular noun (e.g. PATIENT not PATIENTS).
Should be self-explanatory to those reading the ERD.
Should follow any naming conventions locally defined by the modellers. For example, two common
constraints are that:
They should be written in UPPER CASE.
Possible spaces should be represented by the _ character (eg DRUG_TREATMENT
rather than DRUG TREATMENT).
Should not be a name of an individual object (e.g. Freeman Hospital Newcastle upon Tyne).
Should not express more than one concept (e.g. EQUIPMENT/BED).
Should be documented in the system design specification. The description for each entity type should be
clear, unambiguous and supplemented with examples.
When carrying out this process it is a good idea to draw up a table similar to the one below to make sure you carry
out the process, the initial proposed entity type name on the left and the final entity type name on the right.
Initial entity Unique Singular Self- UPPER The _ Not a Not more than Documented Final entity
type name: noun explanatory CASE character individual one concept type name
object
Patients
Doctors
Drug
treatment
Drugs
Time: 60 minutes
1. Looking back at the hospital example, consider the following four possible entities. Use the above
guidelines to edit them appropriately:
Patients
Doctors
Drug treatment
Drugs
2. Revisit the initial set of entity types you defined for the DopeHead scenario and edit them accordingly. It
may help to draw up a table similar to the one above.
To start with, people are often confused about the relationship between entity types and entity instances thinking
that an entity instance “is a kind of” entity type. An example will help to make this clearer. Someone might say that
for the entity type ANIMAL entity instances in this case might be BIRD or HUMAN. This is not usually the case. The
reason for this is because thinking about what information you might want to collect about BIRDs and HUMANs
would probably be different in several respects for each of them. Would you really want to collect information about
the size of wingspan for a HUMAN or the IQ of a particular bird? In other words the set of attributes for each is
different. If you are unsure of this go back to Exercise three on page 11
Unfortunately it is not always as simple as this. Here is the argument for keeping BIRD and HUMAN as entity
instances of HUMAN. Suppose you were an antique dealer and just wished to collect information about the
instances of ANIMAL statue you had. In this case you would probably just want to instance of ANIMAL statues you
had rather than animal specific, information in which the amount and type would vary for each. In this situation you
are happy about keeping the same type of information about each of the statues regardless of what they represent,
probably just species name would do for your attribute name. The context dictates what entity types you will
find.
4.3.5 Workshops
The alternative to using a paper document to partially derive a set of entity types is to hold a series of interactive
workshops. This requires commitment, including the willingness to learn, from the person(s) who have requested
the system (i.e. database). They need to understand to a limited extent what you are learning now!
Often a mixed approach can be taken, using a paper document as the starting point, to a workshop. Running
workshops requires good group working skills, and must be carried out in a non-threatening manner. Often the
modellers need to show considerable tact when people get hung up on what the modellers themselves know to be
irrelevant issues (e.g. font size/colour, type of screen or computer etc) for the particular task at hand.
Now that we have looked at how to create an initial list of entity types we will consider the actual mechanics of
drawing entities.
It must be remembered that database modelling, of which ERDs are a particular tool, is a discipline that has only
been in existence for about a quarter of a century. I was recently looking at a book on systems analysis written in
1984 (Daniels & Yeates) and only managed to find 2 of the 300 pages dedicated to databases, and within those two
pages no mention was made of ERD diagramming.
There are a large number of different ERD diagramming styles. And this document is not concerned with you
knowing any particular one intimately but rather being aware of the general style and meaning of them.
Learning to draw ERD's is usually a pen and paper exercise to begin with and I would strongly encourage you to
practice using this simple technique to begin with - be warned you'll also need a larger rubber! Drawing informal
ERDs and discussing them is a very good way of developing them in a measured and appropriate manner, flip
charts, or electronic white boards are excellent in this respect.
Other documents on my web site provide detailed tutorials on how to use specialised software (called case tools) to
draw ERDs. http://www.robin-beaumont.co.uk/virtualclassroom/contents.htm
1. Which of the following types of software (applications) is most suitable for developing ERDs?
4.4 Summary
Specifically we have gone through the following stages to come up with a list of initial entity types for both the
hospital and Dopehead scenarios and obviously we can also use this method to produce a list of candidate entity
types from any narrative description. The list below should remind you of the steps you went through:
1. Identifying nouns and drawing up a list
2. Removing:
a. Redundant entity types (e.g. synonyms)
b. Irrelevant entity types
c. Vague entity types
d. Entity types that are really attributes
e. Roles amalgamated into entity types
f. Implementation information
3. Adding entity types due to homonyms
4. Considering Reingruber & Gregory 1994‟s guidelines:
a. Initial entity type name
b. Unique
c. Singular noun
d. Self-explanatory
e. UPPER CASE
f. The _ character
g. Not an individual object
h. Not more than one concept
i. Documented
j. Final entity type name
We could have also used a more interactive approach with clients in a workshop environment, and developed
ERD's with them.
After a few MCQs we will move onto the second aspect of ERDs, the relationships. If you feel like one this would be
a good time to take a break.
Exercise 9. MCQs
1. Which one of the following best describes the technique used to identify entity types in a narrative?
a. Identification of verbs, the application of various criteria and standards (eg naming conventions etc) to
refine the list and then the creation of a list of appropriately named entity types
b. Identification of nouns then the application of ISO standards to create a list of appropriately named entity
types
c. Identification of verbs then the application of standards (eg naming conventions etc) to create a list of
appropriately named entity types
d. Identification of nouns, the application of various criteria (eg Reingruber & Gregory 1994‟s) and standards
(eg naming conventions etc) to refine the list and then the creation of a list of appropriately named entity
types
e. Identification of a few important nouns then the application of standards (eg naming conventions etc) to
create a list of appropriately named entity types
Space for you to make additional notes about Entity types and Instances:
5. Relationships?
At the beginning of this document I stated that ERDs consist of entity types and relationships. We have already
looked at entity types and will now move on to consider relationships. Just as with Entities you can consider
Relationships at the Type or Instance level but because I find it easier to think of them at the instance level I will
only discuss them at this level.
Relationships are shown on ERDs by lines, usually between entity types. A relationship is a method of linking
together entity types. Within an ERD, each entity type has one or more
relationships with a number of other entity types. A relationship is optionally
(see Reingruber & Gregory 1994 p174) given a name which is a verb
(action). For example:
DOCTOR has several PATIENT(s)
PATIENT visits a DOCTOR
The ERD opposite shows the two above relationships:
e5
e9
PATIENT MEDICAL_RECORD
Important:
Set notation metaphor Relations
are to do
with entity
Instances
e5
e9
Patient
The parent is the „one‟ side of the relationship while the child
surname forename date of birth id is represented on the „many‟ side. In this instance one parent
jones Amanda 19/12/1956 004
entity instance can be associated with possibly many child
entity instances. The „many‟ aspect is shown on an ERD
Oaks gale 03/05/1966 005 diagram by „crows feet‟. See below for an example.
kowal Andrew 11/07/1970 006
When you are drawing a one to many relationship you are
bull Chris 29/10/1055 007 also explicitly stating a parent child relationship. If you delete
.... .... ... ... a particular parent instance and it has child instances
associated with it they will more likely than not be deleted as
well. (This is technically called cascade delete). However you
Foreign key can specify a „restricted deletion‟. But then you could end up
with child instances that are no longer associated with a
parent instance. Lets take our Patent / medical_record
Medical_record
example again and this time assume that a PATIENT can now
id source date Patient_ID have multiple MEDICAL_RECORD(s). Suppose we would
9 GP 19/12/1956 005 now like our patient Oaks to have two medical records this is
easily achieved by linking multiple entity instances in the
10 St james 03/05/1966 005
hospital
Medical record entity to the appropriate patient instance. In
other words now two entity instances have the same foreign
11 GP 11/07/1970 ... key value.
12 Istanbul 29/10/1055 ...
5.7 Optionality
As with the one to one relationship, it is
possible to have either a mandatory one to
many („1:N‟) or optional one to many
(„0:N‟) relationship. Once again, if at all
possible, all optional relationships should
be investigated thoroughly to see if they
can be converted to mandatory
relationships. Reingruber & Gregory 1994
provide details.
Time: 60 minutes
Work through the first DeZign tutorial document. For details of how to obtain this tutorial see the first
section of this document - required resources.
2. Define and then draw the various relationships (you have identified the entity types in a previous
exercise!) that you have identified in the DopeHead scenario. You can do this either in pen and paper or
using the DeZign software.
1
Humulus viridens
2
Cynara cardunculus
3
4 Digitalis purpurea
5
Papaver paeoniforum
6
7 Alcea facifolia
8 Robinia pseudoacacia
9
Acer
10
11
12
2. Which of the following best describes an optional relationship (one correct answer)?
a. An optional relationship is one where a foreign key attribute can only take a specific value. For example, patient 001
may have a medical record 003 but no other.
b. An optional relationship is one where a foreign key attribute can take a null value. For example, a patient may or may
not have a medical record.
c. An optional relationship is one where a foreign key attribute can take any value except null. For example, a p atient
may have a medical record or a dummy record.
d. An optional relationship is one where a foreign key attribute may or may not exist. For example, a patient may have a
medical record.
e. An optional relationship is one where a foreign key attribute can take any value regardless of the values in the
associated primary key. For example, a patient may have a specific medical record, an undefined one or none at all
(depending upon your interpretation).
For a variety of reasons this is the best way of modelling the relationship. However, let‟s consider another way the
relationships may have been modelled.
A different modeller argues that while a particular DRUG_TREATMENT
has one or more DRUGs a particular DRUG is associated with one or
more DRUG_TREATMENTs. While this may seem more logical it should
be realised that now if we change the particular value in an entity
instance in DRUG we may (depending upon how many DRUG_TREATMENT instances it
is related to) be changing several DRUG_TREATMENT entity instances at the same time!
Why has the confusion occurred? The problem is the result of two things:
1. The modeller getting confused between entity types and entity instances. The modeller
is correct in thinking that a particular DRUG_TREATMENT has one or more DRUGs. S/he
is working at the instance level (good) and thinking about a particular DRUG_TREATMENT
for a particular patient. However, s/he then states that a particular DRUG is associated with
one or more DRUG_TREATMENTs. Here s/he has forgotten that s/he is dealing with instances.
While we are talking about a particular patient‟s particular drug treatment, s/he has now moved to the level of
thinking about all patients‟ relationships to drugs. In this case the following is correct, A PATIENT takes many
DRUG(s) and a DRUG is associated with many patients thus:
Where we are thinking of DRUG as just a name of the drug, for example
Mr Jones and Miss Smith take Penicillin. This is not the situation in the
scenario where the DRUG is associated with a particular PATIENT and
DRUG_TREATMENT DRUG here refers to a specific dose given to a
PATIENT. It makes no sense for more than one PATIENT to share a single dose of something!
One method of avoiding this confusion is to create fake instances with attributes and values. For example:
2. Another reason why the modeller may have become confused is concerned with the concept of the lookup table,
which we will discuss next.
One of the main problems with ERDs is the desire to introduce a multitude of relationships and entities when
thoughtfulness (along with a large amount of scrap paper!) can usually produce a model with much more clarity.
Here are some examples.
Unnecessary entities and relationships
Consider the following:
A COURIER delivers one or more SAMPLEs.
Each sample can have zero or more TESTs carried out
on it.
Each TEST produces zero or more FINDINGs.
Each sample is signed for by the
RECEIVING_TECHNICIAN.
The sample is then managed by a
SUPERVISOR_TECHNICIAN who also confirms any
FINDINGs.
The TESTs are carried out by JUNIOR_TECHNICIANs.
From the above information one possible ERD is shown
opposite. This all looks very complex. The entity types
COURIER, SAMPLE, TEST and FINDING seem sufficiently
different. However, how different are the
RECEIVING_TECHNICIAN, SUPERVISING_TECHNICIAN and
JUNIOR_TECHNICIAN? I even wonder that on a Sunday
afternoon they could in real life be the same person.
Obviously the modeller has tried to present the information that is collected at
each stage concerning the TECHNICIAN input, but I wonder if the following
might be just as suitable.
To indicate that a particular grade of technician should only carry out certain
tasks a small amount of text could be added to the ERD and it is important to
realise that a ERD can not express all the complexities you may wish to model.
Looking again at the ERD there still appears to be a large number of
relationships from the TECHNICIAN entity to various others. Do we need all
these? To help answer this question I would like to digress slightly and consider
two different aspects:
Independent versus sequential relationships
Multiple level relationships
Lines crossed
Entity
inappropriately
positioned
End of relationship
lines overlapping.
Which is mandatory
which is optional?
Optionality
The existence of an entity in a relationship is defined as either mandatory or optional. If an instance of an entity
must always occur for an entity to be included in a relationship, then it is mandatory. An example of mandatory
existence is the statement “Every project must be managed by a single department.”
Mandatory = must
If the instance of the entity is not required, it is optional. An example of optional existence is the statement
“Employees may be assigned to work on projects.”
Optional = maybe
The following exercise gives you some practice at working between narrative descriptions and ERDs.
Time: 60 minutes
These excellent examples have been taken from David McLean‟s (D.McLean@doc.mmu.ac.uk) home
pages. He is a senior lecturer in computing at Manchester Metropolitan University
(http://www.doc.mmu.ac.uk/online/SAD/T06/erdexes.htm no longer active June 2006).
This scenario describes the activities of the Science and Technology faculty in a university in northwest
England. Students take courses in the university. The courses are run by departments. It is possible for a
course to be run by more than one department. Each course consists of a number of units. Lecturers teach
the units, and in some circumstances more than one lecturer will teach a unit. Lecturers of course teach on
more than one unit. Since the introduction of 'modular' courses, it is possible for one unit to be included on
more than one course. A lecturer may be a course leader for a particular course.
A farmer wishes to keep computerised records on the milk and calf production of the dairy herd. All calves
produced are sold and not added to the dairy herd. Each cow has a name and date of birth, and will
produce milk for a lactation period after the birth of a calf or calves. Milk recordings for each cow in terms of
litres are taken each day. The information required for each pregnancy of a cow are the bull's name, date of
mating, date of birth of calf or calves and each calf's sex and birth weight. The system is to provide the
following information to the farmer:
• Milk yield of a cow over a particular pregnancy (note: a lactation period is associated with each
pregnancy)
Olde Worlde Homes (OWH) is a regional building company employing both permanent and contract staff.
The main activity of the company is building houses that recapture the feeling of bygone days on land it
acquires cheaply. All matters of policy (eg what land should be purchased) are determined by
management. Once the land has been purchased, the land is known as a 'site' and details of it such as
area, cost, vendor etc are kept so that management decisions can be assessed.
Several house are built on the site; the number and style are determined by the location, the land size,
target market and the physical assets of the land (eg trees or ponds). It follows from this that all houses are
built to an individual design ('architect designed' according to the sales literature). Information on each
house such as number of bedrooms, style, sale price etc is recorded, again so that management can
review their decision making and also monitor costs.
In order to maintain quality, only building materials which have been approved by the quality assurance
manager can be used to build the houses. Records are kept of the approved materials. All the materials
which are used to build a particular house are itemised and then purchased from the cheapest supplier.
The items (bricks, cement, kitchen units etc) are noted on a list which also contains details of the supplier
and cost. A list of suppliers containing nature of business, telephone number and address is maintained.
Once the houses have been built, they are marketed directly by OWH.
Houses are sold directly to the purchaser (or in some cases joint purchasers) without involving estate
agents. Details of purchasers are kept on file.
Borrowers take books out on loan. A borrower may take out up to five books at any one time. Several
copies of the same book are held for books which are continually in demand. Borrowers may make
reservations for titles which are out on loan.
Items which need to be stored include borrower number, borrower name, date borrowed, ISBN, acquisition
number (allocated when a book is purchased), date acquired, title, author, borrower making reservation,
and date reserved.
Draw an ERD to represent the library. Resolve the many-to-many relationships and suggest identifiers and
the main attributes for each entity.
1. All calves produced are sold and not added to the Initially we have a many to many relationship between
dairy herd.
Cow and Bull. This is easily resolved to Cow has zero or
2. Each cow has a name and date of birth, and will more MATING(s) and Bull has zero or more MATING(s).
produce milk for a lactation period after the birth of a All the other relationships appear to be of the one to many
calf or calves.
variety; each MATING can result in a pregnancy which will
3. Milk recordings for each cow in terms of litres are result in zero or more CALVES and CALFS. There will
taken each day. also be a LACTATION period.
4. The information required for each pregnancy of a cow
are the bull's name, date of mating, date of birth of
calf or calves and each calf's sex and birth weight.
5. The system is to provide the following information to
the farmer:
The diagram below shows the attributes that are described in the scenario:
David McLean provides the following possible
solution which is simpler than mine and probably
does the job as well. Quoting David, “by reading the
question carefully, the key features are calves, bulls,
milk yield and cows”:
D) Library scenario
Once again I have just reproduced David‟s solution below.
You now have a good grasp of the basics of ERDs. But before we finish all the practical aspects of ERDs I would
like to introduce the idea of relationships that occur within a single entity type.
8. Recursion in ERDs
Recursive relationships are also called unary or involuted relationships.
So far we have discussed examples of relationships between entity types; however, it is possible to have links
within an entity. Such a relationship is called recursive.
The usual example given is that of an EMPLOYEE entity type that has a relationship with itself called supervisor. In
other words an EMPLOYEE entity instance can relate to another EMPLOYEE entity instance in a supervisor role.
Clearly a „one to many‟ or „many to many‟ recursive relationship may also exist. (see
Carter 1995 p61 - 68, Elmasri & Navathe 1989 p. 49) Most writers suggest that these
recursive relationships are acceptable in early models but should be converted to „one
to many‟ or „one to one‟ relationships, as the model is refined. For example the above
recursive relationship could be modelled by introducing a new entity type called
ROLE.
Very briefly, when designing a database one aims for a highly normalised (basically structured) design but how you
achieve this is a matter of dispute. Each normal form represents a greater degree of data structure. This usually
involves breaking entity types up from those that have a large number of attributes to several with a smaller number
of attributes.
I have not discussed the process of normalisation in this document as there are two conflicting schools of thought
about how you go about developing these eventually highly normalised ERDs. One school of thought recommends
the method I have described in this document, you take a narrative and abstract the nouns etc and constantly refine
your model. A conflicting more formal approach suggests that you subject an initial model to a formal process of
normalisation to obtain the highly normalised ERD. If you wish to find out more about normalisation see
http://www.robin-beaumont.co.uk/virtualclassroom/chap7/s7/index.htm
Whichever method you use it should now be clear to you that developing ERDs is an iterative process, starting with
very rough models and gradually refining them. The topic of the next section firstly considers how this process has
been refined into a number of stages and then secondly where ERDs fit in the whole database development
process, but first a quick exercise.
Time: 10 minutes
My advice to you is to be like Reingruber & Gregory, and the vast majority of actual database designers, by
totally ignoring the arbitrary divisions. Just say you start off with a high level model and gradually add
detail until it is possible to map the model directly onto the DBMS of your choice.
The problems described above are related to the history of the entity concept. For details see
http://www.aisintl.com/case/method.html.
In this instance both models have the same number of entity types; however, this is not always the case as you will
have already realised with the situation concerning lookup tables. Notice also the addition of the foreign keys in the
physical model. You can also download the SQL script to make this into an Access database at
http://www.databaseanswers.com/data_models/msg_board_physical.htm.
a. A loosely defined set of terms, often used inappropriately, indicating the gradual progression from a high level
data model to one which provides all the detail required to implement it in a specific database system.
b. A clearly defined set of terms, often used inappropriately, indicating the gradual progression fr om a data model
containing attribute descriptions to one which provides all the detail required to implement it in a specific
database system.
c. A loosely defined set of terms, often used inappropriately, indicating the gradual progression from a data model
containing attribute descriptions to one which provides all the detail required to implement it in a specific
database system.
d. A clearly defined set of terms, often used inappropriately, indicating the gradual progression from a high level
data model to one which provides all the detail required to implement it in a Microsoft compliant database
system.
e. A clearly defined set of terms, often used inappropriately, indicating the gradual progression from a high level
data model to one which provides all the detail required to implement it in a referential database system.
Time: 30 minutes
1. Please visit some of the sites below. I do not want you to get any detailed knowledge but rather to
become aware of such methods and how they differ from that described by Carter above.
2. Add a few of your own links below or if you are working through this document as part of a web based
course why not share any good or bad links with your colleagues via the discussion board.
You can see a health centre data model, along with numerous others at:
http://www.databaseanswers.com/data_models/health_centre/index.htm
The NHS Data Dictionary & Manual (July 2006): http://www.connectingforhealth.nhs.uk/datastandards/
The NHS Healthcare model: http://www.virtualtravelog.net/projects/Ontology/CBS/hier.htm
An excellent article describing why high level data modelling approaches sometimes fail (specifically the above
models):
http://www.virtualtravelog.net/entries/2004/01/ontology_review_1_the_nhs_common_basic_specification_why_top_l
evel_ontologies_dont_work.html
UNITED KINGDOM: Max Jones‟ PhD thesis on „Formal generic Modelling‟ (no longer active July 2006)
EUROPE: The standards work of CEN/TC251
AUSTRALIA: http://www.aihw.gov.au/publications/index.cfm/title/9813
CANADA: http://secure.cihi.ca/cihiweb/splash.html
UNITED STATES:
CORBAMed, http://www.hipaanet.com/upin7-10.htm the healthcare domain taskforce of the Object Management
Group
HL7
The US Military Medical Technology site http://www.military-medical-technology.com/
One of my Msc students (Luisa Whitelaw) kindly suggested the following additional links on the module discussion
board:
http://support.esri.com/index.cfm?fa=downloads.dataModels.gateway
http://www.who.int/healthmetrics/library/south_africa_05apr.doc
http://www.nisc.co.za/databases
http://www.tama-sa.gov.za/terminology_sa_home.htm
http://www.doh.gov.za/nhis/docs/nchmis.htm
http://folk.uio.no/leopoldo/Publications/Papers/HCIS-1465JN_Finland.pdf
http://www.sita.co.za/documents/invitations/504_2006/RFB%20504EHR%20Strategy.doc
http://groups.csail.mit.edu/medg/courses/6872/2003/slides/lecture2-print.pdf
http://www.avert.org/safricastats.htm
http://www.tac.org.za/aidsstats.html (see links under section “further reading”)
http://iri.columbia.edu/outreach/training/course/bamako1999/html/slide/ZAMO.ppt#263,10,Proposed outputs/results
http://www.musc.edu/infoservices/lanvision/Regulations_Policies_Guidelines/ASTMEMRGuide.pdf
Exercise 19. Public Data Models
Time: 30 minutes
1. Please visit some of the sites listed above. I do not want you to get any detailed knowledge but rather to
become aware of such resources. Make sure you visit the NHS data dictionary and NHS Healthcare model
sites, but don‟t attempt to understand it all fully!
2. Add a few of your own links below or if you are working through this document as part of a web based
course why not share any good or bad links with your colleagues via the discussion board.
Argo UML http://www.gentleware.co Free - Open Source For Modelling UML Diagrams - but not for the faint-hearted. Has a good
now called m. export facility for diagrams including svg. Requires Java installed on the
Poseidon computer.
Case Studio CharonWare Free demo Download Looks impressive - if you try it, please let me have your comments. Here‟s
Maximum 6 entities), a “5-Star” Review from Dan Horn on 27th. January 2002.
then $149 or $299
CASEWise CASEWise Aimed at Business Process Modelling, but has links to DataArchitect.
DataArchitect Sybase $2,000 If you are buying for the company, buy this one.
DataArchitect is part of Power Designer, which is described below.
Database from Jo Janssens Free A French version is also available. I found this link did not work on October
st
Design Tool 21 , 2002, but if you try it and find it works, please let me know.
(DDT)
Data Design Chilli Source $99 downloaded or DDS supports the ERD Modelling and reverse engineering is being added.
Studio $189 on CD
Dezign Datanamic Offers a free download If you are buying for yourself, buy this one - now offers a free Tutorial.
or $139 to buy.
Designer Oracle $200 Available from the Oracle Technology Network (free to join), offering
unbeatable value at a price of $200
Enterprise Sparx Systems $99 UML Analysis & Design Tool and very affordable.
Architect
ER Creator Model Creator (Danish) About $100 Includes a Tutorial and a Trial version. Nice, easy to use, simple tool.
MagicDraw No Magic inc. Free educational versions of the Personal edition for educational
Varies
institutions
Power Sybase $2,000 If you are buying for the company, buy this one.
Designer The Data Modelling Tool in this composite product is called
“DataArchitect”.
Unfortunately, the design of the web site has changed recently and
information about Data Architect is difficult to find. But the product is
excellent, and I have found its powerful Reverse Engineering facilities to
be very useful.
QDesigner Quest. From the „Toad Company‟ - looks good and is well-recommended by at
Pricing difficult to find
least one satisfied user - a downloadable Trial is available(rather
on the Quest Web Site
substantial at 63MB).
Telelogic Formerly Popkin now $3,000-5,000 Up until early 2007 a student edition of System Architect
System Telelogic
Architect
was available on CD-Rom which cost around $34 (ISBN: 0-
2007 07-293278-3). Unfortunately Telelogic, the company that
have taken over Popkins, have no plans to develop the
student edition and are singularly unhelpful. The alternative
tool they recommend, Telelogic modeler at
http://www.ilogix.com/sublevel.aspx?id=1756 can be freely
downloaded, however this is a far inferior piece of software
compared to the student edition of system architect. After
registering you can still download an evaluation copy of
Telelogic System Architect at the Telelogic web site.
Toolkit for University of Twente, Free and Very interesting combination of Tools but not for the PC as runs under
Conceptual Holland Downloadable various Unix platforms
Modeling
(TCM)
Visio Microsoft Trial CD available Many versions - Standard ($199), Technical ($399), Professional ($399)
and Enterprise ($999)
Visual UML Visual Object Modelers Trial time limited Many versions
download
I agree with the anonymous author above about System Architect being the best if your company can afford it
however my note above detailing my dissatisfaction with Telelogic means that I would now recommend MagicDraw.
Exercise 20. CASE Tools
Time: 15 minutes
Visit the MagicDraw site listed above and then some of the others.
13. Exercises
It is very important to practice the skill of developing ERDs, and to help you I have provided a number of
more complex exercises than those you have encountered as you have worked through this document.
Go to my main website and download the “Scenarios for practicing modelling techniques” document:
http://www.robin-beaumont.co.uk/virtualclassroom/contents.htm
• List a number of attributes for each of the entities plus a description of each attribute
• List a set of constraints / assumptions - You do not have the luxury of being able to question the client?
Hints:
Remember to use your own „expert domain knowledge‟ where appropriate. But always state clearly any
assumptions / constraints you have made for each of the models.
By the time you have finished with the model you should have:
• Removed all recursive relationships as described in the page concerned with recursion.
14. MCQs
I have repeated all the MCQs together as a revision exercise for you.
3. From the list below choose two reasons why it is important for a „domain expert‟ such as you to learn about the
ERD method:
4. From the list below choose the one option that describes the most desirable „domain expert‟ from the medical
profession:
a. Someone who has developed several databases but knows little of database modelling or current issues
in medicine
b. Someone who has little interest in how information may help the department
c. Someone who has problems working in a collaborative environment
d. Someone who has previously managed IT projects
e. Someone who has knowledge of data modelling techniques and currently works in the appropriate
situation
5. Which of the following provides the best description of an entity type (select one)?
a. A specific concrete object with a defined set of processes (e.g. John Brown with diabetes)
b. A value given to a particular attribute (e.g. height - 230 cm)
c. A thing that we wish to collect data about where zero or more, possibly
real world examples of it may exist
d. A template for a group of things with the same set of characteristics that may exist in the real world
e. An undefined concept that needs further clarification
6. Which of the following provides the best description of an entity instance (select one)?
a. A specific concrete object with a defined set of processes (e.g. John Brown with diabetes)
b. A value given to a particular attribute (e.g. height - 230 cm)
c. A thing that we wish to collect data about where zero or more, possibly
real world examples of it may exist
d. A template for a group of things with the same set of characteristics that may exist in the real world
e. An undefined concept that needs further clarification
9. From the following, select the best list of attributes for the entity SHOE for someone working in a shoe shop
(select one):
a. 1, 2, 6
b. 1, 2, 6, 3
c. 1, 2
d. 1, 6
e. 1, 2, 3, 4, 5, 6
1=size
10. From the following, select the best list of attributes for the 2=number of lace holes
3=propulsion
entity RECIPE for someone following a recipe at home (select 4=cooking time
one): 5=number of portions
6=material colour
a. 4, 5, 7, 8 7=volume (loudness)
8=cooking Temperature
b. 4, 5
c. 4, 5, 6
d. 4, 5, 8
e. 4, 5, 6, 8
11. Which of the following best describes the technique used to identify entity types in a narrative:
a. Identification of verbs, the application of various criteria and standards (eg naming conventions etc) to
refine the list and then the creation of a list of appropriately named entity types
b. Identification of nouns then the application of ISO standards to create a list of appropriately named entity
types
c. Identification of verbs then the application of standards (eg naming conventions etc) to create a list of
appropriately named entity types
d. Identification of nouns, the application of various criteria (eg Reingruber & Gregory 1994‟s) and
standards (eg naming conventions etc) to refine the list and then the creation of a list of appropriately
named entity types
e. Identification of a few important nouns then the application of standards (eg naming conventions etc) to
create a list of appropriately named entity types
13. Which of the following types of software (applications) is most suitable for developing ERDs?
a. A relationship within an ERD is always a link between various entity instances. It is implemented within a
database by the use of a foreign key.
b. A relationship within an ERD is always a link between various entity instances. It is implemented within a
database by the use of a primary key.
c. A relationship within an ERD is always a link between two entity instances within an entity type. It is
implemented within a database by the use of a foreign key.
d. A relationship within an ERD is always a link between two entity instances within an entity type. It is
implemented within a database by the use of a primary key.
e. A relationship within an ERD is always a link between two fields within an entity instance. It is
implemented within a database by the use of a primary key.
a. An optional relationship is one where a foreign key attribute can only take a specific value. For example,
patient 001 may have a medical record 003 but no other.
b. An optional relationship is one where a foreign key attribute can take a null value. For example, a patient
may or may not have a medical record.
c. An optional relationship is one where a foreign key attribute can take any value exc ept null. For example,
a patient may have a medical record or a dummy record.
d. An optional relationship is one where a foreign key attribute may or may not exist. For example, a patient
may have a medical record.
e. An optional relationship is one where a foreign key attribute can take any value regardless of the values
in the associated primary key. For example, a patient may have a specific medical record, an undefined one
or none at all (depending upon your interpretation).
a. A mandatory relationship is also called external dependency. A mandatory one to one relationship
implies that one instance of entity A is always associated with one instance of B. A mandatory one to many
relationship implies that one instance of entity A is always associated with zero or more instances of entity
B.
b. A mandatory relationship is also called existence dependency. A mandatory one to one relationship
implies that one instance of entity A is always associated with one instance of B. A mandatory one to many
relationship implies that one instance of entity A is always associated with zero or more instances of entity
B.
c. A mandatory relationship is also called existence dependency. A mandatory one to one relationship
implies that one instance of entity A is always associated with one instance of B. A mandatory one to many
relationship implies that one instance of entity A is always associated with one or more instances of entity
B.
d. A mandatory relationship is also called external dependency. A mandatory one to one relationship
implies that one instance of entity A is associated with zero or one instance of B. A mandatory one to many
relationship implies that one instance of entity A is always associated with one or more instances of entity
B.
e. A mandatory relationship is also called existence dependency. A mandatory one to one relationship
implies that one instance of entity A is associated with zero or one instance of B. A mandatory one to many
relationship implies that one instance of entity A is always associated with zero or more instances of entity
B.
a. A loosely defined set of terms, often used inappropriately, indicating the gradual progression from a high
level data model to one which provides all the detail required to implement it in a specific database system
b. A clearly defined set of terms, often used inappropriately, indicating the gradual progression from a data
model containing attribute descriptions to one which provides all the detail required to implement it in a
specific database system
c. A loosely defined set of terms, often used inappropriately, indicating the gradual progression from a data
model containing attribute descriptions to one which provides all the detail required to implement it in a
specific database system
d. A clearly defined set of terms, often used inappropriately, indicating the gradual progression from a high
level data model to one which provides all the detail required to implement it in a Microsoft compliant
database system
e. A clearly defined set of terms, often used inappropriately, indicating the gradual progression from a high
level data model to one which provides all the detail required to implement it in a referential database
system
18. Which of the following are true statements about CASE tools (choose three)?
19. Which of the following are correct statements about recursion (choose two)?
15. Summary
This document has introduced you to a number of new concepts and provided you with the skills to use them
effectively to produce ERDs. You have seen how to define and refine a set of entity types for a given scenario. You
have also considered in detail the various types of relationship that can exist between entity types.
The CASE tool concept has been introduced to you by way of seeing what a number of suppliers have to offer.
Other documents provide practical exercises in using CASE tools.
I would now recommend that you return to the learning outcomes at the beginning of the document and see how
much you have learnt!
16. References
Carter John 2000 Database design & programming with Access, SQL and visual basic McGraw Hill
Carter John 1995 The Relational Database. Chapman and Hall [An excellent introductory book]
Date C J. 1995 (6th ed.) An introduction to database systems. Addison-Wesley.
Elmasri R Navathe S B 1989 Fundamentals of database systems. Benjamin Cummings Wokingham UK.
Everest G 1986 Database management. McGraw-Hill. London.
Finkelstein Clive 1989 An introduction to information engineering. Addision-Wesley. Wokingham UK.
Finkelstein Clive 1992 Information engineering. Addision-Wesley. Wokingham UK.
Hernandez M J 1997 Database design for mere Mortals. Addison - Wesley
Martin James 1981 An end user‟s guide to data base. Prentice Hall [ISBN 0-13-277129-2]
Reingruber Michael C. Gregory William W 1994 The Data Modelling Handbook John Wiley & Sons Chichester
Rumbaugh J Blaha M Premerlani W et al 1991 Object-Oriented Modelling and design. Prentice Hall.
Blaha M Rumbaugh 2005 (2nd ed.) Object-Oriented Modelling and design with UML. [basically this is the seocnd
edition of Rumbaugh et al 1991]Prentice Hall.
17. Links
University of Texas data modelling notes:
http://www.utexas.edu/cc/database/datamodeling/dm/objects.html
Links to database articles (applied information science web site):
http://www.aisintl.com/case/reading.html
Wikipedia has a good article about ERDs:
http://en.wikipedia.org/wiki/Entity-relationship_model
Additional ones for you to add:
Optionality
Cardinality
Document details:
C:\HIcourseweb new\chap11\s9\erds_1.doc
End of document