KEMBAR78
Cs501 intro | PDF
CS501: DATABASE SYSTEMS AND
DATA MINING
Introduction: Database Systems1
 Course Structure
 CS501 (3-0-0-6)
 Class Timings
 Mon: 04:00pm - 05:30pm
 Fri: 03:00pm – 04:30pm
 Slides, lecture materials, assignments will be
uploaded on time to time manner
2
SYLLABUS- DATABASE SYSTEMS
 Data models: entity-relationship, relational, network,
hierarchical, and logic data models, with the emphasis on
the relational model.
 Query languages: relational algebra, relational calculus,
SQL, QBE.
 Theory of database design: functional dependencies;
normal forms: 1NF, 2NF, 3NF, Boyce-Codd NF;
decompositions; normalization; multivalued dependencies,
join dependencies, 4NF, 5NF.
 Data storage and indexing: disks, files, file
organizations, indexes; tree structured indexing (ISAM, B-
trees), hash based indexing.
 Query processing: evaluation of relational operators,
query optimization; transaction management, Concurrency
control; error recovery; security.
 Case studies: ORACLE, Microsoft access etc. Introduction
to Open Database Connectivity, Client-Server environment
etc.
3
SYLLABUS-DATA MINING
 Types of data mining problems. The process of data mining.
 Statistical evaluation of big data: statistical prediction,
performance measures, pitfalls in data-mining evaluation.
 Data preparation: data models, data transformations,
handling of missing data, time-dependent data, textual
data.
 Data reduction: feature selection, principal components,
smoothing data, case subsampling.
 Predictive modeling: mathematical models, linear models,
neural nets, advanced statistical models, distance
solutions, logic solutions, decision trees, decision rules,
model combination.
 Solution analyses: graphical trend analyses, comparison of
methods.
 Case studies. Future trends: text mining, visualization,
distributed data. Practical sessions using open-source
software.
4
BOOKS
 A. Silberschatz, H. F. Korth and S. Sudarshan,
Database System Concepts, 6th Ed, McGraw Hill,
2011.
 J. Han, M. Kamber and J. Pei, Data Mining
Concepts and Techniques, 3rd Ed, Morgan
Kaufmann
5
EVALUATION POLICY
 CS501: Database Systems and Data Mining
 Assignment, Quiz, Attendance: 20%
 Mid Sem: 30%
 End Sem: 50%
 Attendance is compulsory
6
DATABASE
 A collection of interrelated data
 Usually designed to manage large bodies of
information
 Models real world enterprise
 Entities (e.g. student, courses)
 Relationships (e.g. students are enrolled to courses)
 A database management system (DBMS) is a
software package designed to store and manage
databases in a convenient and efficient way
7
DATABASE SYSTEM APPLICATIONS
 Some representative applications-
 Banking
 Airlines
 Universities
 Credit card transactions
 Telecommunications
 Finance
 Sales
 Manufacturing
 Human Resource
8
FILE SYSTEM VS DATABASE SYSTEM
9File
System
Database
System
FILE SYSTEMS VS DBMS
 Data redundancy and inconsistency
 Difficulty in accessing data
 Data isolation
 Integrity problem
 Atomicity problem
 Concurrent access anomalies
 Security and access control
10
WHY USE A DBMS
 Data independence and efficient access.
 Reduced application development time.
 Data integrity and security.
 Uniform data administration.
 Concurrent access, recovery from crashes.
11
DATA MODELS
 Underlying the structure of the database is the
data model
 It is a collection of tools for describing data, data
relationships, data semantics and consistency
constraints.
 The relational model of data is the most widely
used model today.
 Main concept: relation, basically a table with
rows and columns.
 Every relation has a schema, which describes the
columns, or fields.
12
 Example of customer relation
13
WHY STUDY DATABASES
 Shift from computation to information
 Datasets increasing in diversity and volume.
 DBMS encompasses most of CS applications
14
VIEW OF DATA
 A major purpose of database system is to provide
users with an abstract view of the data
 The data from the database must be retrieved
efficiently
 This need has led designers to use complex data
structures
 Since many users are not computer trained
 So developers hide the complexity from users through
several levels of abstraction
15
LEVELS OF ABSTRACTION
 Many views, single
conceptual (logical)
schema and physical
schema.
 Views describe how
users see the data.
 Conceptual schema
defines logical structure
 Physical schema
describes the files and
indexes used.
 Schemas are defined
using DDL; data is
modified/queried using
DML.
View1 View2 View3
Conceptual Schema
Physical Schema
16
EXAMPLE: UNIVERSITY DATABASE
 Physical schema:
 Relations stored as unordered files.
 Index on first column of Students.
 Conceptual schema:
 Students(sid: string, name: string, login: string, age:
integer, gpa:real)
 Courses(cid: string, cname:string, credits:integer)
 Enrolled(sid:string, cid:string, grade:string)
 External Schema (View):
 Course_info(cid:string,enrollment:integer)
17
DATA INDEPENDENCE
 Applications insulated from how data is
structured and stored.
 Logical data independence: Protection from
changes in logical structure of data.
 Physical data independence: Protection from
changes in physical structure of data.
 Data Independence is one of the most important
benefits of using a DBMS
18
INSTANCES AND SCHEMAS
 Instance of the database: the collection of
information stored in the database at a particular
moment
 Database schema: the overall design of the
database
19
DATA MODELS
 A data model is a collection of conceptual tools for
describing data, data relationships, data
semantics and consistency constraints.
 Relational Model
 Entity Relationship Model
 Object-Based Data Model
 Semistructured Data Model
 Older models
 Network
 Hierarchical
20
DATABASE USERS
Users are differentiated by the way they expect to
interact with the system
 Naive users – invoke one of the permanent
application programs that have been written
previously
 Application programmers – computer
professionals who interact with system through
application programs
 Sophisticated users – form requests in a
database query language
 Specialized users – write specialized database
applications that do not fit into the traditional
data processing framework 21
DATABASE ADMINISTRATOR
 Coordinates all the activities of the database
system
 Should have a good understanding of the enterprise’s
information resources and needs.
 Database administrator's duties include:
 Storage structure and access method definition
 Schema and physical organization modification
 Granting users authority to access the database
 Backing up data
 Monitoring performance and responding to changes
22
ENTITY RELATIONSHIP MODEL
 Widely used conceptual level data model
 proposed by Peter P Chen in 1970s
 Data model to describe the database system at
the requirements collection stage
 high level description.
 easy to understand for the enterprise managers.
 rigorous enough to be used for system building.
 Concepts available in the model
 entities and attributes of entities.
 relationships between entities.
 diagrammatic notation.
23
ENTITIES
 Entity Real-world object distinguishable from
other objects. An entity is described (in DB) using
a set of attributes.
 In the University database context, an individual
student, faculty member, a class room, a course are
entities.
 Entity Set or Entity Type-
 Collection of entities all having the same properties.
 Student entity set –collection of all student entities.
 Course entity set –collection of all course entities.
24
ATTRIBUTE
 Each entity is described by a set of
attributes/properties.
 Student entity
 StudName–name of the student.
 RollNumber–the roll number of the student.
 Sex–the gender of the student etc.
 All entities in an Entity set/type have the same
set of attributes.
25
TYPES OF ATTRIBUTES
 Simple Attributes
 having atomic or indivisible values.
 E.g. Dept–a string
 PhoneNumber–an eight digit number
 Composite Attributes
 having several components in the value.
 E.g. Qualification with components
 (DegreeName, Year, UniversityName)
 Derived Attributes
 Attribute value is dependent on some other attribute.
 E.g: Age depends on DateOfBirth. So age is a derived
attribute.
26
TYPES OF ATTRIBUTES (2)
 Single-valued
 having only one value rather than a set of values.
 E.g., PlaceOfBirth–single string value.
 Multi-valued
 having a set of values rather than a single value.
 E.g., CoursesEnrolled attribute for student
 EmailAddress attribute for student
 PreviousDegree attribute for student.
 Attributes can be:
 simple single-valued, simple multi-valued,
 composite single-valued or composite multi-valued.
27
DIAGRAMMATIC NOTATIONS
28
student
name
fname
mname
lname
sex
age
dob email
Admiss
ion_yr
Progra
m
Roll_N
o
DOMAINS OF ATTRIBUTES
 Each attribute takes values from a set called its
domain
 For example,
 StudentAge–{17,18, …, 55}
 HomeAddress–character strings of length 35
 Domain of composite attributes –
 cross product of domains of component attributes
 Domain of multi-valued attributes –
 set of subsets of values from the basic domain
29
ENTITY SETS AND KEY ATTRIBUTES
 Key–an attribute or a collection of attributes
whose value(s) uniquely identify an entity in the
entity set.
 For instance,
 RollNumber- Key for Student entity set
 EmpID- Key for Faculty entity set
 HostelName, RoomNo- Key for Student entity set
(assuming that each student gets to stay in a single
room)
 A key for an entity set may have more than one
attribute.
 An entity set may have more than one key.
 Determined by the designers 30
RELATIONSHIPS
 When two or more entities are associated with
each other, we have an instance of a
Relationship.
 E.g: student Ramesh enrolls in Discrete Mathematics
course
 Relationship Enrolls has Student and Course as the
participating entity sets.
 Formally, Enrolls ⊆ Student ×Course
 (s,c) ∈ enrolls ⇔ Student ‘s’ has enrolled in Course ‘c’
 Tuples in enrolls known as relationship instances
 Enrolls is called a relationship Type/Set.
31
DEGREE OF A RELATIONSHIP
 Degree: the number of participating entities.
 Degree 2: binary
 Degree 3: ternary
 Degree n: n-ary
Binary relationships are very common and widely used.
32
DIAGRAMMATIC NOTATION
33
A B
C
R
BINARY RELATION & CARDINALITY
34
E1 E2R
m n
The number of entities from E2 that an entity from E1 can possibly be
associated through R (and vice-versa) determines the cardinality ratio
of R.
Four possibilities-
One to one, one to many, many to one and many to many
PARTICIPATION CONSTRAINT
 An entity set may participate in a relation either
totally or partially.
 Total participation: Every entity in the set is
involved in some association (or tuple) of the
relationship.
 Partial participation: Not all entities in the set
are involved in association (or tuples) of the
relationship.
35
E1 E2R
total partial
STRUCTURAL CONSTRAINTS
 Cardinality Ratio and Participation Constraints are
together called Structural Constraints.
 They are called constraints as the data must satisfy
them to be consistent with the requirements.
 Min-Max notation: pair of numbers (m,n) placed on
the line connecting an entity to the relationship.
 m: the minimum number of times a particular entity
must appear in the relationship tuples at any point of
time
 0 –partial participation
 ≥1 –total participation
 n: similarly, the maximum number of times a
particular entity can appear in the relationship tuples
at any point of time 36
37
E1 E2R
(1,1) (0,n)
ATTRIBUTES FOR RELATIONSHIP TYPES
 Relationship types can also have attributes.
 Grade gives the letter grade (S,A,B, etc.) earned by
the student for a course.
 neither an attribute of student nor that of course.
38
Student CourseEnr
olls
m n
Grade
RECURSIVE RELATIONSHIP AND ROLE
NAME
 Recursive relationship: An entity set relating to
itself gives rise to a recursive relationship
 E.g., the relationship prereqOf is an example of a
recursive relationship on the entity Course
 Role Names –used to specify the exact role in
which the entity participates in the relationships
 Role Names are essential in case of recursive
relationships
39
Course prereqOf
prerequisite
course
Role Names
WEAK ENTITY SET
 Weak Entity Set: An entity set whose members owe
their existence to some entity in a strong entity set.
 Entities are not of independent existence.
 Each weak entity is associated with some entity of the
owner entity set through a special relationship.
 Weak entity set may not have a key attribute.
 The discriminator (or partial key) of a weak entity set is
the set of attributes that distinguishes among all the
entities of a weak entity set.
40
S WR
Always
total
Owner Entity Identifying
Relationship
Weak entity
WEAK ENTITY SET EXAMPLE
41
Loan PaymentLoan_
No
Amount
Loa
n_p
aym
ent
Payment_no
PayDa
te
Amount
EXTENDED ER FEATURES
 Basic ER concepts are used to model most
database features
 However, some features may be expressed more
aptly by using certain extensions to the basic ER
model
 Some of these features are
 Specialization
 Generalization
 Aggregation
42
SPECIALIZATION
 A top-down design process
 Designate subgroupings within an entity set that
are distinctive from other entities in the set
 These subgroupings become lower-level entity
sets that have attributes or participate in
relationships that do not apply to the higher-level
entity set
 Depicted by a triangle component labeled ISA
(E.g. customer “is a” person)
 Attribute inheritance – a lower-level entity set
inherits all the attributes and relationship
participation of the higher-level entity set to
which it is linked 43
GENERALIZATION
 A bottom-up design process – combine a
number of entity sets that share the same
features into a higher-level entity set.
 Specialization and generalization are simple
inversions of each other; they are represented in
an E-R diagram in the same way.
 The terms specialization and generalization are
used interchangeably.
 The ISA relationship also referred to as
superclass - subclass relationship
44
EXAMPLE OF SPECIALIZATION/ GENERALIZATION
45
CONSTRAINTS ON SPECIALIZATION/
GENERALIZATION
 Constraints on which entities can be members of
a given lower-level entity set
 Condition-defined:
 all customers over 65 years are members of senior-citizen
entity set; senior-citizen ISA person.
 User-defined
 An employee is assigned to a group after 3 months
 Not done automatically
 The user in charge of the dept. makes the assignment
46
CONSTRAINTS ON SPECIALIZATION/
GENERALIZATION (CONTD)
 Constraint on whether or not entities may belong
to more than one lower-level entity set within a
single generalization.
 Disjoint
 an entity can belong to only one lower-level entity set
 Noted in E-R diagram by writing disjoint next to the ISA
triangle
 Overlapping
 an entity can belong to more than one lower-level entity set
47
ISA
Disjoint
CONSTRAINTS ON SPECIALIZATION/
GENERALIZATION (CONTD.)
 Completeness constraint : specifies whether or
not an entity in the higher-level entity set must
belong to at least one of the lower-level entity
sets within a generalization.
 Total: an entity must belong to one of the lower-level
entity sets
 Partial: an entity need not belong to one of the
lower-level entity sets
 Partial generalization is the default
48
ISA
Total
generalization
AGGREGATION
 Consider the ternary relationship works_on,
 Suppose we want to record managers for tasks
performed by an employee at a branch
49
AGGREGATION (CONTD.)
 Relationship sets works_on and manages
represent overlapping information
 Every manages relationship corresponds to a
works_on relationship
 However, some works_on relationships may not
correspond to any manages relationships
 So we can’t discard the works_on relationship
50
AGGREGATION (CONTD.)
 Eliminate this redundancy via aggregation
 Treat relationship as an abstract entity
 Allows relationships between relationships
 Abstraction of relationship into new entity
 Without introducing redundancy, the following
diagram represents:
 An employee works on a particular job at a particular
branch
 An employee, branch, job combination may have an
associated manager
51
ER DIAGRAM WITH AGGREGATION
52
ER
DIAGRAM
NOTATIONS
53
54
ER DIAGRAM VS CLASS DIAGRAM
55
RELATIONAL MODEL
 Proposed by Edgar. F. Codd(1923-2003) in the
early seventies. [ Turing Award –1981 ]
 Most of the modern DBMS are relational
 Simple and elegant model with a mathematical
basis
 Led to the development of a theory of data
dependencies and database design.
 Relational algebra operations –
 crucial role in query optimization and execution.
 Laid the foundation for the development of
 Tuple relational calculus and then
 Database standard SQL 56
STRUCTURE OF RELATIONAL DATABASES
 Consists of a collection of tables
 Row in a table represents a relationship among a
set of values
 Thus a table is a collection of relationships
 In relational model table is also referred by
relation
 Tuple is a sequence of values
 In relational model, a tuple corresponds to a row
in a table
57
RELATIONAL SCHEMA
 Consists of relation name, and a set of attributes
or field names or column names. Each attribute
has an associated domain.
 Example:
 student ( studentName: string,
rollNumber: string,
phoneNumber: integer,
yearOfAdmission:integer,
branchOfStudy :string )
 Domain–set of atomic(or indivisible) values –data type
Relation
name
Attribute
name
Domain
58
RELATION INSTANCE
 A finite set of tuples constitute a relation
instance.
 A tuple of relation with schema R = (A1, A2, …,
Am) is an ordered sequence of values (v1,v2, ... ,vm)
such that vi∈ domain (Ai), 1≤i ≤m
Roll_no Name yearOfAdmin branchOfSt
udy
10CS001 Rajesh 2010 CSE
09CS020 Kiran 2009 CSE
09EE011 Ravi 2009 EE
59
KEYS
 Key: should have a capability of uniquely
identifying a tuple in a relation
 Superkey: a set of one or more attributes that
taken collectively allow us to identify uniquely a
tuple in the relation
 Example: {customer_name, customer_street} and
{customer_name}
are both superkeys of Customer, if no two customers can
possibly have the same name
60
KEY (CONTD.)
 K is a candidate key if K is minimal
Example: {customer_name} is a candidate key for
Customer, since it is a superkey and no subset of
it is a superkey.
 Primary key: a candidate key chosen as the
principal means of identifying tuples within a
relation
 Should choose an attribute whose value never, or
very rarely, changes.
 E.g. email_address is unique, but may change
however empid rarely changes 61
KEYS (CONTD.)
 Foreign Key: The attribute that corresponds to
the primary key of another relation.
 E.g. customer_name and account_number attributes
of depositor are foreign keys to customer and account
respectively.
 Only values occurring in the primary key attribute of
the referenced relation may occur in the foreign
key attribute of the referencing relation.
62
FOREIGN KEY (CONTD.)
 It is possible for a foreign key in a relation to
refer to the primary key of the relation itself
 An Example:
 Employee ( empNo, name, sex, salary, dept, reportsTo)
 reportsTo is a foreign key referring to empNo of the
same relation
63
BANK EXAMPLE
64

Cs501 intro

  • 1.
    CS501: DATABASE SYSTEMSAND DATA MINING Introduction: Database Systems1
  • 2.
     Course Structure CS501 (3-0-0-6)  Class Timings  Mon: 04:00pm - 05:30pm  Fri: 03:00pm – 04:30pm  Slides, lecture materials, assignments will be uploaded on time to time manner 2
  • 3.
    SYLLABUS- DATABASE SYSTEMS Data models: entity-relationship, relational, network, hierarchical, and logic data models, with the emphasis on the relational model.  Query languages: relational algebra, relational calculus, SQL, QBE.  Theory of database design: functional dependencies; normal forms: 1NF, 2NF, 3NF, Boyce-Codd NF; decompositions; normalization; multivalued dependencies, join dependencies, 4NF, 5NF.  Data storage and indexing: disks, files, file organizations, indexes; tree structured indexing (ISAM, B- trees), hash based indexing.  Query processing: evaluation of relational operators, query optimization; transaction management, Concurrency control; error recovery; security.  Case studies: ORACLE, Microsoft access etc. Introduction to Open Database Connectivity, Client-Server environment etc. 3
  • 4.
    SYLLABUS-DATA MINING  Typesof data mining problems. The process of data mining.  Statistical evaluation of big data: statistical prediction, performance measures, pitfalls in data-mining evaluation.  Data preparation: data models, data transformations, handling of missing data, time-dependent data, textual data.  Data reduction: feature selection, principal components, smoothing data, case subsampling.  Predictive modeling: mathematical models, linear models, neural nets, advanced statistical models, distance solutions, logic solutions, decision trees, decision rules, model combination.  Solution analyses: graphical trend analyses, comparison of methods.  Case studies. Future trends: text mining, visualization, distributed data. Practical sessions using open-source software. 4
  • 5.
    BOOKS  A. Silberschatz,H. F. Korth and S. Sudarshan, Database System Concepts, 6th Ed, McGraw Hill, 2011.  J. Han, M. Kamber and J. Pei, Data Mining Concepts and Techniques, 3rd Ed, Morgan Kaufmann 5
  • 6.
    EVALUATION POLICY  CS501:Database Systems and Data Mining  Assignment, Quiz, Attendance: 20%  Mid Sem: 30%  End Sem: 50%  Attendance is compulsory 6
  • 7.
    DATABASE  A collectionof interrelated data  Usually designed to manage large bodies of information  Models real world enterprise  Entities (e.g. student, courses)  Relationships (e.g. students are enrolled to courses)  A database management system (DBMS) is a software package designed to store and manage databases in a convenient and efficient way 7
  • 8.
    DATABASE SYSTEM APPLICATIONS Some representative applications-  Banking  Airlines  Universities  Credit card transactions  Telecommunications  Finance  Sales  Manufacturing  Human Resource 8
  • 9.
    FILE SYSTEM VSDATABASE SYSTEM 9File System Database System
  • 10.
    FILE SYSTEMS VSDBMS  Data redundancy and inconsistency  Difficulty in accessing data  Data isolation  Integrity problem  Atomicity problem  Concurrent access anomalies  Security and access control 10
  • 11.
    WHY USE ADBMS  Data independence and efficient access.  Reduced application development time.  Data integrity and security.  Uniform data administration.  Concurrent access, recovery from crashes. 11
  • 12.
    DATA MODELS  Underlyingthe structure of the database is the data model  It is a collection of tools for describing data, data relationships, data semantics and consistency constraints.  The relational model of data is the most widely used model today.  Main concept: relation, basically a table with rows and columns.  Every relation has a schema, which describes the columns, or fields. 12
  • 13.
     Example ofcustomer relation 13
  • 14.
    WHY STUDY DATABASES Shift from computation to information  Datasets increasing in diversity and volume.  DBMS encompasses most of CS applications 14
  • 15.
    VIEW OF DATA A major purpose of database system is to provide users with an abstract view of the data  The data from the database must be retrieved efficiently  This need has led designers to use complex data structures  Since many users are not computer trained  So developers hide the complexity from users through several levels of abstraction 15
  • 16.
    LEVELS OF ABSTRACTION Many views, single conceptual (logical) schema and physical schema.  Views describe how users see the data.  Conceptual schema defines logical structure  Physical schema describes the files and indexes used.  Schemas are defined using DDL; data is modified/queried using DML. View1 View2 View3 Conceptual Schema Physical Schema 16
  • 17.
    EXAMPLE: UNIVERSITY DATABASE Physical schema:  Relations stored as unordered files.  Index on first column of Students.  Conceptual schema:  Students(sid: string, name: string, login: string, age: integer, gpa:real)  Courses(cid: string, cname:string, credits:integer)  Enrolled(sid:string, cid:string, grade:string)  External Schema (View):  Course_info(cid:string,enrollment:integer) 17
  • 18.
    DATA INDEPENDENCE  Applicationsinsulated from how data is structured and stored.  Logical data independence: Protection from changes in logical structure of data.  Physical data independence: Protection from changes in physical structure of data.  Data Independence is one of the most important benefits of using a DBMS 18
  • 19.
    INSTANCES AND SCHEMAS Instance of the database: the collection of information stored in the database at a particular moment  Database schema: the overall design of the database 19
  • 20.
    DATA MODELS  Adata model is a collection of conceptual tools for describing data, data relationships, data semantics and consistency constraints.  Relational Model  Entity Relationship Model  Object-Based Data Model  Semistructured Data Model  Older models  Network  Hierarchical 20
  • 21.
    DATABASE USERS Users aredifferentiated by the way they expect to interact with the system  Naive users – invoke one of the permanent application programs that have been written previously  Application programmers – computer professionals who interact with system through application programs  Sophisticated users – form requests in a database query language  Specialized users – write specialized database applications that do not fit into the traditional data processing framework 21
  • 22.
    DATABASE ADMINISTRATOR  Coordinatesall the activities of the database system  Should have a good understanding of the enterprise’s information resources and needs.  Database administrator's duties include:  Storage structure and access method definition  Schema and physical organization modification  Granting users authority to access the database  Backing up data  Monitoring performance and responding to changes 22
  • 23.
    ENTITY RELATIONSHIP MODEL Widely used conceptual level data model  proposed by Peter P Chen in 1970s  Data model to describe the database system at the requirements collection stage  high level description.  easy to understand for the enterprise managers.  rigorous enough to be used for system building.  Concepts available in the model  entities and attributes of entities.  relationships between entities.  diagrammatic notation. 23
  • 24.
    ENTITIES  Entity Real-worldobject distinguishable from other objects. An entity is described (in DB) using a set of attributes.  In the University database context, an individual student, faculty member, a class room, a course are entities.  Entity Set or Entity Type-  Collection of entities all having the same properties.  Student entity set –collection of all student entities.  Course entity set –collection of all course entities. 24
  • 25.
    ATTRIBUTE  Each entityis described by a set of attributes/properties.  Student entity  StudName–name of the student.  RollNumber–the roll number of the student.  Sex–the gender of the student etc.  All entities in an Entity set/type have the same set of attributes. 25
  • 26.
    TYPES OF ATTRIBUTES Simple Attributes  having atomic or indivisible values.  E.g. Dept–a string  PhoneNumber–an eight digit number  Composite Attributes  having several components in the value.  E.g. Qualification with components  (DegreeName, Year, UniversityName)  Derived Attributes  Attribute value is dependent on some other attribute.  E.g: Age depends on DateOfBirth. So age is a derived attribute. 26
  • 27.
    TYPES OF ATTRIBUTES(2)  Single-valued  having only one value rather than a set of values.  E.g., PlaceOfBirth–single string value.  Multi-valued  having a set of values rather than a single value.  E.g., CoursesEnrolled attribute for student  EmailAddress attribute for student  PreviousDegree attribute for student.  Attributes can be:  simple single-valued, simple multi-valued,  composite single-valued or composite multi-valued. 27
  • 28.
  • 29.
    DOMAINS OF ATTRIBUTES Each attribute takes values from a set called its domain  For example,  StudentAge–{17,18, …, 55}  HomeAddress–character strings of length 35  Domain of composite attributes –  cross product of domains of component attributes  Domain of multi-valued attributes –  set of subsets of values from the basic domain 29
  • 30.
    ENTITY SETS ANDKEY ATTRIBUTES  Key–an attribute or a collection of attributes whose value(s) uniquely identify an entity in the entity set.  For instance,  RollNumber- Key for Student entity set  EmpID- Key for Faculty entity set  HostelName, RoomNo- Key for Student entity set (assuming that each student gets to stay in a single room)  A key for an entity set may have more than one attribute.  An entity set may have more than one key.  Determined by the designers 30
  • 31.
    RELATIONSHIPS  When twoor more entities are associated with each other, we have an instance of a Relationship.  E.g: student Ramesh enrolls in Discrete Mathematics course  Relationship Enrolls has Student and Course as the participating entity sets.  Formally, Enrolls ⊆ Student ×Course  (s,c) ∈ enrolls ⇔ Student ‘s’ has enrolled in Course ‘c’  Tuples in enrolls known as relationship instances  Enrolls is called a relationship Type/Set. 31
  • 32.
    DEGREE OF ARELATIONSHIP  Degree: the number of participating entities.  Degree 2: binary  Degree 3: ternary  Degree n: n-ary Binary relationships are very common and widely used. 32
  • 33.
  • 34.
    BINARY RELATION &CARDINALITY 34 E1 E2R m n The number of entities from E2 that an entity from E1 can possibly be associated through R (and vice-versa) determines the cardinality ratio of R. Four possibilities- One to one, one to many, many to one and many to many
  • 35.
    PARTICIPATION CONSTRAINT  Anentity set may participate in a relation either totally or partially.  Total participation: Every entity in the set is involved in some association (or tuple) of the relationship.  Partial participation: Not all entities in the set are involved in association (or tuples) of the relationship. 35 E1 E2R total partial
  • 36.
    STRUCTURAL CONSTRAINTS  CardinalityRatio and Participation Constraints are together called Structural Constraints.  They are called constraints as the data must satisfy them to be consistent with the requirements.  Min-Max notation: pair of numbers (m,n) placed on the line connecting an entity to the relationship.  m: the minimum number of times a particular entity must appear in the relationship tuples at any point of time  0 –partial participation  ≥1 –total participation  n: similarly, the maximum number of times a particular entity can appear in the relationship tuples at any point of time 36
  • 37.
  • 38.
    ATTRIBUTES FOR RELATIONSHIPTYPES  Relationship types can also have attributes.  Grade gives the letter grade (S,A,B, etc.) earned by the student for a course.  neither an attribute of student nor that of course. 38 Student CourseEnr olls m n Grade
  • 39.
    RECURSIVE RELATIONSHIP ANDROLE NAME  Recursive relationship: An entity set relating to itself gives rise to a recursive relationship  E.g., the relationship prereqOf is an example of a recursive relationship on the entity Course  Role Names –used to specify the exact role in which the entity participates in the relationships  Role Names are essential in case of recursive relationships 39 Course prereqOf prerequisite course Role Names
  • 40.
    WEAK ENTITY SET Weak Entity Set: An entity set whose members owe their existence to some entity in a strong entity set.  Entities are not of independent existence.  Each weak entity is associated with some entity of the owner entity set through a special relationship.  Weak entity set may not have a key attribute.  The discriminator (or partial key) of a weak entity set is the set of attributes that distinguishes among all the entities of a weak entity set. 40 S WR Always total Owner Entity Identifying Relationship Weak entity
  • 41.
    WEAK ENTITY SETEXAMPLE 41 Loan PaymentLoan_ No Amount Loa n_p aym ent Payment_no PayDa te Amount
  • 42.
    EXTENDED ER FEATURES Basic ER concepts are used to model most database features  However, some features may be expressed more aptly by using certain extensions to the basic ER model  Some of these features are  Specialization  Generalization  Aggregation 42
  • 43.
    SPECIALIZATION  A top-downdesign process  Designate subgroupings within an entity set that are distinctive from other entities in the set  These subgroupings become lower-level entity sets that have attributes or participate in relationships that do not apply to the higher-level entity set  Depicted by a triangle component labeled ISA (E.g. customer “is a” person)  Attribute inheritance – a lower-level entity set inherits all the attributes and relationship participation of the higher-level entity set to which it is linked 43
  • 44.
    GENERALIZATION  A bottom-updesign process – combine a number of entity sets that share the same features into a higher-level entity set.  Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram in the same way.  The terms specialization and generalization are used interchangeably.  The ISA relationship also referred to as superclass - subclass relationship 44
  • 45.
    EXAMPLE OF SPECIALIZATION/GENERALIZATION 45
  • 46.
    CONSTRAINTS ON SPECIALIZATION/ GENERALIZATION Constraints on which entities can be members of a given lower-level entity set  Condition-defined:  all customers over 65 years are members of senior-citizen entity set; senior-citizen ISA person.  User-defined  An employee is assigned to a group after 3 months  Not done automatically  The user in charge of the dept. makes the assignment 46
  • 47.
    CONSTRAINTS ON SPECIALIZATION/ GENERALIZATION(CONTD)  Constraint on whether or not entities may belong to more than one lower-level entity set within a single generalization.  Disjoint  an entity can belong to only one lower-level entity set  Noted in E-R diagram by writing disjoint next to the ISA triangle  Overlapping  an entity can belong to more than one lower-level entity set 47 ISA Disjoint
  • 48.
    CONSTRAINTS ON SPECIALIZATION/ GENERALIZATION(CONTD.)  Completeness constraint : specifies whether or not an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within a generalization.  Total: an entity must belong to one of the lower-level entity sets  Partial: an entity need not belong to one of the lower-level entity sets  Partial generalization is the default 48 ISA Total generalization
  • 49.
    AGGREGATION  Consider theternary relationship works_on,  Suppose we want to record managers for tasks performed by an employee at a branch 49
  • 50.
    AGGREGATION (CONTD.)  Relationshipsets works_on and manages represent overlapping information  Every manages relationship corresponds to a works_on relationship  However, some works_on relationships may not correspond to any manages relationships  So we can’t discard the works_on relationship 50
  • 51.
    AGGREGATION (CONTD.)  Eliminatethis redundancy via aggregation  Treat relationship as an abstract entity  Allows relationships between relationships  Abstraction of relationship into new entity  Without introducing redundancy, the following diagram represents:  An employee works on a particular job at a particular branch  An employee, branch, job combination may have an associated manager 51
  • 52.
    ER DIAGRAM WITHAGGREGATION 52
  • 53.
  • 54.
  • 55.
    ER DIAGRAM VSCLASS DIAGRAM 55
  • 56.
    RELATIONAL MODEL  Proposedby Edgar. F. Codd(1923-2003) in the early seventies. [ Turing Award –1981 ]  Most of the modern DBMS are relational  Simple and elegant model with a mathematical basis  Led to the development of a theory of data dependencies and database design.  Relational algebra operations –  crucial role in query optimization and execution.  Laid the foundation for the development of  Tuple relational calculus and then  Database standard SQL 56
  • 57.
    STRUCTURE OF RELATIONALDATABASES  Consists of a collection of tables  Row in a table represents a relationship among a set of values  Thus a table is a collection of relationships  In relational model table is also referred by relation  Tuple is a sequence of values  In relational model, a tuple corresponds to a row in a table 57
  • 58.
    RELATIONAL SCHEMA  Consistsof relation name, and a set of attributes or field names or column names. Each attribute has an associated domain.  Example:  student ( studentName: string, rollNumber: string, phoneNumber: integer, yearOfAdmission:integer, branchOfStudy :string )  Domain–set of atomic(or indivisible) values –data type Relation name Attribute name Domain 58
  • 59.
    RELATION INSTANCE  Afinite set of tuples constitute a relation instance.  A tuple of relation with schema R = (A1, A2, …, Am) is an ordered sequence of values (v1,v2, ... ,vm) such that vi∈ domain (Ai), 1≤i ≤m Roll_no Name yearOfAdmin branchOfSt udy 10CS001 Rajesh 2010 CSE 09CS020 Kiran 2009 CSE 09EE011 Ravi 2009 EE 59
  • 60.
    KEYS  Key: shouldhave a capability of uniquely identifying a tuple in a relation  Superkey: a set of one or more attributes that taken collectively allow us to identify uniquely a tuple in the relation  Example: {customer_name, customer_street} and {customer_name} are both superkeys of Customer, if no two customers can possibly have the same name 60
  • 61.
    KEY (CONTD.)  Kis a candidate key if K is minimal Example: {customer_name} is a candidate key for Customer, since it is a superkey and no subset of it is a superkey.  Primary key: a candidate key chosen as the principal means of identifying tuples within a relation  Should choose an attribute whose value never, or very rarely, changes.  E.g. email_address is unique, but may change however empid rarely changes 61
  • 62.
    KEYS (CONTD.)  ForeignKey: The attribute that corresponds to the primary key of another relation.  E.g. customer_name and account_number attributes of depositor are foreign keys to customer and account respectively.  Only values occurring in the primary key attribute of the referenced relation may occur in the foreign key attribute of the referencing relation. 62
  • 63.
    FOREIGN KEY (CONTD.) It is possible for a foreign key in a relation to refer to the primary key of the relation itself  An Example:  Employee ( empNo, name, sex, salary, dept, reportsTo)  reportsTo is a foreign key referring to empNo of the same relation 63
  • 64.