KEMBAR78
DBMS Unit9 Lecture1 | PDF | Databases | Data Warehouse
0% found this document useful (0 votes)
20 views67 pages

DBMS Unit9 Lecture1

The document covers advanced database concepts, focusing on object-oriented and distributed database models, including their properties and applications. It discusses the evolution from relational models to object-oriented and object-relational models, highlighting their features and benefits. Additionally, it explains distributed databases, their architectures, fragmentation, replication, and advantages over centralized systems.

Uploaded by

cocid2080
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
20 views67 pages

DBMS Unit9 Lecture1

The document covers advanced database concepts, focusing on object-oriented and distributed database models, including their properties and applications. It discusses the evolution from relational models to object-oriented and object-relational models, highlighting their features and benefits. Additionally, it explains distributed databases, their architectures, fragmentation, replication, and advantages over centralized systems.

Uploaded by

cocid2080
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 67

Database Management System

Unit 9: Advanced Database


Concepts

Lecture 1
Unit 9: Advanced Database Concepts
9.1 Concept of Objet-Oriented and Distributed Database Model
9.2 Properties of Parallel and Distributed Databases
9.3 Concept of Data warehouse Database
9.4 Concept of Spatial Database
STONEBRAKER’S APPLICATION MATRIX

No Query Query

Complex Data
OODBMS ORDBMS

Simple Data
File System RDBMS

Dr. Stonebraker claims that it is a matrix for classifying DBMS


applications. Beautiful in its simplicity, the matrix has become
part of the database culture and "knowledge.“
Revised : STONEBRAKER’S DBMS MATRIX

• The primary error in Stonebraker's DBMS Matrix is the apparent


assumption that ODBMSs do not have a query capability
MOTIVATION
• Relational model (70’s):
• Clean andsimple.
• Great for administrative and transactional data.
• Not as good for other kinds of complex data (e.g., multimedia,
networks,CAD).

• Object-Oriented models(80’s):
• Complicated, but some influential ideas from Object Oriented
• Complex data types.

• Idea: BuildDBMS based on OOmodel.


Programming languages have evolved from Procedural to
Object Oriented. So why not DBMSs ???
RELATIONAL MODEL
• Relations are the key concept, everything else isaround relations

• Primitivedata types, e.g., strings,integer, date, etc.

• Great normalization, query optimization, and theory

• Whatismissing??
• Handling of complexobjects
• Handling of complex data types
• Codeisnot coupled with data
• No inherence, encapsulation,etc.
RELATIONAL MODEL OF A ‘CAT’
Relational database of a cat:

At query time, try toput


things together as you
want !!!!
OBJECT ORIENTED MODEL OF A ‘CAT’

Object-oriented database of a cat: The first areas where ODBMS were widely used were:
! CASE :Computer aided software engineering

! CAD :Computer aided design

! CAM :Computer aided manufacture

Increasingly now used in:


! telecommunications

! healthcare

! finance

! multimedia

! text/document/quality management
TWO APPROACHES

• Object-Oriented Model (OODBMS)


• Pure OO concepts

• Object-Relational Model (ORDBMS)


• Extended relational model with OO concepts
FIRST APPROACH: OBJECT-ORIENTED
MODEL
• Relations are not the central concept, classes and objects are
the main concept

• Object-Oriented DBMS(OODBMS) are DBMS based on an Object-


Oriented Data Model inspired by OO programming languages

• Main Features:
• Powerful type system
• Classes
• Object Identity
• Inheritance

• OODBMS are capable of storing complex objects, I.e., objects


that are composed of other objects, and/or multi-valued
attributes.
FEATURE 1: POWERFUL TYPE SYSTEM

• Primitive types
• Integer, string, date, Boolean, float,etc.

• Structure type
• Attribute can be a record with a schema
Struct {integer x, string y}
• Collection type
• Attribute can be a Set, Bag, List, Array of other types

• Reference type
• Attribute can be a Pointer to another object
FEATURE 2: CLASSES
• A ‘class’ is in replacement of ‘relation’

• Same concept as in OO programming languages


• All objects belonging to a same class share the same
properties and behavior

• An ‘object’ can be thought of as ‘tuple’ (but richer


content)

• Classes encapsulate data +methods +relationships


• Unlike relations that contain data only

• In OODBMSs objects are persistency (unlikeOO


programming languages)
FEATURE 3: OBJECT IDENTITY
• OID is a unique identity of each object regardless of
its content
• Even if all attributes are the same, still objects have different
OIDs

• Easier for references

• An object is made of twothings:


• State: attributes (name, address, birthDate of a person)
• Behaviour: operations (age of a person is computed from
birthDate and currentdate)
FEATURE 4: INHERITANCE

• A class can be defined in terms of


Person

another one. name: {firstName: string,


middleName: string,
lastName: string}
address: string
birthDate: date
• Person is super-class and Student age(): Integer

is sub-class.
changeAddress(newAdd: string)

• Student class inherits attributes Student


and operations of Person. regNum: string {PK}
major: string

register(C: Course): boolean


STANDARDS FOR OBJECT-ORIENTED
MODEL
• ODMG: Object Data Management Group (1991)
• provide a standard where previously there was none
• support portability between products
• standardize model, querying and programming issues

• Language of specifying the structure of object


database
• ODL: Object Definition Language
• OQL: Object Query Language

• ODL is somehow similar to DDL (Data Definition


Language) in SQL
SECOND APPROACH: OBJECT-
RELATIONAL MODEL
• Object-oriented model tries to bring the main
concepts from relational model to the OO domain
• The heart is OO concepts with some extensions

• Object-relational model tries to bring the main


concepts from the OO domain to the relational
model
• The heart is the relational model with some extensions
• Extensions through user-defined types
CONCEPTUAL VIEW OF OBJECT-
RELATIONAL MODEL
• Relation is still the fundamental structure

• Relational model extended with the following features


• Type system with primitive and structure types (UDT)
• Including set, bag, array, list collection types
• Including structures like records
• Methods
• Special operations can be defined over the user-defined types
(UDT)
• Specialized operators for complex types, e.g., images, multimedia,
etc.
• Identifiers for tuples
• Unique identifiers even for identical tuples
• References
• Several ways for references and de-references
CONCEPTUAL VIEW OF OBJECT-
RELATIONAL MODEL

• Allow of nestedrelations

• Repeating movies inside the stars


records is redundancy

Star(name, address(street, city), birthdate, • To avoid redundancy, usepointers


movies(title, year, length)) (references)

Movie

Star 37
SUPPORT FROM VENDORS
• Several major software companies including IBM,
Informix, Microsoft, Oracle, and Sybase have all
released object-relational versions of their products

• Extended SQL standards called SQL-99 or SQL3


Summary
• First Approach: Object-Oriented Model
• Concepts from OO programming languages
• ODL: Object Definition Language
• OQL: Object Oriented Query Language

• Second Approach: Object-Relational Model


• Conceptual view
• Data Definition Language (Creating types, tables, and
relationships)
• Querying object-relational database (SQL-99)
WHEN TO CONSIDER
OODBMS OR ORDBMS
• Complex Relationships
• A lot of many-to-many relationships, tree structures or network (graph)
structures.

• Complex Data
• Multi-dimensional arrays, nested structures, or binary data, images,
multimedia, etc.

• Distributed Databases
• Need for free objects without the rigid table structure.

• Repetitive use of Large Working Sets of Objects


• To make use of inheritance and reusability

• Expensive Mapping Layer


• Expensive decomposition of objects (normalization) and re-
composition at querytime
KEY BENEFITS OF ODBMS
• Persistence & Versioning
• Created objects are maintained across different database runs
(persistent)
• Different evolving copies of the same object can be created over
time (versioning)

PersistentObject Superclass
Approach

• Superclass encapsulates any class for


storage and retrieval

• This superclass implements all


functionalities of read/write
operations
KEY BENEFITS OF ODBMS (CONT’D)
• Sharing in highly distributed environment
• Easier to share and distribute objects than tables
KEY BENEFITS OF ODBMS (CONT’D)
• Better memory usage and less paging
• Bringing only objects of interest

ODBMS Relational DBMS


OBJECT-ORIENTED VS. OBJECT-
RELATIONAL
• Object-oriented DBMSs
• Did not achieve much success (until now) in the market
place
• No query support(Indexing, optimization)
• No security layer

• Object-relational DBMSs
• Better support from big vendors
• Tries to make use of all advances in RDBMSs
• Indexes, views, triggers, query optimizations, security layer,etc.
• Work in progress --- Long way to go
Distributed Database
Distributed Database System
• A distributed database (DDB) is a collection of multiple, logically
interrelated databases distributed over a computer network.
• A distributed database management system (D–DBMS) is the
software that manages the DDB and provides an access mechanism
that makes this distribution transparent to the users.
• Distributed database system (DDBS) = DDB + D–DBMS
Centralized DBMS on a Network
Distributed DBMS Environment
Distributed Database
• Data is stored at several sites, each managed by a DBMS that can run
independently
• Desired properties:
• Distributed Data Independence
• Distributed Transaction Atomicity
Distributed Data Independence
• Users should not have to know where data is located
• – no need to know the locations of references relations, their copies or
fragments
• extends Physical and Logical Data Independence principles
• Queries spanning multiple sites should be optimized in a cost-based
manner
• taking into account communication costs and differences in local
computation costs
Distributed Transaction Atomicity
• Users should be able to write transactions accessing multiple sites
just like local transactions
• The effects of a transaction across sites should be atomic
• all changes persist if transaction commits
• none persist if transaction aborts
Types of Distributed Databases

• Homogeneous:
• Every site runs same type of DBMS
• Heterogeneous:
– Different sites run different DBMSs
– different RDBMSs or even non-relational DBMSs
More on Heterogeneous
Distributed Databases
• Database servers are accessed through well-accepted and
standard Gateway protocols
– masks the differences of DBMSs (capability, data format etc.)
– e.g. ODBC, JDBC
• However, can be expensive and may not be able to hide all
differences
– e.g. when a server is not capable of supporting distributed
transaction management

Gateway

DBMS1 DBMS2 DBMS3


Distributed DBMS Architectures

• Three alternative approaches

1. Client-Server
2. Collaborating Server
3. Middleware
Client-Server Systems
• One or more client (e.g. personal computer) and one or more server processes
(e.g. a mainframe)
– A client process can ship a query to any server process
– Clients are responsible for user interfaces
– Server manages data and executes queries
• Advantages
– clean separation and centralized server
– expensive server machines are not underutilized by simple user interactions
– users can run GUI on clients that they are familiar with
• Challenges
– need to carefully handle communication costs
– e.g. fetching tuples one at a time might be bad – need to do caching on client side

QUERY

CLIENT CLIENT

SERVER SERVER SERVER


Collaborating Server Systems
• Queries can span multiple sites
– not allowed in client-servers as the clients would have
had to break queries and combine the results
• When a server receives a query that requires
access to data at other servers
– it generates appropriate subqueries
– puts the result together
• Eliminates distinction between client and server
SERVER
SERVER
SERVER
QUERY
Middleware Systems
• Allows a single query to span multiple servers

• But does not require all db servers to be capable of


handling multi-site execution strategies
– need just one db server capable of managing queries and
transactions spanning multiple servers (called middleware)
– the remaining servers can handle only the local queries and
transactions

• The middleware layer is capable of executing joins and


other operations on data obtained from other servers, but
typically does not maintain any data

• Useful when trying to integrate several “legacy systems”


– whose basic capabilities cannot be extended
Storing Data in a Distributed DBMS
• Relations are stored across several sites
• Accessing data at a remote site incurs message-
passing costs
• To reduce this overhead, a single relation may be
partitioned or fragmented across several sites
– typically at sites where they are most often accessed
• The data can be replicated as well
– when the relation is in high demand
Fragmentation
• Break a relation into smaller relations or fragments
– store them in different sites as needed
TID
t1
t2
t3
• Horizontal: t4
– Usually disjoint
– Can often be identified by a selection query (employees in a city – locality of
reference)
– To retrieve the full relation, need a union

• Vertical:
– Identified by projection queries
– Typically unique TIDs added to each tuple
– TIDs replicated in each fragments
– Ensures that we have a Lossless Join
Replication
• When we store several copies of a relation or relation fragments
– can be replicated at one or more sites
– e.g. R is fragmented into R1, R2, R3; one copy of R2, R3; but two copies
at R1 at two sites
• Advantages
– Gives increased availability – e.g. when a site or communication link goes
down
– Faster query evaluation – e.g. using a local copy
• Synchronous and Asynchronous (later)
– Vary in how current different copies are when a relation is modified

SITE A SITE B

R1 R3
R1 R2
Distributed Catalog Management
• Must keep track of how data is fragmented and replicated across sites
– in addition to usual schema, authorization, and statistical information

• Must be able to uniquely identify each replica of each fragment


– Globally unique name may compromise autonomy of servers
– To preserve local autonomy: Global relation name = <local-name, birth-
site>
– To identify a replica, add a replica-id field (now called global replica
name)

• Site Catalog: Describes all objects (fragments, replicas) at a site +


Keeps track of replicas of relations created at this site
– To find a relation, look up its birth-site catalog
– Birth-site never changes, even if relation is moved
Peer-to-Peer Replication
• More than one of the copies of an object can be a
master
• Changes to a master copy must be propagated to
other copies somehow
• If two master copies are changed in a conflicting
manner, conflict resolution needed
– e.g., Site 1: Joe’s age changed to 35; Site 2: to 36
• Best used when conflicts do not arise:
– E.g., Each master site owns a disjoint fragment
– E.g., Updating rights held by one master at a time – then
propagated to other sites
Advantages of DDBMS
• 1. Increased reliability and availability
• A distributed database system is robust to failure to some extent.
Hence, it is reliable when compared to a Centralized database system.
• 2. Local control
• The data is distributed in such a way that every portion of it is local to
some sites (servers). The site in which the portion of data is stored is
the owner of the data.
• 3. Modular growth (resilient)
• Growth is easier. We do not need to interrupt any of the functioning
sites to introduce (add) a new site. Hence, the expansion of the whole
system is easier. Removal of site is also does not cause much problems.
• 4. Lower communication costs (More Economical)
• Data are distributed in such a way that they are available near to the
location where they are needed more. This reduces the
communication cost much more compared to a centralized system.
Advantages of DDBMS conti..
• 5. Faster response
• – Most of the data are local and in close proximity to where they are
needed. Hence, the requests can be answered quickly compared to a
centralized system.
• 6. Reflects the organizational structure
• Normally, database is fragmented into various locations wherever we
have controls.
• 7. Secured management of distributed data
• Various transparencies like network transparency, fragmentation
transparency, and replication transparency are implemented to hide
the actual implementation details of the whole distributed system. In
such way, Distributed database provides security for data.
Advantages of DDBMS conti..
• 8. Robust
• The system is continued to work in case of failures. For example,
replicated distributed database performs in spite of failure of other
sites.
• 9. Complied with ACID properties
• Distributed transactions demands Atomicity, Consistency, Isolation,
and Reliability.
• 10. Improved performance and Parallelism in executing
transactions can be achieved.
Disadvantages of DDBMS
• 1. Complex Software
• Complex implementation. Costs more in terms of software cost compared to a
centralized system. Additional software might be needed in most of the cases over a
centralized system.
• 2. Increased Processing overhead
• It costs many messages to be shared between sites to complete a distributed
transaction.
• 3. Data integrity
• Data integrity becomes complex. Too much network resources may be used.
• 4. Different data formats might be used
• This may cost time.
• 5. Deadlock is difficult to handle compared to a centralized system.
Disadvantages of DDBMS Conti..
• 6. May cause much more network traffic in case of write operation
in a replicated form of distributed database

• 7. Distributed System supported Operating System is required to


implement distributed database system

• 8. The data shared between sites over networks are vulnerable to


attack. Hence, network oriented security protocols to be used based
on the sensitivity of data shared

• 9. More complex in terms database design – According to various


applications, we may need to fragment a database, or replicate a
database or both

• 10. Handling failures is a difficult task. In some cases, we may not


distinguish site failure, network partition, and link failure.
Concepts of Data Warehouses
Data Warehouses
• Stores static data that has been extracted from different
databases (data sources) in an organization
• Central source of data that has been cleaned,
transformed, and cataloged
• Data is used for data mining, analytical processing,
analysis, research, decision support
• Data warehouses may be divided into data marts
• Subsets of data that focus on specific aspects
of a company (department or business process)

53
Data Warehouse Components

54
Applications and Data Marts

55
Data Warehouse : Characteristics
• Separate from operational databases
• Subject oriented: provides a simple, concise view on one or more
selected areas, in support of the decision process

• Constructed by integrating multiple, heterogeneous data sources

• Contains historical data: spans a much longer time horizon than


operational databases
• (Mostly) Read-Only access: periodic, infrequent updates
• Include metadata

Data Warehousing is process of constructing and using data warehouses


which requires data integration, data cleaning, and data consolidation.

56
Types of Data Warehouses
• Enterprise Warehouse: covers all areas of interest for an
organization

• Data Mart: covers a subset of corporate-wide data that is of


interest for a specific user group (e.g., marketing).
• Virtual Warehouse: offers a set of views constructed on demand on
operational databases. Some of the views could be materialized
(precomputed)

57
Data Warehouse—Subject-Oriented
• Organized around major subjects, such as customer, product, sales
• Focusing on the modeling and analysis of data for decision makers, not on
daily operations or transaction processing
• Provide a simple and concise view around particular subject issues by
excluding data that are not useful in the decision support process

58
Data Warehouse—Integrated
• Constructed by integrating multiple, heterogeneous data sources
• relational databases, flat files, on-line transaction records
• Data cleaning and data integration techniques are applied.
• Ensure consistency in naming conventions, encoding structures, attribute measures, etc.
among different data sources
• E.g., Hotel price: currency, tax, breakfast covered, etc.
• When data is moved to the warehouse, it is converted.

59
Data Warehouse—Time Variant
• The time horizon for the data warehouse is significantly longer than that of
operational systems
• Operational database: current value data
• Data warehouse data: provide information from a historical perspective (e.g., past 5-10
years)
• Every key structure in the data warehouse
• Contains an element of time, explicitly or implicitly
• But the key of operational data may or may not contain “time element”

60
Data Warehouse—Nonvolatile
• A physically separate store of data transformed from the operational
environment
• Operational update of data does not occur in the data warehouse
environment
• Does not require transaction processing, recovery, and concurrency control mechanisms
• Requires only two operations in data accessing:
• initial loading of data and access of data

61
Data Mining

62
Data Mining
• Process of discovering interesting patterns or knowledge from a
(typically) large amount of data stored either in databases, data
warehouses, or other information repositories

• Interesting: previously unknown, potentially useful

• Alternative names:knowledge discovery/extraction,


information harvesting, business intelligence
• In fact, data mining is a step of the more general process of
knowledge discovery in databases (KDD)

63
Data Mining --2
• Data in data warehouses are analyzed to reveal
hidden patterns and trends
• Market-basket analysis to identify new
product bundles
• Find root cause of qualify or manufacturing
problems
• Prevent customer attrition
• Acquire new customers
• Cross-sell to existing customers
• Profile customers with more accuracy

64
65
Steps of a KDD Process
• Learning the application domain:
• relevant prior knowledge and goals of application
• Creating a target data set: data selection
• Data cleaning and preprocessing: (may take 60% of effort!)
• Data reduction and transformation:
• Find useful features, dimensionality/variable reduction, invariant
representation.
• Choosing functions of data mining
• summarization, classification, regression, association, clustering.
• Choosing the mining algorithm(s)
• Data mining : search for patterns of interest
• Pattern evaluation and knowledge presentation
• visualization, transformation, removing redundant patterns, etc.
• Use of discovered knowledge
66
Why is Data Mining necessary?
• Make use of your data assets
• There is a big gap from stored data to knowledge; and the transition
won’t occur automatically.
• Many interesting things you want to find cannot be found using
database queries
“find me people likely to buy my products”
“Who are likely to respond to my promotion”

67
Data Mining Applications
• Credit ratings/targeted marketing:
• Given a database of 100,000 names, which persons are the least likely
to default on their credit cards?
• Identify likely responders to sales promotions
• Fraud detection:
• Which types of transactions are likely to be fraudulent, given the
demographics and transactional history of a particular customer?
• Customer relationship management:
• Which of my customers are likely to be the most loyal, and which are
most likely to leave for a competitor?

And Explore More!!!!

68
Self Study
• Parallel Database System
• Spatial Database
End of Unit 9
• Good Luck for your Examination!!

You might also like