The document outlines the structure and principles of database schemas, emphasizing the importance of ACID properties (Atomicity, Consistency, Isolation, Durability) in relational databases. It also discusses the process of decomposition in database design, normal forms, and the significance of dependency preservation. Additionally, it introduces machine learning, distinguishing between supervised and unsupervised learning, along with examples and algorithms for each type.
DATABASE AND ML
Schema
A database schema defines the structure and organization
of data within a database. It outlines how data is logically
stored, including the relationships between different tables
and other database objects. The schema serves as a
blueprint for how data is stored, accessed, and manipulated,
ensuring consistency and integrity throughout the system.
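The following is a minimal sketch of a schema, written for SQLite through Python's built-in `sqlite3` module. The two tables, `departments` and `employees`, and all their column names are hypothetical. They show how a schema declares tables, column types, keys and the relationship between tables, and how that structure can be read back from the database catalog.

```python
import sqlite3

# Hypothetical schema: every employee belongs to exactly one department.
SCHEMA = """
CREATE TABLE departments (
    dept_id   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL UNIQUE
);
CREATE TABLE employees (
    emp_id   INTEGER PRIMARY KEY,
    emp_name TEXT NOT NULL,
    dept_id  INTEGER NOT NULL REFERENCES departments(dept_id)
);
"""

def create_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)  # run both CREATE TABLE statements
    return conn

def describe(conn: sqlite3.Connection) -> dict:
    """Return {table name: [column names]} as stored in SQLite's catalog."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column
    return {t: [col[1] for col in conn.execute(f"PRAGMA table_info({t})")] for t in tables}

if __name__ == "__main__":
    print(describe(create_schema()))
```

The column lists are read from the database itself, not from the Python source, so they reflect the schema the DBMS actually stores.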
Features of a good relational design
Relational databases are expected to provide the ACID properties for transactions.
ACID refers to four essential properties: Atomicity, Consistency, Isolation, and Durability.
Full ACID guarantees have traditionally distinguished relational databases from many non-relational (NoSQL) databases, which often relax some of these guarantees.
1) Atomicity
Atomicity means that a transaction is treated as a single, indivisible unit: either all of its operations are executed completely, or none of them is executed at all. A transaction must never stop halfway and leave the database partially updated; if any part of it fails, the changes it has already made are undone.
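Here is a minimal sketch of atomicity. It assumes an in-memory SQLite database and a hypothetical `accounts` table whose `CHECK` constraint forbids negative balances. The transfer credits one account and then debits the other inside a single transaction. If the debit fails, the credit that already ran is rolled back as well.

```python
import sqlite3

def make_bank() -> sqlite3.Connection:
    # Hypothetical table: a balance may never go below zero.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one transaction; False means it was rolled back."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint rejected the debit
        return False

def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))

if __name__ == "__main__":
    bank = make_bank()
    print(transfer(bank, "alice", "bob", 500), balances(bank))
```

The failed transfer leaves both balances unchanged. That includes Bob's balance, even though his credit had already executed before the debit failed.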
2) Consistency
Consistency means that a transaction takes the database from one valid state to another. In a DBMS, the integrity of the data must be maintained: every integrity constraint (such as keys, NOT NULL, CHECK, and foreign-key constraints) must hold both before and after each transaction. A change that would violate a constraint is rejected, so the data always remains correct.
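One way a DBMS enforces consistency is through declared integrity constraints. The sketch below assumes SQLite, where foreign-key enforcement must be switched on for each connection with `PRAGMA foreign_keys = ON`. It also assumes hypothetical `customers` and `orders` tables. An order that refers to a non-existent customer would leave the database in an invalid state, so it is rejected.

```python
import sqlite3

def make_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
        "customer_id INTEGER NOT NULL REFERENCES customers(id))"
    )
    conn.execute("INSERT INTO customers VALUES (1, 'alice')")
    conn.commit()
    return conn

def place_order(conn: sqlite3.Connection, order_id: int, customer_id: int) -> bool:
    """Insert an order; return False if it would violate an integrity constraint."""
    try:
        with conn:
            conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, customer_id))
        return True
    except sqlite3.IntegrityError:  # e.g. the foreign key points to no customer
        return False

def order_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```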
3) Isolation
In a DBMS, isolation means that transactions running concurrently do not interfere with one another: the intermediate, uncommitted changes of one transaction are not visible to the others. With full isolation, the final result is the same as if the transactions had been executed one after another (serially), even though they may actually run at the same time.
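The toy simulation below is plain Python, not a real DBMS, and its function names are hypothetical. It contrasts a serial schedule with an interleaved one. In the interleaved schedule, two deposits both read the balance before either one writes. Without isolation, one deposit is lost, which is the classic lost-update anomaly. A DBMS that provides isolation prevents this, for example by locking or by multiversion concurrency control, so the outcome matches a serial run.

```python
def run_serial(start: int, deposits: list) -> int:
    """Serial schedule: each transaction reads the value written by the previous one."""
    balance = start
    for amount in deposits:
        balance = balance + amount  # read the latest value, then write
    return balance

def run_interleaved(start: int, deposits: list) -> int:
    """Non-isolated schedule: every transaction reads before any of them writes."""
    reads = [start for _ in deposits]  # all reads see the same old balance
    balance = start
    for old_value, amount in zip(reads, deposits):
        balance = old_value + amount  # each write overwrites the previous write
    return balance
```

Starting from 100 with deposits of 10 and 20, `run_serial` returns 130, while `run_interleaved` returns 120.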
4) Durability
Durability ensures permanence. In a DBMS, it guarantees that once a transaction has completed successfully (committed), its changes are permanently stored in the database. The changes must survive later failures, even if the system fails or crashes. If data is nevertheless lost in such a failure, the recovery manager is responsible for restoring the committed changes, which ensures the durability of the database.
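The next sketch illustrates durability with a file-backed SQLite database in a temporary directory. The table name `log` is hypothetical. Closing a connection without committing stands in for a crash before commit. This is a simplification: a real durability test would kill the process or cut the power. After the database is reopened, only the committed row is found.

```python
import os
import sqlite3
import tempfile

def committed_rows_after_reopen() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE log (msg TEXT)")
        with conn:  # committed transaction: must survive
            conn.execute("INSERT INTO log VALUES ('committed')")
        conn.execute("INSERT INTO log VALUES ('uncommitted')")  # opens a transaction...
        conn.close()  # ...that is discarded, because close() does not commit

        reopened = sqlite3.connect(path)  # a fresh connection sees only committed data
        rows = [row[0] for row in reopened.execute("SELECT msg FROM log")]
        reopened.close()
        return rows
```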
Design alternative: larger schemas and smaller schemas
Relational database design
The relational model represents data and their relationships through a collection of tables. Each table, also known as a relation, consists of rows and columns. Every column has a unique name and corresponds to a specific attribute, while each row contains a set of related data values representing a real-world entity or relationship. The relational model is one of the record-based models, which structure data in fixed-format records, each belonging to a particular type with a defined set of attributes.
E.F. Codd introduced the Relational Model to organize data
as relations or tables.
Decomposition using functional dependencies (FDs):
When we divide a table into multiple tables, or a relation into multiple relations, the process is called decomposition in a DBMS.
When a relation in the relational model is not in the appropriate normal form, it must be decomposed.
+ If a relation is not decomposed properly, the decomposition may lead to problems such as loss of information.
+ Decomposition is used to eliminate problems of bad design such as anomalies, inconsistencies, and redundancy.
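To make decomposition concrete, here is a sketch in plain Python. A relation is a set of tuples, and decomposition is projection onto subsets of its attributes. The `EMPLOYEE` relation and its attributes are hypothetical. Because `dept_head` depends only on `dept`, the original table repeats each department head for every employee. Splitting the table stores each head only once.

```python
# Hypothetical relation Employee(emp_id, name, dept, dept_head).
ATTRS = ("emp_id", "name", "dept", "dept_head")
EMPLOYEE = {
    (1, "Asha", "Sales", "Ravi"),
    (2, "Ben", "Sales", "Ravi"),  # "Ravi" is stored again: redundancy
    (3, "Chen", "IT", "Mira"),
}

def project(rel: set, attrs: tuple, keep: tuple) -> set:
    """Projection: keep only the columns in `keep`; the set removes duplicate rows."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(t[i] for i in idx) for t in rel}

emp = project(EMPLOYEE, ATTRS, ("emp_id", "name", "dept"))
dept = project(EMPLOYEE, ATTRS, ("dept", "dept_head"))
```

`dept` ends up with two rows, one per department. Changing the head of Sales therefore means updating a single row instead of one row per employee.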
Lossless Decomposition
If no information is lost from the relation that is decomposed, the decomposition is lossless.
A lossless decomposition guarantees that joining the sub-relations gives back exactly the relation that was decomposed: the natural join of all the sub-relations yields the original relation, with no missing and no extra tuples.
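The lossless property can be checked on a concrete instance. The sketch below is plain Python, and the relation and helper names are hypothetical. It projects a relation onto two attribute subsets, natural-joins the pieces, and compares the result with the original. A check on one instance only shows that the join is lossless for that data. A decomposition is lossless as a design only when this holds for every legal instance. Here that follows from the dependency `dept -> dept_head`: the shared attribute `dept` is a key of the second sub-relation.

```python
def project(rel, attrs, keep):
    """Projection onto the attributes in `keep` (duplicates vanish in the set)."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(t[i] for i in idx) for t in rel}

def natural_join(r1, attrs1, r2, attrs2):
    """Join tuples that agree on every shared attribute; return (rows, attribute order)."""
    out_attrs = tuple(attrs1) + tuple(a for a in attrs2 if a not in attrs1)
    common = [a for a in attrs1 if a in attrs2]
    rows = set()
    for t1 in r1:
        row1 = dict(zip(attrs1, t1))
        for t2 in r2:
            row2 = dict(zip(attrs2, t2))
            if all(row1[a] == row2[a] for a in common):
                merged = {**row1, **row2}
                rows.add(tuple(merged[a] for a in out_attrs))
    return rows, out_attrs

def is_lossless_on(rel, attrs, sub1, sub2):
    """True if joining the two projections gives back exactly `rel`."""
    joined, joined_attrs = natural_join(project(rel, attrs, sub1), sub1,
                                        project(rel, attrs, sub2), sub2)
    return project(joined, joined_attrs, attrs) == rel  # reorder columns, then compare

ATTRS = ("emp_id", "name", "dept", "dept_head")
EMPLOYEE = {(1, "Asha", "Sales", "Ravi"), (2, "Ben", "Sales", "Ravi"), (3, "Chen", "IT", "Mira")}
```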
Lossy Decomposition
As the name suggests, a decomposition is lossy when performing a join operation on the sub-relations does not give back the relation that was decomposed. After the join, some extraneous (spurious) tuples appear. These extra tuples make it difficult for the user to identify the original tuples.
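The following sketch shows a lossy decomposition, using the same plain-Python representation and hypothetical data. `Result(student, course, grade)` is split on `grade`, which is not a key of either piece. Joining the pieces back produces spurious tuples, such as a grade for a course the student never took.

```python
def project(rel, attrs, keep):
    """Projection onto the attributes in `keep`."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(t[i] for i in idx) for t in rel}

def natural_join(r1, attrs1, r2, attrs2):
    """Join tuples that agree on every shared attribute; return (rows, attribute order)."""
    out_attrs = tuple(attrs1) + tuple(a for a in attrs2 if a not in attrs1)
    common = [a for a in attrs1 if a in attrs2]
    rows = set()
    for t1 in r1:
        row1 = dict(zip(attrs1, t1))
        for t2 in r2:
            row2 = dict(zip(attrs2, t2))
            if all(row1[a] == row2[a] for a in common):
                merged = {**row1, **row2}
                rows.add(tuple(merged[a] for a in out_attrs))
    return rows, out_attrs

ATTRS = ("student", "course", "grade")
RESULT = {("Asha", "DB", "A"), ("Ben", "ML", "A")}

r1 = project(RESULT, ATTRS, ("student", "grade"))
r2 = project(RESULT, ATTRS, ("course", "grade"))
joined, joined_attrs = natural_join(r1, ("student", "grade"), r2, ("course", "grade"))
rejoined = project(joined, joined_attrs, ATTRS)  # back to (student, course, grade) order
spurious = rejoined - RESULT
```

Here `spurious` holds ("Asha", "ML", "A") and ("Ben", "DB", "A"). Nothing in the joined table marks them as fake.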
Dependency preservation
+ Dependency preservation is an important property of a decomposition in a database management system. It ensures that the functional dependencies between attributes are maintained after decomposition, so each dependency can still be checked within the decomposed tables without joining them. This helps to keep the database efficient and to maintain consistency and integrity.
+ It is an important constraint of the database.
+ Ideally, every dependency is contained in at least one decomposed table.
+ If a relation R is decomposed into relations R1 and R2, then each dependency of R must either be part of R1 or R2 or be derivable from the combination of the functional dependencies of R1 and R2.
+ For example, suppose there is a relation R(A, B, C, D) with functional dependency set {A -> BC}. The relation R is decomposed into R1(ABC) and R2(AD), which is dependency preserving because the FD A -> BC is part of relation R1(ABC).
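Dependency preservation can be tested without computing the full closure of the FD set. The sketch below follows the standard closure-based test. For each dependency X -> Y, it grows X using only what each sub-relation can derive on its own, then checks whether Y is reached. Attribute sets are written as strings of single-letter attributes, so "BC" means {B, C}. That encoding and the function names are assumptions for illustration.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves_dependencies(fds, decomposition):
    """True if every FD in `fds` can be enforced using only the sub-relations."""
    for lhs, rhs in fds:
        reached = set(lhs)
        while True:
            before = set(reached)
            for ri in decomposition:
                part = set(ri)
                # what Ri alone can derive from the attributes reached so far
                reached |= closure(reached & part, fds) & part
            if reached == before:
                break
        if not set(rhs) <= reached:
            return False
    return True

# Example from the text: R(A, B, C, D), F = {A -> BC}, decomposed into R1(ABC) and R2(AD).
print(preserves_dependencies([("A", "BC")], ["ABC", "AD"]))
```

For contrast, take F = {A -> B, B -> C} with the decomposition R1(AB), R2(AC). It fails the test, because B -> C cannot be checked inside either table.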
Normal forms
First Normal Form (1NF):
The table should have atomic (indivisible) values.
Each column must contain only one value per row.
The order in which data is stored does not matter.
There should be no repeating groups or arrays in any
column.
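Here is a small sketch of bringing data into 1NF, assuming hypothetical student rows whose `phones` column holds a list, which is a repeating group. Each phone number gets its own row, so every column holds one atomic value. The key of the new table becomes (student_id, phone).

```python
# Hypothetical unnormalised rows: "phones" holds a list, i.e. a repeating group.
UNNORMALISED = [
    {"student_id": 1, "name": "Asha", "phones": ["555-0101", "555-0102"]},
    {"student_id": 2, "name": "Ben", "phones": ["555-0201"]},
]

def to_1nf(rows):
    """One output row per phone number, so every column holds a single atomic value."""
    return [
        {"student_id": r["student_id"], "name": r["name"], "phone": phone}
        for r in rows
        for phone in r["phones"]
    ]
```

The flattened table now repeats `name` for each phone. That repetition is a partial dependency on part of the key (student_id, phone), and 2NF removes it next.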
Second Normal Form (2NF):
The table must first be in 1NF.
+ All non-key attributes must be fully dependent on the
primary key (no partial dependency).
+ This eliminates cases where attributes depend only on
part of a composite primary key (in case of composite
keys).
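The next sketch detects partial dependencies, assuming the given primary key is the only candidate key. For every proper subset of the key, it computes the attribute closure and reports any non-key attribute that the subset already determines. The `Enrollment` table and its dependencies are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, attributes, fds):
    """(part_of_key, attribute) pairs where a non-key attribute depends on a proper subset of the key."""
    non_key = set(attributes) - set(key)
    found = []
    for size in range(1, len(key)):
        for part in combinations(key, size):
            for attr in sorted(non_key & closure(part, fds)):
                found.append((part, attr))
    return found

# Hypothetical Enrollment(student_id, course_id, student_name, grade), key (student_id, course_id)
KEY = ("student_id", "course_id")
ATTRS = ("student_id", "course_id", "student_name", "grade")
FDS = [
    (("student_id", "course_id"), ("grade",)),
    (("student_id",), ("student_name",)),
]
```

`student_name` depends on `student_id` alone, so the table is not in 2NF. The usual fix is to move (student_id, student_name) into a separate table.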
Third Normal Form (3NF):
The table must first be in 2NF.
There should be no transitive dependency (i.e., non-key attributes should not depend on other non-key attributes). This ensures that each non-key attribute is only dependent on the primary key.
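A transitive dependency can be found in the same way. The sketch below again assumes the primary key is the only candidate key, and it uses a hypothetical `Employee(emp_id, dept, dept_head)` table. It reports dependencies X -> A where X is neither a superkey nor part of the key, and A is a non-key attribute.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def transitive_dependencies(key, attributes, fds):
    """FDs X -> A with X not a superkey and not part of the key, A a non-key attribute."""
    found = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(attributes):
            continue  # lhs is a superkey: allowed
        if set(lhs) <= set(key):
            continue  # dependency on part of the key: a 2NF issue, not a transitive one
        for attr in rhs:
            if attr not in key and attr not in lhs:
                found.append((lhs, attr))
    return found

# Hypothetical Employee(emp_id, dept, dept_head) with emp_id -> dept and dept -> dept_head
KEY = ("emp_id",)
ATTRS = ("emp_id", "dept", "dept_head")
FDS = [(("emp_id",), ("dept",)), (("dept",), ("dept_head",))]
```

`dept_head` reaches the key only through `dept`, so the table is split into (emp_id, dept) and (dept, dept_head).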
Boyce-Codd Normal Form (BCNF):
The table must be in 3NF.
Every determinant (an attribute or set of attributes that determines another attribute) must be a candidate key; more precisely, the left-hand side of every non-trivial functional dependency must be a superkey.
It is a stricter version of 3NF that eliminates certain
types of anomalies that 3NF may allow.
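The following is a sketch of a BCNF test with a hypothetical closure helper. Every non-trivial dependency in the FD set must have a superkey on its left-hand side. The example is the textbook case that is in 3NF but not in BCNF. In a hypothetical `Teaches(student, course, instructor)` table where each instructor teaches one course, `instructor -> course` holds, but `instructor` is not a key.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose left-hand side is not a superkey."""
    everything = set(attributes)
    bad = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency: always allowed
        if not closure(lhs, fds) >= everything:
            bad.append((lhs, rhs))
    return bad

ATTRS = ("student", "course", "instructor")
FDS = [
    (("student", "course"), ("instructor",)),
    (("instructor",), ("course",)),
]
```

The table still satisfies 3NF, because `course` is part of the candidate key (student, course). BCNF is stricter and flags `instructor -> course`.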
Fourth Normal Form (4NF):
The table must first be in BCNF.
There should be no non-trivial multi-valued dependency unless its determinant is a superkey (i.e., no situation where one attribute determines two or more independent sets of values in the same table).
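A multi-valued dependency can be checked on a table instance. The sketch below uses hypothetical `Course(course, teacher, book)` data, and its function names are assumptions. It tests X ->> Y: for every value of X, each Y-value must appear with every combination of the remaining columns. Here `course` is not a superkey, so `course ->> teacher` breaks 4NF. The usual fix is to split the table into (course, teacher) and (course, book).

```python
from collections import defaultdict

def mvd_holds(rows, x, y):
    """Check X ->> Y on a table instance given as a list of dicts."""
    z = [a for a in rows[0] if a not in x and a not in y]  # the remaining columns
    groups = defaultdict(set)
    for r in rows:
        x_value = tuple(r[a] for a in x)
        groups[x_value].add((tuple(r[a] for a in y), tuple(r[a] for a in z)))
    for pairs in groups.values():
        ys = {yv for yv, _ in pairs}
        zs = {zv for _, zv in pairs}
        if len(pairs) != len(ys) * len(zs):  # some (Y, Z) combination is missing
            return False
    return True

# Hypothetical Course(course, teacher, book): teachers and books are independent.
COURSE = [
    {"course": "DB", "teacher": "Smith", "book": "Korth"},
    {"course": "DB", "teacher": "Smith", "book": "Navathe"},
    {"course": "DB", "teacher": "Jones", "book": "Korth"},
    {"course": "DB", "teacher": "Jones", "book": "Navathe"},
]
```

Dropping any one of the four rows makes `mvd_holds` return False. This exposes the redundancy: adding a new book for DB requires one row per teacher.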
Fifth Normal Form (5NF):
The table must first be in 4NF.
There should be no non-trivial join dependency that is not implied by the candidate keys, so the table does not contain redundant data that could be reconstructed by joining smaller tables.
Sixth Normal Form (6NF):
This deals with temporal databases where data is
related to time and is used to handle cases of temporal
redundancy. It splits data into even more granular tables.
Summary of Normal Forms:
1NF: Atomic values, no repeating groups.
2NF: 1NF + no partial dependency on the primary key.
3NF: 2NF + no transitive dependency.
BCNF: 3NF + every determinant is a candidate key.
4NF: BCNF + no multi-valued dependencies.
5NF: 4NF + no join dependency.
6NF: Deals with temporal data and its redundancy.
Machine learning
Machine learning (ML) is a branch of artificial intelligence (AI) focused on enabling computers and machines to imitate the way that humans learn, to perform tasks autonomously, and to improve their performance and accuracy through experience and exposure to more data.
Types:
Supervised learning and unsupervised learning are two
fundamental types of machine learning, differing in how
models are trained:
Supervised Learning:
Definition: In supervised learning, the model is trained
on labeled data, where each input has a corresponding
correct output (target or label).
Goal: The goal is to learn a mapping from inputs to
outputs so that the model can predict the output for
new, unseen data.
+ Examples:
Classification: Predicting whether an email is spam or
not (label: "spam" or "not spam").
Regression: Predicting house prices based on features
like location, size, etc. (label: price).
Algorithms: Linear regression, decision trees, support
vector machines, neural networks, etc.
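Here is a minimal sketch of supervised regression. The code fits a straight line by ordinary least squares to a few labeled examples. The house sizes and prices are made-up numbers, chosen to lie exactly on a line so the result is easy to check. A single feature is used for brevity; real price models use many features.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + b with a single feature x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var            # slope
    b = mean_y - w * mean_x  # intercept
    return w, b

# Made-up labeled data: size in square metres -> price in thousands
SIZES = [50, 70, 90, 110]
PRICES = [150, 210, 270, 330]

w, b = fit_line(SIZES, PRICES)
print(f"predicted price for 100 m^2: {w * 100 + b:.1f}")
```

On this data the fitted slope is 3.0 and the intercept is 0.0, so a 100 m^2 house is predicted at 300.0.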
Unsupervised Learning:
Definition: In unsupervised learning, the model is trained
on data that has no labeled outputs. The goal is to
identify patterns or structures in the data.
Goal: The goal is to find hidden patterns, groupings, or
relationships in the data without prior knowledge of the
output labels.
Examples:
Clustering: Grouping customers based on purchasing
behavior (no predefined labels).
Dimensionality Reduction: Reducing the number of
features while retaining essential information, like PCA
(Principal Component Analysis).
Algorithms: K-means clustering, hierarchical clustering,
DBSCAN, PCA, etc.
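The following is a compact sketch of K-means clustering (Lloyd's algorithm) in plain Python. It runs on made-up two-dimensional customer data, such as visits per month and average spend. For reproducibility, the initialisation simply takes the first k points, which is an assumption. Practical implementations usually use random or k-means++ initialisation and several restarts.

```python
import math

def kmeans(points, k, iterations=100):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    centroids = list(points[:k])  # naive deterministic initialisation
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return centroids, clusters

# Made-up customers: (visits per month, average spend)
CUSTOMERS = [(1, 1), (10, 10), (1.5, 2), (2, 1), (9, 11), (11, 9)]
centroids, clusters = kmeans(CUSTOMERS, k=2)
```

No labels are supplied: the two groups of customers emerge only from the distances between points.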
Linear regression
Logistic regression
Naive Bayes