Final review:
INDEX
Index is a structure used to retrieve data from a database very fast.
a quick lookup table used to locate records that users commonly need to search. For
example, it is used in contact lists, where the data may be physically stored in the order
in which people's contact information is entered, but it is easier to discover them when
they are listed alphabetically.Large-scale table searches and relational table
connections both benefit greatly from it.
There are two types of database indexes: clustered index and non-clustered index.
When more than two records are stored in the same file these types of storing are
known as cluster indexing. By using the cluster indexing we can reduce the cost of
searching since multiple records related to the same thing are stored at one place and it
also gives the frequent joining of more than two tables.It is a unique index per table that
uses primary key to organize data.
A non-clustered index sorted reference from the main table for a given field that holds
pointers back to the original entries, it just tells us where the data lies.Specifically, it
gives us a list of virtual pointers or references to the location where the data is actually
stored. Data is not physically stored in the order of the index. Instead, data is present in
leaf nodes,For example , it it like the back page of a book, the index for a specific field
got pointers to more general chapters
Both clustered and non-clustered, are stored and searched as B-trees.B-Tree structure
consists of three main levels: the Root Level, the top node that contains a single index
page, from which the data searches begins,branch level, middle part which contains key
values and index values. leaf level,the lowest level nodes,holding the actual data. All
other nodes including the root node only hold the key values and pointers to the next
nodes.
Ranking Functions:
The RANK() function is a window function that could be used in SQL Server to calculate
a rank for each row within a partition of a result set.
The same rank is assigned to the rows in a partition which have the same values. The
rank of the first row is 1. The ranks may not be consecutive in the RANK() function as it
adds the number of repeated rows to the repeated rank to calculate the rank of the next
row.
● ROW_NUMBER() gives unique sequential numbers for each row.
● RANK()returns a unique rank number for each distinct row. It behaves like
ROW_NUMBER() except for the rows with equal values, where it will rank
with the same rank ID and generate a gap after it. Skip the previous row
processed
Ties matters
● DENSE_RANK() is similar to RANK() but the only difference is
DENSE_RANK() does not skip any rank, i.e. leaving no gap(s) between the
gap(s).
● NTILE() is used to distribute the rows into the rows set with a specific
number of groups.(calculate percentage of the ranking
Ranking functions are a subset of the built in functions in SQL Server. They are used to
provide a rank of one kind or another to a set of rows in a partition. The partition can be
the full result set, if there is no partition.
It is important in analyzing data since it is more efficient. With the data shown by ranking
functions we can know the relationship and other things more directly, and Understand
where each data is located in the database. For example if a company needs
customers’ info to investigate.Like Who are your top 10 customers? Who are the
top-performing employees in each department? When the scale of customers is small,
like ten , we can come up with the rankings manually, But if the scale is more than 1000,
it saves more time to use ranking functions to have the results.
Synthetic Transaction:
Synthetic transactions allow people to employ scripted transactions to simulate certain
scenarios in order to identify flaws that may affect end users.
Synthetic transactions also allow us to test the performance of our base stored
procedures with real data at a set number of calls. We are able to know what the
performance is going to be , which eliminates surprises. We used to not test and simply
respond to errors when the code was live. This has a negative impact on customer
satisfaction because we must devote a significant amount of time and resources to
dealing with the problem as it arises to disrupt user experience.
Synthetic transactions can also help us in production, outside of the testing
environment. It is the most reliable method for us to evaluate the availability of our
application and distinguish its status from 'looks alive' to 'is alive,' because you can
connect to the server and the green light will appear even if the server is already dead.
In troubleshooting, synthetic transactions might be useful.
When testing the code with synthetic transactions, the code will release without drama
since we have already tested it. It keeps the high availability in the database so that we
can minimize the risk of lost revenue, minimize downtime and ensure system
availability at all times.
For example:Knowing the availability of our application is vital for a company because
customers pay for our services to be available, and if we are not responsive to customer
requests, we may as well not be in business. When we can program and send synthetic
transactions every couple of seconds, we can determine whether or not our application
is actually working (alive). We would constantly submit made-up client data with random
information and a quantity of 0 to make up order information and determine if our order
procedure was working within a few seconds. This way, we can detect an outage or
issue as quickly as possible, and most likely before our clients.Gnerally, it saves us
surprises, downtime(since we can get it fixed way quicker) and money .
Modularity:
Modularity is a design concept that states that a complicated design can be broken
down into a collection of simpler component modules. It involves dividing a program's
functionality into distinct parts or building blocks, each of which has all of the
components needed to carry out a certain aspect of the functionality.
Stored procedures and user-defined functions as well as aggregate functions are
involved in modularity. This is because in these database objects we can know by
coding that each part has a corresponding job. And with these objects we can reduce
the duplicated code.
modularity makes your code easier to read and modularity will promote reusability. For
example the code we normally write about inserting a new row into table order, we
need to write nested procedures with several unique attributes to locate the foreign
keys.These nested procedures can be called multiple times once they are created
throughout the whole procedure.When our files have thousands of lines of code like a
team project, we may find it is difficult to understand and identify specific portions. But
with modularity, The separate part is readable and It is excellent for testing.
If there is no modularity ,the whole system will be messy since everything is piled up
into one piece without a good division, and there will be a lot of inconvenience in both
writing code and reading. And we will have many duplicate parts appearing in the
process many times.
Control of Flow:
The control flow function evaluates the condition specified in it. The output
generated by them can be a true, false, static value or column expression. We can
use the control flow functions in the SELECT, WHERE, ORDER BY, and GROUP
BY clause.
provide the capability to control the logic flow, declare and set variables, and
handle warnings and exceptions.
Flow control is important because it is possible for a sending computer to
transmit information at a faster rate than the destination computer can receive
and process it.
While and if else :
Error Handling:
Error handling is a database item that allows us to control code transactions. When a
database problem occurs, error handling helps ensure that the data fails gracefully.
It can help avoiding cascading rollbacks(failure of one transaction causes several
other dependent transactions to rollback )which are useful and mandantary for ACID
in databases. That is, when the code failed before transactions, the ACID properties
were unaffected. However, committing and failing these transactions will result in a
cascade rollback. The failure of one transaction, like a snowball of errors, will cause the
failure of several linked transactions.
For example, when we write code including transaction T1, the transaction T1 writes an
uncommitted command that is read by Transaction T2. Transaction T2 writes it that is
read by Transaction T3.Suppose at this point T1 fails.T1 must be rolled back, and the
other T2,T3 which is dependent on the previous steps must be all rolled back. And then
with the existence of error handling , The failure of the error happened will be
suspended in the T1 phase.The information in the error section will be reported to us, so
that we can correct some of our errors in time
Throw(new school) and Raiserror(old school) as database objects handle errors.They
deal with data differently since raiserror always generates new exceptions and so loses
the original exception details, and it is a central control. However, Throw is used to raise
an exception and redirect execution to a catch block.
Error management is essential because it makes it easier for end users to utilize code
correctly. Another significant benefit is that it makes code easier to maintain. Error
handling simplifies the incorporation of input specifications into code, eliminating the
need for employees to review the design as they construct and maintain the code. If
errors are not correctly managed by humans, the application may crash, forcing users to
lose all of their work and leaving them unsure of where the problem occurred.
Temporary Objects:
The unnamed object used to store temporary values is known as a temporary object.
For the duration of the query session, we can generate and use temporary objects. The
speed and scalability of operations can be greatly impacted by the temporary object
type people select from the many temporary object kinds (Common table expression,
temporary table, and table variable) that we might use for storing, querying, and
transferring data.
CTE:CTE (Common table expression) was developed to simplify certain classes of
queries for which a derived table was just insufficient.
It allows many people who are not familiar with sql of sub-queries to have CTE as
another way for them to do the sub-queries.
Table variable:@ Table variables (new school) are a sort of local variable that is used to
temporarily store data. It is used to hold table variables in the same way that the
temporary table is. it is remembered. As a result, the table variable retrieves data
faster. And it will clear up the table automatically, so we won't have to manually update
and delete the data in the table variables.
Temporary table:The temporary table (1960s old school) is most likely permanent. It is a
base table that is not stored in the database and exists just for the duration of the active
database session in which it was created. It's on disk, and it's persistent. Temporary
tables are required because they provide interim result sets, which can reduce query
plan complexity in large or complex workloads. It can reduce query plan complexity.
View
Derived table
Why important : If there are no temporary objects in the database, there will be many
complex queries in a large database, causing heavy workload to execute it.And There
will be a lot of redundant data in the whole database and the whole structure of the data
is messy. The efficiency of our data processing will be greatly reduced.