SQL Interview Questions and Answers for 2023
What is DBMS?
A Database Management System (DBMS) is system software for creating and
managing databases. It serves as an interface between databases and end-users or
application programs so that data is consistently organized and remains easily
accessible. It allows end-users to create, read, update, and delete data in a database.
There are two types of DBMS:
Relational Database Management System (RDBMS): An RDBMS stores and manages
data in the form of tables.
Non-Relational Database Management System (often called NoSQL
databases): It stores data in a non-tabular form. Example – MongoDB
What is SQL?
SQL or Structured Query Language is a widely known programming language that is
primarily used to interact with databases. SQL helps perform different operations
on databases, such as extracting, modifying, and updating data. The language is
used worldwide by a large number of companies to manage their data, such as
employee databases, client databases, etc.
It is the standard language for relational database systems. All RDBMSs, like MySQL,
MS Access, Oracle, and SQL Server use SQL as their standard database language. SQL
also supports a wide range of data types, including numeric, text, and date/time
values.
What is MySQL?
MySQL is an open-source relational database management system (RDBMS) that
is developed and distributed by Oracle Corporation. Supported by various operating
systems, such as Windows, Unix, Linux, etc., MySQL can be used to develop different
types of applications. Known for its speed, reliability, and flexibility, MySQL is mainly
used for developing web applications.
What are the subsets of SQL? Explain them.
The following are the three subsets of SQL:
Data Definition Language (DDL) – It allows end-users to CREATE, ALTER,
and DROP database objects.
Data Manipulation Language (DML) – With this, you can access and
manipulate data. It allows you to Insert, Update, Delete, and Retrieve data from
the database.
Data Control Language (DCL) – This lets you control access to the database.
It includes the GRANT and REVOKE commands, which give and take away permissions
to access or modify the database.
What is a constraint, and how many levels of constraints are there?
A constraint specifies rules (limits) on the data that can be stored in a table. Constraints
can be defined when creating a table (CREATE TABLE) or added later when altering it
(ALTER TABLE).
There are two levels of constraint –
Column level – Applies to a single column
Table level – Applies to the whole table and can involve several columns
Following are the most used constraints that can be applied to a table:
NOT NULL
UNIQUE
CHECK
DEFAULT
PRIMARY KEY
FOREIGN KEY
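A minimal sketch of these constraints in action, assuming SQLite through Python's built-in sqlite3 module rather than MySQL (the syntax is close but not identical). The students table and its columns are hypothetical. Each rejected INSERT below violates one constraint, and the DEFAULT constraint fills in a column that was not supplied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with one constraint of each kind listed above (except FOREIGN KEY).
conn.execute("""
    CREATE TABLE students (
        id     INTEGER PRIMARY KEY,                 -- PRIMARY KEY
        email  TEXT NOT NULL UNIQUE,                -- NOT NULL and UNIQUE (column level)
        age    INTEGER,
        status TEXT DEFAULT 'active',               -- DEFAULT
        CONSTRAINT age_not_negative CHECK (age >= 0) -- CHECK (table level)
    )
""")
conn.execute("INSERT INTO students (id, email, age) VALUES (1, 'a@example.com', 20)")

def rejected(sql):
    """Return True if SQLite refuses the statement because of a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

dup_email = rejected("INSERT INTO students (id, email, age) VALUES (2, 'a@example.com', 30)")
null_email = rejected("INSERT INTO students (id, email, age) VALUES (3, NULL, 30)")
bad_age = rejected("INSERT INTO students (id, email, age) VALUES (4, 'b@example.com', -5)")
dup_pk = rejected("INSERT INTO students (id, email, age) VALUES (1, 'c@example.com', 25)")

# The DEFAULT constraint supplied the status we did not insert.
default_status = conn.execute("SELECT status FROM students WHERE id = 1").fetchone()[0]
print(dup_email, null_email, bad_age, dup_pk, default_status)  # True True True True active
```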
What is a UNIQUE constraint?
The UNIQUE Constraint prevents identical values in a column from appearing in two
records. The UNIQUE constraint guarantees that every value in a column is unique.
What is the point of using a foreign key constraint?
The foreign key constraint comprises a set of rules, or limits, that will ensure that the
values in the child and parent tables match. Technically, this means that the foreign
key constraint will maintain the referential integrity within the database.
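A minimal sketch of referential integrity, assuming SQLite through Python's sqlite3 module; the departments and employees tables are hypothetical. SQLite only enforces foreign keys after the `foreign_keys` pragma is switched on, which MySQL's InnoDB does not require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES departments(dept_id)  -- child table points at parent
    )
""")
conn.execute("INSERT INTO departments VALUES (10, 'Sales')")
conn.execute("INSERT INTO employees VALUES (1, 'Asha', 10)")  # parent row exists: accepted

try:
    # Department 99 does not exist, so this child row would be an orphan.
    conn.execute("INSERT INTO employees VALUES (2, 'Ben', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

try:
    # Deleting a parent row that still has child rows is rejected too.
    conn.execute("DELETE FROM departments WHERE dept_id = 10")
    parent_delete_rejected = False
except sqlite3.IntegrityError:
    parent_delete_rejected = True

print(orphan_rejected, parent_delete_rejected)  # True True
```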
What is the primary key?
A primary key in SQL is a column (or a set of columns) that uniquely identifies each
row in the table.
Uniquely identifies a single row in the table
Null values not allowed
A table can have only one primary key. It can consist of single or multiple fields.
Example- In the Student table, Stu_ID is the primary key.
Define Unique Key.
Uniquely identifies a single row in the table.
Multiple unique keys allowed per table.
Null values allowed.
A primary key is a special kind of unique key.
What is a foreign key?
A foreign key (often called the referencing key) is a column (or group of columns) in one
table that refers to the primary key in another table.
The table with the foreign key is called the child table, whereas the table with the primary
key is called the parent table.
What is RDBMS?
Relational Database Management System or RDBMS is based on the relational
database model and is among the most popular database management systems.
A relational database management system (RDBMS) is a type of database
management system (DBMS) that stores data in a row-based table structure that links
related data components. An RDBMS contains functions that ensure the data’s
security, accuracy, integrity, and consistency. This differs from the file-based storage
used by a traditional DBMS.
What are the features of MySQL?
Here are some of the important features of MySQL:
It is reliable and easy to use
It supports standard SQL (Structured Query Language)
MySQL is secure as it consists of a data security layer that protects sensitive data
from unauthorized users
MySQL has a flexible structure and supports a large number of embedded
applications
It is one of the fastest database systems
It is suitable database software for both large and small applications
MySQL offers high performance compared to many other databases
It is supported by many well-known programming languages, such as PHP, Java, and
C++
Its Community Edition is free to download and use
What are the disadvantages of MySQL?
The disadvantages of MySQL are:
It is hard to scale MySQL
It does not handle very large databases as efficiently as some other systems
Versions before MySQL 8.0.16 parse but do not enforce SQL CHECK constraints
Some storage engines, such as MyISAM, are prone to data corruption
What are the differences between MySQL and SQL?
1. MySQL is a relational database that uses SQL to query a database; SQL is a query language.
2. MySQL supports multiple storage engines, including plug-in storage engines; SQL is a
language and has no storage engine of its own.
3. MySQL is a database that stores data in an organized manner; SQL is used to access,
update, and manipulate the data stored in a database.
4. MySQL runs on many platforms; SQL is a standard language that is not tied to any platform.
5. MySQL has its own dialect with vendor-specific extensions; SQL is the standard syntax
those dialects build on.
Distinguish between global and local variables.
Global variables exist throughout the program and can be used anywhere in the
program. They are not created when a function is called.
Local variables exist only within the function that declares them and can be used only
inside that function. They are created each time the function is called.
What’s an index and how is it used?
An index is a data structure (typically a B-tree) that a database management system uses to
improve the performance of SQL queries. Indexes can be created on columns in a table, and they
are typically used to speed up searches for specific values in those columns. When a query is
executed, the query optimizer checks whether an index exists for the columns being searched;
if one does, the engine can use the index to locate the desired data quickly instead of
scanning every row, which can improve query performance.
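To see the optimizer's choice, here is a minimal sketch assuming SQLite through Python's sqlite3 module; `EXPLAIN QUERY PLAN` is SQLite's command (MySQL uses `EXPLAIN`), and the customers table and idx_customers_city index are hypothetical. The same query is planned as a full scan before the index exists and as an index search afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Asha", "Pune"), ("Ben", "Delhi"), ("Chen", "Pune")],
)

query = "SELECT name FROM customers WHERE city = 'Pune'"

def plan(sql):
    """Join the 'detail' column (the last one) of SQLite's EXPLAIN QUERY PLAN rows."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(query)  # no index on city yet: SQLite scans the whole table
conn.execute("CREATE INDEX idx_customers_city ON customers (city)")
after = plan(query)   # now the planner searches through the index

print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions, but the second plan names the index.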
Explain the different types of indexes in SQL.
There are three types of indexes in SQL:
Unique Index – It does not allow a field to have duplicate values if the column is
unique indexed.
Clustered Index – This index defines the order in which data is physically stored in
a table. It reorders the physical order of the table and searches based on key
values. There can be only one clustered index per table.
Non-Clustered Index – It does not sort the physical order of the table and
maintains a logical order of the data. Each table can have more than one non-
clustered index.
Difference between RDBMS and DBMS
DBMS stores data as files; RDBMS stores data in tabular form.
In a DBMS, data elements need to be accessed individually; in an RDBMS, multiple data
elements can be accessed at the same time.
A DBMS has no relationship between data; in an RDBMS, data is stored in tables that are
related to each other.
Normalization is not present in a DBMS; normalization is present in an RDBMS.
A DBMS does not support distributed databases; an RDBMS supports distributed databases.
A DBMS stores data in either a navigational or hierarchical form; an RDBMS uses a tabular
structure where the headers are the column names and the rows contain the corresponding values.
A DBMS deals with small quantities of data; an RDBMS deals with large amounts of data.
Data redundancy is common in the DBMS model; in an RDBMS, keys and unique indexes help
prevent data redundancy.
A DBMS is used by small organizations that deal with small amounts of data; an RDBMS is used
to handle large amounts of data.
A DBMS supports a single user; an RDBMS supports multiple users.
In a DBMS, data fetching is slower for large amounts of data; in an RDBMS, data fetching is
fast because of the relational approach.
Data in a DBMS has low security levels with regard to data manipulation; an RDBMS has
multiple levels of data security.
A DBMS has low software and hardware requirements; an RDBMS has higher software and
hardware requirements.
DBMS examples: XML, Windows Registry, etc. RDBMS examples: MySQL, PostgreSQL,
SQL Server, Oracle, Microsoft Access, etc.
What is the Difference between DELETE, DROP and TRUNCATE Commands in SQL?
Language – DELETE is a Data Manipulation Language (DML) command. DROP and TRUNCATE
are Data Definition Language (DDL) commands.
Use – The DELETE command deletes one or more existing records from a table in the
database. The DROP command drops the complete table from the database. The TRUNCATE
command deletes all the rows from an existing table, leaving the table structure (its
column names) in place.
Transaction – Rows removed with DELETE can be restored using the ROLLBACK command. In
databases where DDL commits implicitly, such as MySQL and Oracle, a table removed with
DROP, or rows removed with TRUNCATE, cannot be restored using ROLLBACK.
Memory space – The DELETE command does not free the space allocated for the table. The
DROP command removes the space allocated for the table. The TRUNCATE command
deallocates the space used by the removed rows but keeps the table itself.
Performance and speed – DELETE is slower than DROP and TRUNCATE because it deletes one
or more rows based on a specific condition. DROP is faster than DELETE but slower than
TRUNCATE, because it deletes the table from the database after deleting the rows.
TRUNCATE is the fastest, because it deletes all the records from the table without any
condition.
Integrity constraints – They remain the same after DELETE, are removed by DROP, and are
not removed by TRUNCATE.
Permission (SQL Server) – DELETE requires DELETE permission on the table. DROP requires
ALTER permission on the schema to which the table belongs and CONTROL permission on the
table. TRUNCATE requires ALTER permission on the table.
Syntax:
DELETE FROM table_name WHERE condition;
DROP TABLE table_name;
TRUNCATE TABLE table_name;
What is the difference between TRUNCATE and DELETE?
Purpose – DELETE is used to delete specified rows from a table; TRUNCATE is used to
delete all the rows from a table.
Rollback – You can roll back data after using the DELETE statement; after TRUNCATE you
cannot roll back data in systems where it commits implicitly, such as MySQL and Oracle.
Command type – DELETE is a DML command; TRUNCATE is a DDL command.
Speed – DELETE is slower than TRUNCATE; TRUNCATE is faster.
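A minimal sketch of rolling back a DELETE, assuming SQLite through Python's sqlite3 module; the students table is hypothetical. SQLite has no TRUNCATE statement, so only the DELETE side of the comparison is shown.

```python
import sqlite3

# isolation_level=None: Python opens no transactions itself, so the explicit
# BEGIN and ROLLBACK below are exactly what SQLite executes.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?)", [(1, "Asha"), (2, "Ben"), (3, "Chen")])

def count():
    return conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]

conn.execute("BEGIN")
conn.execute("DELETE FROM students WHERE id = 2")  # DML inside an open transaction
after_delete = count()                             # the row is gone inside the transaction
conn.execute("ROLLBACK")                           # undo the DELETE
after_rollback = count()                           # the deleted row is back

print(after_delete, after_rollback)  # 2 3
```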
What is the difference between:
SELECT * FROM MyTable WHERE MyColumn <> NULL
SELECT * FROM MyTable WHERE MyColumn IS NULL
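The first query returns no rows, because NULL means ‘no value’ (unknown): comparing NULL
with a scalar operator such as = or <> yields UNKNOWN rather than TRUE, so no row passes
the WHERE condition. This is why SQL has the separate IS NULL (and IS NOT NULL) predicate;
the second query correctly returns the rows where MyColumn is NULL.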
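A minimal sketch of the two queries, assuming SQLite through Python's sqlite3 module; the rows in MyTable are hypothetical sample data. The `<> NULL` query finds nothing, while `IS NULL` and `IS NOT NULL` behave as intended.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER, MyColumn TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)", [(1, "x"), (2, None), (3, "y")])

# Any comparison with NULL evaluates to NULL (unknown), which WHERE treats as not true.
ne_null = conn.execute("SELECT id FROM MyTable WHERE MyColumn <> NULL").fetchall()
is_null = conn.execute("SELECT id FROM MyTable WHERE MyColumn IS NULL").fetchall()
is_not_null = conn.execute(
    "SELECT id FROM MyTable WHERE MyColumn IS NOT NULL ORDER BY id"
).fetchall()

print(ne_null)      # []
print(is_null)      # [(2,)]
print(is_not_null)  # [(1,), (3,)]
```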
What is the difference between CHAR and VARCHAR?
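CHAR is a fixed-length character data type, while VARCHAR is a variable-length
character data type. A CHAR(n) value always occupies n characters (shorter values are
padded with spaces), whereas a VARCHAR(n) value stores only the characters supplied, up to n.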
What is a subquery in SQL? What are the different types of subquery?
A subquery is a query within another query. When there is a query within a query, the
outer query is called the main query, while the inner query is called a subquery. There
are two types of subquery:
Correlated subquery: It references values from its outer query, so it is evaluated once
for each row processed by the outer query and passes its result back to the outer query.
Non-Correlated subquery: It executes independently of the outer query. The
subquery executes first and then passes its results to the outer query. Both inner
and outer queries can run separately.
For example, consider a customers table with id and city columns.
Suppose we want to find all customers who live in the same city as the customer with
id=1. We could use the following SQL query:
SELECT * FROM customers WHERE city IN (SELECT city FROM customers WHERE id = 1);
This returns all rows from the customers table in that city, including the row with id=1.
To exclude this row, we add a condition to the outer query that checks for
id != 1:
SELECT * FROM customers WHERE city IN (SELECT city FROM customers WHERE id = 1)
AND id != 1;
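A minimal sketch of the corrected query, assuming SQLite through Python's sqlite3 module; the customers rows are hypothetical sample data. The inner SELECT runs once (non-correlated), and the `id != 1` filter sits in the outer query so customer 1 is excluded from the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ben", "Delhi"), (3, "Chen", "Pune"), (4, "Dev", "Pune")],
)

same_city = conn.execute("""
    SELECT id FROM customers
    WHERE city IN (SELECT city FROM customers WHERE id = 1)  -- customer 1's city
      AND id != 1                                            -- drop customer 1 itself
    ORDER BY id
""").fetchall()

print(same_city)  # [(3,), (4,)]
```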
What is a correlated subquery?
A correlated subquery is a type of SQL query that contains a reference to a value
from the outer query. Correlated subqueries are typically used when you want to find
rows from a table that match certain conditions, but you can only know those
conditions after examining other rows in the same table.
For example, you could use a correlated subquery to find all employees who make
more than the average salary in their department. In this case, you would need to
calculate the average salary for each department before you could compare each
employee’s salary to it.
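The department-average example above can be sketched as follows, assuming SQLite through Python's sqlite3 module; the employees rows are hypothetical. The inner query refers to `e.department` from the outer row, which is what makes it correlated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", "IT", 90), ("Ben", "IT", 60), ("Chen", "HR", 50), ("Dev", "HR", 70)],
)

# For each outer row e, the subquery averages salaries in e's own department.
above_avg = conn.execute("""
    SELECT e.name
    FROM employees AS e
    WHERE e.salary > (
        SELECT AVG(i.salary)
        FROM employees AS i
        WHERE i.department = e.department
    )
    ORDER BY e.name
""").fetchall()

print(above_avg)  # [('Asha',), ('Dev',)]  (IT average is 75, HR average is 60)
```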
What is collation sensitivity?
Collation sensitivity defines the rules to sort and compare the strings of character
data, based on correct character sequence, case sensitivity, character width, and
accent marks, among others.
What are the different types of collation sensitivity?
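There are four types of collation sensitivity, which include –
Case Sensitivity: Distinguishes between uppercase and lowercase letters, such as A and a,
or B and b.
Kana Sensitivity: Distinguishes between the two types of Japanese kana characters,
Hiragana and Katakana.
Width Sensitivity: Distinguishes between a single-byte character and the same character
represented as a double-byte character.
Accent Sensitivity: Distinguishes between accented and unaccented characters, such as a and á.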
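Case sensitivity is the easiest of these to demonstrate. A minimal sketch, assuming SQLite through Python's sqlite3 module, where the default BINARY collation is case-sensitive and the built-in NOCASE collation ignores ASCII case; the fruits table is hypothetical. Collation changes both comparisons and sort order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# BINARY (the default) compares bytes, so case matters; NOCASE ignores ASCII case.
binary_eq = conn.execute("SELECT 'Apple' = 'apple'").fetchone()[0]
nocase_eq = conn.execute("SELECT 'Apple' = 'apple' COLLATE NOCASE").fetchone()[0]

conn.execute("CREATE TABLE fruits (name TEXT)")
conn.executemany("INSERT INTO fruits VALUES (?)", [("apple",), ("Banana",), ("cherry",)])
# Under BINARY, uppercase 'B' (66) sorts before lowercase 'a' (97).
binary_order = [r[0] for r in conn.execute("SELECT name FROM fruits ORDER BY name")]
nocase_order = [r[0] for r in conn.execute(
    "SELECT name FROM fruits ORDER BY name COLLATE NOCASE")]

print(binary_eq, nocase_eq)  # 0 1
print(binary_order)          # ['Banana', 'apple', 'cherry']
print(nocase_order)          # ['apple', 'Banana', 'cherry']
```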
What is a “scheduled job” or “scheduled task”?
Scheduled job or task allows automated task management on regular or predictable
cycles. One can schedule administrative tasks and decide the order of the tasks.
Can you name different types of MySQL commands?
SQL commands are divided into the following –
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
Transaction Control Language (TCL)
Explain different DDL commands in MySQL.
DDL commands include –
CREATE – Used to create the database or its objects like table, index, function,
views, triggers, etc.
DROP – Used to delete objects
ALTER – Used to change database structures
TRUNCATE – It erases all records from a table while keeping the table structure
COMMENT – Used to add comments to the data dictionary
RENAME – Used to rename a database object
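A minimal sketch of the main DDL commands, assuming SQLite through Python's sqlite3 module; the staff/employees table is hypothetical. SQLite has no standalone RENAME, TRUNCATE, or COMMENT command, so the rename is done with `ALTER TABLE ... RENAME TO`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def tables():
    """List user tables from SQLite's catalog table, sqlite_master."""
    return [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

conn.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT)")  # CREATE
conn.execute("ALTER TABLE staff ADD COLUMN email TEXT")                 # ALTER: add a column
conn.execute("ALTER TABLE staff RENAME TO employees")                   # rename (SQLite form)

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [r[1] for r in conn.execute("PRAGMA table_info(employees)")]
after_create = tables()
conn.execute("DROP TABLE employees")                                    # DROP
after_drop = tables()

print(columns)       # ['id', 'name', 'email']
print(after_create)  # ['employees']
print(after_drop)    # []
```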
Explain different DML commands in MySQL.
This is one of the most frequently asked SQL interview questions.
DML commands include –
SELECT – Used to select specific database data
INSERT – Used to insert new records into a table
UPDATE – It helps in updating existing records
DELETE – Used to delete existing records from a table
MERGE – Used to perform an UPSERT operation (insert or update)
CALL – It is used when you need to call a PL/SQL or Java subprogram
EXPLAIN PLAN – Used to show the data access path (execution plan) of a statement
LOCK TABLE – Used to control concurrency
Explain different DCL commands in MySQL.
DCL commands are –
GRANT – It provides user access privileges to the database
DENY – Used to deny permissions to users (available in SQL Server, not in MySQL)
REVOKE – Used to withdraw user access privileges given with the GRANT command
Explain different TCL commands in MySQL.
TCL commands include –
COMMIT – Used to commit a transaction
ROLLBACK – Used to roll back a transaction
SAVEPOINT – Used to set a point within a transaction to which you can later roll back
SET TRANSACTION – Used to specify transaction characteristics
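A minimal sketch of COMMIT, ROLLBACK TO, and SAVEPOINT working together, assuming SQLite through Python's sqlite3 module; the orders table and the savepoint name are hypothetical. Rolling back to the savepoint undoes only the second insert, and COMMIT keeps the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # we issue BEGIN/COMMIT ourselves
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO orders (item) VALUES ('pen')")
conn.execute("SAVEPOINT before_second_insert")    # mark a point inside the transaction
conn.execute("INSERT INTO orders (item) VALUES ('book')")
conn.execute("ROLLBACK TO before_second_insert")  # undo only the work after the savepoint
conn.execute("COMMIT")                            # make the remaining work permanent

items = [r[0] for r in conn.execute("SELECT item FROM orders ORDER BY id")]
print(items)  # ['pen']
```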
What are the different types of Database relationships in MySQL? What is
Database Relationship?
A Database Relationship is defined as the connection between two relational database
tables. One table (the child) has a foreign key that references the primary key of the
other table (the parent).
The main types of Database Relationship are –
One-to-one – Each record in one table is related to at most one record in the other table
One-to-many – A single record in the first table can be related to one or more
records in the second table
Many-to-many – Each record in both the tables can be related to any number of
records in the other table
Self-referencing – A record in a table is related to another record in the same table
Is MySQL query case-sensitive?
String comparisons in MySQL are not case-sensitive by default, because the default
collations are case-insensitive. The following queries return the same rows:
SELECT * FROM `table` WHERE `column` = 'value';
SELECT * FROM `table` WHERE `column` = 'VALUE';
SELECT * FROM `table` WHERE `column` = 'VaLuE';
How many TRIGGERS are allowed in the MySQL table?
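MySQL supports six types of trigger, one for each combination of timing and event. Before
MySQL 5.7.2, a table could have at most one trigger of each type (so at most 6 triggers);
later versions allow several triggers of the same type: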
BEFORE INSERT
AFTER INSERT
BEFORE UPDATE
AFTER UPDATE
BEFORE DELETE
AFTER DELETE
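A minimal sketch of an AFTER UPDATE trigger that writes an audit row, assuming SQLite through Python's sqlite3 module; MySQL's CREATE TRIGGER is similar but not identical. The accounts and audit_log tables and the trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE audit_log (account_id INTEGER, old_balance INTEGER, new_balance INTEGER);

    -- AFTER UPDATE trigger: fires once per updated row and records the change.
    CREATE TRIGGER log_balance_change
    AFTER UPDATE OF balance ON accounts
    FOR EACH ROW
    BEGIN
        INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
    END;

    INSERT INTO accounts VALUES (1, 100);
    UPDATE accounts SET balance = 150 WHERE id = 1;
""")

log = conn.execute("SELECT * FROM audit_log").fetchall()
print(log)  # [(1, 100, 150)]
```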
What are the different column comparison operators in MySQL?
The comparison operators in MySQL include =, <> (or !=), <, <=, >, >=, <=> (NULL-safe
equality), and LIKE. They are often combined with the logical operators AND and OR.
Comparison operators are generally used with SELECT statements. They are used to
compare one expression to another value or expression.
What syntax can we use to get a version of MySQL?
Run the following query, for example in phpMyAdmin:
SELECT version();
What is Auto Increment in SQL?
Auto Increment allows a unique number to be generated whenever a new record is
created in a table. Generally, it is the PRIMARY KEY field that we want to be created
automatically every time a new record is inserted.
The AUTO_INCREMENT keyword is used in MySQL, and the IDENTITY keyword is used
in SQL Server.
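A minimal sketch of auto-generated keys, assuming SQLite through Python's sqlite3 module, where the spelling is `INTEGER PRIMARY KEY AUTOINCREMENT`; the students table is hypothetical. The cursor's `lastrowid` attribute reports the key the database generated for each insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite spelling; MySQL uses AUTO_INCREMENT and SQL Server uses IDENTITY(1,1).
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")

ids = []
for name in ("Asha", "Ben", "Chen"):
    cur = conn.execute("INSERT INTO students (name) VALUES (?)", (name,))
    ids.append(cur.lastrowid)  # the key value generated for this row

print(ids)  # [1, 2, 3]
```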
SQL Server runs in which TCP/IP port? Can it be changed?
SQL Server runs on TCP port 1433 by default, and the port can be changed from the Network
Utility TCP/IP properties.
Name symmetric key encryption algorithms supported in SQL Server.
SQL Server supports several symmetric key encryption algorithms, such as DES, Triple
DES, RC2, RC4, 128-bit RC4, DESX, 128-bit AES, 192-bit AES, and 256-bit AES.
What is faster between a table variable and a temporary table?
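Neither is always faster. A table variable is often faster for small amounts of data,
because it causes fewer recompilations and less locking and logging overhead. Contrary to a
common belief, both table variables and temporary tables are stored in tempdb. For larger
data sets a temporary table often performs better, because, unlike a table variable, it has
statistics that the query optimizer can use.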
Mention the command used to get back the privileges offered by the GRANT
command?
REVOKE command is used to get back the privileges offered by the GRANT command.
What is a Clause in SQL?
A clause in SQL is a part of a query that lets users filter or customize the data
returned to them. It lets users limit the result set by providing a
condition to the query. When there is a large amount of data stored in the database,
clauses can be used to query and get only the data required by the user. Clauses
help filter and analyze data quickly.
For Example – WHERE clause, HAVING clause.
Explain the ‘WHERE’ Clause and the ‘HAVING’ Clause.
It is one of the most important SQL interview questions.
The WHERE clause is used to filter the records from a table or while joining
more than one table. It returns only the rows for which the condition in the
WHERE clause is satisfied. It is used with SELECT, INSERT, UPDATE,
and DELETE queries to filter data from the table or relation.
For Example:
SELECT * FROM employees
WHERE working_hour > 9;
The HAVING clause is used to filter groups of records based on the
condition in the HAVING clause. It can only be used with the SELECT statement. It
returns only those groups in the final result that fulfil the condition.
For Example:
SELECT name, SUM(working_hour) AS "Total working hours"
FROM employees GROUP BY name
HAVING SUM(working_hour) > 6;
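A minimal sketch contrasting the two example queries, assuming SQLite through Python's sqlite3 module; the employees rows are hypothetical sample data. WHERE keeps individual shifts over 9 hours, while HAVING keeps only people whose summed hours exceed 6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, working_hour INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Asha", 10), ("Asha", 4), ("Ben", 3), ("Ben", 2), ("Chen", 12)],
)

# WHERE filters individual rows before any grouping.
long_shifts = conn.execute(
    "SELECT name, working_hour FROM employees WHERE working_hour > 9 ORDER BY name"
).fetchall()

# HAVING filters the groups produced by GROUP BY, using the aggregate value.
busy = conn.execute("""
    SELECT name, SUM(working_hour) AS total_hours
    FROM employees
    GROUP BY name
    HAVING SUM(working_hour) > 6
    ORDER BY name
""").fetchall()

print(long_shifts)  # [('Asha', 10), ('Chen', 12)]
print(busy)         # [('Asha', 14), ('Chen', 12)]  (Ben's total of 5 is filtered out)
```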
What is the SELECT statement?
A SELECT statement retrieves zero or more rows from one or more database tables or
views. In most applications, SELECT is the most frequently used data manipulation
language (DML) command. SELECT queries define a result set, but not how to calculate it,
because SQL is a declarative programming language.
What are some common clauses used with SELECT query in SQL?
The following are some frequent SQL clauses used in conjunction with a SELECT
query:
Syntax:
SELECT * FROM myDB.employees;
WHERE clause: In SQL, the WHERE clause is used to filter records that are required
depending on certain criteria.
ORDER BY clause: The ORDER BY clause in SQL is used to sort data in ascending
(ASC) or descending (DESC) order based on the specified field(s).
GROUP BY clause: The GROUP BY clause in SQL is used to group rows that have identical
data and may be used with aggregate functions to obtain summarised database
results.
HAVING clause: The HAVING clause in SQL is used to filter records in combination with
the GROUP BY clause. It is different from WHERE, since the WHERE clause cannot filter
aggregated records.
What are the differences between the ‘WHERE’ Clause and the ‘HAVING’
Clause?
Below are the major differences between the ‘WHERE’ Clause and the ‘HAVING’
Clause:
WHERE performs filtration on individual rows based on the specified condition; HAVING
performs filtration on groups based on the specified condition.
WHERE can be used without a GROUP BY clause; HAVING is usually used with a GROUP BY clause.
WHERE is applied in row operations; HAVING is applied in column (aggregated) operations.
WHERE cannot be used with aggregate functions; HAVING works with aggregate functions.
WHERE comes before GROUP BY; HAVING comes after GROUP BY.
WHERE acts as a pre-filter; HAVING acts as a post-filter.
WHERE can be used with SELECT, INSERT, UPDATE, and DELETE statements; HAVING can only
be used with the SELECT statement.
Syntax of WHERE clause:
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Syntax of HAVING clause:
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);
How to find:
1. duplicate records with one field?
2. duplicate records with more than one field?
Finding duplicate records with one field:
SELECT field, COUNT(field)
FROM table_name
GROUP BY field
HAVING COUNT(field) > 1
Finding duplicate records with more than one field:
SELECT field1,field2,field3, COUNT(*)
FROM table_name
GROUP BY field1,field2,field3
HAVING COUNT(*) > 1
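A minimal sketch of both duplicate-finding queries, assuming SQLite through Python's sqlite3 module; the contacts table stands in for the generic table_name and its rows are hypothetical. One Asha row is repeated exactly, and a third Asha shares only the first name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (first_name TEXT, last_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [
        ("Asha", "Rao", "Pune"),
        ("Asha", "Rao", "Pune"),    # exact duplicate of the row above
        ("Asha", "Iyer", "Delhi"),  # shares only the first name
        ("Ben", "Lee", "Pune"),
    ],
)

# One field: first names that appear more than once.
dup_one = conn.execute("""
    SELECT first_name, COUNT(first_name)
    FROM contacts
    GROUP BY first_name
    HAVING COUNT(first_name) > 1
""").fetchall()

# Several fields: rows repeated across all three columns.
dup_many = conn.execute("""
    SELECT first_name, last_name, city, COUNT(*)
    FROM contacts
    GROUP BY first_name, last_name, city
    HAVING COUNT(*) > 1
""").fetchall()

print(dup_one)   # [('Asha', 3)]
print(dup_many)  # [('Asha', 'Rao', 'Pune', 2)]
```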
What are the authentication modes in SQL Server?
SQL Server has two authentication modes –
Windows Mode – Default. This SQL Server security model is integrated with
Windows
Mixed Mode – Supports authentication both by Windows and by SQL Server
We can change modes in the SQL Server configuration properties, on the Security page.
Follow these steps to change the authentication mode in SQL Server:
Click Start > Programs > Microsoft SQL Server and click SQL Enterprise Manager
to run SQL Enterprise Manager from the Microsoft SQL Server program group.
Then select the server, and from the Tools menu select SQL Server Configuration
Properties.
Choose the Security page.
What is PL/SQL?
PL/SQL or Procedural Language for SQL was developed by Oracle. It is an extension of
SQL and enables the programmer to write code in a procedural format. Both PL/SQL
and SQL run within the same server process and have features like – robustness,
security, and portability of the Oracle Database.
What is SQL Profiler?
SQL Server Profiler is a graphical user interface for creating traces and saving data about
each event to a file or table. It also allows a system administrator to analyze and replay trace
results when a problem is being diagnosed. SQL Server Profiler is used to:
Examine the problem queries to find the cause of the problem
Diagnose slow-running queries
Determine the Transact-SQL statements that lead to a problem
Monitor the performance of the SQL Server
Correlate performance counters to diagnose problems
What is the SQL Server Agent?
SQL Server Agent is a Microsoft Windows service that executes day-to-day tasks or
jobs of SQL Server Database Administrator (DBA). This service enables the
implementation of tasks on a scheduled date and time.
What is Data Integrity?
Data integrity refers to the accuracy, completeness, and consistency of the data
in a database. It also refers to the safety and security of data and is maintained by a
collection of processes, rules, and standards that were implemented during the design
phase. Three types of data integrity are:
Column Integrity
Entity Integrity
Referential Integrity
What is the difference between Rename and Alias?
Rename is actually changing the name of an object. Alias is giving another name
(additional name) to an existing object. Rename involves changing the name of a
database object and giving it a permanent name whereas Alias is a temporary name
given to a database object.
Syntax of a table Alias:
SELECT column1, column2, ...
FROM table_name AS alias_name
WHERE [condition];
Syntax of a table Rename:
RENAME TABLE {tbl_name} TO {new_tbl_name};
Which are the main steps in Data Modeling?
Following are the main steps in Data Modeling:
Identify and analyze business requirements
Create a quality conceptual and logical data model
Select the target database to create scripts for physical schema using a data
modeling tool
What is Referential Integrity?
Referential integrity is a relational database concept that requires that the accuracy
and consistency of data should be maintained between primary and foreign keys.
What is Business Intelligence?
Business intelligence (BI) includes technologies and practices for collecting,
integrating, analyzing, and presenting business information. It combines business
analytics, data mining, data visualization, data tools and infrastructure, and best
practices.
Mention the types of privileges available in SQL?
Following are the types of privileges used in SQL:
System Privilege: It applies to objects of a specific type and allows an action on all of
them, rather than on one object. Examples include ADMIN, which lets users perform
administrative tasks, ALTER ANY CACHE GROUP, and ALTER ANY INDEX.
Object Privilege: It allows users to perform actions on a specific object, such as a table,
view, or index. Object privileges in SQL include EXECUTE, INSERT, SELECT, FLUSH, LOAD,
INDEX, UPDATE, DELETE, REFERENCES, etc.
What is ERD?
ERD or Entity Relationship Diagram is a visual representation of the database
structures and shows a relationship between the tables. The ER Diagrams have three
basic elements:
Entities – An entity is a person, place, thing, or event for which data is collected.
Attributes – It refers to the data we want to collect for an entity. It is a property,
trait, or characteristic of an entity, relationship, or another attribute.
Relationships – It describes how entities interact.
How will you find the unique values, if a value in the column is repeatable?
To find the unique values when the value in the column is repeatable, we can use
DISTINCT in the query, such as:
SELECT DISTINCT user_firstname FROM users;
We can also count the number of distinct values by using:
SELECT COUNT(DISTINCT user_firstname) FROM users;
Explain database white box testing.
White Box Testing is concerned with the internal structure of the database. The users
are unaware of the specification details.
Database white box testing includes testing of database triggers and logical views
that support database refactoring.
Validates database tables, data models, database schema
Performs module testing of database functions and SQL queries
Select default table values to check on database consistency
Adheres to referential integrity rules
Display the students who have the same batch ID and study in the
same department as students 1002 and 1004.
SELECT *
FROM students
WHERE batch_id IN (
SELECT batch_id
FROM students
WHERE student_id IN (1002, 1004)
)
AND department IN (
SELECT department
FROM students
WHERE student_id IN (1002, 1004)
)
AND student_id NOT IN (1002, 1004);
What is the ACID property in SQL?
ACID is short for Atomicity, Consistency, Isolation, Durability. It ensures Data Integrity
during a transaction.
Atomicity: It means either all the operations (insert, update, delete) inside a
transaction take place or none. So, if one part of any transaction fails, the entire
transaction fails and the database state is left unchanged.
Consistency: Consistency ensures that the data must meet all the validation rules.
Whatever happens in the middle of the transaction, the Consistency
property will never leave your database in a half-completed state.
Isolation: It means that every transaction runs independently. One transaction can't see
the intermediate results of another transaction until that transaction is completed.
Durability: It means that once a transaction is committed, its updates must never be lost.
It refers to the ability of the system to recover committed transaction updates if either the
system or the storage media fails.
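Atomicity can be sketched with a money transfer, assuming SQLite through Python's sqlite3 module; the accounts table and transfer helper are hypothetical. A CHECK constraint makes the second UPDATE of an oversized transfer fail, and the whole transaction, including the first UPDATE, is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(src, dst, amount):
    """Hypothetical helper: both UPDATEs succeed together or neither is kept."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()

ok = transfer(1, 2, 30)          # succeeds: balances become 70 and 80
second_ok = transfer(2, 1, 500)  # debit violates CHECK, so the credit is undone as well

print(ok, second_ok, balances())  # True False [(70,), (80,)]
```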
Explain string functions in SQL?
SQL string functions are used for string manipulation.
Following are the extensively used SQL string functions:
UPPER(): Converts character data to upper case
LOWER(): Converts character data to lower case
SUBSTRING(): Extracts characters from a text field
RTRIM(): Removes all whitespace at the end of the string
LEN(): Returns the length of the value in a text field (LENGTH() in MySQL)
REPLACE(): Replaces every occurrence of a substring within a string with another substring
LTRIM(): Removes all whitespace from the beginning of the string
CONCAT(): Combines multiple character strings into one
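A minimal sketch of these functions, assuming SQLite through Python's sqlite3 module. SQLite spells some of them differently: SUBSTR, LENGTH, and the `||` operator instead of CONCAT(). The input strings are arbitrary examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute("""
    SELECT
        UPPER('learners'),                -- 'LEARNERS'
        LOWER('LEARNERS'),                -- 'learners'
        SUBSTR('database', 1, 4),         -- 'data'  (SUBSTRING in SQL Server)
        LTRIM('   sql'),                  -- 'sql'
        RTRIM('sql   '),                  -- 'sql'
        LENGTH('query'),                  -- 5       (LEN in SQL Server)
        REPLACE('2023-01-15', '-', '/'),  -- '2023/01/15'
        'data' || 'base'                  -- 'database' (CONCAT() in MySQL)
""").fetchone()

print(row)
```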
What are the differences between the Primary key and the Unique key?
A primary key enforces column uniqueness in a table; a unique key identifies a row without
being the primary key.
A primary key does not allow NULL values; a unique key accepts one NULL value in SQL Server
(MySQL and most other systems allow several NULLs in a unique column).
A table has only one primary key; a table can have more than one unique key.
In SQL Server, a primary key creates a clustered index by default; a unique key creates a
non-clustered index by default.
Primary key on CREATE TABLE syntax:
CREATE TABLE Students (ID int NOT NULL PRIMARY KEY, LastName varchar(255) NOT NULL,
FirstName varchar(255), Age int);
Unique key on CREATE TABLE syntax:
CREATE TABLE Students (ID int NOT NULL UNIQUE, LastName varchar(255) NOT NULL,
FirstName varchar(255), Age int);
Write the SQL query to convert the string to UPPERCASE and LOWERCASE.
The SQL queries used to convert a string to UPPERCASE and LOWERCASE are:
SELECT UPPER('naukrilearning'); -- returns NAUKRILEARNING
SELECT LOWER('LEARNERS'); -- returns learners
What is the procedure to hide a specific table name of the schema?
By using SYNONYMS, we can hide a specific table name of the schema.
Syntax:
CREATE SYNONYM STU for STUDENTS;
After creating the above synonym, we can access the data of the STUDENTS table
using STU as the table name:
SELECT * from STU;
What is the syntax to eliminate duplicate rows?
By using the DISTINCT keyword, we can eliminate duplicate records.
Syntax:
SELECT DISTINCT CLASS_ID
FROM STUDENTS;
Find out nth highest salary from emp table?
Syntax (Oracle, using ROWNUM; DISTINCT makes tied salaries count once):
select salary from
(select salary, rownum ep from
(select distinct salary from employees
order by salary desc))
where ep = n;
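Databases that support LIMIT and OFFSET (MySQL, PostgreSQL, SQLite) offer a shorter alternative to ROWNUM. A minimal sketch, assuming SQLite through Python's sqlite3 module; the employees rows and the nth_highest_salary helper are hypothetical. Two employees share the top salary, so DISTINCT keeps that tie from being counted twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Asha", 900), ("Ben", 700), ("Chen", 900), ("Dev", 500)],
)

def nth_highest_salary(n):
    """Hypothetical helper: n-th highest distinct salary, or None if fewer than n exist."""
    row = conn.execute(
        "SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET ?",
        (n - 1,),  # skip the n-1 higher salaries
    ).fetchone()
    return row[0] if row else None

print([nth_highest_salary(n) for n in (1, 2, 3, 4)])  # [900, 700, 500, None]
```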
Name the encryption mechanisms in SQL Server.
This is one of the most popular SQL interview questions. The encryption mechanisms
used in SQL Server are:
Transact-SQL functions – Individual items can be encrypted as they are inserted or
updated using Transact-SQL functions.
Asymmetric keys – It is made up of a private key and the corresponding public key.
Each key can decrypt data encrypted by the other.
Symmetric keys – It is used for both encryption and decryption.
Certificates – Also known as a public key certificate, it binds the value of a public
key to the identity of the person, device, or service that holds the corresponding
private key.
Transparent Data Encryption – It is a special case of encryption using a symmetric
key that encrypts an entire database using that symmetric key.
What is the procedure to pass variables in a SQL routine?
In Oracle's SQL*Plus, variables can be passed to a SQL routine by using:
The “&” substitution variable symbol
SQL*Plus commands
Can a view be updated/inserted/deleted? If yes, under what conditions?
Yes, with restrictions. It is not possible to add (insert) data through a view if the view contains any of the following:
Group by clause
Group functions
DISTINCT keyword
Columns defined by expressions
Pseudo column ROWNUM keyword
NOT NULL column in the base table that is not selected by the view.
How can you create an SQL table from another table without copying any
values from the old table?
Syntax:
CREATE TABLE new_table
AS (SELECT *
FROM old_table WHERE 1=0);
This will create a new table with the same structure as the old table with no rows
copied.
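Here is a quick check of the WHERE 1=0 trick in SQLite using Python's sqlite3 module. The table names follow the syntax above and the rows are hypothetical. The parentheses around the SELECT are omitted to match SQLite's CREATE TABLE ... AS grammar. In SQLite, CREATE TABLE ... AS copies column names but not constraints or indexes, so check how your own database behaves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE old_table (id INTEGER, name TEXT)")
conn.execute("INSERT INTO old_table VALUES (1, 'a'), (2, 'b')")

# WHERE 1=0 is never true, so the column list is copied but no rows are.
conn.execute("CREATE TABLE new_table AS SELECT * FROM old_table WHERE 1=0")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [col[1] for col in conn.execute("PRAGMA table_info(new_table)")]
row_count = conn.execute("SELECT COUNT(*) FROM new_table").fetchone()[0]
print(columns, row_count)  # ['id', 'name'] 0
```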
Explain what an inline view is.
An inline view is a SELECT statement in the FROM clause of another SELECT
statement. Inline views simplify complex queries by removing join
operations and combining multiple separate queries into a single query.
Syntax (Oracle; returns the 7th highest salary):
SELECT SALARY FROM
(SELECT SALARY, ROWNUM EP FROM
(SELECT SALARY FROM EMPLOYEES ORDER BY SALARY DESC))
WHERE EP = 7;
Mention the use of the DROP option in the ALTER TABLE command.
The use of the DROP option in the ALTER TABLE command is to drop a particular
COLUMN.
Syntax:
ALTER TABLE TABLE_NAME
DROP COLUMN COLUMN_NAME
What are the aggregate functions in SQL?
SQL aggregate functions return a single value calculated from the
values in a column.
Following are the aggregate functions in SQL:
AVG(): This function returns the average value
COUNT(): This function returns the number of rows
MAX(): It returns the largest value
MIN(): This function returns the smallest value
SUM(): It returns the sum
ROUND() rounds a numeric field to the specified number of decimals and is often used alongside these functions. However, it is a scalar function, not an aggregate function.
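This small SQLite sketch computes the aggregates above side by side on a hypothetical marks table. ROUND() is applied to the result of AVG(), which shows it acting as a scalar function on an aggregate's output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (student TEXT, score REAL)")
conn.executemany("INSERT INTO marks VALUES (?, ?)",
                 [("A", 70), ("B", 85), ("C", 90)])

# Each aggregate collapses the score column to one value;
# ROUND() then formats the single AVG() result to two decimals.
row = conn.execute(
    "SELECT COUNT(*), SUM(score), MIN(score), MAX(score), ROUND(AVG(score), 2) "
    "FROM marks"
).fetchone()
print(row)  # (3, 245.0, 70.0, 90.0, 81.67)
```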
Write the SQL query to update the student names by removing leading and
trailing spaces.
This can be done by using the UPDATE command with the LTRIM and RTRIM functions.
Syntax:
UPDATE StudentDetails
SET FullName = LTRIM(RTRIM(FullName));
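Here is a minimal SQLite run of the same UPDATE. The names are hypothetical and carry extra spaces on either side. The statement itself is unchanged from the syntax above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StudentDetails (FullName TEXT)")
conn.executemany("INSERT INTO StudentDetails VALUES (?)",
                 [("  Asha Rao ",), ("Ravi  ",)])

# RTRIM strips trailing spaces, then LTRIM strips leading ones;
# the space inside "Asha Rao" is left alone.
conn.execute("UPDATE StudentDetails SET FullName = LTRIM(RTRIM(FullName))")
names = [r[0] for r in conn.execute(
    "SELECT FullName FROM StudentDetails ORDER BY rowid")]
print(names)  # ['Asha Rao', 'Ravi']
```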
Write the SQL query to fetch alternate records from a table
Records can be fetched for odd and even row numbers (Oracle syntax, using ROWNUM):
Syntax to fetch even-numbered rows:
SELECT employeeId FROM (SELECT ROWNUM AS rn, employeeId FROM employee)
WHERE MOD(rn, 2) = 0;
Syntax to fetch odd-numbered rows:
SELECT employeeId FROM (SELECT ROWNUM AS rn, employeeId FROM employee)
WHERE MOD(rn, 2) = 1;
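The ROWNUM approach is Oracle-specific. This sketch takes a portable route instead: it numbers the rows with the standard ROW_NUMBER() window function (available in SQLite 3.25 and later) and tests parity with the % operator. The employee rows and the helper alternate_ids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employeeId INTEGER, employeeName TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(101, "A"), (102, "B"), (103, "C"), (104, "D"), (105, "E")])

def alternate_ids(parity):
    """parity=0 returns even-numbered rows; parity=1 returns odd-numbered rows."""
    # ROW_NUMBER() numbers the rows 1, 2, 3, ... in employeeId order;
    # the outer query keeps the rows whose number has the requested parity.
    sql = """
        SELECT employeeId FROM (
            SELECT ROW_NUMBER() OVER (ORDER BY employeeId) AS rn, employeeId
            FROM employee
        ) WHERE rn % 2 = ?
        ORDER BY employeeId
    """
    return [r[0] for r in conn.execute(sql, (parity,))]

print(alternate_ids(0))  # [102, 104]
print(alternate_ids(1))  # [101, 103, 105]
```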
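How do you return a hundred books starting from the 15th?
In MySQL, the syntax will be:
SELECT book_title FROM books LIMIT 14, 100;
The first number in LIMIT is the offset (the number of rows to skip, counted from zero), and the second is the number of rows to return. An offset of 14 skips the first 14 rows, so the result starts at the 15th book. Add an ORDER BY clause so that "the 15th" has a well-defined meaning.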
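You can check the offset behaviour in SQLite, which also accepts the comma form LIMIT offset, count. In this sketch, the books table holds 200 hypothetical titles, and LIMIT 14, 100 returns rows 15 through 114.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, book_title TEXT)")
conn.executemany("INSERT INTO books (book_title) VALUES (?)",
                 [(f"Book {i}",) for i in range(1, 201)])

# Skip the first 14 rows, then return the next 100.
rows = [r[0] for r in conn.execute(
    "SELECT book_title FROM books ORDER BY id LIMIT 14, 100")]
print(len(rows), rows[0], rows[-1])  # 100 Book 15 Book 114
```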
How would you select all teams that lost 1, 3, 5, or 7 games?
We will use the IN operator:
SELECT team_name FROM teams WHERE team_lost IN (1, 3, 5, 7);
How will you delete a column?
We can delete a column with ALTER TABLE ... DROP:
ALTER TABLE techpreparation_answers DROP answer_user_id;
What is the meaning of this query – SELECT user_name, user_isp FROM users
LEFT JOIN isps USING (user_id)?
It is equivalent to:
SELECT user_name, user_isp FROM users LEFT JOIN isps
ON users.user_id = isps.user_id
USING (user_id) is shorthand for joining on the user_id column that both tables share.
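This small SQLite sketch confirms that USING (user_id) and the explicit ON condition return the same rows, including a user with no matching ISP. The users and isps rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, user_name TEXT)")
conn.execute("CREATE TABLE isps (user_id INTEGER, user_isp TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.execute("INSERT INTO isps VALUES (1, 'isp-a')")  # bob has no ISP row

using_rows = conn.execute(
    "SELECT user_name, user_isp FROM users LEFT JOIN isps USING (user_id) "
    "ORDER BY user_name").fetchall()
on_rows = conn.execute(
    "SELECT user_name, user_isp FROM users LEFT JOIN isps "
    "ON users.user_id = isps.user_id ORDER BY user_name").fetchall()

# Both forms keep bob, with NULL (None in Python) for the missing ISP.
print(using_rows == on_rows, using_rows)  # True [('ann', 'isp-a'), ('bob', None)]
```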
How will you see all indexes defined for a table?
By using:
SHOW INDEX FROM techpreparation_questions;
How would you change a table to InnoDB?
By using:
ALTER TABLE techpreparation_questions ENGINE InnoDB;
What is the possible way to add five minutes to a date?
By using:
ADDDATE(techpreparation_publication_date, INTERVAL 5 MINUTE)
How do you convert between Unix timestamps and MySQL timestamps?
MySQL provides two functions:
UNIX_TIMESTAMP converts from MySQL timestamp to Unix timestamp
FROM_UNIXTIME converts from Unix timestamp to MySQL timestamp
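How do you implement Enums and sets internally in MySQL?
To implement an ENUM column, use the given syntax:
CREATE TABLE table_name ( … col ENUM ('value1','value2','value3'), … );
A SET column is declared the same way, with SET ('value1','value2','value3'). Internally, MySQL stores an ENUM value as an integer index into its list of permitted values, and a SET value as a bitmap of the selected members.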
How can we restart SQL Server in the single-user or the minimal
configuration modes?
Running sqlservr.exe from the command line with the -m option restarts SQL Server in
single-user mode.
Running sqlservr.exe from the command line with the -f option restarts it in minimal
configuration mode.
What is the use of the tee command in MySQL?
tee is a Unix command that reads the standard output of another command and writes
it to both the terminal and a file. In the mysql client, tee followed by a filename turns on
logging of the session output to that file, and the notee command stops it.
Is it possible to save your connection settings to a conf file?
Yes. You can store them in an option file named ~/.my.cnf. You can also change the
permissions on the file to 600 so that it is not readable by other users.
How to convert numeric values to character strings?
We can convert numeric values to character strings by using the CAST(value AS
CHAR) function, as shown in the following example:
SELECT CAST(4123.45700 AS CHAR) FROM DUAL;
4123.45700
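How do you use mysqldump to create a copy of the database?
The following command dumps the database mydatabasename from the host mysqlhost into the file dbdump.sql (the -p flag prompts for the password):
mysqldump -h mysqlhost -u username -p mydatabasename > dbdump.sql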
What are federated tables?
Federated tables allow access to the tables situated on other databases on other
servers in MySQL. It lets you access data from a remote MySQL database without
using replication or cluster technology. Querying a local FEDERATED table pulls the
data from the remote (federated) tables. Data is not stored on the local tables.
What are the different groups of data types in MySQL?
There are three groups of data types in MySQL, as listed below:
String Data Types – BINARY, VARBINARY, TINYBLOB, CHAR, NCHAR, VARCHAR,
NVARCHAR, TINYTEXT, BLOB, TEXT, MEDIUMBLOB, LONGBLOB, LONGTEXT, ENUM,
SET, MEDIUMTEXT.
Numeric Data Types – MEDIUMINT, INTEGER, BIGINT, FLOAT, BIT, TINYINT,
BOOLEAN, SMALLINT, DOUBLE, REAL, DECIMAL.
Date and Time Data Types – TIMESTAMP, TIME, DATE, DATETIME, YEAR.
What is the procedure to concatenate two character strings?
To concatenate various character strings into one, you can use the CONCAT()
function. Examples:
SELECT CONCAT('Naukri', ' Learning') FROM DUAL;
Naukri Learning
SELECT CONCAT('Learner', 'Thing') FROM DUAL;
LearnerThing
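SQLite versions before 3.44 have no CONCAT() function, so this sketch uses the standard || operator instead. Oracle and PostgreSQL also support ||. In MySQL, however, || is a logical OR unless the PIPES_AS_CONCAT SQL mode is enabled. The output shows that no separator is inserted between the strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The space in ' Learning' belongs to the second string, so it appears in the result;
# 'Learner' || 'Thing' gets no space at all.
row = conn.execute("SELECT 'Naukri' || ' Learning', 'Learner' || 'Thing'").fetchone()
print(row)  # ('Naukri Learning', 'LearnerThing')
```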
What is the procedure to change the database engine in MySQL?
By using:
ALTER TABLE EnterTableName ENGINE = EnterEngineName;
What is the default storage engine in MySQL?
InnoDB has been the default storage engine in MySQL since version 5.5.
What syntax is used to create an index in MySQL?
By using-
CREATE INDEX [index name] ON [table name]([column name]);
How do you store videos in a SQL Server table?
We can use FILESTREAM storage. It keeps varbinary(max) data, such as videos, as files on the NTFS file system while SQL Server manages them as table data.
Explain the use of the NVL() function.
The NVL() function (Oracle) replaces a NULL value with another value.
What are the different storage engines/table types present in MySQL?
MySQL supports two types of tables: transaction-safe tables (InnoDB and BDB) and
non-transaction-safe tables (HEAP, ISAM, MERGE, and MyISAM).
MyISAM: This was the default table type before MySQL 5.5. It is based on the Indexed Sequential Access
Method (ISAM) and extends the former ISAM storage engine. These tables are
optimized for compression and speed.
HEAP (now called MEMORY): It allows fast data access. However, the data will be lost if there is a crash.
HEAP tables cannot have BLOB or TEXT columns.
BDB: It supports transactions using COMMIT and ROLLBACK. It is slower than the
others, and it was removed in MySQL 5.1.
InnoDB: These tables are ACID-compliant and fully support transactions.
MERGE: Also known as the MRG_MyISAM engine, MERGE is a virtual table that
combines multiple MyISAM tables with an identical structure into one table.
What are the differences between MyISAM and InnoDB?
The following are the differences between MyISAM and InnoDB:
MyISAM | InnoDB
Does not support transactions | Supports transactions
Uses table-level locking | Uses row-level locking
Is not ACID (Atomicity, Consistency, Isolation, and Durability) compliant | Supports ACID properties
Supports FULLTEXT indexes | Supports FULLTEXT indexes only from MySQL 5.6 onward
What drivers are available in MySQL?
Below are the drivers available in MySQL:
PHP Driver
C WRAPPER
ODBC Driver
JDBC Driver
PYTHON Driver
RUBY Driver
PERL Driver
CAP11PHP Driver
Ado.net5.mxj
What is a pattern matching operator in SQL?
The pattern matching operator in SQL allows you to perform a pattern search in data
when you do not know the exact value. Rather than writing the exact
word, this operator uses wildcards to match a string pattern. The LIKE operator is
used with SQL wildcards to get the required information.
The LIKE operator supports two wildcards:
1. % – It matches zero or more characters.
For example- select * from students where studentname like 'a%'
For Example – To search for any employee in the database with the last name
beginning with the letter A
SELECT *
FROM employees
WHERE last_name LIKE 'A%'
2. _ (Underscore) – It matches exactly one character.
For example- select * from student where studentname like 'abc_'
For Example – This example matches only if A appears at the third position of the
last name
SELECT *
FROM employees
WHERE last_name LIKE '__A%'
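You can try both wildcards in SQLite through Python's sqlite3 module. The last names are hypothetical. SQLite's LIKE is case-insensitive for ASCII letters by default, which is why 'Adams' matches '__a%'. The helper names_like is also hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (last_name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?)",
                 [("Adams",), ("Brown",), ("Shah",), ("Khan",), ("Mehra",)])

def names_like(pattern):
    """Return the last names matching a LIKE pattern, sorted."""
    return [r[0] for r in conn.execute(
        "SELECT last_name FROM employees WHERE last_name LIKE ? ORDER BY last_name",
        (pattern,))]

print(names_like("A%"))    # starts with A: ['Adams']
print(names_like("__a%"))  # 'a' in the third position: ['Adams', 'Khan', 'Shah']
print(names_like("%h%"))   # contains 'h' anywhere: ['Khan', 'Mehra', 'Shah']
```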
Here are a few examples:
WHERE EmployeeName LIKE '%r' - Finds matches that end with "r"
WHERE EmployeeName LIKE '%gh%' - Finds matches that include "gh" in any position
WHERE EmployeeName LIKE '_ch%' - Finds matches with "ch" in the second and third positions
WHERE EmployeeName LIKE 'g%r' - Finds matches that start with "g" and end with "r"
Explain the STUFF and REPLACE functions.
The STUFF function deletes a substring of a certain length of a string and replaces it
with a new string. It inserts the string at a given position and deletes the number of
characters specified from the original string.
Syntax:
STUFF (string_expression, start, length, replacement_string)
Parameters:
string_expression: the string to modify.
start: the position (starting at 1) in string_expression where characters are deleted and the new string is inserted.
length: the number of characters to delete.
replacement_string: the new string to insert at the start position.
The REPLACE function replaces all occurrences of a specific string value with
another string.
Syntax:
REPLACE (string_expression, search_string, replacement_string)
Parameters:
string_expression: the main string that contains the substring to be replaced.
search_string: the substring to locate.
replacement_string: the new replacement string.
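SQLite provides REPLACE() but not STUFF(), so this sketch emulates STUFF() with SUBSTR() and the || operator. The helper names stuff and replace_all are hypothetical, and the sample strings are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def stuff(s, start, length, replacement):
    """Emulate STUFF(): delete `length` chars at 1-based `start`, then insert `replacement`."""
    # Keep the text before `start`, add the replacement, then keep the text
    # that follows the deleted characters.
    return conn.execute(
        "SELECT SUBSTR(?, 1, ? - 1) || ? || SUBSTR(?, ? + ?)",
        (s, start, replacement, s, start, length),
    ).fetchone()[0]

def replace_all(s, search, replacement):
    """REPLACE() swaps every occurrence of `search` for `replacement`."""
    return conn.execute("SELECT REPLACE(?, ?, ?)", (s, search, replacement)).fetchone()[0]

print(stuff("abcdef", 2, 3, "ijklmn"))            # aijklmnef
print(replace_all("abcdefghicde", "cde", "xxx"))  # abxxxfghixxx
```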
What is a Database Cursor?
A database cursor is a mechanism that allows for traversal over the records in a
database. Cursors also enable processing during traversal, such as retrieval, addition, and
deletion of database records. A cursor behaves much like an iterator in a programming
language.
How to use a Database Cursor in SQL Procedures
Declare variables.
Declare a cursor that defines a result set. The cursor declaration must always be
associated with a SELECT statement.
Open the cursor to initialize the result set.
Use the FETCH statement to retrieve the current row and move to the next row in the result set.
Close the cursor.
Deallocate the cursor.
What are SQL Scalar functions? Name some.
An SQL scalar function returns a single value based on its input value. Below are some
of the commonly used scalar functions:
SQL Scalar Function | Format | Description
LCASE() | SELECT LCASE(column_name) FROM table_name; | converts the value of a field to lowercase
UCASE() | SELECT UCASE(column_name) FROM table_name; | converts the value of a field to uppercase
LEN() / LENGTH() | SELECT LENGTH(column_name) FROM table_name; | returns the total length of the value in a text field (LEN() in SQL Server, LENGTH() in MySQL)
ROUND() | SELECT ROUND(column_name, decimals) FROM table_name; | rounds a numeric field to the number of decimals specified
NOW() | SELECT NOW() FROM table_name; | returns the current system date and time
FORMAT() | SELECT FORMAT(column_name, format) FROM table_name; | formats how the value of a field is displayed
What is the difference between SQL and PL/SQL?
SQL | PL/SQL
SQL (Structured Query Language) is a query language for databases. | PL/SQL (Procedural Language/Structured Query Language) is a database programming language that uses SQL. It is Oracle's procedural extension of SQL that enhances SQL's capabilities.
It was developed by IBM Corporation and first appeared in 1974. | It was developed by Oracle Corporation in the early 90s.
Data variables are not available. | Data variables are available.
SQL is a declarative language. | PL/SQL is a procedural language.
It is data-oriented. | PL/SQL is application-oriented.
It can execute only a single query at a time. | It can execute a whole block of code at a time.
SQL can directly interact with the database server. | PL/SQL cannot directly interact with the database server.
It can be embedded in PL/SQL. | It cannot be embedded in SQL.
SQL is used to write queries, DDL, and DML statements. | It is used to write program blocks, functions, procedures, triggers, and packages.
SQL acts as the source of data that is to be displayed. | PL/SQL acts as a platform where SQL data will be displayed.
Explain SQL comments.
SQL comments help explain the sections of SQL statements. They also help
prevent SQL statements from being executed. There are three types of comments:
Single line comments: They start and end within a single line. Single line comments
start with --. The text between -- and the end of the line is not executed.
Multi-line comments: These comments start in one line and end in a different one.
Any text between /* and */ will not be executed.
Inline comments: They are an extension of multi-line comments. We can write the
comments between the statements, enclosed within /* and */.
How do you subset or filter data in SQL?
To subset or filter data in SQL, we use WHERE and HAVING clauses. WHERE filters
individual rows before grouping, while HAVING filters groups after aggregate
functions have been calculated.
For example, given a movie table, a WHERE clause finds the records for movies
directed by Brad Bird. A HAVING clause after GROUP BY finds the directors whose
movies have an average duration greater than 115 minutes.
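This sketch shows both filters in SQLite. The movie table's columns and rows are hypothetical, and the director names are placeholders. On real data, you would filter on 'Brad Bird' instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (title TEXT, director TEXT, duration INTEGER)")
# Hypothetical rows, not real film data.
conn.executemany("INSERT INTO movie VALUES (?, ?, ?)", [
    ("Film 1", "Director A", 120),
    ("Film 2", "Director A", 118),
    ("Film 3", "Director B", 95),
    ("Film 4", "Director B", 130),
])

# WHERE filters individual rows before any grouping.
by_director = conn.execute(
    "SELECT title FROM movie WHERE director = 'Director A' ORDER BY title").fetchall()

# HAVING filters the groups after AVG() has been computed per director.
long_avg = conn.execute(
    "SELECT director, AVG(duration) FROM movie "
    "GROUP BY director HAVING AVG(duration) > 115").fetchall()

print(by_director)  # [('Film 1',), ('Film 2',)]
print(long_avg)     # [('Director A', 119.0)]
```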
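Is a query that filters on a column alias in its WHERE clause correct? If not, how will you rectify it?
No. In most databases we cannot use an alias name while filtering data with the WHERE clause, because
WHERE is evaluated before the SELECT list assigns aliases. The query will throw an error.
To rectify it, repeat the original expression in the WHERE clause, or move the query into a subquery and filter on the alias in the outer query.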
How are Union, Intersect, and Except used in SQL? What are UNION, MINUS
and INTERSECT commands?
The UNION operator is used to combine the results of two queries while also removing
duplicate entries.
The MINUS operator is used to return rows from the first query that are not returned by the
second query.
The INTERSECT operator is used to return only the rows that are returned by both queries.
Before running any of the above SQL statements, certain requirements must be
satisfied –
Within the clause, each SELECT query must have the same number of columns.
The data types in the columns must also be compatible.
In each SELECT statement, the columns must be in the same order.
The Union operator combines the output of two or more SELECT statements.
Syntax:
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;
For example, consider two tables, Region 1 and Region 2, with the same columns.
To get the unique records from both tables, we use UNION.
The Intersect operator returns the common records that are the results of 2 or more
SELECT statements.
Syntax:
SELECT column_name(s) FROM table1
INTERSECT
SELECT column_name(s) FROM table2;
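The Except operator returns the records from the first SELECT statement that are not returned by the
second one. EXCEPT is the name used by standard SQL and SQL Server; Oracle traditionally calls it MINUS.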
Syntax:
SELECT column_name(s) FROM table1
EXCEPT
SELECT column_name(s) FROM table2;
Applied to the Region 1 and Region 2 tables, EXCEPT returns the records in Region 1 that do not appear in Region 2.
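The three set operators can be compared on two small hypothetical region tables in SQLite, which supports UNION, INTERSECT, and EXCEPT but not the MINUS keyword. The helper run is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("region1", "region2"):
    conn.execute(f"CREATE TABLE {table} (name TEXT)")
conn.executemany("INSERT INTO region1 VALUES (?)", [("A",), ("B",), ("C",)])
conn.executemany("INSERT INTO region2 VALUES (?)", [("B",), ("C",), ("D",)])

def run(op):
    """Combine the two single-column queries with the given set operator."""
    sql = f"SELECT name FROM region1 {op} SELECT name FROM region2 ORDER BY name"
    return [r[0] for r in conn.execute(sql)]

print(run("UNION"))      # ['A', 'B', 'C', 'D'] (duplicates removed)
print(run("INTERSECT"))  # ['B', 'C'] (in both tables)
print(run("EXCEPT"))     # ['A'] (in region1 but not region2)
```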
Using the product_price table, write an SQL query to find the record with the
fourth-highest market price.
Fig: Product Price table
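First, select the top four rows in descending order of mkt_price:
select top 4 * from product_price order by mkt_price desc;
Then, select the top one from that result in ascending order of mkt_price (SQL Server syntax):
select top 1 * from (select top 4 * from product_price order by mkt_price desc) as t
order by mkt_price asc;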
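SQLite has no TOP clause, so this sketch expresses the same two-step idea with LIMIT. The product_price rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_price (product TEXT, mkt_price REAL)")
conn.executemany("INSERT INTO product_price VALUES (?, ?)", [
    ("p1", 50), ("p2", 300), ("p3", 120), ("p4", 80), ("p5", 200)])

# Inner query: the four highest prices; outer query: the smallest of those four.
fourth = conn.execute("""
    SELECT * FROM (
        SELECT * FROM product_price ORDER BY mkt_price DESC LIMIT 4
    ) ORDER BY mkt_price ASC LIMIT 1
""").fetchone()
print(fourth)  # ('p4', 80.0)
```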
From the product_price table, write an SQL query to find the total and
average market price for each currency where the average market price is
greater than 100, and the currency is in INR or AUD.
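The query filters the rows to INR and AUD with WHERE, groups them by currency, computes SUM() and AVG() of the market price, and uses HAVING to keep only the groups whose average market price exceeds 100.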
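Here is a sketch of such a query in SQLite. The column name currency and the sample rows are assumptions, because the original table is not reproduced here. The mkt_price column follows the previous question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_price (product TEXT, currency TEXT, mkt_price REAL)")
conn.executemany("INSERT INTO product_price VALUES (?, ?, ?)", [
    ("p1", "INR", 150), ("p2", "INR", 90),
    ("p3", "AUD", 60), ("p4", "AUD", 80),
    ("p5", "USD", 500),
])

rows = conn.execute("""
    SELECT currency, SUM(mkt_price) AS total_price, AVG(mkt_price) AS avg_price
    FROM product_price
    WHERE currency IN ('INR', 'AUD')   -- row filter, applied before grouping
    GROUP BY currency
    HAVING AVG(mkt_price) > 100        -- group filter, applied after AVG()
    ORDER BY currency
""").fetchall()
print(rows)  # [('INR', 240.0, 120.0)]
```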
Using the product and sales order detail table, find the products with total
units sold greater than 1.5 million.
Fig: Products table
Fig: Sales order detail table
We can use an inner join to get records from both the tables. We'll join the tables
based on a common key column, i.e., ProductID. Then we group the joined rows by product
and use a HAVING clause to keep only the products whose total units sold exceed 1,500,000.
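Here is a sketch of the join-and-aggregate query in SQLite. The table names products and sales_order_detail, the column OrderQty, and the sample rows are hypothetical. Only ProductID comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (ProductID INTEGER, Name TEXT)")
conn.execute("CREATE TABLE sales_order_detail (ProductID INTEGER, OrderQty INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [(1, "Widget"), (2, "Gadget")])
conn.executemany("INSERT INTO sales_order_detail VALUES (?, ?)", [
    (1, 1_000_000), (1, 600_000),  # Widget: 1.6 million units in total
    (2, 900_000), (2, 500_000),    # Gadget: 1.4 million units in total
])

rows = conn.execute("""
    SELECT p.ProductID, p.Name, SUM(s.OrderQty) AS total_units
    FROM products p
    INNER JOIN sales_order_detail s ON s.ProductID = p.ProductID
    GROUP BY p.ProductID, p.Name
    HAVING SUM(s.OrderQty) > 1500000
""").fetchall()
print(rows)  # [(1, 'Widget', 1600000)]
```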
What is a Stored Procedure? What are its advantages and disadvantages?
A Stored Procedure is a named group of SQL statements that is saved in the database
system. It can be stored for later use and called many
times. If you have to perform a particular task repeatedly, you won't have to write
the statements again; you just call the stored procedure. This saves
time and avoids rewriting code.
Syntax: To create a stored procedure
CREATE PROCEDURE procedure_name
AS
Begin
sql_statement
End;
Syntax: To execute a stored procedure
EXEC procedure_name;
Advantages of Stored Procedure:
Execution becomes fast and efficient as stored procedures are compiled once and
stored in executable form.
A Stored Procedure can be used as modular programming. Once created and
stored, it can be called repeatedly, whenever required.
Maintaining a procedure on a server is easier than maintaining copies on different
client machines.
Better security.
Disadvantages of Stored Procedure:
It can be executed only in the database and utilizes more memory in the database
server.
Any data errors in handling stored procedures are not generated until runtime.
Version control is not supported.
What do you mean by recursive stored procedure?
A recursive stored procedure is a stored procedure that calls itself until it
reaches some boundary condition. Recursion lets programmers reuse the
same set of code any number of times.
What is the default port for SQL Server?
The default TCP port assigned by the Internet Assigned Numbers Authority (IANA) for SQL
Server is 1433.
Name the default port for the MySQL server.
The default port for the MySQL server is 3306.
What do you mean by DBMS? What are its different types?
A Database Management System (DBMS) is a software application that interacts
with the user, applications and the database itself to capture and analyze data. The
data stored in the database can be modified, retrieved and deleted, and can be of any
type like strings, numbers, images etc.
There are mainly 4 types of DBMS, which are Hierarchical, Relational, Network, and
Object-Oriented DBMS.
1. Hierarchical DBMS: As the name suggests, this type of DBMS has a style of
predecessor-successor type of relationship. So, it has a structure similar to that of
a tree, wherein the nodes represent records and the branches of the tree
represent fields.
2. Relational DBMS (RDBMS): This type of DBMS uses a structure that allows the
users to identify and access data in relation to another piece of data in the
database.
3. Network DBMS: This type of DBMS supports many to many relations wherein
multiple member records can be linked.
4. Object-oriented DBMS: This type of DBMS uses small individual software called
objects. Each object contains a piece of data and the instructions for the actions
to be done with the data.
What is Normalization? Explain different types of Normalization with
advantages.
Normalization is the process of organizing data to avoid duplication and
redundancy. There are many successive levels of normalization. These are
called normal forms. Each consecutive normal form depends on the previous one.
The first three normal forms are usually adequate.
1. First Normal Form (1NF) – No repeating groups within rows
2. Second Normal Form (2NF) – Every non-key (supporting) column value is
dependent on the whole primary key.
3. Third Normal Form (3NF) – Dependent solely on the primary key and no other
non-key (supporting) column value.
4. Boyce-Codd Normal Form (BCNF) – BCNF is a stricter version of 3NF. A
table is in BCNF if it is in 3NF and, for every functional dependency X -> Y, X is a
super key of the table.
5. Fourth Normal Form (4NF)
6. Fifth Normal Form (5NF)
Some of the advantages are:
Better Database organization
More Tables with smaller rows
Efficient data access
Greater Flexibility for Queries
Quickly find the information
Easier to implement Security
Allows easy modification
Reduction of redundant and duplicate data
More Compact Database
Ensure Consistent data after modification
What is Denormalization?
Denormalization is a database optimization technique for increasing a database
infrastructure performance by adding redundant data to one or more tables.
Denormalization moves a database from a higher normal form to a lower one.
It introduces redundancy into a table by incorporating data from a related table.
What are the different types of Joins?
The various types of joins used to retrieve data from multiple tables are Inner Join, Left Join,
Right Join, and Full Outer Join.
1. Inner join: Inner Join in MySQL is the most common type of join. It is used to
return all the rows from multiple tables where the join condition is satisfied.
2. Left Join: Left Join in MySQL is used to return all the rows from the left table,
but only the matching rows from the right table where the join condition is
fulfilled.
3. Right Join: Right Join in MySQL is used to return all the rows from the right
table, but only the matching rows from the left table where the join condition is
fulfilled.
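4. Full Join: Full join returns all the records from both tables. Rows are matched where the
join condition is satisfied, and the missing side is filled with NULLs where it is not. Therefore,
it returns all the rows from the left-hand side table and all the rows from the right-hand side
table. MySQL does not support FULL JOIN directly; it is usually emulated with a UNION of a
LEFT JOIN and a RIGHT JOIN.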
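This minimal SQLite sketch contrasts INNER JOIN and LEFT JOIN on hypothetical customers and orders tables. RIGHT and FULL joins are left out because SQLite only supports them from version 3.39.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
# The order for customer 3 has no matching customer, so neither join returns it.
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 30), (1, 20), (3, 99)])

inner = conn.execute("""
    SELECT c.name, o.amount FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    ORDER BY c.name, o.amount""").fetchall()
left = conn.execute("""
    SELECT c.name, o.amount FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.name, o.amount""").fetchall()

print(inner)  # [('Ann', 20), ('Ann', 30)]: only matching rows
print(left)   # [('Ann', 20), ('Ann', 30), ('Bob', None)]: every customer kept
```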
Suppose you have a table of employee details consisting of the columns
(employeeId, employeeName), and you want to fetch alternate records from
the table. How do you think you can perform this task?
You can fetch alternate tuples by using the row number of each tuple. For example, to
display the employeeId of even-numbered records, you can number the rows with
ROW_NUMBER() and then use the MOD function (MySQL 8.0 or PostgreSQL syntax):
SELECT employeeId FROM (SELECT ROW_NUMBER() OVER (ORDER BY employeeId) AS rownumber, employeeId FROM employee) AS e
WHERE MOD(rownumber, 2) = 0;
where 'employee' is the table name.
Similarly, to display the employeeId of odd-numbered records, you can write the
following query:
SELECT employeeId FROM (SELECT ROW_NUMBER() OVER (ORDER BY employeeId) AS rownumber, employeeId FROM employee) AS e
WHERE MOD(rownumber, 2) = 1;
Consider two tables, customers and course_details, that share a Customer_Id column.
Now, write a query to get the list of customers who took the course more
than once on the same day. The customers should be grouped by customer
and course, and the list should be ordered according to the most recent
date.
SELECT
c.Customer_Id,
CustomerName,
Course_Id,
Course_Date,
count(Customer_Course_Id) AS count
FROM customers c JOIN course_details d ON d.Customer_Id = c.Customer_Id
GROUP BY c.Customer_Id,
CustomerName,
Course_Id,
Course_Date
HAVING count(Customer_Course_Id) > 1
ORDER BY Course_Date DESC;
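To sanity-check the query, the sketch below runs it unchanged in SQLite against hypothetical rows. The column types and the sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (Customer_Id INTEGER, CustomerName TEXT)")
conn.execute("""CREATE TABLE course_details (
    Customer_Course_Id INTEGER, Customer_Id INTEGER,
    Course_Id INTEGER, Course_Date TEXT)""")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO course_details VALUES (?, ?, ?, ?)", [
    (1, 1, 10, "2023-01-05"), (2, 1, 10, "2023-01-05"),  # Ann: same course twice on one day
    (3, 2, 10, "2023-01-06"), (4, 2, 11, "2023-01-06"),  # Bob: two different courses
])

rows = conn.execute("""
    SELECT c.Customer_Id, CustomerName, Course_Id, Course_Date,
           COUNT(Customer_Course_Id) AS count
    FROM customers c JOIN course_details d ON d.Customer_Id = c.Customer_Id
    GROUP BY c.Customer_Id, CustomerName, Course_Id, Course_Date
    HAVING COUNT(Customer_Course_Id) > 1
    ORDER BY Course_Date DESC
""").fetchall()
print(rows)  # [(1, 'Ann', 10, '2023-01-05', 2)]
```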
Consider an Employee_Details table with the columns Employee_Id, EmployeeName, Age, Gender, and Shift. The
Shift column has m = Morning Shift and e = Evening Shift. Now, you have to swap
the 'm' and the 'e' values with a single update query.
You can write the below query:
UPDATE Employee_Details SET Shift = CASE Shift WHEN 'm' THEN 'e' WHEN 'e' THEN 'm' ELSE Shift END;
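The CASE expression reads each row's value before that row is updated, so the swap never flips a value twice. Here is a quick SQLite check with hypothetical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee_Details (Employee_Id INTEGER, Shift TEXT)")
conn.executemany("INSERT INTO Employee_Details VALUES (?, ?)",
                 [(1, "m"), (2, "e"), (3, "m")])

# One UPDATE flips every 'm' to 'e' and every 'e' to 'm';
# any other value is left as it is.
conn.execute("""
    UPDATE Employee_Details
    SET Shift = CASE Shift WHEN 'm' THEN 'e' WHEN 'e' THEN 'm' ELSE Shift END
""")
shifts = [r[0] for r in conn.execute(
    "SELECT Shift FROM Employee_Details ORDER BY Employee_Id")]
print(shifts)  # ['e', 'm', 'e']
```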
Write a SQL query to get the third highest salary of an employee from the
Employee_Details table (SQL Server syntax).
SELECT TOP 1 Salary
FROM (
SELECT TOP 3 Salary
FROM Employee_Details
ORDER BY Salary DESC) AS emp
ORDER BY Salary ASC;
What is the usage of the NVL() function?
You may use the NVL function to replace null values with a default value. The function
returns the value of the second parameter if the first parameter is null. If the first
parameter is anything other than null, it is returned unchanged.
This function is used in Oracle, not in SQL Server or MySQL. Instead of the NVL() function,
MySQL has the IFNULL() function and SQL Server has the ISNULL() function.
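SQLite offers both IFNULL() and the standard COALESCE(). This sketch, using a hypothetical emp table, shows the NVL-style substitution with each function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, commission INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [("A", 500), ("B", None)])

# Both functions return the second argument only when the first is NULL.
rows = conn.execute("""
    SELECT name, IFNULL(commission, 0), COALESCE(commission, 0)
    FROM emp ORDER BY name
""").fetchall()
print(rows)  # [('A', 500, 500), ('B', 0, 0)]
```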
What is the difference between NVL and NVL2 functions in SQL?
NVL(exp1, exp2) and NVL2(exp1, exp2, exp3) are functions which check whether the
value of exp1 is null or not.
If we use the NVL(exp1, exp2) function and exp1 is not null, the value of exp1 is
returned; otherwise, the value of exp2 is returned. exp2 must be of the same
data type as exp1 (or implicitly convertible to it).
Similarly, if we use NVL2(exp1, exp2, exp3) function, then if exp1 is not null, exp2 will
be returned, else the value of exp3 will be returned.
How does PROC SQL work?
PROC SQL, the SQL procedure in SAS, processes all the observations at once rather than one at a time. The following
steps occur when a PROC SQL gets executed:
SAS scans each statement in the SQL procedure and checks it for
syntax errors.
The SQL optimizer scans the query inside the statement and decides how the
query should be executed in order to minimize the runtime.
If there are any tables in the FROM clause, they are loaded into the
data engine, where they can then be accessed in memory.
Codes and calculations are executed.
The final table is created in memory.
The final table is sent to the output table described in the SQL statement.
If you are given an unsorted data set, how will you read the last observation
to a new dataset?
We can read the last observation into a new dataset by using the END= option of the SET statement.
For example:
data example.newdataset;
set example.olddataset end=last;
if last;
run;
Here, newdataset is the new data set to be created and olddataset is the existing data
set. last is a temporary variable (initialized to 0) that is set to 1 when the SET
statement reads the last observation.
What are the differences between the sum function and using “+” operator?
The SUM function returns the sum of non-missing arguments, whereas the “+” operator
returns a missing value if any of the arguments are missing. Consider the following
example.
Example:
data exampledata1;
input a b c;
cards;
44 4 4
34 3 4
34 3 4
. 1 2
24 . 4
44 4 .
25 3 1
;
run;
data exampledata2;
set exampledata1;
x = sum(a,b,c);
y = a+b+c;
run;
In the output, the value of y is missing for the 4th, 5th, and 6th observations because we have
used the “+” operator to calculate the value of y.
x y
52 52
41 41
41 41
3 .
28 .
48 .
29 29
Write a query
The questions most commonly associated with the SQL technical screening ask you to
solve a given problem by writing out a query in SQL. You’ll typically be given one or
more tables and asked to write one or more queries to retrieve, edit, or remove data
from those tables.
The difficulty of questions will likely vary based on the company and the role (entry-
level vs. advanced). In general, you should be comfortable writing queries using the
following concepts, statements, and clauses:
Categorization, aggregation, and ratio (CASE, COUNT, or SUM, numerator and
denominator)
Joining two tables (JOIN inner vs. left or right)
Modifying a database (INSERT, UPDATE, and DELETE)
Comparison operators (Less than, greater than, equal to)
Organizing data (ORDER BY, GROUP BY, HAVING)
Subqueries
Forms that query-writing questions may take:
Given a table or tables with a few sample rows,
1. List the three stores with the highest number of customer transactions.
2. Extract employee IDs for all employees who earned a three or higher on their last
performance review.
3. Calculate the average monthly sales by product displayed in descending order.
4. Find and remove duplicates in the table without creating another table.
5. Identify the common records between two tables.
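As an example of the fourth task, this SQLite sketch removes duplicates in place by keeping the row with the lowest rowid in each group. rowid is SQLite-specific; Oracle has ROWID, and other databases typically use ROW_NUMBER() instead. The transactions table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (store TEXT, amount INTEGER)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)", [
    ("S1", 10), ("S1", 10), ("S2", 5), ("S1", 10), ("S2", 7)])

# Keep the first physical copy (lowest rowid) of each (store, amount) pair
# and delete the rest, without creating another table.
conn.execute("""
    DELETE FROM transactions
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM transactions GROUP BY store, amount
    )
""")
rows = conn.execute(
    "SELECT store, amount FROM transactions ORDER BY store, amount").fetchall()
print(rows)  # [('S1', 10), ('S2', 5), ('S2', 7)]
```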
Six-step strategy for your SQL interview
Sometimes the best way to keep nerves calm before an interview is to walk into the
screening with a clear plan of action. No matter what type of query you’re asked to
write, you can use this six-step process to organize your thoughts and guide you to a
solution, even when you’re feeling nervous.
1. Restate the question to make sure you understand what you’re being asked to do.
2. Explore the data by asking questions. What data type is in each column? Do
any columns contain unique data (such as user ID)?
3. Identify the columns you’ll need to solve the problem. This helps you focus on
the data that matters so you’re not distracted by the data that is irrelevant to the
query.
4. Think about what your answer should look like. Are you looking for a single
value or a list? Will the answer be the result of a calculation? If so, should it be a float or
an integer? Do you need to account for this in your code?
5. Write your code one step at a time. It can help to outline your approach first. By
writing down the steps you plan to take, you’ll have a clear outline once you start
writing your query (and you’ll give the interviewer a chance to correct you if there’s an
issue with your approach).
Then code in increments, taking one step of your outline at a time. After you’re happy
with your code for the first step, build onto that code with the second step.
6. Explain your solution as a whole. If there’s a more efficient way you could have
written your code—using subqueries for example—explain that. And remember to
answer the original question.
SQL interview tips for success
In addition to the process above, here are some tips to keep in mind when you’re in
your SQL interview.
Talk through your process out loud. Your interviewer may or may not know SQL
themselves, so be sure to explain the what, how, and why of each step.
Include written comments as to what each step of your query is meant to
accomplish. This can help you keep track of where you are in the problem, and it can
make your code easier to understand. If you’re coding in a live environment, you can
type comments using a double hyphen (--). On a whiteboard, write your comments off to
the side.
Use correct formatting. While your ability to problem solve is more important than
precise syntax, you can avoid confusing the interviewer (and yourself) by keeping your
hand-written code organized.
Embrace the awkwardness. It’s okay if the room is silent while you think through a
problem. As you’re thinking out loud, you may find yourself re-starting sentences with a
better way to explain something. That’s okay too.
Next steps: Preparing for your SQL interview
Following are the SQL interview questions to expect at data analyst interviews based
on queries:
1. Write a query to fetch salary records from a table in descending order.
2. Write a query to fetch the record with the highest salary from the tables.
3. What query will you use to calculate odd and even records in a table?
4. What query would you use to identify which position draws the maximum
salary from the tables?
5. Write a query to find which project associates with the position that pays the
highest salary.
6. Write a query to determine which employee draws the highest salary from the
table.
7. Write a query to identify whether male or female employees make more on
average.
------------------------------------------------------------------------------------------------------------------------
What are some of the most common SQL commands?
Some of the most common SQL commands are CREATE TABLE, INSERT INTO, UPDATE,
DELETE, and SELECT.
CREATE TABLE is used to create a new table in a database.
INSERT INTO is used to insert data into a table.
UPDATE is used to update data in a table.
DELETE is used to delete data from a table.
SELECT is used to select data from a table.
How can SQL be used to analyze data?
SQL provides a number of built-in functions that can be used to perform various types
of data analysis. For example, the COUNT function can be used to counting the
number of records in a table, while the SUM function can be used to calculate the sum
of numeric values in a column. By using these and other SQL functions, data analysts
can quickly and easily perform complex data analysis tasks.
For example, a data analyst interview question might ask you to use SQL to count the number of orders placed on a website each day. The following SQL query returns the total number of orders for each day in the dataset, with the day shown next to its count:
SELECT order_date, COUNT(*) AS "Total Orders"
FROM orders
GROUP BY order_date
ORDER BY order_date
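A runnable sketch of the orders-per-day count, assuming a hypothetical orders table with one row per order and an order_date column (SQLite via Python's sqlite3). Selecting order_date next to COUNT(*) labels each count with its day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT)")
conn.executemany("INSERT INTO orders (order_date) VALUES (?)",
                 [("2018-01-01",), ("2018-01-01",), ("2018-01-02",), ("2018-01-01",)])

# One output row per distinct order_date, with the number of orders on that day
daily = conn.execute("""
    SELECT order_date, COUNT(*) AS "Total Orders"
    FROM orders
    GROUP BY order_date
    ORDER BY order_date""").fetchall()
print(daily)  # [('2018-01-01', 3), ('2018-01-02', 1)]
```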
What are some common errors that occur when writing SQL queries?
One common error that occurs when writing SQL queries is forgetting to include a
WHERE clause. Without a WHERE clause, your query will return all rows from the table
you’re querying, which can make it difficult to find the specific information you’re
looking for. Another common error is using incorrect syntax, which can lead to
unexpected results or errors when your query is executed. Finally, it’s important to
make sure that your SQL queries are properly formatted and easy to read; otherwise,
they may be difficult for others to understand or debug if something goes wrong.
For example, the following SQL query would return all rows from the orders table,
regardless of the order_date:
SELECT *
FROM orders
This would return a very large dataset that would be difficult to work with. To fix this,
we can add a WHERE clause to filter the data by order_date:
SELECT *
FROM orders
WHERE order_date = '2018-01-01'
What’s a SQL join and how is it used?
A SQL join is used to combine data from two or more tables into a single result set. Joins are written with the JOIN keyword, followed by the name of the table to join with and an ON condition that states which columns must match. There are a number of different types of joins, including inner joins, outer joins, and self-joins. Inner joins return only the rows that have matching values in the specified columns of both tables. Outer joins also keep rows with no match: a LEFT JOIN keeps every row of the left table, a RIGHT JOIN keeps every row of the right table, and a FULL OUTER JOIN keeps all rows from both tables, filling in NULLs where there is no match. Self-joins are used to join a table to itself; for example, you could use a self-join to find all customers who live in the same city as another customer.
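A small sketch contrasting an inner join with a left outer join, using hypothetical customers and orders tables in SQLite (via Python's sqlite3). The customer with no orders disappears from the inner join but survives the left join with NULL (None in Python) order columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Jane'), (2, 'Joe'), (3, 'Sarah');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# INNER JOIN: only customers with at least one matching order
inner = conn.execute("""
    SELECT c.name, o.id FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    ORDER BY o.id""").fetchall()

# LEFT JOIN: every customer; order columns are NULL where nothing matches
left = conn.execute("""
    SELECT c.name, o.id FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id""").fetchall()
print(inner)  # [('Jane', 10), ('Jane', 11), ('Joe', 12)]
print(left)   # [('Jane', 10), ('Jane', 11), ('Joe', 12), ('Sarah', None)]
```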
What’s a SQL window function and how is it used?
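A SQL window function performs a calculation across a set of rows that are related to the current row (the window), which is defined with an OVER clause. Unlike aggregate functions used with GROUP BY, which return one result per group, window functions return one result per row. Common window functions include ROW_NUMBER, RANK, DENSE_RANK, and NTILE, and aggregate functions such as SUM and AVG can also be used as window functions.
Window functions are often used with ORDER BY (and optionally PARTITION BY) inside the OVER clause to calculate a value for each row, such as a rank or a running total.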
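A sketch showing that window functions return one value per row: a running total with SUM() OVER and a split into halves with NTILE(2), on a hypothetical orders table in SQLite (version 3.25 or later is needed for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER);
INSERT INTO orders VALUES (1, 50), (2, 20), (3, 30), (4, 100);
""")

rows = conn.execute("""
    SELECT id,
           amount,
           SUM(amount) OVER (ORDER BY id) AS running_total,  -- cumulative sum up to this row
           NTILE(2) OVER (ORDER BY amount DESC) AS half       -- 1 = top half by amount
    FROM orders
    ORDER BY id""").fetchall()
print(rows)  # [(1, 50, 50, 1), (2, 20, 70, 2), (3, 30, 100, 2), (4, 100, 200, 1)]
```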
When would you not want to use a window function in SQL?
Window functions are a type of SQL function that return a value for each row in the
query result, based on values from other rows in the same result. For example, you
could use a window function to calculate the running total of all order totals in your
customer orders table.
You usually do not need a window function when you only want one summary row per group, because a window function keeps every input row. For example, to find the average salary for each department, GROUP BY department with AVG(salary) returns one row per department. AVG(salary) OVER (PARTITION BY department) instead repeats each department's average on every employee row, and AVG(salary) OVER () with no PARTITION BY repeats the company-wide average on every row, which is not a per-department result at all.
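The difference can be seen directly in this sketch, which uses a hypothetical employees table in SQLite (via Python's sqlite3): GROUP BY returns one row per department, while the two window versions return one row per employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
  ('Ana', 'Sales', 50000), ('Ben', 'Sales', 70000),
  ('Cara', 'IT', 90000), ('Dev', 'IT', 110000);
""")

# GROUP BY: one row per department
grouped = conn.execute("""
    SELECT department, AVG(salary) FROM employees
    GROUP BY department ORDER BY department""").fetchall()

# Window with PARTITION BY: every employee row carries its department's average
windowed = conn.execute("""
    SELECT name, AVG(salary) OVER (PARTITION BY department) FROM employees
    ORDER BY name""").fetchall()

# Window with an empty OVER (): every row carries the overall average
overall = conn.execute("""
    SELECT name, AVG(salary) OVER () FROM employees ORDER BY name""").fetchall()
print(grouped, windowed, overall)
```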
What is a View?
A view is a virtual table that presents a subset of the data contained in one or more tables. Because a view stores only its defining query rather than a copy of the data, it takes very little storage space. A view can combine data from one or more tables, depending on how those tables are related.
A view is a virtual table whose contents are obtained from an existing table or tables,
called base tables. The retrieval happens through an SQL statement, incorporated into
the view. So, you can think of a view object as a view into the base table. The view
itself does not contain any data of its own; the data is stored in the base tables, and the view simply shows the data they contain.
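A minimal sketch of a view in SQLite (via Python's sqlite3), with hypothetical table and column names. The view hides the email column, and an update to the base table shows up through the view without any data being copied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT, email TEXT);
INSERT INTO customers VALUES (1, 'Jane', 'Austin', 'jane@example.com'),
                             (2, 'Joe', 'Boston', 'joe@example.com');
-- The view exposes only the non-sensitive columns of the base table
CREATE VIEW customer_directory AS SELECT id, name, city FROM customers;
""")

before = conn.execute("SELECT * FROM customer_directory ORDER BY id").fetchall()
# A change to the base table is immediately visible through the view
conn.execute("UPDATE customers SET city = 'Denver' WHERE id = 2")
after = conn.execute("SELECT * FROM customer_directory ORDER BY id").fetchall()
print(before)  # [(1, 'Jane', 'Austin'), (2, 'Joe', 'Boston')]
print(after)   # [(1, 'Jane', 'Austin'), (2, 'Joe', 'Denver')]
```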
What are Views used for?
A view is a stored query based on one or more tables or other views. It is used for the following reasons:
Restricting access to data.
Making complex queries simple.
Ensuring data independence.
Providing different views of the same data.
What’s the difference between a view and a table?
A view is a virtual table that is based on the results of an SQL query. Views are often
used to provide security or simplify complex queries. For example, you could create a
view that only includes customer information that is relevant to your current project.
Tables, on the other hand, are database structures that actually store data.
As a data analyst, what does the ORDER BY keyword do?
The ORDER BY keyword is used to sort the results of an SQL query in ascending or
descending order. By default, ORDER BY will sort the results in ascending order; to
sort the results in descending order, you can use the DESC keyword.
What’s the difference between a data scientist and data analyst?
A data analyst is responsible for organizing and analyzing data to help companies
make better business decisions. A data analyst might use SQL to query databases and
uncover trends, or build predictive models using Excel or statistical software. A data
analyst typically has a background in mathematics or computer science, and strong
analytical and problem-solving skills.
A data scientist is responsible for extracting insights from data through the use of
techniques such as machine learning and statistical modeling. Data scientists
typically have a background in mathematics, statistics, and computer science, and
are skilled at programming languages such as R and Python.
What are some of the most common SQL functions?
Some of the most common SQL functions are SUM(), AVG(), COUNT(), MIN(), and
MAX(). These functions are used to calculate aggregated values, such as sums,
averages, or counts.
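A short sketch of these aggregate functions used with GROUP BY, on a hypothetical sales table in SQLite (via Python's sqlite3).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);
INSERT INTO sales VALUES ('East', 100), ('East', 300),
                         ('West', 50), ('West', 150), ('West', 100);
""")

# Each aggregate collapses the rows of one region into a single value
rows = conn.execute("""
    SELECT region, SUM(amount), AVG(amount), COUNT(*), MIN(amount), MAX(amount)
    FROM sales
    GROUP BY region
    ORDER BY region""").fetchall()
print(rows)  # [('East', 400, 200.0, 2, 100, 300), ('West', 300, 100.0, 3, 50, 150)]
```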
How have you used SQL to solve a problem?
This is a common SQL interview question that is designed to assess your real-world
experience with the language. When answering this question, be sure to describe a
specific problem that you were able to solve using SQL. This will help to show the
interviewer that you have a good understanding of how SQL can be used in practice.
What’s the difference between a lag and lead function in SQL?
LAG and LEAD are window functions that access data from a preceding or following row of the result set, according to the ORDER BY in their OVER clause. LAG returns data from a row that precedes the current row, while LEAD returns data from a row that follows the current row.
For example, in a customers table ordered by customer_id, if the current row is customer_id 3 (Joe Bloggs), LAG would return customer_id 2 (Jane Doe), while LEAD would return customer_id 4 (Sarah Connor).
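A sketch of the customer example in SQLite (via Python's sqlite3), assuming a customers table that holds just the three customers named in the text. The first row has no previous customer and the last has no next one, so LAG and LEAD return NULL (None) there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO customers VALUES (2, 'Jane Doe'), (3, 'Joe Bloggs'), (4, 'Sarah Connor');
""")

rows = conn.execute("""
    SELECT customer_id,
           LAG(name) OVER (ORDER BY customer_id)  AS previous_customer,
           LEAD(name) OVER (ORDER BY customer_id) AS next_customer
    FROM customers
    ORDER BY customer_id""").fetchall()
print(rows)
# [(2, None, 'Joe Bloggs'), (3, 'Jane Doe', 'Sarah Connor'), (4, 'Joe Bloggs', None)]
```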
Write a query to get the total two-day rolling average for sales by day.
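For example, suppose we have a sales table with a date column and a sales column, where a day may have several sales rows.
If we wanted to get the two-day rolling average of total sales by day, we could use the following SQL query:
SELECT date, SUM(sales) AS daily_sales,
AVG(SUM(sales)) OVER (ORDER BY date ROWS BETWEEN 1 PRECEDING
AND CURRENT ROW) AS "Two-Day Rolling Average"
FROM sales
GROUP BY date
ORDER BY date
The frame ROWS BETWEEN 1 PRECEDING AND CURRENT ROW averages each day's total with the previous day's total; a frame of 1 PRECEDING AND 1 FOLLOWING would span three days. Because the frame counts rows rather than calendar days, the query assumes there is a row for every date.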
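A runnable sketch of a two-day rolling average with made-up data in SQLite (via Python's sqlite3). It first totals sales per day in a CTE and then averages each day with the previous day using ROWS BETWEEN 1 PRECEDING AND CURRENT ROW; a frame of 1 PRECEDING AND 1 FOLLOWING would instead cover three days. Table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (date TEXT, sales INTEGER);
INSERT INTO sales VALUES
  ('2023-01-01', 4), ('2023-01-01', 6),   -- day total 10
  ('2023-01-02', 20),                     -- day total 20
  ('2023-01-03', 15), ('2023-01-03', 25); -- day total 40
""")

rows = conn.execute("""
    WITH daily AS (
        SELECT date, SUM(sales) AS daily_sales FROM sales GROUP BY date
    )
    SELECT date, daily_sales,
           AVG(daily_sales) OVER (ORDER BY date
                                  ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS two_day_avg
    FROM daily
    ORDER BY date""").fetchall()
print(rows)
# [('2023-01-01', 10, 10.0), ('2023-01-02', 20, 15.0), ('2023-01-03', 40, 30.0)]
```

The first day has no previous day, so its average covers only itself.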
How would you teach SQL to a beginner?
If you were teaching SQL to a beginner, you would start by explaining the basics of
databases and how they are used to store data. You would then move on to
explaining the different types of SQL queries and how they are used to retrieve data
from a database. Finally, you would teach them how to use SQL to insert, update, and
delete data from a database.
What are some common date functions in SQL?
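Some common date functions in SQL are listed below (names as in MySQL; other databases offer similar functions that may be named differently):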
CURRENT_DATE: Returns the current date.
CURRENT_TIME: Returns the current time.
CURRENT_TIMESTAMP: Returns the current date and time.
DATE_ADD: Adds a specified number of days, months, or years to a date.
DATE_SUB: Subtracts a specified number of days, months, or years from a date.
DAY: Returns the day of the month for a given date.
MONTH: Returns the month for a given date.
YEAR: Returns the year for a given date.
What are some advanced SQL functions?
There are many advanced SQL functions, but some of the most common are
aggregate functions, window functions, and pivoting.
Aggregate functions are used to calculate a single value from multiple values. For
example, the SUM() function calculates the sum of a column of values, and the AVG()
function calculates the average of a column of values.
Window functions are used to calculate a value for each row in a table based on the
values in other rows in the table. For example, the RANK() function assigns a rank to
each row in a table, and the LAG() function returns the value of a column in a previous
row.
Pivoting is when you rotate data between a long, row-based layout and a wide, columnar layout. For example, you can pivot one row per product and quarter into one row per product with a separate column for each quarter; unpivoting does the reverse.
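One portable way to pivot is conditional aggregation with CASE, sketched below on a hypothetical quarterly_sales table in SQLite (via Python's sqlite3). This is a swapped-in general technique; SQL Server and Oracle also provide a dedicated PIVOT operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE quarterly_sales (product TEXT, quarter TEXT, amount INTEGER);
INSERT INTO quarterly_sales VALUES
  ('A', 'Q1', 10), ('A', 'Q2', 15), ('B', 'Q1', 7), ('B', 'Q2', 9), ('B', 'Q2', 1);
""")

# Long -> wide: one row per product, one column per quarter
wide = conn.execute("""
    SELECT product,
           SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1,
           SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2
    FROM quarterly_sales
    GROUP BY product
    ORDER BY product""").fetchall()
print(wide)  # [('A', 10, 15), ('B', 7, 10)]
```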
What is Self-Join?
A self-join is a join of a table to itself. It is used to compare values in a column with other values in the same column of the same table. Table aliases are used to tell the two copies of the table apart.
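A typical self-join sketch, assuming a hypothetical employees table where manager_id points to another row of the same table; the aliases e and m distinguish the two copies (SQLite via Python's sqlite3).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
INSERT INTO employees VALUES (1, 'Ana', NULL), (2, 'Ben', 1), (3, 'Cara', 1), (4, 'Dev', 2);
""")

# e = the employee row, m = the same table read again as the manager row
pairs = conn.execute("""
    SELECT e.name AS employee, m.name AS manager
    FROM employees e
    JOIN employees m ON e.manager_id = m.id
    ORDER BY e.id""").fetchall()
print(pairs)  # [('Ben', 'Ana'), ('Cara', 'Ana'), ('Dev', 'Ben')]
```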
What is Cross-Join?
A cross join returns the Cartesian product of two tables: every row of the first table is combined with every row of the second table, so the result has as many rows as the product of the two tables' row counts. If a WHERE clause that relates the two tables is added to a cross join, the query behaves like an INNER JOIN.
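A sketch of both points using hypothetical customers and orders tables in SQLite (via Python's sqlite3): the cross join yields 3 x 2 rows, and adding a WHERE condition that links the tables returns the same rows as the INNER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1, 'Jane'), (2, 'Joe'), (3, 'Sarah');
INSERT INTO orders VALUES (10, 1), (11, 2);
""")

# Cartesian product: 3 customers x 2 orders = 6 rows
all_pairs = conn.execute(
    "SELECT c.name, o.id FROM customers c CROSS JOIN orders o").fetchall()

# A WHERE condition relating the tables gives the same rows as an INNER JOIN
cross_where = conn.execute("""
    SELECT c.name, o.id FROM customers c CROSS JOIN orders o
    WHERE o.customer_id = c.id ORDER BY o.id""").fetchall()
inner = conn.execute("""
    SELECT c.name, o.id FROM customers c INNER JOIN orders o
    ON o.customer_id = c.id ORDER BY o.id""").fetchall()
print(len(all_pairs), cross_where == inner)  # 6 True
```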
Mention the various user-defined functions.
In SQL Server, there are three main types of user-defined functions:
Scalar functions.
Inline table-valued functions.
Multi-statement table-valued functions.
What is a Data Warehouse?
A data warehouse is a central repository of data collected from multiple sources. That data is consolidated, transformed, and made available for data mining and online analytical processing. A subset of warehouse data, usually focused on a single business area, is called a data mart.
Explain how you can restore the database in SQL Server?
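First, launch SQL Server Management Studio and connect to the server. In the Object Explorer window, right-click Databases and select Restore Database. Choose the backup source (a database or a backup device), check the destination database and restore options, and click OK to restore the database.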
How do you insert data into a SQL table?
The basic statement is INSERT INTO. If the RDBMS is MySQL, for example, a date value can be inserted as a string in 'YYYY-MM-DD' format:
INSERT INTO tablename (col_name, col_date) VALUES ('DATE: Manual Date', '2021-09-21');
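How do I view tables in SQL?
SHOW TABLES;
In MySQL, this command lists the tables in the current database. It is not the only option: other databases use different commands (for example, \dt in PostgreSQL's psql client), and MySQL, PostgreSQL, and SQL Server also expose table lists through the INFORMATION_SCHEMA.TABLES view.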
What is ETL in SQL?
ETL is a three-step process: Extract, Transform, and Load. First, data is extracted from its sources; this collected data is called raw data. Second, the data is transformed into a clean, consistent form. Finally, the transformed data is loaded into a target system, such as a data warehouse or an analytics tool, where it can be used to find insights.
Explain the procedure to rename column names in SQL Server?
In SQL Server, ALTER TABLE cannot rename a column, so you use the sp_rename system stored procedure instead:
EXEC sp_rename 'table_name.old_column_name', 'new_column_name', 'COLUMN';
What are Nested Triggers?
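A nested trigger is a trigger that fires because of an action performed by another trigger. For example, an INSERT trigger on one table may update a second table, which in turn fires an UPDATE trigger defined on that second table. Both DML triggers (INSERT, UPDATE, and DELETE) and DDL triggers can be nested; SQL Server allows nesting up to 32 levels.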
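SQLite also lets one trigger's action fire another trigger, so nesting can be sketched there (via Python's sqlite3); the tables and trigger names are hypothetical. Inserting an order fires order_placed, whose UPDATE on inventory in turn fires stock_changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, product_id INTEGER, qty INTEGER);
CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, stock INTEGER);
CREATE TABLE audit_log (message TEXT);
INSERT INTO inventory VALUES (1, 100);

-- First trigger: an order reduces stock (an UPDATE on inventory)
CREATE TRIGGER order_placed AFTER INSERT ON orders
BEGIN
  UPDATE inventory SET stock = stock - NEW.qty WHERE product_id = NEW.product_id;
END;

-- Second (nested) trigger: fires because of the UPDATE made by the first trigger
CREATE TRIGGER stock_changed AFTER UPDATE ON inventory
BEGIN
  INSERT INTO audit_log VALUES ('stock for ' || NEW.product_id || ' is now ' || NEW.stock);
END;
""")

conn.execute("INSERT INTO orders (product_id, qty) VALUES (1, 3)")
stock = conn.execute("SELECT stock FROM inventory").fetchone()
log = conn.execute("SELECT message FROM audit_log").fetchall()
print(stock)  # (97,)
print(log)    # [('stock for 1 is now 97',)]
```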
What query will you use to find the current date and time in MySQL, SQL
Server and Oracle?
MySQL:-
SELECT NOW();
SQL Server:-
SELECT getdate();
Oracle:-
SELECT SYSDATE FROM DUAL;
What is a cursor? How do you use a cursor?
A cursor is a database object that lets you process the rows of a query result one row at a time. To use a cursor in SQL Server:
DECLARE the cursor after any variable declarations. The cursor definition must always be associated with a SELECT statement.
OPEN the cursor to execute its SELECT statement and populate the result set. The OPEN statement must be executed before any rows can be fetched.
Use the FETCH command to retrieve a row and move to the next row in the result set.
Use the CLOSE command to close the cursor.
Finally, use the DEALLOCATE command to remove the cursor definition and free up the resources associated with it.
What is OLTP?
OLTP, or online transactional processing, allows huge groups of people to execute
massive amounts of database transactions in real time, usually via the internet. A
database transaction occurs when data in a database is changed, inserted, deleted, or
queried.
What are the differences between OLTP and OLAP?
OLTP stands for online transaction processing, whereas OLAP stands for online
analytical processing. OLTP is an online database modification system, whereas OLAP
is an online database query response system.
What is PostgreSQL?
In 1986, a team led by Computer Science Professor Michael Stonebraker created
PostgreSQL under the name Postgres. It was created to aid developers in the
development of enterprise-level applications by ensuring data integrity and fault
tolerance in systems. PostgreSQL is an enterprise-level, versatile, resilient, open-
source, object-relational database management system that supports variable
workloads and concurrent users. The international developer community has
constantly backed it. PostgreSQL has gained significant popularity among developers
because of its fault-tolerant characteristics.
It’s a very reliable database management system, with more than two decades of
community work to thank for its high levels of resiliency, integrity, and accuracy.
Many online, mobile, geospatial, and analytics applications utilise PostgreSQL as their
primary data storage or data warehouse.
Explain character-manipulation functions and their different types in SQL.
Character-manipulation functions are used to change, extract, and edit character strings. Each function performs its action on the input string (or strings) passed to it and returns the result.
The character manipulation functions in SQL are as follows:
A) CONCAT (joining two or more values): This function is used to join two or more values together. The second string is always appended to the end of the first string.
B) SUBSTR: This function returns a segment of a string, starting at a given position and running for a given number of characters (or to the end of the string if no length is given).
C) LENGTH: This function returns the length of the string as a number, including blank spaces.
D) INSTR: This function returns the numeric position of a character or word within a string.
E) LPAD: This function pads a string on the left with a given character up to a specified length, producing a right-justified value.
F) RPAD: This function pads a string on the right with a given character up to a specified length, producing a left-justified value.
G) TRIM: This function removes the specified characters (spaces by default) from the beginning, end, or both ends of a string.
H) REPLACE: This function replaces all instances of a word or a section of a string (substring) with the other string value specified.
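A quick sketch of several of these functions in SQLite (via Python's sqlite3). SQLite spells concatenation with the || operator (CONCAT was only added in recent versions) and has no built-in LPAD or RPAD, so those two are left out; the input strings are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute("""
    SELECT 'Data' || ' ' || 'Analyst',          -- concatenation (CONCAT in many databases)
           SUBSTR('Data Analyst', 6, 3),        -- 3 characters starting at position 6
           LENGTH('  SQL  '),                   -- counts the blank spaces too
           INSTR('Data Analyst', 'Analyst'),    -- 1-based position of the substring
           TRIM('  SQL  '),                     -- strips leading and trailing spaces
           REPLACE('2021/09/21', '/', '-')      -- replaces every occurrence
""").fetchone()
print(row)  # ('Data Analyst', 'Ana', 7, 6, 'SQL', '2021-09-21')
```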
Write the SQL query to get the third maximum salary of an employee from a
table named employees.
Employee table
employee_name salary
A 24000
C 34000
D 55000
E 75000
F 21000
G 40000
H 50000
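SELECT employee_name, salary FROM (
SELECT employee_name, salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS r
FROM employees) ranked
WHERE r = 3;
To find the nth highest salary, replace 3 with n. (In Oracle's SQL*Plus, writing WHERE r = &n prompts for n as a substitution variable.)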
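A runnable check of the query against the sample employees table above, using SQLite via Python's sqlite3; the nth_highest helper is hypothetical and binds n as a query parameter instead of using &n.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [
    ("A", 24000), ("C", 34000), ("D", 55000), ("E", 75000),
    ("F", 21000), ("G", 40000), ("H", 50000)])

def nth_highest(n):
    # DENSE_RANK gives tied salaries the same rank, so n counts distinct salary levels
    return conn.execute("""
        SELECT employee_name, salary FROM (
            SELECT employee_name, salary,
                   DENSE_RANK() OVER (ORDER BY salary DESC) AS r
            FROM employees) ranked
        WHERE r = ?""", (n,)).fetchall()

print(nth_highest(3))  # [('H', 50000)]
```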
What is the difference between the RANK() and DENSE_RANK() functions?
The RANK() function assigns a rank to each row within its ordered partition of the result set. Rows with equal values receive the same rank, and the next rank skips ahead by the number of tied rows, leaving a gap. If we have three records at rank 4, for example, the next rank assigned is 7.
The DENSE_RANK() function also assigns each row a rank within its partition based on the provided column value, but with no gaps. Rows with equal values receive the same rank, and the next rank is the next consecutive number. If we have three records at rank 4, for example, the next rank assigned is 5.
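A sketch with made-up scores that produces exactly the three-way tie at rank 4 described above (SQLite via Python's sqlite3).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", [
    ("p1", 100), ("p2", 90), ("p3", 80),
    ("p4", 70), ("p5", 70), ("p6", 70),  # three rows tied at rank 4
    ("p7", 60)])

rows = conn.execute("""
    SELECT score,
           RANK()       OVER (ORDER BY score DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
    FROM scores
    ORDER BY score DESC, player""").fetchall()
for r in rows:
    print(r)
# The last row is (60, 7, 5): RANK jumps to 7, DENSE_RANK continues with 5
```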
What is the CASE WHEN in SQL Server?
The CASE statement is used to construct logic in which one column’s value is
determined by the values of other columns.
A SQL Server CASE expression is made up of at least one pair of WHEN and THEN clauses. The WHEN clause specifies the condition to test, and the THEN clause specifies the result to return when that condition is TRUE.
When none of the WHEN conditions is true, the ELSE result is returned (or NULL if there is no ELSE). The END keyword closes the CASE expression.
CASE
    WHEN condition1 THEN result1
    WHEN condition2 THEN result2
    WHEN conditionN THEN resultN
    ELSE result
END
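A small sketch of a CASE expression that buckets hypothetical salaries into bands, in SQLite via Python's sqlite3. The WHEN conditions are checked in order and the first true one wins, so 130000 is labeled high even though it also satisfies the second condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, salary INTEGER);
INSERT INTO employees VALUES ('Ana', 45000), ('Ben', 80000), ('Cara', 130000);
""")

rows = conn.execute("""
    SELECT name,
           CASE
               WHEN salary >= 100000 THEN 'high'
               WHEN salary >= 60000 THEN 'medium'
               ELSE 'low'
           END AS salary_band
    FROM employees
    ORDER BY name""").fetchall()
print(rows)  # [('Ana', 'low'), ('Ben', 'medium'), ('Cara', 'high')]
```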
NoSQL vs SQL
In summary, the following are the five major distinctions between SQL and NoSQL:
1. Relational databases are SQL, while non-relational databases are NoSQL.
2. SQL databases have a specified schema and employ structured query
language. For unstructured data, NoSQL databases use dynamic schemas.
3. SQL databases are typically scaled vertically, while NoSQL databases are typically scaled horizontally.
4. NoSQL databases are document, key-value, graph, or wide-column stores,
whereas SQL databases are table-based.
5. SQL databases excel in multi-row transactions, while NoSQL excels at
unstructured data such as documents and JSON.
What is the difference between NOW() and CURRENT_DATE()?
NOW() returns a constant time that indicates the time at which the statement began to execute. (Within a stored function or trigger, NOW() returns the time at which the function or triggering statement began to execute.)
The simple difference between NOW() and CURRENT_DATE() is that NOW() returns both the current date and time in the format 'YYYY-MM-DD HH:MM:SS', while CURRENT_DATE() returns only the current date in the format 'YYYY-MM-DD'.
What is BLOB and TEXT in MySQL?
BLOB stands for Binary Large Object and is used to store binary data, such as images, movies, audio, and application files, whereas TEXT is used to store large character strings.
BLOB values are treated as byte strings and have no character set. As a result, comparison and sorting are based entirely on the numeric values of the bytes.
TEXT values are treated as character (non-binary) strings. The comparison and sorting of TEXT values depend entirely on the collation of their character set.
What is Database Black Box Testing?
Black Box Testing is a software testing approach that involves testing the functions of
software applications without knowing the internal code structure, implementation
details, or internal routes. Black Box Testing is a type of software testing that focuses
on the input and output of software applications and is totally driven by software
requirements and specifications. Behavioral testing is another name for it.
What are the different types of SQL sandbox?
SQL Sandbox is a secure environment within SQL Server where untrusted
programmes can be run. There are three different types of SQL sandboxes:
1. Safe Access Sandbox: In this environment, a user may perform SQL operations such as creating stored procedures, triggers, and so on, but cannot access memory or create files.
2. Sandbox for External Access: Users can access files without having the
ability to alter memory allocation.
3. Unsafe Access Sandbox: This contains untrustworthy code that allows a user
to access memory.
Where MyISAM table is stored?
Prior to the introduction of MySQL 5.5 in December 2009, MyISAM was the default
storage engine for MySQL relational database management system versions. It’s
based on the older ISAM code, but it comes with a lot of extra features. Each MyISAM
table is split into three files on disc (if it is not partitioned). The file names start with
the table name and end with an extension that indicates the file type. The table
definition is stored in a .frm file; however, this file is not part of the MyISAM engine;
instead, it is part of the server. The data file’s suffix is .MYD (MYData). The index file’s
extension is .MYI (MYIndex). If you lose your index file, you may always restore it by
recreating indexes.
How can you fetch common records from two tables?
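You can fetch common records from two tables using INTERSECT. For example:
SELECT StudentID FROM Student
INTERSECT
SELECT StudentID FROM Exam;
(MySQL supports INTERSECT only from version 8.0.31; in earlier versions, SELECT DISTINCT with a WHERE StudentID IN (SELECT StudentID FROM Exam) subquery gives the same result.)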
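A sketch with hypothetical Student and Exam tables in SQLite (via Python's sqlite3). INTERSECT also removes duplicates, so StudentID 4 appears once even though Exam lists it twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StudentID INTEGER);
CREATE TABLE Exam (StudentID INTEGER);
INSERT INTO Student VALUES (1), (2), (3), (4);
INSERT INTO Exam VALUES (2), (4), (4), (5);
""")

# Rows present in both result sets, without duplicates
common = conn.execute("""
    SELECT StudentID FROM Student
    INTERSECT
    SELECT StudentID FROM Exam
    ORDER BY StudentID""").fetchall()
print(common)  # [(2,), (4,)]
```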
List some case manipulation functions in SQL?
There are three common case manipulation functions in SQL, namely:
LOWER: This function returns the string in lowercase. It takes a string as an argument and returns it converted to lowercase. Syntax:
LOWER('string')
UPPER: This function returns the string in uppercase. It takes a string as an argument and returns it converted to uppercase. Syntax:
UPPER('string')
INITCAP: This function returns the string with the first letter of each word in uppercase and the rest of the letters in lowercase. It is available in Oracle and PostgreSQL, for example, but not in MySQL or SQL Server. Syntax:
INITCAP('string')
Given the tables users and rides, write a query to report the distance
traveled by each user in descending order.
For this question you need to accomplish two things: the first is to figure out the total distance traveled for each user_id, and the second is to order each user_id from greatest to least by the calculated distance traveled.
This question has been asked in Uber data analyst interviews.
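One possible solution, sketched under assumed table layouts: users(id, name) and rides(id, user_id, distance) are hypothetical, since the actual columns are not given here. The LEFT JOIN with COALESCE keeps users who never rode, with a distance of 0 (SQLite via Python's sqlite3).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE rides (id INTEGER PRIMARY KEY, user_id INTEGER, distance REAL);
INSERT INTO users VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cara');
INSERT INTO rides (user_id, distance) VALUES (1, 5.0), (2, 12.5), (1, 3.0), (2, 1.5);
""")

totals = conn.execute("""
    SELECT u.name, COALESCE(SUM(r.distance), 0) AS distance_traveled
    FROM users u
    LEFT JOIN rides r ON r.user_id = u.id   -- LEFT JOIN keeps users with no rides
    GROUP BY u.id, u.name
    ORDER BY distance_traveled DESC""").fetchall()
print(totals)  # [('Ben', 14.0), ('Ana', 8.0), ('Cara', 0)]
```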
Write a query to find all the users that are currently “Excited” and have
never been “Bored” within a campaign.
For this medium SQL problem, assume you work at an advertising firm. You have a
table of users’ impressions of ad campaigns over time. Each user_id from these
campaigns has an attached impression_id, categorized as either “Excited” or “Bored”.
You will need to assess which users are “Excited” by their most recent campaign, and
have never been “Bored” in any past campaign.
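A sketch of one approach, assuming a hypothetical ad_impressions table with user_id, dt, campaign_id, and impression_id columns (SQLite via Python's sqlite3). It keeps users whose most recent impression is Excited and who never appear with a Bored impression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ad_impressions (user_id INTEGER, dt TEXT, campaign_id INTEGER, impression_id TEXT);
INSERT INTO ad_impressions VALUES
  (1, '2023-01-01', 10, 'Excited'), (1, '2023-02-01', 11, 'Excited'),  -- qualifies
  (2, '2023-01-01', 10, 'Bored'),   (2, '2023-02-01', 11, 'Excited'),  -- was Bored once
  (3, '2023-01-01', 10, 'Excited'), (3, '2023-02-01', 11, 'Bored');    -- currently Bored
""")

users = conn.execute("""
    SELECT DISTINCT a.user_id
    FROM ad_impressions a
    WHERE a.impression_id = 'Excited'
      -- the Excited row must be the user's most recent impression
      AND a.dt = (SELECT MAX(b.dt) FROM ad_impressions b WHERE b.user_id = a.user_id)
      -- and the user must never have been Bored
      AND a.user_id NOT IN (SELECT user_id FROM ad_impressions WHERE impression_id = 'Bored')
    ORDER BY a.user_id""").fetchall()
print(users)  # [(1,)]
```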
Write a SQL query to select the second highest salary in the engineering
department.
To answer this question, you need the name of the department to be associated
with each employee in the employees table, to understand which department each
employee is a part of.
The “department_id” field in the employees table is associated with the “id” field in
the departments table. You can call the “department_id” a foreign key because it is a
column that references the primary key of another table, which in this case is the “id”
field in the departments table.
Based on this shared field you can join both tables using INNER JOIN to associate the
department name to their employees.
SELECT salary
FROM employees
INNER JOIN departments
ON employees.department_id = departments.id
With the department name in place you can now look at the employees of the
Engineering team and sort by salary to find the second highest paid.
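A sketch that completes the query, assuming hypothetical data and that the department is stored with the name 'engineering'. DENSE_RANK is used so that two employees tied at the top salary still leave the next distinct salary at rank 2 (SQLite via Python's sqlite3).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (id INTEGER PRIMARY KEY, department_id INTEGER, salary INTEGER);
INSERT INTO departments VALUES (1, 'engineering'), (2, 'sales');
INSERT INTO employees (department_id, salary) VALUES
  (1, 150000), (1, 150000), (1, 120000), (1, 90000), (2, 200000);
""")

second = conn.execute("""
    SELECT salary FROM (
        SELECT e.salary,
               DENSE_RANK() OVER (ORDER BY e.salary DESC) AS salary_rank
        FROM employees e
        INNER JOIN departments d ON e.department_id = d.id
        WHERE d.name = 'engineering') ranked
    WHERE salary_rank = 2
    LIMIT 1""").fetchall()
print(second)  # [(120000,)]
```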
Given a table of bank transactions, write a query to get the last transaction
for each day.
More Context: The table includes the columns: id, transaction_value and created_at
(representing the time for each transaction).
Since our goal in this problem is to pull the last transaction from each day, you want
to group the transactions by the day they occurred and create a chronological
ordering within each day from which you can retrieve the latest transaction.
To accomplish the task of grouping and ordering, create a modified version of the
bank_transactions table with an added column denoting the chronological ordering of
transactions within each day.
To partition by date, you can use an OVER() clause with PARTITION BY. After partitioning, you should use a descending order so that the first entry in each partition is the last transaction chronologically. Here is how that query can be written:
SELECT *, ROW_NUMBER() OVER (PARTITION BY DATE(created_at)
ORDER BY created_at DESC) AS ordered_time
FROM bank_transactions
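To finish the approach, the ranked rows can be wrapped in a subquery and filtered to the first row of each day, as sketched below with hypothetical transactions in SQLite (via Python's sqlite3). DATE() extracts the calendar day in SQLite and MySQL; other databases may need a cast to DATE instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bank_transactions (id INTEGER PRIMARY KEY, transaction_value REAL, created_at TEXT);
INSERT INTO bank_transactions VALUES
  (1, 10.0, '2020-01-01 09:00:00'), (2, 25.0, '2020-01-01 17:30:00'),
  (3, 40.0, '2020-01-02 08:15:00'), (4, 5.0,  '2020-01-02 23:59:00');
""")

last_per_day = conn.execute("""
    SELECT id, transaction_value, created_at FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY DATE(created_at)
                                     ORDER BY created_at DESC) AS ordered_time
        FROM bank_transactions) t
    WHERE ordered_time = 1        -- keep only the latest transaction of each day
    ORDER BY created_at""").fetchall()
print(last_per_day)
# [(2, 25.0, '2020-01-01 17:30:00'), (4, 5.0, '2020-01-02 23:59:00')]
```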
Write a query to debug an error and select the top five most expensive
projects by budget-to-employee ratio.
More context: You are given two tables. A projects table and another that maps
employees to their projects, called employee_projects. In this question however, a
bug exists that is causing duplicate rows in the employee_projects table.
Example:
Input:
projects table
Column      Type
id          INTEGER
title       VARCHAR
state_date  DATETIME
end_date    DATETIME
budget      INTEGER
employee_projects table
Column       Type
project_id   INTEGER
employee_id  INTEGER
Output:
Column               Type
title                VARCHAR
budget_per_employee  FLOAT
This is a good example of a logic-based SQL problem. Although there are a few steps
to the solution, the actual SQL queries are fairly simple. HINT: One way to do the
debugging is to simply group by columns project_id and employee_id. By grouping by
both columns, you are creating a table that provides distinct values on project_id and
employee_id, thereby excluding any duplicates.
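A sketch of the full approach with hypothetical data in SQLite (via Python's sqlite3): a CTE removes the duplicate (project_id, employee_id) rows by grouping on both columns, as the hint suggests, and the budget is cast to REAL so the division is not integer division. Without the deduplication, project Alpha would appear to have three employees instead of two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (id INTEGER PRIMARY KEY, title TEXT, state_date TEXT,
                       end_date TEXT, budget INTEGER);
CREATE TABLE employee_projects (project_id INTEGER, employee_id INTEGER);
INSERT INTO projects (id, title, budget) VALUES (1, 'Alpha', 100000), (2, 'Beta', 90000);
-- (1, 10) is duplicated, mimicking the bug described above
INSERT INTO employee_projects VALUES (1, 10), (1, 10), (1, 11), (2, 12), (2, 13), (2, 14);
""")

top = conn.execute("""
    WITH deduped AS (            -- one row per (project_id, employee_id) pair
        SELECT project_id, employee_id
        FROM employee_projects
        GROUP BY project_id, employee_id
    )
    SELECT p.title,
           CAST(p.budget AS REAL) / COUNT(d.employee_id) AS budget_per_employee
    FROM projects p
    JOIN deduped d ON d.project_id = p.id
    GROUP BY p.id, p.title, p.budget
    ORDER BY budget_per_employee DESC
    LIMIT 5""").fetchall()
print(top)  # [('Alpha', 50000.0), ('Beta', 30000.0)]
```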
You have a table that represents the total number of messages sent
between two users by date on Facebook Messenger. Answer these
questions:
What are some insights that could be derived from this table?
What do you think the distribution of the number of conversations created by each
user per day looks like?
Write a query to get the distribution of the number of conversations created by each
user by day in 2020.
This question tests your data sense, as well as your SQL writing skills. It has also
appeared in Facebook data analyst interviews.
To answer the first part of the question regarding insights there are a number of
metrics you could evaluate. You could find the total number of messages sent per
day, number of conversations being started, or the average number of messages per
conversation. All of these metrics seek to measure users' level of engagement and connectivity.
Write a SQL query to create a histogram of the number of comments per
user in the month of January 2020.
This intermediate SQL question has been asked in Amazon data analyst interviews.
Here is a partial answer from Interview Query:
What does a histogram represent, and what kind of story does it tell? In this case you
are interested in using a histogram to represent the distribution of comments each
user has made in January 2020. A histogram with bin buckets of size one means that
you can avoid the logical overhead of grouping frequencies into specific intervals.
For example, if you wanted a histogram with bins of size five, you would have to run a SELECT statement like so:
SELECT
CASE WHEN frequency BETWEEN 0 AND 5 THEN 5
WHEN frequency BETWEEN 6 AND 10 THEN 10
-- ...and so on for the remaining bins
END AS bin
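A sketch of the size-one histogram, assuming hypothetical users and comments tables (SQLite via Python's sqlite3). It also assumes that users with no January comments should appear in the zero bin, which is why the date filter sits in the LEFT JOIN's ON clause rather than in a WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE comments (user_id INTEGER, created_at TEXT);
INSERT INTO users VALUES (1), (2), (3), (4);
INSERT INTO comments VALUES
  (1, '2020-01-05'), (1, '2020-01-20'),   -- user 1: 2 comments in January
  (2, '2020-01-07'), (2, '2020-02-01'),   -- user 2: 1 (the February one is excluded)
  (3, '2020-01-09');                      -- user 3: 1; user 4 has none
""")

histogram = conn.execute("""
    WITH per_user AS (
        SELECT u.id, COUNT(c.user_id) AS frequency   -- COUNT(column) skips NULLs, giving 0
        FROM users u
        LEFT JOIN comments c
          ON c.user_id = u.id
         AND c.created_at >= '2020-01-01' AND c.created_at < '2020-02-01'
        GROUP BY u.id
    )
    SELECT frequency, COUNT(*) AS num_users
    FROM per_user
    GROUP BY frequency
    ORDER BY frequency""").fetchall()
print(histogram)  # [(0, 1), (1, 2), (2, 1)]
```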
Select the largest three departments with ten or more employees and rank
them according to the percentage of employees making over $100,000.
In this problem, you are given two tables: An employees table and
a departments table.
Example:
Input:
employees table
Columns        Type
id             INTEGER
first_name     VARCHAR
last_name      VARCHAR
salary         INTEGER
department_id  INTEGER
departments table
Columns  Type
id       INTEGER
name     VARCHAR
Output:
Column                Type
percentage_over_100k  FLOAT
department_name       VARCHAR
number of employees   INTEGER
First, break down the question to understand what it’s asking. Specifically, you break
the question down into three clauses of conditions:
Top three departments by employee count.
% of employees making over $100,000 a year.
Department must have at least ten employees.
From here, think about how you would associate employees with their department,
calculate and display the percentage of employees making over $100,000 a year, and
order those results to provide an answer to the original question.
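One way to put the three conditions together, sketched with generated data in SQLite (via Python's sqlite3). The headcounts and salaries are made up, and number_of_employees is an assumed name for the number of employees output column. The CTE filters to departments with at least ten employees and keeps the three largest; the outer query ranks them by percentage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE employees (id INTEGER PRIMARY KEY, first_name TEXT,
                last_name TEXT, salary INTEGER, department_id INTEGER)""")
conn.executemany("INSERT INTO departments VALUES (?, ?)",
                 [(1, "Engineering"), (2, "Sales"), (3, "Support"), (4, "Legal")])

# Hypothetical headcounts: (department_id, employees, how many earn over $100,000)
for dept_id, headcount, over_100k in [(1, 12, 6), (2, 11, 2), (3, 10, 9), (4, 5, 5)]:
    for i in range(headcount):
        salary = 150000 if i < over_100k else 80000
        conn.execute("INSERT INTO employees (salary, department_id) VALUES (?, ?)",
                     (salary, dept_id))

rows = conn.execute("""
    WITH dept_stats AS (
        SELECT d.name AS department_name,
               COUNT(*) AS number_of_employees,
               AVG(CASE WHEN e.salary > 100000 THEN 100.0 ELSE 0 END) AS percentage_over_100k
        FROM employees e
        JOIN departments d ON e.department_id = d.id
        GROUP BY d.id, d.name
        HAVING COUNT(*) >= 10                 -- at least ten employees
        ORDER BY number_of_employees DESC     -- largest departments first
        LIMIT 3
    )
    SELECT percentage_over_100k, department_name, number_of_employees
    FROM dept_stats
    ORDER BY percentage_over_100k DESC""").fetchall()
for r in rows:
    print(r)
```

Legal is excluded by the ten-employee rule even though all of its employees earn over $100,000.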
Given a table of students and their SAT test scores, write a query to return
the two students with the closest test scores by score difference.
Given that this problem references one table with only two columns, you have to compare the table with another copy of itself (a self-join). It is helpful to think about this problem in the form of two different tables with the same values. There are two parts to this question:
The first part is comparing each combination of students and their SAT scores.
The second part is figuring out which two students' scores are the closest.
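A sketch of the self-join, assuming a hypothetical scores table with student and score columns (SQLite via Python's sqlite3). The condition s1.student < s2.student compares each pair once and never pairs a student with themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (student TEXT, score INTEGER);
INSERT INTO scores VALUES ('Ana', 1200), ('Ben', 1450), ('Cara', 1230), ('Dev', 1500);
""")

closest = conn.execute("""
    SELECT s1.student AS one_student,
           s2.student AS other_student,
           ABS(s1.score - s2.score) AS score_diff
    FROM scores s1
    JOIN scores s2 ON s1.student < s2.student   -- each pair once, no self-pairs
    ORDER BY score_diff, one_student, other_student
    LIMIT 1""").fetchall()
print(closest)  # [('Ana', 'Cara', 30)]
```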
Write a query to support or disprove the hypothesis: Clickthrough Rate
(CTR) is dependent on search rating.
This question provides a table that represents search results on Facebook, including a
query, a position, and a human rating.
Write a query to get the number of customers that were upsold by
purchasing additional products.
For this problem, you are given a table of product purchases. Each row in the table
represents an individual product purchase.
Note: If the customer purchased two things on the same day, that does not count as
an upsell as they were purchased within a similar timeframe. We are looking for a
customer returning on a different date to purchase a product.
This question is a little tricky because you have to note the dates that each user
purchased products. You can’t just group by the user_id and look where the number
of products purchased is greater than one, because of the upsell condition.
You have to group by both the date field and the user_id to get each transaction
broken out by day and user:
SELECT
user_id
, DATE(created_at) AS date
FROM transactions
GROUP BY 1,2
The query above will now give us a user_id and date field for each row. If there exists
a duplicate user_id then you know that the user purchased on multiple days, which
satisfies the upsell condition. What comes next?
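One way to finish, sketched with hypothetical data in SQLite (via Python's sqlite3): instead of looking for duplicate user_ids in the grouped result, count the distinct purchase dates per user and keep users with more than one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (id INTEGER PRIMARY KEY, user_id INTEGER,
                           created_at TEXT, product_id INTEGER);
INSERT INTO transactions (user_id, created_at, product_id) VALUES
  (1, '2020-01-01 10:00:00', 100), (1, '2020-01-01 15:00:00', 101),  -- same day only
  (2, '2020-01-01 09:00:00', 100), (2, '2020-01-03 09:00:00', 102),  -- returned later
  (3, '2020-01-02 12:00:00', 101);
""")

upsold = conn.execute("""
    SELECT COUNT(*) AS num_of_upsold_customers
    FROM (
        SELECT user_id
        FROM transactions
        GROUP BY user_id
        HAVING COUNT(DISTINCT DATE(created_at)) > 1   -- purchases on more than one day
    ) AS upsold_users""").fetchone()[0]
print(upsold)  # 1
```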
Given the transactions table below, write a query that finds the third
purchase of every user.
Note: Sort the results by the user_id in ascending order. If a user purchases two
products at the same time, the lower id field is used to determine which is the first
purchase.
Example:
Input:
transactions table
Columns     Type
id          INTEGER
user_id     INTEGER
created_at  DATETIME
product_id  INTEGER
quantity    INTEGER
Output:
Columns     Type
user_id     INTEGER
created_at  DATETIME
product_id  INTEGER
quantity    INTEGER
Here is a helpful hint for this question: You need an indicator of which purchase was
the third by a specific user. Whenever you are thinking of ranking a dataset, it is
helpful to then immediately think of a specific window function you can use. You need
to apply the RANK function to the transactions table. The RANK function is a window
function that assigns a rank to each row in the partition of the result set.
SELECT *, RANK() OVER (
PARTITION BY user_id ORDER BY created_at ASC
) AS rank_value
FROM transactions
LIMIT 100
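A sketch of the complete query on hypothetical rows (SQLite via Python's sqlite3). Adding id to the ORDER BY applies the note's tie-breaking rule, so the lower id counts as the earlier purchase; since (created_at, id) is unique, RANK behaves like ROW_NUMBER here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (id INTEGER PRIMARY KEY, user_id INTEGER,
                           created_at TEXT, product_id INTEGER, quantity INTEGER);
INSERT INTO transactions VALUES
  (1, 1, '2020-01-01 10:00:00', 100, 1),
  (2, 1, '2020-01-02 10:00:00', 101, 2),
  (4, 1, '2020-01-03 10:00:00', 103, 1),
  (3, 1, '2020-01-03 10:00:00', 102, 5),   -- same time as id 4; lower id counts first
  (5, 2, '2020-01-01 08:00:00', 100, 1),
  (6, 2, '2020-01-05 08:00:00', 104, 3);   -- user 2 has no third purchase
""")

third = conn.execute("""
    SELECT user_id, created_at, product_id, quantity FROM (
        SELECT *, RANK() OVER (PARTITION BY user_id
                               ORDER BY created_at ASC, id ASC) AS rank_value
        FROM transactions) ranked
    WHERE rank_value = 3
    ORDER BY user_id""").fetchall()
print(third)  # [(1, '2020-01-03 10:00:00', 102, 5)]
```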
Write a query to retrieve the number of users who have posted each of their
job listings only once, and the number of users who have posted at least
one job multiple times.
Write a query to get the top three highest employee salaries by department.
For this problem, you are given an employees and a departments table.
Note: If the department contains fewer than three employees, the top two or top one highest salaries should be listed.
Here’s a hint: You need to order the salaries by department. A window function is
useful here. Window functions enable calculations within a certain partition of rows. In
this case, the RANK() function would be useful. What would you put in the PARTITION
BY and ORDER BY clauses?
Your window function can look something like:
RANK() OVER (PARTITION BY id ORDER BY metric DESC) AS ranks
Note: When you substitute for the actual id and metric fields, make sure the
substitutes are relevant to the question asked and aligned to the data provided to
you.
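A sketch applying that pattern with hypothetical data in SQLite (via Python's sqlite3), partitioning by department_id and ordering by salary. A department with fewer than three employees simply returns the rows it has; with RANK, ties could return more than three rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (id INTEGER PRIMARY KEY, first_name TEXT,
                        salary INTEGER, department_id INTEGER);
INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Legal');
INSERT INTO employees (first_name, salary, department_id) VALUES
  ('Ana', 150000, 1), ('Ben', 130000, 1), ('Cara', 120000, 1), ('Dev', 90000, 1),
  ('Eve', 110000, 2);   -- Legal has only one employee
""")

top3 = conn.execute("""
    SELECT department_name, first_name, salary FROM (
        SELECT d.name AS department_name, e.first_name, e.salary,
               RANK() OVER (PARTITION BY e.department_id
                            ORDER BY e.salary DESC) AS ranks
        FROM employees e
        JOIN departments d ON e.department_id = d.id) ranked
    WHERE ranks <= 3
    ORDER BY department_name, salary DESC""").fetchall()
print(top3)
# [('Engineering', 'Ana', 150000), ('Engineering', 'Ben', 130000),
#  ('Engineering', 'Cara', 120000), ('Legal', 'Eve', 110000)]
```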
Write a query to find the number of non-purchased seats for each flight.
In this Robinhood data analyst question, assume you work for a small airline and you are given three tables: flights, planes, and flight_purchases.
To get the number of unsold seats per flight, you need to get each flight's total number of seats available and the total seats sold.
The question states that the flight_purchases table does not have entries for flights or seats that do not exist, so joining purchases to flights never loses a purchase. However, a flight with no purchases at all has no rows in flight_purchases, so an inner join on all three tables would drop it; a LEFT JOIN from flights to flight_purchases keeps such flights with zero seats sold.
To calculate the number of seats sold per flight, you use GROUP BY on the flight_id together with COUNT() on seat_id. You then subtract the total seats sold from the total number of seats on the flight to find how many seats remained unsold.
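A sketch under assumed table layouts, since the actual columns are not given: flights(id, plane_id), planes(id, number_of_seats), and flight_purchases(flight_id, seat_id) are hypothetical. A LEFT JOIN to flight_purchases keeps flight 102, which sold no seats, and COUNT(seat_id) returns 0 for it (SQLite via Python's sqlite3).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE planes (id INTEGER PRIMARY KEY, number_of_seats INTEGER);
CREATE TABLE flights (id INTEGER PRIMARY KEY, plane_id INTEGER);
CREATE TABLE flight_purchases (flight_id INTEGER, seat_id INTEGER);
INSERT INTO planes VALUES (1, 4), (2, 2);
INSERT INTO flights VALUES (100, 1), (101, 2), (102, 1);
INSERT INTO flight_purchases VALUES (100, 1), (100, 2), (101, 1);  -- flight 102 sold nothing
""")

unsold = conn.execute("""
    SELECT f.id AS flight_id,
           p.number_of_seats - COUNT(fp.seat_id) AS unsold_seats  -- COUNT skips NULLs
    FROM flights f
    JOIN planes p ON f.plane_id = p.id
    LEFT JOIN flight_purchases fp ON fp.flight_id = f.id
    GROUP BY f.id, p.number_of_seats
    ORDER BY f.id""").fetchall()
print(unsold)  # [(100, 2), (101, 1), (102, 4)]
```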
Given a transactions table with date timestamps, sample every fourth row
ordered by date.
Here’s a hint for this question to get you started: If you are sampling from this table
and you want to specifically sample every fourth value, you will probably have to use
a window function.
A general rule of thumb to follow is that when a question states or asks for some Nth
value (like the third purchase of each customer or the tenth notification sent) then a
window function is usually a good option. Window functions allow us to use the RANK() or
ROW_NUMBER() function to provide a numerical index based on a certain ordering.
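A sketch with ten hypothetical daily transactions in SQLite (via Python's sqlite3): ROW_NUMBER numbers the rows by date, and the modulo filter keeps every fourth one. ROW_NUMBER is used rather than RANK so that rows with identical timestamps still get distinct numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany("INSERT INTO transactions (created_at) VALUES (?)",
                 [(f"2020-01-{day:02d}",) for day in range(1, 11)])  # 10 daily rows

sampled = conn.execute("""
    SELECT created_at FROM (
        SELECT created_at,
               ROW_NUMBER() OVER (ORDER BY created_at) AS row_num
        FROM transactions) numbered
    WHERE row_num % 4 = 0        -- rows 4, 8, 12, ...
    ORDER BY created_at""").fetchall()
print(sampled)  # [('2020-01-04',), ('2020-01-08',)]
```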
Write a query that returns all of the neighborhoods that have zero users.
More context: You are given two tables: a users table with demographic information and the neighborhood each user lives in, and a neighborhoods table.
This is an intermediate SQL problem, but the query itself is short. The task is to find all the neighborhoods without users; in other words, the neighborhoods that do not have a single user living in them. This means you need a join that keeps every row of the neighborhoods table even when no user matches it (a LEFT JOIN from neighborhoods to users), so that neighborhoods with a user count of zero show up with NULL user columns.
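The LEFT JOIN approach can be sketched as follows. The column names (neighborhoods.id, neighborhoods.name, users.neighborhood_id) are assumptions, since the question does not give the schema.

```python
import sqlite3

# Hypothetical schema: users.neighborhood_id references neighborhoods.id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE neighborhoods (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, neighborhood_id INTEGER);
INSERT INTO neighborhoods VALUES (1, 'Mission'), (2, 'SoMa'), (3, 'Sunset');
INSERT INTO users VALUES (100, 1), (101, 1), (102, 3);
""")

# The LEFT JOIN keeps every neighborhood; those with no matching user
# get NULL in the user columns, which the WHERE clause picks out.
query = """
SELECT n.name
FROM neighborhoods AS n
LEFT JOIN users AS u ON u.neighborhood_id = n.id
WHERE u.id IS NULL
ORDER BY n.name
"""
empty_neighborhoods = [row[0] for row in conn.execute(query)]
print(empty_neighborhoods)  # ['SoMa']
```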
Hard SQL Questions for Data Analysts
Advanced SQL questions are common for mid- and senior-level data analyst jobs. They require you to write advanced SQL queries or work through complex logic-based case studies. The two types of questions are:
Advanced SQL writing - Writing queries to debug code, using indexes to tune SQL queries, and using advanced SQL clauses.
Logic-based questions - More challenging analytics case studies, or queries that first require you to solve a logic-based problem.
An online marketplace company has introduced a new feature that allows
potential buyers and sellers to conduct audio chats with each other prior to
transacting. Answer the following questions:
How would you measure the success of this new feature?
Write a query that shows whether the feature is successful.
Write a query to get the total three-day rolling average for deposits by day.
For this question, you are given a table of bank transactions with three columns: user_id, a transaction value (positive for a deposit, negative for a withdrawal), and created_at, the time of each transaction.
Here's a hint: Usually, a problem that asks for a moving/rolling average provides a table with two columns: date and value. This problem goes one step further: it provides a table of raw transactions, so you first need to keep the deposits (positive values) and remove the withdrawals (negative values, e.g. -10).
You also need the total deposit amount (sum of all deposits) for each day, since these daily totals form the numerator of the rolling three-day average:
rolling three-day avg for day 3 = (day 3 + day 2 + day 1) / 3
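A sketch of the two steps in SQLite: first total the deposits per day, then average each day with the two preceding rows. The table name bank_transactions and the column name transaction_value are hypothetical stand-ins for the columns described above.

```python
import sqlite3

# Hypothetical bank_transactions table: positive values are deposits,
# negative values are withdrawals.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bank_transactions (user_id INTEGER, transaction_value REAL, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO bank_transactions VALUES (?, ?, ?)",
    [
        (1, 10, "2023-01-01 09:00:00"),
        (2, 20, "2023-01-01 12:00:00"),
        (1, -5, "2023-01-01 13:00:00"),  # withdrawal, filtered out below
        (3, 30, "2023-01-02 10:00:00"),
        (2, 60, "2023-01-03 08:00:00"),
        (1, 40, "2023-01-04 15:00:00"),
    ],
)

query = """
WITH daily_deposits AS (
    -- Step 1: keep deposits only and total them per calendar day.
    SELECT DATE(created_at) AS dt, SUM(transaction_value) AS deposits
    FROM bank_transactions
    WHERE transaction_value > 0
    GROUP BY DATE(created_at)
)
-- Step 2: average each day with the two preceding rows.
SELECT dt,
       AVG(deposits) OVER (
           ORDER BY dt ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS rolling_three_day_avg
FROM daily_deposits
ORDER BY dt
"""
rolling = conn.execute(query).fetchall()
for row in rolling:
    print(row)
```

Two caveats: the first two days average over fewer than three rows, and a ROWS frame counts rows rather than calendar days, so this assumes every day has at least one deposit. If days can be missing, join to a full calendar of dates first.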
Write a SQL query that creates a cumulative distribution of the number of comments per user. Assume bins with class intervals of one.
To solve this cumulative distribution practice problem, you are given two tables: a users table and a comments table.
Example output:
frequency   cumulative
0           10
1           25
2           27
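One way to build this is sketched below, assuming hypothetical columns users.id and comments.user_id: count comments per user (a LEFT JOIN keeps users with zero comments), group those counts into buckets of one, then take a running SUM() over the buckets. Buckets that no user falls into will not appear unless you generate them separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO users (id) VALUES (1), (2), (3), (4), (5);
INSERT INTO comments (user_id) VALUES (2), (3), (4), (4);
""")

query = """
WITH per_user AS (
    -- LEFT JOIN keeps users with zero comments (COUNT of NULLs is 0).
    SELECT u.id, COUNT(c.user_id) AS frequency
    FROM users AS u
    LEFT JOIN comments AS c ON c.user_id = u.id
    GROUP BY u.id
),
histogram AS (
    -- One bucket per comment count (class interval of one).
    SELECT frequency, COUNT(*) AS num_users
    FROM per_user
    GROUP BY frequency
)
-- Running total of users up to and including each bucket.
SELECT frequency,
       SUM(num_users) OVER (ORDER BY frequency) AS cumulative
FROM histogram
ORDER BY frequency
"""
distribution = conn.execute(query).fetchall()
print(distribution)  # [(0, 2), (1, 4), (2, 5)]
```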
Write a query to display a graph to understand how unsubscribes are
affecting login rates over time.
For this question, assume that you work at Twitter. Twitter wants to roll out more
push notifications to users because they think users are missing out on good content.
Twitter decides to do this in an A/B test. After you release more push notifications,
you suddenly see the total number of unsubscribes increase. How would you visually
represent this growth in unsubscribes and its effect on login rates?
You are given a table of user experiences representing each person’s past
employment history. Answer the following:
Write a query to prove or disprove this hypothesis: Data scientists who switch jobs more frequently become managers faster than data scientists who stay at one job for longer.
For this question, you are interested in analyzing the career paths of data scientists.
Let’s say that the titles you care about are bucketed into data scientist, senior data
scientist, and data science manager.
Here’s a partial solution for this question:
This question requires a bit of creative problem solving to work out how you can prove or disprove the hypothesis. The hypothesis is that data scientists who switch jobs more often get promoted faster. To test it, segment the data scientists by how often they have switched jobs during their careers.
For example, if you looked at the number of job switches for data scientists who have been in their field for five years, the data would support the hypothesis if the share of data science managers increased with the number of times they had switched jobs:
Never switched jobs: 10% are managers
Switched jobs once: 20% are managers
Switched jobs twice: 30% are managers
Switched jobs three times: 40% are managers
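A minimal sketch of the segmentation step, under several assumptions: a hypothetical user_experiences table with one row per position (user_id, company, title, start_date), job switches counted as the number of distinct companies minus one (so returning to a former employer is not counted), and no restriction yet on years in the field. It produces the share of managers per number of switches, the same shape as the breakdown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_experiences (user_id INTEGER, company TEXT, title TEXT, start_date TEXT);
INSERT INTO user_experiences VALUES
    (1, 'Acme',     'data scientist',        '2016-01-01'),
    (1, 'Acme',     'senior data scientist', '2019-01-01'),
    (2, 'Acme',     'data scientist',        '2016-01-01'),
    (2, 'Globex',   'data science manager',  '2018-06-01'),
    (3, 'Initech',  'data scientist',        '2017-01-01'),
    (3, 'Umbrella', 'senior data scientist', '2019-01-01'),
    (4, 'Acme',     'data scientist',        '2015-01-01'),
    (4, 'Globex',   'senior data scientist', '2017-01-01'),
    (4, 'Initech',  'data science manager',  '2019-01-01');
""")

query = """
WITH per_user AS (
    -- Switches approximated as distinct employers minus one;
    -- is_manager is 1 if the user ever held the manager title.
    SELECT user_id,
           COUNT(DISTINCT company) - 1 AS job_switches,
           MAX(CASE WHEN title = 'data science manager' THEN 1 ELSE 0 END) AS is_manager
    FROM user_experiences
    GROUP BY user_id
)
SELECT job_switches,
       100.0 * AVG(is_manager) AS pct_managers,
       COUNT(*) AS num_users
FROM per_user
GROUP BY job_switches
ORDER BY job_switches
"""
segments = conn.execute(query).fetchall()
print(segments)  # [(0, 0.0, 1), (1, 50.0, 2), (2, 100.0, 1)]
```

To mirror the five-year example, you would first restrict per_user to users whose earliest start_date falls in the same period.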
Query Questions
1. How to remove duplicate rows in SQL?
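The following SQL Server query numbers the rows within each ID and deletes every row after the first, so exactly one copy of each ID remains:
WITH numbered AS (
  SELECT ID,
    ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) AS row_num
  FROM table_name
)
DELETE FROM numbered
WHERE row_num > 1;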
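SQLite does not let you DELETE from a CTE, but every ordinary (rowid) SQLite table has a hidden rowid that distinguishes otherwise identical rows. The sketch below, using a hypothetical employees table, keeps the first physical copy of each id and deletes the rest.

```python
import sqlite3

# Hypothetical employees table with no primary key, so duplicate ids can exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER, name TEXT);
INSERT INTO employees VALUES
    (1, 'Ana'), (1, 'Ana'), (2, 'Ben'), (3, 'Cy'), (3, 'Cy'), (3, 'Cy');
""")

# Keep the row with the smallest rowid for each id; delete every other copy.
conn.execute("""
DELETE FROM employees
WHERE rowid NOT IN (
    SELECT MIN(rowid) FROM employees GROUP BY id
)
""")
remaining = conn.execute("SELECT id, name FROM employees ORDER BY id").fetchall()
print(remaining)  # [(1, 'Ana'), (2, 'Ben'), (3, 'Cy')]
```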
2. Write a SQL query to find the employees whose names begin with ‘A’.
SELECT * FROM Table_name WHERE EmpName LIKE 'A%';
3. Write a SQL query to get the third-highest salary of an employee from employee_table?
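In SQL Server, take the top three distinct salaries in descending order, then pick the smallest of them:
SELECT TOP 1 salary
FROM (
  SELECT DISTINCT TOP 3 salary
  FROM employee_table
  ORDER BY salary DESC
) AS emp
ORDER BY salary ASC;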
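For engines without TOP, such as SQLite, PostgreSQL, or MySQL 8, a DENSE_RANK() window function gives the same answer and treats duplicate salaries as a single value. A sketch with hypothetical sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_table (name TEXT, salary INTEGER);
INSERT INTO employee_table VALUES
    ('Ana', 100), ('Ben', 90), ('Cy', 90), ('Dee', 80), ('Eli', 70);
""")

# DENSE_RANK gives tied salaries the same rank without gaps,
# so rank 3 is the third-highest distinct salary.
query = """
SELECT DISTINCT salary
FROM (
    SELECT salary,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
    FROM employee_table
) AS ranked
WHERE salary_rank = 3
"""
third_highest = conn.execute(query).fetchone()[0]
print(third_highest)  # 80
```

An equivalent non-window form is SELECT DISTINCT salary FROM employee_table ORDER BY salary DESC LIMIT 1 OFFSET 2.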
4. Given a table of candidates and their skills, you're tasked with finding the candidates best suited
for an open Data Science job. You want to find candidates who are proficient in Python, Tableau,
and PostgreSQL. Write a SQL query to list the candidates who possess all of the required skills for
the job. Sort the output by candidate ID in ascending order.
SELECT candidate_id
FROM candidates
WHERE skill IN ('Python', 'Tableau', 'PostgreSQL')
GROUP BY candidate_id
HAVING COUNT(skill) = 3
ORDER BY candidate_id;
5. Assume you are given the tables below about Facebook pages and page likes. Write a query to
return the page IDs of all the Facebook pages that don't have any likes. The output should be in
ascending order.
SELECT pages.page_id
FROM pages
LEFT OUTER JOIN page_likes AS likes
  ON pages.page_id = likes.page_id
WHERE likes.page_id IS NULL
ORDER BY pages.page_id;
6. Given a table of bank deposits and withdrawals, return the final balance for each account.
SELECT account_id,
  SUM(CASE WHEN transaction_type = 'Deposit' THEN amount ELSE -amount END) AS final_balance
FROM transactions
GROUP BY account_id;
7. Assume you are given the table below containing information on user reviews. Write a query to obtain the number and percentage of businesses that are top rated. A top-rated business is defined as one whose reviews contain only 4 or 5 stars. Output the number of top-rated businesses and their percentage of all businesses, rounded to the nearest integer.
SELECT COUNT(*) AS business_count,
  ROUND(100.0 * COUNT(*) / (SELECT COUNT(DISTINCT business_id) FROM reviews), 0) AS top_rated_pct
FROM (
  SELECT business_id
  FROM reviews
  GROUP BY business_id
  HAVING MIN(review_stars) >= 4
) AS top_rated;
8. Tesla is investigating bottlenecks in their production, and they need your help to extract the
relevant data. Write a SQL query that determines which parts have begun the assembly process
but are not yet finished.
SELECT part
FROM parts_assembly
WHERE finish_date IS NULL
GROUP BY part;
9. Assume that you are given the table below containing information on viewership by device type
(where the three types are laptop, tablet, and phone). Define “mobile” as the sum of tablet and phone
viewership numbers. Write a query to compare the viewership on laptops versus mobile devices.
Output the total viewership for laptop and mobile devices in the format of "laptop_views" and
"mobile_views".
SELECT
  SUM(CASE WHEN device_type = 'laptop' THEN 1 ELSE 0 END) AS laptop_views,
  SUM(CASE WHEN device_type IN ('tablet', 'phone') THEN 1 ELSE 0 END) AS mobile_views
FROM viewership;
10. Visa is trying to analyze its Apple Pay partnership. Calculate the total transaction volume for each
merchant where the transaction was performed via Apple Pay.
Output the merchant ID and the total transactions by merchant. For merchants with no
Apple Pay transactions, output their total transaction volume as 0.
Display the result in descending order of transaction volume.
SELECT merchant_id,
  SUM(CASE WHEN LOWER(payment_method) = 'apple pay' THEN transaction_amount ELSE 0 END) AS volume
FROM transactions
GROUP BY merchant_id
ORDER BY volume DESC;
11. Assume you are given the table below that shows job postings for all companies on the LinkedIn
platform. Write a query to get the number of companies that have posted duplicate job listings
(two jobs at the same company with the same title and description).
WITH jobs_grouped AS (
  SELECT company_id, title, description, COUNT(job_id) AS job_count
  FROM job_listings
  GROUP BY company_id, title, description
)
SELECT COUNT(DISTINCT company_id) AS duplicate_companies
FROM jobs_grouped
WHERE job_count > 1;
12. Given the reviews table, write a query to get the average stars for each product every month.
The output should include the month in numerical value, product id, and average star rating
rounded to two decimal places. Sort the output based on month followed by the product id.
SELECT EXTRACT(MONTH FROM submit_date) AS mth,
  product_id,
  ROUND(AVG(stars), 2) AS avg_stars
FROM reviews
GROUP BY EXTRACT(MONTH FROM submit_date), product_id
ORDER BY mth, product_id;
13. Assume that you are given the table below containing information on various orders made by eBay
customers. Write a query to obtain the user IDs and number of products purchased by the top 3
customers; these customers must have spent at least $1,000 in total. Output the user id and number
of products in descending order. To break ties (i.e., if 2 customers both bought 10 products), the user
who spent more should take precedence.
SELECT user_id, COUNT(product_id) AS product_num
FROM user_transactions
GROUP BY user_id
HAVING SUM(spend) >= 1000
ORDER BY product_num DESC, SUM(spend) DESC
LIMIT 3;
14. The LinkedIn Creator team is looking for power creators who use their personal profile as a
company or influencer page. If someone's LinkedIn page has more followers than the company
they work for, we can safely assume that person is a power creator.
Write a query to return the IDs of these LinkedIn power creators ordered by the IDs.
SELECT profiles.profile_id
FROM personal_profiles AS profiles
INNER JOIN company_pages AS pages
  ON profiles.employer_id = pages.company_id
WHERE profiles.followers > pages.followers
ORDER BY profiles.profile_id;
15. Assume you are given the table below containing tweet data. Write a query to obtain a histogram of
tweets posted per user in 2022. Output the tweet count per user as the bucket, and then the number
of Twitter users who fall into that bucket.
SELECT tweets_num AS tweet_bucket, COUNT(user_id) AS users_num
FROM (
  SELECT user_id, COUNT(tweet_id) AS tweets_num
  FROM tweets
  WHERE tweet_date >= '2022-01-01' AND tweet_date < '2023-01-01'
  GROUP BY user_id
) AS total_tweets
GROUP BY tweets_num;
16. Assume you have an events table on app analytics. Write a query to get the click-through rate
(CTR %) per app in 2022. Output the results in percentages rounded to 2 decimal places.
SELECT app_id,
  ROUND(100.0 *
    SUM(CASE WHEN event_type = 'click' THEN 1 ELSE 0 END) /
    SUM(CASE WHEN event_type = 'impression' THEN 1 ELSE 0 END), 2) AS ctr_rate
FROM events
WHERE timestamp >= '2022-01-01' AND timestamp < '2023-01-01'
GROUP BY app_id;
17. Google marketing managers are analyzing the performance of various advertising accounts over the
last month. They need your help to gather the relevant data. Write a query to calculate the return on
ad spend (ROAS) for each advertiser across all ad campaigns. Round your answer to 2 decimal
places, and order your output by the advertiser_id.
SELECT advertiser_id,
  ROUND(SUM(revenue) / SUM(spend), 2) AS ROAS
FROM ad_campaigns
GROUP BY advertiser_id
ORDER BY advertiser_id;
Hint: ROAS = Ad Revenue / Ad Spend
18. You are given the tables below containing information on Robinhood trades and users. Write a
query to list the top three cities that have the most completed trade orders in descending order.
Output the city and number of orders.
SELECT users.city, COUNT(trades.order_id) AS total_orders
FROM trades
INNER JOIN users
  ON trades.user_id = users.user_id
WHERE trades.status = 'Completed'
GROUP BY users.city
ORDER BY total_orders DESC
LIMIT 3;
19. Microsoft Azure's capacity planning team wants to understand how much data its customers are
using, and how much spare capacity is left in each of its data centers. You’re given three tables:
customers, datacenters, and forecasted_demand. Write a query to find the total monthly unused server capacity for each data center. Output the data center id in ascending order and the total spare capacity.
SELECT centers.datacenter_id,
  centers.monthly_capacity - demands.total_demand AS spare_capacity
FROM (
  SELECT datacenter_id, SUM(monthly_demand) AS total_demand
  FROM forecasted_demand
  GROUP BY datacenter_id
) AS demands
INNER JOIN datacenters AS centers
  ON demands.datacenter_id = centers.datacenter_id
ORDER BY centers.datacenter_id;
20. You are given a table of PayPal payments showing the payer, the recipient, and the amount paid. A
two-way unique relationship is established when two people send money back and forth. Write a
query to find the number of two-way unique relationships in this data.
SELECT COUNT(payer_id) / 2 AS unique_relationships
FROM (
  SELECT payer_id, recipient_id FROM payments
  INTERSECT
  SELECT recipient_id, payer_id FROM payments
) AS relationships;
21. Given a table of Facebook posts, for each user who posted at least twice in 2021, write a query to find the number of days between the user's first and last post of 2021. Output the user and the number of days between their first and last post.
SELECT user_id,
  MAX(post_date) - MIN(post_date) AS days_between
FROM posts
WHERE DATE_PART('year', post_date) = 2021
GROUP BY user_id
HAVING COUNT(post_id) > 1;
22. When you log in to your retailer client's database, you notice that their product catalog data is full of
gaps in the category column. Can you write a SQL query that returns the product catalog with the
missing data filled in?
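WITH fill_products AS (
  -- COUNT ignores NULLs, so the running count only increases at rows that have a category;
  -- each filled row and the missing rows after it share one category_group.
  SELECT product_id, category, name,
    COUNT(category) OVER (ORDER BY product_id) AS category_group
  FROM products
)
SELECT product_id,
  FIRST_VALUE(category) OVER (PARTITION BY category_group ORDER BY product_id) AS category,
  name
FROM fill_products;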
23. Assume you are given the table below containing information on user session activity. Write a query that ranks users according to their total session durations (in minutes) in descending order for each session type between the start date (2022-01-01) and end date (2022-02-01). Output the user id, session type, and the ranking of the total session duration.
SELECT user_id, session_type,
  RANK() OVER (PARTITION BY session_type ORDER BY total_duration DESC) AS ranking
FROM (
  SELECT user_id, session_type, SUM(duration) AS total_duration
  FROM sessions
  WHERE start_date >= '2022-01-01' AND start_date <= '2022-02-01'
  GROUP BY user_id, session_type
) AS user_duration
ORDER BY session_type, ranking;
24. Assume you are given the table below containing the information on the searches attempted and the
percentage of invalid searches by country. Write a query to obtain the percentage of invalid search
results. Output the country (in ascending order), the total number of searches, and the percentage of invalid searches rounded to 2 decimal places.
WITH invalid_results AS (
  SELECT country, num_search, invalid_result_pct,
    CASE WHEN invalid_result_pct IS NOT NULL THEN num_search ELSE NULL END AS num_search_2,
    ROUND((num_search * invalid_result_pct) / 100.0, 0) AS invalid_search
  FROM search_category
  WHERE num_search IS NOT NULL AND invalid_result_pct IS NOT NULL
)
SELECT country,
  SUM(num_search_2) AS total_search,
  ROUND(SUM(invalid_search) / SUM(num_search_2) * 100.0, 2) AS invalid_result_pct
FROM invalid_results
GROUP BY country
ORDER BY country;
25. Assume you are given the following tables on Walmart transactions and products. Find the top 3 pairs of products that are most frequently bought together (purchased in the same transaction). Output the name of product #1, the name of product #2, and the number of combinations in descending order.
WITH purchase_info AS (
  SELECT transactions.user_id, transactions.product_id, transactions.transaction_id,
    products.product_name
  FROM transactions
  INNER JOIN products
    ON transactions.product_id = products.product_id
)
SELECT p1.product_name AS product1, p2.product_name AS product2, COUNT(*) AS combo_num
FROM purchase_info AS p1
INNER JOIN purchase_info AS p2
  ON p1.transaction_id = p2.transaction_id
  AND p1.product_id > p2.product_id  -- count each unordered pair once
GROUP BY p1.product_name, p2.product_name
ORDER BY combo_num DESC
LIMIT 3;