You are absolutely right to point out that understanding how to write queries is crucial, and
my previous explanation focused more on what each part does rather than the thought
process behind constructing them. My apologies for that oversight!
Let's shift gears and focus on the thought process and logic behind writing SQL queries,
broken down into a step-by-step approach. We'll build up from simple questions to more
complex ones, always focusing on how you'd think about getting the answer from your data.
Imagine you're talking to a very diligent, but very literal, assistant who only understands SQL.
You need to give them precise instructions.
Our Tables (Our Data Lists) - Reminder:
1. Students Table:
StudentID Name Age Major GPA
101 Alice 20 Computer 3.8
Science
102 Bob 22 Engineering 3.2
103 Charlie 19 Arts 3.9
104 David 21 Computer 3.5
Science
105 Eve 20 Engineering 3.7
106 Frank 23 Arts 3.0
2. Courses Table:
CourseID CourseName Credits StudentID
CS101 Intro to 3 101
Programming
ENG201 Mechanics 4 102
ART100 Art History 3 103
CS201 Data Structures 4 104
ENG201 Mechanics 4 105
ART100 Art History 3 106
CS101 Intro to 3 104
Programming
CS301 Algorithms 3 101
The Query Writing Process: A Step-by-Step Thought
Flow
When you want to retrieve information from a database, think in terms of these questions:
1. What do I want to see (columns)? (SELECT)
2. Where is this information stored (tables)? (FROM)
3. Are there any specific conditions (filters)? (WHERE)
4. Do I need to group things together for summaries? (GROUP BY, HAVING)
5. How do I want to sort the results? (ORDER BY)
Let's apply this to various scenarios.
Scenario 1: Simple Selection
Question: "I just want to see the names and ages of all our students."
● Step 1: What do I want to see?
○ I want the Name column.
○ I want the Age column.
○ SQL thought: SELECT Name, Age
● Step 2: Where is this information stored?
○ Both Name and Age are in the Students table.
○ SQL thought: FROM Students
● Step 3: Are there any specific conditions?
○ No, I want all students. No WHERE clause needed.
● Step 4: Do I need to group things?
○ No, I just want to see individual student details. No GROUP BY.
● Step 5: How do I want to sort?
○ Let's sort them alphabetically by name.
○ SQL thought: ORDER BY Name ASC (ASC for ascending, A-Z)
Putting it together:
SQL
SELECT Name, Age
FROM Students
ORDER BY Name ASC;
Result (what you'd expect to see):
| Name | Age |
| :------ | :-- |
| Alice | 20 |
| Bob | 22 |
| Charlie | 19 |
| David | 21 |
| Eve | 20 |
| Frank | 23 |
Scenario 2: Filtering Data (WHERE clause)
Question: "Show me the names of all students who are studying 'Computer Science'."
● Step 1: What do I want to see?
○ Just the Name of the student.
○ SQL thought: SELECT Name
● Step 2: Where is this information stored?
○ Name and Major (which I'll use for filtering) are in the Students table.
○ SQL thought: FROM Students
● Step 3: Are there any specific conditions?
○ Yes, the Major must be 'Computer Science'.
○ SQL thought: WHERE Major = 'Computer Science' (Note: Text values usually need
single quotes)
● Step 4 & 5: No grouping or specific sorting requested for this simple query.
Putting it together:
SQL
SELECT Name
FROM Students
WHERE Major = 'Computer Science';
Result:
Name
Alice
David
Scenario 3: Using Aggregate Functions and Grouping (GROUP BY)
Question: "How many students are in each major, and what's their average GPA?"
● Step 1: What do I want to see?
○ I need to know the Major itself.
○ I need a count of students. This means COUNT(*).
○ I need the average GPA. This means AVG(GPA).
○ SQL thought: SELECT Major, COUNT(*) AS NumberOfStudents, AVG(GPA) AS
AverageGPA
● Step 2: Where is this information stored?
○ All these columns (Major, GPA) are in the Students table.
○ SQL thought: FROM Students
● Step 3: Are there any specific conditions?
○ No, I want this for all majors. No WHERE.
● Step 4: Do I need to group things?
○ Yes! I want "for each major," which immediately tells me I need GROUP BY Major. This
groups all "Computer Science" students together, all "Engineering" students
together, etc., before calculating COUNT and AVG for each group.
○ SQL thought: GROUP BY Major
● Step 5: How do I want to sort?
○ Let's sort by the number of students, from most to least.
○ SQL thought: ORDER BY NumberOfStudents DESC (DESC for descending, Z-A or
highest to lowest)
Putting it together:
SQL
SELECT
Major,
COUNT(*) AS NumberOfStudents,
AVG(GPA) AS AverageGPA
FROM Students
GROUP BY Major
ORDER BY NumberOfStudents DESC;
Result:
Major NumberOfStudents AverageGPA
Arts 2 3.45
Computer Science 2 3.65
Engineering 2 3.45
Scenario 4: Filtering Groups (HAVING clause)
Question: "Show me only the majors that have more than 1 student." (This builds on Scenario
3).
● Step 1, 2, 3, 4 (Same as Scenario 3):
○ SELECT Major, COUNT(*) AS NumberOfStudents, AVG(GPA) AS AverageGPA
○ FROM Students
○ (No WHERE for individual rows)
○ GROUP BY Major
● New Step: Filtering after grouping. I need to filter the groups based on the COUNT(*)
which is an aggregate. You cannot use WHERE with aggregate functions. For filtering
groups, you use HAVING.
○ SQL thought: HAVING COUNT(*) > 1
● Step 5: ORDER BY NumberOfStudents DESC (still want this).
Putting it together:
SQL
SELECT
Major,
COUNT(*) AS NumberOfStudents,
AVG(GPA) AS AverageGPA
FROM Students
GROUP BY Major
HAVING COUNT(*) > 1 -- This filters the groups *after* they are formed
ORDER BY NumberOfStudents DESC;
Result: (In our small dataset, all majors have 2 students, so the result is the same as Scenario
3, but imagine if a major only had 1 student, it would be excluded here).
Major NumberOfStudents AverageGPA
Arts 2 3.45
Computer Science 2 3.65
Engineering 2 3.45
Scenario 5: Combining Tables (INNER JOIN)
Question: "List all students and the names of the courses they are enrolled in."
● Step 1: What do I want to see?
○ Student Name (from Students table)
○ Course Name (from Courses table)
○ SQL thought: SELECT S.Name, C.CourseName (Using S. and C. for clarity because
both tables might have a Name column, although Courses doesn't in our example).
● Step 2: Where is this information stored?
○ Name is in Students.
○ CourseName is in Courses.
○ Since the data is in two different tables, I need a JOIN.
○ SQL thought: FROM Students S INNER JOIN Courses C
● Crucial Join Step: How do these tables connect?
○ Both tables have StudentID. That's our common link.
○ We want to match rows where the StudentID in Students is the same as the
StudentID in Courses.
○ SQL thought: ON S.StudentID = C.StudentID
● Step 3, 4, 5: No specific filtering, grouping, or ordering requested beyond the join itself.
Putting it together:
SQL
SELECT S.Name, C.CourseName
FROM Students S
INNER JOIN Courses C ON S.StudentID = C.StudentID;
Result:
Name CourseName
Alice Intro to Programming
Alice Algorithms
Bob Mechanics
Charlie Art History
David Data Structures
David Intro to Programming
Eve Mechanics
Frank Art History
Scenario 6: Left Join (Including non-matches from one side)
Question: "Show me all students, and if they're taking any courses, list those. If they're not
taking any, still show their name."
● Step 1: What do I want to see?
○ Student Name (from Students)
○ Course Name (from Courses)
○ SQL thought: SELECT S.Name, C.CourseName
● Step 2: Where is this information stored?
○ Students and Courses.
○ Key difference: I want all students regardless of a match. This is the hallmark of a
LEFT JOIN (where Students is the "left" table).
○ SQL thought: FROM Students S LEFT JOIN Courses C
● Crucial Join Step: How do these tables connect?
○ Same as before: ON S.StudentID = C.StudentID
● Step 3, 4, 5: No further conditions.
Putting it together:
SQL
SELECT S.Name, C.CourseName
FROM Students S
LEFT JOIN Courses C ON S.StudentID = C.StudentID;
Result (assuming we temporarily add a student 107 Zoe, Biology, no courses):
Name CourseName
Alice Intro to Programming
Alice Algorithms
Bob Mechanics
Charlie Art History
David Data Structures
David Intro to Programming
Eve Mechanics
Frank Art History
Zoe NULL
Scenario 7: Subquery (Using a result from one query to filter another)
Question: "Find the names of students who have a GPA higher than the overall average GPA
of all students."
● This requires two steps in your mind:
1. First, figure out: "What is the overall average GPA?"
2. Second, use that average to find students whose individual GPA is higher.
● Step 1 (Inner Query - the prerequisite): What is the overall average GPA?
○ What to see: The average of GPA.
○ Where: Students table.
○ SQL thought for inner query: (SELECT AVG(GPA) FROM Students)
● Step 2 (Outer Query - using the prerequisite): Now, find the students.
○ What to see: Name.
○ Where: Students table.
○ Conditions: GPA must be greater than the result of Step 1's query.
○ SQL thought for outer query: SELECT Name FROM Students WHERE GPA > (result
from inner query)
Putting it together:
SQL
SELECT Name
FROM Students
WHERE GPA > (SELECT AVG(GPA) FROM Students);
Result:
Name
Alice
Charlie
Eve
Scenario 8: Correlated Subquery (Inner query depends on outer
query)
Question: "Which students are enrolled in at least one course that has 4 credits?"
● Think: For each student, I need to check their specific courses to see if any have 4
credits.
● Step 1 (Outer Query): Start with each student.
○ What to see: Name.
○ Where: Students table (let's call it S).
○ Conditions: This is where the EXISTS with a subquery comes in.
○ SQL thought: SELECT S.Name FROM Students S WHERE EXISTS (...)
● Step 2 (Inner Query): Check courses for that specific student.
○ What to see: Just need to confirm existence, so SELECT 1 is fine.
○ Where: Courses table (let's call it C).
○ Conditions:
■ The StudentID in Courses must match the StudentID of the current student from
the outer query (S.StudentID). This is the correlation.
■ The Credits in Courses must be 4.
○ SQL thought: (SELECT 1 FROM Courses C WHERE C.StudentID = S.StudentID AND
C.Credits = 4)
Putting it together:
SQL
SELECT S.Name
FROM Students S
WHERE EXISTS (
SELECT 1
FROM Courses C
WHERE C.StudentID = S.StudentID -- This connects the inner query to the current row of the
outer query
AND C.Credits = 4
);
Result:
Name
Bob
David
Eve
Key Takeaways for Writing Queries:
● Start with SELECT and FROM: What data do you need, and where is it?
● Filter with WHERE: What specific rows do you want? (Applies to individual rows before
grouping).
● Group with GROUP BY for summaries: If you want calculations per category.
● Filter groups with HAVING: If you want to filter the results of aggregations.
● Join with JOIN ... ON to combine tables: If your answer spans multiple lists.
○ INNER JOIN: Only exact matches in both.
○ LEFT JOIN: All from left, plus matches from right (NULL if no match).
○ RIGHT JOIN: All from right, plus matches from left (NULL if no match).
○ FULL JOIN: All from both, with NULLs where no match.
● Use Subqueries (SELECT ... FROM ... WHERE (SELECT ...)): When you need to answer
a small question first to help answer a bigger one.
● Correlated Subqueries (SELECT ... FROM Table1 T1 WHERE EXISTS (SELECT ...
FROM Table2 T2 WHERE T2.ID = T1.ID)): When the inner question needs details from
each row of the outer query.
● Practice, Practice, Practice: The best way to learn is to ask yourself questions about
your data and try to write the SQL to answer them.
Don't be afraid to break down a complex question into smaller, simpler ones. SQL often
mirrors that logical thought process.