"The Data Analyst's Learning Path" by
Rohan Adus
Introduction
Chapter 1: Building a Strong Foundation
Part 1: Understanding Statistics
Important Terms
Resources
Part 2: Excel for Data Analysis
Important Terms
Resources
Chapter 2: Diving Deeper into Data Manipulation
Part 1: Mastering SQL
Important Terms
Resources
Part 2: Python for Data Analysis
Important Terms
Resources
Chapter 3: Visualization and Presentation: Bringing Data to Life
Part 1: The Art of Visualization
Resources
Part 2: Tableau - A Deep Dive
Important Terms
Resources
Part 3: Storytelling with Data
Resources
Part 4: Beyond Static Charts - Interactive Visualization
Resources
Chapter 4: Showcasing Your Skills: Portfolio and Resume Building
Part 1: Crafting the Perfect Data Analyst Resume
Part 2: Building a Portfolio on GitHub
Conclusion: Embarking on a Rewarding Journey in Data Analysis
Please Read (Author's Note)
Hello! Thank you for choosing this eBook. It is meant to provide FREE resources for learning
the essentials that I have personally used in over 3 years of working as a Data Analyst, at
companies ranging from large technology firms to startups. I don't want to waste any time, so
go through the links to the resources and spend a little time going over each of them. Take a
project-first approach to learning: start working on a project, then use the resources to guide you.
I want to thank you again for downloading this eBook. I will continue to revise and improve it
over the next few months, so stay tuned.
Hey! Before you go forward, I want to plug this: if you want a more 1:1 experience learning
these concepts, please apply to the Analytics Collective bootcamp, a 2-month intensive
learning experience.
Link: https://tr.ee/N9CMnPzWBo
Introduction:
In the digital age, data drives decisions. From businesses to governments, the ability to harness
and interpret data is of paramount importance. Becoming a data analyst is not just about landing
a coveted job title; it's about understanding the stories data tells and using it to shape the future.
This eBook is tailored for individuals like you, passionate and eager to traverse the intricate
world of data analysis. With a blend of foundational knowledge and hands-on practice, the
resources in this guide are carefully selected to ensure a comprehensive learning experience.
Chapter 1: Building a Strong Foundation
Every great structure, be it a skyscraper or a software application, is built on a strong
foundation. In the realm of data analysis, foundational knowledge is crucial to grasp complex
concepts and techniques down the line.
Source:
https://datascience.virginia.edu/news/how-much-do-data-scientists-need-know-about-statistics
Part 1: Understanding Statistics
Statistics, often called the grammar of science, is the backbone of data analysis. It equips
you with the skills to make sense of vast amounts of data, identify patterns, and make informed
decisions.
Important Terms:
Population: the entire group of people, items, or events that you want to study.
Sample: a subset of the population that is actually observed or measured.
Variable: any characteristic that can be measured or counted, such as age or revenue.
Quantitative analysis (statistical): collecting and interpreting numerical data, using statistics
and data visualization to find patterns.
Qualitative analysis (non-statistical): interpreting non-numerical information, such as text,
interviews, images, or other media.
Descriptive statistics: methods that summarize the characteristics of a data set.
Inferential statistics: methods that use a sample to draw conclusions or make predictions about
a population.
Central tendency (measures of the center): mean (average of all values), median (middle value
of a sorted data set), and mode (the most frequent value in a data set).
Measures of the spread:
Range: the difference between the largest and smallest values in a data set.
Variance: the average of the squared differences between each value and the mean.
Standard deviation: the square root of the variance; it describes how far values typically lie
from the mean.
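To make the measures of center and spread concrete, here is a minimal sketch using Python's built-in statistics module. The data set is made up for illustration, and the population versions of variance and standard deviation are used; for a sample you would likely reach for statistics.variance and statistics.stdev instead.

```python
import statistics

# Hypothetical data: daily order counts for one week (made-up numbers).
orders = [12, 15, 15, 18, 20, 22, 24]

mean = statistics.mean(orders)            # arithmetic average
median = statistics.median(orders)        # middle value of the sorted data
mode = statistics.mode(orders)            # most frequent value
value_range = max(orders) - min(orders)   # largest value minus smallest value
variance = statistics.pvariance(orders)   # average squared distance from the mean
std_dev = statistics.pstdev(orders)       # square root of the variance

print("center:", mean, median, mode)
print("spread:", value_range, round(variance, 2), round(std_dev, 2))
```

Try changing one value to an extreme number (say, 200) and notice how the mean and standard deviation move much more than the median does.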
Resources:
https://www.coursera.org/learn/stanford-statistics
https://www.coursera.org/learn/probability-statistics
Textbook (not free but worth including here)
I believe there is a free trial.
https://www.oreilly.com/library/view/practical-statistics-for/9781491952955/
Part 2: Excel for Data Analysis
Source: https://sqlspreads.com/blog/how-to-use-sql-with-excel-for-data-analysis/
Before delving deeper into advanced data manipulation tools, it's pivotal to master Excel. A
mainstay in the business world, Excel offers a myriad of features for data manipulation, analysis,
and visualization. It's the first step towards handling and understanding data.
Important Terms:
Cells, Rows, and Columns:
The basics of Excel's layout – cells are the individual "boxes" where data is entered, rows are
horizontal lines, and columns are vertical.
Range:
A selection of two or more cells on a sheet. The cells in a range can be adjacent or
non-adjacent.
Formulas and Functions:
Formulas are used to calculate numbers or manipulate data. Functions are predefined formulas
in Excel, like SUM, AVERAGE, IF, and VLOOKUP, or the common INDEX and MATCH combination.
Pivot Tables:
A tool that allows you to reorganize and summarize selected columns and rows of data in a
spreadsheet or database to obtain a desired report.
Data Types:
Understanding different data types such as numeric, text, date, and Boolean is crucial for
effective data analysis.
Conditional Formatting:
Allows you to automatically apply formatting—such as colors, icons, and data bars—to one or
more cells based on the cell value.
Data Validation:
Used to control the type of data or the values that users enter into a cell.
Charts and Graphs:
Visual representations of data. Common types include bar charts, line charts, pie charts,
histograms, and scatter plots.
Filtering and Sorting:
These tools are essential for managing large sets of data, allowing you to view only the
information that meets certain criteria and to order your data in a useful way.
Tables:
A range of cells that are treated as a single entity, with features that allow for quick sorting,
filtering, and formatting.
Analysis ToolPak:
An Excel add-in that provides additional statistical analysis tools, such as regression, ANOVA,
and t-tests.
What-If Analysis:
Tools like Goal Seek and Data Tables that allow you to forecast outcomes and analyze
scenarios.
Named Ranges:
A range of cells that have been given a name. This can make formulas easier to understand
and maintain.
Macros:
A series of instructions that can be triggered by a keyboard shortcut or a button in the
spreadsheet to automate repetitive tasks.
Lookup Functions:
Functions like VLOOKUP, HLOOKUP, and the newer XLOOKUP, which are used to search for
an item in a range and return a corresponding value.
Power Query:
An Excel tool to connect, combine, and refine data across a wide variety of sources.
Power Pivot:
An Excel add-in you can use to perform powerful data analysis and create sophisticated data
models.
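If you think more easily in code, the logic behind two of the terms above, lookup functions and pivot tables, can be sketched in a few lines of Python. This is only an illustration of the idea, not how Excel implements these features; the sales rows, the price list, and the helper names lookup and pivot_sum are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, product, amount).
sales = [
    ("East", "Widget", 100),
    ("West", "Widget", 150),
    ("East", "Gadget", 200),
    ("West", "Gadget", 50),
    ("East", "Widget", 25),
]

# Hypothetical price list, playing the role of the lookup range.
prices = {"Widget": 2.5, "Gadget": 4.0}


def lookup(key, table, default=None):
    """Return the value stored for key, or a default when there is no match."""
    return table.get(key, default)


def pivot_sum(rows):
    """Summarize amounts with regions as rows and products as columns."""
    totals = defaultdict(lambda: defaultdict(int))
    for region, product, amount in rows:
        totals[region][product] += amount
    return {region: dict(products) for region, products in totals.items()}


pivot = pivot_sum(sales)
print(lookup("Gadget", prices))
print(pivot)
```

The same pattern applies in a spreadsheet: a lookup finds one matching value, while a pivot table groups many rows and aggregates them.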
Resources:
https://www.codecademy.com/learn/analyze-data-with-microsoft-excel
https://www.coursera.org/learn/excel-basics-data-analysis-ibm
https://www.analyticsvidhya.com/blog/2021/11/a-comprehensive-guide-on-microsoft-excel-for-data-analysis/
Chapter 2: Diving Deeper into Data Manipulation
The world of data is vast and varied. To truly harness its potential, one must be adept at using
the right tools and techniques. This chapter delves into the heart of data manipulation, exploring
powerful languages and platforms that enable analysts to sift through, modify, and extract
meaningful insights from data.
Part 1: Mastering SQL
Source: https://www.datapine.com/blog/sql-joins-and-data-analysis-using-sql/
SQL (Structured Query Language) is more than just a data manipulation language; it's a key
that unlocks the vast treasure troves of databases. Whether you're dealing with a small
database or a massive data warehouse, SQL allows you to retrieve, modify, and manage data
with precision and efficiency. From basic commands to complex queries, the resources below will
help you get equipped to tackle a wide range of data challenges.
Important Terms
Database:
A structured set of data held in a computer, especially one that is accessible in various ways.
Table:
A set of data elements that is organized using a model of vertical columns (which are identified
by their name) and horizontal rows.
Column:
A vertical entity in a table that contains all information associated with a specific field in a
database.
Row/Record:
A single, implicitly structured data item in a table.
Primary Key:
A field in a table which uniquely identifies each row/record in that table.
Foreign Key:
A field (or collection of fields) in one table that refers to the primary key of another table,
linking the two tables together.
SQL Statement:
The means by which operations are performed on the data within the database, such as
SELECT, INSERT, UPDATE, and DELETE.
SELECT:
The command used to retrieve data from a database. It is the most commonly used command.
WHERE:
A clause that allows you to specify the criteria for the data you want to select or manipulate.
JOIN:
A means of combining rows from two or more tables by matching values in related columns.
There are different types of joins: INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN.
GROUP BY:
A clause used with aggregate functions (like SUM, COUNT, MAX, MIN, AVG) to group the
result-set by one or more columns.
HAVING:
A clause that filters groups after GROUP BY, usually with a condition on an aggregate function.
WHERE, by contrast, filters individual rows before they are grouped.
ORDER BY:
A clause used to define the order of the returned data set.
INDEX:
A performance optimization feature that lets the database locate rows quickly without having to
scan every row in a table each time the table is queried.
Transaction:
A unit of work performed within a database management system against a database, and
treated in a coherent and reliable way independent of other transactions.
View:
A virtual table based on the result-set of an SQL statement. It contains rows and columns, just
like a real table.
Stored Procedure:
Prepared SQL code that you can save and reuse over and over again.
Trigger:
A special kind of stored procedure that automatically runs when certain events occur in the
database.
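To see several of these terms working together, here is a small sketch that runs SQL through Python's built-in sqlite3 module, so you can try it without installing a database. The customers and orders tables and their contents are made up for illustration, and other database systems may differ slightly in syntax.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,                                 -- primary key
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),  -- foreign key
        amount REAL NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders (customer_id, amount) VALUES
        (1, 50.0), (1, 70.0), (2, 20.0), (3, 200.0), (3, 5.0);
""")

query = """
    SELECT c.name, COUNT(o.id) AS order_count, SUM(o.amount) AS total
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    WHERE o.amount >= 10          -- filters rows before grouping
    GROUP BY c.name
    HAVING SUM(o.amount) > 100    -- filters groups after aggregation
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

for name, order_count, total in rows:
    print(name, order_count, total)
```

Removing the HAVING line brings Ben back into the result, while removing the WHERE line adds Cy's small order to the count. That is a quick way to feel the difference between the two clauses.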
Resources:
https://www.codecademy.com/learn/learn-sql
https://www.udemy.com/course/introduction-to-databases-and-sql-querying
Part 2: Python for Data Analysis
Source: https://dataschoolnigeria.com.ng/how-to-become-data-analyst-using-python/
Python isn't just a programming language; it's a data analyst's best friend. Its simplicity
combined with powerful libraries makes it a go-to choice for data manipulation, cleaning, and
analysis. With Python, you can automate mundane tasks, perform complex statistical analyses,
and even delve into machine learning. This guide introduces you to the world of data analysis
with Python, ensuring you have a robust toolkit to approach varied data challenges.
Important Terms:
Variable:
A name that refers to a value stored in memory. The value a variable refers to can be changed
by assigning a new one.
Data Types:
The classification of data items. Python has various data types including integers (int),
floating-point numbers (float), strings (str), and booleans (bool).
List:
An ordered collection of items which can be of different types. Lists are mutable, which means
they can be altered.
Tuple:
Similar to a list, but immutable; once created, objects within it cannot be changed.
Dictionary:
A collection of key-value pairs. While lists are indexed by a range of numbers, dictionaries are
indexed by keys, which can be any immutable type. Since Python 3.7, dictionaries keep their
items in insertion order.
Set:
An unordered collection of unique items. Sets are mutable and new elements can be added to
them.
Function:
A block of organized, reusable code that is used to perform a single, related action.
Lambda Function:
A small anonymous function defined with the lambda keyword, which can take any number of
arguments but can only have one expression.
Module:
A file containing Python code that can define functions, classes, and variables, which can then
be imported and used in other Python programs.
Package:
A way of organizing related modules into a single directory hierarchy.
Class:
A code template for creating objects. Objects have member variables and have behavior
associated with them.
Object:
An instance of a class. Objects can represent real-world entities.
Method:
A function that is associated with an object. In Python, all functions in a class are methods.
Inheritance:
A way of arranging classes to provide a way to reuse code, whereby a class can inherit
attributes and behavior methods from another class.
Loop:
A programming structure that repeats a sequence of instructions until a specific condition is met.
Python has for and while loops.
Iterator:
An object that contains a countable number of values and can be iterated upon, meaning that
you can traverse through all the values.
Generator:
A function that uses the yield keyword to produce a sequence of results one at a time as it is
iterated over, instead of returning a single value.
Exception:
An error that occurs during the execution of a program. Python has built-in exceptions and
allows for custom exceptions to be defined as well.
Decorator:
A function that wraps another function (or class) to add new functionality without modifying
the original code. It is applied with the @ syntax.
List Comprehension:
A concise way to create lists. It consists of brackets containing an expression followed by a for
clause, then zero or more for or if clauses.
Pandas:
A software library written for the Python programming language for data manipulation and
analysis. In particular, it offers data structures and operations for manipulating numerical tables
and time series.
NumPy:
A library for the Python programming language, adding support for large, multi-dimensional
arrays and matrices, along with a large collection of high-level mathematical functions to
operate on these arrays.
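Many of the core terms above are easier to remember once you have seen them in action. The short sketch below uses only built-in Python and made-up values; the names running_total, count_calls, and average are hypothetical helpers written for this example.

```python
import functools

# List (mutable), tuple (immutable), dictionary (key -> value), set (unique items).
scores = [88, 92, 75]
scores.append(60)
point = (3, 4)
ages = {"ana": 31, "raj": 27}
tags = {"sql", "python", "sql"}   # the duplicate "sql" is stored only once

# List comprehension with an if clause, and a lambda used as a sort key.
passing = [s for s in scores if s >= 70]
by_age = sorted(ages.items(), key=lambda item: item[1])


def running_total(values):
    """Generator: yields one cumulative sum at a time."""
    total = 0
    for value in values:
        total += value
        yield total


def count_calls(func):
    """Decorator: wraps func and counts how many times it is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def average(values):
    """Function: returns the mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


print(passing, by_age, len(tags))
print(list(running_total([1, 2, 3])))
print(average(scores), average.calls)
```

Once these basics feel comfortable, Pandas and NumPy will look far less intimidating, since they build on the same ideas.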
Resources:
https://www.freecodecamp.org/learn/data-analysis-with-python/
https://www.geeksforgeeks.org/data-analysis-with-python/
https://learn.microsoft.com/en-us/training/modules/explore-analyze-data-with-python/
Chapter 3: Visualization and Presentation: Bringing Data to Life
Visualization is the bridge between complex data structures and human understanding. The
ability to translate rows of numbers into insightful graphs, charts, and dashboards is not just an
art; it's a critical skill that amplifies the impact of your analysis. This chapter will immerse you in
the world of data visualization, emphasizing the importance of design, clarity, and storytelling.
Part 1: The Art of Visualization
Source: https://venngage.com/blog/data-visualization-infographic/
Before diving into the tools, it's essential to understand the principles that underpin effective
visualization. This section will explore:
● The importance of choosing the right type of chart or graph for your data.
● Color theory and its role in conveying information.
● The balance between aesthetics and clarity.
I highly recommend spending a week on this section!
Resources:
https://www.coursera.org/learn/visualize-data
Part 2: Tableau - A Deep Dive
Source: https://www.datacamp.com/learn/tableau
Tableau is a leading tool in the world of data visualization. Its intuitive interface combined with
powerful capabilities makes it a favorite among data analysts.
Important Terms
Workbook:
A collection of sheets, including dashboards, worksheets, and stories, that are used to display
data visualizations.
Worksheet:
The space where you create views of your data by dragging and dropping fields onto shelves.
Dashboard:
A collection of several views, allowing you to compare a variety of data simultaneously.
Story:
A sequence of visualizations that work together to show different facets of your data, or a
data-driven narrative.
Shelf:
Areas at the top and left of the workspace where you can drag fields to build a view. Examples
include Columns, Rows, Filters, and Marks.
Marks Card:
A card that controls the marks in the view. You can use it to set mark properties like type, color,
size, shape, label, and detail.
Data Pane:
A pane on the left side of the workspace that displays the data source fields, including
dimensions and measures.
Dimensions:
Qualitative fields that are used to categorize and segment data rather than to be summed;
typically categorical information like names, dates, or geographical data.
Measures:
Quantitative fields that can be aggregated, typically numerical values that you can perform
mathematical operations on.
Continuous Field:
A field that produces an axis that is continuous. These fields are typically measures and can
represent a range of numeric values.
Discrete Field:
A field that produces headers when placed on a shelf. These fields are typically dimensions and
can represent categorical information.
Calculated Field:
A new field that you create by using a formula to modify the existing fields in your data source.
Parameter:
A dynamic value that replaces a constant in calculations and can serve as filters or prompts.
Filter:
A way to restrict the data displayed in your view by including or excluding values.
Extract:
A saved subset of a data source that you can use to improve performance and analyze offline.
Data Blending:
The process of combining data from multiple sources into a single view.
Hierarchy:
A way of organizing related dimensions that can be expanded and collapsed in a view.
Bins:
User-defined groups of numeric data, useful for creating histograms.
Table Calculation:
A calculation that you can apply to the values in a table. It is often used for running totals,
differences, or percentages.
Reference Line:
A line in a graph or chart that serves as a reference point for interpretation, such as an average
or median line.
Tooltip:
A box with additional information that appears when you hover over marks in the view.
VizQL:
Tableau’s proprietary query language that translates drag-and-drop actions into data queries
and then expresses that data visually.
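Two of the terms above, bins and table calculations, describe simple computations that can be sketched in Python to show what happens to the data. This is only an illustration of the idea with made-up numbers and an assumed bin size of 10; it is not how Tableau computes these internally.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical order amounts (made-up numbers).
amounts = [5, 12, 18, 25, 31, 47, 49]
bin_size = 10

# Bins: label each value by the lower edge of its fixed-width bin,
# then count how many values land in each bin (the data behind a histogram).
bin_counts = Counter((amount // bin_size) * bin_size for amount in amounts)

# Running total: a typical table calculation, computed over the values
# in the order in which they are displayed.
running = list(accumulate(amounts))

print(sorted(bin_counts.items()))
print(running)
```

In Tableau you would get the same kind of result by creating bins from a measure and adding a running total as a quick table calculation.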
Resources:
https://www.tableau.com/learn/training
https://public.tableau.com/app/learn/how-to-videos
https://www.youtube.com/watch?v=Bo3i4jieZMA&t=374s
Enjoy the plug haha if you got this far ;)
Part 3: Storytelling with Data
Source: https://venngage.com/blog/data-storytelling/
Data on its own is just a collection of facts. The real magic happens when you weave these
facts into a compelling narrative. This section emphasizes:
● The importance of context in data presentation.
● Techniques to craft a narrative around your data.
● How to cater your presentations to different audiences, from technical experts to business
stakeholders.
● The role of interactivity in engaging your audience.
Resources:
https://www.forbes.com/sites/brentdykes/2016/03/31/data-storytelling-the-essential-data-science-skill-everyone-needs/?sh=7f4eb4f252ad
Part 4: Beyond Static Charts - Interactive Visualization
The digital age offers a plethora of tools to make your visualizations interactive and dynamic.
Resources:
https://www.udacity.com/course/data-visualization-with-d3js--ud507
Chapter 4: Showcasing Your Skills: Portfolio and Resume Building
In the competitive landscape of data analysis, it's not just about what you know, but how you
present it. This chapter is dedicated to helping you curate and showcase your skills and projects
effectively. With the right presentation, your hard work and expertise can shine, making you a
standout candidate for any role or opportunity.
Part 1: Crafting the Perfect Data Analyst Resume
Source: https://www.rezi.ai/resume-templates/data-analyst-resume-template
Source:
https://www.freecodecamp.org/news/writing-a-killer-software-engineering-resume-b11c91ef699d/
Your resume is often the first impression you make on a potential employer. Crafting it with care
and precision is crucial.
Tips:
● Keep your resume to 1 page if you have fewer than 10 years of experience.
● Do not include images on your resume.
Structure
1. Your name
2. Contact Information
3. Education (if you are a new grad; otherwise, put this at the bottom)
4. Employment
5. Projects
6. Skills
Resources:
https://www.freecodecamp.org/news/writing-a-killer-software-engineering-resume-b11c91ef699d/
https://www.rezi.ai/resume-templates/data-analyst-resume-template
Part 2: Building a Portfolio on GitHub
Source: https://github.blog/2023-05-16-addressing-githubs-recent-availability-issues/
A well-constructed portfolio can speak volumes about your skills, approach, and experience.
GitHub is a popular platform for data analysts to showcase their projects.
(will be done soon)
Conclusion: Embarking on a Rewarding Journey in Data Analysis
As we close the pages of this guide, it's essential to remember that the realm of data analysis is
dynamic, ever-evolving, and brimming with opportunities. The tools, techniques, and resources
we've explored together are merely stepping stones on a much larger journey—a journey
defined not just by algorithms and datasets, but by curiosity, passion, and the drive to uncover
insights that can change the world.
The world of data is vast and sometimes intimidating, but it's also a playground for the
inquisitive mind. Every dataset tells a story, every visualization paints a picture, and every
analysis solves a puzzle. The beauty of this field lies not just in the numbers and codes but in
the narratives they weave and the impact they create.
Yet, as with any journey, there will be challenges. There will be moments of doubt, days of
frustration, and times when the data just doesn't seem to make sense. In these moments, lean
on the community, revisit your foundational learning, and remember why you started. Data
analysis is as much a test of perseverance as it is of skill.
If you got this far, congrats :) I wish you the best of luck!