MBA To Analyst - A Comprehensive Beginner's Guide
Introduction
Transitioning from an MBA in Sales & Marketing to a data-driven role can feel daunting, but it’s absolutely
attainable. Modern businesses run on data, and the ability to analyze and present data is in high demand.
In fact, a data analyst’s job is to collect, organize, and analyze data in ways that non-analysts can easily
understand – making tools like Excel one of the most valuable assets in business today 1 . This guide will
walk you through the essential tools and skills you need, from the basics of Excel to building interactive
Power BI dashboards and writing SQL queries, even if you don’t come from a technical or mathematical
background. We’ll explore the tools’ historical evolution in a storytelling format (to show why they exist),
work through real-world case studies in sales/marketing analytics, compile a handy cheat sheet of key
functions, and offer tips for hands-on practice, interview preparation, and building a job-winning portfolio.
Who is this guide for? It’s tailored for beginners from business backgrounds – people who have strong
domain knowledge in sales or marketing (like MBA grads) but need to develop the technical skills for roles
like Sales Analyst or Business Analyst. The tone is approachable and jargon-free, focusing on practical
understanding and job-oriented skills. By the end, you should not only grasp what each tool is and does,
but also how to apply them to solve business problems and showcase your results to potential employers.
Let’s start by introducing the key tools in your analytics toolkit and how they evolved to meet business
needs over time.
Excel

What it is: Excel is a spreadsheet program and arguably the original business analytics tool. It allows you to
tabulate data, perform calculations, create charts, and much more, all through a familiar grid interface. For
many business users, Excel is the first step into data analysis – and it’s often more powerful than people
realize.
Story & Evolution: Excel wasn’t the first electronic spreadsheet, but it became the most popular. Back in the
late 1970s, business students like Dan Bricklin were wrestling with either hand calculations or clunky
mainframe programs for case studies. Bricklin envisioned a better way and created VisiCalc (released in 1979), the first
spreadsheet software, which was a hit on the Apple II. A few years later came Lotus 1-2-3 in 1983, which
added charting and basic database functions on PCs 2 . Microsoft entered the fray with Excel in 1985 (first
on Mac, then Windows in 1987). Excel used a graphical interface (pull-down menus, mouse support) which
was innovative for its time 3 . Over the years, Excel evolved from a basic spreadsheet to a complex
analytics tool, gaining features like 3D charts, macros (VBA scripting), and PivotTables for summarizing
data 4 . Each new version expanded its capabilities – by the 1990s Excel was part of Microsoft Office and a
staple in every office environment. Even today, Excel continues to adapt (now with cloud collaboration via
Office 365 and even Python integration in the latest versions).
Why it matters for you: Excel is ubiquitous in business. It’s often the first tool you’ll use to store and
explore data. Need to calculate sales growth, clean up a list of customers, or make a quick chart of quarterly
revenue? Excel can do it. Despite newer tools, Excel remains a “common language” of data in organizations.
Its flexibility is both a strength and a weakness – you can do almost anything in Excel, but for very large data
or advanced analytics, we have other tools (which we’ll get to). Still, as a beginner, mastering Excel’s core
features is non-negotiable. The good news is you probably have some Excel familiarity from your MBA; now
we’ll elevate those skills (think advanced formulas, PivotTables, and data cleanup techniques).
Key features to know: Basic arithmetic formulas (=SUM(), =AVERAGE(), etc.), IF conditions and lookup
functions (e.g. =VLOOKUP or the newer =XLOOKUP for finding values, =SUMIF for conditional sums), data
cleaning tools (remove duplicates, text-to-columns), PivotTables (for summarizing data by categories), and
charting (bar/column charts, line charts for trends, pie charts for composition, etc.). We’ll provide a cheat
sheet of formulas later, but these are the building blocks. For instance, if you have a list of sales leads, you
might use Excel to filter by region, compute the average deal size with AVERAGE , or create a PivotTable to
summarize total sales by product line. Excel’s strength is quick, ad-hoc analysis on moderate-sized data
(thousands of rows, not millions). When data grows or analysis gets complex, you might then move to
databases or BI tools – which leads us to…
Power Query

What it is: Power Query is a tool for data extraction and transformation. Think of it as an advanced data
import tool – it helps you connect to various data sources (another Excel file, a CSV, a database, a website,
etc.), then clean, combine, or transform that data before you analyze it. Power Query was first introduced
as an Excel add-in and is now built into Excel (under Get & Transform Data on the Data tab) and also into
Power BI. It uses a user-friendly interface to apply steps like filtering rows, splitting columns, merging
tables, etc., and it records those steps so you can refresh the process when new data comes in. This saves
tons of time compared to doing repetitive cleanup in Excel manually.
Story & Evolution: As Excel usage grew, analysts needed better ways to handle messy data and automate
the cleanup. Microsoft introduced Power Query as a free add-in for Excel 2010 and 2013 (initially called “Data
Explorer”) 5 . It became popular and was integrated into Excel 2016+ as the default way to import data
(replacing older, manual import wizards) 6 . Essentially, Power Query brings ETL (Extract, Transform,
Load) capabilities to the Excel user. In Power BI (launched 2015, which we’ll cover next), the same Power
Query technology is used in the “Query Editor” to prepare data for dashboards. The language working
behind the scenes is called “M”, but you typically don’t need to code M directly – you perform actions
through the interface (like clicking “Remove Columns” or “Merge Tables”) and Power Query builds the steps
for you. Over the last decade, Power Query has become a critical tool because data often comes from
multiple sources and rarely is it analysis-ready: you might have to clean text, fix date formats, filter out
irrelevant records, etc. Power Query allows these transformations to be applied consistently and without
coding (for most operations). Microsoft has continued improving it and even extended support from
Windows Excel to Mac Excel recently due to its popularity 7 5 .
Why it matters for you: In a sales or marketing analyst role, you’ll often pull data from different places –
maybe an export from a CRM system (like Salesforce), an Excel sheet from a finance team, and a CSV from a
market research report – and then you need to mash it together. Power Query is built for exactly that
scenario. For example, you could use Power Query to connect to monthly sales CSV files, append them into
one big table, filter out test entries, calculate new columns (perhaps categorize products), and load the
refined data into Excel or Power BI for analysis. The next time you get new monthly files, just drop them in
the folder and hit “Refresh” – voila, all steps apply again. This automation of data prep is a huge time-saver
and ensures consistency. Non-technical users love Power Query because it’s mostly point-and-click. It’s
especially useful if you find yourself doing the same cleanup in Excel over and over – that’s your cue to let
Power Query handle it.
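To make the idea concrete: if you eventually learn Python (covered later in this guide), the same append-filter-derive pattern looks roughly like the sketch below in pandas. This is purely illustrative – the file pattern and column names are invented, and in practice Power Query does all of this through menus with no code at all:

import glob
import pandas as pd

# Gather every monthly sales CSV in a folder (hypothetical file pattern)
files = glob.glob('sales_data/sales_*.csv')

# Append them into one big table
sales = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

# Filter out test entries and derive a new column (hypothetical column names)
sales = sales[sales['CustomerName'] != 'TEST ENTRY']
sales['PriceBand'] = sales['UnitPrice'].apply(lambda p: 'Premium' if p >= 100 else 'Standard')

# "Load" the refined data for analysis
sales.to_excel('clean_sales.xlsx', index=False)

Re-running such a script on a folder with new files is the code equivalent of hitting “Refresh” on a Power Query query.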
Key features to know: How to connect to various data sources (Excel, CSV, SQL database, web), the Power
Query Editor interface (and its four phases: Connect, Transform, Combine, Load 8 9 ), common
transformations (filtering rows, sorting, removing or renaming columns, changing data types, splitting or
merging columns, pivoting/unpivoting data, appending queries, merging (joining) queries). Each
transformation is recorded as a step (you can see them in the Applied Steps pane) which you can edit or
reorder 10 11 . Also, knowing that Power Query is case-sensitive and somewhat Excel-like but not exactly
(for example, it won’t automatically fill down like Excel formulas – you explicitly add transformations). We’ll
see it in action in a case study soon. For now, remember: Power Query is your friend for cleaning messy
data without manual drudgery.
Power BI

What it is: Power BI is a Business Intelligence (BI) tool by Microsoft that lets you create interactive reports
and dashboards. Think of taking data (which you might clean with Power Query) and then building a multi-
page report with charts, graphs, and KPIs that users can click, filter, and explore. Power BI has a Desktop
application for designing reports and a cloud service for sharing dashboards in an organization. It
integrates data modeling (like a lightweight database), a calculation engine (DAX, which we’ll cover next),
and rich visualization capabilities. For someone coming from Excel, Power BI feels both familiar (it uses
similar functions and pivot-table-like concepts) and empowering (you can handle much larger datasets and
make professional-looking dashboards with ease). It’s commonly used to monitor sales performance, track
marketing metrics, create financial dashboards, and more – basically any scenario where you have data that
people need to see and interact with.
Story & Evolution: The history of Power BI is essentially Microsoft’s journey from Excel to a full BI platform.
Around 2010, Microsoft developed PowerPivot, an add-in for Excel 2010 that allowed users to load large
data models into Excel and perform fast calculations (powered by a new in-memory engine and the DAX
formula language) 12 13 . PowerPivot demonstrated that there was huge demand for “self-service BI” –
tools that analysts (not just IT departments) could use to model and analyze data. Microsoft then added
Power View (for data visualization) and Power Query (for data loading) as Excel add-ins by 2013 14 . The
first incarnation of “Power BI” was actually Power BI for Office 365 in 2013, which was a cloud service tying
together those Excel add-ins 15 . In 2015, Power BI was re-launched as a standalone product – with Power
BI Desktop (a free Windows application for creating reports) and the Power BI cloud service 16 . From that
point on, Power BI rapidly evolved as its own ecosystem, gaining features at a fast pace (monthly updates!).
Key milestones included the introduction of the Power BI mobile apps, Power BI Pro vs. Premium (for larger
scale deployments), and continuous improvements in visuals and AI integrations 17 . Today, Power BI is
recognized as a leader in analytics – Gartner has consistently put Microsoft in the Leaders quadrant for BI
platforms 18 – largely because Power BI made advanced analytics accessible to a broad audience. It went
“from an Excel add-in to a comprehensive BI platform”, fulfilling Microsoft’s vision of bringing data
analytics to the masses 19 .
Why it matters for you: As a prospective analyst, knowing Power BI gives you the ability to present data in
a compelling way. It’s one thing to crunch numbers in a spreadsheet; it’s another to build a dashboard that
management can interact with. For example, imagine you have sales data by region, product, and month. In
Power BI, you can create a dashboard where a manager clicks on a region on a map and all the charts
update to show sales in that region, or they select a year from a slicer and see trends for that year. The
interactivity and visual clarity help decision-makers grasp insights quickly – and that’s a valuable skill. In
roles like Sales Analyst, you might be tasked with creating a sales performance dashboard for your
company: Power BI is an ideal tool for that. It also handles larger datasets better than Excel (hundreds of
thousands or even millions of rows can be imported into Power BI’s data model). Another advantage is the
ability to combine multiple data sources in one model – e.g., connect to an ERP database for actual sales, an
Excel file for targets, and a CSV for industry benchmarks, and bring them all together in one report.
Key features to know: The Power BI Desktop interface has three main views – Report (where you design
visuals), Data (where you see tables of your data), and Model (where you manage relationships between
tables). Important concepts include connecting to data (Power BI can connect to Excel files, databases,
online services – you name it 20 ), using Power Query (inside Power BI) to transform data, building a data
model (multiple tables linked by relationships, like a simplified database), creating Measures using DAX
(e.g., a measure for Total Sales or Year-over-Year Growth), and designing visuals: bar charts, line charts, pie
charts, maps, tables, cards (KPIs), etc. Power BI offers a rich set of visuals, including custom visuals – typical
ones you’ll use are bar/column charts for comparisons, line charts for trends, pie or donut charts for parts
of a whole, maps for geographic data, and cards/gauges for single metrics 21 . You should also know about
slicers and filters (ways for the user or report designer to filter data in visuals), and the basics of the Power
BI service (to publish and share reports). Don’t worry – you can start creating useful dashboards in Power BI
with relatively little coding, mostly dragging and dropping fields onto visuals and writing basic formulas.
We’ll do a step-by-step case study to illustrate this.
DAX

What it is: DAX stands for Data Analysis Expressions, and it’s the formula language used in Power BI,
Power Pivot, and Analysis Services Tabular (i.e., Microsoft’s analytical databases). If you think of Power BI as
Excel on steroids, then DAX is like Excel formulas on steroids. It looks a bit like Excel formula syntax (and
includes many familiar functions like SUM, AVERAGE, etc.), but it’s designed to work with relational data
(multiple tables) and to perform dynamic aggregations based on filter context. You use DAX to create
measures (calculations like Total Sales, Average Discount, Conversion Rate) and calculated columns or tables
in your data model. Learning DAX is key to unlocking the full power of Power BI – it’s what allows you to
define custom metrics, business KPIs, and complex calculations that update based on user selections in the
report.
Story & Purpose: DAX was developed around 2009 when Microsoft created PowerPivot. In essence, they
wanted to give Excel users a way to perform calculations on the new in-memory data model that
PowerPivot introduced. Traditional Excel formulas work cell-by-cell, but in a data model you need formulas
that work on entire columns and respect relationships between tables. DAX was thus created as a blend of
Excel functions and database querying capabilities (borrowing some concepts from SQL and from MDX, an
older analysis language for OLAP cubes) 22 . It was first released in 2009 with the PowerPivot for Excel 2010
add-in 23 . The language has evolved since but remains focused on analytic calculations. A simple example:
in Excel you might do SUM(B2:B100) to sum cells; in DAX, you define a measure Total Sales =
SUM(Sales[Amount]) – similar idea, but Sales[Amount] refers to the entire column [Amount] in the
Sales table, and the result of the measure will depend on the context (e.g., if you put Total Sales in a chart by
Region, it will sum only the rows for that region). DAX introduced powerful functions like CALCULATE
(which lets you modify filter context), time intelligence functions (e.g., calculate Year-to-Date or Year-over-
Year with ease), and functions to traverse relationships (e.g., RELATED or FILTER across tables). It’s
considered easy to learn the basics (sums, ratios) but has depth to solve very complex analytics (like
“percentage of total of category X for customers who bought product Y” – these types of things are doable
in DAX with the right pattern).
Why it matters for you: If you aim to create job-level dashboards in Power BI, you will inevitably need some
DAX. Many everyday metrics are not built in and require it: for example, a measure for Conversion Rate = (Count of converted
leads) / (Count of total leads) or Growth % = (This period sales – last period sales) / last period sales. These are
not hard-coded in Power BI – you create them with DAX. The good news: you can start with basic DAX and
gradually learn patterns. As a sales/marketing analyst, common DAX measures might include: Total Sales,
Total Units, Average Deal Size, Number of Customers, % of Total Market Share, Running Total (cumulative
sales over months), Year-over-Year Growth, etc. DAX shines in scenarios like “compare this period vs previous
period”, “show top 5 products by sales”, “calculate ratio metrics like conversion or churn”. For interview prep, you
should at least understand what DAX is – a collection of functions and operators used in formulas to
calculate custom results in Power BI’s data model 24 .
Key features to know: The concept of a Measure vs a Calculated Column: a measure computes on the fly
depending on filter context (e.g., Total Sales changes whether you look at 2023 or 2024 data), while a
calculated column is like an extra data field added to a table (computed row by row, like adding a “Profit =
Sales – Cost” column). Most of your core KPIs will be measures. Know a few fundamental DAX functions:
SUM, COUNTROWS, and AVERAGE are straightforward aggregators. CALCULATE is the most important
function – it allows you to apply filters inside a measure (e.g., CALCULATE(SUM(Sales[Amount]),
Sales[Region]="East") would compute sales only for East region). Time intelligence functions like DATEADD
or SAMEPERIODLASTYEAR help with period comparisons. Also understanding filter context (filters
coming from rows, columns, slicers in a report) and row context (used in calculated columns or when
iterating). At a beginner level, you might start by writing simple measures and using implicit measures
(Power BI can auto-sum a field), but to stand out in a role, you will want to demonstrate you can write your
own DAX measures for business logic. Don’t worry – we’ll use a bit of DAX in the case study, and the cheat
sheet will list a few crucial DAX formula examples.
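DAX itself runs only inside Power BI / Power Pivot, but if analogies help, here is a rough pandas sketch (toy data invented for illustration) of the two ideas above – a calculated column computed row by row versus a measure re-evaluated per filter context:

import pandas as pd

sales = pd.DataFrame({
    'Region': ['East', 'East', 'West'],
    'Amount': [100, 200, 150],
    'Cost':   [60, 120, 90],
})

# Calculated column: computed row by row and stored with the table
sales['Profit'] = sales['Amount'] - sales['Cost']

# Measure-like behavior: the same SUM re-evaluated in each "filter context"
grand_total = sales['Amount'].sum()                   # no filters: 450
per_region = sales.groupby('Region')['Amount'].sum()  # East: 300, West: 150

# CALCULATE-like behavior: aggregate under an explicitly modified filter
east_only = sales.loc[sales['Region'] == 'East', 'Amount'].sum()  # 300

The rough DAX equivalents would be Profit = Sales[Amount] - Sales[Cost] as a calculated column and CALCULATE(SUM(Sales[Amount]), Sales[Region] = "East") as a filtered measure.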
SQL

What it is: SQL (Structured Query Language) is the standard language for interacting with relational
databases. Whenever data is stored in a database (think: your company’s sales transactions in an Oracle or
MySQL database, or customer info in a SQL Server database), analysts use SQL to retrieve and manipulate
that data. In simple terms, SQL lets you ask questions of the data like “Hey database, give me all sales from
2023, grouped by product category, where region is ‘West’”. SQL queries are composed of statements like
SELECT, FROM, WHERE, GROUP BY, etc., which you combine to filter, aggregate, and join data across tables.
Even if you haven’t coded before, SQL is quite readable and highly valuable for analysts – it’s how you pull
raw data for further analysis in Excel, Power BI, or Python. Many analyst roles expect at least basic SQL
proficiency because companies have their data in databases or data warehouses.
Story & Evolution: SQL’s history goes back to the very foundations of modern data management. In 1970,
Edgar F. Codd at IBM proposed the relational model for databases (data organized in tables with rows and
columns) 25 . To implement this, IBM created a language called SEQUEL (Structured English Query
Language) in the early 1970s as part of System R, a prototype relational database. SEQUEL was later
shortened to SQL (though many people still pronounce it “sequel”) 26 . The first commercial SQL database was
released by Oracle in 1979 27 , and by the mid-1980s SQL was adopted as an ANSI/ISO standard language
for relational databases. The beauty of SQL is its longevity – here we are, over 40 years later, and SQL is still
the de facto way to work with structured data. All major database systems (Oracle, SQL Server, MySQL,
PostgreSQL, etc.) use SQL (with minor dialect differences). Over time, SQL has evolved with more features
(like window functions, common table expressions, etc.), but its core
(SELECT ... FROM ... WHERE ...) remains. It’s noteworthy that SQL was designed to be declarative –
you state what you want (e.g., “sum of sales by region”) and the database engine figures out how to get it.
This makes it accessible even to non-programmers, as long as you learn the syntax.
Why it matters for you: In many analytics jobs, the data you need lives in a database, not a neat Excel
file. For example, a company might have a sales table with millions of records on a SQL Server – you can’t
(and shouldn’t) export all that to Excel. Instead, you’d write a SQL query to get exactly the subset you need:
maybe total sales by month for the last 2 years, or a list of customers who purchased specific products.
Knowing SQL means you can self-serve your data needs without always asking a data engineer. Even if you
primarily use Power BI or Excel, under the hood those tools often use queries to fetch data. Many Power BI
data models begin with the analyst writing a SQL query in Power Query to import the data. During interviews,
it’s common to be asked to demonstrate SQL knowledge – e.g., “How do you join two tables?”, “What is the
difference between WHERE and HAVING clauses?”, or “Write a query to find the top 5 products by revenue.” Being
comfortable with SELECT statements, JOINS, and basic aggregations will go a long way. Also, if you ever use
Python or R for analysis, they often pull data from databases with SQL. So think of SQL as the key to unlock
the company’s big data sources.
Key features to know: The basic SQL query pattern: SELECT [columns] FROM [table] WHERE [condition].
For example, SELECT Region, SUM(Sales) FROM SalesTable WHERE Year=2024 GROUP BY Region;
would give total 2024 sales per region. Understand SELECT (choosing columns), FROM (the table or
tables to query), WHERE (filter rows), GROUP BY (when aggregating), HAVING (filter after aggregation,
e.g., “HAVING SUM(Sales) > 1000000”), ORDER BY (sort results). Also, JOINs are crucial:
joining tables by keys (INNER join for matching data in both tables, LEFT join when you want all records
from one table regardless of matches, etc.). Example: joining a Product table to a Sales table to get product
names alongside sales amounts. Other handy things: Aggregating functions like COUNT, SUM, AVG, MIN,
MAX for reporting; LIKE for pattern matching in text; and knowing about subqueries or common table
expressions (for more advanced querying logic). For a business analyst, you likely won’t be writing
extremely complex stored procedures, but you should be able to do multi-table select queries to assemble a
data set for analysis. We will see an example of using SQL in our case studies (for instance, segmenting
customers via a SQL query). Remember, SQL is a transferable skill – once you know it, you can work with
any relational database, which is why it’s often listed in job requirements.
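For instance, here is one plausible answer to the “top 5 products by revenue” interview question, run from Python with the built-in sqlite3 module (the database file, table names, and columns are invented for illustration; note that LIMIT is SQLite/MySQL syntax – SQL Server uses SELECT TOP 5 instead):

import sqlite3

conn = sqlite3.connect('sales.db')  # hypothetical database file

query = """
SELECT p.ProductName,
       SUM(s.Amount) AS Revenue
FROM SalesTransactions s
INNER JOIN Products p ON p.ProductID = s.ProductID
GROUP BY p.ProductName
ORDER BY Revenue DESC
LIMIT 5;
"""

# Each result row is a (product name, revenue) tuple
for name, revenue in conn.execute(query):
    print(name, revenue)

conn.close()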
Python

What it is: Python is a general-purpose programming language that has become one of the top languages
for data analysis, data science, and automation. Unlike the other tools discussed (which are mostly specific
to data tasks), Python is a full-fledged programming language (you can build web apps or do automation
with it), but it has fantastic libraries for data work. For a business analyst, Python can complement the
above tools in various ways: you can use it to automate repetitive tasks (e.g., reading 100 Excel files and
consolidating them), perform statistical or text analysis that Excel/Power BI can’t, or handle big data that
won’t fit in Excel. Important libraries (add-ons) that make Python useful for analytics include pandas (for
data manipulation, like Excel in code), NumPy (for numerical computing), Matplotlib/Seaborn (for plotting
graphs), and others like scikit-learn (for machine learning, if you venture there). The great thing is Python
is open-source (free) and has a huge community, so for almost any problem, there’s likely a library or
solution out there.
Story & Evolution: Python was created by Guido van Rossum and first released in 1991 28 . It was designed
with an emphasis on code readability and simplicity (“executable pseudocode” was a guiding idea). The
name “Python” actually comes from the British comedy Monty Python’s Flying Circus, not the snake 29 , which
reflects Guido’s fun approach. For the first couple of decades, Python was popular among software
developers and in academia, but not specifically in business analytics. However, in the 2010s, the rise of data
science changed that. Libraries like pandas (released around 2010) made it easy to manipulate
spreadsheets of data in Python, and Jupyter Notebooks provided an interactive way to write code and see
results, which analysts and scientists loved. Python’s popularity exploded as it became the language of
choice for machine learning, data analysis, and automation. It consistently ranks as one of the most popular
programming languages in the world 30 , fueled by its versatile ecosystem. Companies like Google,
Facebook, and Netflix use Python extensively for data analytics and backend services. Importantly, Python
has a shallow learning curve for basic tasks – many find its syntax more readable than, say, R or SQL. As of
2025, Python remains widely used for data analysis, processing, and machine learning, thanks to its
simplicity and huge community support 31 . For an analyst, learning Python is like adding a superpower:
you can do things that might be very hard or impossible in Excel/Power BI, such as complex statistical
modeling, text mining (e.g., analyzing customer comments), or seamlessly integrating data from the web or
APIs.
Why it matters for you: While you can get an analyst job without Python, having it on your resume is a big
plus. It signals that you can automate tasks and potentially work with data scientists or engineers more
effectively. In practical terms, you might use Python to do things like: scrape data from a website (maybe
gather competitor pricing or social media metrics), clean a messy dataset using pandas (sometimes faster
and more robust than Excel for large data), perform a what-if simulation or statistical test, or even build a
simple predictive model (like forecasting next month’s sales using a regression). For example, say marketing
provides you 10,000 customer survey responses – in Excel, analyzing text responses is painful, but in Python
you could quickly do a word frequency analysis or sentiment analysis with the right libraries. Python can
also directly connect to databases and spreadsheets, so you can pull data with SQL in Python, do
transformations, and output a report – all in one script. Many analysts start using Python to automate the
boring stuff: if you’re doing a tedious Excel process every week, you can write a Python script to do it and
schedule it to run automatically. In interviews, you might be asked if you have experience with Python or
how you’ve used it. Even a basic example like “I wrote a Python script to automatically clean and aggregate
our sales data Excel files, which saved me hours each month” will impress, as it shows initiative and
technical ability.
Key features to know: Python itself has simple syntax (for instance, a = 5 assigns a value, loops are
written as for x in range(10): etc.). As an analyst, focus on the pandas library: know how to read
data (pandas.read_excel or read_csv), the concept of a DataFrame (pandas’ table structure,
analogous to a worksheet), and how to do operations like filtering (df[df['Region']=='East'] gives
only East region rows), computing new columns (df['Profit'] = df['Sales'] - df['Cost']),
grouping and aggregating (df.groupby('Region')['Sales'].sum() gives total sales by region), and
merging data (joining two DataFrames). Also, using Jupyter Notebooks is common for data analysis – it lets
you run code and see output (data or charts) step by step, which is great for exploration. Additionally, basic
visualization: with pandas you can quickly plot a chart, or use Matplotlib/Seaborn for more control.
Another thing to know: Python can interface with SQL – libraries like sqlite3 or SQLAlchemy allow running
queries, so sometimes you use a mix: SQL to get raw data, Python to further analyze and visualize. Don’t
worry if this sounds like a lot – you don’t need to be a software engineer, just comfortable writing and
running scripts for data tasks. We will outline a simple case in the next section to illustrate how Python
might be used in a business scenario.
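Putting several of those pandas operations together, a minimal end-to-end sketch might look like this (the file names and columns are placeholders, not a real dataset):

import pandas as pd

df = pd.read_csv('sales.csv')                       # read the data
east = df[df['Region'] == 'East']                   # filter rows
df['Profit'] = df['Sales'] - df['Cost']             # compute a new column
by_region = df.groupby('Region')['Sales'].sum()     # aggregate: total sales by region

# Merge (join) a second table, e.g. regional targets
targets = pd.read_csv('targets.csv')
combined = df.merge(targets, on='Region', how='left')

print(by_region)

Even a ten-line script like this replaces a surprising amount of manual spreadsheet work, and it can be re-run unchanged on next month’s file.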
Real-World Case Studies

Let’s now apply these tools to three realistic scenarios:

1. Sales Performance Dashboard – using Excel, Power Query, Power BI, and DAX to create an
interactive sales dashboard.
2. Customer Segmentation Analysis – using SQL (and a bit of Excel) to segment customers and derive
insights on purchasing behavior.
3. Marketing Campaign Analysis – using Python to analyze and visualize the results of a marketing
campaign (e.g. email marketing data), including a step-by-step solution.
Each case will also highlight the thought process – how to approach the problem, not just the final answer.
This is important, because in job settings and interviews, explaining how you arrived at an insight can be as
crucial as the insight itself.
Case Study 1: Sales Performance Dashboard

Problem: You are a newly hired Sales Analyst at a retail company. You’ve been given a dataset of sales
transactions for the past two years, along with some product and store information. Management wants a
dashboard that shows key sales metrics and allows them to drill down by time period, product category,
and store region. Essentially, they want to answer questions like “Which regions are performing best this
quarter?”, “What are our top-selling product categories overall and by region?”, and “How are this year’s
sales trending compared to last year?”. They also mentioned interest in seeing if external factors like
holidays or even weather (temperature) had any correlation with sales dips or spikes.
Data: You have three CSV files – one for Sales (each row is a transaction with date, product ID, store ID,
quantity, revenue), one for Products (product ID, category, price), and one for Stores (store ID, region,
maybe size). Additionally, someone provides you a CSV of average monthly temperatures by region (to
explore the weather angle). The sales data has about 100,000 rows (so manageable in Excel or Power BI).
There are missing values in some fields (some transactions missing store ID due to data entry issues, etc.),
and product categories need standardizing (e.g., “Electrnics” vs “Electronics” spelling).
Approach: We’ll use Power Query in Excel to clean and combine the data, do some initial analysis in Excel
(like pivot tables to validate numbers), then load it into Power BI to build the dashboard, using DAX to
create measures for year-over-year comparisons and other metrics.
1. Data Cleaning & Preparation (Power Query in Excel): First, import the Sales CSV into Excel via
Power Query. In the Power Query Editor, perform transformations: remove any obviously erroneous
rows (e.g., transactions with negative quantities), fill in missing Store IDs if possible (maybe using
product or date info if it’s recoverable, or otherwise mark as “Unknown”), standardize the Product
categories text (we can do a transform to capitalize or correct common misspellings). Also, merge in
the product and store info: use Merge Queries to bring Category from Products into the Sales table
(matching on Product ID), and Region from Stores (matching on Store ID). This way, the Sales table
query ends up enriched with Category and Region columns for each sale – making it easy to
aggregate by those. We also merge in the weather data: perhaps add an AverageTemp column to
each sale by matching on Region and Month of sale. (In Power Query, we might have to derive
Month from the date, then merge with the weather table on Region+Month). After transformations,
we load the cleaned data into Excel’s Data Model. The Power Query steps are recorded, so if we get
fresh data next month, we could refresh and apply the same cleanup automatically 32 33 .
2. Data Model & Initial Pivot (Excel): In this case, we might use Excel’s Data Model (Power Pivot) to
relate tables, but since we merged everything into one table for simplicity, we can directly create a
PivotTable on the result. In Excel, we make a quick PivotTable to sanity-check the data: for example,
sum of revenue by year – do the totals make sense? We find 2024 total sales = $5.2M, 2023 = $4.8M
(just hypothetical). We note seasonality – pivot by month shows dips in certain months. This is just to
get familiar with the data. Now, rather than trying to make a complex dashboard in Excel (which is
possible but less interactive), we decide to use Power BI for the final reporting.
3. Building the Dashboard (Power BI): Open Power BI Desktop, get data -> import the cleaned Sales
table (we could either import the original raw files and redo cleaning in Power BI’s Power Query, or
since we perfected it in Excel, one trick: we can connect Power BI to the Excel file’s Data Model or
output Power Query to a worksheet and import that). Let’s assume we import the raw CSVs into
Power BI and apply similar Power Query steps there (just to show it can be done either place). Once
data is in Power BI, we check relationships: if we kept separate tables (Sales, Products, Stores,
Weather), we ensure relationships: Sales[ProductID] -> Product[ID], Sales[StoreID] -> Stores[ID], and
perhaps Sales on Region+Month -> Weather (if we had a composite key; or we could merge Weather
in Query as we did). For simplicity, assume we have one flat table in the model (SalesMaster). Now,
create DAX measures for key metrics, e.g.:

Total Revenue = SUM(SalesMaster[Revenue])
Total Quantity = SUM(SalesMaster[Quantity])
Avg Transaction Value = DIVIDE([Total Revenue], COUNTROWS(SalesMaster))

(or use average per transaction if available). For year-over-year, we might create a measure that shifts
the date filter one year back, then a growth measure on top of it:

Revenue PY = CALCULATE([Total Revenue], DATEADD(SalesMaster[Date], -1, YEAR))
YoY Growth % = DIVIDE([Total Revenue] - [Revenue PY], [Revenue PY])
We also add a measure for “Temperature Correlation” if desired – perhaps not a simple measure, but
we can visualize temperature vs sales by month to see patterns.
4. Designing visuals: On Power BI report canvas, create a summary dashboard page. For example: a
card visual showing Total Revenue this year, another card for YoY Growth (formatted as a
percentage). A line chart showing monthly revenue for 2 years (with last year and this year lines,
using the Revenue and Revenue PY measures). A bar chart for Top 5 Categories by Revenue
(Category on axis, Revenue on value, sorted desc). A map visual showing sales by Region (using
built-in maps: place Region as location, Revenue as size/color – or use a filled choropleth map).
Possibly a clustered bar chart showing each region’s revenue and maybe number of stores to
contextualize. We also include slicers: perhaps a slicer for Year (so we can filter the whole report to
2024 or 2023 or compare), and one for Region (so they can focus on a specific region). With a few
clicks, we format it nicely, add titles like “Sales Dashboard – as of Q4 2024”.
5. Adding interactivity & final touches: We enable drilling on the charts (e.g., on the category bar
chart, allow drill down to the product level if needed). We add a tooltip on the line chart to show both
year’s values when hovering. If we included the temperature data, maybe we add a scatter chart: X-
axis = Average Temperature, Y-axis = Sales, with month as points, to see if there’s a correlation
(perhaps for seasonal products). In this fictitious scenario, say we discover that sales dip in extremely
hot months for our product line; that insight could be shown by that chart, or we could note it
separately.
6. Findings: The dashboard reveals a few interesting points. For example: Overall sales are up 8% YoY
(growth % shown on the card). The East region is the highest revenue but had a dip in June (which
stands out on the line chart – indeed June sales dropped significantly while other months followed
last year’s trend). The category chart shows “Electronics” and “Home Appliances” are top categories,
but the map shows the West region underperforming in Electronics compared to others. By
interacting, management can see for themselves these insights – e.g., they click West region, and
see that Electronics sales in West are much lower and maybe declining, signaling a potential issue
(perhaps a regional competitor). We also see from our scatter chart that in the hottest month (June),
sales fell – could heat waves be affecting foot traffic? The CMO might hypothesize that an external
factor like weather or a missed marketing campaign in June caused the dip. Our dashboard doesn’t
prove causation, but it aids asking the right questions.
One example from a real portfolio project: an analysis of Walmart store sales used Excel and Power BI
together to uncover trends. The analyst created dashboards illustrating sales trends and store
performance, and identified key factors like temperature, CPI (consumer price index), and holidays that
impacted sales 34 . This is very similar to what we’ve done in this case – integrating an external factor
(temperature) and internal sales data to explain performance. The result of such a case study is not just a
pretty dashboard, but a narrative: “Sales were growing, but a notable exception was June where revenue fell
sharply, especially in the East region. Further analysis suggests extreme heat that month coincided with the drop,
and our largest category (Electronics) saw the biggest decline in East – possibly indicating weather-sensitive
footfall or supply issues. Overall annual growth is positive, led by strong performance in other months and
regions. We recommend investigating the June anomaly and ensuring mitigation (like summer promotions or e-
commerce push in hot periods) moving forward.” By presenting both the visual dashboard and the insights,
you demonstrate technical skill and business acumen.
Tools used: Excel (for initial data merging and validation), Power Query (to automate data cleaning), Power
BI (for visualization), DAX (for calculations in the dashboard). This end-to-end example shows a lot of
moving parts, but each part is manageable. If you recreate such a project, you’d solidify a ton of skills – and
this could become a portfolio piece demonstrating you can turn raw sales data into actionable insights with
modern BI tools.
Case Study 2: Customer Segmentation Analysis

Problem: A marketing manager wants to understand customer purchasing behavior to target promotions
better. Specifically, they ask: “Can we segment our customers into groups based on their purchase patterns and
value? For example, identify high-value repeat customers vs. one-time buyers, or see if membership subscribers
buy more often than non-members.” They also are curious about customer lifetime value (LTV) – how much
revenue a customer brings over time – and whether enrolling in a loyalty membership correlates with
higher sales. As the analyst, you are tasked with mining the transaction database to pull out these insights.
Data: Suppose there’s a Customer table in a database (with customer ID, demographics, and a flag if they
are a member of the loyalty program) and a Sales Transactions table (each sale has customer ID, date,
amount, product, etc.). The transactions table is large (maybe 500k rows, over several years). Doing this
analysis purely in Excel would be tedious and possibly too slow, so you decide to use SQL to aggregate the
data, then bring a summary into Excel for final calculations and charts.
Approach: Use SQL queries to segment customers and compute key metrics like total revenue per
customer, number of purchases, first and last purchase dates, etc. Then classify customers based on those
metrics (either in SQL or after exporting to Excel). Finally, analyze those segments – e.g., average revenue
per customer in each segment, and how many are members vs. not.
1. Querying the Data (SQL): Start by writing a SQL query to get customer-level purchase stats. For
instance:
SELECT
c.CustomerID,
c.Name,
c.IsMember,
COUNT(DISTINCT s.TransactionID) AS NumPurchases,
SUM(s.Amount) AS TotalSpend,
MAX(s.Date) AS LastPurchaseDate,
MIN(s.Date) AS FirstPurchaseDate
FROM Customers c
JOIN SalesTransactions s ON c.CustomerID = s.CustomerID
GROUP BY c.CustomerID, c.Name, c.IsMember;
This query joins customers with their sales and aggregates. It yields one row per customer with total
spend, number of purchases, and their first/last purchase dates. Let’s say we execute this and export
the result (say 50,000 customers) as a CSV or directly import into Excel (Power Query can also run a
SQL query and fetch data).
We might add more to the query: for LTV calculation, maybe we define a time range (like last 2 years) or
incorporate returns if any (assuming none for simplicity). Also, perhaps compute the average purchase value
( AVG(s.Amount) ) per customer in SQL too. But it’s fine to do some calculations in Excel after we get the
aggregated data.
2. Bringing Data into Excel: Using Power Query or a direct ODBC connection, we bring the query
results into Excel. Now we have (CustomerID, IsMember, NumPurchases, TotalSpend,
FirstPurchaseDate, LastPurchaseDate, etc.) for all customers. We add a column in Excel to calculate
each customer’s Active Years (e.g., year of last purchase minus year of first purchase + 1) – this gives
a sense of span. Another column for Average Spend per Purchase = TotalSpend / NumPurchases
(to see if they do big one-off purchases or many small ones).
3. Segmentation Logic: Now, define criteria for segments. Let’s keep it straightforward and categorize
each customer into one of four groups:
- Champions: NumPurchases > 5 and TotalSpend > $5k (they buy frequently and spend a lot)
- High-Spenders: TotalSpend > $5k but NumPurchases <= 5 (few large purchases)
- Frequent Shoppers: NumPurchases > 5 but TotalSpend <= $5k (many small purchases)
- Others: the rest (low frequency, low spend)
This is just an illustrative segmentation. In Excel, we add a column with something like:
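(A plausible nested-IF version of that rule – assuming, for illustration, NumPurchases sits in column D and TotalSpend in column E of your sheet; adjust the references to your layout:)

=IF(AND(D2>5, E2>5000), "Champions",
 IF(E2>5000, "High-Spenders",
 IF(D2>5, "Frequent Shoppers", "Others")))

Fill the formula down the whole customer list to tag every row with a segment.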
4. Analyzing Segments: Create a few PivotTables or charts: for example, a pivot of Segment vs Count
of Customers, and average TotalSpend per segment. Suppose results show: 500 Champions, 800
High-Spenders, 1000 Frequent, 47,700 Others (just hypothetical). Champions account for a large
portion of revenue despite being few. We also analyze the IsMember flag by segment: maybe use a
PivotTable with Segment as rows and % of members in each. We find, for example, 80% of
Champions are loyalty members, but only 10% of Others are members – indicating membership
does correlate with more purchases (which makes sense). Another analysis: use conditional
formatting or a scatter plot in Excel: plot NumPurchases vs TotalSpend for all customers, perhaps
highlighting members vs non-members in different colors. We might see two clusters: members
(clustered higher in both axes) and non-members (many low spenders). This addresses the question
of loyalty impact: clearly, members on average spent more and shopped more often (for instance,
members averaged $1,200 annual spend vs $300 for non-members). We cite an actual example:
one academic project did exactly this kind of analysis using SQL – they segmented retail
customers, calculated LTV, and analyzed membership impact on sales using CASE statements and
SQL logic 35 36 . Our findings echo that: being a member is
associated with higher lifetime value.
5. Presenting Insights: We summarize: “We segmented customers into four groups. The ‘Champion’
segment (our best customers) is only ~1% of customers but contributes 20% of revenue. Most of them are
loyalty members. The ‘High-Spender’ segment makes infrequent but large purchases – perhaps big ticket
items; targeted high-end promotions might work for them. ‘Frequent’ shoppers are engaged but low
average spend – maybe they respond to frequent deals or small items; upselling could increase their value.
The majority (95%) are in ‘Others’ with low engagement – these might be one-time or occasional bargain
hunters; converting some of them to loyal repeat buyers is our growth opportunity. Importantly, loyalty
program members have an average lifetime spend 4x that of non-members, highlighting the program’s
effectiveness. We should consider strategies to convert more customers into the loyalty program or provide
incentives for repeat purchases.” Backing these statements with numbers (from our pivot analysis) and
perhaps a simple chart (like a bar chart of average spend by segment, and a pie of revenue
contribution by segment) makes it convincing.
Tools used: SQL (to efficiently crunch the transactional data), Excel (to further manipulate and visualize
segment data), possibly Excel’s PivotTables and charts for the presentation. This case shows how an analyst
uses SQL to do heavy lifting (imagine writing an equivalent in pure Excel – it would be messy and slow), then
uses Excel’s strengths (flexibility, manual inspection) to refine and present the results. It answers concrete
business questions about customer behavior and ties it to actionable insights like focusing on loyalty
programs. The combination of SQL for data prep and Excel for analysis is very common in analytics roles.
Case Study 3: Marketing Campaign Analysis

Problem: The marketing team ran an email campaign and a social media ad campaign last quarter, and
they’ve collected data on how many customers engaged and how it translated to sales. They want to know:
“Did our campaigns significantly lift sales or engagement? What patterns can we find in customer responses? And
how can we better target customers next time?” Essentially, this involves analyzing campaign data, possibly
doing an A/B test analysis or just before-after comparison. Let’s assume we have two datasets: one for the
Email Campaign (listing which customers were emailed, whether they opened/clicked, and if they ended up
making a purchase after the email) and one for Social Ads (aggregated data by day or by ad, showing
impressions, clicks, conversions). We want to derive insights like the conversion rate of each channel, any
customer segments that responded better, and also perhaps predict which customers are more likely to
engage in future (though we won’t go deep into predictive modeling here, just exploratory).
Data: For simplicity, let’s say we have a CSV of email campaign results with fields (CustomerID, Age, Region,
EmailOpened (Yes/No), EmailClicked (Yes/No), PurchaseMade (Yes/No), RevenueIfPurchased). Similarly, for
social ads, we might have daily data (Date, Platform, Impressions, Clicks, Purchases, Spend). The data might
not be huge – maybe 10,000 customers emailed, and summary of social ads over 90 days.
Approach: We’ll use Python (pandas) to aggregate and analyze this data because it’s well-suited for mixing
data manipulation and some statistics. We can also easily calculate rates and even do a statistical test if
needed (e.g., compare conversion of those who got email vs those who didn’t – if a control group exists).
We’ll then produce a few plots using Python libraries to visualize the funnel (e.g., how many opened vs
clicked vs purchased) and the performance over time.
1. Reading and Cleaning Data (Python): Using pandas, read the CSVs: df_email =
pandas.read_csv('email_campaign.csv'). Check for missing values
(df_email.isnull().sum()) – maybe fill or drop if insignificant. If fields like Yes/No are present,
convert them to boolean True/False for easier analysis (pandas can map {"Yes": True, "No": False}).
Similarly read the social data: df_ads = pandas.read_csv('social_ads.csv'). Ensure data
types are correct (Date as datetime, etc.).
2. Email Campaign Analysis: With df_email, we can quickly compute overall metrics: open rate =
number of opens / total emailed. Click rate = number of clicks / total, conversion rate = number of
purchases / total (or specifically, those who purchased after email). For example:
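A minimal pandas sketch of those calculations, assuming the Yes/No fields were converted to booleans in step 1 (the mean of a boolean column is the share of True values):

total_emailed = len(df_email)
open_rate = df_email['EmailOpened'].mean()
click_rate = df_email['EmailClicked'].mean()
conversion_rate = df_email['PurchaseMade'].mean()

print(f"Emailed {total_emailed}: open {open_rate:.0%}, "
      f"click {click_rate:.0%}, conversion {conversion_rate:.0%}")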
Suppose we get open_rate = 40%, click_rate = 10%, conversion_rate = 5%. That means out of 10k
emails, 4k opened, 1k clicked, 500 purchased. That’s a funnel we should report. We can also break it
down by segment: e.g., does age or region affect these rates? Using pandas groupby:
df_email.groupby('Region')['PurchaseMade'].mean() might show that customers in the
East had an 8% conversion vs West 3% – maybe due to differing preferences or campaign resonance.
Or df_email.groupby('AgeGroup')['EmailOpened'].mean() if we bucket ages. Let’s say we
find younger customers (<30) opened less frequently (maybe they ignore emails) but if they do open,
they click more often, etc. We also check the average revenue from those who purchased:
avg_revenue = df_email[df_email['PurchaseMade']==True]['RevenueIfPurchased'].mean().
Also, how much total revenue did the campaign bring? Sum
of RevenueIfPurchased for PurchaseMade True. This might be compared to campaign cost if known
to evaluate ROI.
If we had a control group (customers who didn’t get the email), we could compare their purchase rate in
that period to see the lift. If not, we assume all purchases recorded were campaign-influenced.
3. Social Ads Analysis: Using df_ads, calculate overall click-through rate (CTR) = total clicks / total
impressions, and conversion rate = total purchases / total clicks (or impressions). Also, maybe
compute cost per click (CPC = total spend / total clicks) and cost per acquisition (CPA = total spend /
total purchases). These are key marketing metrics. For example, if over 90 days we spent $50k, got
1,000,000 impressions, 50,000 clicks, and 2,000 purchases: CTR = 5%, CPC = $1 per click, conversion
from click = 4%, CPA = $25 per purchase. We can also see which platform (if data split by Platform,
say Facebook vs Google) performed better:
df_ads.groupby('Platform')[['Clicks','Impressions','Purchases','Spend']].sum(), then calculate CTR and CPA per
platform. We might find Facebook had higher CTR but lower conversion, etc. Also, see trends over
time: maybe the campaign started strong but fatigued – we can plot
df_ads.groupby('Date').sum() and see purchases by date. Using Matplotlib, we plot Date vs
Purchases to visualize if there’s a spike when emails were sent or ads refreshed.
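A short sketch of those per-platform calculations, assuming df_ads has the columns described in the Data section:

# Totals per platform, then the derived marketing metrics
by_platform = df_ads.groupby('Platform')[['Impressions', 'Clicks', 'Purchases', 'Spend']].sum()
by_platform['CTR'] = by_platform['Clicks'] / by_platform['Impressions']
by_platform['CPC'] = by_platform['Spend'] / by_platform['Clicks']
by_platform['CPA'] = by_platform['Spend'] / by_platform['Purchases']
print(by_platform)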
4. Visualization (Python): We create a few charts. For example, an email funnel chart: a simple bar
chart showing # of Opens, # of Clicks, # of Purchases out of 10k emailed. This visualizes drop-off. In
code, we can do:
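One possible matplotlib sketch, using the illustrative funnel counts from the rates above:

import matplotlib.pyplot as plt

stages = ['Emailed', 'Opened', 'Clicked', 'Purchased']
counts = [10000, 4000, 1000, 500]

plt.bar(stages, counts)
plt.title('Email Campaign Funnel')
plt.ylabel('Number of customers')
plt.show()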
The result is a simple bar chart where Emailed =
10000, Opened = 4000, Clicked = 1000, Purchased = 500. Next, perhaps a pie or bar chart comparing
conversion rate by Region or AgeGroup to highlight any differences (if, say, East region converted 2x
West, that’s noteworthy). Also, a time series line chart for social ad conversions per week to see if
certain weeks performed better – perhaps annotate if new content was introduced.
If we want to get fancy (optional), we could even use a statistical test (like chi-square) to see if differences in
conversion by segment are significant or if the email vs no-email difference is significant (if we had a
control).
1. Insights & Recommendations: Our analysis might show: The email campaign yielded a 5%
conversion which is decent. Most purchases came from older customers (say age 40+ had 7%
conversion vs under 30 had 2%). This suggests maybe email is more effective with an older
demographic; younger consumers might respond better to other channels (perhaps social). Region-
wise, East responded more – maybe because content was more relevant to them or timing matched
their timezone better, etc. For social ads, we find Facebook ads had a lower CPA than Google ads,
implying better ROI on Facebook for this product. Or we see that after an initial boost, ad
performance dropped – recommending refreshing creative every 4-6 weeks to avoid ad fatigue.
We could combine these to say: “The email campaign engaged 50% of recipients (open or click) and converted
5% to sales, generating $X revenue (ROI of Y% given campaign cost). The highest engagement was among the 40-50
age group and the East region. We recommend tailoring future email content to younger audiences or using channels
like social media for them, as they were under-engaged via email. The social media ads drove an additional Z
purchases at a cost per acquisition of $25. Facebook outperformed Google in CTR (6% vs 4%) and CPA ($20 vs
$30), so we should allocate more budget to Facebook next time. Overall, combining email and social, the
campaign lifted sales by an estimated 10% that quarter. For next quarter, we plan to A/B test different email
subject lines for younger vs older customers, and refresh ad creatives every month to maintain engagement.”
By doing this in Python, we not only got the numbers, but we automated the analysis steps. If next
campaign data comes in, we can run the same notebook, and it will spit out updated metrics and charts.
This is powerful in an interview or portfolio context: you could show a Jupyter Notebook with narrative,
code, and charts embedded, demonstrating your ability to derive insights programmatically.
Tools used: Python with pandas and matplotlib for analysis and visualization. We leveraged Python’s
strength in handling data and doing quick calculations (like conversion rates) across large datasets that
Excel might struggle with or require a lot of manual formula work. This case highlights that for more
complex analysis (mixing data sources, computing rates, slicing by multiple factors), a programming
approach can be more efficient. It’s also a taste of “data science lite” – we did descriptive analytics and a bit
of diagnostic (why differences by group) and maybe a hint of predictive (which group likely to respond).
These case studies show how the tools come together to solve business problems: Excel/Power BI for
creating sharable dashboards, SQL for heavy data lifting, and Python for flexible analysis and automation.
They also illustrate how to communicate findings – an essential part of an analyst’s role. As you learn, try
replicating parts of these: e.g., take a sample dataset and build a mini-dashboard, or write a simple SQL
query and see if you can load the result into Excel. Hands-on practice with case-like scenarios is the best
way to cement these skills.
Cheat Sheet: Excel

• Text Functions: CONCATENATE(text1, text2, ...) or TEXTJOIN – combine strings (useful
for creating IDs or labels). LEFT, RIGHT, MID – extract substrings. TRIM (remove extra spaces),
PROPER/UPPER/LOWER (change text case).
• Date Functions: TODAY() gives the current date; YEAR(A1), MONTH(), etc., extract parts of
dates. EDATE(start_date, months) adds months; DATEDIF(start, end, "Y") gets the
difference in years (or "D", "M" for days/months). Useful for age or tenure calculations.
• PivotTables: (feature) Quickly summarize data by dragging fields into Rows, Columns, Values (e.g.,
sum of Sales by Region by Quarter). It’s not a formula but essential – know how to create a
PivotTable and use Value Field Settings to choose sum, count, etc. PivotTables can also show % of
total, % difference from previous, and so on – great for quick analysis.
• Charts: Know how to create basic chart types: Column/Bar (for comparisons), Line (trends over time),
Pie (share of whole – use sparingly and only for few categories), Scatter (relationship between two
numeric variables). Also understand when to use which (e.g., don’t use pie for too many slices or for
precise comparisons).
• Data Cleaning Tools: Remove Duplicates (button on Data tab), Text-to-Columns (split one column
into multiple by a delimiter), Sorting & Filtering (to quickly find outliers or blanks). These are not
formulas but handy features.
• Professional Tips: Use Tables (Ctrl+T in Excel) to make your data ranges dynamic – formulas and
pivots update as data grows. Named ranges can make formulas easier to read. If working with large
data, learn about Excel’s limitations (~1 million rows; consider Power Pivot or Power BI for bigger
data). Also, save versions – Excel is prone to human error, so keeping backup copies is wise.
(Excel Top Functions: Average, Sum, Count, IF, VLOOKUP, etc., are among the most important to master 37 .
PivotTables and charts turn those numbers into insights.)
• Refreshing: Remember to click “Refresh” in Excel (or refresh in Power BI) to rerun the query. If
connecting to external data, ensure connection info is correct. You can set queries to refresh on file
open as well.
• Data Types: Check the data type icon in Power Query (123 for number, ABC for text, etc.). Make sure
numbers are number type, dates are date type. Mismatched types can cause errors or wrong sorts.
• Power Query in Power BI: It’s almost identical. In Power BI, go to Power Query Editor to shape data
before it lands in the model. In Power BI, after shaping, you usually “Close & Apply” to load into the
data model.
• When to use: Use Power Query when you have to repeat a data cleaning process or combine
multiple data sources. It shines for automating those mundane steps and handling larger data than
Excel formulas could.
• Data Modeling: In the model view, define relationships (usually Power BI auto-detects based on
column names). Know the difference between a fact table (e.g., Sales transactions) and dimension
tables (e.g., Products, Customers) and use one-to-many relationships properly (a common mistake is
not having a proper unique key on one side). If something doesn’t aggregate as expected, check
relationship directions and filtering (typically one-direction from dimension to fact).
• Basic DAX Measures:
◦ SUM(Column) – sums a numeric column (used for things like Total Sales =
SUM(Sales[Amount])) 38 .
◦ COUNTROWS(Table) or COUNT(Column) – count entries (COUNT vs COUNTROWS
differences aside, just know how to count). DISTINCTCOUNT(Column) – counts unique
values (e.g., number of customers).
◦ CALCULATE(Expression, filters...) – this is the workhorse for modifying context. E.g.,
CALCULATE([Total Sales], Calendar[Year] = 2023) gives you 2023 sales
regardless of what’s filtered in the visual (note: DAX boolean filters must reference a plain
column, such as a Year column in your date table, not an expression like YEAR(Calendar[Date])).
Often used with filter functions like CALCULATE([Total Sales], ALL(Customer)) to get the
total ignoring customer filters (for % of total calcs), or with time intelligence:
CALCULATE([Total Sales], DATEADD(Calendar[Date], -1, YEAR)) gives last year’s sales for the same period 23 .
◦ Time Intelligence (requires a date table): TOTALYTD([Measure], Calendar[Date]) –
year-to-date sum; SAMEPERIODLASTYEAR(Calendar[Date]) used inside CALCULATE to
get the same period last year; DATEADD as mentioned for shifting. If using a built-in date
hierarchy, some of these just work via quick measures.
◦ IF in DAX: You can use it much like in Excel, e.g., IF([Total Sales] > 100000, "High",
"Low"), but measures usually output numbers, not text categories (text measures don’t slice
well). More often, you’d create a calculated column if you need a category label like this
(but be cautious – too many calculated columns can bloat the model; measures are preferred for calculations).
◦ Common DAX Functions: AVERAGE, MIN, MAX (for measures if needed); DIVIDE(x, y)
– safer division (avoids divide-by-zero errors by returning blank or an alternate result);
FILTER(table, condition) – used inside CALCULATE or other functions to filter a table
(e.g., CALCULATE([Total Sales], FILTER(Product, Product[Category]="Electronics")) );
RELATED(column) – in a calculated column, fetch a value from a related table (like Excel
VLOOKUP, but within the model); ALL(table or column) – removes filters (for calculations
like % of All); ALLSELECTED(column) – similar but respects outer filters (for % of subtotal
scenarios). These are more advanced, but worth noting.
◦ DAX is very powerful (with functions for rolling averages, ranking, etc.), but start simple. If in
doubt, use a quick measure or build intermediate measures step by step (e.g., first measure =
total, second = last year’s total, third = the difference).
• Visuals and Design:
◦ Use a Card for single big number (e.g., Total Sales, KPIs).
◦ Clustered Bar/Column charts for comparing categories (e.g., sales by region, by product).
Clustered means side-by-side bars if there are multiple series; Stacked means one on top of the
other (stacked is for part-to-whole across categories or to show composition over time).
◦ Line charts for trends over time. You can combine line and column (e.g., column for sales,
line for % growth – dual-axis).
◦ Pie/Donut charts to show parts of a whole (limit to a few slices and label the percentages – e.g., market
share of 5 products). These are often overused; a bar chart is usually better unless you’re highlighting a
single percentage.
◦ Matrix (like a pivot table) for showing values by two categories (e.g., region and product).
◦ Map visuals for geographic data (filled map to color regions or bubble map to size by value –
be sure data has proper geo fields like country, state names).
◦ Slicers: Add slicer for any field you want the user to filter by (e.g., Year, Region). Slicers can be
dropdowns, lists, or even a slider for continuous data like date or number.
◦ Interactive Features: Enable tooltips (hover to see details), use drill-down in hierarchies (e.g.,
year -> quarter -> month), use Bookmarks and Buttons if you want to create custom
navigation or highlight scenarios.
◦ Formatting: Give titles to charts, label axes, maybe turn on data labels if it helps readability
(but not on every chart as it can clutter). Use consistent colors (e.g., the same color for the
same category across charts by using a custom palette or theme).
◦ Publishing/Sharing: In the Power BI Service, you can publish the report, pin visuals to a
dashboard, and share with colleagues. Know that the free tier requires uploading to the Power BI
web service and is mostly for personal use unless your org has Pro licenses.
• DAX vs Power Query: Use Power Query to shape data before loading; use DAX for calculations that
need to respond to user filters or be easily adjusted in visuals. For example, computing a new “Profit”
column might be done in Power Query (since it’s a straightforward row-by-row calculation: revenue
minus cost). But computing “% of total sales by region” is best done in DAX (because it’s
context-dependent in a visual).
• Learning tip: Many DAX problems follow patterns (like “top N with others” or “moving average”).
There are great online resources (Microsoft docs, community blogs) that provide formula
patterns. Understanding filter context is key: a measure is evaluated under the filters applied by the
rows, columns, and slicers in that report context 39 . If a measure isn’t behaving, think about what filters
are active and whether you need CALCULATE to modify them.
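If filter context still feels abstract, a rough pandas analogy for the “% of total” pattern may help. This is only a sketch of the idea, not how DAX works internally, and df, Region, and Amount are made-up names:
import pandas as pd

# Made-up sales rows for illustration
df = pd.DataFrame({'Region': ['East', 'East', 'West', 'North'],
                   'Amount': [500, 700, 300, 400]})

region_total = df.groupby('Region')['Amount'].sum()  # like [Total Sales] evaluated per visual row
grand_total = df['Amount'].sum()                     # like CALCULATE([Total Sales], ALL(Sales))
print(region_total / grand_total)                    # each region's share of the total
The groupby plays the role of the visual’s filter context; dropping it to take the grand total is roughly what ALL does.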
• SELECT & FROM: The core of any query. Example: SELECT FirstName, LastName, Age FROM Customers;
retrieves three columns from the Customers table. The result set will have as many rows as the table
(unless we filter or aggregate).
• WHERE: Filters rows based on a condition.
SELECT * FROM Sales WHERE Region = 'East' AND Year = 2024; – gets all columns for
sales in the East region in 2024. Use quotes for text, no quotes for numbers. Common operators:
=, >, <, <> (not equal), IN (list), BETWEEN (range), LIKE (pattern matching) –
e.g., WHERE Name LIKE 'A%' (names starting with A). Note: SQL string matching is often
case-insensitive, depending on collation. Use AND / OR to combine conditions (AND has higher
precedence than OR, so use parentheses when mixing them).
• GROUP BY & Aggregates: To summarize data. Non-aggregated columns in the SELECT must be listed in GROUP BY. E.g.,
SELECT Region, SUM(Amount) AS TotalSales FROM Sales GROUP BY Region; gives one
row per region 27 . Aggregates: SUM, COUNT, AVG, MIN, MAX. Use COUNT(*) to count rows,
or COUNT(DISTINCT CustomerID) to count unique customers (DISTINCT inside aggregates is
supported in most SQL dialects).
• HAVING: Filter on aggregated conditions (since WHERE can’t use aggregate results). For example,
... GROUP BY Region HAVING SUM(Amount) > 100000; would return only regions with total
sales > 100k.
• ORDER BY: Sort the results. E.g., ORDER BY TotalSales DESC to sort by that alias descending. If
multiple columns, e.g., ORDER BY Region ASC, Year DESC .
• JOINS: Combine data from multiple tables. Types: INNER JOIN (only rows that match in both tables),
LEFT JOIN (all rows from the left table plus matches from the right, with NULLs where there is no match),
RIGHT JOIN (vice versa – less common, since you can usually flip the tables), FULL JOIN (all rows from
both, matched where possible – not all databases support FULL JOIN). Syntax:
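SELECT c.FirstName, o.OrderID, o.Amount
FROM Customers AS c
INNER JOIN Orders AS o
ON c.CustomerID = o.CustomerID;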
Here c and o are aliases for the tables. The ON clause specifies how to match rows (the common keys). If
you need to join multiple tables, just chain multiple JOIN clauses. Be careful not to accidentally create a
Cartesian join (a missing or wrong ON clause can produce a huge result set).
• Subqueries: A query inside another. For example, SELECT * FROM Orders WHERE CustomerID
IN (SELECT CustomerID FROM Customers WHERE Country = 'USA'); . The inner query finds
all customers in the USA, and the outer query returns the orders of those customers. Subqueries can also
compute something you then select from as if it were a table (common table expressions or derived tables).
• Common Functions: Most SQL flavors have date functions (YEAR(date), DATEADD, DATEDIFF), string
functions (SUBSTR/LEFT, UPPER, TRIM), etc. For example, you might use
CONCAT(FirstName, ' ', LastName) to combine names, or CASE expressions for
conditional logic in a SELECT (like categorizing values). A simple CASE:
SELECT OrderID,
       CASE
           WHEN Amount > 1000 THEN 'High'
           WHEN Amount > 500 THEN 'Medium'
           ELSE 'Low'
       END AS OrderSize
FROM Orders;
This labels each order – analogous to Excel’s nested IF or a calculated column in DAX. The CASE
expression is very useful for bucketing in SQL (as in the customer segmentation SQL example 36 ).
• Comments: In SQL, -- starts a single-line comment. Use comments to annotate complex queries.
• Formatting: It’s convention to write SQL keywords (SELECT, WHERE, JOIN) in all caps and to put each
clause on a new line for readability. Use table aliases (e.g., FROM TableName AS t) to shorten
references.
• Getting Help: If you’re unsure of a query, start small: select from one table first, then add a join, then a
WHERE clause, and so on, stepwise. Use LIMIT (or TOP in T-SQL) to preview a few rows. In interview
scenarios, explaining your intended approach is as important as writing perfect syntax. If you recall the
main clauses and their order (SELECT -> FROM -> JOINs -> WHERE -> GROUP BY -> HAVING -> ORDER BY),
you’ll do fine.
Remember: SQL is about asking questions of the data in a structured way. With practice, writing queries
becomes like forming sentences. Also, as a business analyst, you’ll often use tools that let you input SQL
(like a SQL editor or in Power BI as a source) – so having a cheat sheet like this to recall exact syntax is
handy.
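A low-friction way to practice these clauses is Python’s built-in sqlite3 module – no database server required. A minimal sketch (the table and numbers are made up):
import sqlite3

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
conn.execute("CREATE TABLE Sales (Region TEXT, Amount REAL)")
conn.executemany("INSERT INTO Sales VALUES (?, ?)",
                 [('East', 500), ('East', 700), ('West', 300)])

# GROUP BY with an aggregate, exactly as in the cheat sheet above
for row in conn.execute(
        "SELECT Region, SUM(Amount) AS TotalSales FROM Sales GROUP BY Region"):
    print(row)  # e.g., ('East', 1200.0) then ('West', 300.0)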
• Python Basics: Indentation defines code blocks – a simple loop:
for x in [1, 2, 3]:
    print(x * x)
◦ Merging/Joining: pd.merge(df1, df2, how='left', on='CustomerID') – similar to a SQL
join on a key.
◦ Dealing with missing values: df.dropna() removes rows with any NaN; df.fillna(0)
replaces NaNs with 0 (or another value, or a method like forward-fill).
◦ Date handling: If your dates are strings, do df['Date'] = pd.to_datetime(df['Date']).
Then you can do df['Year'] = df.Date.dt.year, etc. Resample a time series by frequency
(e.g., df.resample('M', on='Date').sum() for monthly totals, given a Date column or index).
◦ pandas vs Excel: The mindset shift is that you work on whole columns with vectorized
operations instead of cell by cell. It’s very powerful for large data. For example, filtering 100k rows in
Excel might be slow or manual, but one line of pandas does it. Also, merging data sources
(the VLOOKUP use case) is trivial with pd.merge.
• NumPy: Often used under the hood by pandas. If you need fast array math (like element-wise
operations or large matrix calcs), use numpy arrays. But for analysis, stick with DataFrames mostly.
• Matplotlib/Seaborn (plots):
◦ Basic plotting with pandas: df['Sales'].hist() for a histogram, or
df.plot(x='Month', y='Sales', kind='line'). But often you’ll use matplotlib for
more control. Example (a sketch, assuming df has Month and Sales columns):
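import matplotlib.pyplot as plt

plt.plot(df['Month'], df['Sales'])  # assumes df has 'Month' and 'Sales' columns
plt.xticks(rotation=45)             # rotate x labels so month names don't overlap
plt.title('Monthly Sales')
plt.show()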
This creates a line chart of sales by month, with the x-axis labels rotated for readability.
◦ Seaborn is a higher-level plotting library that integrates with pandas nicely. For example, assuming Region and Sales columns:
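import seaborn as sns

sns.barplot(x='Region', y='Sales', data=df)  # one bar per region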
This automatically computes mean sales by region (unless you specify a different estimator) and plots
error bars. Seaborn also has nice scatter plots with regression lines (sns.regplot), boxplots for
distributions (sns.boxplot), etc.
◦ One powerful Python feature: you can create notebooks that mix code and narrative (e.g.,
Jupyter Notebook). This is great for presenting an analysis step-by-step with visualizations
inline.
• Libraries for advanced tasks: If you venture further: statsmodels for statistical tests or
regression (e.g., running a quick linear regression or t-test on campaign results), scikit-learn
for predictive modeling (like training a simple model to predict which customers are likely to churn
or respond, using logistic regression or decision trees), but those require a bit more learning. As a
business analyst, you don’t always need to build models, but familiarity is a bonus.
• Automation: Python can schedule scripts (with Windows Task Scheduler or cron jobs) to
automatically pull data and generate reports. It can also interface with APIs (e.g., pull data from the
Google Analytics API or Twitter). This can set you apart – e.g., “I wrote a Python script to fetch daily
website analytics and email a summary report.” That shows initiative.
• Common Gotchas: Indentation errors (make sure code blocks align properly). Using the correct
environment (e.g., installing pandas via pip and importing it correctly). Knowing that indexing starts at
0 (the first element of a list is list[0]). Also, Python is case-sensitive. When merging, ensure key fields
have the same dtype (both int or both string, etc.).
• Why Python: It’s versatile – once data is in pandas, you can do anything: group, merge, pivot (using
groupby or pd.pivot_table), and beyond. And if Excel would require manual steps for each new data
drop, Python can automate them. Python doesn’t have a GUI like Excel – it’s code – but pandas logic is
not far from Excel formula logic, and over time writing a loop or comprehension feels natural. A short
end-to-end sketch follows this list.
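To make that concrete, here is a minimal end-to-end sketch of a repeatable pandas workflow. The file name and column names are hypothetical:
import pandas as pd

# Load the latest data drop – rerunning the script repeats every step automatically
df = pd.read_csv('monthly_sales.csv')  # hypothetical file with Region, Product, Amount columns

df = df.drop_duplicates()              # basic cleaning
df['Amount'] = df['Amount'].fillna(0)

summary = (df.groupby(['Region', 'Product'])['Amount']
             .sum()
             .sort_values(ascending=False)
             .reset_index())

summary.to_excel('sales_summary.xlsx', index=False)  # hand the result back to Excel users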
Keep this cheat sheet as a quick reference. In a real job or interview, you might not recall exact syntax for
every function, but you should know that something is possible and roughly how. For example, you might not
recall the exact order of parameters for VLOOKUP or the exact DAX formula for year-over-year, but you
know VLOOKUP can fetch data and DAX can do year-over-year – the specifics you can quickly look up or
derive. As you practice, the most common ones will stick in memory. Consider printing this out or making
your own notes from it, and try to use each item at least once in practice to solidify it.
1. Build Knowledge through Projects: Don’t just read or watch tutorials – actively work on small projects.
For example, take a public dataset (Kaggle has many) relevant to sales or marketing – like a retail sales
dataset or an A/B test result – and go through the analysis process: clean it (Power Query or Python),
analyze it (maybe pivot in Excel or SQL queries), then present it (Power BI or even just clear Excel charts).
Each project can focus on a different tool or combination. The case studies we described can be inspiration.
For instance, you could simulate a sales dashboard by taking sample sales data, setting targets, and
creating a Power BI report. Or do a customer analysis by taking a dataset of transactions and using SQL to
find top customers and purchase patterns. These projects not only teach you the tools, but also yield
something tangible to discuss or show employers (especially if you can publish a Power BI report to the web
or share code via GitHub).
2. Mix Conceptual Learning with Hands-On: As you learn a concept (say, DAX measures or a SQL join),
immediately apply it to a dataset. If you learn about DAX CALCULATE, try creating a measure that uses
CALCULATE to filter something (e.g., CALCULATE([Total Sales], Sales[Region] = "East")). If you read about
SQL joins, create two small tables in a spreadsheet and mentally or actually write out join queries to see the result.
This helps cement understanding. Many online courses (e.g., Microsoft Learn, DataCamp, Coursera) have
guided exercises – those can be great to follow. Just ensure you don’t only follow steps blindly; think about
why you do each step.
3. Leverage Your MBA Domain Knowledge: Your background in sales and marketing is an asset. Use it to
come up with analyses that are meaningful. For example, because you understand marketing funnels, you
can better design an email campaign analysis. Or your knowledge of sales KPIs will guide which metrics to
include in a dashboard (e.g., pipeline conversion rates, customer acquisition cost, etc.). Emphasize this in
interviews: you’re not just a tech person, you also get the business side. Many companies value an analyst
who can bridge business questions with data answers. So when practicing, always tie the technical results
to a narrative: what does this number mean for the business? – just like we did in case studies.
4. Common Interview Topics & Questions: Be prepared to answer both technical and conceptual
questions. Here are some frequent ones for data/business analyst roles and how to tackle them:
• “Which tools have you used for data analysis? Can you describe a project where you used [Excel/SQL/Power
BI/Python]?” – This is your chance to talk about the projects you’ve practiced. For Excel, maybe
mention how you built a complex model with pivot tables and VLOOKUP. For SQL, mention writing
queries to extract insights (maybe refer to the segmentation analysis we did: “I wrote SQL queries to
segment customers by purchase behavior, using GROUP BY and CASE statements 36 , which helped
marketing target high-value clients.”). For Power BI, describe the dashboard you built and what
insights it showed 34 . Always frame it in terms of the problem and solution, not just “I know X”.
Employers care that you can apply the skill to help the company.
• “How do you ensure accuracy in your analysis?” – They want to see attention to detail. You can say you
always do data cleaning (e.g., remove duplicates, handle missing values), cross-verify numbers
(reconcile totals from different sources, sanity-check results with simple logic). Mention version
control for data (keeping backups of raw data, so you can trace back if something seems off). If
using Excel, you might double-check formulas or use Excel’s auditing features; if using Python,
maybe write tests or compare small sample outputs to manual calc. Also mention business intuition:
e.g., “If my analysis said sales doubled overnight, I would be skeptical and re-check data collection,
because I tie results to realistic expectations.”
• “Give an example of a difficult data analysis problem you solved and how.” – Think of a scenario (could
be hypothetical from practice if you lack real job examples). Maybe the data was very messy
(multiple sources) and you used Power Query to consolidate and clean, then found an insight that
wasn’t obvious. Or you had to convince a non-technical stakeholder of something, so you created a
clear visualization in Power BI to make the point. For instance: “In a project, I analyzed 1 year of sales
data across 3 regions from two different systems. The challenge was different data formats and some
missing entries. I used Power Query to standardize and merge the datasets, then built a pivot analysis to
identify that one region’s sales were lagging specifically in Q3. Initially, people thought it was a year-round
issue, but the data pinpointed it to a specific quarter – which turned out to coincide with a manager
departure in that region. This guided the company to focus on stabilizing Q3 processes.” – This story
shows you can wrangle data and draw a useful conclusion.
• “Explain [a specific concept] in simple terms.” They might pick something from your resume or
something like “What is a JOIN?” or “What does PivotTable do?” or “What is the difference between a
clustered and stacked bar chart?” They check your fundamental understanding and communication.
So be ready to explain key concepts plainly:
• For SQL JOIN: “A JOIN in SQL is how we combine data from two tables based on a common field. For
example, we have a customer table and an orders table – a JOIN can attach customer info to each order
using the Customer ID, so we get a combined view.” Maybe mention types: “An inner join only keeps
records that match in both; a left join keeps all records from the left even if no match on right, which we
use when, say, we want all customers and their orders, including those who had no orders – those would
show up with nulls.”
• For a PivotTable: “It’s an Excel feature that quickly summarizes data. I can take a list of sales transactions
and pivot it to show total sales by region and month in just a few clicks, instead of writing formulas. It’s
great for quick analysis and you can drag fields around to explore different breakdowns.”
• Visualization example: “A clustered bar chart compares values side by side for different categories (e.g.,
sales of Product A vs B in 2024 and 2025 side by side for each year), whereas a stacked bar stacks them to
show a combined total and each category’s part. Clustered is better to compare category values, stacked is
better to see overall total and composition.”
The key is to show you understand and can articulate it to someone who might not know. Practice
explaining to a friend from a non-technical field.
• “How would you handle a request for analysis from a stakeholder?” – Here they assess your business
sense and communication. A good approach: “First, I’d clarify the question behind the request – what
decision are they trying to make or what problem are they addressing? Sometimes stakeholders ask for a
specific report, but understanding why helps me possibly provide a better solution or find the right data.
Then, I would assess what data is needed and available. I’d gather the data, ensure its quality, and
perform the analysis (describe whether you’d use Excel, SQL, etc., depending on scenario). I’d double-check
the findings (validate with a quick cross-check or ensure logic is sound). Then I’d present the results in a
clear way – often with a visual or a concise summary. I’d also be open about limitations or assumptions.
Finally, I’d be ready to answer follow-up questions or do additional breakdowns if they need.” This shows
you don’t just dive in blind; you use a thoughtful process.
• “Tell me about a time you had to convince someone of an insight from your analysis.” – Behavioral Q. If
you have a real example from any project or coursework, use it. If not, you can use a scenario from
the projects you practiced (frame it as hypothetical: “I worked on a project where...”). Emphasize how
you communicated: “I noticed in the data that our online channel was outperforming in growth but was
under-resourced compared to retail. I had to convince the sales director to allocate more budget online. I
prepared a one-page report with a clear chart showing the trend and a projection of what extra
investment could yield. Initially they were skeptical because traditionally retail was king, but by showing
the numbers and explaining the opportunity cost (supported by data), they understood and agreed to a
pilot increase for online marketing. That resulted in 15% incremental sales the next quarter.” – This kind of
story shows you can not only find insights but also influence decisions, a valuable trait.
5. Describe Projects with Business Impact: When you list or discuss your practice projects, phrase them
around outcomes – e.g., “wrote SQL queries that produced insights on membership impact on sales which
guided loyalty program enhancements.” These phrases show you didn’t just learn a tool, you applied it
meaningfully.
6. Portfolio Projects on Resume/LinkedIn: If you’ve created some good projects, consider publishing
them. For example, you can upload a Power BI report to the web (there’s a public “Publish to web” feature
for non-sensitive data) and share the link. Or put code and a write-up in a GitHub repository or on a personal blog.
Then, on your resume or LinkedIn, you can provide those links. Employers often appreciate seeing a sample
of your work. It demonstrates initiative and also gives them proof of skill. Even screenshots of a dashboard
in a PDF portfolio can help during an interview to walk through your thought process. However, make sure
anything you share publicly does not violate any data privacy (stick to public or dummy data, or anonymize
it).
7. Interview Mindset: In addition to technical prep, remember the soft skills. Show enthusiasm for analysis
– e.g., say how you enjoy finding stories in data. Emphasize your ability to learn: technology changes, so
highlight that you’re continuously learning (maybe mention the latest thing you learned, like a new Power
BI feature or a Python library, to show you keep up-to-date). Being from a non-tech background, turn that
into a strength: “My MBA and sales experience help me understand the business context, so I always make sure
my analysis is aligned with business goals – the tools then help me deliver the message effectively.” This can ease
any concern about not having a CS degree, etc. Many great analysts come from business backgrounds and
picked up the tech – it can actually make them better at focusing on relevant insights rather than getting
lost in technical details.
Finally, stay updated. The data field evolves – new Power BI updates, new Excel functions (like XLOOKUP),
etc., appear. A good practice is to follow a few blogs or LinkedIn groups related to these tools. For example,
Microsoft’s Power BI blog for monthly updates, or forums like Reddit (r/Excel, r/PowerBI, r/SQL) where
people discuss problems and solutions. This keeps your knowledge fresh and might provide great talking
points (“I saw a new Excel function that does X, which could be handy for Y…” shows you’re genuinely
interested).
1. Sales Dashboard & Forecast: Using a dataset (real or simulated) of past sales, create a Power BI (or
Excel) dashboard as we described. Include year-over-year comparisons, top products, and maybe a
simple forecast for the next period (as basic as the average of the last 3 periods – see the pandas
sketch after this list – or Excel’s FORECAST function). Business Impact: This shows you can deliver
management dashboards that
inform strategy and highlight trends. On a resume: “Built an interactive Sales Dashboard that
highlighted a 10% YoY dip in Q3 sales, enabling the team to identify and address regional
performance issues.”
2. Marketing Campaign Analysis: Similar to the case study, take email or digital marketing data. If
you don’t have actual data, you can find sample datasets for marketing (Kaggle has datasets like an
Email campaign or Google Analytics sample). Perform analysis to compute conversion rates, ROI,
and recommendations. Possibly use Python for data crunching and create a presentation or report.
Business Impact: Demonstrates ability to evaluate marketing effectiveness and guide marketing
spend. Resume angle: “Analyzed a marketing campaign using Python/pandas – calculated a 5%
conversion rate and identified higher engagement from customers aged 30-40, informing a more
targeted campaign strategy that could improve ROI by focusing on responsive segments.”
3. Financial Analysis Project: If finance interests you, analyze a company’s public financial data (e.g., from
Yahoo Finance or EDGAR). It could be as simple as pulling two years of quarterly revenue and profit,
then visualizing trends and doing ratio analysis (profit margin, growth rates). Or a budget vs actual
spending analysis if you can simulate data. Business Impact: Illustrates data-driven financial insight,
relevant if you target roles overlapping finance. Resume: “Using Power BI, analyzed Company X’s
financial performance over 8 quarters, visualizing trends in revenue growth and margin. Detected a
declining gross margin trend (down 5% over 2 years) and identified increased COGS in two key
product lines as the cause, informing management’s cost optimization initiative.”
4. Web Analytics & Funnel Analysis: If web data interests you, Google offers sample Google Analytics
data or you can use your own site if available. Analyze visitor stats, conversion funnel on a website
(visitors -> signups -> purchases). Maybe use SQL or Python to manipulate log data. Business
Impact: Understanding user behavior online is valuable for e-commerce or tech companies.
Resume: “Examined a website funnel using Google Analytics data – found that while product page
views were high, only 1% of those views converted to add-to-cart. Conducted analysis in Excel/
Python revealing a drop-off at the shipping info step, leading to a recommendation to simplify
checkout, potentially improving conversion by up to 15%.”
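For the simple forecast mentioned in project 1, a trailing 3-period average is enough to demonstrate the idea. A minimal pandas sketch with made-up numbers:
import pandas as pd

sales = pd.Series([100, 120, 110, 130, 125, 140])  # hypothetical sales history

# Forecast the next period as the mean of the last 3 observed periods
forecast = sales.tail(3).mean()  # (130 + 125 + 140) / 3 ≈ 131.7
print(round(forecast, 1))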
When listing these projects on a resume, treat them like job entries if you lack actual job experience –
possibly under a heading “Analytical Projects” or similar. Use action verbs and mention tools: e.g.,
“Developed a [Excel/Power BI] dashboard...”, “Implemented SQL queries to ...”, “Utilized Python to ...”, and
crucially, “resulting in...” or “which led to...” something tangible (insight, time saved, predicted
improvement, etc.). Even if the result is hypothetical (since in a project you might not have a company to
implement the change), you can frame it as an insight that would lead to some action or benefit.
Also mention domain context – e.g., say “sales dashboard for a retail company” or “email campaign analysis
for a marketing initiative” to contextualize.
On LinkedIn, you can write articles or posts about your project process. Or upload media (images of your
dashboard or link to it). This signals your proactive learning.
Presenting on a Resume vs. Interview: On paper, keep it concise (one or two lines per project highlighting
key tech and finding). In an interview, be ready to dive deeper: they might ask “how exactly did you do X?” –
then you can explain step by step. For instance, if the interviewer is curious about your mention of SQL: “You
said you used SQL to analyze customer LTV – what was your approach?” Then you can describe the query you
wrote, with a GROUP BY CustomerID to sum each customer’s revenue and maybe a CASE to categorize
membership 36 , etc. This proves you really did it (or at least know how to do it).
Confidence and Continuous Learning: Finally, convey confidence tempered with humility. You have
acquired a lot in a short time – that’s impressive. Show you’re excited to keep learning on the job (“Data field
is always evolving and I enjoy keeping up – for example, I recently learned a new Excel function that...”). If
asked something you don’t know, it’s fine to say “I haven’t used that yet, but I’m familiar with the concept
and I’m sure I could pick it up quickly – for instance, I learned DAX on my own in a few weeks by building a
project, so I’m confident I can adapt to new tools as needed.” This assures them you have the fundamental
aptitude.
12 13 14 15 16 17 19 20 Overview of Power BI: What is Power BI? History, Evolution, and Importance in
26 27 History of SQL
https://docs.oracle.com/cd/B13789_01/server.101/b10759/intro001.htm
34 35 36 41 42 Amrit_Das_Resume.pdf
file://file-2TDBrRNX79BStdTLin88Fc