
- One of the challenges we face as we decide to pursue data as a career choice is the fact that there are many

different paths and specializations. Let's define some of those roles, and then discuss the common skills that are
shared among all. The most universal role is that of the data worker. This person consumes data regularly, works with data often, performs some data manipulation, and presents that data as part of their everyday work. Let's
take Sally for an example. She works in the business unit, and not necessarily the IT department, and each week
she prepares a report for her manager. She prepares the data for the report, and her report is the same as last week's; the only difference is new data. You see, most data workers have limited backend access to all the different systems. They likely receive data from people who have access to the databases. Data workers like Sally may even export the data out of a system into CSV or Excel files, and then the process of their data work begins. A data analyst goes further. Generally, they might have a little more access to the data, model the data, and have connected directly to it, so they can simply refresh the reports and begin the analysis and presentation of the data. The data analyst will handle a lot of ad hoc requests, especially if they're efficient. They will likely have more than just Excel to work with and are likely considered a guru or a wizard in their department. The data worker and the data analyst are what I consider the largest group of roles available. Most people are some form of data worker, and they dive into data analysis more than they even know. The common skills of all data professionals are gathering data, manipulating it to meet requirements, and then reporting the outcomes in some way. Data engineers bring the special skill of being able to design and build data sets, whereas data workers and data analysts work with what is already built and model the data as needed. You will find a lot of people in crossover roles where sometimes they act as data engineers and sometimes as data analysts. One could argue that at the top of the hierarchy of data roles are the data architect and the data scientist. A data architect is a creator of architecture, no different than an architect who designs a building. The data architect designs data systems. The importance of the architecture really can't be overstated, as all roles, including the data scientist, need this architecture. What most see as the literal top of the hierarchy is the data scientist. And I believe this is
likely due to the fact that most companies have their data architecture in place, and now it's time to take that data,
and put it to use. This is where the data scientist comes into play. A data scientist will likely have all the common
skills of the data analyst, the data engineer and the data architect. They'll also have deeper skills in coding,
statistics and math. It's okay to not know where you'll end up on your journey, but I think it's important to start. You
can begin either as a data worker or recognizing you already are one. You can increase your skills as a data analyst,
and then as you grow deeper in your experience with data, you'll discover where you want to be. In all roles, you'll
gain a deeper understanding of data, and it's okay to find a place and stay there.

- Is your organization data literate, data fluent, or none of the above? Let's break these terms down. Data literate
means that you can read data, converse about it, and understand it. Let me give you an example: your bank
account. It's all about your finances, right? And if you really look at it, it's just all your data through transactions.
Can you read your balance? Can you tell me when something is there that shouldn't be? And if it is, can you call the
bank and explain it? That means you're literate about your banking data. Now to the meaning of fluent. Fluent
means you can create something with it that shows skills outside of just being able to read it and use it. We know
people who speak other languages. They are either literate or fluent. Someone who is literate can, again, pick up
on the common things with the language and speak in simple sentences, but a person who is fluent can carry on
conversations and author stories in that language. Just like these terms apply to language, they apply to data. Let's
go back to our banking example. If you are fluent with data, you can then turn your banking data from last year
into insights that will allow you to build a budgeting system and a finance tracking system. To really build your data
skills, you must begin to think about how that data skill applies to your everyday life. If you are data fluent, and at
work someone hands you information, and it's the first time you've ever seen it, you will have an approach that
lets you learn that data, and you will have questions that seem natural for you to ask. Approach is everything, and
building an approach or identifying it can start today, right now even. Start thinking about every time you have a
new data set in front of you, what do you do? That's your approach. If your approach is to stare at it and wonder,
well, that tells you where to begin. Now there are degrees of data literacy and data fluency that are appropriate in
the workplace. And I would argue that everyone should be data literate: able to read, speak, listen, and understand the data, or at least the data that applies to them. That could be time sheets or even paychecks. An organization with only a small percentage of data-fluent people does not have enough people to do the exploring and building that might just take the company to the next level. Becoming data
literate and then transitioning to data fluent can be a game changer in your career. You can go from reading and
basic understanding to producing insight and data tools for your organization or maybe for yourself.
Understanding how data governance impacts the data analyst

- Have you ever asked permission to gain access to data and been denied? Have you ever asked for
permissions and just been given global admin when all you needed was read permissions? If so, then
you've been a part of the data governance of the organization, or the lack of it. Data governance is a
framework that incorporates strategies to create solid quality data, enable accountability, and provide
transparency to the data in the organization. Data governance has processes, procedures, and people at
various levels of the organization. It's meant to control every aspect of the data in the organization. Data
governance can support quality of data, accountability, trust, and compliance. There is some form of data
governance in every organization at every size and level. If you work in a regulated industry, then data
governance will likely be more mature than other industries. I have worked in almost every size industry,
regulated or not regulated, and I'm either at the mercy of the data governance or protecting myself from the lack of it. Here are some common components of data governance that directly impact the data analyst. Access to information, and how you can access it: there is typically a chain of command, and the data analyst is rarely meant to be at the top of it. If you need access to information, there is someone, like your manager, from whom you will request permission to gain access. Once they hear the request, they will typically instruct you to contact the next person responsible, or they will contact them on your behalf. I once requested access to the back end of a system from my manager. He then sent the request to the
technology department who then between the two parties agreed I could have it. Little did I know it
would go to a third person to implement it and notify me. The third person was the person in the cube to
my left. I ate lunch with him every day. It was very controlled, and I did not understand it at the time, but
now I have an appreciation of it. As a data analyst, we seek the source of truth, the golden record, and
data governance is a part of providing that. We want to make sure there's an identifiable truth and that we
can trust what we're working with. When we do not have at least two or three of these components to
work with, we'll deal with challenges. For example, you may have been given more access than you need,
and it might leave you wondering which data set you could really trust. Master data management is also a
key component of the data governance framework. Making sure that the data we all need is complete,
accurate, and meets the business rules. This is one area where organizations that do not have a strong data
governance plan or strategy will have suffering data analysts. You may find yourself always correcting something as simple as product names that have been entered differently but are literally the same product. You might be constantly correcting customer address information. I'm always telling organizations that
regardless of regulations, they have a data governance plan in place, whether they documented it or not.
As a data analyst, determining the data governance plan at your organization will help you to know who
to talk to, when to talk to them, and how to adequately follow the process of all things that relate to the
life cycle of data at the organization.
Understanding the importance of data quality

- As a little girl, I got sick. I mean really sick. My mother immediately took me to the doctor, and they did
an x-ray because I had a headache so bad, I'd been sick for two days. The x-ray showed nothing, but the
physical signs of the illness, and the blood work were enough for the doctor to send me to the ER. A day
or two later, they did another x-ray, and when they did, they discovered why I was so sick. You see, I had
a bacterial infection that was unfortunately on its way to my brain. I was hospitalized for 11 days, and
then given the right types of treatments to prevent it getting worse, and treatments to help get it better.
What does this have to do with data quality? Well, the first x-ray showed nothing, and what they actually
discovered is that their machine was broken. Would I have gotten better faster if that first x-ray showed
them what the second x-ray did? We'll never know. We can't go back in time. Quality data is data that can
be trusted to produce accurate insights so decisions can be made. In my situation, had they waited even
longer to do the second x-ray or even sent me home, I would not be here today. Not all data decisions are
life or death, but they can have terrible consequences for businesses if data quality is not an everyday part
of the culture. It is important for us to all remember as data professionals that people are using data to
make decisions, and bad data can mean bad decisions with profound consequences.
There are data quality dimensions that you can be aware of as a data analyst. This isn't a complete list of
everything you will find for data quality, but here are the four major hallmarks of quality data: complete, consistent, valid, and accurate. Completeness of data. Do
we have all the data that's needed? Is any of it missing? Is it all usable? Consistency. Is this data in other
systems, and is the information consistent across all of them? In other words, does the same record in
production system match what we sent to the invoicing system?

Validity. Does the data meet the requirements of what we are attempting to do with it? And is it in the
right format in which we need to do it?

Accuracy. Is it accurate? This is a big one. Is this information accurate? And in my case, it was not. I
think it's important that we know quality can be measured, and we can determine if it's complete,
consistent, valid, and accurate. And if it's not 100%, well, we need to know that. Again, some data means life or death. So data quality at the highest level is important.
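To make those four dimensions a little more concrete, here is a minimal sketch of the kinds of checks an analyst might run if the data lived in a SQL database. The table and column names (Orders, InvoicingOrders, CustomerID, TotalDue, and so on) are hypothetical and only for illustration; they are not from the course.

-- Completeness: how many records are missing a value we need?
SELECT COUNT(*) AS MissingCustomer
FROM Orders
WHERE CustomerID IS NULL;

-- Consistency: does the same record match what the invoicing system holds?
SELECT o.OrderID
FROM Orders AS o
JOIN InvoicingOrders AS i ON i.OrderID = o.OrderID
WHERE i.TotalDue <> o.TotalDue;

-- Validity: does the data meet the format and rules we need? Future-dated orders are suspect.
SELECT COUNT(*) AS FutureDatedOrders
FROM Orders
WHERE OrderDate > GETDATE();

-- Accuracy: compare a computed total against the stored total.
SELECT OrderID
FROM Orders
WHERE TotalDue <> SubTotal + TaxAmount + Freight;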
What is BI and the value to business?

- Have you ever heard the phrase, "Our company makes data-driven decisions"? Well, of course they do. And they are making data-driven decisions all the time. They could just be bad decisions because the data is bad. Data-driven decisions happen all the time. We need more money in the account, so someone get those salespeople motivated. That is a data-driven decision and action. The problem is this is a single data point and at a crisis moment. I tend to ask people if they want to be data informed instead. Data and business intelligence let you have both information and the ability to make intelligent business decisions. For example, with the correct data, an understanding of the process, and the business goals defined with a solid set of KPIs, key performance indicators, a business can see a downward trend in sales before it
becomes a problem. This allows the business an opportunity to course correct to attempt to prevent a
crisis moment. For business intelligence to be practical, it requires you to store the data that's important to
the business and all its processes. You can't just focus on one number like our earlier example. Just
knowing one number and that you have to hit it, means that you understand the goal. However, it's not all
the information you need. All the other data that impacts that goal needs to be analyzed against the business rules. Fortunately, we have business intelligence tools, and they are just that: tools to build business intelligence with. The tools do not provide it by themselves, just like a hammer requires nails and someone to use it to
build something. Businesses need to define the metrics that help them track the overall health of the
organization. Again, these metrics are KPIs. Let me make it practical. And I'll use health as an example. If
you know your heart rate as an adult is supposed to be anywhere between 60 and 100 beats per minute and
you watch it every day and suddenly you see it spike and stay elevated, it would indicate that something
is happening to make your heart beat faster. With a bit more information like tracking what you eat or
drink, you can analyze this data and you notice that it's elevated when you drink a certain type of drink
and it stays elevated for a couple of hours and then goes back down. To make an adjustment, you can stop drinking that drink or reduce the amount of it. Whatever the adjustment is, you make it, and then you analyze your heart rate to see if the adjustment made a difference. When you apply this concept to
the overall health of business then you can easily determine what the heart rate of the business is. And
what are the items that impact it. This allows you to start to define the metrics that help you monitor the
health of the business and provide business intelligence.
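To stay with the heart-rate analogy, here is a minimal sketch of what that kind of KPI check might look like in SQL. The HeartRateReadings table and its ReadingTime and BeatsPerMinute columns are hypothetical; the same pattern applies to any business metric with an expected range.

-- Flag readings that drift outside the expected 60-100 bpm band,
-- the same way a KPI flags a business metric outside its expected range.
SELECT ReadingTime, BeatsPerMinute
FROM HeartRateReadings
WHERE BeatsPerMinute NOT BETWEEN 60 AND 100
ORDER BY ReadingTime;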

How are business analytics and BI different?

- I started running about four years ago and it made me realize that business intelligence, business
analytics, and even data analytics are really three individual things with lots of overlap. Let me break it
down. I was preparing for my first half marathon. I needed the data to tell me how fast I was running so I
could improve my speed. That speed for my mile was my business intelligence. For this example, it
represents a single number. Business analysis focuses on all the numbers that would allow me to get
faster over time by, again, analyzing the data and creating more of it. For me, every time I made a run, I was tracking that information. Data analysis is where we capture and analyze the actual data. We can then analyze the historical data and keep capturing new data as it grows every day. To use all these concepts
together, I was comparing every run to the last run. However, initially I just started capturing a run to
establish a baseline and I just added more runs and more miles. The business intelligence was telling me
how fast on average I was running a mile or the timing of certain miles, like, my 5K speed versus just my
single mile. I had a goal that I wanted to attain. So, using business analysis skills, I applied this to my
running. I would also use other values, like where I ran, what time I was running, to determine my future
outcomes. For example, I discovered I needed to change my shoes. I changed my shoes, I ran a little bit
faster and I hurt a lot less. I also discovered that if I picked more familiar routes, I might run a little bit faster than on a route that was new. I was using these pieces of information to adjust my routine so I
could see a faster speed over time. So, business intelligence tells us where we are on any given day for
any process that we use data to study. My example is running. But it could easily be applied to business
metrics, like, sales or production. And business analytics helps us to see the trends and predict future
outcomes which are critical to businesses. We need both business intelligence and business analytics and
we use data analysis to determine where we are and how to reach our end goals or the desired outcomes.
Think of it this way. Business intelligence can tell you how you're performing today and business and
data analysis can tell you how you can potentially perform in the future.

How data can provide intelligence to the organization

- Call me strange, but I have a relationship with data. To me, data is living. I do realize it's an inanimate
object, but I do think it's an intelligent object. Data can not only provide information, like this month's sales, but also communicate with software to provide automation. My first experience with data at
this level was with a regulated industry. They were required to provide information at different time
points, and then they were required to report on all the things they did to meet those times. Mail came into
the office, people went to retrieve their mail for the contracts they supported three times a day. They also
printed information and scanned it as needed by walking to and from the printer. Let's just look at the data
around the printer, and the scanner process alone. Look at the data as a person walks, there is a distance
and that distance takes time. And what if they stop and talk? That's more time. Think about the amount of
time at the printer, and what if two people walk up at the same time? Now there is a time the person is
there, and the time the other person is waiting. When they walk back to the cube, they begin the real
work. When a person has everything ready to go, they walk back to get the mail and deliver the mail that's
ready to go out. This occurred multiple times a day, by different people, all the time. Okay, now let us multiply that by 10 people doing the same job all day long, and then let's multiply that by 260 business days. Now business intelligence says that today that's X number of hours spent in transit; business analysis says if we put a printer at their desk, we save X. This is just one way data can help support
improvements to the process. We often think about data in the form of a field, or columns on a
spreadsheet. Just with one key date, we can create other dates to trigger other events. Technology will allow us to create information automatically, the very same information that a human would have to figure out, and then we can have the human verify the information, saving time and being more accurate. The most effective data analysts develop skills and a relationship with data. It's important to start learning to see it as
a living thing that can help us refine current processes like walking to a printer and automate processes
like creating more data.
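To put rough numbers on that printer example, here is a small worked calculation in SQL. The 5 minutes per round trip is an assumption purely for illustration; the 3 trips a day, 10 people, and 260 business days come from the story above.

-- Assumed: 5 minutes per round trip to the printer (illustrative figure only).
SELECT
    5 * 3                     AS MinutesPerPersonPerDay,  -- 3 trips a day
    5 * 3 * 10                AS MinutesPerDayAllStaff,   -- 10 people doing the same job
    5 * 3 * 10 * 260          AS MinutesPerYear,          -- 260 business days
    (5 * 3 * 10 * 260) / 60.0 AS HoursPerYear;            -- roughly 650 hours spent in transit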

Understanding the value of data-driven decision-making

- If I told you that we have an opportunity to purchase a product, and that it was going to make a million
dollars in the first year, you would get excited, right? That one number sounds amazing to people. Some
people would immediately look at that number and begin to act; that is a data-driven decision. In our
scenario, our company thought the product at cost was a steal of a deal, and they bought it all with the
hopes that a million in revenue would produce a large profit. The only problem is that million dollar
number is only the top line and in no way reflects the impact to the bottom line. When people get a single
number in their mind, they can miss the other more important numbers, and it can have damaging
consequences. Let's take our million-dollar project and break it down. The company bought the product at
cost, and the company will sell the product at a list price, and the difference between that cost and the list
price is the margin. All the numbers here, cost, list, and margin, matter, but of the three, the margin matters
the most. It's important to remember that a million in revenue does not equal a million in profit. Our profit
is made by the margin. Some people see that margin and get excited, but if you stop there, you are in
trouble. You must account for the items that eat the margin because that eats the profit. When you use
data to inform your decision making, you must use the top-down and bottom-up approach together. If
you're an experienced person with these types of scenarios, you may have already figured out where this
is headed. What do we need to do to produce the distribution of this product? We'll keep it simple. When
companies sell products, someone has to sell them and there's a cost to that. And even if it's an online
sales model, there are people involved in maintaining the information to make that happen. Let's just say
for every $1 of the product, it costs 10 cents of that dollar to pay for the sales process. Then there are
other costs: cost to store the product, cost to package the materials, cost to deliver the product. There's
cost in infrastructure. Cost to automate sales processes. Payroll costs for people to maintain systems,
answer phones, and ensure that delivery is met. If you really dig in, you realize quickly that if everything
that is required to make the million in revenue eats up the entire margin, or you sold it at the wrong price,
or you get hit with unexpected costs like increase in delivery, increase in storage or changes to tax, you
are sunk. And if you can't sell it and you can't hold onto it, that million dollars no longer looks like a gold
mine. So the impact of being data informed can keep you profitable, by revealing that the million dollars is really not a million dollars after all, but potentially a total loss, which is maybe why it was a steal of a deal in the first place.
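To make the top-line versus bottom-line point concrete, here is a small worked example in SQL. All the percentages are assumptions for illustration only; the transcript gives only the 10 cents per dollar for the sales process.

-- Hypothetical figures: $1,000,000 in revenue at an assumed 30% margin,
-- 10 cents of every dollar going to the sales process, and an assumed 12%
-- for storage, packaging, delivery, and infrastructure combined.
SELECT
    1000000.0                        AS Revenue,
    1000000.0 * 0.30                 AS GrossMargin,        -- list price minus cost
    1000000.0 * 0.10                 AS SalesProcessCost,
    1000000.0 * 0.12                 AS StorageAndDelivery,
    1000000.0 * (0.30 - 0.10 - 0.12) AS RemainingProfit;    -- $80,000 left of the "million"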
Questioning techniques to collect the right data

- Have you ever heard of analysis paralysis? It's where you overthink a problem that stops you from
moving forward. It's a real thing for some people. It's likely due to stress and anxiety related to making
the wrong decision or not knowing exactly what to do next. Building an approach, thinking through standard questions, and applying critical thinking with active listening should help you. Technical skills, or hard skills, are one thing for the analyst, but the soft skills matter just as much. And if you're stuck, no hard skill matters. To be fair, more exposure to real problems, and to solving them with data solutions, will
help you build your approach, but you can slowly start building your questioning now. There are some
common questions you might ask for every data-related project and the questions might be more specific
based on actual problems and the data that you have at hand. Our scenario is that we have five of our top
products. They're being purchased all the time, but the company is losing money. First, you need to
understand that there is data in everything, in it and around it. This will help you start to consider the
questions. Our task as the analyst is to try to determine why, if the sales are moving, we are losing money.
There are some basic questions that you should ask about each of the five products. Have these products
ever been profitable? If they were profitable in the past, at what point in time? What is different about this
point in time versus that point in time? Did the wholesale cost change? Did the list price change? Did the
cost of storing or delivering the product change? Any of these answers will lead you further into data
analysis. When we start with these basic questions and begin to answer them, then it will lead to more
questions. As an example, let's say that in our initial questioning, we determine that neither the wholesale cost nor the list price has changed in the last three years. The cost to deliver has not changed enough to drive an impact. The cost of storing the products has been steadily increasing. The next round of questions begins. Is it only these five products that are impacted by the steady increase in storage cost? And what we discover is that it's not just impacting these five products but all the products. The company just started to
realize it in these five products. What can we do to reduce the storage costs? What type of increase can
we justify on the products without overpricing the product? Both these questions lead to very different
datasets within the organization and then each round of questions and answers leads to more questions.
The goal here is to remember you must start asking questions and then remember they rarely stop. They
just drive further investigation. The greatest part of the question process is that the end result is discovery
and recommendations that are made to improve outcomes.
Discovering and interpreting existing data

- Have you really thought about how much data is around a person? There's more than you may think.
There's data like date of birth, names, race, and ethnicity. There's work data like employee ID, job title,
hire date, or department. These data points are the items we think about when we work with data related
to people, right? Some of this data is a single value, like birthday. It's a value that is what it is, and it doesn't change. Then there are other items, like job title, which might change when you get a new promotion at work. There's also real-time data always occurring, like heart rate, blood sugar, blood pressure, and even temperature. There's also geographical data, like location. Imagine social data as well: what brands we
follow, what brands we purchase, how often we have food delivered versus go out to eat. Data is always
happening. The challenge we face as data analysts is there's a lot of potential data and not all of it is
actually available to us. We also find a lot of the same data is redundant and in some cases can even be
incomplete or inaccurate. All of us are seeking the single source of truth from the data that we work with.
We actually want it to be accurate when we report on the data. Let me give you some examples.
Companies have several different software packages that are used to handle different types of
information. And they're often disconnected. There's people management software for HR type
information, which is employee data. We have our marketing and sales management data. That's maybe
in a couple of different systems and it handles not only staff information in regards to sales, but also
customer information. There is also software that kicks in when a customer goes from being in
conversations with our sales team to purchasing from the company. That data flows from purchasing to
the warehouse. There's also data that flows to the accounting team to handle transactions that support
reporting like profit and loss. What this means is that data flows through the organization at different
times. Systems are often disconnected so finding which systems have the most accurate information is
one of the first challenges. The only way to really know is to begin the investigation and question along
the way. We sometimes hit roadblocks due to permissions and the sensitivity of data. For example, the
data you might need to confirm your values is stored in the accounting software and only the accounting
team has access to that data. Just because you can't directly access it doesn't mean you're done. You can
provide them the values and those teams will work to help you validate. In reality, whether systems are
connected or not, they should hold the same record of information. If your sales team reports that there's a
hundred thousand dollars set to invoice this month, then the accounting software should reflect a hundred
thousand dollars worth of invoices. When they don't balance out, you have to figure out where the
breakdown has occurred. As a data analyst, you need to be thoughtful of the type of data you might find.
And then you have to find the data you do have access to and develop strategies to validate your reports.
Just remember data shows up in everything but it's our job to bring it together accurately.
Data sources and structures

- We hear about data all the time, right? But what does that really mean? Let's start with the basics. Data
has a value, like your birthday, that's a value like November 20th of any year. So your birthday would be
11/20 of that year. Data has a type, like birthday. It's a date data type. And data has a field name, like
DOB, for Date of Birth. When we put these fields together, like First Name, Last Name and Date of Birth,
we're creating a record. People use records in spreadsheets all the time, but they don't really think of the sheet as a table, even though it actually is. It's just a table called Sheet One. And when fields are combined in a
database, they're stored in tables. They still have names, values, and data types. And when we fill in this
information for a person, we're creating a record. Tables are a great way to capture multiple types of data
in a structured way. This way of storing data is way more flexible than the spreadsheet environment.
There are also other types of systems that collect and store data for the analysts to use for their reporting
requirements. This varies of course by company, but you can expect to find spreadsheets, databases or
even data warehouses. Data warehouses really are data systems that have the refined tables from our
production systems, like the purchasing system, for example. A customer-dedicated software system
might have a database with hundreds of tables and details, but only certain tables and fields are needed for
reporting. These fields get cleaned up by data warehousing professionals and brought into the warehouse
for storage and safekeeping. It is a valuable source of nicely structured data that has been vetted for the
analysts to begin their reporting projects. Structured data that fits neatly into tables and feeds a beautifully
designed warehouse is amazing, but not all data is structured. This is where systems like data lakes help
organizations capture data so they're storing it before it's actually refined for reporting needs. Data
warehousing, data lakes, and even data lakehouses are very interesting. And if you're into designing
databases or designing data solutions, you may find you want to explore these skills further. Data analysts
will tap into these systems for the data. They don't necessarily create them. As a data analyst, you will
find yourself working with various systems and file types. At the start of your career, you can expect a lot of spreadsheets and CSV files as you work your way up to working with data stored in larger data
systems. And don't worry, no matter the level, most data professionals love a good spreadsheet when it's
used for analysis and not for storing data.
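As a minimal sketch of those ideas in database terms, here is how the fields described above might be defined as a table in SQL. The table and column names are illustrative, not from a course file.

-- Fields have names and data types; a filled-in row is a record.
CREATE TABLE Person (
    PersonID  INT IDENTITY(1,1) PRIMARY KEY,
    FirstName NVARCHAR(50) NOT NULL,
    LastName  NVARCHAR(50) NOT NULL,
    DOB       DATE NULL               -- date of birth is a date data type
);

-- Filling in these fields for one person creates a record.
INSERT INTO Person (FirstName, LastName, DOB)
VALUES (N'Sally', N'Example', '1990-11-20');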
Describing data best practices

- [Narrator] Do you have an approach to data? Have you ever really thought about it? I know after years
of working with data for projects or ad hoc reporting, that I've built a pretty defined approach to every
data set that I work with. There are just some things that I do with every data set. The process may be a
little bit different based on the software that I'm working with, but in this example, I'm using Microsoft
Excel. This transactions file has actually been exported from software that we use to analyze our
transactions. Normally, when we're working on an ad hoc report or a project, we have an expectation of
what we're going to deliver. But to show you that this approach will work with any data set, I don't have
an end goal in mind. I just want to learn about this data set. If I take a little time upfront to learn more
about this data set, I'll be better off when I start trying to meet the end goal of the project. Excel will sort,
filter and perform data commands on what it sees as the data set. And that's a key point, what Excel sees
as the data set. So the very first thing I want to do is confirm that the data that I'm working with in the
transactions list, is entirely recognized by Excel as a data set, meaning that there are no breaks in the data.
I do this by using one of my most favorite shortcuts. It will select all the data that Excel sees in the range.
To do this shortcut, I just simply do Ctrl+A. That's not enough though, because this is a lot of data. It
looks like it picked it all up. But if I zoom out, I notice pretty quickly that I have a broken data set. You
see all of column Z is empty. So that means Excel will only sort and filter everything to the left. In order
to fix this data set, I can right-click column Z and delete it. Okay, let's do that shortcut again. I'll do
Ctrl+A, and now I have a fully intact data set that Excel will recognize. This makes it easier for me to
sort, filter, and do all sorts of data commands. Okay, let me do Ctrl+Home to go up to A1. Before I go
any further, one of the very first things I'll do in working with the data set, is I'll make a copy of it. So I'm
going to take my mouse and put it on the bottom of the transactions list here on the transaction sheet tab.
I'll hold my Ctrl key, and then I'll drag and drop it one step to the right. Now, it's important, I'm going to
let go of my mouse first, and then let go of Ctrl. That makes a copy. Okay, I'll rename it to Working Copy. That way, if I mess up, I can always go back to the original transactions list. Okay. Let's take a
deeper look at this data. When I see fields named ID, like transaction ID, this is database language for key
fields. Okay, let's see how many of those we have. So I'm going to hit the select all which selects the
entire sheet, and double click in between the A and the B column headers, and this sizes all of the data. So
I'm looking at transaction ID. I have product IDs. I have reference order ID. So these are key fields and it
automatically makes me wonder, are there duplicates in this data set? So let me highlight the transaction
ID because that's what I really need to be unique. So I highlight transaction ID, and I want to spot the
duplicates before I deal with them, if they exist. I'll go to conditional formatting. I'll choose highlight cells rules, and I'll choose duplicate values. I'll go ahead and make them light red fill, and click okay. As I look at the data, immediately I see some duplicated data. That means that I have
duplicates in this data set. So if I were to total it up or count the records, I would get an inflated amount of
information. Okay, so I need to address these duplicates. Let me do Ctrl+Home to go back up to A1. It's
easy to deal with duplicates when you know what fields to choose. What makes this a duplicate
transaction, is the fact that the transaction ID is duplicated. I see them all highlighted in red. It's a little bit
more obvious now that we know that duplicates exist, but in a sea of data, it can be hard to find them.
Okay, let's go remove the duplicates. Now this command will actually remove them, but that's okay, I
have my copy here. I'll go to data. I'll choose, remove duplicates. I'll choose, unselect all for this example.
And I'll choose transaction ID. I'll go ahead and click okay. It tells me that it found a ton of duplicates,
and that it's only going to leave me 1,228 records that are unique. Perfect. I'll go ahead and click okay.
Now I have a data set with integrity, no blank rows, no blank columns. I know that I don't have duplicates
because I've removed them, and I have a working copy so that I can continue to explore this data. This is
in no way, a comprehensive list of approaches. These are just techniques that when you start working
with Excel data, you might want to do them on every data set.
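The same integrity checks can be expressed in SQL when the transactions live in a database rather than a sheet. This is only a sketch; the Transactions table and TransactionID column are assumed names, not the course's file.

-- Spot duplicate keys, the same job the conditional-formatting step does in Excel.
SELECT TransactionID, COUNT(*) AS Copies
FROM Transactions
GROUP BY TransactionID
HAVING COUNT(*) > 1;

-- Count distinct keys, comparable to what Remove Duplicates leaves behind.
SELECT COUNT(DISTINCT TransactionID) AS UniqueTransactions
FROM Transactions;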
Assessing and adapting the data for transformation

- [Instructor] Have you ever heard of data profiling? It's where we create a high-level profile of the
characteristics of the data that we're working with. We should apply this approach to every data set. The
greatest thing about profiling data is that when we use this approach, we get to learn about the data we're
working with at a high level. Profiling helps to inform us on some pretty valuable items. It tells us how
much data we have in this set. It can also tell us what the totals, counts, or averages of any number may be.
This helps us validate our numbers later. It can also inform us about the data cleaning we will need to
complete when we get ready to transform our data. I have some sales order data here, and I want to
profile this data to help me get started working towards a report on sales orders. I'll first start by profiling
the amount of data. I want to take a look at the record counts. How many records do I have in this data
set? To do this, I can click on column A, and use the auto calculate feature on the bottom right-hand side
of my screen. Now I have all of the auto calculate functions turned on. To do that, I just right-click the
auto calculate area and then I can select each one of the options that I need. Okay, great. So when I look
at this, I can see there's a count and a numerical count. So count will count everything I have highlighted
and the numerical count will only count the numbers. So if I look at this record set, I actually have 3,500 records that represent the sales orders. We can also use sum and average. Let's take a look at how much
money is actually represented in this record set based on total due. I'll highlight column L. And this tells
me I have approximately $33,700,000 worth of money represented in the total due column. It also tells
me that my average is $9,633. Let's look at the average of the subtotal. This is the money before tax and
freight. So the average sub total in this data set is $8,581. And the total is around 30 million. This tells me
that if I see numbers like 60 million or 66 million, I have a problem in my data. So knowing how much it
would total is important for validation later. Data profiling is so easy to do, but this is just the starting
point of what you'll learn to profile your data. Remember, it will also help us inform our data cleaning.
Take a look at columns, B, C, and D with me. These are order dates, but they look like zeros. If I click on
B2, I can see that there is actually a date included. Just can't see it based on the formatting. Also for the
purposes of my reporting, I don't need those timestamps. They're all set to midnight anyway. So this
informs me that on my data cleaning process, I'll need to address the dates. There are additional profiling
options that we will uncover as we explore deeper into our data and with other tools, but anyone with a data set and Excel can use these options to profile their data.
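For comparison, here is a minimal sketch of the same profile in SQL, assuming a hypothetical SalesOrders table with TotalDue and SubTotal columns; the comments echo the figures read off Excel's AutoCalculate bar above.

SELECT
    COUNT(*)      AS RecordCount,      -- expect 3,500 sales orders
    SUM(TotalDue) AS TotalDueSum,      -- roughly $33,700,000
    AVG(TotalDue) AS TotalDueAverage,  -- roughly $9,633
    SUM(SubTotal) AS SubTotalSum,      -- around $30 million before tax and freight
    AVG(SubTotal) AS SubTotalAverage   -- roughly $8,581
FROM SalesOrders;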
Understanding the rules of the data

- We hear about business requirements in the world of business all the time. They control what we are
doing on any given project. Part of meeting the business requirements is the business rules. It is important
when working with any data that you understand the rules around the data that you're working with.
These rules can inform you when to expect data, what you can do with certain data of certain criteria, and
also explain what needs to happen in the transformation of data. Let's work through some examples of
business rules and how they can impact our data. Let's get started with just understanding what we mean
by rules. Business rules can be as simple as a definition: is a contact for a salesperson a customer or a prospect? It could be as simple as a business rule that defines that a customer is a customer once they actually place an order. These rules also control the flow of data. So if, in our system, we have a sales order record,
that means that the order has occurred. It means that that prospect and the potential sale made it to a
certain stage of the process. Then the business can use this to easily distinguish a potential sale from an
actual sale. This is an example of a simple business rule, and this rule can also be used to then convert a
prospect to a customer using data. Some rules can be a bit more specific and have a technical
requirement. We have some sales order data. This sales order data is going to be prepared to go into a
new system that provides additional reporting about our sales orders. This information will go to our
production team. So the business requirement is that we need to prepare the data to go into the new
system. Now we have the data that we want to transfer to another system for reporting purposes. It has a
specific template, and we must use this data from our system to match that data specification of where it's
going. We've been provided this technical requirements document for our data. Let's take a quick read
through that. First of all, it tells us that the sales order ID must be converted to a text data type, but it must
not contain any letters. All of the date fields should not include time stamps. We also have to have a main
account GL number. And that main account GL number holds a four-digit code for accounting, and the last two digits specify the category. Also, we see that the territory ID and comment fields need to be
removed. And the final step is to save our data in a CSV or comma-separated value file so that we can
import it into the new reporting system. So now that we have our technical requirements, let's take a look
at the data. Okay, so the business rule in our technical spec said that sales order ID and sales order number need to be text data types. So I can look at sales order ID and see pretty quickly it's a number data type. I know that because it's right-aligned in the field. I can see the sales order number is already a text data type. It's aligned left, but it doesn't meet the requirements because it contains two letters, S and
O for sales order. I'll take a look at my dates. I can clearly see they include time stamps, so part of my
technical requirement will be to clean this data to meet the rules, which would be only dates and no
timestamps. Our specification also said we had to have a main account GL number, and this is a four-digit
code for accounting and the last two digits specify the category. But when I look at the data, I don't see a
main account GL number. However, because I know the business rules of the account number for these
records, I know that that main account GL number could actually be created from the account number. I
also see we do have columns that they said to not include, which would be the territory ID and the
comments. When working with any new data project, you want to make sure you consider the rules of the
organization in regard to their definitions for data. You also need to account for the flow of data and any
specific technical requirements.
Tips on preparing the data in Excel

- [Instructor] In the life of every data analyst, you reach a point where it's time to prepare the data. This is
the part where we clean and transform our data to meet the requirements, and if you haven't heard, we do
this a lot. You've profiled your data, you've reviewed all the business rules, and now it's time to dig in and
actually get started. I want to work with my sales order data to prepare it for a template to import it into a
new system. I typically start with a new blank workbook and I'll use Power Query to connect to my data,
and then I'll do my data transformations there. I'll go to my data tab, I'll choose get data, and I'll choose
from file. There are several connections here, but because my data is an export that's stored in an Excel
workbook, I can choose from file and from workbook. Okay, I'll navigate to my data for template, I'll
double click it, and this is what is establishing the connection between my Excel file and Power Query.
I'll choose sales orders, and then I have two options. I can go ahead and load the data to this spreadsheet,
or I can choose transform. Because I know I have transformations to make, I'll go ahead and choose
transform data. Okay, so I'm connected to my data, and I can see my sales order query. I see my query
settings and my applied steps. First off, I want to show you that it promoted my headers. Now, what that
actually means is it took the first row of information that it saw from my spreadsheet and made that my
column headers. And then it changed the type. What that means is that it looked at the second row of
information, which was actually the first row of values, and tried to determine what the data types would
be based on the values that it sees. Okay, for example, sales order ID. It has numbers. So it automatically
translated that as a number. Order date has a date and a time, so it automatically made that date and time.
Okay, as part of my requirements, I know that I have to change sales order to be text. So I'll hit the one,
two, three, and change it to a text data type. It's asking me do I want to replace the current step or add a
new step? I don't want to change how it read every single data type, so I'll go ahead and add a new step,
and then on the right hand side, you see my applied steps has a new step where I changed the sales order
ID to text. We also know from our technical requirements that we can have sales order number. It is also
supposed to be text, but it cannot contain any letters. So I need to remove the S and the O from the front
of the data. So what I'll do is I'll highlight that whole column and I can right click and choose replace
values. I can also just select a single field and choose replace values. I can highlight the whole column
and choose replace values up top. Okay, so I'll choose replace values. No matter what step I choose, the
outcome will be the same. So I want to find all of the SOs in this column, and I want to replace them with
nothing because I want just the number. I'll take a look at the advanced options. It's asking me do I want
to match the entire sale contents or replace using special characters? Neither of these apply. Okay. I'll
choose okay, and then immediately, I see sales order number. It's still text, which is appropriate for my
requirements, but it no longer contains the S and the O. On the right hand side in my applied steps, I see
replaced value. And if I needed to change anything, I could hit the little gear shape, and that takes me
right back into my steps. Okay, I'll choose cancel there. Because Power Query keeps all of our steps, it's
similar to what people do with recording macros or coding VBA for data cleaning. Except we're not
having to code or record. We're just actually performing the actions and it's keeping up with it. Let me
show you what I mean. So, let me click on navigation. Notice that the first row contains my column
headers. So now when I choose the next step, it shows me that it promoted those headers. Then it changed
all the data types based on the data that it sees. And then I started my first step which was changing the
sales order ID, and notice I still see the SO until I choose replaced value. That means if my data changes,
I can update my data source, and it will reapply all the same steps. Okay. Let's go ahead and change the
data types for dates. I don't need the timestamp, and also notice they're all set to midnight anyway. I'll go
ahead and hit the dropdown and choose date, choose date again, and then date again. So now I have my
dates in order. Perfect. Let's go ahead and work with parsing text. So, first of all, we have an account
number. This account number actually really needs to be referred to as the main account GL. So I'll go
ahead and double click account number and change it to main account GL. Now each piece of this main
account GL actually represents another field of data that I need. So, I need to actually parse this text. I
need to split it apart and I'll use what's called a delimiter to do that. Notice there's a dash in between each
section. The first thing I'll do is duplicate this field, and that will throw it all the way to the right. That
way I can keep the main account GL and then also create the three new fields. I'll right click, I'll choose
split column, and I'll choose delimiter. Notice there's several options here. I'll choose delimiter. My
delimiter is a dash, although I have multiple options here. Okay, so custom dash, and I do want to split it
at each occurrence of the delimiter. All right, I'll go ahead and click okay. Let me scroll over. And I have
my three new fields built from the main account GL. Let's go ahead and name these. The first one should be labeled GL number. This field will be called account number, 'cause that's what it represents. And then this last number here is called category. Okay, perfect. Now, if you look up top, you
see what's called M for mashup. This is the language that's keeping all of my steps. If you want to see all
of those steps, you can go to the advanced editor, and this is its recording of everything we're completing.
Okay. I'll go ahead and close that advanced editor. I also need to remove columns. Now, I can actually
right click any column and choose remove. I can keep the columns that I want and then right click and tell
it to remove all other columns, or I can go to choose columns up top and then just deselect the ones I do
not need. So I do not need territory ID or comments for my final file. I'll go ahead and click okay. Now that all my transformations are made, I can go ahead and close and load this data to my sheet. It tells me that I have 3,500 rows loaded. This is perfect. Okay, great, it tells me where my data sources are and the time of
my last refresh. These are basic steps that anyone can perform to clean up columns, convert data types,
and break text apart.
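For readers who meet the same requirements in a database instead of Power Query, here is a rough SQL sketch of the equivalent transformations. The table and column names are assumed, not taken from the course files, and the split logic assumes the account number contains exactly two dashes and no periods.

SELECT
    CAST(SalesOrderID AS VARCHAR(20))   AS SalesOrderID,     -- number converted to text
    REPLACE(SalesOrderNumber, 'SO', '') AS SalesOrderNumber, -- strip the SO prefix
    CAST(OrderDate AS DATE)             AS OrderDate,        -- drop the midnight timestamps
    CAST(DueDate AS DATE)               AS DueDate,
    CAST(ShipDate AS DATE)              AS ShipDate,
    AccountNumber                       AS MainAccountGL,
    PARSENAME(REPLACE(AccountNumber, '-', '.'), 3) AS GLNumber,      -- split on the dash delimiter
    PARSENAME(REPLACE(AccountNumber, '-', '.'), 2) AS AccountNumber,
    PARSENAME(REPLACE(AccountNumber, '-', '.'), 1) AS Category
FROM SalesOrders;   -- TerritoryID and Comments are simply not selected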
Transforming data in Excel with Power Query

- [Instructor] We've been tasked to look at how long it takes for our supplier transactions to go from the
transaction date to the finalization date. We want to see if there are any suppliers that may take a little bit
longer for any given transaction. What we really hope to find, is that most of our transactions are under
three days. We have all the data we need, but we don't have all the calculations we need to perform the
analysis. So let's get started with a few transformations, and building out the calculations we need. Okay,
I'll go to queries and connections, and I'll choose to edit my suppliers query. There are a few transformations that normally would've caused me to create functions in Excel, but because I'm using Power Query, I can perform these transformations without having to write a function. Let me start by showing
you supplier name. For our purposes, we need all the supplier names to be in uppercase. I can easily
transform this column to uppercase. Also, we need the transaction date, but we also need the transaction
year. I'll right-click transaction date, I'll duplicate that column, and then I'll transform this to just show the
year. I can do that by right-clicking, transform, I can choose year, and then choose year. Okay, I'll go
ahead and name that, transaction year. And then I'll just go ahead, and move it over by my transaction
date. I have two amounts here. I have the amount excluding tax, and the actual tax amount. What I really
need is the total amount. So I'm going to create my first formula. I'll go to add column, I'll choose custom
column, I'll name it total amount. And then using my available columns on the right hand side, I'll scroll,
I'll double-click amount, excluding tax, I'll add the plus sign, I'll double-click tax amount. It tells me that I
have no syntax errors, and I can click OK. I'll go ahead and adjust this to be a currency data type. I only
need the total amount, so I'll go ahead and right-click amount excluding tax, and choose remove. And
then I can also remove my tax amount. Now we want to look at the number of days that have elapsed
between the transaction date, and the finalization date. Let's go add another column. I'll go to custom
column, I'll name this days, I'll choose transaction date, minus, finalization date, and click OK. Using this
method will return the number of days, but because the transaction date was before the finalization date,
it's showing as a negative number. It also doesn't really look like a number. It looks like a timestamp.
What I'll do is go ahead and change it to a whole number. And what I'm really looking for is the absolute
value. So again, I'll right-click, transform, and choose absolute value. Now I have all of the information I
need, except I don't have the field that tells me if it's over or under three days. I'll use a conditional
column. I'll tell it to look at the days, and then provide me text, that says over three days, or under. I'll go
to conditional column, I'll name this over under, I'll choose days. And if it's greater than, or equal to, three
days, I want it to say three days or more. For anything that's two days or less, I want it to say two days or
less. This is a logical function that looks at the days, and then gives me one value if it's true, or another value if it's false. If I were doing this in Excel, it's similar to an IF function. I'll go ahead and click OK. Now I'm
prepared to start my analysis. I'll go to home, I'll choose close and load. Now I see I have all of my extra
columns that I've added, and my supplier name is automatically capitalized. This is fantastic. I'm ready to
start looking at my supplier transactions, to determine if they're over or under three days. Now that our
data is prepared, we can answer a few common questions on the production days. We'll start by inserting
a pivot. I'll do insert, pivot table. It's going to use my supplier's range, and it'll be on a new worksheet.
Perfect. I'll drag my over and under to rows, I'll go ahead and drag my supplier transaction ID to values.
And because it's a number, it will automatically sum it. I'll go ahead and change that to count. Click OK. Just looking at the numbers, I can tell that most of the transactions have been three days or more. Let me do one more quick analysis step. I can right-click, choose show values as, and tell it to show me the percentage of the grand total. This high-level detail tells us that 69% of our transactions really are taking three days or more to produce; only about 31% are actually two days or less. Okay. We need to
do some more analysis. Transforming data can mean a lot of different small techniques applied to the data
as you work to get to your analysis.
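For reference, the steps above can also be written directly as Power Query formulas. This is only a sketch: the bracketed column names are assumptions based on this walkthrough, so match them to the names in your own query.

// Custom column: Total Amount
= [Amount Excluding Tax] + [Tax Amount]

// Custom column: Days between the two dates, as a positive whole number
= Number.Abs(Duration.Days([Transaction Date] - [Finalization Date]))

// Conditional column: Over Under
= if [Days] >= 3 then "three days or more" else "two days or less"

The conditional column dialog builds that if/then/else expression for you, which is why it feels so similar to Excel's IF function.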
Transforming data in SQL

- [Instructor] If you are a data analyst, at some point you will encounter SQL, often pronounced "sequel." Let's start with the basics. SQL stands for Structured Query Language. Structured Query Language is vast; in that sense it's not unlike any other language. SQL is a computer language that works with data and the relationships between data sets. Microsoft SQL Server is a relational database management system developed by Microsoft with the primary function of storing and retrieving data, although it does so much more. It was developed over 30 years ago and it does a lot of different things with data. And it's
important to understand you don't need to know them all. As a data analyst, you do need to know some
basic queries. A basic query allows you to select data from the database. There are two required
statements for a SELECT. You must know what you want and where you want it from. This is the SELECT
and the FROM statement. The SELECT will list all the fields from the table, and the FROM actually lists
the table name. If I want to filter data, then I'll use the WHERE statement. And if I want to sort data, I
can use the ORDER BY statement. WHERE and ORDER BY are not required to be in the statement.
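Put together, a query that uses all four pieces has this general shape; the table and column names here are placeholders rather than anything from a specific database:

SELECT SupplierName, TransactionAmount
FROM SupplierTransactions
WHERE TransactionAmount > 500
ORDER BY TransactionAmount DESC;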
However, when they are used together, they must appear in the right order: you filter the data before you sort it. Let me show you how to run some basic SQL statements. I'm using SQL
Server Express and I'm working with Microsoft SQL Server Management Studio on the Wide World
Importers sample database. I'm going to query the supplier transactions table. I'll right-click it and select top 1000 rows. This generates a basic SQL statement: it selects all of the fields from the supplier transactions table. Okay, it also filters for the top 1000 records. I'll go ahead and remove that TOP filter and execute it again. If
you look on the bottom right hand side, this tells me I have 2438 supplier transactions. To add more
meaning to this data, I actually need to add another table. And this brings us to working with joins.
When you have data in multiple tables, you leverage joins to control what data shows in the results.
Okay, I'm going to highlight my select statement. I'll right click it and go to Design Query in Editor.
Even though I can code all these statements, it is easier to work inside a GUI, a graphical user interface,
especially if you're at the beginning. Okay. So I'm going to size my table here so I can see everything. All
right, I'll right click and go add a table. And I want to add the suppliers. Because these tables have an
established relationship in the database design, they're automatically joined. They're joined by the
supplier ID being in both tables. I can also see that it's a key shape with a one to many, meaning I have
one supplier listed and they may be attached to many transactions. When I hover over the diamond
shape, it shows me the inner join, but I can also see that in the statement here. Okay, perfect. Now let
me add the supplier name. Now it will automatically throw the supplier name at the last part of the
select statement. But if I want to put it at the beginning, I can just drag it up. Okay, I'll click OK, and then
I'll execute my statement. An inner join works by looking at both tables to find a match. And what that
means with these two tables is that if I have a supplier name and that supplier has a transaction record,
they will show in the results. This is showing me 2438 records where I have a supplier and a transaction.
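The statement the designer builds for this is roughly the following. I've shortened the column list, and the schema and column names are my recollection of the Wide World Importers sample rather than a copy from the tool, so verify them against your own database:

SELECT s.SupplierName, st.SupplierTransactionID, st.TransactionAmount
FROM Purchasing.SupplierTransactions AS st
INNER JOIN Purchasing.Suppliers AS s
    ON st.SupplierID = s.SupplierID;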
This is perfect. The only issue I have with this data is if I wanted to report on suppliers that we have in
our system, regardless of their transactions, I have to adjust the join type. All right, I'll highlight my
statement, I'll go to the design view. The diamond shape is where I can control the joins. I'll right click
this and tell it to show me all rows from suppliers. And this will create an outer join. Whether it's left or right will be determined by how the query sees the position of each table. So I've told it to show me all suppliers,
regardless of their transactions. I'll click OK. And if you'll notice in the statement, it's a right outer join. It
sees the supplier table on the right of the data. Okay, I'll go ahead and execute. I now see that I have
more records. I have 2,444 records. This means I do have suppliers listed in our data set that do not have
transaction records. Let's scroll to the bottom and see what that looks like. Starting with Nod Publishers, I see the first set of suppliers that do not have transaction records. They're easy to spot because the transaction columns all show NULL in each field. That's because there are no supplier transactions for these
final suppliers. If I want to see if there are supplier transactions that do not have a supplier, then I can
adjust the join type. I can tell it to give me a left outer join. Now I could go to the design view and adjust this, or I can just type left outer join here in my statement. And then I can execute. This time there are no extra rows: every transaction has a supplier. That's because there's a relationship between these tables that will not allow you to enter a transaction without a valid supplier. But again, you could easily have suppliers that do not have transactions yet. Because join types
do impact the data we have in our results set, you always need to critically think through what you're
trying to achieve with your data and know that you might need to adjust the join type.
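As a rough illustration of how the join type changes the question being asked, here is the outer join version of the same query; again, treat the schema and column names as approximations of the sample database:

-- All suppliers, including those with no transactions yet (unmatched columns return NULL)
SELECT s.SupplierName, st.SupplierTransactionID, st.TransactionAmount
FROM Purchasing.SupplierTransactions AS st
RIGHT OUTER JOIN Purchasing.Suppliers AS s
    ON st.SupplierID = s.SupplierID;
-- Swapping RIGHT for LEFT keeps every transaction instead, whether or not a supplier matches.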
Transforming data in Power BI

- [Instructor] Power BI offers two core functions for the data analyst: Transforming the data, as well as
presenting the data. We want to analyze the sales of our products to note our top 10 products. We will
eventually visualize this data for an executive meeting or a future dashboard. The opening screen of
Power BI desktop is a blank page ready for visualization. This happens after you've connected to your
data. If you notice in my Field pane on the far right, I've connected to the tables I need to analyze the
top products. It's the Order Details and the Products. We'll do the transformations on Product and Order
tables in Power Query. I'll get to show you the Group By function, so I can total the orders, as well as the Merge Queries option, where I merge the orders and the products together.
Because I've connected to my data, I can just go to the Transform option and begin my data cleanup. I'll
start with Products. The only information I need for the top product analysis is the product ID and the
product name. So I'll click Product ID. Hold my Control key and choose Product Name. I'll right-click and
Remove Other Columns. This just leaves me with the two that I need. Because this information might
present better with the product name being all uppercase, I'll go ahead and right-click, and transform it
to be uppercase. I'm now ready to move on to Order Details. Order Details gives me the order ID, the
product ID, the unit price of that product, how much was ordered, and the discount. One of the very
first things I need to create is the function that gives me the total amount after the applied discount.
Okay, I'll choose Add Column. I'll do a Custom Column. I'll do Total Order Amount. In this statement, I'm
going to create the subtotal, calculate the discount, and then deduct them from each other. Again,
mathematically, you could do this multiple ways. Okay. I'll click OK. And now I have my quantity times
my unit price minus the discount amount. I want this to be a fixed decimal number. Fantastic, I now
have my Total Order Amount. Now I'll use the Merge function to create the query that merges the
Products table to the Order Details table. I'll start by clicking on Products. I'll go to my Home tab. And I
have the option for Merge Queries. We have two options here: Merge Queries or Merge Queries as
New. Merge queries, if I select it, will allow me to merge data directly into Products. Merge Queries as
New will give me a third object to work with. For the example in what I'm creating today for analysis, I
just need to do a Merge Query. I can merge that data directly into Products. Now I have the Merge
screen and I have Products. And I want to merge it with Order Details. And the common field between
the two is the Product ID. So I want to make sure that I've highlighted those. The join types, just like any
other data set, are here in the Merge. If you look at the bottom of the screen, it says Join Kind. And if
you notice, there are 75 records that match 77 rows from the first table. That means I actually have two
products with no records. Meaning, they haven't been ordered. And that's okay. We're looking at the
top products, so obviously, all of them wouldn't have been ordered. I'll go ahead and hit that drop
down. I have Left Outer, which would show me what it's showing now: all products, whether or not they have orders. Right Outer, which would show me all order details regardless of the match to the product. A Full
Outer, meaning, if I have products and order details that don't have records matching, it would show all
rows from both. An Inner join, which is what I need here, shows me just products with orders. You also
have Left Anti and Right Anti. This would show you just the null values. So if I were to choose Left Anti, it
would only list the two products that didn't have order details. For my top analysis, I need Inner. I'll go
ahead and click OK. And now I can expand my table. I'll hit my Expand here. I don't need to use the
original column name as a prefix, but that's a preference. I don't need all of the columns. I really just
need the Total Order Amount. I can go ahead and click OK. And now I see the Product Name and the
Total Order Amount. Now I'm ready to group them up. This will allow me to use the Group By function,
and total by each product. Okay, I'll go to my Transform tab. I'll go to Group By. Okay, I want to group by
the product name. And I want to get a total... by summing up... the actual total order amount. This will
take each individual line item and total it up by product. Giving me the total orders. I'll go ahead and
click OK. Now I see each product name and how much was ordered. Okay. When I go back into my
visualization, I really only need to see Products. So I'll go ahead and tell Order Details not to load. I'm not
using it in any visualizations, so it's okay for me to continue here. All right, I'll go ahead and go to Home.
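If you were to open the Advanced Editor at this point, the steps just described correspond roughly to M expressions like the ones below. This is a sketch, not the exact generated code; the step and column names are assumptions based on this example:

// Custom column on Order Details: total after the discount
Table.AddColumn(OrderDetails, "Total Order Amount", each [Quantity] * [UnitPrice] * (1 - [Discount]))

// Merge Products with Order Details on ProductID, keeping only products that have orders
Table.NestedJoin(Products, {"ProductID"}, OrderDetails, {"ProductID"}, "Order Details", JoinKind.Inner)

// Group the expanded rows by product and sum the order amounts
Table.Group(Merged, {"ProductName"}, {{"Total Order Amount", each List.Sum([Total Order Amount]), type number}})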
And I'm ready to apply this data set... to my visualization page. So now on my Fields list, I see my
Products. I'll visualize this in a table, so I'll choose Table. And then I'll drag my Product Name... and my
Total to the Values. Okay, I'll go ahead and size this out so I can see it. Now, right now, this represents
every single product. And we're trying to get to the top 10. I'll go to the Product Name on the filters. I'll
tell it to do an advanced Top N filter. Where top is 10. I'm going to base that on the total, so I'll drag that
Total to the By value. And then I'll apply that filter. After applying that filter, I see the top 10 products.
Let's go ahead and sort it. Just by clicking that Total header. These techniques and joins show you
exactly how powerful Power BI and data can be when you establish cleaning routines for basic
presentations of data.
Common cleaning and transformation

- When building your cleaning and transformation toolbox, there's some common cleaning and
transformation items you will use. Others will be more specific to the needs of the data you work with.
Let's start with general cleaning. Spaces are invisible to the eye, but in fact, they're characters. And
when a field has extra spaces, you will want to clean those by removing them. There are leading spaces
which are spaces that are at the front of the field. There are trailing spaces which are at the end of the
field. When we want to remove either leading or trailing spaces, then we can use functions like trim or
clean. The act of breaking out text is referred to as parsing text. And we can do this with any type of
delimiter and every program handles this a little bit differently, but the outcome is the same. Spaces can also serve as a delimiter; the spaces between words, for example, are valid characters. Imagine first name and last name. In the case where we want both last and first name in their own individual columns for sorting, as an example, we will use the space to break those columns apart. This is not the only time we parse text using
delimiters. You might break apart text fields based on things like a dash or even a comma. We use things
like text-to-columns, split by delimiter and functions like left, right and mid to work with parsing text.
We don't only break apart text. There's also times when we need to combine text fields together. This is
commonly known as concatenate or concat. We also replace text with valid text. For example, if
someone enters an abbreviation of a state in the United States, but we want the full state spelled out,
we might replace that text with the valid response. It could be a misspelling that we're correcting. There
are several methods for replacing invalid data with valid data. We also change the case of text. Example
would be maybe we need everything to be in uppercase or lowercase or even corrected to proper case.
There are functions to do each of these commands, and again, they might differ between programs, but
the outcome will be the same. These are very simple commands to perform in any data program. You
may find that you'll also remove duplicates from a dataset, and this can be done with commands like remove duplicates or by using the DISTINCT keyword in query statements. We also transform data types to be
appropriate for what we need to do with the data. You may have date fields that are stored as text, but
to work with date-related functions, you need to convert it to an actual date data type. The same goes
for numbers. If you need to work with a mathematical function, then the value of the field must be a
number data type. These are just a few of the basic commands that we use for cleaning and
transformation of data and some of the first ones to understand and master.
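To make this concrete, here is one way several of those operations could be combined in a single SQL statement. The table and column names are invented for the example, and note that TRIM requires a newer version of SQL Server; older versions use LTRIM and RTRIM instead:

SELECT DISTINCT                                               -- remove duplicate rows
    UPPER(TRIM(FirstName)) AS FirstName,                      -- remove extra spaces and change case
    CONCAT(TRIM(FirstName), ' ', TRIM(LastName)) AS FullName, -- combine text fields
    REPLACE(StateCode, 'Calif.', 'California') AS StateName,  -- replace invalid text with valid text
    CAST(OrderDateText AS date) AS OrderDate                  -- convert a text field to a date data type
FROM CustomerStaging;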
Using built-in functions

- [Instructor] There are a lot of people who don't enter into the field of data because they're intimidated
by math. It's important to recognize that one of the powers of these tools is that it performs all types of
math from basic to complex mathematical computations for us. We don't have to manually create every
function we need. The tool provides us a lot of calculations. For example, in this Power BI dashboard,
let's take a look at the fields, look at Quantity and UnitPrice and even Discount. Do you see how they
have the sigma shape? It's because it recognizes them as numbers, and this means it will automatically
aggregate them for us and summarize them. Let me show you what I mean. I'll go ahead and add a
table, and I'll go ahead and expand orders. I want to bring in the order ID. I also want to take a look at
the product name. Now let me expand this. I want to be able to see as I add fields in. I'll go ahead and
bring in the quantity, and then I'll bring in the unit price. Do you see how it automatically totals the
quantity and the unit price? This doesn't make sense to me. The unit price is just the price and the
quantity, well, that's the quantity that was ordered for that order ID. So what I'll do is I'll right-click on
the quantity and tell it to not summarize. I also go to unit price and tell it not to summarize. I think I
would prefer to see unit price before quantity. So what I'll do is I'll just drag them to change the order,
perfect. I do want to see the subtotal, and one thing I'm finding here is that I don't have it. I'll build that
in my model. I'll go to Transform Data. I want to add it to the order details. I'll go to Add Column, and I'll
choose Custom. Okay, I'll go ahead and call it SubTotal, and here I'm using the function builder. I'm
going to go ahead and say UnitPrice, by double-clicking, multiplied by Quantity. Tells me I have no syntax
errors, which is great. I'll click OK, and now I have my new subtotal. Notice that my default is A, B, C and
1, 2, 3, alphanumeric. I'm going to go ahead and change that to a fixed decimal number, all right? I'll go
to Home, Close & Apply, and then I'll bring my subtotal into my table. Now, in this case, I do want this
number to total. This makes perfect sense to do that. This is my amount before I apply a discount. Okay,
let's take a look at something else Power BI does for us and, in this case, it keeps us from having to write
functions. Notice the order date. It's actually got a date icon, and when I hit the little expand, it has a
date hierarchy. That's because Power BI assumes that I will probably want to work with year, quarter,
month or day. Let me drag my date hierarchy into my model, and I'll put it up by the order ID. Notice,
automatically, I get the four individual fields. There are times I do want this and times I don't. In this
case, I just want to see the order date. So what I'll do is I'll actually right-click the order date hierarchy
and tell it just to show me the order date. If you work as a data analyst, you probably work with pivots
and matrices. Remember, that's rows, columns and summary values. Here, I'm going to add a matrix, and
I'm going to look at values based on the shipping country. I'll go ahead and add Ship Country to the
rows, and I'll grab my subtotal and add it to my values. This lets me see every single country, and it
automatically summarizes its subtotal. Now, if I wanted it to be an average, I could right-click and
choose Average. If I wanted to show the max of any particular subtotal in a country, I could choose Max.
Again, I'm not completing this math. I'm just choosing the right options. I'll go ahead and choose Sum.
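For reference, those aggregation choices map to very simple DAX if you ever decide to write them as explicit measures; the measure names below are illustrative, not something the walkthrough requires:

Total SubTotal = SUM ( 'Order Details'[SubTotal] )
Average SubTotal = AVERAGE ( 'Order Details'[SubTotal] )
Max SubTotal = MAX ( 'Order Details'[SubTotal] )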
Another powerful feature of Power BI is the ability to use quick measures. I'll go ahead and click on
Quick measures. These are actually measures that are written in DAX. They're freely available for me to
use. I can go ahead and hit the dropdown, and I can see options like Aggregate per category, giving me
average, variance, max or min. I have different filter scenarios, different time intelligence scenarios, like
year-to-date totals, year-over-year change. I also see totals, like running total. Let's do that. I'll choose
the running total. I want to work with my subtotal, and I want that running total to be based on the
different country. So I'll go to my orders. I'll choose my ship country. I'll go ahead and leave it as
ascending, and I can hover over each one of these options to learn more about it. All right, I'll go ahead
and click OK. Now I see the DAX behind this particular calculation, and on the right-hand side, do you
notice how I have my subtotal running total? And it has a little calculator shape. Let me go change this to
read RT_SubTotal. Okay, and then I want to actually go put this into my matrix, which is going to just
drag it underneath my subtotal there. So the running total works by adding each new value to the total of everything that came before it. So if the first country starts at roughly 8,000, my starting running total is that same 8,000, and when I go to the next value, Austria, it adds Austria's subtotal to that 8,000 and gives me the cumulative amount so far. That's the
running total. One of the really great things is that I can actually add more variation to this and change
my running total. For example, I want to see a running total across the years. So I can actually drag Year
into my columns, and then my running total starts over again for each year. These are just simple
examples of some of the power of the built-in functionality in Power BI. Just remember, with power
comes a great responsibility. That really sounds like the beginning of a superhero movie. I tell people all
the time, anyone can make numbers show up, but that does not make them correct. If I could offer you
a piece of advice, really think through what you're trying to accomplish with the numbers, consider what
functions you might need and then also read about what's available. The more experience you have, the
easier the research will be for you but don't worry, you'll always be studying something new.
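If you are curious what the running total quick measure wrote for us, the generated DAX is roughly of this shape; the exact table and column names will follow your own model, so treat this as a sketch rather than the literal output:

SubTotal running total in ShipCountry =
CALCULATE (
    SUM ( 'Order Details'[SubTotal] ),
    FILTER (
        ALLSELECTED ( 'Orders'[ShipCountry] ),
        ISONORAFTER ( 'Orders'[ShipCountry], MAX ( 'Orders'[ShipCountry] ), DESC )
    )
)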
Relational databases

- Have you ever really thought about how systems store data? I bet if you're a new analyst, you have not
gotten that deep into the idea of how data is stored, but you just know that it is stored. Relational
databases have been around for a while and you will hear people talk about SQL databases or SQL
scripts or statements. A data analyst doesn't have to be fluent to be effective. You can do a lot with just
being somewhat literate with SQL. This is a key area that you can further study if you're interested.
RDBMS stands for Relational Database Management System, and server technology like Microsoft SQL Server can store these databases. There are others. Even something as simple as an Access database
has relationships and relational data. We need to go back one step and discuss structured data. When
you work with a spreadsheet that has column headings and data values, then you're actually working
with structured data. This data has a field name, we see it in the column headings, and then it has a data
type and a value. When we build relational databases, we build structured data sets that are stored in
the form of tables. These tables then become connected through a relationship between key fields.
These key fields are unique identifiers that help control the data that can and cannot go into a table.
When structured data is defined and then stored into tables and then the tables are related, this creates
a relational database. These relational databases are used to hold information and we as data analysts
use this structured and stored data to build reports, visuals, and analyze data. One thing that is
important to note is that you as the data analyst must understand the structure that is used to store the
data does not always make it easy for reporting. Why? The rules for effective storage are different from
the rules used to combine data for reporting. They are two very distinct roles and functions, even if they
work with the same data. As an analyst, you do not have to know how to design large-scale data
systems, but you will want to understand some database design techniques so that it makes
understanding someone else's design easier.
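As a minimal sketch of what key fields and a relationship look like when tables are defined, consider two related tables in SQL; the names and data types here are invented for illustration:

CREATE TABLE Suppliers (
    SupplierID   int IDENTITY(1,1) PRIMARY KEY,    -- unique identifier for each supplier
    SupplierName nvarchar(100) NOT NULL
);

CREATE TABLE SupplierTransactions (
    TransactionID int IDENTITY(1,1) PRIMARY KEY,
    SupplierID    int NOT NULL
        REFERENCES Suppliers (SupplierID),          -- key field that creates the relationship
    Amount        decimal(18, 2) NOT NULL
);

That REFERENCES constraint is the kind of rule that keeps a transaction from being entered without a valid supplier, which is exactly the behavior we saw earlier when experimenting with join types.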

Modeling data for Power BI

- We will work with different data in different data sets or tables to do analysis and visualization. When
we have multiple tables that we're working with, we'll want to model our data to get the most out of it.
When you have an entity relationship diagram where the tables and relationships are showing in a
model, you're actually seeing the model of the data. Now I've already connected data to my Power BI
Desktop and on the right hand side, you see my fields list. I have different tables of information required
for my reporting. It appears that I'm ready to go but I need to go one step further. These data sets are
meant to be joined together. There are several ways to join and model data in Power Query for Power
BI. When we perform merge queries, for example, we're actually establishing a join but we can also go
to the modeling section and model this data from the very beginning and this allows the data to
communicate through the joins, meaning if I reference an order, it knows what product and what order
details are related to that order. In looking at the diagram, we see that there are some joins already
established. Power BI as a convenience tries to join the data automatically, this is called auto detect and
it tries to auto detect the relationships. You should always confirm that the relationships that it
establishes for you are correct. Remember, it's easy to model data when you know what data is related
to each other. Let's look at the orders table and the order details. These are joined together by the order ID. Also notice we have a '1' and a '*' or star symbol. This shows us the cardinality of this relationship: it's a one to many, meaning we have one order and many order details, not unlike when you place an order and buy multiple things, you have one order record and then the different line items and quantities for the products that you purchased. Let's look at the products information and the order details. These are joined by the product ID and again, it's a one to many relationship. There are other cardinalities as well: one to many, many to one, one to one, and many to many. One to one means that there is only one record tied to one record between the two tables. One to many and many to one, like our examples here, mean that we have one record in one table that's tied to many records in another table. I do have a join that needs to exist but doesn't. Take a look at the employees: you see how there's no line to any other table? This means that the model doesn't know how the employees relate. I'll use the employee ID and drag it to employee ID, and this establishes my relationship. I can go ahead and look at the properties of this relationship. I'll right click the line and go to properties. This shows me the orders table, which is the many side, and the employees table, which is the one side, and I see the cardinality is many to one. I'll go ahead and click OK. To manage all the relationships, I can go to manage relationships up top and work with each one of them. Okay, let's see the model at work. I'll go to report and I'll begin to build a basic visual. I'll start by just adding a table. I'll go ahead and bring in the company name from customers. I'll bring in the last name from employees. Okay, I'll collapse those so I can see. From order details, I'll actually go ahead and bring in the order ID. I'll bring in the order date hierarchy. I just want to actually show the order date, so I'll right click that and just show the order date, and then I'll bring in the product. I actually want to put the product in between the order ID and the order date, and then I'll also bring in from order details the unit price and the quantity, and then I'll bring in my total after discount. Because I've modeled my data, I know that I have the correct company listed with the correct last name of the salesperson with the appropriate order ID and the order details for each one of their orders. Because we've modeled this data together, we can now explore the data using all the features that help us visually without having to create various merge queries to accomplish the joins.

Master data management

- Have you ever been working with data and you see customer addresses all have different ways to reference the region? In some countries, we have states, provinces, or districts. And when they're used in the data and entered by different people, they may reference the full state, province, or district name or they may list the abbreviation. Data like our customer and their address information would be considered master data. We want everyone in the organization who works with this data to have the same consistent list of information. When an organization takes the time to design rules around the master data, this will also inform all the data analysts of what types of transformations apply. Using tools like Power Query, either in Excel or Power BI, we can easily make these corrections and save these steps so that as new data comes into our reports, it will conform to the standards. Master data is not just address information, though. It could be project names or product names. If we call a project something different, then it makes it difficult for the data analyst to report on this information with ease. There are tools that exist to support large scale organizations with master data management. But I would argue no matter the size of your organization, if you do not have a plan in place, the analyst will be dealing with it all the time. So while master data management aims to keep a clean, complete, and accurate list of master data for the organization, if you don't have master data management, then you will need to develop a plan to keep a nice, consistent list of data when you report. Let's take products, as an example. Two companies have merged. They sell the exact same products, but in both companies, they're not called the same name. As a data analyst, you can use a table that holds every possible name and the correct name so that when you report, you can leverage joins to give yourself a master table of information. When a new name pops up, you'll have to address it in your master table, but it's better to have that table than to not have it. Your data set being clean and complete is one of the most important parts of any project. Just remember that all of your data skills can apply to many types of data scenarios, not just the analysis or the presentation part of the job.
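A minimal sketch of that mapping-table idea in SQL, with invented names, looks like this: every name variant maps to one correct master name, and reports join through the map.

SELECT m.MasterProductName,
       SUM(s.SalesAmount) AS TotalSales
FROM Sales AS s
INNER JOIN ProductNameMap AS m
    ON s.ProductName = m.SourceProductName   -- each variant row points at the one correct name
GROUP BY m.MasterProductName;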

Unstructured data

- Did you know that there is way more unstructured data in the world than structured? As a matter of
fact, did you know we use structured data to produce even more unstructured data? Data that neatly
fits into tables or spreadsheets is structured data, and unstructured data is literally everything else.
When we post videos, take pictures, create PDFs of bills for our clients, we are contributing to the vast
amount of constant unstructured data. The minute we had the ability to walk around with a PC, video
camera, still camera and social media outlets in our hands like our mobile devices, the world of data
exploded. Let's just take an image for example. This is unstructured data. You must look at the image to
understand what the image is representing. Same thing for a video. You have to watch it, and it's an
immense amount of data. With that said, there's also semi-structured data, which is a mix of both
structured data, and unstructured data. Let's say you receive an image of the cutest cat ever on the
beach via a text from your best friend. When you see this image, you see the cutest cat ever or at least
someone's opinion that is the cutest cat ever, and you see the beach. A data professional sees much,
much more. What I see when I look at the cutest cat ever picture is much more than a cat on the beach. I
see the time of day, the weather, the location, the type of cat, the color of the cat, even the age of the
cat. I also see the image type like PNG, and what's the image size, as well as the dimensions, and what's
the quality of the image. Don't forget. We mentioned that we received this from someone, that's data.
We received it at a certain time, and that's data. Did I mention when the picture was taken, and by who?
I mean this list can keep going. Just think. It went from being the cutest cat ever to a lot of data really
fast. Now, imagine people posting their favorite images on their social feeds. Multiple times per minute,
and then others are sharing that image or they like it, or they look at it and move on. That's also data.
Unstructured data requires our brain to review and provide context, and structured data fits neatly into
designs. And semi-structured is everything in-between structured and unstructured. Depending on the
organization you work with, and what they do as their product or service will determine what tools, and
software you need to work with with their data. Just knowing that there are different types of data like
structured and unstructured can help you explore the roles in data. Data is not going anywhere. It is only
growing. And just think, there was a time when the name didn't exist. It should keep us all motivated for
what's coming next.
Visualization methods and best practices

- I read a post lately about how the person designed this beautiful dashboard and no one was using it.
This left the data professional perplexed and frustrated. I get that. But it immediately made me start
thinking of why. Why, if it is so great, are the users not using it? Well, have you ever heard of beauty is
in the eye of the beholder? It could be as simple as the data analyst designed something that only the
data analyst can use. Looks great, but no one else understands it. It could be that the data analyst just
loaded this beautiful dashboard, sent a link, and said, "Here's your great, new dashboard." Best practice
number one. For a moment, be the person you're designing for. If you want to see what this feels like,
imagine driving a 10 or 15 year old vehicle and then go sit in the newest car on the lot with the most
features. The dashboard will likely take you more than a minute to translate before you drive it. Now,
imagine that someone hands you the keys and says, "Take it for a spin. It's amazing." What do you do?
Depending on who you are, you might freeze, go with it, or get out. In the same scenario, imagine that that car
salesman came out and explained the differences between your car dashboard and this new dashboard,
or at least hit the high points with you. What would you do then if they directed you where to look to
make you feel more comfortable for going for a drive? Always take time to document and provide a little
bit of training on your visuals. Be consistent. Use the same color for the same item all the way through.
If my brain says, "This product is blue on this stacked bar," then every time I see a reference to this
product, it will be blue. And then when I assume the same blue in a new visual is the same item, and I realize it's a totally different product, I get stuck on why it isn't the same. And the data is not
showing me anything. Don't overcomplicate to show your fancy vis skills. I do understand that people
want to use advanced visuals to show their skills. But the point of the dashboard has nothing to do with
your skills, but providing information. If you provide valuable insight through correct visuals and layout,
they will believe you to be a visual magician, and they will not care that you presented it in simplified
visuals. Be sure to title, label, and add tooltips appropriately. People should be able to read a title for
context, be able to easily read the labels, and hover over to get additional insight, not just see the same
thing the visual already shows. Remember that a picture is worth a thousand words. And if we could all
make decisions by consuming thousands of lines of data, we wouldn't need visuals. Not all data
visualization is a chart or graph. Make appropriate use of cards for high level totals and other aggregate
functions. And remember, a table, matrix or pivot is also a visual presentation of data, and some people
prefer that matrix to a chart. So it never hurts to give them both to meet the needs of the audience.
Always remember that your visuals will be used to provide information. So make sure that it does it in a
way that people can quickly understand and make decisions.
Creating reports to visualize your data over pages

- [Instructor] Not all data is best consumed using a dashboard. Yes, dashboards provide valuable
capabilities, but some reports can be valuable in different formats. When we have reports that list line items, that type of display will produce several pages, and a dashboard representation of that data may not be the most user friendly. There are tools like Power BI Report Builder that allow us to
build what are called paginated reports. Paginated reports allow you to connect to data, not unlike
dashboards. In fact, before the popularity of the dashboard most of the reports were paginated reports.
Although some people think they are a thing of the past I think it's important to remember what
determines the style of your report is the need for how that data is best visualized and how it's going to
be consumed. If it's going to be delivered via PDF, or even printed for a meeting. In our role we've been
asked to update an existing report that's currently a line item report and is many, many pages long. This
report would simply benefit from some groups and summaries. Let's go to Report Builder and redesign
this sales order meeting records report. This report is connected to AdventureWorks 2019, which is a popular sample database. And I have a data set here called sales records. And when I expand that, I see all
of the fields that are available to me. But if I want to look at the underlying query I can right click and go
to query. This lets me look at the different fields that are being used in the actual report. It'll also let me
take a look at the relationships, which again, multiple tables means I need relationships. And it shows
me the join type, which is inner. Now there are some fields that I need that are not in the data set. I can
go right click and go to the data set properties. And this lets me work with the individual fields. I can go
in and add a calculation, which I've already done here for order date. Let's go take a look at that
function. This actually allows me to format that order date value in a date format that's a short date,
which is perfect. I'll go ahead and click okay. Click okay again. This report does provide valuable
information, but again, that multiple lines is not effective for the meeting. So we're going to replace it
with the matrix. And even though we'll have line items, we'll just have fewer, and it will become more
meaningful for the meeting. The matrix is just like a pivot in Excel. It has rows, columns, and summary
values. So we want to look at a simple subtotal for the sales people for each product. So we'll go to
insert, choose matrix, we'll click on insert matrix and then we can click in the body of our report. We'll
drag name for product name to the rows. We'll put the last name of our salesperson in the columns and
then we'll use the total due for our summary field. In Report Builder we build in the design view, but to
see the data we have to run the report. I'll choose run. Now this report actually shows the high level
subtotal for each salesperson across the top, and then also shows each breakdown of each product
down the left hand side. We can go to the very last page and see that we went from 3,000 plus pages to
6 pages. Fantastic. Let's go back to the design view. We can make a few adjustments. I want to make the product column a little bit wider. Let's go preview it again. Definitely getting a little bit better. We'll go
to our page set up. Because it's wide we'll make it landscape. We definitely want to adjust our margins
to be smaller. And then when we're ready, we can export our report into various formats. But we can
also publish these reports. Paginated reports can provide valuable reporting when your data expands
over many pages. And remember it can easily be published, PDF'd, or printed.
Creating a dashboard for reporting

Selecting transcript lines in this section will navigate to timestamp in the video

- [Instructor] Dashboards can provide valuable insight into different data scenarios. And the scenarios
are created by us, the users. Dashboards can be built where they show key performance indicators. And
for us, it's the sales performance in the countries that we're interested in. Here, the size of the dot
represents the amount of sales in the country. So if I click on North America, I will see on the left-hand
side all the products. And because these headings are sortable, I can interact and sort the total after
discount, bringing the highest products ordered to the top. But how does that look in different
countries? So for example, if I choose Sweden, will it also be wines? As soon as I click Sweden, I see that
the bratwurst takes the lead. I click off my map and it brings all of my sales back to the top where I see
wines and bratwurst and even peanut butter cups are the top of my sales. I also noticed that I have
some formatting issues here, so I think I'll go ahead and address those now. If I go to order details, I can
choose total after discount. I'll go ahead and make that two decimal places. Perfect. Okay. So we want
to create a dashboard for our sales managers. They need to be able to work through several scenarios of
the data. One of the key things they ask for are the order details. I'll go ahead and add a page here. And
we'll name it sales orders. I'll start by adding a table and then adding various fields to it. So I'll start with
adding the company name of the customer. I also want to bring in from the orders table the order ID. Go
ahead and size it out just a little bit, so I can see it populate. I also want to bring in product name. I want
to bring in the quantity. Because that's a number field, it automatically tries to sum it up. I'm going to
right-click it and tell it not to summarize. I want to bring in the unit price. I also do not want to summarize it.
And then I want to bring in that total. Okay, great. I have all the basic information my sales managers
will need. One of the other requests that they had was to be able to see how the sales people handle
different customers. Are they just working with one customer? Or are they working with a lot of
different customers? We can visualize this data using a stacked bar. So I'll click the stacked bar. I'll bring
in the last name from employees. I'll go to customers, make that my legend. And since we're looking at
their sales volume, I'll go back and grab that total there. Perfect. Now I can clearly see that there's a nice
spread of how we help our customers here. We have several sales people, and they serve several of the
same customers. We can see the total of all sales at the bottom of the table. But it would be nice if we
could see that across the top. I'll go ahead and decrease the size of my stacked bar. And I'll introduce a
card into the mix. I'll bring it up here. I'm not going to size too much. And I'll bring in that total after
discount. That information really stands out across the top. Really easy to see it. Okay. So let's see how
this interacts. Right now, we see all sales, all order details. There's lots of them. We see our scroll there.
But what if I just want to focus on Sergio? I can click Sergio, and now I'm seeing only his records. What
about Blankenship? Choose Blankenship. I can see the total for Blankenship, and I can see all of
Blankenship's sales. So this is just one way these visuals interact with each other to filter other
information. Okay, I'll go ahead and click in the corner there and remove that. There are times though
that we want to see other filters. Like, what about the year? What about the country? What about the
customer? Again, we can see that multiple customers are served by our sales people. So I'll go ahead
and create some slicers here. Set that sort back to company. I'll choose my slicer. And go ahead and sort
of size it. I'll save that intricate sizing for last. The very first type of filter I want to create onto the
dashboard will be the order date. I'll grab my order date hierarchy and drag that into the field list. Now, I
don't really need quarter or day, just month and year is fine. And I really don't want to take up the
screen space, so I'll go ahead and make this a dropdown. And that will be the same for all of my slicers
so that I can be consistent. Okay. I'll go ahead and add another slicer. This slicer will be for last name.
Put that into my fields list there. And again, adjust it to a dropdown. If I want to focus on a particular
group of customers, then I can actually go add that slicer. I'll put in that company name. Again, that's a
healthy list, so I'll make it a dropdown. I'll go ahead and size it just a little bit. And if I want to create a
country dropdown list, I can do that as well. Create my last slicer here. Place it where I think it's going to
go. Go ahead and do some basic sizing here to fit it all in. Don't want them to overlap. All right, perfect.
Now I need to use the ship country. I don't want to use the customer's country, but where they're
shipping the information, so I'll drag that ship country to field. All right. And then again, to be consistent,
I'll make it a dropdown list. Okay. Now size my card over. All right. Let's watch our dashboard interact.
So I want to see 2022 sales. And I only want to look at Brazil. Okay. Let's look at Germany. Excellent. I'm
seeing valuable insight. Imagine that we've gotten all this data collected, and we need to use it, maybe
we need to email it to these particular customers. Let me show you a very valuable feature. Now that I
have these filters set, I can actually export this data. This actually creates a spreadsheet of my filtered
scenario. This provides a ton of value for the user. Exporting the data out in this way provides valuable
access to information. The sales managers merely need to create the scenarios. Then they can work with
this data or even copy and paste it to use it into emails to share information with customers when
they're not going to ever have access to our dashboards. Dashboards are amazing. And when designed
effectively, they can provide a lot of value. Remember, effective being the keyword. When we take data
from reading it line by line on multiple pages to being interactive, we're giving people the ability to
question the data, create different scenarios and then also providing actionable insight.

Gathering requirements for visualizations

- We have all heard the stories where the entrepreneur designs their life-changing app on a
napkin and then moves on to greatness. Well, guess what? You can apply the same approach to
your visuals. Maybe it's not a napkin, but I can tell you from my experience, even starting with
your customer and a napkin is better than guessing at the visual representation of the data on
your own. People never know what they want in a dashboard or report until they can see what
you see, and if that's all in your head, well no one is a mind reader. The best way to express your
ideas is to create a mockup of the dashboard. Just lay out different objects, like a table, matrix or
stack chart, add a few filters on the image, this will help everyone get on the same page about the
design. And if it's multiple pages with navigation, then wireframing helps communicate the
navigation of information before you build it. Wireframing allows you to build out a skeleton of
the pages, it doesn't have to be designed with all the colors and final graphics, it's just a sketch.
The mockup might have a little more visual styling than the wireframe, but even just a few
minutes of investing time into these together will reduce tons of back and forth on the design
process. There are many ways to produce mockups and wireframes, we can thank all the
software developers and UX designers for these sets of tools. If you are newer, it might be hard
to visualize the visuals needed because you may still be trying to determine the right visual for
the data. You can look for inspiration through samples that you can find available in the
software, like Power BI has a whole set of dashboards you can play around with to get started. In
addition to getting on the same page about the look of the dashboard, we must consider other
requirements. Be sure you're documenting these in every meeting and then following up with
notes to all stakeholders afterwards. A few items to always address are, what type of filters do
we need on the data? That way you're not bringing in more than what's needed. Example would
be a 100 year old company doesn't need 100 years of data in the dashboard. I would call this a
hard filter, that's because you handle this type of filter at the data level. What type of filters are
needed for the consumer? Which is the user of the dashboard. What might they search and filter?
These are soft filters and they're meant to be interactive. Common filters might be years and
dates. If it's dedicated to products, it will likely have a product filter. And if it's dedicated to
customers, it will have some customer filters. Never fail to find out who this dashboard actually
is for, and also determine if they have the permissions to the data and the correct licensing to use
the dashboard. Visualization is as much an art as it is a science, and these requirements are pretty
standard to every type of visualization project. And you'll discover there are many more, but if
you start with these, you'll be designing better dashboards right from the beginning.

Presenting data challenges effectively to others

- There are some moments in meetings where I realize no matter how simple I make what I say
next, all of a sudden people are going to be staring through me, trying to figure out what in the
world I'm talking about. That's tough. It shouldn't be like that in every meeting. And if it is that
way for you every time you talk about data, then you need to focus on communication skills.
Again, a very important soft skill for a data professional. I find that talking to leadership through
the process can really help their understanding of what we're working with on a data project.
Here's an example. The data team has been tasked with studying a scenario that will have a major
impact on the organization, and it's imperative that we get this right. It's a high stakes project.
They've provided us all the access to the data, all the questions they need answered, and we have
our approach and we're ready to go. In the first several passes of the project, we realize one of
the key pieces of information that we need for the study has not been collected consistently. And
there appears to be major gaps in the time for the data we do have, and it really makes us
question if we can trust the data. What do you do facing this scenario? I can tell you very easily
what not to do. Do not wait to communicate the challenges and make sure you're prepared to
discuss them. Here are a few ways you can address this situation. Be sure to let the right person
on the team know what data appears to be missing. People make mistakes. It could have been a
bad file or even a missing file. Also communicate about what you see in the data you do have.
This gives you an opportunity to confirm that they understand about the gaps you're finding in
the data, this way there's no big surprise. And by the way, they may have a very sound reason for
those gaps. You may just not know about it. This is part of the learning curve of any new data
set. There are other scenarios. The organization is hoping that the data team will be able to show
something very positive with the data, and you found the exact opposite to be true. This is truly a
challenging scenario and not a fun one to face. So what do you do? When I find myself in this
situation where there is a totally different understanding of the data reality versus the actual
reality, I start by confirming that I'm not missing something. I double check everything. I
confirm that I've not introduced an error in any way. If I find that this is the truth of the data, then
I turn to the person in leadership and discuss my findings to get further insight into what I may
be missing and get guidance from them on the next steps for me to take. Remember, we don't
have access to all the data or even all the knowledge. Turning to your leadership is the legitimate
next step. If you discover no errors, you have done all that you can, and the truth isn't going
to be exactly what they planned. Having some communication skills on how to deliver
information might be your next step. Remember, data is used to inform a business for
improvement and sometimes delivering the results can be hard. As a data professional, just make
sure you have thoroughly checked all your results, follow the chain of command of information,
and by all means, communicate with your team.

Finalizing dashboards

- [Instructor] Visualization tools give us so many features, including some that we really need to
pay attention to, like automatically creating titles and built-in tool tips. These features are so
nice, but they don't always really make sense to the users that are not involved in the back end of
the data. Changing titles should be an overall part of your process. And when you're ready to
finalize your dashboard, it should be one of the final things you check. You can change them at
any time, but you certainly want to make time for it. Let's look at our SalesManagerDashboard
here. There's definitely a few titles we can change to make things more meaningful. For example,
we have TotalAfterDiscount by Last Name and Company Name, and really what this does is
shows each salesperson and the total for each of their customers. Also, there's a couple of other
little things that are not too meaningful, like this company name in the legend. It's really small,
and there's a lot of different customers here. Okay, so I'll choose that option. I'll go to my format
Visual, and I can look at the Y and the X axis. So first of all, I'll turn that Title off the Y axis.
And you'll notice that the last name here on the left disappears. I'll go to the X axis and I'll turn
the Title off here and it will disappear from the bottom. And then I really don't think I need a
legend for this. There's other ways I can work with that information. So I'll turn the Legend off.
Okay, now I'll go to General and I'll go to Title. Right now, the Title is turned on, but it's not
really meaningful. So let's do Total By Salesperson For Each Customer. And I'll go ahead and
center align this. Perfect. I'm going to bring it down just a little bit. And then I have my card up
top. Let's go ahead and expand that. If I make it just a little bit bigger, I can see that it has a
TotalAfterDiscount. Okay, that's called its category label. I've got that selected. I'll go to its
format and I'll turn off that Category label. Okay, I'll work with this call out value. First of all,
it's really big. So I'll go ahead and make it a size 30, make it a little bit smaller. But I want to
change the way it displays, like I want the whole number there. I can go ahead and choose that.
I'll go ahead and leave it for Auto 'cause these numbers get large when I remove the filters. And
I'll go to General and then I can turn on its Title, and I'll have to supply that title. And we'll do Total
Sales here. And again, we can make that just a tad bit bigger, and then let's center it. Okay,
perfect. Now there's no question that that's the Total Sales and then underneath that is the Total
By Salesperson For Each Customer. Okay, also notice that we have different slicers across the
top. This is the perfect opportunity to provide some instructions. So I'll go ahead and click on
Year and Month, I'll go to the format, I'll go to my Slicer settings. I want to leave it as Multi-
select because I want people to be able to select multiple criteria, but I also like the Select all
option, so I'll turn that on. I can take a look at the header. And then notice the title text. Here, I
can change this to read Select Year and Month. Just the word select tells people, hey, this is
something I can select. I'll go to a Last Name. Because I was on that area, it will automatically
update. Okay, and I can do Salesperson. Company name is fine, but I'll go ahead and put Select
Company Name. Now, I want to be consistent. So I'll go back to my Salesperson and tell it to be
Select Salesperson. Looks much better. And then I'll go to my ShipCountry, and I'll change it to Select Shipping Country. Now because we have two countries, the country that the customer's in
and the shipping country, it is probably important to specify. Now, these changes are minimal,
but they've already made a big difference. Okay, let's go to our table here. Let's go to General. Its Title is turned off, so let's turn it on, and let's call it Sales Order Records. Fantastic.

Adding dashboard filters

- [Narrator] One thing I've noticed is that we have these filters. Let me go ahead and clear them. And I'll
clear this country filter. The dashboard, when it opens, it's actually going to look like this. And if people
start to make changes we might want to give them the ability to go back to this original view. We can do
this by adding a bookmark, I'll choose add bookmark. And I'm going to rename that as clear. And let me
show you how this works. So I'll go ahead and choose Sergio and it updates to show me Sergio's sales,
just perfect. And then if I choose the bookmark, it clears it back. If I go select 2021 and Control-select these salespeople, and then I choose clear, I go back to the original state. This is really, really
great. This could be very handy for your end users, gives them the ability to clear all their filters and go
back to the original state, but they may not know how to navigate to bookmarks. Let's go add a button
onto our dashboard. I'll go to buttons. I'll go ahead and add a blank button. I'll go ahead and move it
over here to the right. Okay, I want to change it to a pill shape. So it'll look more like a button. I'll go to
my style settings here and I'll turn on the text. I need the text to say Clear Filters, and I don't need the icon.
Let's go back to that text and make it centered. Let's go ahead and make it black. We'll go to our style
here and let's turn the fill of the button on and let's make that sort of a darker gray color. Perfect. So I
have my clear filters button created. Now I need to apply my action. Because I chose a bookmark, I'll tell it to go to the clear bookmark. Okay, let's go ahead and close our bookmark pane or format pane. What I'll do now is go ahead and select a few of my salespeople. I'm holding my Control key. I'll
go ahead and say, let's see, for 2022. And then what I need to do is clear my filters. I can just Control-click the button, and I go back to the original state, and notice all my filters are cleared.

Modifying dashboard tooltips

- [Instructor] Let's hover over some of the information in our stacked bar. One thing I want you to notice
is that we have some pre-built tooltips, which is great. They give us a lot of information, but it may not be all
the information we'd like to have. So let's go ahead and choose that stacked bar. Let's go back to our
visualizations and take a look at tooltips. By default, it'll bring in the tooltips based on what information
has been supplied to the visual. This is why we see last name, company name, and total after discount.
Okay, let's just go ahead and rename that total after discount to total amount. And let's change this last name here to salesperson. Whether you use last name or salesperson is all preference; I think we'll just do salesperson. Let me bring quantity into the tooltips. Now I do want to see a total quantity. Okay, so I'll go ahead and make
sure that's set to sum, which is perfect. And then I want to count how many orders they actually placed.
And I want to do a distinct count of the order ID. And I'll name this total order count. Now when I hover
over, I can see the salesperson, the company name, the total amount, the quantity of what was ordered
and the total order count. Okay, I really don't need that quantity because, again, that's related to each individual line item, so I'll go ahead and take that out. I'll go ahead and put this total after discount in again, and I want to change that to an average. And then I'll name it average of order amounts. Perfect. This
gives me a lot of information just by simply changing a few things in the tool tips and naming things
appropriately. One last thing as you finalize your dashboard: sometimes people want a little bit more in the way of backgrounds and formats. I want to change it up so it looks like more than just solid white.
When you're in Power BI, you can actually go in and change a lot. For example, go to view. Let's go
change this dashboard to a dark background. There are several different options here for you to choose
from. You can just point and click until you find the one you like. You can also create your own custom
themes. Okay, let's go with that black background, that dark background. One final step is the mobile layout. I'm in the page view; I'll go to mobile layout. This is how this Power BI dashboard page will look when people visit it on a mobile device. I'm going to go ahead and bring my card to the top. And again, it's a
responsive design. So even if these look big, they'll work themselves out. I'll go ahead and move my
stacked bar here. And then I'll bring in my sales orders. That way, if someone consumes this dashboard,
this is how it'll look in that mobile environment. Okay, the very last thing we want to do is bring in the filters. We want those to be at the top. And again, I'll just keep sizing them. I'll go ahead and put this one here. I'll do two
slicers per section. So now I have the mobile layout covered as well as the page view. Okay, let me go
ahead and go out of the mobile layout. Check and make sure everything is labeled. Also check and make
sure things are functional, so I'll go ahead and click on Sergio, Jeffers. Perfect. And then I'll hold my Control key and choose the Clear Filters button to clear my filters. Now I'm ready to save and publish my
dashboard. There are certainly more items that can be adjusted and tweaked with these dashboards,
but at a minimum, this is a great start.
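
If it helps to see what those tooltip aggregations are actually doing with the data, here is a minimal sketch in pandas. The table and its values are invented for illustration, and the column names are only assumed to mirror the fields used in the walkthrough (last name, company name, quantity, order ID, and TotalAfterDiscount).

```python
# A minimal pandas sketch of the tooltip measures described above: a sum,
# a distinct count of the order ID, and an average. The DataFrame and its
# values are made up; the column names simply mirror the fields mentioned
# in the walkthrough.
import pandas as pd

orders = pd.DataFrame({
    "LastName": ["Jeffers", "Jeffers", "Smith"],
    "CompanyName": ["Company A", "Company A", "Company B"],
    "OrderID": [101, 101, 202],
    "Quantity": [5, 3, 7],
    "TotalAfterDiscount": [250.0, 120.0, 480.0],
})

tooltip_summary = orders.groupby(["LastName", "CompanyName"]).agg(
    TotalAmount=("TotalAfterDiscount", "sum"),             # total amount
    TotalQuantity=("Quantity", "sum"),                     # total quantity
    TotalOrderCount=("OrderID", "nunique"),                # distinct count of order ID
    AverageOfOrderAmounts=("TotalAfterDiscount", "mean"),  # average instead of sum
)
print(tooltip_summary)
```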

Data workers

- If you use spreadsheets every day and you create valuable insights for people through various
presentations or reporting, you are a data worker. But you're not likely called that by your job title. You
may have a job title that represents a department or the people you support, but you're not titled data
worker. You just are one. I would also consider you a data worker if you find yourself exporting data out
of systems, building some form of report or presentation weekly or monthly. You may also receive data
from someone in another department, like IT, who has access to more data than you. You may
frequently visit the company's data warehouse, or data system, to gain information for your reporting
purposes. Data workers also work with functions and do some aggregate functions with the data. You may use some logical functions like an IF. If you're able to search for functions and find the ones that are relevant to your data work, you are likely a data worker. I believe that there are far more data workers than our organizations realize, and if you're in this role, guess what? You're a great resource, and one of the first places an organization can turn to upskill in data. If you're looking for areas of growth,
then make sure you're using tools in Excel like Power Query and other analysis techniques like
PivotTables and basic visualizations. If you have more than average skill with these, you might be more
than a data worker already. You can also build skills in PowerPoint, because this is another way we
visualize data for meetings and presentations. Documentation is a critical competency for any data role,
so being a wizard at Microsoft Word doesn't hurt. Remember, like every other tool, it's powerful, and often, because we use it every day, we don't believe we need to explore training. Trust me, you should. For the soft skills, you'll want to focus on effective presentations and communication skills. Having these skills makes you more than suitable for roles that require advanced skills in Excel and doing basic analysis.

Data analysts

- I have spent years trying to define data analysts to people. I've come up with several ways to try to define this role and the skills. It's important to know that most do not have a job title that contains the words data analyst, but if you have a data department, then they are likely to be called data analysts. Not all organizations have a dedicated data department, so you might be called an operations analyst or a marketing analyst. Your title likely just has analyst in it. There are also varying levels of data analyst,
and you can be a data analyst, and not know it. Or be performing the skills of an analyst, and have no
idea that you are. A data analyst will have a deeper understanding of data systems and have more
knowledge about database designs than a data worker. A data analyst will find they have a little more
access to see tables, and views of the databases. They probably have some basic SQL querying skills and
may write SQL statements to gain access to data all the time. This varies by organization, and access
levels. A data analyst will have a better than average understanding of the data governance plan
because if you're a data analyst, you are going to be working under the policies, and procedures that are
established. Data analysts that are a few years in are likely to understand more about what questions to
ask, and research in general. Data analysts understand how to clean data, and transform it to meet the
requirements of the project. Data analysts also know how to create functions of varying types, like conditional and logical statements. Data analysts work with statistics, most certainly basic stats and aggregate functions at the beginning of their career, and they have certainly learned how to connect data in a way that they can just refresh their data and update their visuals and reports. If you're
looking for areas of growth, then you can go a little bit deeper into statistics. It's a must. Note that I said
a little deeper, not a full statistician, which is another role entirely. You'll find the data sets you are
developing might be used for different statistical tests. So it is important to have a basic knowledge. You
can never have enough experience writing functions, and you definitely want to be able to write if
functions, aggregate functions and simple lookups. You must understand joins, and how they impact
data sets. And for the soft skills, active listening, data storytelling, and critical thinking. If you're realizing
that you're a data analyst, then you might relate to being called a wizard at work.
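
As a small illustration of the join, conditional, and aggregate skills described here, the following pandas sketch uses made-up tables and column names purely for illustration; it is not tied to any particular data set.

```python
# A small sketch of analyst-style joins, conditionals, and aggregates.
# The tables and column names below are made up purely for illustration.
import pandas as pd

orders = pd.DataFrame({
    "OrderID": [1, 2, 3],
    "CustomerID": [10, 10, 20],
    "Amount": [120.0, 80.0, 300.0],
})
customers = pd.DataFrame({
    "CustomerID": [10, 20, 30],
    "Region": ["East", "West", "South"],
})

# An inner join keeps only customers that have orders; a left join on
# customers would also keep CustomerID 30 with no orders. Knowing which
# join you need is exactly how joins impact the resulting data set.
joined = orders.merge(customers, on="CustomerID", how="inner")

# An if-style conditional column, then simple aggregate functions per region.
joined["LargeOrder"] = joined["Amount"] > 100
summary = joined.groupby("Region")["Amount"].agg(["sum", "mean", "count"])
print(summary)
```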

Data engineers

- It is one thing to refine and add to a data set. It's an entirely different skill to be able to build data sets. I personally believe that what most people consider a data analyst in their organization may be performing data engineering tasks more than analysis tasks. The crossover between analyst and engineering skills is real. They share a lot of common foundational skills.
A data engineer is someone who fully understands how to look at the data sets and knows how to refine them into smaller, more sensible sets for people to use. You may receive data from
someone who is engineering that data from a set of queries, and then providing it to you or
others. A data engineer also is likely to have more access to data, which is why they're sending it
to you in the first place. They also understand security and privacy of data through the overall
data governance strategy. Data engineers can transition to data architect, a role which covers more systems, more servers, and more security strategies for systems across the whole organization. If
you want to grow further in this role, you will certainly need to understand more about structured
and unstructured data and how to convert it to usable data sets. You'll want to understand the
design methodologies of relational database systems and you will need to understand how to
design databases. You'll also want the shared skills of communication, effective presentations,
critical thinking, and active listening. These skills will be used as you learn how to take hundreds of tables and refine them into usable tables for other processes using ETL or ELT, which is extract, transform, and load, or extract, load, and transform. This is how data goes from a production
system to a data warehouse, as an example. I believe there is a lot of opportunity for data
analysts to pursue this role as they grow deeper in their understanding of data and infrastructure.
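
To make that ETL pattern a little more concrete, here is a toy sketch in Python. The CSV file, its column names, and the SQLite file standing in for a data warehouse are all hypothetical; a real pipeline would use your own sources and destinations.

```python
# A toy ETL (extract, transform, load) sketch. The file names, column names,
# and the SQLite database standing in for a warehouse are all hypothetical.
import sqlite3
import pandas as pd

# Extract: read raw rows exported from a production system.
raw = pd.read_csv("production_orders.csv")

# Transform: drop incomplete rows and reduce the detail to a smaller,
# more sensible table of daily totals per customer.
daily_sales = (
    raw.dropna(subset=["order_id"])
       .assign(order_date=lambda df: pd.to_datetime(df["order_date"]).dt.strftime("%Y-%m-%d"))
       .groupby(["order_date", "customer_id"], as_index=False)["amount"]
       .sum()
)

# Load: write the refined table into the warehouse for analysts to use.
with sqlite3.connect("warehouse.db") as conn:
    daily_sales.to_sql("daily_sales", conn, if_exists="replace", index=False)
```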

Data scientists

- People often pursue data with the hopes of becoming a data scientist. And I believe it's
important to know that not all data professionals grow into data scientists, nor do we need all
analysts or engineers to turn into data scientists. Data scientists will likely have all the skills of the analyst and engineer, and they will have likely worked in those roles. However, a data scientist
will have a heavier requirement for skills in coding, mathematics, and statistics. A data scientist
will be instrumental in developing tools and instruments that provide valuable insight to the
organization, but they can't do it alone without all the other roles. Or, well, maybe they can perform the tasks, but when you don't have all the other roles, the data scientist must perform them. Data scientists, or data science teams composed of all the disciplines, will interpret large data sets. They'll likely build machine learning models. They'll present outcomes and make
suggestions as a portion of what they do. They'll likely be leaders in the data science team.
They'll provide support and strategy to the overall data governance plan. If you want to further
your skills in this area, you should consider gaining a better understanding of programmatic
thinking. You'll want to dive deeper into learning code and maybe start with something like
Python. Whether or not you have some stats experience, you will definitely want to grow in this area.
Remember, one of the key differences between data scientists and all other roles is heavier math,
coding, and stats. It's also important to remember that for most organizations, having a data
scientist and not having all the other roles means that that data scientist is having to perform all
those roles before they get to the data science. This is where having a team of multi-discipline
people serving all the roles might just be your next play.
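
If you want a first taste of the Python side of this, here is a tiny, self-contained sketch that fits a simple model on made-up numbers. The scenario and the data are invented for illustration only; real data science work involves far more than this.

```python
# A tiny taste of Python plus a machine learning model, fit on made-up numbers.
# The scenario (ad spend vs. sales) is invented purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: monthly ad spend (X) and the sales that followed (y).
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

model = LinearRegression().fit(X, y)
print("slope:", model.coef_[0])
print("intercept:", model.intercept_)
print("predicted sales at spend = 6:", model.predict(np.array([[6.0]]))[0])
```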

Next steps and additional resources

- I hope that by now you've gained a deeper understanding of what data can mean for your
career. If you want to go deeper into some of these hard skills we've discussed, I would encourage you to watch my learning data analytics courses, as well as courses from other fellow data authors on visualization, statistics, data science, and many other fascinating data categories. I hope you'll
follow me on LinkedIn and seek out other thought leaders, as we're all on this journey together.
Look for people who are doing what you want to do and learn from them. Invest in yourself.
Invest in your skills. And never stop learning.
