Data Visualization: Pixel-Oriented Visualization Techniques, Geometric Projection Visualization Techniques,
Icon-Based Visualization Techniques, Hierarchical Visualization Techniques, Visualizing Complex Data and
Relations.
What is Data Visualization
Data visualization is the graphical representation of information.
Understanding Data Visualization
Data visualization translates complex data sets into visual formats that are easier for the
human brain to understand. This can include a variety of visual tools such as:
• Charts: Bar charts, line charts, pie charts, etc.
• Graphs: Scatter plots, histograms, etc.
• Maps: Geographic maps, heat maps, etc.
• Dashboards: Interactive platforms that combine multiple visualizations.
The primary goal of data visualization is to make data more accessible and easier
to interpret, allowing users to identify patterns, trends, and outliers quickly. This is
particularly important in big data, where the sheer volume of information can be
confusing without effective visualization techniques.
Why is Data Visualization Important?
Let’s take an example. Suppose you compile data on a company’s profits from 2013 to
2023 and create a line chart. It would be very easy to see the line rising steadily with
a single drop in 2018. So you can observe in a second that the company’s profits grew
in every year except 2018.
It would not be that easy to get this information so fast from a data table. This is just one
demonstration of the usefulness of data visualization. Let’s see some more reasons why
visualization of data is so important.
Importance of Data Visualization
1. Data Visualization Simplifies Complex Data
Large and complex data sets can be challenging to understand. Data visualization helps
break down complex information into simpler visual formats, making it easier for the
audience to grasp. For example, when sales data is visualized as a heat map in Tableau,
states that have suffered a net loss can be colored red. This visual makes it
instantly obvious which states are underperforming.
2. Enhances Data Interpretation
Visualization highlights patterns, trends, and correlations in data that might be missed in
raw data form. This enhanced interpretation helps in making informed decisions. Consider
another Tableau visualization that shows the relationship between sales and profit. It
might reveal that higher sales do not necessarily equate to higher profits, a trend that
could be difficult to find from raw data alone. This perspective helps businesses adjust
strategies to focus on profitability rather than just sales volume.
3. Data Visualization Saves Time
It is usually faster to gather insights from a visualization than from
a table of raw numbers. In a Tableau heat map, for example, it is very easy to identify the
states that have suffered a net loss rather than a profit, because all the cells with a
loss are coloured red. Compare
this to a plain table, where you would need to check each cell for a negative
value to find the losses. Visualizing data can save a lot of time in this situation.
4. Improves Communication
Visual representations of data make it easier to share findings with others, especially those
who may not have a technical background. This is important in business, where
stakeholders need to understand data-driven insights quickly. Consider a treemap
in Tableau showing the number of sales in each region of the United States,
with the largest rectangle representing California due to its high sales volume. This visual
context is much easier to grasp than a detailed table of numbers.
5. Data Visualization Tells a Data Story
Data visualization is also a medium to tell a data story to the viewers. The visualization can
be used to present the data facts in an easy-to-understand form while telling a story and
leading the viewers to an inevitable conclusion. This data story should have a good
beginning, a basic plot, and an ending that it is leading towards. For example, if a data
analyst has to craft a data visualization for company executives detailing the profits of
various products, the data story can start with the profits and losses of those
products and move on to recommendations on how to tackle the losses.
Best Practices for Visualizing Data
Effective data visualization is crucial for conveying insights accurately. Follow these best
practices to create compelling and understandable visualizations:
1. Audience-Centric Approach: Tailor visualizations to your audience’s
knowledge level, ensuring clarity and relevance. Consider their familiarity with
data interpretation and adjust the complexity of visual elements accordingly.
2. Design Clarity and Consistency: Choose appropriate chart types, simplify visual
elements, and maintain a consistent color scheme and legible fonts. This ensures a
clear, cohesive, and easily interpretable visualization.
3. Contextual Communication: Provide context through clear labels, titles,
annotations, and acknowledgments of data sources. This helps viewers understand
the significance of the information presented and builds transparency and
credibility.
4. Engaging and Accessible Design: Design interactive features thoughtfully,
ensuring they enhance comprehension. Additionally, prioritize accessibility by
testing visualizations for responsiveness and accommodating various audience
needs, fostering an inclusive and engaging experience.
Data Visualization Techniques:
1. Pixel-oriented visualization techniques
2. Geometric projection visualization techniques
a. Scatter plot matrices
b. HyperSlice
c. Parallel coordinates
3. Icon-based visualization techniques
4. Hierarchical visualization techniques
1. Pixel-Oriented Visualization Techniques:
➢ A simple way to visualize the value of a dimension is to use a pixel where the
color of the pixel reflects the dimension’s value.
➢ For a data set of m dimensions, pixel-oriented techniques create m windows on
the screen, one for each dimension.
➢ The m dimension values of a record are mapped to m pixels at the
corresponding positions in the windows. The colors of the pixels reflect the
corresponding values.
➢ Inside a window, the data values are arranged in some global order shared by all
windows.
➢ The global order may be obtained by sorting all data records in a way that’s
meaningful for the task at hand.
Pixel-based visualizations use that approach and are capable of displaying large
amounts of data on a single screen.
Case Study:
➢ All Electronics maintains a customer information table, which consists of 4
dimensions: income, credit_limit, transaction_volume and age.
➢ We analyse the correlation between income and the other attributes by visualization.
➢ We sort all customers by income in ascending order and use this order to lay out
the customer data in the 4 visualization windows, as shown in the figure.
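The case study can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer values are made up, credit_limit is assumed to be the fourth dimension, and each pixel is represented by a gray level from 0 to 255 instead of being drawn on screen. The function name pixel_windows is hypothetical.

```python
# Hypothetical customer records; the values are invented for illustration.
customers = [
    {"income": 52000, "credit_limit": 9000, "transaction_volume": 120, "age": 41},
    {"income": 18000, "credit_limit": 2000, "transaction_volume": 300, "age": 23},
    {"income": 75000, "credit_limit": 15000, "transaction_volume": 80, "age": 55},
    {"income": 31000, "credit_limit": 4000, "transaction_volume": 210, "age": 35},
]
DIMENSIONS = ["income", "credit_limit", "transaction_volume", "age"]


def pixel_windows(records, dimensions, order_by):
    """Return one list of gray levels (0-255) per dimension, all in one global order."""
    # The global order is shared by every window: here, ascending order_by value.
    ordered = sorted(records, key=lambda r: r[order_by])
    windows = {}
    for dim in dimensions:
        values = [r[dim] for r in ordered]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1  # avoid division by zero for a constant dimension
        # Smaller values become darker pixels, larger values brighter pixels.
        windows[dim] = [round(255 * (v - lo) / span) for v in values]
    return windows


windows = pixel_windows(customers, DIMENSIONS, order_by="income")
for dim, pixels in windows.items():
    print(dim, pixels)
```

Because every window uses the same order, position k in each window belongs to the same customer. In this made-up data the transaction_volume window gets darker as the income window gets brighter, which is the kind of relationship the technique is meant to expose.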
2. Geometric Projection visualization techniques:
A drawback of pixel-oriented visualization techniques is that they cannot help us
much in understanding the distribution of data in a multidimensional space.
Geometric projection techniques help users find interesting projections of
multidimensional data sets.
Geometric projection techniques are a good choice for finding outliers and correlations
between attributes in multivariate data. They do this by
applying transformations and projections to the data. For large data sets, it is
usually necessary to apply a clustering algorithm before the visualization technique to
avoid a cluttered, unclear display caused by too much information. Some widely
used geometric projection techniques are:
Line Plot:
■ This is the plot you will find in almost every kind of
analysis between 2 variables.
■ A line plot simply connects the values of a series of data points
with straight lines.
■ The plot may seem very simple, but it has many applications, not only in
machine learning but in many other areas.
■ For example, it is used to analyze the performance of a model through its ROC curve and the area under it (AUC).
Bar Plot
■ This is one of the most widely used plots. We see it
not just in data analysis but wherever categories or trends
are compared, in many fields.
■ It presents the data in a clean plot and conveys the details
straightforwardly to others.
■ This plot is simple and clear, but it is used less frequently in
data science applications.
Stacked Bar Graph:
■ Unlike a Multi-set Bar Graph, which displays its bars side by side,
a Stacked Bar Graph segments its bars. Stacked Bar Graphs are used to
show how a larger category is divided into smaller categories and how
each part relates to the total amount. There are two
types of Stacked Bar Graphs:
■ Simple Stacked Bar Graphs place each value for the segment after the
previous one. The total value of the bar is all the segment values added
together. They are ideal for comparing the total amounts across each
group/segmented bar.
■ 100% Stacked Bar Graphs show the percentage-of-the-whole of each group
and are plotted by the percentage of each value to the total amount in
each group. This makes it easier to see the relative differences between
quantities in each group.
■ One major flaw of Stacked Bar Graphs is that they become harder to
read the more segments each bar has. Comparing segments with
each other is also difficult, as they are not aligned on a common baseline.
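The difference between the two types comes down to how the segment boundaries are computed. The sketch below shows that arithmetic only (no drawing); the function names and the sample values are hypothetical.

```python
def stack_segments(values):
    """Simple stacked bar: each segment starts where the previous one ended."""
    segments, start = [], 0
    for value in values:
        segments.append((start, start + value))
        start += value
    return segments


def percent_stack_segments(values):
    """100% stacked bar: rescale the values to percentages of the bar total first."""
    total = sum(values)
    return stack_segments([100 * value / total for value in values])


# Hypothetical sales of three products in two regions.
north = [30, 50, 20]
south = [3, 5, 2]

print(stack_segments(north))          # bar height is the regional total (100)
print(stack_segments(south))          # bar height is the regional total (10)
print(percent_stack_segments(south))  # same shape as north once rescaled to 100%
```

Only the first segment of each bar starts at the common baseline of 0, which is why segments higher up in the bar are harder to compare across bars.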
Scatter plots:
A scatter plot is one of the most common visualization techniques and can be
drawn in both 2D and 3D. It maps different attributes of the
data to the x- and y-axes for 2D visualizations and also to the z-axis in 3D. Scatter plots
are useful for finding correlations between attributes in reasonably small data sets. If the
data set gets too big or contains too many attributes, the scatter plot gets cluttered and
hard to interpret.
Box and Whisker Plot
■ This plot can be used to obtain more statistical details about the data.
■ The box spans the 25th to 75th percentiles (the first and third quartiles), with a line at the 50th percentile (the median).
■ The straight lines extending from the box are called whiskers. They commonly reach to the smallest and largest values that lie within 1.5 × IQR of the box.
■ Points that lie outside the whiskers are considered outliers.
■ With the help of a box plot, we can also determine the interquartile range (IQR), the distance between the first and third quartiles, which contains the middle 50% of the data.
■ Box plots come under univariate analysis, which means that we are exploring the data one variable at a time.
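The numbers behind a box plot can be computed with Python's standard statistics module. This is a minimal sketch that assumes the common 1.5 × IQR rule for whiskers and outliers (other conventions exist); the function name box_plot_summary and the sample data are hypothetical.

```python
import statistics


def box_plot_summary(data):
    """Compute the quartiles, IQR, whisker ends and outliers for one variable."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers end at the most extreme values that are still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }


summary = box_plot_summary([2, 4, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

For this sample the box runs from 4 to 8 with the median at 6, the whiskers end at 2 and 9, and the value 30 is reported as an outlier.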
Pie Chart
A pie chart shows a static number and how categories represent parts of a
whole, that is, the composition of something. A pie chart represents numbers as
percentages, and the total sum of all segments needs to equal 100%.
• Extensively used in presentations and offices, Pie Charts help show
proportions and percentages between categories by dividing a circle into
proportional segments. Each arc length represents the proportion of one
category, while the full circle represents the total sum of all the data, equal to
100%.
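How the circle is divided can be shown with a short calculation. The sketch below converts category counts into percentages and arc angles; the function name pie_slices and the sample counts are hypothetical.

```python
def pie_slices(counts):
    """Turn category counts into percentages and start/end angles in degrees."""
    total = sum(counts.values())
    slices, start = [], 0.0
    for label, value in counts.items():
        share = value / total
        sweep = 360 * share  # the arc is proportional to the category's share
        slices.append({
            "label": label,
            "percent": 100 * share,
            "start_angle": start,
            "end_angle": start + sweep,
        })
        start += sweep
    return slices


slices = pie_slices({"Phones": 50, "Laptops": 30, "Tablets": 20})
for s in slices:
    print(s)
```

The percentages always add up to 100 and the last slice ends at 360 degrees, whatever the raw counts are.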
Donut Chart:
• A donut chart is essentially a Pie Chart with an area of the centre cut out. Pie
Charts are sometimes criticised for focusing readers on the proportional areas
of the slices relative to one another and to the chart as a whole. This makes it tricky to
see the differences between slices, especially when you try to compare multiple
Pie Charts.
• A Donut Chart somewhat remedies this problem by de-emphasizing the use of
the area. Instead, readers focus more on reading the length of the arcs, rather
than comparing the proportions between slices.
• Also, Donut Charts are more space-efficient than Pie Charts because the
blank space inside a Donut Chart can be used to display information inside
it.
Marimekko Chart:
Also known as a Mosaic Plot.
• Marimekko Charts are used to visualise categorical data over a pair
of variables. In a Marimekko Chart, both axes are variable with a
percentage scale that determines both the width and height of each
segment. So Marimekko Charts work as a kind of two-way 100%
Stacked Bar Graph. This makes it possible to detect relationships
between categories and their subcategories via the two axes.
• The main flaws of Marimekko Charts are that they can be hard to
read, especially when there are many segments. Also, it's hard to
accurately make comparisons between each segment, as they are
not all arranged next to each other along a common baseline.
Therefore, Marimekko Charts are better suited for giving a more
general overview of the data.
HyperSlice:
HyperSlice is a method for the visualization of scalar functions of many
variables. With this method, the multi-dimensional function is presented in a
simple and easy-to-understand way in which all dimensions are treated
identically. The central concept is the representation of a multi-dimensional
function as a matrix of orthogonal two-dimensional slices. These two-dimensional
slices lend themselves very well to interaction via direct
manipulation, due to a one-to-one relation between screen space and
variable space.
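The idea of a matrix of two-dimensional slices can be sketched as follows. This is a simplified illustration, not the original HyperSlice implementation: it assumes a focal point around which the slices are taken, samples each slice on a small grid, and computes only one slice per unordered pair of variables. The names hyperslice and sphere are hypothetical.

```python
from itertools import combinations


def hyperslice(f, focus, half_width=1.0, steps=5):
    """Return {(i, j): grid} where x_i and x_j vary around focus and the rest stay fixed."""
    offsets = [-half_width + 2 * half_width * k / (steps - 1) for k in range(steps)]
    slices = {}
    for i, j in combinations(range(len(focus)), 2):
        grid = []
        for dj in offsets:          # rows follow variable j
            row = []
            for di in offsets:      # columns follow variable i
                point = list(focus)
                point[i] += di
                point[j] += dj
                row.append(f(point))
            grid.append(row)
        slices[(i, j)] = grid
    return slices


def sphere(x):
    """Hypothetical scalar function of many variables: the sum of squares."""
    return sum(v * v for v in x)


slices = hyperslice(sphere, focus=[0.0, 0.0, 0.0, 0.0])
print(sorted(slices))       # the variable pairs that form the matrix of slices
print(slices[(0, 1)][2])    # middle row of the (x0, x1) slice
```

Each grid could then be shown as a small image in the cell (i, j) of the matrix. Moving the focal point and recomputing the grids corresponds to the direct manipulation described above.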
Parallel coordinates:
➢ To visualize n-dimensional data points, the parallel coordinates
technique draws n equally spaced axes, one for each dimension,
parallel to one of the display axes.
➢ A data record is represented by a polygonal line that intersects each
axis at the point corresponding to the associated dimension value.
➢ A major limitation of the parallel coordinates technique is that it
cannot effectively show a data set of many records: the polylines overlap and the display becomes cluttered.
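The construction of the polygonal lines can be sketched without any plotting library. The example below assumes vertical axes placed at x = 0, 1, 2, ... and rescales every dimension to the range 0 to 1 so that all axes have the same height; the function name and the records are hypothetical.

```python
def parallel_coordinates(records, dimensions):
    """Return one polyline per record as a list of (axis_x, y) vertices, y in [0, 1]."""
    ranges = {}
    for dim in dimensions:
        values = [r[dim] for r in records]
        ranges[dim] = (min(values), max(values))

    polylines = []
    for record in records:
        line = []
        for axis_x, dim in enumerate(dimensions):  # equally spaced parallel axes
            lo, hi = ranges[dim]
            # A constant dimension is drawn at mid-height to avoid dividing by zero.
            y = (record[dim] - lo) / (hi - lo) if hi > lo else 0.5
            line.append((axis_x, y))
        polylines.append(line)
    return polylines


records = [
    {"a": 1, "b": 10, "c": 5},
    {"a": 3, "b": 20, "c": 5},
    {"a": 2, "b": 30, "c": 5},
]
polylines = parallel_coordinates(records, ["a", "b", "c"])
for line in polylines:
    print(line)
```

Each record yields exactly one polyline with one vertex per axis, so the number of lines on screen grows with the number of records.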
3. Icon-Based Visualization Techniques:
Icon-based techniques visualize data by changing the properties of an icon
or glyph according to the data. An early version was Chernoff faces, where
data is mapped to different parts of a face such as the nose, mouth and eyes. For
example, wealth can be mapped to the mouth of the Chernoff
face, with rich people represented by a happy mouth and poor people by a
sad mouth. Other methods are:
Stick figures:
The stick figure technique maps multidimensional data to five-piece stick figures, where each figure
has 4 limbs and a body.
Two dimensions are mapped to the display (x and y) axes and the
remaining dimensions are mapped to the angle and/or length of the limbs.
4. Hierarchical Visualization Techniques:
Hierarchical visualization techniques are designed for data that is organized as a tree, that is, hierarchical
information. There are two basic branches of visualization
techniques for hierarchies. The first is based on a node-edge graph-layout
approach, which focuses attention on the structure and relationships, and
the second on space-filling approaches, which focus attention on the relative
sizes of nodes in the hierarchy.
➢ The visualization techniques discussed so far focus on visualizing
multiple dimensions simultaneously.
➢ However, for a large data set of high dimensionality, it would be difficult
to visualize all dimensions at the same time.
➢ Hierarchical visualization techniques partition all dimensions into
subsets (i.e., subspaces). The subspaces are visualized in a hierarchical
manner.
➢ “Worlds-within-Worlds,” also known as n-Vision, is a representative
hierarchical visualization method.
➢ Suppose we want to visualize a 6-D data set, where the dimensions are
F, X1, ..., X5.
Given more dimensions, more levels of worlds can be used, which is why
the method is called “worlds-within-worlds.”
Fig: Worlds-within-Worlds
As another example of hierarchical visualization methods, tree-maps display
hierarchical data as a set of nested rectangles. Consider, for example, a tree-map
visualizing Google news stories.
Fig: Tree-map
All news stories are organized into seven categories, each shown in a large
rectangle of a unique color. Within each category (i.e., each rectangle at the
top level), the news stories are further partitioned into smaller
subcategories.
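The nesting of rectangles can be computed with a simple "slice-and-dice" layout, which splits each rectangle among its children in proportion to their sizes and alternates the split direction at each level. This is a sketch of one basic layout, not necessarily the one used for the news tree-map in the figure; the tree, the sizes and the function names are hypothetical.

```python
def subtree_size(node):
    """Size of a node: its own size for a leaf, otherwise the sum of its children."""
    children = node.get("children")
    if children:
        return sum(subtree_size(child) for child in children)
    return node["size"]


def slice_and_dice(node, x, y, width, height, depth=0, out=None):
    """Assign a rectangle (x, y, width, height) to every node of the tree."""
    if out is None:
        out = {}
    out[node["name"]] = (x, y, width, height)
    children = node.get("children", [])
    total = sum(subtree_size(child) for child in children)
    offset = 0.0
    for child in children:
        fraction = subtree_size(child) / total
        if depth % 2 == 0:   # even depth: place children side by side
            slice_and_dice(child, x + offset * width, y,
                           fraction * width, height, depth + 1, out)
        else:                # odd depth: stack children on top of each other
            slice_and_dice(child, x, y + offset * height,
                           width, fraction * height, depth + 1, out)
        offset += fraction
    return out


news = {"name": "news", "children": [
    {"name": "world", "children": [
        {"name": "asia", "size": 3},
        {"name": "europe", "size": 1},
    ]},
    {"name": "sports", "size": 4},
]}
rects = slice_and_dice(news, 0, 0, 100, 50)
for name, rect in rects.items():
    print(name, rect)
```

The area of every rectangle is proportional to the size of its node, and each child rectangle lies inside its parent, which is what makes the hierarchy readable at a glance.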
Visualizing Complex Data and Relations.
• For a large data set of high dimensionality, it would be
difficult to visualize all dimensions at the same time.
• Hierarchical visualization techniques partition all
dimensions into subsets (i.e., subspaces).
• The subspaces are visualized in a hierarchical manner
• "Worlds-within-Worlds," also known as n-Vision, is a
representative hierarchical visualization method.
• Suppose we want to visualize a 6-D data set, where the dimensions are
F, X1, X2, X3, X4, X5.
• We want to observe how F changes w.r.t. the other
dimensions. We can fix the X3, X4, X5 dimensions to selected
values and visualize the changes to F w.r.t. X1 and X2.
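Fixing the outer dimensions and tabulating F over the inner ones can be sketched as below. The function F here is a made-up example and inner_world is a hypothetical helper name; a real tool would draw each table as a 3-D surface placed at the chosen (X3, X4, X5) position of the outer world.

```python
def inner_world(f, fixed_outer, x1_values, x2_values):
    """Fix (X3, X4, X5) at fixed_outer and tabulate F over the (X1, X2) grid."""
    c3, c4, c5 = fixed_outer
    return [[f(x1, x2, c3, c4, c5) for x1 in x1_values] for x2 in x2_values]


def F(x1, x2, x3, x4, x5):
    """Hypothetical 5-variable function standing in for the F dimension."""
    return x1 + 2 * x2 + x3 * x4 - x5


# Two inner worlds at different positions of the outer (X3, X4, X5) world.
surface_a = inner_world(F, (1, 2, 3), x1_values=[0, 1, 2], x2_values=[0, 1])
surface_b = inner_world(F, (0, 0, 0), x1_values=[0, 1, 2], x2_values=[0, 1])
print(surface_a)
print(surface_b)
```

Comparing the two tables shows how the relationship between F, X1 and X2 shifts when the fixed outer values change.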
• Most of the visualization techniques discussed so far were designed mainly for numeric data.
• Recently, more and more non-numeric data, such as
text and social networks, have become available.
• Many people on the Web tag various objects such as
pictures, blog entries, and product reviews.
• A tag cloud is a visualization of statistics of user-generated tags.
• Often, in a tag cloud, tags are listed alphabetically or in a user-
preferred order.
• The importance of a tag is indicated by font size or color.
Word Cloud:
Also known as a Tag Cloud.
• A visualization method that displays how frequently words appear in
a given body of text, by making the size of each word proportional to
its frequency. All the words are then arranged in a cluster or cloud of
words. Alternatively, the words can also be arranged in any format:
horizontal lines, columns or within a shape.
• Word Clouds can also be used to display words that have meta-data
assigned to them. For example, in a Word Cloud of all the world's
country names, the population could be assigned to each name to
determine its size.
• Colour used on Word Clouds is usually meaningless and is primarily
aesthetic, but it can be used to categorise words or to display
another data variable.
• Typically, Word Clouds are used on websites or blogs to depict
keyword or tag usage. Word Clouds can also be used to compare
two different bodies of text.
• Although simple and easy to understand, Word Clouds have some
major flaws:
• Long words are emphasised over short words.
• Words whose letters contain many ascenders and descenders
may receive more attention.
• They're not great for analytical accuracy, so they are used more for aesthetic
reasons instead.
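The sizing rule of a word cloud (font size proportional to frequency) can be sketched with the standard library. The function name word_sizes, the maximum font size and the sample text are assumptions for illustration; the layout of the words on screen is not covered.

```python
import re
from collections import Counter


def word_sizes(text, max_size=40, top=10):
    """Map the most frequent words to font sizes proportional to their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top)
    if not counts:
        return {}
    highest = counts[0][1]  # the most frequent word gets max_size
    return {word: max_size * count / highest for word, count in counts}


sizes = word_sizes("data data data chart chart map")
print(sizes)
```

Here "data" appears three times and is drawn at the maximum size, while "chart" and "map" get two thirds and one third of that size. Note that the computed size ignores word length, so a long word still takes up more room than a short word with the same count.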
Data visualization choices:
Five factors that influence data visualization choices:
Audience: It’s important to adjust data representation to the specific target
audience.
Content: The type of data you are dealing with will determine the tactics.
Context: You can use different data visualization approaches and read data
depending on the context.
Dynamics: There are various types of data, and each type has a different
rate of change.
Purpose: The goal of data visualization affects the way it is implemented. In
order to make a complex analysis, visualizations are compiled into dynamic
and controllable dashboards that work as visual data analysis techniques
and tools.
Tools for Data visualization:
Different data visualization tools suit different types of users and purposes.
Tableau is one of the leaders in this field. With a user-friendly interface and a
rich library of interactive visualizations, Tableau stands out for its powerful
capabilities. The platform provides extensive integration options, including
MySQL, Teradata, Hadoop and Amazon Web Services. It helps users derive
meaning from data and use insights for effective storytelling.
R and Python are well-equipped for data visualization. Customizing
graphics is easier and more intuitive in R with the help of ggplot2 than in
Python with Matplotlib. The Seaborn library for Python helps to close this gap and
offers good standard solutions that get by with relatively few lines of code.
Plotly is one of the most popular platforms in this category. It’s more
complex than Tableau but comes with analytics perks. With this
visualization tool, you can create charts using R or Python, build custom
data analytics
IBM Watson Analytics is known for its NLP capabilities. The platform
supports conversational data control alongside strong dashboard
building and data reporting tools.
Tools for complex data visualization:
The growing adoption of connected technology creates many opportunities
for companies and organizations. To deal with large volumes of multi-source,
often unstructured data, businesses search for more complex
visualization and analytics solutions. This category includes Power BI,
Kibana and Grafana.
Power BI is exceptional for its highly intuitive drag-and-drop interface, short
learning curve and large integration capabilities, including Salesforce and
MailChimp.
Kibana is the part of the Elastic Stack that turns data into visual insights.
It’s built on and designed to work on Elasticsearch data only. This
exclusivity, however, does not prevent it from being one of the best data
visualization tools for log data.
Grafana is a professional data visualization and analytics tool that supports up
to 30 data sources, including AWS, Elasticsearch and Prometheus. Although Grafana
is more flexible in terms of integrations than Kibana, each of the
systems works best with its own type of data.
Data Visualization Process:
Data visualization is the practice of translating information into a visual
context, such as a map or graph, to make data easier for the human brain
to understand and pull insights from. The main goal of data
visualization is to make it easier to identify patterns, trends and outliers in
large data sets.
Fig: Data visualization Process