UNIT 5 Data Analytics
Maps: Represent spatial data, good for geographically bound data like population density or
resource locations.
Scatter Plots: Show relationships or correlations between two variables.
Box Plots: Display key statistics of a distribution, such as the median, quartiles, and outliers.
Area Charts: Similar to line charts but with the area under the line filled, these charts accentuate
cumulative data patterns.
Bubble Charts: Enhance scatter plots by introducing a third dimension through varying bubble sizes,
revealing additional insights.
Treemaps: Efficiently represent hierarchical data structures, breaking down categories into nested
rectangles.
Violin Plots: Violin plots combine aspects of box plots and kernel density plots, providing a detailed
representation of the distribution of data.
Word Clouds: Word clouds are visual representations of text data where words are sized based on
their frequency.
3D Surface Plots: 3D surface plots visualize three-dimensional data, illustrating how a response
variable changes in relation to two predictor variables.
Network Graphs: Network graphs represent relationships between entities using nodes and edges.
They are useful for visualizing connections in complex systems, such as social networks,
transportation networks, or organizational structures.
Sankey Diagrams: Sankey diagrams visualize flow and quantity relationships between multiple
entities. Often used in process engineering or energy flow analysis.
Tools and Software for Data Visualization: Common tools for creating data visualizations include Microsoft
Excel, Tableau, Power BI, Google Data Studio, and Python libraries like Matplotlib and Seaborn.
Some popular data visualization tools include:
Tableau: Known for its drag-and-drop interface and powerful analytics capabilities.
Microsoft Power BI: Offers rich analytics with integration into other Microsoft services.
Google Data Studio: A free tool that integrates well with Google products and other data sources.
Python Libraries (Matplotlib, Seaborn, Plotly): Widely used for custom visualizations and in data
science applications.
D3.js: A JavaScript library that creates dynamic and interactive visualizations for web applications.
Dashboard-oriented tools include Power BI, Kibana and Grafana:
Power BI: Stands out for its highly intuitive drag-and-drop interface, short learning curve and
broad integration capabilities, including Salesforce and MailChimp.
Kibana: The part of the Elastic Stack that turns data into visual insights. It’s built on, and
designed to work with, Elasticsearch data only. This exclusivity, however, does not prevent it
from being one of the best data visualization tools for log data.
Grafana: A professional data visualization and analytics tool that supports up to 30 data sources,
including AWS, Elasticsearch and Prometheus. Grafana is more flexible than Kibana in terms of
integrations, but each of the systems works best with its own type of data.
Guiding Principles
Clarity and Simplicity: Keep the visualization simple and avoid clutter. The goal is to communicate
data effectively (focus on clarity and relevance).
Choose the Right Chart Type: The chart type should match the nature of the data and the message
you want to convey.
Use Consistent Colors and Symbols: Colors should highlight important data points and not confuse
or overwhelm the viewer.
Provide Context/Label Clearly: Labels, titles, and legends are crucial for helping the audience
understand the visualization.
Tell a Story: Guide viewers through the data by arranging visuals logically and emphasizing key
Benefits of Data Visualization:
Simplifies Complex Data: Helps break down large volumes of data, making it easier to understand.
Reveals Patterns and Trends: Assists in identifying patterns that are not immediately apparent in
raw data.
Enhances Decision-Making: Facilitates quicker and more accurate decisions by providing clear data
insights.
Engages Audiences: Visually engaging data representations can hold attention better than raw
numbers or text.
Encourages Discovery: Enables users to explore and investigate data on their own.
Data visualization also has more specific strengths:
It can identify areas that need improvement or modification.
It can clarify which factors influence customer behavior.
It helps you understand which products to place where.
It can help predict sales volumes.
Uses of Data Visualization:
To make data easier to understand and remember.
To discover unknown facts, outliers, and trends.
To visualize relationships and patterns quickly.
To ask better questions and make better decisions.
To perform competitive analysis and improve insights.
Applications of Data Visualization
1. Business Intelligence and Reporting
2. Financial Analysis
3. Healthcare
4. Marketing and Sales
5. Human Resources
5.2 Pixel-Oriented Visualization Techniques: These techniques display very large datasets by using pixels as
the visual representation of data values. Each data value is shown as a tiny dot (a pixel) on the screen, and its
color represents the value.
If the data has many dimensions (or features), a separate section of the screen (called a window) is
created for each dimension. The data is arranged in the same order across all these windows, so you
can compare patterns easily.
This method makes it possible to show massive amounts of data at once while still being able to spot
trends, patterns, or unusual values (called anomalies). It uses every pixel on the screen efficiently to
give a clear and compact view of the data.
Eg: All Electronics maintains a customer information table, which consists of 4 dimensions: income,
credit limit, transaction volume and age. We analyze the correlation between income and the other
attributes by visualization.
We sort all customers by income in ascending order and use this order to lay out the customer data in
the 4 visualization windows.
The pixel colors are chosen so that the smaller the value, the lighter the shading.
Using pixel-based visualization, we can easily observe that credit limit increases as income increases,
that customers whose income is in the middle range are more likely to purchase more from All
Electronics, and that there is no clear correlation between income and age.
Fig: Pixel-oriented visualization of 4 attributes, with all customers sorted by income in ascending order.
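The layout described in this example can be sketched in a few lines of Python. This is only an illustrative sketch: the customer records are synthetic, and the helper names `to_window` and `pixel_windows` are hypothetical, not part of any library. Each attribute gets its own window, all windows share the income ordering, and smaller values map to lighter grey levels (255 is white).

```python
import random

def to_window(values, width):
    """Lay out values row-wise as grey levels (0-255); smaller value -> lighter pixel."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero for a constant attribute
    shades = [round(255 * (1 - (v - lo) / span)) for v in values]
    return [shades[i:i + width] for i in range(0, len(shades), width)]

def pixel_windows(records, sort_key, width):
    """Sort records by one attribute and build one pixel window per attribute."""
    ordered = sorted(records, key=lambda r: r[sort_key])
    return {attr: to_window([r[attr] for r in ordered], width) for attr in ordered[0]}

# Hypothetical customer table (synthetic values, for illustration only).
rng = random.Random(0)
customers = []
for _ in range(64):
    income = rng.uniform(20_000, 120_000)
    customers.append({
        "income": income,
        "credit_limit": 0.2 * income + rng.uniform(-2_000, 2_000),
        "transaction_volume": rng.uniform(0, 500),
        "age": rng.randint(18, 80),
    })

# Four 8x8 windows, all laid out in the same (income-ascending) order.
windows = pixel_windows(customers, sort_key="income", width=8)
```

Each window is a grid of grey levels that a plotting library could render as an image. Because the order is shared, a window that fades in the same direction as the income window suggests a correlation with income.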
Key Concepts:
Pixel Mapping: In pixel-oriented visualizations, each data value is represented as a single pixel. The color or
intensity of the pixel encodes the data value.
These visualizations are most effective for datasets with a large number of data points
because each pixel can hold one data value, allowing millions of values to be visualized on a
standard display.
High-Density Displays/ Dense Information Representation:
Pixel-oriented techniques are designed to handle very high data density, making them suitable for
large-scale datasets where traditional visualization methods, like bar or line charts, would become
cluttered.
By representing each data point as a pixel, these techniques can handle millions of data points at
once, maximizing screen space.
Arrangement of Pixels:
Data values are arranged in a specific layout on the screen. Common arrangements include:
Row-wise and column-wise arrangement: Pixels are placed in rows and columns to maintain spatial
relationships within the data.
Recursive pattern: The data is arranged in a pattern that recursively groups related data points to show
correlations or trends.
Space-filling curves: These curves are used to map multidimensional data into 2D space without breaking
the continuity of the data structure.
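As an illustration of a space-filling curve, the sketch below places an ordered sequence of values along a Hilbert curve, so that values that are neighbors in the sequence stay neighbors on the screen. The Hilbert curve is one common choice and is used here as an assumption; the notes do not name a specific curve. The function names are hypothetical.

```python
def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the current sub-square so the curve stays continuous.
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_layout(values, order):
    """Place the first 4**order values on a grid, following the Hilbert curve."""
    n = 1 << order
    grid = [[None] * n for _ in range(n)]
    for d, v in enumerate(values[: n * n]):
        x, y = hilbert_d2xy(order, d)
        grid[y][x] = v
    return grid

grid = hilbert_layout(list(range(16)), order=2)
```

With a row-wise arrangement, the last pixel of one row and the first pixel of the next are far apart; along the Hilbert curve, consecutive values always land in adjacent cells.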
Color Encoding: Color plays a critical role in pixel-oriented visualization techniques, as it conveys the value of
each data point. Color map selection is essential to ensure clarity and accurate interpretation of data.
Gradients or distinct colors can be used to represent data values, helping to distinguish between high, low,
or average values.
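A gradient color map can be sketched as a linear interpolation between two colors. The function name `gradient_color` and the two default RGB colors are assumptions chosen for illustration (light for low values, dark for high values).

```python
def gradient_color(value, lo, hi, low_rgb=(255, 255, 255), high_rgb=(0, 0, 128)):
    """Map a value in [lo, hi] to an RGB color on a two-color gradient."""
    if hi == lo:
        t = 0.0
    else:
        # Clamp to [0, 1] so out-of-range values get the end colors.
        t = min(1.0, max(0.0, (value - lo) / (hi - lo)))
    return tuple(round(a + t * (b - a)) for a, b in zip(low_rgb, high_rgb))

low_color = gradient_color(0, 0, 10)
high_color = gradient_color(10, 0, 10)
```

A two-color gradient like this suits ordered values; for categories, distinct colors are usually the better choice.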
Types of Pixel-Oriented Visualization Techniques: There are several common techniques within pixel-
oriented visualization, each suited to specific data structures or analytical goals:
1. Recursive Pattern Technique: This technique arranges pixels in a recursive pattern, allowing for efficient
use of space and creating visually recognizable patterns that help in detecting correlations or clusters.
Recursive patterns work well for data with hierarchical structures, where users need to understand different
levels of the data simultaneously.
2. Circle Segments Technique: In this approach, data values are arranged in circular segments or ring
patterns, with each ring representing a different subset or level of data.
Circle segments are useful for visualizing cyclic data patterns, such as seasonal trends or periodic activities,
providing an intuitive way to spot recurring patterns.
3. Axes-Based Techniques: Data is displayed in a structured grid, where each row or column represents a
specific dimension or attribute, and each cell (or pixel) represents a data value for that attribute.
Axes-based techniques are helpful for comparing multiple dimensions simultaneously, especially for
identifying correlations and trends across attributes.
4. Spiral and Temporal Techniques: Pixels are arranged in spiral or other time-based layouts, which are
particularly effective for time-series data, where patterns over time are a focal point.
Temporal techniques help users quickly spot anomalies or seasonal variations in time-series data.
5. Query-Dependent Techniques: Here, the pixel arrangement adapts based on user queries or the specific
aspects of the data that are most relevant to the current analysis.
Query-dependent techniques are interactive, allowing users to focus on specific data slices, filtering and
zooming in on areas of interest
6. Query-independent techniques: Visualize the entire dataset, showing all the data points regardless of
user queries. Examples include the pixel bar chart and spiral visualization.
Advantages of Pixel-Oriented Visualization Techniques:
High Data Density: They allow visualization of massive datasets in a compact space, enabling detailed
analysis on a single screen.
Efficient Pattern Recognition: Patterns, trends, and outliers become visually distinct, especially when
coupled with effective color schemes.
Scalability: Pixel-oriented techniques scale well for datasets with millions of records, making them ideal for
big data applications.
Dimensional Flexibility: They can visualize high-dimensional data by arranging pixels across multiple axes or
attributes.
Applications of Pixel-Oriented Visualization:
Pixel-oriented techniques are especially useful in fields like finance, scientific research, and large-scale
business data analysis, where the volume of data is too large for conventional chart types.
They can be used for time series data, multi-dimensional datasets, and datasets with millions of records.
Financial Market Analysis: For analyzing stock trends, price changes, and market patterns over time.
Medical and Genomic Research: Used to visualize gene expression levels, mutations, or other biological
markers in high-dimensional data.
Network Security: Helps in identifying anomalies, suspicious patterns, and irregularities within large network
data logs.
Climate Science: For visualizing climate patterns, temperature changes, or pollution levels over large areas
and extended time periods.
Drawbacks of Pixel-Oriented Visualization Techniques:
They do not help much in understanding the distribution of data in a multidimensional space.
1. Hard to Understand: With so many tiny pixels, it can be overwhelming and confusing to interpret
patterns, especially for people unfamiliar with this style of visualization.
2. Color Confusion: These techniques rely on color to show data differences, but small color changes
can be hard to notice. This makes it easy to miss details or misinterpret the data.
3. Limited Detail: Since each data point is just a pixel, there’s no room for labels or extra details. This
makes it difficult to understand specific data points without extra context.
4. Screen Limitations: Even though these techniques show a lot of data, they’re limited by screen
resolution. If there’s too much data, it can look cluttered, and details might get lost.
5.3 Geometric Projection Visualization Techniques: These techniques simplify multi-dimensional data by projecting it into a
lower-dimensional space (2D or 3D) using mathematical transformations, while preserving as much of the original
structure and relationships as possible. This helps in visualizing complex datasets with many variables that are otherwise
difficult to interpret.
These are particularly useful for visualizing data in fields like data mining, machine learning, and
bioinformatics, where datasets often have a high number of features or dimensions.
Eg: In a scatter plot where x and y are two spatial attributes and the third dimension is represented by different shapes,
we can see that points of types “+” and “X” tend to be collocated.
Dimensionality Reduction: These methods reduce the number of dimensions in the dataset,
typically from a high-dimensional space (e.g., hundreds of variables) to a lower-dimensional space
(e.g., 2D or 3D).
By projecting data onto fewer dimensions, these techniques help in visualizing complex relationships
between variables that would otherwise be hidden in higher-dimensional space.
Geometric Transformations/Preserve Structure: The transformation seeks to preserve important
aspects of the original data structure, such as distances, similarities, or groupings.
Interactivity: Many projection methods are combined with interactive elements, enabling users to
zoom, rotate, or focus on specific data areas for deeper insights.
HyperSlice: It is a method for the visualization of scalar functions of many variables. With this
method, the multi-dimensional function is presented in a simple and easy-to-understand way in
which all dimensions are treated identically. The central concept is the representation of a
multi-dimensional function as a matrix of orthogonal two-dimensional slices. These two-dimensional
slices lend themselves very well to interaction via direct manipulation, due to a one-to-one
relation between screen space and variable space.
Common Techniques:
Principal Component Analysis (PCA): One of the most widely used geometric projection
techniques, PCA reduces dimensionality by finding the directions (principal components) along
which the variance in the data is maximized. The data is projected onto the first few principal
components, which contain the most significant information about the dataset, allowing it to be
visualized in 2D or 3D.
Advantages: PCA is fast, easy to interpret, and useful for initial exploratory analysis. It’s commonly
used in fields like image compression, genetics, and finance.
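The projection step can be sketched with NumPy as follows. This is a minimal sketch, assuming PCA is computed through the singular value decomposition of the centered data; the function name `pca_project` and the synthetic 5-dimensional dataset are hypothetical.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project rows of X onto the first principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each variable
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    variance = S ** 2 / (len(X) - 1)
    ratio = variance / variance.sum()  # share of total variance per component
    return Xc @ components.T, components, ratio[:n_components]

# Synthetic data: 5 observed variables driven by 2 hidden factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

Z, components, ratio = pca_project(X, n_components=2)
```

`Z` holds the 2D coordinates that would be drawn in a scatter plot, and `ratio` indicates how much of the total variance the two plotted axes retain.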
Multidimensional Scaling (MDS): MDS visualizes the similarity or dissimilarity between data points
by projecting them into a lower-dimensional space while preserving the pairwise distances
between points as much as possible. It’s commonly used in cases where the goal is to visualize the
relationships or proximities between different items or observations.
Advantages: MDS is ideal for datasets with a meaningful distance metric, such as psychological or
preference studies, where perceived distances between items are critical.
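One standard variant, classical (metric) MDS, can be sketched with NumPy. The notes do not say which MDS variant is meant, so treating it as classical MDS is an assumption here; the function name `classical_mds` and the four example points are hypothetical.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Embed points in n_components dimensions from a pairwise distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0, None))
    return eigvecs[:, order] * scale

# Distances between the corners of a 3 x 4 rectangle.
points = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
D = np.linalg.norm(points[:, None] - points[None, :], axis=-1)

Y = classical_mds(D, n_components=2)
D_hat = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
```

The recovered coordinates `Y` may be rotated or mirrored relative to `points`, but the pairwise distances in `D_hat` match `D`, which is what MDS aims to preserve.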
t-Distributed Stochastic Neighbor Embedding (t-SNE): t-SNE is a nonlinear
technique that maps data into two or three dimensions, preserving local structures by placing
similar data points close together and pushing dissimilar points apart. This reveals clusters
and patterns in high-dimensional data. t-SNE is widely used in machine
learning and bioinformatics for visualizing clusters or substructures within high-dimensional data.
Advantages: t-SNE is particularly effective for identifying clusters or groupings within the data,
making it popular for complex datasets like images, gene expressions, and word embeddings.
Self-Organizing Maps (SOMs): SOMs are a type of neural network that performs a nonlinear
projection, mapping high-dimensional data onto a two-dimensional grid while maintaining
topological relationships, which helps reveal patterns.
They are used in data mining and exploratory data analysis, particularly for clustering, classification,
and visualization tasks.
Advantages: SOMs can highlight clusters and relationships within the data, making them useful for
tasks that benefit from an intuitive understanding of data topology.
Radial Coordinate Visualization (RadViz): It places data points on a circular layout, with each
dimension represented as a point on the circumference. Each data point is plotted within the circle
based on its relative values across dimensions. It is best suited for datasets with a small to moderate
number of dimensions, allowing users to see relative relationships across multiple attributes.
Advantages: It provides a compact view of multiple dimensions and is intuitive for small datasets,
particularly for feature comparison tasks.
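The placement rule can be sketched as a weighted average of the anchor positions. This sketch assumes each value has already been normalized to the range 0 to 1 and that anchors are spaced evenly on the unit circle; the function name `radviz_point` is hypothetical.

```python
import math

def radviz_point(values):
    """Place one record inside the unit circle from its normalized attribute values."""
    m = len(values)
    # One anchor per dimension, evenly spaced on the circumference.
    anchors = [(math.cos(2 * math.pi * j / m), math.sin(2 * math.pi * j / m))
               for j in range(m)]
    total = sum(values)
    if total == 0:
        return (0.0, 0.0)  # no attribute pulls the point anywhere
    x = sum(v * ax for v, (ax, _) in zip(values, anchors)) / total
    y = sum(v * ay for v, (_, ay) in zip(values, anchors)) / total
    return (x, y)

only_first = radviz_point([1, 0, 0, 0])   # pulled fully toward the first anchor
balanced = radviz_point([1, 1, 1, 1])     # equal pull from all anchors
```

A record dominated by one attribute lands near that attribute's anchor, while a record with balanced values lands near the center.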
Parallel Coordinates: In this technique, each data dimension is represented by a vertical axis, and each data
point is represented by a line intersecting each axis based on its values. This technique is particularly
effective for visualizing high-dimensional data and exploring correlations between multiple
attributes.
Advantages: Parallel coordinates are beneficial for datasets where understanding relationships
between variables is critical, such as in finance and engineering.
A major limitation of the parallel coordinates technique is that it cannot effectively show a
data set of many records.
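The construction of one polyline per record can be sketched as below. This is an illustrative sketch: axes are placed at x = 0, 1, 2, ... and each attribute is min-max scaled to the range 0 to 1 so that all axes share the same height. The function name and the two sample records are hypothetical.

```python
def parallel_coordinates(records, dims):
    """Return one polyline (list of (axis_index, height) points) per record."""
    lo = {d: min(r[d] for r in records) for d in dims}
    hi = {d: max(r[d] for r in records) for d in dims}
    lines = []
    for r in records:
        line = []
        for i, d in enumerate(dims):
            span = hi[d] - lo[d]
            # A constant attribute is drawn at mid-height.
            height = (r[d] - lo[d]) / span if span else 0.5
            line.append((i, height))
        lines.append(line)
    return lines

records = [{"a": 0, "b": 10, "c": 5}, {"a": 10, "b": 0, "c": 5}]
lines = parallel_coordinates(records, ["a", "b", "c"])
```

The two polylines cross between axes "a" and "b", the typical visual signature of a negative relationship between adjacent attributes. With many records the lines overlap heavily, which is the limitation noted above.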
Linear Discriminant Analysis (LDA): It is used for classification, separating data by maximizing the distinction
between predefined categories.
Representation types of the Geometric Projection Visualization Technique:
Line Plot: This plot appears in almost every kind of analysis between 2 variables.
A line plot is nothing but a series of data points connected with straight lines. The
plot may seem very simple, but it has many applications, not only in machine learning but in many other
areas. For example, it is used to analyze the performance of a model using the ROC curve and its AUC.
Scatter Plots: A scatter plot is a widely used visualization technique in machine learning and data science to
represent relationships between variables. It can display data in 2D or 3D, with points plotted based on
attributes along the x, y, and optionally z axes. In 2D scatter plots, patterns, clusters, and data separability
can be observed. Data points can be colored according to their class labels or other target attributes, making
it easier to identify patterns. While effective for small datasets, scatter plots can become cluttered and hard
to interpret with large datasets or too many attributes.
Bar Plot: This is one of the most widely used plots, seen not just in data analysis
but wherever trends are analyzed across many fields. It presents the data clearly
and conveys the details in a straightforward way. Although simple and clear, this plot is not
used very frequently in data science applications.
Stacked Bar Graph: Unlike a Multi-set Bar Graph, which displays its bars side-by-side, a Stacked Bar Graph
segments its bars. Stacked Bar Graphs are used to show how a larger category is divided into smaller
categories and how each part relates to the total amount. There are two types of Stacked
Bar Graphs:
1. Simple Stacked Bar Graphs place each value for the segment after the previous one. The total value of
the bar is all the segment values added together. Ideal for comparing the total amounts across each
group/segmented bar.
2. 100% Stacked Bar Graphs show the percentage-of-the-whole of each group and are plotted by the percentage
of each value to the total amount in each group. This makes it easier to see the relative differences between
quantities in each group.
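The difference between the two types comes down to a little arithmetic, sketched below. The function names `stack_segments` and `to_percent` are hypothetical, and the numbers are made up for illustration.

```python
def stack_segments(values):
    """Return (start, end) positions for each segment of one stacked bar."""
    segments, bottom = [], 0
    for v in values:
        segments.append((bottom, bottom + v))  # each segment starts where the last ended
        bottom += v
    return segments

def to_percent(values):
    """Rescale one bar's values so they sum to 100 (for a 100% stacked bar)."""
    total = sum(values)
    return [100 * v / total for v in values]

simple = stack_segments([30, 50, 20])          # bar height is the sum of the values
percent = stack_segments(to_percent([2, 6]))   # bar height is always 100
```

In the simple version the bar heights differ between groups and can be compared directly; in the 100% version every bar has the same height, so only the proportions are comparable.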
One major flaw of Stacked Bar Graphs is that they become harder to read the more segments each bar has.
Also, comparing each segment to each other is difficult, as they're not aligned on a common baseline.
Box and Whisker Plot: This plot can be used to obtain more statistical details about the data. The straight
lines extending to the maximum and minimum are called whiskers.
Points that lie outside the whiskers are considered outliers.
The box plot also shows the 25th, 50th and 75th percentiles (the first quartile, the median and the third quartile).
With the help of a box plot, we can also determine the Interquartile
Range (IQR), the range between the 25th and 75th percentiles, which contains the middle 50% of the data.
Box plots come under univariate analysis, which means that we
are exploring the data with only one variable.
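The statistics behind a box and whisker plot can be computed with Python's standard library, as sketched below. Two choices here are assumptions rather than something the notes specify: the whiskers follow the common 1.5 x IQR rule, and quartiles use the "inclusive" interpolation method of `statistics.quantiles`. The function name `box_plot_stats` is hypothetical.

```python
import statistics

def box_plot_stats(data):
    """Compute quartiles, IQR, whisker ends and outliers for one variable."""
    xs = sorted(data)
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in xs if x < lower_fence or x > upper_fence],
    }

stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

Here the value 100 falls outside the upper fence, so it is reported as an outlier and the upper whisker stops at 9.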
Pie Chart: It shows how categories make up parts of a whole, i.e. the composition of
something, at a single point in time. It represents numbers as percentages, and the sum of all segments must equal 100%.
Extensively used in presentations and offices, Pie Charts help show proportions and percentages
between categories by dividing a circle into proportional segments. Each arc length represents the
proportion of a category, while the full circle represents the total sum of all the data, equal to
100%.
Donut chart: It is essentially a Pie Chart with an area of the centre cut out. Pie Charts are sometimes
criticized for focusing readers on the proportional areas of the slices to one another and to the chart as a
whole. This makes it tricky to see the differences between slices, especially when you try to compare
multiple Pie Charts together.
A Donut Chart somewhat remedies this problem by de-emphasizing the use of the area. Instead, readers
focus more on reading the length of the arcs, rather than comparing the proportions between slices. Also,
Donut Charts are more space-efficient than Pie Charts because the blank space inside a Donut Chart can be
used to display information inside it.
5.4 Icon-Based Visualization Techniques: These techniques encode the attribute values of each data point in
the features of a small icon (glyph).
Chernoff Faces: They display multidimensional data of up to 18 variables as a cartoon human face.
Star Glyphs (Star Plots): Star glyphs (or star plots) use star-shaped icons where each spoke (or ray)
of the star represents a different variable. The length of the spoke reflects the value of that variable.
These are useful for comparing multiple data points at once, as users can quickly gauge which
variables are strong or weak for each data point based on the shape of the star.
Limitations: When used for datasets with too many dimensions, star glyphs can become cluttered and hard
to interpret.
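The geometry of a star glyph can be sketched as follows: spokes are spread evenly around a center, and each spoke's length is the variable's value relative to its maximum. The function name `star_glyph` and the sample values are hypothetical.

```python
import math

def star_glyph(values, max_values, radius=1.0):
    """Return the spoke end points (x, y) of one star glyph centered at the origin."""
    m = len(values)
    points = []
    for k, (v, vmax) in enumerate(zip(values, max_values)):
        angle = 2 * math.pi * k / m          # spokes are evenly spaced
        length = radius * v / vmax           # spoke length encodes the value
        points.append((length * math.cos(angle), length * math.sin(angle)))
    return points

# One record with four variables, each measured on a 0-10 scale.
glyph = star_glyph([10, 5, 0, 10], [10, 10, 10, 10])
```

Connecting the end points in order gives the star outline; drawing one such glyph per record lets the shapes be compared side by side.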
Stick Figures: Data points are visualized as stick figures, where body part lengths or angles
correspond to data values. For instance, arm length might represent one attribute, while leg angle
represents another. These are often used in situations where human-like representations make data
variations intuitive to observe.
Limitations: Similar to Chernoff faces, stick figures can be somewhat abstract and may not clearly
represent all data variations.
Flower Glyphs: These are similar to star glyphs, but use petal-like structures to represent variables.
The size, shape, or number of petals can be used to indicate different attributes. They provide a
visually appealing way to represent complex, multi-dimensional data.
Shape-Coding Techniques: Icons of different shapes (triangles, squares, circles, etc.) are used to
represent data points, where each shape or attribute of the shape represents a data
dimension. Shape-coding is flexible and effective in distinguishing different groups or categories
within a dataset.
Limitations: Limited by the human ability to differentiate between many shapes at once; too many
dimensions can make the visualization cluttered.
Applications of Icon-Based Visualization:
Finance: For portfolio analysis, where each stock or asset can be represented by an icon that encodes
various performance metrics.
Medical Data: In patient data visualization, where multiple health metrics (like blood pressure, cholesterol,
etc.) can be mapped to icon features.
Marketing and Customer Segmentation: To display customer characteristics or buying behaviors in a way
that allows for quick comparison and grouping.
Environmental Monitoring: Representing multiple environmental variables (temperature, humidity,
pollution levels, etc.) in ecological studies.
Exploratory Data Analysis: Icon-based techniques help users explore multi-dimensional datasets, making it easier to identify
patterns, trends, or outliers.
Comparative Analysis: By using icons, analysts can quickly compare multiple data points across several
attributes at once.
Biology: For visualizing multi-dimensional genetic data or analyzing molecular structures.
Business and Finance: Helps in market research and customer segmentation by visually comparing various
attributes of clients or products.
Advantages of Icon-Based Visualization:
Multi-Dimensional Representation: Can handle datasets with many attributes, allowing users to see
multiple data dimensions in a single visual form.
Pattern Recognition: Leverages the human brain’s ability to detect visual patterns, making it easier to spot
trends, similarities, or outliers in the data.
Customizable: Icons can be easily tailored to represent specific variables or categories, offering flexibility in
design.
High Information Density: Icons can encode multiple data attributes simultaneously, providing a
comprehensive view of each data point.
Interactive Exploration: Many icon-based visualizations allow for interactive adjustments, such as zooming or
filtering, which helps in focusing on specific data aspects.
Challenges /Drawbacks of Icon-Based Visualization:
Visual Overload/ Subjectivity in Visual Representation: When dealing with very large datasets or a
high number of dimensions, icon-based techniques can become overwhelming or cluttered.
Some icon-based techniques, like Chernoff faces, rely on abstract features (like faces or figures) which may
not be universally interpretable or precise.
Interpretation Difficulty: Icons can become confusing if there are too many data dimensions or if
the icons themselves are complex.
Limited Scalability: As the number of data points increases, the effectiveness of icons as a means of
representation may diminish due to crowding or overlapping icons. Large datasets can quickly make
icon-based visuals cluttered, making it hard to distinguish between individual data points.
Examples of Use:
Chernoff Faces in Stock Market: Faces show stock indicators like volatility, price changes, and
trading volume.
Star Glyphs in Medical Data: Used to compare patient health metrics such as blood pressure, heart
rate, and cholesterol.
Flower Glyphs in Climate Data: Petals represent climate factors like temperature, humidity, and
wind speed, showing variations over time or location.
These techniques provide an intuitive and flexible way to visualize multi-dimensional data by
encoding several attributes into different aspects of an icon. They are highly effective for pattern
recognition and comparison tasks, but they must be carefully designed to avoid visual overload and
ensure interpretability.
5.5 Hierarchical Data Visualizations: Hierarchical Visualizations (or Trees) are collections of items, where
each item connects to one parent (except the root), and both items and connections can have multiple
attributes.
Hierarchical data is organized in a tree-like structure, with each data point having defined
relationships that create parent-child connections.
They represent data with a hierarchical structure, helping users understand relationships and
organization within complex datasets, especially those with multiple category levels like
organizational charts or taxonomies.
For a large data set of high dimensionality, it would be difficult to visualize all dimensions at the
same time.
Hierarchical visualization techniques partition all dimensions into subsets (i.e., subspaces).
The subspaces are visualized in a hierarchical manner.
“Worlds-within-Worlds,” also known as n-Vision, is a representative hierarchical visualization
method.
Suppose we want to visualize a 6-D data set, where the dimensions are F, X1, X2, X3, X4, X5.
We want to observe how F changes w.r.t. the other dimensions. We can fix the X3, X4, X5 dimensions to
selected values and visualize changes to F w.r.t. X1, X
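The idea of fixing some dimensions and viewing the rest can be sketched as below. This assumes the two free dimensions are X1 and X2 (the remaining ones once X3, X4, X5 are fixed); the function `F` is a made-up stand-in for the data, and `inner_world` is a hypothetical helper name.

```python
def inner_world(f, fixed, x1_values, x2_values):
    """Evaluate F over a grid of (X1, X2) while X3, X4, X5 stay at fixed values."""
    return [[f(x1, x2, *fixed) for x2 in x2_values] for x1 in x1_values]

def F(x1, x2, x3, x4, x5):
    """Hypothetical 5-input function standing in for the 6-D data set."""
    return x1 + 2 * x2 + x3 * x4 - x5

# Fix (X3, X4, X5) = (1, 2, 3) and tabulate F over a small X1-X2 grid.
surface = inner_world(F, fixed=(1, 2, 3), x1_values=[0, 1], x2_values=[0, 1, 2])
```

Each choice of the fixed values yields a different inner surface; changing them and redrawing the surface is how the remaining dimensions are explored.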
Key Concepts:
Hierarchy: It is a way of organizing data into levels, where higher levels represent broader categories and
lower levels represent more specific subcategories. For example, a company structure can have a CEO at the
top, followed by department heads, team leaders, and individual employees.
Visualization Goals: The primary goals of hierarchical visualization are to show relationships between data
points, help users navigate through different levels of detail, and make complex structures comprehensible
at a glance.
Data Structure: Hierarchical data is often represented in tree-like structures, where each node represents a
data point, and branches connect parent nodes to their child nodes. This structure helps visualize the
hierarchy and relationships among different data levels.
Common Hierarchical Visualization Techniques:
Tree Diagrams: These visually represent hierarchical data using branches. Each node (or leaf)
represents a data point, and lines connect parent nodes to their children. They are useful for
showing relationships and structures, like family trees or organizational charts
Treemaps: Treemaps display hierarchical data as nested rectangles, with each rectangle's area
representing the size of the category it represents. This technique is useful for visualizing part-to-
whole relationships and is effective in showing data with many categories.
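A simple way to compute the nested rectangles is the slice-and-dice layout, sketched below: the rectangle is split along one direction at each level, alternating between horizontal and vertical splits. Slice-and-dice is one of several treemap layouts and is chosen here as an assumption; the node format (`name`, `size`, `children`) and function names are hypothetical.

```python
def node_size(node):
    """Size of a node: its own size for a leaf, else the sum over its children."""
    if node.get("children"):
        return sum(node_size(c) for c in node["children"])
    return node["size"]

def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Return a list of (name, x, y, width, height) rectangles for the hierarchy."""
    if out is None:
        out = []
    out.append((node["name"], x, y, w, h))
    children = node.get("children", [])
    total = sum(node_size(c) for c in children)
    offset = 0.0
    for c in children:
        frac = node_size(c) / total
        if depth % 2 == 0:   # even levels: split left to right
            slice_and_dice(c, x + offset * w, y, frac * w, h, depth + 1, out)
        else:                # odd levels: split top to bottom
            slice_and_dice(c, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out

tree = {"name": "root", "children": [
    {"name": "A", "size": 6},
    {"name": "B", "children": [{"name": "B1", "size": 1}, {"name": "B2", "size": 3}]},
]}
rects = slice_and_dice(tree, 0, 0, 10, 10)
```

Each rectangle's area is proportional to its node's size, and every child lies inside its parent's rectangle, which is the part-to-whole property described above.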
Dendrograms: This is a type of tree diagram that’s often used in fields like biology, machine learning
(cluster analysis), or linguistics. It represents the hierarchical clustering of data, where nodes
represent clusters, and branches show relationships between them. This visualization is often used
in hierarchical clustering, where data points are grouped into nested clusters, with similar items
placed closer together.
Icicle Plot: An icicle plot is similar to a sunburst chart but uses a vertical or horizontal bar layout
rather than a radial one. Each level of the hierarchy is represented by a stack of bars, with each bar
representing a node in the hierarchy. Icicle plots are particularly effective at visualizing deep
hierarchies and are good for showing the "depth" of a hierarchy.
Organizational Chart: An organizational chart is a type of hierarchy visualization commonly used in
business to depict the relationships between employees or departments within an organization. The
chart typically has a tree structure with the CEO or top-level manager at the top and branching down
to lower-level employees or departments.
Sunburst Charts: A sunburst chart is a circular visualization that represents hierarchical data in a
radial layout. It starts with a central node (the root) and has concentric rings radiating out, each ring
representing a level in the hierarchy. This type of visualization is particularly useful when you want
to display multiple levels of categories and subcategories, and allows for easy exploration of
hierarchical relationships.
Circular Tree Diagram: This diagram is another variant of the hierarchical structure but laid out in a
circular format rather than a traditional tree. It may have a central node surrounded by rings or
layers, each representing different levels of hierarchy. It is used to emphasize the
interconnectedness or cyclical nature of the hierarchy.
Circle Packing: A variation of a treemap that uses circles instead of rectangles. Containment
within each circle represents a level in the hierarchy: each branch of the tree is represented as a
circle, and its sub-branches are represented as circles inside it. The area of each circle can also be
used to represent an additional value, such as quantity or file size. Color may also be used
to assign categories or to represent another variable via different shades. Although visually appealing,
circle packing is not as space-efficient as a treemap, because a lot of empty space remains within the
circles. Despite this, circle packing often reveals hierarchical structure better than a treemap.
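When the area of a circle encodes a value, the radius must grow with the square root of that value, not with
the value itself. The short sketch below shows only this sizing step, not the packing of circles inside one
another; the folder sizes and the helper name radius_for_value are assumptions for illustration.

```python
import math

def radius_for_value(value, scale=1.0):
    """Radius of a circle whose area equals scale * value."""
    return math.sqrt(scale * value / math.pi)

# Hypothetical folder sizes in megabytes.
sizes_mb = {"photos": 400, "music": 100, "docs": 25}
radii = {name: radius_for_value(size) for name, size in sizes_mb.items()}
for name, r in radii.items():
    print(f"{name}: radius {r:.2f}, area {math.pi * r ** 2:.1f}")
```

A folder four times as large gets a circle with twice the radius; scaling the radius directly by the value
would exaggerate the differences.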
Radial Tree Diagram: This variation of the tree diagram uses a radial (circular) layout, with the root
at the center and each deeper level of the hierarchy placed farther out. Each node is represented by a
circle, with connecting lines or arcs to display parent-child relationships. This can be particularly
helpful when the hierarchy has many branches.
Nested Pie Charts: Nested pie charts, also called multi-level donut charts, show hierarchical data in
concentric rings. Each ring represents a different level of the hierarchy. This technique allows for
comparing the sizes of categories and subcategories in a compact visual format.
Applications/Use Cases for Hierarchical Data Visualizations:
Organizational Structures: Visualizing company hierarchies, department structures, and employee
roles.
Biological Classification: Representing relationships among species or biological classifications.
File Systems: Showing the structure of folders and files in a computer’s directory.
Product Categories: Organizing product lines into categories and subcategories in e-commerce
platforms.
Knowledge Organization: Structuring information for databases, libraries, or ontologies.
Family Trees and Genealogy: Representing generational lineage or ancestry.
Project Management: Visualizing how a project breaks down into tasks and subtasks, as in Work
Breakdown Structures (WBS) or the grouped tasks of a Gantt chart.
Website Structures: Mapping the hierarchy of pages and content on a website.
Benefits of Hierarchical Data Visualizations:
Clarity/Simplified Complex Data: Hierarchical visualizations break complex datasets down into more digestible
chunks, allowing users to understand high-level relationships and drill down to more detailed layers.
This makes complex structures more understandable and navigable.
Spatial Organization: They effectively represent relationships among categories, allowing for quick
comprehension of the data's organization.
Detail Exploration: Users can drill down into subcategories for more detailed insights while maintaining the
context of the overall structure. Many hierarchical visualizations (such as sunbursts or treemaps) support
this interactively, making it possible to explore large datasets dynamically.
Intuitive Structure: The tree-like or layered structure mimics natural cognitive patterns for organizing
information, making it easier for users to follow the hierarchy and relationships between entities.
Highlight Proportions: Some visualizations, such as treemaps, also represent proportional
data (e.g., sales or quantities) within the hierarchy, helping to identify trends and outliers at different levels.
Efficient Navigation: When dealing with very large datasets or deep hierarchies, hierarchical data
visualizations can help users navigate efficiently, finding relevant information quickly.
Challenges/Drawbacks of Hierarchical Data Visualizations:
Overcrowding: Large hierarchies can become cluttered and difficult to read if too many levels or
items are included.
Limited Detail: Hierarchical visualizations can sometimes oversimplify data, making it hard to
capture nuances or outliers.
Scalability: As the number of categories grows, maintaining a clear and effective visualization
becomes challenging.
5.6 Visualizing complex data and relationships: This involves using various techniques and tools to represent
intricate datasets, enabling users to understand patterns, trends, and connections within the data. As data
becomes more complex, often comprising multiple dimensions, categories, or interrelationships, effective
visualization becomes crucial for analysis and decision-making. Here’s an overview of key concepts,
techniques, and applications in this area.
Key Concepts:
Complex Data: Complex data refers to datasets that contain numerous variables, relationships, or
dimensions. This includes multi-dimensional data, time series, and interconnected datasets, such as
those found in social networks or scientific research. It often includes hierarchical structures,
categorical variables, and temporal aspects, making it challenging to analyze and interpret without
appropriate visualization.
Relationships: Relationships in data can be direct or indirect, linear or non-linear, and can involve
multiple dimensions. Understanding these relationships is key to deriving insights and making
informed decisions.
Dimensionality: Complex data often operates in high-dimensional spaces, where each dimension
represents a different variable. Visualizing these dimensions effectively is essential to reveal
underlying patterns and correlations.
For a large dataset of high dimensionality, it would be difficult to visualize all dimensions at the same
time.
Hierarchical visualization techniques partition all dimensions into subsets (i.e., subspaces).
The subspaces are visualized in a hierarchical manner.
“Worlds-within-Worlds,” also known as n-Vision, is a representative hierarchical visualization
method.
Suppose we want to visualize a 6-D dataset, where the dimensions are F, X1, X2, X3, X4, X5.
We want to observe how F changes w.r.t. the other dimensions. We can fix the X3, X4, X5 dimensions to
selected values and visualize the changes in F w.r.t. X1 and X2.
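The idea of fixing some dimensions and varying the others can be sketched in a few lines of Python. The
response function F below is entirely hypothetical (the notes do not define one), as are the chosen fixed
values and the helper name inner_world; the sketch only shows how one 2-D slice of a 6-D dataset is obtained.

```python
def F(x1, x2, x3, x4, x5):
    """Hypothetical response function, used only for illustration."""
    return x1 ** 2 + 2 * x2 + x3 * x4 - x5

def inner_world(f, fixed, x1_values, x2_values):
    """Evaluate f over a grid of (x1, x2) with x3, x4, x5 held fixed."""
    x3, x4, x5 = fixed
    return [[f(x1, x2, x3, x4, x5) for x2 in x2_values] for x1 in x1_values]

# Fix X3=1, X4=2, X5=3 and look at how F varies with X1 and X2.
grid = inner_world(F, fixed=(1, 2, 3), x1_values=[0, 1, 2], x2_values=[0, 1])
for x1, row in zip([0, 1, 2], grid):
    print(f"X1={x1}: F over X2 = {row}")
```

The resulting grid is what would be drawn as a 3-D surface of F over X1 and X2; choosing different fixed
values for X3, X4, X5 yields a different surface.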
Most visualization techniques were developed mainly for numeric data.
Recently, more and more non-numeric data, such as text and social networks, have become
available.
Many people on the Web tag various objects such as pictures, blog entries, and product reviews.
A tag cloud is a visualization of statistics of user-generated tags.
Often, in a tag cloud, tags are listed alphabetically or in a user-preferred order.
The importance of a tag is indicated by font size or color.
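A tag cloud of this kind can be sketched by mapping each tag's count to a font size and listing the tags
alphabetically. The tag counts, the size range of 10 to 40 points, the linear scaling, and the function name
tag_font_sizes are assumptions chosen for illustration.

```python
def tag_font_sizes(tag_counts, min_size=10, max_size=40):
    """Map each tag to a font size, scaled linearly by its count.

    Tags are returned in alphabetical order, as in a typical tag cloud.
    """
    lo, hi = min(tag_counts.values()), max(tag_counts.values())
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return [
        (tag, round(min_size + (count - lo) * (max_size - min_size) / span))
        for tag, count in sorted(tag_counts.items())
    ]

# Made-up tag statistics, for illustration only.
tag_counts = {"travel": 50, "food": 20, "python": 35, "art": 5}
sizes = tag_font_sizes(tag_counts)
for tag, size in sizes:
    print(f"{tag}: {size}pt")
```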
Word Cloud: Also known as a Tag Cloud. A visualization method that displays how frequently words appear
in a given body of text, by making the size of each word proportional to its frequency. All the words are then
arranged in a cluster or cloud of words. Alternatively, the words can also be arranged in any format:
horizontal lines, columns or within a shape.
Word Clouds can also be used to display words that have meta-data assigned to them. For example,
in a Word Cloud of the names of all the world's countries, each country's population could
determine the size of its name.
Color used on Word Clouds is usually meaningless and is primarily aesthetic, but it can be used to
categorize words or to display another data variable.
Typically, Word Clouds are used on websites or blogs to depict keyword or tag usage. Word Clouds
can also be used to compare two different bodies of text.
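The counting step behind a Word Cloud can be sketched with the Python standard library. The sample sentence,
the tiny stop-word list, and the function name word_frequencies are assumptions for illustration; removing
stop words is a common preprocessing choice, not something the definition above requires.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "and", "of", "is", "to", "in"}  # tiny illustrative list

def word_frequencies(text):
    """Count how often each non-stop word appears, ignoring case."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

# Made-up sample text, for illustration only.
text = "Data is the new oil. Data drives decisions, and data tells stories."
freqs = word_frequencies(text)
top_count = max(freqs.values())
for word, count in freqs.most_common(3):
    # Relative size: 1.0 for the most frequent word.
    print(word, count, round(count / top_count, 2))
```

The relative size computed in the loop is what a drawing routine would multiply by a maximum font size, so
that each word's size is proportional to its frequency.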
Although simple and easy to understand, Word Clouds have some major flaws: