UNIT 5 Data Analytics

The document discusses various data visualization techniques, including pixel-oriented, geometric projection, and icon-based methods, emphasizing their importance in simplifying complex datasets and aiding in pattern recognition. It outlines key concepts, types of visualizations, and tools available for creating effective data visuals, highlighting their applications in fields such as business intelligence, healthcare, and finance. Additionally, it addresses the advantages and drawbacks of these techniques, providing insights into how they can enhance decision-making and data analysis.

Uploaded by

226p1a05d3swetha.edunet

UNIT V Data Visualization: Pixel-Oriented Visualization Techniques, Geometric Projection Visualization
Techniques, Icon-Based Visualization Techniques, Hierarchical Visualization Techniques, Visualizing Complex
Data and Relations.
5.1 Data Visualization:
 Data visualization is the graphical representation of quantitative information and data using visual
elements such as graphs, charts, and maps.
 It converts data sets, large or small, into visuals that are easy for humans to understand and
process.
 Data visualization tools provide an accessible way to see and understand trends, outliers, and
patterns in data. In the world of Big Data, visualization tools and technologies are required to
analyze vast amounts of information.
 Data visualizations are common in everyday life, most often in the form of graphs and charts.
A combination of multiple visualizations and pieces of information is referred to as an
infographic.
 Data visualizations are used to discover unknown facts and trends. For example, line charts
display change over time.

Key Concepts in Data Visualization:

Visual Elements:
Charts and Graphs: Bar charts, line graphs, scatter plots, and pie charts are common forms of visual data
representations.
Tables: Display raw data in a structured manner, often used when precision is more important than visual
appeal.
Maps: Geographic data can be visualized through heat maps, choropleth maps, or point-based maps to show
patterns across locations.
Purpose of Data Visualization:
Simplification: Data visualization simplifies large and complex datasets into visual formats that are easier to
digest.
Pattern Recognition: It helps users quickly identify trends, patterns, and correlations in data.
Storytelling: Visuals can be used to tell a story or convey insights in a compelling way, making the message
clearer and more impactful.
Decision-Making: Helps stakeholders make informed decisions by presenting key data points in a clear and
actionable format.
Types of Data Visualizations:
 Line Charts: Show trends over time, useful for tracking variables over periods.
 Bar Charts: Compare quantities across categories, good for ranking or contrasting discrete groups.
 Pie Charts: Display proportions within a dataset, best for showing how segments compare to a
whole.
 Histograms: Show frequency distributions of variables, useful for displaying the spread of data.
 Heat Maps: Represent data using color variations to show intensity or frequency.
 Maps: Represent spatial data, good for geographically bound data like population density or
resource locations.
 Scatter Plots: Show relationships or correlations between two variables.
 Box Plots: Display key statistics of a distribution, such as the median, quartiles, and outliers.
 Area Charts: Similar to line charts but with the area under the line filled, these charts accentuate
cumulative data patterns.
 Bubble Charts: Enhance scatter plots by introducing a third dimension through varying bubble sizes,
revealing additional insights.
 Treemaps: Efficiently represent hierarchical data structures, breaking down categories into nested
rectangles.
 Violin Plots: Violin plots combine aspects of box plots and kernel density plots, providing a detailed
representation of the distribution of data.
 Word Clouds: Word clouds are visual representations of text data where words are sized based on
their frequency.
 3D Surface Plots: 3D surface plots visualize three-dimensional data, illustrating how a response
variable changes in relation to two predictor variables.
 Network Graphs: Network graphs represent relationships between entities using nodes and edges.
They are useful for visualizing connections in complex systems, such as social networks,
transportation networks, or organizational structures.
 Sankey Diagrams: Sankey diagrams visualize flow and quantity relationships between multiple
entities. Often used in process engineering or energy flow analysis.
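Eg: The histogram entry in the list above depends on one computation, counting how many values fall into each interval (bin). A minimal sketch of that step in plain Python follows. The function name `histogram_counts`, the equal-width binning rule, and the sample ages are assumptions made for illustration, not part of the original notes.

```python
def histogram_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin rather than in an extra bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


ages = [21, 22, 25, 31, 34, 35, 38, 41, 47, 60]  # made-up sample data
counts = histogram_counts(ages, bins=4)

# Text rendering: one row per bin, one '#' per value in that bin.
for i, count in enumerate(counts):
    print(f"bin {i}: " + "#" * count)
```

A plotting library would draw one bar per bin with a height equal to the count. The number of bins is a design choice: too few hides the shape of the distribution, too many makes it noisy.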
Tools and Software for Data Visualization: Common tools for creating data visualizations include Microsoft
Excel, Tableau, Power BI, Google Data Studio, and Python libraries like Matplotlib and Seaborn.
Some popular data visualization tools include:
 Tableau: Known for its drag-and-drop interface and powerful analytics capabilities.
 Microsoft Power BI: Offers rich analytics with integration into other Microsoft services. It is noted
for its intuitive drag-and-drop interface, short learning curve, and broad integration capabilities,
including Salesforce and MailChimp.
 Google Data Studio: A free tool that integrates well with Google products and other data sources.
 Python Libraries (Matplotlib, Seaborn, Plotly): Widely used for custom visualizations and in data
science applications.
 D3.js: A JavaScript library that creates dynamic and interactive visualizations for web applications.
 Kibana: The part of the Elastic Stack that turns data into visual insights. It is built on and
designed to work with Elasticsearch data only. This exclusivity, however, does not prevent it
from being one of the best data visualization tools for log data.
 Grafana: A professional data visualization and analytics tool that supports up to 30 data sources,
including AWS, Elasticsearch, and Prometheus. Grafana is more flexible than Kibana in terms of
integrations, but each of the systems works best with its own type of data.
Guiding Principles
 Clarity and Simplicity: Keep the visualization simple and avoid clutter. The goal is to communicate
data effectively (focus on clarity and relevance).
 Choose the Right Chart Type: The chart type should match the nature of the data and the message
you want to convey.
 Use Consistent Colors and Symbols: Colors should highlight important data points and not confuse
or overwhelm the viewer.
 Provide Context/Label Clearly: Labels, titles, and legends are crucial for helping the audience
understand the visualization.
 Tell a Story: Guide viewers through the data by arranging visuals logically and emphasizing key
points.
Benefits of Data Visualization:
 Simplifies Complex Data: Helps break down large volumes of data, making it easier to understand.
 Reveals Patterns and Trends: Assists in identifying patterns that are not immediately apparent in
raw data.
 Enhances Decision-Making: Facilitates quicker and more accurate decisions by providing clear data
insights.
 Engages Audiences: Visually engaging data representations can hold attention better than raw
numbers or text.
 Encourages Discovery: Enables users to explore and investigate data on their own.
Data visualization can also:
 Identify areas that need improvement or modification.
 Clarify which factors influence customer behavior.
 Help you understand which products to place where.
 Help predict sales volumes.
Uses of Data Visualization:
 To make data easier to understand and remember.
 To discover unknown facts, outliers, and trends.
 To visualize relationships and patterns quickly.
 To ask better questions and make better decisions.
 To perform competitive analysis and improve insights.
Applications of Data Visualization
1. Business Intelligence and Reporting
2. Financial Analysis
3. Healthcare
4. Marketing and Sales
5. Human Resources
5.2 Pixel-Oriented Visualization Techniques: These techniques display very large datasets by using pixels
to represent data values. Each data value is shown as a single pixel on the screen, and its color
represents the value.
 If the data has many dimensions (or features), a separate section of the screen (called a window) is
created for each dimension. The data records are arranged in the same order across all these
windows, so patterns can be compared easily.
 This method makes it possible to show massive amounts of data at once while still being able to spot
trends, patterns, or unusual values (called anomalies). It uses every pixel on the screen efficiently to
give a clear and compact view of the data.
 Eg: AllElectronics maintains a customer information table, which consists of 4 dimensions: income,
credit limit, transaction volume, and age. We analyze the correlation between income and the other
attributes by visualization.
 We sort all customers by income in ascending order and use this order to lay out the customer data
in the 4 visualization windows.
 The pixel colors are chosen so that the smaller the value, the lighter the shading.
 Using pixel-based visualization, we can easily observe that credit limit increases as income
increases, that customers whose income is in the middle range are more likely to purchase more
from AllElectronics, and that there is no clear correlation between income and age.

Fig: Pixel-oriented visualization of 4 attributes, with all customers sorted by income in ascending order.
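Eg: The steps of the example above (sort by income, keep the same order in every window, shade smaller values lighter) can be sketched in plain Python. The four customer records, the names `gray_level` and `pixel_windows`, and the 0 to 255 gray scale are assumptions chosen for illustration. A real display would paint each number as one pixel.

```python
# Hypothetical records: (income, credit_limit, transaction_volume, age).
customers = [
    (52000, 8000, 40, 61),
    (18000, 2000, 12, 25),
    (95000, 15000, 22, 38),
    (33000, 5000, 35, 47),
]
ATTRIBUTES = ["income", "credit_limit", "transaction_volume", "age"]


def gray_level(value, lo, hi):
    """Map a value to 0..255 so that smaller values are lighter (255 = white)."""
    if hi == lo:
        return 255
    return round(255 * (1 - (value - lo) / (hi - lo)))


def pixel_windows(records, sort_column=0):
    """Return one list of gray levels per attribute, all in the same record order."""
    ordered = sorted(records, key=lambda record: record[sort_column])
    windows = {}
    for column_index, name in enumerate(ATTRIBUTES):
        column = [record[column_index] for record in ordered]
        lo, hi = min(column), max(column)
        windows[name] = [gray_level(v, lo, hi) for v in column]
    return windows


windows = pixel_windows(customers)
for name, pixels in windows.items():
    print(f"{name:>20}: {pixels}")
```

Because every window uses the income ordering, a window whose shades darken steadily from first to last pixel (here, credit limit) indicates an attribute that rises with income, while a window with no visible gradient (here, age) indicates no clear correlation.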
Key Concepts:
Pixel Mapping: In pixel-oriented visualizations, each data point is represented as a single pixel. The color or
intensity of the pixel is used to encode the data value.
 These visualizations/positions are most effective for datasets with a large number of data points
because each pixel can hold one data value, allowing millions of values to be visualized on a
standard display.
High-Density Displays/ Dense Information Representation:
 Pixel-oriented techniques are designed to handle very high data density, making them suitable for
large-scale datasets where traditional visualization methods, like bar or line charts, would become
cluttered.
 By representing each data point as a pixel, these techniques can handle millions of data points at
once, maximizing screen space.
Arrangement of Pixels:
Data values are arranged in a specific layout on the screen. Common arrangements include:
Row-wise and column-wise arrangement: Pixels are placed in rows and columns to maintain spatial
relationships within the data.
Recursive pattern: The data is arranged in a pattern that recursively groups related data points to show
correlations or trends.
Space-filling curves: These curves map a one-dimensional ordering of the data onto the 2D screen so that
items that are close in the ordering stay close together on the screen, preserving the continuity of the data.
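Eg: One well-known space-filling curve is the Hilbert curve. The sketch below converts a position along the curve into a pixel coordinate and uses it to lay out a list of values. The choice of the Hilbert curve and the names `hilbert_point` and `layout_on_curve` are assumptions for illustration; the notes do not name a specific curve.

```python
def hilbert_point(side, distance):
    """Convert a distance along a Hilbert curve into (x, y) on a side x side grid.

    `side` must be a power of two and 0 <= distance < side * side.
    """
    x = y = 0
    t = distance
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so that consecutive sub-curves connect.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def layout_on_curve(values, side):
    """Place values on a grid in Hilbert order; unused cells stay None."""
    grid = [[None] * side for _ in range(side)]
    for distance, value in enumerate(values):
        x, y = hilbert_point(side, distance)
        grid[y][x] = value
    return grid


grid = layout_on_curve(list(range(16)), side=4)
for row in grid:
    print(" ".join(f"{value:2d}" for value in row))
```

In the printed grid, each number sits next to its predecessor and successor, which is the continuity property described above. A plain row-wise layout breaks this at the end of every row.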
Color Encoding: Color plays a critical role in pixel-oriented visualization techniques, as it conveys the value of
each data point. Color map selection is essential to ensure clarity and accurate interpretation of data.
Gradients or distinct colors can be used to represent data values, helping to distinguish between high, low,
or average values.
Types of Pixel-Oriented Visualization Techniques: There are several common techniques within pixel-
oriented visualization, each suited to specific data structures or analytical goals:

1. Recursive Pattern Technique: This technique arranges pixels in a recursive pattern, allowing for efficient
use of space and creating visually recognizable patterns that help in detecting correlations or clusters.
Recursive patterns work well for data with hierarchical structures, where users need to understand different
levels of the data simultaneously.
2. Circle Segments Technique: In this approach, data values are arranged in circular segments or ring
patterns, with each ring representing a different subset or level of data.
Circle segments are useful for visualizing cyclic data patterns, such as seasonal trends or periodic activities,
providing an intuitive way to spot recurring patterns.
3. Axes-Based Techniques: Data is displayed in a structured grid, where each row or column represents a
specific dimension or attribute, and each cell (or pixel) represents a data value for that attribute.
Axes-based techniques are helpful for comparing multiple dimensions simultaneously, especially for
identifying correlations and trends across attributes.
4. Spiral and Temporal Techniques: Pixels are arranged in spiral or other time-based layouts, which are
particularly effective for time-series data, where patterns over time are a focal point.
Temporal techniques help users quickly spot anomalies or seasonal variations in time-series data.
5. Query-Dependent Techniques: Here, the pixel arrangement adapts based on user queries or the specific
aspects of the data that are most relevant to the current analysis.
Query-dependent techniques are interactive, allowing users to focus on specific data slices, filtering and
zooming in on areas of interest.
6. Query-Independent Techniques: These visualize the entire dataset, showing all the data points regardless
of user queries. Examples include the pixel bar chart and spiral visualization.
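Eg: The spiral arrangement in item 4 of the list above can be sketched as a generator of grid positions that winds outward from the center, so that the oldest value of a time series sits in the middle and later values circle around it. The square-spiral shape and the name `spiral_coordinates` are assumptions for illustration; other spiral layouts are possible.

```python
def spiral_coordinates(count):
    """Yield `count` grid positions that spiral outward from (0, 0)."""
    x = y = 0
    dx, dy = 1, 0        # start by moving right
    step_length = 1      # cells to walk before turning
    produced = 0
    while produced < count:
        for _leg in range(2):            # two legs share each step length
            for _step in range(step_length):
                if produced == count:
                    return
                yield x, y
                produced += 1
                x, y = x + dx, y + dy
            dx, dy = -dy, dx             # turn by 90 degrees
        step_length += 1


# Hypothetical time series: one reading per time step.
readings = [12, 15, 11, 18, 22, 19, 25, 24, 30]
placed = dict(zip(spiral_coordinates(len(readings)), readings))

for position, value in placed.items():
    print(position, value)
```

With a suitable period, values from the same phase of each cycle line up along a ray from the center, which is one way recurring patterns can become visible.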
Advantages of Pixel-Oriented Visualization Techniques:
High Data Density: They allow visualization of massive datasets in a compact space, enabling detailed
analysis on a single screen.
Efficient Pattern Recognition: Patterns, trends, and outliers become visually distinct, especially when
coupled with effective color schemes.
Scalability: Pixel-oriented techniques scale well for datasets with millions of records, making them ideal for
big data applications.
Dimensional Flexibility: They can visualize high-dimensional data by arranging pixels across multiple axes or
attributes.
Applications of Pixel-Oriented Visualization:
Pixel-oriented techniques are especially useful in fields like finance, scientific research, and large-scale
business data analysis, where the volume of data is too large for conventional chart types.
They can be used for time series data, multi-dimensional datasets, and datasets with millions of records.
Financial Market Analysis: For analyzing stock trends, price changes, and market patterns over time.
Medical and Genomic Research: Used to visualize gene expression levels, mutations, or other biological
markers in high-dimensional data.
Network Security: Helps in identifying anomalies, suspicious patterns, and irregularities within large network
data logs.
Climate Science: For visualizing climate patterns, temperature changes, or pollution levels over large areas
and extended time periods.
Drawbacks of Pixel-Oriented Visualization Techniques:
They do not help much in understanding the distribution of data in a multidimensional space.
1. Hard to Understand: With so many tiny pixels, it can be overwhelming and confusing to interpret
patterns, especially for people unfamiliar with this style of visualization.
2. Color Confusion: These techniques rely on color to show data differences, but small color changes
can be hard to notice. This makes it easy to miss details or misinterpret the data.
3. Limited Detail: Since each data point is just a pixel, there’s no room for labels or extra details. This
makes it difficult to understand specific data points without extra context.
4. Screen Limitations: Even though these techniques show a lot of data, they’re limited by screen
resolution. If there’s too much data, it can look cluttered, and details might get lost.
5.3 Geometric Projection Visualization Techniques: These techniques simplify multi-dimensional data by
projecting it into a lower-dimensional space (2D or 3D) using mathematical transformations, while preserving
as much of the original structure and relationships as possible. This helps in visualizing complex datasets
with many variables that are otherwise difficult to interpret.
They are particularly useful for visualizing data in fields like data mining, machine learning, and
bioinformatics, where datasets often have a high number of features or dimensions.
Eg: In a scatter plot where x and y are two spatial attributes and a third dimension is represented by
different shapes, we can see that points of types "+" and "X" tend to be colocated.

Fig: visualization of 2D data set using scatter plot

Key Concepts/Principles of Geometric Projection Techniques:

 Dimensionality Reduction: These methods reduce the number of dimensions in the dataset,
typically from a high-dimensional space (e.g., hundreds of variables) to a lower-dimensional space
(e.g., 2D or 3D). By projecting data onto fewer dimensions, these techniques help in visualizing
complex relationships between variables that would otherwise be hidden in higher-dimensional space.
 Geometric Transformations/Preserve Structure: The transformation seeks to preserve important
aspects of the original data structure, such as distances, similarities, or groupings.
 Interactivity: Many projection methods are combined with interactive elements, enabling users to
zoom, rotate, or focus on specific data areas for deeper insights.
 HyperSlice: A method for the visualization of scalar functions of many variables. With this
method, the multi-dimensional function is presented in a simple and easy-to-understand way in
which all dimensions are treated identically. The central concept is the representation of a
multi-dimensional function as a matrix of orthogonal two-dimensional slices. These two-dimensional
slices lend themselves very well to interaction via direct manipulation, due to a one-to-one
relation between screen space and variable space.
Common Techniques:
 Principal Component Analysis (PCA): One of the most widely used geometric projection
techniques, PCA reduces dimensionality by finding the directions (principal components) along
which the variance in the data is maximized. The data is projected onto the first few principal
components, which capture most of the variance in the dataset, allowing it to be
visualized in 2D or 3D.
Advantages: PCA is fast, easy to interpret, and useful for initial exploratory analysis. It is commonly
used in fields like image compression, genetics, and finance.
 Multidimensional Scaling (MDS): MDS visualizes the similarity or dissimilarity between data points
by projecting them into a lower-dimensional space while preserving the pairwise distances
between points as much as possible. It is commonly used in cases where the goal is to visualize the
relationships or proximities between different items or observations.
Advantages: MDS is ideal for datasets with a meaningful distance metric, such as psychological or
preference studies, where perceived distances between items are critical.
 t-Distributed Stochastic Neighbor Embedding (t-SNE): t-SNE is a nonlinear technique that maps data
into two or three dimensions, preserving local structures by keeping similar data points close
together and pushing dissimilar points apart. It is widely used in machine learning and
bioinformatics for visualizing clusters or substructures within high-dimensional data.
Advantages: t-SNE is particularly effective for identifying clusters or groupings within the data,
making it popular for complex datasets like images, gene expressions, and word embeddings.
 Self-Organizing Maps (SOMs): SOMs are a type of neural network that performs a nonlinear
projection, mapping high-dimensional data onto a two-dimensional grid while maintaining
topological relationships and revealing patterns. They are used in data mining and exploratory
data analysis, particularly for clustering, classification, and visualization tasks.
Advantages: SOMs can highlight clusters and relationships within the data, making them useful for
tasks that benefit from an intuitive understanding of data topology.
 Radial Coordinate Visualization (RadViz): It places data points on a circular layout, with each
dimension represented as an anchor point on the circumference. Each data point is plotted within
the circle based on its relative values across dimensions. It is best suited for datasets with a small
to moderate number of dimensions, allowing users to see relative relationships across multiple
attributes.
Advantages: It provides a compact view of multiple dimensions and is intuitive for small datasets,
particularly for feature comparison tasks.
 Parallel Coordinates: Each data dimension is represented by a vertical axis, and each data
point is represented by a line that crosses every axis at the position of its value. This technique is
particularly effective for visualizing high-dimensional data and exploring correlations between
multiple attributes.
Advantages: Parallel coordinates are beneficial for datasets where understanding relationships
between variables is critical, such as in finance and engineering.
Limitations: A major limitation of the parallel coordinates technique is that it cannot effectively
show a data set of many records.
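Eg: The PCA entry in the list above can be illustrated with a small pure-Python sketch that finds the two directions of largest variance and projects the data onto them. Power iteration with deflation is used here only because it needs no external library; it is one of several ways to obtain principal components and is an assumption of this sketch, as are the function names and the five sample rows.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract the per-column mean so every attribute has mean zero."""
    n, dims = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(dims)]
    return [[row[j] - means[j] for j in range(dims)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of mean-centered rows."""
    n, dims = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(dims)] for i in range(dims)]


def dominant_direction(matrix, iterations=500):
    """Power iteration: repeatedly multiply a start vector and normalize it."""
    vec = [1.0 / (j + 1) for j in range(len(matrix))]  # arbitrary non-zero start
    for _ in range(iterations):
        vec = [dot(row, vec) for row in matrix]
        norm = math.sqrt(dot(vec, vec))
        vec = [v / norm for v in vec]
    return vec


def project_to_2d(rows):
    """Return the first two principal directions and the projected 2D points."""
    centered = center(rows)
    cov = covariance(centered)
    pc1 = dominant_direction(cov)
    variance1 = dot(pc1, [dot(row, pc1) for row in cov])
    # Deflation: remove the variance explained by pc1, then search again.
    dims = len(cov)
    deflated = [[cov[i][j] - variance1 * pc1[i] * pc1[j] for j in range(dims)]
                for i in range(dims)]
    pc2 = dominant_direction(deflated)
    points = [(dot(row, pc1), dot(row, pc2)) for row in centered]
    return pc1, pc2, points


# Made-up 3D data in which the second attribute is roughly twice the first.
samples = [(1, 2.0, 0.1), (2, 4.1, 0.0), (3, 6.0, 0.2), (4, 8.1, 0.1), (5, 9.9, 0.0)]
pc1, pc2, points = project_to_2d(samples)
for x, y in points:
    print(f"{x:8.3f} {y:8.3f}")
```

The resulting 2D points are what a scatter plot of the projection would display. The first coordinate carries most of the spread because the sample data lies close to a single line.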

Linear Discriminant Analysis (LDA): It is used for classification and projects the data so that the
separation between predefined categories is maximized.
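Eg: For two categories and two attributes, the direction that LDA projects onto can be computed by hand. The sketch below uses Fisher's two-class formulation, in which the direction is the inverse of the within-class scatter matrix multiplied by the difference of the class means. The function names and the sample points are assumptions for illustration.

```python
import math


def mean_vector(points):
    n = len(points)
    return sum(p[0] for p in points) / n, sum(p[1] for p in points) / n


def scatter(points, mean):
    """Return the entries (sxx, sxy, syy) of a class's 2x2 scatter matrix."""
    sxx = sxy = syy = 0.0
    for x, y in points:
        dx, dy = x - mean[0], y - mean[1]
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
    return sxx, sxy, syy


def fisher_direction(class_a, class_b):
    """Unit vector w proportional to inverse(Sw) * (mean_b - mean_a)."""
    mean_a, mean_b = mean_vector(class_a), mean_vector(class_b)
    axx, axy, ayy = scatter(class_a, mean_a)
    bxx, bxy, byy = scatter(class_b, mean_b)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy   # within-class scatter Sw
    det = sxx * syy - sxy * sxy
    dx, dy = mean_b[0] - mean_a[0], mean_b[1] - mean_a[1]
    # Closed-form inverse of the 2x2 matrix Sw applied to the mean difference.
    wx = (syy * dx - sxy * dy) / det
    wy = (-sxy * dx + sxx * dy) / det
    norm = math.hypot(wx, wy)
    return wx / norm, wy / norm


# Made-up points for two predefined categories.
class_a = [(1, 2), (2, 3), (3, 3), (2, 1)]
class_b = [(6, 5), (7, 8), (8, 6), (7, 7)]

w = fisher_direction(class_a, class_b)
projected_a = [x * w[0] + y * w[1] for x, y in class_a]
projected_b = [x * w[0] + y * w[1] for x, y in class_b]
print("direction:", w)
print("class A on the line:", projected_a)
print("class B on the line:", projected_b)
```

After projection, each point is a single number, so the two categories can be shown on one axis. For this sample data the two groups do not overlap on that axis.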
Representation types of the Geometric Projection Visualization Technique:

Line Plot: This plot appears in almost every kind of analysis between two variables. A line plot connects
the values of a series of data points with straight lines. The plot may seem very simple, but it has many
applications, not only in machine learning but in many other areas. For example, it is used to analyze the
performance of a model with the ROC curve.

Scatter Plots: A scatter plot is a widely used visualization technique in machine learning and data science to
represent relationships between variables. It can display data in 2D or 3D, with points plotted based on
attributes along the x, y, and optionally z axes. In 2D scatter plots, patterns, clusters, and data separability
can be observed. Data points can be colored according to their class labels or other target attributes, making
it easier to identify patterns. While effective for small datasets, scatter plots can become cluttered and hard
to interpret with large datasets or too many attributes.

Bar Plot: This is one of the most widely used plots, seen not just in data analysis but wherever trends are
analyzed in many fields. It presents data in a simple, clear form that conveys the details directly to
others. Although simple and clear, it is not used very frequently in data science applications.
Stacked Bar Graph: Unlike a Multi-set Bar Graph, which displays its bars side-by-side, a Stacked Bar Graph
segments its bars. Stacked Bar Graphs are used to show how a larger category is divided into smaller
categories and how each part relates to the total amount. There are two types of Stacked Bar Graphs:

1. Simple Stacked Bar Graphs place each value for the segment after the previous one. The total value of
the bar is all the segment values added together. They are ideal for comparing the total amounts across
each group/segmented bar.
2. 100% Stacked Bar Graphs show the percentage-of-the-whole of each group and are plotted by the percentage
of each value to the total amount in each group. This makes it easier to see the relative differences between
quantities in each group.
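Eg: The difference between the two types comes down to simple arithmetic, sketched below. The sales figures and the names `stacked_segments` and `percent_of_total` are assumptions for illustration.

```python
# Hypothetical sales per product (segments) for two regions (bars).
sales = {
    "North": {"A": 30, "B": 50, "C": 20},
    "South": {"A": 10, "B": 15, "C": 25},
}


def stacked_segments(values):
    """Simple stacked bar: each segment starts where the previous one ended."""
    segments, running_total = [], 0
    for label, value in values.items():
        segments.append((label, running_total, running_total + value))
        running_total += value
    return segments


def percent_of_total(values):
    """100% stacked bar: each value as a percentage of its own bar's total."""
    total = sum(values.values())
    return {label: 100 * value / total for label, value in values.items()}


for region, values in sales.items():
    print(region, stacked_segments(values), percent_of_total(values))
```

In the simple form the bar heights differ (100 for North, 50 for South), so totals can be compared. In the 100% form both bars have the same height and only the shares can be compared.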
One major flaw of Stacked Bar Graphs is that they become harder to read the more segments each bar has.
Also, comparing each segment to the others is difficult, as they are not aligned on a common baseline.

Box and Whisker Plot: This plot can be used to obtain more statistical details about the data. The straight
lines extending to the maximum and minimum are called whiskers. Points that lie outside the whiskers are
considered outliers. The box plot also shows the 25th, 50th, and 75th percentiles (the first quartile, the
median, and the third quartile). With the help of a box plot, we can also determine the interquartile range
(IQR), the range between the first and third quartiles, which contains the middle 50% of the data. Box plots
come under univariate analysis, which means that we are exploring the data with only one variable.
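Eg: The statistics that a box plot draws can be computed with the Python standard library. The sketch below assumes the common convention that whiskers extend to the most extreme values within 1.5 × IQR of the box and that anything beyond is an outlier. That factor, the "inclusive" quartile method, the function name, and the sample data are assumptions; the notes do not fix a convention.

```python
import statistics


def box_plot_summary(values, whisker_factor=1.5):
    """Return the quartiles, IQR, whisker ends, and outliers of a sample."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whiskers": (inside[0], inside[-1]),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


measurements = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # made-up sample with one extreme value
summary = box_plot_summary(measurements)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Different tools compute quartiles slightly differently, so the box edges may not match exactly across libraries.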
Pie Chart: It shows a static snapshot of how categories represent parts of a whole (the composition of
something). It represents numbers as percentages, and the total sum of all segments needs to equal 100%.
 Extensively used in presentations and offices, Pie Charts help show proportions and percentages
between categories by dividing a circle into proportional segments. Each arc length represents the
proportion of a category, while the full circle represents the total sum of all the data, equal to
100%.
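Eg: Dividing the circle into proportional segments means giving each category a share of the 360 degrees equal to its share of the total. A minimal sketch follows; the budget figures and the name `pie_slices` are assumptions for illustration.

```python
def pie_slices(values):
    """Return (label, percent, start_angle, end_angle) for each category."""
    total = sum(values.values())
    slices, start = [], 0.0
    for label, value in values.items():
        percent = 100 * value / total
        sweep = 360 * value / total      # degrees of the circle for this slice
        slices.append((label, percent, start, start + sweep))
        start += sweep
    return slices


budget = {"Rent": 50, "Food": 30, "Other": 20}  # made-up monthly budget
slices = pie_slices(budget)
for label, percent, start, end in slices:
    print(f"{label:>6}: {percent:5.1f}%  from {start:6.1f} to {end:6.1f} degrees")
```

The last slice ends at 360 degrees, which corresponds to the requirement that all segments sum to 100%.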
Donut Chart: It is essentially a Pie Chart with an area of the centre cut out. Pie Charts are sometimes
criticized for focusing readers on the proportional areas of the slices to one another and to the chart as a
whole. This makes it tricky to see the differences between slices, especially when you try to compare
multiple Pie Charts together.
A Donut Chart somewhat remedies this problem by de-emphasizing the use of the area. Instead, readers
focus more on reading the length of the arcs, rather than comparing the proportions between slices. Also,
Donut Charts are more space-efficient than Pie Charts because the blank space inside a Donut Chart can be
used to display information inside it.

Marimekko Chart: Also known as a Mosaic Plot.
These Charts are used to visualize categorical data over a pair of variables. In a Marimekko Chart, both axes
are variable with a percentage scale that determines both the width and height of each segment. So these
charts work as a kind of two-way 100% Stacked Bar Graph. This makes it possible to detect relationships
between categories and their subcategories via the two axes.
The main flaws of Marimekko Charts are that they can be hard to read, especially when there are many
segments. Also, it is hard to accurately make comparisons between each segment, as they are not all
arranged next to each other along a common baseline. Therefore, Marimekko Charts are better suited for
giving a more general overview of the data.
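Eg: The "two-way 100% Stacked Bar" idea can be sketched by computing, for every segment, a width from the column's share of the grand total and a height from the segment's share of its column. The table of units sold and the name `marimekko_tiles` are assumptions for illustration.

```python
# Hypothetical units sold: sales channel (column) by product category (segment).
table = {
    "Online": {"Phones": 40, "Laptops": 60},
    "Retail": {"Phones": 90, "Laptops": 10},
}


def marimekko_tiles(table):
    """Return one rectangle per segment, with all sizes on a 0..100 percentage scale."""
    grand_total = sum(sum(parts.values()) for parts in table.values())
    tiles, x = [], 0.0
    for column, parts in table.items():
        column_total = sum(parts.values())
        width = 100 * column_total / grand_total      # column share of everything
        y = 0.0
        for segment, value in parts.items():
            height = 100 * value / column_total       # segment share of its column
            tiles.append({"column": column, "segment": segment,
                          "x": x, "y": y, "width": width, "height": height})
            y += height
        x += width
    return tiles


tiles = marimekko_tiles(table)
for tile in tiles:
    print(tile)
```

The area of each rectangle is proportional to the segment's share of the grand total, which is why the chart gives a good overview but makes exact comparisons between rectangles difficult.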

Applications of Geometric Projection Techniques:

 Data Exploration: Identifying patterns or clusters in multi-dimensional data.
 Machine Learning: For visualizing feature spaces, class distributions, and understanding model
results in high-dimensional feature spaces.
 Bioinformatics: To analyze gene expression patterns, protein structures, and other biological data
that is inherently high-dimensional.
 Customer Segmentation: For clustering customer data in marketing analytics, allowing for
segmentation based on purchasing patterns and behaviors.
 Psychology and Social Sciences: For visualizing survey data, personality traits, or preferences, where
similarities and differences between subjects are crucial.
 Finance: Simplifying financial data to reveal relationships between indicators.
Advantages of Geometric Projection Techniques:
Data Simplification: These techniques simplify complex, high-dimensional data, making it possible to
visualize data structures on a 2D or 3D plane.
Pattern Recognition: They enable users to detect patterns, groupings, or trends that would be difficult to
discern in the original high-dimensional form.
Clustering and Outlier Detection: By projecting data, clusters and outliers become visually identifiable,
aiding in segmentation and anomaly detection.
Versatility: These techniques are applicable across various fields, from machine learning to social science
and biology.
Insight Generation: Offers insights into relationships between variables.
Challenges of Geometric Projection Techniques:
 Information Loss: Projecting high-dimensional data to two or three dimensions may lose some data
details or relationships, potentially leading to misinterpretations.
 Interpretation: Reduced dimensions might be harder to understand. Some methods, like t-SNE or
MDS, can be challenging to interpret quantitatively, as the resulting low-dimensional representation
may not have clear axes or scales.
 Computational Cost: Certain projection techniques, especially nonlinear methods like t-SNE, can be
computationally intensive for large datasets.
 Choice of Parameters: Techniques like t-SNE require fine-tuning parameters (e.g., perplexity), which
can significantly influence the visualization outcome.
5.4 Icon-Based Visualization Techniques: These techniques use small icons or glyphs to represent individual
data points in multidimensional datasets. Each icon’s shape, size, color, or orientation can be varied to
represent different attributes of the data, allowing multiple dimensions to be shown within a single visual
element. These techniques are particularly effective for multi-dimensional data, as they provide an intuitive
way to map complex data into simple, recognizable visual forms.
Key Concepts of Icon-Based Visualization Techniques:
 Icons as Data Carriers/Data Mapping to Visual Features: Each data point is visualized as a unique
icon, with various aspects of the icon (such as shape, color, size, or orientation) representing
different attributes of the data. For example, an icon might use its color to represent one variable,
its size to represent another, and its shape or orientation to represent yet another attribute.
 Multi-dimensional Data Representation/Interpretive Icons: Icons are designed to convey as much
information as possible through visual cues. For instance, color intensity might indicate a variable's
value, while icon rotation could represent another attribute.
 Pattern Recognition: Humans are naturally good at recognizing shapes and patterns, making
icon-based techniques useful for spotting trends, similarities, or anomalies in the data. When icons
are arranged together in structured layouts, users can quickly identify patterns, clusters, or
outliers across multiple attributes based on the visual characteristics of the icons.
Design Elements:
 Shape: Different shapes can represent different categories or attributes.
 Color: Colors can encode quantitative or qualitative information (e.g., temperature, intensity, and
category).
 Size: Icon size can be used to represent a numerical attribute, such as population size or frequency.
 Orientation: The direction or angle of an icon can represent additional data dimensions, such as
geographic orientation or a time-based trend.
Common types of Icon-Based Techniques: There are several popular types of icon-based visualizations, each
suited to different data structures and analytical needs:
 Chernoff Faces: These represent each data point as a cartoon human face, where different facial features
(like the size of the eyes or the shape of the mouth) are mapped to data dimensions. For example, the size
of the eyes might represent one attribute, while the width of the mouth represents another. Chernoff faces
can display multidimensional data of up to 18 variables. This technique leverages the human brain’s ability
to recognize and differentiate between faces, making it easy to spot patterns or anomalies in the data.
Limitations: Since faces are somewhat abstract and subjective, subtle differences in data may be
overlooked.

 Star Glyphs (Star Plots): Star glyphs (or star plots) use star-shaped icons where each spoke (or ray)
of the star represents a different variable. The length of the spoke reflects the value of that variable.
These are useful for comparing multiple data points at once, as users can quickly gauge which
variables are strong or weak for each data point based on the shape of the star.

Limitations: When used for datasets with too many dimensions, star glyphs can become cluttered and hard
to interpret.
 Stick Figures: Data points are visualized as stick figures, where body part lengths or angles
correspond to data values. For instance, arm length might represent one attribute, while leg angle
represents another. These are often used in situations where human-like representations make data
variations intuitive to observe.
Limitations: Similar to Chernoff faces, stick figures can be somewhat abstract and may not clearly
represent all data variations.
 Flower Glyphs: These are similar to star glyphs, but use petal-like structures to represent variables.
The size, shape, or number of petals can be used to indicate different attributes. They provide a
visually appealing way to represent complex, multi-dimensional data.
 Shape-Coding Techniques: Icons of different shapes (triangles, squares, circles, etc.) are used to
represent data points, where each shape or attribute of the shape represents a data
dimension. Shape-coding is flexible and effective in distinguishing different groups or categories
within a dataset.
Limitations: Limited by the human ability to differentiate between many shapes at once; too many
dimensions can make the visualization cluttered.
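The star glyph construction described above (one spoke per variable, spoke length proportional to the value) can be sketched as follows. This is a minimal sketch, not a plotting routine: it only computes the spoke endpoints that a drawing library would connect. The patient values, the maxima used for scaling, and the extra BMI variable are made-up assumptions.

```python
import math

def star_glyph_vertices(values, max_values, radius=1.0):
    """Return the (x, y) endpoints of the spokes of one star glyph.

    Spoke i points at angle 2*pi*i/n, and its length is the value of
    variable i scaled by that variable's maximum.
    """
    n = len(values)
    points = []
    for i, (v, vmax) in enumerate(zip(values, max_values)):
        length = radius * (v / vmax if vmax else 0.0)
        angle = 2 * math.pi * i / n
        points.append((length * math.cos(angle), length * math.sin(angle)))
    return points

# One hypothetical patient: blood pressure, heart rate, cholesterol, BMI.
patient = [120, 60, 150, 30]
maxima = [240, 120, 300, 30]
vertices = star_glyph_vertices(patient, maxima)
for x, y in vertices:
    print(f"({x:.2f}, {y:.2f})")
```

Joining the endpoints in order gives the outline of the star; drawing one such outline per data point in a grid is what makes differences in shape easy to compare.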
Applications of Icon-Based Visualization:
Finance: For portfolio analysis, where each stock or asset can be represented by an icon that encodes
various performance metrics.
Medical Data: In patient data visualization, where multiple health metrics (like blood pressure, cholesterol,
etc.) can be mapped to icon features.
Marketing and Customer Segmentation: To display customer characteristics or buying behaviors in a way
that allows for quick comparison and grouping.
Environmental Monitoring: Representing multiple environmental variables (temperature, humidity,
pollution levels, etc.) in ecological studies.
Exploratory Data Analysis: Helps users explore multi-dimensional datasets, making it easier to identify
patterns, trends, or outliers.
Comparative Analysis: By using icons, analysts can quickly compare multiple data points across several
attributes at once.
Biology: For visualizing multi-dimensional genetic data or analyzing molecular structures.
Business and Finance: Helps in market research and customer segmentation by visually comparing various
attributes of clients or products.
Advantages of Icon-Based Visualization:
Multi-Dimensional Representation: Can handle datasets with many attributes, allowing users to see
multiple data dimensions in a single visual form.
Pattern Recognition: Leverages the human brain’s ability to detect visual patterns, making it easier to spot
trends, similarities, or outliers in the data.
Customizable: Icons can be easily tailored to represent specific variables or categories, offering flexibility in
design.
High Information Density: Icons can encode multiple data attributes simultaneously, providing a
comprehensive view of each data point.
Interactive Exploration: Many icon-based visualizations allow for interactive adjustments, such as zooming or
filtering, which helps in focusing on specific data aspects.
Challenges/Drawbacks of Icon-Based Visualization:
 Visual Overload: When dealing with very large datasets or a high number of dimensions,
icon-based techniques can become overwhelming or cluttered.
 Subjectivity in Visual Representation: Some icon-based techniques, like Chernoff faces, rely on
abstract features (like faces or figures) which may not be universally interpretable or precise.
 Interpretation Difficulty: Icons can become confusing if there are too many data dimensions or if
the icons themselves are complex.
 Limited Scalability: As the number of data points increases, the effectiveness of icons as a means of
representation may diminish due to crowding or overlapping icons. Large datasets can quickly make
icon-based visuals cluttered, making it hard to distinguish between individual data points.
Examples of Use:
 Chernoff Faces in Stock Market: Faces show stock indicators like volatility, price changes, and
trading volume.
 Star Glyphs in Medical Data: Used to compare patient health metrics such as blood pressure, heart
rate, and cholesterol.
 Flower Glyphs in Climate Data: Petals represent climate factors like temperature, humidity, and
wind speed, showing variations over time or location.
In summary, icon-based techniques provide an intuitive and flexible way to visualize multi-dimensional
data by encoding several attributes into different aspects of an icon. They are highly effective for pattern
recognition and comparison tasks, but they must be carefully designed to avoid visual overload and
ensure interpretability.
5.5 Hierarchical Data Visualizations: Hierarchical Visualizations (or Trees) are collections of items, where
each item connects to one parent (except the root), and both items and connections can have multiple
attributes.
 Hierarchical data is organized in a tree-like structure, with each data point having defined
relationships that create parent-child connections.
 These visualizations represent data with a hierarchical structure, helping users understand relationships and
organization within complex datasets, especially those with multiple category levels like
organizational charts or taxonomies.
 For a large data set of high dimensionality, it would be difficult to visualize all dimensions at the
same time.
 Hierarchical visualization techniques partition all dimensions into subsets (i.e., subspaces).
 The subspaces are visualized in a hierarchical manner.
 “Worlds-within-Worlds,” also known as n-Vision, is a representative hierarchical visualization
method.
 Suppose we want to visualize a 6-D data set, where the dimensions are F, X1, X2, X3, X4, X5.
 We want to observe how F changes with respect to the other dimensions. We can fix the X3, X4, X5
dimensions to selected values and visualize the changes to F with respect to X1 and X2.
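The fixing step of Worlds-within-Worlds can be sketched as follows: hold three of the dimensions at chosen values and tabulate F over the remaining two. The function f below is a made-up stand-in for real data, and the fixed values (1, 2, 3) and the grid are arbitrary assumptions made only for illustration.

```python
def f(x1, x2, x3, x4, x5):
    """Hypothetical 6-D relationship F = f(X1, ..., X5); an assumption."""
    return x1 * x1 + x2 * x2 + x3 + 2 * x4 - x5

def inner_world(fixed, x1_values, x2_values):
    """Fix (X3, X4, X5) and tabulate F over a grid of X1 and X2.

    The returned rows are the surface that a 3-D plot with axes
    X1, X2, and F would draw for this choice of fixed values.
    """
    x3, x4, x5 = fixed
    return [[f(x1, x2, x3, x4, x5) for x2 in x2_values] for x1 in x1_values]

grid = [0, 1, 2]
surface = inner_world(fixed=(1, 2, 3), x1_values=grid, x2_values=grid)
for row in surface:
    print(row)
```

Calling inner_world again with a different fixed triple gives a different slice of the same data set, which is how the remaining three dimensions are explored.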
Key Concepts:
Hierarchy: It is a way of organizing data into levels, where higher levels represent broader categories and
lower levels represent more specific subcategories. For example, a company structure can have a CEO at the
top, followed by department heads, team leaders, and individual employees.
Visualization Goals: The primary goals of hierarchical visualization are to show relationships between data
points, help users navigate through different levels of detail, and make complex structures comprehensible
at a glance.
Data Structure: Hierarchical data is often represented in tree-like structures, where each node represents a
data point, and branches connect parent nodes to their child nodes. This structure helps visualize the
hierarchy and relationships among different data levels.
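The tree structure described above can be sketched in a few lines. The Node class and the outline helper are hypothetical names introduced for this example; the company levels reuse the CEO example from the text.

```python
class Node:
    """One item in a hierarchy; every node except the root has one parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def depth(self):
        """Number of edges between this node and the root."""
        return 0 if self.parent is None else 1 + self.parent.depth()

def outline(node, indent=0):
    """Return the subtree as indented lines, one line per node."""
    lines = ["  " * indent + node.name]
    for child in node.children:
        lines.extend(outline(child, indent + 1))
    return lines

ceo = Node("CEO")
head = Node("Department Head", ceo)
lead = Node("Team Leader", head)
Node("Employee", lead)
print("\n".join(outline(ceo)))
```

An indented outline is itself the simplest hierarchical visualization; the techniques that follow replace the indentation with branches, nested rectangles, or rings.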
Common Hierarchical Visualization Techniques:
 Tree Diagrams: These visually represent hierarchical data using branches. Each node (or leaf)
represents a data point, and lines connect parent nodes to their children. They are useful for
showing relationships and structures, like family trees or organizational charts.
 Tree maps: Tree maps display hierarchical data as nested rectangles, with each rectangle's area
representing the size of the category it represents. This technique is useful for visualizing
part-to-whole relationships and is effective in showing data with many categories.

 Dendrograms: This is a type of tree diagram that’s often used in fields like biology, machine learning
(cluster analysis), or linguistics. It represents the hierarchical clustering of data, where nodes
represent clusters, and branches show relationships between them. This visualization is often used
in hierarchical clustering, where data points are grouped into nested clusters, with similar items
placed closer together.
 Icicle Plot: An icicle plot is similar to a sunburst chart but uses a vertical or horizontal bar layout
rather than a radial one. Each level of the hierarchy is represented by a stack of bars, with each bar
representing a node in the hierarchy. Icicle plots are particularly effective at visualizing deep
hierarchies and are good for showing the "depth" of a hierarchy.
 Organizational Chart: An organizational chart is a type of hierarchy visualization commonly used in
business to depict the relationships between employees or departments within an organization. The
chart typically has a tree structure with the CEO or top-level manager at the top and branching down
to lower-level employees or departments.
 Sunburst Charts: A sunburst chart is a circular visualization that represents hierarchical data in a
radial layout. It starts with a central node (the root) and has concentric rings radiating out, each ring
representing a level in the hierarchy. This type of visualization is particularly useful when you want
to display multiple levels of categories and subcategories, and allows for easy exploration of
hierarchical relationships.
 Circular Tree Diagram: This diagram is another variant of the hierarchical structure but laid out in a
circular format rather than a traditional tree. It may have a central node surrounded by rings or
layers, each representing different levels of hierarchy. It is used to emphasize the
interconnectedness or cyclical nature of the hierarchy.
 Circle Packing: It is a variation of a Tree map that uses circles instead of rectangles. Containment
within each circle represents a level in the hierarchy: each branch of the tree is represented as a
circle and its sub-branches are represented as circles inside of it. The area of each circle can also be
used to represent an additional arbitrary value, such as quantity or file size. Color may also be used
to assign categories or to represent another variable via different shades. As beautiful as Circle
Packing appears, it's not as space-efficient as a Tree map, as there's a lot of empty
space within the circles. Despite this, Circle Packing actually reveals hierarchical structure better than a Tree
map.
 Radial Tree Diagram: This variation of the tree diagram uses a radial (circular) layout, often in a
spider-web pattern, to display the hierarchical relationships. Each node is represented by a
circle, with connecting lines or arcs to display parent-child relationships. This can be particularly
helpful when you have multiple branches of a hierarchy.
 Nested Pie Charts: Nested pie charts, also called multi-level donut charts, show hierarchical data in concentric rings.
Each layer of the pie represents a different level of the hierarchy. This technique allows for
comparing the sizes of categories and subcategories in a compact visual format.
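To make the nested-rectangle idea behind tree maps concrete, here is a minimal sketch of the slice-and-dice layout, one simple way to compute tree map rectangles (other layouts exist and usually give better aspect ratios). The product hierarchy, its names, and its values are made up for the example.

```python
def total(node):
    """Size of a node: its own value for a leaf, else the sum of its children."""
    if "children" in node:
        return sum(total(c) for c in node["children"])
    return node["value"]

def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Give each leaf a rectangle (x, y, w, h) with area proportional to its value.

    The split direction alternates per level: even depths divide the
    width, odd depths divide the height.
    """
    if out is None:
        out = {}
    if "children" not in node:
        out[node["name"]] = (x, y, w, h)
        return out
    whole = total(node)
    offset = 0.0
    for child in node["children"]:
        share = total(child) / whole
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out

# Hypothetical product hierarchy; names and values are made up.
catalog = {"name": "Store", "children": [
    {"name": "Electronics", "children": [
        {"name": "Phones", "value": 30},
        {"name": "Laptops", "value": 30},
    ]},
    {"name": "Books", "value": 40},
]}
rects = slice_and_dice(catalog, 0, 0, 100, 100)
for name, rect in rects.items():
    print(name, rect)
```

The same share-of-parent calculation drives sunburst and icicle layouts; only the geometry changes, from rectangle widths and heights to ring angles or bar lengths.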
Applications/ Use Cases for Hierarchical Data Visualizations:
 Organizational Structures: Visualizing company hierarchies, department structures, and employee
roles.
 Biological Classification: Representing relationships among species or biological classifications.
 File Systems: Showing the structure of folders and files in a computer’s directory.
 Product Categories: Organizing product lines into categories and subcategories in e-commerce
platforms.
 Knowledge Organization: Structuring information for databases, libraries, or ontologies.
 Family Trees and Genealogy: Representing generational lineage or ancestry.
 Project Management: Visualizing task dependencies in a project, especially for Gantt charts or Work
Breakdown Structures (WBS).
 Website Structures: Mapping the hierarchy of pages and content on a website.

Advantages of Hierarchical Data Visualizations:

Clarity/ Simplified Complex Data: They provide a way to break down complex datasets into more digestible
chunks, allowing users to understand high-level relationships and drill down to more detailed layers.
Hierarchical visualizations make complex structures more understandable and navigable.
Spatial Organization: They effectively represent relationships among categories, allowing for quick
comprehension of the data's organization.
Detail Exploration: Users can drill down into subcategories for more detailed insights while maintaining the
context of the overall structure. Many hierarchical visualizations (like sunbursts or tree maps) support this
interactively, making it possible to explore large datasets dynamically.
Intuitive Structure: The tree-like or layered structure mimics natural cognitive patterns for organizing
information, making it easier for users to follow the hierarchy and relationships between entities.
Highlight Proportions: Some visualizations, like tree maps, also allow you to visually represent proportional
data (e.g., sales or quantities) within the hierarchy, helping to identify trends and outliers at different levels.
Efficient Navigation: When dealing with very large datasets or deep hierarchies, hierarchical data
visualizations can help users navigate efficiently, finding relevant information quickly.
Challenges/Drawbacks of Hierarchical Data Visualizations:
 Overcrowding: Large hierarchies can become cluttered and difficult to read if too many levels or
items are included.
 Limited Detail: Hierarchical visualizations can sometimes oversimplify data, making it hard to
capture nuances or outliers.
 Scalability: As the number of categories grows, maintaining a clear and effective visualization
becomes challenging.
5.6 Visualizing Complex Data and Relationships: This involves using various techniques and tools to represent
intricate datasets, enabling users to understand patterns, trends, and connections within the data. As data
becomes more complex—often comprising multiple dimensions, categories, or interrelationships—effective
visualization becomes crucial for analysis and decision-making. Here’s an overview of key concepts,
techniques, and applications in this area.

Key Concepts:
 Complex Data: Complex data refers to datasets that contain numerous variables, relationships, or
dimensions. This includes multi-dimensional data, time series, and interconnected datasets, such as
those found in social networks or scientific research. It often includes hierarchical structures,
categorical variables, and temporal aspects, making it challenging to analyze and interpret without
appropriate visualization.
 Relationships: Relationships in data can be direct or indirect, linear or non-linear, and can involve
multiple dimensions. Understanding these relationships is key to deriving insights and making
informed decisions.
 Dimensionality: Complex data often operates in high-dimensional spaces, where each dimension
represents a different variable. Visualizing these dimensions effectively is essential to reveal
underlying patterns and correlations.
 For a large data set of high dimensionality, it would be difficult to visualize all dimensions at the same
time.
 Hierarchical visualization techniques partition all dimensions into subsets (i.e., subspaces).
 The subspaces are visualized in a hierarchical manner.
 “Worlds-within-Worlds,” also known as n-Vision, is a representative hierarchical visualization
method.
 Suppose we want to visualize a 6-D data set, where the dimensions are F, X1, X2, X3, X4, X5.
 We want to observe how F changes with respect to the other dimensions. We can fix the X3, X4, X5
dimensions to selected values and visualize the changes to F with respect to X1 and X2.
 Most visualization techniques were developed mainly for numeric data.

 Recently, more and more non-numeric data, such as text and social networks, have become
available.
 Many people on the Web tag various objects such as pictures, blog entries, and product reviews.
 A tag cloud is a visualization of statistics of user-generated tags.
 Often, in a tag cloud, tags are listed alphabetically or in a user-preferred order.
 The importance of a tag is indicated by font size or color.
Word Cloud: Also known as a Tag Cloud. A visualization method that displays how frequently words appear
in a given body of text, by making the size of each word proportional to its frequency. All the words are then
arranged in a cluster or cloud of words. Alternatively, the words can also be arranged in any format:
horizontal lines, columns or within a shape.
 Word Clouds can also be used to display words that have meta-data assigned to them. For example,
in a Word Cloud of all the world's country names, the population could be assigned to each
name to determine its size.
 Color used on Word Clouds is usually meaningless and is primarily aesthetic, but it can be used to
categorize words or to display another data variable.
 Typically, Word Clouds are used on websites or blogs to depict keyword or tag usage. Word Clouds
can also be used to compare two different bodies of text.
 Although simple and easy to understand, Word Clouds have some major flaws:

 Long words are emphasized over short words.

 Words whose letters contain many ascenders and descenders may receive more attention.
They're not great for analytical accuracy, so they are used more for aesthetic reasons instead.
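The sizing rule behind a word cloud (font size proportional to word frequency) can be sketched as follows. This is only the counting and scaling step, not the layout; the function name word_sizes, the 48 pt maximum, and the sample text are assumptions made for the example.

```python
import re
from collections import Counter

def word_sizes(text, max_pt=48, top=5):
    """Return (word, font size) pairs for the `top` most frequent words.

    Font size is proportional to frequency, so the most frequent
    word is drawn at max_pt points.
    """
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    common = counts.most_common(top)
    if not common:
        return []
    peak = common[0][1]
    return [(word, max_pt * count / peak) for word, count in common]

sample = "data charts data maps data charts trends"
sizes = word_sizes(sample)
for word, size in sizes:
    print(f"{word}: {size:.1f} pt")
```

Because size here depends only on frequency, a long word still covers more area than a short word with the same count, which is the first flaw listed above.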
Applications:
 Business Intelligence: Organizations use complex data visualizations to analyze market trends,
customer behaviors, and sales performance, aiding strategic decision-making.
 Scientific Research: Researchers utilize visualization techniques to explore relationships in large
datasets, such as genomic data, climate models, or ecological studies.
 Social Network Analysis: Visualization of social networks helps in understanding connections,
influence patterns, and community structures among individuals or entities.
 Healthcare: In healthcare analytics, complex data visualizations assist in tracking patient outcomes,
identifying trends in treatment efficacy, and optimizing resource allocation.
 Finance: Financial analysts use visualizations to understand market trends, risk factors, and portfolio
performances, aiding investment decisions.
Challenges/Drawbacks:
 Information Overload: With complex data, visualizations can become cluttered and difficult to
interpret, making it challenging for users to extract meaningful insights.
 Scalability: As the volume of data increases, maintaining clarity and usability in visualizations can be
difficult.
 Choosing the Right Technique: Selecting the appropriate visualization technique for the specific
type of complex data is crucial but can be challenging, especially when multiple dimensions or
relationships need to be represented.
Visualizing complex data and relationships is crucial for gaining insights and making informed decisions.
Techniques like network graphs, t-SNE, heat maps, and parallel coordinates effectively represent multi-
dimensional data. However, challenges like information overload and scalability must be managed to
maintain clarity and usability. By using these techniques, analysts and decision-makers can explore complex
datasets and uncover valuable patterns and connections.
