To understand the basic components of R programming and to explore
various types of data visualizations used in analytics.
# 1. Basic R Components
# ---------------------
# Variable declaration
a <- 10                # Numeric
b <- "R-Lab"           # Character
flag <- TRUE           # Logical (avoid reusing the name c, which is also R's combine function)
d <- c(1, 2, 3, 4, 5)  # Vector
# Display variable types
print(class(a))     # "numeric"
print(class(b))     # "character"
print(class(flag))  # "logical"
print(class(d))     # "numeric" (a numeric vector)
# 2. Data Types
# -------------
# Numeric
num_var <- 12.5
print(num_var)
# Integer
int_var <- as.integer(10)
print(int_var)
# Logical
log_var <- FALSE
print(log_var)
# Character
char_var <- "Visualization"
print(char_var)
# Factor (categorical variable)
fact_var <- factor(c("Male", "Female", "Female", "Male"))
print(fact_var)
# Matrix
mat <- matrix(1:9, nrow=3)
print(mat)
# Data Frame
df <- data.frame(Name=c("A", "B"), Marks=c(90, 85))
print(df)
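A quick way to check the types above is to compare class() with typeof(): class() reports the high-level type, while typeof() shows how R stores the value internally. This is a small sketch; the names x_num, x_int and f are illustrative only.

```r
x_num <- 10     # a plain number literal is stored as a double
x_int <- 10L    # the L suffix creates an integer directly (same as as.integer(10))

print(class(x_num))   # "numeric"
print(typeof(x_num))  # "double"
print(class(x_int))   # "integer"
print(typeof(x_int))  # "integer"

# A factor is stored as integer codes plus a set of labels (levels)
f <- factor(c("Male", "Female", "Female", "Male"))
print(class(f))   # "factor"
print(typeof(f))  # "integer"

# str() prints a compact summary of an object's structure
str(data.frame(Name = c("A", "B"), Marks = c(90, 85)))
```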
# 3. Overview of Visualization
# ----------------------------
# Visuals help us to see data trends, outliers, patterns
# We'll demonstrate:
# - Line Plot
# - Bar Plot
# - Histogram
# - Pie Chart
# - Boxplot
# - Scatter Plot
# 4. Basic Plotting Graphs
# ------------------------
# Line Plot
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
plot(x, y, type="o", col="blue", xlab="X Axis", ylab="Y Axis",
     main="Line Plot")
# Bar Plot
subjects <- c("Math", "Science", "English")
marks <- c(88, 75, 90)
barplot(marks, names.arg=subjects, col="skyblue",
        main="Bar Plot: Subject Marks")
# Histogram
data <- c(10, 20, 20, 30, 40, 40, 40, 50, 60, 70)
hist(data, col="green", main="Histogram of Data", xlab="Values")
# Pie Chart
slices <- c(10, 20, 30, 40)
labels <- c("A", "B", "C", "D")
pie(slices, labels=labels, main="Pie Chart", col=rainbow(length(slices)))
# Boxplot
score <- c(65, 70, 75, 80, 90, 100, 50, 60)
boxplot(score, main="Boxplot of Scores", col="orange")
# Scatter Plot
x <- c(5, 10, 15, 20)
y <- c(2, 4, 8, 16)
plot(x, y, main="Scatter Plot", xlab="X", ylab="Y", col="red", pch=19)
# 5. Types of Graphs in Analytics
# -------------------------------
# - Univariate plots (histogram, barplot, pie)
# - Bivariate plots (scatterplot, boxplot)
# - Multivariate plots (pairs, grouped barplot)
# - Time-series plots (line charts)
# - Distribution plots (density, histogram)
# Example: Multiple lines in a plot (time series)
time <- 1:5
sales_2023 <- c(100, 120, 130, 115, 140)
sales_2024 <- c(90, 110, 125, 135, 150)
plot(time, sales_2023, type="o", col="blue", ylim=c(80, 160),
    xlab="Quarter", ylab="Sales", main="Sales Comparison")
lines(time, sales_2024, type="o", col="red")
legend("topleft", legend=c("2023", "2024"), col=c("blue", "red"),
       lty=1, pch=1)
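The list above also mentions density plots. A minimal sketch using base R's density() on the built-in mtcars$mpg column; it estimates a smooth curve for the same distribution that a histogram shows as bars. The colours and titles are arbitrary choices.

```r
d <- density(mtcars$mpg)      # kernel density estimate (Gaussian kernel by default)
plot(d, main = "Density of MPG", xlab = "MPG")
polygon(d, col = "lightblue", border = "darkblue")  # shade the area under the curve
rug(mtcars$mpg)               # small ticks marking each observed value
```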
Try using built-in datasets like:
data(mtcars)
head(mtcars)
Then you can apply:
plot(mtcars$wt, mtcars$mpg, col="blue", main="Weight vs MPG")
What is R?
      R is a powerful programming language and software environment
       used for statistical computing and graphics.
      It is open-source and widely used in data science, machine
       learning, and academic research.
🔹 Why Use R for Data Visualization?
      Built-in support for graphics and plotting.
      Packages such as ggplot2, lattice, plotly, and shiny extend these
       capabilities.
      Supports interactive dashboards and visual storytelling.
Variables and Assignment
x <- 10     # assigns 10 to x
name <- "R Language"
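R accepts several forms of assignment. The sketch below shows them side by side; `<-` remains the conventional choice, and the names y, z and w are only for illustration.

```r
x <- 10            # leftward assignment (the usual R style)
20 -> y            # rightward assignment also works
z = 30             # = works at top level too, but <- is preferred in R code
assign("w", 40)    # assign() takes the variable name as a string
print(c(x, y, z, w))
```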
Data Types in R
Type        Example                Description
Numeric     x <- 5.5               Decimal values
Integer     x <- as.integer(10)    Whole numbers
Character   name <- "Data"         Text data
Logical     flag <- TRUE           TRUE/FALSE values
Factor      factor(c("M", "F"))    Categorical values
Vector      c(1, 2, 3)             Sequence of elements
Matrix      matrix(1:6, 2, 3)      2D array of data
Data Frame  data.frame()           Table-like structure (rows/cols)
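The factor, matrix and data frame rows of the table can be explored with a few more base R functions. A minimal sketch with made-up values: levels() lists the categories, as.integer() exposes the codes stored behind them, table() counts each level, and matrix() fills its cells column by column.

```r
gender <- factor(c("M", "F", "F", "M", "F"))
print(levels(gender))      # "F" "M": levels are sorted alphabetically by default
print(as.integer(gender))  # 2 1 1 2 1: the integer code behind each label
print(table(gender))       # counts per level: F = 3, M = 2

m <- matrix(1:6, 2, 3)     # 2 rows, 3 columns, filled column by column
print(dim(m))              # 2 3
print(m[2, 3])             # row 2, column 3 holds 6

df <- data.frame(Name = c("A", "B"), Marks = c(90, 85))
print(df$Marks)            # a data frame column is an ordinary vector
print(nrow(df))            # 2
```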
Overview of Data Visualization
🔹 What is Data Visualization?
      The graphical representation of data and information using visual
       elements like charts, graphs, and maps.
🔹 Importance:
      Simplifies complex data
      Highlights patterns and trends
      Aids in effective communication and decision making
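A classic illustration of plots revealing patterns that summary numbers hide is Anscombe's quartet, which ships with base R as the anscombe data frame (columns x1 to x4 and y1 to y4). The sketch below prints nearly identical statistics for the four x/y pairs and then plots them; the loop is just one way to write it.

```r
data(anscombe)

# Nearly the same mean of y and correlation for all four pairs...
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  cat(sprintf("Set %d: mean(y) = %.2f, cor(x, y) = %.3f\n", i, mean(y), cor(x, y)))
}

# ...but four very different shapes once plotted
old <- par(mfrow = c(2, 2))   # 2 x 2 grid of plots
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       main = paste("Set", i), xlab = "x", ylab = "y", pch = 19)
}
par(old)                      # restore the previous layout
```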
Basic Graphs in R (Using Base Functions)
Graph Type    Purpose                                      Function
Line Plot     Show trends over time                        plot()
Bar Chart     Compare categories                           barplot()
Pie Chart     Show proportions                             pie()
Histogram     View frequency distribution                  hist()
Box Plot      View data distribution and outliers          boxplot()
Scatter Plot  Relationship between two numeric variables   plot()
Example: Line Plot
x <- 1:5
y <- c(5, 10, 15, 20, 25)
plot(x, y, type="o", col="blue", main="Line Plot", xlab="X", ylab="Y")
Graph Types in Analytics
✅ Univariate Graphs
        Analyze a single variable
           o   Histogram, Pie Chart, Box Plot
✅ Bivariate Graphs
        Compare two variables
           o   Scatter Plot, Bar Graph, Line Graph
✅ Multivariate Graphs
        Analyze three or more variables
           o   Grouped Bar Plot, Bubble Chart, Faceted Plots
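A grouped bar plot, one of the multivariate graphs listed above, can be drawn in base R by passing a two-way table to barplot() with beside = TRUE. A sketch using the built-in mtcars data; the colours and titles are arbitrary choices.

```r
counts <- table(mtcars$cyl, mtcars$gear)   # rows: cylinders, columns: gears
print(counts)

barplot(counts, beside = TRUE,             # beside = TRUE groups bars instead of stacking them
        col = c("skyblue", "orange", "lightgreen"),
        legend.text = rownames(counts),
        args.legend = list(x = "topright", title = "Cylinders"),
        xlab = "Number of Gears", ylab = "Number of Cars",
        main = "Cars by Gears and Cylinders")
```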
Key Functions in Base R
Function    Use
plot()      General plotting
hist()      Histogram
barplot()   Bar chart
boxplot()   Boxplot
pie()       Pie chart
lines()     Add lines to an existing plot
legend()    Add a legend to a plot
# Variable declaration
num <- 10             # Numeric
name <- "R Programming"         # Character
flag <- TRUE          # Logical
# Display values
print(num)
print(name)
print(flag)
Data Types in R
# Numeric
a <- 23.5
print(class(a)) # "numeric"
# Integer
b <- as.integer(23)
print(class(b)) # "integer"
# Character
txt <- "Hello R"
print(class(txt)) # "character"
# Logical
d <- FALSE
print(class(d)) # "logical"
# Vector
v <- c(1, 2, 3, 4, 5)
print(v)
print(class(v)) # "numeric"
# Factor
gender <- factor(c("Male", "Female", "Female", "Male"))
print(gender)
print(class(gender)) # "factor"
# Matrix
mat <- matrix(1:9, nrow=3)
print(mat)
# Data Frame
df <- data.frame(Name=c("John", "Alice"), Age=c(25, 30))
print(df)
Visualization means transforming raw data into visual insights.
# This example shows how to use built-in plotting to understand trends
data <- c(12, 15, 20, 18, 25)
barplot(data, main="Sample Bar Chart", col="steelblue")
Basics of Plotting Graphs in R
# Line Plot
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
plot(x, y, type="o", col="blue", xlab="X Axis", ylab="Y Axis",
     main="Line Plot")
# Bar Plot
subjects <- c("Math", "Science", "English")
scores <- c(80, 90, 70)
barplot(scores, names.arg=subjects, col="green",
        main="Bar Chart of Subjects")
# Histogram
data <- c(10, 20, 20, 30, 40, 40, 40, 50, 60, 70)
hist(data, col="purple", main="Histogram of Values", xlab="Value")
# Pie Chart
slices <- c(25, 35, 40)
labels <- c("A", "B", "C")
pie(slices, labels=labels, col=rainbow(length(slices)), main="Pie Chart")
# Boxplot
marks <- c(55, 60, 65, 70, 90, 85, 45, 77)
boxplot(marks, main="Boxplot of Marks", col="orange")
# Scatter Plot
wt <- c(2, 2.5, 3, 3.5, 4)
mpg <- c(35, 30, 28, 22, 20)
plot(wt, mpg, main="Scatter Plot: Weight vs MPG", xlab="Weight",
ylab="MPG", col="red", pch=19)
Different Types of Graphs in Analytics (Summary with Examples)
# Univariate: Histogram (distribution of a single variable)
hist(mtcars$mpg, main="Univariate - MPG Distribution", col="skyblue")
# Bivariate: Boxplot (continuous vs categorical)
boxplot(mpg ~ cyl, data=mtcars, main="Bivariate - MPG vs Cylinders",
col="yellow")
# Bivariate: Scatter plot (2 continuous variables)
plot(mtcars$hp, mtcars$mpg, main="HP vs MPG", xlab="Horsepower",
ylab="MPG", col="blue", pch=16)
# Multivariate: Colored scatter plot with shape by gear
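library(ggplot2)  # install.packages("ggplot2") first if it is not installed
# The + must end a line: a line that starts with + is parsed as a new expression
ggplot(mtcars, aes(x=wt, y=mpg, color=factor(cyl), shape=factor(gear))) +
  geom_point(size=3) +
  ggtitle("Multivariate: Weight vs MPG by Cylinders and Gears")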
Description of the Command
🔹 Function: hist()
      This function creates a histogram: a graphical representation that
       groups data points into ranges (bins). By default R chooses the
       bins with Sturges' rule; the breaks argument lets you set them.
      It helps you visualize the frequency distribution of a continuous
       numeric variable.
Component-by-Component Explanation
Component                              Meaning
mtcars$mpg                             Refers to the mpg (Miles Per Gallon) column in the built-in mtcars dataset.
hist(mtcars$mpg)                       Plots a histogram of the mpg values.
main="Univariate - MPG Distribution"   Adds a title to the plot.
col="skyblue"                          Fills the bars of the histogram with the color "skyblue".
hist(mtcars$mpg, main="Univariate - MPG Distribution", col="skyblue")
What This Graph Shows
      X-axis: Represents MPG intervals (e.g., 10–15, 15–20, etc.).
      Y-axis: Represents the frequency — how many cars fall within each
       MPG range.
      You can visually see:
          o   Whether the data is skewed or symmetric
          o   Where most values are concentrated (central tendency)
          o   Any possible outliers
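To see which bins hist() chose, it can return its computations without drawing (plot = FALSE). The second call sketches how to set the bins yourself through the breaks argument; the 2.5-unit width is an arbitrary illustration.

```r
h <- hist(mtcars$mpg, plot = FALSE)   # compute the bins without drawing
print(h$breaks)                       # bin edges chosen by the default rule ("Sturges")
print(h$counts)                       # number of cars in each bin

# Setting the bins yourself: one break every 2.5 mpg
hist(mtcars$mpg, breaks = seq(10, 35, by = 2.5),
     main = "MPG with 2.5-unit bins", col = "skyblue", xlab = "MPG")
```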
wt <- c(2, 2.5, 3, 3.5, 4)
mpg <- c(35, 30, 28, 22, 20)
plot(wt, mpg, main="Scatter Plot: Weight vs MPG", xlab="Weight",
ylab="MPG", col="red", pch=19)
This code creates a scatter plot showing the relationship between the
weight of vehicles (wt) and their corresponding miles per gallon
(mpg).
Line-by-Line Explanation
🔹 wt <- c(2, 2.5, 3, 3.5, 4)
      Creates a numeric vector wt with 5 values.
      These could represent weights of cars in tons (or another unit).
🔹 mpg <- c(35, 30, 28, 22, 20)
      Creates another numeric vector mpg with corresponding Miles Per
       Gallon values.
      Each value in mpg is related to the respective value in wt.
Weight (wt)   MPG
2.0           35
2.5           30
3.0           28
3.5           22
4.0           20
plot(wt, mpg, ...)
This command plots the values:
Parameter   Description
wt, mpg     x = wt, y = mpg → plots weight on the X-axis and MPG on the Y-axis.
main        "Scatter Plot: Weight vs MPG" → adds a title.
xlab        "Weight" → label for the X-axis.
ylab        "MPG" → label for the Y-axis.
col="red"   Plots the points in red.
pch=19      Uses a solid circle for each point.
What the Scatter Plot Shows
      As weight increases, MPG decreases.
      This represents a negative correlation between vehicle weight
       and fuel efficiency.
      Heavier cars tend to be less fuel-efficient.
Interpretation
This kind of visualization is important in automobile analytics, where we
assess how one feature (e.g., weight) affects another (e.g., fuel
efficiency).
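The visual impression of a negative relationship can be backed by a number. A minimal sketch using cor(), which computes the Pearson correlation coefficient (always between -1 and 1) for the same five weight/MPG pairs:

```r
wt  <- c(2, 2.5, 3, 3.5, 4)
mpg <- c(35, 30, 28, 22, 20)
r <- cor(wt, mpg)    # Pearson correlation coefficient
print(round(r, 3))   # -0.988: close to -1, a strong negative linear relationship
```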
plot(wt, mpg, main="Scatter Plot: Weight vs MPG", xlab="Weight",
ylab="MPG", col="red", pch=19)
abline(lm(mpg ~ wt), col="blue", lwd=2)
What This Script Does
This code creates a scatter plot of Weight vs MPG and adds a
regression (trend) line that represents the best linear fit for the data.
Parameter    Description
wt           X-axis values: car weights
mpg          Y-axis values: miles per gallon
main         Title of the graph
xlab, ylab   Axis labels
col="red"    Points are colored red
pch=19       Points are plotted as solid circles
abline(lm(mpg ~ wt), col="blue", lwd=2)
Adds a regression (trend) line to the plot.
Component      Description
lm(mpg ~ wt)   Fits a linear regression model where mpg is predicted using wt.
abline(...)    Adds the regression line to the scatter plot.
col="blue"     The line is colored blue.
lwd=2          Line width is set to 2 (thicker line for better visibility).
Purpose: Helps visualize the trend, in this case how MPG decreases
as weight increases.
Output Interpretation
If the trend line is downward-sloping:
      Conclusion: Heavier vehicles typically have lower fuel efficiency.
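The line drawn by abline() comes from the coefficients of the fitted model, which can also be printed and used for prediction. A short sketch on the same five points; the weight of 3.2 used for prediction is an arbitrary example value.

```r
wt  <- c(2, 2.5, 3, 3.5, 4)
mpg <- c(35, 30, 28, 22, 20)
fit <- lm(mpg ~ wt)
print(coef(fit))   # (Intercept) 49.8, wt -7.6: each extra unit of weight lowers mpg by 7.6
print(predict(fit, newdata = data.frame(wt = 3.2)))   # 49.8 - 7.6 * 3.2 = 25.48
```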
plot(mtcars$mpg, mtcars$wt, col='steelblue',
   main='Scatterplot', xlab='mpg', ylab='wt', pch=19)
This line of code creates a scatter plot to visualize the relationship
between:
      mpg (Miles Per Gallon) — fuel efficiency
      wt (Weight of the car in 1000 lbs)
using the built-in mtcars dataset in R.
Component            Description
plot()               The R function used to create scatter plots and other graphs
mtcars$mpg           Values for the X-axis (Miles Per Gallon)
mtcars$wt            Values for the Y-axis (Weight in 1000 lbs)
col='steelblue'      Sets the color of the plotted points to "steelblue"
main='Scatterplot'   Sets the main title of the graph
xlab='mpg'           Labels the X-axis as "mpg"
ylab='wt'            Labels the Y-axis as "wt"
pch=19               Plots the points as solid filled circles
 The scatter plot consists of points where:
     X-axis = mpg (fuel efficiency)
     Y-axis = wt (vehicle weight)
 It helps visualize the relationship between a car's weight and its
fuel efficiency.
Scatter plots are ideal to:
     Detect correlation between variables.
     Spot outliers.
     Visualize linear/nonlinear patterns.
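To look for correlations among more than two variables at once, base R's pairs() draws a scatter plot for every pair of columns. A sketch on three mtcars columns (the choice of mpg, wt and hp is illustrative), with cor() printing the matching correlation matrix:

```r
vars <- mtcars[, c("mpg", "wt", "hp")]
pairs(vars, main = "Scatterplot matrix: mpg, wt, hp",
      pch = 19, col = "steelblue")   # one panel per pair of columns
print(round(cor(vars), 2))           # correlation matrix for the same pairs
```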
abline(lm(mtcars$wt ~ mtcars$mpg), col='red', lwd=2)
This draws a line showing the overall trend between mpg and wt.
Specifically, it shows the linear relationship between mpg (miles per
gallon) and wt (weight of the vehicle) from the mtcars dataset.
Component                    Description
lm(mtcars$wt ~ mtcars$mpg)   Fits a linear regression model where wt is predicted by mpg.
abline(...)                  Draws a straight line using the coefficients of the regression model.
col='red'                    Colors the line red.
lwd=2                        Sets the line width to 2 (makes it thicker for visibility).
What the Regression Line Represents
      The regression line is the best-fit straight line through the data
       points.
      It shows the trend or direction of the relationship between mpg
       and wt.
Interpretation:
      As mpg increases, wt decreases.
      This indicates a negative linear relationship: cars that are more
       fuel efficient tend to weigh less.
What is the Use of a Boxplot in R (and Data Analysis)?
A boxplot (also known as a box-and-whisker plot) is a graphical
summary of the distribution of a dataset. It visually displays a five-
number summary:
   1. Minimum
   2. First Quartile (Q1)
   3. Median (Q2)
   4. Third Quartile (Q3)
   5. Maximum
🎯 Main Uses of a Boxplot
1. Visualizing Distribution
      Shows how the data is spread out: whether it is symmetric,
       skewed left/right, or has outliers.
2. Identifying the Median
      The line inside the box represents the median, which is the
       central value of the dataset.
3. Detecting Skewness
      If the median sits closer to the bottom or top of the box, the data
       is skewed.
          o   Median near the bottom = right-skewed (longer upper tail)
          o   Median near the top = left-skewed (longer lower tail)
4. Spotting Outliers
      The whiskers reach the most extreme values within 1.5 × IQR of the
       box; points beyond that are treated as potential outliers and are
       shown as individual dots.
5. Comparing Groups
      When plotting multiple boxplots side-by-side, you can compare
       distributions across categories (e.g., scores of boys vs girls,
       sales in different regions, etc.).
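The outlier rule can be checked with boxplot.stats(), the function boxplot() relies on to compute whiskers and outliers. In this sketch, 150 is a hypothetical extreme value appended to made-up scores so that one point falls outside the whiskers.

```r
scores <- c(55, 60, 65, 70, 90, 85, 45, 77, 150)  # 150 is a hypothetical extreme score
st <- boxplot.stats(scores)   # default coef = 1.5
print(st$stats)   # 45 60 70 85 90: lower whisker, lower hinge, median, upper hinge, upper whisker
print(st$out)     # 150: beyond 85 + 1.5 * (85 - 60) = 122.5
boxplot(scores, main = "Boxplot with one outlier", col = "lightblue")
```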
# Create sample data
scores <- c(55, 60, 65, 70, 90, 85, 45, 77)
# Plot boxplot
boxplot(scores, main="Boxplot of Scores", col="lightblue", ylab="Score")
This will show:
      The central 50% of scores (interquartile range)
      The median score
      Any outliers in the data
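The numbers behind this plot can be printed directly. fivenum() returns Tukey's five-number summary, which matches what boxplot() draws; quantile() uses a different default rule (type 7), so its quartiles can differ slightly from the box edges. A sketch with the same scores vector:

```r
scores <- c(55, 60, 65, 70, 90, 85, 45, 77)
print(fivenum(scores))                  # 45.0 57.5 67.5 81.0 90.0
print(quantile(scores, c(0.25, 0.75)))  # 58.75 and 79.00 with the default rule
print(IQR(scores))                      # 20.25, computed from quantile()
```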
Why Boxplots are Important in Analytics
      Quickly detect variability, outliers, and data symmetry.
      Preferred in exploratory data analysis (EDA).
      Useful when comparing large datasets or categories.
✅ R Script: Side-by-Side Boxplots
# Load built-in dataset
data(mtcars)
# Convert 'cyl' to a categorical variable
mtcars$cyl <- as.factor(mtcars$cyl)
# Create side-by-side boxplots
boxplot(mpg ~ cyl, data = mtcars,
       main = "MPG Distribution by Number of Cylinders",
       xlab = "Number of Cylinders",
       ylab = "Miles Per Gallon (MPG)",
       col = c("skyblue", "orange", "lightgreen"))
Component          Description
mpg ~ cyl          Formula: compare mpg across the different cyl values (4, 6, 8)
data = mtcars      Use the mtcars dataset
boxplot(...)       Draws the grouped boxplots
col = ...          Gives each box a different color
main, xlab, ylab   Add a title and axis labels for readability
What the Boxplot Shows
Each box shows the distribution of MPG for a specific cylinder category.
You can compare:
Median fuel efficiency
Variation in MPG
Presence of outliers
You should see:
4-cylinder cars have the highest median MPG, but also the widest spread.
8-cylinder cars have the lowest median MPG with a narrow box, plus a couple of very low values drawn as outliers.
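To check the visual reading numerically, tapply() applies a summary function to mpg within each cylinder group. A sketch using median() for the centre of each box and IQR() for its spread:

```r
data(mtcars)
med    <- tapply(mtcars$mpg, mtcars$cyl, median)  # centre of each box
spread <- tapply(mtcars$mpg, mtcars$cyl, IQR)     # interquartile range of each group
print(med)      # 4: 26.0, 6: 19.7, 8: 15.2
print(spread)
```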
The operator ~ used in the expression mpg ~ cyl is called the tilde
operator and it has a special meaning in R, especially in formulas for
statistical modeling and plotting.
Operator: ~ (Tilde)
✅ Used for:
Creating a formula object that defines a relationship between variables.
In Context: mpg ~ cyl
🔹 Meaning:
"mpg is modeled as a function of cyl"
mpg is the dependent variable (Y-axis)
cyl is the independent variable or grouping factor (X-axis)
📘 In the boxplot function:
boxplot(mpg ~ cyl, data = mtcars)
This tells R to:
Group the mpg values based on each unique value of cyl
Then create a separate boxplot for each cylinder group (4, 6, 8)
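A formula is an ordinary R object that can be stored and inspected. The sketch below also reproduces the grouping step by hand with split(); it illustrates the idea and is not a copy of boxplot()'s internals.

```r
f <- mpg ~ cyl           # a formula object; nothing is evaluated yet
print(class(f))          # "formula"
print(all.vars(f))       # "mpg" "cyl": the variables the formula refers to

# The grouping step done by hand: one vector of mpg values per cyl value
groups <- split(mtcars$mpg, mtcars$cyl)
print(names(groups))     # "4" "6" "8"
print(lengths(groups))   # 11, 7 and 14 cars per group
boxplot(groups, xlab = "Number of Cylinders", ylab = "MPG")
```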
data(mtcars)
boxplot(disp ~ gear, data = mtcars,
       main = "Displacement by Gear",
       xlab = "Gear",
       ylab = "Displacement")
This code creates a boxplot that shows the distribution of engine
displacement (disp) grouped by gear categories (gear) in the mtcars
dataset.
Line-by-Line Explanation
🔹 data(mtcars)
Loads the built-in mtcars dataset.
It contains information about various car attributes such as:
disp = engine displacement (in cubic inches)
gear = number of forward gears (3, 4, or 5)
🔹 boxplot(disp ~ gear, data = mtcars, ...)
This creates side-by-side boxplots of the disp variable for each unique
value in gear.
Element         Description
disp ~ gear     Formula: plot displacement (disp) grouped by gear
data = mtcars   Use the mtcars dataset
main            Title of the boxplot
xlab, ylab      Axis labels for clarity
What the Boxplot Shows
X-axis (gear): Different gear groups — typically 3, 4, and 5 gears.
Y-axis (disp): The engine displacement (how big the engine is).
Each boxplot summarizes the distribution of displacement for each gear
group.
Gear = 3 → Highest Displacement
The boxplot shows that cars with 3 forward gears have a higher median
displacement, and the range (box + whiskers) is spread over larger engine
sizes.
This suggests that older or heavier vehicles, such as classic muscle cars or
luxury sedans, often come with:
Fewer gears
Larger engines
Less fuel efficiency
These cars are designed more for power and torque than for speed or
agility.
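Gear = 5 → Lower Displacement
Cars with 5 gears show a lower median displacement than 3-gear cars. The
group is small (only 5 cars) and its spread is wide, because it mixes small
sports cars with large V8s such as the Ford Pantera L and Maserati Bora.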
This typically represents:
Modern, performance-tuned vehicles
Compact sports cars
Fuel-efficient or economy-class cars
These engines may be smaller in size, but paired with more gears for
better performance, speed, and efficiency.
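The formula interface also works with aggregate(), which gives a numeric check of this boxplot: the median displacement per gear group and how many cars each group contains. A sketch on the built-in mtcars data:

```r
data(mtcars)
agg <- aggregate(disp ~ gear, data = mtcars, FUN = median)  # median displacement per gear
print(agg)                  # gear 3: 318.0, gear 4: 130.9, gear 5: 145.0
print(table(mtcars$gear))   # 15, 12 and 5 cars in the 3-, 4- and 5-gear groups
```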