Class 1: Introduction to R
Introduction to R and Data Basics
Md. Iftekhar Ahmed Khan
Machine Learning Engineer
Bondstein Technologies Limited
Welcome & Today's Goals
• Today you will learn to:
• Understand what R is and why it's used.
• Navigate the RStudio interface.
• Perform basic calculations and use variables.
• Understand and use fundamental data structures (Vectors, Data Frames).
• Manage R packages (install/load).
• Import data from common file types (CSV, Excel) and save results.
• Know how to get help!
What is R?
• R is a language AND an environment for statistical computing
and graphics.
• Strengths:
• Specifically designed for data analysis and visualization.
• Open Source: Free to use, modify, and distribute.
• Huge Community: Active development, extensive documentation, lots
of help online.
• Packages: Thousands of add-ons for specialized tasks (more later!).
• Common Uses: Data cleaning, data exploration, statistical
modeling, machine learning, report generation, creating plots
and dashboards.
Why R for Data Science?
• Vast Package Ecosystem: CRAN (Comprehensive R Archive
Network) hosts thousands of packages (like dplyr for manipulation,
ggplot2 for plotting).
• Powerful Visualization: Tools like ggplot2 allow for creating
complex and publication-quality graphics.
• Data Wrangling: Excellent tools (like the tidyverse) for cleaning,
transforming, and preparing data.
• Reproducibility: Scripts make analyses repeatable and shareable.
• Interoperability: Connects well with databases, other languages
(Python, SQL), and reporting tools (R Markdown).
RStudio
• RStudio: An Integrated Development Environment (IDE) for
R. Think of it as a powerful dashboard for R.
• Code editor with syntax highlighting
• Console to run commands interactively
• Workspace browser to see your variables
• Plotting window
• Help and file browsers
• Package manager
Your First Commands (Use Console)
• R can be used as a powerful calculator.
• Type directly into the Console pane (after the > prompt) and
press Enter.
# Basic Arithmetic
2 + 2
# [1] 4 <- This is the output R gives
5 * 10
# [1] 50
10 / 3
# [1] 3.333333
Logical Operations & Comparisons
• Used for asking TRUE/FALSE questions. Essential for filtering data later.
• == means "is equal to?" (Note: double equals!)
• != means "is not equal to?"
• >, <, >=, <= (Greater than, Less than, etc.)
# Comparisons
5 > 3
# [1] TRUE
10 == 10
# [1] TRUE
10 == 5
# [1] FALSE
5 != 6
# [1] TRUE
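Comparisons can also be combined. A minimal sketch, assuming the standard R operators & (and), | (or), and ! (not), which the slide title hints at but does not list:

```r
# Combine comparisons with & (and) and | (or); negate with ! (not)
(5 > 3) & (2 > 4)   # both sides must be TRUE
# [1] FALSE
(5 > 3) | (2 > 4)   # at least one side must be TRUE
# [1] TRUE
!(10 == 5)          # flips FALSE to TRUE
# [1] TRUE
```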
Variables (Objects) in R
• Store values or results using variables (R often calls them
objects).
• Use the assignment operator <- (less than sign, hyphen).
Think of it as an arrow pointing from the value to the variable
name.
• Variable names:
• Must start with a letter (or a dot that is not followed by a digit).
• Can contain letters, numbers, underscores (_), and periods (.).
• Are case-sensitive (myVar is different from myvar).
• Avoid using names of existing functions (like c, mean, data).
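The rules above can be tried out with a short sketch. The names price, quantity, total_cost, myVar, and myvar are made up for illustration:

```r
# Store values with the assignment operator <-
price <- 20
quantity <- 3

# Use stored values in a calculation and keep the result
total_cost <- price * quantity
total_cost          # typing a name prints its value
# [1] 60

# Names are case-sensitive: these are two different objects
myVar <- 1
myvar <- 2
myVar == myvar
# [1] FALSE
```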
Using Built-in Functions
• Functions perform specific tasks. You provide arguments (inputs)
inside parentheses ().
• R has many built-in functions.
some_numbers <- c(2, 8, 3, 7, 5)
# Use functions on the data
sum(some_numbers) # Calculates the sum
# [1] 25
mean(some_numbers) # Calculates the average (mean)
# [1] 5
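Many functions take more than one argument, and arguments can be passed by name. A small sketch using base R functions; the vector with_gap is a made-up example containing a missing value (NA):

```r
some_numbers <- c(2, 8, 3, 7, 5)

# Named argument: digits controls how many decimals are kept
round(10 / 3, digits = 2)
# [1] 3.33

max(some_numbers)      # largest value
# [1] 8
length(some_numbers)   # number of elements
# [1] 5

# na.rm = TRUE tells mean() to ignore missing values
with_gap <- c(2, NA, 8)
mean(with_gap, na.rm = TRUE)
# [1] 5
```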
Getting Help!
• Essential skill! Don't try to memorize everything.
• Use ? followed by the function name (no parentheses needed).
• Use help("function_name").
• Use ?? to search documentation for keywords (use quotes).
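The three approaches above look like this in practice. The function mean and the search phrase are only examples; in RStudio the results open in the Help pane:

```r
# Open the help page for a function you already know by name
?mean
help("mean")

# Search all installed documentation for a keyword or phrase
??"standard deviation"
```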
Packages: Extending R's Power
• Packages are collections of functions, data, and documentation that add
specific capabilities to R.
• Thousands are available from CRAN (Comprehensive R Archive Network)
and other places (like GitHub, Bioconductor).
• Examples: dplyr for data manipulation, ggplot2 for plotting, readxl for
reading Excel files.
• Two Steps:
• Install: Download the package to your computer (only need to do ONCE per R
installation). Use install.packages("package_name").
• Load: Make the package's functions available in your current R session (need to do
EVERY TIME you start a new R session and want to use it). Use
library(package_name).
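A sketch of the two steps, using readxl as the example package. The install line is commented out because it needs an internet connection, and the requireNamespace() check is an addition (not from the slides) so the snippet runs whether or not readxl is already installed:

```r
# Step 1 (once): install from CRAN.
# Remove the leading "#" to run it.
# install.packages("readxl")

# Step 2 (every new session): load the package so its functions are available.
if (requireNamespace("readxl", quietly = TRUE)) {
  library(readxl)
} else {
  message("readxl is not installed yet; run install.packages(\"readxl\") first")
}
```

Note that install.packages() takes the name in quotes, while library() also accepts it unquoted.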
Data Structures: Organizing Your Data
• So far, our variables have held single values. Data structures hold collections of values.
• R has several fundamental data structures:
• Vectors: Ordered sequence of elements of the same basic type. (MOST
FUNDAMENTAL)
• Data Frames: Rectangular table (like a spreadsheet), columns can be
different types. (MOST IMPORTANT FOR TABULAR DATA)
• Lists: Ordered, flexible collection, elements can be of different
types/structures.
• Matrices: 2-dimensional array, all elements must be the same type.
• Factors: Special type of vector for representing categorical data
(groups/levels).
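Vectors, data frames, and lists each get their own slide; matrices and factors do not, so here is a brief sketch of both. The object names scores and departments are made up for illustration:

```r
# Matrix: 2 rows, filled column by column, all elements the same type
scores <- matrix(1:6, nrow = 2)
dim(scores)            # rows, then columns
# [1] 2 3
scores[2, 3]           # row 2, column 3
# [1] 6

# Factor: a vector of categories with a fixed set of levels
departments <- factor(c("Sales", "IT", "Sales", "HR"))
levels(departments)    # the distinct categories, sorted
# [1] "HR"    "IT"    "Sales"
table(departments)     # counts per category
```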
Data Structure 1: Vectors
• The basic building block. Use the c() function (combine or
concatenate).
• All elements MUST be the same type (numeric, character,
logical). If you mix, R will coerce them (often to character).
# Numeric vector
ages <- c(25, 30, 22, 45)
ages
# [1] 25 30 22 45
class(ages)
# [1] "numeric"
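The other vector types, coercion, and element access with [ ] can be sketched the same way. The names first_names, is_manager, and mixed are illustrative:

```r
ages <- c(25, 30, 22, 45)

# Character and logical vectors
first_names <- c("Alice", "Bob", "Charlie", "David")
class(first_names)
# [1] "character"
is_manager <- c(TRUE, FALSE, FALSE, TRUE)
class(is_manager)
# [1] "logical"

# Mixing types: R coerces everything to one common type
mixed <- c(1, "two", TRUE)
class(mixed)
# [1] "character"

# Access elements with [ ] (positions start at 1)
ages[2]
# [1] 30
ages[ages > 24]        # keep only the elements where the test is TRUE
# [1] 25 30 45

# Arithmetic applies to every element at once
ages + 1
# [1] 26 31 23 46
```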
Data Structure 2: Data Frames
• The go-to structure for datasets (rows = observations, columns =
variables).
• Think spreadsheet: rectangular.
• Columns are typically vectors.
• Columns can be different data types (numeric, character, etc.).
• All columns MUST have the same length (same number of rows).
# Creating a data frame
employee_data <- data.frame(
  EmployeeID = c(101, 102, 103, 104),
  Name = c("Alice", "Bob", "Charlie", "David"),
  Department = c("Sales", "IT", "Sales", "HR"),
  Salary = c(50000, 65000, 52000, 58000)
)
# Print the data frame
employee_data
Accessing Data Frame Elements
• Use $ to access columns by name (most common).
• Use [[ ]] to access columns by name or index.
• Use [row, column] indexing.
# Access the 'Name' column
employee_data$Name
# [1] "Alice" "Bob" "Charlie" "David"
# Access the 'Salary' column
employee_data[["Salary"]]
# [1] 50000 65000 52000 58000
# Access the 3rd column (Department)
employee_data[[3]]
# [1] "Sales" "IT" "Sales" "HR"
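The [row, column] form is not demonstrated above, so here is a short sketch. It rebuilds the same employee_data so the snippet runs on its own; leaving the row or column position empty means "all of them":

```r
employee_data <- data.frame(
  EmployeeID = c(101, 102, 103, 104),
  Name = c("Alice", "Bob", "Charlie", "David"),
  Department = c("Sales", "IT", "Sales", "HR"),
  Salary = c(50000, 65000, 52000, 58000)
)

# One cell: row 2, column "Salary"
employee_data[2, "Salary"]
# [1] 65000

# A whole row (all columns): note the empty column position
employee_data[2, ]

# Rows that pass a logical test, one column
employee_data[employee_data$Salary > 55000, "Name"]
# [1] "Bob"   "David"

# Size of the table
nrow(employee_data)
# [1] 4
ncol(employee_data)
# [1] 4
```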
Data Structure 3: Lists
• Flexible containers. Can hold vectors, data frames, other lists,
mixed types.
my_list <- list(
  name = "Alice",
  age = 30,
  scores = c(85, 92, 78),
  employed = TRUE
)
my_list # Print the list
my_list$scores # Access list elements by name using $
my_list[[3]] # Access list elements by index using [[ ]]
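One detail worth trying yourself: single brackets [ ] keep the result wrapped in a list, while [[ ]] and $ pull out the element itself. A small sketch using the same my_list:

```r
my_list <- list(
  name = "Alice",
  age = 30,
  scores = c(85, 92, 78),
  employed = TRUE
)

names(my_list)               # the element names
# [1] "name"     "age"      "scores"   "employed"

class(my_list["scores"])     # [ ] returns a smaller list
# [1] "list"
class(my_list[["scores"]])   # [[ ]] returns the element itself
# [1] "numeric"

# Extracted elements can be passed straight to functions
mean(my_list$scores)
# [1] 85
```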
Working Directory & RStudio Projects
• When reading/writing files, R looks in the working directory.
• getwd(): Get Working Directory (see where R is looking).
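• setwd("path/to/your/directory"): Set Working Directory. Write paths
with / (a single \ does not work inside R strings). Fragile: a hard-coded
path breaks when the folder moves or the script runs on another computer.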
• BETTER WAY: RStudio Projects!
• Go to File -> New Project... -> New Directory (or Existing
Directory).
• Create a folder for your course/project.
• RStudio automatically sets the working directory to the project folder when
you open the .Rproj file.
• Keeps scripts, data, and output organized together! Highly recommended.
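A quick way to check where R is looking and what it can see there. The data subfolder in the last line is a hypothetical layout, not something the course requires:

```r
# Where is R looking right now?
getwd()

# Which files are in that folder?
list.files()

# Build a path relative to the working directory.
# file.path() joins the pieces with "/" on every operating system.
data_path <- file.path("data", "employee_data.csv")
data_path
# [1] "data/employee_data.csv"
```

Inside an RStudio Project, relative paths like this keep working when the project folder is moved or shared.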
Importing Data: CSV Files
• CSV = Comma Separated Values. Very common plain text
format.
• Use read.csv() (base R) or read_csv() (from the readr
package, part of tidyverse - often faster and smarter).
• Make sure the CSV file is in your RStudio Project folder (or
working directory).
# Assume 'employee_data.csv' exists in your project directory
# Using base R:
my_data_csv <- read.csv("employee_data.csv")
head(my_data_csv)
str(my_data_csv)
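If you do not have a CSV file yet, you can create one from R and read it back. This sketch uses base R only and writes to R's temporary folder (tempdir()) so it does not touch your project; in your own work you would write into the project folder instead:

```r
employee_data <- data.frame(
  EmployeeID = c(101, 102, 103, 104),
  Name = c("Alice", "Bob", "Charlie", "David"),
  Department = c("Sales", "IT", "Sales", "HR"),
  Salary = c(50000, 65000, 52000, 58000)
)

# Save: row.names = FALSE avoids an extra column of row numbers
csv_path <- file.path(tempdir(), "employee_data.csv")
write.csv(employee_data, csv_path, row.names = FALSE)

# Import it again and inspect the result
my_data_csv <- read.csv(csv_path)
head(my_data_csv)   # first rows
str(my_data_csv)    # column names and types
```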
Importing Data: Excel Files
• Requires the readxl package (install and load it first!).
• read_excel() function is the main tool.
• Can specify sheet name or number.
# Load readxl first (install it once with install.packages("readxl"))
library(readxl)
# Assume 'employee_data.xlsx' exists in your project directory
# Read the first sheet by default
my_data_excel <- read_excel("employee_data.xlsx")
head(my_data_excel)
str(my_data_excel)
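A sketch of saving to Excel and choosing a sheet when reading. It assumes both the readxl and writexl packages are installed (writexl provides write_xlsx()); the if check is an addition so the snippet is simply skipped when either package is missing:

```r
employee_data <- data.frame(
  EmployeeID = c(101, 102, 103, 104),
  Name = c("Alice", "Bob", "Charlie", "David"),
  Department = c("Sales", "IT", "Sales", "HR"),
  Salary = c(50000, 65000, 52000, 58000)
)

if (requireNamespace("readxl", quietly = TRUE) &&
    requireNamespace("writexl", quietly = TRUE)) {
  library(readxl)
  library(writexl)

  # Save the data frame as an Excel file in R's temporary folder
  xlsx_path <- file.path(tempdir(), "employee_data.xlsx")
  write_xlsx(employee_data, xlsx_path)

  # List the sheet names, then read a sheet by number (a name also works)
  print(excel_sheets(xlsx_path))
  my_data_excel <- read_excel(xlsx_path, sheet = 1)
  print(head(my_data_excel))
}
```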
Class 1 Summary & Recap
• R is a powerful language for data analysis. RStudio is a convenient IDE
for working with it.
• You can do calculations, use variables (<-), and call functions ().
• Key Data Structures:
• Vectors: c(), same data type, access with [].
• Data Frames: data.frame(), columns ($, [[]], [,]), rows ([,]).
• Packages extend R: install.packages(), library().
• Use RStudio Projects for organization.
• Import/Export: read.csv(), read_excel(), write.csv(),
write_xlsx() (from the writexl package).
• Getting Help: ?, ??.
Practice & Next Class
• Practice:
• Create different types of vectors.
• Create a simple data frame.
• Practice accessing elements/columns.
• Try importing a sample CSV or Excel file (find one online or create
one).
• Next Class:
• Data Manipulation! We'll learn how to filter, select, rearrange, and
summarize data using the powerful dplyr package.