Business Analytics Unit III: Getting Started with R
1. Introduction to R
R is an open-source programming language created by Ross Ihaka and Robert Gentleman at the University of Auckland in the
early 1990s. It is specifically designed for statistical computing and data analysis, and supports a wide range of
statistical techniques, visualizations, and data manipulation methods. In business analytics it is widely used for
predictive modeling, data visualization, and decision support.
2. Advantages of R
R is:
- Open Source: Free and widely accessible.
- Equipped with Advanced Statistical Tools: Supports regression, clustering, forecasting.
- Strong at Data Handling: Cleans, transforms, and summarizes large and complex datasets.
- Cross-Platform: Available on Windows, macOS, and Linux.
- Backed by a Strong Community: Thousands of add-on packages on CRAN and active user forums.
3. Installing R & RStudio
To use R:
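1. Download and install R from CRAN (Comprehensive R Archive Network). Install R first, because RStudio runs on top of it.
2. Download and install RStudio from rstudio.com. RStudio (now distributed by Posit) is an IDE (integrated development
environment) that makes it easier to write, debug, and visualize R code.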
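To confirm that R is installed and reachable from RStudio, you can type a couple of base R commands into the Console. This is a small sketch; the exact version string and platform details will differ from machine to machine.

```r
# Print the installed R version, e.g. "R version 4.x.x (release date)"
print(R.version.string)

# Show the R version, operating system, and currently attached packages
sessionInfo()
```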
4. RStudio Interface
- Source: Editor for writing and saving R scripts.
- Console: Runs R commands and shows their output.
- Environment: Shows active variables and datasets.
- Files/Plots/Packages/Help Tabs: Used to browse files, view charts, manage packages, and read documentation.
5. Packages and Libraries
Packages are collections of functions (often with data and documentation) for specific tasks. To use a package:
- Install (once per computer): install.packages("ggplot2")
- Load (once per R session): library(ggplot2)
Popular packages: ggplot2 (visualization), dplyr (data manipulation), readxl (importing Excel files).
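The install-once, load-every-session pattern is often wrapped in a small helper so that a script also runs on a machine that does not have the package yet. In the sketch below the helper name use_package() is hypothetical; requireNamespace() and library() are base R functions.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
use_package <- function(pkg) {
  # requireNamespace() returns FALSE (without an error) if pkg is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() accept the name stored in a variable
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Example with 'stats', which ships with base R, so nothing is downloaded
use_package("stats")
```

For a package from CRAN you would call, for example, use_package("ggplot2"), which downloads it only on the first run.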
6. Importing Excel Data
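Use the `readxl` or `openxlsx` package. Example with readxl:
library(readxl)
df <- read_excel("file.xlsx")  # reads the first sheet by default
read_excel() can also read a specific sheet (sheet argument), a cell range (range), and set column types (col_types).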
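A sketch of choosing a sheet, a cell range, and column types, assuming the readxl package is installed. Instead of a file of your own, it uses datasets.xlsx, an example workbook bundled with readxl and located with readxl_example(); the block is skipped if readxl is not available.

```r
if (requireNamespace("readxl", quietly = TRUE)) {
  library(readxl)

  # Path to an example workbook shipped with readxl (no download needed)
  path <- readxl_example("datasets.xlsx")

  # List the worksheet names in the file
  print(excel_sheets(path))

  # Read only cells A1:C6 of the "mtcars" sheet (a header row plus 5 data rows),
  # forcing every column to be numeric
  df <- read_excel(path, sheet = "mtcars", range = "A1:C6", col_types = "numeric")
  print(df)
}
```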
7. Operators in R
- Arithmetic: +, -, *, /, ^, %% (mod), %/% (integer division)
- Relational: ==, !=, >, <, >=, <=
- Logical: &, |, ! (element-wise); &&, || (single TRUE/FALSE values, e.g. in if conditions)
- Assignment: <- (preferred), =, ->
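The operators above can be tried directly in the Console. The numbers below are arbitrary example values; the comments show what each line evaluates to.

```r
x <- 17
y <- 5

# Arithmetic
x + y    # 22
x / y    # 3.4
x^2      # 289
x %% y   # 2  (remainder of 17 / 5)
x %/% y  # 3  (integer division)

# Relational: each comparison returns TRUE or FALSE
x > y    # TRUE
x == y   # FALSE

# Logical: combine or negate conditions
(x > 10) & (y > 10)  # FALSE
(x > 10) | (y > 10)  # TRUE
!(x > 10)            # FALSE

# Assignment in both directions
total <- x + y       # total is 22
x * y -> product     # product is 85
```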
8. Data Types in R
R supports:
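- Numeric: Real (double-precision) numbers like 3.14; a plain whole number such as 4 is also numeric
- Integer: Whole numbers written with an L suffix (e.g., 4L)
- Character: Text strings like "Hello"
- Logical: TRUE/FALSE
- Factor: Categorical values with a fixed set of levels (e.g., Gender)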
- Complex: Numbers with imaginary parts like 2+3i
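The class() function reports which of these types R has assigned to a value, and typeof() shows how it is stored internally. A short sketch with made-up values:

```r
class(3.14)      # "numeric"
class(4)         # "numeric" (no L suffix, so it is stored as a double)
class(4L)        # "integer"
class("Hello")   # "character"
class(TRUE)      # "logical"
class(2+3i)      # "complex"

gender <- factor(c("Male", "Female", "Female"))
class(gender)    # "factor"
typeof(gender)   # "integer" (a factor stores integer codes for its levels)
levels(gender)   # "Female" "Male" (levels are sorted alphabetically by default)
```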
9. Functions in R
Functions are reusable blocks of code. Syntax:
add <- function(x, y) {
  return(x + y)
}
Functions take inputs (arguments) and return an output, which makes code modular and reusable. The return() call is
optional: a function returns the value of the last expression it evaluates.
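A slightly fuller sketch showing a default argument, named arguments, and an implicit return. The function name profit_margin() and its business meaning are hypothetical, chosen only for illustration.

```r
# Hypothetical example: profit margin as a fraction of revenue.
# 'cost' has a default value of 0, so it may be left out when calling.
profit_margin <- function(revenue, cost = 0) {
  if (revenue == 0) {
    stop("revenue must be non-zero")
  }
  (revenue - cost) / revenue   # last expression is returned automatically
}

profit_margin(200, 150)                 # 0.25
profit_margin(revenue = 80, cost = 20)  # 0.75 (arguments passed by name)
profit_margin(50)                       # 1    (cost defaults to 0)
```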
10. Data Structures in R
- Vectors: One-dimensional, same type. Created with c().
- Matrices: Two-dimensional, same type. Created with matrix().
- Lists: One-dimensional, elements of any type (including other lists). Created with list().
- Arrays: Multi-dimensional, same type. Created with array().
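A sketch that creates one of each structure with small, made-up values and shows how elements are indexed. Note that c() silently converts mixed inputs to a single common type.

```r
# Vector: one dimension, one type
sales <- c(120, 95, 143)
length(sales)              # 3

# Mixing types in c() coerces everything to a common type (here, character)
c(1, "a", TRUE)            # "1" "a" "TRUE"

# Matrix: 2 rows x 3 columns, filled column by column
m <- matrix(1:6, nrow = 2)
dim(m)                     # 2 3
m[1, 2]                    # 3

# List: elements of different types and lengths
report <- list(quarter = "Q1", sales = sales, closed = TRUE)
report$quarter             # "Q1"

# Array: 2 rows x 3 columns x 2 layers
a <- array(1:12, dim = c(2, 3, 2))
a[2, 3, 2]                 # 12
```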
11. Factors & Data Frames
- Factors: Used for categorical data. Created with factor().
  The levels argument fixes the allowed categories, and ordered = TRUE ranks them.
- Data Frames: Tabular structures in which each column can have a different type.
  Access a column with $ and individual values with [row, col]; inspect the structure with str() and filter rows with subset().
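A sketch of an ordered factor and a small data frame, using hypothetical order data. The stringsAsFactors = FALSE argument keeps text columns as character in older R versions as well (it is already the default from R 4.0.0).

```r
# Ordered factor: categories with an explicit ranking
priority <- factor(c("Low", "High", "Medium", "Low"),
                   levels = c("Low", "Medium", "High"),
                   ordered = TRUE)
levels(priority)           # "Low" "Medium" "High"
priority[1] < priority[2]  # TRUE, because "Low" ranks below "High"
table(priority)            # counts per level: Low 2, Medium 1, High 1

# Data frame: each column has its own type
orders <- data.frame(
  region = c("North", "South", "East"),
  units  = c(120L, 85L, 140L),
  paid   = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
str(orders)                  # 3 obs. (rows) of 3 variables (columns)
orders$units                 # 120 85 140
orders[2, "region"]          # "South"
subset(orders, units > 100)  # rows for North and East
```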
Appendix: Flashcards
Q: What is R?
A: An open-source language for statistical computing and data analysis.
Q: Name two advantages of R.
A: It is free and supports advanced statistical tools.
Q: What is RStudio?
A: A user-friendly IDE for R programming.
Q: How do you install a package in R?
A: install.packages("package_name")
Q: What is a vector in R?
A: A one-dimensional data structure whose elements all have the same type.
Q: How is a data frame different from a matrix?
A: Data frames can have different data types in each column; matrices cannot.
Q: What is a factor?
A: A categorical data type with fixed levels.
Q: Write the syntax for a function in R.
A: function_name <- function(arg1, arg2) { code block }
Q: What operator is used for modulus in R?
A: %%
Q: How do you read an Excel file in R?
A: Use read_excel("filename.xlsx") from the readxl package.