An R Companion for Introduction to Data Mining

Michael Hahsler¹
¹ Department of Computer Science, Southern Methodist University, USA

DOI: 10.21105/jose.00223
Submitted: 31 May 2023
Published: 26 December 2024
License: Authors of papers retain copyright and release the work under a
Creative Commons Attribution 4.0 International License (CC BY 4.0).

Summary
An R Companion for Introduction to Data Mining is an open-source learning and teaching
resource that covers how to implement data mining concepts using R. It is designed
to accompany the popular data mining textbook Introduction to Data Mining (Tan et
al., 2017) to study the implementation of the basic data mining concepts including data
preparation, classification, clustering, and association analysis. The resource uses complete,
annotated examples to demonstrate how data mining concepts can be translated into R
code.
The materials have been made publicly available at:
https://github.com/mhahsler/Introduction_to_Data_Mining_R_Examples
and licensed under the Creative Commons Attribution 4.0 (CC BY 4.0) License.

Statement of Need
The textbook Introduction to Data Mining (Tan et al., 2017) has been one of the most
popular choices for learning and teaching data mining concepts. Several chapters have been
made available for free by the authors on the book’s website. One of the authors also provides
Python Jupyter notebooks with examples, but complete R code examples were still needed.
Given the R community’s interest in data analysis, data science, and machine learning,
and the broad support of R packages for data mining, there was a noticeable gap that
was filled by this learning resource. This resource targets advanced undergraduate and
graduate students and can be used as a component for a first introduction to data mining.

Learning Objectives and Content

The resource assumes basic knowledge of programming and statistics. The learning
objectives are to:
• prepare and understand data,
• perform classification,
• perform association analysis, and
• perform cluster analysis.
The resource presents self-contained and annotated R code examples that work with small
datasets carefully chosen to show the learner many important aspects of data mining. The
learner can copy and paste the examples into a new R Markdown notebook to experiment
with the code and the provided example data. Small exercises encourage the learner to
modify the code by applying it to a different dataset. This learning-by-doing approach
has worked well in preparing students to work with more complex real-world datasets by
initially relieving them from dealing with too many low-level implementation details while
exploring the concepts.
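
As a rough illustration of what a short, self-contained example of this kind might look
like, the following sketch prepares a small dataset and clusters it using only base R.
It is not taken from the resource: the choice of the built-in iris data, the number of
clusters (k = 3), and the random seed are all assumptions made here for illustration.

```r
# Illustrative sketch only; not code from the resource.
# Uses the iris data that ships with base R.
data(iris)

# Data preparation: keep the four numeric features and standardize them
features <- scale(iris[, 1:4])

# Cluster analysis: k-means with k = 3 (assumed, matching the three species)
set.seed(1234)  # arbitrary seed so the run is reproducible
km <- kmeans(features, centers = 3, nstart = 10)

# Compare the cluster assignments with the known species labels
print(table(cluster = km$cluster, species = iris$Species))
```

A learner could then modify the sketch in the spirit of the exercises, for example by
changing the number of clusters or by swapping in a different small dataset.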

Hahsler. (2024). An R Companion for Introduction to Data Mining. Journal of Open Source Education, 7 (82), 223. https://doi.org/10.21105/jose.00223. 1
The resource mirrors the textbook’s structure so it can be used along with it easily. After
a short introduction, Chapter 2 discusses data types in R, data quality concerns, and
data preprocessing. Data exploration and visualization examples are included. Chapters
3 and 4 cover classification methods, model selection, model evaluation, different types
of classifiers, and essential practical issues like class imbalance. Chapter 5 introduces
association analysis with a strong emphasis on visualization. Chapter 7 presents examples
of cluster analysis, including popular algorithms, cluster evaluation, and the effect of
outliers.
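
To make the classification and model evaluation topics more concrete, here is a minimal
sketch of a holdout evaluation for a decision tree. It is an assumption-laden example
rather than code from the resource: it calls the rpart package directly (the resource
itself is described as being mainly based on caret), and the iris data, the 70/30 split,
and the seed are chosen here purely for illustration.

```r
# Illustrative sketch only; not code from the resource.
library(rpart)  # recommended package that is distributed with R

data(iris)
set.seed(2024)  # arbitrary seed so the split is reproducible

# Hold out 30% of the rows for testing (split ratio is an assumption)
n <- nrow(iris)
test_idx <- sample(n, size = round(0.3 * n))
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# Fit a classification tree on the training rows
fit <- rpart(Species ~ ., data = train, method = "class")

# Evaluate on the held-out rows with a confusion matrix and accuracy
pred <- predict(fit, newdata = test, type = "class")
conf_mat <- table(predicted = pred, actual = test$Species)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

print(conf_mat)
print(accuracy)
```

Evaluating on rows that were not used for fitting is what keeps the accuracy estimate
from being overly optimistic; with class-imbalanced data, per-class measures from the
confusion matrix are usually more informative than overall accuracy.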

Instructional Design
This resource does not replace the Introduction to Data Mining textbook or instruction by
a teacher. It instead provides supporting material for learning to implement data mining
concepts in R. The learner is expected to have some programming experience and basic
statistics knowledge.
The resource can be used for self-study by any interested person together with reading the
Introduction to Data Mining textbook, but its main purpose is to be used as a component
for designing an introductory data mining course for advanced undergraduate or graduate
students. To support instructors, in addition to the documented code examples, complete
presentation slide sets are provided on the book’s GitHub page in PDF and PowerPoint
format. The slides are organized in the same way as the resource. A direct connection
between the slides and the code examples is provided by the R symbol on the slides
where example code is available. The code examples can be assigned to be studied by the
students outside of class or used by the instructor in class.
Designing assignments and assessments is left to the instructor since they depend on
the level and field of study of the students (e.g., computer science, statistics, economics,
or business). For example, for undergraduates, we suggest asking the students to apply
the data mining techniques to a small, clean instructional data set (sample exercises are
available in the resource at the end of each chapter), while graduate students may be
asked to analyze larger real-world data sets, which may require a significant amount of
cleaning and preprocessing.

Story of the Project

Since starting to teach data mining with R in the Spring of 2013, I have been developing
the Companion for Introduction to Data Mining resource mainly based on caret (Kuhn,
2008), and a set of packages developed with students to better support different data
mining tasks (e.g., arules (Hahsler et al., 2005), seriation (Hahsler et al., 2008), arulesViz
(Hahsler, 2017), and dbscan (Hahsler et al., 2019)). The resource grew from a collection of
short, unconnected R scripts to a complete set of documented code examples that walk
the learner step-by-step through how to implement data mining methods, and how to
interpret the results. It went through an update to incorporate the popular tidyverse
package collection (Wickham et al., 2019) and a transition from the 1st edition of the
Introduction to Data Mining textbook to the second.
The companion resource has been used successfully in the Department of Computer Science
at Southern Methodist University for many years and by several instructors as a key
component of an introductory data mining course delivered in person and in a distance
education setting. It is also linked on the textbook website as an official resource. Faculty
in the department actively maintain the resource, and we will update it with new R tools
like tidymodels (Kuhn & Wickham, 2020) over time.

References
Hahsler, M. (2017). arulesViz: Interactive visualization of association rules with R. R
Journal, 9(2), 163–175. https://doi.org/10.32614/RJ-2017-047
Hahsler, M., Grün, B., & Hornik, K. (2005). arules – A computational environment for
mining association rules and frequent item sets. Journal of Statistical Software, 14(15),
1–25. https://doi.org/10.18637/jss.v014.i15
Hahsler, M., Hornik, K., & Buchta, C. (2008). Getting things in order: An introduction
to the R package seriation. Journal of Statistical Software, 25(3), 1–34.
https://doi.org/10.18637/jss.v025.i03
Hahsler, M., Piekenbrock, M., & Doran, D. (2019). dbscan: Fast density-based clustering
with R. Journal of Statistical Software, 91(1), 1–30.
https://doi.org/10.18637/jss.v091.i01
Kuhn, M. (2008). Building predictive models in R using the caret package. Journal of
Statistical Software, 28(5), 1–26. https://doi.org/10.18637/jss.v028.i05
Kuhn, M., & Wickham, H. (2020). Tidymodels: A collection of packages for modeling and
machine learning using tidyverse principles.
https://doi.org/10.32614/CRAN.package.tidymodels
Tan, P.-N., Steinbach, M. S., Karpatne, A., & Kumar, V. (2017). Introduction to data
mining (2nd Edition). Pearson. ISBN: 978-0133128901
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R.,
Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E.,
Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., … Yutani,
H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686.
https://doi.org/10.21105/joss.01686
