ST2195 Programming for Data Science
Course Outline
Introduction
Name: Leong Chee Ming
Email: cmleong003@mymail.sim.edu.sg
Introduction
This course covers the main principles of computer programming, with a
focus on data science applications. It follows the entire pathway from
raw data through databases, data wrangling and visualisation, and
machine learning frameworks, up to software development.
Aims and Objectives
• Gain knowledge of the main principles of programming in the data science
context
• Develop the ability to handle and visualise data
• Apply computational thinking in various application domains
• Provide training in state-of-the-art tools, e.g. SQL, Python, R and Git
• Communicate data analysis results to stakeholders and share work
with people in the Data Science industry
Learning Outcomes
At the end of the course, and having completed the essential reading and
activities, students should be able to:
• Load raw data into relational databases that are queried with SQL
• Import data into Python and R, then manipulate and visualise it
• Program in Python and R
• Develop software using version control via Git
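As a rough illustration of the first two outcomes, the sketch below loads a few rows of raw CSV text into a relational table and queries it with SQL from Python. It uses only Python's standard library (`csv`, `sqlite3`). The table name, column names, helper functions and sample values are invented for this example and are not taken from the course materials.

```python
import csv
import io
import sqlite3

# Hypothetical raw data; in practice this would be read from a file.
RAW_CSV = """student,block,score
Ann,1,72
Ben,1,65
Ann,2,80
"""

def load_csv_into_sqlite(raw_text, conn):
    """Create a 'scores' table and insert every CSV row into it."""
    conn.execute(
        "CREATE TABLE scores (student TEXT, block INTEGER, score REAL)"
    )
    reader = csv.DictReader(io.StringIO(raw_text))
    # Convert the text fields to the types declared in the table.
    rows = [(r["student"], int(r["block"]), float(r["score"])) for r in reader]
    conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

def average_score_by_student(conn):
    """Return (student, average score) pairs, sorted by student name."""
    cur = conn.execute(
        "SELECT student, AVG(score) FROM scores "
        "GROUP BY student ORDER BY student"
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
n_rows = load_csv_into_sqlite(RAW_CSV, conn)
averages = average_score_by_student(conn)
print(n_rows, averages)
```

An in-memory database keeps the example self-contained; passing a file path to `sqlite3.connect` instead would persist the table on disk.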
Assessment
1. Individual case-study coursework (50%)
2. Two-hour unseen written examination (50%)
Course Materials
The following materials, available through the UOL Virtual Learning Environment (VLE),
will be your main resources:
1. ST2195 Subject Guides
2. ST2195 Practice Assignments
Readings/References:
• McKinney W. Python for Data Analysis, 2nd edition, O’Reilly (2017)
• Guttag J.V. Introduction to Computation and Programming Using Python, 2nd edition,
MIT Press (2017)
• Wickham H. and Grolemund G. R for Data Science, 1st edition, O’Reilly (2017)
• Wickham H. Advanced R, 1st edition, Chapman & Hall (2015)
• Ramakrishnan R. and Gehrke J. Database Management Systems, 3rd edition,
McGraw Hill (2002)
Course Structure – Lectures and Practical Sessions
Course materials are divided into 10 Blocks
• Each Block will be covered in ~2 Lectures and 1 Practical Session
• Total of 19 Lectures and 10 Practical Sessions
Lectures will go through the content in the ST2195 Subject Guides (19 sessions)
• Coverage of key concepts/highlights
• Illustrations through demos and trying out the code
Practical Sessions will go through the ST2195 Practice Assignments (10 sessions)
• Do the assignment in groups of 4-5 students each
• For each assignment, a few groups will be randomly selected to present their solution
Course Structure – Details
Each Block will have ~2 Lectures and 1 Practical Session
Objectives by Block
Data Science Ecosystem & Programming Tools
Block 1
• Introduce yourself to Data Science and review real-world data examples
• Gain experience using basic tools and technology for programming, such as notebooks, IDEs and
version control using Git.
Interacting with Data Structures
Block 2
• Gain familiarity with various data types and structures as well as popular data-exchange formats
(e.g. JSON, XML, CSV).
• Be able to work with various data types and structures and data-exchange formats in R and
Python.
Block 3
• Use relational database models and structured query languages (SQL)
• Gain experience with interfacing SQL from R and Python
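To make the Block 2 objective on data-exchange formats a little more concrete, here is a minimal sketch that reads the same record from JSON, CSV and XML using only Python's standard library. The record, its field names and the helper functions are hypothetical and are chosen purely for illustration.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same hypothetical record expressed in three exchange formats.
JSON_TEXT = '{"name": "Ann", "score": 72}'
CSV_TEXT = "name,score\nAnn,72\n"
XML_TEXT = "<record><name>Ann</name><score>72</score></record>"

def from_json(text):
    """Parse a JSON object into a plain dict with an integer score."""
    data = json.loads(text)
    return {"name": data["name"], "score": int(data["score"])}

def from_csv(text):
    """Parse the first data row of a CSV document with a header line."""
    row = next(csv.DictReader(io.StringIO(text)))
    return {"name": row["name"], "score": int(row["score"])}

def from_xml(text):
    """Parse a <record> element with <name> and <score> children."""
    root = ET.fromstring(text)
    return {"name": root.findtext("name"), "score": int(root.findtext("score"))}

records = [from_json(JSON_TEXT), from_csv(CSV_TEXT), from_xml(XML_TEXT)]
print(records)
```

CSV and XML carry every value as text, so the score is converted to an integer explicitly in those two readers; the JSON parser already returns a number, and the conversion there only keeps the three functions uniform.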
Course Structure – Details (cont’d)
Description by Block
Core Programming Concepts
Block 4
• Understand and use basic programming concepts such as control flow, variable and function
scoping in R.
• Understand and use basic programming concepts such as exceptions, error handling, testing
and debugging in R.
Block 5
• Understand and use basic programming paradigms in R.
• Understand and use basic programming concepts and paradigms in Python.
Data Wrangling
Block 6
• Understand and use data types and structures in R.
• Demonstrate how to clean and manipulate data in R.
• Manage data types and structures in Python.
Course Structure – Details (cont’d)
Description by Block
Graphics and Data Visualisation
Block 7
• Gain familiarity with graphics and data visualisations in R.
• Understand the grammar of graphics paradigm and its implementation in R.
Block 8
• Use graphics frameworks in Python.
• Produce network visualisations.
Machine Learning Frameworks
Block 9
• Introduce yourself to Machine Learning frameworks.
• Gain experience with the formation of data analytic pipelines and principles of parallel computing.
• Interact with Machine Learning frameworks in R.
• Interact with Machine Learning frameworks in Python.
Software Development
Block 10
• Gain experience with documenting code.
• Understand software testing frameworks and test-driven development.
• Develop R and Python packages.
Course Structure – Additional Activities
Revision Class (1 session)
• Scheduled for March 2022
Individual Project Reviews (2x)
• 1st Submission – Plan and Commencement (early Jan 2022)
• 2nd Submission – Near Completion (mid to end Feb 2022)
Class Test/Assignment (2x)
• Take-home mode, self-timed, closed-book
Course Structure – Summary
Course Divided into 10 Blocks
• Lectures (19 sessions)
• Practical Sessions (10 sessions)
Additional Activities
• Revision Class (1 session)
• Individual Project Reviews (2x)
• Class Test/Assignment (2x)
Key Takeaways
• R and Python (also SQL, Git)
• Know how to find help
• Practice is important
• Individual project worth 50%