Programming For Data Science

This document describes a Module on Programming for Data Science that will be taught virtually. The 8-week course is worth 2 credits and is designed as a practical introduction to the R programming language for data analysis. The module will cover understanding different data types, cleaning and transforming raw data, conducting exploratory data analysis, and using R packages for data visualization. Students will learn basic programming skills in R to import, manage, analyze, and visualize data to extract knowledge from it.

PROCESS: Design and Development of Academic Programs
FORMAT: Syllabus for Virtual Undergraduate and Graduate Modules
Code: JD-RG-002
Version: 3

ESCUELA DE NEGOCIOS Y DESARROLLO INTERNACIONAL

I. MODULE IDENTIFICATION

EDUCATION LEVEL: Undergraduate

MODULE NAME: Programming for Data Science

MODULE CODE: ES597

NUMBER OF CREDITS: 2

DURATION: 8 weeks

ENROLLMENT REQUIREMENTS:

TYPE: Theoretical-practical

II. MODULE DESCRIPTION

This Module is designed as an applied introduction that incorporates some statistical concepts. All work is done in the R language, so you will need to install the programming language and a code editor, both of which are free.

No previous knowledge is required, although the material may feel more familiar if you have experience in computer programming or statistics. All contents are structured so that you can learn from scratch. A great many topics and resources exist for learning R, so you will often find references to manuals and online resources.

This Module will provide you with basic tools that will be useful when you later delve into disciplinary applications.

III. GENERAL COMPETENCY OF THE MODULE

Understanding the basic operation of programming languages for data science, focusing on the tasks of importing, managing, cleaning, preparing, and exploring data, as necessary tools for the technical development of data science projects.

UNIT 1 COMPETENCY
Understanding the types of data involved in a data science project in order to choose the proper programming tool for processing them efficiently and effectively.

SCENARIO 1

Element 1: Understanding the types of data that are processed using programming languages.
Indicator 1: Describes the types of data that are processed with programming languages and explains ways to structure them in each programming language.

Element 2: Understanding the formats for writing, printing, and displaying primary and structured data.
Indicator 2: Defines data writing formats and presents primary visualizations of the raw data.

SCENARIO 2

Element 1: Understanding structured, semi-structured, and unstructured special data types.
Indicator 1: Classifies data as structured, semi-structured, or unstructured and acknowledges the processing possibilities in each programming language.

Element 2: Understanding the possibilities of data processing involving time, finance, and GPS.
Indicator 2: Defines the possibilities of data processing involving time, finance, and GPS in each programming language.

“This document is the intellectual property of POLITECNICO GRANCOLOMBIANO; its total or partial reproduction without written authorization from the Rector's Office is prohibited. ANY DOCUMENT PRINTED OR DOWNLOADED FROM THE SYSTEM IS CONSIDERED AN UNCONTROLLED COPY”.

UNIT 2 COMPETENCY
Understanding the tools used to take raw data, clean it, and transform it for the subsequent modeling that will extract knowledge.

SCENARIO 3

Element 1: Understanding the treatment of missing and anomalous data.
Indicator 1: Defines a strategy to tackle data that is missing, abnormal, or erroneous.

Element 2: Understanding the main tools for data transformation, specifically replacement, concatenation, reassignment, and indexing.
Indicator 2: Understands the scope and application of data transformation: replacement, concatenation, reassignment, and indexing.

SCENARIO 4

Element 1: Understanding the use of tools for primary data processing such as subsets, filters, conditionals, and crossing of variables.
Indicator 1: Uses tools for primary data processing.

Element 2: Understanding the use of programming tools for the iterative application of data groups.
Indicator 2: Understands the strategies for iterative programming applied to large databases.

UNIT 3 COMPETENCY
Understanding the main strategies and instruments for initial data exploration in order to draw hypotheses, select variables, and plan the design of experiments with data mining tools.

SCENARIO 5

Element 1: Understanding and developing univariate descriptive statistics.
Indicator 1: Makes a univariate statistics report.

Element 2: Understanding and developing multivariate descriptive analysis.
Indicator 2: Runs multivariate analysis using programming code.

SCENARIO 6

Element 1: Understanding dimension reduction tools for viewing multivariate bases.
Indicator 1: Recognizes dimension reduction tools and applies them in practical cases.

Element 2: Understanding data simulation tools, sampling, and stochastic processes.
Indicator 2: Understands and applies the tools of simulation, sampling, and stochastic processes.

UNIT 4 COMPETENCY
Understanding and using the different packages for the graphic exploration of data, whether for the descriptive representation of data or the explanation of results.

SCENARIO 7

Element 1: Understanding the use of display paths in R.
Indicator 1: Understands and applies the R base visualization functions and additional packages.

Element 2: Understanding the use of multivariate statistics graphs in R.
Indicator 2: Understands code that generates bivariate descriptive statistics graphs and interprets them appropriately.

SCENARIO 8

Element 1: Analyzing data descriptively with univariate visualization tools.
Indicator 1: Analyzes data descriptively with multivariate tools.

Element 2: Presenting descriptive reports with univariate tools.
Indicator 2: Presents descriptive reports with univariate tools.

IV. CORE TOPICS

1. Types of Data and their Representations
2. Data Cleansing and Transformation
3. Exploration of Quantitative Data
4. Graphic Exploration of Data

CORE TOPIC 1. Types of Data and their Representations
TOPIC AREAS:

• Setup
• Starting tasks Basic R
• Types of data in R
• Tibbles
• Importing data
• Loops and iterations
• Conditional statements
• Functions
• Statistics in geometric spaces
• Areal data
• Some models applied to data analysis in economics
• Decision trees

CORE TOPIC 2. Data Cleansing and Transformation
TOPIC AREAS:

• Missing values
• Verification of the type of variable
• Approach to outliers and missing data
• Grouping
• Functions applied to an entire data frame

CORE TOPIC 3. Exploration of Quantitative Data
TOPIC AREAS:

• Univariate descriptive statistics
• Measures of location
• Measures of dispersion
• Some descriptive graphs
• Principal Component Analysis (PCA)
• t-SNE
• Probability distributions in R

CORE TOPIC 4. Graphic Exploration of Data
TOPIC AREAS:

• Loading and preparing grouped data
• Distribution
• Graph of evolution over time
• Some considerations taken from the literature
• Better graphics
• Software
• Good practices
• Additional resources
• Communicating the results

V. REFERENCES
BIBLIOGRAPHY
• Wickham, H., & Grolemund, G. (2016). R for data science: import, tidy, transform, visualize, and model data. O’Reilly Media, Inc.
• Laude, H. (2017). Data Scientist y lenguaje R: Guía de autoformación para el uso de Big Data. Eni.
• De Jonge, E., & Van Der Loo, M. (2013). An introduction to data cleaning with R. Statistics Netherlands, Heerlen.
• Burns, E. (2021). Data Cleaning in R Made Simple. Towards Data Science. https://towardsdatascience.com/data-cleaning-in-r-made-simple-1b77303b0b17
• Rincón, L. (2007). Curso elemental de probabilidad y estadística. Universidad UNAM.
• Kabacoff, R. (2020). Data visualization with R. Wesleyan University.
• Plotly. (n.d.). Plotly R Open Source Graphing Library. Plotly. https://plotly.com/r/

VI. ANNEXES
1. Didactic Development of Modules in Virtual Undergraduate Programs
2. Evaluation of Modules in Virtual Undergraduate Programs
3. Didactic Development of Modules in Virtual Graduate Programs
4. Evaluation of Modules in Virtual Graduate Programs
