
Foundation of Data Science

1. Introduction to Data Science


Introduction to Data Science
• Definition: Data Science is a multidisciplinary field that uses scientific methods, algorithms,
processes, and systems to extract knowledge and insights from structured and unstructured
data.
• Key Components: It involves the integration of statistics, computer science, machine
learning, data mining, and domain knowledge.
• The 3 V’s of Data:
• Volume: Refers to the vast amount of data generated every second from various
sources (e.g., social media, sensors, transactions).
• Velocity: The speed at which data is generated, processed, and analyzed. In today’s
fast-paced world, data needs to be processed in real-time or near real-time.
• Variety: The different forms and types of data, including structured (e.g., databases),
semi-structured (e.g., XML, JSON), and unstructured data (e.g., text, images,
videos).

Why Learn Data Science?


• Demand for Data Scientists: The demand for data scientists is high across various
industries, as businesses increasingly rely on data-driven decision-making.
• Versatility: Data Science skills are applicable in numerous fields such as healthcare,
finance, marketing, and technology.
• Problem Solving: Data Science enables professionals to solve complex problems, improve
business processes, and innovate.
• Career Growth: Offers lucrative career opportunities with high earning potential and job
security.

Applications of Data Science


• Healthcare: Predictive analytics for patient outcomes, personalized medicine, and medical
image analysis.
• Finance: Fraud detection, risk management, algorithmic trading, and customer
segmentation.
• Marketing: Customer behavior analysis, targeted advertising, sentiment analysis, and
recommendation systems.
• Retail: Inventory management, demand forecasting, and personalized shopping experiences.
• Transportation: Route optimization, autonomous vehicles, and predictive maintenance.

The Data Science Lifecycle


• Data Collection: Gathering data from various sources such as databases, sensors, or the
web.
• Data Cleaning: Preprocessing the data to handle missing values, outliers, and errors to
ensure quality.
• Data Exploration: Analyzing the data to discover patterns, trends, and relationships using
statistical methods.
• Data Modeling: Building predictive models using machine learning algorithms to make
forecasts or decisions.
• Data Interpretation: Interpreting the results to gain insights and inform decision-making.
• Model Deployment: Implementing the model in a production environment where it can be
used to make real-time decisions.
• Monitoring & Maintenance: Continuously monitoring the model’s performance and
updating it as needed.

Data Scientist’s Toolbox


• Programming Languages: Python, R, and SQL are essential for data manipulation,
analysis, and modeling.
• Libraries & Frameworks:
• Pandas: Data manipulation and analysis.
• NumPy: Numerical computing.
• Scikit-learn: Machine learning algorithms.
• TensorFlow & PyTorch: Deep learning frameworks.
• Data Visualization Tools: Matplotlib, Seaborn, and Tableau for creating visual
representations of data.
• Big Data Technologies: Hadoop and Spark for processing and analyzing large datasets.
• Database Management: SQL databases (e.g., MySQL, PostgreSQL) and NoSQL databases
(e.g., MongoDB, Cassandra).
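
A minimal sketch of how a few of these tools fit together, assuming pandas, NumPy, and scikit-learn are installed (the toy dataset and column names are hypothetical):

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy dataset: hours studied vs. exam score (illustrative values only)
df = pd.DataFrame({"hours": [1, 2, 3, 4, 5], "score": [52, 58, 65, 70, 78]})

X = df[["hours"]].to_numpy()            # feature matrix as a NumPy array
y = df["score"].to_numpy()              # target values

model = LinearRegression().fit(X, y)    # fit a simple regression model
print(model.predict(np.array([[6]])))   # predicted score for 6 hours of study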

Types of Data
• Structured Data:
• Definition: Data that is organized in a specific format, often in rows and columns,
making it easily searchable in databases.
• Examples: Excel sheets, SQL databases.
• Semi-structured Data:
• Definition: Data that doesn’t have a fixed format but includes tags or markers to
separate elements.
• Examples: XML, JSON files.
• Unstructured Data:
• Definition: Data that lacks a specific format or structure, making it more challenging
to process and analyze.
• Examples: Text documents, images, videos, emails.
• Problems with Unstructured Data:
• Storage Issues: Requires more space and advanced storage solutions.
• Processing Complexity: Difficult to process and analyze due to its lack of
structure.
• Interpretation Challenges: Requires advanced techniques like natural
language processing (NLP) or image recognition.
Data Sources
• Open Data: Publicly available data that can be freely used and shared. Examples include
government datasets, public health data, and environmental data.
• Social Media Data: Data generated from social media platforms, such as posts, likes,
shares, and comments. Useful for sentiment analysis and trend prediction.
• Multimodal Data: Data that combines multiple types of information, such as text, images,
and audio. Examples include video files with subtitles or annotated images.
• Standard Datasets: Widely-used datasets in Data Science for benchmarking algorithms and
models. Examples include the Iris dataset, MNIST dataset, and ImageNet.
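
As an example, a standard benchmark dataset can be loaded directly from scikit-learn; a minimal sketch (as_frame=True requires scikit-learn 0.23 or newer):

from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)     # Iris dataset as a pandas DataFrame
print(iris.frame.head())            # first rows: four measurements + target
print(iris.target_names)            # the three iris species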

Data Formats
• Integers and Floats:
• Integers: Whole numbers used for counting or indexing.
• Floats: Numbers with decimal points, used for representing continuous data.
• Text Data:
• Plain Text: Simple text data stored without any formatting (e.g., .txt files).
• Text Files:
• CSV Files: Comma-separated values, often used for storing tabular data.
• JSON Files: JavaScript Object Notation, used for storing and exchanging data.
• XML Files: Extensible Markup Language, used for encoding documents in a format
that is both human-readable and machine-readable.
• HTML Files: Hypertext Markup Language, used for creating web pages.
• Dense Numerical Arrays: Arrays containing numerical data, typically used in scientific
computing and data analysis (e.g., NumPy arrays).
• Compressed or Archived Data:
• Tar Files: Archive files that can contain multiple files and directories.
• GZip Files: Compressed files that reduce storage space and transfer time.
• Zip Files: Archive files that can contain multiple files in a compressed format.
• Image Files:
• Rasterized Images: Images made up of pixels (e.g., JPEG, PNG).
• Vectorized Images: Images made up of paths and curves, scalable without losing
quality (e.g., SVG files).
• Compressed Images: Images that have been compressed to reduce file size (e.g.,
JPEG).
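
A minimal sketch of reading a few of these formats with pandas (the file names are hypothetical placeholders):

import pandas as pd

df_csv = pd.read_csv("sales.csv")        # tabular, comma-separated data
df_json = pd.read_json("records.json")   # JSON records
df_gz = pd.read_csv("sales.csv.gz")      # pandas decompresses gzip files transparently

print(df_csv.head())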

2. Statistical Data Analysis


Role of Statistics in Data Science
• Definition: Statistics is the branch of mathematics that deals with the collection, analysis,
interpretation, presentation, and organization of data.
• Importance in Data Science:
• Data Collection: Statistics provides methods to design surveys and experiments to
collect data efficiently.
• Data Analysis: Statistical techniques are essential for analyzing and interpreting
complex data sets.
• Inference: Statistics helps in making inferences about a population based on sample
data.
• Decision Making: Statistical methods enable data-driven decision-making by
providing a quantitative basis for assessing the reliability and significance of results.

Descriptive Statistics (6 Lectures)


• Definition: Descriptive statistics involves summarizing and organizing data to understand
its main characteristics, typically through numerical summaries, graphs, and tables.
• Key Components:
• Measuring the Frequency:
• Definition: Frequency refers to how often a data point occurs in a dataset.
• Tools: Frequency distributions, histograms, and bar charts are used to
visualize frequency.
• Measuring the Central Tendency:
• Mean: The arithmetic average of a set of numbers.
• Median: The middle value in a dataset when arranged in ascending or
descending order.
• Mode: The value that appears most frequently in a dataset.
• Measuring the Dispersion:
• Range: The difference between the highest and lowest values in a dataset.
• Standard Deviation: A measure of the amount of variation or dispersion in a
set of values.
• Variance: The square of the standard deviation, representing the spread of a
dataset.
• Interquartile Range (IQR): The difference between the third quartile (Q3) and the first quartile (Q1), representing the middle 50% of the data (these measures are illustrated in the sketch after this list).
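
A minimal sketch of these summary measures in Python, using NumPy and the standard-library statistics module on a small illustrative sample:

import numpy as np
import statistics

data = [4, 8, 6, 5, 3, 8, 9, 7, 8, 6]

print("mean:", np.mean(data))
print("median:", np.median(data))
print("mode:", statistics.mode(data))          # most frequent value
print("range:", np.max(data) - np.min(data))
print("variance:", np.var(data, ddof=1))       # sample variance
print("std dev:", np.std(data, ddof=1))        # sample standard deviation

q1, q3 = np.percentile(data, [25, 75])
print("IQR:", q3 - q1)                         # interquartile range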

Inferential Statistics (10 Lectures)


• Definition: Inferential statistics involves making predictions or inferences about a
population based on a sample of data drawn from that population.
• Key Concepts:
• Hypothesis Testing:
• Definition: A method used to determine if there is enough evidence to reject
a null hypothesis in favor of an alternative hypothesis.
• Steps:
1. Formulate Hypotheses: Define the null hypothesis (H0) and
alternative hypothesis (H1).
2. Choose Significance Level (α): Commonly used levels are 0.05 or
0.01.
3. Calculate Test Statistic: Based on the sample data.
4. Determine p-value: Compare the p-value with the significance level
to make a decision.
5. Make a Conclusion: Reject the null hypothesis if the p-value is below the significance level; otherwise fail to reject it.
• Multiple Hypothesis Testing:
• Definition: Testing several hypotheses simultaneously, often using
adjustments like the Bonferroni correction to control the overall error rate.
• Parameter Estimation Methods:
• Point Estimation: Estimating an unknown parameter using a single value
(e.g., sample mean for population mean).
• Interval Estimation: Providing a range within which the parameter is
expected to lie, with a certain level of confidence (e.g., confidence intervals).
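
A minimal sketch of a one-sample t-test and an interval estimate with SciPy (the sample values and the hypothesised mean of 50 are illustrative assumptions):

import numpy as np
from scipy import stats

sample = np.array([52, 48, 55, 51, 49, 53, 50, 54])

# H0: population mean = 50, H1: population mean != 50
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)
print("t =", t_stat, "p =", p_value)
if p_value < 0.05:
    print("Reject H0 at the 5% significance level")
else:
    print("Fail to reject H0")

# 95% confidence interval for the population mean (interval estimation)
ci = stats.t.interval(0.95, len(sample) - 1,
                      loc=np.mean(sample), scale=stats.sem(sample))
print("95% CI:", ci)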

Measuring Data Similarity and Dissimilarity


• Definition: Similarity and dissimilarity measures are used to compare data points or objects,
which is essential for clustering, classification, and other data analysis tasks.
• Key Concepts:
• Data Matrix versus Dissimilarity Matrix:
• Data Matrix: Represents data with rows as objects and columns as attributes.
• Dissimilarity Matrix: Represents pairwise dissimilarities between objects,
with values indicating how different two objects are.
• Proximity Measures for Nominal Attributes:
• Definition: Nominal attributes are categorical attributes with no intrinsic
ordering (e.g., color, gender).
• Proximity Measures: Jaccard coefficient, Simple Matching Coefficient
(SMC).
• Proximity Measures for Binary Attributes:
• Definition: Binary attributes take on two values (e.g., 0 or 1).
• Proximity Measures: Hamming distance, Jaccard coefficient for binary data.
• Dissimilarity of Numeric Data:
• Euclidean Distance: The straight-line distance between two points in a
multi-dimensional space.
• Manhattan Distance: The sum of absolute differences between the
coordinates of two points (also known as L1 distance).
• Minkowski Distance: A generalization of Euclidean and Manhattan
distances, parameterized by a value 'p' that determines the specific distance
measure (p=1 for Manhattan, p=2 for Euclidean).
• Proximity Measures for Ordinal Attributes:
• Definition: Ordinal attributes have a clear, ordered relationship between
values (e.g., rankings).
• Proximity Measures: Can use rank correlation coefficients like Spearman's
rank correlation or Kendall's tau.
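
A minimal sketch of the numeric dissimilarity measures above, using SciPy on two hypothetical 3-D points:

from scipy.spatial import distance

a = [1.0, 2.0, 3.0]
b = [4.0, 0.0, 3.0]

print("Euclidean:", distance.euclidean(a, b))           # straight-line (L2) distance
print("Manhattan:", distance.cityblock(a, b))           # sum of absolute differences (L1)
print("Minkowski p=3:", distance.minkowski(a, b, p=3))  # general Lp distance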

Concept of Outlier
• Definition: An outlier is a data point that significantly differs from other observations in a
dataset.
• Types of Outliers:
• Univariate Outliers: Outliers that occur in a single variable.
• Multivariate Outliers: Outliers that occur in a combination of variables, not
apparent when looking at individual variables.
• Contextual Outliers: Outliers that are only considered abnormal in a specific
context (e.g., temperature readings that are normal in summer but outliers in winter).
• Outlier Detection Methods:
• Z-Score Method: Calculates how many standard deviations a data point is from the
mean. Data points with a Z-score beyond a certain threshold (e.g., ±3) are considered
outliers.
• IQR Method: Outliers are identified as data points that fall below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR.
• Machine Learning Methods: Techniques like clustering, isolation forests, and one-
class SVMs can be used to detect outliers in more complex datasets.
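
A minimal sketch of the Z-score and IQR rules on synthetic data with one injected outlier (the values and random seed are illustrative):

import numpy as np

rng = np.random.default_rng(0)
data = np.append(rng.normal(loc=50, scale=5, size=200), 120.0)  # 120 is an injected outlier

# Z-score method: flag points more than 3 standard deviations from the mean
z = (data - data.mean()) / data.std()
print("Z-score outliers:", data[np.abs(z) > 3])

# IQR method: flag points below Q1 - 1.5*IQR or above Q3 + 1.5*IQR
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]
print("IQR outliers:", outliers)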

3. Data Preprocessing
Data Objects and Attribute Types
• What is an Attribute?
• Definition: An attribute (or feature) is a property or characteristic of an object or
data point. In a dataset, attributes are the columns that describe different aspects of
the data objects (rows).
• Types of Attributes:
• Nominal Attributes:
• Definition: Categorical attributes with no inherent order or ranking among
the values.
• Examples: Colors (red, blue, green), gender (male, female).
• Binary Attributes:
• Definition: Attributes that have two possible states or values.
• Types:
• Symmetric Binary: Both outcomes are equally important (e.g., gender recorded as male/female, where neither value carries more weight than the other).
• Asymmetric Binary: One outcome is more significant than the other
(e.g., success/failure, where success is more critical).
• Ordinal Attributes:
• Definition: Categorical attributes with a meaningful order or ranking
between values.
• Examples: Education levels (high school, bachelor's, master's), customer
satisfaction ratings (poor, fair, good, excellent).
• Numeric Attributes:
• Definition: Attributes that are quantifiable and expressible in numbers.
• Types:
• Discrete Attributes: Attributes that take on a countable number of
distinct values.
• Examples: Number of students in a class, number of cars in a
parking lot.
• Continuous Attributes: Attributes that can take on any value within a
range.
• Examples: Temperature, height, weight.
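
A minimal sketch of how these attribute types can be represented in pandas (the column names and values are hypothetical):

import pandas as pd

df = pd.DataFrame({
    "colour": ["red", "blue", "green"],              # nominal
    "passed": [1, 0, 1],                             # binary
    "satisfaction": ["poor", "good", "excellent"],    # ordinal
    "num_children": [0, 2, 1],                        # discrete numeric
    "height_cm": [172.5, 160.2, 181.0],               # continuous numeric
})

# Mark the ordinal column as an ordered categorical so the ranking is explicit
df["satisfaction"] = pd.Categorical(
    df["satisfaction"],
    categories=["poor", "fair", "good", "excellent"],
    ordered=True,
)
print(df.dtypes)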
Data Quality: Why Preprocess the Data?
• Importance of Data Preprocessing:
• Accuracy: Ensures the accuracy and reliability of the analysis by addressing issues
such as missing data, noise, and inconsistencies.
• Efficiency: Reduces the complexity of data, making it easier to process and analyze.
• Consistency: Aligns data from different sources or formats, ensuring that it is
coherent and uniform.
• Improves Model Performance: Clean and well-preprocessed data lead to better
model performance and more accurate predictions.

Data Munging/Wrangling Operations


• Definition: Data munging or wrangling refers to the process of transforming raw data into a
clean, structured format suitable for analysis.
• Common Operations:
• Data Parsing: Converting raw data into a structured format.
• Data Filtering: Removing irrelevant or redundant data.
• Data Aggregation: Summarizing or combining data from multiple sources.
• Data Enrichment: Enhancing data with additional relevant information.

Cleaning Data
• Definition: Data cleaning is the process of identifying and correcting (or removing) errors
and inconsistencies in data to improve its quality.
• Common Data Cleaning Issues:
• Missing Values: Data points where information is absent.
• Handling Methods: Imputation (filling in missing values), deletion, or using
algorithms that can handle missing data.
• Noisy Data: Data that contains errors, inconsistencies, or irrelevant information.
• Types of Noisy Data:
• Duplicate Entries: Multiple records for the same entity.
• Multiple Entries for a Single Entity: Different entries representing
the same entity with slight variations.
• Missing Entries: Partial data missing for certain records.
• NULLs: Missing values represented as NULL.
• Huge Outliers: Data points that are significantly different from other
observations.
• Out-of-Date Data: Data that is no longer accurate or relevant.
• Artificial Entries: Data that is not genuine or was created for testing
purposes.
• Irregular Spacings: Inconsistent spacing within text data.
• Formatting Issues: Different formatting styles used across tables or
columns.
• Extra Whitespace: Unnecessary spaces that can cause parsing issues.
• Irregular Capitalization: Inconsistent use of uppercase and
lowercase letters.
• Inconsistent Delimiters: Different delimiters used to separate data
fields.
• Irregular NULL Format: Inconsistent representation of missing
data.
• Invalid Characters: Characters that do not belong in the dataset.
• Incompatible Datetimes: Different date and time formats that need
standardization.
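
A minimal sketch of a few of these cleaning steps with pandas (the toy records are hypothetical):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["  Alice", "bob ", "Alice", None],
    "age": [25, np.nan, 25, 32],
})

df["name"] = df["name"].str.strip().str.title()   # extra whitespace, irregular capitalisation
df["age"] = df["age"].fillna(df["age"].mean())    # impute missing values with the mean
df = df.drop_duplicates()                         # remove duplicate entries
df = df.dropna(subset=["name"])                   # drop rows where the key field is still NULL
print(df)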

Data Transformation
• Definition: Data transformation involves converting data into a suitable format or structure
for analysis.
• Common Data Transformation Techniques:
• Rescaling: Adjusting the range of data values to a specific scale, often to bring all
variables into the same range.
• Example: Rescaling data to a range of 0 to 1.
• Normalizing: Scaling values onto a common scale, typically so that each feature falls within a fixed range or each sample has unit norm.
• Example: Min-max normalization to the range 0 to 1.
• Binarizing: Converting numerical data into binary form (e.g., 0 or 1).
• Example: Converting a continuous attribute into a binary attribute based on a
threshold.
• Standardizing: Transforming data so that it has a mean of 0 and a standard deviation of 1 (z-score standardization).
• Example: Standardizing features measured on different scales so they contribute equally to a model.
• Label Encoding: Converting categorical attributes into numerical form by assigning
a unique integer to each category.
• One-Hot Encoding: Converting categorical attributes into binary vectors where each
category is represented by a binary variable (0 or 1).
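
A minimal sketch of these transformations with scikit-learn's preprocessing module (the toy values and city names are hypothetical):

import numpy as np
from sklearn.preprocessing import (MinMaxScaler, StandardScaler, Binarizer,
                                   LabelEncoder, OneHotEncoder)

X = np.array([[10.0], [20.0], [30.0], [40.0]])

print(MinMaxScaler().fit_transform(X).ravel())           # rescaling to the range [0, 1]
print(StandardScaler().fit_transform(X).ravel())         # standardizing (z-scores)
print(Binarizer(threshold=25).fit_transform(X).ravel())  # binarizing around a threshold

cities = np.array(["Pune", "Mumbai", "Pune", "Nashik"])
print(LabelEncoder().fit_transform(cities))              # label encoding
print(OneHotEncoder().fit_transform(cities.reshape(-1, 1)).toarray())  # one-hot encoding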

Data Reduction
• Definition: Data reduction involves reducing the volume of data while maintaining its
integrity and meaning, making it easier to analyze.
• Techniques:
• Dimensionality Reduction: Reducing the number of attributes or features while
retaining essential information (e.g., PCA, LDA).
• Numerosity Reduction: Reducing the number of data points or records through
techniques like clustering, sampling, or aggregation.
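
A minimal sketch of dimensionality reduction with PCA in scikit-learn, using the Iris dataset as an example:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)           # 150 samples, 4 features
X_reduced = PCA(n_components=2).fit_transform(X)

print(X.shape, "->", X_reduced.shape)       # (150, 4) -> (150, 2)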

Data Discretization
• Definition: Data discretization involves converting continuous data into discrete intervals or
categories.
• Importance: Useful for transforming continuous attributes into categorical attributes, which
can simplify analysis and improve model performance.
• Methods:
• Binning: Dividing data into intervals, or "bins," and assigning a categorical label to
each bin.
• Histogram Analysis: Using histograms to define intervals based on data distribution.
• Cluster Analysis: Grouping similar data points and assigning them to discrete
categories.
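
A minimal sketch of binning a continuous attribute with pandas (the ages and bin edges are hypothetical):

import pandas as pd

ages = pd.Series([5, 17, 24, 33, 41, 58, 72])
bins = pd.cut(ages, bins=[0, 18, 35, 60, 100],
              labels=["child", "young adult", "adult", "senior"])
print(bins)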

4. Data Visualization
Introduction to Exploratory Data Analysis (EDA)
• Definition: EDA is an approach to analyzing data sets to summarize their main
characteristics, often using visual methods.
• Purpose of EDA:
• Identifying Patterns: Detecting trends, correlations, and relationships in data.
• Spotting Anomalies: Finding outliers or irregularities in the data.
• Checking Assumptions: Verifying the validity of assumptions made about the data.
• Guiding Further Analysis: Informing the choice of statistical models or algorithms
to apply.

Data Visualization and Visual Encoding


• Definition: Data visualization is the graphical representation of data to make complex data
more accessible and understandable.
• Visual Encoding: The process of mapping data attributes (e.g., numbers, categories) to
visual elements like color, shape, size, or position in a chart.
• Examples of Visual Encoding:
• Position: The location of data points on a plot (e.g., x and y axes in a scatter
plot).
• Color: Used to distinguish different categories or indicate data intensity (e.g.,
heat maps).
• Size: Represents the magnitude of data points (e.g., bubble size in bubble
plots).
• Shape: Differentiates between categories (e.g., different marker shapes in a
scatter plot).
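
A minimal sketch of visual encoding with Matplotlib, where position, colour, and size each carry one attribute of a randomly generated dataset:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 30)          # encoded as horizontal position
y = rng.uniform(0, 10, 30)          # encoded as vertical position
value = rng.uniform(0, 1, 30)       # encoded as colour intensity
size = rng.uniform(20, 300, 30)     # encoded as marker size

plt.scatter(x, y, c=value, s=size, cmap="viridis", alpha=0.7)
plt.colorbar(label="value")
plt.xlabel("x")
plt.ylabel("y")
plt.show()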

Data Visualization Libraries


• Definition: Libraries or software packages that provide tools and functions for creating
visual representations of data.
• Popular Libraries:
• Matplotlib: A widely used Python library for creating static, animated, and
interactive visualizations.
• Seaborn: Built on top of Matplotlib, Seaborn provides a high-level interface for
creating attractive and informative statistical graphics.
• Plotly: An interactive graphing library that enables complex, web-based
visualizations.
• ggplot2: A popular data visualization package in R, based on the Grammar of
Graphics.
• D3.js: A JavaScript library for producing dynamic, interactive data visualizations in
web browsers.
Basic Data Visualization Tools
• Histograms:
• Definition: A graphical representation of the distribution of a dataset. It shows the
frequency of data points in specified ranges (bins).
• Use: Ideal for displaying the distribution of a single continuous variable.
• Bar Charts/Graphs:
• Definition: A chart that presents categorical data with rectangular bars. The length of
each bar is proportional to the value it represents.
• Use: Best for comparing the frequency or count of different categories.
• Scatter Plots:
• Definition: A plot that shows the relationship between two numerical variables. Each
point represents an observation in the dataset.
• Use: Useful for identifying correlations or patterns between variables.
• Line Charts:
• Definition: A type of chart that displays data points connected by a line. It shows
trends over time or ordered categories.
• Use: Commonly used to track changes over time.
• Area Plots:
• Definition: Similar to line charts, but the area under the line is filled with color or
shading.
• Use: Good for visualizing cumulative data or comparing multiple variables.
• Pie Charts:
• Definition: A circular chart divided into sectors, each representing a proportion of
the whole.
• Use: Ideal for showing the relative proportions of categories in a dataset.
• Donut Charts:
• Definition: A variation of the pie chart with a central hole, often used to provide
additional information in the center.
• Use: Similar to pie charts but with an added aesthetic appeal.
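
A minimal sketch of two of these basic charts with Matplotlib (the scores and category counts are hypothetical):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
scores = rng.normal(70, 10, 200)        # a continuous variable
categories = ["A", "B", "C", "D"]
counts = [23, 45, 12, 30]               # counts per category

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(scores, bins=20)               # histogram: distribution of scores
ax1.set_title("Histogram")
ax2.bar(categories, counts)             # bar chart: comparing categories
ax2.set_title("Bar chart")
plt.show()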

Specialized Data Visualization Tools


• Boxplots:
• Definition: A graphical representation of the distribution of a dataset based on five
summary statistics: minimum, first quartile (Q1), median, third quartile (Q3), and
maximum.
• Use: Effective for identifying outliers and understanding the spread and skewness of
data.
• Bubble Plots:
• Definition: A variation of a scatter plot where each point is replaced by a bubble, and
the size of the bubble represents a third variable.
• Use: Useful for visualizing three dimensions of data on a two-dimensional plane.
• Heat Maps:
• Definition: A graphical representation of data where individual values are
represented by colors.
• Use: Ideal for displaying the intensity or density of data points across a matrix.
• Dendrogram:
• Definition: A tree-like diagram used to illustrate the arrangement of clusters
produced by hierarchical clustering.
• Use: Useful for visualizing the structure and hierarchy of data clusters.
• Venn Diagram:
• Definition: A diagram that shows all possible logical relations between a finite
collection of sets.
• Use: Effective for illustrating set relationships, such as intersections and unions.
• Treemap:
• Definition: A hierarchical structure represented as nested rectangles, where each
rectangle's size is proportional to the data value.
• Use: Useful for visualizing large amounts of hierarchical data.
• 3D Scatter Plots:
• Definition: An extension of the scatter plot into three dimensions, where each point
is defined by three numerical coordinates.
• Use: Ideal for visualizing the relationship between three continuous variables.
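
A minimal sketch of a boxplot and a heat map with Seaborn (the data are randomly generated for illustration):

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
data = rng.normal(0, 1, 100)

sns.boxplot(x=data)                                # five-number summary plus outliers
plt.show()

matrix = rng.uniform(0, 1, size=(5, 5))
sns.heatmap(matrix, annot=True, cmap="coolwarm")   # colour encodes each cell's value
plt.show()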

Advanced Data Visualization Tools - Wordclouds


• Definition: A visual representation of text data where the size of each word reflects its
frequency or importance.
• Use: Effective for quickly identifying the most prominent words or themes in a text dataset.
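
A minimal sketch using the third-party wordcloud package (assumed installed via pip install wordcloud); the sample text is hypothetical:

import matplotlib.pyplot as plt
from wordcloud import WordCloud

text = ("data science statistics machine learning data analysis "
        "visualization data model data insight")

wc = WordCloud(width=600, height=400, background_color="white").generate(text)
plt.imshow(wc, interpolation="bilinear")   # word size reflects frequency in the text
plt.axis("off")
plt.show()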

Visualization of Geospatial Data


• Definition: The process of visualizing data that includes geographical or spatial
components.
• Tools and Techniques:
• Choropleth Maps: Maps where areas are shaded or patterned in proportion to the
data value.
• Point Maps: Maps that represent individual data points as symbols, such as dots.
• Heat Maps: Geographical maps that use color to represent the density of data points
in a given area.
• Interactive Maps: Maps that allow users to interact with data by zooming, clicking,
or filtering.
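
A minimal sketch of an interactive point map with the folium package (assumed installed via pip install folium); the coordinates are illustrative:

import folium

m = folium.Map(location=[18.52, 73.86], zoom_start=11)   # map centred on Pune
folium.CircleMarker(location=[18.53, 73.85], radius=8,
                    popup="Sample point", color="blue").add_to(m)
m.save("map.html")                                        # open the saved file in a browser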

Data Visualization Types


• Categorical Data Visualization:
• Tools: Bar charts, pie charts, donut charts.
• Purpose: Comparing different categories or understanding the distribution of
categorical data.
• Numerical Data Visualization:
• Tools: Histograms, boxplots, scatter plots.
• Purpose: Understanding the distribution, trends, and relationships between
numerical variables.
• Hierarchical Data Visualization:
• Tools: Treemaps, dendrograms.
• Purpose: Displaying the structure and relationships within hierarchical datasets.
• Network Data Visualization:
• Tools: Network graphs, node-link diagrams.
• Purpose: Visualizing relationships and interactions between entities within a
network.
