Data Analytics and
Reporting: An Introduction
Welcome to the World of Data!
Understanding the Importance of Data
In today's digital age, data is the new oil. It's the raw material that fuels
innovation, decision-making, and business growth. Data Analytics is the process
of examining, cleaning, transforming, and modeling data to discover useful
information, draw conclusions, and support decision-making.
Why Python?
Python has emerged as the language of choice for data scientists and analysts
due to its simplicity, readability, and powerful libraries. It's versatile, making it
suitable for both beginners and experienced programmers.
Python: A Brief Overview
● What is Python?
a. A high-level, interpreted programming language
b. Known for its readability and simplicity
c. Widely used in data science, machine learning, web development, and
more.
● History of Python:
a. Created by Guido van Rossum in the late 1980s
b. Named after the British comedy group Monty Python
c. Initially designed for scripting and automation
d. Grew in popularity due to its focus on code readability and efficiency
● Purpose of Python in Data Analytics:
a. Data manipulation and cleaning
b. Exploratory data analysis (EDA)
c. Data visualization
d. Machine learning and model building
e. Statistical analysis
Data Types in Python
Data types define the kind of data a variable can hold. Python supports various
data types:
● Numeric:
a. int: Integer values (e.g., 42, -10)
b. float: Floating-point numbers (e.g., 3.14, 2.5)
c. complex: Complex numbers (e.g., 2+3j)
● Text:
a. str: Strings (e.g., "Hello", 'World')
● Boolean:
a. bool: Boolean values (True or False)
● Sequence:
a. list: Ordered collection of items (mutable)
b. tuple: Ordered collection of items (immutable)
● Mapping:
a. dict: Collection of key-value pairs (insertion-ordered since Python 3.7)
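The types above can be seen in a short snippet (the variable names here are just illustrative):

```python
# A few of Python's built-in data types in action
age = 42                               # int
pi = 3.14                              # float
z = 2 + 3j                             # complex
greeting = "Hello"                     # str
is_valid = True                        # bool
scores = [85, 92, 78]                  # list: ordered, mutable
point = (10, 20)                       # tuple: ordered, immutable
person = {"name": "Alice", "age": 25}  # dict: key-value pairs

scores.append(100)  # lists can grow in place; tuples cannot
print(type(age).__name__, type(person).__name__, scores)
```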
Pandas: Your Data Analysis Toolkit
Pandas is a powerful Python library built on top of NumPy. It provides
high-performance, easy-to-use data structures and data analysis tools.
Installation:
1. Open your terminal or command prompt.
2. Type the following command and press Enter:
Bash
pip install pandas
Importing Pandas:
To use Pandas in your Python code, import it as follows:
Python
import pandas as pd
DataFrame: The Core Data Structure
A DataFrame is a two-dimensional labeled data structure with columns of
potentially different types. It is similar to a spreadsheet or SQL table.
Creating a DataFrame:
Python
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 28],
'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
print(df)
In the next session, we will delve deeper into Pandas, exploring various
data manipulation techniques and visualization capabilities.
Remember: Practice is key to mastering Python and Pandas. Experiment with
different datasets and explore the vast functionalities offered by these libraries.
Unit-01: Introduction to
Data Analytics and
Reporting
Lecture 1: What is Data Analytics?
Data Analytics is the process of examining large data sets to discover trends
and patterns. It involves collecting, cleaning, transforming, and analyzing data to
extract meaningful insights. These insights can be used to make informed
decisions, identify opportunities, and solve problems.
Real-world example: A retailer might use data analytics to analyze customer
purchasing behavior to determine which products to promote or to identify trends
in customer preferences.
Lecture 2: Data Analysis and Data Processing
Data Analysis is the process of inspecting, cleansing, transforming, and
modeling data with the goal of discovering useful information, informing
conclusions, and supporting decision-making.
Data Processing is the conversion of raw data into a more organized format
suitable for analysis. This involves tasks like data cleaning, transformation, and
integration.
Real-world example: A telecom company might process customer call records
to analyze call patterns, identify network congestion areas, and improve service
quality.
Lecture 3: Types of Analysis
● Descriptive Analytics: Summarizes historical data to understand what
happened.
○ Examples: Sales reports, customer demographics
● Diagnostic Analytics: Explores the reasons behind past occurrences.
○ Examples: Root cause analysis of product failures
● Predictive Analytics: Uses historical data to predict future outcomes.
○ Examples: Customer churn prediction, demand forecasting
● Prescriptive Analytics: Recommends actions based on predictive models.
○ Examples: Product recommendations, optimized pricing strategies
Lecture 4: Difference Between Data Science and Data Analysis
Data Science is a broader field that encompasses data analysis, machine
learning, and data visualization. It focuses on extracting insights from data to
solve complex problems.
Data Analysis is a subset of data science that focuses on exploring and
understanding data to uncover patterns and trends.
Lecture 5: Different Data Preprocessing Techniques
Data Preprocessing is the process of transforming raw data into a clean and
structured format suitable for analysis. Techniques include:
● Data Cleaning: Handling missing values, outliers, and inconsistencies.
● Data Integration: Combining data from multiple sources.
● Data Transformation: Normalization, standardization, and aggregation.
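The cleaning and transformation steps above can be sketched with Pandas on a tiny, hypothetical dataset (the column names and values are invented for illustration):

```python
import pandas as pd

# Hypothetical dataset: one missing value, columns on different scales
df = pd.DataFrame({"height_cm": [170.0, None, 182.0, 166.0],
                   "weight_kg": [65.0, 80.0, 95.0, 58.0]})

# Data cleaning: fill the missing height with the column mean
df["height_cm"] = df["height_cm"].fillna(df["height_cm"].mean())

# Data transformation: min-max normalization to the [0, 1] range
normalized = (df - df.min()) / (df.max() - df.min())
print(normalized)
```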
Lecture 6: Understanding Reporting and Use of Different
Tools
Reporting is the process of communicating insights derived from data analysis
to stakeholders. Effective reporting involves clear visualization and concise
communication. Tools:
● Business Intelligence (BI) tools: Power BI, Tableau, IBM Cognos.
● Data Visualization tools: Excel, Python libraries (Matplotlib, Seaborn)
● Statistical analysis: Pandas
Real-world example: A marketing team might use a BI tool to create a
dashboard showing sales trends, customer demographics, and campaign
performance.
Unit-02: Data Analysis
Using Pandas
Pandas: A Powerful Tool for Data Manipulation
Pandas is a Python library specifically designed for data manipulation and
analysis. It provides high-performance, easy-to-use data structures and data
analysis tools. Think of it as a spreadsheet on steroids, offering much more
flexibility and capabilities.
Key Features of Pandas:
● Data Structures: Pandas introduces two primary data structures:
○ Series: One-dimensional labeled array holding any data type.
○ DataFrame: Two-dimensional labeled data structure with columns of
potentially different types.
● Data Import/Export: Easily handles various file formats like CSV, Excel,
JSON, SQL databases, and more.
● Data Cleaning and Preparation: Offers functions to handle missing
values, duplicates, outliers, and data normalization.
● Data Analysis: Provides tools for statistical calculations, data aggregation,
and exploratory data analysis.
● Time Series: Excellent support for working with time-series data.
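As a quick taste of the time-series support mentioned above, here is a minimal sketch using invented daily sales figures:

```python
import pandas as pd

# A hypothetical week of daily sales, indexed by date
dates = pd.date_range("2024-01-01", periods=7, freq="D")
sales = pd.Series([100, 120, 90, 150, 130, 170, 160], index=dates)

# Downsample the daily series to two-day averages
two_day_avg = sales.resample("2D").mean()
print(two_day_avg)
```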
Why Pandas is Popular:
● Efficiency: It's optimized for performance on large datasets.
● Flexibility: Handles diverse data types and structures.
● Ease of Use: Intuitive syntax and clear documentation.
● Integration: Works seamlessly with other Python libraries like NumPy,
Matplotlib, and Scikit-learn.
Lecture 7: Types of Data and Different Sources of Data
● Structured Data: Organized in a predefined format (e.g., databases, CSV
files)
● Unstructured Data: No predefined format (e.g., text, images, audio)
● Semi-Structured Data: Hybrid of structured and unstructured (e.g., JSON,
XML)
Data Sources:
● Databases (SQL, NoSQL)
● Files (CSV, Excel, JSON)
● APIs (REST, GraphQL)
● Web scraping
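Pandas can load both structured and semi-structured sources directly into a DataFrame. A minimal sketch, using in-memory text as a stand-in for real files or API responses:

```python
import pandas as pd
from io import StringIO

# Structured data: CSV text (stand-in for a file or database export)
csv_text = StringIO("id,value\n1,10\n2,20\n")
df_csv = pd.read_csv(csv_text)

# Semi-structured data: JSON, as many REST APIs return it
json_text = StringIO('[{"id": 1, "value": 10}, {"id": 2, "value": 20}]')
df_json = pd.read_json(json_text)

print(df_json)
```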
Lecture 8: Overview of Pandas Library
Pandas is a Python library for data manipulation and analysis. It provides
high-performance data structures and data analysis tools.
Lecture 9: Data Structures in Pandas: Series and DataFrame
● Series: One-dimensional labeled array
● DataFrame: Two-dimensional labeled data structure
Python
import pandas as pd
# Create a Series
data = [10, 20, 30, 40]
series = pd.Series(data)
print(series)
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
print(df)
Lecture 10: Importing and Exporting Data Using Pandas
Python
import pandas as pd
# Import CSV data
df = pd.read_csv('data.csv')
# Export to CSV
df.to_csv('output.csv', index=False)
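A small round-trip example ties the two together (the DataFrame and file path here are made up; a temporary directory stands in for your working folder):

```python
import os
import tempfile
import pandas as pd

# A small, hypothetical DataFrame written to CSV and read back
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

path = os.path.join(tempfile.mkdtemp(), "output.csv")
df.to_csv(path, index=False)  # index=False keeps row labels out of the file
df_back = pd.read_csv(path)

print(df_back.equals(df))  # True: the round trip preserves the data
```

Without `index=False`, the row index would be written as an extra unnamed column and reappear on re-import.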
Lecture 11: Data Cleaning and Preparation with Pandas
● Handling missing values: dropna(), fillna()
● Removing duplicates
● Outlier detection and treatment
Python
import pandas as pd
import numpy as np
# Sample data with a missing value and a duplicate row
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Bob'],
                   'Age': [25, np.nan, np.nan]})
# Handle missing values
df.fillna(0, inplace=True)  # Fill missing values with 0
# Remove duplicates
df.drop_duplicates(inplace=True)
print(df)
Lecture 12: Handling Missing Data: dropna(), fillna(), and
Interpolation
● dropna(): Removes rows or columns with missing values
● fillna(): Fills missing values with specified values or methods
● Interpolation: Estimates missing values based on surrounding data
Python
import pandas as pd
# Sample numeric data with gaps
df = pd.DataFrame({'Temp': [20.0, None, 24.0, None, 28.0]})
# Interpolation: estimate missing values from surrounding points
print(df.interpolate())
# Fill missing values with the column mean
print(df.fillna(df.mean()))
# Drop any rows that still contain missing values
print(df.dropna())
Lecture 13: Data Transformation and Manipulation: Sorting,
Filtering, and Grouping
● Sorting: sort_values()
● Filtering: Boolean indexing
● Grouping: groupby()
Python
import pandas as pd
# Sample data (hypothetical names and ages)
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Carol', 'Dan'],
                   'Age': [25, 40, 35, 28],
                   'Gender': ['F', 'M', 'F', 'M']})
# Sort by age (descending)
df.sort_values('Age', ascending=False, inplace=True)
# Filter for age greater than 30
filtered_df = df[df['Age'] > 30]
# Group by gender and calculate mean age
grouped_df = df.groupby('Gender')['Age'].mean()
print(grouped_df)
Lecture 14: Descriptive Statistics with Pandas
● Count, mean, median, mode, standard deviation, min, max, quartiles
● Correlation and covariance
Python
import pandas as pd
# Sample numeric data (hypothetical)
df = pd.DataFrame({'Column1': [1, 2, 3, 4, 5],
                   'Column2': [2, 4, 6, 8, 10]})
# Calculate summary statistics
print(df.describe())
# Calculate correlation between columns
correlation = df['Column1'].corr(df['Column2'])
print(correlation)