Chapter 1+ Python Basics

Data Science

Chapter 1
1. What is Data Science?
Definition:
Data Science is an interdisciplinary field that uses data to extract insights, build models, and
support decision-making.
It combines three key elements:
o Statistics & Mathematics → to understand patterns in data.
o Computer Science (Programming) → to process and analyze data using algorithms.
o Domain Knowledge → to apply findings to real-world problems.
Examples:
o Healthcare: Using data to predict diseases.
o Netflix: Recommending movies based on your watch history.
o Banks: Detecting fraudulent transactions.

2. Why Data Science?


In the modern world, huge amounts of data are generated every second (social media, online
shopping, healthcare, finance, etc.).
Raw data has little value until it is converted into meaningful insights.
Data Science helps in decision-making, predictions, and automation.
Applications of Data Science:
Netflix: Movie/TV show recommendations.
Facebook & Instagram: Friend suggestions and content ranking.
Banks: Fraud detection and credit scoring.
Healthcare: Disease risk prediction and drug discovery.

3. The Data Science Workflow


The typical process followed by a data scientist is:
o Define the Problem → What are we trying to solve?
o Collect Data → Gather information from databases, APIs, sensors, etc.
o Clean & Prepare Data → Handle missing values, remove duplicates, fix errors.
o Explore Data (EDA – Exploratory Data Analysis) → Use statistics and visualizations
to understand data.
o Build Models (Machine Learning/AI) → Train algorithms to make predictions.
o Evaluate & Interpret Results → Check accuracy, performance, and meaning.
o Communicate Insights → Present findings through reports, dashboards, or
visualizations.
Think of it as a cycle – data science is an iterative process.
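As a rough illustration of the clean, explore, and communicate steps, the sketch below works on a small made-up list of records. The data and the names raw_records, cleaned, and average are assumptions for illustration, not part of the original notes.

```python
from statistics import mean

# Hypothetical raw data: (sensor_id, reading) pairs.
# None marks a missing value, and the record for sensor "a" appears twice.
raw_records = [("a", 21.5), ("b", None), ("c", 19.0), ("a", 21.5), ("d", 23.0)]

# Clean & prepare: drop records with missing readings,
# then remove exact duplicates while keeping the original order.
present = [record for record in raw_records if record[1] is not None]
cleaned = list(dict.fromkeys(present))

# Explore: compute a simple summary statistic.
average = mean(reading for _, reading in cleaned)

# Communicate: report the result in plain language.
print(f"{len(cleaned)} clean records, average reading {average:.2f}")
```

In a real project each step would be far larger, and the results of later steps often send you back to earlier ones.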

4. Case Study: DataSciencester


To make these ideas practical, Joel Grus (in Data Science from Scratch) introduced a fictional
social network called DataSciencester.
(a) Representing Users
Each user is stored in Python as a dictionary with an id and name.
users = [
    {"id": 0, "name": "Hero"},
    {"id": 1, "name": "Dunn"},
    {"id": 2, "name": "Sue"},
]
Here, id is unique, making it easy to track users.
(b) Representing Friendships
Friendships are stored as pairs of user IDs.
friendships = [
    (0, 1),
    (0, 2),
    (1, 2),
]
(0, 1) means user 0 (Hero) is friends with user 1 (Dunn).
(c) Building a Friend Network
We can attach a list of friends to each user:
for user in users:
    user["friends"] = []  # start with empty list

for i, j in friendships:
    # this works because each user's id equals its index in users
    users[i]["friends"].append(users[j])
    users[j]["friends"].append(users[i])
Now:
Hero’s friends → Dunn, Sue
Dunn’s friends → Hero, Sue
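To check the result, the steps of this case study so far can be put together in one runnable sketch. The friend_names dictionary is a helper introduced here for illustration. It collects names only, because each friend is itself a user dictionary that refers back to its own friends.

```python
users = [
    {"id": 0, "name": "Hero"},
    {"id": 1, "name": "Dunn"},
    {"id": 2, "name": "Sue"},
]
friendships = [(0, 1), (0, 2), (1, 2)]

# Give every user an empty friends list.
for user in users:
    user["friends"] = []

# Each pair is mutual, so add the friendship in both directions.
# This relies on each user's id matching its position in the list.
for i, j in friendships:
    users[i]["friends"].append(users[j])
    users[j]["friends"].append(users[i])

# Hypothetical helper: map each user's name to the names of their friends.
friend_names = {u["name"]: [f["name"] for f in u["friends"]] for u in users}
for name, friends in friend_names.items():
    print(name, "->", ", ".join(friends))
```

This prints one line per user: Hero -> Dunn, Sue; Dunn -> Hero, Sue; Sue -> Hero, Dunn.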
(d) Analyzing Connections
Number of Friends:
def number_of_friends(user):
    return len(user["friends"])
Average Connections:
total_connections = sum(number_of_friends(user) for user in users)
avg_connections = total_connections / len(users)
This tells us how connected people are on average.
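As a worked check for the three-user example: every friendship adds one connection to each of its two users, so the total is twice the number of friendships (2 × 3 = 6) and the average is 6 / 3 = 2.0. The sketch below counts directly from the friendship pairs; friend_counts and num_users are names introduced here for illustration.

```python
friendships = [(0, 1), (0, 2), (1, 2)]
num_users = 3

# Count how many friendships each user id appears in.
friend_counts = [0] * num_users
for i, j in friendships:
    friend_counts[i] += 1
    friend_counts[j] += 1

total_connections = sum(friend_counts)
avg_connections = total_connections / num_users

print(total_connections, avg_connections)  # 6 2.0
```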
(e) Finding Popular Users
We can sort users by their number of friends:
num_friends_by_id = [(user["id"], number_of_friends(user)) for user in users]
print(sorted(num_friends_by_id, key=lambda x: x[1], reverse=True))
This identifies influencers in the network.
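In the three-user example every user has two friends, so the ranking is a three-way tie. The sketch below uses made-up (id, friend count) pairs, which are an assumption and not from the case study, to show how the sort behaves when the counts differ.

```python
# Hypothetical (user_id, number_of_friends) pairs.
num_friends_by_id = [(0, 2), (1, 3), (2, 3), (3, 1)]

# Sort by the second item of each pair, largest first.
ranked = sorted(num_friends_by_id, key=lambda pair: pair[1], reverse=True)

print(ranked)  # [(1, 3), (2, 3), (0, 2), (3, 1)]
```

Python's sort is stable, so users with the same count (here ids 1 and 2) keep their original relative order.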
(f) Friend-of-a-Friend (FOAF)
We can recommend new friends based on mutual friends:
def friends_of_friend_ids(user):
    return [foaf["id"]
            for friend in user["friends"]
            for foaf in friend["friends"]]
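Example: If Hero is friends with Dunn, and Dunn is friends with Sue, then Sue is a friend-of-a-friend of Hero.
Note that this simple version can return the same ID more than once, and it also includes the user's own ID and the IDs of people who are already friends. For Hero it returns [0, 2, 0, 1].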
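One possible refinement is to count mutual friends while skipping the user and anyone they are already friends with. The sketch below is a minimal version of that idea using collections.Counter. The four-user network, the friends_by_id dictionary, and the function name mutual_friend_counts are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical network stored as id -> list of friend ids.
# Edges: 0-1, 0-2, 1-3, 2-3 (users 0 and 3 are not friends).
friends_by_id = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2],
}

def mutual_friend_counts(user_id):
    """Count mutual friends with each user who is not already a friend."""
    return Counter(
        foaf_id
        for friend_id in friends_by_id[user_id]
        for foaf_id in friends_by_id[friend_id]
        if foaf_id != user_id                       # skip the user
        and foaf_id not in friends_by_id[user_id]   # skip existing friends
    )

print(mutual_friend_counts(0))  # Counter({3: 2})
```

Here user 3 is suggested to user 0 with a count of 2, because they share two mutual friends (users 1 and 2).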

5. Why This Case Study is Important


Teaches data representation → users and relationships stored in Python.
Demonstrates basic analysis → counting, averaging, ranking.
Shows real-world relevance → friend suggestions, influencer ranking, community detection.
This is a mini version of Facebook or LinkedIn.

Roles in Data Science


Data science is teamwork where different professionals handle different parts of the process.
Data Scientist
Analyzes data to find patterns, insights, and predictions.
Builds machine learning models.
Communicates results to decision-makers.
Example: Predicting which customers are likely to leave a telecom company.
Data Engineer
Designs and manages data pipelines, databases, and storage systems.
Ensures data is clean, accessible, and reliable for analysis.
Example: Building the system that collects streaming data from YouTube users.
Machine Learning Engineer
Focuses on deploying ML models into production.
Optimizes algorithms for speed and accuracy.
Example: Implementing recommendation models that run live on Netflix.
Business Analyst
Acts as a bridge between technical teams and business teams.
Converts insights into strategies and decisions.
Example: Using sales data to advise a retail store on stocking inventory.

Skills Required for Data Science


Programming Skills
Python, R, SQL for data analysis and automation.
Libraries: NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn.
Mathematics & Statistics
Probability, hypothesis testing, linear algebra.
Understanding distributions, correlation, regression.
Data Visualization
Tools: Tableau, Power BI, Matplotlib, Seaborn.
Skill: Present results in charts and graphs for better storytelling.
Soft Skills
Problem-Solving: Breaking down complex problems into steps.
Communication: Explaining results to non-technical people.
Storytelling with Data: Turning numbers into actionable insights.
Example: Presenting a fraud detection system to bank managers in simple terms.

Scope of Data Science


Data Science is one of the fastest-growing fields with applications across industries.
o Healthcare: Predicting diseases, drug discovery.
o Finance: Fraud detection, credit scoring.
o Retail & E-commerce: Customer segmentation, product recommendation.
o Social Media: News feed ranking, friend recommendations.
o Manufacturing: Predictive maintenance of machines.

Class Activity
Task: Pick one Pakistani company (for example, Careem, Daraz, Jazz, HBL).
Discuss in groups:
How does it already use data science?
If not, how could it use data science to improve services?
Example: Careem uses data science for estimating ride fares, matching drivers with passengers,
and predicting demand in different areas.

Python Basics – Syntax, Data Types, and Loops


Python Setup
Install Python from python.org, or install Anaconda, which includes Python, Jupyter Notebook,
and common data science libraries.
o IDEs (Integrated Development Environments):
o Jupyter Notebook: Best for data science.
o VS Code: Lightweight and widely used.
o PyCharm: Professional IDE.

Basic Python Syntax

Print Statement
print("Hello, Data Science")
Output:
Hello, Data Science
Variables
Variables are used to store values.
name = "Aleeza" # string
age = 21 # integer
gpa = 3.7 # float
is_student = True # boolean
Rules for variable names:
Must start with a letter or underscore.
Cannot start with a number.
Case-sensitive (Name ≠ name).
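The naming rules for variables (start with a letter or underscore, never with a number, case-sensitive) can be tried out with the built-in string method isidentifier(). The candidate names below are made up for illustration.

```python
# Check a few candidate variable names against Python's naming rules.
candidates = ["name", "_name", "2name", "gpa_2024"]
results = {c: c.isidentifier() for c in candidates}

for candidate, is_valid in results.items():
    print(candidate, is_valid)

# Names are case-sensitive: these are two different variables.
Name = "Aleeza"
name = "Sara"
print(Name == name)  # False
```

Only "2name" is rejected, because it starts with a number. Note that isidentifier() checks the form of a name only; reserved words such as class also pass this check but still cannot be used as variable names.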

Data Types in Python

Text Type
str → string (text in quotes).
message = "Hello World"

Boolean Type
bool → True or False
is_active = True

Numeric Types
int → integers (e.g., 10)
float → decimal numbers (e.g., 3.14)
x = 10 # int
y = 3.14 # float

Collections
List → Ordered, changeable.
fruits = ["apple", "banana", "cherry"]
Tuple → Ordered, unchangeable.
coordinates = (4, 5)
Dictionary → Key-value pairs.
student = {"name": "Aleeza", "age": 21}
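The difference between changeable and unchangeable collections can be seen by trying to modify each one. This is a small sketch reusing the example values above; the tuple_changed flag is a name introduced here for illustration.

```python
fruits = ["apple", "banana", "cherry"]   # list
coordinates = (4, 5)                     # tuple
student = {"name": "Aleeza", "age": 21}  # dictionary

# type() reports the data type of a value.
print(type(fruits).__name__, type(coordinates).__name__, type(student).__name__)

# Lists and dictionaries can be changed in place.
fruits[0] = "mango"
student["age"] = 22

# Tuples cannot: item assignment raises a TypeError.
tuple_changed = True
try:
    coordinates[0] = 10
except TypeError:
    tuple_changed = False

print(fruits, student, coordinates, tuple_changed)
```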

Loops in Python
For Loop
Used when the number of iterations is known.
for i in range(5):
    print(i)
Output:
0
1
2
3
4
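A for loop can also walk through a collection directly instead of a range of numbers. The sketch below reuses a fruits list and the built-in enumerate() to get each position alongside its item; the seen list is added here only to record what the loop visited.

```python
fruits = ["apple", "banana", "cherry"]

seen = []
for index, fruit in enumerate(fruits):
    # enumerate yields (position, item) pairs, starting at 0
    print(index, fruit)
    seen.append((index, fruit))
```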
While Loop
Used when the number of iterations is not fixed.
i = 0
while i < 5:
    print(i)
    i += 1
Output:
0
1
2
3
4
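A while loop is most useful when the number of iterations is not known in advance. In the sketch below (the starting value of 100 is an arbitrary assumption), the loop keeps halving a number until it is no longer greater than 1 and counts how many steps that took.

```python
value = 100
steps = 0

# Keep halving until the value is no longer greater than 1.
while value > 1:
    value = value / 2
    steps += 1

print(steps, value)  # 7 0.78125
```

Make sure the condition eventually becomes false; if value were never updated inside the loop, it would run forever.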
