Sec Assignment

The document discusses the differences and similarities between Python and Excel for data analysis, highlighting Python's advantages for large datasets and automation. It covers the installation of Anaconda and Jupyter Notebook, the importance of data types, conditional statements, and the role of Pandas in data manipulation. Additionally, it explains machine learning concepts, data transformation techniques, and provides insights on handling duplicates and missing values in datasets.

SECTION A-

Q1. Explain the differences and similarities between Python and Excel. Provide
real-life scenarios where Python is preferred over Excel and justify why.
ANS. Python and Excel are widely used for data analysis, but they serve different purposes:
FEATURE         | PYTHON                             | EXCEL
AUTOMATION      | Supports scripting (Pandas, NumPy) | Requires VBA or Power Query
SCALABILITY     | Handles large datasets efficiently | Slows down with large data
VISUALIZATION   | Uses Matplotlib, Seaborn           | Built-in charts
DATA PROCESSING | More flexible with complex logic   | Limited for large-scale operations

Excel is preferred for budget planning, small-scale reports, and quick data visualization.
Python is preferred for large-scale data analysis, predictive modeling, and automation.

Q2. Describe the process of installing Anaconda and launching Jupyter Notebook. Explain how to create, save, and run Python code using Jupyter Notebook.
 Go to the Anaconda website and download the installer for your operating system
(Windows, macOS, or Linux).
 Choose the Python 3.x version.
 Run the downloaded installer and follow the installation instructions.
 During installation, select the option to Add Anaconda to your system PATH for
easier command-line access.
 Once installed, open Anaconda Navigator.
 In Anaconda Navigator, click Launch under Jupyter Notebook to start it in your browser.

Create a New Notebook:

 In the Jupyter Notebook interface, click New (top-right corner), then select Python 3
from the dropdown list.
 This opens a new notebook where you can start writing your Python code.

Run Python Code:

 To run the code in a cell, press Shift + Enter or click the Run button in the toolbar.
 The output will appear below the code cell (a minimal example cell is sketched below).

Save the Notebook:

 Press Ctrl + S or click the save (disk) icon in the toolbar; the notebook is saved as an .ipynb file. You can rename it by clicking the title at the top of the page.
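For example, this is the kind of cell you might create, save, and run (the variable names are purely illustrative):

# Type this into a notebook cell and press Shift + Enter to run it
message = "Hello from Jupyter Notebook"
total = sum([1, 2, 3, 4, 5])

print(message)
print("Sum of the list:", total)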
Q3. Illustrate the differences between various Python data types (int, float, str,
list, tuple, dict, etc.) with examples. Discuss the importance of understanding
data types while handling large datasets.
ANS. Python has several fundamental data types:
int: 10
float: 3.14
str: "hello"
list: [1, 2, 3]
tuple: (1, 2, 3)
dict: {"name": "Alice", "age": 25}

Importance of Data Types in Large Datasets:


 Helps optimize memory usage.
 Ensures correct operations (e.g., string vs. numeric calculations).
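As a brief illustration of the memory point, a hypothetical Pandas column of small integers can be stored in a smaller dtype:

import numpy as np
import pandas as pd

# Hypothetical numeric column with one million small integers
df = pd.DataFrame({"age": np.random.randint(0, 100, size=1_000_000)})

print(df["age"].dtype)                     # typically int64 by default
print(df.memory_usage(deep=True)["age"])   # bytes used by the column

# Downcasting to a smaller integer type cuts memory use substantially
df["age"] = df["age"].astype("int8")
print(df.memory_usage(deep=True)["age"])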

Q4. Define and explain conditional statements in Python with multiple use-case examples. Include examples of nested if-else and practical scenarios where nested conditions are necessary.
ANS. if statement: Executes code only if a condition is true.
elif statement: Checks additional conditions if the previous ones are false.
else statement: Executes code when all previous conditions are false.

Nested if-else statement: this occurs when you place an if-else statement inside another. It is useful when a decision depends on more than one condition.

Practical Use-Case:
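A short sketch of a hypothetical loan-eligibility check, where the decision depends on more than one condition:

# Hypothetical loan-eligibility check using a nested if-else
age = 30
income = 45000

if age >= 18:
    if income >= 30000:
        print("Loan approved")
    else:
        print("Loan denied: income too low")
else:
    print("Loan denied: applicant must be an adult")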
SECTION B-

Q6. Discuss the role of Pandas in data manipulation. Explain the key differences
between DataFrame and Series with appropriate code examples.
ANS. Role of pandas in data manipulation:
 Data Cleaning
 Data Transformation
 Exploratory Data Analysis (EDA)
 Integration
DataFrame vs Series:

Series                                                 | DataFrame
One-dimensional                                        | Two-dimensional
Used for single-column data or simple labeled arrays.  | Used for handling datasets with multiple rows and columns.
Supports element-wise operations.                      | Allows complex operations such as joining, grouping, and merging data across multiple columns.
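A short illustration with made-up values:

import pandas as pd

# A Series: one-dimensional labeled array
s = pd.Series([10, 20, 30], name="sales")
print(s * 2)            # element-wise operation

# A DataFrame: two-dimensional table of rows and columns
df = pd.DataFrame({
    "product": ["A", "B", "C"],
    "sales": [10, 20, 30],
})
print(df.groupby("product")["sales"].sum())   # column-level operation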
Q7. Write a Python script that reads data from an Excel file, performs data
cleaning (removes duplicates and fills missing values), and outputs a
summary of the data.
ANS.
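A minimal sketch, assuming a hypothetical file named sales_data.xlsx (reading .xlsx files requires the openpyxl package):

import pandas as pd

# Read the Excel file (file name is an assumption - adjust to your dataset)
df = pd.read_excel("sales_data.xlsx")

# Data cleaning
df = df.drop_duplicates()                    # remove duplicate rows
df = df.fillna(df.mean(numeric_only=True))   # fill missing numeric values with column means

# Summary of the cleaned data
df.info()
print(df.describe())

# Optionally write the cleaned data back out
df.to_excel("sales_data_clean.xlsx", index=False)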

Q8. Demonstrate how to create various types of visualizations (line plots, bar charts, scatter plots, and histograms) using Matplotlib. Customize the visualizations with labels, legends, and colors.
ANS.
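A minimal sketch with made-up data, showing the four plot types with titles, axis labels, legends, and colors:

import matplotlib.pyplot as plt
import numpy as np

# Made-up sample data
x = np.arange(1, 11)
y = x ** 2
categories = ["A", "B", "C", "D"]
counts = [12, 7, 15, 9]
data = np.random.randn(500)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Line plot
axes[0, 0].plot(x, y, color="steelblue", marker="o", label="y = x^2")
axes[0, 0].set_title("Line Plot")
axes[0, 0].set_xlabel("x")
axes[0, 0].set_ylabel("y")
axes[0, 0].legend()

# Bar chart
axes[0, 1].bar(categories, counts, color="orange", label="counts")
axes[0, 1].set_title("Bar Chart")
axes[0, 1].legend()

# Scatter plot
axes[1, 0].scatter(x, y + np.random.randn(10) * 5, color="green", label="points")
axes[1, 0].set_title("Scatter Plot")
axes[1, 0].legend()

# Histogram
axes[1, 1].hist(data, bins=20, color="purple", label="distribution")
axes[1, 1].set_title("Histogram")
axes[1, 1].legend()

plt.tight_layout()
plt.show()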
SECTION C-
Q11. Explain the concept of Machine Learning (ML) and differentiate between
supervised and unsupervised learning. Provide real-world examples of each type.

Ans. Machine Learning (ML) is a subset of artificial intelligence (AI) that enables systems to
learn from data and make predictions or decisions without being explicitly programmed.

Supervised vs. Unsupervised Learning

Feature    | Supervised Learning                           | Unsupervised Learning
Definition | Learns from labeled data (input-output pairs) | Learns from unlabeled data by finding patterns
Goal       | Predict outcomes (classification/regression)  | Discover hidden structures or clusters
Data Type  | Labeled                                       | Unlabeled
Examples   | Email spam detection, predicting house prices | Customer segmentation, anomaly detection

Real-World Examples

 Supervised Learning: Identifying spam emails (Spam vs. Not Spam)


 Unsupervised Learning: Grouping customers based on shopping behavior for
targeted marketing
Q12. Describe the key steps involved in building a Machine Learning model using
Scikit-learn. Explain how to split data into training and testing sets and evaluate
the model.
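Ans. The usual steps are: load the data, split it into training and testing sets, train the model, and evaluate it on the held-out test set. A minimal sketch, using the built-in Iris dataset as an assumed example:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Load the data (Iris is used here purely as an example)
X, y = load_iris(return_X_y=True)

# 2. Split into training and testing sets (80/20 split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3. Choose and train a model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# 4. Make predictions and evaluate on the unseen test set
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))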

Q13. Demonstrate the implementation of a Linear Regression model using Scikit-learn to predict house prices based on a given dataset. Include evaluation of model performance using appropriate metrics.
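Ans. A minimal sketch using Scikit-learn's built-in California housing data as a stand-in for the house-price dataset (which is not specified here); the dataset is downloaded on first use:

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# California housing data is used here as an example house-price dataset
X, y = fetch_california_housing(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Evaluate with standard regression metrics
print("MSE:", mean_squared_error(y_test, y_pred))
print("R^2:", r2_score(y_test, y_pred))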

Q14. Discuss the differences between Linear and Logistic Regression. Highlight
the mathematical equations involved and explain when to use each method.
Ans.
Feature  | Linear Regression                        | Logistic Regression
Type     | Regression (predicts continuous values)  | Classification (predicts probability of classes)
Equation | y = β0 + β1x + ε                         | P(Y=1) = 1 / (1 + e^(−(β0 + β1x)))
Output   | Continuous values (e.g., house prices)   | Probability (0 to 1)
Use Case | Predicting numeric values                | Predicting categorical outcomes (e.g., spam detection)

When to Use:

 Linear Regression → Predicting salaries, stock prices.


 Logistic Regression → Classifying emails as spam or not spam.
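A tiny contrast of the two models in Scikit-learn (the values are purely illustrative):

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Linear regression: predict a continuous value (e.g., salary from years of experience)
years = np.array([[1], [2], [3], [4], [5]])
salary = np.array([30000, 35000, 41000, 45000, 52000])
lin = LinearRegression().fit(years, salary)
print(lin.predict([[6]]))          # a continuous prediction

# Logistic regression: predict a class probability (e.g., spam vs. not spam)
word_count = np.array([[1], [2], [8], [9], [10]])
is_spam = np.array([0, 0, 1, 1, 1])
log = LogisticRegression().fit(word_count, is_spam)
print(log.predict_proba([[7]]))    # probabilities for the two classes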

SECTION D-
Q17. Discuss different data transformation techniques such as normalization,
standardization, and encoding. Write Python code that performs each of these
techniques on a sample dataset.

Ans.  Normalization (Min-Max Scaling)

 Scales data between 0 and 1.


 Formula: X′ = (X − Xmin) / (Xmax − Xmin)

 Standardization (Z-Score Scaling)

 Centers data around mean = 0 and std = 1.


 Formula: X′=(X−μ)/σ

 Encoding (Categorical Data Transformation)

 One-Hot Encoding → Converts categories into binary columns.


 Label Encoding → Assigns numeric values to categories.
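A short sketch performing all three transformations on a small, made-up dataset, using Scikit-learn's preprocessing utilities and Pandas:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler, LabelEncoder

# Hypothetical sample dataset
df = pd.DataFrame({
    "age": [18, 25, 32, 47, 60],
    "salary": [20000, 35000, 50000, 80000, 120000],
    "city": ["Delhi", "Mumbai", "Delhi", "Pune", "Mumbai"],
})

# Normalization (Min-Max scaling): values rescaled to the range [0, 1]
norm = MinMaxScaler().fit_transform(df[["age", "salary"]])
df["age_norm"], df["salary_norm"] = norm[:, 0], norm[:, 1]

# Standardization (Z-score scaling): mean 0, standard deviation 1
std = StandardScaler().fit_transform(df[["age", "salary"]])
df["age_std"], df["salary_std"] = std[:, 0], std[:, 1]

# Label Encoding: map categories to integer codes
df["city_label"] = LabelEncoder().fit_transform(df["city"])

# One-Hot Encoding: one binary column per category
df = pd.get_dummies(df, columns=["city"], prefix="city")

print(df)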
Q18. Explain overfitting and underfitting in machine learning models. Provide
practical examples where overfitting occurs and discuss strategies to mitigate it
using regularization.
Ans.

Concept    | Overfitting                                                                     | Underfitting
Definition | Model learns noise and performs well on training data but poorly on test data. | Model is too simple and fails to capture patterns in data.
Cause      | Too complex a model, too many features, small dataset.                         | Model is too simple, not enough features, high bias.
Effect     | High variance, poor generalization.                                            | High bias, poor accuracy.

Example of Overfitting

 A deep neural network trained on a small dataset memorizes data but fails on new
examples.
 A decision tree that grows too deep and perfectly classifies training data but
performs poorly on unseen data.

Mitigating Overfitting Using Regularization

Regularization helps control model complexity:

1. Lasso Regression (L1): Shrinks some coefficients to zero, selecting important


features.
2. Ridge Regression (L2): Shrinks all coefficients to small values, reducing model
complexity.
3. Dropout (for Neural Networks): Randomly drops neurons to prevent memorization.
4. Cross-Validation: Splitting data into multiple sets to evaluate performance.
5. More Data: Increases generalization and reduces noise impact.
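As a brief illustration of points 1-2 and 4, an L2-regularized model can be scored with cross-validation on synthetic data (generated here with make_regression, since no dataset is specified):

from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_regression

# Synthetic data with many features but few informative ones,
# a setting where an unregularized model tends to overfit
X, y = make_regression(n_samples=100, n_features=50, n_informative=10,
                       noise=10.0, random_state=42)

# Ridge (L2) plus 5-fold cross-validation gives a more honest picture
# of generalization than a single training score
ridge = Ridge(alpha=1.0)
scores = cross_val_score(ridge, X, y, cv=5, scoring="r2")
print("Ridge cross-validated R^2:", scores.mean())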
Q19. Implement Ridge and Lasso Regression on a dataset and compare the
impact of L1 and L2 regularization techniques. Explain which model performs
better and why.
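Ans. A minimal implementation sketch, using a synthetic dataset from make_regression since no specific dataset is given; which model performs better depends on the data, as discussed below.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge, Lasso
from sklearn.metrics import r2_score

# Synthetic regression data in which only some features matter
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

ridge = Ridge(alpha=1.0).fit(X_train, y_train)
lasso = Lasso(alpha=1.0).fit(X_train, y_train)

print("Ridge R^2:", r2_score(y_test, ridge.predict(X_test)))
print("Lasso R^2:", r2_score(y_test, lasso.predict(X_test)))

# Lasso drives some coefficients exactly to zero (feature selection),
# while Ridge only shrinks them towards zero
print("Zero coefficients (Ridge):", np.sum(ridge.coef_ == 0))
print("Zero coefficients (Lasso):", np.sum(lasso.coef_ == 0))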

Comparison & Explanation

 Ridge Regression (L2 Regularization): Shrinks coefficients but does not eliminate
them completely. It performs better when all features contribute to predictions.
 Lasso Regression (L1 Regularization): Shrinks some coefficients to zero, effectively
performing feature selection. It performs better when some features are irrelevant.

Which is better?

 If feature selection is needed → Lasso is better.


 If all features contribute → Ridge is better.

SECTION E-
Q21. Write a Python script to identify and remove duplicate rows from a large
dataset using Pandas. Provide a detailed explanation of how Pandas handles
duplicates.
Ans.

 df.duplicated() → Returns a Boolean Series indicating duplicate rows.

 df.drop_duplicates() → Removes duplicate rows, keeping the first occurrence by default.

 Options:

 keep='first' (default) → Keeps the first occurrence.


 keep='last' → Keeps the last occurrence.
 keep=False → Removes all duplicates.
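A minimal script putting these options together, assuming a hypothetical file named large_dataset.csv:

import pandas as pd

# Hypothetical large dataset stored as a CSV file
df = pd.read_csv("large_dataset.csv")

print("Rows before:", len(df))
print("Duplicate rows found:", df.duplicated().sum())

# Remove duplicates, keeping the first occurrence of each row
df_clean = df.drop_duplicates(keep="first")

print("Rows after:", len(df_clean))

# Save the deduplicated data (optional)
df_clean.to_csv("large_dataset_deduplicated.csv", index=False)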

Q22. Explain the importance of handling missing values in a dataset. Write


Python code that demonstrates different methods to handle missing values
(drop, fill, or interpolate).
 dropna() → Removes rows with missing values.
 fillna(df.mean()) → Replaces missing values with the column's mean.
 interpolate() → Estimates missing values based on neighboring data.
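A short sketch of all three methods on a small made-up dataset:

import numpy as np
import pandas as pd

# Small made-up dataset with missing values
df = pd.DataFrame({
    "temperature": [21.0, np.nan, 23.5, np.nan, 25.0],
    "humidity": [40, 42, np.nan, 45, 47],
})

# 1. Drop rows that contain any missing value
dropped = df.dropna()

# 2. Fill missing values with each column's mean
filled = df.fillna(df.mean())

# 3. Interpolate missing values from neighbouring rows
interpolated = df.interpolate()

print(dropped, filled, interpolated, sep="\n\n")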

Q23. Illustrate the concept of indexing and slicing in Pandas DataFrame. Provide
examples of selecting specific rows, columns, and sub-sections of the
DataFrame.
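Ans. A short sketch with a made-up DataFrame showing column selection, label-based (.loc) and position-based (.iloc) slicing, and boolean indexing:

import pandas as pd

df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Carol", "Dave"],
     "age": [25, 32, 29, 41],
     "city": ["Delhi", "Mumbai", "Pune", "Delhi"]},
    index=["a", "b", "c", "d"],
)

# Select a single column (returns a Series) and multiple columns (returns a DataFrame)
print(df["name"])
print(df[["name", "age"]])

# Label-based selection with .loc: rows "a" to "c", selected columns
print(df.loc["a":"c", ["name", "city"]])

# Position-based selection with .iloc: first two rows, first two columns
print(df.iloc[0:2, 0:2])

# Boolean indexing: rows matching a condition
print(df[df["age"] > 28])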
SECTION F-

Q26. Demonstrate how to create and customize interactive plots using Plotly. Explain how it differs from Matplotlib and Seaborn for data visualization.
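Ans. A minimal sketch using Plotly Express with its built-in Iris sample data (assumes the plotly package is installed):

import plotly.express as px

# Built-in sample dataset shipped with Plotly, used here as an example
df = px.data.iris()

fig = px.scatter(
    df, x="sepal_width", y="sepal_length",
    color="species",                 # colour points by category
    title="Iris sepal dimensions",
    labels={"sepal_width": "Sepal width (cm)", "sepal_length": "Sepal length (cm)"},
)

# Customize markers and legend
fig.update_traces(marker=dict(size=9))
fig.update_layout(legend_title_text="Species")

fig.show()   # opens an interactive figure with zoom, pan and hover tooltips

Unlike Matplotlib and Seaborn, which produce static images by default, Plotly figures are interactive in the notebook or browser (hover tooltips, zooming, panning, toggling legend entries) and can be exported as standalone HTML files.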

Q27. Write a Python script to perform time series analysis and visualization on a
dataset. Explain how to identify trends, seasonality, and anomalies.
Ans. Time series analysis involves analyzing data points ordered by time to uncover
underlying patterns such as trends, seasonality, and anomalies. The process typically
involves:

1. Identifying Trends: Long-term movements in the data.


2. Identifying Seasonality: Regular, repeating patterns.
3. Detecting Anomalies: Outliers or unusual events in the time series.

Steps for Time Series Analysis:

 Visualize the data.


 Decompose the time series to identify trend, seasonality, and residual components.
 Detect anomalies using statistical methods.
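A minimal sketch of these steps on a synthetic daily series (the data is made up; statsmodels is used for the decomposition):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic daily series: upward trend + weekly seasonality + noise
dates = pd.date_range("2023-01-01", periods=365, freq="D")
values = (np.linspace(50, 80, 365)                       # trend
          + 10 * np.sin(2 * np.pi * np.arange(365) / 7)  # weekly seasonality
          + np.random.normal(0, 2, 365))                 # noise
ts = pd.Series(values, index=dates)

# Decompose into trend, seasonal and residual components
result = seasonal_decompose(ts, model="additive", period=7)
result.plot()
plt.show()

# Flag anomalies: residuals more than 3 standard deviations from their mean
resid = result.resid.dropna()
anomalies = resid[(resid - resid.mean()).abs() > 3 * resid.std()]
print(anomalies)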
Q28. Demonstrate how to use a Box Plot to identify outliers in a dataset. Provide
Python code to create a Box Plot and explain how to interpret it.
A Box Plot is a powerful visualization for identifying outliers and understanding the
distribution of the data. It displays the minimum, first quartile (Q1), median (Q2), third
quartile (Q3), and maximum of the data. Outliers are typically points that fall outside of the
"whiskers" of the box plot, which are calculated as 1.5 times the interquartile range (IQR)
from the first and third quartiles.
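A minimal sketch with a small, made-up sample of exam scores that contains two deliberate outliers:

import matplotlib.pyplot as plt

# Hypothetical sample of exam scores, including two extreme values
scores = [88, 92, 95, 100, 97, 101, 99, 103, 96, 94,
          105, 98, 102, 91, 60, 145]   # 60 and 145 are deliberate outliers

plt.boxplot(scores)
plt.title("Box Plot of Exam Scores")
plt.ylabel("Score")
plt.show()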

How to Interpret a Box Plot

 Box: Represents the interquartile range (IQR) from Q1 to Q3, containing the middle
50% of the data.
 Whiskers: Extend to 1.5 times the IQR from Q1 and Q3. Data points outside the
whiskers are potential outliers.
 Median: The line inside the box represents the median (Q2) of the data.
 Outliers: Data points that fall beyond the whiskers (for example, the extreme low and high values in the sample above) and should be examined as potential anomalies.
