DA Unitwise Notes Detailed Cleaned

The document provides an overview of Data Analytics, covering types of data, sources, and applications across various fields. It details data analysis techniques such as regression, classification, and clustering, along with exploratory data analysis methods and tools. Additionally, it discusses big data frameworks like Hadoop and Spark, as well as visualization tools for effective data presentation.

Uploaded by

saumya2213215

UNIT 1: Introduction to Data Analytics

Data Analytics is the science of analyzing raw data to draw conclusions from it.

Types of Data:

- Structured: Tabular (e.g., SQL tables, Excel sheets)

- Semi-Structured: Tagged data (e.g., XML, JSON)

- Unstructured: No predefined format (e.g., emails, images, videos)
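
To make the three types concrete, here is a minimal sketch using only the Python standard library. The sample records, names, and field values are invented for illustration and do not come from any real dataset.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
structured = io.StringIO("id,name,amount\n1,Asha,250\n2,Ravi,400\n")
rows = list(csv.DictReader(structured))

# Semi-structured: tagged and possibly nested; fields can differ per record.
semi_structured = json.loads(
    '[{"id": 1, "tags": ["new"]}, {"id": 2, "address": {"city": "Pune"}}]'
)

# Unstructured: free text with no schema; any structure must be extracted.
unstructured = "Hi team, the invoice for order 2 is attached. Thanks!"
word_count = len(unstructured.split())

print(rows[1]["amount"])           # column access by name (values are strings)
print(sorted(semi_structured[1]))  # keys present in the second record only
print(word_count)                  # a crude feature derived from raw text
```

Note that the CSV rows can be queried by column name directly, while the JSON records need per-record key checks and the text needs processing before any analysis is possible.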

Sources of Data:

- Internal: From within the organization (e.g., CRM)

- External: Outside sources (e.g., Govt. reports)

- Primary: Direct collection (e.g., surveys)

- Secondary: Already available (e.g., research papers)

- Real-Time: Streaming (e.g., IoT, stock feeds)

Applications: Healthcare, marketing, fraud detection.


UNIT 2: Data Analysis Techniques

1. Regression: Predicts continuous values (e.g., price prediction).

- Types: Linear, Multiple, Polynomial

2. Classification: Predicts categories (e.g., email spam).

- Methods: Decision Tree, SVM, Logistic Regression

3. Bayesian Modeling: Probabilistic reasoning using Bayes' theorem.

4. Neural Networks: Used in deep learning for image and speech tasks.

5. Fuzzy Logic: Deals with imprecise data; applied in control systems.
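
As an illustration of the first technique in the list above, simple linear regression can be sketched with ordinary least squares in plain Python. The function name `fit_line` and the house-size data are assumptions made up for this example; in practice a library such as scikit-learn would normally be used.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: house size (sq. m) and price (in thousands)
sizes = [50, 60, 80, 100, 120]
prices = [150, 180, 240, 300, 360]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 90 + intercept  # predicted price for a 90 sq. m house
print(slope, intercept, predicted)
```

The output is a continuous value, which is what separates regression from classification, where the output would be a category label.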


UNIT 3: Exploratory Data Analysis (EDA)

1. Data Cleaning: Handling missing and duplicate data.

2. Data Transformation: Normalization, encoding.

3. Visualization: Histograms, scatter plots, heatmaps.

4. Statistical Summaries: Mean, median, standard deviation.

Tools: Python (pandas, seaborn), R

Goal: Discover patterns, outliers, and trends before formal modeling.
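
The cleaning, transformation, and summary steps above can be sketched on a tiny dataset as follows. This uses only the standard library to stay self-contained (pandas would be the usual choice); the records are hypothetical `(id, value)` pairs.

```python
import statistics

# Hypothetical records: (id, value); one missing value, one exact duplicate
records = [(1, 12.0), (2, 15.0), (3, None), (2, 15.0), (4, 18.0), (5, 40.0)]

# Data cleaning: drop rows with missing values, then drop exact duplicates
# (dict.fromkeys keeps the first occurrence and preserves order)
cleaned = list(dict.fromkeys(r for r in records if r[1] is not None))
values = [value for _, value in cleaned]

# Statistical summaries
mean = statistics.mean(values)
median = statistics.median(values)
stdev = statistics.stdev(values)  # sample standard deviation

# Data transformation: min-max normalization to the range [0, 1]
lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]

print(values, mean, median, round(stdev, 2))
print(normalized)
```

Here the mean sits well above the median, which hints that the value 40.0 may be an outlier worth inspecting before modeling.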


UNIT 4: Frequent Pattern Mining and Clustering

1. Association Rule Mining:

- Apriori: Finds frequent itemsets using a minimum support threshold; rules are then derived from them and filtered by confidence.

- FP-Growth: Tree-based (FP-tree) algorithm; usually faster than Apriori because it avoids candidate generation.

2. Clustering:

- K-Means: Partitions data into K groups.

- Hierarchical: Creates nested clusters (dendrogram).
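
The support and confidence measures behind association rule mining can be sketched as below. This is not a full Apriori implementation, only the two measures plus a single-item frequency filter; the transactions and the 0.5 threshold are invented for illustration.

```python
# Hypothetical shopping baskets
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)


# Items whose support meets an assumed minimum threshold of 0.5
min_support = 0.5
items = set().union(*transactions)
frequent_items = {i for i in items if support({i}) >= min_support}

print(sorted(frequent_items))
print(support({"bread", "milk"}))
print(confidence({"bread"}, {"milk"}))
```

With this data, bread and milk appear together in 3 of 5 baskets, and about three quarters of the baskets containing bread also contain milk.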

Applications: Market basket analysis, customer segmentation.
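
For the customer segmentation case, a minimal K-Means sketch (Lloyd's algorithm) might look like this. The customer points, the starting centroids, and the fixed iteration count are all assumptions chosen for illustration; real implementations usually pick starting centroids randomly or with k-means++ and stop on convergence.

```python
def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (visits per month, average basket size)
customers = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(customers, centroids=[(1, 1), (8, 8)])

print(centroids)
print(clusters)
```

With K = 2 the six customers split into a low-activity and a high-activity group of three each.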


UNIT 5: Big Data Frameworks & Visualization

1. Hadoop:

- HDFS for storage, MapReduce for processing.

2. Spark:

- In-memory computing; includes MLlib (machine learning) and stream processing.

3. R Programming: Data manipulation, statistical analysis.

4. Visualization Tools:

- Tableau, Power BI: Interactive dashboards

- Python: Matplotlib, Seaborn for plots.

These frameworks and tools are used in real-time analytics and decision support systems.
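
The MapReduce model mentioned under Hadoop can be illustrated with a word count, simulated here in a single Python process. This is only a sketch of the map, shuffle, and reduce phases; it does not use Hadoop itself, and the function names and input lines are made up for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


lines = ["big data big insights", "data drives decisions"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

print(counts)
```

In a real cluster the map and reduce calls would run in parallel on different nodes, with the input lines read from HDFS blocks.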
