UNIT 1: Introduction to Data Analytics
Data Analytics is the science of analyzing raw data to draw conclusions.
Types of Data:
- Structured: Tabular (e.g., SQL, Excel)
- Semi-Structured: Tagged data (e.g., XML, JSON)
- Unstructured: Media, emails (e.g., images, videos)
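The difference between structured and semi-structured data can be seen by loading the same kind of record from CSV and from JSON. This is a minimal sketch using only the Python standard library; the sample records and field names (`id`, `name`, `city`, `tags`) are invented for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
csv_text = "id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: tagged fields; records may nest values or omit keys.
json_text = '[{"id": 1, "name": "Asha", "tags": ["vip"]}, {"id": 2, "name": "Ravi"}]'
records = json.loads(json_text)

print(rows[0]["city"])         # column access by header name
print(records[0].get("tags"))  # optional, nested field
print(records[1].get("tags"))  # key absent in this record, so None
```

Unstructured data (images, video, free text) has no such field layout and usually needs feature extraction before analysis.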
Sources of Data:
- Internal: From within the organization (e.g., CRM)
- External: Outside sources (e.g., government reports)
- Primary: Direct collection (e.g., surveys)
- Secondary: Already available (e.g., research papers)
- Real-Time: Streaming (e.g., IoT, stock feeds)
Applications: Healthcare, marketing, fraud detection.
UNIT 2: Data Analysis Techniques
1. Regression: Predicts continuous values (e.g., price prediction).
- Types: Linear, Multiple, Polynomial
2. Classification: Predicts categories (e.g., spam vs. non-spam email).
- Methods: Decision Tree, SVM, Logistic Regression
3. Bayesian Modeling: Probabilistic reasoning using Bayes' theorem.
4. Neural Networks: Used in deep learning for image and speech recognition.
5. Fuzzy Logic: Deals with imprecise data; applied in control systems.
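Two of the techniques above are small enough to sketch directly: simple linear regression (closed-form least squares) and Bayes' theorem applied to a spam example. This is an illustrative sketch only; the function names `fit_line` and `posterior`, the data, and the probabilities are all made up.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def posterior(prior, likelihood, likelihood_if_not):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence


# Regression: invented (size, price) pairs that lie exactly on y = 2x + 1.
sizes = [1, 2, 3, 4]
prices = [3, 5, 7, 9]
slope, intercept = fit_line(sizes, prices)
print(slope, intercept)  # 2.0 1.0

# Bayes: assumed P(spam) = 0.2, P(word | spam) = 0.6, P(word | not spam) = 0.05.
p_spam = posterior(prior=0.2, likelihood=0.6, likelihood_if_not=0.05)
print(round(p_spam, 2))  # 0.75
```

The fitted line predicts a continuous value, while the posterior gives the probability of a category, which mirrors the regression and classification split in the list above.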
UNIT 3: Exploratory Data Analysis (EDA)
1. Data Cleaning: Handling missing and duplicate data.
2. Data Transformation: Normalization, encoding.
3. Visualization: Histograms, scatter plots, heatmaps.
4. Statistical Summaries: Mean, median, standard deviation.
Tools: Python (pandas, seaborn), R
Goal: Discover patterns, outliers, and trends before formal modeling.
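The cleaning, transformation, and summary steps can be sketched on a tiny data set. To stay dependency-free this example uses Python's standard `statistics` module rather than pandas; the records are invented, and the 1.5 x IQR outlier rule is one common convention, assumed here for illustration.

```python
import statistics

# Hypothetical (id, reading) records: one missing value, one exact duplicate.
records = [
    (1, 12.0), (2, 15.0), (3, None), (2, 15.0), (4, 18.0), (5, 95.0), (6, 14.0),
]

# 1. Cleaning: drop exact duplicate records and records with a missing value.
seen = set()
clean = []
for rec in records:
    if rec in seen or rec[1] is None:
        continue
    seen.add(rec)
    clean.append(rec)
values = [v for _, v in clean]

# 2. Transformation: min-max normalization to the range [0, 1].
lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]

# 3. Statistical summaries.
mean = statistics.mean(values)
median = statistics.median(values)
stdev = statistics.stdev(values)  # sample standard deviation

# 4. Outliers: values beyond 1.5 * IQR from the quartiles.
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
outliers = [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print(mean, median, outliers)
```

Here the mean is pulled far above the median by a single large reading, which is the kind of pattern EDA is meant to surface before modeling.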
UNIT 4: Frequent Pattern Mining and Clustering
1. Association Rule Mining:
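- Apriori: Finds frequent itemsets using a minimum support threshold, then derives rules that meet a minimum confidence.
- FP-Growth: Tree-based algorithm (FP-tree); usually faster than Apriori because it avoids candidate generation.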
2. Clustering:
- K-Means: Partitions data into K groups.
- Hierarchical: Creates nested clusters (dendrogram).
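K-Means alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below is a simplified version for 2-D points: the function name `kmeans` is hypothetical, and taking the first K points as initial centroids is an assumption made to keep the run deterministic (practical implementations usually use random or k-means++ initialization).

```python
import math


def kmeans(points, k, iterations=10):
    """Simplified K-Means; initial centroids are the first k points."""
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Invented data: two visually separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters[0])  # [(1, 1), (1, 2), (2, 1)]
print(clusters[1])  # [(8, 8), (9, 8), (8, 9)]
```

A fixed iteration count is used for brevity; a fuller version would stop once the assignments no longer change.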
Applications: Market basket analysis, customer segmentation.
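Market basket analysis rests on the support and confidence measures from association rule mining. The sketch below computes both for a handful of invented transactions; the helper names `support` and `confidence` are hypothetical, and this shows only the measures, not a full Apriori or FP-Growth implementation.

```python
# Invented shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    return support(antecedent | consequent) / support(antecedent)


print(support({"bread", "milk"}))                 # 0.5
print(round(confidence({"bread"}, {"milk"}), 3))  # 0.667
```

A rule such as "bread implies milk" would be kept only if both values clear the thresholds chosen by the analyst.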
UNIT 5: Big Data Frameworks & Visualization
1. Hadoop:
- HDFS for storage, MapReduce for processing.
2. Spark:
- In-memory computing; includes MLlib (machine learning) and stream processing.
3. R Programming: Data manipulation, statistical analysis.
4. Visualization Tools:
- Tableau, Power BI: Interactive dashboards
- Python: Matplotlib, Seaborn for plots.
These frameworks and tools are used in real-time analytics and decision support systems.
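The MapReduce processing model listed under Hadoop can be illustrated with a word count. This is a single-process imitation in plain Python, not Hadoop's actual API; the function names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical and the input lines are invented.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


lines = ["big data big ideas", "data drives decisions"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 2, 'data': 2, 'ideas': 1, 'drives': 1, 'decisions': 1}
```

In a real cluster the map and reduce calls run in parallel on different nodes, with the input lines read from HDFS blocks.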