
CCEE Exam Study Notes - Data Analytics

Session 1 & 2: Introduction to Analytics & Data Analytics Life Cycle

Introduction to Analytics
• Definition: Analytics is the systematic computational analysis of data to discover patterns,
trends, and insights that support decision-making
• Key Concept: Analytics transforms raw data into actionable business intelligence
• Types: Descriptive (what happened), Predictive (what will happen), Prescriptive (what
should happen)

Data Analytics Life Cycle


Definition: A structured approach to data analytics projects with sequential phases

1. Discovery
• Purpose: Understanding business requirements and available data
• Key Activities: Problem definition, data source identification, stakeholder alignment
• Common MCQ Tip: Remember this is the first phase - focuses on “what” not “how”

2. Data Preparation
• Purpose: Clean, transform, and structure data for analysis
• Key Activities: Data cleaning, integration, transformation, quality checks
• Common MCQ Tip: This phase typically takes 60–80% of project time

3. Model Planning
• Purpose: Select appropriate analytical methods and tools
• Key Activities: Algorithm selection, variable selection, model design
• Explanation: Like choosing the right tool for a job before starting work

4. Model Building and Implementation


• Purpose: Develop and execute the analytical model
• Key Activities: Coding, parameter tuning, initial testing
• Common MCQ Tip: This is where actual algorithms are coded and run

5. Quality Assurance
• Purpose: Validate model accuracy and reliability
• Key Activities: Testing, validation, performance measurement
• Key Concept: Ensures model meets business requirements and statistical standards

6. Documentation
• Purpose: Record methodology, findings, and procedures

• Key Activities: Technical documentation, user guides, process documentation
• Common MCQ Tip: Critical for reproducibility and knowledge transfer

7. Management Approval
• Purpose: Obtain stakeholder sign-off for deployment
• Key Activities: Presentation, business case validation, risk assessment
• Explanation: Business validation before technical implementation

8. Installation
• Purpose: Deploy model into production environment
• Key Activities: System integration, performance optimization, user training
• Key Concept: Moving from development to live operational use

9. Acceptance and Operation
• Purpose: Confirm user acceptance and keep the model running in day-to-day use
• Key Activities: User acceptance testing, ongoing monitoring, maintenance

10. Intelligent Data Analysis
• Purpose: Advanced analytical techniques for complex insights
• Key Activities: Machine learning, AI applications, pattern recognition
• Explanation: Uses sophisticated algorithms to find hidden patterns
Quick Summary Checklist:
• ✓Analytics transforms data into insights
• ✓Life cycle has 10 sequential phases
• ✓Discovery defines the problem
• ✓Data preparation consumes most time
• ✓Documentation ensures reproducibility
• ✓Operation includes ongoing monitoring

Session 3 & 4: Probability Fundamentals

Sample Spaces and Events


• Sample Space (S): Set of all possible outcomes of an experiment
• Event: Any subset of the sample space
• Key Concept: Foundation for all probability calculations
• Example: Rolling a die - Sample space = {1, 2, 3, 4, 5, 6}, Event “even number” = {2, 4, 6}

Joint, Conditional and Marginal Probability


Joint Probability
• Definition: P( A ∩ B) = Probability that both events A and B occur
• Formula: P( A and B) = P( A) × P( B) if events are independent
• Common MCQ Tip: Look for “AND” keywords in questions

Conditional Probability
• Definition: P( A | B) = Probability of A given that B has occurred
• Formula: P(A | B) = P(A ∩ B) / P(B), where P(B) ≠ 0
• Key Concept: Probability changes when we have additional information
• Common MCQ Tip: Look for “given that” or “|” symbol

Marginal Probability
• Definition: Probability of a single event, ignoring other variables
• Formula: P( A) = ∑ P( A ∩ Bi ) for all possible events Bi
• Explanation: Like finding totals in probability tables

Bayes’ Theorem
• Definition: Method to update probability based on new evidence
• Formula: P(A | B) = P(B | A) × P(A) / P(B)

• Key Components:
– P( A | B): Posterior probability
– P( B | A): Likelihood
– P( A): Prior probability
– P( B): Evidence
• Application: Medical diagnosis, spam filtering, machine learning
• Common MCQ Tip: Remember the formula structure - numerator has likelihood × prior
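
A minimal Python sketch of the formula (the disease-screening numbers below are invented for illustration):

```python
def bayes_posterior(prior, likelihood, evidence):
    """P(A | B) = P(B | A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Hypothetical screening test: P(disease) = 0.01, P(positive | disease) = 0.95,
# P(positive | no disease) = 0.05. Evidence via the law of total probability.
prior = 0.01
likelihood = 0.95
evidence = 0.95 * 0.01 + 0.05 * 0.99
print(bayes_posterior(prior, likelihood, evidence))  # ~0.161: still unlikely!
```
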
Quick Summary Checklist:
• ✓Sample space contains all possible outcomes
• ✓Joint probability uses AND logic
• ✓Conditional probability uses GIVEN information
• ✓Bayes’ theorem updates probabilities with evidence
• ✓Marginal probability ignores other variables

Session 5 & 6: Random Variables and Relationships

Random Variable
• Definition: A function that assigns numerical values to outcomes of random experiments
• Types:
– Discrete: Countable values (e.g., number of customers)
– Continuous: Any value in a range (e.g., height, weight)
• Key Concept: Bridge between probability theory and real-world measurements

Concepts of Correlation
• Definition: Statistical measure of linear relationship between two variables

• Range: −1 to +1
• Interpretation:
– +1: Perfect positive correlation
– 0: No linear correlation
– −1: Perfect negative correlation
• Formula: r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²]

• Common MCQ Tip: Correlation ≠ Causation
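
A quick NumPy check of the formula on invented data; np.corrcoef should agree:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Pearson r computed directly from the definition above
r = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))
print(r)
print(np.corrcoef(x, y)[0, 1])  # same value from the library
```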

Covariance
• Definition: Measure of how two variables change together
• Formula: Cov( X, Y ) = E[( X − µ X )(Y − µY )]
• Key Concept:
– Positive covariance: Variables increase together
– Negative covariance: One increases as other decreases
– Zero covariance: No linear relationship
• Relationship: Correlation = Cov(X, Y) / (σX × σY)

Outliers
• Definition: Data points significantly different from other observations
• Detection Methods:
– IQR Method: Values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR
– Z-score Method: | Z | > 2 or 3 (depending on threshold)
– Visual: Box plots, scatter plots
• Handling Techniques:
– Remove outliers
– Transform data
– Use robust statistics
– Cap/floor values
• Common MCQ Tip: Outliers can dramatically affect mean but not median
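
A small NumPy sketch of the IQR detection method on made-up data:

```python
import numpy as np

data = np.array([10, 12, 12, 13, 12, 11, 14, 13, 15, 102], dtype=float)

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # the 1.5 x IQR fences
outliers = data[(data < lower) | (data > upper)]
print(outliers)  # [102.] falls far outside the fences
```
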
Quick Summary Checklist:
• ✓Random variables map outcomes to numbers
• ✓Correlation measures linear relationship strength
• ✓Covariance shows direction of relationship
• ✓Outliers are extreme values that need special handling
• ✓IQR and Z-score are common outlier detection methods

Session 7 & 8: Probability Distributions

Probability Distribution and Data


• Definition: Mathematical function describing likelihood of different outcomes
• Purpose: Model uncertainty and variability in data

Continuous Distributions
Uniform Distribution
• Definition: All values in an interval are equally likely
• Parameters: a (minimum), b (maximum)
• PDF: f(x) = 1 / (b − a) for a ≤ x ≤ b
• Mean: (a + b) / 2
• Variance: (b − a)² / 12
• Example: Random number generation

Exponential Distribution
• Definition: Models time between events in Poisson process
• Parameter: λ (rate parameter)
• PDF: f(x) = λe^(−λx) for x ≥ 0
• Mean: 1/λ
• Variance: 1/λ²
• Application: Reliability analysis, waiting times
• Common MCQ Tip: Memoryless property - P( X > s + t | X > s) = P( X > t)

Normal Distribution
• Definition: Bell-shaped, symmetric distribution
• Parameters: µ (mean), σ (standard deviation)
• PDF: f(x) = (1 / (σ√(2π))) × e^(−(x − µ)² / (2σ²))
• Properties:
– 68% data within 1σ
– 95% data within 2σ
– 99.7% data within 3σ
• Standard Normal: µ = 0, σ = 1
• Common MCQ Tip: Central Limit Theorem makes normal distribution fundamental

Discrete Distributions
Binomial Distribution
• Definition: Number of successes in n independent trials

• Parameters: n (trials), p (success probability)
• PMF: P(X = k) = C(n, k) × p^k × (1 − p)^(n−k)
• Mean: np
• Variance: np(1 − p)
• Application: Quality control, survey analysis

Poisson Distribution
• Definition: Number of events in fixed time/space interval
• Parameter: λ (average rate)
• PMF: P(X = k) = λ^k × e^(−λ) / k!
• Mean: λ
• Variance: λ
• Application: Call center arrivals, defect counting
• Common MCQ Tip: Approximates binomial when n is large, p is small

Geometric Distribution
• Definition: Number of trials until first success
• Parameter: p (success probability)
• PMF: P(X = k) = (1 − p)^(k−1) × p
• Mean: 1/p
• Variance: (1 − p)/p²

• Application: Reliability testing, quality control
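
All of the distributions above are available in scipy.stats; a few illustrative calls with arbitrarily chosen parameters:

```python
from scipy import stats

print(stats.norm.cdf(1, loc=0, scale=1))   # P(X <= 1), standard normal: ~0.841
print(stats.binom.pmf(3, n=10, p=0.5))     # P(X = 3) in 10 trials: ~0.117
print(stats.poisson.pmf(2, mu=4))          # P(X = 2) at rate 4: ~0.147
print(stats.geom.pmf(3, p=0.2))            # first success on trial 3: ~0.128
print(stats.expon.mean(scale=1/2))         # exponential mean = 1/lambda = 0.5
```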


Quick Summary Checklist:
• ✓Continuous distributions: Uniform, Exponential, Normal
• ✓Discrete distributions: Binomial, Poisson, Geometric
• ✓Normal distribution has 68-95-99.7 rule
• ✓Poisson approximates binomial under certain conditions
• ✓Each distribution has specific parameters and applications

Session 9 & 10: Descriptive Statistical Measures

Descriptive Statistical Measures


• Definition: Numerical summaries that describe data characteristics
• Purpose: Provide quick understanding of data distribution and central tendencies

Summary Statistics - Central Tendency


Mean
• Definition: Arithmetic average of all values

• Formula: x̄ = Σxᵢ / n
• Properties: Sensitive to outliers, uses all data points
• Common MCQ Tip: Mean can be misleading with skewed data

Median
• Definition: Middle value when data is arranged in order
• Calculation:
– Odd n: Middle value
– Even n: Average of two middle values
• Properties: Robust to outliers, represents 50th percentile
• Key Concept: Better than mean for skewed distributions

Mode
• Definition: Most frequently occurring value
• Types:
– Unimodal: One mode
– Bimodal: Two modes
– Multimodal: Multiple modes
• Properties: Can be used for categorical data
• Common MCQ Tip: A dataset can have no mode, one mode, or multiple modes

Summary Statistics - Dispersion


Range
• Definition: Difference between maximum and minimum values
• Formula: Range = Max - Min
• Properties: Simple but sensitive to outliers
• Limitation: Doesn’t consider distribution of middle values

Interquartile Range (IQR)


• Definition: Difference between 75th and 25th percentiles
• Formula: IQR = Q3 - Q1
• Properties: Robust to outliers, measures middle 50% spread
• Application: Outlier detection (values beyond Q1 − 1.5 × IQR or Q3 + 1.5 × IQR)

Quartiles
• Definition: Values that divide data into four equal parts
• Q1 (25th percentile): 25% of data below this value
• Q2 (50th percentile): Median
• Q3 (75th percentile): 75% of data below this value

Percentiles
• Definition: Values below which a certain percentage of data falls
• Example: 90th percentile means 90% of data is below this value
• Application: Standardized test scores, growth charts

Standard Deviation
• Definition: Average distance of data points from mean
• Formula: σ = √(Σ(xᵢ − µ)² / N) (population), s = √(Σ(xᵢ − x̄)² / (n − 1)) (sample)
• Properties: Same units as original data, sensitive to outliers
• Key Concept: Measures typical deviation from average

Variance
• Definition: Average of squared deviations from mean
• Formula: σ² = Σ(xᵢ − µ)² / N (population), s² = Σ(xᵢ − x̄)² / (n − 1) (sample)
• Properties: Units are squared, always non-negative

• Relationship: Standard deviation = √Variance

Coefficient of Variation
• Definition: Relative measure of variability
• Formula: CV = (Standard Deviation / Mean) × 100%
• Purpose: Compare variability across different datasets or units
• Common MCQ Tip: Useful when comparing datasets with different scales
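
These measures are one-liners on real data; a sketch on invented values (ddof=1 selects the sample formulas with the n − 1 denominator):

```python
import numpy as np
from statistics import mode

data = np.array([4, 8, 6, 5, 3, 5, 7], dtype=float)

print(np.mean(data))                               # mean
print(np.median(data))                             # median
print(mode([4, 8, 6, 5, 3, 5, 7]))                 # mode: 5
print(np.std(data, ddof=1))                        # sample standard deviation
print(np.var(data, ddof=1))                        # sample variance
print(np.std(data, ddof=1) / np.mean(data) * 100)  # coefficient of variation (%)
```
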
Quick Summary Checklist:
• ✓Central tendency: Mean, Median, Mode
• ✓Dispersion: Range, IQR, Standard deviation, Variance
• ✓Median is robust to outliers, mean is not
• ✓IQR measures middle 50% spread
• ✓Coefficient of variation allows relative comparison
• ✓Standard deviation has same units as data

Session 11 & 12: Sampling and Estimation

Sample & Population


• Population: Complete set of all items of interest
• Sample: Subset of population used for analysis
• Key Concept: We study samples to make inferences about populations
• Common MCQ Tip: Sample statistics estimate population parameters

Uni-variate and Bi-variate Sampling
Uni-variate Sampling
• Definition: Sampling involving one variable
• Purpose: Estimate population parameter for single characteristic
• Example: Average height of students

Bi-variate Sampling
• Definition: Sampling involving two variables simultaneously
• Purpose: Study relationship between two characteristics
• Example: Relationship between study hours and exam scores

Re-sampling
• Definition: Repeatedly drawing samples from original sample
• Types:
– Bootstrap: Sample with replacement
– Cross-validation: Sample without replacement
• Purpose: Estimate sampling distribution, validate models
• Application: Confidence intervals, model validation
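
A NumPy sketch of the bootstrap idea (the original sample here is simulated; the percentiles of the resampled means give an approximate confidence interval):

```python
import numpy as np

rng = np.random.default_rng(42)
sample = rng.normal(loc=50, scale=10, size=100)  # stand-in for observed data

# Bootstrap: resample WITH replacement, recompute the statistic each time
boot_means = [rng.choice(sample, size=sample.size, replace=True).mean()
              for _ in range(10_000)]

print(np.percentile(boot_means, [2.5, 97.5]))  # ~95% CI for the mean
```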

Central Limit Theorem


• Definition: Sample means approach normal distribution as sample size increases
• Key Properties:
– Works regardless of population distribution shape
– Mean of sample means = population mean
– Standard error = σ/√n

• Conditions: Sample size ≥ 30 (rule of thumb)


• Formula: X̄ ∼ N(µ, σ²/n)

• Common MCQ Tip: Fundamental theorem enabling statistical inference


• Application: Hypothesis testing, confidence intervals
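
A simulation sketch of the theorem: even for a strongly skewed (exponential) population, the sample means cluster around µ with spread close to σ/√n:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30                                   # one sample = 30 observations

# Exponential population with mean 1 and sigma 1 (not normal at all)
sample_means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)

print(sample_means.mean())   # close to the population mean 1.0
print(sample_means.std())    # close to sigma / sqrt(n) = 1 / sqrt(30) ~ 0.183
```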

Sampling Techniques
• Simple Random Sampling: Every item has equal selection probability
• Systematic Sampling: Select every kth item
• Stratified Sampling: Divide population into groups, sample from each
• Cluster Sampling: Select entire groups randomly
• Convenience Sampling: Select easily accessible items
Quick Summary Checklist:
• ✓Population vs Sample distinction is fundamental

• ✓Uni-variate: one variable, Bi-variate: two variables
• ✓Re-sampling helps estimate uncertainty
• ✓Central Limit Theorem enables normal approximation
• ✓Sample size ≥ 30 typically sufficient for CLT
• ✓Multiple sampling techniques available

Session 13 & 14: Statistical Inference and Hypothesis Testing

Statistical Inference Terminology


Types of Errors
• Type I Error (α): Rejecting true null hypothesis (False positive)
• Type II Error (β): Accepting false null hypothesis (False negative)
• Power: 1 − β = Probability of correctly rejecting false null hypothesis
• Common MCQ Tip: Remember α is significance level, typically 0.05

Tails of Test
• One-tailed Test: Tests direction of difference (> or <)
• Two-tailed Test: Tests existence of difference (≠)
• Critical Region: Area where null hypothesis is rejected
• Key Concept: Choice depends on research question

Confidence Intervals
• Definition: Range of plausible values for population parameter
• Formula: Point estimate ± Margin of error
• Interpretation: 95% CI means 95% of such intervals contain true parameter
• Common MCQ Tip: Confidence level = 1 − α

Hypothesis Testing
• Null Hypothesis (H0 ): Statement of no effect or difference
• Alternative Hypothesis (H1 ): Statement we want to prove
• p-value: Probability of observing data if H0 is true
• Decision Rule: If p-value < α, reject H0

Parametric Tests
ANOVA (Analysis of Variance)
• Purpose: Compare means of three or more groups
• Assumptions:
– Normal distribution
– Equal variances

– Independent observations
• Types:
– One-way ANOVA: One factor
– Two-way ANOVA: Two factors
• F-statistic: Ratio of between-group to within-group variance
• Common MCQ Tip: ANOVA tests equality of means, not individual differences
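
A sketch of one-way ANOVA with scipy.stats (the three groups are invented):

```python
from scipy import stats

g1 = [20, 22, 19, 24, 25]
g2 = [28, 30, 27, 26, 29]
g3 = [18, 21, 20, 19, 22]

f_stat, p_value = stats.f_oneway(g1, g2, g3)  # H0: all three means are equal
print(f_stat, p_value)
```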

t-test
• Purpose: Compare means when population standard deviation is unknown
• Types:
– One-sample t-test: Compare sample mean to known value
– Two-sample t-test: Compare two group means
– Paired t-test: Compare paired observations
• Assumptions: Normal distribution, independent observations
• Formula: t = (x̄ − µ) / (s / √n)
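
A sketch of a two-sample t-test with scipy.stats (the group values are invented):

```python
from scipy import stats

group_a = [23, 25, 28, 30, 26, 27, 24]
group_b = [31, 33, 29, 35, 32, 30, 34]

# Two-sample t-test: H0 says the two group means are equal
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(t_stat, p_value)   # if p_value < 0.05, reject H0 at the 5% level
```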

Non-parametric Tests
Chi-Square Test
• Purpose: Test relationships in categorical data
• Types:
– Goodness of fit: Compare observed vs expected frequencies
– Test of independence: Test relationship between variables
• Formula: χ² = Σ (Observed − Expected)² / Expected
• Assumptions: Expected frequency ≥ 5 in each cell
• Common MCQ Tip: Used when data doesn’t meet parametric assumptions
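
scipy.stats.chi2_contingency runs the test of independence directly; a sketch with an invented 2×2 table:

```python
import numpy as np
from scipy import stats

# Rows: group A / group B; columns: prefers product X / product Y (invented)
observed = np.array([[30, 10],
                     [20, 40]])

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(chi2, p, dof)
print(expected)  # verify the expected-frequency >= 5 assumption here
```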

U-Test (Mann-Whitney)
• Purpose: Non-parametric alternative to two-sample t-test
• Application: Compare two groups when normality assumption violated
• Advantages: Robust to outliers, doesn’t require normal distribution
• Test statistic: Based on ranks of combined data
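
The scipy version, reusing the two invented groups from the t-test sketch above:

```python
from scipy import stats

group_a = [23, 25, 28, 30, 26, 27, 24]
group_b = [31, 33, 29, 35, 32, 30, 34]

u_stat, p_value = stats.mannwhitneyu(group_a, group_b)  # rank-based test
print(u_stat, p_value)
```
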
Quick Summary Checklist:
• ✓Type I error: False positive, Type II error: False negative
• ✓Confidence intervals provide range of plausible values
• ✓ANOVA compares multiple group means
• ✓t-tests used when population σ unknown
• ✓Chi-square for categorical data relationships
• ✓Non-parametric tests don’t assume normal distribution

Session 15 & 16: Predictive Modelling

Predictive Modelling (From Correlation to Supervised Segmentation)


• Definition: Using data to predict future outcomes or classify observations
• Goal: Build models that generalize well to new, unseen data

Identifying Informative Attributes


• Purpose: Select variables that provide predictive power
• Methods:
– Correlation analysis: Linear relationships
– Mutual information: Non-linear dependencies
– Feature importance: Model-based rankings
• Key Concept: Not all variables are equally useful for prediction
• Common MCQ Tip: More features ≠ better model (curse of dimensionality)

Segmenting Data by Progressive Attributes


• Definition: Divide data into meaningful groups based on attributes
• Process: Sequential splitting based on most informative variables
• Benefit: Creates homogeneous subgroups with similar characteristics
• Application: Customer segmentation, risk profiling

Models
• Definition: Mathematical representations of real-world processes
• Types:
– Linear models: Linear relationships
– Tree models: Hierarchical decisions
– Ensemble models: Combine multiple models
• Purpose: Capture patterns in data for prediction

Induction and Prediction


• Induction: Learning general rules from specific examples
• Prediction: Applying learned rules to new cases
• Training phase: Model learns from historical data
• Testing phase: Model performance evaluated on new data
• Key Concept: Good models balance bias and variance

Supervised Segmentation
• Definition: Partitioning data using target variable information
• Goal: Create segments that are homogeneous in target variable

• Advantage: Segments directly relate to prediction objective
• Example: Decision trees create segments based on target purity

Visualizing Segmentations
• Purpose: Understand how model partitions data space
• Methods:
– Decision boundaries: Show classification regions
– Tree diagrams: Show hierarchical splits
– Scatter plots: Show segment separation
• Benefit: Interpretability and model validation

Trees as Set of Rules


• Concept: Decision trees can be expressed as if-then rules
• Format: If (condition1 AND condition2) Then (prediction)
• Advantage: Human-readable decision logic
• Example: If (age > 30 AND income > 50K) Then (loan approved)
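
One way to see this in practice, sketched with scikit-learn (assuming it is installed; the Iris data is only a convenient example):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Print the fitted tree as nested if-then rules
print(export_text(tree, feature_names=iris.feature_names))
```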

Probability Estimation
• Purpose: Provide confidence measures with predictions
• Methods:
– Class probabilities: P(class|features)
– Prediction intervals: Range of likely values
– Confidence scores: Model certainty measures
• Application: Risk assessment, decision making under uncertainty
• Common MCQ Tip: Probability estimation helps quantify prediction uncertainty
Quick Summary Checklist:
• ✓Predictive modeling uses historical data to predict future outcomes
• ✓Feature selection identifies most informative variables
• ✓Supervised segmentation uses target variable for partitioning
• ✓Trees can be converted to interpretable rules
• ✓Visualization helps understand model behavior
• ✓Probability estimation quantifies prediction uncertainty

Session 17: Simulation and Risk Analysis

Simulation and Risk Analysis


• Definition: Using mathematical models to imitate real-world processes
• Purpose: Analyze complex systems and quantify uncertainty
• Application: Financial risk, operations research, project management

Monte Carlo Simulation
• Definition: Computational method using random sampling to solve problems
• Process:
1. Define probability distributions for input variables
2. Generate random samples from distributions
3. Calculate outcomes for each sample
4. Analyze distribution of results
• Key Concept: Uses randomness to solve deterministic problems
• Applications:
– Finance: Portfolio risk, option pricing
– Engineering: Reliability analysis
– Project management: Schedule risk
• Common MCQ Tip: More simulations = more accurate results
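
A NumPy sketch of the four-step process for a hypothetical project-duration risk question (all distributions and numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n_sims = 100_000

# Steps 1-2: define input distributions and draw random samples
task1 = rng.triangular(2, 4, 8, size=n_sims)   # optimistic/likely/pessimistic days
task2 = rng.normal(10, 2, size=n_sims)
task3 = rng.uniform(3, 6, size=n_sims)

# Step 3: compute the outcome for each simulated scenario
total = task1 + task2 + task3

# Step 4: analyze the distribution of results
print(total.mean())              # expected total duration
print(np.percentile(total, 90))  # 90th-percentile duration (schedule risk)
print((total > 20).mean())       # probability of exceeding 20 days
```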

Linear Optimization
• Definition: Finding best solution subject to constraints
• Linear Programming: Objective function and constraints are linear
• Components:
– Objective function: What to maximize/minimize
– Decision variables: What we can control
– Constraints: Limitations or requirements
• Methods:
– Graphical method: For 2-variable problems
– Simplex method: For multi-variable problems
• Applications: Resource allocation, production planning, transportation
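
A sketch with scipy.optimize.linprog (which minimizes, so a maximization objective is negated); the toy problem is invented:

```python
from scipy.optimize import linprog

# Maximize 3x + 5y subject to: x + 2y <= 14, 3x - y >= 0, x - y <= 2, x, y >= 0
c = [-3, -5]                          # negate to turn maximize into minimize
A_ub = [[1, 2], [-3, 1], [1, -1]]     # ">=" rows rewritten in "<=" form
b_ub = [14, 0, 2]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)  # optimal point (6, 4) with maximum profit 38
```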

Risk Analysis Components


• Risk identification: What can go wrong?
• Risk quantification: How likely and severe?
• Risk mitigation: How to reduce impact?
• Sensitivity analysis: Which variables matter most?
Quick Summary Checklist:
• ✓Monte Carlo uses random sampling for complex analysis
• ✓Simulation helps analyze systems too complex for analytical solutions
• ✓Linear optimization finds best solution within constraints
• ✓Risk analysis quantifies uncertainty and impact
• ✓More simulations generally provide better accuracy

Session 18 & 19: Decision Analytics

Decision Analytics
• Definition: Use of data and analytical techniques to support decision-making
• Goal: Optimize decisions by combining data insights with business objectives

Evaluating Classifiers
• Purpose: Assess how well classification models perform
• Key Metrics:
– Accuracy: (TP + TN) / (TP + TN + FP + FN)
– Precision: TP / (TP + FP) - How many predicted positives are correct
– Recall/Sensitivity: TP / (TP + FN) - How many actual positives found
– Specificity: TN / (TN + FP) - How many actual negatives found
– F1-score: 2 × (Precision × Recall) / (Precision + Recall)
• Confusion Matrix: Table showing actual vs predicted classifications
• Common MCQ Tip: High accuracy doesn’t always mean good model (class imbalance)
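
The metrics follow directly from the four confusion-matrix counts; a sketch with invented counts:

```python
# Hypothetical confusion-matrix counts
tp, tn, fp, fn = 80, 90, 10, 20

accuracy    = (tp + tn) / (tp + tn + fp + fn)   # 0.85
precision   = tp / (tp + fp)                    # ~0.889
recall      = tp / (tp + fn)                    # 0.80
specificity = tn / (tn + fp)                    # 0.90
f1          = 2 * precision * recall / (precision + recall)  # ~0.842

print(accuracy, precision, recall, specificity, f1)
```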

Analytical Framework
• Definition: Structured approach to analytical decision-making
• Components:
1. Problem definition: What decision needs to be made?
2. Data collection: What information is available?
3. Analysis: What patterns exist in the data?
4. Insights: What do the patterns mean for the business?
5. Action items: What should be done based on insights?

Evaluation
• Model evaluation: How well does model perform?
• Business evaluation: Does model create value?
• Methods:
– Cross-validation: Split data to test generalization
– Hold-out validation: Reserve data for final testing
– A/B testing: Compare model performance in practice
• Key Concept: Technical performance must align with business value

Baseline
• Definition: Simple benchmark for comparison
• Purpose: Establish minimum performance standard
• Examples:

– Random guessing: For classification
– Mean prediction: For regression
– Previous period: For time series
• Common MCQ Tip: Good models should significantly outperform the baseline

Performance and Implications for Investments in Data


• ROI of Analytics: Return on investment from analytical projects
• Value drivers:
– Cost reduction: Efficiency improvements
– Revenue increase: Better targeting and pricing
– Risk mitigation: Fraud detection, compliance
• Investment considerations:
– Data quality: Poor data = poor results
– Infrastructure: Technology and talent needs
– Change management: Adoption and implementation
• Key Concept: Data investment must be justified by business value

Decision Support Elements


• Descriptive: What happened? (Reporting, dashboards)
• Diagnostic: Why did it happen? (Root cause analysis)
• Predictive: What will happen? (Forecasting, modeling)
• Prescriptive: What should we do? (Optimization, recommendations)
Quick Summary Checklist:
• ✓Decision analytics combines data insights with business decisions
• ✓Classifier evaluation uses multiple metrics beyond accuracy
• ✓Analytical framework provides structured decision approach
• ✓Baseline establishes minimum performance standard
• ✓Analytics investment must demonstrate business ROI
• ✓Four levels: Descriptive, Diagnostic, Predictive, Prescriptive

Session 20 & 21: Evidence and Probabilities

Evidence and Probabilities


• Definition: Using probabilistic reasoning to combine and evaluate evidence
• Goal: Make informed decisions under uncertainty

Explicit Evidence Combination with Bayes Rule


• Purpose: Update beliefs as new evidence becomes available
• Bayes’ Rule: P(H | E) = P(E | H) × P(H) / P(E)

• Components:
– P( H | E): Posterior probability (updated belief)
– P( E | H ): Likelihood (evidence given hypothesis)
– P( H ): Prior probability (initial belief)
– P( E): Evidence probability (normalization factor)

Sequential Evidence Updates


• Process: Apply Bayes’ rule repeatedly as new evidence arrives
• Formula: Today’s posterior becomes tomorrow’s prior
• Key Concept: Evidence accumulates to refine probability estimates
• Application: Medical diagnosis, fraud detection, spam filtering
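
A short Python sketch of sequential updating, where each posterior becomes the next prior (the spam-filter likelihoods are invented):

```python
def update(prior, lik_if_true, lik_if_false):
    """One Bayes update: returns the posterior P(H | evidence)."""
    numerator = lik_if_true * prior
    evidence = numerator + lik_if_false * (1 - prior)
    return numerator / evidence

p_spam = 0.4  # prior belief that a message is spam
for lik_spam, lik_ham in [(0.8, 0.1), (0.6, 0.3), (0.7, 0.2)]:
    p_spam = update(p_spam, lik_spam, lik_ham)  # posterior -> next prior
    print(round(p_spam, 3))  # 0.842, 0.914, 0.974: evidence accumulates
```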

Probabilistic Reasoning
• Definition: Making logical inferences under uncertainty
• Principles:
– Uncertainty quantification: Express beliefs as probabilities
– Evidence integration: Combine multiple information sources
– Decision under risk: Choose actions based on expected outcomes

Evidence Quality
• Reliability: How trustworthy is the source?
• Relevance: How related is evidence to hypothesis?
• Independence: Are evidence sources correlated?
• Completeness: What evidence might be missing?

Combining Multiple Evidence Sources


• Assumption: Evidence sources are independent
• Method: Apply Bayes’ rule sequentially
• Challenge: Handling dependent evidence sources
• Common MCQ Tip: Independence assumption is often violated in practice

Applications
• Medical diagnosis: Combine symptoms, tests, patient history
• Legal reasoning: Evaluate evidence strength in court cases
• Intelligence analysis: Assess threat levels from multiple sources
• Quality control: Combine inspection results

Limitations
• Prior specification: How to set initial probabilities?

• Computational complexity: Many hypotheses and evidence types
• Independence assumptions: Often unrealistic
• Subjective probabilities: Different experts, different priors
Quick Summary Checklist:
• ✓Bayes’ rule updates probabilities with new evidence
• ✓Sequential updates allow continuous learning
• ✓Evidence quality affects reasoning accuracy
• ✓Independence assumption simplifies but may be unrealistic
• ✓Applications span medical, legal, and business domains
• ✓Prior specification remains challenging in practice

Session 22: Business Strategy

Business Strategy
• Definition: Long-term plan for achieving competitive advantage
• Goal: Create sustainable value for stakeholders

Achieving Competitive Advantages


• Definition: Outperforming competitors consistently
• Sources of advantage:
– Cost leadership: Lowest cost producer
– Differentiation: Unique value proposition
– Focus strategy: Niche market specialization
– Data-driven advantage: Superior analytics capabilities

Data as Competitive Advantage


• Unique datasets: Proprietary information competitors lack
• Analytical capabilities: Better insights from same data
• Speed of analysis: Faster decision-making cycles
• Predictive accuracy: Superior forecasting abilities
• Personalization: Customized products/services

Sustaining Competitive Advantages


• Definition: Maintaining advantages over time despite competition
• Challenges:
– Imitation: Competitors copy successful strategies
– Substitution: New technologies replace current advantages
– Market changes: Customer preferences evolve
– Resource constraints: Limited investment capacity

Sustainability Mechanisms
• Network effects: Value increases with more users
• Learning curves: Experience reduces costs over time
• Brand loyalty: Customer switching costs
• Regulatory barriers: Legal protection of advantages
• Continuous innovation: Constant improvement and development

Analytics for Strategy


• Market analysis: Understanding competitive landscape
• Customer segmentation: Identifying valuable segments
• Performance measurement: Tracking strategic progress
• Scenario planning: Preparing for different futures
• Resource optimization: Allocating capabilities effectively

Strategic Analytics Applications


• Pricing strategy: Optimize revenue and market share
• Customer lifetime value: Focus on most profitable customers
• Market entry: Assess new market opportunities
• Competitive intelligence: Monitor competitor actions
• Risk management: Identify and mitigate strategic risks

Digital Transformation Strategy


• Data monetization: Creating revenue from data assets
• Platform strategies: Building ecosystem advantages
• Agile analytics: Rapid experimentation and learning
• Cultural change: Building data-driven decision culture
• Technology investment: Infrastructure for competitive advantage
Quick Summary Checklist:
• ✓Competitive advantage requires outperforming competitors consistently
• ✓Data and analytics can create unique competitive advantages
• ✓Sustaining advantages requires continuous innovation and barriers
• ✓Analytics supports strategic decision-making across functions
• ✓Digital transformation enables new forms of competitive advantage
• ✓Strategy must evolve with changing market conditions

Session 23: Factor Analysis and Directional Data Analytics

Factor Analysis
• Definition: Statistical technique to identify underlying factors that explain correlations
among variables
• Purpose: Reduce dimensionality while preserving information
• Goal: Find latent (hidden) variables that influence observed variables

Key Concepts
• Factors: Unobserved variables that influence multiple observed variables
• Factor loadings: Relationships between factors and observed variables
• Communality: Proportion of variable’s variance explained by factors
• Eigenvalues: Measure of variance explained by each factor
• Factor rotation: Method to make factors more interpretable

Types of Factor Analysis


• Exploratory Factor Analysis (EFA): Discover underlying factor structure
• Confirmatory Factor Analysis (CFA): Test hypothesized factor structure
• Principal Component Analysis (PCA): Similar technique focusing on variance maximization

Steps in Factor Analysis


1. Correlation matrix: Examine relationships between variables
2. Factor extraction: Determine number of factors
3. Factor rotation: Make factors interpretable
4. Factor interpretation: Assign meaning to factors
5. Factor scores: Calculate individual factor values

Extraction Methods
• Principal Component Method: Most common, maximizes variance explained
• Principal Axis Factoring: Focuses on shared variance only
• Maximum Likelihood: Assumes multivariate normal distribution
• Common MCQ Tip: Kaiser criterion (eigenvalue > 1) for factor selection

Rotation Methods
• Orthogonal rotation: Factors remain uncorrelated
– Varimax: Maximizes variance of loadings
– Quartimax: Simplifies variables
• Oblique rotation: Allows factor correlation
– Promax: Faster oblique method
– Direct oblimin: Flexible oblique method
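
A sketch with scikit-learn’s FactorAnalysis (the rotation option assumes scikit-learn ≥ 0.24; Iris serves only as convenient numeric data):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)  # standardize first

fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0).fit(X)
print(fa.components_)       # factor loadings (factors x observed variables)
print(fa.transform(X)[:3])  # factor scores for the first three observations
```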

Applications
• Psychology: Intelligence, personality factors
• Marketing: Customer attitude dimensions
• Finance: Risk factors in portfolios
• Quality control: Process variation sources

Directional Data Analytics


• Definition: Analysis of data with directional or circular characteristics
• Examples: Wind directions, compass bearings, time of day, seasonal patterns
• Challenge: Traditional statistics don’t apply to circular data

Circular Data Characteristics


• Periodicity: 0◦ = 360◦ (same direction)
• No natural zero: Cannot use standard mean
• Distance measure: Shortest arc between points
• Symmetry: Distribution can wrap around circle

Circular Statistics
• Circular mean: Average direction accounting for circularity
• Circular variance: Measure of directional spread
• Von Mises distribution: Circular analog of normal distribution
• Rayleigh test: Test for uniformity of directions
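
scipy.stats provides circular mean and spread directly; a sketch with invented wind directions showing why the naive arithmetic mean fails:

```python
import numpy as np
from scipy import stats

# Directions in degrees clustered around north (0/360)
directions = np.array([350, 10, 20, 340, 5], dtype=float)

print(np.mean(directions))                           # misleading: 145
print(stats.circmean(directions, high=360, low=0))   # ~1 degree, near north
print(stats.circstd(directions, high=360, low=0))    # circular spread
```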

Applications of Directional Analytics


• Meteorology: Wind pattern analysis
• Biology: Animal movement patterns
• Geology: Rock formation orientations
• Business: Seasonal sales patterns, cyclical behavior
• Quality control: Directional measurements in manufacturing

Analysis Techniques
• Rose diagrams: Circular histograms
• Circular correlation: Relationships between circular variables
• Circular regression: Predicting circular outcomes
• Time series: Analyzing cyclical patterns over time
Quick Summary Checklist:
• ✓Factor analysis identifies underlying latent variables
• ✓EFA discovers structure, CFA tests hypotheses
• ✓Eigenvalue > 1 rule for factor selection

• ✓Rotation makes factors more interpretable
• ✓Directional data requires specialized circular statistics
• ✓Applications span psychology, marketing, meteorology
• ✓Circular mean and variance account for periodicity

Final Exam Preparation Tips

Common MCQ Strategies


• Read questions carefully: Look for keywords like “always,” “never,” “most likely”
• Eliminate wrong answers: Process of elimination increases success rate
• Time management: Don’t spend too much time on difficult questions
• Formula familiarity: Memorize key formulas and their applications
• Concept connections: Understand relationships between topics

Key Formulas to Remember


• Bayes’ Theorem: P(A | B) = P(B | A) × P(A) / P(B)
• Correlation: r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²]
• Standard deviation: σ = √(Σ(xᵢ − µ)² / N)
• Central Limit Theorem: X̄ ∼ N(µ, σ²/n)

• Confidence interval: Point estimate ± Margin of error

Concept Review Checklist


• ✓Data analytics life cycle phases
• ✓Probability types and Bayes’ theorem
• ✓Distribution characteristics and parameters
• ✓Central tendency vs. dispersion measures
• ✓Sampling techniques and Central Limit Theorem
• ✓Hypothesis testing procedures
• ✓Predictive modeling concepts
• ✓Classification evaluation metrics
• ✓Factor analysis steps and applications

Laboratory Assignment Connections


• R programming for ETL operations
• Bayes’ theorem implementation
• Correlation and outlier analysis
• Distribution testing with scipy

• Statistical measures calculation
• Sampling technique exploration
• Hypothesis testing on real data
• Predictive modeling practice
• Monte Carlo simulation
• Factor analysis implementation
Best of luck with your CCEE exam!
[End of Complete Study Notes]
