Advanced Analytics
Introduction to Analytics
• Definition: Analytics is the systematic computational analysis of data to discover patterns,
trends, and insights that support decision-making
• Key Concept: Analytics transforms raw data into actionable business intelligence
• Types: Descriptive (what happened), Predictive (what will happen), Prescriptive (what
should happen)
1. Discovery
• Purpose: Understanding business requirements and available data
• Key Activities: Problem definition, data source identification, stakeholder alignment
• Common MCQ Tip: Remember this is the first phase - focuses on “what” not “how”
2. Data Preparation
• Purpose: Clean, transform, and structure data for analysis
• Key Activities: Data cleaning, integration, transformation, quality checks
• Common MCQ Tip: This phase typically takes 60–80% of project time
3. Model Planning
• Purpose: Select appropriate analytical methods and tools
• Key Activities: Algorithm selection, variable selection, model design
• Explanation: Like choosing the right tool for a job before starting work
4. Model Building
• Purpose: Develop and train the analytical model on prepared data
• Key Activities: Model training, parameter tuning, iterative refinement
5. Quality Assurance
• Purpose: Validate model accuracy and reliability
• Key Activities: Testing, validation, performance measurement
• Key Concept: Ensures model meets business requirements and statistical standards
6. Documentation
• Purpose: Record methodology, findings, and procedures
• Key Activities: Technical documentation, user guides, process documentation
• Common MCQ Tip: Critical for reproducibility and knowledge transfer
7. Management Approval
• Purpose: Obtain stakeholder sign-off for deployment
• Key Activities: Presentation, business case validation, risk assessment
• Explanation: Business validation before technical implementation
8. Installation
• Purpose: Deploy model into production environment
• Key Activities: System integration, performance optimization, user training
• Key Concept: Moving from development to live operational use
Conditional Probability
• Definition: P( A | B) = Probability of A given that B has occurred
• Formula: P(A | B) = P(A ∩ B) / P(B), where P(B) ≠ 0
• Key Concept: Probability changes when we have additional information
• Common MCQ Tip: Look for “given that” or “|” symbol
Marginal Probability
• Definition: Probability of a single event, ignoring other variables
• Formula: P( A) = ∑ P( A ∩ Bi ) for all possible events Bi
• Explanation: Like finding totals in probability tables
Bayes’ Theorem
• Definition: Method to update probability based on new evidence
• Formula: P(A | B) = P(B | A) × P(A) / P(B)
• Key Components:
– P( A | B): Posterior probability
– P( B | A): Likelihood
– P( A): Prior probability
– P( B): Evidence
• Application: Medical diagnosis, spam filtering, machine learning
• Common MCQ Tip: Remember the formula structure - numerator has likelihood × prior
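A minimal sketch of the formula structure with hypothetical spam-filter numbers (the prior, likelihood, and false-positive rate below are made up for illustration):

```python
# Bayes' theorem sketch with hypothetical numbers:
# P(spam) = 0.2 (prior), P("free" | spam) = 0.6 (likelihood),
# P("free" | not spam) = 0.05.

def posterior(prior, likelihood, likelihood_complement):
    """P(A | B) = P(B | A) * P(A) / P(B), with the evidence P(B)
    expanded by the law of total probability over A and not-A."""
    evidence = likelihood * prior + likelihood_complement * (1 - prior)
    return likelihood * prior / evidence

p_spam_given_free = posterior(prior=0.2, likelihood=0.6, likelihood_complement=0.05)
print(round(p_spam_given_free, 4))  # 0.12 / (0.12 + 0.04) = 0.75
```

Note how the numerator is likelihood × prior, exactly as the MCQ tip says.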
Quick Summary Checklist:
• ✓Sample space contains all possible outcomes
• ✓Joint probability uses AND logic
• ✓Conditional probability uses GIVEN information
• ✓Bayes’ theorem updates probabilities with evidence
• ✓Marginal probability ignores other variables
Random Variable
• Definition: A function that assigns numerical values to outcomes of random experiments
• Types:
– Discrete: Countable values (e.g., number of customers)
– Continuous: Any value in a range (e.g., height, weight)
• Key Concept: Bridge between probability theory and real-world measurements
Concepts of Correlation
• Definition: Statistical measure of linear relationship between two variables
3
• Range: −1 to +1
• Interpretation:
– +1: Perfect positive correlation
– 0: No linear correlation
– −1: Perfect negative correlation
• Formula: r = ∑(xi − x̄)(yi − ȳ) / √[∑(xi − x̄)² × ∑(yi − ȳ)²]
Covariance
• Definition: Measure of how two variables change together
• Formula: Cov( X, Y ) = E[( X − µ X )(Y − µY )]
• Key Concept:
– Positive covariance: Variables increase together
– Negative covariance: One increases as other decreases
– Zero covariance: No linear relationship
• Relationship: Correlation = Cov(X, Y) / (σX × σY)
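A small hand-computed sketch on made-up data, checking that correlation equals covariance divided by the product of the standard deviations:

```python
# Population covariance and correlation computed by hand on made-up data.
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]  # y = 2x, so r should be +1

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Cov(X, Y) = E[(X - µX)(Y - µY)]
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / n)
sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / n)

r = cov / (sx * sy)  # Correlation = Cov / (σX × σY)
print(round(r, 6))   # perfect positive correlation
```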
Outliers
• Definition: Data points significantly different from other observations
• Detection Methods:
– IQR Method: Values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR
– Z-score Method: | Z | > 2 or 3 (depending on threshold)
– Visual: Box plots, scatter plots
• Handling Techniques:
– Remove outliers
– Transform data
– Use robust statistics
– Cap/floor values
• Common MCQ Tip: Outliers can dramatically affect the mean but have little effect on the median
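The IQR method above can be sketched with the standard-library statistics module on a made-up sample (note how the outlier inflates the mean but barely moves the median):

```python
# IQR-method outlier detection: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
import statistics

data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]  # 102 is an obvious outlier

q1, _, q3 = statistics.quantiles(data, n=4)  # three quartile cut points
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [v for v in data if v < lower or v > upper]
print(outliers)                                        # [102]
print(statistics.mean(data), statistics.median(data))  # mean pulled up, median robust
```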
Quick Summary Checklist:
• ✓Random variables map outcomes to numbers
• ✓Correlation measures linear relationship strength
• ✓Covariance shows direction of relationship
• ✓Outliers are extreme values that need special handling
• ✓IQR and Z-score are common outlier detection methods
Session 7 & 8: Probability Distributions
Continuous Distributions
Uniform Distribution
• Definition: All values in an interval are equally likely
• Parameters: a (minimum), b (maximum)
• PDF: f(x) = 1 / (b − a) for a ≤ x ≤ b
• Mean: (a + b) / 2
• Variance: (b − a)² / 12
• Example: Random number generation
Exponential Distribution
• Definition: Models time between events in Poisson process
• Parameter: λ (rate parameter)
• PDF: f(x) = λe^(−λx) for x ≥ 0
• Mean: 1/λ
• Variance: 1/λ²
• Application: Reliability analysis, waiting times
• Common MCQ Tip: Memoryless property - P( X > s + t | X > s) = P( X > t)
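The memoryless property can be checked numerically from the exponential survival function P(X > t) = e^(−λt); the λ, s, t values below are arbitrary illustrative choices:

```python
# Verify P(X > s + t | X > s) = P(X > t) for the exponential distribution.
import math

lam, s, t = 0.5, 2.0, 3.0  # arbitrary rate and times

def survival(x, lam):
    """P(X > x) = e^(-λx) for an exponential random variable."""
    return math.exp(-lam * x)

# Conditional probability = P(X > s+t) / P(X > s)
conditional = survival(s + t, lam) / survival(s, lam)
print(abs(conditional - survival(t, lam)) < 1e-12)  # memoryless: the two agree
```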
Normal Distribution
• Definition: Bell-shaped, symmetric distribution
• Parameters: µ (mean), σ (standard deviation)
• PDF: f(x) = (1 / (σ√(2π))) × e^(−(x − µ)² / (2σ²))
• Properties:
– 68% data within 1σ
– 95% data within 2σ
– 99.7% data within 3σ
• Standard Normal: µ = 0, σ = 1
• Common MCQ Tip: Central Limit Theorem makes normal distribution fundamental
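The 68-95-99.7 rule follows from P(|Z| < k) = erf(k / √2) for a standard normal, which can be checked directly:

```python
# Verify the 68-95-99.7 rule for the standard normal via the error function.
import math

for k in (1, 2, 3):
    prob = math.erf(k / math.sqrt(2))  # P(|Z| < k)
    print(k, round(prob, 4))
# 1 -> 0.6827, 2 -> 0.9545, 3 -> 0.9973
```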
Discrete Distributions
Binomial Distribution
• Definition: Number of successes in n independent trials
• Parameters: n (trials), p (success probability)
• PMF: P(X = k) = C(n, k) × p^k × (1 − p)^(n−k)
• Mean: np
• Variance: np(1 − p)
• Application: Quality control, survey analysis
Poisson Distribution
• Definition: Number of events in fixed time/space interval
• Parameter: λ (average rate)
• PMF: P(X = k) = (λ^k × e^(−λ)) / k!
• Mean: λ
• Variance: λ
• Application: Call center arrivals, defect counting
• Common MCQ Tip: Approximates binomial when n is large, p is small
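The MCQ tip can be demonstrated numerically: with a large n and small p (the hypothetical defect rate below), the binomial PMF is very close to a Poisson PMF with λ = np:

```python
# Binomial(n, p) vs Poisson(λ = np) for large n, small p.
import math

n, p = 1000, 0.002   # hypothetical defect rate
lam = n * p          # λ = 2

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return lam**k * math.exp(-lam) / math.factorial(k)

for k in range(4):
    print(k, round(binom_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```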
Geometric Distribution
• Definition: Number of trials until first success
• Parameter: p (success probability)
• PMF: P(X = k) = (1 − p)^(k−1) × p
• Mean: 1/p
• Variance: (1 − p)/p²
Mean
• Definition: Sum of all values divided by the number of values
• Formula: x̄ = ∑xi / n
• Properties: Sensitive to outliers, uses all data points
• Common MCQ Tip: Mean can be misleading with skewed data
Median
• Definition: Middle value when data is arranged in order
• Calculation:
– Odd n: Middle value
– Even n: Average of two middle values
• Properties: Robust to outliers, represents 50th percentile
• Key Concept: Better than mean for skewed distributions
Mode
• Definition: Most frequently occurring value
• Types:
– Unimodal: One mode
– Bimodal: Two modes
– Multimodal: Multiple modes
• Properties: Can be used for categorical data
• Common MCQ Tip: A dataset can have no mode, one mode, or multiple modes
Quartiles
• Definition: Values that divide data into four equal parts
• Q1 (25th percentile): 25% of data below this value
• Q2 (50th percentile): Median
• Q3 (75th percentile): 75% of data below this value
Percentiles
• Definition: Values below which a certain percentage of data falls
• Example: 90th percentile means 90% of data is below this value
• Application: Standardized test scores, growth charts
Standard Deviation
• Definition: Average distance of data points from mean
• Formula: σ = √(∑(xi − µ)² / N) (population), s = √(∑(xi − x̄)² / (n − 1)) (sample)
• Properties: Same units as original data, sensitive to outliers
• Key Concept: Measures typical deviation from average
Variance
• Definition: Average of squared deviations from mean
• Formula: σ² = ∑(xi − µ)² / N (population), s² = ∑(xi − x̄)² / (n − 1) (sample)
• Properties: Units are squared, always non-negative
• Relationship: Standard deviation = √Variance
Coefficient of Variation
• Definition: Relative measure of variability
• Formula: CV = (Standard Deviation / Mean) × 100%
• Purpose: Compare variability across different datasets or units
• Common MCQ Tip: Useful when comparing datasets with different scales
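These descriptive measures can all be computed with the standard-library statistics module; the data below is made up:

```python
# Central tendency, spread, and coefficient of variation on made-up data.
import statistics

data = [4, 8, 6, 5, 3, 4, 8, 9, 4]

mean = statistics.mean(data)      # sum / n
median = statistics.median(data)  # middle of sorted values
mode = statistics.mode(data)      # most frequent value
s = statistics.stdev(data)        # sample standard deviation (n - 1 divisor)
cv = s / mean * 100               # relative variability, in percent

print(mean, median, mode, round(s, 3), round(cv, 1))
```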
Quick Summary Checklist:
• ✓Central tendency: Mean, Median, Mode
• ✓Dispersion: Range, IQR, Standard deviation, Variance
• ✓Median is robust to outliers, mean is not
• ✓IQR measures middle 50% spread
• ✓Coefficient of variation allows relative comparison
• ✓Standard deviation has same units as data
Uni-variate and Bi-variate Sampling
Uni-variate Sampling
• Definition: Sampling involving one variable
• Purpose: Estimate population parameter for single characteristic
• Example: Average height of students
Bi-variate Sampling
• Definition: Sampling involving two variables simultaneously
• Purpose: Study relationship between two characteristics
• Example: Relationship between study hours and exam scores
Re-sampling
• Definition: Repeatedly drawing samples from original sample
• Types:
– Bootstrap: Sample with replacement
– Cross-validation: Sample without replacement
• Purpose: Estimate sampling distribution, validate models
• Application: Confidence intervals, model validation
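A bootstrap sketch on a made-up sample (seeded for reproducibility): resampling with replacement approximates the sampling distribution of the mean, from which a rough percentile confidence interval can be read off:

```python
# Bootstrap: resample the original sample with replacement many times.
import random
import statistics

random.seed(42)
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]

boot_means = []
for _ in range(2000):
    resample = random.choices(sample, k=len(sample))  # sample WITH replacement
    boot_means.append(statistics.mean(resample))

boot_means.sort()
lo, hi = boot_means[49], boot_means[1949]  # rough 95% percentile interval
print(round(lo, 2), round(hi, 2))
```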
Sampling Techniques
• Simple Random Sampling: Every item has equal selection probability
• Systematic Sampling: Select every kth item
• Stratified Sampling: Divide population into groups, sample from each
• Cluster Sampling: Select entire groups randomly
• Convenience Sampling: Select easily accessible items
Quick Summary Checklist:
• ✓Population vs Sample distinction is fundamental
• ✓Uni-variate: one variable, Bi-variate: two variables
• ✓Re-sampling helps estimate uncertainty
• ✓Central Limit Theorem enables normal approximation
• ✓Sample size ≥ 30 typically sufficient for CLT
• ✓Multiple sampling techniques available
Tails of Test
• One-tailed Test: Tests direction of difference (> or <)
• Two-tailed Test: Tests existence of difference (≠)
• Critical Region: Area where null hypothesis is rejected
• Key Concept: Choice depends on research question
Confidence Intervals
• Definition: Range of plausible values for population parameter
• Formula: Point estimate ± Margin of error
• Interpretation: 95% CI means 95% of such intervals contain true parameter
• Common MCQ Tip: Confidence level = 1 − α
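The point estimate ± margin of error formula, sketched with made-up numbers and the large-sample z value 1.96 for 95% confidence:

```python
# 95% confidence interval for a mean: point estimate ± margin of error.
import math

xbar, s, n = 50.0, 8.0, 64        # made-up sample mean, std dev, sample size
z = 1.96                          # critical value for 95% confidence (α = 0.05)
margin = z * s / math.sqrt(n)     # margin of error

print(xbar - margin, xbar + margin)  # interval of plausible values
```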
Hypothesis Testing
• Null Hypothesis (H0 ): Statement of no effect or difference
• Alternative Hypothesis (H1 ): Statement we want to prove
• p-value: Probability of observing data if H0 is true
• Decision Rule: If p-value < α, reject H0
Parametric Tests
ANOVA (Analysis of Variance)
• Purpose: Compare means of three or more groups
• Assumptions:
– Normal distribution
– Equal variances
– Independent observations
• Types:
– One-way ANOVA: One factor
– Two-way ANOVA: Two factors
• F-statistic: Ratio of between-group to within-group variance
• Common MCQ Tip: ANOVA tests equality of means, not individual differences
t-test
• Purpose: Compare means when population standard deviation is unknown
• Types:
– One-sample t-test: Compare sample mean to known value
– Two-sample t-test: Compare two group means
– Paired t-test: Compare paired observations
• Assumptions: Normal distribution, independent observations
• Formula: t = (x̄ − µ) / (s / √n)
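A one-sample t statistic computed by hand on made-up measurements (just the statistic; looking up the p-value is a separate step):

```python
# One-sample t statistic: t = (x̄ − µ) / (s / √n).
import math
import statistics

data = [14.2, 13.8, 15.1, 14.7, 13.9, 14.5]  # made-up measurements
mu0 = 14.0                                   # hypothesized population mean

n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)               # sample std dev (n - 1 divisor)
t = (xbar - mu0) / (s / math.sqrt(n))

print(round(t, 3))
```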
Non-parametric Tests
Chi-Square Test
• Purpose: Test relationships in categorical data
• Types:
– Goodness of fit: Compare observed vs expected frequencies
– Test of independence: Test relationship between variables
• Formula: χ² = ∑ (Observed − Expected)² / Expected
• Assumptions: Expected frequency ≥ 5 in each cell
• Common MCQ Tip: Used when data doesn’t meet parametric assumptions
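The goodness-of-fit statistic in a few lines, for a hypothetical fair die rolled 60 times (expected count 10 per face):

```python
# Goodness of fit: χ² = Σ (Observed − Expected)² / Expected.
observed = [8, 12, 9, 11, 6, 14]  # hypothetical counts for faces 1..6
expected = [10] * 6               # fair die, 60 rolls

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 2))  # (4 + 4 + 1 + 1 + 16 + 16) / 10 = 4.2
```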
U-Test (Mann-Whitney)
• Purpose: Non-parametric alternative to two-sample t-test
• Application: Compare two groups when normality assumption violated
• Advantages: Robust to outliers, doesn’t require normal distribution
• Test statistic: Based on ranks of combined data
Quick Summary Checklist:
• ✓Type I error: False positive, Type II error: False negative
• ✓Confidence intervals provide range of plausible values
• ✓ANOVA compares multiple group means
• ✓t-tests used when population σ unknown
• ✓Chi-square for categorical data relationships
• ✓Non-parametric tests don’t assume normal distribution
Session 15 & 16: Predictive Modelling
Models
• Definition: Mathematical representations of real-world processes
• Types:
– Linear models: Linear relationships
– Tree models: Hierarchical decisions
– Ensemble models: Combine multiple models
• Purpose: Capture patterns in data for prediction
Supervised Segmentation
• Definition: Partitioning data using target variable information
• Goal: Create segments that are homogeneous in target variable
• Advantage: Segments directly relate to prediction objective
• Example: Decision trees create segments based on target purity
Visualizing Segmentations
• Purpose: Understand how model partitions data space
• Methods:
– Decision boundaries: Show classification regions
– Tree diagrams: Show hierarchical splits
– Scatter plots: Show segment separation
• Benefit: Interpretability and model validation
Probability Estimation
• Purpose: Provide confidence measures with predictions
• Methods:
– Class probabilities: P(class|features)
– Prediction intervals: Range of likely values
– Confidence scores: Model certainty measures
• Application: Risk assessment, decision making under uncertainty
• Common MCQ Tip: Probability estimation helps quantify prediction uncertainty
Quick Summary Checklist:
• ✓Predictive modeling uses historical data to predict future outcomes
• ✓Feature selection identifies most informative variables
• ✓Supervised segmentation uses target variable for partitioning
• ✓Trees can be converted to interpretable rules
• ✓Visualization helps understand model behavior
• ✓Probability estimation quantifies prediction uncertainty
Monte Carlo Simulation
• Definition: Computational method using random sampling to solve problems
• Process:
1. Define probability distributions for input variables
2. Generate random samples from distributions
3. Calculate outcomes for each sample
4. Analyze distribution of results
• Key Concept: Uses randomness to solve deterministic problems
• Applications:
– Finance: Portfolio risk, option pricing
– Engineering: Reliability analysis
– Project management: Schedule risk
• Common MCQ Tip: More simulations = more accurate results
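The four-step process above in miniature, using the classic π-estimation example (seeded so the run is reproducible): sample points uniformly in the unit square and count the fraction inside the quarter circle:

```python
# Monte Carlo: estimate π from random points in the unit square.
import random

random.seed(0)
n = 100_000
inside = sum(
    1 for _ in range(n)
    if random.random() ** 2 + random.random() ** 2 <= 1  # inside quarter circle?
)

pi_estimate = 4 * inside / n
print(round(pi_estimate, 3))  # approaches π as n grows
```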
Optimization, Linear
• Definition: Finding best solution subject to constraints
• Linear Programming: Objective function and constraints are linear
• Components:
– Objective function: What to maximize/minimize
– Decision variables: What we can control
– Constraints: Limitations or requirements
• Methods:
– Graphical method: For 2-variable problems
– Simplex method: For multi-variable problems
• Applications: Resource allocation, production planning, transportation
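The idea behind the graphical method can be sketched for a made-up 2-variable problem: with linear constraints, an optimum lies at a corner of the feasible region, so it suffices to evaluate the objective at each vertex (this is a toy illustration, not the simplex algorithm):

```python
# Maximize 3x + 2y subject to x + y <= 4, x <= 2, x >= 0, y >= 0.
# Corner points of the feasible region, found graphically:
vertices = [(0, 0), (2, 0), (2, 2), (0, 4)]

def objective(point):
    x, y = point
    return 3 * x + 2 * y

best = max(vertices, key=objective)  # an optimum always occurs at a vertex
print(best, objective(best))         # (2, 2) with value 10
```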
Session 18 & 19: Decision Analytics
Decision Analytics
• Definition: Use of data and analytical techniques to support decision-making
• Goal: Optimize decisions by combining data insights with business objectives
Evaluating Classifiers
• Purpose: Assess how well classification models perform
• Key Metrics:
– Accuracy: (TP + TN) / (TP + TN + FP + FN)
– Precision: TP / (TP + FP) - How many predicted positives are correct
– Recall/Sensitivity: TP / (TP + FN) - How many actual positives found
– Specificity: TN / (TN + FP) - How many actual negatives found
– F1-score: 2 × (Precision × Recall) / (Precision + Recall)
• Confusion Matrix: Table showing actual vs predicted classifications
• Common MCQ Tip: High accuracy doesn’t always mean good model (class imbalance)
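The metric definitions applied to hypothetical confusion-matrix counts:

```python
# Classification metrics from made-up confusion-matrix counts.
TP, TN, FP, FN = 40, 45, 5, 10

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)       # predicted positives that are correct
recall = TP / (TP + FN)          # actual positives found (sensitivity)
specificity = TN / (TN + FP)     # actual negatives found
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, specificity, round(f1, 3))
```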
Analytical Framework
• Definition: Structured approach to analytical decision-making
• Components:
1. Problem definition: What decision needs to be made?
2. Data collection: What information is available?
3. Analysis: What patterns exist in data?
4. Insights: What do patterns mean for business?
5. Action items: What should be done based on insights?
Evaluation
• Model evaluation: How well does model perform?
• Business evaluation: Does model create value?
• Methods:
– Cross-validation: Split data to test generalization
– Hold-out validation: Reserve data for final testing
– A/B testing: Compare model performance in practice
• Key Concept: Technical performance must align with business value
Baseline
• Definition: Simple benchmark for comparison
• Purpose: Establish minimum performance standard
• Examples:
– Random guessing: For classification
– Mean prediction: For regression
– Previous period: For time series
• Common MCQ Tip: Good models should significantly outperform the baseline
Bayes' Rule
• Components:
– P( H | E): Posterior probability (updated belief)
– P( E | H ): Likelihood (evidence given hypothesis)
– P( H ): Prior probability (initial belief)
– P( E): Evidence probability (normalization factor)
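Sequential updating in miniature: yesterday's posterior becomes today's prior. The diagnostic-test numbers below (prior, sensitivity, false-positive rate) are hypothetical:

```python
# Sequential Bayesian updating: P(H) = 0.01, P(E|H) = 0.9, P(E|¬H) = 0.05.

def update(prior, p_e_given_h, p_e_given_not_h):
    """Posterior P(H | E) = P(E | H) * P(H) / P(E)."""
    evidence = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / evidence

belief = 0.01                       # initial prior
for _ in range(2):                  # two positive test results in a row
    belief = update(belief, 0.9, 0.05)
    print(round(belief, 4))         # belief rises with each piece of evidence
```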
Probabilistic Reasoning
• Definition: Making logical inferences under uncertainty
• Principles:
– Uncertainty quantification: Express beliefs as probabilities
– Evidence integration: Combine multiple information sources
– Decision under risk: Choose actions based on expected outcomes
Evidence Quality
• Reliability: How trustworthy is the source?
• Relevance: How related is evidence to hypothesis?
• Independence: Are evidence sources correlated?
• Completeness: What evidence might be missing?
Applications
• Medical diagnosis: Combine symptoms, tests, patient history
• Legal reasoning: Evaluate evidence strength in court cases
• Intelligence analysis: Assess threat levels from multiple sources
• Quality control: Combine inspection results
Limitations
• Prior specification: How to set initial probabilities?
• Computational complexity: Many hypotheses and evidence types
• Independence assumptions: Often unrealistic
• Subjective probabilities: Different experts, different priors
Quick Summary Checklist:
• ✓Bayes’ rule updates probabilities with new evidence
• ✓Sequential updates allow continuous learning
• ✓Evidence quality affects reasoning accuracy
• ✓Independence assumption simplifies but may be unrealistic
• ✓Applications span medical, legal, and business domains
• ✓Prior specification remains challenging in practice
Business Strategy
• Definition: Long-term plan for achieving competitive advantage
• Goal: Create sustainable value for stakeholders
Sustainability Mechanisms
• Network effects: Value increases with more users
• Learning curves: Experience reduces costs over time
• Brand loyalty: Customer switching costs
• Regulatory barriers: Legal protection of advantages
• Continuous innovation: Constant improvement and development
Session 23: Factor Analysis and Directional Data Analytics
Factor Analysis
• Definition: Statistical technique to identify underlying factors that explain correlations
among variables
• Purpose: Reduce dimensionality while preserving information
• Goal: Find latent (hidden) variables that influence observed variables
Key Concepts
• Factors: Unobserved variables that influence multiple observed variables
• Factor loadings: Relationships between factors and observed variables
• Communality: Proportion of variable’s variance explained by factors
• Eigenvalues: Measure of variance explained by each factor
• Factor rotation: Method to make factors more interpretable
Extraction Methods
• Principal Component Method: Most common, maximizes variance explained
• Principal Axis Factoring: Focuses on shared variance only
• Maximum Likelihood: Assumes multivariate normal distribution
• Common MCQ Tip: Kaiser criterion (eigenvalue > 1) for factor selection
Rotation Methods
• Orthogonal rotation: Factors remain uncorrelated
– Varimax: Maximizes variance of loadings
– Quartimax: Simplifies variables
• Oblique rotation: Allows factor correlation
– Promax: Faster oblique method
– Direct oblimin: Flexible oblique method
Applications
• Psychology: Intelligence, personality factors
• Marketing: Customer attitude dimensions
• Finance: Risk factors in portfolios
• Quality control: Process variation sources
Circular Statistics
• Circular mean: Average direction accounting for circularity
• Circular variance: Measure of directional spread
• Von Mises distribution: Circular analog of normal distribution
• Rayleigh test: Test for uniformity of directions
Analysis Techniques
• Rose diagrams: Circular histograms
• Circular correlation: Relationships between circular variables
• Circular regression: Predicting circular outcomes
• Time series: Analyzing cyclical patterns over time
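Why ordinary averages fail on directions: the arithmetic mean of 350° and 10° is 180°, the opposite direction. The circular mean uses the mean resultant vector instead:

```python
# Circular mean of directions via atan2 of summed unit vectors.
import math

def circular_mean_deg(angles_deg):
    """Mean direction in degrees, in (-180, 180]."""
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(sin_sum, cos_sum))

# 350° and 10° straddle north; the circular mean is ~0°, not 180°.
print(round(circular_mean_deg([350, 10]), 6))
print(round(circular_mean_deg([30, 90]), 6))  # ~60, as intuition suggests
```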
Quick Summary Checklist:
• ✓Factor analysis identifies underlying latent variables
• ✓EFA discovers structure, CFA tests hypotheses
• ✓Eigenvalue > 1 rule for factor selection
• ✓Rotation makes factors more interpretable
• ✓Directional data requires specialized circular statistics
• ✓Applications span psychology, marketing, meteorology
• ✓Circular mean and variance account for periodicity
• Statistical measures calculation
• Sampling technique exploration
• Hypothesis testing on real data
• Predictive modeling practice
• Monte Carlo simulation
• Factor analysis implementation
Best of luck with your CCEE exam!
[End of Complete Study Notes]