Unit - 3
In software engineering, "good data" refers to the accurate, relevant, consistent, and usable data collected during
software measurement. This data serves as the foundation for metrics and models that guide decision-making,
quality control, and process improvement.
Accuracy: Data truly represents what it claims (e.g., defect counts are real and not estimated).
Consistency: Data is recorded in the same format everywhere (e.g., code size always expressed in KLOC).
Relevance: Only useful data is collected (e.g., number of failed test cases when evaluating test effectiveness).
Completeness: No missing values in critical fields (e.g., developer name, date, defect severity).
Software process and product models (e.g., COCOMO, Function Point Analysis, Reliability Growth Models) rely
heavily on good data. Poor data leads to wrong decisions, wrong predictions or flawed models, wasted resources,
poor quality control, and low stakeholder trust.
Example: In COCOMO II, if effort data from past projects is unreliable or inconsistent, any effort prediction will be
flawed.
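A minimal sketch (not from the original notes) of how unreliable size data distorts a COCOMO-style effort estimate. The coefficients a = 2.4 and b = 1.05 are the classic basic-COCOMO "organic mode" values, and the project sizes are hypothetical.

# Illustrative basic-COCOMO-style effort estimate (organic mode coefficients).
# Shows how an error in the recorded size (KLOC) propagates into the effort prediction.

def cocomo_effort(kloc, a=2.4, b=1.05):
    """Effort in person-months = a * (KLOC)^b  (basic COCOMO, organic mode)."""
    return a * (kloc ** b)

true_size = 40          # actual size of a past project, in KLOC (hypothetical)
recorded_size = 60      # over-counted size stored in an inconsistent data set (hypothetical)

print(f"Effort with accurate size data:   {cocomo_effort(true_size):.1f} person-months")
print(f"Effort with inaccurate size data: {cocomo_effort(recorded_size):.1f} person-months")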
Incident reports – documentation of unplanned events that disrupt normal operation, such as defects, failures,
crashes, or performance issues.
Data collection for incident reports involves gathering accurate, complete, and structured data about these events to
analyse root causes, identify trends, and improve software quality.
Defect Removal Efficiency (DRE) = (Defects found before release / Total defects) × 100. For example, if 90 of 100 total defects are found before release, DRE = 90%.
Root Cause Analysis Models (like Fishbone diagrams or 5 Whys) rely on detailed incident records.
Process Improvement Models (like CMMI) use incident data to identify areas of inefficiency or risk.
In software engineering, data collection refers to the systematic process of gathering quantifiable or qualitative
information about software processes, products, or resources. This data is essential for computing software metrics
and building models to support decision-making and improvements.
Types of data collected (Type: Examples):
Process Data: Time taken for each phase, number of reviews, delays
Incident Data: Documentation about unplanned events that hampered product development
Surveys & Feedback: Qualitative data such as user satisfaction or team feedback
Log Analysis: Runtime logs, crash dumps, and server logs analysed for failures and performance issues
Version Control Systems: Git, SVN – commit frequency, code churn, developer activity
Test Management Tools: Selenium, JUnit, TestRail – pass/fail data, test coverage
1. Define Objectives
→ What metrics or models do you need the data for? (e.g., effort estimation, defect prediction)
Criteria for good data (Criterion: Description):
Objectivity: Free from individual bias; uses standard tools and definitions
Accuracy: Data matches the actual values/measurements from the software process
2. Standardize Procedures
→ Use templates, guidelines, and consistent definitions
3. Train Personnel
→ All team members understand how and what to record
Minimize bias and subjectivity in data collection by using objective criteria and standardized measures.
Avoid leading questions, ambiguous language, or subjective judgments that could influence data collection
outcomes.
8. Maintain Metadata
→ Document when, how, and by whom data was collected
Data analysis is the interpretation of measurement data to identify trends, detect problems, evaluate performance,
and support project planning or prediction models.
Trend Analysis Observe changes over time (e.g., test coverage over sprints)
Pareto Analysis 80/20 rule: Focus on areas causing the most problems
📈 Visualization Tools:
📋 Outputs of Analysis:
Quality reports
Performance benchmarks
Bottleneck identification
Normal Distribution (Gaussian): Many software metrics (like test scores, LOC) approximate this.
Binomial Distribution: Used when measuring success/failure (e.g., test case pass/fail).
Log-normal Distribution: Used for modeling file sizes, effort, or task durations.
Poisson Distribution: Used for counting events in a fixed interval (e.g., defects found per week).
Exponential Distribution: Used for modeling the time between events (e.g., time to next failure).
📍 Example:
If defect occurrence follows a Poisson distribution, we can predict how many defects to expect in a week.
Time to next failure in a system is modeled using exponential distribution in reliability models.
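A small sketch of the two examples above, using only the standard library; the defect rate and failure rate are hypothetical. It computes Poisson probabilities for the number of defects in a week and the exponential mean time to the next failure.

import math

# Poisson: if defects arrive at an average rate of 3 per week (hypothetical rate),
# P(k defects in a week) = e^(-λ) * λ^k / k!
lam = 3.0
for k in range(6):
    p = math.exp(-lam) * lam**k / math.factorial(k)
    print(f"P({k} defects this week) = {p:.3f}")

# Exponential: if failures occur at rate λ per hour, the expected time to the next
# failure is 1/λ, and P(no failure within t hours) = e^(-λ t).
failure_rate = 0.02                       # hypothetical failures per hour
print("Mean time to next failure:", 1 / failure_rate, "hours")
print("P(system survives 100 hours):", round(math.exp(-failure_rate * 100), 3))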
For example, if a company says its website gets 50 visitors each day on average, we use hypothesis testing to look at
past visitor data and see if this claim is true or if the actual number is different.
Defining Hypotheses
Null Hypothesis (H₀): The starting assumption. For example, "The average visits are 50."
Alternative Hypothesis (H₁): The opposite, saying there is a difference. For example, "The average visits are
not 50."
To understand hypothesis testing, we first need to understand the key terms given below:
Significance Level (α): A numerical value that sets the acceptable margin of error; usually 0.05 (5%).
p-value: The probability of observing the data (or something more extreme) if the null hypothesis is true. If this is
less than α, we reject the null hypothesis.
Test Statistic: A value computed from the sample data that is compared against the critical value to make the decision.
Critical Value: The cutoff point to compare with the test statistic.
Degrees of freedom: A number that depends on the data size and helps find the critical value.
This is the foundational step where you define two opposing statements about the population you are studying.
Null Hypothesis (H₀): This is the default assumption; it is the hypothesis you are trying to disprove or find
evidence against.
o Example: H₀: The average height of men is equal to the average height of women.
Alternative Hypothesis (H₁ or Hₐ): It's the opposite of the null hypothesis.
o Example: H₁: The average height of men is not equal to the average height of women (two-tailed
test).
o Example: H₁: The average height of men is greater than the average height of women (one-tailed
test).
The significance level, denoted by alpha (α), is the probability of rejecting the null hypothesis when it is actually
true. This is also known as a Type I error. It's a threshold for how much risk you're willing to take of making a wrong
decision.
Common Choices: The most common values for α are 0.05 (5%), 0.01 (1%), or 0.10 (10%).
Importance: This value must be chosen before data collection and analysis to maintain objectivity.
The choice of statistical test depends on several factors related to your data and hypotheses:
Type of Data: Is your data continuous (e.g., height, temperature) or categorical (e.g., gender, yes/no
responses)?
Number of Groups: Are you comparing one group to a known value, two groups to each other, or more than
two groups?
Common Tests:
Z-tests: For comparing means when population variance is known and sample size is large.
Data Collection: Gather data using appropriate sampling methods that are representative of the population
you're studying.
Calculate Test Statistic: Based on your chosen statistical test and your collected data, compute the test
statistic. This value quantifies how much your sample results deviate from what would be expected if the
null hypothesis were true.
o Critical Value: This is a threshold value that defines the "rejection region." If your calculated test
statistic falls into this region, it's considered extreme enough to reject the null hypothesis.
State: Clearly state whether you rejected or failed to reject the null hypothesis.
Contextualize: Explain what this means in terms of your specific problem or research question.
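A minimal sketch of the worked example above: testing H₀ "average daily visits = 50" against H₁ "average daily visits ≠ 50" with a one-sample t-test from scipy.stats. The visitor counts are hypothetical.

from scipy import stats

daily_visits = [48, 52, 47, 55, 49, 51, 46, 53, 50, 44, 58, 49]   # hypothetical sample
alpha = 0.05                                                       # significance level

# Test statistic and p-value for the one-sample t-test against a mean of 50
t_stat, p_value = stats.ttest_1samp(daily_visits, popmean=50)
print(f"test statistic = {t_stat:.3f}, p-value = {p_value:.3f}")

if p_value < alpha:
    print("Reject H0: the average number of daily visits differs from 50.")
else:
    print("Fail to reject H0: the data are consistent with an average of 50 visits.")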
Classical data analysis techniques involve traditional statistical methods used to summarize, interpret, and draw
conclusions from software measurement data. These techniques help identify patterns, understand process behavior,
and support data-driven decisions in software engineering.
Technique – Description – Example in Software Engineering:
Descriptive Statistics – Summarizes the central tendency and spread of data. Example: mean defect count, median
resolution time, standard deviation of LOC.
Frequency Distribution – Shows how often each value appears. Example: number of modules with 0, 1, 2, … defects.
Trend Analysis – Identifies patterns over time. Example: defect discovery trend across sprints.
Control Charts – Monitor process stability over time using statistical limits. Example: track whether the defect
injection rate is under control.
Outlier Detection – Finds abnormal data points that differ significantly from others. Example: sudden spike in
resolution time for a particular release.
Pareto Analysis (80/20 Rule) – Identifies the few key factors that cause most of the problems. Example: 80% of
defects come from 20% of modules.
Provide baseline understanding before using advanced modeling (e.g., machine learning)
⚠️ Limitations:
Cannot handle complex patterns: Can miss interactions between multiple variables
Descriptive statistics are used to summarize and describe the main features of a dataset. They help in understanding
the central tendency, dispersion, and distribution of software metrics like defects, effort, code size, etc.
🔹 Key Metrics:
Mean (average), Median, Range, Standard deviation
📌 Example: If weekly defect counts range from a low of 3 to a high of 10, Range = 10 − 3 = 7
➡ Use: Tells the QA manager what the typical defect count is and how much variation exists.
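A short sketch using Python's statistics module on a hypothetical list of weekly defect counts (chosen so the minimum is 3 and the maximum is 10, matching the range above).

import statistics

# Hypothetical weekly defect counts for one team
weekly_defects = [3, 5, 6, 7, 7, 8, 10]

print("Mean:              ", round(statistics.mean(weekly_defects), 2))
print("Median:            ", statistics.median(weekly_defects))
print("Range:             ", max(weekly_defects) - min(weekly_defects))
print("Standard deviation:", round(statistics.stdev(weekly_defects), 2))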
2. Frequency Distribution
🔹 What it does:
A frequency distribution shows how often each value (or range of values) appears in a dataset.
📌 Example:
Severity Frequency
Critical 4
High 8
Medium 15
Low 10
➡ Use: The team can focus more on High and Critical bugs first.
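A quick sketch of how the severity table above could be produced from raw bug records with collections.Counter; the bug list is hypothetical and constructed to match the counts shown.

from collections import Counter

# Hypothetical severities pulled from a bug tracker
bug_severities = ["Critical"] * 4 + ["High"] * 8 + ["Medium"] * 15 + ["Low"] * 10

frequency = Counter(bug_severities)
for severity in ["Critical", "High", "Medium", "Low"]:
    print(f"{severity:8s} {frequency[severity]}")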
3. Trend Analysis
🔹 What it does:
Trend analysis identifies changes in measurement over time. It is commonly used in agile, DevOps, and QA
processes.
📌 Example:
Sprint Bugs Found
1 12
2 10
3 8
4 6
5 4
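A minimal sketch that fits a straight line to the sprint data above to quantify the downward trend; numpy.polyfit performs the least-squares fit.

import numpy as np

sprints = [1, 2, 3, 4, 5]
bugs_found = [12, 10, 8, 6, 4]

# Least-squares line: bugs ≈ slope * sprint + intercept
slope, intercept = np.polyfit(sprints, bugs_found, 1)
print(f"Trend: {slope:+.1f} bugs per sprint (intercept {intercept:.1f})")
# A negative slope (−2.0 here) shows bugs found are decreasing sprint over sprint.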
4. Correlation Analysis
Correlation analysis checks how strongly two variables are related. It produces a value between −1 and +1:
+1 → perfect positive correlation, −1 → perfect negative correlation, 0 → no correlation
📌 Example: Suppose you have data on lines of code (LOC) and defects:
Module LOC Defects
A 200 5
B 400 9
C 600 14
Correlation coefficient r ≈ +0.998 → strong positive correlation between LOC and defects.
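A sketch computing the Pearson correlation for the LOC/defect data above with numpy.corrcoef.

import numpy as np

loc     = [200, 400, 600]   # lines of code per module (A, B, C)
defects = [5, 9, 14]        # defects per module

# Off-diagonal entry of the 2x2 correlation matrix is the Pearson r
r = np.corrcoef(loc, defects)[0, 1]
print(f"Pearson correlation r = {r:.3f}")   # a value close to +1 → strong positive correlation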
5. Pareto Analysis
Based on the Pareto Principle: 80% of the effects come from 20% of the causes. In software, this often means a few
modules cause most of the defects.
Module Bugs
A 30
B 20
C 10
D 5
E 2
Total bugs = 67
Top two modules (A + B) = 50 bugs ≈ 75% of all bugs
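A sketch computing the cumulative contribution per module from the table above, confirming that the top two modules account for roughly 75% of all bugs.

module_bugs = {"A": 30, "B": 20, "C": 10, "D": 5, "E": 2}
total = sum(module_bugs.values())   # 67

cumulative = 0
for module, bugs in sorted(module_bugs.items(), key=lambda kv: kv[1], reverse=True):
    cumulative += bugs
    print(f"Module {module}: {bugs:2d} bugs, cumulative {cumulative / total:5.1%}")
# Modules A and B together reach about 74.6% of all bugs — the "vital few".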
6. Outlier Detection
Definition: Data points that differ significantly from the others.
📌 Example: One release shows a sudden spike in defect resolution time compared with all other releases.
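A minimal z-score-style sketch (one common detection technique; the resolution times are hypothetical) that flags values more than two standard deviations from the mean.

import statistics

# Hypothetical defect resolution times (hours) per release
resolution_times = [4, 5, 6, 5, 7, 6, 5, 40]   # the 40-hour release looks suspicious

mean = statistics.mean(resolution_times)
stdev = statistics.stdev(resolution_times)

outliers = [t for t in resolution_times if abs(t - mean) > 2 * stdev]
print("Mean:", round(mean, 1), "Std dev:", round(stdev, 1))
print("Outliers (more than 2 std devs from the mean):", outliers)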
Software size properties refer to the various dimensions used to measure how large or complex a software system is
across its lifecycle—from requirements to code.
1. Lines of Code (LOC):
Lines of code measure the number of lines in the source code of a software program. It's one of the most widely used
metrics for measuring software size. However, it can be influenced by factors like coding style and programming
language, making it less consistent across different projects.
2. Function Points (FP):
Function points measure the size of software based on its functional requirements. They quantify the functionality
provided by the software in terms of inputs, outputs, inquiries, internal logical files, and external interface files.
Function points are language-independent and provide a more abstract measure of software size compared to LOC.
3. Object Points:
Object points are similar to function points but focus on object-oriented software. They measure the size of software
based on the number and complexity of classes, methods, and other objects in the software design. Object points
provide a more detailed measure of size for object-oriented systems.
4. Source Lines of Code (SLOC):
Source lines of code measure the number of lines in the source code excluding comments and blank lines. It's a more
refined version of LOC that excludes non-executable lines, providing a more accurate measure of code size.
5. Physical Lines of Code (PLOC):
Physical lines of code measure the number of lines in the source code including comments and blank lines. While less
commonly used for measuring software size, PLOC provides insight into the overall size of the source code, including
documentation and comments.
6. Executable Statements:
Executable statements measure the number of statements in the source code that are executed during program
execution. It provides a measure of the complexity and functionality of the software, focusing on the logic and
control flow of the program.
7. Halstead Metrics:
Halstead metrics quantify software size based on the number of distinct operators and operands in the source code.
They include measures such as program length, program vocabulary, volume, difficulty, and effort. Halstead metrics
provide insights into the complexity and effort required to develop and maintain the software.
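A small sketch of the standard Halstead formulas (vocabulary, length, volume, difficulty, effort) applied to hypothetical operator/operand counts from a single module.

import math

# Hypothetical counts taken from a small module
n1, n2 = 10, 15    # distinct operators, distinct operands
N1, N2 = 40, 55    # total operator occurrences, total operand occurrences

vocabulary = n1 + n2                         # n
length     = N1 + N2                         # N
volume     = length * math.log2(vocabulary)  # V = N * log2(n)
difficulty = (n1 / 2) * (N2 / n2)            # D
effort     = difficulty * volume             # E = D * V

print(f"Vocabulary n = {vocabulary}, Length N = {length}")
print(f"Volume V ≈ {volume:.1f}, Difficulty D ≈ {difficulty:.1f}, Effort E ≈ {effort:.0f}")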
8. Cyclomatic Complexity:
Cyclomatic complexity measures the complexity of a software program based on the number of linearly independent
paths through its control flow graph. It provides a measure of the software's structural complexity, indicating the
number of possible execution paths and potential points of failure.
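A small illustration (hypothetical function): counting decision points in a routine gives its cyclomatic complexity as decision points + 1, which is equivalent to E − N + 2P on the control-flow graph.

def classify_defect(severity, is_regression):
    # Decision points: if + elif + second if = 3  →  cyclomatic complexity = 3 + 1 = 4
    if severity == "critical":
        priority = "P1"
    elif severity == "high":
        priority = "P2"
    else:
        priority = "P3"
    if is_regression:
        priority += " (regression)"
    return priority

# Example: classify_defect("high", True) returns "P2 (regression)"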
9. Information Content:
Information content measures the amount of information encoded in the software, such as the number of unique
identifiers, data structures, or algorithms. It provides a measure of the richness and complexity of the software's
design and implementation.
10. Component Count:
Component count measures the number of individual components or modules in the software, such as classes,
functions, or modules.
3. Applications
Performance benchmarking
4. Advantages
5. Example / Limitation
A system with high documentation size but low code size may indicate poor automation or over-documentation. Size
properties alone can't always represent complexity or effort accurately.
3. Design Size
1. Definition
Design size is the quantitative measurement of the scope, scale, and complexity of a software system's design. It is a
critical metric because it provides insights into the effort, cost, quality, and maintainability of the software even
before the coding phase begins.
Early Estimation: Enables early and accurate estimation of development effort, cost, and schedule.
Quality Assurance: By measuring design size, potential design flaws (e.g., excessive complexity, high
coupling, low cohesion) can be identified early, leading to improved software quality and reduced defects.
Progress Tracking: It helps in tracking the progress of the design phase and provides a basis for comparison
between different design alternatives.
Benchmarking: Design size metrics enable benchmarking against industry standards or past projects, aiding
in process improvement.
Risk Management: Identifying large or complex design elements can highlight potential risks that need to be
addressed.
1. Structural Metrics: focus on the structural properties of the design, such as:
o Number of Classes/Modules/Components: count of the primary building blocks of the system. More
classes usually indicate a larger design.
o Depth of Inheritance Tree (DIT): high DIT can indicate complexity and potential issues with
understanding and maintenance.
o Number of Attributes per Class: Measures the number of properties within a class.
o Number of Operations per Class: Measures the number of methods/functions within a class.
o Fan-in/Fan-out:
Fan-in: The number of modules that call a given module. High fan-in can indicate high reuse
but also a critical component whose changes could impact many others.
Fan-out: The number of modules that a given module calls. High fan-out can suggest a
complex module with many responsibilities.
o Coupling: Measures the degree of interdependence between modules or classes. Lower coupling is
generally desired as it indicates more independent and reusable components. Metrics like CBO
(Coupling Between Objects) count the number of other classes a class is coupled to.
o Cohesion: Measures how well the elements within a module belong together. Higher cohesion is
desired as it indicates a module with a clear, single responsibility. Metrics like LCOM (Lack of
Cohesion in Methods) assess how methods within a class are related to its attributes.
o Cyclomatic Complexity: While often applied to code, the underlying logic and control flow can be
analyzed at the design level, especially with detailed design models. It measures the number of
linearly independent paths through a program's source code.
o Function Point Analysis: Estimates software size based on the functionality delivered to the user,
rather than lines of code.
Internal Logical Files: User-identifiable logical data groups maintained within the system.
External Interface Files: User-identifiable logical data groups accessed by the system but
maintained by another system.
4. Object-Oriented Design Metrics (Chidamber & Kemerer metrics suite): This suite is specifically designed for
object-oriented systems and includes:
o Weighted Methods per Class (WMC): Sum of the complexity of all methods in a class.
o Response for a Class (RFC): Number of methods that can be invoked in response to a message to an object
of the class.
3. Applications
4. Advantages
Decision making
5. Example / Limitation
Two systems might have the same number of classes, but different interaction complexity. Design size alone may not
capture performance bottlenecks or external dependencies.
Requirements analysis involves understanding stakeholder needs, identifying system functionalities and use cases, and
clarifying constraints. The goal is to produce a clear, complete, consistent, and unambiguous set of requirements that
serves as the foundation for design and development.
This refers to the amount and complexity of user and system requirements collected and documented during the
requirements phase.
2. Key Characteristics
Number of Functional Requirements: This involves simply counting each distinct functional requirement. For example:
"The system shall allow users to log in," "The system shall display product details."
Number of Use Cases: Counting the number of use cases defined for the system.
Number of User Stories: User stories, e.g., "As a customer, I want to search for products by category," are a direct
measure of functional scope.
3. Applications
Initial effort estimation
Resource allocation
Budget estimation
4. Advantages
Early Estimation: Provides input for estimating project effort, cost, and schedule. The size of the
requirements is a primary driver of the overall software development effort.
Progress Tracking: Enables tracking the completion of the requirements phase and identifying potential
delays.
Quality Assurance: For instance, an unusually high number of requirements for a simple system might
indicate over-specification or "gold-plating."
Risk Identification: Large requirements sets can be identified as high-risk areas, requiring more careful
management.
Benchmarking: Allows comparison of requirements size across different projects to understand patterns and
improve estimation models.
1. Function Point Analysis (FPA)
Core Idea: FPA counts and weights five types of "user functions" based on their complexity (low, average, high):
External Inputs (EIs): Data or control information entering the system from outside its boundary (e.g., a data
entry screen, a file import)
External Outputs (EOs): Data or control information leaving the system to outside its boundary (e.g., reports,
error messages).
External Inquiries (EQs): Interactive requests that result in data retrieval from internal logical files (e.g., a
search function).
Internal Logical Files (ILFs): User-identifiable logical data groups maintained within the system's boundary
(e.g., a customer database table).
External Interface Files (EIFs): User-identifiable logical data groups accessed by the system but maintained by
another system (e.g., a shared corporate database).
Each component is counted and assigned a complexity rating (low, average, or high). The weighted counts are summed
to get an "Unadjusted Function Point" (UFP) count. This count is then adjusted by a "Value Adjustment Factor"
(VAF) based on 14 General System Characteristics (GSCs) such as data communication, distributed processing, and
performance, resulting in the final Function Point (FP) count.
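A sketch of the unadjusted/adjusted Function Point calculation described above, using the standard IFPUG complexity weights; the component counts and GSC ratings are hypothetical.

# Standard IFPUG weights per component type: (low, average, high)
WEIGHTS = {
    "EI":  (3, 4, 6),
    "EO":  (4, 5, 7),
    "EQ":  (3, 4, 6),
    "ILF": (7, 10, 15),
    "EIF": (5, 7, 10),
}

# Hypothetical counts per component type and complexity: (low, average, high)
counts = {
    "EI":  (5, 4, 2),
    "EO":  (3, 2, 1),
    "EQ":  (4, 2, 0),
    "ILF": (2, 1, 0),
    "EIF": (1, 1, 0),
}

# Unadjusted Function Point count: sum of (count x weight) over all types and complexities
ufp = sum(c * w for kind in WEIGHTS for c, w in zip(counts[kind], WEIGHTS[kind]))

gsc_ratings = [3] * 14                   # 14 General System Characteristics, each rated 0-5 (hypothetical)
vaf = 0.65 + 0.01 * sum(gsc_ratings)     # Value Adjustment Factor
fp = ufp * vaf                           # final adjusted Function Point count

print(f"Unadjusted FP = {ufp}, VAF = {vaf:.2f}, Adjusted FP = {fp:.1f}")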
2. COSMIC (Common Software Measurement International Consortium)
Core Idea: COSMIC measures functional size by identifying and counting "data movements" across the software
boundary. It defines a "functional process" as a unique, cohesive set of data movements triggered by a functional
user and serving a single purpose.
Entry (E): A data group moved from a functional user (human, hardware, or another software) into the
functional process.
Exit (X): A data group moved from the functional process to a functional user.
Read (R): A data group moved from persistent storage within the system into the functional process.
Write (W): A data group moved from the functional process to persistent storage within the system.
Process: The functional size is simply the sum of all identified data movements (each data movement counts as 1
COSMIC Function Point, CFP). There are no complexity adjustments as in FPA.
Strengths: More objective than FPA, better suited for a wider range of software types (especially real-time and
embedded), simpler counting rules.
Limitations: Less historical data available compared to FPA, still gaining widespread adoption.
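A tiny sketch of the COSMIC rule above: the size of a functional process is simply the number of its data movements (Entry, Exit, Read, Write), each worth 1 CFP. The movements listed are hypothetical.

# Each data movement (Entry, Exit, Read, Write) contributes exactly 1 CFP.
functional_process = [
    "Entry: customer order details",
    "Read: product catalogue record",
    "Write: saved order",
    "Exit: order confirmation message",
    "Exit: error message",
]

cfp = len(functional_process)       # no complexity weighting, unlike FPA
print(f"Functional size = {cfp} CFP")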
Algorithmic Models:
Concept: These models use mathematical formulas to predict effort, schedule, and cost. Functional size (e.g.,
Function Points, COSMIC FPs) is often a key input parameter.
How they work: Historical data from past projects (their measured functional size and actual effort) are used to
calibrate the model's parameters. When a new project's functional size is known, the model can estimate its likely
effort.
Productivity Rates:
Concept: This is a simpler approach where a historical average of "functional size units delivered per unit of
effort" (e.g., Function Points per person-month, COSMIC FPs per person-hour) is used.
Limitations: Highly dependent on the stability and relevance of historical data. Assumes that future projects
will have similar characteristics and environmental factors as past ones. Productivity can vary significantly
based on team skill, technology, domain, etc.
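A minimal sketch of the productivity-rate approach: divide the new project's functional size by a historical delivery rate. Both numbers are hypothetical.

# Hypothetical historical average: past projects delivered 10 FP per person-month
historical_productivity = 10.0      # function points per person-month
new_project_size = 450              # estimated size of the new project, in FP

estimated_effort = new_project_size / historical_productivity
print(f"Estimated effort ≈ {estimated_effort:.0f} person-months")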
Analogy-Based Estimation:
Concept: This method involves finding one or more past projects that are similar in functional size and
characteristics to the current project. The effort/cost of those analogous projects is then used as a basis for
estimating the new one.
Role of Functional Size: Functional size provides an objective basis for identifying "similar" projects. For
instance, if a new system is estimated to be 500 FPs, you'd look for past projects around 500 FPs in similar
domains.
Resource allocation
2. Advantages