Software Cost and Schedule Estimation
[and Tracking]
By: Dr. T V Gopal, Professor, DCSE, Anna
University
Discussion Topics
Introduction
Creating an Estimate
Identification
Size
Productivity
Parametric Models
Risks
Scheduling
Costing
Putting the Estimate Together
“Good Ideas” for Improving Estimates
Tracking Execution
Managing Estimate Changes
Conclusion
Introduction
The main purpose of the paper is to
present approaches for deriving an
estimate of the cost and schedule of
a software project
Discusses methods to track and alter
the estimates as development
progresses
Discusses ways to get a project back
on track after changes have been
made to a schedule
Creating an Estimate…
Estimates
Generally focus on labor hours, quantity
of materials and amount of services, not
the cost
This is computed later
Requires determining the work required
to meet requirements and the effort
required to perform that work
Creating an Estimate…
Step 1: Identify the tasks
They fall under four main categories:
1. Engineering
2. Program Management
3. Configuration Management
4. Quality Assurance
Tasks are recorded in a Work Breakdown
Structure (WBS)
Hierarchically identifies all tasks in a project
Each successive layer should be more descriptive
than its parent
For a software project, the lowest level should be
detailed enough to show class names
This is not always possible, or even necessary
Creating an Estimate…
Step 2: Estimate the resources
required per task
There are many types of resources
(that are often billed differently)
Materials
Subcontracted Items
Travel
Labor (the biggest one)
The focus of the paper is mainly
applied to estimating labor based on
the engineering (development) efforts
Creating an Estimate…
Step 3a: Estimating the Software
Development Effort
Basic Method
E = S/P (Estimate = Size/Productivity)
The hard part is determining the size and
productivity variables
Estimating Size – three main factors
1. Units of measure
2. Software included in the measurement
3. Amount of reused code
Reused code is generally counted differently from
newly written code
Must track code Added, Changed and Deleted from
the reused code (see the sketch below)
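A minimal sketch of the E = S/P method in Python; the reuse weights (w_added, w_changed, w_deleted) and all numbers are illustrative assumptions, not values prescribed by the paper:

```python
# Minimal sketch of E = S / P with a reuse adjustment.
# Weights for added/changed/deleted reused code are illustrative
# assumptions, not values prescribed by the paper.

def equivalent_size(new_loc, added, changed, deleted,
                    w_added=1.0, w_changed=0.6, w_deleted=0.2):
    """Fold new code plus reused-code churn into one equivalent size (LOC)."""
    return new_loc + w_added * added + w_changed * changed + w_deleted * deleted

def effort(size_loc, productivity_loc_per_pm):
    """E = S / P, effort in person-months."""
    return size_loc / productivity_loc_per_pm

size = equivalent_size(new_loc=8000, added=1500, changed=2000, deleted=500)
print(effort(size, productivity_loc_per_pm=300))   # 10800 / 300 = 36.0 pm
```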
Creating an Estimate…
Step 3a Continued…
Estimating Productivity – An aggregate
of the capabilities of the development
team
Often based on historical project data
New project must use the same size measure
and must be implemented with equivalent
approaches - same programming language,
platform, etc.
There are a lot of variables that are difficult to
quantify that play a role in this estimate
Diseconomy of scale – project size and
productivity are inversely related
Creating an Estimate…
Step 3b: Estimating the Software
Development Effort
Parametric Estimation Methods
An algorithm is used to determine the estimate
based on a set of independent inputs
Algorithm and Inputs must be created by an expert
estimator and tested to fit legacy data
Based on theory, experience and expert
judgments
Algorithms can change between evaluations for the
different lifecycle phases or components
RA (requirements analysis), design, test, etc.
Creating an Estimate…
Step 3b: Parametric Estimation Methods…
Continued
Allocations can be automatically made against
WBS items to provide schedule detail along
with cost
Performance:
Validation and calibration of the method is very
important
Models calibrated against general industry data
usually provide estimates within 20% of the actuals
Models calibrated with an organization’s own
historical data provide estimates within 5% of the
actuals
These models ONLY provide an estimate of
the SW development activities, not the other
tasks and items that form a complete
estimate
Creating an Estimate…
Step 4: Estimating Risks
Risks are areas that are identified as possible
causes of problems in the future
Severity is determined by two variables
Likeliness of occurrence
Impact if it occurs
Generally a label of High, Medium or Low is applied
to the risk based on those variables
Main Risk areas are: Cost, Schedule, Technical and
Business
During estimate creation, a lot of the system
risks should become apparent
Additional effort should be added to the proposal to
track and handle these risks
Often taken care of with “Management Reserve”
Scheduling Tasks
When all of the tasks are identified and
decomposed, a schedule must be created
Generally based on the WBS (if it goes down to
the appropriate level)
May also be based on outputs of detailed design
There are often multiple related schedules
created with each representing a greater
level of detail
Highest level shows major milestones
A milestone is an event that will occur at a specified
date
Lowest level shows individual tasks
creation of specific classes
Scheduling Tasks
Dependency checking is important when creating a
schedule
Some tasks have prerequisites that must complete
before they can begin
Others are completely independent
Which means they can be worked in parallel
Creating a Schedule does four main things
1. Sequences tasks
Requires analyzing dependencies
2. Assigns resources to tasks
Not specific people, but notional resources
3. Calculates the length of the tasks
Critical Path is the longest path through the schedule;
its length is the program's time to complete (see the sketch below)
4. Compares interim milestones with those from the
master schedule
It is important to ensure that the schedule begins and
ends cleanly, with no dangling tasks
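A minimal sketch of the critical-path calculation referenced above; the task names, durations, and dependencies are made-up illustrations:

```python
# Sketch: critical path length through a task dependency graph.
# Task durations (weeks) and prerequisites are made-up illustrations.

tasks = {                      # task: (duration, prerequisites)
    "design":    (3, []),
    "code_ui":   (4, ["design"]),
    "code_db":   (6, ["design"]),
    "integrate": (2, ["code_ui", "code_db"]),
    "test":      (3, ["integrate"]),
}

finish = {}                    # earliest finish time per task

def earliest_finish(task):
    if task not in finish:
        duration, prereqs = tasks[task]
        start = max((earliest_finish(p) for p in prereqs), default=0)
        finish[task] = start + duration
    return finish[task]

# The critical path length is the program's time to complete.
print(max(earliest_finish(t) for t in tasks))  # 14 weeks
```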
Costing Tasks
Converts the effort calculated previously
into actual dollar amounts
Must take into account the classification of each
person working on the tasks
Jr. Engineer, Lead Engineer, Program Manager, etc.
Each of these roles is costed at a different base
amount
So two Jr. Engineers may make different amounts of
money, but the customer is charged a single “Jr.
Engineer” rate
Work is charged based on a loaded labor rate
This rate (generally per hour) includes not only the
cost of the salary for the employee, but additional
costs that cover things such as
Profit
Contracts, IT (and other support departments)
Overhead
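A small sketch of costing with a loaded labor rate; the role names, base rates, and load multiplier below are illustrative assumptions:

```python
# Sketch: converting effort into cost with a loaded labor rate.
# Rates and load multiplier are illustrative assumptions.

base_rate = {"jr_engineer": 40.0, "lead_engineer": 70.0, "program_manager": 90.0}
load_factor = 2.2   # covers profit, overhead, and support departments

def task_cost(role, hours):
    """Charge at a single loaded rate per role, not per individual salary."""
    return base_rate[role] * load_factor * hours

print(task_cost("jr_engineer", 160))   # one month of a Jr. Engineer: 14080.0
```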
Putting The Estimate Together
The final estimate is put together by a
business office within the organization
Inputs are required from lots of others
Planners and Engineers define the job
Engineers and estimators determine the
resources required
Business office calculates the real costs
Schedulers create the schedule
Managers evaluate the results and set the total
price
They must work in profit and other costs that may
affect the project in the future
Such as adjustments to labor rates
Improving Estimation Accuracy
Some “Good Ideas” for improving
estimations
Understand the requirements
Ensure that the appropriate development
environment, programming language,
etc. are used
Collect and use legacy project data
Validate the estimation technique
against industry or organizational data
Mix estimating techniques and see
where and why they produce different
results
Tracking Expenditures
Control accounts are created to
logically split up the total project
funding among the many tasks
Charge codes are setup so that labor
can be charged against the funding in
the control accounts
For overhead and other support
purchases there is generally a “buyer”
that all requests must go through
This allows a greater ability to track
expenditures on these types of items
Tracking and Updating
To track the progress of development
three sources of data are used
1. Overall project plan
2. Cost accounting data
3. Project status
These sources provide inputs into the
Earned Value variables
BCWS – Budgeted Cost of Work Scheduled
ACWP – Actual Cost of Work Performed
BCWP – Budgeted Cost of Work Performed
ACWP > BCWP = Over Budget
BCWS > BCWP = Behind Schedule
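A minimal sketch of these earned-value checks, with made-up numbers; the CPI/SPI indices at the end are standard earned-value ratios, not named in the paper:

```python
# Sketch of the earned-value comparisons above, with made-up numbers.

bcws = 100_000   # Budgeted Cost of Work Scheduled
acwp = 95_000    # Actual Cost of Work Performed
bcwp = 80_000    # Budgeted Cost of Work Performed ("earned value")

if acwp > bcwp:
    print("Over budget, cost variance:", bcwp - acwp)         # -15000
if bcws > bcwp:
    print("Behind schedule, schedule variance:", bcwp - bcws)  # -20000

# Standard earned-value indices (not named in the slides): <1.0 signals trouble.
print("CPI =", bcwp / acwp, "SPI =", bcwp / bcws)             # ~0.84, 0.80
```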
Managing Estimates During
Execution
Initial estimates are used to acquire initial funding
But in software projects these often change throughout
the development process
The progress of the development must be closely
tracked to determine when things have gone awry
When changes must be made the following
options are available:
Reinterpret the requirements (work with customer)
Apply COTS or reuse instead of new build
Use automated tools
Revise WBS element development resources
Change development sequencing
Possibly change model to an iterative one
Apply additional resources to tasks
Caveats to Using Additional
Resources
Some software components take a
minimum amount of time to complete… no
matter how many people work on them
Insert overused baby-in-nine-months joke here
Mythical Man Month
It is often worse to apply additional resources
to a software development team when in a
crunch
They must be trained
They don’t have experience with the component
Often causes a greater slip in the development
Additional resources are not free
The money to pay for them must come from
somewhere, usually another component in the
system
CAIV and SAIV
Cost as an Independent Variable
Used to determine what items will be
built and when they will be completed
based on the funding available
Schedule as an Independent Variable
Used to determine what items will be
built and what they will cost based on
the schedule that must be met
Conclusion
Good primer on estimation of
software size and cost
Left out additional costs such as
Design
Test
Are these “rolled in” with the coding cost
in the general case?
End focus on tracking and adjusting
cost/schedule was useful, but
somewhat out of place
Effort Estimation
Has been an “art” for a long time because there are
many parameters to consider. For example:
unclear relative importance of the parameters
unknown inter-relationships among the parameters
unknown metrics for the parameters
Historically, project managers
1. consulted others with past experiences
2. drew analogy from projects with “similar”
characteristics
3. broke the projects down to components and used
past history of workers who have worked on similar
components; then combined the estimates
Class Discussion of Size vs Effort
[Plot: Effort (y-axis) vs Size (x-axis), showing the linear relation Effort = a + b * (Size)]
If the relation is non-linear, then ---- ?
General Model
There have been many proposed models for
estimation of effort in software. They all
have a “similar” general form:
Effort = f(Size, set of factors)
Effort = [a + (b * (Size ** c))] * [PROD(f's)]
where:
Size is the estimated size of the project in LOC or function points
a, b, c are coefficients derived from past data and curve fitting
a = base cost to do business regardless of size
b = fixed marginal cost per unit of change of size
c = nature of influence of size on cost
f's are a set of additional factors, besides Size, that are deemed important
PROD (f's) is the arithmetic product of the f's (see the sketch below)
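The general form above, transcribed directly; the coefficient and factor values below are made-up placeholders, since real values come from curve-fitting legacy data:

```python
# Direct transcription of the general form:
#   Effort = [a + b * Size**c] * PROD(f's)
from math import prod

def general_effort(size, a, b, c, factors):
    return (a + b * size ** c) * prod(factors)

# Illustrative coefficients only; real a, b, c come from fitting past data.
print(general_effort(size=10, a=1.0, b=3.0, c=1.1, factors=[1.15, 0.9]))
```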
COCOMO Estimating Technique
Developed by Barry Boehm in the early 1980's, who
had a long history with TRW and government
projects (LOC based)
Later modified into COCOMO II in the mid-1990's
(FP preferred, or LOC)
Assumed process activities:
Product Design
Detailed Design
Code and Unit Test
Note that this does not include requirements!
Integration and Test
Utilized by some, but most people still rely on
experience and/or their own company's proprietary
data & process (e.g. a proprietary LOC to
person-month conversion rate)
Basic Form for Effort
Effort = A * B * (size ** C)
or more “generally”
Effort = [A * (size**C)] * [B ]
Effort = person months
A = scaling coefficient
B = coefficient based on 15 parameters
C = a scaling factor for process
Size = delivered source lines of code in
“KLOC”
Basic form for Time
Time = D * (Effort ** E)
Time = total number of calendar months
D = A constant scaling factor for
schedule
E = a coefficient to describe the
potential parallelism in managing
software development
COCOMO I
Originally based on 56 projects
Reflecting 3 modes of projects
Organic : less complex and flexible
process
Semidetached : average project
Embedded : complex, real-time
defense projects
3 Modes are Based on 8 Characteristics
A. Team’s understanding of the project objective
B. Team’s experience with similar or related project
C. Project’s needs to conform with established
requirements
D. Project’s needs to conform with established interfaces
E. Project developed with “new” operational environments
F. Project’s need for “new” technology, architecture, etc.
G. Project’s need for schedule integrity
H. Project’s “size” range
Key Project Characteristic | Organic Mode | Semidetached Mode | Embedded Mode
A. Understanding of requirements | Detail degree | Considerable degree | Know only generally
B. Experience w/similar projects | Extensive amount | Some amount | None to modest amount
C. Conformance w/requirements | Only the basic ones | Considerably more than the basic ones | All and full conformance
D. Conformance w/interfaces | Only the basic ones | Considerably more than the basic ones | All and full conformance
E. New operational environments | Little to some | Moderate amount | Extensive amount
F. New technology/methods | None to minimal | Some | Considerable
G. Schedule integrity | Low | Medium | Must
H. Size | Less than 50K delivered LOC | Between 50K and 300K delivered LOC | All sizes
COCOMO I
For the basic forms:
Effort = A * B * (size ** C)
Time = D * (Effort ** E)
Organic: A = 3.2; C = 1.05; D = 2.5; E = .38
Semidetached: A = 3.0; C = 1.12; D = 2.5; E = .35
Embedded: A = 2.8; C = 1.20; D = 2.5; E = .32
What about the coefficient B? ---- see next slide
Coefficient B
Coefficient B is an effort adjustment factor based on 15
parameters, each rated from very low, low, nominal,
high, very high to extra high
B = product (15 parameters)
Values below are listed in that rating order (blank where
a rating is not defined):
Product attributes:
Required Software Reliability : .75 ; .88; 1.00; 1.15; 1.40;
Database Size : ; .94; 1.00; 1.08; 1.16;
Product Complexity : .70 ; .85; 1.00; 1.15; 1.30; 1.65
Computer Attributes
Execution Time Constraints : ; ; 1.00; 1.11; 1.30; 1.66
Main Storage Constraints : ; ; 1.00; 1.06; 1.21; 1.56
Virtual Machine Volatility : ; .87; 1.00; 1.15; 1.30;
Computer Turnaround time : ; .87; 1.00; 1.07; 1.15;
Coefficient B (cont.)
Personnel attributes
Analyst Capabilities : 1.46 ; 1.19; 1.00; .86; .71;
Application Experience : 1.29; 1.13; 1.00; .91; .82;
Programmer Capability : 1.42; 1.17; 1.00; .86; .70;
Virtual Machine Experience : 1.21; 1.10; 1.00; .90; ;
Programming lang. Exper. : 1.14; 1.07; 1.00; .95; ;
Project attributes
Use of Modern Practices : 1.24; 1.10; 1.00; .91; .82;
Use of Software Tools : 1.24; 1.10; 1.00; .91; .83;
Required Develop schedule : 1.23; 1.08; 1.00; 1.04; 1.10;
A “cooked up” example
Any problem?
Consider an average project of 10 Kloc:
Effort = 3.0 * B * (10 ** 1.12) = 3 * 1 * 13.2 = 39.6 pm
where B = 1.0 (all nominal)
Time = 2.5 * (39.6 ** .35) = 2.5 * 3.6 = 9 months
This requires an additional 8% more effort and 36%
more schedule time for product plan and requirements:
Effort = 39.6 + (39.6 * .08) = 39.6 + 3.17 = 42.77 pm
Time = 9 + (9 * .36) = 9 + 3.24 = 12.24 months
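The example above, reproduced as a small sketch; the mode coefficients are the COCOMO I values from the earlier slide, and B = 1.0 assumes all 15 drivers nominal:

```python
# Reproducing the "cooked up" example: a 10 Kloc semidetached project.

COCOMO_I = {                 # mode: (A, C, D, E)
    "organic":      (3.2, 1.05, 2.5, 0.38),
    "semidetached": (3.0, 1.12, 2.5, 0.35),
    "embedded":     (2.8, 1.20, 2.5, 0.32),
}

def cocomo_i(kloc, mode, b=1.0):
    a, c, d, e = COCOMO_I[mode]
    effort = a * b * kloc ** c        # person-months
    time = d * effort ** e            # calendar months
    return effort, time

effort, time = cocomo_i(10, "semidetached")   # B = 1.0: all 15 drivers nominal
print(effort, time)                           # ~39.5 pm, ~9.1 months
# Add the plans-and-requirements overhead from the slide:
print(effort * 1.08, time * 1.36)             # ~42.7 pm, ~12.3 months
```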
Try another example
(how about your own project?)
Go through the assessment of 15 parameters
for the effort adjustment factor, B.
You may have some concerns if your company
adopts COCOMO:
1. Are we interpreting each parameter the same way?
2. Do we have a consistent way to assess the range
of values for each of the parameters?
3. How do we get more accuracy in the LOC estimate?
Relative Accuracy of Estimates
(from B. Boehm)
[Plot: estimate range (size/cost) vs stage of the project --- the range narrows from 4x down to .25x of the actual size/cost as the project moves from feasibility through requirements, early design, and code/test]
COCOMO II
Based on 2 major realizations:
1. Realizes that there are many different software
life cycle and development models, while
COCOMO I assumed a waterfall type of model
2. Realizes that estimates depend on the granularity of
information --- the more information (the later the
stage of development), the more accurate the
estimate
Effort (nominal) = A * (size ** C)
Effort (adjusted) = [A * (size ** C)] * B
COCOMO II
COCOMO research effort
performed at USC with many
industrial corporations
participating --- still led by Barry
Boehm
Has a database of over 80
newer projects
COCOMO II emphasis
COCOMO II - Effort (nominal) = A * (size ** C):
Removal of "modes": instead of the 3 "modes," which use 8
characteristics to determine the mode, use 5 factors to determine the
scaling coefficient "C"
Precedentedness
Flexibility
Risk
Team cohesion
Process maturity
COCOMO II - Effort (adjusted) = [A * (size ** C)] * B:
For Early Estimates, it is preferred to use Function Points instead of LOC
for size (LOC is harder to estimate without some experience). Coefficient
"B" is rolled up to 7 cost drivers (1. product reliability & complexity; 2. reuse
required; 3. platform difficulty; 4. personnel capability; 5. personnel
experience; 6. facility; 7. schedule)
For Post-Architecture Estimates, may use either LOC or function points.
Coefficient "B" uses 17 cost drivers, expanded from the 7 cost drivers
(e.g. personnel expands into 1) analyst capability; 2) programmer
capability; 3) personnel continuity)
Function Point
A non-LOC based estimator
Often used to assess software "complexity" and "size"
Started by Albrecht of IBM in the late 1970's
Function Point (product
size/complexity)
Gained momentum in the 1990’s with IFPUG as
software service industry looked for a metric
Function Point does provide some advantages
over LOC
language independent
don't need the actual lines of code to do the counting
takes into account different entities
Some disadvantages include:
complex to come up with the final number
consistency (data reliability) varies by counter ---
although IFPUG membership and training have
improved on this
Function Point Metric via GQM*
Goal : Measure the Size of Software
Question: What is the size of a software system in
terms of its:
Data files
Transactions
Metrics: Function Points ---- (defined in
this lecture)
* GQM is a methodology invented and advocated by V. Basili of U. of Maryland
FP Utility
Where is FP used?
Comparing software in a “normalized fashion”
independent of op. system, languages, etc.
Benchmarking and Projection based on “size”:
size -> cost or effort
size -> development schedule
size -> defect rate
Outsourcing Negotiation
Methodology
(“extended version” --- compared to your text)
Composed of 3 major steps:
1. Identifying and Classifying:
Data
Transactions
2. Evaluating the Complexity Levels of Data and
Transactions
3. Computing the Function Point
1. Identifying & Classifying 5 “Basic
Entities”
Data:
Internally generated and stored (logical files and
tables)
Data maintained externally and requires an external
interface to access (external interfaces)
Transactions:
Information or data entry into a system for
transaction processing (inputs)
Information or data “leaving” the system such as
reports or feeds to another application (outputs)
Information or data retrieved and displayed on the
screen in response to query (query)
2. Evaluating Complexity
Using a complexity table, each of the 5
basic entities is evaluated as :
Low (simple)
Average
High (complex)
3 attributes are used for the above
complexity table decisions
# of Record Element Types (RET): e.g.
employee data type, student record type
# of unique attributes (fields) or Data Element
Types (DET) for each record: e.g. name,
address, employee number, and hiring date
would make 4 DETs for the employee data type
# of File Types Referenced (FTR): e.g. an external
payroll record file that needs to be accessed
5 Basic Entity Types use the RET, DET, and FTR
for Complexity Evaluation
For -- Internal Logical Files and External Interfaces data entities:
# of RET | 1-19 DET | 20-50 DET | 50+ DET
1        | Low      | Low       | Avg
2-5      | Low      | Avg       | High
6+       | Avg      | High      | High
For -- Input, Output and Query transactions:
# of FTR | 1-4 DET | 5-15 DET | 16+ DET
0-1      | Low     | Low      | Avg
2        | Low     | Avg      | High
3+       | Avg     | High     | High
Example
Consider a requirement: “the system has the
feature to add a new employee.”
Assume employee information involves 3 external files
that each has a different Record Element Types (RET)
Employee Basic Information has employee data records
Each employee record has 55 fields (1 RET and 55 DET) -
AVERAGE
Employee Benefits records
Each benefit record has 10 fields (1 RET and 10 DET) - LOW
Employee Tax records
Each tax record has 5 fields ( 1 RET and 5 DET) - LOW
Adding a new employee involves 1 input transaction
which involves 3 file types referenced (FTR) and a
total of 70 fields (DET). So for the 1 input transaction
the complexity is HIGH
Function Point (FP) Computation
Composed of 5 “Basic Entities”
input items (external input items from user or another application)
output items (external outputs such as reports, messages, screens
– not each data item)
Queries (a query that results in a response of one or more data)
master and logical files (internal file or data structure or data
table)
external interfaces (data or sets of data sent to external devices,
applications, etc.)
And a "complexity level index" matrix:
Entity                | Simple (low) | Average | Complex (high)
Input                 | 3            | 4       | 6
Output                | 4            | 5       | 7
Query                 | 3            | 4       | 6
Logical files         | 7            | 10      | 15
Ext. interface & file | 5            | 7       | 10
Function Point Computation (cont.)
Initial Function Point:
Σ (Basic Entity x Complexity Level Index), summed over all basic entities
Continuing the Example of adding new employee:
- 1 external interface (average) = 7
- 1 external interface (low) = 5
- 1 external interface (low) = 5
- 1 input (high) = 6
Initial Function Point = 7 + 5 + 5 + 6 = 23
Note that ---- this just got us to the Initial Function Point (computed in the sketch below)
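A minimal sketch of the initial count; the weights are the complexity-level index matrix from the earlier slide, and the entity list reproduces the add-a-new-employee example:

```python
# Sketch: initial function point count from classified, rated entities.
# Weights are the complexity-level index matrix from the earlier slide.

WEIGHT = {
    "input":         {"low": 3, "avg": 4, "high": 6},
    "output":        {"low": 4, "avg": 5, "high": 7},
    "query":         {"low": 3, "avg": 4, "high": 6},
    "logical_file":  {"low": 7, "avg": 10, "high": 15},
    "ext_interface": {"low": 5, "avg": 7, "high": 10},
}

def initial_fp(entities):
    """entities: list of (entity_type, complexity) pairs."""
    return sum(WEIGHT[kind][level] for kind, level in entities)

# The add-a-new-employee example:
print(initial_fp([("ext_interface", "avg"), ("ext_interface", "low"),
                  ("ext_interface", "low"), ("input", "high")]))  # 23
```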
Function Point Computation (cont.)
Initial Function Point = ∑ (Basic Entity x Complexity Level Index)
is modified by 14 DIs
There are 14 more "Degrees of Influence" (0 to 5 scale):
data communications
distributed data processing
performance criteria
heavy hardware utilization
high transaction rate
online data entry
end user efficiency
on-line update
complex computation
reusability
ease of installation
ease of operation
portability
maintainability
(these form the 14 DIs)
Function Point Computation (cont.)
Define Technical Complexity Factor (TCF):
TCF = .65 + (.01 x DI)
where DI = ∑ (influence factor values) over the 14 DIs
So note that .65 ≤ TCF ≤ 1.35
Function Point (FP) = Initial FP x TCF
Finishing the earlier example: assume TCF came out to be 1.15;
then FP = 23 x 1.15 = 26.45 (see the sketch below)
Provides you another way to estimate the “size” of the
project based on estimating 5 basic entities :
Inputs
Outputs
Logical Files
External Interfaces
Queries
(note: the textbook algorithm is an earlier, simplified
version)
------------------------
(important)
** Then --- still need an estimate of productivity,
e.g. function points/person-month
*** Divide the estimated total project function points
(size) by the productivity to get an estimate of "effort" in
person-months or person-days needed
Questions
Questions?