11th International Conference on Data Envelopment Analysis, Samsun, Turkey, 2013
Implementing DEA models in the R program
José Francisco Moreira Pessanha
Rio de Janeiro State University, professorjfmp@hotmail.com, (corresponding Author)
Alexandre Marinho
Rio de Janeiro State University, alexandre.marinho@ipea.gov.br
Luiz da Costa Laurencel
Rio de Janeiro State University, llaurenc.ntg@terra.com.br
Marcelo Rubens dos Santos do Amaral
Rio de Janeiro State University, mrubens@ime.uerj.br
Abstract
This paper aims to present an implementation of classical DEA models in R, a free
software and open source, highly extensible that offers a variety of functions and
graphical routines for data analysis. In this work we show both the CRS and VRS
DEA models. The computational implementation is illustrated with real data from the
Brazilian electric power distribution utilities.
Keywords: Data Envelopment Analysis, classical models, R programming language
Introduction
Introduced by Charnes, Cooper and Rhodes in 1978, Data Envelopment Analysis
(DEA) is an important branch of the operations research, as well as of economics, as
evidenced by numerous publications with practical applications and theoretical
developments on little more than three decades (EMROUZNEJAD et al, 2008, COOK
& SEIFORD, 2009). In summary, DEA can be described as a nonparametric
technique based on linear programming to evaluate the efficiency of organizations
working in the same industry.
This paper presents a brief introduction about the implementation of classical DEA
models in the R programming language. The models implemented include the CRS
(Constant Returns to Scale) and the VRS (Variable Returns to Scale), both in the
multiplier form and input oriented. The computational implementation is illustrated
by the efficiency evaluation of the 18 biggest Brazilian electric power distribution
utilities.
There are several software tools available for DEA (BARR, 2004); however, the
possibility of implementing DEA models in other programming languages provides
great flexibility in the application of the DEA methodology. The advent of the R
project (R DEVELOPMENT CORE TEAM, 2013), a free software and open source,
highly extensible, offers a variety of functions and graphical routines (packages) for
data analysis. For example, the Frontier Efficiency Analysis with R - FEAR
(WILSON, 2008) and Benchmarking (BOGETOFT & OTTO, 2011) are two R
packages dedicated to DEA. However, R is more than a library of packages; it allows
analysts to build their own programs.
1
11th International Conference on Data Envelopment Analysis, Samsun, Turkey, 2013
Classical DEA models
DEA is a widely used technique for evaluating the efficiency of a set with N peer
entities called decision making units (DMU) which convert multiples inputs into
multiples outputs. In the general case, a DMU uses multiples inputs X=(x1,...,xs) to
produce multiples outputs Y=(y1,...,ym) and its efficiency score is defined by the
following quotient:
efficiency u1 y1 um ym v1 x1 vs xs U Y V X (1)
where V=(v1,...,vs) and U=(u1,...,um) denote the weights assigned to the inputs and
outputs quantities respectively.
Charnes, Cooper and Rhodes (1978) suggest that the vectors U and V must be
determined by the linear programming problem (LPP) (2) at Table 1, called CCR or
CRS (Constant Returns to Scale) input oriented in the multiplier form.
Table 1. DEA/CRS input oriented
Multiplier form Envelopment form
m
efficiency Max ui yi , j0 (2) efficiency Min (3)
u ,v
i 1
,
s.t. s.t.
N
X j j X j
s m
vi xij ui yij 0 j 1,..., j0 ,..., N 0
i 1 i 1 j 1
N
Y j0 j Y j
s
v x
i 1
i i , j0 1
j 1
ui 0 i 1,m j 0j 1,..., j0 ,..., N
vi 0 i 1,s
The evaluated DMU (DMUj0) is efficient if the objective function is equal to one and
all weights are positive at the optimal solution. Otherwise, the DMU is inefficient.
Under the resources conservation approach (input orientation), the measure of
technical efficiency (0 1) of a DMU is defined as the maximum radial
contraction of the input vector X that can produce the same amount of products Y:
efficiency = Min { | (X,Y) production possibilities set T(X,Y) } (4)
Using the duality theory in linear programming (COOPER et al, 2002), one can derive
an equivalent model known as DEA model in the envelopment form under input
orientation whose mathematical formulation corresponds to the model (3) at Table 1.
In this case, the DMU evaluated is efficient if and only if =1. Otherwise, the DMU is
inefficient. It should be emphasized that the LPP (2) or (3) must be solved for each
DMU in order to compute its efficiency score.
Later, Banker, Charnes and Cooper (1984) added the constraint 1+…+N=1 in the
envelopment form of the CRS model (3). The result is a DEA model called BCC or
VRS (Variable Returns to Scale). The VRS model in the multiplier form and input
oriented is illustrated below (5), where the unconstrained variable u0 corresponds to
the constraint 1+…+N=1 in the dual model.
2
11th International Conference on Data Envelopment Analysis, Samsun, Turkey, 2013
m
efficiency Max ui yi , j0 u0 (5)
u ,v
i 1
s.t.
s m
vi xij ui yij u0 0 j 1,..., j0 ,..., N
i 1 i 1
s
v x
i 1
i i , j0 1
ui 0 i 1,m vi 0 i 1,s
An R code for DEA
The R code can organized in three parts: loading input data, processing and output
reporting. In order to illustrate the R code for DEA model, consider the dataset with
the 18 biggest Brazilian distribution utilities for the year 2009. Each utility is
characterized by four variables: the annual operating expenditures in R$ (OPEX), the
total length of the distribution network in kilometer (NETWORK), the total electricity
consumption (MWH) and the number of consumers supplied by the utility
(CONSUMERS). The main outputs of the distribution utilities are the amount of
distributed energy and the number of consumers. In addition, the operating expenses
are also influenced by non controllable factors, for example, the geographical
dispersion of consumers. In order to address this issue, the size of the distribution
network can be included as an additional output variable. The outputs variables are
the drivers of operating expenditures. For a given level of output, the utility should
operate at the lowest cost. Thus, in order to obtain an efficiency score that quantifies
the potential reduction of the operating expenditures, we propose an input-oriented
DEA model wherein the OPEX is the unique input variable and the outputs are those
aforementioned. Consider that the data are stored in a MS Excel file called data.xls at
directory c:\example. The data importing can be done by the following commands
(commentaries after #):
require(lpSolve) # load lpSolve package previously installed
require(XLConnect) # load XLConnect package previously installed
setwd('c:/example') # set work directory
wb <- loadWorkbook('data.xls') # load the file data.xls
data <- readWorksheet (wb, sheet=1, header=TRUE) # read the first spreadsheet in the data file
In the R code above the loadWorkbook and readWorksheet functions are available in
the package XLConnect (http://cran.r-
project.org/web/packages/XLConnect/index.html). The readWorksheet command
create a data.frame (VERZANI, 2005) called data with all records in the first
worksheet into the file data.xls. In the readWorksheet command, the sheet parameter
indicates the index of the worksheet to read from and the header parameter set to
TRUE indicates that the first line contains the variable labels. As illustrated in Fig. 1,
the data object contains the input and output variables. The input variable is the
OPEX at second column of the data matrix and the output variables are at columns 3
(NETWORK), 4 (MWH) and 5 (CONSUMERS). The selection of inputs and outputs
variables can be done by the R code shown in Fig.1.
The DEA model processing consists in solving a LPP for each one of the N DMU.
The LPP can be solved by the lp function available in the package lpSolve
(http://cran.r-project.org/web/packages/lpSolve/index.html).
3
11th International Conference on Data Envelopment Analysis, Samsun, Turkey, 2013
namesDMU <- data[1]
inputs <- data[2]
outputs <- data[c(3,4,5)]
N <- dim(data)[1] # number of DMU
s <- dim(inputs)[2] # number of inputs
m <- dim(outputs)[2] # number of outputs
Figure 1. The data object in R and the data import code
The LPP (2) can be expressed in the following form:
Max cT z
s.t. inputs outputs z 0 (6)
bT z 1
z 0
where inputs and outputs are the R objects defined in Fig. 1, zT=[v u1 u2 u3] is the
decision variables vector (weights), bT is the vector of inputs of the evaluated DMU
(DMUj0) and cT is the vector of outputs of the DMUj0.
The efficiency score of each DMU can be evaluated by the R code shown below,
where i is the index of the evaluated DMU and the vectors b and c are modified
automatically for each DMU:
f.rhs <- c(rep(0,N),1) # RHS constraints
f.dir <- c(rep("<=",N),"=") # directions of the constraints
aux <- cbind(-1*inputs,outputs) # matrix of constraint coefficients in (6)
for (i in 1:N) {
f.obj <- c(0*rep(1,s),outputs[i,]) # objective function coefficients
f.con <- rbind(aux ,c(inputs[i,], rep(0,m))) # add LHS of bTz=1
results <-lp("max",f.obj,f.con,f.dir,f.rhs,scale=1,compute.sens=TRUE) # solve LPP
multipliers <- results$solution # input and output weights
efficiency <- results$objval # efficiency score
duals <- results$duals # shadow prices
if (i==1) {
weights <- multipliers
effcrs <- efficiency
lambdas <- duals [seq(1,N)]
} else {
weights <- rbind(weights,multipliers)
effcrs <- rbind(effcrs , efficiency)
lambdas <- rbind(lambdas,duals[seq(1,N)])
}
}
Note that in the lp function are informed all elements of a LPP: the problem
orientation (max or min), the objective function coefficients (f.obj), the matrix of
constraint coefficients (f.con), the directions of the constraints <=, = or >= (f.dir) and
the right-hand side (f.rhs) of the constraints. In the R code above, the optimal values
for the weights (results$solution), shadow prices (results$duals) and efficiency score
(results$objval) are stored in the object results. Ultimately, the following R code
4
11th International Conference on Data Envelopment Analysis, Samsun, Turkey, 2013
exports the efficiency scores and the input and output weights (U and V) to the
spreadsheet illustrated in Fig. 2.
report <- cbind(effcrs,weights) # output report
colnames(report) <- c('efficiency',names(inputs),
names(outputs)) # header
# create the output file resultscrs.xls
wbout<-loadWorkbook("resultscrs.xls",create = TRUE )
# create the sheet CRS_results within the output file
createSheet(wbout,"CRS_results")
# write DMU names in the first column
writeWorksheet (wbout, namesDMU, startCol = 1,
sheet=1, header = TRUE)
# write the results from the second column onwards
writeWorksheet (wbout, report, sheet=1, startCol=2,
header = TRUE )
# save the output file
saveWorkbook(wbout)
Figure 2. Results from CRS model
The difference between CRS and VRS models resides in the unconstrained variable
u0. This variable can be modeled by the difference of two non negative variables (u0 =
u+ - u-, u+0 and u-0), as illustrated by the following R code for DEA VRS (5).
f.rhs <- c(rep(0,N),1) # RHS constraints
f.dir<-c(rep("<=",N), "=") # directions of the constraints
aux <- cbind(-1*inputs,outputs,1,-1) # matrix of constraint coefficients in (6)
for (i in 1:N) {
f.obj<-c(rep(0,s),outputs[i,],1,-1) # 1 and -1 represents u+ and u- respectively
f.con<- rbind(aux,c(inputs[i,], rep(0,m),0,0)) # add LHS of bTz=1
results<-lp("max", f.obj, f.con, f.dir, f.rhs, scale=1, compute.sens=TRUE) # solve LPP
multipliers <- results$solution # input and output weights
efficiency <- results$objval # efficiency scores
duals <- results$duals # shadow prices
u0 <- multipliers[s+m+1]-multipliers[s+m+2]
if (i==1) {
weights <- c(multipliers[seq(1,s+m)],u0)
effvrs <- efficiency
lambdas <- duals [seq(1,N)]
} else {
weights<-rbind(weights,c(multipliers[seq(1,s+m)],u0))
effvrs <- rbind(effvrs , efficiency)
lambdas <- rbind(lambdas,duals[seq(1,N)])
}
}
The efficiency scores from CRS and VRS and the scale efficiency – SE (COOPER et
al, 2002) are illustrated in the Fig. 3 generated by the following R code:
par(mar=c(10,5,1, 8), xpd=TRUE) # set plot margins
scale <- effcrs/effvrs # calculate the scale efficiency
spreadsheet <- cbind(effcrs,effvrs,scale) # concatenate the scores
rownames(spreadsheet) <- namesDMU[,1]
colnames(spreadsheet) <- c ("CRS","VRS","SE")
barplot(t(spreadsheet),col=palette()[c(1,4,7)], ylab="Efficiency",beside=TRUE,las=3)
legend("topright",inset=c(-0.2,0),colnames(spreadsheet),fill=palette()[c(1,4,7)],bty="n")
The decomposition illustrated in Fig. 3 depicts if the sources of inefficiency is in the
operation, in the scale or both (COOPER et al, 2002).
5
11th International Conference on Data Envelopment Analysis, Samsun, Turkey, 2013
Figure 3. Efficiency decomposition
Conclusions
The R codes presented in this paper are examples of how to implement DEA models
in the R programming language. They can be easily adapted to more sophisticated
DEA models, for example, models with restricted multipliers, cross-efficiency
evaluation, two-stage DEA model, network DEA and resource allocation based on
DEA. The R is free, open source and highly extensible. In addition, it offers a way to
integrate the DEA methodology with other quantitative techniques and software, an
important issue in decision making process.
References
Barr, R.S. (2004) DEA software tools and technology: A State-of-the-Art Survey, In
Handbook on Data Envelopment Analysis, ed. W.W. Cooper, L.M. Seiford, and J.
Zhu, Kluwer Academic Publishers: 539–566.
Banker, R.D., Charnes A., Cooper, W.W. (1984) Some models for estimating
technical and scale inefficiencies in data envelopment analysis, Management Science,
30: 1078-1092.
Bogetoft, P., Otto, L. (2011) Benchmarking with DEA, SFA and R, Springer Science.
Charnes, A., Cooper, W.W., Rhodes, E. (1978) Measuring the Efficiency of Decision
Making Units, European Journal of Operational Research, 2: 429-444.
Cook, W.D., Seiford, L.M. (2009) Data Envelopment Analysis (DEA) – Thirty years
on, European Journal of Operational Research, 192: 1-17.
Cooper, W.W., Seiford, L.M., Tone, K. (2002) Data Envelopment Analysis: A
comprehensive text with models, applications, references and DEA-Solver Software,
Kluwer Academic Publishers.
Emrouznejad, A., Parker, B.R., Tavares, G. (2008) Evaluation of research in
efficiency and productivity: A survey and analysis of the first 30 years of scholarly
literature in DEA, Socio-Economic Planning Sciences, 42: 151-157.
R Development Core Team R (2013) A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna, Austria. URL
http://www.R-project.org/.
Verzani, J. (2005) Using R for introductory statistics, Chapman & Hall/ CRC Press.
Wilson, P.W. (2008) FEAR: A software package for frontier efficiency analysis with
R, Socio-Economic Planning Sciences, 42: 247-254.