Enterprise Architecture Analysis for Data Accuracy Assessments

Conference Paper · October 2009


DOI: 10.1109/EDOC.2009.26 · Source: IEEE Xplore



2009 IEEE International Enterprise Distributed Object Computing Conference


Per Närman, Pontus Johnson, Mathias Ekstedt, Moustafa Chenine, Johan König
Department of Industrial Information and Control Systems
Royal Institute of Technology (KTH)
Stockholm, Sweden
{pern, pj101, mek101, moustafac, johank}@ics.kth.se

Abstract - Poor data in information systems impedes the quality of decision-making in many modern organizations. Manual business process activities and application services are never executed flawlessly, which results in steadily deteriorating data accuracy: the further away from the source the data gets, the poorer its accuracy becomes. This paper proposes an architecture analysis method based on Bayesian Networks to assess data accuracy deterioration in a quantitative manner. The method is model-based and uses the ArchiMate language to model business processes and the way in which data objects are transformed by various operations. A case study at a Swedish utility demonstrates the approach.

Keywords: Enterprise Architecture, Accuracy, Data Quality

I. INTRODUCTION

In recent years, software, system and Enterprise Architecture (EA) have become established disciplines in both industry and academia. Architecture models can aid the communication between various stakeholders and facilitate comprehension of the properties of the complex systems they represent. One important part of this comprehension is to be able to analyze the models. However, common EA frameworks such as the Zachman framework [1], TOGAF [2] or DoDAF [3] rarely take architecture analysis into consideration, at least not explicitly. Modeling and metamodeling are often important parts of these frameworks and they commonly use views and viewpoints as a way of implicitly introducing the reasons and purposes of EA models. However, it is hardly ever explicated how the analyses should be carried out from any specific viewpoint.

The present work builds on previous research within the field of architecture analysis where architecture models are analyzed using a formalism based on Bayesian statistics [4][5]. This approach allows the analysis of various system properties, such as the interoperability, information security and availability of software systems. This approach is also explicit on what kind of information architecture models need to contain in order to be useful in an architecture analysis.

One aspect frequently overlooked in the architecture analysis community is the aspect of data accuracy. While of great importance to the business, data accuracy is rarely addressed by any major architecture framework or architecture analysis method.

This paper describes how the Bayesian approach can be used for architectural analyses of data quality. The paper focuses primarily on the accuracy of data and how it deteriorates in a business process involving multiple automated and manual processing steps. Based on a process-centric view of data accuracy, this paper presents a core model based on ArchiMate [6] for accuracy architecture analysis. A case study performed at a Swedish utility illustrates the approach for one particular process, the outage reporting process.

The next section provides some background to the architecture analysis approach on which this paper is based and touches briefly on ArchiMate. Section 3 presents some related works and gives an overview of common data quality estimation methods and theories. Section 4 proceeds to detail a so-called abstract model with which it is possible to model and analyze data accuracy. Section 5 describes a case study where the method is applied, followed by Section 6 which concludes the paper and proposes future works.

II. ABSTRACT AND CONCRETE MODELS

An abstract model is an EA metamodel containing entities and entity relations, augmented with attributes and causal relations between attributes. The attributes and attribute relations correspond to nodes and causal relations in Bayesian networks [7], [8]. By “metamodel” we mean a strict set of constructs and rules that can be used to model a specific domain; we equate metamodel to the Unified Modeling Language constructs, relations and rules which can be used, for example, to define a model of a specific software system [9].

Classes are fundamental parts found in most metamodels. Classes represent the objects of interest when modeling, e.g. applications, services, persons, or processes. Classes in abstract models are similar to classes found in UML.

Class relations connect two entities, e.g. “Interface is provided by Application” or “Person is a resource of a Process”. Class relations also state the multiplicity of the relationship between the Classes, e.g. that one person can be the resource of zero or more processes.

1541-7719/09 $25.00 © 2009 IEEE 24


DOI 10.1109/EDOC.2009.26

Authorized licensed use limited to: KTH THE ROYAL INSTITUTE OF TECHNOLOGY. Downloaded on February 4, 2010 at 05:11 from IEEE Xplore. Restrictions apply.
Class relations are similar to class relations found in UML.

Attributes of an abstract model represent variables related to the classes. UML also has attributes related to classes, but the attributes in abstract models differ from the attributes in UML. In abstract models, attributes and attribute relations represent the nodes and relations of a Bayesian network, see below. A richer account of abstract and concrete models is found in Johnson et al. [7].

A. Bayesian networks

Friedman et al. [10][12][14] describe a Bayesian network, B=(G, P), as a representation of a joint probability distribution, where G=(V, E) is a directed acyclic graph consisting of vertices, V, and edges, E. The vertices denote a domain of random variables X1,…, Xn, also called chance nodes. In the context of abstract models, each chance node corresponds to an attribute. Each chance node, Xi, may assume a value xi from the finite domain Val(Xi). The edges denote causal dependencies between the nodes, i.e. the causal relations between the nodes. The second component, P, of the network B, describes a conditional probability distribution for each chance node, P(Xi), given its parents Pa(Xi) in G. It is possible to write the joint probability distribution of the domain X1,…, Xn using the chain rule of probability, in the product form

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i)).$$

In order to specify the joint distribution, the respective conditional probabilities that appear in the product form must be defined. The second component P describes distributions for each possible value xi of Xi, and pa(Xi) of Pa(Xi), where pa(Xi) is the set of values of Pa(Xi). These conditional probabilities are represented in matrices, henceforth called Conditional Probability Matrices (CPMs). Using a Bayesian network, it is possible to answer questions such as what is the probability of variable X being in state x1 given that Y = y2 and Z = z1.

In the general case, the relations between variables described by the CPMs can be arbitrarily complicated conditional probabilities. Table 1 contains a general description of a CPM for the three nodes X, Y and Z where the first is causally affected by the other two.

Table 1: A conditional probability matrix for the chance nodes X, Y and Z.

Z         z1                            z2
Y         y1             y2             y1             y2
X = x1    P(x1| y1,z1)   P(x1| y2,z1)   P(x1| y1,z2)   P(x1| y2,z2)
X = x2    P(x2| y1,z1)   P(x2| y2,z1)   P(x2| y1,z2)   P(x2| y2,z2)

More comprehensive treatments of Bayesian networks can be found in e.g. Neapolitan [11], Jensen [12], Shachter [13] and Pearl [14].

B. Creating concrete models

The abstract model tells us what information we need to find in order to conduct analyses of different variables. Once this information is collected it is specified in the model, thus creating an instantiation of the abstract model. Once a concrete model has been created we can employ Bayesian inference to calculate the values of the attributes of the model.

In Fig. 1 below we see a Bayesian network stating that the property X of Applications is affected by property Y of Servers and property Z of Operating Systems. This can be transformed into an abstract model where Application, Server, and Operating System are modeled as classes and the properties as class attributes. Finally, when instantiating the abstract model into a concrete model, the class attributes assume values either through data collection or through inference using conditional probabilities.

Figure 1: 1) A Bayesian Network, 2) an abstract model and 3) a concrete model.
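As an illustration of the product form and of a CPM for the nodes X, Y and Z, the following minimal Python sketch builds the joint distribution from the chain rule and answers a query of the kind mentioned in the Bayesian networks subsection. All numeric values, as well as the names P_Y, P_Z, P_X_GIVEN_YZ, joint, marginal_x and conditional_x, are hypothetical and chosen only for this example; they do not come from the paper.

```python
from itertools import product

# Hypothetical priors for the parent nodes Y and Z (illustrative values only).
P_Y = {"y1": 0.7, "y2": 0.3}
P_Z = {"z1": 0.6, "z2": 0.4}

# Hypothetical CPM for X given (Y, Z), laid out like Table 1.
P_X_GIVEN_YZ = {
    ("y1", "z1"): {"x1": 0.9, "x2": 0.1},
    ("y2", "z1"): {"x1": 0.5, "x2": 0.5},
    ("y1", "z2"): {"x1": 0.4, "x2": 0.6},
    ("y2", "z2"): {"x1": 0.1, "x2": 0.9},
}


def joint(x, y, z):
    """Chain rule for this network: P(X, Y, Z) = P(Y) * P(Z) * P(X | Y, Z)."""
    return P_Y[y] * P_Z[z] * P_X_GIVEN_YZ[(y, z)][x]


def marginal_x(x):
    """P(X = x), obtained by summing the joint over all parent states."""
    return sum(joint(x, y, z) for y, z in product(P_Y, P_Z))


def conditional_x(x, y, z):
    """P(X = x | Y = y, Z = z), recovered from the joint distribution."""
    evidence = sum(joint(xv, y, z) for xv in ("x1", "x2"))
    return joint(x, y, z) / evidence


if __name__ == "__main__":
    print(conditional_x("x1", "y2", "z1"))  # probability of x1 given y2, z1
    print(marginal_x("x1"))                 # probability of x1 with no evidence
```

With evidence on both parents the query simply returns the corresponding CPM entry; the marginal shows how uncertainty about Y and Z is propagated to X.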

C. The core of the metamodel: ArchiMate

The ArchiMate EA metamodel has been used as a basis for the development of the abstract models. ArchiMate is an EA framework [15] developed by the ArchiMate Foundation consisting of primarily Dutch organizations. ArchiMate is also about to be published as a Technical Standard by The Open Group. ArchiMate is a mature modeling language and its applicability as a core language for abstract models has been tested in previous projects [16].

The ArchiMate metamodel distinguishes between three layers: the business, application and technology layers, where the technology supports the applications and the applications support the business.

The entities of the layers are in turn categorized according to three aspects of EA: the passive structure modeling informational objects, the behavior structure modeling the dynamic events of the EA, and finally the active structure which models the components in the architecture that perform the behavior aspects.

The service concept has been used extensively in the ArchiMate language. Services encapsulate and hide the internal behavior of underlying layers and give the overlying layers access to functionality from underlying layers through well-defined interfaces. For instance, an Application Service is seen to encapsulate a number of internal Application Functions to make them useful to the business actors of the business processes. By using services the business actors have to consider neither the internal behavior of the information systems nor the technology used to realize the behavior. Fig. 2 shows the original ArchiMate metamodel [6].

Figure 2: The original ArchiMate metamodel [6].

III. DATA QUALITY AND ACCURACY

Data quality is a multi-faceted concept. The most common dimensions of data quality are completeness, consistency, currency, relevance and accuracy [18][19][20][21][23]. In this paper the accuracy dimension is the main focus.

A. Accuracy

According to [18][23], accuracy is defined as the degree of proximity of a value V to V’, where V’ is an actual concept in the domain of reference and V is a datum that represents it. So if the datum V has a value of 200 and its frame of reference V’, the “truth” so to speak, is 199, V is inaccurate. When reasoning about classes of data values, accuracy instead becomes an issue of probability: the probability that a value selected

from a class of values is accurate according to the above definition.

B. Models for data quality analysis

There are several modeling techniques proposed for data quality analysis. The Quality Entity Relationship (QER) model is an extension of the ER models to accommodate several data quality dimensions [19]. With QER the ER models are extended to incorporate data quality related information on attributes in the relational model. The attributes in relations are associated with quality indicators (e.g. accuracy) and quality rankings (e.g. excellent).

The QER model lacks the ability to incorporate information on the origin of the data. Tracing the origin of data is known as data provenance [18] and is important in most systems where data is collected from various distributed sources with a varying degree of data quality.

The earliest attempt at tracing data provenance is the Polygen model [19], which is more geared to analyzing data quality in distributed heterogeneous data sources. The Polygen model is a relational model that defines a set of operators (e.g. union, Cartesian product, difference, etc.) based on relational algebra that can semantically annotate the propagation of data.

While the QER and Polygen models are expressive and applicable to the database relational domain, they are less applicable to distributed enterprise information systems in general. In such systems the data structure is not necessarily defined in relational algebra and could be more the result of an aggregation of data elements in a process which could in turn also be a result of several processes.

Information Product Maps (IP Maps) [22] accommodate such process-based modeling of data. IP Maps are graphical models that treat data as a product from one or several processes where the input was raw data. This is analogous to the manufacturing process where raw materials are inserted and a finalized product exits the manufacturing line after a series of manipulations and modifications. Such manipulations can be represented in the IP Maps model with the use of specific constructs to represent the process and actions on the data, e.g. processing, quality check, data source or data receiver. Each construct in turn has associated metadata which can be used in a model to specify the construct.

The strength of the IP Maps is the ability to portray the data provenance as well as the components and elements in the process that manipulate the data. The IP Maps model has been extended into an IP UML profile [20] to make use of UML’s richer semantics and to conform to the de-facto standard that UML is.

In the UML model the data units are associated through a quality association with a stereotyped quality dimension class, e.g. timeliness. By using UML the IP UML provides the opportunity to model data units in interaction diagrams, to observe the data flow between object calls and in activity diagrams. Both IP Maps and IP UML lack the ability to perform quantitative analysis of data accuracy in a process. Instead, their function is mostly to visualize data quality problems and requirements and aid software engineers in designing better information systems.

The assessment method that is presented in this paper is similar to the IP Maps and the IP UML model in the sense that it is aimed at the management of Enterprise Information systems by visualizing processes and data quality.

The main contribution of the method presented here is to offer a way to quantitatively assess data accuracy across a business process. Unlike the Polygen model, which also features quantitative data quality assessments using its relational algebra but only on the database level, the method presented here captures the impact on data accuracy from various behavioral elements in a business process. Furthermore, this method incorporates constructs from the ArchiMate language, and is as such a first attempt to use EA models for data accuracy assessments.

IV. AN ABSTRACT MODEL FOR ACCURACY ASSESSMENTS

This section introduces an abstract model with which it is possible to describe how the manipulation of data across a business process gradually deteriorates the data accuracy.

A. Abstract model structure

The abstract model consists of six modeling entities, each with an “Accuracy” attribute. Of the six, three belong to the passive, or informational structure of ArchiMate, and the other three to the behavior structure [6], see Fig. 3.

1) The passive objects.
The “main” passive objects in the abstract model are the Business Objects which are classes of pieces of information that are important to the business. In the Application layer, Business Objects are realized by Data Objects, which are

elements stored in databases. In the business layer, the Business Objects are realized by Representations, which represent regular documents used by the business.

Figure 3: An abstract model for accuracy assessments.

Both Representations and Data Objects model classes of information objects. Both Data Objects and Representations may be composed of other Data Objects or Representations respectively. We will denote the lowest level of abstraction for this modeling as the “atomic” passive objects. ArchiMate is not able to represent the instance level, i.e. the physical instances of the classes that are represented as Data Objects (see Fig. 4).

Although the instances will not be modeled in this paper either, they are crucial for the definition of accuracy. Instantiated Business Objects, i.e. concrete pieces of information, are the “true” pieces of information that constitute the frame of reference against which the accuracy of the other passive components will be judged. An instantiated Data Object is a physical piece of data stored in a database which realizes a Business Object instance. An instantiated Representation is a concrete document realizing a Business Object instance. See Fig. 4 for an example of how instances relate to the ArchiMate model.
2) Accuracy for the passive objects.
On an instance level, “Accuracy” is an attribute of the passive components which can assume the states “Accurate” and “Inaccurate”.

In the abstract model, which deals exclusively with the class level, the “accuracy” of a Data Object or a Representation is the probability that a Data Object or Representation belonging to the class is in the state “Accurate”. This can be estimated through the frequency of errors encountered [22].
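A frequency-based estimate of this kind can be sketched as follows. The exact formula used in [22] is not reproduced in the text above, so the plain relative-frequency estimator below is an assumption, and the function name estimate_accuracy and its parameters are hypothetical.

```python
def estimate_accuracy(n_inspected: int, n_inaccurate: int) -> float:
    """Estimate class-level accuracy as the share of inspected instances
    that were found to be accurate.

    ASSUMPTION: a simple relative-frequency estimator, 1 - errors / total;
    this is an illustration and not necessarily the formula of [22].
    """
    if n_inspected <= 0:
        raise ValueError("at least one instance must be inspected")
    if not 0 <= n_inaccurate <= n_inspected:
        raise ValueError("error count must lie between 0 and n_inspected")
    return 1.0 - n_inaccurate / n_inspected


if __name__ == "__main__":
    # Hypothetical sample: 200 Data Object instances inspected, 2 in error.
    print(estimate_accuracy(200, 2))
```

Such an estimate only approximates the underlying probability, and its reliability depends on how many instances are inspected.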

Figure 4: An example showing how instances relate to the abstract and concrete models and ArchiMate’s view of the world.

3) The behavior components.
The passive components of ArchiMate are read or written by behavioral components. The entities considered by this abstract model are Business Processes, Application Services and the

Application Functions. A Business Process is a set of activities performed by users that transforms input into some kind of output. Business Processes use Application Services. Application Services describe the behavior of an application from an external perspective and are defined as units of functionality that are meaningful to (business) users. Application Services are realized by one or several Application Functions. An Application Function describes a well-defined piece of internal behavior of an application.

All of the behavior components can access the passive components somehow. Business Processes can read or write Representations. Application Functions and Services can read and write Data Objects, and sometimes (in the case of scanning or printing functionality) read and write Representations as well.

4) Accuracy and behavior components.
Behavior components influence the accuracy of the information entities through “write” relations (which are specializations of “access” relations in ArchiMate). Changes in accuracy happen for instance when Application Functions fail to read certain Data Objects or when someone in a Business Process makes a mistake. In order to model this dependency, all of the active components contain an “execution accuracy” attribute.

On an instance level, i.e. for a single execution, behavior components can be either “accurate”, meaning that they transform input data to output data according to what is specified, or “inaccurate”, meaning that the behavior component fails to meet the requirements.

On the class level, i.e. what is shown in this abstract model, the execution accuracy attribute describes the probability that, given correct input data, the Business Process, Application Function or Application Service yields accurate output data. The probabilities are calculated as frequencies analogous with the procedures used for the passive components.

Taking the view that data is consumed and produced in Business Processes we assume that the execution accuracy of a Business Process can be measured by how much output data deviates from the correct value. This is often done within the process management community [24]. The execution accuracy of an Application Function is defined in a similar way, which makes it similar to the “Computational accuracy” of the ISO 9126-2 [17] standard. This measures accuracy in terms of the number of correct executions divided by the total number of executions. “Correct” is here defined according to what the application’s specification considers to be correct given the input data. Application Services are merely bundles of Application Functions, and their accuracy is defined in terms of Application Function accuracy.

All the classes and relations in this abstract model are consistent with the ArchiMate metamodel as presented in [6], except for the access relation between the Application Service and the Representation which has been added here.

B. Calculation framework

As discussed in Section 2, attribute relationships are defined by Bayesian network logic. In addition to this there are several rules for how to perform the accuracy assessments.

To be able to show how the accuracy of data deteriorates in a process it is necessary to show how passive components are transformed by active, behavioral components in a process. We therefore introduce the notion of time in our models. Whenever a passive component at process step t is modified (written to) by a behavior component, a new instance of the passive component will be modeled again at the subsequent process step t+1.

Although this paper represents the passive objects at different discrete process steps as new Bayesian nodes, an alternative approach could use Dynamic Bayesian Networks [25]. Whenever there is a “write” entity relation between an active component such as a Business Process and a passive component such as a Representation, this is accompanied by an attribute relation from the active component’s execution accuracy to the accuracy of the passive component.

As has been stated already, passive components are sometimes composed of other passive components. Whenever the composition relation exists, it is the accuracy of the “atomic” passive components on the lowest level of abstraction which will be transmitted to the next generation of passive components, see Fig. 5.

A passive object on the instance level is accurate if and only if (i) the passive objects it is composed of are all accurate, (ii) the behavior component that writes the passive object is accurate while doing so (a behavior component that performs an inaccurate “write” operation on a passive component makes the passive component inaccurate), and (iii) it was not inaccurate prior to being read and written by either one of the behavior components.
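The three conditions above amount to a logical conjunction. The sketch below, which is only an illustration and not the authors’ implementation, expresses the rule on the instance level with Booleans and on the class level with probabilities. The class-level function assumes that the individual events are independent, so that the conjunction becomes a product. The names instance_accurate and class_accuracy and the sample figures are hypothetical.

```python
from math import prod


def instance_accurate(parts_accurate, writer_accurate, previously_accurate=True):
    """Instance level: a passive object is accurate only if all of its
    component objects are accurate (i), the writing behavior component
    executed accurately (ii), and it was not already inaccurate (iii)."""
    return all(parts_accurate) and writer_accurate and previously_accurate


def class_accuracy(atomic_accuracies, writer_execution_accuracy):
    """Class level: probability that an object at process step t+1 is accurate.

    ASSUMPTION: the accuracies of the atomic parts at step t and the
    writer's execution accuracy are independent, so probabilities multiply.
    """
    return prod(atomic_accuracies) * writer_execution_accuracy


if __name__ == "__main__":
    # One inaccurate part is enough to make the composed object inaccurate.
    print(instance_accurate([True, False], writer_accurate=True))
    # Hypothetical class-level figures: two atomic parts and one writer.
    print(class_accuracy([0.99, 0.98], 0.97))
```

Applying class_accuracy once per process step yields the step-by-step deterioration that the time-indexed nodes are meant to capture.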

Figure 5: Passive, informational components are modeled once for every process step. Notice that the passive components are composed of other passive components and it is only the “atomic” accuracies that are passed on to the next generation of passive components.

Stated more formally using the Object Constraint Language (OCL), the rules of computations are the following for a passive object at process step t:

context … : inv:
if ( … and … and … ) then
…
endif

These “all-or-nothing” rules have implications on the underlying Bayesian network used for the computations of the concrete model. Specifically, the CPM of the accuracy nodes will consistently describe an AND-relation; the node is accurate if and only if its predecessor nodes are accurate as well.

Table 2: A CPT describing an AND relation as used in Figure 5. All predecessor nodes need to be in the "Accurate" state in order for the passive component “XY” to be accurate at process step t=2.

The concrete model models classes of passive objects which contain information about the probability of the objects being accurate. See Table 2 for an example of an AND function. The AND relations can be used on the class levels as well as on the instance levels; an AND CPM can be used to multiply probabilities, which is the same as calculating the intersection of independent events.

V. THE CASE STUDY

The case study was performed at a medium-sized electrical utility in Sweden. The process of interest was the reporting process of power outage information to the authorities. Power outage information is primarily concerned with assessing the quality of service offered by the power grid operator to its customers and uses a set of predefined reliability indices to express this. These indices show the frequency and duration of outages as well as the number of customers affected, and are based on compilations of all known outages within a predefined period of time. Below follows a brief description of the relevant Business Objects followed by a description of the process itself.

A. The information architecture

The information architecture describes the structure of the passive objects in the case study. Every unit of information that is of interest to the business is represented as a Business Object. An Outage Ticket is composed of an Outage Time and data on the affected Customer. Initially, the Outage Time only includes the time the power outage was reported. In order to conduct maintenance to resolve the outage a Work Order is created. The Work Order includes the Outage Ticket as well as the specifics on the Maintenance Task that is required to resolve the outage. When the maintenance has been completed and the power is restored, the Outage Ticket is updated with the time of power restoration, and the Maintenance Task is updated with details concerning

the maintenance that was performed. The information contained in multiple Outage Tickets is subsequently aggregated to the indices that go into the Outage Report.

B. The outage reporting process.

The process (Fig. 6) begins when the Create Work Order Business Process is initiated. This business process uses the Create Work Order Application Service that is one of the services offered by the Maintenance Management System Application Component (Application Components are not part of the abstract model, but are modeled here for easier process comprehension). The Create Work Order Application Service reads an Outage Ticket Data Object (at process step t=1, see Fig. 6) and in turn creates a new Work Order Data Object (process step t=2), containing data from the Outage Ticket as well as new data from the Create Work Order Business Process, specifically the Maintenance Task Data Object (process step t=2).

Once the Work Order has been created, the Resolve Outage Business Process is initiated when a field crew reads a Work Order Data Object (process step t=2) through their Maintenance Field Support Application Service, which is another service offered by the Maintenance Management System. When the outage is resolved, the Resolve Outage Business Process updates the Work Order Data Object (process step t=3) through the services provided by the Maintenance Field Support Application Service. The data updated here is the details on the completion of the Maintenance Task, which is updated in the Maintenance Task Data Object (process step t=3). The Outage Ticket Data Object (process step t=3) is also updated with the time of power restoration.

To be able to generate the final Outage Report there is a need to import Work Order information, and in particular the Outage Tickets, into the Network Information System (NIS), which is a Geographical Information System with extra functionality tailored to display distribution grids on maps, perform network calculations etc. Currently the only method available for importing the work orders to the NIS is by compiling them, printing them and then manually inserting them into the NIS database.

Before printing the work orders, they are aggregated and compiled into a more suitable format. The documents are compiled via the Word File Generator, a collaboration between MS Word and the MMS Application Components, offering the Application Service Generate Work Order Compilation to MS Word.

An Application Service Print, a service assigned to MS Word, prints out the compiled Work Orders Data Object (process step t=4) onto paper. In Fig. 6 this is illustrated as the Compiled Work Order Representation at process step t=5.

The final Business Process Create Outage Report includes the steps of importing Outage Ticket information into the NIS and thereafter generating the Outage Report Data Object (process step t=7). The information on the printed documents is manually typed into the NIS, here represented as a read relation between the Create Outage Report Business Process and the compiled Outage Ticket Representation (process step t=5), which in turn is a part of the compiled Work Orders.

The Create Outage Report Business Process then uses the services provided by the Register Outage Information Application Service which in turn writes the information to the Outage Ticket Data Object (process step t=6). The compiled outage tickets are now available in NIS ready to be used for creating the outage report. The Generation of Outage Report Application Service is then invoked in the NIS to generate the final Outage Report Data Object (process step t=7).
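The process steps described above can be walked through numerically with the AND (product) rule. The sketch below uses the interview estimates quoted in the results of the case study (99% for the input Outage Tickets and for Business Process execution, 100% for Application Services). Which process steps involve a Business Process in the write is not listed step by step in the text; the BP_INVOLVED table is therefore an assumption, chosen to agree with the statement in the results that steps t=5 and t=7 leave the accuracy unchanged. All names in the code are hypothetical, and the calculation is a plain product rather than the GeNIe model used by the authors.

```python
# Interview estimates quoted in the case study results.
INPUT_ACCURACY = 0.99   # Outage Ticket at process step t=1
BP_ACCURACY = 0.99      # Business Process execution accuracy
AS_ACCURACY = 1.00      # Application Service execution accuracy

# ASSUMPTION (not spelled out per step in the paper): True marks the steps
# at which a Business Process takes part in writing the Outage Ticket data.
BP_INVOLVED = {2: True, 3: True, 4: True, 5: False, 6: True, 7: False}


def outage_ticket_accuracy():
    """Return {process step: accuracy} under the AND (product) rule,
    assuming independent errors at each step."""
    accuracy = {1: INPUT_ACCURACY}
    for step in sorted(BP_INVOLVED):
        factor = AS_ACCURACY * (BP_ACCURACY if BP_INVOLVED[step] else 1.0)
        accuracy[step] = accuracy[step - 1] * factor
    return accuracy


if __name__ == "__main__":
    for step, value in outage_ticket_accuracy().items():
        print(f"t={step}: {value:.3f}")
```

Under these assumptions the value at t=7 is 0.99 raised to the fifth power, roughly 0.951, which is in the same range as the figure of about 95% reported for the Outage Report; a different assignment of Business Processes to steps would give a different result.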

Figure 6: The process of generating an outage report, application components are modeled here to indicate the system structure. The accuracy
of the input data and the behavior components are given here, but not the calculated accuracy figures.

C. Results

To perform the accuracy analysis, data was collected on the accuracy of the input Data Objects and on the execution accuracy of the Business Processes and Application Services. The estimates were made by the process participants during interviews and yielded the rough approximation that the Business Process execution accuracy, i.e. the probability that the Business Process performs according to specification, is around 99%, whereas the Application Services were thought to be accurate at all times, i.e. executing correctly 100% of the time. The reason for not having 100% accuracy in the Business Processes had to do with the repetitious work, which often involved manually transcribing data from paper into a computer. The accuracy of the input data to the process, i.e. the Outage Tickets, was estimated to be 99% on average. The error in the input data mainly had to do with erroneous estimations of the Outage Times by customers calling in to report their power outages.

The Bayesian network part of the concrete model and the estimates collected were then inserted into an identical model in GeNIe, a Bayesian network inference engine [26]. The Bayesian network in GeNIe implemented the AND CPTs shown in Section 4. The results from the computations in GeNIe showed the gradual degradation of accuracy. Fig. 7 illustrates this degradation graphically by showing how the various representations of the Business Object Outage Ticket were affected. It can be clearly seen how the initial accuracy of 99% in the input Outage Ticket slowly deteriorates across the Business Process and lowers the accuracy of the Outage Report to a mere 95%. Referring to Fig. 7, it can be seen that the accuracy is not influenced during the time steps t=5 and t=7; this is due to the lack of influence from the inaccurate Business Processes. While data is only processed by applications its accuracy is kept intact, leading to a situation where human error has the largest impact.

Figure 7: The gradual deterioration of accuracy in the Outage Tickets.

This suggests that one way to improve process accuracy could be to integrate the Maintenance Management Systems with NIS, leading to a reduction of manual activities when generating the outage report. Notwithstanding the fact that the estimates of the accuracy of the processes and input data may be wrong, the case study nevertheless illustrates how the method proposed in this paper can be employed to quantitatively analyze data accuracy deterioration across a business process.
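The deterioration reported in the case study can be illustrated with a small calculation. The sketch below is not the GeNIe model used in the paper. It assumes that each step acts as a deterministic AND gate (the output is accurate only if the input was accurate and the step executed correctly) and that steps fail independently, so the marginal accuracy is a simple product. The function name `propagate_accuracy` and the particular sequence of manual and automated steps are hypothetical; only the 99% and 100% estimates come from the interviews described above.

```python
def propagate_accuracy(input_accuracy, step_accuracies):
    """Return the probability that the data is accurate after each step.

    Assumes an AND relation with independent parents: the output of a step
    is accurate only if its input was accurate AND the step executed
    correctly, so the probabilities multiply.
    """
    trace = [input_accuracy]
    for step_accuracy in step_accuracies:
        trace.append(trace[-1] * step_accuracy)
    return trace


MANUAL = 0.99     # estimated Business Process execution accuracy
AUTOMATED = 1.0   # Application Services assumed always correct

# Hypothetical step sequence (an assumption, not the paper's exact model):
# four manual steps and two application-only steps.
steps = [MANUAL, MANUAL, AUTOMATED, MANUAL, MANUAL, AUTOMATED]
trace = propagate_accuracy(0.99, steps)

for t, accuracy in enumerate(trace):
    print(f"t={t}: {accuracy:.4f}")
```

Under these assumptions the automated steps leave the accuracy unchanged, while each manual step multiplies it by 0.99, so an input accuracy of 99% followed by four manual steps ends near the 95% level reported for the Outage Report.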

VI. CONCLUSIONS AND FUTURE WORK

In this paper a Bayesian network-based architecture analysis method is proposed for the assessment of data accuracy. The method caters for information and data transformation as a result of modification by software or by business processes. Use of Bayesian networks makes it possible to trace data provenance and observe data deterioration in a quantitative manner. The adoption of the ArchiMate language makes it possible to analyse the accuracy of enterprise system data and illustrate the impact of the accuracy of the business processes and software functions on this data.
Future work includes making more use of the Dynamic Bayesian Network formalism, collecting more empirical data to validate the method, and incorporating other data quality dimensions into the assessment framework.

VII. REFERENCES

[1] J.A. Zachman, “A Framework for Information Systems Architecture”, IBM Systems Journal, IBM, vol 26(3), 1987, pp. 454-470.
[2] The Open Group, TOGAF 2007 edition. Zaltbommel, Netherlands: Van Haren Publishing, 2008.
[3] Department of Defense Architecture Framework Working Group, DoD Architecture Framework Version 1.5, Department of Defense, USA, 2007.
[4] P. Johnson, R. Lagerström, P. Närman and M. Simonsson, “Enterprise architecture analysis with extended influence diagrams”, Information System Frontiers Vol 9, No 2-3, Springer, Netherlands, July 2007, pp. 163-180.
[5] P. Johnson, M. Ekstedt, Enterprise Architecture – Models and Analyses for Information Systems Decision Making, Studentlitteratur, Lund, 2007.
[6] H. Jonkers, Architecture Language Reference Manual v 4.1, Telematica Instituut / Archimate Consortium, the Netherlands, 2006.
[7] P. Johnson, E. Johansson, T. Sommestad and J. Ullberg, “A Tool for Enterprise Architecture Analysis”, Proceedings of the 11th IEEE International Enterprise Distributed Object Computing Conference (EDOC), Annapolis, Oct. 2007.
[8] P. Johnson, R. Lagerström, P. Närman and M. Simonsson, “Extended Influence Diagrams for System Quality Analysis”, Journal of Software Vol 2, No 3, Academy publisher, September 2007, pp. 30-42.
[9] OMG, “Unified Modeling Language”, Version 2.2, 2009, URL: http://www.omg.org/technology/documents/formal/uml.htm.
[10] N. Friedman, M. Linial, I. Nachman, and D. Pe’er, Using Bayesian Networks to Analyze Expression Data, Journal of Computational Biology, Vol. 7, No 3-4, Mary Ann Liebert, Inc., New Rochelle, NY, 2000, pp. 600-620.
[11] R. Neapolitan, Learning Bayesian Networks, Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 2003.
[12] F. V. Jensen, Bayesian Networks and Decision Graphs, Springer New York, Secaucus, NJ, USA, 2001.
[13] R. Shachter, Probabilistic inference and influence diagrams, Operations Research, Vol 36, No 4, 1988, pp. 36-40.
[14] J. Pearl, Probabilistic Reasoning in Intelligent Systems, San Mateo: Morgan Kaufmann, 1988.
[15] M. Lankhorst et al., Enterprise Architecture At Work, Springer Verlag, Heidelberg, 2005.
[16] P. Närman et al., “Using Enterprise Architecture Models for System Quality Analysis”, Proceedings of the 12th IEEE International Enterprise Distributed Object Computing Conference (EDOC), Munich, September 2008.
[17] ISO/IEC 9126-2, Software Engineering – Product Quality – Part 2: External Metrics, International standard, ISO, Genève, Switzerland, 2001.
[18] C. Batini, M. Scannapieco, Data Quality: Concepts, Methodologies and Techniques, Springer-Verlag, 2006.
[19] R.Y. Wang, M. Ziad, Y.W. Lee, Data Quality, Kluwer Academic Publishers, Boston, 2001.
[20] R. Wang (Editor), Information Quality, M.E. Sharpe Inc., 2006.
[21] J.E. Olson, Data Quality: The Accuracy Dimension, Morgan Kaufmann Publishers, San Francisco, 2003.
[22] Y. Lee, L. Pipino, J. Funk, R. Wang, “Journey to Data Quality”, MIT Press, 2006.
[23] T.C. Redman, Data Quality for the Information Age, Artech House, Boston, 1996.
[24] G. Brue, R.G. Launsby, “Design for Six Sigma”, McGraw Hill, 2003.
[25] K.P. Murphy, Dynamic Bayesian Networks: Representation, Inference and Learning, PhD Thesis, University of California, Berkeley, 2002.
[26] The Graphical Network Interface (GeNIe), Decision Systems Laboratory, The University of Pittsburgh. http://dsl.sis.pitt.edu. 11th February 2009.
