Conference Paper · January 1994




An Overview of Repository Technology

Philip A. Bernstein¹ Umeshwar Dayal²


Digital Equipment Corp. Hewlett-Packard Labs

Abstract

A repository is a shared database of information about engineered artifacts. We define a repository manager to be a database application that supports checkout/checkin, version and configuration management, notification, context management, and workflow control. Since the main value of a repository is in the tools that use it, we discuss technical issues of integrating tools with repositories. We also discuss how to implement a repository manager by layering it on a DBMS, focusing especially on issues of programming interface, performance, distribution, and interoperability.

1 Introduction

Metadata management is a growing part of the database business, driven by many trends in information technology. For example,

• Business process re-engineering has gotten the attention of executives of most large enterprises. This leads to the development of large process models and information models, which are metadata and need to be managed.

• Information technology departments are buying computer-aided software engineering (CASE) tools piecemeal, and are now finding it difficult to merge the design data (metadata) developed with each of them.

• Large enterprises are deploying “data warehouses” to cache databases in user-oriented form. Tracking this data and where it comes from is a metadata problem.

• Diverse database types, managed by heterogeneous database systems (DBMSs), some not managed in a DBMS at all, are being widely deployed. Their descriptions need to be managed in an integrated way.

• Users want object-oriented (OO) application development. Many of the objects produced by OO development are metadata, such as interface descriptions and class hierarchies.

• Manufacturing enterprises are alarmed by the high cost of information technology to introduce new products, most of it just translating information between different formats. They need to manage these data formats and the data they describe, so that all their information management, computer-aided design (CAD), and product data management tools can share on-line databases.

The technical requirement to support the above activities is not just a fancier catalog for the customer’s database. Nor is it just introducing an object-oriented DBMS in place of a relational one. Rather, it’s implementing a layer of control services on top of the DBMS, called a repository manager, and integrating it with many tools. The result of this integration is a framework for metadata management, called a repository system.

There is a small but rapidly-growing market for repository systems. This market is led by vendors of data dictionaries, CASE tools and CAD tools. Database researchers have had little influence, except indirectly via their OODB work. Conversely, the vendors have made little use of what has been learned about OODBMSs. This paper attempts to bridge this gap.

We propose a definition of repository (Section 2). We then discuss what functions a repository manager must support (Section 3), how to integrate tools with a repository (Section 4), how to layer a repository system on a DBMS (Section 5), and what technical problems need to be addressed for this technology to move forward (Section 6).

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Proceedings of the 20th VLDB Conference
Santiago, Chile

1 Current address: Microsoft Corp., One Microsoft Way, Redmond, WA 98052-6399. philbe@microsoft.com
2 Address: Hewlett-Packard Labs, 1501 Page Mill Road (1U-4), P.O. Box 10490, Palo Alto, CA 94303-0969. dayal@hplabs.hp.com

2 What’s a Repository?

A repository is a shared database of information about engineered artifacts produced or used by an enterprise. Examples of such artifacts include software, documents, maps, information systems, and discrete manufactured components and systems (e.g., electronic circuits, airplanes, automobiles, industrial plants). The fact that artifacts are “engineered” leads to a variety of requirements for database structuring and access, which we describe here and in Section 3.

Over the lifecycle of engineered artifacts, many objects of many different types are defined, created, manipulated, and managed by a variety of tools that need to share data. For example, software engineers use design tools, language editors, compilers, builders, and debuggers to create and test programs. Project managers use planning, tracking, financial analysis, and reporting tools to create and manage project plans, spreadsheets, and reports. Technical writers use text and graphics editors, hypertext and document management tools to produce, manage, and manipulate compound documents. System managers use monitoring, reporting, diagnosis, and configuration tools to monitor and reconfigure system components.

The objects themselves may be stored in a variety of storage systems, such as file systems, database systems, or hardcopy filing cabinets. Descriptions of these objects are stored in the repository. In addition, the repository may store information about an object’s location, its revision history, the tools and processes that were used to build it, constraints that it satisfies, who is authorized to access or modify it, who is responsible for managing it, and its dependencies on other objects.

Storing this information in a common repository has several benefits. First, since the repository provides storage services, tool developers need not create tool-specific databases of the objects that tools create or use.

Second, a common repository allows tools to share information so they can work together. For example, the repository can store meta-metadata (such as allowable data formats for record and field definitions), metadata (such as specific record and field definitions), and data itself, which may be shared by tools such as compilers and debuggers, database query processors, forms managers, and report writers. Without a common repository, special protocols would be needed for exchanging information between tools. By conforming to a common data model (i.e., allowable data formats) and information model (i.e., schema expressed in the data model), tools can share data and metadata without being knowledgeable about the internals of other tools. In this respect, a repository is like a data dictionary. However, while data dictionaries typically only store metadata (database schemas, record and field definitions), a repository can store information about the whole range of object types pertinent to an enterprise.

Third, the information in the repository is subject to common control services, which makes sets of tools easier to use. Since a repository is a database, it is subject to database controls, such as integrity, concurrency, and access control. However, in addition, a repository system provides checkout/checkin, version and configuration control, notification, context management, and workflow. These services are described in the next section. Of course, even in the absence of tools sharing data, a single tool can benefit from these repository services, though one would probably not invest in building an independent set of complete and robust repository services unless the investment were amortized over many tools that would use the services (even if they won’t share via the repository).

By promoting data sharing through common data and information models, and imposing a common set of control services, a repository is the centerpiece of an integrated environment in which a dynamic collection of tools can work together.

3 Repository Manager Functions

3.1 Introduction

A repository manager provides services for modeling, retrieving, and managing the objects in a repository. It typically uses one or more storage managers to store the objects. For example, a file object (e.g., a document, circuit diagram, or source code program) may be stored in a file system, while descriptive attributes of the file object (e.g., how and when it was created, who owns it, where to find it, lists of related objects) may be stored in a DBMS. When a user asks to retrieve an object, the repository manager looks up the object’s location attribute and then copies the object into the user’s workspace, such as a file directory or a private database.

A repository manager should provide the standard amenities of a DBMS: a data model (to structure a repository), queries (to browse a repository), views (to enhance data independence of tools that access a repository), integrity control (to trap integrity violations), access control (for secure access), and transactions (for atomic multi-statement updates). Managing metadata requires additional services beyond those of a conventional DBMS. These services are the main added value of a repository manager: checkout/checkin, version control, configuration control, notification, context management, and workflow control. We define a repository manager to be a data manager that offers these functions. Most repository manager products offer these functions, though not always in their full generality. We describe these functions below.
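
As a rough illustration of the retrieval path described in Section 3.1, the following Python sketch keeps descriptive attributes in a DBMS and the file objects themselves in a file system. The class name `RepositoryManager`, the `objects` table, and the use of SQLite and temporary directories are assumptions made for this example only; they do not describe any particular repository product.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


class RepositoryManager:
    """Toy repository manager: descriptions in a DBMS, file objects in a file system."""

    def __init__(self, store_dir: Path):
        self.store_dir = store_dir               # storage manager for file objects
        self.db = sqlite3.connect(":memory:")    # stand-in DBMS for descriptive attributes
        self.db.execute(
            "CREATE TABLE objects (name TEXT PRIMARY KEY, owner TEXT, location TEXT)"
        )

    def register(self, name: str, owner: str, content: str) -> None:
        """Store the file object, then record its descriptive attributes."""
        path = self.store_dir / name
        path.write_text(content)
        with self.db:  # commits the insert on success
            self.db.execute(
                "INSERT INTO objects VALUES (?, ?, ?)", (name, owner, str(path))
            )

    def retrieve(self, name: str, workspace: Path) -> Path:
        """Look up the location attribute, then copy the object into the workspace."""
        row = self.db.execute(
            "SELECT location FROM objects WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(name)
        target = workspace / name
        shutil.copy(row[0], target)
        return target


def demo() -> str:
    """Register one object and read back the copy placed in a private workspace."""
    with tempfile.TemporaryDirectory() as store, tempfile.TemporaryDirectory() as ws:
        repo = RepositoryManager(Path(store))
        repo.register("design.txt", "alice", "circuit diagram v1")
        copy = repo.retrieve("design.txt", Path(ws))
        return copy.read_text()
```

The sketch deliberately omits the control services (checkout locks, versions, notification) that the following sections describe; it only shows the split between the storage manager and the descriptive attributes.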

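
Section 3.1 lists a data model among the amenities a repository manager should provide, and Section 2 distinguishes meta-metadata, metadata, and data. A minimal Python sketch of those three levels might look as follows. The names `ALLOWED_FIELD_FORMATS`, `define_record`, and `conforms`, and the `Employee` record, are hypothetical and chosen only for illustration.

```python
# Meta-metadata: the data model, i.e. which formats a field definition may use.
ALLOWED_FIELD_FORMATS = {"int": int, "str": str, "float": float}


def define_record(name, fields):
    """Metadata: build a record definition, checked against the meta-metadata."""
    for field, fmt in fields.items():
        if fmt not in ALLOWED_FIELD_FORMATS:
            raise ValueError(f"{name}.{field}: unknown format {fmt!r}")
    return {"name": name, "fields": dict(fields)}


def conforms(record_def, instance):
    """Data: check one instance against its record definition."""
    fields = record_def["fields"]
    if set(instance) != set(fields):
        return False
    return all(
        isinstance(instance[field], ALLOWED_FIELD_FORMATS[fmt])
        for field, fmt in fields.items()
    )


# A specific record definition (metadata) that several tools could share.
EMPLOYEE = define_record("Employee", {"id": "int", "name": "str"})
```

Under these assumptions, any tool that conforms to the same data model can validate or interpret an `Employee` instance without knowing which other tool produced it.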
3.2 Checkout/checkin

Activities that use a repository can be of long duration, lasting days or months. Treating the entire activity as one transaction is impractical. For example, a system crash during such an activity could lose days of work.

Therefore, the repository manager must support checkout and checkin of objects. The checkout operation copies the object from the shared repository into the user’s private workspace. After working on the object, the user issues a checkin operation, which copies the object from the private workspace into the shared repository. Checkout and checkin execute as (separate) short transactions. Essentially, checkout sets a persistent lock on the object, which is released by checkin. Checkout should support shared and exclusive modes.

3.3 Version control

Repository objects typically undergo a series of revisions, which the repository manager represents as versions. A version is a semantically meaningful snapshot of an object at some point in its lifecycle. The repository manager maintains each object’s version history. A version history is a directed graph with one node per version and an edge from version A to version B if B was derived from A. Features of a version model should include [7]:

• A mechanism for representing versions as objects in the repository.

• A version naming mechanism, preferably both with default and user-supplied names.

• An operation for deriving a new version from an old one. The derivation operation specifies whether the properties and behavior of the new version are copied from the old version or are modified in some way.

• Constraints on the version history. Some models restrict the version history to a single path, but this is too restrictive for most applications.

• A mechanism for identifying a particular version to be used in a given context.

• The semantics of checkout and checkin of versioned objects. Checkout in exclusive mode might automatically create a new version, while checkout in shared mode does not. On checkin, the model might allow multiple branches, or might require that concurrently created siblings be merged.

• A semantics for relationships between versioned objects. A new version of an object may inherit relationships that were attached to the previous version of the object.

• An operation for declaring that two or more independently-developed versions merge into one version. The operation must specify how the properties and behavior of the new version are derived from those of the versions being merged (inherit from only one of them, or perform some semantically meaningful merge). Some version models distinguish one path in the version history as the “main” line of descent; other versions are variants which are merged back into the main line from time to time. While this main-line concept is sometimes useful, a general version model is needed that supports true alternatives that may never be merged.

3.4 Configuration control

Some objects in a repository are hierarchical collections of other objects, called composite objects. Both a composite object and its components may be versioned. A configuration is a binding between a version of a composite object and a version of each of its (versioned) components. Not all the components of a configuration need to be versioned, e.g., the authors in a versioned author list. Features of a configuration control model should include:

• A mechanism for representing a configuration.

• A mechanism for identifying a configuration’s component versions, either explicitly by name or implicitly by context (e.g., John’s current configuration of A includes the last checked-in version of B and the last version of C created by John).

• Operations for attaching component versions to configurations and for detaching them.

• Mechanisms to define and enforce constraints on configurations. One should avoid fixed constraints on all configurations. E.g., a model might require that a configuration contain only one version of each component. This is useful for software systems, where different versions of a component could interfere with each other. But it’s inconvenient for complex vehicles, which might contain more than one version of a part, especially after being repaired a few times.

• The semantics of change propagation in configurations. If one version of a component is replaced by another version, does that automatically result in a new configuration? If so, then when a versioned object is checked out, its composite objects must also be checked out, reducing concurrency. It can recursively proliferate configurations up the composite object hierarchy; e.g., replacing a version of a screw would cause a new version of a vehicle to be created. Not all applications want this behavior, so the model should allow the configuration’s definer to specify the desired semantics for each component: create a new version, do not create a new version, or perform
some other action (e.g., notify a user or invoke a tool).

3.5 Notification

Many objects in a repository are interrelated. When an operation is applied to one object, operations may need to be applied to related objects. For example, when a source module is changed, rebuild dependent object modules. When one representation of a design object is changed, update the other representations. When a checkout request is received for an already-checked-out object, grant the request but notify the concurrent users of the object.

These are all examples of notification, where the repository manager implements rules of the form “When event E occurs, if condition C holds, then perform action A.” The rules for change propagation, version control, and configuration control might be “hardwired” into a repository manager. However, a general facility for users to define and implement rules would be more flexible. It would allow customization of version and configuration control. It would support integrity control, access control, and propagation of operations to related objects. It would also enable the definition of enterprise-specific, domain-specific, or application-specific policies. For example, release control might define the rule “When the last signature on the approval list is obtained, change the status of the object to ‘Released’.”

3.6 Context management

A context defines a view of the objects in the repository. It is typically used to define the set of objects that an engineer is manipulating for a particular task. It may also include user preferences (e.g., language, editor, display), and specific rules and constraints to be enforced. One should be able to put arbitrary objects in a context, not just files. Thus, a file directory is often an inadequate mechanism for context management.

To carry out a task, a user opens a context and then performs operations on objects visible in this context. When the task is finished, the user closes the context. A user can have many contexts open at a time (e.g., an edit context and a compile context). Also, contexts can remain open for a long time, and therefore should be persistent. This allows a user to leave a task for a while, and the same or a different user to pick it up later.

3.7 Workflow control

Engineered artifacts progress through phases of a lifecycle, such as requirements, specification, design, analysis, production, testing, and release. A repository manager should support a workflow control model to track an object’s state relative to its lifecycle. A promote or demote operation changes the state of the object. It can either be invoked manually by a user or automatically by a notification rule (e.g., because a test ran successfully).

A more general workflow control model would provide primitives for describing a long duration activity as a collection of steps, the flow of control and data among the steps, steps for handling exceptions, etc., and a controller to drive executions of activities. Such a model is of general value, not just to the design of engineered artifacts, so it should be available separate from a repository manager. However, it may use a repository to hold its metadata.

4 Tool Integration

Some users just want a repository for its data modeling capability, for example, to describe scientific data sets. However, most users don’t really want just a repository per se. Rather, they want tools, and the repository makes those tools more valuable in some way.

Even when only one tool is of interest, users often want (some of) the control services that a repository manager provides. They’ll accept the database (i.e., repository) that comes along with it, if it’s explained why they need it: to model objects and collections for configuration management, to model relationships for notification services, etc.

Most users quickly graduate to an environment where multiple tools are used. For example, they might want toolsets for business process modeling, application design and analysis, DB design and analysis, application programming, application reengineering, product release management, or computer systems management. At this point, they want their toolset integrated. There are a number of dimensions in which to integrate a toolset. One critical dimension is to have the tools share data. Data sharing among tools requires a repository.

A minimal level of repository integration is tool invocation. A tool is defined in the repository by its interface and invocation method, so the repository manager can invoke it and pass it objects of the correct types. When invoking the tool, the repository manager can exercise some control, such as triggering notifications or updating workflow state. At this level of integration, though tools are known to the repository, they may not be able to share data.

One way for tools to share data is via data exchange. Data exported by one tool is translated into the import format of another tool. With n tools, you need about n² translators (one for each ordered pair of tools). One can do better by defining a canonical format for data translation, and building two translators for each tool, to translate the tool’s export format to canonical form and to translate canonical form into the tool’s import format. This reduces the problem to 2n translators. The main technical problems are to have a sufficiently rich data model to represent all shared data, a complete and tool-neutral information model for
objects that tools want to share, and translators that properly interpret the semantics of the data they translate. Some examples of this data exchange switch approach are the Express data interchange standard for CAD data translation [12], and the Software One Exchange product that moves data between CASE tools. This is a state-of-the-practice solution in many fields [?].

Although a data exchange switch can help tools share data, it is not a database system. By using a repository (i.e., database) to store data being shared by tools, one gets some additional benefits:

• you know where to look for objects. They’re in the repository, not scattered around in files that are independently maintained by different tools.

• you only have one copy of each shared object, thereby avoiding inconsistencies between copies of the same object managed by different tools.

• you don’t lose information moving from one tool to another. Even data that can’t be represented in the canonical model can at least be stored in the repository in tool-specific format and not simply lost during data exchange.

• you control all shared objects the same way, with the same version model, configuration model, etc.

• you can incrementally update shared data. By comparison, data exchange is, by its nature, a batch process.

• you can query the data, e.g., to find the revision history of an object or all dependent objects of an object.

Integrating a tool with a repository can be the same as integrating it with a data exchange switch: write translators that move data from tool export format to repository format and from repository format to tool import format. As with a data exchange switch, this requires that the repository have a rich canonical model that can represent all shared data and that the translators properly interpret the semantics of shared data. This approach works well when a tool checks out all the data it needs before executing, and checks in all the data it used when it finishes.

Some tools need to access data interactively during their execution (for example, a system builder, such as “make”). The batch translator approach, which imports and exports tools’ data, doesn’t work well in this case. One could modify each tool to directly access individual repository objects, as needed. However, this requires extensive modification of each tool, to access data in the repository’s format using the repository manager’s interface, which is quite expensive. Also, it results in a tool that is completely dependent on the repository manager, which limits the tool vendor’s market to customers that use this repository manager.

Instead, one can leave the tool unchanged and use a virtual repository interface, which traps all of the tool’s data accesses (using whatever data access interface the tool depends on) and translates them into accesses of repository objects. This is a common way to integrate unmodified UNIX tools with a repository; all UNIX file operations are trapped and directed to the repository [8].

Since tools are what give value to a repository, it should be cheap and easy to integrate a tool with a repository, so many tools can be integrated. The state-of-the-art here is not very good and would benefit from some serious research attention. Today, integrating a tool with a repository is an art, requiring protracted technical negotiation between the tool vendors who want to share data and the repository vendor.

Tool integration work should be reusable, so a tool vendor’s integration effort is portable to many repository managers. This requires that all repositories support the same application programming interface (API) and canonical information model. In practice, this is hard to do, since international standards are incomplete or immature and there is no dominant vendor dictating a standard. Still, there have been some successes. For example, in the discrete manufacturing area, there is a canonical information model (STEP [11]) which is written in the standard data modeling language (EXPRESS [13]), but the API for accessing such models (SDAI [14]) is immature and not yet widely supported. In the CASE area, there is growing support for the CASE Data Interchange Format (CDIF), which is an information model for CASE data, but this model is unrelated to the Portable Common Tools Environment (PCTE) API standard for CASE frameworks [10]. Moreover, PCTE is principally oriented toward version and configuration control of coarse-grained objects, and is unsuitable for fine-grained data sharing. Competing CASE APIs are also being considered. For example, ATIS (A Tool Integration Standard) is being discussed in ANSI [5]. ATIS is an extensible object-oriented information model covering the basic repository functions described in Section 3 plus an object-at-a-time API to access repository objects. The existing ISO standard for information resources is based on SQL, but seems to have had little commercial impact³. Presented with such a confusing picture, many tool vendors are delaying their investment in repository integration until the market stabilizes. Or, they have implemented their own information model which at least ensures their own tools can interoperate (e.g., Texas Instruments’ Information Engineering Facility (IEF)).

3 Incidentally, all of the API standards mentioned above (SDAI, PCTE, ATIS) are throwbacks to a CODASYL-like model. They offer record-at-a-time access with minimal content-based retrieval and no attempt at SQL compatibility. They would benefit from attention by DBMS language experts.

There’s a temptation to have the repository be the tool’s native storage manager. While this is feasible in principle, it is often undesirable, because high performance may require that the tool use its private repository format. Most tools manipulate objects in a “main memory database” which they construct after checking out the required files. A repository manager is inherently slower, in many cases too slow for interactive use.

Even if a tool T uses a repository manager as its storage manager, integrating T is still an issue with tools that don’t use this repository manager. To integrate T with tool U, either U must integrate with T’s repository manager or T’s and U’s repository managers must interoperate (see Section 5.5).

5 Repository Manager Implementation

A repository manager is an application layered on a DBMS. In this section we discuss some of the issues involved in implementing this application.

5.1 Repository API

The API requirements for a repository manager are essentially the same as those for DBMSs in support of CAD. These requirements are met well by OO APIs [1], which allow you to:

• construct types for objects that support repository control functions, such as versions, version histories, configurations, contexts, and rules.

• construct new atomic and complex types that represent objects within their domain. You can also construct bulk types of these objects, such as tuples, sets, sequences, and lists.

• represent relationships in a flexible and natural way.

• use an inheritance hierarchy to share type information across multiple types.

• incorporate type-specific operations that encapsulate tools (e.g., edit a document, approve a design).

• navigate among objects, an object-at-a-time.

• dynamically construct new objects.

Queries over collections of objects in the repository are needed in addition to object-at-a-time navigation. Highly functional query languages are starting to be developed for OODBMSs [2], but they are not yet ubiquitous in OODBMS products.

Entity-relationship APIs support the data modeling requirements of repositories, but they don’t allow you to add new operations. SQL has traditionally not supported many of the above capabilities. However, with the addition of abstract data type facilities in many relational DBMSs, there may soon be little difference in functional capability between OO APIs and SQL beyond syntactic sugar [3,9].

5.2 Database Engine Support

A repository manager is an application layered on a database engine, which could be a file system, relational DBMS (RDBMS), or OODBMS.

Most repository managers use a file system to store coarse-grained objects, such as program source, text, and diagrams. Some repository managers also use files to store objects that support their control functions, such as versions and relationships. For example, classical CASE repositories operate this way, such as CMS, SCCS, rcs, MMS, and make. Although file systems are light on functionality compared to DBMSs, they do offer two advantages: they’re ubiquitous, so by relying on a file system, a repository manager can easily be ported to many operating systems; and they offer excellent performance for sequential access.

Some repository managers are implemented on a combination of RDBMS and file system. The RDBMS tables store descriptions of objects, and files store objects themselves. This allows one to use unmodified tools that use files for object storage, while for description data one gets the benefits of DBMS amenities, such as transactions, referential integrity, and queries. However, there are some problems with this approach: Updates to descriptions of objects and objects themselves cannot be grouped in the same transaction (because file systems don’t support transactions). Administration of objects and descriptions is hard to coordinate (e.g., to coordinate backup and recovery so that objects and descriptions can always be recovered to a mutually consistent state). And one cannot build indices on objects and execute queries on objects (unless one replicates some of each object in its RDBMS description, which creates potential consistency problems).

The first two problems can be solved by storing objects in the RDBMS as binary large objects (BLOBs). The last requires that objects be decomposed into pieces which are stored separately in the RDBMS. But this is hard to do with conventional RDBMSs, which require objects to be laid out in rigid table structures.

Ideally, a repository manager would map its “object base” into labeled directed graphs, where objects are mapped to nodes and relationships are mapped to edges. Operations include object-at-a-time navigation, following paths of objects, and taking transitive closures of subgraphs. A growing number of RDBMSs are supporting graph-type databases via OO features, such as user-defined data types, type constructors for complex user-defined types (such as records and arrays),
and extensions to SQL for transitive-closure-type operations. The SQL standard is also evolving to support these features [9].

OODBMSs are a promising target for repository manager implementation, because they can directly implement graphs and graph operations. They have rich facilities for user-defined types and type constructors. They can lay out complex objects in contiguous memory instead of splitting them into different tables. They can execute type-specific methods. They are optimized for navigational operations; in some products, edge traversal in the database graph only costs a pointer dereference, either by directly mapping database pages into main memory or by “swizzling” (mapping) disk pointers into memory pointers when objects move to memory. Also, since most OODBMSs target the CASE and CAD markets, they provide limited forms of the repository control functions (typically checkout/checkin, versions, and configurations).

However, most OODBMSs are less mature than RDBMSs. Many provide limited transaction facilities (e.g., medium- or coarse-grained locking), limited support for queries, views, constraints, and triggers, and weak subsets of SQL with limited query optimization.

In summary, both RDBMSs with OO features and the best OODBMSs are promising targets for repository manager implementation. Since both types of DBMS products are immature and since experience in building repository managers on such types of products is limited, it is too soon to say which type will dominate in the long term.

5.3 Performance

A critical problem of today’s repository managers is poor performance. A checkout or checkin operation on a complex design object can take tens of minutes, for example, to traverse a large object base to find the relevant versions of objects in a large configuration. A design tool that accesses large parts of a repository can take hours. Users find this barely acceptable. They often work around the problem by storing a large set of objects as one large object whose internal structure isn’t visible to the repository manager and which is read and written as a unit. Of course, this reduces the value of many repository control services, since they are unable to help manage the fine-grained components of a design.

One aspect of repository management performance is cache management. Most RDBMSs implement a record (i.e., object) server, and the client process (in this case the repository manager) has an object cache that is invalidated on every transaction commit. Thus, each object access costs a client-server message, except when the server sends multiple useful objects in bulk (e.g., for set-oriented access) or when the client’s transaction accesses the same object multiple times. Most RDBMSs support “stored procedures,” which are application procedures that run in the object server. One client call to a stored procedure can result in many object accesses, thereby reducing client-server traffic.

Most OODBMSs implement a page server and the client process (the repository manager) has a page cache. Since a page access brings many objects to the client, if the client accesses many objects on a page (e.g., navigating an object-at-a-time), client-server traffic is lower than with an object server. Moreover, in some OODBMSs, the client cache is not invalidated on every transaction commit. Thus, when the client runs many transactions, it can build up a useful cache that’s continually reused, further reducing client-server traffic. This gives OODBMSs a performance advantage, at the cost of protection between the repository manager’s and OODBMS’s address space.

Many customers insist that the repository manager be layered on the same DBMS they use for other purposes, e.g., for easier administration and training. This means the repository manager must be portable across DBMSs, which makes it difficult to get the full performance benefit from certain DBMSs. We predict that successful repository manager vendors will attain this portability with high performance, but it entails much engineering expense and therefore isn’t around the corner.

OODB benchmarks are probably representative of how repository managers use a DBMS [6], but we know of no published workload analyses to substantiate this intuition. Even less is known of how a repository manager’s use of a DBMS affects performance. This area would benefit greatly from systematic study.

5.4 Distribution Issues

A repository manager may offer transparent access to distributed data. That is, a client application may issue a location-transparent access, which the repository manager translates to a local access on the appropriate repository manager server. If the repository manager is implemented on a DBMS that supports transparent distribution, then it can trivially rely on the DBMS’s capability. Otherwise, it must implement the capability itself.

In a distributed repository, objects in one repository may reference objects in other repositories. Since each repository manager needs the flexibility to move and delete its own objects, these references should be logical, not physical, and certain update operations need to check the integrity of these references. For example, if it is illegal to delete an object that participates in a relationship, then a delete needs to check the validity of relationships with objects in other repository managers.

Another form of integrity involves distributed transactions. If an update transaction accesses two or more repository managers, then those repository

managers must use two-phase commit to ensure that the transaction is all-or-nothing. If the underlying DBMSs do not support this capability, then the repository manager needs a private workaround.

In a design environment such as CASE or CAD, users normally operate on repository data for long periods. Even if they use data managed by a remote repository manager, they probably need a private (replicated) copy of that data to get adequate performance. Fully symmetric data replication is beyond the state-of-the-art of distributed DBMSs. Therefore, one needs to exploit application-specific behavior to implement a replication scheme. For example, if versions are immutable, and sharing over short periods is rare, then copies of versions of popular objects can be distributed to all servers, say overnight, so they can be read locally on demand [15]. As another example, a remote checkout operation with intent to update may create a new locked version in the remote and local repository. On checkin, the local copy is written to the remote repository and deleted from the local repository. This replication technique is used in Digital’s CDD/Repository [4].

5.5 Interoperation

To share information, heterogeneous repository managers have to interoperate. That is, object types in one repository’s information model or data model must be accessible as object types in the other’s model. This requires mapping operations on one repository manager into operations on the other (see fig. 1).

[Figure 1: Mapping Operations Between Repositories. Boxes: Application Program, Repository Manager R1, Mapper, Repository Manager R2. Arrow labels: “Repository Operations in R1’s Language” and “Repository Operations in R2’s Language”.]

Ideally, each object exists in only one repository, with references to it in other repositories that share the object. This way, each object is only updated in one place. If replicas of an object exist in heterogeneous repositories, then updates must be propagated to all replicas. Semantic differences can make this hard to automate. Often, even a manual solution is hard, i.e., writing a repository-specific and information-model-specific program for cross-posting updates between two repositories. For example, if a versioned record definition in a CASE repository may be shared with an RDBMS catalog, how do you translate an update in the CASE repository that creates a new version into an RDBMS catalog update? What if they don’t support the same data types? What if a relationship is many-to-one in one repository but many-to-many in another? Etc.

We believe it is unavoidable that many tools will, for the foreseeable future, have replicated heterogeneous repositories, for the following reasons:

Many existing tools have already committed to a private repository implementation. E.g., database systems. These repositories are already well-tuned to the tool’s performance requirements.

Many tools need to be portable across operating systems. Therefore, they can only depend on a repository manager that runs on those operating systems and there are few such products on the market.

In an object-oriented world, some objects will be designed to maintain some state that describes the object. It will be some time before repository technology is so mature that all objects will entrust all their state to a shared repository manager.

Thus, the problem of maintaining consistent heterogeneous repositories must be faced. Or we will have to wait for a repository technology to dominate the product world and for tools to be written or re-written to use that technology.

6 Conclusion

We have argued that repository systems are an important type of database application and a worthwhile area of study. We proposed definitions for “repository” and “repository manager.” We discussed approaches to repository tool integration. And we discussed repository manager implementation issues.

We believe repositories are a field that would benefit by more intense study by the database research community. Some specific areas that warrant attention are tool integration techniques, repository manager performance, a completely general model of versions and configurations, interoperability of heterogeneous repositories and repository managers, comparisons of commercial products, and case studies of using repositories in different tool domains.

7 Acknowledgments

Most of what we know about repository systems we learned from dozens of engineers at Digital who work on repository products and strategy. We especially thank Jonathan Bauer, Jim Gray, Ken Moore (now at Iris Software), Chip Nylander, Neil Schutzman, Al Simons, and Melissa Waldie.
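
As an illustration of the all-or-nothing requirement discussed in Section 5.4, the following is a minimal sketch of a two-phase commit coordinator over several repository managers. It is not drawn from any product. The class RepositoryManager, its prepare/commit/abort methods, and the can_commit flag are hypothetical names introduced here, and real concerns such as timeouts, logging, and coordinator failure are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class RepositoryManager:
    """Hypothetical participant; all names are illustrative assumptions."""
    name: str
    can_commit: bool = True                       # simulated phase-1 vote
    committed: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)

    def prepare(self, updates: dict) -> bool:
        # Phase 1: stage the updates and vote yes, or vote no.
        if not self.can_commit:
            return False
        self.pending = dict(updates)
        return True

    def commit(self) -> None:
        # Phase 2 (all voted yes): apply the staged updates.
        self.committed.update(self.pending)
        self.pending = {}

    def abort(self) -> None:
        # Phase 2 (someone voted no): discard the staged updates.
        self.pending = {}


def two_phase_commit(work: list) -> bool:
    """work is a list of (manager, updates) pairs; True means committed."""
    prepared = []
    for manager, updates in work:
        if manager.prepare(updates):
            prepared.append(manager)
        else:
            # One "no" vote aborts every participant that already prepared.
            for participant in prepared:
                participant.abort()
            return False
    for manager in prepared:
        manager.commit()
    return True
```

In this sketch an update spanning R1 and R2 either appears in both committed dictionaries or in neither, which is the property the text requires of a transaction that accesses two or more repository managers.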

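
The remote checkout and checkin scheme described in Section 5.4 can be sketched as follows. This is one illustrative reading of the text, not the CDD/Repository implementation. The Repository class, its in-memory version lists, and the function names remote_checkout and remote_checkin are assumptions made for the example.

```python
class Repository:
    """Hypothetical versioned store: object name -> list of version records."""

    def __init__(self):
        self.versions = {}

    def add_version(self, name, data, locked_by=None):
        record = {"data": data, "locked_by": locked_by}
        self.versions.setdefault(name, []).append(record)
        return len(self.versions[name]) - 1       # index of the new version


def remote_checkout(remote, local, name, user):
    """Create a new locked version in both the remote and local repository."""
    base = remote.versions[name][-1]["data"]
    remote_index = remote.add_version(name, base, locked_by=user)
    local.add_version(name, base, locked_by=user)
    return remote_index


def remote_checkin(remote, local, name, user, remote_index):
    """Write the local copy to the remote repository, then delete it locally."""
    local_copy = local.versions[name][-1]
    slot = remote.versions[name][remote_index]
    if slot["locked_by"] != user or local_copy["locked_by"] != user:
        raise PermissionError("version is not locked by this user")
    slot["data"] = local_copy["data"]
    slot["locked_by"] = None                      # release the remote lock
    local.versions[name].pop()                    # drop the private copy
    if not local.versions[name]:
        del local.versions[name]
```

Between checkout and checkin the user works only on the local copy, so long design sessions cause no remote traffic. Earlier versions in the remote repository are never modified, which matches the assumption that versions are immutable.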
8 References

1. Atkinson, M. et al., “The Object-Oriented Database Systems Manifesto,” in Deductive and Object-Oriented Databases, Elsevier Science Publishers, Amsterdam, Netherlands, 1990.

2. Cattell, R.G.G. (ed.), The Object Database Standard: ODMG-93, Morgan Kaufmann Pubs., 1993.

3. Committee for Advanced Database Function, “Third Generation Data Base System Manifesto,” ACM SIGMOD Record 19, 3 (Sept. 1990), pp. 31-44.

4. Digital Equipment Corporation, CDD/Repository Architecture Manual, Field Test 3 Draft, March

5. Goering, R., “Standardization Effort Targets Data Management for CASE,” Computer Design 27, 18 (Oct. 1, 1988), pp. 28-30.

6. Gray, J. The Benchmark Handbook for Database and Transaction Processing Systems, Morgan Kaufmann, San Mateo, CA, 1991.

7. Katz, R.H. “Toward a Unified Framework for Version Modeling in Engineering Databases.” ACM Computing Surveys Vol. 22, No. 4, December 1990.

8. Levin, Roy and Paul R. McJones, The Vesta Approach to Precise Configuration of Large Software Systems, Tech. Report 105, Digital System Research Lab, Palo Alto, June 1993.

9. Melton, Jim (ed.), Database Language SQL 3, ISO/ANSI Working Draft, ANSI X3H2-93-091 and ISO DBL-YOK 003, February, 1993.

10. Portable Common Tool Environment (PCTE) Abstract Specification. ECMA European Computer Manufacturers Association Standard ECMA-149. December 1990.

11. Product Data Representation and Exchange. STEP Part 1: Overview and Fundamental Principles, ISO CD 10303-1(E), International Organization for Standardization, 1993.

12. Product Data Representation and Exchange. STEP Part 21: Clear Text Encoding of the Exchange Structure (Physical File), ISO CD 10303-21, International Organization for Standardization, 1993.

13. Product Data Representation and Exchange. STEP Part 11: Express Language Reference Manual, ISO CD 10303-11, International Organization for Standardization, 1993.

14. Product Data Representation and Exchange. STEP Part 22: Standard Data Access Interface Specification, ISO WD 10303-22, Working Draft TC184/SC4/WG7/N350, International Organization for Standardization, 1993.

15. Prusker, Francis, Edward P. Wobber, “The Siphon: Managing Distant Replication Repositories,” Technical Report 42, Digital Systems Research Center, Palo Alto, April 1990.

16. Ring, K. “U.K. Start-Up Software One Claims to Have Nuts and Bolts of AD/Cycle,” Computergram International, Issue No. 1512 (Sept. 14, 1990), Applied Data Services, England.
