Data Model Standards v3.3 Guide
Data Model Standards v3.3 Guide
Table of Contents
SECTION 1. OVERVIEW ............................................................................................................................................... 1
1.1. INTRODUCTION ...................................................................................................................................................... 1
1.2. PURPOSE ............................................................................................................................................................. 1
1.3. SCOPE ................................................................................................................................................................. 1
1.4. BENEFITS ............................................................................................................................................................. 1
1.5. INTENDED AUDIENCE ............................................................................................................................................. 2
1.6. ORGANIZATION OF THE DOCUMENT ......................................................................................................................... 2
1.7. FSA DATA MODELING ENVIRONMENT...................................................................................................................... 2
1.8. REFERENCES ........................................................................................................................................................ 3
SECTION 2. MODELING STANDARDS AND GUIDELINES ........................................................................................ 4
2.1. OVERVIEW OF FSA DATA MODELS ......................................................................................................................... 4
2.2. FSA DATA MODEL DEFINITIONS ............................................................................................................................. 4
2.2.1. Entity-Relationship Data Model (ERDM) ..................................................................................................... 4
2.2.2. Entity Data Model ....................................................................................................................................... 5
2.2.3. Entities ........................................................................................................................................................ 5
2.2.4. Relationships .............................................................................................................................................. 5
2.2.5. Attributes .................................................................................................................................................... 5
2.2.6. Class Word ................................................................................................................................................. 5
2.2.7. Meta Data ................................................................................................................................................... 5
2.3. FSA DATA STANDARDIZATION ENVIRONMENT .......................................................................................................... 6
2.4. COMMON NAMING AND DEFINITION STANDARDS ....................................................................................................... 6
2.4.1. Naming Standards ...................................................................................................................................... 6
2.4.2. Definition Standards ................................................................................................................................... 6
2.5. UNIQUE DATA OBJECT NAMING AND DEFINITION STANDARDS AND GUIDELINES ........................................................... 7
2.5.1. Entity versus Object Modeling .................................................................................................................... 7
2.5.2. Data Object Naming Conventions ............................................................................................................... 8
2.5.3. Data Object Definitions ............................................................................................................................. 10
2.5.4. Generic Class Words ................................................................................................................................ 10
2.6. CONCEPTUAL DATA MODELING (CDM) STANDARDS AND GUIDELINES ...................................................................... 11
2.6.1. Conceptual data model packaging:........................................................................................................... 11
2.6.2. Conceptual data model level of detail: ...................................................................................................... 11
2.6.3. CDM Review Template ............................................................................................................................. 12
2.7. LOGICAL DATA MODELING (LDM) STANDARDS AND GUIDELINES .............................................................................. 12
2.7.1. LDM Review Template ............................................................................................................................. 12
2.8. PHYSICAL DATA MODEL (PDM) STANDARDS AND GUIDELINES ................................................................................. 13
2.8.1. PDM Review Template ............................................................................................................................. 14
SECTION 3. DATA MODEL REGISTRATION (DMR) ................................................................................................. 15
3.1. DMR POLICY ...................................................................................................................................................... 15
3.2. DMR PROCEDURES ............................................................................................................................................ 15
3.2.1. DMR Procedures for existing applications ................................................................................................ 15
3.2.2. DMR Procedures for future developments in support of TSV ................................................................... 16
3.2.3. DMR Process Flows ................................................................................................................................. 16
3.3. ROLES AND RESPONSIBILITIES FOR THE DMR PROCESS ......................................................................................... 18
3.3.1. DMR Responsibilities of the EDD Team ................................................................................................... 18
3.3.2. DMR Responsibilities of the System Developer........................................................................................ 18
3.4. REGISTRATION REQUIREMENTS FOR DATA-RELATED ARTIFACTS.............................................................................. 18
3.4.1. Data Model Meta Data Registration Requirements ................................................................................... 19
3.4.2. Data Model Entity or Table Meta Data Registration Requirements........................................................... 19
3.4.3. Attribute/ Column Meta Data Registration Requirements ......................................................................... 19
3.4.4. Registering COTS Meta Data Requirements ............................................................................................ 19
3.4.5. Registering all the supporting documents related to data models............................................................. 19
List of Figures
Figure 3-1: Registration Process Flow Diagram ........................................................................................................... 17
Figure I-1: Information Engineering Notation ............................................................................................................... I-1
Figure M-1: Process Flow for Registering a Data Model ............................................................................................ M-1
Figure M-2: Process Flow for Data Model Registration .............................................................................................. M-2
Figure M-3: Data Model Resubmission Process Flow................................................................................................ M-3
Figure M-4: ECDM Registration Process Flow ........................................................................................................... M-4
Figure M-5: ECDM Change Request Process Flow ................................................................................................... M-5
Figure N-1: System Developer Section of Registration Form ..................................................................................... N-1
Figure N-2: EDM Section of Registration Form .......................................................................................................... N-1
Figure N-3: Logical Data Model- Entity Information Entry ......................................................................................... N-2
Figure N-4: Logical Data Model- Attribute Information Entry ...................................................................................... N-3
Figure N-5: Physical Data Model- Table Information Entry ........................................................................................ N-4
Figure N-6: Physical Data Model- Column Information Entry ..................................................................................... N-5
Figure O-1: Example Entity Relationship Diagram (ERD) .......................................................................................... O-2
Figure O-2: Example of Improperly Designed ERD.................................................................................................... O-3
Figure O-3: Example ERD with "Crows Feet" ............................................................................................................ O-3
Figure O-4: Components of an ER Diagram .............................................................................................................. O-5
Figure O-5: Example ERD Techniques Using the ERD Components ........................................................................ O-5
List of Tables
Table 1-1: Intended Audience ........................................................................................................................................ 2
Table 2-1: Data Object Names Matrix ............................................................................................................................ 8
Table 2-2: Naming Standards Matrix.............................................................................................................................. 9
Table D-1: Entity or Table Metadata Abbreviations .................................................................................................... D-1
Table D-2: Entity or Table Metadata Field Items ........................................................................................................ D-2
Table E-1: Attribute or Column Metadata Filed Items ................................................................................................ E-2
Table F-1: Conceptual Data Model - Review Template .............................................................................................. F-1
Table G-1: Logical Data Model- Review Template ..................................................................................................... G-4
Table H-1: Physical Data Model- Review Template ................................................................................................... H-2
Table L-1: Data Mapping Example .............................................................................................................................. L-2
Table O-1: Levels of Data Normalization ................................................................................................................... O-6
Table A-1: Acronyms and Abbreviations .................................................................................................................... A-1
Section 1. Overview
1.1. Introduction
Data is principal Federal Student Aid (FSA) resource, which like other organizational resources, must be
managed effectively. The use of standardized data enhances the interoperability among FSA information
systems, facilitates increased data sharing, reduces data handling costs and leads to better data
accuracy, consistency and timeliness. The policies and procedures of this document provide the
framework necessary to maximize data sharing and exchange opportunities and to enable standardized
data modeling throughout FSA.
The data required by FSA is maintained in both the Erwin Data Dictionary (DD) and the IBM Information
Server (IIS) Business Glossary (BG). These two sources are collectively referred to as the Enterprise
Data Dictionary (EDDI) and provide data as input to the data modeling process. Data models contain
data entities, the relationships between the various data entities, and data entity attributes. One output of
the data modeling process is the graphical representation of the data model as an Entity Relationship
Diagram (ERD), specifically, of the Enterprise Conceptual Data Model (ECDM) and Enterprise Logical
Data Model (ELDM).
1.2. Purpose
This document illustrates a clear, effective process to be utilized by FSA for adopting, developing,
implementing and maintaining data modeling and registration standards for the purpose of information
sharing and exchange. This document, in combination with the “Data Standardization Policies and
Procedures” and the "Enterprise Data Management Data Policies", supports effective Data Management
operations across FSA. Throughout this document, standards are preceded by an (S), and guidelines are
preceded by a (G).
1.3. Scope
The scope of this document is to:
• Define the FSA data modeling standards and guidelines including naming conventions.
• Identify the FSA policies and procedures for Data Model Registration (DMR). These policies and
procedures address how to document the existing FSA business/legacy systems data models
and how to develop future data models for new business systems across FSA.
1.4. Benefits
The data modeling standards, policies, and procedures outlined in this document establish a sound
foundation for enterprise data modeling including:
• All the existing business systems / legacy data models are available in one place.
• All the future developments for new business systems will follow the same policies and
procedures for data standardization and data model development.
• The Enterprise Data Directorate (EDD) Team will review and validate the new business system
data models for compliance, thereby increasing the consistency and transparency of data
architecture across the enterprise.
• The common DD is available for use by future business systems, thereby improving data
exchange across all the business systems.
• Seamless data integration between all the new business systems is enabled.
• Data modeling standards are specific rules for the development and modification of the names,
definitions, and other metadata classes, attributes and data models.
• Data modeling standards establish a required level for the correctness, consistence, and
completeness of data models.
• Data modeling standardization focuses on universal understanding of data and their business
context throughout the enterprise rather than within the confines of a particular environment.
Policies and procedures are instructions and recommendations for the successful implementation
of data modeling standards.
Following the data modeling standards will reduce the time required for users and system developers to
identify information assets appropriate for decision making, and will permit identification and reduction of
redundant data stores, interfaces and their associated costs.
ETL developer helps to extract, transform and load data from original source to
target source
common and consistent way of categorizing, and describing data to facilitate data sharing and integration.
The concept of capturing and describing data follows the standard developed by the International
Standardization Organization (ISO)/International Electrotechnical Commission (IEC).
FSA utilizes Erwin to maintain Conceptual Data Models (CDMs), Logical Data Models (LDMs), Physical
Data Models (PDMs), the Enterprise Data Model and a Data Dictionary (DD).
Erwin is used for the development of the conceptual, logical and physical data models, and physical
database design and construction. Erwin has the following features:
• Strong logical model capabilities
• Bidirectional synchronization of logical and physical models
• Automatic database construction
• Reverse-engineering of databases
• HTML-based documentation and reporting facilities
• Data modeling repository for cooperative team modeling.
1.8. References
The following external documents provide either governance or guidance for this document.
DOCUMENT
DOCUMENT TITLE
VERSION
ED Enterprise Data Standards and Guidelines, March 23, 2005
ED Enterprise Data Management – Data Model Registration Process, January 2006
Federal Student Aid – Enterprise Data Dictionary Standards, April 2007 Version 1.0
Table 1-2: Reference Documents
1
Liberty University; Data Modeling: Data Modeling
2
University of Regina; Data Modeling and Entity Relationship Diagram (ERD): Data Modeling and ERD
2.2.3. Entities
Entities are the principal data object about which information is to be collected. Entities are usually
recognizable concrete, tangible or abstract concepts, such as person, places, things, or events important
to the organization’s business. They usually refer to objects that are relatively stable and long-lived. Some
specific examples of entities are PERSON, ORGANIZATION, or LOAN. An entity is analogous to a table
in the relational database model. Properties describe the entity by giving it a name and type (primary or
associative entity.) Entities must have:
• A name complying with FSA’s naming conventions.
• An unambiguous succinct definition.
• A minimum of two attributes: one unique identifier (key) and one or more descriptive attribute.
• A minimum of one relationship to another entity.
2.2.4. Relationships
A Relationship represents an association between two or more entities. An example of a relationship
would be: “employees are assigned to projects.” Relationships can either be one-to-one, one-to-many,
many-to-many, zero-to-one, or zero-to-many. Relationships between entities are named, which defines
the purpose of the relationship. For more information about relationships see Appendix J.
2.2.5. Attributes
Attributes describe the entity of which they are associated. A particular instance of an attribute is a value.
For example, "Jane" is one value of the attribute “First Name”. Attributes can be classified as identifiers
or descriptors. Identifiers are more commonly called keys, which uniquely identify an instance of an
entity. A descriptor describes a non-unique characteristic of an entity instance.
Each attribute is the physical representation of the column of a table in the relational database model.
3
Tabular Data: is "data in a table", organized in rows and columns (table structure).
4
Federal Enterprise Architecture – Data Reference Model, November 2005
As with the above naming standards, and regardless of the particular domain, the following 7 definition
standards apply throughout the enterprise. Once again, the sequence of the standards listed below does
not imply any order of importance. In summary, a definition must:
1. A definition must be unique and distinguishable from every other data element definition.
2. A data definition must be precise, concise, and unambiguous. The definition should be clear
enough to allow only one possible interpretation.
3. The data definition must be expressed in the singular.
4. The definition must include the essential meaning or primary characteristics of the concept.
5. The definition must only use abbreviations when necessary and the abbreviation must be
commonly understood.
6. The definition must not contain any embedded definitions or underlying concepts of other data
elements/terms/concepts.
7. The definition must not include statements about why and how a data element is used.
Additional information relating to creation of a well-written definition is in Appendix Q of this document.
“entity” and “class” are frequently used interchangeably. This is correct for naming and definition
standards and guidelines, but there are some other minor technical differences.
For example: A Class diagram gives an overview of a system by showing its classes and the
relationships among them. Class diagrams are static -- they display what interacts but not what happens
when they do interact. Unified Modeling Language (UML) class notation is a rectangle divided into three
parts: class name, attributes, and operations. Relationships between classes are the connecting links.
In general class diagram has three kinds of relationships.
• Association -- a relationship between instances of the two classes. There is an association
between two classes if an instance of one class must know about the other in order to perform its
work. In a diagram, an association is a link connecting two classes.
• Aggregation -- an association in which one class belongs to a collection. An aggregation has a
diamond end pointing to the part containing the whole.
• Generalization -- an inheritance link indicating one class is a superclass of the other. A
generalization has a triangle pointing to the superclass.
An association has two ends. An end may have a role name to clarify the nature of the association. For
example, an OrderDetail is a line item of each Order.
A navigability arrow on an association shows which direction the association can be traversed or
queried. An OrderDetail can be queried about its Item, but not the other way around. The arrow also
lets you know who "owns" the association's implementation; in this case, OrderDetail has an Item.
Associations with no navigability arrows are bi-directional.
The multiplicity of an association end is the number of possible instances of the class associated with a
single instance of the other end. Multiplicities are single numbers or ranges of numbers. For example,
there can be only one Customer for each Order, but a Customer can have any number of Orders.
5
Modifiers can be adjectives, adverbs, absolute phrases, infinitive phrases, participle phrases,
prepositional phrases, adjective clauses, and adverb clauses. For an example go to: Modifiers
• (S) Relationship name: The relationship name must be a present tense verb surrounded by two
entity names and can be expressed in either of two directions, depending on which entity is listed
first. The name is composed of the following components through underscore:
o Position1: entity 1 (for example: Person)
o Position 2: verb (for example: is)
o Position 3: entity 1 (for example: Student)
For instance, the full name of a relationship could look like: Person_is_Student.
• (S) ERD name: The ERD name must be unique across all data models and must be singular.
The name of the diagram is composed of the following components concatenated through
underscore:
o Position1: project folder (for example: name of the primary business unit the project falls
under, such as TechnicalOffice)
o Position 2: project name (for example: the project itself, such as IPM)
o Position 3: indicator for type of data model (for example: “c” for conceptual, “l” for logical,
and “p” for physical data model)
o Position 4 - version of the model (for example: v1)
For instance, the name of an ERD could look like: TechnicalOffice_IPM_p_v1.
• Description
• Status
• Image
(S) If an existing attribute of the legacy systems will be used to produce the future application data model
development and it does not have one of the proposed generic elements/class words at the end of its
name, then the attribute will be renamed. The generic element/class word that best corresponds to the
purpose of the attribute will be selected as its new name. The recommended length of an attribute name
is a maximum of 32 characters. If the name of the attribute is longer, the attribute name will be truncated
to 32 characters when developing the physical data model, which will no longer support the goal of
keeping consistent naming across the different levels of the data models. In effort to maintain the
connection between the original attribute name (as found in the legacy system) and, the new
renamed/truncated attribute name, the legacy system attribute name will be linked to the new attribute
name as a synonym in the Erwin Data Dictionary and, the IIS Business Glossary.
• (S) Association relationship verbs or verb phrases must be strong, specific, active-tense in one
direction and passive in the other instead of general, one-word verbs of being such as “is,” “has,”
or “contains.” It is permissible to begin verb phrases with the verbs in these examples. It is also
permissible to use one-word verbs if they say something specific such as “files,” “owns,” or
“resolves.” For example, the relationship between “OrganizationType” and
“OrganizationTypeRole” (associative entity) can be described as:
o “One (1) OrganizationType is allowed to act in zero (0) or many (m)
OrganizationTypeRole”, and the reverse relationship reads
o “One (1) OrganizationTypeRole is classified by one (1) OrganizationType”
• (G) A certain amount of denormalization is usually necessary when implementing the physical
data model. Denormalize only if you can demonstrate a performance gain. Losses in
maintaining data integrity must be justified by the performance gain. Some examples follow:
o Replication of some attributes to reduce the number of multi-table joins needed for
commonly used functions. Justification should explain how the replicated instances are
to be synchronized.
o Storage of computed values to avoid joins for aggregations. How and when will
computation occur?
o Allow a limited usage of repeating fields. For example, Phone1, Phone2, and Phone3
might be a reasonable denormalization, rather than creating a separate Phones table.
• (G) Document the range of values (domain) for each column. Some of these may be enforced by
means of constraints or relationships to other tables.
• (G) Estimate the expected storage requirements for each table based on the size of each row,
expected growth, number of rows, and archiving requirements.
• (G) Define alternate keys that will enhance performance by supporting common search paths.
• (G) Understand the capabilities of the specific database product. Performance improvements
may be realized by taking advantage of features such as clustered indices, caching, and index
optimization.
• (G) Define security requirements for every attribute and plan for implementation of security
policies.
• (G) Although this document does not address the actual mechanics of converting a logical data
model to a physical data model, a detailed explanation of the migration between these two data
models must accompany the physical data model unless such migration is intuitively obvious to
every model reviewer. This migration explanation must address the addition of new columns to
existing tables, new tables that need to be part of the physical design, entities now stored in rules
engines, and relationship changes.
• (G) A PDM for a current application release may display tables and columns that will not be
populated until later releases if:
o It is done for all of the columns that compose a particular table, and
o A note identifies each unpopulated table. For this guideline, an attribute that may not be
null is considered to be populated if it initially uses a default value.
• (G) A PDM may use either vertical or horizontal table design. The LOB data type appears the
same way on the PDM regardless of which table design is used.
This is a choice between a larger number of smaller rows (vertical) and a smaller number of larger rows
(horizontal). The vertical table design supports tax form changes from year to year. The horizontal table
design may require column changes when the tax form changes.
• (G) A physical data model may store a document in a separate table or column that is linked to
the primary LOB data type attribute.
• (G) If necessary, PDM tables may display referential integrity rules.
(S) Referential integrity rules must be defined as constraints on a named foreign key for updates or
deletions. These rules must reflect the business rules (for the associated data) and, may be restricted to
set to null, or cascade.
FSA_DA_LEGACY
o Business System 1 (for example: CSB)
o Business System 2 (for example: IPM)
o Business System 3 (for example: NSLDS)
o ERWIN_ARCHIVE created from ERStudio
Each sub folder contains only the most recent version of the conceptual, logical and physical data
models. All previous versions will be moved to the ERWIN_ARCHIVE created from ERStudio sub folder
within the FSA_DA_LEGACY folder.
The system developers from the business sponsor will submit their data models for registration with the
EDD team. The EDD Team will maintain all the data models of existing legacy systems and ongoing
modernization projects and will use or share this information across the enterprise as needed for the
purposes of integration, migration, data exchange
As part of the DMR Policy, the EDD Team collects data models for the following scenarios:
1. The initial registration in the repository of a business sponsor’s legacy or modernization project
data models.
2. Re-submission to capture major changes to data models.
3. Regular submission of the data model at least annually from the business sponsor.
The above scenarios have been documented as process flows and are enclosed in appendix N. As part
of each data model submission, the system developer from the business sponsor will fill and submit the
data model registration template. This template can be found in the appendix O.
Additionally, the EDD Team has developed templates for capturing entity and attribute information for a
LDM, and table and column information for PDM registration. These templates are available in Appendix
O. Submission of these templates is not mandatory if the system developer from the business sponsor
can provide this information in a format using a data modeling tool that is compatible with Erwin. For
PDMs, the submission of the Data Definition Language (DDL) used to generate the database is
acceptable.
The EDD Team will maintain the EDM (Enterprise Data Model). The EDM will be available for certain
business sponsors for review. For any proposed changes to this EDM data model, the business sponsor
must follow the process as defined in the Data Standardization Policy and Procedures document using
the template as defined in appendix O.
instance, the data model might only be available as a pdf-file, but not in a format that can be imported to
Erwin for further manipulation.
• Non-COTS Application:
o All data models in the Erwin format should be registered in the Erwin repository.
o All templates of the Meta data items are stored in the version controlled repository for
subsequent inclusion in the Erwin DD and the IIS Business Glossary.
o All review templates are stored in a version-controlled repository (SharePoint).
• COTS Application:
o Data models of the COTS product are proprietary. Only part of the information (like user
manuals, reference guides) is stored in the version-controlled repository (SharePoint).
o All the customization of business rules for the application is captured in the version
controlled repository (SharePoint).
o All the information related to data exchange should be captured in the version controlled
repository (SharePoint)
Appendix B - Glossary
TERM DEFINITION
Column A set of data values of the same type collected and stored in the rows of a table.
Database A set of table spaces and index spaces.
Data Element A generic term for an entity, table, attribute, or column in a conceptual, logical, and
physical data model.
Enterprise One of the initial components of Enterprise Data Architecture. The first enterprise level
Conceptual Data data model developed. The ECDM identifies groupings of data important to Lines of
Model (ECDM) Business, Conceptual Entities, and defines their general relationships. The ECDM
provides a picture of the data the enterprise needs to conduct its business. (Reference:
U.S. Department of Education Enterprise Data Architecture – Enterprise Data Standards
and Guidelines.)
Enterprise Logical A component of a maturing Enterprise Data Architecture. The second enterprise level
Data Model data model developed. It is the result of merging application level data model information
(ELDM): into the existing Enterprise Conceptual Data Model (ECDM). The ELDM extends the
ECDM level of detail. (Reference: U.S. Department of Education Enterprise Data
Architecture – Enterprise Data Standards and Guidelines)
Schema (Data) Any diagram or textual description of a structure for representing data. (Reference: FSA-
EDM)
List of 17 optional proposed class words for the FSA to be used for each attribute or column name.
GENERIC
ELEMENT/ TYPICAL
ABBREVIA FIELD OR METADATA
CLASS DEFINITION FEEDER TERMS
TION PARAMETERS
WORD REPRESENTED
NAME
Quantity A non-monetary numeric QTY Count, Index, Mean, Defined “The (optional
value that does not have to Median, Mode modifiers) quantity of…”
be a whole number. Integers, optional Floating
Point or Fixed Point with
decimal place count quantity
Rate A quantitative expression RT Factor, Percent Defined “The (optional
that represents the numeric modifiers) rate of…”
relationship between two Integers,
measurable units. Floating Point or Fixed Point
with decimal place count
quantity
Year A numeric value for a YR Calendar Year, Tax Defined “The (optional
calendar or fiscal year. Year, Fiscal Year modifiers) year of/when/in
which…”
Integers
Time A notation of a specified TM Time Defined “The (optional
chronological point within a modifiers) time of/when…”
period of time. Integers
Timestamp A specific point of time, TS Timestamp Defined “The (optional
described as a sequential modifiers) timestamp
seven-part value, of/when…”
presented in the order of Format is
most significant component YYYYMMDDHHMMSSNNN
to least significant where: YYYY is the year;
component MM is the month; DD is the
day; HH is the hour; MM is
the minute; SS is the
second; NNN is the
microsecond
GENERIC
ELEMENT/ TYPICAL
ABBREVIA FIELD OR METADATA
CLASS DEFINITION FEEDER TERMS
TION PARAMETERS
WORD REPRESENTED
NAME
The preferred class word is
Rate with Percent as an
optional class word modifier.
Identifier A combination of one or ID Designator, Key, Defined “The (optional
more numbers, letters, or Number modifiers) identifier that
special characters that represents…”
designates a specific object
or data model class but has
no readily definable
meaning.
Period A range of dates and/or PRD Defined “The (optional
time that has a starting modifiers) period of/when…”
point and an ending point.
Indicator An attribute having two IND Defined “The (optional
possible values that are modifiers) indicator for/of…”
contrary.
Text An unformatted character TXT Comment, Defined “The (optional
string generally in the form Description modifiers) text of…”
of words.
Description An unformatted character DSC Defined “The (optional
string, generally in the form modifiers) description of…”
of words. The preferred class word is
Text with Description as an
optional class word modifier.
Status The condition of an object STAT Defined “The (optional
or data model class at one modifiers) status of
stage in a related series of The preferred class word is
events Code with Status as an
optional class word modifier.
Image The use of a Binary Large IMG An example of an Defined “The picture of…”
Object (BLOB) to store Image would be the Recommend use of
binary data such as electronic signature modifiers with the class word
pictures on a tax return. Image to specifically
describe what the image is
capturing (e.g., Electronic
Signature Image).
Table C-2: List of Optional Proposed Class Words
As a general rule, fewer proposed generic elements/class words are preferable because they provide a
tighter enterprise data structure and minimize potential confusion. For example, the Department of
Defense (DoD) uses only four proposed qualitative generic elements/class words: Identifier, Code,
Name, and Text. Department of Education deprecated the Number, Type, Count, Duration, Percent,
Description, Status class words. However, the usage of the class words varies organization to
organization.
The following abbreviations are used in pairs in the third, fourth, and fifth columns from the left to depict
the mandatory, recommended, or optional nature of a metadata field for a data model entity or table at the
conceptual, logical, or physical data model level.
METADATA METADATA FIELD
PURPOSE C L P
FIELD ITEM PARAMETERS
Name The standardized entity M M M
name on the conceptual or
logical data model or the
abbreviated table name on
the physical data model.
Alias If such a name exists, the O R M
vernacular version of any
standardized or abbreviated
name.
Definition The standardized definition M M M
of an entity or table.
Phase The conceptual, logical, or M M M “Conceptual,” “Logical,” or “Physical”
physical state of an entity or
table.
Identifier A unique, system-generated M M M A five-digit number beginning with
record number assigned to leading zeroes and ending with a
the entity or table. zero (e.g., 00010 for the first entity or
table).
Version A system-generated M M M Maximum of two sequential integers
configuration management beginning with “1”.
identifier to maintain the
integrity of a particular
metadata set, or field string,
for an entity or table.
Current Version A Boolean “Yes” or “No” flag M M M “Y” or “N”
indicating if this version is
the recognized as the latest
version of the entity or table.
This flag allows only Current
Version data model
information to be displayed
in filtering, sorting, or
keyword searching.
Status A stage of a particular M M M One of five possible stages:
version of a data model Developmental, Candidate,
entity or table within the Approved, Archived, Rejected. See
data standardization further Status explanation
approval cycle. immediately below this table.
Status Date The date that the current M M M
Status became effective.
There are five potential status entries. These are Developmental, Candidate, Approved, Archived, and
Rejected. Typically, a data element version successively moves through the first four statuses during its
data life cycle.
A Developmental status data element is the initial stage of data element development. It may be used to
satisfy a new data requirement but with the understanding that the data element is not yet mature.
An Approved status data element is the authoritative data source for enterprise-wide use. It must be used
in its present format if it will fully satisfy a new data requirement. If it will not satisfy such a requirement, a
modified version of the data element must be submitted for approval.
A Candidate status data element is the second most authoritative data source for enterprise-wide use. It
can be used in its present format with a high degree of confidence. In comparison, a Developmental status
data element may be used, but with the understanding that it is not yet mature.
An Archived status data element should be considered purely as a historical reference. It satisfied an
earlier data requirement but should not be recreated at this time.
Finally, a Rejected status data element should not be considered at all because it never adequately
satisfied a data requirement.
REVIEW MODEL
NUMBER STANDARD STATUS
METHODOLOGY DISCREPANCIES
9 Do the logical data model Mandatory Check attribute class
attribute names end with words against the list
proposed class words? of proposed class
words.
10 Does the logical data model Mandatory Check all metadata in
define all entities and attributes? the logical data model
describing each entity
and attribute.
Negative answer is
basis for returning
model to project.
11 Do the context class and Mandatory Check these data
attribute definitions in the logical element definitions.
data model correctly?
12 Do the new native entity and Mandatory Check length of data
attribute definitions concisely tell element definitions.
what the data element is instead Consider placing any
of how it is used? “How” explanation in
the comments field.
13 Do the logical data model Mandatory Check all metadata in
entities and attributes have the logical data model
entries in all mandatory describing each entity
metadata fields? and attribute against
the standards and
guidelines. Negative
answer is basis for
returning model to
project.
14 Does the logical data model Mandatory Check the logical
provide appropriate rationale for data model, and any
any new entities or attributes supporting
that appear to duplicate existing documentation
data elements? accompanying it,
such as a cover letter.
15 Does the logical data model Optional Check all
include abbreviated class and abbreviations against
attribute names following an list of authorized
authorized abbreviation list? abbreviations in the
standards and
guidelines. Use of
abbreviations is
optional in the logical
phase but, if present,
they must match the
authorized list.
16 Does the logical data model Optional Check all metadata in
identify attribute data types? the logical data model
describing each
attribute (e.g.,
character string,
integer, fixed point,
floating point).
17 Does the logical data model list Optional Check all metadata in
aliases for entities and the logical data model
attributes? describing each entity
and attribute.
REVIEW MODEL
NUMBER STANDARD STATUS
METHODOLOGY DISCREPANCIES
18 Is the logical data model Mandatory Check all metadata in
independent of physical the logical data model
constraints imposed by a to identify and remove
specific application or the target any attributes that are
Database Management System purely physical,
(DBMS)? application unique,
duplicate, or derived.
19 Does the logical data model Mandatory Check the logical
show all relationships? data model to
determine if it shows
all applicable
relationships in the
current ECDM.
20 Are the logical data model Mandatory Check wording for all
relationships named in the form logical data model
of a verb phrase? relationships.
21 Are the relationship name verbs Mandatory Check of the logical
or verb phrases strong, specific, data model to identify
active-tense in one direction inappropriate, one-
instead of general, one-word word verbs such as
verbs of being? “is,” “has,” or
“contains.”
22 If relationships display a Optional Check logical data
reciprocal verb phrase, is this model multiplicity
second verb phrase clearly notation.
understandable?
23 Do the logical data model Mandatory Check logical data
relationships note cardinality or model multiplicity
multiplicity on both ends of the notation.
relationship line?
24 Does the logical data model Mandatory Check logical data
have associative classes to model for new
resolve many-to-many associative entities
relationships? that resolve many-to-
many relationships
into pairs of one-to-
many relationships.
25 Does the logical data model Optional If provided, these
include statistics on object or statistics will be
data model class and accepted and
relationship occurrences? reviewed for the sole
purpose of gaining
more insight into the
logical data model.
Statistics could be
provided in
supporting
documentation, such
as an attachment to a
cover letter.
REVIEW MODEL
NUMBER STANDARD STATUS
METHODOLOGY DISCREPANCIES
26 Does the logical data model Optional Such rules will be
include preliminary referential updated in the
integrity rules? physical data model.
If a specific rule is
provided (i.e., restrict,
cascade, or set to
nulls), it will be
accepted and
reviewed for the sole
purpose of gaining
more insight into the
logical data model. A
fourth, default rule will
be “not specified.”
27 Does the logical data model Mandatory Check logical data
show all foreign key migration model for attributes
as part of normalization? with foreign key
notation.
Table G-1: Logical Data Model- Review Template
REVIEW MODEL
NUMBER STANDARD STATUS
METHODOLOGY DISCREPANCIES
10 If provided, does this migration Mandatory Comparison of the
explanation address relationship physical data model,
changes? the logical data
model, and the
migration explanation.
11 Does the physical data model Optional Check of the physical
identify all tables and columns data model.
that will not be populated until
later application releases?
12 Are all Large Object (LOB) data Mandatory Check of all Large
type columns on the physical Object (LOB) data
data model uniquely named and type column names
defined to specifically reflect and definitions.
their data storage objective?
13 Do the physical data model Optional Check physical model
tables display needed referential for referential integrity
integrity rules for foreign key information. The
update or deletion? physical data modeler
determines the need
for these rules.
14 Does the physical data model Mandatory Check all metadata in
identify column data types? the physical data
model describing
each column (e.g.,
character string,
integer, fixed point,
floating point).
15 Does the physical data model Mandatory Check all metadata in
identify maximum lengths for the physical data
column values? model describing
each column.
16 Does the physical data model Mandatory Check all metadata in
identify possible values for the physical data
identifier, indicator, and code model describing
columns? each identifier,
indicator, and code
column.
Table H-1: Physical Data Model- Review Template
IE (Crow's Feet) - Crow's Feet is a variation of the traditional James Martin notation. It uses dashed lines
for non-identifying relationships and solid lines for identifying relationships. Identifying relationships are
signified by rounded corners in the child table. For non-specific relationships, the two entities have many-
to-many relationships with an association relationship between them and a cardinality or multiplicity of one
or more.
6
Entity Relationship Modeling Technique: Entity Relationship Modeling Techniques and Conceptual Modeling
using the Entity-Relationship Model.
Data model registration process flow for any type of data model to be followed by the EDD Team upon
receipt of the data model registration form.
ECDM Registration Process after identification of changes by business sponsor or system developers.
clearly as possible. If there are naming conventions established they should be adhered to. The attribute
name should communicate the business concepts that users and business specialists are familiar with
and that can be validated and verified. For example, an attribute entitled “process” is unclear as to its
intent.
The same rigor should be applied when providing a definition for an attribute. A proper definition should
impart what the attribute is intended to record. For example, for the attribute “Financial Aid Type” compare
the following definitions:
• A kind of financial aid
• A form of financial aid that does not need to be repaid and that is ordinarily awarded on the basis
of financial need. Sometimes used interchangeably with the term scholarship, though
scholarships are generally awarded on the basis of merit.
The first definition is unclear in relation to how it interacts with the business, while the second provides
background and information that can easily be validated by the user community.
A definition should provide clarity and not restate the obvious. Compare the following examples for the
attribute “Student Identifier”:
• The identifier for a Student
• The Student Identifier is the unique number assigned to a student on the date they apply for
student aid. This number stays with the student throughout the aid process. It is a counter, which
is incremented for each new student.
A review should be done to ensure that there are no attributes with the same or similar names that have
different definitions or attributes with the same name but different definitions.
Relationship
Data models also show relationships between entities. For example, there is a relationship between
Student and Loan. Relationships are notated to describe and help clarify the association: A Student
applies for a Loan. Notations should accurately reflect and describe the relationship between entities. The
relationships also define the constraints between entities by diagramming whether an association is
optional and whether there is a limitation on the number of instances that can occur between the entities.
For example, “A Student can apply for zero or more Loans, and a Loan is issued to only one Student.”
This statement tells us that a student may choose not to apply for a loan but also that they can apply for
several loans. It is imperative that the constraints be properly defined as they help enforce the business
rules.
Aesthetics
An Entity Relationship diagram (ERD) displays entities within a box and shows the relationships, including
cardinality through lines arrows:
The diagram above shows two entities, Person and Person Role. The line connecting the two entities
shows there is a relationship between them. The “crow’s feet” at the end of the line at Person Role
indicates that Person is the parent entity and that Person Role is the child entity. It also indicates that a
Person can assume one or more roles. More details on how to read the relationship between these two
entities are captured in the Communication section at the end of this document.
Models, specifically ERDs, can become complex and indecipherable if the modeler does not account for
the visual effect it will have for a user trying to navigate through it. Models become burdensome when a
one tries to do too much with them. There is a little to the amount of information that can be displayed. A
diagram with numerous entities and multiple relationships can become a maze of “boxes and lines” if not
designed properly.
The simplest thing to do to improve the appearance of a model is to limit the number of entities displayed.
A diagram should display no more than 20 entities, but 10 to 15 are preferable. For large models, views
or subject areas can be created to accommodate this. How entities are placed in a model can affect the
viewer’s ability to understand a diagram. Some people arrange entities to minimize having relationship
lines cross with no particular business purpose in mind. While this improves the readability, it may not
impart value in understanding the overall meaning, as the viewer may still observe a random collection of
“lines and boxes”. It may not distinguish an independent entity (an entity that that does not rely on
another entity for identification and reflect tangible things that are important to a business) from a
dependent entity (an entity that relies on another entity for identification and represent transactional
information that represent what the business does). A convention for overcoming this obstacle is to
display relationship lines that point in the same direction. This separates reference entities that describe
tangible things (e.g. PERSON, AID, ORGANIZATION) in the business from entities describing what the
entity does (AID APPLICATION, AID ELIBILITTY). By having the “crow’s feet” point left an up, we can
readily discern the substantial entities from the reference entities.
The diagram above clearly shows that AID is the tangible thing that is important to the business and the
other entities represent the transactions the business conducts.
Benefits of a Good Data Model
By following the guidelines listed above when developing entities, attributes and relationships the modeler
will be able to create a data model that will provide utility to the business community. When translating
the CDM into a LDM these guidelines should continue to be adhered but other criteria must be evaluated
to ensure the quality of the logical data model and its capability to support the business requirements.
LDMs should be normalized to ensure that there is consistency and quality in the data. Normalization is
the process of efficiently organizing data in a data model or database. Additional information on data
model/database normalization is available in the Communication section at the end of this document.
Completeness
An LDM ensures that all relevant and necessary data is captured and represented accurately. For
example, if the model lacks the means to record loan payments and this data is required by the system
this would be a serious omission.
Non-Redundancy
A LDM ensures that information is only recorded once. This is done through a technique known as
normalization. Several legacy systems record facts in more than one table and sometimes in several
systems. Capturing data in this fashion essentially duplicates it and could result in anomalies. If a data
item is stored in more than one place we cannot be confident it is always represented consistently. There
are also maintenance issues. If the data item needs to be updated can we confirm that it has been
updated in each table?
Enforcement of Business Rules
The LDM must reflect and enforce the rules that apply to the business data. If the model enforced that a
Student could apply for only one loan then this would not support the business requirement. If the rule
correctly enforces the business requirement(s) in the logical data model the resulting database will
implement and enforce the correct practices and maintain the quality of the data.
Data Reusability
The LDM should be constructed independent of an application or a system. Information captured by one
organization or department may want to be used by other groups. A Loan Officer could later utilize
demographic information for a Student captured during an inquiry about Aid. Structuring the data
independent off either application allows both processes to employ it.
Stability and Flexibility
An LDM should be able to support changes to the business requirement. If new data is required the
model should be able to accommodate it into existing tables or with minor extensions to the model.
Communication
Conceptual and logical data models should represent the business concepts that the users and business
specialists are familiar with. Business owners and stakeholders should verify that it is accurate.
Conceptual and logical data models can be organized into subject areas that reflect different parts of the
organization to assist in this endeavor. Logical data models need to be presented at different levels of
detail to allow the reviewer the capability to grasp what has been captured.
Data Normalization
a) The Technique
Database normalization is a technique for designing relational database tables to minimize duplication of
information and, in so doing, to safeguard the database against certain types of logical or structural
problems. For example, when multiple instances of a given piece of information occur in a table, the
possibility exists that these instances will not be kept consistent when the data within the table is updated,
leading to a loss of data integrity. A table that is sufficiently normalized is less vulnerable to problems of
this kind, because its structure reflects the basic assumptions for when multiple instances of the same
information should be represented by a single instance only.
Higher degrees of normalization typically involve more tables and create the need for a larger number of
joins, which can reduce performance. Accordingly, more highly normalized tables are typically used in
database applications involving many isolated transactions (e.g. an automatic teller system), while less
normalized tables tend to be used in database applications that do not need to map complex relationships
between data entities and data attributes (e.g. a reporting application, or a full-text search application).
Database theory describes a table's degree of normalization in terms of normal forms of successively
higher degrees of strictness. A table in third normal form (3NF), for example, is consequently in second
normal form (2NF) as well; but the reverse is not always the case.
Although the normal forms are often defined informally in terms of the characteristics of tables, rigorous
definitions of the normal forms are concerned with the characteristics of mathematical constructs known
as relations. Whenever information is represented relationally, it is meaningful to consider the extent to
which the representation is normalized.
b) Rules of Data Normalization
As mentioned above, there are several levels of normalization. A brief description is provided in the table
below. 7
Level Description
1NF Eliminate Repeating Groups - Make a separate table for each set of related attributes,
and give each table a primary key.
2NF Eliminate Redundant Data - If an attribute depends on only part of a multi-valued key,
remove it to a separate table.
BCNF Boyce-Codd Normal Form - If there are non-trivial dependencies between candidate
key attributes, separate them out into distinct tables.
4NF Isolate Independent Multiple Relationships - No table may contain two or more 1:n or
n:m relationships that are not directly related.
5NF Isolate Semantically Related Multiple Relationships - There may be practical constrains
on information that justify separating logically related many-to-many relationships.
7
Source: Database Normalization Basics- DePaul University and Database Normalization Primer-
DePaul University
8
International Organization for Standardization. Information technology — Metadata registries (MDR) — Part 4: Formulation of data
definitions (ISO/IEC 11179-4, Second Edition 2004) 4/18/2007.
9
O’Neil, Bonnie “Business Metadata: How to Write Definitions”. Business Intelligence Network. (March 29, 2005). 3/23/2007
http://ww.b-eye-network.com/view/734
10
O’Neil, Bonnie “Business Metadata: How to Write Definitions”.
also be noted as a Synonym but may be helpful in the case of migration or reverse
engineering.
o Approval – information about when the concept/term definition was approved, by whom,
etc.
Definition Requirements
Understanding that the context in which the data is used is a key factor in defining data elements,
generally, when composing a data element definition, each definition should have the following
characteristics: 11
• Unique - A definition should be unique and distinguishable from every other data element
definition.
EXAMPLE: “Closure Date”
Poor Definition: The date when something closed.
Good Definition: The date that the school ceases to provide educational instruction in all
programs, as determined by the U.S. Secretary of Education.
• Clear – A data definition should be precise, concise, and unambiguous. The definition should be
clear enough to allow only one possible interpretation.
EXAMPLE: “Disbursement Date”
Poor Definition: The date money was disbursed.
Good Definition: The date money was credited to the student's account at the school or paid to
the student (or borrower if a PLUS loan) directly.
• Singular - The data definition should be expressed in the singular.
EXAMPLE: “OPEID”
Poor Definition: The identification number assigned by OPE for data exchange partners.
Good Definition: The unique identifier assigned by the Office of Postsecondary Education (OPE)
for each data exchange partner.
• Positive - The definition should be expressed as what it is, limit any emphasis on what it is not.
EXAMPLE: “Loan Net Amount”
Poor Definition: The amount that doesn’t include any fees.
Good Definition: The total amount disbursed to the borrower after any fees or charges have been
deducted.
• Specific Concept - The definition should include the essential meaning or primary characteristics
of the concept.
EXAMPLE: “Request Tracking ID”
Poor Definition: The unique ID assigned to a document match and tracking request.
Good Definition: The unique ID associated with a request action that is returned to the requestor
for document matching and tracking. Defined with Commonly Understood Abbreviations -
The definition should only use abbreviations when necessary and the abbreviation must be
commonly understood.
EXAMPLE: “EFC Available Income”
11
International Organization for Standardization. ISO/IEC 11179-4, Second Edition 2004. 4/18/2007