Aristotle Data Model

This describes my understanding of the underlying Aristotle Metadata Registry data model. It was discovered through profiling JSON extract files. It has not been validated by Aristotle.

Uploaded by

Matthew Lawler

Matthew Lawler lawlermj1@gmail.com Aristotle Data Model

Aristotle Data Model

Matthew Lawler

Lawlermj1@gmail.com

/conversion/tmp/activity_task_scratch/576228854.docx 1 of 20

Introduction
  Licence
  Warranty
  Purpose
  Audience
  Approach
  By
  Acronyms
  References
Metadata Repository (MDR)
  How does an MDR provide economic benefits?
  Audit: Systems/database support or practicing what is preached
  Audit: Open Government
Aristotle Metadata Repository (MDR)
  Grammatical Quality
  Additional Invariants and checks
  Building up the Aristotle MDR
  Other ideas
  Aristotle Metadata Views
  Aristotle Any Item Metadata
  API
  MDR Definitions


Introduction
Licence
This document is released under the Creative Commons Zero licence or CC0.

Warranty
The author does not make any warranty, express or implied, that any statements in this document
are free of error, or are consistent with a particular standard of merchantability, or they will meet
the requirements for any particular application or environment. They should not be relied on for
solving a problem whose incorrect solution could result in injury or loss of property. If you do use
this material in such a manner, it is at your own risk. The author disclaims all liability for direct or
consequential damage resulting from its use.

Purpose
This document describes the data model underlying the Aristotle Metadata Registry (MDR).

Audience
The primary audience for this document is metadata designers, especially those who need to
integrate Aristotle with other metadata tools. The reader needs to understand
metadata concepts.

Approach
This document presents an analysis of an instance of the Aristotle MDR. The analysis was based on
profiling JSON files extracted from the cloud instance. This is not a comprehensive document. Each
Aristotle instance is different, so the population and usage of the metadata will also differ.

By
This was written by Matthew Lawler.

Acronyms
This is a list of acronyms used in the document.

Acronym | Expansion | AKA | By
AKA | Also Known As | | English
ANSI | American National Standards Institute | | ANSI
DB | Database | | ANSI
FK | Foreign Key | | ANSI
GUID | Globally Unique Identifier | UUID | Microsoft
ISO | International Organization for Standardization | | ISO
JSON | JavaScript Object Notation | | ISO
MDR | Metadata Registry | | ISO
OWL | Web Ontology Language | | W3C
PK | Primary Key | | ANSI
RDF | Resource Description Framework | | W3C
RO | Read Only | | OMG
RW | Read Write | | OMG
UUID | Universally Unique Identifier | GUID | ISO
W3C | World Wide Web Consortium | | W3C


References
By | For | Path | Full
Aristotle | API | https://aristotle.cloud/api/v4/ |
Aristotle | Cloud | https://www.aristotlemetadata.com/ |
Aristotle | Source | https://github.com/aristotle-mdr/aristotle-metadata-registry |
Aristotle | wiki | https://en.wikipedia.org/wiki/Aristotle_Metadata_Registry |
IASSIST | Conference | https://iassistdata.org/ |
ISO | 11179 | https://www.iso.org/obp/ui/#iso:std:iso-iec:11179:-1:ed-3:v1:en |
W3 | Standard | https://www.w3.org/TR/owl-ref/#sameAs-def | OWL Matching standard
Wiki | Standard | https://www.wikiwand.com/en/Data_element_definition | Wikiwand
API
Haskell was used to access the API. All code is here: https://github.com/lawlermj1/Aristotle-JSON


Metadata Repository (MDR)


The following are some thoughts on Metadata Repositories in general, and on the Aristotle MDR in particular.
These are not meant to be comprehensive, but provide a starting point for further investigation.

How does an MDR provide economic benefits?


In the Economics of Information, the main benefit of an informational good is to reduce uncertainty.
If the informational good here, the MDR, cannot reduce uncertainty because it is incomplete,
unreliable, etc., then the beneficiary, such as a project manager, will make the economically
rational choice of ignoring it and using some alternative basis to reduce uncertainty, such as
mandating choices. It is easy to create a mess in an MDR, especially when there are few or no
automated QA checks. Many of the checks described in this document can be automated, which would
increase data quality and enhance the economic value of the MDR.

A Metadata Repository can help with a number of business goals, including:

1. Definition source

2. Initial Requirements

3. Requirements Traceability from Definitions to data sources

4. Auditing for systems/database support

5. Auditing for openness

The first three are self-evident use cases and are not examined here. The two Auditing use cases
are looked at below.


Audit: Systems/database support or practicing what is preached


This involves collecting two corpora. One is based on all public documents, such as any Acts. The
other is based on available systems and database metadata. See diagram.

1. Documents Corpus created from enabling Acts and Public Documents

2. Databases Corpus created from Databases and systems

3. Supported = Overlap between Documents and Databases Corpus

4. Unsupported = Words that exist in Documents, but not in Databases

5. Additional = Words that exist in Databases, but not in Documents

If the Supported % is high, then there is a good fit between documents and databases. That is, the
database is justified because it supports the relevant Act.

If Unsupported % is high, then there is a database or systems capability gap, which could be used to
justify additional projects.

If Additional % is high, then these words are either hidden or represent too much systems capability.


If hidden, the words can be made public. If they represent excess capability, the unneeded capability
can be reduced. This analysis could also lead to the discovery of legislative or regulatory gaps. The
information could then be added to the legislation, or the capability turned off on efficiency grounds.

Obviously, this is not a detailed capability assessment. It is just a check to see if the language used in
the systems is consistent with the primary documents. Further, more detailed analysis is required.
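
The comparison above can be sketched with set operations. This is only an illustration, not the author's implementation: the function name `audit_corpora`, the sample words, and the choice of denominators (Supported and Unsupported measured against the Documents corpus, Additional measured against the Databases corpus) are all assumptions, since the text does not say what each percentage is relative to.

```python
def audit_corpora(documents: set[str], databases: set[str]) -> dict[str, float]:
    """Compare two word corpora and return the three audit percentages."""
    supported = documents & databases      # words found in both corpora
    unsupported = documents - databases    # words only in Documents
    additional = databases - documents     # words only in Databases

    def pct(part: set[str], whole: set[str]) -> float:
        # Assumed denominator: the corpus the words were drawn from.
        return 100.0 * len(part) / len(whole) if whole else 0.0

    return {
        "supported_pct": pct(supported, documents),
        "unsupported_pct": pct(unsupported, documents),
        "additional_pct": pct(additional, databases),
    }


# Made-up sample corpora, for illustration only.
documents = {"customer", "licence", "payment", "penalty"}
databases = {"customer", "payment", "invoice"}
result = audit_corpora(documents, databases)
```

With these sample words, half of the Documents corpus is supported by the Databases corpus, and one of the three database words has no counterpart in the documents.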

Audit: Open Government


A common Public Service anti-pattern is to impose unreasonable levels of security. All words used in
publicly available documents would be tagged as UNCLASSIFIED. Any attempt to lift a PSPF (Protective
Security Policy Framework) tag to a stricter rating such as CLASSIFIED should be preventable. Actual
database names might have a higher PSPF rating, but these would be held in the Distribution object.
This would also enable the opening up of the API to the public, contributing to open government.


Aristotle Metadata Repository (MDR)


I have used the Aristotle API to extract and parse all available Aristotle JSON metadata objects into
data structures. I have then used these data structures to define draft, exploratory checks to
measure some aspects of the Aristotle metadata data quality. Altogether there are twenty-two
parsed JSON files.

Aristotle has three core objects: Object Class, Property and Data Element Concept. However, these
three have quite complicated definitions that are hard for anybody to understand. They are a case
study in accidental complexity, which leads to spaghetti metadata. A key insight is that these core
objects are really Nouns, Adjectives and Phrases. That is, it is as simple as basic grammar. In
effect, SA is building a Corpus of its words. To restate, an Object Class is a Noun or group of
Nouns, a Property is an Adjective, Adverb or Verb, and a Data Element Concept is a Phrase formed by
its parent Noun and Adjective. The word Adnominal means either Adjective, Adverb or Verb. That is,
an Adnominal is a modifier that converts a Noun into a Noun Phrase.

As with other parts of the ISO 11179 standard, the grammatical specification is incomplete.
However, even with a limited number of Parts of Speech (POS), it can still be useful.

Grammatical Quality
See POS Tagging.


In a typical corpus, the ratio of Nouns to Adnominals is about 3 to 1. In this example, the ratio is
about 1 to 9, as the Object Class count is 899 and the Property count is 7,674. So, what is awry?
The next step was to parse names into words, and attach a Part of Speech (Noun, Adjective, etc.) tag
to each word. This reveals that each Object Class on average uses 3 Nouns for each Adnominal, which
is tolerable. However, each Property on average uses 2 Nouns to 1 Adnominal. Simply stated, there
are too many Nouns in the Property set, and each Property's Nouns should be in the Object Class. In
addition, about 4% of Property words are misspelt, which makes them difficult to find by searching.
Finally, this is quite a small corpus. It is reasonable to assume that a Corpus would have at least
10,000 Nouns.
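
The two ratios discussed above can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the tag names (`NOUN`, `ADJ`, `ADV`, `VERB`), the helper `noun_adnominal_ratio` and the sample tagged words are hypothetical, and the part-of-speech tagging itself is assumed to have been done elsewhere. Only the counts 899 and 7,674 come from the text.

```python
from collections import Counter

# Tags treated as Adnominals (Adjective, Adverb or Verb). The tag names are assumed.
ADNOMINAL_TAGS = {"ADJ", "ADV", "VERB"}


def noun_adnominal_ratio(tagged_words):
    """Return Nouns per Adnominal for a list of (word, tag) pairs."""
    counts = Counter(
        "NOUN" if tag == "NOUN" else "ADNOMINAL"
        for _, tag in tagged_words
        if tag == "NOUN" or tag in ADNOMINAL_TAGS
    )
    if counts["ADNOMINAL"] == 0:
        return float("inf")  # no Adnominals at all
    return counts["NOUN"] / counts["ADNOMINAL"]


# Item-level counts quoted in the text.
object_class_count = 899
property_count = 7674
properties_per_object_class = property_count / object_class_count

# Hypothetical tagged words taken from a single Property name.
tagged = [("person", "NOUN"), ("birth", "NOUN"), ("estimated", "ADJ")]
ratio = noun_adnominal_ratio(tagged)
```

The item-level figure works out to roughly 8.5 Properties per Object Class, which is the "about 1 to 9" quoted above, and the sample Property name shows the 2 Nouns to 1 Adnominal pattern.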

Additional Invariants and checks


This is a preliminary list of additional checks.

% Separation of nouns between name spaces -> accidental complexity (unique/distinct vs common/shared nouns)

% Sharing of nouns across name spaces

% Nouns in OC vs non-nouns in OC

% Adjectives in P vs non-adjectives

% Of missing implied words, especially base words from phrases not yet included.


% Of spurious or invalid duplicates

% DEC constructed correctly

Check to determine if the ratio between corpora is credible.

Correct misallocation of current words, etc.

Type Consistency test?
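
As an illustration of how one of these checks might be automated, the sketch below computes the percentage of nouns shared across name spaces. The function name, the input shape (a mapping from name space to its set of nouns) and the sample data are assumptions made for the example. The separation percentage is simply 100 minus the shared percentage.

```python
def shared_noun_pct(nouns_by_namespace: dict[str, set[str]]) -> float:
    """Percentage of distinct nouns that appear in more than one name space."""
    seen_in: dict[str, set[str]] = {}
    for namespace, nouns in nouns_by_namespace.items():
        for noun in nouns:
            seen_in.setdefault(noun, set()).add(namespace)
    if not seen_in:
        return 0.0
    shared = [noun for noun, spaces in seen_in.items() if len(spaces) > 1]
    return 100.0 * len(shared) / len(seen_in)


# Made-up name spaces and nouns, for illustration only.
example = {
    "health": {"patient", "episode", "provider"},
    "education": {"student", "provider"},
}
shared_pct = shared_noun_pct(example)
separated_pct = 100.0 - shared_pct
```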

Building up the Aristotle MDR


These could be functions used to quickly populate an MDR.

Currently, this is done manually, without automated checks, which has produced very mixed quality.

1. Collect all words from all published documents on the web site or the governing Act. These
should represent a full set of Nouns, Adjectives and Phrases of interest to the organisation.

2. Insert or POST these into the Object Class, Property and Data Element Concept corpus.

The Distribution object can be populated with the file, web or document sources.

3. Collect definitions of words from the governing Act and from online dictionaries, such as the OED
and the Macquarie Dictionary. The purpose of an Australian Government agency is defined in the relevant
enabling Parliamentary Acts. Included in these Acts are definitions that apply to the Act, and to
anybody administering it. So, by definition, these definitions are superior to all others.

4. Insert or Post these into Object Class, Property and Data Element Concept corpus.

5. Add a spell check and grammar check to these and allow misspellings with a link back to the
correct form.

Audit: Compare database metadata with the document metadata

6. Collect metadata from all available systems and populate into a separate corpus, along with
the distribution.

7. Match the two corpora to determine overlap. The overlap can be used to determine how completely
the systems metadata supports the organisation corpus.
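
Steps 1 and 5 could be sketched as follows, using only the Python standard library. This is a hedged illustration rather than the author's method: the function names, the similarity cutoff of 0.8 and the sample dictionary are assumptions, and `difflib.get_close_matches` is used as a simple stand-in for a real spell checker.

```python
import difflib
import re


def tokenize(text):
    """Split text into lower-case words (ASCII letters only, a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def link_misspellings(words, dictionary, cutoff=0.8):
    """Map each word not in the dictionary to its closest entry, or None."""
    links = {}
    for word in words:
        if word in dictionary:
            continue  # already a correct form
        matches = difflib.get_close_matches(word, dictionary, n=1, cutoff=cutoff)
        links[word] = matches[0] if matches else None
    return links


# Made-up dictionary and text, for illustration only.
dictionary = ["licence", "payment", "customer"]
words = tokenize("Customer paymnet for a Licence")
links = link_misspellings(words, dictionary)
```

Here the misspelt word is kept and linked back to its correct form, while words with no close match are linked to `None` so that they can be reviewed manually.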

Other ideas
A. Acronyms can be up to 20% of the words used in a namespace. These can be treated as a
Phrase type and placed into the Data Element Concept.

B. Phrases can be made up of other Phrases. So, there is a need to provide a recursive link on this
object, as an additional relation. This will allow Nouns and Adnominals to remain primitive words,
and not compounds. These relations should conform to standard English Grammar. Where they do
not conform, this could be a means to identify errors.

C. Additional relations will be needed between Value Domains and Distributions directly to
Object Class and Property as a way of providing traceability. The HTML layer does not support this,
but it could easily be implemented in the graph DB layer.


D. Historical, Superseded or archaic words: these are words used previously that have since been
replaced. There needs to be some traceability of these terms.

E. Namespaces: Words that are unique to a particular area. These are often not understood outside
a particular area of expertise. These need to be captured, and clearly placed into a domain. See
diagram.

F. In the implementation, there are too many workgroups and not enough visibility onto objects like
relations.

G. Aristotle does not clearly support the standard security model of UNCLASSIFIED, OFFICIAL,
UNOFFICIAL and RESTRICTED. Rules should be defined that apply to all Aristotle items.

H. Item State (candidate, recorded, etc) is not available to an API user, with standard permissions.

Z. It might be possible to use a Description Logic on top of the MDR.
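
The recursive link proposed in idea B can be sketched as a small data structure. The class names `Word` and `Phrase`, their fields and the sample phrases are hypothetical and are not taken from the Aristotle JSON. The point is only that a Phrase may contain other Phrases while Nouns and Adnominals stay primitive.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Word:
    text: str
    pos: str  # part of speech, e.g. "NOUN" or "ADJ" (assumed tag names)


@dataclass(frozen=True)
class Phrase:
    name: str
    # Each part is a primitive Word or another Phrase (the recursive link).
    parts: Tuple[Union[Word, "Phrase"], ...]


def primitive_words(item):
    """Flatten a Word or Phrase into its primitive Words, depth first."""
    if isinstance(item, Word):
        return [item]
    words = []
    for part in item.parts:
        words.extend(primitive_words(part))
    return words


# Made-up phrases, for illustration only.
birth_date = Phrase("birth date", (Word("birth", "NOUN"), Word("date", "NOUN")))
estimated = Phrase("estimated birth date", (Word("estimated", "ADJ"), birth_date))
flat = [word.text for word in primitive_words(estimated)]
```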


Aristotle Metadata Views


This shows the profiled or discovered objects used in the example.

These are the primary populated JSON objects.

Aristotle Any Item Metadata


This shows the subtypes of the Any Item object, which is the core object in Aristotle's data store.


MDR Definitions
As a general comment, this document describes the implied data structure of a tool which manages
metadata. This leads immediately into a language confusion trap, and the classic ‘name
collision’ problem. The MDR implements ISO 11179, which is incomplete. The gaps are filled
with OOP terminology, so some confusion is inevitable. The major name collisions are Attribute,
Class, Object, Property and the woefully named Object Class.

Profiling clarified the definitions, as examples are always useful in understanding abstract ideas
such as metadata. Aristotle has not provided definitions for all JSON formats, so I derived these
from profiling. These definitions have a Type of JSON. The Count column indicates the number of JSON
records extracted in late 2021. The Unused column indicates Aristotle concepts that are not
used in the sample instance. Therefore, they will not be discussed further.

No. | Term | By | Type | Definition | Count | Unused
1 | Any Item | Aristotle | JSON | This is the primary boilerplate or super type, which provides common JSON fields for ten formats. This covers the JSON formats for Data Element Concept, Data Element, Data Set Specification, Data Type, Distribution, Property, Relation, Value Domain, Object Class and Object Class Specialisation. All these JSON formats share critical JSON fields including the UUID. Each Any Item has 2 PKs (Primary keys) - a concept Id and a UUID. This is useful when mapping between metadata repositories. In graph theory, this represents the nodes of the underlying MDR graph DB. | 52,036 | 0
2 | Attribute | 11179 | Standard | This is a characteristic of an object or set of objects. | | 0
3 | Attribute | DAMA | Definition | Any detail that serves to qualify, identify, classify, or express state of an entity. | | 0
4 | Class | OOP | Definition | A class is an extensible program-code-template for creating objects. A class is a blueprint for creating objects (a particular data structure), providing initial values for state (member variables or attributes), and implementations of behaviour (member functions or methods). | | 0
5 | Classification | Aristotle | Definition | A list of mutually exclusive categories representing values of the classification variable. | 0 | 1
6 | Classification Scheme | Aristotle | Definition | A Classification Scheme describes a set of ideas and standard values used to record codes when storing data. | 0 | 1
7 | Concept Delta | Aristotle | JSON | This shows the UUID, and date changed. This is useful when doing a change data capture approach for the metadata. | 52,036 | 0
8 | Conceptual Domain | Aristotle | Definition | A Conceptual Domain describes a set of ideas that can be recorded using codes when storing data. When linked to multiple Value Domains, a Conceptual Domain can be used to find similarities in different code sets. | 0 | 1
9 | Correspondence Table | Aristotle | Definition | A correspondence table is a collection of mappings that express the relationship between items in different classifications. | 0 | 1
10 | Custom Value | Aristotle | JSON | This JSON contains custom defined fields for each Any Item subtype. Examples could be Long Name, Tag, etc. | 69,157 | 0
11 | Data Catalog | Aristotle | Definition | A Data Catalog records a curated collection of data sets. | 2 | 1
12 | Data Element | Aristotle | Definition | A Data Element is a precise way of defining how a piece of data is recorded for a specific set of objects, using reusable metadata components. Data Elements are composed of a Data Element Concept, which describes the meaning of the data, and a Value Domain which describes how the data is recorded. | 21,654 | 0
13 | Data Element Concept | Aristotle | Definition | A Data Element Concept defines an idea that could be recorded by data, without specifying how it would be stored or measured. Data Element Concepts are composed of an Object Class, which describes the thing of interest, and a Property that defines which attribute of the thing would be recorded. Data Element Concepts can be referenced by multiple different Data Elements that each specify the Value Domain used to record the data. | 8,117 | 0
14 | Data Element Derivation | Aristotle | Definition | A Data Element Derivation describes a standardised rule or equation that transforms a set of input Data Elements to produce a set of output Data Elements. application of a derivation rule to one or more input | 0 | 1
15 | Data Element Path | Aristotle | JSON | This JSON provides the column name and column definition for a Distribution. It is well populated, but not complete. | 43,335 | 0
16 | Data Set | Aristotle | Definition | A Data Set describes a record of data, including any location or time boundaries for the data, which has been captured and is available for use under a specific licence. A Data Set may be included in a Data Catalog and can reference multiple Distributions. | 2 | 1
17 | Data Set Specification | Aristotle | Definition | A Data Set Specification describes an agreement to collect an ideal standard of data. A Data Set Specification may reference other Data Set Specifications or Data Elements to describe the data that should be collected under the agreement. | 250 | 0
18 | Data Type | Aristotle | Definition | A Data Type describes a way of storing a specific form of data within a system. | 13 | 0
19 | Distribution | Aristotle | Definition | A Distribution describes the structure and format of a specific downloadable collection of data. Multiple Distributions that capture various parts of data or provide different formats for data may be grouped into a single Data Set. | 7,724 | 0
20 | DSS Cluster Inclusion | Aristotle | JSON | This JSON provides more precise restrictions on a Data Set Specification. There are only a few examples. | 11 | 0
21 | DSS DE Inclusion | Aristotle | JSON | This JSON provides more details on Data Elements in a Data Set Specification. It is well populated. | 3,114 | 0
22 | Element | 11179 | Standard | An element or data element is a basic container for data. | | 0
23 | Entity | DAMA | Definition | An entity may be defined as a thing capable of an independent existence of interest to the business that can be uniquely identified. | | 0
24 | Enum | Computer | Definition | An enumeration of a sum type. This can also be a code, such as Public or Private in Scope. | | 0
25 | Framework | Aristotle | Definition | A Framework describes an organised collection of targets and strategic outcomes to assess a broad policy area. A Framework can collect multiple Indicator Sets and Outcome Areas to provide a complete understanding of the assessment of progress to a group of related goals. | 0 | 1
26 | Glossary Item | Aristotle | Definition | A Glossary Item records a business term that is commonly used within the metadata registry. A collection of Glossary Items is called a business glossary. | 2 | 1
27 | Graph DB | Computer | Definition | A graph database is a database that uses graph theory node and edge data type tables to represent data. Graph data type tables are sufficiently abstract to represent any architectural diagramming method, such as data models, Zachman diagrams, etc. | | 0
28 | Identifier | Aristotle | JSON | This JSON contains the concept id and the UUID for each Any Item subtype. Both keys are used throughout Aristotle. This is important when mapping metadata tools. | 46,121 | 0
29 | Indicator | Aristotle | Definition | An indicator describes a measure that is regularly reported for tracking performance of a process or policy and provides relevant and actionable information about system performance. | 0 | 1
30 | Indicator Set | Aristotle | Definition | An Indicator Set describes a collection of targets and objectives. An Indicator Set collects multiple Indicators with common targets that are reported on together. | 0 | 1
31 | ISO 11179 | ISO | Standard | A standard for representing metadata. | | 0
32 | Issue | Aristotle | JSON | This shows details about quality issues raised in the MDR. It is low volume. | 23 | 0
33 | Link | Aristotle | JSON | This provides more information regarding the relations. | 9,614 | 0
34 | Metadata | DAMA | Definition | Metadata is "data that provides information about other data". In other words, it is "data about data". | | 0
35 | Narrower Class | Aristotle | JSON | This JSON provides more precise restrictions on an Object Class Specialisation. This helps make the types more specific. There are only a few examples. | 642 | 0
36 | Object | OOP | Definition | An object is an instance of a class that contains properties and methods. | | 0
37 | Object Class | Aristotle | Definition | An Object Class defines a way of identifying or classifying a set of real objects, ideas, or events that all share common measurable attributes. | 880 | 0
38 | Object Class Specialisation | Aristotle | Definition | An Object Class Specialisation describes a relationship between Object Classes, where multiple specialised Object Classes are all contained by a common broader Object Class. | 17 | 0
39 | Org Record | Aristotle | JSON | This JSON defines Org Record data for an Any Item subtype. It is not used much. | 497 | 0
40 | Outcome Area | Aristotle | Definition | An Outcome Area describes a strategic target or standard for a process or policy that may not be able to be measured directly or efficiently. | 0 | 1
41 | Permissible Value | Aristotle | JSON | This JSON contains the enum or code values for a Value Domain. It is well populated. | 64,046 | 0
42 | Property | 11179 | Standard | A characteristic common to all members in an object class. | | 0
43 | Property | Aristotle | Definition | A Property is an attribute common to all members of a set of things defined by an Object Class. | 7,336 | 0
44 | Property | English | Definition | Something that belongs to something. An adnominal, such as an adjective or adverb. | | 0
45 | Quality Statement | Aristotle | Definition | A Data Quality Statement records any known issues that may be related to a data asset. A Data Quality Statement assesses data against seven key factors: Institutional Environment, Relevance, Timeliness, Accuracy, Coherence, Interpretability & Accessibility. | 0 | 1
46 | Relation | Aristotle | Definition | A Relation defines a relationship that can be used to link different metadata items within the registry. Each Relation can specify a number of roles that metadata can fill within the relationship. In graph theory, this represents the edges of the underlying MDR graph DB. Due to permission issues, only a few were visible. | 4 | 0
47 | Relation Role | Aristotle | JSON | This JSON provides some definitions of the relations such as name and multiplicity. There are only a few examples, due to permissions restrictions. | 7 | 0
48 | Representation Class | Aristotle | JSON | A supplemental logical data type. Not populated as an object but called a 'managed item'. | 18 | 1
49 | Slot | Aristotle | JSON | This JSON provides additional information for Value Domains and Data Elements. It is not used much. | 182 | 0
50 | Stewardship Org | Aristotle | JSON | This defines the Organisation that controls the metadata and is part of authorisation. This appears on all JSON items. | 2 | 0
51 | Supplementary Value | Aristotle | JSON | This JSON contains some additional values for a Value Domain. There are very few values. | 34 | 0
52 | Units Of Measure | Aristotle | Definition | A Unit of Measure describes units (e.g., metres, litres, seconds) which can be used to record a measurement. | 0 | 1
53 | Value Domain | Aristotle | Definition | A Value Domain describes how to record the measurement of a particular type of data, either using a coded list of values or a description of the possible values. Value Domains can be linked to Data Elements that all share a common way of recording data, and its values can be linked to a Conceptual Domain to provide additional context. | 3,893 | 0
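
The composition described in the definitions above (a Data Element is built from a Data Element Concept and a Value Domain, and a Data Element Concept from an Object Class and a Property, with every Any Item carrying both a concept id and a UUID) can be sketched as Python dataclasses. This is an illustrative model only: the field names, the helper `build_indexes` and all sample ids and UUIDs are made up, and they do not reproduce the actual Aristotle JSON field names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnyItem:
    concept_id: int  # first key (hypothetical field name)
    uuid: str        # second key (hypothetical field name)
    name: str


@dataclass(frozen=True)
class ObjectClass(AnyItem):
    pass


@dataclass(frozen=True)
class Property(AnyItem):
    pass


@dataclass(frozen=True)
class ValueDomain(AnyItem):
    pass


@dataclass(frozen=True)
class DataElementConcept(AnyItem):
    object_class: ObjectClass  # the thing of interest
    prop: Property             # the attribute of that thing


@dataclass(frozen=True)
class DataElement(AnyItem):
    concept: DataElementConcept  # the meaning of the data
    value_domain: ValueDomain    # how the data is recorded


def build_indexes(items):
    """Index items by each of the two keys so either can be used for lookup."""
    by_id = {item.concept_id: item for item in items}
    by_uuid = {item.uuid: item for item in items}
    return by_id, by_uuid


# Made-up items; the ids and UUIDs are placeholders.
person = ObjectClass(1, "00000000-0000-0000-0000-000000000001", "Person")
birth = Property(2, "00000000-0000-0000-0000-000000000002", "Date of birth")
dec = DataElementConcept(3, "00000000-0000-0000-0000-000000000003",
                         "Person Date of birth", person, birth)
date_vd = ValueDomain(4, "00000000-0000-0000-0000-000000000004", "Date DDMMYYYY")
de = DataElement(5, "00000000-0000-0000-0000-000000000005",
                 "Person Date of birth, DDMMYYYY", dec, date_vd)
by_id, by_uuid = build_indexes([person, birth, dec, date_vd, de])
```

Keeping both indexes mirrors the point made for Any Item and Identifier: either key can be used when mapping items between metadata tools.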
