Data mapping
In computing and data management, data mapping is the process of creating data element mappings
between two distinct data models. Data mapping is used as a first step for a wide variety of data integration
tasks, including:[1]
Data transformation or data mediation between a data source and a destination
Identification of data relationships as part of data lineage analysis
Discovery of hidden sensitive data, such as the last four digits of a social security number hidden in another user ID, as part of a data masking or de-identification project
Consolidation of multiple databases into a single database and identification of redundant columns of data for consolidation or elimination
For example, a company that wants to exchange purchase orders and invoices with other companies
might use data mapping to create data maps from its own data to standardized ANSI ASC X12
messages for those documents.
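As a minimal sketch of what such a data map can look like when written by hand, the Python below pairs each destination field with a source field and a transformation. All field names (`po_no`, `purchase_order_number`, and so on) and the sample record are hypothetical; they are not actual X12 segment or element identifiers.

```python
from datetime import date

# Hypothetical data map: destination field -> (source field, transformation).
DATA_MAP = {
    "purchase_order_number": ("po_no", str.strip),
    "order_date": ("ordered", lambda d: d.strftime("%Y%m%d")),
    "total_amount": ("total_cents", lambda c: f"{c / 100:.2f}"),
}


def apply_data_map(source_record, data_map):
    """Build a destination record by applying each mapping rule in turn."""
    return {
        dest: transform(source_record[src])
        for dest, (src, transform) in data_map.items()
    }


# Illustrative source record in the company's own (assumed) format.
source = {"po_no": " PO-1001 ", "ordered": date(2018, 5, 29), "total_cents": 125050}
destination = apply_data_map(source, DATA_MAP)
```

Keeping the map as data rather than as inline code means the same `apply_data_map` function can serve several source and destination pairs.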
Standards
X12 standards are generic Electronic Data Interchange (EDI) standards designed to allow a company to
exchange data with any other company, regardless of industry. The standards are maintained by the
Accredited Standards Committee X12 (ASC X12), which is accredited by the American National Standards
Institute (ANSI) to set standards for EDI. The X12 standards are often called ANSI ASC X12 standards.
The W3C introduced R2RML (https://www.w3.org/TR/r2rml/) as a standard for mapping data in a
relational database to data expressed in terms of the Resource Description Framework (RDF).
In the future, tools based on semantic web languages such as the Resource Description Framework (RDF)
and the Web Ontology Language (OWL), together with standardized metadata registries, are expected to
make data mapping a more automatic process. This would be accelerated if each application performed
metadata publishing. Fully automated data mapping is a very difficult problem (see semantic translation).
Hand-coded, graphical manual
Data mappings can be created in several ways: by writing procedural code, by creating XSLT transforms, or
by using graphical mapping tools that automatically generate executable transformation programs.
Graphical tools allow a user to "draw" lines from fields in one set of data to fields in another. Some
graphical data mapping tools allow users to "auto-connect" a source and a destination; this feature
depends on the source and destination data element names being the same. The transformation programs
are automatically generated in SQL, XSLT, Java, or C++. Graphical tools of this kind are found in most
ETL (extract, transform, and load) tools as the primary means of entering data maps to support data
movement. Examples include SAP BODS and Informatica PowerCenter.
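The "auto-connect" behaviour described above can be sketched in a few lines of Python. This is an illustration of name-based matching only, not the implementation used by any particular tool; the function name `auto_connect` and the field names are assumptions.

```python
def auto_connect(source_fields, destination_fields):
    """Pair fields whose names are identical; report the rest as unmapped."""
    dest_set = set(destination_fields)
    connected = {name: name for name in source_fields if name in dest_set}
    unmapped_source = [n for n in source_fields if n not in dest_set]
    unmapped_dest = [n for n in destination_fields if n not in connected]
    return connected, unmapped_source, unmapped_dest


# Hypothetical field lists for a source and a destination.
connected, left_over_source, left_over_dest = auto_connect(
    ["CustomerID", "FirstName", "LastName"],
    ["CustomerID", "PersonGivenName", "LastName"],
)
```

Here `FirstName` and `PersonGivenName` stay unconnected because their names differ, which is the limitation the user would resolve by drawing the line manually.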
Data-driven mapping
This is a more recent approach to data mapping. It evaluates actual data values in two data sources
simultaneously, using heuristics and statistics to automatically discover complex mappings between the
two data sets. It can find transformation logic such as substrings, concatenations, arithmetic, and case
statements, and it also identifies data exceptions that do not follow the discovered transformation
logic.
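A heavily simplified sketch of the idea, in Python: try a small set of candidate transformations against sample rows, keep the one that reproduces the most target values, and list the rows that do not fit as exceptions. The candidate set, the `min_support` threshold, and the sample data are all assumptions made for illustration; real tools search a much larger space of transformations.

```python
def discover_mapping(source_rows, target_values, candidates, min_support=0.8):
    """Return (name, support, exception_row_indexes) for the best candidate, or None."""
    if not source_rows:
        return None
    best = None
    for name, func in candidates.items():
        # Rows where the candidate fails to reproduce the target value.
        exceptions = [
            i for i, (row, target) in enumerate(zip(source_rows, target_values))
            if func(row) != target
        ]
        support = (len(source_rows) - len(exceptions)) / len(source_rows)
        if best is None or support > best[1]:
            best = (name, support, exceptions)
    return best if best[1] >= min_support else None


# Hypothetical candidate transformations over a source row.
CANDIDATES = {
    "copy first": lambda r: r["first"],
    "upper(last)": lambda r: r["last"].upper(),
    "substring(last, 0, 3)": lambda r: r["last"][:3],
    "first + ' ' + last": lambda r: f"{r['first']} {r['last']}",
}

source_rows = [
    {"first": "Ada", "last": "Lovelace"},
    {"first": "Alan", "last": "Turing"},
    {"first": "Grace", "last": "Hopper"},
    {"first": "Edsger", "last": "Dijkstra"},
    {"first": "Donald", "last": "Knuth"},
]
# Target column from the second data set; row 3 does not follow the pattern.
target_full_name = ["Ada Lovelace", "Alan Turing", "Grace Hopper", "E. Dijkstra", "Donald Knuth"]

result = discover_mapping(source_rows, target_full_name, CANDIDATES)
```

In this sample the concatenation candidate matches four of the five rows, so it is reported as the mapping and the remaining row is flagged as an exception.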
Semantic mapping
Semantic mapping is similar to the auto-connect feature of data mappers with the exception that a metadata
registry can be consulted to look up data element synonyms. For example, if the source system lists
FirstName but the destination lists PersonGivenName, the mappings will still be made if these data
elements are listed as synonyms in the metadata registry. Semantic mapping is only able to discover exact
matches between columns of data and will not discover any transformation logic or exceptions between
columns.
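The synonym lookup can be sketched as follows in Python. The registry contents and the helper names (`REGISTRY`, `canonical_key`, `semantic_map`) are hypothetical; a real metadata registry, for example one following ISO/IEC 11179, would be an external service or database rather than an in-memory list.

```python
# Hypothetical metadata registry: each set groups element names registered as synonyms.
REGISTRY = [
    {"FirstName", "PersonGivenName", "GivenName"},
    {"LastName", "PersonSurName", "FamilyName"},
]


def canonical_key(name, registry):
    """Return a key shared by all registered synonyms of `name`."""
    for index, group in enumerate(registry):
        if name in group:
            return ("synonym_group", index)
    # Unregistered names only match an identical name (plain auto-connect).
    return ("name", name)


def semantic_map(source_fields, destination_fields, registry):
    """Map each source field to a destination field with the same canonical key."""
    dest_by_key = {canonical_key(d, registry): d for d in destination_fields}
    mapping = {}
    for s in source_fields:
        key = canonical_key(s, registry)
        if key in dest_by_key:
            mapping[s] = dest_by_key[key]
    return mapping


mapping = semantic_map(
    ["FirstName", "LastName", "BirthDate", "Phone"],
    ["PersonGivenName", "PersonSurName", "BirthDate", "Email"],
    REGISTRY,
)
```

As the text notes, this only finds one-to-one matches between elements; it does not discover any transformation logic between them.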
Data lineage tracks the life cycle of each piece of data as it is ingested, processed, and output by an
analytics system. This provides visibility into the analytics pipeline and simplifies tracing errors back to
their sources. It also enables replaying specific portions or inputs of the data flow for step-wise debugging
or for regenerating lost output. Database systems already use such information, called data provenance, to
address similar validation and debugging challenges.[2]
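A toy illustration of lineage capture and replay in Python: each value carries the ordered list of steps that produced it, together with the input each step received, so that a single step can be re-run in isolation. This is a sketch under simplifying assumptions and does not reflect the architecture of any specific lineage system; the names `TrackedValue`, `ingest`, `apply_step`, and `replay_step`, and the file name `orders.csv`, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedValue:
    value: object
    # Ordered (step name, input to that step) pairs describing how the value arose.
    lineage: list = field(default_factory=list)


def ingest(value, source):
    """Wrap a raw value and record where it came from."""
    return TrackedValue(value, [("ingest:" + source, None)])


def apply_step(tracked, step_name, func):
    """Apply func to the value and append the step and its input to the lineage."""
    return TrackedValue(
        func(tracked.value),
        tracked.lineage + [(step_name, tracked.value)],
    )


def replay_step(tracked, step_name, func):
    """Re-run one recorded step on the input it originally received."""
    for name, step_input in tracked.lineage:
        if name == step_name:
            return func(step_input)
    raise KeyError(step_name)


raw = ingest(" 42 ", "orders.csv")
stripped = apply_step(raw, "strip", str.strip)
number = apply_step(stripped, "to_int", int)
steps = [name for name, _ in number.lineage]
```

Tracing an erroneous output back to its source then amounts to reading `number.lineage` from the last entry to the first.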
See also
Data integration
Data wrangling
Identity transform
ISO/IEC 11179 - The ISO/IEC Metadata registry standard
Metadata
Metadata publishing
Schema matching
Semantic heterogeneity
Semantic mapper
Semantic translation
Semantic web
Semantics
XSLT - XML Transformation Language
References
1. Shahbaz, Q. (2015). Data Mapping for Data Warehouse Design (https://books.google.com/books?id=pRChCgAAQBAJ). Elsevier. p. 180. ISBN 9780128053355. Retrieved 29 May 2018.
2. De, Soumyarupa (2012). Newt: an architecture for lineage based replay and debugging in DISC systems. UC San Diego: b7355202. Retrieved from https://escholarship.org/uc/item/3170p7zn
Retrieved from "https://en.wikipedia.org/w/index.php?title=Data_mapping&oldid=1104909027"