Grab some coffee and enjoy the pre-show banter before the top of the hour!
Introducing: A Complete Algebra of Data
It All Adds Up
Ø  Math is the cornerstone of engineering
Ø  It helped build the pyramids
Ø  It led to skyscrapers
Ø  It put a man on the moon
Ø  Maybe, it will finally solve data management!
Guests
Robin Bloor
Chief Analyst, The Bloor Group
@robinbloor robin.bloor@bloorgroup.com
Eric Kavanagh
CEO, The Bloor Group
@eric_kavanagh eric.kavanagh@bloorgroup.com
The Algebra of Data
Robin Bloor
Chief Analyst, The Bloor Group
Mathematics
There has never previously been an algebra of data
The “Relational Algebra” Diversion
§  “Relational Algebra” was introduced as a mathematical approach to data management
§  It began with the best intentions
§  But it never provided a mathematical basis for data, and subsequently it went off the rails…
“Relational Algebra”
§  Not established axiomatically
§  Apparently based on set theory, but:
    §  No definition of the basic element
    §  No definition of the genesis set or ground set
    §  Abuse of the mathematical term “relation”
SQL and RDBMS
§  SQL killed the possibility of a data algebra
§  The NULL, something that exists and yet doesn’t, is mathematical nonsense
§  As a consequence, RDBMS were constrained to tabular data structures
§  And thus a wide variety of alternative “database” types gradually emerged
Data Algebra
There is NO NEW MATHEMATICS in data algebra. It is all garden-variety SET THEORY.
The Algebra of Data: Background
§  Algebraix Data Corporation financed research into data algebra
§  It was conceived and formally defined by Professor Gary Sherman, who was engaged for that specific purpose
§  The algebra has been tested on implementations of a “relational database” and an RDF database, and proven against almost all data structures
§  It has multiple potential areas of application, but has been demonstrated for:
    §  Data management
    §  Query optimization
    §  Data integration
Data Algebra – The Basics
§  Based on several interrelated algebras:
    §  Couplets (G)
    §  Relations (G)
    §  Clans (G)
§  They sit one within the other like Russian nesting dolls
The algebra of couplets - Couplets (G)
Example: a^b
Operators: composition (∘), transposition (↔).
The algebra of relations - Relations (G)
Example: R = {a^b, a^c, b^d, c^d, e^e}
Additional operators: union (∪), intersection (∩), complementation (′), superstriction (▷), substriction (◁).
The algebra of clans - Clans (G)
Example: C = {{a^b, a^a, b^d, c^d, e^e}, {a^b, b^d, b^c, c^d, e^d}}
Additional operators: cross-union (▼), cross-intersection (▲), cross-superstriction (▶), cross-substriction (◀).
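The nesting of couplets, relations and clans can be sketched in plain Python. This is only an illustration under stated assumptions, not the book's or the Algebraix library's implementation: a couplet is modelled as a tuple, a relation as a frozenset of couplets, and a clan as a frozenset of relations. The helper names (`rel`, `transpose`, `compose`, `superstrict`, `cross`) are hypothetical, the composition convention is the usual "chain two arrows" reading, and superstriction is read as "keep the left relation if it contains the right one".

```python
from itertools import product
from operator import or_


def rel(*couplets):
    """Build a relation from two-letter strings: "ab" becomes the couplet ('a', 'b')."""
    return frozenset(tuple(c) for c in couplets)


def transpose(couplet):
    """Swap the two components of a couplet."""
    a, b = couplet
    return (b, a)


def compose(second, first):
    """Chain first = (a, b) with second = (b, c) into (a, c); None if they do not meet."""
    a, b = first
    b2, c = second
    return (a, c) if b == b2 else None


def superstrict(r, s):
    """Assumed reading: r if r is a superset of s, otherwise undefined (None)."""
    return r if r >= s else None


def cross(op, clan_c, clan_d):
    """Lift a (possibly partial) relation operator to clans, over all pairs."""
    results = (op(r, s) for r, s in product(clan_c, clan_d))
    return frozenset(x for x in results if x is not None)


# The relation and clan from the slide.
R = rel("ab", "ac", "bd", "cd", "ee")
C = frozenset({rel("ab", "aa", "bd", "cd", "ee"),
               rel("ab", "bd", "bc", "cd", "ed")})

flipped = transpose(("a", "b"))                            # couplet level
chained = compose(("b", "d"), ("a", "b"))                  # couplet level
with_ee = cross(or_, C, frozenset({rel("ee")}))            # cross-union
having_aa = cross(superstrict, C, frozenset({rel("aa")}))  # cross-superstriction
```

The `cross` helper is the part that mirrors the "nesting dolls" idea: each clan operator is a relation operator applied to every pair of member relations.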
The JOIN
A ⋈ B := (A ▷ [A ∘ {{x^x}}]) U_{F_yang} (B ▷ [B ∘ {{x^x}}])
where x ∈ (yang(A) ∩ yang(B))
and F_yang := {R ∈ P(G × G) : R is yang-functional}
(Here ∘ is composition, ▷ is superstriction and ∩ is intersection.)
A Natural Join
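The effect of a natural join can be illustrated with a small Python sketch. It is not a transcription of the formula above: it takes the simpler route of unioning every pair of relations from the two clans and keeping only the unions that stay functional. The modelling is an assumption made for illustration: each table row is a relation of (attribute, value) couplets, "functional" is read as "no attribute paired with two different values", and the names `row`, `is_functional` and `natural_join` are hypothetical.

```python
from itertools import product


def row(**fields):
    """Model a table row as a relation: a frozenset of (attribute, value) couplets."""
    return frozenset(fields.items())


def is_functional(relation):
    """True when no attribute is paired with two different values."""
    attributes = [attribute for attribute, _ in relation]
    return len(attributes) == len(set(attributes))


def natural_join(clan_a, clan_b):
    """Union every pair of relations and keep only the functional results."""
    unions = (r | s for r, s in product(clan_a, clan_b))
    return frozenset(u for u in unions if is_functional(u))


employees = frozenset({row(name="Ann", dept=1), row(name="Bob", dept=2)})
departments = frozenset({row(dept=1, title="Sales"), row(dept=2, title="Ops")})

joined = natural_join(employees, departments)
```

Rows that disagree on the shared attribute `dept` produce a union with two `dept` couplets, so the functionality filter drops them; rows that agree merge into a single wider row.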
Data Algebra - Extensions
Because data algebra is based on (finite) set theory, it brings ALL of finite mathematics with it: numerical analysis, group theory, matrix algebra, vector algebra, topology, etc.
The Book
§  Written for: software developers, data professionals & mathematicians
§  Requires a basic knowledge of set theory and mathematics
§  Goal: To demonstrate that data algebra can be applied to all data structures
§  Takes a step-by-step approach to explain the algebra
§  Definitely not your grandfather’s math text
The Power of Mathematics
THE GROWTH OF MATHEMATICAL KNOWLEDGE: INCREASING VALUE (SCIENTIFIC OR COMMERCIAL)
TIMELINE | WHO AND WHAT | WHAT IT ENABLES
1494 | Luca Pacioli: Bookkeeping | Accounting: Double Entry Bookkeeping, Balance Sheet, Trial Balance, etc. Enabling banking, the mercantile economy and the capitalist economy
1614 | John Napier: Logarithms | Log tables and slide rules became calculation technology and the prime tool for a whole range of professions, including scientists and engineers of every kind
1654 | Blaise Pascal, Pierre de Fermat: Probability Theory | Aside from gaming theory, provided the basis for statistics and the whole of the insurance sector. Also has applications throughout science, e.g. quantum mechanics.
1684 | Isaac Newton, Gottfried Leibniz: Calculus | The use of calculus is ubiquitous in all fields of engineering, in architecture, physics, operations research, statistics, commerce and, notably, in space exploration
1830 | Évariste Galois: Group Theory | Used in modern particle physics and molecular physics, music, crystallography and materials science, cryptography and error-correction in computing.
1937 | Alan Turing, John von Neumann: Computer Architecture | The computer, and hence the whole of the computer industry, is based on the work done on computer architecture, including electronic calculators and code-breaking technology
1965 | James Cooley, John Tukey: Fast Fourier Transform | Applications include large integer multiplication, efficient matrix vector multiplication, filtering algorithms, solving difference equations, etc.
2015 | Gary Sherman: Data Algebra | (Proven application in data integration, data management and query optimization.)
Data Stores
§  File systems
§  XML files
§  Databases of all varieties
§  Document stores
Algebraically, these are NOT distinct
Likely Impact - 1
Data algebra provides a COMMON LANGUAGE for ALL data
It will ease data integration problems
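The claim that different stores are not algebraically distinct can be illustrated with a small Python sketch. It assumes flat records only and a hypothetical helper `to_clan` that turns any iterable of records into a clan (a frozenset of relations, each a frozenset of (attribute, value) couplets). The sample CSV and JSON text is made up for the illustration; nested documents would need a richer encoding than this.

```python
import csv
import io
import json


def to_clan(records):
    """Turn an iterable of flat mappings into a clan of (attribute, value) relations."""
    return frozenset(frozenset(record.items()) for record in records)


# The same two records, stored once as CSV and once as JSON (illustrative data).
csv_text = "name,dept\nAnn,1\nBob,2\n"
json_text = '[{"name": "Ann", "dept": "1"}, {"dept": "2", "name": "Bob"}]'

csv_clan = to_clan(csv.DictReader(io.StringIO(csv_text)))
json_clan = to_clan(json.loads(json_text))

same_data = csv_clan == json_clan
```

Because both sources reduce to sets of sets of couplets, row order and key order disappear and the two clans compare equal, which is the sense in which a common representation could ease integration.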
Data Retrieval
§  File navigation
§  SQL, JSONiq, XQuery, SPARQL
§  Search
Algebraically, these are the SAME problem
Likely Impact - 2
It will HARMONIZE data retrieval
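A rough Python sketch of why these can be seen as one problem: a SQL-style selection and a file lookup both reduce to "keep the relations that contain a given pattern, then keep only some attributes". The functions `row`, `superstrict` and `project` are hypothetical helpers written for this illustration, with superstriction read as a superset filter over a clan; the sample data is invented.

```python
def row(**fields):
    """Model a record as a relation: a frozenset of (attribute, value) couplets."""
    return frozenset(fields.items())


def superstrict(clan, patterns):
    """Keep each relation that is a superset of at least one pattern relation."""
    return frozenset(r for r in clan if any(r >= p for p in patterns))


def project(clan, attributes):
    """Keep only the couplets whose attribute is in the given set."""
    return frozenset(
        frozenset((a, v) for a, v in r if a in attributes) for r in clan
    )


table = frozenset({row(name="Ann", dept=1), row(name="Bob", dept=2)})
files = frozenset({row(path="/a.txt", owner="Ann"),
                   row(path="/b.txt", owner="Bob")})

# Roughly: SELECT name FROM table WHERE dept = 1
sql_like = project(superstrict(table, {row(dept=1)}), {"name"})

# Roughly: find the paths of files owned by Bob
nav_like = project(superstrict(files, {row(owner="Bob")}), {"path"})
```

Both queries call the same two functions; only the data and the pattern differ.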
Optimization: The Algebraix Platform
Standards
In time we expect standards to be developed for:
§  DATA DECLARATION – data description (think XML, except algebraic)
§  DATA RETRIEVAL (think UQL – Universal Query Language)
Data Algebra Training
The Bloor Group is building a training course for companies that would like to train their staff in data algebra.
Contact: MaryJo.Nott@Bloorgroup.com
Programming with Data Algebra
There is an Algebraix library built in Python.
It instantiates the mathematics in Python and can be downloaded along with a PDF of the book.
The URL is http://algebraixlib.readthedocs.org/en/latest/
An invasion of armies can be resisted, but not an idea whose time has come.
~Victor Hugo
Questions?
Other questions:
MaryJo.Nott@Bloorgroup.com
THANK YOU!
Some images courtesy of Wikipedia and "All Gizah Pyramids" by Ricardo Liberato - All Gizah Pyramids. Licensed under CC BY-SA 2.0
via Commons - https://commons.wikimedia.org/wiki/File:All_Gizah_Pyramids.jpg#/media/File:All_Gizah_Pyramids.jpg
