LIS Databases
LIS Database Management
                 By
           Malachi Odongo
Lecturer Technical University of Kenya
Introduction
• Data management in LIS Consists of storing a variety of data
  categorised under
   • Spatial Data
   • Attribute Data
• Storage is done in a way that permits a user to retrieve or display any
  combinations of these data after analysis and manipulation
• Spatial data should be stored in a database capable of handling
  spatial data
   • Examples – Proprietary (Oracle Spatial)
   • Examples – Open Source (PostGIS)
• Attribute data can be stored in any conventional databases
• Before LIS Analysis, you should specify and incorporate attribute data
  in the LIS database
Database Management Systems (DBMS)
• A DBMS is a software that permits users to work efficiently with data.
• A DBMS Must provide a means to
   • Define contents of a database; insert new data; delete old data; query the database; modify
     data; update data; minimise redundancy; physical data independence; security and integrity
• A DBMS stores data in such a way that it becomes easier to retrieve, manipulate,
  and produce information.
• A DBMS:
   • Must allow the definition of data and their attributes and relationships
   • Must provide security,
   • Must provide an interface between the end users and their applications, and the data
• Functions of a DBMS:-
   •   File handling and file management
   •   Adding, updating and deleting records
   •   Extraction of information from data
   •   Maintenance of data security
   •   Maintenance of data integrity
   •   Application building
Functions of a DBMS
• Security: protection of data against accidental or intentional disclosure to
  or destruction by authorised persons
• Integrity: ability to protect data from systems problems through:
   • Backups, recovery, range checking
   • A DBMS checks elements as they are entered to enforce the necessary structural
     constraints of the internal data
   • Synchronisation: refers to forms of protection against inconsistencies that can result
     from multiple simultaneous users.
   • E.g. when a user is about t delete a record, the other user is warned or prevented
     from accessing that information
• Minimisation of Redundancy: Storing values that are dependent on others
  values. Not desired in a DBMS.
• Efficiency: efficient data access are largely dependent upon the volume of
  the data stored, the method of data encoding, design of database
  structure, complexity of the query
Database Management Systems (DBMS)
• Two types of distinct data are important
• Logical Data
   • The way in which data appears to the user
• Physical Data
   • Details of data organisation as it actually appears in memory or on storage
     media
LIS vs DBMS
• Functions of a LIS
   •   Data input
   •   Data storage
   •   Management
   •   Transformation Analysis
   •   Output
• The data management functions are necessary in any LIS to facilitate
  the storage, organisation, and retrieval of data using a DBMS
• DBMS: is defined as a set of computer programs for organising
  information at the core of which there will be a database.
LIS Data File Management
• The storage of attribute data is a well established technology and is
  analogous to a filing system
• Files: allow the machine to keep track the records of data available
  and retrieve those records in any order
• There are several computer file structures:
Heap File
• When a file is created using Heap File Organization, the Operating
  System allocates memory area to that file without any further
  accounting details.
• File records can be placed anywhere in that memory area. It is the
  responsibility of the software to manage the records. Heap File does
  not support any ordering, sequencing, or indexing on its own.
Ordered Sequential File
• Every file record contains a data field (attribute) to uniquely identify
  that record.
• In sequential file organization, records are placed in the file in some
  sequential order based on the unique key field or search key.
Hash File Organization
• Hash File Organization uses Hash function computation on some
  fields of the records.
• The output of the hash function determines the location of disk block
  where the records are to be placed.
Clustered File Organization
• Clustered file organization is not considered good for large databases.
• In this mechanism, related records from one or more relations are
  kept in the same disk block, that is, the ordering of records is not
  based on primary key or search key.
Database Models
• Many database systems today are relational,
• The relational model:
   • Is extremely easy to explain and implement
   • Well suited for ad-hoc queries
   • Relational languages are easier to learn than navigational languages
• Hierarchic and Network systems are navigational
   • Relationships are often displayed as links on a diagram
   • These links are called pointers
   • May provide faster response time for predefined queries
Relational Database Model – Key Concepts
• The records in a file are all of the same record type, containing a fixed set of
  fields (attributes)
• A relational database is a collection of tabular relations, each having a set of
  attributes
   • In relational data model, relations are saved in the format of Tables.
   • A table has rows and columns, where rows represents records and columns represent the
     attributes
• The data in a relation are structured as a set of rows called Tuples consisting a list
  of values, each one for each attribute
   • A single row of a table, which contains a single record for that relation is called a tuple
• An Attribute has associated with it a domain from which the values are drawn
   • Every attribute has some pre-defined value scope, known as attribute domain
• Most current systems requires that values must be atomic in that they can not
  be decomposed as lists or arrays
• Relation key − Each row has one or more attributes, known as relation key, which
  can identify the row in the relation (table) uniquely
Database Schema
• Database Schema
   • The skeleton structure that represents the logical view of the entire database.
   • It defines how the data is organized and how the relations among them are
     associated.
   • It formulates all the constraints that are to be applied on the data.
• A database schema defines its entities and the relationship among them.
• It contains a descriptive detail of the database, which can be depicted by
  means of schema diagrams.
• A database schema can be divided broadly into two categories −
• Physical Database Schema − Pertains to the actual storage of data and its
  form of storage like files, indices, etc.
• Logical Database Schema − Defines all the logical constraints that need to
  be applied on the data. It defines tables, views, and integrity constraints.
Physical vs Logical Schema
Relational Joins
• Matches data from one table to corresponding data from other tables
• The primary key (in one or more columns) is matched to another
  column (the foreign key) in another table
• In this way, each table can be kept simple, and a complex scheme of
  linking attained
• This allows quite complex searches while maintaining a very simple,
  well defined and easily developed set of tables
Relational Joins
User Interface
Summary of Relational Terminology
Normal / Human Terminology   File Terminology   RDBMS Terminology
Table                        File               Relation
Row                          Record/case        Tuple
Column                       Field              Attribute
Number of Columns            Number of Fields   Degree
Number of Rows               Number of Cases    Cardinality
Unique ID                    Primary Key        Index
                             Possible Values    Domain
Relational Model – Normal Forms
• First Normal Form (1NF):
   • A Relation must contain rows and columns; because the columns are used as search
     keys, there should be a single value at each row location
   • A relation is in 1NF if the domain of each attribute contains only atomic values, and the
     value of each attribute contains only a single value from that domain
• Second Normal Form (2NF):
   • Every column that is not a primary key must be totally dependent upon the primary key
   • A relation is in 2NF if it is in 1NF and no non-prime attribute is dependent on any proper
     subset of any candidate key of the table. A non-prime attribute of a table is an attribute
     that is not a part of any candidate key of the table.
   • Put simply, a table is in 2NF if it is in 1NF and every non-prime attribute of the table is
     dependent on the whole of every candidate key.
• Third Normal Form (3NF):
   • The columns that are not the primary keys must depend upon the primary key; whereas
     the primary key does not depend on the non primary keys
   • The relation is in 2NF and all the attributes in a table are determined only by the
     candidate keys of that table and not by any non-prime attributes.
SQL
• Queries build upon expressions based on Relational Algebra
   • SELECT – a subset of rows
   • PROJECT – a subset of columns
   • JOIN – to join tables based on Key Fields
• Advantages of SQL
   • Simplicity
   • High Level language (4th Generation)
   • Completeness
Storage of LIS Data
• In principle, there are three different files to be created for any LIS
  project
   • Topological data file
   • Coordinate file
   • Attribute data
• There are two models of storing data
   • The hybrid data model
   • The integrated data model
Hybrid Data Model
• Optimal for locational information
• GIS software manages linkages between cartographic files (Operating
  System files) and the DBMS (Attribute data)
Integrated Data Model
• Stores both the coordinates / topological & attribute information in
  the DBMS
• X, Y coordinate pairs for individual vertices along with line segments
  are stored as different rows in a database table
Entity-Relationship-Attribute Model
• Primary components are its objects and entities
• One of the main modelling approach (the other is Object Oriented)
• An entity is a semantic data modelling construct
• Entities are described using their attributes; e.g. name; boundary,
  location
• Entities have explicit relationships with other entities
• Entities are grouped into types, where entities of the same type have
  similar attributes
Database Modeling
External Modeling
Conceptual modeling
Logical modeling
• Poles [REA_No.. Project Name, Pole _ID; Pole _ Type; Pole _Class; Pole _Height ; Treatment _Type ; Date
   _Manufactured; Inspection Date; Description]
• Transformer [REA_No. Project Name, Transformer _ Number, Rating, Manufacturer…….]
• Fuse [REA_No. Project Name; Phase, Position, Rated, Voltage ………………………………..]
• Substation [REA_No.. Project Name, Substation_Number; Load connection; …………]
• HT Line (Cable) [REA_No, Phase_ code, Conductor_ length, conductor_ neutral,………. , ]
• LV Line (Cable) [REA_No, Phase code, Conductor_ length, conductor_ neutral, construction ……….,]
• Angle Points [ REA_No., AP No. Project Name, Description]
• Parcels [REA_No. , Project _Name, Parcel_ID; Owner; Owner _ID, Area, Description].
• Proposed Route Survey [REA_No. Project Name, Length; Description
Physical modeling
Protyping
Full implementation