GEOSPATIAL INFORMATION SYSTEMS A
GIS SUBSYSTEMS (FUNCTIONS)
LECTURE 5: DATA STORAGE AND MANAGEMENT
M. Gwena
Introduction
• A GIS is a computer-based system that provides the
following four sets of capabilities to handle
geo-referenced data:
1. Data capture and preparation
2. Data storage and management
3. Data manipulation and analysis
4. Data presentation/Output
Objectives
• At the end of this topic the student should be able to:
• Understand the concept of GIS databases
• Explain the functions of a database
• Appreciate the advantages of the database approach to the storage
and management of GIS data.
• Know the various database models
• Differentiate among the various database models
DATA STORAGE AND MANAGEMENT - Introduction
• The second component/function of a GIS is the data storage and retrieval
subsystem.
• This subsystem organizes the data, both spatial and attribute, in a form which
permits it to be quickly retrieved for updating, querying, and analysis.
• Most GIS software uses proprietary software for its spatial editing and
retrieval system, and a database management system (DBMS) for its
attribute storage.
• Typically, an internal data model is used to store primary attribute data
associated with the topological definition of the spatial data.
• Most often these internal database tables contain primary columns such as
area, perimeter, length, and internal feature id number.
• Thematic attribute data is often maintained in an external DBMS that is linked
to the spatial data via the internal database table.
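The link between the internal table and the external thematic attributes can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module: both tables live in one in-memory database here, and the table names, column names and values (`internal_features`, `thematic`, `land_use`) are assumptions, not part of any particular GIS package.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Internal table: geometry-derived columns keyed by an internal feature id.
conn.execute(
    "CREATE TABLE internal_features "
    "(feature_id INTEGER PRIMARY KEY, area REAL, perimeter REAL)"
)
# Stand-in for the external DBMS table that holds thematic attributes.
conn.execute("CREATE TABLE thematic (feature_id INTEGER, land_use TEXT)")

conn.executemany(
    "INSERT INTO internal_features VALUES (?, ?, ?)",
    [(1, 2500.0, 200.0), (2, 900.0, 120.0)],
)
conn.executemany(
    "INSERT INTO thematic VALUES (?, ?)",
    [(1, "residential"), (2, "forest")],
)

# The shared feature_id links each spatial feature to its thematic attributes.
linked = conn.execute(
    """
    SELECT f.feature_id, f.area, t.land_use
    FROM internal_features AS f
    JOIN thematic AS t ON t.feature_id = f.feature_id
    ORDER BY f.feature_id
    """
).fetchall()
print(linked)
```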
DATA STORAGE AND MANAGEMENT
[Figure: spatial and attribute data]
A database
• What is a database?
• A database is a collection of interrelated tables in digital format.
• A database is a shared collection of logically related data, designed to
meet the information needs of an organization.
• A database typically consists of:
• Tables - Collection of related records
• Fields (columns) - Single category of data to be stored in a database
(name, telephone number, etc.)
• Records (rows) - Collection of related fields in a database (all the fields for
one customer, for example)
• Database engine - the part of the program that actually stores and
retrieves data (e.g. Microsoft Access, OpenOffice Base, Corel Paradox,
Oracle Database, etc.)
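A minimal sketch of these terms, using Python's built-in sqlite3 module as the database engine. The table name `customers` and the sample values are made up for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for the database engine.
conn = sqlite3.connect(":memory:")

# A table with two fields (columns): name and telephone.
conn.execute("CREATE TABLE customers (name TEXT, telephone TEXT)")

# Each inserted tuple is one record (row): all the fields for one customer.
conn.executemany(
    "INSERT INTO customers (name, telephone) VALUES (?, ?)",
    [("Customer A", "555-0100"), ("Customer B", "555-0101")],
)

# List the field names, then retrieve the records from the engine.
field_names = [col[1] for col in conn.execute("PRAGMA table_info(customers)")]
records = conn.execute(
    "SELECT name, telephone FROM customers ORDER BY name"
).fetchall()

print(field_names)
print(records)
```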
Databases: Major Functions of a database
Functions of a spatial database
Advantages of the database approach to storing
geographic data
• Reduction in redundancy since all data is stored in a single location.
• Reduction in maintenance cost due to better organization and reduction in data
duplication.
• Applications become data independent since multiple applications can use the
same data and evolve separately over time.
• User knowledge can be transferred between applications more easily since the
database remains constant.
• Data sharing is enabled and a corporate view of data can be provided to all
managers and users.
• Security and standards for data and data access can be established and enforced.
• DBMS are better suited to managing large numbers of concurrent users working
with vast amounts of data.
Disadvantages of the database approach to storing geographic
data
• The cost of acquiring and maintaining DBMS software can be quite high.
• A DBMS adds complexity to the problem of managing data especially in
small projects.
• Single user performance will often be better for files, especially for more
complex data types and structures where specialist indexes and access
algorithms can be implemented.
Database Management Systems (DBMS)
• A DBMS is a software application designed to organize the efficient and
effective storage and access of data.
• A DBMS enables creation, editing, manipulation and analysis of spatial
and non-spatial data in the application of GIS.
• Small, simple databases can be stored on a computer disk in standard
files. However, large and more complex databases used by many users
require specialist DBMS software to ensure database security and
longevity.
• A common desktop DBMS on PCs is Microsoft Access, which ships with
some editions of Microsoft Office.
Capabilities of a DBMS
• A data model: this is a mechanism used to represent real-world objects
digitally in a computer system (integers, floating-point numbers, dates, text,
etc.)
• A data load capability: these are tools that facilitate loading data of the
standard supported data types into the database.
• Indexes: an index is a data structure used to speed up searching.
• A query language: one of the advantages of DBMS is that they support a
standard data query/manipulation language called SQL (Structured
Query Language).
• Security: Controlled access to data (read only, read and write (create, update
and delete)).
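Two of these capabilities, indexes and the SQL query language, can be sketched with Python's built-in sqlite3 module. The `parcels` table, its columns and the index name are assumptions chosen for illustration. Whether the engine actually uses the index for a given query is decided by its query planner, so the plan is only printed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcels (parcel_id INTEGER PRIMARY KEY, owner TEXT, area REAL)"
)
conn.executemany(
    "INSERT INTO parcels VALUES (?, ?, ?)",
    [(1, "Owner A", 450.0), (2, "Owner B", 1200.0), (3, "Owner A", 800.0)],
)

# Index: a data structure that lets the engine find rows by owner
# without scanning the whole table.
conn.execute("CREATE INDEX idx_parcels_owner ON parcels (owner)")

# Query language: a declarative SQL statement with a bound parameter.
rows = conn.execute(
    "SELECT parcel_id, area FROM parcels WHERE owner = ? ORDER BY parcel_id",
    ("Owner A",),
).fetchall()
print(rows)

# Ask SQLite how it intends to run the query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT parcel_id FROM parcels WHERE owner = ?",
    ("Owner A",),
).fetchall()
plan_text = " ".join(str(step[-1]) for step in plan)
print(plan_text)
```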
Capabilities of a DBMS (cont.)
• Controlled update: this is done to ensure that updates affecting more than
one part of the database are coordinated.
• Backup and recovery: Protect data from system failure and incorrect update.
• Database administration tools: These are tools that enable the database
administrator to set-up the structure of a database (the schema), creating and
maintaining indexes, tuning to improve performance, backing up and
recovering, and allocating user rights.
• Applications: Modern DBMS are equipped with standard general-purpose
tools for creating, using and maintaining databases, e.g. applications for
designing databases like the CASE (Computer Aided Software Engineering)
tools and for building user interfaces for data access and presentation (forms
and reports).
• Application Programming Interfaces (APIs) to enable customization for
special applications.
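Controlled update is commonly implemented with transactions: a group of changes either all succeed or are all undone. The sketch below assumes SQLite through Python's sqlite3 module. The two tables and the helper `add_parcel` are hypothetical names used only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcels (parcel_id INTEGER PRIMARY KEY, area REAL NOT NULL)"
)
conn.execute(
    "CREATE TABLE owners (parcel_id INTEGER PRIMARY KEY, owner TEXT NOT NULL)"
)
conn.commit()


def add_parcel(parcel_id, area, owner):
    """Insert into both tables, or into neither if any step fails."""
    try:
        # The connection context manager commits on success and
        # rolls back the open transaction if an exception is raised.
        with conn:
            conn.execute("INSERT INTO parcels VALUES (?, ?)", (parcel_id, area))
            conn.execute("INSERT INTO owners VALUES (?, ?)", (parcel_id, owner))
        return True
    except sqlite3.IntegrityError:
        return False


ok = add_parcel(1, 450.0, "Owner A")
# The second insert violates NOT NULL on owners.owner, so the first
# insert of this call is rolled back as well.
failed = add_parcel(2, 900.0, None)

parcel_count = conn.execute("SELECT COUNT(*) FROM parcels").fetchone()[0]
print(ok, failed, parcel_count)
```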
Database Models (or types)
Hierarchical Model
• Organizes data in a tree structure.
• There is a hierarchy of parent and child data
segments which means that a record can have
repeating information in the child data segments.
• The links between the records use the Parent-
Child Relationships.
• These are 1:N mappings, represented using trees.
• Hierarchical DBMSs were popular from the late
1960s, with the introduction of IBM's Information
Management System (IMS), through the 1970s.
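The parent-child structure can be sketched as a small tree in Python. The class name `Segment` and the administrative-unit example are assumptions for illustration and do not reflect how IMS itself stores data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Segment:
    """One data segment: exactly one parent, any number of children (1:N)."""

    name: str
    parent: Optional["Segment"] = None
    children: List["Segment"] = field(default_factory=list)

    def add_child(self, name: str) -> "Segment":
        child = Segment(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        """Follow parent links up to the root of the tree."""
        node: Optional["Segment"] = self
        parts = []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


country = Segment("Country")
province = country.add_child("Province A")
district = province.add_child("District 1")
province.add_child("District 2")

print(district.path())
print([child.name for child in province.children])
```

Because each segment has a single parent, a record that naturally belongs under two parents has to be duplicated, which is the limitation the network model addresses.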
Network Model
• Some data are more naturally modelled with more than
one parent per child, which gives many-to-many
relationships.
• In 1971, the Conference on Data Systems Languages
(CODASYL) formally defined the network model.
• It is based on mathematical set theory where the
basic data modelling construct in the network model
is the set construct.
• A set consists of an owner record type, a set name,
and a member record type.
• A member record type can have a role in more than
one set, thus supporting the multi-parent concept.
• An owner record type can also be a member or
owner in another set.
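The set construct can be sketched with plain Python dictionaries. The set names, record types and record values below (`HAS_SEGMENT`, `COVERS`, roads, districts, segments) are invented for illustration, and the sketch does not follow CODASYL syntax.

```python
from collections import defaultdict

# Set type: set name -> (owner record type, member record type).
set_types = {
    "HAS_SEGMENT": ("Road", "Segment"),
    "COVERS": ("District", "Segment"),
}

# Set occurrences: (set name, owner record) -> member records.
occurrences = defaultdict(list)


def connect(set_name, owner, member):
    """Add a member record to the set occurrence owned by `owner`."""
    if set_name not in set_types:
        raise KeyError(f"unknown set: {set_name}")
    occurrences[(set_name, owner)].append(member)


def owners_of(member):
    """All owner records of a member, across every set it takes part in."""
    return sorted(
        owner for (_, owner), members in occurrences.items() if member in members
    )


# "Segment 1" is a member in two different sets, so it has two parents.
connect("HAS_SEGMENT", "Road R1", "Segment 1")
connect("COVERS", "District D1", "Segment 1")
connect("HAS_SEGMENT", "Road R1", "Segment 2")

print(owners_of("Segment 1"))
print(owners_of("Segment 2"))
```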
DBMS available to GIS users today
• There are three main types of DBMS available to GIS users today,
namely:
• Relational DBMS (RDBMS)
• Object DBMS (ODBMS)
• Object Relational DBMS (ORDBMS)
Relational Data Model
• Allows the definition of data structures, storage
and retrieval operations and integrity
constraints.
• In such a database, the data and the relations
between them are organised in tables.
• A table is a collection of records and each
record in a table contains the same fields.
• Properties of Relational Tables:
• Values are atomic (only one value per cell)
• Each row is unique
• Column Values are of the same kind
• The sequence of columns is insignificant
• The sequence of rows is insignificant
• Each column has a unique name.
Relational database
• A relational database is a collection of tables, also called relations,
which can be connected to each other by keys.
• A primary key represents one or more attributes whose values
can uniquely identify a record in a table. Its counterpart in
another table for the purpose of linkage is called a foreign key.
• Advantages
• Each table in the database can be prepared, maintained, and
edited separately from other tables
• Efficient data management and processing, since linking tables
for query and/or analysis is often temporary
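A short sketch of primary and foreign keys, assuming SQLite through Python's sqlite3 module. The `zones` and `parcels` tables and the helper `try_insert` are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# zone_id is the primary key of zones and a foreign key in parcels.
conn.execute("CREATE TABLE zones (zone_id INTEGER PRIMARY KEY, zone_name TEXT)")
conn.execute(
    """
    CREATE TABLE parcels (
        parcel_id INTEGER PRIMARY KEY,
        zone_id INTEGER REFERENCES zones (zone_id)
    )
    """
)
conn.execute("INSERT INTO zones VALUES (1, 'Residential')")
conn.execute("INSERT INTO parcels VALUES (10, 1)")


def try_insert(sql):
    """Return 'accepted' or 'rejected' depending on the key constraints."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


duplicate_key = try_insert("INSERT INTO parcels VALUES (10, 1)")   # primary key reused
dangling_link = try_insert("INSERT INTO parcels VALUES (11, 99)")  # zone 99 does not exist
valid_row = try_insert("INSERT INTO parcels VALUES (12, 1)")

print(duplicate_key, dangling_link, valid_row)
```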
Relational database (cont.)
[Figure: example of an un-normalized relational table]
[Figure: example of normalized relational tables]
[Figure: relational database, three tables linked by keys]
Join and relate tables
• Once tables are separated as relational tables, two operations
can be used to link those tables during query and analysis:
• Join brings together two tables based on a common key.
• Relate connects two tables (based on keys) but keeps the tables
separate.
One-to-One Join
Table 1:
Employee-id   Job
1             Digislave
2             Useless Supervisor
Table 2:
Employee-id   Name
1             Tom
2             John
Join Employee-id to Employee-id
After join:
Employee-id   Job                  Name
1             Digislave            Tom
2             Useless Supervisor   John
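The one-to-one join above, and the difference between join and relate, can be sketched in plain Python. Each table is represented here as a dictionary keyed by Employee-id, and the helper name `related_name` is an assumption for illustration.

```python
# The two source tables, keyed by Employee-id.
jobs = {1: "Digislave", 2: "Useless Supervisor"}  # Employee-id -> Job
names = {1: "Tom", 2: "John"}                     # Employee-id -> Name

# Join: build one combined table from rows that share a key.
joined = [
    (emp_id, jobs[emp_id], names[emp_id])
    for emp_id in sorted(jobs.keys() & names.keys())
]


# Relate: keep the tables separate and follow the key only when needed.
def related_name(emp_id):
    """Look up the name for an employee without merging the tables."""
    return names.get(emp_id)


print(joined)
print(related_name(2))
```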
DATA STORAGE
• Digital information is stored in a computer as binary digits (or bits),
each of which can have a value of 0 or 1.
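• A byte is a group of 8 bits.
• Computer storage is usually measured in bytes:
A kilobyte = 2¹⁰ = 1024 bytes (approximately 10³ bytes)
A megabyte = 2²⁰ bytes (approximately 10⁶ bytes)
A gigabyte = 2³⁰ bytes (approximately 10⁹ bytes)
A terabyte = 2⁴⁰ bytes (approximately 10¹² bytes)
A petabyte = 2⁵⁰ bytes (approximately 10¹⁵ bytes)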
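The two ways of counting each unit (powers of 2 and powers of 10) can be compared with a few lines of Python. The function names are made up for this sketch.

```python
# Step number of each unit: kilobyte is step 1, megabyte is step 2, and so on.
UNITS = {"kilobyte": 1, "megabyte": 2, "gigabyte": 3, "terabyte": 4, "petabyte": 5}


def binary_bytes(unit):
    """Size in bytes when each step is a factor of 2**10 = 1024."""
    return 2 ** (10 * UNITS[unit])


def decimal_bytes(unit):
    """Size in bytes when each step is a factor of 10**3 = 1000."""
    return 10 ** (3 * UNITS[unit])


for unit in UNITS:
    ratio = binary_bytes(unit) / decimal_bytes(unit)
    print(f"{unit}: {binary_bytes(unit)} bytes, {ratio:.3f} times the decimal value")
```

The gap between the two conventions grows with each step, which is why the powers of 10 are only approximations of the binary sizes.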
Example
• 100101
(1×2⁰)+(0×2¹)+(1×2²)+(0×2³)+(0×2⁴)+(1×2⁵) = 1 + 4 + 32 = 37
• 11001
(1×2⁰)+(0×2¹)+(0×2²)+(1×2³)+(1×2⁴) = 1 + 8 + 16 = 25
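The positional expansion used in the example can be written as a short Python function. The name `binary_to_decimal` is an assumption, and Python's built-in `int(bits, 2)` does the same job.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum bit × 2^position, counting positions from the rightmost bit."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        if bit not in ("0", "1"):
            raise ValueError(f"not a binary digit: {bit!r}")
        total += int(bit) * 2 ** position
    return total


for bits in ("100101", "11001"):
    # Compare the manual expansion with Python's built-in conversion.
    print(bits, binary_to_decimal(bits), int(bits, 2))
```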
Storage devices
Storage media        Capacity    Notes
CD-ROM               700 MB
Rewritable CD        700 MB      Max 1000 writes
DVD                  18 GB
Flash disk           32 GB
Hard disk            1 TB
External hard disk   2 TB
Cloud                Unlimited
Removable storage
• Magnetic media: floppy disks, digital audio tapes
• Optical media: Compact Disc ROMs (CD-ROMs), rewritable optical discs, DVDs
• External hard drives: mass storage system
Non-removable storage and Networks
• Due to falling prices, large datasets can now be stored on single
computers.
• To minimize duplication, networks are used to share data within
organizations and also globally, e.g. cloud computing.
Data storage considerations
• The two main considerations relate to:
- Space
- Time
• There is usually a tradeoff between minimizing the space
required to store the data and maximizing the speed at which it
can be accessed.
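One way to see the space/time tradeoff is run-length encoding of a raster row. This technique is brought in here purely as an illustration and is not described in the lecture. The encoded row takes less space, but reading a single cell means walking the runs instead of indexing directly. The function names and the sample row are assumptions.

```python
from itertools import groupby


def rle_encode(cells):
    """Compress a row into (value, run length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(cells)]


def cell_at(runs, index):
    """Find the value at `index` by walking the runs (slower than row[index])."""
    if index < 0:
        raise IndexError(index)
    start = 0
    for value, length in runs:
        if index < start + length:
            return value
        start += length
    raise IndexError(index)


row = ["water"] * 6 + ["forest"] * 3 + ["urban"] * 1
runs = rle_encode(row)

print(len(row), "cells stored as", len(runs), "runs")
print(cell_at(runs, 7), row[7])
```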
DATA MANAGEMENT
• Includes all procedures involved in
• data storage
• retrieval
• updating
• backup
• exchange of data
• archiving.
Data Storage & Management Issues
• Legal Issues
• Data ownership: proprietorship of data that has been collected by a
private agency using private funds.
• Data custodianship: proprietorship of data that has been collected by
a private agency using public funds, on behalf of the public.
• Data license: authority given to an agency or individual to reproduce
and manage data on behalf of its owner.
Data commercialization
• Give the information for free: not sustainable.
• Give the information at the cost of provision: charge only the cost of
reproduction and provision. It is citizen-friendly and suits government
departments.
• Partial cost recovery: charge part of the production and
reproduction costs plus provision. Mostly recommended for parastatals.
• Sell information at market price: charge the production and
reproduction costs plus provision and a profit. For the private sector.
Data privacy
• Privacy concerns exist whenever personally identifiable
information or other sensitive information is collected and
stored, in digital form or otherwise.
• Data privacy issues can arise in response to information from a
wide range of sources, such as:
• Health care records
• Criminal justice investigations
• Financial institutions and transactions
• Residence and geographic records
• Ethnicity
• Location-based services
Individual’s right to privacy
• The challenge in data privacy is to share data while protecting personal
information
• Government-held information contains a vast amount of information
pertaining to personal particulars.
• An individual’s right to privacy is seen as a basic human right in a modern
democracy.
• The right to privacy is enforced through privacy policies:
• A privacy policy is a statement or a legal document that discloses some or all
of the ways a party gathers, uses, discloses and manages a customer’s or
client’s data.
• Personal information can be anything that can be used to identify an
individual, e.g. name, address, date of birth, marital status, contact
information, financial records, medical history, etc.
The public’s right to know
• Access to government-held information is every citizen’s right.
• Public access is restricted, and limitations are always imposed in view of:
• Information pertaining to national security and foreign policy
• Information concerned with the making of national economic,
financial and monetary policies
• The interest of preventing or prosecuting crimes
• The protection of personal integrity or the economic conditions of
individuals
Intellectual property rights
Rights to financial benefits from and control of non-tangible
property that is the result of creativity. Such rights can be protected
by:
• Patent: form of limited monopoly granted by the state to an inventor
of any new and useful art, process, machine or any new
improvement to any of those.
• Copyright: a protection accorded to the form of expression or
original work.
Data security
• Data security means protecting data, such as a database, from
destructive forces and from the unwanted actions of unauthorized
users.
• Put another way, it means protecting data against disclosure or
modification by unauthorized persons.
• It also enables recovery of data if it is lost or corrupted.
• Data security mechanisms include:
• Encryption
• Antivirus
• Backups
• Biometrics
• Passwords
• Physical security (padlocks, security guards)
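As one illustration of the password mechanism, systems usually store a salted hash rather than the password itself. The sketch below uses PBKDF2 from Python's standard hashlib module. The function names and the iteration count are assumptions, and a production system would follow current security guidance for such parameters.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch


def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is used when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


stored_salt, stored_digest = hash_password("correct horse")
print(verify_password("correct horse", stored_salt, stored_digest))
print(verify_password("wrong guess", stored_salt, stored_digest))
```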
Data Security (cont.)
• Protects data against destruction and misuse
• Protects against unauthorized access to and unauthorized use of a
database
• Database activity monitoring programs can be used to detect
possible intrusions and risks
• Prevents data loss
• Should include strict backup and disaster-recovery procedures
(disaster-recovery plan)
• Should be used with both in-house and cloud databases
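A backup and recovery cycle can be sketched with the `Connection.backup` method of Python's sqlite3 module. Both databases are in-memory here to keep the example self-contained, whereas a real backup would be written to separate storage. The `parcels` table and its contents are made up.

```python
import sqlite3

# The "live" database with one committed record.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE parcels (parcel_id INTEGER PRIMARY KEY, owner TEXT)")
live.execute("INSERT INTO parcels VALUES (1, 'Owner A')")
live.commit()

# Backup: copy the whole live database into a second database.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Simulate an incorrect update that destroys the data.
live.execute("DELETE FROM parcels")
live.commit()
after_loss = live.execute("SELECT COUNT(*) FROM parcels").fetchone()[0]

# Recovery: copy the backup over the damaged live database.
backup.backup(live)
restored = live.execute("SELECT parcel_id, owner FROM parcels").fetchall()

print(after_loss, restored)
```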
END OF LECTURE 5