Introduction to Geographic
Information Systems
Tong Si Son
Tong-si.son@usth.edu.vn
1
Objectives of the course
1/ Provide basic knowledge of GIS: definitions, structural and
functional components.
2/ Help students understand spatial information,
models of spatial information and their organization
3/ Provide practical skills in using GIS software for some
simple applications
4/ Think spatially: when and how to use a certain
operation/tool
2
Contents of the course
Chapter 1: What is Geographic
Information System (GIS)?
Chapter 2: Spatial analysis
Chapter 3: Geospatial technologies
Chapter 4: Practices
3
Chapter 1. What is Geographic Information
System (GIS)?
1. Definition of GIS
2. Structure of GIS
3. Spatial models
4. Regression models
4
Reasons for GIS
• Our world is constantly changing, and not all changes are for the better.
– Natural causes: e.g., volcanic eruptions
– Human causes: e.g., land use changes
– Mix / Unclear causes: e.g., El Niño / La Niña events
• We, humans, want to understand what is going on in our world, and to
take action(s).
• The fundamental problem in many uses of GIS is that of understanding
phenomena that have (a) a geographic dimension, and (b) a temporal
dimension.
– Spatio-temporal: be of/in space and time
“Everything that happens, happens somewhere in space and time. ” - Michael
Wegener (University of Dortmund)
5
1/ Definition of GIS
GIS: a computer system built to capture, store,
manipulate, analyze, manage and display all kinds of
spatial or geographical data.
GIS applications: tools that allow end users to perform spatial
queries and analysis, edit spatial data and create hard-copy
maps.
GIS answers questions:
What appears in a location? (Mapping)
Where is a physical object located? (Spatial analysis)
When do the phenomena occur? (Prediction GIS models)
6
Definition of GIS
• A geographic information system (GIS) lets us
visualize, question, analyze, and interpret data to
understand relationships, patterns, and trends.
(ESRI)
• In the strictest sense, a GIS is a computer system
capable of assembling, storing, manipulating, and
displaying geographically referenced information
(that is data identified according to their
locations). (USGS)
7
Definition of GIS
GIS models to simulate the real world
8
GIS = “G” + “I” + “S”
• “G” = Geographic
– Denotes the concept of spatial location on Earth’s surface
– Importance of relative location (not just where you are but
where you are in relation to everything else)
– Theories and techniques in Geography form the basis of GIS
9
GIS = “G” + “I” + “S”
• “I” = Information
– Substance (knowledge) about location
– Factual and interpretative
– Tables + Maps + Analysis
– Transformation of table information into spatial context
for analysis
– Technology and computer systems
• “S”
- Systems
- Science
- Studies
- Services
10
Advantages of GIS
https://grindgis.com/what-is-gis/what-is-gis-definition
• Better decisions by government officials
• Improved decision making with the help of layered information
• Greater citizen engagement thanks to better systems
• Helps identify communities that are at risk or lack infrastructure
• Helps in crime analysis
• Better management of natural resources
• Better communication during emergencies
• Cost savings due to better decisions
• Finding different kinds of trends within the community
• Planning for demographic changes
11
Examples of GIS
• Urban Planning, Management
– Land information acquisition
– Economic development
– Housing renovation programs
– Emergency response
– Crime analysis
• Environmental Sciences
– Monitoring environmental risk
– Modeling storm water runoff
– Management of watersheds, floodplains, wetlands, forests
– Environmental Impact Analysis
– Hazardous or toxic facility siting
• Political Science
– Analysis of election results
– Predictive modeling
• Civil Engineering/Utility
– Locating underground facilities
– Coordination of infrastructure maintenance
• Business
– Demographic Analysis
– Market Penetration/Share Analysis
• Education Administration
– Enrollment Projections
– School Bus Routing
• Real Estate
– Neighborhood land prices
– Traffic Impact Analysis
• Health Care
– Epidemiology
– Service Inventory
12
Geospatial technologies
• Geospatial technology / Geomatics
– Land surveying
– Remote sensing
– Cartography
– Geographic information systems (GIS)
– Global navigation satellite systems (GPS, GLONASS, Galileo,
BeiDou/Compass)
– Photogrammetry
– Geography
–…
13
History of GIS
• Year 1854: John Snow used points on a London residential map to plot an outbreak of
cholera
John Snow, Cholera Outbreak Map
• Year 1960: Modern computerized GIS began
• Year 1962: Dr. Roger Tomlinson (father of GIS) developed the Canadian Geographic Information
System (CGIS) to store, analyze and manipulate data. CGIS supported overlay, measurement and digitizing.
• Year 1980: GIS software: M&S Computing, Environmental Systems Research Institute (ESRI),
Computer Aided Resource Information System (CARIS). ESRI products like ArcGIS and ArcView hold 80%
of the global market.
• Year 1984: Term of GIS
14
2. Structure of GIS
(GIS Components)
15
https://www.edc.uri.edu/nrs/classes/NRS409509/Lectures/3GISdefined/GIS_Defined.htm
Spatial data and Geoinformation in GIS
Data, Metadata, Spatial data, Geospatial data, information, Geoinformation
• “Data” are representations that can be operated upon by a computer.
• “Metadata” are data about data.
• “Spatial data” are data that contain positional values.
• “Geospatial data” are spatial data that are georeferenced.
– In the context of GIS, spatial data and geospatial data are regarded as
synonyms of georeferenced data.
• “Information” is the meaning of data as interpreted by human beings.
• “Geoinformation” is information that involves interpretation of spatial
data.
16
The real world and representations of it
• When dealing with data and information we are usually trying
to represent some part of the real world as it is, as it was, or
perhaps as we think it will be.
– We say ‘some part’ because the real world cannot be
represented completely.
• We use a computer representation of some part of the real
world to enter and store data, analyze the data and transfer
results to humans or to other systems.
17
• A representation of some part of the real world can be considered a
model of that part.
– This allows us to study the model instead of the real world.
• Models come in many different flavors.
– Maps
– Databases
–……
• Most maps and databases can be considered static models.
• Dynamic models or process models address changes that
have taken place, are taking place and may take place.
18
3. MODELS IN GIS
The real world and its spatial models
• MODEL: “A model is a manageable, comprehensible and schematic
representation of a piece of reality”
- “reality” – no hypothetical system
- “a piece of reality” – limited domain in time and space
- “schematic representation” – from a specific point of view
- “representation” - an infinite number of projections
- “a comprehensible representation” - a model serves a specific goal
- “a manageable representation” - it should give the user the results
they need
19
MODELS IN GIS
• Different types of models:
- Most familiar is the map
- A collection of stored data
representing real-world
phenomena is also a model–
data model
- Analogue maps
- Digital models
Cairo, Egypt
20
Characteristics of GIS models
What is the “problem” to be modeled, and how?
[Figure: modeling geographic data]
• Given the complexity of real-world phenomena, models can by definition never be perfect.
• There are limits on the amount of data that we can store
• There are limits on the amount of detail we can capture
• There are limits on the time we have available for a project
• Some facts or relationships that exist in the real world may not be discovered
through models
21
Characteristics of GIS models
What is the “topic” to be modeled?
What are the important phenomena?
22
Characteristics of GIS models
We model theme by theme by determining the important phenomena.
Example themes: buildings, infrastructure, land use, water bodies
23
Characteristics of GIS models
(Scale in a digital model?)
• Spatial resolution/extent
• Temporal resolution/extent
• Define what is left out of the model
• Leave out uncertainty about model data,
predictions
• Model must run faster than the real world
• Ecological fallacy
Characteristics of GIS models
What is the “scale” of the model?
Which are small scale, large scale?
25
Characteristics of GIS models
What are the differences?
Is something removed in the models?
26
Characteristics of GIS models
(Time scale of model)
Drought’s Footprint (1930 to 2000)
Image source: National Climatic Data Center, NOAA
27
Paper maps vs. digital models
Paper maps:
• Fixed scale of printed map
• All layers in one
• Fixed coordinate system
• Uneditable
• Measurement limitation
• Area of interest may be out of frame
• Unchangeable symbolization
• Mass production/printing
Digital models:
• Flexible scale selection (generalization)
• Storage of separate layers
• Projection
• Editable
• Spatial analysis function
• Selection of area of interest
• Symbolization
• Thematic map for single use
28
WHAT ARE GEOGRAPHIC PHENOMENA?
• A geographic phenomenon is a manifestation of an
entity or process of interest that:
- Can be named or described
- Can be geo-referenced
- Can be assigned a time interval at which it is/was
present
• Not all relevant information about phenomena has the
form of a triplet:
- No name (un-described object)
- No geo-reference (legal document)
- No time (phenomenon that exists permanently)
29
WHAT ARE GEOGRAPHIC PHENOMENA?
• Euclidean space
A GIS operates under the assumption that the relevant spatial
phenomena occur in a two- or three-dimensional Euclidean space,
unless otherwise specified.
Euclidean space can be informally defined as a model of space in
which locations are represented by coordinates, (x, y) in 2D and (x, y,
z) in 3D, and distance and direction can be defined with geometric
formulas. In the 2D case, this is known as the Euclidean plane,
which is the most common Euclidean space in GIS use.
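As a minimal sketch of the "geometric formulas" mentioned above, the following Python computes straight-line distance and 2D direction from coordinates. The function names and the convention of measuring direction counter-clockwise from the +x axis are assumptions made for illustration.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points given as (x, y) or (x, y, z)."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p, q)))

def direction_degrees(p, q):
    """Direction from p to q in the 2D plane, in degrees
    counter-clockwise from the +x axis (an assumed convention)."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 360

print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))            # 2D case
print(euclidean_distance((0.0, 0.0, 0.0), (1.0, 2.0, 2.0)))  # 3D case
print(direction_degrees((0.0, 0.0), (0.0, 1.0)))
```

These formulas only hold for projected (planar) coordinates; latitude/longitude pairs would need a different distance formula.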
30
TYPES OF GEOGRAPHIC PHENOMENA
Object or Field?
• A (geographic) field is a
geographic phenomenon
for which, for every point
in the study area, a value
can be determined.
• A (geographic) object is a
geographic phenomenon
that does not cover the
total study area; the space
in between objects is
potentially empty or
undetermined.
Elevation map
31
TYPES OF GEOGRAPHIC PHENOMENA
Objects
32
TYPES OF GEOGRAPHIC PHENOMENA
Fields: CONTINUOUS vs DISCRETE
33
TYPES OF GEOGRAPHIC PHENOMENA
Fields: CONTINUOUS
34
TYPES OF GEOGRAPHIC PHENOMENA
Fields: CONTINUOUS
35
TYPES OF GEOGRAPHIC PHENOMENA
Fields: DISCRETE
36
TYPES OF GEOGRAPHIC PHENOMENA
Fields: DISCRETE
37
Map of regions of Vietnam Administration map of Vietnam
TYPES OF GEOGRAPHIC PHENOMENA
Fields: DISCRETE
38
Geographical features/phenomena
How do we describe geographical features?
• by recognizing two types of data:
– Spatial data which describes location (where)
– Attribute data which specifies characteristics at that location
(what, how much, and when)
How do we represent these digitally in a GIS?
• by grouping into layers based on similar characteristics (e.g., hydrography,
elevation, water lines, sewer lines, grocery sales) and using either:
– vector data model (coverage in ARC/INFO, shapefile in ArcView)
– raster data model (GRID or Image in ARC/INFO & ArcView)
• by selecting appropriate data properties for each layer with respect to:
– projection, scale, accuracy, and resolution
How do we incorporate into a computer application system?
• by using a relational Data Base Management System (DBMS)
39
GIS Data Model based on data layers or themes
Examples of layers or themes
• Data is organized by layers, coverages or themes, with each
theme representing a common feature.
• Layers are integrated using explicit location on the earth’s
surface, thus geographic location is the organizing principle.
Example themes: Digital Elevation Models, Streams, Watersheds, Waterbodies
An integrated view
Example of layers or themes
Here we have three layers or themes:
- roads,
- hydrology (water),
- topography (land elevation)
They can be related because precise
geographic coordinates are recorded
for each theme.
How are layers described?
• Layers are composed of two data types:
- spatial data, which describes location (where),
stored in a shapefile
- attribute data, specifying what, how much, and when,
stored in a database table
GIS systems traditionally maintain spatial
and attribute data separately, then “join”
them for display or analysis
Attribute data types
Categorical (name):
– nominal
• no inherent ordering
• land use types, county names
– ordinal
• inherent order
• road class; stream class
• often coded to numbers eg SN but can’t do arithmetic
Numerical:
Known difference between values
Expressed as integer [whole number] or floating point [decimal fraction]
• temperature (Celsius or Fahrenheit), income, age, rainfall
Attribute data tables can contain locational information, such as addresses or a list of
X,Y coordinates. However, these must be converted to true spatial data (shapefile), for
example by geocoding, before they can be displayed as a map.
45
Attribute data types
Parcel Table
Parcel #   Address        Block   $ Value
8          501 N Hi       1       105,450
9          590 N Hi       2       89,780
36         1001 W. Main   4       101,500
75         1175 W. 1st    12      98,000
(Figure labels: entity, key field, attribute)
Attribute data are held in tables or feature classes in which:
– rows: entities, records, observations, features:
• ‘all’ information about one occurrence of a feature
– columns: attributes, fields, data elements, variables, items
• one type of information for all features
The key field is an attribute whose values uniquely identify each row
46
Spatial Data
The spatial component of a layer may be represented in two ways:
• in raster (image) format as pixels
• in vector format as points, lines and areas (PLA model)
Concept of Vector and Raster
[Figure: the real world shown as a raster representation (a grid with rows and columns numbered 0 to 9 and cells coded R, T and H) and as a vector representation (point, line, polygon)]
48
Representing Data using Raster Model
• Area is covered by grid with (usually) equal-sized cells
• Location of each cell calculated from origin of grid: column, row
• Cells often called pixels (picture elements); raster data often called image data
• Attributes are recorded by assigning each cell a single value based on the majority
feature (attribute) in the cell, such as land use type.
• Easy to do overlays/analyses, just by ‘combining’ corresponding cell values:
“yield = rainfall + fertilizer” (why raster is faster, at least for some things)
• Simple data structure: directly store each layer as a single table
[Figure: land-use map with areas labeled corn, fruit, clover, wheat and fruit, and its raster encoding:]
    0 1 2 3 4 5 6 7 8 9
0   1 1 1 1 1 4 4 5 5 5
1   1 1 1 1 1 4 4 5 5 5
2   1 1 1 1 1 4 4 5 5 5
3   1 1 1 1 1 4 4 5 5 5
4   1 1 1 1 1 4 4 5 5 5
5   2 2 2 2 2 2 2 3 3 3
6   2 2 2 2 2 2 2 3 3 3
7   2 2 2 2 2 2 2 3 3 3
8   2 2 4 4 2 2 2 3 3 3
9   2 2 4 4 2 2 2 3 3 3
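The cell-by-cell overlay described on this slide can be sketched in a few lines of Python. The helper name `overlay` and the small rainfall and fertilizer grids are hypothetical and only illustrate the idea of combining corresponding cells.

```python
def overlay(layer_a, layer_b, combine):
    """Combine two rasters of identical shape cell by cell."""
    same_shape = len(layer_a) == len(layer_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(layer_a, layer_b)
    )
    if not same_shape:
        raise ValueError("rasters must have the same number of rows and columns")
    return [
        [combine(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]

# Hypothetical 2 x 2 layers covering the same area
rainfall = [[10, 12], [8, 9]]
fertilizer = [[1, 0], [2, 2]]

# "yield = rainfall + fertilizer", evaluated independently for every cell
crop_yield = overlay(rainfall, fertilizer, lambda r, f: r + f)
print(crop_yield)
```

Because each output cell depends only on the cells at the same row and column, no geometric intersection is needed, which is one reason raster overlay is simple.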
Raster Data Structures
•Square grid: equal length sides
–4-connected neighborhood (rook’s case)
•all neighboring cells are equidistant
–8-connected neighborhood (queen’s case)
•all neighboring cells not equidistant
•Rectangular
–commonly occurs for lat/long data when projected
–data collected at 1 degree by 1 degree will be varying-sized rectangles
•triangular (3-sided) and hexagonal (6-sided)
–all adjacent cells and points are equidistant
•triangulated irregular network (TIN):
–vector model used to represent continuous surfaces (elevation)
–more later under vector
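The difference between the rook's and queen's cases on a square grid can be sketched as follows. The offsets and function name are illustrative; distances are measured between cell centres, in units of one cell width.

```python
import math

# Row/column offsets of the neighbors in each case
ROOK_OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
QUEEN_OFFSETS = ROOK_OFFSETS + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def neighbors(row, col, n_rows, n_cols, offsets):
    """Return (row, col, distance) for every neighbor that lies inside the grid."""
    result = []
    for d_row, d_col in offsets:
        r, c = row + d_row, col + d_col
        if 0 <= r < n_rows and 0 <= c < n_cols:
            result.append((r, c, math.hypot(d_row, d_col)))
    return result

# Centre cell of a 3 x 3 grid
print(neighbors(1, 1, 3, 3, ROOK_OFFSETS))   # four neighbors, all at distance 1
print(neighbors(1, 1, 3, 3, QUEEN_OFFSETS))  # eight neighbors, diagonals are farther
```

In the queen's case the diagonal neighbors lie at the square root of 2 cell widths, which is why the slide notes that not all neighboring cells are equidistant.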
Raster Data Structures
Runlength Compression (for single layer)
Full Matrix (162 bytes)     Run Length, by row (44 bytes)
111111122222222223          1,7,2,17,3,18
111111122222222233          1,7,2,16,3,18
111111122222222333          1,7,2,15,3,18
111111222222223333          1,6,2,14,3,18
111113333333333333          1,5,3,18
111113333333333333          1,5,3,18
111113333333333333          1,5,3,18
111333333333333333          1,3,3,18
111333333333333333          1,3,3,18
This is a “lossless” compression, as opposed to “lossy,” since the original data
can be exactly reproduced.
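A minimal Python sketch of this run-length scheme follows. It assumes, as the slide's numbers suggest, that each run is stored as a pair (cell value, last column of the run) with columns counted from 1; the function names are illustrative.

```python
def rle_encode_row(row):
    """Encode a string of cell values as [value, last_column, value, last_column, ...]."""
    encoded = []
    for col, value in enumerate(row, start=1):
        # A run ends at the last column or where the next cell has a different value
        if col == len(row) or row[col] != value:
            encoded += [int(value), col]
    return encoded

def rle_decode_row(encoded):
    """Rebuild the original row, showing that the compression is lossless."""
    cells, start = [], 0
    for value, end in zip(encoded[0::2], encoded[1::2]):
        cells.append(str(value) * (end - start))
        start = end
    return "".join(cells)

matrix = [
    "111111122222222223",
    "111111122222222233",
    "111111122222222333",
    "111111222222223333",
    "111113333333333333",
    "111113333333333333",
    "111113333333333333",
    "111333333333333333",
    "111333333333333333",
]

encoded = [rle_encode_row(row) for row in matrix]
print(encoded[0])                          # first row of the slide
print(sum(len(row) for row in matrix))     # stored values, full matrix
print(sum(len(row) for row in encoded))    # stored values, run-length encoded
```

Counting one byte per stored value, the two totals correspond to the 162 and 44 bytes quoted on the slide.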
Raster Model
Raster data are good at representing continuous phenomena, e.g.,
•Wind speed
•Elevation, slope, aspect
•Chemical concentration
•Likelihood of existence of a certain species
•Electromagnetic reflectance (photographic or
satellite imagery)
Raster Model
Best for continuous features:
•elevation
•temperature
•soil type
•land use
Much data comes in this form:
•images from remote sensing (LANDSAT, SPOT)
•scanned maps
•digital orthophoto
•digital elevation model (DEM)
Raster Model: Pros and Cons
• [+] Continuous (surface) data represented
easily
• [+] Simple data structure, fast indexing
• [–] Shape of discrete polygonal features
generalized by cells
• [–] Intersection of two lines
Vector Format
• point (node): 0-dimension
– single x,y coordinate pair
– zero area
– tree, oil well, label location
• line (arc): 1-dimension
– two (or more) connected x,y
coordinates
– road, stream
• polygon : 2-dimensions
– four or more ordered and
connected x,y coordinates
– first and last x,y pairs are the
same
– encloses an area
– census tracts, county, lake
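The three vector primitives can be sketched as plain coordinate lists. The example below is illustrative only: the coordinates are hypothetical, and the area uses the standard shoelace formula, which assumes planar coordinates and a simple (non-self-intersecting) ring.

```python
import math

point = (2.0, 3.0)                               # single x,y pair, zero area
line = [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0)]      # two or more connected pairs
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0),   # ordered pairs; first and
           (0.0, 3.0), (0.0, 0.0)]               # last pair are the same

def line_length(coords):
    """Sum of the straight segment lengths along a line."""
    return sum(math.dist(p, q) for p, q in zip(coords, coords[1:]))

def is_closed_ring(coords):
    """A polygon ring needs at least four pairs and must end where it starts."""
    return len(coords) >= 4 and coords[0] == coords[-1]

def ring_area(coords):
    """Enclosed area of a closed ring (shoelace formula)."""
    if not is_closed_ring(coords):
        raise ValueError("polygon ring must be closed")
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(coords, coords[1:]))
    return abs(twice_area) / 2

print(line_length(line))
print(ring_area(polygon))
```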
55
Point Data using the Vector Model:
data implementation
•Features in the theme (coverage) have unique identifiers: point ID, polygon ID,
arc ID, etc.
•Common identifiers provide a link to:
–coordinates table (for ‘where’)
–attributes table (for ‘what’)
[Figure: five numbered points plotted on X and Y axes]
Coordinates Table
Point ID   x   y
1          1   3
2          2   1
3          4   1
4          1   2
5          3   2
Attributes Table
Point ID   model   year
1          a       90
2          b       90
3          b       80
4          a       70
5          c       70
•Again, concepts are those of a relational database, which
is really a prerequisite for the vector model
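The link between the two tables through the common Point ID can be sketched with Python dictionaries standing in for relational tables. The values are those of the slide; the function name is an assumption made for illustration.

```python
# Point ID -> (x, y): the "where"
coordinates = {1: (1, 3), 2: (2, 1), 3: (4, 1), 4: (1, 2), 5: (3, 2)}

# Point ID -> attributes: the "what"
attributes = {
    1: {"model": "a", "year": 90},
    2: {"model": "b", "year": 90},
    3: {"model": "b", "year": 80},
    4: {"model": "a", "year": 70},
    5: {"model": "c", "year": 70},
}

def join_on_point_id(coords, attrs):
    """Join both tables on the shared key, like a relational inner join."""
    return {
        point_id: {"x": xy[0], "y": xy[1], **attrs[point_id]}
        for point_id, xy in coords.items()
        if point_id in attrs
    }

features = join_on_point_id(coordinates, attributes)

# Attribute query answered with locations: where are the points of model "b"?
model_b = [(f["x"], f["y"]) for f in features.values() if f["model"] == "b"]
print(model_b)
```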
Vector Model
Lines: fundamental spatial data model
[Figure: a line running between two nodes, with four vertices in between]
• Lines start and end at nodes
- line #1 goes from node #2 to node #1
• Vertices determine shape of line
• Nodes and vertices are stored as coordinate pairs
Vector Model
Polygons: fundamental spatial data model
• complex data model, especially for larger data sets
• arc-node topology
Vector Model
Polygons: fundamental spatial data model
59
Node/Arc/Polygon and Attribute Data
Relational Representation: DBMS required!
[Figure: polygon A34 (Smith Estate) bounded by arcs I to IV between nodes 1 to 4, with polygon A35 on the other side of arc III; arc II is labeled Birch and arc IV is labeled Cherry]
Spatial Data
Node Table
Node ID   Easting   Northing
1         126.5     578.1
2         218.6     581.9
3         224.2     470.4
4         129.1     471.9
Arc Table
Arc ID   From N   To N   L Poly   R Poly
I        4        1      -        A34
II       1        2      -        A34
III      2        3      A35      A34
IV       3        4      -        A34
Polygon Table
Polygon ID   Arc List
A34          I, II, III, IV
A35          III, VI, VII, XI
Attribute Data
Node Feature Attribute Table
Node ID   Control   Crosswalk   ADA?
1         light     yes         yes
2         stop      no          no
3         yield     no          no
4         none      yes         no
Arc Feature Attribute Table
Arc ID   Length   Condition   Lanes   Name
I        106      good        4       -
II       92       poor        4       Birch
III      111      fair        2       -
IV       95       fair        2       Cherry
Polygon Feature Attribute Table
Polygon ID   Owner      Address
A34          J. Smith   500 Birch
A35          R. White   200 Main
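As a rough sketch of how the node, arc and polygon tables work together, the Python below derives arc lengths and the boundary of polygon A34 from the slide's values. It assumes each arc is a straight segment between its two nodes (no intermediate vertices) and that the arc list is ordered head to tail; the function names are illustrative.

```python
import math

# Node Table: Node ID -> (easting, northing)
nodes = {1: (126.5, 578.1), 2: (218.6, 581.9), 3: (224.2, 470.4), 4: (129.1, 471.9)}

# Arc Table: Arc ID -> (from node, to node)
arcs = {"I": (4, 1), "II": (1, 2), "III": (2, 3), "IV": (3, 4)}

# Polygon Table: Polygon ID -> ordered arc list
polygon_arcs = {"A34": ["I", "II", "III", "IV"]}

def arc_length(arc_id):
    """Length of an arc, assuming a straight segment between its end nodes."""
    from_node, to_node = arcs[arc_id]
    return math.dist(nodes[from_node], nodes[to_node])

def polygon_ring(polygon_id):
    """Chain a polygon's arcs into a closed ring of node IDs."""
    ring = []
    for arc_id in polygon_arcs[polygon_id]:
        from_node, to_node = arcs[arc_id]
        if not ring:
            ring.append(from_node)
        elif ring[-1] != from_node:
            raise ValueError("arcs are not connected head to tail")
        ring.append(to_node)
    return ring

for arc_id in arcs:
    print(arc_id, round(arc_length(arc_id), 2))
print(polygon_ring("A34"))
```

Under these assumptions the computed lengths come out close to the Length column of the arc attribute table, which illustrates that such attributes can be derived from the stored geometry.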
Variety of Vector Models
Spaghetti model
Topological model (most common)
Triangulated irregular network (TIN)
Vector Model: Spaghetti
Very efficient algorithms to
detect properties
Source: Lakhan, V. Chris. (1996).
Introductory Geographical Information Systems. p. 54.
Vector Model: Topological
The topological data model uses
four relations:
R1: every line has two endpoints
R2: every line has two areas
R3: every area is surrounded by
lines
R4: every point is surrounded by
areas and lines
Bernhardsen, Tor. (1999). 2nd Ed. Geographic Information Systems: An Introduction. p. 62. fig. 4.12.
TIN: Triangulated Irregular Network Surface
Points
Node #   X     Y      Z
1        0     999    1456
2        525   1437   1437
3        631   886    1423
etc
Polygons
Polygon   Node #s   Topology
A         1,2,4     B,D
B         2,3,4     A,E,C
C         3,4,5     B,F,G
D         1,4,6     A,H
etc
Attribute Info. Database
Polygons   Var 1   Var 2
A          1473    15
B          1490    100
C          1533    150
D          1486    270
etc.
• Elevation points (nodes) are chosen based on relief complexity, and
then their 3-D location (x,y,z) is determined.
• Elevation points are connected to form a set of triangular
polygons; these are then represented in a vector structure.
• Attribute data (e.g. slope, aspect, soils, etc.) are associated
via a relational DBMS.
[Figure: TIN with nodes 1 to 6 and triangles A to H]
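One common use of a TIN is to estimate elevation anywhere inside a triangle from its three nodes. The sketch below uses linear (barycentric) interpolation on the plane through the nodes; this is one possible approach, not necessarily what a given GIS package does, and the triangle coordinates are hypothetical.

```python
def interpolate_z(p, a, b, c):
    """Elevation at 2D point p inside the triangle with 3D nodes a, b, c.

    Each node is (x, y, z); p is (x, y). The result lies on the plane
    through the three nodes.
    """
    (x, y) = p
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle: nodes are collinear")
    # Barycentric weights of p with respect to the three nodes
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < -1e-12:
        raise ValueError("point lies outside the triangle")
    return w1 * z1 + w2 * z2 + w3 * z3

# Hypothetical triangle: z rises by 1 per unit of x and by 2 per unit of y
node_a, node_b, node_c = (0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)
print(interpolate_z((2.0, 2.0), node_a, node_b, node_c))
```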
66
Advantages and Disadvantages of Raster and Vector Data
https://grindgis.com/what-is-gis/what-is-gis-definition
Raster:
• The raster data model records a value for every point of the area covered, which
requires more data storage than the vector model.
• Raster data is computationally less expensive to create than vector graphics.
• Raster data has issues when overlaying multiple images.
Vector:
• Vector data are easily overlaid; for example, overlaying roads, rivers and
land use is easier than with raster data.
• Vector data are easier to scale, re-project or register.
• Vector data are more compatible with relational database management systems.
• Vector file sizes are much smaller than raster image file sizes.
• Vector data are easier to update: for example, a river can simply be added,
whereas a raster image has to be recreated.
67
4. Regression models and Process
Models
• Regression model relates a dependent
variable to a number of independent
(explanatory) variables in an equation which
can then be used for prediction or estimation
• Regression model can use an overlay
operation in GIS to combine variables needed
for the analysis
68
Linear Regression Model
• A multiple linear regression model is defined by
Y = a + b1 X1 + b2 X2 + ... + bn Xn
where Y is the dependent variable, Xi (i = 1, ..., n) are the
independent variables, b1, ..., bn are the regression
coefficients, and a is the intercept
The primary purpose of linear regression is to
predict values of Y from values of Xi
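To make the prediction step concrete, here is a small pure-Python sketch that estimates the intercept a and the coefficients b1, ..., bn by ordinary least squares (normal equations solved by Gauss-Jordan elimination). The function names and the sample data are hypothetical; in practice a statistics library would normally be used.

```python
def fit_linear_regression(xs, ys):
    """Return [a, b1, ..., bn] minimizing the squared error of Y = a + sum(bi * Xi).

    xs is a list of observations, each a list [x1, ..., xn]; ys holds the Y values.
    """
    rows = [[1.0] + [float(v) for v in x] for x in xs]  # leading 1 carries the intercept
    k = len(rows[0])
    # Augmented matrix of the normal equations (X^T X) beta = X^T y
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear or there are too few observations")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def predict(coefficients, x):
    """Predict Y for one observation x = [x1, ..., xn]."""
    return coefficients[0] + sum(b * xi for b, xi in zip(coefficients[1:], x))

# Hypothetical data generated from Y = 1 + 2*X1 + 3*X2
xs = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
ys = [9, 8, 19, 18, 26]
coefficients = fit_linear_regression(xs, ys)
print(coefficients)
print(predict(coefficients, [6, 2]))
```

In a GIS workflow, each observation would typically be one cell or feature, with the Xi values taken from overlaid layers.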
69
Linear Regression Model
70
Regression model applications
• Modeling traffic accidents as a function of speed, road conditions, weather,
and so forth, to inform policy aimed at decreasing accidents.
• Modeling property loss from fire as a function of variables such as degree of
fire department involvement, response time, or property values.
• Measuring the extent that changes in one or more variables jointly affect
changes in another. Example: Understand the key characteristics of the habitat
for some particular endangered species of bird (perhaps precipitation, food
sources, vegetation, predators) to assist in designing legislation aimed at
protecting that species.
• Typical applications include bird habitat identification, rainfall-triggered landslide
modeling, grassland bird habitat prediction, and attitudes towards national park
designation
71
Process Model
• A process model integrates existing knowledge about environmental processes in
the real world into a set of relationships and equations for quantifying the
processes
• A process model offers both a predictive capability and an explanation that is
inherent in the proposed processes
• Therefore process models are by definition predictive and dynamic models
• Environmental models are very complex and data intensive
• Environmental models are typically process models because they must deal with
the interaction of many variables including physical variables such as climate,
topography, vegetation, and soils as well as cultural variables such as land
management
72
Revised Universal Soil Loss Equation (RUSLE)
• RUSLE is a model that is widely used to estimate average annual
nonchannelized soil loss.
• Soil erosion is an environmental process that involves climate, soil
properties, topography, soil surface conditions and human activities
• A well-known model of soil erosion is the Revised Universal Soil Loss
Equation (RUSLE)
• RUSLE predicts the average soil loss carried by runoff from specific field
slopes in specified cropping and management systems, and from rangeland
73
RUSLE is a multiplicative model
• with six factors
A = R*K*L*S*C*P
where A is the average soil loss,
R is the rainfall-runoff erosivity factor,
K is the soil erodibility factor,
L is the slope length factor,
S is the slope steepness factor,
C is the crop management factor (land cover), and
P is the support practice factor (conservation)
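Because RUSLE is a simple product, it can be evaluated cell by cell once the six factor layers are available. The Python sketch below shows this for single values and for small rasters; the function names and all factor values are hypothetical and chosen only to show the arithmetic.

```python
def rusle_soil_loss(r, k, l, s, c, p):
    """Average soil loss A = R * K * L * S * C * P for one location."""
    factors = (r, k, l, s, c, p)
    if any(f < 0 for f in factors):
        raise ValueError("RUSLE factors must be non-negative")
    a = 1.0
    for f in factors:
        a *= f
    return a

def rusle_raster(r, k, l, s, c, p):
    """Apply the equation to six factor rasters of identical shape, cell by cell."""
    return [
        [rusle_soil_loss(*cell) for cell in zip(*rows)]
        for rows in zip(r, k, l, s, c, p)
    ]

# Hypothetical factor values for a single cell
print(rusle_soil_loss(100.0, 0.3, 1.2, 1.5, 0.2, 0.5))

# Hypothetical 1 x 2 rasters: only the cover factor C differs between the cells
loss = rusle_raster([[100.0, 100.0]], [[0.3, 0.3]], [[1.2, 1.2]],
                    [[1.5, 1.5]], [[0.2, 0.4]], [[0.5, 0.5]])
print(loss)
```

Since the model is multiplicative, doubling any one factor (here C) doubles the estimated soil loss, and a factor of zero removes it entirely.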
74
Example of RUSLE model
75
Practices
1. Introduction to QGIS
2. Open a raster file
3. Read information about the raster
4. Subset the raster by an extent
5. Design a geographic theme and attribute
information
6. Create a point layer with attribute information
7. Create a line layer with attribute information
8. Create a region layer with attribute information
76