Data Manipulation
Data manipulation is performed to obtain useful information from the data previously entered into the system. It encompasses two types of operations:
1. Operations to eliminate errors and update current datasets.
2. Operations that use analytical techniques to answer specific questions formulated by the user.
The manipulation process can range from a simple overlay of two or more maps to the complex extraction of disparate information elements from a wide variety of sources.
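At its simplest, manipulation is the overlay of two co-registered layers. The following is a minimal sketch. It assumes both maps are raster grids with the same extent and cell size. The layer names and the suitability rule are hypothetical. A cell is flagged only where both input conditions hold.

```python
# Minimal raster overlay sketch: two co-registered grids (same extent and
# cell size) combined cell by cell. Layer names and thresholds are hypothetical.

Grid = list[list[float]]


def overlay(a: Grid, b: Grid, rule) -> list[list[bool]]:
    """Apply rule(value_a, value_b) to every pair of aligned cells."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("grids must have the same shape")
    return [[rule(va, vb) for va, vb in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical layers: slope in degrees and distance to the nearest road in metres.
slope = [[2.0, 8.0, 15.0],
         [4.0, 3.0, 12.0]]
dist_road = [[100.0, 900.0, 50.0],
             [300.0, 200.0, 1500.0]]

# A cell is "suitable" where slope < 10 degrees AND it lies within 500 m of a road.
suitable = overlay(slope, dist_road, lambda s, d: s < 10 and d <= 500)
for row in suitable:
    print(row)  # [True, False, False] then [True, True, False]
```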
Geographic Information Systems (GIS), together with Computer-Aided Design (CAD), are an integral part of this work. This work includes the visual representation of data and the integration of data management solutions.
Data Creation
Modern GIS technologies work with digital information, and various methods are used to create digital data. The most widely used method is digitization. In digitization, information from a printed map or collected in the field is transferred to a digital medium using a Computer-Aided Design (CAD) program with georeferencing capabilities.
Ortho-rectified images (from both satellites and aircraft) are now widely available, so digitizing from these images is becoming the main way of extracting geographic data. This form of digitization involves tracing geographic data directly on aerial images. It replaces the traditional method of locating geographic shapes on a digitizing tablet.
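Digitizing on an ortho-rectified image works because each pixel position can be converted to map coordinates. The following is a minimal sketch. It assumes a six-parameter affine transform of the kind stored in an ESRI world file, with the image aligned to north so the rotation terms are zero. The parameter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Affine:
    """Six-parameter affine transform (world-file convention, assumed):
    x = a*col + b*row + c ;  y = d*col + e*row + f
    where (c, f) are the map coordinates of the centre of the upper-left pixel."""
    a: float  # pixel width in map units
    b: float  # rotation term (0 for north-up images)
    c: float  # x of upper-left pixel centre
    d: float  # rotation term (0 for north-up images)
    e: float  # pixel height in map units (negative: rows increase downwards)
    f: float  # y of upper-left pixel centre

    def pixel_to_map(self, col: float, row: float) -> tuple[float, float]:
        return (self.a * col + self.b * row + self.c,
                self.d * col + self.e * row + self.f)

    def map_to_pixel(self, x: float, y: float) -> tuple[float, float]:
        # Invert the 2x2 linear part, then remove the translation.
        det = self.a * self.e - self.b * self.d
        dx, dy = x - self.c, y - self.f
        col = (self.e * dx - self.b * dy) / det
        row = (-self.d * dx + self.a * dy) / det
        return col, row


# Hypothetical 0.5 m orthophoto.
ortho = Affine(a=0.5, b=0.0, c=500000.0, d=0.0, e=-0.5, f=4200000.0)
print(ortho.pixel_to_map(100, 200))  # (500050.0, 4199900.0)
```

A vertex clicked at pixel (100, 200) is therefore stored at map coordinates (500050, 4199900). `map_to_pixel` performs the reverse lookup.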
Data Representation
GIS data represent real-world objects (roads, land use, elevations). Real-world objects can be divided into two abstractions: discrete objects (a house) and continuous fields (amount of rainfall, elevation). There are two ways to store data in a GIS: raster and vector.
Raster
Raster data are essentially any type of digital image represented as a grid. The raster, or grid, model of GIS focuses on the properties of space rather than on the accuracy of location. It divides space into regular cells, and each cell holds a single value.
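The grid model can be illustrated by finding the cell that contains a map coordinate. The following is a minimal sketch. It assumes a north-up grid defined by the coordinates of its upper-left corner and a square cell size. All values are hypothetical.

```python
import math


class Raster:
    """North-up raster: a regular grid of cells, each holding one value."""

    def __init__(self, values, x_min, y_max, cell_size):
        self.values = values          # values[row][col]; row 0 is the northern edge
        self.x_min = x_min            # x of the grid's upper-left corner
        self.y_max = y_max            # y of the grid's upper-left corner
        self.cell_size = cell_size    # cell width = cell height, in map units

    def cell_of(self, x, y):
        """Return (row, col) of the cell containing (x, y), or None if outside."""
        col = math.floor((x - self.x_min) / self.cell_size)
        row = math.floor((self.y_max - y) / self.cell_size)
        n_rows, n_cols = len(self.values), len(self.values[0])
        if 0 <= row < n_rows and 0 <= col < n_cols:
            return row, col
        return None

    def value_at(self, x, y):
        cell = self.cell_of(x, y)
        return None if cell is None else self.values[cell[0]][cell[1]]


# Hypothetical 3 x 3 elevation grid with 30 m cells.
dem = Raster([[120, 125, 130],
              [118, 122, 128],
              [115, 119, 124]], x_min=0.0, y_max=90.0, cell_size=30.0)
print(dem.cell_of(45.0, 50.0))   # (1, 1)
print(dem.value_at(45.0, 50.0))  # 122
```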
Vector
In a GIS, geographic features are often expressed as vectors,
maintaining the geometric characteristics of the figures.
Vector representations focus on the precise location of geographic elements in space. They suit phenomena that are discrete, that is, phenomena with defined boundaries. Each geometry is linked to a row in a database that describes its attributes. For example, a database describing lakes may contain information about their bathymetry, water quality or level of pollution. This information can be used to create a map that depicts a particular attribute contained in the database. For example, lakes can be shown in a range of colors according to their pollution level. In addition, the geometries of different elements can be compared. For example, a GIS can identify wells (point geometry) that lie within about 2 kilometers of a lake (polygon geometry) and have a high level of contamination.
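The well query combines a distance test against a polygon with an attribute filter. The following is a minimal sketch. It assumes projected coordinates in metres, a single lake polygon and a hypothetical contamination threshold. A production GIS would normally use a spatial library and a spatial index instead.

```python
import math

Point = tuple[float, float]


def point_in_polygon(p: Point, poly: list[Point]) -> bool:
    """Ray-casting test: count edge crossings of a horizontal ray from p."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def dist_point_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def dist_point_polygon(p: Point, poly: list[Point]) -> float:
    """0 inside the polygon, otherwise distance to the nearest edge."""
    if point_in_polygon(p, poly):
        return 0.0
    return min(dist_point_segment(p, poly[i], poly[(i + 1) % len(poly)])
               for i in range(len(poly)))


# Hypothetical data (metres): a square lake and wells with a contamination level.
lake = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0), (0.0, 1000.0)]
wells = {
    "W1": ((1500.0, 500.0), 0.9),   # 500 m from the lake, high contamination
    "W2": ((4000.0, 500.0), 0.95),  # 3000 m from the lake
    "W3": ((500.0, 2500.0), 0.2),   # 1500 m from the lake, low contamination
}
HIGH = 0.8  # hypothetical threshold for "high" contamination

selected = sorted(name for name, (xy, level) in wells.items()
                  if dist_point_polygon(xy, lake) <= 2000.0 and level >= HIGH)
print(selected)  # ['W1']
```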
Vector features can be created so that they respect spatial integrity. This is done by applying topological rules such as "polygons must not overlap". Vector data can also represent continuously varying phenomena. Contour lines and triangulated irregular networks (TIN) are used to represent elevation or other continuously changing values. A TIN stores values recorded at point locations, which are connected by lines to form an irregular mesh of triangles. The faces of the triangles represent, for example, the land surface.
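Inside a TIN, the value at any location within a triangle can be estimated from the values at its three vertices. The following is a minimal sketch. It assumes linear (planar) interpolation on the triangle face using barycentric coordinates, which is one common choice. The vertex coordinates and elevations are hypothetical.

```python
Vertex = tuple[float, float, float]  # (x, y, z): z is the recorded value, e.g. elevation


def barycentric(x: float, y: float, a: Vertex, b: Vertex, c: Vertex):
    """Barycentric weights (wa, wb, wc) of point (x, y) in triangle abc."""
    (xa, ya, _), (xb, yb, _), (xc, yc, _) = a, b, c
    det = (yb - yc) * (xa - xc) + (xc - xb) * (ya - yc)
    if det == 0:
        raise ValueError("degenerate triangle")
    wa = ((yb - yc) * (x - xc) + (xc - xb) * (y - yc)) / det
    wb = ((yc - ya) * (x - xc) + (xa - xc) * (y - yc)) / det
    return wa, wb, 1.0 - wa - wb


def tin_value(x: float, y: float, a: Vertex, b: Vertex, c: Vertex) -> float:
    """Planar interpolation on the triangle face; fails if (x, y) is outside it."""
    w = barycentric(x, y, a, b, c)
    if min(w) < -1e-12:
        raise ValueError("point lies outside this triangle")
    return w[0] * a[2] + w[1] * b[2] + w[2] * c[2]


# Hypothetical triangle: three surveyed points with elevations in metres.
a, b, c = (0.0, 0.0, 100.0), (10.0, 0.0, 110.0), (0.0, 10.0, 120.0)
print(round(tin_value(2.0, 3.0, a, b, c), 6))  # 108.0
```

At each vertex the function returns that vertex's recorded value. Between vertices it follows the flat face of the triangle.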
Both the raster and the vector data model have advantages and disadvantages for representing reality.
Advantages

Vector:
- The data structure is compact. Only the digitized elements are stored, so less memory is required for storage and processing.
- Efficient encoding of topology and spatial operations.
- Good graphic output. Elements are represented as vector graphics that do not lose definition when the display scale is enlarged.
- Greater compatibility with relational database environments.
- Rescaling and reprojection operations are easier to perform.
- Data are easier to maintain and update.
- Greater analytical capacity, especially for networks.

Raster:
- The data structure is very simple.
- Overlay operations are very simple.
- Optimal format for data with high variation.
- Good storage of digital images.

Disadvantages

Vector:
- The data structure is more complex.
- Overlay operations are more difficult to implement and represent.
- Reduced effectiveness when data variation is high.
- It is a more laborious format to keep up to date.
- It stores a very limited amount of information.

Raster:
- Greater storage memory requirement, since all cells contain data.
- Topological rules are more difficult to generate.
- Graphic outputs are less striking and aesthetic.
- Depending on the resolution of the raster file, elements may have their original boundaries more or less well defined.
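The difference in storage requirements can be illustrated with rough arithmetic. This is a back-of-the-envelope sketch under hypothetical assumptions: a square study area, 4-byte raster cells, and vertices stored as pairs of 8-byte coordinates. It ignores compression, file headers and attribute data.

```python
def raster_bytes(extent_m: float, cell_m: float, bytes_per_cell: int = 4) -> int:
    """Every cell of a square grid stores a value, whether or not it holds a feature."""
    cells_per_side = int(extent_m // cell_m)
    return cells_per_side * cells_per_side * bytes_per_cell


def vector_bytes(n_vertices: int, bytes_per_coord: int = 8) -> int:
    """Only the digitized vertices are stored, as (x, y) pairs."""
    return n_vertices * 2 * bytes_per_coord


# Hypothetical case: one lake boundary with 500 vertices in a 10 km x 10 km area.
print(raster_bytes(10_000, 10))  # 4000000 bytes at 10 m resolution
print(vector_bytes(500))         # 8000 bytes
```

Halving the raster cell size quadruples the raster storage. The vector size depends only on the number of vertices digitized.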
Non-Spatial Data
Non-spatial data can also be stored alongside spatial data, that is, alongside the data represented by the coordinates of a vector geometry or by the position of a raster cell. In vector data, the additional data contain attributes of the geographic entity. For example, a polygon in a forest inventory can also have an identifier value and information about tree species. In raster data, the cell value can store attribute information. It can also serve as an identifier that refers to the records of a table.
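When a raster cell value acts as an identifier, attributes are retrieved by joining the cell value to a table. Some GIS software calls this table a value attribute table. The following is a minimal sketch with a hypothetical forest-stand grid and attribute table.

```python
# Hypothetical forest-inventory raster: each cell stores a stand identifier.
stand_grid = [
    [1, 1, 2],
    [1, 3, 2],
    [3, 3, 2],
]

# Hypothetical attribute table keyed by the identifier stored in the cells.
stand_table = {
    1: {"species": "Pinus sylvestris", "age_years": 40},
    2: {"species": "Quercus robur", "age_years": 85},
    3: {"species": "Fagus sylvatica", "age_years": 60},
}


def attribute_at(row: int, col: int, field: str):
    """Read the identifier in the cell, then look up the requested attribute."""
    stand_id = stand_grid[row][col]
    return stand_table[stand_id][field]


def area_by_attribute(field: str, cell_area_m2: float) -> dict:
    """Total area per attribute value: visit each cell, join, and accumulate."""
    totals: dict = {}
    for row in stand_grid:
        for stand_id in row:
            key = stand_table[stand_id][field]
            totals[key] = totals.get(key, 0.0) + cell_area_m2
    return totals


print(attribute_at(1, 2, "species"))  # Quercus robur
print(area_by_attribute("species", 100.0))
```

Each cell holds one small integer, and the descriptive information is stored once per stand in the table.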
Data Capture
Data capture, that is, the entry of information into the system, consumes most of the time of GIS professionals. A wide variety of methods are used to enter data into a GIS, where the data are stored in digital format.
Data printed on paper or maps on PET film can be digitized or
scanned to produce digital data.
Digitizing analog cartography produces vector data by tracing points, lines and polygon boundaries. This work can be done manually or with vectorization programs that automate the work on a scanned map. In the latter case, however, manual review and editing will always be necessary, depending on the desired level of quality.
Data from topographic surveys can be entered directly into a GIS from digital data capture instruments using a technique called analytical (coordinate) geometry. In addition, position coordinates taken with the Global Positioning System (GPS) can also be entered directly into a GIS.
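Coordinate geometry computes positions from surveyed bearings and distances. The following is a minimal sketch of an open traverse. It assumes plane (projected) coordinates and azimuths measured clockwise from grid north in degrees. The benchmark coordinates and observations are hypothetical.

```python
import math


def traverse(start: tuple[float, float], legs: list[tuple[float, float]]):
    """Open traverse: from a known point, apply (azimuth_deg, distance) legs in turn.

    With azimuth measured clockwise from north:
    dE = distance * sin(azimuth) and dN = distance * cos(azimuth).
    """
    e, n = start
    points = [(e, n)]
    for azimuth_deg, distance in legs:
        az = math.radians(azimuth_deg)
        e += distance * math.sin(az)
        n += distance * math.cos(az)
        points.append((e, n))
    return points


# Hypothetical survey: a known benchmark followed by three observed legs.
pts = traverse((1000.0, 5000.0), [(90.0, 100.0), (0.0, 50.0), (225.0, 70.71)])
for easting, northing in pts:
    print(f"{easting:.2f}, {northing:.2f}")
```

The computed points can then be loaded into the GIS as vertices of a line or polygon, just like digitized data.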
Remote sensing also plays an important role in data collection. It uses sensors such as cameras, scanners or lidar mounted on mobile platforms such as aircraft or satellites.
A GIS designed to calculate optimal routes for emergency services can determine the shortest path between two points. It takes into account street and traffic directions as well as prohibited directions, and it avoids impassable areas.
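Shortest-path routing on a street network is commonly solved with Dijkstra's algorithm on a directed graph. The following is a minimal sketch based on three assumptions. One-way streets are modelled as single directed edges. Prohibited movements are simply left out. Impassable intersections are passed in as blocked nodes. The node names and street lengths are hypothetical.

```python
import heapq


def shortest_path(edges, source, target, blocked=frozenset()):
    """Dijkstra's algorithm on a directed graph with non-negative lengths.

    edges: iterable of (from_node, to_node, length). A two-way street is
    listed twice, a one-way street once. Nodes in `blocked` are avoided.
    Returns (total_length, [nodes]) or (inf, []) if target is unreachable.
    """
    graph: dict = {}
    for u, v, w in edges:
        if u in blocked or v in blocked:
            continue
        graph.setdefault(u, []).append((v, w))

    dist = {source: 0.0}
    prev: dict = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical street network (lengths in metres); A -> B is one-way.
streets = [
    ("Station", "A", 400), ("A", "Station", 400),
    ("A", "B", 300),
    ("Station", "C", 200), ("C", "Station", 200),
    ("C", "B", 600), ("B", "C", 600),
    ("B", "Hospital", 250), ("Hospital", "B", 250),
    ("C", "Hospital", 900), ("Hospital", "C", 900),
]
print(shortest_path(streets, "Station", "Hospital"))
print(shortest_path(streets, "Station", "Hospital", blocked={"A"}))
```

Because A -> B is one-way, the return trip from the hospital cannot reuse the outbound route.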
A GIS for managing a water supply network could, for example, determine how many subscribers would be affected by a service cut at a given point in the network.
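The affected subscribers can be found with a connectivity analysis. A node keeps its supply only if it is still connected to a source once the closed pipe is removed. The following is a minimal sketch under that assumption. It checks connectivity only, not pressure or hydraulic capacity. The pipe layout and subscriber counts are hypothetical.

```python
from collections import deque


def affected_subscribers(pipes, subscribers, sources, cut_pipe):
    """Count subscribers left without supply when one pipe is closed.

    pipes: list of (node_a, node_b) connections (flow direction ignored).
    subscribers: dict node -> number of subscribers served at that node.
    sources: nodes that feed the network (tanks, treatment plants).
    cut_pipe: the (node_a, node_b) pipe that is closed.
    """
    closed = frozenset(cut_pipe)
    adjacency: dict = {}
    for a, b in pipes:
        if frozenset((a, b)) == closed:
            continue
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    # Breadth-first search from every source over the remaining pipes.
    reached = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)

    return sum(n for node, n in subscribers.items() if node not in reached)


# Hypothetical branched network fed from one tank.
pipes = [("Tank", "J1"), ("J1", "J2"), ("J2", "J3"), ("J1", "J4"), ("J4", "J5")]
subscribers = {"J1": 20, "J2": 35, "J3": 50, "J4": 15, "J5": 40}
print(affected_subscribers(pipes, subscribers, ["Tank"], ("J1", "J2")))  # 85

# Adding a loop J3-J5 provides an alternative path, so nobody loses supply.
looped = pipes + [("J3", "J5")]
print(affected_subscribers(looped, subscribers, ["Tank"], ("J1", "J2")))  # 0
```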
A Geographic Information System can simulate flows along a linear network. Values such as slope, speed limit and service levels can be incorporated into the model to achieve greater accuracy. GIS network modeling is commonly used in transportation planning, hydrology and linear infrastructure management.
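In its simplest form, simulating flow along a linear network means routing each segment's local contribution downstream. The following is a minimal sketch. It assumes a dendritic (tree-shaped) river network in which every segment drains into at most one downstream segment, and steady, additive flows. Factors such as slope or travel time are not modelled. The segment names and inflows are hypothetical.

```python
from graphlib import TopologicalSorter


def accumulate_flow(downstream: dict, local_inflow: dict) -> dict:
    """Total flow leaving each segment = its local inflow + all upstream flow.

    downstream: segment -> the segment it drains into (None at the outlet).
    local_inflow: segment -> flow generated locally (e.g. m3/s).
    """
    # Map each segment to the segments that drain into it (its predecessors).
    upstream: dict = {seg: set() for seg in downstream}
    for seg, down in downstream.items():
        if down is not None:
            upstream[down].add(seg)

    total = {}
    # static_order() yields every segment after all of its upstream segments.
    for seg in TopologicalSorter(upstream).static_order():
        total[seg] = local_inflow.get(seg, 0.0) + sum(total[u] for u in upstream[seg])
    return total


# Hypothetical network: S1 and S2 join at S3, which drains to the outlet S4.
network = {"S1": "S3", "S2": "S3", "S3": "S4", "S4": None}
inflow = {"S1": 2.0, "S2": 1.5, "S3": 0.5, "S4": 0.25}
flows = accumulate_flow(network, inflow)
print(flows["S4"])  # 4.25
```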
Errors in Measurements
In theory, the true values of hydrological variables cannot be determined by measurement, because measurement errors cannot be completely eliminated. Measurement uncertainty is probabilistic in nature. It can be defined as the interval within which the true value is expected to lie with a given probability, or confidence level. The width of the confidence interval is also called the error band.
If the measurements are independent of each other, the uncertainty of the results can be estimated as follows: take about 20 to 25 observations, calculate the standard deviation, and then determine the confidence level of the results. In general, this procedure cannot be applied to hydrometric measurements, because the quantity being measured changes during the measurement period. For example, in the field it is clearly impossible to make consecutive flow measurements with a current meter at a constant stage. Consequently, the uncertainty must be estimated by examining the different sources of error in the measurement.
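For independent repeated observations of a constant quantity, the procedure above reduces to computing the sample standard deviation and a confidence interval for the mean. The following is a minimal sketch. It assumes normally distributed errors and a two-sided 95 % confidence level. Student's t critical values are tabulated only for 20 to 25 observations, and the readings are hypothetical.

```python
import math
import statistics

# Two-sided 95 % Student's t critical values, indexed by degrees of freedom (n - 1).
T95 = {19: 2.093, 20: 2.086, 21: 2.080, 22: 2.074, 23: 2.069, 24: 2.064}


def uncertainty_95(observations: list[float]) -> dict:
    """Mean, sample standard deviation and 95 % half-width for the mean."""
    n = len(observations)
    if n - 1 not in T95:
        raise ValueError("this sketch only tabulates t for 20 to 25 observations")
    t = T95[n - 1]
    mean = statistics.mean(observations)
    s = statistics.stdev(observations)  # sample standard deviation (n - 1)
    return {"n": n, "mean": mean, "std_dev": s, "mean_95": t * s / math.sqrt(n)}


# Hypothetical: 20 repeated readings of a constant quantity.
readings = [1.02, 0.98, 1.01, 0.99, 1.00] * 4
res = uncertainty_95(readings)
print(f"{res['mean']:.3f} +/- {res['mean_95']:.4f} (95 %)")
```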
Another problem in applying statistical methods to hydrological data comes from the assumption that observations are independent random variables drawn from a fixed statistical distribution. This condition is rarely met in hydrological measurements. River flow, by nature, is not random; it depends on previous values. It is generally accepted that the way hydrological data depart from the theoretical concepts of error is not very important. However, no statistical analysis can replace correct observations, particularly because such analyses cannot eliminate systematic errors. Only random errors can be characterized by statistical means.
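One quick way to see that consecutive flows are not independent is the lag-1 autocorrelation coefficient. It is close to zero for independent observations and clearly positive for persistent series such as daily river flow. The following is a minimal sketch using one common estimator. The discharge series is hypothetical.

```python
def lag1_autocorrelation(series: list[float]) -> float:
    """r1 = sum((x_t - m)(x_{t+1} - m)) / sum((x_t - m)^2), with m the series mean."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three values")
    m = sum(series) / n
    num = sum((series[t] - m) * (series[t + 1] - m) for t in range(n - 1))
    den = sum((x - m) ** 2 for x in series)
    return num / den


# Hypothetical daily discharges (m3/s) through a flood rise and recession.
flows = [12.0, 12.5, 14.0, 20.0, 35.0, 48.0, 41.0, 30.0, 23.0, 18.0, 15.0, 13.5]
print(round(lag1_autocorrelation(flows), 2))  # 0.72
```

A value this far from zero shows that each day's flow carries information about the next, which violates the independence assumption described above.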
Errors can be of the following types:
a. Random: random errors cannot be eliminated, but their effects can be reduced by repeating the measurements. The uncertainty of the arithmetic mean calculated from $n$ independent measurements is $\sqrt{n}$ times smaller than the uncertainty of a single measurement, that is, $u(\bar{x}) = u(x)/\sqrt{n}$, where $u$ denotes uncertainty. The distribution of random errors can be considered normal (Gaussian). In some cases, the normal distribution may or should be replaced by other statistical distributions.
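The square-root-of-n rule can be checked numerically. The following is a minimal Monte Carlo sketch. It assumes independent, normally distributed random errors with a hypothetical standard deviation of 1 unit. Under that assumption, the spread of means of 25 readings should be close to 1/5 of the spread of single readings.

```python
import random
import statistics


def spread_of_means(n: int, sigma: float, trials: int, seed: int = 1) -> float:
    """Standard deviation of the mean of n noisy readings, estimated by simulation."""
    rng = random.Random(seed)
    true_value = 10.0  # hypothetical true value of the measured quantity
    means = [
        statistics.fmean(true_value + rng.gauss(0.0, sigma) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.stdev(means)


single = spread_of_means(1, sigma=1.0, trials=4000)
of_25 = spread_of_means(25, sigma=1.0, trials=4000)
print(f"single reading: {single:.3f}, mean of 25: {of_25:.3f}, ratio: {single / of_25:.1f}")
```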
Strategies to Reduce Random Error
- Standardize the measurement methods in the operations manual.
- Training and accreditation of the observer.
- Refinement of the measuring instrument.
- Automation of the instrument.
- Repetition of the measurement.
b. Systematic: systematic errors arise mainly from the instruments. If the instruments and measurement conditions remain unchanged, these errors cannot be reduced by increasing the number of measurements. If the systematic error has a known value, this value must be added to or subtracted from the measurement result, and the error from this source is then considered null. Systematic error must otherwise be eliminated by corrections, appropriate adjustments or replacing the instrument. It can also be reduced by changing the flow conditions, for example the length of the straight section of the approach channel to a gauging station. These errors are frequently caused by difficult measurement conditions, such as unsteady flow, river meandering and poorly located stations.
Strategies to Reduce Systematic Error
- Double-blind studies to control expectations.
- Implementation of hidden measurements.
- Concealment of results.
- Calibration of the instrument.
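A known systematic error, or one that can be estimated by calibrating the instrument against a reference, is removed by correction rather than by repetition. The following is a minimal sketch. It assumes the instrument error is linear (a gain plus an offset) and estimates it by ordinary least squares from paired readings. The readings are hypothetical.

```python
def fit_calibration(readings: list[float], reference: list[float]) -> tuple[float, float]:
    """Least-squares line: reference ~ gain * reading + offset."""
    n = len(readings)
    if n != len(reference) or n < 2:
        raise ValueError("need at least two paired readings")
    mx = sum(readings) / n
    my = sum(reference) / n
    sxx = sum((x - mx) ** 2 for x in readings)
    sxy = sum((x - mx) * (y - my) for x, y in zip(readings, reference))
    gain = sxy / sxx
    offset = my - gain * mx
    return gain, offset


def correct(reading: float, gain: float, offset: float) -> float:
    """Apply the calibration to remove the systematic error from a reading."""
    return gain * reading + offset


# Hypothetical stage readings (m) from a gauge that reads 0.05 m too high,
# paired with readings from a reference staff gauge.
gauge = [0.55, 1.05, 1.55, 2.05]
staff = [0.50, 1.00, 1.50, 2.00]
gain, offset = fit_calibration(gauge, staff)
print(round(gain, 3), round(offset, 3))       # 1.0 -0.05
print(round(correct(1.25, gain, offset), 3))  # 1.2
```

Repeating the gauge readings would not remove the 0.05 m bias, but the fitted correction does.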
These types of errors can occur together. It is very important to know the magnitude of the error being made.
constructed, adjusted or installed the instrument, for example, an unexpected water-level height;
k. Accuracy error caused by improper use of an instrument, when the minimum error is greater than the tolerance for the measurement.
Instrumental Errors
These are due to imperfections in the construction or adjustment of the instruments. They can be reduced or eliminated by adopting appropriate procedures.
Natural Errors
They are caused by natural effects such as wind, temperature, humidity, atmospheric pressure, atmospheric refraction, gravity and magnetic declination.
Personal Errors
They are due to the limitations of the human senses, such as sight, hearing and touch.
- Hydrological data allow a good understanding of the hydrological conditions of a specific area. They will be used to improve or establish a hydrological forecasting program when one is needed. A program of this kind must include forecasts of water levels, flows, ice conditions, floods and storm surges.
- To facilitate the interpretation of the observed phenomena, the data should preferably be presented as statistical values such as means, maximum and minimum values, standard deviations and frequency distributions (tables or curves).
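The recommended statistical summary can be produced with the Python standard library. The following is a minimal sketch. It assumes a hypothetical series of annual peak discharges and equal-width class intervals chosen by the user.

```python
import statistics


def summarize(values: list[float]) -> dict:
    """Mean, extremes and sample standard deviation of an observed series."""
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "std_dev": statistics.stdev(values),
    }


def frequency_table(values: list[float], lower: float, width: float, n_classes: int):
    """Counts per equal-width class [lower + k*width, lower + (k+1)*width).

    The last class also includes its upper limit so the maximum is not lost.
    """
    counts = [0] * n_classes
    upper = lower + n_classes * width
    for v in values:
        if not lower <= v <= upper:
            raise ValueError(f"value {v} outside the table range")
        k = min(int((v - lower) // width), n_classes - 1)
        counts[k] += 1
    return [(lower + k * width, lower + (k + 1) * width, c) for k, c in enumerate(counts)]


# Hypothetical annual peak discharges (m3/s).
peaks = [210.0, 340.0, 185.0, 420.0, 295.0, 260.0, 515.0, 230.0, 310.0, 275.0]
print(summarize(peaks))
for lo, hi, count in frequency_table(peaks, lower=100.0, width=100.0, n_classes=5):
    print(f"{lo:.0f}-{hi:.0f} m3/s: {count}")
```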