KEMBAR78
Unit 12 GIS Analysis Using R-I | PDF | Geographic Information System | Computing
0% found this document useful (0 votes)
27 views24 pages

Unit 12 GIS Analysis Using R-I

This document introduces the use of R for GIS analysis, emphasizing its advantages over traditional desktop GIS applications. It covers spatial operations, including spatial joins and attribute aggregation, using R packages like sf and dplyr. The course aims to equip learners with the skills to perform GIS analyses and present results through interactive web pages.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
27 views24 pages

Unit 12 GIS Analysis Using R-I

This document introduces the use of R for GIS analysis, emphasizing its advantages over traditional desktop GIS applications. It covers spatial operations, including spatial joins and attribute aggregation, using R packages like sf and dplyr. The course aims to equip learners with the skills to perform GIS analyses and present results through interactive web pages.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 24

UNIT 12

GIS ANALYSIS USING R-I


Structure______________________________________________
12.1 Introduction Buffer

Expected Learning Outcomes Boundary

12.2 R in GIS analysis platform Clipping

12.3 Basics of spatial operations using 12.4 Spatial Interactions of Vector and Raster
different functions Data
Spatial Join 12.5 Cropping to the Area of Interest
Attribute Aggregation 12.6 Summary
Spatial Aggregation 12.7 Glossary
Interpolation

Unions

12.1 INTRODUCTION
This course will introduce the use of R for script-based geographic information systems (GIS)
overlay analysis workflows. By the end of the course, you will be able to script simple GIS analyses
and present results in interactive, self-documented web pages.
Whereas desktop GIS packages are great applications for exploratory spatial data analysis and
generation of map graphics, with typical work flows they are limited for scientific research. If you
can’t remember, reproduce, or clearly communicate your methodology, then what you are doing
cannot be technically called “research.” Although both ArcGIS Desktop and QGIS, two leading
desktop GIS applications, employ the use of Python for scripting geoprocessing analyses and user
interfaces, learning Python can be difficult and requires a strong commitment and a large amount of
time to master.
12.2 R IN GIS ANALYSIS PLATFORM
R is an option to ArcPy or the QGIS Python API, being highly used, extensible,
free, and script-based. If you have been taking courses in social science,
epidemiology, or statistics at UW, you probably already know enough R to get
started using it as a GIS analysis platform
(https://csde.washington.edu/workshop/reproducible-gis-analysis-with-r/). R is
also becoming more commonly used for various programming tasks beyond its
initial development as a statistical programming environment, particularly due to
the proliferation of packages that can be added to extend R functionality. One
of those packages, sf (Simple Features for R), extends the R analytic
framework to include a plethora of spatial data management and analysis tasks
and takes advantage of the tibble data frame to include representation of
geometric objects. Additionally, the leaflet package can be used for rendering
“slippy” (real-time scale-able, pan-able) maps embedded in web pages created
by the RMarkdown package. Combining these functionalities with R’s relatively
easy programming language can help you perform GIS analyses, generate data
summaries from geoprocessing operations, render high-quality tables and
graphics, and include maps in your reports.

12.3 BASICS OF SPATIAL OPERATIONS USING


DIFFERENT FUNCTIONS
We have learned how geographic datasets are structured in R, and how to join
the spatial and attribute features. In this chapter, we would further discuss
about the skills of spatial operations. Imagine that we want to know the number
of schools in each village, we should intersect the school point layer and village
polygon layer, to identify the polygon for each point located on. “Intersection” is
a type of spatial operations. Or imagine we want to know whether there are
schools within 500 meters of MRT stations. We need to draw buffer of the
stations in advance. “Buffer” is also a type of spatial operations. From the
examples above, we find that spatial operations are vital in the spatial analysis.
We would introduce some common spatial operation functions in package sf.
But please note that we would also use some functions in package dplyr. If you
are not acquainted with the package, please read the notebook attached here,
which provides you with a quick learn on the major function in dplyr More
detailed information of dplyr is supplied in Chapter 5 of R for Data Science.
12.31 Spatial Join
As we have learned in the previous chapter, joining spatial and attribute data
requires a primary key, namely a common column variable. But what if we want
to join two spatial data? It applies the similar concept, but relies on the shared
areas of geographic space (spatial overlay). We can apply function st_join(x, y,
left). The target object which would be in the left part of the new data frame is
placed in the first parameter (x), while the joined object is placed in the second
parameter (y). Parameter left= represents whether do the left join or not, which
is left=T by default, and return all records of the x object with y fields. If left=F,
then do the inner join instead.
Take nz and nz_height in spData for instance. The former data is the map of
New Zealand, while the latter is the top 101 highest points in the country. In the
example below, we want to know which provinces those points are located in.
st_join(nz_height, nz["Name"])
## Simple feature collection with 101 features and 3 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 1204143 ymin: 5048309 xmax: 1822492 ymax:
5650492

## Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000


## First 10 features:
## t50_fid elevation Name geometry
## 1 2353944 2723 Southland POINT (1204143 5049971)
## 2 2354404 2820 Otago POINT (1234725 5048309)
## 3 2354405 2830 Otago POINT (1235915 5048745)
## 4 2369113 3033 West Coast POINT (1259702 5076570)
## 5 2362630 2749 Canterbury POINT (1378170 5158491)
## 6 2362814 2822 Canterbury POINT (1389460 5168749)
## 7 2362817 2778 Canterbury POINT (1390166 5169466)
## 8 2363991 3004 Canterbury POINT (1372357 5172729)
## 9 2363993 3114 Canterbury POINT (1372062 5173236)
## 10 2363994 2882 Canterbury POINT (1372810 5173419)
Please note that we should remain the class sf in two parameters. Thus,
second parameter in function st_join() should not be nz$name, which is a
numeric type. Use nz["Name"] instead, to remain the spatial data.
You may be confused about the sequence of data placed in function st_join().
Let’s do a simple comparison. In st_join(nz_height, nz), nz_height is target
data, and hence the new data frame is based on all the highest points, and the
geometry remains the form of nz_height (point). In st_join(nz, nz_height), nz is
the target data, and hence the new data frame is based on the province, and
the geometry remains the form of nz (polygon). Try the code yourself might get
a better knowing.
Note that st_join(..., join=st_intersects) does the intersection operation by
default. If we want to do other operations, we need to revise the parameter
join= in st_join(..., join=). Function used in join= is listed in documentation of
geos_binary_pred. The features are simply introduced in the table below (not all
the functions are appropriate to be applied in st_join).
Table 12.1: Functions and Features of geometry type.

Function Features

st_intersects(x,y) identifies if x and y geometry share any space

st_contains(x,y) identifies if x is within y (i.e. point within polygon)

st_disjoint(x,y) identifies when geometries from x do not share space with y

st_crosses(x,y) identifies if any geometry of x have commonalities with y

identifies if any point from x is outside of y (i.e. polygon


st_covers(x,y)
outside polygon)

identifies if x is completely within y (i.e. polygon completely


st_covered_by(x,y)
within polygon)

st_within(x,y) identifies if x is in a specified distance to y

identifies if geometries of x and y share a common point but


st_touches(x,y)
their interiors do not intersect

12.3.2 Attribute Aggregation


Aggregation operations summarize datasets (including attributes and spatial
features) by a “grouping variable.” We can use function group_by() %>%
summarise() in dplyr to conduct aggregation. Let’s take data “world” for
example. We want to group by the data by continent, and summarize the total
population of each continent. The code and result are shown below.
# map before aggregation
ggplot(world)+
geom_sf()+
theme(panel.background=element_blank())
Figure 1: Before Aggregation operation

# aggregation operations
continent_pop=group_by(world, continent)%>%
summarise(pop=sum(pop, na.rm=T))
# glance at new data
continent_pop
## Simple feature collection with 8 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
## # A tibble: 8 x 3
## continent pop geom
## <chr><dbl><MULTIPOLYGON [arc_degree]>
## 1 Africa 1154946633 (((11.91496 -5.037987, 11.09377 -
3.978827,~
## 2 Antarctica 0 (((-180 -89.9, 180 -89.9, 180 -84.71338, 1~
## 3 Asia 4311408059 (((120.295 -10.25865, 118.9678 -9.557969,
~
## 4 Europe 669036256 (((-53.77852 2.376703, -54.08806
2.105557,~
## 5 North America 565028684 (((-80.9473 8.858504, -80.5219
9.111072, -~
## 6 Oceania 37757833 (((171.9487 -41.51442, 172.0972 -40.9561,
~
## 7 Seven seas (open ocean) 0 (((68.935 -48.625, 68.8675 -48.83, 68.72
-~
## 8 South America 412060811 (((-57.75 -51.55, -58.05 -51.9, -59.4 -
52.~
# map after aggregation
ggplot(continent_pop)+
geom_sf()+
theme(panel.background=element_blank())

Figure 2:After Aggregation operation

subregion_info=group_by(world, subregion)%>%
summarise(
pop=sum(pop, na.rm=T), # use function sum() to calculate total
population
lifeExp=mean(lifeExp, na.rm=T), # use function mean() to calculate
average life expectancy
num_cou=n() # use function n() to calculate total number in
each group
)%>%
arrange(desc(lifeExp)) # use arrange to order data, and desc() means
descending order
# glance at new data
subregion_info
## Simple feature collection with 22 features and 4 fields
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
## # A tibble: 22 x 5
## subregion pop lifeExp num_cou geom
## <chr><dbl><dbl><int><GEOMETRY [arc_degree]>
## 1 Australia and New Zealand 2.80e7 81.9 2 MULTIPOLYGON
(((171.1251~
## 2 Western Europe 1.26e8 81.8 7 MULTIPOLYGON (((-
52.9396~
## 3 Northern America 3.54e8 80.4 3 MULTIPOLYGON (((-
155.542~
## 4 Northern Europe 9.66e7 79.5 10 MULTIPOLYGON (((-
9.97708~
## 5 Southern Europe 1.53e8 78.4 12 MULTIPOLYGON
(((23.51498~
## 6 Eastern Asia 1.57e9 76.3 6 MULTIPOLYGON
(((110.2116~
## 7 Central America 1.70e8 74.7 8 POLYGON ((-81.80857
8.95~
## 8 Western Asia 2.52e8 74.5 18 MULTIPOLYGON
(((51.17252~
## 9 Eastern Europe 2.93e8 74.5 10 MULTIPOLYGON
(((27.6739 ~
## 10 Caribbean 4.06e7 73.8 7 MULTIPOLYGON (((-60.935 ~
## # ... with 12 more rows
Function sum() obtains the total value; mean() obtains the average value; n()
obtains the total number in the group. Parameter na.rm=T means that the
calculation would skip the NA (not available). If we do not add this parameter,
the result would be NA as well. Parameter desc() means arrange the data in
descending order, or it would be in ascending order by default. As we can see
in the result printed above, the first three highest life expectancy are Australia
and New Zealand, Western Europe, and Northern America respectively.
In addition to the application of dplyr, we can use function aggregate() provided
by package sf to obtain the same result. The first parameter in the function is
the data we want to summarize. Note the data should be remain the sf features.
Thus, do not use the form like world$pop, whose type is “numeric.” Instead, we
should write in the form as world["pop"], whose type is “sf” and “dataframe.”
Parameter by= should be filled in with the columns we want to group by. Note
that the type of data should be a list. Parameter FUN is the spatial operation
such as sum or average. Last, if the data contains NA, we should add the
parameter na.rm=T, in order to skip NA. Again, take data “world” for example,
and group by the data by continent, summarize the total population of each
continent. The code and result are as follows.
continent_pop_ag=aggregate(world["pop"], by=list(world$continent), FUN=sum,
na.rm=T)
continent_pop_ag
## Simple feature collection with 8 features and 2 fields
## Attribute-geometry relationship: 0 constant, 1 aggregate, 1 identity
## Geometry type: GEOMETRY
## Dimension: XY
## Bounding box: xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
## Group.1 pop geometry
## 1 Africa 1154946633 MULTIPOLYGON (((11.91496 -5...
## 2 Antarctica 0 MULTIPOLYGON (((-61.13898 -...
## 3 Asia 4311408059 MULTIPOLYGON (((120.295 -10...
## 4 Europe 669036256 MULTIPOLYGON (((-53.77852 2...
## 5 North America 565028684 MULTIPOLYGON (((-80.9473 8....
## 6 Oceania 37757833 MULTIPOLYGON (((171.9487 -4...
## 7 Seven seas (open ocean) 0 POLYGON ((68.935 -48.625, 6...
## 8 South America 412060811 MULTIPOLYGON (((-57.75 -51....
12.3.3 Spatial Aggregation
Similar to attribute data aggregation introduced in the previous section, spatial
aggregation can be done by function aggregate(). In attribute aggregation, the
first parameter should be in class sf, while the second parameter by= is a list
with non-spatial data. There is a little difference on the data imported in
parameter by=, which the data should also contain geometry in spatial
aggregation. Briefly to say, in function aggregate(x, by=y), spatial aggregation is
the geometry of the source (y) that defines how values in the target object (x)
are grouped.
Take data nz and nz_height for example. We want to know the average height
in each province of New Zealand. The code is shown below.
aggregate(x=nz_height, by=nz, FUN=mean)
## Simple feature collection with 16 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 1090144 ymin: 4748537 xmax: 2089533 ymax:
6191874
## Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
## First 10 features:
## t50_fid elevation geometry
## 1 NA NA MULTIPOLYGON (((1745493 600...
## 2 NA NA MULTIPOLYGON (((1803822 590...
## 3 2408405 2734.333 MULTIPOLYGON (((1860345 585...
## 4 NA NA MULTIPOLYGON (((2049387 583...
## 5 NA NA MULTIPOLYGON (((2024489 567...
## 6 NA NA MULTIPOLYGON (((2024489 567...
## 7 NA NA MULTIPOLYGON (((1740438 571...
## 8 2408395 2777.000 MULTIPOLYGON (((1866732 566...
## 9 NA NA MULTIPOLYGON (((1881590 548...
## 10 2368390 2889.455 MULTIPOLYGON (((1557042 531...
As the result shows, all of attribute features of data nz_height (first parameter)
are average value of each province. And the geometry is based on the spatial
data in second parameter by=. You may find that the data we obtain are not
perfect, since it does not contain the features in second parameter by=. Also, it
does not make sense to calculate the average of all attribute features. For
instance, variable “t50_fid” is the identity number of the highest points, it does
not make sense to derive its mean.
In addition to function aggregate(), the same result can also be generated by
function group_by() %>% summarise() in dplyr The code and result are shown
below.
ve_height=st_join(nz, nz_height)%>%
group_by(Name)%>%
summarise(elevation=mean(elevation, na.rm=T))

Figure 3: Mean Elevation of the study area

# plot the map of average height of each province


ggplot(ave_height)+
geom_sf(aes(fill=elevation))+
scale_fill_continuous(low="#CEFFCE", high="#006000")+
theme(axis.text=element_blank(),
axis.ticks=element_blank(),
axis.title=element_blank(),
panel.background=element_blank())
We need to join two spatial data (nz is the target data, which should be placed
in the first parameter). Then, group by the variable “Name” and summarize the
mean value of elevation in each group. Though the code is longer than function
aggregate(), dplyr shows a readable and clear code to do the spatial
aggregation.
In this example, we can again find that use dplyr and sf package together might
be a better method to do spatial aggregation.
12.3.4 Interpolation
Interpolation is the process that calculate estimates from a source set of
polygons to an overlapping but incongruent set of target polygons. Imagine that
we want to derive the population in a specific area, which is defined by us.
Since the boundary of study area, we define does not align with the one of
population data (most of the population data is based on the villages or
Statistical Areas), we should use interpolation to produce estimates in this
situation. The concept is simply illustrated in the figure below.

Figure 4:source set of polygons to an overlapping

In the figure above, the study area contains three districts whose population
have been surveyed. It is covered by 40% of Area 1; 25% of Area 2; and 40%
of Area 3. Population is the attribute assumed to be spatially extensive, and the
sum is preserved. Hence, we can simply derive the total population of the study
area based on the weight of each area: 1000* 40% + 800* 25% + 900* 40% =
960.
For the study area, there are 40% of it is Area 1; 33.3% of it is Area 2; 26.7% of
it is Area 3. Density is spatially intensive instead, and the mean is preserved.
Hence, we can simply derive the mean density of the study area based on the
area weight of study area: 200/3* 40% + 40* 33.3% + 90* 26.7% = 64.
Package sf provides a function st_interpolation_aw() to conduct the
interpolation. There are three parameters required in the function
st_interpolate_aw(x, to, extensive). The first parameter (x) is the object of
simple features, for which we want to aggregate attributes. The second
parameter (to) is the object of simple features with the target geometries.
Parameter extensive= is the operation of data. If it is True, the attribute
variables are assumed to be spatially extensive (like population) and the sum is
preserved, otherwise, spatially intensive (like population density) and the mean
is preserved.
Here, we use the data incongruent and aggregating_zones provided by
package spData. The former one is a specific region separated into 9 districts in
UK, while the latter is in the same region, but separated into 2 major area (as
the figure shown below). We need to interpolate the value in data incongruent
to the geometry data aggregating_zones, and derive the sum of value in two
areas. The code and result are shown below.

Figure 5:Aggregating Zone

st_interpolate_aw(incongruent["value"], aggregating_zones, extensive=T)


## Warning in st_interpolate_aw.sf(incongruent["value"], aggregating_zones, :
## st_interpolate_aw assumes attributes are constant or uniform over areas of
x
## Simple feature collection with 2 features and 1 field
## Attribute-geometry relationship: 0 constant, 1 aggregate, 0 identity
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 417686.2 ymin: 443703.6 xmax: 422959.3 ymax:
447036.8
## Projected CRS: OSGB 1936 / British National Grid
## value geometry
## 1 19.61613 MULTIPOLYGON (((418731.9 44...
## 2 25.66872 MULTIPOLYGON (((419196.4 44..
12.3.5 Unions
Aggregation can dissolve the boundaries of touching polygons in the same
group. And what is the behind scene of the function aggregate() and group_by()
%>% summarise()? They combine the geometries and eliminate the boundaries
in the group by using function st_union(). In the example shown below, we filter
the West part of Us states, and then union it by collecting all the geometries
together.
# filter the west of US states
us_west=filter(us_states, REGION=="West")
head(us_west)
## Simple feature collection with 6 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.4096 ymin: 31.33224 xmax: -102.0422 ymax:
49.00091
## Geodetic CRS: NAD83
## GEOID NAME REGION AREA total_pop_10 total_pop_15
## 1 04 Arizona West 295281.3 [km^2] 6246816 6641928
## 2 08 Colorado West 269573.1 [km^2] 4887061 5278906
## 3 16 Idaho West 216512.7 [km^2] 1526797 1616547
## 4 30 Montana West 380829.2 [km^2] 973739 1014699
## 5 32 Nevada West 286363.7 [km^2] 2633331 2798636
## 6 06 California West 409747.1 [km^2] 36637290 38421464
## geometry
## 1 MULTIPOLYGON (((-114.7196 3...
## 2 MULTIPOLYGON (((-109.0501 4...
## 3 MULTIPOLYGON (((-116.916 45...
## 4 MULTIPOLYGON (((-116.0492 4...
## 5 MULTIPOLYGON (((-119.9992 4...
## 6 MULTIPOLYGON (((-118.6034 3...
# union us_west
us_west=st_union(us_west)
## although coordinates are longitude/latitude, st_union assumes that they are
planar
head(us_west)
## Geometry set for 1 feature
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.7042 ymin: 31.33224 xmax: -102.0422 ymax:
49.00236
## Geodetic CRS: NAD83
## MULTIPOLYGON (((-103.0428 34.85024, -103.0439 3...

Figure 6: Conversion of Multipolygon to single polygon

# derive the centroid


us_states_centroid=st_centroid(us_states)
## Warning in st_centroid.sf(us_states): st_centroid assumes attributes are
## constant over geometries of x
## Warning in st_centroid.sfc(st_geometry(x), of_largest_polygon =
## of_largest_polygon): st_centroid does not give correct centroids for
longitude/
## latitude data
# plot the map
ggplot(us_states)+
geom_sf()+
geom_sf(data=us_states_centroid, color="red")+
theme(axis.text=element_blank(),
axis.ticks=element_blank(),
axis.title=element_blank(),
panel.background=element_blank())
Figure 7:Polygon centroid assumes attributes

But there is a drawback of st_centroid(). If the polygon contains many


separated areas, the centroid might not lie on the polygon. For instance, the
centroid of the group islands may not be situated in the land of the island, but in
the ocean. Function st_point_on_surface can solve this problem. It ensures that
all of the points will fall in their parent object. But please note that the point
st_point_on_surface generates does not guarantee to be located in the largest
area. Here comes an example about the centroid and point of surface on
Penghu Island, which is located in 50 kilometers west of Taiwan Island.
12.3.6 Buffer
Imagine that we want to calculate the number of schools within 500 meters
radius of MRT stations, we need to first draw the buffer of each MRT stations
before points calculation. Buffer is a zone (polygon) around a map feature
within given distance, and it is useful for proximity analysis. Buffer can be
created by function st_buffer(), whatever the type of geometry is. There are two
parameters should be filled in st_buffer(). Place the spatial data first, and then
set the distance of radius. But please note that the unit of distance is based on
the coordinate reference system. In general, the unit is meter (m) in the
projected coordinate system, while the unit is NULL in the the geographic
coordinate systems (e.g., EPSG:4326). Hence, we need to use function
st_transform() to convert CRS to specific projected coordinate system in the
region of study area, if the spatial data with CRS EPSG:4326 is obtained.
Examples below illustrates the buffer of cycle hire points across London. Data
cycle hire is provided by package spData. Suppose we define the region within
300 meters of hire points is service area.
# test of the CRS and buffer of cycle_hire
st_crs(cycle_hire)$epsg # original CRS: EPSG:4326 (geographic)
## [1] 4326
st_crs(cycle_hire)$units # units: NULL
## NULL
st_buffer(cycle_hire, 300)
## Warning in st_buffer.sfc(st_geometry(x), dist, nQuadSegs, endCapStyle =
## endCapStyle, : st_buffer does not correctly buffer longitude/latitude data
## dist is assumed to be in decimal degrees (arc_degrees).
## Simple feature collection with 742 features and 5 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -300.2368 ymin: -248.5452 xmax: 299.9977 ymax:
351.5421
## Geodetic CRS: WGS 84
## First 10 features:
## id name area nbikes nempty
## 1 1 River Street Clerkenwell 4 14
## 2 2 Phillimore Gardens Kensington 2 34
## 3 3 Christopher Street Liverpool Street 0 32
## 4 4 St. Chad's Street King's Cross 4 19
## 5 5 Sedding Street Sloane Square 15 12
## 6 6 Broadcasting House Marylebone 0 18
## 7 7 Charlbert Street St. John's Wood 15 0
## 8 8 Lodge Road St. John's Wood 5 13
## 9 9 New Globe Walk Bankside 3 16
## 10 10 Park Street Bankside 1 17
## geometry
## 1 POLYGON ((299.89 51.52916, ...
## 2 POLYGON ((299.8024 51.49961...
## 3 POLYGON ((299.9154 51.52128...
## 4 POLYGON ((299.879 51.53006,...
## 5 POLYGON ((299.8431 51.49313...
## 6 POLYGON ((299.8558 51.51812...
## 7 POLYGON ((299.8319 51.5343,...
## 8 POLYGON ((299.8299 51.52834...
## 9 POLYGON ((299.9036 51.50739...
## 10 POLYGON ((299.9072 51.50597...
The warning shows that st_buffer() does not correctly buffer longitude-latitude
spatial data, whose distance is assumed to be in decimal degrees
(arc_degrees). Though the operation of geometry is still done, it is not true.
(You can plot the map to check what the geometry looks like.) Hence, we
should not use the spatial data with EPSG:4326 to create buffer, but transform
CRS in advance. However, what CRS should be applied? We can look up the
CRS of England in EPSG database.
# transform CRS to EPSG:27700 (projected)
cycle_hire_new=st_transform(cycle_hire, crs=27700)
# create the buffer (radius: 300 m)
st_buffer(cycle_hire_new, 300)
## Simple feature collection with 742 features and 5 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 522202 ymin: 174108 xmax: 539033.2 ymax: 184721
## Projected CRS: OSGB 1936 / British National Grid
## First 10 features:
## id name area nbikes nempty
## 1 1 River Street Clerkenwell 4 14
## 2 2 Phillimore Gardens Kensington 2 34
## 3 3 Christopher Street Liverpool Street 0 32
## 4 4 St. Chad's Street King's Cross 4 19
## 5 5 Sedding Street Sloane Square 15 12
## 6 6 Broadcasting House Marylebone 0 18
## 7 7 Charlbert Street St. John's Wood 15 0
## 8 8 Lodge Road St. John's Wood 5 13
## 9 9 New Globe Walk Bankside 3 16
## 10 10 Park Street Bankside 1 17
## geometry
## 1 POLYGON ((531503.5 182832.1...
## 2 POLYGON ((525508.1 179391.9...
## 3 POLYGON ((533285.8 182001.6...
## 4 POLYGON ((530737.8 182912, ...
## 5 POLYGON ((528351 178742, 52...
## 6 POLYGON ((529158.4 181542.9...
## 7 POLYGON ((527459 183300.8, ...
## 8 POLYGON ((527332.7 182634.6...
## 9 POLYGON ((532505 180434.6, ...
## 10 POLYGON ((532764.9 180284.3...
Figure 8:above shows the location of cycle hire points (black dots), and the buffer of each point is painted blue.

12.3.7 Boundary
Function st_boundary() creates a polygon that encompasses the full extent of
the geometry. Boundary and union are often applied together to emphasize the
border of the study area. The example below shows the us_state and makes
the border of country thicker.
# union us_states and then derive the boundary
us_boundary=st_boundary(st_union(us_states))
## although coordinates are longitude/latitude, st_union assumes that they are
planar
# plot the map
ggplot(us_states)+
Figure 9:polygon that encompasses the full extent of the geometry

geom_sf()+
geom_sf(data=us_boundary, size=1)+
theme(axis.text=element_blank(),
axis.ticks=element_blank(),
axis.title=element_blank(),
panel.background=element_blank())
12.3.8 Clipping
Clipping is a critical spatial operation in GIS. It extracts the geometry (point,
line, polygon) that lies within the area outlined by the clip polygon. For instance,
we have polygon of Taipei City and points of schools in Taiwan, and suppose
we want to extract the schools located in Taipei City. We need to use the
polygon to clip the point data, removing all the points outside the polygon.
Though there is no “clip” function provided by package sf in fact, function
st_intersection() help us do so. But please note that there is a difference
between clipping and intersection in an accurate GIS definition. Clipping and
intersection both extracts the geometry in a specific polygon; nonetheless,
clipping does not retain the attributes of the clip polygon, while intersection
creates output features which possess the attributes of all input features. The
concept is illustrated below.
Figure 10: Clipping and intersection

The slight difference may be a fundamental concept of GIS, but it is not really
vital in terms of spatial analysis in R (after all, there is no clip function provided
in R). In st_intersection(x, y), it is free to place the spatial data in two
parameters. But please note data x would be on the left-hand side of new data
frame, while y is on the right-hand side. Here we use the example of nz (map of
New Zealand) and nz_height (101 highest point in New Zealand) again.
Suppose we want to extract the points located in North Island, the code and
result are shown below.
# filter North Island in data nz
nz_north=filter(nz, Island=="North")
# st_intersection(point, polygon)
st_intersection(nz_height, nz_north)
## Warning: attribute variables are assumed to be spatially constant throughout
all
## geometries
## Simple feature collection with 5 features and 8 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 1820643 ymin: 5647971 xmax: 1822492 ymax:
5650492
## Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
## t50_fid elevation Name Island Land_area Population
## 99 2408397 2751 Waikato North 23900.04 460100
## 100 2408406 2720 Waikato North 23900.04 460100
## 101 2408411 2732 Waikato North 23900.04 460100
## 97 2408394 2797 Manawatu-Wanganui North 22220.61 234500
## 98 2408395 2757 Manawatu-Wanganui North 22220.61 234500
## Median_income Sex_ratio geometry
## 99 27900 0.9520500 POINT (1820660 5649488)
## 100 27900 0.9520500 POINT (1822263 5650429)
## 101 27900 0.9520500 POINT (1822492 5650492)
## 97 25000 0.9387734 POINT (1821014 5647971)
## 98 25000 0.9387734 POINT (1820643 5648331)
Remember the spatial join operation? Actually, we can use function st_join() to
do the same thing. The code is shown below.
# union two data first
nz_height_join=st_join(nz_height, nz)
# filter the highest point located in North Island
filter(nz_height_join, Island=="North")
## Simple feature collection with 5 features and 8 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 1820643 ymin: 5647971 xmax: 1822492 ymax:
5650492
## Projected CRS: NZGD2000 / New Zealand Transverse Mercator 2000
## t50_fid elevation Name Island Land_area Population
Median_income
## 1 2408394 2797 Manawatu-Wanganui North 22220.61 234500
25000
## 2 2408395 2757 Manawatu-Wanganui North 22220.61 234500
25000
## 3 2408397 2751 Waikato North 23900.04 460100 27900
## 4 2408406 2720 Waikato North 23900.04 460100 27900
## 5 2408411 2732 Waikato North 23900.04 460100 27900
## Sex_ratio geometry
## 1 0.9387734 POINT (1821014 5647971)
## 2 0.9387734 POINT (1820643 5648331)
## 3 0.9520500 POINT (1820660 5649488)
## 4 0.9520500 POINT (1822263 5650429)
## 5 0.9520500 POINT (1822492 5650492)
In this example, we find that function st_join() and st_intersection() do the
similar operation. This is because that st_join() sets the operation to
“intersection” by default, that is, the parameter is st_join(...,
join="st_intersects"). Other operations are listed in the table of
geos_binary_pred.
In addition to st_intersection, there are some functions to do subsetting and
clipping spatial data. To illustrate the concept and operations of these functions,
we use a simple example: two overlapping circles (X and Y), with a center point
one unit away from each other and a radius of one. The code and results are
shown below.
X=st_buffer(st_point(c(1,1)), dist=1)
Y=st_buffer(st_point(c(2,1)), dist=1)
st_intersection(X, Y)
st_difference(X, Y)
st_difference(Y, X)
st_union(X, Y)
st_sym_difference(X, Y)

Figure 11: Functions of Intersection operation

12.4 SPATIAL INTERACTIONS OF VECTOR AND


RASTER DATA
In this chapter we learn the spatial interactions of a vector and raster dataset.
We first look at how to crop (spatially subset) a raster dataset based on the
geographic extent of a vector dataset. We then cover how to extract values
from raster data for points and polygons. To be precise, here is what we mean
by raster data extraction and what it does for points and polygons data:
Points: For each of the points, find which raster cell it is located within, and
assign the value of the cell to the point.
Polygons: For each of the polygons, identify all the raster cells that intersect
with the polygon, and assign a vector of the cell values to the polygon
This is probably the most important operation economists run on raster
datasets.
We will show how we can use terra::extract() for both cases. But, we will also
see that for polygons, exact_extract() from the exactextractr package is often
considerably faster than terra::extract().
Finally, you will see conversions between Raster
∗ (raster package) objects and SpatRaster object (terra package) because of
the incompatibility of object classes across the key packages. I believe that
these hassles will go away soon when they start supporting each other.
Direction for replication
Datasets: All the datasets that you need to import are available here. In this
chapter, the path to files is set relative to my own working directory (which is
hidden). To run the codes without having to mess with paths to the files, follow
these steps:
set a folder (any folder) as the working directory using setwd()
create a folder called “Data” inside the folder designated as the working
directory (if you have created a “Data” folder previously, skip this step)
download the pertinent datasets from here
place all the files in the downloaded folder in the “Data” folder
Packages
Run the following code to install or load (if already installed) the pacman
package, and then install or load (if already installed) the listed package inside
the pacman::p_load() function.
if (!require("pacman")) install.packages("pacman")
pacman::p_load(
terra, # handle raster data
raster, # handle raster data
exactextractr, # fast extractions
sf, # vector data operations
dplyr, # data wrangling
tidyr, # data wrangling
data.table, # data wrangling
prism, # download PRISM data
tictoc, # timing codes
tigris, # to get county sf
tmap # for mapping
)

12.5 CROPPING TO THE AREA OF INTEREST


Here we use PRISM maximum temperature (tmax) data as a raster dataset and
Kansas county boundaries as a vector dataset.
Let’s download the tmax data for July 1, 2018 (Figure 5.1).
#--- set the path to the folder to which you save the downloaded PRISM data ---
#
# This code sets the current working directory as the designated folder
options(prism.path = "Data")
#--- download PRISM precipitation data ---#
get_prism_dailys(
type = "tmax",
date = "2018-07-01",
keepZip = FALSE
)
#--- the file name of the PRISM data just downloaded ---#
prism_file <-
"Data/PRISM_tmax_stable_4kmD2_20180701_bil/PRISM_tmax_stable_4kmD2
_20180701_bil.bil"
#--- read in the prism data ---#
prism_tmax_0701_sr <- rast(prism_file)

12.6 SUMMARY
This topic will understand the use of R for script-based geographic information
systems (GIS) overlay analysis workflows and simple script GIS analyses and
present results in interactive, self-documented web pages.
The use of R for script-based workflows for GIS overlay analysis is covered in
this course. Users will be able to script basic GIS analyses and display the
findings in self-documented, interactive web pages by the end of the course. R
is an alternate to ArcPy or the QGIS Python API, being heavily utilized,
extendable, free, and script-based. Beyond its original conception as a
statistical programming environment, it is also increasingly frequently utilized for
other programming tasks. R capabilities can be expanded by using packages
like the leaflet package and sf (Simple Features for R). These features, along
with R's rather simple programming language, make it easier to do GIS
analysis, produce data summaries from geoprocessing procedures, produce
excellent tables and graphics, and include maps into reports.
12.7 GLOSSARY
A shapefile: is a file-based data format native to ArcGIS software (a much
older version of ArcGIS Pro). Conceptually, a shapefile is a feature class–it
stores a collection of features that have the same geometry type (point, line, or
polygon), the same attributes, and a common spatial extent.
GeoPackage: This is a relatively new data format that follows open format
standards (i.e. it is non-proprietary). It’s built on top of SQLite (a self-contained
relational database). Its one big advantage over many other vector formats is its
compactness–coordinate value, metadata, attribute table, projection
information, etc…, are all stored in a single file which facilitates portability. Its
filename usually ends in .gpkg. Applications such as QGIS (2.12 and up), R and
ArcGIS will recognize this format.
Raster Data File Formats: Rasters are in part defined by their pixel depth.
Pixel depth defines the range of distinct values the raster can store. For
example, a 1-bit raster can only store 2 distinct values: 0 and 1.
There is a wide range of raster file formats used in the GIS world. Some of the
most popular ones are listed below.

https://journal.r-project.org/articles/RJ-2017-067/

You might also like