Presidency University
Assignment 1
October, 2024
1. Create the following vectors:
(a) (1, 3, 5, 7, ..., 19)
(b) (1, 2, 3, ..., 10, 9, 8, 7, ..., 1)
(c) (2, 5, 7, 2, 5, 7, 2, 5, 7, 2, 5, 7, 2, 5, 7)
(d) (2, 2, 2, 5, 5, 5, 7, 7, 7)
(e) $(2, 2^2/2, 2^3/3, 2^4/4, \ldots, 2^{15}/15)$
(f) $(0.1^2\, 0.9^8,\ 0.1^3\, 0.9^7,\ 0.1^4\, 0.9^6,\ \ldots,\ 0.1^7\, 0.9^3,\ 0.1^8\, 0.9^2)$
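One possible way to build vectors of this kind is with seq() and rep(). The sketch below is illustrative only; the object names (odds, up_down, and so on) are made up for this example, and part (e) is read as the terms 2^k / k for k = 1, ..., 15.

```r
# Patterned vectors with seq(), rep() and vectorised arithmetic
odds    <- seq(1, 19, by = 2)          # 1, 3, 5, ..., 19
up_down <- c(1:10, 9:1)                # 1, ..., 10, 9, ..., 1
cycled  <- rep(c(2, 5, 7), times = 5)  # the block 2, 5, 7 repeated five times
blocked <- rep(c(2, 5, 7), each = 3)   # each element repeated three times
k       <- 1:15
powers  <- 2^k / k                     # assumed reading of (e): 2^k / k
```

The `times` and `each` arguments of rep() are what distinguish patterns (c) and (d).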
2. Find the sum:
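(a) $1 + \frac{2}{3} + \frac{2}{3}\cdot\frac{4}{5} + \frac{2}{3}\cdot\frac{4}{5}\cdot\frac{6}{7} + \cdots + \frac{2}{3}\cdot\frac{4}{5}\cdots\frac{38}{39}$
(b) $\sum_{i=10}^{100} (i^2 + i)$
(c) $\sum_{i=0}^{100} \frac{2^i}{i!}$, and compare it with $e^2$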
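A minimal sketch of how sums like these might be computed without loops, assuming the readings (a) 1 + 2/3 + (2/3)(4/5) + ... + (2/3)(4/5)...(38/39), (b) the sum of i^2 + i for i = 10, ..., 100, and (c) the sum of 2^i / i! for i = 0, ..., 100. The variable names are illustrative.

```r
# (a) running products of 2/3, 4/5, ..., 38/39 via cumprod()
k     <- 1:19
sum_a <- 1 + sum(cumprod(2 * k / (2 * k + 1)))

# (b) vectorised sum over i = 10, ..., 100
i     <- 10:100
sum_b <- sum(i^2 + i)

# (c) partial sum of the exponential series at 2
j     <- 0:100
sum_c <- sum(2^j / factorial(j))
all.equal(sum_c, exp(2))   # comparison with e^2 up to floating-point tolerance
```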
3. Run the following chunk of R code:
set.seed(50)
x=runif(250)
y=rnorm(250)
Suppose $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$. Then
(a) Create the vector $(y_2 - x_1, y_3 - x_2, \ldots, y_n - x_{n-1})$
(b) Create the vector $\left(\frac{\sin(y_1)}{\cos(x_2)}, \frac{\sin(y_2)}{\cos(x_3)}, \ldots, \frac{\sin(y_{n-1})}{\cos(x_n)}\right)$
(c) Find $\sum_{i=1}^{n-1} \frac{e^{-x_{i+1}}}{x_i + 10}$
(d) Pick out the values in y which are > 1
(e) What are the index positions in y of the values which are > 1?
(f) What are the values in x which correspond to the values in y which
are > 1?
(g) How many values in x are more than 0.5?
(h) Sort the numbers in the vector x in the order of increasing values in
y.
(i) Pick out the elements in y at index positions 1, 4, 7, 10, 13, . . . .
and so on.
(j) Create the vector $(\sqrt{|x_1 - \bar{x}|}, \sqrt{|x_2 - \bar{x}|}, \ldots, \sqrt{|x_n - \bar{x}|})$ where $\bar{x}$ is
the mean of the $x_i$'s.
(k) Define $z_i = y_i I(|y_i| < 1)$. Find the mean and variance of z and y
and compare.
(l) How many x’s are more than y’s ?
(m) Round the vectors x and y to decimal places.
(n) Find the number of common elements in those rounded vectors.
In questions where you need to print long vectors as output, write only
the first few values as output in the answer. But the code should be for
generating the entire vector.
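Several parts of this question come down to shifting indices and subsetting with logical conditions. The following is a hedged sketch of those idioms, not a full set of answers; the names d, ratio, s, and so on are invented for illustration.

```r
set.seed(50)
x <- runif(250)
y <- rnorm(250)
n <- length(x)

d      <- y[-1] - x[-n]                       # (y2 - x1, ..., yn - x_{n-1})
ratio  <- sin(y[-n]) / cos(x[-1])             # sin(y_i) / cos(x_{i+1})
s      <- sum(exp(-x[-1]) / (x[-n] + 10))     # sum of e^{-x_{i+1}} / (x_i + 10)

big    <- y[y > 1]                # values of y above 1
pos    <- which(y > 1)            # their index positions
x_at   <- x[y > 1]                # matching values of x
n_half <- sum(x > 0.5)            # TRUE counts as 1
x_by_y <- x[order(y)]             # x arranged by increasing y
every3 <- y[seq(1, n, by = 3)]    # positions 1, 4, 7, ...
z      <- y * (abs(y) < 1)        # indicator as a logical multiplier
```

Dropping the first element with `[-1]` and the last with `[-n]` lines up neighbouring entries without an explicit loop.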
4. Execute the following lines of R code
set.seed(1)
x=sample(1:5,20,T)
(a) Convert x into a factor.
(b) Rename the levels of the factor as Brand1, Brand2, Brand3, Brand4,
Brand5.
(c) Execute summary() on the factor and explain the output.
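A short sketch of the factor workflow, assuming the levels should follow the integer order 1 to 5; the name f is illustrative.

```r
set.seed(1)
x <- sample(1:5, 20, TRUE)

f <- factor(x, levels = 1:5)        # fix the level set explicitly
levels(f) <- paste0("Brand", 1:5)   # rename levels in order
summary(f)                          # counts of observations per level
```

Passing `levels = 1:5` guards against a value that happens not to be sampled, which would otherwise leave fewer than five levels to rename.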
5. Suppose
$$A = \begin{pmatrix} 1 & 1 & 3 \\ 5 & 2 & 6 \\ -2 & -1 & -3 \end{pmatrix}$$
(a) Check that $A^3 = 0$.
(b) Replace the third column of A by the sum of the second and third
columns.
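As a rough illustration, the matrix can be entered row by row and multiplied with `%*%`; note that `A^3` in R would cube each entry rather than take the matrix power. The name A3 is made up for this sketch.

```r
A <- matrix(c( 1,  1,  3,
               5,  2,  6,
              -2, -1, -3), nrow = 3, byrow = TRUE)

A3 <- A %*% A %*% A          # matrix product, not element-wise power
all(A3 == 0)

A[, 3] <- A[, 2] + A[, 3]    # overwrite column 3 with the column sum
```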
6. Create the following matrix B with 15 rows:
$$B = \begin{pmatrix} 10 & -10 & 10 \\ \vdots & \vdots & \vdots \\ 10 & -10 & 10 \end{pmatrix}$$
Calculate the matrix $B^T B$.
7. Create a 6×6 matrix named null with every entry equal to 0. Check what
the functions row and col return when applied to null. Hence create the
6 × 6 matrix:
0 1 0 0 0 0
1 0 1 0 0 0
0 1 0 1 0 0
0 0 1 0 1 0
0 0 0 1 0 1
0 0 0 0 1 0
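One way to use row() and col() here, sketched under the observation that the ones sit exactly where the row and column indices differ by 1. The name band is illustrative.

```r
null <- matrix(0, nrow = 6, ncol = 6)

row(null)   # entry (i, j) holds i
col(null)   # entry (i, j) holds j

band <- null
band[abs(row(null) - col(null)) == 1] <- 1   # set the two off-diagonals to 1
```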
8. Explore the help for the function outer. Hence create the following
patterned matrix:
0 1 2 3 4
1 2 3 4 5
2 3 4 5 6
3 4 5 6 7
4 5 6 7 8
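A brief sketch of outer() with addition as the combining function, so that entry (i, j) is the sum of the i-th and j-th elements of 0:4; the name M is illustrative.

```r
M <- outer(0:4, 0:4, "+")   # M[i, j] = (i - 1) + (j - 1)
M
```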
9. Create the following patterned matrices. In each case, your solution should
make use of the special form of the matrix—this means that the solution
should easily generalize to creating a larger matrix with the same structure
and should not involve typing in all the entries in the matrix.
(a)
0 1 2 3 4
1 2 3 4 0
2 3 4 0 1
3 4 0 1 2
4 0 1 2 3
(b)
0 1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9 0
...
8 9 0 1 2 3 4 5 6 7
9 0 1 2 3 4 5 6 7 8
(c)
0 8 7 6 5 4 3 2 1
1 0 8 7 6 5 4 3 2
2 1 0 8 7 6 5 4 3
3 2 1 0 8 7 6 5 4
4 3 2 1 0 8 7 6 5
5 4 3 2 1 0 8 7 6
6 5 4 3 2 1 0 8 7
7 6 5 4 3 2 1 0 8
8 7 6 5 4 3 2 1 0
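These cyclic patterns can be generated by combining outer() with the modulo operator, which generalizes to any size. The helper name cyclic_matrix below is hypothetical, introduced only for this sketch.

```r
# cyclic_matrix() is a made-up helper: entry (i, j) is (i op j) mod n, with i, j from 0
cyclic_matrix <- function(n, op = "+") {
  outer(0:(n - 1), 0:(n - 1), op) %% n
}

m_a <- cyclic_matrix(5)        # 5 x 5, rows shift left by one
m_b <- cyclic_matrix(10)       # 10 x 10, same structure
m_c <- cyclic_matrix(9, "-")   # 9 x 9, (i - j) mod 9
```

In R, `%%` returns a non-negative result for a positive modulus, which is what makes the subtraction variant wrap around to 8 rather than -1.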
10. If $A = (a - b) I_n + b\, 1_n 1_n'$, then $A_{n \times n} = ((a_{ij}))$ where
$$a_{ij} = \begin{cases} a & \text{if } i = j \\ b & \text{if } i \neq j \end{cases}$$
(a) Using the above formula, define the identity matrix and the vector of
ones, and construct a 5 × 5 matrix A with a = 5 and b = 2.
(b) Find det(A) and trace(A)
(c) Find $A^{-1}$. Also extract the (5, 2)th element of $A^{-1}$ and its minor.
(d) Solve the following system of linear equations
$$Ax = b$$
where $b = (5, 2, 4, 1, 5)'$.
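A hedged sketch of the construction with diag() and a vector of ones, followed by the usual base R calls for the determinant, trace, inverse and linear solve. The names I_n, one_n, rhs and sol are illustrative; rhs is used for the right-hand side to avoid clashing with the scalar b.

```r
n <- 5; a <- 5; b <- 2
I_n   <- diag(n)                 # n x n identity matrix
one_n <- rep(1, n)               # vector of ones

A <- (a - b) * I_n + b * one_n %*% t(one_n)

det_A <- det(A)
tr_A  <- sum(diag(A))            # base R has no trace(); sum the diagonal

A_inv    <- solve(A)
elem_52  <- A_inv[5, 2]
minor_52 <- det(A_inv[-5, -2])   # minor: delete row 5 and column 2

rhs <- c(5, 2, 4, 1, 5)
sol <- solve(A, rhs)             # solves A x = rhs without forming the inverse
```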
11. The spectral decomposition theorem for real symmetric matrices states that:
for a real symmetric matrix A with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ and
corresponding orthonormal eigenvectors $x_1, x_2, \ldots, x_n$, we have
$$A = \sum_{i=1}^{n} \lambda_i x_i x_i^T.$$
Using R, verify the spectral decomposition theorem for the following matrix
1 2 3
2 4 5
3 5 2
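One way to carry out the check is to rebuild the matrix from the output of eigen() and compare with all.equal(), since exact equality is not expected in floating point. The names e and recon are illustrative.

```r
A <- matrix(c(1, 2, 3,
              2, 4, 5,
              3, 5, 2), nrow = 3, byrow = TRUE)

e <- eigen(A, symmetric = TRUE)   # eigenvectors are returned with unit length

recon <- matrix(0, 3, 3)
for (i in 1:3) {
  v     <- e$vectors[, i]
  recon <- recon + e$values[i] * v %*% t(v)   # add lambda_i * x_i x_i^T
}

all.equal(recon, A)
```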
12. Create a 6×10 matrix of random integers chosen from 1, 2,. . . , 10 by
executing the following two lines of code:
set.seed(75)
Mat<-matrix( sample(10, size=60, replace=T), nr=6)
(a) Find the number of entries in each row which are greater than 4.
(b) Which rows contain exactly two occurrences of the number seven?
(c) Find those pairs of columns whose total (over both columns) is greater
than 75. The answer should be a matrix with two columns; so, for
example, the row (1, 2) in the output matrix means that the sum of
columns 1 and 2 in the original matrix is greater than 75.
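A sketch of how row-wise counts and column-pair totals might be obtained with rowSums(), colSums() and outer(). It assumes that part (c) wants each unordered pair of distinct columns listed once; the names n_gt4, rows_two7, cs and pairs are illustrative.

```r
set.seed(75)
Mat <- matrix(sample(10, size = 60, replace = TRUE), nrow = 6)

n_gt4     <- rowSums(Mat > 4)                 # entries above 4 in each row
rows_two7 <- which(rowSums(Mat == 7) == 2)    # rows with exactly two sevens

cs    <- colSums(Mat)
pairs <- which(outer(cs, cs, "+") > 75, arr.ind = TRUE)
pairs <- pairs[pairs[, 1] < pairs[, 2], , drop = FALSE]   # assumption: keep i < j only
```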
13. Using the function outer evaluate the following:
(a) $\sum_{i=1}^{20} \sum_{j=1}^{5} \frac{i^4}{3 + j}$
(b) $\sum_{i=1}^{20} \sum_{j=1}^{5} \frac{i^4}{3 + ij}$
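A minimal sketch of double sums via outer(), reading the two summands as i^4 / (3 + j) and i^4 / (3 + ij) for i = 1, ..., 20 and j = 1, ..., 5. The function passed to outer() has to be vectorised, which plain arithmetic is.

```r
i <- 1:20
j <- 1:5

sum_a <- sum(outer(i, j, function(i, j) i^4 / (3 + j)))
sum_b <- sum(outer(i, j, function(i, j) i^4 / (3 + i * j)))
```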
14. (Big Computing) Large computations are far more common in contemporary
applications than they were 10 years ago, and fortunately R can handle
most of them. Run the following piece of R code:
x= rexp(1100000)
(a) What is the length of this vector? Find the mean and variance of x.
(b) Find the mean of all of the entries in x which are strictly greater than
1.
(c) Plot a histogram of x.
(d) Add vertical lines each with different colors at the three quartiles.
Compare the relative positions of the quartiles and comment on the
skewness of the distribution.
(e) Create a matrix, X, containing the values in x, with 32 rows and
34375 columns.
(f) Calculate the mean of the 371st column of X.
(g) Now, find the means for all 34375 columns of X simultaneously.
(h) Also find the standard deviation for the first 100 columns simultane-
ously.
(i) Use this matrix X as the input to the hist() function and save the
result to a variable of your choice. What does this new variable show?
(j) Now, find the means of all the columns of X simultaneously. Plot the
histogram of column means. Explain why its shape does not match
with the last histogram.
(k) We want to find the eigenvalues and eigenvectors of $X^T X$. Use the
eigen() function to directly perform the eigen analysis of $X^T X$.
What result do you get?
(l) State and use a fact of matrix algebra which helps in computing the
eigenvectors of $X^T X$ using the same eigen() function but with some
extra steps in computation. Why are some of the eigenvalues of $X^T X$
zero?
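The idea behind part (l) can be tried on a small matrix first. The sketch below uses a 4 × 10 matrix (an assumption made purely to keep the example small, as is the seed) and relies on the fact that if $X X^T u = \lambda u$ then $X^T X (X^T u) = \lambda (X^T u)$, so the nonzero eigenvalues of $X^T X$ coincide with those of the much smaller $X X^T$.

```r
set.seed(1)                          # assumption: seed added only for reproducibility
X <- matrix(rexp(40), nrow = 4)      # small stand-in for the 32 x 34375 matrix

small <- eigen(X %*% t(X), symmetric = TRUE)   # 4 x 4 problem
V <- t(X) %*% small$vectors                    # columns are eigenvectors of t(X) %*% X
V <- sweep(V, 2, sqrt(colSums(V^2)), "/")      # rescale each column to unit length

big <- eigen(t(X) %*% X, symmetric = TRUE)     # direct 10 x 10 problem, for comparison
big$values                                     # first 4 match small$values, rest are ~0
```

Since the rank of $X^T X$ cannot exceed the number of rows of X, every eigenvalue beyond that count is zero up to rounding error.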
15. The Cars93 data frame in the MASS package contains data on 93 makes
of car sold in the USA.
(a) What are the names of the variables in the data frame?
(b) What are the types of the variables?
(c) The variable Type classifies the type of market the car is aimed at.
In each type, find the cheapest car and the car with the greatest fuel
efficiency.
(d) Also for each type compute the mean horsepower and the difference
between each car’s horsepower and the mean horsepower for its type.
(e) Create two new data frames for US and non-US cars.
(f) Use write.table() to save the US car data to a file. Read it in and
check if all the factors are correctly set as factors.
(g) Use save() to save the non-US car data to a file.
(h) Search help to learn how to remove existing objects in R. Remove the
non-US data frame and load the non-US car data file using load().
Now check if all the factors are still set.
16. The data set at housing.csv contains information about the housing stock
of California and Pennsylvania, as of 2011. Information is aggregated into
“Census tracts”, geographic regions of a few thousand people which are
supposed to be fairly homogeneous economically and socially.
(a) (Import and scrutiny)
i. Load the data in R into a data frame called housing.
ii. What is the dimension of the dataset?
iii. Run this command, and explain, in words, what this does:
colSums(apply(housing,c(1,2),is.na))
iv. The function na.omit() omits any row containing an NA value.
Use it to eliminate rows with incomplete data. How many rows
did this eliminate? Is your answer compatible with the previous
one? Explain.
(b) The vacancy rate is the fraction of housing units which are not oc-
cupied. The data frame contains columns giving the total number of
housing units for each Census tract, and the number of vacant hous-
ing units. Add a new column to the data frame which contains the
vacancy rate. What are the minimum, maximum, mean, and median
vacancy rates?
(c) The column COUNTYFP contains a numerical code for counties
within each state. We are interested in Alameda County (county 1
in California), Santa Clara (county 85 in California), and Allegheny
County (county 3 in Pennsylvania).
i. What were the average percentages of housing built in these
counties since 2005?
ii. Calculate the median of house value for these counties.
iii. What is the correlation between median house value and the
percent of housing built since 2005 in
A. the whole data,
B. all of California,
C. all of Pennsylvania,
D. Alameda County,
E. Santa Clara County and
F. Allegheny County?
(d) i. The variable Built_2005_or_later indicates the percentage
of houses in each Census tract built since 2005. Plot median
house prices against this variable. Change the point size from
the default value.
ii. Make a new plot, or pair of plots, which breaks this out by state.
Note that the state is recorded in the STATEFP variable, with
California being state 6 and Pennsylvania state 42.
(e) The vacancy rate is the fraction of housing units which are not oc-
cupied. The dataframe contains columns giving the total number
of housing units for each Census tract, and the number of vacant
housing units.
i. Plot the vacancy rate against median house value.
ii. Plot vacancy rate against median house value separately for Cal-
ifornia and for Pennsylvania. Is there a difference?
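The missing-value steps in part (a) can be tried on a toy data frame before running them on the real file. The data frame toy and its columns a, b, c below are invented for illustration and have nothing to do with the columns of housing.csv.

```r
# toy is a made-up data frame with a few NA values
toy <- data.frame(a = c(1, NA, 3, 4),
                  b = c(NA, NA, 6, 7),
                  c = 1:4)

na_per_col <- colSums(apply(toy, c(1, 2), is.na))   # NA count in each column
cleaned    <- na.omit(toy)                          # drop rows with any NA
rows_dropped <- nrow(toy) - nrow(cleaned)
```

Here the columns contain three NA values in total but only two rows are dropped, because one row is missing in two columns at once; the same reasoning applies when reconciling the two answers for the housing data.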
17. Consider the following data on the severity of a crash tabulated for the
cases where the passenger had a seat belt or not:
                        Injury
                 None   minimal   minor   major
Seat belt  Yes  12813       647     359      42
           No   65963      4000    2642     303
(a) Create an appropriate barplot showing the differences between those
who had seat belts and those who did not.
(b) Use identify() to interactively insert a legend.
18. (Finding the solution)
(a) Create a sequence of x values of length 100 from -1 to 1.
(b) Use plot() to draw the curve of $y = e^x$ between -1 and 1.
(c) Add the curve of y = sin(x) over the same domain on the previous
plot.
(d) The function text() is used to add some text anywhere in the plotting
area. Use this function to label the two curves appropriately.
The function expression() can be used to insert mathematical
expressions as text. Use this to label the $y = e^x$ curve.
(e) The function arrows() is used to add arrows to an existing plot.
Use arrows() and text() to locate the solution of $\sin(x) = e^x$. You
can take help of the locator() function which is used to get the
co-ordinates of any point in a plot interactively.
19. Run the following piece of R code:
n = 50
set.seed(0)
x = runif(n, min=-1, max=1)
y = x^3 + rnorm(n)
(a) Produce a scatterplot of x and y.
(b) Add the curve $y = x^3$ to the plot and have the curve be drawn in red
with twice the normal thickness.
(c) Add a straight horizontal line at 0 to the plot and have the line be
dashed.
(d) Define two new variables as
upper=x^3+qnorm(0.10)
lower=x^3-qnorm(0.10)
Add two new lines passing through the upper and lower points. These
lines are like the confidence intervals.
(e) Shade the area between the upper and lower bounds in gray.
[Hint: Use polygon(); this function requires that the x coordinates
of the polygon be passed in an appropriate order. You might find
it useful to use c(x, rev(x)) for the x coordinates but need to
explain this command if you use it.]
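The hint about c(x, rev(x)) can be illustrated as follows. polygon() joins its vertices in the order given, so the sketch sorts x first (an extra step assumed here, since the simulated x values are unordered), walks left to right along one bound and then right to left along the other. The name xs is illustrative; upper and lower follow the definitions in part (d) but are evaluated at the sorted values.

```r
n <- 50
set.seed(0)
x <- runif(n, min = -1, max = 1)
y <- x^3 + rnorm(n)

xs    <- sort(x)                 # polygon vertices must be traced in order
upper <- xs^3 + qnorm(0.10)      # as defined in (d); qnorm(0.10) is negative
lower <- xs^3 - qnorm(0.10)

plot(x, y)
polygon(c(xs, rev(xs)),          # out along xs, back along reversed xs
        c(upper, rev(lower)),    # one bound out, the other bound back
        col = "gray", border = NA)
points(x, y)                     # redraw points covered by the shading
```

Because qnorm(0.10) is negative, the vector called upper actually lies below lower; the shaded band is the same either way.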