Cognitive Psychology
Linear Algebra
Matrix
A matrix is a rectangular array of numbers arranged in rows
and columns.
A = \begin{bmatrix} 34 & 35 & 73 & 78 \\ 53 & 64 & 36 & 98 \\ 73 & 32 & 57 & 64 \end{bmatrix}
The number of rows and columns that a matrix has is called
its dimension; rows first, columns second. The above matrix
is 3 x 4, meaning that it has 3 rows and 4 columns.
Numbers in a matrix A are called elements. The first
subscript refers to the row number and the second subscript
refers to the column number. Thus, the first element in the
first row is represented by A_11 (here 34), and the second
element in the first row is represented by A_12 (here 35).
Transpose Matrix
The transpose of a matrix uses the rows of that matrix as
the columns of another matrix; the transpose of A is written A′.
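To make these ideas concrete, here is a minimal Python/NumPy sketch (NumPy is assumed throughout these examples; it is not part of the slides) showing the example matrix, its dimension, its elements, and its transpose:

```python
import numpy as np

# The 3 x 4 example matrix from the slide.
A = np.array([[34, 35, 73, 78],
              [53, 64, 36, 98],
              [73, 32, 57, 64]])

print(A.shape)    # (3, 4): 3 rows, 4 columns
print(A[0, 0])    # 34 -> the element A_11 (NumPy indices start at 0)
print(A[0, 1])    # 35 -> the element A_12
print(A.T.shape)  # (4, 3): the transpose uses A's rows as its columns
```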
Vectors
Vectors have two forms: column vectors and row vectors.
A column vector a and a row vector a′ are transposes of each other.
Vector Norms
A norm of a vector is informally a measure of the
“length” of the vector.
\|x\| = \sqrt{\sum_{i=1}^{N} x_i^2}

(this is the Euclidean, or L2, norm)
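A minimal sketch of the norm computed both from the definition and with NumPy's built-in routine:

```python
import numpy as np

x = np.array([3.0, 4.0])

# Euclidean norm from the definition: square root of the sum of squares.
manual = np.sqrt(np.sum(x ** 2))

# The same norm via NumPy.
builtin = np.linalg.norm(x)

print(manual, builtin)  # 5.0 5.0
```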
Special Matrices
A square matrix is an n x n matrix; that is, a matrix
with the same number of rows as columns.
Symmetric matrix. If the transpose of a matrix is
equal to itself, that matrix is said to be symmetric.
Diagonal matrix. A diagonal matrix is a special kind
of symmetric matrix. It is a symmetric matrix with
zeros in the off-diagonal elements.
Trace
The trace of a square matrix is the sum of its diagonal
elements.
Special Matrices
Scalar matrix. A scalar matrix is a special kind of
diagonal matrix. It is a diagonal matrix with equal-
valued elements along the diagonal.
The identity matrix is an n x n diagonal matrix with
1's in the diagonal and zeros everywhere else. The
identity matrix is denoted by I.
Any matrix that is premultiplied or postmultiplied by I
remains the same: AI = IA = A.
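A short sketch of these special matrices and their properties (NumPy assumed, as before):

```python
import numpy as np

D = np.diag([2.0, 2.0, 2.0])  # a scalar matrix: diagonal, equal values
I = np.eye(3)                 # the 3 x 3 identity matrix

print(np.allclose(D, D.T))    # True: a diagonal matrix is symmetric
print(np.trace(D))            # 6.0: the trace is the sum of the diagonal

A = np.arange(9.0).reshape(3, 3)
print(np.allclose(A @ I, A))  # True: AI = A
print(np.allclose(I @ A, A))  # True: IA = A
```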
Matrix Addition
Two matrices may be added only if they have the same
dimension; that is, they must have the same number of
rows and columns.
Addition or subtraction is accomplished by adding or
subtracting corresponding elements.
The order in which matrices are added is not important;
thus, A + B = B + A.
Matrix Subtraction
As with addition, the two matrices must have the
same number of rows and columns.
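A small sketch of both operations, using two illustrative 2 x 2 matrices:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Addition and subtraction work element by element; shapes must match.
print(A + B)                         # [[ 6  8] [10 12]]
print(A - B)                         # [[-4 -4] [-4 -4]]
print(np.array_equal(A + B, B + A))  # True: the order does not matter
```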
Matrix Multiplication
Multiplication of a matrix by a number:
Multiply every element in the matrix by the same number.
This operation produces a new matrix, which is called
a scalar multiple.
The multiplier (4 in the slide's example) is called a scalar; a scalar is a real number.
Matrix Multiplication
Multiplication of a matrix by a matrix:
The matrix product AB is defined only when the number of
columns in A is equal to the number of rows in B.
Suppose that A is an i x j matrix, and B is a j x k matrix. Then,
the matrix product AB results in a matrix C, which has i rows
and k columns; and each element in C can be computed as
c_{ik} = \sum_{j} a_{ij} b_{jk}

where c_{ik} is the element in row i and column k of matrix C,
a_{ij} is the element in row i and column j of matrix A,
b_{jk} is the element in row j and column k of matrix B, and
the summation sign indicates that the a_{ij} b_{jk} terms
should be summed over j.
Matrix Multiplication
Multiplication of a matrix by a matrix:
[Figure: visual illustrations of matrix-by-matrix multiplication.]
Matrix Multiplication
AB does not always equal BA: matrix multiplication is not
commutative in general. (Multiplication by the identity matrix
is an exception: AI = IA = A.)
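A sketch of the dimension rule and of non-commutativity, with illustrative matrices:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # 2 x 3 (i x j)
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])      # 3 x 2 (j x k)

C = A @ B                   # defined: A has 3 columns, B has 3 rows
print(C.shape)              # (2, 2): an i x k result

print(4 * A)                # scalar multiple: every element times 4

print(np.array_equal(A @ B, B @ A))  # False: AB and BA differ (here even in shape)
```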
Vector multiplication
The inner product of two vectors is a real number:

a b′ = b a′ = s

where a and b are row vectors, each having n elements; a′ is
the transpose of a, which makes a′ a column vector; b′ is the
transpose of b, which makes b′ a column vector; and s is a
scalar, i.e., a real number rather than a matrix.
The outer product of two vectors produces a rectangular
matrix: a′b = C. If a has m elements and b has n elements,
C is a rectangular m x n matrix.
Matrix Multiplication
Multiplication of a vector by a vector:
[Figure: a visual illustration of vector-by-vector multiplication.]
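A sketch of the two vector products (np.inner and np.outer are NumPy's built-in versions):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Inner product ab': a single scalar.
print(np.inner(a, b))        # 32 = 1*4 + 2*5 + 3*6

# Outer product a'b: an m x n matrix (here 3 x 3).
print(np.outer(a, b).shape)  # (3, 3)
print(np.outer(a, b))
```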
Matrix as linear transformation
[Videos: a one-dimensional space is a number line. The first
video multiplies every number on the line by two (2); the
second multiplies every number on the line by negative three (-3).]
Matrix as linear transformation
The videos above show linear transformations of a
one-dimensional space. The word transformation indicates
"actions" such as moving, stretching, squishing, etc.,
applied to an object. The first video shows how we move the
point one on the number line to where two is, and how we
move the point two on the number line to where four is, etc.
Similarly, you can imagine how a linear transformation
occurs in a two-dimensional space.
Matrix as linear transformation
Imagine we have a 2-dimensional space in which î is a
vector along the x-axis, pointing at unit 1, and ĵ is a
vector along the y-axis, pointing at unit 1. These vectors
will be shifted to different locations/directions after a
linear transformation.
We can see in the figure where î and ĵ land after the
transformation.
Matrix as linear transformation
If we put these destinations together, each being the
destination of one unit vector, they become a matrix.
The first column tells us where the original unit vector
for the x-axis lands, and the second column tells us where
the original unit vector for the y-axis lands.
Another example:

A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}

The first column, (1, 0), is where the x-axis unit vector
lands; the second column, (0, -1), is where the y-axis unit
vector lands. This transformation reflects the plane across
the x-axis.
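A sketch of this reflection matrix acting on the unit vectors and on an arbitrary point:

```python
import numpy as np

# Columns of A are where the unit vectors land after the transformation.
A = np.array([[1, 0],
              [0, -1]])

i_hat = np.array([1, 0])     # the unit vector along the x-axis
j_hat = np.array([0, 1])     # the unit vector along the y-axis

print(A @ i_hat)             # [1 0]: the x-axis unit vector stays put
print(A @ j_hat)             # [ 0 -1]: the y-axis unit vector is flipped
print(A @ np.array([2, 3]))  # [ 2 -3]: any point is reflected across the x-axis
```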
Matrix as linear transformation
What is NOT a linear transformation?
A transformation is linear only if the origin remains fixed
and all lines remain lines; a transformation that breaks
either rule is not linear.
Independent vs dependent vectors
One vector is dependent on other vectors if it is a linear
combination of the other vectors.
If one vector is equal to the sum of scalar multiples of other
vectors, it is said to be a linear combination of the other
vectors.
Example: a = 2b + 3c
Here a is a linear combination of b and c, and thus is not an
independent vector.
A set of vectors is linearly independent if no vector in the set
is (a) a scalar multiple of another vector in the set or (b) a
linear combination of other vectors in the set.
Rank
Think of an r x c matrix as a set of r row vectors, each having c
elements; or think of it as a set of c column vectors, each having
r elements.
The rank of a matrix is the maximum number of linearly
independent column vectors in the matrix or, equivalently,
the maximum number of linearly independent row vectors;
both definitions give the same number.
Rank
When all of the vectors in a matrix are linearly independent,
the matrix is said to be full rank.
[Slide examples: a matrix whose vectors are all independent,
with rank = 3 (full rank), and a matrix with rank = 1.]
Rank
The rank of a matrix actually tells you the minimum
dimension of the space holding all the vectors in the matrix.
If rank(A) = 3, a space of at least 3 dimensions is needed to
contain all the vectors of A.
If rank(A) = 2, the vectors in fact lie in a plane (2D).
If rank(A) = 1, the vectors lie along a line.
A matrix is a linear operator that projects a point x
from a space s onto a space s′. The dimension of this space s′
is given by the rank of the matrix. We can say that the rank
of the matrix is just the dimension of the embedding space
for x.
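A sketch of rank in NumPy, reusing the dependent-vector example a = 2b + 3c from above (the particular b and c are illustrative):

```python
import numpy as np

b = np.array([1.0, 0.0, 1.0])
c = np.array([0.0, 1.0, 1.0])
a = 2 * b + 3 * c            # a is a linear combination of b and c

M = np.column_stack([b, c, a])
print(np.linalg.matrix_rank(M))          # 2: only two independent columns,
                                         # so the vectors lie in a plane

print(np.linalg.matrix_rank(np.eye(3)))  # 3: full rank
```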
Determinant
The determinant is a unique number associated with a square
matrix. The determinant of matrix A is indicated by |A|.
When you have two vectors, each paired with a parallel copy,
the result is a parallelogram; when you have three vectors,
each face paired with a parallel face, the result is a
parallelepiped P. The determinant of a 2 x 2 matrix A is the
(signed) area of the parallelogram spanned by its vectors;
the determinant of a 3 x 3 matrix is the (signed) volume of
the parallelepiped P.
Determinant
When a parallelogram or a parallelepiped has 0 area or
volume, it means that the parallelogram or the
parallelepiped is flat: it has been squashed into a lower
dimension. This also means the vectors are NOT linearly
independent (e.g., one vector can be represented as a linear
combination of the other vectors in the matrix, or as a
scalar multiple of another vector).
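A sketch of the connection between determinants and "squashed" volume:

```python
import numpy as np

# Independent columns: a nonzero determinant (a real area).
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
print(np.linalg.det(A))  # 6.0: the parallelogram has area 6

# Dependent columns (the second is twice the first): the parallelogram
# is squashed flat, so the determinant is 0.
B = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print(np.linalg.det(B))  # 0.0
```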
Matrix inverse
Suppose A is an n x n matrix. The inverse of A is
another n x n matrix, denoted A⁻¹, such that

A A^{-1} = A^{-1} A = I_n
Not every square matrix has an inverse; but if a matrix
does have an inverse, it is unique.
If the rank of an n x n matrix is less than n, the matrix does
NOT have an inverse.
When the determinant for a square matrix is zero, the matrix
does NOT have an inverse.
Matrix inverse
If a matrix does not have an inverse, it is called a singular
matrix.
A singular matrix is usually caused by linear dependence
between the column vectors (collinearity), or by a column
(variable) that has zero variance.
As an analogy, consider a function that maps X and Y both
to 1, Z to 2, and W to 3 (see the figure). You can find the
reverse of 2 and 3 by following the arrows back to Z and W.
However, when you want to find the reverse of 1, you cannot
determine whether to go to X or Y. In this case information
is lost, because you can no longer distinguish X and Y after
applying the function.
Having a singular matrix is also a practical problem, because
many actual calculations require the matrix inverse to be computed.
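A sketch of inverting a matrix and of what happens with a singular one:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))  # True: A A^-1 = I

# A singular matrix (determinant 0, rank 1) has no inverse;
# NumPy raises LinAlgError when asked for one.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError as err:
    print("singular:", err)
```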
Span
The span of a set of vectors in a vector space is the
intersection of all linear subspaces, each of which contains
every vector in that set.
The word span can be applied to a bridge or an arch. For
example, a bridge spans a river from one side to the other.
Applied to a set of vectors in linear algebra, those vectors
span a vector space.
Imagine a point (1,1,0) in a 3-D space. If you multiply this
point by any real number, you get a set of points that fills
out a line through the origin (0,0,0) and through the point
(1,1,0). So, for any y = x*(1,1,0), where x is any real number,
y is in the span of (1,1,0).
Span
Example:
Assume we have two vectors, [2,3] and [1,2]. Does the
vector [19,3] belong to the span of these two vectors?
This is equivalent to asking: do there exist two
scalars, c1 and c2, such that c1*[2,3] + c2*[1,2] = [19,3]?
If c1 = 35, and c2 = -51, then 35*[2,3] + (-51)*[1,2] =
[19,3]; therefore, [19,3] exists within the span of the
vectors [2,3] and [1,2].
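The scalars c1 and c2 can be found by solving a small linear system; a sketch:

```python
import numpy as np

# Columns are the spanning vectors [2,3] and [1,2].
A = np.array([[2.0, 1.0],
              [3.0, 2.0]])
target = np.array([19.0, 3.0])

# Solve A @ [c1, c2] = target for the coefficients.
c = np.linalg.solve(A, target)
print(c)  # [ 35. -51.]: c1 = 35, c2 = -51, so [19,3] is in the span
```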
Recap: Parametric model (regression)
Simple linear regression is a statistical method that allows
us to summarize and study the relationship between two
continuous (quantitative) variables:

Y = \beta_0 + \beta_1 X

X is the predictor or the independent variable.
Y is the response or the dependent variable.
β₀ is the intercept.
β₁ is the coefficient for X:
β₁ = 1: as X increases 1 unit, Y increases 1 unit
β₁ = 2: as X increases 1 unit, Y increases 2 units
β₁ = 0.5: as X increases 1 unit, Y increases 0.5 unit
β₁ = -3: as X increases 1 unit, Y decreases 3 units
https://www.desmos.com/calculator/kreo2ssqj8
Linear regression
Assume that you want to know how drinking affects
performance in an experimental task such as face
recognition.
Your assumption is that the more drinks (e.g., glasses of wine)
people have, the more errors they will make in this
recognition task.
Now, you’ve conducted this study on 3 subjects, and your
data looks like this: (1,1), (2,2), (3,2).
The goal is to find a linear equation that fits these points, so
that you can predict the performance of e.g., a person who
drinks 100 glasses of wine in this task, without the need to
actually test it on a real individual.
Linear regression
We can write the equation as C + Dx = b, where b is the number
of errors the individual made, x is the number of drinks the
individual had, and C and D are the regression coefficients we
want to identify based on our data.
For the three data points, we can write down 3 equations:

C + 1D = 1
C + 2D = 2
C + 3D = 2

This can be rewritten in matrix form:

\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} C \\ D \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}, i.e., A \begin{bmatrix} C \\ D \end{bmatrix} = b
Linear regression
What this is saying is that we hope the vector b
lies in the span of A. That is, we’re hoping there’s
some linear combination of the columns of A that
gives us our vector of observed b values.
Unfortunately, we already know b doesn’t fit our
model perfectly. That is, it’s outside the span of
A.
The span of A forms a flat plane. If we think of
the columns of A as vectors a1 and a2, the plane
is all possible linear combinations of a1 and a2.
By contrast, the vector of observed values b
doesn't lie in the plane; it sticks up in some
direction. The plane of A is really just our hoped-for
mathematical model, and the vector b is our observed
data that unfortunately doesn't fit the model.
Linear regression
Linear regression suggests that we should
forget about finding a model that perfectly
fits b, and instead find another vector
(e.g., p) that is very close to b, but fits our
model. We want to pick a vector p that’s in
the span of A, but is also as close as
possible to b.
This can be done by “projecting” vector b
onto the plane. This projection is labeled
p. The line marked e is the “error”
between our observed vector b and the
projected vector p. The goal is to choose
the vector p that makes e as small as
possible; that is, we want to minimize the
error between the vector p used in the
model and our observed vector b.
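A sketch of this projection idea using least squares on the three data points (np.linalg.lstsq finds the combination of A's columns closest to b):

```python
import numpy as np

# Data points (drinks, errors): (1,1), (2,2), (3,2).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])  # columns: intercept term, x
b = np.array([1.0, 2.0, 2.0])

coef, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)
C, D = coef
print(C, D)   # C ~ 0.667, D = 0.5: the fitted line is b = 0.667 + 0.5x

p = A @ coef  # the projection of b onto the span of A
e = b - p     # the error vector that least squares makes as small as possible
print(p, e)
```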
Eigenvectors and Eigenvalues
Imagine you perform a linear transformation, turning
the green box from the shape on the left to the shape on the
right. You can see that (1) the lengths of the red vectors
changed, but their directions (i.e., their spans) did NOT
change; and (2) the length and the direction (i.e., the span)
of the yellow vector both changed. Vectors like the red ones,
whose spans do not change under a linear transformation, are
called eigenvectors; the factors by which their lengths
change are called eigenvalues.
Eigenvectors and Eigenvalues
[Worked example from the slides: a linear transformation is
applied to two vectors v and w to illustrate which of them
behaves as an eigenvector.]
Eigenvectors and Eigenvalues
Formally, an eigenvector v of a linear transformation is
a non-zero vector that changes by only a scalar factor
when that linear transformation is applied to it:

A v = \lambda v

where λ is a scalar known as the eigenvalue.
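A sketch of the definition, using an illustrative matrix (not the one from the slides):

```python
import numpy as np

# This matrix stretches the x direction by 3 and the y direction by 2.
A = np.array([[3.0, 0.0],
              [0.0, 2.0]])

v = np.array([1.0, 0.0])  # an eigenvector: its direction is unchanged
print(A @ v)              # [3. 0.] = 3 * v, so its eigenvalue is 3

w = np.array([1.0, 1.0])  # not an eigenvector: its direction changes
print(A @ w)              # [3. 2.], which is not a scalar multiple of w

vals, vecs = np.linalg.eig(A)
print(vals)               # the eigenvalues, 3 and 2
```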
Eigenvectors and Eigenvalues
Imagine that the triangles are points of data.
To find the direction with the most variance, find the
straight line along which the data are most spread out when
projected onto it.
A vertical straight line with the points projected onto it
isn't very spread out, so it doesn't have a large variance.
A projection onto a horizontal line is much more spread out,
so it has a large variance.
Eigenvectors and Eigenvalues
Now, imagine the data looks like this when plotted on an x-y
plane.
If, instead of using the x, y axes, we try to find new
axes that can maximize the variances along the new
axes, we can use the eigenvectors as the new axes.
The eigenvalues indicate how much variance each of
the new axes contain, i.e., how important each of the
new axes is.
Eigenvectors and Eigenvalues
The number of eigenvectors equals the number of
variables the data set contains (i.e., the dimension of the
data). After we obtain the eigenvectors, we can decide
whether we want to drop some of them (i.e., some of the
new axes/dimensions) based on their eigenvalues: axes with
smaller eigenvalues contain less variance and can be dropped.
This way, we may represent the originally high-dimensional
data with fewer dimensions, without losing too much
information.
For instance, if we use the new axes (green axes), we can
drop the 2nd dimension because it does not contain much
variance. The original 2D data becomes 1D.
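A sketch of this idea on hypothetical data: the eigenvectors of the covariance matrix become the new axes, and the axis with the small eigenvalue is dropped.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D data stretched along the diagonal y = x.
x = rng.normal(size=200)
data = np.column_stack([x, x + 0.1 * rng.normal(size=200)])

# Eigen-decomposition of the covariance matrix gives the new axes.
cov = np.cov(data, rowvar=False)
vals, vecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices

print(vals)  # one large and one tiny eigenvalue

# Keep only the axis with the largest eigenvalue: 2-D becomes 1-D.
main_axis = vecs[:, np.argmax(vals)]
reduced = data @ main_axis
print(reduced.shape)  # (200,)
```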
Factor analysis
Assume that people, at least some, may have a problem called
"statistics anxiety". We want to know: if this construct exists, how many
people suffer from it, and how severe can different cases be?
We can design a "Statistics Anxiety Questionnaire", in which we ask
people questions related to the ‘statistics anxiety’ construct we want to measure.
1. Statistics makes me cry
2. My friends will think I’m stupid for not being able to cope with statistics
3. Standard deviations excite me
4. I dream that Pearson (as in Pearson Correlation) is attacking me with
correlation coefficients
5. I don’t understand statistics
6. I have little experience of computers
7. All computers hate me
8. I have never been good at mathematics
Factor analysis
Remember that you assume ‘statistics anxiety’ is a psychological
construct: the questionnaire items ask about this construct from
different perspectives (i.e. maths, computer, or specific statistical
terms …etc), at least partly because the same construct may reveal
itself differently for different people and under different
circumstances.
In other words, your questionnaire items should be somewhat
correlated. For instance, a person who rates “totally agree” on “I
don’t understand statistics” shouldn’t rate “totally disagree” on
“Statistics makes me cry”, because then the person would contradict
him/herself.
Some other questions could be less correlated. For instance, “I have
little experience of computers” may or may not be true for people
who hate maths.
Factor analysis
Therefore, you need to actually prove that your questionnaire
is measuring only one psychological construct, i.e.,
statistics anxiety. But how?
You can calculate the correlations between all items.
In our case, 8 items result in an 8 x 8 correlation matrix.
The entries of the matrix are the correlations between
questionnaire item 1 and item 1, item 1 and item 2, ...,
item 8 and item 8:

Correlations
     1        2        3        4        5        6        7        8
1    1
2   -.099**   1
3   -.337**   .318**   1
4    .436**  -.112**  -.380**   1
5    .402**  -.119**  -.310**   .401**   1
6    .217**  -.074**  -.227**   .278**   .257**   1
7    .305**  -.159**  -.382**   .409**   .339**   .514**   1
8    .331**  -.050*   -.259**   .349**   .269**   .223**   .297**   1
**. Correlation is significant at the 0.01 level; *. at the 0.05 level.
Factor analysis
Factor analysis finds the eigenvectors and the eigenvalues of this
correlation matrix. If only one of the eigenvalues is large, while the
others are small, mathematically, it means that the variances along
the original 8 dimensions can be explained by just one new dimension,
which is the direction of the eigenvector. Psychologically, it means
that the 8 items in the questionnaire are in fact measuring just one
underlying construct, namely ‘statistics anxiety’. In other words, our
hypothesis will be supported.
If we find that two (or more) eigenvectors have large eigenvalues, it
means that the questionnaire may in fact be measuring two related
but different constructs. The variance along each of the original 8
dimensions may be better explained by one of the two constructs:
some items may vary more with construct A, some more with
construct B. These item-construct relationships are called "factor
loadings", and they can be interpreted as indicating which item
measures which psychological construct.
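A sketch of the core computation with a hypothetical 3-item correlation matrix (not the questionnaire data above):

```python
import numpy as np

# Three items that all correlate positively, as if measuring one construct.
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

vals, vecs = np.linalg.eigh(R)  # eigenvalues/vectors of the correlation matrix
print(np.sort(vals)[::-1])      # one eigenvalue clearly dominates the others

# A rough one-factor loading estimate: the dominant eigenvector scaled by
# the square root of its eigenvalue (the principal-component convention).
top = np.argmax(vals)
loadings = vecs[:, top] * np.sqrt(vals[top])
print(loadings)
```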