Assignment and Pointer
Semantics
Rida A. Bazzi
𝜆 𝜆
𝜆
x->x->x->x = x->x->x->x
© RIDA BAZZI This document is copyrighted by Rida Bazzi and should not be shared or used
for other than the purpose for which it was provided to you.
Assignment Semantics
Assignment semantics is concerned with the meaning of
a = expr
where:
• a can be a variable or, more generally, an expression and
• expr is an expression.
In general, there are two kinds of semantics used by programming languages
1. Copy semantics. This semantics is used in C, C++ and is used for basic types
in Java
2. Reference semantics. This semantics is use by Java for assigning object
values
In what follows I will concentrate on copy semantics, but I will later explain
reference semantics
Box-Circle Diagram
address
value
name location
binding/associating a location to a name
A box-circle diagram makes clear the distinction between a name and the location
that that is associated with the name. In general, a name can have a location
associated with it, as is the case for a variable in C, or might not have a location
associated with it, such as the name MAX_INT which is the name of a constant.
Also, we make the distinction between the location and the value stored in the
location. A location can store different values at different times.
Finally, we make the distinction between a location and the address of the
location. The location itself can be thought of as a physical location but the
address is just a number that can be used to describe the “position” of the
location in memory. The address itself is not a name of the location but can be
used in naming the location. For example, 1024 is an address. By itself, 1024 is
not a name of a location, but “The location at address 1024” is a name for the
location whose address is 1024!! This is not playing on words. The distinction is
real. Note that we say “a” name not “the” name because one location can have
multiple names.
We say that the location (the box) is associated with the name. The line between
the name and the location represents this association (which is also called
binding)
In general, a name need not be a simple variable name. We also, treat more
involved expressions as names. For example, a[i] where a is an array is the name
of a location (that depends on the value of i).
In general, we distinguish between expressions that have names associated with
them from other expressions. Expressions that have locations associated with them
are called l-values. Expressions that have values, but no locations associated with
them, are called r-values. This is further described next.
Assignment under copy semantics
There are two general forms of assignment. Each assignment we will consider can
be reduced to one of the following two forms
1. a= b copy value in location associated with b to location
associated with a
2. a= 5 copy value 5 to location associated with a
5
In both forms, a value is copied to a location. The difference is where the value
comes from.
l-values and r-values
l-value is an expression that has a location associated with it
Examples a, b[i+j], b[i+b[i]], *p, **q
r-value is an expression that does not have a location associated with it, but has a
value associated with it
Examples 5, i+j, 2*a
Possibilities for assignment
1. l-value1 = l-value2 copy value in location associated with
l-value2 to location associated with l-value1
l-value1
l-value2
2. l-value = r-value copy value of the r-value to the location
associated with l-value
l-value
✘
r-value
3. r-value = l-value not possible
4.
✘
r-value1 = r-value2 not possible
Value of an expression
if an expression is an l-value, we define its value to be the value stored in the
location associated with it
if an expression in an r-value, we define its value to be the value associated with it
Examples
a = 5; // at this point, the value of the expression “a” is 5
b = a+5; // the value of the expression “a+5”, which is an r-value, is 10
Pointer Semantics in C
Pointer declaration has the form
T * x;
where T is a type. We read the declaration as “the type of x is
pointer to T”
Examples int * x; // type of x is pointer to int
int * *x; // type of x is pointer to int *
If T * x; is a declaration for x, the location associated
with x stores a value which is the address of a location that stores a
value of type T
*x addry
x addry VT
VT is a value of type T
binding to illustrate points to illustrates that value of
that location is associated location associated with x is address
with name x (name x is the name of location that the arrows point to
of the location). This is not a pointer
The picture shows that the location “pointed to” by x is associated with the name *x. More
specifically, if x is a pointer variable, *x is an l-value. The location associated with *x is the
location whose address is equal to the value in the location associated with x (the location
whose address is the value of the expression x). More simply, the location associated with
*x is the location “pointed to” by x. Or *x is a name for the location pointed to by x.
Pointer Semantics Examples
We consider two pointer variables x and y and assume
int *x;
int *y;
...
// point 1
The ... represents some missing code that is not shown, and we assume that, at point 1, the box-
circle diagram is the following
m1 *x addr2 m2
x addr2 2
m3 addr4 m4
*y
y addr4 5
In the diragram, m1, m2, m3 and m4 are used to refer to the boxes without using program
variables. Notice how in the diagram location m2 is associated with *x and location m4 is
associated with *y.
Assuming the situation is as show above, we consider the effects of various assignments.
1. x = y : this will copy the value in the location associated with y to the location
associated with x. The situation becomes as follows
m1 addr2 m2
*x
x addr4 2
m3 addr4 m4
y addr4 5
*y
notice how the value in the location associated with y (addr4) is copied to the location
associated with x. The result is that x points to m4.
2. *x = *y : this will copy the value from the location associated with *y to the location
associated with *x (remember that this is being applied to the situation above).
m1 addr2 m2
*x
x addr4 5
m3 addr4 m4
y addr4 5
*y
malloc(): memory allocation function
malloc() : input: integer which specifies the “size” in bytes
of the memory to be allocated
output: r-value which is the address of a location
type: the returned value has type void *, which only
means that it is a pointer. In order to assign the
returned value to a variable, it needs to be typecast.
malloc() allocates memory whose size is equal to the “size” parameter and returns the
address of the first byte of the allocated memory
The allocated memory is allocated on the heap and is not initialized by malloc()
Example
If x has type T *, we allocate memory with malloc() as follows
x = (T *) malloc(sizeof(T));
The call to malloc() allocates memory whose size is the size of a value of type T. The
returned value has type void * . That is why we use type casting (T *) when we assign
the value to x.
After executing the statement,
addry
x addry
memory allocated
with malloc()
Note C does not require that a value of type void * be typecast in order to assign it to
a variable of type T *. The type casting is implicitly done. Nevertheless, it is good
practice to have an explicit type case in this case. It makes the code more readable and
potentially easier to detect mistakes. This is why C++ requires typecasting with malloc().
Remember that code that you write is read much more often than it is written!
free(): memory de-allocation function
free() : input: pointer to memory that was previously
allocated
output: no output
free() de-allocates the memory by making it available for future allocation.
The input to free() must have a value which is the address of a previously
allocated memory. If the value passed to free() does not satisfy this requirement,
its behavior is undefined which means that you cannot rely on what will happen.
The size of the memory to be freed is not specified. The memory manager knows
the size because it stores that information when malloc() was previously called.
& : address operator
& & is a unary operator
the operand of & must be an l-value
the result is an r-value
Value The value of & l-value is equal to the address of the location associated
with the l-value
Example The value of &x is the address of the location associated with x
Type If x is of type T, then &x has type T * (pointer to T)
Example If x has type int, &x has type int *
* : dereference operator
* * is a unary operator
operand l-value or r-value
result always l-value
Location the location associated with *(expr), where expr is an expression that is
either an l-value or an r value is the location whose address is equal to the value
of expr.
* r-value the location associated with * r-value is the location
whose address is equal to the value of the r-value
addrm
Illustration *addr m
* l-value the location associated with * l-value is the location
whose address is the value in the location associated
with the l-value (the value of the l-value)
Illustration addrm
x addrm
*x
Informally, if x is a pointer, the location associated with *x is the location “pointed
to” by x.
Type If x has type T *, *x has type T
Box-circle diagram revisited
l-value
value
name location
binding/associating a location to a name
• An l-value is an expression that has a location associated with it.
• The l-value itself is the name of the location.
• The address of the location associated with an l-value is given by &l-value.
• The location associated with the l-value contains a value
• For simplicity, we can call the value in a location associated with an l-value, the
value of the l-value
• The address of the location associated with an l-value is not the name of the
location associated with the l-value
An analogy can help in making the distinctions clearer
location
Bazzi’s house
value stored in
two different
names for the the house
same location
The house at
3.14 𝝀 lane
3.14 𝝀 lane
(like *address) address
Example
int x;
int *y;
value is int
value is address of
a location that
stores a value of
type int
y = &x; // &x = address of location associated with x
// equivalent to y = addrx where addrx is the address
// of location associated with x
// note the arrow below point to the box not to
// the value inside the box
addrx
y addrx
Example continued
y = &*y; // *y is an l-value
// the location associated with *y is the
// location whose address is the value in
// the location associated with y (the value
// of y)
addrx
*y
y addrx
// &*y is the address of the location
// associated with *y = addrx
y = *&y; // *&y is an l-value
// the location associated with *&y is the
// location whose address is equal to
// &y = addr y
*y
y addrx
*&y
// y = *&y is equivalent to y = y
// so, the value of y does not change
Example continued
x = 1;
x 1
y addrx
y = (int *) malloc(sizeof(int)); // y = addr m address of memory
// location allocated by malloc()
x 1
addrm
y addrm
*y
int value
Example continued
*y = x; // copy value in location associated with x to location
// associated with *y
x 1
addrm
y addrm 1
*y
Structures
When we declare a structure
struct {
int i, j;
} x;
We represent the box-circle diagram for the structure as follows:
m11
i
x m1
j
m12
Note how the locations for x.i and x.j are inside the location we
associate with x.
m1 is the location associated with x
m11 is the location associated with i
m12 is the location associated with j
Arrays
When we declare an array, each entry in the array will have a
corresponding box. I will give an example with an array of structures
struct {
int i, j;
} x[4];
We represent the box-circle diagram for the array of structures as
follows:
i
x[0]
j x[1].j location
i
x[1]
j x[1].j value
i
x[2]
j
i
x[3]
j
Pointers with Structures
When we declare a structure
struct st {
int i; If x is a global variable, it is initialized
struct st * next; to NULL. If x is a local variable, it is
} * x; uninitialized (wild pointer)
Notice here that x is a pointer and the value in the location
associated with x is the address of a location that stores a value of
type st.
If we execute
x = (struct st *) malloc(sizeof(struct st)); // addr m returned by call
we get
addrm
*x
i
x m
next
Aliases
Two expressions are aliases of each other if they have the same
location associated with them. In other words, the two expressions
are two different names for the same location.
Since the definition requires that the two expressions are the names
of the same location, it follows that the definition only applies to l-
values.
We have already seen that x and *&x are aliases of each other
In general, there are multiple ways to obtain aliases
1. pointers
2. arrays
3. pass by reference (we will describe that later)
4. assignment with reference semantics (Java)
Pointer Aliases: Example
int * x;
int * y;
x = (int *) malloc(sizeof(int));
*x
x
at this point y’s value is uninitialized (assuming it is a local variable. If
it is a global variable, it will be initialized to 0)
If we execute y = x, we get
1 2
*x
x
*y
y
*y is an alias of *x because they have the same location ( location [2]
) associated with them
y is NOT an alias of x because they have different locations (locations
[1] and [3] ) associated with them
Array Alias Example
int a[10];
int i,j;
i = 5;
j = 3;
a[i-2] is an alias of a[j]
Dangling Reference
Definition A pointer is a dangling reference if its value is the address
of a location that:
• has been allocated and
• has been deallocated
Dangling Reference: Example 1
pointer to deallocated memory
int *x;
int *y;
x = (int *) malloc(sizeof(int));
y = x;
free(x); // frees location m
// at this point both x and y are dangling references
// because their values are the address of the de-allocated memory m
// it is x and y that are the dangling references not *x and *y
*x
x m
*y deallocated
y
Note. free(x) frees the location “pointed to” by x.
free(x) does not change the value of x.
Dangling Reference: Example 2
pointer to deallocated memory
int **x;
int * y;
x = (int **) malloc(sizeof(int *)); // mem1
*x = (int *) malloc(sizeof(int)); // mem2
y = *x;
pointer to int int
*x mem1 mem2
**x
x
*y
y
pointer to int*
For the code above, remember that if T * x; is a declaration, then
we allocate memory for x as follows
x = (T * ) malloc(sizeof(T))
In the example above T is int * which explains the code
x = (int * *) malloc(sizeof(int *));
Also, recall that if T * x; is a declaration *x is of type T. In the code
above, T is int *, so we allocate for *x as follows
*x = (int *) malloc(sizeof(int))
Dangling Reference: Example 2
pointer to deallocated memory
int **x;
int * y;
x = (int **) malloc(sizeof(int *));
*x = (int *) malloc(sizeof(int));
y = *x;
free(*x);
*x
**x
x
*y deallocated
y
Now *x and y are dangling references because their values are
the address of the deallocated memory. It is *x and y that are
the dangling references not **x and *y.
Note free(*x) frees the memory pointed to by *x
free(*x) does not modify the value of *x
Dangling Reference: Example 3
pointer to local variable of a function that exited
int * f()
{ int x; // memory for x allocated on stack
// point 1
return &x;
}
main()
{
int *y;
y = f(); // memory for x deallocated when function
// returns but y still point to it
// point 2
// point 1 // point 2
y main() y main()
stack
x f() x
memory for f() is
deallocated
when f() exits, its memory on the stack is
deallocated, and the value of y is the
address of deallocated memory (of x), so y
is a dangling reference
Dangling Reference: Example 4
pointer to variable from outside its scope
{ int *x;
{ int y;
x = &y; // point 1
}
// point 2
}
At point 1, the value of x is the address of the local variable y
At point 2, y is no longer accessible and its memory is reclaimed
(de-allocated) but x still points to the memory previously
associated with y
Note. In practice, the memory allocated for y might not be
reclaimed, but since y is out-of-scope, its memory can be de-
allocated by the compiler and it should be treated as de-allocated
memory
Dangling References
• Possible in C
• Possible in Ada if unchecked_deallocation_package is used
• Not possible in Java
Garbage (aka memory leak not )
A location is garbage if
• The location has been allocated and
• The location has not been deallocated and
• The location is no longer accessible by the program
What does no longer accessible mean?
It means that you cannot refer to it. Or that the memory
does not have a name.
Example
*x mem1 mem2
**x
x
*y
y
If we execute x = &y
mem1 mem2
**x
ge
x rba
ga
*x
*y
y
mem1 has no name. It is garbage. The name of mem2 is *y or **x. It
is not garbage.
A location is garbage if it has not be deallocated and we cannot follow
a sequence of arrow from a program variable to the location
Why is garbage a problem?
Consider a long-running server that continuously processes incoming requests.
repeat
receive input
call f(input) to process input
produce output
forever
If the function f() allocates some memory that is not needed after the call and is
not deallocated, then as time goes by more and more memory gets allocated
without it being deallocated (which creates a big leak as shown on the next page).
At some point, there will be no more heap memory and malloc() will fail and the
program will fail.
BIG LEAK
Garbage: Example 1
{ int * x;
int * y;
int * z;
x = (int *) malloc(sizeof(int)); // mem 1 allocated
y = (int *) malloc(sizeof(int)); // mem 2 allocated
z = (int *) malloc(sizeof(int)); // mem 3 allocated
z = y;
y = x;
// point 1
}
At point 1, the box-circle diagram looks as follows
*x
x mem1
*y
y mem2
*z
g e mem3
z ba
r
ga
mem3 is garbage because there is no way to refer to it in the program. mem1
and mem2 are not garbage because they can be referred to in the program:
mem1 is associated with *x and *y and mem2 is associated with *z
Garbage: Example 2
exiting a function before free
f()
{ int * x;
x = (int *) malloc(sizeof(int)); // mem1 is allocated
}
main()
{
f(); // mem1 is garbage when function exit if free() is not called
}
On the other hand, the following call does not produce a garbage location
int * f()
{ int * x;
return (int *) malloc(sizeof(int)); // mem2 is allocated
}
main()
{ int * y;
y = f(); // here mem2 is not garbage because its name is *y
}
In C, heap memory is not deallocated unless it is deallocated explicitly using
free()
Stack memory is automatically deallocated when the function returns or
when the local scope is exited
Reference Semantics for Assignment: Java
In Java, object assignment does not have copy semantics.
If 01 and O2 are two objects of class C
Initially, there is no location associated with O1 and O2.
O1
O2
If we execute
O1 = new A;
we get
O1
O2
If we execute
O2 = O1;
we get
O1
O2
At this point, O2 is an alias of O1