Assignment and Pointer Semantics
                             Rida A. Bazzi
 © RIDA BAZZI This document is copyrighted by Rida Bazzi and should not be shared or used
for other than the purpose for which it was provided to you.
Assignment Semantics
Assignment semantics is concerned with the meaning of
             a = expr
where:
     •   a can be a variable or, more generally, an expression and
     •   expr is an expression.
In general, there are two kinds of semantics used by programming languages:
1.   Copy semantics. This semantics is used in C and C++, and it is used for basic types
     in Java.
2.   Reference semantics. This semantics is used by Java for assigning object
     values.
In what follows I will concentrate on copy semantics, but I will explain
reference semantics later.
Box-Circle Diagram
[Figure: box-circle diagram. A name is bound to (associated with) a location, drawn as a box; the location has an address and stores a value, drawn as a circle.]
A box-circle diagram makes clear the distinction between a name and the location
that is associated with the name. In general, a name can have a location
associated with it, as is the case for a variable in C, or might not have a location
associated with it, such as the name MAX_INT, which is the name of a constant.
Also, we make the distinction between the location and the value stored in the
location. A location can store different values at different times.
Finally, we make the distinction between a location and the address of the
location. The location itself can be thought of as a physical location but the
address is just a number that can be used to describe the “position” of the
location in memory. The address itself is not a name of the location but can be
used in naming the location. For example, 1024 is an address. By itself, 1024 is
not a name of a location, but “The location at address 1024” is a name for the
location whose address is 1024!! This is not playing on words. The distinction is
real. Note that we say “a” name not “the” name because one location can have
multiple names.
We say that the location (the box) is associated with the name. The line between
the name and the location represents this association (which is also called
binding).
In general, a name need not be a simple variable name. We also treat more
involved expressions as names. For example, a[i], where a is an array, is the name
of a location (that depends on the value of i).
In general, we distinguish expressions that have locations associated with
them from other expressions. Expressions that have locations associated with them
are called l-values. Expressions that have values, but no locations associated with
them, are called r-values. This is further described next.
Assignment under copy semantics
There are two general forms of assignment. Each assignment we will consider can
be reduced to one of the following two forms:
1.   a = b                copy the value in the location associated with b to the location
                          associated with a
2.   a = 5                copy the value 5 to the location associated with a
In both forms, a value is copied to a location. The difference is where the value
comes from.
l-values and r-values
l-value    an expression that has a location associated with it
           Examples: a, b[i+j], b[i+b[i]], *p, **q
r-value    an expression that does not have a location associated with it, but has a
           value associated with it
           Examples: 5, i+j, 2*a
Possibilities for assignment
1.   l-value1 = l-value2      copy the value in the location associated with
                              l-value2 to the location associated with l-value1
2.   l-value = r-value        copy the value of the r-value to the location
                              associated with l-value
3.   r-value = l-value        ✘ not possible (an r-value has no location to copy into)
4.   r-value1 = r-value2      ✘ not possible
Value of an expression
If an expression is an l-value, we define its value to be the value stored in the
location associated with it.
If an expression is an r-value, we define its value to be the value associated with it.
Examples
           a = 5;      // at this point, the value of the expression “a” is 5
           b = a+5; // the value of the expression “a+5”, which is an r-value, is 10
     Pointer Semantics in C
     Pointer declaration has the form
                    T * x;
     where T is a type. We read the declaration as “the type of x is
     pointer to T”
     Examples       int * x;        // type of x is pointer to int
                    int * *x;       // type of x is pointer to int *
     If         T * x;      is a declaration for x, the location associated
     with x stores a value which is the address of a location that stores a
     value of type T
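     [Figure: the box for x holds addr_y, the address of a second box that is associated
     with the name *x and holds V_T, a value of type T. The line from the name x to its box
     is a binding: it shows that the location is associated with the name x (x is the name
     of the location); it is not a pointer. The arrow from addr_y to the second box
     ("points to") shows that the value in the location associated with x is the address
     of the location the arrow points to.]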
The picture shows that the location “pointed to” by x is associated with the name *x. More
specifically, if x is a pointer variable, *x is an l-value. The location associated with *x is the
location whose address is equal to the value in the location associated with x (the location
whose address is the value of the expression x). More simply, the location associated with
*x is the location “pointed to” by x. Or *x is a name for the location pointed to by x.
Pointer Semantics Examples
We consider two pointer variables x and y and assume
                int *x;
                int *y;
                ...
                // point 1
The ... represents some missing code that is not shown, and we assume that, at point 1, the box-
circle diagram is the following
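[Figure: box m1 (for x) holds addr2, the address of box m2, which is named *x and holds 2;
box m3 (for y) holds addr4, the address of box m4, which is named *y and holds 5.]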
In the diagram, m1, m2, m3 and m4 are used to refer to the boxes without using program
variables. Notice how in the diagram location m2 is associated with *x and location m4 is
associated with *y.
Assuming the situation is as shown above, we consider the effects of various assignments.
1. x = y : this will copy the value in the location associated with y to the location
           associated with x. The situation becomes as follows:
          [Figure: m1 (x) now holds addr4, so x and y both point to m4, which holds 5 and is
          named by both *x and *y; m2 still holds 2.]
          Notice how the value in the location associated with y (addr4) is copied to the location
          associated with x. The result is that x points to m4.
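2. *x = *y : this will copy the value from the location associated with *y to the location
             associated with *x (remember that this is being applied to the situation at
             point 1, not to the result of case 1).
          [Figure: x still holds addr2 and y still holds addr4; m2 (named *x) now holds 5,
          copied from m4 (named *y), which still holds 5.]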
malloc(): memory allocation function
malloc() :    input:         integer which specifies the “size” in bytes
                             of the memory to be allocated
              output:        r-value which is the address of a location
                             (or NULL if the allocation fails)
              type:          the returned value has type void *, which only
                             means that it is a pointer. It is typically typecast
                             to the type of the variable it is assigned to.
malloc() allocates memory whose size is equal to the “size” parameter and returns the
address of the first byte of the allocated memory.
The allocated memory is allocated on the heap and is not initialized by malloc().
Example
If x has type T *, we allocate memory with malloc() as follows
              x = (T *) malloc(sizeof(T));
The call to malloc() allocates memory whose size is the size of a value of type T. The
returned value has type void * . That is why we use type casting (T *) when we assign
the value to x.
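After executing the statement, x holds addr_y, the address of the memory allocated with malloc():
        [Figure: the box for x holds addr_y, with an arrow to the memory allocated with malloc().]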
Note: C does not require that a value of type void * be typecast in order to assign it to
a variable of type T *; the conversion is done implicitly. Nevertheless, it is good
practice to have an explicit type cast in this case. It makes the code more readable and
can make mistakes easier to detect. C++, in fact, does not perform this conversion
implicitly, so it requires the typecast with malloc().
Remember that code that you write is read much more often than it is written!
free(): memory de-allocation function
free() :     input:       pointer to memory that was previously
                          allocated
             output:      no output
free() de-allocates the memory by making it available for future allocation.
The input to free() must be the address returned by a previous call to malloc() (or a
related allocation function) that has not already been freed. If the value passed to
free() does not satisfy this requirement, its behavior is undefined, which means that you
cannot rely on what will happen. The one exception is NULL: free(NULL) does nothing.
The size of the memory to be freed is not specified. The memory manager knows
the size because it stored that information when malloc() was previously called.
& : address operator
&            & is a unary operator
             the operand of & must be an l-value
             the result is an r-value
Value The value of & l-value is equal to the address of the location associated
with the l-value
Example The value of &x is the address of the location associated with x
Type If x is of type T, then &x has type T * (pointer to T)
Example If x has type int, &x has type int *
* : dereference operator
*               * is a unary operator
                operand       l-value or r-value
                result        always l-value
Location  The location associated with *(expr), where expr is an expression that is
either an l-value or an r-value, is the location whose address is equal to the value
of expr.
    * r-value                 the location associated with * r-value is the location
                              whose address is equal to the value of the r-value
    Illustration              *addr_m names the location whose address is addr_m
    * l-value                 the location associated with * l-value is the location
                              whose address is the value in the location associated
                              with the l-value (the value of the l-value)
    Illustration              x holds addr_m, and *x names the location whose
                              address is addr_m
Informally, if x is a pointer, the location associated with *x is the location “pointed
to” by x.
Type            If x has type T *, *x has type T
     Box-circle diagram revisited
     [Figure: the box-circle diagram again, with the name labeled as the l-value: the
     l-value is bound to a location, and the location stores a value.]
 • An l-value is an expression that has a location associated with it.
 • The l-value itself is the name of the location.
 • The address of the location associated with an l-value is given by &l-value.
 • The location associated with the l-value contains a value.
 • For simplicity, we can call the value in the location associated with an l-value the
   value of the l-value.
 • The address of the location associated with an l-value is not the name of the
   location associated with the l-value
 An analogy can help in making the distinctions clearer:
   • location: Bazzi’s house
   • value stored in the location: what is stored in the house
   • address: 3.14 𝝀 lane
   • two different names for the same location: “Bazzi’s house” and “The house at
     3.14 𝝀 lane” (the second name is like *address)
Example
int x;      // the value of x is an int
int *y;     // the value of y is the address of a location that
            // stores a value of type int
y = &x;       // &x = address of location associated with x
              // equivalent to y = addr_x where addr_x is the address
              // of location associated with x
              // note that the arrow points to the box, not to
              // the value inside the box
              // [Figure: y holds addr_x, with an arrow to the box of x.]
Example continued
y = &*y;         //   *y is an l-value
                 //   the location associated with *y is the
                 //   location whose address is the value in
                 //   the location associated with y (the value
                 //   of y)
                 //   [Figure: y holds addr_x; *y names the box at addr_x.]
                 // &*y is the address of the location
                 // associated with *y = addr_x
y = *&y;         // *&y is an l-value
                 // the location associated with *&y is the
                 // location whose address is equal to
                 // &y = addr_y, that is, the location of y itself
                 // [Figure: y and *&y name the same box, which holds addr_x;
                 // *y names the box at addr_x.]
                 // y = *&y is equivalent to y = y
                 // so, the value of y does not change
Example continued
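x = 1;
                   // [Figure: x holds 1; y holds addr_x.]
y = (int *) malloc(sizeof(int));    // y = addr_m, the address of the memory
                                    // location allocated by malloc()
                   // [Figure: x holds 1; y holds addr_m; *y names the box at
                   // addr_m, which holds a (not yet initialized) int value.]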
Example continued
*y = x;   // copy value in location associated with x to location
          // associated with *y
          // [Figure: x holds 1; y holds addr_m; *y, the box at addr_m, now holds 1.]
Structures
When we declare a structure
struct {
       int i, j;
} x;
We represent the box-circle diagram for the structure as follows:
[Figure: box m1 for x contains two smaller boxes: m11 for x.i and m12 for x.j.]
Note how the locations for x.i and x.j are inside the location we
associate with x.
m1 is the location associated with x
m11 is the location associated with x.i
m12 is the location associated with x.j
Arrays
When we declare an array, each entry in the array will have a
corresponding box. I will give an example with an array of structures
struct {
    int i, j;
} x[4];
We represent the box-circle diagram for the array of structures as
follows:
[Figure: four consecutive boxes x[0], x[1], x[2], x[3], each containing an i box and a
j box; labels point out the location of x[1].j and the value stored in it.]
Pointers with Structures
When we declare a structure
struct st {
     int i;
     struct st * next;
} * x;
If x is a global variable, it is initialized to NULL. If x is a local variable, it is
uninitialized (a wild pointer).
Notice here that x is a pointer and the value in the location
associated with x is the address of a location that stores a value of
type struct st.
If we execute
x = (struct st *) malloc(sizeof(struct st)); // addr m returned by call
we get
[Figure: x holds addr_m, the address of box m, which is named *x and contains the
fields i and next.]
Aliases
Two expressions are aliases of each other if they have the same
location associated with them. In other words, the two expressions
are two different names for the same location.
Since the definition requires that the two expressions are the names
of the same location, it follows that the definition only applies to
l-values.
We have already seen that x and *&x are aliases of each other.
In general, there are multiple ways to obtain aliases:
    1.   pointers
    2.   arrays
    3.   pass by reference (we will describe that later)
    4.   assignment with reference semantics (Java)
Pointer Aliases: Example
       int * x;
       int * y;
       x = (int *) malloc(sizeof(int));
        [Figure: x points to a newly allocated location, named *x.]
at this point y’s value is uninitialized (assuming it is a local variable; if
it is a global variable, it is initialized to NULL).
If we execute y = x, we get
        [Figure: x (location [1]) and y (location [3]) both hold the address of
        location [2], which is named by both *x and *y.]
*y is an alias of *x because they have the same location (location [2])
associated with them.
y is NOT an alias of x because they have different locations (locations
[1] and [3]) associated with them.
Array Alias Example
 int a[10];
 int i,j;
 i = 5;
 j = 3;
 a[i-2] is an alias of a[j], because i-2 and j both equal 3, so both name the location a[3].
Dangling Reference
Definition A pointer is a dangling reference if its value is the address
of a location that:
    • has been allocated                       and
    • has been deallocated
Dangling Reference: Example 1
pointer to deallocated memory
    int *x;
    int *y;
    x = (int *) malloc(sizeof(int));
    y = x;
    free(x); // frees location m
    // at this point both x and y are dangling references
    // because their values are the address of the de-allocated memory m
    // it is x and y that are the dangling references not *x and *y
    [Figure: x and y both hold the address of m, which has been deallocated; *x and *y
    name the deallocated location.]
    Note.    free(x) frees the location “pointed to” by x.
             free(x) does not change the value of x.
    Dangling Reference: Example 2
    pointer to deallocated memory
        int **x;
        int * y;
        x = (int **) malloc(sizeof(int *));   // mem1
        *x = (int *) malloc(sizeof(int));      // mem2
        y = *x;
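    [Figure: x (type pointer to int *) holds the address of mem1, which is named *x and
    holds a pointer to int; *x and y both hold the address of mem2, an int location
    named **x and *y.]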
    For the code above, remember that if T * x; is a declaration, then
    we allocate memory for x to point to as follows
        x = (T *) malloc(sizeof(T));
    In the example above, T is int *, which explains the code
        x = (int * *) malloc(sizeof(int *));
    Also, recall that if T * x; is a declaration, *x is of type T. In the code
    above, *x has type int *, so we allocate memory for *x to point to as follows
        *x = (int *) malloc(sizeof(int));
    Dangling Reference: Example 2 (continued)
    pointer to deallocated memory
        int **x;
        int * y;
        x = (int **) malloc(sizeof(int *));
        *x = (int *) malloc(sizeof(int));
        y = *x;
        free(*x);
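    [Figure: x holds the address of mem1 (named *x); *x and y both hold the address of
    mem2, which has been deallocated; **x and *y name the deallocated location.]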
        Now *x and y are dangling references because their values are
        the address of the deallocated memory. It is *x and y that are
        the dangling references not **x and *y.
        Note     free(*x) frees the memory pointed to by *x
                 free(*x) does not modify the value of *x
        Dangling Reference: Example 3
        pointer to local variable of a function that exited
             int * f()
             { int x; // memory for x allocated on stack
                // point 1
                return &x;
             }
             int main(void)
             {
               int *y;
               y = f();            // memory for x is deallocated when f()
                                   // returns, but y still points to it
               // point 2
               return 0;
             }
        [Figure: at point 1, the stack holds y in the frame of main() and x in the frame
        of f(); at point 2, the memory for f() has been deallocated.]
        When f() exits, its memory on the stack is deallocated, and the value of y is the
        address of deallocated memory (of x), so y is a dangling reference.
Dangling Reference: Example 4
pointer to variable from outside its scope
    { int *x;
         {   int y;
             x = &y; // point 1
         }
         // point 2
     }
    At point 1, the value of x is the address of the local variable y
    At point 2, y is no longer accessible and its memory is reclaimed
    (de-allocated) but x still points to the memory previously
    associated with y
    Note. In practice, the memory allocated for y might not be
    reclaimed, but since y is out-of-scope, its memory can be de-
    allocated by the compiler and it should be treated as de-allocated
    memory
Dangling References
• Possible in C
• Possible in Ada if Ada.Unchecked_Deallocation is used
• Not possible in Java
Garbage (aka memory leak)
        A location is garbage if
            • The location has been allocated                       and
            • The location has not been deallocated                 and
            • The location is no longer accessible by the program
            What does no longer accessible mean?
                  It means that you cannot refer to it, that is, the memory
                  does not have a name.
        Example
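        [Figure: x holds the address of mem1, which is named *x; mem1 holds the address
        of mem2, which is named **x; y also holds the address of mem2, so mem2 is also
        named *y.]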
        If we execute x = &y
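        [Figure: x now holds the address of y, so *x names y and **x names mem2 (also
        named *y); nothing points to mem1 any more, and it is marked garbage.]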
        mem1 has no name. It is garbage. The name of mem2 is *y or **x. It
        is not garbage.
        A location is garbage if it has not been deallocated and we cannot follow
        a sequence of arrows from a program variable to the location.
Why is garbage a problem?
 Consider a long-running server that continuously processes incoming requests.
     repeat
               receive input
               call f(input) to process input
               produce output
     forever
 If the function f() allocates some memory that is not needed after the call and is
 not deallocated, then as time goes by more and more memory gets allocated
 without being deallocated, which creates a big leak.
 At some point, there will be no more heap memory, malloc() will fail, and the
 program will fail.
Garbage: Example 1
    { int * x;
      int * y;
      int * z;
        x   =   (int *) malloc(sizeof(int)); // mem 1 allocated
        y   =   (int *) malloc(sizeof(int)); // mem 2 allocated
        z   =   (int *) malloc(sizeof(int)); // mem 3 allocated
        z   =   y;
        y   =   x;
        // point 1
    }
At point 1, the box-circle diagram looks as follows
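    [Figure: x and y both hold the address of mem1, named *x and *y; z holds the address
    of mem2, named *z; nothing points to mem3, which is marked garbage.]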
mem3 is garbage because there is no way to refer to it in the program. mem1
and mem2 are not garbage because they can be referred to in the program:
mem1 is associated with *x and *y and mem2 is associated with *z
Garbage: Example 2
exiting a function before free
    void f(void)
    { int * x;
      x = (int *) malloc(sizeof(int)); // mem1 is allocated
    }
    int main(void)
    {
        f(); // mem1 is garbage when f() exits if free() is not called
        return 0;
    }
On the other hand, the following call does not produce a garbage location:
    int * f(void)
    {
      return (int *) malloc(sizeof(int)); // mem2 is allocated
    }
    int main(void)
    { int * y;
      y = f(); // here mem2 is not garbage because its name is *y
      return 0;
    }
    In C, heap memory is not deallocated unless it is deallocated explicitly using
    free()
    Stack memory is automatically deallocated when the function returns or
    when the local scope is exited
Reference Semantics for Assignment: Java
In Java, object assignment does not have copy semantics.
If O1 and O2 are two objects of class C, then initially there is no location
associated with O1 and O2.
  [Figure: O1 and O2 are not bound to any location.]
If we execute
   O1 = new C();
we get
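  [Figure: O1 is now associated with the location of a newly created object of class C;
  O2 still has no location.]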
If we execute
  O2 = O1;
we get
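  [Figure: O1 and O2 are both associated with the same object location.]
At this point, O2 is an alias of O1: the assignment copied the reference to the object,
not the object itself.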