CSCI 3210: Theory of
Programming Language
                                            Data Types
                                  Based on Textbook Slides
         Dr. Zhijiang Dong @ Middle Tennessee State U        1
         niversity
Outline
 DataTypes
 Elementary Data Types
    Scalar Data Types
    Composite Data Types
 Structured   Data Types
    Array
    Record
    List
    Set
                 Dr. Zhijiang Dong @ Middle Tennessee State U   2
                 niversity
Data Types
   Each data object has a type:
       Values: for objects of that type
       Operations: for objects of that type
       Implementation: (Storage representation) for objects of that type
       Attributes: (e.g., name) for objects of that type
       Signature: (of operation f): f: type x type  type
                              Dr. Zhijiang Dong @ Middle Tennessee State U   3
                              niversity
Data Type
A  data type is a class of data objects together
  with a set of operations for creating and
  manipulating them.
 Every language has a set of primitive data
  types that are built into the language.
 In addition, a language may provide facilities
  to allow the programmer to define new data
  types.
                Dr. Zhijiang Dong @ Middle Tennessee State U   4
                niversity
Data Type Specification
 Thebasic elements of a specification of a
 data type are as follows:
    The attributes that distinguish data objects of that
     type,
    The values that data objects of that type may
     have, and
    The operations that define the possible
     manipulations of data objects of that type.
                   Dr. Zhijiang Dong @ Middle Tennessee State U   5
                   niversity
Data Type Specification
   Specification of an array data type
       Attributes
         the number of dimensions
         the subscript range for each dimension
         the data type of the components
       Values
         The set of numbers that form valid values for array
          components
       Operations
         Subscripting to select individual array components
         Create arrays
         Access attributes such as upper and lower bounds of
          subscripts
                        Dr. Zhijiang Dong @ Middle Tennessee State U   6
                        niversity
Data Type Implementation
 Thebasic elements of the implementation of a
 data type are as follows:
    The storage representation that is used to
     represent the data objects of the data type in the
     storage of the computer during program execution,
     and
    The manner in which the operations defined for the
     data type are represented in terms of particular
     algorithms or procedures that manipulate the
     chosen storage representation of the data objects.
                  Dr. Zhijiang Dong @ Middle Tennessee State U   7
                  niversity
Declaration
   Both specification and implementation are largely independent of
    the particular syntactic forms used in the language.
   Attributes of data objects are often represented syntactically by
    declarations or type definitions.
   A declaration is a program statement that serves to communicate
    to the language translator information about the name and type
    of data objects needed during program execution.
   Explicit declaration
     Like variable declaration in C
   Implicit/default declaration
     Declaration that hold when no explicit declaration is given.
     In FORTRAN, a simple variable INDEX may be used without
        explicit declaration. By default, it is assumed as an integer
        variable.
                       Dr. Zhijiang Dong @ Middle Tennessee State U   8
                       niversity
Declaration
 Declarations   serve several important
 purposes
    Choice of storage representations
    Storage management
    Polymorphic operations
    Type checking
                  Dr. Zhijiang Dong @ Middle Tennessee State U   9
                  niversity
Outline
 DataTypes
 Elementary Data Types
    Scalar Data Types
    Composite Data Types
 Structured   Data Types
    Array
    Record
    List
    Set
                 Dr. Zhijiang Dong @ Middle Tennessee State U   10
                 niversity
Elementary Data Types
   An elementary data object contains a single data
    value.
   A class of such data objects over which various
    operations are defined is termed an elementary data
    type.
       Integer
       Real
       Character
       Boolean
       Enumeration
       pointer
                      Dr. Zhijiang Dong @ Middle Tennessee State U   11
                      niversity
Elementary Data Types:
Specification
   Attributes
       Basic attributes of any data object, such as data type and
        name, are usually invariant during its lifetime.
       Some of the attributes may be stored in a descriptor as part
        of the data object during program execution.
   Values
       The type of a data object determines the set of possible
        values that it may contain.
       The set of values defined by an elementary data type is
        usually an ordered set with a least value and a greatest
        value; for any pair of distinct values, one is greater than the
        other.
                         Dr. Zhijiang Dong @ Middle Tennessee State U   12
                         niversity
Elementary Data Types:
Specification
 Operations
    The operations may be primitive operations,
     which means they are specified as part of the
     language definition, or
    They may be programmer-defined operations, in
     the form of subprograms or method declarations
     as part of class definitions.
                 Dr. Zhijiang Dong @ Middle Tennessee State U   13
                 niversity
Elementary Data Types:
Implementation
   Storage representation
     Strongly influenced by the underlying computer that will execute
       the program
     The attributes of elementary data objects
          For efficiency, data attributes can be determined by the compiler. (like
           C)
          For flexibility, data attributes can be stored in a descriptor as part of
           the data object at run time. (like LISP, Prolog)
   Operations
     Each operation may be implemented in one of three main ways:
          Directly as a hardware operation
          As a procedure or function subprogram
          As an inline code sequence
                            Dr. Zhijiang Dong @ Middle Tennessee State U          14
                            niversity
Outline
 DataTypes
 Elementary Data Types
    Scalar Data Types
    Composite Data Types
 Structured   Data Types
    Array
    Record
    List
    Set
                 Dr. Zhijiang Dong @ Middle Tennessee State U   15
                 niversity
Scalar Data Types
 Scalardata objects: objects that have a
 single attribute for its data object
    Numeric Data Types
        Integers
        Subranges
        Floating-point real numbers
    Enumerations
    Boolean
    Character
                     Dr. Zhijiang Dong @ Middle Tennessee State U   16
                     niversity
Numeric Data Types: Integers
   Specification
       The set of integer values defined for the type forms an
        ordered subset, within some finite bounds, of the infinite set
        of integers studied in mathematics.
       The range is typically defined as constants
       May have different specification: short, int, long, and char
       Operations
         Arithmetic operations: +, -, *, /, %, negation (-), identify (+)
         Relational operations: >, >=, <, <=, ==, !=
         Assignment operations: =
         Bit operations: &, |, ^, <<, >>
                         Dr. Zhijiang Dong @ Middle Tennessee State U    17
                         niversity
Numeric Data Types: Integers
   Implementation
       Using hardware-defined integer storage representation and
        set of hardware arithmetic and relational primitive
        operations on integers
   Three possible approaches as indicated in the next
    slide
       First one requires declarations and static type checking for
        integer data objects
       Second one requires more space (LISP use this approach)
       Third one cannot use arithmetic operations cannot be used
        directly
                        Dr. Zhijiang Dong @ Middle Tennessee State U   18
                        niversity
Numeric Data Types: Integers
          Dr. Zhijiang Dong @ Middle Tennessee State U   19
          niversity
Numeric Data Types:
Subranges
A  subrange of an integer data type is a
  subtype of the integer data type and consists
  of a sequence of integer values within some
  restricted range.
 Specification
     Declaration:
      A : 1 .. 10 (Pascal)
      A : integer range 1..10 (Ada)
     Operations: the same set of operations for
      ordinary integer types can be applied
                   Dr. Zhijiang Dong @ Middle Tennessee State U   20
                   niversity
Numeric Data Types:
Subranges
   Implementation
       Usually stored in fewer bits than a general integer value
       However, subrange values are often represented as the
        smallest number of bits for which the hardware implements
        arithmetic operations. This can generally be 8 or 16 bits.
       Better type checking
         MonthVar := 0 is illegal if Month: 1 .. 12. This can
           be performed at compile time
         If check involves computed values, run-time checking is
           needed to determine whether the new value is still iwthin
           the bounds declared. For example:
           MonthVar := MonthVar + 1
                        Dr. Zhijiang Dong @ Middle Tennessee State U   21
                        niversity
Numeric Data Types: Floating-
Point Real Numbers
   Specification
     The values form an ordered sequence from some hardware-
      determined minimum negative value to a maximum value, but the
      values are not distributed evenly across this range
     Often specified with only the single data type attribute real
      (FORTRAN), or float (C)
     Alternatively, the precision can be specified in terms of the
      number of digits used in the decimal representation. (Ada)
     Operations
         The same arithmetic, relational and assignment operations described
          for integers are also provided for reals.
         Due to roundoff issues, boolean operations , especially equality are
          sometimes restricted.
         Language designers may probibit equality between two real numbers
                          Dr. Zhijiang Dong @ Middle Tennessee State U       22
                          niversity
Numeric Data Types: Floating-
Point Real Numbers
   Implementations
       Using scientific notation: any number N can be expressed
        as: N = m x 2k for m between 0 and 1 & for some integer k
       Based on an underlying hardware representation in which
        a storage location is divided into a mantissa (i.e. the
        significant digits of the number) and an exponent.
                       Dr. Zhijiang Dong @ Middle Tennessee State U   23
                       niversity
Numeric Data Types: Floating-
Point Real Numbers
   IEEE standard 754 specifies both a 32- and 64-bit
    standard.
   Numbers consist of three fields:
       S: a one-bit sign field. 0 is positive.
       E: an exponent in excess-127 notation. Values (8 bits)
        range from 0 to 255, corresponding to exponents of 2 that
        range from -127 to 128.
       M: a mantissa of 23 bits. Since the first bit of the mantissa
        in a normalized number is always 1, it can be omitted and
        inserted automatically by the hardware, yielding an extra
        24th bit of precision.
                         Dr. Zhijiang Dong @ Middle Tennessee State U   24
                         niversity
Numeric Data Types: Floating-
Point Real Numbers
   Decoding IEEE format
   Given E, and M, the value of the representation is:
       Parameters                                            Value
       E=255 and M  0                                       An invalid number
    
        E=255 and M = 0                                   
       0<E<255                                               2{E-127}(1.M)
       E=0 and M  0                                         2 {-126}.M
       E=0 and M=0                                           0
                    Dr. Zhijiang Dong @ Middle Tennessee State U              25
                    niversity
Numeric Data Types: Floating-
Point Real Numbers
+1= 20*1= 2{127-127}*(1).0 (binary)                 0 01111111 000000...
+1.5= 20*1.5= 2{127-127}*(1).1 (binary) 0 01111111 100000...
-5= -22*1.25= 2{129-127}*(1).01 (binary)1 10000001 010000...
   This gives a range from 10-38 to 1038.
   In 64-bit format,the exponent is extended to 11 bits giving a
    range from -1022 to +1023, yielding numbers in the range 10 -
    308
        to 10308.
                        Dr. Zhijiang Dong @ Middle Tennessee State U       26
                        niversity
Numeric Data Types: Floating-
Point Real Numbers
   Converting Decimal Fractions to Binary (by converting the decimal value .625 to a
    binary representation)
   Step 1: Begin with the decimal fraction and multiply by 2. The whole number part of the
    result is the first binary digit to the right of the point.
    Because .625 x 2 = 1.25, the first binary digit to the right of the point is a 1.
    So far, we have .625 = .1??? . . . (base 2) .
   Step 2: Disregard the whole number part of the previous result (the 1 in this case) and
    multiply by 2 once again. The whole number part of this new result is the second binary
    digit to the right of the point.
    Because .25 x 2 = 0.50, the second binary digit to the right of the point is a 0.
    So far, we have .625 = .10?? . . . (base 2) .
   Step 3: Disregarding the whole number part of the previous result (this result was .50 so
    there actually is no whole number part to disregard in this case), we multiply by 2 once
    again. The whole number part of the result is now the next binary digit to the right of the
    point.
    Because .50 x 2 = 1.00, the third binary digit to the right of the point is a 1.
    So now we have .625 = .101?? . . . (base 2) .
   Step 4: In fact, we do not need a Step 4. We are finished in Step 3, because we had 0 as
    the fractional part of our result there.
    Hence the representation of .625 = .101 (base 2) .
                                Dr. Zhijiang Dong @ Middle Tennessee State U                      27
                                niversity
Numeric Data Types:
Enumerations
   Specification
     An enumeration is an ordered set of distinct symbolic values
     Need to defines both the literal names to be used for the values
      and their ordering
           C:
            Enum StudentClass {Fresh, Soph, Junior, Senior};
           Pascal
            type StudentClass = (Fresh, Soph, Junior, Senior);
       The type definition introduces the type name StudentClass,
        which may be used wherever a primitive type name might be
        used.
       It also introduces the literals of Fresh, Soph, Junior, and Senior.
       Operations
           Full set of relational operations
           Assignment, and operations like successor and predecessor
                          Dr. Zhijiang Dong @ Middle Tennessee State U        28
                          niversity
Numeric Data Types:
Enumerations
   Implementation
       Each value in the enumeration sequence is represented at
        run time by one of the integers 0, 1, 2….
       The successor and predecessor operations involve simply
        adding or subtracting one from the integer representing the
        value and checking to see that the result is within the
        proper range.
       Hardware-provided operations on integers can be used to
        implement the basic operations on enumerations.
                        Dr. Zhijiang Dong @ Middle Tennessee State U   29
                        niversity
Numeric Data Types: Booleans
   Specification
       The boolean data type consists of data objects having one
        of two values – true or false.
       In Pascal and Ada, the boolean data type is considered
        simply a language-defined enumeration
       Operations
         AND, OR, NOT,
                       Dr. Zhijiang Dong @ Middle Tennessee State U   30
                       niversity
Numeric Data Types: Booleans
   Implementation
       Theoretically, a single bit of storage is enough
       However, the storage representation is often extended
        to be a single addressable unit such as a byte,
        because single bits may not be separately
        addressable in memory
       true and false might be represented in two ways within
        this storage unit:
           A particular bit is used for the value, and the rest is ignored;
            or
           A zero value in the entire storage unit represents false, and
            any other nonzero value represents true.
                           Dr. Zhijiang Dong @ Middle Tennessee State U    31
                           niversity
Numeric Data Types:
Characters
   Specification
       A character data type provides data objects that have
        a single character as their value.
       The set of possible character values is usually taken
        to be a language-defined enumeration corresponding
        to the standard character set supported by the
        underlying hardware and operating systems.
       The ordering of the characters in this character set is
        called collating sequence for the character set.
       Operations
           Relational operations, assignment
                          Dr. Zhijiang Dong @ Middle Tennessee State U   32
                          niversity
Numeric Data Types:
Characters
   Implementation
       Character data values are almost always directly
        supported by the underlying hardware and operating
        system because of their use in input-output.
                      Dr. Zhijiang Dong @ Middle Tennessee State U   33
                      niversity
Outline
 DataTypes
 Elementary Data Types
    Scalar Data Types
    Composite Data Types
 Structured   Data Types
    Array
    Record
    List
    Set
                 Dr. Zhijiang Dong @ Middle Tennessee State U   34
                 niversity
Composite Data Types
   Composite Data Types
       Multiple attributes are often given for each such data type
       Their implementation usually involves a complex data
        structure organization by the compiler
   The following composite data types are usually
    considered elementary data objects
       Character strings
       Pointers
       User-defined Data types
       Files
                        Dr. Zhijiang Dong @ Middle Tennessee State U   35
                        niversity
Composite Data Types:
Character String
   Character String
       A data object composed of a sequence of characters
   Different types of character string
       Fixed declared length
       Variable length to a declared bound
       Unbounded length
   Examples
       Fixed length:
        char A(10) - C
        DCL B CHAR(10) - PL/I
        var C packed array [1..10] of char - Pascal
       Variable length:
        DCL D CHAR(20) VARYING - PL/I - 0 to 20 characters
        E = “ABC” - SNOBOL4 - any size, dynamic
        F = `ABCDEFG\0' - C - any size, programmer defined
                            Dr. Zhijiang Dong @ Middle Tennessee State U   36
                            niversity
Composite Data Types:
Character String
   The string in C is actually a bit more complicated.
       Strings are arrays of characters, but C has no string
        declaration.
       If an array of characters need to be treated as a C string, it
        is programmer’s responsibility to make sure the array
        include the null character
       A string constant written as “string liter” will have the null
        character appended to the string constant by the C
        translator.
                         Dr. Zhijiang Dong @ Middle Tennessee State U    37
                         niversity
Composite Data Types:
Character String
   Operations
       Concatenation
       Relational operations on strings based on lexicographic ordering
       Substring selection using positioning subscripts
           Providing the positions of its first and last characters; or
           Providing the first character position and length of the substring
       Substring selection using pattern matching
           The pattern specifies the form of the substring desired
                            Dr. Zhijiang Dong @ Middle Tennessee State U         38
                            niversity
Composite Data Types:
Character String
   Operations
       Input and output formatting
       Dynamic strings
           In Perl, string can be static and dynamic
            print ‘$ABC’ will output $ABC
            print “$ABC” will cause the string to be evaluated, and the
              value of the Perl variable $ABC is printed instead
                          Dr. Zhijiang Dong @ Middle Tennessee State U    39
                          niversity
Composite Data Types:
Character String
Storage representation for strings
                                     Dr. Zhijiang Dong @ Middle Tennessee State U   40
                                     niversity
Composite Data Types:
Pointers
 Pointer
     A data object whose value is a reference to some
      object
     First introduced in PL/I
 Pointers    serve two purposes:
     efficient (and sometimes intuitive) access to
      elaborated objects (as in C)
     dynamic creation of linked data structures, in
      conjunction with a heap storage manager
                    Dr. Zhijiang Dong @ Middle Tennessee State U   41
                    niversity
Composite Data Types:
Pointers
   In some languages (Pascal, Ada 83, and Modula-3), pointers are
    restricted to point only to objects in the heap.
   In other languages (PL/I, Algol 68, C, C++, and Ada 95), one can
    create a pointer to a nonheap object by using an “address of”
    operator.
   How and when is storage reclaimed for objects that are no longer
    needed?
       Programmer needs to reclaim space explicitly. (C, C++, Pascal,
        Modula-2)
           May cause memory leak and/or dangling reference problem
       Language implementation to reclaim unused objects automatically,
        as known garbage collection. (Modula-3, Java, C#, all
        functional and scripting languages)
           Impose certain run-time costs, and
           Raises the question of how the language implementation is to
            distinguish garbage from active objects.
                            Dr. Zhijiang Dong @ Middle Tennessee State U   42
                            niversity
Composite Data Types:
Pointers
   Operations:
       Allocation
       Deallocation
       Dereferencing of pointers to access the objects to which
        they point
       Assignment of one pointer to another
   The behavior of these operations depends heavily
    on
       Whether the language is functional or imperative, and
       Whether it employs a reference or value model for
        variables/names
                        Dr. Zhijiang Dong @ Middle Tennessee State U   43
                        niversity
Composite Data Types:
Pointers
   Value model for variables/names
       A variable contains a value.
   Reference model for variables/names
       A variable refers to an object with a value.
   Variables have to be dereferenced to get their r-value; however, the
    dereferencing is implicit in most languages, so the distinction is subtle.
   Example:
            b := 2;
            c := b;
            a := b + c;
       Value model: put the value 2 in b and then copy it into c. We then read these
        values, add them together, and place the resulting 4 in a.
       Reference Model: let b refer to 2 and then let c refer to it also. We then pass
        these reference to the + operator, and let a refer to the result, namely 4.
                             Dr. Zhijiang Dong @ Middle Tennessee State U             44
                             niversity
Composite Data Types:
Pointers
   Functional languages generally employ a reference model for
    names
     A pure functional language has no variables or assignments
   Variables in an imperative language may use either a value or a
    reference mode, or some combination of the two.
     C, Pascal, and Ada employ a value model
     Clu and SmallTalk employe a reference model
     Java charts an intermediate, in which the usual implementation of
       the reference model is made explicit in the language semantics
          Variables of build-in Java types (integers, floating-point numbers,
           characters, and Booleans) employ a value model
          Variable of user-defined types (strings, arrays, and other objects in
           the object-oriented sense of the word) employ a reference model
                            Dr. Zhijiang Dong @ Middle Tennessee State U           45
                            niversity
Composite Data Types:
Pointers
 Implementation
    Two major storage representations are used for
     pointer values
        Absolute addresses: a pointer value may be
         represented as the actual memory address of the
         storage block the foe data object
        Relative addresses: a pointer value may be
         represented as an offset from the base address of
         some larger heap storage block within which the data
         object is allocated.
                     Dr. Zhijiang Dong @ Middle Tennessee State U   46
                     niversity
Composite Data Types:
Pointers
   Pointers and arrays are closely linked in C.
       In most contexts, an unsubscripted array name in C is
        automatically converted to a pointer to the array’ first
        element (the one with index 0)
       Pointer arithmetic: Given a pointer to an element of
        an array, the addition of an integer k produces a pointer to
        the element k positions later in the array (earlier if k is
        negative)
       Remarkably, the subscript operator [ ] in C is actually
        defined in terms of pointer arithmetic.
         Correctness requires only that one operand of [ ] have an
           array or pointer type and the other have an integral type.
         Thus, A[3] is equivalent to 3[A].
                        Dr. Zhijiang Dong @ Middle Tennessee State U    47
                        niversity
Composite Data Types:
Pointers
   C pointers and arrays
           int *a == int a[]
           int **a == int *a[]
   BUT equivalences don't always hold
     Specifically, a declaration allocates an array if it
      specifies a size for the first dimension
     otherwise it allocates a pointer
    int **a, int *a[]   pointer to pointer to int
    int *a[n], n-element array of row pointers
    int a[n][m], 2-d array
                      Dr. Zhijiang Dong @ Middle Tennessee State U   48
                      niversity
Composite Data Types:
Pointers
   Compiler has to be able to tell the size of the things to
    which you point
     So the following aren't valid:
        int a[][];                  bad
        int (*a)[];                 bad
       C declaration Right Left rule: read right as far as you
        can (subject to parentheses), then left, then out a level
        and repeat
        int *a[n];       n-element array of pointers to
                               integer
        int (*a)[n];     pointer to n-element array of
                               integers
                       Dr. Zhijiang Dong @ Middle Tennessee State U   49
                       niversity
Composite Data Types: Files
   A file is a data structure with two special properties:
     It ordinarily is represented on a secondary storage
       device such a a disk or tape and thus may be much
       larger than most data structures of other types
     Its lifetime may encompass a greater span of time
       than that of the program creating it.
   Two general use for files
     For input and output of data to an external operating
       environment
     As temporary scratch storage for data when not
       enough high-speed memory is available
                     Dr. Zhijiang Dong @ Middle Tennessee State U   50
                     niversity
Composite Data Types: Files
 Different   types of files
     Sequential files: a data structure composed of a
      linear sequence of components of the same type
     Direct-Access files: any single component may be
      accessed at random, just as in an array or a
      record
     Indexed Sequential files: Similar to a direct-
      access file, with the additional facility to access
      components in sequence, beginning from a
      component selected at random.
                    Dr. Zhijiang Dong @ Middle Tennessee State U   51
                    niversity
Composite Data Types: Files
 Declaration
  Master: file of EmployeeRec; (Pascal)
     Defines a file variable named Master, whose
      components are of type EmployeeRec.
 Tyically,
         the file may be accessed in either
  read mode or write mode.
     In either mode, there is a file-position
      pointer that designates a position before the
      first file component, between two components, or
      after the last component.
                   Dr. Zhijiang Dong @ Middle Tennessee State U   52
                   niversity
Composite Data Types: Files
 Operations
  Open
  Read
  Write
  End-of-file test
  Close
                Dr. Zhijiang Dong @ Middle Tennessee State U   53
                niversity
Composite Data Types: Files
   Implementation
     The underlying operating system has the primary
      responsibility for the implementation of files because
      files are created and manipulated by various
      programming language processors and utilities.
       File operations are primarily implemented by calls on
        primitives provided by the operating system.
           From the language viewpoint, the primary implementation
            problem comes from the need to provide storage for system
            data and buffers required by the operating system primitives.
                           Dr. Zhijiang Dong @ Middle Tennessee State U     54
                           niversity
Outline
 DataTypes
 Elementary Data Types
    Scalar Data Types
    Composite Data Types
 Structured   Data Types
    Array
    Record
    List
    Set
                 Dr. Zhijiang Dong @ Middle Tennessee State U   55
                 niversity
Structured Data Types
   The current goal in programming language design is
    to make the distinctions in the various forms of data
    transparent to the programmer who uses the data
    types.
   Four basic mechanisms exist to provide the
    programmer with the ability to create new data types
    and operations on that type.
       Structured data
       Subprograms
       Type declarations
       Inheritance
                       Dr. Zhijiang Dong @ Middle Tennessee State U   56
                       niversity
Structured Data Types
   A data object that is constructed as an aggregate of other data
    objects, called component, is termed a structured data object or
    data structure.
     A component may be elementary or another data structure
   The bindings of data structures to values, names, and locations
    are important and somewhat more complex comparing to
    elementary data types
     How to indicate the component data objects of a data structure
       and their relationships in such a way that selection of a
       component is straightforward?
       Many operations on data structures bring up storage
        management issues that are not present for elementary data
        objects.
                         Dr. Zhijiang Dong @ Middle Tennessee State U   57
                         niversity
Structured Data Types
 Structured   Data Types
    Vectors
    Arrays
    Records
    Lists
    Sets
                 Dr. Zhijiang Dong @ Middle Tennessee State U   58
                 niversity
Structured Data Types:
Specification
   The major attributes for specifying data structures
    include the following
       Number of components
         A data structure maybe fixed size such as array, record, or
          variable size such as lists, stacks, sets
         Variable-size data structure types usually provide insert and
          delete operations
       Type of each component
         Homogeneous: all components of the same type
         Heterogeneous: components of different types
       Mechanism to be used for selecting components
       Maximum number of components
       Organization of the components
                         Dr. Zhijiang Dong @ Middle Tennessee State U   59
                         niversity
Structured Data Types:
Specification
   Specification of the domain and range of operations on data
    structure types may be given in much the same manner as for
    elementary types.
   Some new classes of operations are of particular importance:
     Component selection operations
           Random selection: an arbitrary component can be accessed
           Sequential selection: components are selected in a predetermined
            order
       Whole data structure operations
           Operations take entire data structures as arguments and produce
            new data structures as results
           Examples: addition of two arrays, copy record, union of two sets
       Insertion/deletion of components
       Creation/destruction of data structures
                            Dr. Zhijiang Dong @ Middle Tennessee State U       60
                            niversity
Structured Data Types:
Implementation
   Implementation consideration for data structure
    types include the same issues as for elementary
    types.
   In addition, two new issues develop that strongly
    affect the choice of storage representation
       Efficient selection of components from a data structure,
        and
       Efficient overall storage management for the language
        implementation.
                        Dr. Zhijiang Dong @ Middle Tennessee State U   61
                        niversity
Implementation:
Storage Representation
   Storage representations for a data structure include
     Storage for the components of the structure, and
     An optional descriptor that stores some or all of the attributes of
       the structure
   Two basic storage representations
     Sequential representation
           Stored in a single contiguous block of storage that includes both
            description and components
           Typically used for fixed-size structures and sometimes for
            homogeneous variable-size structures such as strings or stacks
       Linked representation
           Stored in several noncontiguous blocks of storage, with the blocks
            linked together through pointers.
           Commonly used for variable-size structures such as lists
                             Dr. Zhijiang Dong @ Middle Tennessee State U        62
                             niversity
Implementation:
Storage Representation
          Dr. Zhijiang Dong @ Middle Tennessee State U   63
          niversity
Implementation: Operations
   Implementation of operations on data structures
       Component selection is of primary importance
       Efficiency of both random-selection and sequential-
        selection operations is needed
       Sequential-selection should more efficiently than simply as
        a sequence of random selections
       These two basic types of component selection are
        implemented differently for sequential and linked storage
        representations
                        Dr. Zhijiang Dong @ Middle Tennessee State U   64
                        niversity
Operation Implementation:
Sequential Representation
   Random selection involves
    a base-address-plus-
    offset calculation using
    an accessing formula.
   The relative location of the
    selected component within
    the sequential block is
    called its offset.
                    Dr. Zhijiang Dong @ Middle Tennessee State U   65
                    niversity
Operation Implementation:
Sequential Representation
•   Selection of a sequence of components from the
    structure is possible by the following steps, for a
    homogeneous structure such as an array
    •   To select the first component of the sequence, use the
        base-address-plus-offset calculation
    •   To advance to the next component in the sequence,
        add the size of the current component to the location
        of the current component
                      Dr. Zhijiang Dong @ Middle Tennessee State U   66
                      niversity
Operation Implementation:
Linked Representation
•   Random selection of a component from a linked
    structure involves following a chain of pointer from
    the first block of storage in the structure to the
    desired component.
•   Selection of a sequence of components proceeds by
    selecting the first component and then following the
    link pointer from the current component to the next
    component for each subsequence selection
                    Dr. Zhijiang Dong @ Middle Tennessee State U   67
                    niversity
Type Checking on Structured
Data Types
•   Type checking on structured data types is more complex
    because component selection operations must be taken into
    account
    •   Existence of a selected component
        •   The arguments to a selection operation may be of the right types,
            but the component designated may not exist in the data structure
    •   Type of a selected component
        •   A selection sequence may define a complex path through a data
            structure to the desired component. For example:
                              A[2][3].link->item
        •   To perform static type checking, the translator should be able to
            determine the type of component selected by any valid
            composite selector
                            Dr. Zhijiang Dong @ Middle Tennessee State U        68
                            niversity
Outline
 DataTypes
 Elementary Data Types
    Scalar Data Types
    Composite Data Types
 Structured   Data Types
    Array
    Record
    List
    Set
                 Dr. Zhijiang Dong @ Middle Tennessee State U   69
                 niversity
Array
   Arrays are the most common and important composite
    data types
   arrays are usually homogeneous
   Semantically, they can be thought of as a mapping
    from an index type to a component or element type
   Array declaration typically specify the following
    attributes
     Number of components
     Data type of each component
     Subscript to be used to select each component
         Maybe a range of values as -5 .. 5, or
         An upper bound with an implied lower bound
                        Dr. Zhijiang Dong @ Middle Tennessee State U   70
                        niversity
Array Declaration
•   Pascal
        V: array [-5 .. 5] of real;
•   C
        Float a[10];
•   FORTRAN
        character, dimension (1:26) :: upper
        character (26) upper     !shorthand notation
•   Ada
        upper: array (character range ‘a’ .. ‘z’) of
          character;
                       Dr. Zhijiang Dong @ Middle Tennessee State U   71
                       niversity
Array Storage
   Array in most language implementations are stored in contiguous
    locations in memory
   One dimensional array
     ith element is stored immediately after the (i-1) th element. (subject
       to alignment constraints)
   Two dimensional array
     Row major order (Most programming languages)
           A[i, j] is followed by A[I, j+1] immediately
       Column major order (FORTHAN)
           A[I, j] is followed by A[i+1, j] immediately
       Row major order is more popular
       The difference between row- and column-major layout can be
        important for programs that use nested loops to access all the
        elements of a large, multi-dimensional array. (efficiency because
        of cache)
                              Dr. Zhijiang Dong @ Middle Tennessee State U   72
                              niversity
Array Storage
Row-major order vs. Column-major order
              Dr. Zhijiang Dong @ Middle Tennessee State U   73
              niversity
Array Storage: Alternatives
   Row-pointer layout is an alternative to contiguous allocation.
     Rather than require the row of an array to be adjacent, they allow
       them to lie anywhere in memory, and create an auxiliary array of
       pointers to the rows.
   Advantages of row-pointer layout
     Individual elements can be access more quickly, especially on
       CISC machines with slow multiplication instruction
     Rows may have different lengths, without devoting space to
       holes at the ends of the rows. (called ragged array)
     An array can be constructed from preexisting rows without
       copying.
   Disadvantages: Requires more space
   C, C++, and C# provide both contiguous and row-pointer
    organizations for multidimensional arrays.
   Java uses the row-pointer layout for all arrays
                        Dr. Zhijiang Dong @ Middle Tennessee State U   74
                        niversity
Array Storage Comparison
Contiguous array allocation v. row pointers in C. The declaration on the left is a tr ue two-dimensional array. The slashed
boxes are NUL bytes; the shaded areas are holes. The declaration on the right is a ragged array of pointers to arrays of
character s. In both cases, we have omitted bounds in the declaration that can be deduced from the size of the
initializer (aggregate). Both data structures permit individual characters to be accessed using double subscripts, but the
memory layout (and corresponding address arithmetic) is quite different.
                                        Dr. Zhijiang Dong @ Middle Tennessee State U                                          75
                                        niversity
Accessing Array Elements
          Dr. Zhijiang Dong @ Middle Tennessee State U   76
          niversity
Accessing Array Elements:
Virtual Origin
   Rewriting access equation:
      L-value(A[I,J]) =  - d1*L1 - d2*L2 +I*d1 + J*d2
   Set I = 0; J= 0;
      L-value(A[0,0])     =  - d1*L1 - d2*L2 +0*d1 + 0*d2
                          =  - d1*L1 - d2*L2, which is a constant.
   Call this constant the virtual origin (VO); It
    represents the address of the 0th element of the
    array.
           L-value(A[I,J]) = VO +I*d1 + J*d2
                        Dr. Zhijiang Dong @ Middle Tennessee State U   77
                        niversity
Dope Vectors
   Dope vector is used to store array shape information
    such as
       Lower bound of each dimension
       The size of each dimension
       Upper bound of each dimension (optional, can be included
        to avoid computing repeatedly at runtime)
   To perform semantic checking as well as address
    calculation.
           If these information are statically known, the translator can
            look them up in the dope vector
           If these information are not statically known, the translator
            must generate code to look them up in a dope vector at run
            time.
                          Dr. Zhijiang Dong @ Middle Tennessee State U   78
                          niversity
Array Storage Representation
  One dimensional array
         Right figure: Two dimensional array
                     Dr. Zhijiang Dong @ Middle Tennessee State U   79
                     niversity
Slices
A slice or section is a rectangular portion of
 an array.
 Fortran  90 and provides extensive facilities
 for slicing, as do many scripting languages,
 including Perl, Python, Ruby, and R.
 Adaprovides more limited support: a lice is
 simply a contiguous range of elements in a
 one-dimensional array.
               Dr. Zhijiang Dong @ Middle Tennessee State U   80
               niversity
Slices
         Dr. Zhijiang Dong @ Middle Tennessee State U   81
         niversity
Array Slices in Fortran 90
                                                                                The slashes in the second
                                                                                subscript delimit an explicit
                                                                                list of positions.
   a: b: c in a subscript indicates positions a, a+c, a+2c, ...through b.
   If a or b is omitted, the corresponding bound of the array is assumed.
   If c is omitted, 1 is assumed.
   It is even possible to use negative values of c in order to select positions in reverse order.
                                 Dr. Zhijiang Dong @ Middle Tennessee State U                             82
                                 niversity
Array Operations
   Most programming languages permit selection of an
    element and assignment operations on arrays
   A few languages (Ada, Fortran 90) allow arrays to
    be compared for equality
       Ada even allows one-dimensional arrays whose elements
        are discrete to be compared for lexicographic ordering.
   Fortran 90 has a very rich set of array operations
    that take entire arrays as arguments such as A + B.
                       Dr. Zhijiang Dong @ Middle Tennessee State U   83
                       niversity
Associative Arrays
 In some applications, there is a desire to
  access information by name without having a
  predefined ordering or enumeration of the
  appropriate subscripts.
 An associative array (map, table, or
  dictionary) is an abstract data type composed
  of a collection of (key, value) pairs, such that
  each possible key appears at most once in
  the collection.
                Dr. Zhijiang Dong @ Middle Tennessee State U   84
                niversity
Associative Arrays
 In  Perl, an associative array is created with
  the % operators
  %ClassList = (
                    “Michelle”, ‘A’,
                    “Doris”,          ‘B’,
                    “Michael”, ‘D’ );
 It creates a three-element associative array:
  Student name as the key, and grade as the
  value.
                Dr. Zhijiang Dong @ Middle Tennessee State U   85
                niversity
Outline
 DataTypes
 Elementary Data Types
    Scalar Data Types
    Composite Data Types
 Structured   Data Types
    Array
    Record
    List
    Set
                 Dr. Zhijiang Dong @ Middle Tennessee State U   86
                 niversity
Record
   Record/Structure: A data structure composed of a
    fixed number of components of different types.
       usually laid out contiguously
       possible holes for alignment reasons
     smart compilers may re-arrange fields to minimize
      holes (C compilers promise not to)
   Record specification
       Type of each component (called field)
       Name of each component (called field name)
       The number of components (implied)
       Name of the structure
                      Dr. Zhijiang Dong @ Middle Tennessee State U   87
                      niversity
Record Declaration
struct element {                     type element = record
   char name[2];                        name : two_chars;
   int atomic_number;                   atomic_number : integer;
   double atomic_weight;                atomic_weight : double;
   _Bool metallic;                      metallic : Boolean;
}; /*C Version*/                     }; /*Pascal Version*/
                    Dr. Zhijiang Dong @ Middle Tennessee State U   88
                    niversity
Record:
Storage Representation
•   Likely layout in memory for objects of type element
    on a 32-bit machine. Alignment restrictions lead to
    the shaded “holes.”
                    Dr. Zhijiang Dong @ Middle Tennessee State U   89
                    niversity
Array of Record
•   Array element
    may be a
    record
                Dr. Zhijiang Dong @ Middle Tennessee State U   90
                niversity
Record
•   A record may also have
    components that are arrays or
    other records, leading to
    records that have a
    hierarchical structure
    consisting of a top level of
    components, some of which
    may be arrays and records
                        Dr. Zhijiang Dong @ Middle Tennessee State U   91
                        niversity
Nested Record
•   Pay attention to Nested record. They behave differently in
    value mode and reference model of variables
•   The program below prints 0 for value model of variables, 7 for
    reference model of variables.
    type
           T = record
                  j : integer;
           end;
           S = record
                  I : integer;
                  n : T;
           end;
    var s1, s2 : S;
    …
    s1.n.j := 0;
    s2 := s1;
    s2.n.j := 7;
    writeln(s1.n.j);
                        Dr. Zhijiang Dong @ Middle Tennessee State U   92
                        niversity
Record Operations
   Component selection is the basic operation on a
    record.
       Typically, using dot operator and the filed name
       Can be implemented easily
         Field name is known during translation
         The size of each component and its position within the
          storage block can be determined during translation
   Most commonly assignment of records of identical
    structure is provided
       Can be implemented as a simple copy of the contents of
        the storage block representing the first record into the
        storage block representing the second record
                        Dr. Zhijiang Dong @ Middle Tennessee State U   93
                        niversity
Variant Records
typedef union {
  int X;
  float Y;
  char Z[4];} B;
B P;
   Similar to records, except all have overlapping (same) L-value.
   But problems can occur. What happens below?
       P.X = 142;
       printf(“%O\n”, P.Z[3])
   All 3 data objects have same L-value and occupy same storage. No
    enforcement of type checking.
     Poor language design
                          Dr. Zhijiang Dong @ Middle Tennessee State U   94
                          niversity
Variant Records
type PayType=(Salaried, Hourly);
var Employee:record
   ID: integer;
   Dept: array[1..3] of char;
   Age: integer;
   case PayClass: PayType of
      Salaried:(MonthlyRate:real;
  StartDate:integer);
     Hourly:(HourRate:real;
                Reg:integer;
  Overtime:integer)
  end
                      Dr. Zhijiang Dong @ Middle Tennessee State U   95
                      niversity
Outline
 DataTypes
 Elementary Data Types
    Scalar Data Types
    Composite Data Types
 Structured   Data Types
    Array
    Record
    List
    Set
                 Dr. Zhijiang Dong @ Middle Tennessee State U   96
                 niversity
Lists
   List: A data structure composed of an ordered sequence of data
    structures
     Lists are rarely of fixed length
     Lists are rarely homogeneous
     Languages that use lists typically declare such data implicitly
       without explicit attributes for list members.
   LISP list are heterogeneous: any object may be placed in a list
    syntax: (a b c)
   MP list are homogeneous
    syntax: [a, b, c]
   Lists are primitive data objects in languages like ML, LISP,
    Prolog, Python, but do not appear in the usual compiled
    languages like C, Pascal, or Ada.
                        Dr. Zhijiang Dong @ Middle Tennessee State U    97
                        niversity
List Operations
 Test a list to see if it is empty
 Return the length of a list
 Return the nth element of a list
 Return a list consisting of all but the first n
  elements
 Reverse the order of the elements of a list
 Search a list for elements matching some
  predicate
 Apply a function to every element of a list
                 Dr. Zhijiang Dong @ Middle Tennessee State U   98
                 niversity
List:
Storage Representation for LISP
            Dr. Zhijiang Dong @ Middle Tennessee State U   99
            niversity
Outline
 DataTypes
 Elementary Data Types
    Scalar Data Types
    Composite Data Types
 Structured   Data Types
    Array
    Record
    List
    Set
                 Dr. Zhijiang Dong @ Middle Tennessee State U   100
                 niversity
Sets
   Set : a data object containing an unordered
    collection of distinct values.
   Basic Operations
       Membership: Is data value X a member of set S?
       Insertion and deletion of single value
       Union, intersection, and difference of sets.
   Note that accessing of components of a set by
    subscript or relative position plays no part in set
    processing.
                      Dr. Zhijiang Dong @ Middle Tennessee State U   101
                      niversity
Sets
   Pascal supports sets of any discrete type, and provides union,
    intersection, and difference operations:
    var A, B, C : set of char
            D, E : set of weekday;
    …
    A := B + C; //union
    A := B * C; //intersection
    A := B – C; //difference
   Python supports sets of arbitrary type
   Sets appear in the standard libraries of many object-oriented
    languages, including C++, Java, and C#.
                        Dr. Zhijiang Dong @ Middle Tennessee State U   102
                        niversity
Set Implementation
 There   are many ways to implement sets
    Array
    Hash tables
    Various forms of trees
                  Dr. Zhijiang Dong @ Middle Tennessee State U   103
                  niversity