CSCI 3210: Theory of
Programming Language
Data Types
Based on Textbook Slides
Dr. Zhijiang Dong @ Middle Tennessee State U 1
niversity
Outline
DataTypes
Elementary Data Types
Scalar Data Types
Composite Data Types
Structured Data Types
Array
Record
List
Set
Dr. Zhijiang Dong @ Middle Tennessee State U 2
niversity
Data Types
Each data object has a type:
Values: for objects of that type
Operations: for objects of that type
Implementation: (Storage representation) for objects of that type
Attributes: (e.g., name) for objects of that type
Signature: (of operation f): f: type x type type
Dr. Zhijiang Dong @ Middle Tennessee State U 3
niversity
Data Type
A data type is a class of data objects together
with a set of operations for creating and
manipulating them.
Every language has a set of primitive data
types that are built into the language.
In addition, a language may provide facilities
to allow the programmer to define new data
types.
Dr. Zhijiang Dong @ Middle Tennessee State U 4
niversity
Data Type Specification
Thebasic elements of a specification of a
data type are as follows:
The attributes that distinguish data objects of that
type,
The values that data objects of that type may
have, and
The operations that define the possible
manipulations of data objects of that type.
Dr. Zhijiang Dong @ Middle Tennessee State U 5
niversity
Data Type Specification
Specification of an array data type
Attributes
the number of dimensions
the subscript range for each dimension
the data type of the components
Values
The set of numbers that form valid values for array
components
Operations
Subscripting to select individual array components
Create arrays
Access attributes such as upper and lower bounds of
subscripts
Dr. Zhijiang Dong @ Middle Tennessee State U 6
niversity
Data Type Implementation
Thebasic elements of the implementation of a
data type are as follows:
The storage representation that is used to
represent the data objects of the data type in the
storage of the computer during program execution,
and
The manner in which the operations defined for the
data type are represented in terms of particular
algorithms or procedures that manipulate the
chosen storage representation of the data objects.
Dr. Zhijiang Dong @ Middle Tennessee State U 7
niversity
Declaration
Both specification and implementation are largely independent of
the particular syntactic forms used in the language.
Attributes of data objects are often represented syntactically by
declarations or type definitions.
A declaration is a program statement that serves to communicate
to the language translator information about the name and type
of data objects needed during program execution.
Explicit declaration
Like variable declaration in C
Implicit/default declaration
Declaration that hold when no explicit declaration is given.
In FORTRAN, a simple variable INDEX may be used without
explicit declaration. By default, it is assumed as an integer
variable.
Dr. Zhijiang Dong @ Middle Tennessee State U 8
niversity
Declaration
Declarations serve several important
purposes
Choice of storage representations
Storage management
Polymorphic operations
Type checking
Dr. Zhijiang Dong @ Middle Tennessee State U 9
niversity
Outline
DataTypes
Elementary Data Types
Scalar Data Types
Composite Data Types
Structured Data Types
Array
Record
List
Set
Dr. Zhijiang Dong @ Middle Tennessee State U 10
niversity
Elementary Data Types
An elementary data object contains a single data
value.
A class of such data objects over which various
operations are defined is termed an elementary data
type.
Integer
Real
Character
Boolean
Enumeration
pointer
Dr. Zhijiang Dong @ Middle Tennessee State U 11
niversity
Elementary Data Types:
Specification
Attributes
Basic attributes of any data object, such as data type and
name, are usually invariant during its lifetime.
Some of the attributes may be stored in a descriptor as part
of the data object during program execution.
Values
The type of a data object determines the set of possible
values that it may contain.
The set of values defined by an elementary data type is
usually an ordered set with a least value and a greatest
value; for any pair of distinct values, one is greater than the
other.
Dr. Zhijiang Dong @ Middle Tennessee State U 12
niversity
Elementary Data Types:
Specification
Operations
The operations may be primitive operations,
which means they are specified as part of the
language definition, or
They may be programmer-defined operations, in
the form of subprograms or method declarations
as part of class definitions.
Dr. Zhijiang Dong @ Middle Tennessee State U 13
niversity
Elementary Data Types:
Implementation
Storage representation
Strongly influenced by the underlying computer that will execute
the program
The attributes of elementary data objects
For efficiency, data attributes can be determined by the compiler. (like
C)
For flexibility, data attributes can be stored in a descriptor as part of
the data object at run time. (like LISP, Prolog)
Operations
Each operation may be implemented in one of three main ways:
Directly as a hardware operation
As a procedure or function subprogram
As an inline code sequence
Dr. Zhijiang Dong @ Middle Tennessee State U 14
niversity
Outline
DataTypes
Elementary Data Types
Scalar Data Types
Composite Data Types
Structured Data Types
Array
Record
List
Set
Dr. Zhijiang Dong @ Middle Tennessee State U 15
niversity
Scalar Data Types
Scalardata objects: objects that have a
single attribute for its data object
Numeric Data Types
Integers
Subranges
Floating-point real numbers
Enumerations
Boolean
Character
Dr. Zhijiang Dong @ Middle Tennessee State U 16
niversity
Numeric Data Types: Integers
Specification
The set of integer values defined for the type forms an
ordered subset, within some finite bounds, of the infinite set
of integers studied in mathematics.
The range is typically defined as constants
May have different specification: short, int, long, and char
Operations
Arithmetic operations: +, -, *, /, %, negation (-), identify (+)
Relational operations: >, >=, <, <=, ==, !=
Assignment operations: =
Bit operations: &, |, ^, <<, >>
Dr. Zhijiang Dong @ Middle Tennessee State U 17
niversity
Numeric Data Types: Integers
Implementation
Using hardware-defined integer storage representation and
set of hardware arithmetic and relational primitive
operations on integers
Three possible approaches as indicated in the next
slide
First one requires declarations and static type checking for
integer data objects
Second one requires more space (LISP use this approach)
Third one cannot use arithmetic operations cannot be used
directly
Dr. Zhijiang Dong @ Middle Tennessee State U 18
niversity
Numeric Data Types: Integers
Dr. Zhijiang Dong @ Middle Tennessee State U 19
niversity
Numeric Data Types:
Subranges
A subrange of an integer data type is a
subtype of the integer data type and consists
of a sequence of integer values within some
restricted range.
Specification
Declaration:
A : 1 .. 10 (Pascal)
A : integer range 1..10 (Ada)
Operations: the same set of operations for
ordinary integer types can be applied
Dr. Zhijiang Dong @ Middle Tennessee State U 20
niversity
Numeric Data Types:
Subranges
Implementation
Usually stored in fewer bits than a general integer value
However, subrange values are often represented as the
smallest number of bits for which the hardware implements
arithmetic operations. This can generally be 8 or 16 bits.
Better type checking
MonthVar := 0 is illegal if Month: 1 .. 12. This can
be performed at compile time
If check involves computed values, run-time checking is
needed to determine whether the new value is still iwthin
the bounds declared. For example:
MonthVar := MonthVar + 1
Dr. Zhijiang Dong @ Middle Tennessee State U 21
niversity
Numeric Data Types: Floating-
Point Real Numbers
Specification
The values form an ordered sequence from some hardware-
determined minimum negative value to a maximum value, but the
values are not distributed evenly across this range
Often specified with only the single data type attribute real
(FORTRAN), or float (C)
Alternatively, the precision can be specified in terms of the
number of digits used in the decimal representation. (Ada)
Operations
The same arithmetic, relational and assignment operations described
for integers are also provided for reals.
Due to roundoff issues, boolean operations , especially equality are
sometimes restricted.
Language designers may probibit equality between two real numbers
Dr. Zhijiang Dong @ Middle Tennessee State U 22
niversity
Numeric Data Types: Floating-
Point Real Numbers
Implementations
Using scientific notation: any number N can be expressed
as: N = m x 2k for m between 0 and 1 & for some integer k
Based on an underlying hardware representation in which
a storage location is divided into a mantissa (i.e. the
significant digits of the number) and an exponent.
Dr. Zhijiang Dong @ Middle Tennessee State U 23
niversity
Numeric Data Types: Floating-
Point Real Numbers
IEEE standard 754 specifies both a 32- and 64-bit
standard.
Numbers consist of three fields:
S: a one-bit sign field. 0 is positive.
E: an exponent in excess-127 notation. Values (8 bits)
range from 0 to 255, corresponding to exponents of 2 that
range from -127 to 128.
M: a mantissa of 23 bits. Since the first bit of the mantissa
in a normalized number is always 1, it can be omitted and
inserted automatically by the hardware, yielding an extra
24th bit of precision.
Dr. Zhijiang Dong @ Middle Tennessee State U 24
niversity
Numeric Data Types: Floating-
Point Real Numbers
Decoding IEEE format
Given E, and M, the value of the representation is:
Parameters Value
E=255 and M 0 An invalid number
E=255 and M = 0
0<E<255 2{E-127}(1.M)
E=0 and M 0 2 {-126}.M
E=0 and M=0 0
Dr. Zhijiang Dong @ Middle Tennessee State U 25
niversity
Numeric Data Types: Floating-
Point Real Numbers
+1= 20*1= 2{127-127}*(1).0 (binary) 0 01111111 000000...
+1.5= 20*1.5= 2{127-127}*(1).1 (binary) 0 01111111 100000...
-5= -22*1.25= 2{129-127}*(1).01 (binary)1 10000001 010000...
This gives a range from 10-38 to 1038.
In 64-bit format,the exponent is extended to 11 bits giving a
range from -1022 to +1023, yielding numbers in the range 10 -
308
to 10308.
Dr. Zhijiang Dong @ Middle Tennessee State U 26
niversity
Numeric Data Types: Floating-
Point Real Numbers
Converting Decimal Fractions to Binary (by converting the decimal value .625 to a
binary representation)
Step 1: Begin with the decimal fraction and multiply by 2. The whole number part of the
result is the first binary digit to the right of the point.
Because .625 x 2 = 1.25, the first binary digit to the right of the point is a 1.
So far, we have .625 = .1??? . . . (base 2) .
Step 2: Disregard the whole number part of the previous result (the 1 in this case) and
multiply by 2 once again. The whole number part of this new result is the second binary
digit to the right of the point.
Because .25 x 2 = 0.50, the second binary digit to the right of the point is a 0.
So far, we have .625 = .10?? . . . (base 2) .
Step 3: Disregarding the whole number part of the previous result (this result was .50 so
there actually is no whole number part to disregard in this case), we multiply by 2 once
again. The whole number part of the result is now the next binary digit to the right of the
point.
Because .50 x 2 = 1.00, the third binary digit to the right of the point is a 1.
So now we have .625 = .101?? . . . (base 2) .
Step 4: In fact, we do not need a Step 4. We are finished in Step 3, because we had 0 as
the fractional part of our result there.
Hence the representation of .625 = .101 (base 2) .
Dr. Zhijiang Dong @ Middle Tennessee State U 27
niversity
Numeric Data Types:
Enumerations
Specification
An enumeration is an ordered set of distinct symbolic values
Need to defines both the literal names to be used for the values
and their ordering
C:
Enum StudentClass {Fresh, Soph, Junior, Senior};
Pascal
type StudentClass = (Fresh, Soph, Junior, Senior);
The type definition introduces the type name StudentClass,
which may be used wherever a primitive type name might be
used.
It also introduces the literals of Fresh, Soph, Junior, and Senior.
Operations
Full set of relational operations
Assignment, and operations like successor and predecessor
Dr. Zhijiang Dong @ Middle Tennessee State U 28
niversity
Numeric Data Types:
Enumerations
Implementation
Each value in the enumeration sequence is represented at
run time by one of the integers 0, 1, 2….
The successor and predecessor operations involve simply
adding or subtracting one from the integer representing the
value and checking to see that the result is within the
proper range.
Hardware-provided operations on integers can be used to
implement the basic operations on enumerations.
Dr. Zhijiang Dong @ Middle Tennessee State U 29
niversity
Numeric Data Types: Booleans
Specification
The boolean data type consists of data objects having one
of two values – true or false.
In Pascal and Ada, the boolean data type is considered
simply a language-defined enumeration
Operations
AND, OR, NOT,
Dr. Zhijiang Dong @ Middle Tennessee State U 30
niversity
Numeric Data Types: Booleans
Implementation
Theoretically, a single bit of storage is enough
However, the storage representation is often extended
to be a single addressable unit such as a byte,
because single bits may not be separately
addressable in memory
true and false might be represented in two ways within
this storage unit:
A particular bit is used for the value, and the rest is ignored;
or
A zero value in the entire storage unit represents false, and
any other nonzero value represents true.
Dr. Zhijiang Dong @ Middle Tennessee State U 31
niversity
Numeric Data Types:
Characters
Specification
A character data type provides data objects that have
a single character as their value.
The set of possible character values is usually taken
to be a language-defined enumeration corresponding
to the standard character set supported by the
underlying hardware and operating systems.
The ordering of the characters in this character set is
called collating sequence for the character set.
Operations
Relational operations, assignment
Dr. Zhijiang Dong @ Middle Tennessee State U 32
niversity
Numeric Data Types:
Characters
Implementation
Character data values are almost always directly
supported by the underlying hardware and operating
system because of their use in input-output.
Dr. Zhijiang Dong @ Middle Tennessee State U 33
niversity
Outline
DataTypes
Elementary Data Types
Scalar Data Types
Composite Data Types
Structured Data Types
Array
Record
List
Set
Dr. Zhijiang Dong @ Middle Tennessee State U 34
niversity
Composite Data Types
Composite Data Types
Multiple attributes are often given for each such data type
Their implementation usually involves a complex data
structure organization by the compiler
The following composite data types are usually
considered elementary data objects
Character strings
Pointers
User-defined Data types
Files
Dr. Zhijiang Dong @ Middle Tennessee State U 35
niversity
Composite Data Types:
Character String
Character String
A data object composed of a sequence of characters
Different types of character string
Fixed declared length
Variable length to a declared bound
Unbounded length
Examples
Fixed length:
char A(10) - C
DCL B CHAR(10) - PL/I
var C packed array [1..10] of char - Pascal
Variable length:
DCL D CHAR(20) VARYING - PL/I - 0 to 20 characters
E = “ABC” - SNOBOL4 - any size, dynamic
F = `ABCDEFG\0' - C - any size, programmer defined
Dr. Zhijiang Dong @ Middle Tennessee State U 36
niversity
Composite Data Types:
Character String
The string in C is actually a bit more complicated.
Strings are arrays of characters, but C has no string
declaration.
If an array of characters need to be treated as a C string, it
is programmer’s responsibility to make sure the array
include the null character
A string constant written as “string liter” will have the null
character appended to the string constant by the C
translator.
Dr. Zhijiang Dong @ Middle Tennessee State U 37
niversity
Composite Data Types:
Character String
Operations
Concatenation
Relational operations on strings based on lexicographic ordering
Substring selection using positioning subscripts
Providing the positions of its first and last characters; or
Providing the first character position and length of the substring
Substring selection using pattern matching
The pattern specifies the form of the substring desired
Dr. Zhijiang Dong @ Middle Tennessee State U 38
niversity
Composite Data Types:
Character String
Operations
Input and output formatting
Dynamic strings
In Perl, string can be static and dynamic
print ‘$ABC’ will output $ABC
print “$ABC” will cause the string to be evaluated, and the
value of the Perl variable $ABC is printed instead
Dr. Zhijiang Dong @ Middle Tennessee State U 39
niversity
Composite Data Types:
Character String
Storage representation for strings
Dr. Zhijiang Dong @ Middle Tennessee State U 40
niversity
Composite Data Types:
Pointers
Pointer
A data object whose value is a reference to some
object
First introduced in PL/I
Pointers serve two purposes:
efficient (and sometimes intuitive) access to
elaborated objects (as in C)
dynamic creation of linked data structures, in
conjunction with a heap storage manager
Dr. Zhijiang Dong @ Middle Tennessee State U 41
niversity
Composite Data Types:
Pointers
In some languages (Pascal, Ada 83, and Modula-3), pointers are
restricted to point only to objects in the heap.
In other languages (PL/I, Algol 68, C, C++, and Ada 95), one can
create a pointer to a nonheap object by using an “address of”
operator.
How and when is storage reclaimed for objects that are no longer
needed?
Programmer needs to reclaim space explicitly. (C, C++, Pascal,
Modula-2)
May cause memory leak and/or dangling reference problem
Language implementation to reclaim unused objects automatically,
as known garbage collection. (Modula-3, Java, C#, all
functional and scripting languages)
Impose certain run-time costs, and
Raises the question of how the language implementation is to
distinguish garbage from active objects.
Dr. Zhijiang Dong @ Middle Tennessee State U 42
niversity
Composite Data Types:
Pointers
Operations:
Allocation
Deallocation
Dereferencing of pointers to access the objects to which
they point
Assignment of one pointer to another
The behavior of these operations depends heavily
on
Whether the language is functional or imperative, and
Whether it employs a reference or value model for
variables/names
Dr. Zhijiang Dong @ Middle Tennessee State U 43
niversity
Composite Data Types:
Pointers
Value model for variables/names
A variable contains a value.
Reference model for variables/names
A variable refers to an object with a value.
Variables have to be dereferenced to get their r-value; however, the
dereferencing is implicit in most languages, so the distinction is subtle.
Example:
b := 2;
c := b;
a := b + c;
Value model: put the value 2 in b and then copy it into c. We then read these
values, add them together, and place the resulting 4 in a.
Reference Model: let b refer to 2 and then let c refer to it also. We then pass
these reference to the + operator, and let a refer to the result, namely 4.
Dr. Zhijiang Dong @ Middle Tennessee State U 44
niversity
Composite Data Types:
Pointers
Functional languages generally employ a reference model for
names
A pure functional language has no variables or assignments
Variables in an imperative language may use either a value or a
reference mode, or some combination of the two.
C, Pascal, and Ada employ a value model
Clu and SmallTalk employe a reference model
Java charts an intermediate, in which the usual implementation of
the reference model is made explicit in the language semantics
Variables of build-in Java types (integers, floating-point numbers,
characters, and Booleans) employ a value model
Variable of user-defined types (strings, arrays, and other objects in
the object-oriented sense of the word) employ a reference model
Dr. Zhijiang Dong @ Middle Tennessee State U 45
niversity
Composite Data Types:
Pointers
Implementation
Two major storage representations are used for
pointer values
Absolute addresses: a pointer value may be
represented as the actual memory address of the
storage block the foe data object
Relative addresses: a pointer value may be
represented as an offset from the base address of
some larger heap storage block within which the data
object is allocated.
Dr. Zhijiang Dong @ Middle Tennessee State U 46
niversity
Composite Data Types:
Pointers
Pointers and arrays are closely linked in C.
In most contexts, an unsubscripted array name in C is
automatically converted to a pointer to the array’ first
element (the one with index 0)
Pointer arithmetic: Given a pointer to an element of
an array, the addition of an integer k produces a pointer to
the element k positions later in the array (earlier if k is
negative)
Remarkably, the subscript operator [ ] in C is actually
defined in terms of pointer arithmetic.
Correctness requires only that one operand of [ ] have an
array or pointer type and the other have an integral type.
Thus, A[3] is equivalent to 3[A].
Dr. Zhijiang Dong @ Middle Tennessee State U 47
niversity
Composite Data Types:
Pointers
C pointers and arrays
int *a == int a[]
int **a == int *a[]
BUT equivalences don't always hold
Specifically, a declaration allocates an array if it
specifies a size for the first dimension
otherwise it allocates a pointer
int **a, int *a[] pointer to pointer to int
int *a[n], n-element array of row pointers
int a[n][m], 2-d array
Dr. Zhijiang Dong @ Middle Tennessee State U 48
niversity
Composite Data Types:
Pointers
Compiler has to be able to tell the size of the things to
which you point
So the following aren't valid:
int a[][]; bad
int (*a)[]; bad
C declaration Right Left rule: read right as far as you
can (subject to parentheses), then left, then out a level
and repeat
int *a[n]; n-element array of pointers to
integer
int (*a)[n]; pointer to n-element array of
integers
Dr. Zhijiang Dong @ Middle Tennessee State U 49
niversity
Composite Data Types: Files
A file is a data structure with two special properties:
It ordinarily is represented on a secondary storage
device such a a disk or tape and thus may be much
larger than most data structures of other types
Its lifetime may encompass a greater span of time
than that of the program creating it.
Two general use for files
For input and output of data to an external operating
environment
As temporary scratch storage for data when not
enough high-speed memory is available
Dr. Zhijiang Dong @ Middle Tennessee State U 50
niversity
Composite Data Types: Files
Different types of files
Sequential files: a data structure composed of a
linear sequence of components of the same type
Direct-Access files: any single component may be
accessed at random, just as in an array or a
record
Indexed Sequential files: Similar to a direct-
access file, with the additional facility to access
components in sequence, beginning from a
component selected at random.
Dr. Zhijiang Dong @ Middle Tennessee State U 51
niversity
Composite Data Types: Files
Declaration
Master: file of EmployeeRec; (Pascal)
Defines a file variable named Master, whose
components are of type EmployeeRec.
Tyically,
the file may be accessed in either
read mode or write mode.
In either mode, there is a file-position
pointer that designates a position before the
first file component, between two components, or
after the last component.
Dr. Zhijiang Dong @ Middle Tennessee State U 52
niversity
Composite Data Types: Files
Operations
Open
Read
Write
End-of-file test
Close
Dr. Zhijiang Dong @ Middle Tennessee State U 53
niversity
Composite Data Types: Files
Implementation
The underlying operating system has the primary
responsibility for the implementation of files because
files are created and manipulated by various
programming language processors and utilities.
File operations are primarily implemented by calls on
primitives provided by the operating system.
From the language viewpoint, the primary implementation
problem comes from the need to provide storage for system
data and buffers required by the operating system primitives.
Dr. Zhijiang Dong @ Middle Tennessee State U 54
niversity
Outline
DataTypes
Elementary Data Types
Scalar Data Types
Composite Data Types
Structured Data Types
Array
Record
List
Set
Dr. Zhijiang Dong @ Middle Tennessee State U 55
niversity
Structured Data Types
The current goal in programming language design is
to make the distinctions in the various forms of data
transparent to the programmer who uses the data
types.
Four basic mechanisms exist to provide the
programmer with the ability to create new data types
and operations on that type.
Structured data
Subprograms
Type declarations
Inheritance
Dr. Zhijiang Dong @ Middle Tennessee State U 56
niversity
Structured Data Types
A data object that is constructed as an aggregate of other data
objects, called component, is termed a structured data object or
data structure.
A component may be elementary or another data structure
The bindings of data structures to values, names, and locations
are important and somewhat more complex comparing to
elementary data types
How to indicate the component data objects of a data structure
and their relationships in such a way that selection of a
component is straightforward?
Many operations on data structures bring up storage
management issues that are not present for elementary data
objects.
Dr. Zhijiang Dong @ Middle Tennessee State U 57
niversity
Structured Data Types
Structured Data Types
Vectors
Arrays
Records
Lists
Sets
Dr. Zhijiang Dong @ Middle Tennessee State U 58
niversity
Structured Data Types:
Specification
The major attributes for specifying data structures
include the following
Number of components
A data structure maybe fixed size such as array, record, or
variable size such as lists, stacks, sets
Variable-size data structure types usually provide insert and
delete operations
Type of each component
Homogeneous: all components of the same type
Heterogeneous: components of different types
Mechanism to be used for selecting components
Maximum number of components
Organization of the components
Dr. Zhijiang Dong @ Middle Tennessee State U 59
niversity
Structured Data Types:
Specification
Specification of the domain and range of operations on data
structure types may be given in much the same manner as for
elementary types.
Some new classes of operations are of particular importance:
Component selection operations
Random selection: an arbitrary component can be accessed
Sequential selection: components are selected in a predetermined
order
Whole data structure operations
Operations take entire data structures as arguments and produce
new data structures as results
Examples: addition of two arrays, copy record, union of two sets
Insertion/deletion of components
Creation/destruction of data structures
Dr. Zhijiang Dong @ Middle Tennessee State U 60
niversity
Structured Data Types:
Implementation
Implementation consideration for data structure
types include the same issues as for elementary
types.
In addition, two new issues develop that strongly
affect the choice of storage representation
Efficient selection of components from a data structure,
and
Efficient overall storage management for the language
implementation.
Dr. Zhijiang Dong @ Middle Tennessee State U 61
niversity
Implementation:
Storage Representation
Storage representations for a data structure include
Storage for the components of the structure, and
An optional descriptor that stores some or all of the attributes of
the structure
Two basic storage representations
Sequential representation
Stored in a single contiguous block of storage that includes both
description and components
Typically used for fixed-size structures and sometimes for
homogeneous variable-size structures such as strings or stacks
Linked representation
Stored in several noncontiguous blocks of storage, with the blocks
linked together through pointers.
Commonly used for variable-size structures such as lists
Dr. Zhijiang Dong @ Middle Tennessee State U 62
niversity
Implementation:
Storage Representation
Dr. Zhijiang Dong @ Middle Tennessee State U 63
niversity
Implementation: Operations
Implementation of operations on data structures
Component selection is of primary importance
Efficiency of both random-selection and sequential-
selection operations is needed
Sequential-selection should more efficiently than simply as
a sequence of random selections
These two basic types of component selection are
implemented differently for sequential and linked storage
representations
Dr. Zhijiang Dong @ Middle Tennessee State U 64
niversity
Operation Implementation:
Sequential Representation
Random selection involves
a base-address-plus-
offset calculation using
an accessing formula.
The relative location of the
selected component within
the sequential block is
called its offset.
Dr. Zhijiang Dong @ Middle Tennessee State U 65
niversity
Operation Implementation:
Sequential Representation
• Selection of a sequence of components from the
structure is possible by the following steps, for a
homogeneous structure such as an array
• To select the first component of the sequence, use the
base-address-plus-offset calculation
• To advance to the next component in the sequence,
add the size of the current component to the location
of the current component
Dr. Zhijiang Dong @ Middle Tennessee State U 66
niversity
Operation Implementation:
Linked Representation
• Random selection of a component from a linked
structure involves following a chain of pointer from
the first block of storage in the structure to the
desired component.
• Selection of a sequence of components proceeds by
selecting the first component and then following the
link pointer from the current component to the next
component for each subsequence selection
Dr. Zhijiang Dong @ Middle Tennessee State U 67
niversity
Type Checking on Structured
Data Types
• Type checking on structured data types is more complex
because component selection operations must be taken into
account
• Existence of a selected component
• The arguments to a selection operation may be of the right types,
but the component designated may not exist in the data structure
• Type of a selected component
• A selection sequence may define a complex path through a data
structure to the desired component. For example:
A[2][3].link->item
• To perform static type checking, the translator should be able to
determine the type of component selected by any valid
composite selector
Dr. Zhijiang Dong @ Middle Tennessee State U 68
niversity
Outline
DataTypes
Elementary Data Types
Scalar Data Types
Composite Data Types
Structured Data Types
Array
Record
List
Set
Dr. Zhijiang Dong @ Middle Tennessee State U 69
niversity
Array
Arrays are the most common and important composite
data types
arrays are usually homogeneous
Semantically, they can be thought of as a mapping
from an index type to a component or element type
Array declaration typically specify the following
attributes
Number of components
Data type of each component
Subscript to be used to select each component
Maybe a range of values as -5 .. 5, or
An upper bound with an implied lower bound
Dr. Zhijiang Dong @ Middle Tennessee State U 70
niversity
Array Declaration
• Pascal
V: array [-5 .. 5] of real;
• C
Float a[10];
• FORTRAN
character, dimension (1:26) :: upper
character (26) upper !shorthand notation
• Ada
upper: array (character range ‘a’ .. ‘z’) of
character;
Dr. Zhijiang Dong @ Middle Tennessee State U 71
niversity
Array Storage
Array in most language implementations are stored in contiguous
locations in memory
One dimensional array
ith element is stored immediately after the (i-1) th element. (subject
to alignment constraints)
Two dimensional array
Row major order (Most programming languages)
A[i, j] is followed by A[I, j+1] immediately
Column major order (FORTHAN)
A[I, j] is followed by A[i+1, j] immediately
Row major order is more popular
The difference between row- and column-major layout can be
important for programs that use nested loops to access all the
elements of a large, multi-dimensional array. (efficiency because
of cache)
Dr. Zhijiang Dong @ Middle Tennessee State U 72
niversity
Array Storage
Row-major order vs. Column-major order
Dr. Zhijiang Dong @ Middle Tennessee State U 73
niversity
Array Storage: Alternatives
Row-pointer layout is an alternative to contiguous allocation.
Rather than require the row of an array to be adjacent, they allow
them to lie anywhere in memory, and create an auxiliary array of
pointers to the rows.
Advantages of row-pointer layout
Individual elements can be access more quickly, especially on
CISC machines with slow multiplication instruction
Rows may have different lengths, without devoting space to
holes at the ends of the rows. (called ragged array)
An array can be constructed from preexisting rows without
copying.
Disadvantages: Requires more space
C, C++, and C# provide both contiguous and row-pointer
organizations for multidimensional arrays.
Java uses the row-pointer layout for all arrays
Dr. Zhijiang Dong @ Middle Tennessee State U 74
niversity
Array Storage Comparison
Contiguous array allocation v. row pointers in C. The declaration on the left is a tr ue two-dimensional array. The slashed
boxes are NUL bytes; the shaded areas are holes. The declaration on the right is a ragged array of pointers to arrays of
character s. In both cases, we have omitted bounds in the declaration that can be deduced from the size of the
initializer (aggregate). Both data structures permit individual characters to be accessed using double subscripts, but the
memory layout (and corresponding address arithmetic) is quite different.
Dr. Zhijiang Dong @ Middle Tennessee State U 75
niversity
Accessing Array Elements
Dr. Zhijiang Dong @ Middle Tennessee State U 76
niversity
Accessing Array Elements:
Virtual Origin
Rewriting access equation:
L-value(A[I,J]) = - d1*L1 - d2*L2 +I*d1 + J*d2
Set I = 0; J= 0;
L-value(A[0,0]) = - d1*L1 - d2*L2 +0*d1 + 0*d2
= - d1*L1 - d2*L2, which is a constant.
Call this constant the virtual origin (VO); It
represents the address of the 0th element of the
array.
L-value(A[I,J]) = VO +I*d1 + J*d2
Dr. Zhijiang Dong @ Middle Tennessee State U 77
niversity
Dope Vectors
Dope vector is used to store array shape information
such as
Lower bound of each dimension
The size of each dimension
Upper bound of each dimension (optional, can be included
to avoid computing repeatedly at runtime)
To perform semantic checking as well as address
calculation.
If these information are statically known, the translator can
look them up in the dope vector
If these information are not statically known, the translator
must generate code to look them up in a dope vector at run
time.
Dr. Zhijiang Dong @ Middle Tennessee State U 78
niversity
Array Storage Representation
One dimensional array
Right figure: Two dimensional array
Dr. Zhijiang Dong @ Middle Tennessee State U 79
niversity
Slices
A slice or section is a rectangular portion of
an array.
Fortran 90 and provides extensive facilities
for slicing, as do many scripting languages,
including Perl, Python, Ruby, and R.
Adaprovides more limited support: a lice is
simply a contiguous range of elements in a
one-dimensional array.
Dr. Zhijiang Dong @ Middle Tennessee State U 80
niversity
Slices
Dr. Zhijiang Dong @ Middle Tennessee State U 81
niversity
Array Slices in Fortran 90
The slashes in the second
subscript delimit an explicit
list of positions.
a: b: c in a subscript indicates positions a, a+c, a+2c, ...through b.
If a or b is omitted, the corresponding bound of the array is assumed.
If c is omitted, 1 is assumed.
It is even possible to use negative values of c in order to select positions in reverse order.
Dr. Zhijiang Dong @ Middle Tennessee State U 82
niversity
Array Operations
Most programming languages permit selection of an
element and assignment operations on arrays
A few languages (Ada, Fortran 90) allow arrays to
be compared for equality
Ada even allows one-dimensional arrays whose elements
are discrete to be compared for lexicographic ordering.
Fortran 90 has a very rich set of array operations
that take entire arrays as arguments such as A + B.
Dr. Zhijiang Dong @ Middle Tennessee State U 83
niversity
Associative Arrays
In some applications, there is a desire to
access information by name without having a
predefined ordering or enumeration of the
appropriate subscripts.
An associative array (map, table, or
dictionary) is an abstract data type composed
of a collection of (key, value) pairs, such that
each possible key appears at most once in
the collection.
Dr. Zhijiang Dong @ Middle Tennessee State U 84
niversity
Associative Arrays
In Perl, an associative array is created with
the % operators
%ClassList = (
“Michelle”, ‘A’,
“Doris”, ‘B’,
“Michael”, ‘D’ );
It creates a three-element associative array:
Student name as the key, and grade as the
value.
Dr. Zhijiang Dong @ Middle Tennessee State U 85
niversity
Outline
DataTypes
Elementary Data Types
Scalar Data Types
Composite Data Types
Structured Data Types
Array
Record
List
Set
Dr. Zhijiang Dong @ Middle Tennessee State U 86
niversity
Record
Record/Structure: A data structure composed of a
fixed number of components of different types.
usually laid out contiguously
possible holes for alignment reasons
smart compilers may re-arrange fields to minimize
holes (C compilers promise not to)
Record specification
Type of each component (called field)
Name of each component (called field name)
The number of components (implied)
Name of the structure
Dr. Zhijiang Dong @ Middle Tennessee State U 87
niversity
Record Declaration
struct element { type element = record
char name[2]; name : two_chars;
int atomic_number; atomic_number : integer;
double atomic_weight; atomic_weight : double;
_Bool metallic; metallic : Boolean;
}; /*C Version*/ }; /*Pascal Version*/
Dr. Zhijiang Dong @ Middle Tennessee State U 88
niversity
Record:
Storage Representation
• Likely layout in memory for objects of type element
on a 32-bit machine. Alignment restrictions lead to
the shaded “holes.”
Dr. Zhijiang Dong @ Middle Tennessee State U 89
niversity
Array of Record
• Array element
may be a
record
Dr. Zhijiang Dong @ Middle Tennessee State U 90
niversity
Record
• A record may also have
components that are arrays or
other records, leading to
records that have a
hierarchical structure
consisting of a top level of
components, some of which
may be arrays and records
Dr. Zhijiang Dong @ Middle Tennessee State U 91
niversity
Nested Record
• Pay attention to Nested record. They behave differently in
value mode and reference model of variables
• The program below prints 0 for value model of variables, 7 for
reference model of variables.
type
T = record
j : integer;
end;
S = record
I : integer;
n : T;
end;
var s1, s2 : S;
…
s1.n.j := 0;
s2 := s1;
s2.n.j := 7;
writeln(s1.n.j);
Dr. Zhijiang Dong @ Middle Tennessee State U 92
niversity
Record Operations
Component selection is the basic operation on a
record.
Typically, using dot operator and the filed name
Can be implemented easily
Field name is known during translation
The size of each component and its position within the
storage block can be determined during translation
Most commonly assignment of records of identical
structure is provided
Can be implemented as a simple copy of the contents of
the storage block representing the first record into the
storage block representing the second record
Dr. Zhijiang Dong @ Middle Tennessee State U 93
niversity
Variant Records
typedef union {
int X;
float Y;
char Z[4];} B;
B P;
Similar to records, except all have overlapping (same) L-value.
But problems can occur. What happens below?
P.X = 142;
printf(“%O\n”, P.Z[3])
All 3 data objects have same L-value and occupy same storage. No
enforcement of type checking.
Poor language design
Dr. Zhijiang Dong @ Middle Tennessee State U 94
niversity
Variant Records
type PayType=(Salaried, Hourly);
var Employee:record
ID: integer;
Dept: array[1..3] of char;
Age: integer;
case PayClass: PayType of
Salaried:(MonthlyRate:real;
StartDate:integer);
Hourly:(HourRate:real;
Reg:integer;
Overtime:integer)
end
Dr. Zhijiang Dong @ Middle Tennessee State U 95
niversity
Outline
DataTypes
Elementary Data Types
Scalar Data Types
Composite Data Types
Structured Data Types
Array
Record
List
Set
Dr. Zhijiang Dong @ Middle Tennessee State U 96
niversity
Lists
List: A data structure composed of an ordered sequence of data
structures
Lists are rarely of fixed length
Lists are rarely homogeneous
Languages that use lists typically declare such data implicitly
without explicit attributes for list members.
LISP list are heterogeneous: any object may be placed in a list
syntax: (a b c)
MP list are homogeneous
syntax: [a, b, c]
Lists are primitive data objects in languages like ML, LISP,
Prolog, Python, but do not appear in the usual compiled
languages like C, Pascal, or Ada.
Dr. Zhijiang Dong @ Middle Tennessee State U 97
niversity
List Operations
Test a list to see if it is empty
Return the length of a list
Return the nth element of a list
Return a list consisting of all but the first n
elements
Reverse the order of the elements of a list
Search a list for elements matching some
predicate
Apply a function to every element of a list
Dr. Zhijiang Dong @ Middle Tennessee State U 98
niversity
List:
Storage Representation for LISP
Dr. Zhijiang Dong @ Middle Tennessee State U 99
niversity
Outline
DataTypes
Elementary Data Types
Scalar Data Types
Composite Data Types
Structured Data Types
Array
Record
List
Set
Dr. Zhijiang Dong @ Middle Tennessee State U 100
niversity
Sets
Set : a data object containing an unordered
collection of distinct values.
Basic Operations
Membership: Is data value X a member of set S?
Insertion and deletion of single value
Union, intersection, and difference of sets.
Note that accessing of components of a set by
subscript or relative position plays no part in set
processing.
Dr. Zhijiang Dong @ Middle Tennessee State U 101
niversity
Sets
Pascal supports sets of any discrete type, and provides union,
intersection, and difference operations:
var A, B, C : set of char
D, E : set of weekday;
…
A := B + C; //union
A := B * C; //intersection
A := B – C; //difference
Python supports sets of arbitrary type
Sets appear in the standard libraries of many object-oriented
languages, including C++, Java, and C#.
Dr. Zhijiang Dong @ Middle Tennessee State U 102
niversity
Set Implementation
There are many ways to implement sets
Array
Hash tables
Various forms of trees
Dr. Zhijiang Dong @ Middle Tennessee State U 103
niversity