KEMBAR78
05 Associative Arrays | PDF | Computer Data | Programming Paradigms
0% found this document useful (0 votes)
46 views5 pages

05 Associative Arrays

The document discusses data types, specifically focusing on multidimensional arrays and associative arrays, including their structure and operations in programming languages like Perl and Lua. Associative arrays, which are unordered collections indexed by keys, are compared to traditional arrays, highlighting their efficiency in searches and data pairing. Additionally, the document touches on the implementation of associative arrays in Perl and PHP, and introduces record types as a way to model collections of heterogeneous data.

Uploaded by

prow8273
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
46 views5 pages

05 Associative Arrays

The document discusses data types, specifically focusing on multidimensional arrays and associative arrays, including their structure and operations in programming languages like Perl and Lua. Associative arrays, which are unordered collections indexed by keys, are compared to traditional arrays, highlighting their efficiency in searches and data pairing. Additionally, the document touches on the implementation of associative arrays in Perl and PHP, and introduces record types as a way to model collections of heterogeneous data.

Uploaded by

prow8273
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 5

272 Chapter 6 Data Types

location(a[i, j]) = address of a[row_lb, col_lb]


- (((row_lb * n) + col_lb) * element_size)
+ (((i * n) + j) * element_size)

where the first two terms are the constant part and the last is the variable part.
This can be generalized relatively easily to an arbitrary number of dimensions.
For each dimension of an array, one add and one multiply instruction are
required for the access function. Therefore, accesses to elements of arrays with
several subscripts are costly. The compile-time descriptor for a multidimen-
sional array is shown in Figure 6.6.

Figure 6.6
A compile-time
descriptor for a
multidimensional array

6.6 Associative Arrays


An associative array is an unordered collection of data elements that are
indexed by an equal number of values called keys. In the case of non-associative
arrays, the indices never need to be stored (because of their regularity). In an
associative array, however, the user-defined keys must be stored in the structure.
So each element of an associative array is in fact a pair of entities, a key and a
value. We use Perl’s design of associative arrays to illustrate this data structure.
Associative arrays are also supported directly by Python, Ruby, and Lua and by
the standard class libraries of Java, C++, C#, and F#.
The only design issue that is specific for associative arrays is the form of
references to their elements.

6.6.1 Structure and Operations


In Perl, associative arrays are called hashes, because in the implementation
their elements are stored and retrieved with hash functions. The namespace
for Perl hashes is distinct: Every hash variable name must begin with a percent
sign (%). Each hash element consists of two parts: a key, which is a string, and
6.6 Associative Arrays 273

a value, which is a scalar (number, string, or reference). Hashes can be set to


literal values with the assignment statement, as in

%salaries = ("Gary" => 75000, "Perry" => 57000,


"Mary" => 55750, "Cedric" => 47850);

Individual element values are referenced using notation that is similar to


that used for Perl arrays. The key value is placed in braces and the hash name is
replaced by a scalar variable name that is the same except for the first character.
Although hashes are not scalars, the value parts of hash elements are scalars, so
references to hash element values use scalar names. Recall that scalar variable
names begin with dollar signs ($). For example,

$salaries{"Perry"} = 58850;

A new element is added using the same assignment statement form. An element
can be removed from the hash with the delete operator, as in

delete $salaries{"Gary"};

The entire hash can be emptied by assigning the empty literal to it, as in

@salaries = ();

The size of a Perl hash is dynamic: It grows when an element is added and
shrinks when an element is deleted, and also when it is emptied by assignment
of the empty literal. The exists operator returns true or false, depending on
whether its operand key is an element in the hash. For example,

if (exists $salaries{"Shelly"}) . . .

The keys operator, when applied to a hash, returns an array of the keys of
the hash. The values operator does the same for the values of the hash. The
each operator iterates over the element pairs of a hash.
Python’s associative arrays, which are called dictionaries, are similar
to those of Perl, except the values are all references to objects. The associa-
tive arrays supported by Ruby are similar to those of Python, except that
the keys can be any object,6 rather than just strings. There is a progression
from Perl’s hashes, in which the keys must be strings, to PHP’s arrays, in
which the keys can be integers or strings, to Ruby’s hashes, in which any
type object can be a key.
PHP’s arrays are both normal arrays and associative arrays. They can be
treated as either. The language provides functions that allow both indexed and

6. Objects that change do not make good keys, because the changes could change the hash
function value. Therefore, arrays and hashes are never used as keys.
i n te r vi ew

Lua
ROBERTO IERUSALIMSCHY
Roberto Ierusalimschy is one of the creators of the scripting language Lua, which
is used widely in game development and embedded systems applications. He is an
associate professor in the Department of Computer Science at Pontifícia Universi-
dade Católica do Rio de Janeiro in Brazil. (For more information about Lua, visit
www.lua.org.)

How and where did you first become involved with More recently, we have done some academic
computing? Before I entered college in 1978, I had no research with Lua. But it is a long process to merge
idea about computing. I remember that I tried to read these academic results into the official distribution;
a book on programming in Fortran but did not pass the more often than not these results have little direct
initial chapter on definitions for variables and constants. impact on Lua. Nevertheless, there have been some nice
In my first year in college I took a Programming exceptions, such as the register-based virtual machine
101 course in Fortran. At that time we ran our pro- and “ephemeron tables” (to appear in Lua 5.2).
gramming assignments in an IBM 370 mainframe. We
had to punch cards with our code, surround the deck You have said Lua was raised, rather than
with some fixed JCL cards and give it to an operator. designed. Can you comment on what you meant
Some time later (often a few hours) we got a listing with and what you think are the benefits of this
the results, which frequently were only compiler errors. approach? We meant that most important pieces of
Soon after that a friend of mine brought from Lua were not present in its first version. The language
abroad a microcomputer, a Z80 CPU with 4K bytes of started as a very small and simple language and got
memory. We started to do all kinds of programs for this several of its relevant features as it evolved.
machine, all in assembly—or, more exactly, in machine Before talking about the benefits (and the draw-
code, as it did not have an assembler. We wrote our backs) of this approach, let me make it clear that we
programs in assembly, then translated them by hand to did not choose that approach. We never thought, “let
hexadecimal to enter them into memory to run. us grow a new language.” It just happened.
Since then I was hooked. I guess that a most difficult part when designing a
language is to foresee how different mechanisms will
There have been few successful programming interact in daily use. By raising a language—that is,
languages designed in academic environments in creating it piece by piece—you may avoid most of those
the last 25 years. Although you are an academic, interaction problems, as you can think about each new
Lua was designed for very practical applications. feature only after the rest of the language is in place
Do you consider Lua an academic or an industrial and has been tested by real users in real applications.
language? Lua is certainly an industrial language, Of course, this approach has a major drawback, too:
but with an academic “accent.” Lua was created for You may arrive at a point where a most-needed new fea-
two industrial applications, and it has been used in ture is incompatible with what you already have in place.
industrial applications all its life. We tried to be very
pragmatic on its design. However, except for its first Lua has changed in a variety of ways since it was
version, we were never under the typical pressure from first released in 1994. You have said that there
an industrial environment. We always had the luxury of have been times when you regretted not including
choosing when to release a new version or of choosing a Boolean type in Lua. Why didn’t you simply add
whether to accept user demands. That gave us some one? This may sound funny, but what we really missed
latitude that other languages have not enjoyed. was the value “false”; we had no use for a “true” value.

274
Like the original LISP, Lua treated nil as false “gsub” it is easy to apply a given function to each
and everything else as true. The problem is that nil character in a string.
also represents an unitialized variable. There was no Instead of a special “block” mechanism for the
way to distinguish between an unitialized variable from iterator body, Lua has used first-class functions for the
a false variable. So, we needed a false value, to make task. See the next example:
that distinction possible. But the true value was use-
less; a 1 or any other constant was good enough. —'t' is a table from names to values
I guess this is a typical example where our “indus- —the next "loop" prints all keys with
trial” mind conflicted with our “academic” mind. A values greater than 10
really pragmatic mind would add the Boolean type foreach(t, function(key, value)
if value > 10 then print(key) end
without thinking twice. But our academic mind was
end)
upset by this inelegance. In the end the pragmatic side
won, but it took some time.
However, when we first implemented iterators, func-
What were the most important Lua features, tions in Lua did not have full lexical scoping. Moreover,
other than the preprocessor, that later became the syntax is a little heavy (macros would help). Also,
recognized as misfeatures and were removed from exit statements (break and return) are always confus-
the language? I do not remember other big misfea- ing when used inside iteration bodies. So, in the end we
tures. We did remove several features from Lua, but decided for the for statement.
mostly because they were superseded by a new, usually But “true iterators” are still a useful design in Lua,
“better” in some sense, feature. This happened with tag even more now that functions have proper lexical scop-
methods (superseded by metamethods), weak refer- ing. In my Lua book, I end the chapter about the for
ences in the C API (superseded by weak tables), and statement with a discussion of true iterators.
upvalues (superseded by proper lexical scoping).
Can you briefly describe what you mean when
When a new feature for Lua that would break you describe Lua as an extensible extension lan-
backward compatibility is considered, how is that guage? It is an “extensible language” because it is
decision made? These are always hard decisions. easy to register new functions and types defined in other
First, we try to find some other format that could avoid languages. So it is easy to extend the language. From a
or at least reduce the incompatibility. If that is not more concrete point of view, it is easy to call C from Lua.
possible, we try to provide easy ways around the incom- It is an “extension language” because it is easy to
patibility. (For instance, if we remove a function from use Lua to extend an application, to morph Lua into
the core library we may provide a separated implemen- a macro language for the application. (This is “script-
tation that the programmer may incorporate into her ing” in its purer meaning.) From a more concrete point
code.) Also, we try to measure how difficult it will be to of view, it is easy to call Lua from C.
detect and correct the incompatibility. If the new fea-
Data structures have evolved from arrays, records,
ture creates syntax errors (e.g., a new reserved word),
and hashes to combinations of these. Can you
that is not that bad; we may even provide an automatic
estimate the significance of Lua’s tables in the
tool to fix old code. However, if the new feature may
evolution of data structures in programming
produce subtle bugs (e.g., a preexisting function return-
languages? I do not think the Lua table has had any
ing a different result), we consider it unacceptable.
significance in the evolution of other languages. Maybe
Were iterator methods, like those of Ruby, con- that will change in the future, but I am not sure about
sidered for Lua, rather than the for statement it. In my view, the main benefit offered by Lua tables
that was added? What considerations led to the is its simplicity, an “all-in-one” solution. But this sim-
choice? They were not only considered, they were plicity has its costs: For instance, static analysis of
actually implemented! Since version 3.1 (from 1998), Lua programs is very hard, partially because of tables
Lua has had a function “foreach”, that applies a being so generic and ubiquitous. Each language has its
given function to all pairs in a table. Similarly, with own priorities.

275
276 Chapter 6 Data Types

hashed access to elements. An array can have elements that are created with
simple numeric indices and elements that are created with string hash keys.
In Lua, the table type is the only data structure. A Lua table is an associa-
tive array in which both the keys and the values can be any type. A table can be
used as a traditional array, an associative array, or a record (struct). When used
as a traditional array or an associative array, brackets are used around the keys.
When used as a record, the keys are the field names and references to fields can
use dot notation (record_name.field_name).
The use of Lua’s associative arrays as records is discussed in Section 6.7.
C# and F# support associative arrays through a .NET class.
An associative array is much better than an array if searches of the elements
are required, because the implicit hashing operation used to access elements
is very efficient. Furthermore, associative arrays are ideal when the data to be
stored is paired, as with employee names and their salaries. On the other hand,
if every element of a list must be processed, it is more efficient to use an array.

6.6.2 Implementing Associative Arrays


The implementation of Perl’s associative arrays is optimized for fast lookups,
but it also provides relatively fast reorganization when array growth requires
it. A 32-bit hash value is computed for each entry and is stored with the entry,
although an associative array initially uses only a small part of the hash value.
When an associative array must be expanded beyond its initial size, the hash
function need not be changed; rather, more bits of the hash value are used.
Only half of the entries must be moved when this happens. So, although expan-
sion of an associative array is not free, it is not as costly as might be expected.
The elements in PHP’s arrays are placed in memory through a hash func-
tion. However, all elements are linked together in the order in which they were
created. The links are used to support iterative access to elements through the
current and next functions.

6.7 Record Types


A record is an aggregate of data elements in which the individual elements
are identified by names and accessed through offsets from the beginning of
the structure.
There is frequently a need in programs to model a collection of data in
which the individual elements are not of the same type or size. For example,
information about a college student might include name, student number,
grade point average, and so forth. A data type for such a collection might use
a character string for the name, an integer for the student number, a floating-
point for the grade point average, and so forth. Records are designed for this
kind of need.
It may appear that records and heterogeneous arrays are the same, but that
is not the case. The elements of a heterogeneous array are all references to data

You might also like