Advanced C Programming by Example - Perry, John W
Advanced C Programming by Example - Perry, John W
by Example
eee
epee
2803}
a
re
Tae
oo
K
ce
ee
woe —. O =
==
2 So
I@)P
an International Thomson Publishing company
The trademark ITP is used under license.
®
1. C (Computer program language)
This book is printed on recycled,
I. Title.
acid-free paper.
QA76.73.C15P462 1998
005.13’3-dce21 97-31818
Printed in the United States of America.
CIP
98 99 00 01— 10987654321
Preface xi
Chapter |
Optimal C Coding Style 1
Chapter 2
Review of Standard Pointer and Array Operations i)
Chapter 3
The Linear Dynamic Data Structures:
Stacks, Queues, and Linked Lists #2)
3.1 Why Do We Need Dynamic Data Structures? 45
3.2 The Stack and Queue Data Structures 47
3.3. The Linked List Data Structure 55
3.4 The Doubly Linked List Data Structure 63
3.5 Programmer-Controlled Memory Management 67
3.6 Hashing to Memory with Arrays of Lists 72
3.7. Totally Dynamic Nodes in Dynamic Data Structures 78
3.8 Multi-lists and Hybrid Dynamic Data Structures 79
Chapter 4
Advanced String Handling o1
Chapter 5
Advanced Input and Output 157,
Chapter 6
Bit Manipulation 167
Chapter 7
Recursion and Binary Trees 183
Chapter 8
Multidimensional Arrays and Arrays of
(Non-Char) Pointers 203
Chapter 9
Potpourn: Part One Z2T
Chapter 10
Potpourn: Part Two 243
Appendix
Answers to Selected Exercises 273
Chapter 2 Answers 213
Chapter 3 Answers 275
Chapter 4 Answers 283
Chapter 5 Answers 287
Chapter 6 Answers 290
Chapter 7 Answers 291
Chapter 8 Answers 294
Chapter 9 Answers 296
Chapter 10 Answers 298
Index 30]
Digitized by the Internet Archive
in 2024
https://archive.org/details/advancedcprogram0000perr
Preface
Above all, this is a code-centered book. This means that it relies heavily on offer-
ing lots of code examples and narratives discussing them. Advanced C Program-
ming by Example is a reaction against large books full of verbose theory. For
better or worse, most programmers and students of programming learn by
example, and this book acknowledges and takes full advantage of that fact. I
have tried to keep the narrative part of the text short and to the point so that
people will actually read it!
The narratives that follow code examples dissect and explain the syntax of
these examples and how they express what we wanted algorithmically. If code
contains obscure or difficult syntax, diagrams are included to show how this
code affects data structures or program output. Within text appearing outside
of code examples, all variable names, function names, C reserved words,
names of .h header files, and expressions used in code examples are set in a
different typeface. Function names referred to in the narrative are always
accompanied by parentheses—for instance, printf().
Preface xiii
First, these languages are huge. They represent a doubling (or more) in
the total number of syntactic elements (reserved words and operators). They
have a litany of “must-be-memorized” side effects like the implicit invocation
of copy constructors. They also introduce new variable scopes such as the
“class scope” of C++ and Java, and C++’s idea of classes with shared static
variables.
Class scope, alone, leads to program-reading difficulties similar to those
seen in poorly written C programs that use global variables in functions.
When you see a variable name in a “method” (function), its declaration may
not be found in the function header or local declaration section because of
class scope. You may have to look in the class definition or even further if the
program uses inheritance.
In the author’s opinion, supporters of many “new age/new paradigm” lan-
guages have failed to consider the obvious correlation between the size of a
language and the difficulty of learning and using it. Some even maintain that
these tools are easier to use than their ancestors. If we carry that argument
into the domain of natural human languages, we can see how tenuous these
claims are. Would anybody suggest that a student who is earning a D grade in
Beginning French should jump to Advanced French immediately because of
its larger vocabulary, syntax, and conceptual variety? Yet this is the kind of
propaganda we are asked to swallow, whole and unsubstantiated.
Indeed, many Java books try to portray Java as a kind of “manageable” C++,
clearly implying that C++ is unmanageable. The price you pay for this “leaner
and meaner” C++ is that the OO paradigm is unavoidable—no function can
exist outside of an object! This means that stand-alone utility functions can-
not be written legally in Java. Is OO so demonstrably better than the proce-
dural paradigm that this force-feeding is justified?
There is an impressive lack of experimental evidence that OO languages
give us anything at all. Steve McConnell, in Rapid Development (Microsoft
Press, Redmond, Washington), cites evidence that early OO projects suc-
ceeded only because they were staffed with “stars’—people who could pro-
gram brilliantly in any language. When staffed with average industrial
programmers, OO projects yielded only what could be expected with older
tools, or worse.
Those interested in the “or worse” can read the January 1994 issue of Soft-
ware Technology Practitioner, which includes an article by Robert L. Glass enti-
tled “Object-Orientation Is Least Successful Technology.” Richard Gabriel, in
his book Patterns of Software (Oxford University Press, New York), analyzes the
“habitability” of OO languages—a term he uses to describe a programmer’s
ability to move around comfortably in all parts of a language. He concludes
that the “great object-oriented experiment” will not succeed.
Even in the area of code reusability, an area of renewed emphasis in the
OO sector, no one has yet presented a cogent refutation of the idea of a sim-
ple function as a reusable entity. The simple ANSI C function library remains
the greatest reusability success story to date.
Preface XV
Acknowledgments
Special thanks go the staff at PWS Publishing Company, particularly Andrea
Goldman and Connie Day.
I would also like to thank the following reviewers for their comments and
suggestions:
e Cliff Green, University of Washington
e Clare Hamlet, Pima College
e Mike Holland, Northern Virginia Community College
¢ Bruce Mycroft, Flathead Valley Community College
e Rahul Roy-Chowdhury, Columbia University
Thanks also go to Behrouz Forouzan of De Anza College for the “right-left”
rule for analyzing difficult C declarations.
John Perry
Palo Alto, California
Cr PSOE Neante
evel ibe rt ‘a
=~
Ye AABN
seid fy} aise
ag)
: >e
bie
sasha
iis
OL, urt ee mer iy iilenn
“
ws
\
Bish its Welti eee (tet ae,i ir
~ ai wre
ara Be ;
ih : re ne
Cpl al th | <2 - Au ML a
are Fs bh ios aa DAA
/ =
eu EA hf
+h Pa ae ui af Worn! at
'
‘ ip Vo \
Optimal oi
Coding Style
e Are functions too long (i.e., longer than a page)? Do some do too much?
¢ How can I make it easy to see where comments and code start and stop?
¢ How can I make subtasks within a function separate from other subtasks
in an easy-to-visualize manner?
Comments should be written in clear English and not high-tech jargon. They
should state what a function or code fragment does. They should be easily
separable, at a glance, from surrounding code. Whitespace should be used
liberally between functions and subtasks within functions. Whenever possible,
functions should be short; ideally, they should take less than a page and
should not straddle two pages on the printout.
It is the author’s opinion that commenting styles leading to more com-
ment than code, such as pseudocode, actually detract from program readabil-
ity. When comment volume starts to dwarf code volume, it is time to create a
separate document so that the code reader can see where the code is without
straining. Judicious and plentiful use of whitespace is probably the greatest
enhancement to program readability available to the programmer.
Here’s an example that illustrates most of these suggestions.
[RR KKKKKKKKK KKK KK KKK KKK KK KKK KKK KK KK KKK KKK KKK KKK KK KKK KK |
Hes ES)
hs This program does the following: *i/,
1* 257
/* 1) Swaps two integers. a).
/* 2) Prints the values of the newly swapped x,
[* integers. */
ies 3) Adds the two integers. 23
/* 4) Prints the result of the addition. |
[RRKKKRKKEKKKEKK KEK KKK KKK KKK KEKKEKK KEK KEKKKEKKKEREKEKKEKKEKERE |
d#Hinclude <stdio.h>
main(void)
{
ANC aye ab hen 0:
Void -swapCint 2pil, 1nt..*pj).;
TNE add Giniteaa went. i
Swap(&i, &j);
DriIntrGs) =A Jeera Nn Pee
printt("
The sum of i and j is %d\neaqdGing):
j
[KR KKK KEK KKK KK KK KKK KKK KEK KKK KKK KKK KK KKK KKAKKKKKKEKER |
/* x/
!.1 Commenting, Whitespacing, and Other Style Issues
/*
The function below swaps the values of the Ay,
/*
two integer parameters. x7,
[RRKKKKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK AKA /
vemps="* pil;
di ia Ob
*pj = itemp;
}
[KKK KKK KKK KKK KKK KKK KKK KKK KKK KK AK KKK AK KKK EK KKK KKK /
ifs *x/
/* The function below returns the sum of the */
ifs two integer parameters. af
[ERK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK /
This example is not intended to intimidate the reader into one style of com-
menting and code writing. It is merely my first opportunity to reinforce an
idea that I will repeat like a mantra: Every decision in a program counts!
If your functions are jammed together with no whitespace between them,
if your curly braces do not line up, if you indent two spaces in some code
blocks and seven in others, or if the leftmost characters of all statements in a
code block don’t line up, your code will look terrible. This will hinder any-
one’s ability to maintain it or add functionality to it. Note that even the
(seemingly) insignificant detail of lining up the assignment operators in the
swap function makes it easier on the eye than if this was not done. Nothing is
trivial enough to neglect in your work of programming art!
4 Chapter | Optimal C Coding Style
versus,
ii (a & |p)
{
return a;
}
else
{
return b;
1.2 The Use of Short-Cut Notation: Good or Bad?
versus,
The ternary operator ? is merely like a sideways if statement with the true
expression to the left of the : and the false to the right of the :. It’s certainly
terse, but again, for whom are we writing our code? Other C programmers! In
this case, I greatly favor the use of ? because it avoids the need to write two sep-
arate return statements. Like any other syntactic construct in C, however, ? can
be abused or used in tricky ways that might compromise readability. Read on.
versus,
Dimi
t Coed irs dw oleerd fers DE)
and
if (palindrome(string) )
{
printtG'%s) 1S’ atpalindrome!
\n" > string);
}
else
{
printtc¢’%s is nota palindrome!
\n, string);
I
versus,
The use of ? in the last code fragment prevents the duplication of two printfs
that differ only in that one contains the word "not" in its output whereas the
other doesn’t. However, we are on much shakier ground here than in Example
1.3, where ? had no liabilities. Here the ternary expression is embedded in a
printf, which renders it much less visible than it was in the return statement
of Example 1.3. We can probably stop short of an outright condemnation of
this trick, but its use is extremely questionable. Critical code must be visible!
Before we leave Example 1.4, note that the second code fragment in the
example says
if (palindrome(string) )
and not
if (palindrome(string) != 0)
The first form of the if looks more English-like (“If string is a palindrome,
then do something...”) and is more concise—a double advantage! If a func-
tion returns a value to be interpreted in a Boolean true/false fashion, there is
no need for the relational operation !=, and it actually makes code less read-
able. Similarly, if you want to do some action if string is not a palindrome, it
is actually more English-like to say
if (! palindrome(string) ) /*¥** If not a palindrome, ado
something! ***/
instead of
if (palindrome(string) == 0)
versus,
RECUR aE
SS
!.2 The Use of Short-Cut Notation: Good or Bad? 7
Because any C programmer knows that all relational expressions (i.e., those
using = or != or > or >= or < or <=) produce a value of one if true and zero if
false, there is no good reason to use the coding style in the first code frag-
ment of Example 1.5. It is verbose, does not use C’s expressive power, and is
no more readable than return a > b.
Let’s summarize what is clear and what is open to debate in this section.
What is clear is that Boolean return values from functions and Boolean
expressions are often misused, as shown in Examples 1.4 and 1.5. The ternary
operator ? is underused, but as is clear in the last code fragment of Example
1.4, it is easily abused.
Is there any other operator in C that is not properly appreciated? Yes—the
comma operator! Look at the following examples.
getint(&i);
while (i != 0)
{
/* lots of code here */
getint(&i);
}
versus,
while (getint(&i), i != 0)
{
/* lots of code here */
}
/***k*k*k Function getint shown above would go here. *****/
el
The comma operator is shown in the final while loop conditional in Exam-
ple 1.6. The value of a series of expressions separated by comma operators
is the value of the rightmost expression. Thus, in the second while in
Example 1.6, i != 0 controls whether the while loop continues and not
getint(&i). F
Chapter | Optimal C Coding Style
So why bother using the comma operator? If you inspect the code in Exam-
ple 1.6, you will note that one reason for using the comma operator here is
that get int (8&7) is not repeated as it is in the first whi le loop of Example 1.6.
However, programming is not a contest to see whose program can be typed in
the fewest characters. Therefore, we expect another benefit. That benefit is
readability. Loop headers, if well written, give the reader vital information
about the flow of control in a program. Here, putting getint(&7) in the loop
header really reveals what controls the loop (i.e., whether an integer just
obtained from the user is zero or not).
However, you must remember that the comma operator is a highly abus-
able operator! It is rather tragicomic to see a while loop that has seven
expressions comma-separated in its loop conditional with only one or two
statements in the loop body itself. Restrict comma-delimited expressions in a
loop header so that only those expressions involved in loop control are in the
header. If the number of expressions exceeds two or three, avoid using the
comma operator. Just remember the old adage: “Now that you have been
given a hammer, remember that not everything is a nail!”
}
}
else
t
printf("Average job!\n");
Grace = VC's
}
}
else
{
oediner( POOR jOlOl Nn) 2
grade = 'D';
}
}
else
{
Died htt (cra ingeavOp
!\nn )s
grade = ‘F';
}
Many readers may be rolling on the floor with laughter at this point, but I
have seen such shoddy code countless times! Let’s examine this code and pin-
point what is wrong with it. First, nesting forces you to “remember your past.”
That is, when you are inside the fourth nested if, you must remember the
three prior conditions that had to be “true” to get you to this deeply nested
level.
Second, this method of nesting does not put the false actions (those associ-
ated with the else clauses) immediately after the true actions (those associated
with the if clauses). As a result, by the time you read the e1se clause, you may
have forgotten what the if condition was because it occurred too far back in
the program.
Contrast this disaster with the correct code shown in Example 1.8.
{
printf("Average job!\n");
Grades —siCa.
}
else if (score >= 60)
{
printtC Poor jobl ne);
grade = 'D';
}
else /* Score is below 60! */
if
DrinttG-
Fail lingsgob! Ans):
grade = 'F';
}
Note the following about the code in Example 1.8: (1) There is no nesting, so
the printfs and assignments are all at the same level of indentation and thus
are easy to read. (2) It is much easier to see what condition leads to the execu-
tion of the printfs and assignments, because they follow the condition zmme-
diately. (3) Because there is no nesting, readers do not have to remember a
multitude of Boolean conditions that got them to the code they are currently
reading—readers do not have to “remember their past.”
Unnecessary if nesting also results when programmers simply forget that
ifs can have multiple Boolean conditions. Look at Example 1.9.
phi pallGche =)
The AG) Se)
We (Ce. a tely)
{
/* Do something in here. */
versus,
if (a == b && b == c && c = d)
{
/* Do something in here. */
i;
<n SS SSS SSS SSS SSS SSS ES
Obviously the single if is better than the trebly nested if. Why “obviously”?
For two reasons: The notion that the block of code is executed only if all three
conditions are true is more obvious because of the commonsense meaning of
and, and people read better from left to right than downward at an angle. The
1.3 Flow of Control: How to Write “Flat” Rather Than “Deep” Programs
linenum = 1;
Ce=" Xe:
while (c != EOF)
{
printf("\n%-5d", linenum) ;
linenum++ ;
while ((c = fgetc(filepointer)) != '\n' && c != EOF)
ms (c:J= EOF)
This code has an artificial, contrived look about it, and analysis bears this
impression out. Note the strange initialization of c, the repeated EOF tests
(one of which is necessary to keep the inner whi 1e from becoming an infinite
loop when EOF occurs), and the general feel of near chaos created by this
code.
It is simply amazing that such apparently logical thought could lead to
such convoluted code. Nonetheless, it is bad code indeed. By contrast, the
code in Example 1.11 uses no loop nesting, no repeated EOF test, and no
repeat of the putchar(c) statement, and there is no need for a contrived ini-
tialization of c. It is clearly superior to the code in Example 1.10, and yet it
does not just “drop out” of the language of the problem statement.
12 Chapter |_ Optimal C Coding Style
linenum = 1;
Ole wut Ca i
while (Ce = "gevchan@)! != BOR)
{
hiGisap hate) ewcrt a2:
putchar(c);
if nea")
{
linenum++ ;
printf("%-5d", linenum) ;
}
}
The code in Example 1.11 can be designed easily if we remember at the out-
set that in C, unlike Pascal, newlines are just characters and not Boolean con-
ditions (end-of-line is a Boolean function in Pascal that returns "true" or
"false"). “Bottom-up” counts from beginning to end, and here it results in
the waste-free code of Example 1.11 instead of the hodpepodge that is Exam-
ple 1.10. Some algorithmic purists will object to this assertion, but nothing
dispels such controversy better than an example that applies this philosophy.
while(1)
{
/* Some code is here. */
if (condition) break;
/* Some code is here, too. */
When programmers read code, they usually look first at generalities, not spe-
cifics, to get the “big picture” of what the code does. Unless you understand
the basic functioning of a program, the specifics are often meaningless. Yet
code like that in Example 1.12 forces the reader to wade into the small details
immediately, because loop control information is embedded deeply within
the loop and surrounded by lots of code the reader would rather not think
about at the outset.
However, the continue statement is another matter. Because it forces con-
trol to return to the loop condition, it actually forces the programmer to cre-
ate proper loop control conditionals. Continue can also enhance readability
if it enables us to dispense with odd conditions at the top of the loop body
before we engage in processing “normal” situations. The following “pseudo-
C” code illustrates a good use of continue.
while(strcmp(user_input, “quit") != 0)
{
if (user_input is all whitespace or just a newline)
{
continue;
}
if (user_input has non-letter characters)
{
printf("Invalid characters!\n");
continue;
}
/* See if user-supplied word is in some table here. */
14 Chapter | Optimal C Coding Style
Computer programming is much like chess in that there are general prin-
ciples that are usually true. However, we get into difficulties if we fail to be
mindful of the exceptions. I have seen many chess games lost because a per-
son unthinkingly played a move that conformed to general principles but was
invalid in the circumstances at that time. Programming, like chess, requires
vigilance at every step. That is why it is such a challenging, frustrating, and yet
peculiarly rewarding activity.
ter and Array
_ Operations
~
Why does this work? Characters are guaranteed to occupy one byte of mem-
ory in the ANSI C standard. Therefore, the address held in ptr will be
strlen(string) higher than the address held in string after the while loop
terminates. Note that the while conditional could have been written as
while (*ptr)
Though terse, this is certainly readable to a seasoned programmer, because
the value of the '\0' character is zero—the value that stops any loop! You
may choose the less terse condition used in Example 2.1, but do not forget
that you will read far more code than you write in your career as a program-
mer. Let’s look at another well-known string.h library function, stremp().
Remember the rule for how strcmp() works? It returns a positive number if
the first string argument is greater than the second string argument, a zero if
the two strings are the same, and a negative number if the first string argu-
ment is less than the second string argument.
The sense of greater than and less than can be understood only sel refer-
ence to the ASCII character set. In this character set, all the digit characters
precede all the uppercase letters, which precede all the lowercase letters. This
means that a lowercase letter is “greater than” that an uppercase letter and
that an uppercase letter is “greater than” a digit character.
This algorithm says, then, in English, “Keep looping while the characters
being pointed at by s1 and s2 are the same.” The if takes care of the case
where both *sl1 and *s2 are both '\0' and, therefore, we've reached the
end of both identical strings. All other cases are taken care of by the while
header. At this point, the ASCII value of *s2 is subtracted from the ASCII
value of *s1. Obviously, this number must be positive or negative. (Remem-
ber that the zero return is taken care of by the if inside the loop!)
2.1 One-Dimensional Array Manipulation, Pointer Arithmetic, and Indexing 17
while (*pi != 0)
iL
Sum = pitt;
}
return sum;
}
This little piece of code illustrates a lot. Let’s pick it apart. First, note that we
initialized our accumulator variable sum to zero in the declaration—a com-
mon practice and a good one. Note the use of += instead of =. This can actu-
ally make the code run faster on some platforms. Note that after the value of
18 Chapter 2 Review of Standard Pointer and Array Operations
*pi (i.e., the value in the address pointed at by pi) is added to sum, pi’s value
is incremented.
It is critical that you understand this meaning of *pi++ because the opera-
tion of lending a value to an expression and then incrementing the pointer is
universal in C programs. It is pi that is incremented, not *pi. We must type
(*pi)++ to increment the value pointed to by pi. That’s because the * has a
lower precedence than the ++ and therefore does not bind to pi as closely as ++.
Now for the confusing part. Everyone knows that ++ in C means “incre-
ment by one.” But hold on! If our program is running on a machine with
four-byte integers, then how can I claim that the ++ in the code above moves
to the next integer; that is, it is adding four to pi, not one? Here is where the
intelligent design of the C language saves us. When you use ++ to increment a
pointer, doing so actually increments its value by the size of the thing being pointed
at. In this case, we are pointing at an int, which will be two, four, or eight
bytes, depending on the host computer’s hardware. It is no exaggeration to
say that C would be a useless language if it did not have this feature.
Many students who are fearful of pointers are afraid to write code like that
in Example 2.3. They would write it something like this:
int sum = 0, j = 0;
sum += i[j+t+];
; ee Sum;
Although this code works, it is far inferior to Example 2.3. Why? To answer
this, we must discuss assembly language for a moment. All assemblers have an
“increment by one” instruction just as C does. When we use pointer arith-
metic in Example 2.3, the ++ operator is all we use to move the pointer. How-
ever, in Example 2.4 we use indexing, and the C compiler turns i[ j++] into
*(j + Jj) followed by j++. We add j to i—an additional operation not done
in Example 2.3 and, worse yet, one that takes at least three statements in
assembly language. And we must still increment j!
Thus, in this small code space, we have introduced a variable, j, that is not
needed and that greatly reduces the efficiency of the function. “But such a
small function!” you may object. Yes, but remember that such small inefficien-
cies, if created routinely in a 10,000-line program, add up to a great deal of
inefficiency.
2.1 One-Dimensional Array Manipulation, Pointer Arithmetic, and Indexing 19
while (*(i + j) != 0)
{
Stil = SC) se g)s
ag
}
return sum;
}
This disastrous code contains all the inefficiency of Example 2.4 (you’re still
doing that addition of j to 1) and none of the readability of Example 2.3. In
fact, I often jokingly refer to *(i + Jj) as “cheater indexing.”
Why “cheater indexing”? Because you’re still using an index variable, j,
but are fooling yourself into thinking that you understand pointer arithmetic
just because you wrote an expression, *(i + j), that doesn’t use index brack-
ets ([ ]). Example 2.3 is the only code you should even consider as a correct
way of solving the array element summing problem.
Remember that the code in Example 2.3 uses zero as a sentinel value (a
value that warns us to stop processing data). When you traverse arrays, you
must know when to stop. Otherwise, you will get a fatal runtime memory
error (whether you use indexing or pointer arithmetic). Often, the sentinel
method is no good, because in some applications, all array element values
(including zero!) are valid data.
To write sumints() in the most general fashion possible, we would proba-
bly pass a second parameter telling us how many array elements to process.
The optimal result is shown in Example 2.6.
20 Chapter 2 Review of Standard Pointer and Array Operations
Finally, why not use the sizeof() function to compute the number of
bytes in the array? Because an array, when passed to a function, is always
treated as a pointer regardless of the notation in the function header. Therefore, when
used in a function, $izeof(pi) will return the size of an int pointer, not the
size of an int array.
Cenpe=mar:
I= Fj;
jj = Temes
}
We know that the code in Example 2.7 does not work, because call-by-value
means that copies of iand j are sent to the swap() function. These copies do
not live in the same address in memory as the original i and j from the call-
ing environment. Therefore, no matter what we do in the function, the values
of the parameters of swap() will not change when we return to the calling
environment.
2.2 Using Pointers as Function Parameters 21
Vemnp>=FApi ;
lee Das
“Dj vemp:
}
Let’s assume that we are swapping the values of variables i and j in the call-
ing environment. From beginning C, we know that the proper call of swap
looks like this:
Swap(&i, &j);
This same call works no matter which of the three single-parameter versions
of sumints() we use. Why? Because the name of a one-dimensional array is a
pointer (address) to its first element. We are sending the address of the first ele-
ment, so we are manipulating what is in that address directly (whether with
pointer arithmetic or with indexing), not a copy that exists in another address.
Many beginning and intermediate-level C programmers forget this fact
and pass &a to the function. This is a serious mistake for three reasons:
Surprisingly, I have seen compilers that would let such a type mismatch pass at
compile time and runtime. Of course, making such a mistake invites bizarre
runtime behavior. Later in this chapter, we will see a situation where we want
22 Chapter 2 Review of Standard Pointer and Array Operations
In the calling environment, the two struct parameters would still have their
original contents after the return from swapstruct(). The correct code to
swap two structures is shown in Example 2.10.
CA = sosIls
Plarser lh DST &
<OSiae—c CID
Note that the correct swapping code does not look any different from the
code that swapped ints. That’s because structs can be assigned to other
Structs just like ints. As we have already seen, arrays cannot be directly
assigned. We will talk about array copying in Section 2.3.
Obviously, we cannot write a function similar to sumints() for an array of
structs, because an element of an array of int is simply one int, whereas an
element of an array of structs is a struct. As you already know, one struct
2.2 Using Pointers as Function Parameters 23
There are several important items to note in Example 2.11. When we say sf p++,
the pointer sfp is moved to the next array element. Here we see a clear applica-
tion of a principle stated earlier: When you increment a pointer via ++, the
pointer is actually incremented by the number of bytes in the item being point-
ing at. In this case, s fp is incremented by the number of bytes in a struct foo.
Also, you see a notation that you may not have seen before, sfp- >i. What does
it mean?
Suppose I have a set of declarations like these:
Silver FOC
{
dime Ws
Ghar stiri
ng Elon.
hi
Recall, from beginning C, that I can access the component i of sf00 with the
notation
S7OOs 1
24 Chapter 2 Review of Standard Pointer and Array Operations
What about fooptr? The first thing we have to worry about is that fooptr
has not been allocated any space to point at something. Here’s how we would
take care of this task:
fooptr = (struct foo *) malkloc(sizeof(struct foo));
Now the fooptr is pointing at a newly allocated struct foo that has not
yet been filled up. Malloc(), a function we will use throughout this book,
returns a void *. It turns out that void pointers are compatible with any
pointer type. You should still cast mal1oc’s return value so the reader knows
what type of pointer fooptr is. Later, we will see what to do ifmalloc() can-
not deliver memory to us successfully.
How can we assign 5 to the i component of the struct pointed at by
fooptr? There are two ways:
(*foopiraa1 = 53
TOOOERS1 = BE
Does this copy the string pointed at by s1 to s2? No! It merely puts the
address of the “H” in “Hello” in s2. Here $1 and s2 are pointing at the same
string in the same place in memory. As we all know, strcpy makes a true copy, so
Surcow(szZ;, Sil)-s
is what we really want. Now s1 and s2 are pointing at their own copy of
“Hello”, each of which resides in a different place in memory. However, we
have a dilemma if we try to copy arrays of types other than char. What do we
do? We use the string.h library function memcpy (). Memcpy’s syntax is
memcpy(destination_address, origin_address,
num_bytes_to_copy);
Doesn’t this memcpy () violate the rule that you can’t assign to an array because
it is a pointer constant? No! When you try to assign an array via a2 = al, you
26 Chapter 2 Review of Standard Pointer and Array Operations
You don’t need to use for loops for all array initializations! How about ini-
tializing an array of structs? Here’s an example:
struct person
{
int age;
char name[40];
float salary;
Ne
int main()
{
struct person people[3].=—"£ 145,.-*Johnabee. 40123734},
{39, “Jack Benny”, 109987.88},
{100, “George Burns”, 1000000.00}};
/*** Remainder of main’s code goes here. ***/
We know that memcpy() works a lot like strcpy(), but it works for any
array data type. Is there a mem analog to the strcmp() function? Yes, the
memcmp() function. It too is in string.h. Here’s its syntax:
Memcmp() works just like strcmp() except that it compares bytes from any
two array types. Thus two int arrays can be compared, or (ridiculous as it may
sound) one int array and one char array. The array types of the first two
arguments do not have to be the same. Generally, though, it would be an odd
application if they were not. Here’s an example of the use of memcmp( ):
WARS CURSE aia ie Rone 7. Wee loll 8 Fill erth aps welA Seis
printf("Result of comparison is %4d\n", memcmp(a, b, 5 *
sizeof(int)));
What will the printf() print? Remember that it works just like strcmp ()
in that it returns a negative value if the first array is less than the second, a
zero if the first array is the same as the second, and a positive number if the
first array is greater than the second.
However, with non-char arrays, greater than, less than, and equal refer to the
integer value of bytes rather than to the ASCII value of characters. Clearly, the
printf() should print a negative value, because after the first three values
are equal (1, 3, and 5), the fourth value of b is greater than the fourth value
of a (8 versus 7). Note that the two arrays may be partially or wholly com-
pared, because the third argument of memcmp() enables us to specify the byte
length of the comparison.
This leads us to the introduction of the "n" versions of the string.h func-
tions—the ones named strncmp(), strncpy(), and strncat(). These, like
memcmp() and memcpy (), take a third parameter that enables us to compare,
copy, and concatenate only the first "n" characters, where "n" is some int
you give as the third argument. Let’s demonstrate strncmp().
What should the printf () print? It should print 0 (zero) because the first
three characters of the two strings are identical.
28 Chapter 2 Review of Standard Pointer and Array Operations
Have you thought about the problem? Are you confused about where to
start? Let’s examine the problem algorithmically first. First, how many func-
tions should there be besides main( )? Your answer should be “at least two” —
one to format the string and one to turn the string backwards. Here, in
English, is the algorithm to format the string:
1. Initialize two pointers, fast and $1 ow, so that they start out by pointing
to the beginning of the string.
2. Let the fast pointer run ahead until it finds a non-whitespace character.
3. When *fast is a non-whitespace character, copy it to *$ 1 ow.
4. Continue looping until *fast is the '\0" character.
5. After loop termination, copy the '\0" character to *s$1 ow.
1. Initialize two pointers, start and end, so that they point at the first and
last characters of the string (e.g., the "f" in "foobar" and the "r" in
LT OODaliae)
2. Swap the values of *start and *end.
Se Move the start and end pointers toward one another by one character.
4. Continue steps | through 3 until the two pointers are equal or cross
over.
#Hinclude <stdio.h>
#Hinclude <string.h>
#Hinclude <ctype.h>
[BKK KKK KKK KK KKK KKK KK KKK KKK KKK KKK KKK KKK KK KKK KK KKK KKK KKK /
He oy
/* Get strings from user, clean out whitespace a
/* and convert letters to lowercase, reverse string */
/* and then output to screen. xy
[RR KKRKKKKKK KKK KK KKK KK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK /
main()
{
char word[512];
void getword(char *), format(char *), reverse(char *);
[KKK KKK KKK KK IK KK KK KKK KKK KKK KKK KKK KKK IKK KAKA KKK KKK KKK |
*/
Hes
[prs Remove whitespace from string. Convert uppercase */
js letters to lowercase. * /
[ ERRRERKAKKRK KKK KKK IKE KK KEE ERK EKER ERK KKK ERK KEKE |
Wher *fasite»)
{
if (lisspace(*fast)) *slow++ = tolower(*fast) ;
fuaSiuaatas
}
*slow = °\0’;
[BRK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK |
/* “af
If Reverse order of characters in string. xf
[RRKKKKKKKKKK KKK KKK KK KKK KKK KKK KK KKK KK KKK KKK KKK KK KKK KKK KKK /
$ asgle.exe
start end
start end
start end
At this point, the while conditional (end > start) fails because the two
pointers are now equal. If you don’t understand format(), try drawing
graphs like these for every loop iteration. When it comes to understanding
pointers, a picture really is worth a thousand words.
Let’s look at some of the details of the code in this program. Note the trick
in reverse() that got the end pointer to the end of "hel1o". That trick was
ends=estantetestrmlenCstant)ior ls
Many people would use a loop such as
fommendr= "start; sende!=s\ 08; send) {;
end--; /*** We want to point at the "o" in “hello”, not
Che am\ 0) Pes
You may have overheard programmers using the expression “reinventing
the wheel” when describing a programmer who writes unnecessary code
instead of using the power already in the language. Not to use strlen() to
get the end pointer to its starting location is indeed reinventing the wheel in
this situation.
32 Chapter 2 Review of Standard Pointer and Array Operations
Note that gets() was used to capture the user’s input string. Here is
another situation where a beginner reinvents the wheel or makes a serious
mistake. Some would use this to capture the input string:
scanf("%s", word);
WRONG! This programmer has forgotten that scanf() will stop reading
into word as soon as it encounters whitespace. Others would use getchar()
in a loop to capture characters, one at a time, and stuff them into word until
"\n' is encountered.
The beauty of gets() is that it captures all characters up to, and including, the
newline. After it finds the newline, it overwrites it with ‘\0’ in the captured
string. The only danger with gets() is that your line may have more charac-
ters than allocated for word. That’s why it’s a good idea to give word a hefty
allocation such as the 512 characters allocated for it in Example 2.13.
Format() removes whitespace and converts uppercase characters to lower-
case as the algorithm states. Note that the tolower() function is called to
convert the non-whitespace character to lowercase even if it is not uppercase.
The ANSI C definition says that tolower() returns the character unchanged
if it is not uppercase.
Also note that the fast pointer is not incremented inside the call to
tolower(). This is because the function call parentheses have a higher prior-
ity than ++, and, in fact, this author has experienced difficulty when using the
postfix form of ++ inside a function call. Don’t do it!
Finally, note that the while condition in format() says only *fast, not
*fast != '\0'. Why? Because if *fast is '\0" then its value is zero—the
value that terminates a while loop. However, this is a gray area. If you find
the wordier while condition more readable, use it. It is not worth quibbling
over. However, you may have to read code written by others who use the terse
form. Therefore, it is good to be conversant with both forms.
It is worth taking a moment to look at the testing session that accompanies
the code in Example 2.13. You should always test all possible permutations of
input, especially extreme input such as an empty string or one that contains
whitespace only. Note that these conditions and all relevant “normal” input
are, indeed, tested in the sample program session shown in Example 2.13.
While we are on the subject of testing and functionality, do format() and
reverse() behave properly in the extreme cases of empty strings and strings
that are all whitespace? They do. However, some of you may think that
end = start + strlen(start) - 1;
causes trouble on an empty string, because end will point one character to the
left of start—a location clearly outside the string’s boundaries. Remember
this rule: A pointer with an out-of-bounds address causes a problem only when it is
dereferenced. Because the end > start test will fail immediately, the function
will never refer to *end when the string is empty.
2.5 Index and Pointer Trickery in C 33
Now let me ask you another question. What is the value of b - a? If your
answer is “It depends on the size of an int,” then you are forgetting how
address arithmetic works in C. Remember that when we add an int to, or sub-
tract it from, a pointer, it adds or subtracts that int tomes the size of the thing
pointed to by the pointer. Thus, in the previous paragraph, the value of b - 1 is
reallyb - (1 * sizeof(int)) because b isan int pointer. Therefore, b - a
isreallyb - (a * sizeof(int)). But there’s an easier way to do this subtrac-
tion: Ask yourself how many ints separate b and a. The answer: 3.
A final trick I want to demonstrate is the use of the sizeof() function
with arrays and pointers. If a variable is declared as a true array (e.g., int
a[5]), then sizeof() will return the size of each element times the num-
ber of elements. Thus, if your computer had four-byte ints, then
sizeof(a) would return 20 (5 * 4). However, if a variable is declared as a
34 Chapter 2 Review of Standard Pointer and Array Operations
pointer, then sizeof() will return your machine’s byte size for a pointer—
generally 4.
It does not matter how much memory is mal 1oc’ed to a true pointer; the
sizeof() it will be just the pointer size for that machine. Finally, if you give
sizeof() an element of an array as a parameter (e.g., sizeof(a[5])), it will
simply return the size of a[5]’s data type—in this case, sizeof (int).
Later in the book, when we look at multidimensional arrays, you will see
how this knowledge can help you to understand tricks involving multiple
indices and/or multiple levels of indirection. First, however, you must take
the biggest leap of your programming career: understanding pointers to
pointers.
Most people have little trouble understanding the use of pointers in functions
because of call-by-value. For some reason, however, most students have great
difficulty with the concept of a pointer to a pointer. My theory is that this
problem arises because the concept is actually easy while it is expected to be
difficult!
Remember that when you are passing a parameter to a function and you
want this parameter to come back to the calling environment with a changed
value, you must pass a pointer to it. What if the value being passed to the func-
tion is a pointer and you wish it to come back with a changed value? You
guessed it—you have to give the address of the pointer, which is just another
way of saying a pointer to the pointer.
Here’s a simple piece of code illustrating the idea of apointer to a pointer.
dHinclude <stdio.h>
dHinclude <stdlib.h>
int main()
if
Chigir “Sil, se
BON) = wile
Gil = WEP s
*sz = temp;
}
Study this code carefully. It contains many important chunks of basic knowl-
edge. What does it do? As the function name would suggest, it swaps two
strings. It does so without copying a single character. How it does so will be
explained after we’ve cleared up a few points.
First, why are $1 and s2 declared as character pointers and not as charac-
ter arrays? Remember that true arrays are pointer constants. However, in
order to swap these two strings, we do so by changing the address contained
in $1 so that it contains the address of the "G" in "Goodbye" and by changing
the address contained in s2 so that it contains the address of the "H" in
"Hello". Because we are going to change the contents of s1 and s2, they
must be pointer variables, not pointer constants.
Next, after we’ve properly declared s1 and s2 as pointer variables, we
must allocate some space for them via malloc(). Malloc() can fail at run-
time if the computer is out of available memory. When malloc() fails, it
returns NULL (zero) as the return value. If the malloc() of space for $1 or
s2 fails,we deliver an error message to the terminal and then call the
exit() function.
Exit() and malloc() both reside in the std1ib.h header file. Typically,
when a program ends prematurely because of an error condition, we return
any integer value except zero. Zero is typically returned to the calling envi-
ronment only upon a normal (error-free) termination of the program.
Once we’ve allocated memory and guarded against a memory failure, we
call the strswap() function. Because we want the values of s1 and s2 to be
changed in the function (so that they point at the other pointer’s string), we
must pass the address of s1 and the address of s2—just as we must pass the
address of any variable if it is to be changed in a function.
That’s why the data types of the two parameters in the function’s environ-
ment are char ** (i.e., pointer-to-pointer-to-char). The following diagram
shows the state of the pointers in strswap() before it does anything and then
after each statement.
36 Chapter 2 Review of Standard Pointer and Array Operations
WEI: = ceils
temp
2h i LAG/AG
Sy2 TS AEGIS yr ae
Just for fun, I want to pose another question here. When you are in strswap
and have not yet swapped addresses, what are the values of **s1 and **s2?
You can answer this question correctly by getting the correct data type.
What is the data type of s1 in strswap()? It’s a char **. Then what is the
type of *s1? It’sa char *—you can see in the diagram that it is indeed point-
ing at the first character of a string. Therefore, the data type of **s1 (same
for **s2) is char. This reasoning leads you to the correct answer: **S$1 is the
"H" in “Hello”, and **s2 is the “G” in "Goodbye". It is imperative that you
not leave this chapter without understanding this last paragraph!
<stdio.h> Functions
Function: ingsputchaneGintrc):
Fact: 1) Same as putchar() but writes the character (c) toa file.
38 Chapter 2 Review of Standard Pointer and Array Operations
Fact: 1) Writes string to the screen and then writes a '\n" as well.
<string.h> Functions
<stdlib.h> Functions
Exercises
2. Write a function that will determine whether its char * argument is a pal-
indrome. A palindrome is a string that looks the same way forwards and
backwards (such as "]evel"). Your function should work even if the
string passed has nothing but the '\0" character. It should not use any
strings other than the one passed as a parameter!
d. Sorts the scores if you know how to do that. If not, just enter the scores
in ascending order.
e. Calculates the median score. If there are an odd number of scores, the
median is the middle score. If there are an even number of scores,
the median is an average of the two scores closest to the middle. For
example, the third highest score is the median of five scores. The aver-
age of the third and fourth highest scores is the median of six scores.
f. Outputs the result of part e to the terminal.
Given these declarations and that $1 is 1000 and s2 is 5000, find the
value of the expressions below. Assume four-byte pointers.
m 6. You are given the following declarations and statements. (See the individ-
ual parts for * markings.)
/*** Yes,
pa and pb can be declared as pointer variables!! ***/
/*** They
are allocated six ints apiece. “Seestee]
Inte*par=i{15253),4,5,6) + *pbi=17,659,10,01,121. *pe, spd:
10. . Write a function that accepts an int * parameter and the number of ele-
ments in the corresponding array. The function will transpose each pair
of ints in the array—that is, the 0 and | element, the 2 and 3 element,
the 4 and 5 element, and so on. Use indices when and if you feel they’re
appropriate.
_ The Linear Dynamic
Data Structures:
- Stacks, Queues, and
Linked Lists
45
46 Chapter 3. The Linear Dynamic Data Structures: Stacks, Queues, and Linked Lists
For example, suppose your program maintains student records for colleges
and universities. It keeps the array sorted by the student_id field of the follow-
ing struct type:
struct student
{
long student_id;
char student_address[200];
char student_name[50];
[****x* Lots more fields here! *****/
‘is
Let’s suppose that each struct student contains 500 bytes of information
(not unreasonable). Further, let’s make the reasonable assumption that the
size of an average college is 15,000 students but the enrollment at the largest
possible college is 60,000 students. Thus on an average campus site where our
program is run, we will waste 45,000 times 500 bytes of memory—22.5 mega-
bytes of waste!
We still have not touched on the most serious deficiency of using an array
of structs: sorting them. Almost no real-world databases are static; they grow
and shrink. Suppose we have all records of the student database loaded into
an array of struct sorted in memory. What do we do about new insertions? If
we just put them at the end of the array of existing records, then we lose our
sorted order. If we want to keep the sorted order at all times, we have to cre-
ate a “hole” in the middle of the array and then move the “greater than” ele-
ments to the right—a very expensive maneuver!
Deletion presents another problem. Whenever you delete a student record
from the array of struct students, you have a hole in the array. After each
batch of deletes, you have to run another function that removes holes by mov-
ing all records in higher indices of the array down to cover the holes. Clearly,
this is also extremely inefficient.
As the reader can plainly see, fixed-allocation arrays of structs are
inflexible, high-maintenance data structures. They are really not worth the
trouble. Whenever we have data collections that can grow or shrink, we
would like to receive or remove memory whenever we need it or are through
with it.
The answer to our prayers is a dynamic data structure. A dynamic data struc-
ture in C is represented as a series of structs all with pointers in them to simi-
lar structs. Here is a graphical depiction of an ordered linked list data structure
holding student records. Only the “key” of each structure (student_id) is
shown, along with the pointer to the next structure.
Note that there is a slight change to each struct student in this data
structure. Each contains a pointer to another struct student. Thus each
struct student would have a field in it that looks something like this:
struct student *next; /*x*k Tt doesn't have to be called
wnext" | *x** 7
Note that although it looks just like a linked list, the nodes are not
ordered. That’s because they are always inserted in the same place, the top of
the stack. The “top” of any stack is defined as the node pointed at by the stack
pointer. Like many data structures books, we will use the electrical ground
48 Chapter 3. The Linear Dynamic Data Structures: Stacks, Queues, and Linked Lists
symbol (seen to the right of the last node) to indicate a NULL pointer. The
pointer coming out of the last node should always be NULL.
The only difference between a stack and a queue is that a queue (pro-
nounced “cue”) receives new nodes at its rear. To do this easily, we must have
a second pointer to refer to the last node. Thus a queue looks like this:
queue
rear
Let’s look at some C code that demonstrates insertion into and deletion
from stacks and queues.
dHinclude <stdio.h>
#include <stdlib.h> /*** For malloc and exit. ***/
new->next = stack;
stack = new;
return stack;
Now, let’s look at some calls to pushs(). I could have prototyped pushs() as
However, that would not be consistent with the philosophy of keeping code as
simple as possible. It is harder to read code with double indirection (i.e., the
two *’s) than code with single indirection. Therefore, I decided to use the
return value of the function to pass back the new value of the stack pointer.
We will examine this in detail later, but let’s look inside of pushs() first
and analyze the code graphically while adding a node with key 3 to the stack.
ad es
new->next = stack; [EX Stacks tants -OUinehcS mldih Gnas aNUieleeg cs47/
stack = new;
new
stack
3 |
Then the function returns the stack pointer. Note the notation new->i = 7.
This could have been written (*new).i = i. However, the -> notation was
added to C to make it quite clear that new is a pointer. Thus new->next
means “the component next in the node pointed at by new.”
We saw this notation in Chapter 2, but it is so important that it is worth a
quick review. Don’t continue beyond this point without understanding this
notation, because it will be used throughout the rest of the book. Now, say we
add the node with 4 as its key value.
50 Chapter 3 The Linear Dynamic Data Structures: Stacks, Queues, and Linked Lists
~ stack
new
stack = new;
|
new
stack
Nothing new here, so let’s look at the pops() function, which removes a
node from the stack. Let’s assume that we have also pushed a 5 node onto the
stack before entering pops().
We have seen that a stack is a LIFO (Last-In, First-Out) data structure. Thus
the nodes that were inserted in 3-4-5 order will come off in 5-4-3 order. Look
3.2 The Stack and Queue Data Structures 51
at this code. The if statement returns NULL if nothing is on the stack. NULL is
generally regarded as a “failure” value in functions returning a pointer—we
have failed to pop from the stack because nothing is there.
Let’s remind ourselves of why stack is a NODE **. We cannot use the
return value of pops() to return the stack pointer, because we are returning
a pointer to the “popped-off” node. This is prudent: The person using our
pops() function may want to do something with that node. Thus we must
pass a pointer to the stack pointer so that, after a pop, the stack pointer
comes back to the calling environment with a changed value.
We can also see why *stack = *stack->next is wrong. This says, “Look for
a component next in a node pointed at by stack and then dereference that.”
But stack isn’t pointing at the node; *stack is! To reinforce this concept,
here’s the state of the stack inside pops () before we have done anything.
TIPS = wSieclel<s
*stack = (*stack)->next;
stack ——*stack
PURSE
Now the last statement, first->next = NULL, detaches the node pointed
at by first before returning first to the calling environment. Here’s a pic-
ture of the state of our stack now:
first->next = NULL;
stack ——>*stack ||
Now let’s look at the code that pushes a node onto a queue. Note that this
code maintains two pointers—a pointer to the front of the queue (because all
52 Chapter 3 The Linear Dynamic Data Structures: Stacks, Queues, and Linked Lists
popped nodes come off the front of the queue) and a pointer to the rear of
the queue (because all new nodes are pushed onto the end of the queue).
Let’s follow this code graphically. First, if we have inserted a couple of nodes
with keys 3 and 4, then the state of our queue inside of pushq() before any-
thing has been done is
——> *q
rear —— » “*rear
Now let’s see how the code works to add a node with 2 onto the queue.
Canad
Now that we have seen that a new node is linked into a queue at its rear,
let’s examine the code to pop a node off a queue. Remember, queue nodes
are always taken off the front of a queue, just as the first person in line at a
movie theater is the first to get in. It’s a FIFO (First-In, First-Out) data struc-
ture.
Mes =] Wels
Tie Aq = INU ere turn NULLS /*xe™ Nothing to popl! ***/
SG) = (s@)=2mMeace
di (s@, == INUILIL)) ip@eie = INUILILS =fesees [Lasie M@Gle Joop), ot/
Pirsie=Snede = IWILiLs
return first; /*** Return pointer to popped off node. ***/
main()
{
NODES. *=stack; = NULL, 7*q,= NUEL SN *rear = NULL, top;
NODE *pushs(NODE *stack, int 1), *pops(NODE **stack),
*popq(NODE **q, NODE **rear)-;
void pushq(NODE **q, NODE **rear, int 1);
top = pops(&stack) ;
printf("Top node of stack is %d\n", top ? top->i : EMPTY ).;
putchar('\n');
[xe*xe End. test of Stacks fUNCtLONS at tea),
Note that all functions to be called in main() are prototyped. Remember the
style guideline in Chapter 1 that all functions should be prototyped in their
caller(s). If you are programming in a team situation, some functions, such as
utility functions needed by all team members, may have their prototypes
grouped in a .h file. However, when we are talking about your code calling
your code, it is much easier on the reader if you make his or her eyes move
only a few inches to the top of the page rather than several pages back to
some global area outside of all functions.
Stacks and queues have many obvious areas of applicability. Queues are
used everywhere in operating systems: job queues, mail queues, print queues,
and so on. Print jobs are queued because we want to provide print services in
the order in which they are requested: first come, first served! Indeed, in any
application demanding a “first come, first served” algorithm with a changing
amount of data (e.g., print queues shrink and grow), a queue is indicated.
3.3 The Linked List Data Structure 55
Stacks have somewhat fewer areas of applicability, because the need for a
“last come, first served” algorithm is less common (though by no means rare).
One very common use of stacks is in language compilers such as the C com-
piler you are using to do your homework.
Typically, if a function A calls a function B that calls a function C, the
parameters of those functions are put on a stack, and so are the entry point
addresses of the functions. Why? Because in the A-B-C calling sequence, we
complete the entirety of C’s code before we finish the entirety of B’s code
before we finish the entirety of A’s code. In other words, the last one called
finishes before the second-to-last finishes, and so on. Last called, first fin-
ished!
Parameters of functions are also put on a stack by compilers. The order of
processing of parameters is irrelevant (as long as we can refer to them in the
function), so we should use the simplest dynamic structure possible: a stack.
However, we do need a dynamic data structure, because the number of param-
eters in a function varies from one function to the next.
At the beginning of this chapter, we saw why dynamic data structures are nec-
essary when a program works with a data collection that grows or shrinks dur-
ing program execution. Stacks and queues are useful when we have a variable
amount of data that must be added or removed from the same place in the
structure every time.
Sometimes, however, we may want to keep the nodes in key order, where a
key is one component of a struct, such as an integer or a string. The list of
college students discussed at the beginning of the chapter is a linked list in
order of student id number.
We will now look at basic code for initializing, inserting into, deleting
from, searching, and traversing linked lists. Read the code and the output
from it carefully. We are using a very simple node type again, because we are
interested mainly in the pointer operations.
d#Hinclude <stdio.h>
#Hinclude <string.h>
dHinclude <stdlib.h> /* Has malloc and free. */
Given these declarations, let’s follow some of the code. First, we will initialize
our linked list with a dummy header node and a dummy trailer node. Here is
the code that accomplishes this:
NODE *init_list(void)
{
NODE *list;
After this list initialization, here is what our list looks like:
= HEH!
Why were two dummy nodes created in main()? Think about the difficulty
of inserting into an ordered linked list without dummy nodes. You have three
cases to consider—insertion at the front of the list, insertion at the end of the
list, and insertion between two existing nodes.
3.3 The Linked List Data Structure 57
Furthermore, if it is possible to insert at the front of the list, you must com-
plicate matters more by passing a pointer to the list pointer in the insert ()
function. This leads to the greater complexity of double indirection (because
list would be a NODE **), as with the queue insertion routines.
By wasting two structs to construct a dummy header node and a dummy
trailer node before any real data are inserted, you are narrowing three insertion
cases down to one. You must, however, be very careful to make the key value in
the dummy header (the one pointed at by the list pointer) less than the key
value of any actual data record. Similarly, you must be sure that the key value in
the dummy trailer record is greater than the key value of any actual data record.
In addition to the obvious benefit of reducing three insertion cases to one,
you need not pass the address of the list pointer to the insert() function,
because the list pointer will always point at the dummy header node—no data
record will ever have a key value less than the dummy header’s key value.
We chose to make the key in the dummy header contain the empty string,
one that has only the '\0' character. The dummy trailer record’s key starts
with the last character in the ASCII character sequence, the one with octal
value 177. This is very safe if the key of actual data records consists of letters
or numbers such as a name, address, or Social Security number.
In programming, you must often make trade-offs such as memory usage
versus efficiency or code complexity versus efficiency. In this case the decision
was easy. At a cost of a mere two structs of wasted storage, we have vastly sim-
plified the insertion process and made it more efficient. Of course, evaluating
such trade-offs is not always this easy.
Before we leave the list initialization code, let’s look at the notation
list->next->next. It is possible that you have never seen this form of dou-
ble indirection before. Which pointer is this? It’s the one coming out of the
"\177" node (the NULL pointer). Read 1ist->next->next as meaning “List
points to a node that has a pointer component next that points to another
node that also contains a component next.”
Now we turn to the insert() and delete() functions, which use a method
known as the “chasing pointers” method. If you read the code in these two
functions, you will notice a pointer called previous that always lags one node
behind a pointer called current. We say that previous is “chasing” current
Read the code for insert() and delete(), and then we will explain how it
works via a diagram.
/**x** Keep cycling in the loop until you pass the ****/
/**** point where you expect to find "string". aK KK /
while(strcemp(string, current->string) > 0)
R
{
previous = current;
current = current->next;:
}
if (strcmp(string, current->string) != 0)
{
return NOT_FOUND;
}
else
{
previous->next = current->next;
free(current);
return FOUND;
}
3.3 The Linked List Data Structure 59
previous current
Suppose we want to insert a node with "banana" as its key. Obviously, the
"banana" node must go between the "apple" node and the "cat" node to
keep the list in alphabetical order. We need the current pointer so that the
new "banana" node’s pointer can be assigned to point at the "cat" node.
This is done in insert() via the new->next = current statement.
In order to finish the job of linking the new "banana" node into the list,
we must get the pointer coming out of the "apple" node to point at the new
"banana" node. This is done via the statement previous->next = new. Just
to crystallize this in your mind, we will diagram the effect of the two state-
ments just mentioned.
previous current
list
new "banana"
The delete() function also makes use of the “chasing pointers,” but for a
different reason. Looking at the diagram above (prior to insertion of the
"banana" node), let us assume that we wish to delete the "cat" node. We need
the previous pointer so that we can reassign the pointer previous ->next so
that it points to the "dog" node. We need the current pointer so that we can
say free(current) and give the memory space used by the "cat" node back
to the system.
How do you find a node in a linked list? You simply march across the list
until you find a node with the desired key value. Then you return a pointer to
the found node or NULL if it could not be found. Here is the code to find a
node in a linked list:
return list;
He = ]ist->next;
ae NULL;
}
The statement list = list->next above the while loop skips over the
dummy header node. We are not interested in that node—it merely makes
insertions and deletions easier.
As the code states, we keep moving the 1ist pointer from node to node
until we either find the desired node or terminate the loop because we’ve
gone past the point where the node was expected (i.e., the strcmp() returns
a negative number). It does not matter that we are changing the value of
list in the function. Because we did not pass a pointer to 171St, it will remain
unchanged back in the calling environment, pointing at the dummy header
node as usual.
Finally, we present the traverse() function that prints out the contents of
all non-dummy nodes.
Because this function is a lot like the find() function, no commentary will
be made on it. However, you should read the code; this is a stereotypical
function.
Finally, we show you the main() function, which uses all of the linked list
functions discussed thus far. A sample execution of the program is also
included with main().
3.3 The Linked List Data Structure 61
Example 3.12 Driver Program for Linked List Functions with Output
[RRR KKK KK KK KKK KKK KKK KK KKK KKK KK KKK KKK KK KK KK IKK KK KKK KK KK /
ifs “of
/* This program initializes, inserts intoeedeletes */
/* from, searches, and traverses an ordered linked * /
/<. list witheastning. Keys off
[RK KRKKEKK KKK KKK KKK KK KKK KKK KKK KK KK KKK KK KKK KKK K KKK EK KKK /
main()
{
NODES list, =nodeptr: /** Always prototype functions! **/
char string[20];
int found, duplicate;
int insert(NODE *list, char *string);
int delete(NODE *list, char *string);
void traverse(NODE *list);
NODE *find(NODE *list, char *string), *init_list(void);
ist —indtslisti@.s
Whi teaCpreinchG Enterestring. Cor quits... Gets (stinindgys
SUEPEMOCSP
IMG! “EUITie” ))
{
duplicate = insert(list, string);
if (duplicate)
if
printf("That string already in the list.\n");
}
}:
[BRR RRR KKK KI KKK KKK KKK KIKI KKK KK KK KK KKK KKK KK KKK KKK KKK /
[KKK KKKK KKK KK KKK KKK KK KKK KK KKK KKK KKK KK KKK KK KKK KKK KKKKEKKKKK KK /
[BKK RKKKKKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KK KK KKK KK /
while (current->i != i)
{
previous = current;
current = current->forward;
}
previous->forward = current->forward;
current->forward->backward = previous;
free(current) ;
}
The initialization of the doubly linked list with dummy header and trailer
nodes is left as an exercise at the end of this chapter. Let’s analyze this code in
graphical fashion as we did for stacks, queues, and singly linked lists. First,
here’s a view of the doubly linked list with only a dummy header node and a
dummy trailer node.
list
jd ||
As you can see, the values in the dummy header and trailer nodes assume that
no actual data value will be less than | or greater than 9999. The forward
pointer coming out of the dummy trailer node is NULL, as is the backward
pointer coming out of the dummy header node.
The method used to insert new nodes is still the “chasing pointers”
method, just as it was for singly linked lists. The difference is that when we
insert a new node, we have four pointer assignments to make instead of the
two pointer assignments necessary in singly linked list insertion.
Let’s assume that we have inserted a node with key value 250 into our dou-
bly linked list. Here’s what the data structure will look like:
Let’s follow the pointer assignments after the insert() function’s while
loop and insert a node with key value 500. Remember that we are assuming
that the new node will not be a duplicate of an existing node. You should
think about how to alter the code in Example 3.13 to prohibit duplicates, as
was done in singly linked list insertion.
Chapter 3 The Linear Dynamic Data Structures: Stacks, Queues, and Linked Lists
new->forward = current;
new->backward = previous;
previous current
previous->forward = new;
current->backward = new;
previous current
The delete() function uses the same old “chasing pointers” method but
features a difficult pointer assignment. Therefore, we will assume that we are
about to delete the record with key value 500 that we just inserted. Here’s the
state of the data structure in delete() just after the while loop:
previous current
Now let’s see the result of the following two statements after the delete()
function’s
whi le loop.
previous->forward = current->forward;
current->forward->backward = previous;
Here’s what the data structure looks like after these two statements:
previous current
3.5 Programmer-Controlled Memory Management 67
The first of these two statements makes the forward pointer coming out of
the 250 node point to the 10000 (dummy) node. The second statement makes
the backward pointer coming out of the 10000 node point to the 250 node.
Current->forward->backward means “Current pointsat a node whose
forward pointer points at a node with a backward pointer.” (We saw such
double indirection in the notation 1ist->next->next from the singly linked
list initialization code in Example 3.8.)
The free(current) statement gives the memory pointed at by current
back to the system, and the pointer current is now unassigned! There is a tre-
mendous difference between NULL and unassigned. A NULL pointer has a defi-
nite value, zero. An unassigned pointer has no value at all. Another crucial
difference is that free(some_pointer) works on a NULL pointer but, if
applied to an unassigned pointer, results in unpredictable (and possibly fatal)
performance at runtime.
Now suppose we wish to close this doubly linked list and make a doubly
linked rng. The closering() function moves a pointer out to the end of the
list so that we have a way of referring to the forward pointer in the dummy
trailer node. Now you know why pointers are often called reference variables—
they give us a way of referring to a particular node. You are encouraged to
review Closering() on your own.
[BRK KKK KKK KKK KKK KK KKK KKK KKK KKK KK KK KK KK KKK KKK KKK KKK /
/* 25///
/* Gets a free node from the free stack (if not */
/* empty) or from a malloc’ed block of structs. */
fs * /
[ERK KKKKKKKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KK KKKKEK KEK /
Note that the stack pointer is passed in—it is not a local variable. This is
because a node to be deleted from a list, stack, or queue will not be freed bya
call to free(). It will be deleted by a function we will write that will push a
node onto the free stack. Thus, this variable must be manipulable outside of
get_node() and passed in as a parameter to this deleting function.
Otherwise, the logic of get_node() is fairly straightforward. If there is a
NODE on the free stack, we pop it off, update the value of the stack pointer,
and pass back the pointer to the popped NODE.
If the free stack is empty, we look at the block of malloc’ed nodes. If we
have not yet exhausted all nodes in the block, we pass back a pointer to the
current struct in the block and then move the block pointer to the next
struct. Remember that this block is an array of struct, not a linked list, so
we move block along the array via block++. Block = block->next would
be a horrible blunder because the component next in the struct pointed at
by block is zero—that is, NULL. Why? Read the next paragraph!
Finally, if both the free stack and the block are exhausted, we create a new
block (the malloc() call), initialize block and blockrear, zero out all nodes
in the block via memset() (now you know why all the next components are
zero or NULL), give the calling environment the address of the first struct in
the new block, and once again move the block pointer to the next struct in
the block of structs.
Note that block and blockrear are declared as static NODE pointers. A
static local variable in a function has a value that persists between calls. It is
essential that the current position of block and blockrear be preserved
between calls to getnode().
Another mystery that needs to be cleared up is the NULL initialization of
block in its declaration. Won’t every call to getnode() result in this initializa-
tion? No. A static local variable in a function is initialized on the first call
70 Chapter 3. The Linear Dynamic Data Structures: Stacks, Queues, and Linked Lists
only. Thus, from the second call of get_node() to the final call, this initializa-
tion will not occur.
The code for a delete() function, as shown for lists in Examples 3.9 and
3.13, is not shown in Example 3.14. (It appears as an exercise at the end of this
chapter.) However, it will clearly be different from the delete() functions
already shown in this chapter, because it will not call free(). It will push the
unneeded node onto the free stack.
Now that we see how the code works, let’s see what its advantages are. First,
free() is never called, so we don’t care whether it is written well or poorly!
Second, malloc() is called very infrequently, which leads to enhanced pro-
gram performance. Still don’t understand how the code works? Let’s look at a
few situations graphically.
stack I Pack — ||
The situation in the diagram above would occur if the free stack and the
block were exhausted and we were in the code of Example 3.14 just before
the malloc().
/* Already used
stack — || structs here. */
block blockrear
The situation shown in this diagram would occur if the free stack were
empty (because no deleted nodes were pushed onto it) and many structs
were taken from the malloc’ed block of structs. The structs before the
struct pointed at by block have already been used to build a list, stack, or
queue. In this case, when one needs a new struct to use as a new node for a
list, stack, or queue, it would come from the block of structs, and the block
pointer would move forward by one struct.
/* Already used
structs here. */
block blockrear
3.5 Programmer-Controlled Memory Management 7I
The previous diagram shows a case where many nodes have been inserted
into, and many deleted from, some kind of dynamic data structure. The free
stack must have been empty many times; otherwise the block pointer would
never have moved from its original position. Remember, new nodes are always
taken off the free stack until it is empty.
This diagram shows a situation that looks impossible but is actually quite
possible! However, a couple of things have to fall in place for this to happen.
If several insertions were done consecutively when the free stack was empty,
this could exhaust the malloc’ed block of structs. If two nodes were then
deleted immediately after exhaustion of the block, they would be placed
where deleted nodes are always placed—on the free stack. Thus, this situation
can only arise if several deletions immediately follow the exhaustion of the
malloc’ed block of structs.
As you can see, many states are possible: stack-empty/block-not-empty,
stack-not-empty/block-not-empty, stack-not-empty/block-empty, and _stack-
empty/block-empty. It is useful to think about which insertion/deletion
permutations give rise to these four possibilities. When you write such code,
all possibilities must be accounted for, even if some of them are highly
unlikely.
Students often ask this perfectly reasonable question: “Can’t I do every-
thing with just one data structure?” They mean with just a free stack or just a
malloc’ed block. A free stack alone is a possibility, but without the aid of a
preallocated block, all mallocs needed for insertions when the free stack is
empty will be one-node-at-a-time mal ]1ocs—just the kind of inefficiency our
approach avoids.
It seems feasible to do everything with a malloc’ed block. If you want to
delete a node from a list, stack, or queue, you simply move the block pointer
backwards (i.e., toward the beginning of the original block). Here’s the situa-
tion that spoils this plan: Suppose you have just exhausted a block (because of
insertions), mal]loc’ed a new block, and then performed several node dele-
tions from your data structure(s). You risk moving the block pointer back-
wards past the beginning of the original block. Needless to say, this will lead to a
serious runtime problem when you try to get a new node from this (illegal)
location.
I hope that these analyses have given you an appreciation for the attention
to detail demanded of all programmers who want to call themselves experts.
This code will enable you to “roll your own” memory management code so
that you don’t have to rely on good operating system memory management.
If you want something done right, do it yourself!
72 Chapter 3. The Linear Dynamic Data Structures: Stacks, Queues, and Linked Lists
and a given student’s id hashes to a value of 2, then that student’s node will go
into the linked list pointed at by student_body[2]. Obviously, any hash func-
tion in this application must produce a value between 0 and 4 for every
record, because those are the minimum and maximum indices of the
student_body array. This is the essence of hashing.
However, there are two pertinent questions here: (1) How many linked
lists should you have in your hash table? and (2) What constitutes a good
hashing formula? Entire books have been written to answer these questions!
However, without going into the mind-numbing mathematical theory behind
hashing, we can suggest two rules of thumb.
First, the number of linked lists should roughly equal the number of data
records you expect. If you have, say, 40,000 records and only 100 linked lists,
then each list will be so large that you will lose the benefits of hashing. Sec-
ond, the hashing formula should distribute data records relatively evenly
among all the lists.
3.6 Hashing to Memory with Arrays of Lists 73
[KK KEK KKK KKK KKK KKK KKK KKK KKK KK KKK IK KKK KK KK KKK KK KKK KK EK KKK KKK /
[* */
/* Demonstration of array storage/retrieval via hashing x]
/* to an array -of linked lists: 28/f
[RRKKKKKKKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KK KKK KKK KKK KK KKK /
4tdefine NUMPOINTERS 5
d#Hinclude <stdio.h>
d#Finclude <stdlib.h>
#Hinclude <string.h>
struct student
{
ear SsiLil@i|s
struct student *next;
Saadwe
These declarations will be used for the hashing formula and for other func-
tions in this section.
[ KRRKRKKKKRKK KKK KK KK KKK KKK KKK KKK KKK KK KKK KKK KK KKK KKK KKK KKK /
[* */
/* Hash Social Security number by summing the cubes */
/* of the ASCII value of characters and then take i)
/* the modulo of this sum. Return result (hashval). */
[BRRRRRKRRKK KKK KK KK RK KKK KKK RK KKK KKK KKK KK KKK KKK KKK KKKKKK /
74 Chapter 3. The Linear Dynamic Data Structures: Stacks, Queues, and Linked Lists
long hashval = 0;
Hold on a second, you say! What if two or more records’ student_id fields
hash to the same value. No problem. Because each pointer in the array of
pointers points to a list, we simply add the new record to the end of that list
(provided it is not a duplicate entry). When the keys of two or more records
produce the same hash value, then we say a collision has taken place. Lists can
hold limitless numbers of records, so collisions don’t present any problem
here—collidees simply occupy the same list.
Are these lists really stacks? No. Because we must check all entries to avoid
inserting a duplicate record, we cannot just add the node to the front of the
list. Are they queues? Again, the answer is no. We can’t add to the end of each
list straight away, because it might be a duplicate entry.
Should we have dummy headers and trailers such as we had with ordered
linked lists? No, the records in each list are not ordered by key. Also, in a real-
world application with 1000 lists, you’ll waste 2000 structs! And because
each list will have a very short size in a real-world application, the “chasing
pointers” method simply won’t work well if the average size of a list is 1.3
nodes per list—a realistic occurrence!
So how do we insert into these lists? Look at the code that follows.
[RRKKKRKKKKK KKK KKK KKK KK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KK /
mover= &student_body[hashval];
while (*mover)
{
if (strcmp(ss,(*mover)->ss) == 0) return (*mover);
3.6 Hashing to Memory with Arrays of Lists 75
mover = &((*mover)->next);
}
if ((*mover = (STUDENTREC *) malloc(sizeof(STUDENTREC)
))
see NULL)
{
printf("Malloc error in insert!\n");
exit (1);
}
strcpy((*mover)->ss,ss);
printf("Person with ss number %s placed in list %d.\n",
ss, hashval);
return NULL;
The real magic of this code is that it uses a pointer (mover) to follow the point-
ers coming out of the nodes in each list, instead of following the nodes them-
selves as we did with regular linked lists.
Here’s a graphical demonstration wherein a student with Social Security num-
ber 888-88-8888 is about to be added to the list pointed at by student_body[4],
which already has the record for student 333-33-3333. Each code line from this
insert() function is followed by a picture of the list.
student_body[4] 333333333 ||
mover
student_body[4] 333333333 ||
mover
mover
student_bodyl4] 333333333 ee ||
the “333333333” node, it has a reference to that pointer. In fact, that pointer
has a name, *mover. We simply malloc space for this pointer. The code after
the call to malloc() simply fills up the node with “888888888.”
This same logic is used in the traverse() function, which simply prints out
the contents of each list. Like insert(), it follows pointers instead of nodes.
True, traverse() can be written to bounce from node to node instead of from
pointer to pointer. However, we wanted to give you as much practice as possible
reading this type of code, because certain problems, such as insert(), can’t be
solved easily without this logic and its associated syntax.
This code represents a very powerful kind of C “magic” that we will revisit
when we look at binary trees.
[RREAREN
ET OU OUU COMLEML Se Ol ml Voor amin oot,
void traverse(STUDENTREC *student_body[]);
{
Wim. 18
STUDENTREC **mover;
And now finally, after all the preliminaries, here is a main() function that
utilizes all of these functions and produces output.
main()
{
char ss[10];
STUDENTREC *student_body[NUMPOINTERS] = {NULL};
STUDENTREC *person;
STUDENTREC *insert(char ss[], STUDENTREC *student_body[],
int hashval);
3.6 Hashing to Memory with Arrays of Lists 77
CONEAMES OF liSiG Ws
aa ereiy aT
666666666
COMEANUS Or liGEe Ze
444444444
Contents of list 3:
655555555
78 Chapter 3 The Linear Dynamic Data Structures: Stacks, Queues, and Linked Lists
Contents of list 4:
333336558
888888888
LN _______
As you can see from this sample output, list 0 (pointed at by student_body[0])
is empty, and the other four lists (pointed at by student_body[1] through
student_body[4]) have one or two nodes apiece. Before we finish this section,
let’s take a closer at one more little detail, the line in main() that says
When you are ready to fill up the name and address components, you can
use temporary string buffers with a large declaration (say, 200 bytes) to fill up
the struct components. For example, let’s say that a pointer fooptr is point-
ing ata struct foo that is to be filled up with the values stored in local vari-
ables name and address. Here’s how you would do it:
fooptr->name = (char *) malloc(strlen(name) + 1);
strcpy(fooptr->name, name);
fooptr->address = (char *) malloc(strlen(address) + 1);
strcpy(fooptr->address, address);
When we malloc space for fooptr->name, we allocate just enough room
for the name plus the terminating '\0' character. Not a byte is wasted! From
now on, your nodes within your data structures should be as dynamic as the
data structures themselves.
Halse
In this diagram, each queue node holds the job-id and username of a user
requesting mail service, printer service, etc.—whatever type of service is
named in the main list node. This flexible data structure allows for a limitless
number of operating system services, and each service can accommodate a
limitless number of user requests for that service.
80 Chapter 3 The Linear Dynamic Data Structures: Stacks, Queues, and Linked Lists
The data structure in this example is a list of queues. Clearly, you could also
find applications for lists of lists (often called multi-lists), lists of stacks, stacks
of lists, queues of stacks, and so on. The number of permutations is very
large! The hard part, as usual, is designing the node structures and pointer
assignments to implement initialization, insertion, deletion, and traversal
algorithms for these complex, hybrid structures.
We'll show you the insertion, traversal, and initialization algorithms for this
application. The deletion algorithm is left as a chapter exercise.
#Hinclude <stdio.h>
#Hinclude <stdlib.h>
#Hinclude <string.h>
previous = current;
current = current->next;
}
if (strcmp(qname, current->qname) != 0) /* New service! */
{
if ((newserv = (SERVICE *) malloc(sizeof(SERVICE)
))
== NULL)
{
printf("Fatal malloc error!\n");
exit(1);
}
if ((newserv->qname =
(char *) malloc(strlen(qname) + 1)) == NULL)
{
printf("Fatal malloc error!\n");
exit(2);
}
strcpy(newserv->qname, qname);
newsiervornext == cuinrenits
previous->next = newserv;
newserv->front = NULL;
i
if ((newreg = (REQUEST *) malloc(sizeof (REQUEST) ))
== NULL)
{
printf("Fatal malloc error! \n");
exit (3);
}
if ((newreq->username =
(char *) malloc(strlen(username) + 1)) == NULL)
{
printt Gokatal mal locrerrnont\n).:
exit(4);
}
strcpy(newreq->username, username);
newreq->job_id = job_id;
newreq->next = NULL;
if (newserv->front == NULL)
{
newserv->rear = newserv->front = newreq;
}
else
{
current->rear->next = newreq;
current->rear = current->rear->next;
}
82 Chapter 3 The Linear Dynamic Data Structures: Stacks, Queues, and Linked Lists
This code can be quite confusing to the uninitiated. Let’s see how it words
graphically. Let’s assume that username smith is entering job-id 22222 as a
request for queue Print1. Assume that our data structure looks just like the
diagram at the beginning of this section.
Here’s what the data structure looks like when Print1 is found, a new
REQUEST node is allocated and filled up, and the new REQUEST node is linked
into the proper queue.
previous current
alist
reassigned via:
current->rear =
current->rear->next;
newreq
reassigned via:
current->rear->next
= newreq;
Now let’s look at the traversal program that prints out all the list and queue
contents.
}
a a
This code goes to each SERVICE node, prints out the service name in a head-
ing, and then prints out the contents of the queue associated with that ser-
vice—that is, the usernames and job-ids of the service requesters. The for
loop processes all the REQUEST nodes in a queue. The while loop processes
the SERVICE nodes.
Now let’s look at the rest of the code associated with this program and at its
output.
main()
{
SERVICE=*serv_ptr, *init_list(void);
void getinfo(char *qname, char *username, long *job_id);
void insert(SERVICE *serv_ptr, char *qname,
char *username, long job_id);
void traverse(SERVICE *serv_ptr);
char qname[30], username[20];
long job_id;
igeegservapir =
(SERVICE *) malloc(sizeof(SERVICE))) == NULL)
{
printf(" Fatal. malloc error, im init_serv
ptr! \n2);
excel);
84 Chapter 3 The Linear Dynamic Data Structures: Stacks, Queues, and Linked Lists
}
if ((serv_ptr->qname = (char *) malloc(1)) == NULL)
{
printf("Fatal malloc error in init_serv_ptr!\n");
exieCl):
i
*serv_ptr->qname = '\0'; /**** Dummy header value. ****/
if ((serv_ptr->next =
(SERVICE *) malloc(sizeof(SERVICE))) == NULL)
i
printt("Fatal malloc error insinitsserveptrian):
exa GCs
}
if CCserv_ptr-onext=>qnane"= (char *) mal loc(Z) == NUEB)
{
printhe"Fatal: malloc errormaneinitaservaptrii\nys;
exi EC);
}
*serv_ptr->next->qname = DUMMY_TRAILER;
serv _ptr-=>next->qname[1l] =""\0";
serv_ptr->next->next = NULL;
RACURMES Ce AVEpIUitns
}
poem (Cee WiMowhe Fron USAle che whe weiMiMells — weetets7/
void getinfo(char *qname, char *username, long *job_id)
{
char line[512];
Service name:
thomas
james
james
edwards
As you can see, the service names are printed out in ASCII order, which is cor-
rect because the SERVICE nodes are in an ordered linked list. The REQUEST
nodes are printed out in the order in which they were entered into the
request queue. This also is as it should be; a queue is a “first-in, first-out”
(FIFO) data structure.
<string.h> Functions
list-Sstringe= (char)
Ml TOECSEPTSIMNCSeliing)) se IL)
Exercises
1. You have a file that contains the names of computers in a ring network
(1.e., a network in a circle shape) and whether they are up or down
(down, of course, means not functioning). The file looks like this:
wigearr Al
slowpoke 0
foobar 1
[KK
More data here ****/
a. Repeatedly prompt the user for a computer name (until user says
Squat”).
b. If the computer name exists, the program will return the nearest up
computer to the left and right of the user-supplied computer.
c. If the user-supplied computer does not exist, the program will supply
an appropriate "Not found" error message.
d. If the computer has only one up neighbor, this fact should also be
reported.
e. Do not try to calculate “nearest.” Follow the left (or backward, what-
ever you Call them) pointers first to find the left neighbor.
am 2. Write a function that takes a stack pointer of type NODE * as its only
parameter and returns a pointer to the stack turned backwards. Assume
that the pointer component of stack nodes is called next. Your answer
should have no malloc() calls!
. Write a function that will initialize a doubly linked list with nodes that
look like this:
struct node
{
char *key;
struct node *next;
ia
w 4. Assume you have a program that has constructed a linked list and that is
supposed to put deleted nodes onto a free stack. Rewrite the linked list
delete() function given this information. Hint: It should have three
parameters.
MULLIN, CHRIS 23
ROBINSON, DAVID 32
JABBAR, KAREEM 25
BIRD, LARRY 22
MULLIN, CHRIS 30
Peel Cua:
As you can see, each line contains the name of a basketball player fol-
lowed by the number of points he scored in a game played within the last
month. Obviously, a player will have multiple entries (as Chris Mullin
88 Chapter 3 The Linear Dynamic Data Structures: Stacks, Queues, and Linked Lists
does above) if he played in more than one game in the last month. Write
a program that will
a. Create a linked list ordered by player name. Each list node will have a
stack of point nodes attached to it. See the following data structures:
struct points
{
int points;
SUPE WOUMES Nes
hi
struct player
{
char name[30];
struct player’, *next;
Struct points ~Stack>
He
. After the linked list is created, prompt a user for a player name. If
the player is in the list, display the average points per game for that
player.
. Give an error message if the player is not in the player list.
. Prompt the user repeatedly until he or she says "quit".
. When the user quits, create an output file that looks like the following.
As can be seen, the output file should be in alphabetical order, and the
numbers represent individual game totals followed by the average
points for these games.
BURDSMEARRY @22 623. .2502502
MULLIN, CHRIS 23 30 20 24.3
ROBINSONGMDAV ADs U7 17s lew feelia Ve Ze 210
65 (EWES, con
- Remove all whitespace from the user’s entries, but otherwise assume
that these entries are correct.
@ 6. You have a file (accounts) with lines consisting of two fields: a four-digit
bank account number, and the balance in the user’s account—for example,
Us 2543..90
1001 2500.00
1058 24.34
a CLCs tac
Write a program that will do the following:
a. The records in this file are in random order, but you will put them
into a linked list in order of account number. Assume correct input,
Chapter 3 Exercises 89
and assume that no account number will occur more than once in
the file.
b. Then you will put the program user in an input loop (until the user
types "quit"). The user will enter transactions as follows: account-
number transaction-amount. Thus if account 1111 has two entries,
such as
TPIT > 100. 00 /***x A withdrawal! beeSu |
Pity 25200 [ *X*S EA ep Osta x= 7:
then account 1111 will end up in an ordered “transactions” list with a
net transaction amount of —5.00. You may assume correct user input.
Thus each account number will be associated with one “net transactions
record” that is updated each time the user enters more information.
c. After the user enters "quit", update the list created in part a by add-
ing the transaction amount to each account’s balance (if it had any
transactions). You may assume that every account number from the list
made in part b is represented in the list made in part a.
d. Create three output files: one to contain the transaction records, one
to contain all the net balance records (after updating), and one to
contain just the records of the people with negative net balances.
e. Use the block allocation technique discussed in Section 3.5.
Write a function that will join all the student_body linked lists dis-
cussed in Section 3.6 into one big linked list and return a pointer to that
list. You will be very surprised how short this function is if you code it
properly!
a 8. Write a function that will go through a doubly linked list of nodes and
transpose each pair of nodes. Nodes are of type NODE and have compo-
nents char *key andanext pointer. The header is the empty string, and
the trailer node contains "ZZZZ". Do not move the list header or trailer
nodes. Pointers in nodes are called forward and backward.
Rewrite the e] se clause code from the insert() function in Example 3.9
if the string component of each NODE is declared asa char *.
#10. Write a deletion function for the operating system (OS) queue program
shown in this chapter. This function should remove a request from the
front of a service queue, symbolically indicating that a user has been pro-
cessed.
11. Do the same as in Exercise 10, but assume that the deleted node goes on
a free stack.
90 Chapter 3. The Linear Dynamic Data Structures: Stacks, Queues, and Linked Lists
wl3. Redo Exercise 8 for a singly linked list. This is much harder, because the
list nodes have no backward pointer. Your function should have no calls
to mal loc(). You may assume that the number of (non-dummy) NODEs is
even (0, 2, 4, 6, and so on).
4.1 Passing Address Expressions
as Arguments to String.h Functions
Almost everybody at the beginning and intermediate levels has learned how to
pass simple string variables to functions like strcmp(), strcat(), strlen(),
and strcpy(). In fact, many people at these levels have never passed anything
but simple variable names to these functions. However, like all C functions, the
string.h functions can take any expression as a parameter so long as it agrees
with the parameter data type. Look at the following code.
strcpy(string, "“Hello’);
printt(“Zd\n", strien(strinag + 2));
What number will the printf () print on the screen? Many, knowing that the
length of "He]10”" is 5, would answer 7. However, 7 is the value of
Strientstring) + 2:
As usual, if you draw a picture, you can figure out the correct answer very
rapidly.
String string string
ape ll tae eo
91
92 Chapter 4 Advanced String Handling
ptr
string
Because ptr points to a character after the first '\0', many would contend
that this printf() should lead to a runtime error or print nothing at all.
However, this contention would be incorrect, because printf () simply starts
printing the characters from the given address (i.e., ptr) and keeps printing
until a '\0" is encountered. Thus "Go" will appear on the screen. Given the
previously diagrammed string, what will the following printf() print?
DENIER CSSA Ties eS Clie ooh.
This printf() will also make "Go" appear on the screen, because the
address string + 3 is the same as ptr. Later in this chapter, we will look at a
very special and powerful string.h function called strtok() that cuts a
long string up into tokens all separated by the '\0" character. This last exam-
ple was not merely an academic exercise!
Thus we can state a rule of thumb: When a string-handling function is given an
address, it starts the manipulation of the string at that address and continues until it
encounters the '\0' character
Before we leave this section, let’s look at two more situations where address
expressions are passed to string.h functions. Here’s one:
What will this printf() print? Don’t look at the answer just yet—this is a
preparation for some of the exercises at the end of this chapter.
If your answer is that “Go Dogs!” (quotes not included) will be printed,
you are correct. The strcpy() says that you should start copying into address
sl + 3, the address of the "L" in "Lions". It also says that you should start
copying from address s2 + 4, the address of the "D" in "Dogs!". Thus
"Dogs" and a terminating '\0"' will overwrite the "Lions" part of s1.
What will the printf() in this code print?
4.2 The Strchr and Strrchr Functions 93
These two functions look for a particular character, starting from the left end
of the string (strchr) or from the right end of the string (strrchr). Their
prototypes are identical, so we will save the prototype of strrchr() for the
“Facts About Functions” section.
Cale ““SepeliiirCelhnigie “SiriiaG. Wine eis
Here’s some sample code using strchr() and strrchr(), followed by the
output generated by that sample code.
d#Hinclude <stdio.h>
#Hinclude <string.h>
main( )
{
char sl[] = "I am the one, the only one.";
Elneie *ioelell,, soneirZs
[**** Find all the o's in sl. Print their addresses. ****/
Let’s analyze this code and output. The first calls to strchr() and strrchr()
produce predictable results. They return the addresses of the first and last
‘o' in s1. When these addresses are given to the printf() function using
the %s format descriptor, it prints from the address given to the '\0' charac-
ter. Clearly, both functions worked correctly.
The for loop appears to do a few mysterious things. First, what is the “p
format descriptor in the printf ()? This is the format descriptor you should
use to print out addresses. As you can see, these addresses are printed in hexa-
decimal (base 16). The biggest mystery in the for loop involves the use of
ptr1. How does this loop work?
After strchr() finds an '0", it assigns the address of it to ptr1. But then
you want to start the search for the next '0' in the part of the string after the
"o' just found! That’s why we increment ptr1 in the increment section of the
for and then use ptr1 as the first parameter to strchr(). Thus, every
strchr() call from the second call to the last call is given the address of the
character that immediately follows an '0'. When no 'o" characters remain,
strchr() returns NULL, which ends the loop.
Here’s a diagram indicating ptr1’s position after the ptr++ in each itera-
tion of the for loop. The characters that make up the string are spaced out
for readability.
Sy ae OMe
ptrl
printf("Address of b = %p\n\
Address of c = %p\n\
Address of d = Z%p\n\
Address of a in b = %p\n\
Address of a in c.= Zp\n\
Address of a in ad = 4p\nie> bs ic..d,
SUPSEPCD 54) > SiePSiERCE.4).-
strstncdsanys
}
[RRKKKKKKKKKKKKKKKK SAMPLE EXECUTION KKKKKKKKKKKKKKKKK KKK /
Address of b = 7fffaf24
Address of c = /fffafl18
Address of d = /fffafl10
Address of a in b = /fffaf24 (fs OMNES Bie HOt ali) CENEBIKE 7/
Address of a inc = /fffaflb /* Points at second “c™ 7]
/* jin concatentate */
Addresses on vacin d= 0) 8 / xxaeINU NOMUCall ani OOD ain esx a7,
Let’s analyze the results. Because "cat" is at the very beginning of "cattle",
strstr(b, a) should return an address that is identical to the starting
address of "cattle"—that is, the 'c' in "cattle". Indeed, the output bears
this out.
"Cat" is also found in "concatenate" starting at the fourth character. Thus
strstr(c, a) should produce an address that is 3 higher than the address of
96 Chapter 4 Advanced String Handling
d#Finclude <stdio.h>
#tinclude <string.h>
main()
{
char aL] "Mississippi is my sister state",
bL] = "Nonsense",
cial "banana anatomy",
dL] = "Andy Andrews";
int find_num_substrings(char *string, char *substring);
int numfound;
string
98 Chapter 4 Advanced String Handling
mi ses Tos Slal pupeih” 1csmeihey. SAI Set ser rae roubadmorcun”
string
mixs's
1s Ss Si pip 17 OGES inyy Was siuset
er hmntcat ame sen
string
string
Thus in each while iteration, string moves from its current position to
the "i" in the next "is" in string. The count variable is then incremented,
and string skips over strlen("is") characters to reposition itself for the
next search. See—it really is as easy as it looks!
Later, after we learn about arrays of strings, we will rewrite find_num_sub-
strings() so that, instead of returning the number of substrings in string,
it will return an array of all the addresses in string where substring occurs—
that is, an array of long.
Let’s continue our investigation of the more advanced string.h functions.
length of string. In this case, the printf() will not be executed, because
there is no non-digit character in digits. The prototype of strspn() is
SiZestistrspn( char. ~stringas echar *set_to_skip_over) ;
Remember, a size_t is compatible with any kind of int (provided your
int is large enough to hold the return value). Strcspn() is the complement
of strspn(). It returns the number of characters in string that are skipped
over until a character is found that isin set_to_skip_over. Its prototype is
identical to strspn(). Here’s a bit of code with output to give you a more
concrete idea of how these functions work.
dFinclude <stdio.h>
#Hinclude <string.h>
main()
{
char sentence[] = “Hello, I am John!";
char nonsense[] = "~*%#";
char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ\
abcdefghijk]mnopqrstuvwxyz";
int numskipped;
d#tinclude <stdio.h>
#include <string.h>
#include <stdlib.h> /*** For exit function. ***/
Note that strchr() would not be an appropriate choice for our sentence counter,
because it can look for only one kind of character per search. Strpbrk()
can look for any one of a set of characters.
The code in count_sentences() is very minimalist. Fgets() returns NULL
when EOF is encountered. Because NULL is zero, this terminates the while.
Otherwise, fgets() puts a line from the file into its first argument (1 ine)
including the line terminator
('\n").
The outer for loop keeps looping if strpbrk() finds one of the end-of-
sentence markers. It terminates when strpbrk() returns NULL (which means
102 Chapter 4 Advanced String Handling
it couldn’t find any more) "24°14, or” .") Ihardigit follows an end-of-sen-
tence marker, it does not increment num_sentences. That’s because, given
our assumptions, it means that the end-of-sentence marker found was a
period that was part of a number (such as 3.5) rather than a real end-of-sen-
tence marker.
Finally, the inner for loop keeps cycling if the next character is a '?" or
'!'. This accounts for sentences that end with two or more '?" or '!" or
both. We don’t want to double- or triple-count these sentences. The notation
mover[1] seen in count_sentences() means “the character after the one
currently pointed at by mover.” Remember that array indices are always rela-
tive to the current pointer location.
Now that we have explained how count_sentences() works with words,
let’s explain it with pictures. We will show the mysterious movements of the
mover pointer as it hops along each line of text. Characters in the diagrams
are spread out so that you can see where mover is pointing without difficulty.
mover
mover
/**x* After inner for loop and mover++ in outer for loop. Points at space! ***/
mover
The next call to strpbrk() from this point results in a NULL return, termi-
nating the outer for loop. Now we get the next line in the input file:
mover
At this point we will stop, because the last two lines of the file contain noth-
ing extraordinary. When testing the validity of your program, you must always
be sure to test extreme cases. In this program, the “extreme cases” are lines
ending with more than one '?' or '!' and lines containing a period in a
number. It’s just such cases that normally make programs bomb if they are
not accounted for!
d#Finclude <stdio.h>
d#Finclude <string.h>
main()
{
chartineksojs spinboa ptnz:
If the line entered by the user does not have two '#' characters, the rest of
the loop body is skipped via a continue instruction. The real “trick” here 1s
the use of variable formatting in the printf(). An asterisk (*) in the format
spec tells print f() to look in the expression list for the size of the string—in
this case, ptr2 - ptri. We will examine such variable formatting in detail
later in Chapter 5 on advanced input and output.
Here’s a sample string with the positions of ptr1 and ptr2 just prior to the
printf(). Again, the characters in the diagram are spaced out for visibility.
There are no actual spaces in the string.
ptrl ptr2
Now let’s change the wording of the problem slightly. Suppose you are
asked to print out the part of the string between the first two digit characters. As it
turns out, the only change to the code in Example 4.6 would be to change the
two strchr() calls to strpbrk() calls, because the digits are a set of charac-
ters and not just one character. The two strchr() calls would be replaced by
ptrl ll= strpbrk(line, "0123456789") ;
and
I hope Examples 4.5 and 4.6 have opened your eyes to the vast problem-
solving power of the C string-handling functions. And the best is yet to come!
This function’s workings are much easier to show (via example) than to
explain. Suppose a user entered the following input line:
Oa, 3, 40
Let’s assume that we have this entire input line in string variable line.
Let’s also assume that we have four char * variables—ptri1, ptr2, ptr3, and
ptr4. Then we have the following four lines of code:
ine =i @ \C 3 9) WW 40 \0
| t i ptr4 — ||
ptrl ptr2 ptr3
make the space visible to the reader of my program. Many unprintable char-
acters show up on printers and/or terminals as whitespace, so you are doing
the reader a favor by making it obvious that it is a space. Of course, the '\t"
is ANSI C’s escape sequence for the tab character.
The beauty of strtok() is that it delivers “clean” tokens to our program,
free of any “junk” characters that we are not interested in. Quite often, these
tokens are supposed to denote numeric strings. Therefore, wherever strtok()
appears, you usually find one of its three cousins—strtol(), strtoul(), and
strtod(). These three functions attempt to convert a string into a signed
long, an unsigned long, and a double, respectively. They also have valuable
error detection facilities not present in their outmoded (and terrible!) breth-
ren, the atoi() and atof() functions.
In the next section, you will see strtok() work together with these three
crucial functions.
We will not discuss strtoul(), which ignores a sign, if one is present, rather
than reporting it as an error. The key to understanding strtol() and strtod()
lies in the second parameter: the char ** I have called endp.
Usually, you pass the address of an unassigned char * into these conver-
sion functions, and they come out pointing at the first character in string
that could not be converted into a number. If string is a clean token col-
lected by strtok(), the character pointed at ought to be the '\0' character
unless the string does not represent a correct 1ong or double, depending
on which conversion function you’ve called.
Example 4.7 demonstrates the use of strtok() and strtol() together.
4.8 Converting Numeric Strings to Numbers 107
main()
{
int tokennum, num;
char line[256], *lineptr, *tokenptr, *endp;
Hes cai
/* Lineptr == line for first iteration of for loop ‘alld
(-Only a Lue iisese Vat OmNULI Ra tlenetnat. ay:
[ RRKRRKKK EK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKKKKKK /
lineptr = line;
for (tokennum = 1; tokenptr = strtok(lineptr, "\040\t");
lineptr = NULL, tokennum++)
{
num = (int) strtol(tokenptr, &endp, 10);
if (*endp != "\0"')
Token number 2 is = 5
Token number 3 is not a valid integer!
As you can see from an examination of the code and output in Example 4.7,
the value of *endp coming out of the call to strtol() is very important. If the
value of *endp is the ‘\0’ character, then the string was a good long. If *endp
points at anything else, it is not a good long. If a character can’t be converted
properly, its value is contained in *endp. Thus, if a user entered 324 as a num-
ber, *endp would emerge from strtol() with the erroneous ‘Z’ as its value.
Another feature of the code in Example 4.7 that should be mentioned is
the assignment
lineptr = line;
This is important, because from the second call through the final call of
strtok(), we want the first parameter to be NULL. On the first call of strtok()
we want the first parameter to point at the line to be parsed. Because line is a
pointer constant, its value cannot be changed. Thus we used 1 ineptr, which is
a pointer variable. Oh, those nagging little details!
4.8 Converting Numeric Strings to Numbers 109
We have seen how the second parameter of all three conversion functions
gives us vital information about the success or failure of the conversion. If
there’s a failure, it will be reflected in the value of *endp. However, something
else can go wrong that you must know about. It is possible that a token contains
all digit characters, but the number is too big to be represented on your computer!
In this event, the following behavior occurs:
1. The value of *endp is set equal to the first character of the first param-
eter.
2. The built-in variable errno is set to ERANGE. If errno is used, you must
include the errno.h header file in your program.
3. The return value of the function is HUGE_VAL.
Example 4.8 Correct and Incorrect Ways to Use Errno with Strtol
#Finclude <errno.h>
Whitlem@come-condition). -/**** slihis, doops ancorrect) las*ax/
{
num = strtol(string, &endp, 10);
if (errno == ERANGE) { do something }
versus,
#finclude <errno.h>
while (some-condition) /**** This loop is correct!! ****/
{
110 Chapter 4 Advanced String Handling
errno = 0;
num = strtol(string, &endp, 10);
if (errno == ERANGE) { do something }
}
El
Strtod() sets errno to ERANGE on both overflow (number too large) and
underflow (number too small). Because strtod() usually returns zero on
underflow, checking for ERANGE with doubles may be more important than it
is for long returns from strtol(). However, I think it’s safe to say that you'll
be using strtol() dozens or hundreds of times more often than strtod().
Because strtok() and strtol() are such vital functions, this is probably a
good time to gather what we know about them in two lists of “Key Facts.”
1. Strtok needs the name of the string variable to parse the first token
from the string. To collect all other tokens from the same string, give
NULL as the first parameter.
You may change the delimiter list from one call of strtok() to the
next if you find it necessary.
Strtok returns NULL if it cannot find any more tokens in the string
starting from the current point out to the '\0".
Strtok overwrites the first delimiter character after a token with
"\0'. This means that it mangles the original string! If you need
the original string in its original form, make a copy of it before you
strtok()
it!
The string pointer returned by strtok() points to a string with no
delimiter characters. This is what we have referred to as a “clean”
string.
1. Strtol returnsa long int if it can find one. Thus, if the string param-
eter points to "56X", strtol() will return a 56 but endp will point at
the (unconvertible) X.
. Ifa “clean” token (no whitespace) containing a valid long int is given
to strtol(), endp should point at the '\0" character at the end of the
token string.
4. If the token given to strtol() is all digits but is too large (in the posi-
tive or negative direction), strtol() will return HUGE_VAL and set the
errno variable to ERANGE.
5. If you use errno, don’t forget to include errno.h, and don’t forget to
set errno to zero before each strtol() call.
a
Before moving to the last topic of this chapter, arrays of strings, I want to
encourage you to do as many exercises at the end of this chapter as you can.
Aside from dynamic data structures, there isn’t a more important topic in this
book than string manipulation. If you intend to write compilers, user inter-
faces, point-of-sale systems, text file analysis programs, or Web browsers, you
are going to work with strings until you are blue in the face!
To re-emphasize this importance, I am going to present a short case
study—a program that counts the frequency of each word in a text file. All
programs must make assumptions about the nature of the data they encoun-
ter. So will this one. Those assumptions follow.
1. Punctuation will always occur immediately after the end of a word and
will never be preceded by a space.
2. A “word” can contain letters, digits, periods (as in 3.5), and hyphens.
However, no word will be all hyphens, and a period at the end of a word
will be considered an end of a sentence and not as part of a word.
3. All sentences will be separated from one another by at least one
whitespace character.
The program will read from a text file whose name is entered by the user. It
will output the list of words (all lowercased) in ASCII order. Thus you should
keep an ordered linked list of nodes containing a word and its frequency count.
We’re going to use everything discussed in the book thus far!
Hints? Use fgets() to read the lines of the file and strtok() to parse the
tokens. Good luck!
*/
ifs
/* This program counts the frequency of words in a ay
/* file. Words are defined as any combination of zy)
112 Chapter 4 Advanced String Handling
dHinclude <stdio.h>
##include <string.h>
d#Finclude <stdlib.h> /**** For malloc and exit. ****/
[RRKKKKKKKKKKKKKKK End include ilese KKKKKKKKKKKKKKK |
[KKK KKK KKK KKK KKK KK KK KK KKK KKK IK KKK RAK KKK KA KK /
fas ‘ei
/* Initialize list with dummy header and “aif
/* trailer node. Header node has an empty */
/ SAStPANGE welikailer=noder is eon
li ice wif
he i]
[KKK KEKE KKK KK KKK RK KKK KKK KKK KKK KKK KKK KKK KKK KKK /
WORDNODE *init_list(void)
{
WORDNODE *list;
(3 E <i
/* Obtain input filename from user. ail
/* User’s entered filename has whitespace caf]
/* removed and all chars are lowercased. “a|
1% cll)
[RRRKRRKEKKKEKKK KKK KEK ERR ERK EKER RE KER ERK KEK KKRERE |
char *get_filename(void)
{
114 Chapter 4 Advanced String Handling
/* “}
/* Open user-specified input file. Abort */
/* if unsuccessful. Pass FILE pointer cid|
/* back if open is successful. */
/* i
[RRR KKKKK KKK KKK KKK KKK KK KK KKK KKK KK KK KKK KKKKK KK /
[ERK KRKKRKKKKKKKK KKK KKK KKK KKK KKK KKK KKK KKK KKKKKK KK /
/*
“of
/*
Insert a word into an ordered linked */
/*
list. If the word is already in the ah
/*
list, increase the frequency count by /,
/*
one and do NOT insert a new node. xf
/*
*/
[RRR RKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KEK /
fs */
/* Get lines from input file. Fetch all word */
/* tokens from each line. Put words and their */
/* frequency count into a linked list with aL,
/* nodes in ASCII order. * /
/* */
[RRR RKKKKKK KKK KK KKK KKK KK KKK KK KKK KKK KKK KKK KKK KKK KK /
word[strlen(word) - 1] = ‘\0';
*mover = tolower(*mover) ;
}
TMS Cine GleiSieeaaWOlndpe
} /**** End for ****/
} /**** End while ****/
[KRKKKKKKKK KKK KK KK KKK KKK KKK KKK KKK KK KKK KKK KKK KK /
ee ey
/* Move across the word list. Print the rif
/* words and their frequency counts in =)
/* ASCII order to file freqg.out. zy
fs ci
[BKK KKKK KKK KKK KK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK /
The text file used as input and the program output follow.
Example 4.12 Input File for the Word Frequency Counter Program
To find out more about this issue, visit our Web site --
www.foobar.foo.
Ded
a
about
allowing
american
and
fe’) = (a>)
attention-starved
average
bad
be
been
best ee
ee
SO
ee
Chapter 4 Advanced String Handling
bottom
by
can
care
caused
children
combat
companies
competition
counseling
course
deal
due
else
employee
employees
even
face
families
family
find
for
frayed
get
global
hand
has
have
higher
however
in
increased
increasing
is
issue
issues
ie
joy
know
lack
leave
less
like
line
lives
look
margin
matters
morale RPP
RFP
MRP
WRPHEP
PRP
RP
BR
WP
WW
PWR
DP
WH
WRF
WNP
PPP
Pe
4.9 Case Study of String Handling: The Word Counter Program 119
more
must
nerves
non-work
objectively
of
on-site
or
our
out
overwrought
parents
paid
politics
pressing
pressure
productivity
profit
reluctant
site
some
steps
stress
stressed-out
take
telecommuting
that
the
their
them
these
they
this
time
to
usually
visit
volunteered
web
when
which
whose
with
workplace
would
www. foobar. foo
you Pe
OB
DR
NP
PROF
PHP
RP
PR
PRP
BRR
OWrR
BP
rPNMRPrRrPNMRPRFPRP
Pr
120 Chapter 4. Advanced String Handling
This program uses virtually all of the C idioms discussed so far in this book.
You used your knowledge of lists to create the list initialization, insertion, and
traversal functions. You used strtok() with an exceptionally long delimiter
list (all characters except letters, digits, hyphens, and period!) to parse each
input line in the file, given the rules for word parsing. You used your pointer
skills to clear whitespace from the user’s input. If you have completed this
lengthy exercise, congratulations; you’ve come a long way!
I hope you noticed that the input file, as it always should, tested the
extreme cases—words with periods at the end, words with periods in the mid-
dle, words that are all hyphens, and so on. Everybody loves to code, but few
delight in constructing a proper test of their code. It will serve you well to
develop this habit.
Now you're ready to make the final leap—the leap that will complete your
study of strings.
Arrays of strings are extremely versatile and are commonly used. Compil-
ers may put reserved words in an array of strings and then do a “table lookup”
to see whether a token parsed from a program is one of the reserved words.
An array of strings might be used to hold the lines of a text file in memory,
especially when the file is known to be of short or intermediate size. Another
use might be in a user interface to hold user input tokens, especially when the
number of these tokens is unknown. This data structure, like queues and lists,
has an endless variety of application domains.
Let’s start with the most important array of strings in a C program—the
command line argument list. Up to this point in the text, we have given the
main() function no arguments. Up to this point, if we wished to pass values
into the program, we have had to prompt the user for the value of these argu-
ments, one by one. Now we’ll show you the correct way to do it!
[KKKKKK KK EK KKK KK KK KEK KKK EK KK KK KK KKK KK KK KK KK KKK KK KEK KKK KKK KKK /
(es ey!
/* A demonstration of using and traversing command line */
/* arguments. These arguments are held in an array of */
i= pointer tor chat. */
ifs * /
[RRRKKK KKK KKK KKK KKK KKK ERK KEKE EKER ERK KIK KKK ERK KKK KA KKK |
4.10 Arrays of Strings (Arrays of Pointer to Char) 121
d#FHinclude <stdio.h>
Argument 0 is C:\ADVC\ARGV.EXE
Argument 1 is eye
Argument 2 is ear
Argument 3 is nose
As you can see, we traversed the command line argument list in two different
ways. We did it the first way to show that the argument count, represented by
the variable argc, equals the number of tokens on the command line, includ-
ing the filename of the program executable itself. The program name, and all
of the arguments on the command line invoking the program, make up the
argv array.
Then we traversed the list another way, by traversing the array of pointers
with a char ** variable (mover) until the pointer it pointed at became NULL.
The second for loop is the correct way to traverse the array, because it uses all
pointer arithmetic and shows you that operating systems always terminate the
argv list with a NULL pointer. This is a convention that you should adopt when
building your own arrays of strings.
Still don’t get it? Let’s look at a graph showing the state of this data struc-
ture before the first for iteration.
122 Chapter 4 Advanced String Handling
argv
mover C: \ADVC\ARGV.EXE\O
eye\0
ear\0
nose\0
Once you look at this diagram, it is obvious that mover isa char **. But so is
argv! Remember that the name of a one-dimensional array is a pointer to its
first element. If that first element is a pointer, then we must conclude that
argv is a pointer to a pointer.
What is the notation for the pointer that mover points at? It’s *mover. If
mover is pointing at the pointer that points at "eye", what is the notation for
the first "e" in that "eye"?
You guessed it, it’s **mover.
If you understood
pointers to pointers in Chapter 3, this should present no difficulty except that
the data here are strings instead of nodes in a list, stack, or queue.
If you’ve read the code in Example 4.14 then you know that mover++
moves the pointer to the next element in the array of pointers. This contin-
ues until mover points at the NULL pointer in the last element of the array. At
this point, *mover is NULL and the for terminates.
Here’s some more demo code to show that you can manipulate your own
array of strings just like the built-in argv array.
d#Finclude <stdio.h>
main()
{
char *commands[] = { “copy”,
"delete",
SOO UIE
"move",
"display",
"rename",
S@WINE
RWWILIL de
char **mover = commands;
while (*mover)
4.10 Arrays of Strings (Arrays of Pointer to Char) 123
puts(*mover);
mover++;
}
}
[RRKKKKKKKKKKKKKKK Program Output Below KKRKKKKKKKKKKKK KKK /
copy
delete
print
move
display
rename
quit
Just to mix things up, we used the terse form of the loop conditional, *mover.
If mover is pointing at a NULL pointer, then *mover is NULL (zero), which ter-
minates the loop. The longer conditional is slightly, but not markedly, more
readable.
We've been working with strings for a while now, but I want to remind you
of something you should have learned in beginning C: A constant string
(such as "hel 10") is a character pointer because the compiler replaces it with
the address of the string’s first character.
Also, you should know that the declaration of commands could have been
these two as well:
Now I’m going to stop and give you some really tricky notational questions
based on the following diagram.
argv
C:\ADVC\ARGV.EXE\0O
eye\0
ear\0
mover
nose\0
124 Chapter 4 Advanced String Handling
Assume that (1) The values inside the boxes are the starting addresses for the
strings, and (2) the values outside the boxes are the addresses holding the
string pointers.
Question: What is the value of mover - ptr?
Answer: — The value is 2. Why? Two char * units separate mover from
ptr.
Question: What is the value of mover?
Answer: The value is 1012. Mover has the address of the pointer to
"ear". The address of that pointer is clearly 1012.
Question: What is the value of mover[-1]?
Answer: The value is 6000. Mover[-1] is the same as *(mover - 1).
We know that mover - 1 is 1008, and dereferencing gives us
what’s in that address: 6000.
Question: What is the value of **mover?
Answer: The value is the 'e' in "ear". If *mover points at the 'e' in
"ear", then clearly **mover must be the ‘e’ itself!
Question: What is the value of *(*(mover + 1))?
Answer: The value is the 'n' in "nose". Why? Because *(mover + 1)
is the pointer to "nose". Dereferencing that address gives us
that 'n'"!
I hope you enjoyed this little pop quiz. You’re going to get a lot more of it in
the exercises at the end of this chapter. The payoff from mastering this nota-
tion is that if you find yourself in the unenviable position of having to read
obscure code, nothing will cause you anxiety!
At the risk of stressing something I’ve already stressed, I want to emphasize
that you’ll read far more C code in your programming lifetime than you'll
write. Poorly written code won’t be pleasant to read, but your boss will be oh-
so-appreciative if you can decipher it!
The pointer parameter is the pointer whose allocation size you are trying to
change. The return value is a pointer to the reallocated space. The variable
receiving the return value is usually the one passed as the first parameter, but
this is by no means a requirement.
4.11 Growing and Shrinking Arrays of Strings with Real 1oc 125
Realloc() contains a very serious trap for the unwary. The realloc’ed
space may be nowhere near the original space. Suppose you have a situation like
that shown in the next diagram.
mover
If you realloc(ptr), the realloc’ed space may be moved far away, which
means that mover would be pointing into thin air! The lesson? If you real -
loc() a pointer, be sure to update any pointers that were pointing to places
within the pre-real]oc’ed space. You do not have to worry about the values
in the original space; they are maintained.
Now you’re ready for some sample code. This code will create an array of
pointer to char from the ground up using realloc().
[RK KKK KK EK KKK KK KK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK |
ios i]
/* Reads lines from a file whose name is passed */
/* on the command line. These lines are placed */
/* into an array that grows dynamically via * /
/* realloc(). xf
[* */
[ RRRRKKEKKK KK KKK KKK KE KK KKK KKK KKK KKK KKERKKEKKEKKKEKKK KK |
#Hdefine POINTERS 5
define MAXLINE 1024
mover = line_ptrs;
while (fgets(line, MAXLINE - 1, fp))
{
if ((*mover =(char *)malloc(strlen(line) +1) == NULL)
{
printf("Fatal malloc error!\n");
Oxquic@Z)is
}
strcpy(*mover, line);
movert++t;
} /**** End while ****/
line 7
line 8
line 9
line 10
This is some of the most difficult code I’ve asked you to read. I hope you
tried. The following diagram illustrates how the code works. It shows you the
position of the mover pointer just prior to the realloc() call (whose return
should be tested for NULL just like malloc()).
Note that after the realloc() call and the mover = line_ptrs +
arr_size - 1 reassignment, mover is pointing at the last element of the old
allocation, not at the first element of the new allocation. This trick allows us to
assign *mover = NULL immediately after the loop ends. Remember always to
make the last pointer in an array of pointers NULL. That way, another program-
mer using your array can use the NULL as a sentinel value—no more data follows.
line_ptrs 1\0
2\0
3\0
4\0
/* Before first mover 5\0
realloc. */
6\0
7\0
8\0
9\0
/* Before second
realloc. */ LUE 10\0
I hope this little picture helped. The most crucial thing in this code,
because it is the easiest part to forget, is the reassignment of mover immedi-
ately after the realloc(). Remember, above all, that real 1oc() may pick the
whole array up and move it to a different location in memory!
Even though our example used realloc “buckets” of only 5 pointers, this
was merely to demonstrate that realloc() does what it is supposed to do. In
a “real-world” application, you should use more reasonable bucket sizes, such
128 Chapter 4 Advanced String Handling
<stdio.h> Functions
<string.h> Functions
Fact: 1) Works the same way as strchr() except that it looks for
the rightmost c in string. The definition of rightmost is
“starting at the first '\0' character after address
string”.
<stdlib.h> Functions
m 2. In each of the following parts, indicate what each printf() will print on
the screen.
P2taialees ples
pl += strlen("here"):
Did ee ass
}
DrANT
I(T AS S925 D2. Mehl ee (pase) 5
j- PrinthG22s™, “abcdetq” <-e2);-
mw 3. Write code fragments to do the following tasks. Declare all variables used
except string Ss.
a. Print the portion of a string s between the last two periods in it. If
the string has fewer than two periods, print "Not enough periods".
b. Print all the strings of non-digit characters in string s. Print one string
per line, and make no assumption about the number of non-digit
strings in Ss.
c. Print all the digit strings in s. Make no assumption about the number
of digit strings in Ss.
d. String s is supposed to be a binary number (all ones and zeroes).
Print the decimal value of the number in s if it is a valid binary num-
ber. Print "Error" if it is not a valid binary number. Assume that
whitespace is already removed from s.
e. Overwrite all the ‘#’ characters in s with a period. Do not visit every
character in $ to see whether it is a period! Use a string.h function.
c. Assumes that the only legal operators are '*', '+', '/',and '-".
d. Assumes that a correct expression has no parentheses.
e. Assumes that operands and operators must be whitespace-separated.
f. Assumes that all operators are binary operators.
§. Assumes that legal operands are signed or unsigned long ints in base
10 (decimal).
h. Evaluates the value of the expression from left to right.
i. Returns the expression value as a ]ong.
j- Returns HUGE_VAL if there is an error in the expression or the result
cannot be represented in the computer.
b. Reads the file in such a way that each line is put into one element of an
array of pointer to char.
c. Prompts the user for one or two line numbers.
d. Prints the line or line range of the file determined by the number(s)
input by the user.
e. Allows users to do this repeatedly until they enter "quit".
f. Detects and reports all errors in user input. You must decide for your-
self what an “error” in the user’s input is. Good luck!
You may assume that the starting addresses of the strings are 5000, 6000,
7000, and 8000, respectively. The &strs[0] is 1000, and your computer
has four-byte pointers. Given all of these assumptions, find the value of
the following expressions. Assume that no expression affects any string in
later parts of the problem.
aS URLOK(@SUtS),. ZOMm) be-striencs(strsat.2)))
**Strs do" (puriew))
oUnVenGsstrs + 2) f. * 2p ras cl
5 Ca owie ae ap) h.strpsat 3
BP
0geStrppRKsGptr - 1) + 1c") «4. puri
mw 9. You work for the Central Intelligence Agency. The CIA is giving you an
input file with text and numbers interspersed. The text is irrelevant, but
you are supposed to total all the numbers embedded in the text. For
example, if the text file consists of
10. A language expert is doing a study that counts the frequency of certain
words in an ordinary text file. Write a program that accepts three com-
mand line arguments: (1) a list of words, one per line in UPPERCASE,
(2) the name of an input text file, and (3) the name of an output text file.
The output text file will consist of a listing of all words in the wordlist file
(argument 1) and the frequency of their occurrence in the input text file.
Hint: Read the words in the wordlist file into a mal 1oc’ed or real loc’ed
array of strings. You must figure out the rest! Assume that the input text
file is all UPPERCASE.
ee t _
é p-*, “apa ll ai |7 - *
eo dad. eal .
aT os e = a
j iff vae
* %
<, The a7 ed
t
i 4
—_
4 \ aS 7
” : “Ke
nA :
, fi
i.
it ; Pea
= >
i
5
ex
' "
‘
f
A
7 i Pie
i
Y f
si
~~
et 7
| Advanced
Input and Output
If C is relatively new to you, it is quite possible that you have used only the
“magnetic tape” model to read or write disk files. That is, you have read or
written them from start to finish without backing up or randomly jumping to
some other place in the file. Although it is true that many C applications
don’t demand greater sophistication than this, many very important applica-
tions, especially database management systems, do.
Therefore, one of the main thrusts of this chapter is in the area of random
access input and output. Included in this discussion is the idea of hashing to disk.
Because Chapter 3 introduced the idea of hashing, we will concentrate less on
the hashing formula and focus on the more difficult task of hashing with
disks rather than with memory.
At the end of the chapter, we will also deal with advanced aspects of the
printf() and scanf() families. We will offer a more thorough discussion of
variable formatting (a topic we have introduced before). Other issues that will be
discussed are: non-decimal input and output, assignment suppression on input,
output field justification, string output, scan sets, and escape sequences.
137
138 Chapter 5 Advanced Input and Output
Most random access files are binary and have fixed-length records. How-
ever, before we get into record-oriented input/output, let’s introduce binary
files and random access in a simplified example. We will construct a file that
consists only of the integers 1 through 5. We will move around it, overwrite
certain values in it, and demonstrate all of the relevant C library functions for
relocating the file pointer, finding its offset, and reading/writing.
d#Finclude <stdio.h>
main(int argc, char *argv[])
{
FLEE tp.
Tht Cle 2 5 Sp Ao) oe) bol Kee MOVEladi,
long pos;
Although it is not included in the code, the function ff1lush() is very impor-
tant and deserves mention. Fflush() must be called when you are shifting
from writing to reading, or vice versa, in a file in update mode (i.e., a "+" in
the fopen() mode) unless you do an intervening fseek() or rewind(). See
Section 5.2 for a lengthy discussion of fopen( ) modes.
The operations in Example 5.1 are sufficiently important to warrant a dia-
gram showing the file pointer’s changes and the changes in the file after each
statement.
The size_t parameters and return type are almost always implemented as
long int, but of course, they are compatible with any member of the int
family. Although fp is depicted as a pointer, you should know that you cannot
manipulate this pointer directly. Only the file reading/writing functions can
do that. In this respect, FILE *’s differ from any other pointer type.
rewind(fp);
140 Chapter 5 Advanced Input and Output
The rewind() function positions the file pointer to the first byte of the
file. Thus, in this diagram, fp is actually pointing at the first byte of an integer
1 that occupies sizeof (int) bytes. We will not show a diagram for the state-
ment
freadjms bZeOTGiinitJee orenN pees
It reads the five integers that have been placed in the file. The prototype
for fread() is identical to that of fwrite(). The for loop following the
fread() simply prints out the integers obtained by the fread() as proof that
our program is working.
fp
The fwrite() writes one integer (the 6) and then advances the file pointer
to the next position. You must remember that an fread() or fwrite() always
5.1 Random Access Binary Files 141
advances the file pointer after the read/write. Note that we must pass &k
because fwrite() requires that the first argument be a pointer.
This moves the file pointer (fp) back one int unit. After this statement,
the program calls fread() to read the 6 written by the fwrite(). The 6 is
then output to the screen to prove that our overwrite operation was success-
ful. Indeed, it was.
This fseek() moves the file pointer one int unit before the end of file.
The fwrite() overwrites the 5 with k’s value, which is 13. The file pointer is
then advanced and is now at the end of file.
As we did after overwriting the 3 with a 6, we back up one int unit and
read the area we just overwrote. As the program output proves, we have suc-
cessfully overwritten the 5 with 13. After the final fread(), fp is now at the
end of the file again. Our byte offset from the beginning of the file is 5 *
sizeof(int). Because my computer has four-byte ints, the ftel1() func-
tion should return 20 into variable pos. The output proves the truth of this
assertion.
The ANSI C prototype for the ftel1() function is
londentellGrlle <*fp);
fp = fopen(argv[1], "wtb");
There are three appropriate modes for writing binary files for random access:
wtb
r+b
a+b
The wtb specification says, “I am going to create a new binary file that I
may also read.” The r+b_ specification says, “I am going to open an existing
binary file for both reading and writing.” The a+b specification says, “I am
going to open an existing binary file that I may read. All of my writes will go to
the end of the file.”
Thus, if a + is in the fopen() mode, it means you’re intending to do both
reading and writing. Of course, if you are going to do only reading or only
writing, you should leave the + out.
The b in the mode specifies that a file is to be a binary file. But what if your
file contains fixed-length character records? Should you use the b or not?
Because characters are one byte in length on all platforms, the answer would
seem to be that it makes no difference. However, many operating systems,
such as VAX/VMS, put hidden bytes into the records of nonbinary files.
Therefore, when you want “clean” files, don’t forget the b in the fopen()
mode.
We've already seen that FILE * variables are not like other pointers. We
cannot manipulate them except through stdio.h functions. Another pecu-
liarity of FILE * variables is their behavior when passed to functions. If the
file pointer is moved inside a function, that fact will be reflected in the calling
5.3 Record-Oriented Binary Files 143
#tinclude <stdio.h>
d#tinclude <stdlib.h>
#finclude <string.h>
144 Chapter 5 Advanced Input and Output
struct record
{
char key[50] ; /*** A very simple record!! ***/
ie
PILE “1raE
/*** Next line initializes all record bytes to zero! ***/
struct record hashtable[100] = {""};
struct record temprec;
long hash = 0;
char line[100], *mover;
fp = fopen(argv[1],"wtb") ;
if (memcmp(&temprec, "", 1 ) == 0)
{
printf("Sorry! Record not found!\n");
}
printf("Hash value on input is: %1d Key is: %s\n",
hash, temprec.key);
}
[RRKKKKKKKKKKKKK
Sample Program Session Below ****x*k*kKKKK /
As this piece of code illustrates, fwrite() and fread() don’t care whether
you are writing or reading one record or a million records. A standard initial-
ization trick was used to zero out all 100 records in the file. The trick was
value, zero. If zero is the value of the first character of every string in hash-
table, this is equivalent to the empty string, because the ASCII value of ‘\0’
happens to be zero. Thus, before we fill up hashtable, it is full of empty
strings.
The key field of each struct (the only field in each struct!) was hashed
in the same manner as in the hashing example in Chapter 3: by doing a mod-
ulo 100 on the sum of the cubes of the ASCII characters in the char key. How-
ever, we are not using the computed hash value to access an index in an array
of pointers as we did in Chapter 3. We are using the hash value to tell
fseek() where to put the file pointer in the statement
Because hash is the result of a modulo 100 calculation, its value range is 0
to 99—exactly the correct range in a file that can hold 100 records. A hash
value of zero means an offset of zero from SEEK_SET—the first record posi-
tion in the file. A hash value of 99 means that we skip over 99 records and the
file pointer is positioned at the first byte of the 100th (and final) record.
However, now I’m going to surprise you by asserting that the code in Exam-
ple 5.2 is a good illustration of binary record reading and writing but a terri-
ble illustration of hashing to disk!
Why? Remember that in Chapter 3 we spoke of collisions, two or more records
with different keys but the same hash value. This was no problem with an array
of lists; we simply put all the collidees in the same list. With the algorithm in
Example 5.2, if we entered a new record with the same hash value as the
“foobar” record, the “foobar” record would simply be overwritten—a disaster.
Furthermore, in a real-world application, you would seldom create the
hash table in memory as we did in Example 5.2 and then write it to disk. Such
an application might involve hundreds of thousands or even millions of
structs—memory that your computer might not be willing to give you!
Therefore, the correct method is to create an empty hash file with room for
the thousands of records. Then you hash and add the records to the file, one
at a time.
Disk is inherently less flexible than memory. A one-struct hash slot can
hold one struct, period. Therefore, we must come up with methods to
resolve the collision problem for hashing to disk. Hence ... the next section!
The following is a diagram of a file design using hash buckets and overflow
areas.
bucket 0] bucket 1
As you can infer from the diagram, when a record is hashed using this
method, it goes into a “bucket” that holds a predetermined number of
records (structs). Each bucket can hold some maximum number of col-
lidees. If any bucket is filled with collidees, there is an offset value (which can
be a fixed constant) that can be used to put new collidees into slots in the
overflow area.
This can get ridiculous, with overflow areas having secondary overflow
areas and so on. Can we avoid disaster without bogging our code down with
complexity? Yes, by making the number of structs per bucket high and by
making sure the file never gets full. Will this completely eliminate the chance
that disaster will occur: the chance of both the buckets and the overflow area
for a given hash value will overflow? Unfortunately, there is no mathematical
certainty that any bucket size will guarantee overflow protection!
Now that I’ve induced anxiety, let me calm your nerves by telling you that
when the bucket size approaches 10 with a file that is, say, 25% to 50% full,
the chances of overflow come very close to zero. That is, you may not even need
an overflow area if the bucket size is large enough. Thus, if you were designing a
data retrieval system for a student record file with a projected maximum of
20,000 students, you might consider 10,000 hash buckets with room for 8
structs each; that way the file will not be more than 25% full. If you add an
overflow area, the chances of overflowing the overflow area, even if it is of
modest size, approach nil.
Entire books (especially graduate student dissertations) have been written
about the mathematics of hashing and collision resolution. But you are reading
this book to see real code and to obtain rules of thumb that might enable you to
implement real-world systems without a Ph.D. in math. Therefore, we'll “cut to
the chase” with a code example that implements a reasonable hashing-to-disk
scheme for the minimalist one-field records from Example 5.2.
#Hinclude <stdio.h>
#tinclude <stdlib.h>
#Hinclude <string.h>
148 Chapter 5 Advanced Input and Output
struct record
{
char key[50]
Lee
FILE *fp;
FILE *create_hash_file(char *filename);
void search_or_insert(
FILE: *fp, ‘int insertilag):
fp = create_hash_file(argv[1]);
search_or_insert(fp, INSERT);
search_or_insert(fp, SEARCH);
}
Although the hashing algorithm is the same as the one used in Chapter 3,
we will present it again here for your convenience.
hash = 0;
for (mover = key; *mover != '\0'; mover++)
{
hash += (*mover) * (*mover) * (*mover);
}
hash = hash % TABSIZE;
return hash;
5.4 Standard Collision-Handling Methods for Hashing to Disk 149
Now we present the function that creates the empty hash file by dumping
an empty hash table (100 four-struct buckets) and an empty overflow area
(100 structs) into a new file.
says that the first row (left index 0 of hashtable) should be four empty strings.
This initializes all other rows to zeroes—also empty strings because '\0" and
integer 0 are the same value.
Note that fwrite() writes out the whole empty array to the file in one write
operation for the hash table and then the overflow area. Of course, we could
have just written a one-dimensional array of 500 empty structs, but this func-
tion tells the reader how we have constructed both the hashtable and the
overflow area. The small loss in efficiency is more than made up for by the
information the reader gains about how we wish to divide the file up.
Now let’s look at the function that gets user input/search requests.
Note that we use strtok() to clean whitespace out of the user’s input.
Although this is irrelevant to the topic at hand (hashing), we want to use as
much of what we have learned as possible!
Finally, we present the “workhorse” functions that insert records into the
hash file and search for records in the hash file. Because these two functions
have much in common, they are presented together.
5.4 Standard Collision-Handling Methods for Hashing to Disk 151
strcpy(temp.key, key);
[xxx Go to beginning of hash bucket. *******x*/
Ie (fseek(fp, hash * BUCKETSIZE—*—=sq zeof (RECORD).
SEEK SET)
t=a0))
{
printf("Fatal seek error! Abort!\n");
exit(4);
}
[**xKAKEX Find first available slot in bucket. ******x*x*/
for (1 = 0; 71 < BUCKETSIZE;: i++)
{
fread(adetect,. sizeof (RECORD), 1, fp);
if (*detect.key == '\0') [*** Available slot! ***/
{
TSeCCKhp a = Uses ZeomCRECORDD... SEEK CUR)
fwrite(&temp, sizeof(RECORD), 1, fp);
printf("Record: %s : added to bucket %41d.\n",
temp.key, hash);
return; /*** Nothing left to do! ***/
i
}
roses lip il CGOw wdiS Tale, | MWS WOOK Bie wes sesy/
[/****k*kx*k the overflow area. RE REAAE |
fseek( fp. MABSLZE 7 BUCKEISEZES*) si.zeor CRECORD) {SEEKSSEI ss
foe Cl = @s FT « ORMOWSIZIES jsmr)
{
fread(&detect, sizeof(RECORD), 1, fp);
lite CA eCECtakeC Vaan) PAE. AN ACID. CeeSiOil m s oy)
{
fseek(fp, -~1l * sizeof (RECORD), SEEK CUR);
fwrite(&temp, sizeof(RECORD), 1, fp);
printf("Record:%s: added to overflow slot %d.\n",
temp.key, 7);
return: /*** Nothing left to do! ***/
}
}
(Weer ts] gotathiiserateslgan in big trouble. 2° s**e/
printf("Hash table overflow! Abort!\n");
exit(5);
152 Chapter 5 Advanced Input and Output
And now, before we discuss the inner workings of this code, let’s view a
sample session with this program.
Notice that one thing this program does not do is prohibit duplicate record
entry. This is feasible with hashing to memory, because the average size of each
list in the memory hash table is likely to be quite small. Therefore, you lose
very little efficiency by searching the whole list.
With hashing to disk, however, you'd have to search the entire overflow
area to establish that you were not entering a duplicate. This would to a
large extent negate the speed of hashing. In all likelihood, a separate pro-
gram would be built to prune duplicates from the file. Database management
154 Chapter 5 Advanced Input and Output
systems, for example, have many auxiliary systems to ensure that files are not
corrupted.
The algorithm for the insert_record() function can be stated in English
quite simply:
The search_record() function looks through the hash buckets and the
overflow area in the same way as insert_record(). Therefore, we will leave
it to the reader to create this algorithm. Looking through the eight steps
listed for the insert_record() algorithm, you might wonder why we look at
the first character of a struct on file to see whether a slot is available. To
answer this, look at the user interface function, search_or_insert(). It
does not accept an empty token (a string with just '\0'). If a user enters
nothing or nothing plus whitespace, no token is collected and the user is sim-
ply reprompted. This is how nearly all commercially available software works.
It does not “punish” the user or issue an error message when the user does
not enter a valid token in response to a prompt. It just silently reprompts with
no message. Thus no actual record in the file will contain the empty string as
an actual key.
Finally, we don’t want the testing session to go unnoticed! Keys were
entered such that we eventually overflowed hash bucket 8. The code then
reacted correctly by putting new records with hash value 8 into the overflow
area. In the search, we deliberately entered record keys that were not repre-
sented in the file. The program correctly issued a "Not found" error mes-
sage.
Looking at the search_record() code, note how efficient the search is
when a record is there and how inefficient it is when a record is not there—it
has to search the entire overflow area! Is this a problem? Not when you con-
sider that users generally try to look for records they know are there. Always
make the most likely event the one that is most efficiently handled.
5.5 Random Access to Variable-Length Record Files 155
As you can see, the array index corresponds to a line number in the file.
Let’s assume that the array in the diagram above is called of fsets. The offset
of the first byte of line 4 from the beginning of the file would be held in off -
sets[4]. To go directly to the beginning of this line in the file, you would
simply say
fseek(fp, offsets[4], SEEK_SET);
You might be wondering about the time it takes at the beginning of pro-
gram execution to sweep through a file and put all the offsets in such an
array. However, look at the cost of not doing so. If a user wanted to display
lines 500 through 530 on the screen, you'd have to read all the irrelevant
lines before line 500 just to get to the start of the desired range of lines!
Also, the construction of offsets is a one-time cost—the delay occurs
before the first prompt only. Thereafter, the program fetches the desired
lines at lightning speed. Another good rule of thumb in programming is that
it is much better to delay users a lot on one prompt than to delay them moderately on
every prompt. Even if a file is a million lines, it is better to have users wait 20 sec-
onds for index array construction than to make them wait 5 seconds on every
prompt.
When you are using an array of 1ong as an index in our file browser appli-
cation, how do you know how many elements to allocate? You don’t! You
should start with a modest allocation (say 1000 longs) and use realloc()
(see Section 4.10, “Arrays of Strings”) to make the array larger, if needed.
This means that the array must start its life as a pointer, because you cannot
156 Chapter 5 Advanced Input and Output
You could construct index files for any of these fields if you wanted to do dis-
plays of the records in name order, age order, salary order, and so forth.
Here’s what an index file for 1astname might look like:
Readers who are interested in index file construction and other related
issues should consult a book on database internals or a treatise on file struc-
tures. An excellent choice is File Structures by Folk and Zoellick (Reading, MA:
Addison-Wesley).
In an exercise at the end of this chapter, you will be asked to construct a
file browser using the index array approach outlined at the beginning of this
section.
5.6 Advanced and Obscure Uses of the Printf and Scanf Families 157
[RK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KK KK KKK KKK KKK KKK KKK KKK KK /
ifs Ay
/* The input file used for this example consists of the */
/* three lines of data below this comment. ay
[* a/
[RRR KKK KEK EK KK KK KKK KK KK KKK KK AK KK KEK KEK KKK KK KK KK EK EK KK KKK KKK /
dFinclude <stdio.h>
main(int argc, char *argv[])
{
BULLE Tues
chan seut
pe Loo, junki loon,
*digits = "0123456789", pl[10], p2[l0];
ime > til, 2s
Tfl@ée <8
fp = fopen(argv[1],"r");
PS Canin tose mss VUE T
PriMthC pos ins OS NB eS UUiines cUt ty):
rewind(fp);
fsGant (ipowslaaZza cl ollserSc Uli);
OMI enGr oS \Nise SLUT
rewind(fp); [EAE ROS POAC mY INEM ore lies C ANes Came an ag
fscanterpy OLS sas Sturt)
fgets(junk,100,fp);
polntr(he%s ness \n",, stuff, junk);
£SCannGnDsiedodsy oStd ancy s
PSCanhGr Pe ed, ot nol Oxo,
oninian GC’2Z05d\ns-Sd\ne50\neox\n 4-1, 1.1, i)
Deinbne wie. of \ns- ici Nels SENNA ss Tne,
ote Keyl Las Tes OR
DeinthGeos in, didi Es.
Drintheezl0roS\n, digits):
158 Chapter 5 Advanced Input and Output
The first fscanf() gets the first word from the first line, “This”. The subse-
quent printf() outputs it in both 10s format and -10s format. Because
This contains only four characters, the 10s format right-justifies the field in a
width of 10. The ASCII space character is used to “fill out” the field width of
10. A negative sign in front of the field width in a format, such as - 10s, speci-
fies that you want left-justification.
After rewind’ing the file, the next two fscanfs use a scan set format
descriptor. The two forms of scan sets are [...] and [*...]. The first form
says, “Read characters into the variable so long as they are members of the set
enclosed in brackets.” The second form says, “Read characters into the vari-
able so long as they are not members of the set enclosed in brackets.”
The scan set in the second fscanf() says, “Read characters into stuff so
long as they are letters or the space character.” The scan set in the third
fscanf() says, “Read characters into stuff so long as they are not a comma.”
As it turns out, both of these fscanfs put “This is a bunch ofjunk” into the
string variable stuff. Although range notations (like A-Z) and the [*...]
scan set form are not officially recognized in ANSI C, the author has yet to see
a compiler that does not acknowledge them.
The fgets() should be self-explanatory. Fgets() reads from the file
pointer’s current position to the end of the line. The variable junk, there-
5.6 Advanced and Obscure Uses of the Print fand Scanf Families 159
fore, has", followed by a bunch of junk!" in it, as well as the '\n' char-
acter that fgets() keeps and gets() does not keep. When using fgets(),
remember to allow room for the terminating '\0" character. Never use a
string without a terminating '\0'!
The next batch of printfs show how to output in octal format (%0), hexa-
decimal format (%x), floating-point format (%f), and exponential floating-
point format (%e) with both left- and rightjustification. Note the printf ()
that said
Fl and f2 correspond to the two asterisks in the format *.*f. We have seen
this before. It is variable formatting. If f1 is 8 and f2 is 3, the final x will be out-
put in 8.3f format. In fact, that is the case in Example 5.9; the 8 and 3 on the
second data line in the input file were assigned to f1 and f 2, respectively.
A couple of fprintfs display string processing. This is an area where mis-
takes are easily made. For example, say a string s consists of “"ABCDEF" and
you type
What will appear on the screen? If you said "ABC", you were tricked! When
any type of data is being output, if a simple integer field width is smaller than
the actual value being printed, C ignores the field width and uses whatever
space is needed. If you really want ABC to go to the screen, then you need to
type
fDeMCh
Cee. SS 4, SI;
StreOVCbidGS TRING. es
Streaucpigstring, Si):
160 Chapter 5 Advanced Input and Output
SePCMECOICISEPIMCGs, SANE
Stneat (bigs tming, S35
strcat(bigstring, s4);
Sscanf() is a very good partner when used with gets(). You collect a
whole line from the screen with the gets() and then read the values in the
string with sscanf(). This approach offers a major advantage over using
scanf(), because scanf() will not collect the '\n' at the end of the line.
This often causes programmers a lot of grief! Thus the rule for reading from
the terminal is simple: Get the whole line with gets() and then read the obtained
line with sscanf().
<stdio.h> Functions
Mode Meaning
rb Read binary data.
wb Create new binary data file.
ab Append binary data to existing binary file.
r+b Read and write an existing binary file.
wtb Read and write a new binary data file.
atb Read and append to an existing binary data file.
Exercises
a. Open existing binary file foobar. dat for reading and writing, and
attach it to fp.
b. Assume that foobar. dat is a binary file with an unknown number
of struct foo. Print to screen the number of struct foo in the
file.
- Read the “middle” struct foo in the file, where “middle” is the num-
ber of struct foo divided by 2—for example, it is the 30th struct
foo if the file has 60 or 61. Use the result of part b.
. Transpose each pair of struct in the file (the Ist and 2nd, 3rd and
4th, and so on).
. Turn the file’s contents backwards so that the last record is the first,
the next-to-last is the second, and so on.
Chapter 5 Exercises 163
. Allows the user to pass the name of an input file on the command line.
. Expects the file to be text and, therefore, opens it as a binary file for
reading.
. Sweeps through the file putting the byte offsets of line beginnings into
an index array.
. Repeatedly prompts the user for a line number or line number range
(two numbers separated by whitespace).
. Checks the user’s numbers for validity. You must decide what validity
means!
Prints the line(s) requested by the user on the screen.
. Uses strtok() and strtol() to get the numbers from the user’s
input line and convert them to long.
. Uses fseeks to get to the correct place in the file to fetch the user’s
lines. Do not read the input file into a data structure in memory.
Do full error checking! Make no assumptions about the size of the
input file.
4. Write a function that deletes a record from the on-disk hash table con-
structed by the code in Examples 5.3 through 5.7.
a. eae OAs
DimiataG e000) 5. Is
b. ioieeaie 204);
Dirintr wOx. ts
floaty xX = 23.123456;
intee=16o"j*= 23
DieimbiCre*.
<1, liaise
. char *s = "abcdefghij";
Draintt (2-10.45, #s)
char *s = “abcdefghij";
Dini
( 245". SJ)
164 Chapter 5 Advanced Input and Output
Field | Format
lastname Variable length up to 20 characters.
firstname Variable length up to 20 characters.
amount 2 digits after the decimal point and at least 2 before the
decimal point. Min = 10.00, Max = 99999.99
student_id Exactly 4 digit characters. No more, no less!
w 7. Write a program that accepts text lines from the user at the terminal and
enters them into a binary file. In case you are wondering, this will create
an output text file that is free of platform-independent junk! You will use
this file in Exercise 8.
a 8. Write a program that accepts an input file argument. The program will
rewrite this same file so that it is backwards. The output file must be the
same file given as an argument. Do not use a separate file for output!
Do the same as in Exercise 8, but this time make the output file a different
file. You'll notice that this is approach is much easier (and saner, too). Of
course, this means that the command line invoking your program will
have two arguments instead of one.
10 Assume that you have done Exercise 6. Write a program that enables a
user to swap the positions of any two records in the file. Users will be
allowed to swap pairs until they enter some indication that they wish to
quit.
~.
y aie’ ; ral oa ee
S Ty im kalba :
aS. a vy
WALA cpt
wed) “dere
—_——
iw qi cartySaw ge
\ 8) aie abyried a4 wulog
preyed Mag ait AWA.» irre
- wo
Th (ROT
a an ; :
- eae)
~ —_
Sfxa) 1) =F he
mC een,
Some people have jokingly referred to ANSI C as “the greatest assembly lan-
guage ever invented.” One of the reasons they say such things is that C allows
the user to operate at the bit and byte level normally associated with assembly
languages. We have seen some of that in the string.h functions memcpy(),
memset(), and memcmp(), which do bitwise copies, value assignments, and >
comparisons, respectively.
Bit manipulation is featured in an amazing variety of different areas, such
as graphics, robotics and industrial automation, cryptography, and game the-
ory. We will start with the basic operations and then move to lise ale oper-
ations that combine them.
167
168 Chapter 6 Bit Manipulation
As you can see, only two ones results in a one. All other combinations
result in a zero. Now that we’ve seen how bitwise AND works on two bits, let’s
see how it works on variables with many more bits.
dFinclude <stdio.h>
main()
{
unssdned=short--as—= 12, —be=—45 we:
C= awk Db;
printf("a & b = Zhu\n", c);
}
[RRKKKKKKKKKKKKKK Program Output Below KEKKKKKKKKKK KKK KKK /
Let’s see why we got this result in a diagram that incorporates the knowledge
we have of bitwise AND from the truth table above. We show a and b in their
binary forms along with the result.
C= VUOUMOOVTBOWWMODMLOT
As you can see, the only way to get a zero from a bitwise OR is if both bits
are zeroes. Given this knowledge, let’s demonstrate bitwise OR with 16-bit
numbers.
6.1 Bitwise AND, OR, Exclusive OR, and NOT 169
dFinclude <stdio.h>
main()
{
Let’s see why we got this result in a diagrain that incorporates the knowledge
we have of bitwise OR from the truth table above. We show a and b in their
binary forms along with the result.
c= OWOOWOOOOOWOTAIOD
Let’s look at the truth table for bitwise exclusive OR (better known as XOR).
As you can see, XOR results in a one if one of the two bits is a one. If both are
one or both are zero, the result is a zero. Given this knowledge, let’s test XOR
with two 16-bit numbers.
d#include <stdio.h>
main()
i
Unsioned short a = 26, Db = 20ec;
@ = ey dove
170 Chapter 6 Bit Manipulation
C= UHOUVVDOHODOOOOOOOXIOOD
By the way, you may have noticed that we used the printf() format
descriptor Zhu to output our unsigned short numbers. As you may have
guessed, the u means “unsigned” and the h means “half of an int” or, in
other words, a short.
The bitwise AND (&), OR (|), and XOR (*) operators are known as binary
operators. That does not mean that they operate on binary numbers. It means
that they take two operands. The bitwise NOT (~) operator is a unary operator.
It takes only one operand. The bitwise NOT simply inverts the value of every
bit. That is, each one becomes a zero and each zero becomes a one.
The largest unsigned short, if they are 16-bit types on your computer, is
65,535. Thus, if we take the bitwise NOT of 65,534, the result should be one.
The following diagram shows why.
d#Finclude <stdio.h>
main()
{
unsigned short a = 65534, b;
b = ~a;
printt( b =ZhuX\n", b);
}
[BRAKE KIAKKIIAKEIIAAK —Program Output Below OC aR /
b= 1
SY
6.2 Bit Shifting 171
dHinclude <stdio.h>
main()
{
unsigned short a = 17, b = 11;
al = @ ® lope
el —" Gy [oye [AAS NG) UCMOF soa teeS:NOU Gis Onl 7, again aos,
printf("a = %hu\n", a);
}
[RREKKKKKKKKKKKKKK Program Output Below KKKKKKKKKK KKK KKK KKK /
a= 1/7
Example
6.6 Demonstration of the Bitwise Left Shift
and Right Shift Operators
#include <stdio.h>
main()
172 Chapter 6 Bit Manipulation
{
unsigned short a= 3,.b = 7, ¢ = 32768; ale
d=a << 3;
Dicinibrdia <<eS, =e eNom Dis
de=sbepe ce
printt(“be>> 2) = ShuNn ed)
d=c >> 14;
Drinurercs>> =14s atin smu)
}
KK Program Output Below KKKKKKKKKKKKKKEKEA /
[KK KKKKKKKKKEKKKKKK
a << 3 = 24
Deo cee
co>> 14. ="2
oo __________
on, toggling it turns it off. Ifa light is off, toggling it turns it on. First, let’s see
how you can set a range of bits. Setting bits means giving them a value of one.
#include <stdio.h>
main()
{
unsigned short a= 32, b= 13, c:
ime NUinMotcs = 3. Stalriloite = 2s
c = 60
c = 15
I often call these formulas the “line noise” formulas because they look like the
characters that appear on your screen when your modem malfunctions! The
formula shown turns on numbits bits, starting at bit position startbit. Let’s
break this formula apart in a series of diagrams.
bit position 15 0
a)(unsigned short) ~0 Terie ate oe al Sil Sake dibs Ml ate bee eal
DIFSSUITG OF @) KK Mites GG) |1 rtrd tid dt lO © @
c)~ of result of b) SVUOOOODOOOUOODOOUA GD
aresuilte OF ©) <K Suairubie (2) =O O DOOM OMOOOOLLIODD
e)value of variable a (32) =OOCOOODOWDDOILOWO DOD
The code in Example 6.7 also turns on three bits of the number 13 starting
at bit position 1 to illustrate a point about this formula. That point is that bits
that are already set, stay set. Here’s the proof.
bit position 15 0
a)(unsigned short) ~0 sividagdagateiiaa.2
HIMESUIME OF ay <—K mumoies |B) eaotetidghaggkhangdtk dc OO
c)~ of result of b) SQ QOOOODODOOOODOAOA fil
GROSUIE OF C) —K Stareois CI) SOOOOOUTVOOMOOOL A aw
e)value of variable a (13) =O 0 OC OWODOO OM OM tt Oa
Notice that the bits in b that were already set in positions 2 and 3 stayed set,
as did the bit in position zero that was unaffected by the OR.
You will be quite surprised to find that the formula for turning offa range
of bits is only slightly different from the formula for turning a range of bits on.
Here is some code demonstrating this formula.
#include <stdio.h>
main()
{
VMSICGMEC Shore a =] Sil, io = WY), CE
Me MUMS, Siwevewoi
es
numbits = 3; startbit = 1;
c= a& ~(~((unsigned short) ~0 << numbits) << startbit);
printf("c = %hu\n", c);
c= b & ~(~((unsigned short) ~0 << numbits) << startbit);
printf("c = Zhu\n", c);
}
[RR KKKKKKKEKKKKKK KK Program Output Below KKEKKKKKKKKKKKKKKKK |
ea 7
c = 17
As you can see, unsetting bits involves one more level of bitwise negation (~)
plus the use of bitwise AND (&) instead of bitwise OR (|). Just as the bit setter
formula left bits set that were already set, the bit unsetter formula leaves bits
unset that were already unset. For example, in Example 6.8 we unset three
bits (numbits) starting at bit position 1 (startbit). The value being oper-
ated upon was 19 (b), which in binary is
As you can see, we did zero out three bits starting at bit position 1, and the
two bits in positions 2 and 3 which were already off, stayed off. Try following
6.3 Bit Masking Formulas 175
each operation in the unsetting formula as we did in the diagram for the bit-
setting formula.
Of course, there is also a formula to “toggle” numbits bits starting at bit
position startbit. Guess what? You will be given this as an exercise in the
end of the chapter. Don’t worry, the answer is in the Appendix. A giant hint:
Use XOR to develop this formula.
The next trick you will be shown is how to overlay a bit pattern onto a
range of bits. You may want more flexibility than the “all-or-nothing”
approach of turning a range of bits all on or all off. You may want, for exam-
ple, to overlay the bit pattern 11001 onto an unsigned short starting at bit
number 2, or something like that. The next piece of code will show you how
to accomplish this feat.
#Hinclude <stdio.h>
main()
{
unsigned short word = 127; Petes MAMI ait TOMA. eee //
unsigned short pattern = 9; /*** 001 in binary: KKK /
unsigned short overlay(unsigned short word, int startbit,
int numbits, unsigned short pattern);
word = 103
mo
Notice that the bit overlay method first uses the bit unsetter formula to zero
out the bits that will be overlaid. Thus, the bit pattern in word goes from
1111111 to 1000011 because the bit unsetter turns off four bits (numbits)
starting at bit position 2 (startbit). Note that the bitwise operators can be
used in assignments like +, *, and so on. Therefore,
is the same as
In the code of Example 6.9, we have unset four bits of word starting at bit
position 2. Then we OR the result (which is in word) with pattern <<
startbit, resulting in the following operation:
a =
In assembly languages, there are often commands for rotating bits. First, we
have to answer the question “What does that mean?” Let’s say we have a string
of bits like this:
TE OMUF0F 0000s 0202050" CLORO aT
A right rotation of this bit pattern by two bits would result in
OF IOs 070 0" OF Os; 00 s0r OE Cs0
Now, if we rotated this pattern to the left by three bits, the result would be
OF020705
02020. 0205080 OROS0 sleet
Notice, however, that we do not need a formula to do both left and right
rotation. A left rotation by n bits results in the same value as a right rotation
by 16 - n bits. Here is some code that accomplishes a right rotation.
#Hinclude <stdio.h>
main()
{
unsigned short word = 3;
unsigned short rightrot(unsigned short word,
int numbits);
Word = 32769
dHinclude <stdio.h>
main()
{
unsigned short word = 127;
int testbit(unsigned short word, int bit_to_test), i;
void printbits(unsigned short word);
[ERK KKKKKKKKKKK KKK KKK KKK KKK KK KKK KEK KKK KK KKKKKKKKKKKKEK |
[KRKKKK KKKKKKK /
[RR KKKKRKEKK KKK KKK KKK EK KEKE KK EK KKK KEKE KK KKK EKER ERK ERK EK /
word & 1;
return word;
}
[ RRR K KKK KKK KKK KKK KKK KKK KK KKK KKK KK KK KKK KKK KK KK KK |
[KKK KKKKKK /
[RRKKKK KKKKKK /
Bit 0 of word is ON
Bit 1 of word is ON
Bit 2 of word is ON
Bit 3 of word is ON
Bit 4 of word is ON
Bit 5 of word is ON
Bit 6 of word is ON
Bie 7 Or WoirG iS ORF
aie © OF Wore 1S ORF
Bit 9 of word is OFF
The testbit() function uses a very simple principle: when you perform the
operation
something & 1;
it can have only two possible outcomes—one or zero. It’s easy to see this in
binary.
Thus the trick used by testbit() is to shift the bit you wish to test into the
rightmost position. If that bit is set, the result is one. If that bit is not set, the
result is zero. The printbits() function actually uses testbit() to do itsjob!
6.6 Applications for Bit Manipulation Formulas and Concepts 179
No new functions were introduced in this chapter, nor were any old functions
used in a new way.
Exercises
for the color bits. On some old personal computers, even a long is 16
bits, so be sure to find out what kind of architecture your machine has.
gOS. Write a function that takes two parameters: (a) a character string contain-
ing a bit pattern, and (b) an unsigned short. This function will search
for the bit pattern within the unsigned short and return the starting
location. If the bit pattern is too long, return -1. If the bit pattern is
invalid (assume all characters in this argument are supposed to be ones
or zeroes) return -2. If the bit pattern is O.K. but cannot be found in the
unsigned short, return -3.
. Write a function that encrypts a data file by exclusive OR’ing all of the
characters in the original file with the letter X. Write another function
that decrypts the encrypted file back to its original form. Yes, it’s a rather
primitive encryption/decryption scheme, but if you want, you can use a
more complicated encryptor/decryptor than X!
. Write a function that does a left bit rotation. Look at Section 6.4 if you
have forgotten how we did a right rotation.
wl0. Write a function that toggles numbits bits starting at bit position
startbit. Assume that the user gives sensible values for numbits and
Start bit.
ell. Write a function that returns the bit pattern of an unsigned short from
bit endbit down to bit startbit as a string. Do full error detection inside
the function, and make no assumptions about the size of an unsigned
short on the target computer. Return a NULL pointer if the user’s
startbit or endbit values are illegal.
Recursion is simple to explain and hard to follow. It is simply the ability of a C
function to call itself. The problem. with recursion is that the algorithms that
‘employ it are often slow, consume many system resources, and may be diffi-
cult to understand. However, some data structures, especially binary trees, are
manipulated rather well with recursive algorithms.
We will look at some very simple recursive algorithms (computation of fac-
torials and the Fibonacci sequence) and then see how to convert them to iter-
ative algorithms (algorithms that do not use recursion). After we’ve explored
the fundamental ideas about recursion, we will move to a data structure that
is very useful and is usually manipulated with recursion (that is, binary trees).
Oe = Se) 120
As you can see, the factorial of 5 can be said tobe 5 * factorial (4). In gen-
eral, the factorial of any number n can be said to ben * factori al (eer).
The fact that we can create a definition of the factorial function that refers to
itself makes the recursive computation of the factorial of any integer n rather
easy to follow. Here’s the code and the output created by the code.
183
184 Chapter 7 Recursion and Binary Trees
dHinclude <stdio.h>
main()
{
long fact(int number);
int number;
char line[80];
Enter an integer: 1
THN: FACEGIA OF 1 FS I
Enter an integer: 2
We FACLORIA OF 2 aS Z
Enter an integer: 3
ines factorial of -3 is 6
Enter an integer: 5
The factorial of 5 is 120
Enter an integer: 7
The factorial of 7 is 5040
Enter an integer: 8
The factorial of 8 is 40320
Entersaneinteger: O -/*** End of Anputmimdncavokas <> */
$ /*** Operating system prompt. We’re done! ***/
comes first. The stopping case in the fact() function says, if the parameter
number passed to itis one, return 1. The recursive case says, if number is any-
thing other than one, return number * fact(number - 1).
Let’s diagram how this function works until it hits the stopping case and
then winds back through all the recursive calls and returns to main().
Cali is Pecurn | = Face(ys
Calli Ze weir 4 s qFaew()ye
Galli] BS WAC S = aWaAeiEC2ss
Gali) 48 recur 2 = ACEC) <
Call 5: return 1; /**** The stopping caselll ****/
NReWUeM wO (Calli) Abe jeeneuien) 2 x ils
Return to: Call 3: return 3-* 2; “/*** The 2sis returned: Dy
Galli Zh eeky/
Return to Call 2: return 4 * 6; /*** The 6-is returned by
Calle se Sess7
Return to Call 1: return 5 * 24; /*** The 24 is returned by
Call 2. ses/
/**x 120 Returned to
main(). ***/
This was an easy recursion to follow, but I want to make it perfectly clear
that we are using the fact() function merely to illustrate recursion. Here’s
how you would actually write fact().
Notice that once you get past the second number in the sequence, each num-
ber in the sequence is the sum of the two previous numbers; for example,
186 Chapter 7 Recursion and Binary Trees
d#Hinclude <stdio.h>
main()
{
int fibonacci(int number), number, result;
char line[80];
Enter an integer: 1
Fibonacci number 1 is: 1
Enter an integer: 2
Fibonacci number 2 is: 1
Enter an integer: 3
Fibonacci number 3 is: 2
Enter an integer: 4
Fibonacci number 4 is: 3
Enter an integer: 5
Fibonacci number 5 is: 5
Enter an integer: 6
Fibonacci number 6 is: 8
Enter an integer: 7
Fibonacci number 7 is: 13
Enter an integer: 8
Fibonacci number 8 is: 21
7.2 Recursive Computation of the Fibonacci Number Sequence 187
Enter an integer: 9
Fibonacci number 9 is: 34
Enter an integer: 10
Fibonacci number 10 is: 55
Enter an integer: 0
$ /*** Operating system prompt. Session is done. ***/
[XAKKASS @Sequence, ts,.corrects 1 1 2532558013021 3425 5eg Ae ary,
FUCA)
\ AP ier)
| spe erie! Sag a2
iL PCA se wiley) 1 1
i 1 PulC2) se FiloCh) 1 1
1 iL
Fibs) TibC2)
i Ss
FUCZ) ar aio) 1
eau
1 Hl
Pretty scary. Now I’m going to ask you to count all of the 1’s in the diagram.
There are 13 of them, aren’t there? Look at the code output for the seventh
Fibonacci number. That’s right—it’s 13, just the number we arrived at by fol-
lowing all the recursive calls out to their stopping cases in the diagram. The
diagram does not represent an elegant proof, but it is certainly conclusive!
As usual, the Fibonacci sequence can, and should, be generated with an
iterative algorithm. One look at the foregoing diagram should convince you
188 Chapter 7 Recursion and Binary Trees
that the function-call overhead for the recursive algorithm is way too expen-
sive. Think how many calls there would be for, say, the 50th Fibonacci number!
Thus, we now present the iterative version of the Fibonacci number generator.
This is certainly a lot longer than the recursive version of fibonacci(), but
believe me, it will run a lot faster. Count the calls to fibonacci() in the tree
diagram that followed all the recursive calls out to the stopping case. That’s
right—there are 24 calls to fibonacci(). That’s a lot of overhead. And this
overhead increases depending on the parameter value.
The algorithm presented in Example 7.4 will perform exponentially better
than the recursive algorithm as the value of number increases.
Now we’re ready to look at a data structure that is usually inserted, searched,
and traversed via recursive algorithms: the binary tree data structure.
The binary tree data structure makes possible rapid data access (not quite
as good as hash tables but close enough!) and allows for sequential record
processing. We will look at recursive and iterative algorithms for inserting
into, deleting from, traversing, and searching binary trees. First, let’s see how
insertion into a binary tree works graphically. We’ll assume that our records
are minimal and have only one field, a string. Let’s also assume that the
record keys are entered in this order:
middle
ZOO
cat
poodle
barge
riddle
mailer
apple
person
giraffe
Here’s a view of how the tree will change after every two insertions (after the
“middle” node).
4) tree
190 Chapter 7 Recursion and Binary Trees
And now the final picture of the tree after all 10 insertions have been made:
tree
Unlike the trees in your yard, computer trees have their roots in the air
and their leaves on the ground. All new nodes are leaf nodes. It is called a
binary tree is because each node has two pointers (a “left” pointer to a node
whose key value is less than the given node and a “right” pointer to a node
whose key value is greater than the given node).
In the two diagrams above, all unused left or right pointers coming out of a
node have the value NULL, so this is really an easy concept. When a new node
is to be placed in the tree, you start at the root node (the one pointed at by the
tree pointer) and ask the question “Is the key to my new node greater or less
than the key to the node being pointed at?” If the answer is “less than,” you
follow the left pointer. If the answer is “greater than,” you follow the right
pointer. This process continues until a left or right pointer is found that is
NULL.
Without further ado, we will look at two different versions of the tree inser-
tion routine. The first is recursive and the second is iterative.
in ClCstree)p)
{
if ((*tree = (NODE *) malloc(sizeof(NODE))) == NULL)
{
printf("Fatal malloc error!\n");
yeeil)
7.3 The Binary Tree Data Structure 191
j
if (((*tree)->word = (char *)
malloc(strlen(word) + 1)) == NULL)
{
printf("Fatal malloc error!\n");
OXI TtC295
}
(*tree)->left = (*tree)->right = NULL;
strcpy((*tree)->word,word);
return
}
compare = strcmp(word, (*tree)->word);
if (compare < 0) insert(word,&(*tree)->left);
else if (compare > 0) insert(word,&(*tree)->right);
else printf(“Word is already in tree.\n”);
aad
Well, you already know that this algorithm is bound to be slower than the iter-
ative one, so let’s proceed immediately to the correct version of insert().
while (*tree_ptr_ref)
{
compare = strcmp(word, (*tree_ptr_ref)->word);
if (compare > 0)
{
tree_ptr_ref = &(*tree_ptr_ref)->right;
}
else if (compare < 0)
{
tree_ptr_ref = &(*tree_ptr_ref)->left;
}
else
{
printf("Node already in tree!\n");
return tree;
i
}
if ((*tree_ptr_ref = (NODE *) malloc(sizeof(NODE))) ==
NULL)
192 Chapter 7 Recursion and Binary Trees
Let me admit straightaway that many purists will howl with indignation at
the iterative insert(), because it uses double indirection (the variable
tree_ptr_ref) and they would substitute a version that is something like the
tree version of the “chasing pointers” method we used with linked lists.
The beauty of the iterative insert() is that when it gets down to an inser-
tion point (a NULL pointer), it has a reference to that NULL pointer called
tree_ref_ptr. The following diagram shows exactly what is going on just
after the while and before the malloc().
tree
tree_ref_ptr
Assuming that the new insertion is to go off the right pointer of the right
second-level node, the tree_ref_ptr has the address of this NULL pointer,
and *tree_ref_ptr is its name! We leave it to the readerto understand the
not-very-difficult recursive version of insert(). It, too, uses the dreaded dou-
ble indirection, because if the tree is empty, then the value of tree will
change, and therefore we must pass its address so that it comes back to the
calling environment with a changed value. Otherwise, the tree would (mistak-
enly!) stay empty forever.
7.4 Search Times in Binary Trees 193
Searching for a node in a tree is not very much different from inserting a
node into a tree. You do the same kind of key comparison seen in insert().
Again we show a recursive version followed by an iterative version.
In this case, the iterative find() isn’t even more notationally formidable than
the recursive find().
you have a randomly inserted binary tree, the time it takes to find a target node
is roughly
1.4 * logs Number-of-Nodes
Thus if you had 1024 nodes (2 to the tenth power), you would find the tar-
get node, on average, in 1.4 x 10 searches—that is, 14 searches. In a linked
list, it would take 512 searches! And the larger the number of nodes, the
more the comparison favors trees. If you had about 16,000 nodes, your binary
tree search would access about 1.4 x 14 (about 20) nodes. The linked list
would need to access 8000 nodes—that’s a lot!
But how can you ensure that your insertions are random? If you insert
nodes that are already sorted, your binary tree is just a linked list that wastes
all of its left pointers! If each node that comes in has a key greater than the
preceding node, only right pointers will be followed.
If you read a data structures book, you will see many algorithms for creat-
ing balanced binary trees. These are trees whose height differs by no more than
one level from any node’s point of view. Here’s a picture showing a balanced
tree and one that is not balanced.
balanced_tree out_of_balance_tree
In the balanced tree, from the root node’s point of view, there is a differ-
ence of only one level between its left and right subtrees. In the out-of-bal-
ance tree, there is a difference of two levels between its left and right subtrees.
Does this mean that we must use a balancing algorithm?
It turns out that the balancing algorithms may consume more time than
they are worth. Another ugly fact is that the code is quite complex. Therefore,
it may pay big dividends to randomize input before it is inserted into a tree structure.
How can you do this? By using random numbers.
Let’s say you had a minimal data file with only one field per line. You
would read the file and create a temporary output file with one extra field per
line—a random number generated by the ANSI C rand() function. Then you
7.4 Search Times in Binary Trees 195
would use your operating systems’s sort utility and sort the records by this ran-
dom number. The next piece of code will show this for a file that is in sorted
order (an order that creates a horrid tree unless randomized).
dHinclude <stdio.h>
#Hinclude <stdlib.h> /*** Has srand() and rand() ***/
#Hinclude <string.h> /*** Has strchr(). ARK /
main(int argc, char *argv[])
{
FELE A Tpiny “fpout:
char outfile[80], line[512];
}
[RR KKKKKKKKKKKKK KKK KK Original File Below KKKKKKKKKKKKK KKK /
foo
bar
apple
papa
crab
banana
moo
bar 8457
crab 10193
apple 10616
196 Chapter 7 Recursion and Binary Trees
moo 18104
foo 18655
banana 25964
papa 31877
This is just one technique to get rid of the overhead of using a balancing algo-
rithm during program execution. The next section discusses another head-
ache in tree manipulation.
The result is a very messy algorithm whether you are talking about the iter-
ative or the recursive version. Is it worth it? Generally, not at all. In most ses-
sions where you are inserting and deleting from a binary tree, you are not
going to delete, say, half the tree. But, even if you were, and if you had a
16,000-node tree, your searches will take 10 node accesses instead of 20.
No user will notice any change in program execution speed when such
small numbers of node accesses are involved. They'll notice the difference
between, say, 10 and 1000 (like the difference in access times between trees
and lists), but not between 10 and 20. Thus the best deletion method is
known as lazy deletion. In this method, you simply put in the key a character
that is not part of any legitimate key. In the exercises, you will show how doing
so changes insertion, deletion, and traversal of binary keys.
printf("%s\n",tree->word);
printtree(tree->right);
}
eS
#Hinclude <stdlib.h>
#Hinclude <string.h>
#Hinclude <stdio.h>
struct node
t
char word[20];
NODE *left, *right;
+
main()
{
char word[20];
NODE *tree = NULL;
NODE *find(char *word, NODE *tree);
NODE *insert(char *word, NODE *tree);
void printtree(NODE *tree);
printtree(tree) ;
}
[RKKKKKKKKKKKKKK Sample Program Session Below KKKKKKKKKKKE /
the tree, print it! Of course, if find() does not find word in the tree, it
returns NULL, which conveniently equals “false” in an if conditional.
struct treenode
{
char *key; /**** Yes, we want totally dynamic
MOdesi is)
struct treenode *left, *right;:
struct linenumber *front, *rear;
Ts
Conceptually, a tree of queues will be no harder to deal with than the
Operating System Queues example of Chapter 3 (examples 3.20-3.23) or
Exercise 5 of Chapter 3. This data structure, a hybrid of a linear data struc-
ture (a queue) and a tree, represents the final level of pointer-handling com-
plexity addressed in this book. Have fun with Exercise 9!
Remember, above all, that when you are constructing a binary tree, the per-
formance of the tree in insertions and searches depends, to a large degree, on
how random the insertions were. If you cannot guarantee randomness, then
you must rely on tree-balancing algorithms. These algorithms are inherently
difficult to understand and create a considerable runtime cost.
The message here is simple: Randomize the data before insertion. This may
involve creating a couple of utility programs to do so, but once created, these
programs will enhance tree performance meaningfully at runtime. Your users
will be happier.
200 Chapter 7 Recursion and Binary Trees
<stdlib.h> Functions
Exercises
m@ 1. Write a function that reads an indefinitely long string and prints it back-
wards. This string will be terminated when the user hits the Return key.
Aint: Recursion is involved.
@ 2. Write a function that will count the nodes in a binary tree of struct
foos.
3. A binary tree has nodes with int keys. Nodes are inserted into the tree
with keys in the following order: 22, 14, 35, 2, 23, 19, 29, 17, 6, 9, and 12.
Draw a diagram showing what this tree will look like.
Chapter 7 Exercises 201
tree
Show the order in which these nodes would be printed if this tree were
printed out using the printtree() algorithm shown in this chapter.
g 5. Assume that lazy deletion was used on nodes in a tree whose nodes look
like the ones in this chapter. Also assume that no node will have an aster-
isk (*) as part of the key. Show how the find() and printtree() algo-
rithms will change.
w 7. Write a function that prints the part of a tree whose keys are equal to or
greater than a key given as an argument. Hint: Do it recursively.
When you read your text file, tokens will consist of any combination of
the following:
a. Letters.
b. Digits.
202 Chapter 7 Recursion and Binary Trees
c. Single quotes such as in the words can’tand won’. A single quote must
have a letter on both sides.
d. Hyphens, provided that they have a letter or digit on both sides.
. Write a program that will read a user’s C source file as specified on the
command line and allow users to
Your program will accomplish this task with the following rules applying:
a. A C identifier starts with a letter or underscore followed by any combi-
nation of letters, underscores, and digits.
b. No identifier occurs in comments or quoted strings (single or dou-
ble!). . .
c. Identifiers will be put in tree nodes. Each tree node has a line number
queue hanging off it.
d. Create an array of offsets of beginnings of line numbers so that user
requests for lines that contain some identifier can be complied with
quickly.
This exercise uses everything this book has taught you. Good luck!
wl0. Write a function that gets integers from the user at the terminal and
writes them into a binary file of ints in the reverse order of their entry. Do not
enter invalid ints into the file. Tell the user when an entry has been
rejected. The function must be recursive and accept one argument, the
FILE * of the file to be created. You may assume that the file will be
opened and closed outside of this function. The answer in the Appendix
(and yours, too!) will also include a main() function to drive this recur-
sive function.
Arrays 2
andPRG. of
(Non-Cha r) Pointers
Given what you have been told about arrays, this looks correct. But it is
wrong! A good compiler should give you a type mismatch error on the argu-
ment a of memset(). It turns out that an element of a two-dimensional array is
a one-dimensional array. Therefore, I’m going to give you a simple rule to
203
204 Chapter 8 Multidimensional Arrays and Arrays of (Non-Char) Pointers
memorize: For arrays of dimension two or higher, you must pass the address of the first
element to a function through notation using the & operator.
Thus the foregoing memset() call should have been
memset(&aLOJ[0], 0, 9 * sizeof(int));
If you want to declare an int * variable p to point at the first element of array
a, your declaration should be
not
Because the rule we have stated is true for arrays of dimension two or higher,
you should have no trouble with the above operations on arrays of three, four,
or more dimensions. To set all elements of a 3-by-3-by-3 array a to be zeroes,
you should say
memset(&aLO][O][0], 0, 27 * sizeof(int));
#Hinclude <stdio.h>
main()
{
intearray2droii4 t= { (0); /* Init row 0 */
{1,.2)5.. Fx Init: ROW &/
(35 Deg As cles tae eu OMnc eo
int array3d[3](3][3] =
(eateelreeceecousy eas: 55) Oe /* Left index 0 */
AC og SS OTS firs (Lee Tinie il s3//
ECOvR ee Se Oneal: loi pis /* Left index 2 */
HOP Th = WS 1 «— se alex)
fOr Gi = Oe AS UE)
{
printf("Array2d[%d1(%d] = Sd\n",
i, J) avray2dliiiij
me
8.2 Declaration-Time Initialization of Multidimensional Arrays 205
Tor Cl = Os 7
< Be Weep)
vor @ EUSA Cee rs is)
{
printf("Array2dnew[%d][%d] = %d\n", i, J;
array2dnewLlil(jl]);
}
Array2d[0J[0] =
Array2d[Oj[1] =
Array2d[0][2] =
Array2d[0][3] =
Array2d[1][0] =
Array2d[1JEl] =
Array2d[1][2] =
Array2d[1][3] =
Arravediceiod =
Array2d[2][1] =
Array2d[2][2] =
Array2d[2][3] = O
OCC
PHROWOONrF
Array2dnewL0][0]
Array2dnewL0J[1]
Array2dnew[0][2]
Array2dnewL0][3]
Array2dnewL1][0]
Array2dnew[11][1]
ll
Array2dnew[1][2]
Array2dnewL1][3]
Array2dnew[2][0]
Array2dnew[2][1]
Array2dnew[2][2]
Array2dnew[2][3] Sena
Se)
Ss)
O16)
oa
oe
Array3d[0][0J[0]
Array3d[0J[0OJ[1]
ll
Array3d[0J][0][2]
Array3d[0J[1J]([0] FE
PWM
206 Chapter 8 Arrays and Arrays of (Non-Char) Pointers
Multidimensional
Arreaysdh0l Mit
Arnaysdlod Maii2ie=
Array3dL0J][2][0] =
Array3dLOJ[2][1] =
Arraysdh od i2iil2) =
Array3d[1][OJ(O] =
Arraysd i Odie. =
Array3d[1][0][2] =
Array3d[1][1J{[0] =
Arraysdiiiliiitis=
Array3d(1JtljE2Z] =
Array3d(1][2][0] =
Array3d[1](2J([1] =
NP PEW GILL NILAWL2 1) =
Array3d[2][0][0] =
Array3d(2](0J([1] =
Array3d[2][0][2] =
Array3d[2][1]{0] =
Array3d[2](1][1] =
Array3d[2][1][2] = @)
(eye)
Ces)
(ey)
=>)
ee
Ca
en)
ea
(SS)
(a)
(ee)
(=)
ee
Array3d[2]{2](0] = 10
Array3d[2](2]({1] = 11
INPCW SOLA ILZIIL2 1) = we
COJLO} COIC1] COIL27 COIL3I C1]C0] £19019 £19021 £1903) [2100] £2301) £2102] £23[3)
An easy way to remember what row major order means is to remember that
the row number is the leftmost index and that this index changes only after
8.2 Declaration-Time Initialization of Multidimensional
Arrays 207
all values of the rightmost indices change. We will see shortly how this applies
to three-dimensional arrays.
The array array2dnew in Example 8.1 was put there to illustrate that you
are not required to initialize two-dimensional arrays on a row-by-row basis. In
the initialization for array2dnew, the initializer is simply
Uns 1s
This means that array2dnew[0][0] is assigned the value 5, array2dnew[L0J][1]
is assigned the value 7, and all of the other elements of array2dnew are assigned
zero.
Whereas it is easy to see what row major order means for a two-dimensional
array, it is somewhat more difficult to see what it means for a three-dimen-
sional array. However, all you have to do is understand the rule we just noted:
The indices’ “speed” of change goes from leftmost = slowest to rightmost =
fastest. Keeping this in mind, you should find that the following memory lay-
out of array3d from Example 8.1 makes sense.
eee eee
This diagram, showing the change in indices in array3d, supports our the-
ory that the general meaning of row major order layout of arrays in memory
is that the leftmost index changes slowest and the rightmost the most rapidly.
Can we give a more formal definition? Yes.
For any array of dimension CiJLGICKI, the leftmost index will change its
value i times, the middle index will change its value 1 X j times, and the
rightmost will change its value t X 7 X k times.
If you simply count the changes in the foregoing diagram, you can see the
truth of the rule just stated. The leftmost index changes its value 3 times, the
middle index changes its value 3 x 3 times, and the rightmost index changes
its value 3 x 3 x 3 times.
In the initialization of the three-dimensional array, we create initializers for
each row, where row is defined as each possible combination of the left and
208 Chapter 8 Arrays and Arrays of (Non-Char) Pointers
Multidimensional
array3d[O][OJ[0O]: 1
array3d[LO][0][1]: 2
array3d{L0][0J][2]: 3
array3d[L0][1][0]: 6
All other elements of array3d: 0
Memset() puts value in each of the bytes starting at pointer and continu-
ing for number_of_bytes. Thus if you wanted to give array3d elements a
starting value of zero, you would say
memset(&array3d[0][0]J[0], 0, 27 * sizeof(int));
And this would work, because if each byte of a four-byte int is zero, the
whole int is a zero. However, if you wanted to give all elements of array3d a
starting value of 3, the following would be disastrous:
memset(&array3d[0][0][0], 3, 27 * sizeof(int));
/*
50 60 65 70 Contents of input file
OMT 2 1 Aad 2
80 85 90 85
(mo 08S 2883
Oj 367 134) i
d#Hinclude <stdio.h>
#define NUMTESTS 4
main(int argc, char *argv[])
{
Allee fps
double colavg(int matrix[][NUMTESTS], int numstudents,
int testnum),
rowavg(int matrixL]J[LNUMTESTS], int numtests,
int studentnum) ;
double resultl, result2;
Time Woalé
IMceamacvinixiol 4a /<**<SeHolds sal lathes scones sas)
/*x**k* When users use this program, they will number ****/
/***k students and tests starting at one, not zero. ****/
/**** The averaging functions called below take this ****/
(<< TN cOnaccounts HARK /
After we have added the score for the current student to the accumulator
Sum, we must skip over four test scores in matrix to get to the identicaltest
for the next student. Each student took NUMTESTS tests, so this means adding
NUMTESTS int storage units to whatever is in rowptr to get to the desired
score.
8.4 Useful Tricks with Matrices 211
Averaging scores for one student is easier than averaging a particular test
for all students. This is because, once we arrive at the proper row, the four
tests taken by that student lie contiguously in memory. Thus the declaration
int *colptr = &matrix[studentnum - 1][0];
gets us to the first test score of the proper student (studentnum - 1). Then,
to get to the next test score for that student, we need only perform
colptr++;
deinclude <stdio.h>
j#tdefine NUMTESTS 4
main(int argc, char *argv[])
{
Ae ips
double rowavg(int *rowptr, int numtests);
double result;
Time sae
Tinie Teese fetter InkolGls Mil while? SCONES, setts//
fp = fopen(argv[1],"r");
POM Os 1 oro) ite)
foe Gis =. 0 hedaKeAcedith)
fscanf(fp,"%d",&matrixlil]ljl);
long sum = 0;
Vie Ws
sum += *colptr++;
ae (double) sum/numtests;
}
[RRRKKKKKEKKKKKKKKKK Program Output Below KKKKKKKKKKKKKKEKKE /
You may have guessed from your experience with the array and pointer
versions of elements of one-dimensional arrays that these two expressions rep-
resent identical values. Hence the name matrix, just as I claimed earlier, is
not an int *; it’s an int **. Thus matrix[2] in Example 8.3 is really
*(matrix + 2), and the data type of that is int *—the int * that points to
the first element of the third row (the one with index 2).
Confused? Remember how pointer arithmetic works in C: When you add a
value to a pointer, it jumps a certain number of storage wnits dictated by the
type of the pointer. I have told you that matrix actually points at a row, so add-
ing 2 to it jumps two rows of ints. Thus *(matrix + 2) (which is the same as
matrix[2]) jumps two rows from where matrix points, and the dereference
gives us the first int in that row.
This trick cannot work for column averaging for one simple reason—the
numbers in a column do not lie next to one another in memory. Therefore,
we must stick with the version of colavg() that we have, where we passed the
entire matrix to the function.
Another useful trick with matrices is to treat a matrix as a one-dimensional
array. Where would this be useful? Suppose we wanted a function to compute
the average of all test scores in the class. We could do this with the following
function.
8.4 Useful Tricks with Matrices 213
This is a very undesirable solution for two reasons: (1) we need to have a con-
stant like NUMTESTS so that we can pass the second dimension of matrix to
the function, and (2) the function, therefore, works only on matrices with
NUMTESTS columns. If a matrix is of dimension N, the function header must
be given the rightmost N - 1 dimensions. It’s the law!
Thus the function in Example 8.4 is highly deficient. The function that fol-
lows is the correct way to average over a whole matrix.
The beauty of this function is that, in fact, it can be used on arrays of any
dimension. You simply pass the address of the first element of the array and the
total number of scores. C doesn’t care how few or how many dimensions the
original array had.
214 Chapter 8 Multidimensional Arrays and Arrays of (Non-Char) Pointers
int array[2](3](4];
22 int units past array[0][0][0]. That is exactly correct. But notice what is
missing in the memory map formula for array: the size of the leftmost
dimension—that is, 2! It isn’t needed, and that is the reason why, when pass-
ing an array to a function, you need not specify the size of the leftmost dimen-
sion. It makes no difference how many dimensions an array is; the memory
map formula will not include a coefficient for the leftmost dimension.
Believe it or not, the compiler and the runtime system will let you get away
with array[0][6]! The lack of extra runtime checking is one of the reasons
why C code executes so swiftly. This blessing is also a curse if programmers are
more interested in being clever than in producing crystal-clear code. Unfor-
tunately, many untamed egos have ruined software. Just make sure that, for
the good of all concerned, you leave your ego outside of your code!
d#Finclude <stdio.h>
main()
{
int og0)leCM ail
int al3] Sa ace
int DLS ])
int c[4]
pile
p[1]
pizi-=
KAKKX /
[KKKKK p is a “ragged” 2-D array because the
KKKKX /
[KKK sizes are all different!
pl0J(0]
pLOJ(1]
pLOj(2]
pl1J{[0]
pl1J(1]
Di Lita
pl1J(3] FE
WOON
DWH
216 Chapter 8 Arrays and Arrays of (Non-Char) Pointers
Multidimensional
pCi](4] = 10
pli hOde= 9
pLZIELd = 10
UAE Plat
pEZdkedo=eZ
____________
ee EEE
As you can see, the program in Example 8.6 creates a “pseudomatrix” with
rows of 3, 5, and 4 ints, respectively. I call ita pseudomatrix because it is not
a real matrix. Remember that in a real matrix, all elements from the first to
the last lie next to one another in memory. That is not the case with variable p
in Example 8.6 for the obvious reason that we have no guarantee that arrays
a, b, and c lie next to one another in memory.
Therefore, you cannot process p in the same way that we processed a
matrix in Example 8.5. The same caution applies when you create a matrix
from scratch, as shown in Example 8.7.
d#Finclude <stdio.h>
d#Finclude <stdlib.h> Press COMCEIMS WEI VOE@) BING Qa. were/
main()
{
TMs =, Se OWSie, Ws
No new functions were introduced in this chapter, nor were any old functions
used in a new way.
w 1. An identity matrix is a matrix with ones on its main diagonal (same row
and column index) and zeroes everywhere else. Declare and initialize a 4-
by-4 identity matrix m. You should give yourself full credit for the shortest
answer only!
“Vighee Sy C0)
BF. Pars
eaae soirs Poh 72
(25 Als
tO:
lhe: Slot eon
bd aE
You know that the computer running this program has four-byte ints and
pointers and that &a[0][0] is 1000. Give the value of the following
expressions, or give error if the expression is not correct.
a. **p b. *(*(a + 1) + 1)
Coan Le 1 dalli tii
€r ad f. sizeof(p)
218 Arrays and Arrays of (Non-Char) Pointers
Chapter 8 Multidimensional
g. sizeof(a) hak(pets3)
i, p[2] j. al0J(5]
k. p i pa ale
mip = (a) Dee pe = (4)
o. p[-1] p- *al2]
q. * aici fDi
Saad. gta t. &p[2]
. Write a function that gets one int * parameter representing an array and
one int parameter containing the number of elements in the array. The
function will return an array containing the addresses of all array ele-
ments whose value is zero. Hint: Use realloc(). Another hint: The
returned array should waste no space except for the NULL sentinel value
in the last element. You have to figure out what the return type of the
function is all by yourself.
int aloj[4if3ilel;
Give the memory map formula that C will construct to access
aCqliriisi(tl:
Your function will report the row numbers of any two rows in the first
parameter that are contiguous in memory.
. An int array a has num elements. Write a code fragment that will put the
address of every odd-indexed element of a in p. You must declare p as
well.
int **p;
Chapter 8 Exercises 219
Assume that p is built into a ragged array of int with the last pointer in
the array given NULL as a sentinel value. Each row of ints ends with a -1
sentinel value. Write a function to compute and return the sum of all
nonsentinel values in p.
w10. You do not know how many bits are in an unsigned long on your com-
puter. You have an array of unsigned long called array. Show how to
turn every bit of every element of array on. You must do it in one state-
ment. You are given the condition that array has N elements.
wll. Write a function that will create and return a ragged two-dimensional
array. The function will be given an int array argument whose first ele-
ment contains the number of rows in the ragged array and the remaining
elements contain the row sizes. The function should return NULL if it is
unable to create the ragged array. Note: You are merely creating the space
for the ragged array—you are not assigning values to the elements. The
answers to Exercises 11 and 12 are given together in the Appendix, along
with a main() driver function to show that they work.
#12. Assume that the array created in Exercise 11 has been populated with val-
ues. Write a function that will take the same argument as Exercise 11
along with the ragged array itself and will print out all the values in the
ragged array. See the note at the end of Exercise 11.
a ie Yo REN
ae
% ca jer ne + } wir
aren ties
peratés chr eters
>) SRS = ~~
' a
:
A lion o.
2 £2
; ts
7 a |
<4 —= }
Ly s_— ss
=
Potpourri:
Part One |
The last two chapters of this book present topics that, though important, do
not contain enough material to fill an entire chapter. Don’t be misled by the
brevityof coverage of these topics. They’re important—they just don’t take
many pages to cover!
In this chapter, we will discus the ANSI C time functions, pointers to func-
tions, functions with variable-length argument lists, and understanding diffi-
cult data type declarations. The time functions are particularly important.
You would be hard pressed to think of any aspect of computing where time
calculations are not involved. Business reports are usually dated. Database sys-
tems keep track of the time when database changes and searches are made.
Operating systems keep track of who made what request and when they made
it. Performance benchmarking of programs is accomplished using the time
functions to compute the wall-clock time it takes for a code fragment to exe-
cute. The list is endless.
Pointers to functions is a topic in C that deserves more attention than it
gets. System function libraries for many operating systems include routines
that require that one of the arguments be a pointer to a function. The most
confusing aspect of using pointers to functions is the syntax, because the
ANSI and pre-ANSI methods are quite different. Read Section 9.2 and you
will learn both.
Have you ever wondered how printf(), scanf(), and other functions in
the C library work? Unlike the functions you’ve written, they allow argument
221
222 Chapter 9 Potpourri: Part One
lists of any size. In Section 9.3, you'll find out how you can write such func-
tions yourself.
Finally, Section 9.4 will help programmers understand such difficult decla-
rations as arrays of function pointers, functions returning a pointer to a func-
tion, and so forth by using the rightleft rule. You won’t want to miss this!
#Finclude <stdio.h>
dHinclude <time.h> /**** Location of time functions. ****/
main()
i
struct tm *timeptr; /*** Notice lack of allocation!) 3**747
time_t timeval;
char *chtime; /**** No space allocation again!! ****/
char buffer[80];
timeptr = localtime(&timeval);
printf("Local time: %d hours, %d minutes\n\n",
timeptr->tm_hour, timeptr->tm_min) ;
timeptr = gmtime(&timeval);
printf("Greenwich time: %d hours, %d minutes\n\n",
timeptr->tm_hour, timeptr->tm_min);
chtime = asctime(timeptr);
printf("Greenwich ASCII time from asctime: %s\n\n",
chtime) ;
timeptr = localtime(&timeval);
chtime = asctime(timeptr);
printf("Local ASCII time from asctime: %s\n\n", chtime);
Notice that the time() function is the core time function. Although its return
value is seldom used directly, the return value is used by virtually all the other
time functions. The return value from time(), though it is not forced by the
ANSI C standard, is generally the number of seconds since January 1, 1970.
The prototype for the time() function is
time_t time(time_t *timeval);
As was the case with the size_t data type, the time_t type was invented to
ensure portability between systems but is almost always defined as a long in
actual practice. The return value is the same as the value placed into the
parameter, so it need not be used. The functions ]ocaltime(), gmtime(),
and asctime() all use the time_t * variable acquired by the time() func-
tion to produce the current time in different forms. The prototypes for these
three functions are
structtm:*localtime(timert tptr):
struct tm *gmtime(time_t *tptr);
char *asctime(time_t *tptr);
d#include <stdio.h>
dFinclude <time.h>
main()
{
ome) as
float ftime;
time_t timel, time2, diff;
SURUG ts EMiane Dien
9.1 The ANSI C Time Functions 225
time(&timel);
for (i = 0; i < 1000000000; i++) :
time(&time2);
diff = time2 -,.timel,
tptr = gmtime(&diff);
printf("1 billion for loops took %d seconds!\n",
tptr->tmsec)
ftime = difftime(time2, timel);
printf("1 billion for loops took %f seconds!\n", ftime);
}
[RR KKKKKKKKKKK KK KKK Program Output Below KKKKKKKKKKKKKK KK /
d#tinclude <stdio.h>
dtinclude <time.h> /**** Location of time functions. ****/
main()
{
Struct tm *timeptr; /*** Notice ack of -allocationie==**7
time_t timeval;
char buffer[80];
time(&timeval);
timeptr = localtime(&timeval);
Strftime %c format
The annotated output in Example 9.3 should give you a good idea of the dif-
ferences among the various time format indicators in strftime(). Strf-
time() (which stands for String Formatted Time) is like a time-related cousin
of sprintf (). It allows us to format a time string in lots of different ways: with
a 24 or 12-hour clock (%H versus %1), with a fully spelled-out day versus a three-
character day abbreviation (%A versus %a), with a fully spelled-out month ver-
sus a three-character month abbreviation (%B versus %b), with a four-digit year
versus a two-digit year (%Y versus %y), and with an AM/PM spec (%p).
As you can see in the first line of output, the %c format gives you a full-time
spec similar to that given by the asctime() and ctime() functions. There-
fore, it is seldom used. The list of format variations given here is not a com-
plete list of all available formats, but it shows all of the really useful variants.
Function pointers are really useful, but they are not given much coverage in
most C texts. Many low-level system calls in most operating systems have func-
tions with pointer-to-function parameters. Many mathematical functions also
require pointer-to-function parameters—for example, sin(x) on one call
versus CoS(x) on another call to the same function. Two code examples fol-
low. The first illustrates pointers to functions using the pre-ANSI C syntax.
The second illustrates the easier ANSI C syntax.
dHinclude <stdio.h>
main()
{
float reciprocal(int), square(int),
funsum(int, float: (tint);
float square(int k)
{
return Chiodw) Ks kK:
}
[RKRKKKKEKKKKKKKK KK Program Output Below KKEKKKKKKKKKKKKK KK /
#Hinclude <stdio.h>
main()
{
float reciprocal(int), square(int),
funsum(int, float fCint));
float reciprocal(int k)
t
retuned leO/ke ss
}
float square(int k)
{
PSU Cirlloee)) (Kk = [ks
}
[RRR KKKKKKKKKKKKKKKK Program Output Below KKKKKKKKKKKKKKKK /
The difference between the old style and the new style can be seen in the two
declarations:
rul@aie Cer) Cie)
TilOae F Cane)
The new, ANSI style was invented because it’s easier to use, but is it clearer?
I don’t think so. The old, pre-ANSI style makes it much clearer that f really is
a pointer. This is one of the very few areas where I disagree with the ANSI
committee. I hate the “dumbing down” of a language when the feature really
isn’t hard to use.
With either style, note that you pass the function pointer to funsum() by
just giving the name of the function to be passed. Inside of funsum(), the call
to function f() is really a sneaky call to whichever function was passed
(reciprocal() in one call to funsum() and square() in the other call).
The final point to be made about pointers to functions is that the syntax is
the same whether you are passing a pointer to a function that you wrote or to
a function that is in the C library. Thus, if you want a function to use the
sin() of a variable x in one call to the function, and the cos() of x in
another, and the function is called trig(), its prototype may look like this:
double trig (double x, double (*trigfunc) (double x));
A call to trig() that passes the math.h sin() function may look like this:
All of the functions whose names you pass as parameters must have the same
return type and the same number and type of parameters. Thus you couldn't call
trig() with a pointer to a function with, say, two int parameters or a struct
pointer return type. Don’t forget this important point! You'll get a compile-
time error if you do.
230 Chapter 9 Potpourri: Part One
[ERSAEX*EES, “Suman
Gs owes t. argument SAYS) OW. site Miss 7
[RRKKKKKKKK Many unnamed arguments follow. PR AAA KA
while (n--)
{
sum += va_arg(argp,double); /*** Get unnamed arg. ***/
} /*** Move argp to KKK |
/*** next unnamed arg. ***/
va_end(argp); /* REQUIRED!! */
return sum;
va_start(argp,string);
while ((c = va_arg(argp,int)) != ‘#’) /* # is sentinel! */
{
printf("The character %c is%s in 4s\n",
Co SwPehne(Svuringse) 2 YY es “OK. SwriiMe))s
i:
va_end(argp);
}
[RRKKRKKKKKKKKKKKKK Program Output Below KREKKKKKKKK KKK KKK KK /
The code really says it all here! All functions that use variable-length argu-
ment lists specify this list using an ellipsis (...). The unnamed arguments
denoted by ... must follow the last named argument. There must be at least
one named argument. The va_sta rt(), va_arg(), and va_end() functions
that appear in all such functions are in the header file stdarg.h. The
unnamed argument list pointer is of type va_list, whose definition is also in
StpaleleG).|nh.
232 Chapter 9 Potpourri: Part One
The function va_start() will appear in every function that uses variable-
length argument lists. It initializes a pointer of type va_list so that it points
at the first unnamed argument. The prototype for va_start() is
void va_start (va_list argp, type-of-last-named-argument
name-of-last-named-arg);
Thus, in sumargs(), the following diagram is the state of affairs after the call
to va_start().
argp
As you can see, argp is pointing at the first byte of the first unnamed argu-
ment. After the first call to va_arg(), the following is the state of affairs
inside of sumargs().
argp
Va_arg() has fetched the first unnamed argument value (10.0) and added it
to sum. This process of collecting unnamed arguments and adding them to sum
continues until the while loop header reaches zero. The named argument (n)
tells the while how many times to loop. Notice that va_arg() fetches
unnamed arguments and also moves argp to the next unnamed argument.
The prototype for the va_arg() function is
type-of-unnamed-arg va_arg(va_list argp, type-of-unnamed-arg)
You must be very careful to put the proper type name in the second param-
eter of va_arg(). Otherwise, the argp pointer will not move the proper num-
ber of bytes after each call to va_arg(). It is useful to know that floating-point
constants (such as 10.0) are of type double and that character constants
(such as 'X") are of type int. The unnamed arguments can be of all different
types. The critical point is to give va_arg() the proper type on each call.
The va_end() call cleans up the unnamed argument stack so that your
program behaves normally after the variable-length argument list function
terminates. Although it appears to do nothing, you risk dire consequences
when you exclude it! The prototype for va_end() is
void va_end(va_list va_argp);
9.4 How to Write and Read Difficult C Declarations 233
There is an amazingly simple rule for figuring out how to read or write diffi-
cult C declarations. It is the mght-left rule. It is easiest to show how this rule
works by illustrating it in action with successively more difficult problems.
Let’s start with
lniteeecs Pyeoulls
Start at the variable name, and work right and then left.
Do you see it? We went right, then left, then right, then left—until we were
done.
Let’s try it on something harder.
ime Cor) Cie Ws
This is harder, because there is one extra rule I didn’t tell you about: You
must finish off parentheses around the variable name first. So...
Again we went right left-right-left. Now let’s go to even more difficult decla-
rations.
lint Gp yEZON
234 Chapter 9 Potpourri: Part One
Remember, you've got to finish off the parentheses around p first! So...
(p) /**** Nothing to right of p-go left! ****/
(*p) [**x* P is a pointer to “something”. ****/
(*p) [20] /x*x** P is a pointer to an array of 20
KKK /
"somethings".
int (pol20)./**** Peis -a pointer to an array.of cOlints 2 ae? /
Now we’ll show you the toughest one of all—a function header that
appears virtually unreadable but can be understood using our trusty friend
the right-left rule.
int (*func(int num)) (int);
There is a function in the C header file, signal .h, that has a pointer to a
function in its parameter list and returns a pointer to a function. That func-
tion, signal(), will be discussed in the final chapter. Thus you can see that
this was not a purely academic exercise. You’re going to use this stuff in its
toughest form very soon.
Chapter 9 Facts About Functions 235
Notice that the right-left rule has a certain “vocabulary” attached to it.
When you encounter an asterisk (*), you must substitute the language “pointer
to.” When you encounter (parameters), you must substitute the language
“function with parameters of type ....” And when you encounter [n], you must
substitute the language “array of n ‘somethings’.”
The rightleft rule can help you decipher the most convoluted data types
and function headers. Although you may choose to avoid such difficult decla-
rations in your program, it is a bad mistake to avoid learning about them.
This is because (1) you may have to use such declarations in your job, and (2)
you may have to read code with such declarations. You have no control over
code that already exists, so the latter is going to be unavoidable.
<time.h> Functions
Fact: 4) The time loaded into the struct tmis in the local time
of your region.
Fact: 5) NULL is returned if the time could not be retrieved.
<stdarg.h> Functions
Exercises
gw 2. Write a code fragment, with variable declarations, that will print the cur-
rent local time in a form that looks like this:
w 3. Write a function that will return the number of hours difference between
your current local time and Greenwich Mean Time. Subtract one from
this difference if your region is currently in daylight savings time.
g@ 5. Write declarations for the following. Use the variable name X in your dec-
larations.
a. A pointer to an array of 20 unallocated strings.
b. An array of 20 pointers to unallocated strings.
c. An array of 20 strings, each of which can hold 10 characters (including
thes \0)s).
240 Chapter 9 Potpourri: Part One
@ 6. Write some code to prove that your answers to parts a and j of Exercise 5
are correct. You need not use the array dimensions in the problem to
prove understanding. Strings need not be unallocated; indeed, it is hard
to prove correctness without allocated strings. This is very difficult.
struct login
{
char username[16];
time_t login_time;
i:
Each currently-logged-in user’s login time was obtained from a call to
time(). Given this, write a function that takes no parameters and returns
void. This function will construct a text output file (login.dat) in which
each line has a username followed by the number of days, hours, min-
utes, and seconds logged in.
struct execute
{
Wime Ws
int (*function) (int);
iF
A function is called as follows:
Pray
+
Ione, (hin ~
Wwe
—
Ae sot peuliegagy
ni ates feral de
“ONETee aA
syle @
” nae
if
~
a eae ny hes
wit veT aH ;a8
anna aren
Th,
= iniae Af) cali iva Susley rt b
= ty nthe?) 77h A4ae (aire donee
he oh @owel a). kPa
airy OF
—— i Avian iivives
Tr ihetai Tat sip «
; > i
iT tte CH
nT) ‘ ; ¥ ‘aa
we >a
, -
—
=e \
Part Two
In this final chapter of the book, we will investigate searching and sorting via
the bsearch() and qsort() functions in stdlib-h, signals and interrupts,
and some advanced file- and directory-oriented functions.
The qsort() and bsearch() functions are like cousins. These two array-
oriented functions sort arrays and apply a key-driven binary search to arrays.
The array given to bsearch() is assumed to be sorted, so you can see that
these two functions are often used together.
Although we have emphasized self-sorting data structures like linked lists
and binary trees in earlier chapters, one should not be led into thinking that
operations on arrays are not needed. There are many applications where a
file’s contents are brought into memory, sorted, and then written back out to
disk. This makes possible sequential displays of data and the detection of
duplicate data entries (because after sorting, the duplicates will be next to
one another).
Indeed, when we have an unchanging collection of data where we need to
do searching, it is actually far more efficient to use arrays than dynamic data
structures. For one thing, when we are using fread() to reada file into mem-
ory, it can be done in one read operation if we are reading directly into an
array. Then the bsearch() function offers us search times on the array that
are equal to those of a completely balanced binary tree.
The second section in this chapter looks at the signal.h library and
related libraries used to detect keyboard interrupts, memory leaks, arithmetic
overflows/underflows, and other exceptional program events.
243
244 Chapter 10 Potpourri: Part Two
Finally, the book concludes with a discussion of a few file- and directory-
related functions that do not fit neatly with those discussed in Chapter 5.
#Hinclude <stdio.h>
#include <stdlib.h> /**** Has qsort() and bsearch(). ****/
#Hinclude <string.h>
dtKdefine BLOCK 10
dtdefine SENTINEL '‘~'
struct hardware
t
char partname[20];
int quantity;
Lap
parts = open_file(argv[1]);
inventory = read_file(parts, &numparts);
fclose(parts);
qsort(inventory, numparts, sizeof(HARDWARE), compare) ;
print_sorted_output(inventory) ;
Dip inteh CNM AN =)
find_keys(inventory, numparts); /** Calls bsearch()!! **/
y
PALE SOerhESE
246 Chapter 10 Potpourri: Part Two
if ((inventory = (HARDWARE *)
malloc(BLOCK * sizeof(HARDWARE))) == NULL)
{
printf("Fatal malloc error!\n");
exit(2);
}
memset(inventory, 0, BLOCK * sizeof(HARDWARE)
);
mover = inventory;
while (fgets(line, 80, parts))
{
numlinest++;
partname = strtok(line,":");
pqty ="strtokC(NUlin “\040%)
mover->quantity = int) strtolGpqty. NUL eto):
strcpy(mover->partname, partname);
if (numlines % 5 == Q).
{
if (Cinventory = (HARDWARE *)realloc(inventory,
(numlines + BLOCK) * sizeof(HARDWARE))) == NULL)
{
printf("Fatal realloc error!\n");
exit(3);
}
mover = inventory + numlines; /** Vital!!! **/
[*X**kXkK Tero out new BLOCK of storage ******/
memset(mover, 0, BLOCK * sizeof(HARDWARE)
);
}
mover++;
10.1 Sorting and Searching with Qsort and Bsearch 247
*mover->partname = SENTINEL;
*numparts = numlines;
return inventory;
}
Strepy Cl ines.
for ( ; token = strtok(keyptr, "\040"); keyptr = NULL)
{
strceat(line, token);
SwlrCereC lines “YWOAO)
}
line({strlen(line) - 1] = ‘\0’; /*** Overwrite space ****/
strcpy(key, line);
while (*key != ‘\0’)
{
*xkey = toupper(*key);
key++;
H
} U/****, SCENGROneDRoghame act ty
[RRKKKKKEKKKKKKKKKKEK KE Program Output and KKKKKKKKKKKKKKKKK |
Because it is late in the book and we are assuming a certain level of achieve-
ment at this point, we are not going to give a detailed analysis of this program
except for a few important points. First, notice that qsort() and bsearch()
both require a pointer to a function as an argument. The function is one that
you must write! This function, which is called compare()in Example 10.1,
must have const void * argument types. You must convert these arguments
250 Chapter 10 Potpourri: Part Two
This isn’t much, but even this limited set is very helpful. For example, if
malloc() or realloc() fails in your program, you may not care much where
it failed because it’s often an operating system glitch. Instead of testing the
return value of malloc() or realloc() for NULL, you could put this one
statement at the top of your program:
Signal(SIGSEGV, handler);
Remember, this is useful only when (1) you don’t care where memory fail-
ures occur, and (2) you’ve gotten all the memory leaks out of your program so
that malloc()/realloc() failures are the only remaining problem source.
Both of these conditions must apply! On some architectures, SIGBUS is synony-
mous with, or is a replacement for, SIGSEGV. Check your reference guide.
SIGFPE almost always occurs when you attempt a divide-by-zero at runtime.
SIGTERM is usually sent by another program—often one invoked by a system
252 Chapter !0 Potpourri: Part Two
#Hinclude <stdio.h>
#Hinclude <signal.h>
main( )
f
void handler(int signum);
flOateelent:
char line([80], *ptr = NULL;
}
[RRKKKKKKKKKKKKKK Sample Session Below KKKKKKKKKKKKKKKEK /
Ouch!
SIGINT caught!
Ouch!
Ouch!
SIGINT caught!
Ouch!
Ouch!
Ouch!
Ouch!
Ouch!
Enter two numbers: i = inf
SIGSEGV caught!
Example 10.2 illustrates the blessings and the perils of signal (). SIGFPE was
not trapped on a divide-by-zero on two different platforms. As you can see, it
plugged a string (inf) into i, denoting “infinity” (I guess).
The SIGSEGV signal was raised when I attempted to dereference a NULL
pointer. SIGINT was raised when I hit CTRL-C but I want you to notice that I
reinstalled the handling of the SIGINT signal inside of handler(). This is
imperative, because on some platforms, once a signal is raised, the way of
handling it reverts to SIG_DFL unless you “refresh” the signal by issuing
another call to signal() as we did inside handler(). The default on most
systems on an “unhandled” (default method of handling) CTRL-C is to abort
the program. Thus reinstallation of signal handling is a must!
Example 10.2 provides a decent model of how signal () works but does not
fully expose its shortcomings. For example, if you do not exit() from the han-
dler function, your program will return control to whichever statement was exe-
cuting at the time the signal was raised. This is dangerous because some
functions are non-reentrant. This means you may return control to a function
that hangs forever. Is there a way out of this problem? Fortunately, there is.
254 Chapter 10 Potpourri: Part Two
Example 10.3 Controlled Re-Entry to Code Via Set jmp and Longjmp
dEinclude <stdio.h>
dtinclude <signal.h>
fHinclude <setjmp.h> /*** Contains setjmp/longjmp. ***/
d#tinclude <stdlib.h>
jmp_buf env;
main()
{
void handler(int signum) ;
Wim. Le
Ouch!
Ouch!
Ouch!
SIGINT caught!
Got here via longjmp!
WEVIUE OF I< Sse 6
Ouch!
Ouch!
Ouch!
Ouch!
Ouch!
Ouch!
Ouch!
Ouch!
[ee
10.2. Signals, Interrupts, and Error Handling 255
The functions setjmp() and longjmp() are located in setjmp.h. The first
time the setjmp() function is called, it automatically returns zero. When we
hit CTRL-C in the sample session above, instead of allowing an uncontrolled
return to main(), the call to longjmp() returned us to the setjmp() again
with a return value of one (the second parameter of ]ongjmp()). You cannot
throw a 0 back to setjmp(). If you try, by saying longjmp(env, 0), it will
throw a 1 back instead.
The thing it is most important for you to know about setjmp() and
longjmp() is that they do not roll back the values of variables. Notice that k’s value
upon returning to the setjmp() is still 3—the value k had when SIGINT was
raised.
Signal handling can be used for many purposes, and one of the most
important is guarding critical code sections. For example, suppose your pro-
gram is in the process of writing out a critical system file. You do not want the
program aborted by a CTRL-C. You can prevent disaster with code such as
signal(SIGINT, SIG_IGN);
(XESSES Cid CalmCodeml Nynere) cane
signal(SIGINT, SIG_DFL);
Thus CTRL-C interrupts are disabled (ignored) while the critical code is
executing and are re-enabled after the critical code has completed execution.
We saw earlier that some systems don’t acknowledge some signals they are
supposed to acknowledge; remember how SIGFPE was ignored even when we
tried to trap it. Are there any work-arounds that allow you to raise a signal
explicitly when the system won’t do it for you? Yes. Look at the code that follows.
signal(SIGFPE, handler);
printf("Enter two numbers: ");
gets(line);
SscamnrCiliMe, ir Yr’ Gil»e Capes
it ==" 02008 False
Col GEE) sla)" =** = sEXDA ICTY Catse
SiG PE eas
else i /= j;
The raise() function generates the signal type given as its argument. Thus
if j is 0.0 in the foregoing program, control will be transferred to handler().
There are some signal types that are not part of “official” ANSI C but are
handled by virtually every compiler on the market. The most important of
these is SIGALRM. The SIGALRM signal is generally used to time a program out,
as in the following example.
#tinclude <stdio.h>
d#tinclude <signal.h>
main()
{
256 Chapter 10 Potpourri: Part Two
int num;
void myalarm();
signal(SIGALRM, myalarm);
alarm(5);
printf("You have 5 seconds to enter a number! ");
scanf("%d", &num) ;
alarm(0);
printf("Good kid-you did it!\n");
}
void myalarm()
{
printf("\nUser is timed out!\n");
OX Tralee.
}
[RRKKRKKKKKKKKKK Sample Session Output Below KKKKKKKKKKKER /
$ sigalrm
You have 5 seconds to enter a number!
User is timed out!
$ sigalrm
You have 5 seconds to enter a number! 6
Good kid-you did it!
d#Finclude <stdio.h>
d#Hinclude <signal.h>
10.2 Signals, Interrupts, and Error Handling 257
#Hinclude <stdlib.h>
main()
{
void handler(int signum), cleanup(void);
Hime Te
atexit(cleanup) ;
Signal(SIGINT, handler);
OLE Alls Sat Bh 56 ihe thas)
{
sleep(1);
printf(“Ouch!\n”);
ae cleanup(void)
Ouch!
Ouch!
Ouch! (22S onc eCUREaCenenel axa.1.
Aborting due to terminal interrupt!
I’m in the exit handler!
Notice that the cleanup() function does not need an exit() call as its last
statement. Through the atexit() call, the program knows that it is supposed
to terminate when this function is called—it does not have to be told to do
that explicitly.
You also need to be aware that the function given to atexit() must have a
void return type and parameter list. Unfortunately, this means that any vari-
ables accessed in the exit-handling function must be declared either locally or
globally. They cannot be passed in as parameters.
Finally, let’s deal with error detection during program execution. During
the debugging phase of program development, we might want the program
to fail at runtime if certain variables have improper values or if a certain con-
dition is not true. The assert() function in assert.h causes a program to
abort with an error message if the value of the expression given as a parame-
ter is zero. Here’s how it works.
258 Chapter 10 Potpourri: Part Two
assert(i == 2);
}
eS
In this example, because the assert condition is false, it has a value of zero.
This causes the program to terminate with an error message. The error mes-
sage always gives the assert condition that caused termination so you'll know
exactly where the program failed. The message for Example 10.6 might be
Assert ‘i == 2’ failed.
Abort (core dumped)
Another function that is useful for error reporting is perror(). Here isa
demonstration of it.
The perror() function dumps to the screen the string you give it plus any
additional system information it obtains from looking at the type of error that
occurred (which will be encoded in the value of errno).
10.3 File-Related Functions 259
Before we move to Section 10.3, I want to caution you that much of the mate-
rial in Section 10.2 functions in a very individual way from platform to plat-
form. Also, your platform may have slightly more reliable methods for handling
signals and performing error detection. For example, the government POSIX
standard specifies a new signal-handling function called sigaction() that
eliminates all of the deficiencies and problems with signal().
Whatever your destination platform, if you understand the material in this
chapter and the preceding chapters in this book, you'll be able to move to sys-
tems programming on any platform with little difficulty. Our intention here is
to introduce the basic ideas of signal, interrupt, and error detection and han-
dling. Realize, though, that any programs you sell will contain much non-
ANSI code when it comes to such detection and handling.
Some algorithms need disk files even though they need them only tempo-
rarily. For example, some merge sorts sort little pieces of a file into many files
and then merge the sorted pieces back into the original file. Sometimes an
application will not know immediately whether a new file needs to be saved.
The application may wait for a yes/no response from a user to save a newly
created file.
When applications create temporary files, they may wish to “clean up” by
removing any files thus created. Though such occasions are rare, there may
even be times when we wish to rename a file. We might want to take a tempo-
rary file (which might be in some directory other than ours) and make it
“permanent” by renaming it so that it is part of our directory structure. Thus
we need functions to do the following:
1. Create a file name that does not conflict with any other file name in the
user’s directory (function: tmpnam()).
2. Rename a file. The user might want to keep the file opened with a
name given by the function performing task 1 above (function:
rename())!
3. Delete a file (function: remove()).
4. Create a file that exists only for the duration of the program and that is
then automatically deleted (function: tmpfile()).
function. The output shows, for the first time in this entire book, a library
function that fails. The rename() function returns an error value. We’ll
explain why after the example.
d#tinclude <errno.h>
dHinclude <stdio.h>
dHinclude <stdlib.h>
d#Finclude <string.h>
main()
{
char filename[L_tmpnam]; /* L_tmpnam defined in stdio.h.*/
Cham sSursille— AnellON Ms.
"Goodbye\n",
Calta
7 DOGINTeae
NULL };
char **mover = strs;
char line[80], command[80];
AEE tps
As you can see, the temporary file obtained with tmpfile() was properly
opened. The proof is that when we did a rewind() and read what we wrote
on that file, we read the correct file contents as shown in the output of the
printf() in the first while.
Tmpnam() did, indeed, give us a nonconflicting name for a file. Notice, how-
ever, that this file was not placed in the jwp2286 directory (my login directory
on a Silicon Graphics computer). Rename() will usually work fine, but beware
of trying to rename files that do not reside within your directory structure, such
as files named by tmpnam( )—the operating system may not like it!
Remove() did its job fine. That’s why the last directory listing (the ds -/com-
mand on the Silicon Graphics computer) could not find the (remove'd) file
named by tmpnam(). When using remove() and rename(), use full directory
path specs whenever possible.
262 Chapter 10 Potpourri: Part Two
<stdlib.h> Functions
Fact: 1) You must write the compare() function whose name you
pass as the fourth and final parameter.
Fact: 2) Qsort() insists that the two parameters of compare() be
of type void *. Therefore, if you wish to refer to some
components of, say, structs, you should copy the
parameters into local variables of the proper type. Then
your struct component references won’t cause compile
time problems.
Fact: 3 —
Your compare() function should return -1 if elementl
is “less than” el ement2, where your code defines less
than. Compare() should return 0 if element1 and
element2 are “equal” (again, you define equal). Finally, it
should return 1 if element] is “greater than” element2.
Fact: 4 ~— Be careful! You will note that qsort() has no error-return
value. Your local compiler may change errno in some
platform-specific way. The message here is to make sure
you understand qsort() by testing it on a small example!
Fact: 5) Items are sorted in ascending order.
Fact: 6) Starts sort at the address given in start. Yes, this does
allow you to sort subarrays.
<assert.h> Functions
<stdio.h> Functions
<signal.h> Functions
<setjmp.h> Functions
Fact: 1) Returns program control toa set jmp() call that uses the
same jmp_buf variable.
Fact: 2) The return_value_of_setjmp
thrown back to the
setjmp() call can be used in an if, switch, or other
statement to determine what code to execute.
Longjmp() cannot throw a setjmp() return value of
zero back to a setjmp(). An attempt to do so will actu-
ally throw a 1.
Fact: 3) Values of variables are not altered from their most recent
settings after a longjmp() is executed. Be sure to
remember this vital fact!
Exercises
a. Reads integer test scores from a text file called scores.dat. Test
scores are separated by whitespace until EOF.
b. Throws out any test score that is not a valid int between 0 and
100.
- Uses malloc() and realloc() to control space management so that
no more than 100 int’s worth of space are wasted in the array of
scores.
- Computes and displays the median test score. The median of a group
of numbers is defined as the (N/2) + 1 highest score if there are an
odd number of scores and as the average of the (N/2) and (N/2) + 1
highest scores if there are an even number of scores. N is the total num-
ber of scores.
. Pay attention to modularity. You ought to have a function to read the
file from disk and one that computes the median. Report results in
main().
You will not be given any hints about how to read the file or error-
check its tokens. Good luck!
3. Write a function that accepts an array of file names as its only parameter
and that returns an array containing a subset of those names representing
files that could be successfully deleted. The input parameter has a NULL
sentinel value in it. Your return array should also have a NULL sentinel in
it and should waste absolutely no space.
g@ 6. Write a function that builds a command string and executes it, given a
NULL-terminated array of strings with command tokens as the only func-
tion parameter. Assume that no command token has whitespace in it.
Return void.
ysl b ZEA al
i gar ru ' $
Nae
t tec a
e 8 ei ta
orl
Sip
Sai
*» ’
fie e ‘want an
wy! . - lJ +a
ny i
= f
ee sanersPeal aa. :
pom, ee svi i
{2
= oa : age Meal ee
1 | .
4
anes =
SS
/
a i@fs ork Gate
-~ 1 i, 6
+a toseraes
=
i
~
= — —
ree sige
ren eve p) okt
>
1
‘
t
1. a. 10 b. 1016
Cc. 2 d. 2
€,2012° f. 24
g. 4 h. 4
i, 36 j. 20
273
274 Answers to Selected Exercises
5. a. -l b. 1
c. 1003 d. 1001
e. 2 fe |
g. ‘B’ or 66 h. ‘l or 108
iaire or LO 16 COI)
int *mover = p;
Chapter 3 Answers
2. NODE *reverse_ stack(NODE *stack)
{
NODE *newstack = NULL, *temp;
while (stack)
‘
temp = newstack;
newstack = stack;
stack = stack->next;
newstack->next = temp;
}
return newstack;
/* */
/*
This program reads a file (accounts) that yf
/* contains records with two fields: a 4-digit */
/*
account number and a current balance. The */
/*
program reads the account records into a */
/[* linked list ordered by account number. * /
/* = //
276 Answers to Selected Exercises
#tdefine BLOCK_SIZE 10
#Hdefine FILE_OPEN ERROR 1
#tdefine MALLOC_ERROR 2
#Hinclude <stdio.h>
d#Finclude <string.h>
struct account
{
int account_num;
double amount;
struct account *next;
1A
main()
{
ACCOUNT *balance_list, *transactions_ list:
ACCOUNT *init_list(void);
void read_file_into_list(ACCOUNT *balance_list),
get_transaction_records(ACCOUNT *transactions_list),
adjust_balances(ACCOUNT *balance_list,
ACCOUNT *transactions_list),
create transactions _file(ACCOUNT
ABIPEMISECE1
OMS Se) -
create_balance_files(ACCOUNT *balance_list);
read_file_into_list(bala list);
nce
get_transaction_records(transactions list);
adjust_balances(balance_list, transactions_list);
create_transactions_file(transaction list);
s_
create_balance_files(balance list);
if (freerecs == NULL)
{
if ((freerecs = (ACCOUNT *)
calloc(BLOCK_SIZE, sizeof(ACCOUNT))) == NULL)
f
printf("Fatal malloc error!\n");
exit(MALLOC_ERROR) ;
}
rear = freerecs + BLOCK _SIZE - 1;
}
new = freerecs;
if (freerecs == rear) freerecs = NULL;
else freerecs++;
new->account_num = account_num;
new->amount = amount;
return new;
[RRKRKKKKKKKKKKK KKKKKKKKKKK /
while (transactions_list->account_num !=
balance_list->account_num)
{
balance_list = balance_list->next;
}
280 Answers to Selected Exercises
balance_list->amount += transactions_list->amount;
transactions list = transactions_list->next;
transactions_list = transactions_list->next;
while (transactions _list->account_num != 10000)
{
fprintf(transactions, "%04d %.2f\n",
transactions_list->account_num,
transactions_list->amount) ;
transactions _list = transactions_list->next;
}
fclose(transactions);
balance_list = balance_list->next;
Chapter 3 Answers 281
4444 6000.00
2222 2400.55
6767 100000.34
5555 100.56
7878 4567.67
3431 1000.00
6898 (ieee
8906 40000.00
0043 7600.34
0154 87000000.00
4356 567.43
5656 0298
4698 S400e2!
7744 9307399
4378 90907
3333 20087 .87
LE 937.6250
9999 4209.89
(it 342.78
[*x* KKKKKKKKKKK INTERACTIVE SESSION KKEKKKKKKKKKK /
0043 7600.34
0154 87000000.00
Titty 2507200
2222 400.55
3333 .20087 .87
S43 T3337,
4356 567.43
4378 909.67
4444 6000.00
4698 3456.21
9555). 10056
5656 380.98
6767 100000.34
68988232123
7744 -192.01
TTTIRIST.O 56
7878 4567.67
8906 40000.00
9999 4209.89
1111 -849.78
2ccce2e000'.00
SAGE 3330/7
5656 380.00
7744 -10000.00
PUT 1507.00
7744 -192.01
previous = previous->forward;
current =“previous->forward;
}
}
10. void process_user(SERVICE *list, char *qname)
{
NODE *temp;
list = list->next;
while(strcmp(qname,list->qname) > 0)list = list->next;
if (strcemp(qname, list->qname) != 0)
{
printf("Queuezs %s : not found.\n", qname);
return;
}
if (list->front == NULL)
it
printf("No user in queue: %s.\n", qname);
return;
}
temp = list->front;
if (list->front == list->rear) list->rear = NULL;
list->front = list->front->next;
free(temp) ;
}
13. void transpose_pairs(NODE *list)
{
KODE sole! = Wis, sole, CWP, “chruairs
Chapter 4 Answers
1. char **substr_addresses(char *string, char *substr)
{
Static char *substrs[100];
char *ptr, **mover = substrs;
284 Answers to Selected Exercises
Pp
Cc
A bat is an animal.
. Bullwinkle is a caboose.
What's up?
foobar
foobar
aos
mo3
roa
° here here here!
.
Comto
pte
cdefg
}
}
linenums[count] = 0;
return linenums;
b
d.
fates
h ol O02
8maa
9 jlo
. #include <stdio.h>
d#Hinclude <string.h>
d#Finclude <stdlib.h>
main(int argc, char *argv[])
{
FIER Aip Srpout;
cham janel2Z574.—“punc;
fclose(fp); fclose(fpout);
} /**** End of program. ****/
9. #include <stdio.h>
#Hinclude <string.h>
#tinclude <stdlib.h>
main(int argc, char *argv[])
{
PILE “tp;
char line[257], *digit, *endp;
int sum = 0, number;
if (fp == NULL)
{
printf("Input file could not be opened!
Aborting!\n");
exit(1);
}
Chapter 5 Answers
Vraetpe= 1openC
stoobar- dat... rt”):
b. long pos;
fseek(fp, 0, SEEK_END);
pos = ftell(fp)/sizeof(struct foo);
printf("There are %ld struct foo in the file.\n", pos);
c. struct foo temp; middle = pos/2 - 1;
fseek(fp, middle * sizeof(struct foo), SEEK_SET);
fread(&temp, sizeof(struct foo), 1, fp);
d. struct foo temp, *all_records;
long records, i;
fseekGtpy- OL “SEEKSEND) ;
288 Answers to Selected Exercises
fp = fopen(filename, “r+b");
fseek(fp, 0, SEEK_END);
records = ftellGipiisizeotCstguee foo)
rewind(fp);
all_records = (struct foo *)malloc((records + 1) *
sizeof (Structsfoos
fread(all_records:. sizeof (struct.foo), records, 1p);
fclose(fp);
memset(&all_records[records], 0, sizeof(struct foo));
return all_records;
aed
00100
40
Hehe ila
. abcd
. abcdefghij
This
mmonaore
. This 1S a sentence.
Chapter 5 Answers 289
7. #include <stdio.h>
d#Finclude <stdlib.h>
main(int argc, char *argv[])
{
char line[256];
PILie “AFOUNER
8. #include <stdio.h>
dHinclude <stdlib.h>
main(int argc, char *argv[])
{
PILE “iFVaV<
long offset, middle;
char startchar, endchar;
fputc(endchar, fp);
offset++;
}
fclose(fp);
} /**** End of Program. ****/
Chapter 6 Answers
eas 0) b. 0
c. 8 d.7
roa ee Hh
g. 14 h.0
i. 63 280
Chapter 7 Answers
1. void print_backwards(void)
{
elneie x
if ((c = getchar()) == '\n') return;
print_backwards();
pUbGH ance):
}
write_file(fp);
Chapter 8 Answers
1. int matrix(4]£4] = {{1}; 10, 1 tO Oral arte, a0 yO ts:
2. a. error b. 4
@ # (SB ap i) d. 4
e. l f. 4
g. 48 h.6
rb j. 4
Ke lOZ0 1B. a
m.5 n. 4
0. 2 p- 6
q. error r. error
s. 1004 t. 1028
sum += *rowmover;
}
return sum;
}
ragged = create_ragged(sizes);
ragged[0][0] = 3; ragged[O][1] = 6;
[XKKKKK Row i! A KKKKKK /
ragged[2][0] = 1; ragged[2][1] = 5;
ragged[2][2] = 11; ragged[2][3] = 13;
[KRKKKK Row oe KKKKKK /
Chapter 9 Answers
2. time_t timeval;
SWPUIEW wih wUlMNe|EICE
char buf[80];
time(&timeval);
timeptr = localtime(&timeval);
strftime(buf, 80, "%A, %b. %d, %1:%M Zp", timeptr);
Drintt(.
2s -,.-Dut)>
3. int hours_from_greenwich(
void)
{
Tims GhinPire
time_t timeval;
struct tm *local, *greenwich;
time(&timeval);
local = localtime(&timeval);
greenwich = gmtime(&timeval);
diff = greenwich->tm_hour - local->tm_hour;
Chink = Celt — @) 2 ohh se als oars
Ti ClOGE-Sen_iSselsie) ely -= ile
return diff;
—
Chane X 204)
chides x20
char XbZ0
7E10);
int *x20):
EN Can AOeO ks
Tne, SIEZOM LINO 1s
se
gmenor
int (*X)Cint 7, double (*funcy Cinta e:
Chapter 9 Answers 297
X= Dp;
print (C25 nies asx)
Xa; [ASXRA SKIDS OVETS UHREES Sb inG Se SetaSoati
09/
Drint Ties \Nit et soi) /* eee SS houlidmp rice rey. ssh,
Xan
Drinthe
Zs Nn ioe* ex) =) YAAs* Shove piri: won eax es
}
[ RRKKKKKKKKKKK KKK KK Program Output Below KKK KKK KK KK KK KKK /
d#include <stdio.h>
main()
{
Time ALSICZOW LOW, CGexOeZowen@ile
Ke ae
SOT ROTO Ist esol ITOTOI. =e osrali etnbON si:
Orintncredaniems +2 xe
Xt? SAX SKIDS COO nts, GcO* LO), ~==*/
Drinth ¢ 20\nte senor /sreShouldapiiintess 47 /
Xora
plone
te(aed Milage es JE SHOUG Di UN beowe |oO,
j
[ERKKKKKKKKKKKKK Program Output Below KKEKKKKKKKKKK AK /
2
5
9
7. void login_times(void)
{
FILE *rawlogin, *textlogin;
SU RUG Ogun» Lemp
298 Answers to Selected Exercises
va_start(argp,dummy) ;
while (execptr = va_arg(argp, struct execute *))
{
i = execptr->function(execptr->71);
espT To nergt Pot y 40aINN etyA IO)
}
va_end(argp);
return 0;
}
Chapter 10 Answers
2a. if (j = 0) raise(SIGINT);
b. atexit(finish);
c. signal(SIGALRM, time_out);
Chapter 10 Answers 299
d. sleep(20);
e. longjmp(eptr, 5);
f. if (setjmp(env) == 0)
{
soe
}
elise
{
1 = As
}
g. rename("oldfile.dat",.“newfile.dat"™);
heint.*ptrs
int key = 25;
ptr = bsearch(&key, array, 100, sizeof(int),
compare_int);
i. assert(j == 0);
fj. if ((ptr = (int *)malloc(10* siizeof(int))) == NULL)
‘
perror("Fatal malloc error!");
exit(1);
}
k. signal (SIGSEGV, out_of_bounds);
or, on some computers...
signal(SIGBUS, out_of_bounds);
—e alarm(0);
m.int key = 25;
signal(SIGINT, handler);
if (bsearch(&key, arr, 100, sizeof(int), comp) == NULL
{
raise(SIGINT);
}
n. tmpnam( fname) ;
fp = fopen(fname, "“wtb");
o.fp = tmpfile();
}
mover = files;
while (*command)
{
strcat(run_this, *command);
SurcanCrum his, “NO4O”) <
command++;
}
runcthisistriencrun
this) = see Ome
system(run_this);
Index
&& operator, 10-11 traversing, 15-17, 17-20 bit toggling, 172-173, 175
->operator, 23, 24 See also int array; arrays, bitwise AND operator,
* operator multidimensional; 167-170, 174
and memcpy() function, 25 strings, arrays of; bitwise exclusive OR operator,
++ operator, 18, 32 structs arrays 169-171
+= operator, 17 arrays, ragged, 215-216 bitwise negation, 174
, operator, 7-8 arrays ofpointer to char, 120, biswise NOT operator, 170
= operator, 17 21 bitwise operators, 167-171
? operator, 4-6 ASCII character set bitwise XOR operator,
and character ordering, 16, 169-171
alarm() function, 256, 267 block, as static NODE pointer,
ANSI C asctime() function, 223, 224, 69
and bit manipulation, 167, 236-237 blockrear, as static NODE
assembly language, and incre- pointer, 69
fgetpos() and fsetpos() menting pointer, 18 Boolean expressions
functions, 142 assert() function, 257-258, in loop headers, 4
ftell() function prototype, 264 misuse of, 7
141-142 assignment suppression, 159 branching, 12-13
and nonsynchronous atexit() function, 256-257, break statements, in loops,
events, 251 263 12-13
pointers to functions, 221, automation, industrial, and bsearch() function, 243,
227-229 bit manipulation, 179 244-250, 262-263
range notations, 158
scan set form, 158 binary files call-by-value
signal types, 251, 255-256 modes for writing, 142 definition of, 20
sort function, 244 random access, 137-142 and structs arrays, 22
strcmp function, 16 record-oriented, 143-146 and substring searches, 97
strlen function, 15-16 binary operators, 170 and swap function, 20-21]
time functions, 222-227 binary tree data structures, case conversion, 28, 32
argument lists, variable- 188-193 character ordering, 16, 27
length, 230-233 advantages of, 189 char_find() function, 233
argv, contents of, 121-122 balanced, 194 circularization of lists, 63-67
arrays, multidimensional deleting nodes from, 196 cleanup() function, 257
averaging across row or col- functions, demonstration closering() function, 67
umn, 209-211 of, 197-199 colavg() function, 210, 212
averaging over whole inserting into, 189-192, collisions, 76
matrix, 213 194-195 and hashing to disk,
binary search of, 243, searching, 193 146-154
244-250 search times in, 193-194 command line arguments,
vs. dynamic data structures, traversing, 196-197 120-121
A) variations on, 199 comments, block, 2, 3
index of, 207 bit manipulation compare() function, 249-250
initializing, 149-150, applications for, 167, const void * argument types,
204-208 179-180 249-250
memory mapping formula operators, 167-172 continue statements, in loops,
for, 214 overlaying bit pattern on
one-dimensional arrays as range of bits, 175-176 cryptography
element of, 203-204 to print, 178 and bit manipulation, 179
passing row into a function, rotating bits, 176-177 and bitwise XOR operator,
211-212 testing, 177-178 171
sorting, 243, 244-250 to turn off range of bits, 174 ctime() function, 224, 237
tricks to avoid, 214-215 to turn on range of bits,
arrays, one-dimensional 173 data structures
assigning, 24-26 bit map, 179 vs. arrays, 243
initializing, 26-27 bit masking formulas, dynamic, need for, 45,
memory mapping formula 172-176 46-47, 55
for, 214 bit rotation methods, hybrid, 79-85
and pointer constant, 176-177 See also linked lists; queues;
19 bit shifting, 171-172 stacks
301
302 Index
matrices, See arrays, multidi- FILE * variables, 139, randomizing sorted files,
mensional 142-143 195-196
memcmp() function forward and backward, in realloc() function
and comparing arrays, 27 linked lists, 63, 67 and arrays of strings,
facts about, 39 as function parameters, 124-128
memcpy() function 20-24 facts about, 131
copying arrays, 25-26 to functions, 221, 227-229, and index array construc-
facts about, 39—40 249-250 tion, 155-156
memory failures, 251 importance of, 15 and qsort and bsearch
memory management incrementing, 18 functions, 244-250
hashing, 72-78, 152-153 manipulation with strings, and signal handling, 251
within nodes, 78-79 28-32 record file index, 155
programmer-controlled, NULL, 17, 47-48, 51, 67, recursion
67-71 192 binary tree insertion,
memory mapping for arrays, and out-of-bounds address, 190-191
214 computation of factorial,
memset() function to pointers, 34-36, 75, 122 183-185
facts about, 85-86 and swapping, 21-22 definition of, 183
and initializing multidi- and traversing arrays, 19 generation of Fibonacci
mensional arrays, 208 unassigned, fia number sequence,
and qsort and bsearch use of to traverse character 186-188
functions, 244-250 array, 15-16 recursion tree diagram, 187
and zeroing out node, pointer variables, 24-25 recursive case, 184-185
67-68 pop functions rehashing, 146
merge sorts, 259 for queues popq(), 53 relational expression, in
modem compression, and bit for stacks pops(), 50-52 return statement, 6—7
manipulation, 179 POSIX standard, 259, 266 remove() function, 259-261,
multi-lists, 80 printbits() function, 178 265
printf() function rename() function, 259-261,
nodes, 45, 47 %hu descriptor, 1'70 264-265
to delete from binary tree, advanced formatting, reserved words, 120
196 157-160 return statement
leaf, 190 facts about, 38, 128 use of ? operator in, 4-5
memory within, 78-79 format descriptors for use of relational expression
root, 190 addresses, 94 in, 6
to search for in binary tree, format list of, 2 reverse() function
and time functions, 224 and empty strings, 32
zeroing out, 67-68 use of ? operator, 6 and strings, 31-32
NULL pointers values returned by, 27 rewind() function, 139, 140
vs. empty string pointer, 17 variable formatting in, 104 facts about, 161
and iterative insertion in push functions and temporary files, 261
binary trees, 192 in queues, 51-53 right-left rule, 233-235
in stacks, 47—48, 51 in stacks, 48-50 root nodes, 190
Vs. ppesienes pointer, 17, putchar() function, 37 rowave() function, 209, 211
6 puts() function, 38 row major order, 206-207
numbits, 173-174, 175
qsort() function, 243, scanf() function
octal format, 159 244-250, 262 advanced formatting,
operating systems queues 157-160
to access, 259-261 applications, 54-55 and whitespace, 32
simulation, 79 in hybrid data structures, scan set format descriptor,
ordered linked list. See linked 80-85 158
lists main() function, 53-54 search_record() function,
OR operator, 167-170, 174 pop function, 53 154
preliminary declarations SEEK CUR, 140
parsing, and strpbrk() func- for, 48 SEEK END, 140
tion, 100 push function, 51-53 SEEK_SET, 140
Pascal, 4, 12 vs. stacks, 48 sentinel value
perror() function, 258, 264 tree of, 199 NULL as, 127
phrasing, 1 zero as, 19
pointer constant, 19, 24, 156, raise() function, 255, 267 sequential record processing,
203 rand() function 189
pointers and binary tree insertion, setjmp() function, 254-255,
and binary trees, 190 194-196 68
“chasing” method, 57, 65 facts about, 200 setting bits, 173-174
and dynamic data struc- random access short-cut notations, 4-8
tures, 47, 48, 49, 51, to binary files, 137-143 sigaction() function, 259
55, 57, 59-60 to variable-length record SIGALRM, 255-256
and empty string, 17 files, 155-156 SIGBUS, 251
304 Index
ae?
COMPUTER SCIENCE/C PROGRAMMING
Advanced Programming
by Example
John W. Perry, De Anza College
This practical, example-driven, code-centered book is intended for intermediate-
level C programmers who want to take their skills to the next level. The book builds
on readers’ existing background in C to complete their knowledge of the ANSI C
libraries and the conceptual and syntactical structures needed to master dynamic data
structures, string parsing and numeric conversion, memory management, bit-level
manipulation, interactions with operating systems, and other topics.
What sets this book apart is its “blue collar” approach to the art of programming—
how to master the “down-in-the-trenches” C details to implement abstract ideas
successfully. In recognition of this, the book illustrates real code, where many data
structures books often offer only English-like pseudocode. It also delves into C
language-specific topics that data structures books never discuss, such as:
4 Formulae for bit masking, setting, unsetting, and toggling
4 Built-in functions for string parsing, string-to-numeric conversion, and
their implications for user interfaces
Interaction with file systems, interrupts, and other operating-system issues
ma
d&
ba
aa C library functions for runtime error detection and location
A
Narrative text and diagrams are tightly connected to.code examples that are presented
in small, digestible chunks, which are ultimately combined to form sizeable programs.
In the accompanying exercises, the book strikes a careful balance, testing both syntac-
tic facility and algorithmic knowledge. The result is a book that serves both serious
programming students and professional C programmers.
H
i
.g4 PWS Publishing Company ISBN O0-S34-95140-
<PEXS> 20 Park Plaza, Boston, MA 02116 9000 Se
> http://www.pws.com a
PWS— Innovation and Excellence N
in Computer Science
T
I(T)P an International Thomson Publishing company 9"780534"951 405