Shakey The Robot
Shakey The Robot
April 1984
Approved:
ABSTRACT .............................................. 1
ACKNOWLEDGMENTS ....em.............................e.
APPENDICES
...
Ill
ILLUSTRATIONS
AUTOMATON VEHICLE
AUTOMATON VEHICLE IN ITS ENVIRONMENT
AUTOMATON- SYSTEM BLOCK DIAGRAM
SRI ARTIFICIAL INTELLIGENCE GROUP
CO MPUTER SYSTEM
EXAMPLE MODEL
CONTROL STRUCTURE OF LOW- LEVEL ACTIVITIES
CONTROL STRUCTURE OF THE
INTERMEDIATE LEVEL
AN OBSTACLE CONFIGURATION FOR FINDPATH
SEARCH TREE FOR CONFIGURATION OF
FIGURE 8
TYPICAL MACROP
MACROP WITH MARKED CLAUSES
MACROP AFTER EDITING
GENERALIZED PLAN FOR TWO- PUSH MACROP
MOP: A NESTED MACROP . . . . . . . . . . . . . . . . . . . . . . .
MAP OF SHAKEY' S EXPERIMENTAL ENVIRONMENT
MAP OF SHAKEY' S WORLD AFTER
COMPLETION OF THE FIRST TASK
8-1 THREE CORRIDOR SCENES 116
B-2 RESULTS OF MERGING HEURISTICS 117
8-3 A SIMPLE SCENE 124
8-4 A MORE COMPLICATED SCENE 127
8-5 THE ANALYSIS TREE 129
8-6 BASIC FLOWCHART FOR LANDMARK PROGRAM 132
7 LANDMARKS 133
TABLES
vii
F r o m 1966 t h r o u g h 1972, t h e A r t i f i c i a l Intelligence C e n t e r a t SRI
conducted research o n a mobile robot s y s t e m n i c k n a m e d "Shakey."
Endowed w i t h a l i m i t e d ability t o perceive a n d model its e n v i r o n m e n t ,
Skakey could p e r f o r m t a s k s t h a t required planning, route-finding, a n d t h e
rearranging o f s i m p l e objects. A l t h o u g h t h e S h a k e y project led t o
n u m e r o u s advances in A? techniques, m a n y o f w h i c h were reported in t h e
literature, much specific i n f o r m a t i o n t h a t m i g h t be u s e f u l in current
robotics research appears o n l y in a series o f relatively inaccessible S R I
technical reports. O u r purpose here, consequently, is t o m a k e this
material m o r e readily available by extracting a n d reprinting t h o s e
section* of t h e reports t h a t s e e m particularly interesting, relevant a n d
important.
C H A P T E R ONE
Introduction
Some of the ground rules guiding our research were established immediately. First, it was
decided that the basic goal of this project was to design an integrated system consisting of
a mobile vehicle under the real-time control and supervision of a powerful digital
computer. The vehicle should be equipped with at least rudimentary manipulative
abilities, and with sensory and communication subsystems. Various automata have been
built which are controlled by relatively few, simple, onboard logic circuits, but the essence
of this project is real-time control by a full-scale, programmable, digital computer.
Fourth, we decided that the environment of the automaton should be large in extent. Its
components may be simple in quality in the beginning, but there should be a non-trivial,
extensive environment that the automation is expected to deal with. This ground rule
forces us immediately to consider only methods for e fjicient internal representations of
the world.*
The robot system is a hierarchical structure in which we shall identify five major levels.
Although some of these levels are much more clearly defined than others and some have
considerable substructure, the five levels described below constitute a useful division for
this exposition. Also, the effectiveness of the system is largely derived from the clear
specifications for these levels and their interconnections.
The bottom level of the system consists of the robot vehicle and its connection to the user
programs. This connection includes radio and microwave communication links, a PDP-15
peripheral computer and its software, and a communications channel, with its associated
software, between the PDP-15 and the PDP-10. This bottom level may be thought of as
defining the elementary physical capabilities of the system.
The second level consists of what we call Low-Level Actions, or "LLAs." These are the
lowest-level robot control programs available to user programs in the LISP language, our
principal programming tool. The LLAs are programatic handles on the robot's physical
capabilities such as "ROLL" and "TILT." They are described in detail in Chapter Four.
So that it can exhibit interesting behavior, our robot system has been equipped with a
library of Intermediate-Level Actions, or "ILAs." These third-level elements are
preprogrammed packages of LLAs, embedded in a Markov table framework with various
per~ept~ion,
control and error-correction features. (Markov formalizations are explained in
Chapter Five, Section B.) Each ILA represents built-in expertise in some significant
physical capability, such as "PUSH" or "GO TO." The ILAs might be thought of as
instinctive abilities of the robot, analogous to such built-in complex animal abilities as
"WALK" or "EAT." Chapter Five contains a description of the present set of ILAs,
along with the conditions under which they are applicable and how they each can affect
the state of the world.
The principal sensor of the perceptual system is the TV camera. Programs for processing
picture data have been restricted to a few special "vision" routines, that orient the robot
and detect and locate objects. These programs are incorporated into the system at either
the ILA or LIA level. The algorithms in these routines are described in Chapter Six and
Appendix B.
Above the ILAs we have the fourth level, which is concerned with planning the solutions
to problems. The basic planning mechanism is STRIPS, described in Chapter Seven.
STRIPS constructs sequences of ILAs needed to carry out specified tasks. Such a
sequence, along with its expected effects, can be represented by a triangular table called a
1ACROP (" macro operation Chapter Eight describes how such MACROPs can be
generated in generalized forIl, thereby enabling an interesting form of learning and plan
sdcction to take place.
, or top, level of the system is the executive, the program that actually
Finally t the fifth
invokes and monitors executions of the ILAs specified in a MACROP. The current
executive program, called PLANEX, is brieny described at the end of Chapter Eight. *
The robot vehicle itself is shown in Figures 1 and 2. It is propelled by two stepping
mot.ors independently driving a wheel on either side of the vehicle. It carries a vidicon
television camera and optical range-finder in a movable "head." Control logic on board
the vehicle routes commands from the computer to the appropriate action sites on the
vehicle.. In addition to the drive motors, there are motors to control the camera focus and
iris settings and the tilt angle of the head. Other computer commands arm or disarm
interrupt logic, control power switches and request readings of the status of various
rcgist,ers on the vehicle. Besides the television camera and range-finder sensors, several
11
cat-whisker" touch-sensors are attached to the vehicle's perimeter. These touch sensors
enable the vehicle to know when it bumps into something. Commands from the computer
to the vehicle and information from the vehicle to the computer are sent over two special
radio links, one for narrow-band telemetering and one for transmission of the TV video
from the vehicle to the computer.*
The initial environment of the Automaton was real, but contrived. It has been sufficiently
simple t-o allow current visual capabilities to be useful to the Automaton, and sufficiently
complex to indicate the weaknesses of current methods and to suggest areas of further
research. Perhaps the most important result of our vision-research effort on the
Automaton project is an appreciation of the potential complexity of the problem of vision
when the real world is the subject matter, and a strong notion that the first step we have
taken towards a general capability is very small indeed.
The first visual subsyst.em was designed to specialize in the planar-surfaced environment
of our 1aborat.ory and office building. The objects in this environment are specially
constructed rect.angular parallelepipeds and wedges. The use of only the regularly spaced
overhead fluorescent, lights as well as light colored walls and floor allows us to essentially
eliminate shadows and to limit the illumination to a 2-112 to 1 range in the computer
room.
The surfaces of the objects used are uniformly coated with red, grey, or white paint.
Originally black was used to insure high contrast between adjacent surfaces. However,
the range-finclcr relies on reflected light. Red replaced black because it is relatively dark
t,o the TV camera and returns enough light to the range-finder. Thus, not only are the
objects opaque, but also have non-specular surfaces. Furthermore no two-dimensional
markings were put on the object surfaces. The floor tile was chosen so as not to have any
(let-ect.ablemarkings. The only two-dimensional marking purposely applied was a dark
wall molding at the floor level. The floor has about the same reflectivity as the walls.
There were verticle molding strips on one wall which were specular.*
Figure 3 shows a block diagram of the hardware system. The system consist,^ of a
stationary part interfacing with the SDS 940 computer and the mobile vehicle which is
remotely controlled from t,he fixed equipment via a full duplex radio link. (The data
communications interface was described in an Appendix of [4].)
Commands to the vehicle are transmitted in digital form preceded by a module address
referring to the module on the vehicle that is expected to act. Each module is equipped
- - --
of mot, ion. spced, requested distance , and other special functions. When action is
fcquet;teu, the action tarts and continues until completed or interrupted by other control
funct ions in the system. End-of-action or other control interrupts are transmitted back to
the \t, ationary equipment in coded form , where they are decoded and sent as interrupts to
the t omputrr. Intrrrupts of a similar nature are ORed together to limit the number of
interrupt.s. St.atus regist.er are therefore provided on the vehicle so that status can be
interrogated from the computer any time the source of the interrupt is in question.
The hardware for the visual system uses the same interface to the computer. The power
for the TV camera and the special transmitter for the videodata is controlled from the
power-cont.rol register on the vehicle. The rest of the visual system is quite independent.
The TV camera consists of one control unit mounted on the platform of the vehicle and
one camera head mounted on a pedestal in the center of the vehicle. The camera can be
turned :: 180 degrees around a vertical centerline, and it can be titled +60 degrees and
45 degrees around a horizontal axis located below and perpendicular to the optical axis of
the camera. The camera is equipped with a manually replaceable lens. The lens mounts
in a mechanism wit. h two motors for control of iris and focus. The control of all degrees
of freedom of the camera and its lens system is accomplished by stepping motors. The
rotation of the camera around the vertical shaft is under control of a servo similar to that
used for the wheels of the vehicle. The control from the computer is in the form of LEFT
or RIGHT commands of a given number of steps. The camera has one left-rotational
terminal switch at + 180 degrees rotation and one right-rotational terminal switch at - 180
degrees rotation. \Vhen these switches close, the rotation in the direction in process
interrupted. The switches also signal the emergency circuit , causing an interrupt signal at
the computer. Associated with the shaft rotation, there is also a pan distance counter.
The content of the counter can be transmitted to the computer. The tilt of the camera is
controlled by a stepping motor operated at a constant step rate. The motor reacts to a
TILT UP or TILT DOWN command for a given number of steps. The tilt mechanism has
limiting switches up and down. The limit switches stop the tilt and signal the interrupt
circuits in the computer. The content of the tilt counter can be transmitted to the
a::J au- au-
I;: :1
i51
:::I
::u
i:l;
:!i
iil
U-:
a_-
;: 8$
. a
-.u
I;: :=1:
w...
=si=
==5=
-.4
!! u
Only one lens is presently used. Focus is controlled by one stepping motors and iris by
another. The rotation is limited by limit switches. The limit switches preset the counters
at maximum focus and minimum iris associated with the stepping motors.
The control logic has an up-down counter ror distance and direction.
The Artificial Intellgence Group computer complex consists or the rollowing parts:
. PDP- I0 computer and peripherals
The PDP- IO 8Jstem has 192K (K 1024) words or 36-bit memory. 32K is DEC MDI0
memory. The rest is Ampex RGI0 memory, consisting or one 32K memory with interface
and one 128K memory interface and rour modules or 32K each. All memory has rour
ports. These are occupied by:
. PDP- l: cent.ral processor
. DA25C interface.
The Bryant drum is a high-speed autolift drum which has a 1.5-milion-word capacity.
is planned that it wil be used ror swapping and some system files. The drum controller
interfaces directly into the memory rather than going through a data channel.
The interface between the disk pack drives and the DF10 data channel was built by
Interactive Data Systems, Inc.
The disk pack drives are manufactured by Century Data Systems and handle the 20-
surface disk packs. This means that each disk pack has a 5-million-word capacity. The
packs themselves are manufactured by Caelus Inc. The disk pack system is used as
secondary storage.
Currently, we are also using one disk pack drive as a swapping device for the time-sharing
system.
The TV AID converter is an SRI-designed and -built device. I t handles data from the
robot TV camera at a rate of one word every 1.5 microseconds. It is capable of processing
either 120X120 or 240x240 pictures with 32 levels of gray scale.
The DA25C is the PDP-10side of the interprocessor buffer. I t handles data at one 36-bit
word every 8 microseconds. We have programmed it such that the PDP-10 is always in
cont,rol and can interrupt any transmission in order to initiate one of its own.
The DA25D is the PDP-15side of the interprocessor buffer. Each PDP-10 word is split
into two PDP-15 words (18 bits each). I t also does the reverse operation. It operates on
the PDP-15 I/O bus as a single-cycle device; however, its internal logic uses three cycles
per word.
The PDP-15 has 12K of core memory and an 110 processor. All devices are "daisy
chained" on the 110 bus. These include an Adage display, paper tape, DEC tape, AID
convertler, D/A converter, ARPA network IMP, and the SRI robot.
The Adage display provides a high-speed graphics capability. It will be refreshed from the
PDP-15 core. The display lists will be prepared in the PDP-10 and executed from the
PDP-15. Capabilities include incremental mode, print mode, dotted lines, and intensity
control.*
Interprocessor Channel
,or- POP-
MDIOA
32K OA2! DA2S
,'l
I,
AMPEX
32K
10 II r-----'
AWE X
L___ DA2I r---
32K L____
10 II
AWEX
32K
10 II
c....y
AMPEX Di DrMi
110 lus I 32K 1/0 el'
10 II
'RINTER AWEX
321C
10 II
r--DEVICE I
I MUl TIPUXER I
L_-
DeiOA
r----t r-_ , r- ,I
FUTURE U I FUTURE FUTURE I I
DEVICE I DEVICE I DEVICE I
L____ J L____ J L____
r----
HOST I
l INTERFACE r---
L__-_
7I011f2.
computer t o allow FORTRAN (or FORTRAN-compatiUe MACRO)
subroutines a n d functions t o be r u n under t h e LISP operating s y s t e m .
T h i s interface i f described i n [ISl.
CHAPER THREE
As a result of our experience with the previous robot system (i.e., the one using the
SDS 940) and our desire to expand the robot' s experimental environment to include
several rooms with their connecting hallways , we have adopted new conventions ror
represent.ing t.he robot' s model or the world. In particular, whereas the previous system
had the burden or maintaining two separate world models (i. , a map-like grid model and
an axiom model), t.he new system uses a single model for all its operations (an axiom
model); also , in the new system conventions have been established for representing doors
wall faces, rooms, objects , and the robot' s status.
The model in the new system is a collection of predicate calculus statements stored
prenexed clauses in an indexed data structure. The storage format allows the model to be
used without modification as the axiom set for STRIPS' planning operations (Chapter
Seven) and for QA3. s theorem- proving activities (14, 15).
Although t.he system allows any predicate calculus statement to be included in the model
most or the model wil consist or unit clauses (i.e., consisting of a single literal) as shown
in Table 1. Nonunit clauses typically occur in the model to represent disjunctions (e.
box2 is either in room K2 or room K4) and to state general properties of the world (e.
for all locations locI and loc2 and for all objects obi, if obi is at location loci and loci is
not the same location as loc2 , then obI is not at location loc2).
We have defined for the model the following five classes of entities: doors , wall races
rooms , objects , and the robot. For each of these classes we have defined a set of
primitive predicates which are to be used to describe these entities in the model. Table 1
lists these primitive predicates and indicates how they will appear in the model. All
distances and locations are given in feet and all angles are given in degrees. These
quantities are measured with respect to a rectangular coordinate system oriented so that
all wall faces are parallel to one of the X- Y axes. The NAME predicate associated with
each entity anows a person to use names natural to him (e.g. , halldoor , leftface , K2090
etc. ) rather than the less- intuitive system- generated names (e. , dl , f203 , r4450 , etc.
Figure 5 shows a sample environment and a portion of the corresponding world model.
Rooms are defined as any rectangular area, and therefore , the hallway on the left is
modeled as a room. There is associated with each room a grid structure that indicates
which portions of the room s noor area have not yet been explored by the robot. During
route planning the grid is employed to help determine if a proposed path is known
blocked, known clear , or unknown.
Four wall faces are modeled in Figue 5. The F ACELOC model entry for each face
indicates t.he face s location on either the X or Y coordinate depending on the face
orientation. There is associated with each face a grid structure to indicate which portions
of the wall face have not yet been explored by the robot. This grid is used in searching
wall faces for doors and signs.
Two doors are modeled in Figure 5. The DOORLOC model entry for each door indicates
t.he locations of the door s boundaries on either the X or Y coordinate , depending on the
orientation of the wall in which the door lies. Any opening between adjoining rooms is
modeled as a door, so that the complete model of the environment diagrammed in Figure
5 would have a door connecting rooms Rl and R3. This door coincides with the south
face of room R3 and wil always have the status " open.
The RADIUS and AT model entries for the object modeled in Figure 5 define a circle
circumscribing the object. These entries simplify the route-planning routines by allowing
each object to be considered circular in shape. Our current set of primitive predicates for
describing objects is purposely incomplete; we will add new predicates to the set as the
need for them arises in our experiments.
We do not wish to restrict the model to only statements containing primitive predicates.
The motivation for defining such a predicate class is to restrict the domain or model
entries that the robot action routines have responsibilty for updating. That is , it is clear
that the action routine that moves the robot must update the robot' s location in the
model , but what else should it have to update! The model may contain many other
entries whose validity depends on the robot' s previous location (e. , a statement
indicating that the robot is next to some object), and the system must be able to
determine that these statements may no longer be valid after the robot' s location has
changed.
\Ve have responded t.o this problem by assigning to the action routines (discussed in
Chapters Four and Five) the responsibility for updating only those model statements
which are unit. clauses and contain a primitive predicate. All other statements in the
model will have associated with them the primitive predicate unit clauses on which their
validity depends. \Vhen such a nonprimitive statement is fetched from the model , a test
will be made to determine whether each of the primitive statements on which it depends is
still in the model; if not , then the nonprimitive statement is considered invalid and is
deleted from t.he model. This scheme ensures that new predicates can be easily added to
t.h(' system and that existing action routines produce valid models when they are executed.
vVe have designed and programmed a set or LISP runctions for interacting with the world
model. These functions are used both by the experimenter (as he defines and interrogates
the model) and by other routines in the system to modify the model. To the experimenter
at a teletype , these functions are accessible as a set or commands. A brief description of
these commands follows.
ASSERT This is the basic command ror entering new axioms into the model. The
user follows the word ASSERT by either CUR or ALL to .indicate
whether the entries are to be ror the current model or are to be
considered part or all models. The system then prompts the user for
predicate calculus statements to be typed in using the QA3. 5 expression
input language. After each statement is entered , the system responds
with " OK" and requests the next statement. To exit the ASSERT
mode the user types "
FETCH This is the basic command ror model queries. The user follows the word
FETCH by an atom rorm , and the system types out a list of all unit
clauses in the model that match the rorm. Each term in an atom form
is either a constant or a dollar sign. The dollar sign indicates an "
don t care " term and will match anything. The last term or an atom
form can also be the characters " S." to indicate an arbitrary number of
I don t care " terms. For example , the atom fOfm " (AT ROBOT $*
wil fetch the location or the robot , and the atom fOfm " (INROOM $
Rl)" wil fetch a list of model entries indicating each of the objects in
room Rl.
DELETE This is the basic command (or removing statements (rom the model.
The user (ollows the word DELETE by an atom (orm , and the system
deletes all unit clauses in the model that match the (orm. Atom (orms
have the same syntax and semantics (or the DELETE command as
described above (or the FETCH command.
Primi ti ve
Predicate Li teral Form Example Literal
FACES
ROOMS
OBJECTS
FACES
type Cf4 flce)
typeCf1 fac'
type(t2 flce) tyCf3 flce)
naCf3 wfr2) nameCf4 e'r3)
nameCf1 nfrO nlmeC'2 "r2)
feclocCf3 5. fecloc('4 4.
'eclocCf1 15. 'lcelocCf2 15.
gridCf3 g6) gridC.4 ,11
gridCf1 ..)
gridCf2 ,51 boundsroom('4 r3 elsd
bondsroomCf2 r2 lOuth) bondsroomCf3 r2 wed
boundsroomCf1 r1 north)
DOORS
type(d1 dor) typeCd2 dor)
nameCd1 officer) nemeCd2 hilldor)
12. dorlocsCd2 22. 5 25.
doiocICd1 10. 0
ioins'8CesCd1 f1 .2) ioins'acesCd2 '4 '3)
'41'3
OBJECTS ROBOT
10 I
ROBOT
T A- 8973-
CHAPER FOUR
A. Introduction
The low- level , or " LLAs " define the interface between major robot software
act.ions
packages and the bottom , hardware-oriented level of the system. The intermediate- level
act.ions (ILAs), to be described in Chapter Five , control the operation of these LLAs. The
LLAs , in t.urn , communicate with the PDP- 15 computer and the robot vehicle according
to the protocol described in Appendix G of (9).
I n this section we shall describe the upper face of the LLAs , i. , the face presented to
higher- level programs.
Since the robot moves very slowly, we have taken great pains to permit the user to view
the robot as behaving asynchronously to as great an extent as appropriate. Thus , the
user must t.ake cognizance of this asynchrony by confirming the completion of " settling
on any robot activity before doing anything that assumes that activity to have been
successful. This low- level software package provides the necessary interlocking in the
following manner. Communications between the user and the robot are separated into
two unidirectional channels: orders from the user to the robot are handled by calls on
LLAs (i.e. , the funct.ions in t.his package); the current state of the robot s world is
reflected in the robot s world model. Now , the functions by which the user can access
these part.icular entries in the robot s world model have special provisions to ensure that
an actjvity has settled before grant. ing access to any part of the model which that activity
might affed. For example , one might move the robot to a given location by first turning
it to face the t.arget spot and then rollng it straight forward by the required distance.
One could conceivably confirm the initial turn (by interrogating the proper part of the
model) before rollng ahead. The model-access function will then delay until the turn has
settled before reporting the bearing of the robot. On the other hand , the user wil not be
delayed for' completion of the roll until he interrogates the position of the robot. Thus
have synchronization (between the user and the robot) whenever we need it but not
otherwise.
This sort of synchronization is effected in another circumstance having to do with
interlocks between activities. In particular , each activity has associated with it certain
conflicting activities. (For example , one cannot take a TV picture while the robot' s head
is panning. ) A set of init.iation functions automatically take cognizance of all possible
conflicts: each ensures that all potentially connicting activities are settled before
initiating it.s own activity. For the purpose of programming actual use of the robot
however, one should note that settling of an activity does not necessarily mean its
successful completion. For example, a roll can terminate by the robot unexpectedly
bumping int.o some obstacle- this wil " settle " the roll, but the robot cannot be assumed
to have attained its destination.
Angles are measured in degrees , and we wil call the principal value of an angle that val
between - 180 and +1800 . The bearing of the robot is a horizontal angle referred to the
positive direction of the global y-axis; thus the robot is parallel to the xeaxis in the
negative sense when its bearing is 90 o . The pan angle of the robot' s head is a horizontal
angle referred to the robot' s bearing, and the tilt angle of the robot s head is a vertical
angle measured from the horizontal plane. Thus , when the robot has its pan angle at zero
and the tilt angle at- 45 0 , the range- finder and TV camera are pointed at the floor right
beCore its very wheels.
\Ve turn now to optical values. The iris of the TV camera is set in exposure value units
(EVs), which have a logarithmic relation to f-numbers: increasing the EV number by one
doubles the amount of light arriving at the inner regions of the TV camera. Focus values
and range- finder readings are distances in feet from the intersection of the axcs about
which the robot s head tilts and pans. That point in turn is about 4 feet 1- 1/2 inches
above the floor and 9 inches forward of the axis about which the robot turns , when the
robot is standing (or sitting or whatever it does) on a level flat floor.
Having covered the numeric quantities in the robot' s world , we have but a few other items
to discuss. Perhaps the simplest of these to describe is a TV picture: it resides on a disk
file in FORTRAN binary format. Now TV pictures are digitized in square arrays of
picture elements; the size of the array is constant , but one can select two coarsenesses:
120 or 240 picture elements on a side. One can , however , alter the configuration of the
array for the sake of special stereo optics. These two options are combined into one
num her called the tvmode , as follows:
To explain the last two quantities of this section, we must first explain the two main
tactile sensors of the robot and how they interact with the roll and turn activities. The
tactile sensors are seven catwhiskers and a pushbar; each catwhisker can signal
engagement with an obstacle , and the pushbar can signal each of two levels of pressure:
mere engagement and hard contact. All nine of these conditions are renected in a
quantity called the whiskerword; to a first approximation each of these conditions has its
own bit in the whiskerword , whose format is shown in the following table:
The robot has a couple of motor renexes pertinent to this discussion: it wil stop moving
whenever the pushbar becomes disengaged, and it wil not move while a catwhisker is
engaged. Howeyer , these two reflexes can be overridden selectively; the corresponding
orders are sent to the PDP- 15 by means of the override activity and the override code
and DTI lET A , which are separated from AT and THETA to faciltate planning. The
state of the w hiskerword is updated whenever a ROLL or TURN settles , and the OVRID
predicate rerlccts the state of the overrides in the robot.
The TILT and PAN predicates refer t.o the direction the robot' s head is pointed. DTILT
. and DP AN give corresponding error estimates. All three angles (tilt angle , pan angle , and
heading THETA) are stored as their principal values. RANGE gives the yalue resulting
from the mOllt recent range- finder reading. The PICTUREST AKEN predicate , which we
wil describe more fully in our discussion of the SHOOT activities, gives the approximate
Hum ber of pictures taken to date. The meanings of the rest of the predicates should be
D. The LLAs
The predicates are the means by which the robot tells the user about its state; the LLAs
provide t.he means by which the user t. ells the robot to alter its state. One should
understand that this clean division is largely just formal; in practice an interrogation of a
predicate is intercepted by a function that ensures settling of any relevant robot activities
before proceeding to the actual access. Also, the initiation of an action does not guarantee
its completion; actions may terminate for a variety of reasons , such as engagement of limit
switches or malfunctions in the telemetry link. The state of the system after an action
may be det.ermined by investigating the model.
The following functions initiate fundamental low- level activities (whenever numeric
parameters are used, negative numbers are permissible and mean motion in the direction
opposite to that indicated):
TILT degreesup tilts the robot' s head upward by " degreesup " degrees. The motion
can be prematurely terminated by a limit switch.
PAN degreeslelt pans the robot' s head by " degreesleft" degrees to the left or far
FOCUS feetout the TV camera is initially focused on a plane removed by some focal
distance from the center of the head' s gimbals; this function increases that distance
feetout" feet. Of course the range of focal distances is limited by limit switches.
ffIS evs opens the robot s iris (on the TV camera) by " evs " EVs. Thus if B evs " has
the value 1 , this form wil double the amount of light getting into the TV camera. There
are limits for this activity too.
OVRID overrides set the overrides as specified by the " overrides " code work.
TVMODE tvmode sets the TV mode as specified by the " tvmode " code word.
RAGE reads the robot' s range- finder; this automatically includes turning on the
range- Cinder and waiting ror it to warm up.
SHOOT put.s a TV picture onto the disk file " TV. DAT. " The picture is taken
according to the current TV mode. Assuming correct operation or hardware and
software, a subsequent examination or the PICTURESTAKEN atom (in the world model)
wil yield a positive integer giving the number of current pictures in a series (1 t 2 , 3,...
begun when the robot system was loaded or initialized. In the event of an unrecovered
system malfunction (e. , transmission ' error), the value stored with PICTURESTAKEN
will be the nE'gative of the serial number of the last successfully taken picture.
ROLL reet tells the robot to roll forward by " feet" reet. This activity has three
normal ways of prematurely terminating: the robot can come into contact with an
obstacle , engaging a catwhisker; it can lose contact with an object it is pushing,
disengaging the pushbar; or it can encounter an immovable object , causing the pushbar to
come on hard. The first two conditions cause the robot to stop by renex actions that can
be overridden; the last causes the robot to attempt to free itself using more complex
evasive actions in a reflex that cannot be overridden. When the robot encounters an
immovable object , it wil not only stop, but it will back away from it by some distance
urrent.ly a constant 6 inches. (Of course , the information in the model wil be correctly
maintained. ) The whiskerword in the model is updated at the end of a ROLL or TURN;
it contains the description of the current state if the catwhiskers and pushbar are
ret.urned from the robot , but it has another bit for immovable objects-this bit showing
the history of an event rather than showing a current state. This bit is set only w hen the
hiskerword is updated the first time after hard contact.
TURN degreeslert tells the robot to turn to the leCt by " degreesleCt" degrees.
Otherwise the above description oC the ROLL activity applies excepting only the way
immovable objects are evaded. In this case , the robot turns back; currently it turns back
to its initial heading.
The functions discussed so Car that initiate motions have been incremental in Corm if not
in essence, However , even this level of robot software has a memor:y oC the various
aspects of the robot s position in the axiomatic model so dutiCully maintained by the
ett, ling funct. ions. Capitalizing on this circumstance, we have also provided some
functions to initiate motions to a given goal (rather than by a given amount). Although
these funct.ions are Cormally and conceptually outside the lowest LISP level of robot
software, they have suCficiently simple internal structure that it is convenient to describe
t.hem here rat.her than in the next (ILA) chapter. With one exception we expect their
(TIL TO degreesup)
(P ANTO degreesleft)
(FOCUSTO reet)
(IHISTO evs)
(ROLLTO xfeet yfeet)
(TURNTO degreeslerttofy).
The exception is ROLL TO: it must first turn the robot to point toward its goal , so it
must do (and does) more than simple calling the corresponding incremental function with
he difference between the desired and curent position.
E. Summary
Table 2 is a summary of Shakey s low- level activities. Figure 6 sketches how these
activit.ies fit into the overall system control structure.
(T ILT degreesup) (TILTO degrees up) RANGE , SHOO upper limit (35 TILT , MILT TILT , MILT
lower limit (-45
(PAN degrees ef t) (PANT degrees left) RANGE , SHOOT left limit (116 PAN , DPAN PAN , DPAN
right limit (-107
(FoaS feetout) (FOSTO feetout) SHOO near limi t FOCUS , DFOCS FOCUS , DFOCUS
far limit
(IRIS evs) (IRISTO evs) SHOO open Umi t IRIS , DIRIS IRIS , DFOCUS
closed limi t
(OVRID overrides) TURN , ROLL OVERRIDE
(ROLL feet) (ROLLTO xfeet yfeet) TURN , RANGE , OVR ID bump-ignored AT , DAT , THETA , MHETA AT , DAT , WHISKERS
SHoo , ROLL bump-stopped
(TURN degreesleft) (TURNO degrees left) drop object-stopped THETA , MHETA THETA , DTHETA ,
immovable object-backed off WHISKERS
1 1
ROLL TO TURNTO FOCUSTO
TVMODE
A. Introduction
As with most programming tasks, the problem of programming robot actions is simplified
when it is done in terms of well-defined subroutines. At the lowest level it is natural to
define routines that have a direct correspondence with low-level robot actions-routines
for rolling, turning, panning, taking a range reading, taking a television picture, and so
forth. However, these routines are too primitive for high-level problem solving. Here it is
desirable tJo assume the existence of programs that can carry out tasks such as going to a
specified pla.ce or pushing an object fromone place to another. These intermediate-level
actions (ILAs) may possess some limited problem-solving capacity, such as the ability to
plan routes and recover from certain errors, but the ILAs are basically specialized
~ubrout~ines.None of these routines has as yet been written. However, considerable
thought has been devoted to their design, and this section describes our plans for a set of
ILAs t8hat.a.re suitable for use with the STRIPS problem-solving system.
Perhaps tihe most difficult problem that confronts the designer of ILAs is the problem of
detectpingand recovering from errors. Sometimes errors are detected automatically, as
when an interrupt from a touch sensor indicates the presence of an unexpected obstacle.
Other times it is necessary to make explicit checks, such as checking to be sure that a
door is open before moving through it. When an error is detected, the problem of
recovery arises. This problem can be very difficult, and is one aspect that distinguishes
work in robotry from other work in artificial intelligence.
1. General Considerations
The formal structure of each ILA routine is basically that of a Markov algorithm. * Each
routine is a sequence of statements. Each statement consists of a statement label , a
predicate , an action, and a control label. When a routine is called, the predicates are
evaluated in sequence until one is found that is satisfied by the current model. Then the
corresponding action is executed. The control label indicates a transfer of control , either
to anot.her labeled statement or to the callng routine.
Table 3 gives a specific example of an ILA coded in this form. This routine , gotoadjroom
(rooml , door , room2), is intended to move the robot from room! to room2 through the
specified door. The first test made is a check to be sure that the robot is in room!. If it
is not , an error has occurred somewhere. Since this routine is not prepared to handle that
kind oC error , no action is taken , and control is returned to the calling routine. The
status or the door. The function " doorstatus " computes this rrom the model , and
evaluates to either OPEN , CLOSED, or UNKNOWN. Rather than tracing through all or
the possibilities , let us consider a normal case in which the door is open but the robot is
neither in rront of nor near it. It this case , the action taken is the last one
n avto( nearpoint( room 1 , door)). Here the function " nearpoint" computes a goal location
near the door. The function " navto " is another ILA that plans a route to the goal point
and eventually executes a series of turns and rolls to get the robot to that goal. Of
course , unexpected problems may prevent the robot from reaching that goal.
Nevertheless , whether navto succeeds or fails , when it returns to gotoadjroom the next
predicate checked wil be that of statement 4. If navto succeeds and the robot is actually
in front of the door , the bumblethru routine wil be called to get the robot into room2. If
navto had railed and the robot is not even near the door , navto wil be tried again.
Clearly, this exposes trapped in rruitless infinite loops. We shall
the danger of being
in(room2)
setq(s , doorstatus(door))
infrontof (door) Aeq (s , OPEN) bumblethru(rooml , door , room2)
near(door) Neq(s OPEN) align(rooml , door , room2)
near(door) Neq(s , UNOWN) doorpic (door)
eq (s , CLOSED)
The predicates used in the ILAs have the responsibilty of seeing t.hat preconditions for an
act.ion are satisfied. In general , the evaluation of predicates is based on information
contained in the model. If this information is incorrect , the resulting action wil usually
be inappropriate. However, the act of taking such an action wil rrequently expose errors
in the model. When the model is updated (which typically occurs after bumping into an
object or analyzing a picture), the values or predicates can and do change. . Thus , the
values of the predicates wil depend on the way the execution or the ILA proceeds, and
wil steer the routine into (hopefully) appropriate actions when errors are encountered.
The actions can ' be any executable program. The most common actions are to compute
the values of local variables, update the model , call picture-taking routines that update
the model , or call other ILAs. Only the first of these causes any answers to be returned
directly to the callng program. This constraint of communicat.ing through the model
occasionally leads to computational inefficiencies. For example , the very computation
used by one routine to determine that it has completed its job successfully may be
repeated by the callng routine to be sure that the job has been done. While some of
these inefficiencies could be eliminated with modest effort , they appear to be of minor
importance compared to the value of having a straightforward solution to the problem or
error recovery.
3. Loop Suppression
'rVe ment.ioned earlier that the failure of a lower- level ILA might result in no changes in
the model that are detected by the callng ILA. In this case , one can become trapped in
an infinite loop. There are a number of ways to circumvent this problem. Perhaps the
most satisfying way would be to have a monitor program that is aware or the complete
state of the system, and that could determine whether or not the actions being taken are
An alternative would be to have each ILA keep a record of whether or not its actions are
The simplest kind oC record Cor an ILA to keep is a count or the number of times it has
taken each action. In many cases , if an action has been taken once or twice before, and if
the predicates are callng' for it to be taken again , then the ILA can assume that no
progress is being made and return control to the callng program. This strategy can be
improved by computing a limit on the number of allowed repetitions , and making this
limit depend on the task. For example , ir the action is to take the next step in a plan , the
limit should obviously be related to the number of steps in the original plan. Both of
hese strategies can be criticized on the grounds that hey are indirect and possibly very
poor measures oC the progress being made. However , they constitute a frequently
effective , simple heuristic , and wil be used in our initial implementation of the ILAs.
One aspect of implementing the ILAs that has not yet been resolved concerns whether the
ILAs should be writ.en as ordinary LISP programs , or should be kept in tabular form as
dat.a for an interpreter. It is quite easy to go from a representation such as that in Table
3 to a LISP program realizat.ion; the basic structure is merely a COND within a PROG.
However , t.he use of an interpreter would simplify the implementation of the loop
suppressor , and would also simplify monitoring and the incorporation oC diagnostic
messages. In addition , the same program that interprets the ILAs might be used to
interpret the plans produced by STRIPS; if we can make these structures identical , the
same executive program wil be usable for both. Unirormity in program structure is also
important. for the plan generalization ideas (to be discussed in Chapter Eight).
PUSH 2 PICLO* , OBLO* , NAvrO , ROLIUMP , PUSHI Check if object being pushed slips off
I PUSH3 I PLANOBMOVE*
ROLIUMP ROLLBACK
I ROLL * , ROLLI Basic roll routine that expects a terminal bump
The second ezcerpt describes the fLAs as the, were implemented:
A. Introduction
The Int.ermediate- Level Actions (ILAs) are the action routines associated with the STRIPS
operators (sce Chapter Seven). Here we distinguish " action routines " from " opcrators
on the following basis: operators are used for planning, and the corresponding action
routines are invoked to actually move the robot. The ILAs are written in a language we
call I\1arkov because of its resemblance to Markov algorithms. There is a large body of
auxiliary LISP functions that accompanies the ILAs , but we wil confine the present
discussion t.o a brief description of the Markov language and a brief exposition of the
currcnt ILAs and the intraroom navigation algorithm.
some other line in the table. This last item (which we have been callng the " go-to ) can
optionally specify that execution of the table could cease , causing the callng routine
execution to resume in the conventional subroutine fashion. The characteristic execution
pattern is a sequent.ial scan through the table s rows , testing the predicates one by one
until a row is found whose predicate is true. Then the scan terminates and the actions (if
any) in that row are performed , and the go-to is followed; it wil either indicate
completion of the execution of the table , or it wil name a line in the table at which the
scan is to recommence. When the Markov table is first entered , the scan begins with the
first line in the table. Execution may be terminated in three ways: it can be completed
explicitly. by reaching a special go-to; the sequential scan can get to the bottom of the
tabJe wit.hout having found a line with a true predicate; and finally, an act.ion can be
fruitless , which wil cause a loop suppressor to terminate execution of the table. In all
three cases , there is only one form of return rrom a Markov table , and the calling routine
(or Table) is expected to test for the desired results. (This seemed much simpler than
trying to make the individual action routines guess what its caller had in mind.
The actions called Cor in an ILA may be LLAs , other ILAs , or arbitrary programs (usually
coded in LISP). Since the Markov interpreter is itself a LISP program , an ILA can call
itself recursively.
The H go-to " part oC a Markov table line is interpreted arter completion or the action part.
In its simplest case , the " go-to " consists of the label of a line at which to continue the
search for a true predicate. If several lines have the given label , one of the lines is
arbitrarily chosen; if no lines have the given label , one of the lines is arbitrarily chosen; if
no lines have it or ir it is NIL, execution is terminated. (NIL is our conventional explicit
return. ) The other case involves " loop suppression " and wil be discussed below.
tarkov table is generally a sequence or actions that would transform an initial state
into a final " goal" state via a linear sequence of intermediate states. Whether an action is
applicable t.o a particular state can usually be tested by a relatively simple predicate- the
one heading the t.able line with the action. Since actions in the real world frequently Cail
to achieve their desired results , the Markov interpreter determines which action to execute
by t. (,st.ing t.he state predicates one by one , starting from the goal predicate (on the top
line) and working backward (i.e. , down the table) until a true predicate is found. Markov
operators typically rollow the execution of any component action by starting again with
t.he goal predicate. In its simplest form , each line of a Markov table would contain one of
the state predicates and the operator to be applied to that state; its " go-to " would specify
the first. line , which contained the goal predicate and an explicit return. Fallng off the
end of a Markov table thus corresponds either to a drastic failure of one of the component
act.ions or to an inappropriate application of the Markov operator. Of course , persistent
Cailure of a component action to achieve its desired effect , i.e. , to produce a state
satisfying a predicate higher in the table , would cause indefinite looping in such a Markov
t.ahle. To circumvent this possibilty without requiring specific consideration in each
'farkov t. able, we introduced " loop suppression " into the Markov interpreter. Whenever
the predicate of a line is found to be true, a counter is incremented and checked against a
limit before the line s action is executed; if the counter becomes greater than the limit
then interpretation of the table is terminated without execution oC the action. Thus , if
the limit for a line is three (this is the current default value) then the action(s) on that
line wil be executed a maximum of three times; if the line s predicate is found true a
fourth time , the table will return to the operator that invoked it. OC course , one can
specify a limit for a table line rather than accepting the default value. There is an
alternative form for the " go-to " just for this purpose: rather than being just a label , it
can be a two-element list. In this case , the first element is the label , and the second
elcment is the loop-suppression limit for that line; it is evaluated only once , at the time of
the first loop-suppression check for that line.
Table 5 ilustrates the Markov language by presenting the actual code Cor the lowest- level
ILA that pushes an object. Here , line 10 does some initialization; the action (I.e. , the
(SETQ XYT ARG ... )) is always perCormed because its predicate T is always true. Then
s predicate checks whether the pushing operation is finished by means of its
line 20'
(NEARENOUGH 08 XYTARG TOL) predicate; if this is the case , then no actions (i.e.
NIL) are performed , and control jumps to the label CLEANUP ror some post- processing
before exit. . Line 25' s predicate similarly determines whether the object' s position is
known closcly cnough to continue the pushing operation. (This may not be the case either
initially or as t.he result of the object dropping off the pushbar during a push. ) Line 30
causes t.he table to exit (via CLEANUP) ir the object is past its target. Line 40'
predicat.e is t.rue if t.he robot has just pushed the object into a wall, and finally, line 50'
predicat.e is t.rue if the robot has proper contact with the object. Line 10 and the lines
starting with the label CLEANUP are representative of a more usual programming
language , with t.he normal execution being sequential. Lines 20 through 50 , however , have
the characteristic execution pattern of the ILAs: a loop testing ror the main goal and
various su bgoals and error conditions and recycling after any action is performed. This
particular ILA is designed to be especially simple because it is intended to be embedded in
several more layers of ILA before STRIPS becomes concerned with their robustness. Even
STRIPS-visible ILAs are called by PLANEX (see Chapter 8) from its execution tables , so
it is perfectly acceptable for this lowest-level pushing operator to rail as readily as it does.
c. The Actions
The following are brier descriptions of the present ILAs. The control relations among the
ILAs and between them and the rest of the system are shown in Figue
ILAs that affect the state of the world have responsibilty for making corresponding
changes to Shakey s axiom model of the current world. Such changes are mentioned
below wherever relevant; " $" wil be used to denote unspecified or changing values in the
model.
GOTHRUDR(DOOR FROMRM TORM) moves the robot from room FROMRN
to room TORM via door DOOR. It assumes only that the robot is in FROMRM; it uses
NA VTO to get to the door and BUMBLETHRU to go through it.
BLOCK(DX RX BX) pushes box BX within room RX to a position blocking door DX.
This routine directly replaces the axiom (UNBLOCKED DX RX) by (BLOCKED OX RX
BX) in the model.
PUSIll(DIST OB TOL) is the lowest- level push; as such, it maintains OD' s position
and deletes t. he (NEXTO OB $) and (NEXTTO $ OB) axioms from the model. It pushes
OB forward by DIST feet (within TOL feet); it assumes that the rront horizontal
cat.w hisker is on when it is entered , and it exits under any of the following conditions:
normal navigation. The last argument TOL is optional and is defaulted to 1 foot if not
supplied.
ROLL2(DIST TOL) is the lowest- level free- floor roll; as such it deletes the (NEXTTO
ROBOT $) axiom from the model. It moves the robot forward by DIST feet (within TOL
feet); if it engages a front catwhisker it asserts the (JUSTBUMPED ROBOT T) axiom and
backs away in an attempt to free the catwhisker. TOL is an optional parameter defaulted
to 1 foot if not supplied; DIST may be negative.
BUMLETHRU(FROMRM DOOR TORM) moves the robot from room
FROt\1RM to room TORM through door DOOR. It assumes t.hat the robot is init.ially in
FROMRM and in front of door. It heads for the corresponding position in TOR 1 and
uses the catwhiskers (if necessary) to help it negotiate the door. It updates the (INROOrvl
ROBOT $) and (NEXTTO ROBOT $) axioms in the model , and it is the most basic door-
negotiating routine in the system. It uses the vision routine CLEARP A TH before entering
an unknown room.
PUSH(OBJECT GOAL TOL) is the highest-level ILA for pushing a box. Its three
arguments are the name of aD object , the goal coordinates to be pushed to , aDd the
allowable tolerance. The tolerance argument may be omitted , in which case its value
defaults to 2. 0 feet.
The only precondition for PUSH is that Shakey and the OBJECT are in the same room.
The routine calls FINDPATH (described below) to plan a path to GOAL from the current
object location. PUSH wil fail if any of the rollowing conditions are true:
(1) OBJECT is not in a pushable location.
(3) No path can be found from the curent position of the robot to the
pushplace " of OBJECT , I.e. , Shakey cannot get behind OBJECT.
(2) The robot rolls forward and makes contact with the object with a front
catwhisker , by using ROLLBUMP.
(3) PUSHl is called , which turns on the overrides and causes the robot to
NAVTO(GOAL TOL) wil maneuver the robot to withinTOL feet or the point
GOAL. Like the PUSH ILA , it uses FINDPATH to plan the journey to GOAL. NA VTO
wil fail if no path is found; ir a path exists, it uses POINT AND GOTO 1 for each leg of
the journey.
POINT(THETA TOL) attempts to turn the robot to within TOL degrees of bearing
THETA. If necessary, the vision routine PICTHETA (Chapter Six) wil be used to
determine the bearing of the robot. A catwhisker engaged during the turn wil cause the
robot to turD back to its original bearing and then attempt to locate the object with
PICBUMPED (Chapter Six).
GOTOl(GOAL TOL) moves the robot forward in a straight line t.o wit.hin TOL feet
of GOAL. It wil use ROLL2 to actually move the robot , or it wil use vision under the
following conditions:
(1) If the robot' s location is uncertain ( TOL), it will update its position
using PICLOC.
ROLLBUM(DIST TOL OBJECT) moves the robot forward DIST feet to engage a
front catwhisker on the object OBJECT. It updates the (NEXTTO ROBOT $)
predicate(s) in the model. If an object is not encountered within TOL Ceet of DIST
ROLLBUMP fails.
(DEFPROP PUSHI
((10. T ((S XYTARG (XYTARG (OBPOS OB) (MLVFIND (QUOTE (THETA ROBOT $))) DIST))) 20.
( 20. (NEARENOUGH OB XYTARG TOL) NIL CLEANUP)
(25 . (NOT (NEARENOUGH OB (OBPOS OD) TOL)) NIL CI)
(30. (GREATERP (ABS (ANGLED IF (DEARINGIO XYTARG (OBPOS OD)) (MLVFIND (QUOTE (THETA ROOOT $))))) 90
NIL
CLEANUP)
(40. (MEM (QUOTE HC) (WHISKERS))
((SET DOSETOS NIL) (SETPUSHODPOS OD (PLUS RADFRONT 0. 5)))
CLEANUP)
( 50. (QUOTE FH) (WH ISKERS) )
((OVRID 1. ) (ROLL
(DIFFERENCE (DISTANCE XYTARG (ODPOS (QUOTE ROBO)))
(PLUS RADFRONT (MLVFIND (LIST (QUOTE RADIUS) OB (QUOTE $))))))
( OVR ID 0.
(SET DOSETPOS NIL)
(SETPUSHOBPOS OD RADFRONT)
(MLDELEE (LIST (QUOTE NEXTTO) OD (QUOTE $)))
(MLDELETE (LIST (QUOTE NEXTTO) (QUOTE $) OD)))
20.
(CLEANUP DOSETPOS ((SETPUSHODPOS OB (PLUS RADFRONT 0. 5))) Cl)
(CI (FCWON) ROLLBACK) (ROLL -1. )) C2)
(C2 T ((MLDELETE (QUOTE (NEXTTO ROBO $)))) R))
(*: MARKOVTADLE TABLE))
(DEFPROP PUSHI ((TOL I. ) XYTARG (RADFRONT 1. 5) (DOSETPOS T)) (*: MARKOVTADLE LOCALS))
MACROPS AND PLANEX
UNBLOCK QQ
tt- t--- GOTHRUDR
ROLLBUMP BUMBLETHRU
t----
LOW- LEVEL ACTIVITIES
TA-8973-9
Cunct.ion returns which the robot can navigate from ROB to G. If JOURN is
a path along
PUSH, t.he returned value is a path by which the robot can move a box at ROB to point
G. In this case global variables PUSHOBNAME (name or the box) and OBRAD (radius
the box) are set , so that in computing a pushing path the box radius and the ability of the
robot to get behind the box are taken into account.
The returned value rrom FINDP A TH is a list of subgoal points to be arrived at in order:
((X l y 1 )(X Y 2 ) ... (X l Y n- )G). If a direct- line path exists from ROB to G , the value or
FIND PATH is just (G); if no path exists , the value is NIL.
The pathfinding algorithm is a breadth- first search of the tree or predecessors to G. At
each node of the tree , FINDP A TH tests for a direct- line path between ROB and the
currcnt node , say PN. If it exists , the path from PN to G is returned. Otherwise , the
tree is grown one level deeper from PN by computing predecessors to that point. If no
predecessors exist , the path Crom PN to G is removed from the tree , thus reducing the
search space.
The predecessors to node PN are defined as the intersections of the tangent lines from ON
and ROB around the first obstructing object in the straightline path connecting them.
Thus , each point has at most two predecessors. Figure 8 ilustrates one possible
configurat.ion that would generate the tree in Figure 9.
for P0 in Figure 8), a shorter path (P5 P4) is computed using the tangents that generated
PO. If eit.her of t.hese points is unacceptable under the criterion just described , the entire
search in that. direction is abandoned, and the next node (in this case P3) is considered. A
predecessor that is acceptable under this criterion is added to the tree if a straightline
The searching in FINDPATH terminates , then, when either a path has been found or
w hen the search tree is reduced to NIL. Thus , the path that is chosen (assuming at least
one exists) is the first one found , that is , the one with the smallest number or legs in the
journey. This criterion was chosen over a minimum- distance criterion to reduce the
amount of subsequent thinking and execution time for the robot.
TA- 8973-
Vision Routines
A. In trod uction
The current robot. executive program never calls for a general visual scene analysis.
Instead, undcr appropriate circumstances various or the intermediate- level actions (ILAs)
call specific vision routines to answer certain specific questions. These specialized vision
programs perform three basic tasks: locating and orienting the robot , detecting the
presence of objects , and locating objects.
A sum m ary of the six vision routines currently used by the ILAs is given below in Section
C. PICLOC is described in Appendix B , and CLEARPATH is described briefly later.
l\'fost of the other routines make use of LOBLOC , which uses vision to locate accurately
an object whose position is only roughly known.
The following section describes the operation of this routine in some detail.
B. Object Location
Given t.he approximate floor location of an object , LOBLOC takes a television picture of
the object , analyzes the picture to find the exact coordinates , and enters this information
in t.he robot s world model. This specialized task can be done more rapidly and with less
chance for error by a special program than by performing a complete scene analysis and
then ext.racting the desired answer from the resulting description. However, certain
precondit.ions must be satisfied for LOBLOC to function properly. These are as Collows:
(1) The approximate location must be sufficiently accurate and the object
must be sufficiently small and unoccluded that at least two, and
preferably three , lower corners of the object are in view.
(2) The object and the robot must be in the same room.
(3) The location or the robot with respect to the walls must be known to
within approximately one foot.
The first action that LOBLOC performs is to pan and tilt the television camera so that
the nominal noor position image is in the center or the picture. The resulting picture is
taken at 60-line resolution to speed subsequent region analysis operations. However
before region analysis is begun, the program accesses the model to compute the image of
the wall- rloor boundary. Everything in the picture above this boundary is erased , thereby
eliminating baseboards , door jambs, and other possible sources of confusion.
The resulting picture is then subjected to region analysis. That is , it is partitioned into
elementary regions, and these regions are merged using the phagocyte and weakness
heuristics (16). The following regions are automatically deleted from the resulting region
list:
( 1) The region above the wall- noor boundary.
The next. major step is to identiry the noor region. This is done by scoring each region.
The features or properties that enter into this score are the area A, the ratio R of
perimet.er-squared to area, the average brightness B, and the lowest coordinate Z of th
ext.ernal contour. Letting A max be the largest area , R max the largest ratio , B max the
highest brightness , and Z
min the smallest coordinate , we compute the scoring function by
2 + 2 + 2 +
Z - Z
min
max Rmax Bmax
D2 = .2 1-
The next major step is to inspect the n neighbors of the noor to find the ones that are
most likely to be the faces of the object in question. Special tests are made to treat the
simple cases where n happens to be 0 , 1 , or 2. In general , for each region neighboring the
floor we compute its area A and a quantity X which is a simple measure of the horizontal
displacement. of the region from the center of the picture. These features are combined in
a scoring funct.ion:
2 +
D2 = 1-
max min
aDd the region for which D2 is minimum is declared to be one face of the object. The
same criterion is used to select the other visible face from the neighbors or both the floor
and t.he first. face.
The major problem remaining is to identiry the vertices where the corners of the object
meet. t.he floor. This is done by processing the common boundary between the face regions
and the floor region. After simple straight-line connections are made between endpoints
of any gaps , this common boundary consists of a chain of points along the lower edge of
the object. The lowest point on this chain is taken to be the central vertex , and the
corners on either side are found by the method or iterative end- point fits (17). Once these
t.hree image points are determined , the support hypothesis is used to locate the
corresponding points on the rloor. The resulting coordinates can then be entered in the
model under the name of a new object if the status of the room is unknown , or under the
name of t.he nearest object if the status is known.
The following is a summary or the intermediate- level routines related to Shakey s visual
system:
OBLOC (OB) uses the model information about the location of object OB and the
routine LOBLOC to update (AT OB $* ) and (OAT OB $*
PICBUMED (X Y) is called when a bump occurs at (X Y). If Shakcy is in a. room or
known status , PICBUMPED calls PICLOC; otherwise it calls PICDETECTOB (X Y).
PICLOC uses the landmark routine (Appendix B) to update (AT ROBOT $* ), (OAT
ROBOT $* ), (THETA ROBOT $), and (DTHET A ROBOT $).
Additional material about Shake, s vision. s,stem was reported in. 110/.
The special- purpose vision programs basically perform only three functions: orienting and
locating the robot , detecting the presence of objects , and locating objects. We shall
consider each of these functions in turn.
\Vhen the environment of the robot is represented accurately and completely in the
model , the chief role of vision is to provide feedback to update the robot' s position and
orientation. Angular orientation information is often needed in advance of a rela.tively
long trip down a corridor , where a small angular error might be significant. The simplest
way to obtain orientation feedback is to find the floor/wall boundary in the picture
project it onto the floor, and compare this result with the known wall location in the
model; any observed angular discrepancy can be used to correct the stored value or the
robot' s orientation.
For maneuvers such as going through a doorway, both the robot' s position and orientation
must be accurately known. This information can be obtained from a picture of a known
Before t.he robot starts a straight- line journey, it may be desirable to check that the path
is indeed clear. A simple way to do this is to find the image of the path in the picture
and examine that trapezoidal-shaped region ror changes in brightness that might indicate
the presence of an obstructing object. This is a simple visual task , and a program
implementing it has been written. In its current form the program uses the Roberts-cross
operator t.o det.ect. brightness changes. When we first ran the program , we were surprised
to discover that at steep camera angles the texture in the tile noor can be detected and
give rise to false alarms. This is an instance or a major shortcoming or special- purpose
vision routines , namely, the failure of simple criteria to cope with the variety of
circumstances that can arise. This particular problem can be solved by requiring a certain
minimum run- length of gradient. However , shadows and renections can stil cause false
alarms , and the only solution to some or these problems is to do more thorough scene
analysis. *
Since the camera , television control unit , and television transmitter draw a large amount of
power from the batteries , they are normally oCC. Approximately ten seconds is required Crom the
time these units are turned on to the time that a picture can be taken.
STRIPS
Description
Because STRIPS is basic to our discussion, let us briefly outline its operation. The
primitive actions available to the robot vehicle are preceded in a set of action routines.
For example, execution of the routine GOTHRU(Dl,Rl,R2) causes the robot vehicle
act,ually to go through the doorway, Dl, from room R l to room R2. The robot system
keeps track of where the robot vehicle is and stores its other knowledge of the world in a
model composed of well-formed formulas (wffs) in the predicate calculus. Thus, the
system knows that there is a doorway Dl between rooms R l and R2 by the presence of
(.he w ff CONNECTSROOMS(D1,R2,R2) in the model.
Tasks are given to the system in the form of predicate calculus wffs. To direct the robot
t.o go t:o room R2, we pose for it the goal wff INROOM(ROBOT,R2). The planning
system, STRIPS, then attempts to find a sequence of primitive actions that would change
the world in such a way that the goal wff is true in the correspondingly changed model.
In order t.0 generate a plan of actions, STRIPS needs to know about the effects of these
actions; that is, STRIPS must have a model of each action. The model actions are called
operators and, just as the actions change the world, the operators transform one model
into another. By applying a sequence or operators to the initial world model , STRIPS can
produce a sequence of models (representing hypothetical worlds) ultimately ending in a
model in which the goal wfr is true. Presumably the , execution of the sequence of actions
corresponding to these operators would change the world to accomplish the task.
Each STRIPS operator must be described in some convenient way. \Ve characterize each
operator in the repertoire by three entities: an delete function and a
add function
\Vithin t.his basic framework STRIPS operates in a GPS- like manner (23). First , it tries to
est.ablish t.hat a goal wrr is satisfied by a model. (STRIPS uses the QA3 resolution- based
t.heorem prover (15) in its attempts to prove goal wrrs. ) If the goal wrf cannot be proved
STRIPS selects a " relevant" operator that is likely to produce a model in which the goal
wfr is " more nearly " satisfied. In order to apply a selected operator , the precondition wff
or that operator must of course be satisfied: This precondition becomes a new subgoal
and the process is repeated. At some point we expect to find that the precondition of a
relevant operator is already satisfied in the current model. When this happens the
operator is applied; the initial model is transformed on the basis of the add and delete
funct.ions or the operator, and the model thus created is treated in effect as a new initial
model of the world.
To complete our re,.iew or STRIPS we must indicate how relevant operators are selected.
An operator is needed only if a subgoal cannot be proved from the wfrs defining a model.
In this case the operators are scanned to find one whose effects would allow the proof
attempt to continue. Specifically, STRIPS searches for an operator whose add Cunction
specifies clauses that would allow the proof to be successfully continued (if not completed).
\Vhen an add function is found whose clauses do in Cact permit an adequate continuation
of the proof , then the associated operator is declared relevant; moreover , the substitutions
used in the proof continuation serve to instantiate at least partially the arguments of the
operator. Typically, more than one relevant operator instance wil be Cound. Thus , the
ent ire STRIPS planning process takes the form oC a tree search so that the consequences
of considering different relevant operators can be explored. In summary, the " inner loop
(2) Choose as a relevant operator one whose add function specifies clauses
that allow the incomplete proof of Step 1 to be continued.
(4) If the subgoal is the main goal, terminate. Otherwise , create a new
model by applying the operator whose precondition is the subgoal just
established. Go to Step
The final out.put of STRIPS , then , is a list of instantiated operators whose corresponding
act.ions will achieve the goal.
An Example
Room Rl Room R2
Door
OOXl
ROBOT
Door I
I D2
Room R3
Initial Model
Goal wrr
\Ve assume for this example that models can be transformed by two operators GOTHRU
and PUSHTHRU, having the descriptions given below. Each description specifies an
operator 8chema indexed by schema variables. We wil call schema variables parameter8
and denote them by strings beginning with lower-case letters. A particular member of an
. operator schema is obtained by instantiating all the parameters in its description to
operator arguments (which may be parameters) can assume unique values. (In all of the
following we denote constants by strings beginning with capital letters and quantified
yariables by x , y, or z):
Precondition wrr
INROOM(ROBOT
Add List
INROOM(ROBOT , r2)
Precondition wrt
Delete List
INROOM(ROBOT ,
INROOM(B,
Add List
INROOM(ROBOT r2)
INROOM(b r2).
hen STRIPS is given the problem it first attempts to prove the goal G O from the initial
ll0del M o. This proof cannot be completed; however , were the model to contain other
clauses , such as INROOM(BOX1 Rl), the proof attempt could continue. STRIPS
det( rmines that the operator PUSHTHRU can provide the desired clause; in particular
he partial instance PUSHTHRU(BOXl RI) provides the wff INROOM(BOXl Rl).
: INROOM(BOXl rl)
A INROOM(ROBOT rl)
A CONNECTS(d, rl, Rl).
2: INROOM(ROBOT, r1)
A CO NNEqTS( d r 1 , R2).
: INROOM(ROBOT, R2)
CONNECTS(D1 R2)
CONNECTS(D2 R3)
BOX(BOXI)
INROOM(BOXl R2)
Now STRIPS attempts to prove the subgoal a from the new model M I. The prooC
successful with the instantiations rl = R2, d Dl. These substitutions yield the
operator instance PUSHTHRU(BOXl Rl), which applied to M l yields
r-1
2: INROOM(ROBOT RI)
CONNECTS(OI , RI , R2)
CONNECTS(01 R3)
BOX(BOXl)
INROOM(BOXI , Rl)
Next , STRIPS attempts to prove the original goal , G O' from M 2. This attempt
su(' cessful and the final operator sequence is
A. Introduction
The basic problem-solving system used by Shakey is STRIPS, a system that makes use of
a combination of heuristic search and formal deductive techniques. However, STRIPS in
its original form is limited to constructing a plan for solving a specific problem. In this
section we describe new:
(1) Procedures for constructing "generalized" plans that are applicable to a
large family of problems (in addition to the specific problem that
motivated the planning process).
(2) Methods for storing, selecting, and monitoring the use of generalized
plans while a task is actually being carried out.
The recently developed methods for storing and using generalized plans allow us:
(1) To store a generalized plan as a sequence of, say, n parameterized
operators.
As a rough ilustration of the use of these capabilities , suppose that we already have a
gcncralized plan for closing a door and turning orf a light. We are now given the task of
just turning off some particular light. The methods to be described wil extract from the
original plan the appropriate subsequence of operators needed to turn off the light.
Suppose now that the s bsequence of operators, or subplan for turning off the light also
has the effect of leaving the robot pointing in a specified direction. If this effect is a
legitimate side-effect- that is , if the successful execution of the plan does not require the
robot to be pointing in a specified direction-then the methods described wil identiry this
fact and the final robot orientation wil not be monitored during plan execution. Hence
the plan execution mechanism will not reject as " unsuccessful" an executi n that has
The processes for storing a generalized plan begin with the creation by STRIPS of a
generalized plan, or macro operator-that is, a sequence of n operators whose arguments
are parameters. During the creation of this plan, STRIPS performed proofs
demonstrating that each operator was in fact applicable at the time it was used. We
assume throughout this section the availabilty or both the STRIPS plan and certain
information about the structure of the proofs performed by STRIPS to generate the plan.
We also assume the availabilty of descriptions or each operator used in the plan.
provable from a model ir the operator is to be applied to that model; an add- list
speciCying clauses added to the model; and a delete function (represented as a list of
literals), which maps a set or clauses into a subset of itself that remains true aCtcr the
place in the top cell the add- list A i of operator OP i. Going down the ith column , we place
The late John h1unson of the SRI Artificial Intelligence Center orz ginally suggested thz
tabular format.
- IA
- IA
- (A
D4
TA- 8973-
in consecut.ive cells the portion or A i that survives the application of subsequent operators.
This is indicat.ed by the delete function D i' i = 2, 3, 4 , that maps an add- list into the
subset of it.self remaining true aCter the application of OP i. (The delete function D t of
OP 1 is applied to the model in which MACROP is applied , and not to any of the add-
lists. ) Thus , cell (2, 1) contains D ), which is the portion oC
t stil true aCter OP 2 is
applied. Cell (3 1) contains D )) =D ), which is the subset. of A l that
sun'ives the application of both OP 2 and OP
\Ve can now interpret the content or the ith row of the table , excluding the first column.
Since each cell in the ith row (excluding the first) contains statements added by one or the
first i operators and not deleted by any of the first i operators , we see that the union or
the cells in the ith row (excluding the first cell) specified the add- list obtained by applying
in sequence OP l' ... i. We denote by A i the add- list achieved by the first i
operators applied in sequence. The union of the cells in the bottom row or a triangle table
specified the add- list of the complete macro operator.
Let us now consider the first column of the triangle table , which we have so far ignored.
Loosely, the statements in row oC column zero are involved with the precondition
formula PC i+t of OP i+l. To be more specific , cell (i O) contains clauses needed to prove
i+l but not contained in A I, ... i. We wil call the set of clauses (axioms) used to prove
a formula the 8upport of that formula. The clauses in cell (i O) are thererore the portion
1. General Approach
In the preceding section, we described the construction of triangle tables for storing
generalized plans. Now let us consider how a generalized plan wil be used by STRIPS
during a subsequent planning process.
The first thing to emphasize is that the ith row of a triangle table (excluding its first cell)
represents the add- list A l,..., i; an n-row table presents STRIPS with n alternative add-
lists , anyone of which can be used to reduce a difference encountered by STRIPS during
it.s normal planning process. STRIPS selects a particular add- list in the usual fashion by
testing the relevance or that add- list with respect to the difference currently being
considered. Suppose for a moment that STRIPS selects the ith add- list. A i' i
preconditions for the tail. Concept, ually, then , we can think of a single triangle table as
achieving the add- list. STRIPS must then be provided with a complete
description- precondition wCf , add- list , and delete function-of the extracted operator so
7' F
6' F
Suppose now that STRIPS selects A l ,... 6 as the desired add- list and, in particular , selects
clause 16 and clause 25 as the particular members of the add- list that are relevant to
reducing the difference of immediate interest. These clauses have been marked on the
t.able with a dot. The operator extraction algorithm proceeds by examining the table
determine what effects of individual operators are not needed to produce clauses 16 and
25. First , OP 7 is obviously not needed; we can therefore remove all circle marks from row
, since those marks indicate the support of PC We now inspect the columns , beginning
with column 6 and going from right to left , to find the first column with no marks of
either kind. Column 4 is the first such column. The absence of marked clauses in column
4 means that the clauses added by OP 4 are not needed to reduce t.he difference and are
not required to prove the precondition of any subsequent operator; hence we delete OP
from the plan and unmark all clauses in row 3. Continuing our right- to-Ieft scan of the
columns , we note that column 3 contains no marked clauses. (Recall that we have already
unmarked clause 18. We therefore delete OP 3 from the plan and unmark all clauses in
row 2. Continuing the scan, we note that column 1 cont ins no marked entries (we have
already unmarked clause 11), and therefore delete OP 1 and the marked entries in row o.
11,
0). 15.
19.
19.
21.
21,
TA- 8973-
TA-8973-
OP 5,
OP 2
OP 5
We have thus reduced the seven-step generalized plan we started with to a compact three-
step plan that specifically produces an add- list containing the relevant clauses.
function of t.he new operator is a little more complicat d. First , notice in Figure 11 that
clauses 14, 15 , and 16 are added by OP 2. Clause 14 is evidently deleted by OP 3 since it
docs not appcar in cell (3. 2). The extracted plan, however, does not include OP 3' and we
cannot tell whether clause 14 would survive the application of OP S or OP 6 in the
extracted plan- hence the question mark in Figure 12. Furthermore , cell (3, 1) may
contain more clauses than shown. This example ilustrates the necessity of computing a
new add- list and delete function for the extracted operator.
The computation of a new add- list and delete function for a macro operator is based on
the add- lists and delete functions of the component operators. Suppose the macro
operator of Figure 12 is applied to some state S i (in which we assume that clauses 3, 7 , 8,
and 9 are true). Since STRIPS does deletions before additions, we can write the resulting
state S
f as:
f=D i) + A ) + A ) + A
where we have used u mean set union. Now it is not difficult to show that delete
" to
functions distribute over set union, that is, to show for any set A and B and any delete
function D that
Since this has the form S r = D(S ) + A , we see that the delete function of the macro
operator is the composed (unction
)+D )+A
It is interesting to note that this add- list is precisely the last row or the triangle table
constructed as described in the previous section , the plan OP 2. OP S' OP 6. In general , we
can say that the add- list or a macro operator is given by the last row of its triangle table
representation , and that its delete function is given by the composition of the component
delcte functions.
3. Refinements
push operator. Suppose now that STRIPS selects row 2 as an add- list. By instantiating
081 and OB2 to the same object name , and instantiating PI and P2 to distinct locations
we evidently have a plan for achieving a state in which the same object is simultaneously
at two different places! The source of this embarrassment lies in the delete mechanism
used by STRIPS , which we now examine in some detail.
PUSH (081 . PH
TA- 8973-
The delete function of an arbitrary STRIPS operator is specified by a delete- list consisting
or a set of literals. If the operator is applied to a state S , then STRIPS deletes rrom S
every clause containing a literal unifying (without regard to sign) with any member or the
delete-list. If a potential unification involves parameters , as it orten does , then the
unification can be made only ir it does not contradict any existing bindings or the
parameters to constants. To continue our example , suppose the second push operator is
AT(OB1, PI)
AT(OB2, P3).
The delete- list of the second push operator, we assume, contains the single literal
AT(OB2, $), where U $" unified with anYthing. If there were no existing bindings of
parameters to constants, then both clauses in S would be deleted. From figue 13 , to the
contrary, we see that AT(OBI , PI) was not deleted; hence , it must have been the case
that OBI and OB2 represented distinct objects in the unparameterized problem ror which
the plan was originally created. If in a subsequent use this plan we set OBI
attempt to
OB2 , then we are violating the constraint responsible ror the occurrence or AT(OBI, PI)
in the final state. Accordingly, we replace the entry in cell (2 1) of Figure 13 by the new
entry:
(OAI
:1 082) :) AT(OB1 Pl)
By t.hi means we indicat.e t.hat row 2, and cell (2, 1) in particular , produces the litcral
AT( on 1, PI) only under the condition that OBI and OB2 are not instantiated to the
same const.ant.
The prcyious cxample ilustrates how a literal can be allowed to survive the application of
a dclet. (' funct.ion only under some condition of the bindings of its argumcnts.
introduced this notion in the context of maintaining the validity of a triangle table , but it
is more broadly applicable within the general framework of STRIPS. Although it is an
enlargement on our main theme or storing and using generalized plans , let us briefly
consider how the notion of conditional 8urvival of a literal can be exploited.
During the planning process , STRIPS frequently permits a delete function to delete true
clauses from a state description. To overcome this tendency toward excessive deletions
we make use of the notion of conditional survival as defined by the following algorithm.
Let L(PI) be a literal in a parameterized state description , and suppose that the deletion
of the clause containing this literal depends on binding parameter PI to another
parameter P2. Then:
We should note that the inclusion in a table of such clauses as , say, PI :F P2 :) L(Pl)
leads to certain complications. Suppose , in a subsequent problem, that STRIPS uses such
a clause in the proof of some precondition. Often , the proof wil produce the unit clause
PI = P2. In this case , we consider the proof completed by assuming PI 7' P2 (providing
the assumpt.ion contradicts no existing bindings). However , we must record this
sullption by placing PI 7' P2 in column 0 of the table being constructed; it is , after all
now a hypot.hesis of t. lle theorem. Moreover , all subsequent. proors in the new plan must
not violat.e this hypothesis. As a bookkeeping procedure, we can conjoin the assumption
(viz. , PI 7' P2) to cach new precondition that STRIPS attempts to prove; this has the
efrect of prevcnting violations of our assumption.
Consider t.he situation shown in Figures 14(a) and (b), where we have shown a macro
operator 10P whose ith operator is itself the macro operator OP i. As always, cell (i, i) of
IOP cont.ains the complete add- list of OP i' while the marked entries of Row (i - I)
const.itut.e t.he support of the proor of the preconditions or OP i. During the planning
process , suppose STRIPS selects from one or the rows of MOP certain clauses it would like
to add to the current state or the world. Suppose further that some , but not all , or the
clauses in cell (i, i) of Figure 14(a) are marked. We can therefore mark in Figure 14(b)
those ' clauses in A i that are needed, and exercise the operator extraction algorithm on
table OP i' As
we saw earlier, this wil at times result in the deletion or some or the
clauses from PC i' Suppose , then, that some of the clauses of PC i are in fact deleted by
the operator extraction algorithm. This raises the possibilty or deleting some of the
clauses in the support or PC i since they now need to support only a weaker theorem.
the support of PC i can be weakened- that is , if some of the clauses in row (i - I) can
unma.rked- than in general we may be able to delete more steps from 10P and/or obtain
weaker , more easily established, preconditions for MOP.
implies a theorem T ... nT m' which C s can be deleted from the prcmises if a selected
subset. or the T s are deleted from the theorem? Fortunately, it is possible to solve this
problem by appropriately labeling literals during the refutation proof of the theorem.
will not elaborate here on the details of this bookkeeping procedure. In terms of the
example of Figures 14(a) and (b) the important point is that the bookkeeping need be
done only once , namely, when PC i is shown to be a consequence of its support.
i - 1
SUPPORT
OF PC.
(al
--PC.
(bl
TA-8973-
. From /11
J, page 69,
The ability to relax preconditions leads to an obvious refinement or the operator
extraction algorithm described earlier. Previously, we unmarked clauses only w hen a
component operator was deleted from a macro operator, in which case the entire support
of the precondition or that operator was unmarked. Now we can also unmark a subset of
the support of a component operator still retained in the macro operator. Finally, we
remark that although Figure 14 shows only two levels of tables , the procedure for relaxing
preconditions can be implemented recursively; hence; nested tables to arbitrary depth can
be rcadily processed.
only those cffccts of operators , and only those aspects of the world, relevant to the
problem solut.ion. Additionally, the algorithm embodies a modest replanning capacity in
the form of an ability to reinstantiate parameters or operators.
The plan execut.ion algorithm rests on the observation that a triangle table contains
complet.e information about the internal structure or the plan it represents. More
specifica.lly, a triangle table specifies exactly what each operator accomplishes in terms of
providing support ror the preconditions of subsequent operators or the goal statement.
Equivalently, a triangle table also specifies the conditions that must obtain in order for a
component operator to be applicable. * The plan execution algorithm to be described uses
this information in a straight- forward manner.
Important information about the internal structure or a plan is embodied in the kernels
a triangle table. The ith kernel of a triangle table for an n-step plan is the largest
rectangular subarray containing cells (n, O) and cell (i- i-I). In Figure 10 , by way of an
example , we have outlined the second kernel or MACROP. The importance of the ith
kernel stems from the fact that it contains the support or the preconditions for the tail of
the plan- t.hat. is, the the operator sequence OP i ,.. OP n ' This should be clear , since row
j of the ith kernel contains that portion or the support or PC j+l that must already be
true when OP i is executed. To continue with the example or Figure 10 , cells (2 0) and
Strictly speaking, a triangle table specifies the support Cor the particular proof of a precondition
found by STRIPS. In general , there are many possible supports Cor a given precondition , but we
would not expect a plan execution algorithm to be cognizant oC them.
(2. l) cont.ain those axioms in PC 3 that are
presumably true before OP 2 is cxecuted.
any of the e axioms are Calse , then even perfect execution of OP 2 wil not result in a state
in which OP 3 is applicable. Roughly speaking, then , a reasonable plan execution
a.lgorithm should find the highest indexed kernel with all true entries and execute the
corresponding component operator.
Such an algorithm would reflect the heuristic that it is best to exccute the " legal"
operat.or least rcmoved from the goal.
An import.ant refinement of the rough execution algorithm outlincd above can be obtained
hy not.i ng that the ith kernel contains in general not only those clauses supporting proofs
of precondit.ions but many additional clauses as well. These additional clauses are
irrelevant to the problem at hand, and we would certainly want our execution algorithm
t.o ignore them. The identification of relevant clauses is easily accomplished using the
operator extraction algorithm previously described. The final row or the table
representing a plan to be executed contains the support of the goal formula , and the
support.ing clauses are marked as before. The operator extraction algorithm then
produces a new operator for achieving those clauses. (We dispense with the computation
of precondit.ion formula , add- list , and delete function. ) Typically, but not necessarily, all
the component operators wil be retained. More importantly, only some or the table
entries wil be marked, and these are the only portions of the kernels that need be
monitorr.d.
The t.ask of finding an efficient algorithm for finding the " highest true kernel" that is
t.he highest. indexed kernel with all marked clauses true- is of some interest in itself. Our
algorit.hm for Cinding the highest true kernel involves a cell- by-cell scan of the triangle
table. Each cell examined is evaluated as either True (i.e. , all the marked clauses are true
in t.he current model) or False. The interest of the algorithm stems from the order in
which cells are examined. Let us call a kernel " potentially true " at some stage in the scaD
if all evaluated cells of the kernel are true. The scan algorithm can then be succinctly
stated as:
The reader call verify that , roughly speaking, this table-scanning rule results in a left- to-
right , bottom- to-top scan of the table. However , the table is never scanned to the right
any cell already evaluated as false. An equivalent statement or the algorithm is " Among
all unevaluated cells , evaluate the cell common to the largest number of potentially true
kernels. Break ties arbitrarily. " We conjecture that this scanning algorithm is optimal in
the sense that it evaluates, on the average , rewer cells than any other scan guaranteed
always t.o find the highest true kernel. A proof of this conjecture has not been found.
The plan execution algorithm described above is embodied in a computer program named
PLANEX (24). When PLANEX is called to execute a table , it executes the comp nent
operator associated with the highest true kernel. Typically, but not necessarily, this will
be OP 1. \Vhen OP to find the highest
1 completes its action, PLANEX rescans the table
current.ly true kernel. Typically, but not necessarily, this wil be Kernel 2 , in which case
OP 2 is executed, and so forth, until the goal kernel is reached. We emphasize, however
that after each operator execution PLANEX may either call an earlier operator (perhaps
to rectify an error) or skip to a later operator (perhaps a stroke of luck rendered some
operat.ors unnecessary). Furthermore, many arguments or predicates in the table are
parameters; PLANEX is free to instantiate these parameters in order to find a true
inst.ance of the predicate. Thus , PLANEX is really searching for the highest- indexed
kernel an inst.ance of which is satisfied by the current state of the world. This abilty of
PLANEX t. o instantiate-and reinstantiate-arguments provides a modest replanning
capacity. If the turn of world events produces a situation in which a solution has the
same form as a tail of the original plan, PLANEX will find it. If no tail of the plan is
applicable, t.hen no kernel will be true, and PLANEX returns control to STRIPS to
replan. *
Experiments W i t h Shakey
Experiments
In tohissection we shall describe some experiments now being planned that will illustrate
several features of toherobot system, which we call, informally, "Shakey." Specifically
these will show how Shakey generates a plan to perform a task, and how it then uses part
of this plan later as a component of a plan for performing another task. Saving plans for
later use might be regarded as a form of learning. The experiments also show how the
various levels in Shakey's hierarchical control structure function to enable Shakey to
recover gracefully from several kinds of unexpected failures.
We must. first describe the environment in which Shakey operates and Shakey's model of
this environment. In Figure 15, we show a floor plan of some rooms and doorways in
which our experiments with Shakey will be conducted. We can place several large boxes
and wedge-shaped objects in these rooms; three boxes are depicted in room RCLK of
Figure [Is]. Initially Shakey is in room RUNI. The doorways all have mnemonic names
inclicatJing the rooms they connect; e.g., DMYSPDP connects RMYS and RPDP. Shakey 's
model of this environment is represented by a set of formulas or axioms in the first-order
predicate calculus. The rooms, doorways, boxes, walls, and other entities occur as terms
in formulas that describe important properties of the environment. The axiom model
representing the environment for the planned experiments is listed in Table 6. The room
names are mnemomics for properties of the physical environment:
RHAL Hallway
RRIL = Rila s office
RCLK = Room with the clock on the wall
RRAM = Room with ramp to hallway
RPDT = PDP l0 room
RUNI Unimate room
RMYS Mystery room, i.e. , room with unknown contents.
The meanings of most of the predicate symbols are obvious. AT gives coordinate location
informat.ion referenced to the coordinate system of Figue 15. DAT gives information
about the probable error in this coordinate information. The RADIUS predicate is used
to give rough size information. THETA and DTHET A give information about Shakey
heading and probable heading error, respectively. The UNBLOCKED predicate tells
which doorways are unblocked (i.e. , free of obstructing objects such as boxes). The
predicate ROOMSTATUS is used to tell whether the contents or a room are known or
unknown. The model listed in Table 6 indicates that the contents or all rooms are
assumed to be known except for RMYS. By this we mean that Shakey that he
knows wil
never encounter any new objects except perhaps in RMYS. This knowledge is used to
guide certain picture-taking behavior, as we shall see later. The LANDMARKS predicate
gives the locations or various landmarks such as corners and doorjambs that Shakey can
take pictures of to update its position. The axioms at the end of the model in Table 6
(beginning with the predicate WHISKERS) give information about the status of various
lower- level motor and sensing activities , e. , the status of the catwhisker switches and
camera control settings. (These were explained in Chapter Four.
Altoget.her t.here are 170 axioms in the model initially, which makes this model quite large
in comparison with those used by any previous automatic problem-solving systems.
In order to perform the tasks described below , Shakey has available a repertoire or ILAs.
(The operation or these ILAs is described in Chapter Five. ) The problem-solving system,
STRIPS , must be aware of the properties of the available ILAs. Therefore each ILA is
represented for STRIPS by an operator with specified preconditions and effects. These
operators and their descriptions are given in Table 7 using the add and delete lists
employed by STRIPS.
RHAL
RRIL
BOXO
RRAM DRAMCLK
BOX2
RCLK
BOX1
DMYSCLK
RMYS
DMYSPDP
RPDP
SHAKEY
RUNI
JC feet
T A- 8973-6
TABLE: 6 , continued
JOINSFACES(DCLKRIL FNCLK FSRIL)
JOINSFACES(DMYSRAM FNMS FSRA)
JOINSFACES(DMYSCLK FEMYS FWCLK)
JOINSFACES( DMYSPDP FEMYS FWPDP)
JOINSFACES(DPDPCLK FNPDP FSCL)
JOINSFACES(DUNIJIYS FNUNI FSMYS)
DORLS(DRAL 11. 200000 18. 200000)
DORLOS(DRACL 26. 799998 32.
DORLS(DCLXRIL 21. 700000 24. 799998)
DORLS(DMYSRA 10. 0 15. 200000)
DORLS (DMYSCLX 16. 200000 20. 799998)
DORL(DMYSRDP 9. 7000000 14. 799998)
DORLS(DPDPC 25. 799998 30. 799998)
DORLS(DUNIMYS 10. 799998 16.
ROCA'IS( RHL KNOWN)
RoaTA'nS( RRIL KNOWN)
ROSTA'IS( KR KNOWN)
ROTA'IS( RCL KNOWN)
ROA'IS( AMS UNOWN)
ROA'IS( RPDP KNOWN)
ROA'IS( RUN I KNOWN)
LANDRX(RHL (COORD (4. 11. 20000 35. 5 0.
LANDMARJ( RRIL
(CORD (4. 21. 700000 35. 400000 -1.
(3. 24. 799998 35. 400000 -1.
(2. 18 79999 49. 0 4.
(2. 36. 80000 49 0 3.
(2. 36. 80000 35. 400000 2.
(2. 18. 799998 35 400000 1.
LANDRJ( RMN
(CORDS (4. 18. 200000 26 799998 0.
(3. 18. 200000 32. 0 0.
(1. 11. 200000 35. 5 2.
(4. 10. 0 24. 0 -1.)
(3. 15. 200000 24. 0 -1.
(2. 0. 0 35. 5 4.
(2. 18. 200000 24. 0 2.
(2. 0. 0 24. 0 1.
JOINSRO(DMYSRA KR JUS)
JOINSRO(DMYSCLI RC DlS)
JOINSROMS(DMYPDP RPP RJS)
JOINSROMS(DPDPC RPDP RCL)
JOINSROS( DUNIJlS RUI RJ)
LANDRK (
(COORD (4. 24. 79999835. 0 -1.
(3. 21. 700000 35. 0 -1.
(4. 25. 799998 15. 200000 -1.
(3. 30. 799998 15. 200000 -1.
(4. 18. 599997 20. 799998 0.
(3. 18. 599997 16 200000 0.
(4. 18. 599997 32. 0 0.
(3. 18. 599997 26. 799998 0.
(2. 18. 599997 35. 0 4.
(2. 36. 800000 35. 0 3.
(1. 36. 800000 15. 200000 2.
(2. 18. 599997 15. 200000 1.
LANDRK ( RMS
(CORD (4. 18 .200000 9. 7000000 4.
18. 200000 14. 799998 1.
(4. 18. 200000 16. 200000 0.
(3. 18. 200000 20. 799998 0.
(4. 15 .20000 3 . 599997 -1.
(3. 10. 0 23. 599997 -1.
TABLE 6 , continued
(4. 10. 7999987. 6000000 -1.
(3. 16. 000000 7 6000000 -1.
(2. 023. 5999974
(2. 18. 200000 23. 599997 3.
(2. 18. 200000 7 6000000 2.
(2. 0 7. bOOOOOO 1.
LARKS ( RPDP
(CORDS (4. 30. 799998 14. 799998 -1.
(3. 25. 799998 14 799998 -1.
(4. 18. 200000 14. 799998 -1.
(3. 18. 600000 9. 7000000 0.
(2. 36. 800000 14. 799998 3.
( 2. 36. 800000 8. 2000000 2.
LAMARKS ( Rt
( CooRDS (4. 16. 000000 7. 1999999 -1.
(3. 10. 799998 7. 1999999 -1.
(2. 16. 0 7. 1999999 3.
(2. 17. 200000 2. 1999998 2.
(2. 0.0 2. 1999998 1.
WHIsKERS(ROBO ,
IRIs(ROBOT , 1)
OVERIDE(ROBO ,
RAGE(ROBO , 30)
TVDE(ROBO ,
FOS ( ROBOT , 30)
(ROBO ,
TILT(ROBO ,
DPAN(ROBO 12)
MILT(ROBO
DIRIs(ROBO ,
DFOS(ROBO ,
PICTRESAKE( ROBO , 0)
JUSTBUED( ROBO , NIL)
TABLE 6 , concluded
\Ve shall now describe the planned experiments that wil use the modt'l or Table 6 and the
operators shown in Table 7. The description wil be in terms of the expccted rcsults or
these experiment.s.
a. Task 1
Start.ing wit. h t.he configuration or Figure 15 (represented by the model in Table 6),
Shakey wil perform two tasks. Each of these tasks is stated in English and entered into
the system via teletype. The first task is stated as U USE BOX 2 TO BLOCK DOOR
DPDPCLK FROM ROOM RCLK. " This statement is converted by the English language
ENGROB (26) to a goal expressed by a well- formed rormula (wff) of the first-order
syst. em
predicate calculus: BLOCKED(DPDPCLK, RCLK, BOX2). The STRIPS problem-solving
system is t. hen called to compose a sequence of operators whose execution wil create a
world model in which this goal wrr is true. In terms of the operators in Table 7 we can
Rather than generating this specific solution, STRIPS generates a generalized plan that
involves going rrom an arbitrary initial room through an intermediate room , and into a
third room and then blocking a doorway in the third room. The rooms , doorways , and
blocking object in this generalized plan are represented by parameter.lJ. The generalized
plan is thus a subroutine whose arguments are the parameters. These arguments are
bound to specific constants only when the plan is executed. The value or the generalized
subroutine is that it can be stored away (or " learned" ) and then used again in other
sit.uations perhaps as part or a plan (or a more complex task.
BLOCK ( DX . R.'\ , BX)
Precondi t ions:
Delete List:
AT (ROBOT , $2)
AT(ax, S2)
UNBLOCKED( DX , RX)
NEXTTO( ROBOT , S 1 )
NEXTTO( 8.'\ , S 1 )
NEXTTO(Sl 8.,\)
Add List:
Precond i ti ons :
Delete List:
AT(ROBO , $2)
BLKE( DX , RX , ax)
AT(BX S2)
NEXTTO( ROBOT , S 1)
NEXO( ax , S 1)
NEXTTO( S 1 , BX)
Add List:
-UNLOCKE ( DX , RX)
NEXO(ROBO , ax)
Unblocks door DX by pushing object ax away from its place in room RX directly in
front of door DX.
Precondi t ions:
Delete List:
AT(ROBO , 52)
NEX'MO( ROBO , $1)
INROO( ROBO , $1 )
002(
Precondi tions :
Delete List:
AT( ROBO ,52)
NEXO( ROBO , 51)
Add List:
*NEO( ROBO ,
Takes Shakey from any point in a room to a location next to any object or doorway, X
1n the same room. (Shakey w111 nav1gate around obstacles that might be 1n the way of
a direct pnth.
PUSH(OB, , Y)
Precond1 t1ons:
Delete List:
AT(ROBO , $1, $2)
NETlO( ROBO , $1)
AT(OB S2)
(OB Sl)
NEXT1( S 1 , OB)
Add L1st:
*AT(OB , Y)
NE( ROBO , OB)
Pushes object OB from one po1nt in a room to a coordinate location (X , Y) in the same room.
(Shakey must initially be in the same room as OB and (X Y), but w111 push OB around obstacles
that might be in the way of a direct path.
NAVTO(X , Y)
Precondi tions:
TABLE 7 , continued
Delete List:
AT( ROBO , $1 , $2)
NEXTTO( ROBOT , $1)
Add List:
.AT( ROBO , , Y)
Takes Shakey from any point in a room to the coordinate location (X Y) in the same room.
(Shakey wi 11 navigate around obstacles that might be in the way of a direct path.
POINT( DIRECTION)
Precond it ions:
none
Delete List:
THETA(ROBOT , $1)
Add List:
*THETA(ROBO , DIRECTION)
PUSH3(OB
Precondi tions:
Delete List:
AT(ROBOT , $2)
NE( ROBO , $1)
AT(OB $2)
NE0( OS , $1 )
NETI: $1 , OS)
Add List:
.NEXTO( OS ,
NEX( ROBO , OB)
Pushes object OB from one point in a room to a location next to any object or doorway X
in the same room. (Shakey will push OS around obstacles that might be in the way of a
direct path.
Note: An asterisk(.) in front of an add-list clause indicates that this clause is one of
the " primary effects " of the operator.
TABLE 7 , concluded**
After the creation of the triangle table representation or MACROP1, STRIPS prepares a
version of it that wil solve the given task , namely, to " Use BOX2 to block door DPDCLK
from room RCLK. " version is obtained from MACROPI by replacing those
This
parameters standing for constants in the goal wff by those constants. That is , in this
case , we replace PARI by DPDPCLK, P AR2 by RCLK , and P AR3 by BOX2 throughout
the MACROPI triangle table. This instantiated table is then given to PLANEX for
execution.
PLANEX is a program that supervises the execution of those ILAs corresponding to the
For a discussion of the operation of PLANEX , see the last part of
operat.ors in the plan.
executed. In the case of the task we are considering the first ILA to be executed is
GOT02(DUNIMYS), which causes the robot to go to the door named DUNIMYS.
Note: For all triangletables , an asterisk (* ) berore a clause indicates that this clause was used to
prove the preconditions or the operator named at the right or the row in which the clause
appears.
The PLANEX algorithm then determines that the next ILA to be executed should be
GOTHRUDR(DUNIMYS, RUNI RMYS). Execution of this ILA begins by calling the vision
rout.ine CLEARP A TH, which takes a TV picture through the doorway to determine
whether the path in RMYS is clear (since the contents or RMYS are unknown). The path
is in fact clear, so Shakey proceeds through the doorway.
Next PLANEX calls tor the execution or GOT02(DMYSCLK). Since the contents or
1YS are unknown to Shakey, GOTO calls CLEARP A TH again. To illustrate how
Shakey can deal with unroreseen difficulties, we now place a box directly in Shakey s path
in rront or the door DMYSCLK. As Figue 15 and Table 6 show , Shakey does not know
or the existence or this box. CLEARP A TH determines that the path is blocked and notes
the approximate location ot
it might
the blocking object. Since Shakey expects that
encounter unknown objects in room RMYS, GOTO next calls a vision routine called
OBLOC. This routine calculates the size and exact location or the object , gives it a name,
BOX3, and adds this inrormation to the model. (it also assumes, perhaps optimistically,
that t, he new box is pushable. OBLOC also notes that BOX3 is blocking door
DMYSCLK, so it adds the wrr BLOCKED(DMYSCLK, RMYS, BOX3) to the model. Since
the conditions for continuing the execution or GOTO(DMYSCLK) are no longer satisfied
control returns to PLANEX. Our interest in this experiment is to show how Shakey can
gracerully recover rrom such an unexpected tailure or ita plan.
Here we see one or the advantages or constructing parameterized plans. To perform the
original task, we first constructed a parameterized plan having an instance that solves the
problem. Later in the task execution we find that after an unexpected difficulty, another
instance or the same parameterized plan can be used to achieve the goal. We expect that
this method of error recovery will be quite valuable in robot problems. (Ir PLANEX could
).
;:
;:
TRIANGLE TABLE FOR MACROPUPAR3 PARI , PAR2 PAR4 PARS PAR7 , PAR6)
Aft.er finding this new instance of MACROP1 , PLANEX calls for the cxecution of the first
operator GOT02(DMYSPDP). Shakey thus moves to door D 1YSPDP. PLANEX next
calls for going through the door , and the process continues until finally Shakey enters
room RCLK. Then PLANEX calls for t.he execution of
BLOCK(DPDPCLK RCLK BOX2). Running this ILA calls ror going to BOX2 and
pushing it around BOXl and then to door DPDPCLK (a " hyo-Ieg " push). The local
planning necded to accomplish this push operation is done ent.irely within t.he PUSH ILA
called by BLOCK. With this operation complete, Shakey has accomplished the first task
in spite of the unforeseen difficulty. We also note that fACROPI has been filcd away
and can be used as an operator in future problem solving.
b. Task 2
The state of t.hings in Shakey s world is now as shown in Figure 16. V"le now test
Shakey s ability to learn by giving it a task that can be solved by using part
tvtACROPl. The statement of the task given to the system, in English, is " UNBLOCK
DOOR DYMSCLK FROM ROOM RMYS. " That is, we want Shakcy to move away the
object (BOX3) that it discovered to be blocking DMYSCLK.
STRIPS now attempts to find a sequence make the wfr true , but
of operators that wil
now it has 1ACROPI available in its operator repertoire (in addition to the operators
corresponding to ILAs). STRIPS first decides that it should try to apply the operator
UNBLOCK(DMYSCLK RMYS, BOX3). To do so , Shakey must be in room RMYS , so
STRIPS looks for operators that wil achieve INROOM(ROBOT RMYS).
STRIPS determines that an instance of the GOTHRUDR operator wil work , but so also
wil subsequences or MACROP!. One subsequence consists of the first two operators in
NIACROPI and the other consists of the first four. (For a discussion of how STRIPS
makes selections of MACROP subsequences , see Chapter Eight. ) Since an instance of
sequence of the first four operators in MACROPI is both applicable in Shakey s present
RHAL
RRIL
BOXO
RRAM DRAMCLK
RCLK
BOX1
BOX3 DMYSCLK
tHAKEY
BOX2
RMYS
DMYSPOP
RPOP
RUNI
feet
TA- 8973-
GOT02(PAR6)
GOTHRUDR(P AR6 P AR7 , P AR5)
GOT02(P AR4)
GOTHRUDR(P AR5 P AR2)
The rom pJete generalized plan for the second ask is:
This generalized plan is given the name MACROP2 and is saved for possible later use.
After creat.ing the general version of MACROP2, STRIPS prepares a version of it for
PLANEX by instantiating it with those constants appearing in the task descript.ion.
Namely, DMYSCLK is substituted for PARI and RMYS for PAR2. It t.hen givcs this
partially instantiated version to PLANEX to be executed. PLANEX finds that the
following instantiation of the plan wil achieve the goal:
Next , PLANEX calls ror execution of MACROPI ' . This execution is accomplished by
PLANEX itself. The abilty to handle U nested" triangle tables is one of the features or
our system. PLANEX discovers that the first ILA to be executed in MACROPl' is
GOTO(DRAMCLK). In a similar manner, PLANEX ultimately executes the entire string
of ILAs in MACROPI' and then the UNBLOCK ILA to accomplish the second task.
--::
"'
."
::
"' "-
TR IANGLE TABLE FOR MACROP2( PAR3 , PAR 1 , PAR6 , PAR7 , PAR5 , PAR4 , PAR2)
CH (I
* INROOM( ROBO , PAR7)
* UNBLOCKED ( PAR6 , PAR7)
*JO I NSROOMS ( PAR6 , PAR7 , PAR5)
:: Z *UNBLOC ED( PAR6 , PAR5) MACROPl' (PAR2 PAR4 PAR5 PAR7 PAR6)
*JOINSROOMS( PAR4 , PAR5 , PAR2)
* UNBLOCKED ( PAR4 , PAR5)
-I *UN BLOCKED ( PAR4 , PAR2)
see whether it is stil less than some specific tolerable error. Whenever the error estimate
eXfccds thc tolerance , a visual program called LANDMARK is called. LANDMARK takes
it picture of some nearby feature (such as a joorjamb), calculates from this picture the
robot, s actuallucation , and enters this updated location into the model. It also resets the
DA T predicate to the error estimate appropriate after having just taken a picture.
Several fcatures of the system are ilustrated in these experiments. Most important or
these arc the ability to learn generalized plans and the abilty to recover from various
types of failures. The system or ILAs is designed to be robust in the sense that each ILA
does w hat it can locally to correct any errors. When the appropriate recovery procedures
are beyond a specific fLA' s knowledge and abilties , there are several higher levels where
recovcry can occur , namely, at higher level ILAs , in PLANEX , or in STR.IPS.
By Vladimir Li slcovslcy
The following note from IS) by Vladimir Lieslcovslcy described the robot
vehicle in some detail:
At. t.he beginning of the project , only very sketchy information was available about specific
requirements for the vehicle. The general requirements given were that the vehicle should
be able to maneu ver on a linoleum-tiled laboratory noor , move on ramps that had up to a
ten percent. slope , be not wider than a doorway, weigh not more than approximately 200
Ibs , move under radio-transmitted digital-computer control , and be energized by an on-
board power source. It was further specified that the vehicle should bc able to turn
around it.s own vertical centerline in either direction and be able to move both forward
and backward.
105
between the wheels accommodates the main drive motors , and for a low center of gravity,
the batteries.
A 4- in vertical distance above the platform was reserved ror proposed manipulator arms.
A standard 19- in electronic rack , supported at three points , was located above this
reserved space. A video camera and range finder combination was mounted atop the
rack.
One of the first decisions to be made was the selection of the form of energy to be used
for drive purposes. Among those considered were hydraulic , pneumatic , and eventually,
electric drives. Since electrical power had to be made available for the electronics , electric
drive was ultimately selected. The choice between secondary batteries and fuel cells was
dict.ated mainly by price and delivery figures in favor or the batteries. Two 12-volt
batteries in series were used to establish the operational , nominal voltage at 24 Vdc. The
choice between drive motors was reduced to either a straight de motor, an inverter and ac
motor combination, or stepping motors. Complexity and control considerations of the
digital commands ruled out the inverter/ac combination. Direct current motors , although
elect.rically noisy, were attractive due to their high power density and good torque
characteristics. Manufacturer s quotes were uniformly forbidding: six months for delivery
and a price in excess of several thousand dollars for each motor. The units would have
had st.andard clutches , brakes , and position readout capability for feedback information.
St.epping motors , although they surfer from low power density, are excellently suited for
digital control , and they were immediately available and were low in price (not more than
about $200. 00 each). Therefore , the decision was made to use stepping motors exclusively
for prime movers. Not all or the motors selected were rated at 24 Vdc, but they wcre
easily converted by using dropping resistors.
In order not to lose count or the steps in the drive train between the motor and the drive
wheel , the speed reduction between the motor and the wheels had to be one without
slippage , that is , positive. The reduction was necessary to increase available torque from
the motors and to reduce the amount or translation per incremental step of the motor to
106
1/32nd of an inch measured at the periphery of the wheel. For evcry control pulse , the
stepping motor executes a rapid change in its angular position. Depending on the incrtia
of the driven load and the damping of the drive trains , oscilations may develop. These
oscillations were reduced by limiting the incremental stcpsize , i. , tbc gencrated
amplitude. A coggcd belt , or timing belt arrangement was sclected for the drive train.
This was to give the necessary positive drive , while also introducing damping. As
turned out , the belt proved to be a secondary source of oscilations , since bending
vibrat.ions were generated in the belt when the stepping motor was operated. Increasing
t he belt. tension reduced the oscilations to an acceptable level.
pat.tern pit.ch , were mounted on the motor housing. This arrangement proyided for 200
positions for every revolution , which is also the step-pattern of the motor. \Ve used the
simple schematic , described in (27) to complete the feedback loop. In operat. ion , no step
command can be given until after the information from the position feed- back disk
indicat.es that the previous step has been completed. Simply, the motor cannot miss a
step.
3. Wheels
The rubber wheels presented another problem: due to their finite elasticity, transient
motions generated either by the vehicle itself, or by its environment , resulted in disturbing
oscilations of the whole vehicle in pitch and roll modes with a time constant of about 2
seconds. This amount of settling time was judged to be unacceptable because no picture
taking with the TV camera could be initiated during that time. Since friction on the
driving wheels had to be maintained , but elasticity minimized , a properly-stiffened rubber
107
driving rim on a metal wheel proved to be an acceptable solution. Since the castor
w heels , however , could remain relatively compliant , but required reduced friction on the
floor, they were capped with a metallc rim and gave good results.
mounted on the shaft of a 200-step/revolution stepping motor. The yoke was designed for
these functions only. The shaft or the pan motor is coaxially mounted with the vertical
centerline of the vehicle; that is , if equal and opposite commands are given to the driven
w heels , the location of the pan motor shaft does not change. The TV camera is located in
such a fashion that the photosensitive surface of its vidicon tube is exactly at the
intersection of the vertical pan axis and the tilt axis. Turning the vehiclc about its
vertical axis , panning the camera , and tilting it , does not affect the location of the vidicon
surface , only its direction.
108
It also seemed expedient to attach the range finder directly to the TV camera. In this
way, the distance of an object , viewed by the optical centerline or the TV camera , from
the range- finder can be measured.
A scparate arrangement or the TV camera and the range finder was similarly logical:
dist.ance-mapping or the surroundings could be accomplished while the TV camera could
digest " and recognize a particular scene. However, the kinematic complexity of this
arrangement seemed prohibitive when compared to the possible advantages.
Stepping motors were mounted onto the TV camera lens housing for computer controlled
adjustment of the focus and the iris. Since these motors operate in the open loop mode
step count may be lost. Therefore, separate limit switches for both focus and iris
functions and at both ends or their range are provided. Whenever the limit switches are
actuat.ed , the counters are reset accordingly. This is also the scheme utilzed in the pan
and tilt modes.
5. Tactile Sensors
Tactile sensors are mounted at the front and back and on both sides or the vehicle to
provide protection against damage to the vehicle and to its surroundings and to provide
t.ouch information. These sensors were selected from commercially available
microswitches , and are actuated by a nexible coil spring approximately 6 inches long.
Piano wire whiskers or extensions may be added to the end of the coil springs to provide
longcr reach. The guiding principle has been to sense the presence of a solid object within
the braking distance or the vehicle when it is traveling at top speed. Addit.ional
appropriately placed sensors protect the TV camera against collsion in the translational
and the rotational modes. The actuation or any sensor wil inhibit the corresponding
action , while override is also made available.
As further protection against collsions , heavy rubber bumperstrips are mounted on all
protruding edges of the vehicle. If the performance capacity of the main drive motors
permits , these bumpers wil be used to move objects around the environmental room.
109
Appendix B
Some Current Techniques For Scene Ana.lltBis
Appendix B
Some Current Techniques For Scene Ana.lysis
Shakey.
Richard o. Duda
I. Introduction
The purpose of the visual system is to provide the automaton with important information
about its environment , information about the location and identity of walls , doorways
By adding new information to the model , the visual
and various objects of interest.
system gives the automaton a more complete and accurate representation of its world.
The role of vision is not independent of the state of the model. If the automaton has
entered a previously unexplored area , the visual scene must be analyzed to add
information about the new part of the environment to the model. In this situation , the
model can provide so little assistance that it is ofter not referenced at all. On the other
hand , if t.he automaton is in a thoroughly known area , the role of vision changes to one of
providing visual feedback to correct small errors and verify that nothing unexpected has
happened. In this situation, the model plays a much more important role in assisting and
actually guiding the analysis.
Until recently our attention has been directed primarily at the general scene-analysis
problem. Every picture was viewed as a totally new scene exposing a completely unknown
area. More recently we have addressed the problem of using a complete , prespecified map
of the rIoor area to update the automaton s position and help in tasks such as going
through a doorway. Another use of this kind of visual feedback would be the monitoring
of objects being pushed.
113
In trying to solve these problems , we have tended to take one or the other of two extreme
approaches. Either we tried to develop general methods that can cope with any possible
situation in the automaton s world, or we tried to exploit rather special facts that allow
an efficient special- purpose solution. The first approach involves the more interesting
problems in artificial intellgence, but it provides more capabilities than are needed in
many situations , and provides them at the cost of relatively long computation times. The
second approach provides fast and effective solutions when certain (usually implicit)
preconditions are satisfied , though it can fail badly if these conditions are not met.
Eventually, of course , some combination of these two approaches wil be needed , since the
automaton actually operates in a partially known world, rather than one that is
completely unknown or completely known. However , we have decided to concentrate on
these two extreme situations before addressing the intermediate case. The remainder of
this note describes the current status of our work in these areas.
representing walls , fioors , faces of objects , etc. The basic approach has been described in
detail elsewhere (16), and only a brief summary wil be given here. The procedure begins
by partitioning the digitized image into elementary regions of constant brightness. This
usually produces many small, irregularly shaped regions that are fragments of more
meaningful regions. Two heuristics are used to merge these smaller regions together.
Both of these heuristics operate on the basis of fairly local information , the difference in
brightness along the common boundary between two neighboring regions. The heuristics
are not infallble; they can merge regions that should have been kept distinct , and they
can fail to merge regions that should been merged. However , they reduce the picture to a
small number or large regions corresponding to major parts or the picture , together with a
larger number of very small regions that can usually be ignored.
The effect of applying these heuristics is best described through the use or examples.
Figure B-1 shows television monitor views of three typical corridor scenes. Figure B-2
Our earlier work in scene analysis is described in (7). Additional inCormation on more recent
work is contained in (8), (16), (29), and (30).
114
shows the results or appiying the merging heuristics to digitizcd vcrsions or these pictures.
The boundaries of the regions in these pictures are directed contours , and can be traced
using the correspondences shown in Table 8-1. Generally spcaking, important rcgions can
be separated from unimportant regions purely on the basis of size. Figure B- , for
example , contains four large , important regions. Three of them arc directly meaningful
(the door , the wall to the right , and the baseboard), and the fourth is the union or two
important regions (the floor and the wall to the left). An inspection of Figure B- 2b shows
similar results. Figure B-2c shows the result of applying the technique to a complicated
scene; while some userul inrormation can be obtained , the resolution available severely
limits the usefulness or the results.
Our only complete scene-analysis program is oriented toward identifying boxes and
wedges , objects with triangular or rectangular faces , in a simple room environment (16).
For this task , we begin by fitting the boundaries of the major regions by straight lines.
Regions are identified as being part of the fioor, walls , baseboards , and faces of objects by
such properties as shape. brightness , and position in the picture. Objects are identified by
grouping neighboring races satisfying some of the simpler criteria used by Guzman (31).
In the process , certain errors caused by incorrect merging are detected and corrected. We
have yet to complete a similar analysis program for the conditions encountered in corridor
scenes. However , we have investigated the problem of obtaining a scene description that
is internally consistent; the next section describes the analysis approach for this problem.
If we assume temporarily that the merging heuristics have succeeded in the sense that all
of the large regions are meaningful areas , then the only basic problem remaining is the
proper identification of each region. Examination of the corridor pictures indicates the
need to be able to identify a number of different region types , including the following:
115
Ca) DOOR
Cb) HALL
TA- 8259.
116
i1: ,iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii,
iinnmmmmiiiiiiiimiii!
:ti: :::!t!:!!i::!i
'::!::;:nY:l::::
I"
ii:
i'1; ;!i :
! It
i iflt lt
r""""""'''''''''''''''''' .".... ...... ......
lt
ff...
It'
:lt lttlilt
---_u..---..---
.?tic
:tk
mum...
lij,
H;i;'.'''!!.'''''''''.....''''''''''''''''m,,;;..;n''i!j!!lilir
I'
I :
1", i'
;: f::H:H::! H:!' !IH:::''': .Hn :HH::::::::::::::::
\i.
:::f ::::::: tiiiiiiF:....:. nnli::::::::::
.---. ............-..------..-_u..
117
CONF IGURA TION CHARACTER CONFIGURATION CHARACTER
'f 4.2158-24
118
( 1 ) Floor (8) Sign
(2) Wall (9) Window
(3) Door (10) Clock
( 4) Door jamb (11) Doorknob
( 5)Object race (12) Thermostat
(6) Baseboard (13) Power outlet
(7) Base board renection (14) Automaton.
Ea.ch of these regions has certain properties which tend to characterize it uniqucly. For
example , the noor region is usually large, bright , and near the bottom of the picture.
Howcver , most regions can be identified with greater confidence if the nature or t.heir
neighbors is considered as well. Thus , the presence of a baseboard or baseboard renection
at the top of a region almost guarantees that the region is the noor; conversely, the
presence or wall area immediately above a region guarantees that it can not be a
baseboard renection. If regions are identified without regard to how that choice affects
the oyer all scene description , the chance for error is increased. Moreover , the resulting
description can be nonsensical.
r.1any, though by no means all , of the relations between types of regions relate to
neighboring regions. Table B-2 indicates those types of regions that can and cannot be
legal neighbors. We can easily add to this further restrictions , such as the fact that the
baseboard must have the wall as a neighbor along its top edge. These are some of the
important known facts about the general nature of the automaton s environment. The
problem is to use facts such as these to aid in the analysis of the scene.
One approach to solving this problem is to use these facts as constraints to eliminate
impossible choices. Suppose that each significantly large region in the picture is
t.ent.atively classified on the basis attributes of that region alone. Suppose further
of the
that a score is computed (or each region that measures the degree to which it resembles
each region type. ** For any selection of names for regions , we can define the score for the
resulting description as the sum of the individual scores. Then, we can analyze the scene
By " sign " we mean a dark vertical bar on the wall used , as ilustrated in Figure B- , to iden tify
an office.
This score might be interpreted as the logarithm of the probabilty that the given region is of
the indicated type.
119
CJ o
II ce
. C C
-t .. o
CJ
au CD II
-t
CD en
"" au
en..
C cau
CD CD a:
. FLOOR
WALL
DOOR
DOOR JAMB
OBJECT FACE
BASE OARD
BASEBOARD
REFLECTION
SIGN
WINDOW
C1.0CK
DOORKNOB
THERMOSTAT
POER OUTLET
AUTOMATON
T A-859-
120
by trying to find highest scoring legal selection of region names. With no loss in
generality and some gain in convenience , we can work with the losses incurred by selccting
other than the highest scoring choice. In terms of losses , we want the legal description
having t.hc smallest overall loss.
This problem is basically a tree-searching problem. The start node of the tree
. corresponds to the first region selected for naming. The branches emanating from that
node correspond to the possible choices of names for that region. A path through the trce
corresponds to a unique labeling of the picture. Thus , if there are N possible region
names and R regions , there are potentially . NR possible paths through the tree. Each path
passes t.hrough R+ 1 nodes from the start node to the terminal node. Every terminal node
has a loss value, which is the sum of the losses incurred for the choices along the path to
that node. A goal node is a terminal node corresponding to a complete, legal scene
description. We seek the goal node with the smallest overall loss.
This is a standard problem in tree searching, and optimum search procedures are known.
Assume that some choices have been made for some of the regions so that we have a
partially expanded tree. Using the Hart- Nilsson- Raphael terminology (32), some of the
t.erminal nodes of this tree are open nodes , candidates for further expansion. Each open
node has an associated loss g, the sum of the losses from the start node to that node.
wc assume that there is no reason to believe that zero-loss choices cannot be made from
that node on, then the optimal search strategy is to expand that open node having the
minimum g.
To expand a node, we must select a region not previously considered and examine the
possible choice for that region, ruling out any choices that are not legal. Different
strat.egies can be used for selecting the next region. It seems advantageous to ask it to be
a neighbor of the regions selected previously, since this maximizes the chance of detecting
illegalities. In general , we wil have several neighbors for candidate successors. Of these
it seems reasonable to select the one having the highest score, under the assumption that
the first choice name ror this region is most likely to be correct.
After a region has been selected , it is necessary to examine the choices one can make ror
its name to see If we limit ourselves to pairwise relations between
which ones are legal.
neighboring regions , we need merely compare each choice with previously made choices on
121
the path to this point and test each for legality. * The node expanded is remo,' ed rrom the
list of open nodes , the resulting new nodes are added , and the process is repeated until the
algorithm selects a goal node for further expansion. This is our final result , a legal scene
description having the minimum loss.
c. Examples
The rollowing examples serve to ilustrate the action of this scene-analysis procedure.
Consider first the simple scene shown in Figure B-3. For simplicity, we assume that there
are only five t.ypes of allowed regions- noor, wall, door, baseboard, and sign. Consider
Region 1. On the basis of its brightness, size, vertical right boundary, and possession or a
hole, it should receiye a high score as wall, and lower scores as noor, door , sign , and
baseboard , Region 2 might , perhaps, score highest as a door , and so on. Thus , the
following table or scores , although purely imaginary, is not unreasonable. Missing entries
correspond to scores too low to be seriously considered.
Base-
Ree:ion Floor Wall Door board Sisrn
*\Vhen an ilegality is found, that choice is deleted. One can argue that few relations are
strong as to be absolutely ilegal , and an alternative approach would be to introduce various
additional losses for the dirferent observed relations.
122
The following table gives equivalent information in terms of the losses associated with
cac II choice.
Base- Max
Reg ion Floor Wall Door board Sign Score
Let. us use our t.ree-searching algorithm to obtain the minimum- loss, legal description of
tbis scene. Initially the successor function is unconstrained by neighbor restrictions , and
selects R('gion 2 merely because it has the highest core. At this point , all of the choices
for Region 2 are legal, and the tree has three open nodes; the numbers shown next to each
node gi,' e the loss accum ulated in reaching that par of the tree.
The search algorithm requires that the open node having the least loss be expanded next
w hie h corresponds to tentatively callng Region 2 a door. The successor function finds
only one neighbor to choose from, Region 1, and considers its alternatives: wall , noor,
and door. None of these choices is a legal neighbor surrounding Region 1 , and hence all
are rejected. Thus , this open node has no successors.
123
T A-8259-
124
Ret.urning to the choices for open nodes, Region 2 is tentatively called a sign. The
successor function again selects Region 1, and this time finds one legal successor , the
wall.. The loss associated with this choice is 0, and the overall loss is 2. The list of open
nodes stil cont.ains two members.
The search algorithm selects the open node with loss 2 , and the successor function has
only Region 3 to select from. All of the choices for Region 3 are all legal with respect to
Note that our successor (unction wil always produce a tree with R+ 1 levels. At any level , the
same region wil always be selected by the successor function. The actual successors , however
wil be limited by the legality requirement.
125
calling Region 2 a sign and Region 1 a wall. The least loss results from callng Region 3 a
door, and the scene analysis is completed.
A somewhat more realistic example involving 10 regions and 14 region types is ilustrated
in Figure B- 4. Table 8-3 gives the hypothetical scores. Based on these scores alone , half
or the regions would be incorrectly identified. Figue 8-5 shows the tree produced by the
search algorit. hm. The development or this tree is too complicat.ed to describe in detail. It
should be noted, however, that considerable backtracking occurred because a low..scoring
t.hird choice was needed for Region 8 , the doorknob. Whether or not this can be
circumvented without causing other problems is not known.
D. Remarks
To date , t.his procedure has only been used on some hypothetical examples. We have
modified a general tree-searching program to adapt it to some special characteristics or
this problem. However , we have not started the important task of writing programs to
measure characteristics of regions and to use these characteristics to produce recognition
scores.
In addition , we have not implemented any legality conditions beyond the simple conditions
given in Table B-2.
126
T A-8259-27
127
REGION
TYPE
FLOOR
WALL
DOOR
DOOR JAMB
OBJECT FACE
BASEBOARD
BASEBOARD
REFLECTION
SIGN
WINDOW
CLOCK
DOORKNOB
THERMOSTAT
POWE R OUTLET
AUTOMA TON
TA- 8259-29
128
'RUNED . SEQUENCE
NODE NUMBER
2-3
REGIONS IN
CONFLICT
129
This approach to scene analysis has several potential advantagcs. It is not nccessary to
identify every region correctly at the outset to obtain a corrcct analysis , provided that thc
syntactic " complete. By providing a limit on thc allowable loss , a
rules are sufficiently
part.ial scenc description can be obtained that may be useful even though incomplct.e.
Perhaps most. important , the operations of merging, reature extraction , classification , and
analysis are clearly separated, allowing fairly independent modification and improvement.
In particular , the general knowledge about the environment can be expressed explicitly as
rules for legal scenes, and if the environment is changed it is possible to confine the
program changes to modifying these rules.
One of the major problems with this approach is the lack or an obvious way to detect
erroneous regions, regions that are fragments of or combinations of meaningful regions.
\Ve are currently working on this problem, since progress toward its solution is needed
before implementation or this system can be begun. Another problem is that it is not
clear how specific inrormation contained in the model caD be used to guide the analysis.
This problem of working in a world that is neither completely known nor completely
unknown is one of the major unsolved problems in visual scene analysis.
m. Landmark Identincation
\Vhen the environment is completely known, the visual system can provide feedback to
updat. e the automaton s position and orientation. The x- y location or the automaton and
it.s orientation can be determined uniquely from a picture or a known point and line
lying in t.he floor. * Such distinguished points and lines serve as landmarks for the
automaton. This section describes our present program that uses concave corners , convex
corners , and doorways as landmarks to update position and orientation.
A flowchart outlining the basic operations or this program is shown in Figure B- 6. The
program begins by selecting a landmark from the model that should be visi ble from the
automaton s present position; if more than one candidate exists , one is selected on the
basis of range and the amount of panning of the camera required. * The camera is then
panned and tilted the amount needed to bring the landmark into the center of the field of
IC no landmark is in view , a suitable message is returned together with a suggested vantage point
from which a landmark can be seen. This is one of several " error " returns that can be obtained
selected.
from the program. The program can also be asked to select a specific landmark , or a landmark
differ nt from the ones previously
130
view , and a picture is taken. The baseboard-tracking routine described previously (8) is
used to find the segments of baseboard in the picture and t.o fit them with long straight
lines.
Exactly wbat happens next depends on the landmark type. For a door, the long line
nearest the center of the picture is selected , and the true image of the landmark is
assumed t.o be the endpoint of the baseboard segment on that line and nearest the center
or the picture. An additional check is made to see that the gap from that point to the
next segment is long enough to be a passageway. A convex corner viewed from an angle
suc h that only one side is visible is treated as if it were a door. Otherwise, the
intersection or long lines nearest the center or the picture is assumed to be the true image
of the landmark, and a check is made to see that the baseboard segments near this point
have the right geometrical configuation. The location of the landmark in the picture
gives the information needed to compute corrections for the automaton s position and
orientation.
The operat.ion of this program is ilustrated in Figue B-7. In this experiment , the
automat.on was approximately 7. 5 reet away from a wall along which there were four
landmarks. both sides of a doorway, a convex corner , and a concave corner. The pictures
in Figure B-7. show how closely the panning and tilting brought the landmarks to the
center of tbe pict.ures. For scenes as clear as these, the program operates very reliably.
Presentl). , we can use this routine to locate the robot with an accuracy of between 5
percent a.nd 10 percent of the range , and to fix its orientation to within 5 degrees. Since
the errors are random , the accuracy can be improved further by sighting a second
landmark. Furt.her increases in accuracy, ir needed, will have to be obtained by
improving t.he t.ilt and pan mechanism for the camera.
131
SELECT MOST
CONVENIENT
LANDMARK
F ROM MODE L
TAKE PICTURE.
TRACK BASEBOARD.
AND F IT WITH
LONG LINES
FIND
Yes FIND LONG
INTERSECTION
LINE NEAREST
OF LONG LINES LANDMARK
NEAREST LANDMARK
FIND TRACK
Concae Convex ON THAT LINE
NEAREST
LANDMARK
UPDATE
POSITION
T A-8259-
132
(8) RIGHT DOOR Cb) LEFT DOOR
Figure 7: LANDMARKS
133
REFERENCES
Interim Report , prepared Cor Rome Air Development Center, GrifCiss Air
Force Base , New York , under contract AF 30(602)-4147 , SRI Project
5953, Art.ificial Intellgence Center, SRI International , Menlo Park,
California (May 1968). NTIS access number 80-841 509.
135
IGI Nilsson , Nils J.
Rcuarch on Intelligent Automata First Interim Report , prepared
for Home Air Development Center , GriCCiss Air Force Base , New York
under contract F30602- 69-C- 0056, (ARPA Order No. 1058 , Amendment 1),
SRI Project 7494 , ArtiCicial Intellgence Center , SRI International
Menlo Park , CaliCornia (February 1969). NTIS access number AD- A140279
and R. A. Yates.
Research and Applications Interim
Artificial Intelligence,
136
1111 Haphael , B. , R. O. Duda, R. E. Fikes , P. E. Hart , N. J.
Nilsson , P. W. Thorndyke and B. M. Wilber.
Rruarrh and Applications Artificial Intelligence Final Report
Munson , .John H.
A L/SP- FOFrRAN- MACRO Interface for the PDP- l0 Computer, Technical
Note 16 , Artificial Intellgence Center , SRI International , Menlo
Park , Calirornia (November 1969).
(14) Green, C.
(Fall , 1970).
137
Duda , R. 0. , and P. E. Hart.
Pattern Recognition and Scene Analysis New York: John Wiley and Sons
(1973).
(20) Sacerdoti , E. D.
A Structure for Plans and Behavior New York: Elsevier (1977).
(211 \Vilkins, D.
Domaina indepcndent Planning: Representation and Plan Generation,
(241 Fikcs , R. E.
Alont'ored Execution of Robot Plans Produced by STRIPS Technical
Note 55 , Artificial Intellgence Center , SRI International , Menlo
Park , California (April 1971). Also appeared in Proceedings IFIP
Congress , Ljubljana, Yugoslavia (August 23-28 , 1971).
138
Coles , L. Stephen.
Talking with a Robot in English Proceedings of the
Intf;rnat ional Joint Conference on Artificial Intelligence
Fredrick8on, T. R.
Duda , R. O.
Some Cu.rrent Techniques for Scene Analysis Technical Note 46
Artificial Intellgence Center , SRI International, Menlo Park , California
(October 1970).
!:H! Guzman , A.
Decomposition of a Visual Scene into Three-Dimensional Bodies
Proceedings F JCC pp. 291- 304 (December 1068).
139