KEMBAR78
Stata Programming Tools | PDF | Variable (Computer Science) | Control Flow
0% found this document useful (0 votes)
36 views9 pages

Stata Programming Tools

Herramientas básicas de programación con Stata
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
36 views9 pages

Stata Programming Tools

Herramientas básicas de programación con Stata
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 9

Stata Programming Tools

This article will introduce you to many Stata programming tools that are not needed by everyone but
are very useful in certain circumstances. The intended audience is Stata veterans who are already
familiar with and comfortable using Stata syntax and fundamental programming tools like macros,
foreach and forvalues.
If you are new to Stata, our Stata for Researchers will teach you basic Stata syntax, and Stata
Programming Essentials will teach you the fundamental programming tools. Unlike this article, they do
not assume any Stata or general programming experience.

Topics discussed in this article include:

Compound Double Quotes


Capture
Variables in Scalar Contexts
Advanced Macros
Branching If
While Loops
Programs

If you need to learn about a specific topic, feel free to skip to it. Some of the examples use material
covered previously (i.e. a program containing a branching if) but the explanations are self-contained.

Compound Double Quotes


The beginning and ending of a string are normally denoted by double quotes ("string"). This causes
problems when the string itself contains double quotes. For example, if you wanted to display the string
Hamlet said "To be, or not to be." you could not use the code:
display "Hamlet said "To be, or not to be.""

The solution is what Stata calls compound double quotes. When Stata sees `" (left single quote
followed by a double quote) it treats what follows as a string until it sees "' (double quote followed by
a right single quote). Thus:

display `"Hamlet said "To be, or not to be.""'

Capture
On occasion you may expect a command to generate an error under certain circumstances but choose
to ignore it. For example, putting log close at the beginning of a do file prevents a previously
opened log from interfering with your do file, but generates an error if no log is open--even though not
having a log open is exactly the situation you want to create.

The capture prefix prevents Stata from halting your do file if the ensuing command generates an
error. The error is "captured" first. Thus:
capture log close

will close any open log but not crash a do file if no log file is open.

The capture prefix should only be used when you fully understand the error the command
sometimes generates and why it does so--and you are very confident that that error can be ignored.

When a command is complete it stores a "return code" in _rc. A return code of zero generally means
the command executed successfully. You can use capture followed by a branching if based on the
return code to have Stata do different things depending on whether an error occurred or not.

Variables in Scalar Contexts


list x you get a list of all the values of
A Stata variable is a vector: it has many values. If you type
x. However, some contexts call for a scalar. For example, the display command displays just one
thing. If a variable is used in a scalar context, the value of the variable for the first observation is used.
Thus if you type:

display x

you'll get just the first value of x, as if you'd typed:

display x[1]

We suggest not taking advantage of this behavior, because it makes for confusing code. If a command
calls for a scalar and you want that scalar to be the first value of x, type x[1] rather than just x.
This behavior can cause real problems if you don't realize a particular context calls for a scalar, as you'll
see in the section on branching if.

Advanced Macros
This section will discuss many tricks you can use to define macros. Type help local for even more.

Storing Lists of Values with levelsof


The levelsof command lists the valid values of a variable and stores them in a local macro. The
syntax is:

levelsof variable, local(macro)

It is frequently used with an if condition to store those values of a variable that appear in a given
subsample.

The following example (taken from Stata Programming Essentials) uses levelsof and a foreach
loop to run a survey-weighted regression separately for each race. This recreates the functionality of
by: even though by: can't be used with svy:
levelsof race, local(races)
foreach race of local races {
display _newline(2) "Race=`race'"
svy, subpop(if race==`race'): reg income age i.education
}

Since levelsof doesn't include missing values in its list, the above code will not run a regression for
observations with a missing value for race. This is different from by:, which treats observations with
a missing value of the by variable as just another group to act on.

Suppose you asked individuals to list the social groups they belong to. The groups are encoded as
numbers and stored in variables group_i where i is an integer. Thus if a person said she belongs to
group five and group three (in that order) her value of group_1 would be 5 and her value of
group_2 would be 3. (If she lists fewer groups than the maximum number allowed, she will have
missing values for some group_i variables.) You then want to create an indicator variable for each
group which has a value of 1 if the person said she is a member of that group and zero otherwise:

foreach groupvar of varlist group_* {


levelsof `groupvar', local(groups)
foreach group of local groups {
capture gen mem_`group'=0
replace mem_`group'=1 if `groupvar'==`group'
}
}

This loops over the group_i variables, figures out which groups were listed in each one, loops over
them, creates an indicator variable for each group, and sets the indicator to 1 if the person listed that
group.

Note that you don't need to know ahead of time how many groups a person can list or the numbers of
the groups that can appear. The gen command has capture in front of it because the indicator for a
given group may have already been created: if someone listed group 5 as his first group and someone
mem_5 will be created when processing group_1 and trying
else listed group 5 as his second group,
to create mem_5 again when processing group_2 gives an error--but you know it's an error you can
ignore.

Expanding Variable Lists with unab


The unab command "unabbreviates" a varlist ("expand" was taken) and puts the results in a macro.
The syntax is:

unab macro: varlist

For example, if you had variables x1, x2 and x3, then:


unab vars: x*

would create a local macro called vars and place in it x1 x2 x3. This can be useful for commands
that require a list of variables but cannot use varlist syntax, like reshape when going from long to
wide.

Lists of Files
You can place a list of files in a macro by sending local the output of a dir command. The syntax
is:

local macroname: dir directory files "pattern they must match"

Here macroname is the name of the macro you want to create, directory is the directory where the
files are located and pattern they must match is something like "*" for all files or "*.dta" for all
Stata data sets. For example, the following would put a list of all Stata data sets in the current working
directory in a macro called datafiles:
local datafiles: dir . files "*.dta"
One complication is that the file names are placed in quotes (so it can handle file names with spaces in
them). Thus to display them you have to use compound double quotes:

di `"`datafiles'"'

However, this doesn't cause problems if you want to loop over the list of files:

foreach file of local datafiles {


use `file', clear
// do something with the file
}

Formatting the Contents of a Macro


You can apply a format to a number before storing it in a macro by sending local the output of a
display command that includes a format. For example, the following command stores the R-squared
of the most recently run regression, e(r2), in a macro called r2, but using the format %5.4f so it
has four digits rather than sixteen.

local r2: display %5.4f e(r2)

See Including Calculated Results In Stata Graphs for an example that uses this tool.

Incrementing and Decrementing Macros


Macros frequently need to be increased by one and less frequently decreased by one. You can do so
with the ++ and -- operators. These go inside the macro quotes either before or after the macro
name. If they are placed before, the macro is incremented (or decremented) and then the result placed
in the command. If they are placed after, the current value of the macro is placed in the command and
then the macro is incremented (or decremented). Try the following:

local x 1
display `++x'
display `x--'
display `x'

You can use the increment or decrement operators in a local command to change a macro without
doing anything else, but be sure to put the operator before the macro's name. For example the
following does not increase x:
local x `x++'

The macro processor replaces the macro`x++' with 1. It then increases x to 2, but then when Stata
proper executes the command it sees local x 1 and sets x back to 1. The following does increase
x:
local x `++x'

Branching If
You're familiar with if conditions at the end of commands, meaning "only carry out this command for
the observations where this condition is true." This is a subsetting if. When if starts a command, it is a
branching if and has a very different meaning: "don't execute the following command or commands at
all unless this condition is true." The syntax is for a single command is:
if condition command

For a block of commands, it's:

if condition {
commands
}

An if block can be followed by an else block, meaning "commands to be executed if the condition is
not true." The else can precede another if, allowing for else if chains of any length:

if condition1 {
commands to execute if condition1 is true
}
else if condition2 {
commands to execute if condition one is false and condition2 is
true
}
else {
commands to execute if both condition1 and condition2 are false
}

Consider trying to demean a list of variables, where the list is contained in a macro called varlist
which was generated elsewhere and could contain string variables:

foreach var of local varlist {


capture confirm numeric variable `var'
if _rc==0 {
sum `var', meanonly
replace `var'=`var'-r(mean)
}
else display as error "`var' is not a numeric variable and cannot
be demeaned."
}

The command confirm numeric variable `var' checks that `var' is a numeric
variable, but when preceded by capture it does not crash the program if the variable is a string.
Instead, if _rc==0 { checks whether the confirm command succeeded (implying `var' is in
fact numeric). If it did, the program proceeds to demean `var'. If not, the program displays an error
message (as error makes the message red) and then proceeds to the next variable.

The condition for a branching if is only evaluated once, so a branching if is a scalar context and the rule
that only the first value of a variable will be used applies. In particular, a branching if cannot be used
for subsetting. Imagine a data set made up of observations from multiple census years where the data
from 1960 is coded in a different way and thus has to be handled differently. What you should not
write is something like:

if year==1960 { //don't do this!


code for handling observations from 1960
}
else {
code for handling observations from other years
}
Since this is a scalar context, the condition year==1960 is only evaluated once, and the only value
of year Stata will look at is that of the first observation. Thus if the first observation happens to be
from 1960, the entire data set will be coded as if it came from 1960. If not, the entire data set will be
coded as if it came from other years. This is not a job for branching if, it is a job for a standard
subsetting if at the end of each command.

The above code might make sense if it were embedded in a loop that processed multiple data sets,
where each data set came from a single year. Then year would be the same for all the observations in
a given data set, and the first observation could stand in for all of them. But in that case we'd suggest
writing something like:

if year[1]==1960 { // if year for the first observation is 1960 this


is a 1960 data set

Conditions for branching if frequently involve macros. If the macros contain text rather than numbers,
the values on both sides of the equals sign need to be placed in quotes:

foreach school in West East Memorial {


if "`school'"=="West" {
commands that should only be carried out for West High School
}
commands that should be carried out for all schools
}

While Loops
foreach and forvalues loops repeat a block of commands a set number of times, but while
loops repeat them until a given condition is no longer true. For example:

local i 1
while `i'<=5 {
display `i++'
}

is equivalent to:

forval i=1/5 {
display `i'
}

Note that i is increased by 1 each time through the while loop--if you left out that step the loop
would never end.

The following code is a more typical use of while:

local xnew 1
local xold 0
local iteration 1
while abs(`xnew'-`xold')>.001 & `iteration'<100 {
local xold `xnew'
local xnew=`xold'-(3-`xold'^3)/(-3*`xold'^2)
display "Iteration: `iteration++', x: `xnew'"
}
The above uses the Newton-Raphson method to solve the equation 3-x^3=0. The algorithm proceeds
until the result from the current iteration differs from that of the last iteration by less than .001, or it
completes 100 iterations. The second condition acts as a failsafe in case the algorithm does not
converge. Note that the initial value of xold is not used, but if it were the same as the initial value of
xnew then the while condition would be false immediately and the loop would never be executed.

Programs
A Stata program is a block of code which can be executed at any time by invoking the program's name.
They are useful when you need to perform a task repeatedly, but not all at once. To begin defining a
program, type:

program define name

where name is replaced by the name of the program to be defined. Subsequent commands are
considered part of the program, until you type

end

Thus a basic "Hello World" program is:

program define hello


display "Hello World"
end

To run this program, type hello.


A program cannot be modified after it is defined; to change it you must first drop the existing version
with program drop and then define it again. Since a do file run in an interactive session can't be
sure what's been defined previously, it's best to capture program drop a program before you
define it:

capture program drop hello


program define hello
display "Hello World Again"
end

Arguments
Programs can be controlled by passing in arguments. An argument can be anything you can put in a
macro: numbers, text, names of variables, etc. You pass arguments into a program by typing them
after it's program name. Thus:

hello Russell Dimond

runs the hello program with two arguments: Russell and Dimond. But arguments only matter
hello will completely ignore them.
if the program does something with them--the current version of

Programs that use arguments should first use the args command to assign them to local macros. The
command:

args fname lname

puts the first argument the program received in the macro fname and the second in the macro
lname. You can then use those macros in subsequent commands:
capture program drop hello
program define hello
args fname lname
display "Hello `fname' `lname'"
end

If you then type:

hello Russell Dimond

the output will be:

Hello Russell Dimond

Of course if you type:

hello Dimond Russell

the output will be:

Hello Dimond Russell

It's up to you to make sure the arguments you pass in match what the program is expecting.

The macro 0 is always defined, and contains a list of all the arguments that were passed into the
program. This is useful for handling lists of unknown length. For example, you could take the code you
wrote earlier for demeaning lists of variables and turn it into a program:

program define demean


foreach var of local 0 {
capture confirm numeric variable `var'
if _rc==0 {
sum `var', meanonly
replace `var'=`var'-r(mean)
}
else display as error "`var' is not a numeric variable and cannot
be demeaned."
}
end

To run this program, you'd type demean and then a list of variables to be demeaned:
demean x y z

syntax command, which makes it relatively easy to write a


You might also want to look at the
program that understands standard Stata syntax, but syntax is beyond the scope of this article.

Returning Values
Your program can return values in the r() or e() vectors, just like official Stata commands. This is
critical if your program is intended for use with bootstrap or simulate. To do so, first declare in
your program define command that the program is either rclass (puts results in the r()
vector) or eclass (puts results in the e() vector):

program define myprogram, rclass

When you have a result to return, use the return command. The general syntax is:
return type name=value

where type can be scalar, local or matrix, value is what you want to return, and name is what
you want it to be called. As a trivial example:

return scalar x=3

r(name) or e(name). Thus if you've just


When the program is complete, you can refer to the result as
run a program containing the above return command, typing:
gen var=r(x)

creates a variable var with the value 3.


For a more realistic example, see the last section of Bootstrapping in Stata.

Last Revised: 12/14/2010

©2012 UW Board of Regents, University of Wisconsin - Madison

You might also like