SPSS Syntax Guide for Large Files
Introduction
  Why use Syntax?
  Pasting Syntax
Reading Data from External Files
Working with Syntax
  Running Syntax
  Saving Syntax
  Other sources of syntax
  Using Saved Syntax
Useful Options
Defining Variable Properties
  Using menus to Paste Syntax
  Modifying syntax
  Syntax Rules
Checking the Data
Corrections with Syntax
Transformations
Split Files
Reuse Syntax for Similar Analyses
Split File Off
Chart Builder vs. Legacy Dialogs
Chart Templates
Date Variables
Pivot Tables
Export Results
Appendix: A Complete Syntax File
Introduction
Most things you do in SPSS Statistics have an associated set of commands (syntax). You can begin
learning syntax by looking at the syntax generated by your menu choices. The generated syntax is visible in
your Output window.
Why use Syntax?
If you need to do the same (or very similar) analysis many times, you can do it more quickly and with
fewer opportunities for errors by using syntax. Once you have written and checked that the syntax is doing
what you want, you can run it repeatedly, in this or future SPSS Statistics sessions, without having to make
any menu choices. You can make minor modifications without having to go through the menus again. The
syntax can include a whole series of procedures.
Syntax is not dependent on your operating system; thus, analyses started on one machine can easily be
re-created and continued on another machine by transferring the syntax and data files.
Finally, some advanced features are not available from the menus. For example, the General Linear Model
(GLM) dialogs can fit factorial and repeated measures models, but for nested models you must use syntax.
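As an illustration of a nested model, here is a minimal sketch in GLM syntax. The variable names SCORE, SCHOOL and TEACHER are hypothetical (they are not in the gss93 data); writing TEACHER(SCHOOL) on the DESIGN subcommand tells GLM that each teacher belongs to only one school.

```spss
* Hypothetical nested design with TEACHER nested within SCHOOL.
GLM score BY school teacher
  /DESIGN=school teacher(school).
```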
Pasting Syntax
The easiest way to begin using syntax is to complete all your choices and options from the menus, and click
Paste. A Syntax window will be opened, showing the syntax generated by the choices you made.
Most SPSS Statistics dialog boxes have a Paste button. Using Paste, you can create a syntax file for your
entire analysis.
The Define Variable Properties dialog can be used to Paste Syntax for defining missing values, variable
labels, value labels, formats, and other variable attributes. Changes made directly in the Variable View
window do not generate syntax, so they should be avoided if you are trying to collect syntax.
Reading Data from External Files
In the gss93.dat text file, the variables are not delimited by blanks (or anything else), and some missing
information has been left blank. Therefore, the data cannot be read as freefield. However, since the
variables are aligned in columns, we can use Fixed Format to read these data.
Choose File/Read Text Data. Go through the Text Import Wizard dialogs, filling them in as follows:
Look In: Folder where you saved the file. For the workshop, select Desktop → SPSSwork
Filename: gss93.dat
Files of type: Text (*.txt, *.dat)
Click Open
Does your text file match a predefined format? No. Click Next
How are your variables arranged? Fixed width
Are variable names included at top of your file? No. Click Next
First case of data begins on which line? 1
How many lines represent a case? 1
How many cases do you want to import? All the cases. Click Next
The vertical lines in data represent breakpoints between variables? Click between the
numbers to separate variables, using the variable location information above. Click Next
Specification for variable selected in data preview? Click a column heading in the data preview
to select it, type the variable name for it in the Variable Name box, using the names listed above.
Repeat for each column. Click Next
Would you like to save the file format? No
Would you like to paste the syntax? Yes. Click Finish
A Syntax window opens, with the SPSS Statistics command equivalent of the menu choices you made.
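The pasted command for a fixed-width file should resemble the sketch below. The file path and the column positions are placeholders, not the gss93 codebook layout; note that GET DATA counts column positions from 0, so a variable in codebook column 1 is written 0-0. The /1 marks the first (and only) record of each case.

```spss
* Sketch of a fixed-width GET DATA command as pasted by the Text Import Wizard.
* The file path and the column positions below are placeholders, not the gss93 codebook.
GET DATA
  /TYPE=TXT
  /FILE='C:\SPSSwork\gss93.dat'
  /FIXCASE=1
  /ARRANGEMENT=FIXED
  /FIRSTCASE=1
  /IMPORTCASE=ALL
  /VARIABLES=
  /1 id 0-3 F4.0
     age 4-5 F2.0
     marstat 6-6 F1.0
  .
CACHE.
EXECUTE.
```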
Biostatistics Consulting -3-
University of Massachusetts School of Public Health
C:\Word\documentation\SPSS\SPSS Syntax.doc
Working with Syntax
Running Syntax
To run all or part of the commands in the Syntax window:
From the SPSS Statistics Syntax editor menu → Run
Choose:
All (to run everything in the syntax window)
Selection (runs only highlighted syntax)
To end (runs syntax from cursor position to end)
Saving Syntax
In order to use the same syntax in a future SPSS Statistics session, you need to save it.
From the SPSS Statistics Syntax editor window → File → Save As
Save in: choose a drive or directory
(for the workshop, use Desktop → SPSSwork)
Filename: gss93
Save as type: SPSS Statistics (*.sps)
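A saved syntax file can also be run from another syntax file with the INSERT command. A minimal sketch, assuming the file was saved as gss93.sps in a folder C:\SPSSwork (a hypothetical path; use wherever you actually saved it):

```spss
* Run every command in a saved syntax file; the path is hypothetical.
INSERT FILE='C:\SPSSwork\gss93.sps'.
```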
Finally, if all else fails, all of the syntax from each SPSS Statistics session is saved in the SPSS journal file,
statistics.jnl. The name and default location of this file varies depending on your version of
Windows, and your version of SPSS Statistics. Look in Edit → Options → File Locations (or General)
to see where it is on your system.
If you want to extract syntax for a session from the journal, you must do this promptly, as it becomes
increasingly difficult to identify as syntax from later sessions is appended. Worse, if you have set your
journal option to overwrite, the syntax from a session is destroyed as soon as you re-open SPSS
Statistics. Open the journal file with any text editor (e.g., WordPad or Notepad), and copy the syntax you
want. Save the extracted syntax to your personal directory (folder), and add some comments so you can
associate the commands with the corresponding output file. In line with SPSS Statistics conventions, the
syntax file should have the .sps extension. To add comments without disturbing the syntax, write the
word Comment at the left margin, then type any text you want. You can continue on as many lines as you
need. The comment should not have any internal periods, but must end with a period.
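For example, comments added to an extracted syntax file might look like this (the wording and the FREQUENCIES command are illustrative). Besides the word COMMENT, a command that begins with an asterisk is also treated as a comment; both forms end at the final period.

```spss
COMMENT Syntax copied from the journal file for the gss93 workshop session
    and matched to the saved output file.
* A command that begins with an asterisk is also treated as a comment.
FREQUENCIES VARIABLES=cappun.
```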
Using the journal is a last-ditch solution, as you will likely need to be very selective in what you take from
it, and may need to do considerable editing.
Useful Options
In Edit → Options, under the Data tab, you can set the 100-year interval SPSS Statistics will assume if you
enter a date using only two digits for the year.
You can also choose default table and chart formats, control what is displayed or hidden in the Output,
and set many other preferences. When you make changes to Options, your latest choices are saved, and
will remain in effect for future sessions.
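The two-digit-year setting can also be made in syntax with SET EPOCH. A minimal sketch, assuming you want two-digit years read as 1950 through 2049 (an illustrative choice). As far as we know, a SET command lasts only for the current session, unlike changes made in Options.

```spss
* Treat two-digit years as falling in the range 1950 to 2049 for this session.
SET EPOCH=1950.
```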
Defining Variable Properties
In the Define Variable Properties dialog, move MARSTAT and DEGREE to JAZZ into the Variables to Scan
box. (Shift-click or Ctrl-click are easy ways to select many variables in a dialog.) Click Continue. Define the
Missing Values, Value Labels, Type, Measurement Level, Width and Decimals for each of MARSTAT to
RELIG. Do NOT click OK after each variable definition; simply click another variable and proceed with
its definition. When you get to CAPPUN, after defining its properties, click To Other Variables under Copy
Properties. Select GUNLAW, SEXEDUC, and LETDIE, and click Copy. Similarly, define the variable
properties for the first variable in each of the other groups, and copy them to all the other variables in the
group. When the properties of ALL variables are defined, click Paste.
*cappun.
VARIABLE LABELS cappun 'Capital Punishment'.
MISSING VALUES cappun ( 0, 8, 9 ).
VALUE LABELS cappun
1 'Favor'
2 'Oppose' .
Similar MISSING VALUES and VALUE LABELS commands are generated for each variable in the
group. The VARIABLE LABELS command is not copied to the other variables, but we can use the above as
a model for adding VARIABLE LABELS to any additional variables that need them.
Modifying syntax
The generated syntax contains an identical set of MISSING VALUES and VALUE LABELS for each
variable in each group, which makes it rather lengthy. You can often write much more compact code by
generating the definition for the first variable in each group, and modifying the code to include other
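variables. For example, the code for CAPPUN to LETDIE could be shortened to one MISSING VALUES
command and one VALUE LABELS command, each naming the variables CAPPUN TO LETDIE.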
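A sketch of that shorter version, assuming CAPPUN, GUNLAW, SEXEDUC and LETDIE are contiguous in the file (as "CAPPUN to LETDIE" implies) and share the codes pasted for CAPPUN above:

```spss
MISSING VALUES cappun TO letdie (0, 8, 9).
VALUE LABELS cappun TO letdie
    1 'Favor'
    2 'Oppose'.
```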
Notice that the MISSING VALUES and VALUE LABELS commands both accept lists of variables.
Groups of contiguous variables can be specified using the keyword TO; non-contiguous variables can be
listed separated by blanks or commas. You can even specify variables with different definitions on a
single MISSING VALUES or VALUE LABELS command by separating them with a slash, so the complete set of
missing values and value labels can be written as one MISSING VALUES command and one VALUE LABELS command.
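A sketch of two groups defined on one command of each kind, separated by a slash. Only the CAPPUN TO LETDIE definitions come from this guide; the missing codes and labels shown for BIGBAND TO JAZZ are placeholders, so take the real ones from the gss93 codebook.

```spss
* The codes and labels for BIGBAND TO JAZZ are placeholders, not taken from the codebook.
MISSING VALUES cappun TO letdie (0, 8, 9)
  /bigband TO jazz (0, 8, 9).
VALUE LABELS cappun TO letdie
    1 'Favor'
    2 'Oppose'
  /bigband TO jazz
    1 'Like very much'
    5 'Dislike very much'.
```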
Syntax Rules
Upper/lower case is strictly cosmetic. You can use either apostrophes (') or quotation marks (") in label
definitions, provided the label does not itself contain the same character; for example, the label "don't
know" requires quotation marks. Indentation is used to make code easier to read; it is not required.
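Another option is to double an apostrophe inside an apostrophe-delimited label. The sketch below uses ADD VALUE LABELS so that the existing Favor/Oppose labels are kept (VALUE LABELS would replace them); the code 8 for "Don't know" is an assumption, not taken from the codebook.

```spss
* Two equivalent ways to add a label that contains an apostrophe.
* The code 8 for this label is an assumption, not taken from the codebook.
ADD VALUE LABELS cappun 8 "Don't know".
ADD VALUE LABELS cappun 8 'Don''t know'.
```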
Select the MISSING VALUES, VAR LABELS, VALUE LABELS and FORMATS commands and Run
them. Check your Output window for any error messages, and look in the Variable View window to
confirm that all variables have the correct properties. When you are done, your syntax file should contain
everything you need to read the gss93 text file and define all essential variable properties – GET DATA,
MISSING VALUES, VAR LABELS, and VALUE LABELS commands.
If you Copy and Paste a command, be sure to start the selection at the beginning of the command (where it
is at the left margin), and go to the period at the end. (Pasted syntax usually has the period on a line by
itself.) Then you can make modifications between these two points, keeping all intermediate lines indented,
and leaving any slashes where they are. Use the dialogs to generate the Syntax as much as possible, so you
can use it as a model. If in doubt, you can check the precise syntax of any command in Online Help.
Checking the Data
Run Frequencies on all the variables except ID and Age. When you've filled out the dialog, click Paste.
Observe that the Frequencies command is added to your Syntax window, but the command is not run. As
before, pasted syntax is considerably more verbose than necessary. In particular, each selected variable is
named. If we were to write our own FREQUENCIES command, we would use the TO convention to select
contiguous variables.
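A hand-written version might look like this, assuming DEGREE through JAZZ are contiguous in the file (as the list DEGREE TO JAZZ used later in this guide implies) and MARSTAT is listed separately:

```spss
FREQUENCIES VARIABLES=marstat degree TO jazz.
```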
To run the Frequencies from the Syntax window, place the cursor anywhere within the Frequencies
command. Notice the blue triangle marking the line with the cursor. If the command takes more than one
line, an indicator shows the beginning and end of the command. Choose Run/Selection (or click the
right-facing triangle on the toolbar).
Save the syntax file as gss93.sps, and the data file as gss93.sav.
Corrections with Syntax
In the gss93.sps syntax window, go to the GET DATA command, and change the column locations of all
the variables, starting with DEGREE 7-7. Note that SPSS Statistics starts numbering positions at 0, rather
than 1. As a result, all variable locations are one less than in the codebook. Insert CLASSICL 23-23
between MUSICAL and FOLK.
Look through the remaining commands to see whether these corrections have further ramifications. Note
that the MISSING VALUES, VAR LABELS, and VALUE LABELS are correct for all previously defined
variables. Add a VAR LABELS entry for CLASSICL. If you defined MISSING VALUES and VALUE
LABELS using the list BIGBAND TO JAZZ, the newly defined variable CLASSICL is included in those
commands; otherwise you'll need to add MISSING VALUES and VALUE LABELS for the new variable.
The Frequencies command generated by Paste lists variables individually. You need to add CLASSICL to
that list in the desired location, or change the list to the more compact version DEGREE TO JAZZ. When
you are finished, save the revised syntax file, and choose Run/All. If the Frequencies output now shows
that the data are correct, save the data file, gss93.sav, and the modified syntax file, gss93.sps.
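The additions described above might look like this in syntax. A sketch only: the label text for CLASSICL and the file path are placeholders, not taken from the original workshop files. Saving the data file with a SAVE command keeps that step in the syntax record as well.

```spss
* Placeholder label text for the new music variable.
VARIABLE LABELS classicl 'Classical Music'.
* Hypothetical path for the corrected data file.
SAVE OUTFILE='C:\SPSSwork\gss93.sav'.
```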
Transformations
Even if you don't need to use syntax for most purposes, if you create or recode variables or do any
substantive data transformations, you should save the syntax. Without it, you will have a data file with a
bunch of variables but no record of how they were created, nor any evidence that they are what they
were intended to be. It will be impossible to do any troubleshooting, should the need arise. Furthermore, if
for any reason you need to start again, the saved syntax can save you much time and grief.
The gss93 survey contains four variables that ask for the respondents' views on some social issues. A
response of "Favor" to three of the questions, GUNLAW, SEXEDUC, and LETDIE, and a response of
"Oppose" to CAPPUN are considered to be at the "socially liberal" end of the spectrum. The opposite set of
responses will be considered "socially conservative". Our task is to create an "index" variable that will
score each respondent on a social outlook scale.
We need to first make sure the four social issue questions are coded in the same "direction"; three of the
four have codes that can be interpreted as 1 for "liberal" and 2 for "conservative". The codes for CAPPUN
need to be reversed. When the four questions are coded in the same direction, counting the number of 1's
for each respondent will give a 0-4 score, with 0 being most "conservative", and 4 most "liberal".
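A sketch of the syntax segment that reverses CAPPUN, consistent with the appendix. It adds a MISSING VALUES command, on the assumption that CAPPUNR should share CAPPUN's missing codes 0, 8 and 9: RECODE with MISSING=COPY carries those values over, but not their missing status.

```spss
* Reverse the CAPPUN codes so that 1 marks the liberal response (Oppose).
RECODE cappun (1=2) (2=1) (MISSING=COPY) INTO cappunr.
VARIABLE LABELS cappunr 'Capital Punishment Reversed'.
* COPY carries the codes 0, 8 and 9 over, so declare them missing again.
MISSING VALUES cappunr (0, 8, 9).
VALUE LABELS cappunr 1 'Oppose' 2 'Favor'.
EXECUTE.
```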
Select and Run the syntax segment that creates the new variable CAPPUNR. Run Frequencies on both
CAPPUN and CAPPUNR to check that the coding has been reversed as intended and all missing values
and labels are correct.
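Besides the two frequency tables, a crosstabulation shows the reversal case by case. A minimal sketch; MISSING=INCLUDE keeps the user-missing codes in the table, so you can also see that 0, 8 and 9 were carried over unchanged.

```spss
FREQUENCIES VARIABLES=cappun cappunr.
CROSSTABS
  /TABLES=cappun BY cappunr
  /MISSING=INCLUDE.
```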
When CAPPUNR has been correctly defined you are ready to create the "index" that counts the number of
"liberal" responses to the four questions. On the Data View screen menu:
Select Transform → Count Values within Cases
Type a name for the new variable in Target Variable: LIBSCORE
Type a label in the Target Label box: Liberal Social View Score
Select the four variables: GUNLAW, SEXEDUC, LETDIE, CAPPUNR. Do NOT
select CAPPUN!
Click the arrow to move the selected variables into the Numeric Variables box.
Click Define Values.
On the left side, select Value and type 1 into the Value box.
Click Add to move it to the Values to Count box.
Click Continue.
Click Paste.
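The pasted COUNT command should resemble the sketch below, shown together with the VALUE LABELS that the next step asks for. The second COUNT, which creates a hypothetical variable NMISSING, is an optional extra check and is not part of the original exercise.

```spss
COUNT libscore=gunlaw sexeduc letdie cappunr (1).
VARIABLE LABELS libscore 'Liberal Social View Score'.
VALUE LABELS libscore
    0 'Most Conservative'
    4 'Most Liberal'.
* Optional check, hypothetical variable name: number of missing answers per respondent.
COUNT nmissing=gunlaw sexeduc letdie cappunr (MISSING).
EXECUTE.
```

COUNT scores a missing answer the same as a non-liberal one, so a respondent who skipped all four questions gets LIBSCORE = 0. Assuming the missing codes of all four variables (including CAPPUNR) are declared, NMISSING lets you spot such respondents.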
Add VALUE LABELS for the new variable, LIBSCORE, to define 0 as "Most Conservative" and 4 as
"Most Liberal". Select and run the newly added syntax.
You now have a complete syntax file for reading the gss93.dat data file, defining new variables, and setting
up missing values and labels for all variables. You can do a bit of clean-up: remove extraneous
commands (if you pasted some things you didn't intend), put things in a sensible order, and add some
comments. You can safely remove all but the last EXECUTE. Save the final syntax file, gss93.sps.
Split Files
To run the same analyses separately for males and females, you could use Data → Select Cases to select
the males and run the analyses, then select the females and re-run the same analyses. But it would be
quicker and simpler to use Split File (Data → Split File on the Data Editor menu).
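In syntax, the Split File dialog corresponds to a SORT CASES command followed by SPLIT FILE. SEPARATE matches the dialog's "Organize output by groups" choice; LAYERED would match "Compare groups". A minimal sketch using the variable SEX:

```spss
* SPLIT FILE expects the cases to be sorted by the grouping variable.
SORT CASES BY sex.
SPLIT FILE SEPARATE BY sex.
```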
To test for differences in age among the 5 values of the liberal beliefs score, select
Analyze → Compare Means → Oneway ANOVA.
Select AGE; Click the arrow to move it to Dependent List
Select LIBSCORE; Click the arrow to move it to Factor.
Click Options.
Check Descriptives.
Click Continue.
Click Paste.
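The pasted command should be close to this sketch; while Split File is on, it is run once for each value of SEX.

```spss
ONEWAY age BY libscore
  /STATISTICS DESCRIPTIVES
  /MISSING ANALYSIS.
```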
Now select and run these commands, beginning with Split File. Note that the two graphs have different
scales, and are placed one above the other, so they are not easily compared visually. You can get better
visual comparisons by turning Split File off and using Column Panels or Clustering.
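Turning Split File off in syntax is a single command, equivalent to choosing "Analyze all cases" in the Split File dialog:

```spss
SPLIT FILE OFF.
```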
Chart Builder vs. Legacy Dialogs
SPSS Statistics has two distinct sets of graphing routines, Chart Builder and Legacy Dialogs. Although
there is some overlap between their functionality, each offers some choices not available in the other.
Where they overlap there is no general rule for which procedure to use. A few observations:
• To get side-by-side graphs for visual comparison, use
o Chart Builder with a Columns Panel variable
o Legacy Dialogs with a COLUMN panel variable
• To get line charts with a proportional axis for scale variables with unequal spacing, use Chart
Builder. See Example 1.
• Legacy Dialogs can create multiple-variable clustered bar charts. Chart Builder does not have this
feature. See Example 2.
• Chart Builder lets you choose the base for computing percentages. Legacy Dialogs does not. See
Example 3.
You may have to experiment with both procedures to see which is better for any given situation.
Example 1: Line graphs with a scale x-axis.
Notice that LIBSCOR2 is a scale variable. Now let's make a line graph of average AGE against
LIBSCOR2, comparing Legacy Dialogs and Chart Builder results. (The same is true for bar graphs, but
there is less of an expectation for bar spacing to reflect a scale.) From the Graphs menu:
Legacy Dialogs → Line
Select Simple
Select Summaries for Groups of Cases
Click Define
Select Other Summary Function
Select Age
Click the arrow to put Mean(Age) into the Variable box
Select Libscor2
Click the arrow to move it to the Category Axis box
Click Paste
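The pasted legacy command should be close to the GRAPH sketch below. This guide does not show how LIBSCOR2 was created; the values 0, 1, 4, 9 and 16 suggest it is LIBSCORE squared, so the COMPUTE and VARIABLE LEVEL lines are an assumption, included only to make the sketch self-contained.

```spss
* Assumption: LIBSCOR2 is LIBSCORE squared, which matches the values 0, 1, 4, 9 and 16.
COMPUTE libscor2 = libscore**2.
VARIABLE LEVEL libscor2 (SCALE).
EXECUTE.
GRAPH
  /LINE(SIMPLE)=MEAN(age) BY libscor2.
```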
Run the generated GRAPH command. Observe that the values 0, 1, 4, 9, and 16 are equally spaced on the
x-axis, even though the measurement level of LIBSCOR2 is Scale. If the values of the x-variable have any
quantitative meaning, you would likely want the x-scale to reflect the magnitude of the x-variable.
Now make the same graph with Chart Builder, with Mean(AGE) on the y-axis and LIBSCOR2 on the x-axis,
and paste it. Run the generated commands. The x-axis is now proportional to the values of LIBSCOR2.
Re-do the graph, but this time right-click Libscor2 and select Ordinal, rather than Scale. Compare the
resulting graphs for the two commands.
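The Chart Builder version pastes a GGRAPH command followed by a GPL block. The sketch below is an approximation of that kind of pasted code, not a verbatim copy. When LIBSCOR2 is switched to Ordinal, the DATA line for it should gain unit.category(), for example DATA: libscor2=col(source(s), name("libscor2"), unit.category()), which is what makes the axis categorical and equally spaced.

```spss
* Sketch of the kind of GGRAPH and GPL that Chart Builder pastes for a line of mean AGE.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=libscor2 MEAN(age)[name="MEAN_age"]
    MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: libscor2=col(source(s), name("libscor2"))
  DATA: MEAN_age=col(source(s), name("MEAN_age"))
  GUIDE: axis(dim(1), label("libscor2"))
  GUIDE: axis(dim(2), label("Mean age"))
  ELEMENT: line(position(libscor2*MEAN_age), missing.wings())
END GPL.
```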
Example 2: Multi-variable clustered bars.
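We would like to make a graph comparing the percent in favor on each of the 4 social issue questions
among males to the percent in favor among females. Using Legacy Dialogs, this is a clustered bar chart of
Summaries of separate variables (Graphs → Legacy Dialogs → Bar → Clustered).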
Chart Builder does not have a feature for displaying several variables on one chart.
Example 3: Choosing the base for percentages.
Let's take another look at the percent in favor of gun control laws, comparing the opinions of the two genders.
Observe that when we choose "% of Cases" we get no information on (or control over) what the base for
the percents is. In fact, what we get is percent in favor or opposed to gun control laws, within gender.
Suppose we want to look at "what percent of those in favor or opposed are male/female"? Using Legacy
Dialogs, we cannot do this without reversing the roles of the "category axis" and "cluster by" variables.
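The legacy "% of Cases" clustered bar chart should paste a GRAPH command along the lines of the sketch below. Reversing the roles of the category axis and cluster variables, as described above, amounts to swapping the two BY variables. This is a hedged sketch; the percentage base each version uses is as reported in the text above, not something the code itself controls.

```spss
* Percent of cases with GUNLAW on the category axis, clustered by SEX.
GRAPH
  /BAR(GROUPED)=PCT BY gunlaw BY sex.
* The same chart with the roles of the two variables reversed.
GRAPH
  /BAR(GROUPED)=PCT BY sex BY gunlaw.
```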
Graphs →Chart Builder
Drag Clustered Bar to the Preview area
Drag GUNLAW to the x-axis.
Drag Sex to Cluster On box.
In the Element Properties Dialog:
Select Bar1
Under Statistics, select Percentage(?)
Click Set Parameters
Select Total for Each Legend Variable (to get %
favor/oppose within gender)
OR… Select Total for Each X-Axis Category (to get %
male/female within opinion)
Click Continue
Click Apply
Click Titles/Footnotes
Click Title1 checkbox
In the Element Properties contents box, enter a title
Click Apply
Click Paste
Run the generated GGRAPH and GPL commands, and compare the results to the previous graph. If, in Set
Parameters, you chose "Total for Each Legend Variable", the graph should be identical to the one you got
from Legacy Dialogs. If you chose "Total for Each X-Axis Category", the two graphs will be quite different.
Chart Templates
While the content of a chart is largely controlled by syntax, many of the visual design aspects can be
standardized using "Chart Templates". Use Chart Templates to apply a consistent set of colors, fonts,
symbols, etc. to your graphs.
When you apply a chart template to a different chart style, only those properties of the template that make
sense for the new chart are used. For example, if you apply a template you create for a bar graph to a pie
chart, only the colors in the template are used. Axis scales and tick marks do not apply to a pie chart.
To create a chart template, edit a chart to set the elements you wish to standardize:
Double-click the last clustered bar chart you made. The Chart Editor opens:
Double-click the Y-Axis. In the Properties dialog
Select the Labels and Ticks tab.
Under Minor ticks, Check Display Ticks
For 'Number of Minor Ticks', enter 3.
Click Apply.
Select the Scale tab.
Under Range, uncheck Auto boxes, set Minimum=0,
Maximum=100, Major Increment=20
Click Apply.
In the chart, Click the Female legend box
In the Properties dialog, under Fill & Border
Click Fill
Choose a different Color for Female bar
Click Apply
Close the Properties dialog
Under Options
Select Transpose Chart
Select Show Grid Lines
From the Chart Editor menu, select File → Save Chart Template
Select the properties of the template to be saved:
Layout, Styles, Axes (Do not check Text Content)
Click Continue
Choose a folder and filename (the extension is .sgt)
Click Save
Close the Chart Editor.
NOTE: Templates made from Legacy Dialogs graphs may not apply well to Chart Builder graphs, and vice
versa.
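A saved template can also be applied from syntax: the legacy GRAPH command accepts a TEMPLATE subcommand. A sketch, with a hypothetical template path; since the template above was saved from a Chart Builder chart, the NOTE above applies, and it may not transfer cleanly to a GRAPH chart.

```spss
* Hypothetical template path; use the file saved from the Chart Editor.
GRAPH
  /BAR(GROUPED)=PCT BY gunlaw BY sex
  /TEMPLATE='C:\SPSSwork\clusterbar.sgt'.
```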
Date Variables
Open the SPSS Statistics file dates.sav. It has three variables. AVGTEMP is the average temperature
during the time period from STARTDAT to ENDATE. Observe in Variable View that STARTDAT and
ENDATE are defined as type DATE.
Sort the data by STARTDAT. From the Data View menu, choose Data/Sort Cases. Select STARTDAT
and move it to the Sort By box. Click OK.
Calculate the number of days in each time period. SPSS Statistics stores dates as the number of seconds
since midnight, October 14, 1582 (the start of the Gregorian calendar). Try changing one of the Date
variables to Numeric and you will see the underlying number. Change it back to type DATE before
proceeding. Because the difference of two dates is in seconds, calculate elapsed time in days with the
formula DAYS=(ENDATE-STARTDAT)/(60*60*24), that is, divide by the 86,400 seconds in a day. Use this
formula in Transform/Compute to calculate the number of days in each time period.
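The sort and the elapsed-time calculation can be written in syntax as sketched below. DAYS2 is a hypothetical second variable, created only to show that the built-in CTIME.DAYS function should give the same result as dividing by 86,400.

```spss
* Sort the periods in date order.
SORT CASES BY startdat (A).
* Date differences are in seconds, and 60*60*24 = 86400 seconds make one day.
COMPUTE days = (endate - startdat) / (60*60*24).
* Alternative using a built-in conversion from seconds to days.
COMPUTE days2 = CTIME.DAYS(endate - startdat).
EXECUTE.
```

For example, a period running from 01-JAN-2000 to 31-JAN-2000 (illustrative dates) differs by 2,592,000 seconds, and 2,592,000 / 86,400 = 30 days.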
Make plots of AVGTEMP against STARTDAT using Legacy Dialogs and Chart Builder and compare the
results. For Legacy Dialogs, select Line/Simple/Values of Individual Cases/Define. Put AVGTEMP in
the Line Represents box. Under Category Labels, click Variable, and move STARTDAT into the Variables
box. For Chart Builder, select Line. Drag AVGTEMP to the Y-axis, and STARTDAT to the X-axis.
Observe that the two graphs look quite different. Why?
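For the Legacy Dialogs plot, the pasted command should be close to the sketch below (Values of Individual Cases, with STARTDAT supplying the category labels). Comparing it with the GGRAPH code that Chart Builder pastes may help answer the question.

```spss
GRAPH
  /LINE(SIMPLE)=VALUE(avgtemp) BY startdat.
```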
Pivot Tables
SPSS Statistics Table output is in the form of Pivot Tables. This means you can re-arrange the rows and
columns to suit your fancy.
Double-click the Descriptives table from the Oneway output. The slashed border shows that the table is
now in edit mode, and the Pivot item appears on the menu.
From the Pivot Menu, select Pivoting Trays. Note the icons in the Row, Column, and Layer trays.
Drag the icon representing Groups from Rows to Columns.
Drag the Statistics icon from Columns to Rows. The table is re-arranged accordingly.
Close the Pivot Tray.
In the table, double-click Mean. Change it to Average Age.
Drag the pointer across the row of average ages to select them.
Right-click the selected average ages, select Cell Properties.
Under the Format Value tab, change the number of decimal places.
Export Results
SPSS Statistics can Export output to Word, Excel, pdf, and other formats. If you don't want the entire
SPSS Statistics output exported, use the left panel in the Output Viewer window to select the table(s)
and/or charts to export. Use ctrl-click to select non-contiguous objects, or shift-click to select many
contiguous objects. Select:
File → Export
Under Export: Select Output Document
Choose: All, Visible Objects or Selected Objects.
Under File Type: Select Word/RTF file (*.doc)
Click Browse: Select a folder and name for the Word file.
Click: OK
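In releases that include the OUTPUT EXPORT command, the same export can be done from syntax. A minimal sketch with a hypothetical output path; EXPORT=VISIBLE corresponds to the Visible Objects choice.

```spss
* Hypothetical output path for the exported Word document.
OUTPUT EXPORT
  /CONTENTS EXPORT=VISIBLE
  /DOC DOCUMENTFILE='C:\SPSSwork\gss93_results.doc'.
```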
Appendix: A Complete Syntax File
Here is the complete syntax file to do most of the tasks described in this document:
Comment Reverse code CAPPUN and make LIBSCORE the sum of "liberal" responses.
RECODE cappun (1=2) (2=1) (MISSING=Copy) INTO cappunr .
VARIABLE LABELS cappunr 'Capital Punishment Reversed'.
MISSING VALUES cappunr (0, 8, 9).
COUNT libscore = cappunr gunlaw sexeduc letdie (1) .
VARIABLE LABELS libscore 'Liberal Views Score' .
VALUE LABELS cappunr 1 "Oppose" 2 "Favor"/ libscore 0 'Most Conservative' 4 "Most Liberal".
EXECUTE .
FREQUENCIES
VARIABLES=marstat degree sex ethnic relig to jazz .