SYSTAT
T. KRISHNAN AND R.L. KARANDIKAR
Cranes Software International Limited
Mahatma Gandhi Road, Bangalore - 560 001
krishnan.t@systat.com
rkarandikar@gmail.com
1. Introduction
SYSTAT was designed for statistical analysis and graphical presentation of scientific and
engineering data. To use this tutorial, some familiarity with Windows 95/98/2000/NT/XP
would be helpful.
SYSTAT provides a powerful statistical and graphical analysis system in a new graphical
user interface environment using descriptive menus, toolbars and dialog boxes. It offers
numerous statistical features from simple descriptive statistics to highly sophisticated
statistical algorithms.
Taking advantage of the enhanced user interface and environment, SYSTAT offers major
performance enhancements for speed and ease of use. Most tasks can be accomplished
simply by pointing and clicking with the mouse, and SYSTAT makes extensive use of
drag-and-drop and right-click functionality. SYSTAT's intuitive Windows interface and
flexible command language are designed to make your research more efficient. You can
quickly locate advanced options through clear, comprehensive dialogs.
SYSTAT also offers a huge data worksheet for powerful data handling. SYSTAT handles
most of the popular data formats, such as Excel, SPSS, SAS, BMDP, MINITAB, S-Plus,
Statistica, Stata, JMP, and ASCII. All matrix operations and computations are menu
driven.
The Graphics module of SYSTAT 11 is an enhanced version of the graphics module of
SYSTAT 10.2. This module offers better interactivity with all graphical output of the
SYSTAT application. Users can easily create 2D and 3D graphs using the appropriate
toolbar icons, which provide tooltip descriptions of the graphs. Graphs can be created from
the Graph menu or by using the Graph Gallery, which makes it possible to produce complex
graphs (e.g., a global map with contours, or 3D surface plots with contour projections) with
a few mouse clicks. Double-clicking a graph brings up a dialog for editing most graph
attributes from one comprehensive 'dynamic dialog'. Each graph attribute, such as line
thickness, scale, or choice of symbols, can be changed with mouse clicks. Thus simple or
complex changes to a graph or set of graphs can be made quickly and exactly as the user
requires.
2. Getting Started with SYSTAT
2.1 Opening SYSTAT for Windows
To start SYSTAT for Windows NT4, 98, 2000, ME or XP, choose:
Start
All Programs
SYSTAT11
SYSTAT 11
Alternatively, you can double-click the SYSTAT icon to start SYSTAT.
2.2 User Interface
The user interface of SYSTAT is organized into three spaces:
I. Viewspace
II. Workspace
III. Commandspace
I. Viewspace has the following tabs:
Output Pane. Graphs and statistical results appear in the Output Pane. You can edit,
print and save the output displayed in the Output Pane.
Data Editor. The Data Editor displays the data in a row-by-column format. Each
row is a case and each column is a variable. You can enter, edit, view, and save data
in the Data Editor.
Graph Editor. You can edit and save graphs in the Graph Editor.
The Output Pane is fixed in the Viewspace, whereas the Data Editor and Graph Editor can
be moved to the Workspace, and restored, by double-clicking their tabs. The advantage is
that any two of these tabs can be viewed at the same time.
II. Workspace has the following tabs:
Output Organizer. The Output Organizer tab helps primarily to navigate through
the results of your statistical analysis. You can quickly navigate to specific portions
of output without having to use the Output Pane scrollbars.
Dynamic Explorer. The Dynamic Explorer can be used to rotate 3-D graphs, apply
power transformations to values on one or more axes, and change the confidence
intervals, ellipses, and kernels in scatter plots.
By default, the Dynamic Explorer appears automatically when the Graph Editor tab is
active.
III. Commandspace has the following tabs:
Interactive. In the Interactive tab, you can enter commands at the command prompt
(>) and issue them by pressing the Enter key.
Untitled. The Untitled tab enables you to run commands in batch mode. You can open,
edit, submit, and save SYSTAT command files (.syc or .cmd).
Log. In the Log tab, you can view the record of the commands issued during the
SYSTAT session (through Dialog or in the Interactive mode).
You can cycle through the three tabs using the following keyboard shortcuts:
CTRL+ALT+TAB. Shifts focus one tab to the right.
CTRL+ALT+SHIFT+TAB. Shifts focus one tab to the left.
3. SYSTAT Data, Command and Output files
Data files. SYSTAT data files are saved with the .SYD extension.
Command files. A command file is a text file that contains SYSTAT commands.
Saving your analyses in a command file allows you to repeat them at a later date.
These files are saved with the .SYC extension.
Output files. SYSTAT displays statistical and graphical output in the Output Pane.
You can save the output in SYSTAT output format (.SYO), Rich Text Format (.RTF),
or HyperText Markup Language format (.HTM).
4. The Data Editor
The Data Editor is used for entering, editing, and saving data. Entering data is a
straightforward process. Editing data includes changing variable names or attributes,
adding and deleting cases or variables, moving variables or cases, and correcting data
errors.
SYSTAT imports and exports data in all popular formats, including Excel, ASCII Text,
Lotus, BMDP Data, SPSS, SAS, StatView, Stata, Statistica, JMP, Minitab and S-Plus, as
well as from any ODBC-compliant application. Data can be entered or imported into
SYSTAT in the following ways:
Entering data
Consider the following data on seven dinners from the frozen-food section of a grocery
store.
Brand$            Calories   Fat
Lean Cuisine           240     5
Weight Watchers        220     6
Healthy Choice         250     3
Stouffer               370    19
Gourmet                440    26
Tyson                  330    14
Swanson                300    12
To enter these data into the Data Editor, from the menu choose:
File
New
Data
This opens the Data Editor (or clears its contents if it is already open).
Double-click the VAR00001 column heading to open the Variable Properties dialog box.
Type BRAND$ for the name. The dollar sign ($) at the end of the variable name indicates
that the variable is a string (character) variable, as opposed to a numeric variable.
Note: Variable names cannot exceed 12 characters.
Select String as the Variable type.
In the Characters box, enter or select the desired number of characters.
Click OK to complete the variable definition for VAR00001.
To define CALORIES, double-click VAR00002 to open the Variable Properties dialog box,
and type CALORIES as the variable name.
Select Numeric as the Variable type.
Enter the number of characters in the Characters box. (The decimal point counts as a
character.)
Select the number of Decimal places to display.
Click OK to complete the variable definition for VAR00002.
Repeat this process for the FAT variable, selecting Numeric as the variable type.
Click the top-left data cell (under the name of the first variable) and enter the data.
To move across a row, press Enter or Tab after each entry. To move down a column,
press the Down Arrow key.
Note: To change the behavior of the Enter key in the Data Editor, from the menu choose:
Edit
Options
Data
Click either of the two radio buttons below Data Editor Cursor.
Once the data are entered in the Data Editor, the data file should look something like this:
To save the data, from the menu choose:
File
Save As
Importing Data
To import IRIS.xls (an Excel data file), from the menu choose:
File
Open
Data...
From the Files of type drop-down list, choose Microsoft Excel.
Select the IRIS.xls file.
Select the desired Excel sheet and click OK.
The data file in the Data Editor should look something like this:
Statistical Analyses through SYSTAT
5. Descriptive Statistics
Descriptive Statistics offers basic statistics and stem-and-leaf plots for columns as well as
rows. The basic statistics include the number of observations (N), minimum, maximum,
mean, sum, standard deviation, variance, coefficient of variation (CV), range, median, and
standard error of the mean. You can also perform the Shapiro-Wilk test for normality.
Example 5.1: We will use the IRIS data to compute descriptive statistics. This data set
consists of four measurements made on each of 50 randomly sampled Iris flowers from
each of three species: Setosa, Versicolor, and Virginica (coded as 1, 2, and 3, respectively).
The four measurements are sepal length, sepal width, petal length, and petal width, in cm.
This is a famous data set from Fisher (1936).
To calculate basic statistics for the iris data, from the menu choose:
Analysis
Descriptive Statistics
Basic Statistics
Choose SEPALWID and add it to the Selected variable(s) list.
Select N, Mean, SD, Minimum, Maximum.
To check for normality, select the Shapiro-Wilk normality test option.
Click OK.
The following output is displayed in the Output Pane:
SEPALWID
N of cases 150
Minimum 2.000
Maximum 4.400
Mean 3.057
Standard Dev 0.436
SW Statistic 0.985
SW P-Value 0.101
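The Basic Statistics quantities are simple to compute by hand. As a cross-check, here is a minimal Python sketch (standard library only) applied to the CALORIES column of the frozen-dinner data from Section 4, since the IRIS file itself is not listed in this tutorial. It assumes the sample (n - 1) standard deviation and takes CV as SD/mean; SYSTAT's exact conventions may differ, and the Shapiro-Wilk test is not reproduced here.

```python
import math
import statistics

# CALORIES column of the frozen-dinner data (Section 4)
calories = [240, 220, 250, 370, 440, 330, 300]


def basic_stats(values):
    """A few of the Basic Statistics quantities for one numeric column."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (divisor n - 1)
    return {
        "N": n,
        "Minimum": min(values),
        "Maximum": max(values),
        "Range": max(values) - min(values),
        "Mean": mean,
        "Median": statistics.median(values),
        "SD": sd,
        "Variance": statistics.variance(values),
        "SE of mean": sd / math.sqrt(n),
        "CV": sd / mean,  # assumed definition: SD / mean
    }


stats = basic_stats(calories)
for name, value in stats.items():
    print(f"{name:<12}{value:10.3f}")
```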
6. Correlation
The Correlation feature computes correlations and measures of similarity and distance.
Example 6.1: In the previous example, we computed basic statistics for SEPALWID. We
will now compute the correlations between the four variables.
To compute correlations between the four variables SEPALLEN, SEPALWID,
PETALLEN, and PETALWID, from the menu choose:
Analysis
Correlations
Simple...
Often we want to compute certain statistics separately for each group defined by one or
more variables in the data set. Here, we may want to examine whether the correlations are
of the same magnitude in the three species. SYSTAT facilitates such computations with its
By Groups feature. Let us use By Groups in the Data menu to request separate results for
each level of SPECIES (the grouping variable).
From the menu choose:
Data
By Groups
In the By Groups dialog box, select SPECIES as the grouping variable.
Click OK.
Return to the Simple Correlations dialog box.
Select all four variables and add them to the Selected variable(s) list.
Click OK.
The following output is displayed in the Output Pane:
The following results are for:
SPECIES = 1.0000
Pearson correlation matrix
SEPALLEN SEPALWID PETALLEN PETALWID
SEPALLEN 1.0000
SEPALWID 0.7425 1.0000
PETALLEN 0.2672 0.1777 1.0000
PETALWID 0.2781 0.2328 0.3316 1.0000
[Quick Graph: scatterplot matrix (SPLOM) of SEPALLEN, SEPALWID, PETALLEN and PETALWID for SPECIES = 1]
Number of observations: 50
The following results are for:
SPECIES = 2.0000
Pearson correlation matrix
SEPALLEN SEPALWID PETALLEN PETALWID
SEPALLEN 1.0000
SEPALWID 0.5259 1.0000
PETALLEN 0.7540 0.5605 1.0000
PETALWID 0.5465 0.6640 0.7867 1.0000
[Quick Graph: scatterplot matrix (SPLOM) of SEPALLEN, SEPALWID, PETALLEN and PETALWID for SPECIES = 2]
Number of observations: 50
The following results are for:
SPECIES = 3.0000
Pearson correlation matrix
SEPALLEN SEPALWID PETALLEN PETALWID
SEPALLEN 1.0000
SEPALWID 0.4572 1.0000
PETALLEN 0.8642 0.4010 1.0000
PETALWID 0.2811 0.5377 0.3221 1.0000
[Quick Graph: scatterplot matrix (SPLOM) of SEPALLEN, SEPALWID, PETALLEN and PETALWID for SPECIES = 3]
Number of observations: 50
Quick Graphs. Quick Graphs are graphs that are produced along with numeric output
without the user invoking the Graph menu. A number of SYSTAT procedures include
Quick Graphs. The Quick Graphs above are generated automatically when you request
correlations (with the Quick Graphs option on). To turn off Quick Graphs:
Under Edit menu, click Options.
In the Global Options dialog, select the Output tab.
Turn off the Display statistical Quick Graphs option.
The Quick Graphs in this example are scatterplot matrices (SPLOMs). Each SPLOM
contains one bivariate scatterplot for each entry in the corresponding correlation matrix. A
univariate histogram for each variable is displayed along the diagonal, and 75% normal
distribution-based confidence ellipses are displayed within each plot. For species 3
(Virginica), the plot of SEPALLEN and PETALLEN has the narrowest ellipse and thus the
strongest correlation, 0.8642.
7. Hypothesis Testing
SYSTAT provides several parametric tests of hypotheses and confidence intervals for
means, variances, proportions, and correlations. This section provides examples of the
one-sample t-test and the paired t-test.
a. One-Sample t-test
The one-sample t-test is used to test whether the mean of the population from which the
data are sampled equals a hypothesized value.
Example 7.1: One-Sample t-test
Let us study the effect of cigarette smoking on the carbon monoxide diffusing capacity
(DL) of the lung. Ronald Knudson, Walter Kaltenborn, and Benjamin Burrows found that
current smokers had DL readings significantly lower than those of ex-smokers or
nonsmokers.
Do the data indicate that the mean DL reading ($\mu$) for current smokers is significantly
lower than 100?
The null hypothesis is $H_0: \mu = 100$ against the alternative hypothesis $H_1: \mu < 100$.
The carbon monoxide diffusing capacities for a random sample of n = 20 current smokers
are entered in the Data Editor as the variable DL_READING.
To perform one-sample t-test, from the menu choose:
Analysis
Hypothesis testing
Mean
One-Sample t-test
Add DL_READING to the Selected variable(s) list.
Enter 100 as the Mean.
From the drop-down list, select 'less than' as the alternative type.
Click OK.
The following output is displayed:
One-sample t-test of DL_READING with 20 cases
Ho: Mean = 100.000 against Alternative = 'less than'
Mean = 89.855
95.00% confidence bound = 95.617
SD = 14.904
t = -3.044
df = 19
p-value = 0.003
[Quick Graph: histogram of DL_READING (Count against DL_READING)]
Conclusion: The one-sided p-value is 0.003, which is highly significant. The mean DL
reading ($\mu$) for current smokers is significantly lower than 100.
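The one-sample t statistic is $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$ with $n - 1$ degrees of freedom. The Python sketch below reproduces the printed t and confidence bound from the summary values in the output. The one-sided p-value uses a Student t CDF obtained by numerical integration (Simpson's rule), an assumption chosen for illustration rather than SYSTAT's own algorithm, and the critical value 1.729 for 19 df is taken from standard t tables.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|] plus symmetry about 0."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area


def one_sample_t(mean, sd, n, mu0):
    """t statistic, df and standard error for H0: mu = mu0."""
    se = sd / math.sqrt(n)
    return (mean - mu0) / se, n - 1, se


# Summary values printed by SYSTAT for DL_READING
t, df, se = one_sample_t(mean=89.855, sd=14.904, n=20, mu0=100)
p_less = t_cdf(t, df)              # H1: mu < 100, so use the lower tail
upper_bound = 89.855 + 1.729 * se  # 1.729 = t(0.95; 19), from t tables
print(f"t = {t:.3f}, df = {df}, p = {p_less:.3f}, bound = {upper_bound:.3f}")
```

Because the alternative is 'less than', the confidence bound is an upper bound, and 100 lies above it, consistent with rejecting $H_0$.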
b. Paired t-test
The paired t-test assesses the equality of two means in experiments involving paired
measurements.
Example 7.2: Paired t-test
To illustrate the paired t-test, we use data from Hand et al. (1996) on the systolic blood
pressure of 15 patients (MacGregor et al., 1979). The aim is to see whether there is any
difference in the systolic blood pressure of the patients before and after the administration
of a drug called captopril. The BP data file gives the supine systolic and diastolic blood
pressures (mm Hg) of 15 patients with moderate essential hypertension, immediately
before and two hours after administration of the drug.
The null hypothesis is $H_0: \mu_d = 0$, where $\mu_d$ is the mean of the differences
SYSBP_BEFORE - SYSBP_AFTER (i.e., there is no difference in the systolic blood pressure
of the patients before and after the administration of the drug). The alternative hypothesis
is $H_1: \mu_d > 0$ (i.e., systolic blood pressure is higher before than after the
administration of the drug, indicating that the drug has the desired effect).
To perform paired t-test, from the menu choose:
Analysis
Hypothesis testing
Mean
Paired t-test
Add SYSBP_BEFORE and SYSBP_AFTER to the Selected variable(s) list.
From the drop-down list, select 'greater than' as the alternative type.
Click OK.
The following output is displayed in the Output Pane:
Paired samples t-test on SYSBP_BEFORE vs SYSBP_AFTER with 15 cases
Alternative = 'greater than'
Mean SYSBP_BEFORE = 176.933
Mean SYSBP_AFTER = 158.000
Mean difference = 18.933
95.00% confidence bound = 14.828
SD of difference = 9.027
t = 8.123
df = 14
p-value = 0.000
[Quick Graph: SYSBP_BEFORE and SYSBP_AFTER values plotted against Index of Case]
From the above graph, it is seen that systolic blood pressure decreased after the
administration of captopril. The test results (mean difference = 18.933, p < 0.0005,
displayed as 0.000) indicate that captopril reduces systolic blood pressure.
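A paired t-test is a one-sample t-test applied to the within-patient differences, $t = \bar{d}/(s_d/\sqrt{n})$ with $n - 1$ df. Since the BP data values are not listed here, the Python sketch below reproduces the printed t and lower confidence bound from the summary values in the output. The helper names are hypothetical, and the critical value 1.761 (the 0.95 quantile of t with 14 df) is taken from standard tables.

```python
import math
import statistics


def t_from_summary(mean_diff, sd_diff, n):
    """t statistic, degrees of freedom and standard error from summary values."""
    se = sd_diff / math.sqrt(n)
    return mean_diff / se, n - 1, se


def paired_t(before, after):
    """Paired t-test: a one-sample t-test of mean(before - after) = 0."""
    diffs = [b - a for b, a in zip(before, after)]
    return t_from_summary(statistics.mean(diffs), statistics.stdev(diffs), len(diffs))


# Summary values printed by SYSTAT for SYSBP_BEFORE - SYSBP_AFTER
t, df, se = t_from_summary(mean_diff=18.933, sd_diff=9.027, n=15)
lower_bound = 18.933 - 1.761 * se  # 1.761 = t(0.95; 14), from t tables
print(f"t = {t:.3f}, df = {df}, 95% lower bound = {lower_bound:.3f}")
```

Working with the differences is what makes the test paired: between-patient variation in blood pressure cancels out, leaving only the change within each patient.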
8. R x C Contingency Table
A contingency table provides a display of (joint) frequencies of categorical (or discrete)
data to study relationships between two or more variables. Using Crosstabulation, you
can analyze and save frequency tables that are formed by categorical variables.
Example 8.1: Contingency Table
This example uses questionnaire data from a community survey (Afifi et al., 2004). The
survey was conducted to study depression and help-seeking behavior among adults. The
CESD depression index was constructed by asking people to respond to 20 items. The
SURVEY2.SYD data file includes a record (case) for each of the 256 subjects in the
sample. The data set consists of the following variables:
ID SEX AGE MARITAL EDUCATN EMPLOY
INCOME RELIGION BLUE DEPRESS LONELY CRY
SAD FEARFUL FAILURE AS_GOOD HOPEFUL HAPPY
ENJOY BOTHERED NO_EAT EFFORT BADSLEEP GETGOING
MIND TALKLESS UNFRNDLY DISLIKE TOTAL CASECONT
DRINK HEALTHY DOCTOR MEDS BED_DAYS ILLNESS
CHRONIC MARITAL$ SEX$ AGE$ EDUC$
To study the relationship between depression and education, assign labels to the
categories of EDUCATN and CASECONT using the Label dialog box.
To open the Label dialog box, from the menu choose:
Data
Label
Select EDUCATN as the variable.
Type the value(s) that require labels.
Type the label for each specified value.
Click OK.
Repeat the process for the variable CASECONT and label the value 1 as
depressed and 0 as normal.
To tabulate, from the menu choose:
Analysis
Tables
Two-Way
Select EDUCATN as the Row variable(s) and CASECONT as the Column variable.
Under Tables, select the Frequencies and Row percents check boxes.
Click OK.
The output is displayed in the Output Pane.
Frequencies
EDUCATN (rows) by CASECONT (columns)
depressed normal Total
+---------------------+
Dropout | 14 36 | 50
HS grad | 18 80 | 98
College | 11 75 | 86
Degree + | 1 21 | 22
+---------------------+
Total 44 212 256
Row percents
EDUCATN (rows) by CASECONT (columns)
depressed normal Total N
+---------------------+
Dropout | 28.000 72.000 | 100.000 50
HS grad | 18.367 81.633 | 100.000 98
College | 12.791 87.209 | 100.000 86
Degree + | 4.545 95.455 | 100.000 22
+---------------------+
Total 17.187 82.813 100.000
N 44 212 256
Test statistic Value df Prob
Pearson Chi-square 7.841 3.000 0.049
Conclusion: As the level of education increases, the proportion of depressed subjects
decreases. Of those not graduating from high school (Dropout), 28% are depressed,
compared with only 4.55% of those with advanced degrees. The Pearson chi-square tests
the hypothesis that the percentage of depressed subjects is the same in all education groups;
it is marginally significant (p-value = 0.049).
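The Pearson chi-square compares each observed count with its expected count under independence, $E_{ij} = R_i C_j / N$, and sums $(O_{ij} - E_{ij})^2 / E_{ij}$ over all cells, with $(R-1)(C-1)$ degrees of freedom. Here is a standard-library Python sketch using the counts from the Frequencies table above. The p-value uses the closed-form chi-square tail probability for integer degrees of freedom; this is an illustration, not necessarily how SYSTAT computes it.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) exp(-x/2) * sum_k x^(k-1) / (1*3*...*(2k-1))
    term, total = 1.0, 0.0
    for k in range(1, (df + 1) // 2):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


def pearson_chi_square(table):
    """Pearson chi-square test of independence for an R x C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# EDUCATN (rows: Dropout, HS grad, College, Degree +) by CASECONT (depressed, normal)
counts = [[14, 36], [18, 80], [11, 75], [1, 21]]
stat, df, p = pearson_chi_square(counts)
print(f"Pearson Chi-square = {stat:.3f}, df = {df}, p = {p:.3f}")
```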
9. Fitting Distributions
The Fitting Distributions feature enables you to assess whether the observed data can be
modeled by a distribution from a parametric family of distributions with appropriately
chosen parameter values.
Example 9.1: Fitting Normal Distribution
The FOREARM1 data file contains forearm lengths (in inches) of 140 subjects, from
Pearson and Lee (1903). A normal distribution may be an appropriate model to describe
these data.
To fit a normal distribution, from the menu choose:
Analysis
Fitting Distributions
Continuous
Add ARMLENGTH to the Selected variable(s) list.
Select Normal as the Distribution.
Click OK.
The output is displayed in the Output Pane:
Variable Name: ARMLENGTH
Distribution: Normal
Estimated: Location or mean (mu) = 18.802143 Scale or SD (sigma) = 1.116466
Estimation of parameter(s): Maximum likelihood method.
Test Results:
LimitL LimitU Observed Expected
. 17.1600 11.0 9.8934
17.1600 17.6900 12.0 12.4498
17.6900 18.2200 16.0 19.8022
18.2200 18.7500 29.0 25.2471
18.7500 19.2800 22.0 25.8024
19.2800 19.8100 24.0 21.1380
19.8100 20.3400 11.0 13.8807
20.3400 . 15.0 11.7865
140.0 140.0000
Chi-square test statistic = 3.849814 df = 5 p-value = 0.571236
Kolmogorov-Smirnov test statistic = 0.047870 Lilliefors Probability (2-tail) = 0.554270
Shapiro-Wilk test statistic for normality = 0.991759 p-value = 0.590263
[Quick Graph: histogram of ARMLENGTH (Count and Proportion per Bar) with the fitted normal distribution overlaid (FITTED DISTRIBUTION)]
Conclusion: The above analysis indicates that a normal distribution fits the data well.
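The chi-square goodness-of-fit line can be rebuilt from the printed output: each expected count is N times the fitted normal probability of its interval, and the statistic has (number of intervals) - 1 - (number of estimated parameters) = 8 - 1 - 2 = 5 degrees of freedom. A Python sketch using the printed estimates and interval limits; it assumes Python 3.8+ for `statistics.NormalDist`, and small differences from SYSTAT's expected counts come from the rounded parameter values.

```python
import math
from statistics import NormalDist

# Printed ML estimates, interval limits and observed counts for ARMLENGTH
mu, sigma, n = 18.802143, 1.116466, 140
limits = [17.16, 17.69, 18.22, 18.75, 19.28, 19.81, 20.34]
observed = [11, 12, 16, 29, 22, 24, 11, 15]


def chi2_sf_odd(x, df):
    """Upper-tail chi-square probability for odd integer df (closed form)."""
    term, total = 1.0, 0.0
    for k in range(1, (df + 1) // 2):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


fitted = NormalDist(mu, sigma)
cdf = [0.0] + [fitted.cdf(v) for v in limits] + [1.0]  # first and last bins are open-ended
expected = [n * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 2  # two parameters (mu, sigma) were estimated
p = chi2_sf_odd(chi2, df)
print([round(e, 4) for e in expected])
print(f"Chi-square = {chi2:.4f}, df = {df}, p = {p:.4f}")
```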
10. Analysis of Variance
We used the t-test to compare the mean of one sample with a specified value, or to
compare the means of two groups. In many situations, there is a need to compare several
means and to test the significance of differences among three or more means from
independently sampled populations.
Example 10.1: One Way ANOVA
This example uses a one-way design to compare average typing speed for three groups of
typists. Fourteen beginning typists were randomly assigned to three types of machines
and given speed tests. The following are their typing speeds in words per minute:
Electric   Word processor   Plain old
52         67               52
47         73               43
51         70               47
49         75               44
53         64
Does the equipment influence typing performance?
$H_0$: The average speeds of the three machines are the same.
$H_1$: The average speeds of the three machines are not all the same.
To carry out an analysis of variance on these data, we need to reorganize them into a form
suitable for SYSTAT. This is done with the 'Reshape' feature, by 'wrapping' the columns:
wrapping puts the group variable in one column and the measurement variable in another.
To wrap the data into two columns, from the menu choose:
Data
Reshape
The data file looks as below:
The variable MEASURE is the typing speed using three types of machines. The levels 1,
2, and 3 in the TRIAL column correspond to the machines ELECTRIC, WORD
PROCESSOR, and PLAIN OLD, respectively. You may wish to rename TRIAL as
EQUIPMENT$ and MEASURE as SPEED using the Variable Properties dialog. Now let us
do a one-way analysis of variance using the wrapped data. To perform one-way ANOVA,
from the menu choose:
Analysis
Analysis of Variance
Estimate Model
Add MEASURE as the Dependent variable.
Add TRIAL as the Factor.
Click OK.
The output is displayed in the Output Pane:
Effects coding used for categorical variables in model.
Categorical values encountered during processing are:
TRIAL (3 levels)
1, 2, 3
1 case(s) deleted due to missing data.
Dep Var: MEASURE N: 14 Multiple R: 0.9523 Squared multiple R: 0.9068
Analysis of Variance
Source Sum-of-Squares df Mean-Square F-ratio P
TRIAL 1469.3571 2 734.6786 53.5196 0.0000
Error 151.0000 11 13.7273
[Quick Graph: Least Squares Means of MEASURE for TRIAL = 1, 2, 3]
Conclusion: We reject the null hypothesis because the p-value is very small (P < 0.0001).
The Quick Graph illustrates this finding. Although the typists using electric and plain old
typewriters have similar average speeds (50.4 and 46.5, respectively), the word processor
group has a much higher average speed (69.8).
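The wrapping step and the one-way ANOVA table can both be sketched in standard-library Python. The dictionary layout and function name are hypothetical. The empty fifth cell of the PLAIN OLD column is dropped here, which matches the one case SYSTAT reports as deleted due to missing data.

```python
# Wide layout as typed; None marks the empty cell in the PLAIN OLD column
wide = {
    1: [52, 47, 51, 49, 53],    # ELECTRIC
    2: [67, 73, 70, 75, 64],    # WORD PROCESSOR
    3: [52, 43, 47, 44, None],  # PLAIN OLD
}

# "Wrap": one (TRIAL, MEASURE) pair per case, dropping the missing value
long_rows = [(trial, y) for trial, col in wide.items() for y in col if y is not None]


def one_way_anova(rows):
    """Sums of squares, degrees of freedom, F-ratio and group means."""
    groups = {}
    for trial, y in rows:
        groups.setdefault(trial, []).append(y)
    n, k = len(rows), len(groups)
    grand_mean = sum(y for _, y in rows) / n
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}
    ss_between = sum(len(ys) * (means[g] - grand_mean) ** 2 for g, ys in groups.items())
    ss_within = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)
    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio, means


ssb, ssw, dfb, dfw, f_ratio, means = one_way_anova(long_rows)
print(f"TRIAL  SS = {ssb:.4f}  df = {dfb}  MS = {ssb / dfb:.4f}  F = {f_ratio:.4f}")
print(f"Error  SS = {ssw:.4f}  df = {dfw}  MS = {ssw / dfw:.4f}")
print(means)
```

With one factor, the least squares means shown in the Quick Graph are just the group means.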
Example 10.2: Two Way ANOVA
Consider the following data from a two-factor (Drug & Disease) experiment, from Afifi
and Azen (1972), cited in Kutner (1974). The dependent variable, SYSINCR, is the change
in systolic blood pressure after administering one of four different drugs to patients with
one of three different diseases. Patients were assigned randomly to one of the possible
drugs. The data are stored in the SYSTAT file AFIFI.
S.no  DRUG  DISEASE  SYSINCR      S.no  DRUG  DISEASE  SYSINCR
   1     1        1       42        29     2        3        4
   2     1        1       44        30     2        3       16
   3     1        1       36        31     3        1        1
   4     1        1       13        32     3        1       29
   5     1        1       19        33     3        1       19
   6     1        1       22        34     3        2       11
   7     1        2       33        35     3        2        9
   8     1        2       26        36     3        2        7
   9     1        2       33        37     3        2        1
  10     1        2       21        38     3        2       -6
  11     1        3       31        39     3        3       21
  12     1        3       -3        40     3        3        1
  13     1        3       25        41     3        3        9
  14     1        3       25        42     3        3        3
  15     1        3       24        43     4        1       24
  16     2        1       28        44     4        1        9
  17     2        1       23        45     4        1       22
  18     2        1       34        46     4        1       -2
  19     2        1       42        47     4        1       15
  20     2        1       13        48     4        2       27
  21     2        2       34        49     4        2       12
  22     2        2       33        50     4        2       12
  23     2        2       31        51     4        2       -5
  24     2        2       36        52     4        2       16
  25     2        3        3        53     4        2       15
  26     2        3       26        54     4        3       22
  27     2        3       28        55     4        3        7
  28     2        3       32        56     4        3       25
                                    57     4        3        5
                                    58     4        3       12
To perform Two-way ANOVA, from the menu choose:
Analysis
Analysis of Variance
Estimate Model
Select SYSINCR as the Dependent variable.
Add DRUG and DISEASE to the Factor list box.
Click OK.
Note: The ANOVA procedure automatically includes all interaction terms in the model. To
specify your own model, use the GLM feature.
The output is displayed in the Output Pane:
Effects coding used for categorical variables in model.
Categorical values encountered during processing are:
DRUG (4 levels)
1, 2, 3, 4
DISEASE (3 levels)
1, 2, 3
Dep Var: SYSINCR N: 58 Multiple R: 0.675 Squared multiple R: 0.456
Analysis of Variance
Source Sum-of-Squares df Mean-Square F-ratio P
DRUG 2997.472 3 999.157 9.046 0.000
DISEASE 415.873 2 207.937 1.883 0.164
DRUG*DISEASE 707.266 6 117.878 1.067 0.396
Error 5080.817 46 110.453
Conclusion: In two-way ANOVA, begin the analysis by looking at the interaction effect.
The DRUG * DISEASE interaction is not significant (p = 0.396), so shift your focus to the
main effects.
The DRUG effect is significant (p < 0.0005), but the DISEASE effect is not (p = 0.164).
Thus, at least one of the drugs differs from the others with respect to blood pressure
change, but blood pressure change does not vary significantly across diseases.
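With all interaction terms in the model, each DRUG x DISEASE cell has its own fitted mean, so the Error sum of squares is the pooled within-cell sum of squares, with N - (number of cells) = 58 - 12 = 46 degrees of freedom. The Python sketch below checks the Error line and the Squared multiple R from the AFIFI data listed above. It does not attempt the DRUG and DISEASE sums of squares, which for these unbalanced data depend on the model fit SYSTAT performs.

```python
# AFIFI data grouped by (DRUG, DISEASE) cell: SYSINCR values
cells = {
    (1, 1): [42, 44, 36, 13, 19, 22],
    (1, 2): [33, 26, 33, 21],
    (1, 3): [31, -3, 25, 25, 24],
    (2, 1): [28, 23, 34, 42, 13],
    (2, 2): [34, 33, 31, 36],
    (2, 3): [3, 26, 28, 32, 4, 16],
    (3, 1): [1, 29, 19],
    (3, 2): [11, 9, 7, 1, -6],
    (3, 3): [21, 1, 9, 3],
    (4, 1): [24, 9, 22, -2, 15],
    (4, 2): [27, 12, 12, -5, 16, 15],
    (4, 3): [22, 7, 25, 5, 12],
}

n = sum(len(ys) for ys in cells.values())
grand_mean = sum(sum(ys) for ys in cells.values()) / n

# Full DRUG*DISEASE model: the fitted value for each case is its cell mean,
# so the residual (Error) SS is the pooled within-cell SS.
sse = 0.0
for ys in cells.values():
    cell_mean = sum(ys) / len(ys)
    sse += sum((y - cell_mean) ** 2 for y in ys)

df_error = n - len(cells)
mse = sse / df_error
sst = sum((y - grand_mean) ** 2 for ys in cells.values() for y in ys)
r_squared = 1 - sse / sst
print(f"N = {n}, Error SS = {sse:.3f}, df = {df_error}, MS = {mse:.3f}, R^2 = {r_squared:.3f}")
```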
Note: Along with the ANOVA table, SYSTAT can also display estimates of the model
parameters. To get the estimates, select Long as the output length. To do so, from the
menu choose:
Edit
Options
Select the Output tab.
Under Output results, select Long as the Length.
11. Linear Regression
Regression analysis is used to investigate the relationship between a response variable and
one or more predictors.
Example 11.1: Let us study the relationship between noise exposure (predictor or
independent variable) and hypertension (dependent or response variable). The following
data were collected on Y (blood pressure rise in millimeters of mercury) and X (sound
pressure level in decibels).
Y X
1 60
0 63
1 65
2 70
5 70
1 70
4 80
6 90
2 80
3 80
5 85
4 89
6 90
8 90
4 90
5 90
7 94
9 100
7 100
6 100
To perform Linear Regression, from the menu choose:
Analysis
Regression
Linear
Least Squares
Select Y as the Dependent variable.
Select X as the Independent variable.
Click OK.
The output is displayed in the Output Pane:
Dep Var: Y N: 20 Multiple R: 0.865 Squared multiple R: 0.748
Adjusted squared multiple R: 0.734 Standard error of estimate: 1.318
Effect Coefficient Std Error Std Coef Tolerance t P(2 Tail)
CONSTANT -10.132 1.995 0.000 -5.079 0.000
X 0.174 0.024 0.865 1.000 7.314 0.000
Analysis of Variance
Source Sum-of-Squares df Mean-Square F-ratio P
Regression 92.934 1 92.934 53.502 0.000
Residual 31.266 18 1.737
-----------------------------------------------------------------------------
*** WARNING ***
Case 5 is an outlier (Studentized Residual = 2.741)
Durbin-Watson D Statistic 2.290
First Order Autocorrelation -0.179
Conclusion. The estimates of the regression coefficients are -10.132 (constant) and 0.174
(slope), so the estimated regression equation is:
Y = -10.132 + 0.174X
The F-ratio in the analysis of variance table tests the hypothesis that the slope is 0 (or, for
multiple regression, that all slopes are 0). F is large when the independent variable(s) help
explain the variation in the dependent variable. Here, there is a significant linear relation
between Y and X, so we reject the hypothesis that the slope of the regression line is zero
(F-ratio = 53.502, p < 0.0005). SYSTAT also outputs statistics and warnings for outlier
detection and for checking the assumptions of linear regression.
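For a single predictor, the least-squares slope is $b = S_{xy}/S_{xx}$, the intercept is $a = \bar{y} - b\bar{x}$, and $F = SS_{reg}/(SS_{res}/(n-2))$ with $SS_{reg} = S_{xy}^2/S_{xx}$. A standard-library Python sketch reproducing the coefficients and ANOVA table from the 20 (X, Y) pairs above; the function name is hypothetical.

```python
X = [60, 63, 65, 70, 70, 70, 80, 90, 80, 80, 85, 89, 90, 90, 90, 90, 94, 100, 100, 100]
Y = [1, 0, 1, 2, 5, 1, 4, 6, 2, 3, 5, 4, 6, 8, 4, 5, 7, 9, 7, 6]


def simple_least_squares(x, y):
    """Intercept, slope, regression/residual SS, F-ratio and R^2 for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_reg = sxy ** 2 / sxx  # 1 df
    ss_res = syy - ss_reg    # n - 2 df
    f_ratio = ss_reg / (ss_res / (n - 2))
    return intercept, slope, ss_reg, ss_res, f_ratio, ss_reg / syy


a, b, ss_reg, ss_res, f_ratio, r2 = simple_least_squares(X, Y)
print(f"Y = {a:.3f} + {b:.3f} X")
print(f"Regression SS = {ss_reg:.3f}, Residual SS = {ss_res:.3f}, F = {f_ratio:.3f}, R^2 = {r2:.3f}")
```

With one predictor, F equals the square of the slope's t statistic (7.314 squared is about 53.5), which is why the two tests agree.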
12. Logistic Regression
Logistic regression describes the relationship between a dichotomous response variable
and a set of explanatory (predictor or independent) variables. The explanatory variables
may be continuous or (with dummy variables) discrete.
Example 12.1: Binary Logistic Regression
To illustrate binary logistic regression, we consider a hypothetical data set on 15 skiers.
Whether a skier falls on a ski run (FALL: 0 = does not fall, 1 = falls) is modeled as a
function of the difficulty of the run (DIFFICULTY, on an ordered scale from 1 to 3, treated
as if continuous) and the season (SEASON, a categorical variable where 1 = autumn,
2 = winter, and 3 = spring).
To perform logistic regression, from the menu choose:
Analysis
Regression
Logit
Estimate Model
Select FALL as the Dependent variable.
Select DIFFICULTY and SEASON as the Independent variables.
Use the Category tab to recode the variable SEASON:
Select SEASON as the Categorical variable.
Select Coding type as dummy coding.
Click OK.
The output is displayed in the Output Pane:
Categorical values encountered during processing are:
SEASON (3 levels)
1, 2, 3
FALL (2 levels)
0, 1
Categorical variables are dummy coded with the highest value as reference.
Binary LOGIT Analysis.
Dependent variable: FALL
Input records: 15
Records for analysis: 15
Sample split
Category choices
0 (REFERENCE) 6
1 (RESPONSE) 9
Total : 15
L-L at iteration 1 is -10.3972
L-L at iteration 2 is -8.8005
L-L at iteration 3 is -8.7411
L-L at iteration 4 is -8.7404
L-L at iteration 5 is -8.7404
Log Likelihood: -8.7404
Parameter Estimate S.E. t-ratio p-value
1 CONSTANT -1.7768 1.8898 -0.9402 0.3471
2 DIFFICULTY 1.0108 0.8960 1.1281 0.2593
3 SEASON_1 0.9275 1.5894 0.5836 0.5595
4 SEASON_2 -0.4185 1.3866 -0.3018 0.7628
95.0 % bounds
Parameter Odds Ratio Upper Lower
2 DIFFICULTY 2.7478 15.9106 0.4745
3 SEASON_1 2.5282 56.9781 0.1122
4 SEASON_2 0.6581 9.9666 0.0434
Log Likelihood of constants only model = LL(0) = -10.0952
2*[LL(N)-LL(0)] = 2.7096 with 3 df Chi-sq p-value = 0.4386
McFadden's Rho-Squared = 0.1342
Conclusion. None of the coefficients is significant. The likelihood-ratio statistic of 2.7096,
referred to a chi-square distribution with three degrees of freedom, has a p-value of 0.4386,
so the model does not fit significantly better than the constant-only model.
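Several summary lines of the logit output follow directly from the printed numbers. With 9 falls out of 15, the constant-only log likelihood is $LL(0) = 9\ln(9/15) + 6\ln(6/15)$; the likelihood-ratio statistic is $2[LL(N) - LL(0)]$ on 3 df; McFadden's rho-squared is $1 - LL(N)/LL(0)$; and each odds ratio is $e^{b}$ with bounds $e^{b \pm 1.96\,SE}$. The Python sketch below checks these from the printed (rounded) values; the 1.96 multiplier for the 95% bounds is an assumption that happens to match the printed bounds.

```python
import math

ll_model = -8.7404        # Log Likelihood printed by SYSTAT
n_fall, n_no_fall = 9, 6  # Sample split: RESPONSE (1) and REFERENCE (0)

# Constant-only model: every skier gets the same fitted probability 9/15
p_hat = n_fall / (n_fall + n_no_fall)
ll_0 = n_fall * math.log(p_hat) + n_no_fall * math.log(1 - p_hat)

lr_stat = 2 * (ll_model - ll_0)  # compared with a chi-square on 3 df
# Closed-form chi-square upper-tail probability for 3 df
p_value = (math.erfc(math.sqrt(lr_stat / 2))
           + math.sqrt(2 * lr_stat / math.pi) * math.exp(-lr_stat / 2))
rho2 = 1 - ll_model / ll_0       # McFadden's rho-squared

# Odds ratio and 95% bounds for DIFFICULTY (estimate and S.E. from the output)
est, se = 1.0108, 0.8960
odds_ratio = math.exp(est)
lower, upper = math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)

print(f"LL(0) = {ll_0:.4f}, LR = {lr_stat:.4f}, p = {p_value:.4f}, rho2 = {rho2:.4f}")
print(f"DIFFICULTY odds ratio = {odds_ratio:.4f} ({lower:.4f}, {upper:.4f})")
```

The bounds for DIFFICULTY straddle 1, which is another way of seeing that its coefficient is not significant.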
13. Graphs
SYSTAT offers a wide variety of graphical analysis tools that enable better visualization
of the data. The editing options in SYSTAT allow you to fine-tune and change the display
of the graph. To create summary charts, density displays, or plots, use the Graph menu or
select the corresponding icon from the Graph toolbar.
Note: Graph menus are available only when a data file is in use.
Example 13.1: Simple Scatter Plot
Let us create a simple scatterplot. Consider the following data file.
In various international cities, how long must people work to earn enough to buy a Big
Mac? How does this time relate to the length of a typical work week? We plot BIG_MAC,
the working time (in minutes) needed to buy a Big Mac, against WORKWEEK, the length
of the work week (in hours). The data are in the RCITY file, which has 46 cases, one for
each city.
Open the RCITY.SYD data file from the Data folder of the main SYSTAT directory.
Note: By default, the file location is C:\Program Files\SYSTAT 11\Data.
You can also change the default path. To do so, from the menu choose:
Edit
Options
Select the File Locations tab.
Select the Set custom directories radio button.
Change the path for Open data.
To plot BIG_MAC against WORKWEEK, from the menu choose:
Graph
Plots
Scatterplot
Select WORKWEEK as the X-variable(s).
Select BIG_MAC as the Y-variable.
Click OK.
The Output Pane displays the following graph:
[Graph: scatterplot of BIG_MAC against WORKWEEK]
Customization of an existing graph
Once you have created a graph, you can use the Graph Editor tab to change many of its
features without recreating the graph. Using the Graph menu, you can change properties
such as color, axes, labels, symbols, titles, and graph size.
Note: To view the graph in the Graph Editor, double-click the graph, click the Graph
Editor tab, or double-click the corresponding node in the tree in the Output Organizer
tab.
To Edit Graph Axes
From the menu choose:
Graph
Options
Axes
The Axes dialog enables you to alter the axes of your graphs. It has three tabs: Labels,
Scale, and Tick Marks.
Labels tab
To enter new labels for the axes of your graph, select the Labels tab.
Change the X-axis label from WORKWEEK to Average working hours per week.
Click OK.
Alternatively, you can edit the labels of your graph by right-clicking on the graph.
Note: These options are also available in the main Scatterplot dialog box.
Scale tab
You can define a range for the scale of each axis on the graph.
Note: Any data points that fall outside the range do not appear on the graph.
To swap the axes, select the Transpose X-Y check box.
Tick Marks
The Tick Marks tab allows you to change the X- and Y-axis tick intervals and the tick mark
style.
To Edit the Graph Layout
From the menu choose:
Graph
Options
Layout
The Layout dialog box enables you to alter the graph title, legend, and layout of frames.
It has three tabs: Graph Title, Frame Layout, and Legend.
Graph Title
Enter a new title for your graph, for example, WORKWEEK Vs BIG_Mac.
Frame Layout
Frame Layout allows you to enter a title for individual frames, and change the position
and size of the graph.
Under Frame size, enter 3 for both Height and Width.
Note: For graphs consisting of one frame, no frame title can be specified.
Legend
The Legend tab allows you to alter the position of the graph legend, its title, and its item
labels.
Note: The Legend tab is usually active only when a grouping variable is selected while
creating a graph. Since no grouping variable has been selected here, all fields in the Legend
tab are inactive.
To Edit Appearance of the Graph
From the menu choose:
Graph
Options
Appearance
The Appearance dialog box enables you to alter the color, fill and the symbol of the graph.
It has three tabs: Color, Fill, and Symbol and Label.
Color
To change the color for the elements in the graph, select the option Select color.
Select a color from the Color drop-down list for each of the y variables.
Fill
To change the fill pattern for the elements in the graph, select the option Select fill.
Select a fill pattern from the Fill Pattern drop-down list for each of the y variables.
Symbol and Label
You can change the symbol type by using any of SYSTAT's 23 built-in symbols.
After performing the above steps, the edited graph looks like this:
[Graph: edited scatterplot of BIG_MAC against WORKWEEK, titled "WORKWEEK Vs BIG_Mac"]
14. Getting Help
SYSTAT uses the standard HTML Help system to provide information you need to use
SYSTAT and to understand the results. This section contains a brief description of the
Help system and the kinds of help provided with SYSTAT.
The best way to find out more about the Help system is to use it. You can ask for help in
any of these ways:
Click the Help button in a SYSTAT dialog box. This takes you directly to a topic
describing the use of the dialog box. This is the fastest way to learn how to use a
dialog box.
Right-click on any dialog box item, and select 'What's this?' to get help on that
particular item.
Select Contents or Search from the Help menu.
For help on commands, from the command prompt (on the Interactive tab of the
Commandspace) type:
HELP [command name]