http://www.ruf.rice.edu/~mickey/psyc339/notes/notes.html
http://web.uccs.edu/lbecker/SPSS/content.htm
Analysis of variance: This is how we generally test for the presence of one or more effects.
In our experiments we manipulate one or more independent variables, we
control for other independent variables, and we measure one or more dependent
variables. Each independent
variable (or factor) has two or more levels.
Each datum comes from some condition, or combination of the levels of the
factors. For example, suppose
our dependent variable is VOT and our independent variables are consonant
(let’s say 3 levels), following vowel (let’s say 5 levels), and stress
(let’s say 2 levels) – plus the subjects who produce the speech data; and
we control for certain things, such as position in word by having all consonants
word-initial, and voicing by having them all voiceless.
Each single VOT measurement comes from some particular condition such as
“p before i in a stressed syllable” (and from a particular subject).
There is a set of all measurements (from all the subjects) for each condition,
e.g. “p before i in a stressed syllable”; and these can be combined with
other conditions to produce larger sets for all the different levels of
the different factors, e.g. all the “p” data, or all the “stressed” data,
or all the “stressed p” data – any subset defined by your variables.
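To make the idea of conditions and subsets concrete, here is a small sketch that lists every condition in the VOT example. The level names (p, t, k and so on) are invented for illustration; the example above only fixes the number of levels per factor (3, 5 and 2).

```python
from itertools import product

# Hypothetical level names; only the counts (3, 5, 2) come from the example.
consonants = ["p", "t", "k"]
vowels = ["i", "e", "a", "o", "u"]
stresses = ["stressed", "unstressed"]

# Every condition is one combination of levels, one level per factor.
conditions = list(product(consonants, vowels, stresses))
n_conditions = len(conditions)  # 3 * 5 * 2

# A subset defined by the variables: all the "stressed p" conditions.
stressed_p = [c for c in conditions if c[0] == "p" and c[2] == "stressed"]

print(n_conditions, "conditions;", len(stressed_p), "of them are stressed p")
```

Each measurement belongs to exactly one of these conditions, and larger sets such as "all the p data" are just unions of conditions.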
Analysis of an experiment with one factor is called “1-way”, of an experiment with
two factors “2-way”, etc.
Since you test for an effect of a factor by forming an F-ratio, there is
a separate F-ratio for each factor in your experiment; these test
for main effects. The number of main-effect F-ratios doesn’t
depend on how many levels each factor has.
So, for example:
1-way ANOVA: F-ratio for 1 factor (regardless of its number of levels)
2-way ANOVA: F-ratios for 2 factors (regardless of their numbers of levels)
(etc.)
In our example above, there would be F-ratios for consonant, vowel, and stress
factors.
F-ratios can also be formed for combinations of factors; these test for
interactions. The consonant x stress interaction asks whether the effect of
the stress factor differs across the different consonants, or equivalently
whether the effect of the consonant factor differs across the different stresses.
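As a rough sketch of how the F-ratios multiply as factors are added, the snippet below lists every main effect and interaction term for a full factorial analysis of the three-factor VOT example. The function name anova_effects is made up for this illustration.

```python
from itertools import combinations

factors = ["consonant", "vowel", "stress"]


def anova_effects(factors):
    """List the main effects, then the interactions, of a full factorial design."""
    effects = []
    for size in range(1, len(factors) + 1):
        # size 1 gives main effects, size 2 two-way interactions, and so on
        for combo in combinations(factors, size):
            effects.append(" x ".join(combo))
    return effects


effects = anova_effects(factors)
for effect in effects:
    print(effect)
```

With 3 factors this gives 3 main effects, 3 two-way interactions and 1 three-way interaction, each with its own F-ratio.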
DOING A BASIC ANOVA
Before
we get to Repeated Measures designs, we’ll review doing a factorial ANOVA.
1. Datafile organization
We
start here using StatView because it’s popular in our lab; but you can
skip ahead to the SPSS section if you like.
Open
StatView, then open a sample file to look at the data in it.
Select File - open (or the Open icon) – select the folder called Sample
Data – select the file exercise.svd (.svd is StatView’s file extension).
It has 6 columns and 20 numbered rows (plus extra rows of labels at the
top). Of the 6 columns, 3
are labels and 3 are data.
The first column labels the rows by number. The
next 2 columns define the data structure in terms of the independent variables:
a factor Pre-stretch with levels No stretch and Stretch, and a factor Ankle
weights with levels No weights and Weights.
So this is a 2-way design.
The last 3 columns are data for three different dependent variables.
There are 20 rows, so 20 subjects, and they are divided into 4 groups of
5 subjects, one group per condition.
See
how data look in an Excel file using my sample file factorial.xls.
This gives data from 24 subjects for 2 factors each with 2 levels.
You can open it in Excel to see what it would look like when you set it
up yourself in advance, and you can also open it directly in StatView to
see how it shows up there.
(Be sure to change the file type in the file open window to include Excel
files.) It is almost always
a good idea to first put your logs of measurements into Excel and get them
ready there, rather than pasting them directly into StatView.
However, you must save the Excel file in Excel 4 format.
2. Running the analysis
Again,
start by opening StatView’s file “exercise.svd” in StatView’s “sample data”
folder. Then click on:
Analyze - ANOVA and t-tests - ANOVA or ANCOVA
It
asks you to assign variables for the analysis.
This includes the independent and the dependent variables.
You can see the list of available variables along the right; notice that
they are not labeled as independent vs. dependent, or “factors” vs. “data”
– only as nominal vs. continuous, which should be enough to help you remember
which is which. The dependent
variable window is active, waiting for you to drag a variable name into
it. Recall that this sample
file has 3 dependent variables, but any one ANOVA will analyze only one
of them. Pick one and drag
it. If you try to use one
of the factors as data, you’ll get an error message.
Then drag both factors into the independent variable window.
When you click OK, the analysis runs, and you’ll see a summary table of
the results. This one is for
Oxygen:

The rows show the two factors, their interaction, and the error term (here
called Residual). For each
of these, the columns show the degrees of freedom df, the sum of squares
SS, the mean square MS (= SS/df), the F-ratio (= MS for that row / MS Residual),
p (the probability of getting an F-ratio at least this large if there were
no real effect), and 2 measures we’ll talk about in Power and Size.
Since the 2 factors each had only 2 levels, their df is 1, and since MS
= SS/df, here the value of MS = the value of SS.
For all three F-ratios, the denominator is the Residual MS.
Its df is 16: the 20 subjects in the entire experiment minus the 4 conditions.
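To see where the numbers in such a table come from, here is a minimal sketch of a balanced 2-way ANOVA computed by hand. The scores are invented (they are not the values in exercise.svd), the function name two_way_anova is hypothetical, and the p column is left out because it needs the F distribution from a statistics package.

```python
from statistics import mean

# Hypothetical scores for a 2 x 2 design with 5 subjects per condition.
# These are NOT the values stored in exercise.svd.
cells = {
    ("No stretch", "No weights"): [10.0, 11.0, 9.0, 10.0, 10.0],
    ("No stretch", "Weights"): [12.0, 13.0, 12.0, 11.0, 12.0],
    ("Stretch", "No weights"): [11.0, 10.0, 12.0, 11.0, 11.0],
    ("Stretch", "Weights"): [15.0, 14.0, 16.0, 15.0, 15.0],
}


def two_way_anova(cells, name_a="Pre-stretch", name_b="Ankle weights"):
    """Balanced two-way ANOVA. Returns {source: (df, SS, MS, F)}."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # subjects per cell (assumed equal)
    values = [y for ys in cells.values() for y in ys]
    grand = mean(values)

    def level_mean(position, level):
        # Mean of all scores whose condition has `level` at `position`
        return mean(y for cond, ys in cells.items()
                    if cond[position] == level for y in ys)

    ss_a = n * len(b_levels) * sum((level_mean(0, a) - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((level_mean(1, b) - grand) ** 2 for b in b_levels)
    ss_cells = n * sum((mean(ys) - grand) ** 2 for ys in cells.values())
    ss_ab = ss_cells - ss_a - ss_b  # what the main effects leave unexplained
    ss_res = sum((y - mean(ys)) ** 2 for ys in cells.values() for y in ys)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_res = len(values) - len(cells)  # subjects minus conditions
    ms_res = ss_res / df_res

    table = {}
    for source, df, ss in [(name_a, df_a, ss_a), (name_b, df_b, ss_b),
                           (f"{name_a} x {name_b}", df_a * df_b, ss_ab)]:
        ms = ss / df
        table[source] = (df, ss, ms, ms / ms_res)  # F = MS / MS Residual
    table["Residual"] = (df_res, ss_res, ms_res, None)
    return table


table = two_way_anova(cells)
for source, (df, ss, ms, f) in table.items():
    print(source, df, round(ss, 3), round(ms, 3), None if f is None else round(f, 3))
```

The sketch only handles equal group sizes, as in the sample file; unbalanced designs need a different partition of the sums of squares.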
Recall
that to report your result you say, e.g., “F(1,16) = 7.126, p < .02”.
Do
the analysis again, selecting only one independent variable, and compare
the results.
Now
do another ANOVA, but using my Excel file factorial.xls.
You can run the analysis on the file just as it is when you open it; no
need to first save it as a StatView file.
3. A twist: “compact” data files
StatView has another kind of data file organization, called compact variables.
Each condition (each combination of levels of the independent variables) gets
its own column, and each row is a subject. Look at my sample
files compact.xls and compact.svd
to see a compact variable version of what we saw before.
The 4 columns are the 4 conditions (2 levels of each of 2 factors).
Merely arranging the data like this does not by itself make it a compact
variable; you have to define it as such in StatView.
In
other words, it doesn’t matter whether your file is in one format or the
other to do a factorial ANOVA in StatView.
Regular format is intuitive in generalizing a 2-factor row x column arrangement,
and can allow you to balance the numbers of rows and columns, so that it’s
easier to see all the data at once.
But compact format, though hard to read if there are many conditions, makes
your data structure very clear.
To review: structure of these kinds of files

       | regular     | compact
row    | observation | subject
column | variable    | condition
Regardless of the data organization, the subject is the experimental unit – each subject
is taken to provide one piece of data, i.e. for one condition.
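The relation between the two layouts can be sketched as a simple regrouping. This is only an illustration of the file structures, not of anything StatView does internally; the scores and the helper names to_compact and to_regular are invented.

```python
from collections import defaultdict

factors = ["Pre-stretch", "Ankle weights"]

# Hypothetical "regular" layout: one row per subject, with a column for
# each factor and a column for the dependent variable.
regular = [
    {"Pre-stretch": "No stretch", "Ankle weights": "No weights", "Oxygen": 10.0},
    {"Pre-stretch": "No stretch", "Ankle weights": "No weights", "Oxygen": 11.0},
    {"Pre-stretch": "No stretch", "Ankle weights": "Weights", "Oxygen": 12.0},
    {"Pre-stretch": "No stretch", "Ankle weights": "Weights", "Oxygen": 13.0},
    {"Pre-stretch": "Stretch", "Ankle weights": "No weights", "Oxygen": 11.0},
    {"Pre-stretch": "Stretch", "Ankle weights": "No weights", "Oxygen": 12.0},
    {"Pre-stretch": "Stretch", "Ankle weights": "Weights", "Oxygen": 15.0},
    {"Pre-stretch": "Stretch", "Ankle weights": "Weights", "Oxygen": 14.0},
]


def to_compact(rows, factors, dv):
    """Regroup the dependent variable into one column per condition."""
    columns = defaultdict(list)
    for row in rows:
        condition = tuple(row[f] for f in factors)
        columns[condition].append(row[dv])
    return dict(columns)


def to_regular(columns, factors, dv):
    """Unstack condition columns back into one row per observation."""
    rows = []
    for condition, values in columns.items():
        for value in values:
            row = dict(zip(factors, condition))
            row[dv] = value
            rows.append(row)
    return rows


compact = to_compact(regular, factors, "Oxygen")
for condition, values in compact.items():
    print(condition, values)
```

Either layout carries the same information, which is why the choice between them is a matter of readability.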
4. A Question: Can you do an ANOVA with only one or two subjects?
No,
you need at least a few, probably several, possibly many, subjects.
(See section on Power for info on how many you need.)
The
problem is that the observations from a single subject are not independent
enough to be analyzed by ANOVA. You need enough subjects for subjects to
be the experimental unit (the basis of the comparisons in the test).
When an ANOVA is done with only one subject, as indeed is often seen in
the literature, individual trials (tokens, repetitions) are used as the
experimental unit. As these are not independent, the degrees of freedom
used will be too high, which overstates the significance of any
differences and so raises the risk of Alpha (Type I) error.
Note
that StatView takes each row to be an experimental unit. So if your data
are set up in one long column, each row being a single token, then StatView
will think you have that many subjects.
Remember, just because you code a variable as "Subject", doesn't mean that
StatView in any way interprets and uses that variable differently from
others!
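One common way to keep tokens from being counted as subjects is to reduce the data to one value per subject per condition before the analysis. That remedy is an assumption on my part here, not something StatView does for you, and the VOT values and the helper name mean_per_subject are invented for the sketch.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical token-level data: (subject, condition, VOT in ms).
tokens = [
    ("S1", "stressed", 62.0), ("S1", "stressed", 58.0),
    ("S1", "unstressed", 41.0), ("S1", "unstressed", 45.0),
    ("S2", "stressed", 70.0), ("S2", "stressed", 74.0),
    ("S2", "unstressed", 50.0), ("S2", "unstressed", 52.0),
]


def mean_per_subject(tokens):
    """Average the tokens so each subject gives one value per condition."""
    grouped = defaultdict(list)
    for subject, condition, value in tokens:
        grouped[(subject, condition)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


subject_means = mean_per_subject(tokens)
print(len(tokens), "tokens reduced to", len(subject_means), "subject means")
```

Note that once the same subject contributes a mean to more than one condition, the design is a Repeated Measures one rather than the factorial design reviewed here.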