Note to any readers: this page comes out of informal sessions in the UCLA Phonetics Lab on how we should be doing ANOVA. I am not a statistician – not even a statistics teacher – so don’t take my word for anything here; but I have tried to convey as best I can what I’ve read. This isn’t an intro to statistics, and assumes you already know something about ANOVA. For general review, here are two websites that looked good:

 

http://www.ruf.rice.edu/~mickey/psyc339/notes/notes.html
 

http://web.uccs.edu/lbecker/SPSS/content.htm


 

Analysis of variance:
 

This is how we generally test for the presence of one or more effects.  In our experiments we manipulate one or more independent variables, we control for other independent variables, and we measure one or more dependent variables.  Each independent variable (or factor) has two or more levels.  Each datum comes from some condition, or combination of the levels of the factors.  For example, suppose our dependent variable is VOT and our independent variables are consonant (let’s say 3 levels), following vowel (let’s say 5 levels), and stress (let’s say 2 levels) – plus the subjects who produce the speech data; and we control for certain things, such as position in word by having all consonants word-initial, and voicing by having them all voiceless.  Each single VOT measurement comes from some particular condition such as “p before i in a stressed syllable” (and from a particular subject).  There is a set of all measurements (from all the subjects) for each condition, e.g. “p before i in a stressed syllable”; and these can be combined with other conditions to produce larger sets for all the different levels of the different factors, e.g. all the “p” data, or all the “stressed” data, or all the “stressed p” data – any subset defined by your variables.
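To make the idea of a "condition" concrete, here is a minimal Python sketch that lists every combination of levels for the example factors. The level names (p/t/k, the five vowels) are assumptions made up for illustration; the example above only fixes how many levels each factor has.

```python
from itertools import product

# Hypothetical level names; the text only specifies 3, 5, and 2 levels.
factors = {
    "consonant": ["p", "t", "k"],
    "vowel": ["i", "e", "a", "o", "u"],
    "stress": ["stressed", "unstressed"],
}

# A condition is one combination of one level from each factor.
conditions = list(product(*factors.values()))

print(len(conditions))   # 3 * 5 * 2 = 30 conditions
print(conditions[0])     # ('p', 'i', 'stressed')
```

Every single VOT measurement would belong to exactly one of these 30 conditions (and to one subject).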
 

Analysis of an experiment with one factor is called “1-way”, of an experiment with two factors “2-way”, etc.  Since you test for an effect of a factor by forming an F-ratio, there is a separate F-ratio for each factor in your experiment; these test for main effects.  This doesn’t depend on how many levels each factor has.  So, for example:
 

1-way ANOVA: F-ratio for 1 factor (regardless of its number of levels)
 

2-way ANOVA: F-ratios for 2 factors (regardless of their numbers of levels)
 

(etc.)
 

In our example above, there would be F-ratios for consonant, vowel, and stress factors.


 

F-ratios can also be formed for combinations of factors; these test for interactions.  The consonant x stress interaction asks whether the effect of the stress factor differs from one consonant to another, or equivalently whether the effect of the consonant factor differs between the two levels of stress.
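As a rough illustration of how many F-ratios a full factorial analysis of the three-factor example would produce, the sketch below lists the main effects and all the possible interactions. It only enumerates names; it does not compute anything from data.

```python
from itertools import combinations

factor_names = ["consonant", "vowel", "stress"]

# Main effects use one factor; interactions use every set of two or more.
effects = [
    " x ".join(combo)
    for size in range(1, len(factor_names) + 1)
    for combo in combinations(factor_names, size)
]

for effect in effects:
    print(effect)
```

For three factors this gives 3 main effects, 3 two-way interactions, and 1 three-way interaction.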


 

DOING A BASIC ANOVA


 

Before we get to Repeated Measures designs, we’ll review doing a factorial ANOVA.


 

1. Datafile organization


 

We start here using StatView because it’s popular in our lab; but you can skip ahead to the SPSS section if you like.


 

Open StatView, then open a sample file to look at the data in it.  Select File - open (or the Open icon) – select the folder called Sample Data – select the file exercise.svd (.svd is StatView’s file extension).  It has 6 columns and 20 numbered rows (plus extra rows of labels at the top).  Of the 6 columns, 3 are labels and 3 are data.  The first column labels the rows by number.  The next 2 columns define the data structure in terms of the independent variables: a factor Pre-stretch with levels No stretch and Stretch, and a factor Ankle weights with levels No weights and Weights.  So this is a 2-way design.  The last 3 columns are data for three different dependent variables.  There are 20 rows, so 20 subjects, and they are divided into 4 groups of 5 subjects, one group per condition.
 

See how data look in an Excel file using my sample file factorial.xls.  This gives data from 24 subjects for 2 factors each with 2 levels.  You can open it in Excel to see what it would look like when you set it up yourself in advance, and you can also open it directly in StatView to see how it shows up there.  (Be sure to change the file type in the file open window to include Excel files.)  It is almost always a good idea to first put your logs of measurements into Excel and get them ready there, rather than pasting them directly into StatView.  However, you must save it as an Excel 4 file.
 


 

2. Running the analysis


 

Again, start by opening StatView’s file “exercise.svd” in StatView’s “sample data” folder.  Then click on:

Analyze - ANOVA and t-tests - ANOVA or ANCOVA
 

It asks you to assign variables for the analysis.  This includes the independent and the dependent variables.  You can see the list of available variables along the right; notice that they are not labeled as independent vs. dependent, or “factors” vs. “data” – only as nominal vs. continuous, which should be enough to help you remember which is which.  The dependent variable window is active, waiting for you to drag a variable name into it.  Recall that this sample file has 3 dependent variables, but any one ANOVA will analyze only one of them.  Pick one and drag it.  If you try to use one of the factors as data, you’ll get an error message.  Then drag both factors into the independent variable window.  When you click OK, the analysis runs, and you’ll see a summary table of the results.  This one is for Oxygen:
 

The rows show the two factors, their interaction, and the error term (here called Residual).  For each of these, the columns show the degrees of freedom df, the sums of squares SS, the mean square MS (= SS/df), the F-ratio (= MS for that row / MS Residual), p (the probability of getting an F-ratio at least this large if there were really no effect), and 2 measures we’ll talk about in Power and Size.  Since the 2 factors each had only 2 levels, their df is 1, and since MS = SS/df, here the value of MS = the value of SS.  For all three F-ratios, the denominator is the Residual MS.  Its df is 16, because the 20 subjects in the entire experiment are divided among 4 conditions (20 - 4 = 16).
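The arithmetic behind such a summary table can be sketched in a few lines of Python for a balanced 2 x 2 design. The scores below are made up for illustration and are not the values in exercise.svd; the sketch also stops at the F-ratios, since getting p requires the F distribution from a statistics package.

```python
from statistics import mean

# Made-up scores for a balanced 2 x 2 design, 5 subjects per cell.
# These are NOT the values in exercise.svd; they only show the arithmetic.
cells = {
    ("No stretch", "No weights"): [10.0, 12.0, 11.0, 13.0, 9.0],
    ("No stretch", "Weights"):    [14.0, 15.0, 13.0, 16.0, 12.0],
    ("Stretch", "No weights"):    [9.0, 11.0, 10.0, 8.0, 12.0],
    ("Stretch", "Weights"):       [12.0, 13.0, 11.0, 14.0, 10.0],
}

a_levels = sorted({a for a, _ in cells})
b_levels = sorted({b for _, b in cells})
n = 5                                    # subjects per cell (balanced design)
all_scores = [x for xs in cells.values() for x in xs]
grand = mean(all_scores)

def level_mean(index, level):
    """Mean of all scores at one level of factor A (index 0) or B (index 1)."""
    return mean([x for key, xs in cells.items() if key[index] == level for x in xs])

# Sums of squares for the two main effects, the interaction, and the residual.
ss_a = n * len(b_levels) * sum((level_mean(0, a) - grand) ** 2 for a in a_levels)
ss_b = n * len(a_levels) * sum((level_mean(1, b) - grand) ** 2 for b in b_levels)
ss_cells = n * sum((mean(xs) - grand) ** 2 for xs in cells.values())
ss_ab = ss_cells - ss_a - ss_b
ss_resid = sum((x - mean(xs)) ** 2 for xs in cells.values() for x in xs)

df = {"A": len(a_levels) - 1, "B": len(b_levels) - 1}
df["AxB"] = df["A"] * df["B"]
df["Residual"] = len(all_scores) - len(cells)   # 20 - 4 = 16

ms_resid = ss_resid / df["Residual"]
table = {}
for name, ss in (("A", ss_a), ("B", ss_b), ("AxB", ss_ab)):
    ms = ss / df[name]
    table[name] = {"df": df[name], "SS": ss, "MS": ms, "F": ms / ms_resid}

for name, row in table.items():
    print(name, row)
print("Residual", {"df": df["Residual"], "SS": ss_resid, "MS": ms_resid})
```

Because each factor has df = 1, MS equals SS in the first three rows, and every F-ratio uses the same Residual MS as its denominator.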


 

Recall that to report your result you say, e.g., “F(1,16) = 7.126, p < .02”.


 

Do the analysis again, selecting only one independent variable, and compare the results.


 

Now do another ANOVA, but using my Excel file factorial.xls.  You can run the analysis on the file just as it is when you open it; no need to first save it as a StatView file.


 

3. A twist: “compact” data files
 

StatView has another kind of data file organization, called compact variables.  All the conditions are arranged into columns, one column per condition, and each row is a subject.  Look at my sample files compact.xls and compact.svd to see a compact variable version of what we saw before.  The 4 columns are the 4 conditions (2 levels of each of 2 factors).  Merely arranging the data like this does not by itself make it a compact variable; you have to define it as such in StatView.
 

To do the factorial ANOVA, select Analyze - ANOVA and t-tests - ANOVA or ANCOVA as before; however, only the compact variable name is seen to the right.  The compact variable itself is your dependent variable, so you can drag it into that box.  To see the factors that are compacted inside the compact variable, so that you can drag them into the independent variable box, click on the arrow next to the compact variable name over on the right.  This “opens up” your compact variable and now you can drag the factors.  Click OK and see the results just as before.


 

In other words, it doesn’t matter whether your file is in one format or the other to do a factorial ANOVA in StatView.  Regular format is intuitive in generalizing a 2-factor row x column arrangement, and can allow you to balance the numbers of rows and columns, so that it’s easier to see all the data at once.  But compact format, though hard to read if there are many conditions, makes your data structure very clear.
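Outside StatView, the relation between the two layouts can be sketched as a simple reshaping. The Python below is only an illustration: the factor names, level names, and numbers are hypothetical, and it says nothing about how StatView stores compact variables internally.

```python
# Hypothetical compact-style table: one column per condition,
# each column holding the values for that condition.
compact = {
    ("A1", "B1"): [3.1, 2.9, 3.4],
    ("A1", "B2"): [4.0, 4.2, 3.8],
    ("A2", "B1"): [2.5, 2.7, 2.6],
    ("A2", "B2"): [3.3, 3.6, 3.5],
}

# Regular layout: one row per observation, with a column for each factor
# and one column for the measured value.
regular = [
    {"factor_A": a, "factor_B": b, "value": value}
    for (a, b), values in compact.items()
    for value in values
]

for row in regular[:4]:
    print(row)
print(len(regular))  # 4 conditions x 3 values = 12 observations
```

The same 12 numbers appear in both layouts; only the bookkeeping of which factor levels they belong to has moved from the column headings into explicit factor columns.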


 

To review: structure of these kinds of files

          regular       compact
row       observation   subject
column    variable      condition


 

Regardless of the data organization, the subject is the experimental unit – each subject is taken to provide one piece of data, i.e. for one condition.


 

4. A Question: Can you do an ANOVA with only one or two subjects?


 

No, you need at least a few, probably several, possibly many, subjects.  (See section on Power for info on how many you need.)


 

The problem is that the observations from a single subject are not independent enough to be analyzed by ANOVA.  You need enough subjects for subjects to be the experimental unit (the basis of the comparisons in the test).  When an ANOVA is done with only one subject, as indeed is often seen in the literature, individual trials (tokens, repetitions) are used as the experimental unit, and as these are not independent, the degrees of freedom used will be too high, which will overestimate the significance of any differences (that is, it raises the risk of Alpha, or Type I, error).
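To see how much the degrees of freedom can be inflated, here is a small sketch for a design like the 2 x 2 example, where each unit falls in exactly one condition. The figure of 10 tokens per subject is a hypothetical number chosen for illustration.

```python
def residual_df(n_units, n_conditions):
    """Error df for a factorial ANOVA in which each unit is in one condition."""
    return n_units - n_conditions

n_conditions = 4           # e.g. a 2 x 2 design
n_subjects = 20
tokens_per_subject = 10    # hypothetical number of repetitions

# Subjects as the experimental unit.
print(residual_df(n_subjects, n_conditions))                        # 16
# Tokens wrongly treated as independent units.
print(residual_df(n_subjects * tokens_per_subject, n_conditions))   # 196
```

With the larger error df, a smaller F-ratio is enough to reach a given significance level, even though no new independent information has been added.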


 

Note that StatView takes each row to be an experimental unit. So if your data are set up in one long column, each row being a single token, then StatView will think you have that many subjects.  Remember, just because you code a variable as "Subject", doesn't mean that StatView in any way interprets and uses that variable differently from others!
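One way to get from a token-by-token log to one row per subject, assumed here rather than prescribed by anything above, is to average each subject's tokens before the file goes into the statistics program. A minimal Python sketch, with made-up subject labels and values:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical token-level log: (subject, condition, measurement).
tokens = [
    ("S1", "stressed", 62.0), ("S1", "stressed", 58.0),
    ("S2", "stressed", 70.0), ("S2", "stressed", 74.0),
    ("S3", "unstressed", 45.0), ("S3", "unstressed", 49.0),
    ("S4", "unstressed", 51.0), ("S4", "unstressed", 55.0),
]

# Collect all tokens that belong to the same subject and condition.
grouped = defaultdict(list)
for subject, condition, value in tokens:
    grouped[(subject, condition)].append(value)

# One row per subject: each row is now one experimental unit.
per_subject = [
    {"subject": s, "condition": c, "mean": mean(vals)}
    for (s, c), vals in sorted(grouped.items())
]

for row in per_subject:
    print(row)
```

The 8 token rows collapse to 4 rows, so a program that counts rows as experimental units would now count 4 subjects rather than 8.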