Mathematical Structures in Language (LING 218)
Topic: "Statistics and R"
Fall 2005
Marcus Kracht
 Time: Tuesdays/Thursdays 11:00-1:00
 Place: Bunche 3157
 Prerequisites: Ling 180/208 or equivalent
 Instructor: Marcus Kracht
 Work required: weekly assignments
Short Description of the Course
This course will be both an introduction to statistics and
a practical guide to using R for calculation and data
analysis. A syllabus is available (see below for a download).
(I count weeks from Tuesday to Tuesday, so we effectively have
only 9 weeks.)

Week 1:
Some mathematical background: binomial coefficients,
calculus. Assignment 1.

Week 2:
Probability spaces, conditional probabilities and
inverse probabilities.
Assignment 2.

Week 3:
Random variables, expectation, variance. Law of large
numbers. Assignment 3.

Week 4:
Statistics. Types of data, distributions and statistics.
Assignment 4.

Week 5:
Parameter estimation.
Assignment 5.

Week 6:
Hypothesis testing (t-test; chi-square test).
Power and significance of a test, and the p-value.
Assignment 6.

Weeks 7-8:
Linear regression.

Weeks 9-10:
To be decided.
Course Material
 Quantitative Methods in Linguistics by Keith Johnson. This
contains some notes on R, on statistics, and practical examples
for linguists.
 R for Beginners by Emmanuel Paradis (Université
Montpellier II). This is much more detailed than the built-in
R help.
 Download Syllabus on Statistics. It contains the mathematics;
I do not replicate the notes by Johnson. These notes are
mainly concerned with statistics, though I have begun to
add some notes on R. Note that the syllabus is still under
construction, that is, it is incomplete and may contain errors.
I shall do my best to eliminate them in time. You might
find it useful to consult the solutions to the assignments,
which also contain useful information on R. For some assignments
I have provided programs. You may download them and look at
them first to see what you have to do.
 Cherokee VOT (as found on Keith Johnson's
webpage).
 Material for the language classification. The book by
Keith Johnson only gives a link to the data, but the file
is not usable as it stands, so I redid the example. Download the
correlation table,
the names of the languages, and
the script that makes R produce the dendrogram. I
have not yet managed to get the language names into the plot;
I shall work on that.
 Assignments: