  • I am a "Full-Stack" Linguist now teaching in the University of Arizona's Human Language Technology program. What does...
The negation system of Yang Zhuang includes two standard negators and an aspectual negator, all of which occur before the verb; the negator meiz nearly always co-occurs with a clause-final particle nauq, which can also stand as a single-word negative response to a question. Although it is tempting to analyze nauq with a meaning beyond simply negation, this is difficult to do synchronically. Comparison with neighboring Tai languages suggests that this construction represents one stage in Jespersen's Cycle, whereby a negator is augmented with a second element, after which the second element becomes associated with negation; this element subsequently replaces the historical negator. A Jespersen's Cycle analysis also explains the occurrence of nauq as a preverbal negator in some neighboring Zhuang languages.
"This report presents the results of a dialect intelligibility survey carried out in 2008 in the southwestern part of the Guangxi Zhuang Autonomous Region of China, described by Zhang et al. (1999) as the Dejing sub-dialect area of Southern Zhuang. The Zhuang varieties surveyed in this area have been grouped into the Central branch of Taic languages. This survey of nineteen locations across the sub-dialect area found evidence from intelligibility, from similarity of wordlists (as determined by a string edit distance algorithm), and from speaker attitudes for assigning them to at least two distinct ISO 639-3 language codes, zyg “Yang Zhuang” and zgm “Minz Zhuang”.

Previously published sources claimed that the variety of Yang spoken in the county seat of Jingxi County is well understood across the region. This survey tested this claim with recorded text tests (RTTs) from the county seat and a nearby rural area, and found them to be well understood across most of the Dejing area. Comparison with the wordlist similarity results, however, shows that this intelligibility is inherent only for a subset of the surveyed varieties; these varieties represent a cluster that is herein referred to as Yang-Nong, and includes several Zhuang varieties from this area. The high intelligibility of Jingxi Yang to speakers of other varieties is due to acquired ability in Jingxi Yang. Initial intelligibility results indicate another cluster of mutually intelligible varieties in the area which we refer to as Min-Zong and which appears to correspond to the Minz of Yunnan Province; further fieldwork is needed to verify this in a more representative sample of these varieties.

Sociolinguistic questionnaires were administered in order to measure residents' attitudes toward these language varieties and factors relevant for language planning. The results indicate that Jingxi Yang is viewed very positively over a large area, though in particular areas local sociolinguistically prominent varieties of Zhuang are preferred.

Based on the data from wordlist similarity, inherent intelligibility, and sociolinguistic attitudes, a variety of Yang would work very well as a standard or basis for language development efforts among Yang-Nong communities, accounting for roughly two-thirds of the Zhuang in the Dejing sub-dialect area; we propose that documentation of the ISO 639-3 code zyg “Yang Zhuang”, previously taken to apply to most Zhuang varieties in the area, be amended to include only those Zhuang varieties that fall under Yang-Nong. Language development among Min-Zong communities in the area, accounting for another one-sixth of the area's Zhuang, would benefit from development based on a variety of Min or Zong; we propose that documentation of the ISO 639-3 code zgm “Minz Zhuang”, previously thought to apply only to a small group of speakers in Yunnan Province, be amended to also include Min-Zong communities in Guangxi. Language development efforts among most of the remaining one-sixth of the Zhuang population of the Dejing area would require other varieties as a basis, but for many of these varieties, Jingxi Yang could likely still be used as a means of widespread communication."
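The report above mentions wordlist similarity "as determined by a string edit distance algorithm" without naming the exact variant. As a rough illustration only, the sketch below uses plain Levenshtein distance normalized by the longer word's length; the function names, the normalization, and the placeholder transcriptions are all assumptions made for this example and are not taken from the survey.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-symbol
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length
    (an assumed normalization, chosen for simplicity)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def wordlist_similarity(list_a: Sequence[str], list_b: Sequence[str]) -> float:
    """One minus the mean normalized distance over gloss-aligned word pairs."""
    if len(list_a) != len(list_b) or not list_a:
        raise ValueError("wordlists must be non-empty and aligned item by item")
    distances = [normalized_distance(x, y) for x, y in zip(list_a, list_b)]
    return 1.0 - sum(distances) / len(distances)


# Hypothetical placeholder transcriptions, not data from the survey.
site_a = ["pa", "ma"]
site_b = ["pa", "na"]
print(wordlist_similarity(site_a, site_b))
```

A real comparison would operate on phonetic transcriptions segmented into sounds rather than raw characters, and might weight substitutions by phonetic similarity; this sketch treats every symbol difference as equally costly.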
The hypothesis that the meanings of words in natural language have structure has been debated among linguists for over three decades. This dissertation examines two resultative suffixes in Pima (Tepiman, Southern Uto-Aztecan), referred to as the passive resultative and the possessive resultative, whose properties are relevant for this debate. The interpretations which these two resultatives receive support one type of structure within the meanings of certain verbs.

The passive resultative suffix –s is canonically interpreted as a resultative proper; verbs with this suffix typically express the condition which results from an event of the type denoted by the unsuffixed verb. Certain verbs with this suffix, however, receive a derived stative interpretation, where the condition which they express need not be the result of any event at all. Other verbs with this suffix receive a perfect interpretation; their meaning is solely that an event of some type has occurred. Resultative-suffixed verbs with these interpretations all lack as an argument the agent which occurs as subject of the base verb. Where the base does not take an agent, however, the suffixed form receives one of three other interpretations and the argument structure of base and resultative appears identical. The possessive resultative suffix –kc, in contrast, has a more restricted distribution; verbs with this suffix receive either a resultative or derived stative interpretation, where the subject of the suffixed verb is responsible for maintaining this condition.

While several analyses of the Pima resultatives are considered here, the most economical analysis of the distribution of interpretations which Pima resultatives receive involves monotonically adding semantic components in order to build the meaning of both eventive verbs and resultatives. This analysis is presented within the framework of Distributed Morphology, where the semantic components of these verbs are associated with a number of abstract syntactic elements. Since these resultatives are temporally stative, an introductory chapter explores what temporal stativity is and what it indicates about a predicate; another introductory chapter discusses published analyses of resultatives in Chichewa and German, which show several quite different ways that a morphologically and semantically derived predicate may be given this property.
This is a report of a pilot study of the phonetic variation of vowels due to stress and syllable type in Pima, a dialect of O’odham. O’odham, along with several other Uto-Aztecan languages, has a five vowel system which appears unevenly distributed in two ways: its only front vowel is high, and it includes three high-back or high-mid vowels. This arrangement of canonical vowels appears not to reflect the influence of a drive for maximal dispersion of canonical vowels, something which has been argued to account for the frequency and type of vowel inventories cross-linguistically. Several properties of the allophonic variation observed in Pima, however, can be explained by appealing to just such a drive to maximize acoustic distinctness. Factors besides maximal distinctness must also be involved in controlling this distribution, however, as evidenced by the relative stability of this vowel system among Uto-Aztecan languages.
This thesis describes the behavior of a prefix which occurs on adjectives and verbs in the O'odham languages of Pima and Papago, Tepiman languages of Southern Uto-Aztecan, spoken in Arizona and Mexico.
This prefix, whose phonological form is s-, shows a moderate, but not perfect, correlation with stative lexical aspect. Roughly 80% of monomorphemic adjectives and numerous stative verbs in Pima are preferred with this prefix; this preference also applies to adverbials derived from these adjectives and verbs. Active verbs are almost universally ungrammatical with this prefix. When words which typically license the s- prefix are used in contexts involving a change of state—as inchoatives or causatives—the s- prefix is frequently ungrammatical. Certain derivational affixes which derive stative predicates also license the s-.
There are exceptions to almost all of these generalizations, however, making the s- appear less clearly inflectional and regular. In addition, there are properties displayed by the s- which cannot easily be associated with stativity: a suffix which derives adverbials from verbs licenses the s-, and intriguingly, the s- is disallowed on many stems which normally license it, within the scope of negation.
This paper models the behavior of the s- prefix within the non-Lexicalist framework of Distributed Morphology, treating the s- as the expression of a grammaticized semantic feature [+ stative]. Although not every behavior of the s- can be accounted for in this way, several analyses of the s- from a syntactic perspective are briefly discussed, and shown to be insufficient, as well.
Pima is a language of the Tepiman branch of Southern Uto-Aztecan, spoken in central Arizona. It is closely related to the language Tohono O’odham (also called Papago). Pima and Papago both include a peculiar prefix /s-/, which occurs on certain adjectives, verbs, adverbs, and nouns. Zepeda (1989:111), describing this morpheme in Papago, claims that it indicates stativity. Typical examples of the s- prefix are shown in (1); the s- prefix is in bold.
(1) a. S-hem-heegam    'a-n-t.
       st-2s:obj-envy  a-1s:sub-pfv
       ‘I am jealous of you.’
    b. S-hem-heegam-k        'o'odham  'o   kosh.
       st-2s:obj-envy-stage  man       imp  sleep
       ‘The jealous-of-you man is sleeping.’
    c. Microsoft  'o   si    s-heegm-am   ñuukud  heg  'e-'a'agidag.
                  imp  very  st-envy-adv  guard   det  ¬1:ref-red:secret
       ‘Microsoft guards its secrets jealously.’
This prefix occurs on such words whether they are used predicatively (1a), attributively (1b), or adverbially (1c), though it does not usually occur in nominalizations of such words. Surprisingly, it does occur on several nouns which are not nominalizations of verbs, it occurs on several activity (i.e., apparently non-stative) verbs when these verbs are used adverbially, and it can be licensed by several distinct derivational morphemes. It is frequently judged ungrammatical on inchoatives, causatives, and on negated elements. Although this s- prefix does often occur on verbs and adjectives which intuitively denote states, there is also a small set of apparently stative verbs and adjectives on which this prefix is ungrammatical. Moreover, although it is required or preferred in certain cases, it may often be optionally absent with no detectable change in meaning. For this reason, this prefix cannot be associated with stativity without question; in fact, it is extremely difficult to assign this prefix a clear meaning.
Despite these difficulties, certain facts about the structural location of this prefix may be deduced from several types of observations: its position as the leftmost verb prefix; its sensitivity to other verb derivations, such as causative and inchoative derivations; and its sensitivity to syntactic elements like negation. It is difficult to analyze the s- either as added in a morphological component independent of the syntax (e.g., by a lexical rule), or as added within the syntax. Based on the data to date, however, the best analysis may describe the s- as a functional element related to one of the semantic arguments required by these adjectives, verbs, adverbs, and nouns, regardless of their syntactic environment. For example, Kennedy (1997) has argued that gradable adjectives denote a measure function which requires both an individual argument and a time argument; the requirement of such a time argument may extend to verbs and non-gradable adjectives, as well. Within such an analysis, functional structure would be present in all syntactic contexts for words which require the s- prefix. An association of the s- prefix with temporal event structure should not be surprising if this prefix does indicate stativity.
This paper analyzes a two-part negator in Yang Zhuang [zyg], which is unique among this area's Taic varieties. Zheng (1996) describes the negation system of Yang, including two standard preverbal negators buj [pu³³] (tone C1) and meiz [mei³¹] (tone A2), and an aspectual negator zaengz [tsaŋ³¹] (tone A2), similar in meaning to Mandarin 没 méi. These negators occur preverbally, but the negator meiz nearly always co-occurs with a clause-final particle nauq [naːu³⁵] (tone B1). Local speaker intuitions vary as to the acceptability of nauq with the other preverbal negator buj and the aspectual negator zaengz; some consider it optional, while others find it unacceptable. Strikingly, nauq can also stand by itself as a single-word negative response to a question, like English no. Comparison with neighboring Taic varieties to the south, west, and north suggests that the Taic languages in this region represent different stages in Jespersen's Cycle, whereby a standard negator is first augmented with a second element (often a quantifier or other adverbial), then the second element becomes linked with the meaning of negation, and finally replaces the historical negator as the main clausal negator of the language. This suggests a typologically unsurprising path of development for this two-part negator, rather than requiring a special explanation for its development.
This talk presents the method used in a recent survey of the Dejing dialect area of Zhuang for distinguishing the influence of previous contact on estimates of dialect intelligibility. It shows the usefulness of comparing test participants' self-reporting of language exposure, results of comprehension testing, and phonetic similarity from word lists.
Pima (Tepiman, Southern Uto-Aztecan) has two derivational suffixes which attach to non-stative verbs to produce apparently stative verbs referring to states resulting from the events denoted by the base verbs. Saxton, Saxton, and Enos (1983) mention these suffixes without discussing them in detail; examples are shown in (1) (the suffixes in question are in bold).
(1) a. Heriberto  'o         'eesto-kc               heg         'e-gat
                  3.subject  hide.perfective-result  determiner  ¬1.reflexive-gun
       ‘Heriberto has/keeps his gun hidden.’
    b. Heriberto  'esh  'o         ge     hiviona-s.
                  jaw   3.subject  focus  shave-result
       ‘Heriberto’s jaw is shaved.’
Derived forms of this type, sometimes called resultatives, are not uncommon. Arguably similar morphemes can be found in languages as diverse as English, German, and Chichewa (a Bantu language), and possibly also San Lucas Quiaviní Zapotec and Chickasaw. The Pima suffixes, however, show unusual behavior regarding the entailment of the event bringing about the state referred to by the derived stative forms: when these suffixes attach to a morphologically simple verb, as in (1), the relevant resulting state generally may have held indefinitely far into the past; when these suffixes attach to a morphologically complex verb, such as one derived by the causative suffix -cud, the state must have come about through some past event (this event may be caused or spontaneous, so long as there is some definite instant at which the state came to hold).
Semantic analyses of the German state passive and English adjectival passive (Kratzer 2000) and the Chichewa stative (Dubinsky and Simango 1996) do not easily account for this variation concerning event entailments. Kratzer’s analysis, in the Davidsonian Event Semantics tradition, fails to capture the predictability in the absence of event entailments, and Dubinsky and Simango’s analysis, using a theory of Lexical Conceptual Structure like that of Pustejovsky (1998), predicts that forms derived by the same morpheme should either always have an event entailment or never have one, but should not vary as in Pima. In addition, neither analysis can yield derived forms that include an external argument, as the –kc suffix appears to do.
This talk will look in detail at data from Pima, will present a comparison of the Pima, English, German, and Chichewa morphemes, and will discuss the difficulties in extending the existing analyses to the Pima suffixes. Although both Pima morphemes appear quite similar (with a difference only in the valence of the resulting derived form), evidence may indicate that ‑kc results in a habitual form, while –s results in a purely resultative form. This resolves the difficulty regarding the external argument, but does not address the variability in event entailments; lack of an external argument may reflect a deeper property of result states.
This paper presents the results of a pilot study of the phonetic variation of vowels due to stress and syllable type in Pima, a dialect of O’odham. O’odham, along with several other Uto-Aztecan languages, has a vowel system which appears unevenly distributed in two ways: it lacks any non-high front vowel, and includes three non-low back vowels. This arrangement of canonical vowels appears unaffected by a drive for maximal dispersion of canonical vowels, a factor which has been argued to control the distribution of vowels cross-linguistically. Several properties of the observed allophonic variation in Pima, however, can be explained by appeal to just such a drive to maximize distinctness. Factors besides maximal distinctness must also be involved in controlling this distribution, however, as evidenced by the relative stability of this vowel system among Uto-Aztecan languages.
This talk presents a pilot study evaluating the concept of Vowel Space Density for use in dialectometry and early planning of language documentation and development efforts. The Vowel Space Density for a sample of connected speech can be calculated automatically from data that can be collected relatively quickly. Although initial results suggest that this may be a useful tool for characterizing variation within closely-related speech varieties, significant challenges remain to be addressed before it can be effectively and confidently applied to this problem, and different methods of data collection may be better for guiding documentation planning or language survey design.
This colloquium talk (in Chinese) addresses three questions: (1) Why should we make audio recordings as an integral part of linguistic fieldwork? (2) What should be recorded as part of linguistic fieldwork? (3) How should such audio recordings be made? While very basic in its scope, it aims to acquaint those who are just beginning their linguistic research with the important issues to consider when preparing to make audio recordings for linguistic analysis in a field setting. (This talk was given as a Prezi; the URL above links to the online version.)
Many linguists have used the three-letter language identifiers from the Ethnologue, and now from the ISO 639-3 standard, to distinguish the languages that have been their object of study. Establishing the referents of these codes for populous, well-developed languages in some cases seems linguistically trivial, and in other cases seems hopelessly separated from linguistic issues, linked instead to political controversy.

In contrast, for the many under-documented or under-developed languages that are represented by these language identifiers, the determination of what exactly constitutes the language that is represented by a given three-letter identifier can also seem the domain only of historical-comparative linguists, or can seem full of arbitrary questions – at what point does dialect variation become so large that it constitutes separate languages? In actuality, however, the kind of sociolinguistic research that informs this decision-making process is accessible even to very general linguists, and any linguist who is involved in field research on an under-documented or under-developed language can provide important help in improving the representation of languages by ISO language identifiers.

In this talk, I will take as an example the case of the Dejing Zhuang dialect area, in the southwestern corner of the Guangxi Zhuang Autonomous Region, China, in which several Central Taic language varieties can be found, and which was the focus of a dialect survey in 2008 carried out in partnership between SIL International and the Guangxi Minorities Language and Scripts Work Commission. I will step through the process of dialect survey by reviewing what was known about the language situation before this survey, the development of research questions to be answered, and the selection of research instruments that would most effectively answer those questions. I will then show how this information helps refine the representation of this area in terms of ISO language identifiers, benefiting members of these language communities through well-targeted language development efforts, and benefiting linguists through a more detailed understanding of historical changes within Taic languages.

While the determination of “what counts as a language” can involve political issues, as well as linguistic and social issues, it does not need to be a process that linguists are afraid to become involved in. I hope that through this talk, other linguists involved in fieldwork will be motivated to join in improving the representation in terms of ISO language identifiers for the languages and language communities that are the focus of their work.
This talk describes some of the less result-like interpretations associated with the Pima resultative suffix, and presents and evaluates a unified analysis for these interpretations.
As linguists have gained an improved understanding of the languages and linguistic situations around the world, they have become increasingly aware of the critical situation faced by many language communities, as pressure from more dominant languages in many cases is causing a shift away from the heritage languages of these communities. This impending language loss has moved many linguists to work urgently toward documenting these many languages while vibrant language communities still remain, or before these languages are lost completely.

Working on an under-developed or under-documented language in a field setting, however, is not something that all graduate programs in linguistics train their students to do. In this talk, I will begin by reviewing why linguistic fieldwork remains necessary in a modern setting: the “why” of fieldwork. I will then move on to some considerations of the “how” of fieldwork: mechanical issues of data collection and management, as well as practical issues of relating to minority language communities and some of the difficulties of working for an extended time in a cross-cultural setting, with some focus on fieldwork in rural areas of China.

This talk cannot substitute for a semester- or year-long course in field methods, or the learning from experience that actual fieldwork brings—experience both positive and negative. In this short time, however, I hope to show other linguists the importance of language documentation and linguistic fieldwork if they are not currently involved in such work, and provide references to other sources of information that will help them start such work on their own.
A number of proposals in lexical semantics have claimed that certain causative and inchoative events are related to states; these types of events correspond to the causation of a state, or the coming to hold of a state (Dowty 1979, Rappaport Hovav and Levin 1998, among others). The linguistic objects which correspond to such eventualities are typically given an analysis in which the event-denoting forms are derived from the state-denoting forms. This talk will present data on two suffixes of O’odham (Tepiman, Southern Uto-Aztecan) which appear to work in the opposite direction: the stems they attach to denote events, while the suffixed forms denote states; at least a subset of these appear to be states which entail a past event, and may therefore be compositionally derived. The effect of these suffixes resembles that of morphological alternations in a number of other languages, and the applicability to the O’odham data of the analyses for two such alternations, the German state passive (Kratzer 2000) and the Chichewa stative (Dubinsky and Simango 1996), will be discussed, focusing on the introduction of external arguments and the entailment of a past event. Little work has been done on this type of alternation within the framework of Distributed Morphology (Halle and Marantz 1993); the mechanisms available within this framework to analyze such alternations will also be discussed.
This course is an introduction to syntactic theory with an emphasis on data analysis, critical thinking, and theory development. It is taught within the generative Principles and Parameters approach to syntax.
This course aims to prepare students for careers in computational linguistics and natural language processing, where some basic software engineering skills are required. Students will design and implement a project using industry best practices, with a focus on designs that can scale up to meet the needs of a large number of potential users. We will discuss topics such as test-driven development, object-oriented programming, databases, web scraping, and RESTful APIs. There will also be a discussion of how to prepare for a technical job interview. By the end of the course, students will have a project that they can show to potential recruiters and a plan for preparing to get their first job.
This class serves as an introduction to human language technology (HLT), an emerging interdisciplinary field that encompasses most subdisciplines of linguistics, as well as computational linguistics, natural language processing, computer science, artificial intelligence, psychology, philosophy, mathematics, and statistics. Content includes a combination of theoretical and applied topics such as (but not limited to) tokenization across languages, n-grams, word representations, basic probability theory, introductory programming, and version control.
This intermediate-level course is a continuation of Ling 529 and covers the basics of information retrieval, focusing on both search and classification. This course will present students with the fundamentals of text search in the context of a simple boolean search. We'll then refine our methods for effective search (returning the *best* results) by exploring issues of similarity and weighting of terms. We'll finish the course by exploring document classification, comparing statistical methods and vector-space methods.
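As a rough illustration of the "simple boolean search" starting point, the sketch below builds an inverted index and answers AND/OR queries. It is not taken from the course materials: the function names (`build_index`, `boolean_and`, `boolean_or`) and the three toy documents are hypothetical, and tokenization is assumed to be plain whitespace splitting.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased token to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def boolean_and(index, terms):
    """Return ids of documents that contain every query term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()

def boolean_or(index, terms):
    """Return ids of documents that contain at least one query term."""
    result = set()
    for term in terms:
        result |= index.get(term.lower(), set())
    return result

# Hypothetical toy collection, used only to exercise the functions.
DOCS = {
    1: "tone sandhi in zhuang",
    2: "negation in zhuang",
    3: "negation in o'odham",
}
INDEX = build_index(DOCS)

print(boolean_and(INDEX, ["negation", "zhuang"]))
print(boolean_or(INDEX, ["tone", "o'odham"]))
```

A boolean query of this kind only says whether a document matches; it gives no ranking, which is the gap that similarity measures and term weighting are meant to fill.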
This advanced-level course introduces non-statistical concepts, tools, and methods for working with natural language in computational systems. This course complements the introductory statistical NLP course, Linguistics 539 (and is a prerequisite for 539). This course will introduce programming that is relevant to computational linguistics in three programming languages: Perl, Python and Prolog. This course will also introduce concepts and tools that are commonly used in symbolic computational linguistics: regular grammars, as represented by regular expressions, finite-state automata and finite-state transducers; and context-free grammars, as represented by Prolog definite clause grammars. We'll apply these tools to parsing a small range of realistic language data.
This course introduces the key concepts underlying statistical natural language processing. Students will learn a variety of techniques for the computational modeling of natural language, including: n-gram models, smoothing, Hidden Markov models, Bayesian Inference, Expectation Maximization, Viterbi algorithm, Inside-Outside algorithm for Probabilistic Context-Free Grammars, and higher-order language models.
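To give a flavor of the first two topics in that list, here is a minimal sketch of a bigram model with add-one (Laplace) smoothing. It is an illustration under stated assumptions rather than course material: the names `train_bigrams` and `prob`, the `<s>`/`</s>` boundary markers, and the two-sentence toy corpus are all hypothetical.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count bigrams and their left contexts over whitespace-tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}  # words that can be predicted; "<s>" is only ever a context
    for sentence in sentences:
        words = sentence.split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        contexts.update(tokens[:-1])             # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return contexts, bigrams, vocab

def prob(prev, word, contexts, bigrams, vocab):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

# Hypothetical toy corpus, used only to exercise the functions.
CORPUS = ["the dog barks", "the cat sleeps"]
CONTEXTS, BIGRAMS, VOCAB = train_bigrams(CORPUS)

print(prob("the", "dog", CONTEXTS, BIGRAMS, VOCAB))  # seen bigram
print(prob("dog", "cat", CONTEXTS, BIGRAMS, VOCAB))  # unseen bigram, still nonzero
```

Adding one to every count keeps unseen bigrams from receiving zero probability, at the cost of shifting a fair amount of probability mass away from observed events; other smoothing methods handle that trade-off more carefully.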
Linguistics 454/554 is a co-convened undergraduate and graduate course in phonology. It is the second in the ORSIL phonology series and covers the following topics: feature geometry; underspecification; lexical phonology; syllable theory; autosegmental phonology; metrical phonology; and Optimality Theory.
Fall 2015, MA-level introductory course at Yunnan Normal University
This course is a basic introduction to the domains of phonology and articulatory phonetics in modern linguistics at the graduate level. It assumes no previous knowledge of phonetics or phonology.
The negation system of Yang Zhuang includes two standard negators and an aspectual negator, all of which occur before the verb; the negator <meiz> nearly always co-occurs with a clause-final particle <nauq>, which can also stand as a single-word negative response to a question. Although it is tempting to analyze <nauq> with a meaning beyond simply negation, this is difficult to do synchronically. Comparison with neighboring Taic varieties suggests that this represents one stage in Jespersen's Cycle, whereby a negator is augmented with a second element, after which the second element becomes associated with negation; this element subsequently replaces the historical negator.
High infant mortality rates are one of the most heart-wrenching characteristics of developing nations. Targeting limited development funds to have the greatest impact on infant mortality is a desirable outcome. It has been observed for many years that higher literacy rates among females are correlated with lower child mortality rates (Pinto et al. 1985, Sandiford et al. 1995, Levine & Levine 2001, Shetty & Shetty 2014, among others). The World Development Indicators dataset provides multiple indicators of literacy and of child and infant mortality. Can this dataset help us target the right kinds of programs?
From a simple comparison of this data, literacy in young females (ages 15-24) correlates most strongly with child (under age 5) mortality. Surprisingly, literacy in general seems to be more strongly associated with reduced mortality in children after their first year of life. This may indicate that causes of death in the first year are less addressable (relative to years 1-4) solely by parental education.
This analysis is very preliminary; further work is needed to confirm these observations.