Event: Bootcamp RLING2017 - Statistics for Linguistics with R
Instructor: Prof. Stefan Th. Gries
Dates: 10 - 14 July 2017
Venue: Université catholique de Louvain, Louvain-la-Neuve, Belgium
Convenor: Magali Paquot
Registration: 13 March - 15 May (space is limited!)
UCL-ILC/PLIN members (special rate): until 13 March
General Information
Oriented towards both graduate students and seasoned researchers, this statistics bootcamp is a 30-hour hands-on course on statistical methods using the open-source software and programming language R. The course is intended for linguists who already have a basic knowledge of statistics and some experience using R (see prerequisites), and who wish to improve their proficiency in the statistical analysis of linguistic data.
The number of participants is limited to 20.
Program
The statistics bootcamp is a 30-hour hands-on introduction to statistical methods for both graduate students and seasoned researchers. Using the open-source software and programming language R, we will:
- briefly recap basic aspects of statistical evaluation as well as several descriptive statistics (about 1 day);
- discuss monofactorial statistical tests for frequencies, means, dispersions, and correlations; the emphasis will be on practicing these as well as on understanding how many of these approaches are in fact just special (limiting) cases of regression methods (about 1 day);
- explore different kinds of multifactorial and multivariate methods, in particular different kinds of regression approaches as well as hierarchical cluster analysis. Specifically, we will discuss linear and binary logistic regression in detail to (i) understand what exactly regression coefficients and summary statistics mean and (ii) visualize their results; in addition, we will discuss selected cases of Poisson and/or multinomial regression (about 1-1.5 days);
- cover mixed-effects/multilevel modeling in R, including crossed random effects (discussed in much linguistic literature by now) and nested random effects (hardly discussed in linguistics so far); time and interest permitting, we will also discuss contrast settings as well as regression approaches that can model curvature (about 1 day);
- spend the remainder of the workshop on exploratory methods, mostly hierarchical cluster analysis and follow-up evaluation statistics.
For all statistical methods to be explored, we will discuss how to test their assumptions and how to visualize their results with clear, annotated statistical graphs; in some cases, we will reanalyze published data from corpus-linguistic studies. Participants will also receive small functions they can use for their own statistical applications. Time permitting, there will also be a short section on how to write small statistical/visualization functions yourself.
The content of the bootcamp is based on: Gries, S. Th. (2013). Statistics for Linguistics with R. Berlin: Mouton de Gruyter. The first four chapters of the book are required reading for participation in the bootcamp (see prerequisites).