Jyväskylä, Finland

Longitudinal Data Analysis

when 17 August 2016 - 19 August 2016
language English
duration 1 week
credits 2 EC

We first present linear mixed models for continuous hierarchical data. The focus lies on the modeler’s perspective and on applications. Emphasis will be on model formulation, parameter estimation, and hypothesis testing, as well as on the distinction between the random-effects (hierarchical) model and the implied marginal model. Second, models for non-Gaussian data will be discussed, with a strong emphasis on generalized estimating equations (GEE) and the generalized linear mixed model (GLMM). To introduce this theme, a brief review of the classical generalized linear modeling framework will be presented. Similarities and differences with the continuous case will be discussed. The differences between marginal models, such as GEE, and random-effects models, such as the GLMM, will be explained in detail. Third, when analyzing hierarchical and longitudinal data, one is often confronted with missing observations, i.e., scheduled measurements that have not been made, for a variety of (known or unknown) reasons. It will be shown that, if no appropriate measures are taken, missing data can seriously jeopardize results and that interpretation difficulties are bound to occur. Methods to properly analyze incomplete data, under flexible assumptions, are presented. All developments will be illustrated with worked examples using the SAS System. Hands-on SAS exercises will supplement the lectures.
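
As a brief illustration of the distinction between the hierarchical and the implied marginal model mentioned above, a linear mixed model for the measurements of subject i can be sketched as follows. The notation (Y_i, X_i, Z_i, beta, b_i, D, Sigma_i) is an assumption made for this sketch and is not taken from the course material.

```latex
% Hierarchical (random-effects) formulation for subject i,
% with b_i and \varepsilon_i assumed independent
Y_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad
b_i \sim N(0, D), \qquad \varepsilon_i \sim N(0, \Sigma_i)

% Implied marginal model, obtained by integrating out the random effects b_i
Y_i \sim N\!\left( X_i \beta,\; Z_i D Z_i^{\top} + \Sigma_i \right)
```

The hierarchical formulation implies the marginal one, but a given marginal model does not determine a unique hierarchical model.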


Theoretical sessions:
• Session 1: Linear mixed models, model formulation, parameter interpretation, hierarchical versus marginal model interpretation, estimation and inference, empirical Bayes

• Session 2: Model families for discrete outcomes, marginal models, generalized estimating equations (GEE)

• Session 3: Generalized linear mixed models, estimation methods (Laplace, MQL, PQL, quadrature), comparison with GEE

• Session 4: Missing data mechanisms, problems with nonrandom dropout (e.g., bias and loss of efficiency), modeling frameworks to handle dropout (selection, pattern-mixture, and shared-parameter models), sensitivity analyses

Practical sessions:
• Session A: Linear mixed models and generalized estimating equations
• Session B: Generalized linear mixed models
• Session C: Missing data

Learning outcomes and instructional methods: As a result of the course, participants should be able to perform a basic analysis for a particular longitudinal data set at hand, using linear, generalized linear, and non-linear tools for longitudinal data. Based on a selection of exploratory tools, the nature of the data, and the research questions to be answered in the analyses, they should be able to construct an appropriate statistical model, to fit the model within the SAS framework, and to interpret the obtained results. Further, participants should be aware not only of the possibilities and strengths of a particular selected approach, but also of its drawbacks in comparison to other methods. The course is explanatory rather than mathematically rigorous. Emphasis is on giving sufficient detail in order for participants to have a general overview of frequently used and novel approaches, with their advantages and disadvantages, while giving reference to other sources where more detailed information is available. Also, it will be explained how the different approaches can be implemented in statistical software, and how the resulting outputs should be interpreted.

The use of SAS in the lectures will be in generic terms, so that users of alternative platforms will equally benefit from the course and the implementation strategies discussed therein. The practical sessions will use self-contained tutorial material, so that it is of benefit both during the course and in the participants’ own time.

Course leader

Lecturer: Prof. Geert Molenberghs (KU Leuven and Hasselt University, Belgium)

Coordinator: Dr. Sara Taskinen (University of Jyväskylä)

Target group

Throughout the course, it is assumed that the participants are familiar with basic statistical modelling, including linear models (regression and analysis of variance) as well as generalized linear models (logistic and Poisson regression). Prerequisite knowledge should also include general estimation and testing theory (maximum likelihood, likelihood ratio).

The Summer School annually offers courses for advanced master’s students, graduate students, and post-docs in the various fields of science and information technology.

Course aim

The most important aims of the Summer School are to develop post-graduates’ scientific readiness, to offer students the possibility to study in a modern, scientific environment, and to create connections to the international science community. The Summer School offers an excellent pathway to develop international collaboration in post-graduate research.

Credits info

2 EC
Grading: Pass/fail

Passing: A data-analytic assignment will be given. Students prepare a report and send an electronic version of it to the course instructor after the course, with an agreed deadline.

Fee info

EUR 0: Participation in the Summer School is free of charge, but students have to cover the costs of their own travel, accommodation, and meals in Jyväskylä.


The 26th Jyväskylä Summer School is not able to grant financial support to any Summer School students.