Multi-Level Modeling

Most standard statistical models assume that observations are independent of one another. However, individual health can be “clustered” through the influence of shared contexts, or “contagious” through the transmission of ideas or pathogens, and either pattern violates that assumption. This non-independence may be of direct interest, or it may be merely a nuisance that makes our standard errors incorrect (typically too small).
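To make the standard-error problem concrete, here is a minimal simulation sketch using only the Python standard library. Every number in it is an illustrative assumption rather than something taken from a real study: 20 groups of 25 people, and a shared group effect and an individual deviation that each have a standard deviation of 1. It compares the standard error reported by a formula that assumes independence with the actual variability of the sample mean across repeated clustered samples.

```python
import random
import statistics


def simulate_clustered_mean(rng, n_groups=20, group_size=25,
                            sd_group=1.0, sd_indiv=1.0):
    """Draw one clustered sample; return (sample mean, naive standard error)."""
    values = []
    for _ in range(n_groups):
        # Context shared by everyone in the group (e.g., a neighborhood effect).
        shared = rng.gauss(0.0, sd_group)
        for _ in range(group_size):
            # Individual outcome = shared context + individual deviation.
            values.append(shared + rng.gauss(0.0, sd_indiv))
    # Naive SE: treats all observations as if they were independent.
    naive_se = statistics.stdev(values) / len(values) ** 0.5
    return statistics.fmean(values), naive_se


rng = random.Random(2024)  # fixed seed so the run is reproducible
results = [simulate_clustered_mean(rng) for _ in range(500)]
means = [mean for mean, _ in results]
naive_ses = [se for _, se in results]

# Spread of the sample mean across replicates: the SE we actually face.
empirical_se = statistics.stdev(means)
# Average SE an independence-assuming analysis would report.
average_naive_se = statistics.fmean(naive_ses)

print(f"empirical SE of the mean: {empirical_se:.3f}")
print(f"average naive SE:         {average_naive_se:.3f}")
print(f"ratio:                    {empirical_se / average_naive_se:.2f}")
```

Under these assumed settings, the naive standard error should come out well below the empirical one, because people in the same group contribute partly redundant information.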

The non-independence among real people can often be ignored in sparse samples. A study of 100 randomly sampled adults in the U.S. is unlikely to include more than one person from any given family, neighborhood, workplace, or clinic. Such a sample includes so small a fraction of the total population that any individual’s close contacts are unlikely to be sampled as well. Whatever generalized linear model we apply to this sample, the residuals are likely to be approximately independent.
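The arithmetic behind this intuition can be sketched briefly. Assume simple random sampling without replacement and a population divided into equal-sized groups. Both are simplifying assumptions, and the population and group sizes below are hypothetical round numbers, not census figures. Under those assumptions, any two sampled people share a group with probability (m - 1) / (N - 1), where m is the group size and N is the population size, so the expected number of same-group pairs in a sample of n people is C(n, 2) times that probability.

```python
from math import comb


def expected_shared_pairs(sample_size, population_size, group_size):
    """Expected number of sampled pairs who belong to the same group.

    Assumes simple random sampling without replacement and a population
    partitioned into groups of equal size (both simplifications).
    """
    # Chance that a second sampled person falls in the first person's group.
    p_same_group = (group_size - 1) / (population_size - 1)
    # Multiply by the number of distinct pairs in the sample.
    return comb(sample_size, 2) * p_same_group


# Hypothetical round numbers chosen only for illustration.
sparse = expected_shared_pairs(sample_size=100,
                               population_size=250_000_000,
                               group_size=5_000)
dense = expected_shared_pairs(sample_size=1_000_000,
                              population_size=8_000_000,
                              group_size=5_000)

print(f"sparse sample, expected same-group pairs: {sparse:.4f}")
print(f"dense sample, expected same-group pairs:  {dense:,.0f}")
```

With these assumed inputs, the sparse sample is expected to contain well under one same-group pair, while a sample covering a large share of a city contains a very large number of them.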

On the other hand, dense samples and those that incorporate groups into the recruitment strategy warrant special attention. A dense sample that includes a large segment of the population (e.g., a sample of 1 million New Yorkers) is much more likely to include people who share health-relevant contexts or who influence each other’s behaviors and exposures. Identifying groups within the population (e.g., adults who live in the same neighborhood) may help to address this non-independence. A complex sample (e.g., a sample of 100 children in each of New York’s public schools) may incorporate group identification directly into the recruitment strategy. Even after individual- and group-level predictors are entered into a model to predict individual health, the residuals may remain correlated among individuals in the same group. Mixed models (also known as random effects models or multilevel models) are an attractive option for working with clustered data.
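As a sketch of what such a model looks like, the simplest case is a two-level random-intercept model for a continuous outcome. The notation here is an assumption introduced for illustration and does not come from the readings below: i indexes individuals, j indexes groups, x is an individual-level predictor, and z is a group-level predictor.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 x_{ij} + \gamma_1 z_j + u_j + \varepsilon_{ij}, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2), \\
\rho &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}
\end{aligned}
```

The group-level term u_j is shared by everyone in group j, which is what allows residuals within a group to be correlated after the predictors are accounted for. The intraclass correlation ρ is the share of residual variance attributable to groups: when it is near zero, a single-level model may be adequate, and as it grows, ignoring the grouping becomes more consequential.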







A brief conceptual tutorial on multilevel analysis in social epidemiology

Multilevel analysis in public health research

When can group level clustering be ignored? Multilevel models versus single-level models

The (mis)estimation of neighborhood effects: causal inference for a practicable social epidemiology

Estimating intraclass correlation for binary data

A glossary for multilevel analysis

Comparing GEE and Robust Standard Errors for Conditionally Dependent Data

To GEE or not to GEE: comparing population average and mixed models

Modeling neighborhood effects: the futility of comparing mixed and marginal approaches


Using SAS PROC MIXED to Fit Multilevel Models, Hierarchical Models, and Individual Growth Models

Some applications of generalized linear latent and mixed models in epidemiology

Growth Modeling Using Random Coefficient Models: Model Building, Testing, and Illustrations


Neighborhood influences on the association between maternal age and birthweight

Effects of neighbourhood SES and convenience store concentration on individual level smoking

Long-term antipsychotic treatment and brain volumes: a longitudinal study of schizophrenia

The changing distribution and determinants of obesity in the neighborhoods of New York City