Using classification and regression trees to model missingness in youth BMI, height and body mass data

Introduction: Research suggests that youth body mass index (BMI) data derived from self-reported measures often have a high degree of missingness, which may substantially affect research findings. The first step in handling missing data is to examine the levels and patterns of missingness. However, previous studies examining youth BMI missingness used logistic regression, which is limited in its ability to discern subgroups or identify a hierarchy of importance for variables, both of which may help in understanding missing data patterns.

Methods: This study used sex-stratified classification and regression tree (CART) models to examine missingness in height, body mass and BMI data among 74 501 youth participating in the 2018/19 COMPASS study (a prospective cohort study examining health behaviours among Canadian youth), where 31% of BMI data were missing. Diet, movement, academic, mental health and substance use variables were examined for associations with missingness in height, body mass and BMI.
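The core idea behind a sex-stratified CART analysis of missingness can be illustrated with a minimal sketch. The code below is not the study's implementation: the toy records, the binary predictor names (`younger`, `perceives_overweight`) and the helper functions are hypothetical, chosen only to show how a missingness indicator is built and how the first split of a classification tree is selected by Gini impurity decrease within each sex stratum.

```python
from collections import defaultdict


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, predictors, target):
    """Return (predictor, impurity_decrease) for the best binary split."""
    parent = gini([r[target] for r in rows])
    best = (None, 0.0)
    for name in predictors:
        left = [r[target] for r in rows if r[name] == 0]
        right = [r[target] for r in rows if r[name] == 1]
        if not left or not right:
            continue  # this predictor does not partition the rows
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        gain = parent - weighted
        if gain > best[1]:
            best = (name, gain)
    return best


# Hypothetical toy records (not COMPASS data); bmi=None marks a missing value.
records = [
    {"sex": "F", "younger": 1, "perceives_overweight": 1, "bmi": None},
    {"sex": "F", "younger": 1, "perceives_overweight": 0, "bmi": None},
    {"sex": "F", "younger": 0, "perceives_overweight": 1, "bmi": 21.5},
    {"sex": "F", "younger": 0, "perceives_overweight": 0, "bmi": 19.8},
    {"sex": "M", "younger": 1, "perceives_overweight": 1, "bmi": None},
    {"sex": "M", "younger": 0, "perceives_overweight": 1, "bmi": None},
    {"sex": "M", "younger": 1, "perceives_overweight": 0, "bmi": 22.0},
    {"sex": "M", "younger": 0, "perceives_overweight": 0, "bmi": 23.4},
]

# Step 1: derive the outcome, a 0/1 indicator of BMI missingness.
for r in records:
    r["bmi_missing"] = 1 if r["bmi"] is None else 0

# Step 2: stratify by sex so that each stratum gets its own tree.
strata = defaultdict(list)
for r in records:
    strata[r["sex"]].append(r)

# Step 3: pick the first (root) split in each stratum.
predictors = ["younger", "perceives_overweight"]
first_splits = {
    sex: best_split(rows, predictors, "bmi_missing")
    for sex, rows in strata.items()
}
```

A full CART model would apply this split search recursively to each resulting subgroup and then prune the tree; the predictor chosen at the root is the one highest in the hierarchy of importance, which is the property that logistic regression does not expose directly.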

Results: CART models indicated that the combination of being younger, having a self-perception of being overweight, being less physically active and having poorer mental health yielded female and male subgroups highly likely to be missing BMI values. Survey respondents who did not perceive themselves as overweight and who were older were unlikely to be missing BMI values.

Lead Researchers

  1. Jean-Philippe Chaput

    Senior Scientist, CHEO Research Institute
