PsyCourse

2022-04-25

053: Investigating the phenotypic and genetic predictors of study attrition in participants of mental health studies

Research Question and Aims

There is increasing evidence that individuals with certain characteristics, such as lower socioeconomic and educational status, poorer health, non-European ancestry, or higher polygenic scores for schizophrenia, neuroticism, and attention-deficit/hyperactivity disorder, are less likely to participate in studies and more likely to drop out during follow-up. This is concerning because non-participation and attrition not only reduce statistical power but, if non-random, can also reduce sample representativeness. This in turn can bias research findings and limit their generalizability and real-life utility, affecting health policy decision making and, ultimately, the equity of health care provision. In this study, we aim to identify individual patients who are at high risk of study attrition, as well as the predictors (including sociodemographic, disease-related, and biological factors) that contribute to this risk. Findings could contribute to a better understanding of important confounders in statistical analyses and offer guidance on possible intervention points for optimizing study recruitment.

Analytic Plan

Advanced machine learning classification models will be used to identify study participants at high risk of attrition based on baseline sociodemographic, clinical (both state and trait markers), and genetic (polygenic score, PGS) information. Separate models will additionally include longitudinal information from the follow-up study visits. Model generalizability will be assessed within a nested cross-validation framework. Furthermore, additional study cohorts (FOR2017, PRONIA, Exercise-II, and RESIS; separate analysis proposals will be submitted) will be used for external validation. We expect classification accuracy to be significantly above chance level and to increase further when longitudinal information is included. The most informative predictors will be explored using interpretable machine learning methods such as SHapley Additive exPlanations (SHAP). Model performance will be compared with that of a traditional logistic regression model.
Analyses will be performed on the CORE Cluster of the LMU Klinikum.
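The nested cross-validation comparison described above could look roughly like the following minimal sketch, assuming scikit-learn. It is illustrative only: the data are synthetic (no PsyCourse variables are used), the random forest is a hypothetical stand-in for the unspecified "advanced" classifier, the hyperparameter grids are arbitrary, and balanced accuracy is an assumed metric, chosen because participants who drop out are usually the minority class. The helper name `nested_cv_scores` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def nested_cv_scores(model, param_grid, X, y, outer_splits=5, inner_splits=3, seed=0):
    """Hypothetical helper: outer-fold balanced accuracies from nested CV.

    The inner loop (GridSearchCV) tunes hyperparameters using only the
    training part of each outer fold; the outer loop then scores the tuned
    model on held-out data that played no role in tuning.
    """
    inner_cv = StratifiedKFold(n_splits=inner_splits, shuffle=True, random_state=seed)
    outer_cv = StratifiedKFold(n_splits=outer_splits, shuffle=True, random_state=seed + 1)
    tuned = GridSearchCV(model, param_grid, cv=inner_cv, scoring="balanced_accuracy")
    return cross_val_score(tuned, X, y, cv=outer_cv, scoring="balanced_accuracy")


# Synthetic stand-in for baseline predictors (sociodemographic, clinical, PGS).
# y == 1 marks a simulated participant lost to follow-up (~30% of cases).
X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           weights=[0.7, 0.3], random_state=42)

# Traditional baseline: standardized, L2-penalized logistic regression with tuned C.
logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
logreg_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

# Assumed example of a more flexible classifier (not specified in the proposal).
forest = RandomForestClassifier(random_state=0)
forest_grid = {"max_depth": [3, None], "min_samples_leaf": [1, 5]}

lr_scores = nested_cv_scores(logreg, logreg_grid, X, y)
rf_scores = nested_cv_scores(forest, forest_grid, X, y)
print(f"Logistic regression: {lr_scores.mean():.3f} +/- {lr_scores.std():.3f}")
print(f"Random forest:       {rf_scores.mean():.3f} +/- {rf_scores.std():.3f}")
```

Keeping hyperparameter tuning inside each outer training fold is what prevents the outer-fold estimates from being optimistically biased. The same helper could in principle be reused for the models that add longitudinal follow-up information, so that all variants are evaluated under identical splits.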

Resources needed

Full PsyCourse phenotypic dataset (for both patients and neurotypical control participants) from all time points, and GWAS data.