2019-09-27
021_ Providing a use case of longitudinal data for the manuscript "Phenendo: a tool for clustering of cross-sectional and longitudinal phenotype data"
Research Question and Aims
As mentioned above, the aim of this proposal is not to answer a scientific question but to provide a use case that demonstrates the ability of the toolbox PhenEndo to cluster longitudinal data. For this purpose, the use case should include more than one content-related group of variables, and at least one of these variable groups should contain different data types (mixed data).
Schizophrenia is a heterogeneous illness with regard to symptoms and level of functioning. Many, but not all, patients with schizophrenia experience not only episodes of positive symptoms, such as delusions and hallucinations, but also negative and/or depressive symptoms. Negative symptoms in particular are hard to treat and often have a strong impact on patients' quality of life and level of functioning. Therefore, we were interested in which clusters of patients emerge from the PsyCourse data with regard to level of functioning and negative as well as depressive symptoms.
Also, the severity of symptoms can either be rated by an external rater (e.g. a clinician or an interviewer in a study) or be assessed via patient self-report. Thus, we wanted to explore whether clusters of patients show comparable patterns over time between interviewer ratings of negative and depressive symptoms and self-reported depressive symptoms.
We selected three groups of variables for dimension reduction: psychosocial functioning; negative and depressive symptoms rated by an interviewer; and depressive symptoms rated on a self-report form by study participants.
Analytic Plan
I) Sample selection
We selected a subsample of study participants with a DSM-IV diagnosis of schizophrenia and complete data at all four study visits (n = 76 from dataset version PsyCourse3.0 in long format).
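The selection rule above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used: `v1_id` and `visit` are variable names from this proposal, whereas the diagnosis column `dsm_dx`, its coding, and the reading of "complete data" as "no missing value at a visit" are assumptions made for the example.

```python
from collections import defaultdict

N_VISITS = 4  # the proposal requires data at all four study visits


def select_complete_cases(rows, diagnosis="Schizophrenia", n_visits=N_VISITS):
    """Return sorted IDs with the given diagnosis and complete rows at every visit."""
    visits_seen = defaultdict(set)
    for row in rows:
        if row["dsm_dx"] != diagnosis:  # "dsm_dx" is a hypothetical column name
            continue
        if any(value is None for value in row.values()):
            continue  # a visit with any missing value does not count as complete
        visits_seen[row["v1_id"]].add(row["visit"])
    required = set(range(1, n_visits + 1))
    return sorted(pid for pid, seen in visits_seen.items() if seen == required)


# Hypothetical long-format records: one dict per participant and visit.
records = []
for visit in range(1, 5):
    records.append({"v1_id": "A", "visit": visit, "dsm_dx": "Schizophrenia", "gaf": 55})
    # Participant B has a missing GAF value at visit 3.
    records.append({"v1_id": "B", "visit": visit, "dsm_dx": "Schizophrenia",
                    "gaf": None if visit == 3 else 60})
    # Participant C has a different diagnosis.
    records.append({"v1_id": "C", "visit": visit, "dsm_dx": "Bipolar-I Disorder", "gaf": 70})
    # Participant D attended only the first three visits.
    if visit < 4:
        records.append({"v1_id": "D", "visit": visit, "dsm_dx": "Schizophrenia", "gaf": 45})

selected = select_complete_cases(records)
print(selected)
```

In this toy data only participant A satisfies both criteria.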
We selected three groups of variables for dimension reduction: functioning; negative and depressive symptoms rated by an interviewer; and depressive symptoms rated on a self-report form by study participants.
Group 1: Functioning (mixed data)
- Global Assessment of Functioning Score GAF ("gaf", continuous)
- Current employment status ("curr_paid_empl", ordinal)
- Current relationship status ("partner", categorical)
Group 2: Negative and depressive symptoms (rating by interviewer)
- Inventory of Depressive Symptomatology IDS-C30, 30 items ("idsc_", ordinal)
- The Negative Scale (= 7 items) from the Positive and Negative Syndrome Scale PANSS ("panss_n", ordinal)
Group 3: Depressive symptoms (self-report)
- Beck Depression Inventory II BDI-II, 21 items ("bdi2_", ordinal)
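As a rough illustration of how the three variable groups could be assembled from column names, the following Python sketch uses the prefixes listed above. The function name `assign_groups` and the group labels are hypothetical. Excluding columns that end in `_sum` is an assumption, made so that sum scores reserved for the descriptive step are not mixed into the item-level groups.

```python
FUNCTIONING = ("gaf", "curr_paid_empl", "partner")
GROUP_PREFIXES = {
    "interviewer_rated": ("idsc_", "panss_n"),
    "self_report": ("bdi2_",),
}


def assign_groups(columns):
    """Sort column names into the three variable groups; ignore everything else."""
    groups = {"functioning": [], "interviewer_rated": [], "self_report": []}
    for col in columns:
        if col in FUNCTIONING:
            groups["functioning"].append(col)
        elif col.endswith("_sum"):
            continue  # sum scores are descriptive variables, not clustering items
        else:
            for group, prefixes in GROUP_PREFIXES.items():
                if col.startswith(prefixes):
                    groups[group].append(col)
                    break
    return groups


# A small excerpt of column names taken from this proposal.
columns = ["v1_id", "visit", "gaf", "partner", "curr_paid_empl",
           "idsc_itm1", "idsc_11_12", "idsc_sum",
           "panss_n1", "panss_sum_neg", "bdi2_itm1", "bdi2_sum"]
groups = assign_groups(columns)
print(groups)
```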
II) Clustering pipeline
Data are uploaded into the toolbox PhenEndo. In a first step, a dimension reduction method, factor analysis of mixed data (FAMD), is applied to each variable group. Next, the clustering algorithm flexmix is used to identify clusters of participants with similar trajectories across all variable groups.
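To make the dimension reduction step more concrete, the sketch below shows the encoding that FAMD conventionally applies before running a principal component analysis: continuous variables are standardized, and each category of a categorical variable becomes an indicator column that is divided by the square root of the category proportion and then centred. This is a minimal illustration in plain Python, not PhenEndo's implementation. The function name `famd_encode` is hypothetical, and how the toolbox treats ordinal items is not specified here. The subsequent PCA and the flexmix clustering step are not shown.

```python
import math


def famd_encode(continuous, categorical):
    """Build the weighted, centred matrix on which FAMD would run a PCA.

    continuous:  dict mapping variable name -> list of numbers
    categorical: dict mapping variable name -> list of category labels
    Returns (column_names, rows), one row per observation.
    Assumes no variable is constant (a zero standard deviation is not handled).
    """
    n = len(next(iter({**continuous, **categorical}.values())))
    names, columns = [], []

    # Continuous variables: centre and scale to unit variance.
    for name, values in continuous.items():
        mean = sum(values) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
        names.append(name)
        columns.append([(v - mean) / sd for v in values])

    # Categorical variables: indicator / sqrt(p), centred, i.e. (indicator - p) / sqrt(p).
    for name, labels in categorical.items():
        for level in sorted(set(labels)):
            p = labels.count(level) / n
            names.append(f"{name}={level}")
            columns.append([((1.0 if lab == level else 0.0) - p) / math.sqrt(p)
                            for lab in labels])

    rows = [list(r) for r in zip(*columns)]
    return names, rows


# Toy data for four observations; the values are invented for illustration.
names, rows = famd_encode(
    continuous={"gaf": [40, 50, 60, 70]},
    categorical={"partner": ["yes", "no", "no", "yes"]},
)
print(names)
for row in rows:
    print([round(x, 3) for x in row])
```

With this scaling, every continuous variable has unit variance and the indicator columns of one categorical variable jointly carry a comparable weight, which is what allows both data types to enter the same PCA.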
III) Descriptive characterization of clusters
The identified clusters are compared regarding age, sex, functioning, current work status, current relationship status, depression sum scores, negative symptoms sum scores and an overall rating of course of illness.
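A descriptive comparison of this kind can be sketched as follows. The variable names `v1_id`, `sex`, `ageBL` and `bdi2_sum` come from this proposal; the `cluster` label, the helper `summarise_clusters` and all values are hypothetical and serve only to show the shape of the summary.

```python
from collections import Counter, defaultdict
from statistics import mean


def summarise_clusters(rows, numeric=("ageBL", "bdi2_sum"), categorical=("sex",)):
    """Return per-cluster size, means of numeric variables and category counts."""
    by_cluster = defaultdict(list)
    for row in rows:
        by_cluster[row["cluster"]].append(row)

    summary = {}
    for cluster, members in sorted(by_cluster.items()):
        stats = {"n": len(members)}
        for var in numeric:
            stats[f"mean_{var}"] = mean(m[var] for m in members)
        for var in categorical:
            stats[f"count_{var}"] = dict(Counter(m[var] for m in members))
        summary[cluster] = stats
    return summary


# Hypothetical participants with an assumed cluster assignment.
participants = [
    {"v1_id": "A", "cluster": 1, "sex": "F", "ageBL": 30, "bdi2_sum": 10},
    {"v1_id": "B", "cluster": 1, "sex": "M", "ageBL": 40, "bdi2_sum": 20},
    {"v1_id": "C", "cluster": 2, "sex": "F", "ageBL": 50, "bdi2_sum": 5},
]
summary = summarise_clusters(participants)
for cluster, stats in summary.items():
    print(cluster, stats)
```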
Resources needed
The analyses will be performed in a subset of the PsyCourse sample (version PsyCourse 3.0, long format). Only participants with a DSM-IV diagnosis of schizophrenia and data at all four study visits will be included (n = 76).
The phenotypic variables for analysis will be as follows:
Variables for longitudinal clustering
curr_paid_empl
partner
gaf
idsc_itm1
idsc_itm10
idsc_itm15
idsc_itm16
idsc_itm17
idsc_itm18
idsc_itm19
idsc_itm2
idsc_itm20
idsc_itm21
idsc_itm22
idsc_itm23
idsc_itm24
idsc_itm25
idsc_itm26
idsc_itm27
idsc_itm28
idsc_itm29
idsc_itm3
idsc_itm30
idsc_itm4
idsc_itm5
idsc_itm6
idsc_itm7
idsc_itm8
idsc_itm9
idsc_11_12
idsc_13_14
panss_n1
panss_n2
panss_n3
panss_n4
panss_n5
panss_n6
panss_n7
bdi2_itm1
bdi2_itm10
bdi2_itm11
bdi2_itm12
bdi2_itm13
bdi2_itm14
bdi2_itm15
bdi2_itm16
bdi2_itm17
bdi2_itm18
bdi2_itm19
bdi2_itm2
bdi2_itm20
bdi2_itm21
bdi2_itm3
bdi2_itm4
bdi2_itm5
bdi2_itm6
bdi2_itm7
bdi2_itm8
bdi2_itm9
Variables for descriptive characterization of clusters
v1_id
visit
sex
ageBL
idsc_sum
panss_sum_neg
bdi2_sum
v4_opcrit
Of note, these data will NOT be published in raw form. Only the results will be published, and illustrations of these results will be contained in the documentation of the toolbox. Individual pseudonyms will not be published.
No biological data are needed.