2022-02-04
049_ Replication study: Kernel Machine Regression Analysis of Networks considering the longitudinal course of phenotypes
Research Question and Aims
Pathway analysis tests a set of genes, or a pathway, jointly for association with a phenotype of interest. We perform pathway analysis by applying kernel machine regression (KMR) with a specific kernel, the network kernel (Freytag et al., 2013), and, for comparison, the common linear kernel (for more details on KMR and kernels, see below). KMR was first developed for cross-sectional studies, but we and others have extended it to longitudinal data. We implemented the longitudinal KMR as a new R package, "KMRLoPa", which we aim to investigate on simulated longitudinal data as well as on PsyCourse data. For the latter, we will build on our previous investigation of the longitudinal course of executive functions (EFs) (Wendel et al., 2021), in which we identified an LD block of nine SNPs associated with the change over time in set-shifting. The pathway analysis using the FUMA pipeline (MAGMA), which relies on GWAS summary statistics, was unsuccessful. Here we will investigate whether we gain more information for pathway analysis by directly using all available measurement points and the raw genotype data to test whether specific gene sets/pathways are associated with longitudinal executive performance.
Analytic Plan
First, we explain the longitudinal kernel machine regression (KMR) and the network kernel. Then we present our analysis plan for studying the longitudinal KMR, including the simulation studies and the application.
A kernel machine regression is a semi-parametric regression comprising a covariate matrix of fixed effects (e.g. age, gender, principal components) and a non-parametric function that integrates the genetic data (Schaid, 2010). The latter can be interpreted as random effects because KMR estimation is equivalent to that of linear mixed models (LMMs). We use this equivalence to extend the KMR to longitudinal data by adding further random effects to the regression that account for the dependence of phenotype measurements at different time points. The non-parametric function that integrates the genetic data is most often unknown or computationally expensive. Therefore, instead of computing a highly complex function, we calculate a kernel matrix, i.e. a matrix of similarity assessments. For each pair of individuals, this kernel matrix contains a scalar that describes how similar the two individuals are with regard to their genotypes (SNPs); with N individuals, the kernel matrix is N x N. Thus, we transform high-dimensional genotype data into similarity values (scalars, low-dimensional data). The computation of the similarity values is very flexible: the kernel matrix only needs to be symmetric and positive semi-definite. We use two different kernels: the commonly applied linear kernel, in which the genotype matrix is multiplied by its transpose, and the network kernel (Freytag et al., 2013). The network kernel, developed by our group, is more complex because it includes, in addition to the genotype data of the individuals, information on the pathway analyzed. This information is obtained from pathway databases; here we use the Reactome database (https://reactome.org/). It is included in the form of two specific matrices, the annotation matrix (assigning SNPs to genes) and the adjacency matrix. The latter indicates whether two genes of the pathway interact with each other (entry = 1) or not (entry = 0).
For the network kernel, we multiply the genotype, annotation and adjacency matrices to obtain the final kernel matrix, which is then tested for association with the phenotype by applying a variance-component test.
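The two kernels described above can be illustrated with a minimal sketch. It assumes that the network kernel takes the product form Z A N A^T Z^T, where Z is the genotype matrix, A the annotation matrix and N the gene network matrix; the function names and the toy matrices are hypothetical and are not taken from the KMRLoPa package. In practice the network matrix has to be chosen or adjusted so that the resulting kernel is positive semi-definite; the toy matrix used here already satisfies this.

```python
# Minimal sketch of the linear and network kernels on toy data.
# Assumption: network kernel K = Z A N A^T Z^T; all names are illustrative.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def linear_kernel(z):
    """Linear kernel K = Z Z^T for an N x p genotype matrix Z."""
    return matmul(z, transpose(z))

def network_kernel(z, a, n):
    """Network kernel K = Z A N A^T Z^T.

    z: N x p genotype matrix (minor allele counts 0/1/2)
    a: p x g annotation matrix (entry 1 if the SNP belongs to the gene)
    n: g x g symmetric gene network matrix
    """
    za = matmul(z, a)  # N x g: genotypes aggregated per gene
    return matmul(matmul(za, n), transpose(za))

# Toy example: 3 individuals, 4 SNPs, 2 interacting genes.
Z = [[0, 1, 2, 0],
     [1, 1, 0, 2],
     [2, 0, 1, 1]]
A = [[1, 0],   # SNPs 1 and 2 are annotated to gene 1
     [1, 0],
     [0, 1],   # SNPs 3 and 4 are annotated to gene 2
     [0, 1]]
N = [[1, 1],   # the two genes interact; diagonal set to 1 (assumption)
     [1, 1]]

K_lin = linear_kernel(Z)
K_net = network_kernel(Z, A, N)
```

Both results are symmetric N x N matrices (here 3 x 3) with one similarity value per pair of individuals, which is the form the variance-component test operates on.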
We will study the properties (type-1 error, power) of the longitudinal KMR with the network kernel and the linear kernel via simulations. As an exemplary pathway, we selected the "Signaling by ERBB4" pathway from the Reactome database, as this pathway is highly connected and has key genes. We use the original topology as well as artificially modified versions to investigate, for example, less connected pathways. We expect the network kernel to gain power relative to the linear kernel as connectivity increases.
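The two ingredients of such a simulation study can be sketched as follows. Random removal of edges is only one possible way to create a less connected topology, and the plan above does not prescribe a particular scheme. The function names `thin_network`, `density` and `rejection_rate` are hypothetical, and the p-values in the example are made up for illustration.

```python
import random

def thin_network(adj, keep_fraction, seed=0):
    """Return a copy of a symmetric 0/1 adjacency matrix that keeps only a
    random subset of its off-diagonal edges (the diagonal is left unchanged)."""
    g = len(adj)
    edges = [(i, j) for i in range(g) for j in range(i + 1, g)
             if adj[i][j] == 1]
    rng = random.Random(seed)
    n_keep = round(keep_fraction * len(edges))
    kept = rng.sample(edges, n_keep)
    thinned = [[adj[i][j] if i == j else 0 for j in range(g)]
               for i in range(g)]
    for i, j in kept:
        thinned[i][j] = thinned[j][i] = 1
    return thinned

def density(adj):
    """Fraction of possible gene-gene edges that are present."""
    g = len(adj)
    n_edges = sum(adj[i][j] for i in range(g) for j in range(i + 1, g))
    return n_edges / (g * (g - 1) / 2)

def rejection_rate(p_values, alpha=0.05):
    """Share of simulation replicates with p < alpha: the empirical type-1
    error under the null hypothesis, the empirical power under an alternative."""
    return sum(p < alpha for p in p_values) / len(p_values)

# Toy example: a fully connected network of 4 genes, thinned to half its edges.
full = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
sparse = thin_network(full, keep_fraction=0.5, seed=1)

# Made-up p-values from four hypothetical simulation replicates.
rate = rejection_rate([0.01, 0.20, 0.04, 0.60])
```

Running the same simulation on `full` and `sparse` and comparing the rejection rates of both kernels would show how the performance of each kernel depends on connectivity.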
Furthermore, we will perform a pathway analysis to test different pathways for association with core executive functions (EFs). Because our previous GWAS (Wendel et al., 2021) yielded promising results for the Trail-Making-Test, Part B (TMT-B) in PsyCourse, we will focus on this phenotype. Only if these results are not informative enough for discussing our new approach will we also consider the Verbal Digit Span Backwards (VDS-B). We selected roughly 20 pathways from the Reactome database based on the following keywords: serotonin, dopamine, GABA, glutamate, NMDA receptor, prefrontal cortex, synapse, plasticity and voltage-gated potassium channels.
We will use the latest version of the PsyCourse sample and genotype data. We include all genotyped patients and healthy control individuals in whom the TMT-B phenotype was assessed at least once. Our model is similar to the LMM used in the GWAS (Wendel et al., 2021): we use log TMT-B or VDS-B as the outcome and add random intercepts and slopes to model the subject-specific time courses. As before, we expect to include age, sex, time, DSM-IV diagnosis and the top five ancestry principal components as fixed effects. The SNPs enter the model through the kernel matrix, computed with the network kernel (Freytag et al., 2013) and, for comparison, the linear kernel.
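A model of this kind can be written compactly as shown below. The notation is an assumption made for illustration and is not taken from the KMRLoPa package: y_ij is the outcome (e.g. log TMT-B) of individual i at visit j, x_ij the vector of fixed-effect covariates, t_ij the time of the visit, z_i the genotypes of individual i in the pathway, K the N x N kernel matrix (linear or network), and D the covariance matrix of the random intercept and slope.

```latex
\begin{aligned}
y_{ij} &= x_{ij}^{\top}\beta + h(z_i) + b_{0i} + b_{1i}\, t_{ij} + \varepsilon_{ij}, \\
\bigl(h(z_1), \dots, h(z_N)\bigr)^{\top} &\sim \mathcal{N}(0, \tau K), \qquad
(b_{0i}, b_{1i})^{\top} \sim \mathcal{N}(0, D), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2}), \\
H_0 &: \tau = 0 .
\end{aligned}
```

Under this formulation, the variance-component test assesses whether the variance tau of the genetic random effect is zero, i.e. whether the pathway is unrelated to the outcome once the fixed effects and the subject-specific time courses are accounted for.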
Resources needed
Recruitment data:
Participant identity column v1/v2/v3/v4_id
Clinical/Control Status v1_stat
Date of interview v1/v2/v3/v4_interv_date
Recruitment center v1_center
Demographic information:
Sex v1_sex
Age (at first interview) v1/v2/v3/v4_age
Marital status v1/v2/v3/v4_martial_stat
Relationship status v1/v2/v3/v4_partner
Children v1_no_bio_chld
v1_no_adpt_chld
v1_stp_chld
Siblings v1_brothers
v1_sisters
v1_hfl_brthrs
v1_hlf_sstrs
v1_stp_brthrs
v1_stp_sstrs
Living alone v1/v2/v3_liv_aln
Education v1_ed_status
Employment v1_curr_paid_empl
Psychiatric history:
Current psychiatric treatment v1/v2/v3/v4_cur_psy_trm
Times treated as day- or inpatient v1_cat_daypat_outpat_trm
Medication:
Clinical participants v1/v2/v3/v4_Antidepressants
v1/v2/v3/v4_Antipsychotics
v1/v2/v3/v4_Mood_stabilizers
v1/v2/v3/v4_Tranquilizers
v1/v2/v3/v4_Other_psychiatric
Control participants v1/v2/v3/v4_Antidepressants
v1/v2/v3/v4_Antipsychotics
v1/v2/v3/v4_Mood_stabilizers
v1/v2/v3/v4_Tranquilizers
v1/v2/v3/v4_Other_psychiatric
Family history of psychiatric illness v1_fam_hist
Substance abuse:
Tobacco v1/v2/v3/v4_no_cog
Alcohol v1/v2/v3/v4_lftm_alc_dep
Illicit drugs v1/v2/v3/v4_evr_ill_drg
DSM-IV Diagnosis: schizophrenia (295.1/.2/.3/.6/.9)
schizophreniform disorder (295.4)
brief psychotic disorder (298.8)
schizoaffective disorder (295.7)
bipolar disorder (296.X [bipolar disorders incl. manic episode]) v1_scid_dsm_dx, v1_scid_dsm_dx_cat
Symptom rating scales:
PANSS Positive sum score v1/v2/v3/v4_panss_sum_pos
PANSS Negative sum score v1/v2/v3/v4_panss_sum_neg
PANSS Total score v1/v2/v3/v4_panss_sum_tot
IDS-C30 Total score v1/v2/v3/v4_idsc_sum
YMRS v1/v2/v3/v4_ymrs_sum
CGI v1/v2/v3/v4_cgi_s
GAF v1/v2/v3/v4_gaf
Neuropsychology (cognitive tests):
Trail-Making-Test v1/v2/v3/v4_nrpsy_TMT_A_rt
v1/v2/v3/v4_nrpsy_TMT_A_err
v1/v2/v3/v4_nrpsy_TMT_B_rt
v1/v2/v3/v4_nrpsy_TMT_B_err
Verbal Digit span v1/v2/v3/v4_nrpsy_dgt_sp_frw
v1/v2/v3/v4_nrpsy_dgt_sp_bck
GSA Chip analysis IDs gsa_id
Imputed GSA Chip analysis IDs gsa_imp_id
Imputed data:
Trail-Making-Test nrpsy_TMT_A_rt
nrpsy_TMT_A_err
nrpsy_TMT_B_rt
nrpsy_TMT_B_err (long format)
Verbal Digit span nrpsy_dgt_sp_frw
nrpsy_dgt_sp_back (long format)
Genotype Data:
GSA Chip data
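The visit-prefixed variables requested above (v1 to v4) have to be brought into one row per individual and visit before a longitudinal model can be fitted. The following is a minimal sketch of that step. It assumes that the shorthand "v1/v2/v3/v4_nrpsy_TMT_B_rt" expands to the four names v1_nrpsy_TMT_B_rt to v4_nrpsy_TMT_B_rt, and likewise for age; the function name `wide_to_long` and the example record are hypothetical.

```python
import math

def wide_to_long(record, visits=(1, 2, 3, 4)):
    """Turn one participant's wide record into one row per visit at which
    TMT-B was assessed, with the completion time on the log scale."""
    rows = []
    for v in visits:
        rt = record.get(f"v{v}_nrpsy_TMT_B_rt")
        if rt is None:  # visit missing or TMT-B not assessed
            continue
        rows.append({
            "id": record["v1_id"],
            "visit": v,
            "age": record.get(f"v{v}_age"),
            "log_tmt_b": math.log(rt),
        })
    return rows

# Made-up example record: TMT-B assessed at visits 1 and 3 only.
example = {
    "v1_id": "P001",
    "v1_age": 40, "v1_nrpsy_TMT_B_rt": 80.0,
    "v2_age": 40, "v2_nrpsy_TMT_B_rt": None,
    "v3_age": 41, "v3_nrpsy_TMT_B_rt": 72.0,
}
long_rows = wide_to_long(example)
```

Participants for whom the function returns no rows have no TMT-B assessment and would not enter the analysis, in line with the inclusion criterion of at least one assessment.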