PsyCourse

2023-06-10

065_ Outlier analyses in whole blood transcriptomes to identify rare genetic variants in individuals with schizophrenia (Amendment to 041)

Research Question and Aims

Even in common complex genetic diseases such as schizophrenia, common genetic variants explain only part of the heritability. It is therefore possible that the genetic architecture of schizophrenia involves not only common variants (PMID 31835028; PMID 29483656; PMID 29056061), but also, in a proportion of cases, rare variants with stronger effects on the phenotype. There is a known inverse relationship between the frequency of a genetic variant in the general population and its potential effect on the phenotype (PMID 19812666), such that rarer variants tend to have stronger effects. Some such rare variants are already known and well characterized in the context of schizophrenia (PMID 28650482; PMID 25821909; PMID 25132547), but it is likely that many more rare variants with a strong effect on the schizophrenia phenotype have not yet been recognized. This project aims to identify such variants. Specifically, it will use a recently developed bioinformatics method (transcriptomic outlier analysis) to identify functionally relevant rare genetic variants in a dataset of 550 whole transcriptome analyses (WTAs) from the blood of schizophrenia patients belonging to the PsyCourse cohort.
The goal of this research project is to identify rare genetic variants that play a role in the development of schizophrenia. To this end, outlier analyses in transcriptome sequencing data will be used to identify individuals in whom rare genetic variants with a strong functional effect are present.

Analytic Plan

We will include those individuals with schizophrenia from the PsyCourse cohort for whom whole blood transcriptome data are available. Whole transcriptome analyses (WTAs) were performed for 550 patients with schizophrenia as part of a separate research project unrelated to the one described here (Lexogen 3’RNA Sequencing, Illumina Platform, Next Generation Sequencing (NGS) Competence Center Bonn, Rheinische Friedrich-Wilhelms-Universität Bonn; bioinformatic processing by the Institute for Systems Biology and Bioinformatics, University of Rostock).
The existing WTA data from the blood of schizophrenia patients in the PsyCourse study will be analyzed for expression and splicing outliers. For this purpose, after a quality control step, the BAM files will be processed with the DROP workflow (PMID 33462443; PMID 33483494). Among other things, this workflow uses an autoencoder to control for latent variables in the data set and to identify statistical outliers. In most case-control comparisons such outliers are excluded from the analysis; here they are its focus.
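The idea of calling outliers after controlling for latent variables can be illustrated with a minimal Python sketch. It is not the DROP workflow: a truncated SVD is swapped in for the autoencoder, outliers are called with simple z-scores, and the function name, the number of latent factors, the cutoff and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def call_expression_outliers(log_expr, n_latent=1, z_cutoff=3.0):
    """Flag (sample, gene) pairs whose expression deviates after removing latent factors.

    log_expr: samples x genes array of log-transformed expression values.
    n_latent and z_cutoff are illustrative assumptions, not values from the proposal.
    """
    # Center each gene across samples.
    centered = log_expr - log_expr.mean(axis=0)
    # Truncated SVD as a simplified stand-in for the autoencoder used by DROP.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    latent = (u[:, :n_latent] * s[:n_latent]) @ vt[:n_latent]
    residual = centered - latent
    # Per-gene z-scores of the residuals; genes without variance yield NaN.
    sd = residual.std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, np.nan)
    z = residual / sd
    hits = [(int(i), int(j)) for i, j in np.argwhere(np.abs(z) >= z_cutoff)]
    return z, hits


# Simulated data (hypothetical): 60 samples, 40 genes, one batch-like latent factor.
rng = np.random.default_rng(0)
n_samples, n_genes = 60, 40
batch = np.arange(n_samples) % 2
loadings = rng.normal(0, 3, n_genes)
expr = rng.normal(0, 1, (n_samples, n_genes)) + np.outer(batch, loadings)
expr[5, 7] += 12.0  # inject one aberrant expression event in sample 5, gene 7

z, hits = call_expression_outliers(expr, n_latent=1)
```

In this toy setting the injected event at sample 5, gene 7 should appear in `hits`, possibly alongside a few chance calls, because no multiple-testing correction is applied. A real analysis would work on count data with an appropriate count model rather than on Gaussian z-scores.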
Genetic data are needed to interpret the detected outliers with aberrant gene expression or aberrant RNA splicing, since both common and rare regulatory variants in non-coding regions of the genome can often be responsible for such changes. To this end, we will use the available genome-wide genotyping data and, if they become available, whole genome sequencing (WGS) data (funding applied for but not yet received). Where possible, already available proteomics and lipidomics data will also be used to corroborate findings from the transcriptomic outlier analyses.
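One way to connect outliers to genetic data is to list, for each outlier sample and gene, the rare variants that the sample carries in or near that gene. The following Python sketch shows this step under stated assumptions: the data classes, the frequency threshold of 1% and the 10 kb flanking window are hypothetical choices for illustration and are not specified in this proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    sample_id: str
    chrom: str
    pos: int
    allele_freq: float  # frequency in a reference population (assumed available)


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int
    end: int


def candidate_variants(outliers, genes, variants, max_af=0.01, window=10_000):
    """Map each (sample_id, gene_name) outlier to rare variants in or near the gene.

    max_af and window are illustrative assumptions.
    """
    by_sample = {}
    for v in variants:
        by_sample.setdefault(v.sample_id, []).append(v)

    result = {}
    for sample_id, gene_name in outliers:
        gene = genes[gene_name]
        result[(sample_id, gene_name)] = [
            v
            for v in by_sample.get(sample_id, [])
            if v.chrom == gene.chrom
            and gene.start - window <= v.pos <= gene.end + window
            and v.allele_freq < max_af
        ]
    return result


# Hypothetical toy data; identifiers and coordinates are invented.
genes = {"GENE_A": Gene("GENE_A", "1", 100_000, 120_000)}
variants = [
    Variant("S1", "1", 95_000, 0.001),   # rare, within the flanking window
    Variant("S1", "1", 110_000, 0.20),   # inside the gene but common
    Variant("S1", "2", 110_000, 0.001),  # rare but on another chromosome
    Variant("S2", "1", 110_000, 0.001),  # rare but carried by another sample
]
found = candidate_variants([("S1", "GENE_A")], genes, variants)
```

Only the first variant remains for sample S1 in this toy example. With genotyping-array data, rare variants are covered only in part, which is one reason WGS data would be the preferred input for this step.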
All bioinformatics analyses will be performed in R or Python as applicable.

Resources needed

v1_age
v1_yob
v1_sex
v1_center
v1_cur_psy_trm
v1_age_1st_out_trm
v1_age_1st_inpat_trm
v1_dur_illness
raw medication data sets (v1_med_clin_orig)
v1_fam_hist
v1_scid_dsm_dx
v1_scid_dsm_dx_cat
v1_med_pst_wk
v1_age_m_birth
v1_age_f_birth
v1_ed_status
v1_trms_daypat_outpat_trm
v1_panss_sum_pos
v1_panss_sum_neg
v1_panss_sum_gen
v1_panss_sum_tot
v1_cgi_s
v1_gaf
v1_lexo_id

v3_age
v3_sex
v3_center
v3_cur_psy_trm
raw medication data sets (v3_med_clin_orig)
v3_med_pst_wk
v3_ed_status
v3_trms_daypat_outpat_trm
v3_panss_sum_pos
v3_panss_sum_neg
v3_panss_sum_gen
v3_panss_sum_tot
v3_cgi_s
v3_gaf
v3_lexo_id

gwas_id
v1_prot_id
v3_prot_id
v1_ab_prof_id
v3_ab_prof_id

Genetic data:
Raw genotypes
Imputed genotypes

Transcriptomic data:
FASTQ and BAM files of Lexogen 3’ RNA Seq

Proteomic data:
QC-ed intensities and protein levels

Lipidomic data:
Lipid intensities