GEMMA WHOLE GENOME SEQUENCING DATA PROCESSING
TOOL TYPE
Data analysis, Script
TARGET USERS
Science & research
LEAD PARTNER
Tampere University, CNR-ITB
COMPLETENESS
90%
(delivery DEC 2025)
We developed a computational pipeline, with accompanying scripts, for aligning paired-end Illumina sequencing data and for subsequent variant calling in the GEMMA project.
- Alignment: Reads are aligned to the hg38 reference genome, duplicate reads are marked, and alignment statistics are collected.
- Variant calling: Short variants are called and samples are jointly genotyped. The resulting multi-sample VCF enables downstream analyses such as principal component analysis (PCA) and polygenic risk scores (PRS). Variants are annotated and statistics are collected. Quality control checks sample sex and relatedness to detect possible sample swaps. De novo variants are detected, and inheritance patterns are inferred between siblings who are concordant or discordant for autism.
- CNV calling: Copy number variants (CNVs) are called and merged into a multi-sample VCF. Downstream analysis includes evaluating the distribution of CNVs between cases and controls, as well as pathway analysis.
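As an illustration of the sex-based part of the quality control described above, the following is a minimal sketch, not the GEMMA implementation. It assumes a hypothetical tab-separated table with one row per sample holding the reported sex and a sex label inferred from the sequencing data; the file layout, the column names, the function name `flag_sex_mismatches` and the toy rows are all assumptions made for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the ID of every sample whose reported sex
# differs from the sex inferred from the data (a possible sample swap).
# Input: tab-separated file with a header line and the columns
#        sample <TAB> reported_sex <TAB> inferred_sex
flag_sex_mismatches() {
    awk -F'\t' 'NR > 1 && $2 != $3 { print $1 }' "$1"
}

# Toy input, invented purely for illustration (not GEMMA data).
tmp=$(mktemp)
printf 'sample\treported_sex\tinferred_sex\n' >  "$tmp"
printf 'S1\tF\tF\n'                           >> "$tmp"
printf 'S2\tM\tF\n'                           >> "$tmp"
printf 'S3\tM\tM\n'                           >> "$tmp"

flag_sex_mismatches "$tmp"   # prints: S2
rm -f "$tmp"
```

A flagged sample is only a candidate for a swap; in practice it would be cross-checked against the relatedness results before any sample is excluded.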
We expect to release the full pipeline and its results together with the main GEMMA paper.
INSTRUCTIONS
The scripts are shell scripts intended to be run in a Linux terminal. Data will be released together with the main GEMMA publication.
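Until the scripts are released, the general shape of a shell-based alignment step can only be sketched. The example below is a hypothetical outline, not the GEMMA scripts: the choice of tools (BWA-MEM, samtools, GATK MarkDuplicates) is an assumption, since the tools used are not named here, and the sample ID, file names, thread count and the `run` helper are placeholders. By default it only prints the commands (dry run), so it can be tried without any of the tools installed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders; override them from the environment if needed.
SAMPLE="${SAMPLE:-sample01}"
REF="${REF:-hg38.fa}"          # assumed path to an indexed hg38 FASTA
THREADS="${THREADS:-8}"
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print commands, 0 = execute them

# Print a command line, and execute it only when DRY_RUN=0.
run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then
        bash -c "$*"
    fi
}

# 1. Align paired-end reads to the reference and sort by coordinate.
run "bwa mem -t $THREADS -R '@RG\tID:$SAMPLE\tSM:$SAMPLE\tPL:ILLUMINA' $REF ${SAMPLE}_R1.fastq.gz ${SAMPLE}_R2.fastq.gz | samtools sort -@ $THREADS -o $SAMPLE.sorted.bam -"

# 2. Mark duplicate reads.
run "gatk MarkDuplicates -I $SAMPLE.sorted.bam -O $SAMPLE.markdup.bam -M $SAMPLE.markdup_metrics.txt"

# 3. Index the BAM and collect basic alignment statistics.
run "samtools index $SAMPLE.markdup.bam"
run "samtools flagstat $SAMPLE.markdup.bam > $SAMPLE.flagstat.txt"
```

Saved under a name such as `align_sketch.sh` (a hypothetical name), it could be invoked as `SAMPLE=mysample bash align_sketch.sh` to preview the commands, and with `DRY_RUN=0` to execute them once the tools and input files are in place.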
Additional instructions: