
Gemma - Multi-Omics Toolbox

In this project, we will implement a documented multi-omics toolbox (software package) for analyzing the multi-omics GEMMA data. The toolbox includes pipeline modules for each omics layer (genomics, epigenomics, metagenomics, metabolomics, and immune profiling), comprehensive quality control (QC) routines, and integrative analysis methods (mid-integration strategies such as MOFA or mixOmics, with optional late-integration reporting).

The toolbox enables reproducible preprocessing and integrative analysis of multi-omics data in local computing environments (HPC or institutional servers). Because all processing runs locally, sensitive data remain secure and never leave the local infrastructure, while workflows stay reproducible and can scale to large datasets.

The toolbox provides both the analyses needed to address the scientific aims and clear step-by-step documentation with example datasets, so that colleagues can reproduce workflows independently.

Table of Contents

Advantages of the Toolbox Approach

Risk Mitigation

Potential concerns and our solutions:

Gemma Toolbox as a Platform

A solution based on the Gemma toolbox is preferable for sensitive biomedical data because it prioritizes reproducibility, scalability, and data protection. By combining pipelines, workflow managers, and comprehensive documentation, the toolbox constitutes a sufficient and efficient multi-omics platform for the proposed research.

Integration Strategies within the Toolbox

The toolbox will support multiple integration strategies depending on the research question and data availability. Late integration is implemented by processing each omics layer independently (e.g., identifying significant variants, detecting differentially methylated regions, profiling microbial differences) and combining the results at the interpretation stage through pathway enrichment, network analyses, or meta-analyses. This approach is straightforward, robust to technical differences across datasets, and provides biologically interpretable outcomes.
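As one illustration of combining per-layer results at the interpretation stage, the sketch below merges p-values that several omics layers report for the same gene or pathway using Fisher's method. This is only an example of a meta-analysis step, not the toolbox's documented procedure. The function name is hypothetical, and the method assumes the per-layer tests are independent, which may not hold for omics layers measured on the same samples.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For even degrees
    of freedom the survival function has a closed form, so no external
    statistics library is needed.
    """
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # equals X / 2

    # P(chi2_{2k} > X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical per-layer p-values for one pathway (illustrative numbers only).
layer_pvalues = {"genomics": 0.04, "methylation": 0.10, "microbiome": 0.03}
combined = fisher_combined_pvalue(list(layer_pvalues.values()))
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the formula.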

Another option is mid integration, in which the different omics layers are fed into a joint statistical or machine learning model (e.g., MOFA, mixOmics) that identifies shared latent factors across data types while preserving their specific structures. This approach is more powerful for uncovering cross-omics interactions and mechanistic links.
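To make the idea of a shared latent factor concrete, the following sketch extracts the first principal component of the concatenated, block-scaled omics matrices by power iteration. It is a deliberately simplified stand-in and not MOFA or mixOmics, which use richer models (for example per-layer noise terms and sparsity). All function names and the toy data are assumptions made for illustration.

```python
import math
import random


def standardize_columns(matrix):
    """Center each feature and scale it to unit variance (constant features become zeros)."""
    n = len(matrix)
    out_cols = []
    for col in zip(*matrix):
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        sd = math.sqrt(var) if var > 0 else 1.0
        out_cols.append([(x - mean) / sd for x in col])
    return [list(row) for row in zip(*out_cols)]


def first_joint_factor(layers, n_iter=200, seed=0):
    """Return (sample scores, per-layer loadings) of the first joint component.

    layers maps a layer name to a samples x features list of lists; all
    layers must list the same samples in the same order.
    """
    names = list(layers)
    blocks = {}
    for name in names:
        z = standardize_columns(layers[name])
        weight = 1.0 / math.sqrt(len(z[0]))  # give every layer equal total variance
        blocks[name] = [[x * weight for x in row] for row in z]

    n_samples = len(blocks[names[0]])
    joint = [sum((blocks[name][i] for name in names), []) for i in range(n_samples)]
    n_features = len(joint[0])

    # Random start vector; a fixed seed keeps the sketch reproducible.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
    for _ in range(n_iter):
        scores = [sum(x * w for x, w in zip(row, v)) for row in joint]
        new_v = [sum(joint[i][j] * scores[i] for i in range(n_samples))
                 for j in range(n_features)]
        norm = math.sqrt(sum(x * x for x in new_v))
        if norm == 0:
            break
        v = [x / norm for x in new_v]

    scores = [sum(x * w for x, w in zip(row, v)) for row in joint]
    loadings, start = {}, 0
    for name in names:
        width = len(blocks[name][0])
        loadings[name] = v[start:start + width]
        start += width
    return scores, loadings


# Toy data: 4 samples, two hypothetical layers driven by one shared gradient.
toy_layers = {
    "methylation": [[1, 2], [2, 4], [3, 6], [4, 8]],
    "microbiome": [[10], [20], [30], [40]],
}
scores, loadings = first_joint_factor(toy_layers)
```

The sign of a latent factor is arbitrary, so only the relative ordering of samples and the relative size of loadings should be interpreted.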

With both late and mid integration modules, the toolbox ensures flexibility: researchers can choose simpler workflows when appropriate, but also apply state-of-the-art integrative modeling for comprehensive analyses.

We will use a mid-integration strategy to integrate microbiome, methylation, and genome data, followed by a late-integration strategy to associate metabolomics, immunoprofiling, and proteome data with the other omics measurements.

Available Tools

| TOOL TYPE | TITLE | LEAD PARTNER | COMPLETENESS |
| --- | --- | --- | --- |
| Data analysis, Script | OMICS INTEGRATION FOR PRECLINICAL GEMMA DATA | Tampere University, INRAE | 100 % |
| Data analysis, Script | GEMMA WHOLE GENOME SEQUENCING DATA PROCESSING | Tampere University, CNR-ITB | 90 % (delivery DEC 2025) |
| Data analysis, Script | BIOMARKER-BASED POLYGENIC RISK SCORE FOR GEMMA GENOMES | Tampere University, CNR-ITB | 90 % (delivery DEC 2025) |
| Dataset | A NETWORK OF MOLECULAR AND FUNCTIONAL INTERACTIONS TO ANALYSE GEMMA OMICS DATASETS | CNR-ITB | 80 % (delivery DEC 2025) |
| Data analysis, Script | NETWORK-BASED MULTI-OMICS INTEGRATION TO PRIORITIZE FEATURES IN GEMMA OMICS DATASETS | CNR-ITB | 90 % (delivery DEC 2025) |
| Data analysis, Script | ASSESSMENT OF FUNCTIONAL SIMILARITY AMONG BIOMARKERS | CNR-ITB | 80 % (delivery DEC 2025) |

Omics integration for preclinical GEMMA Data

A computational pipeline with accompanying scripts for the multi-omics integration of data from GEMMA's preclinical FMT mouse experiments.

GEMMA WHOLE GENOME SEQUENCING DATA PROCESSING

We developed a computational pipeline with accompanying scripts for the alignment of paired-end Illumina sequencing data and subsequent variant calling in the GEMMA project.

BIOMARKER-BASED POLYGENIC RISK SCORE FOR GEMMA GENOMES

We developed a computational pipeline with accompanying scripts for the construction of biomarker-informed polygenic risk scores (bioPRS) using genotypes called from GEMMA WGS. Standard PRS variants and effect sizes are taken from the Grove et al. (2019) study.
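For orientation, a standard (not biomarker-informed) polygenic risk score is a weighted sum of risk-allele dosages, PRS = sum over variants of effect size times dosage. The sketch below shows that baseline calculation only; how the bioPRS pipeline incorporates biomarker information is not described here and is not modeled. The function name, variant identifiers, and numbers are hypothetical.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Compute a weighted-sum PRS for one individual.

    dosages:      variant id -> effect-allele dosage (0, 1, or 2; imputed
                  dosages may be fractional).
    effect_sizes: variant id -> per-allele effect size (for example a log
                  odds ratio from a published GWAS).

    Variants missing from the individual's genotypes are skipped, and the
    number of variants actually used is returned alongside the score so
    that coverage can be checked.
    """
    score = 0.0
    used = 0
    for variant_id, beta in effect_sizes.items():
        dosage = dosages.get(variant_id)
        if dosage is None:
            continue  # variant not called for this individual
        score += beta * dosage
        used += 1
    return score, used


# Illustrative values only; these are not real variants or effects.
example_dosages = {"var1": 2, "var2": 1, "var3": 0}
example_effects = {"var1": 0.1, "var2": 0.3, "var4": 0.5}
prs, n_used = polygenic_risk_score(example_dosages, example_effects)
```

Reporting the number of variants used matters in practice, because scores built from different variant subsets are not directly comparable across individuals.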

A NETWORK OF MOLECULAR AND FUNCTIONAL INTERACTIONS TO ANALYSE GEMMA OMICS DATASETS

We developed a network of molecular and functional interactions to analyse gene-related, metabolite-related and microbiota species-related input scores derived from GEMMA omics datasets.

NETWORK-BASED MULTI-OMICS INTEGRATION TO PRIORITIZE FEATURES IN GEMMA OMICS DATASETS

We developed a pipeline for the integrative analysis of gene-related, metabolite-related and microbiota species-related data.
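The description above does not state which algorithm the pipeline uses. One common way to prioritize features on an interaction network is network propagation, sketched here as a random walk with restart over a small graph that mixes gene, metabolite, and microbial species nodes. The technique, the function name, the node names, and the parameter values are all assumptions chosen for illustration.

```python
def random_walk_with_restart(adjacency, seed_scores, restart=0.3,
                             n_iter=100, tol=1e-9):
    """Propagate input scores over an undirected network.

    adjacency:   node -> list of neighbouring nodes; every node, including
                 isolated ones, must appear as a key.
    seed_scores: node -> non-negative input score (for example derived from
                 a per-omics analysis); missing nodes count as zero.
    Returns node -> steady-state visiting probability.
    """
    nodes = list(adjacency)
    total = sum(seed_scores.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("at least one node needs a positive input score")
    p0 = {n: seed_scores.get(n, 0.0) / total for n in nodes}

    p = dict(p0)
    for _ in range(n_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for node in nodes:
            neighbours = adjacency[node]
            if not neighbours:
                # Isolated node: its probability mass stays where it is.
                nxt[node] += (1 - restart) * p[node]
                continue
            share = (1 - restart) * p[node] / len(neighbours)
            for nb in neighbours:
                nxt[nb] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical mixed network: a gene linked to a metabolite linked to a species.
toy_network = {
    "geneA": ["metab1"],
    "metab1": ["geneA", "speciesX"],
    "speciesX": ["metab1"],
    "geneZ": [],
}
ranking = random_walk_with_restart(toy_network, {"geneA": 1.0})
```

Nodes close to high-scoring inputs receive higher propagated scores, which is why the unconnected node ends with a score of zero in this toy example.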

ASSESSMENT OF FUNCTIONAL SIMILARITY AMONG BIOMARKERS

We developed a pipeline to assess the similarity among biomarkers. The approach uses molecular and functional interactions, as well as molecular pathways, to estimate the functional similarity between novel biomarkers and existing biomarkers.
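The exact similarity measure is not specified above. As a minimal sketch of the general idea, the example below scores functional similarity as the Jaccard index between the sets of interaction partners or pathways annotated to two biomarkers. The measure, the function names, and the annotations are assumptions for illustration, not the pipeline's documented method.

```python
def jaccard_similarity(set_a, set_b):
    """Jaccard index: shared annotations divided by all annotations."""
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        return 0.0  # two unannotated biomarkers share nothing measurable
    return len(a & b) / len(union)


def rank_known_biomarkers(candidate_annotations, known_biomarkers):
    """Rank existing biomarkers by similarity to a novel candidate.

    candidate_annotations: interaction partners or pathways of the candidate.
    known_biomarkers:      name -> annotation set of an existing biomarker.
    Returns (name, similarity) pairs, most similar first.
    """
    scores = {
        name: jaccard_similarity(candidate_annotations, annotations)
        for name, annotations in known_biomarkers.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical annotations; the labels are placeholders, not real pathways.
candidate = {"pathwayA", "pathwayB", "pathwayC"}
known = {
    "known1": {"pathwayB", "pathwayC", "pathwayD"},
    "known2": {"pathwayX"},
}
ranked = rank_known_biomarkers(candidate, known)
```

A set-overlap score ignores network distance, so two biomarkers that act in adjacent but non-identical pathways score zero; measures based on network proximity would capture that case.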
