GEMMA Multi-Omics Toolbox

The GEMMA multi-omics toolbox collects the tools and pipeline modules developed and used during the GEMMA project. It provides dedicated pipelines for the analysis of individual omics layers, quality control (QC) routines, and integrative multi-omics workflows.

Quick Start

Tools include scripts, containers, workflow modules, and links to two software tools used in GEMMA omics analyses.
Requirements and dependencies are documented in each tool’s README.
Tools can be combined based on available omics layers and research aims.
Example datasets and usage examples are included.

Available Tools

TOOL TYPE	TITLE	LEAD PARTNER
Data analysis, Script	OMICS INTEGRATION FOR PRECLINICAL GEMMA DATA	Tampere University, INRAE
Data analysis, Script	GEMMA WHOLE GENOME SEQUENCING DATA PROCESSING	Tampere University, CNR-ITB
Data analysis, Script	BIOMARKER-BASED POLYGENIC RISK SCORE FOR GEMMA GENOMES	Tampere University, CNR-ITB
Dataset	A NETWORK OF MOLECULAR AND FUNCTIONAL INTERACTIONS TO ANALYSE GEMMA OMICS DATASETS	CNR-ITB
Data analysis, Script	NETWORK-BASED MULTI-OMICS INTEGRATION TO PRIORITIZE FEATURES IN GEMMA OMICS DATASETS	CNR-ITB
Data analysis, Script	ASSESSMENT OF FUNCTIONAL SIMILARITY AMONG BIOMARKERS	CNR-ITB
Data analysis, Script	ASSESSMENT OF CLASSIFICATION PERFORMANCE OF BIOMARKERS	CNR-ITB
Data analysis, Script	GRAPH-BASED MULTI-OMICS INTEGRATION	Medinok, Italy
Automated full clinical NGS data quality control and validation	omnomicsQ	Euformatics
Agnostic clinical variant annotation and interpretation for gene panels, WES, and WGS	omnomicsNGS	Euformatics

Toolbox Approach

Reproducibility: scripts, containers, and workflow managers ensure that analyses can be replicated across environments.
Data security: sensitive data is not shared and thus does not need to be uploaded to external servers.
Resource efficiency: avoids maintaining an online platform while still allowing scalable computations on institutional systems.
Ease of maintenance: pipelines can be updated and released through version control (e.g., GitHub), with changelogs and environment specifications.
Analysing data: the data will be linked to the toolbox once the results are published in scientific articles, and will be shared via dedicated data repositories designed for this purpose (e.g., ENA, SRA, GEO, EGA, MGnify).
User support: documentation and examples facilitate further use of the tools.

Integration Strategies within the Toolbox

The toolbox will support multiple integration strategies depending on the research question and data availability. Late integration is implemented by processing each omics layer independently (e.g., calling variants, detecting differentially methylated regions, profiling microbial differences) and combining the results at the interpretation stage, through pathway enrichment, network analyses, or meta-analyses. This approach is straightforward, robust to technical differences across datasets, and provides biologically interpretable outcomes.
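As an illustration of combining per-layer results at the interpretation stage, the following minimal sketch merges feature-level p-values from independently analysed layers using Fisher's method, one common meta-analysis technique. It is not the toolbox's own implementation: the layer names, the feature identifiers (`GENE_A`, etc.), the p-values, and the function names are all hypothetical, and Fisher's method assumes the per-layer tests are independent.

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-squared distribution with
    2k degrees of freedom. For even degrees of freedom the upper tail has
    a closed form, so no external library is needed.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # statistic / 2
    term, tail = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i  # (half_stat ** i) / i!
        tail += term
    return math.exp(-half_stat) * tail

def late_integrate(layer_results):
    """Collect p-values per feature across layers and combine them."""
    per_feature = {}
    for results in layer_results.values():
        for feature, p in results.items():
            per_feature.setdefault(feature, []).append(p)
    return {f: fisher_combine(ps) for f, ps in per_feature.items()}

# Hypothetical per-layer results (illustrative values only).
layer_results = {
    "genome": {"GENE_A": 0.04, "GENE_B": 0.60},
    "methylation": {"GENE_A": 0.03, "GENE_C": 0.01},
    "microbiome": {"GENE_A": 0.20, "GENE_B": 0.50},
}

combined = late_integrate(layer_results)
for feature, p in sorted(combined.items(), key=lambda item: item[1]):
    print(f"{feature}: combined p = {p:.4f}")
```

Each layer is analysed on its own and only summary statistics meet here, which is why late integration tolerates technical differences between datasets.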

Another option is mid integration, in which the different omics layers are fed into a joint statistical or machine learning model that identifies shared latent factors across data types while preserving their specific structures. This approach is more powerful for uncovering cross-omics interactions and mechanistic links.
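To make the idea of a shared latent factor concrete, here is a deliberately simple sketch: each layer is standardized, the layers are concatenated per sample, and the first principal component is extracted by power iteration. This is a stand-in chosen for brevity, not the model used in GEMMA; dedicated joint models additionally account for layer-specific structure, which plain concatenation does not (layers with more features carry more weight here). The toy matrices and function names are hypothetical.

```python
import math

def zscore_columns(matrix):
    """Standardize each feature (column) to mean 0 and unit variance."""
    n = len(matrix)
    scaled_cols = []
    for col in zip(*matrix):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        scaled_cols.append([(v - mean) / sd if sd > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_cols)]

def first_shared_factor(layers, n_iter=200):
    """Return (sample scores, feature loadings) of the first joint component.

    layers: list of sample-by-feature matrices sharing the same sample order.
    """
    blocks = [zscore_columns(layer) for layer in layers]
    n_samples = len(blocks[0])
    # Concatenate the features of all layers, sample by sample.
    joint = [sum((block[i] for block in blocks), []) for i in range(n_samples)]
    n_features = len(joint[0])
    w = [1.0 / math.sqrt(n_features)] * n_features
    for _ in range(n_iter):
        # One power-iteration step: w <- X^T (X w), then renormalize.
        scores = [sum(x * wj for x, wj in zip(row, w)) for row in joint]
        w_new = [sum(joint[i][j] * scores[i] for i in range(n_samples))
                 for j in range(n_features)]
        norm = math.sqrt(sum(v * v for v in w_new))
        w = [v / norm for v in w_new]
    scores = [sum(x * wj for x, wj in zip(row, w)) for row in joint]
    return scores, w

# Hypothetical toy data: 4 samples, two layers sharing one gradient.
microbiome = [[1.0, 2.0], [2.0, 3.1], [3.0, 3.9], [4.0, 5.2]]
methylation = [[0.10], [0.20], [0.35], [0.40]]

scores, loadings = first_shared_factor([microbiome, methylation])
print("sample scores:", [round(s, 2) for s in scores])
print("feature loadings:", [round(v, 2) for v in loadings])
```

The sample scores order the samples along the gradient common to both layers, and the loadings show how strongly each feature from each layer contributes to it.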

With both late and mid integration modules, the toolbox ensures flexibility: researchers can choose simpler workflows when appropriate, but also apply state-of-the-art integrative modeling for comprehensive analyses.

The last option in the toolbox is a graph-based integration model, a strategy based on logical connections among nodes that represent omics data. The Reactome Pathways database is then used to link different omics domains according to their participation in biological reactions.
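The linking step can be sketched as follows, assuming each omics feature has already been annotated with the pathways it participates in. Features from different layers become connected whenever they share a pathway. The feature names, layer labels, and pathway identifiers below are placeholders, not real Reactome identifiers, and the function is illustrative rather than the toolbox's own code.

```python
from collections import defaultdict
from itertools import combinations

def cross_omics_links(feature_layer, feature_pathways):
    """Link features from different omics layers that share a pathway.

    feature_layer: maps each feature to its omics layer.
    feature_pathways: maps each feature to a set of pathway identifiers.
    Returns a dict mapping (feature_a, feature_b) to the shared pathways.
    """
    # Invert the annotation: pathway -> features participating in it.
    members = defaultdict(set)
    for feature, pathways in feature_pathways.items():
        for pathway in pathways:
            members[pathway].add(feature)

    links = defaultdict(set)
    for pathway, features in members.items():
        for a, b in combinations(sorted(features), 2):
            # Keep only edges that bridge two different omics layers.
            if feature_layer[a] != feature_layer[b]:
                links[(a, b)].add(pathway)
    return dict(links)

# Hypothetical annotations (placeholder identifiers).
feature_layer = {
    "GENE_A": "genome",
    "GENE_B": "genome",
    "CpG_1": "methylation",
    "METAB_X": "metabolomics",
}
feature_pathways = {
    "GENE_A": {"PATHWAY_1"},
    "GENE_B": {"PATHWAY_1"},
    "CpG_1": {"PATHWAY_1", "PATHWAY_2"},
    "METAB_X": {"PATHWAY_2"},
}

links = cross_omics_links(feature_layer, feature_pathways)
for (a, b), shared in sorted(links.items()):
    print(f"{a} -- {b} via {sorted(shared)}")
```

In this toy graph the methylation feature bridges the genome and metabolomics layers, which is the kind of cross-domain connection a pathway-based graph is meant to expose.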

In the GEMMA project, the mid integration strategy is used to integrate microbiome, methylation, and genome data, and the late integration strategy is then used to associate metabolomics, immunoprofiling, and proteome data with the other omics measurements.

Risk Mitigation

Potential concerns and our solutions:

Usability: clear README and instructions.
Dependencies: listed in the instructions.
Computation needs: documented resource requirements.
Integration methods: the toolbox includes modules for both mid integration (combined modeling) and late integration (result combination), depending on the scientific question.
Sustainability: reproducibility supported via version control, documented updates, and maintained environments.