GEMMA: Multi-Omics Toolbox Approach
Advantages of the Toolbox Approach
- Reproducibility: scripts, containers, and workflow managers ensure that analyses can be replicated across environments.
- Data security: sensitive data are not shared and therefore do not need to be uploaded to external servers.
- Resource efficiency: avoids maintaining an online platform while still allowing scalable computations on institutional systems.
- Ease of maintenance: pipelines can be updated and released through version control (e.g., GitHub), with changelogs and environment specifications.
- Data availability: once the results are published in scientific articles, the data will be deposited in dedicated repositories designed for this purpose (e.g., ENA, SRA, GEO, EGA, MGnify) and linked to the toolbox.
- User support: documentation and examples facilitate further use of the tools.
Risk Mitigation
- Usability: a clear README and usage instructions are provided.
- Dependencies: all dependencies are specified in the instructions.
- Computational needs: resource requirements are documented.
- Integration methods: the toolbox includes modules for both mid integration (combined modeling) and late integration (combination of results), to be chosen according to the scientific question.
- Sustainability: long-term reproducibility is supported by version control, documented updates, and maintained software environments.
GEMMA Toolbox as a Platform
A solution based on the GEMMA toolbox is preferable for sensitive biomedical data because it prioritizes reproducibility, scalability, and data protection. By combining pipelines, workflow managers, and comprehensive documentation, the toolbox provides a sufficient and efficient multi-omics platform for the proposed research.
Integration Strategies within the Toolbox
The toolbox will support multiple integration strategies, chosen according to the research question and data availability. Late integration is implemented by processing each omics layer independently (e.g., calling variants, detecting differentially methylated regions, profiling differences in microbial composition) and combining the results at the interpretation stage through pathway enrichment, network analysis, or meta-analysis. This approach is straightforward, robust to technical differences across datasets, and yields biologically interpretable results.
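As one illustration of the meta-analysis route, the sketch below combines per-layer p-values for the same pathway with Fisher's method. The pathway names, layer names, and p-values are hypothetical placeholders, and Fisher's method is chosen here only as an example; it is not necessarily the combination rule used in the toolbox. The method also assumes that the per-layer tests are independent. Layers measured on the same individuals are often correlated, in which case a dependence-aware variant such as Brown's method is more appropriate.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values for one pathway with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom under the null. For even degrees of freedom the upper tail
    has the closed form exp(-x/2) * sum_{j<k} (x/2)^j / j!.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # this is x / 2
    term, total = 1.0, 1.0  # j = 0 term of the series
    for j in range(1, k):
        term *= half_x / j  # (x/2)^j / j!, built incrementally
        total += term
    return min(1.0, math.exp(-half_x) * total)


# Hypothetical per-layer enrichment p-values for two illustrative pathways.
layer_results = {
    "pathway_A": {"genome": 0.01, "methylation": 0.04, "microbiome": 0.20},
    "pathway_B": {"genome": 0.60, "methylation": 0.35, "microbiome": 0.80},
}

for pathway, per_layer in layer_results.items():
    combined = fisher_combine(list(per_layer.values()))
    print(f"{pathway}: combined p = {combined:.4g}")
```

Each layer is analysed with its own pipeline and only the summary statistics meet at this step, which is what keeps late integration robust to technical differences between datasets.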
A second option is mid integration, in which the different omics layers are fed into a joint statistical or machine learning model that identifies latent factors shared across data types while preserving the structure specific to each layer. This approach is more powerful than late integration for uncovering cross-omics interactions and mechanistic links.
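To make the idea of a shared latent factor concrete, here is a deliberately simplified sketch that assumes samples are aligned across layers. Each layer is standardized, down-weighted by the square root of its feature count so that large layers do not dominate (a heuristic chosen for illustration), and concatenated per sample. The leading factor is then extracted by power iteration on the sample-by-sample Gram matrix. This amounts to a PCA of the concatenated data and is not the joint model used in the toolbox; dedicated multi-omics factor models additionally handle each layer's noise and data type separately. All function and variable names are hypothetical.

```python
import math
import random


def standardize(layer):
    """Center and scale each feature (column) of a samples x features matrix."""
    n = len(layer)
    out_cols = []
    for col in zip(*layer):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1)) or 1.0
        out_cols.append([(v - mean) / sd for v in col])
    return [list(row) for row in zip(*out_cols)]


def concat_layers(layers):
    """Standardize each layer, weight it by 1/sqrt(n_features), join per sample."""
    blocks = []
    for layer in layers:
        z = standardize(layer)
        w = 1.0 / math.sqrt(len(z[0]))
        blocks.append([[w * v for v in row] for row in z])
    n_samples = len(layers[0])
    return [sum((block[i] for block in blocks), []) for i in range(n_samples)]


def leading_factor(x, iters=200, seed=0):
    """Power iteration on X X^T; returns unit-norm sample scores of factor 1."""
    n = len(x)
    gram = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)]
            for i in range(n)]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iters):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return v  # the sign of a factor is arbitrary


# Toy data: 4 samples; a two-feature layer and a one-feature layer,
# both driven by the same underlying signal.
methylation = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
microbiome = [[10.0], [20.0], [30.0], [40.0]]
scores = leading_factor(concat_layers([methylation, microbiome]))
print(scores)
```

Because every feature in the toy data follows the same signal, the recovered factor orders the samples consistently across both layers. That cross-layer agreement is what a shared latent factor captures.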
With both late and mid integration modules, the toolbox ensures flexibility: researchers can choose simpler workflows when appropriate, but also apply state-of-the-art integrative modeling for comprehensive analyses.
The last option in the toolbox is graph-based integration, a strategy built on logical connections among nodes that represent omics entities. The Reactome pathway database is used to link the different omics domains according to their participation in biological reactions.
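The following is a minimal sketch of the graph-based idea. It assumes each omics feature has already been mapped to the Reactome reactions it participates in; the retrieval step is not shown. Nodes are (layer, feature) pairs, and two nodes from different layers are linked when they share at least one reaction, with the shared reactions stored on the edge. Feature names and reaction identifiers are hypothetical placeholders. Ranking nodes by their number of cross-layer links is only a naive illustration of feature prioritization and is not the method used by the toolbox.

```python
from collections import defaultdict
from itertools import combinations


def build_cross_omics_graph(annotations):
    """Link features from different layers that share >= 1 reaction.

    annotations: {(layer, feature): set of reaction ids}
    returns: {(node_a, node_b): set of shared reaction ids}
    """
    by_reaction = defaultdict(set)
    for node, reactions in annotations.items():
        for reaction in reactions:
            by_reaction[reaction].add(node)
    edges = defaultdict(set)
    for reaction, nodes in by_reaction.items():
        for a, b in combinations(sorted(nodes), 2):
            if a[0] != b[0]:  # keep only cross-layer links
                edges[(a, b)].add(reaction)
    return dict(edges)


def cross_layer_degree(edges):
    """Count cross-layer neighbours per node (a naive prioritization score)."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


# Hypothetical feature-to-reaction annotations.
annotations = {
    ("methylation", "GENE1_promoter"): {"reaction_1"},
    ("proteome", "PROT1"): {"reaction_1", "reaction_2"},
    ("metabolomics", "MET1"): {"reaction_2"},
    ("proteome", "PROT2"): {"reaction_2"},
}
edges = build_cross_omics_graph(annotations)
ranking = sorted(cross_layer_degree(edges).items(), key=lambda kv: -kv[1])
print(ranking)
```

Restricting edges to pairs from different layers keeps the graph focused on cross-omics links. Within-layer relationships could be added as a separate edge type if needed.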
In the GEMMA project, the mid integration strategy is used to integrate microbiome, methylation, and genome data, and the late integration strategy is then used to associate metabolomics, immunoprofiling, and proteome data with the other omics measurements.
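The division of labour in GEMMA can be written down as a small configuration. The sketch below uses hypothetical names (`GEMMA_PLAN`, `integration_stages`, `mid_integration_result`) and encodes one reading of the description: a single joint model over microbiome, methylation, and genome data, whose output is then combined at the late stage with the separately analysed metabolomics, immunoprofiling, and proteome results.

```python
# Hypothetical configuration mirroring the GEMMA assignment of omics layers.
GEMMA_PLAN = {
    "mid": ["microbiome", "methylation", "genome"],
    "late": ["metabolomics", "immunoprofiling", "proteome"],
}


def integration_stages(plan):
    """Return ordered stages: a joint model over the mid layers, then a late
    step that combines its output with the separately analysed layers."""
    mid, late = plan["mid"], plan["late"]
    overlap = set(mid) & set(late)
    if overlap:
        raise ValueError(f"layers assigned to both strategies: {sorted(overlap)}")
    stages = [("mid", tuple(mid))]
    # The joint model's output enters late integration as one more input.
    stages.append(("late", ("mid_integration_result",) + tuple(late)))
    return stages


for strategy, inputs in integration_stages(GEMMA_PLAN):
    print(strategy, inputs)
```

Keeping the assignment in a configuration rather than in code makes it easy to move a layer between strategies as the research question changes.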
Available Tools
| TOOL TYPE | TITLE | LEAD PARTNER |
|---|---|---|
| Data analysis, Script | | Tampere University, INRAE |
| Data analysis, Script | | Tampere University, CNR-ITB |
| Data analysis, Script | | Tampere University, CNR-ITB |
| Dataset | A NETWORK OF MOLECULAR AND FUNCTIONAL INTERACTIONS TO ANALYSE GEMMA OMICS DATASETS | CNR-ITB |
| Data analysis, Script | NETWORK-BASED MULTI-OMICS INTEGRATION TO PRIORITIZE FEATURES IN GEMMA OMICS DATASETS | CNR-ITB |
| Data analysis, Script | | CNR-ITB |
| Data analysis, Script | | CNR-ITB |
| Data analysis, Script | | Medinok, Italy |
| | Automated full clinical NGS data quality control and validation | Euformatics |
| | Agnostic clinical variant annotation and interpretation for gene panels, WES, and WGS | Euformatics |