Where AI May Be Used in Metabolomic Research: Data Acquisition

An infographic illustrating potential applications of AI in metabolomic research data processing, featuring icons and descriptions of various stages such as sample analysis, data computing, molecular structure interpretation, and statistical analysis.

In metabolomics, where intricate networks of small molecules hold the keys to understanding biological systems, researchers face a constant battle against complexity. From optimizing analytical methods to tailoring sample preparation, the variables at play can overwhelm even the most seasoned scientists. But what if there were a way to harness artificial intelligence (AI) to streamline experiment planning and open new frontiers of metabolomic discovery?

In a thoughtful summary of the promises and challenges AI offers for 8 different stages of metabolomics research, Coler et al. [1] provided valuable insights that we will review. Here we explore the first 3 stages focused on data acquisition.

Figure 1 A generalized description of a mass spectrometry-based metabolomics experiment. [1]

1. AI can optimize experimental design in metabolomics by learning the best methods to detect metabolites of interest based on prior data.

This includes suggesting optimal chromatography, mass spec modes, sample prep, blanks strategy, and study power calculations. Traditionally, researchers have relied on experience to select broad metabolomic methods or those tailored for certain analytes. However, this approach does not guarantee capturing important metabolites for the biological question. AI can analyze large metabolomic datasets in the context of research aims to learn patterns that optimize detection of specific metabolites of interest with high sensitivity and specificity.

By applying machine learning to prior data, AI can model which experimental parameters, such as chromatography type, mass spec modes, and extraction solvents, are most effective for certain molecular classes or biological scenarios. It can also guide crucial study design choices, such as sample size calculations for sufficient power and blank strategies that control batch effects while minimizing costs. This data-driven approach can address limitations of traditional experience-based methods.
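To make the sample size point concrete, here is a minimal sketch of a classical power calculation for comparing one metabolite between two groups. It uses the standard normal approximation rather than any AI model, and it is not taken from Coler et al. The function name `samples_per_group` and the `n_tests` Bonferroni parameter are illustrative assumptions.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.8, n_tests=1):
    """Approximate per-group sample size for a two-sided, two-sample
    comparison of means (normal approximation).

    effect_size is the standardized difference (Cohen's d).
    n_tests applies a Bonferroni correction to alpha, a crude stand-in
    for testing many metabolite features at once (an assumption here).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - (alpha / n_tests) / 2)
    z_beta = standard_normal.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


print(samples_per_group(1.0))                # large effect, single test
print(samples_per_group(0.5))                # medium effect, single test
print(samples_per_group(0.5, n_tests=1000))  # corrected for 1000 features
```

The required sample size grows quickly as the expected effect shrinks or as the number of tested features rises, which is why study power is worth estimating before acquisition begins.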

2. AI techniques like neural networks show promise for improved data preprocessing tasks like peak detection, deconvolution, and noise removal compared to traditional approaches.

The structure of mass spec data is well-suited for AI methods like convolutional neural networks, recurrent neural networks, and autoencoders. These have been applied to denoise spectra, detect peaks, and deconvolute overlapping peaks based on pattern recognition across large datasets. An example is MSHub for GC-MS data, which uses a neural network to determine optimal preprocessing parameters by learning spectral patterns across runs.

AI-based preprocessing tools demonstrate advantages like higher accuracy, reduced user effort, and improved reproducibility compared to traditional algorithms requiring manual parameter tuning. As no human-coded rules are imposed, the AI can learn subtle patterns in an unbiased manner. With increasing adoption, AI is poised to streamline and harmonize data preprocessing across laboratories.
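For contrast, the kind of rule-based peak picking that requires manual parameter tuning can be sketched in a few lines. This is a toy illustration, not the method of any named tool. The function names and the `window` and `min_height` parameters are hypothetical, and they are exactly the sort of settings a user would have to tune per dataset.

```python
def moving_average(signal, window=3):
    """Smooth a 1-D intensity trace with a centered moving average."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def detect_peaks(signal, window=3, min_height=5.0):
    """Return indices of local maxima in the smoothed trace that reach
    min_height. Both parameters are hand-set rules."""
    smoothed = moving_average(signal, window)
    peaks = []
    for i in range(1, len(smoothed) - 1):
        is_local_max = smoothed[i] > smoothed[i - 1] and smoothed[i] >= smoothed[i + 1]
        if is_local_max and smoothed[i] >= min_height:
            peaks.append(i)
    return peaks


trace = [0, 1, 0, 2, 10, 30, 10, 2, 0, 1, 0, 3, 12, 25, 11, 2, 0]
print(detect_peaks(trace))                   # [5, 13]
print(detect_peaks(trace, min_height=16.5))  # [5]
```

Raising `min_height` slightly drops the second peak, which shows how sensitive fixed thresholds can be. A learned model would aim to infer such decisions from many runs instead.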

3. AI enables powerful batch effect correction methods to harmonize data across runs, labs, and instruments by learning complex patterns in the data.

High-throughput omics experiments often suffer from technical biases like batch effects that confound biological interpretation. Well-established algorithms such as ComBat (an empirical Bayes method) and mutual nearest neighbors (MNN) estimate batch effects from the data and adjust for them while preserving true biological signals. Newer approaches like automatic feature engineering infer manifolds over samples and analytes to directly calibrate out batch artifacts from the experimental design.

By exploring complex data patterns, AI overcomes limitations of traditional statistical methods that assume linear effects. AI batch correction has significantly advanced omics data harmonization over the past decade, enabling integrative studies across multiple batches, labs, and instrument platforms. However, extreme divergences in protocols may still require meta-analysis combining separate AI models.
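As a point of reference, the simplest linear batch adjustment shifts each batch so that its mean matches the overall mean for a given feature. The sketch below shows only that baseline. It is not ComBat, MNN, or any learned method, and the function name `center_batches` is an illustrative assumption.

```python
from collections import defaultdict


def center_batches(intensities, batches):
    """Shift each batch so its mean equals the overall mean.

    intensities: values for one metabolite feature, one per sample
                 (log-transformed intensities are typical, assumed here).
    batches:     batch label for each sample, same order.
    """
    if len(intensities) != len(batches):
        raise ValueError("intensities and batches must have equal length")
    overall_mean = sum(intensities) / len(intensities)
    grouped = defaultdict(list)
    for value, batch in zip(intensities, batches):
        grouped[batch].append(value)
    # Offset of each batch mean from the overall mean.
    offsets = {b: sum(v) / len(v) - overall_mean for b, v in grouped.items()}
    return [value - offsets[batch] for value, batch in zip(intensities, batches)]


values = [10.0, 12.0, 14.0, 20.0, 22.0, 24.0]
labels = ["A", "A", "A", "B", "B", "B"]
print(center_batches(values, labels))  # [15.0, 17.0, 19.0, 15.0, 17.0, 19.0]
```

This baseline assumes a purely additive shift and will also erase real biological differences if biology is confounded with batch. More flexible methods are motivated by exactly those limitations.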

4. AI facilitates metabolite annotation by generating comprehensive in silico spectral libraries, improving spectral matching, and providing accurate molecular structure prediction.

The key challenge in metabolomics is translating spectral patterns to molecular identities. AI enhances the conventional approach of matching against reference libraries in multiple ways. It enables advanced similarity scoring accounting for noise, missing peaks, etc. by learning complex spectral relationships, instead of simplistic measures like dot products.
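For readers unfamiliar with the dot-product baseline mentioned here, a minimal sketch of cosine similarity between two binned spectra follows. The function names, the `(m/z, intensity)` tuple representation, and the `bin_width` default are assumptions made for illustration. Production tools typically add intensity weighting and tolerance-based peak matching.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of (mz, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned


def cosine_similarity(spec_a, spec_b, bin_width=0.01):
    """Normalized dot product of two binned spectra (0 = no shared bins)."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


reference = [(100.0, 50.0), (150.0, 100.0), (200.0, 30.0)]
query = [(100.0, 50.0), (150.0, 100.0)]  # same compound, one peak missing
print(cosine_similarity(reference, reference))
print(cosine_similarity(reference, query))
```

A single missing peak already lowers the score, and a small m/z shift that crosses a bin boundary would lower it further. Learned similarity measures aim to be more tolerant of such variation.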

Additionally, AI drives breakthroughs in computational spectra generation to overcome limitations of currently small experimental libraries. Databases like METLIN and MassBank utilize AI to synthesize comprehensive libraries spanning vast chemical space. Finally, AI excels at the inverse problem of predicting molecular structures directly from spectra using self-supervised learning on large datasets, exemplified by tools like SIRIUS.

5. Cutting-edge AI allows integrating multi-omics data like metagenomics and metabolomics by modeling interconnected relationships across these disparate datasets.

While DNA provides a blueprint of microbial genes, metabolomics captures the functional molecular outputs synthesized by these microbes. However, combining these complementary omics data is challenging due to differences in size, structure, and underlying biology. Powerful AI methods like MMVec can infer conditional probabilities linking microbes to specific metabolites by identifying co-occurrence patterns using matrix factorization.
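The idea of a conditional probability linking a microbe to a metabolite can be illustrated with a naive count-based estimate over presence/absence data. This is deliberately not the MMVec model, which learns these probabilities jointly across all microbes and metabolites. The function name and the boolean per-sample encoding are assumptions for illustration only.

```python
def conditional_cooccurrence(microbe_present, metabolite_present):
    """Estimate P(metabolite detected | microbe detected) from paired
    per-sample booleans. Returns None if the microbe is never detected."""
    if len(microbe_present) != len(metabolite_present):
        raise ValueError("inputs must describe the same samples")
    with_microbe = [
        metabolite
        for microbe, metabolite in zip(microbe_present, metabolite_present)
        if microbe
    ]
    if not with_microbe:
        return None
    return sum(with_microbe) / len(with_microbe)


microbe = [True, True, True, False, False]
metabolite = [True, True, False, True, False]
print(conditional_cooccurrence(microbe, metabolite))
```

Counting one pair at a time ignores abundances and treats every pair independently, which is part of what model-based approaches are designed to improve on.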

Such AI-driven multi-omics integration provides a systems view of microbial metabolism and overcomes limitations of simplistic correlation analysis, which is confounded by data compositionality. Extending beyond the microbiome, AI strategies integrating transcriptomics, proteomics, and metabolomics hold promise to map multi-scale mechanisms of metabolite biosynthesis across complex biological communities.

6. Network analysis approaches leverage AI to infer associations and patterns in mass spec data represented as networks.

The chemical similarity between mass spec features can be represented using network models, where nodes indicate analytes and edges connect those with related structures based on spectral patterns. AI techniques like graph neural networks, community detection, network propagation and embedding can then mine these data networks to uncover molecular associations and biochemical relationships.

This approach is founded on the premise that structural likeness reflected in the spectra implies functional relatedness. Extracting connectivity patterns in the networks using AI enables annotating molecular families and biochemical pathways directly from the spectral data in an unsupervised manner, with potential to delineate novel biochemistries from unknown molecules.
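The network representation itself is simple to sketch: connect features whose spectral similarity passes a threshold, then treat each connected component as a candidate molecular family. The example below shows only this non-AI starting point. The function name, the feature IDs, and the `threshold` value are hypothetical.

```python
def molecular_families(similarities, threshold=0.7):
    """Group features into connected components of a similarity network.

    similarities: dict mapping (feature_a, feature_b) -> similarity score.
    An edge is drawn when the score is at least threshold.
    """
    adjacency = {}
    for (a, b), score in similarities.items():
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    families = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack = [start]
        seen.add(start)
        family = []
        while stack:
            node = stack.pop()
            family.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        families.append(sorted(family))
    return families


scores = {
    ("f1", "f2"): 0.9,
    ("f2", "f3"): 0.8,
    ("f3", "f4"): 0.2,
    ("f4", "f5"): 0.75,
}
print(molecular_families(scores))  # [['f1', 'f2', 'f3'], ['f4', 'f5']]
```

Graph neural networks, propagation, and embedding methods would operate on a network like this one, using its connectivity to suggest annotations for unknown features.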

7. AI will transform the scientific process by dynamically incorporating new knowledge into learning models, enabling interactive AI assistants, and collaborative hypothesis generation powered by large biodata aggregators.

The AI revolution catalyzes a paradigm shift in knowledge creation and dissemination. Instead of the centuries-old model of publishing findings as static interpretations in papers, AI will facilitate an iterative, self-updating process. New evidence will be rapidly assimilated into knowledge bases which recursively retrain AI models, perpetually refining the understanding in a virtuous learning cycle.

Accompanying this transition, the format for accessing scientific knowledge may evolve from publications toward interactive AI assistants that contextualize insights based on accumulated data and postulations. Moreover, meta-aggregators of multimodal data sources will allow probing high-dimensional biological phenomena and machine-driven abductive reasoning, accelerating the generation of unifying hypotheses from big data. However, oversight of AI system inputs and of the thoroughness of model reasoning will be crucial to prevent propagating errors or harmful biases through the continuously expanding knowledge base.

While the full implementation of AI-driven experimental design in metabolomics is still an emerging field, the potential benefits are substantial. By harnessing the power of AI, researchers could unlock new levels of precision, efficiency, and insight, accelerating the pace of metabolomic discoveries and paving the way for advances in fields ranging from biomedicine to agriculture.

For biotech companies, the integration of AI into metabolomic research could offer a competitive edge, enabling more targeted and impactful research initiatives. However, it is crucial to approach this endeavor with a thoughtful and strategic mindset, ensuring that AI is implemented as a collaborative tool that complements and enhances human expertise, rather than replacing it. By striking the right balance between cutting-edge technology and scientific acumen, the metabolomic research community can unlock a future where the complexities of small molecule networks are unraveled with unprecedented clarity and precision.

1. Coler EA, Chen W, Melnik AV, Morton JT, Aksenov AA. Metabolomics in the era of artificial intelligence. Microbiota and Host. 2024;2(1).