Completed Microbiomics Challenge

The 2018 sbv IMPROVER Microbiomics Challenge in a Nutshell

Accurately determining the composition and function of the microbiome can shed light on its role in diseases and lead to the development of new therapies and diagnostic tools.

The microbiota composition prediction challenge was designed to evaluate how well computational microbiome analysis pipelines predict the microbial composition of samples from sequencing data.

Specifically, the questions that we aimed to address were the following:

  • Which pipelines best recover bacterial community composition and relative abundance?
  • Do technical biases and specific microbial composition affect the performance?

Why we organized the microbiota composition prediction challenge

The biological interpretation of changes to the microbiome relies on the accurate qualitative and quantitative measurement and inference of the microbiome community composition and function, using advanced sequencing technologies and computational analysis approaches. Choosing the most suitable tool is challenging, as there is a large and ever-increasing variety of computational methods, and the issue of how to objectively benchmark them is still being explored.

A few crowdsourced initiatives have evaluated the performance of metagenomics data analysis methods and provided guidance to the scientific community. The two Assemblathon efforts, run in 2010 and 2012 (Earl et al. 2011, Bradnam et al. 2013), focused on evaluating the performance of genome assembly methods. The Critical Assessment of Metagenome Interpretation (CAMI) team, in collaboration with the metagenomics community, organized a challenge in 2015 that aimed at evaluating metagenomics methods for assembly, binning, and taxonomy profiling. CAMI provided an extensive benchmarking dataset to participants. Among the many results they collected, CAMI observed that (i) a good assembly step is crucial for subsequent binning; (ii) taxonomic profiling tools accurately predict higher level taxa (e.g., family level), while giving poor predictions on lower level taxa (e.g., species level).

In a spirit of continuity with CAMI and Assemblathon, the microbiota composition prediction challenge aimed at assessing objectively the performance of microbiomics computational analysis pipeline(s) as a whole, i.e. from quality control to taxonomy profiling, for the recovery of relative abundance and taxonomy assignment of bacterial communities, rather than assessing the individual steps of the process as CAMI already did.

The participants were provided with shotgun DNA sequencing data for several microbiome samples and asked to predict, at the phylum, genus, and species level, the composition and relative abundance of bacterial communities present in each sample.

Further background on Microbiomics


Microbiology is the study of microscopic organisms called microbes or microorganisms. Several types of microorganisms exist, including bacteria, archaea, and viruses. Microorganisms populate most of the earth and can be found in every part of the biosphere, including soil and oceans. They are present on the epithelial surface and in the digestive tract of higher organisms such as humans (Figure below), and the analysis of the composition of these populations is a rapidly expanding field of research.

The microbiome

In higher organisms, the microbiome comprises a complex collection of microorganisms colonizing different body niches, such as the gut, mouth, genitals, skin, or airways. The composition of this microorganism population varies depending on the body part and the health status of the individual (Figure below). The human microbiome is known to have a beneficial role in homeostasis, assisting for example in the bioconversion of nutrients and detoxification, supporting immunity, protecting against pathogenic microbes, and maintaining host development, metabolism and physiology (Lloyd-Price et al. 2016, Koppel et al. 2017). It is now understood that a finely balanced interaction of microbes with the host is essential to health.

Moreover, growing evidence suggests that the function of the indigenous microbiota can be influenced by many factors, including genetics, diet, age, and toxins. The disruption of this balance, called dysbiosis, is associated with a plethora of diseases, including cancers, immune-related diseases, metabolic diseases, inflammatory bowel disease, pulmonary pathologies, oral diseases, skin problems, and neurological disorders (Turnbaugh et al. 2007, Schuppan et al. 2009, Benson et al. 2010, Koren et al. 2012, Sommer and Backhed 2013, Galipeau et al. 2015, Riiser 2015, Caminero et al. 2016, Scher et al. 2016, Vatanen et al. 2016, Vogtmann and Goedert 2016, Blázquez and Berin 2017, Roy and Trinchieri 2017, Shukla et al. 2017). The common feature found among these unhealthy conditions is the loss of microbiota diversity, defined as the decrease in number and abundance of distinct types of microorganisms (Huttenhower et al. 2012, Mosca et al. 2016). Lower microbiome richness has been associated with metabolic dysfunctions, skin disorders, gastrointestinal disorders, and low-grade inflammation (Alekseyenko et al. 2013, Cotillard et al. 2013, Le Chatelier et al. 2013). Therefore, interrogating the composition of the microbiome can shed light on the etiology of diseases and, in the future, microbial abundances could potentially be used as markers for disease diagnosis.
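To make the notions of richness and diversity used above more concrete, here is a minimal Python sketch. It assumes a sample is summarized as a dictionary mapping taxon names to read counts. The Shannon index is used here as one common diversity measure; it is our choice for illustration and is not named in the text above. The function names and the example counts are hypothetical.

```python
import math


def richness(counts):
    """Number of distinct taxa observed (count > 0) in a sample."""
    return sum(1 for c in counts.values() if c > 0)


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts.

    Assumption: this is one common way to combine the number of taxa and
    the evenness of their abundances; other indices exist.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log(c / total) for c in counts.values() if c > 0
    )


# Hypothetical samples: same richness, very different evenness.
even_sample = {"taxon_A": 25, "taxon_B": 25, "taxon_C": 25, "taxon_D": 25}
skewed_sample = {"taxon_A": 97, "taxon_B": 1, "taxon_C": 1, "taxon_D": 1}

print(richness(even_sample), shannon_diversity(even_sample))
print(richness(skewed_sample), shannon_diversity(skewed_sample))
```

Both hypothetical samples contain four taxa, but the sample dominated by a single taxon has a lower Shannon index, which illustrates why a loss of diversity can involve abundance as well as the number of taxa.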

Technologies and tools for microbiome analysis

Advances in genome sequencing technologies have enabled progress in the characterization of microbial diversity, leading to a rapid expansion of the field known as microbiomics: the study of the DNA of a microbial community. An accurate analysis of microbiome sequencing data (e.g., correct taxonomic assignment and relative abundance estimates) relies on computational methods. A plethora of analysis tools have been developed and published. However, limited information on the performance of computational methods and their context of applicability makes it difficult for scientists to select the most appropriate software. Initially, the evaluation of computational methods in microbiome analysis was limited to authors benchmarking their own method against existing ones when publishing novel or improved approaches. However, this kind of evaluation remains restricted and challenging because of the limited number of methods that are generally compared in a publication, with the risk of falling into the “self-assessment trap” that leads to biased results (Norel et al. 2011), and because of the low consensus about benchmarking datasets and evaluation metrics in microbiomics. For this reason, new initiatives such as the one presented here (see also Assemblathon (Earl et al. 2011, Bradnam et al. 2013) and the CAMI initiatives) are undertaken to evaluate computational methods in microbiomics independently, comprehensively, and objectively.

Introduction to microbiome


Microorganisms are found in many environments on earth, including soil, the seafloor, and the human body, which are among the most studied environments. The figure shows the relative abundances of four dominant bacterial phyla at different body sites: mouth (Bik et al. 2010), distal esophagus (Pei et al. 2004), lung (Beck et al. 2012), and gut (Costello et al. 2009).

Microbiomics analysis pipeline in a nutshell

The figure below reports a typical pipeline for the analysis of shotgun data.

Figure: Sample pipeline for the analysis of shotgun data

Quality control of reads: QC tools applied at this step check that the raw data are of good quality and provide insights for filtering/trimming.
Trimming/Filtering of low quality reads: Trimming refers to the action of shortening sequencing reads by removing bases with poor quality base calls and bases from sequencing adapters. Filtering refers to the action of removing sequencing reads completely, for instance when the average quality of the read is below a certain threshold, or when the trimmed read becomes too short.
Host genome contamination removal: Filtering out of all unwanted reads that belong to the host genome.
Taxonomic assignment: Identification of the microbiome profile, i.e., of the genomes represented in the sample and of their abundances.
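
The trimming and filtering step described above can be illustrated with a minimal Python sketch. It assumes that a read is given as a base string plus a list of per-base Phred quality scores, and it only trims low-quality bases from the 3' end; adapter removal is not covered. The function names and the threshold values (`min_quality`, `min_mean_quality`, `min_length`) are hypothetical and are not taken from any pipeline used in the challenge.

```python
def trimmed_length(qualities, min_quality=20):
    """Length of the read once trailing bases below min_quality are removed."""
    end = len(qualities)
    while end > 0 and qualities[end - 1] < min_quality:
        end -= 1
    return end


def trim_and_filter(sequence, qualities,
                    min_quality=20, min_mean_quality=20, min_length=5):
    """Trim the 3' end of a read, then decide whether to keep it.

    Returns (sequence, qualities) for a kept read, or None when the read
    is filtered out (too short after trimming, or mean quality too low).
    """
    keep = trimmed_length(qualities, min_quality)
    seq, quals = sequence[:keep], qualities[:keep]
    # Filtering rule 1: the trimmed read became too short.
    if not seq or len(seq) < min_length:
        return None
    # Filtering rule 2: the average quality is below the threshold.
    if sum(quals) / len(quals) < min_mean_quality:
        return None
    return seq, quals


# Two low-quality bases at the 3' end are trimmed; the read is kept.
print(trim_and_filter("ACGTACGT", [30, 30, 30, 30, 30, 30, 10, 5]))
# Only two bases survive trimming, so the read is filtered out.
print(trim_and_filter("ACGTACGT", [30, 30, 10, 10, 10, 10, 10, 10]))
```

Real pipelines typically delegate this step to dedicated tools and use more elaborate rules (for example sliding-window trimming); the sketch only mirrors the two filtering criteria named in the text.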

Challenge Results

A team of researchers from Philip Morris R&D in Neuchâtel (Switzerland) established a scoring methodology and performed the scoring on the blinded submissions under the review of an independent Scoring Review Panel including Prof. Alice McHardy (Helmholtz Centre for Infection Research, Germany) and Dr Luisa Cutillo (Department of Management and Quantitative Studies (DISAQ), University Parthenope of Naples, Italy).

The Scoring Review Panel reviewed and approved the scoring methodology and procedures before the challenge closure, as well as the scoring results and final ranking shown below:


Final rank   Team     Weighted sum of ranks (wsr)   F1 score wsr   L1 norm wsr   Weighted UniFrac wsr
1            Team-1   418                           114            152           152
2            Team-3   455.5                         129.5          163           163
3            Team-2   550                           168            191           191
N/A          Team-4   676.5                         301.5          187           188
4            Team-5   817                           362            227
5            Team-8   960.5                         236.5          362           362
6            Team-6   1029                          366            331.5         331.5
7            Team-7   1249.5                        374.5          437.5         437.5
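
The ranking above refers to three metrics: F1 score, L1 norm, and weighted UniFrac. This page does not spell out how they were computed, so the following is only a minimal Python sketch based on common textbook definitions of the F1 score (on taxon presence/absence) and of the L1 norm (on relative abundances). It assumes a profile is a dictionary mapping taxon names to relative abundances; the function names and the example profiles are hypothetical and are not the challenge's scoring code. Weighted UniFrac is left out because it additionally requires a phylogenetic tree.

```python
def f1_score(truth, predicted):
    """F1 score on the sets of taxa with non-zero abundance."""
    true_taxa = {t for t, a in truth.items() if a > 0}
    pred_taxa = {t for t, a in predicted.items() if a > 0}
    true_positives = len(true_taxa & pred_taxa)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_taxa)
    recall = true_positives / len(true_taxa)
    return 2 * precision * recall / (precision + recall)


def l1_norm(truth, predicted):
    """Sum of absolute abundance differences over all taxa in either profile."""
    taxa = set(truth) | set(predicted)
    return sum(abs(truth.get(t, 0.0) - predicted.get(t, 0.0)) for t in taxa)


# Hypothetical genus-level profiles for one sample.
truth = {"Bacteroides": 0.5, "Escherichia": 0.3, "Lactobacillus": 0.2}
predicted = {"Bacteroides": 0.6, "Escherichia": 0.2, "Clostridium": 0.2}

print(f1_score(truth, predicted))  # 2 of 3 taxa recovered, 1 false positive
print(l1_norm(truth, predicted))   # 0 means identical abundance profiles
```

Under these definitions a higher F1 score is better, whereas a lower L1 norm is better, which is one reason per-metric ranks rather than raw values are a natural way to combine them.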


The winners are:

  • 1st : Vijay Kumar Narsapuram from India – Dupont Pioneer India
  • 2nd : Emma Ghrejyan from Armenia - Center for Ecological-Noosphere Sciences, NAS RA; Russian-Armenian University
  • 3rd : Tigran Vardanyan from Armenia – ISTC labz

Presentation of results

We presented a poster with the key results at ISMB 2018.

We are now working with the challenge winners on an outcome publication "Crowdsourced-benchmarking of computational pipelines for metagenomic taxonomy profiling – the sbv IMPROVER Microbiomics Challenge" that will summarize the key learnings of this challenge.

