Distinguishing Strains in Microbiome Samples: Considerations and Challenges

Defining the term “strain” is a contentious issue within the microbiome field. Broadly speaking, a bacterial strain is a sub-group within a species with unique characteristics such as genetic makeup, metabolism, and morphology. But standards by which researchers classify strains are lacking, and the complex nature of microbiome samples makes it difficult to develop well-defined methodologies for distinguishing strains.

Distinguishing between bacterial strains is important for microbiome research because it deepens our understanding of the interaction between host and microbiome. Different strains can produce different molecules, occupy different niches, and serve different functions; thus, it is vital that researchers can identify which strains are present in their microbiome samples.

At the Microbiome Data Congress 2023 in Boston on 13th-14th November, leading microbiome researchers from industry and academia will come together to discuss pressing microbiome data challenges such as “Distinguishing Strains: Taxonomic and Genomic Considerations for Classifying Bacteria”. Learn more about the event here.

We spoke to representatives of the Microbiome Data Congress committee to get their perspective on the importance of distinguishing strains in microbiome research:

In your experience, what challenges have you encountered when trying to accurately classify bacterial strains?

Benjamin Callahan, North Carolina State University, Associate Professor, Microbiomes and Complex Microbial Communities

“Strain is an ill-defined term in this space. In some cases, strain might refer to a particular, perhaps patented, physical bacterial isolate available as a pure culture. In other cases, strain might refer to bacteria that are sufficiently closely related that their most recent common ancestor existed within some period of time, perhaps within a human lifespan or since the start of an outbreak. In still other cases, strain serves as a sort of sub-species classification scheme, referring to relatively broad clades within a species. In the absence of a more universal definition of ‘strain’, it is critically important for strain-level analyses to first clearly define what they mean by strain. Secondly, one must be aware that strain-level methods or tools may not match the relevant definition. For example, it is relatively common to see methods that talk about strain-level resolution based on, perhaps, reporting the best matching strain from a collection of reference genomes within a species. Yet, if considering the narrowest definition of a strain, it will essentially never be the case that a copy of a reference genome will actually be present in any natural sample. As one way forward, I would argue more attention needs to be spent on sub-species taxonomy, based on whole-genome phylogenetics, within a broad swathe of bacteria. Something like an extension of GTDB to increasingly fine sub-species taxonomic ranks. Especially in the microbiome profiling space, this can lead to a more valid ‘sub-species’ classification than the ‘strain-level’ classifications now often discussed.”

Can you discuss any recent advancements in technology or techniques that have improved the classification of bacterial strains in microbiome research?

Tonya Ward, Rebiotix, Senior Director, Microbiome Data Science

“I don’t think as a community we have a consensus on what a ‘strain’ really is, but we are getting close. Coming from a genomic-data lens, advances in analyses of full genomes and metagenome-assembled genomes (MAGs) at scale have been critical to help us understand the genomic relatedness of organisms we are sequencing en masse. I’m a huge fan of the Genome Taxonomy Database (GTDB, https://gtdb.ecogenomic.org/), which aims to standardize microbial taxonomy based on genome phylogeny. The broad adoption of this genome-centred approach to taxonomy, paired with an ever-expanding repertoire of microbial genomes and MAGs, is allowing us to get incrementally closer to understanding the true level of diversity in the bacterial kingdom. As sequencing costs decrease and we start to gain more and more data, a standardized approach to classifying strains will become more and more important.”

Can you explain the importance of distinguishing bacterial strains in microbiome research?

Rodrigo Bacigalupe, GSK, Head of Microbiome and Disease, Discovery Data Science

“Bacterial strains exhibit significant heterogeneity in their genetic makeup, functional capabilities, and metabolic traits. However, strain-level variation has historically been overlooked in microbiome research. Uncovering this diversity is crucial for understanding microbial communities, microbiome-host interactions, disease associations, and developing effective therapeutic strategies like probiotics and vaccines. Certain strains possess unique characteristics such as bioactive compound production, immunomodulatory capabilities, or distinct metabolic properties that can impact host health. From a clinical perspective, strain-level analysis improves diagnosis, can be used to monitor bacterial spread, and guides targeted treatment strategies. In probiotic development, selecting strains with immune-modulating properties or niche colonization abilities enhances the design of more effective formulations. Strain-level information is also essential in vaccine development, enabling the creation of targeted vaccines that induce protective immune responses against specific strains or inform the selection of suitable antigens, ensuring broader coverage and efficacy against diverse strains.”

What are some common methods used to differentiate bacterial strains in microbiome research?

Henrik Bjørn Nielsen, PhD, Chief Scientific Officer, Clinical Microbiomics

“The term ‘strain’ has, in the microbiome era, drifted away from its original meaning of an isolate that had been ‘strained’ on a growth medium. It now refers to a clonal population, including those found in our microbiomes. In my view, we should consider strains as clonal populations with a very recent common ancestor, using the phylogenetic relatedness of the strains as a framework for understanding them. This can be done using single-nucleotide polymorphisms and maximum likelihood methods to estimate phylogenetic relatedness at this level. Organizing the strains in a phylogenetic tree allows us to correlate traits or phenotypes with the evolution of the strains. This helps us understand how clades are associated with the human host’s clinical conditions or responses to treatment. Furthermore, we can place isolated strains in the tree with the strains we observe in microbiome samples. In doing so, we can link information from human studies with isolated strains and vice versa, and perhaps discover new application areas for strains.

My key message is that strains are unique, and since we cannot generalize something unique, it does not make sense to think of strain-level as a taxonomic level. We need to understand diversity at this level de novo and use phylogeny to generalize and learn about their traits.”
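
To make the idea of SNP-based relatedness concrete, here is a minimal Python sketch that counts pairwise SNP differences between already-aligned sequences. It is an illustration only, not the method used by any of the interviewees: it computes simple pairwise distances rather than a maximum likelihood phylogeny, and the function names and the toy alignment (`isolate_A`, `sample_1`, `sample_2`) are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions where two sequences differ.

    Positions with a gap ('-') or ambiguous base ('N') in either
    sequence are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ignored = {"-", "N"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignored and b not in ignored
    )


def pairwise_snp_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Return SNP distances for every pair of named sequences."""
    return {
        (x, y): snp_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment: one cultured isolate and two strains
# reconstructed from microbiome samples (invented data).
alignment = {
    "isolate_A": "ACGTACGTAC",
    "sample_1": "ACGTACGTAT",
    "sample_2": "ACGAACGTTT",
}

distances = pairwise_snp_distances(alignment)

# List the pairs from most to least closely related.
for pair, dist in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, dist)
```

In this toy data, `isolate_A` differs from `sample_1` at a single position, so the two would sit closest together in a tree. In practice, a distance matrix like this would typically be replaced or complemented by a dedicated phylogenetics tool that fits an evolutionary model.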

At Microbiome Data Congress 2023, hear speakers including NIH’s Mikhail Kolmogorov and Chan Zuckerberg Biohub’s Chunyu Zhao discuss this challenge in the “Distinguishing Strains: Taxonomic and Genomic Considerations for Classifying Bacteria” session.

Other sessions at MDC 2023 include:

  • Deciphering Microbiome Biogeography: Longitudinal and Spatial Data
  • Analyzing and Interpreting Non-Bacterial Microbiome Data
  • Methodologies for Modelling Host-Microbiome Interactions
  • Integrating Dietary and Microbiome Data
  • Comparing and Validating Analytical Tools and Statistical Methods

Book now for early bird!

Daniel Quinn