Table 2 Summary of methods used to identify and exclude background contamination in microbiota datasets

From: How low can we go? The implications of low bacterial load in respiratory microbiota studies

| Method | How does it work? | Strengths and weaknesses |
| --- | --- | --- |
| Exclusion method (e.g. [47, 52]) | All OTUs detected in negative controls are excluded as background contaminants. | Background contaminants will be excluded; however, the method is not recommended as it risks exclusion of biologically relevant OTUs that can occur in negative controls (e.g. because of barcoding errors) [5, 8, 11, 61, 62]. |
| Replicate method [8] | Contaminant OTUs excluded based on replicate data from different DNA extraction batches and negative controls. | This method is not suitable if replicate extractions are not possible. Requires subjective interpretation of negative control data. |
| Abundance ratio [5] | The ratio of mean OTU abundance in negative controls and study specimens is calculated. OTUs with a ratio > 0.001 are excluded as probable background contaminants. | Unsupervised method developed using pure bacterial cultures. Optimal threshold for defining contamination in more complex microbiota data may need to be determined. |
| Correlation analysis [56] | Spearman correlation between OTU relative abundance and bacterial load is determined. OTUs showing a strong and significant negative correlation with bacterial load (Spearman rho < −0.7) are excluded as probable contaminants. | Unsupervised method. Limited when applied to OTUs present in < 20 specimens. Significance testing requires adjustment for multiple measures. |
| Cluster analysis [32] | Hierarchical cluster analysis is used to identify and exclude specimens with high similarity to negative controls. | Unsupervised method. Exclusion of specimens (instead of reads or OTUs) may impact study design. Background contamination clusters may be difficult to define where similarity between negative controls and low bacterial load specimens is variable. |
| Neutral model [64] | Reads that are neutrally distributed and enriched in negative controls are excluded prior to OTU clustering. Additionally, only reads that are unique to clinical specimens are included in downstream analyses. | Unsupervised method. It is unclear whether restriction of downstream analyses to reads unique to clinical specimens may result in exclusion of biologically relevant taxa (e.g. because of barcoding errors). |