New insights from the biogas microbiome by comprehensive genome-resolved metagenomics of nearly 1600 species originating from multiple anaerobic digesters

Background
Microorganisms in biogas reactors are essential for degradation of organic matter and methane production. However, a comprehensive genome-centric comparison, including relevant metadata for each sample, is still needed to identify the globally distributed biogas community members and serve as a reliable repository.

Results
Here, 134 publicly available metagenomes derived from different biogas reactors were used to recover 1635 metagenome-assembled genomes (MAGs) representing different biogas bacterial and archaeal species. All genomes were estimated to be > 50% complete and nearly half ≥ 90% complete with ≤ 5% contamination. In most samples, specialized microbial communities were established, while only a few taxa were widespread among the different reactor systems. Metabolic reconstruction of the MAGs enabled the prediction of functional traits related to biomass degradation and methane production from waste biomass. An extensive evaluation of the replication index provided an estimation of the growth dynamics for microbes involved in different steps of the food chain.

Conclusions
The outcome of this study highlights a high flexibility of the biogas microbiome, allowing it to modify its composition and to adapt to the environmental conditions, including temperatures and a wide range of substrates. Our findings enhance our mechanistic understanding of the AD microbiome and substantially extend the existing repository of genomes. The established database represents a relevant resource for future studies related to this engineered ecosystem.


Detailed process for assembly and binning
Reads obtained from biogas reactors inoculated with the same inoculum were co-assembled, as were those collected from primary and secondary reactors of the same biogas plant. In contrast, samples derived from reactors using different inocula and those collected only once from a specific biogas plant were assembled individually. Before co-assembling reads obtained from different reactors, the similarity of the microbial composition among samples was verified by running MetaPhlAn2 (v2.2.0) on one million unassembled reads randomly selected from each sample [1]. This preliminary check confirmed that samples collected from the same reactor, or from reactors using the same inoculum, had on average similar microbial composition. The high diversity between groups indicated the need for a separate assembly of each group in order to minimize computational requirements and to avoid co-assembly of different strains belonging to the same species, which lowers the quality of the assembled MAGs [2].
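The random selection of a fixed number of reads per sample can be illustrated with reservoir sampling, which draws a uniform subsample in a single pass without loading the whole file. This is only a minimal sketch under stated assumptions: the source does not say which tool or method was used for subsampling, and the names `read_fastq` and `reservoir_sample` are hypothetical helpers introduced here.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# A FASTQ record is assumed to span exactly four lines (header, sequence, '+', qualities).
FastqRecord = Tuple[str, ...]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from an iterable of text lines."""
    it = iter(lines)
    while True:
        chunk = tuple(line.rstrip("\n") for line in islice(it, 4))
        if len(chunk) < 4:  # end of input (or a truncated trailing record)
            return
        yield chunk


def reservoir_sample(records: Iterable[FastqRecord], k: int, seed: int = 0) -> List[FastqRecord]:
    """Return k records drawn uniformly at random (all records if fewer than k)."""
    rng = random.Random(seed)
    reservoir: List[FastqRecord] = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)
        else:
            # Record i replaces a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = rec
    return reservoir
```

With `k = 1_000_000` this would correspond to the subsample size quoted above; the resulting reads would then be passed to the profiler.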
Reads were assembled using Megahit (v1.1.1) in "--sensitive" mode for samples having less than 40 Gb of sequenced bases and in "--large" mode for the remaining assemblies [3]. After assembly, a trial alignment with Bowtie 2 (v2.2.4) [4] was performed using 100,000 randomly selected reads per sample in order to calculate the fraction of reads aligned to each assembly. This allowed the identification of all samples having a reasonable alignment rate on each assembly (higher than 25%) and their selection for the subsequent binning step. Samples having less than 25% aligned reads were considered uninformative and were not used to determine the coverage profile of the scaffolds. Based on these preliminary results, the number of experiments considered for coverage calculation and subsequent binning ranged from 11 to 89 depending on the assembly.
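The two decision rules in this step (assembly mode by sequencing depth, and sample selection by trial alignment rate) can be sketched as follows. The function names are hypothetical, 1 Gb is assumed to mean 10^9 bases, and the mode labels simply repeat the wording of the text rather than verified Megahit command-line flags. The treatment of a sample at exactly 25% is not specified in the text; here it is excluded.

```python
from typing import Dict, List

GB = 1_000_000_000  # assumption: 1 Gb = 1e9 sequenced bases


def assembly_mode(total_bases: int, threshold_gb: float = 40.0) -> str:
    """Return the mode label used in the text for a given sequencing depth."""
    return "sensitive" if total_bases < threshold_gb * GB else "large"


def alignment_rate(aligned_reads: int, sampled_reads: int = 100_000) -> float:
    """Fraction of the randomly selected trial reads that aligned to an assembly."""
    return aligned_reads / sampled_reads


def informative_samples(rates: Dict[str, float], min_rate: float = 0.25) -> List[str]:
    """Samples whose trial alignment rate on one assembly exceeds min_rate."""
    return sorted(name for name, rate in rates.items() if rate > min_rate)
```

For one assembly, `informative_samples` would return the samples used to compute the scaffold coverage profiles for binning.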
After assembly and binning, contaminating scaffolds in each MAG were identified on the basis of their genomic characteristics (GC content and tetranucleotide composition). After the filtering step performed with RefineM [5], the "CC3 value" [CC3=Cp-(Ct*3)] (where Cp is completeness and Ct is contamination, both determined using CheckM) of each MAG was recalculated. Only 159 MAGs showed an improved "CC3 value" after contamination removal; all the remaining MAGs were kept in their initial condition (without the filtering step).
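As a worked illustration of the CC3 criterion, the sketch below computes CC3 = Cp - (Ct*3) before and after filtering and keeps the filtered MAG only when the value improves. The function names and the example numbers are hypothetical; only the formula and the keep-if-improved rule come from the text.

```python
from typing import Tuple

# (completeness %, contamination %) as estimated by a tool such as CheckM
Quality = Tuple[float, float]


def cc3(completeness: float, contamination: float) -> float:
    """CC3 = Cp - (Ct * 3), penalising contamination three times as much as completeness."""
    return completeness - 3.0 * contamination


def keep_refined(before: Quality, after: Quality) -> bool:
    """True if the filtered MAG has a strictly higher CC3 than the original."""
    return cc3(*after) > cc3(*before)


# Hypothetical example: filtering removes contamination at a small completeness cost.
print(cc3(90.0, 10.0))                            # 60.0
print(cc3(88.0, 4.0))                             # 76.0
print(keep_refined((90.0, 10.0), (88.0, 4.0)))    # True
```

A MAG that loses completeness without a compensating drop in contamination would fail the test and be retained in its unfiltered form.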
During redundancy removal, a single representative MAG was selected for each cluster: after ANI calculation, the MAG with the highest CC3 value was chosen from each cluster of MAGs belonging to the same species. These MAGs were classified into three groups according to their completeness and contamination levels: High Quality "HQ" (Cp>90%, Ct<5%), Medium-High Quality "MHQ" (90%>Cp>=70%; 5%<Ct<10%) and Medium Quality "MQ" (70%>Cp>=50%; 5%<Ct<10%).
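The representative selection and the quality grouping can be sketched as below. The `Mag` type and function names are hypothetical, and cluster membership is assumed to be already known from the ANI step. The thresholds are applied exactly as written above; combinations they do not cover (for example Cp between 70% and 90% with Ct below 5%) return `None` rather than being assigned by guesswork.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class Mag(NamedTuple):
    name: str
    cluster: str          # species-level cluster label from the ANI step (assumed given)
    completeness: float   # Cp, in percent
    contamination: float  # Ct, in percent


def cc3(mag: Mag) -> float:
    """CC3 = Cp - (Ct * 3)."""
    return mag.completeness - 3.0 * mag.contamination


def representatives(mags: Iterable[Mag]) -> Dict[str, Mag]:
    """Pick the MAG with the highest CC3 in each cluster (first one wins ties)."""
    best: Dict[str, Mag] = {}
    for mag in mags:
        current = best.get(mag.cluster)
        if current is None or cc3(mag) > cc3(current):
            best[mag.cluster] = mag
    return best


def quality_tier(cp: float, ct: float) -> Optional[str]:
    """Apply the stated thresholds literally; None if no stated group matches."""
    if cp > 90 and ct < 5:
        return "HQ"
    if 70 <= cp < 90 and 5 < ct < 10:
        return "MHQ"
    if 50 <= cp < 70 and 5 < ct < 10:
        return "MQ"
    return None
```

For instance, given two MAGs in one cluster with (Cp, Ct) of (95, 2) and (98, 6), the first is selected because its CC3 of 89 exceeds the second's 80.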

Details regarding taxonomic assignment
Taxonomic assignment reported in the text was performed as described in [6] with small modifications:
(1) The highest priority for taxonomy assignment was given to the ANI results obtained by comparing MAGs with genomes from the NCBI database. gANI calculation was performed as described in the main text, comparing the MAGs with the genomes downloaded from the NCBI microbial genome database (last accessed: May 2018). 56 MAGs showed an ANI value higher than 95% and more than 70% of genes in common with the reference species. A further 149 MAGs were also highly similar to genomes deposited in the NCBI microbial genome database, but these reference genomes were not taxonomically assigned at species level. Another 38 MAGs had an average similarity higher than 95%, but the percentage of common genes ranged between 50% and 70%; furthermore, the affiliation of these microbes at the genus level was doubtful.
(2) Intermediate priority for taxonomic classification was given to MAGs encoding 16S rRNA genes longer than 300 bp. The 16S rRNA genes were identified in each MAG with an in-house Perl script using Hidden Markov Models obtained from RNAmmer [7], and taxonomy was assigned using the RDP classifier trained on the SILVA 132 ribosomal RNA (rRNA) database [8]. Taxonomy results were compared with those obtained from ANI and from taxonomically informative proteins (PhyloPhlAn and CheckM, "step 3" below) [9,10]. Five discordant results were manually verified and corrected by removing possibly misassigned 16S rRNA genes.
(3) Results obtained from taxonomically informative proteins (PhyloPhlAn and CheckM) were used for taxonomic classification of the remaining MAGs.
Finally, the results obtained with all three methods were compared with each other in order to detect discrepancies; a discrepancy was found and manually corrected only for the MAG Candidatus Fermentibacter daniensis_AS4DglBPLU_32.
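The priority order of the three evidence types can be summarised as a simple cascade. This is a simplified sketch, not the authors' implementation: the function name and argument layout are hypothetical, and the manual verification steps and the special cases described above (reference genomes lacking a species-level assignment, MAGs with 50% to 70% shared genes) are deliberately left out.

```python
from typing import Optional


def evidence_source(
    ani: Optional[float],                # best ANI against a reference genome, in percent
    shared_gene_pct: Optional[float],    # percent of genes in common with that reference
    rrna_16s_length: Optional[int],      # length in bp of the longest 16S rRNA gene found
) -> str:
    """Return which evidence type would be used first for a MAG."""
    # (1) Highest priority: ANI > 95% with > 70% of genes in common.
    if ani is not None and shared_gene_pct is not None:
        if ani > 95.0 and shared_gene_pct > 70.0:
            return "ANI"
    # (2) Intermediate priority: a 16S rRNA gene longer than 300 bp.
    if rrna_16s_length is not None and rrna_16s_length > 300:
        return "16S rRNA"
    # (3) Otherwise: taxonomically informative proteins.
    return "marker proteins"
```

A MAG with 97% ANI but only 60% shared genes and a 1,200 bp 16S rRNA gene would therefore fall through to the 16S rRNA step in this simplified view.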
An additional verification was performed on MAGs assigned to CPR, DPANN and some other hypothetical taxa by selecting 5278 representative genomes from NCBI microbial genomes database as described previously [6], building a tree using PhyloPhlAn [9] and performing a manual inspection assisted by Dendroscope (v1.4) [11].
From the results obtained, 1,233 MAGs were taxonomically assigned using selected marker genes and an additional 212 MAGs were characterized based on 16S rRNA gene sequences; the taxonomy of the 121 remaining MAGs (mainly belonging to candidate taxa) was refined by manual inspection of their placement in a phylogenetic tree, as previously described. Only 69 out of 1,635 MAGs were assigned to known species based on the ANI comparison with the genomes deposited in NCBI (https://www.ncbi.nlm.nih.gov/genome/microbes/) (Data set S6).
The taxonomic assignment obtained from the combined evidence mentioned above (marker genes, 16S rRNA, ANI and manual inspection) was compared with that obtained from MiGA [12], and the results were in good agreement; the fraction of MAGs consistently assigned to already existing taxa varied from 68% (family) to 88% (genus) depending on the taxonomic level.
Two additional taxonomic analyses were performed using the Bin Annotation Tool (BAT) [13] and GTDB-Tk [14]. Results are available in Additional File 4.