The anaerobic digestion microbiome: a collection of 1600 metagenome-assembled genomes shows high species diversity related to methane production

Background Microorganisms in biogas reactors are essential for the degradation of organic matter and for methane production through the anaerobic digestion process. However, a comprehensive genome-centric comparison, including relevant metadata for each sample, is still needed to identify the globally distributed members of the biogas community and to serve as a reliable repository. Results Here, 134 publicly available datasets derived from different biogas reactors were used to recover 1,635 metagenome-assembled genomes (MAGs) representing different bacterial and archaeal species. All genomes were estimated to be >50% complete, and nearly half were ≥90% complete with ≤5% contamination. In most samples, specialized microbial communities were established, while only a few taxa were widespread among the different reactor systems. Metabolic reconstruction of the MAGs enabled the prediction of functional traits related to biomass degradation and methane production from waste biomass. An extensive evaluation of the replication index provided an estimate of the growth rate of microbes involved in different steps of the food chain. The recovery of many MAGs belonging to the Candidate Phyla Radiation and other underexplored taxa suggests their specific involvement in the anaerobic degradation of organic matter. Conclusions The outcome of this study highlights the high flexibility of the biogas microbiome. Its dynamic composition and its adaptability to environmental conditions, including temperature and a wide range of substrates, were demonstrated. Our findings enhance the mechanistic understanding of the anaerobic digestion microbiome and substantially extend the existing repository of genomes. The established database represents a relevant resource for future studies of this engineered ecosystem.
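The quality tiers used throughout the study can be illustrated with a short Python sketch. It assumes completeness and contamination are given in percent. The >50% completeness floor and the high-quality cut-off (≥90% completeness, ≤5% contamination) come from the text; the 10% contamination limit separating medium-high quality (MHQ) from medium quality (MQ) is a hypothetical placeholder, not the definition used for Table 1. The example genomes are invented.

```python
from collections import Counter

# From the text: all MAGs are >50% complete; HQ = >=90% complete, <=5% contaminated.
MIN_COMPLETENESS = 50.0
HQ_MIN_COMPLETENESS = 90.0
HQ_MAX_CONTAMINATION = 5.0
# HYPOTHETICAL: the MHQ/MQ boundary is not stated in this excerpt.
MHQ_MAX_CONTAMINATION = 10.0


def quality_tier(completeness: float, contamination: float) -> str:
    """Return 'HQ', 'MHQ', 'MQ' or 'discarded' for one MAG (values in %)."""
    if completeness <= MIN_COMPLETENESS:
        return "discarded"
    if completeness >= HQ_MIN_COMPLETENESS and contamination <= HQ_MAX_CONTAMINATION:
        return "HQ"
    if contamination <= MHQ_MAX_CONTAMINATION:
        return "MHQ"
    return "MQ"


if __name__ == "__main__":
    # Hypothetical (completeness, contamination) pairs, not values from the study.
    mags = [(95.2, 1.1), (91.0, 7.5), (62.0, 3.0), (70.0, 14.0)]
    print(Counter(quality_tier(cp, ct) for cp, ct in mags))
    # Share of HQ genomes reported in the study: 796 of 1,635.
    print(f"{100 * 796 / 1635:.1f}%")  # 48.7%
```

Note that a genome that is ≥90% complete but slightly too contaminated (the second example) falls into the lower tier, which is why "nearly half" rather than "most" of the genomes qualify as HQ.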

. According to the metrics recently proposed for evaluating the completeness (Cp) and contamination (Ct) of MAGs [40], 796 (~49%) were of high quality (HQ), while the remaining ones were classified as medium-high quality (MHQ) or medium quality (MQ) ( Table 1

The reactor process conditions influence the relative abundance of taxa and can cause dramatic changes. For instance, the "Bacteria/Archaea" ratio, which has a median value of ~14, was highly variable. Apart from the acidogenic reactors, where the methanogenic process was undetectable (i.e.

However, Archaea were predominant in several reactors analyzed in this study, and in 3% of all samples their abundance exceeded that of Bacteria, with a ratio of ~0.5 in a biofilm sample collected from a reactor fed with acetate ("LSBR-D200-DNA-BF"). Although some of the microbiomes were derived from sub-fractions of the samples (e.g. stable isotope labelling) or from biofilms, it is interesting to note that in reactors fed only with "methanogenic substrates", Archaea were the dominant group of the entire microbiome. Considering only biogas plants, the "Bacteria/Archaea" ratio remains within a narrower range, but it is still very flexible (from 470 in Nysted to 3.4 in Vilasana) (Fig. 4). The bacterial phylum Firmicutes, which is the most abundant taxon within the biogas microbiome, also ranged from 1.3% to 99.9% of the microbial community (Additional Fig. S1 and Additional File 5). In almost 40% of all samples analyzed, Firmicutes was not the dominant taxon; instead, Bacteroidetes, Coprothermobacter, Actinobacteria, Thermotogae, Chloroflexi and Euryarchaeota became prevalent, reaching up to 85% relative abundance within the microbiome [42]. Interestingly, in reactors where none of the previously mentioned taxa were dominant, microbial species belonging to candidate phyla reached high

substrates such as cheese whey, acetate or glucose (p-value<0.001). This suggests that the AD process can be supported by fewer than 100 species when the feedstock consists mainly of a single compound. By contrast, the degradation of complex substrates (such as sewage sludge or manure) requires the cooperation of a large cohort of microbes comprising more than 1,000 species.
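A minimal sketch of how the "Bacteria/Archaea" ratio could be computed per sample, assuming each MAG has a relative abundance (%) in each sample and a domain label. All MAG names, sample names and abundances below are hypothetical. Samples with no detectable Archaea (such as the acidogenic reactors) give an infinite ratio, and a ratio below 1 flags samples where Archaea exceed Bacteria, as in the acetate-fed biofilm sample.

```python
from statistics import median


def bacteria_archaea_ratio(abundance: dict[str, float], domain: dict[str, str]) -> float:
    """Summed bacterial over summed archaeal relative abundance in one sample."""
    bacteria = sum(a for mag, a in abundance.items() if domain[mag] == "Bacteria")
    archaea = sum(a for mag, a in abundance.items() if domain[mag] == "Archaea")
    if archaea == 0:
        return float("inf")  # no detectable methanogens
    return bacteria / archaea


# Hypothetical domain assignments and per-sample abundances (%).
DOMAIN = {"mag1": "Bacteria", "mag2": "Bacteria", "mag3": "Archaea"}
SAMPLES = {
    "reactor_A": {"mag1": 60.0, "mag2": 33.0, "mag3": 7.0},
    "biofilm_B": {"mag1": 20.0, "mag2": 13.0, "mag3": 67.0},
    "acidogenic_C": {"mag1": 50.0, "mag2": 50.0},
}

ratios = {name: bacteria_archaea_ratio(ab, DOMAIN) for name, ab in SAMPLES.items()}
# Samples where Archaea outnumber Bacteria (ratio below 1).
archaea_dominated = [name for name, r in ratios.items() if r < 1]
print(ratios, median(ratios.values()), archaea_dominated)
```

Using the median rather than the mean keeps the summary robust to infinite or very large ratios from samples with few or no Archaea.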

Analysis of the MAGs shared among different samples (Fig. 1 B) revealed that thermophilic reactors tend to share more species than mesophilic systems, which could be due to the selective pressure imposed by the high growth temperature.

Fig. S3A). This is in contrast to previous findings [5,7] showing mostly specialized microbial communities depending on the temperature regime. We hypothesize that, for most of the species considered in this study, the substrate plays a larger role than temperature in determining their abundance (Additional Fig. S3 B-C). We are aware of

processes, and this might introduce a shift into the PCoA. However, the MAGs collectively captured on average 89%, and sometimes close to 100%, of the total reads of the metagenomic datasets, and thus represent the majority of the sampled microbial communities. The collection of MAGs considered in this study does not fully open the "AD black box". However, we are getting closer to assigning a substantial number of new species to their roles in the food chain of the organic matter degradation process.
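To make the comparison of shared species concrete, the sketch below counts the MAGs shared by each pair of samples within a group (for example, thermophilic versus mesophilic reactors) and averages over all pairs. The presence threshold of 0.1% relative abundance and all sample data are hypothetical; the excerpt does not state the cut-off behind Fig. 1B. Each group needs at least two samples.

```python
from itertools import combinations
from statistics import mean

# HYPOTHETICAL presence cut-off (% relative abundance); not stated in the text.
PRESENCE_MIN = 0.1


def present(sample: dict[str, float]) -> set[str]:
    """MAGs whose relative abundance in the sample reaches the cut-off."""
    return {mag for mag, ab in sample.items() if ab >= PRESENCE_MIN}


def mean_shared(samples: list[dict[str, float]]) -> float:
    """Average number of MAGs shared by each pair of samples in a group."""
    sets = [present(s) for s in samples]
    return mean(len(a & b) for a, b in combinations(sets, 2))


# Hypothetical abundance profiles (%), keyed by MAG name.
THERMOPHILIC = [
    {"A": 5.0, "B": 2.0, "C": 0.05},
    {"A": 3.0, "B": 1.0, "D": 4.0},
    {"A": 1.0, "B": 0.5, "C": 2.0},
]
MESOPHILIC = [
    {"E": 4.0, "F": 1.0},
    {"E": 2.0, "G": 3.0},
    {"H": 5.0, "F": 0.05},
]
print(mean_shared(THERMOPHILIC), mean_shared(MESOPHILIC))
```

The result depends strongly on the presence cut-off: a MAG at 0.05% (such as "C" in the first thermophilic sample) is ignored here, so lowering the threshold would increase the overlap.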

Only a few MAGs were detected in multiple samples, and this was due to the high heterogeneity of

Initial evaluation focused on the identification of MAGs encoding a specific KEGG module.
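The module-level evaluation can be sketched in Python as follows. The sketch assumes each MAG is annotated with a set of KEGG Orthology (KO) identifiers and represents a module as a list of blocks, each satisfied by any one of several alternative KO sets; this flattens the real KEGG module definition syntax and is an assumption. A module is "complete" when no block is missing and "1 bm" when exactly one block is missing. The prevalence classes apply the thresholds used in this study (core: >90% of MAGs, soft core: 10% to 90%, shell: <10%) to modules counted as present when "complete" or "1 bm". All KO and module identifiers are hypothetical placeholders.

```python
from collections import Counter

# A block lists alternative KO sets; it is satisfied if any alternative is fully
# encoded. This is a simplification of real KEGG module definitions.
Module = list[list[frozenset[str]]]


def module_status(module: Module, kos: set[str]) -> str:
    """'complete', '1 bm' (one block missing) or 'incomplete' for one MAG."""
    missing = sum(not any(alt <= kos for alt in block) for block in module)
    if missing == 0:
        return "complete"
    return "1 bm" if missing == 1 else "incomplete"


def prevalence_class(mag_modules: dict[str, set[str]]) -> dict[str, str]:
    """Label each module core (>90% of MAGs), soft core (10-90%) or shell (<10%)."""
    n = len(mag_modules)
    counts = Counter(m for modules in mag_modules.values() for m in modules)
    labels = {}
    for module, count in counts.items():
        share = count / n
        if share > 0.9:
            labels[module] = "core"
        elif share >= 0.1:
            labels[module] = "soft core"
        else:
            labels[module] = "shell"
    return labels


# Toy module with hypothetical KO identifiers.
TOY = [
    [frozenset({"KO1"})],
    [frozenset({"KO2", "KO3"}), frozenset({"KO4"})],
    [frozenset({"KO5"})],
]
print(module_status(TOY, {"KO1", "KO4", "KO5"}))  # complete
print(module_status(TOY, {"KO1", "KO2", "KO5"}))  # 1 bm

# 20 hypothetical MAGs: M_a in all of them, M_b in half, M_c in one.
mags = {
    f"mag{i}": {"M_a"} | ({"M_b"} if i < 10 else set()) | ({"M_c"} if i == 0 else set())
    for i in range(20)
}
print(prevalence_class(mags))
```

Tolerating one missing block ("1 bm") makes the presence call robust to a single gene lost to incomplete assembly or binning, at the cost of occasionally calling a module present when it is genuinely absent.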

Considering both the complete and "1 bm" modules, only 15 "core modules" were identified in more than 90% of the HQ-MHQ MAGs. These include, for example, "C1-unit interconversion", "PRPP biosynthesis" and "glycolysis, core module involving three-carbon compounds". Another 223 "soft core modules" were present in 10% to 90% of the HQ-MHQ MAGs. Finally, 289 "shell modules" were identified in less than 10% of MAGs, including those associated with "methanogenesis", "reductive citrate cycle" and "Wood-Ljungdahl (WL) pathway". The high fraction of "soft core" and "shell" modules indicates a highly specialized microbial community, in which a small number of species perform crucial functions such as methanogenesis. Considering the "complete" and "1 bm" modules together, the median number of modules per MAG was 107.

representation of crucial KEGG modules (Fig. 6). It is interesting to note that the relative abundance

(Fig. 7 A). Coprothermobacterota are distributed over a wide range, but on average are higher than for other phyla (2.4 and 2.8) (Fig. 7). The limited growth rate of some taxa, such as Acidobacteria, has also been reported previously [54], and it was speculated that this property hampered their isolation. The high

AS27yjCOA_157, followed by Methanomicrobiales sp. AS21ysBPME_11 (Fig. 7 B). M. soehngenii was previously described as a slow-growing methanogen specialized in acetate utilization [57], and it is notable that 7 out of 9 iRep values obtained are higher than 2, while the highest

Our findings also suggest that duplication rates depend on the metabolic properties of MAGs.

and Candidatus Fermentibacteria (Fig. 7 B), suggesting that they are slow-growing members of the AD system.

value higher than 95% and more than 70% of genes in common with the reference species.
Another 149 MAGs were also highly similar to known species deposited in the NCBI microbial genome database, but these reference genomes were not taxonomically assigned at the species level. A further 38 MAGs had an average similarity higher than 95%, but the percentage of shared genes ranged between 50% and 70%. Furthermore, the affiliation of these microbes at the genus level was doubtful.
(2) Intermediate priority for taxonomic classification was given to MAGs encoding 16S rRNA genes longer than 300 bp. The 16S rRNA genes were identified for each MAG with an in-house Perl script using Hidden Markov Models obtained from RNAmmer [68], and taxonomy was assigned using the RDP classifier trained on the SILVA 132 ribosomal RNA (rRNA) database [69]. Taxonomy results were compared with those obtained from ANI and from

than 175 and a coverage value higher than five, were selected in order to determine their index of replication (iRep) using the iRep software [52].
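The iRep value summarizes how sequencing coverage changes between the replication origin and terminus of a genome: in a population where many cells are replicating, regions near the origin are present in more copies. The sketch below is a simplified illustration of that idea, not the iRep software [52]. It assumes per-window coverage values along the genome are already available, log2-transforms and sorts them, discards a fixed fraction at each extreme (5% here, an assumption), fits a least-squares line and reports the fitted coverage ratio between the highest and lowest retained windows. Values near 1 indicate little replication; higher values (e.g. above 2, as for most M. soehngenii results) indicate that a larger fraction of the population is replicating.

```python
import math


def simple_irep(window_coverage: list[float], trim: float = 0.05) -> float:
    """Peak-to-trough coverage ratio from a line fitted to sorted log2 coverage.

    Simplified illustration of the iRep idea; not the iRep software itself.
    """
    values = sorted(math.log2(c) for c in window_coverage if c > 0)
    k = int(len(values) * trim)  # windows dropped at each extreme
    values = values[k:len(values) - k]
    n = len(values)
    if n < 2:
        raise ValueError("too few windows after trimming")
    # Ordinary least-squares slope of log2 coverage against rank.
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    # Fitted log2 difference between the last and first retained window.
    return 2 ** (slope * (n - 1))


if __name__ == "__main__":
    # Hypothetical genome with a 2-fold origin/terminus coverage gradient.
    coverage = [20 * 2 ** (i / 99) for i in range(100)]
    print(round(simple_irep(coverage), 3))
    # Uniform coverage: no replication signal, ratio close to 1.
    print(round(simple_irep([15.0] * 100), 3))
```

Trimming the extremes protects the fit from windows with anomalous coverage (repeats, strain variation), but it also shrinks the estimate below the raw 2-fold gradient in the example, so values from this sketch are not directly comparable with published iRep values.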