Isopod holobionts as promising models for lignocellulose degradation

Background Isopods have colonized all environments, partly thanks to their ability to decompose the organic matter. Their enzymatic repertoire, as well as the one of their associated microbiota, has contributed to their colonization success. Together, these holobionts have evolved several interesting life history traits to degrade the plant cell walls, mainly composed of lignocellulose. It has been shown that terrestrial isopods achieve lignocellulose degradation thanks to numerous and diverse CAZymes provided by both the host and its microbiota. Nevertheless, the strategies for lignocellulose degradation seem more diversified in isopods, in particular in aquatic species which are the least studied. Isopods could be an interesting source of valuable enzymes for biotechnological industries of biomass conversion. Results To provide new features on the lignocellulose degradation in isopod holobionts, shotgun sequencing of 36 metagenomes of digestive and non-digestive tissues was performed from several populations of four aquatic and terrestrial isopod species. Combined to the 15 metagenomes of an additional species from our previous study, as well as the host transcriptomes, this large dataset allowed us to identify the CAZymes in both the host and the associated microbial communities. Analyses revealed the dominance of Bacteroidetes and Proteobacteria in the five species, covering 36% and 56% of the total bacterial community, respectively. The identification of CAZymes and new enzymatic systems for lignocellulose degradation, such as PULs, cellulosomes and LPMOs, highlights the richness of the strategies used by the isopods and their associated microbiota. Conclusions Altogether, our results show that the isopod holobionts are promising models to study lignocellulose degradation. These models can provide new enzymes and relevant lignocellulose-degrading bacteria strains for the biotechnological industries of biomass conversion.


Background
Microbiota shapes living organisms through complex interactions. Under changing environmental conditions, it may rapidly evolve and influence on the host adaptation and evolution. The microbiota is known to act on animal development, as well as animal health and evolution [1]. Since the "microbiome revolution" of the last 10 years [2], many studies have highlighted the impact of microbiota on the fitness of the host. This change in our vision of organisms led to the recent introduction of the holobiont concept. This concept considers a holobiont as a combination of a host and its associated microbial community, including bacteria, viruses and cellular organisms [3][4][5]. As a result, a holobiont is an assemblage of species that are metabolically interdependent. Interaction patterns in these systems shape the holobiont's composition [6], and conversely, bionts (i.e. members of the holobiont) can be agents of developmental plasticity that facilitate the evolution of new phenotypes in animals.

Open Access
Biotechnology for Biofuels Herbivory, and more specifically lignocellulose degradation, is one of these processes that evolved through symbiont acquisition in many animals [7]. In the context of global climate change, lignocellulose is also an important renewable and sustainable source to produce biofuels and other bioproducts [8]. It seems to be the best alternative to fossil fuels, thus attracting attention from researchers and industrials worldwide. Bacteria and fungi have traditionally been used for research of lignocellulose-degrading enzymes due to their important role in the decomposition of organic matter in ecosystems [9][10][11]. For most animals, the degradation of lignocellulose involves the cooperation of many bionts to fully achieve its deconstruction [12]. It requires a large number of enzymes that are classified in Carbohydrate Active EnZymes (also called CAZymes) families [13]. CAZymes act on lignocellulose like an enzymatic cocktail; they complement each other and work in synergy to degrade each component of the lignocellulose. Lignocellulose is mainly composed of cellulose, hemicellulose and lignin [14]. CAZymes are thus classified in three classes, depending on the targeted substrate: cellulases, hemicellulases and lignin-modifying enzymes (abbreviated hereafter LME). These three types of enzymes are classified in several CAZy families: the majority of cellulases belong to glycoside hydrolases (GH), the hemicellulases to carbohydrate esterases (CE) and to GHs, and the LME are all classified in auxiliary activities (AA). In addition, recent studies have shown the existence of oxidative cellulases classified in AA families [15]. They break down cellulose with oxidative processes, contrary to the cellulases classified as GHs that hydrolyze cellulose [16,17]. However, the recalcitrance of lignocellulose, as well as the strong demand of novel enzymes by the industry, imply to explore new models for lignocellulose degradation. Because of the insufficient quantities of enzymes produced by fungi, new research now focuses on bacterial lignocellulose-degrading CAZymes [11]. Expanding bacteria models to the holobiont might enable us to find new strategies for lignocellulose degradation, and thus to respond to the lack of resources for the energetic and chemical industries.
In this context, isopods could be very interesting models to study the lignocellulose degradation in the light of the holobiont concept. Strategies for lignocellulose degradation are different between terrestrial and aquatic isopods [18]. Aquatic isopods have developed specific strategies to degrade the lignocellulose, and for some of them without the help of microbiota [19][20][21]. The best example is Limnoria quadripunctata, a marine woodboring isopod which feeds on the cellulose without any help from the microbes [21]. Remarkably, whereas hemocyanins conventionally stand as respiratory proteins, those secreted in the hindgut of L. quadripunctata modify the lignin and thus enhance the digestibility of cellulose [19]. On the other hand, terrestrial isopods shelter rich and diverse microbial communities in all their tissues [22][23][24], implying multiple interactions in the holobiont. These communities enable an efficient digestion of the lignocellulose thanks to a complementarity between their CAZome (i.e. CAZyme repertoire) and that of their host [18,25,26]. Furthermore, since the land conquest, the CAZomes of terrestrial isopods have been enriched through several gene duplications and horizontal transfers [18]. Strategies for lignocellulose degradation in isopod therefore depend in part on the interactions within the holobiont. Moreover, it has been estimated that herbivory arose independently three times in isopods [27] and promoted Crustacea diversification [28]. As a result, we might suppose that much remains unknown on lignocellulose degradation strategies in isopods.
The main purpose of this study is to provide new features on lignocellulose degradation in the isopod holobionts. To this end, shotgun sequencing of 36 metagenomes was performed from digestive and nondigestive tissues of one freshwater and three terrestrial isopod species from several populations. In addition, we used 15 shotgun metagenomes of our previous study on the pill bug Armadillidium vulgare [25]. Combined with the host transcriptomic data, this large dataset enabled us to identify the CAZome of both host and microbiota, and microbial taxa associated with lignocellulose degradation. The comparison of different CAZomes, as well as the identification of CAZymes, PULs ("Polysaccharide Utilization Loci") and cellulosomes in isopods, highlighted the diversity of strategies for this process in isopods. Altogether, these results show that isopod holobionts are promising models to study lignocellulose degradation and to potentially discover new lignocellulose-degrading CAZymes.

Quality of metagenome and transcriptome assemblies
To build the CAZomes of the isopod holobionts, 51 metagenomic samples from digestive and non-digestive tissues of five isopod species were processed from 36 new datasets, along with 15 datasets from our previous study [25]. Samples from different origins enabled us to compare CAZomes of different populations for a single species. These samples represented a total of 5.7 billion reads, assembled into 25 million contigs including 4.7 million contigs > 1 kb length (Additional file 1).
Meanwhile, host transcriptomes were obtained from whole individuals from several populations (for more details, see [18]). Transcriptome assemblies resulted in 40,916 from 143,383 transcripts depending on the species. Assemblies showed an N50 from 1096 bp to 1523 bp depending on the transcriptome (Additional file 2). They displayed a good completeness since more than 95% of the complete genes from the arthropod core genome were present in the A. vulgare, P. dilatatus dilatatus, P. dilatatus petiti and P. pruinosus assemblies, and 83% of these genes were present in the A. aquaticus assembly (Additional file 2).

CAZyme identification in isopod holobionts
Both contigs and transcripts from metagenomes and host transcriptomes were subjected to the CAZy database (http://www.cazy.org) to identify CAZomes in isopod holobionts. The use of dbCAN2 for the identification of CAZymes resulted in more stringent criteria than in our previous studies [18,25] for the identification of CAZymes. In total, 15,834 CAZymes were identified distributed among 201 CAZy families of which 27 were specific to the hosts, 136 were found only in microbiota, and 36 were present in both (Additional files 3, 4, 5). Of these CAZymes, 12,916 belonged to the metagenomes (distributed among 174 families) and 2918 belonged to the host transcriptomes (distributed among 63 families). Eightyfour GH families were identified in the CAZomes of isopod holobionts; they represented the most abundant and diversified families of those CAZomes. GHs are a prominent group of enzymes that hydrolyze the glycosidic bonds, most of cellulases and hemicellulases belong to these families. In the metagenomes, GH13 was the most frequently occurring GH family (799 modules identified in 82,4% of the samples), while GH18 represented the most abundant GH family in transcriptomes (315 modules identified in all samples) (Additional file 3). Then, GTs families were the second most abundant ones with 55 members, of which GT2 was the most abundant GT family in metagenomes (1493 modules identified in 75% of the samples) and GT1 the most abundant in transcriptomes (333 modules identified in all samples). As regards other families, 27 CBMs, 13 CEs, 13 PLs and 9 AAs were identified.

Lignocellulose-degrading CAZymes
Selected CAZymes likely to contribute to the lignocellulose degradation were then examined in depth. Focusing on digestive enzymes in isopod holobionts, only metagenomic samples of caeca and hindgut were considered for the subsequent analyses. In total, 44 lignocellulose-degrading CAZyme families representing 3140 modules were predicted in the metagenomes (Additional file 3), distributed among 33 GH families, eight CE families and three AA families (Fig. 1). Enzymatic activities of 1313 (41.8%) of these modules were predicted by Hotpep (Additional file 6). Among them, 1074 (81.8%) could have a lignocellulosic enzyme activity and 239 (18.2%) potentially act on other substrates (e.g. pectin, chitin).
Among the 2918 CAZymes belonging to the host transcriptomes, 987 could act as lignocellulose-degrading CAZymes (Additional file 3). The enzyme activity of 567 (57.45%) of those CAZymes was predicted using Hotpep, 555 (97.9%) of which could degrade the lignocellulose (Additional file 6). They were distributed among 14 CAZy families including 11 GHs and three AAs (Fig. 1). Host lignocellulose-degrading CAZymes were thus three times less diversified and abundant than in the microbiota, and only four were specific: GH47, GH38, AA15, and AA1. Three cellulases were identified in isopod hosts: two cellulases widespread in crustaceans, belonging to the GH9 and GH30 families [20,29,30], and an LPMO, belonging to the AA15 family, that was recently identified in arthropods [17,18]. Concerning LMEs, a few laccases belonging to the AA1 family were identified in terrestrial isopods but not in the aquatic species, and some AA3s were found in the five species (Fig. 1). Finally, just like for the metagenomes, hemicellulases were the most diversified CAZymes in the host transcriptomes with 11 families identified. Compared to the terrestrial species, the microbiota of the freshwater isopod A. aquaticus seems to contribute more than the host to lignocellulose degradation ( Fig. 1). Indeed, only 11 lignocellulose-degrading CAZymes were identified from the transcriptome, whereas 43 lignocellulose-degrading CAZymes were identified from the metagenome, which is 1.5 times more than in the metagenomes of terrestrial species (containing 28 lignocellulose-degrading CAZymes on average).

Taxonomic origin of the CAZymes from microbiota
To identify the microbial communities associated to lignocellulose degradation, similarity searches of the predicted CAZymes from the metagenomes were performed against the NCBI Non-Redundant Protein database. Proteobacteria and Bacteroidetes were the most represented bacterial phyla, accounting for 56% and 36% of the identified CAZymes, respectively (Fig. 3a). The microbial communities were rich and highly diversified in all host species. Indeed, 41 bacterial orders were found in all microbiota (Fig. 3b). The taxonomic origin of the microbial CAZomes was highly different from one host to another (Figs. 3, 4), without apparent sex effect except for P. pruinosus (Fig. 4a).
Moreover, the microbial CAZomes were shaped by the environment, as for a given species these communities differed according to the host origin (field or laboratory) (Fig. 4). Compared to terrestrial isopods, the microbiota of A. aquaticus included more Bacteroidetes and less Proteobacteria (Fig. 3a). Furthermore, there was a high percentage of non-identified bacteria, notably for the communities that encode GH families in the A. aquaticus laboratory lineage, where more than half of bacterial families were unknown (Fig. 3b).

Identification of PULs and cellulosomes in the microbiota
Potential PULs were screened by searching within contigs for both gene markers of PULs: SusC and SusD. The SusC or/and the SusD conserved domains were identified in 5089 contigs. Among them, only 37 contigs encoded one sequential pair of susC and susD (Additional file 7).
They were found in all species except A. vulgare and the laboratory lineage of P. pruinosus. Unsurprisingly, all these contigs were assigned to Bacteroidetes species. In the metagenomes of A. aquaticus, three contigs harboring Sus genes were of interest as they also encoded CAZymes which might be involved in lignocellulose degradation (Fig. 5). Indeed, they could act in the breakdown of hemicellulosic substrates thanks to their GH3, GH16 and GH43 enzymes. These PULs exhibited a gene organization different from those referenced in the PUL Data-Base [31].
Potential cellulosomes were predicted by searching cohesin or dockerin modules within contigs. In total, 835 dockerin modules and 65 cohesin modules were identified in 874 contigs of the metagenomes (Additional files 8,9). In addition, 163 contigs harbored an SLH domain (Additional files 8,9), an anchoring module that helps to bind the cellulosome to the cell surface. The presence of a dockerin domain in 30 CAZyme genes provided evidence for active cellulosomes in microbiota ( Table 1). Several of those CAZymes are known to deconstruct   were also associated with dockerin domains. The taxonomic assignation of these contigs showed that potential cellulosomes were carried out by bacteria belonging to Bacteroidetes, Planctomycetes, Armatimonadetes, Rhizobiales and several unknown bacteria (Table 1).

Discussion
We investigated the repertoire for lignocellulose degradation on the scale of the holobiont in five isopod species, aquatic and terrestrial, highlighting that host and microbiota complement their CAZomes to achieve effective lignocellulose deconstruction. Despite using more stringent criteria than in our previous studies [18,25], we found a great diversity of CAZymes in all isopod holobionts. We also showed that highly different host-associated bacterial communities occurred in the terrestrial species and even within species, revealing a functional redundancy in lignocellulose degradation and as regards complementarity with the host's repertoire. In addition, we highlighted the potential involvement of microbial cellulosomes, as well as the presence of PUL systems in isopods. Isopods therefore represent an unexploited wealth of lignocellulose-degrading enzymes and natural nanomachines.
The microbiota had a prominent weight in the lignocellulose-degradation repertoire of the five isopod holobionts: the microbial lignocellulose-degrading CAZymes exceeded by three times those of the hosts. A large part was not represented in the hosts: accordingly, the diverse microbial CAZymes might complete the CAZome of the host for efficient lignocellulose degradation. Considering that host CAZymes are primarily produced in the caeca [25], the production of CAZymes in the hindgut by microbiota contributes to sequential biomass degradation in the digestive organs. Spatially, the digestion of lignocellulose is therefore performed along the different parts of the digestive tract thanks to the successive intervention of host and microbial enzymes from the foregut where the caeca (or digestive glands) open to the posterior part of the hindgut [26,32,33].
Hemicellulases are the most numerous lignocellulosedegrading CAZymes in isopod holobionts. Since hemicellulose composition varies from one plant to another, and even from one plant tissue to another [16], organisms require large repertoires of hemicellulases to be able to degrade these complex structures. As decomposers, isopods have to deal with a wide variety of foods. The diversity of hemicellulases we recorded in both host and microbiota might be an adaptive response of isopod holobionts to the complex composition of hemicellulose. Several types of cellulases belonging to endocellulases and beta-glucosidases were also identified in both host and microbiota. In contrast, no exocellulases were found in the five studied isopods. Whereas they are required by fungi to achieve the cellulose degradation [34], they are rarely found in animals, among which a few crustaceans, including five isopod species not corresponding to those presently studied [13,18,30,35,36]. Contrary to termites where exocellulases are provided by their symbionts [37][38][39], isopods may degrade the cellulose thanks to the abundance and diversity of other kinds of cellulases provided by their microbiota. Furthermore, we have identified oxidative cellulases (LPMO) belonging to AA15 family in host transcriptomes and LPMO belonging to AA10 in the metagenomes of A. aquaticus, P. pruinosus and P. dilatatus p., suggesting the existence of alternative strategies for cellulose degradation in isopod holobionts. Finally, isopod hosts might be key players of lignin modification for a better exploitation of hemicellulose and cellulose. Numerous cellobiose dehydrogenases (CDH) belonging to the AA3 family and laccases belonging to AA1 family were found in the four terrestrial host species. Laccases are among the most important LME in wood-destroying microorganisms [40,41] but CDHs are not widespread in arthropods and are even absent in insects [18]. In addition, in the five isopod species, the microbiota provides manganese peroxidases belonging to the AA2 family, as well as some AA3 enzymes. Once again, the host-microbiota cooperation could permit isopod to efficiently breakdown the lignin allowing access to other lignocellulose components.
It is interesting to note that host-microbiota contribution to lignocellulose degradation appeared different in the freshwater isopod A. aquaticus compared to the terrestrial ones. The host CAZome of A. aquaticus is less expanded than those of terrestrial species [18].
(See figure on next page.) Fig. 4 Comparative analysis of bacterial communities associated with the lignocellulose degradation in isopods, considering (a) or not (b) the effect of sex. a A principal component analysis (PCA) plot of the bacterial counts, at the family level, that characterize the bacterial community from each studied species depending of its origin and sex (51.2% of the information was extracted from the two principal components PC1 and PC2). Each dot represents the taxonomic composition of a metagenome and each color represents the host species and its origin (laboratory or field). b Phylogenetic tree of the bacterial communities of the studied isopods. All branches are drawn to scale as indicated by the scale bar Bredon et al. Biotechnol Biofuels (2020)   However, its microbial CAZome comprised 43 different lignocellulose-degrading CAZymes, which is 1.5 times more than that of terrestrial species. Moreover, the freshwater isopod did not have laccases for lignin degradation, and most of the LMEs seem to be produced by its microbiota, as described in accordance with Zimmer and Bartholmé [42]. Similarly, the CAZome of its microbiota contained a significant number of hemicellulases, which could allow A. aquaticus to degrade the wide variety of consumed plants and fungi [43]. Asellus aquaticus would thus compensate its small enzymatic repertoire thanks to its microbiota which would then provide the necessary CAZymes to achieve food digestion. We also showed that the microbiota could use cellulosomes and PULs to improve lignocellulose degradation. To our knowledge, this is the first evidence of the presence of cellulosomes and PULs in isopods. Only Zimmer [26] has suspected the presence of cellulosomes in bacteria from the hindgut of terrestrial isopods. Cellulosomes are described as "one nature's most elaborate and highly efficient nanomachines" [44]. They are multiprotein complexes where associated enzymes collaborate to degrade cellulose and hemicellulose. Each enzyme is specifically regulated allowing for a sequential intervention, thus avoiding any competitive interactions [45]. We have identified potential cellulosomes in bacteria mainly found in the hindgut of both aquatic and terrestrial isopod hosts. Many CAZymes associated with a predicted dockerin domain are known to degrade cellulose and hemicellulose, indicating an enhanced ability for the isopod microbiota to degrade lignocellulose through cellulosomes. Identified cohesin and dockerin modules belong to several bacteria species, suggesting inter-species and intraspecies cohesin-dockerin interactions [46][47][48]. PULs are other complexes first described in Bacteroidetes genomes for lignocellulose and other carbohydrates degradation [49]. They are organized in several co-regulated and colocalized genes encoding CAZymes, sensing proteins, binding proteins, and transporters [50]. We found several candidate contigs harboring PUL genes markers SusC or/ and SusD. Among them, we have predicted several PULs involved in hemicellulose degradation and affiliated to unclassified Bacteroidetes bacteria. Nevertheless, there are probably a great number of fragmented PULs in our metagenomes due to the high proportion of SusC and SusD orphan domains identified in the contigs. Isopod holobionts are therefore good candidates for research on cellulosomes and PULs, and for the discovery of new bacteria taxa encoding these complexes.
Microbial communities linked to lignocellulose degradation are highly diversified in all host species. Most of the 43 identified bacteria families belong to Actinobacteria, Bacteroidetes and Proteobacteria phyla, where many species are known to produce CAZymes [13]. Bacterial communities associated to lignocellulose degradation are different not only across species, but also between populations in a single species. The isopod microbiota is mostly composed of environmental bacteria and depends on several factors like environmental conditions, sex and season [22,23,51,52]; intra-and inter-host diversity is thus very labile. Yet, as it has been observed from previous studies in A. vulgare [24,25], there is a probable functional redundancy for the lignocellulose degradation between isopod microbiota, in particular in terrestrial host species. This functional redundancy directly reflects the holobiont concept, which considers that each biont is selected through metabolic and developmental interactions that take place in the holobiont [6]. The lignocellulose degradation is therefore an important process that shapes and drives the isopod holobiont composition, selecting the function over the individual. From this point of view, isopods differ from other well-known lignocellulose decomposers (e.g. ruminants, termites, etc.), where transient microorganisms have evolved into heritable symbionts and thus promoting the evolution of herbivory [7].

Conclusion
As part of the holobiont concept, isopods are excellent models to study lignocellulose degradation. First, they harbor diverse and rich microbial communities in their digestive tissues, likely providing them with complementary lignocellulose-degrading CAZymes. Lignocellulose degradation is therefore possible thanks to multiple interactions between the host and its microbial bionts. Second, strategies for lignocellulose digestion vary across isopod species. In the freshwater isopod A. aquaticus, the contribution of the microbiota for this process is much more important than in terrestrial species. On the contrary, marine isopods of the Limnoria genus degrade the lignocellulose without the help of any microbiota [19]. While the latter use their hemocyanins to facilitate lignocellulose digestion, terrestrial and freshwater isopod could use PUL and cellulosome systems from their microbiota, as well as specific enzymes like LPMOs to improve their digestion of lignocellulose. It is very likely that many strategies remain to be discovered in isopods, especially in marine ones, which constitute the most abundant and yet the least studied group of species. Isopods are therefore promising models for the biotechnological industries for biomass conversion. The discovery of novel CAZymes and relevant lignocellulose-degrading bacteria strains in isopod holobionts would help promote new sustainable methods and tools to replace fossil fuels. Bredon et al. Biotechnol Biofuels (2020) 13:49

Metagenomics: DNA extraction and sequencing
Prior to dissection, all individuals were surface-sterilized using sodium hypochlorite. Tissues were then dissected out using sterilized instruments. All tissues were rinsed in Ringer solution to avoid cross-contamination between tissues. Caeca and hindguts (with their contents) were kept as separate samples, and the remaining tissues (i.e. nerve cords, gonads and fat tissues) were pooled. All samples were homogenized in extraction buffer, and total DNA was purified using phenol-chloroform [53]. Equimolar amounts of DNA from seven biological replicates (except for females of A. aquaticus from the Pinail nature reserve for which only six individuals could be sampled) of the same tissue and sample type (i.e. origin and sex) were pooled. Then, prokaryotic DNA was enriched twice in each pool using the NEBNext ® Microbiome DNA Enrichment kit (New England Biolabs) according to the manufacturer's instructions. This resulted in 36 shotgun metagenomic libraries which were sequenced on an Illumina HiSeq 4000 by GenoScreen (Lille, France), generating 2 × 150 bp pair-end reads (Additional file 1).
To ensure that all host sequences were filtered, two control steps were performed. (1) ORFs from contigs were predicted using Prodigal (version 2.60; [59]) with "meta" parameter, and they were compared against the Non-Redundant Protein database (December 1, 2018) using BLASTX [60] with an E value cut-off of 0.0001. The BLAST outputs were then imported into MEGAN6 software (version 6.15; [61]) for taxonomic assignment based on the lowest common ancestor (LCA) algorithm using the NCBI taxonomy database. All contigs associated to an ORF assigned to eukaryotes were discarded. (2) Then, the remaining contigs were run through the CAT-BAT pipeline (version v4.6; [62]) for taxonomic classification and those that were assigned to eukaryotes were discarded.
As recommended by dbCAN2 authors, CAZyme counts (note that "CAZyme" refers to functional modules or domains, not genes) resulting from dbCAN assignment using HMMER tool were considered for the following analyses. For comparative analyses, CAZyme counts were normalized to even out the heterogeneity arising for differential library sizes. For each sample, CAZyme counts were divided by the number of ORFs in the metagenome or transcriptome of interest to calculate the relative abundance for each CAZyme family. Then, the normalized count of each family in each metagenome or transcriptome was calculated by multiplying the relative abundance by the lowest number of ORFs identified in corresponding datasets: 202,349 (i.e. male tissues of P. dilatatus dilatatus) for the newly sequenced metagenomes, 22,641 (i.e. female tissues of A. vulgare from the field) for A. vulgare metagenomes of our previous study and 19,473 (i.e. A. aquaticus transcriptome) for host transcriptomes.
All CAZy families known to potentially contribute to lignocellulose degradation were then selected for further analysis. However, a single CAZy family can bring together enzymes involved in a large variety of carbohydrate-modifying activities, including lignocellulose degradation. For that reason, the enzymatic activities of the CAZymes belonging to those families were predicted using Hotpep [70] in order to confirm their implication in lignocellulose degradation. When Hotpep could not predict the function of the CAZymes, we considered their most common activity reported in the CAZy database [13].

Community profiling
To identify microbial communities encoding CAZymes, ORFs annotated as CAZymes were compared with the Non-Redundant Protein database (December 1, 2018) using BLASTP [60]. An E value cut-off of 0.0001 was used and the top five hits were kept. MEGAN6 software (version 6.15; [61]) was then used for taxonomic assignation of ORFs using the NCBI taxonomy database, and to construct principal component analysis (PCA) and phylogenic tree. Results were visualized using the Phyloseq R package [71].

PUL and cellulosome identification
PULs consist of co-localized and co-regulated genes organized around an SusC-SusD gene pair and encoding proteins that degrade complex carbohydrates. They might play an important role in the breakdown of lignocellulose in Bacteroidetes species [49]. To identify potential PULs in microbiota, ORFs of metagenomes were first compared to the Pfam database (version 32.0; [72]) using hmm-search (version 3.2.1; [67]) with an E value cut-off of 0.0001 to identify conserved domains. Then, contigs encoding PUL gene markers [SusD like proteins (PF07980) and TonB-dependent receptor/SusC like proteins (PF00593)] were extracted and those encoding one sequential pair of susC and susD were kept for the following analyses. These sequences were subjected to CAT-BAT for taxonomic assignation, dbCAN2 for CAZyme annotation and Prokka (version 1.9; [73]) for gene annotation.
Cellulosomes are multi-enzyme lignocellulosic systems organized around a scaffolding and attached to bacterial cells [44]. Enzymes bind to the scaffolding thanks to interactions among cohesin and dockerin modules. To identify cellulosome systems in microbiota, we searched for cohesin Pfam domain (PF00963) and dockerin Pfam domain (PF00963) in the conserved domains predicted above, as well as for S-layer homology domain (SLH; PF00395). Then, contigs encoding those modules were subjected to CAT-BAT for taxonomic assignation and dbCAN2 for CAZyme annotation.