Transcriptomic analysis of lignocellulosic biomass degradation by the anaerobic fungal isolate Orpinomyces sp. strain C1A

Background Anaerobic fungi reside in the rumen and alimentary tract of herbivores, where they play an important role in the digestion of ingested plant biomass. The anaerobic fungal isolate Orpinomyces sp. strain C1A is an efficient biomass degrader, capable of simultaneous saccharification and fermentation of the cellulosic and hemicellulosic fractions in multiple types of lignocellulosic biomass. To understand the mechanistic and regulatory basis of biomass deconstruction in anaerobic fungi, we analyzed the transcriptomic profiles of C1A when grown on four different types of lignocellulosic biomass (alfalfa, energy cane, corn stover, and sorghum) versus a soluble sugar monomer (glucose).
Results A total of 468.2 million reads (70.2 Gb) were generated and assembled into 27,506 distinct transcripts. CAZyme transcripts identified included 385, 246, and 44 transcripts belonging to 44, 13, and 8 different glycoside hydrolase (GH), carbohydrate esterase (CE), and polysaccharide lyase (PL) families, respectively. Examination of CAZyme transcriptional patterns indicates that strain C1A constitutively transcribes a high baseline level of CAZyme transcripts on glucose. Although growth on lignocellulosic biomass substrates was associated with a significant increase in transcriptional levels in a few GH families, including the highly transcribed GH1 β-glucosidase, GH6 cellobiohydrolase, and GH9 endoglucanase, the transcriptional levels of the majority of CAZyme families and transcripts were not significantly altered in glucose-grown versus lignocellulosic biomass-grown cultures. Further, strain C1A co-transcribes multiple functionally redundant enzymes for cellulose and hemicellulose saccharification that are mechanistically and structurally distinct. Analysis of fungal dockerin domain-containing transcripts strongly suggests that anaerobic fungal cellulosomes represent distinct catalytic units capable of independently attacking and converting intact plant fibers to sugar monomers.
Conclusions Collectively, these results demonstrate that strain C1A achieves fast, effective biomass degradation by the simultaneous employment of a wide array of constitutively-transcribed cellulosome-bound and free enzymes with considerable functional overlap. We argue that the utilization of this indiscriminate strategy could be justified by the evolutionary history of anaerobic fungi, as well as their functional role within their natural habitat in the herbivorous gut. Electronic supplementary material The online version of this article (doi:10.1186/s13068-015-0390-0) contains supplementary material, which is available to authorized users.


Background
Lignocellulosic biomass is a vast and underutilized resource for the production of biofuels. Compared to current schemes that rely on edible crops, lignocellulosic biomass utilization for sugar and biofuel production offers multiple advantages. It is abundant, renewable, and alleviates the moral stigma of using edible crops for industrial purposes. Further, the utilization of available lignocellulosic biomass overcomes the need for expanding farming acreage, and the subsequent increase in input of chemical fertilizers to the environment [1][2][3].
One of the most important procedures for the production of lignocellulosic biofuels involves the utilization of enzymes to extract sugar from plant polymers. The extracted sugars are then converted into biofuel using dedicated sugar-fermenting microorganisms [4]. However, the sugar extraction process from lignocellulosic biomass is far more complicated than sugar extraction from cereal grains (mainly corn in the US) [5]. This is due to the fact that the target substrates in lignocellulosic biomass (cellulose and hemicellulose) are structural components of plant cell walls, which are chemically bound to a variety of complex macromolecules (mainly lignin) [6]. Therefore, a combination of chemical pretreatments and the addition of exogenous enzyme cocktails are required for their effective mobilization and deconstruction [7,8]. Enzymatic treatment of lignocellulosic biomass is a complex endeavor requiring multiple enzymes, a fact that significantly raises the cost of the process.
One alternative that circumvents the need for harsh pretreatments and exogenous enzyme amendments for the extraction of sugar monomers from lignocellulosic biomass is the use of specialized microbial cultures for biomass deconstruction [9][10][11]. Microbial strains capable of cellulose and/or hemicellulose degradation produce not only cellulolytic and xylanolytic enzymes targeting the backbone of these polymers, but also multiple accessory enzymes for removing side chains and breaking lignin-hemicellulose bonds [12][13][14]. Of special interest are lignocellulolytic microbes with an anaerobic fermentative mode of metabolism, since a significant fraction of the starting substrates could be recovered as fermentation end products.
The anaerobic gut fungi (Phylum Neocallimastigomycota) are unique in combining the resilience and invasiveness of fungi with the metabolic capabilities of anaerobic fermentative prokaryotes [15]. Anaerobic fungi are inhabitants of the rumen and alimentary tract of herbivores where they play an important role in the metabolism of ingested plant material [16]. It has been established that in such habitats these organisms play a role akin to their aerobic counterparts in soils and streams. By attaching themselves to plant materials, they colonize and excrete extracellular enzymes that mobilize the structural plant polymers to be available to other microbes. Anaerobic fungi possess a powerful cellulolytic and hemicellulolytic enzymatic machinery [12] that aids in the required fast and efficient degradation of plant material in its relatively short residence time within the herbivorous gut [17]. Such capabilities have been demonstrated through experimental evaluation of anaerobic fungal isolates [18][19][20][21], biochemical characterization of anaerobic fungal enzymes [12], and recent genomic analysis of their lignocellulolytic repertoire [22].
We are currently exploring the utility of an anaerobic fungal isolate (Orpinomyces sp. strain C1A, henceforth referred to as strain C1A) for use in a consolidated bioprocessing framework for biofuel production. Developing an understanding of the genetic and regulatory mechanisms that enable efficient biomass degradation by strain C1A is central to gauging its potential as a sugar extraction platform in biofuel production schemes. Our previous efforts have documented the lignocellulosic biomass-degrading capabilities of C1A [22,23] and the expansion of carbohydrate-active enzymes (CAZymes) in its genome [22]. However, key questions regarding the lignocellulolytic capabilities of strain C1A remain unanswered. For example, the patterns of differential transcription of various CAZyme families during growth on different types of substrates, especially for families mediating apparently similar enzymatic activities, are currently unclear. Similarly, the differential transcriptional patterns and putative contribution to biomass degradation of the large number of CAZyme genes identified in the C1A genome have not been investigated in anaerobic fungi. Finally, the transcriptional profiles and differential transcriptional patterns of fungal dockerin-containing (putatively cellulosome-bound) transcripts have yet to be determined in anaerobic fungi.
Here we present a detailed comparative analysis of the transcriptomic profiles of C1A when grown on four different types of lignocellulosic biomass (alfalfa, energy cane, corn stover, and sorghum), versus a soluble sugar monomer (glucose). Our analysis aimed at addressing the patterns of regulation of lignocellulosic gene transcription in C1A, the contribution of various CAZyme gene families to biomass degradation in C1A, and the significance of gene expansion and duplication observed in the C1A genome on its lignocellulolytic capabilities.

RNA-Seq output summary
A total of 468,159,494 (70.2 Gb) quality-filtered reads were used for transcriptome assembly and quantitative RNA-Seq analysis ( Table 1). The number of reads generated for each growth condition ranged from 58.61 million (8.7 Gb) in alfalfa-grown cultures to 141.24 million (21.19 Gb) in sorghum-grown cultures (Table 1). This level corresponds to 88.73X-201.77X genomic coverage, and 426.73X-1115.07X predicted cDNA coverage. The generated assembly had an N50 of 1319 bp. A total of 27,506 distinct transcripts with predicted peptides were identified in the assembly.
Table 1 General statistics of RNA-Seq output. a Genome coverage based on an estimated 100.5 Mb genome size [18]. b cDNA coverage is based on a 20.76 % genome coding density [18]. c Assembled transcript coverage is based on the total assembled transcript size (35.

b Fold change is shown as log2 expression levels compared to glucose. Italics represents significantly over-expressed (a differential expression p value <0.01 as calculated by the nbinomTest function in the R package DESeq); bold italics represents significantly under-expressed (a differential expression p value <0.01 as calculated by the nbinomTest function in the R package DESeq [44]).

Within highly transcribed GH families putatively involved in cellulose degradation (GH1, GH3, GH5, GH6, GH9, GH45, and GH48, defined using a normalized FPKM cutoff above that of pyruvate kinase, the glycolytic gene with the lowest transcriptional level under all growth conditions), only one putative cellobiohydrolase family (GH6) and one putative β-glucosidase family (GH1) were significantly upregulated in all plant biomass conditions compared to glucose. One putative endoglucanase family (GH9) showed higher transcriptional levels on all plant biomass conditions, although this upregulation was significant (p value <0.01) in only three (energy cane, corn stover, and sorghum) of the four examined growth conditions. On the other hand, GH48 cellobiohydrolases were significantly downregulated in alfalfa- and sorghum-grown cultures compared to glucose (Fig. 1; Table 2). While few, yet distinct, differential regulation patterns were observed in cellulolytic GH families, no clear family-wide up- or downregulation patterns were observed in xylanolytic families. Transcriptional levels of the GH10 putative xylanases and the GH39 and GH43 putative xylosidases did not differ significantly from glucose in any of the four plant biomass conditions (Fig. 1; Table 2).
Within GH11 xylanases, significant upregulation was observed only in sorghum-grown cultures compared to glucose-grown cultures (Fig. 1; Table 2). Collectively, these results suggest that strain C1A constitutively transcribes high levels of lignocellulolytic enzyme transcripts, even in the absence of lignocellulosic substrates, and that growth on lignocellulosic biomass elicits only a few distinct changes in the transcriptional patterns of specific GH families (Fig. 1; Table 1). This overall pattern of transcriptional change, or lack thereof, is quite distinct from the scheme utilized by aerobic lignocellulolytic fungi (e.g., Aspergillus niger and Trichoderma reesei [24,25]), where growth on lignocellulosic biomass causes a drastic induction of cellulolytic and lignocellulolytic enzymes from low, almost undetectable transcriptional levels on glucose. However, this pattern is broadly similar to the transcriptomic response observed in anaerobic lignocellulolytic bacteria (e.g., Clostridium phytofermentans, C. cellulolyticum, C. thermocellum [26][27][28]), which grow and express their CAZymes on glucose as well as on lignocellulosic biomass. On a single-transcript level, 39 (energy cane) to 48 (alfalfa) GH transcripts were significantly (p < 0.01) upregulated in biomass-grown versus glucose-grown cultures, while a broadly comparable number of transcripts (53 in sorghum to 66 in corn stover) were significantly downregulated. The majority of transcripts (192 in corn stover and energy cane, and 210 in alfalfa and sorghum), however, did not show a significant change in transcription levels (p > 0.1) (Table 3, Additional file 1: Table S1). A similar pattern was observed for CE and PL families (Table 3, Additional file 1: Table S1).
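The per-transcript counts above rest on binning each transcript by its differential expression p value and the direction of its fold change relative to glucose. The following is a minimal sketch of that binning, assuming the p values and log2 fold changes have already been exported from DESeq; the function name, the record layout, and the example values are hypothetical and are not taken from the study's data.

```python
from collections import Counter

def classify(log2_fold_change, p_value):
    """Bin one transcript using the thresholds quoted in the text:
    p < 0.01 is significant, 0.01 to 0.1 is a non-significant trend,
    and p > 0.1 is treated as no change."""
    if p_value > 0.1:
        return "no change"
    # Direction is taken from the sign of the log2 fold change (biomass vs glucose).
    direction = "upregulated" if log2_fold_change > 0 else "downregulated"
    if p_value < 0.01:
        return "significantly " + direction
    return direction

# Hypothetical (transcript id, log2 fold change, p value) records for one substrate.
records = [
    ("tx_1", 2.3, 0.001),
    ("tx_2", 0.8, 0.05),
    ("tx_3", -0.1, 0.60),
    ("tx_4", -1.9, 0.004),
    ("tx_5", -0.7, 0.08),
]

# Tally how many transcripts fall in each category.
counts = Counter(classify(lfc, p) for _, lfc, p in records)
for category, n in sorted(counts.items()):
    print(f"{category}: {n}")
```

Running the same tally once per substrate would give one column of a summary table of the kind referred to as Table 3.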
We also correlated transcriptional levels of various GH families with the composition (cellulose and hemicellulose content, Additional file 1: Table S3) of plant materials examined as growth substrates in this study. Transcriptional levels of some cellulolytic CAZyme families, e.g., GH5, GH6, GH9, GH48, and GH124, were positively correlated (Pearson correlation coefficients of 0.42, 0.81, 0.71, 0.58, and 0.62, respectively) with the substrates' cellulose content (i.e., overall normalized FPKM of the family was higher in plants with higher cellulose content). However, no such correlation was observed for GH8 or GH45 (Pearson correlation coefficients of 0.06 and −0.36, respectively). On the other hand, no clear correlation was observed between transcriptional levels of xylanase CAZyme families (GH10 and GH11) and hemicellulose content (Pearson correlation coefficient of −0.32 and −0.19, respectively). GH39 xylosidase showed a positive correlation with hemicellulose content (Pearson correlation coefficient of 0.60), while GH43 xylosidase showed a strong negative correlation with hemicellulose content (Pearson correlation coefficient of −0.93).
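As an illustration of the correlation analysis described above, the sketch below computes a Pearson coefficient between substrate cellulose content and the overall normalized FPKM of one GH family. All numbers are hypothetical placeholders (the measured compositions are in the supplementary material), and the helper name `pearson` is ours rather than the authors'.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the centered values.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical values for four substrates (alfalfa, energy cane, corn stover, sorghum).
cellulose_percent = [25.0, 38.0, 36.0, 33.0]   # placeholder cellulose content
family_fpkm = [410.0, 620.0, 575.0, 540.0]     # placeholder normalized FPKM of one family

r = pearson(cellulose_percent, family_fpkm)
print(f"Pearson r = {r:.2f}")
```

With one point per substrate, each coefficient rests on very few observations, so values such as those reported above are best read as indicating a trend rather than a precise estimate.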

Strain C1A employs multiple functionally redundant but structurally and mechanistically distinct processes for biomass degradation
To examine the relative contribution of various CAZyme families to biomass degradation under different growth conditions, we quantified the relative transcriptional levels of families putatively mediating the deconstruction of various plant polymers as a fraction of an overall specific activity. Our results (Fig. 2, Additional file 1: Table S4) demonstrate that strain C1A co-transcribes multiple functionally redundant enzymes (i.e., mediating the exact same chemical reaction and targeting the same substrate) that are, nevertheless, mechanistically and structurally distinct. While the identification of many of these genes in anaerobic fungi has been previously documented [22,29], their differential transcriptional patterns and relative contribution to biomass degradation under various growth conditions have not been previously studied. For example, transcripts of putative endoglucanases belonging to five distinct families were identified, three of which [the (α/β)8 TIM barrel retaining GH5, the (α/α)6 barrel inverting GH9, and the β barrel inverting GH45] represented >15 % of overall endoglucanases under all growth conditions (Fig. 2a). A similar high level of co-transcription of the inverting α/β barrel GH6 putative cellobiohydrolase acting on the non-reducing end of cellulose molecules and the retaining (α/β)8 TIM barrel putative cellobiohydrolase acting on the reducing end of the cellulose molecule was observed (Fig. 2b). Finally, a high co-transcriptional level of GH1 and GH3 putative β-glucosidases was also observed (Fig. 2c). Within putative xylanolytic enzymes, a similar phenomenon was observed between the retaining (α/β)8 TIM barrel GH10 putative xylanase and the retaining β-jelly roll GH11 (Fig. 2d), and the same dynamic was observed between putative xylosidases (GH39 and GH43) mediating depolymerization of xylooligomers (Fig. 2e).

Table 3 Transcriptional patterns of C1A CAZymes on various substrates. a Significantly upregulated refers to the number of transcripts with a differential expression p value <0.01. b Upregulated refers to the number of transcripts with a differential expression p value between 0.01 and 0.1. c Downregulated refers to the number of transcripts with a differential expression p value between 0.01 and 0.1. d Significantly downregulated refers to the number of transcripts with a differential expression p value <0.01. e No change refers to the number of transcripts with a differential expression p value >0.1. All p values were calculated by the nbinomTest function in the R package DESeq [44].
Interestingly, distinct shifts in the relative transcript abundances of GH families as a fraction of an overall specific activity were frequently observed (Fig. 2). Within glucose-grown cultures, the majority of putative endoglucanases belonged to GH45 (65 % of putative endoglucanases normalized FPKM in glucose-grown cultures). However, when grown on plant biomass, the relative abundance of GH45 decreased, with a concomitant increase in the relative abundance of GH9 putative endoglucanases (Fig. 3a). Similarly, growth on plant biomass was invariably associated with an increase in the relative contribution of GH6 and a reciprocal decrease in the relative contribution of GH48 to the overall cellobiohydrolase activity (Fig. 3b).
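The "fraction of an overall specific activity" used in this section can be illustrated with a short calculation: within one activity (here, endoglucanases) and one growth condition, each family's normalized FPKM is divided by the summed FPKM of all families with that activity. The sketch below uses hypothetical FPKM values and only three of the endoglucanase families; the glucose values were chosen so that GH45 comes out at the 65 % share quoted above.

```python
def relative_contribution(family_fpkm):
    """Return each family's share (%) of the summed FPKM for one activity
    under one growth condition."""
    total = sum(family_fpkm.values())
    if total <= 0:
        raise ValueError("total FPKM must be positive")
    return {family: 100.0 * value / total for family, value in family_fpkm.items()}

# Hypothetical normalized FPKM of putative endoglucanase families per condition.
endoglucanase_fpkm = {
    "glucose": {"GH5": 180.0, "GH9": 170.0, "GH45": 650.0},
    "sorghum": {"GH5": 220.0, "GH9": 480.0, "GH45": 300.0},
}

shares = {cond: relative_contribution(fpkm) for cond, fpkm in endoglucanase_fpkm.items()}
for cond, by_family in shares.items():
    summary = ", ".join(f"{fam} {pct:.0f} %" for fam, pct in by_family.items())
    print(f"{cond}: {summary}")
```

Because the shares within a condition always sum to 100 %, a rise in one family's share (for example GH9 on biomass) necessarily appears alongside a fall in another's, even if the absolute FPKM of the second family is unchanged.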

A limited number of lignocellulolytic transcripts are highly transcribed under all growth conditions
Within a single CAZyme gene family, often a large number of distinct transcripts were identified, and this was especially true for families with a high overall transcriptional activity (Table 2). Indeed, a broad positive correlation between the total FPKM level of a specific GH family and the number of transcripts identified belonging to this family was observed (Additional file 1: Figure S2). To further examine the putative variations in the contribution of specific transcripts within a GH family to biomass degradation, we examined the transcriptional levels of all individual transcripts within key GH families. Out of the large number of transcripts identified in each family (Additional file 1: Table S1), a fairly limited number (1-6) of transcripts were dominant (i.e., represented >10 % of the total normalized family FPKM under at least one growth condition) in all instances (Fig. 4). Transcriptional patterns of dominant transcripts under different growth conditions varied across different CAZyme families. In some families (e.g., GH6, GH13, and GH39), a single transcript represented the majority (>60 %) of all FPKM levels across all growth conditions. In other instances, a few (2-3) transcripts consistently represented the majority of family transcripts, with their relative abundance patterns remaining fairly stable across various growth conditions (e.g., GH18, GH43, and GH57). Within the remaining families, a significant shift in the relative transcriptional level, and hence putative contribution, was observed between different growth conditions. For example, specific transcripts in GH5 (m.22928), GH13 (m.23494), GH43 (m.5510), GH45 (m.23474), and GH48 (m.19942) appear to be highly transcribed in glucose-grown cultures, but their relative importance diminishes in lignocellulosic biomass-grown cultures.
Conversely, some transcripts appear to be prominent and differentially upregulated in lignocellulosic biomass-grown cultures, while their contribution to the overall activity dwindles in glucose-grown cultures (e.g., m.17949 and m.17964 in GH9, m.20865 in GH10, m.21149 in GH11, and m.23473 in GH45). Collectively, the results demonstrate that while some families show differential transcriptional patterns in response to growth conditions, a small, stable "core" of transcripts, especially within highly transcribed CAZyme families in strain C1A, appears to be consistently predominant.
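The dominance criterion used in this section (a transcript representing >10 % of the total normalized family FPKM under at least one growth condition) can be sketched as follows. The transcript identifiers and FPKM values are hypothetical, and the function name is ours; the example only illustrates the selection rule, not the study's actual results.

```python
def dominant_transcripts(fpkm_by_condition, threshold=0.10):
    """Return the transcripts whose share of the family's total FPKM exceeds
    `threshold` under at least one growth condition."""
    dominant = set()
    for condition, fpkm in fpkm_by_condition.items():
        total = sum(fpkm.values())
        if total <= 0:
            continue  # family not transcribed under this condition
        for transcript, value in fpkm.items():
            if value / total > threshold:
                dominant.add(transcript)
    return dominant

# Hypothetical normalized FPKM for the transcripts of one GH family.
family = {
    "glucose": {"tx_a": 700.0, "tx_b": 240.0, "tx_c": 30.0, "tx_d": 30.0},
    "sorghum": {"tx_a": 600.0, "tx_b": 50.0, "tx_c": 320.0, "tx_d": 30.0},
}

core = dominant_transcripts(family)
# Everything that never crosses the threshold is pooled as "Others".
others = set(family["glucose"]) - core
print("dominant:", sorted(core))
print("others:", sorted(others))
```

In this toy family, `tx_b` is dominant only on glucose and `tx_c` only on sorghum, which mirrors the kind of condition-dependent shift described above, while `tx_a` plays the role of a stable core transcript.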

Fungal dockerin domain (FDD)-containing transcripts
Anaerobic fungi produce cellulosomes, surface-attached structures where multiple enzymes act synergistically toward the degradation of lignocellulosic biomass. As previously described, cellulosome-bound genes in anaerobic fungi usually harbor a fungal dockerin domain (FDD) that is similar in structure to carbohydrate-binding module family 10 (CBM10) [12]. By determining FDD occurrence in all transcripts, a total of 278, 283, 292, 288, and 291 transcripts were putatively identified as "cellulosome-bound transcripts" in glucose-, alfalfa-, energy cane-, corn stover-, and sorghum-grown C1A cultures, respectively (Additional file 1: Table S5), with the absolute majority of transcripts identified in all examined growth conditions. Cellulosome-bound transcripts were affiliated with four broad categories: biomass-degrading CAZymes and accessory enzymes; hypothetical and conserved hypothetical proteins; proteases, phosphohydrolases, and protease inhibitors (serpins); and the enigmatic CotH family protein transcripts previously observed in fungal and bacterial cellulosomes and previously implicated as a structural component of the cellulosome [30]. Analysis of the transcriptional patterns of FDD transcripts under different growth conditions indicated that the relative contribution of the four major categories described above to the overall cellulosome composition did not vary significantly when C1A was grown on glucose versus plant biomass (likelihood ratio χ2 = 59.88, p value = 0.055).
Examination of FDD CAZyme and accessory transcripts (Additional file 1: Table S5) suggests the involvement of the cellulosome in all stages of cellulose (putative endoglucanases, cellobiohydrolases, and β-glucosidases), arabinoglucoxylan (putative xylanases, xylosidases, arabinofuranosidases, acetylxylan esterase, and feruloyl esterases), xyloglucan (xyloglucanases), and glucomannan (putative mannanases/mannosidases) degradation. Within a specific GH family, the relative contributions of FDD transcripts to the overall family transcriptional level varied (Table 4). Based on the number of transcripts and transcriptional activity, FDD transcripts represent the absolute majority of transcriptional activity in GH48 putative cellobiohydrolases, the majority in GH5 putative endoglucanases, roughly half the transcriptional activity in GH9 putative endoglucanases, GH10 putative xylanases, and GH43 putative β-xylosidases, and a small fraction of the transcriptional activities of GH11 putative xylanases and GH45 putative endoglucanases (Table 4).

Fig. 3 Relative contribution of dominant transcripts within various GH families under different growth conditions. Only families with overall transcriptional level under all growth conditions above 1 % that of a suite of glycolytic genes (pyruvate kinase, glyceraldehyde-3-phosphate dehydrogenase, and fructose-1,6-bisphosphate aldolase) were studied. Within these, genes were selected that represented 10 % or more of the overall normalized FPKM under any growth condition. "Others" denotes all additional transcripts that never exceeded 10 % of overall normalized FPKM under any growth condition.
Interestingly, overall expression of GH and accessory enzyme transcripts was significantly downregulated in three (alfalfa, energy cane, and sorghum) growth conditions (Additional file 1: Table S6), mainly due to the significant downregulation of GH48, a major component of the cellulosome, under these growth conditions (Table 2, Additional file 1: Table S6). Other notable contributions of the putatively cellulosome-bound, FDD-harboring transcripts to biomass degradation include the prevalence of carbohydrate esterases (3.5-5.7 % of overall FDD transcripts, depending on the growth condition) and feruloyl esterases (up to 3.8 % of overall FDD transcripts) within all FDD-harboring transcripts (Table 4), suggesting an important role of the cellulosome in the mobilization and debranching of hemicellulose backbones. In addition to CAZyme families responsible for cell wall decomposition, an important accessory transcript belonging to the swollenin/expansin enzyme family was identified as cellulosome bound. This enzyme family enables plant cell lengthening through non-catalytic disruption of hydrogen bonds in plant cell walls [31]. Homologs of this enzyme family have also been shown to enhance cell wall decomposition when utilized by microorganisms [32]. Out of the five swollenin/expansin transcripts identified, four contained an FDD and represented 89-97 %, depending on the growth condition, of the total normalized FPKM of swollenin transcripts identified in the C1A transcriptome (Table 4). Although swollenin and GH45 are structurally related [33], the predominantly cellulosomal transcriptional pattern of the non-enzymatic swollenin is in contrast to the mostly free extracellular pattern observed for GH45 transcripts. The predominance of this non-catalytic homolog in the cellulosome emphasizes its important role in cell wall weakening as an additional mechanism to enhance plant biomass degradation efficiency by cellulosomal catalytic enzymes.

Discussion
In this study, we analyzed transcriptional patterns in strain C1A when grown on plant biomass as well as soluble (glucose) substrates. Collectively, our results suggest that strain C1A constitutively transcribes a wide array of FDD-containing (i.e., putatively cellulosome-bound) and free extracellular lignocellulolytic enzymes under all examined conditions. The results also highlight the simultaneous involvement of multiple functionally redundant CAZymes in plant biomass degradation, arguably as a tool to improve the speed and extent of biomass degradation by anaerobic fungi within their natural habitat (the herbivorous gut). Finally, the results provide an in-depth evaluation of the contribution of free versus FDD-containing (i.e., putatively cellulosome-bound) enzymes in biomass degradation in strain C1A.

a Corrected FPKM values normalized by the library size, as calculated using the estimateDispersions function in the R package DESeq [44].
Our results demonstrate that strain C1A constitutively transcribes a wide array of transcripts encoding lignocellulolytic enzymes (Table 2, Additional file 1: Tables S1, S2, Figure S2). Microorganisms growing on lignocellulosic biomass invariably spend a large fraction of their carbon and energy reserves on the synthesis and export of lignocellulolytic enzymes (CAZymes). Therefore, regulation of the biosynthesis of such enzymes is key for optimal ecological fitness and resource allocation. Within model lignocellulolytic aerobic fungi, e.g., A. niger and T. reesei, growth on lignocellulosic biomass causes a drastic induction of cellulolytic and lignocellulolytic enzymes from almost undetectable transcriptional levels in glucose-grown cultures to ≈12-20 % of the overall mRNA [24,25]. This induction pattern is associated with a drastic change in the relative composition of the CAZyme transcriptome from a glucoamylase-dominated profile when grown on glucose or other soluble substrates to an endoglucanase-, cellobiohydrolase-, xylanase-, arabinofuranosidase-, acetylxylan esterase-, and polysaccharide monooxygenase-dominated profile when grown on lignocellulosic biomass [24,25]. On the other hand, multiple anaerobic prokaryotes (e.g., Clostridium cellulolyticum, C. phytofermentans, and C. thermocellum) possess constitutively expressed CAZymes, and high overall transcriptional levels of lignocellulolytic enzymes are observed in glucose-grown cultures [26][27][28]. Indeed, it is postulated that glucose sensing acts as a priming mechanism that stimulates biosynthesis of a wide range of CAZymes [26][27][28]. Our results suggest that anaerobic fungi employ a model similar to anaerobic bacteria as opposed to aerobic fungi. This conclusion is in accordance with our understanding of the ecological niche and life cycle of anaerobic fungi within their restricted habitat in the herbivorous gut.
In such an environment, the life cycle of anaerobic fungi alternates between metabolically dormant spores and hyphae germinating from spores when ingested plant biomass is encountered in the gut. Fungal germination and growth are hence invariably linked to the availability of ingested plant biomass. Therefore, spore germination, hyphal growth, and production of lignocellulolytic enzymes in anaerobic fungi are tightly linked, and it is difficult to envision a situation in which anaerobic fungi grow solely on a soluble substrate within their natural habitat. We therefore argue that, due to their ecological niche, their role as initial colonizers of plant biomass, and their sole dependence on plant biomass as a substrate within their natural habitat, the need for development of sophisticated mechanisms for regulating the expression of CAZyme genes is non-existent in anaerobic fungi. This is drastically different from what is encountered by aerobic lignocellulolytic fungi in their natural environments, where gradients in environmental conditions (temperature, pH, moisture), substrate availability (by season) and type (plant biomass vs sugars), and the relatively large residence time and degradation rates necessitate the development of regulatory processes for enzymatic biosynthesis. Nevertheless, despite this constitutive pattern of CAZyme gene transcription in anaerobic fungi, it appears that growth on plant biomass triggers a distinct response in specific CAZyme GH families and individual transcripts (Table 2; Figs. 2, 3). The rationale behind these family- and transcript-level shifts, observed mainly within GH families and transcripts involved in various aspects of cellulose degradation, remains unclear.
Another interesting characteristic of lignocellulosic biomass degradation by strain C1A is the simultaneous engagement of a large number of functionally redundant enzymes in the degradation of a single polymer (e.g., cellulose or arabinoxylan). We argue that this strategy is employed by C1A to increase the efficacy and speed of the degradation process, and hence maximize the extent of plant biomass degraded within its relatively short residence time in the herbivorous gut. The complementary nature of this strategy is further accentuated by variations in the location of the enzymes (cellulosomal vs free extracellular), the nature of the substrate targeted (chain length and side chain preferences), and the target position (e.g., reducing vs non-reducing end) within the substrate. Transcripts encoding most enzymatic activities required for the degradation of cellulose and hemicellulose are well represented in both putatively cellulosomal and non-cellulosomal fractions, allowing for the simultaneous degradation of these polymers at two distinct locations. Strain C1A simultaneously transcribes high levels of GH10 and GH11 family transcripts. GH10 enzymes are known to have broader substrate specificity, with the capability to attack xylan backbones with a high degree of substitutions and smaller xylo-oligosaccharides [34]. Therefore, such a pattern of high co-transcription allows for the instant and sustained breakdown of xylan backbone polymers regardless of their length and of the progress in side chain removal by accessory enzymes. Finally, the co-transcription of GH6 and GH48 cellobiohydrolases by C1A allows for the simultaneous targeting of the non-reducing and reducing ends of both celluloses and cellooligosaccharides in plant biomass to improve the speed and efficiency of cellulose degradation.
Third, our results highlight the importance of anaerobic fungal cellulosomes for biomass degradation. While broad upregulation in FDD transcripts was observed in plant biomass-grown versus glucose-grown cultures, no drastic changes in membership (presence/absence) of specific transcripts or composition (relative levels of specific transcripts) were observed. The results suggest that cellulosome structure does not vary considerably depending on the growth substrate, as previously suggested. Further, the FDD transcripts identified strongly suggest that cellulosomes represent distinct catalytic units capable of independently attacking and converting intact plant fibers to sugar monomers. A large number of highly transcribed transcripts are involved in the initial disruption of plant fiber architecture through non-catalytic disruption of hydrogen bonds (swollenin), mobilization of target plant polymers (feruloyl esterases), side chain removal (acetylxylan esterase, polysaccharide deacetylase), and degradation of plant polymers into sugar monomers (endoglucanases, cellobiohydrolases, β-glucosidases; xylanases and xylosidases).

Conclusions
Our work demonstrates that strain C1A constitutively transcribes a wide array of lignocellulolytic enzymes under different growth conditions. Although many of these enzymes are functionally redundant, differences in location (cellulosomal vs free extracellular), substrate preference (polymer length and substitution patterns), and target position within the substrate (e.g., reducing vs non-reducing end) allow for fast and efficient utilization of target substrates in the relatively short time frame of availability within the herbivorous gut. The utilization of this indiscriminate strategy as an ecological and evolutionary necessity, as well as the capability of anaerobic fungi to utilize a broad range of plant biomass including lignocellulosic biomass substrates, renders anaerobic fungi appealing, yet understudied, candidates for utilization in biomass conversion to sugars and biofuels.

Orpinomyces sp. strain C1A
Strain C1A was isolated from the feces of an Angus steer in our laboratory on a cellobiose-switchgrass medium as described previously [22]. Strain C1A is maintained by biweekly subculture on a cellobiose-rumen fluid medium as described previously [35].

Plant biomass
Samples of mature sorghum (Sorghum bicolor) and mature energy cane (Saccharum officinarum var. Ho02) were obtained from Oklahoma State University experimental plots in Stillwater, OK. Dried alfalfa (Medicago sativa) was obtained from a local farm and ranch supplier. Samples of corn stover (Zea mays) were obtained from the Industrial Agricultural Products Center at the University of Nebraska in Lincoln. The composition of all substrates is listed in Additional file 1: Table S5.

Experimental setup
All transcriptomic experiments were conducted in a rumen fluid-free basal medium containing (per liter) 0.5 g yeast extract, 0.47 g sodium butyrate, 2.4 g sodium acetate, 0.8 g sodium propionate, 2 g tryptone, 2 ml hemin solution (5 g L−1 in 1 M NaOH), 9.3 ml fatty acid solution (composition, ml L−1: 11.7 ml isobutyric acid, 11.7 ml valeric acid, 11.7 ml isovaleric acid, and 11.7 ml methylbutyric acid), 150 ml mineral solution I (3 g L−1 K2HPO4), 150 ml mineral solution II (composition, g L−1: 3 g KH2PO4, 6 g (NH4)2SO4, 6 g NaCl, 0.6 g MgSO4·7H2O, 0.6 g CaCl2·2H2O), 10 ml Balch vitamin solution (composition, mg L−1: 2 mg biotin, 2 mg folic acid, 10 mg pyridoxine-HCl, 5 mg thiamine-HCl, 5 mg riboflavin, 5 mg nicotinic acid, 5 mg DL calcium pantothenate, 0.1 mg vitamin B12, 5 mg PABA, 5 mg lipoic acid), and 1 ml Wolin's metal solution (composition, g L−1: 0.5 g EDTA, 3 g MgSO4·7H2O, 0.5 g MnSO4·H2O, 1 g NaCl, 0.1 g CaCl2·2H2O, 0.1 g FeSO4·7H2O, 0.1 g ZnSO4·7H2O, 0.01 g CuSO4·7H2O, 0.01 g AlK(SO4), 0.01 g Na2MoO4·2H2O, 0.01 g boric acid, 0.005 g Na2SeO4, 0.003 g NiCl2·6H2O, 0.1 g CoCl2·6H2O). After the medium was prepared, the pH was adjusted to 6.6. The medium was then dispensed under strictly anaerobic conditions as previously described [36,37]. After the medium was dispensed, sodium carbonate (6 g L−1) was added and the bottles were stoppered, sealed, and autoclaved at 121 °C for 20 min. After autoclaving, the bottles were cooled to room temperature. Bottles that were amended with plant materials were moved to an anaerobic glove bag (Coy Laboratory Products, Grass Lake, MI), where the appropriate type of plant biomass (10 g L−1) was added. The bottles were then stoppered, sealed, and removed from the glove bag, and the headspace was replaced by repeated vacuuming and repressurization with 100% CO2 (insert Balch reference).
Glucose bottles were amended with 3.75 g L−1 glucose from an anaerobic, sterile stock solution. All experiments with plant biomass and glucose were performed in duplicate. The inoculum for these experiments consisted of strain C1A grown in a rumen fluid-free cellobiose medium (same composition as above with the addition of 10 g L−1 cellobiose as the carbon source) until late log/early stationary phase. The inoculum was then centrifuged and resuspended in 20 ml of basal medium with no carbon source. The experiment was started by adding this slurry of basal medium and fungal biomass (approximately 48 mg) to the appropriate bottles described above.

RNA extraction and sequencing
RNA extraction was conducted on late log phase cultures 48 h after inoculation. Fungal biomass was harvested by vacuum filtration and ground into fine particles with a pestle under liquid nitrogen as previously described [35]. Total cellular RNA was extracted from ground fungal biomass using the Epicentre MasterPure Yeast RNA Purification kit (Epicentre, Madison, WI, USA), stored in the provided RNase-free TE buffer, and quantified using a Qubit fluorometer (Life Technologies, Carlsbad, CA, USA).
RNA-Seq analysis [38] was conducted using the HiSeq 2000 platform with 125 × 2 paired-end read chemistry at the University of Georgia Genomics Facility (Athens, GA, USA). Biological replicate sequencing libraries for all conditions (glucose, corn stover, sorghum, energy cane, and alfalfa) were created with poly-A tailed mRNA enrichment using the standard Illumina TruSeq mRNA RNA-Seq protocol (http://www.utsouthwestern.edu/labs/next-generation-sequencing-core/assets/truseqstranded-mrna-sample-prep-guide.pdf). The sequencing libraries had an average insert size of approximately 300 bp.

Transcriptome assembly and RNA-Seq quantification
To represent all biological isoforms present in the various growth conditions, the generated Illumina RNA-Seq [38] reads were assembled [39] with the de novo transcriptome assembly program Trinity [40] using previously established protocols [41]. All settings for the Inchworm, Chrysalis, and Butterfly steps were implemented according to the recommended protocol for fungal genomes, except that the "--jaccard_clip" flag was omitted because of the low gene density of anaerobic fungal genomes. The assembly process was conducted on the Oklahoma State University High Performance Computing Cluster using a dual Intel Xeon E5-2620 "Sandy Bridge" hex core 2.0 GHz CPU node with 256 GB of RAM (https://hpcc.okstate.edu/content/cowboy-overview). Quantitative levels for all assembled transcripts were generated by mapping all sequencing reads to the assembled transcripts using the short read alignment program Bowtie2 [42]. The program RSEM [43] was used to calculate all quantitative values in Fragments Per Kilobase of transcript per Million mapped reads (FPKM). To assess variability between biological replicates, the coefficient of determination (R2) was calculated between biological replicate pairs using RSEM-generated FPKM values. All FPKM values were normalized to the library size using the R package DESeq [44]. The obtained p-values were used to assess the significance of transcripts' up- and downregulation as shown in Tables 2, 3, and 4. All normalized FPKM values shown are averages of two biological replicates. Total normalized FPKM values of different GH families when Orpinomyces C1A was grown on different substrates were used in principal component analysis (PCA) using the R statistical package Labdsv [45], and the results were visualized in a biplot (Table 5).
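The two quantities used above can be illustrated with a minimal Python sketch. It computes an FPKM value from its textbook definition and the coefficient of determination between two replicates as the squared Pearson correlation. This is not the RSEM or DESeq implementation: RSEM estimates expected fragment counts and effective transcript lengths, which are simplified away here, and treating R2 as the squared Pearson correlation is an assumption. The function names and the replicate values are hypothetical.

```python
from math import fsum


def fpkm(fragments, length_bp, total_mapped_fragments):
    """Textbook FPKM: fragments per kilobase of transcript per million mapped fragments."""
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(x)
    mean_x = fsum(x) / n
    mean_y = fsum(y) / n
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = fsum((a - mean_x) ** 2 for a in x)
    syy = fsum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R2 is undefined when one replicate has zero variance")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical transcript: 500 fragments, 2,000 bp long, 10 million mapped fragments.
    print(fpkm(500, 2000, 10_000_000))

    # Hypothetical FPKM values for the same four transcripts in two replicates.
    replicate_1 = [12.0, 150.0, 3.5, 48.0]
    replicate_2 = [10.5, 160.0, 4.0, 45.0]
    print(r_squared(replicate_1, replicate_2))
```

An R2 close to 1 between replicate pairs indicates that the two libraries give consistent transcript-level estimates.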

Transcript functional annotation and CAZyme identification
Transcript annotation of all genes was conducted using a combination of homology comparison to public databases, protein domain identification, and peptide secretion signal prediction. Predicted protein sequences from the assembled transcripts were generated using the Transdecoder software portion of the Trinity package [40]. Transcripts that were present in at least one condition with an FPKM ≥1 and contained a predicted peptide-coding region were used in subsequent analysis. Predicted peptides were compared to public databases to identify the phylogeny using NCBI Blast C++ [46], where an e-value of e−5 or less was used as a cutoff for Blast classification. Signal peptide prediction was conducted using SignalP 4.0 [47] with the recommended settings and eukaryotic training set. Protein domain identification [48] was achieved using the hmmscan portion of the HMMER software package [49]. An e-value of e−4 was used as a cutoff for significance for domain assignment. All predicted peptide sequences were profiled against the PFAM 27.0 database [48] for general functional domain assignment. To specifically identify peptide sequences that are putative carbohydrate-active enzymes (CAZymes), all sequences were profiled against the Database for automated carbohydrate-active enzyme annotation (dbCAN) [50]. Sequences identified were further classified through manual curation and structural comparisons. Putative cellulosomal localization of transcripts was identified by the presence of the CBM_10 (dockerin) domain, which has previously been established as the enzyme attachment component to the cellulosome in anaerobic fungi [51].
Table 5 Transcriptional patterns of C1A cellulosomal genes on various substrates
a Significantly upregulated refers to the number of transcripts with a differential expression p value <0.01 as calculated by the nbinomTest function in the R package DESeq [44]
b Upregulated refers to the number of transcripts with a differential expression p value between 0.01 and 0.1 as calculated by the nbinomTest function in the R package DESeq [44]
c Downregulated refers to the number of transcripts with a differential expression p value <0.01 as calculated by the nbinomTest function in the R package DESeq [44]
d Significantly downregulated refers to the number of transcripts with a differential expression p value between 0.01 and 0.1 as calculated by the nbinomTest function in the R package DESeq [44]
e No change refers to the number of transcripts with a differential expression p value >0.1 as calculated by the nbinomTest function in the R package DESeq [44]
Differential transcriptional patterns between different conditions were analyzed by comparing log2[FPKM(biomass)/FPKM(glucose)] values. For inter-condition comparisons, a threshold of log2 ratio >1 or log2 ratio <−1 (corresponding to twofold over- or under-expression, respectively) was used to designate a specific transcript as significantly over- or under-expressed, respectively. Significance of transcripts' up- and downregulation computed by this method was in general agreement with the p-values determined as described in the supplementary document (Additional file 1: Table S6). To study the effect of plant biomass on the cellulosome composition, we used a likelihood ratio chi-squared test to examine the significance of differences between the relative abundances of various protein categories in glucose-grown versus plant biomass-grown cultures.
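The log2 ratio rule described in this section can be sketched in a few lines of Python. This is an illustrative sketch rather than the pipeline used in this study: the function name, the labels, and the example FPKM values are hypothetical, and the explicit rejection of non-positive FPKM values is an assumption, since the text does not state how zero values were handled.

```python
from math import log2


def classify_transcript(fpkm_biomass, fpkm_glucose, threshold=1.0):
    """Return (log2 ratio, label) for one transcript.

    A log2 ratio above +threshold is labelled over-expressed and one below
    -threshold under-expressed; threshold=1.0 corresponds to a twofold change.
    """
    # Assumption: zero or negative FPKM values are rejected instead of being
    # rescued with a pseudocount, because the ratio would be undefined.
    if fpkm_biomass <= 0 or fpkm_glucose <= 0:
        raise ValueError("FPKM values must be positive to form a log2 ratio")
    ratio = log2(fpkm_biomass / fpkm_glucose)
    if ratio > threshold:
        return ratio, "over-expressed"
    if ratio < -threshold:
        return ratio, "under-expressed"
    return ratio, "unchanged"


if __name__ == "__main__":
    # Hypothetical (biomass, glucose) FPKM pairs, for illustration only.
    examples = {
        "transcript_A": (40.0, 10.0),
        "transcript_B": (10.0, 40.0),
        "transcript_C": (15.0, 10.0),
    }
    for name, (biomass, glucose) in examples.items():
        ratio, label = classify_transcript(biomass, glucose)
        print(f"{name}: log2 ratio = {ratio:.2f} ({label})")
```

Because the inequalities are strict, a transcript with exactly a twofold difference is not flagged, which matches the ">1" and "<−1" wording of the threshold.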

Sequence availability and accession numbers
Raw sequencing reads from each condition and the assembled transcript sequences will be available at GenBank under the accession number SRX1030108 and at MG-RAST under the accession number 4667732.3. Raw and normalized transcriptional levels of all transcripts are available in Additional file 2.