Genomic features of a mesophilic cellulose degrader
The complete genome of Ccel consists of a single circular 4,068,724 bp chromosome with a GC content of 37.4%. It encodes 3390 proteins, 63 tRNAs and 24 rRNAs (Additional file 1: Table S1; GenBank Accession Number NC_011898; [18]). CAZymes are the critical enzymes that cleave, build and rearrange oligo- and polysaccharides [19]. Relative to other mesophilic cellulosome-producing clostridia such as C. acetobutylicum [20] and C. cellulovorans [21], Ccel harbors the fewest CAZyme genes (149 genes) but features the largest portfolio of cellulosomal genes, consisting of 62 dockerin-encoding genes and three cohesin-encoding genes (cipC/Ccel_0728, orfX/Ccel_0733 and Ccel_1543). The cellulosomal enzymes in Ccel are diverse and complementary in function, and include cellulases, hemicellulases (xylanases, mannanases, and arabinofuranosidases), pectate lyases and chitinases [22]. Moreover, the cellulosomal genes in Ccel tend to cluster physically along the chromosome, an organizational pattern distinct from that of C. thermocellum ATCC 27405 [23]. Among the 65 cellulosomal genes in total, we identified several clusters: i) the “cip-cel” gene cluster (Ccel_0728-0740), which encodes the major cellulosome components (including scaffoldin), most of which are cellulases [24], ii) a second cluster of 14 genes (Ccel_1229-1242) encoding exclusively secreted dockerin-containing proteins, which are probably involved in hemicellulose degradation and are herein named the “xyl-doc” gene cluster [22], iii) three small clusters (each with two genes) encoding cellulosomal enzymes (Man26A/Ccel_0752-Cel9P/Ccel_0753, PL10/Ccel_1245-CE8/Ccel_1246 and Ccel_1655-1656), and iv) one cluster (Ccel_1549-1550) of one non-cellulosomal and one cellulosomal gene.
Structure of the cellulose degradome in C. cellulolyticum
To identify the components of the cellulose degradation in Ccel, we started by characterizing the populations of transcripts in Ccel cultures under a variety of carbon sources using RNA-Seq. The carbohydrate substrates tested included i) cellulose and its derivatives glucose and cellobiose, ii) hemicellulose (using xylan from oat spelts as a representative substrate) and its derivative xylose, and iii) corn stover, a natural plant-derived residue which consists of both cellulose and hemicellulose (Additional file 2: Figure S1A). In total, 12.4 million reads were uniquely mapped to the genome, representing combined sequence coverage of 223X. After removing rRNA reads, for each of the substrates tested, 74.3% to 84.2% of the reads were mapped to previously annotated coding regions, and the remaining were either upstream of a coding sequence (CDS; thus putatively identifying a 5′-untranslated region (5′-UTR)) or mapped to unannotated or potentially mis-annotated regions.
In total, a large majority (86.0%) of the genome was actively transcribed under at least one of the conditions, while 59.5%, 59.8%, 69.3%, 67.1%, 36.4% and 63.2% of the genome were transcribed under glucose, cellobiose, xylose, cellulose, xylan and corn stover, respectively. Furthermore, 8521 regions of a total of 1.16 Mb (28.5% of the genome) were expressed under each of the substrates tested, representing a “core transcriptional glycobiome”. These regions exhibited a scattered pattern along the genome. On the other hand, 167 regions (142 overlapping with CDS and 25 within intergenic regions) with a total of merely 14,338 bp (only 0.34% of the genome) were expressed under only one substrate (129 regions were found to be cellulose-specific, among which 18 were intergenic). Thus, specificity of the transcribed loci in response to carbon substrates was manifested in the relative level of transcription, instead of their presence or absence.
For each CDS, its Normalized Transcript Abundance (NTA) under a particular substrate was determined (Additional file 3: Table S2) and then compared across the various carbon substrates supporting Ccel cultivation (Additional file 2: Figure S1B). We defined the “cellulose degradome” as the collection of genes transcribed (NTA > 1) under cellulose. The “cellulose-specific degradome” was defined as those required for degradation of cellulose but not for that of cellulose derivatives (glucose and cellobiose); specifically, a gene was included only when i) its NTA under cellulose is greater than 1, and ii) the ratio of NTA between cellulose and glucose and that between cellulose and cellobiose are both greater than 2 and the p values (statistical significance of differential expression) are both lower than 0.001.
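The two definitions above can be written as a simple filter. The following is a minimal sketch, not the pipeline used in this study: the class name, field names and the three example genes (with their NTA and p values) are hypothetical and serve only to illustrate the thresholds stated in the text (NTA > 1, both ratios > 2, both p values < 0.001).

```python
from dataclasses import dataclass


@dataclass
class GeneStats:
    """Per-gene values needed by the two definitions (field names are hypothetical)."""
    nta_cellulose: float
    nta_glucose: float
    nta_cellobiose: float
    p_vs_glucose: float      # p value, cellulose vs glucose
    p_vs_cellobiose: float   # p value, cellulose vs cellobiose


def in_cellulose_degradome(g: GeneStats) -> bool:
    # "Cellulose degradome": transcribed (NTA > 1) under cellulose.
    return g.nta_cellulose > 1.0


def _ratio(num: float, den: float) -> float:
    # Assumption: a gene with zero NTA in the reference condition is
    # treated as having an infinite ratio rather than raising an error.
    return float("inf") if den == 0 else num / den


def in_cellulose_specific_degradome(g: GeneStats,
                                    min_ratio: float = 2.0,
                                    max_p: float = 0.001) -> bool:
    # Criterion i): NTA under cellulose greater than 1.
    if not in_cellulose_degradome(g):
        return False
    # Criterion ii): both ratios above 2 and both p values below 0.001.
    return (_ratio(g.nta_cellulose, g.nta_glucose) > min_ratio
            and _ratio(g.nta_cellulose, g.nta_cellobiose) > min_ratio
            and g.p_vs_glucose < max_p
            and g.p_vs_cellobiose < max_p)


# Hypothetical example genes (values invented for illustration only).
genes = {
    "gene_a": GeneStats(10.0, 2.0, 3.0, 1e-5, 1e-4),   # passes both definitions
    "gene_b": GeneStats(10.0, 9.0, 3.0, 0.5, 1e-4),    # similar under glucose
    "gene_c": GeneStats(0.5, 0.1, 0.1, 1e-6, 1e-6),    # not transcribed (NTA <= 1)
}
specific = sorted(name for name, g in genes.items()
                  if in_cellulose_specific_degradome(g))
```

A gene such as the hypothetical gene_b, which is transcribed at similar levels under cellulose and glucose, belongs to the cellulose degradome but is excluded from the cellulose-specific degradome.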
Those CDS encoding core metabolic functions (macromolecule biosynthetic process, protein biosynthesis and primary metabolic process) are enriched in the cellulose degradome of Ccel as compared to the complete proteome encoded in the genome. Moreover, except for nucleic acid binding (GO:0003676), various Gene Ontology (GO) categories related to environmental sensing, gene regulation and polysaccharide metabolism are also enriched in the cellulose degradome of Ccel.
Differentially Expressed Genes (DEGs; including both positively and negatively regulated genes) among substrates were further identified. At the threshold of P < 0.001, 1043 DEGs were identified from the 15 pair-wise comparisons of the six substrates. Most DEGs were involved in energy production and conversion, carbohydrate transport and metabolism, and translation. In total, 650 genes were differentially expressed between any two of the conditions of glucose (monosaccharide), cellobiose (disaccharide) and cellulose (polysaccharide), and these formed three main groups (via hierarchical clustering; Figure 1A; Additional file 4: Table S3). The first class (Class C1; 342 genes) showed the highest NTA under cellulose. Of these, 63 genes also showed high NTA (Z-score > 0) under glucose. In comparison with cellulose degradome genes, the remaining 279 genes in the cellulose-specific class (cellulose-specific “degradome”) showed enrichment for ribosomal proteins (GO:0005840), oxidoreductase activity (GO:0016491), RNA binding (GO:0003723), gene expression (GO:0010467), macromolecule biosynthetic processes (GO:0009059) and protein metabolic processes (GO:0019538), among others (Figure 1B). The second class (Class C2) included 207 genes showing the highest NTA under cellobiose. Within this class, 17 genes also showed high NTA under cellulose and 25 under glucose. The remaining 165 genes were enriched for ion transport (GO:0006811), protein binding (GO:0005515) and nucleotide metabolic processes (GO:0006139). A third class of 101 genes (Class C3) showed the highest NTA under glucose among the carbon sources, with enrichment for catabolic processes (GO:0009056), carbohydrate metabolic processes (GO:0005975) and carbohydrate binding (GO:0030246).
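The counts in this paragraph can be checked with a few lines of arithmetic. This is only a bookkeeping sketch using the numbers quoted above; the variable names are our own.

```python
from itertools import combinations

# The six carbon substrates tested in this study.
SUBSTRATES = ["glucose", "cellobiose", "xylose", "cellulose", "xylan", "corn stover"]

# Every unordered pair of substrates is one pair-wise comparison.
pairs = list(combinations(SUBSTRATES, 2))
n_comparisons = len(pairs)  # 15

# Class sizes from the hierarchical clustering, as quoted in the text.
class_sizes = {"C1": 342, "C2": 207, "C3": 101}
total_genes = sum(class_sizes.values())  # 650

# Genes remaining in each class after removing those that also show
# high NTA under another substrate (numbers quoted in the text).
c1_cellulose_only = class_sizes["C1"] - 63        # 279
c2_cellobiose_only = class_sizes["C2"] - 17 - 25  # 165
```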
Surprisingly, 145 of the 148 CAZymes encoded by the Ccel genome (all except Ccel_0750, Ccel_0920 and Ccel_2109) were not found in the cellulose-specific degradome because their transcriptional levels were similar under cellulose and glucose, suggesting an unusual link between monosaccharide catabolism and the cellulose degradome in this organism. To further probe the links among the substrate-specific degradomes, we performed co-expression analysis of all CAZyme genes encoded in the Ccel genome under the different substrates.
Regulation of the cellulose degradome in C. cellulolyticum
Based on their substrate-dependent transcription patterns, the 143 CAZyme genes (except Ccel_0428, Ccel_0429, Ccel_2073, Ccel_2123 and Ccel_2442 which were not expressed under any of the carbon sources and a cohesin-encoding gene Ccel_1543) were clustered into four different groups (Figure 2A; Additional file 5: Table S4).
Carbon catabolite repression (CCR)
Group I includes 45 genes that showed higher expression levels under glucose, cellulose, xylan and corn stover relative to cellobiose and xylose, which included the “cip-cel” gene cluster (Ccel_0728-0740). Genes of this group mainly encode cellulosomal components, including scaffoldin subunits and major enzymatic subunits, which belong to GH families 5, 9, 26 and 48 and others involved in cellulose degradation. Surprisingly, most of the cellulosomal genes except the “xyl-doc” cluster belong to this group. Interestingly, the NTAs of all the 50 cellulosomal genes (not including the “xyl-doc” cluster) were correlated to each other, with highest correlation coefficients (R2 > 0.7) under glucose, cellulose, xylan and corn stover (Figure 2B; in grey).
Transcription of Group I CAZymes appears to be regulated by carbon catabolite repression (CCR), as suggested by their synchronous yet distinct differential patterns among substrates, which featured a negative correlation between NTAs and growth rate. For example, the order in average NTA of Group I genes was cellulose (or corn stover or glucose) > xylan > xylose > cellobiose (Figure 2A), while that in growth rate was cellobiose > xylose > xylan > cellulose (or corn stover or glucose) (Additional file 2: Figure S1A). Catabolite control protein A (CcpA) is thought to be one of the key CCR regulators in Bacillus subtilis [25]. CcpA belongs to the LacI family of transcriptional regulators and binds selectively to specific DNA sequences (referred to as catabolite-responsive elements, or cre) [26, 27]. Recently an 18-nt cre-like motif with 3 mismatches (TGTGTACGCGTTTATATT) was found upstream of the “cip-cel” gene cluster in Ccel; it was shown to be involved in regulating at least cipC by a CCR mechanism [28]. The Ccel genome has five genes (Ccel_1005, Ccel_1438, Ccel_2999, Ccel_3000 and Ccel_3464) that encode putative regulators of the LacI family. Among these, the protein sequence of Ccel_1005 has the highest identity and similarity (34% and 55%, respectively) to that of B. subtilis CcpA. The four other proteins are slightly less similar to CcpA (e.g., 25% and 46% in identity and similarity for Ccel_2999; 26% and 44% for Ccel_3000) but more conserved in the DNA-binding helix-turn-helix (HTH) domains. We therefore propose the name CcpA for Ccel_1005, while the other four LacI-family regulators are named herein LfpC1, LfpC2, LfpC3 and LfpC4 (LacI family proteins in C. cellulolyticum). Surprisingly, the expression levels of two neighboring genes, lfpC2 and lfpC3 (Ccel_2999 and Ccel_3000, respectively), were strongly negatively correlated with the average expression levels of the “cip-cel” gene cluster across the different carbon sources, with a correlation coefficient (R2) of 0.79 (Figure 2C).
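For readers who wish to reproduce this kind of comparison from the NTA table, the sketch below computes a Pearson correlation coefficient r and its square. It assumes that the reported R2 is the squared Pearson coefficient of a simple linear fit, which the text does not state explicitly; the six value pairs are hypothetical placeholders and are not the measured NTAs.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical placeholder values, one per carbon source (NOT measured data):
# NTA of a putative regulator and average NTA of a gene cluster.
regulator_nta = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cluster_nta = [60.0, 52.0, 45.0, 30.0, 22.0, 10.0]

r = pearson_r(regulator_nta, cluster_nta)
r_squared = r * r  # the sign of r carries the direction; R2 alone does not
```

Because R2 is always non-negative, the direction of the relationship (here, negative) has to be read from the sign of r or from the plot itself.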
Meanwhile, certain cre consensus-like sequences, possibly recognized by CcpA, LfpC1, LfpC2, LfpC3 and LfpC4, were determined via MEME [29] based on predicted DNA-binding motifs of these transcription factors [30]; the two center positions of the predicted putative 16-nt motifs were limited to “CG” owing to the conservation of this nucleotide pair in the CcpA binding site consensus sequences (Additional file 6: Figure S2). Genome-wide scanning of intergenic regions using FIMO [29] revealed 110 putative cre sites (18, 17, 20, 27 and 28 sites recognized by CcpA, LfpC1, LfpC2, LfpC3 and LfpC4, respectively) in Ccel (Additional file 7: Table S5). However, only seven CAZyme genes carried a cre site motif in their upstream regions, and these motifs were recognized by LfpCs but not by CcpA. Five of the seven genes (cel9Q/Ccel_0231, cipC/Ccel_0728, Ccel_0755, Ccel_1207 and Ccel_1439) belong to Group I (Additional file 7: Table S5). Notably, the putative cre site (AAGTTATCGTTAATTA) we identified for the “cip-cel” cluster was distinct from, and 87 bp upstream of, the previously reported cre site [28], suggesting the presence of multiple cre sites within the upstream region of the “cip-cel” cluster. Thus the majority of cellulosomal genes might be regulated by CcpA-independent CCR, such as that mediated by GlyR3 [31], CcpC [32] or CcpN [33].
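The central “CG” constraint on the 16-nt motifs can be illustrated with a toy scanner. This sketch is a deliberate simplification and is not how MEME or FIMO work (both use position-specific scoring, and FIMO scans both strands): it slides a window along a single strand, keeps windows whose two center positions are “CG”, and counts mismatches against a fixed consensus string. The function names, the mismatch threshold and the flanking bases of the toy sequence are assumptions; only the 16-nt site itself is taken from the text.

```python
def has_central_cg(site: str) -> bool:
    """True if the two center positions of an even-length site are 'CG'."""
    mid = len(site) // 2
    return site[mid - 1:mid + 1] == "CG"


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_forward_strand(seq: str, consensus: str, max_mismatches: int = 3):
    """Return (offset, window) for windows with central CG and few mismatches."""
    seq = seq.upper()
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if has_central_cg(window) and mismatches(window, consensus) <= max_mismatches:
            hits.append((i, window))
    return hits


# The putative 16-nt cre site reported in the text for the "cip-cel" cluster.
CIP_CEL_SITE = "AAGTTATCGTTAATTA"

# Toy sequence: the site embedded in arbitrary flanking bases (hypothetical).
toy_region = "GGGG" + CIP_CEL_SITE + "CCCC"
hits = scan_forward_strand(toy_region, CIP_CEL_SITE)
```

In this toy example the only window with “CG” at its center is the embedded site at offset 4, so a single hit is returned.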
Two-component systems (TCSs)
Group II includes 49 genes that showed high expression specifically on one substrate (e.g. cellulose, cellobiose, xylose or xylan) (Figure 2A). These genes encode noncellulosomal enzymes from GH10, 51, 94 and other GH and GT families (Additional file 5: Table S4). In particular, the genes encoding xylanases (GH8: Ccel_1258; GH10: Ccel_2319, 2320, 0153), a xylosidase (GH3: Ccel_1139) and arabinofuranosidases (GH51: Ccel_1255 and 1221) were highly expressed specifically under xylan. Cellobiose/cellodextrin phosphorylase genes (GH94: Ccel_3412 and 2109) were expressed specifically under cellulose, while hemicellulase genes (GH18: Ccel_2893, 0643 and 2820; GH23: Ccel_0815) and some glycosyltransferase genes (Ccel_0486, 3410, 1334, 0333) were expressed specifically under xylose.
Group III is mainly the “xyl-doc” gene cluster (Ccel_1229-1242) that exhibited higher expression levels under corn stover than other carbon sources (Figure 2A; Additional file 5: Table S4). The low expression of “xyl-doc” cluster genes on xylan from oat spelts indicates that they hydrolyze hemicellulose other than the xylan from oat spelts. They also encode cellulosomal components, which belong to GH43, 27, 10 and other families involved in hemicellulose degradation. The remaining CAZymes are collectively assigned to Group IV, which are mainly non-GH family enzymes, such as members of the GT1 family (Figure 2A; Additional file 5: Table S4).
Further analysis revealed that transcription of 76 CAZymes, comprising noncellulosomal enzymes (Group II; 49 genes) and cellulosomal hemicellulase components (including the “xyl-doc” gene cluster; Group III; 27 genes), might be regulated by a TCS mechanism. Ccel possesses 37 putative TCSs, eleven of which were flanked by genes encoding Group II and Group III CAZymes and putative sugar ABC transporters (Figure 3A). In these loci, the CAZyme genes exhibited expression patterns similar to those of the ABC transporter genes (if both were highly expressed) under the carbon sources tested. Thus CAZymes of Group II, Group III and ABC transporters appeared to be co-regulated by TCSs. Our results were confirmed by a recent study which showed that one TCS (XydS/XydR; Ccel_1227/1228) transcriptionally regulates Group III CAZymes (the “xyl-doc” gene cluster; [34]). Meanwhile, genes encoding sugar-binding proteins (SBP) were found in two loci (named sbp1 and sbp2, respectively) that encoded ABC transporter genes and TCS genes (Figure 3A; TCS-loci Category I). For example, Ccel_2109-2115 encoded one CAZyme (Ccel_2109; encoding a cellodextrin phosphorylase named cdpA), cellulose-utilization associated ABC transporters (Ccel_2112-2110, named cuaA, cuaB and cuaC), and a TCS (Ccel_2115-2113, named cuaD, cuaS and cuaR) (Figure 3B). Expression of cuaA (Ccel_2112), which encodes a potentially periplasmic high-affinity solute-binding protein, exhibited substrate specificity (with the highest level under cellulose) and was strongly correlated with that of cdpA under different carbon sources (R2 = 0.97) (Figure 3C). However, the sbp2 gene (cuaD) was expressed constitutively with the TCS as an operon at a low level (<0.2% of sbp1 (cuaA) on cellulose) under each of the carbon sources. Moreover, upstream of cdpA, cuaA and cuaB, there is a conserved sequence motif that might serve as a putative binding site of CuaR (a TCS response regulator harboring an “AraC”-family DNA-binding domain) (Figure 3B).
Therefore, SBP2 may be a “signal collector” of the TCS. When an extracellular sugar molecule is specifically bound to SBP2, the complex formed may activate the sensor histidine kinase, which can phosphorylate a cognate response regulator (e.g. CuaR). Subsequently the phosphorylated regulator may promote the expression of genes encoding Group II CAZymes and ABC transporters, which specifically hydrolyze polysaccharides and transport the hydrolysates, respectively.
Thus via CCR control, cellulosomal genes (except the “xyl-doc” cluster) were induced under recalcitrant carbon sources (cellulose and corn stover) and repressed under cellobiose and xylose. On the other hand, via TCS regulation, noncellulosomal enzymes, cellulosomal hemicellulases encoded by the “xyl-doc” cluster and ABC transporters were induced in a substrate-specific manner.
Therefore, the CAZyme components of the cellulose degradome can be classified into two categories: i) the “core” proteins (Group I) which are required for cellulose degradation, and ii) the “accessory” proteins (Group II and III) which are not required for cellulose degradation. Furthermore, transcriptional regulation of the core is associated with CCR, while that of the accessory is linked to TCS.
Activation of cellulose degradation by glucose in C. cellulolyticum
Curiously, the NTA of most of the Group I genes were over four times higher under glucose than under cellobiose, xylose or xylan (Figure 2A, Additional file 3: Table S2), suggesting glucose induced transcriptionally at least part of the cellulose degradome. To test whether the NTA upregulation led to elevation in protein abundance, the secreted proteomes of Ccel under glucose and cellobiose were analyzed via label-free quantitative proteomics using LC-MS/MS. At the protein level, the number and yield of cellulosomal components under glucose were significantly higher than under cellobiose: for example, 13 cellulosomal components were identified under glucose, but only five components were found under cellobiose (Additional file 8: Table S6).
To test the hypothesis of glucose-based promotion of cellulase expression and cellulose degradation, we cultured Ccel on single or mixed carbon sources of cellulose (3 g/L) and glucose (3 g/L). Under the dual substrate, mid-log phase was reached ~24 hours earlier than under glucose alone and ~48 hours earlier than under cellulose alone (Figure 4A), suggesting faster cellulose degradation when glucose is present. Moreover, cellulose degradation under the dual substrate was ~50 hours faster than that under cellulose alone (Figure 4B), while the glucose utilization rate under the dual substrate was similar to that under glucose alone (Figure 4D). Quantitative RT-PCR (qRT-PCR) revealed that the transcription levels of eight genes encoding major cellulosomal components in Group I (six of them from the cip-cel cluster) under glucose or glucose-cellulose were significantly higher than (for six genes) or equal to (for two genes) those under cellulose (Figure 4C). In particular, the two main cellulosomal genes in the cip-cel cluster, Ccel_0728 (cipC) and Ccel_0729 (cel48F), were transcribed at a significantly higher level (2-fold) under the dual substrate than under cellulose alone. Thus glucose enhanced degradation of cellulose by inducing expression of the cellulosomal genes in Ccel.
To test whether the inductive effect of glucose on cellulose degradation is dependent on glucose concentration, we cultured Ccel on cellulose mixed with a gradient of glucose (0.5-8.0 g/L) or with cellobiose (4 g/L). The culture under cellulose alone was used as the control (Figure 4E). The peak cellulolysis rate decreased with increasing concentrations of the glucose supplement (Figure 4F): the rates under lower glucose supplements (0.854 and 0.622 g/L/Day under 0.5 and 1.0 g/L, respectively) were up to 41% higher than that of cellulose alone (0.607 g/L/Day), while those under higher glucose supplements (0.469, 0.449 and 0.434 g/L/Day under 2, 4 and 8 g/L, respectively) were 23% to 29% lower than that of the control (but still higher than that under 4 g/L cellobiose (0.305 g/L/Day)). On the other hand, the lag time (the time taken to reach the peak cellulose degradation rate) under higher glucose supplements (4.44, 4.10 and 3.90 Day under 2, 4 and 8 g/L, respectively) was shorter by 1.42-1.96 Day than that of the control (5.86 Day), while that under lower glucose (0.5 and 1 g/L) was only 0.76-1.24 Day shorter than that of the control. Thus glucose supplementation promotes cellulose degradation by inducing cellulase transcription at low concentrations.
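The percentages and lag-time differences quoted above follow directly from the reported rates. The short calculation below reproduces them; it uses only numbers stated in the text, and the variable and function names are our own.

```python
# Peak cellulolysis rates (g/L/Day) as quoted in the text.
CONTROL_RATE = 0.607                      # cellulose alone
GLUCOSE_RATES = {0.5: 0.854, 1.0: 0.622,  # keyed by glucose supplement (g/L)
                 2.0: 0.469, 4.0: 0.449, 8.0: 0.434}
CELLOBIOSE_RATE = 0.305                   # 4 g/L cellobiose supplement


def percent_change(rate: float, control: float = CONTROL_RATE) -> float:
    """Relative change of a rate versus the cellulose-alone control, in percent."""
    return 100.0 * (rate - control) / control


rate_change = {conc: percent_change(rate) for conc, rate in GLUCOSE_RATES.items()}

# Lag times (Day) as quoted in the text for the higher glucose supplements.
CONTROL_LAG = 5.86
HIGH_GLUCOSE_LAG = {2.0: 4.44, 4.0: 4.10, 8.0: 3.90}
lag_reduction = {conc: round(CONTROL_LAG - lag, 2)
                 for conc, lag in HIGH_GLUCOSE_LAG.items()}

for conc in sorted(rate_change):
    print(f"{conc:>4} g/L glucose: {rate_change[conc]:+.1f}% vs cellulose alone")
```

The largest increase (at 0.5 g/L glucose) is about 41%, and the decreases at 2, 4 and 8 g/L fall between roughly 23% and 29%, matching the ranges stated in the text; all glucose-supplemented rates remain above the rate under the cellobiose supplement.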
Such glucose induction of cellulase transcription and cellulolysis, and its dependency on glucose concentration, appear to be unique, as they have not previously been reported in this or any other microorganism [28, 35]. Several lines of evidence suggest that glucose is an edible but not preferred carbon source of Ccel, which potentially explains this surprising trait: i) Ccel growth was much slower under glucose than under cellobiose [36] or xylose and xylan (Additional file 2: Figure S1A); ii) under the glucose-cellulose mixture, Ccel cells did not exhaust glucose, which remained at ~1 g/L from mid- to late-log phase (Figure 4D); iii) the NTAs of putative glucokinase genes (Ccel_0700 and Ccel_3221; glucokinase is the first enzyme in the Embden-Meyerhof pathway) under glucose were 36% to 58% lower than under other soluble sugars such as xylose and cellobiose (Additional file 3: Table S2); iv) under higher glucose supplements (4 and 8 g/L), the peak cellulolysis rates (0.449 and 0.434 g/L/Day) were higher than that under the 4 g/L cellobiose supplement (0.305 g/L/Day; Figure 4F), consistent with the report that repression of the cip-cel cluster by cellobiose was more drastic than that by glucose [28]. Therefore, the activation of cellulase transcription by a non-preferred carbon source (i.e., glucose) and its inhibition by a preferred substrate (i.e., cellobiose) in Ccel can be explained by the CCR mechanism.
A molecular model of the cellulose degradome in C. cellulolyticum
In view of the above, we propose a structural and regulatory model for the cellulose degradome in Ccel (Figure 5). In this model, utilization of cellulose requires at least three functional classes of proteins: CAZymes that catalyze cellulose hydrolysis, ABC transporters of the hydrolysates, and the signal transduction systems (CCR and TCS). The cellular degradation of cellulose consists of five steps: (A) When Ccel is grown on mineral medium with a lignocellulose substrate (including both pentose and hexose) or non-preferred monosaccharides (e.g., glucose) as the sole carbon source, the CCR mechanism is relieved, leading to low levels of intracellular glycolytic intermediates. Consequently, a homologue of the phosphocarrier proteins (Crh (catabolite repression HPr)-like protein, Ccel_0806) remains dephosphorylated and prevents the CcpA homologues, such as LfpC2 (Ccel_2999) or LfpC3 (Ccel_3000), from inhibiting the transcription of the major cellulosomal genes (except the “xyl-doc” gene cluster), or activates their expression via other regulators. (B) As a result, the cellulosomal components are expressed, secreted and assembled into cellulosomes anchored on the cell surface, which catalyze hydrolysis of the lignocellulose. (C) The soluble saccharides resulting from lignocellulose hydrolysis are captured by sugar-binding proteins (SBP2); the signal is transduced into cells via the intramembrane-sensing histidine kinase of the TCSs. The histidine kinase phosphorylates the response regulator, which activates expression of ABC transporter and CAZyme genes. (D) The temporal synergy and functional complementarity between the transcriptionally upregulated CAZymes may then accelerate lignocellulose degradation, releasing soluble sugars. (E) ABC transporters, whose transcription is also activated via the TCS, transport the extracellular soluble sugars into the cell and feed them into the glycolysis pathway.
The resultant high concentrations of glycolytic intermediates would inhibit the expression of cellulosomal genes via CCR, thus closing this five-step cycle of regulated cellulose degradation.