Global transcriptome analysis of Clostridium thermocellum ATCC 27405 during growth on dilute acid pretreated Populus and switchgrass
- Charlotte M Wilson1, 2,
- Miguel Rodriguez Jr1, 2,
- Courtney M Johnson1, 2,
- Stanton L Martin3,
- Tzu Ming Chu3,
- Russ D Wolfinger3,
- Loren J Hauser1, 2,
- Miriam L Land1, 2,
- Dawn M Klingeman1, 2,
- Mustafa H Syed1, 2,
- Arthur J Ragauskas2, 4,
- Timothy J Tschaplinski1, 2,
- Jonathan R Mielenz1, 2 and
- Steven D Brown1, 2 (corresponding author)
© Wilson et al.; licensee BioMed Central Ltd. 2013
Received: 2 August 2013
Accepted: 19 November 2013
Published: 2 December 2013
The thermophilic anaerobe Clostridium thermocellum is a candidate consolidated bioprocessing (CBP) biocatalyst for cellulosic ethanol production. The aim of this study was to investigate C. thermocellum genes required to ferment biomass substrates and to conduct a robust comparison of DNA microarray and RNA sequencing (RNA-seq) analytical platforms.
C. thermocellum ATCC 27405 fermentations were conducted with a 5 g/L solid substrate loading of either pretreated switchgrass or Populus. Quantitative saccharification and inductively coupled plasma emission spectroscopy (ICP-ES) for elemental analysis revealed composition differences between biomass substrates, which may have influenced growth and transcriptomic profiles. High quality RNA was prepared for C. thermocellum grown on solid substrates and transcriptome profiles were obtained for two time points during active growth (12 hours and 37 hours postinoculation). A comparison of two transcriptomic analytical techniques, microarray and RNA-seq, was performed and the data analyzed for statistical significance. Large expression differences for cellulosomal genes were not observed. We updated gene predictions for the strain and a small novel gene, Cthe_3383, with a putative AgrD peptide quorum sensing function was among the most highly expressed genes. RNA-seq data also supported different small regulatory RNA predictions over others. The DNA microarray gave a greater number (2,351) of significant genes relative to RNA-seq (280 genes when normalized by the kernel density mean of M component (KDMM) method) in an analysis of variance (ANOVA) testing method with a 5% false discovery rate (FDR). When a 2-fold difference in expression threshold was applied, 73 genes were significantly differentially expressed in common between the two techniques. Sulfate and phosphate uptake/utilization genes, along with genes for a putative efflux pump system were some of the most differentially regulated transcripts when profiles for C. thermocellum grown on either pretreated switchgrass or Populus were compared.
Our results suggest that a high degree of agreement in differential gene expression measurements between transcriptomic platforms is possible, but choosing an appropriate normalization regime is essential.
Keywords: Genome, Reannotation, Biomass, Elemental composition, RNA-seq, Microarray, Phosphate, Normalization, Transcriptomics
Abbreviations
ANOVA: Analysis of variance
BESC: BioEnergy Science Center
BLAST: Basic Local Alignment Search Tool
DOE: Department of Energy
FDR: False discovery rate
GEO: Gene Expression Omnibus
HPLC: High performance liquid chromatography
ICP-ES: Inductively coupled plasma emission spectroscopy
JGI: Joint Genome Institute
KDMM: Kernel density mean of M component
MAQC: MicroArray Quality Control
MS/MS: Tandem mass spectrometry
MTC: Medium for Thermophilic Clostridia
NCBI: National Center for Biotechnology Information
NREL: National Renewable Energy Laboratory
ORF: Open reading frame
ORNL: Oak Ridge National Laboratory
PCR: Polymerase chain reaction
RBS: Ribosome binding site
RIN: RNA integrity number
RPKM: Reads per kilobase per million
RPM: Reads per million
SRA: Sequence Read Archive
TMM: Trimmed mean of M component
UQS: Upper quartile scaling
US EPA: United States Environmental Protection Agency.
Clostridium thermocellum exhibits one of the highest rates of degradation of cellulosic substrates, which is facilitated by large extracellular multi-subunit enzyme systems termed cellulosomes [1–3]. It also has productivity advantages associated with thermophilic growth conditions. The bacterium has many attributes that are of interest for fundamental research. It also has the potential to be used in industrial-scale consolidated bioprocessing (CBP) (without added enzymes) of lignocellulosic biomass into ethanol for the displacement of petroleum products [4–8].
The C. thermocellum ATCC 27405 genome was originally submitted to the US Department of Energy (DOE) Joint Genome Institute (JGI; Walnut Creek, CA, USA) for sequencing by JHD Wu (University of Rochester, Rochester, NY, USA) and ME Himmel (National Renewable Energy Laboratory (NREL), Golden, CO, USA). The genome was sequenced using the Sanger method, made available in November 2003 [GenBank:CP000568], and represented the first genome sequence for this species. Repetitive sequences, such as transposases and those present in cohesin domains, made closing this genome challenging, and the genome sequence was not finished until 2007. The C. thermocellum ATCC 27405 genes were originally predicted using two gene modeling programs, Glimmer and Critica, as part of a JGI annotation pipeline. The gene prediction program Prodigal was developed at Oak Ridge National Laboratory (ORNL; Oak Ridge, TN, USA) and incorporated into the JGI annotation pipeline after the initial ATCC 27405 genome annotation. We have found that its use has improved the gene prediction models for several bacteria [12, 13]. As a result, we applied Prodigal to the C. thermocellum genome sequence and report an update to the C. thermocellum ATCC 27405 genome annotation in this study.
Previous studies have suggested that C. thermocellum coordinates its cellulosomal subunit composition depending on the growth substrate [14, 15] and growth rate. Such studies are important for designer cellulosome engineering, the development of efficient industrial enzyme cocktails, metabolic engineering, and synthetic biology endeavors. Biomass from the monocot switchgrass (Panicum virgatum) and the woody dicot black cottonwood (Populus trichocarpa) has been proposed as a model bioenergy feedstock for the USA. To gain insights into the C. thermocellum genes required for growth on either pretreated switchgrass or Populus, we generated whole genome DNA microarray profiles for its growth on biomass for the first time. We have also developed an effective method to isolate high quality RNA from C. thermocellum during these biomass fermentations with initial solid substrate loadings of 5 g/L.
RNA sequencing (RNA-seq) has recently been used for prokaryotic transcriptome analysis [19–21]. It has several advantages over microarrays, such as the greater dynamic range of read counts relative to probe signal intensities. The technology also allows new transcripts and transcriptional start sites to be identified at a higher resolution than is available on a tiling array. RNA-seq technologies and statistical approaches for transcriptome analyses are developing rapidly [22–26], and debate remains over the ideal methods for data normalization and over which statistical methods are most useful for identifying biologically relevant effects.
A comprehensive comparison of different normalization methods for Illumina data has been reported previously. We tested five RNA-seq normalization strategies: trimmed mean of M component (TMM); reads per million (RPM) scaling; reads per kilobase per million (RPKM); upper quartile scaling (UQS); and a newly developed method called kernel density mean of M component (KDMM). Each method is a scaling type method whose corresponding scaling factors are calculated based on the geometric mean for KDMM, arithmetic mean for RPM, geometric mean divided by arithmetic mean for TMM, and the 75th percentile for UQS. We compared the results from these different normalization methods with microarray data derived from the same cDNA using an established expression microarray platform to offer useful suggestions for future RNA-seq studies.
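As a rough illustration of what scaling-type normalization means, the sketch below implements three of the simpler strategies named above (RPM, RPKM, and UQS) for a single sample. It is a minimal sketch under stated assumptions, not the pipeline used in this study: the function names, the choice to take the 75th percentile over non-zero counts only, and the read counts and gene lengths are all illustrative. TMM and KDMM are omitted because their details are not given in this section.

```python
from statistics import quantiles


def rpm(counts):
    """Reads per million: scale each count by the sample's total mapped reads."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def rpkm(counts, lengths_bp):
    """Reads per kilobase per million: RPM further divided by gene length in kb."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def upper_quartile_scale(counts):
    """Divide counts by the 75th percentile of the non-zero counts (assumption)."""
    nonzero = [c for c in counts if c > 0]
    q75 = quantiles(nonzero, n=4, method="inclusive")[2]
    return [c / q75 for c in counts]


# Made-up read counts and gene lengths for one sample (illustration only).
counts = [100, 200, 700, 0]
lengths_bp = [1000, 2000, 500, 1500]

rpm_values = rpm(counts)
rpkm_values = rpkm(counts, lengths_bp)
uqs_values = upper_quartile_scale(counts)
```

After RPM scaling every sample sums to one million, whereas after UQS every sample shares the same upper quartile, so the two methods can rank samples differently when a few highly expressed genes dominate the library.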
Genome reannotation and updated microarray probe sequences
Improvements in DNA sequencing technologies, assembly, and gene prediction algorithms have facilitated continuous updates to sequenced genomes [12, 13, 27–29]. The latest annotation of the C. thermocellum ATCC 27405 genome has 3,175 candidate protein coding sequences (CDSs) predicted using Prodigal [GenBank:CP000568.2]. Previously reported proteomics data were used to confirm predicted gene models (see Additional file 1 for all peptides used for annotation confirmation and Additional file 2 for peptides used to update open reading frame (ORF) start sites and include new genes). Compared to the primary C. thermocellum ATCC 27405 annotation, 130 CDSs have been added or converted from pseudo genes into genes and 65 former CDSs were deleted or converted into pseudo genes (see Additional file 3 for examples of peptide hits used to update the genome annotation). Other modifications include the merging of two former genes into a single ORF and the modification of transcriptional start sites. A comparison of the annotation versions can be found at: http://genome.ornl.gov/microbial/cthe/. We have updated our microarray dataset to reflect the new gene numbers where probes originally designed to intergenic regions are now acknowledged to target a newly annotated gene (see Additional file 4 for microarray probe gene assignment update and Additional files 5 and 6 for details).
Of interest to us were any inherent compositional differences between the two biomasses. Quantitative saccharification of pretreated biomass samples revealed more glucose in the Populus biomass (646 mg/g of biomass, SD ± 13.6) than in the pretreated switchgrass biomass (522.5 mg/g of biomass, SD ± 9.3), which reflects the cellulose component of the two biomasses. The levels of xylose and arabinose also differed, with roughly four times the amount in switchgrass (xylose: 72.5 mg/g of biomass, SD ± 0.4; arabinose: 7.1 mg/g of biomass, SD ± 1.0) relative to Populus (xylose: 19.4 mg/g of biomass, SD ± 1.6; arabinose: 1.6 mg/g of biomass, SD ± 0.2). This reflects the hemicellulose compositional differences, in particular the arabinoxylan component that predominates in the cell wall of switchgrass.
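The fold differences quoted above can be checked directly from the reported means. The short sketch below does only that arithmetic; the dictionary layout and variable names are illustrative, and the standard deviations are ignored.

```python
# Mean sugar content (mg per g of pretreated biomass) as reported in the text.
sugars = {
    "glucose": {"populus": 646.0, "switchgrass": 522.5},
    "xylose": {"populus": 19.4, "switchgrass": 72.5},
    "arabinose": {"populus": 1.6, "switchgrass": 7.1},
}

# Ratio of switchgrass to Populus content for each sugar.
ratios = {name: v["switchgrass"] / v["populus"] for name, v in sugars.items()}

for name, ratio in ratios.items():
    print(f"{name}: switchgrass/Populus = {ratio:.2f}")
```

The xylose and arabinose ratios come out at roughly 3.7 and 4.4, while the glucose ratio is about 0.8, in line with the higher cellulose content of pretreated Populus.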
Samples of the pretreated biomasses used as substrates for the fermentations were analyzed by inductively coupled plasma emission spectroscopy (ICP-ES) for elemental compositional differences that could influence fermentation performance. The pretreated material was also compared to untreated biomass to identify any elemental differences associated with the pretreatment procedure. In both biomasses the pretreatment procedure appeared to introduce chromium, molybdenum, and titanium, the levels of which differed significantly (P <0.001) between pretreated and untreated biomass (Additional file 7).
Calcium was present in the untreated material at levels of 1,388 mg/kg and 2,868 mg/kg of Populus and switchgrass, respectively. The calcium was removed more efficiently from the Populus biomass, with the amount in the pretreated biomass decreasing to 34.3 mg/kg, whereas levels remained high after pretreatment in the switchgrass biomass (1,918 mg/kg) (Additional file 7). Pretreatment efficiently reduced the levels of potassium, magnesium, manganese, phosphorus, strontium, and zinc from both biomasses. The divalent cations barium, calcium, copper, iron, manganese, nickel, strontium, and zinc as well as the phosphorus and sulfur content were higher in pretreated switchgrass compared to Populus (Additional file 7). The only significantly different element that was higher in pretreated Populus relative to switchgrass was molybdenum, which was likely introduced during the pretreatment procedure (Additional file 7).
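The difference in calcium removal can be expressed as the fraction of calcium retained after pretreatment. The sketch below simply divides the pretreated value by the untreated value for each biomass, using the numbers given above; the variable names are illustrative.

```python
# Calcium content (mg/kg) before and after pretreatment, as reported in the text.
calcium = {
    "Populus": {"untreated": 1388.0, "pretreated": 34.3},
    "switchgrass": {"untreated": 2868.0, "pretreated": 1918.0},
}

# Fraction of the original calcium still present after pretreatment.
retained = {
    biomass: values["pretreated"] / values["untreated"]
    for biomass, values in calcium.items()
}

for biomass, fraction in retained.items():
    print(f"{biomass}: {fraction:.1%} of calcium retained")
```

On these figures, Populus retains about 2.5% of its calcium and switchgrass about 67%.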
Growth characterization on biomass
Major C. thermocellum fermentation products and residual biomass sugars
Switchgrass (5 g/L loading)
Populus (5 g/L loading)
0.05 (± 0.04), 0.02 (± 0.01), 522 (± 2)
0.05 (± 0.04), 0.03 (± 0.01), 584 (± 15)
0.3 (± 0.09), 0.1 (± 0.02), 423 (± 41)
0.4 (± 0.01), 0.2 (± 0.01), 368 (± 13)
0.5 (± 0.01), 0.2 (± 0.001), 281 (± 22)
0.8 (± 0.08), 0.3 (± 0.03), 220 (± 32)
Normalization and transcriptome analysis
RNA-seq is an alternative technology for microarrays in transcriptome analysis. This study sought to identify changes in the transcript profile of C. thermocellum ATCC 27405 grown on the substrates of pretreated Populus and switchgrass and whether these profiles were maintained across the two gene expression analytical platforms. RNA-seq reads gave a genome depth coverage of at least 580× (Additional file 9) and gave data for 3,370 genes (98.4% of the annotated protein coding genes). Fluorescence intensity values from the microarrays gave data on 3,157 genes (92.2% of the annotated genes). Data was collected for 3,088 genes on both platforms, constituting 90% of the 3,424 predicted genes (both protein coding and non-protein coding) in the latest version of the C. thermocellum ATCC 27405 genome. Correlations of log2 transformed fluorescent intensity counts for the array or log2 transformed read counts for the RNA-seq of the biological replicates for each condition gave Pearson R correlations ranging from 0.93 to 0.97 in the array and 0.94 to 0.98 in the RNA-seq (Additional file 10). A comparison of the array intensity values and RNA-seq read counts across the two transcriptomic techniques gave Spearman correlation coefficients ranging from 0.83 to 0.88 for each of the growth and substrate comparisons (Additional file 11).
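The two kinds of correlation reported above can be sketched as follows: Pearson correlation on log2-transformed values between biological replicates, and Spearman (rank) correlation between platforms. This is a minimal stdlib sketch, not the analysis code used in the study; the pseudocount of 1 before taking log2, the helper names, and all numeric values are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up read counts for two replicates and array intensities for the same genes.
rep1_counts = [12, 150, 900, 40, 3000]
rep2_counts = [10, 180, 850, 55, 2600]
array_intensity = [300, 1200, 5200, 800, 9000]

# Pseudocount of 1 avoids log2(0); this choice is an assumption of the sketch.
log_rep1 = [math.log2(c + 1) for c in rep1_counts]
log_rep2 = [math.log2(c + 1) for c in rep2_counts]

replicate_r = pearson(log_rep1, log_rep2)
cross_platform_rho = spearman(rep1_counts, array_intensity)
```

Spearman correlation is a natural choice across platforms because read counts and fluorescence intensities are on different scales, and only the ordering of genes is compared.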
Table 2. Subset of relative expression values for cellulosome-related genes. Columns: Populus 12 hours, Populus 37 hours, Switchgrass 12 hours, Switchgrass 37 hours (each as rank percentile).
Altered gene regulation and validation of expression differences
Table 3. Summary of genes passing significance and 2-fold differential expression thresholds. Columns: analysis and normalization strategy; total number of genes with FDR <0.05; total number of genes differentially expressed (± 2-fold).
Table 4. Seventy-three genes significantly (FDR <0.05) and differentially expressed (± 1, log2 scale) that were in common between RNA-seq normalized by the KDMM strategy and microarray data. Comparison columns: (Populus 12 hours) - (Populus 37 hours); (Populus 12 hours) - (Switchgrass 12 hours); (Populus 37 hours) - (Switchgrass 37 hours); (Switchgrass 12 hours) - (Switchgrass 37 hours).
Peptidase S11 D-alanyl-D-alanine carboxypeptidase 1
Manganese containing catalase
Spore coat protein CotJB
Dipicolinate synthase subunit A
Flagellar hook-length control protein
Flagellar hook-capping protein
Flagellar operon protein
IS66 Orf2 family protein
Glycoside hydrolase (CelJ)
Amino acid-binding ACT domain protein
Spore coat protein, CotS family
Spore coat protein, CotS family
Glycosyltransferase group 1
Glycoside hydrolase family 3 domain protein
Carbohydrate-binding CenC domain protein
Radical SAM domain protein
Accessory gene regulator B
GTP-binding protein HSR1-related
RNA polymerase sigma factor, sigma-70 family
Glutamine synthetase catalytic region
Glutamyl-tRNA(Gln) amidotransferase, B subunit
Phosphate ABC transporter, inner membrane subunit PstA
Phosphate ABC transporter, inner membrane subunit PstC
ABC-type phosphate transport system periplasmic component-like protein
Sporulation protein YunB
RNA polymerase sigma factor, sigma-70 family
Urease accessory protein UreD
Urease accessory protein UreG
Urea ABC transporter, permease protein UrtC
Dockerin type 1 protein
CRISPR-associated helicase Cas3
Sporulation peptidase YabG
Cupin 2 conserved barrel domain protein
S-layer domain-containing protein
Sulfate ABC transporter, periplasmic sulfate-binding protein
Sulfate ABC transporter, inner membrane subunit CysT
UBA/THIF-type NAD/FAD binding protein
UDP-N-acetyl-D-glucosamine 2-epimerase, UDP-hydrolyzing
Sugar O-acyltransferase, sialic acid O-acetyltransferase NeuD family
ATP-dependent carboxylate-amine ligase domain protein ATP-grasp
Polysaccharide biosynthesis protein CapD
Lipopolysaccharide biosynthesis protein
Capsular exopolysaccharide family
PHP domain protein
ErfK/YbiS/YcfS/YnhG family protein
Peptidoglycan-binding lysin domain
Manganese/iron superoxide dismutase
Cell envelope-related function transcriptional attenuator, LytR/CpsA family
Growth stage-specific changes in gene expression
C. thermocellum expression profiles can vary based on growth rate. Under stringent criteria, no genes were consistently more highly expressed at 12 hours relative to 37 hours postinoculation on both substrates, which may reflect relative differences in growth (Additional file 8). By 37 hours, eight genes were consistently upregulated 2-fold or more compared to the earlier sampling time point, irrespective of the substrate. These eight genes included those encoding proteins related to spore formation (Cthe_0964 (also lysine biosynthesis), Cthe_1084, and Cthe_1759), a glycosyltransferase (Cthe_1085), and genes involved in nucleotide and amino sugar metabolism (Cthe_2642 and Cthe_2644) (Table 4). Other genes affected in the growth stage comparison include an anti-sigma factor (Cthe_1437) and a putative ABC transporter subunit (Cthe_2573). These genes potentially contribute to the transition of the cells from log to stationary phase.
Substrate-specific gene expression
Comparison of differentially expressed genes permitted the identification of genes that were only affected on one of the biomass substrates. Six genes were upregulated during growth on Populus relative to switchgrass 12 hours after inoculation, with the patterns of expression consistent across the two analytical platforms. These genes met the FDR <0.05 and ≥2-fold difference in gene expression requirements, and included genes encoding glycoside hydrolase and CenC carbohydrate-binding proteins (Cthe_1256 and Cthe_1257) (Table 4). A genomic locus that includes genes encoding a predicted Radical SAM domain protein and an AgrB protein (Cthe_1309 and Cthe_1310) was upregulated on Populus at 12 hours relative to switchgrass. Interestingly, these two genes are upstream of a new addition to the C. thermocellum genome with predicted AgrD functions (Cthe_3383), suggesting signaling or bacteriocin-like production specific to the substrate. Gene Cthe_2531 is predicted to be involved in sulfate transport and was upregulated when C. thermocellum was grown on Populus. Three other genes from this cluster were also upregulated but did not pass the significance threshold in the RNA-seq analysis. Conversely, on switchgrass, three genes related to phosphate transport (Cthe_1603, Cthe_1604, and Cthe_1605) were upregulated. These genes are part of a putative high affinity phosphate transport system we have identified only in strain ATCC 27405, and this system is distinct from the common Na/Pi symporters found in all C. thermocellum strains examined to date. One Na/Pi symporter (Cthe_0064) in C. thermocellum ATCC 27405 was among the top 5% most highly expressed genes on both biomasses (Additional file 9).
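The selection logic described above (FDR <0.05, at least a 2-fold change, and the same direction of change on both platforms) can be sketched as a simple filter. The function names, the dictionary layout, the gene identifiers, and every numeric value below are hypothetical and serve only to show the shape of the comparison.

```python
def passes(result, fdr_cutoff=0.05, min_abs_log2fc=1.0):
    """True if a gene meets both the FDR and the 2-fold (|log2| >= 1) thresholds."""
    return result["fdr"] < fdr_cutoff and abs(result["log2fc"]) >= min_abs_log2fc


def shared_hits(array_results, rnaseq_results):
    """Genes passing both thresholds on both platforms with the same direction."""
    shared = []
    for gene, array_hit in array_results.items():
        rnaseq_hit = rnaseq_results.get(gene)
        if rnaseq_hit is None:
            continue  # gene not measured on the RNA-seq platform
        same_direction = (array_hit["log2fc"] > 0) == (rnaseq_hit["log2fc"] > 0)
        if passes(array_hit) and passes(rnaseq_hit) and same_direction:
            shared.append(gene)
    return sorted(shared)


# Hypothetical per-gene results for one comparison (illustration only).
array_results = {
    "gene_A": {"log2fc": 2.1, "fdr": 0.001},
    "gene_B": {"log2fc": 1.4, "fdr": 0.020},
    "gene_C": {"log2fc": -1.8, "fdr": 0.003},
    "gene_D": {"log2fc": 0.4, "fdr": 0.010},
}
rnaseq_results = {
    "gene_A": {"log2fc": 1.7, "fdr": 0.010},
    "gene_B": {"log2fc": 1.2, "fdr": 0.200},  # fails the FDR cutoff in RNA-seq
    "gene_C": {"log2fc": -2.3, "fdr": 0.040},
    "gene_D": {"log2fc": 0.6, "fdr": 0.030},  # significant but below 2-fold
}

common = shared_hits(array_results, rnaseq_results)
```

A gene such as the hypothetical gene_B, which shows a large change on both platforms but misses the RNA-seq significance cutoff, is excluded, mirroring the sulfate transport cluster described above.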
Two genes (Cthe_1480 and Cthe_1481) with hypothetical function annotations were upregulated on switchgrass and met the significance criteria. The expression patterns of these genes were maintained in the comparison at 37 hours postinoculation. They have a general function prediction as members of the RND family of exporters and are well conserved in bacteria. Interestingly, none of these genes were identified in a study of C. thermocellum ATCC 27405 grown on pure cellulose or pure cellobiose, suggesting that the regulation of these genes was specific to the lignocellulosic biomasses used in the current study.
Differential expression of cellulosome genes and central carbon metabolism
Consistent expression patterns for cellulosomal-related genes identified in both the RNA-seq (KDMM) and array data included two known cellulosome genes. Cthe_0624 (CelJ), encoding a glycoside hydrolase family 9 enzyme with predicted endoglucanase functions, was upregulated in the early growth stage on switchgrass relative to the later growth stage, while no differences were identified on Populus. This protein was reported as highly abundant in a proteome study of C. thermocellum grown on cellobiose, cellulose, and switchgrass. Cthe_1890, encoding a protein with a type 1 dockerin domain, had higher expression in the later growth stage on switchgrass relative to the 12-hour sample. A gene (Cthe_1256), predicted to encode a glycoside hydrolase family 3 enzyme that converts a variety of glucans into β-D glucose, was upregulated on Populus relative to switchgrass at 12 hours postinoculation.
An accurate and complete representation of an organism’s genome sequence and its functional annotation is requisite for systems biology studies and genome-scale engineering for synthetic biology. New technologies (for example DNA sequencing), algorithms (for example Prodigal), and biological features (for example sRNA) expand our knowledge of genomes. However, most genome sequences and annotations are rarely updated. Re-annotation has been suggested as an essential component for assaying and understanding systems biology data, and wiki-based solutions have been recommended to facilitate genome updates. In this study, we used the gene prediction program Prodigal to update the C. thermocellum ATCC 27405 gene models. The methodology, accuracy, and specificity improvements incorporated into Prodigal have been described. RNA-seq analysis and proteomic analysis performed using two-dimensional liquid chromatography (LC)-tandem mass spectrometry (MS/MS) offer the possibility of searching continuously updated genome databases with previously obtained information. This is an important advantage since it is likely that further improvements will be made to C. thermocellum gene models and annotations in the future.
We were able to develop a protocol to obtain high quality RNA from C. thermocellum grown on biomass for the first time and to enrich mRNA by subtractive hybridization so that greater than 99.6% of the reads did not map to the 5S, 16S, and 23S rRNA gene sequences. This protocol development opens up new possibilities for future RNA-seq studies of industrially-relevant biomass fermentations. In our transition to a transcriptomic analytical platform based on RNA-seq we sought to compare and contrast the relatively new technology of RNA-seq to an established custom designed microarray. The cross-platform comparisons described here are among the best that we are aware of, with Spearman correlation coefficients ranging from 0.83 to 0.88 (Additional file 11).
Normalization strategies remove experimental noise from transcriptomic datasets prior to analyses used to determine biological differences in samples of interest. In microarray analyses, known biases include variation in dye incorporation rates and hybridization of material to the platform. In RNA-seq analyses distinct biases relate to the depth of sequencing, the length and GC content of genes, and mapping approach [39–42]. We found that the choice of RNA-seq normalization method strongly affected our final results (Figure 1, Additional file 12). KDMM and UQS gave similar distribution and clustering profiles. The KDMM normalization method was the preferred regime in this study as it provided more results in common with the array data. The KDMM method uses a scaling factor based on the geometric mean of the mapped reads and the UQS method scales read count distributions so that the 75th percentiles are consistent after normalization. Both TMM and RPM performed poorly with our dataset. TMM gave the fewest genes (10) identified in the analysis of variance (ANOVA) as significantly differentially expressed, which was likely due to greater variation post-normalization (Additional file 12). TMM is a conservative normalization method that performs well where datasets have a consistent number of mapped reads across samples. The number of reads that mapped uniquely for given samples differed as much as approximately 2-fold between the largest and smallest totals (Additional file 7). The C. thermocellum sample that was run with the PhiX sequencing control had the fewest reads that mapped to the genome, and inconsistencies in the number of mapped reads are likely to explain why the other methods performed better than TMM in this instance. Although the RPKM method is widely used, there are reports that it can bias estimates of differential expression [40, 43].
In this study, many genes which were identified as having the largest expression differences in the array and KDMM normalized RNA-seq data, such as phosphate and sulfate transport genes, were not identified in significance testing using data normalized by the RPM (Figure 2) or similar RPKM method (Additional file 15).
A number of studies have investigated RNA-seq, mapping methods, technical variability and reproducibility, normalization, and statistical testing methods. However, the field of RNA-seq is still relatively new and rapidly evolving. Differential expression measurements cannot be estimated with any confidence if a single biological replicate is employed. We employed two biological replicate fermentations on each biomass with samples taken at two time points, 12 hours and 37 hours postinoculation, but we expect that as sequencing costs continue to decrease, more biological replicates will be used to increase statistical power. This will allow for greater confidence in RNA-seq differential expression estimates. We used the NimbleGen call files for the microarray data, which uses outlier detection and then summarizes unique probe intensity values into one value for three technical array replicates for each biological replicate. We also employed the Kenward-Roger method to estimate the degrees of freedom in the mixed model analyses of the array data. The array analysis had considerably more statistical power (six expression estimates per gene per condition) compared to the RNA-seq dataset (two expression estimates per gene per condition). Our array data and RNA-seq data generally agreed, although different genes were categorized as significant or did not meet criteria for certain comparisons (Table 3, Additional file 15). We have made the datasets available so that others may compare and contrast different methods and analyses.
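For readers unfamiliar with FDR control, the sketch below shows one common way to turn a list of per-gene p-values into FDR-adjusted values, the Benjamini-Hochberg step-up procedure. The text states only that a 5% FDR was applied; that this particular procedure was the one used is an assumption of this sketch, and the p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up p-values for four genes (illustration only).
p_values = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(p_values)

# Genes called significant at a 5% false discovery rate.
significant = [q < 0.05 for q in q_values]
```

With only two biological replicates per condition, per-gene p-values tend to be larger, so fewer genes survive this adjustment; this is one way to read the gap between the number of significant genes on the array and in the RNA-seq data.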
The yields of the major fermentation products were approximately 1.4-fold higher after 37 hours on Populus compared to switchgrass with normalization to the original biomass loading. The results of this study suggest more favorable growth of C. thermocellum when pretreated Populus was the substrate. Hemicelluloses present in these two lignocellulosic substrates differ, with glucuronoxylan in hardwoods such as Populus while grasses have predominantly arabinoxylans [44, 45]. The dilute acid pretreatment of each of the biomass substrates should solubilize the majority of hemicelluloses from the biomass, which are then removed by numerous wash steps. It is likely, however, that residual material is left, as well as remaining quantities of inhibiting compounds derived from the pretreatment and breakdown of the hemicelluloses. Examples of inhibitor byproducts from pretreatment include vanillin, hydroxymethylfurfural (HMF), furfural, and syringic acid. Lignin remains after pretreatment and can influence the accessibility of cellulose in the biomass substrate to C. thermocellum. The degree of cellulose polymerization after pretreatment may be another factor that differs between the two biomasses and that could influence fermentation performance [47, 48]. ICP-ES analysis also revealed differences in calcium removal efficiency (Table 3), with the majority of calcium removed during pretreatment of Populus while two-thirds remained after pretreatment of switchgrass. Alternatively, the biomass of the two species may have differed in the proportion of bound versus free calcium. The data suggest that under the pretreatment and process conditions used in this study the dilute acid pretreated Populus was a more accessible substrate for C. thermocellum fermentation compared to the pretreated switchgrass. Nonetheless, different pretreatment strategies and process conditions will be required for optimal conversion of different biomass feedstocks into different biofuels.
From both the microarray and the RNA-seq data we could identify C. thermocellum genes that were highly expressed when grown on these two complex biomass substrates. The cellotriose transport system (Cthe_0391-0393) was among genes that were highly expressed on both substrates. Dextrins of length 3 to 5 are the preferred substrate of C. thermocellum, and this particular transporter is one of five involved in carbohydrate transport and the only one with a specificity for cellotriose. Three other systems transport glucans ranging from one to five glucose subunits with variable substrate affinities and the last is specific for laminaribiose. High-level expression of the cellotriose transport system on Populus and switchgrass suggests the majority of the cellulose in these biomasses is processed by the C. thermocellum cellulosome into cellotriose. Other highly expressed genes included cellulosomal genes such as CipA (primary non-catalytic scaffoldin unit) and CelS (exoglucanase) (Table 2), which is in agreement with earlier data. Identifying highly expressed genes on various substrates is useful for strain engineering as it can expand the repertoire of available promoter sequences to facilitate enhanced cellulosic conversion.
More than 70 dockerin-containing proteins and potential cellulosome-related subunits have been identified in the C. thermocellum ATCC 27405 genome. Of interest in the current study were genes that encode enzymes or proteins with functions related to cellulosomal degradation of biomass and that were differentially regulated when C. thermocellum was grown on switchgrass compared to Populus (Additional file 15). For example, the genomic locus Cthe_1256-1257 that encodes a glycoside hydrolase and a carbohydrate-binding protein exhibited higher expression on Populus at 12 hours compared to switchgrass (Table 4). Cthe_1257 may encode a protein with potential for cellulose binding, while Cthe_1256 lacks a signal peptide and is predicted to function as a β-glucosidase cleaving imported dextrins to yield β-D glucose. These gene expression differences indicate a degree of specificity of the C. thermocellum response to different substrate availability while growing on the two biomasses. A glycoside hydrolase (Cthe_0624) was upregulated at 12 hours on switchgrass compared to 37 hours on switchgrass with no differences identified on Populus. The glycoside hydrolase (Cthe_0624) amino acid sequence includes a signal peptide, and the enzyme has xylan and lichenan hydrolase activities as well as activity against crystalline cellulose.
Cellulosomes are naturally shed at the end of C. thermocellum growth, which was exploited by an affinity purification method and proteomics approach to show C. thermocellum cellulosomal compositional changes occurred in response to different carbon sources. One surprising aspect of the current study was that larger differences in cellulosomal genes were not observed at the level of transcription for the two biomasses, which may be a reflection of the pretreatment procedure efficiently homogenizing the carbohydrate components of the two biomasses. Although C. thermocellum cannot use xylose, we observed cellulosomal xylanases (Cthe_1398, Cthe_1838, Cthe_1963, Cthe_2590, and Cthe_2972) were among the most highly expressed genes (top 10%) suggesting this activity is important to access its preferred substrates. Xylanases showed little to no differential expression under the conditions assayed in this study despite bulk differences in xylose content of the two biomass substrates. An earlier study also reported highly expressed xylanase proteins on switchgrass but high-level expression was not found for chemostat growth on purified cellulose, which shows the value in exploring a range of substrates and including those of industrial relevance. It is worth noting that the growth conditions, ‘omic’ level, and detection technologies were quite different between the current transcriptomic and earlier proteomic studies. Further systematic, integrated omic studies will be required to reveal more of this organism’s complex regulatory control mechanisms.
A putative Pst high-affinity phosphate transport system was more highly expressed on switchgrass than on Populus at 12 hours postinoculation, while one member of a sulfate transport system was upregulated on Populus. Other members of the sulfate transport system were highly differentially expressed in both the RNA-seq and array data; however, they did not pass the significance threshold for RNA-seq. Differences in phosphorus and sulfur contents for pretreated biomasses were observed (Additional file 7); however, the defined medium (MTC) used to suspend each biomass substrate was identical and is replete in phosphate and sulfate for pure cellulose fermentations. Phosphate and sulfate uptake genes were not upregulated during growth on pure cellulose or cellobiose . The corresponding binding proteins for ABC transporters often have high degrees of specificity that can distinguish the phosphate and sulfate oxyanions despite their similarities , although there are few data on these systems for C. thermocellum. Phosphate is required for C. thermocellum carbohydrate breakdown as the bacterium favors transport of cellodextrins over monomeric sugars. Cellodextrins enter C. thermocellum cells via ATP-dependent ABC transport systems and, once inside, a phosphate anion acts as a nucleophile for phosphorolytic cleavage [53, 54]. Multiple uncharacterized phosphate transport systems exist in the ATCC 27405 genome including two putative Na+/Pi co-transporters (Cthe_0064 and Cthe_2810), a putative Pit transporter (Cthe_3000), as well as the Pst system differentially expressed between the two biomass substrates. The Pst transporter is typically induced only under conditions of phosphate starvation [55–58], which would indicate that cells in the switchgrass fermentations were limited in phosphate despite sufficient phosphate being provided in the MTC medium for growth of this organism on pure cellulose or cellobiose.
We observed a greater amount of divalent cations in the switchgrass compared to Populus, but at levels relatively insignificant compared to those provided in the MTC medium. Differences in medium ion composition may have influenced chemical speciation through formation of compounds such as insoluble metallophosphates, or disruption of ion exchange. Alternatively, one or more compounds generated during the switchgrass fermentation may have interfered with the C. thermocellum Na+/Pi symporter, leading to upregulation of the energetically more expensive high-affinity phosphate transport system. We observed approximately twice as much molybdenum in pretreated Populus versus switchgrass (Additional file 7), and factors such as this may have interfered with sulfate uptake and/or iron-sulfur proteins involved in metabolism. Differences in the expression of C. thermocellum anion transporters (phosphate and sulfate) may indicate part of a coordinated system for osmoadaptation and/or pH stasis, with variation in the ash composition of the two biomasses influencing the osmotic balance of the cell [59, 60]. Further studies are required to investigate the physiological status of C. thermocellum during industrially relevant fermentations.
Much higher expression from gene locus Cthe_1479-1481 occurred on switchgrass relative to Populus at both sampling time points. These genes are well conserved in bacteria and are currently annotated as a member of the RND exporter family. This type of transport system is typically associated with Gram-negative bacteria where they act to remove toxic compounds from the cell . Inhibitory compounds are generated from the pretreatment processing of biomass substrates , and despite extensive washing of the pretreated biomass, residual compounds are likely to remain in low quantities. Thus it is conceivable that a toxic compound liberated solely from switchgrass is removed from the cell via this efflux system and this could be a possible target for strain development. A recent study identified arabitol, a putative fermentation inhibitor, as liberated during C. thermocellum fermentation on switchgrass . We also observed greater expression in genes related to urea uptake and metabolism at 37 hours compared to 12 hours on Populus (switchgrass failed to meet one or both of the threshold criteria), which coincided with increases in ethanol concentrations. A previous study showed that the largest response of C. thermocellum to ethanol shock treatment was in genes and proteins related to nitrogen uptake and metabolism .
Three spore-related genes upregulated at 37 hours compared to 12 hours on both biomasses indicated that cells were priming for transition to stationary phase. C. thermocellum ATCC 27405 is inefficient at sporulation, converting between 0 and 7% of resting cells into spores after stressor application . An agr-dependent quorum sensing mechanism for Clostridium acetobutylicum sporulation regulation and granulose formation has been recently described . However, early signal sensing and transduction mechanisms for sporulation in Clostridia are not as well defined as for Bacillus subtilis. Cthe_3383, among the most highly expressed C. thermocellum genes during growth on biomass substrates (Additional files 14 and 15), is a newly predicted gene that encodes a small (40 aa) hypothetical protein (a putative autoinducer prepeptide) and is adjacent to genes annotated as having roles in sporulation. At a separate genomic locus we observed differential expression of two genes (Cthe_1309 and Cthe_1310) between the biomass substrates (Additional file 15), with higher expression occurring during fermentation on Populus at 12 hours postinoculation. The latter gene is predicted to encode an accessory gene regulator B. Interestingly, a new addition to the genome, Cthe_3348, is directly downstream of Cthe_1310 and is predicted to encode a 54 amino acid AgrD-like peptide. The agrD gene was highly expressed but, unlike the two upstream genes, was not considered differentially expressed. The role, if any, that Cthe_3383 and Cthe_3348 play in signaling and the C. thermocellum sporulation regulatory cascade remains to be elucidated (for alignment see Additional file 14).
The results suggest a high degree of concordance in differential gene expression measurements between the two transcriptomic platforms. We observed few transcriptomic differences for C. thermocellum cellulosome-related genes for cells fermenting either dilute acid pretreated Populus or switchgrass, which may indicate that under this pretreatment regime they sense and respond to similar carbohydrate profiles during active growth. We observed differential expression of sulfate- and phosphate-related genes, which may point to aspects of metabolism that merit more consideration during industrially relevant fermentations. We have identified new and highly expressed genes, and our update to the ATCC 27405 genome will be useful for follow-on studies.
Microarrays and RNA-seq each have respective biases that can interfere with differential expression determinations, and in this study RNA-seq normalization methods dramatically affected downstream analyses. RNA-seq offers important advantages for transcriptomic profiling and it will likely replace microarrays as a preferred method. However, DNA microarray testing and analysis has evolved over many years through studies such as the MicroArray Quality Control (MAQC) project [65, 66], and further studies and cost reductions in sequencing are similarly required to develop RNA-seq analyses.
A gene modeling program termed Prodigal  was applied to the C. thermocellum ATCC 27405 genome sequence, followed by a round of manual curation in combination with proteomics data analysis  to ensure no peptide evidence existed for any deleted genes (data derived from Yang et al.  and reported in Additional files 1,2,3). A six-frame translation generated predicted ORFs and a search of available peptide data against these ORFs resulted in three groups: 1) peptides that fall within an existing gene call; 2) those that have one end within an existing gene call and the other outside, which were used to correct the start and end coordinates for a gene; and 3) those that were not within an existing gene and were used to add a new gene. In addition, the following criteria were assessed: whether the peptide hit is unique or matches several places in the genome; the number of times the peptide was detected; peptide BLAST percent identity and length of match; transcription level via RNA-seq data from this study at the start of a gene/ORF, 100 bp upstream, and as average coverage; Prodigal score for coding potential; start codon used; Prodigal score for ribosome binding site (RBS); manually checked RBS; and similar sequences and their start sites, found by blasting the ORF against the National Center for Biotechnology Information (NCBI) non-redundant database. Predicted genes were annotated using an automated annotation pipeline, as described previously . The current annotation and a comparison to the earlier versions can be found at http://genome.ornl.gov/microbial/cthe/.
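The three-way grouping of peptide matches described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function name `classify_peptide`, the coordinate convention (1-based, inclusive) and the example coordinates are all hypothetical, and strand and reading frame are ignored for brevity.

```python
from typing import List, Tuple

# Hypothetical convention: (start, end), 1-based inclusive, start <= end.
Interval = Tuple[int, int]


def classify_peptide(peptide: Interval, genes: List[Interval]) -> str:
    """Assign one mapped peptide to a group relative to existing gene calls.

    'within'   : both ends fall inside one existing gene call (group 1)
    'boundary' : exactly one end falls inside a gene call (group 2)
    'novel'    : neither end falls inside any gene call (group 3)
    """
    p_start, p_end = peptide
    one_end_inside = False
    for g_start, g_end in genes:
        start_in = g_start <= p_start <= g_end
        end_in = g_start <= p_end <= g_end
        if start_in and end_in:
            return "within"
        if start_in or end_in:
            one_end_inside = True
    return "boundary" if one_end_inside else "novel"


# Illustrative coordinates only, not taken from the ATCC 27405 annotation.
gene_calls = [(100, 400), (600, 900)]
for pep in [(150, 200), (380, 430), (450, 520)]:
    print(pep, classify_peptide(pep, gene_calls))
```

In practice each group would then be weighed against the additional criteria listed above (uniqueness of the hit, RNA-seq coverage, Prodigal scores and so on) before a gene call is changed.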
The biomass substrates used in the fermentations were dilute acid pretreated switchgrass (Panicum virgatum cultivar Alamo; SWG) and dilute acid pretreated Populus (Populus trichocarpa x Populus deltoides F1 hybrid; POP). The biomasses were milled to -20/+80 mesh size and pretreated with dilute sulfuric acid at 0.050 g/g of dry biomass at 190°C for 1 minute residence time (flow-through mode) and 25% (w/w) total solids using a Sunds reactor at the NREL [14, 67]. The pretreated biomasses were washed with Milli-Q H2O (Millipore, Billerica, MA, USA) until less than 0.1 g/L glucose could be detected in the wash eluent, and dried prior to fermentations .
Compositional analysis of biomass
Trace elements were determined by ICP-ES. The samples for ICP-ES were prepared using a method based on the United States Environmental Protection Agency (USEPA) SW-846 Method 3050A. Pretreated and unpretreated biomass samples were oven dried and a 2 g sample digested by sequentially heating in nitric acid, hydrogen peroxide, and hydrochloric acid. The samples were filtered through Whatman 41 filter paper (Whatman, Maidstone, UK) and the volume made up to 50 mL with deionized (DI) water. Aliquots (5 mL) were subjected to ICP-ES analysis in an Optima 3000 DV ICP Emission Spectrometer (PerkinElmer, Waltham, MA, USA) with yttrium used as an internal standard .
Overnight inoculum cultures of C. thermocellum 27405 were grown anaerobically in 50 mL bottles. Five 40 mL aliquots from cultures grown on 5 g/L Avicel in MTC  in 50 mL serum bottles were used to inoculate the 5-L Twin BIOSTAT B plus fermenters (Sartorius Stedim Biotech, Göttingen, Germany) (total volume 2 L) for a final inoculum of 10%. Two replicate fermentations were performed for each biomass. The biomass loading in each fermenter was 5 g/L on a dry weight basis in MTC medium. The fermenters were run at 58°C and 250 rpm, with pH controlled at 7.0 using 3 N NaOH. Time = 0 samples were taken immediately postinoculation of the fermenter vessels. At 12 hours and 37 hours postinoculation, 50 mL samples were removed for transcriptomic analyses.
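As a quick check of the figures above, the following sketch reproduces the inoculum fraction and the dry biomass mass per fermenter. It assumes that the stated 2 L total volume is the final working volume including the inoculum; the function names are illustrative only.

```python
def inoculum_percent(n_aliquots: int, aliquot_ml: float, total_volume_ml: float) -> float:
    """Inoculum volume as a percentage of the total fermenter volume."""
    return 100.0 * n_aliquots * aliquot_ml / total_volume_ml


def biomass_dry_mass_g(loading_g_per_l: float, total_volume_l: float) -> float:
    """Dry biomass needed to reach the stated solids loading."""
    return loading_g_per_l * total_volume_l


# Five 40 mL aliquots into a 2 L total volume.
print(inoculum_percent(5, 40.0, 2000.0))   # 10.0
# 5 g/L loading in 2 L.
print(biomass_dry_mass_g(5.0, 2.0))        # 10.0
```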
Samples were removed periodically from the fermenter vessel to determine cell counts and monitor fermentation product formation and residual carbohydrates (Additional file 8). Samples for cell counts were diluted with Milli-Q H2O when necessary and a 10 μL aliquot was loaded onto a hemocytometer counting chamber for counting. Cell counts were performed in triplicate for each fermenter at a given time point.
Fermentation residues were analyzed for carbohydrate composition using quantitative saccharification assay ASTM E 1758–01 (ASTM 2003), NREL/TP 510–42618, and HPLC method NREL/TP 510–42623. Cell-free samples from the fermenters were analyzed for metabolites (acetic acid, lactic acid, and ethanol) and residual carbohydrates (cellobiose, glucose, xylose, and arabinose) using a LaChrom Elite HPLC System (Hitachi High Technologies America, Pleasanton, CA, USA) equipped with a refractive index detector (model L-2490), as previously described .
Cells pelleted from an 8 mL sample drawn from each fermenter were resuspended in 1.5 mL of TRIzol (Invitrogen, Carlsbad, CA, USA) and used for cell lysis by bead beating with 0.8 g of 0.1 mm glass beads (BioSpec Products, Bartlesville, OK, USA) with 3 × 20 seconds bead beating treatments at 6,500 rpm in a Precellys 24 high-throughput tissue homogenizer (Bertin Technologies, Montigny-le-Bretonneux, France). The RNA from each cell lysate was purified, DNaseI-treated, and quantity and quality assessed, as previously described . Purified RNA of high quality (RIN >8) was pooled from the same fermentation samples and depleted of rRNA using Ribo-Zero rRNA Removal Kit for Gram-positive bacteria (Epicentre, Madison, WI, USA). The sample was then concentrated with RNA Clean & Concentrate-5 (Zymo Research, Irvine, CA, USA) following the manufacturer’s protocol.
Depleted RNA was used as the starting material for the Epicentre ScriptSeq mRNA-Seq Library Preparation Kit (Illumina-compatible) utilizing the FailSafe PCR Enzyme Mix (Epicentre) and following the manufacturer’s protocol. cDNA tagged with standard adaptors was eluted with 20 μL of Buffer EB provided in the MinElute PCR Purification Kit (Qiagen, Venlo, Netherlands) according to the ScriptSeq protocol. Cycles were increased to 14 during amplification and samples were purified using the MinElute PCR Purification Kit and eluted with 20 μL of Buffer EB. The final mRNA-seq library was quantified with a Qubit fluorometer (Invitrogen) and library quality was assessed with Bioanalyzer High Sensitivity DNA Chip (Agilent, Santa Clara, CA, USA).
Samples were diluted to 2 nM, denatured, and further diluted to 6 pM. These were run on cBot (Illumina, San Diego, CA, USA) (SR_Amp_Lin_Block_Hyb_V7) overnight to cluster on version 1.5 Flow Cell. The mRNA-seq libraries were analyzed on a HiSeq 2000 (Illumina) platform with a SR50 sequencing kit for a single read of 51 cycles. The lane containing the F188 12-hour Populus sample included the control of phiX DNA.
Raw reads were mapped to the genome [GenBank:CP000568.1] using CLC Genomics Workbench version 5.5.1 (CLC bio, Aarhus, Denmark) with the default settings for prokaryote genomes. Uniquely mapped reads were log2 transformed on importation into JMP Genomics version 6 (SAS Institute, Cary, NC, USA). Data were normalized using default settings for each of the four normalization strategies (see Additional file 12 for pre- and post-normalization distribution curves) and any genes with no read counts were removed prior to ANOVA analysis. Filtering was applied to identify those genes with an FDR <0.05 and a difference in expression greater than ± 1 on a log2 scale (that is, more than a 2-fold change). Raw RNA-seq data have been deposited in the NCBI Sequence Read Archive (SRA) [SRA:060947] and we have made mapped reads and data available through the BioEnergy Science Center (BESC) KnowledgeBase http://bobcat.ornl.gov/besc/index.jsp. Samples in the SRA series [SRA:060947] are labeled accordingly with the accession number given in square brackets. C. thermocellum harvested after growth on Populus for 12 hours: F185_Ctherm_Pop_12 hr [SRR:620218] and F188_Ctherm_Pop_12 hr [SRR:620325]. C. thermocellum harvested after growth on Populus for 37 hours: F185_Ctherm_Pop_37 hr [SRR:620219] and F188_Ctherm_Pop_37 hr [SRR:620327]. C. thermocellum harvested after growth on switchgrass for 12 hours: F186_Ctherm_Swg_12 hr [SRR:620229] and F187_Ctherm_Swg_12 hr [SRR:620532]. C. thermocellum harvested after growth on switchgrass for 37 hours: F186_Ctherm_Swg_37 hr [SRR:620238] and F187_Ctherm_Swg_37 hr [SRR:620324]. Note that the same nomenclature of fermenter number (F185, F186, F187, and F188), biomass substrate (Pop and Swg), and time point of sampling (12 hours and 37 hours) is used for naming the samples in the microarray Gene Expression Omnibus (GEO) submission, see details below.
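The filtering step can be sketched as follows. This is an illustration only: the analysis in this study was run in JMP Genomics, and the sketch assumes a Benjamini-Hochberg adjustment as the FDR procedure, which is not stated in the text. The gene names, fold changes, and p-values are hypothetical.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_genes(
    results: Dict[str, Tuple[float, float]],
    fdr_cutoff: float = 0.05,
    log2_fc_cutoff: float = 1.0,
) -> List[str]:
    """Keep genes with adjusted p < fdr_cutoff and |log2 fold change| > cutoff.

    `results` maps a gene ID to (log2 fold change, raw p-value).
    """
    genes = list(results)
    q_values = benjamini_hochberg([results[g][1] for g in genes])
    return [
        g
        for g, q in zip(genes, q_values)
        if q < fdr_cutoff and abs(results[g][0]) > log2_fc_cutoff
    ]


# Hypothetical values for illustration; not data from this study.
example = {
    "gene_A": (2.3, 0.0004),
    "gene_B": (0.4, 0.001),
    "gene_C": (-1.8, 0.03),
    "gene_D": (1.5, 0.20),
}
print(filter_genes(example))
```

Here gene_B is significant but changes less than 2-fold, and gene_D changes more than 2-fold but is not significant, so only gene_A and gene_C pass both criteria.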
Microarray sample labeling, hybridization, scan, and statistical analysis of array data
RNA-seq libraries were also used for hybridization to the microarray. Beginning with 100 ng of cDNA, half volume Cy3 labeling reactions were undertaken for all eight samples according to the manufacturer’s protocols. Cy3 labeling efficiency was assessed by NanoDrop ND-1000 spectrophotometer (NanoDrop, Wilmington, DE, USA) and determined to fall within the range of 20 to 24 pmol/μg. Hybridizations were conducted using a 12-bay hybridization station (BioMicro Systems, Salt Lake City, UT, USA) and the arrays dried using a MAUI Wash System (BioMicro Systems). Microarrays were scanned with a SureScan High-Resolution DNA Microarray Scanner (5 μm) (Agilent), and the images were quantified using NimbleScan software (Roche NimbleGen, Madison, WI, USA).
Raw data were log2 transformed and imported into the statistical analysis software JMP Genomics 6.0 (SAS Institute). The data were normalized together using a single round of the LOESS normalization algorithm within JMP Genomics, and distribution analyses conducted before and after normalization were used as a quality control step. An ANOVA was performed in JMP Genomics to determine differential gene expression levels via a direct comparison of the two biomasses and time points using the FDR testing method (P <0.05) and Kenward-Roger degrees of freedom method. Microarray data have been deposited in the NCBI GEO database [GSE:47010]. Samples in the GEO series [GSE:47010] are labeled accordingly with the specific GEO sample accession number given in square brackets. C. thermocellum harvested after growth on Populus for 12 hours: F185_Pop_12 hr_rep1 [GSM:1142896] and F188_Pop_12 hr_rep1 [GSM:1142902]. C. thermocellum harvested after growth on Populus for 37 hours: F185_Pop_37 hr_rep1 [GSM:1142897] and F188_Pop_37 hr_rep1 [GSM:1142903]. C. thermocellum harvested after growth on switchgrass for 12 hours: F186_Swg_12 hr_rep1 [GSM:1142898] and F187_Swg_12 hr_rep1 [GSM:1142900]. C. thermocellum harvested after growth on switchgrass for 37 hours: F186_Swg_37 hr_rep1 [GSM:1142899] and F187_Swg_37 hr_rep1 [GSM:1142901].
Microarray data were validated using RT-qPCR, as described previously . Six genes representing a range of gene expression values based on microarray hybridizations were analyzed using qPCR from cDNA derived from different time point samples. Oligonucleotide sequences of the primers targeting the six genes selected for qPCR analysis were: Cthe_0344_F CGACTTCCCGAACCAGATAA, Cthe_0344_R GCAGCGGCTATCTTCATTTC; Cthe_0482_F GAGCAGGGATTGGTAATGGA, Cthe_0482_R TACCGCAAGACCTACAAGCA; Cthe_1481_F AGTCATATCCGAAAACATGG, Cthe_1481_R TTGTAGTCGTCAAGGGAAGT; Cthe_1604_F GTGTCCCCGCTATTGCTAAA, Cthe_1604_R ATGGGTAAAATGCCGAATGA; Cthe_1951_F AAAATAAAAGCCCAGGATTC, Cthe_1951_R GCATTATCCTGAAGTTCGTC; and Cthe_2531_F CGGAAAGGACATTGTCATCC, Cthe_2531_R CAAAGCCAGGGTTACGACAT.
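For readers who wish to reuse the primers, the short sketch below tabulates the length and GC fraction of each oligonucleotide listed above. The sequences are copied from the text; the helper `gc_fraction` is illustrative and was not part of the original analysis.

```python
# Primer sequences as listed in the text (5' to 3').
PRIMERS = {
    "Cthe_0344_F": "CGACTTCCCGAACCAGATAA",
    "Cthe_0344_R": "GCAGCGGCTATCTTCATTTC",
    "Cthe_0482_F": "GAGCAGGGATTGGTAATGGA",
    "Cthe_0482_R": "TACCGCAAGACCTACAAGCA",
    "Cthe_1481_F": "AGTCATATCCGAAAACATGG",
    "Cthe_1481_R": "TTGTAGTCGTCAAGGGAAGT",
    "Cthe_1604_F": "GTGTCCCCGCTATTGCTAAA",
    "Cthe_1604_R": "ATGGGTAAAATGCCGAATGA",
    "Cthe_1951_F": "AAAATAAAAGCCCAGGATTC",
    "Cthe_1951_R": "GCATTATCCTGAAGTTCGTC",
    "Cthe_2531_F": "CGGAAAGGACATTGTCATCC",
    "Cthe_2531_R": "CAAAGCCAGGGTTACGACAT",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


for name, seq in PRIMERS.items():
    print(f"{name}\t{len(seq)} nt\tGC = {gc_fraction(seq):.2f}")
```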
The authors gratefully acknowledge Brian Davison (ORNL) for critical review of the manuscript. The authors thank Kelsey Yee (ORNL), Janet Westpheling (University of Georgia, Athens, GA, USA), Lee Lynd (Dartmouth College, Hanover, NH, USA), and Edward Bayer (Weizmann Institute of Science, Rehovot, Israel) for helpful discussions. Sagar Utturkar (University of Tennessee, Knoxville, TN, USA) provided technical assistance with sequence data. This work was supported by the Office of Biological and Environmental Research in the DOE Office of Science through the BESC, a DOE Bioenergy Research Center. ORNL is managed by UT-Battelle, LLC, Oak Ridge, TN, USA, for the DOE under contract DE-AC05-00OR22725.
- Lynd LR, Van Zyl WH, McBride JE, Laser M: Consolidated bioprocessing of cellulosic biomass: an update. Curr Opin Biotechnol 2005, 16:577–583.
- Lynd LR, Weimer PJ, Van Zyl WH, Pretorius IS: Microbial cellulose utilization: Fundamentals and biotechnology. Microbiol Mol Biol Rev 2002, 66:506–577.
- Demain AL, Newcomb M, Wu JH: Cellulase, clostridia, and ethanol. Microbiol Mol Biol Rev 2005, 69:124–154.
- Alper H, Stephanopoulos G: Engineering for biofuels: exploiting innate microbial capacity or importing biosynthetic potential? Nat Rev Microbiol 2009, 7:715–723.
- Farrell AE, Plevin RJ, Turner BT, Jones AD, O’Hare M, Kammen DM: Ethanol can contribute to energy and environmental goals. Science 2006, 311:506–508.
- Hahn-Hagerdal B, Galbe M, Gorwa-Grauslund MF, Liden G, Zacchi G: Bio-ethanol - the fuel of tomorrow from the residues of today. Trends Biotechnol 2006, 24:549–556.
- Himmel ME, Ding S-Y, Johnson DK, Adney WS, Nimlos MR, Brady JW, Foust TD: Biomass recalcitrance: Engineering plants and enzymes for biofuels production. Science 2007, 315:804–807.
- Stephanopoulos G: Challenges in engineering microbes for biofuels production. Science 2007, 315:801–804.
- Delcher A, Bratke K, Powers E, Salzberg S: Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 2007, 23:673–679.
- Badger J, Olsen G: CRITICA: coding region identification tool invoking comparative analysis. Mol Bio Evol 1999, 16:512–524.
- Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ: Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinforma 2010, 11:119.
- Hauser LJ, Land ML, Brown SD, Larimer F, Keller KL, Rapp-Giles BJ, Price MN, Lin M, Bruce DC, Detter JC, Tapia R, Han CS, Goodwin LA, Cheng JF, Pitluck S, Copeland A, Lucas S, Nolan M, Lapidus AL, Palumbo AV, Wall JD: The complete genome sequence and updated annotation of Desulfovibrio alaskensis G20. J Bacteriol 2011, 193:4268–4269.
- Yang S, Pappas KM, Hauser LJ, Land ML, Chen GL, Hurst GB, Pan C, Kouvelis VN, Typas MA, Pelletier DA, Klingeman DM, Chang YJ, Samatova NF, Brown SD: Improved genome annotation for Zymomonas mobilis. Nat Biotechnol 2009, 27:893–894.
- Raman B, Pan C, Hurst GB, Rodriguez M Jr, McKeown CK, Lankford PK, Samatova NF, Mielenz JR: Impact of pretreated switchgrass and biomass carbohydrates on Clostridium thermocellum ATCC 27405 cellulosome composition: A quantitative proteomic analysis. PLoS One 2009, 4:e5271.
- Gold ND, Martin VJJ: Global view of the Clostridium thermocellum cellulosome revealed by quantitative proteomic analysis. J Bacteriol 2007, 189:6787–6795.
- Riederer A, Takasuka TE, Makino S-I, Stevenson DM, Bukhman YV, Elsen NL, Fox BG: Global gene expression patterns in Clostridium thermocellum as determined by microarray analysis of chemostat cultures on cellulose or cellobiose. Appl Environ Microbiol 2011, 77:1243–1253.
- Fontes CM, Gilbert HJ: Cellulosomes: Highly efficient nanomachines designed to deconstruct plant cell wall complex carbohydrates. Ann Rev Biochem 2010, 79:655–681.
- Department of Energy (DOE), Office of Energy Efficiency and Renewable Energy (EERE), Office of the Biomass Program: U.S. Billion-Ton Update: Biomass Supply for a Bioenergy and Bioproducts Industry. Oak Ridge, TN: Oak Ridge National Laboratory; 2011:227.
- Leimena MM, Wels M, Bongers RS, Smid EJ, Zoetendal EG, Kleerebezem M: Comparative analysis of Lactobacillus plantarum WCFS1 transcriptomes by using DNA microarray and next-generation sequencing technologies. Appl Environ Microbiol 2012, 78:4141–4148.
- Passalacqua KD, Varadarajan A, Ondov BD, Okou DT, Zwick ME, Bergman NH: Structure and complexity of a bacterial transcriptome. J Bacteriol 2009, 191:3203–3211.
- Yoder-Himes DR, Chain PS, Zhu Y, Wurtzel O, Rubin EM, Tiedje JM, Sorek R: Mapping the Burkholderia cenocepacia niche response via high-throughput sequencing. Proc Natl Acad Sci U S A 2009, 106:3976–3981.
- Jourdren L, Bernard M, Dillies MA, Le Crom S: Eoulsan: a cloud computing-based framework facilitating high throughput sequencing analyses. Bioinformatics 2012, 28:1542–1543.
- Shendure J, Ji H: Next-generation DNA sequencing. Nat Biotechnol 2008, 26:1135–1145.
- Oshlack A, Robinson MD, Young MD: From RNA-seq reads to differential expression results. Genome Biol 2010, 11:220.
- Soneson C, Delorenzi M: A comparison of methods for differential expression analysis of RNA-seq data. BMC Bioinforma 2013, 14:91.
- Mardis ER: Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet 2008, 9:387–402.
- Luo C, Hu G-Q, Zhu H: Genome reannotation of Escherichia coli CFT073 with new insights into virulence. BMC Genomics 2009, 10:552.
- Siezen RJ, Francke C, Renckens B, Boekhorst J, Wels M, Kleerebezem M, Van Hijum SAFT: Complete resequencing and reannotation of the Lactobacillus plantarum WCFS1 genome. J Bacteriol 2012, 194:195–196.
- Wood V, Rutherford KM, Ivens A, Rajandream MA, Barrell B: A re-annotation of the Saccharomyces cerevisiae genome. Comp Funct Genomics 2001, 2:143–154.
- Yang S, Giannone RJ, Dice L, Yang ZK, Engle NL, Tschaplinski TJ, Hettich RL, Brown SD: Elucidation of the Clostridium thermocellum ATCC27405 ethanol shock responses using an integrated transcriptomic, proteomic and metabolomic profiling approach. BMC Genomics 2012, 13:336.
- Mazumder K, York WS: Structural analysis of arabinoxylans isolated from ball-milled switchgrass biomass. Carbohydr Res 2010, 345:2183–2193.
- Nataf Y, Bahari L, Kahel-Raifer H, Borovok I, Lamed R, Bayer EA, Sonenshein AL, Shoham Y: Clostridium thermocellum cellulosomal genes are regulated by extracytoplasmic polysaccharides via alternative sigma factors. Proc Natl Acad Sci U S A 2010, 107:18646–18651.
- Nataf Y, Yaron S, Stahl F, Lamed R, Bayer EA, Scheper TH, Sonenshein AL, Shoham Y: Cellodextrin and laminaribiose ABC transporters in Clostridium thermocellum. J Bacteriol 2009, 191:203–209.
- Yang S, Giannone RJ, Dice L, Yang ZK, Engle NL, Tschaplinski TJ, Hettich RL, Brown SD: Clostridium thermocellum ATCC27405 transcriptomic, metabolomic and proteomic profiles after ethanol stress. BMC Genomics 2012, 13:336.
- Esvelt KM, Wang HH: Genome-scale engineering for systems and synthetic biology. Mol Syst Biol 2013, 9:641.
- Chen Y, Indurthi DC, Jones SW, Papoutsakis ET: Small RNAs in the Genus Clostridium. mBio 2011, 2:e00340–10. doi:10.1128/mBio.00340–10
- van den Berg BH, McCarthy FM, Lamont SJ, Burgess SC: Re-annotation is an essential step in systems biology modeling of functional genomics data. Plos One 2010, 5:e10642.
- Salzberg SL: Genome re-annotation: a wiki solution? Genome Biol 2007, 8:102.
- Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J, Guernec G, Jagla B, Jouneau L, Laloë D, Le Gall C, Schaëffer B, Le Crom S, Guedj M, Jaffrézic F, French StatOmique Consortium: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 2013, 14:671–683.
- Oshlack A, Wakefield MJ: Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 2009, 4:14.
- Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, Nkadori E, Veyrieras JB, Stephens M, Gilad Y, Pritchard JK: Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 2010, 464:768–772.
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 2008, 5:621–628.
- Bullard JH, Purdom E, Hansen KD, Dudoit S: Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinforma 2010, 11:94.
- Ebringerova A: Structural diversity and application potential of hemicelluloses. Macromol Symp 2005, 232:1–12.
- Izydorczyk MS, Biliaderis CG: Cereal arabinoxylans: Advances in structure and physicochemical properties. Carbohyd Polym 1995, 28:33–48.
- Klinke HB, Thomsen AB, Ahring BK: Inhibition of ethanol-producing yeast and bacteria by degradation products produced during pre-treatment of biomass. Appl Microbiol Biotechnol 2004, 66:10–26.
- Yee KL, Rodriguez MJ, Tschaplinski TJ, Engle NL, Martin MZ, Fu C, Wang ZY, Hamilton-Brehm SD, Mielenz JR: Evaluation of the bioconversion of genetically modified switchgrass using simultaneous saccharification and fermentation and a consolidated bioprocessing approach. Biotechnol Biofuels 2012, 5:81.
- Pu Y, Hu F, Huang F, Davison BH, Ragauskas AJ: Assessing the molecular structure basis for biomass recalcitrance during dilute acid and hydrothermal pretreatments. Biotechnol Biofuels 2013, 6:15.
- DeMartini JD, Pattathil S, Miller JS, Li H, Hahn MG, Wyman CE: Investigating plant cell wall components that affect biomass recalcitrance in poplar and switchgrass. Energy Environ Sci 2013, 6:898–909.
- Zhang YH, Lynd LR: Cellulose utilization by Clostridium thermocellum: bioenergetics and hydrolysis product assimilation. Proc Natl Acad Sci U S A 2005, 102:7321–7325.
- Ahsan M, Matsumoto M, Karita S, Kimura T, Sakka K, Ohmiya K: Purification and characterization of the family J catalytic domain derived from the Clostridium thermocellum endoglucanase CelJ. Biosci Biotechnol Biochem 1997, 61:427–431.
- Davidson AL, Dassa E, Orelle C, Chen J: Structure, function, and evolution of bacterial ATP-binding cassette systems. Microbiol Mol Biol Rev 2008, 72:317–364.
- Strobel HJ, Caldwell FC, Dawson KA: Carbohydrate transport by the anaerobic thermophile Clostridium thermocellum LQRI. Appl Environ Microbiol 1995, 61:4012–4015.
- Alexander JK: Purification and specificity of cellobiose phosphorylase from Clostridium thermocellum. J Biol Chem 1968, 243:2899–2904.
- Fischer RJ, Oehmcke S, Meyer U, Mix M, Schwarz K, Fiedler T, Bahl H: Transcription of the pst operon of Clostridium acetobutylicum is dependent on phosphate concentration and pH. J Bacteriol 2006, 188:5469–5478.
- Gebhard S, Tran SL, Cook GM: The Phn system of Mycobacterium smegmatis: a second high-affinity ABC-transporter for phosphate. Microbiology 2006, 152:3453–3465.
- Antelmann H, Scharf C, Hecker M: Phosphate starvation-inducible proteins of Bacillus subtilis: proteomics and transcriptional analysis. J Bacteriol 2000, 182:4478–4490.
- Aguena M, Yagil M, Spira B: Transcriptional analysis of the pst operon of Escherichia coli. Mol Genet Genomics 2002, 268:518–524.
- El-Nashaar HM, Banowetz GM, Griffith SM, Casler MD, Vogel KP: Genotypic variability in mineral composition of switchgrass. Bioresour Technol 2009, 100:1809–1814.
- Diaz-Ramirez M, Boman C, Sebastian F, Royo J, Xiong SJ, Bostrom D: Ash characterization and transformation behavior of the fixed-bed combustion of novel crops: poplar, brassica, and cassava fuels. Energ Fuel 2012, 26:3218–3229.
- Alvarez-Ortega C, Olivares J, Martinez JL: RND multidrug efflux pumps: what are they good for? Front Microbiol 2013, 4:7.
- Mearls EB, Izquierdo JA, Lynd LR: Formation and characterization of non-growth states in Clostridium thermocellum: spores and L-forms. BMC Microbiol 2012, 12:180.
- Steiner E, Scott J, Minton NP, Winzer K: An agr quorum sensing system regulates granulose formation and sporulation in Clostridium acetobutylicum. Appl Environ Microbiol 2012, 78:1113–1122.
- Paredes CJ, Alsaker KV, Papoutsakis ET: A comparative genomic view of clostridial sporulation and physiology. Nat Rev Microbiol 2005, 3:969–978.
- Shi L, Campbell G, Jones WD, Campagne F, Wen Z, Walker SJ, Su Z, Chu TM, Goodsaid FM, Pusztai L, Shaughnessy JD Jr, Oberthuer A, Thomas RS, Paules RS, Fielden M, Barlogie B, Chen W, Du P, Fischer M, Furlanello C, Gallas BD, Ge X, Megherbi DB, Symmans WF, Wang MD, Zhang J, Bitter H, Brors B, Bushel PR, Bylesjo M, et al.: The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat Biotechnol 2010, 28:827–838.
- Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, Collins PJ, de Longueville F, Kawasaki ES, Lee KY, Luo Y, Sun YA, Willey JC, Setterquist RA, Fischer GM, Tong W, Dragan YP, Dix DJ, Frueh FW, Goodsaid FM, Herman D, Jensen RV, Johnson CD, Lobenhofer EK, Puri RK, Scherf U, Thierry-Mieg J, Wang C, Wilson M, MAQC Consortium, et al.: The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 2006, 24:1151–1161.
- Schell DJ, Farmer J, Newman M, McMillan JD: Dilute-sulfuric acid pretreatment of corn stover in pilot-scale reactor: investigation of yields, kinetics, and enzymatic digestibilities of solids. Appl Biochem Biotechnol 2003, 105–108:69–85.
- Sannigrahi P, Ragauskas AJ: Characterization of fermentation residues from the production of bio-ethanol from lignocellulosic feedstocks. J Biobased Mater Bio 2011, 5:514–519.
- Kridelbaugh DM, Nelson J, Engle NL, Tschaplinski TJ, Graham DE: Nitrogen and sulfur requirements for Clostridium thermocellum and Caldicellulosiruptor bescii on cellulosic substrates in minimal nutrient media. Bioresour Technol 2013, 130:125–135.
- Syed MH, Karpinets TV, Parang M, Leuze MR, Park BH, Hyatt D, Brown SD, Moulton S, Galloway MD, Uberbacher EC: BESC knowledgebase public portal. Bioinformatics 2012, 28:750–751.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.