The growth condition of N. patriciarum W5 was adapted from a rumen content of a water buffalo. The isolate was maintained in a rumen fluid-containing medium supplemented with 0.5% rice straw as the sole carbon and was anaerobically grown at 39°C for three or four days  with some variations. Glucose, carboxylmethyl cellulose (CMC), avicel or the powder of rice straw, napiergrass or sugarcane bagasse added to the broth (300 mL) as a sole carbon source to induce the cellulase production of N. patriciarum W5. To collect extracellular proteins from W5, four replicates of 300 mL of medium were cultured anaerobically in 39°C incubators for three to four days, and supernatants were harvested by centrifugation.
Extracellular cellulase purification
A total of 1.5 L of culture supernatants were filtered through a 0.2-mm membrane (Sartorius, Goettingen, Germany), then dialyzed and condensed 100-fold using Viva Flow 50 (10-kDa cutoff) (Sartorius) against starting buffer (50 mM Tris·HCl, pH 7.0) to a final volume of 20 mL at 4°C.
To purify the protein fractions that contain cellulases, sequential FPLC (ÄKTAFPLC; GE Healthcare, Taipei, Taiwan) was conducted. First, the samples were subjected to a HiPrep 16/10 QFF anion exchange column (GE Healthcare, Taipei, Taiwan) preequilibrated with 60 mL of starting buffer to stabilize the UV signal, and then a 10-mL sample was applied and eluted for 60 mL with a step gradient using a 15% mixing ratio of elution buffer (50 mM Tris·HCl, 2 M NaCl, pH 7.0) at a constant flow of 3 mL/minute. Fractions of 5 mL from the eluates were collected and assayed for the activity of pNPG, dye-CMC and 4-MUC (Sigma-Aldrich, St Louis, MO, USA), respectively. Fractions with apparent activity were pooled, condensed and dialyzed against gel filtration buffer (20 mM Tris·HCl, 150 mM NaCl, pH 7.0) with Viva Spin (10-kDa cutoff). Four milliliters of the condensed sample were applied to the HiLoad 16/60 Superdex 200 gel filtration column (GE Healthcare, Taipei, Taiwan) with a constant flow of 1.2 mL/minute. All 3-mL fractions were assayed for enzyme activity using pNPG, dye-CMC and 4-MUC as substrates. Fractions displaying the activity of BGLUs with EXGs (pNPG and 4-MUC) and EGs (dye-CMC) were pooled and condensed, respectively.
Enzyme activity assay
4-MUC assays were carried out for broad quantitative analysis of cellulase activity, and the activity was measured by fluorometry with excitation and emission wavelengths at 365 and 450 nm, respectively. Dye-CMC was used as a substrate for the quantification of EG activity, and the activity was measured at 590 nm. pNPG was used as the substrate for assays of BGLU activity, and the activity was measured at 410 nm. If both dye-CMC and pNPG activity was low, the 4-MUC activity represented the exocellulase activity. Protein concentration was measured with the Protein Assay reagent (500-0006; Bio-Rad Laboratories, Hercules, CA, USA). The specific activities were defined as digesting 1 μM substrate per milligram of protein per hour.
SDS-PAGE was performed to examine the approximate protein sizes . Proteins were separated on 12% polyacrylamide gels and stained with Coomassie Brilliant Blue R-250 (Bio-Rad Laboratories). The molecular weight markers were obtained from Fermentas (Burlington, ON, Canada).
A total of 100 μg of pooled fractionated P2-1 and P2-2 were separated with 17 × 17 cm of 12% SDS-PAGE controlled by the PROTEAN II xi electrophoresis system (Bio-Rad Laboratories). Duplicated samples were separated on the same 12% SDS-PAGE gel using non-β-ME sampling buffer and premixed without heating for zymogram tests. SDS-PAGE for the zymogram test was laid onto a 16-cm diameter of 1% CMC (for EG; Calbiochem, Darmstadt, Germany); catalog no. 217277; Merck & Co., Whitehouse Station, NJ, USA) agar plate prespread with 500 μL of 2% 4-MUC (for EXG), air-dried and finally moisturized with 0.5 mL of 90 mM sodium acetate on top of the PAGE gel and kept at 30°C for three hours or overnight to detect the EXG or EG activity, respectively. The 4-MUC plate was measured by fluorometry under UV light at 365 nm for qualitative activity. The CMC plate was later stained with 15 mL of Congo Red (1 mg/mL) at room temperature for 40 minutes and then destained with 1 M NaCl to visualize reactive (transparent) bands. The transparent bands on the zymogram were estimated for size guided by molecular weight markers, and the corresponding bands on SDS-PAGE were sliced from the same size range, dissolved and digested with trypsin solution, and subjected to ESI-TRAP LC-MS/MS analysis.
In-gel trypsin digestion
The protocol used for in-gel trypsin digestion of proteins in gels was adapted mainly from a method described previously . Briefly, the protein bands from one-dimensional gel were excised, and each band was diced into small pieces (about 0.5 mm3). The gel pieces were placed into microfuge tubes and washed a few times with solution containing 50% methanol and 5% acetic acid for two to three hours, followed by two washes with a solution of 25 mM NH4HCO3 in 50% acetonitril for 10 minutes each, and then the gel pieces were dried in a vacuum centrifuge. Reduction with dithiothreitol (DTT) and alkylation with iodoacetamide of proteins in gel pieces were performed, and the gel pieces were washed and dried in a vacuum centrifuge before trypsin digestion. A 25- to 40-μL trypsin solution in 25 mM NH4HCO3 containing 75 to 100 ng of sequencing grade modified trypsin (Promega, Madison, WI, USA) was added and incubated with gel pieces for 12 to 16 hours at 37°C. To recover the tryptic peptides, a solution of 30 μL containing 5% formic acid and 50% acetonitril was added to the gel pieces, agitated in a vortex for 30 to 60 minutes and transferred into a new tube. The remains were repeated once with 15 μL of solution, and the resulting two eluates were combined and dried in a vacuum centrifuge. The dried pellet was redissolved in 10 to 20 μL of 0.1% formic acid for LC-MS/MS analysis (Proteomics Core Laboratory, IPMB, Academia Sinica, Taipei, Taiwan).
MS method for protein identification and analysis
An LTQ mass spectrometer model coupled with an online capillary LC system from Thermo Fisher Scientific (Waltham, MA, USA) was utilized for protein identification and analysis. The capillary LC system equipped with an autosampler and Surveyor pumps (Thermo Fisher Scientific), a C18 trap column (100 μm × 3.0 cm, 5 μm, 200-Å Magic C18 AQ resin; Michrom Bioresources, Inc., Auburn, CA) and a C18 reverse phase column (75 μm × 11 cm, 5 μm, 100-Å Magic C18 resin, Auburn, CA) was used to deliver solvent and to separate tryptic peptides with a linear gradient from 5% to 40% of acetonitril in 0.1% (vol/vol) formic acid for 40 minutes (or 60 minutes) at the nanoflow rate of approximately 300 nL/minute. A sample volume of 5 μL was loaded onto the column. The C18 reverse phase column was coupled to a nanoelectrospray ionization source, and the acquisition of the data was conducted with a full MS scan followed by four MS/MS scans of the top four precursor ions from the MS scan. The MS scan was conducted over the mass-to-charge (m/z) range 400 to 1,800 using the data-dependent data acquisition mode with dynamic exclusion enabled. The acquired ESI-TRAP MS/MS data were analyzed using a MASCOT MS/MS Ion Search program (http://www.matrixscience.com/) against the ORFs of our transcriptome database. The peak list data (DTA) used for the database search were generated from MS/MS spectra using an ion intensity threshold of 1,000 and a minimum ion count of 10. The protein sequences in the database were indexed with trypsin digestion with two missed cleavages, the 600 to 3,500 molecular weight range, a variable modification of Met by oxidation and a static modification of Cys by carboxyamidomethylation. The DTA data were searched with 1.0 atomic mass units (u) of precursor peptide mass tolerance and 0.5 u of fragment mass tolerance. The matched peptides were accepted when they passed multiple filters, X
corr ≥ 1.5 for singly charged ions (z = 1), X
corr ≥ 2.0 for doubly charged ions (z = 2), X
corr ≥ 2.5 for triply charged ions (z = 3), ΔCN ≥ 0.1 and peptide probability ≤1 × 10-3. Matched proteins were accepted only when they had at least two distinct peptide hits.
RNA extraction and cDNA synthesis of W5
The cells were cultured under anaerobic conditions at 39°C for 2.5 days, and then the medium was removed by filtration to collect the cells. The culture mass was ground in liquid nitrogen, and total RNA was extracted using the hot acid phenol-chloroform method and quantified on a spectrophotometer. From each substrate-induced culture (CMC, avicel, cellobiose, rice straw, sugarcane and napiergrass), equal amounts of total RNA were pooled together. The mRNA of the pooled cellulose-induced total RNA samples were enriched using the Dynabeads mRNA Purification Kit (Invitrogen, Carlsbad, CA, USA), and double-stranded cDNA was synthesized from the 10 μg of purified mRNA sample using the SuperScript Double-Stranded cDNA Synthesis Kit (Invitrogen) according to the manufacturer's instructions. The cDNA were then fragmented and prepared for the shotgun libraries according to Roche's protocols. The deep sequencing experiments were carried out on the GS FLX Sequencer by Mission Biotech Co., Ltd. (Taipei, Taiwan).
To synthesize the rice straw-induced cellulase containing cDNA for reverse transcriptase (RT)-PCR and gene cloning, reverse transcription using total RNA was carried out with the SuperScript™ II Reverse Transcriptase Kit (Invitrogen). The reaction contained the total RNA (4 μg), oligo(dT) primer, BD SMART II A oligo (Clontech, Mountain View, CA, USA), 4 μL of 5 × RT buffer, 10 mM DTT, 0.5 mM each of deoxyribonucleotide triphosphate and 10 U of SUPERSCRIPT™ II RNase H Reverse Transcriptase (Invitrogen). The single-stranded DNA product was stored at -20°C.
Transcriptome analysis: cDNA assembly, ORF prediction and domain prediction
The 454 sequence reads were processed through SeqClean to remove poly(T) stretches and then assembled into unique contigs by CAP3 (read minimum overlap length = 40 with minimum identity = 0.9). ORFs were predicted by GETORF, a part of EMBOSS software , for in-frame regions between a start and a stop codon. All predicted ORFs were then submitted to local RPS-BLAST  and the NCBI CDD database to retrieve domain predictions with an e-value < 10-5.
For quantitative profiling, an RNA-Seq library was constructed from the cDNA (synthesized from rice straw-induced RNA) using the Illumina kit, and paired-end sequencing was performed on the Illumina GA2 sequencer of the High Throughput Sequencing Core at Academia Sinica, Taipei, Taiwan. Reads from PE2*40 nucleotides (7.4 million pairs) (insert range between 180 and 220 bp) were obtained. The data were mapped to the EST contigs derived from 454 assembly and PCR cloning, and the relative expression levels were calculated by the RPKM method  using CLC-bio Genomics Workbench software (CLC bio, Aarhus N, Denmark) and sorted by ranking. The contigs with ORFs longer than 500 bp were further analyzed for GH family gene classification. The sequences with the highest RPKM scores and significant e-values (< 10-10) were selected for functional annotation. The analysis flowchart is illustrated in Figure 4.
Strains and plasmids used for cloning and expression
The E. coli DH5α strain was used as a host for cloning and plasmid propagation. The strain was cultured in a low-salt LB medium. P. pastoris GS115 strain (Invitrogen) was used as a host for expressing cloned cellulase genes. The P. pastoris transformants were selected in YPD (1% yeast extract, 2% peptone, 2% dextrose) plates containing 100 μg/ml of zeocin. For expression, the BMGY medium and the BMMY medium were used and prepared according to the Pichia expression system manual (Invitrogen). The S. cerevisiae transformants were selected in SC-URA plates. The pGEM-T easy vector (Promega) was used for gene cloning and sequencing. pPICZ A plasmid (Invitrogen) and pRS426 plasmid with glyceraldehyde 3-phosphate dehydrogenase (GAPDH) promoter were used for protein expression in P. pastoris and S. cerevisiae, respectively.
Cloning and construction for selected cellulase genes
The reverse-transcribed W5 cDNA was used to clone the selected cellulase genes by PCR using the gene-specific oligonucleotide primers the sequences of which were derived from the transcriptome contigs (Table 5). For each pair, the forward primer was engineered with XhoI (New England Biolabs, Ipswich, Massachusetts, USA) and the reverse primer was engineered with EcoRI (New England Biolabs) recognition sequences to allow for directional cloning into expression vector. After amplification, each PCR was cloned into the pGEM-T easy vector for sequencing. Plasmid DNA from the positive clones were digested by XhoI and EcoRI, and the putative cellulase gene was subcloned into the pPICZ A vector, which contained an AOX1 promoter that can be induced by methanol, and allowed for selection by the antibiotic zeocin. For expression in S. cerevisiae, the ORF of the candidate cellulase gene was fused with the GAPDH promoter and cloned into the pRS426 plasmid and then transformed into yeast cells. The procedures for plasmid preparation, restriction digestion, ligation and E. coli transformation were carried out in accordance with the manufacturers' protocols.
Transformation and cellulase expression in Pichia pastoris
For growing P. pastoris GS115 cells, all cultures were incubated at 30°C with 250-rpm orbital shaking. Procedures for preparation of competent cells were carried out according to the manufacturer's description (Bio-Rad Laboratories). For transformation, 10 μg of each cellulase construct in the pPICZ A vector was linearized with SacI (New England Biolabs) and transformed into yeast P. pastoris GS115 strain by the Gene Pulser Xcell (Bio-Rad Laboratories). The 200-μL electroporated mixtures were plated into YPD medium containing 100 μg/mL zeocin and incubated at 30°C until colonies appeared. The introduction of each cellulase gene into the yeast cell was further confirmed by PCR using the forward and reverse primers of AOX1 promoter (Invitrogen). For expression and induction of the cellulase genes, the protocols were performed according to the Pichia expression system manual (Invitrogen). To monitor the activities of expressed proteins, 1-mL samples from culture were collected every 24 hours. After 1-mL samples were taken, methanol (100%) was added to a final concentration of 0.5% (vol/vol) to maintain induction. To screen for the Pichia transformants expressing high activity of cellulase protein, ten colonies were randomly selected from YPD/zeocin plates and individually cultured in 10 mL of BMMY at 30°C for five days for methanol induction.
Transformation and cellulase expression in Saccharomyces cerevisiae
Yeast BY4741 (MATa his3Δ1 leu2Δ0 met15Δ0 ura3Δ0), a descendant of S288C, was grown overnight at 30°C with 250-rpm shaking. Procedures for the preparation of competent cells were carried out according to the manufacturer's description (Bio-Rad Laboratories). For transformation, 200 ng of each cellulase construct in pRS426 was directly transformed into yeast BY4741 strain using the Gene Pulser Xcell Electroporation System (Bio-Rad Laboratories). The 50-μL electroporated mixtures were incubated in 1 mL of YPAD (1% yeast extract, 2% peptone, 0.04% adenine sulfate, 2% dextrose) and recovered after three hours of incubation at 30°C with 250-rpm shaking. The transformants, after being centrifuged and washed twice with 1 mL of deionized distilled water, were selected on the SC-URA plate at 30°C until URA+ colonies appeared. Only the strains that carried the desired plasmids will survive and form colonies on the SC-URA plate. Ten colonies of transformants from each strain were randomly selected and cultured in 5 mL of SC-URA medium at 30°C for 24 hours, and the harvested supernatants were then screened by the 4-MUC activity. At least three higher-expression colonies of each gene were diluted to 600-nm optical density (OD600) = 0.1 and grown in 50 mL of SC-URA medium at 30°C for 24 hours. The condensed supernatants and collected cell pellets were assayed again.
Functional screening and activity assay
For the 4-MUC enzymatic assay, 40 μL of the supernatants from the 1-mL sample (collected as described above) were added to reaction buffer (90 mM NaOAc, pH 5) containing 2 mg/mL 4-MUC (Sigma-Aldrich) and incubated for 20 hours at 30°C before addition of stop solution (2% Na2CO3, pH 11). The activity was determined by measuring absorbance at OD450 nm (excitation at 365 nm) using the SpectraMax M2 Microplate Reader (Molecular Devices, Sunnyvale, CA, USA).
To find the best candidates from the ten transformants, 4-MUC assays were repeated to select three colonies that displayed the highest activity for each specific gene. The culture condition was the same as above, except for scaling up culture to 50 mL of BMMY.
To evaluate the activities of the cell pellet, the protein extraction procedure was based on the Pichia expression system manual (Invitrogen). For the 4-MUC enzymatic assay, 40 μL of the pellet extractions from the 1-mL culture harvest were added to reaction buffer (90 mM NaOAc, pH 5) containing 4-MUC at 2 mg/mL and incubated for 2.5 hours at 30°C before addition of stop solution (2% Na2CO3, pH 11). The activity was determined as described above.
For determination of the activities in benchmarks, the genes of CbhI (an exoglucanase of T. reesei), EgIII (an endoglucanse of T. reesei) and BglI (a β-glucosidase of A. niger) were also cloned into the pPICZ A vector and transformed in the Pichia expression system. The primers of these genes were derived from the sequences at the NCBI. Cloning and construction were carried out the same way as for W5 genes. The calculation of specific activity in each cellulase candidate was derived from the raw-read/protein concentration. Then the activities of these cellulases were used as controls to derive relatively specific activity of W5 enzymes for exoglucanse (4-MUC), endoglucanase (dye-CMC) and β-glucosidase (pNPG), respectively.