Multifunctional cellulase catalysis targeted by fusion to different carbohydrate-binding modules

Background Carbohydrate binding modules (CBMs) bind polysaccharides and help target glycoside hydrolase catalytic domains to their appropriate carbohydrate substrates. To better understand how CBMs can improve cellulolytic enzyme reactivity, representatives from each of the 18 families of CBM found in Ruminoclostridium thermocellum were fused to the multifunctional GH5 catalytic domain of CelE (Cthe_0797, CelEcc), which can hydrolyze numerous types of polysaccharides including cellulose, mannan, and xylan. Since CelE is a cellulosomal enzyme, none of these fusions to a CBM previously existed. Results CelEcc_CBM fusions were assayed for their ability to hydrolyze cellulose, lichenan, xylan, and mannan. Several CelEcc_CBM fusions showed enhanced hydrolytic activity with different substrates relative to the fusion to CBM3a from the cellulosome scaffoldin, which has high affinity for binding to crystalline cellulose. Additional binding studies and quantitative catalysis studies using nanostructure-initiator mass spectrometry (NIMS) were carried out with the CBM3a, CBM6, CBM30, and CBM44 fusion enzymes. In general, and consistent with observations of others, enhanced enzyme reactivity was correlated with moderate binding affinity of the CBM. Numerical analysis of reaction time courses showed that CelEcc_CBM44, a combination of a multifunctional enzyme domain with a CBM having broad binding specificity, gave the fastest rates for hydrolysis of both the hexose and pentose fractions of ionic-liquid pretreated switchgrass. Conclusion We have shown that fusions of different CBMs to a single multifunctional GH5 catalytic domain can increase its rate of reaction with different pure polysaccharides and with pretreated biomass. This fusion approach, incorporating domains with broad specificity for binding and catalysis, provides a new avenue to improve reactivity of simple combinations of enzymes within the complexity of plant biomass.
Electronic supplementary material The online version of this article (doi:10.1186/s13068-015-0402-0) contains supplementary material, which is available to authorized users.

various crystalline and recalcitrant structures. Hemicellulose is assembled from a variable combination of sugar backbones and may have a variety of branching structures and species-specific variations [8,16,17]. For example, xyloglucan consists of a β-1,4-linked glucose backbone with partial backbone acetylation and O6 branches containing xylose, galactose, and fucose [8], while glucuronoarabinoxylan consists of a β-1,4-linked xylose backbone with partial backbone acetylation and O2 and O3 branches containing arabinose and glucuronate [8]. Ferulate esters also serve to crosslink the arabinoxylan branches to lignin [8]. Altogether, the complex matrix of cellulose, hemicellulose, and lignin is a primary impediment to the high-yield enzymatic deconstruction of biomass [6,18,19]. To realize the potential of a renewable biocommodities industry based on sugars derived from cellulosic biomass, improvements are still needed in many technologies, including chemical pretreatment, enzymatic hydrolysis, and microbial fermentation [5,19-21].
To date, more than 48,000 CBM sequences have been classified into 71 CBM families based on sequence similarity, and the structures of 271 representative CBMs have been reported (http://www.cazy.org) [38]. The many CBM families contain members that bind to the various polysaccharides that occur in nature [39,40]. Three types of CBMs have been identified based on their structures and ability to influence the function of associated catalytic domains [37]. Type A CBMs interact with the planar surfaces of crystalline polysaccharides, such as cellulose, through interactions between the polysaccharide and the aromatic side chains of Trp, Tyr, and Phe [41,42]. Type B CBMs have an open cleft that can bind polysaccharide chains found in amorphous regions of cellulose and in hemicellulose [43-46]. Type C CBMs are suggested to bind short soluble oligosaccharides [37]. Therefore, different types of CBMs can target an attached catalytic domain to a particular substrate and, by doing so, have profound effects on the catalytic rates of the attached enzyme [47-52].
Ruminoclostridium thermocellum (formerly Clostridium thermocellum), a thermophilic, cellulolytic, and ethanologenic anaerobe, extracts nutrients from lignocellulosic biomass by producing a multi-enzyme complex called a cellulosome [53-57]. This complex is formed by the recruitment of enzymes to the scaffoldin protein, CipA (Cthe_3077), as a result of high-affinity interactions of dockerin and cohesin domains. Both the scaffoldin and recruited enzymes contain CBMs that attach to insoluble substrates. For example, CBM3a is an integral domain of scaffoldin CipA, and it helps to localize the cellulosome to the surface of crystalline cellulose to promote efficient hydrolysis [41]. In addition, many cellulosomal enzymes possess their own CBMs that localize them to additional substrates in close proximity to cellulose. R. thermocellum is thus an invaluable source of both enzyme catalytic domains and CBMs for studies to identify unique pairs with enhanced reactivity.
The use of native and engineered enzymes has the potential to reduce the cost of biofuel production [58]. Current fungal cocktails used for biomass hydrolysis are complex and might contain 50 or more different polypeptides [59]. A number of approaches are being considered that could improve the performance of enzyme mixtures in biomass deconstruction, including (1) elimination of redundant or nonfunctional proteins from the mixture; (2) stabilization of key enzymes against nonspecific irreversible adsorption and against proteolytic, thermal, and other types of inactivation; and (3) substitution of enzymes with different binding properties, kcat, or other catalytic properties better matched to the conditions of the desired application. Recently, we reported that CelE [60], a single broad-specificity glycoside hydrolase family 5 (GH5) domain from R. thermocellum, is able to hydrolyze cellulose, xylan, and mannan, the three major polysaccharides found in the plant cell wall, and so could potentially replace or augment more strictly specific cellulose-, xylan-, or mannan-degrading enzymes in a hydrolysis reaction. We also showed that the fusion of the CelE catalytic core/domain (CelEcc) to CBM3a was highly reactive on pretreated biomass [61]. Given this positive result, it was reasonable to ask whether other CBM domains might enhance this broad reactivity. Indeed, the ability to target enzymes toward different polysaccharide constituents of plant biomass via engineered fusion to CBMs with different binding specificities is an intriguing [51,62-68], albeit not fully explored, aspect of glycoside hydrolase engineering.
In this paper, we report a combinatorial evaluation of the ability of representatives from each of the 18 CBM families found in R. thermocellum to modulate the function of a single multifunctional enzyme, CelE, from the same organism. Following earlier studies in which fluorescent proteins were appended to CBMs and other proteins to better understand their binding properties [68-75], GFP_CBM fusions were used to study CBM binding. Enzyme_CBM fusions were then used to study effects on catalytic activity with purified polysaccharides and with ionic liquid-pretreated switchgrass (IL-SG), a model bioenergy substrate containing amorphous cellulose and retaining a high fraction of hemicellulose [76,77]. The results show that fusions of different CBMs to CelE enhanced both rates and yields of hydrolysis with different purified polysaccharide substrates and with IL-SG. The best improvements in reactivity for the same catalytic domain (~4×) were correlated with broad specificity and moderate affinity of CBM binding.

CBMs from R. thermocellum
Thirty-nine CBMs from R. thermocellum ATCC 27405 were selected for study, including nine representatives from family CBM3, seven from CBM4, two from CBM9, five from CBM22, three from CBM35, and one each from CBM6, CBM11, CBM13, CBM16, CBM25, CBM30, CBM32, CBM34, CBM42, CBM44, CBM48, CBM50, and CBM54 (Additional file 1: Table S1). To test all of the CBM families encoded in the R. thermocellum genome, at least one sequence was selected from each family. When a family contained multiple sequences (e.g., 24 genes in R. thermocellum encode a CBM3 domain), several sequences were selected for testing.
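The per-family counts above can be tallied as a quick consistency check on the stated totals of 39 CBMs and 18 families. A minimal Python sketch, using only the numbers given in this paragraph:

```python
# Number of CBM sequences selected per family, as listed in the text.
selected = {
    "CBM3": 9, "CBM4": 7, "CBM9": 2, "CBM22": 5, "CBM35": 3,
    "CBM6": 1, "CBM11": 1, "CBM13": 1, "CBM16": 1, "CBM25": 1,
    "CBM30": 1, "CBM32": 1, "CBM34": 1, "CBM42": 1, "CBM44": 1,
    "CBM48": 1, "CBM50": 1, "CBM54": 1,
}

n_families = len(selected)                 # every family contributes at least one CBM
n_sequences = sum(selected.values())       # total CBMs selected for study
n_single = sum(1 for n in selected.values() if n == 1)  # families with one representative

print(n_families, n_sequences, n_single)  # 18 39 13
```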
The plasmid pEUTTJW (Fig. 1) was designed to contain four unique restriction enzyme recognition sites, SgfI, PmeI, AflII, and BamHI, which allowed PCR-amplified DNA sequences (Additional file 1: Tables S2, S3) to be swapped into the catalytic domain, linker, or CBM positions. The set of GFP_CBM plasmids was constructed using AflII and BamHI, and SgfI and PmeI were subsequently used to create the corresponding CelEcc_CBM plasmids. All genes were successfully cloned, sequence-verified, and translated into protein products using wheat germ cell-free translation (Fig. 2; Additional file 1: Table S4).

Soluble polysaccharide binding
To determine the binding specificity of the R. thermocellum CBMs, we performed affinity gel electrophoresis with GFP_CBMs and soluble polysaccharides including hydroxyethylcellulose (HEC), Icelandic moss lichenan, carob galactomannan, beechwood xylan, and wheat flour arabinoxylan. GFP_CBM binding was evaluated by calculating Rr from gels prepared with and without substrate. Most of the constructs that bound soluble substrates had Rr values less than 0.75; Rr values for all constructs are listed in Additional file 1: Table S5. Twenty-eight GFP_CBMs interacted with at least one of the substrates tested; 23 GFP_CBMs were assigned to bind HEC and 17 to bind lichenan (Fig. 3; Table 1; Additional file 2: Figure S1). Among all CBMs tested, CBM44 showed the broadest binding specificity.
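The Rr calculation is defined in "Methods" and is not reproduced in this section. As an illustration only, the sketch below assumes that Rr is the migration distance of a GFP_CBM, normalized to a non-binding reference protein (such as the STI control used in the gels), in the substrate-containing gel divided by the same ratio in the no-substrate gel. The function name, the migration distances, and the use of 0.75 as a hard cutoff are hypothetical.

```python
def relative_mobility(d_cbm_sub, d_ref_sub, d_cbm_free, d_ref_free):
    """Hypothetical Rr: CBM migration normalized to a reference protein in the
    substrate gel, divided by the same normalized migration in the no-substrate gel."""
    return (d_cbm_sub / d_ref_sub) / (d_cbm_free / d_ref_free)

# Hypothetical migration distances in mm (illustrative, not measured data).
rr = relative_mobility(d_cbm_sub=18.0, d_ref_sub=30.0, d_cbm_free=27.0, d_ref_free=30.0)

# Illustrative cutoff reflecting the Rr range reported for most binders.
binds = rr < 0.75
print(round(rr, 3), binds)
```

A construct that is not retarded by the substrate gives Rr = 1; retardation by binding pushes Rr below 1.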

Insoluble polysaccharide binding
Insoluble pull-down assays using GFP_CBMs were carried out with Avicel, phosphoric acid-swollen cellulose (PASC), birchwood xylan, 1,4-β-d-mannan, AFEX-SG, and IL-SG (Additional file 1: Table S6). Sixteen GFP_CBM constructs bound to one or more insoluble substrates according to the criterion of a PF % of 10 % or greater (see Eqs. 2 and 3, "Methods"), and these are reported in Table 2. No binding was detected for 23 of the GFP_CBM constructs using the pull-down assay, and so these are not included in Table 2.
Fig. 1 Schematics of the plasmid and fusion proteins used in this work. a Schematic of the plasmid and nucleotides in the functional region of pEUTTJW used to create fused gene sequences for cell-free protein translation. Locations of the flanking primer pair used to transfer an assembled fusion protein into pVP67K for expression in E. coli are shown as circles 1 and 2 (blue lines). b Schematic of the domain structures of expressed proteins consisting of either GFP (green) or CelE (purple), followed by the linker (blue) and the CBM domain (yellow).
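Eqs. 2 and 3 are given in "Methods" and are not reproduced in this section. Assuming that PF % is the percentage of GFP fluorescence depleted from the supernatant by the insoluble substrate, relative to a no-substrate control and after blank subtraction (an assumption; the published equations may handle background differently), the 10 % criterion can be applied as sketched below. The construct names and readings are hypothetical.

```python
def percent_bound(f_supernatant, f_control, f_blank=0.0):
    """Hypothetical PF %: share of GFP fluorescence removed from the supernatant
    by the insoluble substrate, relative to a no-substrate control."""
    fraction_free = (f_supernatant - f_blank) / (f_control - f_blank)
    return 100.0 * (1.0 - fraction_free)

# Hypothetical fluorescence readings (arbitrary units), not measured data.
control, blank = 10000.0, 200.0
readings = {"construct_A": 8200.0, "construct_B": 9700.0}

for name, signal in readings.items():
    pf = percent_bound(signal, control, blank)
    # Apply the 10 % criterion described in the text.
    print(name, round(pf, 1), "binds" if pf >= 10.0 else "no binding")
```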

Catalytic properties of CelEcc_CBM
Each CBM used in the binding assays was fused to the C-terminus of the GH5 catalytic domain of CelE (Cthe_0797), a multifunctional endoglucanase from R. thermocellum that can hydrolyze β-1,4-linkages in cellulose, xylan, mannan, and other polysaccharides [60,61]. This breadth of activity provided an opportunity to study the abilities of different CBMs to target a single catalytic domain to different substrates. CelEcc_CBM3a served as the starting benchmark (Fig. 4, green bars and circles). CelEcc_CBM variants were tested for hydrolysis of PASC, Icelandic moss lichenan, birchwood xylan, and 1,4-β-d-mannan (Fig. 4). CelEcc_CBM6 (purple bar) and CelEcc_CBM30 (magenta bar) displayed a greater than twofold increase in specific activity relative to CelEcc_CBM3a with PASC (indicated by a red star). For reactions with lichenan, CelEcc_CBM4-3 (red star), CelEcc_CBM13 (red star), CelEcc_CBM22-2 (yellow bar and red star), CelEcc_CBM30 (magenta bar and red star), and CelEcc_CBM44 (orange bar and red star) showed a greater than twofold increase in hydrolytic activity relative to CelEcc_CBM3a. For reactions with xylan and mannan, CelEcc_CBM44 (orange bar and red star) showed a greater than twofold increase in hydrolytic activity relative to CelEcc_CBM3a.
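The "greater than twofold" criterion used to flag fusions (red stars in Fig. 4) amounts to a ratio test against the CelEcc_CBM3a benchmark measured on the same substrate. A minimal Python sketch; the fusion names and activity values are hypothetical placeholders, not data from Fig. 4:

```python
def enhanced_fusions(activities, benchmark, threshold=2.0):
    """Return fusions whose activity exceeds `threshold` times the benchmark's."""
    ref = activities[benchmark]
    return sorted(name for name, value in activities.items()
                  if name != benchmark and value / ref > threshold)

# Hypothetical specific activities on one substrate (arbitrary units).
activities = {"benchmark": 1.0, "fusion_A": 2.3, "fusion_B": 4.1, "fusion_C": 0.8}
print(enhanced_fusions(activities, "benchmark"))  # ['fusion_A', 'fusion_B']
```

The comparison is strict, so a fusion with exactly twice the benchmark activity is not flagged.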

Binding-affinity measurements
On the basis of their binding properties and their enhancement of catalytic function when fused to CelE, we further studied the binding properties of four CBMs: CBM3a, CBM6, CBM30, and CBM44. Binding-affinity constants (K) for E. coli-expressed and purified GFP_CBMs were determined with PASC, Icelandic moss lichenan, and oat spelt xylan (Fig. 5; Table 3); a smaller K indicates tighter binding. For PASC, GFP_CBM3a, GFP_CBM30, and GFP_CBM44 had K-values of 8.26, 161.04, and 3.46 mg/mL and c-values of 1.11, 1.51, and 0.69, respectively, whereas the K- and c-values of GFP_CBM6 for PASC could not be determined due to low affinity. With lichenan, GFP_CBM6, GFP_CBM30, and GFP_CBM44 had K-values of 110.44, 3.19, and 1.23 mg/mL and c-values of 1.54, 0.61, and 0.31, respectively. For xylan, GFP_CBM6 and GFP_CBM44 had K-values of 0.76 and 2.22 mg/mL and c-values of 0.79 and 0.99, respectively. K- and c-values could not be determined for GFP_CBM3a with lichenan and xylan, or for GFP_CBM30 with xylan. Of note, none of the four CBMs selected for these studies had a sufficiently high affinity for 1,4-β-d-mannan for binding constants to be determined in these experiments.
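Eq. 6 is given in "Methods" and is not reproduced in this section. Assuming it has the Hill-type logistic form y = S^c / (K^c + S^c), where S is the polysaccharide concentration, K is the concentration giving half-maximal binding, and c is analogous to a Hill coefficient, the PASC parameters reported above can be evaluated as sketched below. Under this assumed form, a smaller K means tighter binding, so at 1 mg/mL PASC the predicted fraction bound is largest for GFP_CBM44 and smallest for GFP_CBM30.

```python
def fraction_bound(s, k, c):
    """Assumed Hill-type form of Eq. 6: y = s**c / (k**c + s**c).

    s: substrate concentration (mg/mL); k: half-saturation constant (mg/mL);
    c: dimensionless exponent analogous to a Hill coefficient.
    """
    return s ** c / (k ** c + s ** c)

# K (mg/mL) and c for PASC, as reported in the text.
pasc_fits = {
    "GFP_CBM3a": (8.26, 1.11),
    "GFP_CBM30": (161.04, 1.51),
    "GFP_CBM44": (3.46, 0.69),
}

for name, (k, c) in pasc_fits.items():
    # Predicted fraction bound at 1 mg/mL PASC under the assumed model.
    print(name, round(fraction_bound(1.0, k, c), 3))
```

At S = K the model gives exactly half-maximal binding for any c; c only changes how steeply binding rises around that point.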

Catalysis with IL-SG
CelEcc, CelEcc_CBM3a, CelEcc_CBM6, CelEcc_CBM30 and CelEcc_CBM44 were expressed in E. coli and purified to homogeneity. Equimolar amounts of active sites (0.32 nmol) of these enzymes were reacted with IL-SG, and the time course of sugar release was analyzed by NIMS (Fig. 6). After ~8 h of hydrolysis at 60 °C, CelEcc alone solubilized ~33 % of the total hexose and ~56 % of the total pentose sugars present in the biomass. Four of the hybrids gave an increased yield of total hexose products relative to CelEcc, with CelEcc_CBM44 giving ~50 % conversion of the cellulosic fraction and ~60 % conversion of the hemicellulose fraction of the biomass to soluble products. Figure 7 shows kinetic schemes that account for the products observed by quantitative NIMS from the reaction of CelEcc_CBM hybrids with IL-SG. These schemes assign apparent rate constants that account for release of soluble products from the insoluble biomass and subsequent conversion of soluble oligosaccharides into smaller molecules [61]. By use of NIMS, cascades of products from both the hexose and pentose fractions of the biomass can be monitored simultaneously, and the time courses for the observed products are shown in Fig. 8 (hexose fraction) and Fig. 9 (pentose fraction).
Fig. 3 Affinity gel electrophoresis characterization of GFP_CBM binding to hydroxyethyl cellulose. GFP_CBMs purified from the translation reaction using Ni-IMAC were used in these experiments. Binding was detected as a difference in migration in the hydroxyethyl cellulose gel compared with the "No substrate" gel. Red stars indicate GFP_CBM fusions assigned to have altered migration, which are therefore inferred to bind. Images of other electrophoresis gels containing lichenan, galactomannan, beechwood xylan, and arabinoxylan are provided in Additional file 2: Figure S1. Binding assignments made from all affinity gel electrophoresis studies are summarized in Table 1. Soybean trypsin inhibitor (STI) was used as a control.
Figure 8 shows the time course for the reaction of six CelEcc_CBM hybrids with the hexose fraction of IL-SG. The solid colored lines are the results of simulations of the concentrations of individual products based on the kinetic scheme of Fig. 7a and the differential equations shown in Additional file 3: Differential equations. Values for the apparent rates (see "Discussion") determined from the numerical integration are presented in Table 4. As observed previously for CelEcc_CBM3a reactions [61], cellobiose (g2, purple down triangles) is the dominant product for reaction of each of the CelEcc_CBM fusions with the cellulosic fraction in IL-SG. Comparison of the progress curves for cellobiose formation shows that fusion of CelE to different CBMs changed both the magnitude of the apparent rates and the overall yield of cellobiose. For example, CelEcc_CBM22 had the smallest apparent rates and overall yield, while CelEcc_CBM44 had the largest apparent rates and highest yield. Figure 9 shows the time course for the reaction of six CelEcc_CBM hybrids with the pentose fraction of IL-SG. Solid lines are derived from analysis of the scheme in Fig. 7b as described above; the dotted black line represents the sum of the amounts of the individual products. With the pentose fraction, pentotriose (p3, black up triangles) is the dominant product for reaction of all of the CelEcc_CBM fusions, as observed earlier for CelEcc_CBM3a [61]. Although all of the CelEcc_CBM hybrids gave a similar yield of pentotriose at the endpoint of reaction (24 h), there were substantial differences in the magnitude of the dominant apparent rate (k3) associated with its formation (Table 4). Thus, CelEcc_CBM22, CelEcc_CBM6 and CelEcc_CBM3a were least effective at enhancing the rate of pentotriose accumulation, while CelEcc, CelEcc_CBM30 and CelEcc_CBM44 were most effective.
The ability of CBM30 and CBM44 to promote both rapid hydrolysis and high yield of the pentose fraction can also be compared with the reaction of CelEcc (lacking a CBM), which did not promote rapid hydrolysis, but did achieve comparable yield after 24 h of reaction (Table 4; Fig. 6).
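The published schemes (Fig. 7) and their differential equations (Additional file 3) are not reproduced in this section. To illustrate how apparent first-order rate constants generate product progress curves, the sketch below numerically integrates a deliberately simplified, hypothetical hexose scheme: insoluble polysaccharide P (in glucose-equivalent units) releases cellobiose (g2) with rate constant k2 and cellotriose (g3) with rate constant k3, and g3 is cleaved to g2 plus glucose (g1) with rate constant k4. The scheme, k4, and all rate values are assumptions for illustration, and the classical fourth-order Runge-Kutta method stands in for whatever integrator was actually used.

```python
import math

def derivatives(state, k2, k3, k4):
    """Hypothetical scheme: P -> g2 (k2), P -> g3 (k3), g3 -> g2 + g1 (k4).
    P is in glucose-equivalent units; g1, g2, g3 are molecule amounts."""
    p, g3, g2, g1 = state
    flux2 = k2 * p  # glucose units per hour released as cellobiose
    flux3 = k3 * p  # glucose units per hour released as cellotriose
    return (
        -(flux2 + flux3),
        flux3 / 3.0 - k4 * g3,
        flux2 / 2.0 + k4 * g3,
        k4 * g3,
    )

def rk4(state, dt, steps, *rates):
    """Classical fourth-order Runge-Kutta integration; returns the trajectory."""
    traj = [state]
    for _ in range(steps):
        s = traj[-1]
        d1 = derivatives(s, *rates)
        d2 = derivatives(tuple(x + 0.5 * dt * d for x, d in zip(s, d1)), *rates)
        d3 = derivatives(tuple(x + 0.5 * dt * d for x, d in zip(s, d2)), *rates)
        d4 = derivatives(tuple(x + dt * d for x, d in zip(s, d3)), *rates)
        traj.append(tuple(
            x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + e)
            for x, a, b, c, e in zip(s, d1, d2, d3, d4)
        ))
    return traj

# Hypothetical apparent rates (1/h); 24 h course in 0.1 h steps.
k2, k3, k4 = 0.15, 0.03, 0.2
traj = rk4((100.0, 0.0, 0.0, 0.0), 0.1, 240, k2, k3, k4)
p, g3, g2, g1 = traj[-1]

# P decays exponentially in this scheme, so the analytic value checks the integrator.
p_exact = 100.0 * math.exp(-(k2 + k3) * 24.0)
glucose_units = p + 3.0 * g3 + 2.0 * g2 + g1  # conserved quantity (stays at 100)
print(round(p, 3), round(p_exact, 3), round(glucose_units, 6))
```

In a fit to data such as Fig. 8, the rate constants would instead be adjusted until the simulated curves match the NIMS time courses; the conserved glucose-unit total is a useful sanity check on any such scheme.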

Discussion
To begin this study, we created a plasmid that allows convenient fusion of two protein domains separated by a polypeptide linker sequence. Each of these parts can be exchanged combinatorially using four well-behaved restriction enzymes. Using this vector, a series of GFP_CBM expression plasmids was created. The fusion proteins were produced using cell-free translation, and the binding specificities of the GFP_CBMs were measured using soluble and insoluble pure polysaccharides and biomass (Tables 1, 2, 3; Figs. 3, 4, 5; Additional file 1: Tables S5, S6). Using the single broad-specificity enzyme CelE as the catalytic domain, we were also able to examine the function of enzyme_CBM fusions against a range of substrates in a controlled manner (Figs. 4, 6).
All of the GFP_CBM and CelEcc_CBM constructs were successfully expressed using cell-free translation. At least 100 µg of the GFP_CBMs and 30 µg of the individual CelEcc_CBMs were produced by 50 µL cell-free translation reactions and used in the described assays. Thirty-four of the GFP_CBMs produced in cell-free translation bound at least one substrate, and these results are overall consistent with previously determined binding specificities [40]. For example, CBM3a did not bind to any of the soluble substrates, but interacted strongly with insoluble cellulose and biomass in pull-down assays. Since CBM3a enhances activity towards insoluble cellulosic materials [41,61,66-68,78,79], it is not surprising that most of the CBMs studied herein were less active than CelEcc_CBM3a on PASC. Interestingly, CelEcc_CBM30 showed an ~fourfold increase in reactivity with PASC relative to CelEcc_CBM3a. The enhanced reactivity of CelEcc_CBM30 can be rationalized by the demonstrated binding of CBM30 to both HEC and PASC [80] and by the increased reactivity of a GH9 cellulase with crystalline cellulose [67]. Although we observed CBM6 binding to lichenan, the enhancement of the CelE reaction with PASC (Fig. 4) could be due to the weak interaction of CBM6 with β-1,4-linked glucan reported by Czjzek et al. [81]. CBM22-2 produced in wheat germ extract was able to bind xylan, which is consistent with the observation that CBM22s are primarily associated with xylanases and have been shown to bind xylan [82-84]. However, CelEcc_CBM22 was not particularly effective at hydrolyzing the pentose fraction in IL-SG (Table 4; Fig. 9).
Table 1 Qualitative determination of binding specificities of GFP_CBMs to soluble substrates. Columns: GFP_CBM, HEC, Lichenan, Galactomannan, Beechwood xylan, Arabinoxylan. "B" indicates that binding was detected by affinity gel electrophoresis; "-" indicates that binding was not detected. Estimated Rr values for all CBMs tested in this study are shown in Additional file 1: Table S5.
Likewise, GFP_CBM6 (from xylanase XynA, Cthe_2972) bound to beechwood xylan and arabinoxylan [81], but pairing CelE with this CBM gave only modest catalytic results with xylans. For both xylan and mannan, CelEcc_CBM44 showed more than a twofold enhancement in reactivity relative to CelEcc_CBM3a. CBM44 is part of CelJ (Cthe_0624), an enzyme that we showed had weak multifunctional behavior in reaction with IL-SG [61]. By combining CBM44 (diverse binding specificity) with CelE (multifunctional catalysis), we were able to create a fusion hybrid with improved reactivity.
It is also worth noting that some CBMs did not improve the catalytic activity of CelE, even though the CBM independently showed binding to one or more substrates. One possibility is that the orientation of the CBM relative to the CelE catalytic domain influences binding and/or reactivity. For example, CBM16 and CBM54 are naturally found at the N-terminus of the catalytic domain (Additional file 1: Table S1) and perhaps need to be in this arrangement to enhance the reactivity of the catalytic domain. The linker between CelEcc and the CBM used in this study may also influence reactivity. In previous studies, linker lengths, compositions, orientations, and conformations were reported to have significant effects on enzyme reactivity [85-87]. The linker used in this study is naturally found in the CipA scaffoldin of the R. thermocellum cellulosome (amino acids 323-364 of Cthe_3077) and differs in some respects from other naturally occurring linker sequences. About half of the ~40 residues of this linker are threonines and prolines, which may promote a flexible, extended conformation between domains [86-88]. Similarly, the linkers that connect CBM6 and CBM44 to their native catalytic domains are ~25-30 residues long with multiple prolines and threonines that could adopt an extended conformation. In contrast, the linker in native CBM22-2 is ~25-30 residues long with only a few threonine residues, and the linker between CBM30 and the adjacent GH9 catalytic domain is less than 10 residues long, possibly indicating close association with the catalytic domain.
Table 2 Qualitative determination of binding specificities of GFP_CBMs to insoluble substrates. "B" indicates that binding was detected by pull-down assay; "-" indicates that binding was not detected. Estimated PF % values for all CBMs tested in this study are shown in Additional file 1: Table S6.
CBMs are also found at the C-terminus relative to the catalytic domain, the same arrangement as in our CelEcc_CBM constructs, while CBM22-2 and CBM30 are generally found at the N-terminus relative to the catalytic domain. Thus, it is possible that the reactivity observed with the CBM22-2 and CBM30 constructs is influenced by a non-native domain orientation.
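The statement that about half of the linker residues are threonines and prolines can be checked against the linker sequence printed in "Cloning of GFP_CBM". A minimal Python sketch; note that the printed sequence has 41 residues, one fewer than the span 323-364 implies.

```python
# CipA-derived linker, as printed in "Cloning of GFP_CBM" (single-letter code).
LINKER = "NATPTKGATPTNTATPTKSATATPTRPSVPTNTPTNTPANT"

n_thr = LINKER.count("T")
n_pro = LINKER.count("P")
fraction_tp = (n_thr + n_pro) / len(LINKER)

print(len(LINKER), n_thr, n_pro, round(100 * fraction_tp, 1))  # 41 15 8 56.1
```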
We found that the highest binding affinity (K, Table 3) did not always predict the highest enzymatic activity. For example, CBM3a and CBM44 had the highest affinities for PASC, but did not give the highest reactivity with this substrate. In contrast, CBM30 showed weaker binding to PASC than CBM3a and CBM44, but CelEcc_CBM30 showed the highest reactivity with PASC. For lichenan, all CBMs that had detectable binding affinities also increased the reactivity of CelE relative to CelEcc_CBM3a. However, although CBM44 had higher affinity for lichenan than CBM30, CelEcc_CBM30 showed an ~2× faster reaction with lichenan than CelEcc_CBM44. Similarly, while GFP_CBM6 had a higher affinity for xylan than GFP_CBM44, CelEcc_CBM44 had the highest xylan reactivity. These trends support the previous conclusion that tight binding of a CBM (possibly reflecting dominance of kon over koff for interaction with the polysaccharide) may limit the number of productive hydrolytic events by the catalytic domain. If a CBM binds too tightly, the catalytic domain may not easily access new glycosidic bonds while the CBM is adsorbed, because diffusion of the catalytic domain is restricted [50,52].
The binding-interaction constants obtained by fitting the logistic equation (Eq. 6) also give insight into the interactions of the CBMs with the substrates. Most of the c-values in Table 3, which are analogous to Hill coefficients [89], are close to 1, indicating no higher-order contributions to binding. However, the c-values >1 determined for CBM6 and CBM30 may indicate cooperative binding. Multiple members of the CBM6 family have been shown to have two binding sites with different binding specificities [81,90], which is consistent with this possibility. In contrast, c-values <1 suggest negatively cooperative binding. This may be due to changes in substrate/polysaccharide chain conformations, as has been seen in starch-binding proteins [91], along with other possibilities such as steric occlusion of preferred binding sites. The range of polysaccharides tested represents many of the most common plant carbohydrates found in ionic liquid-treated biomass, including amorphous forms of cellulose and mixed-linkage β-glucan, and branched (arabinoxylan, oat spelt xylan, and galactomannan) and unbranched hemicellulose (beechwood and birchwood xylans, and 1,4-β-d-mannan).
Fig. 5 Binding-affinity plots for GFP_CBM fusions. GFP_CBMs used in this experiment were expressed in E. coli and purified as described in "Methods". The fraction bound (y-axis) versus substrate concentration (x-axis) is shown for three different insoluble substrates. The plots were used to determine dissociation constants with the binding model given in Eq. 6. Shaded regions around the plotted affinity curve are the mean prediction bands at the 90 % confidence level. a PASC plot and data fitting (GFP_CBM3a, brown; GFP_CBM30, blue; GFP_CBM44, red). b Lichenan plot and data fitting (GFP_CBM6, green; GFP_CBM30, blue; GFP_CBM44, red). c Xylan plot and data fitting (GFP_CBM6, green; GFP_CBM44, red).
Among the CBMs tested, the majority were able to bind linear hexose chains (e.g., HEC, lichenan, and PASC; Tables 1, 2), while fewer bound Avicel or the hemicellulosic substrates, either linear or branched. However, several CBMs, including CBM6 and CBM44, bound arabinoxylan (Table 1), which is partially branched [8]. These two CBMs also gave enhanced catalysis with the hemicellulosic fraction in IL-SG (Table 4; Fig. 9). Ionic liquid pretreatment, which was used to prepare the switchgrass in this work, converts cellulose to an amorphous state and retains the hemicellulose [76,77,92].
The crystal structure of CelE shows a large, wide active site, which allows reactivity with multiple substrates (C. M. Bianchetti, T. E. Takasuka and B. G. Fox, unpublished data). This active site appears well suited to support reactions with amorphous forms of cellulose and hemicellulose, but is less reactive with crystalline cellulose [61]. The CBMs included in this work are capable of binding crystalline, linear, and branched polysaccharides [42,80,81], providing a useful diversity to match the properties of CelE.
Studies of time-dependent biomass hydrolysis using the quantitative NIMS assay and numerical simulation warrant a few closing comments. We describe the results of the numerical analysis as apparent rates for the appearance of soluble products, and in doing so acknowledge the complexity of the molecular-level events contributing to each of the steps in the kinetic schemes of Fig. 7. These schemes may underestimate the total activity of the enzyme, for example, if a hydrolysis reaction does not yield a soluble product [93] that can be detected by NIMS. With this caveat, an individual dominant apparent rate, such as k2 for release of cellobiose from the hexose fraction or k3 for release of pentotriose from the pentose fraction, will include contributions from a number of microscopic steps, such as the accessibility of suitable sites on the substrate for catalysis (a property of the CelE active site) and the binding affinity constants (K) for polysaccharide binding (influenced by the CBMs in this work). Other microscopic steps that can affect apparent rates include chemical steps in catalysis within the enzyme active site (which should be the same for all constructs in this study), product release, the presence of alternative substrates and product inhibitors (possibly including soluble oligosaccharides and some branched polysaccharide products), changes in the composition of the remaining substrate, and perhaps others [61,94,95]. The systematic iteration of CBMs against a single multifunctional catalytic domain offers a powerful tool to examine some of these critical aspects of the interactions of enzymes with biomass substrates and products.

Conclusions
The results show that wheat germ cell-free translation can be productively used to screen the properties of CBMs as binding domains and as enhancers of catalytic activity. We have shown that fusions to different CBMs can alter the reactivity of a single catalytic domain with four different polysaccharides. The combination of broad binding specificity and moderate binding affinity in the CBM with a single multifunctional GH5 catalytic domain gave the best catalysis with plant biomass. We also showed that CelEcc_CBM44 alone was able to hydrolyze half of the cellulose and hemicellulose present in IL-SG in a short time regime (6 h). Other CelEcc_CBM hybrids achieved a similar endpoint yield of soluble products, albeit at slower rates. The approach of fusing different CBMs to multifunctional catalytic domains has the potential to facilitate creation of new enzyme_CBM hybrids with improved reactivity toward specific polysaccharide substructures within the complexity of plant biomass.

Cloning of GFP_CBM
Additional file 1: Table S1 summarizes properties of the genes from which the selected CBMs were taken, their amino acid sequences, and molecular weights. Nucleotide sequences were retrieved from NCBI (http://www.ncbi.nlm.nih.gov/nuccore) and UniProt (http://www.uniprot.org/uniprot/ [96]). Nucleotide sequences encoding each CBM domain were selected, and PCR primer pairs were designed (Additional file 1: Table S2) to amplify the gene fragment of interest [60]. The forward primer consisted of 5′-GCGAACACCCTTAAG-3′ followed by the gene-specific sequence encoding the N-terminal sequence of the CBM, while the reverse primer consisted of 5′-TCTAGAGGATCCTTA-3′ followed by the gene-specific sequence encoding the C-terminal sequence of the CBM. The forward and reverse primers provided AflII and BamHI sites at the 5′- and 3′-ends of the amplicon, respectively. R. thermocellum ATCC 27405 genomic or synthetic (Additional file 1: Table S3) DNAs were used as PCR templates, and amplified PCR products were digested using AflII and BamHI (Promega, Madison, WI, USA). The different CBM sequences were ligated into the C-terminal domain position, which is flanked by AflII and BamHI restriction sites. The nucleotide sequence encoding the protein linker between the N- and C-terminal domains (protein sequence N-NATPTKGATPTNTATPTKSATATPTRPSVPTNTPTNTPANT-C) was not modified from the parent CipA sequence. Plasmids isolated from successful transformations were sequence-verified at the University of Wisconsin-Madison Biotechnology Center using the universal forward and reverse primers shown in Additional file 1: Table S2.
Fig. 7 Kinetic schemes for the reaction of CelEcc_CBM hybrids with IL-SG. a Hexose fraction (polysaccharide, -triose, -biose, -ose). b Pentose fraction (polysaccharide, -triose, -biose, -ose).
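The roles of the 5′ primer tails can be checked by scanning them for the recognition sequences of the four enzymes used with pEUTTJW (SgfI GCGATCGC, PmeI GTTTAAAC, AflII CTTAAG, BamHI GGATCC). The sketch below also takes the reverse complement of the TTA at the 3′ end of the reverse tail, which would correspond to a TAA stop codon on the sense strand (an inference from the sequence, not stated in the text). The function names are illustrative.

```python
# Recognition sequences (5'->3'); all four are palindromic.
SITES = {"SgfI": "GCGATCGC", "PmeI": "GTTTAAAC", "AflII": "CTTAAG", "BamHI": "GGATCC"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_sites(seq):
    """Map each enzyme whose site occurs in seq to its 0-based start positions.
    Because the sites are palindromic, scanning one strand is sufficient."""
    hits = {}
    for enzyme, site in SITES.items():
        starts = [i for i in range(len(seq) - len(site) + 1)
                  if seq[i:i + len(site)] == site]
        if starts:
            hits[enzyme] = starts
    return hits

forward_tail = "GCGAACACCCTTAAG"  # 5' tail of every forward primer
reverse_tail = "TCTAGAGGATCCTTA"  # 5' tail of every reverse primer

print(find_sites(forward_tail))   # {'AflII': [9]}
print(find_sites(reverse_tail))   # {'BamHI': [6]}
print(reverse_complement("TTA"))  # TAA
```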

Cloning of CelEcc_CBM
The GFP_CBM constructs produced as described above and a previously created CelEcc_CBM3a plasmid were separately digested with SgfI and PmeI to obtain the nucleotide sequences encoding the CBM and CelEcc fragments. The digested plasmid and insert fragments were then gel-purified and ligated to form the CelEcc_CBM fusion plasmids [60]. All CelEcc_CBM plasmids were sequence-verified as described above.

Cell-free translation
Plasmids encoding the individual GFP_CBM or CelEcc_CBM hybrids were prepared by miniprep (Qiagen, Germany) and used for cell-free protein synthesis [60,61]. After the miniprep, the plasmid DNA was treated with proteinase K (Sigma-Aldrich, St. Louis, MO, USA) in 10 mM Tris-HCl, pH 8.0, 5 mM EDTA and 0.1 % (w/v) SDS for 1 h at 37 °C to remove contaminating RNase. Proteinase K treatment was followed by phenol/chloroform extraction and ethanol precipitation. After ethanol precipitation, the plasmid DNA concentration was measured using a NanoDrop 2000c spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA), and the DNA was adjusted to 1 µg/µL for use in the transcription and translation reactions. The wheat germ cell-free translation reaction was performed by the bilayer method for 24 h at 37 °C with 59 µL of translation mixture [containing 56 µL of WEPRO2240H wheat germ extract (CellFree Sciences, Yokohama, Japan), 0.1 mM amino acid mix, and 0.07 µg/µL creatine kinase (Roche, Basel, Switzerland)] and 1.1 mL of translation buffer [1× Solutions 1, 2, 3 and 4 (CellFree Sciences, Yokohama, Japan)]. Translated proteins were visualized by SDS-PAGE, and the amount of protein produced was estimated using a gel imager (Bio-Rad, Hercules, CA, USA) (Additional file 1: Table S4). Cell-free reactions with empty vector were used as controls for protein translation and in the enzyme and pull-down assays.
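Adjusting the plasmid prep to 1 µg/µL is a C1V1 = C2V2 calculation. The sketch below assumes the spectrophotometer reading has already been converted to µg/µL; the example reading and volume are hypothetical.

```python
# Sketch: dilute a plasmid prep to the 1 ug/uL working concentration.
# Assumes C1*V1 = C2*V2; the example reading and volume are hypothetical.

TARGET_UG_PER_UL = 1.0


def water_to_add(measured_ug_per_ul: float, volume_ul: float,
                 target_ug_per_ul: float = TARGET_UG_PER_UL) -> float:
    """Return the uL of water to add so the prep reaches the target concentration.

    Raises ValueError if the prep is already below the target, because dilution
    cannot raise concentration (the DNA would need to be re-precipitated and
    resuspended in a smaller volume instead).
    """
    if measured_ug_per_ul < target_ug_per_ul:
        raise ValueError("prep is more dilute than the target concentration")
    final_volume_ul = measured_ug_per_ul * volume_ul / target_ug_per_ul
    return final_volume_ul - volume_ul


# Usage: 20 uL of a hypothetical 2.5 ug/uL prep needs 30 uL of water.
print(water_to_add(2.5, 20.0))
```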
Translated GFP_CBMs were used in pull-down assays with insoluble polysaccharides without further purification from the translation reaction mixture. Translated GFP_CBMs used in affinity gel electrophoresis were purified from the translation reaction mixture using Ni beads (GE Healthcare, Piscataway, NJ, USA) [60]. To bind the protein to the beads, the translated protein was mixed and incubated with Ni beads in 100 mM MOPS, pH 7.4, containing 300 mM NaCl, 2 mM CaCl₂ and 25 mM imidazole. The Ni beads were then washed three times with 100 mM MOPS, pH 7.4, containing 300 mM NaCl, 2 mM CaCl₂ and 50 mM imidazole. Protein was eluted from the Ni beads twice with 100 mM MOPS, pH 7.4, containing 300 mM NaCl, 2 mM CaCl₂ and 250 mM imidazole, and the two eluates were combined. The purified protein was buffer-exchanged into 100 mM MOPS, pH 7.4, containing 50 mM NaCl and 2 mM CaCl₂ using a VIVASPIN500 concentrator (Sartorius Stedim, Bohemia, NY, USA).
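The binding, wash, elution and exchange buffers differ only in imidazole and NaCl, so they can be prepared from shared stocks. This sketch takes the final concentrations from the text; the stock concentrations (and a MOPS stock pre-titrated to pH 7.4) are assumptions.

```python
# Sketch: stock volumes for the Ni-bead buffers (C1*V1 = C2*V2).
# Final concentrations are from the text; stock concentrations are assumptions
# (the MOPS stock is assumed to be pre-titrated to pH 7.4).

STOCKS_MM = {"MOPS": 1000.0, "NaCl": 5000.0, "CaCl2": 1000.0, "imidazole": 2000.0}

BUFFERS_MM = {
    "bind": {"MOPS": 100, "NaCl": 300, "CaCl2": 2, "imidazole": 25},
    "wash": {"MOPS": 100, "NaCl": 300, "CaCl2": 2, "imidazole": 50},
    "elute": {"MOPS": 100, "NaCl": 300, "CaCl2": 2, "imidazole": 250},
    "exchange": {"MOPS": 100, "NaCl": 50, "CaCl2": 2},
}


def recipe_ml(buffer: str, total_ml: float) -> dict:
    """Return mL of each stock to combine; water makes up the remaining volume."""
    volumes = {name: conc / STOCKS_MM[name] * total_ml
               for name, conc in BUFFERS_MM[buffer].items()}
    volumes["water"] = total_ml - sum(volumes.values())
    return volumes


for name in BUFFERS_MM:
    print(name, recipe_ml(name, 10.0))
```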
CelEcc_CBMs prepared by cell-free translation were used in enzyme assays without further purification from the translation reaction mixture. Previous studies have established that the wheat germ extract has no endogenous enzymes capable of reacting with the polysaccharides studied here [60]. All proteins synthesized by cell-free translation were checked for fractional solubility by SDS-PAGE after centrifugation at 13,200×g for 10 min at 4 °C. Solubility was assessed as the ratio of the band intensity of the expressed protein remaining in the supernatant after centrifugation to its intensity before centrifugation. All of the constructs described here showed greater than 95 % solubility after cell-free translation.
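The solubility measure above is a ratio of two band intensities. A minimal sketch, assuming densitometry values for the expressed protein's band before and after centrifugation; the 95 % cutoff only mirrors the observation reported in the text and is used here as an illustrative threshold.

```python
# Sketch: fractional solubility from SDS-PAGE band intensities.
# Intensities are hypothetical densitometry values; the cutoff mirrors the
# ">95 %" observation in the text and is an illustrative threshold.

SOLUBILITY_CUTOFF = 0.95


def fractional_solubility(supernatant_intensity: float, total_intensity: float) -> float:
    """Band intensity after centrifugation divided by the intensity before it."""
    if total_intensity <= 0:
        raise ValueError("total intensity must be positive")
    return supernatant_intensity / total_intensity


def exceeds_cutoff(supernatant_intensity: float, total_intensity: float,
                   cutoff: float = SOLUBILITY_CUTOFF) -> bool:
    """True if the construct is more soluble than the chosen cutoff."""
    return fractional_solubility(supernatant_intensity, total_intensity) > cutoff


print(fractional_solubility(96.0, 100.0), exceeds_cutoff(96.0, 100.0))
```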

Cloning, expression in Escherichia coli, and purification
Polymerase incomplete primer extension (PIPE) [97] was used to transfer the nucleotide sequences encoding the GFP_CBMs and CelEcc_CBMs from their respective pEUTTJW plasmids into the E. coli expression vector pVP67K [98]. The primer pair used to amplify the GFP_CBM and CelEcc_CBM genes (Additional file 1: Table S2) matches a portion of the pVP67K sequence [98], while the primer pair used to amplify pVP67K matches the corresponding sequences on pVP67K. The pEUTTJW and pVP67K amplifications were carried out in separate reactions. After PCR, 2-µL aliquots from the two reactions were mixed and immediately transformed into competent E. coli BL21-CodonPlus (DE3)-RILP cells (Agilent Technologies, Santa Clara, CA, USA). The transformed cells were plated onto LB agar plates containing 50 µg/mL kanamycin and 34 µg/mL chloramphenicol, and viable transformants were screened for plasmids containing inserts. GFP_CBM and CelEcc_CBM inserts in pVP67K were sequence-verified as described above using the universal forward and reverse primers shown in Additional file 1: Table S2. Further details on the construction and reactivity of CelEcc and CelEcc_CBM3a are provided in a previous study [61]. E. coli BL21-CodonPlus (DE3)-RILP cells containing a GFP_CBM, CelEcc, or CelEcc_CBM expression plasmid were grown in 10 mL of noninducing medium [98] containing 50 µg/mL kanamycin and 34 µg/mL chloramphenicol for 12 h at room temperature and then transferred into 500 mL of auto-induction medium containing the same antibiotics for 16 h at 37 °C [98]. Cells were harvested by centrifugation at 5000×g for 20 min. The cell pellet was suspended in 20 mM Tris-HCl, pH 7.0, containing 1 mM EDTA and a protease inhibitor cocktail of 1 µM E-64 (Sigma-Aldrich, St. Louis, MO, USA) and 0.5 mM benzamidine (Calbiochem, Spring Valley, CA, USA). The suspended cells were sonicated on ice for 10 min in cycles of 15 s on and 15 s off.
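PIPE relies on the insert and vector PCR products sharing homologous ends so that they anneal without ligation. The sketch below checks this as a simple same-strand suffix/prefix overlap, which is a simplification of strand annealing; the sequences and the 15-nt minimum overlap are hypothetical choices, not values from pVP67K or pEUTTJW.

```python
# Sketch: check that PIPE products share the homologous ends needed to anneal.
# Simplified same-strand model; sequences and min_overlap are hypothetical.


def longest_overlap(left: str, right: str, min_len: int = 1) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right` (0 if none)."""
    max_len = min(len(left), len(right))
    for n in range(max_len, min_len - 1, -1):
        if left[-n:] == right[:n]:
            return n
    return 0


def junctions_ok(insert: str, vector: str, min_overlap: int = 15) -> bool:
    """True if both insert/vector junctions share at least `min_overlap` bases."""
    return (longest_overlap(insert, vector) >= min_overlap and
            longest_overlap(vector, insert) >= min_overlap)


print(longest_overlap("AAACGTACGT", "ACGTACGTTT"))
```

Checking both junctions matters because a circular product needs the insert's end to match the vector's start and the vector's end to match the insert's start.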
The sonicated cells were centrifuged at 20,000 rpm for 60 min at 4 °C, and the supernatant was loaded onto a HisTrap column (1.6 cm dia × 2.5 cm bed height; GE Healthcare, Piscataway, NJ, USA) equilibrated with 100 mM MOPS, pH 7.0, containing 500 mM NaCl. After loading, the column was washed with 10 column volumes of the same buffer. The bound protein was eluted with a 100-mL linear gradient to 100 mM MOPS, pH 7.0, containing 500 mM NaCl and 0.5 M imidazole. To cleave the His-tag from the N-terminus of the fusion protein, 40 µg of His-tagged tobacco etch virus (TEV) protease was mixed with 1 mg of the protein sample and incubated for 12 h at 4 °C [99]. Subtractive immobilized metal affinity chromatography was used to separate the His-tag-free protein from uncleaved protein and His-tagged TEV protease. The His-tag-free protein was concentrated using a VIVASPIN20 concentrator (Sartorius Stedim, Bohemia, NY, USA) at 4200×g to a final concentration of ~10 mg/mL. The His-tag was not cleaved from the N-terminus of the GFP_CBM constructs. Protein concentration was estimated by BCA assay (Bio-Rad, Hercules, CA, USA) and spectrophotometrically at 280 nm using extinction coefficients calculated from the amino acid sequences of the constructs.
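Sequence-based extinction coefficients are commonly computed from Trp, Tyr and cystine counts. This sketch uses the widely used contributions of 5500, 1490 and 125 M⁻¹ cm⁻¹ (as in ExPASy ProtParam); whether the authors used exactly these values is an assumption, and the sequences are hypothetical.

```python
# Sketch: molar extinction coefficient at 280 nm from sequence, then A280 -> mg/mL.
# Residue contributions follow the common ProtParam convention (an assumption here).

EPS_TRP, EPS_TYR, EPS_CYSTINE = 5500, 1490, 125  # M^-1 cm^-1 at 280 nm


def extinction_coefficient(seq: str, all_cys_paired: bool = False) -> int:
    """Molar extinction coefficient (M^-1 cm^-1); cystines counted only if paired."""
    seq = seq.upper()
    cystines = seq.count("C") // 2 if all_cys_paired else 0
    return seq.count("W") * EPS_TRP + seq.count("Y") * EPS_TYR + cystines * EPS_CYSTINE


def concentration_mg_per_ml(a280: float, eps: float, mw_da: float,
                            path_cm: float = 1.0) -> float:
    """Beer-Lambert: c (M) = A / (eps * l); multiplying by MW (g/mol) gives mg/mL."""
    return a280 / (eps * path_cm) * mw_da


print(extinction_coefficient("WYCC"), extinction_coefficient("WYCC", all_cys_paired=True))
```

For a cytosolic construct with no disulfides, the reduced form (all_cys_paired=False) is usually the appropriate choice.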

Soluble substrate binding assay
Affinity gel electrophoresis was performed to test the binding specificities of the GFP_CBMs for various soluble polysaccharides (HEC, Icelandic moss lichenan, carob galactomannan, beechwood xylan, and wheat flour arabinoxylan) [40,106]. Continuous 6 % polyacrylamide gels (29:1 acrylamide:bisacrylamide) containing 0.1 % (w/v) soluble polysaccharide were prepared with Bio-Rad Criterion empty cassettes and 26-well combs (Bio-Rad, Hercules, CA, USA) in the presence of 2 mM CaCl₂. Soybean trypsin inhibitor (STI, 5 μg) (Sigma-Aldrich, St. Louis, MO, USA) was used as an internal loading standard, and approximately 150 ng of each purified GFP_CBM was loaded. Electrophoresis was performed at 4 °C and pH 8.3 for 75 min at a constant voltage of 150 V in a Criterion Electrophoresis Cell (Bio-Rad, Hercules, CA, USA). Gels were silver-stained to detect the protein [107]. Briefly, the gels were soaked in fixing solution (500 mL methanol, 120 mL acetic acid and 0.5 mL 37 % formaldehyde, made up to 1 L with deionized water) for 1 h, washed three times in 50 % ethanol for 5 min, and then treated with 0.81 mM Na₂S₂O₃·5H₂O for 1 min. The gels were rinsed three times with deionized water for 20 s and then placed in staining solution (12 mM AgNO₃, 0.75 mL/L 37 % formaldehyde) for 1 h. The gels were rinsed a further three times with deionized water for 20 s, placed in developing solution (0.57 M Na₂CO₃, 0.5 mL/L 37 % formaldehyde, 20 μM Na₂S₂O₃·5H₂O) for 5 to 10 min, and rinsed twice with deionized water for 5 s. Development was stopped with 50 % methanol, 12 % acetic acid for 10 min, and the gels were then washed in 50 % methanol for 20 min. Gel images were obtained using the Gel Doc EZ system (Bio-Rad) and analyzed for binding by calculating relative mobility ratios ($R_r$) and by visual inspection.
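$R_r$ values were calculated using the following equation:

$$R_r = \frac{R_p}{R_n} = \frac{\left[\text{GFP\_CBM migration (mm)} / \text{STI migration (mm)}\right]_{\text{with substrate}}}{\left[\text{GFP\_CBM migration (mm)} / \text{STI migration (mm)}\right]_{\text{without substrate}}}$$

where $R_p$ is the relative mobility of a GFP_CBM compared to STI in the presence of substrate, and $R_n$ is the relative mobility of a GFP_CBM compared to STI in the absence of substrate [40,106]. An $R_r$ below 0.750 was chosen as the criterion for GFP_CBM binding to reduce the chance of scoring false-positive binding. The $R_r$ values are listed in Additional file 1: Table S5.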
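The relative mobility ratio normalizes the GFP_CBM band to the STI standard on gels with and without substrate, then compares the two. A minimal sketch follows; the migration distances are hypothetical, and only the 0.750 cutoff comes from the text.

```python
# Sketch: relative mobility ratio Rr = Rp / Rn from affinity gel migration distances.
# Distances (mm) are hypothetical; the 0.750 cutoff is the one stated in the text.

BINDING_CUTOFF = 0.750


def relative_mobility_ratio(cbm_with_mm: float, sti_with_mm: float,
                            cbm_without_mm: float, sti_without_mm: float) -> float:
    """Rr = Rp / Rn, each mobility normalized to the STI internal standard."""
    r_p = cbm_with_mm / sti_with_mm        # mobility in the substrate gel
    r_n = cbm_without_mm / sti_without_mm  # mobility in the substrate-free gel
    return r_p / r_n


def scored_as_binding(rr: float, cutoff: float = BINDING_CUTOFF) -> bool:
    """Binding is scored when the GFP_CBM is retarded below the cutoff."""
    return rr < cutoff


rr = relative_mobility_ratio(15.0, 50.0, 30.0, 50.0)
print(rr, scored_as_binding(rr))
```

Normalizing to STI in both gels cancels gel-to-gel differences in overall migration, so only substrate-induced retardation lowers $R_r$.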

Insoluble substrate pull-down assay
Pull-down assays were used to test the binding specificities of the GFP_CBMs for insoluble polysaccharides (Avicel PH-101, PASC, 1,4-β-d-mannan, birchwood xylan, AFEX-SG, and IL-SG). Aliquots (25 µL) of cell-free expressed, unpurified GFP_CBM were mixed with 1 mg of substrate in a final volume of 100 μL in 96-well microtiter plates, giving a final concentration of 10 mg/mL insoluble substrate in 50 mM MES, pH 6.0, containing 2 mM CaCl₂. Pull-down assays of protein in the absence of substrate were performed as a control, and all binding experiments were done in triplicate. Samples were incubated for 1 h at 4 °C with shaking at 600 rpm on a Thermo Scientific Titer Plate Shaker (Model No. 4625) (Thermo Fisher Scientific, Waltham, MA, USA) and then centrifuged at 4300×g for 10 min at 4 °C. Aliquots (20 µL) of the sample supernatants were mixed with 20 μL of deionized water, and fluorescence was measured with excitation at 488 nm and emission at 510 nm. Supernatant aliquots of the no-substrate samples were taken before the 10-min spin for use as the total fluorescence control, which accounts for any protein precipitation in subsequent calculations. Cell-free expressed GFP alone was also assayed to determine whether GFP interacted with the insoluble substrates tested. To calculate the substrate-bound fraction of a GFP_CBM, pellet fluorescence was calculated for the substrate and no-substrate samples ($F_s$ and $F_{ns}$, respectively) by subtracting the supernatant fluorescence, $f$, from the total fluorescence, $T$, of the no-substrate reaction sample taken before the 10-min spin:

$$F = T - f$$
Normalized pellet fraction percentages (PF %) were calculated to account for protein precipitation: the no-substrate pellet fluorescence, $F_{ns}$, was subtracted from the pellet fluorescence of a substrate-containing reaction, $F_s$, and the difference was divided by $T$ and multiplied by 100:

$$\mathrm{PF}\,\% = \frac{F_s - F_{ns}}{T} \times 100$$

The PF % determined for GFP alone was subtracted from the PF % of each GFP_CBM construct to remove the contribution of GFP-substrate interactions to the observed PF %. A normalized PF % (with GFP PF % subtracted) of 10 % or greater was chosen as the criterion for GFP_CBM binding to reduce the chance of scoring false-positive binding. The normalized PF % values are listed in Additional file 1: Table S6.
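The pull-down arithmetic can be written out directly: pellet fluorescence is the total fluorescence $T$ minus the post-spin supernatant reading, the pellet fraction compares substrate and no-substrate samples, and the GFP-only value is subtracted. A sketch with hypothetical readings; only the 10 % cutoff is from the text.

```python
# Sketch: normalized pellet fraction (PF %) from pull-down fluorescence readings.
# total_t is the no-substrate supernatant fluorescence taken before the spin;
# f values are supernatant readings after the spin. Readings are hypothetical.

PF_BINDING_CUTOFF = 10.0  # normalized PF % at or above this is scored as binding


def pellet_fraction_percent(total_t: float, f_substrate: float, f_no_substrate: float) -> float:
    """PF % = (F_s - F_ns) / T * 100, with pellet fluorescence F = T - f."""
    f_s = total_t - f_substrate
    f_ns = total_t - f_no_substrate
    return (f_s - f_ns) / total_t * 100


def normalized_pf_percent(pf_construct: float, pf_gfp_alone: float) -> float:
    """Subtract the GFP-only PF % to remove GFP-substrate interactions."""
    return pf_construct - pf_gfp_alone


def scored_as_binding(normalized_pf: float, cutoff: float = PF_BINDING_CUTOFF) -> bool:
    return normalized_pf >= cutoff


pf = pellet_fraction_percent(1000.0, 600.0, 950.0)
print(pf, normalized_pf_percent(pf, 5.0))
```

Because $T$ cancels in $F_s - F_{ns}$, the numerator reduces to $f_{ns} - f_s$; $T$ only sets the scale of the percentage.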

Insoluble substrate binding affinity measurements
PASC, Icelandic moss lichenan, 1,4-β-d-mannan, and oat spelt xylan at concentrations ranging from 0 to 10 mg/mL were mixed with 0.5 μM GFP_CBM in a final volume of 400 μL of 25 mM Tricine, pH 8.0, containing 188 mM NaCl, 2 mM CaCl₂, and 1 mg/mL BSA in 2.0-mL microcentrifuge tubes. Binding reactions were carried out in triplicate. Control reactions with GFP_CBM at concentrations ranging from 0 to 0.85 μM in the absence of substrate were used to create standard curves for determining the amount of unbound protein remaining in the supernatant of a reaction containing substrate. Reactions were incubated for 1 h at 4 °C with shaking at 1200 rpm in an Eppendorf Thermomixer R (Eppendorf North America, Hauppauge, NY, USA), followed by centrifugation at 4300×g for 10 min at 4 °C. The fluorescence of 200-μL aliquots of the reaction supernatants was measured with excitation at 488 nm and emission at 510 nm. E. coli-expressed GFP was used as a control for nonspecific interactions of the GFP domain.
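The no-substrate standard curve supplies the slope that converts supernatant fluorescence into unbound GFP_CBM concentration. This sketch fits a line through the origin, which matches a conversion of the form unbound = fluorescence / slope but is an assumption; if the blank has measurable background, a fit with an intercept would be more appropriate. The readings are hypothetical.

```python
# Sketch: slope m of a no-substrate standard curve (fluorescence vs GFP_CBM, uM).
# Fitting through the origin is an assumption; readings are hypothetical.


def slope_through_origin(conc_um: list, fluorescence: list) -> float:
    """Least-squares slope of fluorescence = m * concentration (no intercept)."""
    if len(conc_um) != len(fluorescence):
        raise ValueError("inputs must have the same length")
    sxy = sum(x * y for x, y in zip(conc_um, fluorescence))
    sxx = sum(x * x for x in conc_um)
    return sxy / sxx


# Standards spanning the 0 to 0.85 uM range described in the text:
print(slope_through_origin([0.0, 0.25, 0.5, 0.85], [0.0, 500.0, 1000.0, 1700.0]))
```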
The fraction of GFP_CBM bound, θ, was calculated using Eqs. 4 and 5,

$$B = 0.5\,\mu\mathrm{M} - \frac{f}{m} \qquad (4)$$

$$\theta = \frac{B}{0.5\,\mu\mathrm{M}} \qquad (5)$$

where $B$ is the bound GFP_CBM concentration, 0.5 μM is the concentration of GFP_CBM added to the reaction, $f$ is the supernatant fluorescence, and $m$ is the slope of the standard curve of no-substrate supernatant fluorescence versus GFP_CBM concentration. Fraction-bound values were plotted against substrate concentration (mg/mL), $[S]$, and these plots were used to determine dissociation constants (mg/mL), $K$, and binding-interaction constants, $c$, by fitting Eq. 6, a logistic binding model:

$$\theta = \frac{[S]^c}{K^c + [S]^c} \qquad (6)$$

Dissociation constants were calculated using the NonlinearModelFit routine in Mathematica (Wolfram, Champaign, IL, USA).
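The fraction-bound calculation and the binding-curve fit can be sketched together. This assumes the logistic model has the two-parameter form θ = [S]^c / (K^c + [S]^c). The text fits it by nonlinear regression in Mathematica; here a simpler, swapped-in technique is used instead: the log-logit (Hill plot) linearization, logit θ = c ln[S] − c ln K, fitted by ordinary least squares. Data points are hypothetical.

```python
import math

# Sketch: fraction bound from supernatant fluorescence, then K and c from a
# Hill-plot linearization (a stand-in for the nonlinear fit used in the text).

TOTAL_UM = 0.5  # GFP_CBM added to each binding reaction


def fraction_bound(f: float, m: float, total_um: float = TOTAL_UM) -> float:
    """theta = B / total, where B = total - f / m and f / m is the unbound concentration."""
    bound = total_um - f / m
    return bound / total_um


def hill_fit(substrate: list, theta: list) -> tuple:
    """Fit theta = S^c / (K^c + S^c) via logit(theta) = c*ln(S) - c*ln(K); return (K, c).

    Points with S <= 0 or theta outside (0, 1) have no finite logit and are skipped.
    """
    pts = [(math.log(s), math.log(t / (1 - t)))
           for s, t in zip(substrate, theta) if s > 0 and 0 < t < 1]
    if len(pts) < 2:
        raise ValueError("need at least two usable points")
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("need at least two distinct substrate concentrations")
    c = sum((x - mx) * (y - my) for x, y in pts) / sxx
    intercept = my - c * mx  # equals -c * ln(K)
    return math.exp(-intercept / c), c


s_vals = [0.5, 1.0, 2.0, 4.0, 8.0]
print(hill_fit(s_vals, [s ** 1.5 / (2.0 ** 1.5 + s ** 1.5) for s in s_vals]))
```

The linearization weights errors differently from a direct nonlinear fit, so with noisy data the two can give somewhat different K and c; a nonlinear least-squares fit is generally preferable for reported constants.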

Enzyme assays with pure substrates
For reactions with PASC, a 15-μL aliquot of the cell-free translation reaction was combined with 35 μL of MES buffer, CaCl₂ and PASC to give final concentrations of 50 mM MES, pH 6.0, 2 mM CaCl₂ and 10 mg/mL PASC. This mixture was incubated for 20 h at 60 °C. For reactions with Icelandic moss lichenan, birchwood xylan, or 1,4-β-d-mannan, a 5-μL aliquot of the translation reaction was combined with 45 μL of MES buffer, CaCl₂, and substrate to give final concentrations of 50 mM MES, pH 6.0, 2 mM CaCl₂ and 10 mg/mL polysaccharide, and incubated for 20 h at 60 °C. DNS assays of reducing sugars were performed as described previously [108]. Briefly, 30 μL of reaction supernatant was mixed with 60 µL of DNS reagent and incubated for 5 min at 95 °C. The color change was measured at 540 nm, and total reducing sugar content was estimated by comparison with standard curves prepared using d-glucose. All enzyme reactions were performed in triplicate.
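Estimating reducing sugar from the DNS assay means fitting a glucose standard curve and inverting it. This sketch assumes a linear curve with an intercept and that standards are treated exactly like samples (same supernatant and reagent volumes), so no extra volume correction is applied; the standards and readings are hypothetical.

```python
# Sketch: glucose-equivalent reducing sugar from DNS absorbance at 540 nm.
# Linear standard curve with intercept (assumption); all numbers are hypothetical.


def fit_line(x: list, y: list) -> tuple:
    """Ordinary least-squares (slope, intercept) for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def glucose_equivalents(a540: float, slope: float, intercept: float) -> float:
    """Invert the standard curve: concentration = (A540 - intercept) / slope."""
    return (a540 - intercept) / slope


# Hypothetical d-glucose standards (mM) and their A540 readings:
slope, intercept = fit_line([0.0, 1.0, 2.0, 4.0], [0.05, 0.25, 0.45, 0.85])
triplicate = [0.64, 0.65, 0.66]
print(glucose_equivalents(sum(triplicate) / len(triplicate), slope, intercept))
```

Readings that fall outside the range of the standards should be diluted and re-measured rather than extrapolated.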

NIMS analysis of reactions with IL-SG
Synthesis of the O-alkyloxyamine fluorous-tagged NIMS reagent has been published [61]. Reactions of CelEcc and CelEcc_CBM variants were carried out in 50 mM phosphate, pH 6.0, containing 10 mg/mL IL-SG. For these studies, the enzymes were expressed in E. coli and purified as described above. The concentrations of the purified enzyme stock solutions were CelEcc (18 mg/mL, 38,230 Da); CelEcc_CBM3a (8 mg/mL, 60,118 Da); CelEcc_CBM6 (20 mg/mL, 55,460 Da); CelEcc_CBM30 (26 mg/mL, 64,687 Da); and CelEcc_CBM44 (5 mg/mL, 59,308 Da). Reactions were designed to contain equimolar amounts of enzyme active sites (0.32 µmol), and all reactions were carried out at 60 °C for 24 h. At 1, 2, 4, 8, and 24 h, a 2-µL aliquot of the reaction mixture was transferred into a vial containing 6 µL of 100 mM glycine acetate, pH 1. The quenched reaction mixtures were incubated at room temperature for 16 h, and then a 0.12-µL aliquot was spotted onto the surface of the NIMS chip and removed after 30 s. A grid drawn manually on the NIMS chip with a diamond-tip scribe aided sample spotting and the identification of sample spots in the spectrometer. NIMS chips were loaded using a modified standard MALDI plate and analyzed using a 4800 MALDI TOF/TOF mass spectrometer (Applied Biosystems, Foster City, CA,