NMR Elucidation of Non-productive Binding Sites of 13C-Labeled Lignin Models with Carbohydrate Binding Module of Cellobiohydrolase I for Efficient Biomass Conversion


Background: Highly efficient enzymatic saccharification of pretreated lignocellulose is a key step in achieving a lignocellulosic biorefinery. Cellobiohydrolase I (Cel7A) secreted by Trichoderma reesei is an industrially used cellulase possessing carbohydrate binding module 1 (TrCBM1) as the C-terminal domain. Non-productive binding of TrCBM1 to lignin significantly decreases enzymatic saccharification efficiency and increases the cost of biomass conversion because additional enzymes are required. Understanding the interaction mechanism between lignin and TrCBM1 is essential to realize cost-effective biofuel production, but the binding sites in lignin have not been clearly elucidated.
Results: Three types of 13C-labeled β-O-4 lignin oligomer models were synthesized and characterized. The 2D 1H-13C HSQC spectra of the 13C-labeled lignin models showed that the 13C labels were correctly incorporated in the (1) aromatic rings and β positions, (2) α positions, and (3) methoxy groups, respectively. The TrCBM1 binding sites in lignin were analyzed by observing NMR chemical shift perturbations (CSPs) using the synthetic 13C-labeled β-O-4 lignin oligomer models. Obvious CSPs were observed in signals from the aromatic regions in oligomers bound to TrCBM1, whereas perturbations in the signals from aliphatic regions and methoxy groups were insignificant. This indicated that hydrophobic interactions and π–π stacking were the dominant factors in non-productive binding. The synthetic lignin models have two configurations whose terminal units are differently aligned and denoted C(I) and C(II). The C(I) ring showed remarkable perturbation compared with C(II), which indicated that binding of TrCBM1 is affected by the configuration of the lignin models. Long-chain lignins (DP 4.16–4.70) clearly bound to TrCBM1. Interactions with short-chain lignins (DP 2.64–3.12) were insignificant, indicating that a DP greater than 4 was necessary for TrCBM1 binding.
Conclusion: The CSP analysis using 13C-labeled β-O-4 lignin oligomer models enabled us to identify TrCBM1 binding sites in lignin at the atomic level. This specific interaction analysis will lead to new molecular designs of cellulases having controlled affinity for cellulose and lignin for a cost-effective biorefinery process.


Background
The enzymatic saccharification of lignocellulose is a key process for the sustainable manufacture of green chemicals and biofuels [1]. Trichoderma reesei is a filamentous fungus that is widely used for the production of commercially available cellulolytic enzyme cocktails. Cellobiohydrolase I (Cel7A) accounts for up to 60% of the cellulases secreted by T. reesei, and carbohydrate-binding module 1 (TrCBM1) (Fig. 1) is connected to the C-terminus of the Cel7A catalytic domain by a highly glycosylated linker [2]. TrCBM1 enhances enzymatic activity by binding to cellulose [3]. TrCBM1 also has a strong affinity for lignin, so lignin significantly inhibits the enzymatic saccharification of pretreated lignocellulose [4]. Non-productive binding of TrCBM1 to lignin should thus be suppressed to achieve efficient enzymatic saccharification. However, the mechanism of interaction is not entirely understood at the molecular level.
Binding of TrCBM1 to lignin is affected by various saccharification conditions, such as temperature [6], pH [7], and the substrate concentration [8]. The chemical properties of lignin have a critical impact on its cellulase binding affinity. Many pretreatments increase the number of phenolic OH groups in lignin and its degree of condensation, which enhance binding between cellulase and lignin [9,10]. Electrostatic repulsion due to large numbers of aliphatic OH groups and negatively charged functionalities, such as carboxyl and sulfone groups, interferes with cellulase binding [9,11]. Softwood lignin is mainly synthesized via the radical coupling of guaiacyl monomers, whereas hardwood lignin is synthesized from both guaiacyl (G) and syringyl (S) monomers. The ratio of syringyl to guaiacyl units, denoted by S/G, is one of the most important factors governing the physicochemical properties of lignified plant cell walls.
The S/G ratio affects non-productive binding, but it is not the decisive factor for adsorption. Guo et al. found that a low S/G ratio corresponded to a high adsorption capacity [12]. Yang et al. reported that organosolv lignin isolated from softwood lodgepole pine had a higher adsorption affinity for commercial cellulase than lignin from hardwood poplar [13]. However, genetic engineering studies have suggested the opposite, and comparisons of enzymatic hydrolysis in alfalfa cultivars and Eucalyptus mutants with high and low S/G ratios have yielded inconsistent results [14,15]. Although the relationship between the chemical structure of lignin and its binding affinity for cellulolytic enzymes and CBMs has been researched extensively, evidence for the identification of binding atoms in whole lignin molecules is entirely lacking. This is because no effective method for identifying the interactive sites in lignin, a heterogeneous polymer, is available.
NMR chemical shift perturbation (CSP) analysis is a powerful method for the identification of substrate binding sites in proteins [16]. We previously performed CSP analysis to identify the binding amino acid residues in TrCBM1 with softwood and hardwood milled wood lignin (MWL) using 15N-labeled TrCBM1 [17]. The results suggested that the aromatic rings in lignin participated in interactions with amino acid residues in TrCBM1, because the flat plane surface including Y5, Y31, and Y32 in TrCBM1 was a main interaction site. However, the binding sites in lignin have not been characterized at the molecular level. In this study, we synthesized 13C-labeled β-O-4 lignin oligomer model compounds with different labeling positions and performed CSP analysis to identify the TrCBM1 binding sites in the lignin models. Pure TrCBM1 without a catalytic domain and linker was expressed and purified and then added to NMR sample tubes containing the 13C-labeled lignin models to monitor perturbations in signals from the binding sites in the model compounds. NMR analysis of the 13C-labeled lignin model compounds provided the first direct evidence for the identification of binding atoms in the linear lignin chains. The results were in good agreement with our previous binding site analysis of the protein counterpart, TrCBM1 with MWL [17]. Herein, we provide direct evidence to reveal the interaction sites in the β-O-4 lignin substructure that bind to TrCBM1.

Results
Expression and purification of TrCBM1
Escherichia coli BL21 (DE3) was used to express a histidine (His) tag-TrCBM1-green fluorescent protein (GFP) fusion protein. TrCBM1 was obtained after the addition of enterokinase and thrombin to cleave the His tag and GFP, respectively. We previously performed sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) and 2D 1H-15N HSQC NMR to analyze the molecular mass and conformation of 15N-labeled TrCBM1 [17]. In this work, we performed SDS-PAGE and MALDI-TOF-MS to characterize unlabeled TrCBM1. Pure TrCBM1 was a single protein with a molecular mass of 5195.8 Da. The MALDI-TOF-MS spectrum of TrCBM1 and a full-length SDS-PAGE gel are shown in Figure S1 of the Additional file.
The 2D 1H-13C HSQC spectra of the 13C-labeled lignin models contained signals that corresponded to the 1H-13C correlations (Fig. 3), which was evidence that 13C was incorporated at the designated positions.
No HSQC signals attributable to side products were observed in the NMR spectra, which indicated that high-purity lignin models were prepared. HSQC signals at δC/δH 85. […] The long- and short-chain lignin models were separated via silica gel chromatography and subjected to NMR and binding analysis. Size exclusion chromatography (SEC) revealed that the 13C-labeled lignin models had narrow molecular weight distributions (Figure S2, Additional file). The degree of polymerization (DP) was calculated for each model compound from its weight-average molecular weight (Mw). The DPs of the long-chain lignin models ranged from 4.16 to 4.70, whereas those of the short-chain models ranged from 2.64 to 3.12 (Table 1). Although differences between the Mws of the long- and short-chain models were small, their NMR spectra (Fig. 3d) and SEC profiles (Figure S2, Additional file) were clearly distinct. Therefore, the long- and short-chain lignin models could be used to evaluate the effects of Mw on molecular interactions with TrCBM1.
Table 1 Molecular weight parameters of 13 […]
a DP: Degree of polymerization calculated from the Mw and theoretical molecular mass of the lignin model.
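The DP estimate described in the Table 1 footnote can be illustrated with a minimal sketch. The exact formula is not given in the text, so the version below assumes the simplest reading, DP = Mw divided by the theoretical mass of one repeating unit, and ignores any end-group correction. The function names, the unit mass, and the Mw value are hypothetical and are not taken from Table 1.

```python
def degree_of_polymerization(mw: float, unit_mass: float) -> float:
    """Estimate DP as Mw divided by the mass of one repeating unit.

    Assumption: no end-group correction is applied.
    """
    if mw <= 0 or unit_mass <= 0:
        raise ValueError("mw and unit_mass must be positive")
    return mw / unit_mass


def is_long_chain(dp: float, threshold: float = 4.0) -> bool:
    """Classify a model as long-chain when its DP exceeds the threshold.

    The default of 4 follows the DP boundary discussed in the text.
    """
    return dp > threshold


# Hypothetical numbers for illustration only (not measured values).
example_dp = degree_of_polymerization(mw=900.0, unit_mass=200.0)
example_is_long = is_long_chain(example_dp)
```

With the DP ranges reported in the text, models at DP 4.16 to 4.70 fall on the long-chain side of this threshold and models at DP 2.64 to 3.12 fall on the short-chain side.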

Analysis of interactions between lignin models and TrCBM1
Interactions between the lignin models and TrCBM1 were evaluated via CSP analysis using 2D 1H-13C HSQC NMR. To analyze the binding positions in lignin model compounds 4(Arβ), 4(α), and 4(m) with a high degree of sensitivity, the compounds were separated into high and low molecular mass fractions.
Assignment of the 13C-lignin model HSQC signals was accomplished for all of the carbon atoms and non-exchangeable protons. Almost all of the signals from the A, B, and C rings were assigned separately, because the tops of the peaks were distinguishable [18]. We previously found that the lignin models have two diastereomers. Although the A and B rings have similar alignments, the C rings are differently aligned and designated as C(I) and C(II) [18]. Full assignment of the peaks in the lignin model spectra enabled us to analyze the interactions of carbon atoms and protons at the atomic level.
The 2D 1H-13C HSQC spectra of the lignin models (50 µM) in the presence and absence of TrCBM1 were superimposed (Fig. 4). Several HSQC signals acquired in the presence of 200 µM TrCBM1 showed obvious perturbation, particularly those of the aromatic regions in the long-chain lignin spectra. Most of these signals exhibited larger perturbations when the concentration of TrCBM1 was increased to 350 µM (Fig. 4b). In contrast, there was no significant perturbation of NMR signals from the aliphatic regions and methoxy groups in the presence of TrCBM1 (Fig. 4c). The results clearly showed that the aromatic regions in the long-chain lignin models were the primary sites of interaction with TrCBM1. However, no distinct perturbations were observed in signals from the methoxy groups or any of the aromatic and aliphatic regions when the short-chain lignin models were used for CSP analysis (Figures […] and C5(I) signals on the 13C-axis were greater than 0.05 ppm (Fig. 5a). For the short-chain lignin models, only the Δδ values of the B5 signals on the 1H-axis exceeded 0.006 ppm (Fig. 5b). Line broadening was observed in the B5 signals of the long-chain lignin models in the presence of TrCBM1 (350 µM). The observed line broadening could be attributed to both the on and off rates of complex formation and the multiple binding states of lignin due to non-specific binding [17].

Adsorption experiments with the lignin models
The unlabeled long- and short-chain lignin models of compound 4 were used to evaluate TrCBM1 binding affinity according to the Langmuir adsorption model. The Mws and molecular weight distributions of the unlabeled lignin models (Table 1) were nearly identical to those of the 13C-labeled lignin models (Figure S2, Additional file). The TrCBM1-His tag fusion protein was used instead of TrCBM1 without a His tag to conduct the adsorption experiments, because the soluble lignin models could not be separated from the TrCBM1 protein via centrifugation. After incubating the sample solutions containing the lignin models and TrCBM1-His tag at 50 °C for 1 h, cOmplete His Tag Purification Resin (Roche, Basel, Switzerland) was added to the solutions. The unbound lignin models in the supernatant could then be separated from the bound lignin models, which were adsorbed to the precipitated TrCBM1-His tag bound to the His tag resin. We confirmed in control experiments that all of the TrCBM1-His tag adsorbed to the His tag resin. We did not observe unwanted binding between the His tag resin and lignin models (Figures S6 and S7, Additional file). The concentrations of the unbound lignin models in the supernatants were determined to calculate the adsorption parameters summarized in Table 2. Although the long- and short-chain lignin models had similar Γmax values, the KL of the long-chain lignin model was eight times higher than the KL of the short-chain lignin model. This result was consistent with the results of the NMR interaction analysis. The percentages of the lignin models that bound to TrCBM1 were calculated using the KL values. In the CSP analysis, 84.6% and 91.4% of the long-chain lignin models were bound to TrCBM1 at TrCBM1 concentrations of 200 and 350 µM, respectively. At TrCBM1 concentrations of 200 and 350 µM, 33.0% and 46.6% of the short-chain lignin models, respectively, were bound to TrCBM1.
The chain length was thus an essential factor in binding between TrCBM1 and the lignin chains, which contained β-O-4 linkages exclusively. We also found that strong adsorption required a DP above 4.
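The two calculations used in this section, the Langmuir isotherm and the bound percentage derived from a binding constant, can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to produce Table 2: the bound fraction is modeled as simple 1:1 binding with depletion of both partners, the dissociation constant is treated as the reciprocal of KL in molar units, and the value of 30 µM is hypothetical. All function and variable names are introduced here for illustration.

```python
import math


def langmuir_adsorbed(c_free: float, gamma_max: float, k_l: float) -> float:
    """Langmuir isotherm: adsorbed amount at free concentration c_free."""
    return gamma_max * k_l * c_free / (1.0 + k_l * c_free)


def fraction_ligand_bound(l_total: float, p_total: float, k_d: float) -> float:
    """Fraction of ligand bound for 1:1 binding (assumed model).

    Solves x^2 - (P + L + Kd) x + P L = 0 for the complex concentration x
    and takes the smaller, physically meaningful root. All concentrations
    must share one unit (for example µM).
    """
    s = p_total + l_total + k_d
    complex_conc = (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / 2.0
    return complex_conc / l_total


# Concentrations follow the CSP conditions in the text (50 µM lignin model,
# 200 or 350 µM TrCBM1); the Kd of 30 µM is a hypothetical value.
HYPOTHETICAL_KD_UM = 30.0
bound_at_200 = fraction_ligand_bound(50.0, 200.0, HYPOTHETICAL_KD_UM)
bound_at_350 = fraction_ligand_bound(50.0, 350.0, HYPOTHETICAL_KD_UM)
```

Accounting for depletion matters here because the lignin model concentration is not negligible relative to the TrCBM1 concentration, so the free protein concentration differs noticeably from the total.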

Discussion
Understanding the mechanism of interaction between cellulase and lignin is essential for the efficient enzymatic saccharification of lignocelluloses. T. reesei is the most important industrial cellulase-producing filamentous fungus. Cel7A is the most abundant secreted cellulolytic enzyme and contains a catalytic domain and TrCBM1. Non-productive binding between TrCBM1 and lignin decreases the efficiency of enzymatic saccharification of pretreated plant biomass, but the interactive sites in lignin have not been previously elucidated.
The flat plane surface of TrCBM1 binds to both lignin and cellulose, and contains the three tyrosine residues (Y5, Y31, and Y32) and neighboring amino acid residues in the underside of TrCBM1 (Fig. 1) [7,19]. T17, V18, and T24 residues are present in the cleft, which is located on the opposite side of the flat plane surface. This site also interacts with lignin and cellulosic substrates [17]. Although the TrCBM1 binding sites have been extensively investigated via point mutation analysis and NMR, evidence to identify the lignin atoms that participate in TrCBM1 binding is lacking. Therefore, we conducted CSP analysis to investigate interactions between 13C-labeled β-O-4 lignin oligomer compounds and TrCBM1. Interactive sites in the long-chain lignin model are mapped in Fig. 6 based on the Δδ values observed in CSP analysis. Interactions between the aromatic rings and TrCBM1 were obvious, whereas the aliphatic regions and methoxy groups were not major binding sites. This suggests that TrCBM1 adsorption on lignin occurs via hydrophobic interactions and π-π stacking of the aromatic rings in lignin and the three tyrosine residues on TrCBM1. Rahikainen et al. discussed the importance of TrCBM1 hydrophobicity. They reported that a Y32A TrCBM1 mutant had a lower association constant than wild-type TrCBM1, whereas a Y32W mutant increased the lignin and cellulose binding affinities of TrCBM1 [7]. The hydrophobicity of lignin results in a significant amount of non-productive binding with cellulase [13]. Therefore, hydrophobic interaction can reasonably be interpreted as a dominant driving force behind non-productive TrCBM1 binding by lignin.
Interestingly, the Δδ values varied among the aromatic rings of the lignin model shown in Fig. 6. The terminal units, the A and C(I) rings, exhibited similar interaction patterns, although the A5 position was not evaluated due to its extremely weak HSQC signal intensity. The internal unit, the B ring, showed remarkable interactions with TrCBM1 only at the B5 position, suggesting that the interaction patterns of the terminal and internal units of the lignin model differed. We previously reported two configurations of the lignin models whose terminal units were differently aligned and designated them C(I) and C(II) [18]. We demonstrated that the TrCBM1 binding behavior depended significantly on the configurations of the lignin model. The molecular alignment of C(I) apparently resulted in preferential binding to TrCBM1. We are thus the first to reveal that interactive sites in the lignin chains are significantly influenced by their molecular configurations.
Hydrogen bonding and electrostatic interactions are not negligible in non-productive TrCBM1 binding. Phenolic OH groups in lignin promote hydrogen bonding to cellulolytic enzymes [10,20]. Hydrophilic amino acid residues in TrCBM1, including Q7 and T17, in addition to the main chains of H4 and I11, were previously shown to participate in binding with lignin via hydrogen bonds and electrostatic interactions [17]. In our CSP analysis, aromatic rings in the lignin models were the primary sites of interaction with TrCBM1. This was attributed mainly to hydrophobic interactions. However, the results could also be interpreted in terms of hydrogen-π interactions between the aromatic rings in lignin and OH groups on amino acid residues [21,22]. OH groups in the three tyrosine residues on the flat plane surface of TrCBM1 and the OH group in the T17 residue in the cleft were thought to donate hydrogens for hydrogen-π interactions. Although CSP analysis indicated that interactions at the aliphatic positions were insignificant compared with those at the aromatic positions, signals from the α positions had slightly higher Δδ values than signals from the other aliphatic positions in the lignin models (Fig. 5). This result could be attributed to hydrogen bonding between OH groups near the α positions of the lignin model and TrCBM1.
It is uncertain whether there is a correlation between the molecular weight of lignin and non-productive binding [13,23]. This is because a decrease in the molecular weight of lignin changes its hydrophobicity and OH content, which are the predominant factors affecting non-productive binding. Changes in these structural features of lignin have made it difficult to estimate the effect of the molecular weight of lignin on cellulase binding. Our CSP results clearly indicated that the long-chain lignin model had a stronger affinity for TrCBM1 than the short-chain lignin model (Table 2). Mattinen et al. found that cellohexaose adsorbed to TrCBM1, whereas short cellooligosaccharides, such as cellobiose and cellotriose, did not bind to TrCBM1 [24]. Based on these results and our observations, stacking on the flat plane surface of TrCBM1 and interactions with the cleft required lignin and cellooligosaccharides with long chains. The low affinity of the short-chain lignin model could also be attributed to its conformation. The short-chain lignin model had a folded conformation in 90% water [18], and the folded short lignin chain could not cover the full length of the three tyrosine residues on the flat plane surface of TrCBM1. Therefore, we concluded that β-O-4 lignin chains with DPs above 4 were essential for strong adsorption of TrCBM1.

Conclusions
Highly efficient enzymatic saccharification is hindered by the non-productive binding of lignin to cellulolytic enzymes. Until now, the binding mechanism was not fully understood. We synthesized 13C-labeled β-O-4 lignin oligomer model compounds to identify the TrCBM1 binding sites in lignin via NMR CSP analysis. Signals from the aromatic regions in the lignin models exhibited obvious perturbations, whereas signals from the aliphatic regions and methoxy groups were not significantly perturbed. These results suggested that hydrophobic interactions and π-π stacking were the principal forces driving interactions between aromatic rings in the lignin models and tyrosine residues in TrCBM1. TrCBM1 bound differently with terminal and internal units in the lignin models. In addition, the binding patterns associated with the C(I) and C(II) terminal alignments differed. This indicated that binding of the lignin models to TrCBM1 was strongly affected by the molecular configuration. Perturbation of signals from the long-chain lignin models (DP 4.16-4.70) was obvious due to their strong binding affinity relative to that of the short-chain lignin models (DP 2.64-3.12). This indicated that a chain length greater than DP 4 was necessary for strong interactions between lignin and TrCBM1. This is the first study to characterize the interactive sites in a lignin model compound at the atomic level using purified TrCBM1. A detailed understanding of non-productive binding will lead to the establishment of a fundamental theory for the structural alteration of lignin and enzymes that are not susceptible to unfavorable binding.

Preparation of TrCBM1
The CBM1 of T. reesei Cel7A (accession number: CAH10320) was expressed and purified as described previously [17]. E. coli BL21 (DE3) was first subjected to heat shock transformation with plasmids for the expression of His-tagged TrCBM1 fused with GFP, hereafter referred to as His tag-TrCBM1-GFP fusion protein. LB medium was inoculated with the transformant, followed by incubation at 37 °C and 200 rpm until the OD600 reached 1.2. Protein expression was induced using 1 mM isopropyl β-D-1-thiogalactopyranoside, and the bacteria were incubated at 37 °C and 200 rpm for 5 h. Following centrifugation and sonication, His tag-TrCBM1-GFP was isolated from the supernatant and purified via Ni affinity chromatography followed by anion exchange chromatography. The His tag and GFP region were removed using enterokinase (New England Biolabs, MA, USA) and thrombin (GE Healthcare, IL, USA), respectively. The obtained TrCBM1 was concentrated in 50 mM acetic acid-d4 buffer prepared with D2O using a Vivaspin Turbo ultrafiltration device (Sartorius, Göttingen, Germany). We skipped the His tag cleavage process to prepare His tag-TrCBM1 for adsorption analysis. The His tag-TrCBM1 was purified via Ni affinity and anion exchange chromatography.

Synthesis of β-O-4 lignin oligomer model compounds
β-O-4 lignin oligomer models were synthesized as described in our previous work using a modified protocol originally developed by Katahira et al. [18,25]. Vanillin (1) was dissolved in acetone, and the solution was refluxed for 1.5 h in the presence of KI, K2CO3, and t-butyl-2-bromoacetate to obtain t-butoxycarbonylmethyl vanillin (2). The monomer (2) […] Similarly, vanillins with 13C-labeled carbonyl and methoxy groups were used to synthesize a lignin model with 13C-labeled α positions (4(α)) and a model with 13C-labeled methoxy groups (4(m)). The lignin models were characterized using 2D 1H-13C HSQC NMR and SEC. SEC was performed using three TSKgel SuperMultipore HZ-M columns (Tosoh, Tokyo, Japan) on a Shimadzu instrument equipped with an LC-20AD pump and an SPD-M20A diode array detector (Shimadzu, Kyoto, Japan). THF was used for elution at a flow rate of 0.35 mL/min at 40 °C.

NMR spectroscopy and CSP analysis
All NMR spectra were recorded at 298 K on a Bruker Avance III HD 600 spectrometer equipped with a cryogenic probe and a Z-gradient (Bruker BioSpin, MA, USA). The instrument was controlled using Bruker Topspin NMR software version 3.5. For CSP analysis, 50 µM of the 13C-labeled lignin models with 13C-labeled aromatic rings and β positions (4(Arβ)), α positions (4(α)), and methoxy groups (4(m)) were individually dissolved in 50 mM acetic acid-d4 buffer, which was prepared with D2O (pD 5.0) and 10% (v/v) DMSO-d6. Each NMR sample had a volume of 250 µL and contained 20 µM 2,2-dimethyl-2-silapentane-5-sulfonic acid as an internal standard. To identify the TrCBM1 binding sites in the 13C-labeled lignin models, changes in chemical shift (ΔδC,H, ppm) were calculated by comparing chemical shift values in the edited 1H-13C HSQC spectra of the 13C-labeled lignin models in the presence and absence of 200 and 350 µM TrCBM1. CSP analysis of both the long- and short-chain lignin models was performed to evaluate the effect of molecular weight on TrCBM1 binding to the lignin models. Assignment of the lignin model signals was based on our previous report [18].
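The Δδ comparison described above can be sketched as a per-axis subtraction of assigned peak positions. The sketch below is illustrative only: the peak labels, chemical shift values, and function names are hypothetical, and the default cutoffs (0.05 ppm on the 13C axis, 0.006 ppm on the 1H axis) are borrowed from the values quoted in the Results; using them as selection thresholds is an assumption.

```python
from typing import Dict, List, Tuple

Shift = Tuple[float, float]  # (delta_C in ppm, delta_H in ppm)


def chemical_shift_changes(
    free: Dict[str, Shift], bound: Dict[str, Shift]
) -> Dict[str, Shift]:
    """Per-axis shift change (bound minus free) for peaks found in both lists."""
    return {
        label: (bound[label][0] - free[label][0], bound[label][1] - free[label][1])
        for label in free
        if label in bound
    }


def perturbed_positions(
    deltas: Dict[str, Shift], c_threshold: float = 0.05, h_threshold: float = 0.006
) -> List[str]:
    """Labels whose absolute change exceeds the cutoff on either axis."""
    return sorted(
        label
        for label, (d_c, d_h) in deltas.items()
        if abs(d_c) > c_threshold or abs(d_h) > h_threshold
    )


# Hypothetical peak positions for illustration (not measured values).
free_peaks = {"B5": (115.00, 6.900), "OMe": (56.00, 3.800)}
bound_peaks = {"B5": (115.08, 6.910), "OMe": (56.01, 3.801)}

deltas = chemical_shift_changes(free_peaks, bound_peaks)
hits = perturbed_positions(deltas)
```

Keeping the two axes separate, rather than combining them into a single weighted distance, mirrors the way the text reports 13C-axis and 1H-axis changes independently.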
Adsorption experiment of lignin models with TrCBM1
The Langmuir adsorption isotherm model was applied to evaluate the TrCBM1 binding affinities of the lignin models. Each unlabeled long- and short-chain lignin model was dissolved in 50 mM acetic acid buffer (pH 5.0) containing 3,200 µg/mL of TrCBM1 with a His tag (His tag-TrCBM1 […]

Availability of data and materials
All data generated or analyzed during this study are included in this manuscript and its Additional files.

Figure 1
Ribbon models of TrCBM1 viewed from the side and bottom. The structure of TrCBM1 has been determined through NMR analysis (PDB ID 2CBH) [5]. Red stick models of tyrosine residues (Y5, Y31, and Y32) and T17, V18, and T24 residues in the cleft are also displayed.

Positions that generated extremely low-intensity signals in the absence of TrCBM1 are indicated by asterisks (*). Positions that generated overlapping signals are indicated by # symbols.

[…] exhibited. In the A2 and C2(I) positions, the value is the same because their 2D HSQC signals were not distinguishable due to overlapping.