The mechanism by which a distinguishing arabinofuranosidase can cope with internal di-substitutions in arabinoxylans

Background Arabinoxylan is an abundant polysaccharide in industrially relevant biomasses such as sugarcane, corn stover and grasses. However, the arabinofuranosyl di-substitutions that decorate the xylan backbone are recalcitrant to most known arabinofuranosidases (Abfs). Results In this work, we identified a novel GH51 Abf (XacAbf51) that forms trimers in solution and can cope efficiently with both mono- and di-substitutions at terminal or internal xylopyranosyl units of arabinoxylan. Using mass spectrometry, the kinetic parameters of the hydrolysis of 3³-α-l-arabinofuranosyl-xylotetraose and 2³,3³-di-α-l-arabinofuranosyl-xylotetraose by XacAbf51 were determined, demonstrating the capacity of this enzyme to cleave arabinofuranosyl linkages of internal mono- and di-substituted xylopyranosyl units. Complementation studies of fungal enzyme cocktails with XacAbf51 revealed an increase of up to 20% in the release of reducing sugars from pretreated sugarcane bagasse, showing the biotechnological potential of a generalist GH51 in biomass saccharification. To elucidate the structural basis for the recognition of internal di-substitutions, the crystal structure of XacAbf51 was determined, unveiling the existence of a pocket strategically arranged near the − 1 subsite that can accommodate a second arabinofuranosyl decoration, a feature not described for any other GH51 Abf structurally characterized so far. Conclusions In summary, this study reports the first kinetic characterization of internal di-substitution release by a GH51 Abf, provides the structural basis for this activity and reveals a promising candidate for industrial processes involving plant cell wall depolymerization.


Background
Arabinoxylan is a hemicellulosic polysaccharide composed of a β-1,4-linked xylose backbone, which can be mono-substituted (at O-3) or di-substituted (at O-2 and O-3) with α-L-arabinofuranosyl residues (Araf) and occasionally with (4-O-methyl)glucuronic acid [1,2]. Industrially relevant biomasses such as sugarcane [3], corn stover [4] and grasses are rich in arabinoxylans, which can represent up to 50% (w w −1 ) of the polysaccharides in their secondary walls [2]. Moreover, arabinoxylans from cereals stimulate the activity of beneficial bacteria in the colon of humans and animals and are considered a source of prebiotic oligosaccharides with promising health-promoting properties [5,6].
Thus, in this work, we reveal a novel generalist GH51 enzyme that forms trimers in solution and can cope with both mono- and di-substitutions in arabinoxylans, with biotechnological potential for biomass saccharification. For the first time, the kinetic characterization by mass spectrometry is described for a di-substituted arabinoxylo-oligosaccharide (AXOS), and the structural basis for di-substitution recognition in the GH51 family is elucidated.

XacAbf51 is a thermotolerant α-l-arabinofuranosidase and enhances sugarcane bagasse saccharification
The enzyme XacAbf51 fused to an N-terminal His-tag was recombinantly expressed in Escherichia coli cells and purified to homogeneity by metal-affinity and size-exclusion chromatography. The melting temperature (Tm) assessed by circular dichroism (CD) spectroscopy and differential scanning calorimetry (DSC) is around 67 °C (Fig. 1a-c), indicating enhanced thermotolerance compared to other glycoside hydrolases from X. axonopodis pv. citri, which usually have a Tm between 45 and 55 °C [23]. XacAbf51 cleaves the synthetic substrate pNP-Araf, which confirms its α-l-arabinofuranosidase activity (EC 3.2.1.55). It is very stable over time, remaining active for up to 45 days when stored at 4 °C (not shown) and retaining more than 80% of its activity after 55 h of incubation at 50 °C (Fig. 1d).
The prominent thermotolerance and activity of XacAbf51 in conditions akin to those used for enzymatic hydrolysis in biorefineries led us to evaluate the biotechnological potential of this novel Abf as a complement to fungal enzyme cocktails used for sugarcane bagasse degradation, since arabinoxylan is an important component of this biomass [3]. As expected, the addition of XacAbf51 to the Celluclast and RUT-C30 enzyme cocktails enhanced the hydrolysis of delignified sugarcane bagasse by nearly 20%, indicating that XacAbf51 might be a useful additive in enzyme formulations for sugarcane bagasse saccharification (Fig. 1e and f).

XacAbf51 recognizes internal di-substituted Xylp residues
To better understand the catalytic properties of XacAbf51, we characterized the influence of pH and temperature on enzyme activity and investigated its substrate specificity. Maximum catalytic rates were observed at pH 5.5 (Fig. 2a) and temperature between 55 and 60 °C (Fig. 2b), which is fully compatible with the reaction conditions of commercial fungal enzyme cocktails. Besides pNP-Araf, XacAbf51 also cleaves natural polysaccharides such as arabinoxylan and arabinan (Table 1). A comparison of the reaction with arabinan and arabinoxylan at 10 mg mL −1 indicates that the enzyme cleaves arabinan better than arabinoxylan. The enzyme was not able to cleave pNP-Xylp and arabinogalactan, indicating a high specificity for Araf residues linked to xylan or arabinan backbones.
The higher activity of XacAbf51 on arabinan as compared to arabinoxylan prompted us to investigate whether the enzyme TxAbfD3 (EC 3.2.1.55) from T. xylanilyticus, a GH51 member highly active on arabinoxylan [24], displays the same behavior. In contrast to XacAbf51, the enzyme TxAbfD3 was more active on arabinoxylan than on arabinan, showing that distinct substrate preferences occur within the family GH51, despite their capacity to recognize several substrates.

Structural basis for the cleavage of AX di-substitutions by XacAbf51
To investigate the molecular mechanisms by which XacAbf51 cleaves AX di-substitutions, we solved and analyzed its crystal structure. As a typical GH51 enzyme, XacAbf51 harbors the active site in a (β/α)₈-barrel that is tightly associated with a β-sandwich domain. The β-sheets of this β-sandwich bring the N- and C-terminal regions of the barrel together, stabilizing these two regions that otherwise would be labile (Fig. 5a). Thus, although not participating in catalysis, the β-sandwich domain seems to be essential for the stability of the catalytic domain.
Fig. 1 XacAbf51 is a thermotolerant Abf and enhances saccharification of delignified sugarcane bagasse. Circular dichroism spectrum of XacAbf51 (a) and thermal denaturation profile of the enzyme assessed by CD (b) and DSC (c). Residual activity of XacAbf51 on pNP-Araf after incubation at 50 °C for up to 72 h (d). Sugar released from delignified sugarcane bagasse by Celluclast (238 µg) (e) or T. reesei RUT-C30 enzyme cocktail (238 µg) (f) in the absence or presence of XacAbf51 (13 µg). **P value ≤ 0.01; ***P value ≤ 0.001 (one-tailed Student's t test)
dos Santos et al. Biotechnol Biofuels (2018) 11:223
Structural comparisons revealed that XacAbf51 displays all structural features required for the retaining mechanism of hydrolysis conserved in GH51 enzymes [25-27]. The catalytic residues Glu182 (acid-base) and Glu304 (nucleophile) are positioned 3.7 Å apart from each other within the active site pocket (Fig. 5a). A glycerol molecule occupied the − 1 subsite in a conformation that mimics part of the Araf ring (Fig. 5b). All residues from this subsite are identical or semi-conserved between XacAbf51 and GH51 structures known so far, except for Cys80 and Cys186. These cysteine residues form a disulfide bridge in XacAbf51 and TxAbfD3, which likely contributes to the high thermostability of these enzymes [24]. In other GH51 Abfs, Cys80 and Cys186 residues are replaced by asparagine and glutamine (Fig. 5c). Although Asn181 is fully conserved between the compared GH51 Abfs, it adopts a different rotamer in XacAbf51 (Fig. 5b).
Structural superimposition of XacAbf51 with TxAbfD3 in complex with 3²-α-l-arabinofuranosyl-xylotriose (XA³X) evidenced the presence of a cavity near the − 1 subsite that could potentially accommodate the second Araf substitution of a di-substituted substrate (Fig. 6a). To gain insights into the molecular events involved in binding and hydrolysis of Araf from internal di-substituted Xylp residues, we appended an O2-linked Araf to XA³X, thus generating XA²⁺³X, and carried out a molecular dynamics (MD) simulation of XacAbf51 complexed with this di-substituted substrate. According to this simulation, the side chains of Ser222 and Asp223 adopted different rotameric conformations to better accommodate the O2-linked Araf at the +2NR* subsite (Fig. 6b). The side chain of Asn181 rotated 180° around Cβ to interact with the O2 atom of the arabinofuranosyl residue at the − 1 subsite. Trp254 formed hydrophobic interactions with the +2R Xylp residue, but no hydrogen bonds were observed between the enzyme and the xylan backbone, which correlates with the versatility of XacAbf51 in recognizing both arabinoxylan and arabinan. Selected inter-atomic distances between enzyme and XA²⁺³X remained stable over the simulation, indicating favorable interactions for substrate binding (Fig. 6c). Thus, the MD simulation data support that the pocket adjacent to the − 1 subsite can accommodate the O2-linked Araf from internal di-substituted Xylp residues, while the O3-linked decoration is placed into the − 1 subsite for catalysis. Considering the pseudosymmetry of xylan and the design of the catalytic interface, the backbone might also bind to the active site in the inverted direction, placing, in this case, the internal O2-linked Araf (from mono- or di-substitutions) into the − 1 subsite for cleavage.
Note that the optimal ranges of pH and temperature for XacAbf51 activity are compatible with commercial fungal enzyme cocktails for lignocellulose saccharification
Table 1 Kinetic parameters of XacAbf51 and TxAbfD3 on pNP-Araf and arabinan and comparative activity of XacAbf51 and TxAbfD3 on arabinoxylan. For comparative purposes, the assays were performed at 50 °C, pH 5.5, which is compatible with industrial processes of biomass saccharification. The Km and kcat could not be estimated for arabinoxylan because the maximum velocity is not reached at the highest possible concentration of the substrate. a Activity measured using substrate at 10 mg mL −1
In TxAbfD3, we observed variable regions at the β6-α6 and β5-α5 loops that might explain its low activity against internal di-substitutions. The β6-α6 loop contains the tryptophan residue that interacts with the +1 Xylp unit in TxAbfD3, but makes hydrophobic contacts with the +2 Xylp residue in XacAbf51 (Fig. 6d). To test the influence of the β6-α6 loop on substrate preference, the sequence TIPGGWPPRASST (Thr249-Thr261) and two extra residues (Ala310-Pro311) of XacAbf51 were replaced by the sequences TVPGPWEKKGPAT and DV of TxAbfD3, because the aspartate residue from the DV motif interacts with the β6-α6 loop in TxAbfD3. CD analysis indicated a folded conformation of the mutant (data not shown); however, it was inactive against arabinan and arabinoxylan and poorly active against pNP-Araf. Another point of divergence between XacAbf51 and TxAbfD3 is the sequence SDD (Ser222-Asp224, β5-α5 loop) of XacAbf51, which is replaced by the NTA (Asn216-Ala218) motif in TxAbfD3, attracting the +2 Xylp unit via a hydrogen bond donated by Asn216 (Fig. 6d). This three-residue replacement caused enzyme aggregation, as assessed by dynamic light scattering (DLS), and disrupted the enzyme activity against arabinoxylan and arabinan (results not shown). We also tested whether the triple replacement of the β6-α6 loop, DV and SDD motifs would convert the substrate preference of XacAbf51 to that of TxAbfD3. Although the mutant showed a folded conformation with a hydrodynamic radius (Rh) similar to that of the WT enzyme, the triple modification also abolished the XacAbf51 activity against arabinoxylan and arabinan, indicating that other structural features might affect the positioning and dynamics of the β6-α6 and β5-α5 loops, impairing activity when associated with transplanted loops.

Fig. 3 XacAbf51 releases Araf from mono- and di-substituted AXOS and from arabinoxylan. Capillary zone electrophoresis profiles of AXOS before (red lines) and after (black lines) incubation with XacAbf51. Although the peaks of decorated and undecorated oligosaccharides were indistinguishable in this assay, the increase of the arabinose (Ara) peak after enzyme treatment shows the capacity of XacAbf51 to release Araf from several AXOS and from arabinoxylan and arabinan. a d XA³XX = 3³-α-l-arabinofuranosyl-xylotetraose; e XA²⁺³XX = 2³,3³-di-α-l-arabinofuranosyl-xylotetraose; f arabinoxylan from wheat flour and arabinan from sugar beet. Black arrowheads represent the migration time of arabinose (Ara), xylobiose (X2), xylotriose (X3) and xylotetraose (X4) standard runs. Red arrowheads represent the substrate migration time (0 min, without enzyme). In (f), the Ara released from arabinan was used as a reference for the analysis of arabinoxylan cleavage, due to the anomalous migration of Ara in these conditions, compared to the standard run

The biological unit of XacAbf51 is a trimer
In the crystal structure of XacAbf51, six protein chains compose the asymmetric unit, but in a different spatial disposition from that observed for known GH51 hexamers such as TxAbfD3 [28] (Fig. 7a). Analysis of the crystal interfaces using jsPISA [29] indicates that trimers, composed of chains ABC or DEF, are the most stable quaternary structure of XacAbf51. Moreover, the interface between the two trimers that compose the TxAbfD3 hexamer is not conserved in XacAbf51.
To determine the oligomeric state of XacAbf51 in solution, several experiments were carried out with the purified protein. The small-angle X-ray scattering (SAXS) curve of XacAbf51 revealed a radius of gyration (4.5 nm) and a low-resolution molecular envelope that are consistent with the crystallographic trimer (Fig. 7b). Moreover, the sedimentation coefficient estimated from analytical ultracentrifugation (AUC) at different protein concentrations (Fig. 7c and d) corresponds to a particle of 161 kDa, which is in accordance with the theoretical mass of the trimer (171 kDa). Estimation of Rh using DLS (Fig. 7e) further supported that the biological unit of XacAbf51 is a trimer.

Evolution of GH51 enzymes
To gather insight into the evolution of GH51 Abfs, a phylogenetic tree was constructed based on the catalytic domain of characterized GH51 enzymes and their respective paralogues (Fig. 8). This phylogenetic reconstruction shows two major clades (clades I and II) that reflect a gene duplication that occurred early in evolution, as indicated by the presence of genes from the two clades in Thermotoga petrophila, a species from a deep phylogenetic branch in the tree of life [30]. Members of clade I are abundant in bacteria, whereas those of clade II are found mainly in plants and fungi.
The division into two major clades reflects two main types of modular architecture. In clade I, most enzymes display the (β/α)₈ barrel + β-sandwich composition, but, in clade II, the proteins have an extra N-terminal domain which resembles carbohydrate-binding modules (CBM) from families 4, 6 or 11 (Fig. 8). Interestingly, enzymes with β-1,4-glucanase activity, found only in specific bacteria from the Fibrobacter and Alicyclobacillus genera (clade Ib), have peculiar and diverse domain arrangements, indicating they emerged from gene duplication and recombination events. In these enzymes, the (β/α)₈ barrel is usually fused to one or more copies of putative cellulose-binding modules (CBM 3, 11 and 30). Moreover, unconventional domains (Gp9-like and cupredoxin-like) are detected in two endoglucanases from Alicyclobacillus sp.
To date, the only structures available for the GH51 family comprise Abfs from clade Ia with the (β/α)₈ barrel + β-sandwich composition. Except for XacAbf51, which is a trimer, the other structures reported so far are hexamers, indicating that the molecular diversity of GH51 enzymes includes changes in quaternary structure besides modular rearrangements. The capacity to cleave α-1,2 and α-1,3 Araf decorations in arabinoxylan and/or arabinan as well as α-1,5 bonds in arabinan is observed in Abfs from both clades I and II, evidencing the structural plasticity of the GH51 active site [13,20,31-37].
Fig. 5 Crystallographic structure of XacAbf51 reveals a typical fold of GH51 arabinofuranosidases and a disulfide bridge at the − 1 subsite conserved in TxAbfD3, but divergent in other structurally characterized GH51 enzymes. a Scheme of XacAbf51 domain architecture (top) and cartoon representation of the 3D structure (bottom) highlighting the distance (3.7 Å) between the catalytic residues (sticks) compatible with the retaining mechanism of hydrolysis found in the GH51 family. b Magnified view of the − 1 subsite (ball and sticks, light gray C atoms) in which a glycerol molecule (yellow C atoms) is bound mimicking part of the arabinose scaffold observed in the crystallographic structure of the TmAbf51-arabinose complex (pink C atoms). c Structure-based sequence alignment of the − 1 subsite (boxed residues) from the GH51 enzymes of known structure. Dark violet represents identical residues, light violet semi-conserved and yellow highlights the cysteine residues that form a disulfide bridge only in the XacAbf51 and TxAbfD3 enzymes of the presented comparison. Tm: Thermotoga maritima; Tp: Thermotoga petrophila; Bl: Bifidobacterium longum; Rt: Ruminiclostridium thermocellum; Gs: Geobacillus stearothermophilus

Discussion
This study reports the first Michaelis-Menten kinetic parameters for the cleavage of internal Araf di-substitutions by a GH51 Abf and provides the structural basis for this activity. Cleavage of terminal di-substitutions in AXOS has been reported for some GH51 enzymes, but internal di-substitutions have been described as poor or non-cleavable substrates [17-20]. Our data reveal a novel GH51 enzyme that releases both Araf residues from internal di-substitutions with a catalytic constant of ~ 10 s −1 . Although our data do not resolve the XacAbf51 preference between O2 or O3 linkages, they reveal that the first cleavage of a di-substitution is the rate-limiting step of the reaction catalyzed by XacAbf51, leading to a tenfold lower kcat/Km for the di-substituted compared to the O3-mono-substituted substrate.
Fig. 6 A cavity adjacent to the − 1 subsite accommodates the second decoration of di-substituted AXOS. a Structural superposition of the XacAbf51 structure (violet surface) with the TxAbfD3 structure in complex with 3²-α-l-arabinofuranosyl-xylotriose (XA³X; blue C atoms). Subsites are labeled according to the nomenclature used by McKee and coworkers [14]. NR non-reducing end, R reducing end. b Comparison between the XacAbf51 crystal structure and the modeled XacAbf51-XA²⁺³X complex after 100 ns of molecular dynamics simulation. According to this simulation, the xylan backbone bends at the β-1,4 linkage involving the reducing end of the substrate to better accommodate the di-substitution in the cavity adjacent to the − 1 subsite of XacAbf51. c Selected inter-atomic distances between enzyme and substrate indicate favorable interactions over the simulation. Colored circles refer to the selected substrate atoms highlighted in b (open circles). d Structural comparison of XacAbf51 and TxAbfD3 crystal structures highlighting the divergent loops β5-α5 and β6-α6 that delineate the +1 and +2R subsites. The substrate XA³X bound to TxAbfD3, as well as the different positioning of W254 compared to W248 and the side chains of the SDD and NTA motifs, are shown in sticks and color-coded according to the respective structure. Note the hydrogen bond between N216 and the O2 of the +2R Xylp residue that is absent in XacAbf51
Fig. 7 XacAbf51 is a trimeric enzyme. a Comparison of the TxAbfD3 hexamer with the molecules found in the asymmetric unit of the XacAbf51 crystal (cartoon with transparent surface). The schemes highlight that in the XacAbf51 crystal structure the ABC and DEF trimers interact with each other in a different way compared to the trimer-trimer interface of the TxAbfD3 hexamer. b SAXS curve (open circles) agrees with the theoretical profile of the XacAbf51 trimer calculated from the crystal structure using CRYSOL. The inset shows the pair-distance distribution function computed from the experimental data and used to generate the low-resolution envelope (white surface) fitted to the crystallographic trimer (cartoon). c, d AUC data show that XacAbf51 assumes a trimer arrangement in a wide range of protein concentrations. e Summary of size and mass parameters estimated using four biophysical techniques demonstrates that the quaternary structure of XacAbf51 is a trimer
Fig. 8 Molecular phylogenetic analysis of the GH51 family. Phylogenetic tree (unrooted) based on a multiple sequence alignment of the (β/α)₈ barrel of characterized GH51 enzymes present in the CAZy database [7] and the respective paralogues. The evolutionary history was inferred using the maximum likelihood method implemented in the MEGA7 software [65,67]. The tree with the highest log likelihood (− 23,649.75) is shown and the percentage of trees in which the associated taxa clustered together is shown next to the branches (except for those with values below 50%). Branch lengths represent the number of substitutions per site. The right panel shows the domain architecture predicted for each sequence using the webserver SUPERFAMILY [63]. Proteins with known 3D structure are highlighted with purple (XacAbf51) or gray boxes. Paralogous sequences from T. petrophila are shown in bold
For almost all GH51 enzymes characterized so far, kinetic parameters have only been assessed using synthetic substrates (pNP derivatives), probably because of the high cost and limited availability of AXOS, allied to the low response stability and time-consuming nature of HPAEC-PAD analyses [38]. To overcome such bottlenecks, we used mass spectrometry to monitor the enzymatic hydrolysis of mono(di)-substituted arabinoxylotetraoses. This is a fast, direct and highly sensitive approach that requires minimum amounts of substrate (in this study, we acquired each data point in 1 min and used less than 10 mg of substrate for a complete enzyme characterization). Thus, we envisage mass spectrometry as a useful, fast and precise alternative, not only for future studies of GH51 enzymes, but also to assess Michaelis-Menten kinetics of oligosaccharide hydrolysis by other GHs, as previously reported for xylanases [39].
The positive effect of XacAbf51 on the saccharification of delignified sugarcane bagasse may be useful for the development of enzyme cocktails optimized for this biomass. Supplementation of fungal cellulase mixtures with hemicellulases and auxiliary enzymes, including a GH51 Abf, has already been shown to increase the conversion of AFEX-pretreated corn stover into monosaccharides [40]. Here we show that this approach is also valuable to increase the hydrolysis yield of pretreated sugarcane bagasse. The cellulolytic fungus T. reesei displays three Abfs (GH43, GH62 and GH54), but is devoid of GH51 enzymes [41]. Thus, our data support that the capacity of XacAbf51 to release terminal and internal di-substitutions of AXOS might improve the performance of widely used cellulolytic enzyme cocktails on arabinoxylan-rich biomasses.
Our structural data, compared to those of GH62 and GH43 Abfs (EC 3.2.1.55), contribute to a better understanding of the molecular determinants of distinct substrate specificities in Abfs. GH62 enzymes specialized in mono-substitutions display a single arabinose-binding pocket in the middle of a long cleft where the xylan backbone binds (Fig. 9a). As proposed by Maehara and coworkers, the pseudosymmetry of the xylan backbone and the active site topology of Araf62A likely allow arabinoxylan to bind into the cleft in two opposite directions to allocate, respectively, the O3- and O2-linked mono-substitutions at the − 1 subsite [42]. In contrast, in the GH43 enzyme HiAXH-d3, which is specific for O3-linked Araf from di-substitutions, an auxiliary pocket accommodates the second Araf decoration and solvent-mediated hydrogen bonds (involving Trp526 and the ring oxygen of +2R Xylp) select a single orientation of the xylan backbone, such that the catalytic pocket is always occupied by the O3-Araf moiety (Fig. 9b) [14]. Similar to HiAXH-d3, XacAbf51 also displays an auxiliary pocket to accommodate the second substitution of di-substituted substrates (Fig. 9c). However, the residue Trp254 (equivalent to Trp526 of HiAXH-d3) makes a π-stacking interaction with +2R Xylp, which does not depend on the endocyclic oxygen, the only asymmetric feature of xylan. Thus, according to these analyses, it is plausible that the active site of XacAbf51 allows the bidirectional binding of arabinoxylan and AXOS to cleave O2- and O3-linked Araf from mono- or di-substitutions.
The positioning of Trp254 seems to play a role in di-substitution recognition. However, our mutational strategy to test this hypothesis (transplantation of the β6-α6 and/or β5-α5 loops from TxAbfD3 to XacAbf51) inactivated XacAbf51 instead of changing its substrate specificity, indicating an incompatibility that may require secondary mutations or the reverse transplant (from XacAbf51 to TxAbfD3) to attain the expected functional changes.
All GH51 proteins whose structure is currently available are bacterial enzymes from clade Ia (Fig. 8). The oligomeric state of only three of them has been validated in solution [TpAbf51 [43]; TxAbfD3 (AUC data not shown); and XacAbf51 (Fig. 7)] and served as a guide to map how the quaternary structure of GH51 enzymes evolved. The hexameric arrangement, which can be seen as a dimer of trimers, seems to have appeared early during the evolution of the GH51 family, being found in the Thermotoga genus, a deep lineage back to the early forms of bacteria [30,43]. The hexameric arrangement remained stable in other thermophilic bacteria, such as Ruminiclostridium thermocellum (jsPISA prediction [29]) and T. xylanilyticus [28], but, in the mesophilic X. axonopodis pv. citri, the dimer of trimers was disrupted, giving rise to a trimeric enzyme. Based on these data, we suggest that the ancient GH51 arabinofuranosidases from clade I formed hexamers (possibly to withstand extreme conditions of high temperature) and that colder environments favored the emergence of trimeric enzymes, at least during X. axonopodis pv. citri speciation, changing the paradigm that GH51 Abfs are exclusively hexameric. According to ConSurf analyses [44], the trimer interface, which is close to the active site, harbors residues more conserved than those assembling trimers into hexamers, indicating that the trimeric arrangement may be more crucial than the hexameric configuration for enzyme function.
Fig. 9 Molecular diversity of arabinoxylan-degrading mechanisms by Abfs. a The active site of Araf62A (GH62) is composed of a cleft that accommodates the xylan backbone and a − 1 subsite that binds specifically to mono-substitutions of Araf (O2- or O3-linked). The arabinose (green C atoms) and protein surface are from PDB 3WN0, while the xylan backbone (orange C atoms) is from PDB 3WN2 [42]. NR non-reducing end, R reducing end. b HiAXH-d3 from Humicola insolens (GH43) cleaves specifically the O3-linked Araf substitution from di-substituted Xylp units and displays an auxiliary pocket to accommodate the di-substitution. The residue W526 selects a single orientation for arabinoxylan binding into the active site via solvent-mediated interactions (dashed lines) with the endocyclic oxygen of +2R Xylp (PDB 3ZXK, [14]). c The active site of XacAbf51 also has an auxiliary pocket to accommodate di-substitutions, but the positioning of W254 seems to accept the binding of arabinoxylan in the direct and reverse direction to allow the cleavage of O3 and O2 substitutions, respectively, making this enzyme a generalist Abf

Conclusions
In summary, our study expands our knowledge about the diversity of GH51 Abfs in terms of tertiary and quaternary structure and provides the structural basis for the release of internal Araf di-substitutions by a generalist Abf that copes with all types of Araf decorations in arabinoxylan and arabinan. The rare mode of action of XacAbf51, along with full pH and temperature compatibility with current fungal enzyme cocktails, is very attractive for industrial applications, especially in technologies for the production of fermentable sugars using arabinoxylan-rich biomasses such as sugarcane, corn stover and grasses.
The kinetic assays were monitored on a Waters Synapt HDMS in V mode with ESI(+), with the spray voltage maintained at 3.0 kV and the source heated to 130 °C. A total of 15 µL of the quenched reactions and 2 µL of 1 mM xylotriose (used as the internal standard) were added to 183 µL of water and injected into the mass spectrometer in scan mode (m/z 300-900) with direct infusion at a flow rate of 50 µL min −1 . An internal standard with ionization similar to that of the analytes (xylotriose) was used to increase the reliability of the method [46]. A calibration curve was made to determine the concentrations of the products of the enzymatic reaction. The kinetic parameters of the reactions (kcat, Km and Vmax) were determined by non-linear regression analysis (Hill model) of the Michaelis-Menten plot using the software Origin 8.1.
Fermentations were performed using the BioFlo/CelliGen 115 system (Eppendorf, Hamburg, Germany) and water-jacketed 3.0-L vessels. The fermentation medium comprised 5% (m v −1 ) milled soybean hulls, 5% (v v −1 ) milk whey, 2% (m v −1 ) (NH₄)₂SO₄ and 1 mL L −1 of J647 antifoam (Struktol, Hamburg, Germany) in the batch phase, and milk whey with a lactose concentration of 177 g L −1 was fed from 72 to 170 h at an average rate of 0.5 g L −1 h −1 total sugar. Aeration was maintained at 1.0 VVM compressed air, pH between 3.8 and 4.8 using 2 M phosphoric acid and 10% ammonia, and DO above 30% with an agitation cascade (400-950 rpm). The initial volume was 1 L, and the reactors were inoculated with a 1:10 volume of a 7-day-old shake flask preculture using the same media composition as the fermentation batch medium; the spore concentration in the inoculum bottle was 2.5 × 10⁷ in 100 mL. Samples were withdrawn every 24 h, centrifuged at 21,000×g for 10 min and the supernatants stored at − 20 °C for analysis. Whole broth samples were adjusted to pH 5.0, frozen at − 20 °C and used for hydrolysis assays. Fermentations were terminated after 170 h when the feeding was stopped.
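The regression step can be illustrated with a minimal sketch. It is not the Origin 8.1 procedure used in this work: it assumes the plain Michaelis-Menten equation (a Hill model with the coefficient fixed at 1), uses a golden-section search over Km that assumes a single minimum inside the search bracket, and runs on synthetic, noise-free rates generated from hypothetical parameters (Vmax = 10, Km = 2, arbitrary units). All function and variable names are illustrative.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten rate (Hill model with the coefficient fixed at 1)."""
    return vmax * s / (km + s)


def best_vmax(km, s_values, v_values):
    """For a fixed Km the model is linear in Vmax, so Vmax has a closed form."""
    x = [s / (km + s) for s in s_values]
    return sum(xi * vi for xi, vi in zip(x, v_values)) / sum(xi * xi for xi in x)


def sse(km, s_values, v_values):
    """Sum of squared residuals at a given Km, with Vmax already optimized."""
    vmax = best_vmax(km, s_values, v_values)
    return sum((v - mm_rate(s, vmax, km)) ** 2 for s, v in zip(s_values, v_values))


def fit_michaelis_menten(s_values, v_values, iterations=200):
    """Golden-section search over Km (assumes one minimum inside the bracket)."""
    a, b = 1e-6, 10.0 * max(s_values)  # hypothetical search bracket for Km
    phi = (5 ** 0.5 - 1) / 2
    for _ in range(iterations):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if sse(c, s_values, v_values) < sse(d, s_values, v_values):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    km = 0.5 * (a + b)
    return best_vmax(km, s_values, v_values), km


# Synthetic data only: substrate concentrations and rates in arbitrary units.
substrate = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
rates = [mm_rate(s, 10.0, 2.0) for s in substrate]
vmax_fit, km_fit = fit_michaelis_menten(substrate, rates)
print(vmax_fit, km_fit)
```

Dividing the fitted Vmax by the molar enzyme concentration would then give kcat; with real, noisy data a dedicated non-linear least-squares routine that also reports parameter uncertainties is preferable.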
For quantifying protein, the sample was first diluted to a final concentration of 0.3-1.5 g L −1 in 50 mM Na citrate buffer, pH 5.0. A 200 μL sample was combined with 800 μL ice-cold acetone, mixed by inverting the tube several times and then maintained at − 20 °C for 1 h. The precipitated proteins were pelleted by centrifugation at 14,000×g and 4 °C for 5 min. The supernatant was removed and the pellet was air-dried for 5 min before resuspending in the original volume (200 μL) of buffer. The protein concentration was then quantified using the DC protein kit (BioRad, Hercules, CA) based on the method of Lowry [48] using bovine serum albumin as standard.

Complementation assays
Delignified sugarcane bagasse was prepared using an alkaline pretreatment (130 °C, 30 min, 1.5% m v −1 NaOH), yielding a material composed of 58.6% cellulose, 22.1% hemicellulose, and 8.8% lignin. Enzymatic hydrolysis reactions were performed in samples of 1 mL containing 5% of dry biomass (50 mg) and 237.5 µg of enzyme cocktail, supplemented or not with 12.5 µg XacAbf51, in 50 mM sodium citrate buffer, pH 5.5, with 0.02% sodium azide. The reactions were done in triplicate and incubated in a hybridization oven at 50 °C with agitation for 24 h. The enzyme cocktails used were Celluclast (Novozymes, Krogshoejvej, Denmark) and the whole broth from T. reesei RUT-C30, prepared as described above. Protein concentration was estimated by the Lowry method [48] using the DC protein kit (BioRad, Hercules, CA).
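The protein loadings implied by these amounts can be checked with a short calculation. The sketch below only restates the arithmetic of the quantities given above (237.5 µg cocktail, 12.5 µg XacAbf51, 50 mg dry biomass); the function name and the dictionary keys are illustrative.

```python
def enzyme_loading(cocktail_ug, abf_ug, biomass_mg):
    """Summarize protein loading for a supplemented hydrolysis reaction."""
    total_ug = cocktail_ug + abf_ug
    return {
        "total_protein_ug": total_ug,
        # Share of the total protein contributed by the added Abf.
        "abf_percent_of_total": 100.0 * abf_ug / total_ug,
        # µg protein per mg biomass is numerically equal to mg per g.
        "cocktail_mg_per_g_biomass": cocktail_ug / biomass_mg,
        "total_mg_per_g_biomass": total_ug / biomass_mg,
    }


loading = enzyme_loading(cocktail_ug=237.5, abf_ug=12.5, biomass_mg=50.0)
for key, value in loading.items():
    print(key, value)
```

With these inputs the supplemented reactions contain 250 µg of total protein, of which XacAbf51 accounts for 5%, corresponding to a total loading of 5 mg protein per g of dry biomass.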

Circular dichroism
CD spectra were acquired on a JASCO J-815 CD spectrometer (Jasco, Tokyo, Japan) controlled by a CDF-426S/15 Peltier temperature control system using a quartz cuvette with a 1-cm path length. The enzyme was prepared in phosphate buffer (20 mM sodium phosphate, 150 mM NaCl, pH 7.5) at a final concentration of 8 µM. All spectra were obtained at 20 °C in the range 195-260 nm with a bandwidth of 2 nm and a response time of 4 s nm −1 . CD spectra were buffer subtracted and normalized to mean residue ellipticity. Thermal unfolding experiments were monitored at 220 nm in the temperature range 20-90 °C with a scan rate of 1 °C min −1 . The melting temperature was determined by fitting a Boltzmann sigmoidal function to the CD denaturation curve.
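As an illustration of how a melting temperature can be extracted from a thermal denaturation curve, the sketch below fits a Boltzmann sigmoid to synthetic data. It is not the fitting routine used in this work, which is not specified. The grid search stands in for the nonlinear least-squares fit that curve-fitting software would normally perform, and all numerical values (baselines, T m of 62 °C, transition width) are hypothetical.

```python
import math


def boltzmann(temp, y_folded, y_unfolded, tm, width):
    """Boltzmann sigmoid: folded baseline at low T, unfolded baseline at high T."""
    return y_unfolded + (y_folded - y_unfolded) / (1.0 + math.exp((temp - tm) / width))


def fit_boltzmann(temps, signal, tm_grid, width_grid):
    """Grid search over (tm, width); the two baselines are solved by linear least squares."""
    best = None
    for tm in tm_grid:
        for width in width_grid:
            f = [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]  # folded fraction
            g = [1.0 - fi for fi in f]                                      # unfolded fraction
            # Normal equations for the model y = y_folded * f + y_unfolded * g
            sff = sum(a * a for a in f)
            sgg = sum(b * b for b in g)
            sfg = sum(a * b for a, b in zip(f, g))
            sfy = sum(a * y for a, y in zip(f, signal))
            sgy = sum(b * y for b, y in zip(g, signal))
            det = sff * sgg - sfg * sfg
            if abs(det) < 1e-12:
                continue  # baselines not identifiable for this (tm, width)
            y_f = (sfy * sgg - sgy * sfg) / det
            y_u = (sgy * sff - sfy * sfg) / det
            sse = sum((y - (y_f * a + y_u * b)) ** 2 for a, b, y in zip(f, g, signal))
            if best is None or sse < best[0]:
                best = (sse, y_f, y_u, tm, width)
    return best[1:]


# Synthetic (hypothetical) denaturation curve sampled every 1 degree C from 20 to 90.
temps = list(range(20, 91))
signal = [boltzmann(t, -12000.0, -3000.0, 62.0, 2.5) for t in temps]

tm_grid = [40.0 + 0.5 * i for i in range(81)]      # 40-80 degrees C
width_grid = [0.5 + 0.25 * i for i in range(23)]   # 0.5-6.0 degrees C
y_f_fit, y_u_fit, tm_fit, width_fit = fit_boltzmann(temps, signal, tm_grid, width_grid)
print(f"fitted Tm = {tm_fit:.1f} C, width = {width_fit:.2f} C")
```

The midpoint parameter of the sigmoid is reported as the melting temperature. Because the model is linear in the two baselines, only the midpoint and width need to be searched.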

Differential scanning calorimetry
Thermal stability was also analyzed by DSC using a VP-DSC device (Microcal, GE Healthcare, Northampton, MA). The enzyme was prepared in phosphate buffer (20 mM sodium phosphate, 150 mM NaCl, pH 7.5) at a final concentration of 2 mg mL −1 . A heating rate of 1 °C min −1 was used and the reversibility of protein denaturation was tested. Denaturation curves were buffer subtracted and concentration normalized, and the resultant endotherms were integrated following assignment of pre- and post-transition baselines.
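The integration step can be sketched as follows. This is a simplified illustration, not the instrument software's procedure: it assumes a single straight baseline drawn between one pre-transition and one post-transition point, and the heat capacity trace is synthetic.

```python
def trapezoid(x, y):
    """Trapezoidal integral of y over x."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0 for i in range(len(x) - 1))


def subtract_linear_baseline(temps, cp, i_pre, i_post):
    """Subtract the straight line joining the pre- and post-transition points."""
    slope = (cp[i_post] - cp[i_pre]) / (temps[i_post] - temps[i_pre])
    return [c - (cp[i_pre] + slope * (t - temps[i_pre])) for t, c in zip(temps, cp)]


def endotherm_area(temps, cp, i_pre, i_post):
    """Area of the excess heat capacity peak between the two baseline anchor points."""
    excess = subtract_linear_baseline(temps, cp, i_pre, i_post)
    return trapezoid(temps[i_pre:i_post + 1], excess[i_pre:i_post + 1])


# Hypothetical trace: sloping baseline plus a triangular peak centred at 55 degrees C.
temps = [float(t) for t in range(50, 61)]
baseline = [2.0 + 0.1 * (t - 50.0) for t in temps]
peak = [max(0.0, 5.0 - 2.5 * abs(t - 55.0)) for t in temps]
cp = [b + p for b, p in zip(baseline, peak)]

area = endotherm_area(temps, cp, i_pre=2, i_post=8)
print(f"integrated endotherm area = {area:.3f} (units of Cp x degrees C)")
```

If the heat capacity is expressed per mole of protein, the integrated area corresponds to the calorimetric enthalpy of unfolding.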

Dynamic light scattering
The size distribution of the purified enzyme in solution was evaluated using DLS. Measurements were acquired at 20 °C on a Malvern Zetasizer Nano ZS 90 (Model no. ZEN3690, Malvern, Worcestershire, UK) with a 633-nm laser, in a quartz cell with a scattering angle of 90°. The protein was analyzed at a concentration of 0.5 mg mL −1 in phosphate buffer (20 mM sodium phosphate, 150 mM NaCl, pH 7.5). An average of 20 runs was used to estimate the hydrodynamic radius (R h ) through the Stokes-Einstein equation.
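For reference, the Stokes-Einstein relation converts a translational diffusion coefficient D into a hydrodynamic radius, R h = k B T / (6πηD). The sketch below is a minimal illustration: the diffusion coefficient is a hypothetical value, and the viscosity is that of pure water at 20 °C, taken here as an approximation for the buffer.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def hydrodynamic_radius_nm(diffusion_m2_s, temp_k=293.15, viscosity_pa_s=1.002e-3):
    """Stokes-Einstein: R_h = k_B * T / (6 * pi * eta * D), returned in nm.

    The default viscosity is that of water at 20 C (an assumption; the
    buffer used in the measurements is slightly more viscous).
    """
    r_m = K_B * temp_k / (6.0 * math.pi * viscosity_pa_s * diffusion_m2_s)
    return r_m * 1e9


d_example = 4.0e-11  # m^2/s, hypothetical diffusion coefficient
r_h = hydrodynamic_radius_nm(d_example)
print(f"R_h = {r_h:.2f} nm for D = {d_example:.1e} m^2/s")
```

Since R h is inversely proportional to D, a particle that diffuses half as fast has twice the hydrodynamic radius under the same conditions.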

Analytical ultracentrifugation
Sedimentation velocity experiments were performed on a Beckman Optima XL-A analytical ultracentrifuge (Beckman Coulter, Indianapolis, IN) at 20 °C. Spectra were collected at both 220 and 280 nm. The protein was prepared at different concentrations ranging from 0.2 to 0.9 mg mL −1 in phosphate buffer (20 mM sodium phosphate, 150 mM NaCl, pH 7.5). AUC data were analyzed using the continuous sedimentation distribution method in the SEDFIT program [49]. The s 0 20,w value at infinite dilution was calculated by linear regression of s 20,w as a function of protein concentration.
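The extrapolation to infinite dilution amounts to taking the intercept of a straight-line fit at zero concentration. A minimal sketch with ordinary least squares is shown below; the sedimentation coefficients are hypothetical placeholders, not measured values.

```python
def linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical s20,w values (in Svedberg) at four protein concentrations (mg/mL).
conc = [0.2, 0.4, 0.6, 0.9]
s20w = [8.95, 8.88, 8.83, 8.72]

s0_20w, slope = linear_fit(conc, s20w)
print(f"s0(20,w) at infinite dilution = {s0_20w:.2f} S (slope {slope:.3f} S per mg/mL)")
```

The intercept is the value reported as s 0 20,w. The slope describes the concentration dependence of sedimentation and is typically negative for non-associating particles.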

Small angle X-ray scattering
Small angle X-ray scattering measurements were performed at three different concentrations (2, 4 and 6 mg mL −1 ) in 20 mM Tris buffer, pH 7.5. Data were collected at the SAXS2 beamline (LNLS, Campinas, Brazil), integrated using Fit2D [50] and analyzed using GNOM [51]. The molecular envelope was calculated from the experimental SAXS data using the program DAMMIN [52]. Ten runs of ab initio shape determination yielded highly similar models (normalized spatial discrepancy values < 1), which were then averaged using the package DAMAVER [53]. The theoretical scattering curves of crystallographic structures were calculated and compared with the experimental SAXS curves using the program CRYSOL [54]. The crystallographic structure was fitted into the SAXS molecular envelope using the program SUPCOMB [55].

Protein crystallization, X-ray data collection and structure determination
XacAbf51 (27 mg mL −1 ) was crystallized by the vapor diffusion method in sitting drops containing 17% (w v −1 ) polyethylene glycol 3350 and 0.2 M ammonium chloride. Crystals were cryoprotected using the reservoir solution supplemented with 20% (v v −1 ) glycerol. Diffraction data were collected at the BL12-2 beamline of the Stanford Synchrotron Radiation Lightsource (Stanford, CA). Data were processed using XDS [56] and the structure was solved by the molecular replacement method using the program MOLREP and the atomic coordinates of TxAbfD3 (PDB ID: 2VRQ) as the search model. Six chains were found in the asymmetric unit, and the model was refined against electron density using COOT [57] and against X-ray data using phenix.refine [58] and REFMAC [59]. The final model was validated using MolProbity [60]. Data collection, processing and refinement statistics are summarized in Table 2.

Molecular dynamics simulations
Superimposition of the XacAbf51 crystal structure with that of TxAbfD3 in complex with 3 2 -α-l-arabinofuranosyl-xylotriose (XA 3 X) was performed with PDBeFOLD [61], and the coordinates of the ligand positioned in the XacAbf51 active site were transferred to the PDB file containing one trimer of XacAbf51. An Araf substitution was added to XA 3 X to generate XA 2+3 X, and simulation systems using explicit solvent were created for energy-minimized trimeric structures of XacAbf51 in complex with XA 2+3 X. Energy minimization and MD simulations were carried out using the YAMBER3 force field with the program YASARA [62]. Long-range Coulomb interactions were included with a cutoff of 7.86 Å. The simulation box was defined at 15 Å around all atoms of the structure. Protonation was performed at pH 7. Cell neutralization was achieved by filling the box with water molecules (d = 0.997 g mL −1 ) and Na/Cl counter ions (0.9% m v −1 ), coupled with a short MD simulation for solvent relaxation. MD simulations were performed for 100 ns at 298 K, using a multiple time step of 2.0 fs for intermolecular forces and 1.2 fs for intramolecular forces, periodic boundary conditions and unconstrained bonds and angles. Root mean square deviations (RMSDs) were calculated for the whole system. Euclidean distances between enzyme and substrate atoms were measured along the trajectory in the three active sites of the trimer, and the average value is presented as a function of simulation time.
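The two trajectory metrics described above can be sketched in a few lines. This is an illustration of the quantities only, not the YASARA analysis itself: the coordinates are toy values, the RMSD function assumes the two frames are already superposed with matching atom order, and the function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered, already superposed coordinate sets."""
    n = len(coords_a)
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b)) / n)


def mean_site_distance(enzyme_atoms, substrate_atoms):
    """Average enzyme-substrate Euclidean distance over the active sites.

    Each list holds one atom position per active site (three for a trimer).
    """
    distances = [math.dist(e, s) for e, s in zip(enzyme_atoms, substrate_atoms)]
    return sum(distances) / len(distances)


# Toy frame: one enzyme atom and one substrate atom per active site (coordinates in angstrom).
enzyme_atoms = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
substrate_atoms = [(3.0, 0.0, 0.0), (10.0, 3.0, 0.0), (0.0, 10.0, 3.3)]

avg_distance = mean_site_distance(enzyme_atoms, substrate_atoms)
print(f"mean distance over three sites = {avg_distance:.2f} A")

# RMSD of a frame against a reference displaced rigidly by (3, 4, 0).
reference = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
shifted = [(3.0, 4.0, 0.0), (4.0, 5.0, 1.0)]
print(f"RMSD = {rmsd(reference, shifted):.2f} A")
```

Applying `mean_site_distance` to every saved frame yields the averaged distance trace plotted against simulation time.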

Phylogenetic analyses
The sequences of characterized GH51 enzymes present in the CAZY database, excluding redundant sequences (sequences from the same species with > 95% sequence identity) and synthetic constructs, were manually edited to include only the fragment corresponding to the (β/α) 8 barrel, as predicted by the webserver SUPERFAMILY [63]. The edited sequences were aligned using the software MUSCLE, available at the EMBL-EBI webserver (https://www.ebi.ac.uk/Tools/msa/muscle/) [64]. The multiple sequence alignment was provided to the MEGA7 software to perform evolutionary analyses [65]. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Joining and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with superior log likelihood value. A discrete Gamma distribution was used to model evolutionary rate differences among sites [five categories (+G, parameter = 1.9905)]. The rate variation model allowed for some sites to be evolutionarily invariable ([+I], 1.20% sites). The analysis involved 72 amino acid sequences. All positions with less than 80% site coverage were eliminated; that is, fewer than 20% alignment gaps, missing data, and ambiguous bases were allowed at any position. There were a total of 292 positions in the final dataset. The confidence of the tree topology was assessed by bootstrap analysis based on 1000 replications [66].
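The site coverage criterion can be illustrated with a small filter over alignment columns. This sketch is not MEGA7's implementation. It assumes that gaps (`-`), missing data (`?`) and the ambiguous residue code `X` count as uncovered, and that columns with exactly 80% coverage are retained; the toy alignment is hypothetical.

```python
def filter_by_site_coverage(alignment, min_coverage=0.80, uncovered="-?X"):
    """Keep only alignment columns whose fraction of informative residues
    is at least min_coverage. Sequences must all have the same length."""
    n_seq = len(alignment)
    length = len(alignment[0])
    kept_columns = []
    for j in range(length):
        covered = sum(1 for seq in alignment if seq[j] not in uncovered)
        if covered / n_seq >= min_coverage:
            kept_columns.append(j)
    return ["".join(seq[j] for j in kept_columns) for seq in alignment]


# Toy alignment of five sequences and four columns.
toy_alignment = [
    "MKLA",
    "MK-A",   # one gap in column 3 (80% coverage, kept)
    "M-LA",   # column 2 has two gaps overall (60% coverage, dropped)
    "M-LA",
    "MKLX",   # one ambiguous residue in column 4 (80% coverage, kept)
]

filtered = filter_by_site_coverage(toy_alignment)
for seq in filtered:
    print(seq)
```

Only the surviving columns enter the tree inference, which is why the final dataset is shorter than the raw alignment.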