
Biochemical and structural characterisation of a family GH5 cellulase from endosymbiont of shipworm P. megotara



Cellulases play a key role in the enzymatic conversion of plant cell-wall polysaccharides into simple and economically relevant sugars. Thus, the discovery of novel cellulases from exotic biological niches is of great interest as they may present properties that are valuable in the biorefining of lignocellulosic biomass.


We have characterized a glycoside hydrolase 5 (GH5) domain of a bi-catalytic GH5-GH6 multi-domain enzyme from the unusual gill endosymbiont Teredinibacter waterburyi of the wood-digesting shipworm Psiloteredo megotara. The catalytic GH5 domain was cloned and recombinantly produced with or without a C-terminal family 10 carbohydrate-binding module (CBM). Both variants showed hydrolytic endo-activity on soluble substrates such as β-glucan, carboxymethylcellulose and konjac glucomannan. However, low activity was observed towards the crystalline form of cellulose. Interestingly, when co-incubated with a cellulose-active LPMO, a clear synergy was observed that boosted the overall hydrolysis of crystalline cellulose. The crystal structure of the GH5 catalytic domain was solved to 1.0 Å resolution and revealed a substrate binding cleft extension containing a putative +3 subsite, which is uncommon in this enzyme family. The enzyme was active over a wide range of pH values and temperatures and showed high tolerance for NaCl.


This study provides significant knowledge on the discovery of new enzymes from shipworm gill endosymbionts and sheds new light on the biochemical and structural characterization of cellulases. The study demonstrated a boost in the hydrolytic activity of the cellulase on crystalline cellulose when co-incubated with a cellulose-active LPMO. These findings will be relevant for the development of future enzyme cocktails that may be useful for the biotechnological conversion of lignocellulose.


The desire to reduce the consumption of fossil fuels has sparked tremendous interest in searching for alternative renewable energy sources, including lignocellulosic biomass. Plant-based lignocellulosic biomass is an abundant polymeric material that may be exploited for renewable energy production. Lignocellulose consists of polysaccharides, including cellulose, hemicelluloses, and pectin, as well as lignin, a complex aromatic polymer. Cellulose is the most important structural component in plant biomass, with an estimated global production of more than 1.5 × 10¹² tons per year [1]. Cellulose is a homopolymer of glucose, linked by β-1,4-glycosidic bonds [2, 3]. Cellulose chains associate into an insoluble, often crystalline fiber structure, which makes the material structurally strong and challenging to biodegrade. Intermolecular interactions with the other polymeric cell wall compounds add strength and recalcitrance. Consequently, although cellulose is an attractive source material for green energy, its exploitation is difficult due to its resistance to enzymatic depolymerization [4]. Indeed, efficient conversion of lignocellulosic biomass, or even relatively pure cellulose fibers, requires multiple enzymes acting synergistically to deconstruct the feedstock and generate monomeric sugars that can be converted to fuels and chemicals [5].

Enzymatic cellulose depolymerization is mainly catalysed by cellulases and lytic polysaccharide monooxygenases (LPMOs) acting in concert [5]. The cellulases include endo- and exo-acting glycosyl hydrolases (GHs) that are thought to act synergistically because, for instance, endo-β-1,4-glucanases hydrolyze internal glycosidic bonds to generate new chain ends on which exo-β-1,4-glucanases, also known as cellobiohydrolases, can act. Despite decades of research, there is still a need for novel and efficient cellulases that may help improve the sustainability and economy of biorefining processes. Until now, most cellulose-active enzymes have been isolated and characterised from wood-decaying fungi and soil bacteria [6,7,8,9]. Cellulose-degrading higher organisms may use symbiotic microbes as a source of enzymes for biomass degradation, but such systems, for example the marine shipworm, an eminent lignocellulose degrader [10,11,12], remain largely unexplored. In the present study, we analysed marine wood-digesting bivalve molluscs called shipworms, which feed on submerged wood in the ocean.

The shipworms are marine molluscs of the order Myida and the family Teredinidae (also called “termites of the sea”). They are wood-boring bivalves found throughout the world’s oceans [12]. They are notorious for boring into wooden structures immersed in seawater, where they settle on and excavate into wood as larvae that eventually grow to become elongated worms [13]. Shipworms are unique because only a few organisms have evolved the ability to feed on woody biomass as the sole nutrient source [7]. The majority of shipworm species possess a simple digestive system with a large cecum and a short intestine. It was previously suggested that shipworms harbour a few microbes in the cecum of their digestive system that help in wood digestion [14]. However, it was later reported that endosymbiotic bacteria residing in a specialized region of the gill tissue fix atmospheric nitrogen [15] and also produce a variety of carbohydrate-active enzymes (CAZymes) that function in lignocellulose digestion in the cecum of shipworms [16]. In addition, shipworms produce several endogenous CAZymes secreted by a specialized digestive gland that finally accumulate in the cecum for lignocellulose digestion [10, 11]. Because of these unique features, shipworms are an attractive target for the discovery of new CAZymes for depolymerization of lignocellulose.

Genome analysis of endosymbiotic bacteria combined with cecum proteome analysis of shipworms revealed that the digestive system contains cellulases of endosymbiont origin, classified into several glycosyl hydrolase (GH) families with different substrate specificities [16,17,18]. For instance, GH5, which is a large protein family, contains not only endo-glucanases but also β-mannanases, exo-1,3-glucanases, endo-1,6-glucanases, xylanases, endo-glycoceramidases and xanthanase [19]. In the quest for novel cellulases, we have identified, cloned, expressed, and functionally and structurally characterized a cellulase belonging to glycoside hydrolase family 5 (GH5). This protein is essentially identical to a putative GH5 cellulase from the shipworm endosymbiont Teredinibacter waterburyi, and was thus named TwCel5. The enzyme is part of a large multi-domain cellulase comprising an N-terminal GH5 domain, followed by three CBM10 domains separated by serine-rich linker regions and a C-terminal GH6 domain. In this study, we have produced and functionally compared two variants of the GH5 enzyme: the catalytic domain only (TwCel5CAT) and the catalytic domain connected to a C-terminal CBM10 (TwCel5CBM). In addition to the functional characterization of the two variants, we also explored the potential synergistic effect between the GH5 cellulase and CelS2, a cellulose-active bacterial LPMO [20], which boosted depolymerisation activity on crystalline cellulose. The results showed that TwCel5 is an endo-β-1,4-glucanase with catalytic properties that render it a potentially attractive industrial biocatalyst for cellulose bioconversion.

Results and discussion

Sequence analysis and structural modelling

The 3312-nucleotide sequence (GenBank; OP793796) encodes a multi-domain protein (WAK85940.1), TwCel5-6, consisting of 1103 amino acid residues and possessing a putative N-terminal signal peptide for protein secretion. Analysis of the deduced amino acid sequence using Interpro classified the TwCel5-6 protein into a putative N-terminal glycosyl hydrolase family 5 (GH5) domain (amino acids 15-322) and a C-terminal glycosyl hydrolase family 6 (GH6) domain (amino acids 690-1103), which are interspaced by a 368-amino-acid region encoding three cellulose-binding CBM10 modules (Fig. 1A). The five modules are connected by poly-serine linkers that are thought to be flexible, disorganized spacers [21]. A BLAST search using the deduced amino acid sequence of TwCel5-6 revealed that the closest match, with 99.49% sequence identity, is a hitherto uncharacterized glycosyl hydrolase (WP_223144885.1) from the shipworm gill endosymbiont Teredinibacter waterburyi [22]. Notably, the latter partial protein only comprises 773 amino acid residues, containing an N-terminal putative GH5 domain (amino acids 15-322) and an incomplete C-terminal GH6 domain (amino acids 688-773), which are interspaced by three cellulose-binding CBM10 modules. Alphafold2 structure prediction [23] of TwCel5-6 through Colabfold [24] showed, as predicted, an unstructured N-terminus (most likely the signal peptide) followed by five modules all connected by flexible linkers (Fig. 1B; see Additional File 1: Fig. S1 for prediction quality). Interestingly, Alphafold modelling predicts that the first two CBM10 modules contain a disulfide bridge that connects the end of the N-terminal linker with the start of the C-terminal linker (Fig. 1C).
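The sequence and domain lengths quoted above can be cross-checked with simple arithmetic. The following Python sketch is an illustration only and not part of the original analysis; the function names are hypothetical, and it assumes that the 3312-nucleotide sequence includes a single stop codon and that domain boundaries are inclusive residue positions.

```python
def coding_length_nt(n_residues: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode n_residues (three per codon).

    Assumption: one stop codon terminates the open reading frame.
    """
    codons = n_residues + (1 if include_stop else 0)
    return 3 * codons


def segment_length(first: int, last: int) -> int:
    """Number of residues in a segment with inclusive boundaries."""
    return last - first + 1


# 1103 residues plus one stop codon correspond to the reported gene length.
print(coding_length_nt(1103))

# Sizes of the predicted catalytic domains from the Interpro boundaries.
print("GH5:", segment_length(15, 322), "residues")
print("GH6:", segment_length(690, 1103), "residues")
```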

Fig. 1

Multi-modular architecture of TwCel5-6. A The full-length protein TwCel5-6 is composed of a signal peptide (SP), a glycosyl hydrolase family 5 (GH5) catalytic domain, three family 10 cellulose-binding modules (CBMs), and a glycosyl hydrolase family 6 (GH6) catalytic domain. B Alphafold2-predicted structure of TwCel5-6; the unstructured linkers, CBM10s and the GH5 and GH6 catalytic modules are colored grey, green, blue, and magenta, respectively. C Details of the first CBM10 module showing the disulphide bridge (black arrow) connecting the linkers that enter and exit the module

Protein expression, purification, and activity screening of TwCel5

Attempts to produce the full-length protein and several truncated variants failed (data not shown), except for the GH5 catalytic domain alone, TwCel5CAT, and the catalytic domain connected to one CBM10, TwCel5CBM, which were both expressed in soluble and active form. cDNA encoding the GH5 catalytic module of the full-length protein was cloned without the predicted signal peptide (Fig S1). Both proteins were produced with an affinity tag; TwCel5CAT contained a C-terminal 6xHis-tag and TwCel5CBM contained an N-terminal V5-6xHis-tag. The His-tagged enzymes were purified to homogeneity using affinity chromatography followed by size exclusion chromatography. SDS-PAGE analysis of the purified proteins showed single homogeneous protein bands migrating at ~35 kDa and ~45 kDa for TwCel5CAT and TwCel5CBM, respectively (Fig. 2A), which is in accordance with the calculated theoretical protein masses of 33.7 kDa and 44.9 kDa.

Fig. 2

A SDS-PAGE analysis of purified TwCel5. Lane 1, standard molecular weight markers (kDa); lane 2, TwCel5CBM; lane 3, TwCel5CAT. B Screening of hydrolytic activity on three cellulosic substrates at 0.5% (w/v). C Comparative progress curves for the degradation of 0.5% (w/v) PASC. Reactions were performed in 50 mM sodium phosphate buffer (pH 7.5) containing 0.5 M NaCl and 1 µM enzyme, incubated at 30 ºC. Hydrolytic activities were determined by measuring the release of reducing sugars using the DNSA assay

According to the CAZy database, the GH5 family contains a wide range of enzymes acting on diverse β-linked substrates, including cellulose. Initial activity screening experiments indicated that the two enzyme variants of TwCel5 were able to hydrolyse amorphous cellulose (PASC). However, only negligible hydrolytic activity was observed towards crystalline Avicel and Whatman® paper (Fig. 2B). The near absence of enzymatic activity toward crystalline cellulosic substrates indicates that the TwCel5 variants have a preference for amorphous cellulose such as PASC, similar to, e.g., the KG35 (GH5) endo-β-1,4-glucanase that was obtained from the black-goat rumen [25]. The two variants showed similar activities for all three substrates, which is somewhat surprising since one would expect the presence of a CBM10 in TwCel5CBM to be beneficial for activity on insoluble substrates, particularly at the low substrate concentrations used in this study [26,27,28]. The similarity in activity was confirmed by recording comparative progress curves for PASC (Fig. 2C). This result may be taken as an indication that the CBM10 has lower affinity for PASC than for crystalline cellulose, or that the function of this domain cannot be observed with this truncated variant of TwCel5-6.

Enzymatic activity on other polysaccharides

The substrate specificity of TwCel5 was assessed by measuring the hydrolytic activity on nine different soluble cellulosic and hemicellulosic substrates. Both variants showed the highest activity on mixed-linkage β-1,3/β-1,4 β-glucan (Fig. 3). Furthermore, the two enzyme variants also exhibited (lower) activity on CMC and konjac glucomannan, which contain β-1,4 glycosidic linkages. The enzyme without the CBM produced slightly higher amounts of reducing sugars. Both enzymes were inactive against hemicellulose substrates, including xylan, arabinogalactan, arabinan, gum arabic, xyloglucan, and lichenan (Fig. 3). It thus seems clear that the two enzyme variants prefer substrates with β-1,4-linked glucose monomers. This aligns well with previous findings for several GH5 enzymes that showed similar substrate specificities [29,30,31]. Of note, the enzyme showed no detectable activity toward ivory nut mannan, which consists of a pure mannose polymer linked via β-1,4 glycosidic linkages, or toward natural substrates like birch- and spruce-wood powder (data not shown). It is thus likely that β-1,4 glycosidic bonds involving glucose residues are the target of the TwCel5 enzyme when hydrolysing konjac glucomannan, which consists of β-1,4-linked mannose and glucose residues in a 60:40 ratio. These results suggest that enzymes such as TwCel5 from the bacterial gill symbiont of the shipworm P. megotara are likely to play an important role in lignocellulose digestion in the shipworm gut by hydrolysing β-glucans in the cecum to simple sugars like glucose, which can be used for energy metabolism and growth. This highlights a unique feature of the shipworm: wood digestion does not take place in the gills where the bacteria are located; instead, bacterial cellulases are transferred to the cecum, which enables the host shipworm to directly consume glucose and other sugars [11, 22].

Fig. 3

Hydrolysis of cellulosic and hemicellulosic substrates by TwCel5CAT and TwCel5CBM. Reactions were performed in 50 mM sodium phosphate buffer (pH 7.5) containing 0.5 M NaCl and 50 nM enzyme, incubated at 30 ºC for 1 h in triplicate. Reducing sugar equivalents were quantified using glucose as a standard. The results correspond to means and standard deviations of triplicates
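Quantifying reducing sugar equivalents against a glucose standard typically amounts to fitting a linear standard curve and inverting it for each sample. The Python sketch below illustrates that calculation under stated assumptions: the standard concentrations and absorbance readings are invented placeholders, not data from this study, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def absorbance_to_glucose(absorbance, slope, intercept):
    """Invert the standard curve to obtain glucose equivalents."""
    return (absorbance - intercept) / slope


# Placeholder glucose standards (mM) and absorbance readings (illustrative only).
standards_mM = [0.0, 1.0, 2.0, 4.0]
readings = [0.05, 0.25, 0.45, 0.85]

slope, intercept = fit_line(standards_mM, readings)

# Convert a hypothetical sample reading into glucose equivalents (mM).
sample_reading = 0.65
print(absorbance_to_glucose(sample_reading, slope, intercept))
```

In practice, the standards would be measured alongside the samples in the same assay run, and samples falling outside the range of the standards would be diluted and re-measured.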

Analysis of products from enzymatic hydrolysis

To gain more insight into the mode of action of TwCel5, products generated by the enzyme after 60-min incubation with barley β-glucan and konjac glucomannan were analysed by MALDI-TOF mass spectrometry. The MALDI-TOF MS spectra showed that TwCel5CAT hydrolysed barley β-glucan to products ranging from a degree of polymerization of 4 (DP4) to DP14 (rather than only low-DP products), suggesting that the enzyme mostly attacks internal glycosidic linkages (Fig. 4B, upper panel). Notably, products with DP6, DP9 and DP12 were not detected. Barley β-glucan is a linear homopolysaccharide of consecutively linked β-(1,4)-glucosyl residues, i.e., oligomeric cellulose segments, that are separated by single β-(1,3)-linkages [32]. Although the absence of these products does not show exactly how the β-glucan is hydrolysed, it does show that specific linkages, likely β-(1,3) linkages, are not cleaved by the enzyme. The product spectrum for konjac glucomannan showed a continuum of oligosaccharides ranging from DP4 to DP14 (Fig. 4B, lower panel), which one would expect if the distribution of glucose and mannose residues is random, which is the case [33]. Konjac glucomannan contains about 5–10% acetylated sugars [34], and the oligosaccharides observed at m/z 2189 ([DP13 + Na + acetyl]+) and 2352 ([DP14 + Na + acetyl]+) indeed suggest the presence of acetylation. Furthermore, HPAEC-PAD analysis of the products generated by enzymatic hydrolysis of PASC revealed cellobiose as the predominant product, with lesser amounts of glucose and cellotriose (Fig. 4A), as is commonly observed for cellulases. In summary, the observed activities, product distribution, and cleavage patterns strongly indicate that TwCel5 is a β-(1,4)-endo-glucanase with a mode of action resembling that of other known GH5 enzymes [30, 35, 36].
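The peak assignments above follow from the mass of a hexose residue. As a rough illustration (not the authors' procedure), the Python sketch below estimates the expected m/z of singly charged oligosaccharide adducts from standard monoisotopic masses; the function name and constants are assumptions introduced here, and the calculation ignores instrument calibration and isotope envelopes.

```python
# Monoisotopic masses in Da (standard values, rounded to four decimals).
HEXOSE_RESIDUE = 162.0528   # anhydro-hexose unit, C6H10O5
WATER = 18.0106             # terminal H and OH of the free oligosaccharide
ACETYL = 42.0106            # net mass added per O-acetyl group, C2H2O
ADDUCT_IONS = {"Na": 22.9892, "K": 38.9632}  # cation masses (electron removed)


def oligo_mz(dp: int, adduct: str = "Na", acetyl_groups: int = 0) -> float:
    """Expected m/z of a singly charged hexose oligomer of length dp."""
    neutral = dp * HEXOSE_RESIDUE + WATER + acetyl_groups * ACETYL
    return neutral + ADDUCT_IONS[adduct]


# Acetylated sodium adducts discussed in the text (DP13 and DP14).
for dp in (13, 14):
    print(dp, round(oligo_mz(dp, "Na", acetyl_groups=1), 2))

# Consecutive DPs differ by one hexose residue, i.e. about 162 m/z units.
print(round(oligo_mz(5) - oligo_mz(4), 2))
```

Under these assumptions, the calculated values for the acetylated DP13 and DP14 sodium adducts fall within about one m/z unit of the 2189 and 2352 peaks quoted above.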

Fig. 4

Analysis of hydrolytic products generated by TwCel5CAT on different polysaccharides. A HPAEC-PAD chromatogram showing the soluble products generated upon incubation of 0.5% (w/v) PASC with 1 µM enzyme in 50 mM sodium phosphate buffer (pH 7.5) at 30 °C for 24 h. B MALDI-TOF MS analysis of products generated from β-glucan (upper) and konjac glucomannan (lower) upon incubation with 50 nM enzyme in 50 mM sodium phosphate, pH 7.5, and 0.5 M NaCl for 60 min at 30 °C. The lower panel shows the Na+ and K+ adducts of oligosaccharides; ‘DP’ stands for degree of polymerization and ‘Ac’ for acetylation. None of the labelled peaks were observed in the negative control (i.e., a reaction without enzyme). Note that the m/z difference of 162 observed between the peaks corresponds to one hexose residue. The products were identified based on cello-oligo standards, as shown

TwCel5 displays broad pH and temperature stability

Using barley β-glucan as substrate, the pH and temperature stability of TwCel5CAT and TwCel5CBM were determined to further characterize their optimal activities under different physicochemical conditions. When incubated at 30 ºC, both variants were active over a broad pH range, from 5 to 8, with maximum activity observed at around pH 7.0–7.5 (Fig. 5). More than 70% activity was retained between pH 4.5 and 9.6. In contrast, activity was drastically reduced at more extreme pH values (Fig. 5). Relatively broad pH optima are commonly observed for known GH5 enzymes, e.g., cellulase 5 from a sugarcane soil metagenome [30] and endo-cellulases [31, 37]. However, some GH5 enzymes with slightly acidic pH optima have also been reported [37, 38]. It is worth noting that the enzyme remained active for at least 60 min when incubated at pH 4.0 to pH 10.6, suggesting that the protein is relatively stable over a wide pH range.

Fig. 5

Influence of pH on the hydrolytic activity and stability of TwCel5. The pH dependency of the hydrolytic activity was determined in different 50 mM buffer systems using 0.5% (w/v) β-glucan, 0.5 M NaCl and 50 nM enzyme variants, incubated at 30 ºC for 60 min. The reducing sugar values are means and standard deviations derived from triplicates

Furthermore, TwCel5 activity increased in a temperature-dependent manner, reaching maximum activity at temperatures between 30 and 50 °C (Fig. 6). At temperatures above 50 °C, signs of enzyme inactivation became noticeable during the 60 min incubation period, and this effect was stronger for the enzyme connected to the CBM, which showed slightly lower activity. It is worth noting that TwCel5CAT retained over 50% hydrolytic activity at temperatures ranging from 10 °C to 60 °C, which suggests that this protein is both cold-adapted and moderately thermotolerant (Fig. 6). Although shipworms are adapted to a cold environment, it has previously been reported that enzymes from shipworm symbionts exhibit biochemical properties similar to those of TwCel5CAT, for instance, a cellulolytic endo-1,4-β-glucanase obtained from the shipworm Lyrodus pedicellatus [39]. It is also interesting to observe that, to the best of our knowledge, the combination of being active at a rather alkaline pH and being thermotolerant is a rare enzyme feature [40,41,42]. As TwCel5 originates from a marine shipworm symbiont that resides in ocean water, we assessed the impact of NaCl concentration on the hydrolytic activity of the enzyme. Using standard assay conditions, we found no impact of NaCl at concentrations of 0–1.5 M on enzyme activity (Additional file 1: Fig. S2). In conclusion, TwCel5 is a salt-tolerant β-1,4-endo-glucanase capable of functioning over wide pH and temperature ranges, making it an interesting candidate for industrial applications.

Fig. 6

Influence of temperature on the hydrolytic activity and stability of TwCel5. The temperature dependency of the hydrolysis of β-glucan was determined in 50 mM sodium phosphate buffer, pH 7.5, using 0.5% (w/v) β-glucan, 0.5 M NaCl and 50 nM enzyme variants, incubated at different temperatures ranging from 5 ºC to 30 ºC for 60 min. The reducing sugar values are means and standard deviations derived from triplicates

Co-incubation of TwCel5 with LPMO boosted hydrolytic activity

It is well known that shipworm gill endosymbionts produce a multitude of carbohydrate-active enzymes, including cellulose-active lytic polysaccharide monooxygenases (LPMOs) and diverse glycosyl hydrolases (GHs), to accomplish efficient wood digestion for nutrition and growth [16, 43]. A recent study combining meta-transcriptomic, proteomic and biochemical analyses of the wood-degrading shipworm Lyrodus pedicellatus reported the expression of a gene encoding an auxiliary activity family 10 (AA10) LPMO, which likely synergises with endogenous as well as endosymbiont multi-domain glycoside hydrolases that function in the hydrolysis of β-1,4-glucans [10]. It was thus of great interest to determine whether a typical cellulose-active AA10 LPMO, CelS2 from Streptomyces coelicolor A3(2) [20, 44], would enhance the hydrolytic activity of TwCel5CAT or TwCel5CBM, especially on Avicel, a semi-crystalline form of cellulose. As expected, a clear synergistic effect on the generation of reducing sugars was observed when CelS2 and TwCel5CAT or TwCel5CBM were combined as an enzyme cocktail (Fig. 7A). Interestingly, in the reaction with CelS2, the amount of reducing sugars released was higher for the enzyme connected to the CBM10, TwCel5CBM, than for the catalytic domain alone, TwCel5CAT. This may suggest that LPMO activity exposes regions of the crystalline substrate where the CBM10 is, at least partially, beneficial for the efficiency of the GH5 domain. Of note, a control reaction (Fig. 7B) showed that the apparent synergy is not just a result of GH5-catalyzed hydrolysis of longer soluble cello-oligomers generated by the LPMO. Our results add to studies showing that cellulose-active LPMOs boost the activity of shipworm glycoside hydrolases on crystalline cellulose such as Avicel [43] and support the notion that LPMO action is important for wood depolymerization in shipworms for digestion and nutrition.
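One common way to express such an effect numerically is a degree of synergy: the product yield of the combined reaction divided by the sum of the yields of the individual reactions, with values above 1 indicating synergy. The study itself does not report this metric; the Python sketch below is an illustration only, with a hypothetical function name and placeholder yields that are not data from Fig. 7.

```python
def degree_of_synergy(combined_yield: float, individual_yields) -> float:
    """Ratio of the combined yield to the sum of the individual yields.

    A value above 1 indicates that the enzymes release more product
    together than expected from adding up their separate reactions.
    """
    expected = sum(individual_yields)
    if expected <= 0:
        raise ValueError("sum of individual yields must be positive")
    return combined_yield / expected


# Placeholder reducing-sugar yields in arbitrary units (illustrative only).
cellulase_alone = 1.0
lpmo_alone = 0.5
cellulase_plus_lpmo = 3.0

print(degree_of_synergy(cellulase_plus_lpmo, [cellulase_alone, lpmo_alone]))
```

All yields need to be measured under identical conditions (substrate load, enzyme dose, reductant, incubation time) for the ratio to be meaningful.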

Fig. 7

Synergy experiment between the cellulose-active LPMO CelS2 and the cellulase TwCel5 showing the boost in hydrolytic activity against Avicel. A TwCel5CAT, TwCel5CBM or CelS2, each at 1 µM, were incubated alone or in combination for different durations, all with 1 mM ascorbate included in the reaction mixture. B Oxidised products generated from Avicel by CelS2 after 48 h treatment were incubated with the TwCel5 variants for 18 h; “control” refers to a reaction in which no enzyme was added. Values are means and standard deviations derived from triplicates

Structural analysis of cellulase TwCel5CAT

To gain structural insight into the bond cleavage pattern and mode of action of TwCel5CAT, we determined the tertiary structure of the catalytic module. TwCel5CAT was crystallized in an apo form and a dataset diffracting to 1.0 Å resolution was collected (Table 1). The structure was determined by molecular replacement using the protein coordinates from the structure of the GH5 cellulase Cel5 (PDB entry 1EGZ) from Erwinia chrysanthemi, a gram-negative plant pathogen [45], as the search model (Table 2). The TwCel5CAT model was refined to Rwork and Rfree of 11.42 and 13.30, respectively, and the final model was deposited in the PDB database (PDB identifier 8C10). Structural comparisons using the DALI server [46] revealed that the closest structural matches are the GH5 family cellulase CelE1 (PDB identifier 4M1R; 67% sequence identity) obtained from a sugarcane soil metagenome [30] and endoglucanase EGZ (PDB identifier 1EGZ; 67% sequence identity) from Erwinia chrysanthemi [45, 47].

Table 1 Data collection and processing statistics. Values in parentheses are for the outermost shell
Table 2 Structure determination and refinement statistics. Values in parentheses are for the outermost shell

The first three-dimensional structure of a CAZy family GH5 subfamily 2 enzyme was that of a cellulase isolated from an alkaline Bacillus sp. found in soda lakes [48]. Consistent with the defining family member, TwCel5CAT exhibits a classical (β/α)8-barrel fold (also known as TIM-barrel; Fig. 8A) with two conserved glutamates, Glu157 and Glu245, in the catalytic center (Fig. 8B) that are positioned to promote the expected displacement mechanism characteristic of family 5 cellulases [49]. Thus, from homology to other GH family 5 cellulases, the proton donor is expected to be Glu157 and the nucleophile Glu245. Furthermore, the substrate binding cleft of TwCel5CAT is similar to that of other GH5 subfamily 2 enzymes, showing conserved aromatic and polar amino acids involved in substrate binding (Fig. 8C). In contrast to most other GH5 subfamily 2 enzymes, TwCel5CAT has a tryptophan (Trp226) in the outer region of the reducing end subsites (Fig. 8D). Of the 15 unique structures from GH5 subfamily 2, only three have a Trp in this position, namely the thermostable GsCelA from Geobacillus sp. 70PC53 [50], the halotolerant Cel5R cloned from a soil metagenome [51] and Cel5B from C. hutchinsonii [52]. Of note, a comparison of the Alphafold2 model of TwCel5CAT with the X-ray crystallographic structure showed an RMSD for Cα atoms of only 0.35 Å (Fig. S3A). Furthermore, the side chains of the amino acids in the conserved active site and substrate binding cleft were modelled almost flawlessly (Additional file 1: Fig S3B).

Fig. 8

Crystal structure analysis of TwCel5CAT. A Orthogonal view of the TwCel5CAT tertiary structure showing the characteristic (β/α)8-barrel fold. Helices are shown in blue, β-strands in cyan, and loops in white. Mg2+ and K+ ions are shown as green- and purple-colored spheres, respectively. B The putative active site of TwCel5CAT, with the side chains of Glu157 and Glu245 shown with yellow carbons. C Illustration of the substrate binding groove, with the side chains of amino acids potentially interacting with the substrate shown with blue carbons. For illustration purposes, a thiocellopentaose molecule has been placed in the substrate binding site by structural superposition of the TwCel5CAT structure with the ligand-containing structure of Cel5A from Bacillus agaradhaerens (PDB identifier: 1H5V). D Structural superposition of subsite-defining amino acids of TwCel5CAT and B. agaradhaerens Cel5A, highlighting the putative +3 subsite in TwCel5CAT. TwCel5CAT, blue carbons; Cel5A, grey carbons; substrate, green carbons


This study describes the biochemical characterization and structural properties of the GH5 domain of the multi-domain cellulase TwCel5-6 from the endosymbiont T. waterburyi, which resides in the gills of shipworms, including P. megotara. It is an endo-glucanase showing activity over a broad pH range from 5 to 8 and at temperatures of 40–50 °C. The hydrolytic activity of TwCel5CAT on crystalline Avicel was boosted by synergistic interaction with the cellulose-oxidizing CelS2, and in this reaction set-up the presence of a C-terminal CBM10 domain fused to TwCel5CAT further enhanced cellulose saccharification. This endo-glucanase may be a suitable biocatalyst for liberating reducing sugars from pre-treated biomass over a broad range of conditions, including alkaline pH, low temperature and high salt concentration. In summary, our study demonstrates that wood-digesting shipworms are a good source of novel, moderately thermostable cellulases that are active at alkaline pH.


Chemicals and substrates

Analytical grade substrates were used. The pre-packed 5 mL HisTrap affinity (HP) columns, PD-10 desalting columns (Sephadex G-25 resin), and size exclusion chromatography column (HiLoad® Superdex, 75 pg) used for protein purification were obtained from GE Healthcare. Amorphous, phosphoric acid swollen cellulose (PASC) was prepared from Avicel according to the method described before [53]. High-purity substrates, including barley β-glucan, lichenan, konjac glucomannan, wheat arabinoxylan, birchwood xylan, and tamarind xyloglucan, were purchased from Megazyme. The model crystalline cellulose substrate Avicel PH-101, carboxymethyl cellulose (CMC), gum arabic and standard cello-oligomers were purchased from Sigma-Aldrich.

Sample collection and identification of genes

Specimens of adult shipworm Psiloteredo megotara were collected from Norway spruce (Picea abies) wooden panels submerged for about 8–9 months in the Arctic Sea near Tromsø, Norway (N 69°46′47.515″; E 18°23′53.143″). The sampling was done in accordance with the Norwegian Marine Resource Act. The shipworm specimen was initially rinsed with sterile water and dissected on a clean bench to separate the specialised gill tissue containing endosymbionts. Bacterial enrichment was performed using crushed gill tissues in a medium supplemented with cellulose as a carbon source, incubated for several months. DNA was isolated from the enrichment culture using the DNeasy Blood & Tissue Kit (QIAGEN). The metagenome sequencing was carried out using Illumina MiSeq 300 paired-end chemistry at the Norwegian Sequencing Centre. Contigs were assembled, annotated, and uploaded to the GenBank sequence database (accession number grp 8783669). Full details of the metagenomic dataset will be published elsewhere. Genes coding for carbohydrate-active enzymes were mined using the dbCAN meta server [54]. A gene with accession number OP793796 (3312 bp), located on BankIt2638814 Contig82, was chosen for further studies due to its novel multi-domain architecture.

Gene cloning, expression, and protein purification

A gene (OP793796) encoding the multi-domain protein TwCel5-6 (accession number WAK85940.1), codon optimized for E. coli expression using the OptimumGene PSO algorithm, was synthesized by GenScript Biotech (Piscataway, NJ 08854, USA). Gene fragments encoding TwCel5CAT and TwCel5CBM were generated by PCR using Q5 DNA polymerase (New England Biolabs, Ipswich, MA) and the primers described in Table S1. Both genes were amplified excluding a putative signal peptide (Additional file 1: Fig. S4), as predicted using the SignalP-5.0 prediction tool [55]. PCR products were purified using a PCR clean-up kit as per the manufacturer’s instructions (Macherey–Nagel, Germany) followed by agarose (1% w/v) gel electrophoresis. Prior to cloning, the DNA concentration was determined using a nano UV spectrophotometer (Thermo Scientific, San Jose, CA, USA). The DNA fragment encoding TwCel5CAT (amino acid residues 26–320) was cloned into the pNIC-CH expression vector (AddGene, Cambridge, MA), which adds a C-terminal polyhistidine-tag (6xHis-tag) to the protein, as per the manufacturer’s instructions. Similarly, a DNA fragment encoding TwCel5CBM (amino acid residues 30–412) was cloned into the directional Champion pET151/D-TOPO™ expression vector (Invitrogen, Carlsbad, CA), which adds a cleavable N-terminal V5-6xHis-tag to the protein, as per the manufacturer’s instructions.

Using a heat-shock transformation method, the recombinant vectors were transformed into chemically competent OneShot E. coli TOP10 (Invitrogen, Carlsbad, CA) cells. Cells were grown in SOC medium for 60 min prior to plating on lysogeny broth (LB) agar plates supplemented with the antibiotic appropriate for the vector, 50 µg/mL kanamycin (for TwCel5CAT) or 100 mg/mL ampicillin (for TwCel5CBM), followed by overnight incubation at 37 °C. Colonies on the LB plates were picked and screened by colony PCR using the T7 primer pair: T7 forward, 5ʹ-TAATACGACTCACTATAGGG-3ʹ, and T7 reverse, 5ʹ-TAGTTATTGCTCAGCGGTGG-3ʹ. Positive clones were picked and inoculated in liquid LB containing the appropriate antibiotic (as mentioned above), and the cultures were incubated overnight at 37 °C with shaking at 200 rpm. The recombinant plasmids were isolated from the E. coli cells using the Zymo MiniPrep Kit (Zymo Research). Prior to transformation into expression cells, the correct integration and DNA sequence of the genes were confirmed by Sanger sequencing (GATC Biotech, Constance, Germany). The expression vectors were transformed into chemically competent OneShot BL-21 Star (DE3) and ArcticExpress (DE3) E. coli expression cells, for TwCel5CAT and TwCel5CBM, respectively, using the heat-shock method described above.

To produce recombinant proteins, E. coli transformants were inoculated into Terrific Broth medium supplemented with the appropriate antibiotic and grown at 37 °C with horizontal shaking (200 rpm) until the optical density (OD600nm) reached between 0.6 and 0.8. Expression was then induced by adding 0.3 mM isopropyl-β-D-thiogalactopyranoside (IPTG), followed by 24 h of incubation at 15 °C with horizontal shaking (200 rpm). Cultures were harvested by centrifugation at 10,000 × g for 15 min at 8 °C, using a Beckman Coulter centrifuge (Brea, CA, USA). Cells were stored frozen at −20 °C until further use. For protein purification, a cell-free extract was prepared by re-suspending about 5 g of wet cell biomass in 50 mL of 50 mM Tris-HCl buffer, pH 7.4, supplemented with 200 mM NaCl, 10% glycerol and 30 mM imidazole (lysis buffer). Prior to cell disruption, the suspension was supplied with 0.5–1 mg of cOmplete, EDTA-free protease inhibitor cocktail (Roche) and lysozyme (0.5 mg/mL). Cells were disrupted using a cooled high-pressure homogenizer (LM20 Microfluidizer, Microfluidics). Cell debris was removed by centrifugation at 27,000 × g for 30 min at 4 °C. The resulting cell-free extracts, containing soluble cytosolic proteins, were filtered using a sterile 0.45 µm filter (Sarstedt, Nümbrecht, Germany).

The filtered cell-free extract was subjected to immobilized metal affinity chromatography using an Äkta pure chromatography system equipped with a 5-mL HisTrap HP column (GE HealthCare) equilibrated with lysis buffer (see above). After sample loading, the HisTrap column was washed extensively with 50 mM Tris-HCl, pH 7.4, supplemented with 200 mM NaCl, 10% glycerol and 70 mM imidazole (wash buffer) until the UV absorbance dropped and stabilised at the baseline level. Bound proteins were eluted using the same buffer supplemented with 500 mM imidazole (elution buffer). Eluted proteins were first analysed by SDS-polyacrylamide gel electrophoresis (SDS-PAGE) using TGX Stain-Free precast gels (Bio-Rad, Hercules, CA, USA). The molecular weight of the recombinant proteins was estimated using the Invitrogen BenchMark Pre-stained Protein Ladder (Fisher Scientific, Waltham, Massachusetts, USA). Eluted proteins were concentrated using Vivaspin® 10,000 MWCO centrifugal filter units (Sartorius, Göttingen, Germany). Proteins were purified to homogeneity using a size-exclusion chromatography column (HiLoad® Superdex 75 pg) pre-equilibrated with 50 mM sodium phosphate buffer (pH 7.4) containing 150 mM NaCl. Finally, the buffer was exchanged to 50 mM sodium citrate (pH 5.6) using a PD-10 desalting column. The protein concentration was determined by UV absorbance at 280 nm (A280) using theoretical molar extinction coefficients (TwCel5CAT: 71390 M⁻¹·cm⁻¹ and TwCel5CBM: 96745 M⁻¹·cm⁻¹) estimated using the ProtParam tool [56]. Purified enzymes were stored at 4 °C.
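
The A280-based concentration estimate described above follows the Beer-Lambert law. The sketch below is only an illustration of that arithmetic: the extinction coefficients are the ones quoted in the text, whereas the function name, the 1 cm path length, the dilution factor parameter and the example reading are assumptions and not part of the published protocol.

```python
# Theoretical molar extinction coefficients at 280 nm quoted in the text (M^-1 cm^-1).
EXTINCTION_COEFFICIENTS = {
    "TwCel5CAT": 71390.0,
    "TwCel5CBM": 96745.0,
}


def molar_concentration_from_a280(a280, variant, path_length_cm=1.0, dilution_factor=1.0):
    """Return the protein concentration in mol/L from a blank-corrected A280 reading.

    path_length_cm and dilution_factor are hypothetical convenience parameters.
    """
    if a280 < 0:
        raise ValueError("absorbance must be non-negative")
    epsilon = EXTINCTION_COEFFICIENTS[variant]
    # Beer-Lambert law: A = epsilon * c * l, so c = A / (epsilon * l).
    return dilution_factor * a280 / (epsilon * path_length_cm)


if __name__ == "__main__":
    # Hypothetical reading, not a value from the study.
    conc = molar_concentration_from_a280(0.714, "TwCel5CAT")
    print(f"TwCel5CAT: {conc * 1e6:.2f} µM")
```

Converting the molar value to mg/mL would additionally require the molecular weight of each construct, which is not given in this section.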

Biochemical characterization of recombinant TwCel5

The standard assays were performed in a 300 µL reaction volume containing 50 mM sodium phosphate buffer (pH 7.5), 0.5 M NaCl, 0.5% (w/v) β-glucan and 50 nM of the purified enzyme. The reaction was started by adding either enzyme or substrate, followed by incubation at 30 °C in a thermomixer with horizontal agitation (500 rpm; Eppendorf, Hamburg, Germany) for 60 min (unless stated otherwise). To determine the pH stability and pH optimum, reactions were carried out in 50 mM buffer systems (sodium citrate, pH 3.0–6.0; potassium phosphate, pH 6.5–8.0; and glycine–NaOH, pH 9.6–10.6) containing 0.5 M NaCl, 0.5% (w/v) β-glucan and 50 nM of the purified enzyme. To determine the thermal stability and optimal temperature, reactions were performed in 50 mM sodium phosphate buffer (pH 7.5), 0.5 M NaCl, 0.5% (w/v) β-glucan and 50 nM of the purified enzyme at temperatures ranging from 5 to 70 °C. The effect of the NaCl concentration on activity was determined using 50 mM sodium phosphate buffer (pH 7.5), 0.5% (w/v) β-glucan, 50 nM of the purified enzyme and NaCl concentrations ranging from 0 to 1.5 M. Aliquots were collected at different time intervals over a period of 60 min; reactions were stopped by mixing the samples immediately with DNSA reagent. Product formation was determined by quantifying the amount of reducing-end sugars using the 3,5-dinitrosalicylic acid (DNSA) assay [57] with glucose as a standard. The absorbance (A540nm) was recorded using a Varioskan™ LUX multimode microplate reader (Thermo Scientific, San Jose, CA, USA). All assays were performed in triplicate.
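
Converting A540 readings into reducing-sugar equivalents with a glucose standard can be sketched as a linear standard-curve fit. This is a minimal illustration that assumes a linear response over the standard range; the function names, the standard concentrations and all absorbance values are hypothetical and are not data from this study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def reducing_sugar_mM(a540, slope, intercept):
    """Convert an A540 reading to glucose equivalents (mM) using the fitted curve."""
    return (a540 - intercept) / slope


if __name__ == "__main__":
    # Hypothetical glucose standards (mM) and their A540 readings, for illustration only.
    glucose_mM = [0.0, 1.0, 2.0, 4.0, 8.0]
    a540_standards = [0.02, 0.12, 0.22, 0.42, 0.82]
    slope, intercept = fit_line(glucose_mM, a540_standards)

    # Hypothetical triplicate sample readings.
    triplicate = [0.31, 0.33, 0.32]
    values = [reducing_sugar_mM(a, slope, intercept) for a in triplicate]
    mean = sum(values) / len(values)
    print(f"slope={slope:.3f} intercept={intercept:.3f} mean={mean:.2f} mM glucose equivalents")
```

Averaging after conversion, as done here, keeps each replicate traceable to its own reading; reporting the standard deviation of the converted values would follow the same pattern.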

Lignocellulose substrate specificity

The substrate specificity of purified TwCel5CAT and TwCel5CBM was evaluated using a wide variety of lignocellulosic substrates, both soluble and insoluble. The insoluble model crystalline polysaccharide substrates included Avicel PH-101 and Whatman® cellulose filter paper (0.5 µm particle size), whereas phosphoric acid-swollen cellulose (PASC) was used as an amorphous substrate. The soluble lignocellulosic substrates included β-glucan, birchwood xylan, carboxymethyl cellulose (CMC), wheat arabinoxylan, konjac glucomannan, xyloglucan, and lichenan. Soluble substrates were dissolved according to the supplier’s protocol. The standard reactions were carried out in a 300 µL reaction volume in 50 mM sodium phosphate buffer, pH 7.5, containing 0.5 M NaCl, using 50 nM enzyme for soluble substrates (0.5% w/v) and 1 µM enzyme for insoluble substrates (1% w/v). Reactions were incubated at 30 °C with horizontal agitation (500 rpm) for 60 min (soluble substrates) or 24 h (insoluble substrates). Aliquots were taken at different intervals; reactions were stopped by mixing the samples immediately with the DNSA reagent. The release of reducing-end sugars was measured using the DNSA assay with glucose as a standard, as described above. When using insoluble substrates, samples were filtered before measurement.

Cellulase-LPMO synergy experiment

Purified CelS2 from Streptomyces coelicolor was a kind gift from Dr. Zarah Forsberg [20]. The cellulase-LPMO synergy was assessed by performing reactions with crystalline Avicel (1% w/v) in 50 mM sodium phosphate buffer (pH 6.0) in a thermomixer (Eppendorf, Hamburg, Germany) at 30 °C with horizontal agitation (1000 rpm). Synergy experiments were conducted at fixed enzyme concentrations: 1 µM copper-saturated CelS2 and/or 1 µM of one of the TwCel5 cellulase variants. The reactions were started by first supplying 1 mM ascorbic acid (final concentration) to all reaction mixtures, immediately followed by the addition of the enzyme. Reactions were incubated for 48 h, and aliquots were taken at different time intervals and filtered through a 0.45 µm filter to remove the insoluble substrate and stop the reaction. The cellulase variants could generate reducing ends by acting on oligomeric products solubilized by the LPMO, which could give a false impression of synergy. To check for this, control reactions were performed in which Avicel was first incubated with the LPMO for 48 h, after which the products were treated with the TwCel5 variants. Cellulose saccharification was assessed using the reducing-end assay described above, and all experiments were performed in triplicate.
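
One common way to put a number on such a synergy experiment is the ratio between the product released by the enzyme mixture and the sum of the products released by each enzyme alone. The text does not state which metric was used, so the sketch below is an assumption offered for illustration; the function name and the example values are hypothetical.

```python
def degree_of_synergy(combined, cellulase_alone, lpmo_alone):
    """Ratio of product from the mixture to the summed product of the single-enzyme reactions.

    All three inputs must be in the same units (for example mM reducing-sugar
    equivalents at the same time point). A value above 1 suggests synergy,
    a value near 1 suggests purely additive action.
    """
    additive = cellulase_alone + lpmo_alone
    if additive <= 0:
        raise ValueError("sum of the individual activities must be positive")
    return combined / additive


if __name__ == "__main__":
    # Hypothetical 48 h values (mM reducing-sugar equivalents), not data from the study.
    ratio = degree_of_synergy(combined=3.0, cellulase_alone=1.0, lpmo_alone=0.5)
    print(f"degree of synergy: {ratio:.2f}")
```

The sequential control described above matters for interpreting this ratio, because reducing ends generated from LPMO-solubilized oligomers would inflate the numerator without reflecting increased degradation of the insoluble substrate.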

Product analysis by HPAEC-PAD (ICS-6000)

Hydrolytic products generated from PASC were analysed by high-performance anion-exchange chromatography with pulsed amperometric detection (HPAEC-PAD) on a Dionex ICS6000 system (Thermo Scientific, San Jose, CA, USA) equipped with a CarboPac PA200 IC analytical column. Eluent A (0.1 M NaOH) and eluent B (0.1 M NaOH and 1 M sodium acetate) were applied using the following gradient program: 0–5.5% B for 3 min, 5.5–15% B for 6 min, 15–100% B for 11 min, 100–0% B for 6 s, 0% B for 6 min. The eluent flow rate was set to 0.5 mL/min. Cello-oligosaccharides with a degree of polymerization from one to five (DP1–DP5) were used as standards for product identification. The data were analysed using Chromeleon 7.2.9 software.
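
The gradient program above can be written down as a small piecewise-linear table, which makes it easy to check the total run time or the eluent composition at any time point. The sketch assumes that each listed time is the duration of a consecutive segment and that %B changes linearly within a segment; the names used are hypothetical.

```python
# (duration in min, %B at segment start, %B at segment end), taken from the program in the text.
GRADIENT = [
    (3.0, 0.0, 5.5),
    (6.0, 5.5, 15.0),
    (11.0, 15.0, 100.0),
    (0.1, 100.0, 0.0),  # 6 s step back to 0% B
    (6.0, 0.0, 0.0),    # re-equilibration at 0% B
]


def percent_b(t_min):
    """Return % eluent B at time t_min, interpolating linearly within each segment."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for duration, start, end in GRADIENT:
        if t_min <= elapsed + duration:
            fraction = (t_min - elapsed) / duration
            return start + fraction * (end - start)
        elapsed += duration
    raise ValueError("time is beyond the end of the gradient program")


def total_run_time():
    """Total program length in minutes."""
    return sum(duration for duration, _, _ in GRADIENT)


if __name__ == "__main__":
    for t in (0.0, 1.5, 9.0, 14.5, 23.0):
        # Eluent B contains 1 M sodium acetate, so 1% B corresponds to 10 mM acetate.
        print(f"t={t:5.1f} min  %B={percent_b(t):6.2f}  acetate={percent_b(t) * 10:7.1f} mM")
    print(f"total run time: {total_run_time():.1f} min")
```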

Product analysis by MALDI-TOF MS

Hydrolytic products generated from β-glucan and konjac glucomannan were identified using an UltrafleXtreme matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometer (Bruker Daltonics GmbH, Bremen, Germany) equipped with a 337-nm nitrogen laser. Samples were prepared by mixing one microliter of the sample with two microliters of 2,5-dihydroxybenzoic acid (DHB) solution (9 mg/mL), and the mixture was applied directly onto an MTP 384 ground steel target plate (Bruker Daltonics). Sample spots were allowed to dry on the plate under a flow of dry hot air. Data were acquired using Bruker flexControl and flexAnalysis software. Products were identified based on m/z values.
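
Assigning peaks by m/z amounts to comparing observed values with the expected masses of hexose oligomers. The sketch below computes monoisotopic m/z values for singly charged sodium or potassium adducts; the choice of adducts is an assumption, since the text does not state which ions were observed, and the function name is hypothetical. Glucose and mannose are both hexoses, so β-glucan and glucomannan oligomers of the same degree of polymerization share the same unmodified mass.

```python
HEXOSE_RESIDUE = 162.05282  # C6H10O5, monoisotopic mass of one anhydrohexose unit
WATER = 18.01056            # H2O, monoisotopic mass
SODIUM_ION = 22.98922       # Na+ (sodium atom minus one electron)
POTASSIUM_ION = 38.96316    # K+ (potassium atom minus one electron)


def hexose_oligomer_mz(dp, adduct_mass=SODIUM_ION):
    """Expected m/z of a singly charged adduct of an unmodified hexose oligomer.

    dp is the degree of polymerization (number of hexose units).
    """
    if dp < 1:
        raise ValueError("degree of polymerization must be at least 1")
    neutral_mass = dp * HEXOSE_RESIDUE + WATER
    return neutral_mass + adduct_mass


if __name__ == "__main__":
    for dp in range(2, 8):
        na = hexose_oligomer_mz(dp)
        k = hexose_oligomer_mz(dp, POTASSIUM_ION)
        print(f"DP{dp}: [M+Na]+ = {na:.3f}   [M+K]+ = {k:.3f}")
```

Consecutive peaks in such a series are spaced by one hexose unit (162.053), which is a quick consistency check when reading a spectrum.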

Crystallization, data collection and analysis

Crystallization experiments were performed with a stock solution of the purified protein at 12.4 mg/mL, as estimated by A280, in 6 mM NaCl, 20 mM Tris-HCl at pH 8.0. Initial crystallization experiments were performed using the sitting-drop vapour diffusion method, set up by a Phoenix crystallization robot (Art Robbins Instruments). The experiments were set up with 60 µL of reservoir solution and sitting drops of 1 µL total volume containing equal volumes of reservoir solution and protein stock solution. The plates were incubated at 20 °C. Crystals appeared within 1–2 weeks in conditions containing 0.2 M MgCl2, 0.1 M Tris-HCl, pH 8.5, and 25% (w/v) polyethylene glycol 3350 (PEG 3350). Crystals were harvested, transferred to a cryoprotectant solution consisting of the reservoir solution supplemented with 15% ethylene glycol, and flash-cooled in liquid N2. X-ray diffraction data were collected at beamline 14.1 at BESSY II (Berlin, Germany). Data collection and processing statistics are presented in Table 1, and structure determination and refinement statistics are presented in Table 2. The crystal structure was solved by molecular replacement using MolRep in the CCP4 program package [58] with 1egz.pdb as the search model [45]. The initial refinement was executed in Refmac [59], followed by automated model improvement in Buccaneer [60]. Manual model building was done in Coot [61], interspersed with cycles of refinement in Refmac, and resulted in a final Rwork/Rfree of 11.42/13.39. The atomic coordinates and structural details have been deposited in the Protein Data Bank with the accession code 8C10. Figures presented in the results section were generated using PyMOL v4.60.

Availability of data and materials

All data generated or analysed during this study are included in this published article and its supplementary information files. Information used from online resources has been cited.


  1. Klemm D, Heublein B, Fink HP, Bohn A. Cellulose: fascinating biopolymer and sustainable raw material. Angew Chem Int Ed Engl. 2005;44(22):3358–93.

  2. Mariano M, El Kissi N, Dufresne A. Cellulose nanocrystals and related nanocomposites: review of some properties and challenges. J Polym Sci Part B Polym Phys. 2014;52(12):791–806.

  3. Zhang Z, Donaldson AA, Ma X. Advancements and future directions in enzyme technology for biomass conversion. Biotechnol Adv. 2012;30(4):913–9.

  4. Himmel ME, Ding S-Y, Johnson DK, Adney WS, Nimlos MR, Brady JW, et al. Biomass recalcitrance: engineering plants and enzymes for biofuels production. Science. 2007;315(5813):804–7.

  5. Østby H, Hansen LD, Horn SJ, Eijsink VGH, Várnai A. Enzymatic processing of lignocellulosic biomass: principles, recent advances and perspectives. J Ind Microbiol Biotechnol. 2020;47(9–10):623–57.

  6. Andlar M, Rezić T, Marđetko N, Kracher D, Ludwig R, Šantek B. Lignocellulose degradation: an overview of fungi and fungal enzymes involved in lignocellulose degradation. Eng Life Sci. 2018;18(11):768–78.

  7. Cragg SM, Beckham GT, Bruce NC, Bugg TD, Distel DL, Dupree P, et al. Lignocellulose degradation mechanisms across the tree of Life. Curr Opin Chem Biol. 2015;29:108–19.

  8. Berlemont R, Martiny AC. Genomic potential for polysaccharide deconstruction in bacteria. Appl Environ Microbiol. 2015;81(4):1513–9.

  9. Bhardwaj N, Kumar B, Agrawal K, Verma P. Current perspective on production and applications of microbial cellulases: a review. Bioresources and Bioprocessing. 2021;8(1):95.

  10. Sabbadin F, Pesante G, Elias L, Besser K, Li Y, Steele-King C, et al. Uncovering the molecular mechanisms of lignocellulose digestion in shipworms. Biotechnol Biofuels. 2018;11(1):59.

  11. Stravoravdis S, Shipway JR, Goodell B. How do shipworms eat wood? Screening shipworm gill symbiont genomes for lignin-modifying enzymes. Front Microbiol. 2021.

  12. Distel DL. The biology of marine wood boring bivalves and their bacterial endosymbionts. In: Wood deterioration and preservation. Washington, DC: American Chemical Society; 2003. p. 253–71.

  13. Nair NB, Saraswathy M. The Biology of Wood-Boring Teredinid Molluscs. In: Russell FS, Yonge M, editors. Advances in Marine Biology. Academic Press; 1971. p. 335–509.

  14. Distel DL, Roberts SJ. Bacterial endosymbionts in the gills of the deep-sea wood-boring bivalves Xylophaga atlantica and Xylophaga washingtona. Biol Bull. 1997;192(2):253–61.

  15. Lechene CP, Luyten Y, McMahon G, Distel DL. Quantitative imaging of nitrogen fixation by individual bacteria within animal cells. Science. 2007;317(5844):1563–6.

  16. O’Connor RM, Fung JM, Sharp KH, Benner JS, McClung C, Cushing S, et al. Gill bacteria enable a novel digestive strategy in a wood-feeding mollusk. Proc Natl Acad Sci. 2014;111(47):E5096–104.

  17. Brito TL, Campos AB, von Meijenfeldt FAB, Daniel JP, Ribeiro GB, Silva GGZ, et al. The gill-associated symbiont microbiome is a main source of woody-plant polysaccharide hydrolase genes and secondary metabolite gene clusters in Neoteredo reynei, a unique shipworm from south Atlantic mangroves. bioRxiv. 2018.

  18. Pesante G, Sabbadin F, Elias L, Steele-King C, Shipway JR, Dowle AA, et al. Characterisation of the enzyme transport path between shipworms and their bacterial symbionts. BMC Biol. 2021;19(1):233.

  19. Drula E, Garron ML, Dogan S, Lombard V, Henrissat B, Terrapon N. The carbohydrate-active enzyme database: functions and literature. Nucleic Acids Res. 2022;50(D1):D571–7.

  20. Forsberg Z, Vaaje-Kolstad G, Westereng B, Bunæs AC, Stenstrøm Y, MacKenzie A, et al. Cleavage of cellulose by a CBM33 protein. Protein Sci. 2011;20(9):1479–83.

  21. Howard MB, Ekborg NA, Taylor LE, Hutcheson SW, Weiner RM. Identification and analysis of polyserine linker domains in prokaryotic proteins with emphasis on the marine bacterium Microbulbifer degradans. Protein Sci. 2004;13(5):1422–5.

  22. Altamia MA, Shipway JR, Stein D, Betcher MA, Fung JM, Jospin G, et al. Teredinibacter waterburyi sp. nov., a marine, cellulolytic endosymbiotic bacterium isolated from the gills of the wood-boring mollusc Bankia setacea (Bivalvia: Teredinidae) and emended description of the genus Teredinibacter. Int J Syst Evol Microbiol. 2020;70(4):2388–94.

  23. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–9.

  24. Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. ColabFold: making protein folding accessible to all. Nat Methods. 2022;19(6):679–82.

  25. Song Y-H, Lee K-T, Baek J-Y, Kim M-J, Kwon M-R, Kim Y-J, et al. Isolation and characterization of a novel endo-β-1,4-glucanase from a metagenomic library of the black-goat rumen. Braz J Microbiol. 2017;48(4):801–8.

  26. Várnai A, Siika-Aho M, Viikari L. Carbohydrate-binding modules (CBMs) revisited: reduced amount of water counterbalances the need for CBMs. Biotechnol Biofuels. 2013;6(1):30.

  27. Bernardes A, Pellegrini VOA, Curtolo F, Camilo CM, Mello BL, Johns MA, et al. Carbohydrate binding modules enhance cellulose enzymatic hydrolysis by increasing access of cellulases to the substrate. Carbohydr Polym. 2019;211:57–68.

  28. Chalak A, Villares A, Moreau C, Haon M, Grisel S, d’Orlando A, et al. Influence of the carbohydrate-binding module on the activity of a fungal AA9 lytic polysaccharide monooxygenase on cellulosic substrates. Biotechnol Biofuels. 2019;12:206.

  29. Liberato MV, Silveira RL, Prates ÉT, de Araujo EA, Pellegrini VOA, Camilo CM, et al. Molecular characterization of a family 5 glycoside hydrolase suggests an induced-fit enzymatic mechanism. Sci Rep. 2016;6(1):23473.

  30. Alvarez TM, Paiva JH, Ruiz DM, Cairo JP, Pereira IO, Paixão DA, et al. Structure and function of a novel cellulase 5 from sugarcane soil metagenome. PLoS One. 2013.

  31. Naas AE, MacKenzie AK, Dalhus B, Eijsink VGH, Pope PB. Structural features of a bacteroidetes-affiliated cellulase linked with a polysaccharide utilization locus. Sci Rep. 2015;5:11666.

  32. Lazaridou A, Biliaderis CG. Molecular aspects of cereal β-glucan functionality: physical properties, technological applications and physiological effects. J Cereal Sci. 2007;46(2):101–18.

  33. Takigami S. Chapter 18 - Konjac glucomannan. In: Phillips GO, Williams PA, editors. Handbook of Hydrocolloids (Third Edition). Woodhead Publishing; 2021. p. 563–77.

  34. Witkamp RF. 31.5—Biologically Active Compounds in Food Products and Their Effects on Obesity and Diabetes. In: Liu H-W, Mander L, editors. Comprehensive Natural Products II. Oxford: Elsevier; 2010. p. 509–45.

  35. Barras F, Bortoli-German I, Bauzan M, Rouvier J, Gey C, Heyraud A, et al. Stereochemistry of the hydrolysis reaction catalyzed by endoglucanase Z from Erwinia chrysanthemi. FEBS Lett. 1992;300(2):145–8.

  36. Bischoff KM, Rooney AP, Li XL, Liu S, Hughes SR. Purification and characterization of a family 5 endoglucanase from a moderately thermophilic strain of Bacillus licheniformis. Biotechnol Lett. 2006;28(21):1761–5.

  37. Santos CR, Paiva JH, Sforça ML, Neves JL, Navarro RZ, Cota J, et al. Dissecting structure-function-stability relationships of a thermostable GH5-CBM3 cellulase from Bacillus subtilis 168. Biochem J. 2012;441(1):95–104.

  38. Liu J, Liu WD, Zhao XL, Shen WJ, Cao H, Cui ZL. Cloning and functional characterization of a novel endo-β-1,4-glucanase gene from a soil-derived metagenomic library. Appl Microbiol Biotechnol. 2011;89(4):1083–92.

  39. Xu P-N, Distel DL. Purification and characterization of an endo-1, 4-β-D glucanase from the cellulolytic system of the wood-boring marine mollusk Lyrodus pedicellatus (Bivalvia: Teredinidae). Mar Biol. 2004;144(5):947–53.

  40. Annamalai N, Thavasi R, Vijayalakshmi S, Balasubramanian T. A novel thermostable and halostable carboxymethylcellulase from marine bacterium Bacillus licheniformis AU01. World J Microbiol Biotechnol. 2011;27(9):2111–5.

  41. Hakamada Y, Koike K, Yoshimatsu T, Mori H, Kobayashi T, Ito S. Thermostable alkaline cellulase from an alkaliphilic isolate, Bacillus sp. KSM-S237. Extremophiles. 1997;1(3):151–6.

  42. Yin YR, Zhang F, Hu QW, Xian WD, Hozzein WN, Zhou EM, et al. Heterologous expression and characterization of a novel halotolerant, thermostable, and alkali-stable GH6 endoglucanase from Thermobifida halotolerans. Biotechnol Lett. 2015;37(4):857–62.

  43. Fowler CA, Sabbadin F, Ciano L, Hemsworth GR, Elias L, Bruce N, et al. Discovery, activity and characterisation of an AA10 lytic polysaccharide oxygenase from the shipworm symbiont Teredinibacter turnerae. Biotechnol Biofuels. 2019;12(1):232.

  44. Forsberg Z, Mackenzie AK, Sørlie M, Røhr ÅK, Helland R, Arvai AS, et al. Structural and functional characterization of a conserved pair of bacterial cellulose-oxidizing lytic polysaccharide monooxygenases. Proc Natl Acad Sci. 2014;111(23):8446–51.

  45. Chapon V, Czjzek M, El Hassouni M, Py B, Juy M, Barras F. Type II protein secretion in gram-negative pathogenic bacteria: the study of the structure/secretion relationships of the cellulase Cel5 (formerly EGZ) from Erwinia chrysanthemi. J Mol Biol. 2001;310(5):1055–66.

  46. Holm L, Laakso LM. Dali server update. Nucleic Acids Res. 2016;44(W1):W351–5.

  47. Py B, Bortoli-German I, Haiech J, Chippaux M, Barras F. Cellulase EGZ of Erwinia chrysanthemi: structural organization and importance of His98 and Glu133 residues for catalysis. Protein Eng. 1991;4(3):325–33.

  48. Shaw A, Bott R, Vonrhein C, Bricogne G, Power S, Day AG. A novel combination of two classic catalytic schemes. J Mol Biol. 2002;320(2):303–9.

  49. Koshland DE Jr. Stereochemistry and the mechanism of enzymatic reactions. Biol Rev. 1953;28(4):416–36.

  50. Chang CJ, Lee CC, Chan YT, Trudeau DL, Wu MH, Tsai CH, et al. Exploring the mechanism responsible for cellulase thermostability by structure-guided recombination. PLoS ONE. 2016;11(3):e0147485.

  51. Garg R, Srivastava R, Brahma V, Verma L, Karthikeyan S, Sahni G. Biochemical and structural characterization of a novel halotolerant cellulase from soil metagenome. Sci Rep. 2016;6(1):39634.

  52. Zhu Y, Han L, Hefferon KL, Silvaggi NR, Wilson DB, McBride MJ. Periplasmic Cytophaga hutchinsonii endoglucanases are required for use of crystalline cellulose as the sole source of carbon and energy. Appl Environ Microbiol. 2016;82(15):4835–45.

  53. Wood TM. Preparation of crystalline, amorphous, and dyed cellulase substrates. In: Methods in Enzymology. Oxford: Academic Press; 1988. p. 19–25.

  54. Zhang H, Yohe T, Huang L, Entwistle S, Wu P, Yang Z, et al. dbCAN2: a meta server for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 2018;46(W1):W95–101.

  55. Almagro Armenteros JJ, Tsirigos KD, Sønderby CK, Petersen TN, Winther O, Brunak S, et al. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat Biotechnol. 2019;37(4):420–3.

  56. Gasteiger E, Hoogland C, Gattiker A, Duvaud S, Wilkins MR, Appel RD, et al. Protein identification and analysis tools on the ExPASy server. In: Walker JM, editor. The proteomics protocols handbook. Totowa, NJ: Humana Press; 2005. p. 571–607.

  57. Miller GL. Use of dinitrosalicylic acid reagent for determination of reducing sugar. Anal Chem. 1959;31(3):426–8.

  58. Winn MD, Ballard CC, Cowtan KD, Dodson EJ, Emsley P, Evans PR, et al. Overview of the CCP4 suite and current developments. Acta Crystallogr D Biol Crystallogr. 2011;67(Pt 4):235–42.

  59. Murshudov GN, Vagin AA, Dodson EJ. Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr D Biol Crystallogr. 1997;53(Pt 3):240–55.

  60. Cowtan K. The Buccaneer software for automated model building. 1. Tracing protein chains. Acta Crystallogr D Biol Crystallogr. 2006;62(9):1002–11.

  61. Emsley P, Cowtan K. Coot: model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr. 2004;60(Pt 12 Pt 1):2126–32.


This research was financially supported by a grant from ERA-NET MarineBiotech, provided by the Research Council of Norway (Grant Number 283647), which funded the position of Dr. Madan Junghare, and by the NorZymeD and OXYMOD projects financed by the Research Council of Norway (Grant Numbers 221568 and 269408, respectively).


Author information

MJ performed most of the experiments, analysed the data and wrote the first draft of the manuscript. TV helped to perform the cellulase-LPMO synergy experiments. LF, BA and IL helped with crystallization and structural data analysis. VE helped to revise the final version of the manuscript. GVK supervised the project and revised the final manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Madan Junghare or Gustav Vaaje-Kolstad.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

Not applicable.

Supplementary Information

Additional file 1: Figure S1. TwGH5-6 model sequence coverage and predicted LDDT. Figure S2. Effect of NaCl concentration on the hydrolytic activity of TwCel5CAT and TwCel5CBM, measured using β-glucan (0.5% w/v) in 50 mM potassium phosphate buffer at pH 7.5, 50 nM enzyme and varying concentrations of NaCl (0–1.5 M), incubated at 30 °C for 60 min. Reducing sugar equivalents were calculated by the DNSA assay using glucose as standard. Values shown are the mean and standard deviation obtained from triplicates. Figure S3. Comparison of the experimentally determined structure of TwCel5CAT and its AlphaFold2 model. (A) Cartoon representation of the X-ray crystallographic model (cyan) and the AlphaFold2 predicted model (blue), structurally superimposed. (B) Comparison of conserved amino acids in the active site and substrate binding cleft. Figure S4. Prediction of the putative signal peptide in the deduced amino acid sequence of TwCel5-6 (residues 1–25) based on the SignalP-5.0 prediction tool. Table S1. Primers used for amplification of the genes encoding the TwCel5 variants.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Junghare, M., Manavalan, T., Fredriksen, L. et al. Biochemical and structural characterisation of a family GH5 cellulase from endosymbiont of shipworm P. megotara. Biotechnol Biofuels 16, 61 (2023).
