Genomic and transcriptomic analysis of the thermophilic lignocellulose-degrading fungus Thielavia terrestris LPH172

Background Biomass-degrading enzymes with improved activity and stability can increase substrate saccharification and make biorefineries economically feasible. Filamentous fungi are a rich source of carbohydrate-active enzymes (CAZymes) for biomass degradation. The newly isolated LPH172 strain of the thermophilic Ascomycete Thielavia terrestris has been shown to possess high xylanase and cellulase activities and tolerate low pH and high temperatures. Here, we aimed to illuminate the lignocellulose-degrading machinery and novel carbohydrate-active enzymes in LPH172 in detail. Results We sequenced and analyzed the 36.6-Mb genome and transcriptome of LPH172 during growth on glucose, cellulose, rice straw, and beechwood xylan. 10,128 predicted genes were found in total, which included 411 CAZy domains. Compared to other fungi, auxiliary activity (AA) domains were particularly enriched. A higher GC content was found in coding sequences compared to the overall genome, as well as a high GC3 content, which is hypothesized to contribute to thermophilicity. Primarily auxiliary activity (AA) family 9 lytic polysaccharide monooxygenase (LPMO) and glycoside hydrolase (GH) family 7 glucanase encoding genes were upregulated when LPH172 was cultivated on cellulosic substrates. Conventional hemicellulose encoding genes (GH10, GH11 and various CEs), as well as AA9 LPMOs, were upregulated when LPH172 was cultivated on xylan. The observed co-expression and co-upregulation of genes encoding AA9 LPMOs, other AA CAZymes, and (hemi)cellulases point to a complex and nuanced degradation strategy. Conclusions Our analysis of the genome and transcriptome of T. terrestris LPH172 elucidates the enzyme arsenal that the fungus uses to degrade lignocellulosic substrates. The study provides the basis for future characterization of potential new enzymes for industrial biomass saccharification. Supplementary Information The online version contains supplementary material available at 10.1186/s13068-021-01975-1.


Background
The biorefinery concept represents the basis for a more sustainable bio-based economy aimed at converting abundant renewable biomass sources into energy and value-added products. Today, around 40 lignocellulosic biorefineries operate across Europe [1]. Even though lignocellulose is a potential biomass resource, its degradation is impeded by high lignin content and heterogeneity

Open Access
Biotechnology for Biofuels *Correspondence: lisbeth.olsson@chalmers.se † Monika Tõlgo and Silvia Hüttner share first authorship 1 Wallenberg Wood Science Centre, Department of Biology and Biological Engineering, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden Full list of author information is available at the end of the article of its polysaccharide constituents [2,3]. Biomass saccharification into fermentable monomeric sugars by enzymatic hydrolysis is a crucial step in a biorefinery, but it is hindered by the high cost of enzymes. Indeed, enzymes have been estimated to add 1 USD/gallon to the cost of bioethanol produced from poplar. Thus, there is strong demand for improved enzyme activity and stability [4].
Various potential industrial enzymes exist in nature [5] and the Kingdom Fungi, with more than a million species, represents a particularly rich source [6]. As major biomass degraders, fungi possess a broad array of enzymes suitable for lignocellulose degradation, which are often secreted in large quantities [7]. Thermophilic and thermo-tolerant fungi are especially interesting, as their enzymes can usually endure harsh conditions used in the industry, such as extreme temperatures or pH and harsh solvents [8,9]. For example, biomass hydrolysis by the industrial Trichoderma reesei enzymes in a separate hydrolysis-fermentation process (SHF), is performed at 45-50 °C and pH 5 and therefore additional enzymes that are added to this process to enhance hydrolysis further should show high activity under the same conditions. Thermostable enzymes can lower industrial processing costs as they can achieve faster reaction rates, greater stability, and are more easily adjustable to various setups [10]. Yet, it should be kept in mind that novel enzymes from thermophiles are not necessarily thermostable when produced heterologously [11].
Thielavia terrestris (syn Thermothielavioides terrestris [12]) is a well-known filamentous fungus identified in 1983 as a potential source of thermostable industrial enzymes based on successful (hemi)cellulase assays [13,14]. The species is a thermophilic saprobic Ascomycete isolated mainly from soil and compost in Asia [15][16][17] and from a cave cricket species in North America [18]. T. terrestris also played a pivotal role in the discovery of the cellulase-boosting effect of the glycoside hydrolase family 61 (GH61) proteins [19][20][21][22][23], today known as auxiliary activity family 9 (AA9) lytic polysaccharide monooxygenases (LPMOs) [24]. As described by Merino and Cherry [19], cultivation broth from T. terrestris primed for cellulase production showed striking synergy in degrading pre-treated corn stover when supplemented with the enzyme cocktail Celluclast. In 2011, T. terrestris strain NRRL 8126 and Myceliophthora thermophila ATCC 42464 were the first thermophiles whose genomes were fully sequenced and the first filamentous fungi with known telomere-to-telomere genome sequences [25].
The same study showed that T. terrestris could potentially degrade all plant cell wall polysaccharides and the fungus hydrolyzed alfalfa straw at temperature optima of 40 °C and 60 °C. As shown by proteomics analyses [26] and detailed biochemical characterization [15,16,20,[27][28][29][30][31][32][33], T. terrestris produces an array of biomass-degrading enzymes. However, no study has elucidated the gene expression mechanisms underlying the lignocellulolytic machinery of the fungus in detail.
In recent years, it has become clear that genetic or gene expression differences between fungal strains of the same species are not uncommon [17,[34][35][36][37]. In this study, we set out to sequence and analyze the genome and transcriptome of the newly isolated T. terrestris strain LPH172, which is characterized by superior enzymatic activity, thermostability, and pH stability [17]. Our current study aimed to elucidate the lignocellulose-degrading machinery of the fungus in detail and identify novel carbohydrate-active enzymes (CAZymes). We observed some genomic differences between LPH172 and the previously sequenced strain NRRL 8126. To corroborate genomic CAZyme analysis with transcriptome data, we grew the fungus on four substrates: glucose, Avicel, rice straw, and beechwood xylan. We observed that the fungus relied mainly on LPMOs and canonical cellulases when grown on cellulosic substrates and on hemicellulosic substrates, more conventional hemicellulases were induced together with some LPMOs. Interestingly, we also report co-expression and co-upregulation between LPMOs and other AA enzymes.

Strain identification
We previously isolated the T. terrestris strain LPH172 from compost in Northern Vietnam and showed that it could be exploited as an industrially relevant enzyme producer [17]. To confirm the identity of the fungus, first, we used two common fungal housekeeping genes [38] encoding transcription elongation factor 1-α and β-tubulin. The homologous gene sequences used for the identification procedure are listed in Additional file 1. Both housekeeping genes were 100% identical (e-value 0) to the T. terrestris NRRL 8126 homologues.
Furthermore, we used phylogenetic analysis of closely related species to confirm the strain identity. Three maximum likelihood phylogenetic trees were constructed based on the internal transcribed spacer (ITS-D1D2), RNA polymerase II (RPB2) and β-tubulin (TUB2) genes. In all three constructed trees, the T. terrestris strain LPH172 clustered closely together with four other T. terrestris strains (Additional file 1: Figure S1).
Both of these results confirmed the fungus in the current study to be a strain of T. terrestris.

Growth on different carbohydrates
To assess the ability of T. terrestris LPH172 to utilize different carbon sources, the strain was grown on various defined substrates on agar. Growth was measured by the diameter and density of mycelia and was compared to a selection of known mesophilic and thermophilic biomass degraders (Fig. 1). T. terrestris LPH172 grew best on starch and xylose, followed by glucose, cellobiose, and beechwood xylan, whereas only modest growth was observed on the cellulosic substrates Avicel and carboxymethyl cellulose (CMC). This finding suggests relatively high activity of amylases, xylanases, and β-glucosidases. Direct comparison to the previously sequenced T. terrestris CBS 117535 (GenBank acc. nr PRJNA249224) showed that LPH172 grew slightly better on most substrates except glucose. Good growth was observed on pectin and inulin (a fructose-based polymer), whereas growth on locust bean gum and guar gum (galactomannans), as well as bark powder was poor (Additional file 2).

Genome characterization General features
The genome of T. terrestris LPH172 was sequenced on a PacBio RS II instrument by GATC Biotech (Constance, Germany); it yielded 5,27,523 reads comprising over 7 billion bases. Table 1 gives an overview of the sequenced T. terrestris LPH172 genome. Its size was determined to be 36.6 Mb and guanine and cytosine (GC) content was 54.80%. Assembly quality, based on basic sequence statistics, was high as revealed by an average contig size (N50) of 3 Mb and N50 read length of 19,832. To assess completeness and integrity of the genome assembly, Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis was performed [39]. Over 98% of BUSCO genes in the LPH172 genome were complete, indicating excellent assembly integrity. Gene prediction algorithms identified 10,128 protein-coding genes. Fig. 1 Growth of T. terrestris LPH172 and other biomass-degrading filamentous fungi on different carbon sources. Seven different carbohydrate substrates at 2% (w/v) were used as sole carbon sources for growth on agar plates: monosaccharides (glucose, xylose), disaccharides (cellobiose), and polysaccharides (starch, Avicel, carboxymethyl cellulose-CMC, beechwood xylan). No carbon source was added in the control. The plates were incubated at 30 °C (S. commune, A. oryzae) or 50 °C (M. thermophila, M. cinnamomea, T. terrestris) for 2-7 days. Growth was evaluated visually and categorized from − (no growth) to +++ (very good growth), depending on the extent and density of the mycelium The size of fungal genomes can vary by orders of magnitude and the average for Ascomycota is 36.91 Mb [40,41]. Table 2 gives a brief overview of the characteristics of the LPH172 genome compared to other industrial and lignocellulose-degrading fungi with varied origin and thermostability. Even though the genome of T. terrestris NRRL 8126 (GenBank assembly nr GCA_000226115), sequenced in 2010, was slightly larger than that of LPH172, our analysis suggested LPH172 contained approximately 200 more genes. This discrepancy, in addition to inherent differences between the two strains, is likely a consequence of ongoing improvements in sequencing and annotation. The genome size of LPH172 was similar to those of other fungi listed in Table 2, as well as to the average Ascomycota genome. The same was true for the average genome GC content. The average gene length in strain LPH172 was 1628 bp and the average coding sequence was 1355 bp (Table 1). On average, three exons per gene were predicted, with exons covering 45.10% of the genome. 88% of the genes could be functionally annotated with BLASTp, 69% of which with high certainty (e-value <1e −6 ).

Thermostability features
Although there is no clear consensus on the causes contributing to elevated optimum growth temperatures and thermotolerance in fungi, possible factors include a reduction in genome size [49], higher average GC content in coding regions, and greater GC content in the third position of codons (GC3 content) [25,50].
In contrast to the thermophilic Ascomycete Malbranchea cinnamomea [43], the genome of T. terrestris LPH172 was not smaller compared to that of other mesophilic fungi ( Table 2). Normalized (gene length) GC content in gene-coding sequences was 63.5%, which was higher than the genome average of 54.80%. When looking only at the subset of genes encoding CAZymes, the average normalized GC content was even higher (64.5%). Normalized GC3 content in LPH172 was also high, amounting to 80.7% in coding sequences and 85.7% in CAZyme-encoding sequences. We also detected gene TT_05393, encoding an unknown protein with 33% identity (e-value 1.3e −19 ) to the known thermotolerance gene THTA from Aspergillus fumigatus (GenBank: AY560012.1) [51].

CAZyme comparison with other fungi
Plant biomass-degrading and other CAZymes are catalogued into classes, families and subfamilies in the Carbohydrate Active enZymes (CAZy) database (http:// www. cazy. org/) [52].The number of individual CAZyme domains and distribution across different CAZy families in T. terrestris LPH172 was analyzed and compared to other known fungal biomass degraders to assess the propensity for lignocellulose degradation (  Fig. 2), but a larger complement of AA family domains, particularly AA7 (n = 20), AA9 (n = 18) and AA3 (n = 16) (Fig. 3). Five members of AA11 (chitincleaving) LPMO domains were detected in both T. terrestris strains, but no AA13 (starch-cleaving LPMOs) or AA14 (xylan-cleaving LPMOs) domains were observed. LPH172 and NRRL 8126 were the only fungi, among the ones selected, presenting an AA16, a recently characterized C1-hydroxylating LPMO [53]. Putative candidates for CAZymes capable of degrading all major lignocellulosic polymers (cellulose, xylan, xyloglucan, (galacto)glucomannan, pectin, and lignin), as well as starch, inulin, and chitin were found. This finding was in line with growth of T. terrestris on all these carbon sources (Additional file 2). At least ten putative homologues of known transcription factors directly regulating (hemi)cellulose utilization in ascomycetous fungi [54] were also detected in the genome of T. terrestris LPH172 (Additional file 4).

Transcriptome analysis
Highly expressed CAZyme-encoding genes during growth on Avicel, rice straw, and beechwood xylan To verify the genome annotation of LPH172 and analyze gene expression, in particular of CAZyme-encoding genes, the transcriptome was analyzed under different growth conditions. The fungus was grown in shake flasks on four substrates-glucose, Avicel, rice straw and beechwood xylan-and total mRNA was extracted and sequenced. Glucose was chosen as reference monosaccharide because its degradation involves a limited number of CAZymes and should, therefore, reflect expression of mostly constitutive CAZyme-encoding genes. Beechwood xylan, comprising a xylan backbone with 4-O-methyl glucuronic acid side groups, was selected to detect CAZymes required for hardwood hemicellulose degradation [43]. Rice straw, which contains approximately 12% lignin, 38% cellulose, and 25% hemicellulose [55] was chosen to represent a complex, heterogeneous substrate requiring a large array of different CAZymes for degradation. Importantly, rice straw has also vast potential as feedstock in biorefinery applications. Finally, Avicel, which is up to 98% cellulose [56,57], was selected to identify enzymes required to degrade a highly crystalline and recalcitrant cellulosic substrate. Initially, our transcriptome analysis also included corn cob xylan as an arabinoxylan-containing substrate model for cereals. The results were not included because the purchased corn cob xylan turned out to be composed only of xylo-oligomers [58]. Transcriptome data from RNAseq was used to refine gene annotation through ab initio training with GeneMark v4.3 and an evidence-guided build with MAKER package v3.01.1. Results are summarized in Tables 4-6.  To identify which CAZyme-encoding genes were the most highly expressed (by transcript abundance) on each chosen substrate, we looked at the top 20 (arbitrary number) candidates under each growth condition, ranked by their average transcripts per million (TPM) value. The complete list of all expressed genes with their TPM values is available in Additional file 5.
On rice straw, the most highly upregulated genes encoded a putative CE1 acetylxylan esterase (TT_05636 with log2FC 16), an AA9 LPMO (TT_09068 with log2FC 13) and a GH11 endo-β-1,4-xylanase (TT_01839 with log2FC 12). In general, the set of forty most highly upregulated putative CAZyme-encoding genes on rice straw shared some candidates with Avicel, such as nine AA9 LPMO genes and four hemicellulose-active GH10 and GH11 xylanases, a GH7 endo-β-1,4-glucanase and CE1, CE5 and CE16 esterases. A putative gene encoding a CBM50 domain (TT_00910) was highly upregulated both on Avicel and rice straw. In addition to the AA9 LPMO genes highly upregulated on Avicel as well, on rice straw two more AA9 genes appeared: TT_03770 and TT_07455 (both with log2FC 6). Generally, more hemicellulose-active enzyme encoding genes were among the forty highly upregulated genes on rice straw than on Avicel, which is expected with the more complex substrate composition of rice straw. These genes encoded putative GH10 (TT_09033 with log2FC 8) and GH11 (TT_02489 with log2FC 8) endo-β-1,4-xylanases, a GH62 α-l-arabinofuranosidase (TT_09005 with log2FC 10), two CE3 esterases (TT_07717 and TT_08970, both with log2FC 6), a CE1 feruloyl esterase (TT_06092 with Fig. 4 Combined expression and upregulation values of CAZyme-encoding genes during cultivation on four different substrates. Putative CAZyme-encoding genes involved in biomass degradation were analyzed for their expression levels (TPM, transcripts per million) from three biological replicates, as well as their differential expression (log2FC). The heatmap shows a combination of top forty most highly upregulated CAZyme-encoding genes on three substrates Avicel, rice straw (RS), beechwood xylan (BX) when compared to glucose (Glc). Shading ranges from low expression (light blue) to high expression (magenta). Log2 fold-change (log2FC) shows gene expression during cultivation on Avicel, RS, and BX compared to cultivation on glucose. Shading of upregulated genes (i.e., gene transcripts more abundant on Avicel, RS, and/or BX than on glucose) ranges from light yellow (low upregulation) to dark green (high upregulation). Downregulated genes or genes for which no differential expression was detected or where upregulation was not significant are indicated by blank cells. Only significantly upregulated genes are shown (p ≤ 0.05). All numbers were rounded to the nearest integer.  log2FC 8), a CE5 acetylxylan esterase (TT_05762 with log2FC 7) and a CE5 cutinase (TT_08797 with log2FC 8). The two putative mannosidase encoding genes highly upregulated on Avicel (TT_09640 with log2FC 8 and TT_06537 with log2FC 7) were also highly upregulated on rice straw. Two putative AA7 oxidoreductase encoding genes (TT_06681 with log2FC 4 and TT_03025 with log2FC 5), as well as a transcript with an AA8 domain (TT_02325 with log2FC 10), were also among the top forty highly upregulated genes on rice straw.

Discussion
The present study sought to explain in detail the enzymatic machinery T. terrestris LPH172 possesses to break down major lignocellulosic polymers based on genome and transcriptome analysis. Specifically, cellulose degradation appeared to rely mostly on LPMOs and some highly expressed canonical cellulases. Compared to other carbon sources, growth on Avicel was poor, yet LPH172 performed better on this substrate than most other fungi (Fig. 1). Poor growth on Avicel could result from lack of cellulase induction or the high crystallinity of Avicel. We noticed similar discrepancy between putative cellulose-degrading genes and poor growth on Avicel in our previous work with M. cinnamomea [43]. Growth discrepancies between the two T. terrestris strains LPH172 and CBS 117535 corroborate previously reported differences in biomass degradation and enzyme production between strains of the same species [17]. It has been reported that two strains of A. niger, for example, produced diverse sets of biomass-degrading enzymes, even when grown on the same plant biomass substrates [37,60]. This was mainly attributed to the postgenomic and regulatory differences between strains. Differential gene expression analysis helped to suggest the main enzymes involved in degradation of tested substrates (Fig. 4). The range of upregulated CAZyme-encoding genes was perhaps more diverse than expected, with genes encoding mannanases, xylanases, pectinases and lignin-active enzymes being upregulated on all substrates regardless of the presence or absence of the corresponding polymers. Co-regulation of biomass-degrading enzymes or the presence of traces of other polymers could explain induction of these genes. Enzymological studies that compare the activities and activity optima of these enzymes will help determine the function of seemingly redundant enzymes such as the abundantly expressed AA9 LPMO genes.
(Hemi)cellulase encoding genes were highly expressed and upregulated on Avicel, which is a cellulosic substrate and, hence, should not require hemicellulases for degradation. However, this type of unanticipated expression has been shown before in T. terrestris, when the cellulosic substrate CMC induced xylanase production [16]. On the other hand, minor amounts of xylan previously reported to be found in Avicel [56,57] could also stimulate xylanase expression. Putative direct (hemi)cellulase regulating transcription factors were analyzed in the genome of LPH172 and the presence of the prominent promiscuous regulator Xyr1 known to affect both cellulases and hemicellulases [61,62] was detected (Additional file 4). This finding supports the hypothesis of (hemi)cellulase coregulation in this fungus or alternatively, a combination of co-regulation and induction due to the xylan contamination in Avicel.
Since their discovery a decade ago, LPMOs have been studied in several different fungal, bacterial and even insect species, with new families and activities being continuously reported [53,63,64]. In T. terrestris LPH172, the transcriptomic data indicate that AA9 LPMOs play a crucial role in cellulose degradation, as six such enzymes were highly upregulated and five were very highly expressed during growth on Avicel. Interestingly, on all three polymeric substrates the highest upregulated genes belonged to this family. Several AA9 genes in LPH172 were highly upregulated on rice straw, which contains some cellulose, but also on purified beechwood xylan. We hypothesize that traces of cellulose in the beechwood xylan substrate induce the expression of cellulose-degrading enzymes, or that co-regulation occurs. Alternatively, certain AA9 LPMOs could act on non-cellulosic substrates, including xylan, mannan or xyloglucan, as reported, for instance, for AA9 LPMOs from M. cinnamomea [65] and M. thermophila [66] and N. crassa [67]. A clear preference for CBM1-containing genes was shown among the highly expressed and upregulated CAZyme genes on Avicel but also on rice straw, supporting the predicted cellulose-binding character of CBM1 proteins.
Other members of AA CAZy families were also highly expressed and/or significantly upregulated during growth of LPH172 on various substrates. AA3_1-AA8 cellobiose dehydrogenases (CBD) act as reducing agents to fuel LPMO reactions [[ 23 , 28 , 68 -71 ]] . However, not all fungi containing LPMO genes contain supplementary cellobiose dehydrogenase encoding genes [68]. We observed high co-expression and co-upregulation of these enzyme encoding genes on cellulose-containing substrates. The AA3_1-AA8 CBD gene (TT_04380) that was highly coupregulated with several AA9 LPMOs in our study has been shown to act in synergy with a Thermoascus aurantiacus GH61A (AA9) enzyme [28]. Interestingly, on the cellulosic substrates we also noted the upregulation of two AA8 cytochrome domain containing CAZyme genes (TT_02325 and TT_09190) which could also potentially reduce the copper in the active center of LPMOs. However, the electron transfer to these AA8 domains remains unclear. According to Pfam analysis, TT_09190 also contains a putative sugar transporter domain. Moreover, absence of AA8 co-upregulation with AA9 LPMO genes on beechwood xylan might indicate that a different reduction system is utilized on hemicellulosic substrates than on the cellulosic substrates. AA3_2 single-domain flavoenzymes have also been shown to act in synergy as electron donors for LPMOs [72]. Here, we detected coexpression and co-upregulation of an AA3_2 dehydrogenase gene (TT_08234) and AA9 genes on Avicel and beechwood xylan. Other AA CAZymes capable of producing H 2 O 2 , and therefore potentially serving as LPMO co-factors, are AA7 family oxidoreductases. In fact, an AA7 enzyme with a novel oligosaccharide dehydrogenase activity has been recently shown to both transfer electrons to the LPMO active site copper but also produce H 2 O 2 as a co-substrate of LPMOs [72]. Here, AA7 encoding genes TT_06681 and TT_03025 were upregulated on both Avicel and rice straw; however, their exact roles in biomass degradation have to be elucidated by future studies. Interestingly, out of the 20 AA7 domains in this strain only four were expressed according to our transcriptomic analysis. An AA3 enzyme (TT_07514) that did not fit in the top genes mentioned in our analysis in Fig. 4 but was still significantly upregulated on all substrates, is not yet classified into an AA3 sub-family according to dbCAN. This AA3 encoding gene found to contain two putative GMC-oxidoreductase domains using Pfam analysis, as well as a putative bacterial luciferase-like domain. To our knowledge, such a domain has not been seen before in combination with AA3 domains and may indicate a fifth sub-family of AA3 CAZymes, but further studies are crucial to substantiate this hypothesis.
In general, the elevated number of LPMO-encoding genes in the fungus, together with their high expression and upregulation confirm the importance of (AA9) LPMOs for plant biomass decomposition by T. terrestris and explains why studying the secretomes of this species had such a clear cellulase-boosting effect [19]. The numerous LPMOs in filamentous fungi support the concept of microbial mutualism. According to this concept, some fungi are responsible mainly for LPMO secretion and for attacking crystalline substrate surfaces, thereby making way for others to degrade amorphous polysaccharides and eventually benefitting the whole microbial community [60,73]. Such interactions have been documented with regard to the mutually beneficial synthesis of vital growth substances in fungi [74]. Analogously, white rot fungi are known to degrade lignin, whereas brown rotters mainly modify lignin [7], indicating unique specifications for lignocellulose degradation in different filamentous fungi.
Finally, regarding possible genetic factors contributing to fungal thermostability [50], the genome of T. terrestris LPH172 revealed high GC content in the coding sequences of all genes and in those encoding CAZymes as well. In addition, the observed high GC3 content could contribute to the thermophilic lifestyle in T. terrestris, as also noted by Berka et al. [25]. Further research is needed to confirm and elucidate the mechanisms of this interesting phenomenon.

Conclusion
We sequenced and analyzed the genome of a novel T. terrestris strain LPH172. Both genome and transcriptome analyses of the novel thermophilic T. terrestris strain LPH172 revealed in detail the enzymatic machinery used by the fungus to break down lignocellulosic biomass. Using transcriptome data from growth on glucose, Avicel, rice straw, and beechwood xylan we conclude that the fungus relies on an LPMO-centered strategy when grown on cellulosic substrates and more on canonical hemicellulases when grown on xylan. The LPMO-focused degradation approach is supported by co-regulation of other AA enzyme encoding genes that likely are expressed as LPMO co-factors. We also detected high GC and GC3 content as possible genomic characteristics contributing to the thermostability of the strain. The present study provides the basis for further biochemical characterization of the lignocellulose-degrading machinery in T. terrestris and filamentous fungi in general. The apparent complementary or redundant nature of certain CAZymes identified in this study needs to be investigated further with enzymological techniques, whereas a more detailed physiological understanding can be achieved with additional transcriptome and proteome studies.

Isolation and maintenance of fungi
Samples containing decaying plant residues (compost, grasses, rice straw, mushroom ground, wood, and soil) were collected from different provinces in Northern Vietnam during 2012-2016. Fungal strains were isolated as described by Thanh et al. [17] by incubation at 50 °C and under acidic conditions (pH 2.0) on medium containing untreated rice straw as the sole carbon source. After 7-10 days of incubation, fungal colonies were transferred to potato dextrose agar (PDA) plates and purified by hyphal tip culture at 50 °C. The isolates were maintained in PDA slants in a refrigerator at 2-8 °C.

DNA and RNA extraction
To extract genomic DNA, strain LPH172 was grown on a PDA plate for 5 days at 50 °C, the mycelium was divided into six equal parts, and each part was used as inoculum in 100 mL liquid base medium containing 2% glucose. Cultures were incubated in 500-mL baffled Erlenmeyer flasks at 50 °C and 150 rpm for 48 h. The mycelium was harvested by filtering through sterile Miracloth (Merck Millipore) and rinsing extensively with liquid base medium without glucose. After pressing out excessive moisture by hand, the mycelium was snap-frozen in liquid nitrogen and ground to a fine powder in a Tissue-Lyser (Qiagen) at 30-s, 30-Hz intervals with pre-cooled tungsten steel balls. CTAB buffer (2% CTAB, 100 mM Tris-HCl, pH 8.0, 20 mM EDTA, 1.4 M NaCl) was immediately added at 10 mL/g mycelium , briefly vortexed and the suspension incubated at 57 °C for 1 h. DNA was purified three times by phenol-chloroform extraction until no interphase was visible, followed by 2-propanol precipitation [75]. The resulting pellet was resuspended in 1 mL TE buffer (10 mM Tris-HCl, pH 8.0, 1 mM EDTA) and incubated with 200 μg mL −1 RNase A (Thermo Fisher Scientific) at 60 °C for 2 h to remove residual RNA. After an additional round of phenol-chloroform extraction, the pellet was resuspended in 150 μL TE buffer and DNA was further purified with the DNeasy Plant Mini Kit (Qiagen) according to the manufacturer's instructions. Quality of the purified DNA was verified by agarose gel electrophoresis, Nanodrop (Thermo Fisher Scientific), and Qubit Fluorometer (Thermo Fisher Scientific) before genome sequencing.
For RNA extraction, a 100-mL pre-culture on glucose was prepared as described above for DNA extraction. After harvesting and washing the mycelium, this was divided equally between 250-mL baffled Erlenmeyer flasks containing 50 mL basal liquid medium supplemented with 2% Avicel, beechwood xylan, rice straw, corn cob xylan or glucose. After 5 days of cultivation at 50 °C and 150 rpm, the mycelium was harvested, frozen, and ground to a powder, as described for DNA extraction. RNA was extracted using TRIzol (Invitrogen) and chloroform, and further purified with the RNAeasy Plant RNA kit (Qiagen) with on-column DNAse digestion. Quality of the purified RNA was checked by agarose gel electrophoresis, Nanodrop (Thermo Fisher Scientific), Qubit Fluorometer (Thermo Fisher Scientific) and Bioanalyzer (Agilent Technologies). High quality RNA for transcriptome sequencing had and OD 260/280 of 1.8-2.0, an OD 260/230 of 2-0-2.2 and an RNA integrity number (RIN) of ≥ 8. Unless otherwise mentioned, all chemicals were supplied by Merck, except for corn cob xylan (Carbosynth) and rice straw powder (Center for Industrial Microbiology).

Genome sequencing, assembly, and analysis
Genome sequencing and assembly was carried out by GATC Biotech (Constance, Germany). According to the company's proprietary protocols, an 8-12-kb library was prepared by DNA fragmentation, size selection, end repair and adapter ligation, primer annealing, and polymerase annealing. Sequencing was performed on a PacBio RS II instrument (raw data output 400 Mb for a genome of ~ 37 Mb) with an average read length of > 6000 bp. De novo assembly of PacBio RS reads was achieved with proprietary GATC Biotech methods optimized for the sequencing technology, read length and type of raw data and included read filtering by length and quality, error correction of long PacBio reads through alignment of short reads ("reads of insert"), assembly of error corrected reads and assembly polishing. Completeness of the genome was assessed with BUSCO (v3.0.2b) against the fungi_odb9 gene dataset (http:// busco dev. ezlab. org/ datas ets/ fungi odb9. tar. gz). To analyze GC and GC3 content, seqinr, Biostrings, and sscu R packages were used [76][77][78].

Transcriptome sequencing, assembly, and analysis
Transcriptome sequencing was performed by GATC Biotech (Constance, Germany) with the Inview Transcriptome Explore package. Briefly, a randomly primed cDNA library was prepared by purifying poly-A-containing mRNAs, fragmenting, adapter ligation, and PCR amplification. Illumina sequencing with single reads (50 bp) generated 24 million reads per sample that could be mapped to the reference. Quality checks were performed, and all samples achieved a percentage of clean reads > 95%. Assembly and annotation were done by National Bioinformatics Infrastructure Sweden (NBIS). Guided assembly was done with Tophat2 (v2.0.9) and Stringtie (v1.2.2), whereas repeat masking employed the Repeat-Modeler package (v1.0.8). Ab initio training for annotation was done with GeneMark-ET (v4.3), Augustus, and snap. Gene builds were computed using the MAKER package (v3.01.1), which employed the following software: exonerate (v2.4), Blast+ (v2.2.28), RepeatMasker (v4.0.3), Bioperl (v1.6.922), Augustus (v2.7), tRNAscan (v1.3.1), snap, and GeneMark-ET (v4.3). Functional annotation of genes and transcripts was performed using the translated CDS features of each coding transcript. For each predicted protein sequence, a BLASTp search was performed on the UniProt/Swiss-Prot reference dataset with default parameters (e-value cut-off = 1, similarity cut-off = 30%) to retrieve gene name and protein function. Secreted proteins were predicted using the SignalP 4.0 Server. Genes containing CAZy domains were identified using dbCAN2 (accessed October 2019). Reads were assigned and counted to the genome annotation using featureCounts of the Rsubread package (v2.2.6) in R and converted to TPMs. Differential gene expression was analyzed using edgeR (v3.30.3) [59] in R with TMMnormalization and removal of reads with less than 1 read per million in all samples.