Integration of Aspergillus niger transcriptomic profile with metabolic model identifies potential targets to optimise citric acid production from lignocellulosic hydrolysate

Background Citric acid is typically produced industrially by Aspergillus niger-mediated fermentation of a sucrose-based feedstock, such as molasses. The fungus Aspergillus niger has the potential to utilise lignocellulosic biomass, such as bagasse, for industrial-scale citric acid production, but realising this potential requires strain optimisation. Systems biology can accelerate strain engineering by systematic target identification, facilitated by methods for the integration of omics data into a high-quality metabolic model. In this work, we perform transcriptomic analysis to determine the temporal expression changes during fermentation of bagasse hydrolysate and develop an evolutionary algorithm to integrate the transcriptomic data with the available metabolic model to identify potential targets for strain engineering. Results The novel integrated procedure matures our understanding of suboptimal citric acid production and reveals potential targets for strain engineering, including targets consistent with the literature such as the up-regulation of citrate export and pyruvate carboxylase as well as novel targets such as the down-regulation of inorganic diphosphatase. Conclusions In this study, we demonstrate the production of citric acid from lignocellulosic hydrolysate and show how transcriptomic data across multiple timepoints can be coupled with evolutionary and metabolic modelling to identify potential targets for further engineering to maximise productivity from a chosen feedstock. The in silico strategies employed in this study can be applied to other biotechnological goals, assisting efforts to harness the potential of microorganisms for bio-based production of valuable chemicals. Supplementary Information The online version contains supplementary material available at 10.1186/s13068-021-02099-2.

underexploited as it is saprophytic in nature with an ability to assimilate at least 69 carbon sources and 30 nitrogen sources [6]. There is an increasing need to unlock this metabolic potential, so that A. niger can play a key role in harnessing the value of underutilised second-generation feedstocks for the bioeconomy [7]. One such feedstock is sugarcane bagasse, the main by-product of sugarcane processing and a potential source of lignocellulosic sugars. Global sugarcane production was around 1900 million tonnes in 2013 [8], generating around half a billion tonnes of bagasse. To achieve cost-competitive citric acid production from bagasse hydrolysate requires the optimisation of strains away from sucrose-based fermentation to bagasse hydrolysate as the fermentation medium.
Strain optimisation can be achieved either via cycles of random mutagenesis and selection or by targeted engineering. The former is well demonstrated for citric acid production by A. niger [9], and although successful, its iterative nature makes it laborious and requires a suitable selection and evolution strategy to be available or designed. Rational strain engineering provides a faster strain development process that achieves the required genetic changes in a more stable manner. Optimising strains via targeted engineering is dependent on a metabolic understanding of the target organism and an ability to accurately identify targets. The establishment of omics technologies has enabled researchers to develop a more comprehensive understanding of the target organism; however, this can be challenging given the volume of data from omics analyses. One core systems biology method, constraint-based metabolic modelling, has now developed an extraordinary number of differing methods to address this challenge and integrate omics data with metabolic models [10][11][12][13][14][15][16].
In this study, we highlight the potential of bagasse as a feedstock for citric acid production, examining the performance of A. niger for the fermentation of bagasse hydrolysate to citric acid. Using fermentative time series data, we adapted our dynamic model [17] to capture the dynamics of bagasse hydrolysate fermentation. We show that the performance of the strain in this study is suboptimal and investigate further using transcriptome analysis at key fermentation timepoints. By employing a novel method involving an evolutionary algorithm guided by transcriptome data, we identify targets to achieve optimal citric acid productivity from bagasse hydrolysate.

Fermenting sugarcane bagasse hydrolysate to produce citric acid
To evaluate the fermentation of sugarcane bagasse hydrolysate for the production of citric acid, we obtained fermentative time series data on citric and biomass output as well as glucose, xylose, and phosphate input. From a hydrolysate containing 120 g/L total sugars consisting of glucose (80 g/L) and xylose (40 g/L), 50 g/L citric acid was produced in 6 days (Fig. 1). Glucose was fully consumed by day 5 at which point xylose consumption increased significantly with full consumption of sugars by day 7, indicating a sequential uptake mechanism. We observed similar characteristics to citric acid fermentations performed previously [17] with the onset of citric acid production coinciding with the full depletion of external phosphate and a switch to phosphate-limited growth.

Simulating the fermentation of sugarcane bagasse hydrolysate to citric acid by dynamic modelling
To capture the dynamics of sugarcane bagasse hydrolysate fermentation in silico, we adapted our dynamic modelling framework [17] to reflect mixed glucose/xylose fermentations. The adapted model simulates the sequential uptake of glucose and xylose and with adjustments made to kinetic parameters (see Methods) gives close fits to the in vivo fermentation data (Fig. 1). The model estimated that citric acid titres could reach a maximum of 85 g/L, almost twofold higher than what we observed in vivo (Fig. 1). By imposing a constraint on citric acid output in silico, the model was able to reflect in vivo citric acid production ( Fig. 1), suggesting the strain we used is suboptimal and highlighting the need for strain optimisation to realise optimal productivity.

Transcriptomic analysis at selected timepoints to investigate the fermentation of sugarcane bagasse hydrolysate to citric acid
To extend our investigation, we performed transcriptomic analysis at three key fermentation timepoints (Fig. 1). The first timepoint (T1) was taken, while external phosphate was still present before the onset of citric acid production and phosphate-limited growth. The other two timepoints (T2 and T3) were taken during citric acid production; the first of these (T2), while glucose was being consumed and the second (T3) during the main xylose consumption phase after glucose was fully consumed. Differential expression analysis revealed a greater degree of similarity between the two citric acid producing timepoints (T2 and T3) than for comparisons between these and the non-citric acid producing timepoint (T1) (Fig. 2).
To enable us to identify potential in vivo constraints that limit citric acid production, we associated transcripts with the reactions in the metabolic model and determined expression at a reaction-level. The most differentially expressed transcripts with reaction associations are shown in Tables 1, 2, 3. With reaction-level expression determined, we constructed metabolic schematics to visualise the changes in the transcriptome and their reaction-level effects for a given comparison (Fig. 3).
In comparing T1 with T2, the scale of change is clear when transitioning to citric acid production with widespread differential expression events observed across metabolism (Fig. 3A). In particular, reactions involved in biomass production were down-regulated, while citrate export was up-regulated together with the downregulation of TCA cycle reactions involved in citrate catabolism. Unexpectedly, pyruvate carboxylase whose activity is important to citric acid production [18] was down-regulated, suggesting this step as a point of constraint in vivo.
The expression changes are less extensive when transitioning from glucose to xylose consumption and appear to be directed at the change in substrate use (Fig. 3B). These include up-regulation of xylose import and xylulose kinase as well as phosphoketolase and acetate kinase that appear to activate an alternative xylose catabolic pathway, which may be associated with upregulation of the glyoxylate shunt through an increased supply of acetyl-CoA. We also observed further  up-regulation of citrate export, yet the rate of citric acid production in silico is around 2.3 times higher at T2 than T3 when citric output is unconstrained ( Table 4). The lower citrate exporter expression at T2 with respect to T3 may indicate citrate export as a point of constraint in vivo.

Investigating suboptimal citric acid production by transcriptome-guided in silico evolution
To develop a metabolic understanding of suboptimal citric acid production, we developed an evolutionary algorithm to perform in silico evolution of the model with the aim of reflecting non-optimised strains. We focused on T2 as citric output is around 2.6 times higher at T2 when unconstrained (Table 4) than when constrained (Table 5) to fit in vivo data, whereas citric output at T3 is virtually the same. The objective was to identify changes to flux bounds that constrain citric output to the value that closely fits in vivo data while maintaining the same carbon input and biomass output. To achieve this, we adapted an evolutionary algorithm [19] to evolve the model to more accurately reflect the in vivo metabolic state that is associated with constrained citric production. As many solutions may exist to this, we used the transcriptomic data to guide the in silico evolution to limit solutions to those that are more likely to resemble the one indicated by the transcriptome. This constrained the evolutionary algorithm to alter flux bounds only on reactions where there is a significant differential expression, with such cases implying transcriptional regulation over the reaction's activity. We compared and analysed the solutions from eight independent runs of the evolutionary algorithm to suggest targets for increasing citric acid productivity (Fig. 4). In total, we found 91 reactions suggested for targeted intervention of their activity; 65 for down-regulation and 26 for up-regulation (Table 6). Together, the list of targets provides high coverage of all potential targets that could bring about optimal citric acid production. Some of the targets were expected and consistent with the literature for example the Grey dots indicate transcripts that are not significantly differentially expressed. A q value (adjusted p value) threshold of 0.01 was applied to determine statistical significance. The x-axis corresponds to log2FC between selected timepoints. The y-axis corresponds to − log10 of the q value (adjusted p value). Data-points corresponding to the most significantly differentially expressed transcripts (q value < 1E−40 and ranked by log2FC) with reaction associations in iDU1327 are circled.
A Differential expression analysis between T1 and T2. The transcripts and their associated reactions that correspond to circled data-points are given in Table 1. B Differential expression analysis between T1 and T3. The transcripts and their associated reactions that correspond to circled data-points are given in Table 2. C Differential expression analysis between T2 and T3. The transcripts and their associated reactions that correspond to circled data-points are given in Table 3 ◂ up-regulation of citrate export and pyruvate carboxylase, while other targets were novel such as the down-regulation of inorganic diphosphatase.

Discussion
In our in vivo fermentation experiments with sugarcane bagasse hydrolysate, we observed a promising yield of citric acid; up to 50 g/L in 6 days from 80 g/L glucose and 40 g/L xylose. In our simulations, however, up to 85 g/L citric acid could be produced. By our analysis of the transcriptome at key timepoints and with our in silico toolkit, we have determined what may underlie the suboptimal citric acid production. The exhaustive list of targets all involve a common feature: an aim to minimise carbon loss as CO 2 and maximise citric output. One example of a target that is associated with minimising carbon loss via CO 2 is the down-regulation of inorganic diphosphatase. Forcing flux of this reaction alone was able to decrease citric output to the target value, suggesting that a high level of inorganic diphosphatase activity may negatively affect citric acid production. The reaction catalysed by inorganic diphosphatase acts to dissipate energy, thereby supporting a high carbon input flux with carbon output predominantly to CO 2 . This finding also relates to our previous work [17] on the relationship between phosphate levels and citric acid production. Decreased activity of inorganic diphosphatase may limit internal phosphate levels and enhance citric acid production. The majority of our targets for down-regulation are associated with anabolic pathways involved in the synthesis of biomass components. As the production of biomass becomes restricted by phosphate availability during citric acid production, any excess in anabolic flux would result in futile pathways. Comparison of the biomass output flux values between T2 and T3 reveals that the growth rate is ≈30-fold higher at T2, yet the fold changes in expression of anabolic reactions are significantly less than the fold change in growth rate, suggesting that the expression of these reactions is not excessive at T2. Among our targets are expected changes in metabolism including the up-regulation of citrate export and pyruvate carboxylase, both of which have significantly lower expression at T2 with respect to T3. The flux through these steps would be higher at T2 than T3 in the case of optimal citric acid production, suggesting that expression should also be higher at T2 contrary to what we see in this study. The citrate exporter has been overexpressed previously which resulted in a fivefold increase in citric acid production [20], and pyruvate carboxylase has been overexpressed for increasing production of malic acid [21].
The importance of energy metabolism to citric acid production is highlighted by the frequent targeting of oxidative phosphorylation reactions. These reactions were down-regulated from T1 to T2 by around 2-2.6fold, and constraining the flux of these reactions in line with the transcriptome data led to a drop in citric production. This may seem counter-intuitive as the addition of oxidative phosphorylation inhibitors has been shown to increase citric acid production; however, negative effects were observed when the activity of oxidative phosphorylation was too low [22]. This is consistent with our study, which shows that over-constraint of oxidative phosphorylation decreases citric output.
The objective of our study was to identify targets for increasing citric acid production by integrating transcriptome data with metabolic modelling. Many efforts have been made to integrate transcriptome data with metabolic models, with early examples including the GIMME algorithm [10], E-Flux [11], and iMAT [13], and more recently SPOT [15]. A disadvantage of these approaches was their use of absolute expression data that may not correlate closely with reaction activity. An alternative is to use differential expression data that indicate which reactions are subject to transcriptional regulation, such as MADE where differential expression data are used to determine binary expression states [14]. Other methods include PROM that requires a regulatory network [12] and LBFBA that relies on flux data to parameterise linear reaction-specific functions to determine flux bounds from expression data [16]. Our approach infers from differential expression data the metabolic factors that underpin suboptimal citric acid production in Aspergillus, and is tailored to applications where there is a defined metabolic goal. Its basis is an evolutionary algorithm with changes to flux bounds guided by differential expression data. Its limitation is that it outputs a set of possible solutions rather than a unique solution.

Conclusions
In this study, we demonstrate the production of citric acid from lignocellulosic hydrolysate by an engineered variant of A. niger ATCC1015. By performing in silico simulations using a dynamic model, we show how transcriptomic data across multiple timepoints can be coupled with evolutionary and metabolic modelling to inform targeted engineering strategies aimed at maximising productivity from a chosen feedstock. The same in silico strategies employed here can be applied to other biotechnological goals, assisting efforts to harness the potential of microorganisms for bio-based production of valuable chemicals.

Preparation of sugarcane bagasse hydrolysate
Sugarcane bagasse was obtained from Natems Sugar Pvt. Ltd. (India) and dried at 50 °C overnight to reach constant weight. Bagasse was milled using a knife mill with a 1 mm sieve prior to pre-treatment. Pre-treatment was

Shake flask fermentation experiments with time-course sampling
Fermentation experiments were performed in 250 mL baffled shake flasks (Bellco Glass Inc.; Vineland, NJ, USA) at a working volume of 30 mL. Bagasse hydrolysate was supplemented with 3 g/L NaNO 3 and 10 mM uridine. Spores from the A. niger strain ATCC1015 ΔpyrG Δoah Δgox [17] were added at 1 × 10 6 spores/mL. Spores were harvested from potato dextrose agar slants supplemented with 10 mM uridine. The slants were incubated at 37 °C for 3 days and spores were harvested using sterile cotton wool buds. Spores were suspended in saline Tween (0.1% Tween 80, 9 g/L NaCl) and centrifuged at 2500 rpm for 5 min. Spores were then washed 3 times in saline Tween prior to being used to inoculate cultures. Cultures were incubated at 30 °C with shaking at 250 rpm for 8 days. 500 µl homogeneous samples were taken twice daily 6 h apart. The supernatant and the biomass were separated by centrifugation at 20238 g for 3 min and stored  at − 20 °C to prevent any changes to metabolite concentrations in the supernatant and any changes to biomass dry weight.

Extracellular metabolite and biomass dry weight analysis
Glucose, xylose, and citric acid were determined enzymatically using the K-GLUC, K-XYLOSE, and K-CITR kits, respectively (Megazyme International Ireland Ltd., Wicklow, Ireland). Phosphate was determined using an assay kit (ab65622; Abcam, Cambridge, UK). Biomass dry weight was determined by washing biomass samples in pre-dried, pre-weighed 1.5 mL Eppendorf tubes 7 times in 1 mL dH 2 O, followed by drying at 70 °C to constant weight. Between each of the washing steps, the biomass samples were centrifuged at 20238 g  Table 6 for corresponding reactions). Green and red dots indicate targets for up-and down-regulation, respectively. The sectors indicate the areas of metabolism that were targeted   for 3 min to pellet the biomass, and the supernatant was aspirated without disruption of the biomass pellet.

Dynamic modelling to simulate the fermentation of bagasse hydrolysate to citric acid
Dynamic modelling was done as described previously [17] with some modifications. In brief, the FBA calculations were performed using bespoke Java code which implements the GLPK toolkit (GNU). dFBA routines were written directly into the Java code with the differential equations representing transport reactions solved by simple time-stepping (Euler method) with small values for the time-step. The iDU1756 model [19] was used, and deletions of the pyrG, gox, and oah genes were simulated by setting the flux bounds of their corresponding reactions to zero. Nitrate was used as the nitrogen input and uridine input was enabled. The sequential uptake of glucose and xylose was modelled by disabling xylose transport-mediated uptake at external glucose concentrations above 5 mM; the threshold of 5 mM was applied as this gave the best fit to the in vivo data. The kinetic parameters applied in the model are given in Table 7. The dFBA start time was adjusted to 10 h after inoculation.

Sampling for transcriptome analysis and isolation of RNA for RNA-Seq
Cultures were setup as described previously. For the timepoints T1, T2, and T3, cultures were harvested in biological triplicates at 21, 72, and 132 h, respectively. Cultures were harvested as follows: The flask contents were filtered through a double layer of Miracloth to separate the mycelia from the culture liquid. The mycelia were washed 2 times in chilled 100 mM Tris.HCl buffer pH 7.5 (≈150 mL per wash) and then 3 times in chilled dH 2 O (≈150 mL per wash). Washed mycelia were squeezedried in Miracloth and transferred to 50 mL Falcon tubes on ice and then flash frozen in liquid nitrogen followed by storage at − 80 °C. Samples were freeze-dried overnight prior to use for RNA isolation and stored at − 80 °C thereafter. RNA was extracted and purified: For each sample, 5 mg freeze-dried mycelia were added to a precooled 2 mL Eppendorf tube with two 3 mm tungsten carbide beads. The tubes were dipped in liquid nitrogen and kept on ice. Freeze-dried mycelia were ground using a TissueLyser set to 30 Hz for 30 s four times. 1 mL TRIzol reagent was added to each sample of ground mycelia followed by agitation using a TissueLyser set to 30 Hz for 30 s four times. RNA was extracted by the TRIzol method (Thermo Fisher Scientific) according to the manufacturer's instructions. RNA pellets were air-dried at 37 °C

Bioinformatics processing of RNA-Seq data
The tools used to construct the reference transcriptome were HISAT2 [23], StringTie [24], Mikado [25], Portcullis [26], and TransDecoder [27]. The latest version (v 4.0) of the A. niger ATCC1015 genome annotation [28] available from the Joint Genome Institute was used. The mitochondrial transcripts determined for the A. niger strain N909 [29] were included in the reference transcriptome. The tools used to perform quantification and differential expression analysis were Salmon [30], Wasabi [31], and Sleuth [32]. Figure 5 shows the workflow followed to process the RNA-Seq data.

Functional annotation of transcripts and generation of transcript-reaction associations for the iDU1327 model
The gene-protein-reaction associations in the iDU1756 model [19] were replaced with transcript-reaction associations in the model iDU1327 (see Additional file 1), determined by a comprehensive functional annotation process that employed a multitude of tools (Table 8). Mapping files (Table 9) and the KEGG database [53] were used to map the output from each tool to gene ontology (GO) molecular functions, EC numbers, and KEGG reactions. A consensus functional annotation was built, and KEGG reactions were included if associated with an EC number in the consensus. Figure 6 shows the workflow followed to construct the consensus functional annotation.

Mapping transcript expression to a reaction-level
The transcript expression data were mapped to a reaction-level following the transcript-reaction associations in iDU1327 and according to the following rules: In the case of an OR relationship, the expression of associated transcripts was summed. In the case of an AND relationship, the minimum expression of associated transcripts was taken. Reactions with expression below the cut-off (TPM < 1) were switched off.

Transcriptome-guided in silico evolution of constrained citric production
The in silico evolution of constrained citric production was performed using an evolutionary algorithm implemented in Java [19] with changes made to the fitness function and mutation operator as detailed below:

(i) Fitness function
The fitness was calculated with respect to T2 by a leastsquares fitting procedure (1) F = −log 10 f t − f a 2 , where F is the fitness, f t is the target flux, and f a is the actual flux. The fluxes included were those for five exchange reactions (internal phosphate, glucose, xylose, biomass, and citric acid). The target citric output flux was set to 0.12 in line with in vivo data, and the other target fluxes were set to their original values.

(ii) Mutation operator
Flux bounds were subjected to change by the mutation operator, informed by differential expression events (fold change ≥ 2) at both T1 to T2 and T2 to T3, and resulted in flux either being constrained (down-regulation), forced (up-regulation), or unchanged (no differential expression). Mutations were permitted to alter flux bounds within the multiplicative bounds determined by the fold change in expression and fluxes at T1 and T3, with the effect of imposing a limit on the extent of flux constraint and a minimum value for forcing flux. Mutations were not allowed to force flux on reactions without a clear direction of flux or beyond the maximum allowable flux. For initial mutations, the flux bound was set randomly between the minimum flux forced and the maximum allowable flux if forcing flux or between the original flux and the limit of flux constraint if constraining flux. For subsequent mutations, the flux bound was mutated from the existing mutated flux bound. Mutations constraining flux were applied to both lower and upper bounds for reversible reactions. Mutations were performed by adding a small value to the flux bound determined by the double Laplace function. The location parameter, µ , was set to zero, and the scale parameter, b , was set according to where b is the scale parameter, and B is the flux bound that the mutation is applied to.

(iii) Driving evolution of multiple solutions
A fitness threshold of 6 was applied to identify evolved solutions, and these were analysed to identify their key reactions. This threshold was chosen as at this fitness value the fluxes of selected exchange reactions are sufficiently close to their target values. Each mutated flux bound was evaluated for its contribution to the fitness by complementation with the original flux bound, and the reaction corresponding to the mutation with the greatest contribution to fitness was identified as the key reaction. Flux bounds of the key reaction were then reset to the original across the population and blocked from mutating again. The in silico evolution was run for 150,000 (2) b = 0.01|B|, |B| > 0 b = 0.001, |B| = 0,   generations, a duration sufficient to allow for the evolution of multiple solutions.

Processing solutions from in silico evolution to suggest targets for strain optimisation
Solutions from in silico evolution were subjected to optimisation and simplification. Mutated flux bounds were evaluated for their contribution to the solution fitness by resetting them to the original flux bounds, and the mutations were removed if the solution fitness remained over a threshold of 6. Additionally, the mutations on a reaction's flux bounds were optimised by making small adjustments. Processed solutions were analysed to rank their mutations by contribution to the solution fitness, by individually complementing the mutated flux bounds on each reaction with the original flux bounds. Key Table 9 Mapping files used in functional annotation of transcripts Name of mapping file Source Date