Skip to main content

Integration of Aspergillus niger transcriptomic profile with metabolic model identifies potential targets to optimise citric acid production from lignocellulosic hydrolysate



Citric acid is typically produced industrially by Aspergillus niger-mediated fermentation of a sucrose-based feedstock, such as molasses. The fungus Aspergillus niger has the potential to utilise lignocellulosic biomass, such as bagasse, for industrial-scale citric acid production, but realising this potential requires strain optimisation. Systems biology can accelerate strain engineering by systematic target identification, facilitated by methods for the integration of omics data into a high-quality metabolic model. In this work, we perform transcriptomic analysis to determine the temporal expression changes during fermentation of bagasse hydrolysate and develop an evolutionary algorithm to integrate the transcriptomic data with the available metabolic model to identify potential targets for strain engineering.


The novel integrated procedure matures our understanding of suboptimal citric acid production and reveals potential targets for strain engineering, including targets consistent with the literature such as the up-regulation of citrate export and pyruvate carboxylase as well as novel targets such as the down-regulation of inorganic diphosphatase.


In this study, we demonstrate the production of citric acid from lignocellulosic hydrolysate and show how transcriptomic data across multiple timepoints can be coupled with evolutionary and metabolic modelling to identify potential targets for further engineering to maximise productivity from a chosen feedstock. The in silico strategies employed in this study can be applied to other biotechnological goals, assisting efforts to harness the potential of microorganisms for bio-based production of valuable chemicals.


For a century, the filamentous fungus Aspergillus niger has been used industrially for the production of citric acid; currently, production exceeds 2 million tonnes a year [1]. The ease of culture and its tolerance to typical industrial fermentation stresses make A. niger [2] a desirable organism for industrial applications. Beyond its established uses, A. niger also has potential to produce other valuable chemicals including succinic [3] and itaconic acid [4].

The commercial production of citric acid by A. niger fermentation is dependent on sucrose-based feedstocks, primarily molasses [5]. In this regard, A. niger is underexploited as it is saprophytic in nature with an ability to assimilate at least 69 carbon sources and 30 nitrogen sources [6]. There is an increasing need to unlock this metabolic potential, so that A. niger can play a key role in harnessing the value of underutilised second-generation feedstocks for the bioeconomy [7]. One such feedstock is sugarcane bagasse, the main by-product of sugarcane processing and a potential source of lignocellulosic sugars. Global sugarcane production was around 1900 million tonnes in 2013 [8], generating around half a billion tonnes of bagasse. To achieve cost-competitive citric acid production from bagasse hydrolysate requires the optimisation of strains away from sucrose-based fermentation to bagasse hydrolysate as the fermentation medium.

Strain optimisation can be achieved either via cycles of random mutagenesis and selection or by targeted engineering. The former is well demonstrated for citric acid production by A. niger [9], and although successful, its iterative nature makes it laborious and requires a suitable selection and evolution strategy to be available or designed. Rational strain engineering provides a faster strain development process that achieves the required genetic changes in a more stable manner. Optimising strains via targeted engineering is dependent on a metabolic understanding of the target organism and an ability to accurately identify targets. The establishment of omics technologies has enabled researchers to develop a more comprehensive understanding of the target organism; however, this can be challenging given the volume of data from omics analyses. One core systems biology method, constraint-based metabolic modelling, has now developed an extraordinary number of differing methods to address this challenge and integrate omics data with metabolic models [10,11,12,13,14,15,16].

In this study, we highlight the potential of bagasse as a feedstock for citric acid production, examining the performance of A. niger for the fermentation of bagasse hydrolysate to citric acid. Using fermentative time series data, we adapted our dynamic model [17] to capture the dynamics of bagasse hydrolysate fermentation. We show that the performance of the strain in this study is suboptimal and investigate further using transcriptome analysis at key fermentation timepoints. By employing a novel method involving an evolutionary algorithm guided by transcriptome data, we identify targets to achieve optimal citric acid productivity from bagasse hydrolysate.


Fermenting sugarcane bagasse hydrolysate to produce citric acid

To evaluate the fermentation of sugarcane bagasse hydrolysate for the production of citric acid, we obtained fermentative time series data on citric and biomass output as well as glucose, xylose, and phosphate input. From a hydrolysate containing 120 g/L total sugars consisting of glucose (80 g/L) and xylose (40 g/L), 50 g/L citric acid was produced in 6 days (Fig. 1). Glucose was fully consumed by day 5 at which point xylose consumption increased significantly with full consumption of sugars by day 7, indicating a sequential uptake mechanism. We observed similar characteristics to citric acid fermentations performed previously [17] with the onset of citric acid production coinciding with the full depletion of external phosphate and a switch to phosphate-limited growth.

Fig. 1
figure 1

Time series of sugarcane bagasse hydrolysate fermentation with dynamic modelling. Green dots correspond to in vivo fermentation data. Purple dashed vertical lines indicate timepoints chosen for transcriptome analysis. Purple triangles correspond to data from cultures used for transcriptome sampling. Solid brown lines represent in silico data from a simulation with citric output constrained to fit the in vivo data. A Change in biomass dry weight (g/L) over time. B Change in external phosphate concentration (g/L) over time. C Change in external citric acid concentration (g/L) over time. Dashed brown line represents in silico data from a simulation with unconstrained citric output. D Change in external glucose concentration (g/L) over time. E Change in external xylose concentration (g/L) over time. Individual data-points are shown

Simulating the fermentation of sugarcane bagasse hydrolysate to citric acid by dynamic modelling

To capture the dynamics of sugarcane bagasse hydrolysate fermentation in silico, we adapted our dynamic modelling framework [17] to reflect mixed glucose/xylose fermentations. The adapted model simulates the sequential uptake of glucose and xylose and with adjustments made to kinetic parameters (see Methods) gives close fits to the in vivo fermentation data (Fig. 1). The model estimated that citric acid titres could reach a maximum of 85 g/L, almost twofold higher than what we observed in vivo (Fig. 1). By imposing a constraint on citric acid output in silico, the model was able to reflect in vivo citric acid production (Fig. 1), suggesting the strain we used is suboptimal and highlighting the need for strain optimisation to realise optimal productivity.

Transcriptomic analysis at selected timepoints to investigate the fermentation of sugarcane bagasse hydrolysate to citric acid

To extend our investigation, we performed transcriptomic analysis at three key fermentation timepoints (Fig. 1). The first timepoint (T1) was taken, while external phosphate was still present before the onset of citric acid production and phosphate-limited growth. The other two timepoints (T2 and T3) were taken during citric acid production; the first of these (T2), while glucose was being consumed and the second (T3) during the main xylose consumption phase after glucose was fully consumed. Differential expression analysis revealed a greater degree of similarity between the two citric acid producing timepoints (T2 and T3) than for comparisons between these and the non-citric acid producing timepoint (T1) (Fig. 2).

Fig. 2
figure 2

Volcano plots showing the differential expression between selected timepoints. Green dots indicate transcripts that are up-regulated. Red dots indicate transcripts that are down-regulated. Grey dots indicate transcripts that are not significantly differentially expressed. A q value (adjusted p value) threshold of 0.01 was applied to determine statistical significance. The x-axis corresponds to log2FC between selected timepoints. The y-axis corresponds to − log10 of the q value (adjusted p value). Data-points corresponding to the most significantly differentially expressed transcripts (q value < 1E−40 and ranked by log2FC) with reaction associations in iDU1327 are circled. A Differential expression analysis between T1 and T2. The transcripts and their associated reactions that correspond to circled data-points are given in Table 1. B Differential expression analysis between T1 and T3. The transcripts and their associated reactions that correspond to circled data-points are given in Table 2. C Differential expression analysis between T2 and T3. The transcripts and their associated reactions that correspond to circled data-points are given in Table 3

To enable us to identify potential in vivo constraints that limit citric acid production, we associated transcripts with the reactions in the metabolic model and determined expression at a reaction-level. The most differentially expressed transcripts with reaction associations are shown in Tables 1, 2, 3. With reaction-level expression determined, we constructed metabolic schematics to visualise the changes in the transcriptome and their reaction-level effects for a given comparison (Fig. 3).

Table 1 Most significantly differentially expressed transcripts between timepoints T1 and T2 with reaction associations in iDU1327
Table 2 Most significantly differentially expressed transcripts between timepoints T1 and T3 with reaction associations in iDU1327
Table 3 Most significantly differentially expressed transcripts between timepoints T2 and T3 with reaction associations in iDU1327
Fig. 3
figure 3

Metabolic schematics showing the key changes in the transcriptome from T1 to T2 (A) and from T2 to T3 (B) and their reaction-level effects. Green upward and red downward arrows indicate significantly up- and down-regulated reactions, respectively (q value < 0.01 and log2FC > 0.7). The arrow width is directly proportional to the log2FC value. The full names of abbreviated metabolites are given in the iDU1327 model (see Additional file 1). Reactions included are shown in simplified form with only key reactants and products

In comparing T1 with T2, the scale of change is clear when transitioning to citric acid production with widespread differential expression events observed across metabolism (Fig. 3A). In particular, reactions involved in biomass production were down-regulated, while citrate export was up-regulated together with the down-regulation of TCA cycle reactions involved in citrate catabolism. Unexpectedly, pyruvate carboxylase whose activity is important to citric acid production [18] was down-regulated, suggesting this step as a point of constraint in vivo.

The expression changes are less extensive when transitioning from glucose to xylose consumption and appear to be directed at the change in substrate use (Fig. 3B). These include up-regulation of xylose import and xylulose kinase as well as phosphoketolase and acetate kinase that appear to activate an alternative xylose catabolic pathway, which may be associated with up-regulation of the glyoxylate shunt through an increased supply of acetyl-CoA. We also observed further up-regulation of citrate export, yet the rate of citric acid production in silico is around 2.3 times higher at T2 than T3 when citric output is unconstrained (Table 4). The lower citrate exporter expression at T2 with respect to T3 may indicate citrate export as a point of constraint in vivo.

Table 4 Input/output fluxes in iDU1327 at selected timepoints without constraint on citric output

Investigating suboptimal citric acid production by transcriptome-guided in silico evolution

To develop a metabolic understanding of suboptimal citric acid production, we developed an evolutionary algorithm to perform in silico evolution of the model with the aim of reflecting non-optimised strains. We focused on T2 as citric output is around 2.6 times higher at T2 when unconstrained (Table 4) than when constrained (Table 5) to fit in vivo data, whereas citric output at T3 is virtually the same. The objective was to identify changes to flux bounds that constrain citric output to the value that closely fits in vivo data while maintaining the same carbon input and biomass output. To achieve this, we adapted an evolutionary algorithm [19] to evolve the model to more accurately reflect the in vivo metabolic state that is associated with constrained citric production. As many solutions may exist to this, we used the transcriptomic data to guide the in silico evolution to limit solutions to those that are more likely to resemble the one indicated by the transcriptome. This constrained the evolutionary algorithm to alter flux bounds only on reactions where there is a significant differential expression, with such cases implying transcriptional regulation over the reaction’s activity. We compared and analysed the solutions from eight independent runs of the evolutionary algorithm to suggest targets for increasing citric acid productivity (Fig. 4). In total, we found 91 reactions suggested for targeted intervention of their activity; 65 for down-regulation and 26 for up-regulation (Table 6). Together, the list of targets provides high coverage of all potential targets that could bring about optimal citric acid production. Some of the targets were expected and consistent with the literature for example the up-regulation of citrate export and pyruvate carboxylase, while other targets were novel such as the down-regulation of inorganic diphosphatase.

Table 5 Input/output fluxes in iDU1327 at selected timepoints with citric output constrained in line with in vivo data
Fig. 4
figure 4

Plot giving a comparative view of the target suggestions from 8 independent runs of the evolutionary algorithm. Each of the eight grey circles corresponds to the results of one replicate run, and dots on these circles indicate which reactions were targeted in the given run. The IDs of the reactions targeted are shown on the outside (see Table 6 for corresponding reactions). Green and red dots indicate targets for up- and down-regulation, respectively. The sectors indicate the areas of metabolism that were targeted

Table 6 Suggested targets for increasing citric acid output based on results from the evolutionary algorithm (primarily ranked by frequency and then by citric %increase)


In our in vivo fermentation experiments with sugarcane bagasse hydrolysate, we observed a promising yield of citric acid; up to 50 g/L in 6 days from 80 g/L glucose and 40 g/L xylose. In our simulations, however, up to 85 g/L citric acid could be produced. By our analysis of the transcriptome at key timepoints and with our in silico toolkit, we have determined what may underlie the suboptimal citric acid production. The exhaustive list of targets all involve a common feature: an aim to minimise carbon loss as CO2 and maximise citric output.

One example of a target that is associated with minimising carbon loss via CO2 is the down-regulation of inorganic diphosphatase. Forcing flux of this reaction alone was able to decrease citric output to the target value, suggesting that a high level of inorganic diphosphatase activity may negatively affect citric acid production. The reaction catalysed by inorganic diphosphatase acts to dissipate energy, thereby supporting a high carbon input flux with carbon output predominantly to CO2. This finding also relates to our previous work [17] on the relationship between phosphate levels and citric acid production. Decreased activity of inorganic diphosphatase may limit internal phosphate levels and enhance citric acid production. The majority of our targets for down-regulation are associated with anabolic pathways involved in the synthesis of biomass components. As the production of biomass becomes restricted by phosphate availability during citric acid production, any excess in anabolic flux would result in futile pathways. Comparison of the biomass output flux values between T2 and T3 reveals that the growth rate is ≈30-fold higher at T2, yet the fold changes in expression of anabolic reactions are significantly less than the fold change in growth rate, suggesting that the expression of these reactions is not excessive at T2.

Among our targets are expected changes in metabolism including the up-regulation of citrate export and pyruvate carboxylase, both of which have significantly lower expression at T2 with respect to T3. The flux through these steps would be higher at T2 than T3 in the case of optimal citric acid production, suggesting that expression should also be higher at T2 contrary to what we see in this study. The citrate exporter has been overexpressed previously which resulted in a fivefold increase in citric acid production [20], and pyruvate carboxylase has been overexpressed for increasing production of malic acid [21].

The importance of energy metabolism to citric acid production is highlighted by the frequent targeting of oxidative phosphorylation reactions. These reactions were down-regulated from T1 to T2 by around 2–2.6-fold, and constraining the flux of these reactions in line with the transcriptome data led to a drop in citric production. This may seem counter-intuitive as the addition of oxidative phosphorylation inhibitors has been shown to increase citric acid production; however, negative effects were observed when the activity of oxidative phosphorylation was too low [22]. This is consistent with our study, which shows that over-constraint of oxidative phosphorylation decreases citric output.

The objective of our study was to identify targets for increasing citric acid production by integrating transcriptome data with metabolic modelling. Many efforts have been made to integrate transcriptome data with metabolic models, with early examples including the GIMME algorithm [10], E-Flux [11], and iMAT [13], and more recently SPOT [15]. A disadvantage of these approaches was their use of absolute expression data that may not correlate closely with reaction activity. An alternative is to use differential expression data that indicate which reactions are subject to transcriptional regulation, such as MADE where differential expression data are used to determine binary expression states [14]. Other methods include PROM that requires a regulatory network [12] and LBFBA that relies on flux data to parameterise linear reaction-specific functions to determine flux bounds from expression data [16]. Our approach infers from differential expression data the metabolic factors that underpin suboptimal citric acid production in Aspergillus, and is tailored to applications where there is a defined metabolic goal. Its basis is an evolutionary algorithm with changes to flux bounds guided by differential expression data. Its limitation is that it outputs a set of possible solutions rather than a unique solution.


In this study, we demonstrate the production of citric acid from lignocellulosic hydrolysate by an engineered variant of A. niger ATCC1015. By performing in silico simulations using a dynamic model, we show how transcriptomic data across multiple timepoints can be coupled with evolutionary and metabolic modelling to inform targeted engineering strategies aimed at maximising productivity from a chosen feedstock. The same in silico strategies employed here can be applied to other biotechnological goals, assisting efforts to harness the potential of microorganisms for bio-based production of valuable chemicals.


Preparation of sugarcane bagasse hydrolysate

Sugarcane bagasse was obtained from Natems Sugar Pvt. Ltd. (India) and dried at 50 °C overnight to reach constant weight. Bagasse was milled using a knife mill with a 1 mm sieve prior to pre-treatment. Pre-treatment was performed in a 2 L vessel (Parr Instrument Company, Moline, IL, US): 100 g milled bagasse was added to the vessel and mixed with 900 mL 0.4 M NaOH to homogeneity. The vessel was heated to 140 °C and maintained at 140 °C for 45 min, and then cooled on ice until the temperature dropped to 60 °C. The contents of the vessel were transferred to a fruit press after pre-treatment. Pre-treated bagasse was pressed to remove the pre-treatment liquor and rinsed twice in 500 mL acidified dH2O. The acidified dH2O was prepared by adding 100 µl concentrated H2SO4 to 1.2 L dH2O. After rinsing, pre-treated bagasse was adjusted to pH 5–6 by the addition of concentrated H2SO4. The pre-treated bagasse was then transferred to Weck jars and autoclaved (121 °C 15 min) followed by storage at 4 °C until use. The pre-treated bagasse was subjected to enzymatic hydrolysis in 1 L shake flasks: Pre-treated bagasse was added to the flask at the equivalent of 50 g dry weight and autoclaved. Under aseptic conditions, 10 mL 1 M MES buffer pH 5.5 (filter sterile) and 24.5 mL enzyme solution (filter sterile) were added, followed by sterile dH2O up to a final volume of 400 mL. Enzyme solution was prepared by mixing 20 g Cellic CTec3 (Novozymes) with 20 g 25 mM MES buffer pH 5.5. Flasks were incubated at 50 °C with shaking at 160 rpm for 48 h. After hydrolysis, the hydrolysate slurry was centrifuged at 4600 rpm for 20 min in a Multifuge 3 SR benchtop centrifuge (Heraeus, Germany). The clear supernatant was filtered through Whatman glass microfibre filters GF/F (GE Healthcare UK Ltd., UK) using a vacuum pump and then filter sterilised into a sterile glass bottle using a Stericap™ PLUS filter (Merck Millipore). The filter sterile hydrolysate was stored at 4 °C.

Shake flask fermentation experiments with time-course sampling

Fermentation experiments were performed in 250 mL baffled shake flasks (Bellco Glass Inc.; Vineland, NJ, USA) at a working volume of 30 mL. Bagasse hydrolysate was supplemented with 3 g/L NaNO3 and 10 mM uridine. Spores from the A. niger strain ATCC1015 ΔpyrG Δoah Δgox [17] were added at 1 × 106 spores/mL. Spores were harvested from potato dextrose agar slants supplemented with 10 mM uridine. The slants were incubated at 37 °C for 3 days and spores were harvested using sterile cotton wool buds. Spores were suspended in saline Tween (0.1% Tween 80, 9 g/L NaCl) and centrifuged at 2500 rpm for 5 min. Spores were then washed 3 times in saline Tween prior to being used to inoculate cultures. Cultures were incubated at 30 °C with shaking at 250 rpm for 8 days. 500 µl homogeneous samples were taken twice daily 6 h apart. The supernatant and the biomass were separated by centrifugation at 20238 g for 3 min and stored at − 20 °C to prevent any changes to metabolite concentrations in the supernatant and any changes to biomass dry weight.

Extracellular metabolite and biomass dry weight analysis

Glucose, xylose, and citric acid were determined enzymatically using the K-GLUC, K-XYLOSE, and K-CITR kits, respectively (Megazyme International Ireland Ltd., Wicklow, Ireland). Phosphate was determined using an assay kit (ab65622; Abcam, Cambridge, UK). Biomass dry weight was determined by washing biomass samples in pre-dried, pre-weighed 1.5 mL Eppendorf tubes 7 times in 1 mL dH2O, followed by drying at 70 °C to constant weight. Between each of the washing steps, the biomass samples were centrifuged at 20238 g for 3 min to pellet the biomass, and the supernatant was aspirated without disruption of the biomass pellet.

Dynamic modelling to simulate the fermentation of bagasse hydrolysate to citric acid

Dynamic modelling was done as described previously [17] with some modifications. In brief, the FBA calculations were performed using bespoke Java code which implements the GLPK toolkit (GNU). dFBA routines were written directly into the Java code with the differential equations representing transport reactions solved by simple time-stepping (Euler method) with small values for the time-step. The iDU1756 model [19] was used, and deletions of the pyrG, gox, and oah genes were simulated by setting the flux bounds of their corresponding reactions to zero. Nitrate was used as the nitrogen input and uridine input was enabled. The sequential uptake of glucose and xylose was modelled by disabling xylose transport-mediated uptake at external glucose concentrations above 5 mM; the threshold of 5 mM was applied as this gave the best fit to the in vivo data. The kinetic parameters applied in the model are given in Table 7. The dFBA start time was adjusted to 10 h after inoculation.

Table 7 Kinetic parameters applied in dynamic modelling of sugarcane bagasse hydrolysate fermentation

Sampling for transcriptome analysis and isolation of RNA for RNA-Seq

Cultures were setup as described previously. For the timepoints T1, T2, and T3, cultures were harvested in biological triplicates at 21, 72, and 132 h, respectively. Cultures were harvested as follows: The flask contents were filtered through a double layer of Miracloth to separate the mycelia from the culture liquid. The mycelia were washed 2 times in chilled 100 mM Tris.HCl buffer pH 7.5 (≈150 mL per wash) and then 3 times in chilled dH2O (≈150 mL per wash). Washed mycelia were squeeze-dried in Miracloth and transferred to 50 mL Falcon tubes on ice and then flash frozen in liquid nitrogen followed by storage at − 80 °C. Samples were freeze-dried overnight prior to use for RNA isolation and stored at − 80 °C thereafter. RNA was extracted and purified: For each sample, 5 mg freeze-dried mycelia were added to a pre-cooled 2 mL Eppendorf tube with two 3 mm tungsten carbide beads. The tubes were dipped in liquid nitrogen and kept on ice. Freeze-dried mycelia were ground using a TissueLyser set to 30 Hz for 30 s four times. 1 mL TRIzol reagent was added to each sample of ground mycelia followed by agitation using a TissueLyser set to 30 Hz for 30 s four times. RNA was extracted by the TRIzol method (Thermo Fisher Scientific) according to the manufacturer’s instructions. RNA pellets were air-dried at 37 °C for 10 min and solubilised in 200 µl RNase-free water at 60 °C for 15 min. RNA samples were stored at − 80 °C. RNA samples were further purified using the TURBO DNA-free™ kit (Thermo Fisher Scientific) according to the manufacturer’s instructions. RNA samples were sent to the University of York Technology Facility for RNA-Seq library preparation by poly(A) purification, and libraries were sequenced at the University of Leeds Next Generation Sequencing Facility using an Illumina HiSeq 3000 platform with 2 × 150 bp sequencing.

Bioinformatics processing of RNA-Seq data

The tools used to construct the reference transcriptome were HISAT2 [23], StringTie [24], Mikado [25], Portcullis [26], and TransDecoder [27]. The latest version (v 4.0) of the A. niger ATCC1015 genome annotation [28] available from the Joint Genome Institute was used. The mitochondrial transcripts determined for the A. niger strain N909 [29] were included in the reference transcriptome. The tools used to perform quantification and differential expression analysis were Salmon [30], Wasabi [31], and Sleuth [32]. Figure 5 shows the workflow followed to process the RNA-Seq data.

Fig. 5
figure 5

Schematic showing RNA-Seq analysis workflow

Functional annotation of transcripts and generation of transcript–reaction associations for the iDU1327 model

The gene–protein–reaction associations in the iDU1756 model [19] were replaced with transcript–reaction associations in the model iDU1327 (see Additional file 1), determined by a comprehensive functional annotation process that employed a multitude of tools (Table 8). Mapping files (Table 9) and the KEGG database [53] were used to map the output from each tool to gene ontology (GO) molecular functions, EC numbers, and KEGG reactions. A consensus functional annotation was built, and KEGG reactions were included if associated with an EC number in the consensus. Figure 6 shows the workflow followed to construct the consensus functional annotation.

Table 8 Tools used in functional annotation of transcripts
Table 9 Mapping files used in functional annotation of transcripts
Fig. 6
figure 6

Schematic showing workflow followed to construct consensus functional annotation

Mapping transcript expression to a reaction-level

The transcript expression data were mapped to a reaction-level following the transcript–reaction associations in iDU1327 and according to the following rules: In the case of an OR relationship, the expression of associated transcripts was summed. In the case of an AND relationship, the minimum expression of associated transcripts was taken. Reactions with expression below the cut-off (TPM < 1) were switched off.

Transcriptome-guided in silico evolution of constrained citric production

The in silico evolution of constrained citric production was performed using an evolutionary algorithm implemented in Java [19] with changes made to the fitness function and mutation operator as detailed below:

(i) Fitness function

The fitness was calculated with respect to T2 by a least-squares fitting procedure

$$F = - log_{10} \sum \left( {f_{t} - f_{a} } \right)^{2} ,$$

where F is the fitness, ft is the target flux, and fa is the actual flux.

The fluxes included were those for five exchange reactions (internal phosphate, glucose, xylose, biomass, and citric acid). The target citric output flux was set to 0.12 in line with in vivo data, and the other target fluxes were set to their original values.

(ii) Mutation operator

Flux bounds were subjected to change by the mutation operator, informed by differential expression events (fold change ≥ 2) at both T1 to T2 and T2 to T3, and resulted in flux either being constrained (down-regulation), forced (up-regulation), or unchanged (no differential expression). Mutations were permitted to alter flux bounds within the multiplicative bounds determined by the fold change in expression and fluxes at T1 and T3, with the effect of imposing a limit on the extent of flux constraint and a minimum value for forcing flux. Mutations were not allowed to force flux on reactions without a clear direction of flux or beyond the maximum allowable flux. For initial mutations, the flux bound was set randomly between the minimum flux forced and the maximum allowable flux if forcing flux or between the original flux and the limit of flux constraint if constraining flux. For subsequent mutations, the flux bound was mutated from the existing mutated flux bound. Mutations constraining flux were applied to both lower and upper bounds for reversible reactions. Mutations were performed by adding a small value to the flux bound determined by the double Laplace function. The location parameter, \(\mu\), was set to zero, and the scale parameter, \(b\), was set according to

$$\begin{gathered} b = 0.01\left| B \right|,\,\left| B \right| > 0 \hfill \\ b = 0.001,\,\left| B \right| = 0, \hfill \\ \end{gathered}$$

where \(b\) is the scale parameter, and \(B\) is the flux bound that the mutation is applied to.

(iii) Driving evolution of multiple solutions

A fitness threshold of 6 was applied to identify evolved solutions, and these were analysed to identify their key reactions. This threshold was chosen as at this fitness value the fluxes of selected exchange reactions are sufficiently close to their target values. Each mutated flux bound was evaluated for its contribution to the fitness by complementation with the original flux bound, and the reaction corresponding to the mutation with the greatest contribution to fitness was identified as the key reaction. Flux bounds of the key reaction were then reset to the original across the population and blocked from mutating again. The in silico evolution was run for 150,000 generations, a duration sufficient to allow for the evolution of multiple solutions.

Processing solutions from in silico evolution to suggest targets for strain optimisation

Solutions from in silico evolution were subjected to optimisation and simplification. Mutated flux bounds were evaluated for their contribution to the solution fitness by resetting them to the original flux bounds, and the mutations were removed if the solution fitness remained over a threshold of 6. Additionally, the mutations on a reaction’s flux bounds were optimised by making small adjustments. Processed solutions were analysed to rank their mutations by contribution to the solution fitness, by individually complementing the mutated flux bounds on each reaction with the original flux bounds. Key reactions were identified by proceeding down the ranked list and successively removing mutations until citric acid output flux increased close to its original unconstrained value. For this purpose, a revised fitness was computed by substituting the target citric acid output flux of 0.12 with the original unconstrained value, and a threshold of 3.5 was applied for the revised fitness to identify the set of key reactions that when returned to their original flux bounds result in optimal citric production. This threshold was chosen as at this fitness value the fluxes of selected exchange reactions are sufficiently close to values that reflect optimal citric production. A set of key reactions was obtained from each processed solution, and from these target suggestions were sourced. A threshold of 50% increase in citric acid output flux at T2 was applied to identify prominent suggested targets that have significant effect in silico.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.


  1. Ciriminna R, Meneguzzo F, Delisi R, Pagliaro M. Citric acid: emerging applications of key biotechnology industrial product. Chem Cent J. 2017;11:1–9.

    Google Scholar 

  2. Meyer V, Wu B, Ram AF. Aspergillus as a multi-purpose cell factory: current status and perspectives. Biotechnol Lett. 2011;33:469–76.

    CAS  PubMed  Google Scholar 

  3. Yang L, Henriksen MM, Hansen RS, Lübeck M, Vang J, Andersen JE, Bille S, Lübeck PS. Metabolic engineering of Aspergillus niger via ribonucleoprotein-based CRISPR–Cas9 system for succinic acid production from renewable biomass. Biotechnol Biofuels. 2020;13:1–12.

    CAS  Google Scholar 

  4. Hossain AH, Li A, Brickwedde A, Wilms L, Caspers M, Overkamp K, Punt PJ. Rewiring a secondary metabolite pathway towards itaconic acid production in Aspergillus niger. Microb Cell Fact. 2016;15:1–5.

    Google Scholar 

  5. Dhillon GS, Brar SK, Kaur S, Verma M. Screening of agro-industrial wastes for citric acid bioproduction by Aspergillus niger NRRL 2001 through solid state fermentation. J Sci Food Agr. 2013;93:1560–7.

    CAS  Google Scholar 

  6. Lu H, Cao W, Ouyang L, Xia J, Huang M, Chu J, Zhuang Y, Zhang S, Noorman H. Comprehensive reconstruction and in silico analysis of Aspergillus niger genome-scale metabolic network model that accounts for 1210 ORFs. Biotechnol Bioeng. 2017;114:685–95.

    CAS  PubMed  Google Scholar 

  7. Hassan SS, Williams GA, Jaiswal AK. Moving towards the second generation of lignocellulosic biorefineries in the EU: drivers, challenges, and opportunities. Renew Sust Energ Rev. 2019;101:590–9.

    CAS  Google Scholar 

  8. Bezerra TL, Ragauskas AJ. A review of sugarcane bagasse for second-generation bioethanol and biopower production. Biofuel Bioprod Bior. 2016;10:634–47.

    CAS  Google Scholar 

  9. Xie H, Ma Q, Wei DZ, Wang FQ. Transcriptomic analysis of Aspergillus niger strains reveals the mechanism underlying high citric acid productivity. Bioresour Bioprocess. 2018;5:1–23.

    CAS  Google Scholar 

  10. Becker SA, Palsson BO. Context-specific metabolic networks are consistent with experiments. PLoS Comput Biol. 2008;4:e1000082.

    PubMed  PubMed Central  Google Scholar 

  11. Colijn C, Brandes A, Zucker J, Lun DS, Weiner B, Farhat MR, Cheng TY, Moody DB, Murray M, Galagan JE. Interpreting expression data with metabolic flux models: predicting Mycobacterium tuberculosis mycolic acid production. PLoS Comput Biol. 2009;5:e1000489.

    PubMed  PubMed Central  Google Scholar 

  12. Chandrasekaran S, Price ND. Probabilistic integrative modeling of genome-scale metabolic and regulatory networks in Escherichia coli and Mycobacterium tuberculosis. P Natl Acad Sci USA. 2010;107:17845–50.

    CAS  Google Scholar 

  13. Zur H, Ruppin E, Shlomi T. iMAT: an integrative metabolic analysis tool. Bioinformatics. 2010;26:3140–2.

    CAS  PubMed  Google Scholar 

  14. Jensen PA, Papin JA. Functional integration of a metabolic network model and expression data without arbitrary thresholding. Bioinformatics. 2011;27:541–7.

    CAS  PubMed  Google Scholar 

  15. Kim MK, Lane A, Kelley JJ, Lun DS. E-Flux2 and SPOT: validated methods for inferring intracellular metabolic flux distributions from transcriptomic data. PLoS ONE. 2016;11:e0157101.

    PubMed  PubMed Central  Google Scholar 

  16. Tian M, Reed JL. Integrating proteomic or transcriptomic data into metabolic models using linear bound flux balance analysis. Bioinformatics. 2018;34:3882–8.

    CAS  PubMed  PubMed Central  Google Scholar 

  17. Upton DJ, McQueen-Mason SJ, Wood AJ. An accurate description of Aspergillus niger organic acid batch fermentation through dynamic metabolic modelling. Biotechnol Biofuels. 2017;10:1–14.

    Google Scholar 

  18. Peksel A, Torres N, Liu J, Juneau G, Kubicek C. 13 C-NMR analysis of glucose metabolism during citric acid production by Aspergillus niger. Appl Microbiol Biot. 2002;58:157–63.

    CAS  Google Scholar 

  19. Upton DJ, McQueen-Mason SJ, Wood AJ. In silico evolution of Aspergillus niger organic acid production suggests strategies for switching acid output. Biotechnol Biofuels. 2020;13:1–21.

    Google Scholar 

  20. Steiger MG, Rassinger A, Mattanovich D, Sauer M. Engineering of the citrate exporter protein enables high citric acid production in Aspergillus niger. Metab Eng. 2019;52:224–31.

    CAS  PubMed  Google Scholar 

  21. Xu Y, Shan L, Zhou Y, Xie Z, Ball AS, Cao W, Liu H. Development of a Cre-lox P-based genetic system in Aspergillus niger ATCC1015 and its application to construction of efficient organic acid-producing cell factories. Appl Microbiol Biot. 2019;103:8105–14.

    CAS  Google Scholar 

  22. Wang L, Zhang J, Cao Z, Wang Y, Gao Q, Zhang J, Wang D. Inhibition of oxidative phosphorylation for enhancing citric acid production by Aspergillus niger. Microb Cell Fact. 2015;14:1–12.

    Google Scholar 

  23. Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019;37:907–15.

    CAS  PubMed  PubMed Central  Google Scholar 

  24. Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33:290–5.

    CAS  PubMed  PubMed Central  Google Scholar 

  25. Venturini L, Caim S, Kaithakottil GG, Mapleson DL, Swarbreck D. Leveraging multiple transcriptome assembly methods for improved gene structure annotation. GigaScience. 2018;7:giy093.

    PubMed Central  Google Scholar 

  26. Mapleson D, Venturini L, Kaithakottil G, Swarbreck D. Efficient and accurate detection of splice junctions from RNA-seq with Portcullis. GigaScience. 2018;7:giy131.

    PubMed Central  Google Scholar 

  27. GitHub–TransDecoder/TransDecoder: TransDecoder source. Accessed 19 June 2020.

  28. Andersen MR, Salazar MP, Schaap PJ, van de Vondervoort PJ, Culley D, Thykaer J, Frisvad JC, Nielsen KF, Albang R, Albermann K, Berka RM. Comparative genomics of citric-acid-producing Aspergillus niger ATCC 1015 versus enzyme-producing CBS 513.88. Genome Res. 2011;21:885–97.

    CAS  PubMed  PubMed Central  Google Scholar 

  29. Juhász Á, Pfeiffer I, Keszthelyi A, Kucsera J, Vágvölgy C, Hamari Z. Comparative analysis of the complete mitochondrial genomes of Aspergillus niger mtDNA type 1a and Aspergillus tubingensis mtDNA type 2b. FEMS Microbiol Lett. 2008;281:51–7.

    PubMed  Google Scholar 

  30. Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods. 2017;14:417–9.

    CAS  PubMed  PubMed Central  Google Scholar 

  31. GitHub–COMBINE-lab/wasabi: prepare Sailfish and Salmon output for downstream analysis. Accessed 24 June 2020.

  32. Pimentel H, Bray NL, Puente S, Melsted P, Pachter L. Differential analysis of RNA-seq incorporating quantification uncertainty. Nat Methods. 2017;14:687–90.

    CAS  PubMed  Google Scholar 

  33. Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 2007;35:W182–5.

    PubMed  PubMed Central  Google Scholar 

  34. Blum M, Chang HY, Chuguransky S, Grego T, Kandasaamy S, Mitchell A, Nuka G, Paysan-Lafosse T, Qureshi M, Raj S, Richardson L. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res. 2021;49:D344–54.

    CAS  PubMed  Google Scholar 

  35. Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21:3674–6.

    CAS  PubMed  Google Scholar 

  36. Marchler-Bauer A, Bryant SH. CD-Search: protein domain annotations on the fly. Nucleic Acids Res. 2004;32:W327–31.

    CAS  PubMed  PubMed Central  Google Scholar 

  37. Marchler-Bauer A, Lu S, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, Fong JH, Geer LY, Geer RC, Gonzales NR, Gwadz M. CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Res. 2010;39:D225–9.

    PubMed  PubMed Central  Google Scholar 

  38. Tatusov RL, Galperin MY, Natale DA, Koonin EV. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 2000;28:33–6.

    CAS  PubMed  PubMed Central  Google Scholar 

  39. Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, Sonnhammer EL. Pfam: the protein families database. Nucleic Acids Res. 2014;42:D222–30.

    CAS  PubMed  Google Scholar 

  40. Letunic I, Bork P. 20 years of the SMART protein domain annotation resource. Nucleic Acids Res. 2018;46:D493–6.

    CAS  PubMed  Google Scholar 

  41. Haft DH, Selengut JD, Richter RA, Harkins D, Basu MK, Beck E. TIGRFAMs and genome properties in 2013. Nucleic Acids Res. 2012;41:D387–95.

    PubMed  PubMed Central  Google Scholar 

  42. De Castro E, Sigrist CJ, Gattiker A, Bulliard V, Langendijk-Genevaux PS, Gasteiger E, Bairoch A, Hulo N. ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res. 2006;34:W362–5.

    PubMed  PubMed Central  Google Scholar 

  43. Jeske L, Placzek S, Schomburg I, Chang A, Schomburg D. BRENDA in 2019: a European ELIXIR core data resource. Nucleic Acids Res. 2019;47:D542–9.

    CAS  PubMed  Google Scholar 

  44. Enzyme Database–BRENDA. Accessed 7 July 2020.

  45. Pedruzzi I, Rivoire C, Auchincloss AH, Coudert E, Keller G, De Castro E, Baratin D, Cuche BA, Bougueleret L, Poux S, Redaschi N. HAMAP in 2015: updates to the protein family classification and annotation system. Nucleic Acids Res. 2015;43:D1064–70.

    CAS  PubMed  Google Scholar 

  46. Araujo FA, Barh D, Silva A, Guimarães L, Ramos RT. GO FEAT: a rapid web-based functional annotation tool for genomic and transcriptomic data. Sci Rep. 2018;8:1–4.

    PubMed  PubMed Central  Google Scholar 

  47. Kumar N, Skolnick J. EFICAz2. 5: application of a high-precision enzyme function predictor to 396 proteomes. Bioinformatics. 2012;28:2687–8.

    CAS  PubMed  PubMed Central  Google Scholar 

  48. Elbourne LD, Tetu SG, Hassan KA, Paulsen IT. TransportDB 2.0: a database for exploring membrane transporters in sequenced genomes from all domains of life. Nucleic Acids Res. 2017;45:D320–4.

    CAS  PubMed  Google Scholar 

  49. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA. Gene ontology: tool for the unification of biology. Nat Genet. 2000;25:25–9.

    CAS  PubMed  PubMed Central  Google Scholar 

  50. Mitchell A, Chang HY, Daugherty L, Fraser M, Hunter S, Lopez R, McAnulla C, McMenamin C, Nuka G, Pesseat S, Sangrador-Vegas A. The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Res. 2015;43:D213–21.

    PubMed  Google Scholar 

  51. Lima T, Auchincloss AH, Coudert E, Keller G, Michoud K, Rivoire C, Bulliard V, De Castro E, Lachaize C, Baratin D, Phan I. HAMAP: a database of completely sequenced microbial proteome sets and manually curated microbial protein families in UniProtKB/Swiss-Prot. Nucleic Acids Res. 2009;37:D471–8.

    CAS  PubMed  Google Scholar 

  52. Duvaud S, Gabella C, Lisacek F, Stockinger H, Ioannidis V, Durinx C. Expasy, the Swiss Bioinformatics Resource Portal, as designed by its users. Nucleic Acids Res. 2021.

    Article  PubMed  PubMed Central  Google Scholar 

  53. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28:27–30.

    CAS  PubMed  PubMed Central  Google Scholar 

Download references


Not applicable.


This work was supported by the Biotechnology and Biological Sciences Research Council (BBSRC) [Grant Number BB/S01196X/1], the Department of Biotechnology (DBT), Ministry of Science and Technology, Government of India [Grant Number BT/IN/Indo-UK/SSY/07/2018-19], and Innovate UK [Grant Number 104336].

Author information

Authors and Affiliations



DJU conceptualised the study, developed the methodology and software, performed the experiments, analysed the data, and wrote the original draft of the manuscript. MK assisted with data analysis. CW, LF, and LDG developed the methodology for producing bagasse hydrolysate. SJMM, SS, and AJW acquired the funding, supervised the work, and reviewed and edited the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Daniel J. Upton.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

The iDU1327 model presented as a spreadsheet.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Upton, D.J., Kaushal, M., Whitehead, C. et al. Integration of Aspergillus niger transcriptomic profile with metabolic model identifies potential targets to optimise citric acid production from lignocellulosic hydrolysate. Biotechnol Biofuels 15, 4 (2022).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: