Overview of experiments
One major goal of this study was the identification of gene targets that could be promising for subsequent genetic engineering approaches with the aim to increase the microalgal triacylglycerol (TAG) accumulation. Towards this end, we applied transcriptomics of M. neglectum under alternating phases of nitrogen (N) availability, as microalgal TAG accumulation is efficiently induced by −N treatment [9].
As a preparatory step, a long-term −N experiment of 17 days of autotrophic −N conditions was performed, in which the dynamics of starch, TAG, and total lipid accumulation in M. neglectum were identified (Fig. 1, exp1). As a result, the −N phase could be subdivided into an early stage of starch accumulation (e−N stage), and a subsequent, late stage where TAG and total lipid levels increased (l−N stage).
For transcriptome analysis, we decided to further extend the scope by not only investigating the conditions of TAG accumulation, but also including conditions of TAG degradation, in order to obtain a profound understanding of the transcriptional regulation of the TAG metabolism in M. neglectum. TAG accumulation can be reversed by resupplementation of N [24, 30, 31]. Therefore, the experiment one was repeated, and an additional N resupply treatment (r+N stage) after 48 h of −N conditions was included (Fig. 1, exp2).
Characterization of the cellular response of M. neglectum to nitrogen starvation to define the timing of starch, TAG, and total lipid accumulation
Starch and TAGs are the major storage compounds in chlorophyceae [32, 33]. For the model microalga C. reinhardtii, a biphasic pattern of starch and TAG accumulation under –N conditions has been described, with starch accumulation preceding TAG accumulation [34]. In order to investigate how far this pattern also applies to M. neglectum, the dynamics of starch and lipid accumulation were determined during a long-term −N trial (Fig. 1).
Removal of N resulted in cessation of cell doubling after approximately 2 days of starvation, while biomass concentrations continuously increased from initially 0.236 ± 0.024 g L−1 SE (standard error, n = 3) to 1.583 ± 0.06 g L−1 SE after 11 days of −N conditions. Increasing biomass concentrations and cessation of cell doubling indicate storage compound accumulation and translated into an increase in cell weight (Fig. 2a).
The cellular starch level was found to peak at day 1 and slowly declined afterwards (Fig. 2b). Such a pattern was also described for Chlorella zofingiensis [35], but is in contrast to C. reinhardtii, for which a continuously increasing starch level was reported, however with a decreasing slope after 2 days of −N conditions [34, 36]. The neutral lipid content of M. neglectum increased almost linearly until day 8, after which it remained approximately constant (Fig. 2c).
During the first 2 days of −N cultivation, no net increase of the cellular total lipid content was observed (Fig. 2c, black line). Interestingly, a clear accumulation of the neutral lipid fraction was observed (Fig. 2c, dark gray line), whereas the fraction of polar lipids decreased (Fig. 2c, light gray line). These opposing tendencies indicate that during this period TAG accumulation was connected to acyl chain recycling from membrane lipids into the TAG pool. At the same time, we noticed that FA synthesis remained active, because the volumetric total lipid content increased from 51 ± 7 mg L−1 SE (day 0) to 155 ± 5 mg L−1 SE (day 2) (data not shown). This did, however, not translate into an increase of the cellular total lipid content, because the cell concentration concomitantly increased from 3.7 ± 0.2 × 106 cells mL−1 SE (day 0) to 12.9 ± 0.9 × 106 cells mL−1 SE (day 2) (data not shown). In the following days, both the total lipid levels as well as the neutral lipid levels strongly increased, while the polar lipid amount remained approximately constant (Fig. 2c). This indicates that during this period TAG accumulation was directly fueled by de novo synthesis of cellular FAs. Additionally, a metabolic state allowing for total lipid hyperaccumulation had been established. Importantly, starch degradation apparently was not a major contributor to the total lipid accumulation in M. neglectum, because cellular starch levels declined by only ~4 pg from day 2 to day 8, while the total lipid content increased by ~28 pg in the same period of time (Fig. 2b, c).
Interestingly, in this period the cell weight even increased by ~40 pg (Fig. 2a), therefore by more than the sum of the lipid and starch levels. Accordingly, the observed increase of the cell weight cannot be attributed solely to lipid accumulation, indicating that additional carbon sinks must exist, possibly associated with cell wall components.
The optimal harvesting time for maximum volumetric neutral lipid production was found to be between day 4 and day 8, while the volumetric productivity of total lipids remained constant until day 8 (Fig. 2d). Note that low cell concentrations of approximately 4 million cells mL−1 were used for inoculation, in order to ensure optimal light penetration and consequently rapid neutral lipid accumulation [27, 37]. Low cell concentrations, however, might not be biotechnologically most relevant. Accordingly, we did not calculate absolute volumetric neutral lipid productivity values, as those would certainly be misleading by underestimating the amounts expected from cultures with higher cell concentrations.
In summary, the long-term −N experiment revealed a similar biphasic pattern of starch accumulation preceding TAG accumulation in M. neglectum, as has been reported for other chlorophyceae [34,35,36, 38].
Design of the transcriptome experiment and data evaluation
For the transcriptome study, we extended the analysis from −N conditions to include N resupply conditions, in order to analyze both the accumulation and the degradation phase of the TAGs. Towards this end, the −N experiment was repeated and after 48 h an aliquot of N-starved cells was resuspended in N-containing media, while the remaining cells were kept under continuous −N conditions (Fig. 1, exp2). As expected, the N resupply treatment resulted in decreased levels of both the neutral lipid and the total lipid contents (Additional file 1: Figure S1), indicating TAG and FA degradation, respectively.
RNA samples for deep mRNA sequencing were taken from all three stages of N availability (e−N, l−N, and r+N stages), constituting a total of twelve time points (Fig. 1, exp2, purple circles). The first sample was taken immediately prior to the −N treatment and is referred to as the reference time point N_0 representing the transcriptome from exponential growth conditions. For the e−N stage, three samples were taken, after 2, 4, and 8 h of −N conditions, referred to as N_2, N_4, and N_8, respectively. These time points were chosen to represent the transcriptional basis of starch accumulation and of acyl chain recycling from membrane lipids for early TAG accumulation. For the l−N stage, four samples were taken, after 24, 48, 56, and 96 h of −N conditions, referred to as N_24, N_48, N_56, and N_96, respectively. This was to elucidate the transcriptional basis for TAG and total lipid hyperaccumulation. The relatively large time interval was chosen to cover putatively different phases of TAG accumulation, as well as to differentiate between transient and stable transcriptional responses. For the r+N stage, another four samples were taken, after 2, 4, 8, and 14 h of N resupply conditions, referred to as R_2, R_4, R_8, and R_14, respectively. It was accordingly possible to identify different timings of transcript changes triggering the end of cellular quiescence and to comprehensively investigate the reversal of storage compound accumulation.
As the transcriptome of M. neglectum was sequenced for the first time, we aimed to acquire a great sequencing depth, in order to also obtain read support for genes with low expression, allowing accurate reconstruction of transcript models for the majority of genes. To reach this goal, we limited the number of samples being sequenced, so that more reads per individual sample could be obtained. Since we expected that transcript abundance changes were rather accurately monitored by mRNA-seq because mRNA-seq was often reported to exhibit low technical variability [39, 40] and to correlate well with RT-qPCR data [15, 19, 20, 41,42,43,44], sequencing replicates were not performed. Sequencing replicates have also been omitted in other studies investigating transcriptome changes of microalgae subjected to −N conditions [15, 21, 23]. To mitigate biological variance, a pool of equal amounts of total RNA from two biological replicates was sequenced for each individual time point. The approach of sequencing a pool of RNA from biological replicates was also conducted in other studies [41, 45, 46]. We note that using this approach we could not quantify the biological variance. However, the setup comprised two directly opposing culture conditions, each including several harvesting time points (Fig. 1, −N vs +N resupply). Therefore, despite this limitation, we chose this approach to obtain both a structural genome annotation refinement (Additional file 1: Results) and a first approximation of the transcriptome response of M. neglectum to alternating phases of N availability.
A total of 796 million 100 nt paired-end reads were obtained, translating into approximately 33 million fragments for each time point, and thus a coverage of >200-fold assuming a 32 mbp transcriptome. For data processing, the Tuxedo protocol was followed [47]. Accordingly, a genome- and reference-guided approach was conducted, based on read mapping by TopHat2 [48] and transcript assembly and quantification by the Cufflinks suite [49]. Both software tools used the genome assembly from [27] and the improved genome annotation obtained in this study by BRAKER1 [50] as additional input (Additional file 1: Results). Of the Cufflinks tools, we used cuffquant and cuffnorm, instead of cuffdiff due to the absence of sequencing replicates. As a result, normalized transcript abundance values, expressed as fragments per kilobase of exon per million fragments mapped (FPKM), were obtained.
Changes in transcript abundances relative to the reference time point N_0 were expressed as log2-transformed fold change (FC) values. We selected an absolute log2-FC > 1 to define a gene as responsive to the treatment, whereas an absolute log2-FC between 0 and 1 classified a gene as not-responsive, as has been done in previous studies [15, 46, 51, 52]. Interestingly, we observed in MA plot from previous studies that most absolute log2-FC values greater than ~1 were also statistically significant [53,54,55,56]. However, we stress that FC is not indicative of any statistical significance, and that furthermore less abundant transcripts might require higher absolute FC values to detect statistical significance than more abundant transcripts [56]. The FC threshold used in our study was supported by our own data for the expression of housekeeping genes. For those, the absolute log2-FC of at most one of the eight time points of −N conditions was >1, whereas the remaining FC values of −N conditions were ≤1. Although most housekeeping genes showed a transient transcriptional response in the r+N stage, their expression usually relaxed to the range of absolute log2-FC ≤ 1 after 8 h of N resupply (Additional file 1: Figure S2a).
The data were integrated into a database, which is available at https://tdbmn.cebitec.uni-bielefeld.de. For M. neglectum, the structural and functional annotation of the queried transcript locus is displayed, as well as the transcript abundance profile during the three different stages of N availability. In addition, published transcriptome datasets from other microalgal species that were also subjected to −N treatment were integrated into this database, enabling inter-species comparisons of transcript changes to −N conditions (Additional file 1: Results).
Transcriptome reconstruction and quantification
20,751 transcript loci were assembled by Cufflinks [47], which contained a total of 35,146 isoforms. The higher number of isoforms was predominantly due to the presence and absence of untranslated regions (UTRs). While UTRs were predicted by Cufflinks, they were not included in the reference annotation. Therefore, 85% of all loci had either a single or two isoforms attributed, i.e., the provided “UTR-free CDS isoform” and the fully annotated version including UTRs. For the remaining 15%, it was checked whether evidence for alternative splicing and dominant isoform switching [49] could be detected. This was found for at least four genes (Additional file 1: Results, Figures S3–S6); however, a more detailed analysis is required to further investigate the extent and effect of alternative splicing and dominant isoform switching in M. neglectum.
The top 100 most abundant transcripts under exponential growth conditions were annotated as proteins involved in the cellular processes translation (52%) and photosynthesis (28%) (Additional file 2). Two examples are a putative 60 S ribosomal protein L13a and a putative chlorophyll a-b binding protein (XLOC_013860, FPKM at N_0 = 4715 and XLOC_000814, FPKM at N_0 = 10,978, respectively). This has also been noted for B. sudeticus [23]. Interestingly, although the majority of transcripts assigned to photosynthesis were strongly decreased in the l−N stage (Additional file 2), a few exhibited both a high abundance and a stable expression pattern under −N conditions. A notable example is one isoform of RuBisCo small subunit (rbcS2, XLOC_007679, median FPKM = 10,028). Further examples for transcripts with highest abundances and stable expression patterns under −N conditions are a putative component of the cytosolic large 60 S ribosomal subunit (rpl7ae, XLOC_000987, median FPKM = 3838) and a putative elongation factor (putative fragment pair XLOC_005939 and XLOC_012699, median FPKM = 2413 and 2393, respectively) (Additional file 1: Figure S2b). Importantly, these genes are ideal candidates for cloning of the promoter regions to efficiently drive transgene expression in subsequent genetic engineering studies.
The range of FPKM values covered more than four orders of magnitude, and this range distribution was approximately similar in the different stages of N availability (Fig. 3a). The log2-FC values (relative to N_0) of most transcripts were between −1 and 1 at most time points (gray box in Fig. 3b). As noted above, this interval was accordingly defined as the “not-responsive range” (Additional file 1: Figures S2 and S10a). No alteration of the expression of the majority of genes by the experimental conditions is in accordance with the null hypothesis for the determination of significant differentially expressed genes [55]. The upper values of the 1.5-fold interquartile ranges of log2-FC values were for most time points less than four (dotted blue line in Fig. 3b). Therefore, transcripts with log2-FC values of more than four were summarized as highly regulated, and this was visually highlighted by a darker color (see Fig. 3c for the extent of up-regulation of MLDP in the l−N stage, which is indicated in Fig. 4 by a dark red color for the MLDP transcript at the time points from the l−N stage). In contrast, the range between absolute log2-FC values larger than one and less than four was linearly color-coded (see figure keys in Figs. 4, 5, 6).
Effect of N resupply on gene expression of selected examples
Nitrogen resupply treatment was applied to reverse the phenotypic effects of nutrient starvation and the efficiency of the selected procedure can be exemplified by the profile of the transcript encoding for the major lipid droplet protein (MLDP, Fig. 3c). TAGs, which are accumulated in response to −N treatment, are stored as lipid droplets inside the cell [28, 57], of which MLDP is a major structural protein [30, 58]. Therefore, as expected, a strong increase of the MLDP transcript levels was detected during the two stages of N starvation (e−N and l−N), while relaxation of gene expression occurred immediately upon N resupply in the r+N stage (Fig. 3c).
The N resupply treatment facilitated the assignment of potential functions to genes whose gene products putatively catalyze similar enzymatic reactions. For instance, two transcripts were annotated as ferredoxin-NADP+-reductase, a central enzyme of photosynthesis. While the first transcript was strongly up-regulated in the l−N stage (putative fragment pair XLOC_015550 and XLOC_016383), the second transcript was strongly down-regulated in this stage (XLOC_001499) (Additional file 1: Figure S2c). Interestingly, this pattern was reversed when nitrogen was resupplied and expression reverted to pre-starvation levels of the reference time point (Additional file 1: Figure S2c). This indicates that during nutrient starvation the former enzyme might be involved in maintaining photosynthetic electron flow at reduced availability of NADP+ levels or in photoprotective release of excitation pressure when demand for NADPH is low, for instance by redistribution of electrons to various redox reactions, which has been described for one of two leaf ferredoxin-NADP+-reductase genes of A. thaliana [59]. The latter ferredoxin-NADP+-reductase of M. neglectum could be important under environmental conditions supporting fast cell growth when demand for photosynthetically provided NADPH is high. The opposing expression patterns might also be indicative of a modulation of photosynthetic electron flow towards increased cyclic electron flow under −N conditions. Cyclic electron flow generates ATP [60] and was found to be important for neutral lipid accumulation and to be increased under both autotrophic and mixotrophic −N conditions in C. reinhardtii [61]. A second example is the transcriptional regulation of two isoamylase genes. Both genes were expressed under −N conditions in a stable pattern, except for the first time point of −N conditions, at which a gentle transcriptional regulation (~threefold) in opposite directions was observed (Additional file 1: Figure S2d). In the r+N stage however, both genes were contrastingly up- or down-regulated at three of four time points, respectively (Additional file 1: Figure S2d). As the r+N stage was characteristic of storage compound degradation, it is tempting to speculate that the up-regulated isoamylase candidate is implicated in starch degradation (XLOC_001619), whereas the down-regulated one is implicated in starch synthesis (putative fragments XLOC_004804 and XLOC_012040).
It should be mentioned that the N resupply approach furthermore allowed better interpretation of some of the –N transcriptome data. As an example, the transcript abundances of the subunits of the plastidial pyruvate dehydrogenase complex (cpPDHC) were approximately constant in both the e−N and l−N stages (Additional file 1: Figure S2e). However, in the r+N stage, a transient strong down-regulation was observed (Additional file 1: Figure S2e). This shows that cpPDHC was subjected to transcriptional regulation, which in our setup would not have been detected without the r+N treatment. Therefore, the differential transcriptional response of cpPDHC under −N and N resupply conditions indicates that under −N conditions cpPDHC expression is actively maintained, which is reasonable, because cpPDHC converts pyruvate to acetyl-CoA (CoA, coenzyme A), the carbon precursor for FA synthesis (see “Discussion”).
Global transcriptional responses in the three different stages of N availability
The differentiation into three distinct stages of N availability was the backbone for the interpretation of transcriptional patterns in this study. However, this differentiation was initially solely based on metabolic data, i.e., on the timing of starch and TAG accumulation, as well as of TAG degradation (Fig. 2b, c; Additional file 1: Figure S1). We therefore investigated whether the transcriptomes from the twelve time points would accordingly cluster into the three stages. Towards this end, we applied hierarchical clustering based on Jensen–Shannon distance [49, 62]. From visual evaluation of the resulting dendrogram, four clusters could be distinguished (Fig. 3d). The first cluster consisted of the transcriptomes from the time points of the e−N stage, whereas the second cluster consisted of those from the l−N stage (Fig. 3d). The third and the fourth clusters each contained two of the four transcriptomes from the time points of the r+N stage (Fig. 3d). The reason for the unexpected division of the r+N data into two separate clusters was likely of technical nature, most likely due to the higher magnitude of transcript abundance values at the R_2 and R_4 time points compared to R_8 and R_14 (larger box size and whiskers in Fig. 3a), because the distance metric relies on the extent of change in relative expression [49]. Therefore, the three different stages of nitrogen availability also manifested on the level of the transcriptome data sets.
In order to dissect the transcriptional responses into those being shared between the e−N and l−N stages, and those restricted to either the e−N or l−N stage, as well as those unique to the r+N stage, an analysis of shared responsive genes was performed (Fig. 3e; Additional file 1: Figure S7). As expected, more responsive genes were shared between the e−N and l−N stages, compared to the r+N stage, which was true for both up- and down-regulated genes (Fig. 3e). To subsequently identify the cellular processes in which the respective genes were involved, a gene ontology (GO) term enrichment analysis was performed (Additional file 1: Table ST1). GO terms represent unified vocabulary to annotate genes and gene products [63]. GO terms are organized in a hierarchical structure, with broader vocabulary at the higher level (e.g., “signal transduction”) and more specific vocabulary at the lower levels (e.g., “cAMP biosynthesis”) [63]. Three different categories (“roots”) of GO terms are defined, which are biological process, molecular function, and cellular component. An example for the category biological process is the GO term “translation,” which is a child of the GO term “gene expression,” and a parent to the GO terms “translational initiation,” “translation elongation,” and “translation termination” [63]. We restricted our analysis to the category biological process, because we were interested in the cellular processes that were subjected to transcriptional regulation in the three stages.
As a result, the enriched GO terms of the genes whose up-regulation was restricted to the e−N stage were indicative of an induction of cell division (Additional file 1: Table ST1). Furthermore, the GO term “microtubule-based processes” was enriched (Additional file 1: Table ST1). GO terms enriched among genes down-regulated specifically in the e−N stage included “chlorophyll biosynthesis process,” “aromatic amino acid family biosynthetic process,” and “translation” (Additional file 1: Table ST1).
Enriched GO terms of genes whose up-regulation was restricted to the l−N stage were “protein phosphorylation,” “fatty acid biosynthetic process,” “chlorophyll catabolic process,” and “lipid catabolism process” (Additional file 1: Table ST1). As the l−N stage was characterized by TAG and total lipid hyperaccumulation and thus central to this study, the top 100 genes showing the highest degree of up-regulation in this stage were additionally determined (Additional file 2). This was to identify the processes that were subjected to the highest transcriptional induction. These genes encoded almost the complete set of N assimilation proteins (Additional file 1: Results), including genes annotated as urea carboxylase. This indicates that M. neglectum is able to use urea as an external nitrogen source, which was also confirmed phenotypically in a separate experiment (Additional file 1: Figure S8). Interestingly, an MYB-domain containing transcription factor (XLOC_013389) was also found in this set, possibly implicated in core metabolic regulation (Additional file 1: Results). GO terms enriched among genes down-regulated specifically in the l−N stage included “photosystem II assembly” and “proline cis–trans isomerization” (Additional file 1: Table ST1).
The GO term enrichment profile of genes up-regulated in both the e−N and the l−N stages suggested an induction of the tricarboxylic acid cycle, glycolysis, and arginine biosynthesis (Additional file 1: Table ST1). Interestingly, the GO term “ATP hydrolysis coupled proton transport” was also enriched (Additional file 1: Table ST1), which was assigned to eleven genes that encoded putative vacuolar-type (V-type) ATPases. The GO term enrichment profile of genes down-regulated in both stages consisted mostly of photosynthesis-associated processes, such as light-harvesting and the non-oxidative pentose-phosphate shunt (Additional file 1: Table ST1). Interestingly, the GO term “cysteine biosynthetic process” was also found to be enriched in this set.
The resupplementation with N restored the unstressed cellular state allowing for exponential growth (Additional file 1: Figure S1). GO terms enriched among genes up-regulated specifically in the r+N stage were indicative of the respective processes, such as “ribosome biogenesis” and “photosynthesis” (Additional file 1: Table ST1). Enriched GO terms of genes whose down-regulation was restricted to the r+N stage included “tricarboxylic acid cycle,” “fructose 6-phosphate metabolic process,” and “ATP hydrolysis coupled proton transport” (Additional file 1: Table ST1). Since they were previously enriched in the set of genes up-regulated in both the e−N and l−N stages, it is tempting to speculate that these cellular processes are central to cope with −N conditions.
Reconstruction of pathway maps and the glycerolipid metabolism of M. neglectum
For reconstruction of metabolic pathway maps, tBLASTx comparison of known enzymes from C. reinhardtii and A. thaliana was performed with the transcriptome of M. neglectum. After tBLASTx application, redundancy in the list of candidate genes was minimized by the identification of fragmented genes, which were the result of the assembly status of the genome of M. neglectum as 6739 scaffolds [27]. For instance, a protein AB might be encoded by two individual genes on two different scaffolds, such that part A is encoded as a first individual gene at the end of a first scaffold, while part B is encoded as a second individual gene at the beginning of a second scaffold. Importantly, however, as both genes are fragments of the same protein AB, their transcriptional profiles are identical. Approximately half of all gene models were located at a scaffold margin, and accordingly tagged as putatively truncated. For these genes, the transcript data acquired in this study allowed the assignment of gene fragments to fragment pairs. As an example, two candidate genes for the α-carboxyltransferase subunit of the acetyl-CoA carboxylase (ACCase) complex, a central enzyme of FA synthesis, were tagged as likely fragmented and had very similar expression patterns (XLOC_015237 and XLOC_12365, Additional file 1: Figure S9). In addition, their domain structures matched, because both had a predicted crotonase-like superfamily domain, appropriately truncated at the C- and N-terminus, respectively (Additional file 1: Figure S9). Accordingly, we considered these two sequences as a fragment pair. To minimize redundancy, only the fragment containing the N-terminus of the respective protein was retained for further analysis (XLOC_015237), because it optionally encodes the targeting peptide, hence allowing for localization prediction [32]. An obvious consequence of gene fragmentation was an over-estimation of the gene content of M. neglectum for a specific enzymatic step. For instance, two transcript loci were identified by tBLASTx search for phosphoribulokinase, which, however, were identified as a fragment pair, due to almost identical expression patterns (Additional file 3, XLOC_002711 and XLOC_006065, respectively) and matching domain structures (C- and N-terminal truncated Udk superfamily hit, respectively) [64]. Accordingly, M. neglectum most likely contains only one phosphoribulokinase enzyme, similar to C. reinhardtii [65], N. oceanica [66], and P. tricornutum [67]. A further consequence was over-estimation of the FPKM values of the individual parts of a fragment pair, because transcript length is taken into account for FPKM calculation [47]. This, however, is a systematic bias and cancels out during FC calculation, as long as transcript length remains unchanged during the time course experiment. Therefore, the over-estimation of FPKM values of fragment pairs did not affect the evaluation of relative patterns (intra-gene comparisons), which was the central element for subsequent pathway analysis.
Following this approach, the glycerolipid metabolism of M. neglectum was reconstructed (see “Discussion”). Briefly, FA synthesis in the chloroplast generates acyl chains, which are esterified to glycerol-3-phosphate in the Kennedy pathway to yield various membrane lipids or TAGs (Fig. 4). The direct precursor for most glycerolipids, including TAG, is diacylglycerol (DAG). DAG accordingly represents a central intermediate of the glycerolipid metabolism [8]. The acylation of DAG yields TAG, and an important group of enzymes catalyzing this reaction are the diacylglycerol acyltransferases, which use DAG and acyl-CoA as substrates [8]. Diacylglycerol acyltransferase enzymes are divided into several classes, and the first two are responsible for the bulk of TAG synthesis in plants [68], abbreviated as DGAT and DGTT in C. reinhardtii, respectively [10]. This route of TAG formation is also referred to as the acyl-CoA-dependent route. The alternative, the acyl-CoA independent route, refers to the transacylation of DAG and an acyl donor glycerolipid molecule (phospholipid, galactolipid or DAG [69]). This reaction produces a TAG and a lyso-lipid molecule, and is catalyzed by phospholipid:diacylglycerol acyltransferase (PDAT) [8].
Transcriptional regulation of the glycerolipid metabolism of M. neglectum during the two stages of nitrogen starvation and the stage of nitrogen resupply
We developed a modified heat map visualization (Additional file 1: Results), which contains both the extent of differential transcriptional regulation at the individual time points of the three stages of N availability as well as the absolute transcript abundance at the reference time point (N_0), represented by one of five abundance categories (category I–V). As additional information, the putative protein localization as predicted by PredAlgo software [70] is indicated (Fig. 4). Using this modified heat map visualization, the transcriptional regulation of the glycerolipid metabolism of M. neglectum during the three stages of N availability was analyzed. In the e−N stage, 3- to 8-fold increased transcript abundances were observed for MLDP, one stearoyl-ACP (ACP, acyl carrier protein) desaturase (SAD) candidate and two long-chain acyl-CoA synthetase (LACS) candidates (Fig. 4, MLDP, SAD, and LACS, respectively). Genes down-regulated to a high extent (≥eightfold) were not found in the e−N stage, except for a hypothetical subunit of the ACCase complex (Fig. 4, additional β-CT). However, its transcript abundance under logarithmic growth conditions (N_0) was only 20% of a second candidate (Additional file 3, XLOC_016656 and XLOC_015237, respectively). Therefore, the overall contribution of the second gene’s product to the flux of FA synthesis might be negligible. Seven genes were subjected to more subtle decreases in transcript abundances in the e−N stage; those were implicated in FA synthesis, thylakoid membrane assembly, and FA degradation (Fig. 4, KASIII, KAR, ENR, ACP, and MGDGS as well as ECH, respectively). Regarding the latter, however, down-regulation of the respective gene was also observed in the r+N stage (Fig. 4, ECH), rendering interpretation difficult.
In the l−N stage, the highest increase in transcript abundance was noted for MLDP, correlating with the increasing number and diameter of lipid droplets under −N conditions [28]. Additionally strongly up-regulated (10- to >16-fold) were acyl-ACP thioesterase (FAT), SAD, and the putative lipase PGD1 (Fig. 4). More gentle up-regulation was noted for CTP:phosphocholine cytidylyltransferase (~fourfold) putatively involved in phosphatidylcholine (PtdCho) synthesis, as well as for three of four central subunits of the ACCase complex (~twofold) (Fig. 4, CCT and ACCase, respectively). Contrasting the transcriptional induction of PtdCho synthesis, reduced transcript abundances were noticed for the genes implicated in the synthesis of phosphatidylserine and phosphatidylglycerol in the l−N stage (Fig. 4, PSS and PGPS, respectively). Thylakoid membrane lipid synthesis was also differentially regulated, with monogalactosyldiacylglycerol (MGDG) synthesis being transcriptionally repressed and digalactosyldiacylglycerol (DGDG) synthesis induced (Fig. 4, MGDGS and DGDGS, respectively). This might be indicative of an alteration of the thylakoid membrane architecture, because MGDG has a conical shape, does not form bilayers, and its accumulation results in a negative membrane curvature, while DGDG has a cylindrical shape and forms bilayers [71]. Three steps of FA synthesis were down-regulated in the l−N stage (Fig. 4, MCMT, KASIII, and KAR). The remaining down-regulated genes had 5–40% lower transcript abundance values than other genes associated with the same function at the reference time point N_0 (Fig. 4, additional β-CT, BCCP, and SAD); therefore, the effect of their down-regulation in the l−N stage was considered to be negligible.
In the r+N stage, the synthesis of MGDG was transcriptionally induced, while DGDG and diacylglycerol-N,N,N-trimethylhomoserine (DGTS) synthesis was transcriptionally repressed (Fig. 4, MGDGS, DGDGS, and BTA, respectively). FA degradation was transiently sharply up-regulated (Fig. 4, ACX, ECH, and KAT), which is in accordance with the decreasing total lipid content upon N resupply (Additional file 1: Figure S1). Contrastingly, FA synthesis was transcriptionally repressed at almost all individual enzymatic steps during the first 4 to 8 h of N resupply (Fig. 4, ACCase, MCMT, KASII, KAR, HAD, ENR, SAD, FAT, and ACP). Two putative LACS transcripts were transiently induced in the r+N stage, and two of three phosphatidic acid phosphatase candidate genes were transiently down-regulated (Fig. 4, LACS and PAP, respectively).
Transcriptional regulation of the committed step of TAG synthesis
The only committed reaction to TAG synthesis is the addition of a third acyl chain to DAG [8]. In respect of the acyl-CoA-dependent pathway, four of nine DGTT candidate genes showed a transcriptional induction under −N conditions, while approximately constant abundances were noted for the single putative DGAT transcript (Fig. 4). In the r+N stage, a transient down-regulation was observed for three of the aforementioned DGTT transcripts, as well as for two additional putative DGTT transcripts, and for the DGAT transcript (Fig. 4). In respect of the acyl-CoA independent pathway, the transcript level of the PDAT candidate gene was approximately constant in the three stages of N availability, except for a transient down-regulation in the r+N stage after 2 h of N resupply (Fig. 4). The transacylation reaction catalyzed by PDAT generates a lyso-lipid. Two enzymes putatively catalyzing the re-acylation of this lyso-lipid were identified in M. neglectum. The first candidate was transcriptionally induced (~twofold) in the e+N stage, whereas the second candidate was transiently repressed in the r+N stage (Fig. 4, LP-C/E-AT).
Clustering of transcripts annotated as lipases reveals candidates likely involved in TAG accumulation and TAG degradation
Lipases hydrolyze the ester bond between the glycerol backbone and the acyl chain of TAG and other glycerolipid molecules, yielding a free FA and the corresponding lyso-lipid [8]. The released acyl chain can be subsequently incorporated into other glycerolipids after activation by CoA. By this process, acyl chains can be shuttled between the membrane lipid and TAG pool (another route is transacylation). Acyl chain recycling apparently contributed significantly to early TAG accumulation in M. neglectum, because the polar lipid content decreased during the first 2 days of −N conditions (Fig. 2c; see above).
We reasoned that lipases involved in the process of de novo TAG accumulation can be identified according to a transcriptional induction under −N conditions. In contrast, opposing TAG lipases that degrade storage lipids are expected to be down-regulated under −N conditions, but up-regulated upon N resupply. Accordingly in this work, the transcriptional pattern of a lipase candidate gene was used as an indicator for its putative function in storage lipid metabolism.
In C. reinhardtii, a correlation between transcriptional regulation and metabolic function was observed for the lipases CrLIP1 and PGD1 (plastid galactoglycerolipid degradation 1) [72, 73]. Whereas CrLIP1 was down-regulated under −N conditions and indirectly implicated in TAG turnover [72], PGD1 was up-regulated under −N conditions and implicated in de novo TAG accumulation [73]. For both genes, a putative homologue was identified in the transcriptome of M. neglectum. As expected, the CrLIP1 transcript exhibited a transient sharp increase in the r+N stage (~sixfold at R_2, Fig. 5, XLOC_016073), in agreement with a putative role in storage lipid degradation. Likewise, the PGD1 transcript in M. neglectum exhibited a strong induction in the l−N stage (>16-fold, Fig. 5, XLOC_012515), corroborating a putative role in de novo TAG accumulation. Although the expression patterns are clear indicators that these proteins indeed represent homologous enzymes of both microalgal species, biochemical characterization would be required to prove this model and to further determine substrate specificity in M. neglectum.
Additional putative lipases were identified according to the gene annotation or their GO term description. This revealed a total of 68 putative lipase transcripts in M. neglectum. In order to group these transcripts based on their transcriptional profiles, hierarchical clustering was performed [74], and the resulting dendrogram divided into four clusters, supported by values for the average silhouette width as cluster quality parameter [75, 76] (data not shown).
Most putative lipase genes of the first cluster were characterized by a constant or slightly up-regulated gene expression in the e−N stage, were most strongly up-regulated in the l−N stage, and induction relaxed when nitrogen was supplemented in the r+N stage (Fig. 5, c1). This profile is similar to the PGD1 gene expression pattern which was also part of this cluster (Fig. 5, c1, XLOC_012515), therefore indicating a putative involvement in the process of de novo TAG accumulation under −N conditions. The second cluster contained several lipase candidates which were strongly up-regulated in the r+N stage, including the aforementioned CrLIP1 transcript of M. neglectum (Fig. 5, c2, XLOC_016073). Since up-regulation was concomitant with the induction of putative FA degradation genes (beta-oxidation, Fig. 4), it seems likely that these lipase candidates are involved in the process of TAG degradation. The third cluster contained transcripts with approximately stable expression in the e−N and l−N stages, but decreased expression in the r+N stage (Fig. 5, c3). This down-regulation might indicate that the corresponding enzymes are not required for the process of thylakoid membrane reassembly occurring in the r+N stage (Additional file 1: Table ST1); alternatively, they might impair the process of thylakoid membrane reassembly, therefore necessitating down-regulation in the r+N stage. The fourth cluster consisted of only two genes, which were strongly down-regulated in the e−N and l−N stages (Fig. 5, c4). This suggests that the corresponding enzymes might act as suppressors of TAG accumulation under exponential growth conditions.
Reconstruction and prediction of compartmentalization of the central carbon metabolism of M. neglectum
Central to the development of metabolic engineering strategies for improved TAG accumulation is not only the glycerolipid metabolism, but also the central carbon metabolism, as it determines the availability of acetyl-CoA for FA synthesis. This has been demonstrated for C. reinhardtii, for which carbon precursor supply was reported to be a key metabolic factor controlling oil biosynthesis under mixotrophic −N conditions [77]. Therefore, we next reconstructed the central carbon metabolism of M. neglectum (Additional file 1: Results). Supported by localization prediction, we propose a compartmentalization similar to C. reinhardtii [32]. Accordingly, the oxidative pentose-phosphate pathway (OPPP) is entirely plastidial, whereas glycolysis is highly compartmentalized, such that the initial steps of glycolysis take place in the chloroplast, while the later steps from 3-phosphoglycerate to pyruvate are located in the cytosol [32] (Fig. 6). However, M. neglectum might be able to perform the initial steps of glycolysis from glucose to the triose phosphates additionally in the cytosol, because M. neglectum can utilize glucose as a sole carbon source (Additional file 1: Figure S11), whereas C. reinhardtii cannot [78]; this model, however, requires further localization studies.
Differentially regulated genes of the central carbon metabolism during the three stages of N availability
Most genes of the central carbon metabolism that were responsive in the e−N stage exhibited the same direction of transcriptional regulation also in the l−N stage (see below). Transcriptional induction that was restricted to the e−N stage was observed for the putative mitochondrial pyruvate dehydrogenase complex (Fig. 6, PDHC). A dependence of starch and lipid accumulation on mitochondrial respiration was shown for C. reinhardtii, because mitochondrial mutants exhibit impaired starch and lipid accumulation under −N and −S conditions, respectively [79, 80]. Down-regulation of genes specifically in the e−N stage was observed for a putative phosphoenolpyruvate (PEP) transporter, and for one of six acetyl-CoA synthetase candidates (Fig. 6, PPT and ACS, respectively). Up-regulation restricted to the l−N stage was observed for one of two glucose-6-phosphate isomerase candidates, for one of four malate dehydrogenase candidates, and for a second of six acetyl-CoA synthetase candidates (Fig. 6, PGI, MDH, and ACS, respectively). Transcripts with exclusively decreased abundances in the l−N stage were not found. For the two acetyl-CoA synthetase candidates with differential expression under −N conditions, enzymatic activity should be confirmed in future studies. This is because M. neglectum can use acetate only to a limited extent for growth (Additional file 1: Figure S11). However, a likely functional glyoxylate cycle is encoded, since both key enzymes, malate synthase and isocitrate lyase, had transcript support and responded to the N resupply treatment by a transient up-regulation (Additional file 3, XLOC_013435 and XLOC_002446, respectively).
Of the transcriptional responses shared between the e−N and the l−N stages, the most pronounced up-regulation (>16-fold) was observed for the OPPP (Fig. 6, G6PDH and 6PGDH). A similarly strong up-regulation was noted for fermentative reactions, which have the production of acetyl-CoA from pyruvate in common (Fig. 6, PFL, PFOR, PDC, and bifunctional ADH). Furthermore, the conversion of 2-phosphoglycerate to PEP, catalyzed by enolase, was transcriptionally strongly induced (4- to 16-fold) (Fig. 6, ENO). 4- to 8-fold up-regulation was detected for one of three glycerol-3-phosphate dehydrogenase candidate genes (Fig. 6, GPDH); the respective enzyme generates glycerol-3-phosphate, which is a substrate for the Kennedy pathway and the backbone of TAG. A less pronounced induction (2- to 5-fold) was noted for PEP carboxylase (Fig. 6, PEPC). This enzyme is characteristic for C4 plants and catalyzes the fixation of CO2 by the generation of oxaloacetate from PEP [81].
For glycolysis and gluconeogenesis, an opposing transcriptional regulation was observed under −N conditions. While two committed steps of glycolysis were both up-regulated, one committed step of gluconeogenesis was down-regulated (Fig. 6, PFK, PK, and FBP, respectively). Another transcript implicated in gluconeogenesis, a putative PEP carboxykinase, was also decreased in abundance in the l−N stage, although absolute transcript abundance values were already low at the reference time point N_0 (Fig. 6, PEPCK).
Interestingly, opposite transcriptional patterns within the same enzymatic step under −N conditions were also observed. This applied to both glyceraldehyde-3-phosphate dehydrogenase candidates, as well as to all four carbonic anhydrase candidates (Fig. 6, GAPDH and CAH, respectively). Each putative carbonic anhydrase was specifically responsive in either the e−N or the l−N stage (Fig. 6, CAH), suggesting a tight transcriptional regulation of the carbon concentration mechanism under −N conditions in M. neglectum.
Upon N resupply in the r+N stage, only a few enzymatic steps of the central carbon metabolism were transcriptionally induced, while most others were repressed. Interestingly, the aforementioned transcriptional induction of the OPPP under −N conditions was maintained during the first 4 h of N resupply (Fig. 6, G6PDH and PGL). Furthermore, one putative PEP transporter, one of two putative small subunits of RuBisCo, two of four putative malate dehydrogenase enzymes, as well as two of six putative malic enzyme proteins were transcriptionally up-regulated in the r+N stage (Fig. 6, PPT, rbcS2, MDH, and MME, respectively). The transcriptional repression of both putative phosphoglycerate kinase enzymes that had been noticed for the −N phase was continued for the first 4 h in the r+N stage, after which transcript levels normalized to pre-starvation levels (Fig. 6, PGK). A transient sharp down-regulation in the r+N stage was observed for cpPDHC, as well as for four of 15 putative triose phosphate transporters (Fig. 6, PDHC and TPT, respectively). Glycolysis was down-regulated upon N resupply at several enzymatic steps, contrasting the transcriptional induction under −N conditions (Fig. 6, HK, PFK, FBA, PGM, and PK).
Transcriptional regulation of starch metabolism
Finally, the starch metabolism of M. neglectum was reconstructed (Additional file 1: Results). Starch is the major carbon storage molecule in green microalgae and plants, and can amount up to 50% of the dry biomass in C. reinhardtii under −N conditions [82]. In M. neglectum, the cellular starch content after 1 day of −N conditions amounted to ~35 pg cell−1, which was similar to the cellular neutral lipid content at day 8 with ~33 pg cell−1 (Fig. 2b, c). We analyzed the transcriptional regulation of both pathways in order to understand the interplay of both storage compound production processes. In summary, most genes assigned to the starch metabolism were subjected to transcriptional regulation (Additional file 1: Figure S10b). However, the changes in transcript abundances did not yield a completely conclusive picture. For instance, genes indicative of starch synthesis and starch degradation were consistently up-regulated during the −N treatment (Additional file 1: Figure S10b), yet cellular net starch levels decreased slightly (Fig. 2b). This discrepancy might be attributed to currently unknown post-transcriptional or post-translational regulation steps of the corresponding catabolic enzymes. Interestingly, the most strongly induced transcripts in the l−N stage were annotated as putative starch phosphorylases, and these transcripts were transiently sharply repressed in the r+N stage (Additional file 1: Figure S10b). This indicates that starch phosphorylase might play a key role in starch metabolism in M. neglectum.