Differential expression of starch and sucrose metabolic genes linked to varying biomass yield in Miscanthus hybrids

Background Miscanthus is a commercial lignocellulosic biomass crop owing to its high biomass productivity and low chemical input requirements. Within an interspecific Miscanthus cross, progeny with high biomass yield were shown to have low concentrations of starch and sucrose but high concentrations of fructose. We performed a transcriptional RNA-seq analysis between selected Miscanthus hybrids with contrasting values for these phenotypes to clarify how these phenotypes are genetically controlled. Results We observed that genes directly involved in the synthesis and degradation of starch and sucrose were down-regulated in high-yielding Miscanthus hybrids. At the same time, glycolysis and export of triose phosphates were up-regulated in high-yielding Miscanthus hybrids. These differentially expressed genes and biological functions were regulated by a well-connected network of less than 25 co-regulated transcription factors. Conclusions Our results evidence a direct relationship between high expression of essential enzymatic genes in the starch and sucrose pathways and co-expression with their transcriptional regulators, with high starch concentrations and lower biomass production. The strong interconnectivity between gene expression and regulators, chemotype and agronomic traits opens the door to use the expression of well-characterised genes associated with carbohydrate metabolism, particularly in the starch and sucrose pathway, for the early selection of high biomass-yielding genotypes from large Miscanthus populations. Supplementary Information The online version contains supplementary material available at 10.1186/s13068-021-01948-4.

sinensis and a tetraploid M. sacchariflorus. M. x giganteus has desirable traits, including high yield and early establishment [10][11][12]. However, M. x giganteus must be clonally propagated, which doubles establishments costs compared to a seed-based option [13]. Therefore, several European breeding programmes are aiming to develop a seed-based crop through recreating the hybridisation event between M. sinensis and M. sacchariflorus to produce new hybrids that out-perform M. xgiganteus [13,14].
Miscanthus is harvested for the structural cell wall polysaccharides, and as a result, multiple studies have focused on its structural carbohydrates [15,16]. However, it is the processing and storage of non-structural carbohydrates (nsc), such as sucrose and starch, that underpin biomass traits [17].
We have previously shown that high-yielding Miscanthus genotypes from an interspecific hybrid mapping family had low starch concentrations in the stem and a low ratio of starch to fructose [17]. These distinctive carbohydrate profiles were consistent across years and growing environments; thus, the phenotype is likely to be genetically controlled [17,18]. Unlike many C3 temperate grasses, C4 species such as Miscanthus or maize do not accumulate fructans, but instead accumulate starch as a transient form of storage carbohydrate [19,20]. The concentration of starch in the mapping family was up to 15% of the dry weight (dw) on average. However, higher values were observed in the lowest yielding lines, raising the possibility of bred "starch-cane" Miscanthus for liquid biofuel or biogas generation [21]. Identifying differentially expressed genes (DEGs) that relate to the carbohydrate profile could further facilitate breeding for such traits.
In this study, we analysed root (all underground tissues, including both roots and rhizomes), stem and leaf RNAseq data from the hybrid progeny from a cross of a diploid M. sacchariflorus genotype and a diploid M. sinensis genotype, which had contrasting carbohydrate profiles and yield measurements. We identified differentially expressed genes associated with the observed metabolic profiles using the recently completed M. sinensis reference genome (Miscanthus sinensis v7.1 DOE-JGI) [22]. Integrating expression and metabolic data is a logical strategy given the strong interconnectivity between genotype, chemotype and phenotype, and the lower genetic complexity of intermediate phenotypes, such as metabolites and yield subcomponents [23,24].

Contrasting carbohydrate metabolism in sequenced genotypes from a Miscanthus mapping family
A total of 102 genotypes from a paired cross between diploid M. sinensis ("M. sinen 102") and a diploid M. sacchariflorus ("M. sacch 297") were established in field conditions and phenotyped. Non-structural carbohydrates were sampled in July 2014, during the summer growing season, and annual yield was obtained at harvest after the following winter. The distribution of carbohydrate concentrations and biomass yield for 98 hybrids were previously reported [17]. After including additional information about number of tillers and flowering day for the population ( Fig. 1 and Additional file 6: Table S1), we observed significant correlations between number of tillers and starch (r = − 0.45, p < 0.001), fructose (r = 0.31, p < 0.005), and total NSC (r = − 0.40, p < 0.0001) for the whole family (Additional file 6: Table S1). We also observed significant correlations between number of tillers and the ratio of sucrose/starch (r = 0.37, p < 0.001), fructose/starch (r = − 0.45, p < 0.001), glucose/starch (r = − 0.38, p < 0.001) and sucrose/fructose (r = − 0.32, p < 0.01). We observed a significant positive correlation between biomass yield and the number of tillers (r = 0.62 ± 0.03 for three seasons, p < 0.001) in the progeny of this interspecific cross. We also observed a correlation between flowering day and yield (r = 0.37 ± 0.02 for two seasons, p < 0.001) and canopy weight (r = 0.47 ± 0.1 for two seasons, p < 0.001), but not with number of tillers (Additional file 6: Table S1). We also observed significant correlations between flowering day and fructose (r = − 0.37, p < 0.001) and the ratio of starch/fructose (r = − 0.37, p < 0.001). Genotypes initiating flowering (flag leaf visible) in August and September usually result in higher yield and height.
Four M. sinensis X M. sacchariflorus hybrids from this family (triangles in Fig. 1) were selected for RNA sequencing in their fourth growing season (2013), based on a higher or lower than the average number of tillers. The two parents of the family were also sequenced (diamonds in Fig. 1). When the four sequenced hybrids were divided into two groups (genotypes 112 and 90 against genotypes 18 and 120), we observed significant differences between these groups in the number of tillers (p < 0.05), biomass yield quantified as dry weight per plant (p < 0.05), and the final canopy heights (p < 0.05). We also observed a significant difference between these two groups in the concentrations of starch (p < 0.005) and sucrose (p < 0.05), but we did not observe significant differences between groups in the concentrations of fructose or glucose. The most significant difference (p < 0.001) was observed in the total concentration of non-structural carbohydrates (NSC), which was calculated as the sum of the glucose, fructose, sucrose and starch concentrations. We observed significant differences also in the fructose/ starch (p < 0.05) and glucose/fructose ratios (p < 0.01). However, any other ratio between concentrations was not significantly different between the groups (Additional file 6: Table S2).
There was a significant difference between the M. sacchariflorus and M. sinensis parents in NSC (p < 0.05) and sucrose concentrations (p < 0.01). However, there was no significant difference between the parents in the starch, fructose or glucose concentrations (Additional file 6: Table S2). It is likely an example of heterosis (transgressive segregation) that significant differences in starch, fructose or glucose concentrations were observed in the hybrid progeny but not the parents.

Differential expression (DE) analysis between hybrids and parental species
We performed RNA-seq from the leaf, stem and root (all underground tissues, including both roots and rhizomes) samples extracted from four M. sacchariflorus X M. sinensis interspecific hybrids, and their two parents (Table 1). When the normalised counts obtained from DESeq2 [25] were used to cluster the samples (Fig. 2), the samples firstly grouped by tissue (PC1) and secondly by species (PC2). As a result, the downstream analysis was performed for each tissue separately. Stem and root samples clustered together, and the clustering of these separately from the leaf tissue explained 64% of the variation. Species explains 17% of the variation, with the hybrids falling between the two parents, which were furthest apart from each other.
We obtained 1386 differentially expressed genes (DEG; Additional file 6: Table S3) in total between the hybrids identified as "High NSC" and "Low NSC" (Fig. 1) at FDR < 0.05 (Fig. 3a). There were 892 DEGs in stems (598 up-regulated and 294 down-regulated), 741 DEGs in leaves (410 up-regulated and 331 down-regulated), and only 253 DEGs in roots (116 up-regulated and 137 downregulated). 64% of the DEGs in roots were DE in both of the other tissues too, but most DEGs in stem or leaves were exclusively DE in either stem or leaves.
We also compared the expression between the hybrids against each parent and considered a gene as DE if it was DE in both comparisons at FDR < 0.05 (Additional file 6: Table S4). Under these criteria, there were 2870 DEGs in roots, 1,464 DEGs in leaves, and 729 DEGs in stems (Fig. 3b). Only 64 among these DEG were also DE between "High NSC" and "Low NSC" hybrids. There were 16,311 DEGs between the hybrids and M. sinensis alone (Additional file 1: Figure S1), and 15,616 DEGs between the hybrids and M. sacchariflorus alone (Additional file 2: Figure S2), this is over a third of the total transcriptome.

Enriched Gene Ontology (GO) terms in DEGs
Enrichment analysis of GO terms over-represented among DE genes allowed us to identify the biological processes (BP) and molecular functions (MF) that are differentially regulated in each tissue. After annotating the reference transcriptome with the homologous proteins and full set of GO terms and (Additional file 6: Table S5), we simplified the results to the more general "GO slim" terms.
All the significant enrichment "GO slim" terms among DEGs between the "High NSC" and "Low NSC" hybrids were associated with metabolic processes, with the single exception of "response to stress" in stems ( Fig. 4; Additional file 6: Table S6). Among the GO terms in the "biological process" category, the most significantly enriched Values for four hybrids with contrasting phenotypes ("high" and "low"), which were selected for RNA sequencing, are highlighted (Triangles). Significant differences (T-test) between the hybrids are annotated under each phenotype. The two parents of the family were also sequenced and phenotyped (diamonds). Boxplots summarise the distribution of values for the whole family for each phenotype ones (p < 0.001) were "Carbohydrate metabolism" and "Secondary metabolism" in stem and leaves, and "Generation of precursor metabolites and energy" and "response to stress" in stems. Among the "molecular process" category, "hydrolysis on glycosyl bonds" and "redox activities" were the most significantly enriched (p < 0.0001) in both stems and leaves (Additional file 6: Table S6).
Thirty-six enzymatic reactions were annotated among DEG in the stem ( Table 2). Only six were down-regulated in "High NSC"; four involved in the generation A similar analysis on the enriched GO slim terms among DEGs between hybrids and parents (Additional files 3, 6: Figure S3; Table S7) revealed that the most significantly enriched GO terms (p < 0.01) were in the root and associated with RNA/DNA binding and translation (including ribosome biogenesis and equivalent terms), and several biosynthetic processes. Remarkably, there were no enriched GO terms in the stem between hybrids and parents.

DEG associated with the starch and sucrose metabolism
There were 88 DEGs associated with the enriched "Carbohydrate metabolism" (GO:5975) GO term (Additional file 6: Table S8), specifically 57 DEGs in stems (42 upregulated and 15 down-regulated) and 44 DEGs in leaves (20 were up-regulated and 24 down-regulated). Thirteen DEGs were common to both tissues and showed close fold-change values in both tissues. All but two of these 88 DEGs could be functionally annotated, 52 and 56 of them had a homologous protein in A. thaliana or rice, respectively.
Twenty-nine DEGs were involved in enzymatic reactions that were part of the starch and sucrose metabolic pathways (KEGG pathway ath00500; Additional file 4: Figure S4). Among these, all 20 DEGs in stems were upregulated in "High NSC", but half of the DEGs in leaves (which were beta-glucosidases) were down-regulated in "High NSC". Enzymatic proteins in the starch degradation pathway were DE in root and leaves (e.g. AMY3, ISA3, BAM1). At the same time, sucrose metabolism genes in the cytosol were only DE in stems (SUS3, SPS5). Similarly, reactions involving ADP-glucose were only DE in stems (e.g. AGP, SS2, SS3, SBE2).
Twenty-nine genes were annotated as involved in the "generation of precursor metabolites and energy" (GO:6091) (Additional file 6: Table S8), 17 of which could be annotated with an enzymatic code (KEGG pathway ath00010; Additional file 5: Figure S5). Six genes were involved in starch metabolism (ISA3, DBE1, PFK2, SBE2, PHS2). The phosphofructokinase 2 (PFK2) is the only one clearly down-regulated in "High NSC". Among the others, a malate synthase (MLS) and an aldehyde dehydrogenase 12A1 involved in siRNAs generation, and an Fts protease (FTSH6) in the chloroplast were all highly up-regulated (FC > 5) in "High NSC". On the other hand, triosephosphate isomerase (TIM) was down-regulated in "High NSC".
The relation between 32 DEGs involved in the 12 DE enzymatic reactions in starch and sucrose metabolism, plus three of the glycolysis reactions, are summarised in Fig. 5 and Tables 2 and 3.

DEG associated with other enriched GO terms
The 72 genes annotated as "Response to stress" were involved in a broad range of responses (Additional file 6: Table S10). On the other hand, the most significantly enriched GO terms in the "Molecular functions" category were associated with metabolic-related enzymatic reactions, namely "oxidoreductase activities" and "hydrolase activities". The former included 38 cytochrome P450 proteins.
"Secondary metabolism" was enriched in both stems and leaves. 17 of the 19 DEGs in stems were up-regulated, Fig. 4 GO SLIM terms (rows) that were significantly enriched (p < 0.05) in each tissue (columns) among differentially expressed genes (DEG) from the "High NCS" and "Low NCS" Miscanthus hybrids DE analysis. The size of a bubble is proportional to the number of DEG annotated with that GO term. Rows are sorted by descending p-value (F-Fisher test) and the bubble colour is representative to the obtained p-value, from lower (dark green) to higher (light green). Yellow (p > 0.05) and white (p > 0.1) bubbles were not enriched. All the enriched GO SLIM terms for the "biological process" (top 8 rows) and "molecular function" (bottom 5 rows) GO categories were included but half of the DEGs in leaves were down-regulated. 16 of the 31 genes involved in the "secondary metabolism" were cytochrome P450 proteins (Additional file 6: Table S11). Six were included in benzoxazinoids biosynthesis, which is associated with defence in grasses. Another six were involved in terpenoids and phenylpropanoid biosynthesis (KEGG ath00900 and ath00940).
Many of the identified DEG in enriched functions showed no homologies in model organisms and consequently remain uncharacterised. This is the case in 36 DE genes involved in the carbohydrate metabolism (over 88 total), whose function was evidenced by the presence of a protein domain, but with an unclear role. A similar case is noted in two genes involved in the "generation of precursor metabolites", 12 genes involved in the "secondary metabolism", and 17 genes involved in "response to stress".

Transcriptional regulatory co-expression network (TRN) inference and analysis of regulated target genes (regulons)
We annotated 5045 transcription factors (TFs) in the M. sinensis proteome based on homology with the Plant Transcription Factor Database (Additional file 6: Table S5) [26]. The set of target genes regulated by a given TF forms a regulon. We inferred the putative regulon of each TF based on co-expression between targets and TFs using the RTN package [27]. For 4,427 TFs we identified at least one target gene (Additional file 6: Table S12). The complete TRN included 26,710 genes (nodes) and 57,643 links between genes (edges).
We compared the overlap between the target genes in each regulon with the lists of DEGs previously obtained to identify regulons enriched in DE genes. We identified 100, 52 and 29 regulons (117 regulons in total) that were significantly enriched in DEGs in stems, leaves and roots, respectively (FDR < 0.05; Additional file 6: Table S13). Most regulons enriched in stems (62%) were only enriched in that tissue, but only 12% and 1.7% of the regulons in leaves and roots were exclusive to such tissue, respectively. We later verified this analysis using two-tail Leaf/stem FC = Log2 fold-change expression "high NSC"/"Low NSC" hybrids in either lead or stem tissues; ATH/RICE = Homologous protein in Arabidopsis thaliana and rice (The prefixes "AT" or "LOC_Os" were excluded from the gene name). TF, Transcription factor regulating the gene, only TFs that were DE or which regulon was enriched in DE-targets are shown. The prefix "Misin" was excluded from the TF gene name  GSEA, including the expression fold-change to rank the genes (Additional file 6: Table S13).
On the other hand, we identified the GO terms enriched (FDR ≤ 0.05) in each regulon to clarify the processes it may be involved (Additional file 6: Table S14). 2,989 regulons were associated with at least one significant GO term. Among these, 213, 232, and 115 regulons were, respectively, annotated as involved in "carbohydrate metabolism" (GO:5975), "generation of precursor metabolites and energy" (GO:6091), and "secondary metabolism" (GO:19748); for a total of 515 unique regulons. The TFs regulating 12 of these 515 regulons were differentially expressed (Additional file 6: Table S14).
We identified 28 regulons (Table 4 and Additional file 6: Table S15) that were (i) enriched with the "carbohydrate metabolism", "generation of precursor metabolites and energy", or "secondary metabolism" GO terms; and were also (ii) enriched in DEGs, or where the TF was DE. These 28 regulons contained 806 target genes in total, but only 134 were DE (Additional file 6: Table S16). Figure 6 contains the transcriptional regulation coexpression network (TRN) for all the DEGs present in it, plus the 28 previously highlighted regulons (Table 3), coloured by enriched GO term. The TRN could be divided into one large, well-connected subnetwork, which included a highly co-regulated core (Fig. 6a) and a regulation loop (Fig. 6b). This subnetwork included 205 DEGs, and 20 of the 28 highlighted regulons (circles/squares with a dark grey border) in Table 3. It also included 5 of the target genes involved in starch metabolism in Table 2 (green circles with a black border in Fig. 6). This subnetwork was enriched in genes associated with "carbohydrate metabolism" (green nodes in Fig. 6). There were two other large subnetworks, but these were not clearly associated with the three GO terms. Three regulons (Fig. 6c) evidenced the co-regulation of several genes involved in carbohydrate metabolism, including starch and sucrose metabolism, by the same TF.

Discussion
We performed a transcriptional RNA-seq analysis between selected Miscanthus hybrids with negative correlations between starch and sucrose concentrations and biomass yield.
Using a mapping family (n = 102) between a diploid M. sinensis and a diploid M. sacchariflorus, we previously demonstrated that high biomass-yielding Miscanthus hybrids had low starch and high fructose concentrations in the stem, and a lower ratio of sucrose, glucose and starch to fructose under peak growing conditions [17].
Here, we selected four hybrids from this mapping family based on the number of tillers (transect count), which was highlighted previously as a target phenotype for increasing biomass yield [28]. These four hybrids could be divided into two groups (Table 1), which showed significant differences in the concentrations of starch and sucrose, but not of hexose. The most significant differences were observed for total NSC because of the cumulative effect of the differences in starch and sucrose.
Tillering was correlated positively with yield and negatively with NSC, and flowering was correlated with yield and plant height. Flowering and senescence represent the termination of biomass accumulation and delayed flowering can result in increased biomass. However, not all late or non-flowering genotypes are high yielding, as was seen here. Likewise, while tillering is associated with yield in the progeny of this cross, the M. sacchariflorus parent had higher tillering than the sinensis parent, but very low biomass (Fig. 1). Additionally, a large number of progeny have a higher yield than both parents (Fig. 1), indicating that heterosis may well be a factor within the population.
Approximately 10% of the total genes in Miscanthus were differentially expressed (DE) between these two groups of hybrids in stems and leaves, but not many were in roots (Fig. 3a). Among these DE genes, there was an enrichment of genes involved in carbohydrate and secondary metabolism in stem and leaves, and in the "generation of precursor metabolites and energy" in stem only (Fig. 4). To better understand how these three biological processes are regulated, we built a transcriptional regulatory co-expression network (TRN; Fig. 6), which is later further discussed. However, these GO terms for biological processes were not regulated similarly in both tissues. While the DEGs in the enriched categories were predominantly up-regulated in stems, they were evenly up-regulated and down-regulated in leaves.
The DE of carbohydrate metabolising genes between the leaf, stem and root is to be expected as it has been previously reported that carbohydrates are differentially distributed between these tissues in Miscanthus in July, the same month our study was conducted [18,29]. Specifically, starch was up to 6 × more concentrated in the leaves than the stems, the below-ground biomass preferentially accumulated starch, and soluble sugars tended to be in greater concentrations in the stems compared to leaves [18]. Our transcriptional observations therefore largely reflect the distribution of carbohydrates; starch metabolism transcripts were DE in the leaf where starch is the most abundant carbohydrate, and sucrose metabolising enzymes were DE in the predominantly sucrose accumulating stem [29]. Fewer DEGs were observed in roots. Seasonal carbohydrate profiling of rhizomes in four genotypes showed that the soluble sugar contents were similar between genotypes and across two sites located 340 km apart [29].
We observed that multiple genes involved in the synthesis (AGP, SS2, SS3, BE2) and degradation of starch in the chloroplast (AMY3, ISA3, SEX4, BAM1) were down-regulated in high biomass-yielding genotypes ( Fig. 5; Table 2). We also observed down-regulation of genes involved in the synthesis (SPS5) and degradation (SUS) of sucrose in high biomass-yielding genotypes. Genes involved in the starch metabolic pathway are upregulated by a high sugar status [30][31][32], as there was a negative relationship between yield and soluble sugar (i.e. Table 4 Regulons (set of genes regulated by a TF) significantly enriched in DEGs (or which TF was DE) and associated with "carbohydrate metabolism", "generation of precursor metabolites and energy", or "secondary metabolism" GO terms  high yielders had lower sugar), it is consistent that the expression of sugar stimulated genes would be lower in high-yielding genotypes. Contrary to this, we noticed the upregulation with a high fold-change in high biomass yield genotypes of triosephosphate isomerase TIM/PDTPI, which encodes a plastidic triose phosphate isomerase [33], and Phosphofructokinase 2 (PFK2). PFK2 catalyses the penultimate step before usable energy is extracted from the phosphorylated products of photosynthesis. This enzyme is, Fig. 6 Transcriptional regulatory co-expression of all the differentially expressed genes (circles) observed between "High NSC" and "Low NSC" Miscanthus hybrids. Sixteen TFs which regulon was enriched in DEGs were also included (squares). TFs are represented in a larger size than the target genes. Nodes with a black or grey border are listed in Tables 2 or 4, respectively. Activation or repression between a TF and target gene are, respectively, represented by red or blue links (edges) between genes (nodes). Node colour corresponds to GO term enrichment, lighter colours for regulons and darker colours for individual genes; green: carbohydrate met., orange: generation of precursor met. and energy, purple/indigo: secondary met. At the bottom of the plot, DE target genes or TFs (larger size) in the TRN that were not connected to other DEGs therefore, a main control point of glycolysis. The observation that high biomass plants have low carbohydrates can seem counter-intuitive, but the rationale is highly logical; high biomass plants maximise growth at the expense of their carbon reserves [34], whereas slow-growing types accumulate their reserves. The upregulation of the PFK2 gene encoding a major glycolytic enzyme is suggestive of a more rapid metabolism of photosynthate to fuel growth in the high-yielding types. In summary, starch and sucrose synthesis was down-regulated in high-yielding Miscanthus hybrids, while glycolysis and export of triose phosphates was up-regulated in high-yielding Miscanthus hybrids.
These results support that high-yielding Miscanthus genotypes were more rapidly accumulating structural mass, likely cellulose via sucrose metabolism [35][36][37], at the expense of starch [17,29,38]. The latter is further supported by the significant differences in the fructoseto-starch (but not glucose-to-starch) ratio between high and low yielding hybrids [17], which was also observed between the sequenced hybrids. Fructose is an indicator of sucrose metabolism, because it is produced exclusively from the metabolism of sucrose by the action of sucrose synthases (SUS), while glucose is produced by the metabolism of both sucrose and starch [39,40]. Furthermore, in a C13 labelling experiment, it was observed that a greater proportion of the labelled carbon was observed in the insoluble fraction (mainly comprising cellulose) of a rapidly growing Miscanthus genotype, whereas a greater proportion was partitioned into starch in a slower-growing type [17]. Our results, therefore, add to these previous observations with the addition of transcriptomic evidence of the relationship between carbon metabolism, partitioning and growth.
We built a transcriptional regulatory co-expression network (TRN) that included 4,427 regulons. We identified the 28 regulons ( Table 4) that were significantly enriched in the "carbohydrate metabolism", "generation of precursor metabolites and energy", or "secondary metabolism" GO terms, and also in DE target genes (or where the TF itself was a DEG).
When we plotted the TRN for every DEGs plus these 28 regulons (Fig. 6), 20 of these 28 regulons were well connected to each other's and formed a contained subnetwork with a "core" (Fig. 6a) and a "loop" (Fig. 6b). At the core, and TF 6G000800 was DE in every tissue and regulated six genes, including a Beta-fructofuranosidase (Table 2), a Malate dehydrogenase (Table 3), a 2-carboxy-D-arabinitol-1-phosphatase (Table 3), and an uncharacterised gene (07G271500). This gene (07G271500) was remarkably co-regulated by six different TFs. TFs GLDP1 (15G053900) and bHLH041-like (5G170700) appeared to be the link between the core and loop of the subnetwork.
The two transcription factors were connected to FLR1, ERD9 and indirectly to the highly expressed a malate transferase (MLS).
The loop (Fig. 6b) included 14 TFs that were co-regulated and activated. These TF were connected through single target genes that were not well characterised. The other two large subnetworks were not clearly associated to the studied GO terms. One of them included, however, two of the 28 highlighted regulons (DYW1 and 7G253100). Three small subnetworks with just one/two TFs highlighted the co-expression of several targeted genes (Fig. 6c).

Conclusion
Our results evidence a direct relationship between high expression of essential enzymatic genes in the starch and sucrose synthesis pathway, their transcriptional regulators (TFs) and co-expression, high starch concentrations, and lower biomass production. The strong interconnectivity between expression, regulators, genotype, chemotype and agronomic traits opens the door to use the expression of well-characterised genes associated with carbohydrate metabolism, particularly in the starch and sucrose pathway, for the early selection of high biomass-yielding genotypes from large Miscanthus populations. Since regulation appears to play a key role in NSC content, these identified TFs offer a breeding or biotechnological target for improvement, or selection of "starch-rich" genotypes.

Mapping population establishment and phenotyping
A total of 102 genotypes from a paired cross between diploid M. sinensis genotype "M. sinen 102" and a diploid M. sacchariflorus genotype "M. sacch 297" were sown from seed in trays in a glasshouse in 2009. In 2010, individual plants were split to form three replicates of each genotype and then planted out into the field in a spaced-plant randomised block design comprising three replicate blocks at IBERS, Aberystwyth, UK. Tiller numbers were counted in the field in 2012. Plant material was harvested for RNA extraction from additional clonal pot grown plants in an unheated glasshouse mid-May 2012, when plants were actively growing. Thus, leaves were fully expanded, stems were elongating, and no flowering had occurred. Plants were checked for the presence of rhizome prior to final selections being made, and only those with rhizome present were selected for this study. Details of the carbohydrate analysis were previously described [17]. Briefly, the family was harvested in February 2015 following the 2014 growing season. Biomass was dried to a constant weight, and the average DW weight per plant (kg) was calculated. Soluble sugars were extracted and quantified enzymatically and photometrically from known standard curves on the same plate, as previously detailed [29]. Starch was extracted using a modified Megazyme commercial assay procedure and quantified photometrically from known standard curves on the same plate, as previously described [29]. Four hybrid genotypes were selected based on a low or high number of tillers (transect count of tillers). Correlation between concentrations, plant height and biomass phenotypes for the whole mapping population was previously quantified [17]. Pearson's correlation values between the number of tillers and the other phenotypes were determined for the whole family. Differences between the four selected hybrids for all phenotypes were determined with Student's two-tailed t-tests.

RNA sequencing and pre-processing
RNA was extracted from the four selected hybrids, as well as from the two parents of the family. Extraction was performed using RNeasy Plant Mini kit (Qiagen, CA, USA) according to the manufacturer's instructions. Total RNA samples were sent to the sequencing service at the Earlham Institute (Norwich, UK) where standard Illumina RNA-seq libraries were prepared and sequenced using the HiSeq 2000 platform. The raw reads were filtered with Trim Galore [41] using the default options for paired-end reads to remove Illumina adaptor sequences and reads with quality scores below 20. Cleaned reads were aligned to the M. sinensis reference genome (Miscanthus sinensis v7.1 DOE-JGI, http:// phyto zome. jgi. doe. gov) [22] downloaded from Phytozome with Kallisto using the "quant" mode with default options [42]. Previously, the reference was indexed using the M. sinensis gene annotation (Miscanthus sinensis v7.1 DOE-JGI, http:// phyto zome. jgi. doe. gov) downloaded from Phytozome in GFF3 format. This same gene annotation was functionally annotated with GO terms and enzyme codes with the commandline version of Blast2GO [43] using BLASTX with an E-value of 1e−10 and the NCBI non-redundant (nr) and EBI InterPro databases. To identify and annotate the transcription factors, the M. sinensis proteome was downloaded from Phytozome and aligned to the Plant Transcription Factor Database (version 5) [26] using Diamond [44]. We retained alignments with a minimum identify of 70% and score of 200.

Differential expression and enrichment analysis
The differential expression and enrichment analysis are fully available in an R notebook (See Data availability), which is also available via Github. Briefly, Kallisto count files, one from each of the 23 libraries, were imported in R using TXimport [45]. Differential analysis was performed using DESeq2 [25] for each tissue (root, stem, leaf ) independently. Raw gene counts were obtained from Kallisto alignments and normalised using DESeq2 for the top 1,000 most variable genes to cluster the samples. Genes with a False Discovery Rate (FDR) < 0.05 were considered differentially expressed (DE). We compared two groups of hybrids; each hybrid group was composed of two genotypes (genotypes 112 and 90 against genotypes 18 and 120). We also compared the hybrids against the M. sacchariflorus and M. sinensis parent, one at the time. A gene only was considered DE between hybrids and parents when it was DE against both parents. The enrichment analysis was based on an F-Fisher test (FDR < 0.05) using the library topGO [46] with the "weight01" algorithm. Using the lists of DE genes and functional annotation as inputs, topGO compared the number of DEGs in each category with the expected number of genes for the whole transcriptome. The "weight01" algorithm resolves the relations between related GO ontology terms at different levels. Enriched categories were plotted using ggplot2 [47]. Genes in enriched GO terms were further analysed using the online Phytomine [48] and Thalemine [49] databases. Genes annotated with enzyme codes were plotted using the online KEGG mapper [50].

Transcriptional regulatory network (TRN) inference and regulon enrichment analysis
The TRN inference and regulon enrichment analysis was done with RTN v. 2.10.1 [27] and topGO v. 2.38.1 [46]. The code is fully available in an R notebook (see Data availability), which is also available via Github. Briefly, the normalise counts from the previous analysis with DESeq2 and the list of known TFs in M. sinensis were provided to the RTN package with default options. RTN uses permutation (1000 permutations, FDR < 0.05) to remove non-significant TF-target associations, and bootstrapping (100 bootstraps, 95% consensus) to remove unstable interactions, before applying the ARACNE algorithm (eps ≥ 0) for network reconstruction. GO term enrichment in regulons with topGO was done as previously described but using the list of target genes in a given regulon as gene-set instead of the list of DEGs. The overlap between the target genes in each regulon to the lists of DEGs was done by enrichment analysis with Master Regulatory Analysis (MRA) and two-tail GSEA. MRA compared the list of DEGs (present/absent) with the members in each regulon. GSEA ranked the genes by expression fold-change and compared with either the activated (positive regulon) and repressed (negative regulon) subsets of each regulon. Both methods are between traits. Table S2. Traits significantly different (T-test) between the sequenced samples. Table S3. Normalised counts, expression fold-change and P-values for all the genes in roots, stem and leaf tissue between groups of hybrids. Table S4. Normalised counts, expression fold-change and P-values for all the genes in roots, stem and leaf tissue between hybrids and parents. Table S5. Functional annotation, GO and enzyme codes for all the genes in the reference genome. Table S6. Enriched GO terms among DEG between groups of hybrids. Table S7. Enriched GO terms among DEG between hybrids and parents. Table S8. Detailed functional annotation of 88 DEG within the enriched "carbohydrate metabolism" GO term. Table S9. Detailed functional annotation of 29 DEG within the enriched "generation of precursor metabolites and energy" GO term. Table S10. Detailed functional annotation of 72 DEG within the enriched "response to stress" GO term. Table S11. Detailed functional annotation of 31 DEG within the enriched "secondary metabolism" GO term. Table S12. Transcription factor and target genes contained in each regulon in the TRN. Table S13. Enrichment analysis of differential expression genes for each regulon in the TRN. Table S14. Enrichment analysis of GO terms in each regulon for the TRN. Table S15. Annotation of the twenty-eight regulons were enriched with the "carbohydrate metabolism", "generation of precursor metabolites and energy", or "secondary metabolism" GO terms; and were also enriched in DEGs, or where the TF was DE. Table S16. Annotation of the target genes in the twenty-eight highlighted regulons.