Skip to main content

Multi-omics analysis reveals novel loci and a candidate regulatory gene of unsaturated fatty acids in soybean (Glycine max (L.) Merr)

Abstract

Background

Soybean is a major oil crop; the nutritional components of soybean oil are mainly controlled by unsaturated fatty acids (FA). Unsaturated FAs mainly include oleic acid (OA, 18:1), linoleic acid (LLA, 18:2), and linolenic acid (LNA, 18:3). The genetic architecture of unsaturated FAs in soybean seeds has not been fully elucidated, although many independent studies have been conducted. A 3 V multi-locus random single nucleotide polymorphism (SNP)-effect mixed linear model (3VmrMLM) was established to identify quantitative trait loci (QTLs) and QTL-by-environment interactions (QEIs) for complex traits.

Results

In this study, 194 soybean accessions with 36,981 SNPs were calculated using the 3VmrMLM model. As a result, 94 quantitative trait nucleotides (QTNs) and 19 QEIs were detected using single-environment (QTN) and multi-environment (QEI) methods. Three significant QEIs, namely rs4633292, rs39216169, and rs14264702, overlapped with a significant single-environment QTN.

Conclusions

For QTNs and QEIs, further haplotype analysis of candidate genes revealed that the Glyma.03G040400 and Glyma.17G236700 genes were beneficial haplotypes that may be associated with unsaturated FAs. This result provides ideas for the identification of soybean lipid-related genes and provides insights for breeding high oil soybean varieties in the future.

Background

Soybean [Glycine max (L.) Merr.], a major oil crop, is commonly used in cooking oil [1]. Soybean oil is mainly composed of saturated fatty acids (FAs) and unsaturated FAs. Among them, saturated FAs include palmitic and stearic acids, and unsaturated FAs include oleic (OA), linoleic acid (LLA), and linolenic acids (LNA) [2, 3]. Unsaturated FA is the main component of vegetable oil, accounting for more than 80% [4]. The increase in the content of OA, a monounsaturated FA, can improve oxidative stability and prevent oxidation [4]. LLA and LNA are polyunsaturated FAs and are very beneficial to human health [5]. However, the LLA and LNA show poor stability at a high temperature and are easily oxidized [6]. Thus, an important goal of soybean breeders is to increase the OA level and reduce the LLA and LNA content [7, 8].

Genome-wide association study (GWAS) mapping can identify the genetic basis of a variety of complex traits [9]. To date, the single-locus GWAS method has been widely applied to mine genetic loci underlying important agronomic traits, including 100-seed weight in soybean and oil content and yield-related traits in maize [2, 10, 11]. However, quantitative trait nucleotides (QTNs) have been detected using the single-locus GWAS method, which has limited ability to detect QTNs because quantitative traits are affected by a polygenic background [12].

Currently, the mixed linear model (MLM) method can correct population structure and family relationships and is widely used [13]. Based on the MLM method, single-locus GWAS methods have been widely proposed, including EMMAX, FaST-LMM and GEMMA [12, 14, 15]. However, single-locus GWAS methods generally require Bonferroni correction and can be affected by a polygenic background. To overcome this problem in single-locus GWAS methods, multi-locus GWAS methods have been applied, in which statistics are applied to all loci [16]. These multi-locus GWAS methods mainly include FASTmrEMMA, FASTmrMLM, FarmCPU, and pLARmEB [17,18,19,20]. However, these methods have a high calculation burden, and the advantages of QTN-by-environment interactions (QEIs) have not been fully considered.

To address this, a new multi-locus GWAS model, the 3 V multi-locus random-SNP-effect mixed linear model (3VmrMLM), has been presented [21]. This method improves the QTL detection capability and can analyze the genetic variation of complex traits. It provides a new method for the gene discovery of complex traits.

In this study, a dataset of 194 soybean accessions with 36,981 SNPs was applied [2]. We analyzed the unsaturated FA content in this population of 194 soybean accessions based on the multi-locus GWAS model (3VmrMLM). Our aim was to detect significant QEIs and stable QTNs compared with the results of our previous study and other independent studies and to further identify candidate genes related to unsaturated FA content.

Results

Phenotypic variation of three soybean unsaturated FA compositions

The distribution of unsaturated FA content (including OA, LLA, and LNA) in the 194 soybean accessions is shown in Table 1. The coefficient of variation (CV%) differed among the three years. In 2013, the unsaturated FA content had the highest CV at 51% (OA), 48% (LLA), and 52% (LNA). In 2014, the CVs of OA, LLA, and LNA were relatively consistent at 28%, 25%, and 29%, respectively. In 2015, the CV of unsaturated FAs was basically the same as in 2014 (Table 1). The heritabilities of OA, LLA, and LNA were 0.41, 0.36, and 0.35, respectively (Table 1). The above results showed that the content of unsaturated FAs was affected by the environment.

Table 1 Statistical analysis of oleic, linoleic, and linolenic acid traits

The correlation coefficient of the unsaturated FA content was calculated. As shown in Fig. 1, OA, LLA, and LNA content had a high correlation within the same year. However, the OA, LLA, and LNA content was not high between different years. In 2013, OA was positively correlated with LLA and LNA (0.92 and 0.83, respectively). In 2014, OA was positively correlated with LLA and LNA (0.84 and 0.65, respectively). In 2015, OA was positively correlated with LLA and LNA (0.79 and 0.64, respectively). These results show that unsaturated FAs affect soybean oil accumulation.

Fig. 1
figure 1

Distribution of oleic, linoleic, and linolenic traits in soybean and Pearson coefficients

Identification of QTNs for unsaturated FA-related traits using 3VmrMLM

In this study, the unsaturated FA content was reanalyzed using the single-environment QTN model (3VmrMLM). A total of 94 significant QTNs were associated with the unsaturated FA content (LOD score ≥ 3.0). Among them, 30, 34, and 30 QTNs were associated with OA, LLA, and LNA content, respectively (Table 2).

Table 2 QTNs identified for unsaturated fatty acid content using the QTN detection model in 3VmrMLM

In 2013, 2014, and 2015, 12, 10, and eight QTNs were associated with OA content, with LOD scores of 4.55–25.92, 6.25–23.29, and 3.37–9.57, respectively. A total of 17, 10, and seven QTNs associated with LLA content were identified with LOD scores of 4.43–24.14, 3.42–20.76, and 4.98–17.08 in 2013, 2014, and 2015, respectively. In 3 years (2013, 2014, and 2015), 10, eight, and 12 QTNs associated with LNA content were detected with LOD scores of 4.17–22.53, 5.07–10.72, and 6.46–20.27, respectively (Table 2, Additional file 1: Fig. S1).

Detection of QEIs for unsaturated FA content using 3VmrMLM with multiple environments

The unsaturated FA content was reanalyzed in 3 years (2013, 2014, and 2015) using the multiple-environment QEI model (3VmrMLM) for identifying QEIs. A total of 19 significant/suggested QEIs were identified (Table 3, Fig. 2). Three significant QEIs overlapped with the above QTNs. In these QEIs, the r2 value was between 2.01 and 14.67, and the variance value was between 0.03 and 1.33 (Table 3).

Table 3 Significant/suggested QEIs for soybean unsaturated fatty acid content in three environments detected using the QTN-by-environment detection model in 3VmrMLM
Fig. 2
figure 2

Manhattan plots of the multi-environment analysis for the oleic, linoleic, and linolenic acid content in soybean seeds

Candidate gene prediction of significant QTNs associated with unsaturated FA in soybean

There were 1246 genes identified in the flanking genomic region of each significant QTN using the 3VmrMLM method (Additional file 1: Table S1). We further conducted the Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis. As shown in Additional file 1: Fig. S2A, 201 genes were significantly enriched in metabolism, genetic information processing, environmental information processing, cellular processes, and organismal systems, including lipid metabolism, amino acid metabolism, energy metabolism, transport, and catabolism. The results of the above enrichment analysis showed that some candidate genes around QTN were found in different processes.

The same methods mentioned above were used to analyze candidate genes in the flanking regions of the QEIs. A total of 301 candidate genes were found in the linked regions of significant QEIs (Additional file 1: Table S2). KEGG analysis found that 53 genes were significantly enriched in metabolism, genetic information processing, environmental information processing, and organismal systems, including carbohydrate metabolism and lipid metabolism (Additional file 1: Fig. S2B). In the multiple-environment QEI model, five known SNP markers were identified. In addition, some new SNP markers, including rs6528670, rs1902760, rs4633292, rs2457629, and rs48948953, were related to FA synthesis. Moreover, some known markers were identified in the multiple-environment QEI model, including rs14264702, rs34595703, rs44492166, rs23852645, and rs26951255. Three significant QEIs, namely rs4633292, rs39216169, and rs14264702, overlapped with significant QTN in a single year, of which rs14264702 has been reported [22].

Transcriptomic analysis of HUFA and LUFA soybean seeds

RNA-seq analysis was conducted to reveal the transcriptional regulation of unsaturated FA metabolism in HUFA (high unsaturated fatty acid) and LUFA (low unsaturated fatty acid) soybean seeds. Three comparison groups were analyzed: a comparison group of five HUFA and five LUFA varieties (FHUFA vs. FLUFA); a comparison of 10 HUFA and 10 LUFA varieties (THUFA vs. TLUFA); and a comparison of 15 HUFA and 15 HUFA varieties (HUFA vs. LUFA) (Additional file 1: Table S3).

There were 4013, 3504, and 2546 DEGs in the FHUFA vs. FLUFA, THUFA vs. TLUFA, and HUFA vs. LUFA groups, respectively (Fig. 3A, B). In each comparison group, the number of upregulated DEGs was higher than the number of downregulated DEGs. As shown in Fig. 3C, 1160 common DEGs were upregulated, while 183 common DEGs were downregulated.

Fig. 3
figure 3

Multivariate statistical analysis of the transcriptome data in the soybean samples. A Volcano maps in different comparison groups. B Number of differentially expressed genes. The green and yellow columns represent the numbers of genes with upregulated and downregulated expression, respectively. C Upsetplot diagram showing the overlapping DEGs in the three comparison groups

Identification of candidate genes by integrating GWAS and RNA-seq analysis

To further identify candidate genes, DEGs were identified by integrating GWAS and RNA-seq analysis and by analyzing potential candidate genes. In the single-environment QTN model, 91, 85, and 61 DEGs were found in the FHUFA vs. FLUFA, THUFA vs. TLUFA, and HUFA vs. LUFA groups, respectively (Additional file 1: Table S1). A total of 30 DEGs were found in all three comparison groups. Among them, Glyma.10G079500, Glyma.19G163600, Glyma.09G033600, and Glyma.02G068900 genes were upregulated (Log2FC > 3), and Glyma.07G205400, Glyma.09G053700, and Glyma.06G175100 genes were downregulated (Log2FC <  − 1) (Table 4, Additional file 1: Fig. S3A).

Table 4 Candidate genes are identified in the transcriptome and QTN detection model

In the multiple-environment QEI model, 26, 28, and 20 DEGs were found in the FHUFA vs. FLUFA, THUFA vs. TLUFA, and HUFA vs. LUFA groups, respectively (Additional file 1: Table S2). Among these candidate genes, nine were simultaneously detected by GWAS and common DEGs in all three comparison groups. These nine genes included those encoding an Acyl-CoA-binding protein (Glyma.17G236700), Ankyrin repeat family protein (Glyma.09G053700), Nodulin MtN3 family protein (Glyma.08G025100), Integrase-type DNA-binding superfamily protein (Glyma.18G206600), Calmodulin-domain protein kinase 9 (Glyma.14G023500), ARM repeat superfamily protein (Glyma.03G036700), Protein kinase superfamily protein (Glyma.03G036000), BRI1 kinase inhibitor 1 (Glyma.06G039100), NAC domain-containing protein 73 (Glyma.13G234700), and unknown function protein (Glyma.18G205700, Glyma.18G205400) (Table 5, Additional file 1: Fig. S3B). The expression of these genes was further determined by qRT-PCR and was basically consistent with that of the transcriptome data (Additional file 1: Fig. S4).

Table 5 Candidate genes are identified in the transcriptome and QEIs detection model

Metabolic profiling analysis of MHUFA and MLUFA soybean seeds

To determine the unsaturated FA regulatory network at the seed development stage, a non-targeted metabolic profiling analysis was applied. There were 15 high unsaturated FA (HUFA) and 15 low unsaturated FA (LUFA) soybean varieties applied in this study (Additional file 1: Table S3). Multiple metabolites were detected using non-targeted metabolomics, including secondary metabolites, lipids, amino acids, and flavonoids.

To explore the differences in metabolites between different varieties, three comparison groups were studied: five high-unsaturated FA (FMHUFA) and five low-unsaturated FA (FMLUFA) varieties (FMHUFA vs. FMLUFA); 10 high-unsaturated FA (TMHUFA) and 10 low-unsaturated FA (TMLUFA) varieties (TMHUFA vs. TMLUFA); and 15 high-unsaturated FA (MHUFA) and 15 low-unsaturated FA (MLUFA) varieties (MHUFA vs. MLUFA). The OPLS-DA analysis showed that the model accurately described each sample and was suitable for subsequent analysis (Fig. 4A). According to the OPLS-DA model, 70 differentially abundant metabolites (DAMs) were upregulated, and 291 DAMs were downregulated in FMHUFA vs. FMLUFA (Fig. 4B). These metabolites included lipids, secondary metabolites, and unknown metabolites. In addition, 202 and 322 DAMs were identified in TMHUFA vs. TMLUFA and MHUFA vs. MLUFA, respectively (Fig. 4B). As shown in Fig. 4C, four common upregulated DAMs and 29 common downregulated DAMs were identified.

Fig. 4
figure 4

Multivariate statistical analysis of the metabolome data in the soybean samples. A OPLS-DA model analysis. B Number of differential metabolites. The green and yellow columns represent the number of genes that were upregulated and downregulated, respectively. C Upsetplot diagram showing the overlapping DAMs in the three comparison groups

Fig. 5
figure 5

In the QTN detection model, network analysis of the candidate DEGs and DAMs in the three comparison groups. A FHUFA vs. FLUFA, B THUFA vs. TLUFA, and C HUFA vs. LUFA. Yellow circles represent genes. Green squares represent metabolites. The solid line represents a positive correlation, while the dashed line represents a negative correlation

Differential accumulation of metabolites with MHUFA and MLUFA content

In this study, the metabolic changes of high and low unsaturated FA content in 30 soybean varieties during the R6 period were studied. In FMHUFA vs. FMLUFA, 29 DAMs were annotated into the KEGG pathway. Among them, the isoflavone pathway had the most DAMs, including Genistein, 8-C-glucosylnaringenin, genistin, and biochanin A (Additional file 1: Fig. S5A). In TMHUFA vs. TMLUFA, 16 DAMs were annotated into the KEGG pathway; among them, the TCA cycle had the most DAMs. 1-Pyrroline-4-hydroxy-2-carboxylate, 5-amino-6-ribitylamino uracil, and 2-(acetamidomethylene) succinate were differentially accumulated in TMHUFA vs. TMLUFA (Additional file 1: Fig. S5B). In MHUFA vs. MLUFA, 39 DAMs were annotated into the KEGG pathway, including the TCA cycle, LLA metabolism, and biosynthesis of amino acids. LysoPC (22:2(13Z,16Z)), (2S,5S)-trans-carboxymethylproline, and quercetin 3-sambubioside were differentially accumulated in MHUFA vs. MLUFA (Additional file 1: Fig. S5C).

Co‑expression analysis of candidate genes and DAM metabolites

Candidate genes and metabolite networks were analyzed. In the single-environment model, the co-expression network of 30 candidate genes and DAMs in three comparison groups was constructed. In the FHUFA vs. FLUFA network, the results indicated that the 75 subnetworks were significantly correlated (|r|> 0.5, p < 0.05). PE (22:5) was positively associated with Glyma.14G216100 (r > 0.51, p < 0.02), Glyma.03G040000 (r > 0.51, p < 0.01), Glyma.02G210300 (r > 0.51, p < 0.02), Glyma.17G236200 (r > 0.52, p < 0.01), Glyma.11G100600 (r > 0.59, p < 0.006), and Glyma.09G157500 (r > 0.68, p < 0.0007). Quinone was positively associated with Glyma.17G236700 (r > 0.53, p < 0.01) and Glyma.03G040400 (r > 0.67, p < 0.001) (Fig. 5A). In the THUFA vs. TLUFA network, 1-pyrroline-4-hydroxy-2-carboxylate was positively associated with Glyma.09G051900 (r > 0.88, p < 5.53E−14) and Glyma.03040400 (r > 0.88, p < 2.35E−14). Glyma.17G236700 was positively associated with 2-(acetamidomethylene) succinate (r > 0.51, p < 0.0007), 5-amino-6-ribitylamino uracil (r > 0.69, p < 6.85E−07), and 1-pyrroline-4-hydroxy-2-carboxylate (r > 0.69, p < 6.09E−07) (Fig. 5B). In the HUFA vs. LUFA network, 1-pyroline-4-hydroxy-2-carboxylate B was significantly associated with Glyma.03G040400 (r > 0.84, p < 3.69E−17) and Glyma.09G051900 (r > 0.85, p < 8.61E−18). Glyma.17G236700 was significantly associated with LysoPC(22:2(13Z,16Z)) (r > 0.64, p < 3.04E−08) and 1-pyrroline-4-hydroxy-2-carboxylate B (r > 0.64, p < 2.01E−08) (Fig. 5C).

In the multiple-environment QEI model, a co-expression network of nine candidate genes and DAMs was constructed for three comparison groups. In the FHUFA vs. FLUFA network, the 22 subnetworks were significantly correlated (|r|> 0.5, p < 0.05). Quinone was positively associated with Glyma.08G025100 (r > 0.88, p < 2.14E−07) and Glyma.17G236700 (r > 0.53, p < 0.01) (Fig. 6A). In the THUFA vs. TLUFA network, Glyma.18G205400 was positively associated with 5-amino-6-ribitylamino uracil (r > 0.90, p < 9.27E−16) and 1-Pyrroline-4-hydroxy-2-carboxylate (r > 0.92, p < 2.96E−17). Glyma.17G236700 was significantly associated with 2-(acetamidomethylene) succinate (r > 0.51, p < 0.0007), 1-pyrroline-4-hydroxy-2-carboxylate (r > 0.69, p < 6.09E−07), and 5-Amino-6-ribitylamino uracil (r > 0.69, p < 6.85E−07) (Fig. 6B). In the HUFA vs. LUFA network, Glyma.17G236700 was significantly associated with LysoPC (22:2(13Z,16Z)) (r > 0.64, p < 3.04E−08) and 1-pyrroline-4-hydroxy-2-carboxylate B (r > 0.649, p < 2.01E−08) (Fig. 6C).

Fig. 6
figure 6

QTN-by-environment detection model and network analysis of candidate DEGs and DAMs in the three comparison groups. A FHUFA vs. FLUFA, B THUFA vs. TLUFA, and C HUFA vs. LUFA. Yellow circles represent genes. Green squares represent metabolites. The solid line represents a positive correlation, while the dashed line represents a negative correlation

Gene-based association and haplotype analysis of candidate genes

To further determine the relationship between candidate genes and traits, the SNPs of the candidate genes were applied for the gene-based association and haplotype analysis of the candidate genes. According to the results of candidate gene screening based on gene expression data from qRT-PCR and transcriptomics, Glyma.03G040400 and Glyma.17G236700, as the candidate genes of QTNs and overlapping SNPs of QTNs and QEIs, were studied to understand the gene variations affecting soybean unsaturated FAs and to further determine beneficial haplotypes. Three SNPs were found in the promoter and CDS regions of Glyma.03G040400 (Additional file 1: Table S4). SNP markers 39,188,954, 39,189,172, and 39,190,333 showed an association with LLA (Fig. 7A, Additional file 1: Table S4). Among the three haplotypes of Glyma.03G040400, Hap 3 and Hap 2 had a significantly higher LNA content than Hap 1 in 2013 and 2014 (Fig. 7B and C).

Fig. 7
figure 7

Gene-based association analysis and haplotype analysis. A Gene-based association analysis of Glyma.17G236700 related to linolenic content. B, C The relationship between haplotypes and linoleic content analysis of Glyma.17G236700 in 2013 and 2014, respectively. D Gene-based association analysis of Glyma.03G040400 related to linoleic content. E, F The relationship between haplotypes and linolenic content analysis of Glyma.03G040400 in 2013 and 2014, respectively. * and ** indicate significance at p < 0.05 and p < 0.01, respectively

For candidate gene Glyma.17G236700, seven SNPs were found in the promoter and CDS region. Of these, SNP markers 5,048,564, 5,048,670, 5,048,842, 5,049,415, 5,049,422, 5,049,438, and 5,050,845 were significantly associated with LLA content in 2013 and 2014 (− log10(P) ≥ 2) (Additional file 1: Table S4, Fig. 7D). Four haplotypes of Glyma.17G236700 were defined by the seven SNPs (Fig. 7E and F). Among the four haplotypes, Hap 3 and Hap 4 had a significantly higher LNA content (2013 and 2014) than Hap 1 and Hap 2.

Discussion

Soybean is an important oil crop. However, different proportions of FAs may play an important role in soybean oil. Therefore, it is of great significance to improve the content and quality of soybean oil. The single locus method has been widely used to detect genetic variation in crops, including GLM and MLM [23, 24]. However, single-locus GWAS methods generally need Bonferroni correction and can be affected by a polygenic background. In this study, 194 soybean accessions were analyzed using the 3VmrMLM method (Additional file 1: Figure S1, Table 2). We identified 12, 10, and eight significant/suggested SNPs for OA, 17, 10, and seven significant/suggested SNPs for LA, and 10, eight, and 12 significant/suggested SNPs for LLA in 2013, 2014, and 2015, respectively (Table 2). In addition, we compared 3VmrMLM with a single-locus MLM method by Zhao et al. We detected 63 SNPs using the MLM method. Hence, the 3VmrMLM method detected more significant SNPs than the MLM method. Among these SNPs, four SNPs were found using the MLM and 3VmrMLM methods simultaneously, including rs4953186 rs52833743, rs35024325, and rs6481810, and the discovery of rs35024325 and rs6481810 SNPs has been reported [2, 22].

Environmental changes have an important impact on the quality and yield of crops; analysis of multiple environments can increase the detection capability of SNPs. In this study, six, five, and eight QEIs were found for OA, LA, and LLA, respectively (Fig. 2, Table 3). Among these SNPs, five have been reported [22, 25, 26]. A total of 1246 genes around the significant/suggested QTNs were predicted in this study; of them, 40 genes were involved in lipid synthesis (Additional file 1: Table S1). For example, the MYB transcription factor has been reported to affect oil accumulation [27]. The OsLTP gene is involved in the transport of lipid molecules in rice [28]. In this study, Glyma.03G040400 (GmLTP1), located on chromosome 3, was significantly related to LNA using the GLM method based on gene-based association (Additional file 1: Table S4). In addition, the GmLTP1 gene was a beneficial haplotype (Fig. 7).

A total of 301 genes around the significant/suggested QEIs were detected in this study (Additional file 1: Table S2). Three significant QEIs, namely rs4633292, rs39216169, and rs14264702, overlapped with significant single-environment QTNs. Among the overlapping SNPs, genes related to FA synthesis and seed development, such as ACBP and FTSH, were identified. ACBPs can play an important role in maintaining lipid homeostasis [29]. In addition, we found that the Glyma.17G236700 (ACBP) gene had a beneficial haplotype (Fig. 7).

Conclusion

The 3VmrMLM method was more comprehensive for GWAS. This method overcame the huge computational burden of traditional models. In this study, 94 QTNs and 19 QEIs were identified. Five major candidate genes were found. The gene expression data from different soybean tissues and transcriptome data were used to identify Glyma.03G040400 and Glyma.17G236700 as key candidate genes around the SNPs. The beneficial haplotypes of Glyma.03G040400 and Glyma.17G236700 may be helpful for further application in soybean breeding.

Methods

Plant materials, field trials, and phenotypic evaluation

An association panel of 194 soybean germplasm resources was planted at Harbin (162.41° E, 45.45° N) in 2013, 2014, and 2015. Field trials were conducted using single-row plots (2 m long and 0.65 m between rows) and a randomized complete block design with three replicates per experimental site. The unsaturated FA content of each sample was determined using gas chromatography (GC-14C, Shimadzu Company, Japan), according to our previous method [30]. The OA, LLA, and LNA content were applied in single-environment (QTN) and multi-environment (QEI) analyses.

Genotypic data

A genotypic dataset consisting of 36,981 SNPs from 194 soybean germplasm resources was generated by Specific-Locus Amplified Fragment Sequencing (SLAF-seq), which was reported in Han et al. and Zhao et al. [2, 31]. The 36,981 SNPs were distributed on 20 soybean chromosomes, with minor allele frequencies > 0.04 and missing data of < 10% (Fig. 8).

Fig. 8
figure 8

Density distribution of single nucleotide polymorphisms (SNPs)

GWAS

The 36,981 SNPs and unsaturated FA content of 194 soybean accessions were used for association analysis via the 3VmrMLM method in 3VmrMLM software [21]. QTNs for OA, LLA, and LNA content were calculated from a single environment (3 years of phenotypic data from 2013, 2014, and 2015). QEIs for OA, LLA, and LNA content were calculated using a joint analysis of multiple environments. The threshold of significance for QTNs and QEIs was set at p = 0.05 and LOD score ≥ 3.0.

Differential expression analysis based on RNA-seq

At the R6 stage, 30 soybean varieties with a high content of three unsaturated FAs and a low content of three unsaturated FAs were collected for RNA sequencing (RNA-seq) with two biological replicates. Total RNA was extracted using TRIzol reagent (Invitrogen, Carlsbad, CA, USA). The cDNA sequencing libraries of 18 RNA samples were constructed and sequenced, and RNA-seq data were generated using the Illumina platform. Differentially expressed genes (DEGs) were identified using the edgeR package in R software [32]. The significance level was set as follows: |log2(fold change)|≥ 1.

Identification of candidate genes

The 100-kb flanking region of each identified QTN and QEI was defined to search for candidate genes according to linkage disequilibrium decay analysis, as described in Zhao et al. [2]. Candidate genes for unsaturated FAs were extracted in the following steps. According to previous reports, known genes related to FA content in Arabidopsis were considered references to screen their homologous genes in the soybean genome. The new candidate genes were identified using DEGs for unsaturated FAs.

Metabolite profiling

The non-targeted metabolome was completed by Bioacme Biotechnology Co., Ltd. (Wuhan, China). Briefly, a 100 mg soybean sample was loaded into a 2-mL centrifuge tube, and 300 μL 75% methanol/water was added. The tubes were centrifuged at 12,000 rpm for 15 min at 4 °C. Metabolites were screened and identified using the Metlin database. The differential metabolites were calculated using an orthogonal partial least squares-discriminant analysis (OPLS-DA) model, with a variable importance in the projection (VIP) score of ≥ 1 and a |log2 (fold change)| of ≥ 1.

Haplotype analysis and gene-based association analysis of candidate genes

The SNP variation of candidate genes was analyzed based on genome sequencing data. These SNPs were located at the full length of the gene, including exons, intronic regions, and upstream and downstream of the gene. Therefore, in this study, phenotypic data, including high and low total unsaturated FA content, from 50 soybean germplasm resources were used over 3 years to conduct an association analysis. A general linear model (GLM) was used to further determine the association between the SNP variation of candidate genes and unsaturated FA content using TASSEL software [33]. Significant SNP variation in candidate genes was considered when the P value was less than 0.01.

Quantitative real-time PCR (qRT-PCR)

Soybean seeds with high and low unsaturated FA content were collected at the R6 stage. Total RNA was extracted using the TRIzol method, and cDNA was generated using the ReverTra Ace qPCR RT Master Mix (TOYOBO, Osaka, Japan). Real-time quantitative PCR (qRT-PCR) was performed on an ABI 7500 fast real-time PCR platform with SYBR Green (TOYOBO, Osaka, Japan). GmACTIN4 was used as an internal control, and the primer sequences for candidate genes are listed in Additional file 1: Table S5. The L-13 soybean seed samples were used as a calibrator. The results of qRT-PCR were calculated using the 2−ΔΔCT method [34].

Co-expression analysis

The correlation coefficient was calculated between candidate genes and DAM metabolites, and a Pearson correlation cutoff value of 0.5 was generated. Data were visualized using the Cytoscape package [35].

Statistical analysis

Statistical significance was evaluated using Student’s t-test performed with SPSS 22.0 software (IBM Corp., Armonk, NY, USA). “*” and “**” represent a significance level of p < 0.05 and p < 0.01, respectively. The mean and standard deviation (mean ± SD) were calculated using the data from three biological replicates.

Availability of data and materials

All data generated or analyzed during this study are included in this published article and its additional files.

References

  1. Tunde-Akintunde T, Olajide J, Akintunde B. Mass-volume-area. Related and mechanical properties of soybean as a function of moisture and variety. Int J Food Prop. 2005;8:449–56. https://doi.org/10.1080/10942910500267513.

    Article  Google Scholar 

  2. Zhao X, Jiang H, Feng L, Qu Y, Teng W, Qiu L, Zheng H, Han Y, Li W. Genome-wide association and transcriptional studies reveal novel genes for unsaturated fatty acid synthesis in a panel of soybean accessions. BMC Genomics. 2019;20:68. https://doi.org/10.1186/s12864-019-5449-z.

    Article  PubMed  PubMed Central  Google Scholar 

  3. Liu X, Qin D, Piersanti A, Zhang Q, Miceli C, Wang P. Genome-wide association study identifies candidate genes related to oleic acid content in soybean seeds. BMC Plant Biol. 2020;20:399. https://doi.org/10.1186/s12870-020-02607-w.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Ascherio A, Willett W. Health effects of trans fatty acids. Am J Clin Nutr. 1997;66:1006–10. https://doi.org/10.1093/ajcn/66.4.1006S.

    Article  Google Scholar 

  5. Di Q, Piersanti A, Zhang Q, Miceli C, Li H, Liu X. Genome-wide association study identifies candidate genes related to the linoleic acid content in soybean seeds. Int J Mol Sci. 2021;23:454. https://doi.org/10.3390/ijms23010454.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Marangoni F, Agostoni C, Borgh C, Catapano A, Cena H, Ghiselli A, La Vecchia C, Lercker G, Manzato E, Pirillo A, Riccardi G, Risé P, Visioli F, Poli A. Dietary linoleic acid and human health: focus on cardiovascular and cardiometabolic effects. Atherosclerosis. 2020;292:90–8. https://doi.org/10.1016/j.atherosclerosis.2019.11.018.

    Article  CAS  PubMed  Google Scholar 

  7. Flores T, Karpova O, Su X, Zeng P, Bilyeu K, Sleper D, Nguyen H, Zhang Z. Silencing of GmFAD3 gene by siRNA leads to low alpha-linolenic acids (18:3) of fad3-mutant phenotype in soybean [Glycine max (Merr.)]. Transgenic Res. 2008;17:839–50. https://doi.org/10.1007/s11248-008-9167-6.

    Article  CAS  PubMed  Google Scholar 

  8. Al Amin N, Ahmad N, Wu N, Pu X, Ma T, Du Y, Bo X, Wang N, Sharif R, Wang P. CRISPR-Cas9 mediated targeted disruption of FAD2–2 microsomal omega-6 desaturase in soybean (Glycine max L.). BMC Biotechnol. 2019;19:9. https://doi.org/10.1186/s12896-019-0501-2.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Zhu C, Gore M, Buckler E, Yu J. Status and prospects of association mapping in plants. Plant Genome. 2008;1:5–20. https://doi.org/10.3835/plantgenome2008.02.0089.

    Article  CAS  Google Scholar 

  10. Zhang X, Guan Z, Li Z, Liu P, Ma L, Zhang Y, Pan L, He S, Zhang Y, Li P, Ge F, Zou C, He Y, Gao S, Pan G, Shen Y. Combination of linkage mapping and GWAS brings new elements on the genetic basis of yield-related traits in maize across multiple environments. Theor Appl Genet. 2020;133:2881–95. https://doi.org/10.1007/s00122-020-03639-4.

    Article  CAS  PubMed  Google Scholar 

  11. Li H, Peng Z, Yang X, Wang W, Fu J, Wang J, Han Y, Chai Y, Guo T, Yang N, Liu J, Warburton M, Cheng Y, Hao X, Zhang P, Zhao J, Liu Y, Wang G, Li J, Yan J. Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat Genet. 2013;45:43–50. https://doi.org/10.1038/ng.2484.

    Article  CAS  PubMed  Google Scholar 

  12. Zhou X, Stephens M. Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 2012;44:821–4. https://doi.org/10.1038/ng.2310.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Yu J, Pressoir G, Briggs W, Vroh B, Yamasaki M, Doebley J, McMullen M, Gaut B, Nielsen D, Holland J, Kresovich S, Buckler E. Unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 2006;38:203–8. https://doi.org/10.1038/ng1702.

    Article  CAS  PubMed  Google Scholar 

  14. Kang H, Zaitlen N, Wade C, Kirby A, Heckerman D, Daly M, Eskin E. Efficient control of population structure in model organism association mapping. Genetics. 2008;178:1709–23. https://doi.org/10.1534/genetics.107.080101.

    Article  PubMed  PubMed Central  Google Scholar 

  15. Lippert C, Listgarten J, Liu Y, Kadie C, Davidson R, Heckerman D. FaST linear mixed models for genome-wide association studies. Nat Methods. 2011;8:833–94. https://doi.org/10.1038/Nmeth.1681.

    Article  CAS  PubMed  Google Scholar 

  16. Wang S, Feng J, Ren W, Huang B, Zhou L, Wen Y, Zhang J, Dunwell J, Xu S, Zhang Y. Improving power and accuracy of genome-wide association studies via a multi-locus mixed linear model methodology. Sci Rep. 2016;6:19444. https://doi.org/10.1038/srep19444.

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  17. Liu X, Huang M, Fan B, Buckler E, Zhang Z. Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies. PLoS Genet. 2016;12: e1005767. https://doi.org/10.1371/journal.pgen.1005767.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Ren W, Wen Y, Dunwell J, Zhang Y. pKWmEB: integration of Kruskal-Wallis test with empirical bayes under polygenic background control for multi-locus genome-wide association study. Heredity (Edinb). 2018;120:208–18. https://doi.org/10.1038/s41437-017-0007-4.

    Article  CAS  PubMed  Google Scholar 

  19. Wen Y, Zhang H, Ni Y, Huang B, Zhang J, Feng J, Wang S, Dunwell J, Zhang Y, Wu R. Methodological implementation of mixed linear models in multi-locus genome-wide association studies. Brief Bioinform. 2018;19:700–12. https://doi.org/10.1093/bib/bbw145.

    Article  PubMed  Google Scholar 

  20. Zhang J, Feng J, Ni Y, Wen Y, Niu Y, Tamba C, Yue C, Song Q, Zhang Y. pLARmEB: integration of least angle regression with empirical bayes for multilocus genome-wide association studies. Heredity (Edinb). 2017;118:517–24. https://doi.org/10.1038/hdy.2017.8.

    Article  CAS  PubMed  Google Scholar 

  21. Li M, Zhang Y, Xiang Y, Liu M, Zhang Y. IIIVmrMLM: the R and C++ tools associated with 3VmrMLM, a comprehensive GWAS method for dissecting quantitative traits. Mol Plant. 2022;15:1251–3. https://doi.org/10.1016/j.molp.2022.06.002.

    Article  CAS  PubMed  Google Scholar 

  22. Bachlava E, Dewey R, Burton J, Cardinal A. Mapping and comparison of quantitative trait loci for oleic acid seed content in two segregating soybean populations. Crop Sci. 2009;49:433–42.

    Article  CAS  Google Scholar 

  23. Zhong H, Liu S, Sun T, Kong W, Deng X, Peng Z, Li Y. Multi-locus genome-wide association studies for five yield-related traits in rice. BMC Plant Biol. 2021;21:364. https://doi.org/10.1186/s12870-021-03146-8.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Reinprecht Y, Poysa V, Yu K, Rajcan I, Ablett G, Pauls K. Seed and agronomic QTL in low linolenic acid, lipoxygenase-free soybean (Glycine max (L.) Merrill) germplasm. Genome. 2006;49:1510–27.

    Article  CAS  PubMed  Google Scholar 

  25. Kim H, Kim Y, Kim S, Son B, Choi I. Analysis of quantitative trait loci (QTLs) for seed size and fatty acid composition using recombinant inbred lines in soybean. J Life Sci. 2010;20:1186–92. https://doi.org/10.5352/JLS.2010.20.8.1186.

    Article  CAS  Google Scholar 

  26. Hyten D, Pantalone V, Saxton A, Schmidt M, Sams C. Molecular mapping and identification of soybean fatty acid modifier quantitative trait loci. J Amer Oil Chem Soc. 2004;81:1115–8. https://doi.org/10.1007/s11746-004-1027-z.

    Article  CAS  Google Scholar 

  27. Shi M, Yu L, Shi J, Liu J. A conserved MYB transcription factor is involved in regulating lipid metabolic pathways for oil biosynthesis in green algae. New Phytol. 2022;235:576–94. https://doi.org/10.1111/nph.18119.

    Article  CAS  PubMed  Google Scholar 

  28. Chen L, Ji C, Zhou D, Gou X, Tang J, Jiang Y, Han J, Liu Y, Chen L, Xie Y. OsLTP47 may function in a lipid transfer relay essential for pollen wall development in rice. J Genet Genomics. 2022;49:481–91. https://doi.org/10.1016/j.jgg.2022.03.003.

    Article  CAS  PubMed  Google Scholar 

  29. Hamdan M, Lung S, Guo Z, Chye M. Roles of acyl-coA-binding proteins in plant reproduction. J Exp Bot. 2022;73:2918–36. https://doi.org/10.1093/jxb/erab499.

    Article  CAS  PubMed  Google Scholar 

  30. Xie D, Han Y, Zeng Y, Chang W, Teng W, Li W. SSR- and SNP-related QTL underlying linolenic acid and other fatty acid contents in soybean seeds across multiple environments. Mol Breed. 2012;30:169–79.

    Article  CAS  Google Scholar 

  31. Han Y, Zhao X, Liu D, Li Y, Lightfoot D, Yang Z, Zhao L, Zhou G, Wang Z, Huang L, Zhang Z, Qiu L, Zheng H, Li W. Domestication footprints anchor genomic regions of agronomic importance in soybeans. New Phytol. 2016;209:871–84. https://doi.org/10.1111/nph.13626.

    Article  CAS  PubMed  Google Scholar 

  32. Robinson M, McCarthy D, Smyth G. edgeR: a bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–40. https://doi.org/10.1093/bioinformatics/btp616.

    Article  CAS  PubMed  Google Scholar 

  33. Bradbury P, Zhang Z, Kroon D, Casstevens T, Ramdoss Y, Buckler E. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23:2633–5. https://doi.org/10.1093/bioinformatics/btm308.

    Article  CAS  PubMed  Google Scholar 

  34. Price A, Patterson N, Plenge R, Weinblatt M, Shadick N, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38:904–9. https://doi.org/10.1038/ng1847.

    Article  CAS  PubMed  Google Scholar 

  35. He L, Xiao J, Rashid K, Yao Z, Li P, Jia G, Wang X, Cloutier S, You F. Genome-wide association studies for pasmo resistance in Flax (Linum usitatissimum L.). Front Plant Sci. 2019;9:1982. https://doi.org/10.3389/fpls.2018.01982.

    Article  PubMed  PubMed Central  Google Scholar 

  36. Panthee D, Pantalone V, Saxton A. Modifier QTL for fatty acid composition in soybean oil. Euphytica. 2006;152:67–73. https://doi.org/10.1007/s10681-006-9179-3.

    Article  CAS  Google Scholar 

Download references

Acknowledgements

We thank LetPub (www.letpub.com) for its linguistic assistance during the preparation of this manuscript.

Funding

This study was financially supported by Heilongjiang Provincial Project (ZD2022C002, JD22A015), the National Key Research and Development Project of China (2021YFD1201604, 2021YFF1001204, 2021YFD1201103), the Chinese National Natural Science Foundation (U22A20473), the Youth Leading Talent Project of the Ministry of Science and Technology in China (2015RA228), the National Ten-thousand Talents Program, The national project (CARS-04-PS07). Young leading talents of Northeast Agricultural University (NEAU2023QNLJ-003). The funding bodies had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Authors and Affiliations

Authors

Contributions

XZ (Xunchao Zhao) performed the experiments and prepared the manuscript. YZ (Yuhang Zhan), KL and YZ (Yan Zhang) prepared the samples. CZ, MY, ML and YL analyzed the data. PZ, YH and XZ (Xue Zhao) conceived the experiments and revised the manuscript.

Corresponding authors

Correspondence to Peng Zuo, Yingpeng Han or Xue Zhao.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

The authors have consented for publication.

Competing interests

The authors declare that they have no competing interests. All authors agree to authorship and approved the final manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Table S1.

QTNs identified for unsaturated fatty acids content using the QTN detection model in 3VmrMLM. Table S2. QEIs identified for unsaturated fatty acids content three environments detected using the QTN-by-environment detection model in 3VmrMLM. Table S3. Unsaturated fatty acids content of the 30 soybean varieties. Table S4. The association between SNP in Glyma.17G23670 and Glyma.03G040400 gene and soybean unsaturated fatty acids content based on 50 soybean germplasms. Table S5. Primers used for qRT-PCR. Figure S1. Manhattan plots of the single-environment analysis for the oleic, linoleic and linolenic traits in 2013, 2014 and 2015 of soybean. Figure S2. A and B KEGG pathway annotation around QTN and QEI candidate genes, respectively. Figure S3. Candidate genes are identified in the transcriptome and (A): QTN detection model, (B): QTN-by-environment detection model. Figure S4. Analysis of candidate genes by qRT-PCR. Figure S5. Differential accumulation of metabolite in the three comparison groups.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Zhao, X., Zhan, Y., Li, K. et al. Multi-omics analysis reveals novel loci and a candidate regulatory gene of unsaturated fatty acids in soybean (Glycine max (L.) Merr). Biotechnol Biofuels 17, 43 (2024). https://doi.org/10.1186/s13068-024-02489-2

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s13068-024-02489-2

Keywords