Identification of candidate genes controlling oil content by combination of genome-wide association and transcriptome analysis in the oilseed crop Brassica napus

Background Increasing seed oil content is one of the most important targets for rapeseed (Brassica napus) breeding. However, genetic mechanisms of mature seed oil content in Brassica napus (B. napus) remain little known. To identify oil content-related genes, a genome-wide association study (GWAS) was performed using 588 accessions. Results High-throughput genome resequencing resulted in 385,692 high-quality single nucleotide polymorphism (SNPs) with a minor allele frequency (MAF) > 0.05. We identified 17 loci that were significantly associated with seed oil content, among which 12 SNPs were distributed on the A3 (11 loci) and A1 (one loci) chromosomes, and five novel significant SNPs on the C5 (one loci) and C7 (four loci) chromosomes, respectively. Subsequently, we characterized differentially expressed genes (DEGs) between the seeds and silique pericarps on main florescences and primary branches of extremely high- and low-oil content accessions (HO and LO). A total of 64 lipid metabolism-related DEGs were identified, 14 of which are involved in triacylglycerols (TAGs) biosynthesis and assembly. Additionally, we analyzed differences in transcription levels of key genes involved in de novo fatty acid biosynthesis in the plastid, TAGs assembly and lipid droplet packaging in the endoplasmic reticulum (ER) between high- and low-oil content B. napus accessions. Conclusions The combination of GWAS and transcriptome analyses revealed seven candidate genes located within the confidence intervals of significant SNPs. Current findings provide valuable information for facilitating marker-based breeding for higher seed oil content in B. napus.

genetic and complex molecular regulatory mechanisms underlying variations in SOC of B. napus.
Quantitative trait locus (QTL) mapping and genomewide association study (GWAS) have been widely used to dissect the regulatory loci and genetic architecture of complex agronomical quantitative traits at the whole genome level. Previous studies have focused on QTL identification for oil content in B. napus and have identified numerous QTLs in all 19 linkage groups of B. napus [8,10,19,24,49,51,54,60]. GWAS, as another alternative for identifying QTLs, is not restricted to the traditional biparental linkage mapping and offers a higher resolution. Additionally, GWAS has recently been widely used in the study of important complex traits in B. napus, such as seed germination and vigor [16], plant height and primary branch [32,52], harvest index [38,42], yield traits [39,57]. In addition, GWAS has been reported in various plants, such as Arabidopsis [1], rice [21,22,65], soybean [23] and maize [28,50] and so on.
Brassica napus is an allopolyploid species with a complex genome structure, which originated ~ 7500 years ago from a spontaneous hybridization between B. rapa (AA, 2n = 20) and B. oleracea (CC, 2n = 18) [7]. SOC is an important complex quantitative traits but its genetic and molecular mechanisms remain undefined. So far, there are relatively few reports on the study of SOC by GWAS. Liu et al. identified 50 loci that were significantly associated with SOC using 521 B. napus accessions genotyped with the Brassica 60 K SNP array by GWAS and validated a novel locus on chromosome A5 that could increase 1.5-1.7% of the seed oil content by linkage mapping [36]. Li et al. detected a QTL on chromosome A08 with a significant association with seed oil content using GWAS [30]. Wang et al. detected 17 loci associated with seed oil content by GWAS [56] using a total of 238 rapeseed cultivars. In this study, we selected 588 B. napus accessions for GWAS by high-throughput genome resequencing.
To understand the genetic control of SOC at the population level through the identification of associated loci with SNPs, we genotyped 588 B. napus accessions that were collected from Asia (466), Europe (102), North America (13) and Australia (7) using high-throughput genome resequencing, and carried out a GWAS with PCA + K statistical models. SNPs that were significantly associated with SOC were identified. In addition, we performed transcriptome sequencing of four tissues with extremely high-(HO) and low-oil content (LO) B. napus accessions. Among the genes identified in both the GWAS and transcriptome analysis, seven were identified as candidate genes involved in seed oil accumulation, which were verified by quantitative real-time PCR (qRT-PCR). The current study thus may contribute to markerbased breeding for higher seed oil content in B. napus.

Plant materials and phenotyping
A total of 588 B. napus lines were collected from spring, winter and semi-winter accessions and cultivated in Southwest University of Beibei, Chongqing, China (29°45′N, 106°22′E, 238.57 m) for 3 consecutive years (2016-2018). All the field experiments followed a randomized complete block design with two biological replications. Each accession was planted in two rows of 10-12 plants per row, with 20 cm between plants within each row and 30 cm between rows. The trial management was performed in accordance with local standard breeding field protocols. At maturity, open-pollinated seeds including five representative plants in the middle of each plot were collected for the SOC measurements. The oil content of the desiccated seeds was measured by nearinfrared reflectance spectroscopy (NIRS DS2500) using the parameters described by Gan et al. [14].

Genome-wide association analysis
The seed oil content with two biological replicates for 3 consecutive years (2016-2018) was evaluated by the method of the best linear unbiased prediction (BLUP) based on a linear model using an R script (http://www. eXten sion.org/pages /61006 ). An association analysis was implemented in TASSEL5.2.1 software using the P + K model [6]. The population structure (Q), relative kinship (K) and SNP genotyping in the association panel has been well described in our previous report [41]. The uniform threshold of GWAS was P < 2.59 × 10 −6 (1/valid SNPs used, − log 10 (1/385,692) = 5.59) [56]. The quantile-quantile plot was shown with the expected P value and − log 10 (P) of each SNP, and the Manhattan plot was demonstrated using the R package qqman.

Transcriptome sequencing and identification of differentially expressed genes
Two extremely high-oil content (HO) lines and one extremely low-oil content (LO) line were selected from the GWAS population for transcriptome sequencing (RNA-Seq), respectively. The HO lines were SWU47 (CQ24) and Zhongshuang11 (CQ52), while the LO line was Ningyou12 (CQ46). Total RNA was extracted from four tissues of the HO and LO accessions, respectively. Tissues were harvested 30 days after flowering from seed and silique pericarps on the main inflorescence (30SM and 30SPM, respectively) and on the primary branch (30SB and 30SPB, respectively). For each sample, two biological replicates were performed, with each collected from three independent plants. All samples were immediately placed in liquid nitrogen and stored at − 80 °C for RNA sequencing (RNAseq) and quantitative reverse-transcription polymerase chain reaction (qRT-PCR) analysis.
Sequencing library preparation and sequencing reactions were conducted at the Biomarker Technologies Corporation (Beijing, China). Gene expression levels were estimated using FPKM (Fragments per kilobase of exon per million reads mapped). Differentially expressed genes (DEGs) between two samples were obtained with Cuffdiff, based on the criteria false discovery rate (FDR) < 0.05 and |log 2 fold change| > 2.0 [4].

Identification and expression analysis of acyl-lipid metabolism genes between extremely high-and low-oil content B. napus lines
To seek out genes associated with acyl-lipid metabolism (ALM), B. napus homologous gene sequences were analyzed against a list of genes involved in acyl-lipid metabolism obtained from the "Arabidopsis Acyl-Lipid Metabolism" website (ARALIP) (http://arali p.plant biolo gy.msu.edu/) [34]. The differentially expressed lipid metabolism-related genes were obtained on the basis of the RNA-Seq data of the B. napus HO and LO lines created in the present study. Similarly, to analyze the spatial expressions of differentially expressed acyl-lipid metabolism-related genes in the extremely HO and LO B. napus lines, the expression values of these genes in 30SM, 30SPM, 30SB and 30SPB were obtained from this RNA-Seq data as well. The expression heatmap of the differential expressed acyl-lipid metabolism genes was generated using HemI1.0 [58].

GO and KEGG enrichment analysis of DEGs
GO enrichment and KEGG pathway analysis of DEGs were performed using the online OmicShare tool (http:// www.omics hare.com/tools /index .php/) [37,45]. The threshold of significantly enriched GO terms was set to FDR < 0.05 [39].

Identification of potential candidate genes
To identify candidate genes associated with SOC, the 300-kb flanking regions on either side of the markers significantly associated with SOC were chosen as the confidence interval for in-depth analysis, as described previously [56]. The DEGs within the confidence interval of SNPs significantly associated with SOC were screened to identify differentially expressed (DE) candidate genes. Additionally, DEGs involved in acyl-lipid metabolism were also identified.

Validation of candidate genes by qRT-PCR analysis
Total RNA was extracted from all tested tissues with the EZ-10 DNAaway RNA Mini-prep Kit [Sangon Biotech (Shanghai), Co., Ltd], and then cDNA was synthesized from 1 µg RNA using the PrimeScript ™ RT reagent kit with gDNA Eraser according to the manufacturer's instructions (Perfect Real Time; TaKaRa Biotechnology, Dalian, China). The gene-specific primers for qRT-PCR of the candidate genes and reference gene are listed in Additional file 1: Table S7. The PCR consisted of 10 μL SYBR II (TakaRa), 2.0 μL cDNA, 1.6 μL primer, 0.4 μL ROX Reference Dye II and distilled water to a final volume of 20 μL. The PCR program was as follows: 95 °C for 30 s and 35 cycles of 95 °C for 5 s, followed by 56-60 °C (depending on the primers used) for 30 s. For each reaction, three biological replicates were performed, and relative expression levels were obtained using the 2 −∆∆Ct method, with BnActin7 as internal controls.

Phenotypic variation of SOC
The 588 B. napus accessions were planted in three environments (2016CQ, 2017CQ and 2018CQ) from 2016 to 2018, with two replications performed each year. Extensive phenotypic variations of SOC were observed (Table 1), and specific seed oil content (SOC, % of seed weight) phenotypes of 588 accessions for GWAS analysis are shown in Additional file 1: Among the three environments, the coefficient of variation (CV) of SOC was less than 10%, reflecting a relatively small variation of SOC within the entire GWAS panels. SOC among three environments in B. napus showed continuous variation and approximated a normal distribution ( Fig. 1), suggesting SOC consisted of quantitative traits controlled by multiple genes. The broad sense heritability of SOC was 73.4%, which is less than previously reported by Liu et al. [36] (87.4%). These results suggested that most of the phenotypic variation in SOC was attributed to genetic effects despite being greatly affected by the environment in this study.

Genome-wide association analysis
GWAS for seed oil content (SOC) was performed using the P + K model [48]. The QQ plot is shown in Fig. 2a, and the results showed that this model could be used to identify association SNPs. A total of 17 significant SNPs for SOC (P < 2.59 × 10 −6 ) were identified and these SNPs were unevenly distributed across four chromosomes (A1, A3, C5 and C7) ( Fig. 2b and Table 2). Eleven significant SNPs were distributed on the A3 chromosome and for up to 64.71%, the significant correlation region ranged from 17.68 18.36 Mb, which was consistent with some previous QTL mapping results ( Fig. 3) [10,68]. Only one SNP was distributed on the A1 and C5 chromosomes, respectively. Additionally, the remaining four SNPs were distributed on the C7 chromosome. Individual significant SNPs for SOC explained 5.46-6.68% of the phenotypic variation (R 2 ). The location and detailed information for all 17 SNPs are listed in Table 2. Among all the detected significant SNPs, twelve SNPs (70.59%) were within the previously identified significant QTL confidence intervals associated with seed oil content, suggesting the high reliability of SNPs identified in this study (Table 2) [10,36,54,66,68]. To find candidate genes associated with oil content, all these significant SNPs will be further analyzed.

Comparative analysis of three Brassica napus lines using transcriptome sequencing
Three natural B. napus accessions with extremely significant differences in seed oil content (SOC) were selected from the genome-wide association analysis populations. The oil and protein contents of the desiccated seeds were determined by near-infrared reflectance spectroscopy (NIRS DS2500), and the results are shown in Fig. 4a. The average oil contents during the 3-year period for SWU47 (CQ24, high-oil) and Zhongshuang11 (CQ52, high-oil) were significantly higher than those of Ningyou12 (CQ46, low oil) and there is no significant difference in seed protein content and 1000-seed weight between the high-and low-oil lines (Fig. 4a). To determine the FA compositions of 30SM, 30SB and mature desiccated seeds of three accessions subjected to gas chromatographymass spectrometry (GC-MS). In 30SM, 30SB (Fig. 4b) or mature desiccated seeds (Fig. 4c), the results showed significantly higher C18:1 in CQ24 and CQ52 (high-oil  content), and significantly lower C16:0 and C18:2 in CQ24 and CQ52 compared to CQ46 (low-oil content).

Transcriptome analyses for differentially expressed genes
In the transcriptome analyses, RNA was obtained from four tissues of extremely high-(CQ24 and CQ52) and low-oil content (CQ46) B. napus lines at CQ including 12 independent samples and a total of 24 libraries (two biological replicates per sample) were constructed for transcriptome sequencing. After removing the low quality and contaminant reads, a total of 364.0 million clean reads were acquired, with an average of 23.05 million reads per sample. On average, 95.24% of the input reads mapped uniquely to the B. napus reference genome (Additional file 1: Table S2). The correlation coefficient between the two biological replicates of each sequencing sample exceeded 0.9 for all tested samples (Additional file 1: Table S2), suggesting a high reproducibility among the samples.
To identify differentially expressed genes (DEGs) between extremely high-and low-oil content B. napus lines, the following criterion were applied: |log 2 fold change| > 2.0 and FDR < 0.05. For CQ24/CQ46, 2185 and 4858 genes were differentially expressed in individual tissues. Among which, the number of DEGs was variable, and the number of upregulated genes exceeded that of the number of downregulated genes in all tissues tested. Under CQ52/CQ46, between 3347 and 4018 genes were differentially expressed in individual tissues, among which the number of differential genes was not very variable and the number of upregulated exceeded the number of downregulated genes in all tissues tested except  S1a). These results suggested that positive regulatory genes might play a major role in the formation of high oil content in B. napus. To find conserved DEGs between high-and low-oil content lines, Venny analysis was conducted using CQ24/CQ46 and CQ52/CQ46 in all four tissues, respectively. We found 1628, 2658, 2146 and 1493 common DEGs in 30SM, 30SB, 30SPM and 30SPB, respectively, suggesting that transcriptomic variations are different in diverse tissues and variations in 30SB and 30SPM are greater than 30SM and 30SPB between highoil and low-oil content B. napus (Fig. 5a).

Functional classification of common DEGs between CQ24 (HO)/CQ46(LO) and CQ52(HO)/CQ46(LO)
To understand the functional classification of common DEGs in four tested tissues using CQ24(HO)/CQ46(LO) and CQ52(HO)/CQ46(LO) (hereinafter referred to as common DEGs), Gene Ontology (GO) enrichment analysis was performed. GO terms were divided into three main categories: biological process, cellular component and molecular function (Additional file 2: Fig. S1b, c). The results showed that common DEGs involved in cellular (GO:0009987), metabolism (GO:0008152) and To understand the functional distribution of these common DEGs, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis was also conducted. The results showed that the most enriched pathway was involved in metabolism and the main pathways of common DEGs in all four tested tissues were carbohydrate metabolism, global and overview, amino acid metabolism, energy metabolism, lipid metabolism and biosynthesis of other secondary metabolites (Fig. 5b). In this study, we focused on common DEGs in lipid metabolism, and 64 differential expressed lipid metabolism-related genes were found in all four tissues, of which 4, 24, 10, and 6 were specific to 30SM, 30SB, 30SPM, and 30SPB, respectively. In addition, six common differential lipid metabolism genes were shared in 30SM and 30SB, two in 30SB, 30SM and 30SPB, two in 30SB, 30SM and 30SPM, three in 30SB, 30SPB and 30SPM, and seven among all four tissues (Additional file 1: Table S4).
To understand the expression patterns of 64 common differential expressed lipid metabolism-related genes between HO (CQ24, CQ52) and LO (CQ46) accessions, the heatmap of these genes was drawn based on the RNA-seq data which were normalized to the log 2 FPKM using HemI1.0 (Additional file 1: Table S4, Additional file 2: Fig. S2a). Among all 64 common differential expressed lipid metabolism-related genes, 38 genes were upregulated in the HO (CQ24, CQ52) lines compared with the LO (CQ46) lines, and 26 genes showed opposite expression patterns. We found seven genes (Bna-A07g31890D; BnaC02g06560D; BnaCnng62740D; BnaA03g60440D; BnaA07g35160D; BnaA07g20190D; BnaC08g44190D) that were differentially expressed in all tested tissues between HO and LO accessions, and only BnaC08g44190D was downregulated in HO compared with LO. Moreover, a total of 13 DEGs (BnaC03g16690D; BnaA07g31890D; BnaC02g00470D; BnaC03g04180D; BnaC02g04910D; BnaC04g40760D; BnaA03g13590D; BnaA04g17150D; BnaC08g12280D; BnaA01g15860D; BnaC01g18950D; BnaA10g09480D; BnaC09g31660D) involved in the TAG biosynthesis pathway and TAG assembly were upregulated in the HO compared with the LO lines. Therefore, we speculated that these upregulated lipid metabolism genes in HO lines may play an indispensable role in the formation of high oil content of B. napus.

TAG biosynthesis and assembly pathway gene expression in four tested tissues of the HO and LO accessions
To preliminarily elucidate the difference in seed oil content between transcriptome sequencing accessions at the transcription level, the genes involved in the TAG biosynthesis pathway were analyzed and presented using a Log 2 FPKM (calculation method) as relative transcript levels among the tissues and accessions, with a specific focus on FA synthesis in the plastid, and TAG accumulation/packaging pathways in the endoplasmic reticulum (Fig. 6) [2,34].
In terms of FA synthesis in the plastid, most genes exhibited differential expression patterns among 30SM, 30SB, 30SPM, and 30SPB tissues in the same accession, especially in seeds (30SM, 30SB), and the expression level of most FA synthesis genes was significantly higher than in silique pericarps (30SPM and 30SPB). In addition, many genes showed differential expression levels in the same tissue among the HO (CQ24, CQ52) and LO (CQ46) accessions ( Fig. 6 and Additional file 1: Table S6). For example, the transcript levels of the pyruvate dehydrogenase complex (PDHC), acetyl-CoA carboxylase (ACCase), malonyl-CoA:ACP malonyltransferase (MCMT), 3-ketoacyl-ACP synthase II/III (KASII/ III), ketoacyl-ACP reductase (KAR), hydroxyacyl-ACP dehydrase (HAD), enoyl-ACP reductase (ENR), stearoylacyl carrier protein desaturase (SAD), and acyl-ACP thioesterase A (FATA ) were generally lower in the seed tissues (30SM and 30SB) of the LO (CQ46) than the HO (CQ24 and CQ52) accessions. However, these genes showed no difference in transcription levels in silique pericarps (30SPM and 30SPB) among the HO and LO lines. Together, the differential expression of fatty acid synthesis genes in seeds is more likely to explain the difference in oil content among the three accessions than the silique pericarps; however, this does not preclude an effect of the silique pericarps in modulating the oil content difference.
Glycerol-3-phosphate (G-3-P) and acyl-CoA, which are precursors of glycerolipid assembly in the endoplasmic reticulum (ER), are produced by NAD-dependent glycerol-3-phosphate dehydrogenase (GPDH), and be involved in plastid fatty acid export 1(FAX1) and longchain acyl-CoA synthetase 9 (LACS9) [5,12,31]. The transcript levels of GPDH and LACS9 were higher in the 30SM and 30SB of HO than in the LO lines, suggesting an increase in the acyl-CoA pool and glycerol-3-phosphate (G-3-P) substrate for HO lines in seeds. However, their expression in silique pericarps (30SPM and 30SPB) was not consistent. It is worth noting that the expression levels of GPDH and LACS9 in seeds were much higher than those in silique pericarps, but the expression of FAX1 was higher in silique pericarps than in seeds (Fig. 6). Thus, based from the transcript levels alone of the four transcriptome sequencing tissues, it remains vague whether these steps are important in regulating the oil content differences among the three accessions.

Identification candidate genes by combining GWAS and transcriptome sequencing analysis
The candidate gene regions with SOC as determined by GWAS analysis were the 300-kb flanking regions on either side of the markers significantly associated with SOC, as described previously [56]. All genes within the confidence interval of all 17 significant SNPs with SOC are listed in Additional file 1: Table S5. By combination with transcriptome sequencing analysis, we found a total of 41 genes that were differentially expressed between high and low oil accessions, of which 16 and 32 were DEGs in CQ24/CQ46 (No. X1-X16) ( Table 4) and CQ52/ CQ46 (NO. Z1-Z32) ( Table 5), respectively. Furthermore, seven common DEGs (X5/Z30, X7/Z31, X10/Z20, X12/ Z13, X16/Z10, X15/Z9, and X14/Z1) were identified between CQ24/CQ46 and CQ52/CQ46. The biological functions of these differential expressed candidates were analyzed by applying Protein Basic Local Alignment Search Tool (BLASTP) searches against all Arabidopsis proteins. The results are listed in Tables 4 and 5 and the expression patterns of these genes between high (CQ24, CQ52) and low oil (CQ46) accessions are shown in Additional file 2: Fig. S3. Seven common DEGs (X5/Z30, X7/ Z31, X10/Z20, X12/Z13, X16/Z10, X15/Z9, and X14/Z1) were considered to be important candidate genes related to seed oil content for the following study.

Verification of the transcriptome sequencing data by quantitative real-time polymerase chain reaction (qRT-PCR)
To confirm the accuracy of the RNA-Seq results, some genes were selected to perform qRT-PCR analysis, and the expression levels of these genes by qRT-PCR and transcriptome sequencing are shown in Fig. 7. Although the expression trend of individual genes, such as Bna-A03g60440D, was not consistent with the RNA-Seq results between two high-oil content accessions, their expression trend was consistent between high-and lowoil content accessions. In short, the results of RNA-Seq are highly consistent with qRT-PCR. The results fully demonstrate the reliability and accuracy of the transcriptome sequencing data.

The identification of significant SNPs for seed oil content (SOC)
Genome-wide association analysis (GWAS) based on millions of markers is currently widely used in analysis of complex agronomic traits for crops such as rice [22], maize [50], soybean [23] and sorghum [46]. Many reports have addressed QTL mapping of seed oil content in Brassica napus [8,10,13,24,49,54,55,59,68]. However, there have been relatively few reports on the identification of significant SNPs associated with seed oil content by GWAS compared with the QTL mapping method. Liu et al. identified 50 SNPs that were significantly associated with seed oil content using 521 B. napus accessions combined with the Brassica 60K SNP array [36]. Seven stable QTLs for seed oil content were identified by Sun et al. by combined linkage and association mapping methods in Brassica napus [51]. Four significant SNPs for seed oil content were located on chromosomes A1, C3, and C5 [27]. Seventeen loci associated with seed oil content were identified by combining high-throughput genome resequencing and phenotyping using GWAS [56]. Although previous studies have identified many SNPs that were significantly associated with SOC in Brassica napus, none of them have combined the analysis with transcriptome sequencing to obtain candidate genes contributing to SOC. In our previous study, the population structure and linkage disequilibrium of 588 Brassica napus accessions were evaluated, and provided a high-resolution genomic variation map consisting of 616053 SNPs by the resequencing method. In the present study, we obtained 17 SNPs that were significantly associated with seed oil content (SOC) in Brassica napus, 12 of which were overlapped with previous studies (Table 2), supporting the high reliability of loci detected in current study. In addition, we also found five new SNPs significantly associated with SOC in Brassica napus (Table 2), and all the significant SNPs detected in current study were further analyzed by combination with the RNA-Seq data.

Comparative analysis of three Brassica napus lines using transcriptome sequencing from phenotype to gene expression
Identification of differentially expressed (DE) candidate genes in combination with GWAS and RNA-seq has been proven to be more successful than each separate strategy [38,64]. In this study, an extremely low-oil (CQ46) and two extremely high-oil content accessions (CQ24, CQ52) were selected from the GWAS analysis population to perform transcriptome sequencing (Fig. 4). A difference of approximately 10% SOC was observed between the HO and LO accessions (Fig. 4a), without a difference between the seed protein content and 1000-grain weight. Additionally, C18:1 was higher in HO (CQ24 and CQ52) than LO (CQ46) accessions, and C16:0 and C18:2 showed the opposite pattern either in seeds at 30 days after flowering or at maturity, which are consistent with the higher expression levels of KASIII, SAD and FAD3 in seeds (30SM and 30SB) of high-oil content than low-oil content accessions (Fig. 6). Moreover, a previous study has shown that fatty acid accumulation is accelerated from 26 to 34 days after flowering [40]. Therefore, seeds and silique pericarps at 30 days after flowering were selected for transcriptome analysis among typical highand low-oil content accessions. In addition, we analyzed the transcriptional levels of major genes in the TAG biosynthesis and assembly pathway in four tissues of HO (CQ24, CQ52) and LO (CQ46) accessions. As shown in Fig. 6, transcripts of the genes involved in the TAG biosynthesis and assembly pathway were generally higher in seeds (30SM and 30SB) than silique pericarps (30SPM and 30SPB), which suggested that the seeds were more active in TAG biosynthesis and assembly than silique pericarps. Exceptionally, we found that FAX1, PLD, CPT, PLC, and PDAT were expressed at higher levels in silique pericarps than seeds, indicating that silique pericarps may also play a non-negligible role in TAG biosynthesis, which is consistent with a previous study examining the importance of the silique wall in the regulation of seed oil content [18]. The expression level of most of genes involved in fatty acid (FA) and TAG biosynthesis in seeds (30SM and 30SB) was significantly enhanced in HO (CQ24 and CQ52) compared with LO, which including PDHC, ACCase, MCMT, KAR, HAD, KASIII, FATA , KASII, SAD, LACS9, GPDH, PAP, FAD3, DGAT and so  Table S6). To our surprise, the transcription levels of OBO and CALO were significantly higher in all tested tissues of HO (CQ24 and CQ52) than LO (CQ46) (Fig. 6, Additional file 1: Table S6), which may suggest an important role for these OBO and CALO in high seed oil formation. This result is consistent with the findings of Liu et al. showing that the overexpression of soybean oleosin can increase the seed lipid content in transgenic rice [35]. These results strongly suggested that it is feasible to select these accessions as extremely high-(CQ24 and CQ52) and low-oil content (CQ46) lines for RNA-Seq.

The identification of SOC-related candidate genes
Brassica napus is one of the most important oil crops in the world along with soybean and palm. The typical SOC of B. napus germplasm varies from 35 to 50% [36]. Recently, through the efforts of breeders, ultra-high oil content germplasm materials with 55-65% oil content have been produced [20]. Hu et al. predicted that the B. napus seed oil content could be increased to 75% [17]. Therefore, there is still a great potential to increase the B. napus seed oil content, especially in the main Chinese rapeseed producing areas such as Chongqing of the Yangtze River Basin. Previous studies examining the seed oil content of B. napus have focused on many QTL mappings or a small number of GWAS, but in this study, the combination of GWAS and transcriptome sequencing was implemented for this purpose. According to the 300-kb flanking regions on either side of the 17 significantly associated SNPs with SOC [56], we obtained a total of 411 genes. These genes were annotated by applying the BLASTP program against the Arabidopsis proteome (TAIR10) with an E-value threshold of 1E−5 [38] (Additional file 1: Table S5), and a total of 14 acyl-lipid metabolism (ALM)-related genes were found (Table 3). Although these 14 genes did not differ significantly in the four detested tissues of high-and low-oil accessions, and their expression patterns are shown in Additional file 2: Fig. S2b, we found BnaC05g10520, BnaA01g12060 and BnaA01g12140D were expressed at higher levels in all tested tissues of high-and low-oil accessions. In previous reports, BnaA01g12140D which is homologous to AT4G22330 (ATCES1), encodes a nuclear and endoplasmic reticulum localized Acyl-CoA-independent ceramide synthase that is involved in sphingolipid metabolism, disease resistance, nutrient limitation, and response to salt stress [62,67]. Therefore, we speculate that these genes (BnaC05g10520, BnaA01g12060 and BnaA01g12140D) play an important role in lipid metabolism of B. napus.
A combination of our GWAS and the DEGs of the transcriptome sequencing results between HO and LO accessions revealed 16 genes (X1-X16) under CQ24/ CQ46 (Table 4) and 32 genes (Z1-Z32) under CQ52/  (Table 5). The expression patterns of all the genes using HemI1.0 are shown in Additional file 2: Fig. S3 [58]. Interestingly, seven common genes were obtained under CQ24/CQ46 and CQ52/CQ46, and they were considered important candidate genes associated with seed oil content in B. napus. BnaA01g13030D (X5/Z30) and BnaA03g36510D (X7/Z31) represented two genes of unknown function, and their expression levels were significantly lower in seeds of HO (CQ24 and CQ52) than LO (CQ46) (Additional file 2: Fig. S3c). The gene (BnaA03g36620D, X10/Z20) encodes AOX1a, an isoform of alternative oxidase. The absence of AOX1a in Arabidopsis leads to acute sensitivity to combined light and drought stress [15]. AOX1a has been implicated in the modulation of metabolic homeostasis in cadmium (Cd)-exposed Arabidopsis plant and displays a differential role in roots and leaves in response to sublethal cadmium exposure [25,26]. The maize AOX1a gene plays an essential role under oxidative stress [47]. In Brassica napus roots, selenite can activate the alternative oxidase pathway and alter primary metabolism and may ultimately improve selenium tolerance [11]. BnaA03g37060D (X12/Z13) was a putative component of photosystem II complex and may be involved in response to UV-B, ozone and wounding. In all tested tissues, BnaA03g37060D is less expressed in HO and is not expressed in LO accessions. BnaC05g10690D (X14/ Z1) exhibited extremely high expression levels in LO compared with HO lines, and this finding was verified by qRT-PCR (Fig. 7, Additional file 2: Fig. S3c). Therefore, we speculate that interfering with the expression of this gene may contribute to increase seed oil content in Brassica napus. BnaC05g10700D (X15/Z9) was an ARM repeat superfamily protein and its specific function is unknown, and its expression level in HO is higher than that of LO accessions. Increasing the expression level of BnaC05g10700D may help to increase seed oil content. BON2 (BnaC05g10710D, X16/Z10) encodes a copine-like protein. In Arabidopsis, the BON gene family can promote cell growth and development in addition to repressing cell death [63]. Additionally, Li et al. [33] have validated that plasma membrane-localized calcium pumps and copines coordinately regulate pollen germination and fertility in Arabidopsis. Furthermore, BnaC05g10710D exhibited a significantly higher expression level in HO compared with LO accessions, and this result also further verified by qRT-PCR (Fig. 7). This gene (BnaC05g10710D) may play a positive regulatory role in the formation of high oil content in B. napus. Of course, the role of the seven candidate genes mentioned above in the formation of seed oil content in B. napus must be further confirmed, which will be an important task for our next studies.

Conclusion
In the present study, 17 loci significantly associated with seed oil content in B. napus were successfully obtained, and 12 significant SNPs were found to overlap with QTLs from previous studies, which proved the reliability of this study. In addition, five novel significant SNPs distributed on the C5 and C7 chromosomes were identified, which provided valuable information for further exploration of genes that contribute to increase seed oil content in B. napus. Subsequently, the combination of GWAS and transcriptome analyses revealed seven functional candidate genes located within the confidence intervals of significant SNPs associated with seed oil content in B. napus. These results may facilitate marker-based breeding for higher seed oil content in B. napus.
Additional file 1: Table S1. Seed oil content (SOC, % of seed weight) phenotypes of 588 accessions for GWAS analysis. 2016CQ, 2017CQ and 2018CQ refer to the three environments, CQ refers to Chongqing. BLUP represents the SOC phenotypic value obtained by the best linear unbiased prediction in three environments. Missing data are replaced by '− 999' . Table S2. The quality statistics of RNA sequencing data. Table S3. The number of differential genes is counted under HO/LO accessions. Table S4. Common differential lipid metabolism genes in four tested tissues under CQ24/CQ46 and CQ52/CQ46 based on KEGG pathway analysis. Table S5. All genes within the confidence interval of significant related SNPs with SOC. Table S6. Summary of gene expression of TAG biosynthesis and assembly pathway in all four tissues among LO (CQ46) and HO (CQ24, CQ52) Brassica napus lines. Table S7. Primer sequences for qRT-PCR verification in this study.
Additional file 2: Fig. S1. (a) Statistics on the number of differential genes in different tissues with different seed oil content (SOC) Brassica napus lines. (b) Gene ontology (GO) enrichment analysis of common DEGs in 30SM and 30SB in Fig. 5a. (c) Gene ontology (GO) enrichment analysis of common DEGs in 30SPM and 30SPB in Fig. 5a. Fig. S2. Expression patterns of identified ALM genes within the confidence interval of significant related SNPs with SOC. (a) Heatmap of identified common differential ALM genes was derived from KEGG pathway analysis in all tested tissues under CQ24/CQ46 and CQ52/CQ46. (b) Heat map of identified all ALM genes within the confidence interval significantly associated with SOC was derived from transcriptome sequencing among HO (CQ24, CQ52) and LO (CQ46) lines. Fig. S3. Expression patterns of candidate genes identified by GWAS and transcriptome sequencing. Heatmap of identified candidate genes was derived from RNA sequencing data in all tested tissues between CQ24/CQ46 (a) and CQ52/46 (b) and common candidate genes (c).