Selection of a rare segregant displaying the trait of low glycerol/high ethanol yield and lacking the major causative allele ssk1E330N…K356N
Previous work identified the S. cerevisiae strain CBS6412 as a strain with an unusually low ratio of glycerol/ethanol yield, and genetic analysis identified ssk1E330N…K356N as a major causative allele[19] (Figure 1a). To identify the minor QTLs and their causative genes responsible for determining this complex trait, we first screened all superior segregants with a glycerol/ethanol ratio as low as that of the superior parent strain for a segregant that lacked the ssk1E330N…K356N allele. Among the 44 superior segregants available, only a single such segregant, 26B, was found. Its glycerol yield was as low and its ethanol yield as high as those of the superior parent strain CBS4C, both in minimal medium with 5% glucose and in rich YP medium with 10% glucose (Figure 1b). Hence, 26B showed the same phenotypic difference with the inferior parent strain ER7A as CBS4C (Figure 1b).
Backcross of the unique superior segregant 26B with the inferior parent ER7A and screening for superior segregants
We next switched the mating type of 26B from Matα to Mata (see Materials and methods) and crossed the Mata 26B strain with the Matα inferior parent strain, ER7A, which is a derivative of the industrial strain Ethanol Red, currently used worldwide in bioethanol production. The hybrid diploid 26B/ER7A showed a glycerol/ethanol yield phenotype intermediate between those of ER7A and 26B (Figure 1b). The hybrid was sporulated and 260 meiotic segregants were screened for low glycerol yield (and corresponding higher ethanol production) in 100 ml fermentations with YP 10% glucose. The parent strains 26B and ER7A, and the hybrid diploid, were used as controls in each batch of fermentations.
Glycerol and ethanol yields of the segregants in each batch were normalized to those of 26B, which were set to 100%. ER7A and the diploid 26B/ER7A showed an average glycerol yield of 146% and 124%, respectively, and a decreased ethanol yield of 98.1% and 99.4%, respectively (Figure 2a). The glycerol and ethanol yields of the segregants showed a Gaussian distribution, which extended over the range of the two parental strains. In the case of the lowest glycerol yield, this extension was only marginal. The population means of the glycerol yield (123%) and ethanol yield (98.8%) were close to those of the diploid 26B/ER7A. In general, glycerol and ethanol yield of the segregant population correlated inversely (as determined with a Pearson test), meaning that low glycerol yield was usually accompanied by high ethanol yield. Nearly all exceptions to this correlation were segregants with an unusually low ethanol yield that failed to show a correspondingly higher glycerol yield. To compose the pool of selected superior segregants, two cut-off criteria were defined: a glycerol yield lower than 120% of 26B and an ethanol yield higher than 99% of 26B. These cut-off criteria resulted in the selection of a set of 34 superior segregants. These were all retested in 100 ml fermentations with YP 10% glucose, and 22 segregants again showed a low glycerol yield combined with a correspondingly higher ethanol yield using the same cut-off criteria (Figure 2b). These 22 segregants were selected for QTL mapping with pooled-segregant whole-genome sequence analysis. A second pool with 22 randomly selected segregants was also subjected to pooled-segregant whole-genome sequence analysis and is referred to as the unselected control pool (Figure 2b).
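The normalization to 26B and the two cut-off criteria can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis code used in this study: the function names and the example yield values are hypothetical, and only the two thresholds (glycerol yield < 120% and ethanol yield > 99% of 26B) are taken from the text.

```python
from statistics import mean


def normalize_to_reference(value, reference):
    """Express a yield as a percentage of the reference strain (26B = 100%)."""
    return 100.0 * value / reference


def is_superior(glycerol_pct, ethanol_pct, max_glycerol=120.0, min_ethanol=99.0):
    """Apply the two cut-off criteria from the text to normalized yields."""
    return glycerol_pct < max_glycerol and ethanol_pct > min_ethanol


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical raw yields (g per g glucose) for one batch; NOT measured data.
reference_26b = {"glycerol": 0.050, "ethanol": 0.480}
segregants = {
    "seg_A": {"glycerol": 0.052, "ethanol": 0.479},
    "seg_B": {"glycerol": 0.066, "ethanol": 0.473},
    "seg_C": {"glycerol": 0.058, "ethanol": 0.477},
}

# Normalize every segregant to the 26B control of the same batch.
normalized = {
    name: (
        normalize_to_reference(y["glycerol"], reference_26b["glycerol"]),
        normalize_to_reference(y["ethanol"], reference_26b["ethanol"]),
    )
    for name, y in segregants.items()
}

selected = [name for name, (g, e) in normalized.items() if is_superior(g, e)]
r = pearson_r(
    [g for g, _ in normalized.values()],
    [e for _, e in normalized.values()],
)
print(selected, r)
```

Normalizing within each batch to the 26B control of that batch, as done here, makes the cut-offs comparable across batches; a negative value of `r` corresponds to the inverse correlation described above.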
Pooled-segregant whole-genome sequence analysis and QTL mapping
The genomic DNA of the selected and unselected pools, as well as the parent strain 26B, was extracted and submitted to custom sequence analysis using Illumina HiSeq 2000 technology (BGI, Hong Kong, China). The genome sequence of the parent strain ER7A was determined in our previous study (data accession number SRA054394)[19]. Read mapping and single nucleotide polymorphism (SNP) filtering were carried out as described previously[20, 29]. The SNP variant frequency was plotted against the SNP chromosomal position (Figure 3). Of the total number of 21,818 SNPs between CBS4C and ER7A, 5,596 SNPs of CBS4C were recovered in 26B. These SNPs were used for mapping minor QTLs in the genomic areas that were not identical between 26B and ER7A. The other genomic areas were completely devoid of SNPs because they were identical between the 26B and ER7A parents (white gaps in Figure 3). The scattered raw SNP variant frequencies were smoothed and a confidence interval was calculated, as previously described[20, 29]. The Hidden Markov Model, EXPloRA (see Materials and methods), was used to evaluate whether candidate regions showed significant linkage to the low glycerol phenotype. EXPloRA indicated six significant QTLs: on chr. I (3859–11045), chr. II (584232–619637), chr. IV (316389–375978 and 696486–748140), and chr. XIII (600902–610995 and 634582–640415) for the selected segregants pool.
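As a rough illustration of what a SNP variant frequency profile is, the following Python sketch computes the frequency per SNP from read counts and applies a simple sliding-window mean. Several points are assumptions made for illustration: the variant frequency is taken to be the fraction of reads carrying the 26B nucleotide, the read counts are invented, and the sliding-window mean is only a stand-in for the smoothing and confidence-interval procedure of refs [20, 29], which is not reproduced here. It also does not implement EXPloRA.

```python
def variant_frequency(variant_reads, total_reads):
    """Fraction of reads at one SNP position that carry the 26B nucleotide.

    Returns None when the position has no coverage.
    """
    if total_reads == 0:
        return None
    return variant_reads / total_reads


def sliding_mean(positions, freqs, half_window):
    """Average the frequencies of all SNPs within half_window bp of each SNP.

    Simple stand-in smoother (assumption), not the published method.
    """
    smoothed = []
    for p in positions:
        window = [f for q, f in zip(positions, freqs) if abs(q - p) <= half_window]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical (position in bp, reads with 26B nucleotide, total reads); NOT real data.
snps = [(1000, 22, 40), (1500, 30, 40), (2200, 34, 40), (9000, 19, 40)]

positions = [pos for pos, _, _ in snps]
raw = [variant_frequency(v, t) for _, v, t in snps]
smooth = sliding_mean(positions, raw, half_window=1500)
print(list(zip(positions, raw, smooth)))
```

In an unlinked region of a pool derived from a diploid hybrid, the frequency is expected to fluctuate around 0.5; a sustained deviation toward 1 in the selected pool only is what suggests linkage of the 26B genome to the phenotype.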
The locus on chr. I was present in both the selected and unselected pool and was thus likely linked to an inadvertently selected trait, such as sporulation capacity or spore viability. It was excluded from further analysis. EXPloRA also reported two significantly linked loci on chr. VI (169586–170209) and chr. VII (472620–493523) for the unselected pool. Both loci were linked to the inferior parent, ER7A. For the region on chr. VII, the linked locus with the inferior parent genome was also present in the selected pool. Both loci likely represent linkage to inadvertently selected traits, such as sporulation capacity or spore viability. It is unclear why the locus on chr. VI was only present in the unselected pool. Since neither locus was linked to the low glycerol phenotype, they were not investigated further.
The locus on chr. II was interesting since it also appeared in the previous mapping with the two original parents, CBS4C and ER7A, but in that case it was not pronounced enough to be significant[19]. The mapping with the backcross has now confirmed the relevance of this locus. On chr. IV and XIII, two new QTLs with a significant linkage to the low glycerol/high ethanol yield phenotype were detected. These QTLs were not present in our previous mapping with the original parent strains CBS4C and ER7A.
All QTLs with a significant link to the phenotype under study, i.e. those on chr. II, IV and XIII, were further investigated in detail. Selected SNPs within the respective QTLs were scored in the 22 individual superior segregants to determine precisely the SNP variant frequency and the statistical significance of the linkage. Using the binomial test previously described[20, 29], none of the three loci was found to be significantly linked to the genome of the superior parent strain 26B with the low number of superior segregants available. Therefore, we isolated 400 additional F1 segregants of the diploid 26B/ER7A and screened them for low glycerol/high ethanol production. In addition, we performed four rounds of random inbreeding (mating and sporulation) with all F1 segregants from the diploid 26B/ER7A to increase the recombination frequency[23] and subsequently also evaluated 400 F5 segregants in small-scale fermentations for glycerol/ethanol yield. The results for the 400 F1 and 400 F5 segregants are shown in Figure 4a. The glycerol and ethanol yields are again expressed as a percentage of those of the superior parent strain 26B. There was again a clear inverse correlation between glycerol and ethanol yield. From the 800 segregants, we selected in total 48 superior segregants, i.e. 22 F1 segregants and 26 F5 segregants (Figure 4b).
We next scored selected SNPs in the putative QTLs on chr. II, IV and XIII in the 22 additionally selected F1 segregants and the 26 selected F5 segregants. We then determined the SNP variant frequency and the corresponding P-value, as described previously[20, 29], for the following groups of segregants: the 22 initially selected segregants of the sequenced pool, the 22 additionally selected F1 segregants, the total of 44 selected F1 segregants, the 26 selected F5 segregants and the total of 70 selected F1 and F5 segregants. They are shown in Figure 4c. By increasing the number of superior segregants, we were now able to demonstrate significant linkage (P-value < 0.05) to the genome of the superior parent strain 26B for the three QTLs under study. For the QTLs on chr. II and IV the linkage was very strong, while for the QTL on chr. XIII it was still weak, but significant. In contrast, the second region on chr. IV did not show significant linkage in any of the pools.
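The reason why adding segregants can turn a non-significant linkage into a significant one can be illustrated with an exact binomial test. The Python sketch below assumes a one-sided test against random segregation (each segregant inherits the 26B nucleotide with probability 0.5); whether this matches the exact test variant of refs [20, 29] is an assumption, and the segregant counts in the example are hypothetical, not the counts scored in this study.

```python
from math import comb


def binomial_p_value(k, n):
    """One-sided exact binomial P-value: P(X >= k) for X ~ Binomial(n, 0.5).

    k: number of segregants carrying the superior-parent (26B) nucleotide.
    n: total number of segregants scored.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    # Under random segregation every one of the 2**n outcomes is equally likely.
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# Hypothetical counts with a similar variant frequency (about 68%) in both cases.
p_small = binomial_p_value(15, 22)   # 15 of 22 segregants carry the 26B SNP
p_large = binomial_p_value(48, 70)   # 48 of 70 segregants carry the 26B SNP
print(p_small, p_large)
```

With these illustrative numbers, the same variant frequency is not significant at P < 0.05 with 22 segregants but is significant with 70, which is the rationale for enlarging the set of superior segregants.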
Identification of causative genes in the QTLs on chr. II, IV and XIII
For further analysis, we first selected three potential candidate genes within the three QTLs, based on their known function in glycerol metabolism. SMP1, which is located in the QTL on chr. II (594,864 to 593,506 bp), encodes a putative transcription factor involved in regulating glycerol production during the response to osmostress[30]. The gene is located in the chromosomal region from 584,232 to 619,637 bp, which was predicted as most significant by the EXPloRA model. The 26B SMP1 allele has two point mutations within its coding sequence, which change the primary protein sequence at position 110 from arginine to glutamine and at position 269 from proline to glutamine. Hence, we have named this allele smp1R110Q,P269Q.
The SNP with the highest linkage within the QTL found on chr. IV was located at position 411,831 bp (Figure 4c), which is within the open reading frame of GPD1 (411,825 to 413,000 bp). This is the structural gene for the NAD+-dependent cytosolic GPDH[15, 16]. This enzyme catalyzes the conversion of DHAP to glycerol 3-phosphate through the oxidation of NADH and has been shown to be the rate-controlling step in glycerol formation[1, 16]. The GPD1 allele of 26B harbors a point mutation, changing leucine at position 164 into proline. This mutation has been reported before (DDBJ database data, accession number AY598965). The GPD1 allele of 26B was named gpd1L164P.
The SNP with the highest linkage within the QTL found on chr. XIII was located at position 606,166 bp (Figure 4c), which is within the open reading frame of HOT1 (605,981 to 608,140 bp). HOT1 encodes a transcription factor required for the response to osmotic stress of glycerol biosynthetic genes, including GPD1, and other HOG-pathway regulated genes[31, 32]. The 26B HOT1 allele contains two non-synonymous point mutations, changing proline to serine at position 107 and histidine to tyrosine at position 274. We have named the HOT1 allele of 26B, hot1P107S,H274Y.
We first investigated the effect of smp1R110Q,P269Q, gpd1L164P and hot1P107S,H274Y on the low glycerol/high ethanol phenotype using reciprocal hemizygosity analysis (RHA)[25]. For that purpose, we constructed for each gene a pair of hemizygous diploid 26B/ER7A hybrid strains, in which each pair contained a single copy of the superior or the inferior allele of SMP1, GPD1 or HOT1, respectively, while the other copy of the gene was deleted. The three pairs of hemizygous diploids were tested in the same 100 ml YP 10% glucose fermentations as previously used for the screening. The parent strains 26B and ER7A and the hybrid diploid 26B/ER7A were added as controls. The glycerol and ethanol yields were again expressed as a percentage of those of 26B, which were set at 100%. The significance of any differences between the strains was evaluated using a two-tailed unpaired t-test, with a P-value < 0.05 considered to indicate a significant difference. The results of the RHA are shown in Figure 5. They indicate that both smp1R110Q,P269Q and hot1P107S,H274Y, but not gpd1L164P, derived from the superior parent 26B cause a significant drop in the glycerol/ethanol ratio compared to the alleles of the inferior parent strain ER7A. For smp1R110Q,P269Q only the reduction in glycerol, and not the increase in ethanol, was significant at the P-value threshold of 0.05. These results indicate that smp1R110Q,P269Q is probably a causative gene in the QTL on chr. II. They do not exclude that the QTL may contain a second causative gene, especially since smp1R110Q,P269Q is not located in the region with the strongest linkage (lowest P-value).
The RHA with the GPD1 alleles failed to show any difference for either glycerol or ethanol production (Figure 5). Hence, the superior character of the gpd1L164P allele could not be confirmed with RHA. This is remarkable because the SNP with the strongest linkage (lowest P-value) in the QTL on chr. IV was located in the open reading frame of GPD1 and showed very strong linkage to the low glycerol/high ethanol phenotype. The hot1P107S,H274Y allele of the superior strain 26B, in contrast, caused a reduction in glycerol and an increase in ethanol production, and both changes were significant (P-value < 0.05) (Figure 5). Hence, these results indicate that hot1P107S,H274Y is a causative allele in the QTL on chr. XIII and, because it contains the SNP with the strongest linkage (lowest P-value), it is likely the main causative allele in this QTL.
The glycerol yields of the inferior parent ER7A and the diploid 26B/ER7A were on average 143% and 126% of the 26B yield, respectively (Figure 5). Ethanol yield of both strains was correspondingly reduced to 98% of the 26B yield. Clearly, the smp1R110Q,P269Q and hot1P107S,H274Y alleles can only be responsible for part of the difference in the glycerol/ethanol ratio between the parent strains. The same was found previously for the ssk1E330N…K356N allele[19]. This confirms that the glycerol/ethanol ratio in yeast fermentation is a true polygenic, complex trait, determined by an interplay of multiple mutant genes.
Expression of the gpd1L164P allele from 26B in haploid gpd1∆ strains reveals its superior character
Several explanations could account for the failure to confirm the superior character of the gpd1L164P allele from 26B in the RHA test. A closely located gene may be the real causative gene, the gpd1L164P allele may be effective only in a haploid genetic background or the effect of the gpd1L164P allele may be suppressed through epistasis by one or both of the other two superior alleles, smp1R110Q,P269Q and hot1P107S,H274Y. To distinguish between these possibilities, we amplified the gpd1L164P allele from strain CBS4C and the GPD1 allele from strain ER7A by PCR (410,523 to 413,479 bp, including promoter, ORF and terminator). The PCR fragment was ligated into the centromeric plasmid YCplac33, resulting in plasmids YCplac33/gpd1L164P-CBS4C and YCplac33/GPD1-ER7A. Both plasmids were transformed into gpd1∆ strains of the two parents 26B and ER7A, the hybrid diploid 26B/ER7A and the lab strain BY4742[33, 34]. All strains were tested in 100 ml fermentations with YP 10% glucose. Glycerol and ethanol yields were determined after 120 h of fermentation. The results are shown in Figure 6.
When the gpd1L164P-CBS4C allele or the GPD1-ER7A allele was expressed in the gpd1∆ strains of the superior parent 26B or the hybrid diploid 26B/ER7A, the increase in glycerol production and the decrease in ethanol production were the same for the two alleles. On the other hand, expression of the gpd1L164P-CBS4C allele in the gpd1∆ strains of the inferior parent ER7A or the lab strain BY4742 enhanced glycerol production and reduced ethanol production significantly more than expression of the GPD1-ER7A allele. The latter shows that the gpd1L164P-CBS4C allele is superior compared to the GPD1-ER7A allele. The difference between the two alleles is apparently not dependent on the haploid or diploid background of the strain but seems to be related to the presence of the two other superior alleles, smp1R110Q,P269Q and hot1P107S,H274Y. They are both present in the two strains, 26B and 26B/ER7A, in which gpd1L164P-CBS4C has no differential effect, and absent in the two strains, ER7A and BY4742, in which gpd1L164P-CBS4C has a differential effect. Hence, the superior potency of gpd1L164P-CBS4C may be suppressed through epistasis by smp1R110Q,P269Q and/or hot1P107S,H274Y. On the other hand, we cannot exclude that the effect of gpd1L164P-CBS4C is suppressed by one or more other mutant genes present in the superior parent 26B or the hybrid diploid 26B/ER7A.
We have scored the final 70 superior segregants with a glycerol yield < 120% and an ethanol yield > 99% of that of the superior parent 26B, for the presence of the three causative alleles, smp1R110Q,P269Q, gpd1L164P and hot1P107S,H274Y. The results are shown in Figure 7a. The largest group of superior segregants contained all three mutant alleles, followed by smaller groups with only two of the three mutant alleles and finally the three smallest groups with only one mutant allele. Hence, there was a clear correlation between the number of mutant alleles and low glycerol/high ethanol yield in this group of selected segregants. On the other hand, although there was a tendency for a lower mean glycerol/ethanol yield ratio with an increasing number of mutant alleles, the differences between the means of the different groups were small and the variation remained large and with the same range for the three largest categories.
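The grouping of superior segregants by allele combination can be sketched as follows in Python. This is a minimal illustration under stated assumptions: the genotype records and helper names are hypothetical, and only the three allele names are taken from the text.

```python
from collections import Counter

# The three causative alleles scored in the superior segregants (from the text).
ALLELES = ("smp1R110Q,P269Q", "gpd1L164P", "hot1P107S,H274Y")


def genotype_class(genotype):
    """Return a label listing which of the three mutant alleles are present."""
    present = [a for a in ALLELES if genotype.get(a, False)]
    return " + ".join(present) if present else "none"


def count_by_class(genotypes):
    """Count segregants per allele combination."""
    return Counter(genotype_class(g) for g in genotypes)


def count_by_number(genotypes):
    """Count segregants by the number of mutant alleles they carry (0 to 3)."""
    return Counter(sum(bool(g.get(a, False)) for a in ALLELES) for g in genotypes)


# Hypothetical scoring results for four segregants; NOT data from this study.
scored = [
    {"smp1R110Q,P269Q": True, "gpd1L164P": True, "hot1P107S,H274Y": True},
    {"smp1R110Q,P269Q": True, "gpd1L164P": True, "hot1P107S,H274Y": True},
    {"smp1R110Q,P269Q": False, "gpd1L164P": True, "hot1P107S,H274Y": True},
    {"smp1R110Q,P269Q": False, "gpd1L164P": True, "hot1P107S,H274Y": False},
]

print(count_by_class(scored))
print(count_by_number(scored))
```

Tabulating by allele combination rather than only by allele count keeps the information needed to compare, for example, segregants carrying only gpd1L164P with those carrying only hot1P107S,H274Y.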
We have also investigated a possible correlation between the different mutant alleles and the strength of the low glycerol/high ethanol phenotype. For that purpose, we determined the percentage of segregants with a specific mutant allele in sets of strains with a decreasing glycerol yield or an increasing ethanol yield. The results show that there is no preference between the different alleles in the strains with a higher glycerol yield, but in the strains with the lowest glycerol yield, the gpd1L164P allele is preferentially present, followed by the hot1P107S,H274Y allele, although this only holds for the category with the lowest glycerol yield (Figure 7b). Hence, the order of potency of the three alleles appears to be: gpd1L164P > hot1P107S,H274Y ≥ smp1R110Q,P269Q. There was no correlation between the variant frequency of the three alleles for high ethanol yield, indicating that other minor QTLs may affect ethanol yield independently from glycerol yield and act together with the currently identified alleles.