Association mapping identifies quantitative trait loci (QTL) for digestibility in rice straw
Biotechnology for Biofuels volume 13, Article number: 165 (2020)
The conversion of lignocellulosic biomass from agricultural waste into biofuels and chemicals is considered a promising way to provide sustainable low carbon products without compromising food security. However, the use of lignocellulosic biomass for biofuel and chemical production is limited by the cost-effectiveness of the production process due to its recalcitrance to enzymatic hydrolysis and fermentable sugar release (i.e., saccharification). Rice straw is a particularly attractive feedstock because millions of tons are currently burned in the field each year for disposal. The aim of this study was to explore the underlying natural genetic variation that impacts the recalcitrance of rice (Oryza sativa) straw to enzymatic saccharification. Ultimately, we wanted to investigate whether we could identify genetic markers that could be used in rice breeding to improve commercial cultivars for this trait. Here, we describe the development and characterization of a Vietnamese rice genome-wide association panel, high-throughput analysis of rice straw saccharification and lignin content, and the results from preliminary genome-wide association studies (GWAS) of the combined data sets. We identify both QTL and plausible candidate genes that may have an impact on the saccharification of rice straw.
We assembled a diversity panel comprising 151 rice genotypes (indica and japonica types) from commercial cultivars, historical elite cultivars, and traditional landraces grown in Vietnam. The diversity panel was genotyped using genotyping by sequencing (GBS), yielding a total of 328,915 single nucleotide polymorphisms (SNPs). We collected phenotypic data from stems of these 151 genotypes for biomass saccharification and lignin content. Using GWAS on the indica genotypes over 2 years, we identified ten significant QTL for saccharification (digestibility) and seven significant QTL for lignin. One QTL on chromosome 11 occurred in both GWAS for digestibility and for lignin. Seven QTL for digestibility, on CH2, CH6, CH7, CH8, and CH11, were observed in both years of the study. The QTL regions for saccharification include three potential candidate genes that have been previously reported to influence digestibility: OsAT10, OsIRX9, and OsMYB58/63-L.
Despite the difficulties associated with multi-phasic analysis of complex traits in novel germplasm, a moderate resolution GWAS successfully identified genetic associations encompassing both known and/or novel genes involved in determining the saccharification potential and lignin content of rice straw. Plausible candidates within QTL regions, in particular those with roles in cell wall biosynthesis, were identified but will require validation to confirm their value for application in rice breeding.
The need to cut carbon emissions has become a global priority, and the production of low carbon liquid fuels and chemicals is an important component in the drive for a sustainable industrial bio-economy. The use of major crops and agricultural land exclusively for biofuel production is considered unsustainable and generates concerns over global food security. However, the use of non-food crop residues represents an alternative source of biomass. Such lignocellulosic crop biomass is typically composed of around 70% polysaccharides that can potentially be depolymerized to produce sugars for fermentation. Millions of tons of rice straw are burned every year for disposal. Field burning of biomass generates ground-level atmospheric pollution that is responsible for premature mortalities, lost economic activity and decreased agricultural yields in many rice-growing nations. Consequently, there are clear benefits to valorizing rice straw and other residues to produce fuels and chemicals. However, the use of biomass is hindered by its recalcitrance to digestion.
Most agriculturally important broad-acre cereals have large complex genomes that make them complicated to use for research purposes. One exception is rice (Oryza sativa), one of the world’s most important cereal crops. Rice has a small diploid genome (only about twice the size of Arabidopsis) and well-developed molecular genetic tools.
Despite these advantages of rice, most research focused on understanding the synthesis and construction of plant cell walls has been conducted in Arabidopsis. Unfortunately, many aspects of this research cannot be directly transferred to grasses, as monocots and dicots differ in their cell wall biology. While both comprise cellulose microfibrils embedded in a matrix of hemicellulose and lignin, there are substantial differences in these two matrix components and in how they bond to one another. While the predominant hemicellulose in dicot lignocellulose is an acetylated glucuronoxylan, grasses have more complex, highly decorated arabinoxylans. Grass arabinoxylans are notably decorated with hydroxycinnamic acid esters associated with arabinosyl side chains. Ferulic acid esters on arabinoxylans form cross links with neighbouring stretches of different arabinoxylan chains and with lignin, a feature not found in dicots. Lignin structure also differs considerably between dicots and grasses, with a greater preponderance of hydroxycinnamic acids in grass lignin.
Alterations in cell wall components can reduce the recalcitrance of lignocellulosic biomass and thus improve its saccharification, offering the potential to improve energy crops through plant breeding [9,10,11]. While reducing lignin can decrease recalcitrance in grasses, several publications also indicate that alterations in hydroxycinnamic esters can have a significant effect on recalcitrance. In rice and Brachypodium, decreased levels of ferulic acid accompany increases in lignocellulose digestibility [14,15,16].
Recently, important advances that lay the foundations for engineering or breeding plants for biofuel production have been made. These include lists of genes that could be manipulated or mined towards a goal of pathway engineering. However, for practical implementation, many challenges remain to be addressed. In plants and animals, studies of genetic sources of phenotypic variation have been the key to determining the cause of disease, improving agriculture and understanding adaptive processes. In particular, genetic analysis of natural variation has been used to identify both genes and quantitative trait loci (QTL) that account for significant amounts of phenotypic variation for a given trait within a population. QTL were originally mapped in bi-parental populations in plants. In bi-parental mapping populations, genetic resolution is often limited, confined to a range of 10 cM to 30 cM due to the restricted number of meiotic events captured during a cross between two parental lines. For example, Truntzler et al. identified 26 and 42 QTL in a maize bi-parental population that accounted for much of the variation in forage digestibility and cell wall composition traits, respectively, apparent in that population. Penning et al. similarly identified QTL for cellulase digestibility in a recombinant inbred population of maize, and Liu et al. identified a broad region on chromosome 1 that influenced digestibility in rice straw in a bi-parental population. Unfortunately, the number, effect and resolution of individual QTL in a bi-parental population frequently hamper causal gene identification. In addition, only a small fraction of the alleles present in a species can be examined for linkage to a trait in a population derived from two parental individuals.
Linkage disequilibrium (LD) mapping, or association mapping (AM), exploits historical recombination events that have occurred in all of the genomes contained within a population. All major alleles segregating in those genomes can then be considered when attempting to identify significant marker–phenotype associations. Over the last few years, genome-wide association studies (GWAS) have become increasingly popular. GWAS is a powerful approach that overcomes many of the constraints inherent to bi-parental linkage mapping. It exploits the considerable variation revealed by high-throughput molecular markers in natural or constructed populations across all chromosomes with high resolution. An appropriate panel of genotypes, a sufficient density of molecular markers and high-quality phenotypic data are key to establishing a successful association study. GWAS was first applied in humans and, after over two decades, continues to provide a powerful approach for the localization of genes underlying both simple and complex traits in many species, including crops. The advent of high-density single-nucleotide polymorphism (SNP) genotyping allows whole-genome scans to identify small haplotype blocks that are significantly correlated with quantitative trait variation. GWAS in crops usually uses a population of diverse (and preferably homozygous) genotypes that is genotyped once and can be phenotyped for many traits to generate specific mapping populations for specific traits or QTL. A number of studies have used a range of genetic approaches to identify QTL for digestibility with different degrees of resolution in species such as sorghum, Miscanthus, maize, alfalfa, and poplar. Nevertheless, digestibility/saccharification is a difficult trait to measure, with potential variation arising from both the field and the laboratory phases of the work.
Rice is a selfing species and, like Arabidopsis, a good candidate for GWAS. Huang et al. identified an unbiased set of common SNPs that was used to identify strong associations between genetic loci and 14 agronomic traits, including heading date, grain size, and starch quality. With well-developed molecular genetics tools, the advent of affordable large-scale DNA sequencing, and association genetic studies starting to reach their full potential, GWAS in rice has the potential to identify both QTL for saccharification and novel genes involved in cell wall synthesis.
The aim of the present work was to determine whether GWAS can be used to identify QTL and candidate genes associated with the saccharification potential of rice straw. Using a new association panel comprising 151 rice genotypes from Vietnam, we measured lignocellulose digestibility and lignin content in field-grown straw from this population across 2 years. Association studies using only the indica subset revealed a number of significant QTL and candidate genes, some common to both lignin content and digestibility.
The SNP matrix used for association mapping in the present work was generated by genotyping by sequencing (GBS) of 172 rice genotypes, followed by GBS “Discovery Pipeline” analysis (Tassel Version: 3.0.166, date: April 17, 2014). We identified a total of 328,915 SNPs that were stored in HapMap format and used as genotypic data for GWAS (Fig. 1). The average density of SNP markers in our panel is about 1 SNP/kb. Genome-wide linkage disequilibrium decay in the rice subspecies indica and japonica has been estimated at ~123 kb and ~167 kb, respectively, and cultivated rice has a longer range of decay (100 kb to over 200 kb). For GWAS, the marker coverage that we generated should therefore give satisfactory resolution. Indeed, this SNP density means that causative polymorphisms stand a reasonable chance of being in LD with one or more markers, which should help to identify small haplotype blocks that are significantly correlated with complex traits such as lignocellulose recalcitrance.
Of the 172 genotypes used for SNP identification, 151 were retained for GWAS after removing genotypes that proved to be identical. Controlling for population structure is a standard procedure in GWAS and is particularly important in this research, as genotypes were collected from many different sources and include both indica and tropical japonica varieties. The diversity level and stratification of the population were examined before performing GWAS. A phylogenetic tree and a heat map of the values in the kinship matrix created from the SNPs, both of which show relatedness within the population, were calculated using GAPIT (Fig. 2) [37, 38]. The results show that there are two subpopulations in the association mapping panel (Fig. 2). The smaller subpopulation includes 22 tropical japonica genotypes, with the other subpopulation comprising 129 indica genotypes.
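The marker-coverage argument above can be checked with simple arithmetic. The sketch below assumes the ~430 Mb genome size quoted later in this article and, as a simplification, a uniform spacing of SNPs along the genome (GBS markers are in practice unevenly distributed); the function names are illustrative only.

```python
# Figures quoted in the text
N_SNPS = 328_915            # SNPs called by the GBS pipeline
GENOME_SIZE_KB = 430_000    # ~430 Mb rice genome, as quoted later in the article
LD_DECAY_KB = {"indica": 123, "japonica": 167}  # reported LD decay estimates


def snp_density_per_kb(n_snps: int, genome_kb: float) -> float:
    """Average number of SNPs per kb, assuming uniform spacing."""
    return n_snps / genome_kb


def markers_per_ld_block(density_per_kb: float, ld_kb: float) -> float:
    """Expected number of markers inside one LD-decay window."""
    return density_per_kb * ld_kb


density = snp_density_per_kb(N_SNPS, GENOME_SIZE_KB)
print(f"density: {density:.2f} SNP/kb")
for subspecies, ld_kb in LD_DECAY_KB.items():
    print(subspecies, round(markers_per_ld_block(density, ld_kb)), "markers per LD window")
```

Under these assumptions the density is slightly below 1 SNP/kb, which still places on the order of a hundred markers within a single LD-decay window.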
Measuring lignocellulose recalcitrance and lignin content
Lignocellulose recalcitrance to digestion was measured by incubating ground straw from individual genotypes with a commercial cellulase cocktail following a water pre-treatment at 94 °C using an automated platform. To determine QTL for recalcitrance in our rice association panel, we harvested straw over two consecutive years, during the spring season in 2013 (93 genotypes) and the summer season in 2014 (151 genotypes). The results from the 2014 harvest showed values in the range of 20–134 nmol of reducing sugar equivalents/mg of biomass per hour of hydrolysis (nmol/mg h), and for the 2013 harvest the range was between 23 and 72.8 nmol/mg h (Fig. 3). There is little correlation between the saccharification data sets from the two years for the 93 genotypes present in both trials (Fig. 4). We attribute this lack of correlation largely to the environmental effects on saccharification of growth in different seasons. This illustrates the difficulties inherent in measuring complex traits, where field and laboratory phases of the analysis and different years of growth can introduce non-genetic variation. In addition, different environmental conditions may influence marker effects (i.e., marker-by-environment interaction effects). Most rice genotypes are adapted for optimal growth in a specific growing season, while some are adapted for both seasons, causing differences in biomass quality.
Lignin content was assessed using the acetyl bromide method and showed a significant degree of variation among the 151 rice genotypes included in the association panel, ranging between 14.3% and 26.3% (Fig. 5).
A correlation analysis between lignin content and recalcitrance revealed no significant correlation between the two for the indica population (R² = 0.0006), although there was a significant correlation in the smaller japonica sub-population (R² = 0.066, p = 0.045) (Fig. 6). Based on these results, we decided to remove the japonica subpopulation to improve the power of GWAS and to prevent population structure from misleading the analysis.
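For readers unfamiliar with the statistic, the R² values above are squared Pearson correlation coefficients between two per-genotype measurements. A minimal sketch follows; the function names are ours, the toy values are invented, and the software actually used for this step is not stated in the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r_squared(x, y):
    """Share of variance in y linearly associated with x."""
    return pearson_r(x, y) ** 2


# Invented toy values, not data from this study
lignin_percent = [16.0, 18.5, 20.1, 22.4, 24.0]
sugar_release = [80.0, 95.0, 60.0, 110.0, 75.0]
print(round(r_squared(lignin_percent, sugar_release), 3))
```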
GWAS for recalcitrance
We ran GWAS for recalcitrance separately for each year, using adjusted saccharification genotype means from straw biomass harvested from 83 indica genotypes in 2013 and 125 indica genotypes in 2014. A separate mixed linear model (MLM) was fitted for each year in TASSEL. We identified several significant associations in each year, including seven QTL regions, on CH2, CH6, CH7, CH8, and CH11, present in both years’ data (Table 1). The data set from 2014 yielded a total of 102 significant SNP associations (Table 1). Figure 7 shows a Manhattan plot of the QTL for saccharification, with a false discovery rate (FDR) of < 0.05 as the cutoff for significant SNPs (above the red line). The quantile–quantile (QQ) plot that represents deviation of the observed p values from the null hypothesis is shown in Additional file 1. The contribution of these QTL to phenotypic variance was calculated as the phenotypic variance explained (PVE) by significant SNPs (see Table 1). There are SNP clusters/QTL on CH1, CH2, CH6, CH7, CH8, and CH11, which have PVE values ranging from 18% (at CH2_24.6 ± 0.2 Mb) to 56% (at CH7_26.4 ± 0.4 Mb) (Table 1).
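To illustrate how an FDR cutoff of 0.05 selects SNPs, the sketch below implements the Benjamini–Hochberg step-up procedure. This is an assumption on our part: the text does not state which FDR procedure was applied, and the p values shown are invented.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p value: True if it passes FDR control at alpha.

    Benjamini-Hochberg step-up: find the largest rank k such that
    p_(k) <= (k / m) * alpha, then declare the k smallest p values significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


# Invented p values for four SNPs
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Because the threshold depends on the rank of each p value among all tests, a SNP with a small raw p value can still fail the FDR cutoff when few other SNPs show any signal.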
GWAS for lignin content
By fitting the adjusted means of lignin content for 124 indica genotypes grown in 2014 in the same GWAS model as for recalcitrance, we found 56 significant SNPs using a cutoff of p < 0.001 and MAF > 0.05. The FDR correction for p values was not applied because none of the SNPs qualified for FDR < 0.05. In this case, we used only the p value to assess the significance of each SNP associated with lignin content. This means that the true significance of some SNPs is overestimated and that some may be false positives. The QQ plot that represents deviation of the observed p values from the null hypothesis is shown in Additional file 1. The SNPs significantly associated with lignin content are located on CH1, CH2, CH3, CH8, CH10, and CH11 (Table 2). These significant SNPs explain from 5.18% (at CH10_19.2 ± 0.3 Mb) to 12.58% (at CH11_4.0 ± 0.2 Mb) of the phenotypic variation (Table 2). The QTL at CH11_4.0 ± 0.2 Mb is in the same region as a QTL found in the GWAS for digestibility, although no common significant SNPs were found between these two GWAS (Fig. 8, Tables 1 and 2).
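The two thresholds used here (p < 0.001 and MAF > 0.05) can be expressed as a simple filter. The sketch below assumes genotypes coded as counts of the alternative allele (0, 1 or 2, with None for missing calls); this coding and the function names are illustrative and are not taken from the TASSEL pipeline.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency from 0/1/2 alternative-allele counts.

    Missing calls (None) are ignored; returns 0.0 if nothing was called.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def passes_filters(p_value, genotypes, p_cutoff=0.001, maf_cutoff=0.05):
    """True if a SNP meets both the p value and the MAF thresholds."""
    return p_value < p_cutoff and minor_allele_frequency(genotypes) > maf_cutoff


# Invented example: a common variant with a small p value is kept
print(passes_filters(0.0005, [0, 0, 2, 2, None]))
```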
Identification of candidate genes
Candidate genes for recalcitrance
To identify the candidate genes underlying the QTL, we searched within 400 kb (± 200 kb of the peak SNPs) around the significant loci identified, based on the linkage disequilibrium (LD) decay range published for rice [36, 42]. The MSU Rice Genome Annotation Project (https://rice.plantbiology.msu.edu/expression.shtml) database was used to search for genes and their expression data in these regions (Additional file 2). Candidates were selected based on whether the function of the genes had been characterized before in rice or whether similar genes in other species had known roles in cell wall biosynthesis or modification. Table 1 shows the candidates identified for each saccharification QTL. Three candidate genes located in QTL regions found in both years of harvest have previously been shown to affect lignocellulose digestibility. The first, LOC_Os06g39390 (OsAT10), encodes a p-coumaroyl coenzyme A transferase that belongs to the Mitchell clade of BAHD acyl transferases and has previously been shown to add p-coumaroyl esters to arabinoxylan. This gene and its close neighbour, locus LOC_Os06g39470 (OsAT8), belong to the PF02458 family of transferases [10, 43]. In 2010, Piston et al. showed that cell walls of lines in which both genes are down-regulated exhibit a reduced content of ester-linked ferulate. A candidate gene located within the QTL region on chromosome 7 is LOC_Os07g49370 (OsIRX9), which encodes a glycosyl transferase involved in the synthesis of the xylan backbone in the secondary and primary cell walls. Expressing OsIRX9 in an Arabidopsis irx9 mutant background restored xylosyltransferase activity and stem strength to wild-type levels. A candidate gene within the QTL on chromosome 2 is locus LOC_Os02g46780, next to the SNP S2_28582605 (p = 1.05E−07), identified as OsMYB58/63-L, which is homologous to the Myb transcription factor OsMYB58/63 involved in the expression of a rice secondary wall-specific cellulose synthase gene, OsCesA7.
Table 1 lists the QTL regions along with the positions of the three candidates mentioned above, and a number of other potential candidate genes.
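The ± 200 kb window search described above can be sketched as an interval-overlap query. In the example below the gene names and coordinates are hypothetical placeholders rather than entries from the MSU annotation, and the parser assumes that a marker name such as S2_28582605 encodes chromosome 2, position 28,582,605 bp.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    locus: str
    chrom: int
    start: int  # bp
    end: int    # bp


WINDOW_BP = 200_000  # +/- 200 kb around the peak SNP, as in the text


def parse_snp_name(name: str) -> Tuple[int, int]:
    """Split a marker name like 'S2_28582605' into (chromosome, position)."""
    chrom_part, pos_part = name.split("_")
    return int(chrom_part.lstrip("S")), int(pos_part)


def genes_near_snp(genes: List[Gene], snp_name: str,
                   window_bp: int = WINDOW_BP) -> List[Gene]:
    """Genes on the SNP's chromosome that overlap the search window."""
    chrom, pos = parse_snp_name(snp_name)
    lo, hi = pos - window_bp, pos + window_bp
    return [g for g in genes if g.chrom == chrom and g.end >= lo and g.start <= hi]


# Hypothetical annotation records, for illustration only
annotation = [
    Gene("hypothetical_gene_A", 2, 28_400_000, 28_403_000),  # inside the window
    Gene("hypothetical_gene_B", 2, 28_900_000, 28_902_000),  # too far away
    Gene("hypothetical_gene_C", 7, 28_500_000, 28_501_000),  # other chromosome
]
hits = genes_near_snp(annotation, "S2_28582605")
print([g.locus for g in hits])
```

The genes returned by such a query would then be screened manually for known or plausible roles in cell wall biosynthesis, as described in the text.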
Candidate genes for lignin content
All genes located in QTL regions and their expression data are listed in Additional file 3. Candidate genes associated with lignin content QTL were identified following the same procedure as for recalcitrance. The list of candidate genes in the QTL regions is shown in Table 2. Several QTL regions encompass genes known to be involved in lignin biosynthesis. A hydroxycinnamoyltransferase (HCT) gene on CH11 (CH11_4.0 ± 0.2 Mb) is in the common QTL region between GWAS for recalcitrance and lignin content. Interestingly, there are also two potential HCT genes located within a digestibility QTL on chromosome 8, namely, LOC_Os08g43040 and LOC_Os08g43020 (Table 2). Reduced expression of HCT in alfalfa has been shown to increase stem digestibility .
There is a cluster of seven peroxidase genes located close to the peak in the lignin QTL region at CH3_14.5 ± 0.4 Mb. Also, a laccase, LOC_Os11g47390.1, located in the QTL region CH11_18.8 ± 0.3 Mb, is surrounded by several cell wall genes, including a wall-associated kinase (WAK), a kinase, a receptor-like protein kinase, and a glycosyl hydrolase. Peroxidases together with laccases have been proposed to take part in the polymerization of monolignols into lignin. Downregulation or disruption of these enzymes led to the reduction of lignin content in plants [48,49,50].
The lignin content of straw from our rice accessions is similar to that of grasses in general, higher than in dicots, and lower than in woody species [5, 49,50,51]. A comparison of our results with unpublished data from our laboratory (obtained using the same method) indicates that, among the grasses studied, rice has one of the highest lignin contents and the widest range of digestibility.
We have piloted the use of GWAS to identify QTL for the saccharification potential of rice straw using an association panel of 151 Vietnamese elite and landrace genotypes. In this association panel, based on the pairwise studies for relatedness among all the genotypes, 129 indica genotypes were grouped into the main population and 22 tropical japonica genotypes were grouped into a smaller group, which can be considered as a sub-population. The japonica sub-population was removed from all GWAS to reduce the number of confounding factors. False positives and negatives in GWAS can occur when the patterns of population structure overlap with patterns of the phenotype and with patterns in environmental variation.
We used an automated multi-phasic saccharification platform to phenotype the straw samples collected over two different growing seasons (spring and summer) in 2 years (2013 and 2014). Only eight genotypes in the top 25% for digestibility in 2013 were also found in the top 25% in 2014. We attribute this to environmental effects on the population, including variation in day length requirement among genotypes [53, 54]. Despite this apparent lack of correlation, we nevertheless identified seven QTL that were common across both years. A number of studies have used different genetic approaches to identify QTL for saccharification in different types of plant biomass. Only a few candidate genes have been identified and validated from association mapping for saccharification so far. In alfalfa, 20 simple sequence repeat (SSR) markers were predicted to be associated with fiber-related quality traits (heritability, H² = 45 to 73.6); no specific candidate genes were reported, but the findings helped to facilitate marker-assisted breeding programs. In sorghum, screening 703 SSR markers against low and high saccharification (glucose release by cellulase) pools identified two markers on sorghum chromosomes 2 (23-1062) and 4 (74-508c) associated with saccharification yield; these markers were physically close to genes encoding plant cell wall synthesis enzymes such as xyloglucan fucosyltransferase (149 kb from 74-508c) and UDP-d-glucose 4-epimerase (46 kb from 23-1062). In maize, recombinant inbred lines screened for lignin abundance and sugar yield established 11 QTL, using pyrolysis molecular-beam mass spectrometry to establish stem lignin content and an enzymatic hydrolysis assay to measure glucose and xylose yield. So far, several naturally occurring mutants with reduced lignin have been identified in cereals, such as brown midrib (bm) mutants in maize, orange lemma (rob) mutants in barley, and the “gold hull internode” (gh) mutant in rice.
The reduced and altered lignin of these mutants has demonstrated the potential impact of lignin on cell wall digestibility [58,59,60,61]. In the present work, we have used a direct GWAS approach in an association panel to screen for QTL in rice and found a number of genes already established as affecting saccharification, as well as other novel candidates.
By screening the regions in close proximity to the significant SNPs in the seven 2-year QTL, as well as two single-year QTL, we identified 12 candidate genes, which included the transcription factors OsMYB26, OsMYB58/63-L, and an ortholog of BdMYB48. The other candidate genes are OsHCT2, three homologs of HCT, Os4CL2, OsCESA11, OsAT8, OsAT10 (BAHD family), and OsIRX9 (a GT43). OsAT10, OsIRX9, and OsMYB58/63-L were detected in both years of assays.
Association mapping based on examining individual genes and alleles at the loci responsible for lignin content has been applied to perennial ryegrass to identify significantly associated SNPs. An intronic SNP in the ryegrass candidate gene LpCCR1 was found to be significantly associated with cell wall digestibility and Klason lignin content in stem material. Similarly, in black cottonwood (Populus trichocarpa), association mapping across 40 candidate genes, with lignin content characterized by pyrolysis molecular-beam mass spectrometry (PyMBMS), found 13 significant single-marker associations for 9 candidate genes. In the present study, we used the acetyl bromide method to measure lignin in the association panel, given that it is faster, simpler and gives better recovery of lignin in different herbaceous tissues than Klason- and thioglycolic acid-based methods. In our GWAS, we identified seven QTL regions, with one of them (on CH11) coinciding with one found in the GWAS for digestibility. This is in contrast with the results of Penning et al. in maize, who did not find overlapping QTL for lignin abundance and saccharification. This common QTL on CH11 contains a homolog of HCT. Although no functional studies of any OsHCT have been published, in Medicago, HCT expression determines stem digestibility. As well as candidates in monolignol synthetic pathways, some QTL contain putative candidate genes involved in lignin polymerization, such as a cluster of seven peroxidase genes located next to the QTL peak on CH3 and a laccase gene in the QTL region CH11_18.8 ± 0.3 Mb. Homologues of these genes in Arabidopsis and tobacco are involved in determining lignin content [66,67,68].
The use of crop residue biomass provides a way to avoid competition between biofuel and food production for feedstock. Since rice straw is an abundantly available and globally underutilized resource, it provides an attractive feedstock for bio-refining. However, to take full advantage of this resource, we need to improve its processing potential and make it more easily digestible with industrial enzymes to allow the production of cost-competitive sustainable biofuels by fermentation. To this end, we have assembled a diversity panel from rice germplasm in Vietnam, which is the fourth largest rice exporter in the world. Rice is a cereal with a small diploid genome (~430 Mb) and well-developed molecular genetic tools, and its cell walls are representative of grasses, making it an important crop from which to extrapolate knowledge on cell walls to other cereals. This is important because our understanding of the biosynthetic gene machinery and molecular structure of plant cell walls remains incomplete, and the molecular basis of biomass digestibility even more so.
The availability of accurate genomic information in rice opens the possibility for precise and robust GWAS for multigenic traits such as saccharification. We produced a high-density SNP matrix for 151 rice cultivars that were phenotyped in parallel for straw digestibility and lignin content. We were able to identify a number of QTL for these parameters and to propose a number of candidate genes associated with some of these QTL. Besides these QTL, we could identify outstanding genotypes that can be included in breeding programs for biomass quality. The markers identified could be validated and used in a breeding program for the selection of highly digestible straw genotypes, with a potential increase of up to 48 kg ha−1 of sugar released (Additional file 4).
In conclusion, association mapping for two traits associated with rice straw quality succeeded in identifying genetic variation in genomic regions that contain plausible candidate genes affecting digestibility. This forward genetic approach is a powerful way to identify known and novel genes involved in these traits. Future work is nevertheless required to validate these candidates and carry out the functional studies required to confirm their roles in cell wall biosynthesis. Such validation will lead to the robust application of associated molecular markers in breeding programs aiming to select plants with improved digestibility and avoid grain yield penalties.
The association panel comprises 151 rice genotypes from Vietnam, which originated from two Oryza sativa subspecies: indica and tropical japonica. These genotypes were selected from a trial population derived from a breeding project at the Plant Biotechnology Division, Field Crops Research Institute (FCRI), and comprise 84 genotypes held in the Germplasm Bank of FCRI, 29 high-quality genotypes that are widely cultivated in different areas of Vietnam, and 38 landrace cultivars. These genotypes are expected to be highly inbred lines with a homozygous genomic background (see Additional file 5 for the list of the genotypes used). From these, a subset of 93 genotypes was grown in 2013 and the full panel was grown in 2014. Several field traits of this population from other trials, such as plant height, flowering time, and grain yield, are listed in Additional file 6.
The association panel was grown in the field in Hai Duong province, in the north of Vietnam (GPS coordinates are given in Additional file 7). The first field trial, including 93 single plots, was sown in January and harvested in May 2013, and the second field trial, including 151 single plots, was sown in June and harvested in October 2014. Straw samples for each genotype were collected from five plants in the plot (plot size 2 m × 5 m = 10 m², plant density 40 plants/m²) after harvest for grain, and these five plants were kept separately as five replicates for each genotype. All samples were taken from the main tiller. The straw collected was dried for 2 days in the open air in Vietnam. Straw samples were kept in separate paper bags and sent to the Centre for Novel Agricultural Products (CNAP), University of York, UK, for characterization. The rice stems (minus nodes) were cut into small pieces, then ground to a fine powder and stored. These samples were used for the different assays, including saccharification and total lignin content.
Phenotyping for cell wall traits
The saccharification of 93 genotypes in 2013 and 151 genotypes in 2014 was analyzed using an automated platform as described in Gomez et al. Samples from five plants of the same genotype were treated as five separate replicates. In brief, ground straw samples were formatted in 96-well plates, in randomized positions, with four technical replicates of 4 mg for each sample, using a robotic platform (Labman Automation, Stokesley, North Yorkshire, UK). The samples were analyzed using a liquid handling robot (Tecan LTD, UK), which performed a water pre-treatment at 94 °C for 20 min, followed by enzymatic hydrolysis for 8 h at 50 °C. The enzyme used for saccharification was a 4:1 mixture of Celluclast and Novozyme 188 (Novozymes). Saccharification was estimated by measuring the reducing sugars released from the biomass material. This was done with a colorimetric assay using the 3-methyl-2-benzothiazolinone hydrazone (MBTH) method [39, 52]. Three standards of 50, 100 and 150 nmol glucose (three replicates each) and filter paper disks (four replicates), as a control, were used to account for any change in enzyme concentration or condition through time.
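As an illustration of how the glucose standards translate a colorimetric reading into a saccharification value, the sketch below fits a straight line through the 50, 100 and 150 nmol standards and expresses the result per mg of biomass per hour. The absorbance values are invented, and any aliquot or dilution factor applied on the platform is ignored here, so the numbers are not comparable with those reported in this study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def absorbance_to_nmol(absorbance, slope, intercept):
    """Invert the standard curve to get nmol of reducing sugar."""
    return (absorbance - intercept) / slope


def saccharification_rate(nmol_sugar, biomass_mg=4.0, hours=8.0):
    """Sugar release in nmol per mg biomass per hour of hydrolysis."""
    return nmol_sugar / (biomass_mg * hours)


standards_nmol = [50.0, 100.0, 150.0]   # glucose standards from the text
standards_abs = [0.25, 0.45, 0.65]      # invented absorbance readings
slope, intercept = fit_line(standards_nmol, standards_abs)

sample_nmol = absorbance_to_nmol(0.45, slope, intercept)
print(round(sample_nmol, 1), "nmol;", round(saccharification_rate(sample_nmol), 3), "nmol/mg h")
```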
Total lignin content
Lignin content was quantified using acetyl bromide . Three replicates from each straw sample were used for lignin determination. Four mg of ground samples was weighed into 2 ml tubes and 250 µl freshly prepared acetyl bromide solution (25% v/v acetyl bromide/75% glacial acetic acid) was added before incubating at 50 °C for 2 h, followed by a further 1 h with vortexing every 15 min to solubilize the lignin. Samples were then cooled to room temperature before being transferred to 5 ml volumetric flasks. Subsequently, 1 ml of 2 M NaOH was added, followed by 175 µl freshly prepared 0.5 M hydroxylamine hydrochloride. After shaking, the samples were then made up to 5 ml with glacial acetic acid, and the 280 nm absorbance was read using a Shimadzu UV-1800 spectrophotometer. Lignin content (µg.mg-1 cell wall) was determined using the following formula: (Absorbance ÷ (coefficient × path length)) × ((total volume × 100%) ÷ biomass weight)). The coefficient for grass (17.75) was used for rice .
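The formula above can be written as a short function. In this sketch we assume that the coefficient is an extinction coefficient in L g−1 cm−1 (so that absorbance ÷ (coefficient × path length) gives mg/ml), that the path length is 1 cm, and that the result is expressed as a percentage of biomass weight; these units and the example absorbance are our assumptions and are not stated in the text.

```python
GRASS_COEFFICIENT = 17.75  # coefficient for grasses quoted in the text (assumed L g^-1 cm^-1)


def lignin_percent(absorbance, biomass_mg,
                   coefficient=GRASS_COEFFICIENT,
                   path_length_cm=1.0,       # assumed cuvette path length
                   total_volume_ml=5.0):     # final volume from the protocol
    """Acetyl bromide soluble lignin as a percentage of biomass weight."""
    concentration_mg_per_ml = absorbance / (coefficient * path_length_cm)
    lignin_mg = concentration_mg_per_ml * total_volume_ml
    return lignin_mg * 100.0 / biomass_mg


# Invented absorbance reading for a 4 mg sample, for illustration only
print(round(lignin_percent(absorbance=0.71, biomass_mg=4.0), 2))
```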
The analysis of the raw saccharification and lignin content data took into account sources of non-genetic variation relating to field and laboratory factors. The genotype means used in GWAS are therefore adjusted rather than raw means. All statistical analyses were performed using the R package asreml (https://www.vsni.co.uk/software/asreml-r) in RStudio (https://www.rstudio.com/). To prevent population structure from confounding the GWAS, we removed the japonica subpopulation. The trait file for the indica genotypes used in GWAS is provided as Additional file 8.
The genotypic data were produced by genotyping by sequencing (GBS) assays. A total of 172 rice genotypes were sequenced on an Illumina platform at the Rice Laboratory, Cornell University, USA. The GBS assay involved library construction, sequencing, data analysis, and SNP detection from HapMap, following previously described methods. The GBS analysis pipeline (Tassel Version: 3.0.166, date: April 17, 2014) was applied to analyze the data after sequencing. The GBS report is attached as Additional file 9.
Population stratification using GAPIT
To study stratification of the population, a phylogenetic tree was created using GAPIT (Fig. 2) [37, 38]. The tree was based on the kinship matrix, which captures the degree of genetic relatedness (coefficient of relationship) between individual members of the population. Kinship among genotypes was calculated using an R implementation (www.R-project.org) available as part of the GAPIT software libraries [38, 75]. Using the output distances, clustering was performed in R using the built-in function “hclust” with default parameters.
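To make the kinship step concrete, the sketch below computes a marker-based kinship matrix from genotypes coded as 0, 1 or 2 copies of one allele. It assumes the estimator of VanRaden [75], which is cited alongside GAPIT above; the function name, the toy genotypes and the pure-Python implementation are illustrative and do not reproduce GAPIT's own code.

```python
def vanraden_kinship(genotypes):
    """Kinship matrix K = Z Z' / (2 * sum(p * (1 - p))).

    genotypes: list of individuals, each a list of allele counts (0, 1, 2)
    at the same markers. Z holds the counts centred by 2p, where p is the
    frequency of the counted allele at each marker.
    """
    n_ind = len(genotypes)
    n_mark = len(genotypes[0])
    # Frequency of the counted allele at each marker.
    freqs = [sum(ind[j] for ind in genotypes) / (2.0 * n_ind)
             for j in range(n_mark)]
    # Centre each genotype by its expected value 2p.
    centred = [[ind[j] - 2.0 * freqs[j] for j in range(n_mark)]
               for ind in genotypes]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    return [[sum(a * b for a, b in zip(centred[i], centred[k])) / scale
             for k in range(n_ind)]
            for i in range(n_ind)]


# Toy example: lines 1 and 2 are identical, line 3 carries opposite alleles.
toy = [[0, 1, 2, 0],
       [0, 1, 2, 0],
       [2, 1, 0, 2]]
K = vanraden_kinship(toy)
for row in K:
    print([round(value, 3) for value in row])
```

In the toy example the two identical lines share the same positive kinship value, while the third line has negative kinship with both, which is the kind of contrast a hierarchical clustering of the resulting distances would separate into two groups.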
Mixed linear model (MLM) using TASSEL
GWAS was performed for each trait by merging the genotypic data stored in HapMap format with the phenotypic data: saccharification (sugar released) from the 2013 and 2014 harvests and lignin content (% of total lignin) from the 2014 harvest. The association between each marker and the trait under study was then examined to identify quantitative trait loci (QTL).
GWAS was performed in TASSEL using the compressed mixed linear model approach, which includes both fixed and random effects [37, 76]. This approach is also implemented in Efficient Mixed-Model Association (EMMA) and performs association mapping while simultaneously correcting for relatedness and population structure.
The data were merged and manipulated with TASSEL 3.0. The Q matrix file was created using PSIKO (https://www.uea.ac.uk/computing/pisko) on a Linux platform. The proportion of phenotypic variation explained (PVE) by each marker was estimated from the corresponding R2 in TASSEL [41, 78].
The significance level for association with a SNP in Fig. 7 was based on the false discovery rate (FDR), calculated as FDR = p value × (n/rank), where n is the total number of SNPs and rank is the rank of the SNP when SNPs are ordered by p value. SNPs with FDR < 0.05 were considered significant; the −log10 of the p value of the last significant SNP gives the 5% FDR cutoff.
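The FDR calculation can be sketched in a few lines. The code follows the formula exactly as stated above (p value × n/rank); the function names and the example p values are hypothetical. Note that the standard Benjamini-Hochberg procedure additionally enforces monotonicity of the adjusted values, a step the text does not describe and which is therefore not applied here.

```python
import math


def fdr_values(p_values):
    """FDR = p * (n / rank), with rank 1 assigned to the smallest p value."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    fdr = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        fdr[idx] = p_values[idx] * n / rank
    return fdr


def fdr_cutoff(p_values, alpha=0.05):
    """-log10 of the largest p value whose FDR is below alpha, or None."""
    fdr = fdr_values(p_values)
    significant = [p for p, q in zip(p_values, fdr) if q < alpha]
    if not significant:
        return None
    return -math.log10(max(significant))


# Made-up p values for four SNPs.
example_p = [0.001, 0.04, 0.03, 0.5]
print(fdr_values(example_p))
print(fdr_cutoff(example_p))
```

The returned cutoff corresponds to the horizontal threshold line that would be drawn on a Manhattan plot of −log10(p) values.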
Availability of supporting data
All supporting data are provided with this submission and additional files are detailed below.
Analysis of variance
Superfamily named after the first four members of the family to be biochemically characterized (BEAT: benzylalcohol acetyltransferase, AHCT: anthocyanin hydroxycinnamoyl transferase, HCBT: anthranilate hydroxycinnamoyl/benzoyl transferase, DAT: deacetylvindoline acetyltransferase)
False discovery rate
Genotyping by sequencing
Genome-wide association study
Hydroxycinnamoyl-CoA shikimate/quinate transferase
irx: Irregular xylem mutant
Logarithm of the odds
3-Methyl-2-benzothiazolinone hydrazone
Mixed linear model
Principal component analysis
Pearson correlation coefficient
Phenotypic variance explained
Quantitative trait loci
Recombinant inbred lines
Single nucleotide polymorphism
Simple sequence repeat
Domínguez-Escribá L, Porcar M. Rice straw management: the big waste. Biofuels Bioprod Biorefin. 2010;4(2):154–9.
Chen J, Li C, Ristovski Z, Milic A, Islam SYG, Wang S, et al. A review of biomass burning: emissions and impacts on air quality, health and climate in China. Sci Total Environ. 2017;579:1000–34.
Gomez LD, Bristow JK, Statham ER, McQueen-Mason SJ. Analysis of saccharification in Brachypodium distachyon stems under mild conditions of hydrolysis. Biotechnol Biofuels. 2008;1:1–15.
Fagard M, Höfte H, Vernhettes S. Cell wall mutants. Plant Physiol Biochem. 2000;38(1):15–25.
Vogel J. Unique aspects of the grass cell wall. Curr Opin Plant Biol. 2008;11(3):301–7.
Smith PJ, Wang HT, York WS, Peña MJ, Urbanowicz BR. Designer biomass for next-generation biorefineries: leveraging recent insights into xylan structure and biosynthesis. Biotechnol Biofuels. 2017;10:286.
Schendel RR, Meyer MR, Bunzel M. Quantitative profiling of feruloylated arabinoxylan side-chains from graminaceous cell walls. Front Plant Sci. 2015;6:1249.
Ralph J. Hydroxycinnamates in lignification. Phytochem Rev. 2010;9:65–83.
Carroll A, Somerville C. Cellulosic biofuels. Annu Rev Plant Biol. 2009;60:165–82.
Mitchell RAC, Dupree P, Shewry PR. A novel bioinformatics approach identifies candidate genes for the synthesis and feruloylation of arabinoxylan. Plant Physiol. 2007;144(1):43–53.
Vega-Sánchez ME, Ronald PC. Genetic and biotechnological approaches for biofuel crop improvement. Curr Opin Biotechnol. 2010;21(2):218–24.
Daly P, McClellan C, Maluk M, Oakey H, Lapierre C, Waugh R, et al. RNAi-suppression of barley caffeic acid O-methyltransferase modifies lignin despite redundancy in the gene family. Plant Biotechnol J. 2019;17(3):594–607.
Halpin C. Lignin engineering to improve saccharification and digestibility in grasses. Curr Opin Biotechnol. 2019;56:223–9.
Chiniquy D, Sharma V, Schultink A, Baidoo EE, Rautengarten C, Cheng K, et al. XAX1 from glycosyltransferase family 61 mediates xylosyltransfer to rice xylan. Proc Natl Acad Sci USA. 2012;109(42):17117–22.
Marriott PE, Sibout R, Lapierre C, Fangel JU, Willats WGT, Hofte H, et al. Range of cell-wall alterations enhance saccharification in Brachypodium distachyon mutants. Proc Natl Acad Sci USA. 2014;111(40):14601–6.
Bartley LE, Peck ML, Kim SR, Ebert B, Manisseri C, Chiniquy DM, et al. Overexpression of a BAHD acyltransferase, OsAt10, alters rice cell wall hydroxycinnamic acid content and saccharification. Plant Physiol. 2013;161(4):1615–33.
The Royal Society. Sustainable biofuels: prospects and challenges—a royal society report. London: The Royal Society, European Technology and Innovation Platform; 2008.
Brachi B, Morris GP, Borevitz JO. Genome-wide association studies in plants: the missing heritability is in the field. Genome Biol. 2011;12(10):232.
Flint-Garcia SA, Thornsberry JM, Buckler ES. Structure of linkage disequilibrium in plants. Annu Rev Plant Biol. 2003;54:357–74.
Zhu C, Gore M, Buckler ES, Yu J. Status and prospects of association mapping in plants. Plant Genome. 2008;1(1):5–20.
Truntzler M, Barrière Y, Sawkins MC, Lespinasse D, Betrán J, Charcosset A, et al. Meta-analysis of QTL involved in silage quality of maize and comparison with the position of candidate genes. Theor Appl Genet. 2010;121:1465–82.
Penning BW, Sykes RW, Babcock NC, Dugard CK, Held MA, Klimek JF, et al. Genetic determinants for enzymatic digestion of lignocellulosic biomass are independent of those for lignin abundance in a maize recombinant inbred population. Plant Physiol. 2014;165(4):1475–87.
Liu B, Gómez LD, Hua C, Sun L, Ali I, Huang L, et al. Linkage mapping of stem saccharification digestibility in rice. PLoS ONE. 2016;11(7):e0159117.
Pasam RK, Sharma R, Malosetti M, Eeuwijk FAV, Haseneyer G, Kilian B, et al. Genome-wide association studies for agronomical traits in a world wide spring barley collection. BMC Plant Biol. 2012;12:16.
Somers DJ, Banks T, DePauw R, Fox S, Clarke J, Pozniak C, et al. Genome-wide linkage disequilibrium analysis in bread wheat and durum wheat. Genome. 2007;50(6):557–67.
Alqudah MA, Sallam A, Baenziger PS, Börner A. GWAS: Fast-forwarding gene identification and characterization in temperate Cereals: lessons from Barley—a review. J Adv Res. 2019;22:119–35.
Hästbacka O, Chapelle ADL, Kaitila I, Sistonen P, Weaver A, Lander E. Linkage disequilibrium mapping in isolated founder populations: diastrophic dysplasia in Finland. Nat Genet. 1992;2:204–11.
Huang X, Han B. Natural variations and genome-wide association studies in crop plants. Annu Rev Plant Biol. 2014;65:531–51.
Wang YH, Poudel DD, Hasenstein KH. Identification of SSR markers associated with saccharification yield using pool-based genome-wide association mapping in sorghum. Genome. 2011;54(11):883–9.
Slavov G, Allison G, Bosch M. Advances in the genetic dissection of plant cell walls: tools and resources available in Miscanthus. Front Plant Sci. 2013;4:217.
Wang Z, Qiang H, Zhao H, Xu R, Zhang Z, Gao H, et al. Association mapping for fiber-related traits and digestibility in alfalfa (Medicago sativa). Front Plant Sci. 2016;7:331.
Allwright MR, Payne A, Emiliani G, Milner S, Viger M, Rouse F, et al. Biomass traits and candidate genes for bioenergy revealed through association genetics in coppiced European Populus nigra (L.). Biotechnol Biofuels. 2016;9:195.
Oakey H, Shafiei R, Comadran J, Uzrek N, Cullis B, Gomez LD, et al. Identification of crop cultivars with consistently high lignocellulosic sugar release requires the use of appropriate statistical design and modelling. Biotechnol Biofuels. 2013;6:185.
Huang X, Wei X, Sang T, Zhao Q, Feng Q, Zhao Y, et al. Genome-wide association studies of 14 agronomic traits in rice landraces. Nat Genet. 2010;42(11):961–7.
Yonemaru J, Ebana K, Yano M. HapRice, an SNP haplotype database and a web tool for rice. Plant Cell Physiol. 2014;55(1):e9.
McNally KL, Childs KL, Bohnert R, Davidson RM, Zhao K, Victor J, Ulata GZ, et al. Genomewide SNP variation reveals relationships among landraces and modern varieties of rice. PNAS. 2009;106(30):12273–8.
Zhang Z, Ersoz E, Lai CQ, Todhunter RJ, Tiwari HK, Gore MA, et al. Mixed linear model approach adapted for genome-wide association studies. Nat Genet. 2010;42:355–60.
Lipka AE, Tian F, Wang Q, Peiffer J, Li M, Bradbury PJ, et al. GAPIT: genome association and prediction integrated tool. Bioinformatics. 2012;28(18):2397–9.
Gomez LD, Whitehead C, Barakate A, Halpin C, McQueen-Mason SJ. Automated saccharification assay for determination of digestibility in plant materials. Biotechnol Biofuels. 2010;3:23.
Johnson DB, Moore WE, Zank LC. The spectrophotometric determination of lignin in small wood samples. Tappi. 1961;44(11):793–8.
Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23(19):2633–5.
Mather KA, Caicedo AL, Polato NR, Olsen KM, McCouch S, Purugganan MD. The extent of linkage disequilibrium in rice (Oryza sativa L.). Genetics. 2007;177(4):2223–32.
Piston F, Uauy C, Dubcovsky J. Down-regulation of four putative arabinoxylan feruloyl transferase genes from family PF02458 reduces ester-linked ferulate content in rice cell walls. Planta. 2010;231(3):677–91.
Chiniquy D, Varanasi P, Ronald PC. Three novel rice genes closely related to the Arabidopsis IRX9, IRX9L, and IRX14 genes and their roles in xylan biosynthesis. Front Plant Sci. 2013;4:83.
Hirano K, Kondo M, Aya K, Miyao A, Sato Y, Antonio BA, et al. Identification of transcription factors involved in rice secondary cell wall formation. Plant Cell Physiol. 2013;54(11):1791–802.
Noda S, Koshiba T, Hattori T, Yamaguchi M, Suzuki S, Umezawa T. The expression of a rice secondary wall-specific cellulose synthase gene, OsCesA7, is directly regulated by a rice transcription factor, OsMYB58/63. Planta. 2015;242(3):589–600.
Chen F, Dixon RA. Lignin modification improves fermentable sugar yields for biofuel production. Nat Biotechnol. 2007;25:759–61.
Marie B, Monties B, Montagu MV, Boerjan W. Biosynthesis and Genetic Engineering of Lignin. Crit Rev Plant Sci. 1998;17(2):125–97.
Shmulsky R, Jones PD. Forest products and wood science an introduction. 6th ed. Chichester: Wiley-Blackwell; 2011.
Rowell RM, Pettersen R, Tshabalala MA. Cell wall chemistry. In: Rowell RM, editor. Handbook of wood chemistry and wood composites. Boca Raton, Taylor and Francis Group: CRC Press; 2012. p. 33–72.
Abramson M, Shoseyov O, Hirsch S, Shani Z. Genetic modifications of plant cell walls to increase biomass and bioethanol production. In: Lee JW, editor. Advanced biofuels and bioproducts. New York: Springer Science+Business Media; 2012. p. 315–338.
Gomez LD, Whitehead C, Roberts P, McQueen-Mason SJ. High-throughput Saccharification assay for lignocellulosic materials. J Vis Exp. 2011;53:3240.
Vergara BS, Chang TT. The flowering response of the rice plant to photoperiod: a review of the literature. 4th ed. Los Banos: International Rice Research Institute; 1985.
Krishnan P, Ramakrishnan B, Reddy KR, Reddy VR. High-temperature effects on rice growth, yield, and grain quality. Adv Agron. 2011;111:87–206.
Tang HM, Liu S, Hill-Skinner S, Wu W, Reed D, Yeh C, et al. The maize brown midrib2 (bm2) gene encodes a methylenetetrahydrofolate reductase that contributes to lignin accumulation. Plant J. 2013;77(3):380–92.
Nordgen. nordgen.org/. https://www.nordgen.org/bgs/system/export_pdf.php?bgs=254. Accessed 31 Jan 2020
Zhang K, Qian Q, Huang Z, Wang Y, Li M, Hong L, et al. GOLD HULL AND INTERNODE2 encodes a primarily multifunctional cinnamyl-alcohol dehydrogenase in rice. Plant Physiol. 2006;140(3):972–83.
Barrière Y, Chavigneau H, Delaunay S, Courtial A, Bosio M, Lassagne H, et al. Different mutations in the ZmCAD2 gene underlie the maize brown-midrib1 (bm1) phenotype with similar effects on lignin characteristics and have potential interest for bioenergy production. Maydica. 2013;58(1):6–20.
Chen Y, Liu H, Ali F, Scott MP, Ji Q, Frei UK, Lübberstedt T. Genetic and physical fine mapping of the novel brown midrib gene bm6 in maize (Zea mays L.) to a 180 kb region on chromosome 2. Theor Appl Genet. 2012;125:1223–35.
Stephens J, Halpin C. Barley ‘orange lemma’ is a mutant in the CAD gene. 2008. unpublished poster.
Koshiba T, Murakami S, Hattori T, Mukai M, Takahashi A, Miyao A, et al. CAD2 deficiency causes both brown midrib and gold hull and internode phenotypes in Oryza sativa L. cv. Nipponbare. Plant Biotechnol. 2013;30(4):365–73.
Parijs FRDV, Ruttink T, Haesaert G, Roldán-Ruiz I, Muylle H. Association mapping of LpCCR1 with lignin content and cell wall digestibility of perennial ryegrass. In: Roldán-Ruiz I, Baert J, Reheul D, editors. Breeding in a world of scarcity. Berlin: Springer International Publishing; 2016. p. 219–224.
Hatfield RD, Grabber J, Ralph J, Brei K. Using the acetyl bromide assay to determine lignin concentrations in herbaceous plants: some cautionary notes. J Agric Food Chem. 1999;47(2):628–32.
Bunzel M, Schüßler A, Saha GT. Chemical characterization of Klason lignin preparations from plant-based foods. J Agric Food Chem. 2011;59(23):12506–13.
Suzuki S, Suzuki Y, Yamamoto N, Hattori T, Sakamoto M, Umezawa T. High-throughput determination of thioglycolic acid lignin from rice. Plant Biotechnol. 2009;26(3):337–40.
Blee KA, Choi JW, O'Connell AP, Schuch W, Lewis NG, Bolwell GP. A lignin-specific peroxidase in tobacco whose antisense suppression leads to vascular tissue modification. Phytochemistry. 2003;64(1):163–76.
Berthet S, Demont-Caulet N, Pollet B, Bidzinski P, Cézard L, Bris PL, et al. Disruption of LACCASE4 and 17 results in tissue-specific alterations to lignification of Arabidopsis thaliana stems. Plant Cell. 2011;23(3):1124–37.
Zhao Q, Nakashima J, Chen F, Yin Y, Fu C, Yun J, et al. Laccase is necessary and nonredundant with peroxidase for lignin polymerization during vascular development in Arabidopsis. Plant Cell. 2013;25(10):3976–87.
Binod P, Sindhu R, Singhania RR, Vikram S, Devi L. Bioethanol production from rice straw: An overview. Biores Technol. 2010;101:4767–74.
Workman D. World's Top exports; 2016. http://www.worldstopexports.com/rice-exports-country/. Accessed 30 May 2016.
Yuan Q, Quackenbush J, Sultana R, Pertea M, Salzberg SL, Buell CR. Rice bioinformatics analysis of rice sequence data and leveraging the data to other plant species. Plant Physiol. 2001;125(3):1166–74.
Fukushima RS, Hatfield R. Comparison of the acetyl bromide spectrophotometric method with other analytical lignin methods for determining lignin concentration in forage samples. J Agric Food Chem. 2004;52(12):3713–20.
Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE. 2011;6(5):e19379.
Glaubitz JC, Casstevens TM, Lu F, Harriman J, Elshire RJ, Sun Q, et al. TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PLoS ONE. 2014;9(2):e90346.
VanRaden PM. Efficient Methods to Compute Genomic Predictions. J Dairy Sci. 2008;91(11):4414–23.
Yu J, Pressoir G, Briggs WH, Bi IV, Yamasaki M, Doebley JF, et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 2006;38:203–8.
Kang HM, Zaitlen NA, Wade CM, Kirby A, Heckerman D, Daly MJ, et al. Efficient control of population structure in model organism association mapping. Genetics. 2008;178(3):1709–23.
Sun G, Zhu C, Kramer MH, Yang SS, Song W, Piepho HP, et al. Variation explained in mixed-model association mapping. Heredity. 2010;105:333–40.
Guo K, Zou W, Feng Y, Zhang M, Zhang J, Tu F, et al. An integrated genomic and metabolomic framework for cell wall biology in rice. BMC Genom. 2014;15(1):596.
Katiyar A, Smita S, Lenka SK, Rajwanshi R, Chinnusamy V, Bansal KC. Genome-wide classification and expression analysis of MYB transcription factor families in rice and Arabidopsis. BMC Genom. 2012;13:544.
Handakumbura P. Understanding the transcriptional regulation of secondary cell wall biosynthesis in the model grass Brachypodium distachyon. Massachusetts, US: University of Massachusetts - Amherst; 2014.
Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P. The plant cell wall. In: Molecular biology of the cell. 4th ed. New York: Garland Science; 2002.
Hazen SP, Scott-Craig JS, Walton JD. Cellulose synthase-like genes of rice. Plant Physiol. 2002;128(2):336–40.
The authors acknowledge Dr. Francisco José Ostos Garrido from Instituto Agricultura Sostenible–CSIC, who provided guidance on analyzing the raw phenotypic data in R. We are grateful to Dr. Pete Hedley from The James Hutton Institute, and to Prof. Susan McCouch and Dr. Namrata Singh at Cornell University, who helped to prepare and progress the GBS assay. We also acknowledge Dr. Zhesi He at CNAP, University of York, who helped to create the Q matrix file used for the GWAS analysis. We would like to thank Dr. Swen Langer, who helped to run the ferulic and p-coumaric acid assays at CNAP.
This research project was funded by the Biotechnology and Biological Sciences Research Council (BBSRC) (Grant numbers BB/P022499/1 and BB/N0136689/1) and the Ministry of Science and Technology (MOST) in Vietnam.
Ethics approval and consent to participate
Consent for publication
The authors give their consent for the publication of the manuscript and all supporting documents and data.
The authors declare that they have no competing interests.
Additional file 1: Q-Q plots of GWAS
Additional file 2: List of genes located in the regions of digestibility QTL
Additional file 3: List of genes located in the regions of lignin QTL
Additional file 4: Detailed calculation of the estimated gains from using a marker in breeding
Additional file 5: List of rice line genotypes from the GWAS population
Additional file 6: Agronomic trait data of sequenced genotypes
Additional file 7: Field trial GPS coordinates
Additional file 8: Data for the studied traits
Additional file 9: Report of Genotyping by Sequencing (GBS) – Reference Pipeline
Cite this article
Nguyen, D.T., Gomez, L.D., Harper, A. et al. Association mapping identifies quantitative trait loci (QTL) for digestibility in rice straw. Biotechnol Biofuels 13, 165 (2020). https://doi.org/10.1186/s13068-020-01807-8
- Rice (Oryza sativa)