Interest in the detailed lignin and polysaccharide composition of plant cell walls has surged within the past decade, partly as a result of biotechnology research aimed at converting biomass to biofuels [1, 2]. Numerous studies have established a link between the relative amounts of lignin and cellulose in vascular tissues and the accessibility of plant cell walls to chemical, enzymatic, and microbial digestion [2–4]. Comparisons of different species, and transgenic studies in which the synthesis of cell wall components is genetically modified [3, 4, 6], are particularly useful in identifying these links.
High-resolution, solution-state 2D 1H–13C HSQC NMR spectroscopy has proven to be an effective tool for rapid and reproducible fingerprinting of the numerous polysaccharide and lignin components in unfractionated plant cell wall materials [7–11]. Recent advances in preparing ball-milled samples dissolved or swollen in organic solvents have enabled unfractionated material to be profiled without the need to isolate individual components [12, 13]. Ball-milled cell wall material is heterogeneous and highly polymeric, although its polymers have a significantly lower degree of polymerization (DP) than in the intact cell wall (where the DP of cellulose is ~7000–15000). As a result, its spectra have broad linewidths and considerable complexity. However, the dispersion provided by two-dimensional correlation of protons with their directly attached 13C nuclei, at natural abundance, enables resolution and assignment of numerous lignin, cellulose, and hemicellulose components. The 2D 1H–13C HSQC experiment is thus a powerful tool for cell wall profiling, because numerous components can be identified and comparatively quantified simultaneously in spectra acquired in relatively short times (15–20 min per sample, or up to 5 h when excellent signal-to-noise and detection of minor components are required).
As sample preparation and data acquisition methods have improved [10, 11], spectral analysis has become the bottleneck in large studies. NMR-based chemometrics is one data analysis approach recently applied to investigate structural and compositional differences between wood samples from Populus. Chemometrics is a multivariate approach with an extensive history in metabonomics [15, 16]. A multivariate approach simultaneously examines features from different sample groups. Its general strengths include the ability to detect subtle patterns among features across sample groups (although artifacts can sometimes confound these patterns) and the ability to assess the relative importance of each feature for group discrimination.
NMR-based chemometrics involves a sequence of steps: i) NMR data processing, including baseline correction if necessary; ii) generation of a feature set, usually by selecting intensity values at each peak or by summing over segmented regions (spectral binning); iii) production of a data table in which each sample is a row and the features are columns; iv) normalization (row-based) and scaling (column-based) of the data; and v) multivariate statistical modeling. The greatest pitfalls lie in feature selection (step ii). Spectral binning was originally developed as a rapid and consistent way to generate data sets automatically and to handle peak "drift". Unfortunately, it reduces spectral resolution and can generate artifacts in crowded spectra, where a bin boundary may fall at the center of a signal. Even when the full-resolution spectrum is used without binning, 2D data are commonly analyzed by unfolding the 2D grid into a 1D row vector. This loses the correlation between the 1H and 13C dimensions during analysis. The information can be retained, however, by indexing the 1D data so that 2D spectra can be recreated afterward, for example from the results of principal component analysis.
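The two feature-selection issues just described, a bin boundary falling on a signal and the index bookkeeping needed to refold unfolded 2D data, can be illustrated with a minimal NumPy sketch. The helper names (`lorentzian`, `bin_spectrum`) are hypothetical and the spectra are synthetic. Binning here is the simplest uniform, point-count scheme. Real pipelines typically bin in ppm with their own bin widths.

```python
import numpy as np

def lorentzian(n_points, center, hwhh):
    """Synthetic absorption-mode Lorentzian (unit height) sampled at integer points."""
    x = np.arange(n_points, dtype=float)
    return hwhh**2 / ((x - center) ** 2 + hwhh**2)

def bin_spectrum(y, points_per_bin):
    """Uniform spectral binning: sum consecutive, non-overlapping blocks of points."""
    n_bins = len(y) // points_per_bin
    return y[: n_bins * points_per_bin].reshape(n_bins, points_per_bin).sum(axis=1)

# The same peak in two hypothetical samples, drifted by 2 points and straddling the
# boundary between bin 4 (points 400-499) and bin 5 (points 500-599).
a = bin_spectrum(lorentzian(1000, 499.0, 5.0), 100)
b = bin_spectrum(lorentzian(1000, 501.0, 5.0), 100)
print(a[4:6].round(2), b[4:6].round(2))  # the larger share flips from bin 4 to bin 5

# Unfolding a 2D (13C x 1H) grid into a 1D row vector while keeping an index map,
# so that per-feature results (e.g. PCA loadings) can be folded back into 2D.
grid = np.arange(12.0).reshape(3, 4)      # toy grid: 3 13C rows x 4 1H columns
row_vector = grid.ravel()                 # C order: the 1H index varies fastest
c13_idx, h1_idx = np.unravel_index(np.arange(row_vector.size), grid.shape)
refolded = np.zeros_like(grid)
refolded[c13_idx, h1_idx] = row_vector    # exact 2D reconstruction from the index map
```

The total intensity of the two synthetic samples is essentially identical. However, the small drift moves a substantial share of the signal from bin 4 to bin 5, which a multivariate model would interpret as changes in two separate features.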
An alternative to peak-based or bin-based feature selection is to model the data mathematically and use the model parameters as features for subsequent analysis. If the model efficiently represents the relevant features of the data, the modeling step dramatically reduces the number of columns in the data matrix (data reduction) without losing relevant information or generating artifacts. Recently, spectral deconvolution using fast maximum-likelihood reconstruction (FMLR) was shown to accurately quantify metabolites in 2D 1H–13C HSQC spectra [17, 18]. FMLR constructs the simplest time-domain model (i.e., the model with the fewest signals and parameters) whose frequency spectrum matches the visible regions of the spectrum obtained by identical Fourier processing of the data [19, 20].
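The data-reduction idea can be made concrete with a deliberately simplified sketch. This is not FMLR. The frequencies and decay rates of three signals are assumed to be known, and only their complex amplitudes are estimated, by linear least squares on a synthetic 1D free induction decay. Under white Gaussian noise, least squares coincides with maximum likelihood. FMLR also estimates the signal parameters themselves, chooses the number of signals, and matches the model to the Fourier-processed spectrum. All function names and numerical values below are hypothetical.

```python
import numpy as np

def fid_basis(n_points, dwell, freqs_hz, decay_hz):
    """Columns are unit-amplitude, exponentially damped complex sinusoids."""
    t = np.arange(n_points) * dwell
    f = np.asarray(freqs_hz)[None, :]
    d = np.asarray(decay_hz)[None, :]
    return np.exp((2j * np.pi * f - d) * t[:, None])

def fit_amplitudes(fid, basis):
    """Least-squares complex amplitudes (the ML estimate under white Gaussian noise)."""
    amps, *_ = np.linalg.lstsq(basis, fid, rcond=None)
    return amps

rng = np.random.default_rng(0)
n, dwell = 2048, 1e-3                          # 2048 complex points, 1 ms dwell time
freqs, decays = [-120.0, 35.0, 210.0], [8.0, 12.0, 10.0]   # assumed known (Hz, 1/s)
true_amps = np.array([1.0, 0.5, 2.0])
basis = fid_basis(n, dwell, freqs, decays)
noise = 0.05 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
fid = basis @ true_amps + noise                # synthetic time-domain data
amps = fit_amplitudes(fid, basis)              # 2048 points reduced to 3 features
```

Here 2048 complex data points are summarized by three amplitudes, which is the kind of column reduction described above. The fitted amplitudes recover the simulated values to within the noise.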
Spectral analysis of 2D 1H–13C HSQC NMR data by FMLR would appear to be an attractive approach for high-throughput plant cell wall profiling in the following respects:
FMLR has already been shown to accurately model the characteristics of complex 2D 1H–13C HSQC solution spectra, and it can be performed with minimal input information and operator intervention (moderately high throughput).
Because of the high spectral dispersion inherent in 2D 1H–13C NMR data, the detailed but localized amplitude and frequency information derived from FMLR should be easy to combine with tables of assigned regions of interest (ROIs). This combination yields the relative concentrations of cell wall components in each sample (cell wall component profiles). Previous work has shown the utility of ROI segmentation in quantitative 2D 1H–13C NMR studies [21, 22].
ROIs that correspond to a resolved peak or peak cluster can be defined even when the NMR assignment is tentative or unknown. The cell wall component profiles are thus suitable for both untargeted and targeted profiling.
Simple visual inspection of the cell wall component profiles might suffice to identify patterns of enrichment and depletion of various components between sample groups.
The cell wall component profiles are also a robust feature set for input into multivariate analysis.
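A NumPy sketch can show ROI-based profiling together with the downstream normalization, scaling, and principal component analysis (steps iv and v of the chemometrics workflow). The ROI boxes, sample names, and peak lists below are hypothetical placeholders. They stand in for an assigned ROI table and for per-sample peak lists (1H shift, 13C shift, amplitude) produced by a deconvolution such as FMLR. Pareto scaling and SVD-based PCA are common chemometric choices, not necessarily the ones used in this study.

```python
import numpy as np

# Hypothetical ROI table: (1H min, 1H max, 13C min, 13C max) in ppm, one box per ROI.
ROIS = [(6.5, 6.9, 102.0, 106.0), (6.9, 7.1, 109.0, 113.0), (7.1, 7.4, 126.0, 130.0)]

def component_profile(peaks, rois):
    """Sum fitted amplitudes of peaks whose (1H, 13C) centers fall inside each ROI."""
    profile = np.zeros(len(rois))
    for h, c, amp in peaks:
        for j, (h_lo, h_hi, c_lo, c_hi) in enumerate(rois):
            if h_lo <= h < h_hi and c_lo <= c < c_hi:
                profile[j] += amp
                break  # each peak counts toward at most one ROI
    return profile  # peaks outside every ROI are ignored

def pca(table, n_components=2):
    """Row-normalize to unit total, mean-center, Pareto-scale columns, PCA via SVD."""
    x = table / table.sum(axis=1, keepdims=True)   # normalization (row-based)
    x = x - x.mean(axis=0)
    # Pareto scaling (column-based); zero-variance columns would need removing first.
    x = x / np.sqrt(x.std(axis=0, ddof=1))
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components], vt[:n_components]  # scores, loadings

# Hypothetical per-sample peak lists: (1H ppm, 13C ppm, fitted amplitude).
samples = {
    "control_1": [(6.72, 103.9, 4.0), (6.80, 104.4, 2.0), (7.00, 111.0, 3.0), (7.25, 128.0, 1.0)],
    "control_2": [(6.71, 103.8, 5.5), (6.98, 110.8, 3.5), (7.22, 127.7, 1.0)],
    "mutant_1": [(6.73, 104.0, 2.0), (7.01, 111.1, 3.0), (7.24, 128.1, 5.0), (5.10, 99.5, 0.7)],
    "mutant_2": [(6.72, 103.9, 2.5), (7.00, 111.0, 2.5), (7.23, 127.9, 3.0), (7.30, 128.5, 2.0)],
}
table = np.array([component_profile(p, ROIS) for p in samples.values()])  # rows = samples
profiles = table / table.sum(axis=1, keepdims=True)   # relative amount per ROI

# Enrichment/depletion per ROI: log2 ratio of mutant to control mean relative amounts.
log2_ratio = np.log2(profiles[2:].mean(axis=0) / profiles[:2].mean(axis=0))
scores, loadings = pca(table)
```

In this toy table, the log2 ratios show depletion of the first ROI and enrichment of the third in the mutant group, which is the kind of pattern that visual inspection of the profiles could reveal. The first principal component separates the two groups, and the loadings indicate which ROIs drive that separation.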
Here we apply the spectral analysis methodology of FMLR with ROI-based segmentation to a large 2D 1H–13C NMR study (98 samples) of Arabidopsis lignin mutants and controls comprising 20 sample groups (10 consolidated groups). Our focus is not on the biological conclusions drawn from the study (published concomitantly), but on the methodology and software implementation of data analysis for powerful cell wall profiling by NMR.