### Static model

The construction of the compartmental model and methods for estimating the steady-state flux distribution for *Brachypodium* have been discussed elsewhere [13]. To estimate the cinnamic acid (CA) and *p*-coumaric acid (*p*CA) input that is actually taken up by the plants from the medium and enters the lignin pathway, we used the steady-state model and recorded all input values of CA and *p*CA that led to lignin and wall-bound phenolic pro**file**s matching the experimental data. These data came from labeling experiments in wild-type lines and include labeled and unlabeled lignin monomer profiles, as well as the label incorporation level in wall-bound *p*-coumaric acid and ferulic acid [27]. We used the estimated flux distribution, as well as CA and *p*CA input values to parameterize the dynamic model.

### Construction of the dynamic model

#### Model format

We designed our dynamic model as a system of ordinary differential equations (ODE), taking the metabolites as the states of the system. The enzymatic fluxes are modeled under the tenets of biological systems theory (BST) in generalized mass action (GMA) format [16,17,18,19]. These base methods are explained in detail in our previous work on switchgrass [11]. The compartmental model proposed here additionally includes diffusion fluxes, which transfer mass between the compartments (Fig. 2). These net diffusion fluxes are modeled in the format

$$D_{j} \, = \,d_{j} \left( {X_{i} \, - \,Y_{i} } \right),$$

(1)

where \(X_{i}\) and \(Y_{i}\) are corresponding pools of the same metabolite in the cytosol and outer ER surface compartments, respectively, and \(d_{j}\) is the rate constant of diffusion.

Due to the lack of specific literature on metabolite profiles, and similar to our approach for switchgrass, the concentrations are normalized with respect to the wild-type steady-state value, and the base value is set to 100:

$$\begin{aligned} X_{i} \, = \,\frac{100}{{Z_{{i , {\text{SS}}}} }} \cdot Z_{{i , {\text{cyt}}}} , \hfill \\ Y_{i} \, = \,\frac{100}{{Z_{{i , {\text{SS}}}} }} \cdot Z_{{i , {\text{ER}}}} , \hfill \\ \end{aligned}$$

(2)

where \(Z_{{i , {\text{SS}}}}\) is the nominal total concentration of metabolite \(i\) at steady state, and \(Z_{{i , {\text{cyt}}}}\) and \(Z_{{i,{\text{ER}}}}\) are local concentrations of metabolite \(i\) in the respective compartments. Expressed differently, \(X_{\text{i}}\) and \(Y_{\text{i}}\) are the local, normalized concentrations of metabolite \(i\) in each compartment. At the steady state, the total concentration of each metabolite is equal to 100,

$$r \cdot X_{i} \, + \,\left( {1\, - \,r} \right) \cdot Y_{i} \, = \,100 ,$$

(3)

where \(r\) is the portion of the cytosol compartment volume and \(1\, - \,r\) is the portion of the ER compartment volume with respect to total volume. Hence, \({r \mathord{\left/ {\vphantom {r {\left( {1\, - \,r} \right)}}} \right. \kern-0pt} {\left( {1\, - \,r} \right)}}\) is the ratio of volumes earlier introduced as \(R\). We set \(r\) equal to 0.9, which corresponds to the cytosol compartment accounting for 90% and the ER outer surface compartment to 10% of the cell volume within which the lignin pathway is active.

#### Equations for labeled and unlabeled substrates

We extended the dynamic base model to a system accounting for labeled and unlabeled metabolites, which doubled the number of ODEs. Each enzymatic flux \(V_{j}\) is calculated using the total metabolite pool whether labeled or unlabeled; in other words, the enzymes are assumed to be blind to labels:

$$V_{j} \, = \,a_{j} \prod\limits_{i = 1}^{n} {\left( {X_{i,L} \, + \,X_{i,UL} } \right)^{{g_{i,j} }} } \prod\limits_{i = n + 1}^{n + m} {X_{i}^{{h_{i,j} }} } .$$

(4)

where \(a_{j}\) is the rate constant, \(X_{i,L}\) and \(X_{i,UL}\) (\(1 < i < n\)) are labeled and unlabeled metabolites, respectively, and \(g_{i,j}\)’s are kinetic orders. For \(n\, + \,1\, < \,i\, < \,n\, + \,m\), each variable \(X_{i}\) represents the amount of an enzyme involved in the reaction. It is customary to assign enzyme kinetic orders, \(h_{i,j}\), equal to 1 if the enzyme is involved in the reaction and 0 if it is not. \(X_{i}\) is replaced by \(Y_{i}\) if the reaction is taking place in the ER compartment. The labeled and unlabeled flux portions are calculated based on how rich in labels the substrate pool of the flux is (Eq. 5).

$$\begin{aligned} V_{j,L} \, = \,\frac{{X_{s,L} }}{{X_{s} }} \cdot V_{j} , \hfill \\ V_{j,UL} \, = \,\frac{{X_{s,UL} }}{{X_{s} }} \cdot V_{j} , \hfill \\ \end{aligned}$$

(5)

where \(X_{s}\) is the substrate of the reaction \(V_{j}\). To compute the rate of change in each metabolite, the labeled fluxes are used to establish the ODEs for labeled metabolites, and the unlabeled fluxes are used to formulate the differential equations of the unlabeled metabolites. As an example, the equations for labeled and unlabeled *p*-coumaroyl-CoA in the ER compartment are written as

$$\begin{aligned} & \frac{{dY_{5,L} }}{dt}\, = \,\frac{{Y_{4,L} }}{{Y_{4} }} \cdot V_{25} \, - \,\frac{{Y_{5,L} }}{{Y_{5} }} \cdot V_{26} \, - \,\frac{{Y_{5,L} }}{{Y_{5} }} \cdot V_{31} \, - \,\frac{r}{1\, - \,r}d_{9} \left( {Y_{5,L} \, - \,X_{5,L} } \right), \\ & \frac{{dY_{5,UL} }}{dt}\, = \,\frac{{Y_{4,UL} }}{{Y_{4} }} \cdot V_{25} \, - \,\frac{{Y_{5,UL} }}{{Y_{5} }} \cdot V_{26} \, - \,\frac{{Y_{5,UL} }}{{Y_{5} }} \cdot V_{31} \, - \,\frac{r}{1 - r}d_{9} \left( {Y_{5,UL} \, - \,X_{5,UL} } \right). \\ \end{aligned}$$

(6)

#### Unknowns of the system and parameterization

The values of the kinetic orders, \(g_{i,j}\), rate constants,\(a_{j}\), \(d_{j}\), and steady-state values of the parallel metabolite pools, \(X_{{i,{\text{SS}}}}\) and \(Y_{{i,{\text{SS}}}}\), are unknown and need to be estimated. To estimate the unknowns, we used Monte Carlo sampling, generating a random set sampled from the space ℝ^{p}, and within the biologically reasonable ranges, where *p* is the number of unknowns.

Most of the metabolite steady-state values can be algebraically computed as follows. We can rewrite Eq. 1 for the labeled portion of diffusion flux as

$$D_{j,L} \, = \,d_{j} \left( {L_{{X_{i} }} \cdot X_{i} \, - \,L_{{Y_{i} }} \cdot Y_{i} } \right),$$

(7)

where \(L_{{X_{i} }}\) and \(L_{{Y_{i} }}\) are label incorporation levels in \(X_{i}\) and \(Y_{i}\) pools, which are known from the labeled flux distributions [13]. From Eqs. 1, 3 and 7 together, the three unknowns \(d_{j}\),\(X_{i}\) and \(Y_{i}\) can be algebraically computed for each diffusion flux. Due to the stoichiometry of the pathway, four of the metabolites, namely cinnamic acid (\(X_{2}\),\(Y_{2}\)), caffeoyl shikimate (\(X_{9}\), \(Y_{9}\)), 5-OH-coniferaldehyde (\(X_{16}\), \(Y_{16}\)), and 5-OH-coniferyl alcohol (\(X_{17}\), \(Y_{17}\)), have the same level of label incorporation in both of their cytosol and ER compartments. That leads to a degenerate case where Eqs. 1 and 7 are linearly dependent (\(L_{{X_{i} }} \, = \,L_{{Y_{i} }}\)), and steady-state values cannot be computed algebraically. Hence, the steady-state values of these metabolite pools, \(\left( {X_{{2,{\text{SS}}}} ,Y_{{2,{\text{SS}}}} } \right)\), \(\left( {X_{{9,{\text{SS}}}} ,Y_{{9,{\text{SS}}}} } \right)\), \(\left( {X_{{16,{\text{SS}}}} ,Y_{{16,{\text{SS}}}} } \right)\), \(\left( {X_{{17,{\text{SS}}}} ,Y_{{17,{\text{SS}}}} } \right)\), need to be estimated. Since the total concentrations of each pair of metabolites (\(r \cdot X_{i} \, + \,\left( {1\, - \,r} \right) \cdot Y_{i}\)) are equal to 100 at the steady state, it suffices to estimate either \(X_{{i,{\text{SS}}}}\) or \(Y_{{i,{\text{SS}}}}\) for parallel metabolite pools, and use Eq. 3 to compute the other. Given the steady-state flux distribution, the diffusion rate constants, \(d_{j}\), are then computed using Eq. 1. To sample the steady-state metabolite values, we considered the direction of diffusion between \(X_{{i,{\text{SS}}}}\) and \(Y_{{i,{\text{SS}}}}\). As Eq. 3 shows, the weighted average of \(X_{{i,{\text{SS}}}}\) and \(Y_{{i,{\text{SS}}}}\) is equal to 100 at the steady state, which implies that one of the pools is greater than or equal to 100, and the other smaller than or equal to 100. The direction of diffusion determines the pool smaller than 100 since the diffusion flux pours into that pool at the steady state. This way, we obtain an interval for sampling bounded between zero and 100. Therefore, we sampled the array

$$\left[ {Y_{{2,{\text{SS}}}} ,X_{{9,{\text{SS}}}} ,X_{{16,{\text{SS}}}} ,X_{{17,{\text{SS}}}} } \right]$$

(8)

within [0,100] and computed the corresponding parallel pools, \(\left[ {X_{{2,{\text{SS}}}} ,Y_{{9,{\text{SS}}}} ,Y_{{16,{\text{SS}}}} ,Y_{{17,{\text{SS}}}} } \right]\), using Eq. 3. In fact, the pools of metabolites in Eq. 8 are the pools into which the diffusion fluxes pour at the steady state (Fig. 2).

Kinetic orders *g*_{i,j} were sampled from the interval [0,1] if metabolite *i* was a substrate or activator of flux *j* (except for effluxes *E*_{1}*, E*_{2}*, E*_{3} and *E*_{4} for which we used the interval [0, 4]), and [− 4,0] if they acted as inhibitors of flux *j* [28]. Using the steady-state flux and metabolite values and the randomly generated kinetic orders, *g*_{i,j}, the rate constants, *a*_{j}, of the enzymatic reactions are then computed via Eq. 4.

Due to the high number of unknown parameters, dilution experiments were parameterized one at a time. The thus obtained working parameters from one experiment were used to regenerate new parameters in the reduced parameter space, employing random Monte Carlo sampling. Therefore, hundreds of thousands of parameter combinations were simulated to satisfy all model criteria.

Specifically, we used a variation of an explore-and-exploit algorithm [29,30,31], starting with an initial, randomly generated parameter set. This algorithm is designed to target those areas of the parameter space that are in the vicinity of an admissible solution once such a solution is found; it exploits these neighborhoods rather than randomly exploring the parameter space at large. In other words, the algorithm benefits from the search history and performs a more effective optimization within promising, localized domains of the parameter space. Details of our implementation of the algorithm are discussed in Additional files 1 and 2.

For the dynamic model, the criteria of admissibility were the same as the criteria in the static model. Thus, parameter values were deemed admissible if they met the model criteria presented in “Results” section. Namely, they had to yield a good representation of the observed lignin monomer composition and total lignin amount, and match the label incorporation profiles in lignin monomers, as well as in wall-bound *p*-coumaric acid and ferulic acid to a satisfactory degree.

### Dynamic model validation

#### Enzyme activity estimation from changes in gene expression

Ideally, it would be possible to determine the concentrations and activities of all enzymes involved in the pathway. However, in situ measurements of protein concentrations of the magnitudes available in lignin biosynthesis are extremely difficult to obtain, and as of yet no reliable information is available. It, therefore, appears that reliable and rather precise transcripts of genes coding for the enzymes of interest are our best option.

It is known that changes in gene expression are not necessarily translated one-to-one measure into the corresponding enzyme activity, and it is in fact likely that a *p*% change in transcript results in a change in enzyme activity that is much lower than *p*%. To account for this uncertainty, we used the measured transcript profile of *BdPTAL* transgenics as an upper bound for the corresponding enzyme activity levels. Thus, we randomly generated 20,000 altered enzyme activity profiles bounded between the wild type and *BdPTAL* transcript level, simulated all these profiles with the dynamic model, and retained those close enough to the *BdPTAL* observations of lignin amounts and labeling levels. Figure 8 shows the altered enzyme activity profiles that yielded admissible lignin composition profiles. In addition to the enzyme transcripts, the right panel depicts the changes in carbon influx relative to wild type in the admissible solutions. Although the enzyme transcripts of the upstream pathway are not measured in this study, we included this extra measure in our model to examine the necessity of an orchestrated decrease in carbon influx into the pathway by the plant. The results indicate that the median of the change is at 90%, which leads to the conclusion that the influx is not strongly redirected before entering the pathway.