Bacterial strain, growth medium, and continuous culture conditions
A derivative of the Clostridium autoethanogenum DSM 10061 strain, DSM 19630, deposited in The German Collection of Microorganisms and Cell Cultures (DSMZ) was used in all experiments and stored as glycerol stocks at − 80 °C.
Cells were grown either on CO (~ 60% CO and 40% Ar; BOC Australia) or CO + H2, termed “high-H2 CO” here (~ 15% CO, 45% H2 and 40% Ar; BOC Australia) in chemically defined medium (without yeast extract) [28]. Experimental data for growth on syngas (~ 50% CO, 20% H2, 20% CO2, and 10% N2/Ar; BOC Australia) other than proteomics data, which was generated in this work, are from our previous study [28].
As during growth on syngas [28], cells were grown under strict anaerobic conditions at 37 °C and at a pH of 5 (maintained by 5 M NH4OH). Chemostat continuous cultures were operated at D = 1.0 ± 0.01 day−1 (µ = 0.04 ± 0.001 h−1) and D = 1.0 ± 0.01 day−1 (µ = 0.04 ± 0.001 h−1) for CO and high-H2 CO, respectively (D = 1.0 ± 0.03 day−1 [µ = 0.04 ± 0.001 h−1] for syngas), in 1.4 L Multifors bioreactors (Infors AG) at a working volume of 750 mL. The system was equipped with peristaltic pumps; mass flow controllers (MFCs); pH, ORP, and temperature sensors and was connected to a Hiden HPR-20-QIC mass spectrometer (Hiden Analytical) for online high-resolution off-gas analysis. Antifoam was continuously added to the bioreactor using a syringe pump to avoid foaming.
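The relation between the dilution rate and the specific growth rate quoted above (µ = D at chemostat steady state) can be checked with a minimal sketch. The function names, the medium feed-rate calculation, and the doubling-time calculation are illustrative assumptions and are not part of the original protocol.

```python
import math


def dilution_to_mu(d_per_day: float) -> float:
    """Convert a dilution rate (day^-1) to a specific growth rate (h^-1).

    At chemostat steady state the specific growth rate equals the dilution rate.
    """
    return d_per_day / 24.0


def feed_rate_ml_per_min(d_per_day: float, working_volume_ml: float) -> float:
    """Medium feed rate (mL/min) implied by a dilution rate and working volume."""
    return d_per_day * working_volume_ml / (24.0 * 60.0)


def doubling_time_h(mu_per_h: float) -> float:
    """Doubling time (h) for exponential growth at specific growth rate mu."""
    return math.log(2) / mu_per_h


mu = dilution_to_mu(1.0)                   # about 0.042 h^-1, reported as 0.04 h^-1
feed = feed_rate_ml_per_min(1.0, 750.0)    # about 0.52 mL/min for a 750 mL working volume
td = doubling_time_h(mu)                   # about 16.6 h
```

With D = 1.0 day−1, one working volume is exchanged per day, so the 3–5 working volumes used as the steady-state criterion correspond to roughly 3–5 days of stable operation.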
We targeted the lowest and highest (~ 0.5 and 1.4 gDCW/L, respectively) steady-state biomass concentrations of the previous syngas cultures [28] to compare the effect of H2 supplementation at different levels of inhibition from by-products. This was achieved using various steady-state gas–liquid mass transfer rates: for CO, 510 and 665 RPM at 46.5 mL/min gas flow resulting in 0.47 ± 0.02 and 1.43 ± 0.08 (gDCW/L), respectively; for high-H2 CO, 650 and 1000 RPM, and 46.5 and 110 mL/min gas flow resulting in 0.46 ± 0.04 and 1.45 ± 0.04 (gDCW/L), respectively. Four biological replicate cultures with two steady states (low and high biomass) per independent chemostat run were performed. All the steady-state results reported here were collected after optical density (OD) and gas uptake and production rates had been stable in chemostat mode for 3–5 working volumes, similar to syngas data.
Biomass concentration analysis
Biomass concentration (gDCW/L) was estimated for CO and high-H2 CO cultures by measuring the OD of the culture at 600 nm using the correlation coefficient of 0.21 between culture OD and dry cell weight determined in [28] for syngas cultures.
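As an illustration only, the conversion can be sketched as below. It assumes that the coefficient of 0.21 acts as a linear conversion factor (gDCW/L per OD600 unit), which is our reading of the text rather than something stated explicitly; the function name is hypothetical.

```python
OD_TO_GDCW_PER_L = 0.21  # assumption: linear factor, gDCW/L per OD600 unit


def od_to_biomass(od600: float, factor: float = OD_TO_GDCW_PER_L) -> float:
    """Estimate biomass concentration (gDCW/L) from an OD600 reading."""
    if od600 < 0:
        raise ValueError("OD600 cannot be negative")
    return factor * od600


# Under this assumption, the high-biomass target of ~1.4 gDCW/L
# corresponds to an OD600 of about 1.4 / 0.21, i.e. roughly 6.7.
target_od = 1.4 / OD_TO_GDCW_PER_L
```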
Bioreactor off-gas analysis
Bioreactor off-gas analysis was performed as specified in [28] by an online Hiden HPR-20-QIC mass spectrometer (MS) using the Faraday Cup detector. Briefly, gas uptake (CO and H2) and production (CO2 and ethanol) were determined using “online calibration” of the MS by analysing the respective feed gas directly from the cylinder after each analysis cycle of the bioreactors. Specific rates (mmol/gDCW/h) were calculated by taking into account the exact composition of the respective gas, bioreactor liquid working volume, feeding gas flow rate, off-gas flow rate based on the fractional difference of the inert gas Ar in the feeding and off-gas composition, the molar volume of ideal gas, and the steady-state biomass concentration. To achieve a more accurate carbon balance, ethanol stripping and the total soluble CO2 fraction in culture broth were also taken into account based on off-gas analysis.
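The specific rate calculation can be sketched as follows. This is a simplified illustration, not the exact procedure of [28]: it assumes that gas flows are expressed at 0 °C and 1 atm (molar volume 22.414 mL/mmol), it omits the corrections for ethanol stripping and soluble CO2, and the off-gas fractions in the example are hypothetical values chosen only to show the arithmetic.

```python
MOLAR_VOLUME_ML_PER_MMOL = 22.414  # assumption: ideal gas at 0 degC and 1 atm


def offgas_flow(feed_flow_ml_min: float, ar_frac_in: float, ar_frac_out: float) -> float:
    """Off-gas flow (mL/min) from the Ar balance: Ar is inert, so Ar in = Ar out."""
    return feed_flow_ml_min * ar_frac_in / ar_frac_out


def specific_uptake_rate(feed_flow_ml_min: float, frac_in: float, frac_out: float,
                         ar_frac_in: float, ar_frac_out: float,
                         volume_l: float, biomass_g_per_l: float) -> float:
    """Specific gas uptake rate (mmol/gDCW/h); negative values mean production."""
    out_flow = offgas_flow(feed_flow_ml_min, ar_frac_in, ar_frac_out)
    net_ml_min = feed_flow_ml_min * frac_in - out_flow * frac_out
    mmol_per_h = net_ml_min * 60.0 / MOLAR_VOLUME_ML_PER_MMOL
    return mmol_per_h / (volume_l * biomass_g_per_l)


# Hypothetical example: 60% CO / 40% Ar feed at 46.5 mL/min, off-gas with
# 40% CO / 50% Ar, 0.75 L working volume, 1.43 gDCW/L.
q_co = specific_uptake_rate(46.5, 0.60, 0.40, 0.40, 0.50, 0.75, 1.43)
```

Because Ar is neither consumed nor produced, its enrichment in the off-gas directly reports the contraction of the total gas flow, which is why no separate off-gas flow meter is needed in this scheme.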
Extracellular metabolome analysis
Extracellular metabolome analysis was carried out using filtered broth samples stored at − 20 °C until analysis. Organic acids, alcohols, and amino acids were quantified using HPLC as described before [43]. We note that cells produced 2R,3R-butanediol.
Intracellular metabolome analysis
Intracellular metabolome analysis was based on the method previously developed for the autotrophic growth of C. autoethanogenum [45] with details specified in [28]. Briefly, 1 mL of a high biomass culture was pelleted by immediate centrifugation followed by extraction of intracellular metabolites using acetonitrile. Metabolite concentrations were determined using LC–MS analysis in negative ion mode and relevant standards.
Cell-free synthesis of stable-isotope labelled proteins
Twenty proteins covering central metabolism, the HytA–E hydrogenase, and a ribosomal protein of C. autoethanogenum (Additional file 4: Table S5) were selected for cell-free synthesis of stable-isotope labelled (SIL) proteins. First, the genes encoding these proteins were synthesised by commercial gene synthesis services (Biomatik). The PCR-amplified target genes were sub-cloned into the cell-free expression vector pEUE01-His-N2 (Cell-Free Sciences) and transformed into Escherichia coli DH5α. Next, plasmid DNA was extracted and purified by alkaline lysis after cells had been cultured overnight in LB medium containing 50 μg/mL ampicillin. Correct gene insertion into the pEUE01-His-N2 was verified by DNA sequencing. Subsequently, cell-free synthesis of His-tag fused C. autoethanogenum proteins was performed using the bilayer reaction method with the wheat germ extract WEPRO8240H (Cell-Free Sciences) as described previously [56, 57]. Briefly, mRNAs used for cell-free synthesis were prepared by an in vitro transcription reaction at 37 °C for 6 h using the SP6 RNA polymerase. In vitro translation of C. autoethanogenum proteins was performed using a bilayer reaction (200 μL substrate layer and 40 μL translation layer) at 17 °C for 24 h in a 96-well microplate. The translation layer was supplemented with l-Arg-13C6,15N4 and l-Lys-13C6,15N2 (Wako) at final concentrations of 20 mM to achieve high efficiency (> 99%) stable-isotope labelling of proteins. Finally, in vitro synthesised proteins were purified using the Ni-Sepharose High-Performance resin (GE Healthcare Life Sciences) and stored at − 80 °C until further use.
Proteome analysis
Proteome analysis of CO, high-H2 CO, and syngas cultures was carried out for four biological replicates from the high biomass concentration (~ 1.4 gDCW/L) experiments using a DIA MS approach [55]. 2 mL of culture was pelleted by immediate centrifugation (25,000×g for 1 min at 4 °C) and stored at − 80 °C until analysis.
Sample preparation
Frozen cell pellets were thawed, washed with phosphate-buffered saline, resuspended in 500 µL of lysis buffer (pH 7.6) containing 2% (w/v) SDS (L4390; Sigma-Aldrich), 0.1 M DTT (V3155; Promega), 0.1 M Trizma® base (T1503; Sigma-Aldrich), and vortexed. The cell suspension was transferred to a 2 mL screw cap microtube (522-Q; Thermo Fisher Scientific) containing 0.1 mm glass beads (11079101; BioSpec Products). Cell lysis was performed by repeating the following “lysis cycle” four times: incubation for 10 min at 100 °C; bead beating using program “cycle 5” on the Precellys™ 24 instrument (Bertin Technologies); centrifugation at 14,000 rpm for 10 min at room temperature; vortexing (excluded from the final fourth lysis cycle). Next, 400 µL of lysate was carefully removed without withdrawing glass beads. Protein concentration in cell lysates was determined using the 2D Quant Kit (80-6483-56; GE Healthcare Life Sciences).
Alkylation of sulfhydryl groups and protein digestion were based on the filter-aided sample preparation (FASP) protocol [60]. 100 µg of protein was loaded and mixed with 200 µL of 8 M urea ([UA]; U5128; Sigma-Aldrich) in 0.1 M Trizma® base (pH 8.5) on an Amicon® Ultra-0.5 mL centrifugal filter unit with nominal molecular weight cutoff of 30,000 (UFC503096; Merck Millipore), and centrifuged at 14,000 rpm for 10 min at room temperature. The filter was washed and centrifuged once more with 200 µL of UA after which sulfhydryl groups were alkylated with the addition of 100 µL of 0.05 M iodoacetamide (I6125; Sigma-Aldrich) in UA, vigorous vortexing, and incubation for 30 min at room temperature in the dark. Next, the filter was centrifuged as described above, and then washed three times and centrifuged with UA. Subsequently, the addition of 100 µL of 25 mM ammonium bicarbonate (AMBIC) and centrifugation was repeated twice before proteins were digested on the filter for 16 h at 37 °C with 2 µg of Trypsin/Lys-C mix (V5073; Promega) in 30 µL of ~ 17 mM AMBIC and acetic acid. Peptides were recovered by centrifuging the filters upside down at 1000 rpm for 2 min at room temperature, followed by two further additions of 30 µL of 25 mM AMBIC, each with centrifugation as in the previous step. Finally, the collected peptide material was mixed with 10 µL of 0.1% (v/v) formic acid (FA) in 5% (v/v) acetonitrile (ACN) to stop digestion.
Samples were desalted using C18 ZipTips (ZTC18S096; Merck Millipore) as follows: the column was wetted using 0.1% FA in 100% ACN, equilibrated with 0.1% FA in 70% (v/v) ACN, and washed with 0.1% FA before loading the sample and washing again with 0.1% FA. Finally, peptides were eluted with 0.1% FA in 70% ACN. Total peptide concentration in each sample was determined using the Pierce™ Quantitative Fluorometric Peptide Assay (23290; Thermo Fisher Scientific) to ensure that the same total peptide amount across samples could be injected for DIA MS analysis. To further increase the accuracy of relative protein quantification, each sample was spiked with the same amount of a mix of SIL peptides derived from the 20 SIL proteins (see above) using the same FASP-based workflow as for culture samples with an additional step of reduction of disulfide bonds using DTT. Finally, samples were freeze-dried and reconstituted with 10 µL of 2% (v/v) ACN containing 0.05% (v/v) trifluoroacetic acid (TFA) to which 1 µL of an iRT Peptide mix (Ki-3003; Biognosys), pre-diluted one in five to meet the manufacturer’s recommendations, was added. In addition, the whole material eluted from a ZipTip of one sample from each gas mix and one syngas-grown culture sample spiked with SIL peptides were used for DIA MS spectral library generation using data-dependent acquisition (DDA; see below).
Sample fractionation for DIA MS spectral library
To increase the proteome coverage in DIA MS analysis, a pool of samples from each gas mix was fractionated using high pH reverse-phase fractionation, based on the protocol of the Thermo Fisher Scientific product 84868. A Waters Sep-Pak tC18 cartridge (WAT054960; Waters) was conditioned twice by the addition of 500 µL of 100% ACN and centrifugation inside a 15 mL falcon tube at 3000×g for 2 min at room temperature. The same was repeated with 0.1% FA. Next, ~ 15 µg of the FASP product of one sample from each gas mix were pooled together, mixed with 0.1% FA for a final volume of 500 µL and loaded on the column by centrifugation (same conditions). The cartridge was then washed with 500 µL of Milli-Q water before eight fractions were collected with an increasing ACN step gradient (from 5 to 50%) at high pH in triethylamine background. Finally, the fractions were freeze-dried and reconstituted with 10 µL of 2% ACN containing 0.05% TFA.
Nano-LC method
For both the DDA spectral library generation and DIA sample runs, a Thermo-Scientific U3000 nano-HPLC system was used in a trap column configuration for concentration and separation of the peptide samples. The samples were initially loaded onto a Thermo Acclaim PepMap C18 trap reversed-phase column (75 µm × 2 cm nano viper, 3 µm particle size) at a flow rate of 8 µL/min using 2% ACN containing 0.05% TFA for 6 min. Separation was achieved at 250 nL/min using 0.1% FA in water (buffer A) and 0.1% FA in ACN (buffer B) as mobile phases for gradient elution with a 75 µm × 50 cm PepMap RSLC C18 (2 µm particle size) Easy-Spray Column at 45 °C. Peptide elution employed a 2–8% ACN gradient for 14 min followed by two further gradient steps of 8–30% ACN for 80 min and 30–45% ACN for 10 min. The total acquisition time was 130 min including a 95% ACN wash and re-equilibration step. For each DDA sample run, a volume of 5 µL equating to ~ 1.5 µg of protein digest was injected. Likewise, for each DIA sample run, a volume of 5 µL equating to 0.5 µg of protein digest was injected.
DIA MS spectral library generation
The following 17 samples were analysed on the Q-Exactive HF (Thermo Fisher Scientific) in DDA mode to yield the spectral library for DIA MS data analysis: (1) one unfractionated sample from each gas mix and one unfractionated syngas-grown culture sample spiked with SIL peptides; (2) four replicates of one pool of all 12 unfractionated culture samples; (3) eight high pH reverse-phase fractions of a pool of samples from each gas mix; and (4) a mix of eight SIL-proteins (see Additional file 4: Table S5).
The eluted peptides from the C18 column were introduced to the MS via a nano-ESI and analysed using the Q-Exactive HF. The electrospray voltage was 1.8 kV, and the ion transfer tube temperature was 250 °C. Employing a top-20 ddMS2 acquisition method, full MS scans were acquired in the Orbitrap mass analyzer over the range m/z 400–1200 with a mass resolution of 120,000 (at m/z 200). The AGC target value was set at 3.00E+06. The 20 most intense peaks with a charge state between 2 and 6 were fragmented in the high energy collision dissociation (HCD) cell with a normalised collision energy of 28%. MSMS spectra were acquired in the Orbitrap mass analyzer with a mass resolution of 15,000 at m/z 200. The AGC target value for MSMS was set to 1.0E+05, while the ion selection threshold was set to 1.8E+05 counts. The maximum allowed ion accumulation times were 50 ms for full MS scans and 40 ms for MSMS. For all the experiments, the dynamic exclusion time was set to 20 s, and undetermined charge state species were excluded from MSMS.
Identification results from DDA analysis were used to build a spectral library for DIA MS data confirmation and quantification using Skyline [61] (see below). For this, raw DDA data files were analysed with Proteome Discoverer 2.2 (Thermo Fisher Scientific) using SEQUEST HT against a C. autoethanogenum DSM 10061 [31] database containing ~ 3750 sequences while also annotated to include the 20 SIL-proteins and a fusion of the 11 iRT peptides. The NCBI annotation of sequence NC_022592.1 [59] was used as the annotation genome here, with CAETHG_RS07860 removed and replaced with the carbon monoxide dehydrogenase genes named CAETHG_RS07861 and CAETHG_RS07862 with initial IDs of CAETHG_1620 and 1621 [59], respectively. The workflow editor was used to create customised searches and result reports, where RAW data files were processed to generate a Magellan Server File (MSF) result file and a .pd result output file, which was later incorporated in Skyline for the DIA MS spectral library build.
The SEQUEST HT search parameters included: 10 ppm precursor ion mass tolerance; product ion mass tolerance of 0.05 m/z; full trypsin specificity with two missed cleavages allowed for peptides with a length of 6–150 AAs. Cysteine carbamidomethylation was set as a fixed modification, while methionine oxidation, deamidation of glutamine and asparagine as well as N-terminal acetylation were set as variable modifications. The mass analyser used was Fourier Transform Mass Spectrometry while the activation type was HCD. Peaks were filtered with a signal to noise (S/N) threshold of 1.5. A separate SEQUEST HT search node was included with fixed modifications set to include 13C(6)15N(2)/+ 8.014 Da (K) and 13C(6)15N(4)/+ 10.008 Da (R) for the SIL-proteins. Within this search node, cysteine carbamidomethylation (+ 57) was set as a fixed modification, while methionine oxidation (+ 16), deamidation of glutamine and asparagine (+ 0.984) as well as N-terminal acetylation (+ 42) were set as variable modifications.
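The heavy-label mass shifts and the precursor tolerance quoted above can be reproduced with a short calculation. This is a sketch for verification only: the isotope masses are standard monoisotopic values, the function names are hypothetical, and the m/z of 500 in the example is arbitrary.

```python
# Monoisotopic masses (Da) of the relevant isotopes
MASS_12C = 12.0
MASS_13C = 13.0033548
MASS_14N = 14.0030740
MASS_15N = 15.0001089


def heavy_label_shift(n_carbon: int, n_nitrogen: int) -> float:
    """Mass shift (Da) when n_carbon 12C -> 13C and n_nitrogen 14N -> 15N."""
    return n_carbon * (MASS_13C - MASS_12C) + n_nitrogen * (MASS_15N - MASS_14N)


def ppm_window(mz: float, ppm: float = 10.0) -> tuple:
    """Lower and upper m/z bounds for a symmetric tolerance given in ppm."""
    delta = mz * ppm / 1e6
    return (mz - delta, mz + delta)


lys_shift = heavy_label_shift(6, 2)   # 13C(6)15N(2) on lysine, about 8.014 Da
arg_shift = heavy_label_shift(6, 4)   # 13C(6)15N(4) on arginine, about 10.008 Da
low, high = ppm_window(500.0)         # 10 ppm at m/z 500 is +/- 0.005
```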
Database searching against the corresponding decoy database containing reversed protein sequences was performed using Percolator to evaluate the FDR of peptide identifications. The final .pd result file contained peptide–spectrum matches (PSMs) with q values estimated at 1% FDR for peptides ≥ 4 AAs.
DIA MS data acquisition
As for the DDA method, eluted peptides were introduced to the MS via a nano-ESI and analysed using the Q-Exactive HF. The electrospray voltage was 1.9 kV, and the ion transfer tube temperature was 250 °C. DIA was achieved using an inclusion list: m/z 400‒1000 in steps of 15 amu and a quadrupole isolation window of 16 amu; scans cycled through the list of 40 isolation windows interspersed with an MS1 scan for every 10 targets. Full MS scans were acquired in the Orbitrap mass analyser over the range m/z 400–1200 with a mass resolution of 120,000 (at m/z 200). Identical to the DDA method, the AGC target value was set at 3.00E+06 with a maximum injection time of 50 ms. All DIA scans implemented a normalised collision energy (NCE) of 28% while MSMS detection in the Orbitrap was at a resolution setting of 30,000 (at m/z 200). The AGC target was set to 1.0E+06 with a maximum injection time of 45 ms. A first fixed mass of m/z 200 was applied, and default charge state of 2 was set for scanning MS2 events.
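The isolation scheme can be illustrated with a minimal sketch. The placement of the window centres is an assumption: the text gives only the range, step, and width, so the windows are centred here such that they tile m/z 400–1000 (first centre at 407.5), and the position of each MS1 scan before every block of 10 targets is likewise assumed. Function names are hypothetical.

```python
def dia_windows(start=400.0, stop=1000.0, step=15.0, width=16.0):
    """Return (lower, upper) m/z bounds of the DIA isolation windows.

    Assumption: centres sit mid-way through each 15 amu step, so that
    16 amu wide windows overlap their neighbours by 1 amu.
    """
    n_windows = int(round((stop - start) / step))
    windows = []
    for i in range(n_windows):
        centre = start + step * (i + 0.5)
        windows.append((centre - width / 2.0, centre + width / 2.0))
    return windows


def scan_cycle(windows, ms1_every=10):
    """One acquisition cycle: an MS1 scan inserted before every 10 DIA targets."""
    cycle = []
    for i, window in enumerate(windows):
        if i % ms1_every == 0:
            cycle.append("MS1")
        cycle.append(window)
    return cycle


windows = dia_windows()
cycle = scan_cycle(windows)
```

Under these assumptions a cycle consists of 40 DIA scans plus 4 MS1 scans, and adjacent windows share a 1 amu margin (16 amu width at 15 amu spacing).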
DIA MS data analysis
DIA MS data analysis was performed with the software Skyline [61]. The .pd result file from Proteome Discoverer was used to build the DIA MS spectral library within Skyline using only PSMs with q value < 0.01. The following parameters were used for DIA MS data analysis: six of the most intense y and b (only y for SIL-protein-aided label-based quantification) product ions from ion 3 to last ion of charge state 1 and 2 among precursor charges 2, 3, and 4 were picked while product ions falling within the DIA precursor window were excluded; chromatograms were extracted with a library match mass tolerance of 0.05 m/z for product ions with an extraction window within 5 min of the predicted retention time after iRT alignment; full trypsin specificity with two missed cleavages allowed for peptides with a length of 8–25 AAs; cysteine carbamidomethylation as a fixed peptide modification. In addition, for SIL-protein-aided label-based quantification, peptide modifications included heavy labels for lysine and arginine as 13C(6)15N(2)/+ 8.014 Da (K) and 13C(6)15N(4)/+ 10.008 Da (R), respectively, and these heavy labels were set as “internal standard type” to aid peak picking.
Next, for both data sets (label-free and label-based), shared peptides were removed, and a minimum of five transitions per precursor and two peptides per protein were required. The mProphet peak picking algorithm [62] within Skyline was used and trained with shuffled sequence decoys to separate true from false-positive peak groups (per sample) and only peak groups with q value < 0.01 (representing 1% FDR) were used for further quantification. For the proteome-wide data set (label-free), we confidently quantitated 14,705 peptides and 1655 proteins across all samples, and 10,134 peptides and 1403 proteins on average within each sample with at least two peptides per protein.
Determination of differentially expressed proteins and absolute protein expression levels
Protein expression fold changes with p- and q values were determined using the software MSstats [63] with high-quality feature selection, top3 featureSubset, and Tukey’s median polish as run-level summarisation within its linear mixed models. For the proteome-wide data set (label-free), only proteins with at least two peptides in each bio-replicate under comparison were used (filtering for two, instead of one, peptides per protein has shown higher quantification accuracy [64, 65]) and input data were normalised using quantile normalisation, independently determined as the most suitable normalisation method using the software Normalyzer [66]. Higher quantification accuracy for the SIL proteins was achieved by label-based quantification through normalising light (endogenous) data with heavy (spike-in). Proteins were considered to be differentially expressed at a q value < 0.05 after FDR correction [58], and for proteome-wide label-free quantification additionally with a fold-change > 1.5.
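The two-part significance criterion can be sketched as follows. This is not the MSstats implementation: it assumes that the FDR correction is the Benjamini–Hochberg procedure and that the fold-change threshold is applied symmetrically (above 1.5 or below 1/1.5); both points are assumptions, and the function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def is_differential(q_value, fold_change, q_cut=0.05, fc_cut=1.5):
    """Apply the q value and fold-change thresholds.

    Assumption: fold_change is a positive ratio and the threshold is
    symmetric, so 0.5 counts as a 2-fold decrease.
    """
    magnitude = fold_change if fold_change >= 1.0 else 1.0 / fold_change
    return q_value < q_cut and magnitude > fc_cut


q = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

For the label-based SIL data set, only the q value threshold applies, which corresponds to calling `is_differential` with `fc_cut=1.0` in this sketch (any non-unity ratio then passes the fold-change test).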
Absolute protein expression levels as MS signal intensities were estimated for proteins with at least two peptides in each bio-replicate of the respective gas mix by summing the five most intense product ions (of the most intense precursor) of the three most intense peptides (two if only two quantified). This combination has shown the highest accuracy for label-free absolute quantification from DIA MS data [67].
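The summation scheme can be sketched as below. The input layout (a mapping from each peptide to the product-ion intensities of its most intense precursor) and the ranking of peptides by their summed top-five ion intensity are assumptions made for illustration; the function name is hypothetical.

```python
def protein_intensity(peptides, n_peptides=3, n_ions=5):
    """Estimate a protein's MS signal intensity.

    peptides: dict mapping peptide sequence -> list of product-ion
    intensities from that peptide's most intense precursor.
    Returns None when fewer than two peptides were quantified.
    """
    if len(peptides) < 2:
        return None
    # Sum the n_ions most intense product ions for each peptide.
    per_peptide = [
        sum(sorted(ions, reverse=True)[:n_ions]) for ions in peptides.values()
    ]
    # Sum the n_peptides most intense peptides (two if only two exist).
    return sum(sorted(per_peptide, reverse=True)[:n_peptides])


# Hypothetical intensities for one protein with four quantified peptides
example = {
    "PEPTIDEA": [10, 9, 8, 7, 6, 5],
    "PEPTIDEB": [5, 4, 3, 2, 1],
    "PEPTIDEC": [1, 1, 1, 1, 1],
    "PEPTIDED": [100],
}
total = protein_intensity(example)
```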
Differentially expressed proteins within SIL-protein-aided label-based quantification are presented in Additional file 4: Table S5, and within the proteome-wide label-free data set between syngas and CO or high-H2 CO and CO in Additional file 4: Tables S6, S7, respectively. Absolute protein expression levels are in Additional file 4: Table S8. The MS proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository [68] with the data set identifier PXD008367.
Genome-scale metabolic modelling
Model simulations were performed using GEM iCLAU786 of C. autoethanogenum [43] with modifications and simulation details specified in [28]. For simulations reported here, we used experimentally determined C. autoethanogenum biomass amino acid composition of high biomass syngas cultures reported in [28]. Biomass amino acid composition was determined at the Centre of Food and Fermentation Technologies (Tallinn, Estonia) using a method based on acid hydrolysis and LC–MS.
Briefly, we used FBA [44] to estimate intracellular fluxes (SIM1–19) and predict “optimal” growth phenotypes for our experimental conditions (SIM20–34) using either maximisation of ATP dissipation or biomass yield, respectively, as the objective function. In addition, for SIM35–41, CO2 reduction with the redox-consuming FdhA activity (reaction rxn00103_c0) was zeroed and the ratio between H2 utilisation for direct CO2 reduction (reaction rxn08518_c0), and Fdred and NADPH generation (reaction leq000001) by the HytA–E/FdhA complex was fixed at a value corresponding to the respective experiment’s qH2/qCO ratio (see [28] for details; both syngas and high-H2 CO data were used for fitting). Finally, we note that since carbon recoveries above 100% were observed, model input data for gas uptake rates were modified to achieve feasible solutions as specified in [28].
Simulation results identified as SIMx (e.g., SIM1) in the text are reported in Additional file 3: Tables S3, S4. The reactions together with their stoichiometries forming the metabolic network of GEM iCLAU786 can be found in Additional file 3: Table S4 and from the SBML model file of the GEM iCLAU786 in Additional file 5.