Structure-oriented substrate specificity engineering of aldehyde-deformylating oxygenase towards aldehydes carbon chain length

Background: Aldehyde-deformylating oxygenase (ADO) is an important enzyme involved in the biosynthetic pathway of fatty alk(a/e)nes in cyanobacteria. However, ADO exhibits quite low chain-length specificity with respect to substrates ranging from C4 to C18 aldehydes, which is not suitable for producing fuels with different properties or different chain lengths.
Results: Based on the crystal structures of cADOs (cyanobacterial ADOs) with substrate analogs bound, some amino acids affecting the substrate specificity of cADO were identified, including the amino acids close to the aldehyde group and the hydrophobic tail of the substrate and those along the substrate channel. Using site-directed mutagenesis, selected amino acids were replaced with bulky ones, introducing steric hindrance to the binding pocket via large functional groups. All mutants were overexpressed, purified and kinetically characterized. All mutants, except F87Y, displayed dramatically reduced activity towards C14,16,18 aldehydes. Notably, the substrate preferences of some mutants towards different chain-length substrates were enhanced: I24Y for n-heptanal, I27F for n-decanal and n-dodecanal, V28Y for n-dodecanal, F87Y for n-decanal, C70F for n-hexanal, A118F for n-butanal, A121F for C4,6,7 aldehydes, V184F for n-dodecanal and n-decanal, M193Y for C6–10 aldehydes and L198F for C7–10 aldehydes. The impact of the engineered cADO mutants on the hydrocarbon profile was demonstrated by co-expressing acyl-ACP thioesterase BTE, fadD and V184F in E. coli, showing that n-undecane was the main fatty alkane.
Conclusions: Some amino acids that can control the chain-length selectivity of cADO towards its substrates were identified. The substrate specificities of cADO were successfully changed through structure-guided protein engineering, and some mutants displayed different chain-length preferences. The in vivo experiments with V184F in genetically engineered E. coli proved the importance of engineered cADOs for the distribution of the fatty alkane profile. The results would be helpful for the production of fatty alk(a/e)nes with different properties in cyanobacteria.
Electronic supplementary material: The online version of this article (doi:10.1186/s13068-016-0596-9) contains supplementary material, which is available to authorized users.


Background
The biosynthesis of fatty alk(a/e)nes by plants, insects, birds, green algae and cyanobacteria has been attracting great attention, since fatty alk(a/e)nes have been considered ideal replacements for fossil-based fuels [1–5]. It has been accepted that one of the enzymatic pathways producing alk(a/e)nes is derived from fatty acyl-ACP or -CoA in a two-step reaction: fatty acyl-ACP or -CoA is first reduced into fatty aldehyde by acyl-ACP or -CoA reductase, then the fatty aldehyde is converted into alk(a/e)ne by aldehyde decarbonylase (now renamed aldehyde-deformylating oxygenase, ADO). In 2010, Schirmer et al. identified two genes involved in alk(a/e)ne biosynthesis in cyanobacteria: acyl-ACP reductase and ADO [1]. In 2013, Akhtar et al. reported that a carboxylic acid reductase (CAR) from Mycobacterium marinum could convert a wide range of aliphatic fatty acids (C6–C18) into the corresponding aldehydes, which can then be transformed into fatty alkanes by ADO [6]. From the viewpoint of chemistry, transformation of aldehydes into alk(a/e)nes by ADO is quite difficult and unusual, so cADO (cyanobacterial ADO) has attracted particular interest in industry and academia [7].
Since then, several important conclusions have been drawn: (1) the C1-derived coproduct of the cADO-catalyzed reaction is formate, instead of the previously supposed carbon monoxide [8]; (2) oxygen is absolutely required, and one O-atom is incorporated into formate [9,10]; (3) an auxiliary reducing system providing four electrons is needed, and the homologous electron transfer system worked more effectively than the heterologous and chemical ones in supporting cADO activity [1,9,11–13] (Scheme 1). It has been observed that self-sufficient cADOs fused to homologous ferredoxin (Fd) and ferredoxin-NADP+ reductase (FNR) could efficiently catalyze conversion of aldehydes into alk(a/e)nes [14]. Andre et al. reported that cADO was reversibly inhibited by H2O2 originating from poor coupling of reductant consumption with alk(a/e)ne formation, and the kinetics of cADO towards aldehyde substrates with chain lengths between 8 and 18 carbons showed that cADO did not exhibit strong chain-length specificity with respect to its substrates [15]. cADO also produces n-1 aldehydes and alcohols in addition to alk(a/e)nes [16]. Mechanistic studies have demonstrated that a radical intermediate is involved in the cADO-catalyzed reaction, and a possible catalytic process has been proposed based on the crystal structures of cADO from Synechococcus elongatus strain PCC7942 [17–20]. cADO was engineered to improve specificity for short- to medium-chain aldehydes [21]. Hayashi et al. investigated the role of three cysteines in the structure, stability and alk(a/e)ne production of cADO [22]. Based on its crystal structures, cADO belongs to the non-heme dinuclear iron oxygenase family of enzymes, which includes methane monooxygenase, type I ribonucleotide reductase and ferritin [1,17,23–25].
Fatty alk(a/e)nes are the main component of traditional fuels such as gasoline, diesel and jet fuel. The carbon number distribution of hydrocarbons varies in different fuels, for example, 4-12 in gasoline, 9-23 in diesel and 8-16 in jet fuel [26]. Increasing interest in developing the next generation of biofuels, which can function as "drop-in" fuels, has drawn considerable attention to the enzymes involved in hydrocarbon biosynthesis. Acyl-ACP thioesterases with different carbon chain-length specificities could be used to synthesize fatty acid-based fuels such as fatty alcohols and FAEs (fatty acid esters) with different carbon chain-length distributions [27]. A number of carbon chain length-specific acyl-ACP thioesterases have been successfully utilized to control the carbon chain-length distributions of fatty acids and/or fatty acid derivatives in genetically engineered microbes, such as tesA from Escherichia coli (C16:0), CCTE from Cinnamomum camphora (C14:0), and BTE from Umbellularia californica (C12:0) [28]. Moreover, engineering efforts have also been successful in altering the specificity of wild-type desaturases, such as the castor Δ9-18:0-ACP desaturase, leading to the isolation of mutants with up to 15-fold increased specific activity towards 16-carbon substrates [29]. The substrate specificity of β-ketoacyl-ACP synthase was modified from 8:0-ACP substrate to 6:0-ACP through protein engineering [30]. Very recently, an acyl carrier protein (ACP) from Synechococcus elongatus was engineered to enhance production of shortened fatty acids such as C14 fatty acid [31].
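The link between substrate chain length and fuel class can be made concrete with a small sketch. It assumes only what the Background states: ADO releases the C1 unit as formate, so a Cn aldehyde yields a C(n-1) alkane, and the carbon-number ranges are the ones quoted above (the diesel range is taken as C9 to C23, which is an assumption where the extracted text is unclear). The names `FUEL_RANGES`, `alkane_carbons` and `matching_fuels` are illustrative and not from the study.

```python
# Carbon-number ranges quoted in the text; diesel taken as C9-C23 (assumption).
FUEL_RANGES = {"gasoline": (4, 12), "diesel": (9, 23), "jet fuel": (8, 16)}


def alkane_carbons(aldehyde_carbons: int) -> int:
    """ADO removes the aldehyde carbon as formate: a Cn aldehyde gives a C(n-1) alkane."""
    if aldehyde_carbons < 2:
        raise ValueError("an aldehyde substrate needs at least two carbons")
    return aldehyde_carbons - 1


def matching_fuels(carbons: int) -> list[str]:
    """Return the fuel classes whose quoted carbon range contains this alkane."""
    return [name for name, (low, high) in FUEL_RANGES.items() if low <= carbons <= high]


if __name__ == "__main__":
    for n in (4, 12, 18):
        product = alkane_carbons(n)
        print(f"C{n} aldehyde -> C{product} alkane: {matching_fuels(product) or 'no listed fuel'}")
```

Under these ranges, n-undecane (from n-dodecanal) falls within all three fuel classes, whereas the C17 product of n-octadecanal falls only within diesel.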
In the current study, the amino acids of cADO-1593 from Synechococcus elongatus which might influence the chain-length selectivity towards the substrates were identified and characterized. Altered substrate specificities of cADO towards different chain-length substrates were achieved through structure-oriented engineering, and in vivo experiments were also performed by introducing an engineered ADO mutant into a fatty alk(a/e)ne-producing E. coli cell factory.

Identification of the amino acids that may influence the substrate specificity of cADO
According to the crystal structure of cADO-1593 from Synechococcus elongatus strain PCC7942 (PDB code: 4RC5) [17], amino acids involved in the substrate channel were identified, including Tyr21, Ile24, Ile27, Val28, Gly31, Phe67, Cys70, Phe86, Phe87, Phe117, Ala118, Ala121, Tyr122, Tyr125, Val184, Met193 and Leu198 (Fig. 1). Since the side chains of Phe67, Phe86, Phe117, Tyr122 and Tyr125 are either parallel with the substrate analog or do not point towards it, and only provide a hydrophobic environment for the substrate, they were not investigated in the current study. The other amino acids, which might have some influence on the substrate specificity of cADO, were investigated.
Investigation into the substrate tunnel of cADO-1593 revealed that the amino acids Gly31 and Ala118 are close to the di-iron center and the aldehyde group of the substrate (<C4). Replacement of these two amino acids with tyrosine or phenylalanine, which may introduce a steric block in this position, might improve the selectivity of cADO for short-chain substrates against medium- and long-chain substrates.
Amino acids including Ile24, Ile27, Val28, Phe87, Ala121, Val184 and Leu198 were also identified along the substrate tunnel. All of them point their side chains towards the bound substrate, and their side chains are approximately perpendicular to it. Val41 and Ala134 of cADO PMT1231 from Prochlorococcus marinus (MIT9313), which are equivalent to Val28 and Ala121 of cADO-1593, respectively, have both been shown to affect substrate specificity [21]. These amino acids might also have some impact on access of medium- and long-chain substrates when they are replaced with bulky ones, such as tyrosine or phenylalanine.
The amino acids Tyr21 and Cys70 are situated close to the hydrophobic end of the substrate analog in the crystal structure of cADO-1593, a region that was predicted to be the possible substrate entrance. Therefore, mutating them into bulky residues might have some influence on substrate entry.

Site-directed mutagenesis, overexpression, purification and enzymatic assays of WT and cADO mutants
Considering that replacement of the above identified amino acids with large ones might impede the binding of substrates beyond a certain length or substrate access, twelve mutants, namely Y21R, I24Y, I27F, V28Y, G31F, C70F, F87Y, A118F, A121F, V184F, M193Y and L198F, were constructed following the standard protocol using WT cADO-1593 as the template. All mutants were successfully overexpressed in E. coli BL21(DE3) and purified on a nickel column as previously described [13] (Additional file 2: Figure S2). All enzymatic assays were carried out in the presence of the chemical reducing system NADH/PMS (phenazine methosulfate) and catalase [13].
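The mutant names follow the usual wild-type residue, position, replacement residue shorthand. As a minimal sketch of how such a list can be checked for internal consistency, the snippet below parses each name and compares its wild-type letter with the residue reported at that position for cADO-1593 in the preceding section. The dictionary `WILD_TYPE` and the helper names are illustrative assumptions, not part of the published protocol.

```python
import re

# Wild-type residues at the targeted positions, as named in the text for cADO-1593.
WILD_TYPE = {21: "Y", 24: "I", 27: "I", 28: "V", 31: "G", 70: "C", 87: "F",
             118: "A", 121: "A", 184: "V", 193: "M", 198: "L"}

MUTANTS = ["Y21R", "I24Y", "I27F", "V28Y", "G31F", "C70F", "F87Y",
           "A118F", "A121F", "V184F", "M193Y", "L198F"]

# One-letter wild-type residue, position, one-letter replacement residue.
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(name: str) -> tuple[str, int, str]:
    """Split a name such as 'V184F' into ('V', 184, 'F')."""
    match = MUTATION_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a point-mutation name: {name!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def is_consistent(name: str) -> bool:
    """True if the wild-type letter matches the table and the residue actually changes."""
    wild_type, position, replacement = parse_mutation(name)
    return WILD_TYPE.get(position) == wild_type and replacement != wild_type


if __name__ == "__main__":
    for mutant in MUTANTS:
        print(mutant, parse_mutation(mutant), is_consistent(mutant))
```

A check of this kind flags names whose wild-type letter disagrees with the structure (for example a hypothetical "A28F"), but it cannot tell which replacement residue was actually made.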

Activities towards medium- to long-chain aldehydes
Medium- and long-chain (C10,12,14,16,18) aldehydes were used as substrates to investigate the effects of the mutations on substrate specificity (Fig. 2; Additional file 3). A118F did not show any obvious activity against C14,16,18 aldehydes, and only exhibited slight activity towards n-dodecanal and n-decanal. Eleven mutants, excluding F87Y, displayed dramatically reduced activity towards C14,16,18 aldehydes. Notably, the activities of V184F (4.4-fold), F87Y (2.5-fold), I27F (2.1-fold) and V28Y (2.0-fold) towards n-dodecanal were greatly enhanced. G31F, A121F and M193Y exhibited comparable activity to WT towards n-dodecanal, and Y21R and I24Y displayed low activities towards n-dodecanal and n-decanal. I27F and V28Y behaved differently towards n-decanal: activity was improved for I27F and reduced for V28Y. C70F exhibited comparable activities to WT towards n-dodecanal and n-decanal. L198F showed reduced activity against n-dodecanal, but improved activity for n-decanal (1.4-fold). The activities of A121F, V184F and M193Y for n-decanal were improved. F87Y showed enhanced activity against n-decanal (1.8-fold) and similar activity to WT towards n-octadecanal.
Fig. 1 Identified residues which might affect the substrate specificity of cADO (PDB code: 4RC5). The identified residues include those close to the aldehyde group (Gly31 and Ala118) and the hydrophobic tail (Tyr21 and Cys70) of the substrate, and those along the substrate channel (Ile24, Ile27, Val28, Phe87, Ala121, Val184, Met193 and Leu198). The bound substrate analog is colored yellow

Activities towards medium- to short-chain aldehydes
a. Apparent kcat values of WT cADO and mutants towards n-heptanal
n-Heptanal has been successfully used as the substrate to determine the apparent kcat values of cADOs, and was utilized here to examine whether the kinetics of the mutants were influenced (Fig. 2) [12–14]. The apparent kcat value of M193Y was improved by about three-fold in comparison with that of WT, and was the highest among all mutants. Compared with WT, the apparent kcat values of L198F, I24Y and A121F were increased by two-fold, whereas C70F, Y21R, F87Y, V184F and G31F exhibited apparent kcat values comparable to WT, demonstrating that these mutations had little influence on the activity towards n-heptanal. The activities of I27F, V28Y and A118F were severely impaired.
b. Kinetic characterization of WT cADO and some mutants towards C6–9 aldehydes
C6,8,9 aldehydes were also used as substrates to investigate the effects of some mutations on chain-length selectivity (Fig. 3). In comparison with WT, A121F, C70F, M193Y and L198F showed 2.7-, 2.5-, 1.7- and 1.4-fold increases in apparent kcat against n-hexanal, respectively, and I24Y demonstrated a similar apparent kcat for n-hexanal. When n-octanal was used as the substrate, M193Y (3.2-fold) showed significantly improved activity, A121F and L198F exhibited comparable activity to WT, and I24Y and C70F displayed much lower activity (Fig. 3b). When n-nonanal was used as the substrate, M193Y and L198F exhibited 1.7- and 2.0-fold increases in apparent kcat, respectively, the apparent kcat value of I24Y was much lower than that of WT, and those of C70F and A121F were about half of that of WT. According to the results published by Khara et al. for cADO-PMT1231 from Prochlorococcus marinus (strain MIT9313), A134F, which is equivalent to A121F of cADO-1593, showed significantly improved activity against C6,7,8 aldehydes [21]. Therefore, the kinetic parameters of A121F against C6–9 aldehydes were determined in detail (Additional file 4). The kinetic parameters of A121F and WT cADO-1593 are listed in Table 1. In comparison with WT, A121F exhibited a one-fold increase in the Km value for n-nonanal and similar Km values for the other substrates. It seems that the Km values of WT and A121F towards C6–9 aldehydes decrease with increasing chain length of the substrates. This mutant displayed higher kcat values for C6,7 aldehydes, but similar values against C8,9 aldehydes. The kcat value of A121F towards n-hexanal was the highest among the substrates tested. Compared with WT, A121F showed significantly improved catalytic efficiency (kcat/Km) against n-hexanal and a slightly higher one for n-heptanal. The catalytic efficiencies of WT towards C8,9 aldehydes are much higher than that of A121F for n-hexanal, which could be mainly caused by the big difference in Km between them.
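The last point, that a lower Km can outweigh a higher kcat in the ratio kcat/Km, can be illustrated with a minimal Michaelis-Menten sketch. The numbers below are hypothetical placeholders chosen only to show the effect; they are not values from Table 1, and the function names are illustrative.

```python
def catalytic_efficiency(kcat: float, km: float) -> float:
    """Catalytic efficiency kcat/Km (units follow the inputs, e.g. min^-1 uM^-1)."""
    if km <= 0:
        raise ValueError("Km must be positive")
    return kcat / km


def michaelis_menten_rate(kcat: float, km: float, enzyme: float, substrate: float) -> float:
    """Initial rate v = kcat * [E] * [S] / (Km + [S])."""
    return kcat * enzyme * substrate / (km + substrate)


if __name__ == "__main__":
    # Hypothetical parameters: enzyme A turns over faster, enzyme B binds more tightly.
    enzyme_a = {"kcat": 2.0, "km": 200.0}  # min^-1, uM
    enzyme_b = {"kcat": 1.0, "km": 50.0}   # min^-1, uM
    print("A:", catalytic_efficiency(**enzyme_a))  # 0.01
    print("B:", catalytic_efficiency(**enzyme_b))  # 0.02
```

In this toy case the enzyme with half the kcat is still twice as efficient because its Km is four times lower, which mirrors the reasoning given for WT with C8,9 aldehydes versus A121F with n-hexanal.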

c. Apparent kcat values of WT cADO and several mutants towards n-butanal
Since Gly31 and Ala118 are very close to the aldehyde group of the substrate, it was expected that G31F and A118F would show lower activity than WT towards short aldehydes such as n-butanal. In comparison with WT cADO-1593, the apparent kcat value of A118F towards n-butanal was improved by 2.2-fold (Fig. 3d), whereas G31F showed greatly decreased activity for n-butanal (data not shown). Considering that the large Phe might have negative effects on activity, A118L was also constructed. Unexpectedly, A118L gave the same result as A118F. A121F exhibited the highest apparent kcat value towards n-butanal (3.3-fold increase).

Further characterization of WT and some cADO mutants
a. Circular dichroism (CD) for WT cADO and some mutants
CD was used to investigate the effects of the mutations on conformation and structure. All mutants displayed CD spectra similar to that of WT (Fig. 4), and the secondary structures of WT cADO and all mutants were also very close (Additional file 5: Table S1). These results indicated that the mutations did not lead to significant conformational changes. Thus, the different behavior of the mutants compared with WT (activities and/or chain-length selectivity) is due to the change of the amino acid side chains rather than to conformational or structural changes.
Fig. 2 Yields of fatty alkanes for wild-type cADO and variants towards C10,12,14,16,18 aldehydes (a-c) and the apparent kcat values towards n-heptanal (d). Yields of fatty alkanes for wild-type cADO and variants towards C10,12,14,16,18 aldehydes were determined by GC-MS using C20 alkane (10 μM) as an internal standard. The amount of n-hexane produced was quantified by GC and a standard curve of known concentrations of n-hexane
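The legend of Fig. 2 mentions quantification by GC-MS with a C20 alkane internal standard at 10 μM. A minimal sketch of the usual internal-standard calculation is given below; the relative response factor and the peak areas are hypothetical inputs, since the text does not report them, and the function name is illustrative.

```python
def quantify_with_internal_standard(
    area_analyte: float,
    area_standard: float,
    conc_standard_um: float = 10.0,  # internal standard concentration stated in the legend
    response_factor: float = 1.0,    # assumed relative response (analyte vs. standard)
) -> float:
    """Analyte concentration (uM) from the peak-area ratio to the internal standard."""
    if area_standard <= 0 or response_factor <= 0:
        raise ValueError("standard area and response factor must be positive")
    return (area_analyte / area_standard) * conc_standard_um / response_factor


if __name__ == "__main__":
    # Hypothetical peak areas, for illustration only.
    print(quantify_with_internal_standard(area_analyte=5000.0, area_standard=10000.0))
```

In practice the response factor would be calibrated for each alkane against authentic standards; setting it to 1.0 here is purely a simplifying assumption.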

b. Mixed-substrate competition assays for WT and some cADO mutants
Based on the results for all mutants towards different chain-length substrates, it seems that the mutants showed different chain-length preferences, for example, I24Y for n-heptanal, I27F for n-decanal and n-dodecanal, V28Y for n-dodecanal, F87Y for n-decanal, C70F for n-hexanal, A118F for n-butanal, A121F for C4,6,7 aldehydes, V184F for n-dodecanal and n-decanal, M193Y for C6–10 aldehydes and L198F for C7–10 aldehydes. To further confirm the impact of the mutations on the chain-length selectivity of cADO, mixed-substrate competition assays were carried out for some mutants. Using the preferred chain-length substrate of each mutant, enzymatic activities were assayed in the presence of either a short-chain competitor (n-butanal/n-heptanal/n-nonanal) or a long-chain one (n-octadecanal). When WT ADO was assayed against n-heptanal in the presence of the competition substrate n-butanal or n-octadecanal, neither showed obvious inhibition (Table 2). However, under the same conditions, inhibition was observed for A121F towards n-heptanal in the presence of n-butanal and of n-octadecanal. In contrast, n-butanal did not inhibit I24Y against n-heptanal, whereas n-octadecanal displayed some inhibition. When n-octanal was used as the substrate, WT and M193Y demonstrated similar behavior to A121F in the presence of the competition substrates. WT and L198F performed differently in the presence of n-butanal using n-nonanal as the substrate: n-butanal did not inhibit WT, but inhibited L198F, whereas both enzymes could be inhibited by n-octadecanal.
Table 1 Kinetic parameters of wild-type cADO and A121F towards C6–9 aldehydes
When WT ADO, F87Y and I27F were assayed against n-dodecanal in the presence of n-heptanal or n-octadecanal, both competitors inhibited all three enzymes to different extents (Table 3). In contrast, for V184F and V28Y, n-heptanal displayed some inhibition, whereas n-octadecanal did not.
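One simple way to express the outcome of such a competition assay is the percentage loss of product yield when the competitor is present. The sketch below is an illustration only: the yields are hypothetical, and the 20 % cut-off used to call an effect "inhibition" is an arbitrary assumption rather than a criterion from this study.

```python
def percent_inhibition(yield_alone: float, yield_with_competitor: float) -> float:
    """Percentage drop in product yield caused by the competing substrate."""
    if yield_alone <= 0:
        raise ValueError("the yield without competitor must be positive")
    return 100.0 * (1.0 - yield_with_competitor / yield_alone)


def is_inhibited(yield_alone: float, yield_with_competitor: float,
                 threshold_percent: float = 20.0) -> bool:
    """Call inhibition when the yield drops by at least the (assumed) threshold."""
    return percent_inhibition(yield_alone, yield_with_competitor) >= threshold_percent


if __name__ == "__main__":
    # Hypothetical n-undecane yields (uM) without and with a competitor.
    print(percent_inhibition(10.0, 5.0))  # 50.0
    print(is_inhibited(10.0, 9.5))        # False
```

A negative value would indicate that the yield increased in the presence of the second aldehyde, which this simple measure does not attempt to interpret.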

Impact of engineered cADOs on distribution of fatty alkane profile in an E. coli cell factory
To demonstrate the importance of engineered ADO mutants for the distribution of the fatty alkane profile in vivo, V184F, which showed the highest activity for n-dodecanal, was introduced into an engineered E. coli strain that can produce high titers of n-dodecanoic acid [28]. It is known that fatty acid intermediates, such as fatty acyl-ACP or acyl-CoA, can be reduced by acyl-ACP or -CoA reductase into fatty aldehydes, which are further converted into fatty alk(a/e)nes by cADO [1]. Therefore, enhancing production of FFAs with certain chain lengths is essential for fatty alk(a/e)ne production in genetically engineered E. coli [32]. To achieve this, the following strategies were used: (1) The gene fadE was knocked out to block the fatty acid degradation pathway and thereby accumulate fatty acyl-CoA. (2) The gene fadD (an acyl-CoA synthetase) from E. coli was overexpressed to further boost fatty acyl-CoA yield. (3) The gene BTE encoding Umbellularia californica acyl-ACP thioesterase B was overexpressed to increase the abundance of medium-chain free fatty acids such as dodecanoic acid. (4) The gene acr1 encoding fatty acyl-CoA reductase (FAR) from Acinetobacter sp. M-1 was overexpressed to reduce the accumulated fatty acyl-CoA into the corresponding fatty aldehyde. (5) Wild-type cADO or the mutant V184F was overexpressed to produce fatty alk(a/e)nes. The effects of overexpression of wild-type cADO and V184F on fatty alk(a/e)ne production in the recombinant strain E. coli BL21(DE3) (ΔfadE) carrying BTE and fadD were investigated.
For the control strain, BL21 (ΔfadE), n-palmitic acid (C16:0) and n-stearic acid (C18:0) were the major FFAs (95 %) produced, together with trace quantities of n-dodecanoic acid (C12:0) and n-tetradecanoic acid (C14:0) (Table 4). The titers of n-dodecanoic acid of the recombinant strain LB99 (co-expression of BTE and fadD) were improved by 5.5-fold compared to the control strain, while those of n-tetradecanoic acid, n-palmitic acid and n-stearic acid were significantly reduced by 0.43-fold, 7.5-fold, and 23-fold, respectively (Table 5). A large quantity of n-dodecanol was also detected in LB99, which could be caused by some endogenous reductases in E. coli [33]. The engineered strains LB100 (co-expression of BTE, fadD, and the wild-type cADO gene) and LB101 (co-expression of BTE, fadD, and the cADO mutant V184F gene) produced similar titers of n-dodecanoic acid, which were higher than that of the control strain and lower than that of LB99. These results indicated that co-overexpression of fadD and BTE dramatically improved the titers of n-dodecanoic acid in BL21 (ΔfadE).

Table 3 Yields of n-undecane of WT and some cADO mutants against n-dodecanal in the presence of the competition substrates
Yields of n-undecane of WT and some cADO mutants against n-dodecanal (150 μM) in the presence of the competition substrates (2 mM n-heptanal or 150 μM n-octadecanal) were determined by GC-MS   Figure  S3).

Discussion
The carbon number distribution of fatty alk(a/e)nes varies in different fuels such as gasoline, diesel and jet fuel, and has important effects on the properties of fuels. Therefore, it is important to genetically control the carbon chain length of microbial hydrocarbons [27]. The available crystal structures of cADOs with fatty acids, fatty alcohols or substrate analogs bound have enabled structure-guided substrate specificity engineering of cADO [1,17,21,23]. In the current paper, we have identified some amino acids which might impact the substrate chain-length specificity of cADO through structural analysis of cADOs. All selected amino acids are adjacent to the aldehyde group or the hydrophobic tail of the substrate, or lie along the substrate pocket. We hypothesized that the chain-length selectivity of cADO might be changed, or at least substrate access would be influenced, when these amino acids were replaced with large ones. All mutants except F87Y showed greatly reduced or no activity (A118F) towards long-chain aldehydes (C14,16,18), and demonstrated a preference for <C14 aldehydes, supporting our hypothesis that replacement of the amino acids with large ones hinders access of long-chain substrates (≥C14) (Figs. 2, 3). The results are consistent with the relative positions of these amino acids along the substrate channel in the crystal structure (Fig. 1). Therefore, the size of the hydrophobic substrate channel was successfully decreased. Most of these amino acids could be useful for engineering cADO to synthesize fatty alk(a/e)nes (<C13).
Gly31 and Ala118 are close to the aldehyde group of the substrate. According to the orientation of the side chains of G31F and A118F predicted by PyMOL (Additional file 7: Figure S4), both mutants present their side chains approximately towards the C4 position of the bound ligand, consistent with their performance against C14,16,18 aldehydes. However, they performed differently towards other substrates such as C4,7,12 aldehydes. These results are not in agreement with the expected orientation of the side chains of G31F and A118F (Additional file 7: Figure S4). A118F had significant effects on substrate specificity, whereas G31F did not. Considering the relative positions and the different performance of G31F and A118F, it appears that amino acid 31 is in a more flexible position than amino acid 118. To achieve propane production by cyanobacteria, Ala118 is a good candidate for further protein engineering.
The mutants of the amino acids along the substrate channel, including I24Y, I27F, V28Y, F87Y, A121F, V184F, M193Y and L198F, behaved differently, and are discussed according to their positions relative to the bound substrate analog:
(1) Phe87 protrudes its side chain towards the C5 position. Given the small difference between the side chains of Phe and Tyr, it is understandable that introducing an additional hydroxyl group might not have a big impact on substrate specificity and activity, except for n-dodecanal (2.5-fold improvement).
(2) Ile27 and Val28 are in relatively similar positions, presenting their side chains towards the C7–C8 position of the substrate analog. However, the results demonstrated that mutating them into large residues caused some steric hindrance for long-chain substrates (≥C14), and the activities of I27F and V28Y against n-dodecanal are about twofold higher than that of WT. Khara et al. reported similar results for V41Y (the counterpart of V28Y of cADO-1593) and WT of cADO-PMT1231 against substrates with different chain lengths [21].
(3) Though the side chains of both Val184 and Leu198 point towards the C9–C10 position, V184F and L198F performed differently. Mutation of Leu198 into the large Phe might have a negative impact on substrate binding (>C9).
(4) The side chain of Ala121 points approximately towards the C10–C11 position of the substrate analog, which is consistent with the substrate selectivity of A121F. The results for A121F against the tested substrates are very similar to those for A134F of PMT1231 (corresponding to A121F of cADO-1593), and further proved the significance of this amino acid for improving the chain-length selectivity of cADO. A121F showed a preference for ≤C12 aldehydes, a higher preference for C4,6,7 aldehydes and the highest for n-hexanal (Figs. 2, 3; Table 1). The results suggested that A121F exerted great effects on both substrate preference and activity.
(5) Ile24 presents its side chain towards the C12–C13 position, and the results indicated that mutation of Ile24 into the large Tyr affected access of medium- to long-chain substrates (≥C9). I24Y showed a higher preference for n-heptanal.
(6) The side chain of Met193 points to the C13–C14 position, thus mutation of Met193 into the large Tyr could hinder binding of aldehydes (>C12) to the enzyme. M193Y showed the highest activity against n-heptanal.
Finally, it is worth pointing out that an inconsistency between predicted and actual chain-length selectivity was observed for some mutants such as I27F, V28Y, V184F, I24Y and M193Y.
Tyr21 and Cys70 are close to the hydrophobic tail of the substrate. Mutation of Tyr21 into the long and hydrophilic Arg impeded access of long-chain aldehydes (≥C12), whereas replacement of Cys70 with the large Phe hindered access of long-chain aldehydes (>C12) (Figs. 2, 3). Y21R did not show any preference towards the tested substrates, whereas C70F displayed the highest activity for n-hexanal. It has been reported that C71A/S (Cys71, equivalent to Cys70 of cADO-1593) of cADO from Nostoc punctiforme PCC 73102 reduced the hydrocarbon-producing activity of cADO and facilitated the formation of a dimer [22]. Based on the results and the positions of Tyr21 and Cys70 in the crystal structure, we predicted that both amino acids are possibly involved in the substrate entrance. However, according to the crystal structure of the complex of PMT1231 with the substrate (11-(2-(2-ethoxyethoxy)ethoxy)undecanal) (PDB code: 4PGI, L194A of PMT1231 complexed with the substrate), Marsh et al. observed a T-shaped region of electron density for the bound substrate, and suggested that the fork exiting close to Leu194 of PMT1231 might be the substrate entry point (Additional file 8: Figure S5) [23]. Our results seem not to support this. Meanwhile, Marsh et al. found that L194A of PMT1231 had similar kinetic properties to WT, implying that L194A does not play a key role in limiting substrate access to the active site. Thus, the fork of the T-shaped region for the complexed substrate occupying the cavity to bind fatty acids is the possible substrate entry point.
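The structural reasoning in this part of the Discussion can be summarized as a lookup of the carbon position each side chain faces, plus a deliberately naive steric rule: a bulky replacement is assumed to hinder any substrate long enough to reach that position. The sketch below encodes only the positions stated above; the rule itself and the function names are illustrative assumptions, and, as noted in the text, the actual selectivity of several mutants departs from such a prediction.

```python
# First and last carbon of the bound analog that each side chain points towards,
# as stated in the Discussion.
SIDE_CHAIN_POSITION = {
    "Gly31": (4, 4), "Ala118": (4, 4), "Phe87": (5, 5),
    "Ile27": (7, 8), "Val28": (7, 8),
    "Val184": (9, 10), "Leu198": (9, 10),
    "Ala121": (10, 11), "Ile24": (12, 13), "Met193": (13, 14),
}


def naive_longest_substrate(residue: str) -> int:
    """Longest chain that ends before the first carbon the side chain faces."""
    first_carbon, _ = SIDE_CHAIN_POSITION[residue]
    return first_carbon - 1


def naively_hindered(residue: str, chain_length: int) -> bool:
    """Naive rule: a bulky residue blocks substrates that reach its position."""
    return chain_length >= SIDE_CHAIN_POSITION[residue][0]


if __name__ == "__main__":
    for residue in ("Ile27", "Ala121", "Met193"):
        print(residue, naive_longest_substrate(residue), naively_hindered(residue, 12))
```

For Ile27 the naive rule predicts hindrance of n-dodecanal, whereas I27F was about twofold more active than WT on that substrate, which is one concrete instance of the mismatch between predicted and observed selectivity.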
Substrate competition experiments reflected the different binding affinities of the preferred and competition substrates (Tables 2, 3). It seems that WT binds n-octadecanal more tightly than n-butanal, and that WT has different binding affinities towards C7,8,9 aldehydes, which is consistent with the corresponding Km values of WT for them (Tables 1, 2, 3). The results for A121F and L198F suggested that both mutants showed improved binding affinities towards n-octadecanal and n-butanal. In comparison, the competition results for I24Y and M193Y indicated enhanced binding affinities towards n-octadecanal, while those for n-butanal did not change much. When n-dodecanal was used as the preferred substrate in the presence of the competition substrates n-octadecanal and n-heptanal, similar conclusions were drawn. Enhanced binding affinities for n-octadecanal and n-heptanal were observed for F87Y and I27F. V184F and V28Y showed increased binding affinities towards n-heptanal, whereas their binding affinities for n-octadecanal were not changed.
As discussed above, all mutants except F87Y exhibited lower activities against C14,16,18 aldehydes than WT (Fig. 2a, b), whereas V184F showed the highest activity for n-dodecanal among all mutants and comparable activity to WT towards n-heptanal (Fig. 2c, d). It seems that the replacement of Val184 with Phe had important effects on substrate binding with chain-length preference, but no big influence on enzymatic activity. Therefore, V184F was chosen for further investigation. It was introduced into engineered E. coli producing a high titer of n-dodecanoic acid to see if the carbon chain-length selectivity of cADO mutants could be used to control the carbon chain-length distribution of fatty alk(a/e)nes in vivo [28]. The results of fatty alk(a/e)ne production in genetically engineered E. coli suggested that cADO was successfully engineered for n-undecane production in E. coli and that introduction of the mutant V184F had a significant influence on the distribution of the fatty alk(a/e)ne profile in E. coli (Table 4). Thus, V184F could potentially be used for n-undecane production by genetically engineered microbial cell factories in the future.

Conclusions
Some amino acids which could affect the substrate specificity of cADO were identified based on the crystal structure of cADO with bound substrate analogs, and the corresponding mutants were kinetically characterized. The substrate preferences of some mutants towards different chain-length substrates were successfully enhanced through structure-oriented protein engineering. The in vivo experiments with V184F in genetically engineered E. coli demonstrated the impact of structure-guided engineering of cADOs on the distribution of the fatty alk(a/e)ne profile. The study would deepen our understanding of the structure-function relationship of cADOs, and provide a guide for designing cADOs to produce fatty alk(a/e)nes with certain chain lengths.

Bacterial strains, plasmids and media
E. coli DH5α and BL21(DE3) were used for routine DNA cloning and protein expression, respectively. E. coli strains were grown in LB broth or terrific broth media containing antibiotics at standard concentrations. Kanamycin (50 μg/mL) was added when required.

Construction of site-directed mutants
General molecular biology techniques were carried out by standard procedures [36]. Plasmid DNA was isolated using the Plasmid Mini Kit I. Site-directed mutants were constructed according to the standard QuikChange Site-Directed Mutagenesis protocol (Stratagene Ltd, La Jolla, California, USA) using pET28a-1593 as a template and the primers listed in Table S1 (Additional file 9: Table S2). The required mutations were confirmed by DNA sequencing.
For construction of double/triple/multiple mutants, pET28a-1593 harboring single or double or triple mutation(s) was used as a template following the same protocol as above.

Protein overexpression and purification
Wild-type cADO-1593 and the mutants were overexpressed in E. coli BL21(DE3) following the published procedure [13]. The plasmids were transformed into E. coli BL21(DE3) competent cells, and the cells were grown in terrific broth at 37 °C. When the OD600nm reached approximately 0.6, the cultures were induced with 1 mM IPTG and supplemented with 50 μM ferrous ammonium sulfate and 50 μg/mL kanamycin. The cells were grown for an additional 3.5 hours at 37 °C and 220 rpm before being harvested. The cells were then disrupted by sonication in binding buffer. The recombinant protein was washed using binding buffers containing an imidazole gradient (30 to 250 mM) at 4 °C. SDS-PAGE was performed on 12 % polyacrylamide gels with Coomassie Blue R-250 staining. To prepare apo-cADO-1593, the protein was dialyzed against buffer containing 1 mM EDTA and 1 mM NTA; stoichiometric amounts of ferrous ammonium sulfate were then added to reconstitute the diferrous form of cADO-1593 prior to assay. Proteins were concentrated, and the concentration was determined by the Bradford method using bovine serum albumin as a standard [36].

Synthesis of C14,16,18 aldehydes
C14,16,18 aldehydes were synthesized from the corresponding fatty alcohols according to the published procedures [11,37]. The synthesized products were confirmed by GC-MS.

Quantitation of nonvolatile C11,13,15,17 alkanes by gas chromatography-mass spectrometry (GC-MS)
Nonvolatile C11,13,15,17 alkanes were quantified by GC-MS. The analysis was performed on an Agilent 7890A gas chromatograph equipped with a split/splitless capillary inlet, an Agilent 5975C GC/MSD with Triple-Axis Detector and an Agilent 7683B automatic liquid sampler (ALS). An HP-WAX column (30 m × 0.25 mm × 0.25 µm) was used with the following oven temperature program: 40 °C held for 5 minutes, ramped to 240 °C at 25 °C min⁻¹, and held for 15 minutes. The injector temperature was 250 °C (splitless injection), and the carrier gas was helium at a flow rate of 1 mL min⁻¹.
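As a quick check on the oven program described above, the total run time per injection can be worked out from the hold times and the ramp. The following is a minimal Python sketch; the function name and the tuple layout are illustrative choices, not part of the published method.

```python
def run_time_min(initial_hold_min, ramps):
    """Total GC oven run time in minutes.

    ramps is a list of (start_C, end_C, rate_C_per_min, hold_min) tuples,
    where hold_min is the hold at the end temperature of that ramp.
    """
    total = float(initial_hold_min)
    for start_c, end_c, rate, hold_min in ramps:
        if rate <= 0:
            raise ValueError("ramp rate must be positive")
        total += abs(end_c - start_c) / rate + hold_min
    return total


# GC-MS program from the text: 40 °C for 5 min, 25 °C/min to 240 °C, hold 15 min
gcms_run = run_time_min(5, [(40, 240, 25, 15)])
print(f"GC-MS run time: {gcms_run:.1f} min")  # 5 + 200/25 + 15 = 28.0 min
```

The ramp from 40 °C to 240 °C takes 8 minutes, so each injection occupies the oven for 28 minutes before cool-down.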

Gas chromatography detection of volatile C3,5,6,7,8 alkanes
The C3,5,6,7,8 alkane products were quantified by analyzing the headspace of the reactions using gas chromatography (GC). Reaction mixtures were shaken at 37 °C and 200 rpm unless specified otherwise. Reactions producing propane and pentane were performed at room temperature without shaking because of the low boiling points of these products. At time intervals (0, 1, 2.5, 5, 7.5 min for >C4 aldehydes and 0, 1, 2.5, 3.5, 5 min for n-butanal), the reactions were terminated by placing them on ice. All assays were performed in triplicate.
Detection and quantification of the alkane products were performed on a Varian 3800 GC equipped with an HP-INNOWAX column (30 m × 0.25 mm × 0.25 μm). The column temperature was programmed as follows: 63 °C held for 6 minutes (for detection of propane and pentane), or 63 °C held for 1 minute and then ramped to 120 °C at 20 °C min⁻¹ (for detection of n-hexane, n-heptane and n-octane). The FID temperature was set at 200 °C and the injector temperature at 200 °C (20:1 split). The carrier gas was helium at a flow rate of 1 mL min⁻¹. Pure alkane standards were used to identify and quantify each alkane.
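A time course sampled at the intervals given above is commonly reduced to an initial rate by fitting a straight line to product concentration versus time, and an apparent turnover number is then obtained by dividing by the enzyme concentration. The Python sketch below illustrates that calculation under stated assumptions: the product concentrations and the enzyme concentration are hypothetical placeholders, and the assumption that all time points lie in the linear range is ours, not a statement of how the authors processed their data.

```python
def initial_rate(times_min, product_uM):
    """Least-squares slope of product (µM) versus time (min)."""
    n = len(times_min)
    if n < 2 or n != len(product_uM):
        raise ValueError("need at least two matched time/product points")
    mean_t = sum(times_min) / n
    mean_p = sum(product_uM) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (p - mean_p)
              for t, p in zip(times_min, product_uM))
    return sxy / sxx


def apparent_kcat(rate_uM_per_min, enzyme_uM):
    """Apparent turnover number (min^-1) at the substrate level used."""
    return rate_uM_per_min / enzyme_uM


times = [0, 1, 2.5, 5, 7.5]             # sampling times from the text (>C4 aldehydes)
product = [0.0, 2.0, 5.0, 10.0, 15.0]   # hypothetical alkane concentrations, µM
rate = initial_rate(times, product)
kcat_app = apparent_kcat(rate, enzyme_uM=10.0)  # hypothetical enzyme concentration
print(f"rate = {rate:.2f} µM/min, apparent kcat = {kcat_app:.2f} min^-1")
```

With triplicate assays, the same fit would be applied to each replicate and the resulting rates averaged.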

Fatty alk(a/e)ne production and analysis in genetically engineered E. coli strains
A single colony was cultured in LB medium overnight and then inoculated into modified mineral medium at 30 °C [1]. Cells were grown in the presence of kanamycin (25 mg/mL for pLB1593-acr1 and pLB1593-V184F-acr1), ampicillin (50 mg/mL for pKC11) and chloramphenicol (17 mg/mL for pAL143). The PBAD and PT7 promoters were induced with 0.4 % L-arabinose and 0.5 mM isopropyl β-D-thiogalactoside, respectively, at an OD600nm of 0.6-0.8. Cell cultures were induced for 24 hours.
Cell cultures were then mixed thoroughly with an equal volume of chloroform-methanol (2:1, v/v), together with n-pentadecanol, n-eicosane and n-heptadecanoic acid as internal standards [34]. Cells were prepared and analyzed for alk(a/e)ne production as described earlier [34]. The injector temperature was set at 250 °C and the column temperature was programmed as follows: 100 °C for 1 minute, an increase of 5 °C/min to 200 °C, then an increase of 25 °C/min to 240 °C, held for 15 minutes.
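Quantification against an internal standard, as used in the extraction above, scales the analyte peak area by the peak area of the standard and by the known amount of standard added. The Python sketch below shows the generic calculation; the peak areas, the amount of standard and the default response factor of 1.0 are hypothetical values for illustration, and in practice the response factor would be calibrated with pure standards.

```python
def quantify_by_internal_standard(area_analyte, area_standard,
                                  amount_standard, response_factor=1.0):
    """Analyte amount, in the same units as amount_standard.

    response_factor is the detector response of the analyte relative to
    the internal standard; 1.0 assumes equal response (an assumption).
    """
    if area_standard <= 0 or response_factor <= 0:
        raise ValueError("standard area and response factor must be positive")
    return (area_analyte / area_standard) * amount_standard / response_factor


# Hypothetical example: n-undecane peak versus an n-eicosane standard peak
undecane_mg_per_l = quantify_by_internal_standard(
    area_analyte=5000.0,     # hypothetical peak area
    area_standard=10000.0,   # hypothetical peak area
    amount_standard=20.0,    # hypothetical amount added, mg/L
)
print(f"n-undecane: {undecane_mg_per_l:.1f} mg/L")
```

Because the standard passes through the same extraction and injection as the analyte, this ratio compensates for losses and injection-volume variation that would otherwise bias absolute peak areas.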

Circular dichroism spectroscopy
Far-UV CD spectra (190 to 260 nm) were measured for protein samples (0.12 mg/mL) in 10 mM potassium phosphate buffer (pH 7.2) on a Jasco J-810 spectropolarimeter at 25 °C. Data were averaged over three runs and the background was subtracted.
Secondary-structure analyses were performed with the BeStSel method [38], which is available at the bestsel.elte.hu server.
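Before secondary-structure analysis, raw CD signals are typically averaged, baseline-corrected and converted to mean residue ellipticity using [θ] = θ(mdeg) × MRW / (10 × l × c), with the path length l in cm and the concentration c in mg/mL. The Python sketch below illustrates these steps. The protein concentration of 0.12 mg/mL comes from the text; the signal values, the path length of 0.1 cm and the mean residue weight of 110 Da are assumptions made only for the example.

```python
def average_and_subtract(scans, baseline):
    """Average replicate scans point by point, then subtract the buffer baseline."""
    n = len(scans)
    return [sum(values) / n - b for values, b in zip(zip(*scans), baseline)]


def mean_residue_ellipticity(theta_mdeg, mrw_da, path_cm, conc_mg_per_ml):
    """Mean residue ellipticity in deg cm^2 dmol^-1."""
    if path_cm <= 0 or conc_mg_per_ml <= 0:
        raise ValueError("path length and concentration must be positive")
    return theta_mdeg * mrw_da / (10.0 * path_cm * conc_mg_per_ml)


# Three hypothetical scans (mdeg) at two wavelengths, plus a buffer baseline
scans = [[-11.0, -12.0], [-12.0, -13.0], [-13.0, -14.0]]
baseline = [0.0, -1.0]
corrected = average_and_subtract(scans, baseline)

mre = [mean_residue_ellipticity(theta, mrw_da=110.0,      # assumed MRW
                                path_cm=0.1,              # assumed cuvette
                                conc_mg_per_ml=0.12)      # from the text
       for theta in corrected]
print([round(value) for value in mre])
```

A corrected signal of -12 mdeg under these assumptions corresponds to roughly -11,000 deg cm² dmol⁻¹, which is in the range expected for a largely helical protein.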

Substrate competition assays for some engineered proteins and WT
In the substrate competition assays, the preferred substrate of each mutant was added together with a shorter- or longer-chain aldehyde as the competing substrate.
For mutants A121F and I24Y, n-heptanal (2 mM) was added together with equimolar n-butanal or with n-octadecanal (150 μM). The apparent kcat values of the engineered proteins for n-heptanal were determined as above. n-Octanal and n-nonanal were employed in the same way to evaluate the substrate preference of M193Y and L198F, respectively. In the assays for mutants V184F, F87Y, I27F and V28F, n-dodecanal was used together with n-heptanal (2 mM) or n-octadecanal (150 μM). n-Decane was employed as an internal standard to quantify the production of n-undecane. The composition of the reaction mixture and the reaction time were the same as those in the enzyme assays.