Core and pan-genome of Shewanella
A large proportion (94%) of genes from 24 Shewanella strains were grouped into 7830 homologous clusters (Additional file 1: Table S1). The pan-genome possessed 13,406 gene families, members of gene families in these strains were divided into three categories (core, accessory, specific) based on their appearance in different genomes. Among these gene families, 1878 families existed in all 24 genomes and hence represented the core gene complements (core genome). And 1788 families from core genome harbored only one gene from each strain (single-copy core families). Although these strains shared a core gene set, there were individual differences among the subsets of genes. The 5801 families existed in only one genome comprising the unique genes or singletons (strain-specific genome) and the remaining 5727 families (accessory genome) were present in more than one, but not all 24 genomes. The number of non-redundant strain-specific genes across different genomes varied from 58 to 782, and S. piezotolerans WP3 had the largest number of unique gene families (782) while S. sp. MR-4 possessed the smallest number of unique gene families (58) (Fig. 1a). This was consistent with the previous study that S. piezotolerans WP3 had the largest genome size among the sequenced Shewanella genomes [26]. The large proportion of specific genes suggested that the Shewanella strains harbored a high level of genomic diversity and uniqueness of each strain, showing their ability to survive in different environments. The size of pan-genome got large unboundedly with the increase of new genomes even including 13,406 non-redundant genes (Fig. 1b), which indicated that the Shewanella pan-genome was still “open”. This open pan-genome showed great potential for discovering novel genes with more Shewanella strains sequenced. In contrast to the pan-genome, the size of core genome appeared to reach a steady-state approximation after including 1878 non-redundant genes. In addition, only 14% of the pan-genome was found to be kept constant, while the remaining 86% was variable across the strains, which indicated that the pan-genome exhibited a high level of genome variability.
Phylogeny of Shewanella
To gain insights into similarity and distance of the strains, two phylogenetic trees were constructed, one was based on the concatenated alignment of 1788 single-copy core genes (Fig. 1c), and another was based on absence or presence of each gene families (Fig. 1d). In the two trees, strains were grouped into two clades. The first clade was made up of most sediment strains, and the second one consisted mostly of water strains. S. loihica PV-4, isolated from iron-rich microbial mats [27], was phylogenetically distant from the rest strains, suggesting it had a low degree of similarity within core genes to the other strains. The separation tendency of different subgroups bore a high resemblance in the two trees in spite of the strains having different relative positions. S. frigidimarina NCIMB 400 showed much evolutionary relatedness to the S. denitrificans OS217 on the single-copy core gene tree, but they were no longer the sisters on the pan-genome tree, which indicated that non-core genes were likely to make them diverged. In addition, the S. loihica PV-4 was the sister with the S. amazonensis SB2B on the pan-genome tree, but had substantial distance on the single-copy core gene tree, suggesting that the variable genes made up a dominant proportion of the phylogenetic signal and played a role in the evolution of these two strains. It was also noted that those divergent strains, including S. frigidimarina NCIMB 400, S. denitrificans OS217, S. amazonensis SB2B, S. violacea DSS12, S. woodyi ATCC 51908, S. sediminis HAW-EB3, S. halifaxensis HAW-EB4, S. pealeana ATCC 700345, S. piezotolerans WP3 and S. loihica PV-4, contained more specific genes. Hence, it was likely that the larger differences of genes lead to evolutionary divergence, which suggested that different gene gain and loss might play important roles in the evolution of some Shewanella strains and make them apart from their relatives.
CAZyme identification and profiling
Carbohydrate metabolism was one of the most important metabolic activities [28]. Members of Shewanella can anaerobically transfer electrons from cell metabolism to versatile electron acceptors [6]. The process of obtaining electrons from carbohydrates by Shewanella was the hydrolysis of complex carbohydrates present in biomass [29]. This was achieved through the presence of a repertoire of secreted or complexed carbohydrate active enzymes (CAZymes). To further understand the mechanism of polysaccharide hydrolysis in Shewanella, we identified the CAZymes in pan-genome. These strains contained an abundance of glycosyltransferases (GT), glycoside hydrolases (GH), carbohydrate esterases (CE), carbohydrate binding molecules (CBM), and a small number of auxiliary activities (AA) and polysaccharide lyases (PL) (Fig. 2a). In addition, these CAZymes were more abundant in accessory and specific genome than in core genome. This diversification of the Shewanella CAZyme-encoding genes allowed for different electron-transporting roles in cellular metabolism. In addition, strains varied in their number of unique CAZyme-encoding genes, suggesting the metabolic abilities of carbohydrates were specialized among Shewanella strains. There were 18 strains possessing specific genes for CAZymes, and GTs and GHs were most abundant (Fig. 2b).
Additionally, we found that strains isolated from the water seemed to contain less specific CAZyme-encoding genes than other strains from more diverse environments. The diversity of such specific CAZyme-encoding genes was important to the strategies of carbohydrate metabolism that would result in metabolism specialization and environment adaptation. The S. woodyi ATCC 51908 had the most strain-specific CAZymes (28), followed by S. violacea DSS12 (17) and S. sediminis HAW-EB3 (13). The three strains had more strain-specific CAZyme-encoding genes and they were isolated from detritus or sediment. Such high number of strain-specific CAZyme-encoding genes might contribute to their metabolic diversity in more diverse environments. Furthermore, GTs were the most common in the unique genes, such patterns suggested that GTs might be important in the environmental adaptation of these strains. However, GEs, CBMs and GHs were the most abundant among the S. woodyi ATCC 51908 unique genes, suggesting that GEs, CBMs and GHs might be important in the detritus adaptation of S. woodyi ATCC 51908.
Functional profiling of pan-genome
To gain insights into functional diversity in these strains, the pan-genome from the 24 genomes was also compared using functional gene ontology categories. All of the GO terms fell into three categories: biological processes (BP), molecular function (MF) and cellular component (CC) (Fig. 3a). Genes with biosynthetic process, transport and signal transduction were more abundant in the BP category. Genes related to ion binding, DNA binding and oxidoreductase activity were enriched in MF category. For the CC category, genes involved in cytoplasm, intracellular and protein complex were dominant. Pan-genome processed a great number of genes involved in various metabolic pathways and enzymes. The presence of so many carbon and energy utilization pathways, material transport and synthesis systems was a reflection of the high-pressure and deep-sea extreme environment. In addition, the existence of so many genes involved in transport, transmembrane transport and ion binding provided supports for iron reduction. The vast majority of genes in the core genome were involved in housekeeping functions, such as biosynthetic process, metabolic functions, cytoplasm, intracellular and ribosome (Fig. 3a). These categories were also present, but hardly appeared in the accessory and specific genome, whereas the accessory and specific genes were enriched in transport, signal transduction, binding and enzymatic activity (Fig. 3a). The non-core genes could be important for adaptation to specific habitats. Such functional enrichment patterns of accessory and specific genes reflected their notable respiratory diversity.
In addition, cluster of orthologous group (COG) functions of the pan-genome were compared. Genes with predicted functions and unknown functions were more abundant in the pan-genome (Additional file 2: Figure S1). The majority of genes in the core genome were involved in ‘Energy production and conversion’ (C), ‘Amino acid transport and metabolism’ (E) and ‘Translation, ribosomal structure and biogenesis’ (J) (Additional file 2: Figure S1). Whereas the accessory and specific genome had a slight increase in genes involved in ‘Amino acid transport and metabolism’ (E), ‘Transcription’ (K) and ‘Signal transduction mechanisms’ (T) (Additional file 2: Figure S1).
To reveal the uniqueness of the strains, we detected several specific genes that might be related to the survival environment. Among the strain-specific genes in pan-genome, S. loihica PV-4 had a gene encoding menaquinone biosynthesis protein MenD that facilitated electron transfer. Previous study suggested that menaquinone was the electron transporter in the respiratory chain and was essential for the survival of S. loihica PV-4 [27]. This menD gene did not exist in other strains, which might be important for the survival of S. loihica PV-4 in the iron-rich environment. In addition, S. woodyi ATCC 51908, S. pealeana ATCC 700345, S. sediminis HAW-EB3 and S. halifaxensis HAW-EB4 possessed one or two strain-specific genes encoding nitrite reductase that were involved in nitrogen oxide reduction. The S. amazonensis SB2B, also had a specific gene encoding glutamine synthetase that was a key enzyme of nitrogen metabolism. It was reported that Shewanella was capable of nitrate reduction using nitrate as the electron acceptor under certain conditions [30]. These five strains had such strain-specific genes related to nitrogen utilization, which indicated that they had characteristics that were more complex in response to nitrogen. In addition, S. amazonensis SB2B and S. sediminis HAW-EB3 possessed genes encoding inorganic diphosphatase. The specificity of this enzyme has been reported to vary with the source and the activating metal ion [31]. S. violacea DSS12 had a specific gene-encoding related cytochrome oxidase that was responsible for oxidative phosphorylation. This gene was adjacent to other electron transfer genes in S. violacea DSS12 genome, suggesting that it might interact with those adjacent genes and function in the related pathways.
Selective pressure analysis
To investigate conservation and evolution of housekeeping genes, functional diversification and evolutionary pressure of single-copy core genes were detected. In addition, the non-synonymous (dN) to synonymous (dS) substitution rates (dN/dS) were estimated for each single-copy core family. The analysis showed that all of the single-copy core genes encountered a strong purifying selection (Fig. 3b). Such an important pattern of strong purifying selection indicated that the single-copy core genes were highly conservative and purifying selection contributing to their long-term stability.
The combination of selective pressure and the functional categories could reveal the functions involved in rapid evolution. We estimated the strength of purifying selection for single-copy genes in different functions. Genes tended to exhibit different degrees of conservative evolutionary directions, and some of them experienced weaker purifying selection especially those involved in translation, protein folding and structural constituent of ribosome (Fig. 3b). However, genes involved in biological processes of cellular component assembly evolved under strongest purifying selection. For different categories of molecular function, genes undergoing the strongest purifying selection were those involved in DNA binding transcription factor activity, while genes involved in structural constituent of ribosome underwent more relaxed purifying pressure. Furthermore, genes involved in ribosome exhibited higher evolutionary rates, while genes involved in plasma membrane exhibited lower evolutionary rates. These results suggested that genes undergoing higher purifying selection, played dominant roles in the evolutionary rates among the Shewanella strains, making them retain the original functional process of cellular component assembly, plasma membrane, etc. This observation of purifying selection in single-copy core genes suggested that purifying selection might drive evolution in the essential life functions across all 24 Shewanella strains.
Gene gain and loss
Gene gain and loss during evolution can increase the fitness of bacteria within habitats [32, 33]. To obtain deeper insight into the evolution of gene families over a phylogeny, we performed an analysis of gene expansion and contraction at each branch of speciation. Diversification at branches was associated with a large number of changes in gene family size across the Shewanella tree. The 3594 gene families were more likely to have been inherited from the most recent common ancestor (MRCA) of Shewanella, and 1805 of which (50.22%) families experienced size changes (Fig. 4a). In the tree, each strain branch of the phylogeny has changed in evolution (Fig. 4a). In addition, S. loihica PV-4, the most distantly related strain based on single-copy phylogeny, had only six changed genes. Furthermore, contractions outnumbered expansions on all branches except the branch of S. loihica PV-4, S. putrefaciens CN-32, and S. baltica OS195. An average of 15 gene families was under expansion, whereas 123 gene families contracted in each strain branch, indicating that loss function played an important role in dynamic evolution. The gene families expanded in those branches of strains were mainly enriched in categories of oxidation–reduction process, catalytic activity and membrane. (Additional file 3: Figure S2A). Gene enrichment analysis showed that these strains have lost the majority of genes involved in oxidation–reduction process, DNA binding and membrane (Additional file 3: Figure S2B). The enrichment of changed genes in reduction process and membrane seemed to be ecologically important and contribute to the successful adaptation of the strains.
Due to recent acquisition and deletion, the gene that is not in the MRCA is more variable than the core gene [34]. HGT is the main driver of bacterial evolution and adaptation [35]. To infer the recent evolutionary dynamics of Shewanella on a larger scope of strains, we examined all horizontally acquired genes and obtained 1800 genes that were of horizontal origin and the number of HGT-origin genes detected each strain was different (Fig. 4b). Gene transfer occurred in many genes, which indicated that horizontal transfer genes played an important role in the gene flow and acquisition and contributed to the open pan-genome of the Shewanella. Among these strains, S. baltica OS185 appeared to gain the largest number of genes, whereas S. loihica PV-4 did not detect the horizontal gene. S. loihica PV-4 showed closest evolutionary relatedness in core gene similarity to the nearest common ancestor, and the numbers of gene expansion, contraction and HGT genes were minimum, probably because the strain’s living environment is similar to the ancestor. To infer the contribution of donors to horizontal gene transfer, we identified donors for these horizontal transfer genes (Fig. 4c, Additional file 4: Table S2). These transferred genes appeared to originate from members of Vibrionales and Enterobacteriales family, particularly the genera Vibrio, which suggested that HGT events were more effective to occur in closely related organisms rather than in distantly related organisms. In addition, these HGT-origin genes were enriched in DNA metabolic process, transposition and DNA binding (Additional file 3: Figure S2C). The acquisition of genes in DNA metabolic process, transposition and DNA binding could aid in their survival in different environments. For example, the S. woodyi ATCC 51908, S. sediminis HAW-EB3, S. sp. MR-4 and S. sp. MR-7 acquired two or three genes encoding the nitrate reductase. The S. woodyi ATCC 51908, S. sediminis HAW-EB3, S. baltica OS185, S. baltica OS223 and S. piezotolerans WP3 strains acquired a gene encoding glutamine synthetase. These genes were involved in the nitrogen cycle, which was apparently important for the successful adaptation. These results suggested that HGT was an important driver of the evolution of these genomes. Therefore, the available gene pool for HGT could explain the difference in non-MRCA genomes and the increased pan-genome size of these strains. These findings suggested that gain and loss of genes that were apparently accessory for the majority of the strains of a genus might be a successful strategy for biological diversification, rapid evolution and environmental adaptation.
Evolution of the mtr clusters in Shewanella
Previous studies on S. oneidensis MR-1 indicated that genes within a cluster (mtrBAC–omcA–mtrFED) were involved in the metal-reducing pathway, while the S. denitrificans OS217 did not possess this cluster [17, 36]. In particular, this cluster comprised seven genes that were clustered in sequential order of mtrD–mtrE–mtrF–omcA–mtrC–mtrA–mtrB [37]. The feoA–feoB operon involved in ferrous iron transport was common to all strains and adjacent to the mtr–omc cluster. To explore new features of metal reduction pathways, we compared this kind of mtr–omc gene cluster in these 24 strains. Although genes within this cluster shared a high degree of similarity with the homologues described in S. oneidensis MR-1, the number and composition carried out by different Shewanella strains were still changed (Fig. 5a). The complete mtrBAC–omcA–mtrFED cluster was conserved in 15 out of the 24 strains. In addition, the mtrABC operon was conserved in all the genomes except for S. denitrificans OS217 and S. violacea DSS12. The mtrDEF operon, homologues of mtrABC, was virtually absent from S. sp. W3-18-1, S. putrefaciens CN-32, S. putrefaciens 200, S. frigidimarina NCIMB 400, S. denitrificans OS217, S. violacea DSS12 and S. woodyi ATCC 51908. The mtr–omc clusters of 17 strains were considerably different from that of S. oneidensis MR-1 since there were gene duplication, gene loss, or new gene gains within it. Such change of this cluster reflected the evolutionary history and possibly metal respiratory specialization, which agreed well with the previous study that the absence of genes in this mtr–omc cluster would result in slower iron reduction [38]. For example, S. denitrificans OS217 and S. violacea DSS12, which both lacked the entire cluster, showed limited anaerobic growth capacity that may be due to gene loss in the process of ecological specialization [4]. In addition, expansion of omcA gene occurred in S. amazonensis SB2B, S. sediminis HAW-EB3, S. pealeana ATCC 700345 and S. loihica PV-4, which reflected gene acquisition and dynamic evolution in metal respiration. The tree of genes within the cluster and the cymA gene showed that each gene formed a group with its orthologs from different strains (Additional file 5: Figure S3). Branches of each gene clustered into two distinct groups, which imply two distinct evolutionary paths for different strains. In addition, the evolutionary relationship of omcA between these strains was determined according to their genetic characteristics. These S. baltica strains showed much evolutionary relatedness in each gene, and the omcA genes from multi-locus in S. loihica PV-4, S. sediminis HAW-EB3 and S. pealeana ATCC 700345 formed the relatively independent branch, showing the duplication events (Additional file 6: Figure S4).
The presence of mtr–omc cluster in Shewanella was supposed to correlate with its deep-sea habitat, this cluster presumably shared with other deep-sea microorganisms [39]. To examine extensive genetic exchange between dissimilatory metal-reducing bacteria, the genetic occurrence of the mtr homologues was predicted (Fig. 5b). The apparent widespread distribution of metal-reducing pathways in other bacteria indicated the importance of electron-transfer pathway in microbial oxidation–reduction of iron [37]. The cluster facilitating the iron respiration was conserved among closely related species such as Ferrimonas balearica and Vibrio vulnificus, and showed the same sequential order of mtrC–mtrA–mtrB, indicating that the order of the homologues genes was highly conserved. This was in accordance with the previous study that homologues of the metal-reducing pathway were found in a series of other dissimilatory metal-reducing bacteria and might have co-evolved in these bacteria [40]. The protein products of these mtr genes worked together to facilitate electron transfer across the cell envelop [37], which had been recognized as an early form of respiration, although it was widespread among the bacteria [41], there were still fewer strains containing complete orthologous genes of metal reduction system. The genes mtrA and mtrD were present in most strains, but few mtrBCEF and omcA genes were present. The lack of apparent mtr genes in the most metal-reducing strains was consistent with previous studies that the electron-transfer pathways used by metal-reducing strains for extracellular reduction of Fe(III)-containing minerals had evolved independently [17].
The metal-reducing pathway was generally considered as one of the most ancient microbial metabolisms on earth a long time ago [39]. Previous study had described that Shewanella contained regions, which were likely to be horizontally acquired from members of the Enterobacteriaceae family bacteria [4]. To infer the evolutionary relationships of mtr–omc cluster on a larger scope of strains, individual phylogenetic trees of mtr genes from different organisms were constructed (Additional file 7: Figure S5). Phylogenetic trees of mtrE, mtrF, mtrA and mtrD showed that Shewanella strains exhibited very similar evolutionary relatedness among each other and were relatively independent from other bacteria. However, several Shewanella strains were scattered to different branches in phylogenetic trees of mtrB, mtrC, omcA (Additional file 7: Figure S5). These mtr genes of Shewanella strains showed evolutionary relatedness to Ferrimonas species, probably because of the extensive horizontal gene transfer events (Fig. 6a). In addition, the differences of the metal-reducing pathway of Shewanella strains might be due to the result of multiple gene gains and loss events during the evolution. As these isolates originated from diverse geographic locations and habitats, such as marine water, sediments and subsurface, they carried out a diverse range of metabolic processes (Fig. 6b). And the mtr genes had changed during the evolution since they were under strong purifying selection (Fig. 6c). The strong purifying selection played a key role in selecting for such iron respiration changes in their evolution. The mtrC had lower similarity with the homologues compared with other genes, underwent more diversifying selection. Besides, due to the constraint of purifying selection, some of the mtr experienced gene loss and gene duplication, as well as new gene gains (Fig. 6d). In the metal-reducing pathway, electrons from the inner membrane passed through the periplasm and across the outer membrane to the extracellular minerals via the protein components encoding by mtr–omc cluster [42] (Fig. 6e). With the gain and loss of genes, genes of mtr–omc cluster were no longer core genes, and the ability for Shewanella to adapt to environmental conditions by metal-reducing pathway was presumably no longer essential for survival. Another possible explanation could be that other pathways had been developed to utilize different types of electron acceptors during their evolution to facultative anaerobic environments [43] (Fig. 6f). In particular, mtr genes experienced additional contraction events with some of them lost during evolution, which indicated that respiration had become less competitive as the environment changed.