Fig. 4 | Biotechnology for Biofuels

Fig. 4

From: Peptide-based functional annotation of carbohydrate-active enzymes by conserved unique peptide patterns (CUPP)

Flow diagram of the CUPP program. A CAZy protein family may be processed either as full-length proteins or as the domain regions alone (identified using dbCAN). A CD-HIT cluster file can be supplied to reduce the number of protein sequences used in clustering: one representative sequence is selected from each cluster of highly similar sequences. The proteins are clustered based on conserved peptides, and each resulting group yields a CUPP group consisting of the peptides conserved within that group. The distances between the individual proteins and between the individual groups are saved as two separate dendrogram files (the distances between the proteins are also saved in Newick tree format for use with other programs). The CUPP group peptides are then used in CUPP protein prediction to assign a CUPP group to the query protein. In addition, the CAZy family, CAZy subfamily and EC function associated with that CUPP group are annotated to the query protein.
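
The last two stages of the flow (building a peptide set per group, then matching a query protein against those sets) can be illustrated with a toy sketch. This is not the CUPP implementation: the fixed peptide length `k`, the "present in every member" conservation rule, the simple shared-peptide count used as a score, the sequences, and the annotation labels are all hypothetical choices made for illustration. The clustering step itself is skipped, and group membership is assumed to be known.

```python
from collections import Counter


def peptides(seq, k=3):
    """Return the set of all overlapping k-residue peptides in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_group(members, k=3, min_fraction=1.0):
    """Collect peptides found in at least min_fraction of the member sequences."""
    counts = Counter()
    for seq in members:
        counts.update(peptides(seq, k))  # each peptide counted once per member
    threshold = min_fraction * len(members)
    return {p for p, n in counts.items() if n >= threshold}


def predict(query, groups, k=3):
    """Return (group name, score) for the group sharing the most peptides."""
    q = peptides(query, k)
    scores = {name: len(q & peps) for name, peps in groups.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy sequences, already assigned to groups.
members_by_group = {
    "groupA": ["MKTAYIAKQR", "MKTAYIAKQL"],
    "groupB": ["GSHMLEDPVD", "GSHMLEDPVE"],
}

# Hypothetical placeholder annotations carried by each group.
annotations = {
    "groupA": {"family": "FAM1", "subfamily": "FAM1_1", "ec": "EC-placeholder-1"},
    "groupB": {"family": "FAM2", "subfamily": "FAM2_1", "ec": "EC-placeholder-2"},
}

groups = {name: build_group(seqs) for name, seqs in members_by_group.items()}

# Assign the query to a group, then inherit that group's annotations.
best, score = predict("AAMKTAYIAKAA", groups)
query_annotation = annotations[best]
print(best, score, query_annotation)
```

The point of the sketch is the data flow: once a query is matched to a group through shared peptides, the family, subfamily and function labels attached to that group transfer to the query.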
