De novo prediction of the genomic components and capabilities for microbial plant biomass degradation from (meta-)genomes

Table 2 Accuracy of classifying microbes as lignocellulose-degraders or non-degraders

	Presence/absence of Pfam domains	Weighted Pfam domain representation	Presence/absence CAZy family representation			Weighted CAZy family representation		
			A	B	C	a	b	c
nCV macro-accuracy	0.91	0.84	0.90	0.96	0.94	0.91	0.93	0.87
nCV recall	0.86	0.73	0.81	0.94	0.90	0.88	0.88	0.79
nCV true negative rate	0.96	0.96	0.98	0.98	0.98	0.95	0.98	0.95

L1-regularized SVMs were trained with Pfam domain or CAZy family annotations of (meta-)genomes. Uppercase letters denote classifiers trained on the presence or absence of CAZy families; lowercase letters denote classifiers trained on the relative abundances of CAZy families in the annotations. Classifiers “A” and “a” were trained with annotations of all CAZy families for 16 microbial genomes. Classifiers “B” and “b” were trained with annotations of all CAZy families except the GT family members (which were not annotated for the Tammar Wallaby (TW) metagenome), for the 16 genomes and the TW metagenome of plant biomass degraders. Classifiers “C” and “c” were trained with annotations of the GH families and CBMs for the 16 microbial genomes and three metagenomes of plant biomass degraders, as only these families were annotated for the metagenomes. All CAZy-based classifiers were trained with the available annotations for 64 genomes of non-biomass degraders. The Pfam-based classifiers were trained with 21 (meta-)genomes of biomass degraders and 82 microbial genomes of non-degraders. For more details on the experimental set-up and the evaluation measures shown, see the Methods section on performance evaluation.
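
As an illustration of how the three measures in Table 2 relate to one another, the following minimal Python sketch computes recall, true negative rate and macro-accuracy from binary labels. It assumes that macro-accuracy is the unweighted mean of recall and true negative rate; this assumption agrees with every column of the table to within rounding, but the authoritative definition is the one given in the Methods section. The function name `binary_rates`, the label encoding (1 for degrader, 0 for non-degrader) and the toy labels are hypothetical and are not taken from the original study.

```python
def binary_rates(y_true, y_pred):
    """Return (recall, true_negative_rate, macro_accuracy) for 0/1 labels.

    Assumption: 1 = lignocellulose degrader, 0 = non-degrader, and
    macro-accuracy = mean of recall and true negative rate.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # degraders found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # degraders missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-degraders kept
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    recall = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return recall, tnr, (recall + tnr) / 2


# Toy labels (hypothetical): 4 degraders, 6 non-degraders.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
recall, tnr, macro_acc = binary_rates(y_true, y_pred)

# Consistency check against Table 2: (recall, true negative rate,
# reported macro-accuracy) for each of the eight classifier columns.
table2 = [
    (0.86, 0.96, 0.91), (0.73, 0.96, 0.84),
    (0.81, 0.98, 0.90), (0.94, 0.98, 0.96), (0.90, 0.98, 0.94),
    (0.88, 0.95, 0.91), (0.88, 0.98, 0.93), (0.79, 0.95, 0.87),
]
max_gap = max(abs((r + t) / 2 - m) for r, t, m in table2)
```

Because the two classes are averaged with equal weight, the measure is not dominated by the larger non-degrader class (64 or 82 genomes versus 16 to 21 degrader (meta-)genomes), which is a plausible reason for reporting it alongside recall and true negative rate.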

ISSN: 2731-3654