Genomics of Gene Expression

Microbes continually shape Earth’s biochemical and physical landscapes by inhabiting diverse metabolic niches. Despite the important role microbes play in ecosystem functioning, most microbial species remain unknown, highlighting a gap in our understanding of structured, complex ecosystems. To elucidate the relevance of these unknown taxa, often referred to as “microbial dark matter,” we integrated multiple high-throughput sequencing technologies to evaluate the co-occurrence and connectivity of all microbes within the community. Because there are no standard methodologies for multi-omics integration of microbiome data, we evaluated the abundance of “microbial dark matter” in microbialite-forming communities using different types of meta-omic datasets: amplicon, metagenomic, and metatranscriptomic sequencing data previously generated for this ecosystem. Our goal was to compare the community structure and abundances of unknown taxa across the different data types rather than to perform a functional characterization of the data. Metagenomic and metatranscriptomic data were input into SortMeRNA to extract 16S rRNA gene reads. The output, together with the amplicon sequences, was processed through QIIME2 for taxonomic analysis. The R package mdmnets was used to build co-occurrence networks. Most hubs had unknown classifications, even at the phylum level. Comparing the highest-scoring hubs of each data type using sequence similarity networks allowed us to identify the most relevant hubs within the microbialite-forming communities. This work highlights the importance of unknown taxa in community structure and proposes that ecosystem network construction can be applied to several types of data to identify keystone taxa and their potential function within microbial ecosystems.
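The idea of a co-occurrence network and its hubs can be illustrated with a minimal sketch. This is not the mdmnets implementation or its API. It assumes Spearman rank correlation as the association measure, a hypothetical cutoff of 0.8 for drawing an edge, and node degree as the hub score. The taxon names and abundance values are invented toy data, and the function names are illustrative only.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cooccurrence_hubs(abundance, threshold=0.8):
    """Build edges where |rho| >= threshold and rank taxa by degree.

    abundance maps a taxon name to its abundances across samples.
    The threshold is an assumed, illustrative cutoff.
    """
    degree = {taxon: 0 for taxon in abundance}
    edges = []
    for a, b in combinations(abundance, 2):
        rho = spearman(abundance[a], abundance[b])
        if abs(rho) >= threshold:
            edges.append((a, b, rho))
            degree[a] += 1
            degree[b] += 1
    # Highest degree first; ties broken alphabetically for determinism.
    hubs = sorted(degree, key=lambda t: (-degree[t], t))
    return edges, degree, hubs


# Toy abundance table: four taxa measured in five samples (invented values).
toy_abundance = {
    "Cyanobacteria": [10, 20, 30, 40, 50],
    "Unassigned_1": [1, 2, 3, 4, 5],
    "Proteobacteria": [5, 4, 3, 2, 1],
    "Unassigned_2": [3, 1, 4, 1, 5],
}
edges, degree, hubs = cooccurrence_hubs(toy_abundance)
```

In this toy table an unclassified taxon can score as highly as a classified one, because hub ranking depends only on connectivity and not on taxonomic assignment. Real analyses typically also need compositional normalization and significance testing, which this sketch omits.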