4  Taxonomic Analysis

Following taxonomic classification and exploratory visualization, downstream analyses were conducted in R to characterize microbial groups associated with Deroceras laeve.

This chapter presents an example workflow using Archaea as a representative taxonomic subset. The same strategy can be applied to Bacteria, Fungi, Viruses, or other groups of interest by modifying the selected taxonomic level.

Input Data

Kraken2-derived BIOM files were imported into R using the phyloseq package.

Code
library(phyloseq)
library(ggplot2)
library(dplyr)

dlaeve_physeq <- import_biom("dlaeve_taxonomy.biom")

Taxonomic labels were curated by removing unnecessary prefixes and assigning standardized rank names.

Code
dlaeve_physeq@tax_table@.Data <-
  substring(dlaeve_physeq@tax_table@.Data, 4)

colnames(dlaeve_physeq@tax_table@.Data) <- c(
  "Kingdom", "Phylum", "Class",
  "Order", "Family", "Genus", "Species"
)

4.1 Example Workflow: Archaea

Taxonomic Selection

Reads assigned to Archaea were extracted for focused downstream analyses.

Code
archaea <- subset_taxa(
  dlaeve_physeq,
  Kingdom == "Archaea"
)

4.1.1 Removal of Unidentified Genera

Only classified genera were retained for abundance summaries and visualization.

Code
archaea <- subset_taxa(
  archaea,
  Genus != ""
)

4.1.2 Relative Abundance Transformation

Code
percentages <- transform_sample_counts(
  archaea,
  function(x) x * 100 / sum(x)
)

4.2 Phylum-Level Summarization

Code
percentages_glom <- tax_glom(
  percentages,
  taxrank = "Phylum"
)

percentages_df <- psmelt(
  percentages_glom
)

4.2.1 Identification of Dominant Taxa

The ten most abundant phyla were selected, while less abundant taxa were grouped as Others.

Code
abundance_phylum <- percentages_df %>%
  mutate(Phylum = as.character(Phylum)) %>%
  group_by(Phylum) %>%
  summarize(Abundance = mean(Abundance)) %>%
  arrange(desc(Abundance)) %>%
  slice_head(n = 10)

4.2.2 Visualization

Relative abundance plots were generated with ggplot2 using custom color palettes.

Custom colors:

Code
colors <- c(
  "#9BCF53", "#FFD93D", "#FF70AB",
  "#A3FFD6", "#FEC7B4", "#83C0C1",
  "#FF0080", "#F273E6", "#7B66FF",
  "#FFF67E"
)

colors_by_phylum <- setNames(
  colors[1:nrow(abundance_phylum)],
  abundance_phylum$Phylum
)

colors_by_phylum["Others"] <- "#A7BED3"

Plot:

Code
ggplot(
  percentages_df,
  aes(
    x = Sample,
    y = Abundance,
    fill = Phylum
  )
) +
  geom_bar(
    stat = "identity",
    position = "fill"
  ) +
  scale_fill_manual(values = colors_by_phylum) + 
  theme_minimal(base_size = 14) +
  theme(
    axis.text.x = element_text(
      angle = 90,
      hjust = 1
    )
  ) +
  scale_y_continuous(
    labels = scales::percent_format()
  ) +
  labs(
    x = "Samples",
    y = "Relative abundance (%)"
  )

4.3 Genus-Level Analysis

To further refine the archaeal community profile, reads were also summarized at the genus level.

Code
genus_glom <- tax_glom(
  percentages,
  taxrank = "Genus"
)

genus_df <- psmelt(
  genus_glom
)

4.3.1 Identification of Dominant Genera

The ten most abundant genera were selected, while less abundant taxa were grouped as Others.

Code
abundance_genus <- genus_df %>%
  mutate(
    Genus = as.character(Genus)
  ) %>%
  group_by(Genus) %>%
  summarize(
    Abundance = mean(Abundance),
    .groups = "drop"
  ) %>%
  arrange(desc(Abundance)) %>%
  slice_head(n = 10)

genus_df <- genus_df %>%
  mutate(
    Genus = ifelse(
      Genus %in% abundance_genus$Genus,
      Genus,
      "Others"
    )
  )

4.3.2 Visualization

Relative abundance plots were generated using the same custom palette employed for phylum-level summaries.

Code
colors_by_genus <- setNames(
  colors[1:nrow(abundance_genus)],
  abundance_genus$Genus
)

colors_by_genus["Others"] <- "#A7BED3"

ggplot(
  genus_df,
  aes(
    x = Sample,
    y = Abundance,
    fill = Genus
  )
) +
  geom_bar(
    stat = "identity",
    position = "fill"
  ) +
  scale_fill_manual(
    values = colors_by_genus
  ) +
  theme_minimal(base_size = 14) +
  theme(
    axis.text.x = element_text(
      angle = 90,
      hjust = 1
    ),
    legend.text = element_text(
      face = "italic"
    )
  ) +
  scale_y_continuous(
    labels = scales::percent_format()
  ) +
  labs(
    x = "Samples",
    y = "Relative abundance (%)"
  )

4.4 Adaptation to Other Taxa

The same analytical framework can be applied to other microbial groups by modifying the selected taxonomic subset according to the rank of interest and the structure of the dataset.

For example, some groups can be selected directly at the Kingdom level:

Code
subset_taxa(dlaeve_physeq, Kingdom == "Bacteria")
subset_taxa(dlaeve_physeq, Kingdom == "Viruses")

In other cases, more specific taxonomic ranks may be used. For fungal analyses, representative phyla were selected as follows:

Code
subset_taxa( dlaeve_physeq, 
             Phylum %in% c("Ascomycota", "Basidiomycota", "Microsporidia" )
  )