2020-01-29

github.com/k-hench

Motivation

Given a set of communities - how can we judge if these communities are different or similar?

More precisely: If we have several records of species assemblages, can we infferr if they were sampled from the same biological community?

This tutorial covers two tools to help you in forming an opinion:

  • PERMANOVA: a statistical test on predifined groups of communities
  • nMDS: a visualization of all species records in 2D

To follow you’ll need a few packages:

install.packages("tidyverse")
install.packages("vegan")
install.packages("ggalt")

nMDS

  • ordination technique to reduce the dimensinality of a complex dataset (eg a species list).
  • aim is to give a visualisation in 2D summarising the species composition.
  • units are the individual observations, therefore not dependent on the grouping of these observations
  • based on the (Bray-Curtis) distances between observations

load nMDS data

library(tidyverse)
library(ggalt)
library(vegan)
data_samples <- read_tsv('data.tsv')
Latittude Longitude Spot Group species_1 species_2 species_3 species_4
9.3508 -82.2577 STRI_point gr1 1 4 5 1
9.3594 -82.2774 Casa_blanca gr1 1 3 4 0
9.3014 -82.2941 Punta_Juan gr2 5 0 1 4
9.3668 -82.2914 Punta_Caracol gr1 0 3 3 3
9.2487 -82.2944 Mangrove_point gr3 2 5 0 2
9.2422 -82.1119 Moguls gr2 5 2 0 5

prepare nMDS export function

export_nmds <- function(tib, nmds, score_name = "Spot", species_name = "Species"){
  # doublecheck that rownames are uniqe,
  # otherwise the merge at the end duplicates entries 
  if(any(duplicated(rownames(scores(nmds))))){
    stop(paste("Sorry, the rownames of the distance matrix need to be unique.
               Currently that is not the case:\n\n ", 
               paste(c('!',' ')[2 - (duplicated(rownames(scores(nmds))) | duplicated(rownames(scores(nmds)),
                                                                                     fromLast = TRUE))],
                     rownames(scores(nmds)),
                     collapse = '\n  ')))
    }
  
  # export obesravtions in nMDS space
  data_scores <- as.data.frame(scores(nmds)) %>% 
    as_tibble(rownames = score_name)
  # export species in nMDS space
  data_species <- as.data.frame(scores(nmds, "species"))%>% 
    as_tibble(rownames = species_name)
  
   list(spots = tib %>% left_join(data_scores),
        species = data_species)
}

run and export nMDS

data_matrix <- data_samples %>%
  select(species_1:species_4) %>%
  as.matrix()

rownames(data_matrix) <- data_samples$Spot

set.seed(5)
data_nmds <- metaMDS(data_matrix, k = 2, trymax = 999, distance='bray')
data_nmds_results <- export_nmds(data_samples, data_nmds)
Run 0 stress 0.04718413 
Run 1 stress 0.09445906 
Run 2 stress 0.04718396 
... New best solution
    .  .  .  .  .
Run 18 stress 0.04718408 
... Procrustes: rmse 0.0001986373  max resid 0.0004407735 
... Similar to previous best
Run 19 stress 0.05841901 
Run 20 stress 0.05841916 
*** Solution reached

plot nMDS

ggplot(data_nmds_results$spots, aes(x = NMDS1, y = NMDS2, group = Group)) + 
  coord_equal() +
  geom_hline(yintercept = 0, color = 'lightgray', alpha = .4) +
  geom_vline(xintercept = 0, color = 'lightgray', alpha = .4) +
  geom_text(inherit.aes = FALSE,
            data = data_nmds_results$species, 
            aes(x = NMDS1, y = NMDS2, label = Species)) +
  geom_point(aes(colour = Group), size = 2) +
  geom_encircle(aes(colour = Group, fill = Group), 
                s_shape = 1, alpha = 0.2, size = 1, expand = 0)+
  scale_x_continuous(expand = c(.01,.05))+
  theme(legend.position = 'bottom',
        panel.grid = element_blank())

nMDS grouping is external

Permanova

  • statistical test that the centroids and dispersion of the groups are equivalent for all groups
  • rejection of the null hypothesis means that either the centroid and/or the spread of the objects is different between the groups (Wikipedia)
  • “The only assumption of the test is that […] the observations are independent and that they have similar distributions” (Anderson, 2001)

assumptions of Permanova

# Bray-Curtis
distances <- vegdist(data_matrix, 
                     distance = 'bray')
# order of samples
distances_groups <- data_samples[match(labels(distances),
                                       data_samples$Spot),]$Group
# beta dispersion
distances_betadispersion <- betadisper(distances, 
                                       distances_groups)
# test homogenous dispersion
# not met, p < 0.05*******
permutest(distances_betadispersion) 
#> 
#> Permutation test for homogeneity of multivariate dispersions
#> Permutation: free
#> Number of permutations: 999
#> 
#> Response: Distances
#>           Df   Sum Sq   Mean Sq      F N.Perm Pr(>F)
#> Groups     2 0.002086 0.0010430 0.1771    999  0.836
#> Residuals  9 0.053012 0.0058903

run Permanova

# run PERMANOVA
data_permanova <- adonis(formula = distances ~ distances_groups,
                permutations = 999, method = 'bray')
print(data_permanova) # significant, p = 0.001 - differences between reef sites
#> 
#> Call:
#> adonis(formula = distances ~ distances_groups, permutations = 999,      method = "bray") 
#> 
#> Permutation: free
#> Number of permutations: 999
#> 
#> Terms added sequentially (first to last)
#> 
#>                  Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)   
#> distances_groups  2   0.87737 0.43869  13.183 0.74552  0.002 **
#> Residuals         9   0.29948 0.03328         0.25448          
#> Total            11   1.17685                 1.00000          
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1