Environmental DNA: Analysis

Once you have processed your eDNA samples, you need to analyze the results to extract the information relevant to your research questions. As with processing, the analysis stage, along with its associated tools and resources, depends on whether you are interested in single-species information or broader community-level information.

Stage outcomes

You will gain a better understanding of how to analyze and visualize your eDNA data in ways that are relevant to your research objectives. A range of graphs and figures can be produced, especially for common types of questions, such as:

Single-species

Which location has the most DNA from my species of interest?
How does the amount of DNA from my species of interest change over time at my site?


Community

Who is there?
How many taxa are there?
How different are the samples in your dataset?
How does the environment affect the observed biological communities?


Single-species Analyses

Now that you have calculated the concentration of your target DNA in each sample across your dataset, you can run basic statistical tests to compare the amount of target DNA across sample groups (e.g. site, location, depth, or time) and evaluate whether there are any interesting detection patterns for your species of interest. Alternatively, if you are simply determining whether a sample contains DNA from your species of interest, you can report those detections and their associated concentrations.
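As a minimal sketch of one such basic test, the Python example below compares the mean target-DNA concentration of two groups of samples using a two-sided permutation test, which does not assume that concentrations are normally distributed. The group names and concentration values are hypothetical; for more than two groups or more complex sampling designs, a dedicated statistics package and a test matched to your design would be more appropriate.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=1):
    """Two-sided permutation test for a difference in mean concentration.

    Returns the observed difference (mean of A minus mean of B) and a p-value:
    the share of random relabellings whose difference is at least as extreme.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign samples to the two groups
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (extreme + 1) / (n_permutations + 1)


# Hypothetical target-DNA concentrations (copies/µL), one value per sample
harbour = [12.1, 15.3, 9.8, 14.0, 11.6]
open_coast = [2.4, 3.1, 0.9, 4.2, 2.7]

diff, p = permutation_test(harbour, open_coast)
print(f"difference in means: {diff:.2f} copies/µL, p = {p:.4f}")
```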

In either case, it is important to keep in mind the limits of detection and quantification of your single-species genetic test, because these set the minimum concentrations you can confidently report. It is also important to consider the variability in detection among sample replicates.

Learn more about the limits of detection and quantification of your single-species analyses.
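The sketch below shows one way to flag replicate measurements against an assay's limit of detection (LOD) and limit of quantification (LOQ), and to summarize how consistently a sample's replicates detect the target. The LOD and LOQ values are hypothetical placeholders (substitute the values validated for your own assay), and counting a replicate as a detection only when it is at or above the LOD is an assumed reporting convention, not a universal rule.

```python
# Hypothetical assay limits in copies per reaction; replace them with the
# LOD and LOQ validated for your own single-species assay.
LOD = 3.0
LOQ = 10.0


def classify(concentration, lod=LOD, loq=LOQ):
    """Label one replicate measurement relative to the assay limits."""
    if concentration < lod:
        return "below LOD"
    if concentration < loq:
        return "detected, below LOQ"
    return "quantifiable"


def summarize_replicates(replicates, lod=LOD, loq=LOQ):
    """Label each replicate and report the share at or above the LOD.

    Treating 'at or above the LOD' as a detection is an assumed convention.
    """
    labels = [classify(c, lod, loq) for c in replicates]
    detected = sum(label != "below LOD" for label in labels)
    return labels, detected / len(replicates)


# Three hypothetical qPCR replicates from one field sample
labels, detection_rate = summarize_replicates([0.0, 4.5, 12.8])
print(labels)
print(f"detected in {detection_rate:.0%} of replicates")
```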

Which location has the most DNA from my species of interest?

Map of invasive European Green Crab DNA concentration measured in eDNA samples collected from the coastal waters of British Columbia.

If you have surveyed a large geographic area and are interested in where your species of interest was detected and how detections varied across sites, consider plotting your results on a map by sampling location, scaling the size or shading the color of each symbol according to the concentration of DNA detected in that sample. If you are interested in what might be driving differences among sites, you can pair the measured concentrations with other environmental data you collected (salinity, temperature, depth, nutrients, etc.) to evaluate whether DNA concentration correlates with environmental differences among sites.

How does the amount of DNA from my species of interest change over time at my site?

Bubble plot showing invasive European Green Crab (EGC) DNA concentration (intensity of bubble color) measured monthly (y-axis) over 4-5 years (x-axis) at a single location.

You may also wish to understand how the concentration of DNA from your species of interest varies over time at your location. This tells you at what times of year the species is more likely to be abundant or rare in your waters. If you are interested in the migration of the species through your waters, it can help pinpoint when the species is present. For a pathogen or toxic phytoplankton species, for example, this information could help you better manage your aquaculture facility.
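A bubble plot like the one shown here is built from a year-by-month grid of concentrations. The sketch below, using hypothetical sampling dates and values and assuming one sample per month, builds that grid and averages each calendar month across years to see when concentrations tend to peak.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical samples at one site: (sampling date, target-DNA concentration)
samples = [
    (date(2021, 6, 14), 5.2), (date(2021, 9, 10), 18.4),
    (date(2022, 6, 20), 7.1), (date(2022, 9, 8), 22.9),
    (date(2023, 6, 12), 6.4), (date(2023, 9, 15), 15.0),
]

# (year, month) -> concentration: years on the x-axis, months on the y-axis.
# Assumes one sample per month; a second sample would overwrite the first.
grid = {(d.year, d.month): conc for d, conc in samples}

# Average each calendar month across years to look for a seasonal peak
by_month = defaultdict(list)
for (_, month), conc in grid.items():
    by_month[month].append(conc)
monthly_mean = {month: mean(values) for month, values in by_month.items()}
peak_month = max(monthly_mean, key=monthly_mean.get)

print(f"peak month: {peak_month}, mean concentration {monthly_mean[peak_month]:.1f}")
```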

Community Analyses

DNA metabarcoding yields large amounts of DNA-based community composition data that, once processed, is:

Highly dimensional - there are typically more biological taxa than samples (except in the case of longer time series or large geographic surveys)
Highly complex - it is sparse, with many zeros across the dataset
Compositional - the number of sequences for a given biological taxon in a given sample is itself arbitrary and can only be interpreted relative to the total number of sequences (i.e. the rest of the taxa) in that sample.

These dataset features can make it challenging to use more traditional statistical methods to analyze eDNA data. Below are some relevant questions with associated analyses you can conduct for community data.
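The sketch below illustrates two of these features on a small, hypothetical read-count table. It measures sparsity as the fraction of zero entries, then applies a centered log-ratio (CLR) transform, one common way to handle compositional data (the Euclidean distance between CLR-transformed samples is the Aitchison distance). The pseudocount added to avoid log(0) is an assumption you would need to choose for your own data.

```python
import math

# Hypothetical read-count table: rows = samples, columns = taxa
counts = [
    [120, 0, 30, 0, 850],
    [0, 15, 0, 0, 2000],
    [60, 0, 0, 5, 300],
]

# Sparsity: share of the table that is zero
n_samples, n_taxa = len(counts), len(counts[0])
zeros = sum(value == 0 for row in counts for value in row)
sparsity = zeros / (n_samples * n_taxa)
print(f"{n_samples} samples x {n_taxa} taxa, {sparsity:.0%} zeros")


def clr(row, pseudocount=1.0):
    """Centered log-ratio transform of one sample's read counts.

    A pseudocount is added because log(0) is undefined; its value is an
    analysis choice, not a fixed rule.
    """
    logs = [math.log(value + pseudocount) for value in row]
    centre = sum(logs) / len(logs)  # log of the geometric mean
    return [x - centre for x in logs]


clr_table = [clr(row) for row in counts]
```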

Who is there?

Relative abundance of sequences for fish species found in eDNA samples taken along False Creek, British Columbia.

One of the first things you are probably wondering about your eDNA data is which species or biological taxa are present. As a reminder, eDNA-derived community composition is usually shown by plotting the relative abundance of sequence reads for each major taxonomic group. This proportional data can be plotted using stacked bar graphs, donut plots, pie charts, or treemaps, which are especially useful for comparing groups of samples. Stacked bar graphs or stacked area graphs can also show longitudinal datasets (time series). If you want to show and compare the relative abundance of taxa across all samples, heatmaps or bubble plots are useful visualizations. See, for example, the Integrated Coastal Observatory map.
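As a minimal sketch of the data behind a stacked bar graph, the example below sums read counts by major taxonomic group and converts them to proportions of each sample's total reads. The taxa, group assignments, and counts are hypothetical and for illustration only.

```python
from collections import defaultdict

# Hypothetical read counts per taxon in two samples
reads = {
    "sample_1": {"herring": 900, "pollock": 300, "red rock crab": 300},
    "sample_2": {"herring": 200, "red rock crab": 600, "bay mussel": 200},
}
# Hypothetical assignment of each taxon to a major taxonomic group
group_of = {
    "herring": "Fish",
    "pollock": "Fish",
    "red rock crab": "Crustaceans",
    "bay mussel": "Molluscs",
}


def relative_abundance_by_group(sample_reads, group_of):
    """Proportion of a sample's reads belonging to each taxonomic group."""
    totals = defaultdict(int)
    for taxon, n in sample_reads.items():
        totals[group_of[taxon]] += n
    total_reads = sum(totals.values())
    return {group: n / total_reads for group, n in totals.items()}


# One row per sample: the proportions each stacked bar would show
table = {s: relative_abundance_by_group(r, group_of) for s, r in reads.items()}
print(table)
```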

How many taxa are present and how are their amounts distributed within a sample?

Species diversity within a single sample is called alpha diversity. You might plot your alpha diversity measure using a scatter plot to show how diversity varies across samples or, if you are curious how diversity varies within and across groups of samples, using a box plot.
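A common alpha diversity measure is the Shannon index, H' = -sum(p_i * ln p_i), where p_i is the proportion of a sample's reads assigned to taxon i. The sketch below computes it alongside observed richness (the number of taxa with at least one read). Using the natural logarithm is an assumption here; some tools use log base 2 or 10, which rescales the values.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with reads > 0."""
    total = sum(counts)
    proportions = (c / total for c in counts if c > 0)
    return -sum(p * math.log(p) for p in proportions)


def richness(counts):
    """Observed richness: number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


# Hypothetical samples with the same four taxa but different evenness
even = [250, 250, 250, 250]
uneven = [970, 10, 10, 10]
print(richness(even), round(shannon_index(even), 3))
print(richness(uneven), round(shannon_index(uneven), 3))
```

Both samples have the same richness, but the uneven one has a much lower Shannon index because most of its reads come from a single taxon.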

Scatter plot showing algal, bacterial, fish, and invertebrate diversity, estimated using the Shannon diversity index, in eDNA samples collected in False Creek, Vancouver, British Columbia.

Box plot showing fish diversity (Shannon diversity index) in eDNA samples collected from west to east along False Creek.

How different are the ecological communities across sites?

NMDS ordination plot of eDNA samples collected in False Creek, Vancouver. Each point represents a sample, positioned according to how similar its community is to those of the other samples. The plot reveals that fish communities in False Creek are structured by depth.

Comparing community composition across samples is called beta diversity. It measures how species composition changes across habitats, sites, or environmental gradients. Beta diversity is calculated using metrics such as Bray-Curtis dissimilarity, the Jaccard index, Aitchison distance, or other distance metrics.
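The sketch below implements two of the metrics named above: Bray-Curtis dissimilarity on abundances, and Jaccard distance (1 minus the Jaccard index) on presence/absence. Both range from 0 (identical samples) to 1 (no shared taxa). Applying Bray-Curtis to relative rather than raw read counts is a common choice, but it is not prescribed here; the sample vectors are hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))


def jaccard_distance(a, b):
    """1 - Jaccard index on presence/absence.

    Assumes at least one of the two samples contains a taxon.
    """
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, y in enumerate(b) if y > 0}
    return 1 - len(present_a & present_b) / len(present_a | present_b)


def distance_matrix(samples, metric):
    """Square matrix of pairwise distances between all samples."""
    return [[metric(a, b) for b in samples] for a in samples]


# Hypothetical read counts: rows = samples, columns = taxa
samples = [[10, 0, 5], [5, 5, 5], [0, 8, 2]]
bc = distance_matrix(samples, bray_curtis)
jac = distance_matrix(samples, jaccard_distance)
```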

The dimensionality of the dataset is then reduced using ordination methods, such as Principal Coordinates Analysis (PCoA) or Nonmetric Multidimensional Scaling (NMDS), and the results are displayed as ordination plots. The GUSTA ME guide is a very helpful web resource for beta diversity concepts. Clustering analysis, and the resulting dendrograms, is another common way to show how samples are related.
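To show what an ordination does, the sketch below implements classical PCoA with NumPy: it double-centres the matrix of squared distances and projects samples onto its leading eigenvectors. The input can be any square, symmetric distance matrix, for example pairwise Bray-Curtis dissimilarities; the values used here are hypothetical. Note that this is PCoA, not NMDS, which instead iteratively fits the rank order of the distances. In practice, established packages such as vegan (R) or scikit-bio (Python) are the usual route.

```python
import numpy as np


def pcoa(distances, n_axes=2):
    """Classical PCoA (metric multidimensional scaling) of a distance matrix.

    Returns sample coordinates on the first `n_axes` axes and the share of
    the total positive eigenvalues (variation) captured by each axis.
    """
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Gower double-centring of squared distances: B = -1/2 * J @ D^2 @ J
    j = np.eye(n) - np.full((n, n), 1.0 / n)
    b = -0.5 * j @ (d ** 2) @ j
    eigvals, eigvecs = np.linalg.eigh(b)  # B is symmetric
    order = np.argsort(eigvals)[::-1]     # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Non-Euclidean metrics (e.g. Bray-Curtis) can yield negative eigenvalues;
    # this simple sketch clips the retained ones at zero instead of correcting.
    kept = np.clip(eigvals[:n_axes], 0.0, None)
    coords = eigvecs[:, :n_axes] * np.sqrt(kept)
    explained = kept / eigvals[eigvals > 0].sum()
    return coords, explained


# Hypothetical dissimilarities between four samples
dist = [
    [0.0, 0.2, 0.7, 0.8],
    [0.2, 0.0, 0.6, 0.7],
    [0.7, 0.6, 0.0, 0.3],
    [0.8, 0.7, 0.3, 0.0],
]
coords, explained = pcoa(dist)
print(np.round(coords, 3))
print("variation explained by axes 1-2:", np.round(explained, 3))
```

Plotting the first two columns of `coords` against each other gives the ordination plot; samples with similar communities land close together.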

How does the environment affect the observed biological communities?

Environmental data is commonly collected alongside aquatic eDNA samples. This might include measurements of temperature, salinity, nutrients, or whatever environmental characteristics you hypothesize influence diversity patterns. You can then explore the relationships between these ‘environmental drivers’ and your eDNA biodiversity data using statistical approaches such as correlations (which you might display using correlograms), Mantel tests, constrained ordinations, and many more.

Note: The analyses and visualizations described here are just the tip of the iceberg when it comes to exploring, analyzing, and visualizing eDNA-derived biodiversity data. These data are complex and multifaceted. How you carve up and analyze an eDNA dataset always comes back to the questions you are trying to answer and/or the hypotheses you are aiming to test. With thousands of biological taxa in each dataset, you may choose to use all of the data or focus on a particular taxonomic group and dig into it more deeply.

Resources

The resources below can be useful when analyzing and visualizing the data from your eDNA samples.

Featured

Integrated Coastal Observatory

Resource type: Tools, Data

The Integrated Coastal Observatory (ICO) is a coordinated network of partners along the coast of British Columbia using eDNA to monitor marine biodiversity. This webtool is used to display biodiversity data.


GUSTA ME

Resource type: Documentation, Training

This guide contains descriptions of a range of analysis techniques used in microbial ecology.


Next stage

Interested in seeing more? Take the next step and move on to the Application stage.
