MAG Catalogues
MAG Catalogues as a resource
MAGs 1 are an approach to deriving genome-resolved information from metagenomic datasets.
MGnify’s MAG Catalogues are biome-specific, clustered, annotated collections of MAGs. Biomes are selected on the grounds of data availability, community interest, and project objectives.
Practical 1: finding MAGs by taxonomy on the MGnify website
Search the All genomes list for the genus Jonquetella
In which catalogues is that genus found?
What do these biomes have in common, and how does this align with the species found? 2
Now, we want to get a FASTA sequence for this genus.
Using what we’ve learned about QC on the course, look at the detail statistics of the Jonquetella MAGs. Which one is best? 3
We will use it later.
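One way to make the comparison concrete is to fold the statistics into a single number. The sketch below assumes the score completeness − 5 × contamination, a widely used convention for ranking MAGs, with N50 as a tie-breaker. That choice is an assumption and not a rule stated in this practical. The labels and statistics are invented placeholders, not real Jonquetella MAGs, so substitute the values you read from the detail pages.

```python
from dataclasses import dataclass


@dataclass
class MagStats:
    label: str            # placeholder label, NOT a real MGnify accession
    completeness: float   # percent
    contamination: float  # percent
    n50: int              # base pairs


def quality_score(mag: MagStats) -> float:
    """Assumed convention: completeness minus five times contamination."""
    return mag.completeness - 5 * mag.contamination


def best_mag(mags: list[MagStats]) -> MagStats:
    """Pick the highest quality score, using N50 to break ties."""
    return max(mags, key=lambda m: (quality_score(m), m.n50))


# Invented example values, for illustration only.
example_mags = [
    MagStats("MAG_A", completeness=98.0, contamination=0.5, n50=120_000),
    MagStats("MAG_B", completeness=99.0, contamination=3.0, n50=300_000),
    MagStats("MAG_C", completeness=90.0, contamination=0.0, n50=50_000),
]

best = best_mag(example_mags)
print(best.label, quality_score(best))
```

Note that MAG_B has the highest completeness and N50 among the placeholders, yet its contamination costs it the top spot under this score. A single number is a convenience; it is still worth looking at each statistic on its own.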
Practical 2: query MGnify catalogues using sourmash
Sourmash is a tool to compare DNA sequences against each other. The MGnify Genomes resource uses the sourmash library to create sketches (hashes) of every genome in every catalogue. You can query this index using your own sequences (typically MAGs you have retrieved from elsewhere or assembled yourself).
Use the MAG sequence FASTA file you retrieved earlier. 4
In which catalogues is a match found for that query genome?
What use cases can you think of for this kind of cross-catalogue search? 5
Practical 3: query MGnify catalogues using sourmash, programmatically
The MGnify website is just a client of the MGnify API 6.
For this part of the practical, there is a Jupyter Notebook you can follow along with, completing the code blocks as you go.
To open it on your training VM:
cd ~/mgnify-notebooks
git status
# make sure you're on the "comparative_practice_2023" branch
task edit-notebooks
After a few seconds, some URLs will be printed in the terminal. Open the last one (http://127.0.0.1:8888/lab?token=.....), by right-clicking on the URL and selecting “Open Link”, or by copying-and-pasting it into a web browser like Chromium/Firefox.
Follow the steps in the notebook, completing the unfinished code blocks.
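Outside the notebook, talking to the API comes down to building a URL, requesting it, and reading the JSON that comes back. The sketch below uses only the Python standard library. It assumes the public API root `https://www.ebi.ac.uk/metagenomics/api/v1` and its `genomes` resource; the `search` query parameter, the attribute names, and the sample payload are illustrative assumptions, so check the API's own documentation before relying on them. The network call is defined but not executed here.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://www.ebi.ac.uk/metagenomics/api/v1"


def build_url(resource: str, **params: str) -> str:
    """Build an API URL for a resource, with optional query parameters."""
    url = f"{API_BASE}/{resource}"
    query = urllib.parse.urlencode(params)
    return f"{url}?{query}" if query else url


def fetch_json(url: str) -> dict:
    """Request a URL and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def summarise(payload: dict) -> list[tuple[str, dict]]:
    """Pull (id, attributes) pairs out of a JSON:API style response."""
    return [
        (item["id"], item.get("attributes", {}))
        for item in payload.get("data", [])
    ]


# Invented payload in the JSON:API shape; the id and values are placeholders.
sample_payload = {
    "data": [
        {
            "type": "genomes",
            "id": "EXAMPLE_GENOME_1",
            "attributes": {"completeness": 98.0, "contamination": 0.5},
        }
    ]
}

# The "search" parameter is an assumption made for illustration.
print(build_url("genomes", search="Jonquetella"))
for genome_id, attributes in summarise(sample_payload):
    print(genome_id, attributes)
```

To try it against the live service, pass the built URL to `fetch_json` and then hand the result to `summarise`.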
Use the Jupyter Notebook after the course
This notebook is based on a publicly accessible version. You can use this at any time.
- It is available to use from your web browser, no installation needed: notebooks.mgnify.org
- You can see a completed version of it, with all the outputs, on docs.mgnify.org
- You can use a prebuilt Docker image and our public notebooks repository: github.com/ebi-metagenomics/notebooks. This should work on any computer you can install Docker on.
- You can try to install all the dependencies yourself ¯\_(ツ)_/¯
Footnotes
Metagenome Assembled Genomes↩︎
Hint… what does anthropi in the species J. anthropi derive from?↩︎
Hint… each MAG’s detail page overview tab shows stats including completeness, contamination, and N50.↩︎
If you got lost earlier, download it from MGYG000304175.fna↩︎
There are interesting use cases for researchers (checking which environments a species is found in, checking whether a newly assembled genome is novel, etc.), as well as use cases for services like MGnify (cross-linking genomes between catalogues where those datasets are not clustered together).↩︎
Application Programming Interface↩︎