You are given two data sets:
a Staphylococcus aureus bacterium reference genome and some FASTQ files of a closely related mutant strain available at ~/shared-team/2025_training/week1/staph.
a metagenomic sample from the coffee fermentation process available at ~/shared-team/2025_training/week1/metagenome. In this task you will briefly explore the assembled MAGs (you will assemble your own metagenome in week 5).
Tasks: Explore the provided FASTQ and FASTA files before and after assembly.
Think about some echo statements that you can use in your log files.
Setup:
Gather information about tools and commands that you will need (e.g. fastqc).
Inspect the quality of the fastq files (Hint: use fastqc)
What is the reference genome length? (Hint: select every line that is not a header, e.g. with an inverted match such as grep -v, remove the newline characters and count the bases)
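
The hint above can also be followed in Python. This is only a minimal sketch, assuming a plain-text FASTA in which header lines start with ">"; the function name genome_length and the file name reference.fasta in the usage comment are hypothetical and not part of the course material.

```python
def genome_length(fasta_lines):
    """Count bases in FASTA text given as an iterable of lines.

    Header lines (starting with '>') and blank lines are skipped,
    newline characters are stripped, and the remaining characters are counted.
    """
    total = 0
    for line in fasta_lines:
        line = line.strip()          # remove newline characters
        if not line or line.startswith(">"):
            continue                 # skip headers and empty lines
        total += len(line)           # count the bases on this line
    return total


# Usage (hypothetical file name, adjust to the file in the staph folder):
# with open("reference.fasta") as handle:
#     print(genome_length(handle))
```

The function takes any iterable of lines, so an open file handle or a list of strings both work.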
Using this information together with the number and length of the reads, what is the expected coverage?
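
One quantity that follows from the genome length plus the number and length of the reads is the expected sequencing depth (coverage), i.e. total sequenced bases divided by genome length. The sketch below assumes that this is the quantity of interest and that the FASTQ files are uncompressed with exactly four lines per record; the names read_lengths and estimated_coverage are hypothetical.

```python
def read_lengths(fastq_lines):
    """Return the length of every read in FASTQ text given as an iterable of lines.

    Assumes the common four-line record layout (header, sequence, '+', qualities),
    so the sequence is the second line of each group of four.
    """
    return [len(line.strip()) for i, line in enumerate(fastq_lines) if i % 4 == 1]


def estimated_coverage(lengths, genome_len):
    """Expected coverage = total number of sequenced bases / genome length."""
    if genome_len <= 0:
        raise ValueError("genome length must be positive")
    return sum(lengths) / genome_len


# Usage (hypothetical file name and genome length variable):
# with open("reads_1.fastq") as handle:
#     lengths = read_lengths(handle)
# print(len(lengths), "reads, coverage:", estimated_coverage(lengths, ref_len))
```

For paired-end data, the lengths from both files would need to be added together before dividing.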
What is the length of each contig?
Create a plot (using R or Python) to compare the contig lengths
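
A possible starting point for the two tasks above, as a rough sketch only: it assumes the assembly is a plain-text multi-FASTA, and that matplotlib is installed for the plotting part. The function names, the output file name contig_lengths.png and the input file name in the usage comment are all hypothetical.

```python
def contig_lengths(fasta_lines):
    """Map each contig name (first word of its header) to its length in bases."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            # Fall back to a placeholder name if the header is empty.
            name = fields[0] if fields else f"unnamed_{len(lengths) + 1}"
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def plot_contig_lengths(lengths, out_png="contig_lengths.png"):
    """Save a bar plot of contig lengths, longest first (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")            # render to a file, no display needed
    import matplotlib.pyplot as plt

    ordered = sorted(lengths.values(), reverse=True)
    fig, ax = plt.subplots()
    ax.bar(range(1, len(ordered) + 1), ordered)
    ax.set_xlabel("Contig rank (longest first)")
    ax.set_ylabel("Length (bp)")
    ax.set_yscale("log")             # contig lengths often span orders of magnitude
    fig.savefig(out_png, dpi=150)
    plt.close(fig)


# Usage (hypothetical file name):
# with open("contigs.fasta") as handle:
#     plot_contig_lengths(contig_lengths(handle))
```

Sorting by length and using a log scale is just one design choice; a histogram of lengths would be an equally reasonable comparison.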
Search the GenBank nucleotide website for your favourite (short!) sequence and try to display it using Entrez/dna_features_viewer
View the same sequence in an actual genome browser (why is this better?)