By the end of this session, you will be able to:
maxbin_bins/001.fasta) using all three tools and interpret the output files.
.tsv, .gbk, and summary reports.
Abricate is a bioinformatics tool developed by Torsten Seemann for identifying antimicrobial resistance and virulence genes in genomic sequences by screening against curated databases. It is widely used in microbial genomics and public health surveillance.
Big thanks to Torsten Seemann for creating practical and useful tools that contribute significantly to the research community.
To screen all FASTA files in a directory:
abricate *.fasta
This command processes each .fasta file and displays results on the screen.
To list databases installed in Abricate:
abricate --list
This will show the available databases and their descriptions, helping you choose the appropriate one for your analysis.
To redirect Abricate output to a .tsv file for further review:
abricate *.fasta > abricate_out.tsv
This saves the results table to abricate_out.tsv instead of printing it to the screen.
Use the --db flag to specify a database for targeted analysis:
abricate --db <database_name> *.fasta > abricate_<database_name>_out.tsv
For instance, to use the ResFinder database:
abricate --db resfinder *.fasta > abricate_resfinder_out.tsv
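To compare databases side by side, the same screen can be repeated in a loop. The following is a minimal sketch rather than anything built into Abricate: the database names in DBS are assumptions and should be replaced with names that abricate --list reports on your system, and abricate_outname is a hypothetical helper that only builds the output file name pattern shown above. The loop skips the run when Abricate or FASTA files are not present.

```shell
# Databases to screen; assumed names, check them against `abricate --list`.
DBS="resfinder card vfdb"

# Hypothetical helper: build the output file name for one database.
abricate_outname() {
    printf 'abricate_%s_out.tsv\n' "$1"
}

for db in $DBS; do
    out=$(abricate_outname "$db")
    # Run only when abricate is installed and at least one FASTA file exists.
    if command -v abricate >/dev/null 2>&1 && ls ./*.fasta >/dev/null 2>&1; then
        abricate --db "$db" ./*.fasta > "$out"
    else
        echo "skipping $db (would write $out)"
    fi
done
```

Keeping one output file per database makes it easier to see which hits come from which reference set.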
Abricate output in TSV format includes:
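Among the columns documented in the Abricate README are GENE, %COVERAGE, %IDENTITY, and DATABASE; the exact set may differ between versions, so check the header line of your own file. As a sketch of how those columns can be used, the example below filters a table by identity and coverage with awk. The file toy_abricate.tsv and every row in it are invented for illustration (with only a subset of columns), and the thresholds of 90 and 80 are arbitrary assumptions, not recommendations.

```shell
# Toy table standing in for abricate_out.tsv; rows are invented, not real results.
printf '#FILE\tSEQUENCE\tGENE\t%%COVERAGE\t%%IDENTITY\tDATABASE\n' > toy_abricate.tsv
printf 'bin1.fasta\tcontig_1\tgeneA\t100.00\t99.50\texample_db\n' >> toy_abricate.tsv
printf 'bin1.fasta\tcontig_7\tgeneB\t62.10\t85.30\texample_db\n' >> toy_abricate.tsv
printf 'bin2.fasta\tcontig_3\tgeneC\t95.40\t91.20\texample_db\n' >> toy_abricate.tsv

# Keep the header, then keep rows that meet both thresholds.
# Columns are located by header name so the filter does not depend on column order.
awk -F'\t' -v minid=90 -v mincov=80 '
NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; print; next }
$(col["%IDENTITY"]) + 0 >= minid && $(col["%COVERAGE"]) + 0 >= mincov
' toy_abricate.tsv > toy_filtered.tsv

cat toy_filtered.tsv
```

To try this on real results, replace toy_abricate.tsv with abricate_out.tsv. Abricate also has --minid and --mincov options that apply thresholds at screening time; see abricate --help for the defaults in your version.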
Spend the next 20-30 minutes using Abricate to explore your MAGs. Do you find many resistance genes or virulence factors?
Prokka is a bioinformatics tool designed for the rapid annotation of prokaryotic genomes, producing outputs that adhere to standard file formats. It is commonly used to identify coding sequences (CDS), tRNAs, rRNAs, and other genomic features, annotating them based on known databases.
Prokka is another tool created by the mighty Torsten Seemann.
If you use Prokka results in your work, cite Seemann T (2014) Prokka: rapid prokaryotic genome annotation. Bioinformatics. 30(14):2068-9.
Prokka can be finicky when installed in different Conda environments. I have found that it works best in the seqanalysis environment.
conda deactivate
conda activate seqanalysis
conda install prokka
Run these commands to confirm the installation:
prokka
prokka --version
prokka --listdb
To annotate a genome bin, use the following command:
prokka maxbin_bins.001.fasta
This command will generate an output directory named after the current date (e.g., PROKKA_11132024).
The output log provides detailed information about the annotation process. You will see information about:
Prokka provides several options to customize the annotation process. Here are some useful flags:
Specify a Prefix for Output Files:
By default, Prokka uses a prefix based on the date. You can change this with the --prefix option to make outputs easier to track:
prokka --prefix my_annotation maxbin_bins.001.fasta
This will name the output files with my_annotation instead of PROKKA_<date>.
Use a Reference Genome for Enhanced Annotation:
If you have a set of trusted proteins from a related reference genome that can guide the annotation, you can pass it with the --proteins flag:
prokka --proteins reference_proteins.faa maxbin_bins.001.fasta
This helps Prokka prioritize annotation based on known proteins, which can improve accuracy.
Set the Locus Tag:
Use the --locustag flag to define a custom locus tag prefix for your gene IDs:
prokka --locustag ABC123 maxbin_bins.001.fasta
Genus-Specific Annotation:
The --genus flag tells Prokka the genus when it is known (adding --usegenus makes Prokka also search its genus-specific database, where one is installed):
prokka --genus Escherichia maxbin_bins.001.fasta
Number of CPUs:
To speed up the annotation, increase the number of CPU cores used with the --cpus flag:
prokka --cpus 8 maxbin_bins.001.fasta
These options let you tailor the annotation to your project and can improve the accuracy of the results.
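Several of these flags can be combined, and the same command can be applied to every bin. The loop below is a sketch under stated assumptions: it expects bins named like maxbin_bins.001.fasta in the current directory, bin_prefix is a hypothetical helper that strips the .fasta extension, and the prokka_<prefix> directory naming is an illustrative choice. It uses Prokka's --outdir, --prefix, and --cpus options and only prints a message if Prokka is not installed.

```shell
# Hypothetical helper: derive a prefix such as maxbin_bins.001 from a FASTA path.
bin_prefix() {
    basename "$1" .fasta
}

CPUS=8   # number of cores; adjust to your machine

for fasta in maxbin_bins.*.fasta; do
    # If the glob matched nothing, the literal pattern is left in place; skip it.
    [ -e "$fasta" ] || continue
    prefix=$(bin_prefix "$fasta")
    if command -v prokka >/dev/null 2>&1; then
        # One output directory per bin, with files named after the bin.
        prokka --outdir "prokka_$prefix" --prefix "$prefix" --cpus "$CPUS" "$fasta"
    else
        echo "prokka not found; would annotate $fasta into prokka_$prefix"
    fi
done
```

Prokka stops if the output directory already exists unless --force is given, so remove or rename old runs before repeating the loop.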
After running Prokka, explore the generated directory (e.g., PROKKA_11132024) and examine the output files, particularly files of the form *.gbk.
Explore the files, understand the annotations, and consider how different databases and thresholds affect the final output.
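One quick way to get an overview of an annotation is to count features by type in the tab-separated feature table that Prokka writes alongside the .gbk file. The sketch below assumes that table has a header row with an ftype column, as in recent Prokka releases; confirm this against your own file. The file toy_prokka.tsv and its rows are invented for illustration only.

```shell
# Toy table standing in for a Prokka .tsv file; rows are invented, not real results.
printf 'locus_tag\tftype\tlength_bp\tgene\tEC_number\tCOG\tproduct\n' > toy_prokka.tsv
printf 'TOY_00001\tCDS\t900\t\t\t\thypothetical protein\n' >> toy_prokka.tsv
printf 'TOY_00002\ttRNA\t76\t\t\t\ttRNA-Ala\n' >> toy_prokka.tsv
printf 'TOY_00003\tCDS\t1200\t\t\t\thypothetical protein\n' >> toy_prokka.tsv

# Count features per type, locating the ftype column by its header name.
awk -F'\t' '
NR == 1 { for (i = 1; i <= NF; i++) if ($i == "ftype") c = i; next }
c { n[$c]++ }
END { for (t in n) print t "\t" n[t] }
' toy_prokka.tsv | sort > toy_feature_counts.tsv

cat toy_feature_counts.tsv
```

Replace toy_prokka.tsv with the .tsv file in your Prokka output directory to summarize a real annotation.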