Long-Read Metagenomics

“The future is already here—it’s just not very evenly distributed.”
— William Gibson

The question now facing us is this: have we wasted your time teaching you short-read metagenomics when the future lies with long reads?


Podcast

Listen to this podcast


Socratic dialogue

Take a look at what Socrates and Aspasia have to say in this gentle Socratic dialogue about whether short-read metagenomics belongs in the dustbin of history.

Socrates: My dear Aspasia, I see a fire in your eyes today, one that speaks of passionate conviction. What thought commands such energy?

Aspasia: Oh, Socrates! It is the matter of metagenomics. The world, once enthralled by short-read sequencing, should now rise to see its limitations. It belongs in the dustbin of history. The time has come for long-read sequencing to reign supreme in the realm of metagenomic discovery, with its clarity and completeness beyond the reach of fragmented short-read assemblies.

Socrates: You propose, then, that short reads, which have brought us so far, are now antiquated? That we discard them as one would the quill after discovering the pen? I wonder, Aspasia, if such a judgment is not precipitous. Tell me, what brings you to this conclusion?

Aspasia: Socrates, let us first consider the problem of assembly. Short-read sequencing, even with coverage as deep as the ocean, falls prey to contig fragmentation. Repetitive elements longer than the reads themselves are insurmountable barriers to full assembly, creating mosaics of incomplete information. Contrast this with long-read sequencing technologies like Oxford Nanopore and Pacific Biosciences. These instruments, with reads spanning tens to hundreds of kilobases, unravel genomes with a continuity that short reads could never achieve. Long reads span repetitive regions and complex genomic structures, enabling us to reconstruct genomes as they truly are.

Socrates: And yet, does not this continuity come at a cost? The error rates of long-read sequencing, though improved, have historically been a subject of much discussion. Have we not, through bioinformatics, developed hybrid approaches where short-read accuracy and long-read contiguity combine to achieve formidable results?

Aspasia: True, Socrates, long reads once bore the stigma of high error rates. But that was the past. Modern Nanopore and PacBio technologies boast significant strides in base-calling accuracy. The PacBio HiFi reads now approach 99.9% accuracy, with length enough to span complex genomic regions and repetitive sequences. And Nanopore, with adaptive sampling and real-time sequencing improvements, has crossed into regions previously unattainable. Error correction algorithms and advancements in deep learning have driven down these inaccuracies. Why, then, cling to short reads whose fragmented insights pale against such sweeping completeness?

Socrates: A fair point, Aspasia, but let me pose a question: Is cost not still a factor for the scientific community? To some, accessibility and affordability hold as much weight as accuracy. Illumina’s short-read sequencing, with its relatively low cost and established infrastructure, democratizes metagenomics across labs of various means. And do not computational advancements such as SPAdes, MEGAHIT, and binning algorithms like MetaBAT and MaxBin mitigate these concerns by assembling short reads into high-quality metagenome-assembled genomes (MAGs)?

Aspasia: Ah, Socrates, while cost may be an argument for those content with the rudimentary, let us consider the missed potential. Short-read MAGs are indeed assembled, but their incompleteness and chimeric errors are commonplace. How many microbial species with low genomic representation are lost in the noise? In contrast, long reads, even at greater expense, offer a more profound resolution of microbial diversity and strain-level variation, particularly when considering the intricate interplay of microbial communities and plasmid structures. We cannot advance with partial maps when entire landscapes are available.

Socrates: Yet, one cannot ignore the advances in bioinformatics tailored for short reads. Sophisticated graph-based assembly algorithms, cousins of those in long-read assemblers such as metaFlye, drive short-read tools like metaSPAdes and MEGAHIT, which produce MAGs with remarkable precision. And what of co-assembly approaches, where short reads from multiple samples are pooled to improve assembly statistics? Metagenomics has become a field of combinatorial prowess.

Aspasia: The bioinformatics you speak of, Socrates, often stitches fragments together based on statistical probabilities rather than true sequence continuity. This is akin to reconstructing a city from scattered blueprints, a far cry from walking its streets. Even metaFlye thrives only on long-read data; it was never designed to assemble short reads. And let us not forget the discovery of novel plasmids, integrative conjugative elements, and the dynamic, mobile genetic content that is often fragmented and lost in short-read assemblies. Is not the very essence of metagenomics to reveal these complex ecological roles?

Socrates: Your points are persuasive, Aspasia. But I maintain that short-read sequencing should not be consigned to the dustbin. Rather, it should continue as a foundation, complementing long-read technology. Hybrid methods, like those that combine Nanopore reads with Illumina polishing, yield high-quality genomes with far lower financial and computational strain than long reads alone. Moreover, for ecological and population-level studies where breadth of data trumps individual genome completeness, short reads excel in their depth and precision.

Aspasia: A union, then, Socrates? You, the philosopher who insists on synthesis, propose a partnership between the old and the new? I admit, there is value in short reads as companions to their long-read successors. Yet, should we not push forward with ambition rather than cling to what is limited?

Socrates: Let us not throw away the tools that built our current understanding. Rather, we must use them with wisdom. The future of metagenomics lies not in the exclusive use of long reads or short reads, but in the judicious application of both, where each shines brightest. Would you not agree, Aspasia, that progress is a matter of integration, of seeing the strengths in every part of the toolkit?

Aspasia: Perhaps, Socrates, I see the balance you seek. But let us not forget that for the high peaks of discovery, long reads have set their claim. If we wish to unlock the true mysteries of metagenomic landscapes, they must lead the expedition.

Socrates: And in their shadow, short reads will continue to illuminate the paths they cannot reach alone. Thus, with both in hand, we stride into the future, seeking to know the world as deeply and completely as our tools allow.
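
The contiguity argument running through the dialogue is usually quantified with N50: the contig length at which contigs of that size or larger hold at least half of the assembled bases. The following is a minimal Python sketch of that calculation. The function name n50 and both lists of contig lengths are hypothetical, invented here for illustration; they are not results from any real assembly.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(contig_lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (bp): two assemblies of the same 100 kb.
fragmented = [5_000] * 20              # many short contigs
contiguous = [60_000, 30_000, 10_000]  # a few long contigs

print(n50(fragmented))  # 5000
print(n50(contiguous))  # 60000
```

Both toy assemblies contain the same number of bases, yet their N50 values differ twelvefold. N50 says nothing about correctness, though: a chimeric contig raises it just as a genuine one does, which is why Socrates's concern about accuracy still matters.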


Reading list

Read and critically review the following papers:


Discussion points

  • Discuss whether short-read metagenomics is becoming obsolete.
  • Debate whether long-read metagenomics is a revolution or if it remains more hype than reality.
  • Why should anyone still be using short-read metagenomics?
  • How much of what you have learnt this week is still relevant to long-read metagenomics?