Question 1

What is a de Bruijn graph in genome assembly?

Accepted Answer

A de Bruijn graph represents every distinct k-mer from the sequencing reads as an edge, with its (k-1)-mer prefix and suffix as the nodes it connects. In the idealized case (error-free reads covering the whole genome), the genome sequence corresponds to an Eulerian path through the graph. This formulation handles the massive redundancy of high-coverage sequencing data well, since each k-mer appears as a single edge regardless of how many reads contain it.
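As a rough illustration of the construction described above, the following Python sketch builds the graph from a handful of toy reads and spells a sequence from an Eulerian path. The function names (`build_de_bruijn`, `eulerian_path`, `spell`) and the reads are made up for this example. It assumes error-free reads, a single strand, and that an Eulerian path exists; real assemblers handle none of these so simply.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes it points to."""
    kmers = set()  # a set collapses duplicate k-mers from overlapping reads
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(set)
    for kmer in kmers:
        graph[kmer[:-1]].add(kmer[1:])  # edge: prefix node -> suffix node
    return dict(graph)


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists (illustrative only)."""
    adj = {u: sorted(vs) for u, vs in graph.items()}
    in_deg = defaultdict(int)
    for vs in adj.values():
        for v in vs:
            in_deg[v] += 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (u for u in sorted(adj) if len(adj[u]) - in_deg.get(u, 0) == 1),
        min(adj),
    )
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if adj.get(u):
            stack.append(adj[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the path
    return path[::-1]


def spell(path):
    """Spell the sequence by overlapping consecutive (k-1)-mer nodes."""
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["ATGGC", "TGGCG", "GGCGT", "ATGGC"]  # toy reads; the last is a duplicate
graph = build_de_bruijn(reads, k=3)
contig = spell(eulerian_path(graph))
print(len(graph), contig)
```

The duplicate read adds no edges, which is the redundancy-collapsing property the answer mentions.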

Question 2

How does k-mer size affect assembly?

Accepted Answer

Small k values give better graph connectivity at low coverage, but short k-mers recur by chance and within repeats, so the graph becomes more tangled and ambiguous. Large k values resolve more repeats but require higher coverage (a read of length L yields only L - k + 1 k-mers) and are more sensitive to sequencing errors (a single miscalled base corrupts up to k k-mers). Many assemblers try multiple k values and merge the results.
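The coverage and error trade-off can be made concrete with a little arithmetic. The sketch below uses hypothetical helper names and assumes fixed-length reads with a single substitution error per read. A read of length L contains L - k + 1 k-mers, so effective k-mer coverage scales base coverage by (L - k + 1) / L, and an error at a given position affects every k-mer that overlaps it.

```python
def kmers_per_read(read_len, k):
    """Number of k-mers contained in one read of length read_len."""
    return max(read_len - k + 1, 0)


def kmer_coverage(base_coverage, read_len, k):
    """Effective k-mer coverage implied by a given base coverage."""
    return base_coverage * kmers_per_read(read_len, k) / read_len


def kmers_hit_by_error(read_len, k, pos):
    """How many k-mers of the read overlap the 0-based position pos."""
    first = max(0, pos - k + 1)      # leftmost k-mer start that still covers pos
    last = min(pos, read_len - k)    # rightmost k-mer start that covers pos
    return max(last - first + 1, 0)


# Illustrative numbers only: 100 bp reads at 30x base coverage.
for k in (21, 61):
    print(
        k,
        kmers_per_read(100, k),
        kmer_coverage(30, 100, k),
        kmers_hit_by_error(100, k, pos=50),
    )
```

With these example numbers, moving from k = 21 to k = 61 halves the k-mers per read (80 to 40), and a single mid-read error goes from corrupting 21 of 80 k-mers to corrupting all 40.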

Question 3

What is N50 and why does it matter?

Accepted Answer

N50 is the length of the shortest contig such that contigs of this length or longer cover at least 50% of the total assembly length. A higher N50 indicates a more contiguous assembly. For example, a 5 Mb bacterial genome assembled into 200 contigs with N50 = 50 kb has at least half of its assembled bases in contigs ≥50 kb long.
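The definition translates directly into a short computation: sort contigs from longest to shortest and accumulate lengths until half of the total is reached. The following is a minimal sketch with a hypothetical function name and made-up contig lengths, not the implementation of any particular assessment tool.

```python
def n50(contig_lengths):
    """Length of the contig at which the running total first reaches 50% of the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    raise ValueError("contig_lengths must contain at least one contig")


# Made-up contig lengths (in kb) totalling 300 kb.
lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(lengths))
```

Here the two longest contigs (80 + 70 = 150) reach exactly half of the 300 kb total, so the N50 is 70.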

Question 4

Why are repeats problematic for assembly?

Accepted Answer

Genomic repeats longer than the read length (or k-mer size) create ambiguous paths in the assembly graph: the assembler cannot determine which flanking unique sequences belong together. This fragments the assembly at repeat boundaries. Long-read sequencing (PacBio, Oxford Nanopore) resolves more repeats because individual reads can span an entire repeat along with the unique sequence on both sides.
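A small experiment shows why such repeats are ambiguous. In the sketch below, two different toy genomes share a repeat `R` that occurs three times, with the segments between the copies swapped. All sequences and names are invented for illustration. When k is no longer than the repeat, both genomes produce exactly the same k-mer set, so a graph built from those k-mers cannot tell them apart. A k large enough to reach across the repeat into both flanks separates them.

```python
def kmer_set(seq, k):
    """All distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


R = "ACGT"                      # toy repeat, 4 bp
a, b, c, d = "TT", "CC", "GA", "TG"  # toy unique segments

genome_1 = a + R + b + R + c + R + d  # order: b then c
genome_2 = a + R + c + R + b + R + d  # order: c then b

# k = 4 fits inside the repeat: the two genomes are indistinguishable.
same_at_k4 = kmer_set(genome_1, 4) == kmer_set(genome_2, 4)

# k = 6 spans the repeat plus one flanking base on each side.
same_at_k6 = kmer_set(genome_1, 6) == kmer_set(genome_2, 6)

print(genome_1 != genome_2, same_at_k4, same_at_k6)
```

The same reasoning applies to reads: a read that covers the repeat and unique sequence on both sides links the flanks directly, which is how long reads resolve these cases.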

Genome Assembly Simulator: De Bruijn Graph Construction

Formula

The Assembly Challenge

De Bruijn Graph Construction

The Role of K-mer Size

From Graph to Contigs

FAQ

Sources
