
Human Pangenome: 466 Assemblies Now Available in Persephone

21:34 21 May 2026 in Data sets, General News, Web Version

The Human Pangenome Reference Consortium is building a pangenome reference resource. As you probably know, the de facto reference genome, GRCh38, is a mosaic derived from the DNA of more than 20 individuals. The new collection provides validated, high-quality, near telomere-to-telomere genomic sequences for 466 haplotypes, complete with annotations. Although not every contig has been scaffolded into a complete chromosome, the assemblies capture the vast majority of chromosomal sequence and offer a far more complete and representative view of human genomic diversity.

We have loaded all the assemblies into Persephone. Our technique for linking genomes via shared sequence tags enables quick navigation and alignment of matching sequences: artificial 300-base markers (tags) were extracted from the CHM13 reference genome and mapped onto all available pangenome sequences.

There are a few ways to find syntenic sequences in Persephone. If the sequence names reflected chromosome numbers, finding the matching contigs would be trivial; unfortunately, as mentioned above, the sequences do not always represent full chromosomes. One way to find the match is to right-click the Tags track and select the Find synteny tool, which lists the best-matching sequences ordered by the number of shared tags (#Connectors).

For a given reference map (chromosome X), the FIND SYNTENY interface detects the best-matching maps in the selected genotypes.
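The tag-and-connector idea can be illustrated with a toy Python sketch. It assumes tags are cut from the reference at a fixed spacing and matched exactly on either strand; the spacing (step), the exact-match rule, and all function and contig names are hypothetical, since the post does not describe how Persephone samples or maps its tags. Contigs are then ranked by connector count, mirroring the #Connectors ordering of the Find synteny list.

```python
from collections import Counter

TAG_LEN = 300  # tag length given in the post

# Complement table for reverse-complementing DNA (N maps to N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_tags(ref_seq: str, tag_len: int = TAG_LEN, step: int = 10_000) -> list[tuple[int, str]]:
    """Cut tag_len-base tags from the reference every `step` bases.

    `step` is a hypothetical spacing; tags containing N (assembly gaps) are skipped.
    """
    tags = []
    for start in range(0, len(ref_seq) - tag_len + 1, step):
        tag = ref_seq[start:start + tag_len]
        if "N" not in tag:
            tags.append((start, tag))
    return tags


def map_tags(tags: list[tuple[int, str]], contigs: dict[str, str]) -> dict[str, list[tuple[int, int, str]]]:
    """Exact-match every tag against every contig on both strands.

    Returns, per contig, connectors as (reference_pos, contig_pos, strand).
    Simplification: only the first occurrence is kept, and a forward hit
    takes precedence over a reverse-complement hit.
    """
    connectors: dict[str, list[tuple[int, int, str]]] = {name: [] for name in contigs}
    for ref_pos, tag in tags:
        rc_tag = revcomp(tag)
        for name, seq in contigs.items():
            hit = seq.find(tag)
            if hit != -1:
                connectors[name].append((ref_pos, hit, "+"))
                continue
            hit = seq.find(rc_tag)
            if hit != -1:
                connectors[name].append((ref_pos, hit, "-"))
    return connectors


def find_synteny(connectors: dict[str, list[tuple[int, int, str]]]) -> list[tuple[str, int]]:
    """Rank contigs by number of shared tags (#Connectors), best first."""
    counts = Counter({name: len(pairs) for name, pairs in connectors.items()})
    return [(name, n) for name, n in counts.most_common() if n > 0]


if __name__ == "__main__":
    import random

    random.seed(0)
    ref = "".join(random.choice("ACGT") for _ in range(3000))
    contigs = {
        "hapA_contig1": ref[500:2500],        # forward copy of part of the reference
        "hapB_contig7": revcomp(ref[:1200]),  # reverse-complemented copy
        "hapC_contig2": "".join(random.choice("ACGT") for _ in range(2000)),  # unrelated
    }
    tags = extract_tags(ref, step=300)
    print(find_synteny(map_tags(tags, contigs)))
    # prints [('hapA_contig1', 6), ('hapB_contig7', 4)]
```

Counting shared tags rather than aligning whole contigs keeps the ranking cheap: each contig is scored by how many reference anchors it contains, and only the top-ranked candidates need a base-level alignment afterwards.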

Once the connector lines between tags have outlined a syntenic region, it is easy to zoom in and analyze sequence similarity with single-base precision by running BLASTN or minimap2 on the two aligned sequences.
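minimap2 reports alignments in PAF, a tab-separated format whose 12 mandatory columns include the number of residue matches (column 10) and the alignment block length (column 11). As a sketch, assuming the two sequences were aligned outside Persephone with a command such as minimap2 -c -x asm5 target.fa query.fa > hits.paf, the Python below parses those columns and computes a BLAST-like identity (matches divided by block length) per hit and for the whole region. The class and function names, the MAPQ filter, and the sample records are illustrative assumptions, not part of Persephone.

```python
from dataclasses import dataclass


@dataclass
class PafHit:
    """The 12 mandatory PAF columns (0-based column indices in comments)."""
    query: str      # col 0
    q_len: int      # col 1
    q_start: int    # col 2
    q_end: int      # col 3
    strand: str     # col 4, '+' or '-'
    target: str     # col 5
    t_len: int      # col 6
    t_start: int    # col 7
    t_end: int      # col 8
    matches: int    # col 9, number of residue matches
    block_len: int  # col 10, alignment block length (includes gaps)
    mapq: int       # col 11

    @property
    def identity(self) -> float:
        """BLAST-like identity: residue matches / alignment block length."""
        return self.matches / self.block_len if self.block_len else 0.0


def parse_paf(lines) -> list[PafHit]:
    """Parse PAF lines, ignoring optional SAM-like tags after column 12."""
    hits = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        f = line.rstrip("\n").split("\t")
        hits.append(PafHit(f[0], int(f[1]), int(f[2]), int(f[3]), f[4], f[5],
                           int(f[6]), int(f[7]), int(f[8]), int(f[9]), int(f[10]), int(f[11])))
    return hits


def region_identity(hits: list[PafHit], min_mapq: int = 0) -> float:
    """Block-length-weighted identity over hits with MAPQ >= min_mapq."""
    kept = [h for h in hits if h.mapq >= min_mapq]
    total_block = sum(h.block_len for h in kept)
    return sum(h.matches for h in kept) / total_block if total_block else 0.0


if __name__ == "__main__":
    # Two made-up PAF records (hypothetical contig names and coordinates).
    paf = [
        "hapA_contig1\t2000000\t0\t1000000\t+\tchrX\t154259566\t50000000\t51000500\t995000\t1000600\t60\n",
        "hapA_contig1\t2000000\t1200000\t1700000\t+\tchrX\t154259566\t51200000\t51700200\t490000\t500300\t5\n",
    ]
    hits = parse_paf(paf)
    for h in hits:
        print(h.query, h.target, f"{h.identity:.4f}")
    print(f"region identity (MAPQ >= 10): {region_identity(hits, min_mapq=10):.4f}")
```

Weighting by block length keeps a short, low-quality hit from dominating the summary; the MAPQ threshold is an arbitrary example value.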

We conducted an ortholog analysis for all genes in CHM13 and GRCh38, comparing them against the other 464 genotypes. To begin, select either CHM13 or GRCh38 and find a gene of interest. The properties panel includes an “Orthologs” tab that displays all matching proteins along with their multiple alignment. Aligning more than 400 protein sequences can be computationally demanding, yet Persephone displays the result in seconds, although timing varies between alignment algorithms.
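To put a number on what the Orthologs alignment shows, a common summary is the percent identity of each ortholog to the reference protein, read column by column from the multiple alignment. The sketch below is illustrative only: it assumes aligned rows of equal length with "-" as the gap character and counts identity over the columns where the reference has a residue, which is just one of several conventions. The function names and the toy sequences are hypothetical and not part of Persephone.

```python
def percent_identity(ref_row: str, other_row: str) -> float:
    """Percent identity of one aligned row to the reference row.

    Only columns where the reference has a residue are compared; a gap in the
    other row at such a column counts as a mismatch.
    """
    if len(ref_row) != len(other_row):
        raise ValueError("rows of one alignment must have equal length")
    compared = identical = 0
    for ref_aa, other_aa in zip(ref_row.upper(), other_row.upper()):
        if ref_aa == "-":
            continue  # gap in the reference: column not counted
        compared += 1
        if ref_aa == other_aa:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def rank_orthologs(msa: dict[str, str], ref_name: str) -> list[tuple[str, float]]:
    """Percent identity of every other row to ref_name, highest first."""
    ref_row = msa[ref_name]
    scores = [(name, percent_identity(ref_row, row))
              for name, row in msa.items() if name != ref_name]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy alignment with hypothetical sequence names.
    msa = {
        "CHM13": "MKT-LLV",
        "hapA": "MKT-LLV",
        "hapB": "MRTALL-",
    }
    for name, pid in rank_orthologs(msa, "CHM13"):
        print(f"{name}\t{pid:.1f}%")
    # prints hapA 100.0% then hapB 66.7%
```

Other conventions divide by the alignment length or by the shorter ungapped sequence, so identity values from different tools are only comparable when the denominator is the same.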