About Persephone

Persephone® was originally designed as a genome viewer for gene predictors. It allowed users to align various types of evidence with gene models and assess the quality of gene structure predictions.

Since 2010, the tool has evolved to visualize genetic maps, QTLs, variants, RNA-seq data, synteny, NGS read alignments, and more, facilitating swift navigation in the ever-expanding world of genomic information. The browser can show multiple maps on the screen at once, which helps in analyzing pangenomes, and its ability to compare maps in real time offers a convenient way to detect structural variations.

Today, Persephone is a state-of-the-art, industrial-scale genome browser that can rapidly display large data sets thanks to unique compression algorithms, optimized data transfer, and a fast rendering engine that uses technologies borrowed from the gaming industry.

The entire Persephone framework consists of a database; an API server, which handles communications and serves the main client application; BLAST and text search servers; and a loader utility that populates the database. Users can also quickly visualize their own data without loading it into the database, either by dragging and dropping local files or by reading them from remote locations via URL.

Persephone is used daily in large corporations and academic institutions and continues to uphold its reputation for stability, power, and versatility.

Getting Started with Persephone

The Dataflow

A typical workflow for genomic data starts with loading the data into the database (Oracle, Postgres, or a MySQL-compatible database such as MariaDB). Files in a variety of common bioinformatic formats are parsed, and the data are indexed, compressed, and stored in a system specifically designed for fast retrieval. The typical data types and file formats are:

reference genomic sequence(s) (FASTA or GenBank format)
linkage groups with marker positions in cM (CSV or tab-delimited files)
gene annotation tracks (GFF3, GTF, BED, or GenBank files)
marker tracks: features with a name and a position on the genome (CSV or tab-delimited files)
quantitative trait loci (QTLs) (CSV or tab-delimited files)
quantitative tracks (bedGraph, bigWig, wiggle, bedMethyl)
individual resequencing data: variants such as SNPs and indels (VCF files)
orthologs or paralogs that help link syntenic maps (tab-delimited files)
synteny regions, which can be derived from BLASTN, minimap, or MUMmer output (CSV, PAF, or chain files)
read alignments (BAM/CRAM files)
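
Parsing is the first step of this workflow. As a rough illustration only (this is not PersephoneShell's parser), the Python sketch below reads a single GFF3 data line into a record, following the GFF3 conventions of nine tab-separated columns, "." for missing values, ";"-separated key=value attributes, comma-separated multiple values, and URL-style escaping of reserved characters. The GffFeature class and parse_gff3_line function are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import unquote


@dataclass
class GffFeature:
    """Hypothetical in-memory record for one GFF3 feature line."""
    seqid: str
    source: str
    type: str
    start: int            # 1-based, inclusive (GFF3 convention)
    end: int              # 1-based, inclusive
    score: float | None   # None when the column holds "."
    strand: str
    phase: int | None     # None when the column holds "."
    attributes: dict[str, list[str]] = field(default_factory=dict)


def parse_gff3_line(line: str) -> GffFeature | None:
    """Parse one GFF3 line; return None for blank lines, comments, and ## directives."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attr_col = cols
    attributes: dict[str, list[str]] = {}
    for pair in attr_col.split(";"):
        if not pair:
            continue  # tolerate a trailing ";"
        key, _, value = pair.partition("=")
        # Multiple values are comma-separated; reserved characters are %-escaped.
        attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return GffFeature(
        seqid=unquote(seqid),
        source=source,
        type=ftype,
        start=int(start),
        end=int(end),
        score=None if score == "." else float(score),
        strand=strand,
        phase=None if phase == "." else int(phase),
        attributes=attributes,
    )
```

Splitting on ";" before unescaping matters: an escaped semicolon (%3B) inside a value survives the split and is only decoded afterwards.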

The main data-processing component, which understands these formats and uploads the data into the database, is called PersephoneShell. It reads the data files following instructions in an INI-formatted control file that specifies how to interpret the data; for example, an INI file may contain rules for parsing FASTA headers or specify which GFF3 attributes to load. PersephoneShell can run in interactive mode, displaying prompts and offering auto-completion of commands. Alternatively, it can be invoked from scripts to run commands in batch mode.
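
The idea of an INI control file can be pictured with a minimal sketch. The section and key names below ([fasta], header_regex, [gff3], load_attributes) are hypothetical and do not reflect PersephoneShell's actual control-file schema; the example only shows how such rules could be read with Python's standard configparser module and applied to a FASTA header and to a set of GFF3 attributes.

```python
import configparser
import re

# Hypothetical control file: section and key names are invented for
# illustration and are NOT PersephoneShell's real INI schema.
CONTROL_FILE = r"""
[fasta]
# Regular expression applied to each FASTA header; group "name" is the sequence name.
header_regex = ^>(?P<name>\S+)

[gff3]
# Only these GFF3 attributes are kept.
load_attributes = ID, Name, Parent
"""


def load_rules(text: str):
    """Read the hypothetical rules: a compiled header regex and a set of attribute names."""
    # interpolation=None so that "%" in a regex is not treated as a substitution.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(text)
    header_re = re.compile(cfg["fasta"]["header_regex"])
    keep = {a.strip() for a in cfg["gff3"]["load_attributes"].split(",") if a.strip()}
    return header_re, keep


def sequence_name(header: str, header_re: re.Pattern) -> str:
    """Apply the header rule to one FASTA header line."""
    m = header_re.match(header)
    if m is None:
        raise ValueError(f"header does not match rule: {header!r}")
    return m.group("name")


def filter_attributes(attrs: dict, keep: set) -> dict:
    """Drop every GFF3 attribute not listed in the control file."""
    return {k: v for k, v in attrs.items() if k in keep}
```

Keeping the parsing rules in a separate file means the same loader code can handle FASTA files from different sources, each with its own header convention, by switching only the control file.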

The main Persephone client application runs in a web browser.

Storing the data in the database has several advantages, especially in the corporate environment, namely:
– consolidating a variety of files in a central company-wide repository;
– using common nomenclature, which helps keep an inventory of bioinformatic assets. For instance, it is easy to see which primer or probe sequences are associated with a marker or onto which genomes the marker has been mapped;
– pre-processing large volumes of data and optimizing their storage and retrieval. The system has been used with billions of SNPs and millions of maps and markers.

When browsing data from the database with Persephone, users can drag and drop their own files to visualize custom tracks. A powerful export engine also allows analyzing or exporting entire data sets, such as the promoter regions of a list of genes with a common function, or all protein sequences derived from the gene models predicted on a genome.
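
As an illustration of the kind of promoter-region export mentioned above, the sketch below computes a fixed-length window immediately upstream of each gene's 5' end, taking strand into account, and extracts it from an in-memory genome. Defining the promoter as a fixed upstream window, using 1-based inclusive coordinates, and the names Gene, promoter_interval, and promoter_sequence are all assumptions for this example, not Persephone's export logic.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    seqid: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def promoter_interval(gene: Gene, length: int, seq_len: int) -> tuple[int, int]:
    """1-based inclusive window of `length` bp upstream of the gene's 5' end,
    clipped to the sequence bounds (may be empty, i.e. lo > hi, at an edge)."""
    if gene.strand == "+":
        lo, hi = gene.start - length, gene.start - 1
    elif gene.strand == "-":
        # On the minus strand, "upstream" lies after the gene end in genomic coordinates.
        lo, hi = gene.end + 1, gene.end + length
    else:
        raise ValueError(f"unknown strand {gene.strand!r}")
    return max(lo, 1), min(hi, seq_len)


def promoter_sequence(gene: Gene, length: int, genome: dict[str, str]) -> str:
    """Promoter sequence read 5'->3' on the gene's own strand."""
    seq = genome[gene.seqid]
    lo, hi = promoter_interval(gene, length, len(seq))
    if lo > hi:
        return ""  # gene sits at the very edge of the sequence
    region = seq[lo - 1:hi]  # 1-based inclusive -> Python slice
    return region if gene.strand == "+" else reverse_complement(region)
```

Applying promoter_sequence to every gene in a list and writing the results as FASTA records would give the kind of bulk export described above; clipping at the sequence ends keeps genes near a contig boundary from producing invalid coordinates.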

The Persephone software stack is typically installed using a Docker image, which contains all necessary components.

Please see the complete documentation on system setup, working with Persephone, and loading data.

Designed for very large data sets – Persephone has been designed to optimize the storage, access, and visualization of genetic information for large genomes. Plants, for example, contain great complexity in their genetic information. Healthcare has lagged behind agriculture in the use of genetic information because of the difficulty of accessing large populations, which are commonplace in crops. With the amount of human genetic data exploding as genomics moves into the clinic, Persephone offers a scalable solution that keeps terabytes of data easily accessible.

Optimized data compression and memory usage – Knowledge of the characteristics of genetic data enables Persephone to achieve better compression ratios than standard ZIP compression. The same compression capability allows Persephone to optimize memory usage across the system and to handle large files in real time.
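
Persephone's own compression algorithms are not described here. As a simple, generic example of why knowing the data helps, the sketch below uses the well-known 2-bit nucleotide encoding: an unambiguous DNA sequence uses only four letters, so each base fits in 2 bits instead of the 8 bits of a text character, a fixed 4:1 reduction before any general-purpose compressor is applied. The function names are hypothetical, and the sketch rejects N and other IUPAC codes, which a real encoder would have to store separately.

```python
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASES = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack an A/C/G/T string into 2 bits per base, 4 bases per byte.

    Raises KeyError for any other character (e.g. N); a real encoder
    would record such runs separately.
    """
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq.upper()):
        # Base i goes into byte i // 4, at bit offset 2 * (i % 4).
        out[i // 4] |= _CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack_2bit(data: bytes, length: int) -> str:
    """Inverse of pack_2bit; `length` is needed because the last byte may be partly filled."""
    return "".join(
        _BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )
```

A generic compressor such as zlib has to discover the four-letter alphabet statistically, whereas the fixed encoding gets it for free; domain-specific schemes build on the same principle with more elaborate models.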

Capacity to scan whole genomes – Currently, many genetic tests consist of a panel of one or a few genes. Whole-genome sequencing is demonstrating that, while a few mutations are common to particular diseases, many mutations are unique to individuals. Just as radiologists are needed to interpret subtle differences and indicators in X-ray or MRI data, clinical geneticists will likely need tools to interpret whole genomes and their individual differences, not just panels of a few genes. As new genetic discoveries are made, Persephone enables rapid identification of affected individuals without having to rescan whole genomes.
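
One way to avoid rescanning genomes when a newly described variant becomes relevant is to keep an index from each variant to the individuals who carry it, so that finding carriers becomes a lookup. The sketch below is a hypothetical in-memory version of such an inverted index; it is not a description of how Persephone stores variants, and it assumes variants have already been normalized and filtered to those each sample actually carries (genotype handling is omitted).

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, as in VCF
    ref: str
    alt: str


class VariantIndex:
    """Hypothetical inverted index: variant -> set of sample IDs carrying it."""

    def __init__(self) -> None:
        self._carriers: dict[Variant, set[str]] = defaultdict(set)

    def add_sample(self, sample: str, variants: Iterable[Variant]) -> None:
        """Register every variant a sample carries (done once, at load time)."""
        for v in variants:
            self._carriers[v].add(sample)

    def carriers(self, variant: Variant) -> set[str]:
        """Samples carrying `variant`; a hash lookup, no per-sample data is re-read."""
        # .get() does not trigger the defaultdict factory, so queries never grow the index.
        return set(self._carriers.get(variant, ()))
```

The cost of reading each genome is paid once, when the index is built; afterwards, each new query about a variant is answered without touching the individual genomes again.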

Intuitive user interface – Persephone’s easy-to-use, intuitive interface allows non-expert users to explore and compare genetic information with familiar point-and-click, drag-and-drop, cut-and-paste, and zoom functions. To aid interpretation, Persephone organizes genetic information for easy visualization, search, filtering, and comparison, and its high-performance graphics provide smooth, animated genome visualization.
