Features

The table below lists the features of the Persephone application.

Feature	Notes
Show genetic maps with marker mapping	Marker positions are given in centimorgans (cM)
Show maps based on sequence (chromosomes and scaffolds)	Some map sets can contain millions of scaffolds
Show multiple maps on one screen with synteny visualization	Connections are based on orthologous genes, common markers, and regions of sequence similarity
Link genetic and physical maps by common markers	Identical markers located on different maps are automatically linked
Maps can be shown vertically and horizontally	The vertical layout is common for genetic maps. Sequence maps are typically shown in the horizontal orientation
Sequence is shown for entire chromosomes	When the entire map is shown at low zoom, the genomic sequence is represented by a histogram based on the GC content. Zoom in to see individual base pairs. A text window with a sequence view of 2 Mbp is synchronized with the graphics. The text selection is reflected on the map.
Track types
– sequence	For example, the wheat genome is 16 Gbp; after zooming in, a track can show individual nucleotides
– gene models	Typically, tens of thousands of features per track
– markers (in general, a position on a map: SNP markers, repeats, regions of interest, etc.)	Tested with millions of markers per track
– quantitative tracks (RNA-seq coverage, methylation or conservation levels, etc.)	Example: The human genome has the conservation track (phyloP100) where each nucleotide has a value. Several quantitative tracks can be merged into one
– QTLs	Each QTL is assigned to a trait and a study
– variation (SNPs and indels)	The variants track can have multiple sub-tracks (e.g., one per sample). Examples: human samples with 80 million SNPs per patient; 3,000 rice accessions with 5 million SNPs each.
Marker details form	Shows properties, sequences, and lists all locations of the marker on other maps
Gene details form
– basic properties	The prediction method, coordinates on the map, etc.
– spliced, unspliced, protein sequence with color decoration	The text selection is synchronized between the tabs
– metrics tab with exon coordinates and sizes	Selection is synchronized with the sequence tabs
– transcripts view	All gene models that overlap with the selected gene are combined in one zoomable view. Coordinates and sizes of exons and introns are shown. Differences in splice patterns are highlighted. Genes with common CDS can be collapsed. Sorting can be done by the unspliced sequence length, transcript name, or CDS length.
Realtime BLAST	The interface allows selection of multiple genomes or individual chromosomes. Results are shown graphically, allowing analysis of each HSP. The raw BLAST output is also available. The BLAST parameters can be customized
Bookmarks	Remember the screen layout and can be shared with other users
Find the best matching syntenic chromosome for a given map	Click a track (markers or genes) and see which tracks in which genomes have the most matching features. A dot plot is shown for each pair of maps under consideration.
Synteny matrix	Zoomable dot plot for all vs. all maps of two genomes (based on orthologs, markers, or syntenic regions)
Create new tracks by filtering existing tracks	Extract features based on some criteria and create a new track. The features can be given distinct colors based on some criteria
Create new tracks by filtering and converting existing tracks	Example: find genes that have ‘cancer’ in their description and show them as a marker track with labels
Import tracks from external files
– markers	Read from a simple tab-delimited file with 3 columns.
– VCF	In progress
– BAM	Loads BAM files from a URL or the local disk. A BAI file is required. When loading from a URL, only the BAI file is transferred and processed; the rest of the data is fetched on demand. The BAI file is analyzed and used to display the density of the read alignments. Tested on files of 200 GB. Works with BAM files without sending the data to the server.
– CRAM	Requires a CRAI index file
– bedGraph	Quantitative data, such as RNA-seq coverage plots
– GFF	Gene models or multi-part matches
– BED files	The BED tracks can be added to the database or as a private track. Features can have distinct colors
– PAF or ribbon files	The standard PAF files or proprietary ribbon files store syntenic ribbons
– FASTA files	Users can drag/drop FASTA files with multiple sequences and create a new genome entry
Text search powered by Apache Solr	The external user data is also indexed
Load BAM files to the database	In addition to the BAM files loaded by end users for themselves, the BAM data can be added to the database to make it available to all users
Gene models in the annotation track can carry quality marks	The marks will have different colors depending on the quality of prediction.
Export	Outputs genes, markers, or QTLs with their properties and sequences for a map segment or an entire map set
– markers and qualifiers
– gene models and qualifiers
– bedGraph tracks
– BLAST search results
– genomic sequence
– text search results
– ortholog pairs	Exports all orthologs for a pair of genomes. In the pipeline
Instant genomic sequence comparison (<1 Mbp regions)	Show two maps, zoom into a region under 1 Mbp, and click a button that runs BLASTN to create ribbons for identical sequences on the fly (results are typically shown in less than 1 second)
Instant genomic sequence comparison (full chromosomes)	Uses minimap2 to align entire maps and produce ribbons of synteny.
DNA motif search	Motifs with wildcards can be found across the entire sequence. The sequence view and graphics are synchronized
Inventory of tracks	Activate/deactivate tracks by selecting them from a large collection of tracks
Highlight and label genomic regions	Select a region of a map and save it for the future, giving the area a label and a specific color.
Statistics of maps of a genome or gene models in a track	The gene structure statistics can be shown for one track or all tracks of a map set
Link in and link out	Object properties can be shown as URLs to external resources. The web application accepts a set of parameters in the URL to navigate directly to a region or an object of interest
Multiple protein sequence alignment	Aligns orthologous protein sequences or a custom set provided as multi-FASTA text
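
The GC-content histogram used to represent whole-chromosome sequence at low zoom can be illustrated with a short sketch. This is not Persephone's implementation: the function name `gc_histogram` and the fixed, non-overlapping window are assumptions made for illustration only.

```python
def gc_histogram(sequence: str, window: int) -> list[float]:
    """Return the GC fraction of each consecutive, non-overlapping window.

    Hypothetical helper: the window size and the handling of ambiguous
    bases (N is counted in the window length but not as G or C) are
    illustrative choices, not documented Persephone behavior.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = sequence.upper()
    fractions = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]  # the last chunk may be shorter
        gc = chunk.count("G") + chunk.count("C")
        fractions.append(gc / len(chunk))
    return fractions


if __name__ == "__main__":
    # One histogram bar per 4 bp window of a toy sequence
    print(gc_histogram("GGCCAATTGCAT", 4))
```

Each returned value would correspond to one bar of the histogram; a viewer could pick the window size so that the number of bars matches the available screen width.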

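The DNA motif search with wildcards can be sketched in a similar way. The example assumes the wildcards follow the standard IUPAC nucleotide codes (for example, `W` for A or T, `N` for any base); the wildcard syntax Persephone actually accepts is not stated here, and `find_motif` is a hypothetical name. Only the forward strand is searched.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression fragments
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_motif(sequence: str, motif: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every motif occurrence.

    Hypothetical helper for illustration. A lookahead is used so that
    overlapping occurrences are reported as well.
    """
    pattern = "".join(IUPAC[base] for base in motif.upper())
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    for start, hit in find_motif("ACGTACGA", "ACGW"):
        print(start, hit)
```

The start positions are what a viewer would need to keep the sequence text and the graphics synchronized: each hit can be highlighted at the same coordinate in both views.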