Background


Ebola

Filovirus outbreaks have ravaged West and Central Africa in recent decades.


Sequence data


The sequence data in this project derive from multiple distinct sources:

ncbi-refseqs: Genome-length reference sequences of representative filovirus species. These XML-formatted files are downloaded directly from NCBI and are uniquely identified within this project by their GenBank accession numbers.

fasta-curated: A non-redundant set of filovirus-derived endogenous viral element (EVE) loci. These FASTA sequences were curated by systematically screening whole genome sequence (WGS) assemblies with the database-integrated genome screening (DIGS) tool. Sequences in this source have unique IDs based on arbitrary numbering.

fasta-refseqs: EVE reference sequences - i.e. best-guess estimates of the ancestral sequences of the filoviruses that gave rise to EVEs. Where possible these are consensus/ancestral sequences derived from alignments included in this project. Sequence IDs used in this source correspond to the names of the unique EVE loci they represent (see here for details).
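Because each source relies on unique sequence IDs, it can be useful to check a FASTA file for duplicates before using it. The following is a minimal sketch in plain Python, not part of the project itself. The function name `read_fasta` and the example IDs are hypothetical, and it assumes the ID is the first whitespace-delimited token of each header line.

```python
def read_fasta(text):
    """Parse FASTA text into {sequence_id: sequence}, rejecting duplicate IDs."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the ID is the first token after '>'.
            parts = line[1:].split()
            if not parts:
                raise ValueError("FASTA header with no sequence ID")
            current_id = parts[0]
            if current_id in records:
                raise ValueError(f"duplicate sequence ID: {current_id}")
            records[current_id] = []
        elif current_id is None:
            raise ValueError("sequence data found before any FASTA header")
        else:
            # Sequence lines may be wrapped, so collect and join them later.
            records[current_id].append(line)
    return {seq_id: "".join(chunks) for seq_id, chunks in records.items()}


# Invented example input; the IDs and bases are placeholders.
example = ">eve-0001\nATGGCA\nTTGA\n>eve-0002\nATGCCC\n"
sequences = read_fasta(example)
```

Raising an error on a repeated ID, rather than silently overwriting the earlier record, keeps the one-ID-per-sequence guarantee explicit.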


Sequence-associated data


Sequences included in this project are linked to auxiliary data in tabular format. These data include:

  1. Basic taxonomic data for the genome-length virus reference sequences in ncbi-refseqs.
  2. Locus data for the EVE sequences in fasta-curated.
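The link between sequences and their tabular data can be pictured as a join on sequence ID. The sketch below is illustrative only: it assumes a tab-delimited table with a `sequence_id` column, and the column names, function names, and values are all invented for the example rather than taken from the project's files.

```python
import csv
import io


def load_side_data(tsv_text, id_column="sequence_id"):
    """Index the rows of a tab-delimited table by sequence ID."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[id_column]: row for row in reader}


def link_side_data(sequence_ids, side_data):
    """Return (linked rows keyed by ID, list of IDs with no table row)."""
    linked = {}
    missing = []
    for seq_id in sequence_ids:
        if seq_id in side_data:
            linked[seq_id] = side_data[seq_id]
        else:
            missing.append(seq_id)
    return linked, missing


# Invented placeholder table; not real locus data.
table = "sequence_id\thost\tscaffold\n1\thost_A\tscaffold_1\n2\thost_B\tscaffold_7\n"
side_data = load_side_data(table)
linked, missing = link_side_data(["1", "2", "3"], side_data)
```

Reporting the unmatched IDs separately makes it easy to spot sequences that have no auxiliary record.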


Multiple sequence alignments (MSAs)


Several distinct categories of MSA are included in this project, each representing a distinct taxonomic level.

  1. Tip (i): Virus species (genome-length alignment)
  2. Tip (ii): EVE lineages. These alignments contain sets of EVE sequences derived from the same ancestral germline colonisation event (i.e. orthologs or duplicates).
  3. Internal: Virus genera
  4. Root: Viruses and EVE reference sequences
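The root, internal, and tip categories suggest a tree of alignments. As a rough sketch of one way to represent such a hierarchy, the code below uses a small recursive data structure. The class name, the alignment names, and in particular the parent assigned to each tip are assumptions made for illustration; the actual arrangement is defined by the project.

```python
from dataclasses import dataclass, field


@dataclass
class AlignmentNode:
    name: str
    level: str  # one of "root", "internal", "tip"
    children: list = field(default_factory=list)


def tip_names(node):
    """Collect the names of all tip alignments at or below a node, depth first."""
    if node.level == "tip":
        return [node.name]
    names = []
    for child in node.children:
        names.extend(tip_names(child))
    return names


# Hypothetical hierarchy with placeholder names.
tree = AlignmentNode("root", "root", [
    AlignmentNode("genus_A", "internal", [
        AlignmentNode("species_A1", "tip"),
        AlignmentNode("species_A2", "tip"),
    ]),
    AlignmentNode("eve_lineage_X", "tip"),
])
tips = tip_names(tree)
```

Walking the tree from the root visits every tip alignment once, which is the kind of traversal needed to process the alignments level by level.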