Overview


This is the home page of DIGS for EVEs, a study led by Robert J. Gifford.


In this project I have used the database-integrated genome screening (DIGS) tool to systematically screen metazoan genomes in silico, with the aim of recovering the viral 'fossil record'. This record consists of endogenous viral elements (EVEs): DNA sequences found in eukaryotic genomes that derive from ancient viruses.

I've attempted to make the viral fossil record more readily accessible to interested researchers through: (i) the introduction of a standard nomenclature; and (ii) the use of the GLUE software framework to capture the complex semantic links among data items.

Scope


The DIGS for EVEs project is currently restricted to EVEs derived from viruses with genome sizes under 30 kilobases. Thus, the project excludes poxviruses and other double-stranded DNA viruses with similarly large genomes.

DIGS for EVEs is an open-ended project for several reasons. Firstly, new genome data are continually accumulating, and every new genome offers the possibility to identify novel EVEs.

Secondly, all EVEs have been detected based on their similarity in sequence to viruses. Characterisation of novel viruses can therefore lead to the identification of new EVEs in published genomes.

Finally, discriminating EVEs from other kinds of genomic sequences is straightforward for some virus groups, but less so for others. In challenging cases, carrying out the kinds of detailed investigation required to positively identify EVEs is a slow process.

In this project, we aim to focus on identifying the most tractable kinds of EVE across the broadest possible range of eukaryotic hosts. The scope is expected to expand incrementally as the project progresses.

GLUE projects

The GLUE software framework aims to enable reuse of virus genome data and associated algorithms across different sequence analysis contexts. We have incorporated data recovered in the DIGS-for-EVEs project into 'sequence-based resources' constructed using GLUE.

Parvovirus-GLUE: Resources for comparative genomic analysis of parvoviruses. Parvoviruses (family Parvoviridae) are a diverse group of small, non-enveloped DNA viruses that infect a broad and phylogenetically diverse range of animal species. The family includes numerous pathogens of humans and domesticated species, but parvoviruses are also being developed as next-generation therapeutic tools. For example, rodent protoparvoviruses (RoPVs) are promising anticancer agents that show natural oncotropism and oncolytic properties, while adeno-associated virus (AAV), a non-autonomously replicating dependoparvovirus, has been successfully adapted as a gene therapy vector.

Flavivirus-GLUE: Resources for comparative genomic analysis of flaviviruses. The flaviviruses (family Flaviviridae) are a group of enveloped, positive-strand RNA viruses, many of which pose serious risks to human health on a global scale. Arthropod-borne flaviviruses such as Zika virus (ZIKV), dengue virus (DENV), and yellow fever virus (YFV) cause large-scale outbreaks that result in millions of human infections every year, while the bloodborne hepatitis C virus (HCV) is a major cause of chronic liver disease.

Hepadnaviridae-GLUE: a hepadnavirus-focussed project that captures information about hepadnaviruses (family Hepadnaviridae), a group of reverse-transcribing DNA viruses that infect vertebrates. The type species, hepatitis B virus (HBV), is estimated to infect ~300 million people worldwide, causing substantial morbidity and mortality. Recent studies have revealed that hepadnaviruses infect a diverse range of vertebrate species, from fish to mammals, and are associated with disease in many of them.

Filovirus-GLUE: [pending release 2021]

HHV6-GLUE: human betaherpesvirus 6A (HHV-6A) and human betaherpesvirus 6B (HHV-6B) are closely related species of roseolovirus (genus Roseolovirus). These viruses cause lifelong, persistent infections and are known to sometimes integrate into the telomeric regions of chromosomes. Integration into the germline sometimes occurs, and a small proportion of people (1-2%) carry chromosomally integrated HHV-6 (ciHHV-6) insertions in their genome.

CRESS-GLUE: this GLUE repository focuses on circular Rep-encoding single-stranded DNA (CRESS DNA) viruses (phylum Cressdnaviricota).

Deltaretrovirus-GLUE: deltaretroviruses are an unusual group of complex retroviruses, with only a few species known. This project contains all deltaretrovirus sequence data available in GenBank, as well as rare examples of deltaretrovirus-derived ERVs.

Lentivirus-GLUE: [pending release 2021]

ERVdb: [pending release late 2022]


Nomenclature for ERVs and EVEs


We have applied a systematic approach to naming ERVs and EVEs, described here. EVEs and ERVs are assigned a unique identifier (ID) constructed from a defined set of components.

EVE Nomenclature - Lenti example

The first component is a classifier that usually denotes the virus family that the EVE derives from. For example, the classifier 'ERV' is applied to all endogenous retroviruses. As far as possible, classifiers follow previously established conventions. For example, endogenous hepadnaviruses are given the classifier eHBV (endogenous hepatitis B virus), as this term has generally been used to describe endogenous hepadnaviruses. For segmented viruses, classifiers also designate a gene or segment (e.g. EBLN = endogenous borna-like nucleoprotein).

The second component is a composite of two distinct subcomponents separated by a period: (i) the name of the specific virus subgroup that the EVE derives from; (ii) a numeric ID that uniquely identifies the insertion. The numeric ID is an integer that identifies a unique insertion locus that arose as a consequence of an initial germline infection. Thus, orthologous copies in different species are given the same number.

Where an EVE sequence is thought to have been duplicated within the germline following its initial incorporation (e.g. via segmental duplication or transposition), we have appended an additional 'duplicate ID' to the numeric ID, separated by a period. Please note that we have not yet resolved the orthologous relationships among sets of eHBV sequences belonging to multicopy eHBV lineages. We have therefore assigned unique duplicate IDs to each sequence within these lineages.

The third component of the ID defines the set of host species in which the ortholog occurs, or did occur prior to being deleted.
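
The three-component scheme described above can be sketched in code. This is a minimal illustration, not the project's own implementation. It assumes that the three components are joined with hyphens (the text above only specifies the period used inside the second component), and the example values ('Lenti', 'Leporidae', 'Avi', 'Aves') are illustrative rather than taken from the project data. The names EveId and parse_eve_id are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EveId:
    """Hypothetical container for the components of an EVE/ERV identifier."""
    classifier: str                  # first component, e.g. 'ERV', 'eHBV', 'EBLN'
    subgroup: str                    # virus subgroup the element derives from
    locus: int                       # numeric ID shared by orthologous copies
    host: str                        # third component: host species set
    duplicate: Optional[int] = None  # optional duplicate ID

    def __str__(self) -> str:
        # Second component: subgroup and numeric ID separated by a period,
        # with the duplicate ID appended after another period when present.
        second = f"{self.subgroup}.{self.locus}"
        if self.duplicate is not None:
            second += f".{self.duplicate}"
        # ASSUMPTION: components are joined with hyphens.
        return "-".join([self.classifier, second, self.host])


# ASSUMPTION: hyphen-separated layout, as in __str__ above.
_ID_PATTERN = re.compile(
    r"^(?P<classifier>[^-]+)"
    r"-(?P<subgroup>[^.\-]+)\.(?P<locus>\d+)(?:\.(?P<duplicate>\d+))?"
    r"-(?P<host>.+)$"
)


def parse_eve_id(text: str) -> EveId:
    """Split an identifier string into its components."""
    match = _ID_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a recognised EVE identifier: {text!r}")
    duplicate = match["duplicate"]
    return EveId(
        classifier=match["classifier"],
        subgroup=match["subgroup"],
        locus=int(match["locus"]),
        host=match["host"],
        duplicate=int(duplicate) if duplicate is not None else None,
    )


if __name__ == "__main__":
    # Illustrative values only.
    print(EveId("ERV", "Lenti", 1, "Leporidae"))
    print(parse_eve_id("eHBV-Avi.2.3-Aves"))
```

Keeping the duplicate ID optional mirrors the rule that it is only appended when an element is thought to have been duplicated after its initial incorporation.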

DIGS screening files


Probe and reference sets of polypeptide sequences for DIGS were obtained from NCBI's Viral Genomes resource.

In addition to reference sequences obtained from NCBI, the DIGS reference library included in this repo contains reference sequences for EVEs, and for various non-EVE genomic sequences that give spurious matches to viruses in DIGS, including: (i) host genomic sequences; (ii) endogenous retroviruses; (iii) transposons.
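
One way such a mixed reference library could be used is to label each screening hit by the category of its best-matching reference sequence, and to set aside hits whose best match is a non-EVE genomic sequence. The sketch below illustrates that idea only. The Hit record, the category labels and the partition_hits function are all hypothetical, and this is not a description of how the DIGS tool itself stores or classifies hits.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# ASSUMPTION: hypothetical labels for the three kinds of non-EVE reference
# sequence listed above (host genomic sequences, ERVs, transposons).
NON_EVE_CATEGORIES = frozenset({"host", "erv", "transposon"})


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one screening hit."""
    hit_id: str          # identifier of the matching genomic region
    best_reference: str  # name of the best-matching reference sequence
    category: str        # category of that reference, e.g. 'virus' or 'eve'


def partition_hits(hits: Iterable[Hit]) -> Tuple[List[Hit], List[Hit]]:
    """Separate candidate EVEs from hits that best match non-EVE references."""
    candidates: List[Hit] = []
    spurious: List[Hit] = []
    for hit in hits:
        if hit.category in NON_EVE_CATEGORIES:
            spurious.append(hit)
        else:
            candidates.append(hit)
    return candidates, spurious


if __name__ == "__main__":
    # Illustrative values only.
    example = [
        Hit("h1", "refA", "virus"),
        Hit("h2", "refB", "transposon"),
        Hit("h3", "refC", "eve"),
    ]
    kept, discarded = partition_hits(example)
    print([h.hit_id for h in kept], [h.hit_id for h in discarded])
```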

Related Resources


Virus-Host Interface: In these research projects, the DIGS tool was used to investigate the evolution of virus-interacting proteins (VIPs) and the GLUE software environment was used to capture domain knowledge recovered in each study.


Related Publications


Campbell M, Loncar S, Kotin R, and RJ Gifford (2021)
Comparative analysis reveals the long-term co-evolutionary history of parvoviruses and vertebrates.
[preprint]

Bamford CGG, de Souza WM, Parry R and RJ Gifford (2021)
Comparative analysis of genome-encoded viral sequences reveals the evolutionary history of the Flaviviridae.
[preprint]

Lytras S, Arriagada G, and RJ Gifford (2021)
Ancient evolution of hepadnaviral paleoviruses and their impact on host genomes.
Virus Evolution [view]

Hildebrandt E, Pénzes J, Gifford RJ, Agbandje-McKenna M, and R Kotin (2020)
Evolution of dependoparvoviruses across geological timescales – implications for design of AAV-based gene therapy vectors. Virus Evolution [view]

Pénzes JJ, de Souza WM, Agbandje-McKenna M, and RJ Gifford (2019)
An ancient lineage of highly divergent parvoviruses infects both vertebrate and invertebrate hosts.
Viruses [view]

Callaway HM, Subramanian S, Urbina C, Barnard K, Dick R, Hafenstein SL, Gifford RJ, and CR Parrish (2019)
Examination and reconstruction of three ancient endogenous parvovirus capsid proteins in rodent genomes.
Journal of Virology [view]

Hron T, Elleder D, and RJ Gifford (2019)
Deltaretroviruses have circulated since at least the Paleogene and infected a broad range of mammalian species.
Retrovirology [view]

Halo JV, Pendleton AL, Jarosz AS, Gifford RJ, Day ML, and JM Kidd (2019)
Origin and recent expansion of an endogenous gammaretroviral lineage in canids.
Retrovirology [view]

Zhu H, Gifford RJ*, and Murcia PR* (2018)
Distribution, diversity and evolution of endogenous retroviruses in perissodactyl genomes.
Journal of Virology
* co-corresponding authors [view]

Zhu H, Dennis T, Hughes J, and RJ Gifford (2018)
Database-integrated genome screening (DIGS): exploring genomes heuristically using sequence similarity search tools and a relational database. [preprint]

Gifford RJ, Blomberg B, Coffin JM, Fan H, Heidmann T, Mayer J, Stoye J, Tristem M, and WE Johnson (2018)
Nomenclature for endogenous retrovirus (ERV) loci.
Retrovirology [view]

Pénzes JJ, Marsile-Medun S, Agbandje-McKenna M, and RJ Gifford (2018)
Endogenous amdoparvovirus-related elements reveal insights into the biology and evolution of vertebrate parvoviruses.
Virus Evolution [view]

Blanco-Melo D, Gifford RJ, and PD Bieniasz (2018)
Reconstruction of a replication-competent ancestral murine endogenous retrovirus-L.
Retrovirology [view]

Dennis TPW, Flynn PJ, de Souza WM, Singer JB, Moreau CS, Wilson SJ, and RJ Gifford (2018)
Insights into circovirus host range from the genomic fossil record.
Journal of Virology [view]

Dennis TPW, de Souza WM, Marsile-Medun S, Singer JB, Wilson SJ, and RJ Gifford (2018)
The evolution, distribution and diversity of endogenous circoviral elements in vertebrate genomes.
Virus Research [view]

de Souza WM, Romeiro MF, Fumagalli MJ, Modha S, de Araujo J, Queiroz LH, Durigon EL, Figueiredo LT, Murcia PR, and RJ Gifford (2017)
Chapparvoviruses occur in at least three vertebrate classes and have a broad biogeographic distribution.
Journal of General Virology [view]

Blanco-Melo D, Gifford RJ, and PD Bieniasz (2017)
Co-option of an endogenous retrovirus envelope for host defense in hominid ancestors.
eLife [view]


License


This project is licensed under the GNU Affero General Public License v. 3.0.