Background/Description
Deltaretroviruses (genus Deltaretrovirus) are a highly unusual genus of retroviruses (family Retroviridae) that have only been identified in a restricted subset of mammalian species. They include the primate T-cell lymphotropic viruses (PTLVs), which infect apes (including humans) and Old World monkeys, and bovine leukemia virus (BLV), which infects cattle.
Like all retroviruses, deltaretroviruses cause common, persistent infections. Infection is frequently asymptomatic, but can lead to inflammatory and malignant disease over the longer term.
Deltaretroviruses and their hosts. Left to right: (i) mandrills, (ii) colobus monkeys, and (iii) chimpanzees are among the many species of African primate infected with deltaretroviruses. Human populations are also infected, and have been for millennia: deltaretroviral proviruses have been recovered from mummified remains in the Andes (iv), showing that deltaretroviruses were present in human populations that reached South America.
This is Deltaretrovirus-GLUE, a GLUE project supporting comparative genomic and evolutionary analysis of deltaretroviruses. It contains a richly annotated sequence dataset for these viruses, comprising both viral sequences and endogenous retroviruses (ERVs).
The Deltaretrovirus-GLUE resource can be used in a variety of ways:
- To perform comparative genomic studies across the genus Deltaretrovirus, e.g. as part of an investigation of deltaretrovirus diversity.
- To facilitate in-depth comparative investigations of any virus species or group within the genus Deltaretrovirus.
- As a source of systematically organised information about endogenous retroviruses (ERVs) derived from deltaretroviruses.
What is a GLUE project?
GLUE is an open, integrated software toolkit that provides functionality for storage and interpretation of sequence data.
GLUE supports the development of “projects” containing the data items required for comparative genomic analysis (e.g. sequences, multiple sequence alignments, genome feature annotations, and other sequence-associated data).
Projects are loaded into the GLUE "engine", creating a relational database that represents the semantic relationships between data items. This provides a robust foundation for the implementation of systematic comparative analyses and the development of sequence-based resources.
The core schema of this database can be extended to accommodate the idiosyncrasies of different projects, and GLUE provides a scripting layer (based on JavaScript) for developing custom analysis tools.
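The idea of a relational database that links data items, with a schema that a project can extend, can be illustrated with a small sketch. This is not GLUE's actual schema or API: the table names, column names, sequence IDs and the `host_species` extension field below are all hypothetical, and Python's built-in `sqlite3` module stands in for the GLUE engine.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database linking sequences to an alignment.

    All table, column and record names are illustrative assumptions,
    not GLUE's real schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce links between tables
    conn.executescript("""
        CREATE TABLE sequence (
            sequence_id TEXT PRIMARY KEY,
            nucleotides TEXT NOT NULL
        );
        CREATE TABLE alignment (
            name            TEXT PRIMARY KEY,
            ref_sequence_id TEXT NOT NULL REFERENCES sequence(sequence_id)
        );
        CREATE TABLE alignment_member (
            alignment_name TEXT NOT NULL REFERENCES alignment(name),
            sequence_id    TEXT NOT NULL REFERENCES sequence(sequence_id),
            PRIMARY KEY (alignment_name, sequence_id)
        );
    """)
    # Project-specific extension of the core schema (hypothetical field).
    conn.execute("ALTER TABLE sequence ADD COLUMN host_species TEXT")
    # Made-up records, for illustration only.
    conn.executemany(
        "INSERT INTO sequence (sequence_id, nucleotides, host_species) "
        "VALUES (?, ?, ?)",
        [("REF_A", "ATGGCC", "Bos taurus"), ("SEQ_1", "ATGGCA", "Bos taurus")],
    )
    conn.execute("INSERT INTO alignment VALUES ('AL_DEMO', 'REF_A')")
    conn.execute("INSERT INTO alignment_member VALUES ('AL_DEMO', 'SEQ_1')")
    conn.commit()
    return conn


def members_with_host(conn, alignment_name):
    """Return (sequence_id, host_species) for each member of an alignment."""
    rows = conn.execute(
        "SELECT s.sequence_id, s.host_species "
        "FROM alignment_member m "
        "JOIN sequence s ON s.sequence_id = m.sequence_id "
        "WHERE m.alignment_name = ? ORDER BY s.sequence_id",
        (alignment_name,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    print(members_with_host(build_demo_db(), "AL_DEMO"))
```

Because the links are declared as foreign keys, an attempt to add an alignment member that refers to a sequence not present in the database is rejected, which is one way a relational representation helps keep data items consistent.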
Hosting of GLUE projects in an online version control system (e.g. GitHub) provides a mechanism for their stable, collaborative development.
Examples of 'sequence-based resources' built for viruses using GLUE include:
- CoV-GLUE: A GLUE resource for tracking genetic variation in SARS-CoV-2. CoV-GLUE contains a database of amino acid replacements, insertions and deletions which have been observed in GISAID CoV-19 sequences sampled from the pandemic.
- RABV-GLUE: Tailored toward epidemiological tracking of rabies virus (RABV). Includes a database of RABV sequences and metadata from NCBI, updated daily and arranged into major and minor clades, and an analysis tool providing genotyping, analysis and visualisation of submitted FASTA sequences.
- HCV-GLUE: This GLUE resource aims to support analysis of drug resistance and vaccine escape in hepatitis C virus (HCV). It includes a database of HCV sequences and metadata from NCBI, updated daily and arranged into clades (genotypes, subtypes). As well as pre-built multiple sequence alignments of NCBI sequences, it includes an analysis tool providing genotyping, drug resistance analysis and visualisation of submitted FASTA sequences.
What does building the Deltaretrovirus-GLUE project offer?
Deltaretrovirus-GLUE offers a number of advantages for performing comparative sequence analysis of deltaretroviruses:
- Reproducibility. For many reasons, bioinformatics analyses are notoriously difficult to reproduce. The GLUE framework supports the implementation of fully reproducible comparative genomics through the introduction of data standards and the use of a relational database to capture the semantic links between data items.
- Reusable data objects and analysis logic. For many - if not most - comparative genomic analyses, data preparation is nine tenths of the battle. The GLUE framework has been designed to ensure that work spent preparing high-value data items such as multiple sequence alignments need only be performed once. Hosting of GLUE projects in an online version control system such as GitHub allows for collaborative management of important data items and community testing of hypotheses.
- Validation. Building GLUE projects entails mapping the semantic links between data items (e.g. sequences, tabular data, multiple sequence alignments). This process provides an opportunity for cross-validation, and thereby enforces a high level of data integrity.
- Standardisation of the genomic co-ordinate space. GLUE projects allow all sequences to utilise the coordinate space of a chosen reference sequence. Contingencies associated with insertions and deletions (indels) are handled in a systematic way.
- Predefined, fully annotated reference sequences: This project includes fully annotated reference sequences for major lineages within the genus Deltaretrovirus.
- Alignment trees: GLUE allows linking of alignments constructed at distinct taxonomic levels via an "alignment tree" data structure. In the alignment tree, each alignment is constrained to a standard reference sequence, thus all multiple sequence alignments are linked to one another via a standardised coordinate system.
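The last few points (a reference-based coordinate space, systematic handling of indels, and alignment trees) can be made concrete with a small sketch. It is an illustration under stated assumptions rather than GLUE's implementation: the function `map_to_reference`, the class `AlignmentNode`, and all alignment and reference names are hypothetical, and the aligned rows are toy data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

GAP = "-"


def map_to_reference(ref_row: str, member_row: str) -> Dict[int, Optional[int]]:
    """Map 1-based member positions to 1-based reference positions.

    Both rows come from the same alignment and must have equal length.
    Member bases opposite a reference gap (insertions) map to None;
    reference positions opposite a member gap (deletions) do not appear.
    """
    if len(ref_row) != len(member_row):
        raise ValueError("aligned rows must have equal length")
    mapping: Dict[int, Optional[int]] = {}
    ref_pos = 0
    member_pos = 0
    for ref_char, member_char in zip(ref_row, member_row):
        if ref_char != GAP:
            ref_pos += 1
        if member_char != GAP:
            member_pos += 1
            mapping[member_pos] = ref_pos if ref_char != GAP else None
    return mapping


@dataclass
class AlignmentNode:
    """One alignment in a tree, constrained to a single reference sequence."""
    name: str
    reference_id: str
    parent: Optional["AlignmentNode"] = field(
        default=None, repr=False, compare=False
    )
    children: List["AlignmentNode"] = field(default_factory=list)

    def add_child(self, name: str, reference_id: str) -> "AlignmentNode":
        """Attach a lower-level alignment beneath this one."""
        child = AlignmentNode(name, reference_id, parent=self)
        self.children.append(child)
        return child

    def path_to_root(self) -> List[str]:
        """Names of the alignments from this node up to the root."""
        path = []
        node: Optional[AlignmentNode] = self
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path


if __name__ == "__main__":
    # Toy rows: the member has a deletion at reference position 3
    # and an insertion after it.
    print(map_to_reference("ATG-CC", "AT-ACC"))
    root = AlignmentNode("DEMO_ROOT", "REF_ROOT")
    child = root.add_child("DEMO_CHILD", "REF_CHILD")
    print(child.path_to_root())
```

In this sketch an inserted base has no reference coordinate and is reported as `None`, while a deleted reference position is simply absent from the mapping. Walking from a child alignment to the root shows how alignments at different taxonomic levels stay linked through their references.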
Installing Deltaretrovirus-GLUE
On computers with GLUE installed, the Deltaretrovirus-GLUE project can be instantiated by navigating to the project folder, initiating GLUE, and issuing the following command in the GLUE shell:
Mode path: /
GLUE> run file buildDeltaretrovirusCoreProject.glue
This will build the Deltaretrovirus-GLUE core project by executing the commands in buildDeltaretrovirusCoreProject.glue.
The Deltaretrovirus project can be further extended to incorporate ERV sequences by executing the commands in buildDeltaretrovirusPaleoProject.glue, as follows:
Mode path: /
GLUE> run file buildDeltaretrovirusPaleoProject.glue
The Deltaretrovirus paleovirus extension incorporates a set of endogenous retroviruses (ERVs) recovered from the genomes of metazoan species. Building the paleovirus extension allows automated alignment and phylogeny reconstruction for individual ERV lineages in the project, based on the classifications in these files. Individual ERV sequences have been classified into sets considered likely to have arisen from the same germline colonisation event. Loci have been named using a systematic approach (see here for details).
Related Publications
Singer JB, Thomson EC, McLauchlan J, Hughes J, and RJ Gifford
(2018)
GLUE: A flexible software system for virus sequence data.
BMC Bioinformatics
[view]
Zhu H, Dennis T, Hughes J, and RJ Gifford
(2018)
Database-integrated genome screening (DIGS): exploring genomes heuristically using sequence similarity search tools and a relational database.
[preprint]
Hron T, Elleder D, and RJ Gifford
(2019)
Deltaretroviruses have circulated since at least the Paleogene and infected a broad range of mammalian species.
Retrovirology
[view]
Hron T, Farkašová H, Gifford RJ, Benda P, Hulva P, Görföl T, Pačes J, and D Elleder
(2018)
Remnants of an ancient deltaretrovirus in the genomes of horseshoe bats (Rhinolophidae).
Virus Research
[view]
Gifford RJ, Blomberg B, Coffin JM, Fan H, Heidmann T, Mayer J, Stoye J, Tristem M, and WE Johnson
(2018)
Nomenclature for endogenous retrovirus (ERV) loci.
Retrovirology
[view]
Farkašová H, Hron T, Pačes J, Hulva P, Benda P, Gifford RJ, Elleder D.
(2017)
Discovery of an endogenous Deltaretrovirus in the genome of long-fingered bats (Chiroptera: Miniopteridae).
Proc Natl Acad Sci U S A.
[view]
Contributors
Robert J. Gifford (robert.gifford@glasgow.ac.uk)
Daniel Elleder (daniel.elleder@img.cas.cz)
License
This project is licensed under the GNU Affero General Public License v. 3.0.