May 17th, 2018
Advances in DNA sequencing have dramatically increased the rate at which new viruses are being identified. However, the host associations of viruses that have only been identified via sequencing are often open to question.
In 2013, DNA sequences derived from mysterious viruses called ‘cycloviruses’ were detected in patients with severe neurological disease on two continents. In Vietnam, a virus named "Cyclovirus-VN" was identified in association with severe brain infections. Around the same time, a closely related virus was detected in Malawi - again via sequencing - in association with cases of unexplained paraplegia.
The authors of these studies emphasised that a causal link with disease had yet to be determined. Somewhat inevitably though, the impression has been created - at least in some quarters - that cycloviruses are emerging viruses of humans and domestic animals.
"Newly emerging viruses such as Cycloviruses, which are causing neurological problems in children in Asia, are also emerging in sewage and are spreading." The Guardian
What is certainly true is that cycloviruses continue to be detected in human samples, as well as in samples derived from other sources. In most respects, however, they remain a mystery - so are these viruses really a threat to humans?
My lab has recently published a paper in the journal Virus Research, describing our investigation into the evolutionary interactions between vertebrates and circoviruses. Circoviruses (family Circoviridae, genus Circovirus) are among the smallest of all known viruses. They have small, simple genomes comprised of a single, circular strand of DNA ~2000 bases in length. Their genomes encode two major proteins: replication-associated protein (Rep) and capsid (Cap).
These viruses remain a bit of a mystery - they are not very widely studied and are difficult to grow in cell culture. However, over the past decade it's become apparent they are extremely widespread in the environment, being among the most commonly detected viruses in metagenomic samples.
Two pathogenic circoviruses have been described. First, there is “beak and feather disease virus” (BFDV), which is associated with serious disease in psittacine birds (parrots and their relatives). BFDV, which was first identified in captive birds, poses a significant threat to some endangered parrot species.
Secondly, there is porcine circovirus 2 (PCV-2), which is thought to cause post-weaning multisystemic wasting syndrome (PMWS), an emerging disease of domestic swine. Symptoms of PMWS are poor growth rate and/or acute malnutrition and weight loss. The emergence of PCV-2 as a pathogen appears linked - in an as-yet-undetermined way - to the modern swine production process.
We previously demonstrated that sequences derived from circoviruses occur in animal genomes. These endogenous circoviral elements (CVe) provide something akin to a fossil record for circoviruses. Like fossils, CVe sequences provide a retrospective source of information about evolution, allowing us to make inferences about the long term interactions between circoviruses and the species groups they infect.
In this study we performed a comprehensive survey of CVe in published vertebrate genome sequences.
We identified nearly 200 CVe in vertebrate genomes. This number is possibly a little misleading, as most of these sequences are part of a single, large group of highly duplicated elements found in carnivore genomes. All of these CVe derive from a small number of germline incorporation events (possibly just one), and seem to have been duplicated in carnivore genomes by mechanisms associated with non-LTR retrotransposons (the prime suspect being LINE-1).
We think that the 200-odd CVe we find in published genome sequences represent at least 19 distinct occasions in which genetic material derived from a circovirus was incorporated into the vertebrate germline. However, this is a conservative estimate - the number could be much higher, perhaps double.
We also established that CVe were inserted into the germline of songbirds, parrots, snakes, and fish many millions of years ago, before the major speciation events in these vertebrate groups. Knowing this allows us to show how incredibly ancient these CVe are - ranging from ~50 million years old for the elements in birds, mammals, and fish, to ~100 million years old for the element in snakes.
Among these demonstrably ancient CVe, one identified in cyprinid fish was among the most intriguing. It has an unusual structure, comprising several genomes arranged end-to-end. It's not too hard to imagine how these tandem genome structures could have been generated as a consequence of the rolling circle mechanism of DNA replication used by these viruses.
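To make the tandem idea concrete, here is a toy sketch (not the method used in our study) of what happens when a circular genome is copied continuously for more than one lap, as in rolling circle replication. The function names and the 12-base "genome" are made up purely for illustration; a real circovirus genome is ~2000 bases long.

```python
def rolling_circle_concatemer(genome: str, origin: int, n_bases: int) -> str:
    """Copy n_bases from a circular genome, starting at origin and wrapping around."""
    return "".join(genome[(origin + i) % len(genome)] for i in range(n_bases))


def is_rotation(candidate: str, genome: str) -> bool:
    """True if candidate is the circular genome read from a different start point."""
    return len(candidate) == len(genome) and candidate in genome + genome


# Hypothetical 12-base circular "genome", used only as a stand-in.
toy_genome = "ATGCCGTTAGCA"

# Copy two and a half laps of the circle, starting part-way round.
concatemer = rolling_circle_concatemer(toy_genome, origin=5, n_bases=30)

# Every genome-length window of the copy is a rotation of the original genome,
# which is what an end-to-end (tandem) arrangement looks like in sequence data.
size = len(toy_genome)
windows = [concatemer[i:i + size] for i in range(len(concatemer) - size + 1)]
all_rotations = all(is_rotation(w, toy_genome) for w in windows)
```

If a linear copy like this were captured in the germline, the result would look like several genomes joined end-to-end, with partial copies at either end depending on where copying started and stopped.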
More surprisingly, this CVe is closely related to a contemporary circovirus that infects common barbel (Barbus barbus), called barbel circovirus (BarbCV). Indeed, the similarity was so close we wondered if the BarbCV genome sequence might actually be derived from this CVe (i.e. it’s actually a PCR artifact). However, this seems unlikely, not only because the genome of BarbCV is intact (whereas the CVe in cyprinids are somewhat degraded), but also because some of the researchers who reported BarbCV have also described CVe in the Indian rohu (Labeo rohita), so it seems more than likely they’re aware of this possibility and have ruled it out.
The upshot of all this is it looks like BarbCV is the modern version of a fish circovirus that circulated over 39 million years ago. Given how rapidly circoviruses can evolve, it is striking that the ancient virus looks so similar to the modern one. Along the same lines, we describe CVe in the genomes of songbirds (order Passeriformes) that are over 38 million years old, and appear to represent the ancestors of modern circoviruses infecting birds.
Interestingly, ancient CVe (30-60 million years old) were also identified in the genomes of psittaciform birds (parrots). These elements seemed to have been incorporated into the germline separately from those in songbirds, and seemed more remotely related to modern avian circoviruses such as BFDV. Could it be that parrots have - over the course of evolution - been afflicted by a lineage of circoviruses that is distinct from the one found in other birds? More speculatively (much more), could this be why parrots seem to be afflicted far more seriously by BFDV infection than other avian species?
Another interesting CVe was identified in the Ryukyu mouse (Mus caroli). This sequence grouped convincingly with carnivore circovirus (CarCV) in evolutionary trees, and because it is easy to imagine that most zoonotic infections of large-bodied mammals might have originated in smaller ones, it is tempting to think that this demonstrates a rodent origin for carnivore circoviruses. However, the fact of the matter is that at this point we can’t yet draw any firm conclusions here. Nevertheless, the robust grouping of the Ryukyu mouse CVe with a modern virus infecting dogs suggests that these two sequences could be the first representatives of a larger sub-lineage of mammalian circoviruses.
In addition, the CVe in the Ryukyu mouse has apparently been generated quite recently - presumably after this species diverged from the domestic mouse (Mus musculus). This indicates that a broad range of species and genus-specific CVe remain to be identified in vertebrate genomes.
We identified some pairs of CVe that stood an outside chance of being very old orthologs, but were more likely to be distinct integrations. These included insertions in the genomes of frogs that would be very old indeed (>200 million years) if they could be shown to be orthologs - but again, this seems unlikely.
Another pair of possible (but unlikely) orthologs was identified in marsupials. While CVe in the Tasmanian devil and koala genomes may not be orthologs of one another, they could nevertheless be old enough to predate the arrival of placental mammals in Australia. This would indicate that circoviruses were present in Australian marsupials ancestrally, and were not introduced to the continent by placental mammals.
Finally, we identified sequences in the genome of the inshore hagfish that appeared to be derived from a highly divergent circovirus-related lineage. Hagfish are basal vertebrates - in other words, they branched off the vertebrate tree early in evolution. The CVe we identify in the hagfish genome presumably derive from a circovirus lineage that infects these species. It will be interesting to see if any related virus sequences turn up in metagenomic samples.
We have a lot left to learn about circoviruses. For example, to reduce disease risks, we really need to understand why some circovirus infections are asymptomatic, whereas others cause severe disease. Studies of porcine circoviruses implicate the modern swine production process in the emergence of circovirus disease, but through what mechanism precisely?
are based on whole, inactivated virus, or the Cap protein.)
Once again, we can expect to gain a great deal of insight into this question (as well as other aspects of circovirus ecology and evolution) simply by investigating the distribution and diversity of contemporary circoviruses using the stupendous power of next-generation sequencing.
A limitation of our study is the lack of statistical support for some of the internal branching patterns of the Circovirus tree. In addition, there are many CVe that we are not yet able to date, or for which we are only able to provide one age bound (i.e. a minimum or maximum). However, further sampling of CVe and circoviruses may resolve these issues, making it possible to calibrate the timeline of circovirus evolution in greater detail, and with greater precision.
Our study allowed us to calibrate the long-term evolution of some circovirus lineages. This in turn allowed us to demonstrate that the protein-coding sequences of these viruses have changed remarkably little in millions of years. This is surprising when we consider the capacity of viruses to evolve extremely rapidly. Also, if circoviral proteins aren't changing much, how have circoviruses been able to adapt so that they could counter host defences, and infect new hosts?
Possibly, the answer lies in the non-coding regions of circovirus genomes, which appear much less conserved. Interestingly, a recent study indicated the presence of conserved secondary structures within the non-coding region of the goose circovirus genome. These structures appeared to be conserved despite the underlying sequences being highly variable, which suggests they could be functional elements in circovirus replication.
I'm intrigued by the notion that circovirus adaptation might be characterised by rapid evolution in non-coding sequences while the core protein components of the replication cycle remain largely unchanged.