Description

This GitHub repository contains data and program logic supporting comparative and phylogenetic investigations of APOBEC3 evolution. This resource was generated as part of a collaborative investigation by Kei Sato, Jumpei Ito, and Rob Gifford.

Nomenclature

A3 genes contain conserved regions called Z-domains, which can be used to group A3 genes into classes. There are three distinct types of Z-domain, labelled Z1-Z3. A3 genes are modular, consisting of either a single Z-domain (Z1, Z2, or Z3) or a combination of two Z-domains (e.g. Z2 and Z3). In the proposed nomenclature, A3 genes are labelled according to their Z-domain composition (e.g. A3Z2-A3Z3).
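
As an illustration of the naming scheme, the following minimal Python sketch splits a gene name into its Z-domain types. It is not part of this repository. The function name `z_domains` is hypothetical, and the name pattern (one or two `A3Z<n>` units joined by a hyphen) is an assumption generalised from the single example A3Z2-A3Z3.

```python
import re

# Assumed pattern for one unit of a gene name: "A3" followed by Z1, Z2, or Z3.
_UNIT = re.compile(r"A3(Z[123])")


def z_domains(gene_name: str) -> list[str]:
    """Return the Z-domain types in a gene name, in the order written."""
    units = gene_name.split("-")
    # A3 genes are described as having one or two Z-domains.
    if not 1 <= len(units) <= 2:
        raise ValueError(f"expected one or two Z-domains: {gene_name!r}")
    domains = []
    for unit in units:
        match = _UNIT.fullmatch(unit)
        if match is None:
            raise ValueError(f"unrecognised unit {unit!r} in {gene_name!r}")
        domains.append(match.group(1))
    return domains


print(z_domains("A3Z2-A3Z3"))  # two-domain gene
print(z_domains("A3Z1"))       # single-domain gene
```

Names that do not fit the assumed pattern raise a `ValueError` rather than being guessed at.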

In some genomes, one or more of the three Z-domains has been duplicated. In this case, duplicates are distinguished by lowercase letters (a, b, c, etc.), assigned alphabetically in the 5' to 3' direction.
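
The duplicate-labelling rule can be sketched in Python as follows. This is an illustration only, not code from this project. The function name `label_domains` is hypothetical, and two details are assumptions: that the letter is appended directly to the domain type (e.g. Z2a), and that a letter is added only when a type occurs more than once.

```python
from collections import Counter
from string import ascii_lowercase


def label_domains(domains_5_to_3):
    """Label Z-domains listed in 5' to 3' order, lettering any duplicates."""
    totals = Counter(domains_5_to_3)  # how often each type occurs overall
    seen = Counter()                  # how many of each type labelled so far
    labels = []
    for domain in domains_5_to_3:
        if totals[domain] > 1:
            # Assumption: suffix follows the type directly, a, b, c, ...
            labels.append(domain + ascii_lowercase[seen[domain]])
            seen[domain] += 1
        else:
            labels.append(domain)
    return labels


print(label_domains(["Z1", "Z2", "Z2", "Z3"]))
```

The sketch supports at most 26 copies of any one Z-domain type, since it draws suffixes from the lowercase alphabet.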

Sequence data

The sequence data in this project have been organised into the following sources:

ncbi-refseqs: mRNA reference sequences for A3 genes from distinct species. These XML-formatted files are downloaded directly from NCBI using a GLUE module (see here) and are uniquely identified within this project by their GenBank accession numbers.

fasta-curated: A non-redundant set of loci disclosing similarity to A3 Z-domains. These FASTA sequences have been curated via systematic screening of whole genome sequence (WGS) assemblies using the DIGS tool. Sequences in this source have unique IDs based on arbitrary numbering.

fasta-refseqs: Reference sequences of A3 genes not included in NCBI (e.g. pseudogenes).

Sequence-associated data

Sequences included in this project are linked to auxiliary data in tabular format. These include:

  1. Sequence-associated metadata for the reference mRNA sequences in ncbi-refseqs.
  2. Locus data for the sequences in fasta-curated.

Multiple sequence alignments (MSAs)

Several distinct categories of MSA are included in this project, each representing a distinct taxonomic level.

  1. Tip MSAs: Alignments of A3 alleles from a single species (mRNAs)
  2. Internal MSAs: A3 genes (mRNAs) and A3 gene loci (genomic DNA)
  3. Root MSA: Captures homology between distinct A3 genes

Scripts

Scripts used in the analysis can be found here.

Contributors

Jumpei Ito (jampei0513@yahoo.co.jp)

Robert J. Gifford (robert.gifford@glasgow.ac.uk)

Kei Sato


Related Publications


Ito J, Gifford RJ, and Sato K (2019)
Retroviruses drive the rapid evolution of mammalian APOBEC3 genes.
PNAS [view]

Zhu H, Dennis T, Hughes J, and Gifford RJ (2018)
Database-integrated genome screening (DIGS): exploring genomes heuristically using sequence similarity search tools and a relational database. [preprint]


License


This project is licensed under the GNU Affero General Public License v. 3.0.