Structure of a DIGS-based investigation

Comparative studies using database-integrated genome screening (DIGS) entail separate 'exploration' and 'analysis' phases, each of which is split into two component parts, as follows:

Overview - phases

As shown above, this process is usually iterative - at least to some degree - since analysis of screening results often reveals new information that can be used to design more informative or comprehensive screens.

Exploration phase: Setting up and running an in silico screen

DIGS is a project-based framework in which investigations are centred around a genome feature of interest. In principle, any genome feature can be investigated, provided its sequence is sufficiently conserved to be reliably detected in a similarity search.

The 'reference sequence library' is a curated set of sequences relevant to the genome feature under investigation. Usually this will consist of:

However, depending on the kind of investigation being performed, it may also contain:

Screening entails selecting particular sequences from the reference library for use as 'probes' in a BLAST search of a specific 'target database'.

Sequences that match the query ('hits') can then be extracted and classified. A convenient way of rapidly classifying or 'genotyping' hits is via BLAST-based comparison to the reference library, as indicated in the illustration below.

Exploration phase

Schematic representation of the exploration phase of a DIGS-based investigation.
Here, the genome features being investigated are a set of related genes. In step (1), a sequence from the reference library is selected and used as a 'probe' or 'query' in a BLAST-based search of a chosen target database. In step (2), sequences identified in this search are extracted and classified via BLAST-based comparison to the reference library. These searches provide a way to 'delve into' genomic databanks and recover related sequences; as such, they provide a means to survey unmapped regions of the genomic 'landscape'.
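The classification in step (2) can be illustrated with a minimal Python sketch. It assumes the comparison of extracted hits against the reference library was run with BLAST tabular output (`-outfmt 6`), and that each hit is assigned to the reference sequence giving the highest bitscore. That assignment rule, the function name `classify_hits`, and the sequence identifiers are assumptions made for illustration, not a description of how the DIGS tool itself is implemented.

```python
import csv
import io

# Standard column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def classify_hits(tabular_text):
    """Return {extracted_hit_id: (best_reference_id, bitscore)}.

    Each extracted hit is assigned to the reference sequence it matches
    with the highest bitscore; ties keep the first row seen.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        record = dict(zip(BLAST_COLUMNS, row))
        score = float(record["bitscore"])
        hit_id = record["qseqid"]
        if hit_id not in best or score > best[hit_id][1]:
            best[hit_id] = (record["sseqid"], score)
    return best


# Made-up example rows: two extracted hits compared with a reference library.
example = (
    "hit_1\tref_geneA\t91.2\t300\t26\t0\t1\t300\t10\t309\t1e-90\t410\n"
    "hit_1\tref_geneB\t78.0\t280\t60\t2\t5\t284\t12\t291\t1e-40\t190\n"
    "hit_2\tref_geneB\t88.5\t250\t28\t1\t1\t250\t40\t289\t1e-70\t330\n"
)
assignments = classify_hits(example)
for hit_id, (reference, score) in assignments.items():
    print(hit_id, reference, score)
```

Here the extracted hits act as the BLAST queries and the reference library as the subject database, so `qseqid` identifies the hit and `sseqid` the reference sequence it is assigned to.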

Analysis phase: Inspecting results and exporting data for comparative analysis

In DIGS, a similarity search-based screening pipeline is linked to a relational database management system (RDBMS), and the outputs of screening are captured in a project-specific relational database.

This approach not only provides a convenient and robust basis for implementing systematic, automated screens that proceed in an efficient, non-redundant way, it also allows screening data to be interrogated using structured query language (SQL) - a well-established, powerful approach for querying relational databases.
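As a rough illustration of interrogating screening output with SQL, the sketch below uses Python's built-in `sqlite3` module as a stand-in for the RDBMS. The table name `screening_hits`, its columns, and the rows are hypothetical and do not reflect the actual DIGS schema. The uniqueness constraint is just one simple way a database can help keep results non-redundant; it is an assumption, not the mechanism DIGS uses.

```python
import sqlite3

# In-memory SQLite database standing in for a project-specific screening
# database. The table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE screening_hits (
        target_organism TEXT,
        scaffold        TEXT,
        hit_start       INTEGER,
        hit_end         INTEGER,
        assigned_name   TEXT,
        bitscore        REAL,
        UNIQUE (target_organism, scaffold, hit_start, hit_end)
    )
    """
)

rows = [
    ("species_A", "scaffold_1", 100, 400, "geneA", 410.0),
    ("species_A", "scaffold_7", 2500, 2750, "geneB", 330.0),
    ("species_B", "scaffold_3", 50, 350, "geneA", 395.0),
    # Same locus as the first row: ignored, so no duplicate is stored.
    ("species_A", "scaffold_1", 100, 400, "geneA", 410.0),
]
conn.executemany(
    "INSERT OR IGNORE INTO screening_hits VALUES (?, ?, ?, ?, ?, ?)", rows
)

# Count classified hits per target organism and assigned name.
summary = conn.execute(
    """
    SELECT target_organism, assigned_name, COUNT(*) AS n_hits
    FROM screening_hits
    GROUP BY target_organism, assigned_name
    ORDER BY target_organism, assigned_name
    """
).fetchall()

for organism, name, n_hits in summary:
    print(organism, name, n_hits)
```

The same `SELECT ... GROUP BY` pattern carries over to other relational database systems, although connection details and some SQL syntax (such as `INSERT OR IGNORE`) differ between them.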

The analysis phase has two component parts:

  1. Investigation of output via the relational database.
  2. Comparative genomic analysis of exported sequence data.

Analysis phase

Analysing screening output: A schematic representation of the two component parts of the 'analysis' phase of a DIGS-based screen (some comparative analyses do not require an alignment, but most do).
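Exporting sequence data for comparative analysis typically means writing selected sequences to a file that alignment software can read. The sketch below writes FASTA-formatted text using only the Python standard library. The function name `write_fasta`, the identifier layout, and the sequences are made up for illustration and are not part of the DIGS tool.

```python
import io


def write_fasta(records, handle, width=60):
    """Write (identifier, sequence) pairs to handle in FASTA format."""
    for identifier, sequence in records:
        handle.write(f">{identifier}\n")
        # Wrap the sequence into lines of at most `width` characters.
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")


# Made-up records, as might be returned by a query on the screening database.
records = [
    ("species_A|scaffold_1|100-400|geneA", "ATGGCC" * 12),
    ("species_B|scaffold_3|50-350|geneA", "ATGGCT" * 5),
]

buffer = io.StringIO()
write_fasta(records, buffer)
fasta_text = buffer.getvalue()
print(fasta_text, end="")
```

In practice the handle would be an open file rather than an in-memory buffer, and the resulting FASTA file could be passed to an alignment program where the analysis requires one.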