Database-integrated genome screening (DIGS)

Molecular sequence data are highly information-rich, and are now being generated much faster than they can be analysed. Consequently, public databases contain multitudes of gene, pseudogene, transposon, virus, and non-coding element sequences that have not yet been identified, or are only poorly described.

Database-integrated genome screening (DIGS) can be used to screen genome and transcriptome assemblies for sequence features of interest (e.g. genes, transposons, functional non-coding sequences), facilitating investigations of their evolution, distribution and diversity.

In DIGS, the output of sequence similarity search-based screens is captured in a relational database. This allows for the interrogation and manipulation of output data using structured query language (SQL). In addition, it provides all the benefits of a relational database management system (RDBMS) with respect to features such as data recoverability, multi-user support and network access.
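As a rough illustration of the kind of query this enables, the sketch below counts hits per organism and assigned element above a bit score threshold. It uses Python's built-in sqlite3 module in place of MySQL so that it runs without a database server. The `hits` table, its columns, and the rows in it are invented for the example and are not the DIGS tool's actual schema or real screening results.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical 'hits' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE hits (
               organism      TEXT,
               scaffold      TEXT,
               assigned_name TEXT,
               bitscore      REAL
           )"""
    )
    # Made-up rows, purely for demonstration.
    rows = [
        ("species_A", "scaffold_1", "element_X", 410.0),
        ("species_A", "scaffold_2", "element_X", 95.0),
        ("species_A", "scaffold_2", "element_Y", 230.0),
        ("species_B", "scaffold_7", "element_X", 150.0),
        ("species_B", "scaffold_9", "element_X", 320.0),
    ]
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", rows)
    return conn


def count_hits_per_organism(conn, min_bitscore):
    """Count hits per organism and element with bitscore >= min_bitscore."""
    cursor = conn.execute(
        """SELECT organism, assigned_name, COUNT(*)
           FROM hits
           WHERE bitscore >= ?
           GROUP BY organism, assigned_name
           ORDER BY organism, assigned_name""",
        (min_bitscore,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for organism, name, n in count_hits_per_organism(conn, 100.0):
        print(organism, name, n)
```

The same SELECT statement could in principle be run against a MySQL screening database once the table and column names are replaced with the real ones.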

The DIGS tool

The DIGS tool is a Perl program for implementing DIGS with assembled sequence data (not short read data). It uses the Basic Local Alignment Search Tool (BLAST) to perform sequence similarity searches, and the MySQL RDBMS to capture their output.
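To make the idea of capturing BLAST output in a database concrete, here is a minimal sketch in Python. It is not the DIGS tool's own code (which is written in Perl). It assumes BLAST+ tabular output (`-outfmt 6`) with the default twelve columns, and it uses sqlite3 instead of MySQL to stay self-contained. The `blast_hits` table and the function names are hypothetical.

```python
import io
import sqlite3

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_COLUMNS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_COLUMNS = ("pident", "evalue", "bitscore")


def parse_blast_tabular(handle):
    """Parse tab-separated BLAST hits from a file-like object into dicts."""
    hits = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < len(BLAST_COLUMNS):
            raise ValueError(f"expected 12 columns, got {len(fields)}")
        hit = dict(zip(BLAST_COLUMNS, fields))
        for key in INT_COLUMNS:
            hit[key] = int(hit[key])
        for key in FLOAT_COLUMNS:
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits


def store_hits(conn, hits):
    """Insert parsed hits into a hypothetical 'blast_hits' table; return row count."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS blast_hits (
               qseqid TEXT, sseqid TEXT, pident REAL, length INTEGER,
               mismatch INTEGER, gapopen INTEGER, qstart INTEGER, qend INTEGER,
               sstart INTEGER, send INTEGER, evalue REAL, bitscore REAL
           )"""
    )
    conn.executemany(
        """INSERT INTO blast_hits VALUES (
               :qseqid, :sseqid, :pident, :length, :mismatch, :gapopen,
               :qstart, :qend, :sstart, :send, :evalue, :bitscore
           )""",
        hits,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM blast_hits").fetchone()[0]


if __name__ == "__main__":
    # Two made-up hits in -outfmt 6 layout, for demonstration only.
    example = (
        "probe_1\tscaffold_1\t98.5\t200\t3\t0\t1\t200\t1000\t1199\t1e-100\t370\n"
        "probe_1\tscaffold_4\t80.0\t150\t30\t0\t10\t159\t5200\t5051\t2e-30\t120\n"
    )
    hits = parse_blast_tabular(io.StringIO(example))
    print(store_hits(sqlite3.connect(":memory:"), hits))
```

Storing hits as typed columns rather than raw text lines is what allows them to be filtered and grouped later with SQL.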

Instructions for installing and running the DIGS tool can be found on the DIGS tool wiki.

For examples of how the DIGS tool can be used, see these pages.

Author

Robert J. Gifford (robert.gifford@glasgow.ac.uk)

Contributors

Dan Blanco-Melo (dblan003@gmail.com)

Henan Zhu (h.zhu.1@research.gla.ac.uk)

Tristan Dennis (t.dennis.1@research.gla.ac.uk)

Josh Singer (josh.singer@glasgow.ac.uk)

Joseph Hughes (joseph.hughes@glasgow.ac.uk)

Sejal Modha (sejal.modha@glasgow.ac.uk)

Richard Orton (richard.orton@glasgow.ac.uk)

Paul Bieniasz (paul.bieniasz@rockefeller.edu)