Compressed Directed Acyclic Word Graph with Application in Local Alignment

Article ID: iaor20134081
Volume: 67
Issue: 2
Start Page Number: 125
End Page Number: 141
Publication Date: Oct 2013
Journal: Algorithmica
Authors: ,
Keywords: computers: data-structure
Abstract:

Suffix trees, suffix arrays, and directed acyclic word graphs (DAWGs) are data structures for indexing a text. Although they enable efficient pattern matching, they require $O(n \log n)$ bits, which makes them impractical for indexing long texts such as the human genome. Recently developed compressed data structures make it possible to simulate suffix trees and suffix arrays using $O(n)$ bits. However, there is still no $O(n)$-bit data structure for the DAWG with full functionality. This work introduces an $n(H_k(\overline{S}) + 2H_0^*(\mathcal{T}_{\overline{S}})) + o(n)$-bit compressed data structure for simulating the DAWG, where $H_k(\overline{S})$ and $H_0^*(\mathcal{T}_{\overline{S}})$ are the empirical entropies of the reversed sequence and the reversed suffix tree topology, respectively. We also propose an application of the DAWG that improves the time complexity of the local alignment problem. For this problem, the previously proposed solutions using the BWT (a version of the compressed suffix array) run in $O(n^2 m)$ worst-case time and $O(n^{0.628} m)$ average-case time, where $n$ and $m$ are the lengths of the database and the query, respectively. Using the compressed DAWG proposed in this paper, the problem can be solved in $O(nm)$ worst-case time with the same average-case time.
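
For readers unfamiliar with the DAWG named in the abstract, the following is a minimal sketch of the classical, uncompressed structure (also known as a suffix automaton), built online and used for substring matching. It is not the compressed structure the paper introduces; it is the pointer-based kind of representation whose $O(n \log n)$-bit cost the abstract describes as impractical for long texts. The class name `Dawg` and its method names are illustrative assumptions, not taken from the paper.

```python
class Dawg:
    """Uncompressed DAWG (suffix automaton) of a text; illustrative only."""

    def __init__(self, text):
        # Parallel per-state arrays: outgoing edges, suffix link, and the
        # length of the longest substring that reaches the state.
        self.next = [{}]
        self.link = [-1]
        self.length = [0]
        self.last = 0  # state that represents the whole text read so far
        for ch in text:
            self._extend(ch)

    def _new_state(self, length, edges, link):
        self.next.append(edges)
        self.length.append(length)
        self.link.append(link)
        return len(self.next) - 1

    def _extend(self, ch):
        cur = self._new_state(self.length[self.last] + 1, {}, -1)
        p = self.last
        # Add the missing ch-edge along the suffix-link chain.
        while p != -1 and ch not in self.next[p]:
            self.next[p][ch] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][ch]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Split q: the clone keeps only the shorter substrings.
                clone = self._new_state(
                    self.length[p] + 1, dict(self.next[q]), self.link[q]
                )
                while p != -1 and self.next[p].get(ch) == q:
                    self.next[p][ch] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur

    def contains(self, pattern):
        """Return True if pattern is a substring of the indexed text."""
        state = 0
        for ch in pattern:
            state = self.next[state].get(ch)
            if state is None:
                return False
        return True
```

A query such as `Dawg("abcbc").contains("cbc")` follows one edge per pattern character, so matching takes time proportional to the pattern length (for a fixed alphabet) regardless of the text length.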
