Techniques for identifying tandem repeats in DNA sequences
chapter
posted on 2017-12-06, 00:00 authored by M Miah, Saleh Wasimi
Tandem repeats are repetitive elements in DNA sequences and are thought to be an important cause of genomic instability. Many algorithmic techniques exist to detect tandem repeats in a DNA sequence, and they can be broadly classified as either exact or heuristic. Exact methods use mathematical techniques such as the Fourier transform, the wavelet transform, and k-tuple-based methods. Heuristic methods use tools such as combinatorial methods, suffix trees, statistical models, and Hamming distance to identify approximate tandem repeats. Most available identification software is designed for data-intensive use, given the huge size of genomic data and the variable number of tandem repeats in genome sequences. The published literature exploring identification techniques for tandem repeats is extensive, but it contains almost no reviews of those techniques. This paper briefly reviews some popular identification techniques for tandem repeats, providing insight into and critique of their advantages and disadvantages as learned by testing each software tool in an identical environment.
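To make the notion of an exact tandem repeat concrete, the following is a minimal brute-force sketch in Python. It is not one of the techniques reviewed in the chapter (it uses no Fourier transform, wavelet, k-tuple, or suffix-tree machinery); the function name `find_exact_tandem_repeats` and its parameters `min_unit`, `max_unit`, and `min_copies` are hypothetical names chosen only for illustration.

```python
def find_exact_tandem_repeats(seq, min_unit=1, max_unit=6, min_copies=3):
    """Scan left to right and report exact tandem repeats.

    Returns a list of (start, unit, copies) tuples, where `unit` is the
    repeated motif and `copies` is how many times it occurs back to back.
    All names and defaults here are illustrative assumptions.
    """
    seq = seq.upper()
    n = len(seq)
    results = []
    i = 0
    while i < n:
        best = None  # (start, unit, copies, span) of the longest run at i
        for unit_len in range(min_unit, max_unit + 1):
            unit = seq[i:i + unit_len]
            if len(unit) < unit_len:
                break  # not enough sequence left for a motif this long
            copies = 1
            j = i + unit_len
            # Count consecutive exact copies of the motif.
            while seq[j:j + unit_len] == unit:
                copies += 1
                j += unit_len
            if copies >= min_copies:
                span = copies * unit_len
                # Strict '>' keeps the shortest motif when spans tie.
                if best is None or span > best[3]:
                    best = (i, unit, copies, span)
        if best is not None:
            results.append(best[:3])
            i += best[3]  # skip past the reported repeat
        else:
            i += 1
    return results


if __name__ == "__main__":
    print(find_exact_tandem_repeats("GGACACACACTT"))
```

With the defaults above, the sequence `GGACACACACTT` yields one repeat: the motif `AC` starting at index 2 with 4 copies. The scan costs roughly the sequence length times `max_unit` motif comparisons, which is acceptable for short sequences but not for genome-scale data, and it finds no approximate repeats; those limitations are what the exact and heuristic methods named in the abstract are designed to address.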
Funding
Category 1 - Australian Competitive Grants (this includes ARC, NHMRC)