|
|
|
Deepak, Akshay and Fernández-Baca, David (2012) Enumerating All Maximal Frequent Subtrees. Technical Report 12-01, Computer Science, Iowa State University.
Abstract
Given a collection of leaf-labeled trees on a common leafset and a fraction f in (1/2,1], a frequent subtree (FST) is a subtree isomorphically included in at least fraction f of the input trees. The well-known maximum agreement subtree (MAST) problem identies FST with f = 1 and having the largest number of leaves. Apart from its intrinsic interest from the algorithmic perspective, MAST has practical applications as a metric for tree similarity, for computing tree congruence, in detection
horizontal gene transfer events and as a consensus approach. Enumerating FSTs extend the MAST problem by denition and reveal additional
subtrees not displayed by MAST. This can happen in two ways such a subtree is included in majority but not all of the input trees or such a
subtree though included in all the input trees, does not have the maximum number of leaves. Further, FSTs can be enumerated on collections
of trees having partially overlapping leafsets. MAST may not be useful here especially if the common overlap among leafsets is very low. Though
very useful, the number of FSTs suer from combinatorial explosion just a single MAST can exhibit exponentially many FSTs. This limits both the size of the trees that can be enumerated and the ability to comprehend enumerated FSTs. To overcome this, we propose enumeration of maximal frequent subtrees (MFSTs). A MFST is a FST that is not a subtree to any other FST. The set of MFSTs is a compact non-redundant summary of all FSTs and is much smaller in size. Here we tackle the
novel problem of enumerating all MFSTs in collections of phylogenetic trees. We demonstrate its utility in returning larger consensus trees in comparison to MAST. The current implementation is available on the web.
Contact site administrator at: ssg@cs.iastate.edu
|