archives

Enumerating All Maximal Frequent Subtrees


Home 

About 

Browse 

Search 

Register 

Subscriptions 

Deposit Papers 

Help
    

Deepak, Akshay and Fernández-Baca, David (2012) Enumerating All Maximal Frequent Subtrees. Technical Report 12-01, Computer Science, Iowa State University.

Full text available as:Adobe PDF

Abstract

Given a collection of leaf-labeled trees on a common leafset and a fraction f in (1/2,1], a frequent subtree (FST) is a subtree isomorphically included in at least fraction f of the input trees. The well-known maximum agreement subtree (MAST) problem identies FST with f = 1 and having the largest number of leaves. Apart from its intrinsic interest from the algorithmic perspective, MAST has practical applications as a metric for tree similarity, for computing tree congruence, in detection horizontal gene transfer events and as a consensus approach. Enumerating FSTs extend the MAST problem by denition and reveal additional subtrees not displayed by MAST. This can happen in two ways  such a subtree is included in majority but not all of the input trees or such a subtree though included in all the input trees, does not have the maximum number of leaves. Further, FSTs can be enumerated on collections of trees having partially overlapping leafsets. MAST may not be useful here especially if the common overlap among leafsets is very low. Though very useful, the number of FSTs suer from combinatorial explosion just a single MAST can exhibit exponentially many FSTs. This limits both the size of the trees that can be enumerated and the ability to comprehend enumerated FSTs. To overcome this, we propose enumeration of maximal frequent subtrees (MFSTs). A MFST is a FST that is not a subtree to any other FST. The set of MFSTs is a compact non-redundant summary of all FSTs and is much smaller in size. Here we tackle the novel problem of enumerating all MFSTs in collections of phylogenetic trees. We demonstrate its utility in returning larger consensus trees in comparison to MAST. The current implementation is available on the web.

Subjects:All uncategorized technical reports
ID code:00000679
Deposited by:Deepak Akshay on 09 February 2012



Contact site administrator at: ssg@cs.iastate.edu