EvoMiner: Frequent Subtree Mining in Phylogenetic Databases







Deposit Papers 


Deepak, Akshay, Fernández-Baca, David, Tirthapura, Srikanta, Sanderson, Michael J and McMahon, Michelle M (2011) EvoMiner: Frequent Subtree Mining in Phylogenetic Databases. Technical Report 11-08, Computer Science, Iowa State University.

Full text available as:Adobe PDF

There is a later version of this eprint available: Click here to view it.


The problem of mining collections of trees to iden- tify common patterns, called frequent subtrees (FSTs), arises often when trying to make sense of the results of phylogenetic analysis. FST mining generalizes the well-known maximum agreement subtree problem. Here we present EvoMiner, a new algorithm for mining frequent subtrees in collections of phylogenetic trees. EvoMiner is an Apriori-like level-wise method, which uses a novel phylogeny-specic constant-time candi- date generation scheme, an ecient ngerprinting-based technique for downward closure operation, and a lowest common ancestor based support counting step that re- quires neither costly subtree operations nor database traversal. As a result of these techniques, our algo- rithm achieves speed-ups of up to 100 times or more over Phylominer, another algorithm for mining phyloge- netic trees. EvoMiner can also work in vertical mining mode, to use less memory at the expense of speed.

Keywords:Data mining, maximum agreement subtree, phylogenetics, evolutionary bioinformatics, pattern discovery.
Subjects:All uncategorized technical reports
ID code:00000677
Deposited by:Deepak Akshay

Available Versions of This Paper

Contact site administrator at: