
AUTOMATED DATA-DRIVEN DISCOVERY OF MOTIF-BASED PROTEIN FUNCTION CLASSIFIERS



Wang, Xiangyun, Schroeder, Diane, Dobbs, Drena and Honavar, Vasant (2002) AUTOMATED DATA-DRIVEN DISCOVERY OF MOTIF-BASED PROTEIN FUNCTION CLASSIFIERS.

Full text available as: Adobe PDF

Abstract

Xiangyun Wang, Diane Schroeder, Drena Dobbs, and Vasant Honavar
Artificial Intelligence Laboratory, Department of Computer Science and Graduate Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA
www.cs.iastate.edu/~honavar/aigroup.html
honavar@cs.iastate.edu

This paper describes an approach to data-driven discovery of decision trees or rules for assigning protein sequences to functional families using sequence motifs. The method captures regularities that can be described in terms of the presence or absence of arbitrary combinations of motifs. A training set of peptidase sequences, labeled with the corresponding MEROPS functional families or clans, is used to automatically construct decision trees that capture regularities sufficient to assign the sequences to their respective functional families. The performance of the resulting decision tree classifiers is then evaluated on an independent test set. We compared rules constructed using motifs generated by a multiple sequence alignment-based motif discovery tool (MEME) with rules constructed using expert-annotated PROSITE motifs (patterns and profiles). Our results indicate that rules built from MEME-generated motifs provide a potentially powerful high-throughput technique for constructing protein function classifiers when adequate training data are available. Examination of the generated rules in relation to known three-dimensional structures of members of two families (MEROPS families C14 and M12) suggests that the proposed technique may be able to identify combinations of sequence motifs that characterize functionally significant three-dimensional structural features of proteins.
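The idea of learning a decision tree over motif presence or absence can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not name the tree-induction algorithm, so a plain ID3-style information-gain split is used; the motif identifiers (m1 to m4) are hypothetical; and the tiny training set is invented, borrowing only the family names C14 and M12 from the abstract. It is not the authors' implementation or data.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def info_gain(rows, labels, motif):
    """Information gain of splitting on presence vs. absence of one motif."""
    present = [y for x, y in zip(rows, labels) if motif in x]
    absent = [y for x, y in zip(rows, labels) if motif not in x]
    if not present or not absent:
        return 0.0  # the motif does not separate this subset at all
    weighted = (len(present) * entropy(present)
                + len(absent) * entropy(absent)) / len(labels)
    return entropy(labels) - weighted

def build_tree(rows, labels, motifs):
    """Return a leaf (family name) or a node (motif, if_present, if_absent)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not motifs:
        return majority
    gains = {m: info_gain(rows, labels, m) for m in motifs}
    best = max(sorted(motifs), key=gains.get)
    if gains[best] <= 0.0:
        return majority  # no remaining motif improves the split
    rest = [m for m in motifs if m != best]
    with_motif = [(x, y) for x, y in zip(rows, labels) if best in x]
    without = [(x, y) for x, y in zip(rows, labels) if best not in x]
    return (best,
            build_tree([x for x, _ in with_motif], [y for _, y in with_motif], rest),
            build_tree([x for x, _ in without], [y for _, y in without], rest))

def classify(tree, motif_set):
    """Walk the tree using the set of motifs found in one sequence."""
    while isinstance(tree, tuple):
        motif, if_present, if_absent = tree
        tree = if_present if motif in motif_set else if_absent
    return tree

# Invented toy data: each sequence is reduced to the set of hypothetical
# motifs detected in it, paired with a family label.
training = [
    ({"m1", "m2"}, "C14"),
    ({"m1", "m2", "m3"}, "C14"),
    ({"m1", "m3"}, "M12"),
    ({"m2", "m3"}, "M12"),
    ({"m3"}, "M12"),
    ({"m1"}, "M12"),
]
rows = [motifs for motifs, _ in training]
labels = [family for _, family in training]

tree = build_tree(rows, labels, ["m1", "m2", "m3"])
print(tree)
print(classify(tree, {"m1", "m2", "m4"}))
```

In this toy set no single motif separates the two families, so the learned tree tests two motifs in sequence, which is the kind of motif combination the abstract refers to. A real experiment would derive the motif sets from a tool such as MEME or from PROSITE and evaluate the tree on a held-out test set.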

Keywords: protein function classification, decision trees, motifs, bioinformatics
Comments: Wang, X., Schroeder, D., Dobbs, D., and Honavar, V. (2002). Automated Data-Driven Discovery of Motif-Based Protein Function Classifiers. Information Sciences. In press.
Subjects: Computing Methodologies: ARTIFICIAL INTELLIGENCE: Learning (K.3.2)
Computing Methodologies: PATTERN RECOGNITION: Applications
Computer Applications: LIFE AND MEDICAL SCIENCES
ID code: 00000289
Deposited by: Vasant Honavar on 07 December 2002
Alternative Locations: http://www.cs.iastate.edu/~honavar/Papers/infoscipapernewest.pdf


