Multinomial Event Model Based Abstraction for Sequence and Text Classification









Kang, Dae-Ki, Zhang, Jun, Silvescu, Adrian and Honavar, Vasant (2005) Multinomial Event Model Based Abstraction for Sequence and Text Classification. Technical Report, Department of Computer Science, Iowa State University.

Full text available as: Adobe PDF


Many machine learning applications that deal with sequences need learning algorithms that can effectively exploit a hierarchical grouping of words. We introduce the Word Taxonomy Guided Naive Bayes Learner for the Multinomial Event Model (WTNBL-MN), which exploits a word taxonomy to generate compact classifiers, and the Word Taxonomy Learner (WTL), which automatically constructs a word taxonomy from sequence data. WTNBL-MN generalizes the Naive Bayes learner for the Multinomial Event Model to learn classifiers from data using a word taxonomy. WTL uses hierarchical agglomerative clustering to cluster words based on the distribution of class labels that co-occur with the word counts. Our experimental results on protein localization sequences and Reuters text show that the proposed algorithms can generate Naive Bayes classifiers that are more compact than, and comparably or often more accurate than, those produced by the standard Naive Bayes learner for the Multinomial Event Model.
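To make the clustering idea concrete, the following is a minimal sketch of agglomerative taxonomy construction over words, not the report's implementation. It assumes each word is summarized by its per-class counts and that clusters are compared with the Jensen-Shannon divergence between their class distributions; the abstract does not name the dissimilarity measure, so that choice, along with the function names and the toy data, is hypothetical.

```python
import math


def class_distribution(counts, classes):
    """Normalize per-class counts into P(class | word or cluster).

    Assumes the word occurs at least once, so the total is non-zero.
    """
    total = sum(counts.get(c, 0) for c in classes)
    return [counts.get(c, 0) / total for c in classes]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Assumption: the abstract does not specify the measure used by WTL.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def build_taxonomy(word_class_counts, classes):
    """Repeatedly merge the two clusters whose class distributions are closest.

    word_class_counts: {word: {class_label: count}}
    Returns a nested tuple: a binary tree whose leaves are the words.
    """
    # Each cluster is a (tree, class_counts) pair; leaves are single words.
    clusters = [(w, dict(c)) for w, c in sorted(word_class_counts.items())]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = js_divergence(
                    class_distribution(clusters[i][1], classes),
                    class_distribution(clusters[j][1], classes),
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (tree_i, counts_i), (tree_j, counts_j) = clusters[i], clusters[j]
        # The merged cluster pools the class counts of its two children.
        merged_counts = {
            c: counts_i.get(c, 0) + counts_j.get(c, 0) for c in classes
        }
        rest = [cl for k, cl in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [((tree_i, tree_j), merged_counts)]
    return clusters[0][0]


# Toy data (hypothetical): three "words" counted under two class labels.
toy_counts = {
    "A": {"pos": 9, "neg": 1},
    "B": {"pos": 8, "neg": 2},
    "C": {"pos": 1, "neg": 9},
}
taxonomy = build_taxonomy(toy_counts, ["pos", "neg"])
print(taxonomy)
```

In the toy data, A and B have similar class distributions and are merged first, with C joining at the root. The pairwise search is quadratic per merge, which is fine for small alphabets such as amino acids but would need a more efficient scheme for large text vocabularies.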

Keywords: Word Taxonomy, Word Taxonomy Guided Naive Bayes Learner, Word Taxonomy Learner, Multinomial Event Model
Subjects: Computing Methodologies: ARTIFICIAL INTELLIGENCE: Knowledge Representation Formalisms and Methods (F.4.1)
Computing Methodologies: ARTIFICIAL INTELLIGENCE: Learning (K.3.2)
ID code: 00000368
Deposited by: Dae-Ki Kang on 29 April 2005
