On a robust document classification approach using TF-IDF scheme with learned, context-sensitive semantics.



Pandit, Sushain (2008) On a robust document classification approach using TF-IDF scheme with learned, context-sensitive semantics. Technical Report TR10-04, Computer Science, Iowa State University.

Full text available as: Adobe PDF

Abstract

Document classification is a well-known task in the information retrieval domain. It relies on indexing schemes that map documents into a form a classification system can consume. Term Frequency-Inverse Document Frequency (TF-IDF) is one such class of term-weighting functions, used extensively for document representation. A major drawback of this scheme is that it ignores key semantic links between words and word meanings, and compares documents based solely on word frequencies. Most current approaches that try to address this issue either rely on alternative representation schemes or are based on probabilistic models. We use a non-probabilistic approach to build a robust document classification system, which enriches the classical TF-IDF scheme with context-sensitive semantics using a neural-network-based learning component.
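The drawback described in the abstract can be illustrated with a minimal Python sketch. It assumes one common TF-IDF variant (relative term frequency multiplied by log(N/df)); the report's own weighting and its neural-network component are not reproduced here. The function names `tf_idf_vectors` and `cosine` and the toy documents are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = (count / doc length) * log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (illustrative only): documents 0 and 1 describe the same
# thing with synonyms; documents 0 and 2 share two literal terms.
docs = [
    "car engine repair".split(),
    "automobile motor fix".split(),
    "car engine noise".split(),
]
vectors = tf_idf_vectors(docs)

sim_synonyms = cosine(vectors[0], vectors[1])  # no shared terms, so 0.0
sim_shared = cosine(vectors[0], vectors[2])    # shared terms, so > 0

print(sim_synonyms, sim_shared)
```

In this toy corpus the two synonymous documents receive a similarity of exactly zero because they share no surface terms, while the documents that merely repeat the same words score higher. That is the kind of gap a semantic enrichment of TF-IDF is meant to close.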

Keywords: document classification, neural networks, semantics, tf-idf, sushain pandit
Subjects: Computing Methodologies: ARTIFICIAL INTELLIGENCE
Computing Methodologies: ARTIFICIAL INTELLIGENCE: Learning (K.3.2)
Computing Methodologies: DOCUMENT AND TEXT PROCESSING (H.4-5)
ID code: 00000643
Deposited by: Sushain Pandit on 10 June 2010


