Pandit, Sushain (2008) On a robust document classification approach using TF-IDF scheme with learned, context-sensitive semantics. Technical Report TR10-04, Computer Science, Iowa State University.
Abstract
Document classification is a well-known task in the information
retrieval domain and relies upon various indexing schemes
to map documents into a form that can be consumed by a
classification system. Term Frequency-Inverse Document
Frequency (TF-IDF) is one such class of term-weighting
functions used extensively for document representation.
One of the major drawbacks of this scheme is that it ignores
key semantic links between words and/or word meanings
and compares documents based solely on word
frequencies. The majority of current approaches that try to
address this issue either rely on alternate representation
schemes, or are based upon probabilistic models. We utilize
a non-probabilistic approach to build a robust document
classification system, which essentially relies upon
enriching the classical TF-IDF scheme with context-sensitive
semantics using a neural-net based learning component.
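
The drawback noted above, that plain TF-IDF compares documents only by the words they share, can be illustrated with a minimal sketch. The weighting used here (relative term frequency times log(N / df)), the whitespace tokenizer, the toy documents, and the helper names tfidf_vectors and cosine are all assumptions made for illustration; they are not taken from the report, and the sketch does not implement the report's learned semantic component.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    This is one common TF-IDF variant, chosen here as an assumption.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: documents 0 and 1 are near-synonymous,
# documents 0 and 2 merely share the word "car".
docs = [
    "the car is fast",
    "the automobile is quick",
    "the car is red",
]
vecs = tfidf_vectors(docs)
sim_synonyms = cosine(vecs[0], vecs[1])  # no informative word in common
sim_shared = cosine(vecs[0], vecs[2])    # one informative word in common
print(sim_synonyms, sim_shared)
```

Under these assumptions the two near-synonymous documents receive no similarity at all, while the pair that happens to share a word scores higher, which is the kind of gap a semantic enrichment of TF-IDF is meant to close.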