Vemula, Satish, Hare, Justin and Noh, Seo-Young (2003) Query-friendly Compression and Indexing of Recurring Structures in XML Documents. Technical Report 03-02, Computer Science, Iowa State University.
XML documents are self-describing by design. As a consequence, XML data is highly verbose and repetitive. Although techniques already exist to compress XML and text in general, most do not keep the data in a form that is useful to users. We present a technique that exploits recurring structures within an XML document to compress the file. It can achieve better compression than other query-friendly compression techniques while still keeping the data in a form that supports both querying and indexing. We also present an example implementation of the technique, complete with an index-building mechanism and query processing capabilities.