Morphology, Query Terms, and Relevance

At the upcoming LREC conference I will present some of the results from Tvärsök, a project that evaluated how various morphological analysis tools affect retrieval results. The project was a collaboration between Euroling Ab, SICS, and CST.

One of the most interesting findings, from a research perspective, was inspired by my recent role as academic opponent at the public defense of Kimmo Kettunen's PhD dissertation. He studied generative morphologies applied to information retrieval and found that the nine most frequent morphological cases for Finnish nouns suffice to model most of what is needed for indexing and query processing. The table below shows that those nine cases are in fact distributed non-uniformly over relevant and non-relevant documents: locative cases are less likely to occur in topically relevant documents.

Skewed distribution of query terms
Search term case distribution in relevant and non-relevant texts (the most divergent values marked in bold; χ² = 70.155; df = 2; p < 0.005)
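For readers who want to see how a figure like the one in the caption is obtained, here is a minimal sketch of a Pearson chi-square test of independence in plain Python. The counts are invented placeholders, not the Tvärsök data, and the 2 × 3 layout (two relevance classes by three hypothetical case groups) is only an assumption chosen because it yields df = 2; the paper's actual grouping may differ. The function names are my own.

```python
import math


def chi_square(table):
    """Return (statistic, degrees of freedom) for a contingency table
    given as a list of equally long rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df2(stat):
    """Upper-tail p-value, valid ONLY for df = 2, where the chi-square
    survival function has the closed form exp(-x / 2)."""
    return math.exp(-stat / 2.0)


# Invented placeholder counts (NOT the project's data).
# Rows: relevant, non-relevant. Columns: three hypothetical case groups.
placeholder_counts = [
    [120, 60, 40],
    [80, 90, 60],
]

stat, df = chi_square(placeholder_counts)
print(f"placeholder table: chi2 = {stat:.3f}, df = {df}")

# Plugging in the statistic reported in the caption, with df = 2:
print(f"p for the reported statistic: {p_value_df2(70.155):.3g}")
```

The closed-form p-value is a convenience that holds only for two degrees of freedom; for other table shapes a statistics library would be the safer choice.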

The LREC paper, “Experiments to investigate the connection between case distribution and topical relevance of search terms in an information retrieval setting”, is co-authored with Hercules Dalianis from Euroling and Bart Jongejan from CST. It will soon be in the eprints archive and is also presented on the Euroling blog.

