I had the honour of serving as opponent at the public defence of Aron Henriksson’s licentiate thesis Semantic Space of Clinical Text. The ambitious and well-written thesis reports on a series of carefully designed and well-executed experiments that use semantic spaces, implemented through random indexing, to process electronic patient records for a set of clearly defined text-oriented tasks: finding synonyms, identifying and explaining abbreviations, detecting adverse reactions to medication, and assigning clinical codes to texts. There were a number of methodological and formal questions to discuss:
- What does it mean to explore window length in aggregating distributional data, and should aggressive distance weighting be employed?
- Should different semantic spaces be combined in an ensemble? What semantic relations are we targeting and what do we expect the various spaces to provide?
- What are appropriate evaluation gold standards, and what would a perfect system do? (My sense is that Aron did his evaluation a disservice by aiming to emulate a very specific standard which does not necessarily provide a valid task model.)
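To make the first question a little more concrete, here is a minimal sketch of how window length and distance weighting enter a random indexing model. It is not Aron's implementation: the function names (`index_vector`, `build_space`, `cosine`), the parameter defaults, the toy sentence, and the choice of 2^(1-d) as an "aggressive" weighting are all my own assumptions for illustration.

```python
import math
import random
from collections import defaultdict


def index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector, stored as {position: +1 or -1}."""
    positions = rng.sample(range(dim), nonzeros)
    return {p: rng.choice((-1, 1)) for p in positions}


def build_space(tokens, window=2, dim=1000, nonzeros=8,
                weight=lambda d: 1.0, seed=0):
    """Accumulate context vectors by summing the index vectors of neighbours.

    `window` is the number of tokens considered on each side of the focus
    word; `weight(d)` scales a neighbour at distance d (hypothetical knobs).
    """
    rng = random.Random(seed)
    index = {}
    for w in tokens:
        if w not in index:
            index[w] = index_vector(dim, nonzeros, rng)

    context = defaultdict(lambda: [0.0] * dim)
    for i, w in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            wgt = weight(abs(i - j))
            for pos, sign in index[tokens[j]].items():
                context[w][pos] += wgt * sign
    return dict(context)


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Invented toy sentence, not real patient data.
tokens = "patient reports chest pain patient denies chest pain".split()

flat = build_space(tokens, window=3)                        # all neighbours count equally
steep = build_space(tokens, window=3,
                    weight=lambda d: 2.0 ** (1 - d))        # assumed "aggressive" decay

print(cosine(flat["reports"], flat["denies"]))
print(cosine(steep["reports"], steep["denies"]))
```

With a steep decay the far end of a long window contributes very little, so exploring window length and exploring the weighting scheme are not independent experiments, which is part of what the question is getting at.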
Aron and I, and hopefully (part of) the auditorium, left the seminar with a long list of new ideas on applying our favourite techniques to our favourite text collections.