On December 2, Evangelos Kanoulas and I organised a workshop on “Practical Issues in Information Access System Evaluation”, intending to collect and discuss industrial voices and experiences on how evaluation proceeds in such contexts. The workshop was well attended: about thirty participants from industry, academia, and combinations of the two met for a full day to discuss these issues. The workshop was kindly hosted by the BCS and funded by a contribution from the ELIAS program. A report is forthcoming.
I spent these past two weeks visiting Stanford University, at the invitation of Martin Kay, thinking about how to enrich distributional models to handle more situational factors, about which features of an utterance encode the aspects of semantics most relevant to situational models, and about various other ideas tangentially connected to these thoughts. I hope to return for a longer visit next year. (As it turns out, these two weeks were well chosen, in view of the inclement weather back home in Stockholm.)
I have been going to CLEFs for a number of years now. I used to go to TRECs, where a cross-lingual retrieval track caught my attention, and at one of the cross-lingual sessions Carol Peters proposed that there be instituted a European cross-lingual evaluation campaign, vectored to the needs and requirements of a society with several equally prestigious languages. (At one of the cross-lingual TREC sessions one of the attendees explained that French can simply be understood as misspelled English for the purposes of retrieval.)
- 2000 Lisbon
- I participated in a panel, together with Noriko Kando and others, on “Evaluating Multi-lingual Information Access”
- 2001 Darmstadt
- Magnus Sahlgren presented our work on experiments using a multi-lingual vector space: “Vector-based Semantic Analysis using Random Indexing and Morphological Analysis for Cross-Lingual Information Retrieval” and I presented some thoughts on sharing resources: “Resource Sharing for Cross-Lingual and Multi-Lingual Document Retrieval in General and the CLEF Campaigns in Particular” proposing sharing processes and tools over SOAP-like services.
- 2002 Rome
- Magnus Sahlgren again presented our continued experiments, this year done together with Rickard Cöster and Timo Järvinen, on “Automatic Query Expansion Using Random Indexing” and I presented work done together with Preben Hansen on “Cross-language relevance assessment and task context”.
- 2003 Trondheim
- I presented continued work done together with Preben Hansen with the logical title of “Continued experiments on cross-language relevance assessment” and also work done by Rickard Cöster, Magnus Sahlgren, and myself on “Selective compound splitting of Swedish queries for Boolean combinations of truncated terms”.
- 2004 Bath
- I presented an iCLEF talk on “Cooperation, Bookmarking, and Thesaurus in Interactive Bilingual Question Answering” (work done with Preben Hansen and Magnus Sahlgren) as well as posters on “Dynamic Lexica for Query Translation” (work done with Rickard Cöster, Magnus Sahlgren, and Timo Järvinen) and “Dictionary-based Amharic – English Information Retrieval” (with Atelach Alemu Argaw, Lars Asker, and Rickard Cöster).
- 2005 Wien
- I presented a poster on “Weighting Query Terms Based on Distributional Statistics” (with Magnus Sahlgren and Rickard Cöster) and gave the iCLEF overview talk.
- 2006 Alicante
- I presented a poster on “Trusting the Results in Cross-lingual Keyword-Based Image Retrieval” (work done with Fredrik Olsson) and gave the iCLEF overview talk.
- 2007 Budapest
- I gave a talk presenting the work from the CHORUS Coordination Action on “Use cases for interactive multi-lingual multi-media information access?”
- 2008 Århus
- I presented the iCLEF track together with Paul Clough and Julio Gonzalo, and presented a log analysis of iCLEF logs: “User confidence and satisfaction tentatively inferred from iCLEF logs”.
- 2009 Corfu
- I presented work from the CHORUS Coordination Action under the title “Affect, Appeal, and Sentiment as Factors Influencing Interaction with Multimedia Information” at the Theseus workshop co-located with CLEF and then chaired the iCLEF session.
- 2010 Padova
- This year, the PROMISE project provided a new backbone for CLEF, graduating it from a workshop to an independent conference. I helped organise the panel that the PROMISE project held to explain itself to the participants.
- 2011 Amsterdam
- I had mostly administrative duties this year, with no experiments, preparing for the following year. I was responsible for CLEF-LOC: the CLEF Lab Overview Committee.
- 2012 Rome
- I presented the poster boasters, a lab overview: “Introduction to the CLEF 2012 labs”, and gave an invited talk to the eHealth workshop: “the eHealth area needs dynamic and learning representations and evaluations to fit”; Gunnar Eriksson gave a talk on authorship attribution at PAN: “Features for modelling characteristics of conversations” where I had made some (smallish) contributions; Fredrik Olsson presented joint work from Gavagai at RepLab: “Profiling Reputation of Corporate Entities in Semantic Space”.
- 2013 Valencia
- I presented a poster for the RepLab work done at Gavagai together with Linus Ericsson: “Semantic Space Models for Profiling Reputation of Corporate Entities” and participated again in an eHealth panel to talk about a “Wish list from tech industry”.
- 2014 Sheffield
- I presented a poster for the RepLab work done at Gavagai (not done by me but by Afshin Rahimi, Magnus Sahlgren, Andreas Kerren, and Carita Paradis) and I gave an invited talk in the RepLab session on a no-data challenge suggestion, titled “Features and target tasks – Do we know what we are doing and why?”, which was great fun!
- 2015 Toulouse
- I presented a poster which outlined some of the results from an ELIAS-supported workshop on “Evaluating Learning Language Representations” hosted by us at Gavagai the previous fall. I also gave an invited talk in the Social Book Search session, arguing for informed feature selection and use-case-based target tasks.
- 2016 Évora
- This past September I gave a talk on “Evaluating Categorisation in Real Life – an argument against simple but impractical metrics” based on work done by Vide Karlsson, supervised by myself and Pawel Herman. It was 43 degrees Celsius in the shade.
Today, I had the honour to give one of the keynote talks at Nordiskt statistikermöte, where the slogan for the day was given as
“I’m a statistician. To save time, let’s assume I am never wrong.”
I asked the audience of Nordic statisticians to request hypotheses and more motivated feature sets from their clients, and not to agree to fishing expeditions. Most seemed to agree with that thought!
The 39th Annual ACM SIGIR Conference on Research and Development in Information Retrieval was held in Pisa, Italy in July 2016. The Industry Day program, on July 20, chaired by Gilad Mishne of Color Genomics and myself, was a rapid-fire succession of twelve talks, two keynotes by Debora Donato and Hadar Shemtov, and ended with one panel. It was great fun, not least for the chairs!
I had the honour to officiate as an opponent at the public defense of Johan Eklund’s doctoral dissertation “With or without context: automatic text categorization using semantic kernels” at the Swedish School of Library and Information Science in Borås. Johan used topological models to work out the scope of categorisation approaches, and then performed a number of experiments for text categorisation using variants of distributional approaches.
There were a number of methodological and formal questions to discuss.
- How are the topological models applicable in practice?
- Is a hierarchical knowledge model a reasonable model of human knowledge organisation?
- Johan used only single-term lexical features, aggressively filtered for efficiency. What might happen if we used non-lexical features?
- Johan used standard data sets. The data set quite obviously influenced the result and should be viewed as a parameter for this type of experimentation. How does the palette of categories influence the result? (This was not tested.)
- Is the F1-score as useless a metric as I myself consider it to be? (Johan did not convince me of its usefulness, but neither did I convince him of its uselessness.)
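For readers who want the metric from the last question at hand: the F1-score is the harmonic mean of precision and recall. The sketch below is only an illustration with made-up counts, and it shows one commonly cited property of F1 (true negatives never enter the computation), which is not necessarily the objection raised at the defense. The function names and the numbers are assumptions for the example, not taken from the dissertation.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1: harmonic mean of precision and recall. True negatives never enter."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all decisions that were correct, true negatives included."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical counts: same hits and errors, very different numbers of true negatives.
small = dict(tp=40, fp=10, fn=10, tn=40)
large = dict(tp=40, fp=10, fn=10, tn=9940)

for name, counts in (("small", small), ("large", large)):
    f1 = f1_score(counts["tp"], counts["fp"], counts["fn"])
    acc = accuracy(**counts)
    print(f"{name}: F1={f1:.3f} accuracy={acc:.3f}")
```

Both hypothetical collections yield the same F1, while accuracy differs considerably, so what the score says about a categoriser depends on what the target task actually cares about.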
Anyone interested in the mathematics of categorisation will be inspired by the models delineated in Johan’s dissertation!