Category Archives: for the record

Visiting Scholar at Stanford

In the coming academic year, 2017-18, I will be a visiting scholar at the Department of Linguistics at Stanford University. I look forward to tugging at some of the most interesting loose ends from the past few years of technology development at Gavagai, in the hope of finding promising seams to work!
Professor Martin Kay, who hosts my visit, took me in for an internship at Xerox PARC in 1991. Now he will once again point out the best directions to develop.

Industrial take on complex search tasks

This morning, I gave a presentation at the workshop on Supporting Complex Search Tasks about how we at Gavagai handle complex information needs. Mostly, I made three claims:

  • Complexity does not necessarily lie in the formulation of the information need. Most of our customers perceive their information needs as simple, or at least as simple to formulate in informal language. We believe an information system should accommodate this and, if needs are indeed complex or change over time, allow simple and painless reformulation.
  • The greatest challenge lies in attending to new information: introducing new information aggregation tools will add business complexity, not reduce it.
  • Evaluation of information systems as it is done in academia is good for assessing progress at the cutting edge. Industry has a greater need for establishing Best Practice guidelines and for satisficing technology needs than for optimising them.

Slides for my talk on Complex aspects of seemingly simple information needs.

In the discussions following the initial presentations, the workshop participants considered the need for a quality assessment of data collection methodology. We expect to propose such a procedure for next year's edition of the workshop.

Workshop in London

On December 2, Evangelos Kanoulas and I organised a workshop on "Practical Issues in Information Access System Evaluation", with the aim of collecting and discussing industrial voices and experiences on how evaluation proceeds in such contexts. The workshop was well attended: about thirty participants from industry, academia, and combinations of the two met for a full day to discuss these issues. The workshop was kindly hosted by the BCS and funded by a contribution from the ELIAS program. A report is forthcoming.

Visit to Stanford

Stanford's beautiful campus has more galleries than corridors.

I spent these past two weeks visiting Stanford University at the invitation of Martin Kay, thinking about how to enrich distributional models to handle more situational factors, about which features of an utterance encode the aspects of semantics most relevant to situational models, and about various other ideas tangentially connected to these. I hope to return for a longer visit next year. (As it turns out, these two weeks were well chosen, given the inclement weather back home in Stockholm.)

Going to CLEFs

I have been going to CLEFs for a number of years now. I used to go to TRECs, where a cross-lingual retrieval track caught my attention, and at one of its sessions Carol Peters proposed instituting a European cross-lingual evaluation campaign, geared to the needs and requirements of a society with several equally prestigious languages. (At one of the cross-lingual TREC sessions, an attendee explained that, for the purposes of retrieval, French can simply be understood as misspelled English.)

2000 Lisbon
I participated in a panel, together with Noriko Kando and others, on “Evaluating Multi-lingual Information Access”
2001 Darmstadt
Magnus Sahlgren presented our work on experiments using a multi-lingual vector space: "Vector-based Semantic Analysis using Random Indexing and Morphological Analysis for Cross-Lingual Information Retrieval", and I presented some thoughts on sharing resources: "Resource Sharing for Cross-Lingual and Multi-Lingual Document Retrieval in General and the CLEF Campaigns in Particular", in which I proposed sharing processes and tools over SOAP-like services.
2002 Rome
Magnus Sahlgren again presented our continued experiments, this year done together with Rickard Cöster and Timo Järvinen, on “Automatic Query Expansion Using Random Indexing” and I presented work done together with Preben Hansen on “Cross-language relevance assessment and task context”.
2003 Trondheim
I presented continued work done together with Preben Hansen with the logical title of "Continued experiments on cross-language relevance assessment", and also work done by Rickard Cöster, Magnus Sahlgren, and myself on "Selective compound splitting of Swedish queries for Boolean combinations of truncated terms".
2004 Bath
I presented an iCLEF talk on "Cooperation, Bookmarking, and Thesaurus in Interactive Bilingual Question Answering" (work done with Preben Hansen and Magnus Sahlgren) as well as posters on "Dynamic Lexica for Query Translation" (work done with Rickard Cöster, Magnus Sahlgren, and Timo Järvinen) and "Dictionary-based Amharic – English Information Retrieval" (work done with Atelach Alemu Argaw, Lars Asker, and Rickard Cöster).
2005 Wien
I presented a poster on “Weighting Query Terms Based on Distributional Statistics” (with Magnus Sahlgren and Rickard Cöster) and gave the iCLEF overview talk.
2006 Alicante
I presented a poster on “Trusting the Results in Cross-lingual Keyword-Based Image Retrieval” (work done with Fredrik Olsson) and gave the iCLEF overview talk.
2007 Budapest
I gave a talk presenting the work from the CHORUS Coordination Action on “Use cases for interactive multi-lingual multi-media information access?”
2008 Århus
I presented the iCLEF track together with Paul Clough and Julio Gonzalo, and presented an analysis of iCLEF logs: "User confidence and satisfaction tentatively inferred from iCLEF logs".
2009 Corfu
I presented work from the CHORUS Coordination Action under the title "Affect, Appeal, and Sentiment as Factors Influencing Interaction with Multimedia Information" at the Theseus workshop co-located with CLEF, and then chaired the iCLEF session.
2010 Padova
This year, the PROMISE project provided a new backbone for CLEF, graduating it from a workshop to an independent conference. I helped organise the panel that the PROMISE project held to explain itself to the participants.
2011 Amsterdam
I had mostly administrative duties this year and ran no experiments, preparing instead for the following year. I was responsible for CLEF-LOC: the CLEF Lab Overview Committee.
2012 Rome
I presented the poster boasters, a lab overview: “Introduction to the CLEF 2012 labs”, and gave an invited talk to the eHealth workshop: “the eHealth area needs dynamic and learning representations and evaluations to fit”; Gunnar Eriksson gave a talk on authorship attribution at PAN: “Features for modelling characteristics of conversations” where I had made some (smallish) contributions; Fredrik Olsson presented joint work from Gavagai at RepLab: “Profiling Reputation of Corporate Entities in Semantic Space”.
2013 Valencia
I presented a poster on the RepLab work done at Gavagai together with Linus Ericsson: "Semantic Space Models for Profiling Reputation of Corporate Entities", and again participated in an eHealth panel to talk about a "Wish list from tech industry".
2014 Sheffield
I presented a poster on the RepLab work done at Gavagai (not by me, but by Afshin Rahimi, Magnus Sahlgren, Andreas Kerren, and Carita Paradis), and I gave an invited talk in the RepLab session on no-data challenge suggestions, titled "Features and target tasks – Do we know what we are doing and why?", which was great fun!
2015 Toulouse
I presented a poster which outlined some of the results from an ELIAS-supported workshop on "Evaluating Learning Language Representations", hosted by us at Gavagai the previous fall. I also gave an invited talk in the Social Book Search session, arguing for informed feature selection and use-case-based target tasks.
2016 Évora
This past September I gave a talk on “Evaluating Categorisation in Real Life – an argument against simple but impractical metrics” based on work done by Vide Karlsson, supervised by myself and Pawel Herman. It was 43 Celsius in the shade.

Keynote address to Nordiskt statistikermöte


Today, I had the honour of giving one of the keynote talks at Nordiskt statistikermöte, where the slogan for the day was given as

“I’m a statistician. To save time, let’s assume I am never wrong.”

I asked the audience of Nordic statisticians to request hypotheses and better-motivated feature sets from their clients, and not to agree to fishing expeditions. Most seemed to agree with that thought!

Slides from my talk are here.

SIGIR Industry Track

The 39th Annual ACM SIGIR Conference on Research and Development in Information Retrieval was held in Pisa, Italy, in July 2016. The Industry Day program on July 20, chaired by Gilad Mishne of Color Genomics and me, was a rapid-fire succession of twelve talks and two keynotes, by Debora Donato and Hadar Shemtov, closing with a panel. It was great fun, not least for the chairs!

The program is archived here, with most of the slides.