Today, I had the honour of giving one of the keynote talks at Nordiskt statistikermöte, where the slogan for the day was
“I’m a statistician. To save time, let’s assume I am never wrong.”
I asked the audience of Nordic statisticians to request hypotheses and more motivated feature sets from their clients, and not to agree to fishing expeditions. Most seemed to agree with that thought!
Slides from my talk are here.
The 39th Annual ACM SIGIR Conference on Research and Development in Information Retrieval was held in Pisa, Italy in July 2016. The Industry Day program on July 20, chaired by Gilad Mishne of Color Genomics and myself, was a rapid-fire succession of twelve talks and two keynotes, by Debora Donato and Hadar Shemtov, and ended with a panel. It was great fun, not least for the chairs!
The program is archived here, with most of the slides.
I had the honour to officiate as an opponent at the public defense of Johan Eklund’s doctoral dissertation “With or without context: automatic text categorization using semantic kernels” at the Swedish School of Library and Information Science in Borås. Johan used topological models to work out the scope of categorisation approaches, and then performed a number of experiments for text categorisation using variants of distributional approaches.
There were a number of methodological and formal questions to discuss.
- How are the topological models applicable in practice?
- Is a hierarchical knowledge model a reasonable model of human knowledge organisation?
- Johan used only single-term lexical features, aggressively filtered for efficiency. What might happen if we used non-lexical features?
- Johan used standard data sets. The data set quite obviously influenced the result and should be viewed as a parameter for this type of experimentation. How does the palette of categories influence the result? (This was not tested.)
- Is the F1-score as useless a metric as I myself consider it to be? (Johan did not convince me of its usefulness, but neither did I convince him of its uselessness.)
Anyone interested in the mathematics of categorisation will be inspired by the models delineated in Johan’s dissertation!
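For readers who have not met the metric debated above: the F1-score is the harmonic mean of precision and recall. The sketch below is only an illustration with made-up counts, and the function name `f1_score` is my own. It shows one commonly raised objection (not necessarily the one argued at the defense): two classifiers with opposite error profiles can receive exactly the same score, and true negatives never enter the calculation.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to make the point.
# "cautious" never raises a false alarm but misses half of the relevant items.
cautious = f1_score(tp=50, fp=0, fn=50)   # precision 1.0, recall 0.5
# "eager" finds every relevant item but half of what it returns is wrong.
eager = f1_score(tp=50, fp=50, fn=0)      # precision 0.5, recall 1.0

# Both evaluate to 2/3, although the two systems behave very differently.
print(cautious, eager)
```

Whether that symmetry is a flaw or a feature depends on what the categoriser is to be used for, which is presumably why neither side of the discussion was persuaded.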
I had the pleasure of giving a talk to the students at Hyper Island, at the invitation of Kelly Smith, on the sort of things we work with at Gavagai. Slides are here (in Swedish).
The 2nd edition of the SEXI workshop was held in conjunction with the 2016 WSDM conference in San Francisco. (The first edition was held at the 2013 WSDM in Rome.) The discussions ranged from filtering content to suit specific audiences to notions of appeal, pleasure, and affect as target measures for evaluating interaction. Eventually some ideas for a shared task were crystallised. A lesson we debated with some interest was why we had such trouble attracting corporate speakers, even though many of the large internet companies are quite engaged in technology with relevance for the topic. The interest was considerably higher at the event in Rome; cultural differences between Europe and the US were thought to have some effect on this. The response from one company which openly works with adult material as a matter of course was
“Thanks for getting in touch. While this sounds very interesting, we are
unfortunately unable to participate.
Thank you and best of luck with the workshop!”
Which of course is somewhat disappointing.
It would seem to be the responsibility of technologists not to be wary of understanding human behaviour if they are to provide better services for those same humans!
I had the pleasure of giving a guest seminar at CERAS at Stanford University on how we use Random Indexing at Gavagai to work with large amounts of text. The seminar was hosted by Love Björnsson, who has very interesting data sets which we plan to work with to measure the level of authority and argumentation in topically coherent text.
I was glad to have both Pentti Kanerva, who is the originator of Random Indexing, and Martin Kay, who supervised my internship on text analysis and translation in 1991, in the seminar! Slides are here.
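For readers unfamiliar with Random Indexing, here is a minimal textbook-style sketch of the basic idea. It is not the Gavagai implementation. Every term gets a fixed sparse random index vector, and a term's context vector is the running sum of the index vectors of the terms seen near it. The dimensionality, the number of non-zero entries, the window size, the toy corpus, and all function names are assumptions made for the illustration.

```python
import random
from collections import defaultdict

DIM = 512      # dimensionality of all vectors (assumed value)
NONZERO = 8    # number of +1/-1 entries per index vector (assumed value)


def make_index_vector(rng):
    """Sparse ternary index vector, stored as {position: +1 or -1}."""
    positions = rng.sample(range(DIM), NONZERO)
    return {p: (1 if i % 2 == 0 else -1) for i, p in enumerate(positions)}


def train(sentences, window=2, seed=0):
    """Accumulate a context vector per term from a list of tokenised sentences."""
    rng = random.Random(seed)
    index = {}                                # term -> fixed random index vector
    context = defaultdict(lambda: [0] * DIM)  # term -> accumulated context vector
    for tokens in sentences:
        for tok in tokens:
            if tok not in index:
                index[tok] = make_index_vector(rng)
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    # Add the neighbour's index vector to this term's context vector.
                    for pos, sign in index[tokens[j]].items():
                        context[tok][pos] += sign
    return context


def cosine(u, v):
    """Cosine similarity between two dense vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0


# Toy corpus: "cat" and "dog" occur in identical contexts, "fell" does not.
corpus = [
    ["the", "cat", "drinks", "milk"],
    ["the", "dog", "drinks", "milk"],
    ["stocks", "fell", "sharply", "today"],
]
vectors = train(corpus)
print(cosine(vectors["cat"], vectors["dog"]))
print(cosine(vectors["cat"], vectors["fell"]))
```

Because the index vectors are fixed in advance and the context vectors are only ever added to, the model can be updated one sentence at a time without recomputing anything, which is what makes the approach attractive for large and growing text collections.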
This year, as in previous years, I have given two lectures on evaluation (and some thoughts on application) of information access. Slides are here for the first lecture and here for the second lecture.