In the spring of 2023 I will be teaching a class at the Department of Linguistics in Helsinki, together with Timothee Mickus, under the fairly general and demanding header “Approaches to Natural Language Understanding”. We will mostly talk about approaches to knowledge representation of language and have several labs for students to experiment with.

L0: https://docs.google.com/presentation/d/1EOPs-q1JvXHo_jbOPVmgWTN_yl_-yCpYOvECsBvrYBw/edit?usp=sharing

L1: https://docs.google.com/presentation/d/1YO374ZDfBt6GCd4WaDo6DjaWcpRwxdgIk9S-VmaIN1o/edit?usp=sharing

Reading material:

L2: Evaluation and Benchmarks and Shared Tasks

L9: Podcast dataset

L10: Audio features

L13: Research directions

Jussi Karlgren and Pentti Kanerva: Semantics in High-dimensional Space


TREC 2020: The Podcast Track

I came back to TREC for the first time since 1999 (CLEF started then, and that took over most of my attention). This year, I was one of the organisers of the Podcast Challenge, which involved retrieval and summarisation of data from the 100 000 Podcast Data set we released for this purpose. We expect great things to happen with this data set: in some ways, the current state of the speech and podcast analysis field is quite similar to that of text and social media analysis in the mid-nineties! The scale of the data set is daunting, the features we want to work with are not quite settled (we expect to see great results from peering into the audio, something no one did this year), the use case we aim for is not entirely determined, and the medium of podcasts is in its infancy and is likely to develop and change rapidly in the coming years! It’ll be right exciting to be here to see what comes next!

Panel on AI and responsibility

I was honoured to be asked to participate as a discussion facilitator for the 35th conference on IT and Law held in digital form on November 11-12. There are a number of interesting questions to do with how law meets autonomous decision making: how can responsibility be distributed when the operator of a system has less expertise than was previously typical? How can we address the question of invisible harm, when automatic decision making systematically causes some slight disadvantage to some of us? How can we act on the principles of editability and transparency in the face of information imbalance? This discussion was not the last word on these issues!

CLEF 2020

This is the first year I did not travel to a CLEF meeting. Participation over a video link does work, and some of the interesting presentations came across quite well, but the full experience was somewhat curtailed. Hoping to see the world back on rails again next year!

CLEF 2019

At this year's CLEF in Lugano I presented a poster on How Lexical Gold Standards Have Effects On The Usefulness Of Text Analysis Tools For Digital Scholarship, presented work on detecting signs of eating disorders in social media posts done by my student Elena Fano as her master's thesis, and, at the newly instituted industrial session, discussed thresholds for adopting systematic evaluation schemes in operational settings. This last presentation was largely based on the chapter on these sorts of things in the new CLEF book.

Talk at UC Davis

I was honoured to be invited to UC Davis, a short train ride from Stanford, by Raul Aranovich to give a talk on “Hyperdimensional computing for human data meets the squinting linguist” or “Explicitly encoded high-dimensional semantic spaces used for authorship profiling” at the linguistics department there! Slides are here.

Webinar for ACM

I gave my first webinar ever, for the ACM, at the invitation of Rose Paradis. The title was An encoding model for hypothesis driven research on large heterogeneous data streams (OR “The squinting linguist meets hyperdimensional computing”).

Giving a webinar was a strange experience: talking to an audience of more than a thousand people but not seeing them in the room. It will take some time to get used to this sort of thing! Slides are here and the talk itself is published by the ACM on video, which is a bit strange since it is mostly audio (plus the slides, of course).