Dolphins at VIHAR

Today our position paper "A proposal to use distributional models to analyse dolphin vocalization", outlining our plans to work with dolphin vocalisation using distributional semantics, was presented at the 1st International Workshop on Vocal Interactivity in-and-between Humans, Animals and Robots. The paper was presented by Mats Amundin; co-authors were Robert Eklund, Henrik Hållsten, and Lars Molinder, who came up with the original idea.


Transaction transparency and information imbalance

Today I and a number of friends ("seven IT and media commentators") publish a piece on transaction transparency, something we see as desirable, on DN Debatt. We contrast it with the information imbalance we see arising between information-intensive organisations and the individuals who deal with them as customers or as citizens. Individuals do not see the value of the data they share, nor can they assess that value without access to tools and quantities of data comparable to what the organisation itself has. This is not going to happen. As a counterweight, we propose that companies and organisations which place a value on information about their customers, or about other individuals they interact with, state the value they consider that information to have as part of their financial reporting. This would give customers a way to assess the value of the data they have shared.

Visiting Scholar at Stanford

This coming academic year of 2017-18 I will be at Stanford University, at its Department of Linguistics. I am looking forward to tugging at some of the most interesting loose ends from the past few years of technology development at Gavagai, in the hope of finding interesting seams to work on!
Professor Martin Kay, who hosts my visit, took me in for an internship at Xerox PARC in 1991. Now he will again be pointing out the best directions to pursue.

Stanley Greenstein defends his PhD dissertation on predictive modelling

Today I had the pleasure of witnessing the public defense of Stanley Greenstein's PhD dissertation on the legal implications of predictive modelling, "Our humanity exposed — Predictive modelling in a legal context", for which I was a co-supervisor on technical matters.

In his dissertation, Stanley gives an inventory of several legal frameworks which might be relevant to the effects predictive modelling can have on an individual. He discusses the risk of "potential harm": harms which an individual might not even be aware have occurred, such as a somewhat higher interest rate or insurance premium, or not being selected for a job. He examines how European regulations on data protection and human rights apply to understanding such harms, and focuses on the target notion of "empowerment" as a legal concept for addressing the information imbalance between large organisations and individuals.

Lecture for humanities students

I was invited to give a lecture on more or less anything I might care to talk about, for humanities students on Humanities Day at Stockholm University, at the department where I took both my undergraduate and doctoral degrees. I tried to put on a stern face and talk about what humanities scholars ought to do, instead of what they actually do, when they encounter a computer. Some slides here!

Sentiment analysis in a character-level model

Learning to Generate Reviews and Discovering Sentiment. Alec Radford, Rafal Jozefowicz, Ilya Sutskever.

In this paper (apparently only published through arXiv, so not yet carefully peer-reviewed) the authors present an intriguing result. They build a neural model (an LSTM, a fairly standard one) which predicts the next byte in a text, given the ones it has already seen. They train the model on product reviews, and then use its representations as input to a simple classifier. The model, despite being trained on characters, does very well on classifying the sentiment of product reviews, e.g. better than many standard lexical models! The authors even find, to their own delight, an indicator cell specifically for sentiment, and show how it tracks sentiment over the progression of the text. This may seem strange, but there is actually a fairly reasonable hypothesis to explain the result: there is more to sentiment than lexical resources can model. This model appears to capture signal which is encoded in something more than the sequence of words.
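The pipeline described above — a byte-level language model whose hidden state feeds a simple classifier — can be sketched as follows. This is a minimal illustration with a tiny, untrained LSTM and random weights, not the authors' actual model (which was a large multiplicative LSTM trained on Amazon reviews); all names and sizes here are illustrative assumptions, and the point is the architecture, not the accuracy.

```python
import math
import random

random.seed(0)
HIDDEN = 8    # tiny hidden size, for illustration only
VOCAB = 256   # one "token" per byte, as in the paper

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class ByteLSTM:
    """A bare-bones LSTM that reads a text one byte at a time."""

    def __init__(self):
        self.emb = rand_matrix(VOCAB, HIDDEN)
        # one weight matrix and bias per gate (input, forget, output, candidate),
        # applied to the concatenation [byte embedding ; previous hidden state]
        self.W = {g: rand_matrix(HIDDEN, 2 * HIDDEN) for g in "ifoc"}
        self.b = {g: [0.0] * HIDDEN for g in "ifoc"}

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def step(self, byte, h, c):
        x = self.emb[byte] + h  # list concatenation: [embedding ; h_prev]
        gates = {}
        for g in "ifoc":
            pre = [sum(w * v for w, v in zip(row, x)) + b
                   for row, b in zip(self.W[g], self.b[g])]
            act = math.tanh if g == "c" else self._sigmoid
            gates[g] = [act(p) for p in pre]
        # standard LSTM cell update
        c = [f * cp + i * cc for f, cp, i, cc
             in zip(gates["f"], c, gates["i"], gates["c"])]
        h = [o * math.tanh(cv) for o, cv in zip(gates["o"], c)]
        return h, c

    def encode(self, text):
        h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
        for byte in text.encode("utf-8"):
            h, c = self.step(byte, h, c)
        return h  # the final hidden state serves as the review's feature vector

def sentiment_score(h, weights):
    """The 'simple classifier': logistic regression over the hidden state."""
    z = sum(w * v for w, v in zip(weights, h))
    return 1.0 / (1.0 + math.exp(-z))  # probability of positive sentiment

lstm = ByteLSTM()
clf_weights = [random.uniform(-1, 1) for _ in range(HIDDEN)]
score = sentiment_score(lstm.encode("Great product, works perfectly!"), clf_weights)
```

In the paper, the analogue of `clf_weights` is fit on labelled reviews while the LSTM itself is trained only on next-byte prediction; the "sentiment cell" is one coordinate of `h` that turns out to carry most of the signal.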

In general, coercing most everything about language into lexical models, as much recent work has done, fixes the representation at one level of analysis — a level which happens to be accessible mostly due to the nature of our writing system. Breaking this strong binding is probably a good idea.