I was invited to give a lecture on more or less anything I might want to talk about to humanities students at the Humanities Day (humanistdagen) at Stockholm University, at the department where I took both my undergraduate and doctoral degrees. I tried to put on a stern face and talk about what humanists should be doing instead of what they actually do when they come across a computer. Some slides here!
Learning to Generate Reviews and Discovering Sentiment. Alec Radford, Rafal Jozefowicz, Ilya Sutskever. https://arxiv.org/abs/1704.01444
In this paper (apparently only published through arXiv, so not carefully reviewed by anyone just yet) the authors present an intriguing result. They build a neural-inspired model (an LSTM, a fairly standard one) which predicts the next byte in a text, given the bytes it has already seen. They train the model on product reviews, and then use it as input to a simple classifier. The model, in spite of being trained on characters rather than words, does very well at classifying the sentiment of product reviews (better than many standard lexical models, for example)! The authors even find (to their own delight) an indicator cell specifically for sentiment, and show how it tracks sentiment along the progression of the text. This may seem strange, but there is a fairly reasonable hypothesis to explain the result: there is more to sentiment than lexical resources can model. This model appears to capture signal which is encoded in something more than the sequence of words.
In general, coercing most everything about language into lexical models (as recent results have done) is fixing the representation on one analysis level which happens to be accessible due to the nature of our writing system. Breaking this strong binding is probably a good idea.
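To make the task concrete, here is a minimal sketch of what "predict the next byte, given the ones already seen" means. It is not the authors' model: in place of the LSTM it uses plain bigram counts over bytes, and the function names `train_next_byte_model` and `predict_next_byte` are my own, made up for illustration. The point is only that nothing in the training loop knows what a word is.

```python
from collections import Counter, defaultdict


def train_next_byte_model(texts):
    """Count, for each byte value, which byte values follow it in the texts.

    A bigram count table standing in for the paper's LSTM (an assumption
    made for brevity, not the authors' method).
    """
    counts = defaultdict(Counter)
    for text in texts:
        data = text.encode("utf-8")
        # Pair every byte with its successor; no tokenisation into words.
        for prev, nxt in zip(data, data[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next_byte(counts, context):
    """Return the most frequent successor of the last byte in context.

    Returns None if the context is empty or its last byte was never
    seen with a successor during training.
    """
    data = context.encode("utf-8")
    if not data or data[-1] not in counts:
        return None
    return counts[data[-1]].most_common(1)[0][0]


if __name__ == "__main__":
    model = train_next_byte_model(["great product", "great price"])
    guess = predict_next_byte(model, "g")
    print(None if guess is None else chr(guess))
```

A bigram table only looks one byte back, whereas the LSTM in the paper carries a state across the whole text; that state is where something like a sentiment indicator could live.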
Deese, James. 1962. On the structure of associative meaning. Psychological Review, Vol 69(3), 161-175. http://dx.doi.org/10.1037/h0045842
Deese, who has published extensively on association norms and the methodology of eliciting associations, discusses here what sort of relation the terms in associative pairs might have. Most of the paper argues that the methodology used so far has been faulty, and Deese’s contribution is to bring frequency properly into the model. He also discusses associative relations in terms of replaceability and combinability, and in terms of the asymmetry between items on different levels of hyponymy.
Deese posits (referring to previous work by Woodworth, Ebbinghaus, and Galton, to which I expect I might return further on) that the associative relation between elicited terms is not one of meaning in the way meaning is usually understood. (Woodworth, per Deese, classifies (or probably grades) words by both meaning and meaningfulness; this needs to be looked up properly.) The associative relation is not readily mappable onto known lexicogrammatical relations.
I will try to make notes of interesting papers I read from now on. Instead of scribbling in the margins of other papers, napkins, and post-it notes, I will scribble here.
This morning, I gave a presentation to the workshop on Supporting Complex Search Tasks on how we at Gavagai handle complex information needs. Mostly I claimed three things:
- Complexity is not necessarily in the formulation of the information need. Most of our customers perceive themselves as having simple information needs. Or at least those needs are simple to formulate in informal language. We believe an information system should accommodate this, and if needs indeed are complex or change, allow simple and painless reformulation.
- The greatest challenge is attention to new information: introducing new information aggregation tools will add business complexity, not reduce it.
- Evaluation of information systems in the way it is done in academia is good for assessing progress on the cutting edge. Industry has a greater need for establishing Best Practice guidelines and for satisficing technology needs than for optimising them.
Slides for my talk on Complex aspects of seemingly simple information needs.
In interesting discussions after the initial presentations, the workshop discussed the need for a quality assessment of data collection methodology. We expect to suggest such a procedure for next year’s edition of this workshop.
On December 2, Evangelos Kanoulas and I organised a workshop on “Practical Issues in Information Access System Evaluation”, intending to collect and discuss industrial voices and experiences on how evaluation proceeds in such contexts. The workshop was well attended: about thirty participants from industry, academia, and combinations of the two met for a full day to discuss these issues. The workshop was kindly hosted by the BCS and funded by a contribution from the ELIAS program. A report is forthcoming.
I spent these past two weeks visiting Stanford University, at the invitation of Martin Kay, thinking about how to enrich distributional models to handle more situational factors, about which features of an utterance encode the aspects of semantics that might be most relevant to situational models, and about various other ideas tangentially connected to these thoughts. I hope to return for a longer visit next year. (As it turns out, these two weeks were well chosen, in view of the inclement weather back home in Stockholm.)