Today, I gave a talk at the EuropeanaTech conference at the Bibliothèque Nationale in Paris. The session was on multilinguality and its challenges for libraries and memory institutions, but I took it upon myself to broaden the scope a bit, arguing that the various aspects of handling new text, including the "Big Data" aspects of information processing, are related to that same challenge. Some are technology-related, such as those of data volume and velocity, and should be solved by engineers. Others are policy-related, such as those of data variety and veracity: in this case, which types of data should be recorded, retained, archived, and made accessible to researchers and the general public? I argued that multilinguality is a policy issue of the same kind, and that libraries and memory institutions should think carefully about the challenges they intend to meet and the queries and requests they intend to fulfil over the next century or so.
It is probably not necessary to record every tweet, just as today not every phone conversation is recorded for the future, but it may be interesting to record their impact or effect, and the change over time in the stream. Not everything written should be retained.
Slides are here.