TwiNL: A deeper understanding of society.

The ultimate goal of the TwiNL project is to offer language researchers a resource that they have always dreamed of. We archive Twitter. This archive is a collection of language that can be traced to the individual language user. We can also infer from the data how old users are, what their gender is, and where they were when they tweeted. This is crucial data for answering crucial theoretical questions.

There is also a TwiNL project website where everybody can
look up statistics about Dutch tweets. We could do this before, but what is new is that, since we have an archive, you can also look up tweets from a longer time period. An advantage of this for researchers is that we can now build models which are more fine-grained. So instead of collecting information about the Netherlands as a whole, we can zoom in on smaller locations, like a municipality.

The average person on Twitter is young. So sociolinguists, the people who study the use of language
in groups and the formation of languages within groups, are very interested in this new archive. Until now they have only been able to do small studies, on the street, talking to people. And this is just sitting in your chair and searching in this big archive.

eScience has two roles in the TwiNL project. The first is finding algorithms for searching
through large quantities of tweets, which you do with eScience algorithms on a parallel machine. The second is to find visualization techniques that are well suited to showing large volumes of data to users.

An interesting example is following the weather. In January 2013 Holland was hit by a snowstorm. So what we did was search Twitter for the Dutch word for snow (sneeuw) and check where it was mentioned.

Another interesting application would be to follow
diseases like the flu or hay fever. What you could do is check where in the Netherlands people use these words, and then you could see exactly where the disease starts. You could also follow it along a time path or a location path to see how it develops.

A concrete example of using the TwiNL approach
is predicting events: events that are not planned and not known beforehand. But clearly a group of people, like a group of hooligans, is planning, by sending tweets to each other, to cause some problems somewhere. They will not be explicit; they will make indirect references: Who is going to drive? Who is going to pick up? It is this type of implicit language clue that we try to extract from these massive streams of tweets. In the end we can use all of these clues to make fairly exact predictions of when the event is going to happen.

The TwiNL project was proposed by three partners: the Netherlands eScience Center, SURFsara, and Radboud University. It has already inspired researchers because
we solved a Big Data problem for them. This has inspired research in linguistics, for example by people who want to know the spread of dialect words: where particular dialect words are used in the Netherlands and in Belgium.

eScience, in TwiNL for example, is crucial
for moving our science forward, because a lot of digital data is coming our way: language data. We need to handle those large amounts with supercomputing. The next step is to analyse these data with algorithms that do natural language processing, and we need supercomputers for that as well. These big streams of social media data and other textual archives are coming out of our ears, so to speak. So we need big computational power and analysis, and storage, and memory. We need it all.
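
The snow example mentioned earlier (searching tweets for sneeuw and checking where the word was mentioned) could be sketched roughly as follows. This is only a minimal illustration, not the TwiNL implementation: the tweet records, the `location` field, and the function name `count_mentions_by_location` are all assumptions made for this example, and a real archive would be searched in parallel rather than in a single loop.

```python
import re
from collections import Counter


def count_mentions_by_location(tweets, keyword):
    """Count, per location, the tweets that contain `keyword` as a whole word.

    `tweets` is assumed to be an iterable of dicts with a "text" and a
    "location" key; this record layout is hypothetical.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    counts = Counter()
    for tweet in tweets:
        if pattern.search(tweet["text"]):
            counts[tweet["location"]] += 1
    return counts


# Made-up sample data, purely for illustration.
sample_tweets = [
    {"text": "Wat een sneeuw vandaag!", "location": "Nijmegen"},
    {"text": "Sneeuw op de fiets, glad!", "location": "Nijmegen"},
    {"text": "Lekker weer hier", "location": "Maastricht"},
    {"text": "Eindelijk sneeuw", "location": "Utrecht"},
]

snow_counts = count_mentions_by_location(sample_tweets, "sneeuw")
print(snow_counts.most_common())
```

The same counting could be grouped by day instead of by location to follow a word such as a disease name over time.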


About the Author: Oren Garnes
