#twitter, #datascience and the art of #riskmonitoring
July 2016 was one of those months that underlined how volatile and unpredictably cruel the world can be. A terrorist attack in Nice (France) was followed by an attempted coup d’état in Turkey, a shooting rampage in Munich and another violent terrorist attack in the small Bavarian city of Ansbach. It is not only these recent events but the chain of events over the last few years that has led to increased security precautions in cities across Europe and the US. Terrorist attacks, as one-time, high-impact events, do not fall within the realm of general prediction models. Nevertheless, the assessment of security threats has re-emerged at the top of the agenda for executives of governmental agencies and corporations alike. More than ever, they are looking for innovative ways to leverage the bulk of accessible public data to enhance their understanding. Social media data, and Twitter streams in particular, take center stage.
Social media streams are of unique value in detecting security risks, as they are a primary, decentralized source of on-the-ground information. Unlike traditional media, which always involve a time lag due to aggregation processes, Twitter can capture events shortly after they occur and is also suitable for analyzing issues over time. In short, access to Twitter streams can give governmental agencies and corporations an informational edge.
Early Warning System Using Twitter
Social media monitoring has long been common practice within corporations seeking a better view of their target groups. More or less sophisticated software assists in the process and enables companies to define keywords and extract the most relevant information, retweets or influencers. This domain of “reputation and brand management” is fairly mature, varying only in the depth of analysis. Risk monitoring is less straightforward: depending on the stakeholder, risks can differ in nature and scope. Defining risk-associated keywords can be lengthy or outright impossible, and a fixed keyword list takes away the flexibility to monitor emerging issues. The tradeoff between specific monitoring (fast, less computing power) and broad monitoring (lengthy, more computing power) will not cease to exist, so parties need to find an adequate balance. Once that balance is found, there is a solid body of research on the predictability of emerging topics. It is possible to algorithmically detect anomalies in the pattern of occurrence of word clusters, which allows for predictions about the rise and fall of issues. As such, the approach is suitable as an early warning system. In the case of a terrorist attack or train accident, the relevant authorities can be alerted promptly and resources allocated accordingly. Making use of Twitter streams thus turns every smartphone user (or Twitter account holder) into an on-the-ground correspondent, and data science techniques enable organizations to leverage this new reality.
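The anomaly detection described above can be illustrated with a very simple baseline: compare each new count of a keyword (or word cluster) against its own recent history and flag large deviations. The following is a minimal sketch, not a production method; the function name `detect_spikes`, the window size, the threshold and the hourly counts are all assumptions made up for illustration.

```python
from statistics import mean, stdev


def detect_spikes(counts, window=6, threshold=3.0):
    """Return the indices whose count lies more than `threshold`
    standard deviations above the mean of the preceding `window` values."""
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        # Floor sigma at 1.0 so a perfectly flat history cannot cause
        # a division by zero or an exaggerated score.
        sigma = max(stdev(history), 1.0)
        z_score = (counts[i] - mu) / sigma
        if z_score > threshold:
            spikes.append(i)
    return spikes


# Made-up hourly counts of tweets matching one keyword cluster.
hourly_counts = [12, 15, 11, 14, 13, 12, 14, 13, 95, 140]
print(detect_spikes(hourly_counts))
```

A trailing window keeps the baseline adaptive, but it also means that a sustained event quickly becomes the "new normal"; real systems would likely combine this with seasonality handling and longer baselines.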
Improving Crime Prediction with text analytics
Predictive policing has been a focus topic among police departments (primarily in the US, but most recently also in Germany). Standard prediction models take historic crime data sets with information about the type, location and time of a crime and indicate where to focus patrol duty. While some more recent applications also draw on socioeconomic data, most approaches tend to revolve around those three variables. However, researchers from the University of Virginia recently showed that sophisticated linguistic analysis of geospatially and temporally tagged tweets improved the predictive quality of the model.
For 19 of the 25 crime types assessed, the researchers found that their model improved predictive power, among them “Stalking”, “Criminal Damage” and “Gambling”. Their Twitter-specific approach helped them to identify geographically tagged (neighborhood-specific) trending topics. Merging these into the prediction model made it possible to make more granular and accurate predictions about the type and quantity of criminal activity in certain neighborhoods. Notably, the analysis was fueled by the world-class Open Data Portal of the City of Chicago.
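To make the idea of "merging" tweet topics into a crime prediction model more concrete, the sketch below builds one feature vector per neighborhood from a historic crime count plus the share of local tweets touching each topic. This is not the researchers' actual pipeline: the keyword-based topics in `TOPIC_KEYWORDS`, the function names, the neighborhood labels and all numbers are hypothetical, and a real system would learn topics from the data and feed the features into a statistical model.

```python
from collections import Counter

# Hypothetical, hand-picked topics; a real system would learn these.
TOPIC_KEYWORDS = {
    "nightlife": {"bar", "club", "drunk"},
    "property": {"window", "car", "garage"},
}


def topic_shares(tweets):
    """Fraction of tweets mentioning at least one keyword of each topic."""
    hits = Counter()
    for text in tweets:
        words = set(text.lower().split())
        for topic, keywords in TOPIC_KEYWORDS.items():
            if words & keywords:
                hits[topic] += 1
    total = max(len(tweets), 1)  # avoid dividing by zero for empty areas
    return {topic: hits[topic] / total for topic in TOPIC_KEYWORDS}


def build_features(historic_counts, tweets_by_area):
    """Merge historic crime counts with tweet topic shares per neighborhood.

    Each vector is [historic count, share per topic in alphabetical order].
    """
    features = {}
    for area, count in historic_counts.items():
        shares = topic_shares(tweets_by_area.get(area, []))
        features[area] = [count] + [shares[t] for t in sorted(TOPIC_KEYWORDS)]
    return features


# Made-up example data for two neighborhoods.
historic_counts = {"loop": 42, "uptown": 7}
tweets_by_area = {
    "loop": ["Drunk crowd outside the club again", "Someone smashed a car window"],
    "uptown": ["Lovely morning at the lake"],
}
features = build_features(historic_counts, tweets_by_area)
print(features)
```

The resulting vectors could then serve as input to any standard classifier or regression model, with the historic count acting as the baseline signal and the topic shares as the additional Twitter-derived signal.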
Leveraging the geospatial component of Twitter
Another interesting analytical avenue for data science approaches comes with the geospatial information that is attached to a subset of tweets. This precise location data is of relevance for companies or governmental agencies that share an interest in the details of activity surrounding a certain piece of infrastructure (train stations, sights, etc.). While it does not capture all topic-related tweets, the geo-focused approach allows analysts to drill down on various issues within the immediate surroundings and predict emerging trends accordingly.
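A geo-focused filter of this kind can be sketched with the haversine formula, which gives the great-circle distance between two coordinates. The example below is a minimal illustration under stated assumptions: tweets are represented as plain dictionaries with hypothetical `lat` and `lon` keys, the station coordinates are approximate, and the sample tweets are invented.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def tweets_near(tweets, lat, lon, radius_km):
    """Keep only tweets whose coordinates lie within radius_km of a point."""
    return [t for t in tweets
            if haversine_km(t["lat"], t["lon"], lat, lon) <= radius_km]


# Approximate coordinates of Cologne central station, for illustration only.
station = (50.943, 6.959)
tweets = [
    {"text": "Platform is packed", "lat": 50.9432, "lon": 6.9586},
    {"text": "Greetings from Berlin", "lat": 52.520, "lon": 13.405},
]
nearby = tweets_near(tweets, *station, radius_km=0.5)
print([t["text"] for t in nearby])
```

Counting the tweets returned by such a filter per time interval yields a location-specific activity series that can be monitored for unusual volume.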
Anomalies in Twitter activity can be a warning sign regardless of their actual content, and monitoring them is relevant for security organizations and others. Taking the incidents at Cologne central station on New Year’s Eve 2015/16 as an example, monitoring the sheer quantity of tweets could potentially have raised an alarm, even though the content probably varied and it took a while for distinct issues to emerge. These data sets could also be merged with anonymized telco data to arrive at an even more impactful analysis.
Generally, topical, temporal and spatial aspects always play a role when detecting issues and risks. Nevertheless, keeping these different points of departure in mind can help to establish distinct analytical levels, which allow for more fine-grained, preventive monitoring.
Twitter and social media data as a rough diamond
Without a doubt, linguistic and big data text analysis can help governments and corporations to better monitor and navigate the risk landscape. Tweets are a particularly interesting source of information, as they are a timely representation of on-the-ground events.
However promising the upside, Twitter analysis is not an easy endeavour. Due to the character limit, Twitter users tend to use creative abbreviations and invented words. From a natural language processing (NLP) perspective, this poses a challenge. In this setting, the most promising strategy is to start with highly domain-specific NLP approaches, which can gradually be expanded to other domains.
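One common first step in dealing with abbreviations and creative spellings is to normalize tweets before any further analysis. The sketch below shows what a small, domain-specific normalizer might look like; the `SLANG` lexicon is a hypothetical, hand-curated example, and the cleaning rules are deliberately simple assumptions rather than a complete solution.

```python
import re

# Hypothetical, hand-curated lexicon for a single domain.
SLANG = {
    "ppl": "people",
    "b4": "before",
    "stn": "station",
    "pls": "please",
    "u": "you",
}


def normalize_tweet(text):
    """Lowercase, strip Twitter-specific noise and expand known abbreviations."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop links
    text = re.sub(r"@\w+", " ", text)           # drop user mentions
    text = text.replace("#", "")                # keep the hashtag word itself
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)  # "sooooo" becomes "soo"
    tokens = re.findall(r"[a-z0-9']+", text)
    return [SLANG.get(token, token) for token in tokens]


print(normalize_tweet(
    "Sooooo many ppl at the stn!!! @police pls help #Cologne http://t.co/x"
))
```

Starting with a narrow lexicon for one domain keeps the rules manageable; the dictionary can then be extended step by step as monitoring expands to other topics.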
A lot of progress can be expected in this emerging field. Even the U.S. military recently renewed a call for proposals dealing with emerging threat analytics in social media, stating the goal to “tackle […] the challenges of gathering useful information from billions of social media posts generated by millions of users.” In essence, most of the value for the end user will lie in the design of powerful tools that allow for analytical insight into the heavy stream of content generated by users every second. This will contribute to the transformation of risk assessment into a more bottom-up process with top-down validation methods, which will hopefully increase both security and the perception of security in the medium term.