The Power of Text

Take public transportation on a regular basis and one observation is hard to miss: almost everyone is on their smartphone, checking apps, social media and the news. As “texting” has become so common in our lives, the importance of “text” has steadily increased. Understanding text in all its dimensions is also the basis for conversational AI.

Text – be it in books, journals, chat protocols or any other shape or form – is extremely powerful data. It holds enormous amounts of information, not only about facts and knowledge but also about language, the way we converse with each other. The rise of the smartphone has accelerated the growth of the available text corpus. Everyone who engages in daily chats with family and friends is now a contributor to the foundation of modern text analysis. Thousands of researchers, startups and corporations are trying to make sense of this ever-growing body of text and derive tools that can help automate repetitive tasks in certain areas of life: travel and ordering food are just the most obvious ones. Even in more minor aspects of our lives, we are exposed to snippets of the power of text analytics: when checking email, Gmail will often draft the three most likely responses for you to choose from.

But as the text corpus grows at a staggering speed, the advances of modern text analytics, or natural language processing (NLP) as it is often called, are hard to judge from the outside. This is because mastering “text” and “language” is an extremely hard problem, and we tend to evaluate attempts at it against the highest standards, since language is at the core of our identity. Ask a new, innovative chatbot a tricky question and it might not have the right answer (even though it might for 80% of standard questions in a given domain), and you’ll immediately laugh: “How stupid must this chatbot be not to understand my simple question?” While that is a reasonable (and understandable) position to take, and multiple screenshots of rather absurd human–chatbot conversations have gone viral on Twitter and other social media, it lacks a basic appreciation of the extreme difficulty of mastering text.

To build that appreciation, consider that all NLP algorithms (so everything that is called conversational AI) operate on the raw components of text, essentially combinations of letters in strings of varying length. Any layer of meaning or understanding remains human. Let’s take two examples to see what makes text so difficult for computers to understand:

Double negation: I don’t think you can’t buy fish in this shop.

While this might be a rather uncommon phrase in English (such constructions are much more common in languages like German), it serves as a good example. We immediately understand that the person expressing this thought actually does think that one can buy fish in the shop in question. A computer approaching this sentence, however, would spot the words “don’t think” and attach some meaning to them, learned from other sentences containing the same sequence. If this sequence were used as the key indicator, the other relevant content words of the sentence – “buy”, “fish” and “shop” – would be stored with the tag “don’t think”. Now imagine you ask a chatbot trained on this data: “Do you think I can buy fish in this shop?” The question would be answered negatively, because the computer has learned that “don’t think” is associated with this word combination. Certainly, one could implement “rules” that try to spot these double negatives. But then the structure of the sentence comes into play. What about long sentences? Weird sentences?
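To make this concrete, here is a minimal sketch in Python of the naive keyword approach described above. Everything here – the “training” step, the signal phrase, the content words – is invented for illustration; real systems are far more sophisticated, but the failure mode is the same:

```python
def learn_tags(sentence, signal_phrases, content_words):
    """Naively associate each content word with any signal phrase in the sentence."""
    tags = {}
    lowered = sentence.lower()
    for word in content_words:
        if word in lowered:
            for phrase in signal_phrases:
                if phrase in lowered:
                    tags[word] = phrase  # e.g. "buy" -> "don't think"
    return tags

# "Training" on the double-negation sentence from above.
tags = learn_tags(
    "I don't think you can't buy fish in this shop.",
    signal_phrases=["don't think"],
    content_words=["buy", "fish", "shop"],
)

def naive_answer(question, tags):
    """Answer negatively if the question's content words carry a negative tag."""
    lowered = question.lower()
    for word, phrase in tags.items():
        if word in lowered and "don't" in phrase:
            return "No"
    return "Yes"

print(naive_answer("Do you think I can buy fish in this shop?", tags))  # -> No
```

The bot answers “No” even though the double negation means the opposite, because “buy”, “fish” and “shop” were all tagged with “don’t think” during training.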

Jokes: “Yes, I would totally vote for Trump”

Imagine it’s 2016 and you are chatting on WhatsApp (Facebook Messenger, Telegram or whatever platform you use to communicate) about the upcoming US presidential election. You exchange links about a recent Trump scandal and your friend exclaims, “Yes, I would totally vote for Trump”, which – from the context of your conversation and from knowing him – you understand to be a humorous comment. Now consider an algorithm: at face value, this sentence looks identical to a statement quoted in a newspaper report from a Trump rally: “Yes, I would totally vote for Trump”. The ironic tone of the sentence would have to be derived from context.

Certainly, researchers have data sets of “jokes”: millions of sentences labeled as jokes or ironic comments. On such a data set, you can train algorithms to give probabilistic assessments of whether any given sentence is a joke. However, imagine conversing with a chatbot that mistakenly responds to your serious sentence as if it were a joke.
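The probabilistic approach can be sketched in a few lines of Python. The micro-corpus below is entirely made up, and real classifiers use millions of labeled sentences and far richer features than word counts – this is only meant to show the shape of the idea:

```python
from collections import Counter

# Invented micro-corpus; real data sets contain millions of labeled sentences.
labeled = [
    ("yes i would totally vote for trump", "joke"),
    ("sure and pigs can totally fly", "joke"),
    ("oh great another monday totally my favorite", "joke"),
    ("i will vote in the election tomorrow", "serious"),
    ("the shop sells fresh fish every day", "serious"),
    ("the rally takes place on friday", "serious"),
]

def train(labeled):
    """Count word occurrences per label."""
    counts = {"joke": Counter(), "serious": Counter()}
    totals = Counter()
    for sentence, label in labeled:
        words = sentence.split()
        counts[label].update(words)
        totals[label] += len(words)
    return counts, totals

def p_joke(sentence, counts, totals, smoothing=1.0):
    """Naive-Bayes-style estimate of P(joke | words), with a uniform prior."""
    vocab = set(counts["joke"]) | set(counts["serious"])
    score = {"joke": 1.0, "serious": 1.0}
    for label in score:
        for w in sentence.lower().split():
            score[label] *= (counts[label][w] + smoothing) / (
                totals[label] + smoothing * len(vocab)
            )
    return score["joke"] / (score["joke"] + score["serious"])

counts, totals = train(labeled)
print(p_joke("i would totally vote for trump", counts, totals))  # high, > 0.5
```

With this toy data, the Trump sentence scores as a likely joke – which is exactly the problem: the same classifier would also flag the earnest rally-goer’s identical sentence, because the words alone carry no irony.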

Just these two basic examples give a strong glimpse into the challenge NLP researchers face when developing the algorithms that are supposed to power conversational AI. Language is our most powerful form of expression; it gives us the freedom to verbalize our thoughts. We have almost unlimited degrees of freedom when composing our sentences. Algorithms are smart, and they are getting smarter every day. However, mastering text (and voice as well) remains the holy grail of artificial intelligence.

It is fairly easy to belittle the current state of productized NLP given our supreme mastery of our own language. The challenges with text are manifold. But it will be thrilling to see how far research in NLP and text analytics goes over the course of the next five to ten years. The bigger the challenges, the bigger the opportunity.