Why you should be sceptical about the rise of AI startups

Barely a week passes without the announcement of yet another (seed) funding round for a startup that claims to use “artificial intelligence”, “deep learning”, “machine learning” or “proprietary algorithms”. Algorithms, it seems, are at the core of almost every venture these days. But is that really the case?

Reading the news in the industry’s go-to publications, such as TechCrunch or VentureBeat, one is certainly inclined to think that without algorithms, without some artificial intelligence at the core of the product, there is no chance of receiving funding from the top venture capital firms. After all, artificial intelligence screams competitive advantage.

However, it is worth asking critically how much impact “artificial intelligence” or “algorithms” can have in the early stages of a venture, by reflecting on two issues: (1) Volume of Accessible Data and (2) Development versus Application of Algorithms. This does not imply that most AI startups are a hoax, but it should prompt a critically informed view when trying to make sense of the AI hype out there.

Volume of Accessible Data

Writing an algorithm is a piece of cake. An algorithm, at its core, is a rule of action: if this happens, then that should be the consequence. The catch: “this” can be extremely complex. But if you implement a segmentation of your customer database along the lines of “if the customer is older than 49, send him/her a product offer for red wine; if not, send an offer for white wine”, that is technically already an algorithm.
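
To make the point concrete, the wine rule fits in a few lines of Python. This is a minimal sketch: the function name `wine_offer` and the sample customers are hypothetical, and the threshold of 49 is simply the arbitrary cut-off from the example.

```python
def wine_offer(age: int) -> str:
    """Hand-written segmentation rule: one hard-coded cut-off, no data involved."""
    # "Older than 49" means strictly greater than 49; the value itself is arbitrary.
    if age > 49:
        return "red wine"
    return "white wine"


# Hypothetical customers, for illustration only.
customers = [("A", 35), ("B", 62), ("C", 49)]
for name, age in customers:
    print(name, wine_offer(age))
```

Nothing in this rule was learned from data, which is exactly why it says nothing yet about whether the offers will convert.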

Now, in this case you have arbitrarily set the cut-off at 49 (why should a 49-year-old customer not receive this offer?) and also assigned a more or less random product. Clearly, there is a difference between “an algorithm” and “a good algorithm”.

The essential difference: a good algorithm has been developed and tested on a large dataset and has proven to generate good results. How would we judge the quality of the algorithm in the example above? Its conversion rate (the actual percentage of customers who then bought the product) should be significantly higher than that of randomly generated offers. But to arrive at a valid segmentation of the customer base, one needs access to a large volume of transactional data. What customers with certain characteristics have and have not bought in the past: these data points form the input for calibrating and fine-tuning the algorithm. However, startups that have been on the market for only a few months rarely have this volume of accessible data. So whenever an AI startup asks for funding, ask how much (unique) data it has. Most likely, data will be its defensible competitive advantage.
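
Whether a conversion rate is “significantly higher” than a random baseline can be checked with a standard two-proportion z-test. The sketch below assumes that test (one of several reasonable choices) and uses a hypothetical helper, `conversion_lift_z`; the counts in the usage lines are invented for illustration, not real campaign data. Running the same conversion rates on a small and a large sample shows why the volume of data matters.

```python
import math


def conversion_lift_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """One-sided two-proportion z-test: does group A convert better than group B?

    Returns the z statistic and the one-sided p-value, using the pooled standard error.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: targeted offers convert at 12%, random offers at 8%.
z_small, p_small = conversion_lift_z(12, 100, 8, 100)      # small customer base
z_large, p_large = conversion_lift_z(120, 1000, 80, 1000)  # ten times as many customers
print(f"n=100:  z={z_small:.2f}, p={p_small:.3f}")
print(f"n=1000: z={z_large:.2f}, p={p_large:.4f}")
```

With identical rates, the small sample gives a z of roughly 0.9, which is no evidence that the rule beats random offers, while the tenfold sample gives a z of roughly 3. The algorithm did not change; only the amount of data did.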

What is an acceptable amount of data? Tough to say, as it depends on its nature. Generally, the less structured the data (text, for example), the more data you need to distill valid patterns and rules of action. If the startup claims to have natural language processing algorithms in place, be aware that they should be based on a large corpus of relevant text. If data is limited, the crucial intellectual task is to determine how essential the algorithm is to the business model. If it turns out it actually isn’t (or the team is just too great), then it seems feasible to move along anyway. If it actually is, then… well.

Development versus Application of Algorithms

VCs are in the game of spotting competitive advantages. Simply claiming the existence of “proprietary algorithms” is not enough to convey such an advantage. If one digs deeper into how and where algorithms are developed, one finds that frontier algorithmic development takes place in only a few institutions: university research labs, the corporate AI labs of Google, Facebook and other tech giants, and a handful of other innovation hubs.

These companies and institutions have access to large datasets (images, text, etc.) and thus a prime advantage in developing new, high-performing algorithms. Most other companies actually focus on applying existing algorithms well. That makes sense, as developing algorithms is an extremely lengthy and uncertain endeavour, and in a highly competitive environment corporations often lack the time and cash to invest in it. Moreover, applying algorithms is a tough nut in itself. While “off-the-shelf” solutions exist for many problems, getting the dataset into the right shape, twisting and turning the data to make it suitable for the algorithm, and then validly testing the results against other possible algorithms is a science in itself. And since most less R&D-intensive companies focus on this area, it becomes evident that the real value creation lies in owning unique datasets (see the first point) and in having the data science skills to leverage this data for the company’s commercial strategy.
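
The “validly testing” step can be illustrated with the wine example: calibrate the cut-off on one part of the historical transactions, then compare it with the fixed 49-year rule on records the calibration never saw (a simple holdout split). This is a minimal sketch under stated assumptions: the helper names (`rule_accuracy`, `fit_cutoff`, `holdout_compare`), the 70/30 split and the synthetic data, in which customers over 40 are assumed to prefer red wine, are all hypothetical.

```python
import random

Record = tuple[int, bool]  # (age, bought_red_wine) from past transactions


def rule_accuracy(records: list[Record], cutoff: int) -> float:
    """Share of records where the rule 'age > cutoff -> red wine' matches the purchase."""
    hits = sum((age > cutoff) == bought_red for age, bought_red in records)
    return hits / len(records)


def fit_cutoff(records: list[Record], candidates: range) -> int:
    """Pick the cut-off that best reproduces past purchases (training data only)."""
    return max(candidates, key=lambda c: rule_accuracy(records, c))


def holdout_compare(records: list[Record], baseline_cutoff: int = 49,
                    test_share: float = 0.3, seed: int = 0) -> dict[str, float]:
    """Calibrate on a training split, then score both rules on the unseen test split."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_share)
    test, train = shuffled[:n_test], shuffled[n_test:]
    fitted = fit_cutoff(train, range(18, 91))
    return {
        "fitted_cutoff": fitted,
        "fitted_test_accuracy": rule_accuracy(test, fitted),
        "baseline_test_accuracy": rule_accuracy(test, baseline_cutoff),
    }


# Synthetic, purely illustrative data: customers over 40 are ASSUMED to pick red 80% of the time.
rng = random.Random(42)
synthetic: list[Record] = []
for _ in range(2000):
    age = rng.randint(18, 90)
    synthetic.append((age, rng.random() < (0.8 if age > 40 else 0.2)))

print(holdout_compare(synthetic))
```

On this synthetic data the calibrated cut-off should land close to the assumed 40 and beat the fixed rule on the test split. With real data, the point is the procedure: the score that counts is the one measured on held-out records, and only a reasonably large transaction history makes both splits big enough to trust.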

To conclude, it seems a stretch to label startups without data “AI startups”. At the very least, it remains questionable how they could have trained and validated their algorithms in a legitimate and reliable way. So when a startup pitches AI and machine learning as its product’s core, the first question should always be: how much unique data do you have? If there is no data (or no unique data), the next question is: how long will it take to acquire it? And with that question, the assessment shifts to general considerations about the business model. A healthy scepticism about early-stage AI startups thus seems reasonable.

Now, will this change any funding dynamics? Probably not, as believing in an AI product vision is often just too tempting anyway…