Data Science will save Christmas – or not?
Christmas is just around the corner. Christmas markets are drawing large crowds, mulled wine consumption is accelerating, and on weekends one is well advised to avoid shopping malls, as thousands hunt for the best deals during their Christmas shopping. With Christmas Eve rapidly approaching, there is one question that hardly any parent can escape: how is Santa able to deliver all the gifts to the right addresses? (Let’s skip the issue with the chimney this time.) One common answer to satisfy that curiosity involves Santa’s magic sleigh, which somehow enables him to drop off all gifts in a single night. Santa’s magic sleigh, that much seems obvious, clearly outperforms any same-day delivery attempt by online retailer Amazon or e-commerce giant Zalando. But what happens if Santa’s sleigh gets stolen?
Well, Christmas could turn into a nightmare for millions of kids. To avoid this, FICO, a software analytics company mostly known for its credit card anti-fraud systems, challenged the global community of data science enthusiasts to compose a back-up routing plan that could be carried out with a more ordinary sleigh. How should trips from the North Pole and back be scheduled? How can weight and size limitations be taken into account? To get the problem solved (and to prevent a massive Christmas fiasco), FICO posted a comprehensive data set on kaggle.com, a popular website that hosts global data science competitions. Hundreds of data scientists signed up to solve the challenge. While helping Santa manage Christmas logistics efficiently with his given fleet of reindeer might be an honorable goal, there are other companies that turn to kaggle.com with real-life business problems.
Take, for example, Germany’s leading drug store chain Rossmann, which is currently hosting a data science competition on kaggle.com. The company asks data scientists to develop algorithms that predict six weeks of daily sales for 1,115 of its drug stores across Germany. Rossmann provides a comprehensive data set, giving participants the opportunity to build an accurate prediction model that takes into account a large variety of factors influencing daily sales, such as promotions, school holidays and seasonality. While the challenge itself is probably motivation enough for some participants, Rossmann will award 35,000 US dollars to the winning teams. So far, more than 3,300 data science teams have registered. Apart from access to the brain power of data scientists worldwide during the competition and a fair shot at obtaining an accurate daily sales prediction model for its drug stores, why do companies like Rossmann, Walmart, OTTO Group and others regularly post data science challenges on kaggle.com?
Why do companies regularly post data science challenges on kaggle.com?
Solving a real-life data science problem, developing prediction and forecasting models, and designing calibrated algorithms that ultimately improve and support commercial activities are certainly among the main reasons why companies turn to kaggle.com. Even taking into account the 35,000 US dollars in prize money, getting an external perspective on a data set from several thousand teams worldwide seems quite cheap compared to other available options. By leveraging the reach and brand of the online platform, companies can reach out to a broad data science community fairly easily and effectively.
More than 500,000 unique visitors are drawn to kaggle.com each month, which implies a high (and cheap) level of visibility for companies. Companies can therefore benefit in various other ways as well. Just by posting a data science competition, participating companies present themselves as open, innovative and data-driven organizations that are not afraid to crowdsource ideas and solutions. At a time when corporate culture and employer branding are essential differentiators in attracting talent, getting exposure to a community of data scientists with highly sought-after analytical skills is a very valuable opportunity. Unlike Google, Apple and Facebook, which have a strong reputation as havens for talented data scientists, other companies lack this global brand and can use kaggle.com to position themselves.
This ultimately reveals the main objective: recruiting data scientists, that is, getting access to talent and potentially converting it into full-time employment. As it happens, German drug store chain Rossmann also points out that it currently has various open positions for entry-level and senior data scientists at its headquarters and is happy to receive applications. Just as hackathons are designed to provide access to talented programmers, data science competitions (on kaggle.com or comparable platforms) primarily serve the purpose of getting access to data scientists, who are, and will remain in the years to come, a scarce resource in any company’s org chart. Conveniently, there is also a job board on kaggle.com.
With recruiting as the main purpose, is there any valuable output from the competitions?
Companies that provide data sets for data science competitions get algorithms, prediction models and the like in return. These seem like tangible results (especially compared to the standard output of consulting engagements, which is primarily a set of well-designed slides), but there is a catch. While the large number of participating data science teams suggests a high level of competition and quality, the algorithms they develop are often not general enough to be directly applicable in a commercial setting. This is a frequent problem in data science, and recent news about the commercial viability of kaggle.com supports the view that the competition platform cannot avoid these shortcomings. With its popularity growing rapidly in recent years, the company has attracted more than 11 million USD in venture capital funding, but it has struggled to come up with a viable business model ever since. Because its data science revenue stream was not large enough, the company branched out into energy consulting in order to provide sector-specific end-to-end data science solutions. As a stand-alone offering, data science competitions do not appear to create enough value. Just in September this year, news broke that kaggle.com had to lay off several employees.
Data science competitions – good or bad?
The rapid success and popularity of kaggle.com underline that data science is on the rise. Companies are willing to post data sets, and teams sign up for the challenges in large numbers. At the same time, the platform’s trouble establishing a viable business model points to a major pitfall: the algorithmic solutions that competitions produce might not always be commercially applicable. Having a dedicated data science consulting team on site that provides a tailored solution is still the superior option for companies seeking to solve problems. This leaves data science competitions as a valid recruiting tool, which companies can turn to when hiring for their internal data science departments. In terms of concrete output, though, data science competitions are tough to judge, and their results are tough to predict. Maybe that would also make a good competition on kaggle.com: how to predict the commercial applicability of results developed in kaggle.com data science competitions?