Data Science and the Olympics: Are the games in Rio clean?

The 2016 Olympics in Rio de Janeiro, Brazil, started just a few days ago, but discussion has already drifted away from sport and towards a topic that should not have taken center stage: potential doping by various athletes.
Even before the Games began, there were rumors that the entire Russian team would be banned because of systematic doping. In the end, the International Olympic Committee (IOC) decided to allow Russian athletes to compete in Rio, which led various observers to question the integrity of the organization itself. Doping is a serious issue because it distorts fair competition between athletes. The Olympics are much more than a sporting contest between the world’s best athletes: they also give many athletes the chance to qualify for large sponsorship deals. The money attached to the event creates an incentive to cheat. Data science, however, can help to optimize drug testing efforts.

Sun Yang, Yulia Efimova – the 2016 Olympics and its doping problem

Sun Yang, a Chinese swimmer, has so far performed splendidly at the Olympics in Rio, where the swimming competitions are usually the main attraction of the first week. He won the silver medal in the 400 meters freestyle and then went on to win gold over the shorter distance, the 200 meters freestyle. At the London Olympics in 2012 he had already won two gold medals, becoming the first Chinese man to win the highest Olympic honors in swimming. The aftermath of his performance, however, has focused on a different issue. In 2014, Sun Yang tested positive for trimetazidine, a banned substance that is prescribed as a heart medication. Mack Horton, the Australian winner of the 400 meters freestyle, called Sun Yang a “drug cheat”, causing outrage in the Chinese delegation while earning some applause from fellow athletes. “I used the words drug cheat because he tested positive,” Horton said later. “I just have a problem with athletes who have tested positive and are still competing. No athlete has really come forward and said it. It wouldn’t have felt right if I raced against someone who had tested positive and didn’t bring it up. Hopefully others will follow.”

In the women’s competition, it is Yulia Efimova’s participation that is drawing scepticism. The Russian swim star, who has failed doping controls more than once in recent years, won the silver medal in the 100 meters breaststroke. The participation of these athletes has been approved by the relevant authorities (in this case FINA, the International Swimming Federation) and the benefit of the doubt applies, but it casts doubt on the efforts being made to track down doping effectively.

How can data science support doping testing?

Under the auspices of the IOC, the World Anti-Doping Agency (WADA) leads the effort against doping in sports, working in close collaboration with the national anti-doping agencies. While the agency is heavily involved in oversight and compliance (e.g. accrediting the scientific laboratories that analyze doping control samples), it also actively uses modern data science tools.

One of the key challenges in doping control has been that those willing to trick the system are quick to develop new performance-enhancing substances, beating testing methods that can only be developed after the fact. To counter this, the Athlete Biological Passport (ABP) was introduced. By monitoring selected biological variables of athletes over time in aggregate datasets, the ABP can single out the effects of doping. This allows testing to be prioritized based on data, rather than relying only on the detection of specific doping substances or methods. idalab was recently in contact with WADA’s Pierre-Edouard Sottas, who manages the Athlete Biological Passport program. “I confirm that the anti-doping movement uses specific data mining tools for targeted testing and this since the inception of the ABP”, Sottas stated. While the specifics of the methods were not discussed in great detail, he shared some of them: “The main applications are the use of machine learning tools to recognize the pattern of doping in the biological profiles collected on athletes, including Bayesian inference techniques and support vector machines.” Crucial efforts are thus already under way, and Sottas appeared confident that the sophistication of the methods will only increase over time, allowing powerful tools to be assembled: “We have ongoing discussion on further refinements and new tools development which may arise from the collection of larger biological profiles.”
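
To make the idea of longitudinal monitoring more concrete, here is a minimal Python sketch of how Bayesian inference can turn a series of readings into an athlete-specific reference range. It is not WADA's actual model, which was not disclosed in detail. The class `AthleteProfile`, the function `flag_readings`, the simple normal-normal model and all numbers are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class AthleteProfile:
    """Normal-normal model of one blood marker for one athlete (illustrative)."""
    mean: float         # current belief about the athlete's baseline
    var: float          # uncertainty about that baseline
    within_var: float   # assumed natural variation between two readings

    def predictive(self) -> NormalDist:
        """Distribution of the next reading, given the readings seen so far."""
        return NormalDist(self.mean, (self.var + self.within_var) ** 0.5)

    def update(self, value: float) -> None:
        """Conjugate Bayesian update of the baseline with one new reading."""
        precision = 1 / self.var + 1 / self.within_var
        self.mean = (self.mean / self.var + value / self.within_var) / precision
        self.var = 1 / precision


def flag_readings(readings, profile, coverage=0.99):
    """Return the indices of readings outside the athlete's own expected range."""
    tail = (1 - coverage) / 2
    flagged = []
    for i, value in enumerate(readings):
        dist = profile.predictive()
        low, high = dist.inv_cdf(tail), dist.inv_cdf(1 - tail)
        if low <= value <= high:
            profile.update(value)   # unremarkable readings refine the baseline
        else:
            flagged.append(i)       # candidate for targeted follow-up testing
    return flagged


# Invented numbers: population prior of 14.5 (variance 1.0), within-athlete variance 0.16.
history = [14.8, 15.1, 14.9, 15.0, 16.8, 15.0]
print(flag_readings(history, AthleteProfile(14.5, 1.0, 0.16)))  # → [4]
```

The point of the sketch is that the same reading of 16.8 would pass unnoticed as a first measurement, because it lies within the wide population range, but stands out once four stable readings have narrowed the athlete's personal range. A flag in such a scheme is only a reason to prioritize further testing, not proof of doping.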

With such sophistication, why can athletes still cheat?

The question arises how, given biological profiles and data science methods, athletes still get away with cheating. The current regulations do not provide for a lifetime ban from competition when athletes are caught using performance-enhancing substances. Efimova and Sun Yang were both suspended from competition, but are now eligible to compete again. Efimova, though, has drawn scepticism with her comment that failing a doping test and being ineligible for competition is somehow like having a “suspended driver’s license”, a rather dubious comparison, one might say. The fact is, though, that incentives are what cause athletes to keep cheating the system.

If they face a ban of only a few months or years, athletes might give doping a try. Current testing is still largely constrained by economics: tests cannot be carried out after every competition, and new performance-enhancing drugs might not yet be part of the standard testing procedure. While data science can detect the effects of doping in biological profiles and thus help to inform testing strategies, this can only take effect with a significant delay. The earnings that can be accumulated in the meantime might simply be too tempting for some to pass up.

Wherever there is money, there is an incentive to bypass the rules and trick the system. More funding for anti-doping work would certainly help to roll out more effectively the measures that have already been designed. Even the companies that are heavily involved in sponsoring athletes should have an incentive to contribute to this funding, because if more and more athletes are found guilty of drug cheating, public interest in the sport concerned might be significantly damaged (see, for example, cycling). Cheating undermines fairness, and fairness is one of the principles that spectators around the world value most in sports. All stakeholders should aspire to give independent institutions adequate funds to run rigorous testing systems and to apply the existing powerful tools of data science at scale, so that the Olympics produce more talk about outstanding athletic performances and do not turn into a pharmaceutical showcase.

Contact the author
Niels Reinhard
+49 (30) 814 513-13
