Towards Scalable Fraud Detection – a journey from Sklearn to Spark

Dr. Stanimir Dragiev | Dr. Tammo Krueger

Zalando

The new season of idalab talks will begin on January 20th with Stanimir Dragiev and Tammo Krueger, who will discuss “Towards scalable fraud detection: a journey from Sklearn to Spark”. The talk will start at 4:00 pm.

Zalando is Europe’s leading online fashion retailer and is currently on its way to becoming the platform for all fashion-related business: designers, producers, logistics solutions, and on- and off-line retailers. The new platform architecture challenges the company’s current in-house solutions – including the fraud detection infrastructure – to become more scalable, dependable and versatile.

This talk is a travelogue describing the journey of rewriting an in-production classification system from scratch using Scala and Spark to run on AWS. Dragiev and Krueger will start by looking at the drawbacks inherent to the old sklearn-based solution running on a static cluster, most prominently difficult maintenance, data bottlenecks and overly coarse-grained parallelisation. Then, they will outline the design principles that underlie their new Scala/Spark-based solution. Often neglected aspects such as model specification, data organisation, evaluation and keeping track of multiple models will also be discussed.

Dragiev and Krueger’s new solution mitigates the identified pain points by leveraging the features that Scala and Spark bring into play, in particular strong typing, data parallelisation and easy scale-out. The talk will end with a comparison of the two solutions, with measurements that highlight the performance gains experienced with Spark in both learning and prediction times.

The event took place on January 20th, 2017.

Find the slides of the seminar on SlideShare.

Dr. Stanimir Dragiev | Dr. Tammo Krueger
Zalando

Dr. Stanimir Dragiev is a Data Scientist at Zalando, where he has been building machine learning solutions for fraud detection and prevention since 2014.

Tammo Krueger is a Senior Quantitative Analyst at Zalando. His main focus is the application of machine learning and statistics to real-world problems.