idalab talks: Towards Scalable Fraud Detection – a journey from Sklearn to Spark

The new season of idalab talks will begin on January 20th with Stanimir Dragiev and Tammo Krueger, who will discuss “Towards scalable fraud detection: a journey from Sklearn to Spark”. The talk will start at 4:00 pm.

Zalando is Europe’s leading online fashion retailer and is currently on its way to becoming the platform for all fashion-related business: designers, producers, logistics solutions, and online and offline retailers. The new platform architecture challenges the company’s current in-house solutions, including the fraud detection infrastructure, to become more scalable, dependable and versatile.

This talk is a travelogue describing the journey of rewriting an in-production classification system from scratch using Scala and Spark to run on AWS. Dragiev and Krueger will start by looking at the drawbacks inherent to the old sklearn-based solution running on a static cluster, most prominently: hard maintenance, data bottlenecks and overly coarse-grained parallelisation. Then, they will outline the design principles that underlie their new Scala/Spark-based solution. Often-neglected aspects such as model specification, data organisation and evaluation, and keeping track of multiple models will also be discussed.

Dragiev and Krueger’s new solution mitigates the identified pain points by leveraging the features that Scala and Spark bring into play, in particular: strong typing, data parallelisation and easy scale-out. The talk will end with a comparison of the two solutions, with measurements that highlight the performance gains experienced with Spark for both learning and prediction times.


Dr. Stanimir Dragiev is a Data Scientist at Zalando, where he has been building Machine Learning solutions for fraud detection and prevention since 2014.

Tammo Krueger is a Senior Quantitative Analyst at Zalando. His main focus is the application of machine learning and statistics to real-world problems.

About idalab talks

We frequently invite leading scholars, data scientists, business experts and big data thought leaders to discuss their work, gain new perspectives and generate fresh insights. idalab talks are hosted on an irregular basis and are open to friends and family.

Information about upcoming talks will always be posted on this blog. If you would like to attend, feel free to shoot us a mail:

Contact the author
Serena Rota
+49 (30) 814 513-15