“We need to develop tools which empower domain experts to find insights”: Interview with Patrick Lucey
In a multi-billion-dollar business such as sports, data science can give teams a crucial edge in a highly competitive environment. We talked to Patrick Lucey, Director of Data Science at STATS, about the challenges and opportunities of data science in professional sports.
STATS is the leading sports statistics company, providing professional teams as well as media with cutting-edge insights. The US-headquartered company offers a wide portfolio of data products, ranging from player tracking and athlete monitoring to video analysis solutions.
idalab: In which sport has a data-driven approach been most disruptive?
Lucey: I’m not sure if disruptive is the right word, but the most progressive is definitely the NBA in basketball. Since the adoption of the STATS SportVU system, teams and analysts have been relying on the high-level analysis that can be derived from this data.
idalab: What sports are most “resistant” to data analytics? What potential do you see in those sports?
Lucey: If good data and analytical tools are available and can make someone’s job easier, then people will use them. The problem is that such tools aren’t available for the majority of sports, or don’t give the specific analysis that people want. That is why people are resistant to such analytics: they don’t see the value in it. This can be down to the data captured, as capturing the player and ball locations at a high frame rate can be difficult. It can be down to the tools developed, as coaches and analysts often don’t want to consume pages and pages of spreadsheets to tell them something they already know. Or the analysis is wrong because it makes incorrect assumptions, so the trust is immediately gone. This is both a key challenge and an opportunity, as we need to develop tools which empower domain experts to find the patterns and insights they want. The tracking data we use allows us to model specific interactions and contexts, which couldn’t be done before.
idalab: Have those tools been developed and adopted for soccer teams?
Lucey: Soccer is far more complicated than most sports, as the game is continuous, low-scoring and strategic.
idalab: How does this complexity impact the key frontiers for machine learning in soccer?
Lucey: In terms of using the tracking data, the first step is finding the underlying structure of the data. For one team, there are potentially 10! permutations of player positions (more than 3.6 million), which is prohibitively high. Players move around continuously, so we first need to find the team structure. To do this, we need to apply unsupervised learning, which allows us to discover such structures. Once we can do this, we can count specific patterns of play or compare one play to another. We have been successful at doing this lately, and that allows for clustering and high-level play prediction.
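Editor's note: the permutation problem Lucey describes can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not the method used at STATS: it computes 10! and then assigns players to the roles of a formation template by brute force, picking the ordering with the smallest total squared distance. The function name `assign_roles`, the four-role template and the sample coordinates are all hypothetical.

```python
import math
from itertools import permutations

# Number of possible orderings of 10 players: 10! = 3,628,800
N_ORDERINGS = math.factorial(10)


def assign_roles(players, roles):
    """Return a tuple `order` where order[i] is the index of the player
    assigned to roles[i], minimising the total squared distance.

    Brute force over all permutations: fine for a toy example, but the
    factorial growth is exactly why this does not scale to a full team.
    """
    def cost(order):
        return sum(
            (players[p][0] - roles[r][0]) ** 2
            + (players[p][1] - roles[r][1]) ** 2
            for r, p in enumerate(order)
        )

    return min(permutations(range(len(players))), key=cost)


# Hypothetical 4-role template and one frame of player positions (metres)
roles = [(10.0, 10.0), (10.0, 30.0), (30.0, 10.0), (30.0, 30.0)]
frame = [(29.0, 31.0), (11.0, 9.0), (31.0, 12.0), (9.0, 28.0)]

order = assign_roles(frame, roles)
print(N_ORDERINGS)  # 3628800
print(order)        # (1, 3, 2, 0)
```

Once every frame is expressed in a consistent role order like this, plays can be compared position by position. In practice the template itself would have to be learned from the data, which is where the unsupervised step mentioned above comes in.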
idalab: When it comes to pre-game preparation, what are you guys looking at?
Lucey: In terms of pre-game, we have methods which can objectively measure the “style of play” of both specific players and teams. Based on this prior information, we have a pretty strong indication of what teams and players will do against a specific opponent and we could recommend that to the coach.
idalab: Is that something Brentford and Midtjylland, two of the teams most often cited when it comes to the application of data in soccer, have been using?
Lucey: In the case of Brentford and Midtjylland, they have discovered ways to use data to find better ways of valuing players and performance, which has given them an edge. We have made substantial progress in this area in our recent work as well. We can detect formations directly from the tracking data. We can also evaluate the likelihood of scoring a goal from a shot using the tracking data. This allows for new analysis of how teams create and concede chances, and lets us measure how effective and efficient players and teams (including goalkeepers) are in various aspects of the game. This can lead to better insights, as well as better evaluation of performance and player value.
idalab: Are stats about player performance already being correlated with lifestyle, health data and the like to help scouting?
Lucey: This is a tricky one. There is no doubt that these are very strong predictors of performance. However, obtaining such information is tough due to privacy concerns. I know teams are capturing such information to gauge the health of their own players, but I don’t think this data can be shared or made public as it is very personal information. Obviously having more detailed, personalized information would help with future predictions. However, we haven’t had this type of data, so we shouldn’t really make predictions about these predictions…
idalab: Patrick, thanks for all those great insights.
Patrick Lucey is currently the Director of Data Science at STATS. He has been an Associate Research Scientist at Disney Research Pittsburgh, conducting research into Sports Analytics and Audience Analysis, and worked at the Robotics Institute at CMU, using facial expressions to aid in the diagnosis of medical conditions.