Predictive Analytics in Sports – Riding the Big Data Wave

With the constant pressure to improve performance, it is of little surprise that predictive analytics in sports is such a hot topic. As data capture and analysis technologies evolve, the quest for greater, more accurate and real-time insight looks set to continue. 

“Data is the new oil” is at once an oversimplified and a nuanced description of the current state of play in the world of insights. Those who peddle the mantra most are probably as unaware of how wildly inaccurate it is as they are of how succinct it could be.

  1. Data will, much like oil once did, power a wave of new transformative technologies – artificial intelligence, automation and advanced, predictive analytics.
  2. Data, much like oil, requires refining to extract the value of a trapped asset because in its rawest form it does very little to power the machines we want it to.

At this point we should stop, because perpetuating the analogy of “data is the new oil” isn’t helpful.

  • Limited transferability – If I take a barrel of crude oil from the Green Canyon in the Gulf of Mexico or the Fateh Oil Field off Dubai, they’ll achieve the same thing. In that sense, they’re worth the same to whatever refinery they ship to. But while the data which Uber holds has an intrinsic value to Lyft, it probably doesn’t hold the same value for Walmart, AT&T or Netflix.
  • Not a finite resource – Data is not used up when it is consumed. Using it once and then assuming its usefulness has been depleted would certainly be a mistake.
  • Ease of extraction – The world’s data is not becoming more difficult to source, nor is it getting more expensive to extract. In fact, the inverse is true.

What data can achieve relies far more on the circumstantial. Those who use it best circumstantially are reaping the rewards accordingly, something particularly true in the world of sport.

The new wave

Unquestionably, this influx of data has forever changed the world of sport. Gone is the era of the pint-swilling, chain-smoking Premier League footballer. The curtain comes down on the era of portly pitcher Bartolo Colón. The John Daly physique isn’t a common sight amongst participants in golf majors. Today it seems intuitive that athleticism will pay dividends for athletes.

In the same way as businesses’ ability to collect and interpret data can be a significant contributor to their success, the world of sport has seen an overhaul in its approach to sports science and decision making. That, in its first wave, changed the nutrition and physique of athletes.

Over the past two decades, an industry which predominantly relied on intuition has become increasingly data-driven. New technology makes it possible to track, quantify and analyse almost everything athletes do in training and match environments.

Decision-making off the field has increasingly found itself driven by analytics too. While GPS tracking, heart rate monitors and laser gates quantify on-field performance, off the field machine learning is being used to unearth the next superstar, find a competitive advantage for coaches to implement or even gauge how fan engagement translates into season ticket sales. All told, professional sport feels like it has fully embraced statistical rigour.

The dawn of predictive analytics in sports

So, what did the first sports statistic look like? A reasonable contention would be the batting average in cricket, something that has been used to gauge cricket players’ relative skills since the 18th century. This involved capturing and aggregating individual scores for all players and dividing by the number of games they played. Henry Chadwick, an English statistician raised on cricket and dubbed the “Father of Baseball”, took this to develop the batting average (BA) in baseball along with earned run average (ERA) in the 19th century.
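As a rough illustration of how simple these early statistics are, the sketch below uses the definitions in common use today: the modern cricket average divides runs by dismissals rather than by games played, the baseball batting average is hits per at-bat, and ERA is earned runs allowed per nine innings pitched. The function names and the season totals are invented purely for illustration.

```python
def cricket_batting_average(runs, dismissals):
    """Modern cricket average: total runs divided by times dismissed."""
    if dismissals == 0:
        return None  # undefined until the batter has been dismissed at least once
    return runs / dismissals


def baseball_batting_average(hits, at_bats):
    """Baseball batting average (BA): hits divided by at-bats."""
    return hits / at_bats if at_bats else None


def earned_run_average(earned_runs, innings_pitched):
    """Earned run average (ERA): earned runs allowed per nine innings."""
    return 9 * earned_runs / innings_pitched if innings_pitched else None


# Hypothetical season totals, not real player data.
print(cricket_batting_average(runs=450, dismissals=10))                   # 45.0
print(round(baseball_batting_average(hits=150, at_bats=500), 3))          # 0.3
print(round(earned_run_average(earned_runs=60, innings_pitched=180), 2))  # 3.0
```

Each figure is a single ratio over aggregated counts, which is part of why these measures could be kept by hand long before any of today's tracking technology existed.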

For all that early invention, progress is often marred by setbacks. One of the most famous errors in early sports data was that of Charles Reep who, through a phenomenal misinterpretation of the statistics he gathered pitchside, concluded that most goals were scored from fewer than three passes.

Surely a simple error shouldn’t have had all that much consequence? Not quite – his mistake resulted in the invention of the long-ball football which marred England from the 1950s for the next half century and beyond. Jonathan Wilson, in Inverting the Pyramid: The History of Football Tactics, said of the misinterpretation:

“It is, frankly, horrifying that a philosophy founded on such a basic misinterpretation of figures could have been allowed to become a cornerstone of English coaching. Anti-intellectualism is one thing, but faith in wrong-headed pseudo-intellectualism is far worse.”

It stands testament to the importance of interpreting data as much as anything else. Conversely, a more recent success has added “analytics” to the lexicon of professional sports everywhere. Michael Lewis’ 2003 book, Moneyball: The Art of Winning an Unfair Game, captures the story of the Oakland A’s success in building a team of undervalued talent through sabermetrics, the empirical statistics of baseball so keenly studied by the Society for American Baseball Research (SABR). It has inspired mass adoption of statistical analysis and imitation of their approach throughout Major League Baseball and across many other top-level sports.

Ignacio Palacios-Huerta, a professor at the London School of Economics, aided Chelsea’s preparation for their 2008 Champions League penalty shootout by providing information on the tendencies of the Manchester United players. He correctly anticipated Cristiano Ronaldo’s stuttered approach to his penalty, which Petr Cech saved as a result, and that Edwin van der Sar was far more competent diving to his right. The issue, aside from some bad luck in John Terry slipping with a chance to win, was Nicolas Anelka deviating from the plan and placing his shot to Van der Sar’s right, where he comfortably saved it.

What Moneyball was to the Big Data revolution, Astroball is to modern sporting analytics and how far the landscape has advanced. The book by Ben Reiter covers the rebuilding of the Houston Astros during an historically bad three years, which made them one of the worst teams ever in professional baseball.

Reiter approached them as a journalist asking how a team could be so consistently bad despite an assembly of brilliant minds like Sig Mejdal, who was previously at NASA, and Jeff Luhnow, a management consultant from McKinsey who had succeeded with the rival St. Louis Cardinals.

The part which makes it particularly compelling is that Reiter, after seeing the “process” first hand, ran a cover story in Sports Illustrated in 2014 proclaiming the Astros winners of the 2017 World Series – something universally derided which turned out to be prophetic. “The Nerd Cave”, of which Mejdal took charge, gave the Astros a consistent edge in pitching and recruitment by moving away from the dichotomy Moneyball creates between scouts and analytics, instead embracing and metricising the biases and heuristics of intuition. It represents an astonishing leap in a short space of time: the intuition so reviled by statisticians has been embraced as a way to quantify the evaluation of ability by eye.

Where to next?

The world of sport is doing well in terms of interpreting the volume of data it has to hand. When a football club tracks its players’ GPS data, it can tell where they are throughout the 90 minutes. When it comes to the opposition, however, there’s a limitation. Most publicly available football data covers on-the-ball actions – think the likes of pass completion, tackles, interceptions, expected goals (xG) and expected assists (xA).

The next frontier is in what happens off the ball, something Mladen Sormaz and Dan Nichol presented at the OptaPro Analytics Forum 2019 in “Quantifying the impact of off-the-ball movement in football”.

The focus now shifts to movements such as the delayed run or the dropping player, in order to capture the damage that the attacking team’s movement does to the defensive team’s shape.

What technology makes this practical?

Short of GPS data being universally available, computer vision promises a lot in this space – that is, gaining a high-level understanding of video by using methods including machine learning to identify objects within the frame.

The ability to capture all of those tens of thousands of data points directly from a feed is interesting and the additional scale allows far more questions to be asked in significantly larger data sets.

Reinforcement Learning (RL) will hopefully help answer them – a field of machine learning where the programmer defines the environment and a reward, and the model learns through trial and error which actions yield the maximum return. Those near open-ended questions are hugely interesting when facing enormous data sets, as the model’s ability to explore options the programmer mightn’t know to ask about has the potential to unearth incredible, potentially unsought insights.
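To make the idea of “seeking the maximum return” a little more concrete, here is a minimal sketch of an epsilon-greedy multi-armed bandit, about the simplest reinforcement learning setting there is. It is a toy under stated assumptions, not a description of any real club’s or vendor’s system: the three action names and their hidden success probabilities are invented, and a real application would involve far richer states and rewards.

```python
import random


def epsilon_greedy_bandit(success_prob, steps=5000, epsilon=0.1, seed=0):
    """Learn by trial and error which action yields the highest average reward.

    success_prob maps each action to a hidden probability of earning reward 1.
    The learner never reads these probabilities directly; it only sees rewards.
    """
    rng = random.Random(seed)
    actions = list(success_prob)
    estimates = {a: 0.0 for a in actions}  # running average reward per action
    counts = {a: 0 for a in actions}       # how often each action was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(actions)   # explore: try something at random
        else:
            action = max(actions, key=lambda a: estimates[a])  # exploit best guess
        reward = 1.0 if rng.random() < success_prob[action] else 0.0
        counts[action] += 1
        # Incremental update of the running average for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


# Hypothetical actions and success rates, invented for illustration only.
hidden = {"long_ball": 0.2, "short_pass": 0.5, "through_ball": 0.8}
estimates, counts = epsilon_greedy_bandit(hidden)
best = max(estimates, key=estimates.get)
print(best, counts)
```

The programmer only defines the actions and the reward signal; the preference for the most rewarding action emerges from the feedback, which is the property that makes RL attractive for questions nobody thought to ask in advance.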

What does this all mean?

As technology and predictive analytics become increasingly prevalent, sport will continue to evolve, and teams, fans, media and other industries will find new uses for this data.

In the simplest terms, this could mean better insights and more of them. New links of cause and effect. New statistics to metricise performance. New on-base percentages (OBP) which revolutionise sports by finding undervalued players like those in Moneyball.

In Flutter’s world, those statistics promise a lot for our offerings. The ability to garner more real-time insights creates the possibility for new markets offered more on a play-by-play basis and a richer live experience.

Regardless of the unknowns, we stand on the cusp of an exciting time for discovery in professional sport.

If you’re building innovative new solutions to collect and analyse large volumes of data, we’d love to hear from you.

