
Average

Averages are used a lot in football. Yet an average lacks a lot of context. A player who averages one goal per match could have reached that average by scoring ten goals in the first match (against an easy opponent) and then not scoring at all in the next nine matches. So a weighted average might already be better than a simple average, although some people feel that weights introduce more subjectivity. This is not the case. A simple average is as subjective as a weighted average: it implicitly gives every data point the same weight, which is itself a choice. The only difference is that there is often less convergence of opinion on a weighted average than on a simple average. By introducing weights you introduce more elements on which people can disagree.
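The ten-goals-then-nothing example can be worked through in a few lines. This is a minimal sketch, not a recommended weighting scheme: the opponent weights are hypothetical numbers picked only to show how a weighted average reacts, and picking them is exactly the point where people can disagree.

```python
# Goals per match: ten in the first match, none in the next nine.
goals = [10, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical opponent-strength weights (an assumption, not from any data):
# the easy first opponent counts half as much as the other nine.
weights = [0.5] + [1.0] * 9

simple_average = sum(goals) / len(goals)
weighted_average = sum(g * w for g, w in zip(goals, weights)) / sum(weights)

print(f"simple average:   {simple_average:.2f} goals per match")
print(f"weighted average: {weighted_average:.2f} goals per match")
```

With these weights the weighted average drops well below one goal per match, but a different set of weights would give a different answer.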

Most often averages are taken over many data points. That also makes an average insensitive: if a player suddenly starts to do better or worse, it takes a long time before that change shows up in the average. Especially when preparing for the upcoming match, it is often much better to look at measures that are more sensitive to change than an average.
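To make the insensitivity concrete, here is a small sketch comparing a whole-season average with an average over only the most recent matches. The scoring record and the five-match window are assumptions chosen for illustration; a recent-window average is just one of several possible measures that react faster to change.

```python
from statistics import mean

# Hypothetical scoring record: one goal per match for 30 matches,
# then a sudden dip with no goals in the last 5 matches.
goals = [1] * 30 + [0] * 5

season_average = mean(goals)        # uses all 35 matches
recent_average = mean(goals[-5:])   # uses only the last 5 matches

print(f"season average: {season_average:.2f}")
print(f"last 5 matches: {recent_average:.2f}")
```

The season average still looks healthy, while the short window shows the dip immediately. The price of a short window is that it is also more sensitive to noise.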

For player recruitment, scouts are looking for players who are above average. But here there are pitfalls too. If your club is above average, learning that player A is above average for the league doesn’t tell you much. That player is probably better than a player who is below the league average, but he might still weaken the team, given that the team itself is also above average.

In fact, even if your club is below the league average and you can hire a player who is above the league average, you still don’t know whether he is going to strengthen the team. You also have to look at the player he is going to replace. If that player is one of your star players and even better than the above-average player, hiring the above-average player is still going to weaken your team. Of course, if the only other option is a player below the league average, then hiring the above-average player is the least bad option. But simply stating that a player is above the league average is not enough to conclude that he will strengthen the team.

The same goes for stating that a player is in the top 5%, or even the top 1%, of the league. If data shows that a player is above the league average, that only means that he is in roughly the top 50%. So placing the player in the top 5% already gives you much more information. Nevertheless, if the player he needs to replace or the team itself is in the top 1%, then even a top 5% player can weaken the team.
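The sketch below puts the three comparisons side by side: against the league average, against the rest of the league as a ranking, and against the player who would be replaced. All ratings are made up for illustration, and `share_below` is a hypothetical helper, not part of any scouting tool.

```python
from statistics import mean

def share_below(value, population):
    """Fraction of the population with a strictly lower rating."""
    return sum(1 for x in population if x < value) / len(population)

# Hypothetical ratings for 20 players in a league (higher is better).
league = [52, 55, 58, 60, 61, 63, 64, 66, 67, 68,
          70, 71, 73, 74, 76, 78, 80, 83, 88, 95]

candidate = 80   # hypothetical transfer target
incumbent = 88   # hypothetical player he would replace

league_average = mean(league)
print(f"league average: {league_average:.1f}")
print(f"candidate above league average: {candidate > league_average}")
print(f"share of league below candidate: {share_below(candidate, league):.0%}")
print(f"candidate better than incumbent: {candidate > incumbent}")
```

In this made-up league the candidate is clearly above average and ranks ahead of most players, yet he would still be a downgrade on the incumbent.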

The average of multiple variables

In reality most clubs don’t work with a single variable to determine whether a player is above or below the league average, although it can be done: you can summarize many data points or averages in an average of averages. But in most cases clubs are looking at a lot of different variables. Players can be above average for a couple of those and below average for others.
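As a rough illustration of what an average of averages hides, the sketch below compares one player with the league on four variables and then collapses the result into a single number. The variable names, the values, and the choice to average unweighted ratios are all assumptions made for this example; it is not how any particular club or provider builds a composite.

```python
from statistics import mean

# Hypothetical per-90 league averages and one player's values.
league_average = {"goals": 0.30, "assists": 0.20, "key_passes": 1.50, "tackles": 2.00}
player = {"goals": 0.45, "assists": 0.10, "key_passes": 1.80, "tackles": 1.20}

# Ratio to the league average per variable: above 1.0 means above average.
ratios = {name: player[name] / league_average[name] for name in league_average}

# One possible "average of averages": the unweighted mean of the ratios.
composite = mean(ratios.values())

for name, ratio in ratios.items():
    label = "above" if ratio > 1 else "below"
    print(f"{name:<11}{ratio:.2f}  ({label} average)")
print(f"composite  {composite:.2f}")
```

The single composite number says nothing about which variables the player is strong or weak on, which is why the individual variables still need to be looked at.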

With multiple variables it becomes even harder to use averages to see whether a player is going to strengthen or weaken the team. Looking at playing style helps a bit, but it remains uncertain whether a player can replicate his stats in a different team. Here is where the eye of a human scout works wonders. In one case we were looking for a winger and a striker for an average club in the Dutch Eredivisie. I had found a striker who had nice stats in the FBM statistics I have developed. The very seasoned scout I work with told me he was no good as a striker, but that he was an interesting option for that club as a winger.

When the head scout at the club saw that we proposed this striker as an option, he also expressed a dislike for this player. Yet when I explained that we weren’t proposing him as the center forward but as a winger, his face lit up. “Yes,” he said, “I can see him excel as a winger indeed.” That is one of the many reasons why you always combine data scouting with video and live scouting. The human brain is still a wonderful biocomputer for finding solutions where digital computers have a hard time coming up with the right one.

When looking at multiple variables, it is important to be very skeptical of reports telling you that player X is above average or in the top 5% with regard to skill Y. For instance, in a player report I read, the consultancy firm was praising a central defender for being in the top 5% of defenders at supporting the attack, because his goals-per-minute figure was very high. Then I looked at the underlying data. What was the case? This defender, playing in the Premier League, scored twice in the 16/17 season in little more than 3000 minutes. Then in the 17/18 season he scored three times! And in just 2000 minutes played! That was a big boost for his goals per minute, but of course the underlying data demonstrated (a) that the difference between scoring two times and three times is pretty much a matter of happenstance and (b) that playing only 2000 minutes instead of 3000 indicated that the manager did not appreciate him the way the consultancy firm did. This report was used to try to get this defender a move to a bigger club for a better salary and a hefty transfer fee. The transfer did not happen, and in the 18/19 season this defender played only 1000 minutes and did not score at all.
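The arithmetic behind this example is easy to check. The sketch below converts the goals and minutes quoted above into goals per 90 minutes, and then asks how likely three or more goals in 2000 minutes would be if nothing about the player had changed. That second step assumes goals follow a Poisson distribution at the 16/17 rate, which is a simplifying assumption of this sketch and not something taken from the report.

```python
from math import exp, factorial

def goals_per_90(goals, minutes):
    return goals / minutes * 90

def prob_at_least(k, expected):
    """P(X >= k) for a Poisson-distributed X with the given mean."""
    return 1 - sum(exp(-expected) * expected**i / factorial(i) for i in range(k))

# Figures from the report discussed above (minutes are approximate).
rate_1617 = goals_per_90(2, 3000)
rate_1718 = goals_per_90(3, 2000)
print(f"16/17: {rate_1617:.3f} goals per 90")
print(f"17/18: {rate_1718:.3f} goals per 90")
print(f"increase: x{rate_1718 / rate_1617:.2f}")

# Assumption: scoring follows a Poisson process at the unchanged 16/17 rate.
expected_1718 = 2 / 3000 * 2000
print(f"chance of 3+ goals in 2000 minutes: {prob_at_least(3, expected_1718):.0%}")
```

One extra goal in fewer minutes more than doubles the rate, while under the Poisson assumption a player with an unchanged scoring rate would still reach three goals in roughly 15% of such seasons. That is far too common to read as a real improvement.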

Such presentations are not only misleading; even if the underlying data is solid, they are still risky. Our mind tends to focus on and remember outstanding stats and to overlook and forget all other stats. This is part of how confirmation bias works. Our unconscious mind then only processes the highlights of a player. Through associative learning our brain then connects good feelings to this player, feelings that our conscious mind interprets as good intuition. For that reason it is important to really delve deep into the underlying data of an average, or risk making mistakes. Fortunately, in my experience people working for clubs really do delve deep and often get very annoyed and distrustful (which is a good thing) when data providers can’t explain how they arrived at a certain value of a variable or an average.