2022 – real biathlon

Most improved athletes this winter

Posted on 2022-12-24 | by

Season-to-season improvements in Total Performance Scores of regular World Cup athletes. The last row of both tables shows improvement and decline in overall scores for this season’s World Cup trimester 1 compared to performances in trimester 1 last season (only athletes with at least 5 races this winter). You can do your own season-to-season comparisons for all stats in the Patreon bonus area.

Note: The scores are standard scores (or z-scores), indicating how many standard deviations (SD) an athlete is back from the World Cup mean (negative values indicate performances better than the mean). The Total Performance Score is calculated by approximating the importance of skiing, hit rate and shooting pace using the method of least squares (for more details, see here and here), and then weighting each z-score value accordingly.
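As a rough illustration of the note above, here is a minimal sketch of how a z-score and a weighted total could be computed. The weights are hypothetical placeholders (the fitted least-squares weights are not given here), and the use of the population standard deviation is an assumption as well.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standard scores: distance from the mean in units of standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD over the athletes compared
    return [(v - mu) / sigma for v in values]

# Hypothetical weights, for illustration only. The real ones come from the
# least-squares fit mentioned in the note and are not published in this post.
WEIGHTS = {"ski": 0.6, "hit": 0.3, "range": 0.1}

def total_performance(ski_z, hit_z, range_z, weights=WEIGHTS):
    """Weighted sum of the three component z-scores."""
    return (weights["ski"] * ski_z
            + weights["hit"] * hit_z
            + weights["range"] * range_z)

# Made-up ski times in seconds: faster (lower) times get negative z-scores.
ski_times = [1500.0, 1530.0, 1560.0]
print([round(z, 2) for z in z_scores(ski_times)])  # [-1.22, 0.0, 1.22]
print(round(total_performance(-1.0, -0.5, 0.2), 2))
```

With weights that sum to one, an athlete who is equally far from the mean in all three components gets a total equal to that same distance.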

Men

2022–23 z-Scores compared to 2021–22 | Non-Team events

No	Family Name	Given Name	Nation	Races	Ski Speed Score	Hit Rate Score	Range Time Score	Total Performance Score	Change
1	Hartweg	Niklas	SUI	6	-0.97	-1.41	-1.47	-1.16	-0.99
2	Andersen	Filip Fjeld	NOR	8	-1.17	-0.85	-0.44	-0.99	-0.63
3	Claude	Florent	BEL	8	-0.64	-1.42	0.25	-0.76	-0.56
4	Guzik	Grzegorz	POL	5	0.07	-0.07	0.59	0.09	-0.50
5	Giacomel	Tommaso	ITA	8	-1.21	0.20	-1.92	-0.88	-0.48
6	Laegreid	Sturla Holm	NOR	8	-1.59	-1.42	-1.85	-1.57	-0.42
7	Boe	Johannes Thingnes	NOR	8	-1.98	-0.94	-1.37	-1.60	-0.42
8	Ponsiluoma	Martin	SWE	8	-1.49	0.40	-0.91	-0.87	-0.36
9	Doherty	Sean	USA	8	-0.36	0.20	-0.89	-0.26	-0.36
10	Rees	Roman	GER	8	-1.06	-1.04	-0.40	-0.98	-0.35
11	Stvrtecky	Jakub	CZE	6	-1.11	0.52	0.34	-0.46	-0.33
12	Iliev	Vladimir	BUL	5	-0.89	0.64	0.25	-0.31	-0.30
13	Strolia	Vytautas	LTU	8	-1.01	-0.91	-0.42	-0.91	-0.28
14	Sima	Michal	SVK	6	0.44	-0.86	0.73	0.10	-0.26
15	Lapshin	Timofei	KOR	5	-0.29	-0.36	-1.91	-0.50	-0.25
16	Komatz	David	AUT	7	0.00	-1.36	0.80	-0.30	-0.25
17	Magazeev	Pavel	MDA	6	0.09	-0.82	1.95	0.05	-0.25
18	Karlik	Mikulas	CZE	5	-0.61	1.04	0.94	0.05	-0.23
19	Hiidensalo	Olli	FIN	8	-0.54	-0.56	-0.52	-0.54	-0.22
20	Tachizaki	Mikito	JPN	6	0.71	-1.27	-0.01	0.05	-0.20
21	Doll	Benedikt	GER	8	-1.41	-0.27	-0.71	-1.00	-0.17
22	Claude	Fabien	FRA	8	-1.48	-0.75	-0.76	-1.18	-0.16
23	Nelin	Jesper	SWE	8	-1.24	-0.75	0.57	-0.88	-0.14
24	Krcmar	Michal	CZE	8	-0.98	-0.75	-0.12	-0.81	-0.08
25	Brandt	Oskar	SWE	5	-0.90	1.35	0.65	-0.06	-0.07
26	Wright	Campbell	NZL	5	0.16	0.29	-0.24	0.15	-0.06
27	Perrot	Eric	FRA	5	-0.81	0.64	-0.13	-0.31	+0.00
28	Zahkna	Rene	EST	7	0.52	-0.80	-0.13	0.06	+0.07
29	Dudchenko	Anton	UKR	7	-0.62	-0.68	-1.15	-0.70	+0.08
30	Guigonnat	Antonin	FRA	8	-1.09	-0.37	-0.34	-0.79	+0.12
31	Fillon Maillet	Quentin	FRA	8	-1.07	-1.23	-1.21	-1.13	+0.14
32	Leitner	Felix	AUT	8	-0.31	-1.04	-0.71	-0.57	+0.16
33	Runnalls	Adam	CAN	6	0.10	0.11	-1.77	-0.12	+0.17
34	Langer	Thierry	BEL	6	-0.35	0.80	-0.29	-0.01	+0.20
35	Boe	Tarjei	NOR	8	-1.30	-0.37	-0.54	-0.94	+0.21
36	Samuelsson	Sebastian	SWE	8	-1.23	-0.75	-0.38	-0.99	+0.21
37	Femling	Peppe	SWE	7	-0.50	0.22	-1.39	-0.40	+0.21
38	Christiansen	Vetle Sjaastad	NOR	8	-1.52	-0.47	-0.20	-1.06	+0.23
39	Seppala	Tero	FIN	7	-1.03	0.22	-0.75	-0.63	+0.23
40	Jacquelin	Emilien	FRA	8	-1.61	0.11	-0.89	-1.02	+0.24
41	Nawrath	Philipp	GER	5	-0.62	-0.20	0.21	-0.40	+0.31
42	Kuehn	Johannes	GER	7	-1.26	0.67	-0.47	-0.60	+0.32
43	Eder	Simon	AUT	5	-0.41	-0.67	-0.52	-0.50	+0.32

Women

2022–23 z-Scores compared to 2021–22 | Non-Team events

No	Family Name	Given Name	Nation	Races	Ski Speed Score	Hit Rate Score	Range Time Score	Total Performance Score	Change
1	Klemencic	Polona	SLO	7	-0.91	-0.29	0.20	-0.59	-0.68
2	Vittozzi	Lisa	ITA	8	-1.29	-0.92	-1.28	-1.18	-0.47
3	Kinnunen	Nastassia	FIN	5	-0.48	-0.04	0.56	-0.23	-0.46
4	Gasparin	Aita	SUI	8	-0.53	-0.75	-0.88	-0.64	-0.45
5	Comola	Samuela	ITA	7	-0.23	-1.01	-0.04	-0.43	-0.44
6	Simon	Julia	FRA	8	-1.38	-1.10	-1.79	-1.34	-0.42
7	Minkkinen	Suvi	FIN	7	-0.26	-1.22	-0.96	-0.62	-0.39
8	Eder	Mari	FIN	8	-1.35	0.46	-0.09	-0.67	-0.37
9	Batovska Fialkova	Paulina	SVK	8	-0.71	-0.23	-0.42	-0.53	-0.36
10	Tandrevold	Ingrid Landmark	NOR	8	-1.33	-1.01	-0.48	-1.13	-0.31
11	Knotten	Karoline Offigstad	NOR	8	-0.35	-1.01	-1.12	-0.63	-0.26
12	Chevalier	Chloe	FRA	8	-0.99	-0.49	-0.13	-0.74	-0.26
13	Voigt	Vanessa	GER	8	-0.92	-1.36	0.36	-0.90	-0.25
14	Lunder	Emma	CAN	6	-0.54	-0.76	-1.49	-0.72	-0.25
15	Zdouc	Dunja	AUT	7	0.15	-1.01	-0.79	-0.30	-0.23
16	Reid	Joanne	USA	6	-0.17	-0.13	0.14	-0.12	-0.22
17	Schwaiger	Julia	AUT	7	-0.37	0.01	0.06	-0.20	-0.21
18	Persson	Linn	SWE	8	-0.98	-1.01	-1.30	-1.03	-0.20
19	Maka	Anna	POL	5	-0.15	-0.37	0.51	-0.13	-0.15
20	Tachizaki	Fuyuko	JPN	7	-0.28	-1.01	0.79	-0.36	-0.13
21	Magnusson	Anna	SWE	8	-0.71	-0.84	-0.82	-0.76	-0.12
22	Irwin	Deedra	USA	6	-0.19	-0.13	0.13	-0.13	-0.10
23	Wierer	Dorothea	ITA	8	-0.98	-0.75	-1.78	-1.01	-0.10
24	Zuk	Kamila	POL	6	-0.31	0.37	0.37	-0.03	-0.09
25	Tomingas	Tuuli	EST	6	-0.85	0.62	0.60	-0.25	-0.06
26	Lien	Ida	NOR	7	-1.16	0.12	0.51	-0.59	-0.05
27	Herrmann-Wick	Denise	GER	8	-1.37	-0.49	-0.58	-1.02	-0.03
28	Kalkenberg	Emilie Aagheim	NOR	7	-0.25	-0.60	-0.79	-0.42	-0.03
29	Davidova	Marketa	CZE	8	-1.02	-0.92	-1.20	-1.01	-0.00
30	Bendika	Baiba	LAT	7	-0.83	0.73	-0.66	-0.36	+0.02
31	Chevalier-Bouchet	Anais	FRA	8	-1.22	-0.49	-1.13	-1.00	+0.04
32	Haecki-Gross	Lena	SUI	8	-0.94	0.46	-1.09	-0.55	+0.04
33	Oeberg	Elvira	SWE	8	-1.72	-0.84	-0.86	-1.36	+0.06
34	Hauser	Lisa Theresa	AUT	8	-0.93	-0.84	-1.44	-0.97	+0.16
35	Lie	Lotte	BEL	7	-0.38	-1.22	-0.13	-0.59	+0.17
36	Todorova	Milena	BUL	7	-0.54	0.32	-0.32	-0.26	+0.19
37	Oeberg	Hanna	SWE	8	-1.31	-0.14	-1.51	-1.00	+0.24
38	Preuss	Franziska	GER	5	-0.81	-0.31	-1.13	-0.70	+0.33
39	Stremous	Alina	MDA	7	-0.16	-0.09	1.23	0.03	+0.33
40	Jislova	Jessica	CZE	7	-0.15	-0.70	-0.49	-0.35	+0.40
41	Charvatova	Lucie	CZE	6	-0.40	1.25	-0.88	0.02	+0.42
42	Blashko	Daria	UKR	6	0.48	-0.51	-0.45	0.08	+0.42
43	Fialkova	Ivona	SVK	5	-0.73	1.73	-0.18	0.05	+0.45
44	Bilosiuk	Olena	UKR	5	0.68	-0.69	-0.03	0.20	+0.53
45	Nilsson	Stina	SWE	7	-0.69	0.63	0.47	-0.17	+0.54

New biathlon point system

Posted on 2022-12-04 | by

real biathlon

The International Biathlon Union (IBU) introduced a new scoring system for the Biathlon World Cup from this winter onwards: World Championship races no longer count toward the World Cup score, no results are dropped any more, and the points scale has been adjusted substantially to widen the gaps between top results.

It’s arguably the biggest season-to-season change in the history of the sport and not everyone is happy with it.

Old vs. new biathlon point system

Rank	Scoring system from 2008–09 to 2021–22	New scoring system from 2022–23
1	60	90
2	54	75
3	48	60
4	43	50
5	40	45
6	38	40
7-40	unchanged	unchanged
	(mostly) 2 dropped scores	no dropped scores
	WCH races count	WCH no longer count
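
To make the effect of the two scales concrete, here is a small sketch that totals a season under each system. It only uses the top-6 values from the table (ranks 7-40 are unchanged and left out), and the two athletes and their results are made up for illustration.

```python
# Points for ranks 1-6, copied from the table above.
# Ranks 7-40 are unchanged between the systems and are left out of this sketch.
OLD_POINTS = {1: 60, 2: 54, 3: 48, 4: 43, 5: 40, 6: 38}
NEW_POINTS = {1: 90, 2: 75, 3: 60, 4: 50, 5: 45, 6: 40}

def season_total(ranks, table):
    """Sum the points for a list of top-6 finishing ranks (no dropped results)."""
    return sum(table[r] for r in ranks)

# Two made-up athletes: one with two wins, one with steady results around the podium.
winner = [1, 1, 6, 6, 6]
steady = [3, 3, 3, 3, 4]

for name, table in (("old", OLD_POINTS), ("new", NEW_POINTS)):
    print(name, season_total(winner, table), season_total(steady, table))
# old 234 235
# new 300 290
```

In this invented case the steady athlete leads by one point under the old scale, while the athlete with two wins leads by ten under the new one.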

The IBU points system has always been an outlier compared to pretty much any other scoring system in sports, especially those of the FIS winter sports, because it greatly undervalued top results. Some people are concerned that seasons will now be decided too early; others don’t like the fact that consistency is no longer as important. The fact that results can no longer be dropped has also been criticized by some athletes.

The new biathlon points system is still less extreme than the FIS scoring system or Formula One for example. Interestingly enough, the IBU prize money distribution has always been more top heavy than their scoring system. Let’s take a closer look at how previous seasons would have turned out with the new system.

For last season’s Overall World Cup, the new point system would have had very little effect. The top 3 for both men and women would be unchanged if you apply the rules of the new scoring system. The only World Cup score that would have been flipped is the women’s Mass Start score, which was won by Justine Braisaz-Bouchet, but now would go to Elvira Öberg with the new points system.

Both big crystal globes were won rather decisively, so it is no surprise a different scoring system wouldn’t change the outcome. For last season, there wouldn’t have been much difference in when the title race was over either. Both winners would have been crowned just one race earlier (Quentin Fillon Maillet would have clinched the title in the Otepää sprint, instead of the mass start, Marte Olsbu Røiseland would have won the title three instead of two races before the end of the season).

Things get more interesting for 2019–20. Here both the men’s and the women’s overall winners come out different. It also gets quite complicated because, aside from the points themselves, there are also dropped results and the difference in World Championship races to account for.

For the men, the season actually ended like this: Johannes Thingnes Bø 913, Martin Fourcade 911. With the new system Fourcade would have won 1019 vs. 1001. However, if you still count the world championship results, the outcome flips again, and Bø comes out on top (1286 vs. 1014).

It gets even more extreme on the women’s side. The actual score was very close: Dorothea Wierer 793, Tiril Eckhoff 786. However, using this winter’s scoring system, Eckhoff would have won the title quite easily (956 vs. 737). Mostly because of Wierer’s very strong and Eckhoff’s horrible 2020 WCHs in Antholz; results which would now no longer be included. If you count the championship races, Eckhoff still comes out on top (1039 vs. 1028), but only by 11 points, thanks to her 7 wins that season compared to Wierer’s 4.

Since 2011, five (out of 24) Overall World Cup decisions would have been changed due to the new scoring system (2011: Bø vs. Svendsen; 2014: Berger vs. Mäkäräinen; 2018: Mäkäräinen vs. Kuzmina; plus both winners in 2020, as mentioned above).

It seems that World Cup seasons that were close before will still be close under the new scoring system, despite the bigger point spread. And for seasons with runaway winners, of which we had several on the men’s side during the last decade, the point system doesn’t matter all that much. The biggest change is probably that from now on wins and podiums will be much more important than consistent top 10 results.

Historic biathlon results create expectations. But what about points?

Posted on 2022-06-15 | by

biathlonanalytics

Introduction

The first article on the concept of the Win Expectancy Index based on Statistical Exploration, the introduction of W.E.I.S.E., explained the process and concept of win expectation. Using historic race results by athletes with variables identical to those of the athlete being analyzed, it calculated the percentage of wins. This gave us an idea of how likely it was for our athlete to win as well.

The second article on the topic gave you some examples of the possible use cases of the WEI.

After further developing the WEISE, this third article explains and demonstrates a new calculation using the same data and logic as the WEI. In this case, however, it does not look at the results binarily (winning or not winning), but rather at every result that awarded athletes with points*. The results of this calculation are summarized in the Expected Points Index (EXPI).

* See the yellow box further down in the article.

As a quick reminder, this is what the Win Expectancy Index looks like for all third laps of women’s sprint races (see the interactive version of the full WEI on Tableau Public):

Win Expectancy Index example
A female (W) athlete entering lap three of a SPrint race with 1 miss and ranked #3 in ski time, has a **20%** chance of winning the race.

Version 2 of W.E.I.S.E.

One of the limitations of the Win Expectancy is that it only looks at the race results binarily: you either win a race or you do not win the race. But since World Cup races are rewarded with points we can use the same approach to calculate the EXpected Points (EXP).

This is a good time to remind you that W.E.I.S.E. uses calculated points based on calculated ranks, which can significantly differ from the ranks (and points) you find on the IBU website. All pursuit race results are (re)ranked based on the isolated times, thus ignoring the time differences at the start of the races which are based on the sprint race results. Also, in the season point totals, in addition to including points for Olympic Races, I do not deduct the points of the worst two races of the season, as is common in the IBU total standings. Lastly, all points are recalculated based on the current rules on awarding points as defined by the IBU.

EXpected Points

For all race participants in the last 22 seasons that were in the same race situation for cumulative misses and ski rank, we calculate the average of the points those athletes were awarded (link to interactive dashboard):

Again, like with the Win Expectancy, we can compare expected results with actual results. But now we can do so in far greater detail. Rather than saying “Eder had a 32% chance of winning after the second lap, but he didn’t win” we can now say “Eder could expect 15 points after lap two, but got 18 points in the end”. And since we have this more detailed information at the points level, it allows us to calculate overperformance.
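
The averaging step can be sketched in a few lines. This is only an illustration under my own assumptions: the tuple layout, the `expected_points` name and the sample rows are invented, and the real calculation runs on the full historic dataset.

```python
from collections import defaultdict

def expected_points(history):
    """Average final points per race situation.

    `history` holds one tuple per historic athlete-lap:
    (discipline, gender, lap, cumulative_misses, ski_rank, final_points).
    """
    totals = defaultdict(lambda: [0.0, 0])  # situation -> [points sum, count]
    for *situation, points in history:
        entry = totals[tuple(situation)]
        entry[0] += points
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Made-up rows, not real results.
history = [
    ("SP", "W", 3, 1, 3, 48),
    ("SP", "W", 3, 1, 3, 60),
    ("SP", "W", 3, 1, 3, 36),
    ("SP", "W", 3, 2, 10, 20),
]
exp = expected_points(history)
print(exp[("SP", "W", 3, 1, 3)])  # 48.0
```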

Overperformance

We simply subtract the EXpected Points on any lap during a race from the Actual Points (AP) at the end of that race. This measure tells us if the performance was above or below the expectation for every athlete. Overperformance is therefore defined as the actual points above the expected points. A negative value for overperformance simply indicates that the actual points were lower than the expected points. This can also be referred to as underperformance.
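
As a minimal sketch, the definition is a single subtraction. The function name is mine; the first call uses the Eder figures quoted earlier (15 points expected after lap two, 18 scored).

```python
def overperformance(actual_points, exp_points):
    """Actual Points (AP) minus EXpected Points (EXP); negative means underperformance."""
    return actual_points - exp_points

print(overperformance(18, 15))  # 3
print(overperformance(10, 15))  # -5
```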

Let’s look at an example. In the image below, we have all races of the 2021-22 season for Julia Simon. Based on the overperformance calculation, we can see that she generally performed quite a lot better than expected:

Julia Simon has performed better than expected based on her last laps in all races in the 2021-22 season

When we turn this chart by 90 degrees and add lines for actual points (AP, green) and expected points (EXP, blue), it shows us Julia’s seasonal trend:

Again Julia Simon, comparing Actual Points (AP) to Expected Points (EXP) per final lap of the race
for the 2021-22 season (lines), and the overperformance (bars)

Running total of overperformance points

Now, rather than looking at the data race by race, let’s look at the overperformance cumulatively. Showing the running total of these values, we get a sense of how much Julia has overperformed this whole season. An amazing 120 points more than she could have expected based on historic results:

Same as above but with a running total of Over-and Under Performance based on final laps
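
A running total like this is just a cumulative sum over the per-race values. Here is a small sketch with made-up numbers (not Julia Simon’s actual results):

```python
from itertools import accumulate

# Made-up per-race overperformance values (AP minus EXP), in race order.
per_race = [4.0, -2.5, 10.0, 6.5, -1.0]

running_total = list(accumulate(per_race))
print(running_total)  # [4.0, 1.5, 11.5, 18.0, 17.0]
```

The last value of the running total is the season total of overperformance points.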

Let’s move forward with that idea of the running total of overperformance points for the season. The following chart plots all female athletes of the 2021-22 season based on the total actual points (vertical axis) and the running total of overperformance points (horizontal axis):

Total actual points and running total of over-and under performance points for women’s final laps of races in the 2021-22 season

Based on this chart, Julia Simon, Dorothea Wierer and Lisa Hauser had the best total of overperformance points this season. They scored far more points than expected from historic results, based on their misses and ski rank going into the final lap of the races. Alina Stremous, Mari Eder and Vanessa Voigt had the lowest totals (in other words, they were the biggest underperformers), scoring far fewer points than expected.

Analyzing the past season, the athletes who overperformed did very well. Alternatively, when looking at the season ahead, athletes that underperformed last season could have expected to do better. So with some specific improvements over the summer, and perhaps some better luck, they may be able to make huge jumps next season if they can perform up to or above their expected points.

Season(s) averages and spread of overperformance

Per athlete, we can take the average of all the overperformances to see how they performed per season or over a specific timeframe. The other thing to look at is how large the difference was between their best and worst overperformance during that timeframe: the overperformance spread, or range.

The last charts of this article shown below, also available interactively in the W.E.I.S.E. version 2 dashboard, show all athletes of the 2021-22 season plotted based on their average points overperformed per race (final laps only), and their overperformance spread:

Dorothea Wierer had an average of 4.56 point overperformance and an overperformance spread of 31.1

With overperformance it is fairly straightforward to determine good and bad: a positive overperformance is good (and the larger the better) and a negative one is not good, or at least leaves room for improvement. With the spread, it is a little less clear. As it is strictly based on the best and the worst overperformances during the selected season, it could be used, cautiously, as an indicator of consistency.

Another example. When we dig into Wierer’s data a little deeper, we can see that her average of +4.56 points of overperformance is already strong, and that it would be even better if the two negative outliers could be excluded. Did she perhaps break a pole in the last lap of that individual race, or have a fall in the last lap?

It will be up to Wierer and her team to figure out what happened in those two races to learn and improve. And for other athletes to find out how Wierer is so good at outperforming the expected points.

Dorothea Wierer had her best overperformance at 14.9 points above expected, and a worst of –16.2 overperformance, for a spread of 31.1
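
The average and the spread can be sketched as follows. Only the best (14.9) and worst (–16.2) values come from the caption above; the other values in the list are invented, so the average printed here is not Wierer’s real one.

```python
from statistics import mean

def average_and_spread(values):
    """Mean overperformance and the best-minus-worst spread (a range)."""
    return mean(values), max(values) - min(values)

# Made-up list; only the best (14.9) and worst (-16.2) are taken from the caption.
values = [14.9, 6.0, 3.5, -16.2]
avg, spread = average_and_spread(values)
print(round(avg, 2))
print(round(spread, 1))  # 31.1
```

Because the spread depends on just two races, a single unlucky day can change it a lot, which is one reason to use it cautiously.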

Conclusion and further development

In the first article on the Win Expectancy Index based on Statistical Exploration, I mostly focused on explaining the underlying data, processes and models, and the concept for the index itself. The second article gave some practical examples of the application of the index. This third article introduced an alternate and more detailed measurement based on the same concept: EXpected Points.

Tools like the W.E.I.S.E. are not meant to eventually replace current and conventional wisdom, skill, expertise and knowledge in the field of biathlon. But they provide a different view based on actual data. And this is an approach not commonly and seriously used by many athletes and nations in the world of biathlon. Yet? Hopefully, articles like these will open some eyes to new possibilities that eventually, combined with, rather than instead of, current knowledge and expertise, will push biathlon athletes even further.

Thoughts or comments? You can find me on Twitter, or email me on the podcast Gmail account: PenaltyLoopPodcast

What do you expect? Practical applications of the W.E.I.S.E.

Posted on 2022-06-14 | by

biathlonanalytics

Introduction

In my last article, the introduction of W.E.I.S.E., I introduced the process for creating the calculation of the expectation of winning, based on historic results with identical combinations of discipline, gender, race lap, the cumulative number of misses and cumulative ski time rank. After further developing the WEISE, this article demonstrates some practical uses of this Win Expectancy Index (WEI).

As a quick reminder, the interactive WEI dashboard can be found here and looks like this (with lap 3 selected):

Examples of practical uses of the Win Expectancy Index

A tool like the Win Expectancy Index only makes sense if there is actual value in using it. Personally, I think it is very interesting to just look at the index, but I can imagine there are not too many people who share that passion for biathlon, data and data visualization. Below are some practical examples of how the Win Expectancy Index can be used to analyze athletes and races.

Athletes

Final lap Win Expectancy versus actual result

The conversion rate of opportunity to wins. For example, Elvira Oeberg’s conversion rate for races in which her Win Expectancy on the last lap was 33% or higher, was 60% (she won 3 of the 5 races).

Elvira Oeberg’s race results in the 2021-22 season and her Win Expectancy in the last lap of each race
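
A conversion rate like this can be sketched as below. The function name, the data layout and the individual WE values are assumptions for illustration; the list is only arranged so that it reproduces the 3-wins-in-5 example.

```python
def conversion_rate(races, threshold=0.33):
    """Share of wins among races where the final-lap Win Expectancy met the threshold.

    `races` is a list of (final_lap_we, won) pairs.
    """
    chances = [won for we, won in races if we >= threshold]
    if not chances:
        return None  # no qualifying races, so no rate to report
    return sum(chances) / len(chances)

# Made-up WE values, arranged to match the 3-wins-in-5 example from the text.
races = [(0.45, True), (0.40, True), (0.36, False), (0.33, True),
         (0.50, False), (0.10, False), (0.05, True)]
print(conversion_rate(races))  # 0.6
```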

Seasonal trend of average Win Expectancy per race for the final lap

This allows us to look at an athlete’s performance without the influence of the actual race results

The trend of JT Boe’s Win Expectancy in the final lap of every race

Average Win Expectancy per lap per discipline

Provides insight into a racer’s ability to balance their races well, and still perform well right to the final lap of a race

Looking at average Win Expectancy development per lap per athlete for sprint races

In this example, we are looking at all sprint races for women in the 2021-22 season. We then average each athlete’s Win Expectancy per lap. From this, we can see that for athletes like Roeiseland, Elvira Oeberg and Hauser the Win Expectancy increases towards the end of the race. On average, Sola, Alimbekava and Nilsson’s Win Expectancy declines in the last lap.

This gives athletes and coaches a tool to further analyse data at an individual race level to see if there is an issue or potential for improvement.

Races

Win Expectancy per race

How does the Win Expectancy change over the course of the race, and how did the eventual winner develop during the race?

Win Expectancy for athletes in the Women’s Sprint during World Cup 8 of the 2021-22 season

The trend of Win Expectancy per miss as ski rank increases

Here we can see that for athletes in the top-10 for ski rank, the difference in WE between 0 misses (dark green) and 1 miss (light green) is fairly consistent.

Win Expectancy for all athletes in lap five of all five-lap-races with either
one (light green) or zero (dark green) misses as their race rank increases

The trend of Win Expectancy per ski rank (group) as misses increase

We can see here that the WE when skiing top-5 with one miss (~43%) is lower than when skiing top-10 with 0 misses (~49%)

Win Expectancy for all athletes in lap five of all five-lap-races with a ski rank as indicated by the labels as their race misses increase

Using Win Expectancy data to indicate some of the more exciting races

For biathlon viewers and fans, we can look for races where the highest Win Expectancy reached by any athlete during the race was 20% or less (a close pack). Within that group, we count the athletes who held that highest WE at some point during the race. If that count is 2 or more in sprint races, or 4 or more in all other races, we then look at the final rank of the athlete(s) with the highest WE. If that rank was 10th or better, we highlight the race with a pink star:

Based on this logic, the Women’s Sprint of World Cup 7 in the 2002-03 season would be interesting to watch (the highest WE of the race was only 14%, three athletes had that WE during the race and the highest rank of those three athletes was 9th). Or more recently, rewatch the 2019-20 World Cup 8 Women’s Pursuit (15%, 5 and #4).
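
The selection rule can be written down as a short check. This is a sketch of my reading of the rule, with hypothetical parameter names, tested against the two races named above.

```python
def is_exciting(discipline, peak_we, athletes_at_peak, best_final_rank):
    """Apply the three conditions for a pink star.

    peak_we: highest Win Expectancy any athlete reached during the race (0-1).
    athletes_at_peak: number of athletes who held that highest value at some point.
    best_final_rank: best finishing rank among those athletes.
    """
    min_athletes = 2 if discipline == "sprint" else 4
    return (peak_we <= 0.20
            and athletes_at_peak >= min_athletes
            and best_final_rank <= 10)

# The two races named in the text.
print(is_exciting("sprint", 0.14, 3, 9))   # True
print(is_exciting("pursuit", 0.15, 5, 4))  # True
# A made-up counterexample: too few athletes at the peak for a non-sprint race.
print(is_exciting("pursuit", 0.15, 3, 4))  # False
```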

Conclusion and further development

In the first article on the Win Expectancy Index based on Statistical Exploration, I mostly focused on explaining the underlying data, processes and models, and the concept for the index itself. This article dove into the application of the index by showing some examples of how it can provide value in analyzing the performances of athletes related to historic results with the same variables. I hope that this gives a better understanding of the concept of the WEI, and highlights its potential for performance analysis and athlete improvement based on a different, data-driven, view of biathlon performance.

Thoughts or comments? You can find me on Twitter, or email me on the podcast Gmail account: PenaltyLoopPodcast

Introducing W.E.I.S.E.: the Win Expectancy Index based on Statistical Exploration, version 1

Posted on 2022-05-13 | by

biathlonanalytics

Introduction

History

For many years, sports data analysts in numerous fields have used results from the past to predict outcomes of similar situations in the future with a specific level of certainty. In baseball, for example, one can foresee the outcome of an at-bat with specific parameters (2 strikes, 2 balls, 1 out, batting right, pitching left, and a score of 4-2 in the 7th inning) based on all similar situations in previous seasons of baseball. In (ice-)hockey one can do the same when looking at the shot location, the period and the current score with a specific goalie in the net, or the home-team win expectation based on a certain score and with a specific amount of time left in the game. All in all, there is a long history of using historical results to predict future ones. The article below takes a first stab at doing something similar for our beloved sport biathlon.

What is W.E.I.S.E.?

Since the end of the 2021-2022 season, I have been working on The Win Expectancy Index based on Statistical Exploration, W.E.I.S.E. or WEI for short. It uses results from previous biathlon races to predict with a level of certainty what the outcome of a race can be. Unlike baseball, and to some degree hockey, biathlon doesn’t offer its data ‘live’ during a race, but we can still use the WEI to analyze athletes’ performances by comparing predicted and actual results, look at an athlete’s stronger and weaker points during a race, and decide what component of the race to focus on, to name a few.

The WEI only just sprouted, and I’m already claiming quite the impact and outcome of it! Please don’t take me too seriously (but please don’t dismiss me entirely either). Although I highly value data and statistics in sports, I fully realize they are just one of the means that can be used for analyzing biathlon, and do not replace but rather enhance any other type of analysis. And outcomes are just predictions based on other events that, on paper, are the same, but in reality can be quite different, and as such, while providing value, should be taken with a few grains of salt.

Goal

My goal with the WEI and this article is to share my thoughts, describe the creation process and explain its possible applications. By doing so I hope to spark your interest and get your feedback. Yes, I am highly interested to hear your thoughts, to know if you see any value in the WEI, and to read about what comes to mind when you read this! (fire me a tweet if you care to share!) In the end, I want to keep improving, updating and refining the WEI, and getting different perspectives will help that tremendously. And although it would sadden me after double-, triple- and quadruple-checking everything, if you do find anything wrong or broken in the process described here or the index demonstrated further down, please do let me know.

Data

Basics

The basic data for this article are the individual race results on a participant level, as gathered by RealBiathlon, for all seasons starting in 2001-2002 up to and including 2021-2022. From all those races, the following results were excluded: Null (no values), DNF (did not finish), DNS (did not start), DSQ (disqualified), and LAP (lapped).

To give you an idea, we are talking about 1,068 races in total (136 individual, 200 mass start, 328 pursuit and 402 sprint races). All these races had a combined total of 76,712 participants, 1,143,230 shots and 226,896 misses (19.8%).

Key statistics

For the first version of the WEI, I focus on two specific statistics that greatly impact the results of a biathlon race: ski speed, expressed in course time, and shooting, expressed in misses. Since I want to be able to look at win expectancy at every end of a lap (post-shooting) I calculated cumulative misses per lap and cumulative course time per lap which I then ranked.
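
Here is a minimal sketch of those two per-lap statistics, using made-up lap data for three athletes. The function and variable names are my own, and ties in cumulative time are not handled specially.

```python
from itertools import accumulate

def cumulative_ranks(lap_times_by_athlete, lap):
    """Rank athletes (1 = fastest) by cumulative course time after `lap` laps."""
    totals = {name: sum(times[:lap]) for name, times in lap_times_by_athlete.items()}
    ordered = sorted(totals, key=totals.get)
    return {name: position for position, name in enumerate(ordered, start=1)}

# Made-up lap data: course time in seconds per lap and misses per shooting.
course_times = {
    "A": [400.0, 410.0, 405.0],
    "B": [395.0, 420.0, 400.0],
    "C": [402.0, 405.0, 399.0],
}
misses = {"A": [0, 1], "B": [1, 0], "C": [0, 0]}

cumulative_misses = {name: list(accumulate(m)) for name, m in misses.items()}
print(cumulative_misses["A"])             # [0, 1]
print(cumulative_ranks(course_times, 1))  # {'B': 1, 'A': 2, 'C': 3}
print(cumulative_ranks(course_times, 2))  # {'C': 1, 'A': 2, 'B': 3}
```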

(Re-)Calculations

I also (re-)calculated final race ranks and points, for a number of reasons:

First, I wanted to include Olympic races, for which no World Cup points are awarded.
Second, when using such a long time period, some rules for awarding points may have changed, so I wanted the points to reflect the current rules.
Third, as pursuit races can be heavily influenced by start times, I used the isolated times (actual race time ignoring start time differences) to calculate the final result rank and points.

Note that this last item can lead to odd-looking results, but please trust that the calculations are fine. For example, we know that Quentin Fillon Maillet won six pursuits in a row this season (including Olympics). However, his isolated results in those races were 5th, 2nd, 3rd, 14th, 3rd and 7th.
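
To show why isolated ranks can differ so much from the official result, here is a small sketch with made-up times. The data layout and the function name are assumptions for illustration.

```python
def isolated_ranks(results):
    """Re-rank a pursuit by isolated time (finish time minus start offset).

    `results` maps athlete -> (start_offset_s, finish_time_s), both measured
    from the moment the first starter leaves.
    """
    isolated = {name: finish - start for name, (start, finish) in results.items()}
    ordered = sorted(isolated, key=isolated.get)
    return {name: rank for rank, name in enumerate(ordered, start=1)}

# Made-up times: A starts first and crosses the line first,
# but C has the fastest pursuit once the start gaps are removed.
results = {"A": (0.0, 1900.0), "B": (20.0, 1905.0), "C": (45.0, 1910.0)}
print(isolated_ranks(results))  # {'C': 1, 'B': 2, 'A': 3}
```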

Data kaputt?

When looking at data for the 4th and 5th laps, some sprint races may appear to be broken. They are not: the sprint is simply the only discipline with just two shootings and three laps. In the dashboard I share below I have made sure sprint races are not shown for laps four and five, but when more charts become available in the near future, you may see some oddities for laps four and five in sprint races. Now you know what causes it.

Unfortunately, some races are missing course time data for some or all participants. This was likely due to broken time tracking equipment or ankle straps on the day of the race, or someone forgot to turn on the trackers, or something comparable. Since this data is actually ‘kaputt’, these results were removed from the dataset, as nulls cannot be used in the calculations and counts (118 participants in total).

Analysis

Levels of detail

Based on the data described above, I wanted to create this first version of the WEI with cumulative misses and cumulative course time combinations, that possibly could be generalized in bins. As we are only able to look at results per lap, we don’t have the luxury of generating a huge number of historical results as they do in baseball for example. To ensure a decent sample size is available, grouping ski rankings in bins seemed to make sense. I also wanted to make sure that the resulting dashboards would allow users to look at the data as a whole, per gender and per discipline.

Calculating Win Expectancy

After organizing all the data in such a way that I had combinations of race/athlete/lap/cumulative misses/cumulative ski time rank, I could then calculate the total occurrences of these combinations, as well as the total number of race winners with those combinations.

For example, after the first lap, the most occurring combination was that of 535 participants with zero misses and ranked 13th in ski time, of whom 35 ended up winning the race. The Win Expectancy (WE) for that group would be 35/535 = 6.5%. Not surprisingly, the 501 participants without misses and ranking first in ski time after the first lap had the highest WE, with just under 30%. These WEs are still relatively low, as they are calculated after the first lap and shooting. A lot can still change, especially in the race disciplines that go five laps.
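
The counting behind that example can be sketched as follows. The row layout and function name are my own assumptions; the synthetic rows simply rebuild the group of 535 athletes with 35 winners quoted above.

```python
from collections import Counter

def win_expectancy(rows):
    """Win rate per (lap, cumulative_misses, ski_rank) combination.

    `rows` is an iterable of (lap, cumulative_misses, ski_rank, won) tuples.
    """
    occurrences, winners = Counter(), Counter()
    for lap, misses, ski_rank, won in rows:
        key = (lap, misses, ski_rank)
        occurrences[key] += 1
        winners[key] += int(won)
    return {key: winners[key] / n for key, n in occurrences.items()}

# Rebuild the group from the text: 535 athletes with zero misses and ski rank 13
# after lap one, 35 of whom won.
rows = [(1, 0, 13, i < 35) for i in range(535)]
we = win_expectancy(rows)
print(f"{we[(1, 0, 13)]:.1%}")  # 6.5%
```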

Reliability

The sprint races, only having three laps (while having the most participants), will have their most reliable WEs after lap three, when both their cumulative misses and ski time rank will no longer change. The athletes in all race disciplines with zero cumulative misses and ranked first in cumulative ski time after lap three have a 76.4% chance of winning (94 winners out of 123 athletes in total), based on historic results. If we look at athletes after lap three in sprint races only, the WE goes up to 89.6% (69 out of 77).

Sample sizes

The more specific we get with regards to race parameters, the fewer athletes and results we will have to compare to (smaller sample size). This is especially true for mass starts, as those races have only 30 starters each. For example, when we specifically look at the combination of the mass start discipline, women, one miss and ski time rank five, we will only find ten athletes (of whom two won, so the WE for that group is 20%). And as we go further down in misses and rank for that group, say for six misses and ski time rank 18, we only have four athletes.

Bins

To address this we can use bins for the ski time rank, groups of values close together. So far I used bins with a size of five, combining all athletes from the same original group above (mass start, women, one miss) in ski time rank groups of 1-5, 6-10, 11-15, and so on. The ten athletes we found with one miss and ski time rank five are now in the group with one miss and ski time ranks 1-5, giving us 68 athletes including 27 winners, for a WE of 39.7%.
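
Assigning a rank to a bin of five is a one-line calculation. A small sketch follows, with a hypothetical helper name, plus the arithmetic for the binned group quoted above.

```python
def ski_rank_bin(rank, size=5):
    """Label the bin a ski time rank falls into, e.g. 5 -> '1-5', 6 -> '6-10'."""
    low = (rank - 1) // size * size + 1
    return f"{low}-{low + size - 1}"

print(ski_rank_bin(5))   # 1-5
print(ski_rank_bin(6))   # 6-10
print(ski_rank_bin(18))  # 16-20

# Counts quoted in the text for mass start, women, one miss, ski ranks 1-5.
print(f"{27 / 68:.1%}")  # 39.7%
```

The larger sample (68 athletes instead of ten) is what makes the binned value more stable, at the cost of treating ranks one to five as the same situation.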

The W.E.I.S.E.

Now that I have introduced and described the creation of the W.E.I.S.E., let’s just have a look at it. Below is an interactive version of the Win Expectancy Index. Based on your input in the filters for Lap, Ski Rank or bin, Gender and Discipline, it shows the Win Expectancy for any combination of cumulative misses and cumulative-ski-time rank. If you hover your mouse over a data point it will show a pop-up with additional details, as shown below:

In case the embedded version below is not working or not displaying well on your screen, please go to the same dashboard on my Tableau Public site. If you’re not sure how to use the filters and such please see my previous post with some tips and tricks on how to use Tableau dashboards.

Applications of the W.E.I.S.E.

Examples

Now that you have a better understanding of the W.E.I.S.E. and an idea of what it looks like, the question arises: how does it add value? The following are examples of how I believe the W.E.I.S.E. can be used to better understand biathlon:

See what combinations of misses and ski time rankings have the best WE for each lap
Compare combinations for WE vs number of occurrences (probability vs reliability)
Compare the WE to actual race results and see how they relate or why they differ
See if the WE declines as ski time rank increases with the same number of misses
Conversely, see if the WE declines as the number of misses increases while the ski time rank (bin) stays the same
For each lap in a race, analyze the changes in WE per athlete
Look for commonalities when doing the above for a number of races for a specific athlete to identify potential weak spots or strengths
Aggregate the WE per lap per race and compare races
Analyze if multiple and/or big changes in WE in a race relate to interesting races to (re-)watch
See if the aggregated WE can tell us anything on a nation’s level
Once live data becomes available from the IBU, it can be used to see live updates on Win Expectancy for any athlete, and even (although it’s definitely not my thing) use it for betting during the race
Change statistics to some that may better represent skiing and shooting
Develop this further to show expected points per combination rather than just the expectancy to win.

Share your thoughts, please!

That’s quite a list of things that I can just come up with after giving it some thought. But let’s not stop there! If you made it through the article all the way to here, please let me know (Twitter) your thoughts about the application of the W.E.I.S.E. and your ideas on how to improve and expand it in the next version. If you’re not on Twitter, I’m also on Instagram and you can also email me (just add rj@ in front of my main website name). I would really appreciate your time and attention. I have plans to embed some of the ideas above in newer versions of W.E.I.S.E. in the following weeks, but new perspectives really help in improving it, so I’m looking forward to your feedback and the conversations!
