Last year we introduced our DRAFTe model, which aims, like the models of many of our stats colleagues, to provide a ranking ahead of the NHL Draft.
Now, everything can be perfected, right?
In short, we did not want to limit ourselves to drafted players; instead we used all draft-eligible players since 2008. That way, we avoid the bias of only looking at players that NHL organizations deemed interesting enough, for whatever reason.
So, while this 2023 version of the DRAFTe model builds on what we created last year (using Win Shares and probabilities of reaching the NHL based on your performances in your league), we also modified other parts.
Components
As always, we only considered players with at least 10 games in a given league.
Only regular season stats are included. No playoffs, no international tournaments.
The model is trained on data from 2008 to 2019, giving the 2019 players enough time to make the NHL. It only includes 18-year-olds: no overagers (hopefully next time) and no goalies.
NHL careers are considered up to age 30, which avoids the noise in the data as older players have many more ups and downs. Europeans are also more likely to leave the NHL early, which biases their longevity, average performance, etc.
The first big decision was to include the year before the draft year (year -1) in the evaluation. We briefly tried to include draft year -2 as well, but a good portion of players didn’t have that data (we don’t track U18 leagues), and the overall results were simply less correlated with the future NHL careers of players.
Using our world database, we established the value of every draft-eligible player in their draft year, using a 66% weight for the draft year and 34% for year -1, which our tests identified as the best combination.
We established that draft-time value on both Win Shares and Points in parallel. While we wanted to keep using Win Shares as a better proxy for the overall impact of players, point production remains strongly correlated with present and future performance, probably because NHL players are largely selected for their ability to score (at least a little), and because hockey is a sport where the better players simply score more, no matter the position they play.
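As a minimal sketch of that 66/34 blend, assuming both seasons are already expressed on the same scale (the function and constant names are ours for illustration, and the article does not say how a missing year -1 is handled, so the sketch simply requires both values):

```python
# Weights stated in the article: 66% draft year, 34% year -1.
DRAFT_YEAR_WEIGHT = 0.66
PRIOR_YEAR_WEIGHT = 0.34


def blended_value(draft_year_value: float, prior_year_value: float) -> float:
    """Blend draft-year and year -1 values into one draft-time value.

    Hypothetical helper: both inputs are assumed to be on the same scale
    (for example Win Shares or Points after league translation).
    """
    return (DRAFT_YEAR_WEIGHT * draft_year_value
            + PRIOR_YEAR_WEIGHT * prior_year_value)
```

For instance, a made-up player worth 10.0 in his draft year and 5.0 the year before would come out at 6.6 + 1.7 = 8.3.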
If a player has played in two or more leagues in the same year, each league is weighted by the number of games he played there. For example, Michkov’s 30 KHL games account for 71% of his draft-year season, compared to 29% for his 12 VHL games.
All leagues were translated into NHLe values, using the newly reworked NHLe we presented recently. This puts all players on an apples-to-apples basis, even those who went through several leagues (like Michkov playing in the MHL, VHL and KHL).
Adjusting for age is a key step in draft projection. The month you were born in may matter less and less as your career goes on, but on draft day it still matters a lot. Like last year, we divided the year into four periods (16 September to 15 December, 16 December to 15 March, 16 March to 15 June, 16 June to 15 September) and looked at the average Win Shares created by NHLers over their careers up to age 30. As in earlier age studies, the oldest players (September to December) and the youngest (June to September) have had better careers than Winter and Spring kids.
Therefore, Summer and Fall kids received a 3% to 6% boost to their Win Shares and Points draft-time values, while Winter and Spring players got a 2% to 6% penalty.
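The games-based weighting can be sketched as follows. The helper names are hypothetical, and the per-league values passed to `season_value` are assumed to be already comparable across leagues:

```python
def league_weights(games_by_league: dict[str, int]) -> dict[str, float]:
    """Return each league's share of a player's season, by games played."""
    total_games = sum(games_by_league.values())
    if total_games <= 0:
        raise ValueError("player has no games recorded")
    return {league: games / total_games
            for league, games in games_by_league.items()}


def season_value(games_by_league: dict[str, int],
                 value_by_league: dict[str, float]) -> float:
    """Games-weighted average of per-league values (illustrative only)."""
    weights = league_weights(games_by_league)
    return sum(weights[league] * value_by_league[league]
               for league in games_by_league)
```

With the Michkov split quoted above, `league_weights({"KHL": 30, "VHL": 12})` gives 30/42 and 12/42, which round to 71% and 29%.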
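To make the four birth windows concrete, here is a small sketch that classifies a birth date and applies a multiplier. The window boundaries come from the text; the season labels follow the article's wording, and the multipliers in `EXAMPLE_MULTIPLIERS` are placeholders we picked inside the stated ranges, not the model's actual values.

```python
from datetime import date

# PLACEHOLDER multipliers, chosen only to fall inside the ranges quoted in
# the article (+3% to +6% for Summer/Fall, -2% to -6% for Winter/Spring).
# They are NOT the model's real coefficients.
EXAMPLE_MULTIPLIERS = {
    "Fall": 1.03,
    "Winter": 0.98,
    "Spring": 0.94,
    "Summer": 1.06,
}


def birth_window(birth: date) -> str:
    """Map a birth date to one of the four windows used in the article."""
    key = (birth.month, birth.day)
    if (3, 16) <= key <= (6, 15):
        return "Spring"
    if (6, 16) <= key <= (9, 15):
        return "Summer"
    if (9, 16) <= key <= (12, 15):
        return "Fall"
    # 16 December to 15 March wraps around the new year.
    return "Winter"


def age_adjusted(value: float, birth: date,
                 multipliers: dict[str, float]) -> float:
    """Scale a draft-time value by the multiplier of the birth window."""
    return value * multipliers[birth_window(birth)]
```

Comparing month and day as a tuple keeps the boundary dates (the 15th and 16th) unambiguous, and the Winter window falls out as the remaining case.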
Probabilities to become an NHLer
What’s an NHLer? There is no right answer to that, and it is all a bit arbitrary, whether you choose 1, 50, 100 or 200 NHL games as the minimum to earn that status. We settled on 82 games, partly for the symbolism and partly because going above 100 would also limit our sample, as the draftees from 2018 or 2019 are barely there yet.
We ranked players by percentile within their league, by position. For example, Connor McDavid ranks at the 100th percentile among OHL forwards for both Win Shares and Points, alongside John Tavares and Taylor Hall.
We then binned players into groups of non-linear width, which, after testing, proved the best way to represent how quickly talent drops off from the top down.
100th percentile in that league at that position. Simply the best historical players.
95th-99th percentiles.
90th-94th percentiles.
80th-89th percentiles.
66th-79th percentiles.
50th-65th percentiles.
25th-49th percentiles, getting into bigger bins.
0-24th percentiles.
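The eight groups above can be expressed as a simple lookup on the lower edge of each bin. This is only a sketch: the labels and the function name are ours, and fractional percentiles are assumed to fall into the bin whose lower edge they have reached.

```python
# Lower edge and label of each group, from the top down, as listed above.
PERCENTILE_BINS = [
    (100, "100"),
    (95, "95-99"),
    (90, "90-94"),
    (80, "80-89"),
    (66, "66-79"),
    (50, "50-65"),
    (25, "25-49"),
    (0, "0-24"),
]


def percentile_group(percentile: float) -> str:
    """Return the label of the group a league/position percentile falls in."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    for lower_edge, label in PERCENTILE_BINS:
        if percentile >= lower_edge:
            return label
    raise AssertionError("unreachable: the last bin starts at 0")
```

Note how the bins widen toward the bottom: one percentile at the very top, five in the next two groups, and 25 in the last two.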
Why group players? Because players do not develop in a linear way from their draft year to their 30th birthday. And sometimes, players ranking at the 90th percentile simply have a worse career than players ranking at the 75th. By grouping similar players together, we reduce that volatility.
We then looked at the percentage of players that became NHLers within each group.
For the leagues with a large sample (all junior leagues), the results fell out quite naturally. In the OHL, for example, all 8 groups for both positions were in perfect order, from a 100% chance of becoming an NHLer for the 100th percentile forwards down to a 0.4% chance for the 0-24th percentile forwards.
In some other cases, the drop between the top historical players and the rest is huge. Among defensemen in Swedish junior, the 100th percentile players all made the NHL, but only 22% of players ranked between the 95th and 99th percentiles did so!
When the progression was not in order (a lower group showing a higher probability than the group above it), the model merged groups together until it was. Among defensemen in the Finnish Liiga, the only sensible option was to group all players above the 50th percentile together, giving a 33% probability of making the NHL.
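The article does not spell out the exact merging rule, so the following is only one plausible reading: pool adjacent groups, from the top down, whenever a lower group shows a higher NHLer rate than the group above it (essentially the pool-adjacent-violators idea). The `Group` class and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Group:
    label: str
    players: int  # players in the group
    nhlers: int   # of those, how many became NHLers

    @property
    def rate(self) -> float:
        """Share of the group that became NHLers (0.0 for an empty group)."""
        return self.nhlers / self.players if self.players else 0.0


def merge_until_ordered(groups: list[Group]) -> list[Group]:
    """Merge adjacent groups until rates never increase going down.

    `groups` must be ordered from the top percentile group to the bottom.
    ASSUMPTION: this pooling rule is our interpretation, not necessarily
    the rule the model uses.
    """
    merged: list[Group] = []
    for group in groups:
        merged.append(Group(group.label, group.players, group.nhlers))
        # A lower group beating the one above it breaks the order: pool them.
        while len(merged) >= 2 and merged[-1].rate > merged[-2].rate:
            lower = merged.pop()
            upper = merged.pop()
            merged.append(Group(f"{upper.label}+{lower.label}",
                                upper.players + lower.players,
                                upper.nhlers + lower.nhlers))
    return merged
```

Pooling the raw counts, rather than averaging the two rates, keeps the merged probability weighted by how many players each group actually contained.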
Adjustments to cover special cases
We then ran into a series of issues affecting the smaller-sample leagues. One was the absence of NHLers at a given position in a given league, as for forwards from the German DEL before 2020, when Stützle, Reichel and Peterka were all selected in the first couple of rounds. In that case we used a comparison with the other position to establish a relative probability of becoming an NHLer, to avoid capping those players at 0%. Basically, this meant comparing Stützle to Moritz Seider.
Another adjustment covers unprecedented performances among top groups (100th percentile players) in small leagues (those with 30 or fewer players since 2008). Why? To avoid capping a player when he is the best ever in his local environment. If, for example, the player performed at 1.9 times the best performance of his predecessors, his probability of reaching the NHL was multiplied by 1.9, capped at 100% of course.
Finally, we obtained a final draft value for both Win Shares and Points by computing:
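A rough sketch of this small-league adjustment, under our own assumptions about the inputs (the parameter names are hypothetical, and we assume the boost only applies when the player actually exceeds the previous best):

```python
SMALL_LEAGUE_MAX_PLAYERS = 30  # "30 or fewer players since 2008"


def boosted_probability(base_probability: float,
                        player_value: float,
                        best_previous_value: float,
                        league_player_count: int,
                        in_top_group: bool) -> float:
    """Scale the NHLer probability for an unprecedented top performer.

    Only applies to 100th percentile players in small leagues; the
    result is capped at 1.0 (100%).
    """
    if not in_top_group or league_player_count > SMALL_LEAGUE_MAX_PLAYERS:
        return base_probability
    if best_previous_value <= 0:
        # No usable reference performance: leave the probability alone.
        return base_probability
    ratio = player_value / best_previous_value
    if ratio <= 1.0:
        return base_probability
    return min(1.0, base_probability * ratio)
```

With made-up numbers, a 40% base probability and a performance 1.9 times the previous best gives 76%, while a 60% base would hit the 100% cap.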
age adjusted Win Shares or Points value * probability of becoming an NHLer.
Ranking players
Then, our final problem was how to combine Win Shares and Points. First, we ranked players on those 2 metrics. For instance, Tim Stützle in 2020 was 5th on Win Shares and 4th on Points. We then ran a correlation between those Win Shares and Points draft-time values and the NHL career performance of every prospect until age 30. We did so using the top 200 prospects per draft year according to the model, to remove the noise from the thousands of “lower” players (no offense).
Win Shares value at draft time seems to determine 45% of a player’s future NHL career (or absence of one), and Points value at draft time seems to determine 54% of it.
As points were more telling, we weighted them accordingly in the Draft final ranking, using: (Win Shares rank * 45%) + (Points rank * 54%).
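The rank blend can be sketched like this, assuming a lower combined score means a better final rank (rank 1 being the best on each metric). The function names are ours, ties are not handled specially, and the 45% and 54% weights are used exactly as quoted.

```python
WIN_SHARES_WEIGHT = 0.45
POINTS_WEIGHT = 0.54


def ranks(values: dict[str, float]) -> dict[str, int]:
    """Rank players from 1 (highest draft value) downward."""
    ordered = sorted(values, key=lambda name: values[name], reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def final_ranking(win_shares_values: dict[str, float],
                  points_values: dict[str, float]) -> list[tuple[str, float]]:
    """Blend the two ranks and sort by combined score, best first.

    Both dicts are assumed to hold the same players, with values already
    age-adjusted and multiplied by the NHLer probability.
    """
    ws_rank = ranks(win_shares_values)
    pts_rank = ranks(points_values)
    scores = {
        name: WIN_SHARES_WEIGHT * ws_rank[name] + POINTS_WEIGHT * pts_rank[name]
        for name in win_shares_values
    }
    return sorted(scores.items(), key=lambda item: item[1])
```

Taking the Stützle example, a 5th place on Win Shares and a 4th place on Points gives 2.25 + 2.16 = 4.41 as his combined score.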
Limitations and future improvements
It is only a model. No matter how much data we use, the rules we set up to cover special cases, or the fact that Win Shares cover more aspects of the game than just points, the results remain based on tangible information: performances, environment and biography.
But tangible results can have intangible explanations that stats don’t know about: personal reasons, injuries, etc. And there are no adjustments for that. This is where a team’s scouting staff would counterbalance the projections with their own assessment of the player in order to give him a final grade.
Among the things we would like to improve in the future is our limited knowledge of certain leagues where the comparison sample is small. Maybe we could group leagues together based on their relative strength. There is also the ever-changing hockey landscape. European players tend to stay in their home league more often now, and recency bias should be looked at in general. The same goes for the changing scoring rates in leagues.
Anyway, there’s only so much we can do to try and be as little wrong as possible (stats joke again). Hopefully it is still solid for the majority of players assessed.
Without further ado, here’s the top 32 for the 2023 NHL Draft according to the model. A full article on the why behind each result will come out soon!
And you can access the full ranking, as well as individual player cards, on our Tableau page.
Interesting that Suniev is ahead of Nadeau, considering the difference in scoring this year and that Nadeau is ~6 months younger. Is the Fall vs. Spring birthday influencing it that much?
Solid article again.
I was wondering if a third component could be considered in parallel with Win Shares and Points for established players: minutes played in the NHL. The model seems to undervalue defencemen, but maybe considering time on ice could sway it toward a more realistic draft order, since defensive metrics are harder to quantify, while time played is an indicator of success.