
Golf Strategy Optimization and the Value of Golf Skills

Gautier Stauffer, Faculty of Business and Economics (HEC Lausanne), Department of Operations, University of Lausanne, Quartier Unil-Chamberonne, 1015 Lausanne, Switzerland, Email: [email protected]
Matthieu Guillot, Laboratoire DISP, IUT Lumière, Université Lumière Lyon2, France, Email: [email protected]
Abstract

This study investigates strategic considerations in professional golf’s Stroke Play format. We develop a Markov Decision Process (MDP) model, specifically a stochastic shortest path model, to optimize a golfer’s strategy for any golf course. The model integrates golf course layout details and player skills. We demonstrate this approach using Professional Golfers’ Association Tour data and aerial views of golf courses. To the best of our knowledge, this is the first exact, data-driven approach for golf strategy optimization in the literature. While MDPs are commonly used for sport strategy optimization, scaling this approach to golf poses a challenge due to the curse of dimensionality. Our primary objective is to prove that an exact approach is computationally feasible for such large-scale problems, provided that low-level coding and meticulous code optimization are employed. Furthermore, we illustrate how this framework could be used to determine which aspects of a player’s game should be prioritized for improvement and challenge the ‘Drive for show, putt for dough’ adage. Additionally, we demonstrate how our methodology can be used to quantify the value of different golf skills. To ensure replicability and facilitate the adaptation and extension of our methodology, we provide open access to all our code and analyses (in R and C++).

1 Introduction

Golf is a game in which players aim to put a ball into a cup (sometimes also called a hole) with the help of clubs (the main types are woods, irons, and putters) using the least number of shots (see [1] for the 2019 official rules of golf). The field where the golfer plays is called a (golf) course. It consists of eighteen independent (and different) holes. A hole comprises different areas as shown in Fig. 1 (see subsection 1 for more details).

Figure 1: A hole and its different areas, the area beyond is out-of-bounds

The golfer’s score on a hole is essentially the number of shots taken to put the ball into the cup, plus any possible penalties (when the ball ends up in the water or goes out-of-bounds) [1]. The main type of competition is stroke play, where the players usually play 4 rounds of 18 holes. The player’s final score is obtained by summing the scores over the 72 holes, and the winner is the player with the lowest total score. We focus on this type of competition. Before a competition, elite amateurs and professional players typically practice on the tournament’s golf course to identify the main hazards and risks. This usually includes inspecting the course for obstacles, hazards, slopes, and weather conditions. They typically record such information in a golf course booklet, together with personal recommendations on which club to use and which target to aim for under different scenarios (pin position, wind, etc.). The main purpose of the course booklet is to help golfers develop an effective strategy for playing their game during the competition. An ideal personalized booklet would describe which shot to play from any (reachable) position on the golf course, given the current performances of the player. Building such a booklet is beyond human capabilities. In this paper, we develop a methodology to automate the construction of such a “strategic” booklet using (available) historical data on the performance of players and Markov Decision Processes, and we show that it is computationally tractable. The underlying problem is strategy optimization.

Markov chains and Markov decision processes are the models of choice for performance assessment and optimization in sports, as they capture the inherent probabilistic nature of the success of every ‘action’ performed by athletes or teams. They have been used, for instance, in tennis, basketball, volleyball, ice-hockey, golf, soccer, darts, and snooker, e.g. [40, 41, 36, 34, 26, 27, 23, 32, 11, 39]. Building such a model for golf strategy optimization requires detailed information about the golf course and the player’s past performances. In this work, we use simple 2D information on golf courses extracted from aerial views and historical data on the player’s performances taken from the Shotlink database.

The introduction of the Shotlink intelligence program (an initiative of the US Professional Golfers’ Association (PGA) to share data it collects in real-time on all shots taken on the PGA Tour - the PGA championship - since 2001) has stimulated a lot of academic research in the past 15 years. Broadie’s pioneering work on the strokes gained method [9, 10] has revolutionized the analyses of professional golfers’ performances on the PGA Tour. In addition, a large body of work exploits the Shotlink database to study various aspects of the game of golf (such as the effect of luck, pressure on performance, the existence of the hot hand phenomenon) through statistical analyses (e.g. [6, 33, 18, 16, 35, 38, 19, 15, 14, 13, 25, 24, 3, 22]), performance prediction through machine learning (e.g. [29, 28, 37, 42, 31, 17], see [12] for a recent survey), and the evaluation of different parameters (distance, dispersion, hole size) on performance through simulation and/or optimization [5, 11]. This work is part of the area of research called golf analytics.

Golf strategy optimization was first addressed by [39]. The authors show how to approximate the optimal strategy of a player using a skill model, simulation and Q-learning. The underlying simulation and skill models are similar to those of [11], who used a greedy approach to model a golfer’s strategy (basically, they assume that a golfer always chooses the best shot assuming he will play it perfectly - which is actually a strategy often chosen by amateurs). [39] showcase their approach using different types of “average” players whose skills are characterized by parametrized distributions (the parameters are taken from statistical information available on “average” players). In this work, we use simulation and skill models similar to those presented in [39] and [11], but we apply the methodology using empirical distributions of PGA Tour players, constructed from historical data available through the Shotlink database.

We show in particular that the natural Markov Decision Problem associated with the strategic optimization problem (this is essentially the same underlying MDP as in [39]) can be solved exactly in a reasonable amount of time. In addition, we illustrate how the corresponding methodology can be used for skill improvement. The corresponding approach should help PGA Tour professionals (and amateurs collecting similar data using systems such as Arccos) to substantially improve their performances.

Some golf terminology

The tee is the area where the player starts (there might be several potential areas but the official starting point for a (round of a) tournament is delimited by tee balls - usually corresponding to a rectangular area of around 5 $m^{2}$): the grass is short and this is the only place where the golfer can use a tee (same name but referring to a small t-shaped piece of wood) to raise the ball from the ground (and ease the shot). The green is the area where the grass is very closely mown, and the cup and pin (a flagstick that makes the cup visible from a long distance) are placed. The fairway is the part of the hole between the tee and the green where the grass is kept short and shows the (intended) path to the hole. The rough is an area of higher grass around the fairway. Usually, the further you get from the fairway, the higher the grass. There are usually different types of roughs that vary in grass height and density, namely light or heavy roughs. The bunkers are hollows filled with sand that serve as “traps” from which it might be difficult to escape (depending on the depth and texture of the sand). The water hazards are typically ponds or other bodies of water where the player cannot usually play (unless very shallow or the ball lies on the shore) with a 1-shot penalty to get out (usually close to the point of entrance). Out-of-bounds is an area where it is forbidden to play: the player receives a 1-shot penalty if shooting a ball out-of-bounds and has to place the ball in the previous position. There are other “obstacles” (formally speaking, obstacles refer to water or bunkers only in the game of golf) such as bushes or trees. See again Fig. 1 for an illustration of the different areas.

2 Modelling and building representative PGA players’ skills from the Shotlink data

In golf, as in many skill games, the result of a shot might differ from the intention. There are several elements that may influence the deviation of a shot from the intended target: subtle differences in golf swing that can result in different launch parameters (speed, spin, angle, etc.), weather conditions (wind, humidity, etc.) and lie conditions (fairway, rough, bunker, uphill, downhill, buried ball, ball in a divot, etc.). At the strategic level, one need not take all these aspects into account. Some can be taken into account at the operational level when playing the game. We focus here on the variation of launch parameters and on different initial surfaces (fairway, bunker, rough). To understand the effect of different launch conditions from the same surface (fairway), we present in Figure 2 data collected for an elite amateur golfer using Trackman.

Figure 2: The figure presents a set of target points and the empirical distribution of a player’s shots around those targets. The data comes from a member of the French U21 amateur elite team in 2016, with all shots hit from the fairway. We refer to this as a Trackman profile. The 10 grey points represent drives hit with a driver, demonstrating an average distance of just over 240 meters, along with the distribution of the shots around this average. The blue (for 200 meters), yellow (150 meters), red (100 meters), and green (50 meters) points illustrate the empirical shot distributions around targets located at those distances directly ahead. The ellipses around the points represent a 95% confidence region, calculated by TrackMan, indicating where most of the shots are expected to land based on the observed data.

A Trackman profile provides, for certain explicit targets, and given the type of surface, an empirical distribution of the realization of the shots for the player (note that this does not take into account the roll of the ball and this assumes 2D trajectories). These profiles are what we consider in the remainder of the manuscript. More formally, we assume that the position of the balls in 2D, around a targeted point $(0,d)$, with $d\in[0,D_{max}]$, and from a surface $s$ (fairway, bunker, rough) is a random variable $X_{d,s}$ that can be described by a probability density function in 2D ($D_{max}$ is the maximum distance a player can target). Specifically, we consider the sample space $\Omega:=\{(x,y)\in\mathbb{R}^{2}\}$ and we assume that $X_{d,s}$ follows a probability density function $f_{d,s}:(x,y)\in\Omega\mapsto f_{d,s}(x,y)\in\mathbb{R}_{+}$ for any $d\in[0,D_{max}]$ and for any $s\in\{fairway,rough,bunker\}$. The Trackman profile from Figure 2 hence represents samples from the random variables associated with the 5 distances targeted, from the fairway. In this work, we do not impose specific distributional forms (e.g., normal distributions) for the functions $f_{d,s}$. Instead, we rely on empirical distributions derived from observed data, assuming these are representative samples of the corresponding random variables across specific distances and surfaces.

These empirical distributions can be represented by a Trackman profile. Hence in the following, we will consider that skills of the player, outside of the green, are given to us through a Trackman profile.

Collecting accurate Trackman profiles of a player is nearly impossible, as it requires the player to hit thousands of balls from the same surface with different target distances. Instead, we will infer approximate Trackman profiles by exploiting the Shotlink database and common strategies of PGA tour players. The Shotlink database collects the positions of the ball (3D coordinates) of PGA Tour golfers since 2004 in every PGA Tour competition (as well as other information). Recovering Trackman profiles from such data is not straightforward, as we have no information about the player’s intention: knowing the final destination does not help in assessing the deviation from the intended target. Moreover, we do not have information on the carry and roll of the ball. However, there are some invariants in professional game plans, and we build upon common strategies that professionals use to infer the intention and the ball’s trajectory from the database. Next, we explain the core idea of our approach. It is important to note that our goal is not to replicate exact Trackman profiles of the players, but rather to create profiles that are sufficiently realistic to effectively demonstrate our methodology on data representative of PGA tour players.

First, professionals tend to target the pin whenever possible and not too risky, which is the case for reasonably short shots - say less than 150m (in reality, they tend to aim slightly off the pin if there is an obvious obstacle close to the pin (e.g. a water hazard or bunker), or if the green is not flat, as they generally prefer uphill to downhill putts, but we omit such details). For longer shots, they might play it a bit safer, aiming between the pin and the middle of the green, but since we do not have information about the geometry of the green (and the position of the pin with respect to this geometry), we also assume in this case that they target the pin again. Second, professional golfers tend to choose a general strategy for each of the tee shots (on par-4 or par-5) before the first day of the tournament (in the training rounds on the previous days). So unless there are very different conditions between two rounds (e.g. tee positions, weather conditions), we can reasonably assume that they aim at the same target.

As mentioned, we do not have information about the carry and roll of the ball (nor on the potential lateral spin). We therefore assume for simplicity - and lack of data - that the trajectories are straight and that the final endpoint is independent of the carry/roll trajectory, so that the empirical distribution of a shot around the target only depends on the distance and on the lie of the ball. We believe that the first assumption is not very restrictive, as most shots played by professional golfers are straight (some players may have a slight preference for a certain lateral spin - fade or draw - but this is usually very subtle). For the second hypothesis, the main shortcoming might derive from very short game situations - say below 30m - where different trajectories can be chosen (with a preference usually for rolling the ball on the green as much as possible, but where a “lob shot” is needed when there is an “obstacle” (say a bunker) close to the pin and in between the ball and the hole). In such situations, we implicitly assume that the distribution of the shots around the target does not depend on the type of shot played, which is certainly a limitation. For other shots, the ball tends to roll more on the fairway and the green than in the rough when coming from a long distance (but not much when at a distance of less than 150m). However, we believe that this effect is limited.

So, in concrete terms, we build the Trackman profiles from the trace of the shots in Shotlink. The position of the ball is stored as a 3D coordinate $(x,y,z)$ in the database, where $x$ and $y$ are essentially the longitude and latitude of the ball (possibly expressed in a different coordinate system) and $z$ is the elevation. We assume here for the exposition that the coordinates are expressed in meters. We get rid of the $z$ coordinate as we assume, in this project, that the trajectories are flat, for computational efficiency. We proceed as follows. Assume that $(x_{0},y_{0})$ is the position of the ball before the shot and that the ball lies on the fairway, $(x_{1},y_{1})$ is the position of the ball after the shot, and $(x^{P},y^{P})$ is the position of the pin.
Let $d=||(x_{0}-x^{P},y_{0}-y^{P})||_{2}=\sqrt{(x_{0}-x^{P})^{2}+(y_{0}-y^{P})^{2}}$ and let $M$ be the (transpose of the) rotation matrix that maps $(x^{P}-x_{0},y^{P}-y_{0})$ to $(0,d)$. We assume that $M\cdot(x_{1}-x_{0},y_{1}-y_{0})$ is drawn from $X_{d,s}$ (with $s=fairway$), and thus we can build proxies of Trackman profiles with this procedure by using all data from a player in the database (we restrict to data from the last 12 months). For tee shots, we assume, as explained above, that the player targets the average position over the 4 rounds that he played (if he played only two rounds, we discard the corresponding data - we also check that the tee positions are within a 5 meter radius over the 4 rounds) and we apply a similar procedure.
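The change of frame described above can be sketched as follows. This is a minimal illustration in Python, not the R code used for the preprocessing; the function name `to_target_frame` and its argument names are hypothetical. It rotates the displacement of the shot so that the direction from the ball to the pin is aligned with the positive y-axis, which places the pin at $(0,d)$.

```python
import math


def to_target_frame(start, end, pin):
    """Express a shot in the frame where the pin sits at (0, d).

    start, end and pin are (x, y) positions in meters (hypothetical inputs).
    Returns d and the rotated landing point (lateral, along).
    """
    x0, y0 = start
    x1, y1 = end
    xp, yp = pin
    # Vector from the ball to the pin and its Euclidean length d.
    tx, ty = xp - x0, yp - y0
    d = math.hypot(tx, ty)
    if d == 0.0:
        raise ValueError("the ball is already at the pin")
    # Unit vector pointing at the pin.
    ux, uy = tx / d, ty / d
    # Displacement of the shot.
    sx, sy = x1 - x0, y1 - y0
    # Rotation with rows (uy, -ux) and (ux, uy): it sends (tx, ty) to (0, d).
    lateral = uy * sx - ux * sy  # positive means right of the target line
    along = ux * sx + uy * sy    # distance travelled along the target line
    return d, (lateral, along)
```

For example, a ball at (1, 1) aimed at a pin at (1, 11) and landing at (2, 3) is mapped to a target distance of 10 m and a landing point 1 m to the right and 2 m along the target line.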

We have inferred the Trackman profiles from this procedure. The Fig. 3 “original” panel shows the outcome for Phil Mickelson on fairway shots. Two things clearly emerge from the Fig. 3 “original” panel (or Fig. 5 and Fig. 7 for shots played from the rough and the bunker). First, we do not have historical data for every distance. Second, there are pairs that look suspicious: the pair with a target point at a distance of roughly 370m and final destination at 250m is probably an outlier. Indeed, in this case, it is fairly obvious that the player did not try to target the pin (way too far to be within reach), but probably played a safe shot on the fairway between his position and the target. One way of detecting this kind of outlier is to put a cap on the maximum distance error that a player could make in principle. Indeed, professional players are known to be fairly accurate in terms of distance control (unless there are very particular conditions, e.g. hard ground, wind blowing suddenly, hitting a tree, etc.). We used the Trackman data for PGA Tour professionals taken from https://blog.trackmangolf.com/category/tour-stats (see Fig. 3 (left)) to set the cap, as we explain below.

We chose to keep all wedge shots (that is, shots under 100m), as in this case, the assumption of targeting the pin makes perfect sense (there might still be outliers: for instance, if the player’s ball ends up in a water hazard and the player drops it close to the entry point, then the observed deviation does not correspond to the true one). For shots over 100m, we cap the maximum distance error at 20m (the maximum in the PGA Tour Trackman data available to us is 15m, see Fig. 3 (left)). The results are presented in the “cleaned” panel of Fig. 3. This choice is consistent with statistics from the literature (see Fig. 3 in [30]), and we validated this number with two trainers of the French U21 elite amateur team using data from their elite amateurs. Fig. 4 shows the results for 24 PGA Tour professionals.
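The filtering rule for fairway shots can be sketched as below. This is an illustrative Python fragment under stated assumptions: the function name `keep_fairway_shot` is hypothetical, the landing point is assumed to be expressed in the rotated frame where the target is $(0,d)$, and the distance error is taken here to be the absolute difference between the length of the shot and the target distance, which is one plausible reading of the text.

```python
import math


def keep_fairway_shot(d, landing, wedge_limit=100.0, cap=20.0):
    """Return True if a fairway shot survives the outlier rule.

    d: inferred target distance in meters.
    landing: (lateral, along) landing point in the target frame, in meters.
    wedge_limit, cap: the 100 m and 20 m thresholds quoted in the text.
    """
    # All wedge shots are kept: aiming at the pin is a safe assumption there.
    if d < wedge_limit:
        return True
    # Assumed definition of the distance error: shot length minus target distance.
    distance_error = abs(math.hypot(*landing) - d)
    return distance_error <= cap
```

With this rule, the suspicious pair mentioned above (target at roughly 370 m, ball finishing at 250 m) is discarded, since its distance error of about 120 m far exceeds the cap.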

Figure 3: Each segment of the figure represents a target/destination pair for shots played from the fairway. All target points have been rotated so as to appear on the y-axis. The “original” panel shows inferred data for Phil Mickelson before removing outliers. The left figure shows the (few) Trackman data we have for PGA Tour professionals (the different colors represent different players). The “cleaned” panel shows the data for Phil Mickelson after the removal of outliers. Numbers are in meters.
Figure 4: Inferred data from the fairway. Numbers are in meters.

Although rigorous statistical validation is not possible (since we do not have the true Trackman profiles for the corresponding players), the figures are fairly consistent with the limited Trackman data we have (see Fig. 3 (left)). However, there are still some obvious outliers (e.g. Grace and Kisner have shots that end up behind them - perhaps due to hitting a tree), but we are not too concerned about these few remaining cases, as the effect of these outliers will be smoothed in the next preprocessing phase.

The task is somewhat harder from the rough, as there are many types from light to heavy, and the lie of the ball might vary substantially and might have a strong impact on the outcome: it could be more difficult to hit a buried ball in a light rough than a ball lying on the surface of a heavy rough. As we do not have access to such information, we collected data from the rough without differentiating these situations. Of course, this is a limitation. For the same reasons as for the fairway shots, we can remove outliers when the distance error is too large. We do not have Trackman data of PGA professionals from the rough, so we need to set the thresholds somewhat more arbitrarily. The distance control error might be larger in the rough. The accuracy error on fairway and rough shots was investigated in [30]. Based on the statistics reported in Fig. 2 of the corresponding manuscript, we set the threshold on distance control error for outlier detection at 30m. We again validated this number with two of the trainers of the French U21 elite amateur team using data from their elite amateurs. The results are shown in Fig. 5 and Fig. 6.

Figure 5: Each segment of the figures represents a target/destination pair for shots played from the rough. All target points have been rotated so as to appear on the y-axis. The “original” panel shows the inferred data for Phil Mickelson before removing outliers. The “cleaned” panel shows the data for Phil Mickelson after removal of outliers. Numbers are in meters.
Figure 6: Inferred data from the rough. Numbers are in meters.

For the bunkers, we apply the exact same strategy as for the rough. The results are shown in Fig. 7 and Fig. 8. We have very scarce data for distances over 40-50 meters, which is not surprising since there are many more bunkers around the greens than so-called fairway bunkers. We explain below how we interpolate the missing data. While interpolation has limitations (especially where we have very few points), it has minimal consequences, as few shots are played from fairway bunkers.

Figure 7: Each segment on the figures represents a target/destination pair for shots played from bunkers. All target points have been rotated so as to appear on the y-axis. The “original” panel shows the inferred data for Phil Mickelson before removing outliers. The “cleaned” panel shows the data for Phil Mickelson after removal of outliers. Numbers are in meters.
Figure 8: Inferred data from the bunker. Numbers are in meters.

We now focus on the driving data off the tee. The results are shown in Fig. 9 and Fig. 10. Here again we applied a threshold of 30m to the distance control error for outlier detection.

Figure 9: Each segment of the figures represents a target/destination pair for shots played off the tee. All target points have been rotated so as to appear on the y-axis. The “original” panel shows the inferred data for Phil Mickelson before removing outliers. The “cleaned” panel shows the data for Phil Mickelson after removal of outliers. Numbers are in meters.
Figure 10: Inferred data off the tee. Numbers are in meters.

To create complete Trackman profiles, we would need empirical distributions for any possible distance. We use a form of bootstrapping to generate missing data. Observe that this procedure would be needed even if we had true Trackman profiles from the beginning, such as the one in Fig. 2, as they contain only a limited number of targeted distances and sample points. The main idea is to use a local linear approximation. For any distance $d$ we might target, the basic idea is to grow a disk around $(0,d)$ until it contains enough sample target-destination pairs (from the inferred data), and scale the coordinates of the arrival points by the ratio of $d$ to the original targeted distance (that is, assuming a linear relation: if the targeted point was $(0,t)$ (within the disk) and the arrival point $(x,y)$, we assume that for the hypothetical target $(0,d)$ this would have resulted in the arrival point $\frac{d}{t}\cdot(x,y)$). The radius of the disk is defined so as to grab enough data to be statistically relevant, but not so many that it includes points that are too remote - this is the case when we have fewer data available, such as for fairway bunkers, for instance: we set the number of points to 50 and put a cap of 30 meters on the radius unless we have fewer than 10 points available (in which case, we take the closest 10 points). We have also assumed that the distribution is symmetric along the y-axis, and that the lateral error and the distance control error are independent. We used these hypotheses to “shuffle” the x and y coordinates of the arrival points and avoid too much bias toward the existing data points.
Additionally, we used the 95th percentile of the maximum observed target distance as the maximum targetable distance of a surface, and capped the maximum distance that the ball might reach by the maximum observed distance. The results are shown in Fig. 11, Fig. 12, Fig. 13, Fig. 14 and Fig. 15. In all the figures, we have restricted the target distances to multiples of 2.5m, and generated 15 realizations for each distance. Note that for consistency, we ensured that the average lateral dispersion would increase with distance. Hence, we slightly rescaled the lateral dispersion by the inverse of the ratio with the average of the previous distance when this was not the case. Furthermore, we ensured that the average lateral deviation from the rough and from the bunker (for a given distance) could not be less than that from the fairway, to compensate for missing data that could bias the results. Although there are still a few inconsistent points, the results appear fairly clean. While the parameters used above could be fine-tuned, we are satisfied with the results from the current settings (see Table 1 and Table 2 and the associated discussion) as, again, our objective is to construct realistic PGA Tour player profiles, not to create exact virtual clones.
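The core of the bootstrapping step can be sketched as follows. This is a simplified Python illustration, not the R code used for the preprocessing: the names `neighbours` and `bootstrap` are hypothetical, samples are assumed to be triples $(t,x,y)$ in the rotated frame, the symmetry hypothesis is read here as mirror symmetry of the lateral coordinate, and the independence hypothesis is implemented by drawing x and y from different samples. The monotonicity rescaling and the distance caps described above are left out.

```python
import random


def neighbours(samples, d, n_target=50, max_radius=30.0, n_min=10):
    """Pick inferred samples (t, x, y) whose target distance t is close to d."""
    usable = [s for s in samples if s[0] > 0.0]
    by_gap = sorted(usable, key=lambda s: abs(s[0] - d))
    # Smallest disk around (0, d) holding n_target targets, radius capped.
    within = [s for s in by_gap[:n_target] if abs(s[0] - d) <= max_radius]
    # With too few points inside the cap, fall back to the n_min closest ones.
    return within if len(within) >= n_min else by_gap[:n_min]


def bootstrap(samples, d, n_draws=15, rng=None):
    """Generate n_draws synthetic landing points for the target (0, d)."""
    rng = rng or random.Random(0)
    # Local linear approximation: scale each arrival point by d / t.
    scaled = [(d / t * x, d / t * y) for t, x, y in neighbours(samples, d)]
    if not scaled:
        raise ValueError("no usable samples")
    xs = [x for x, _ in scaled]
    ys = [y for _, y in scaled]
    draws = []
    for _ in range(n_draws):
        # Symmetry: a lateral error is as likely on the left as on the right.
        x = rng.choice(xs) * rng.choice((-1.0, 1.0))
        # Independence: the depth is drawn separately from the lateral error.
        y = rng.choice(ys)
        draws.append((x, y))
    return draws
```

Drawing the two coordinates separately is what keeps the generated points from simply reproducing the observed pairs, at the cost of ignoring any correlation between lateral and distance errors.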

Figure 11: Bootstrapped data generation on the fairway. Numbers are in meters.
Figure 12: Bootstrapped data generation in the rough. Numbers are in meters.
Figure 13: Bootstrapped data generation from the bunker. Numbers are in meters.
Figure 14: Bootstrapped data generation off the tee. Numbers are in meters.
Figure 15: Bootstrapped data generation off the tee and fairway. Numbers are in meters.

The last skills we have not yet considered are putting skills. On a green, a professional PGA Tour golfer typically makes 1, 2, or 3 putts (perhaps 4 or 5 putts in exceptional situations, but this represents less than 3 cases out of 10000 in our data, so we assume that only these three situations can occur). As we do not have details about the slopes of the greens, we assume that the average number of putts is simply a function of the distance to the hole, that is, it follows a certain function $p:d\in\mathbb{R}\mapsto p(d)\in\mathbb{R}$. In order to evaluate this function, we estimated the probability of 1-putt, 2-putts, and 3-putts for any possible distance on a green. We focus now on the 1-putt and 3-putts probabilities, as the 2-putts probability is easily computed from the other two.
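Since only three outcomes are retained, the 2-putt probability and the average number of putts follow directly from the other two probabilities. Writing $p_{1}(d)$, $p_{2}(d)$ and $p_{3}(d)$ for the 1-putt, 2-putt and 3-putt probabilities at distance $d$ (this notation is introduced here for illustration only), a short worked derivation is:

```latex
\begin{align*}
p_{2}(d) &= 1 - p_{1}(d) - p_{3}(d),\\
p(d) &= 1\cdot p_{1}(d) + 2\cdot p_{2}(d) + 3\cdot p_{3}(d)
      = 2 - p_{1}(d) + p_{3}(d).
\end{align*}
```

For instance, with hypothetical values $p_{1}(d)=0.40$ and $p_{3}(d)=0.05$, one gets $p_{2}(d)=0.55$ and $p(d)=1.65$.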

The longest possible distance on a green (that is, the diameter of the geometrical object) is usually no more than 25m, especially in the US (the Old Course in St Andrews (UK) is an exception, with the diameter of some greens reaching 50m). We thus consider distances to the hole below 32m (above this value, we have very little data: about 1 per 16,000). To build the histogram, we need enough data for each distance. Unfortunately, we have more data close to the pin than far from it (as players usually get closer each time they play without putting the ball into the hole). For individual players, we thus inferred the probability from the data available below 16m. We used buckets of doubling size (to have enough data within each bucket for statistical relevance - more than 30 points typically) with the following “breakpoints” (in meters): (0, 0.5, 1, 2, 4, 8, 16) and assigned the value to the midpoint of the interval. So, in concrete terms, for a given player, we collected all data regarding putts made from a distance in a given range, say $[1,2]$, and we estimated the 1-putt (2-putt, 3-putt resp.) probability as the frequency of 1-putt (2-putt, 3-putt resp.). We then assumed that all putts were played from the mid-distance, that is, 1.5 m here. For distances between 16m and 32m, due to the lack of data for individual players, we aggregated all data from all players to estimate the corresponding probabilities. We then used a linear interpolation to build a proxy of the function $p$ for any player.
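The bucketing and interpolation steps can be sketched as follows. This is a minimal Python illustration with hypothetical names (`bucket_frequencies`, `expected_putts`); it covers only the per-player buckets below 16 m, treats each bucket as half-open on the right, and extends the curve flat outside the first and last midpoints, all of which are assumptions made for the sketch. The average number of putts is computed as $1\cdot p_1 + 2\cdot p_2 + 3\cdot p_3 = 2 - p_1 + p_3$.

```python
from bisect import bisect_right

# Breakpoints in meters, as quoted in the text.
BREAKPOINTS = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]


def bucket_frequencies(putts):
    """putts: iterable of (distance_m, n_putts) with n_putts in {1, 2, 3}.

    Returns a list of (midpoint, p1, p3), one entry per non-empty bucket.
    """
    counts = [[0, 0, 0] for _ in range(len(BREAKPOINTS) - 1)]
    for dist, n in putts:
        if not (0.0 <= dist < BREAKPOINTS[-1]) or n not in (1, 2, 3):
            continue  # outside the modelled range
        i = bisect_right(BREAKPOINTS, dist) - 1
        counts[i][n - 1] += 1
    table = []
    for i, c in enumerate(counts):
        total = sum(c)
        if total == 0:
            continue
        mid = (BREAKPOINTS[i] + BREAKPOINTS[i + 1]) / 2
        table.append((mid, c[0] / total, c[2] / total))
    return table


def expected_putts(table, d):
    """Piecewise-linear proxy of the average number of putts at distance d."""
    if not table:
        raise ValueError("empty table")
    pts = [(m, 2.0 - p1 + p3) for m, p1, p3 in table]
    if d <= pts[0][0]:
        return pts[0][1]
    if d >= pts[-1][0]:
        return pts[-1][1]
    for (a, va), (b, vb) in zip(pts, pts[1:]):
        if a <= d <= b:
            return va + (vb - va) * (d - a) / (b - a)
```

In practice one would also check that each bucket holds enough observations (more than 30 in the text) before trusting its frequencies.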

The results are presented in Fig. 17. If we aggregate all the data from professional PGA Tour players, we obtain the result in Fig. 16, which is in line with the literature (see Fig. 1 in [30]). We also compare the average number of putts as a function of the distance for different PGA Tour players; the results are shown in Fig. 18. Our estimates could probably be improved and smoothed, but we believe that using more advanced approaches is not relevant at this stage, as other simplifications in our models probably dominate this one.

We preprocessed the data with R, and the corresponding code is available, upon request, in a companion zip file for the replicability of our results.

Figure 16: Inferred putting probabilities as a function of distance for the “average” PGA Tour player. Numbers are in meters.
Figure 17: Inferred putting probabilities as a function of distance for a subset of players. Numbers are in meters.
Figure 18: Inferred putting average as a function of distance for a subset of players. Numbers are in meters.

3 Modelling the game of golf

In order to optimize a golfer’s strategy with a Markov Decision Process, we need a model to predict the (stochastic) outcome of any “action” we may use. In our case, as will become clear later, actions correspond to shots, and we need to specify the result of any (possible) shot.

We explained in the previous section how we inferred reasonable Trackman profiles of players on each surface and for each distance, given the “trace” of the players in the different tournaments available in the database. These profiles are 2D, and we assume in the following that they represent the projections of the flight of the ball: we thus implicitly assume that the ball does not roll and that the trajectories are straight. We use a 2D representation of the golf course as well, using stylized 2D pictures of the holes and a clear encoding of each surface and obstacle, similar to Fig. 1 (bushes/trees in dark green, roughs in green, fairways in light green, greens in yellow green, and bunkers in egg nog). We (manually) created 2D rasters of the holes of three golf courses using aerial views from Google Maps: Augusta National Golf Club (which hosts one of the four major tournaments, the Masters), Le Golf National (which hosted the Ryder Cup in 2018, a very famous biennial competition between male teams from Europe and the United States) and the Bay Hill Club and Lodge, Orlando (which hosts the Arnold Palmer Invitational). We chose the resolution so that 1 cell roughly represents a square region of side 1m (typically between 0.7m and 1.5m, depending on the hole).

For a position $(x,y)$ on a surface $s$ (actually a cell in the raster), a shot is essentially a selection of a target point $(x^t,y^t)$. This target point can be characterized by a distance $d$ and a direction/angle. Let $M$ be the (transpose of the) rotation matrix associated with the corresponding angle. The target point is $(x,y)+M\cdot(0,d)$. Now, in order to estimate the distribution of the outcome of the shot, we consider $k$ realizations sampled from $X_{d,s}$. Let $(x_1,y_1),\ldots,(x_k,y_k)$ be the corresponding samples. In the absence of trees, water hazards and out-of-bounds areas, the distribution of the outcome could be approximated by the sample $(x'_1,y'_1),\ldots,(x'_k,y'_k)$, where $(x'_i,y'_i)=(x,y)+M\cdot(x_i,y_i)$ for all $i=1,\ldots,k$. We call this the hypothetical empirical distribution.
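As an illustration, the mapping from profile samples to landing points can be sketched in C++ as follows. The names `Point`, `placeSample` and `hypotheticalOutcomes` are hypothetical, and the sketch assumes one particular reading of the text: the unrotated shot points along the $+y$ axis, and $M$ is the transpose of the counter-clockwise rotation matrix of angle $\theta$, so that $M\cdot(0,d) = (d\sin\theta, d\cos\theta)$. The authors' code may use a different angle convention.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Point { double x; double y; };

// Apply M = R(theta)^T (transpose of the counter-clockwise rotation matrix)
// to the sample p, then translate by the current ball position `from`.
Point placeSample(Point from, double theta, Point p) {
    double c = std::cos(theta), s = std::sin(theta);
    return { from.x + c * p.x + s * p.y,
             from.y - s * p.x + c * p.y };
}

// Hypothetical empirical distribution: the k samples of the profile, rotated
// towards the chosen direction and placed at the ball position.
std::vector<Point> hypotheticalOutcomes(Point from, double theta,
                                        const std::vector<Point>& samples) {
    std::vector<Point> out;
    out.reserve(samples.size());
    for (const Point& p : samples) out.push_back(placeSample(from, theta, p));
    return out;
}
```

With this convention, a sample $(x_i, y_i)$ stores the lateral deviation in its first coordinate and the carried distance in its second.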

In the presence of trees (and the like) in the ball’s trajectory, we assume that the ball will stop on the trajectory right before the first obstacle it encounters. That is, the ball will hit the obstacle and will neither “bounce off” nor penetrate it; it will simply stop (consider a collision with a dense tree, such as a fir). More precisely, we assume that trees are infinitely high, and that the ball simply stops right before the contact point. When the ball falls into a water hazard, we assume that the player will “drop” the ball at the entry point (which is the most common of the options available to a golfer in such a situation). When the ball ends up out-of-bounds, we (re)position the ball at the origin of the shot (with a 1-shot penalty according to the rules of golf).

Technically speaking, we apply the following procedure to identify the destination cell. We use Bresenham’s algorithm ([8]) to identify an ordered set of cells from the raster/picture that are traversed by the trajectory. Suppose we consider the trajectory from $(x_1,y_1)$ to $(x'_1,y'_1)$. Let $\mathcal{C}=\{c_1,\ldots,c_l\}$ be the ordered set of cells returned by Bresenham’s algorithm. Let $i$ be the index of the first tree cell in $\mathcal{C}$ ($i=l+1$ if there is none). We first truncate the trajectory to $\mathcal{C}'=\{c_1,\ldots,c_{i-1}\}$ (observe that we do not allow a player to play from a tree, so $i>1$). Then, if $c_{i-1}$ is a water cell, we let $j$ be the largest index such that $c_j$ is not a water cell (again, we do not allow a golfer to play from a water hazard, so $i-1>j>1$) and we truncate the trajectory to $\mathcal{C}''=\{c_1,\ldots,c_{j-1}\}$. Finally, if $c_l$ is an out-of-bounds cell, we truncate the trajectory to $\mathcal{C}''=\{c_1\}$. An illustrative (non-realistic) example of a simulation is given in Fig. 19.
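The truncation procedure can be sketched in C++ as follows. This is an illustrative reading of the rules above rather than the authors' implementation: the names `Surface`, `Landing`, `bresenham` and `resolveShot` are hypothetical, the line rasterization is the standard integer form of Bresenham's algorithm, the out-of-bounds test is applied to the last cell kept after the tree truncation, the ball is dropped on the last non-water cell before the water, and a one-stroke penalty is charged for water (consistent with the example of Fig. 19). The exact indexing in the original code may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <utility>
#include <vector>

enum class Surface { Playable, Tree, Water, OutOfBounds };
using Cell = std::pair<int, int>;

// Standard integer Bresenham line: ordered cells from (x0, y0) to (x1, y1), both included.
std::vector<Cell> bresenham(int x0, int y0, int x1, int y1) {
    std::vector<Cell> cells;
    int dx = std::abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -std::abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;
    while (true) {
        cells.push_back({x0, y0});
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }
        if (e2 <= dx) { err += dx; y0 += sy; }
    }
    return cells;
}

struct Landing { Cell cell; int penalty; };

// Destination cell and penalty strokes for one simulated trajectory.
// Precondition: path is non-empty and its first cell is neither tree nor water.
Landing resolveShot(const std::vector<Cell>& path,
                    const std::function<Surface(Cell)>& surfaceAt) {
    // 1. Stop right before the first tree cell on the trajectory.
    std::size_t end = path.size();  // one past the last kept cell
    for (std::size_t i = 1; i < path.size(); ++i) {
        if (surfaceAt(path[i]) == Surface::Tree) { end = i; break; }
    }
    Cell last = path[end - 1];
    // 2. Out of bounds: replay from the origin of the shot with one penalty stroke.
    if (surfaceAt(last) == Surface::OutOfBounds) return {path.front(), 1};
    // 3. Water: drop on the last non-water cell before the ball entered the water.
    if (surfaceAt(last) == Surface::Water) {
        std::size_t j = end - 1;
        while (j > 0 && surfaceAt(path[j]) == Surface::Water) --j;
        return {path[j], 1};
    }
    return {last, 0};
}
```

Because only the final cell is tested for water, a ball is allowed to fly over a water hazard, whereas any tree cell on the path stops it, in line with the assumption that trees are infinitely high.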

Figure 19: A sequence of simulated shots (pictures numbered sequentially from left to right and top to bottom): the white dot in picture 1 corresponds to the initial tee position, the orange dot corresponds to the intended target, and white segments show the player’s trail. Note that the first time a segment appears, it reflects the realization according to the hypothetical empirical distribution, without the detection of collisions and special events; it might be shortened in a second stage (e.g. pictures 6 and 11) to take into account obstacles, water hazards, and out-of-bounds. Note also that we stop as soon as we reach the green, since we can then generate the number of putts as described above. In this example, assuming that the player makes 1 putt on the green, he would score 7 on the hole (5 shots + 1 penalty, as one shot ended up in the water hazard, to reach the green, and then 1 putt). This data is entirely artificial and unrealistic for a PGA professional, and is provided solely for illustrative purposes.

We have implemented the corresponding model in C++ for better performance (as will become clear later, we need to call this model a billion times just to create the stochastic shortest path model). The corresponding C++ code is available, upon request, in a companion zip file for the replicability of our results.

4 The optimization model

Before explaining the model, we start with a brief introduction of the stochastic shortest path problem, following [21].

The stochastic shortest path (SSP) problem is a Markov decision process (MDP) that generalizes the classic deterministic shortest path problem. We want to control an agent who evolves dynamically in a system composed of different states, so as to converge to a predefined target. The agent is controlled by taking actions in each time period (we focus here on discrete time (infinite) horizon problems): actions are associated with costs, and transitions in the system are governed by probability distributions that depend exclusively on the previous action taken, and are thus independent of the past. We restrict to finite state/action spaces: the goal is to choose an action for each state, i.e. a deterministic and stationary policy, that reaches the target state with probability one (such a policy is called proper), so as to minimize the total expected cost incurred by the agent before reaching the (absorbing) target state when starting from a given initial state. The problem is well-defined when there is a way to reach the target from any state and when there is no improper policy that allows accumulating an infinitely negative cost [21].

More formally, a stochastic shortest path instance is defined by a tuple $(\mathcal{S},\mathcal{A},J,P,c)$ where $\mathcal{S}=\{0,1,\ldots,n\}$ is a finite set of states, $\mathcal{A}=\{0,1,\ldots,m\}$ is a finite set of actions, $J$ is a 0/1 matrix with $m$ rows and $n$ columns and general term $J(a,s)$, for all $a\in\{1,\ldots,m\}$ and $s\in\{1,\ldots,n\}$, with $J(a,s)=1$ if and only if action $a$ is available in state $s$, $P$ is a row substochastic matrix (that is, a matrix with nonnegative entries in which every row adds up to at most $1$; it is not a usual stochastic matrix, as state $0$ and action $0$ are left out) with $m$ rows and $n$ columns and general term $P(a,s):=p(s|a)$ (the probability of ending in $s$ when taking action $a$), for all $a\in\{1,\ldots,m\}$, $s\in\{1,\ldots,n\}$, and $c\in\mathbb{R}^m$ is a cost vector. The state $0$ is called the target state and the action $0$ is the unique action available in that state. Action $0$ leads to state $0$ with probability $1$. We denote with $\mathcal{A}(s)$ the set of actions available from $s\in\{1,\ldots,n\}$ and assume without loss of generality (if not, we simply duplicate the actions) that for all $a\in\mathcal{A}$, there exists a unique $s$ such that $a\in\mathcal{A}(s)$. We denote with $\mathcal{A}^{-1}(s)$ the set of actions that lead to $s$, i.e. $\mathcal{A}^{-1}(s):=\{a: P(a,s)>0\}$.

A (deterministic and stationary) policy $\Pi$ is a function $\Pi: s\in\mathcal{S}\mapsto\mathcal{A}(s)$, that is, it assigns an action to each possible state. Let $y^{\Pi}_k\in\mathbb{R}_+^n$ be the substochastic vector (in general not a stochastic vector, as state $0$ is left out) representing the state of the system in period $k$ when following policy $\Pi$ (from an initial distribution $y^{\Pi}_0$). That is, $y^{\Pi}_k(s)$ is the probability of being in state $s$, for all $s=1,\ldots,n$, at time $k$ following policy $\Pi$. Similarly, we denote with $x_k^{\Pi}\in\mathbb{R}_+^m$ the substochastic vector (in general not a stochastic vector, as action $0$ is left out) representing the probability of performing action $a$, for all $a=1,\ldots,m$, at time $k$ following policy $\Pi$. Given a policy $\Pi$ and an initial distribution $y^{\Pi}_0$ at time $0$, by the law of total probability (and because each action is available in exactly one state), we have $x_k^{\Pi}=\Pi^T\cdot y^{\Pi}_k$ for all $k\geq 0$, where, with a slight abuse of notation, $\Pi$ also denotes the $n\times m$ 0/1 matrix whose entry $(s,a)$ equals $1$ if and only if $\Pi(s)=a$.

Given a state $s\in\{1,\ldots,n\}$, a policy $\Pi$ is said to be $s$-proper if $\sum_{k\geq 0}x_k^{\Pi}$ is finite when $y^{\Pi}_0:=e_s$. Observe that $\sum_{k\geq 0}y_k^{\Pi}$ is also finite for $s$-proper policies (as $y_k^{\Pi}=P^Tx_{k-1}^{\Pi}$). In particular, $\lim_{k\rightarrow+\infty}y^{\Pi}_k=0$, and thus the policy leads to the target state $0$ with probability $1$ from state $s$. An $s$-proper policy is thus a policy that converges to the target with probability one, and whose expected number of visits in each action is finite. The expected cost of such a policy is thus the well-defined value $c^T\sum_{k\geq 0}x_k^{\Pi}$. The $s$-stochastic-shortest-path problem ($s$-SSP for short) is the problem of finding an $s$-proper policy $\Pi$ of minimal cost $c^T\sum_{k\geq 0}x_k^{\Pi}$.
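The two relations $x_k^{\Pi}=\Pi^T y_k^{\Pi}$ and $y_k^{\Pi}=P^T x_{k-1}^{\Pi}$ can be chained to give a closed form for this cost. The derivation below is a sketch under two assumptions that are not spelled out above: $\Pi$ is read as the $n\times m$ 0/1 matrix with $\Pi(s,a)=1$ if and only if $\Pi(s)=a$, and the policy is proper from every state, so that the spectral radius of $P^T\Pi^T$ is below $1$ and the Neumann series converges.

```latex
\begin{align*}
y_k^{\Pi} &= P^{T} x_{k-1}^{\Pi} = P^{T}\Pi^{T} y_{k-1}^{\Pi} = \left(P^{T}\Pi^{T}\right)^{k} e_s, \\
\sum_{k\geq 0} x_k^{\Pi} &= \Pi^{T} \sum_{k\geq 0} \left(P^{T}\Pi^{T}\right)^{k} e_s
  = \Pi^{T}\left(I - P^{T}\Pi^{T}\right)^{-1} e_s, \\
c^{T}\sum_{k\geq 0} x_k^{\Pi} &= c^{T}\Pi^{T}\left(I - P^{T}\Pi^{T}\right)^{-1} e_s .
\end{align*}
```

The matrix $\left(I - P^{T}\Pi^{T}\right)^{-1}$ is the transpose of the fundamental matrix of the absorbing Markov chain induced by the policy.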

As explained in the previous sections, we built (discrete) 2D models for holes and Trackman profiles, and a 2D simulator of ball trajectories (using Bresenham’s algorithm). With these elements, we can evaluate a player’s performance for any strategy on any hole (a strategy is the choice of a shot for any position on the corresponding hole, that is, essentially a choice of direction and targeted distance, according to our 2D representations). Indeed, given a strategy, we can build a Markov chain whose states are the possible positions on the hole (basically, the pixels), and whose transition matrix is built from the choice of direction and targeted distance in each state as follows: take the different empirical realizations from the Trackman profile corresponding to the surface where the ball lies, and simulate the outcome of each realization on the corresponding hole (we have 15 realizations for each shot from the Trackman profiles generated in Section 2). This provides an empirical distribution over the state space and an expected cost for the corresponding “action” (1 if no penalty occurs). If the strategy is sound (that is, converging to the target from any position), the corresponding Markov chain is absorbing, and the expected number of steps before being absorbed by the cup can be easily evaluated by computing the fundamental matrix (see for instance [20]).
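To make the construction of one row of the transition matrix concrete, the following C++ sketch aggregates simulated realizations into an empirical distribution and an expected cost. The names `Outcome`, `ActionModel` and `buildAction` are hypothetical, and the sketch assumes that each realization has already been resolved into a landing state index and a number of penalty strokes, with equal weight given to every realization.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Result of simulating one realization: landing state index and penalty strokes.
struct Outcome { int state; int penalty; };

struct ActionModel {
    std::map<int, double> transition;  // landing state -> empirical probability
    double expectedCost;               // 1 stroke plus the average penalty
};

// Aggregate the realizations of one action (15 per shot in the text) into a
// sparse transition row and an expected cost.
// Precondition: outcomes is non-empty.
ActionModel buildAction(const std::vector<Outcome>& outcomes) {
    ActionModel m;
    m.expectedCost = 0.0;
    const double w = 1.0 / outcomes.size();  // equal weight per realization
    for (const Outcome& o : outcomes) {
        m.transition[o.state] += w;
        m.expectedCost += w * (1.0 + o.penalty);
    }
    return m;
}
```

A sparse row is a natural choice here, since with 15 realizations each action reaches at most 15 distinct states out of roughly ten thousand.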

We can go one step further and find the optimal strategy by building an absorbing Markov decision process (an SSP, in fact), simply by adding all the actions available in each state to the Markov chain described above. That is, in this SSP model, the states are still the positions on the hole (again, the pixels), the actions are the triplets (state, targeted distance, direction), and the (empirical) transition matrix and the costs are computed as explained above for the Markov chain model. We have restricted the set of possible directions to an angle (in radians) in $\{0,\frac{2\pi}{180},\ldots,179\cdot\frac{2\pi}{180}\}$ (with this discretization, the player has an aiming precision of about 1.75m at a distance of 100m), and the targeted distance from the Trackman profiles is restricted to multiples of 2.5m (as discussed earlier).
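The aiming precision quoted above follows from the angular step. As a quick check, using the small-angle approximation for the arc length: two adjacent directions are separated by about 3.49m at a distance of 100m, so any desired aim point at that distance lies within half of that gap of an available direction.

```latex
\[
\Delta\theta = \frac{2\pi}{180} \approx 0.0349\ \text{rad}\ (2^\circ), \qquad
100\,\text{m}\times\Delta\theta \approx 3.49\,\text{m}, \qquad
\tfrac{1}{2}\times 3.49\,\text{m} \approx 1.75\,\text{m}.
\]
```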

In our 2D representations of the hole, we ensured that only locations from which the target could be reached were kept (for instance, no rough surrounded only by trees or out-of-bounds: in such a case, we would redefine the corresponding zone as a tree area or as an out-of-bounds area), and hence our models are well-defined instances of SSP.

The corresponding instances have on the order of 10,000 states and 150 million actions. We implemented the value iteration algorithm in C++ (see [21] for details of the algorithm). Most of the time spent solving the SSP problem is actually spent creating the model (with more than a billion calls to the simulator and Bresenham’s algorithm, as we have 15 realizations to simulate for each of the 150 million actions). Although we have attempted to optimize our code as much as possible, optimizing the computational performance is not the main purpose of this study. Indeed, the computational performance would clearly improve by parallelizing the construction of the model. Instead, we again aim at showing that the models are computationally tractable, in order to stimulate further investigations.
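For readers unfamiliar with value iteration on an SSP, a minimal C++ sketch is given below. It is not the authors' optimized implementation: the names `Action` and `valueIteration` are hypothetical, the sparse action layout and the stopping rule (largest change in a sweep below a tolerance) are assumptions of this sketch, and values are updated in place (a Gauss-Seidel variant). State 0 plays the role of the absorbing target with value 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

struct Action {
    double cost;                               // expected strokes charged for this shot
    std::vector<std::pair<int, double>> next;  // (successor state, probability)
};

// actions[s] lists the actions available in state s; state 0 is the absorbing target.
// Precondition: every state s >= 1 has at least one action.
// Returns the vector of expected costs-to-go under the optimal policy.
std::vector<double> valueIteration(const std::vector<std::vector<Action>>& actions,
                                   double tol = 1e-9, int maxSweeps = 100000) {
    std::vector<double> v(actions.size(), 0.0);
    for (int sweep = 0; sweep < maxSweeps; ++sweep) {
        double change = 0.0;
        for (std::size_t s = 1; s < actions.size(); ++s) {
            double best = std::numeric_limits<double>::infinity();
            for (const Action& a : actions[s]) {
                double q = a.cost;  // Bellman backup for action a
                for (const auto& t : a.next) q += t.second * v[t.first];
                best = std::min(best, q);
            }
            change = std::max(change, std::fabs(best - v[s]));
            v[s] = best;  // in-place update
        }
        if (change < tol) break;
    }
    return v;
}
```

An optimal policy is recovered by recording, for each state, the action attaining the minimum in the last sweep.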

Computational experiments

We conducted our experiments on the SD530 nodes within the Curta platform, which consists of 336 nodes. Each node is powered by an Intel Xeon Gold SKL-6130 processor operating at 2.1 GHz, has 32 cores, and is equipped with 96 GB of RAM. More details on the hardware can be found at https://redmine.mcia.fr/projects/cluster-curta.

Our analysis focused on 119 golf players from the 2018 Arnold Palmer Invitational (out of 165 participants) for whom we had ShotLink data available to construct Trackman profiles. We utilized all relevant data from 2017, as well as data from early 2018 leading up to the tournament in March. The average time required to build the model and optimize the strategy for a single hole was approximately 27 minutes, with a standard deviation of 4.5 minutes. The fastest and slowest times recorded were 12 minutes and 44 minutes, respectively. Note that the main reason for setting the number of realizations to 15 was memory limits: the minimum and maximum memory consumption were 67 GB and 77 GB, respectively (and no more than 92 GB could be reserved on the nodes).

The bulk of the computational effort is devoted to model building (a couple of minutes is typically needed for value iteration: 115 seconds on average, with a standard deviation of 42 seconds and a maximum of 304 seconds). The computation time for model creation is essentially multi-linear in the parameters that influence the number of simulations/Bresenham calculations. These parameters include the angle discretization, which affects the number $d$ of possible shot directions; the target distance discretization, which affects the number $t$ of target distances; and the number $r$ of realizations used to generate the bootstrapped Trackman profiles. Therefore, the empirical computation time is of the order $O(d\cdot t\cdot r)$.
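As a rough order-of-magnitude check based on the figures reported in this section (about 150 million actions, 15 realizations per action, 27 minutes on average per hole, of which about 115 seconds go to value iteration), the implied simulation throughput is as follows. The action count is only an order of magnitude, so the result should be read as indicative.

```latex
\[
\underbrace{150\times 10^{6}}_{\text{actions}} \times \underbrace{15}_{\text{realizations}}
= 2.25\times 10^{9}\ \text{simulated shots},
\qquad
\frac{2.25\times 10^{9}}{27\times 60\,\text{s} - 115\,\text{s}} \approx 1.5\times 10^{6}\ \text{shots per second}.
\]
```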

Significant improvements in computation time could be achieved through parallel processing. Since the outcomes of individual actions within the model are independent, parallelizing the model creation over $M$ machines could reduce the time to $O\left(\frac{d\cdot t\cdot r}{M}\right)$. The value iteration algorithm could also be parallelized, although this would require careful organization due to the dependencies between computations. However, the focus of this paper is to demonstrate the feasibility of solving exactly the SSP model associated with golf strategy optimization, rather than exploring computational optimization through parallelization. We should also note that one could easily reduce the action space by filtering out dominated actions: for example, aiming away from the pin is clearly suboptimal most of the time.

The corresponding C++ code and script to launch the code on the Curta platform (or similar) are available, upon request, in a companion zip file.

5 Representativeness of our virtual PGA tour players

We have simulated the performances of the 119 golf players in the Arnold Palmer Invitational to ensure they reasonably represent PGA Tour players, though they are not exact clones of each individual.

In Tables 1 and 2, we present standard golf metrics to compare the performances of these virtual players with their actual performances at the 2018 Arnold Palmer Invitational, held at Bay Hill Club and Lodge in Orlando, a prestigious PGA Tour event. To maintain clarity and avoid overloading the main presentation, the confidence intervals have been provided in the Appendix.

| # | First name | Last name | Score | Tee-shot | Fairway | L | R | GiR | Water | Bunker |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Rory | McIlroy | 269 | 266.504 | 0.696 | 0.107 | 0.196 | 0.736 | 0.000 | 0.153 |
| 2 | Bryson | DeChambeau | 273 | 260.902 | 0.768 | 0.125 | 0.107 | 0.764 | 0.000 | 0.125 |
| 3 | Justin | Rose | 274 | 263.270 | 0.786 | 0.054 | 0.161 | 0.750 | 0.028 | 0.139 |
| 4 | Henrik | Stenson | 275 | 253.695 | 0.839 | 0.054 | 0.107 | 0.806 | 0.000 | 0.139 |
| 5 | Tiger | Woods | 277 | 258.218 | 0.643 | 0.107 | 0.250 | 0.708 | 0.000 | 0.181 |
| 6 | Ryan | Moore | 278 | 254.739 | 0.839 | 0.018 | 0.143 | 0.806 | 0.028 | 0.250 |
| 7 | Kevin | Chappell | 280 | 266.802 | 0.732 | 0.125 | 0.143 | 0.806 | 0.042 | 0.153 |
| 8 | Marc | Leishman | 280 | 259.035 | 0.714 | 0.071 | 0.214 | 0.722 | 0.014 | 0.208 |
| 9 | Patrick | Rodgers | 280 | 257.997 | 0.589 | 0.161 | 0.250 | 0.722 | 0.014 | 0.250 |
| 10 | Chris | Kirk | 281 | 254.156 | 0.732 | 0.143 | 0.125 | 0.708 | 0.014 | 0.264 |

Table 1: Historical golf metrics for the top ten players from the 2018 Arnold Palmer Invitational. Score: total score over the 4 rounds; Tee-shot: average tee-shot distance in meters (on par 4 and par 5 only); Fairway: proportion of fairways hit with the tee-shot on par 4 and par 5; L: proportion of fairways missed on the left on par 4 and par 5; R: proportion of fairways missed on the right on par 4 and par 5; GiR: proportion of greens reached in at most par minus 2 shots; Water: proportion of water hazard penalties; Bunker: proportion of bunker shots.
| # | First name | Last name | Score | Tee-shot | Fairway | L | R | GiR | Water | Bunker |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Rory | McIlroy | 273.486 | 284.492 | 0.685 | 0.126 | 0.189 | 0.735 | 0.009 | 0.137 |
| 2 | Bryson | DeChambeau | 277.623 | 270.841 | 0.676 | 0.141 | 0.183 | 0.760 | 0.009 | 0.121 |
| 3 | Justin | Rose | 275.630 | 269.931 | 0.700 | 0.147 | 0.153 | 0.742 | 0.004 | 0.144 |
| 4 | Henrik | Stenson | 281.944 | 258.893 | 0.699 | 0.135 | 0.166 | 0.718 | 0.007 | 0.144 |
| 5 | Tiger | Woods | 269.085 | 263.598 | 0.765 | 0.088 | 0.147 | 0.795 | 0.005 | 0.099 |
| 6 | Ryan | Moore | 282.356 | 262.925 | 0.686 | 0.133 | 0.181 | 0.718 | 0.008 | 0.157 |
| 7 | Kevin | Chappell | 278.488 | 291.782 | 0.624 | 0.121 | 0.255 | 0.721 | 0.015 | 0.118 |
| 8 | Marc | Leishman | 274.260 | 277.578 | 0.702 | 0.136 | 0.162 | 0.770 | 0.009 | 0.101 |
| 9 | Patrick | Rodgers | 284.414 | 271.482 | 0.628 | 0.181 | 0.191 | 0.678 | 0.006 | 0.164 |
| 10 | Chris | Kirk | 289.085 | 243.917 | 0.647 | 0.135 | 0.217 | 0.693 | 0.013 | 0.119 |

Table 2: Simulated golf metrics for the top ten players from the 2018 Arnold Palmer Invitational (10,000 simulations). Score: average score over the 4 rounds; Tee-shot: average tee-shot distance in meters (on par 4 and par 5 only); Fairway: proportion of fairways hit with the tee-shot on par 4 and par 5; L: proportion of fairways missed on the left on par 4 and par 5; R: proportion of fairways missed on the right on par 4 and par 5; GiR: proportion of greens reached in at most par minus 2 shots; Water: proportion of water hazard penalties; Bunker: proportion of bunker shots.

Despite some discrepancies, the constructed virtual players are representative of typical PGA Tour players, which is sufficient for demonstrating how our methodology can be leveraged to improve performance.

6 Leveraging the methodology for prioritizing training: the value of golf skills

While some statistics enable professional players to compare their performances with others (e.g., strokes gained [9, 10]), identifying the specific areas for improvement to maximize performance remains a challenge. Should a player focus on enhancing distance control, lateral dispersion, putting skills, or driving length? A significant advantage of our modeling approach is its flexibility in conducting interventions. This allows us to substitute certain skills with alternatives and compare the outcomes. For example, our approach can easily assess scenarios such as a player having Rory McIlroy’s exceptional driving length or Tiger Woods’ outstanding putting skills.

We have simulated the performances of our 119 representatives of the PGA Tour, under such interventions. First, we have adjusted their driving skills to match Rory McIlroy’s performance (we have substituted their driving stats with Rory McIlroy’s stats). Then, we have modified their putting skills to match Tiger Woods’ putting abilities. This allows us to assess the impact of these changes on their average scores. The results of these simulations are reported in Fig. 20 and Fig. 21.

Figure 20: The distribution (over the 119 players) of the gain obtained by substituting a player’s driving skills with Rory McIlroy’s. The values on the x-axis (gap_driving) are the average gain per hole.
Figure 21: The distribution (over the 119 players) of the gain obtained by substituting a player’s putting skills with Tiger Woods’. The values on the x-axis (gap_putting) are the average gain per hole.

Interestingly, the average gain per hole using Rory McIlroy’s driving skill is 0.139 (95% confidence interval: [0.126, 0.152]), whereas the average gain per hole using Tiger Woods’ putting skill is only 0.046 (95% confidence interval: [0.041, 0.050]). This seems to challenge the well-known maxim “Drive for show, putt for dough.” Although several authors have already critiqued this saying using statistical methods [2, 7], we believe that the answer is not universal but rather specific to both the player and the course, and our model offers a way to address this question. A detailed analysis of this matter is beyond the scope of the current paper.
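To put these per-hole averages in perspective, they can be scaled to a round of 18 holes and to a four-round tournament of 72 holes. This is simple arithmetic on the point estimates above, treating the per-hole average as applying uniformly across holes; it ignores the confidence intervals and any hole-specific effect.

```latex
\begin{align*}
\text{Driving:}\quad & 0.139 \times 18 \approx 2.50\ \text{strokes per round}, &
0.139 \times 72 &\approx 10.0\ \text{strokes per tournament},\\
\text{Putting:}\quad & 0.046 \times 18 \approx 0.83\ \text{strokes per round}, &
0.046 \times 72 &\approx 3.3\ \text{strokes per tournament}.
\end{align*}
```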

Our approach, combined with accurate Trackman profiles of players, makes it possible to quantify the value of certain hypothetical skills (for a specific player on a specific course), as we have illustrated here with Rory McIlroy’s driving skills and Tiger Woods’ putting skills. We believe that this could provide an invaluable tool to prioritize training for professional players and elite amateurs, but also for weekend players. Indeed, one could use the same approach to quantify the impact of increasing a player’s driving length by 10m or reducing their driving lateral dispersion by 10%. This could also be applied to other skills like long shots, wedging, chipping and putting, under various scenarios. Depending on the effort needed to reach the corresponding improvement and the expected benefit on the average score, the player would have a rational way of prioritizing, with his or her training team, between different training strategies.

7 Conclusion and perspectives

The primary aim of this study was to demonstrate the computational feasibility of our methodology, which is designed to optimize the performance of professional golfers on the PGA Tour using ShotLink data. We have also explored how it could help golfers make training decisions and how it could be used to challenge conventional wisdom in golf, such as the “Drive for show, putt for dough” saying.

Our models could also guide golf course design or redesign. By simulating different course layouts (for example, adding new obstacles or repositioning tees and pins) with different PGA Tour players, course architects could tailor courses for greater competitive challenge or enhanced spectacle. Moreover, our methodology could be used to rank golf courses based on the collective performance of a representative set of players.

Finally, our method would likely benefit from replacing the current 2D model with a 3D simulator that uses the 3D trajectories provided by ShotLink. While this would increase the computational demands, the increase should not be excessive: 3D adaptations of the Bresenham algorithm are available [4], and parallelizing the code could help keep computing times manageable.
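To give a sense of why the step to 3D is cheap, here is a minimal sketch of the textbook integer extension of Bresenham's line algorithm to three dimensions. It is the standard error-accumulation variant, not the Voronoi-based method of [4] and not our implementation; the function name `bresenham_3d` is an assumption made for illustration.

```python
def bresenham_3d(start, end):
    """Return the integer grid cells visited on a 3D line from start to end.

    The axis with the largest displacement drives the loop; the two other
    axes step whenever their accumulated error becomes non-negative.
    """
    deltas = [abs(e - s) for s, e in zip(start, end)]
    steps = [1 if e >= s else -1 for s, e in zip(start, end)]
    main = max(range(3), key=lambda i: deltas[i])
    others = [i for i in range(3) if i != main]
    errors = {i: 2 * deltas[i] - deltas[main] for i in others}

    point = list(start)
    cells = [tuple(point)]
    for _ in range(deltas[main]):
        point[main] += steps[main]
        for i in others:
            if errors[i] >= 0:
                point[i] += steps[i]
                errors[i] -= 2 * deltas[main]
            errors[i] += 2 * deltas[i]
        cells.append(tuple(point))
    return cells


# Example: cells crossed between two grid points of a hypothetical 3D course.
for cell in bresenham_3d((0, 0, 0), (4, 2, 0)):
    print(cell)
```

The loop runs once per cell along the dominant axis and uses only integer additions and comparisons, so the per-shot cost grows with the length of the trajectory rather than with the volume of the 3D grid.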

8 Acknowledgement

We would like to thank the PGA Tour for giving us access to the ShotLink data. We would also like to thank Renaud Gris and Jason Belot (French U21 Elite amateur trainers) for constructive discussions of our models and for providing data on some of their elite amateurs. At the time of writing, the ShotLink Intelligence program has been discontinued, and ShotLink data is no longer publicly available to academics. However, we have received authorization from the PGA Tour to make our data accessible for replication and validation by other researchers. We would like to express our gratitude to the PGA Tour, and to Ken Lovell in particular, for their support. Please note that any further use of the data requires prior permission from the PGA Tour.

References

  • [1] The rules of golf. https://www.randa.org/rog/the-rules-of-golf, 2019. Accessed: 2024-03-15.
  • [2] DL Alexander and W Kern. Drive for show and putt for dough? An analysis of the earnings of PGA Tour golfers. Journal of Sports Economics, 6(1):46–60, 2005.
  • [3] J Arkes. The hot hand vs. cold hand on the PGA Tour. International Journal of Sport Finance, 11(2):99–113, 2016.
  • [4] C Au and T Woo. Three dimensional extension of Bresenham’s algorithm with Voronoi diagram. Computer-Aided Design, 43(4):417–426, 2011.
  • [5] M Bansal and M Broadie. A simulation model to analyze the impact of hole size on putting in golf. In Simulation Conference, pages 2826–2834. WSC 2008, 2008.
  • [6] CD Baugher, JP Day, and EW Burford Jr. Drive for show and putt for dough? Not anymore. Journal of Sports Economics, 17(2):207–215, 2016.
  • [7] CD Baugher, JP Day, and EW Burford Jr. Drive for show and putt for dough? Not anymore. Journal of Sports Economics, 17(2):207–215, 2016.
  • [8] J Bresenham. Algorithm for computer control of a digital plotter. IBM Systems Journal, 4:25–30, 1965.
  • [9] M Broadie. Assessing golfer performance using Golfmetrics. In Science and Golf V: Proceedings of the 2008 World Scientific Congress of Golf, pages 253–262. St. Andrews: World Scientific Congress of Golf Trust, 2008.
  • [10] M Broadie. Assessing golfer performance on the PGA Tour. Interfaces, 42(2):146–165, 2012.
  • [11] M Broadie and S Ko. A simulation model to analyze the impact of distance and direction on golf scores. In Winter Simulation Conference, WSC 2009, pages 3109–3120, 2009.
  • [12] RP Bunker and F Thabtah. A machine learning framework for sport result prediction. Applied Computing and Informatics, 15(1):27–33, 2019.
  • [13] R Connolly and RJ Rendleman. Tournament selection efficiency: an analysis of the PGA Tour’s FedExCup. Journal of Quantitative Analysis in Sports, 8(4), 2012.
  • [14] RA Connolly and RJ Rendleman. Dominance, intimidation, and ‘choking’ on the PGA Tour. Journal of Quantitative Analysis in Sports, 5(3), 2009.
  • [15] RA Connolly and RJ Rendleman Jr. Skill, luck, and streaky play on the PGA Tour. Journal of the American Statistical Association, 103(481):74–88, 2008.
  • [16] RA Connolly and RJ Rendleman Jr. What it takes to win on the PGA Tour (if your name is “Tiger” or if it isn’t). Interfaces, 42(6):554–576, 2012.
  • [17] C Drappi and LC Ting Keh. Predicting golf scores at the shot level. Journal of Sports Analytics, 5(2):1–9, 2018.
  • [18] D Fearing, J Acimovic, and SC Graves. How to catch a Tiger: understanding putting performance on the PGA Tour. Journal of Quantitative Analysis in Sports, 7(1), 2011.
  • [19] E Gnagy, M Dixon, E Clingerman, and J Bartholomew. An exploration of strategic decision making in golf: take a chance, it’s worth the risk. International Journal of Golf Science, 4(2):89–109, 2015.
  • [20] CM Grinstead and JL Snell. Introduction to probability. American Mathematical Society, Providence, RI, 1997.
  • [21] M Guillot and G Stauffer. The stochastic shortest path problem: a polyhedral combinatorics perspective. European Journal of Operational Research, 285(1):148–158, 2018.
  • [22] EL Heiny and R Heiny. And the 2011 driving champion is? Dustin Johnson. Journal of Quantitative Analysis in Sports, 8(4), 2012.
  • [23] EL Heiny and R Heiny. Stochastic model of the 2012 PGA Tour season. Journal of Quantitative Analysis in Sports, 10(4), 2014.
  • [24] DC Hickman, C Kerr, and N Metz. Rank and performance in dynamic tournaments: evidence from the PGA Tour. Journal of Sports Economics, 20(4):509–534, 2019.
  • [25] DC Hickman and NE Metz. The impact of pressure on performance: evidence from the PGA Tour. Journal of Economic Behavior and Organization, 116:319–330, 2015.
  • [26] S Hoffmeister and J Rambau. Strategy optimization in sports: a two-scale approach via Markov decision problems. http://www.wm.uni-bayreuth.de/de/download/xcf2d3wd4lkj2/preprint_sso_bv.pdf, 2015.
  • [27] S Hoffmeister and J Rambau. Sport strategy optimization in beach volleyball? how to bound direct point probabilities dependent on individual skills. In MathSport International 2017 Conference, 2017.
  • [28] KY Huang and WL Chang. A neural network method for prediction of 2006 World Cup football game. In The 2010 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2010.
  • [29] J Hucaljuk and A Rakipovic. Predicting football scores using machine learning techniques. In MIPRO, Proceedings of the 34th International Convention, pages 1623–1627. IEEE, 2011.
  • [30] N James and GD Rees. Approach shot accuracy as a performance indicator for US PGA Tour golf professionals. International Journal of Sports Science & Coaching, 3(1):145–160, 2008.
  • [31] J Lim, Y Lim, and J Song. Prediction of golf scores on the PGA Tour using statistical models. Korean Journal of Applied Statistics, 30(1):41–55, 2017.
  • [32] M Maher. Stochastic modelling of sport. In 2012 Ninth International Conference on Quantitative Evaluation of Systems, pages 207–208, 2012.
  • [33] S Ozbeklik and JK Smith. Risk taking in competition: evidence from match play golf tournaments. Journal of Corporate Finance, 44:506–523, 2017.
  • [34] M Pfeiffer, H Zhang, and A Hohmann. A Markov chain model of elite table tennis competition. International Journal of Sports Science & Coaching, 5(2):205–222, 2010.
  • [35] S Robertson, AF Burnett, and R Gupta. Two tests of approach-iron golf skill and their ability to predict tournament performance. Journal of Sports Sciences, 32(14):1341–1349, 2014.
  • [36] K Routley and O Schulte. A Markov game model for valuing player actions in ice hockey. In Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, pages 782–791, Arlington, Virginia, United States, 2015. AUAI Press.
  • [37] Z Shi, S Moorthy, and A Zimmermann. Predicting NCAAB match outcomes using ML techniques–some results and lessons learned. In ECML/PKDD 2013 Workshop on Machine Learning and Data Mining for Sports Analytics, 2013.
  • [38] M Stöckl and PF Lamb. The variable and chaotic nature of professional golf performance. Journal of Sports Sciences, 36(9):978–984, 2018.
  • [39] S Sugawara, H Kawamura, and K Suzuki. Skill-based simulation model for optimizing strategy in golf. In Advanced Intelligent Mechatronics (AIM), 2013 IEEE/ASME International Conference, pages 1591–1596, 2013.
  • [40] A Terroba, W Kosters, J Varona, and CS Manresa-Yee. Finding optimal strategies in tennis from video sequences. International Journal of Pattern Recognition and Artificial Intelligence, 27(06):1355010, 2013.
  • [41] E Štrumbelj and P Vračar. Simulating a basketball match with a homogeneous Markov model and forecasting the outcome. International Journal of Forecasting, 28(2):532–542, 2012.
  • [42] O Wiseman. Using machine learning to predict the winning score of professional golf events on the PGA Tour. PhD thesis, National College of Ireland, Dublin, 2016.

9 Appendix

The tables below present the confidence intervals for the mean of the various metrics reported in Table 1. For each metric, 'LB' denotes the lower bound and 'UB' the upper bound of the interval.

FirstName LastName Score Score LB Score UB Tee-shot Tee-shot LB Tee-shot UB
1 Rory McIlroy 269.000 258.816 279.184 266.504 262.347 270.661
2 Bryson DeChambeau 273.000 262.445 283.555 260.902 257.769 264.036
3 Justin Rose 274.000 263.146 284.854 263.270 259.396 267.145
4 Henrik Stenson 275.000 266.162 283.838 253.695 249.943 257.446
5 Tiger Woods 277.000 267.877 286.123 258.218 253.077 263.359
6 Ryan Moore 278.000 268.006 287.994 254.739 250.969 258.510
7 Kevin Chappell 280.000 268.684 291.316 266.802 262.215 271.389
8 Marc Leishman 280.000 269.146 290.854 259.035 254.889 263.180
9 Patrick Rodgers 280.000 270.532 289.468 257.997 254.502 261.492
10 Chris Kirk 281.000 270.205 291.795 254.156 250.510 257.801
FirstName LastName Fairway Fairway LB Fairway UB L L LB L UB
1 Rory McIlroy 0.696 0.594 0.799 0.107 0.036 0.178
2 Bryson DeChambeau 0.768 0.675 0.860 0.125 0.061 0.189
3 Justin Rose 0.786 0.691 0.880 0.054 0.000 0.107
4 Henrik Stenson 0.839 0.758 0.921 0.054 0.006 0.101
5 Tiger Woods 0.643 0.536 0.750 0.107 0.036 0.178
6 Ryan Moore 0.839 0.754 0.925 0.018 -0.013 0.049
7 Kevin Chappell 0.732 0.636 0.828 0.125 0.047 0.203
8 Marc Leishman 0.714 0.620 0.809 0.071 0.015 0.128
9 Patrick Rodgers 0.589 0.500 0.678 0.161 0.087 0.234
10 Chris Kirk 0.732 0.640 0.825 0.143 0.067 0.218
FirstName LastName R R LB R UB GiR GiR LB GiR UB
1 Rory McIlroy 0.196 0.107 0.286 0.736 0.638 0.834
2 Bryson DeChambeau 0.107 0.036 0.178 0.764 0.682 0.846
3 Justin Rose 0.161 0.075 0.246 0.750 0.648 0.852
4 Henrik Stenson 0.107 0.032 0.183 0.806 0.725 0.886
5 Tiger Woods 0.250 0.146 0.354 0.708 0.610 0.806
6 Ryan Moore 0.143 0.063 0.223 0.806 0.729 0.883
7 Kevin Chappell 0.143 0.072 0.214 0.806 0.714 0.897
8 Marc Leishman 0.214 0.123 0.305 0.722 0.618 0.826
9 Patrick Rodgers 0.250 0.159 0.341 0.722 0.623 0.822
10 Chris Kirk 0.125 0.052 0.198 0.708 0.605 0.811
FirstName LastName water water LB water UB bunker bunker LB bunker UB
1 Rory McIlroy 0.000 0.000 0.000 0.153 0.065 0.240
2 Bryson DeChambeau 0.000 0.000 0.000 0.125 0.046 0.204
3 Justin Rose 0.028 -0.004 0.059 0.139 0.056 0.222
4 Henrik Stenson 0.000 0.000 0.000 0.139 0.069 0.209
5 Tiger Woods 0.000 0.000 0.000 0.181 0.093 0.268
6 Ryan Moore 0.028 -0.027 0.082 0.250 0.151 0.349
7 Kevin Chappell 0.042 -0.019 0.103 0.153 0.062 0.243
8 Marc Leishman 0.014 -0.013 0.041 0.208 0.115 0.301
9 Patrick Rodgers 0.014 -0.013 0.041 0.250 0.158 0.342
10 Chris Kirk 0.014 -0.013 0.041 0.264 0.161 0.367

The tables below present the confidence intervals for the mean of the various metrics reported in Table 2. For each metric, 'LB' denotes the lower bound and 'UB' the upper bound of the interval.

FirstName LastName Score Score LB Score UB Tee-shot Tee-shot LB Tee-shot UB
1 Rory McIlroy 273.486 272.555 274.418 284.492 283.870 285.113
2 Bryson DeChambeau 277.623 276.739 278.506 270.841 270.328 271.354
3 Justin Rose 275.630 274.768 276.493 269.931 269.510 270.353
4 Henrik Stenson 281.944 281.022 282.867 258.893 258.463 259.323
5 Tiger Woods 269.085 268.275 269.895 263.598 263.269 263.927
6 Ryan Moore 282.356 281.473 283.240 262.925 262.520 263.330
7 Kevin Chappell 278.488 277.572 279.405 291.782 291.217 292.348
8 Marc Leishman 274.260 273.386 275.135 277.578 277.045 278.110
9 Patrick Rodgers 284.414 283.510 285.317 271.482 271.055 271.908
10 Chris Kirk 289.085 288.156 290.013 243.917 243.217 244.617
FirstName LastName Fairway Fairway LB Fairway UB L L LB L UB
1 Rory McIlroy 0.685 0.677 0.693 0.126 0.120 0.133
2 Bryson DeChambeau 0.676 0.668 0.685 0.141 0.135 0.147
3 Justin Rose 0.700 0.691 0.709 0.147 0.140 0.154
4 Henrik Stenson 0.699 0.690 0.707 0.135 0.129 0.141
5 Tiger Woods 0.765 0.757 0.773 0.088 0.083 0.093
6 Ryan Moore 0.686 0.677 0.695 0.133 0.126 0.139
7 Kevin Chappell 0.624 0.615 0.632 0.121 0.115 0.127
8 Marc Leishman 0.702 0.694 0.711 0.136 0.130 0.142
9 Patrick Rodgers 0.628 0.619 0.637 0.181 0.174 0.188
10 Chris Kirk 0.647 0.639 0.656 0.135 0.129 0.142
FirstName LastName R R LB R UB GiR GiR LB GiR UB
1 Rory McIlroy 0.189 0.182 0.196 0.735 0.726 0.743
2 Bryson DeChambeau 0.183 0.176 0.190 0.760 0.752 0.768
3 Justin Rose 0.153 0.146 0.160 0.742 0.734 0.750
4 Henrik Stenson 0.166 0.159 0.173 0.718 0.710 0.727
5 Tiger Woods 0.147 0.141 0.154 0.795 0.787 0.803
6 Ryan Moore 0.181 0.175 0.188 0.718 0.710 0.727
7 Kevin Chappell 0.255 0.248 0.262 0.721 0.713 0.730
8 Marc Leishman 0.162 0.155 0.168 0.770 0.761 0.778
9 Patrick Rodgers 0.191 0.184 0.198 0.678 0.669 0.687
10 Chris Kirk 0.217 0.210 0.224 0.693 0.684 0.701
FirstName LastName water water LB water UB bunker bunker LB bunker UB
1 Rory McIlroy 0.009 0.007 0.011 0.137 0.130 0.144
2 Bryson DeChambeau 0.009 0.007 0.010 0.121 0.114 0.128
3 Justin Rose 0.004 0.003 0.005 0.144 0.137 0.150
4 Henrik Stenson 0.007 0.006 0.009 0.144 0.137 0.151
5 Tiger Woods 0.005 0.003 0.006 0.099 0.093 0.105
6 Ryan Moore 0.008 0.006 0.009 0.157 0.150 0.164
7 Kevin Chappell 0.015 0.013 0.017 0.118 0.112 0.125
8 Marc Leishman 0.009 0.007 0.011 0.101 0.095 0.107
9 Patrick Rodgers 0.006 0.005 0.008 0.164 0.157 0.172
10 Chris Kirk 0.013 0.011 0.015 0.119 0.113 0.126