Skip to content

Latest commit

 

History

History
120 lines (99 loc) · 6.62 KB

75f77142.md

File metadata and controls

120 lines (99 loc) · 6.62 KB
title slug tags date
The Gittins Index
the-gittins-index
zettel
the-gittins-index
statistics
probability
mathematics
algorithms-to-live-by
book
book/algorithms-to-live-by
brian-christian
tom-griffiths
brian-christian-and-tom-griffiths
2022-05-17T07:17

Discovered accidentally by John Gittins, statistics professor at Oxford, one of the hardest mathematical riddle by solving an optimization problem for Unilever corporation, a for-profit drug producing company.

In drug producing business, both the for-profit companies and the medical profession they serve are faced with the dilemma of the [[04f52b1f|explore/exploit trade-off]]. Companies want to make as much profit as possible but also want to reinvest profits into R&D for new drug breakthrough. Similarly, doctors are bound by their oath to "do no harm" and provide the best treatment to their patients but also favors experimental studies that may save countless number of lives in the future. However, the present holds more weight into the equation and "a cured patient today is taken to be more valuable than one cured a week or a year from now, and certainly the same holds true of profits."1 In parlance of the economists, valuing the present more highly than the future is called "discounting."

Gittins discovered what he called "dynamic allocation index", or more famously the Gittins index, that also put to rest the long overdue [[1f59b2e1|multi-armed bandit riddle]] with geometrically discounted payoffs.2 The competing demand to explore or exploit dissolves into maximization of a single quantity that accounts for both.

Sample Gittins Index

Wins (Column) vs. Loses (Row)

0 1 2 3 4 5 6 7 8 9
0 $.7029$ $.8001$ $.8452$ $.8723$ $.8905$ $.9039$ $.9141$ $.9221$ $.9287$ $.9342$
1 $.5001$ $.6346$ $.7072$ $.7539$ $.7869$ $.8115$ $.8307$ $.8461$ $.8588$ $.8695$
2 $.3796$ $.5163$ $.6010$ $.6579$ $.6996$ $.7318$ $.7573$ $.7782$ $.7956$ $.8103$
3 $.3021$ $.4342$ $.5184$ $.5809$ $.6276$ $.6642$ $.6940$ $.7187$ $.7396$ $.7573$
4 $.2488$ $.3720$ $.4561$ $.5179$ $.5676$ $.6071$ $.6395$ $.6666$ $.6899$ $.7101$
5 $.2103$ $.3245$ $.4058$ $.4677$ $.5168$ $.5581$ $.5923$ $.6212$ $.6461$ $.6677$
6 $.1815$ $.2871$ $.3647$ $.4257$ $.4748$ $.5156$ $.5510$ $.5811$ $.6071$ $.6300$
7 $.1591$ $.2569$ $.3308$ $.3900$ $.4387$ $.4795$ $.5144$ $.5454$ $.5723$ $.5960$
8 $.1413$ $.2323$ $.3025$ $.3595$ $.4073$ $.4479$ $.4828$ $.5134$ $.5409$ $.5652$
9 $.1269$ $.2116$ $.2784$ $.3332$ $.3799$ $.4200$ $.4548$ $.4853$ $.5125$ $.5373$

Gittins index values as a function of wins and losses, assuming that a payoff next time is worth 90% of a payoff now.1

~ Algorithms to Live By


Here, the rules are simple. Using the [[1f59b2e1|Multi-armed bandit slot machine problem]], with a record of 0 win and 0 losses we start at column 0 and row 0 and each win we go to the right of the table and each losses we go down. Gittins index follows a [[209fb824|Markov model]], that is the future state depends only on the current state, and never on the prior events. The table assumes that the next arm pull is worth 90% of the payoff now (much like a discount), and notice that the Gittins index at the start point is already $.7029$ or a 70% probability of payout. In other words, Gittins index gives the benefit of the doubt to a fresh start with no payout knowledge of the slot machine.

The index increases as we go the to right of the table and decrease as we go down, suggesting that we should keep playing as long as the machine pays out. However, losing does not necessarily we should move on to the next slot machine. The question: "is a fresh start worth more than my next pull?" For instance, a record of 4-1 with an index of $.7869$ or a 79% chance of payout, we compare to our starting point of 70% chance of payout, we should chance another pull since starting over is worth less than the current state. Winning another round only increases our chances of success but losing favors more of starting over. With a record of 4-2 with an index of $.6579$ or a 66% chance of payout, we are better off moving on to the next machine.

Wins (Column) vs. Loses (Row)

0 1 2 3 4 5 6 7 8 9
0 $.8699$ $.9102$ $.9285$ $.9395$ $.9470$ $.9525$ $.9568$ $.9603$ $.9631$ $.9655$
1 $.7005$ $.7844$ $.8268$ $.8533$ $.8719$ $.8857$ $.8964$ $.9051$ $.9122$ $.9183$
2 $.5671$ $.6726$ $.7308$ $.7696$ $.7973$ $.8184$ $.8350$ $.8485$ $.8598$ $.8693$
3 $.4701$ $.5806$ $.6490$ $.6952$ $.7295$ $.7561$ $.7773$ $.7949$ $.8097$ $.8222$
4 $.3969$ $.5093$ $.5798$ $.6311$ $.6697$ $.6998$ $.7249$ $.7456$ $.7631$ $.7781$
5 $.3415$ $.4509$ $.5225$ $.5756$ $.6172$ $.6504$ $.6776$ $.7004$ $.7203$ $.7373$
6 $.2979$ $.4029$ $.4747$ $.5277$ $.5710$ $.6061$ $.6352$ $.6599$ $.6811$ $.6997$
7 $.2632$ $.3633$ $.4337$ $.4876$ $.5300$ $.5665$ $.5970$ $.6230$ $.6456$ $.6653$
8 $.2350$ $.3303$ $.3986$ $.4520$ $.4952$ $.5308$ $.5625$ $.5895$ $.6130$ $.6337$
9 $.2117$ $.3020$ $.3679$ $.4208$ $.4640$ $.5002$ $.5310$ $.5589$ $.5831$ $.6045$

Gittins index values as a function of wins and losses, assuming that a payoff next time is worth 99% of a payoff now.1

~ Algorithms to Live By


Gittins index is generally difficult to calculate. For a more elaborate formulation of the Gittins index, see the paper Multi-Armed Bandits, Gittins Index, and It's Calculation by Jhelum Chakravorty and Aditya Mahajan.

Resources

Footnotes

  1. Algorithms to Live By by Brian Christian and Tom Griffiths - Explore/Exploit 2 3

  2. Geometric discounting refers to each decision to explore is worth a constant fraction of the last one.