
# Multi-armed bandit

In probability theory, the multi-armed bandit problem (sometimes called the K- or N-armed bandit problem) is a problem in which a fixed, limited set of resources must be allocated among competing (alternative) choices so as to maximize the expected gain. At the time of allocation, each choice's properties are only partially known, and they may become better understood as time passes or as resources are allocated to that choice. [1]
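A minimal simulation can make the allocation trade-off concrete. This sketch assumes Bernoulli arms with made-up success probabilities. It uses an epsilon-greedy rule: play the arm with the best average so far, but explore a random arm with probability `epsilon`. This is one simple strategy among many and not a canonical solution. The function name and parameters are illustrative.

```python
import random


def epsilon_greedy_bandit(arm_means, steps, epsilon=0.1, seed=0):
    """Play a Bernoulli multi-armed bandit with an epsilon-greedy rule.

    arm_means are the true (hidden) success probabilities; the agent only
    observes 0/1 rewards and keeps a running average per arm.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k          # how often each arm was pulled
    estimates = [0.0] * k     # running mean reward per arm
    total_reward = 0
    for _ in range(steps):
        untried = [a for a in range(k) if counts[a] == 0]
        if untried:
            arm = untried[0]                                   # try every arm once
        elif rng.random() < epsilon:
            arm = rng.randrange(k)                             # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])    # exploit
        reward = 1 if rng.random() < arm_means[arm] else 0
        counts[arm] += 1
        # Incremental update of the sample mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return counts, estimates, total_reward


counts, estimates, total = epsilon_greedy_bandit([0.2, 0.5, 0.8], steps=5000)
print(counts, [round(e, 2) for e in estimates], total)
```

Over time, most pulls concentrate on the arm with the highest true mean. The fixed `epsilon` keeps spending a small share of pulls on exploration.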

## Annals of Applied Probability

The Annals of Applied Probability is a peer-reviewed mathematics journal published by the Institute of Mathematical Statistics.

## Annals of Statistics

The Annals of Statistics is a peer-reviewed statistics journal published by the Institute of Mathematical Statistics.

## Asymptote

In analytic geometry, an asymptote of a curve is a line such that the distance between the curve and the line approaches zero as one or both of the x or y coordinates tends to infinity.

## Bayes' theorem

In probability theory and statistics, Bayes' theorem (alternatively Bayes' law or Bayes' rule, also written as Bayes's theorem) describes the probability of an event based on prior knowledge of conditions that might be related to the event.
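Here is a short worked example of the theorem for a binary hypothesis. The numbers describe a hypothetical screening test: 1% prevalence, 99% sensitivity, and a 5% false-positive rate. The function name is illustrative. The example shows how a low prior keeps the posterior modest even after a positive result.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) via Bayes' theorem for a binary hypothesis H.

    P(H | E) = P(E | H) P(H) / P(E), where
    P(E) = P(E | H) P(H) + P(E | not H) P(not H).
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical test: 1% prevalence, 99% sensitivity, 5% false positives.
p = posterior(prior=0.01, likelihood=0.99, false_positive_rate=0.05)
print(round(p, 4))  # 0.1667
```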

## Bulletin of the American Mathematical Society

The Bulletin of the American Mathematical Society is a quarterly mathematical journal published by the American Mathematical Society.

## Clinical trial

Clinical trials are experiments or observations done in clinical research.

## Condorcet criterion

The Condorcet candidate (Condorcet winner) is the person who would win a two-candidate election against each of the other candidates in a plurality vote.

The Condorcet paradox (also known as voting paradox or the paradox of voting) in social choice theory is a situation noted by the Marquis de Condorcet in the late 18th century, in which collective preferences can be cyclic, even if the preferences of individual voters are not cyclic.
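Both ideas above can be checked directly from ranked ballots by comparing every pair of candidates head-to-head. This is a minimal sketch that assumes complete rankings (most preferred first). The function name is hypothetical. The second call reproduces the cyclic case described above, in which no candidate wins.

```python
from itertools import combinations


def condorcet_winner(ballots):
    """Return the candidate who beats every other candidate head-to-head, or None.

    Each ballot is a complete ranking, most preferred first.
    """
    candidates = set(ballots[0])
    beats = {c: set() for c in candidates}
    for a, b in combinations(sorted(candidates), 2):
        a_over_b = sum(1 for r in ballots if r.index(a) < r.index(b))
        b_over_a = len(ballots) - a_over_b
        if a_over_b > b_over_a:
            beats[a].add(b)
        elif b_over_a > a_over_b:
            beats[b].add(a)
    for c in candidates:
        if len(beats[c]) == len(candidates) - 1:
            return c
    return None


print(condorcet_winner([["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]))  # A
# Condorcet paradox: A beats B, B beats C, and C beats A, so there is no winner.
print(condorcet_winner([["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]))  # None
```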

## Dynamic routing

Dynamic routing, also called adaptive routing, is a process in which a router can forward data for a given destination via a different route, based on the current conditions of the communication circuits within a system.

## Gambling

Gambling is the wagering of money or something of value (referred to as "the stakes") on an event with an uncertain outcome with the primary intent of winning money or material goods.

## Germany

Germany (Deutschland), officially the Federal Republic of Germany (Bundesrepublik Deutschland), is a sovereign state in central-western Europe.

## Gittins index

The Gittins index is a measure of the reward that can be achieved by a random process that has a termination state and evolves from its present state onward. At every later stage, the process may be terminated, accruing the expected reward earned up to that point.
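The definition above is abstract, so here is a sketch of one concrete case under stated assumptions: a Bernoulli arm with a Beta(a, b) posterior and discount factor `gamma`. The index is approximated by calibration against a retirement option. Bisection searches for the per-period reward `lam` at which playing once more is exactly as good as retiring with `lam / (1 - gamma)`. The continuation value comes from backward induction truncated after `horizon` pulls. The function name, the truncation rule, and the default parameters are illustrative choices, not a standard API.

```python
def gittins_index_beta(a, b, gamma=0.9, horizon=100, tol=1e-6):
    """Approximate Gittins index of a Bernoulli arm with a Beta(a, b) posterior."""

    def continuation_value(lam):
        retire = lam / (1 - gamma)
        # At the truncation depth, approximate by the better of retiring and
        # playing forever at the current posterior mean.
        values = [max(retire, (a + s) / (a + b + horizon) / (1 - gamma))
                  for s in range(horizon + 1)]
        for depth in range(horizon - 1, -1, -1):
            new_values = []
            for s in range(depth + 1):
                p = (a + s) / (a + b + depth)  # posterior mean after s successes
                cont = p * (1 + gamma * values[s + 1]) + (1 - p) * gamma * values[s]
                # At the root we want the value of pulling now; deeper, we may retire.
                new_values.append(cont if depth == 0 else max(retire, cont))
            values = new_values
        return values[0], retire

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = (lo + hi) / 2
        cont, retire = continuation_value(lam)
        if cont > retire:
            lo = lam  # playing on beats retirement, so the index exceeds lam
        else:
            hi = lam
    return (lo + hi) / 2


print(round(gittins_index_beta(1, 1), 3))
```

For an uncertain arm, the index exceeds the posterior mean. The gap is the value of the information that further pulls would provide.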

## Greedy algorithm

A greedy algorithm is an algorithmic paradigm that follows the problem solving heuristic of making the locally optimal choice at each stage with the intent of finding a global optimum.
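Coin change is a classic illustration of the locally-optimal-choice heuristic: always take the largest coin that still fits. In this sketch, the coin systems are examples and the function name is hypothetical. The second call shows that the greedy choice need not reach the global optimum.

```python
def greedy_change(amount, coins):
    """Make change by repeatedly taking the largest coin that still fits."""
    result = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            result.append(coin)
    if amount != 0:
        raise ValueError("cannot make exact change with these coins")
    return result


print(greedy_change(63, [1, 5, 10, 25]))  # [25, 25, 10, 1, 1, 1]
print(greedy_change(6, [1, 3, 4]))        # [4, 1, 1], although [3, 3] uses fewer coins
```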

## Herbert Robbins

Herbert Ellis Robbins (January 12, 1915 – February 12, 2001) was an American mathematician and statistician.

## John C. Gittins

John Charles Gittins (born 1938) is a researcher in applied probability and operations research, who is a professor and Emeritus Fellow at Keble College, Oxford University.

## Journal of the Royal Statistical Society

The Journal of the Royal Statistical Society is a peer-reviewed scientific journal of statistics.

## Lecture Notes in Computer Science

Springer Lecture Notes in Computer Science (LNCS) is a series of computer science books published by Springer Science+Business Media (formerly Springer-Verlag) since 1973.

## Markov decision process

Markov decision processes (MDPs) provide a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker.
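Value iteration is one standard way to solve a small MDP. It repeatedly applies the Bellman optimality update until the values stop changing. This sketch uses a hypothetical two-state MDP with a random transition. The names `value_iteration` and `mdp` are illustrative.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-10):
    """Solve a finite MDP by value iteration.

    transitions[s][a] is a list of (probability, next_state, reward) triples.
    Returns approximately optimal state values and a greedy policy.
    """
    n = len(transitions)
    values = [0.0] * n

    def q(s, a):
        # Expected immediate reward plus discounted value of the next state.
        return sum(p * (r + gamma * values[s2]) for p, s2, r in transitions[s][a])

    while True:
        new_values = [max(q(s, a) for a in range(len(transitions[s]))) for s in range(n)]
        delta = max(abs(x - y) for x, y in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    policy = [max(range(len(transitions[s])), key=lambda a: q(s, a)) for s in range(n)]
    return values, policy


# Hypothetical MDP: action 0 = "stay", action 1 = "move".
mdp = [
    # State 0: staying earns nothing; moving reaches state 1 with probability 0.8.
    [[(1.0, 0, 0.0)], [(0.8, 1, 0.0), (0.2, 0, 0.0)]],
    # State 1: staying earns 1 per step; moving returns to state 0.
    [[(1.0, 1, 1.0)], [(1.0, 0, 0.0)]],
]
values, policy = value_iteration(mdp)
print([round(v, 3) for v in values], policy)  # [8.78, 10.0] [1, 0]
```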

## Medical ethics

Medical ethics is a system of moral principles that apply values to the practice of clinical medicine and in scientific research.

## Michael Katehakis

Michael N. Katehakis (Μιχαήλ Ν. Κατεχάκης; born 1952) is a Professor of Management Science at Rutgers University.

## Nonparametric regression

Nonparametric regression is a category of regression analysis in which the predictor does not take a predetermined form but is constructed according to information derived from the data.
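One common nonparametric method is Nadaraya-Watson kernel regression. The prediction at `x` is a weighted average of the observed responses, with weights decaying with distance, so no functional form is assumed. This sketch uses a Gaussian kernel with a hand-picked `bandwidth`. The data points are made up.

```python
import math


def nadaraya_watson(x_train, y_train, x, bandwidth=1.0):
    """Kernel-weighted average of y_train at x (Gaussian kernel)."""
    weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in x_train]
    return sum(w * yi for w, yi in zip(weights, y_train)) / sum(weights)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 0.8, 0.9, 0.1, -0.8]  # made-up observations
print(round(nadaraya_watson(xs, ys, 1.5), 3))
```

The bandwidth controls smoothness. A small value tracks individual points, while a large value averages over most of the data.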

## Open-source model

The open-source model is a decentralized software-development model that encourages open collaboration.

## Optimal stopping

In mathematics, the theory of optimal stopping or early stopping is concerned with the problem of choosing a time to take a particular action, in order to maximise an expected reward or minimise an expected cost.
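A classic small instance is selling an asset that receives `n` offers in sequence. This sketch assumes each offer is independently Uniform(0, 1) and that a rejected offer is lost. Backward induction gives the value of continuing, and the optimal rule is to accept an offer only if it is at least that value. The function names are hypothetical.

```python
def continuation_values(n):
    """v[k] = expected payoff under optimal play with k offers remaining.

    With one offer left it must be accepted, so v[1] = E[X] = 1/2.
    Otherwise v[k] = E[max(X, v[k-1])] = (1 + v[k-1] ** 2) / 2 for X ~ Uniform(0, 1).
    """
    v = [0.0, 0.5]  # v[0] = 0: no offers left, no payoff
    for _ in range(2, n + 1):
        v.append((1 + v[-1] ** 2) / 2)
    return v


def should_stop(offer, offers_left, v):
    """Accept the current offer iff it is at least the value of continuing."""
    return offers_left == 1 or offer >= v[offers_left - 1]


v = continuation_values(5)
print(v[1:4])                  # [0.5, 0.625, 0.6953125]
print(should_stop(0.6, 3, v))  # False, since 0.6 < v[2] = 0.625
```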

## Peter Whittle (mathematician)

Peter Whittle (born 27 February 1927, in Wellington, New Zealand) is a mathematician and statistician, working in the fields of stochastic nets, optimal control, time series analysis, stochastic optimisation and stochastic dynamics. From 1967 to 1994, he was the Churchill Professor of Mathematics for Operational Research at the University of Cambridge.

## Pharmaceutical industry

The pharmaceutical industry (or medicine industry) is the commercial industry that discovers, develops, produces, and markets pharmaceutical drugs for use as medications.

## Portfolio (finance)

In finance, a portfolio is a collection of investments held by an investment company, hedge fund, financial institution or individual.

## Prisoner's dilemma

The prisoner's dilemma is a standard example of a game analyzed in game theory that shows why two completely rational individuals might not cooperate, even if it appears that it is in their best interests to do so.
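The dilemma can be checked mechanically from a payoff table. This sketch uses the conventional illustrative values T=5 > R=3 > P=1 > S=0 (an assumption, since the text gives no numbers). It shows that defecting is the best response to either opponent action, even though mutual cooperation pays both players more than mutual defection.

```python
# Payoffs (row player, column player) for "C" (cooperate) and "D" (defect).
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def best_response(opponent_action):
    """Row player's payoff-maximizing action against a fixed opponent action."""
    return max("CD", key=lambda a: PAYOFF[(a, opponent_action)][0])


print(best_response("C"), best_response("D"))  # D D
```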

## Probability distribution

In probability theory and statistics, a probability distribution is a mathematical function that provides the probabilities of occurrence of different possible outcomes in an experiment.

## Probability theory

Probability theory is the branch of mathematics concerned with probability.

## Regret (decision theory)

In decision theory, when a decision is made under uncertainty and information about the best course of action arrives only after the decision has been taken, the human emotional response of regret is often experienced.
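In the multi-armed bandit setting, regret is usually formalized as the expected reward lost by not always playing the best arm. Over `T` rounds this is T·μ* − Σ μ(a_t), where μ* is the largest arm mean. This sketch computes that expected (pseudo-)regret for a given sequence of choices. The arm means and the function name are illustrative.

```python
def cumulative_regret(arm_means, choices):
    """Expected regret: sum over rounds of (best mean - mean of the chosen arm)."""
    best = max(arm_means)
    return sum(best - arm_means[a] for a in choices)


print(round(cumulative_regret([0.2, 0.5, 0.7], [0, 1, 2, 2]), 2))  # 0.7
```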

## Reinforcement learning

Reinforcement learning (RL) is an area of machine learning inspired by behaviourist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.
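Tabular Q-learning is one well-known RL method. The agent learns action values from trial and error, without a model of the environment. This sketch assumes a hypothetical five-state corridor in which reaching the last state pays 1 and ends the episode. The learning rate, discount, exploration rate, and episode counts are arbitrary illustrative choices.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a corridor. Actions: 0 = left, 1 = right."""
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap episode length
            if rng.random() < epsilon or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)                # explore, or break ties randomly
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1   # exploit current estimates
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == goal else 0.0
            # The goal is terminal, so it contributes no future value.
            target = r if s2 == goal else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])   # move toward the TD target
            s = s2
            if s == goal:
                break
    return Q


Q = q_learning_chain()
policy = [0 if q[0] > q[1] else 1 for q in Q[:-1]]
print(policy)
```

The learned greedy policy should move right in every non-terminal state, which is the shortest path to the reward.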

## Search theory

In microeconomics, search theory studies buyers or sellers who cannot instantly find a trading partner, and must therefore search for a partner prior to transacting.

## SIAM Journal on Computing

The SIAM Journal on Computing is a scientific journal focusing on the mathematical and formal aspects of computer science.

## Singular-value decomposition

In linear algebra, the singular-value decomposition (SVD) is a factorization of a real or complex matrix.
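NumPy's `np.linalg.svd` computes the factorization A = U·diag(S)·Vᵀ. This sketch uses a small example matrix. It checks the reconstruction, then keeps only the largest singular value to form the best rank-1 approximation (the Eckart-Young theorem).

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# Thin SVD: U has orthonormal columns, Vt has orthonormal rows, S is descending.
U, S, Vt = np.linalg.svd(A, full_matrices=False)
print(S)
assert np.allclose(U @ np.diag(S) @ Vt, A)

# Best rank-1 approximation: keep only the largest singular triplet.
A1 = S[0] * np.outer(U[:, 0], Vt[0])
print(np.linalg.norm(A - A1))  # Frobenius error, equal to S[1] here
```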

## Slot machine

A slot machine (American English), known variously as a fruit machine (British English), puggy (Scottish English), the slots (Canadian and American English), poker machine/pokies (Australian English and New Zealand English), or simply slot (American English), is a casino gambling machine with three or more reels which spin when a button is pushed.

## Softmax function

In mathematics, the softmax function, or normalized exponential function, is a generalization of the logistic function. It "squashes" a $K$-dimensional vector $\mathbf{z}$ of arbitrary real values to a $K$-dimensional vector $\sigma(\mathbf{z})$ of real values, where each entry is in the range (0, 1) and all the entries add up to 1. The function is given by

$$\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}} \quad \text{for } j = 1, \ldots, K.$$

In probability theory, the output of the softmax function can be used to represent a categorical distribution, that is, a probability distribution over $K$ different possible outcomes. In fact, it is the gradient-log-normalizer of the categorical probability distribution. The softmax function is also the gradient of the LogSumExp function.

The softmax function is used in various multiclass classification methods, such as multinomial logistic regression (also known as softmax regression), multiclass linear discriminant analysis, naive Bayes classifiers, and artificial neural networks. Specifically, in multinomial logistic regression and linear discriminant analysis, the input to the function is the result of $K$ distinct linear functions. The predicted probability for the $j$th class given a sample vector $\mathbf{x}$ and a weighting vector $\mathbf{w}$ is

$$P(y = j \mid \mathbf{x}) = \frac{e^{\mathbf{x}^\mathsf{T}\mathbf{w}_j}}{\sum_{k=1}^{K} e^{\mathbf{x}^\mathsf{T}\mathbf{w}_k}}.$$

This can be seen as the composition of $K$ linear functions $\mathbf{x} \mapsto \mathbf{x}^\mathsf{T}\mathbf{w}_1, \ldots, \mathbf{x} \mapsto \mathbf{x}^\mathsf{T}\mathbf{w}_K$ and the softmax function (where $\mathbf{x}^\mathsf{T}\mathbf{w}$ denotes the inner product of $\mathbf{x}$ and $\mathbf{w}$). The operation is equivalent to applying a linear operator defined by $\mathbf{w}$ to vectors $\mathbf{x}$. This transforms the original, possibly high-dimensional, input into vectors in a $K$-dimensional space $\mathbb{R}^K$.
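Here is a direct implementation of the formula for σ(z)_j. The maximum entry is subtracted before exponentiating; this leaves the result unchanged and avoids overflow. The last lines show softmax ("Boltzmann") action selection over estimated arm values, as used in bandit problems. The temperature `tau` and the estimates are hypothetical.

```python
import math


def softmax(z):
    """Map real scores z_1..z_K to probabilities exp(z_j) / sum_k exp(z_k)."""
    m = max(z)                           # shifting by the max avoids overflow
    exps = [math.exp(x - m) for x in z]  # and does not change the result
    total = sum(exps)
    return [e / total for e in exps]


print(softmax([1.0, 2.0, 3.0]))
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]; a naive exp(1000) would overflow

# Boltzmann exploration: lower tau concentrates probability on the best estimate.
estimates, tau = [0.2, 0.5, 0.8], 0.1
probs = softmax([q / tau for q in estimates])
print([round(p, 3) for p in probs])
```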

## Stochastic scheduling

Stochastic scheduling concerns scheduling problems involving random attributes, such as random processing times, random due dates, random weights, and stochastic machine breakdowns.

## Thompson sampling

In artificial intelligence, Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem.
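For Bernoulli rewards, Thompson sampling has a simple conjugate form. Each arm keeps a Beta posterior over its success rate. Each round, the algorithm draws one sample per arm and plays the arm with the largest sample. This sketch assumes a uniform Beta(1, 1) prior and made-up arm probabilities. It uses Python's `random.Random.betavariate`.

```python
import random


def thompson_sampling(arm_means, steps, seed=0):
    """Beta-Bernoulli Thompson sampling; returns the number of pulls per arm."""
    rng = random.Random(seed)
    k = len(arm_means)
    successes = [0] * k
    failures = [0] * k
    for _ in range(steps):
        # Draw a plausible success rate for each arm from its Beta posterior ...
        samples = [rng.betavariate(1 + successes[a], 1 + failures[a]) for a in range(k)]
        # ... and play the arm whose draw is largest.
        arm = max(range(k), key=lambda a: samples[a])
        if rng.random() < arm_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


pulls = thompson_sampling([0.1, 0.5, 0.9], steps=2000)
print(pulls)
```

Exploration comes from posterior uncertainty rather than a fixed rate. Arms with few observations have wide posteriors and occasionally produce large samples.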

## Tikhonov regularization

Tikhonov regularization, named for Andrey Tikhonov, is the most commonly used method of regularization of ill-posed problems.
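For a linear least-squares problem, the most common form of Tikhonov regularization (ridge regression) adds `lam * ||w||^2` to the squared error. This gives the closed form w = (XᵀX + λI)⁻¹Xᵀy. This sketch solves that system with NumPy. It uses made-up, nearly collinear data, for which the unregularized problem is ill-conditioned. The function name is illustrative.

```python
import numpy as np


def ridge(X, y, lam):
    """argmin_w ||X w - y||^2 + lam * ||w||^2, via a linear solve."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


# Nearly collinear columns: small changes in y can swing the plain solution.
X = np.array([[1.0, 1.0], [1.0, 1.0001], [1.0, 0.9999]])
y = np.array([2.0, 2.0001, 2.0])
print(ridge(X, y, lam=0.0))   # unregularized least squares
print(ridge(X, y, lam=1e-3))  # regularized: smaller, more stable weights
```

Larger `lam` trades a little bias for much lower sensitivity to noise in `y`.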

## World War II

World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945, although conflicts reflecting the ideological clash between what would become the Allied and Axis blocs began earlier.

## References
