Multi-armed bandit

In probability theory, the multi-armed bandit problem (sometimes called the K- or N-armed bandit problem) is a problem in which a fixed limited set of resources must be allocated between competing (alternative) choices in a way that maximizes their expected gain, when each choice's properties are only partially known at the time of allocation, and may become better understood as time passes or by allocating resources to the choice. [1]
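
The trade-off described above can be illustrated with a small simulation. The sketch below uses an epsilon-greedy rule (listed among the redirects as "Epsilon-greedy strategy") on arms that pay 0 or 1; the Bernoulli reward model, the function name `epsilon_greedy`, and the parameters `true_means`, `steps`, `epsilon`, and `seed` are assumptions made for illustration and do not come from the source.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy player on Bernoulli arms.

    true_means is a hypothetical list of per-arm success probabilities
    that the player does not know and must learn by pulling arms.
    """
    rng = random.Random(seed)
    k = len(true_means)
    pulls = [0] * k          # how often each arm has been chosen
    estimates = [0.0] * k    # running mean reward observed per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(k), key=lambda a: estimates[a])

        # Draw a 0/1 reward from the chosen arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running mean for that arm.
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward

    return pulls, estimates, total_reward
```

A call such as `epsilon_greedy([0.2, 0.5, 0.8])` returns the pull counts, the estimated mean of each arm, and the total reward collected. A larger `epsilon` spends more pulls on exploration, while a smaller one commits earlier to whichever arm currently looks best.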

41 relations: Annals of Applied Probability, Annals of Statistics, Asymptote, Bayes' theorem, Bulletin of the American Mathematical Society, Clinical trial, Condorcet criterion, Condorcet paradox, Dynamic routing, Gambling, Germany, Gittins index, Greedy algorithm, Herbert Robbins, John C. Gittins, Journal of the Royal Statistical Society, Lecture Notes in Computer Science, Markov decision process, Medical ethics, Michael Katehakis, Nonparametric regression, Open-source model, Operations Research (journal), Optimal stopping, Peter Whittle (mathematician), Pharmaceutical industry, Portfolio (finance), Prisoner's dilemma, Probability distribution, Probability theory, Regret (decision theory), Reinforcement learning, Search theory, SIAM Journal on Computing, Singular-value decomposition, Slot machine, Softmax function, Stochastic scheduling, Thompson sampling, Tikhonov regularization, World War II.

Annals of Applied Probability

The Annals of Applied Probability is a peer-reviewed mathematics journal published by the Institute of Mathematical Statistics.

Annals of Statistics

The Annals of Statistics is a peer-reviewed statistics journal published by the Institute of Mathematical Statistics.

Asymptote

In analytic geometry, an asymptote of a curve is a line such that the distance between the curve and the line approaches zero as one or both of the x or y coordinates tends to infinity.

Bayes' theorem

In probability theory and statistics, Bayes' theorem (alternatively Bayes' law or Bayes' rule, also written as Bayes's theorem) describes the probability of an event, based on prior knowledge of conditions that might be related to the event.

Bulletin of the American Mathematical Society

The Bulletin of the American Mathematical Society is a quarterly mathematical journal published by the American Mathematical Society.

Clinical trial

Clinical trials are experiments or observations done in clinical research.

Condorcet criterion

The Condorcet candidate (Condorcet winner) is the person who would win a two-candidate election against each of the other candidates in a plurality vote.

Condorcet paradox

The Condorcet paradox (also known as voting paradox or the paradox of voting) in social choice theory is a situation noted by the Marquis de Condorcet in the late 18th century, in which collective preferences can be cyclic, even if the preferences of individual voters are not cyclic.

Dynamic routing

Dynamic routing, also called adaptive routing, is a process where a router can forward data via a different route for a given destination based on the current conditions of the communication circuits within a system.

Gambling

Gambling is the wagering of money or something of value (referred to as "the stakes") on an event with an uncertain outcome with the primary intent of winning money or material goods.

Germany

Germany (Deutschland), officially the Federal Republic of Germany (Bundesrepublik Deutschland), is a sovereign state in central-western Europe.

Gittins index

The Gittins index is a measure of the reward that can be achieved by a random process bearing a termination state and evolving from its present state onward, under the option of terminating the said process at every later stage with the accrual of the probabilistic expected reward from that stage up to the attainment of its termination state.

Greedy algorithm

A greedy algorithm is an algorithmic paradigm that follows the problem solving heuristic of making the locally optimal choice at each stage with the intent of finding a global optimum.

Herbert Robbins

Herbert Ellis Robbins (January 12, 1915 – February 12, 2001) was an American mathematician and statistician.

John C. Gittins

John Charles Gittins (born 1938) is a researcher in applied probability and operations research, who is a professor and Emeritus Fellow at Keble College, Oxford University.

Journal of the Royal Statistical Society

The Journal of the Royal Statistical Society is a peer-reviewed scientific journal of statistics.

Lecture Notes in Computer Science

Springer Lecture Notes in Computer Science (LNCS) is a series of computer science books published by Springer Science+Business Media (formerly Springer-Verlag) since 1973.

Markov decision process

Markov decision processes (MDPs) provide a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker.

Medical ethics

Medical ethics is a system of moral principles that apply values to the practice of clinical medicine and in scientific research.

Michael Katehakis

Michael N. Katehakis (Μιχαήλ Ν. Κατεχάκης; born 1952) is a Professor of Management Science at Rutgers University.

Nonparametric regression

Nonparametric regression is a category of regression analysis in which the predictor does not take a predetermined form but is constructed according to information derived from the data.

Open-source model

The open-source model is a decentralized software-development model that encourages open collaboration.

Operations Research (journal)

Operations Research is a bimonthly peer-reviewed academic journal covering operations research that is published by INFORMS.

Optimal stopping

In mathematics, the theory of optimal stopping or early stopping is concerned with the problem of choosing a time to take a particular action, in order to maximise an expected reward or minimise an expected cost.

Peter Whittle (mathematician)

Peter Whittle (born 27 February 1927, in Wellington, New Zealand) is a mathematician and statistician, working in the fields of stochastic nets, optimal control, time series analysis, stochastic optimisation and stochastic dynamics. From 1967 to 1994, he was the Churchill Professor of Mathematics for Operational Research at the University of Cambridge.

Pharmaceutical industry

The pharmaceutical industry (or medicine industry) is the commercial industry that discovers, develops, produces, and markets drugs or pharmaceutical drugs for use as different types of medicine and medications.

Portfolio (finance)

In finance, a portfolio is a collection of investments held by an investment company, hedge fund, financial institution or individual.

Prisoner's dilemma

The prisoner's dilemma is a standard example of a game analyzed in game theory that shows why two completely rational individuals might not cooperate, even if it appears that it is in their best interests to do so.

Probability distribution

In probability theory and statistics, a probability distribution is a mathematical function that provides the probabilities of occurrence of different possible outcomes in an experiment.

Probability theory

Probability theory is the branch of mathematics concerned with probability.

Regret (decision theory)

In decision theory, regret is the human emotional response often experienced when, after a fixed decision has been made under uncertainty, information about the best course of action arrives.

Reinforcement learning

Reinforcement learning (RL) is an area of machine learning inspired by behaviourist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.

Search theory

In microeconomics, search theory studies buyers or sellers who cannot instantly find a trading partner, and must therefore search for a partner prior to transacting.

SIAM Journal on Computing

The SIAM Journal on Computing is a scientific journal focusing on the mathematical and formal aspects of computer science.

Singular-value decomposition

In linear algebra, the singular-value decomposition (SVD) is a factorization of a real or complex matrix.

Slot machine

A slot machine (American English), known variously as a fruit machine (British English), puggy (Scottish English), the slots (Canadian and American English), poker machine/pokies (Australian English and New Zealand English), or simply slot (American English), is a casino gambling machine with three or more reels which spin when a button is pushed.

Softmax function

In mathematics, the softmax function, or normalized exponential function, is a generalization of the logistic function that "squashes" a $K$-dimensional vector $\mathbf{z}$ of arbitrary real values to a $K$-dimensional vector $\sigma(\mathbf{z})$ of real values, where each entry is in the range (0, 1] and all the entries add up to 1. The function is given by

$$\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}} \quad \text{for } j = 1, \ldots, K.$$

In probability theory, the output of the softmax function can be used to represent a categorical distribution, that is, a probability distribution over $K$ different possible outcomes. In fact, it is the gradient-log-normalizer of the categorical probability distribution. The softmax function is also the gradient of the LogSumExp function. The softmax function is used in various multiclass classification methods, such as multinomial logistic regression (also known as softmax regression), multiclass linear discriminant analysis, naive Bayes classifiers, and artificial neural networks. Specifically, in multinomial logistic regression and linear discriminant analysis, the input to the function is the result of $K$ distinct linear functions, and the predicted probability for the $j$th class given a sample vector $\mathbf{x}$ and a weighting vector $\mathbf{w}$ is:

$$P(y = j \mid \mathbf{x}) = \frac{e^{\mathbf{x}^\mathsf{T}\mathbf{w}_j}}{\sum_{k=1}^{K} e^{\mathbf{x}^\mathsf{T}\mathbf{w}_k}}$$

This can be seen as the composition of $K$ linear functions $\mathbf{x} \mapsto \mathbf{x}^\mathsf{T}\mathbf{w}_1, \ldots, \mathbf{x} \mapsto \mathbf{x}^\mathsf{T}\mathbf{w}_K$ and the softmax function (where $\mathbf{x}^\mathsf{T}\mathbf{w}$ denotes the inner product of $\mathbf{x}$ and $\mathbf{w}$). The operation is equivalent to applying a linear operator defined by $\mathbf{w}$ to vectors $\mathbf{x}$, thus transforming the original, possibly high-dimensional, input to vectors in a $K$-dimensional space $\mathbb{R}^K$.
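
As a rough illustration of the definition above, the following sketch computes the softmax of a list of real values. The function name `softmax` and the subtraction of the maximum entry before exponentiating are choices made here, not something the source specifies. The shift is a common way to avoid overflow and leaves the result unchanged, because the common factor cancels between numerator and denominator.

```python
import math


def softmax(z):
    """Return the softmax of a non-empty list of real numbers."""
    # Shift by the maximum so that the largest exponent is exp(0) = 1.
    # This avoids overflow for large inputs without changing the result.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    # Normalize so that the entries add up to 1.
    return [e / total for e in exps]
```

For example, `softmax([1.0, 2.0, 3.0])` gives approximately `[0.090, 0.245, 0.665]`: the entries are positive, preserve the order of the inputs, and sum to 1.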

Stochastic scheduling

Stochastic scheduling concerns scheduling problems involving random attributes, such as random processing times, random due dates, random weights, and stochastic machine breakdowns.

Thompson sampling

In artificial intelligence, Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem.
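
One common way to apply this heuristic is sketched below, under assumptions that the entry itself does not state: rewards are 0 or 1, each arm starts with a uniform Beta(1, 1) prior, and the player pulls the arm whose sampled success probability is highest. The function name `thompson_sampling` and the parameters `true_means`, `steps`, and `seed` are hypothetical.

```python
import random


def thompson_sampling(true_means, steps=2000, seed=0):
    """Beta-Bernoulli Thompson sampling on simulated 0/1 arms.

    true_means is a hypothetical list of per-arm success probabilities,
    used only to simulate rewards; the player never reads it directly.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k

    for _ in range(steps):
        # Sample a plausible success rate for each arm from its
        # Beta posterior (Beta(1, 1) prior plus observed counts).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(k)
        ]
        # Pull the arm whose sampled rate is highest.
        arm = max(range(k), key=lambda a: samples[a])

        # Observe a 0/1 reward and update that arm's counts.
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return successes, failures
```

Arms with few observations have wide posteriors and are still sampled occasionally, which provides exploration; arms with many successes are sampled high most of the time, which provides exploitation.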

Tikhonov regularization

Tikhonov regularization, named for Andrey Tikhonov, is the most commonly used method of regularization of ill-posed problems.

World War II

World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945, although conflicts reflecting the ideological clash between what would become the Allied and Axis blocs began earlier.

Redirects here:

Adversarial bandit, Bandit (machine learning), Bandit model, Bandit problem, Bandit process, Contextual bandit algorithm, E-greedy strategy, Epsilon-greedy strategy, K armed bandit, K-armed bandit, Multi armed bandit, Multi-armed bandit problem, Multi-armed bandits, Multiarmed bandit, Multi–armed bandit, N armed bandit, N-armed bandit, Two armed bandit, Two-armed bandit.


[1] https://en.wikipedia.org/wiki/Multi-armed_bandit
