# Q-learning

Q-learning is a model-free reinforcement learning technique that learns the value of taking each action in each state.
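As a rough illustration, the core of tabular Q-learning is a single update rule. The sketch below assumes a dictionary-backed table and hypothetical state and action names (`"s0"`, `"s1"`, `"left"`, `"right"`). The learning rate `alpha` and discount factor `gamma` are illustrative values, not recommended settings.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Bootstrap from the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha towards the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action problem; unseen entries default to 0.0.
q = defaultdict(float)
q[("s1", "right")] = 2.0
new_value = q_learning_update(q, "s0", "right", 1.0, "s1", ["left", "right"])
```

Here the target is `1.0 + 0.9 * 2.0`, and the estimate for `("s0", "right")` moves halfway from 0 towards it.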

## Angular velocity

In physics, the angular velocity of a particle is the rate at which it rotates around a chosen center point: that is, the time rate of change of its angular displacement relative to the origin.

## Artificial neural network

Artificial neural networks (ANNs) or connectionist systems are computing systems vaguely inspired by the biological neural networks that constitute animal brains.

## Atari 2600

The Atari 2600 (or Atari Video Computer System before November 1982) is a home video game console from Atari, Inc. Released on September 11, 1977, it is credited with popularizing the use of microprocessor-based hardware and games contained on ROM cartridges, a format first used with the Fairchild Channel F in 1976.

## Backpropagation

Backpropagation is a method used in artificial neural networks to compute the gradient that is needed to update the weights of the network.
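To make the idea concrete, here is a minimal sketch for the smallest possible case: a single sigmoid neuron with a squared-error loss. The choice of activation, loss, and the numbers used are assumptions made for illustration. A real network repeats the same chain-rule step layer by layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    """Squared error of a single sigmoid neuron on one example."""
    return 0.5 * (sigmoid(w * x + b) - y) ** 2


def gradients(w, b, x, y):
    """Backward pass: apply the chain rule from the loss back to w and b."""
    a = sigmoid(w * x + b)      # forward pass
    dloss_da = a - y            # derivative of 0.5 * (a - y)^2 w.r.t. a
    da_dz = a * (1.0 - a)       # derivative of the sigmoid
    dloss_dz = dloss_da * da_dz
    return dloss_dz * x, dloss_dz   # dz/dw = x, dz/db = 1


# Illustrative values only.
w, b, x, y = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b = gradients(w, b, x, y)

# Sanity check against a central finite difference.
eps = 1e-6
numeric_w = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
numeric_b = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
```

Comparing the analytic gradient with a finite-difference estimate is a common way to check a hand-written backward pass.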

## Convolution

In mathematics (and, in particular, functional analysis), convolution is an operation on two functions (f and g) that produces a third function, typically viewed as a modified version of one of the originals. It gives the integral of the pointwise product of the two functions as a function of the amount by which one of them is translated.
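The definition above is for continuous functions. A discrete analogue, where the integral becomes a sum over finite sequences, is easier to show in code. The sketch below assumes that discrete form, and the input sequences are arbitrary.

```python
def convolve(f, g):
    """Full discrete convolution of two finite sequences.

    out[n] is the sum of f[i] * g[n - i] over all valid i.
    """
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            # The product f[i] * g[j] contributes to output index i + j.
            out[i + j] += fi * gj
    return out


result = convolve([1, 2, 3], [0, 1, 0.5])
# result == [0, 1, 2.5, 4, 1.5]
```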

## Convolutional neural network

In machine learning, a convolutional neural network (CNN, or ConvNet) is a class of deep, feed-forward artificial neural networks, most commonly applied to analyzing visual imagery.

## Deep learning

Deep learning (also known as deep structured learning or hierarchical learning) is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms.

## DeepMind

DeepMind Technologies Limited is a British artificial intelligence company founded in September 2010.

## Deterministic system

In mathematics, computer science and physics, a deterministic system is a system in which no randomness is involved in the development of future states of the system.

## Expected value

In probability theory, the expected value of a random variable, intuitively, is the long-run average value of repetitions of the experiment it represents.
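For a discrete random variable, the expected value is the probability-weighted sum of its outcomes. A small sketch, using a fair six-sided die as an assumed example and exact fractions to avoid rounding:

```python
from fractions import Fraction


def expected_value(distribution):
    """Return the probability-weighted sum of the outcomes.

    `distribution` maps each numeric outcome to its probability.
    """
    return sum(outcome * prob for outcome, prob in distribution.items())


# Fair die: each face 1..6 has probability 1/6.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
mean_roll = expected_value(fair_die)
# mean_roll == Fraction(7, 2), i.e. 3.5
```

No single roll ever yields 3.5. The value describes the long-run average over many rolls.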

## Function approximation

In general, a function approximation problem asks us to select a function among a well-defined class that closely matches ("approximates") a target function in a task-specific way.

## Game theory

Game theory is "the study of mathematical models of conflict and cooperation between intelligent rational decision-makers".

## Intelligent agent

In artificial intelligence, an intelligent agent (IA) is an autonomous entity which observes through sensors and acts upon an environment using actuators (i.e. it is an agent) and directs its activity towards achieving goals (i.e. it is "rational", as defined in economics).

## Machine learning

Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to "learn" (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.

## Markov decision process

Markov decision processes (MDPs) provide a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker.
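One possible way to write down a small MDP is as a nested mapping from states and actions to weighted outcomes. The states (`"cool"`, `"hot"`), actions (`"slow"`, `"fast"`), probabilities, and rewards below are all hypothetical, chosen only to show the "partly random, partly controlled" structure.

```python
# transitions[state][action] -> list of (probability, next_state, reward)
transitions = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "hot", 2.0)],
    },
    "hot": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(1.0, "hot", -10.0)],
    },
}


def expected_reward(state, action):
    """Expected immediate reward of taking `action` in `state`."""
    return sum(p * r for p, _, r in transitions[state][action])


def is_valid(mdp):
    """Check that outcome probabilities sum to 1 for every state-action pair."""
    return all(
        abs(sum(p for p, _, _ in outcomes) - 1.0) < 1e-9
        for actions in mdp.values()
        for outcomes in actions.values()
    )


reward_fast = expected_reward("cool", "fast")
```

The decision maker chooses the action, and the probabilities then determine which next state and reward actually occur.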

## Peter Norvig

Peter Norvig (born December 14, 1956) is an American computer scientist.

## Prentice Hall

Prentice Hall is a major educational publisher owned by Pearson plc.

## Probably approximately correct learning

In computational learning theory, probably approximately correct learning (PAC learning) is a framework for mathematical analysis of machine learning.

## Pseudocode

Pseudocode is an informal high-level description of the operating principle of a computer program or other algorithm.

## Reinforcement learning

Reinforcement learning (RL) is an area of machine learning inspired by behaviourist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.

## State–action–reward–state–action

State–action–reward–state–action (Sarsa) is an algorithm for learning a Markov decision process policy, used in the reinforcement learning area of machine learning.
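The name spells out the quantities used in each update: the current state and action, the reward, and the next state and action. A minimal tabular sketch follows, with hypothetical state and action names and illustrative values for `alpha` and `gamma`.

```python
def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.5, gamma=0.9):
    """Apply one tabular Sarsa update in place and return the new value."""
    # Bootstrap from the action the policy actually takes next,
    # not from the best available action.
    td_target = reward + gamma * q.get((next_state, next_action), 0.0)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (td_target - old)
    return q[(state, action)]


# Hypothetical table; missing entries are treated as 0.0.
q = {("s1", "left"): 1.0, ("s1", "right"): 2.0}
value = sarsa_update(q, "s0", "right", 1.0, "s1", "left")
```

Because the policy chose `"left"` in `"s1"`, the target uses that entry (1.0) even though `"right"` has the higher estimate.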

## Stochastic process

In probability theory and related fields, a stochastic or random process is a mathematical object usually defined as a collection of random variables.

## Stuart J. Russell

Stuart Jonathan Russell (born 1962) is a computer scientist known for his contributions to artificial intelligence.

## Temporal difference learning

Temporal difference (TD) learning refers to a class of model-free reinforcement learning methods which learn by bootstrapping from the current estimate of the value function.
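Bootstrapping here means that the estimate for one state is updated from the current estimate for the next state, rather than from a complete observed return. The sketch below shows the simplest variant, usually called TD(0), for state values. The state names, `alpha`, and `gamma` are assumptions made for illustration.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=1.0):
    """Apply one TD(0) update to a state-value table and return the new value."""
    # The target bootstraps from the current estimate of the next state.
    target = reward + gamma * values.get(next_state, 0.0)
    current = values.get(state, 0.0)
    values[state] = current + alpha * (target - current)
    return values[state]


# Hypothetical two-state table.
values = {"A": 0.5, "B": 1.0}
updated = td0_update(values, "A", 0.0, "B")
```

The estimate for `"A"` moves a tenth of the way from 0.5 towards the target of 1.0.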

