
# Kullback–Leibler divergence

In mathematical statistics, the Kullback–Leibler divergence (also called relative entropy) is a measure of how one probability distribution diverges from a second, expected probability distribution. [1]
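
For discrete distributions P and Q, the divergence is D_KL(P ‖ Q) = Σᵢ P(i) log(P(i)/Q(i)). A minimal sketch in Python (function name and conventions are my own):

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) in nats for discrete
    distributions given as sequences of probabilities.

    Terms with p_i == 0 contribute nothing (0 * log 0 := 0); a positive
    p_i paired with q_i == 0 makes the divergence infinite.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total
```

Note that the divergence is zero only when the two distributions coincide, and it is not symmetric in its arguments.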

## Absolute continuity

In calculus, absolute continuity is a smoothness property of functions that is stronger than continuity and uniform continuity.

## Akaike information criterion

The Akaike information criterion (AIC) is an estimator of the relative quality of statistical models for a given set of data.
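
Given a fitted model's maximized log-likelihood ln(L̂) and its number of estimated parameters k, the criterion is AIC = 2k − 2 ln(L̂); lower is better. A one-function sketch (the name is my own):

```python
def aic(k: int, log_likelihood: float) -> float:
    """Akaike information criterion: AIC = 2k - 2*ln(L_hat), where k is
    the number of estimated parameters and log_likelihood is ln(L_hat)."""
    return 2 * k - 2 * log_likelihood
```

Adding a parameter raises AIC unless it buys at least one unit of log-likelihood.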

## Alfréd Rényi

Alfréd Rényi (20 March 1921 – 1 February 1970) was a Hungarian mathematician who made contributions in combinatorics, graph theory, number theory but mostly in probability theory.

## Almost everywhere

In measure theory (a branch of mathematical analysis), a property holds almost everywhere if, in a technical sense, the set for which the property holds takes up nearly all possibilities.

## Annals of Mathematical Statistics

The Annals of Mathematical Statistics was a peer-reviewed statistics journal published by the Institute of Mathematical Statistics from 1930 to 1972.

## Bayes' theorem

In probability theory and statistics, Bayes’ theorem (alternatively Bayes’ law or Bayes' rule, also written as Bayes’s theorem) describes the probability of an event, based on prior knowledge of conditions that might be related to the event.
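
For a binary hypothesis H and evidence E, the theorem reads P(H|E) = P(E|H)·P(H)/P(E), with P(E) expanded by the law of total probability. A hedged sketch (the diagnostic-test numbers below are illustrative only):

```python
def bayes_posterior(prior, p_e_given_h, p_e_given_not_h):
    """Posterior P(H|E) = P(E|H)P(H) / [P(E|H)P(H) + P(E|~H)P(~H)]."""
    evidence = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / evidence

# Illustrative: 1% base rate, 99% sensitivity, 5% false-positive rate.
posterior = bayes_posterior(0.01, 0.99, 0.05)
```

Even with a sensitive test, a rare condition yields a modest posterior, which is the classic base-rate lesson.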

## Bayesian experimental design

Bayesian experimental design provides a general probability-theoretical framework from which other theories on experimental design can be derived.

## Bayesian inference

Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence or information becomes available.

## Bayesian information criterion

In statistics, the Bayesian information criterion (BIC) or Schwarz criterion (also SBC, SBIC) is a criterion for model selection among a finite set of models; the model with the lowest BIC is preferred.
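
The criterion is BIC = k ln(n) − 2 ln(L̂), so its per-parameter penalty grows with the sample size n, unlike AIC's constant penalty of 2. A minimal sketch:

```python
import math

def bic(k: int, n: int, log_likelihood: float) -> float:
    """Bayesian information criterion: BIC = k*ln(n) - 2*ln(L_hat), with
    k parameters, n observations, and maximized log-likelihood ln(L_hat)."""
    return k * math.log(n) - 2 * log_likelihood
```

For n > e² ≈ 7.39 the BIC penalty exceeds AIC's, so BIC tends to select smaller models on non-trivial samples.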

## Bayesian statistics

Bayesian statistics, named for Thomas Bayes (1701–1761), is a theory in the field of statistics in which the evidence about the true state of the world is expressed in terms of degrees of belief known as Bayesian probabilities.

## Bit

The bit (a portmanteau of binary digit) is a basic unit of information used in computing and digital communications.

## Bregman divergence

In mathematics, a Bregman divergence or Bregman distance is similar to a metric, but satisfies neither the triangle inequality nor symmetry.

## Chapman & Hall

Chapman & Hall was a British publishing house in London, founded in the first half of the 19th century by Edward Chapman and William Hall.

## Chi-squared test

A chi-squared test, also written as χ² test, is any statistical hypothesis test where the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true.

## Coding theory

Coding theory is the study of the properties of codes and their respective fitness for specific applications.

## Conditional entropy

In information theory, the conditional entropy (or equivocation) quantifies the amount of information needed to describe the outcome of a random variable Y given that the value of another random variable X is known.
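
For discrete variables, H(Y|X) = −Σ p(x,y) log₂[p(x,y)/p(x)]. A small sketch over a joint probability table (the table layout is my own convention):

```python
import math

def conditional_entropy(joint):
    """H(Y|X) in bits from a 2-D list joint[i][j] = P(X=i, Y=j), using
    H(Y|X) = -sum p(x,y) * log2(p(x,y) / p(x))."""
    px = [sum(row) for row in joint]  # marginal of X
    h = 0.0
    for i, row in enumerate(joint):
        for pxy in row:
            if pxy > 0:
                h -= pxy * math.log2(pxy / px[i])
    return h
```

When Y is a deterministic function of X the conditional entropy is zero; when they are independent it equals H(Y).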

## Conference on Neural Information Processing Systems

The Conference and Workshop on Neural Information Processing Systems (NIPS) is a machine learning and computational neuroscience conference held every December.

## Covariance matrix

In probability theory and statistics, a covariance matrix (also known as dispersion matrix or variance–covariance matrix) is a matrix whose element in the i, j position is the covariance between the i-th and j-th elements of a random vector.

## Cross entropy

In information theory, the cross entropy between two probability distributions p and q over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set, if a coding scheme is used that is optimized for an "unnatural" probability distribution q, rather than the "true" distribution p. The cross entropy for the distributions p and q over a given set is defined as H(p, q) = H(p) + D_KL(p ‖ q), where H(p) is the entropy of p, and D_KL(p ‖ q) is the Kullback–Leibler divergence of q from p (also known as the relative entropy of p with respect to q; note the reversal of emphasis).
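
Numerically, H(p, q) = −Σᵢ pᵢ log₂ qᵢ, and the decomposition H(p, q) = H(p) + D_KL(p ‖ q) can be checked directly. A sketch using base-2 logs, so units are bits:

```python
import math

def entropy(p):
    """Shannon entropy H(p) in bits; 0 * log(0) is taken as 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross entropy H(p, q) = -sum_i p_i * log2(q_i), assuming q_i > 0
    wherever p_i > 0."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)
```

Since the KL term is non-negative, H(p, q) ≥ H(p), with equality exactly when q = p.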

## Data compression

In signal processing, data compression, source coding, or bit-rate reduction involves encoding information using fewer bits than the original representation.

## Data differencing

In computer science and information theory, data differencing or differential compression is producing a technical description of the difference between two sets of data – a source and a target.

## Density matrix

A density matrix is a matrix that describes a quantum system in a mixed state, a statistical ensemble of several quantum states.

## Deviance information criterion

The deviance information criterion (DIC) is a hierarchical modeling generalization of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC).

## Differential entropy

Differential entropy (also referred to as continuous entropy) is a concept in information theory that began as an attempt by Shannon to extend the idea of (Shannon) entropy, a measure of average surprisal of a random variable, to continuous probability distributions.

## Dimensional analysis

In engineering and science, dimensional analysis is the analysis of the relationships between different physical quantities by identifying their base quantities (such as length, mass, time, and electric charge) and units of measure (such as miles vs. kilometers, or pounds vs. kilograms) and tracking these dimensions as calculations or comparisons are performed.

## Divergence

In vector calculus, divergence is a vector operator that produces a scalar field, giving the quantity of a vector field's source at each point.

## Divergence (statistics)

In statistics and information geometry, divergence or a contrast function is a function which establishes the "distance" of one probability distribution to the other on a statistical manifold.

## Dover Publications

Dover Publications, also known as Dover Books, is an American book publisher founded in 1941 by Hayward Cirker and his wife, Blanche.

## E (mathematical constant)

The number e is a mathematical constant, approximately equal to 2.71828, which appears in many different settings throughout mathematics.

## Earth mover's distance

In statistics, the earth mover's distance (EMD) is a measure of the distance between two probability distributions over a region D.

## Edwin Thompson Jaynes

Edwin Thompson Jaynes (July 5, 1922 – April 30, 1998) was the Wayman Crow Distinguished Professor of Physics at Washington University in St. Louis.

## Einstein notation

In mathematics, especially in applications of linear algebra to physics, the Einstein notation or Einstein summation convention is a notational convention that implies summation over a set of indexed terms in a formula, thus achieving notational brevity.

## Entropic value at risk

In financial mathematics and stochastic optimization, the concept of risk measure is used to quantify the risk involved in a random outcome or risk position.

## Entropy

In statistical mechanics, entropy is an extensive property of a thermodynamic system.

## Entropy (information theory)

Information entropy is the average rate at which information is produced by a stochastic source of data.
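
For a discrete source with symbol probabilities pᵢ, the entropy in bits per symbol is H = −Σᵢ pᵢ log₂ pᵢ. A minimal sketch:

```python
import math

def shannon_entropy(p):
    """Entropy in bits of a discrete distribution; 0*log(0) is taken as 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)
```

A certain outcome carries no information, a fair coin carries one bit, and a uniform distribution over 2ⁿ outcomes carries n bits.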

## Entropy in thermodynamics and information theory

There are close parallels between the mathematical expressions for the thermodynamic entropy, usually denoted by S, of a physical system in the statistical thermodynamics established by Ludwig Boltzmann and J. Willard Gibbs in the 1870s, and the information-theoretic entropy, usually expressed as H, of Claude Shannon and Ralph Hartley developed in the 1940s.

## Entropy maximization

An entropy maximization problem is a convex optimization problem of the form: maximize −∑ᵢ xᵢ log xᵢ subject to Ax = b and 1ᵀx = 1, where x ∈ ℝⁿ₊ is the optimization variable, A ∈ ℝ^(m×n) and b ∈ ℝᵐ are problem parameters, and 1 denotes a vector whose components are all 1.

## Entropy power inequality

In information theory, the entropy power inequality is a result that relates to so-called "entropy power" of random variables.

## Exergy

In thermodynamics, the exergy (in older usage, available work or availability) of a system is the maximum useful work possible during a process that brings the system into equilibrium with a heat reservoir.

## Expected value

In probability theory, the expected value of a random variable, intuitively, is the long-run average value of repetitions of the experiment it represents.

## F-divergence

In probability theory, an f-divergence is a function D_f(P ‖ Q) that measures the difference between two probability distributions P and Q. It helps the intuition to think of the divergence as an average, weighted by the function f, of the odds ratio given by P and Q. These divergences were introduced and studied independently by Csiszár, Morimoto, and Ali & Silvey, and are sometimes known as Csiszár f-divergences, Csiszár–Morimoto divergences, or Ali–Silvey distances.

## Fisher information metric

In information geometry, the Fisher information metric is a particular Riemannian metric which can be defined on a smooth statistical manifold, i.e., a smooth manifold whose points are probability measures defined on a common probability space.

## Fluid mechanics

Fluid mechanics is a branch of physics concerned with the mechanics of fluids (liquids, gases, and plasmas) and the forces on them.

## Gibbs free energy

In thermodynamics, the Gibbs free energy (IUPAC recommended name: Gibbs energy or Gibbs function; also known as free enthalpy to distinguish it from Helmholtz free energy) is a thermodynamic potential that can be used to calculate the maximum of reversible work that may be performed by a thermodynamic system at a constant temperature and pressure (isothermal, isobaric).

## Gibbs' inequality

In information theory, Gibbs' inequality is a statement about the mathematical entropy of a discrete probability distribution.

## Hellinger distance

In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions.

## Helmholtz free energy

In thermodynamics, the Helmholtz free energy is a thermodynamic potential that measures the useful work obtainable from a closed thermodynamic system at a constant temperature and volume.

## Hessian matrix

In mathematics, the Hessian matrix or Hessian is a square matrix of second-order partial derivatives of a scalar-valued function, or scalar field.

## Huffman coding

In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression.
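
A minimal sketch of the construction: repeatedly merge the two lowest-frequency subtrees; each merge pushes every symbol inside one level deeper. This version returns only the codeword lengths, with ties broken arbitrarily, and leaves actual bit assignment out:

```python
import heapq
from itertools import count

def huffman_code_lengths(freqs):
    """Return {symbol: codeword length} for a Huffman code over a
    symbol -> frequency mapping."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    tiebreak = count()  # keeps heap comparisons away from the dicts
    heap = [(w, next(tiebreak), {sym: 0}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol in them.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]
```

The resulting lengths always satisfy the Kraft inequality with equality, since a Huffman tree is a full binary tree.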

## I. J. Good

Irving John ("I. J."; "Jack") Good (9 December 1916 – 5 April 2009) was a British mathematician who worked as a cryptologist at Bletchley Park with Alan Turing (obituary: The Times, 16 April 2009, http://www.timesonline.co.uk/tol/comment/obituaries/article6100314.ece).

## Inference

Inferences are steps in reasoning, moving from premises to logical consequences.

## Information gain in decision trees

In information theory and machine learning, information gain is a synonym for Kullback–Leibler divergence.

## Information gain ratio

In decision tree learning, Information gain ratio is a ratio of information gain to the intrinsic information.

## Information projection

In information theory, the information projection or I-projection of a probability distribution q onto a set of distributions P is p* = arg min_{p ∈ P} D_KL(p ‖ q), where D_KL is the Kullback–Leibler divergence from q to p. Viewing the Kullback–Leibler divergence as a measure of distance, the I-projection p* is the "closest" distribution to q of all the distributions in P. The I-projection is useful in setting up information geometry, notably because of the following inequality, valid when P is convex: D_KL(p ‖ q) ≥ D_KL(p ‖ p*) + D_KL(p* ‖ q). This inequality can be interpreted as an information-geometric version of the Pythagorean theorem, where KL divergence is viewed as squared distance in a Euclidean space.

## Information theory and measure theory

This article discusses how information theory (a branch of mathematics studying the transmission, processing and storage of information) is related to measure theory (a branch of mathematics related to integration and probability).

## International Journal of Computer Vision

The International Journal of Computer Vision (IJCV) is a journal published by Springer.

## Jensen–Shannon divergence

In probability theory and statistics, the Jensen–Shannon divergence is a method of measuring the similarity between two probability distributions.
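
It is defined via the mixture m = ½(p + q) as JSD(p ‖ q) = ½ D_KL(p ‖ m) + ½ D_KL(q ‖ m); with base-2 logs it is symmetric and bounded by 1 bit. A sketch:

```python
import math

def jensen_shannon_divergence(p, q):
    """JSD(p || q) in bits: the average KL divergence of p and q from
    their mixture m = (p + q) / 2."""
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Unlike the KL divergence itself, the JSD is always finite, because the mixture m is nonzero wherever either p or q is.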

## John Wiley & Sons

John Wiley & Sons, Inc., also referred to as Wiley, is a global publishing company that specializes in academic publishing.

## Joint probability distribution

Given random variables X, Y, ..., that are defined on a probability space, the joint probability distribution for X, Y, ... is a probability distribution that gives the probability that each of X, Y, ... falls in any particular range or discrete set of values specified for that variable.

## Josiah Willard Gibbs

Josiah Willard Gibbs (February 11, 1839 – April 28, 1903) was an American scientist who made important theoretical contributions to physics, chemistry, and mathematics.

## Kolmogorov–Smirnov test

In statistics, the Kolmogorov–Smirnov test (K–S test or KS test) is a nonparametric test of the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K–S test), or to compare two samples (two-sample K–S test).

## Kraft–McMillan inequality

In coding theory, the Kraft–McMillan inequality gives a necessary and sufficient condition for the existence of a prefix code (in Leon G. Kraft's version) or a uniquely decodable code (in Brockway McMillan's version) for a given set of codeword lengths.
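
The condition itself is Σᵢ 2^(−lᵢ) ≤ 1 for binary codeword lengths lᵢ. A one-function sketch:

```python
def kraft_sum(lengths):
    """Kraft sum sum_i 2^(-l_i) for binary codeword lengths; a prefix
    code with these lengths exists iff the sum is <= 1."""
    return sum(2.0 ** -l for l in lengths)
```

Lengths {1, 2, 3, 3} sum to exactly 1 (a complete code), while {1, 1, 2} exceed 1 and admit no prefix code.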

## Kronecker delta

In mathematics, the Kronecker delta (named after Leopold Kronecker) is a function of two variables, usually just non-negative integers.

## Large deviations theory

In probability theory, the theory of large deviations concerns the asymptotic behaviour of remote tails of sequences of probability distributions.

## List of weight-of-evidence articles

Weight of evidence is a measure of evidence on one side of an issue as compared with the evidence on the other side of the issue, or to measure the evidence on multiple issues.

## Logarithm

In mathematics, the logarithm is the inverse function to exponentiation.

## Logit

The logit function is the inverse of the sigmoidal "logistic" function or logistic transform used in mathematics, especially in statistics.

## Loss function

In mathematical optimization, statistics, econometrics, decision theory, machine learning and computational neuroscience, a loss function or cost function is a function that maps an event or values of one or more variables onto a real number intuitively representing some "cost" associated with the event.

## Machine learning

Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to "learn" (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.

## Marginal distribution

In probability theory and statistics, the marginal distribution of a subset of a collection of random variables is the probability distribution of the variables contained in the subset.

## Matching distance

In mathematics, the matching distance is a metric on the space of size functions (Michele d'Amico, Patrizio Frosini, Claudia Landi, "Using matching distance in Size Theory: a survey", International Journal of Imaging Systems and Technology, 16(5):154–161, 2006).

## Mathematical statistics

Mathematical statistics is the application of mathematics to statistics, as opposed to techniques for collecting statistical data.

## Maximum likelihood estimation

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of a statistical model, given observations.
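
For a Bernoulli model the likelihood is maximized in closed form at the sample mean, a standard textbook case. A tiny sketch (the function name is my own):

```python
def bernoulli_mle(observations):
    """MLE of the success probability p for i.i.d. Bernoulli draws (0/1):
    the maximizer of prod p^x * (1-p)^(1-x) is the sample mean."""
    return sum(observations) / len(observations)
```

More complex models rarely admit such closed forms and are fitted by numerically maximizing the log-likelihood instead.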

## Maximum spacing estimation

In statistics, maximum spacing estimation (MSE or MSP), or maximum product of spacing estimation (MPS), is a method for estimating the parameters of a univariate statistical model.

## Measure (mathematics)

In mathematical analysis, a measure on a set is a systematic way to assign a number to each suitable subset of that set, intuitively interpreted as its size.

## Metric (mathematics)

In mathematics, a metric or distance function is a function that defines a distance between each pair of elements of a set.

## Metric space

In mathematics, a metric space is a set for which distances between all members of the set are defined.

## Metric tensor

In the mathematical field of differential geometry, a metric tensor is a type of function which takes as input a pair of tangent vectors v and w at a point of a surface (or higher-dimensional differentiable manifold) and produces a real number scalar g(v, w) in a way that generalizes many of the familiar properties of the dot product of vectors in Euclidean space.

## Multivariate normal distribution

In probability theory and statistics, the multivariate normal distribution or multivariate Gaussian distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions.

## Mutual information

In probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables.
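
For discrete variables, I(X; Y) = Σ p(x,y) log₂[p(x,y)/(p(x)p(y))], which is also the KL divergence of the joint distribution from the product of its marginals. A sketch over a 2-D joint table:

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits from a 2-D list joint[i][j] = P(X=i, Y=j)."""
    px = [sum(row) for row in joint]          # marginal of X
    py = [sum(col) for col in zip(*joint)]    # marginal of Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi
```

Independence gives zero mutual information; two perfectly correlated fair bits share exactly one bit.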

## Nat (unit)

The natural unit of information (symbol: nat), sometimes also nit or nepit, is a unit of information or entropy, based on natural logarithms and powers of e, rather than the powers of 2 and base-2 logarithms, which define the bit.

## Neuroscience

Neuroscience (or neurobiology) is the scientific study of the nervous system.

## Numerical Recipes

Numerical Recipes is the generic title of a series of books on algorithms and numerical analysis by William H. Press, Saul A. Teukolsky, William T. Vetterling and Brian P. Flannery.

## Partition function (mathematics)

The partition function or configuration integral, as used in probability theory, information theory and dynamical systems, is a generalization of the definition of a partition function in statistical mechanics.

## Patch (computing)

A patch is a set of changes to a computer program or its supporting data designed to update, fix, or improve it.

## Pierre-Simon Laplace

Pierre-Simon, marquis de Laplace (23 March 1749 – 5 March 1827) was a French scholar whose work was important to the development of mathematics, statistics, physics and astronomy.

## Pinsker's inequality

In information theory, Pinsker's inequality, named after its inventor Mark Semenovich Pinsker, is an inequality that bounds the total variation distance (or statistical distance) in terms of the Kullback–Leibler divergence.
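
Concretely, the bound is δ(P, Q) ≤ √(D_KL(P ‖ Q)/2), with the divergence measured in nats. A quick numerical check (a sketch, not a proof):

```python
import math

def total_variation(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def kl_nats(p, q):
    """D_KL(p || q) in nats, assuming q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def pinsker_holds(p, q, tol=1e-12):
    """Check Pinsker: TV(p, q) <= sqrt(D_KL(p || q) / 2)."""
    return total_variation(p, q) <= math.sqrt(kl_nats(p, q) / 2) + tol
```

The inequality is useful because it lets KL-divergence bounds, which are often easy to compute, control the total variation distance.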

## Positive-definite matrix

In linear algebra, a symmetric real matrix M is said to be positive definite if the scalar zᵀMz is strictly positive for every non-zero column vector z of n real numbers.

## Posterior probability

In Bayesian statistics, the posterior probability of a random event or an uncertain proposition is the conditional probability that is assigned after the relevant evidence or background is taken into account.

## Principle of indifference

The principle of indifference (also called principle of insufficient reason) is a rule for assigning epistemic probabilities.

## Principle of maximum entropy

The principle of maximum entropy states that the probability distribution which best represents the current state of knowledge is the one with largest entropy, in the context of precisely stated prior data (such as a proposition that expresses testable information).

## Prior probability

In Bayesian statistical inference, a prior probability distribution, often simply called the prior, of an uncertain quantity is the probability distribution that would express one's beliefs about this quantity before some evidence is taken into account.

## Probability density function

In probability theory, a probability density function (PDF), or density of a continuous random variable, is a function, whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) can be interpreted as providing a relative likelihood that the value of the random variable would equal that sample.

## Probability distribution

In probability theory and statistics, a probability distribution is a mathematical function that provides the probabilities of occurrence of different possible outcomes in an experiment.

## Probability space

In probability theory, a probability space or a probability triple (Ω, ℱ, P) is a mathematical construct that models a real-world process (or "experiment") consisting of states that occur randomly.

## Proceedings of the Royal Society

Proceedings of the Royal Society is the parent title of two scientific journals published by the Royal Society.

## Quantum entanglement

Quantum entanglement is a physical phenomenon which occurs when pairs or groups of particles are generated, interact, or share spatial proximity in ways such that the quantum state of each particle cannot be described independently of the state of the other(s), even when the particles are separated by a large distance—instead, a quantum state must be described for the system as a whole.

## Quantum information science

Quantum information science is an area of study based on the idea that information science depends on quantum effects in physics.

## Quantum relative entropy

In quantum information theory, quantum relative entropy is a measure of distinguishability between two quantum states.

## Radon–Nikodym theorem

In mathematics, the Radon–Nikodym theorem is a result in measure theory.

## Rate function

In mathematics, specifically in large deviations theory, a rate function is a function used to quantify the probabilities of rare events.

## Rényi entropy

In information theory, the Rényi entropy generalizes the Hartley entropy, the Shannon entropy, the collision entropy and the min entropy.

## Richard Leibler

Richard A. Leibler (March 18, 1914, Chicago – October 25, 2003, Reston, Virginia) was an American mathematician and cryptanalyst.

## Riemann hypothesis

In mathematics, the Riemann hypothesis is a conjecture that the Riemann zeta function has its zeros only at the negative even integers and complex numbers with real part 1/2.

## Riemannian manifold

In differential geometry, a (smooth) Riemannian manifold or (smooth) Riemannian space (M,g) is a real, smooth manifold M equipped with an inner product g_p on the tangent space T_pM at each point p that varies smoothly from point to point in the sense that if X and Y are differentiable vector fields on M, then p \mapsto g_p(X(p),Y(p)) is a smooth function.

## Self-information

In information theory, self-information or surprisal is a measure of the information content, or surprise, associated with a particular outcome of a random variable.

## Sergio Verdú

Sergio Verdú (born Barcelona, Spain, August 15, 1958) is the Eugene Higgins Professor of Electrical Engineering at Princeton University, where he teaches and conducts research on Information Theory in the Information Sciences and Systems Group.

## Solomon Kullback

Solomon Kullback (April 3, 1907 – August 5, 1994) was an American cryptanalyst and mathematician, who was one of the first three employees hired by William F. Friedman at the US Army's Signal Intelligence Service (SIS) in the 1930s, along with Frank Rowlett and Abraham Sinkov.

## Statistical classification

In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.

## Statistical distance

In statistics, probability theory, and information theory, a statistical distance quantifies the distance between two statistical objects, which can be two random variables, or two probability distributions or samples, or the distance can be between an individual sample point and a population or a wider sample of points.

## Statistical model

A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of some sample data and similar data from a larger population.

## Statistical Science

Statistical Science is a review journal published by the Institute of Mathematical Statistics.

## Symmetry

Symmetry (from Greek συμμετρία symmetria "agreement in dimensions, due proportion, arrangement") in everyday language refers to a sense of harmonious and beautiful proportion and balance.

## Taylor series

In mathematics, a Taylor series is a representation of a function as an infinite sum of terms that are calculated from the values of the function's derivatives at a single point.

## The American Statistician

The American Statistician is a quarterly peer-reviewed scientific journal covering statistics published by Taylor & Francis on behalf of the American Statistical Association.

## Time series

A time series is a series of data points indexed (or listed or graphed) in time order.

## Total variation

In mathematics, the total variation identifies several slightly different concepts, related to the (local or global) structure of the codomain of a function or a measure.

## Total variation distance of probability measures

In probability theory, the total variation distance is a distance measure for probability distributions.

## Triangle inequality

In mathematics, the triangle inequality states that for any triangle, the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side.

## Utility

Within economics the concept of utility is used to model worth or value, but its usage has evolved significantly over time.

## Variation of information

In probability theory and information theory, the variation of information or shared information distance is a measure of the distance between two clusterings (partitions of elements).

## Vector calculus

Vector calculus, or vector analysis, is a branch of mathematics concerned with differentiation and integration of vector fields, primarily in 3-dimensional Euclidean space ℝ³.

## Work (thermodynamics)

In thermodynamics, work performed by a system is the energy transferred by the system to its surroundings, that is fully accounted for solely by macroscopic forces exerted on the system by factors external to it, that is to say, factors in its surroundings.
