123 relations: Absolute continuity, Akaike information criterion, Alfréd Rényi, Almost everywhere, Annals of Mathematical Statistics, Bayes' theorem, Bayesian experimental design, Bayesian inference, Bayesian information criterion, Bayesian statistics, Bit, Bregman divergence, Chapman & Hall, Chi-squared test, Coding theory, Conditional entropy, Conference on Neural Information Processing Systems, Covariance matrix, Cross entropy, Data compression, Data differencing, Density matrix, Deviance information criterion, Differential entropy, Dimensional analysis, Divergence, Divergence (statistics), Dover Publications, E (mathematical constant), Earth mover's distance, Edwin Thompson Jaynes, Einstein notation, Entropic value at risk, Entropy, Entropy (information theory), Entropy in thermodynamics and information theory, Entropy maximization, Entropy power inequality, Exergy, Expected value, F-divergence, Fisher information metric, Fluid mechanics, Gibbs free energy, Gibbs' inequality, Hellinger distance, Helmholtz free energy, Hessian matrix, Huffman coding, I. J. Good, ..., Inference, Information gain in decision trees, Information gain ratio, Information projection, Information theory and measure theory, International Journal of Computer Vision, Jensen–Shannon divergence, John Wiley & Sons, Joint probability distribution, Josiah Willard Gibbs, Kolmogorov–Smirnov test, Kraft–McMillan inequality, Kronecker delta, Large deviations theory, List of weight-of-evidence articles, Logarithm, Logit, Loss function, Machine learning, Marginal distribution, Matching distance, Mathematical statistics, Maximum likelihood estimation, Maximum spacing estimation, Measure (mathematics), Metric (mathematics), Metric space, Metric tensor, Multivariate normal distribution, Mutual information, Nat (unit), Neuroscience, Numerical Recipes, Partition function (mathematics), Patch (computing), Pierre-Simon Laplace, Pinsker's inequality, Positive-definite matrix, Posterior probability, Principle of indifference, Principle of maximum entropy, Prior probability, Probability density function, Probability distribution, Probability space, Proceedings of the Royal Society, Quantum entanglement, Quantum information science, Quantum relative entropy, Radon–Nikodym theorem, Rate function, Rényi entropy, Richard Leibler, Riemann hypothesis, Riemannian manifold, Self-information, Sergio Verdú, Solomon Kullback, Statistical classification, Statistical distance, Statistical model, Statistical Science, Symmetry, Taylor series, The American Statistician, Time series, Total variation, Total variation distance of probability measures, Triangle inequality, Utility, Variation of information, Vector calculus, Work (thermodynamics).
In calculus, absolute continuity is a smoothness property of functions that is stronger than continuity and uniform continuity.
The Akaike information criterion (AIC) is an estimator of the relative quality of statistical models for a given set of data.
Alfréd Rényi (20 March 1921 – 1 February 1970) was a Hungarian mathematician who made contributions in combinatorics, graph theory, number theory but mostly in probability theory.
In measure theory (a branch of mathematical analysis), a property holds almost everywhere if, in a technical sense, the set for which the property holds takes up nearly all possibilities.
The Annals of Mathematical Statistics was a peer-reviewed statistics journal published by the Institute of Mathematical Statistics from 1930 to 1972.
In probability theory and statistics, Bayes’ theorem (alternatively Bayes’ law or Bayes' rule, also written as Bayes’s theorem) describes the probability of an event, based on prior knowledge of conditions that might be related to the event.
Bayesian experimental design provides a general probability-theoretical framework from which other theories on experimental design can be derived.
Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence or information becomes available.
In statistics, the Bayesian information criterion (BIC) or Schwarz criterion (also SBC, SBIC) is a criterion for model selection among a finite set of models; the model with the lowest BIC is preferred.
Bayesian statistics, named for Thomas Bayes (1701–1761), is a theory in the field of statistics in which the evidence about the true state of the world is expressed in terms of degrees of belief known as Bayesian probabilities.
The bit (a portmanteau of binary digit) is a basic unit of information used in computing and digital communications.
In mathematics, a Bregman divergence or Bregman distance is similar to a metric, but satisfies neither the triangle inequality nor symmetry.
Chapman & Hall was a British publishing house in London, founded in the first half of the 19th century by Edward Chapman and William Hall.
A chi-squared test, also written as χ² test, is any statistical hypothesis test where the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true.
Coding theory is the study of the properties of codes and their respective fitness for specific applications.
In information theory, the conditional entropy (or equivocation) quantifies the amount of information needed to describe the outcome of a random variable Y given that the value of another random variable X is known.
The Conference and Workshop on Neural Information Processing Systems (NIPS) is a machine learning and computational neuroscience conference held every December.
In probability theory and statistics, a covariance matrix (also known as dispersion matrix or variance–covariance matrix) is a matrix whose element in the i, j position is the covariance between the i-th and j-th elements of a random vector.
In information theory, the cross entropy between two probability distributions p and q over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set, if a coding scheme is used that is optimized for an "unnatural" probability distribution q, rather than the "true" distribution p. The cross entropy for the distributions p and q over a given set is defined as H(p, q) = H(p) + D_{\mathrm{KL}}(p \| q), where H(p) is the entropy of p, and D_{\mathrm{KL}}(p \| q) is the Kullback–Leibler divergence of q from p (also known as the relative entropy of p with respect to q; note the reversal of emphasis).
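The identity between cross entropy, entropy, and KL divergence can be checked numerically. A minimal pure-Python sketch (base-2 logarithms; the distributions p and q are purely illustrative):

```python
import math

def entropy(p):
    """Shannon entropy H(p) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(p, q):
    """Cross entropy H(p, q) in bits."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

# The identity H(p, q) = H(p) + D_KL(p || q)
assert abs(cross_entropy(p, q) - (entropy(p) + kl_divergence(p, q))) < 1e-12
```

The extra bits paid by coding for q instead of p are exactly the KL divergence: here H(p) = 1.5 bits but H(p, q) = 1.75 bits.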
In signal processing, data compression, source coding, or bit-rate reduction involves encoding information using fewer bits than the original representation.
In computer science and information theory, data differencing or differential compression is producing a technical description of the difference between two sets of data – a source and a target.
A density matrix is a matrix that describes a quantum system in a mixed state, a statistical ensemble of several quantum states.
The deviance information criterion (DIC) is a hierarchical modeling generalization of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC).
Differential entropy (also referred to as continuous entropy) is a concept in information theory that began as an attempt by Shannon to extend the idea of (Shannon) entropy, a measure of average surprisal of a random variable, to continuous probability distributions.
In engineering and science, dimensional analysis is the analysis of the relationships between different physical quantities by identifying their base quantities (such as length, mass, time, and electric charge) and units of measure (such as miles vs. kilometers, or pounds vs. kilograms) and tracking these dimensions as calculations or comparisons are performed.
In vector calculus, divergence is a vector operator that produces a scalar field, giving the quantity of a vector field's source at each point.
In statistics and information geometry, divergence or a contrast function is a function which establishes the "distance" of one probability distribution to the other on a statistical manifold.
Dover Publications, also known as Dover Books, is an American book publisher founded in 1941 by Hayward Cirker and his wife, Blanche.
The number e is a mathematical constant, approximately equal to 2.71828, which appears in many different settings throughout mathematics.
In statistics, the earth mover's distance (EMD) is a measure of the distance between two probability distributions over a region D.
Edwin Thompson Jaynes (July 5, 1922 – April 30, 1998) was the Wayman Crow Distinguished Professor of Physics at Washington University in St. Louis.
In mathematics, especially in applications of linear algebra to physics, the Einstein notation or Einstein summation convention is a notational convention that implies summation over a set of indexed terms in a formula, thus achieving notational brevity.
In financial mathematics and stochastic optimization, the concept of risk measure is used to quantify the risk involved in a random outcome or risk position.
In statistical mechanics, entropy is an extensive property of a thermodynamic system.
Information entropy is the average rate at which information is produced by a stochastic source of data.
There are close parallels between the mathematical expressions for the thermodynamic entropy, usually denoted by S, of a physical system in the statistical thermodynamics established by Ludwig Boltzmann and J. Willard Gibbs in the 1870s, and the information-theoretic entropy, usually expressed as H, of Claude Shannon and Ralph Hartley developed in the 1940s.
An entropy maximization problem is a convex optimization problem of the form: maximize -\sum_{i=1}^n x_i \log x_i subject to Ax \leq b, \mathbf{1}^T \vec{x} = 1, where \vec{x} \in \mathbb{R}^n_{\geq 0} is the optimization variable, A \in \mathbb{R}^{m \times n} and b \in \mathbb{R}^m are problem parameters, and \mathbf{1} denotes a vector whose components are all 1.
In information theory, the entropy power inequality is a result that relates to so-called "entropy power" of random variables.
In thermodynamics, the exergy (in older usage, available work or availability) of a system is the maximum useful work possible during a process that brings the system into equilibrium with a heat reservoir.
In probability theory, the expected value of a random variable, intuitively, is the long-run average value of repetitions of the experiment it represents.
In probability theory, an ƒ-divergence is a function D_f(P \| Q) that measures the difference between two probability distributions P and Q. It helps the intuition to think of the divergence as an average, weighted by the function f, of the odds ratio given by P and Q. These divergences were introduced and studied independently by Csiszár, Morimoto, and Ali & Silvey, and are sometimes known as Csiszár ƒ-divergences, Csiszár–Morimoto divergences or Ali–Silvey distances.
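Specific choices of the convex generator f recover familiar divergences. A small pure-Python sketch (f(t) = t log t yields the KL divergence in nats, f(t) = |t - 1|/2 the total variation distance; the distributions are illustrative):

```python
import math

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)), assuming q(x) > 0 everywhere."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

kl_f = lambda t: t * math.log(t) if t > 0 else 0.0   # generator for KL divergence (nats)
tv_f = lambda t: 0.5 * abs(t - 1)                    # generator for total variation distance

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

kl = f_divergence(p, q, kl_f)
tv = f_divergence(p, q, tv_f)
```

Both results agree with the direct formulas for KL divergence and total variation on these inputs, illustrating that the two are instances of one family.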
In information geometry, the Fisher information metric is a particular Riemannian metric which can be defined on a smooth statistical manifold, i.e., a smooth manifold whose points are probability measures defined on a common probability space.
Fluid mechanics is a branch of physics concerned with the mechanics of fluids (liquids, gases, and plasmas) and the forces on them.
In thermodynamics, the Gibbs free energy (IUPAC recommended name: Gibbs energy or Gibbs function; also known as free enthalpy to distinguish it from Helmholtz free energy) is a thermodynamic potential that can be used to calculate the maximum of reversible work that may be performed by a thermodynamic system at a constant temperature and pressure (isothermal, isobaric).
In information theory, Gibbs' inequality is a statement about the mathematical entropy of a discrete probability distribution.
In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions.
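For discrete distributions the Hellinger distance has a simple closed form. A minimal sketch (pure Python; the endpoint cases below are standard: 0 for identical distributions, 1 for distributions with disjoint support):

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions; values lie in [0, 1]."""
    return math.sqrt(
        sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    ) / math.sqrt(2)

# Identical distributions are at distance 0; disjoint supports are at distance 1.
assert hellinger([0.5, 0.5], [0.5, 0.5]) == 0.0
assert abs(hellinger([1.0, 0.0], [0.0, 1.0]) - 1.0) < 1e-12
```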
In thermodynamics, the Helmholtz free energy is a thermodynamic potential that measures the useful work obtainable from a closed thermodynamic system at a constant temperature and volume.
In mathematics, the Hessian matrix or Hessian is a square matrix of second-order partial derivatives of a scalar-valued function, or scalar field.
In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression.
Irving John ("I. J."; "Jack") Good (9 December 1916 – 5 April 2009) was a British mathematician who worked as a cryptologist at Bletchley Park with Alan Turing (obituary: The Times, 16 April 2009, http://www.timesonline.co.uk/tol/comment/obituaries/article6100314.ece).
Inferences are steps in reasoning, moving from premises to logical consequences.
In information theory and machine learning, information gain is a synonym for Kullback–Leibler divergence.
In decision tree learning, Information gain ratio is a ratio of information gain to the intrinsic information.
In information theory, the information projection or I-projection of a probability distribution q onto a set of distributions P is p^* = \arg\min_{p \in P} D_{\mathrm{KL}}(p \| q), where D_{\mathrm{KL}} is the Kullback–Leibler divergence from q to p. Viewing the Kullback–Leibler divergence as a measure of distance, the I-projection p^* is the "closest" distribution to q of all the distributions in P. The I-projection is useful in setting up information geometry, notably because of the following inequality, valid when P is convex: D_{\mathrm{KL}}(p \| q) \geq D_{\mathrm{KL}}(p \| p^*) + D_{\mathrm{KL}}(p^* \| q). This inequality can be interpreted as an information-geometric version of the Pythagorean theorem, where KL divergence is viewed as squared distance in a Euclidean space.
This article discusses how information theory (a branch of mathematics studying the transmission, processing and storage of information) is related to measure theory (a branch of mathematics related to integration and probability).
The International Journal of Computer Vision (IJCV) is a journal published by Springer.
In probability theory and statistics, the Jensen–Shannon divergence is a method of measuring the similarity between two probability distributions.
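The Jensen–Shannon divergence is built from the KL divergence by comparing each distribution to their mixture, which makes it symmetric and bounded. A pure-Python sketch (base-2 logs, so the value lies in [0, 1]; the distributions are illustrative):

```python
import math

def kl(p, q):
    """KL divergence D_KL(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

assert abs(jsd(p, q) - jsd(q, p)) < 1e-12   # symmetric, unlike KL itself
assert 0 <= jsd(p, q) <= 1                  # bounded, unlike KL itself
```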
John Wiley & Sons, Inc., also referred to as Wiley, is a global publishing company that specializes in academic publishing.
Given random variables X, Y,..., that are defined on a probability space, the joint probability distribution for X, Y,... is a probability distribution that gives the probability that each of X, Y,... falls in any particular range or discrete set of values specified for that variable.
Josiah Willard Gibbs (February 11, 1839 – April 28, 1903) was an American scientist who made important theoretical contributions to physics, chemistry, and mathematics.
In statistics, the Kolmogorov–Smirnov test (K–S test or KS test) is a nonparametric test of the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K–S test), or to compare two samples (two-sample K–S test).
In coding theory, the Kraft–McMillan inequality gives a necessary and sufficient condition for the existence of a prefix code (in Leon G. Kraft's version) or a uniquely decodable code (in Brockway McMillan's version) for a given set of codeword lengths.
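The Kraft–McMillan condition is a one-line check on a set of codeword lengths. A small sketch (pure Python; the example lengths correspond to the binary prefix code {0, 10, 110, 111}):

```python
def kraft_sum(lengths, r=2):
    """Kraft sum for codeword lengths over an r-ary alphabet.

    A prefix code (or uniquely decodable code) with these lengths
    exists if and only if the sum is <= 1.
    """
    return sum(r ** -l for l in lengths)

# Lengths of the complete binary prefix code {0, 10, 110, 111}: sum is exactly 1.
assert kraft_sum([1, 2, 3, 3]) == 1.0
# Two length-1 codewords plus anything more violates the inequality in binary.
assert kraft_sum([1, 1, 2]) > 1
```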
In mathematics, the Kronecker delta (named after Leopold Kronecker) is a function of two variables, usually just non-negative integers; the function is 1 if the variables are equal, and 0 otherwise.
In probability theory, the theory of large deviations concerns the asymptotic behaviour of remote tails of sequences of probability distributions.
Weight of evidence is a measure of evidence on one side of an issue as compared with the evidence on the other side of the issue, or to measure the evidence on multiple issues.
In mathematics, the logarithm is the inverse function to exponentiation.
The logit function is the inverse of the sigmoidal "logistic" function or logistic transform used in mathematics, especially in statistics.
In mathematical optimization, statistics, econometrics, decision theory, machine learning and computational neuroscience, a loss function or cost function is a function that maps an event or values of one or more variables onto a real number intuitively representing some "cost" associated with the event.
Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to "learn" (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.
In probability theory and statistics, the marginal distribution of a subset of a collection of random variables is the probability distribution of the variables contained in the subset.
In mathematics, the matching distance is a metric on the space of size functions (Michele d'Amico, Patrizio Frosini, Claudia Landi, "Using matching distance in Size Theory: a survey", International Journal of Imaging Systems and Technology, 16(5):154–161, 2006).
Mathematical statistics is the application of mathematics to statistics, as opposed to techniques for collecting statistical data.
In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of a statistical model, given observations.
In statistics, maximum spacing estimation (MSE or MSP), or maximum product of spacing estimation (MPS), is a method for estimating the parameters of a univariate statistical model.
In mathematical analysis, a measure on a set is a systematic way to assign a number to each suitable subset of that set, intuitively interpreted as its size.
In mathematics, a metric or distance function is a function that defines a distance between each pair of elements of a set.
In mathematics, a metric space is a set for which distances between all members of the set are defined.
In the mathematical field of differential geometry, a metric tensor is a type of function which takes as input a pair of tangent vectors and at a point of a surface (or higher dimensional differentiable manifold) and produces a real number scalar in a way that generalizes many of the familiar properties of the dot product of vectors in Euclidean space.
In probability theory and statistics, the multivariate normal distribution or multivariate Gaussian distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions.
In probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables.
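Mutual information is itself a KL divergence: the divergence of the joint distribution from the product of its marginals. A pure-Python sketch (the joint tables below are illustrative):

```python
import math

def mutual_information(joint):
    """I(X;Y) = D_KL(p(x,y) || p(x) p(y)) in bits; joint is a 2-D list of probabilities."""
    px = [sum(row) for row in joint]            # marginal of X (row sums)
    py = [sum(col) for col in zip(*joint)]      # marginal of Y (column sums)
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

# Independent variables: the joint factorizes, so the divergence (MI) is 0.
assert abs(mutual_information([[0.25, 0.25], [0.25, 0.25]])) < 1e-12
# Perfectly correlated fair bits: knowing one reveals the other, MI = 1 bit.
assert abs(mutual_information([[0.5, 0.0], [0.0, 0.5]]) - 1.0) < 1e-12
```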
The natural unit of information (symbol: nat), sometimes also nit or nepit, is a unit of information or entropy, based on natural logarithms and powers of ''e'', rather than the powers of 2 and base 2 logarithms, which define the bit.
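Since the two units differ only in the logarithm base, conversion is multiplication by ln 2. A trivial sketch (pure Python):

```python
import math

def nats_to_bits(x):
    """Convert an information quantity from nats to bits."""
    return x / math.log(2)

def bits_to_nats(x):
    """Convert an information quantity from bits to nats."""
    return x * math.log(2)

# One nat is about 1.4427 bits; one bit is ln(2), about 0.6931 nats.
assert abs(nats_to_bits(1.0) - 1.4426950408889634) < 1e-12
assert abs(bits_to_nats(nats_to_bits(2.5)) - 2.5) < 1e-12
```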
Neuroscience (or neurobiology) is the scientific study of the nervous system.
Numerical Recipes is the generic title of a series of books on algorithms and numerical analysis by William H. Press, Saul A. Teukolsky, William T. Vetterling and Brian P. Flannery.
The partition function or configuration integral, as used in probability theory, information theory and dynamical systems, is a generalization of the definition of a partition function in statistical mechanics.
A patch is a set of changes to a computer program or its supporting data designed to update, fix, or improve it.
Pierre-Simon, marquis de Laplace (23 March 1749 – 5 March 1827) was a French scholar whose work was important to the development of mathematics, statistics, physics and astronomy.
In information theory, Pinsker's inequality, named after its inventor Mark Semenovich Pinsker, is an inequality that bounds the total variation distance (or statistical distance) in terms of the Kullback–Leibler divergence.
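The bound is easy to verify numerically: with the KL divergence in nats, the total variation distance never exceeds the square root of half the divergence. A pure-Python sketch (the distributions are illustrative):

```python
import math

def kl_nats(p, q):
    """KL divergence D_KL(p || q) in nats (natural log)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def total_variation(p, q):
    """Total variation distance: half the L1 distance between the distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

# Pinsker's inequality: delta(P, Q) <= sqrt(D_KL(P || Q) / 2)
assert total_variation(p, q) <= math.sqrt(kl_nats(p, q) / 2)
```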
In linear algebra, a symmetric real n×n matrix M is said to be positive definite if the scalar z^T M z is strictly positive for every non-zero column vector z of n real numbers.
In Bayesian statistics, the posterior probability of a random event or an uncertain proposition is the conditional probability that is assigned after the relevant evidence or background is taken into account.
The principle of indifference (also called principle of insufficient reason) is a rule for assigning epistemic probabilities.
The principle of maximum entropy states that the probability distribution which best represents the current state of knowledge is the one with largest entropy, in the context of precisely stated prior data (such as a proposition that expresses testable information).
In Bayesian statistical inference, a prior probability distribution, often simply called the prior, of an uncertain quantity is the probability distribution that would express one's beliefs about this quantity before some evidence is taken into account.
In probability theory, a probability density function (PDF), or density of a continuous random variable, is a function, whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) can be interpreted as providing a relative likelihood that the value of the random variable would equal that sample.
In probability theory and statistics, a probability distribution is a mathematical function that provides the probabilities of occurrence of different possible outcomes in an experiment.
In probability theory, a probability space or a probability triple (\Omega, \mathcal{F}, P) is a mathematical construct that models a real-world process (or “experiment”) consisting of states that occur randomly.
Proceedings of the Royal Society is the parent title of two scientific journals published by the Royal Society.
Quantum entanglement is a physical phenomenon which occurs when pairs or groups of particles are generated, interact, or share spatial proximity in ways such that the quantum state of each particle cannot be described independently of the state of the other(s), even when the particles are separated by a large distance—instead, a quantum state must be described for the system as a whole.
Quantum information science is an area of study based on the idea that information science depends on quantum effects in physics.
In quantum information theory, quantum relative entropy is a measure of distinguishability between two quantum states.
In mathematics, the Radon–Nikodym theorem is a result in measure theory.
In mathematics — specifically, in large deviations theory — a rate function is a function used to quantify the probabilities of rare events.
In information theory, the Rényi entropy generalizes the Hartley entropy, the Shannon entropy, the collision entropy and the min entropy.
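The family H_α(p) = (1/(1−α)) log₂ Σ p_i^α interpolates between these special cases and is non-increasing in α. A pure-Python sketch (the distribution is illustrative; the α → 1 limit is approximated numerically):

```python
import math

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (base 2), for alpha >= 0, alpha != 1."""
    return math.log2(sum(pi ** alpha for pi in p if pi > 0)) / (1 - alpha)

p = [0.5, 0.25, 0.25]

hartley = renyi_entropy(p, 0)                     # alpha = 0: log2 of support size
shannon = -sum(pi * math.log2(pi) for pi in p)    # alpha -> 1 limit: Shannon entropy
collision = renyi_entropy(p, 2)                   # alpha = 2: collision entropy
min_entropy = -math.log2(max(p))                  # alpha -> infinity: min-entropy

# Renyi entropy is non-increasing in alpha, and alpha near 1 recovers Shannon.
assert hartley >= shannon >= collision >= min_entropy
assert abs(renyi_entropy(p, 1 + 1e-6) - shannon) < 1e-4
```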
Richard A. Leibler (March 18, 1914, Chicago – October 25, 2003, Reston, Virginia) was an American mathematician and cryptanalyst.
In mathematics, the Riemann hypothesis is a conjecture that the Riemann zeta function has its zeros only at the negative even integers and complex numbers with real part 1/2.
In differential geometry, a (smooth) Riemannian manifold or (smooth) Riemannian space (M,g) is a real, smooth manifold M equipped with an inner product g_p on the tangent space T_pM at each point p that varies smoothly from point to point in the sense that if X and Y are differentiable vector fields on M, then p \mapsto g_p(X(p),Y(p)) is a smooth function.
In information theory, self-information or surprisal is a measure of the information content (or surprise) associated with observing a particular outcome of a random variable.
Sergio Verdú (born Barcelona, Spain, August 15, 1958) is the Eugene Higgins Professor of Electrical Engineering at Princeton University, where he teaches and conducts research on Information Theory in the Information Sciences and Systems Group.
Solomon Kullback (April 3, 1907August 5, 1994) was an American cryptanalyst and mathematician, who was one of the first three employees hired by William F. Friedman at the US Army's Signal Intelligence Service (SIS) in the 1930s, along with Frank Rowlett and Abraham Sinkov.
In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.
In statistics, probability theory, and information theory, a statistical distance quantifies the distance between two statistical objects, which can be two random variables, or two probability distributions or samples, or the distance can be between an individual sample point and a population or a wider sample of points.
A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of some sample data and similar data from a larger population.
Statistical Science is a review journal published by the Institute of Mathematical Statistics.
Symmetry (from Greek συμμετρία symmetria "agreement in dimensions, due proportion, arrangement") in everyday language refers to a sense of harmonious and beautiful proportion and balance.
In mathematics, a Taylor series is a representation of a function as an infinite sum of terms that are calculated from the values of the function's derivatives at a single point.
The American Statistician is a quarterly peer-reviewed scientific journal covering statistics published by Taylor & Francis on behalf of the American Statistical Association.
A time series is a series of data points indexed (or listed or graphed) in time order.
In mathematics, the total variation identifies several slightly different concepts, related to the (local or global) structure of the codomain of a function or a measure.
In probability theory, the total variation distance is a distance measure for probability distributions.
In mathematics, the triangle inequality states that for any triangle, the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side.
Within economics the concept of utility is used to model worth or value, but its usage has evolved significantly over time.
In probability theory and information theory, the variation of information or shared information distance is a measure of the distance between two clusterings (partitions of elements).
Vector calculus, or vector analysis, is a branch of mathematics concerned with differentiation and integration of vector fields, primarily in 3-dimensional Euclidean space \mathbb{R}^3.
In thermodynamics, work performed by a system is the energy transferred by the system to its surroundings, that is fully accounted for solely by macroscopic forces exerted on the system by factors external to it, that is to say, factors in its surroundings.
Discrimination information, Information gain, KL distance, KL divergence, KL-distance, KL-divergence, Kl-divergence, Kullback Leibler divergence, Kullback divergence, Kullback information, Kullback-Leibler, Kullback-Leibler Distance, Kullback-Leibler distance, Kullback-Leibler divergence, Kullback-Leibler entropy, Kullback-Leibler information, Kullback-Leibler redundancy, Kullback-Liebler, Kullback-Liebler distance, Kullback-leibler divergence, Kullback–Leibler distance, Kullback–Leibler entropy, Kullback–Leibler information, Kullback–Leibler redundancy, Principle of Minimum Discrimination Information, Relative entropy.