## Brown Corpus

The Brown University Standard Corpus of Present-Day American English (or just Brown Corpus) was compiled in the 1960s by Henry Kučera and W. Nelson Francis at Brown University, Providence, Rhode Island, as a general corpus (text collection) in the field of corpus linguistics.

## Cross entropy

In information theory, the cross entropy between two probability distributions p and q over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set when the coding scheme is optimized for an "unnatural" probability distribution q rather than the "true" distribution p. The cross entropy for the distributions p and q over a given set is defined as follows:

$$H(p, q) = \operatorname{E}_p[-\log q] = H(p) + D_{\mathrm{KL}}(p \| q)$$

where $H(p)$ is the entropy of p, and $D_{\mathrm{KL}}(p \| q)$ is the Kullback–Leibler divergence of q from p (also known as the relative entropy of p with respect to q; note the reversal of emphasis).
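
As a rough illustration, the sketch below checks numerically that cross entropy equals the entropy of p plus the Kullback–Leibler divergence of q from p. It assumes discrete distributions stored as dictionaries, logarithms to base 2 (so results are in bits), and the discrete form H(p, q) = -Σ p(x) log2 q(x). The function names and the example distributions are made up for this illustration.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in bits; terms with p(x) = 0 contribute nothing."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log2 q(x), in bits.

    Assumes q(x) > 0 wherever p(x) > 0; otherwise math.log2 raises ValueError.
    """
    return -sum(px * math.log2(q[x]) for x, px in p.items() if px > 0)

def kl_divergence(p, q):
    """D_KL(p || q) = sum_x p(x) * log2(p(x) / q(x)), in bits."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

# Hypothetical "true" distribution p and mismatched coding distribution q.
p = {"a": 0.5, "b": 0.25, "c": 0.25}
q = {"a": 0.25, "b": 0.25, "c": 0.5}

h_p = entropy(p)            # bits needed with a code optimized for p
h_pq = cross_entropy(p, q)  # bits needed with a code optimized for q
d_kl = kl_divergence(p, q)  # extra bits paid for using the wrong code

print(h_p, h_pq, d_kl)
```

With these values the code optimized for q costs a quarter of a bit more per event than the code optimized for p, and that excess is exactly the divergence term.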

## English language

English is a West Germanic language that was first spoken in early medieval England and is now a global lingua franca.

## Entropy (information theory)

Information entropy is the average rate at which information is produced by a stochastic source of data.
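
One simple way to make this concrete is a plug-in estimate that treats a sequence as independent draws and computes entropy from observed symbol frequencies. The sketch below makes that memoryless assumption, which real sources such as text generally violate, and uses base-2 logarithms. The function name `empirical_entropy` and the coin-flip strings are illustrative only.

```python
import math
from collections import Counter

def empirical_entropy(symbols):
    """Plug-in entropy estimate in bits per symbol from observed frequencies."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A balanced sequence of coin flips versus a heavily biased one.
fair = empirical_entropy("HTHTHTHT")
biased = empirical_entropy("HHHHHHHT")

print(fair, biased)
```

The balanced sequence yields one bit per symbol, while the biased one yields less, reflecting that its outcomes are more predictable.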

## Information theory

Information theory studies the quantification, storage, and communication of information.

## Language model

A statistical language model is a probability distribution over sequences of words.
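
A minimal sketch of one such distribution is a bigram model with unsmoothed maximum-likelihood estimates, which assigns a probability to a word sequence by multiplying conditional probabilities of each word given its predecessor. The bigram assumption, the `<s>` and `</s>` boundary markers, the function names, and the toy corpus are all choices made for this illustration; practical models add smoothing so that unseen sequences do not receive probability zero.

```python
from collections import Counter

def train_bigram_model(sentences):
    """Count contexts and bigrams over sentences padded with boundary markers."""
    context_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts

def sequence_probability(words, context_counts, bigram_counts):
    """P(words) as a product of unsmoothed P(cur | prev) estimates."""
    padded = ["<s>"] + words + ["</s>"]
    prob = 1.0
    for prev, cur in zip(padded, padded[1:]):
        if context_counts[prev] == 0:
            return 0.0  # context never observed, so no estimate is available
        prob *= bigram_counts[(prev, cur)] / context_counts[prev]
    return prob

# Toy training corpus (hypothetical).
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
contexts, bigrams = train_bigram_model(corpus)

p_seen = sequence_probability(["the", "cat", "sat"], contexts, bigrams)
p_unseen = sequence_probability(["cat", "the", "sat"], contexts, bigrams)

print(p_seen, p_unseen)
```

Here the sentence that appears in the training data receives positive probability, and the reordered sentence receives zero because it starts with a bigram that was never observed.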

## Natural language processing

Natural language processing (NLP) is an area of computer science and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.

## Probability distribution

In probability theory and statistics, a probability distribution is a mathematical function that provides the probabilities of occurrence of different possible outcomes in an experiment.

## Random variable

In probability and statistics, a random variable, random quantity, aleatory variable, or stochastic variable is a variable whose possible values are outcomes of a random phenomenon.

## Statistical model

A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of some sample data and similar data from a larger population.

## Text corpus

In linguistics, a corpus (plural corpora) or text corpus is a large and structured set of texts (nowadays usually electronically stored and processed).

## Trigram

Trigrams are a special case of the *n*-gram, where *n* is 3.
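
As a small illustration, trigrams can be extracted from any sequence by sliding a window of length three across it. The helper name `trigrams` and the sample inputs below are made up for this sketch. It works for lists of words and for strings of characters alike.

```python
def trigrams(items):
    """Return every run of three consecutive items as a tuple."""
    return list(zip(items, items[1:], items[2:]))

# Word-level trigrams from a tokenized phrase.
word_trigrams = trigrams("the quick brown fox".split())

# Character-level trigrams from a single word.
char_trigrams = ["".join(t) for t in trigrams("corpus")]

print(word_trigrams)
print(char_trigrams)
```

A sequence of length k yields k - 2 trigrams, and none at all if it has fewer than three items.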

