# String metric

In mathematics and computer science, a string metric (also known as a string similarity metric or string distance function) is a metric that measures the distance ("inverse similarity") between two text strings, used in approximate string matching, string comparison, and fuzzy string searching. [1]

## Approximate string matching

In computer science, approximate string matching (often colloquially referred to as fuzzy string searching) is the technique of finding strings that match a pattern approximately (rather than exactly).
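Python's standard library gives a quick way to experiment with this idea: `difflib.get_close_matches` ranks candidate strings by `SequenceMatcher` similarity and returns the closest ones above a cutoff. A minimal sketch (the word list is illustrative):

```python
import difflib

# Rank candidate words by similarity to a misspelled query.
# get_close_matches keeps candidates scoring above a cutoff (default 0.6),
# best matches first.
candidates = ["ape", "apple", "peach", "puppy"]
matches = difflib.get_close_matches("appel", candidates)
print(matches)
```

Dedicated approximate-matching algorithms (e.g. bitap, or edit-distance automata) scale better than this pairwise approach for large dictionaries.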

## Bhattacharyya distance

In statistics, the Bhattacharyya distance measures the similarity of two probability distributions.
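For discrete distributions, the distance is the negative logarithm of the Bhattacharyya coefficient, the sum of the pointwise geometric means. A minimal Python sketch for distributions given as aligned probability lists (function name is illustrative):

```python
import math

def bhattacharyya_distance(p, q):
    # D_B(P, Q) = -ln(BC), where BC = sum_i sqrt(p_i * q_i)
    # is the Bhattacharyya coefficient (1.0 for identical distributions).
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return -math.log(bc)
```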

## Computer science

Computer science deals with the theoretical foundations of information and computation, together with practical techniques for the implementation and application of these foundations.

## Damerau–Levenshtein distance

In information theory and computer science, the Damerau–Levenshtein distance (named after Frederick J. Damerau and Vladimir I. Levenshtein) is a string metric for measuring the edit distance between two sequences.
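It extends the Levenshtein distance by also counting transposition of two adjacent characters as a single edit. A minimal Python sketch of the simpler optimal-string-alignment variant, which restricts each substring to at most one transposition (function name is illustrative):

```python
def osa_distance(a: str, b: str) -> int:
    # Optimal-string-alignment variant: insertions, deletions,
    # substitutions, plus adjacent transpositions.
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]
```

For example, "ca" and "ac" are one transposition apart, so the distance is 1 rather than the Levenshtein value of 2.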

## Data analysis techniques for fraud detection

Fraud is a billion-dollar business and it is increasing every year.

## Data deduplication

In computing, data deduplication is a specialized data compression technique for eliminating duplicate copies of repeating data.

## Data integration

Data integration involves combining data residing in different sources and providing users with a unified view of them.

## Data mining

Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.

## Database

A database is an organized collection of data, stored and accessed electronically.

## Distance

Distance is a numerical measurement of how far apart objects are.

## Fingerprint

A fingerprint in its narrow sense is an impression left by the friction ridges of a human finger.

## Genetic testing

Genetic testing, also known as DNA testing, allows the determination of bloodlines and the genetic diagnosis of vulnerabilities to inherited diseases.

## Hamming distance

In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different.
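Because both strings must be the same length, the computation is a single position-by-position comparison. A minimal Python sketch (function name is illustrative):

```python
def hamming_distance(a: str, b: str) -> int:
    # Count positions where the corresponding symbols differ.
    # Defined only for strings of equal length.
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))
```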

## Hellinger distance

In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions.
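For discrete distributions, the Hellinger distance is the Euclidean distance between the square-root vectors of the two distributions, scaled so that it ranges from 0 (identical) to 1 (disjoint support). A minimal Python sketch (function name is illustrative):

```python
import math

def hellinger(p, q):
    # H(P, Q) = (1 / sqrt(2)) * sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2)
    s = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(s) / math.sqrt(2)
```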

## Image analysis

Image analysis is the extraction of meaningful information from images; mainly from digital images by means of digital image processing techniques.

## Incremental search

In computing, incremental search, incremental find or real-time suggestions is a user interface interaction method to progressively search for and filter through text.

## Information integration

Information integration (II) is the merging of information from heterogeneous sources with differing conceptual, contextual and typographical representations.

## Jaccard index

The Jaccard index, also known as Intersection over Union and the Jaccard similarity coefficient (originally coined coefficient de communauté by Paul Jaccard), is a statistic used for comparing the similarity and diversity of sample sets.
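The index is the size of the intersection divided by the size of the union of the two sets. A minimal Python sketch (function name and the convention of returning 1.0 for two empty sets are illustrative):

```python
def jaccard(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|; two empty sets are treated as identical here.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Applied to strings, the sets are typically character n-grams or word tokens.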

## Jaro–Winkler distance

In computer science and statistics, the Jaro–Winkler distance is a string similarity measure between two sequences; despite its name, it is not a metric in the mathematical sense, as it does not obey the triangle inequality.
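The Jaro similarity combines the number of matching characters (within a sliding window) with the number of transpositions among them; Winkler's variant then boosts scores for strings sharing a common prefix of up to four characters. A Python sketch under those standard definitions (function names are illustrative; the distance, when needed, is 1 minus the similarity):

```python
def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if equal and within this window of each other.
    window = max(len1, len2) // 2 - 1
    match1, match2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not match2[j] and s2[j] == c:
                match1[i] = match2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters appearing out of order, halved.
    t, k = 0, 0
    for i in range(len1):
        if match1[i]:
            while not match2[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    t //= 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3

def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    # Boost the Jaro score by the length of the common prefix (capped at 4),
    # scaled by p (conventionally 0.1).
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)
```

For example, "MARTHA" and "MARHTA" share all six characters with one transposition and a three-character common prefix, giving a Jaro–Winkler similarity of about 0.961.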

## JavaScript

JavaScript, often abbreviated as JS, is a high-level, interpreted programming language.

## Jensen–Shannon divergence

In probability theory and statistics, the Jensen–Shannon divergence is a method of measuring the similarity between two probability distributions.
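It symmetrizes the Kullback–Leibler divergence by measuring each distribution against their average: JSD(P, Q) = ½ KL(P‖M) + ½ KL(Q‖M) with M = (P + Q)/2, which is always finite and bounded by ln 2 (in nats). A minimal Python sketch for aligned probability lists (function name is illustrative):

```python
import math

def js_divergence(p, q):
    # JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), M = (P + Q) / 2.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```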

## Kendall tau distance

The Kendall tau rank distance is a metric that counts the number of pairwise disagreements between two ranking lists.
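A pair is a disagreement (discordant) when the two rankings order its elements oppositely. A minimal O(n²) Python sketch, assuming both rankings are permutations of the same items (function name is illustrative):

```python
def kendall_tau_distance(r1, r2):
    # Count pairs of items that r1 and r2 place in opposite order.
    pos = {item: i for i, item in enumerate(r2)}
    n = len(r1)
    d = 0
    for i in range(n):
        for j in range(i + 1, n):
            # r1 orders r1[i] before r1[j]; check whether r2 disagrees.
            if pos[r1[i]] > pos[r1[j]]:
                d += 1
    return d
```

An O(n log n) variant based on merge-sort inversion counting is preferred for long rankings.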

## Knowledge integration

Knowledge integration is the process of synthesizing multiple knowledge models (or representations) into a common model (representation).

## Kullback–Leibler divergence

In mathematical statistics, the Kullback–Leibler divergence (also called relative entropy) is a measure of how one probability distribution diverges from a second, expected probability distribution.
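For discrete distributions, D(P‖Q) = Σᵢ pᵢ ln(pᵢ/qᵢ); it is asymmetric and requires qᵢ > 0 wherever pᵢ > 0. A minimal Python sketch for aligned probability lists (function name is illustrative):

```python
import math

def kl_divergence(p, q):
    # D(P || Q) = sum_i p_i * ln(p_i / q_i), in nats.
    # Terms with p_i == 0 contribute 0 by the convention 0 * ln(0) = 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```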

## Levenshtein distance

In information theory, linguistics and computer science, the Levenshtein distance is a string metric for measuring the difference between two sequences.
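It counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other, computed here with the standard dynamic program over a rolling row (function name is illustrative):

```python
def levenshtein(a: str, b: str) -> int:
    # prev[j] holds the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]
```

The classic example: "kitten" → "sitting" takes three edits (k→s, e→i, insert g).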

## Lexical analysis

In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of tokens (strings with an assigned and thus identified meaning).

## Machine learning

Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to "learn" (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.

## Mathematics

Mathematics (from Greek μάθημα máthēma, "knowledge, study, learning") is the study of such topics as quantity, structure, space, and change.

## Metric (mathematics)

In mathematics, a metric or distance function is a function that defines a distance between each pair of elements of a set.
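The defining axioms, which every string metric in this list must satisfy to qualify as a metric in the strict sense:

```latex
% A distance function d : X \times X \to \mathbb{R} is a metric iff,
% for all x, y, z \in X:
d(x, y) = 0 \iff x = y            % identity of indiscernibles
d(x, y) = d(y, x)                  % symmetry
d(x, z) \le d(x, y) + d(y, z)      % triangle inequality
% Non-negativity d(x, y) \ge 0 follows from these three.
```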

## Most frequent k characters

In information theory, MostFreqKDistance is a string metric technique for quickly estimating how similar two ordered sets or strings are.

## Ontology merging

Ontology merging defines the act of bringing together two conceptually divergent ontologies or the instance data associated to two ontologies.

## Overlap coefficient

The overlap coefficient, or Szymkiewicz–Simpson coefficient, is a similarity measure that measures the overlap between two sets.
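It divides the intersection size by the size of the smaller set, so a set that is fully contained in another scores 1.0. A minimal Python sketch (function name is illustrative):

```python
def overlap(a: set, b: set) -> float:
    # |A ∩ B| / min(|A|, |B|); 0.0 if either set is empty.
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))
```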

## Plagiarism detection

Plagiarism detection is the process of locating instances of plagiarism within a work or document.

## Sørensen–Dice coefficient

The Sørensen–Dice index, also known by several other names, is a statistic used for comparing the similarity of two samples.
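As a set similarity it is twice the intersection size divided by the sum of the set sizes; for strings the sets are commonly character bigrams. A minimal Python sketch (function names and the empty-set convention are illustrative):

```python
def bigrams(s: str) -> set:
    # Character bigrams turn a string into a set for set-based similarity.
    return {s[i:i + 2] for i in range(len(s) - 1)}

def dice(a: set, b: set) -> float:
    # 2|A ∩ B| / (|A| + |B|); two empty sets are treated as identical here.
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))
```

For example, "night" and "nacht" share only the bigram "ht" out of eight total, giving a score of 0.25.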

## Scala (programming language)

Scala is a general-purpose programming language providing support for functional programming and a strong static type system.

## Simple matching coefficient

The simple matching coefficient (SMC) or Rand similarity coefficient is a statistic used for comparing the similarity and diversity of sample sets.
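For two equal-length binary vectors, the SMC is simply the fraction of positions where they agree (counting both 1–1 and 0–0 matches, unlike Jaccard). A minimal Python sketch (function name is illustrative):

```python
def smc(a, b):
    # (matching positions) / (total positions) for equal-length binary vectors.
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)
```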

## String (computer science)

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable.

## String-searching algorithm

In computer science, string-searching algorithms, sometimes called string-matching algorithms, are an important class of string algorithms that try to find a place where one or several strings (also called patterns) are found within a larger string or text.

## Taxicab geometry

A taxicab geometry is a form of geometry in which the usual distance function or metric of Euclidean geometry is replaced by a new metric in which the distance between two points is the sum of the absolute differences of their Cartesian coordinates.
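The taxicab (or Manhattan, or L1) distance is trivial to compute for points given as coordinate sequences. A minimal Python sketch (function name is illustrative):

```python
def manhattan(p, q):
    # Sum of absolute coordinate differences between two points.
    return sum(abs(a - b) for a, b in zip(p, q))
```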

## Tf–idf

In information retrieval, tf–idf or TFIDF, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.
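Many weighting variants exist; a common simple one uses the raw term count for tf and log(N/df) for idf, where N is the corpus size and df the number of documents containing the term. A minimal Python sketch under that variant, with documents as token lists (function name is illustrative):

```python
import math

def tf_idf(term, doc, corpus):
    # tf: raw count of the term in this document.
    # idf: log(N / df), where df = number of documents containing the term.
    tf = doc.count(term)
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf
```

A term appearing in every document gets idf = log 1 = 0, so ubiquitous words are weighted down regardless of their frequency in any single document.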

## Triangle inequality

In mathematics, the triangle inequality states that for any triangle, the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side.

## Tversky index

The Tversky index, named after Amos Tversky, is an asymmetric similarity measure on sets that compares a variant to a prototype.
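The index generalizes Dice and Jaccard by weighting the two set differences separately: S(X, Y) = |X ∩ Y| / (|X ∩ Y| + α|X − Y| + β|Y − X|). A minimal Python sketch (function name is illustrative):

```python
def tversky(x: set, y: set, alpha: float = 0.5, beta: float = 0.5) -> float:
    # Asymmetric unless alpha == beta; alpha = beta = 0.5 recovers the
    # Dice coefficient, and alpha = beta = 1 recovers the Jaccard index.
    inter = len(x & y)
    return inter / (inter + alpha * len(x - y) + beta * len(y - x))
```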

## References
