# Approximate string matching

In computer science, approximate string matching (often colloquially referred to as fuzzy string searching) is the technique of finding strings that match a pattern approximately (rather than exactly). [1]
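
As a rough illustration, Python's standard library ships `difflib.get_close_matches`, which ranks candidates by a similarity ratio rather than by edit distance. The helper name `closest_words` and the sample word list below are assumptions made for this sketch.

```python
import difflib


def closest_words(pattern, candidates, max_results=3, cutoff=0.6):
    """Return up to max_results candidates that approximately match pattern.

    difflib scores each candidate with a similarity ratio in [0, 1];
    candidates scoring below cutoff are discarded.
    """
    return difflib.get_close_matches(pattern, candidates, n=max_results, cutoff=cutoff)


if __name__ == "__main__":
    words = ["ape", "apple", "peach", "puppy"]  # hypothetical dictionary
    print(closest_words("appel", words))
```

The `cutoff` parameter trades recall for precision: lowering it returns looser matches, raising it keeps only near-identical strings.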

## Acoustic fingerprint

An acoustic fingerprint is a condensed digital summary, a fingerprint, deterministically generated from an audio signal, that can be used to identify an audio sample or quickly locate similar items in an audio database.

## Agrep

agrep (approximate grep) is an open-source approximate string matching program, developed by Udi Manber and Sun Wu between 1988 and 1991, for use with the Unix operating system.

## Anti-spam techniques

Various anti-spam techniques are used to prevent email spam (unsolicited bulk email).

## Big O notation

Big O notation is a mathematical notation that describes the limiting behaviour of a function when the argument tends towards a particular value or infinity.

## Bitap algorithm

The bitap algorithm (also known as the shift-or, shift-and or Baeza-Yates–Gonnet algorithm) is an approximate string matching algorithm.
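
The following is a minimal sketch of the exact-matching core of bitap in its shift-and form; the extension that tolerates errors keeps one such bit state per allowed error and is omitted here. The function name `bitap_search` is an assumption for illustration.

```python
def bitap_search(text, pattern):
    """Return the index of the first exact occurrence of pattern in text, or -1."""
    m = len(pattern)
    if m == 0:
        return 0

    # For each character, a bitmask with bit i set when pattern[i] is that character.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    # Bit i of state is set when pattern[0..i] matches the text ending at position j.
    state = 0
    for j, ch in enumerate(text):
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & (1 << (m - 1)):
            return j - m + 1
    return -1


if __name__ == "__main__":
    print(bitap_search("hello world", "world"))
```

Each text character costs one shift, one OR and one AND, which is why the method is fast when the pattern fits in a machine word; Python integers are unbounded, so this sketch has no such length limit.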

## Computer science

Computer science deals with the theoretical foundations of information and computation, together with practical techniques for the implementation and application of these foundations.

## Concept search

A concept search (or conceptual search) is an automated information retrieval method that is used to search electronically stored unstructured text (for example, digital archives, email, scientific literature, etc.) for information that is conceptually similar to the information provided in a search query.

## Dynamic programming

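Dynamic programming is both a mathematical optimization method and a computer programming method that solves a problem by breaking it into simpler, overlapping subproblems and reusing their solutions.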

## Edit distance

In computational linguistics and computer science, edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other.

## Jaro–Winkler distance

In computer science and statistics, the Jaro–Winkler distance is a string metric for measuring the edit distance between two sequences.
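
A sketch of the usual formulation follows. It returns a similarity in [0, 1], so the corresponding distance would be one minus that value. The function names and the default prefix scale of 0.1 with a prefix cap of four characters are the commonly quoted choices, stated here as assumptions.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only within this window of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted for strings sharing a prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:4], s2[:4]):
        if c1 != c2:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


if __name__ == "__main__":
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```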

## JavaScript

JavaScript, often abbreviated as JS, is a high-level, interpreted programming language.

## Levenshtein distance

In information theory, linguistics and computer science, the Levenshtein distance is a string metric for measuring the difference between two sequences.
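
A minimal sketch of the standard dynamic-programming computation, assuming unit cost for insertion, deletion and substitution. Only two rows of the table are kept at a time; the function name `levenshtein` is chosen for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop to save memory

    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

The running time is proportional to the product of the two string lengths, while the memory use is proportional to the shorter one.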

## Locality-sensitive hashing

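Locality-sensitive hashing (LSH) hashes similar input items into the same buckets with high probability, which makes it useful for approximate nearest-neighbour search and for reducing the dimensionality of high-dimensional data.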

## Metaphone

Metaphone is a phonetic algorithm, published by Lawrence Philips in 1990, for indexing words by their English pronunciation.

## Metric tree

A metric tree is any tree data structure specialized to index data in metric spaces.
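
One well-known metric tree for strings is the BK-tree, which indexes words by their edit distance to each node and uses the triangle inequality to prune subtrees during a search. The sketch below is one possible layout, not a reference implementation; the class name `BKTree`, the tuple-based node representation and the sample words are assumptions.

```python
def levenshtein(a, b):
    """Unit-cost edit distance, used here as the metric."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


class BKTree:
    """Each node is (word, children), with children keyed by distance to word."""

    def __init__(self, distance):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query, max_distance):
        """Return sorted (distance, word) pairs within max_distance of query."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = self.distance(query, word)
            if d <= max_distance:
                results.append((d, word))
            # Triangle inequality: only these edges can lead to matches.
            for edge, child in children.items():
                if d - max_distance <= edge <= d + max_distance:
                    stack.append(child)
        return sorted(results)


if __name__ == "__main__":
    tree = BKTree(levenshtein)
    for w in ["book", "books", "cake", "boo", "cape", "cart", "boon", "cook"]:
        tree.add(w)
    print(tree.search("bool", 1))
```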

## N-gram

In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sample of text or speech.
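
For character-level n-grams the definition reduces to a sliding window over the string, sketched below. The function name `char_ngrams` is an assumption; the same idea applies to words or other items.

```python
def char_ngrams(text, n):
    """Return every contiguous run of n characters in text, in order."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


if __name__ == "__main__":
    print(char_ngrams("hello", 2))
```

Strings that share many n-grams tend to be similar, which is why n-gram overlap is often used as a cheap filter before a more expensive comparison.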

## Needleman–Wunsch algorithm

The Needleman–Wunsch algorithm is an algorithm used in bioinformatics to align protein or nucleotide sequences.
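
The sketch below computes only the optimal global alignment score, without the traceback that recovers the alignment itself. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for illustration.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score of the best global alignment of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal,                 # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,    # gap in b
                              score[i][j - 1] + gap)    # gap in a
    return score[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("GCATGCG", "GATTACA"))
```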

## Nucleotide

Nucleotides are organic molecules that serve as the monomer units for forming the nucleic acid polymers deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), both of which are essential biomolecules within all life-forms on Earth.

## Pattern

A pattern is a discernible regularity in the world or in a manmade design.

## Plagiarism detection

Plagiarism detection is the process of locating instances of plagiarism within a work or document.

## Programming tool

A programming tool or software development tool is a computer program that software developers use to create, debug, maintain, or otherwise support other programs and applications.

## Regular expression

A regular expression, regex or regexp (sometimes called a rational expression) is, in theoretical computer science and formal language theory, a sequence of characters that defines a search pattern.
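
As a small illustration using Python's standard `re` module, a single pattern can cover several spellings of a word. The pattern, the sample sentence and the helper name `find_spellings` are assumptions for this sketch.

```python
import re

# "u?" makes the letter u optional, so both spellings match.
COLOUR_PATTERN = re.compile(r"colou?r")


def find_spellings(text):
    """Return every occurrence of 'color' or 'colour' in text, in order."""
    return COLOUR_PATTERN.findall(text)


if __name__ == "__main__":
    print(find_spellings("The color red and the colour blue."))
```

A regular expression lists the accepted variants explicitly, whereas approximate matching accepts any string within some distance of the pattern.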

## Scala (programming language)

Scala is a general-purpose programming language providing support for functional programming and a strong static type system.

## Search engine indexing

Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval.

## Smith–Waterman algorithm

The Smith–Waterman algorithm performs local sequence alignment; that is, it determines similar regions between two strings, such as nucleic acid sequences or protein sequences.
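
A sketch of the scoring step follows, assuming a linear gap penalty and the illustrative values match +3, mismatch -3, gap -2. It returns only the best local score; recovering the aligned region would need a traceback, which is left out. The function name is hypothetical.

```python
def smith_waterman_score(a, b, match=3, mismatch=-3, gap=-2):
    """Score of the best local alignment between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at zero lets an alignment restart anywhere,
            # which is what makes the alignment local rather than global.
            score[i][j] = max(0,
                              diagonal,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            best = max(best, score[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TGTTACGG", "GGTTGACTA"))
```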

## Soundex

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English.
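
The sketch below follows the commonly described American Soundex rules: keep the first letter, encode the remaining consonants as digits, collapse adjacent repeats, and pad or truncate to four characters. It assumes ASCII names, and the function name `soundex` is chosen for illustration.

```python
# Consonant groups that share a digit; vowels, Y, H and W have no code.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _letter in _letters:
        _CODES[_letter] = _digit


def soundex(name):
    """Return the four-character Soundex code of name, or '' if it has no letters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    first = letters[0]
    digits = []
    previous = _CODES.get(first, "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != previous:
            digits.append(code)
        previous = code  # a vowel resets previous, so a later repeat is kept
    return (first + "".join(digits) + "000")[:4]


if __name__ == "__main__":
    for example in ("Robert", "Rupert", "Ashcraft", "Tymczak"):
        print(example, soundex(example))
```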

## Spell checker

In computing, a spell checker (or spell check) is an application program that flags words in a document that may not be spelled correctly.

## String (computer science)

In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable.

## String metric

In mathematics and computer science, a string metric (also known as a string similarity metric or string distance function) is a metric that measures distance ("inverse similarity") between two text strings for approximate string matching or comparison and in fuzzy string searching.

## Substring

A substring is a contiguous sequence of characters within a string.

## Suffix tree

In computer science, a suffix tree (also called PAT tree or, in an earlier form, position tree) is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values.

## Unix

Unix (trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in the 1970s at the Bell Labs research center by Ken Thompson, Dennis Ritchie, and others.
