An acoustic fingerprint is a condensed digital summary, deterministically generated from an audio signal, that can be used to identify an audio sample or quickly locate similar items in an audio database.
agrep (approximate grep) is an open-source approximate string matching program, developed by Udi Manber and Sun Wu between 1988 and 1991, for use with the Unix operating system.
Various anti-spam techniques are used to prevent email spam (unsolicited bulk email).
Big O notation is a mathematical notation that describes the limiting behaviour of a function when the argument tends towards a particular value or infinity.
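For the common case where the argument tends to infinity, the definition can be written out formally as below, followed by one small worked instance.

$$
f(x) = O\bigl(g(x)\bigr) \text{ as } x \to \infty
\iff
\exists\, M > 0,\ \exists\, x_0 \ \text{such that}\ |f(x)| \le M\,|g(x)| \ \text{for all}\ x \ge x_0 .
$$

For example, for $x > 0$ the inequality $3x^2 + 5x \le 4x^2$ is equivalent to $5x \le x^2$, i.e. $x \ge 5$, so $3x^2 + 5x = O(x^2)$ with $M = 4$ and $x_0 = 5$.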
The bitap algorithm (also known as the shift-or, shift-and or Baeza-Yates–Gonnet algorithm) is an approximate string matching algorithm.
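A minimal sketch of bitap in its shift-and form, assuming the Wu–Manber extension that allows up to `k` insertions, deletions or substitutions. The function name `bitap_search` is illustrative, and it reports the end index of each approximate match, since an approximate match has no single well-defined start.

```python
def bitap_search(text, pattern, k=0):
    """Return end indices in `text` where `pattern` matches with at most `k` edits."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    full = (1 << m) - 1
    # masks[c] has bit i set when pattern[i] == c
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    # R[d] bit i set: pattern[:i+1] matches a suffix of the text read so far
    # with at most d edits (the first d pattern chars can always be deleted)
    R = [((1 << d) - 1) & full for d in range(k + 1)]
    hits = []
    for j, ch in enumerate(text):
        b = masks.get(ch, 0)
        prev_old = R[0]                      # R[d-1] before this step
        R[0] = ((R[0] << 1) | 1) & b         # exact matching
        for d in range(1, k + 1):
            old = R[d]
            R[d] = ((((old << 1) | 1) & b)           # match
                    | ((prev_old << 1) | 1)          # substitution
                    | prev_old                       # extra character in text
                    | ((R[d - 1] << 1) | 1)) & full  # character missing from text
            prev_old = old
        if R[k] & (1 << (m - 1)):                    # whole pattern matched
            hits.append(j)
    return hits


print(bitap_search("this is a test", "test"))   # [13]
print(bitap_search("abcdef", "abxd", k=1))      # [3]: "abcd" is one substitution away
```

Each text character costs O(k) word operations, which is why bitap suits short patterns that fit in a machine word.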
Computer science deals with the theoretical foundations of information and computation, together with practical techniques for the implementation and application of these foundations.
A concept search (or conceptual search) is an automated information retrieval method that is used to search electronically stored unstructured text (for example, digital archives, email, or scientific literature) for information that is conceptually similar to the information provided in a search query.
Dynamic programming is both a mathematical optimization method and a computer programming method.
In computational linguistics and computer science, edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other.
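In computer science and statistics, the Jaro–Winkler distance is a string metric for measuring how similar two sequences are; it builds on the Jaro similarity, which counts matching characters and transpositions, and gives extra weight to a shared prefix.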
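A minimal sketch of the similarity, assuming the usual parameters (prefix scale `p = 0.1`, prefix capped at 4 characters); the corresponding distance is conventionally taken as one minus this similarity. The function names are illustrative.

```python
def jaro(s1, s2):
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # characters only "match" if they are within this distance of each other
    window = max(max(len1, len2) // 2 - 1, 0)
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # half the number of matched characters that appear in a different order
    a = [c for c, u in zip(s1, used1) if u]
    b = [c for c, u in zip(s2, used2) if u]
    t = sum(x != y for x, y in zip(a, b)) / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    sim = jaro(s1, s2)
    prefix = 0
    for x, y in zip(s1[:max_prefix], s2[:max_prefix]):
        if x != y:
            break
        prefix += 1
    # boost the score in proportion to the common prefix length
    return sim + prefix * p * (1 - sim)


print(round(jaro("MARTHA", "MARHTA"), 4))          # 0.9444
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```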
In information theory, linguistics and computer science, the Levenshtein distance is a string metric for measuring the difference between two sequences.
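The Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions, and is usually computed with dynamic programming. The sketch below keeps only two rows of the table; the function name is illustrative.

```python
def levenshtein(a, b):
    # prev[j] = distance between the prefix of `a` processed so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, or free match
        prev = cur
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

The time is O(len(a) · len(b)) and the memory O(len(b)).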
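Locality-sensitive hashing (LSH) hashes input items so that similar items map to the same buckets with high probability, which reduces the dimensionality of high-dimensional data and speeds up similarity search.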
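A minimal sketch using one common LSH family, MinHash for Jaccard similarity, with the banding trick to find candidate pairs. The seeded blake2b hash family, character 3-gram shingles and the 64-hash/16-band split are illustrative assumptions, as are all function names.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Set of character k-grams; assumes len(text) >= k."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _h(seed, token):
    # one member of a seeded hash family (blake2b over "seed:token")
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(tokens, num_hashes=64):
    """Signature: the minimum hash of the set under each seeded function."""
    return [min(_h(seed, t) for t in tokens) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of agreeing positions estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def candidate_pairs(docs, num_hashes=64, bands=16):
    """Documents whose signatures agree on any whole band share a bucket."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for name, text in docs.items():
        sig = minhash(shingles(text), num_hashes)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(name)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


docs = {"a": "the quick brown fox", "b": "the quick brown fox!", "c": "lorem ipsum dolor sit"}
print(candidate_pairs(docs))  # near-duplicates "a" and "b" are expected to pair up
```

More rows per band make a bucket collision require higher similarity; more bands make it easier for a similar pair to collide in at least one band.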
Metaphone is a phonetic algorithm, published by Lawrence Philips in 1990, for indexing words by their English pronunciation.
A metric tree is any tree data structure specialized to index data in metric spaces.
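One well-known metric tree for discrete distances is the BK-tree (Burkhard–Keller tree). The sketch below indexes words under Levenshtein distance and uses the triangle inequality to skip subtrees; the class and method names are illustrative.

```python
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class BKTree:
    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None  # node = (word, {distance_to_parent: child_node})

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            w, children = node
            d = self.distance(word, w)
            if d == 0:
                return  # already stored
            if d not in children:
                children[d] = (word, {})
                return
            node = children[d]

    def search(self, query, tolerance):
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = self.distance(query, word)
            if d <= tolerance:
                results.append(word)
            # triangle inequality: only children within d +/- tolerance can qualify
            for child_d, child in children.items():
                if d - tolerance <= child_d <= d + tolerance:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for w in ["book", "books", "cake", "boo", "cape", "cart"]:
    tree.add(w)
print(tree.search("bok", 1))  # ['boo', 'book']
```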
In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sample of text or speech.
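Extracting n-grams is a single slicing pass; the same sketch works for character n-grams of a string and for word n-grams of a token list. The function name is illustrative.

```python
def ngrams(seq, n):
    """Return the contiguous length-n slices of a string or list, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


print(ngrams("text", 2))                     # ['te', 'ex', 'xt']
print(ngrams("to be or not".split(), 2))     # [['to', 'be'], ['be', 'or'], ['or', 'not']]
```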
The Needleman–Wunsch algorithm is an algorithm used in bioinformatics to align protein or nucleotide sequences.
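Needleman–Wunsch computes an optimal global alignment by dynamic programming. A minimal sketch, assuming a simple scoring scheme (match +1, mismatch −1, gap −1) rather than the substitution matrices used for real protein data; the function name is illustrative.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)
    # trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("GCATGCG", "GATTACA")
print(score)   # 0
print(top)
print(bottom)
```

When several alignments tie, the traceback order (diagonal, then up, then left) decides which one is returned.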
Nucleotides are organic molecules that serve as the monomer units for forming the nucleic acid polymers deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), both of which are essential biomolecules within all life-forms on Earth.
A pattern is a discernible regularity in the world or in a man-made design.
Plagiarism detection is the process of locating instances of plagiarism within a work or document.
A programming tool or software development tool is a computer program that software developers use to create, debug, maintain, or otherwise support other programs and applications.
A regular expression (regex or regexp, sometimes called a rational expression) is a sequence of characters that defines a search pattern; the concept originates in theoretical computer science and formal language theory.
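A small illustration using Python's standard `re` module: one pattern describes a family of strings. The pattern and the helper name are illustrative.

```python
import re

# "u?" makes the "u" optional and "s?" allows a plural;
# \b requires word boundaries, so "discolor" is not matched
COLOR = re.compile(r"\bcolou?rs?\b", re.IGNORECASE)


def find_color_words(text):
    return COLOR.findall(text)


print(find_color_words("Color, colour and colours all match; discolor does not."))
# ['Color', 'colour', 'colours']
```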
Scala is a general-purpose programming language providing support for functional programming and a strong static type system.
Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval.
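The central data structure in most search engine indexes is the inverted index, which maps each term to the documents containing it. A minimal sketch, assuming lowercase word tokens and AND semantics for queries; real indexes add stemming, positions, ranking and compression. The function names are illustrative.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"\w+", text.lower()):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, query):
    """Ids of documents containing every query term."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))  # intersect posting lists
    return sorted(result)


docs = {1: "The quick brown fox", 2: "The lazy dog", 3: "A quick dog"}
index = build_index(docs)
print(search_all(index, "quick dog"))  # [3]
```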
The Smith–Waterman algorithm performs local sequence alignment, that is, it determines similar regions between two strings of nucleic acid sequences or protein sequences.
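Smith–Waterman differs from global alignment mainly by clamping cell scores at zero and tracing back from the highest-scoring cell. A minimal sketch with a linear gap penalty; the scoring values (match +2, mismatch −1, gap −1) and the function name are illustrative assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    n, m = len(a), len(b)
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # 0 lets a new local alignment start anywhere
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # trace back from the best cell until a zero cell ends the local region
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("XXXABCDYYY", "ZZABCDWW"))  # (8, 'ABCD', 'ABCD')
```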
Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English.
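A minimal sketch of American Soundex: keep the first letter, encode the remaining consonants as digits, collapse adjacent identical codes (H and W do not separate them, vowels do), then pad or truncate to four characters. Other Soundex variants handle these rules differently; the function name is illustrative.

```python
_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(name):
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    prev = _CODES.get(first, "")  # the first letter's code still suppresses repeats
    digits = []
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")  # vowels and Y have no code
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (first + "".join(digits) + "000")[:4]


for n in ["Robert", "Rupert", "Ashcraft", "Tymczak", "Pfister"]:
    print(n, soundex(n))
# Robert R163, Rupert R163, Ashcraft A261, Tymczak T522, Pfister P236
```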
In computing, a spell checker (or spell check) is an application program that flags words in a document that may not be spelled correctly.
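A toy spell checker can flag words missing from a word list and offer suggestions. This sketch uses Python's standard `difflib.get_close_matches`, which ranks candidates by `SequenceMatcher` similarity rather than by edit distance; the word list and the helper name are illustrative.

```python
import difflib
import re


def check_spelling(text, dictionary, n=3, cutoff=0.6):
    """Map each unknown word (lowercased) to a list of close dictionary words."""
    known = {w.lower() for w in dictionary}
    flagged = {}
    for word in re.findall(r"[A-Za-z']+", text):
        w = word.lower()
        if w not in known and w not in flagged:
            flagged[w] = difflib.get_close_matches(w, dictionary, n=n, cutoff=cutoff)
    return flagged


words = ["i", "ate", "an", "ape", "apple", "peach", "puppy"]
print(check_spelling("I ate an appel", words))  # {'appel': ['apple', 'ape']}
```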
In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable.
In mathematics and computer science, a string metric (also known as a string similarity metric or string distance function) is a metric that measures distance ("inverse similarity") between two text strings for approximate string matching or comparison and in fuzzy string searching.
A substring is a contiguous sequence of characters within a string.
In computer science, a suffix tree (also called PAT tree or, in an earlier form, position tree) is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values.
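A minimal sketch of a compressed suffix tree, built naively by inserting each suffix and splitting edges in O(n²) time; this is not Ukkonen's linear-time algorithm. Edge labels are stored as `(start, end)` index pairs into the text, and a terminal `$` (assumed absent from the input) guarantees every suffix ends at its own leaf. The class names are illustrative.

```python
class _Node:
    def __init__(self, suffix_index=None):
        self.children = {}                # first edge char -> (start, end, child)
        self.suffix_index = suffix_index  # set only on leaves


class SuffixTree:
    def __init__(self, text, terminal="$"):
        if terminal in text:
            raise ValueError("terminal symbol must not occur in text")
        self.text = text + terminal
        self.root = _Node()
        for i in range(len(self.text)):
            self._insert_suffix(i)

    def _insert_suffix(self, i):
        t, n = self.text, len(self.text)
        node, pos = self.root, i
        while True:
            edge = node.children.get(t[pos])
            if edge is None:  # no edge starts with this char: add a leaf
                node.children[t[pos]] = (pos, n, _Node(i))
                return
            start, end, child = edge
            k = 0
            while k < end - start and t[start + k] == t[pos + k]:
                k += 1
            if k == end - start:  # whole edge label matched: descend
                node, pos = child, pos + k
                continue
            # mismatch inside the edge: split it with a new internal node
            mid = _Node()
            mid.children[t[start + k]] = (start + k, end, child)
            mid.children[t[pos + k]] = (pos + k, n, _Node(i))
            node.children[t[pos]] = (start, start + k, mid)
            return

    def occurrences(self, pattern):
        """Sorted start positions of a non-empty `pattern` in the text."""
        t, node, i = self.text, self.root, 0
        while i < len(pattern):
            edge = node.children.get(pattern[i])
            if edge is None:
                return []
            start, end, child = edge
            for j in range(start, end):
                if i == len(pattern):
                    break
                if t[j] != pattern[i]:
                    return []
                i += 1
            node = child
        # every leaf below the matched point marks one occurrence
        found, stack = [], [node]
        while stack:
            cur = stack.pop()
            if cur.suffix_index is not None:
                found.append(cur.suffix_index)
            stack.extend(child for _, _, child in cur.children.values())
        return sorted(found)


tree = SuffixTree("banana")
print(tree.occurrences("ana"))  # [1, 3]
```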
Unix (trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development by Ken Thompson, Dennis Ritchie, and others began in 1969 at the Bell Labs research center.