Cluster analysis

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). [1]

169 relations: Adjusted mutual information, Affinity propagation, Algorithm, Animal, Anomaly detection, Artificial neural network, Association for Computing Machinery, Association for the Advancement of Artificial Intelligence, Balanced clustering, Biclustering, Big data, Bioinformatics, Biology, BIRCH, Canopy clustering algorithm, Centroid, Climatology, Clique (graph theory), Cluster-weighted modeling, Clustering high-dimensional data, Cohen's kappa, Community, Complete-linkage clustering, Computer graphics, Computer science, Conceptual clustering, Confusion matrix, Consensus clustering, Constrained clustering, Consumer, Correlation and dependence, Correlation clustering, Curse of dimensionality, Customer, Data analysis, Data compression, Data mining, Data stream clustering, Davies–Bouldin index, DBSCAN, Dendrogram, Determining the number of clusters in a data set, Deterministic algorithm, Digital data, Dimensionality reduction, DNA annotation, DNA microarray, Dunn index, Ecology, Edge detection, Educational data mining, Empirical distribution function, Enzyme, Evolutionary algorithm, Evolutionary biology, Expectation–maximization algorithm, Expressed sequence tag, F1 score, False positives and false negatives, Flickr, Fowlkes–Mallows index, Fuzzy clustering, Gene, Gene duplication, Genomics, Genotype, Gold standard (test), Google, Graph (discrete mathematics), Hans-Peter Kriegel, HCS clustering algorithm, Heidelberg University, Hierarchical clustering, High-dimensional statistics, Hopkins statistic, Human genetic clustering, Image, Image analysis, Image segmentation, Independent component analysis, Information retrieval, Information theory, Jaccard index, Journal of the American Statistical Association, K-means clustering, K-means++, K-medians clustering, K-medoids, Kernel density estimation, Knowledge extraction, Latent class model, List of gene families, Lloyd's algorithm, Local optimum, Machine learning, Markedness, Market research, Market segmentation, Marketing, Markov chain Monte Carlo, Mathematical chemistry, Matthews correlation coefficient, Mean shift, Median, Medical imaging, Medicine, Message passing, Metabolic pathway, Metric (mathematics), Multi-objective optimization, Multidimensional scaling, Multimodal distribution, Multivariate normal distribution, Mutual information, Nearest neighbor search, Neighbourhood components analysis, Neural network, New product development, Normal distribution, NP-hardness, Numerical taxonomy, OPTICS algorithm, Outline of object recognition, Overfitting, Parallel coordinates, Pattern recognition, Personality psychology, Phylogenetic tree, Plant, Population, Positioning (marketing), Positron emission tomography, Precision and recall, Principal component analysis, Probability distribution, R-tree, Rand index, Raymond Cattell, Recommender system, Robert Tryon, Sørensen–Dice coefficient, Self-organizing map, Sequence analysis, Sequence clustering, SIGKDD, Silhouette (clustering), Single-linkage clustering, Social network, Software evolution, Spectral clustering, Statistical classification, Statistical physics, Statistics, Stock keeping unit, Structured data analysis (statistics), SUBCLU, Supervised learning, Survey methodology, Systematics, Tissue (biology), Topological index, Transcriptomics technologies, Unsupervised learning, UPGMA, Variation of information, Voronoi diagram, World Wide Web, Yippy, Youden's J statistic.

Adjusted mutual information

In probability theory and information theory, adjusted mutual information, a variation of mutual information, may be used for comparing clusterings.


Affinity propagation

In statistics and data mining, affinity propagation (AP) is a clustering algorithm based on the concept of "message passing" between data points.


Algorithm

In mathematics and computer science, an algorithm is an unambiguous specification of how to solve a class of problems.


Animal

Animals are multicellular eukaryotic organisms that form the biological kingdom Animalia.


Anomaly detection

In data mining, anomaly detection (also outlier detection) is the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset.


Artificial neural network

Artificial neural networks (ANNs) or connectionist systems are computing systems vaguely inspired by the biological neural networks that constitute animal brains.


Association for Computing Machinery

The Association for Computing Machinery (ACM) is an international learned society for computing.


Association for the Advancement of Artificial Intelligence

The Association for the Advancement of Artificial Intelligence (AAAI) is an international, nonprofit, scientific society devoted to promoting research in, and responsible use of, artificial intelligence.


Balanced clustering

Balanced clustering is a special case of clustering where, in the strictest sense, cluster sizes are constrained to \lfloor n/k \rfloor or \lceil n/k \rceil, where n is the number of points and k is the number of clusters.
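
As a rough illustration of the size constraint, the sketch below computes cluster sizes that are each either the floor or the ceiling of n/k. The function name `balanced_sizes` is hypothetical, and the sketch only computes sizes; it does not decide which points go into which cluster.

```python
def balanced_sizes(n, k):
    """Split n points into k cluster sizes that differ by at most one."""
    if k <= 0:
        raise ValueError("k must be positive")
    base, extra = divmod(n, k)  # base == floor(n / k)
    # The first `extra` clusters receive one additional point, i.e. ceil(n / k).
    return [base + 1] * extra + [base] * (k - extra)


print(balanced_sizes(10, 3))  # [4, 3, 3]
```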


Biclustering

Biclustering, block clustering, co-clustering, or two-mode clustering is a data mining technique which allows simultaneous clustering of the rows and columns of a matrix.


Big data

Big data refers to data sets that are so large and complex that traditional data-processing application software is inadequate to deal with them.


Bioinformatics

Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data.


Biology

Biology is the natural science that studies life and living organisms, including their physical structure, chemical composition, function, development and evolution.


BIRCH

BIRCH (balanced iterative reducing and clustering using hierarchies) is an unsupervised data mining algorithm used to perform hierarchical clustering over particularly large data-sets.


Canopy clustering algorithm

The canopy clustering algorithm is an unsupervised pre-clustering algorithm introduced by Andrew McCallum, Kamal Nigam and Lyle Ungar in 2000.


Centroid

In mathematics and physics, the centroid or geometric center of a plane figure is the arithmetic mean position of all the points in the shape.
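
For a finite set of points, which is the case that matters in clustering, the arithmetic mean position can be sketched as follows. The function name `centroid` and the tuple representation of points are assumptions made for illustration.

```python
def centroid(points):
    """Arithmetic mean position of a non-empty list of equal-length coordinate tuples."""
    n = len(points)
    # zip(*points) groups the coordinates by axis: all x values, all y values, ...
    return tuple(sum(coords) / n for coords in zip(*points))


print(centroid([(0, 0), (4, 0), (4, 3), (0, 3)]))  # (2.0, 1.5)
```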


Climatology

Climatology (from Greek κλίμα, klima, "place, zone"; and -λογία, -logia) or climate science is the scientific study of climate, scientifically defined as weather conditions averaged over a period of time.


Clique (graph theory)

In the mathematical area of graph theory, a clique is a subset of vertices of an undirected graph such that every two distinct vertices in the clique are adjacent; that is, its induced subgraph is complete.
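
The definition translates directly into a pairwise adjacency check. The sketch below assumes an undirected graph stored as a dictionary that maps each vertex to the set of its neighbours; `is_clique` and the sample graph are hypothetical.

```python
from itertools import combinations


def is_clique(adjacency, vertices):
    """True if every two distinct vertices in `vertices` are adjacent."""
    return all(v in adjacency[u] for u, v in combinations(set(vertices), 2))


# Undirected example graph: a, b, c form a triangle; d hangs off c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(is_clique(graph, ["a", "b", "c"]))  # True
print(is_clique(graph, ["b", "c", "d"]))  # False
```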


Cluster-weighted modeling

In data mining, cluster-weighted modeling (CWM) is an algorithm-based approach to non-linear prediction of outputs (dependent variables) from inputs (independent variables) based on density estimation using a set of models (clusters) that are each notionally appropriate in a sub-region of the input space.


Clustering high-dimensional data

Clustering high-dimensional data is the cluster analysis of data with anywhere from a few dozen to many thousands of dimensions.


Cohen's kappa

Cohen's kappa coefficient (κ) is a statistic which measures inter-rater agreement for qualitative (categorical) items.
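
The entry does not give the formula; the usual definition is κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. A minimal sketch under that definition, with a hypothetical function name and made-up ratings:

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long lists of categorical ratings."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal frequencies per category.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)  # undefined when expected == 1


print(cohens_kappa(["y", "y", "n", "n"], ["y", "n", "n", "n"]))  # 0.5
```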


Community

A community is a small or large social unit (a group of living things) that has something in common, such as norms, religion, values, or identity.


Complete-linkage clustering

Complete-linkage clustering is one of several methods of agglomerative hierarchical clustering.


Computer graphics

Computer graphics are pictures and films created using computers.


Computer science

Computer science deals with the theoretical foundations of information and computation, together with practical techniques for the implementation and application of these foundations.


Conceptual clustering

Conceptual clustering is a machine learning paradigm for unsupervised classification developed mainly during the 1980s.


Confusion matrix

In the field of machine learning and specifically the problem of statistical classification, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix).
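
As a small illustration of the table layout, the sketch below counts how often each actual label was paired with each predicted label. Rows are actual labels and columns are predicted labels, which is one common orientation and an assumption here; the function name and sample labels are made up.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Return (labels, matrix) with matrix[i][j] = count of actual labels[i] predicted as labels[j]."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    return labels, [[counts[(a, p)] for p in labels] for a in labels]


actual = ["cat", "cat", "dog", "dog", "dog"]
predicted = ["cat", "dog", "dog", "dog", "cat"]
labels, matrix = confusion_matrix(actual, predicted)
print(labels)  # ['cat', 'dog']
print(matrix)  # [[1, 1], [1, 2]]
```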


Consensus clustering

Clustering is the assignment of objects into groups (called clusters) so that objects from the same cluster are more similar to each other than objects from different clusters.


Constrained clustering

In computer science, constrained clustering is a class of semi-supervised learning algorithms.


Consumer

A consumer is a person or organization that uses economic services or commodities.


Correlation and dependence

In statistics, dependence or association is any statistical relationship, whether causal or not, between two random variables or bivariate data.


Correlation clustering

Clustering is the problem of partitioning data points into groups based on their similarity.


Curse of dimensionality

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces (often with hundreds or thousands of dimensions) that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience.


Customer

In sales, commerce and economics, a customer (sometimes known as a client, buyer, or purchaser) is the recipient of a good, service, product or an idea - obtained from a seller, vendor, or supplier via a financial transaction or exchange for money or some other valuable consideration.


Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.


Data compression

In signal processing, data compression, source coding, or bit-rate reduction involves encoding information using fewer bits than the original representation.


Data mining

Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.


Data stream clustering

In computer science, data stream clustering is defined as the clustering of data that arrive continuously, such as telephone records, multimedia data, and financial transactions.


Davies–Bouldin index

The Davies–Bouldin index (DBI) (introduced by David L. Davies and Donald W. Bouldin in 1979) is a metric for evaluating clustering algorithms.


DBSCAN

Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996.


Dendrogram

A dendrogram (from Greek dendro "tree" and gramma "drawing") is a tree diagram frequently used to illustrate the arrangement of the clusters produced by hierarchical clustering.


Determining the number of clusters in a data set

Determining the number of clusters in a data set, a quantity often labelled k as in the k-means algorithm, is a frequent problem in data clustering, and is a distinct issue from the process of actually solving the clustering problem.


Deterministic algorithm

In computer science, a deterministic algorithm is an algorithm which, given a particular input, will always produce the same output, with the underlying machine always passing through the same sequence of states.


Digital data

Digital data, in information theory and information systems, is the discrete, discontinuous representation of information or works.


Dimensionality reduction

In statistics, machine learning, and information theory, dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables.


DNA annotation

DNA annotation or genome annotation is the process of identifying the locations of genes and all of the coding regions in a genome and determining what those genes do.


DNA microarray

A DNA microarray (also commonly known as DNA chip or biochip) is a collection of microscopic DNA spots attached to a solid surface.


Dunn index

The Dunn index (DI) (introduced by J. C. Dunn in 1974) is a metric for evaluating clustering algorithms.


Ecology

Ecology (from Greek οἶκος, "house", or "environment"; -λογία, "study of") is the branch of biology which studies the interactions among organisms and their environment.


Edge detection

Edge detection includes a variety of mathematical methods that aim at identifying points in a digital image at which the image brightness changes sharply or, more formally, has discontinuities.


Educational data mining

Educational data mining (EDM) describes a research field concerned with the application of data mining, machine learning and statistics to information generated from educational settings (e.g., universities and intelligent tutoring systems).


Empirical distribution function

In statistics, an empirical distribution function is the distribution function associated with the empirical measure of a sample.
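
Concretely, the empirical distribution function evaluated at x is the fraction of sample values less than or equal to x. A minimal sketch, with the hypothetical helper `ecdf` returning that function for a given sample:

```python
from bisect import bisect_right


def ecdf(sample):
    """Return F, where F(x) is the fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return F


F = ecdf([3, 1, 4, 1, 5])
print(F(1), F(3.5), F(5))  # 0.4 0.6 1.0
```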


Enzyme

Enzymes are macromolecular biological catalysts.


Evolutionary algorithm

In artificial intelligence, an evolutionary algorithm (EA) is a subset of evolutionary computation, a generic population-based metaheuristic optimization algorithm.


Evolutionary biology

Evolutionary biology is the subfield of biology that studies the evolutionary processes that produced the diversity of life on Earth, starting from a single common ancestor.


Expectation–maximization algorithm

In statistics, an expectation–maximization (EM) algorithm is an iterative method to find maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables.


Expressed sequence tag

In genetics, an expressed sequence tag (EST) is a short sub-sequence of a cDNA sequence.


F1 score

In statistical analysis of binary classification, the F1 score (also F-score or F-measure) is a measure of a test's accuracy.
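
The F1 score is the harmonic mean of precision and recall. The sketch below computes it from counts of true positives, false positives, and false negatives; the function name is hypothetical, and returning 0.0 when there are no true positives is a convention assumed here.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, from raw counts."""
    if tp == 0:
        return 0.0  # assumed convention: no true positives means a score of zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# precision = 0.8, recall = 2/3, so the score is 8/11 (about 0.727)
print(f1_score(tp=8, fp=2, fn=4))
```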


False positives and false negatives

In medical testing, and more generally in binary classification, a false positive is an error in data reporting in which a test result improperly indicates presence of a condition, such as a disease (the result is positive), when in reality it is not present, while a false negative is an error in which a test result improperly indicates no presence of a condition (the result is negative), when in reality it is present.


Flickr

Flickr (pronounced "flicker") is an image hosting service and video hosting service.


Fowlkes–Mallows index

The Fowlkes–Mallows index is an external evaluation method that is used to determine the similarity between two clusterings (clusters obtained after a clustering algorithm).


Fuzzy clustering

Fuzzy clustering (also referred to as soft clustering) is a form of clustering in which each data point can belong to more than one cluster.


Gene

In biology, a gene is a sequence of DNA or RNA that codes for a molecule that has a function.


Gene duplication

Gene duplication (or chromosomal duplication or gene amplification) is a major mechanism through which new genetic material is generated during molecular evolution.


Genomics

Genomics is an interdisciplinary field of science focusing on the structure, function, evolution, mapping, and editing of genomes.


Genotype

The genotype is the part of the genetic makeup of a cell, and therefore of an organism or individual, which determines one of its characteristics (phenotype).


Gold standard (test)

In medicine and statistics, a gold standard test is usually the diagnostic test or benchmark that is the best available under reasonable conditions.


Google

Google LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, search engine, cloud computing, software, and hardware.


Graph (discrete mathematics)

In mathematics, and more specifically in graph theory, a graph is a structure amounting to a set of objects in which some pairs of the objects are in some sense "related".


Hans-Peter Kriegel

Hans-Peter Kriegel (born 1 October 1948, Germany) is a German computer scientist and professor at the Ludwig Maximilian University of Munich, where he leads the Database Systems Group in the Department of Computer Science.


HCS clustering algorithm

The HCS (Highly Connected Subgraphs) clustering algorithm (also known as the HCS algorithm, and by other names such as Highly Connected Clusters/Components/Kernels) is an algorithm based on graph connectivity for cluster analysis. It first represents the similarity data in a similarity graph and then finds all the highly connected subgraphs as clusters.


Heidelberg University

Heidelberg University (German: Ruprecht-Karls-Universität Heidelberg; Latin: Universitas Ruperto Carola Heidelbergensis) is a public research university in Heidelberg, Baden-Württemberg, Germany.


Hierarchical clustering

In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis which seeks to build a hierarchy of clusters.


High-dimensional statistics

In statistical theory, the field of high-dimensional statistics studies data whose dimension is larger than dimensions considered in classical multivariate analysis.


Hopkins statistic

The Hopkins statistic (introduced by Brian Hopkins and John Gordon Skellam) is a way of measuring the cluster tendency of a data set.


Human genetic clustering

Human genetic clustering is the degree to which human genetic variation can be partitioned into a small number of groups or clusters.


Image

An image (from Latin imago) is an artifact that depicts visual perception, for example, a photo or a two-dimensional picture, that has a similar appearance to some subject, usually a physical object or a person, thus providing a depiction of it.


Image analysis

Image analysis is the extraction of meaningful information from images; mainly from digital images by means of digital image processing techniques.


Image segmentation

In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as super-pixels).


Independent component analysis

In signal processing, independent component analysis (ICA) is a computational method for separating a multivariate signal into additive subcomponents.


Information retrieval

Information retrieval (IR) is the activity of obtaining information system resources relevant to an information need from a collection of information resources.


Information theory

Information theory studies the quantification, storage, and communication of information.


Jaccard index

The Jaccard index, also known as Intersection over Union and the Jaccard similarity coefficient (originally coined coefficient de communauté by Paul Jaccard), is a statistic used for comparing the similarity and diversity of sample sets.
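
As the name "Intersection over Union" suggests, the index for two sets is the size of their intersection divided by the size of their union. A minimal sketch; the function name is hypothetical, and the value returned for two empty sets is a convention assumed here.

```python
def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union of two sets."""
    if not a and not b:
        return 1.0  # assumed convention for two empty sets
    return len(a & b) / len(a | b)


print(jaccard_index({1, 2, 3}, {2, 3, 4}))  # 0.5
```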


Journal of the American Statistical Association

The Journal of the American Statistical Association (JASA) is the primary journal published by the American Statistical Association, the main professional body for statisticians in the United States.


K-means clustering

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining.
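
The entry does not spell out a procedure. A common heuristic, usually attributed to Lloyd, alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The sketch below does this for one-dimensional data with caller-supplied starting centroids; the function name, the fixed iteration count, and the handling of empty clusters are all assumptions made for illustration.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Alternate assignment and update steps on 1-D data (Lloyd-style)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


centroids, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], centroids=[0, 5])
print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1, 2, 3], [10, 11, 12]]
```

The result depends on the starting centroids, so different seeds can converge to different local optima.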


K-means++

In data mining, k-means++ is an algorithm for choosing the initial values (or "seeds") for the k-means clustering algorithm.


K-medians clustering

In statistics and data mining, k-medians clustering is a cluster analysis algorithm.


K-medoids

The k-medoids algorithm is a clustering algorithm related to the k-means algorithm and the medoidshift algorithm.


Kernel density estimation

In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable.


Knowledge extraction

Knowledge extraction is the creation of knowledge from structured (relational databases, XML) and unstructured (text, documents, images) sources.


Latent class model

In statistics, a latent class model (LCM) relates a set of observed (usually discrete) multivariate variables to a set of latent variables.


List of gene families

This is a list of gene families or gene complexes, that is, sets of genes that occur across a number of different species and often serve similar biological functions.


Lloyd's algorithm

In computer science and electrical engineering, Lloyd's algorithm, also known as Voronoi iteration or relaxation, is an algorithm named after Stuart P. Lloyd for finding evenly spaced sets of points in subsets of Euclidean spaces and partitions of these subsets into well-shaped and uniformly sized convex cells.


Local optimum

In applied mathematics and computer science, a local optimum of an optimization problem is a solution that is optimal (either maximal or minimal) within a neighboring set of candidate solutions.


Machine learning

Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to "learn" (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.


Markedness

In linguistics and social sciences, markedness is the state of standing out as unusual or divergent in comparison to a more common or regular form.


Market research

Market research (also in some contexts known as industrial research) is any organized effort to gather information about target markets or customers.


Market segmentation

Market segmentation is the process of dividing a broad consumer or business market, normally consisting of existing and potential customers, into sub-groups of consumers (known as segments) based on some type of shared characteristics.


Marketing

Marketing is the study and management of exchange relationships.


Markov chain Monte Carlo

In statistics, Markov chain Monte Carlo (MCMC) methods comprise a class of algorithms for sampling from a probability distribution.


Mathematical chemistry

Mathematical chemistry is the area of research engaged in novel applications of mathematics to chemistry; it concerns itself principally with the mathematical modeling of chemical phenomena.


Matthews correlation coefficient

The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary (two-class) classifications, introduced by biochemist Brian W. Matthews in 1975.


Mean shift

Mean shift is a non-parametric feature-space analysis technique for locating the maxima of a density function, a so-called mode-seeking algorithm.


Median

The median is the value separating the higher half of a data sample, a population, or a probability distribution, from the lower half.
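
For a finite data sample this amounts to sorting the values and taking the middle one, or the mean of the two middle ones when the count is even. A minimal sketch with a hypothetical function name:

```python
def median(values):
    """Middle value of a non-empty sample (mean of the two middle values if the count is even)."""
    if not values:
        raise ValueError("median of an empty sample is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 9]))     # 5
print(median([5, 1, 9, 3]))  # 4.0
```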


Medical imaging

Medical imaging is the technique and process of creating visual representations of the interior of a body for clinical analysis and medical intervention, as well as visual representation of the function of some organs or tissues (physiology).


Medicine

Medicine is the science and practice of the diagnosis, treatment, and prevention of disease.


Message passing

In computer science, message passing is a technique for invoking behavior (i.e., running a program) on a computer.


Metabolic pathway

In biochemistry, a metabolic pathway is a linked series of chemical reactions occurring within a cell.


Metric (mathematics)

In mathematics, a metric or distance function is a function that defines a distance between each pair of elements of a set.


Multi-objective optimization

Multi-objective optimization (also known as multi-objective programming, vector optimization, multicriteria optimization, multiattribute optimization or Pareto optimization) is an area of multiple criteria decision making, that is concerned with mathematical optimization problems involving more than one objective function to be optimized simultaneously.


Multidimensional scaling

Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset.


Multimodal distribution

In statistics, a bimodal distribution is a continuous probability distribution with two different modes.


Multivariate normal distribution

In probability theory and statistics, the multivariate normal distribution or multivariate Gaussian distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions.


Mutual information

In probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables.
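
For discrete variables the standard definition is I(X;Y) = Σ p(x,y) log( p(x,y) / (p(x) p(y)) ). The sketch below estimates this in bits from two equally long lists of paired observations, using relative frequencies as probabilities; the function name and the plug-in estimate are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired observations."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # p(x,y) / (p(x) p(y)) simplifies to count * n / (count_x * count_y).
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0 (fully dependent)
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0 (independent)
```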


Nearest neighbor search

Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest (or most similar) to a given point.
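
The simplest way to solve this problem is a linear scan over all candidates, sketched below in Python. The function name `nearest_neighbor` and the choice of Euclidean distance are assumptions made for this example; practical systems often use spatial indexes instead of a full scan.

```python
import math


def nearest_neighbor(points, query):
    """Return the point in `points` closest to `query` (brute-force linear scan)."""
    if not points:
        raise ValueError("points must not be empty")
    # Euclidean distance is assumed here; any distance function could be substituted.
    return min(points, key=lambda p: math.dist(p, query))


candidates = [(0, 0), (5, 5), (2, 1)]
print(nearest_neighbor(candidates, (2, 2)))  # (2, 1)
```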

New!!: Cluster analysis and Nearest neighbor search · See more »

Neighbourhood components analysis

Neighbourhood components analysis is a supervised learning method for classifying multivariate data into distinct classes according to a given distance metric over the data.

New!!: Cluster analysis and Neighbourhood components analysis · See more »

Neural network

The term neural network was traditionally used to refer to a network or circuit of neurons.

New!!: Cluster analysis and Neural network · See more »

New product development

In business and engineering, new product development (NPD) covers the complete process of bringing a new product to market.

New!!: Cluster analysis and New product development · See more »

Normal distribution

In probability theory, the normal (or Gaussian or Gauss or Laplace–Gauss) distribution is a very common continuous probability distribution.

New!!: Cluster analysis and Normal distribution · See more »


NP-hardness

NP-hardness (non-deterministic polynomial-time hardness), in computational complexity theory, is the defining property of a class of problems that are, informally, "at least as hard as the hardest problems in NP".

New!!: Cluster analysis and NP-hardness · See more »

Numerical taxonomy

Numerical taxonomy is a classification system in biological systematics which deals with the grouping by numerical methods of taxonomic units based on their character states.

New!!: Cluster analysis and Numerical taxonomy · See more »

OPTICS algorithm

Ordering points to identify the clustering structure (OPTICS) is an algorithm for finding density-based clusters in spatial data.

New!!: Cluster analysis and OPTICS algorithm · See more »

Outline of object recognition

The following outline is provided as an overview of and topical guide to object recognition: Object recognition – technology in the field of computer vision for finding and identifying objects in an image or video sequence.

New!!: Cluster analysis and Outline of object recognition · See more »


Overfitting

In statistics, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably".

New!!: Cluster analysis and Overfitting · See more »

Parallel coordinates

Parallel coordinates are a common way of visualizing high-dimensional geometry and analyzing multivariate data.

New!!: Cluster analysis and Parallel coordinates · See more »

Pattern recognition

Pattern recognition is a branch of machine learning that focuses on the recognition of patterns and regularities in data, although it is in some cases considered to be nearly synonymous with machine learning.

New!!: Cluster analysis and Pattern recognition · See more »

Personality psychology

Personality psychology is a branch of psychology that studies personality and its variation among individuals.

New!!: Cluster analysis and Personality psychology · See more »

Phylogenetic tree

A phylogenetic tree or evolutionary tree is a branching diagram or "tree" showing the evolutionary relationships among various biological species or other entities—their phylogeny—based upon similarities and differences in their physical or genetic characteristics.

New!!: Cluster analysis and Phylogenetic tree · See more »


Plant

Plants are mainly multicellular, predominantly photosynthetic eukaryotes of the kingdom Plantae.

New!!: Cluster analysis and Plant · See more »


Population

In biology, a population is all the organisms of the same group or species, which live in a particular geographical area, and have the capability of interbreeding.

New!!: Cluster analysis and Population · See more »

Positioning (marketing)

Positioning refers to the place that a brand occupies in the mind of the customer and how it is distinguished from products from competitors.

New!!: Cluster analysis and Positioning (marketing) · See more »

Positron emission tomography

Positron-emission tomography (PET) is a nuclear medicine functional imaging technique that is used to observe metabolic processes in the body as an aid to the diagnosis of disease.

New!!: Cluster analysis and Positron emission tomography · See more »

Precision and recall

In pattern recognition, information retrieval and binary classification, precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances, while recall (also known as sensitivity) is the fraction of relevant instances that have been retrieved over the total number of relevant instances.
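
The two fractions can be sketched in Python as follows. The function name `precision_recall`, the document identifiers, and the convention of returning 0.0 when a denominator is empty are assumptions made for this example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and a set of relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were actually retrieved
    # Assumed convention: an empty denominator yields 0.0 instead of an error.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


p, r = precision_recall(retrieved={"d1", "d2", "d3", "d4"},
                        relevant={"d1", "d3", "d5"})
print(p, r)  # 0.5 0.6666666666666666
```

Here two of the four retrieved items are relevant (precision 1/2), and two of the three relevant items were retrieved (recall 2/3).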

New!!: Cluster analysis and Precision and recall · See more »

Principal component analysis

Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.

New!!: Cluster analysis and Principal component analysis · See more »

Probability distribution

In probability theory and statistics, a probability distribution is a mathematical function that provides the probabilities of occurrence of different possible outcomes in an experiment.

New!!: Cluster analysis and Probability distribution · See more »


R-tree

R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons.

New!!: Cluster analysis and R-tree · See more »

Rand index

The Rand index or Rand measure (named after William M. Rand) in statistics, and in particular in data clustering, is a measure of the similarity between two data clusterings.
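
A minimal Python sketch of the usual pair-counting definition: the Rand index is the fraction of element pairs on which the two clusterings agree, meaning the pair is either grouped together in both or separated in both. The function name `rand_index` and the label-list representation of a clustering are assumptions made for this example.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of element pairs on which two clusterings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same elements")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two elements")
    # A pair agrees if it is together in both clusterings or apart in both.
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)


print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
print(round(rand_index([0, 0, 1, 1], [0, 1, 0, 1]), 3))  # 0.333
```

The first call shows that the index depends only on the grouping, not on the label names.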

New!!: Cluster analysis and Rand index · See more »

Raymond Cattell

Raymond Bernard Cattell (20 March 1905 – 2 February 1998) was a British and American psychologist, known for his psychometric research into intrapersonal psychological structure.

New!!: Cluster analysis and Raymond Cattell · See more »

Recommender system

A recommender system or a recommendation system (sometimes replacing "system" with a synonym such as platform or engine) is a subclass of information filtering system that seeks to predict the "rating" or "preference" a user would give to an item.

New!!: Cluster analysis and Recommender system · See more »

Robert Tryon

Robert Choate Tryon (September 4, 1901 – September 27, 1967) was an American behavioral psychologist, who pioneered the study of hereditary trait inheritance and learning in animals.

New!!: Cluster analysis and Robert Tryon · See more »

Sørensen–Dice coefficient

The Sørensen–Dice index, also known by several other names, is a statistic used for comparing the similarity of two samples.
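
For two finite sets A and B the statistic is commonly computed as 2|A ∩ B| / (|A| + |B|). The Python sketch below assumes set-valued samples; the function name `dice_coefficient` and the convention of returning 1.0 for two empty sets are assumptions made for this example.

```python
def dice_coefficient(a, b):
    """Sørensen–Dice similarity of two finite samples, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty samples count as identical
    return 2 * len(a & b) / (len(a) + len(b))


print(round(dice_coefficient({"a", "b", "c"}, {"b", "c", "d"}), 3))  # 0.667
```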

New!!: Cluster analysis and Sørensen–Dice coefficient · See more »

Self-organizing map

A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural network (ANN) that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map, and is therefore a method for dimensionality reduction.

New!!: Cluster analysis and Self-organizing map · See more »

Sequence analysis

In bioinformatics, sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of a wide range of analytical methods to understand its features, function, structure, or evolution.

New!!: Cluster analysis and Sequence analysis · See more »

Sequence clustering

In bioinformatics, sequence clustering algorithms attempt to group biological sequences that are somehow related.

New!!: Cluster analysis and Sequence clustering · See more »


SIGKDD

SIGKDD is the Association for Computing Machinery's (ACM) Special Interest Group (SIG) on Knowledge Discovery and Data Mining.

New!!: Cluster analysis and SIGKDD · See more »

Silhouette (clustering)

Silhouette refers to a method of interpretation and validation of consistency within clusters of data.

New!!: Cluster analysis and Silhouette (clustering) · See more »

Single-linkage clustering

In statistics, single-linkage clustering is one of several methods of hierarchical clustering.

New!!: Cluster analysis and Single-linkage clustering · See more »

Social network

A social network is a social structure made up of a set of social actors (such as individuals or organizations), sets of dyadic ties, and other social interactions between actors.

New!!: Cluster analysis and Social network · See more »

Software evolution

Software evolution is the term used in software engineering (specifically software maintenance) to refer to the process of developing software initially, then repeatedly updating it for various reasons.

New!!: Cluster analysis and Software evolution · See more »

Spectral clustering

In multivariate statistics and the clustering of data, spectral clustering techniques make use of the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions.

New!!: Cluster analysis and Spectral clustering · See more »

Statistical classification

In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.

New!!: Cluster analysis and Statistical classification · See more »

Statistical physics

Statistical physics is a branch of physics that uses methods of probability theory and statistics, and particularly the mathematical tools for dealing with large populations and approximations, in solving physical problems.

New!!: Cluster analysis and Statistical physics · See more »


Statistics

Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data.

New!!: Cluster analysis and Statistics · See more »

Stock keeping unit

In the field of inventory management, a stock keeping unit (SKU) is a distinct type of item for sale, such as a product or service, and all attributes associated with the item type that distinguish it from other item types.

New!!: Cluster analysis and Stock keeping unit · See more »

Structured data analysis (statistics)

Structured data analysis is the statistical data analysis of structured data.

New!!: Cluster analysis and Structured data analysis (statistics) · See more »


SUBCLU

SUBCLU is an algorithm for clustering high-dimensional data by Karin Kailing, Hans-Peter Kriegel and Peer Kröger.

New!!: Cluster analysis and SUBCLU · See more »

Supervised learning

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs.

New!!: Cluster analysis and Supervised learning · See more »

Survey methodology

Survey methodology, a field of applied statistics concerned with human research surveys, studies the sampling of individual units from a population and associated techniques of survey data collection, such as questionnaire construction and methods for improving the number and accuracy of responses to surveys.

New!!: Cluster analysis and Survey methodology · See more »


Systematics

Biological systematics is the study of the diversification of living forms, both past and present, and the relationships among living things through time.

New!!: Cluster analysis and Systematics · See more »

Tissue (biology)

In biology, tissue is a cellular organizational level between cells and a complete organ.

New!!: Cluster analysis and Tissue (biology) · See more »

Topological index

In the fields of chemical graph theory, molecular topology, and mathematical chemistry, a topological index, also known as a connectivity index, is a type of molecular descriptor that is calculated based on the molecular graph of a chemical compound.

New!!: Cluster analysis and Topological index · See more »

Transcriptomics technologies

Transcriptomics technologies are the techniques used to study an organism’s transcriptome, the sum of all of its RNA transcripts.

New!!: Cluster analysis and Transcriptomics technologies · See more »

Unsupervised learning

Unsupervised machine learning is the machine learning task of inferring a function that describes the structure of "unlabeled" data (i.e. data that has not been classified or categorized).

New!!: Cluster analysis and Unsupervised learning · See more »


UPGMA

UPGMA (Unweighted Pair Group Method with Arithmetic Mean) is a simple agglomerative (bottom-up) hierarchical clustering method.

New!!: Cluster analysis and UPGMA · See more »

Variation of information

In probability theory and information theory, the variation of information or shared information distance is a measure of the distance between two clusterings (partitions of elements).

New!!: Cluster analysis and Variation of information · See more »

Voronoi diagram

In mathematics, a Voronoi diagram is a partitioning of a plane into regions based on distance to points in a specific subset of the plane.

New!!: Cluster analysis and Voronoi diagram · See more »

World Wide Web

The World Wide Web (abbreviated WWW or the Web) is an information space where documents and other web resources are identified by Uniform Resource Locators (URLs), interlinked by hypertext links, and accessible via the Internet.

New!!: Cluster analysis and World Wide Web · See more »


Yippy

Yippy (formerly Clusty) is a metasearch engine that offers clusters of results; it was developed by Vivísimo, which was later acquired by IBM and renamed IBM Watson Explorer.

New!!: Cluster analysis and Yippy · See more »

Youden's J statistic

Youden's J statistic (also called Youden's index) is a single statistic that captures the performance of a dichotomous diagnostic test.
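
The statistic is usually defined as J = sensitivity + specificity − 1. The Python sketch below computes it from the four cells of a confusion matrix; the function name `youdens_j` and the example counts are assumptions made for this illustration.

```python
def youdens_j(tp, fn, tn, fp):
    """Youden's J = sensitivity + specificity - 1 for a dichotomous test."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity + specificity - 1


# Hypothetical counts: 80 of 100 diseased and 90 of 100 healthy cases classified correctly.
print(round(youdens_j(tp=80, fn=20, tn=90, fp=10), 3))  # 0.7
```

A test that performs no better than chance scores 0, and a perfect test scores 1.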

New!!: Cluster analysis and Youden's J statistic · See more »

Redirects here:

Agglomerative clustering, Cluster (statistics), Cluster Analysis, Cluster analyses, Cluster tendency, Cluster validation, Clustered data, Clustering algorithm, Clustering metric, Data Clustering, Data clustering, Density-based clustering, Soft clustering.


[1] https://en.wikipedia.org/wiki/Cluster_analysis
