112 relations: Accord.NET, ALGLIB, Apache Mahout, Apache Spark, Association for Computational Linguistics, Astronomy, Autoencoder, Ayasdi, Bell Labs, BFR algorithm, Bilateral filter, Centroid, Centroidal Voronoi tessellation, Cluster analysis, Color quantization, Computer graphics, Computer vision, CrimeStat, Data mining, Data mining in agriculture, David Mount, Determining the number of clusters in a data set, Discrete & Computational Geometry, ELKI, Euclidean distance, Expectation–maximization algorithm, Feature learning, Free and open-source software, Geostatistics, GNU Octave, Head/tail Breaks, Heuristic (computer science), Hugo Steinhaus, IEEE Transactions on Information Theory, Image segmentation, Independent component analysis, Integer lattice, Iris (plant), Iris flower data set, Jenks natural breaks optimization, Journal of the Royal Statistical Society, Julia (programming language), K q-flats, K-d tree, K-means++, K-medians clustering, K-medoids, K-nearest neighbors algorithm, K-SVD, KNIME, Lecture Notes in Computer Science, Linde–Buzo–Gray algorithm, Linear classifier, Lloyd's algorithm, Local optimum, Machine learning, Machine Learning (journal), MapReduce, Market segmentation, MATLAB, Mean, Mean shift, Medoid, Metric (mathematics), Mixture model, MLPACK (C++ library), Named-entity recognition, Nathan Netanyahu, Natural language processing, Nearest centroid classifier, Normal distribution, NP-hardness, OpenCV, Orange (software), Palette (painting), Partition of a set, Principal component analysis, Proceedings of the Royal Society, Proprietary software, PSPP, Pulse-code modulation, R (programming language), Radial basis function, Radial basis function network, RapidMiner, Restricted Boltzmann machine, Rocchio algorithm, Sampling (statistics), SAP HANA, SAS (software), Scikit-learn, SciPy, Self-organizing map, Semi-supervised learning, Signal processing, Silhouette (clustering), Smoothed analysis, SPSS, Stata, Supervised learning, Symposium on Computational Geometry, Taxicab geometry, Torch (machine learning), Triangle inequality, Unsupervised learning, Variance, Vector quantization, Voronoi diagram, Weka (machine learning), Whitening transformation, Wolfram Mathematica, Worst-case complexity.
Accord.NET is a framework for scientific computing in .NET.
ALGLIB is a cross-platform open source numerical analysis and data processing library.
Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily in the areas of collaborative filtering, clustering and classification.
Apache Spark is an open-source cluster-computing framework.
The Association for Computational Linguistics (ACL) is the international scientific and professional society for people working on problems involving natural language and computation.
Astronomy (from ἀστρονομία) is a natural science that studies celestial objects and phenomena.
An autoencoder is a type of artificial neural network used to learn efficient data codings in an unsupervised manner.
Ayasdi is a machine intelligence software company that offers a software platform and applications to organizations looking to analyze and build predictive models using big data or high-dimensional data sets.
Nokia Bell Labs (formerly named AT&T Bell Laboratories, Bell Telephone Laboratories and Bell Labs) is an American research and scientific development company, owned by Finnish company Nokia.
The BFR algorithm, named after its inventors Bradley, Fayyad and Reina, is a variant of the k-means algorithm that is designed to cluster data in a high-dimensional Euclidean space.
A bilateral filter is a non-linear, edge-preserving, and noise-reducing smoothing filter for images.
In mathematics and physics, the centroid or geometric center of a plane figure is the arithmetic mean position of all the points in the shape.
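For a finite set of points, as in the k-means update step, this reduces to the coordinate-wise mean:

```latex
% Centroid of n points x_1, ..., x_n in R^d (coordinate-wise mean)
\mathbf{c} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i
```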
In geometry, a centroidal Voronoi tessellation (CVT) is a special type of Voronoi tessellation or Voronoi diagram.
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
In computer graphics, color quantization or color image quantization is a process that reduces the number of distinct colors used in an image, usually with the intention that the new image should be as visually similar as possible to the original image.
Computer graphics are pictures and films created using computers.
Computer vision is a field that deals with how computers can be made to gain high-level understanding from digital images or videos.
CrimeStat is a crime mapping software program.
Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.
Data mining in agriculture is a relatively recent research topic that applies data mining techniques to agricultural data.
David Mount is a professor in the Department of Computer Science at the University of Maryland, College Park, whose research is in computational geometry.
Determining the number of clusters in a data set, a quantity often labelled k as in the k-means algorithm, is a frequent problem in data clustering, and is a distinct issue from the process of actually solving the clustering problem.
Discrete & Computational Geometry is a peer-reviewed mathematics journal published quarterly by Springer.
ELKI (for Environment for DeveLoping KDD-Applications Supported by Index-Structures) is a knowledge discovery in databases (KDD, "data mining") software framework developed for use in research and teaching originally at the database systems research unit of Professor Hans-Peter Kriegel at the Ludwig Maximilian University of Munich, Germany.
In mathematics, the Euclidean distance or Euclidean metric is the "ordinary" straight-line distance between two points in Euclidean space.
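For points p and q in n-dimensional space, the formula is:

```latex
% Euclidean distance between p and q in R^n
d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
```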
In statistics, an expectation–maximization (EM) algorithm is an iterative method to find maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables.
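Schematically, iteration t alternates an expectation (E) step, which forms the expected complete-data log-likelihood under the current parameter estimate, and a maximization (M) step, which re-estimates the parameters:

```latex
% One EM iteration: the E-step builds Q, the M-step maximizes it
Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{Z \mid X, \theta^{(t)}}\!\left[\log L(\theta; X, Z)\right],
\qquad
\theta^{(t+1)} = \operatorname*{arg\,max}_{\theta} \, Q(\theta \mid \theta^{(t)})
```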
In machine learning, feature learning or representation learning is a set of techniques that allows a system to automatically discover the representations needed for feature detection or classification from raw data.
Free and open-source software (FOSS) is software that can be classified as both free software and open-source software.
Geostatistics is a branch of statistics focusing on spatial or spatiotemporal datasets.
GNU Octave is software featuring a high-level programming language, primarily intended for numerical computations.
Head/tail breaks is a clustering algorithm scheme for data with a heavy-tailed distribution such as power laws and lognormal distributions.
In computer science, artificial intelligence, and mathematical optimization, a heuristic (from Greek εὑρίσκω "I find, discover") is a technique designed for solving a problem more quickly when classic methods are too slow, or for finding an approximate solution when classic methods fail to find any exact solution.
Władysław Hugo Dionizy Steinhaus (January 14, 1887 – February 25, 1972) was a Jewish-Polish mathematician and educator.
IEEE Transactions on Information Theory is a monthly peer-reviewed scientific journal published by the IEEE Information Theory Society.
In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as super-pixels).
In signal processing, independent component analysis (ICA) is a computational method for separating a multivariate signal into additive subcomponents.
In mathematics, the n-dimensional integer lattice (or cubic lattice), denoted Z^n, is the lattice in the Euclidean space R^n whose lattice points are n-tuples of integers.
Iris is a genus of 260–300 species of flowering plants with showy flowers.
The Iris flower data set or Fisher's Iris data set is a multivariate data set introduced by the British statistician and biologist Ronald Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems" as an example of linear discriminant analysis.
The Jenks optimization method, also called the Jenks natural breaks classification method, is a data clustering method designed to determine the best arrangement of values into different classes.
The Journal of the Royal Statistical Society is a peer-reviewed scientific journal of statistics.
Julia is a high-level dynamic programming language designed for high-performance numerical analysis and computational science; it is fast without requiring separate compilation, while also being effective for general-purpose programming, web use, or as a specification language.
In data mining and machine learning, the k q-flats algorithm is an iterative method that aims to partition m observations into k clusters, where each cluster is close to a q-flat for a given integer q.
In computer science, a k-d tree (short for k-dimensional tree) is a space-partitioning data structure for organizing points in a k-dimensional space.
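A brief usage sketch with SciPy's KDTree (SciPy also appears in this glossary); nearest-neighbor queries of this kind are one way to accelerate the k-means assignment step:

```python
import numpy as np
from scipy.spatial import KDTree

points = np.random.default_rng(0).uniform(size=(1000, 2))
tree = KDTree(points)                     # build the k-d tree once
dist, idx = tree.query([0.5, 0.5], k=3)   # 3 nearest neighbors of (0.5, 0.5)
print(dist, idx)
```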
In data mining, k-means++ is an algorithm for choosing the initial values (or "seeds") for the ''k''-means clustering algorithm.
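A minimal sketch of the k-means++ seeding rule in NumPy; the function name and random seeds here are illustrative, not part of any particular library:

```python
import numpy as np

def kmeans_pp_seeds(X, k, rng=np.random.default_rng(0)):
    """Pick k initial centers from X by D^2 weighting (the k-means++ rule)."""
    n = X.shape[0]
    centers = [X[rng.integers(n)]]   # first center: chosen uniformly at random
    for _ in range(k - 1):
        C = np.array(centers)
        # Squared distance from each point to its nearest chosen center.
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).min(axis=1)
        # Next center: sampled with probability proportional to d2.
        centers.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centers)

X = np.random.default_rng(1).normal(size=(200, 2))
print(kmeans_pp_seeds(X, k=3))
```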
In statistics and data mining, k-medians clustering is a cluster analysis algorithm.
The k-medoids algorithm is a clustering algorithm related to the k-means algorithm and the medoidshift algorithm.
In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression.
In applied mathematics, K-SVD is a dictionary learning algorithm for creating a dictionary for sparse representations, via a singular value decomposition approach.
KNIME, the Konstanz Information Miner, is a free and open-source data analytics, reporting and integration platform.
Springer Lecture Notes in Computer Science (LNCS) is a series of computer science books published by Springer Science+Business Media (formerly Springer-Verlag) since 1973.
The Linde–Buzo–Gray algorithm (introduced by Yoseph Linde, Andrés Buzo and Robert M. Gray in 1980) is a vector quantization algorithm to derive a good codebook.
In the field of machine learning, the goal of statistical classification is to use an object's characteristics to identify which class (or group) it belongs to; a linear classifier achieves this by making a classification decision based on the value of a linear combination of those characteristics.
In computer science and electrical engineering, Lloyd's algorithm, also known as Voronoi iteration or relaxation, is an algorithm named after Stuart P. Lloyd for finding evenly spaced sets of points in subsets of Euclidean spaces and partitions of these subsets into well-shaped and uniformly sized convex cells.
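In the k-means setting, each iteration assigns points to their nearest center and then recomputes each center as the centroid of its points. A minimal NumPy sketch (the function name is illustrative), which pairs naturally with the k-means++ seeding sketch above:

```python
import numpy as np

def lloyd(X, centers, max_iter=100):
    """Alternate nearest-center assignment and centroid update (Lloyd/k-means)."""
    for _ in range(max_iter):
        # Assignment step: label each point with its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points,
        # keeping the old center if a cluster ends up empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(len(centers))])
        if np.allclose(new, centers):   # converged: centers stopped moving
            return new, labels
        centers = new
    return centers, labels
```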
In applied mathematics and computer science, a local optimum of an optimization problem is a solution that is optimal (either maximal or minimal) within a neighboring set of candidate solutions.
Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to "learn" (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.
Machine Learning is a peer-reviewed scientific journal, published since 1986.
MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.
Market segmentation is the process of dividing a broad consumer or business market, normally consisting of existing and potential customers, into sub-groups of consumers (known as segments) based on some type of shared characteristics.
MATLAB (matrix laboratory) is a multi-paradigm numerical computing environment and proprietary programming language developed by MathWorks.
In mathematics, mean has several different definitions depending on the context.
Mean shift is a non-parametric feature-space analysis technique for locating the maxima of a density function, a so-called mode-seeking algorithm.
Medoids are representative objects of a data set or a cluster within a data set whose average dissimilarity to all the objects in the cluster is minimal.
In mathematics, a metric or distance function is a function that defines a distance between each pair of elements of a set.
In statistics, a mixture model is a probabilistic model for representing the presence of subpopulations within an overall population, without requiring that an observed data set should identify the sub-population to which an individual observation belongs.
mlpack is a machine learning software library for C++, built on top of the Armadillo library.
Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities in text into pre-defined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.
Nathan S. Netanyahu (נָתָן נְתַנְיָהוּ; born 28 November 1951) is an Israeli computer scientist, a professor of computer science at Bar-Ilan University.
Natural language processing (NLP) is an area of computer science and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.
In machine learning, a nearest centroid classifier or nearest prototype classifier is a classification model that assigns to observations the label of the class of training samples whose mean (centroid) is closest to the observation.
In probability theory, the normal (or Gaussian or Gauss or Laplace–Gauss) distribution is a very common continuous probability distribution.
NP-hardness (non-deterministic polynomial-time hardness), in computational complexity theory, is the defining property of a class of problems that are, informally, "at least as hard as the hardest problems in NP".
OpenCV (Open Source Computer Vision) is a library of programming functions mainly aimed at real-time computer vision.
Orange is an open-source data visualization, machine learning and data mining toolkit.
A palette, in the original sense of the word, is a rigid, flat surface on which a painter arranges and mixes paints.
In mathematics, a partition of a set is a grouping of the set's elements into non-empty subsets, in such a way that every element is included in one and only one of the subsets.
Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
Proceedings of the Royal Society is the parent title of two scientific journals published by the Royal Society.
Proprietary software is non-free computer software for which the software's publisher or another person retains intellectual property rights—usually copyright of the source code, but sometimes patent rights.
PSPP is a free software application for analysis of sampled data, intended as a free alternative for IBM SPSS Statistics.
Pulse-code modulation (PCM) is a method used to digitally represent sampled analog signals.
R is a programming language and free software environment for statistical computing and graphics that is supported by the R Foundation for Statistical Computing.
A radial basis function (RBF) is a real-valued function whose value depends only on the distance from the origin, so that \phi(\mathbf{x}) = \phi(\lVert\mathbf{x}\rVert).
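A standard example is the Gaussian RBF with shape parameter ε:

```latex
% Gaussian radial basis function; epsilon controls the width
\phi(r) = e^{-(\varepsilon r)^2}, \qquad r = \lVert \mathbf{x} \rVert
```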
In the field of mathematical modeling, a radial basis function network is an artificial neural network that uses radial basis functions as activation functions.
RapidMiner is a data science software platform developed by the company of the same name that provides an integrated environment for data preparation, machine learning, deep learning, text mining, and predictive analytics.
A restricted Boltzmann machine (RBM) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs.
The Rocchio algorithm is based on a method of relevance feedback found in information retrieval systems, which stemmed from the SMART Information Retrieval System developed between 1960 and 1964.
In statistics, quality assurance, and survey methodology, sampling is the selection of a subset (a statistical sample) of individuals from within a statistical population to estimate characteristics of the whole population.
SAP HANA is an in-memory, column-oriented, relational database management system developed and marketed by SAP SE.
SAS (previously "Statistical Analysis System") is a software suite developed by SAS Institute for advanced analytics, multivariate analyses, business intelligence, data management, and predictive analytics.
Scikit-learn (formerly scikits.learn) is a free software machine learning library for the Python programming language.
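Because this glossary centers on k-means, a minimal usage sketch of scikit-learn's KMeans estimator (the data and parameter values are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(100, 2))
# scikit-learn uses k-means++ initialization by default.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # learned centroids
print(km.labels_[:10])       # cluster assignments of the first 10 points
```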
SciPy (pronounced /ˈsaɪpaɪ/, "Sigh Pie") is a free and open-source Python library used for scientific computing and technical computing.
A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural network (ANN) that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map, and is therefore a method of dimensionality reduction.
Semi-supervised learning is a class of supervised learning tasks and techniques that also make use of unlabeled data for training – typically a small amount of labeled data with a large amount of unlabeled data.
Signal processing concerns the analysis, synthesis, and modification of signals, which are broadly defined as functions conveying "information about the behavior or attributes of some phenomenon", such as sound, images, and biological measurements.
Silhouette refers to a method of interpretation and validation of consistency within clusters of data.
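For a point i, write a(i) for its mean distance to the other members of its own cluster and b(i) for the smallest mean distance to the members of any other cluster; the silhouette value is then:

```latex
% Silhouette value of point i, in [-1, 1]; larger means better clustered
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}
```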
Smoothed analysis is a way of measuring the complexity of an algorithm.
SPSS Statistics is a software package used for interactive, or batched, statistical analysis.
Stata is a general-purpose statistical software package created in 1985 by StataCorp.
Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs.
The Annual Symposium on Computational Geometry (SoCG) is an academic conference in computational geometry.
A taxicab geometry is a form of geometry in which the usual distance function or metric of Euclidean geometry is replaced by a new metric in which the distance between two points is the sum of the absolute differences of their Cartesian coordinates.
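Concretely, for points p and q in n dimensions the taxicab (L1, Manhattan) distance is:

```latex
% Taxicab (L1) distance between p and q
d_1(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{n} |p_i - q_i|
```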
Torch is an open source machine learning library, a scientific computing framework, and a script language based on the Lua programming language.
In mathematics, the triangle inequality states that for any triangle, the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side.
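In the metric-space form used in clustering (for example, to derive distance bounds that let k-means skip redundant point-to-center computations), it reads:

```latex
% Triangle inequality for a metric d
d(x, z) \le d(x, y) + d(y, z) \quad \text{for all } x, y, z
```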
Unsupervised machine learning is the machine learning task of inferring a function that describes the structure of "unlabeled" data (i.e. data that has not been classified or categorized).
In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its mean.
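Written out, with \mu = \mathbb{E}[X]:

```latex
% Variance as expected squared deviation from the mean
\operatorname{Var}(X) = \mathbb{E}\big[(X - \mu)^2\big] = \mathbb{E}[X^2] - \mu^2
```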
Vector quantization (VQ) is a classical quantization technique from signal processing that allows the modeling of probability density functions by the distribution of prototype vectors.
In mathematics, a Voronoi diagram is a partitioning of a plane into regions based on distance to points in a specific subset of the plane.
Waikato Environment for Knowledge Analysis (Weka) is a suite of machine learning software written in Java, developed at the University of Waikato, New Zealand.
A whitening transformation or sphering transformation is a linear transformation that transforms a vector of random variables with a known covariance matrix into a set of new variables whose covariance is the identity matrix, meaning that they are uncorrelated and each have variance 1.
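A minimal PCA-whitening sketch in NumPy (the function name is illustrative, and the eigendecomposition of the covariance matrix is just one of several valid square-root choices):

```python
import numpy as np

def whiten(X, eps=1e-10):
    """PCA-whiten the rows of X so the output covariance is near identity."""
    Xc = X - X.mean(axis=0)            # center the data
    cov = np.cov(Xc, rowvar=False)     # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)   # symmetric eigendecomposition
    W = vecs / np.sqrt(vals + eps)     # scale each eigenvector column
    return Xc @ W

# Correlated test data: standard normals mixed by a fixed matrix.
X = np.random.default_rng(0).normal(size=(500, 3)) @ np.array(
    [[2.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, 0.3, 0.2]])
Z = whiten(X)
print(np.round(np.cov(Z, rowvar=False), 2))  # approximately the identity
```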
Wolfram Mathematica (usually termed Mathematica) is a modern technical computing system spanning most areas of the field, including neural networks, machine learning, image processing, geometry, data science, and visualization.
In computer science, the worst-case complexity (usually denoted in asymptotic notation) measures the resources (e.g. running time, memory) an algorithm requires in the worst case.