82 relations: Affero General Public License, Anomaly detection, Apache Batik, Apriori algorithm, Arthur Zimek, Bicycle-sharing system, Business intelligence, Canopy clustering algorithm, Cascading Style Sheets, Cluster analysis, Clustering high-dimensional data, Column family, Copyleft, Correlation clustering, Data mining, Data science, Database, Database index, DBSCAN, Dynamic time warping, Evaluation, Expectation–maximization algorithm, For loop, Garbage collection (computer science), GNU Affero General Public License, Hans-Peter Kriegel, Heidelberg University, Hierarchical clustering, Histogram, Inkscape, JAR (file format), Java (programming language), Java (software platform), K-d tree, K-means clustering, K-medians clustering, K-nearest neighbors algorithm, KNIME, LaTeX, Linux, Local outlier factor, Locality-sensitive hashing, Ludwig Maximilian University of Munich, M-tree, Machine learning, Macintosh operating systems, Metric (mathematics), Microsoft Windows, Multidimensional scaling, Nearest neighbor search, NoSQL, OpenGL, OPTICS algorithm, Outlier, Parallel coordinates, PDF, Phoneme, PostScript, Primitive data type, Principal component analysis, R* tree, R-tree, RapidMiner, Receiver operating characteristic, Research, Scalable Vector Graphics, Scatter plot, Service provider interface, SIGMOD, Single-linkage clustering, Software engineer, Software framework, Spaceflight, Spatial database, Sperm whale, SQL, Statistical classification, Student, SUBCLU, Time series, University of Southern Denmark, Weka (machine learning).
The Affero General Public License (Affero GPL and informally Affero License) is either of two distinct, though historically related, free software licenses.
In data mining, anomaly detection (also outlier detection) is the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset.
Batik is a pure-Java library that can be used to render, generate, and manipulate SVG graphics (SVG is an XML markup language for describing two-dimensional vector graphics).
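Apriori is an algorithm for frequent item set mining and association rule learning over transactional databases, proposed by Rakesh Agrawal and Ramakrishnan Srikant.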
Arthur Zimek is a professor in data mining, data science and machine learning at the University of Southern Denmark in Odense, Denmark.
A bicycle-sharing system, public bicycle system, or bike-share scheme, is a service in which bicycles are made available for shared use to individuals on a short term basis for a price or free.
Business intelligence (BI) comprises the strategies and technologies used by enterprises for the data analysis of business information.
The canopy clustering algorithm is an unsupervised pre-clustering algorithm introduced by Andrew McCallum, Kamal Nigam and Lyle Ungar in 2000.
Cascading Style Sheets (CSS) is a style sheet language used for describing the presentation of a document written in a markup language like HTML.
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
Clustering high-dimensional data is the cluster analysis of data with anywhere from a few dozen to many thousands of dimensions.
A column family is a database object that contains columns of related data.
Copyleft (a play on the word copyright) is the practice of offering people the right to freely distribute copies and modified versions of a work with the stipulation that the same rights be preserved in derivative works down the line.
Correlation clustering is a method for partitioning data points into groups based on their similarity, without specifying the number of clusters in advance.
Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured, similar to data mining.
A database is an organized collection of data, stored and accessed electronically.
A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure.
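The trade-off described above can be illustrated with a minimal sketch. The class `IndexedTable` and its methods are hypothetical names invented for this example, not part of any real database system; a plain dictionary stands in for the index structure.

```python
class IndexedTable:
    """Toy in-memory table with a hash index on one column (illustrative only)."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = []
        self.index = {}  # column value -> list of row positions (extra storage)

    def insert(self, row):
        self.rows.append(row)
        # Extra write on every insert: keep the index in sync with the table.
        key = row[self.indexed_column]
        self.index.setdefault(key, []).append(len(self.rows) - 1)

    def scan(self, value):
        # Without the index: inspect every row.
        return [r for r in self.rows if r[self.indexed_column] == value]

    def lookup(self, value):
        # With the index: jump straight to the matching row positions.
        return [self.rows[i] for i in self.index.get(value, [])]
```

Both `scan` and `lookup` return the same rows; the index only changes how much work retrieval takes, and each `insert` pays for that with one additional dictionary update.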
Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996.
In time series analysis, dynamic time warping (DTW) is one of the algorithms for measuring similarity between two temporal sequences, which may vary in speed.
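As a rough sketch of how such a similarity can be computed, the following uses the textbook dynamic-programming formulation of DTW. The function name `dtw_distance` and the choice of absolute difference as the local cost are assumptions made for this example.

```python
def dtw_distance(a, b):
    """DTW cost between two numeric sequences, O(len(a) * len(b)) time."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the best of: repeat b[j-1], repeat a[i-1], or advance both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]
```

For instance, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is `0.0`: the second sequence is the first one played more slowly, and the warping absorbs the difference in speed.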
Evaluation is a systematic determination of a subject's merit, worth and significance, using criteria governed by a set of standards.
In statistics, an expectation–maximization (EM) algorithm is an iterative method to find maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables.
In computer science, a for-loop (or simply for loop) is a control flow statement for specifying iteration, which allows code to be executed repeatedly.
In computer science, garbage collection (GC) is a form of automatic memory management.
The GNU Affero General Public License is a free, copyleft license published by the Free Software Foundation in November 2007, and based on the GNU General Public License, version 3 and the Affero General Public License.
Hans-Peter Kriegel (born 1 October 1948 in Germany) is a German computer scientist and professor at the Ludwig Maximilian University of Munich, where he leads the Database Systems Group in the Department of Computer Science.
Heidelberg University (Ruprecht-Karls-Universität Heidelberg; Universitas Ruperto Carola Heidelbergensis) is a public research university in Heidelberg, Baden-Württemberg, Germany.
In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis which seeks to build a hierarchy of clusters.
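A histogram is a graphical representation of the distribution of numerical data, obtained by dividing the range of values into bins and counting how many values fall into each bin.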
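A histogram is usually built by splitting the value range into bins and counting the values in each. The sketch below assumes equal-width bins over a caller-supplied range; the function name `histogram` and its parameters are chosen for this example only.

```python
def histogram(values, num_bins, lo, hi):
    """Count how many values fall into each of num_bins equal-width bins on [lo, hi]."""
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        if lo <= v <= hi:
            # The right edge hi is folded into the last bin.
            idx = min(int((v - lo) / width), num_bins - 1)
            counts[idx] += 1
    return counts

# histogram([0.5, 1.5, 1.7, 3.0], 3, 0.0, 3.0) returns [1, 2, 1]
```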
Inkscape is a free and open-source vector graphics editor; it can be used to create or edit vector graphics such as illustrations, diagrams, line arts, charts, logos and complex paintings.
A JAR (Java ARchive) is a package file format typically used to aggregate many Java class files and associated metadata and resources (text, images, etc.) into one file for distribution.
Java is a general-purpose computer-programming language that is concurrent, class-based, object-oriented, and specifically designed to have as few implementation dependencies as possible.
Java is a set of computer software and specifications, developed by James Gosling at Sun Microsystems (later acquired by Oracle Corporation), that provides a system for developing application software and deploying it in a cross-platform computing environment.
In computer science, a k-d tree (short for k-dimensional tree) is a space-partitioning data structure for organizing points in a k-dimensional space.
k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining.
In statistics and data mining, k-medians clustering is a cluster analysis algorithm.
In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression.
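For the classification case, a minimal sketch might look as follows. It assumes Euclidean distance and a simple majority vote among the k closest training points; the function name `knn_classify` and the `(point, label)` input layout are assumptions of this example.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Label `query` by majority vote among its k nearest training points.

    train: list of (point, label) pairs, where point is a tuple of numbers.
    """
    neighbors = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

The method is non-parametric in the sense that no model is fitted: the training data itself is kept and consulted at query time.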
KNIME, the Konstanz Information Miner, is a free and open-source data analytics, reporting and integration platform.
LaTeX (a shortening of Lamport TeX) is a document preparation system.
Linux is a family of free and open-source software operating systems built around the Linux kernel.
In anomaly detection, the local outlier factor (LOF) is an algorithm proposed by Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng and Jörg Sander in 2000 for finding anomalous data points by measuring the local deviation of a given data point with respect to its neighbours.
Locality-sensitive hashing (LSH) reduces the dimensionality of high-dimensional data.
Ludwig Maximilian University of Munich (also referred to as LMU or the University of Munich, in German: Ludwig-Maximilians-Universität München) is a public research university located in Munich, Germany.
M-trees are tree data structures that are similar to R-trees and B-trees.
Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to "learn" (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.
The family of Macintosh operating systems developed by Apple Inc. includes the graphical user interface-based operating systems it has designed for use with its Macintosh series of personal computers since 1984, as well as the related system software it once created for compatible third-party systems.
In mathematics, a metric or distance function is a function that defines a distance between each pair of elements of a set.
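A distance function counts as a metric only if it satisfies the standard axioms: non-negativity, zero distance exactly for identical elements, symmetry, and the triangle inequality. The sketch below checks these axioms on a finite sample of points; the helper name `is_metric_on` and the tolerance parameter are assumptions of this example, and passing the check on a sample does not prove the property for the whole set.

```python
import itertools
import math

def is_metric_on(points, d, tol=1e-12):
    """Check the metric axioms for distance function d on a finite sample."""
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < 0:
            return False                      # non-negativity
        if (d(x, y) <= tol) != (x == y):
            return False                      # d(x, y) = 0 exactly when x = y
        if abs(d(x, y) - d(y, x)) > tol:
            return False                      # symmetry
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return False                      # triangle inequality
    return True
```

On the collinear points `(0, 0)`, `(1, 0)`, `(2, 0)`, Euclidean distance (`math.dist`) passes, while squared Euclidean distance fails the triangle inequality, since 4 > 1 + 1.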
Microsoft Windows is a group of several graphical operating system families, all of which are developed, marketed, and sold by Microsoft.
Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset.
Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest (or most similar) to a given point.
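The simplest way to solve this problem is a linear scan over all candidates, sketched below. The function name `nearest_neighbor` and the default of Euclidean distance are assumptions of this example; index structures such as trees aim to avoid examining every point.

```python
import math

def nearest_neighbor(points, query, dist=math.dist):
    """Return the point in `points` closest to `query` (linear scan, O(n))."""
    return min(points, key=lambda p: dist(p, query))

# nearest_neighbor([(0, 0), (3, 4), (1, 1)], (0.9, 1.2)) returns (1, 1)
```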
A NoSQL (originally referring to "non SQL" or "non relational") database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.
Open Graphics Library (OpenGL) is a cross-language, cross-platform application programming interface (API) for rendering 2D and 3D vector graphics.
Ordering points to identify the clustering structure (OPTICS) is an algorithm for finding density-based clusters in spatial data.
In statistics, an outlier is an observation point that is distant from other observations.
Parallel coordinates are a common way of visualizing high-dimensional geometry and analyzing multivariate data.
The Portable Document Format (PDF) is a file format developed in the 1990s to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems.
A phoneme is one of the units of sound (or gesture in the case of sign languages, see chereme) that distinguish one word from another in a particular language.
PostScript (PS) is a page description language in the electronic publishing and desktop publishing business.
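In computer science, a primitive data type is either a basic type, which a programming language provides as a basic building block, or a built-in type, for which the language provides built-in support.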
Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
In data processing, R*-trees are a variant of R-trees used for indexing spatial information.
R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons.
RapidMiner is a data science software platform developed by the company of the same name that provides an integrated environment for data preparation, machine learning, deep learning, text mining, and predictive analytics.
In statistics, a receiver operating characteristic curve, i.e. ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied.
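To make "as its discrimination threshold is varied" concrete, the sketch below computes one (false positive rate, true positive rate) point per distinct score threshold. The function name `roc_points` is an assumption of this example, as are the conventions that labels are 0 or 1, that both classes are present, and that a score at or above the threshold counts as a positive prediction.

```python
def roc_points(scores, labels):
    """Return ROC points as (fpr, tpr) pairs, from strictest to loosest threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

# roc_points([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]) returns
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```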
Research comprises "creative and systematic work undertaken to increase the stock of knowledge, including knowledge of humans, culture and society, and the use of this stock of knowledge to devise new applications." It is used to establish or confirm facts, reaffirm the results of previous work, solve new or existing problems, support theorems, or develop new theories.
Scalable Vector Graphics (SVG) is an XML-based vector image format for two-dimensional graphics with support for interactivity and animation.
A scatter plot (also called a scatterplot, scatter graph, scatter chart, scattergram, or scatter diagram) is a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data.
Service Provider Interface (SPI) is an API intended to be implemented or extended by a third party.
SIGMOD is the Association for Computing Machinery's Special Interest Group on Management of Data, which specializes in large-scale data management problems and databases.
In statistics, single-linkage clustering is one of several methods of hierarchical clustering.
A software engineer is a person who applies the principles of software engineering to the design, development, maintenance, testing, and evaluation of computer software.
In computer programming, a software framework is an abstraction in which software providing generic functionality can be selectively changed by additional user-written code, thus providing application-specific software.
Spaceflight (also written space flight) is ballistic flight into or through outer space.
A spatial database is a database that is optimized for storing and querying data that represents objects defined in a geometric space.
The sperm whale (Physeter macrocephalus) or cachalot is the largest of the toothed whales and the largest toothed predator.
SQL (S-Q-L, "sequel"; Structured Query Language) is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS).
In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.
A student is a learner or someone who attends an educational institution.
SUBCLU is an algorithm for clustering high-dimensional data by Karin Kailing, Hans-Peter Kriegel and Peer Kröger.
A time series is a series of data points indexed (or listed or graphed) in time order.
The University of Southern Denmark (Syddansk Universitet, literally South Danish University, abbr. SDU) is a university in Denmark.
Waikato Environment for Knowledge Analysis (Weka) is a suite of machine learning software written in Java, developed at the University of Waikato, New Zealand.