Data mining

Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. [1]

187 relations: Academic journal, Academic Press, ADVISE, Agent mining, Aggregate function, Analytics, Angoss, Anomaly detection, Artificial intelligence, Artificial neural network, Association for Computing Machinery, Association for the Advancement of Artificial Intelligence, Association rule learning, Automatic summarization, Bayes' theorem, Bayesian network, Behavior informatics, Big data, Bioinformatics, Business intelligence, Buzzword, C++, Cambridge University Press, Carrot2, Chemicalize, Clarabridge, Cluster analysis, Clustering high-dimensional data, Computational complexity theory, Computational science, Computer science, Conference on Information and Knowledge Management, Copyright Directive, CounterPunch, Cross-industry standard process for data mining, Data, Data analysis, Data collection, Data dredging, Data integration, Data management, Data mart, Data Mining and Knowledge Discovery, Data pre-processing, Data set, Data transformation, Data visualization, Data warehouse, Database, Database Directive, DATADVANCE, Decision support system, Decision tree, Decision tree learning, Deep learning, Domain driven data mining, Drug discovery, ECML PKDD, Educational data mining, Edward Snowden, Electronic discovery, ELKI, Ensemble learning, European Commission, Examples of data mining, Exploratory data analysis, Factor analysis, Fair use, Family Educational Rights and Privacy Act, Forrester Research, Fraction of variance unexplained, Gartner, General Architecture for Text Engineering, Genetic algorithm, Global surveillance disclosures (2013–present), GNU Project, Google Book Search Settlement Agreement, Google Scholar, Gregory Piatetsky-Shapiro, Health Insurance Portability and Accountability Act, Hewlett-Packard, IBM, Information extraction, Information integration, Information processing, InformationWeek, Intention mining, Interdisciplinarity, International Journal of Data Warehousing and Mining, International Safe Harbor Privacy Principles, Java (programming language), Java Data Mining, Jerome H. Friedman, Jiawei Han, KNIME, KXEN Inc., Learning classifier system, Limitations and exceptions to copyright, LIONsolver, Lua (programming language), Machine learning, Massive Online Analysis, Megaputer Intelligence, Michael Lovell, Microsoft, Microsoft Academic Search, Microsoft Analysis Services, Misnomer, Missing data, MLPACK (C++ library), Morgan Kaufmann Publishers, Multi expression programming, Multilinear subspace learning, Multivariate statistics, Named-entity recognition, National Security Agency, Natural language processing, Natural Language Toolkit, NetOwl, Neural network, Online algorithm, Open access, Open-source model, OpenNN, OpenText, Oracle Corporation, Oracle Data Mining, Orange (software), Overfitting, Personally identifiable information, Philip S. Yu, Predictive analytics, Predictive Model Markup Language, Prentice Hall, Profiling (information science), Programming language, PSeven, Psychometrics, Python (programming language), Qlucore, R (programming language), RapidMiner, Receiver operating characteristic, Regression analysis, Reproducibility, Rexer's Annual Data Miner Survey, Robert Tibshirani, SAS Institute, Scikit-learn, SEMMA, Sequential pattern mining, SIGKDD, SIGMOD, Social media mining, Spatial database, Springer Science+Business Media, SPSS Modeler, Statistica, Statistical classification, Statistical hypothesis testing, Statistical inference, Statistical model, Statistics, StatSoft, Stellar Wind, Structured data analysis (statistics), Support vector machine, Surveillance capitalism, Tanagra (machine learning), Text mining, The American Statistician, The Review of Economic Studies, Time series, Torch (machine learning), Total Information Awareness, Training, test, and validation sets, Trevor Hastie, UBM plc, UIMA, United States Congress, Usama Fayyad, Vertica, VLDB, Web mining, Web scraping, Weka (machine learning), XML.

Academic journal

An academic or scholarly journal is a periodical publication in which scholarship relating to a particular academic discipline is published.

Academic Press

Academic Press is an academic book publisher.

ADVISE

ADVISE (Analysis, Dissemination, Visualization, Insight, and Semantic Enhancement) is a research and development program within the United States Department of Homeland Security (DHS) Threat and Vulnerability Testing and Assessment (TVTA) portfolio.

Agent mining

Agent mining is an interdisciplinary area that synergizes multiagent systems with data mining and machine learning.

Aggregate function

In database management an aggregate function is a function where the values of multiple rows are grouped together to form a single value of more significant meaning or measurement such as a set, a bag or a list.
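
As a minimal sketch of the idea, the Python snippet below groups rows by one column and collapses each group into a single sum. The function name `aggregate_sum` and the sample `sales` rows are hypothetical, chosen only for illustration; a database would normally express this with a query construct such as GROUP BY.

```python
from collections import defaultdict


def aggregate_sum(rows, key, value):
    """Group rows by `key` and reduce each group's `value` field to one sum."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


# Hypothetical rows, made up for illustration.
sales = [
    {"region": "north", "amount": 10.0},
    {"region": "south", "amount": 5.0},
    {"region": "north", "amount": 2.5},
]

totals = aggregate_sum(sales, "region", "amount")
print(totals)
```

Other aggregates (count, minimum, maximum, average) follow the same pattern with a different reduction step.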

Analytics

Analytics is the discovery, interpretation, and communication of meaningful patterns in data.

Angoss

Angoss Software Corporation, headquartered in Toronto, Ontario, Canada, with offices in the United States and UK, is a provider of predictive analytics systems through software licensing and services.

Anomaly detection

In data mining, anomaly detection (also outlier detection) is the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset.
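
One simple way to flag observations that do not conform to the rest of a dataset is a z-score rule, sketched below. This is only one of many possible techniques and is an assumption of this example, not a method named in the definition above; the function name `zscore_outliers`, the threshold of 2.0, and the sample readings are all hypothetical.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing deviates from the pattern.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical sensor readings with one obvious outlier.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
outliers = zscore_outliers(readings)
print(outliers)
```

A z-score rule assumes roughly bell-shaped data and is itself distorted by extreme values, so it is best read as an illustration rather than a robust detector.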

Artificial intelligence

Artificial intelligence (AI, also machine intelligence, MI) is intelligence demonstrated by machines, in contrast to the natural intelligence (NI) displayed by humans and other animals.

Artificial neural network

Artificial neural networks (ANNs) or connectionist systems are computing systems vaguely inspired by the biological neural networks that constitute animal brains.

Association for Computing Machinery

The Association for Computing Machinery (ACM) is an international learned society for computing.

Association for the Advancement of Artificial Intelligence

The Association for the Advancement of Artificial Intelligence (AAAI) is an international, nonprofit, scientific society devoted to promoting research in, and responsible use of, artificial intelligence.

Association rule learning

Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases.
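
Two measures commonly used to judge how interesting a rule such as "bread implies milk" is are support and confidence. The sketch below computes both over a handful of transactions; the measures are assumed here as the usual choice rather than taken from the definition above, and the function names and the `baskets` data are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical market baskets, made up for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

Here the rule holds in two of four baskets, and in two of the three baskets that contain bread.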

Automatic summarization

Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document.

Bayes' theorem

In probability theory and statistics, Bayes’ theorem (alternatively Bayes’ law or Bayes' rule, also written as Bayes’s theorem) describes the probability of an event, based on prior knowledge of conditions that might be related to the event.
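
In symbols, the theorem states P(H | E) = P(E | H) P(H) / P(E). The sketch below applies it to a binary hypothesis, expanding P(E) over the two cases "H true" and "H false". The function name `posterior`, its parameter names, and the numbers in the example are hypothetical and serve only as an illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) for a binary hypothesis H.

    prior               -- P(H)
    likelihood          -- P(E | H)
    false_positive_rate -- P(E | not H)
    """
    # Total probability of the evidence across both cases.
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: a condition with 1% prevalence, a test that detects
# it 90% of the time and gives a false alarm 5% of the time.
p = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(p, 4))
```

Even with a fairly accurate test, the low prior keeps the posterior well below one half, which is the kind of result the theorem makes explicit.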

Bayesian network

A Bayesian network, Bayes network, belief network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG).

Behavior informatics

Behavior informatics (BI) is the informatics of behaviors so as to obtain behavior intelligence and behavior insights.

Big data

Big data refers to data sets that are so big and complex that traditional data-processing application software is inadequate to deal with them.

Bioinformatics

Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data.

Business intelligence

Business intelligence (BI) comprises the strategies and technologies used by enterprises for the data analysis of business information.

Buzzword

A buzzword is a word or phrase, new or already existing, that becomes very popular for a period of time.

C++

C++ ("see plus plus") is a general-purpose programming language.

Cambridge University Press

Cambridge University Press (CUP) is the publishing business of the University of Cambridge.

Carrot2

Carrot² is an open source search results clustering engine.

Chemicalize

Chemicalize is an online platform for chemical calculations, search, and text processing.

Clarabridge

Clarabridge is an American software company founded in 2006 in Reston, Virginia, United States.

Cluster analysis

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).

Clustering high-dimensional data

Clustering high-dimensional data is the cluster analysis of data with anywhere from a few dozen to many thousands of dimensions.

Computational complexity theory

Computational complexity theory is a branch of the theory of computation in theoretical computer science that focuses on classifying computational problems according to their inherent difficulty, and relating those classes to each other.

Computational science

Computational science (also scientific computing or scientific computation (SC)) is a rapidly growing multidisciplinary field that uses advanced computing capabilities to understand and solve complex problems.

Computer science

Computer science deals with the theoretical foundations of information and computation, together with practical techniques for the implementation and application of these foundations.

Conference on Information and Knowledge Management

The ACM Conference on Information and Knowledge Management (CIKM) is an annual computer science research conference dedicated to information management (IM) and knowledge management (KM).

Copyright Directive

The Copyright Directive (officially the Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society, also known as the Information Society Directive or the InfoSoc Directive), is a directive of the European Union enacted to implement the WIPO Copyright Treaty and to harmonise aspects of copyright law across Europe, such as copyright exceptions. The directive was enacted under the internal market provisions of the Treaty of Rome. The directive was subject to unprecedented lobbying and has been cited as a success for copyright industries. The directive gives EU Member States significant freedom in certain aspects of transposition. Member States had until 22 December 2002 to implement the directive into their national laws. However, only Greece and Denmark met the deadline and the European Commission eventually initiated enforcement action against six Member States for non-implementation.

CounterPunch

CounterPunch is a magazine published six times per year in the United States that covers politics in a manner its editors describe as "muckraking with a radical attitude".

Cross-industry standard process for data mining

Cross-industry standard process for data mining, known as CRISP-DM, is a standard process model for data mining (Shearer C., The CRISP-DM model: the new blueprint for data mining, J Data Warehousing (2000); 5:13–22).

Data

Data is a set of values of qualitative or quantitative variables.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

Data collection

Data collection is the process of gathering and measuring information on targeted variables in an established systematic fashion, which then enables one to answer relevant questions and evaluate outcomes.

Data dredging

Data dredging (also data fishing, data snooping, and p-hacking) is the use of data mining to uncover patterns in data that can be presented as statistically significant, without first devising a specific hypothesis as to the underlying causality.

Data integration

Data integration involves combining data residing in different sources and providing users with a unified view of them.

Data management

Data management comprises all disciplines related to managing data as a valuable resource.

Data mart

A data mart is a structure / access pattern specific to data warehouse environments, used to retrieve client-facing data.

Data Mining and Knowledge Discovery

Data Mining and Knowledge Discovery is a bimonthly peer-reviewed scientific journal focusing on data mining published by Springer Science+Business Media.

Data pre-processing

Data pre-processing is an important step in the data mining process.

Data set

A data set (or dataset) is a collection of data.

Data transformation

In computing, data transformation is the process of converting data from one format or structure into another format or structure.
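
As a small illustration of converting data from one format into another, the sketch below turns CSV text into a JSON array of records using only the Python standard library. The function name `csv_to_json` and the sample input are hypothetical; real transformations usually also involve type conversion and validation, which are omitted here.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [dict(row) for row in reader]
    return json.dumps(rows)


# Hypothetical input, made up for illustration.
source = "id,name\n1,Ada\n2,Grace\n"
converted = csv_to_json(source)
print(converted)
```

Note that every CSV field arrives as a string, so the `id` values stay quoted in the JSON output unless they are converted explicitly.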

Data visualization

Data visualization or data visualisation is viewed by many disciplines as a modern equivalent of visual communication.

Data warehouse

In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence.

Database

A database is an organized collection of data, stored and accessed electronically.

Database Directive

The Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases is a directive of the European Union in the field of copyright law, made under the internal market provisions of the Treaty of Rome.

DATADVANCE

DATADVANCE is a software development company that evolved out of a collaborative research program between Airbus and the Institute for Information Transmission Problems of the Russian Academy of Sciences (IITP RAS).

Decision support system

A decision support system (DSS) is an information system that supports business or organizational decision-making activities.

Decision tree

A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.

Decision tree learning

Decision tree learning uses a decision tree (as a predictive model) to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves).

Deep learning

Deep learning (also known as deep structured learning or hierarchical learning) is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms.

Domain driven data mining

Domain driven data mining is a data mining methodology for discovering actionable knowledge and delivering actionable insights from complex data and behaviors in a complex environment.

Drug discovery

In the fields of medicine, biotechnology and pharmacology, drug discovery is the process by which new candidate medications are discovered.

ECML PKDD

ECML PKDD, the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, is one of the leading academic conferences on machine learning and knowledge discovery, held in Europe every year.

Educational data mining

Educational data mining (EDM) describes a research field concerned with the application of data mining, machine learning and statistics to information generated from educational settings (e.g., universities and intelligent tutoring systems).

Edward Snowden

Edward Joseph Snowden (born June 21, 1983) is an American computer professional, former Central Intelligence Agency (CIA) employee, and former contractor for the United States government who copied and leaked classified information from the National Security Agency (NSA) in 2013 without authorization.

Electronic discovery

Electronic discovery (also e-discovery or ediscovery) refers to discovery in legal proceedings such as litigation, government investigations, or Freedom of Information Act requests, where the information sought is in electronic format (often referred to as electronically stored information or ESI).

ELKI

ELKI (for Environment for DeveLoping KDD-Applications Supported by Index-Structures) is a knowledge discovery in databases (KDD, "data mining") software framework developed for use in research and teaching originally at the database systems research unit of Professor Hans-Peter Kriegel at the Ludwig Maximilian University of Munich, Germany.

Ensemble learning

In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone.

European Commission

The European Commission (EC) is an institution of the European Union, responsible for proposing legislation, implementing decisions, upholding the EU treaties and managing the day-to-day business of the EU.

Examples of data mining

Data mining, the process of discovering patterns in large data sets, has been used in many applications.

Exploratory data analysis

In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods.

Factor analysis

Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors.

Fair use

Fair use is a doctrine in the law of the United States that permits limited use of copyrighted material without having to first acquire permission from the copyright holder.

Family Educational Rights and Privacy Act

The Family Educational Rights and Privacy Act of 1974 (FERPA or the Buckley Amendment) is a United States federal law that governs the access of educational information and records to public entities such as potential employers, publicly funded educational institutions, and foreign governments.

Forrester Research

Forrester is an American market research company that provides advice on the existing and potential impact of technology to its clients and the public.

Fraction of variance unexplained

In statistics, the fraction of variance unexplained (FVU) in the context of a regression task is the fraction of variance of the regressand (dependent variable) Y which cannot be explained, i.e., which is not correctly predicted, by the explanatory variables X.
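
One common way to compute this quantity, assumed here rather than stated in the definition above, is the ratio of the residual sum of squares to the total sum of squares of Y (equivalently, one minus the coefficient of determination). The function name and the sample values below are hypothetical.

```python
def fraction_of_variance_unexplained(y_true, y_pred):
    """FVU as residual sum of squares divided by total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("y_true has zero variance, so FVU is undefined")
    return ss_res / ss_tot


# Hypothetical observations and model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
fvu = fraction_of_variance_unexplained(observed, predicted)
print(fvu)
```

A value of 0 means the predictions reproduce Y exactly, while a value of 1 means they do no better than always predicting the mean of Y.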

Gartner

Gartner, Inc. is a global research and advisory firm providing insights, advice, and tools for leaders in IT, Finance, HR, Customer Service and Support, Legal and Compliance, Marketing, Sales, and Supply Chain functions across the world.

General Architecture for Text Engineering

General Architecture for Text Engineering or GATE is a Java suite of tools originally developed at the University of Sheffield beginning in 1995 and now used worldwide by a wide community of scientists, companies, teachers and students for many natural language processing tasks, including information extraction in many languages.

Genetic algorithm

In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA).

Global surveillance disclosures (2013–present)

Ongoing news reports in the international media have revealed operational details about the United States National Security Agency (NSA) and its international partners' global surveillance of foreign nationals and U.S. citizens.

GNU Project

The GNU Project is a free-software, mass-collaboration project, first announced on September 27, 1983 by Richard Stallman at MIT.

Google Book Search Settlement Agreement

The Google Book Search Settlement Agreement was a proposal between the Authors Guild, the Association of American Publishers, and Google in the settlement of Authors Guild et al. v. Google, a class action lawsuit alleging copyright infringement on the part of Google.

Google Scholar

Google Scholar is a freely accessible web search engine that indexes the full text or metadata of scholarly literature across an array of publishing formats and disciplines.

Gregory Piatetsky-Shapiro

Gregory I. Piatetsky-Shapiro (born 7 April 1958) is a data scientist and the co-founder of the KDD, the Association for Computing Machinery SIGKDD association for Knowledge Discovery and Data Mining.

Health Insurance Portability and Accountability Act

The Health Insurance Portability and Accountability Act of 1996 (HIPAA) was enacted by the United States Congress and signed by President Bill Clinton in 1996.

Hewlett-Packard

The Hewlett-Packard Company (commonly referred to as HP, or shortened to Hewlett-Packard) was an American multinational information technology company headquartered in Palo Alto, California.

IBM

The International Business Machines Corporation (IBM) is an American multinational technology company headquartered in Armonk, New York, United States, with operations in over 170 countries.

Information extraction

Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents.

Information integration

Information integration (II) is the merging of information from heterogeneous sources with differing conceptual, contextual and typographical representations.

Information processing

Information processing is the change (processing) of information in any manner detectable by an observer.

InformationWeek

InformationWeek is a digital magazine which conducts corresponding face-to-face events, virtual events, and research.

Intention mining

In data mining, intention mining or intent mining is the problem of determining a user's intention from logs of their behavior in interaction with a computer system. Examples include search engines, where there has been research on user intent or query intent prediction since 2002 (see Section 7.2.3 in R. Baeza-Yates and B. Ribeiro-Neto, second edition, Addison-Wesley, 2011), and commercial intents expressed in social media posts (Zhiyuan Chen, Bing Liu, Meichun Hsu, Malu Castellanos, and Riddhiman Ghosh).

Interdisciplinarity

Interdisciplinarity or interdisciplinary studies involves the combining of two or more academic disciplines into one activity (e.g., a research project).

International Journal of Data Warehousing and Mining

The International Journal of Data Warehousing and Mining (IJDWM) is a quarterly peer-reviewed academic journal covering data warehousing and data mining.

International Safe Harbor Privacy Principles

The International Safe Harbor Privacy Principles or Safe Harbour Privacy Principles were principles developed between 1998 and 2000 in order to prevent private organizations within the European Union or United States which store customer data from accidentally disclosing or losing personal information.

Java (programming language)

Java is a general-purpose computer-programming language that is concurrent, class-based, object-oriented, and specifically designed to have as few implementation dependencies as possible.

Java Data Mining

Java Data Mining (JDM) is a standard Java API for developing data mining applications and tools.

Jerome H. Friedman

Jerome Harold Friedman (born 1939) is an American statistician, consultant and Professor of Statistics at Stanford University, known for his contributions in the field of statistics and data mining.

Jiawei Han

Jiawei Han (born August 10, 1949) is a Chinese computer scientist and Abel Bliss Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign.

KNIME

KNIME, the Konstanz Information Miner, is a free and open-source data analytics, reporting and integration platform.

KXEN Inc.

KXEN was an American software company which existed from 1998 to 2013 when it was acquired by SAP AG.

Learning classifier system

Learning classifier systems, or LCS, are a paradigm of rule-based machine learning methods that combine a discovery component (typically a genetic algorithm) with a learning component (performing either supervised learning, reinforcement learning, or unsupervised learning).

Limitations and exceptions to copyright

Limitations and exceptions to copyright are provisions, in local copyright law or the Berne Convention, which allow for copyrighted works to be used without a license from the copyright owner.

LIONsolver

LIONsolver is integrated software for data mining, business intelligence, analytics, and modeling, built around the Learning and Intelligent OptimizatioN (LION) and reactive business intelligence approach.

Lua (programming language)

Lua (from the Portuguese word lua, meaning moon) is a lightweight, multi-paradigm programming language designed primarily for embedded use in applications.

Machine learning

Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to "learn" (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.

Massive Online Analysis

Massive Online Analysis (MOA) is a free open-source software project designed specifically for data stream mining with concept drift.

Megaputer Intelligence

Megaputer Intelligence, Inc., is a software company headquartered in Bloomington, Indiana, United States, that provides data and text mining tools along with consulting services.

Michael Lovell

Michael R. Lovell (born 1967) is an American engineer, educator, and President of Marquette University.

Microsoft

Microsoft Corporation (abbreviated as MS) is an American multinational technology company with headquarters in Redmond, Washington.

Microsoft Academic Search

Microsoft Academic Search was a research project and academic search engine retired in 2012.

Microsoft Analysis Services

Microsoft SQL Server Analysis Services, SSAS, is an online analytical processing (OLAP) and data mining tool in Microsoft SQL Server.

New!!: Data mining and Microsoft Analysis Services · See more »


Misnomer

A misnomer is a name or term that suggests an idea that is known to be wrong.

New!!: Data mining and Misnomer · See more »

Missing data

In statistics, missing data, or missing values, occur when no data value is stored for the variable in an observation.

New!!: Data mining and Missing data · See more »

MLPACK (C++ library)

mlpack is a machine learning software library for C++, built on top of the Armadillo library.

New!!: Data mining and MLPACK (C++ library) · See more »

Morgan Kaufmann Publishers

Morgan Kaufmann Publishers is a publisher based in Burlington, Massachusetts (San Francisco, California, until 2008) that specializes in computer science and engineering content.

New!!: Data mining and Morgan Kaufmann Publishers · See more »

Multi expression programming

Multi Expression Programming (MEP) is a genetic programming variant encoding multiple solutions in the same chromosome.

New!!: Data mining and Multi expression programming · See more »

Multilinear subspace learning

Multilinear subspace learning is an approach to dimensionality reduction.

New!!: Data mining and Multilinear subspace learning · See more »

Multivariate statistics

Multivariate statistics is a subdivision of statistics encompassing the simultaneous observation and analysis of more than one outcome variable.

New!!: Data mining and Multivariate statistics · See more »

Named-entity recognition

Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify named entities in text into pre-defined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.

New!!: Data mining and Named-entity recognition · See more »

National Security Agency

The National Security Agency (NSA) is a national-level intelligence agency of the United States Department of Defense, under the authority of the Director of National Intelligence.

New!!: Data mining and National Security Agency · See more »

Natural language processing

Natural language processing (NLP) is an area of computer science and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.

New!!: Data mining and Natural language processing · See more »

Natural Language Toolkit

The Natural Language Toolkit, more commonly known as NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) of English text, written in the Python programming language.

New!!: Data mining and Natural Language Toolkit · See more »


NetOwl

NetOwl is a suite of multilingual text and entity analytics products that analyze big data in the form of text (reports, web, social media, etc.).

New!!: Data mining and NetOwl · See more »

Neural network

The term neural network was traditionally used to refer to a network or circuit of neurons.

New!!: Data mining and Neural network · See more »

Online algorithm

In computer science, an online algorithm is one that can process its input piece-by-piece in a serial fashion, i.e., in the order that the input is fed to the algorithm, without having the entire input available from the start.
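A running mean is a small example of this piece-by-piece processing: each value updates the result as it arrives, and the full input is never stored. The sketch below is illustrative only, and the function name running_mean is an assumption.

```python
from typing import Iterable, Iterator, Tuple


def running_mean(stream: Iterable[float]) -> Iterator[Tuple[int, float]]:
    """Yield (count, mean) after each item, without storing the stream."""
    count = 0
    mean = 0.0
    for x in stream:
        count += 1
        # Incremental update: only the previous mean and the count are needed.
        mean += (x - mean) / count
        yield count, mean


if __name__ == "__main__":
    for n, m in running_mean([2.0, 4.0, 6.0, 8.0]):
        print(n, m)
```

Because the function is a generator over any iterable, it works equally well on a list, a file being read line by line, or a live data feed.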

New!!: Data mining and Online algorithm · See more »

Open access

Open access (OA) refers to research outputs which are distributed online and free of cost or other barriers, and possibly with the addition of a Creative Commons license to promote reuse.

New!!: Data mining and Open access · See more »

Open-source model

The open-source model is a decentralized software-development model that encourages open collaboration.

New!!: Data mining and Open-source model · See more »


OpenNN

OpenNN (Open Neural Networks Library) is a software library written in the C++ programming language which implements neural networks, a main area of deep learning research.

New!!: Data mining and OpenNN · See more »


OpenText

OpenText Corporation (also written opentext) is a Canadian company that develops and sells enterprise information management (EIM) software.

New!!: Data mining and OpenText · See more »

Oracle Corporation

Oracle Corporation is an American multinational computer technology corporation, headquartered in Redwood Shores, California.

New!!: Data mining and Oracle Corporation · See more »

Oracle Data Mining

Oracle Data Mining (ODM) is an option of Oracle Corporation's Relational Database Management System (RDBMS) Enterprise Edition (EE).

New!!: Data mining and Oracle Data Mining · See more »

Orange (software)

Orange is an open-source data visualization, machine learning and data mining toolkit.

New!!: Data mining and Orange (software) · See more »


Overfitting

In statistics, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably".

New!!: Data mining and Overfitting · See more »

Personally identifiable information

Personal information, described in United States legal fields as either personally identifiable information (PII), or sensitive personal information (SPI), as used in information security and privacy laws, is information that can be used on its own or with other information to identify, contact, or locate a single person, or to identify an individual in context.

New!!: Data mining and Personally identifiable information · See more »

Philip S. Yu

Philip S. Yu (born 1952) is an American computer scientist and Professor in Information Technology at the University of Illinois at Chicago, known for his work in the field of data mining.

New!!: Data mining and Philip S. Yu · See more »

Predictive analytics

Predictive analytics encompasses a variety of statistical techniques from predictive modelling, machine learning, and data mining that analyze current and historical facts to make predictions about future or otherwise unknown events.

New!!: Data mining and Predictive analytics · See more »

Predictive Model Markup Language

The Predictive Model Markup Language (PMML) is an XML-based predictive model interchange format.

New!!: Data mining and Predictive Model Markup Language · See more »

Prentice Hall

Prentice Hall is a major educational publisher owned by Pearson plc.

New!!: Data mining and Prentice Hall · See more »

Profiling (information science)

In information science, profiling refers to the process of construction and application of user profiles generated by computerized data analysis.

New!!: Data mining and Profiling (information science) · See more »

Programming language

A programming language is a formal language that specifies a set of instructions that can be used to produce various kinds of output.

New!!: Data mining and Programming language · See more »


PSeven

pSeven is a design space exploration software platform developed by DATADVANCE, extending design, simulation and analysis capabilities and assisting in smarter and faster design decisions.

New!!: Data mining and PSeven · See more »


Psychometrics

Psychometrics is a field of study concerned with the theory and technique of psychological measurement.

New!!: Data mining and Psychometrics · See more »

Python (programming language)

Python is an interpreted high-level programming language for general-purpose programming.

New!!: Data mining and Python (programming language) · See more »


Qlucore

Qlucore is a bioinformatics company from Lund, Sweden, that provides software for the life science and biotech industries.

New!!: Data mining and Qlucore · See more »

R (programming language)

R is a programming language and free software environment for statistical computing and graphics that is supported by the R Foundation for Statistical Computing.

New!!: Data mining and R (programming language) · See more »


RapidMiner

RapidMiner is a data science software platform developed by the company of the same name that provides an integrated environment for data preparation, machine learning, deep learning, text mining, and predictive analytics.

New!!: Data mining and RapidMiner · See more »

Receiver operating characteristic

In statistics, a receiver operating characteristic curve, i.e. ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied.
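To make "as its discrimination threshold is varied" concrete, the following is a minimal sketch that computes the points of an ROC curve from labelled scores. The function name roc_points and the sample data in the usage note are illustrative assumptions, not part of any particular library.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    labels: 1 for positive, 0 for negative.
    scores: classifier outputs, where a higher score means "more positive".
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")

    # Threshold above every score: nothing is predicted positive.
    points = [(0.0, 0.0)]
    # Lower the threshold through each distinct score, highest first.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points
```

Calling roc_points([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]) yields a curve that starts at (0.0, 0.0) and ends at (1.0, 1.0); plotting the pairs gives the ROC curve.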

New!!: Data mining and Receiver operating characteristic · See more »

Regression analysis

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables.
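As one simple instance of estimating a relationship between variables, the sketch below fits a straight line by ordinary least squares. Regression analysis covers many other models; the function name fit_line is an illustrative assumption.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

For points lying exactly on y = 2x + 1, such as xs = [0, 1, 2, 3] and ys = [1, 3, 5, 7], the fit recovers a slope of 2 and an intercept of 1.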

New!!: Data mining and Regression analysis · See more »


Reproducibility

Reproducibility is the closeness of the agreement between the results of measurements of the same measurand carried out under changed conditions of measurement.

New!!: Data mining and Reproducibility · See more »

Rexer's Annual Data Miner Survey

Rexer Analytics’s Annual Data Miner Survey is the largest survey of data mining, data science, and analytics professionals in the industry.

New!!: Data mining and Rexer's Annual Data Miner Survey · See more »

Robert Tibshirani

Robert Tibshirani (born July 10, 1956) is a Professor in the Departments of Statistics and Health Research and Policy at Stanford University.

New!!: Data mining and Robert Tibshirani · See more »

SAS Institute

SAS Institute (or SAS, pronounced "sass") is an American multinational developer of analytics software based in Cary, North Carolina.

New!!: Data mining and SAS Institute · See more »


Scikit-learn

Scikit-learn (formerly scikits.learn) is a free software machine learning library for the Python programming language.

New!!: Data mining and Scikit-learn · See more »


SEMMA

SEMMA is an acronym that stands for Sample, Explore, Modify, Model, and Assess.

New!!: Data mining and SEMMA · See more »

Sequential pattern mining

Sequential pattern mining is a topic of data mining concerned with finding statistically relevant patterns between data examples where the values are delivered in a sequence.

New!!: Data mining and Sequential pattern mining · See more »


SIGKDD

SIGKDD is the Association for Computing Machinery's (ACM) Special Interest Group (SIG) on Knowledge Discovery and Data Mining.

New!!: Data mining and SIGKDD · See more »


SIGMOD

SIGMOD is the Association for Computing Machinery's Special Interest Group on Management of Data, which specializes in large-scale data management problems and databases.

New!!: Data mining and SIGMOD · See more »

Social media mining

Social media mining is the process of representing, analyzing, and extracting actionable patterns and trends from raw social media data.

New!!: Data mining and Social media mining · See more »

Spatial database

A spatial database is a database that is optimized for storing and querying data that represents objects defined in a geometric space.

New!!: Data mining and Spatial database · See more »

Springer Science+Business Media

Springer Science+Business Media or Springer, part of Springer Nature since 2015, is a global publishing company that publishes books, e-books and peer-reviewed journals in science, humanities, technical and medical (STM) publishing.

New!!: Data mining and Springer Science+Business Media · See more »

SPSS Modeler

IBM SPSS Modeler is a data mining and text analytics software application from IBM.

New!!: Data mining and SPSS Modeler · See more »


Statistica

Statistica is an advanced analytics software package originally developed by StatSoft, which was acquired by Dell in March 2014.

New!!: Data mining and Statistica · See more »

Statistical classification

In machine learning and statistics, classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.
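The idea of assigning a new observation to a category on the basis of a labelled training set can be sketched with a nearest-centroid rule. This is only one of many classification techniques, chosen here for brevity; the names fit_centroids and classify and the sample data are illustrative assumptions.

```python
from collections import defaultdict
from math import dist
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Tuple[float, ...]


def fit_centroids(training: Sequence[Tuple[Point, Hashable]]) -> Dict[Hashable, Point]:
    """Compute the mean point of each category in the labelled training set."""
    groups: Dict[Hashable, List[Point]] = defaultdict(list)
    for point, label in training:
        groups[label].append(point)
    # zip(*points) groups the coordinates dimension by dimension.
    return {
        label: tuple(sum(coords) / len(points) for coords in zip(*points))
        for label, points in groups.items()
    }


def classify(centroids: Dict[Hashable, Point], observation: Point) -> Hashable:
    """Assign the new observation to the category with the closest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], observation))


if __name__ == "__main__":
    training = [
        ((1.0, 1.0), "a"), ((2.0, 1.0), "a"),
        ((8.0, 9.0), "b"), ((9.0, 8.0), "b"),
    ]
    model = fit_centroids(training)
    print(classify(model, (2.0, 2.0)))
```

Here the training set plays the role of the observations whose category membership is known, and classify handles the new observation.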

New!!: Data mining and Statistical classification · See more »

Statistical hypothesis testing

A statistical hypothesis, sometimes called confirmatory data analysis, is a hypothesis that is testable on the basis of observing a process that is modeled via a set of random variables.

New!!: Data mining and Statistical hypothesis testing · See more »

Statistical inference

Statistical inference is the process of using data analysis to deduce properties of an underlying probability distribution.

New!!: Data mining and Statistical inference · See more »

Statistical model

A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of some sample data and similar data from a larger population.

New!!: Data mining and Statistical model · See more »


Statistics

Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data.

New!!: Data mining and Statistics · See more »


StatSoft

StatSoft is the original developer of Statistica.

New!!: Data mining and StatSoft · See more »

Stellar Wind

"Stellar Wind" (or "Stellarwind") was the code name of a warrantless surveillance program begun under the George W. Bush administration's President's Surveillance Program (PSP).

New!!: Data mining and Stellar Wind · See more »

Structured data analysis (statistics)

Structured data analysis is the statistical data analysis of structured data.

New!!: Data mining and Structured data analysis (statistics) · See more »

Support vector machine

In machine learning, support vector machines (SVMs, also support vector networks) are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis.

New!!: Data mining and Support vector machine · See more »

Surveillance capitalism

Surveillance capitalism is a term first introduced by John Bellamy Foster and Robert W. McChesney in Monthly Review in 2014 and later popularized by academic Shoshana Zuboff that denotes a new genus of capitalism that monetizes data acquired through surveillance.

New!!: Data mining and Surveillance capitalism · See more »

Tanagra (machine learning)

Tanagra is a free suite of machine learning software for research and academic purposes developed by Ricco Rakotomalala at the Lumière University Lyon 2, France.

New!!: Data mining and Tanagra (machine learning) · See more »

Text mining

Text mining, also referred to as text data mining, roughly equivalent to text analytics, is the process of deriving high-quality information from text.

New!!: Data mining and Text mining · See more »

The American Statistician

The American Statistician is a quarterly peer-reviewed scientific journal covering statistics published by Taylor & Francis on behalf of the American Statistical Association.

New!!: Data mining and The American Statistician · See more »

The Review of Economic Studies

The Review of Economic Studies (also known as RESTUD) is a quarterly peer-reviewed academic journal covering economics.

New!!: Data mining and The Review of Economic Studies · See more »

Time series

A time series is a series of data points indexed (or listed or graphed) in time order.

New!!: Data mining and Time series · See more »

Torch (machine learning)

Torch is an open source machine learning library, a scientific computing framework, and a script language based on the Lua programming language.

New!!: Data mining and Torch (machine learning) · See more »

Total Information Awareness

Total Information Awareness (TIA) was a program of the United States Information Awareness Office that began during the 2003 fiscal year.

New!!: Data mining and Total Information Awareness · See more »

Training, test, and validation sets

In machine learning, the study and construction of algorithms that can learn from and make predictions on data is a common task.

New!!: Data mining and Training, test, and validation sets · See more »

Trevor Hastie

Trevor John Hastie (born 27 June 1953) is a South African and American statistician and computer scientist.

New!!: Data mining and Trevor Hastie · See more »

UBM plc

UBM plc is a global business-to-business (B2B) events organiser headquartered in London, United Kingdom.

New!!: Data mining and UBM plc · See more »


UIMA

UIMA, short for Unstructured Information Management Architecture, is an OASIS standard for content analytics, originally developed at IBM.

New!!: Data mining and UIMA · See more »

United States Congress

The United States Congress is the bicameral legislature of the Federal government of the United States.

New!!: Data mining and United States Congress · See more »

Usama Fayyad

Usama M. Fayyad (born July 1965) is an American data scientist and co-founder of the KDD conferences and of ACM SIGKDD, the association for Knowledge Discovery and Data Mining.

New!!: Data mining and Usama Fayyad · See more »


Vertica

Vertica Systems is an analytic database management software company.

New!!: Data mining and Vertica · See more »


VLDB

VLDB is an annual conference held by the non-profit Very Large Data Base Endowment Inc. The mission of VLDB is to promote and exchange scholarly work in databases and related fields throughout the world.

New!!: Data mining and VLDB · See more »

Web mining

Web mining is the application of data mining techniques to discover patterns from the World Wide Web.

New!!: Data mining and Web mining · See more »

Web scraping

Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites.

New!!: Data mining and Web scraping · See more »

Weka (machine learning)

Waikato Environment for Knowledge Analysis (Weka) is a suite of machine learning software written in Java, developed at the University of Waikato, New Zealand.

New!!: Data mining and Weka (machine learning) · See more »


XML

In computing, Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.

New!!: Data mining and XML · See more »

Redirects here:

Artificial Intelligence in Data Mining, DATA MINING, Data Mining, Data discovery, Data mine, Data miner, Data mining system, Data-mining, Datamine, Datamining, Information mining, Information-mining, Knowledge Discovery in Databases, Knowledge discovering in databases, Knowledge discovery in databases, Knowledge mining, List of data mining software, Pattern Mining, Pattern mining, Predictive software, Subject-based data mining, Usage mining, Visual Data Mining, Web data mining.


[1] https://en.wikipedia.org/wiki/Data_mining
