Apache Hadoop

Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. [1]

114 relations: Amazon (company), Amazon Elastic Compute Cloud, Amazon S3, Apache Accumulo, Apache Cassandra, Apache CouchDB, Apache Flume, Apache HBase, Apache Hive, Apache Impala, Apache License, Apache Mahout, Apache Nutch, Apache Oozie, Apache Phoenix, Apache Pig, Apache Software Foundation, Apache Spark, Apache Thrift, Apache ZooKeeper, Apress, Big data, BigQuery, Bigtable, C (programming language), Cloud computing, Clustered file system, Cocoa (API), Command-line interface, Commodity computing, Computer cluster, Computer Weekly, Cross-platform, Data warehouse, Data-intensive computing, Distributed computing, Distributed data store, Docker (software), Doug Cutting, Facebook, Failover, FIFO (computing and electronics), File Transfer Protocol, Filesystem in Userspace, For Dummies, Google, Google Cloud Dataproc, Google Cloud Platform, Google File System, Google Storage, Hewlett-Packard, Hortonworks, HPCC, Hypertable, Hypertext Transfer Protocol, IBM, IBM Spectrum Scale, IBRIX Fusion, Internet protocol suite, JAR (file format), Java (programming language), Java virtual machine, Jetty (web server), Lambda architecture, LexisNexis, Linux, Load (computing), Locality of reference, Machine learning, Manning Publications, MapR, MapR FS, MapReduce, Marketwired, Method (computer programming), Microsoft, Microsoft Azure, Mike Cafarella, Mount (computing), Multi-core processor, Network socket, O'Reilly Media, OCaml, Open-source model, Operating system, Oracle Cloud Platform, Oracle Corporation, PDF, Petabyte, Pool (computer science), POSIX, Preemption (computing), Programming model, Quality of service, RAID, Redundancy (engineering), Remote procedure call, Replication (computing), Sector/Sphere, Secure Shell, Shell script, Slurm Workload Manager, Software framework, Sqoop, Stanford University, Storm (event processor), Supercomputer architecture, The New York Times, Throughput, TIFF, Unix, Virtual file system, Web application, Yahoo!.

Amazon (company)

Amazon.com, Inc., doing business as Amazon, is an American electronic commerce and cloud computing company based in Seattle, Washington that was founded by Jeff Bezos on July 5, 1994.

Amazon Elastic Compute Cloud

Amazon Elastic Compute Cloud (EC2) forms a central part of Amazon.com's cloud-computing platform, Amazon Web Services (AWS), by allowing users to rent virtual computers on which to run their own computer applications.

Amazon S3

Amazon S3 (Simple Storage Service) is a cloud computing web service offered by Amazon Web Services (AWS).

Apache Accumulo

Apache Accumulo is a highly scalable sorted, distributed key-value store based on Google's Bigtable.

Apache Cassandra

Apache Cassandra is a free and open-source distributed wide column store NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.

Apache CouchDB

Apache CouchDB is open source database software that focuses on ease of use and having a scalable architecture.

Apache Flume

Apache Flume is distributed, reliable, and available software for efficiently collecting, aggregating, and moving large amounts of log data.

Apache HBase

HBase is an open-source, non-relational, distributed database modeled after Google's Bigtable and written in Java.

Apache Hive

Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data summarization, query and analysis.

Apache Impala

Apache Impala is an open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop.

Apache License

The Apache License is a permissive free software license written by the Apache Software Foundation (ASF).

Apache Mahout

Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily in the areas of collaborative filtering, clustering and classification.

Apache Nutch

Apache Nutch is a highly extensible and scalable open source web crawler software project.

Apache Oozie

Apache Oozie is a server-based workflow scheduling system to manage Hadoop jobs.

Apache Phoenix

Apache Phoenix is an open source, massively parallel, relational database engine supporting OLTP for Hadoop using Apache HBase as its backing store.

Apache Pig

Apache Pig is a high-level platform for creating programs that run on Apache Hadoop.

Apache Software Foundation

The Apache Software Foundation (ASF) is an American non-profit corporation (classified as 501(c)(3) in the United States) that supports Apache software projects, including the Apache HTTP Server.

Apache Spark

Apache Spark is an open-source cluster-computing framework.

Apache Thrift

Thrift is an interface definition language and binary communication protocol used for defining and creating services for numerous languages.

Apache ZooKeeper

Apache ZooKeeper is a software project of the Apache Software Foundation.

Apress

Apress Media LLC is a publisher of information technology books, based in New York City.

Big data

Big data refers to data sets that are so large and complex that traditional data-processing application software is inadequate to deal with them.

BigQuery

BigQuery is a RESTful web service that enables interactive analysis of massively large datasets working in conjunction with Google Storage.

Bigtable

Bigtable is a compressed, high performance, and proprietary data storage system built on Google File System, Chubby Lock Service, SSTable (log-structured storage like LevelDB) and a few other Google technologies.

C (programming language)

C (as in the letter c) is a general-purpose, imperative computer programming language, supporting structured programming, lexical variable scope and recursion, while a static type system prevents many unintended operations.

Cloud computing

Cloud computing is an information technology (IT) paradigm that enables ubiquitous access to shared pools of configurable system resources and higher-level services that can be rapidly provisioned with minimal management effort, often over the Internet.

Clustered file system

A clustered file system is a file system which is shared by being simultaneously mounted on multiple servers.

Cocoa (API)

Cocoa is Apple's native object-oriented application programming interface (API) for their operating system macOS.

Command-line interface

A command-line interface or command language interpreter (CLI), also known as command-line user interface, console user interface and character user interface (CUI), is a means of interacting with a computer program where the user (or client) issues commands to the program in the form of successive lines of text (command lines).

Commodity computing

Commodity computing (also known as commodity cluster computing) involves the use of large numbers of already-available computing components for parallel computing, to get the greatest amount of useful computation at low cost.

Computer cluster

A computer cluster is a set of loosely or tightly connected computers that work together so that, in many respects, they can be viewed as a single system.

Computer Weekly

Computer Weekly is a digital magazine and website for IT professionals in the United Kingdom.

Cross-platform

In computing, cross-platform software (also multi-platform software or platform-independent software) is computer software that is implemented on multiple computing platforms.

Data warehouse

In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence.

Data-intensive computing

Data-intensive computing is a class of parallel computing applications that use a data-parallel approach to process large volumes of data, typically terabytes or petabytes in size and commonly referred to as big data.

Distributed computing

Distributed computing is a field of computer science that studies distributed systems.

Distributed data store

A distributed data store is a computer network where information is stored on more than one node, often in a replicated fashion.

Docker (software)

Docker is a computer program that performs operating-system-level virtualization, also known as containerization.

Doug Cutting

Douglass Read Cutting is a software designer and an advocate and creator of open-source search technology.

Facebook

Facebook is an American online social media and social networking service company based in Menlo Park, California.

Failover

In computing and related technologies such as networking, failover is switching to a redundant or standby computer server, system, hardware component or network upon the failure or abnormal termination of the previously active application, server, system, hardware component, or network.
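
The switch to a standby described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `call_with_failover`, `primary`, and `standby` are hypothetical names, servers are modeled as plain callables, and a `ConnectionError` stands in for the failure of the active server. It is not how Hadoop itself implements failover.

```python
def call_with_failover(servers, request):
    """Try each server in order and return the first successful response."""
    last_error = None
    for server in servers:
        try:
            return server(request)
        except ConnectionError as error:
            # The active server failed; fall through to the next standby.
            last_error = error
    raise RuntimeError("all servers failed") from last_error


def primary(request):
    # Hypothetical active server that is currently down.
    raise ConnectionError("primary is unreachable")


def standby(request):
    # Hypothetical redundant server that takes over.
    return "standby handled " + request


response = call_with_failover([primary, standby], "read block 7")
```

The caller never needs to know which server answered, which is the point of failing over to a redundant component.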

FIFO (computing and electronics)

FIFO is an acronym for first in, first out, a method for organizing and manipulating a data buffer, where the oldest (first) entry, or 'head' of the queue, is processed first.
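
As a small illustration of first in, first out ordering, the sketch below uses Python's standard `collections.deque` as the buffer. The job names are made up for the example.

```python
from collections import deque

queue = deque()
for job in ["job-a", "job-b", "job-c"]:
    queue.append(job)  # enqueue at the tail

processed = []
while queue:
    # Dequeue from the head, so the oldest entry is processed first.
    processed.append(queue.popleft())
```

After the loop, `processed` holds the jobs in the order they were submitted.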

File Transfer Protocol

The File Transfer Protocol (FTP) is a standard network protocol used for the transfer of computer files between a client and server on a computer network.

Filesystem in Userspace

Filesystem in Userspace (FUSE) is a software interface for Unix-like computer operating systems that lets non-privileged users create their own file systems without editing kernel code.

For Dummies

For Dummies is an extensive series of instructional/reference books which are intended to present non-intimidating guides for readers new to the various topics covered.

Google

Google LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, search engine, cloud computing, software, and hardware.

Google Cloud Dataproc

Google Cloud Dataproc (Cloud Dataproc) is a cloud-based managed Spark and Hadoop service offered on Google Cloud Platform.

Google Cloud Platform

Google Cloud Platform, offered by Google, is a suite of cloud computing services that runs on the same infrastructure that Google uses internally for its end-user products, such as Google Search and YouTube.

Google File System

Google File System (GFS or GoogleFS) is a proprietary distributed file system developed by Google to provide efficient, reliable access to data using large clusters of commodity hardware.

Google Storage

Google Cloud Storage is a RESTful online file storage web service for storing and accessing data on Google Cloud Platform infrastructure.

Hewlett-Packard

The Hewlett-Packard Company (commonly referred to as HP, or shortened to Hewlett-Packard) was an American multinational information technology company headquartered in Palo Alto, California.

Hortonworks

Hortonworks is a big data software company based in Santa Clara, California.

HPCC

HPCC (High-Performance Computing Cluster), also known as DAS (Data Analytics Supercomputer), is an open source, data-intensive computing system platform developed by LexisNexis Risk Solutions.

Hypertable

Hypertable was an open-source software project to implement a database management system inspired by publications on the design of Google's Bigtable.

Hypertext Transfer Protocol

The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative, and hypermedia information systems.

IBM

The International Business Machines Corporation (IBM) is an American multinational technology company headquartered in Armonk, New York, United States, with operations in over 170 countries.

IBM Spectrum Scale

IBM Spectrum Scale is a high-performance clustered file system developed by IBM.

IBRIX Fusion

IBRIX Fusion is a parallel file system combined with a logical volume manager, availability features and a management interface.

Internet protocol suite

The Internet protocol suite is the conceptual model and set of communications protocols used on the Internet and similar computer networks.

JAR (file format)

A JAR (Java ARchive) is a package file format typically used to aggregate many Java class files and associated metadata and resources (text, images, etc.) into one file for distribution.

Java (programming language)

Java is a general-purpose computer-programming language that is concurrent, class-based, object-oriented, and specifically designed to have as few implementation dependencies as possible.

Java virtual machine

A Java virtual machine (JVM) is a virtual machine that enables a computer to run Java programs as well as programs written in other languages and compiled to Java bytecode.

Jetty (web server)

Eclipse Jetty is a Java HTTP (Web) server and Java Servlet container.

Lambda architecture

Lambda architecture is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch- and stream-processing methods.

LexisNexis

LexisNexis Group is a corporation providing computer-assisted legal research as well as business research and risk management services.

Linux

Linux is a family of free and open-source software operating systems built around the Linux kernel.

Load (computing)

In UNIX computing, the system load is a measure of the amount of computational work that a computer system performs.

Locality of reference

In computer science, locality of reference, also known as the principle of locality, is a term for the phenomenon in which the same values, or related storage locations, are frequently accessed, depending on the memory access pattern.

Machine learning

Machine learning is a subset of artificial intelligence in the field of computer science that often uses statistical techniques to give computers the ability to "learn" (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.

Manning Publications

Manning Publications is an American publisher established by Lee Fitzpatrick and Marjan Bace that publishes books on computer technology topics, with a particular focus on web development.

MapR

MapR is a business software company headquartered in Santa Clara, California.

MapR FS

The MapR File System (MapR FS) is a clustered file system that supports both very large-scale and high-performance uses.

MapReduce

MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.
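
To make the programming model concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps applied to a word count. It is an illustration only: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, nothing here uses Hadoop's own API, and a real implementation would distribute each phase across the nodes of a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three steps in sequence over a list of input records."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big cluster", "data cluster data"])
```

Because each call to `map_phase` looks at one record and each call to `reduce_phase` looks at one key, both steps could in principle run in parallel on different machines, which is what makes the model suit large clusters.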

Marketwired

Marketwired is a press release distribution service headquartered in Toronto, Ontario, Canada.

Method (computer programming)

A method in object-oriented programming (OOP) is a procedure associated with a message and an object.

Microsoft

Microsoft Corporation (abbreviated as MS) is an American multinational technology company with headquarters in Redmond, Washington.

Microsoft Azure

Microsoft Azure (formerly Windows Azure) is a cloud computing service created by Microsoft for building, testing, deploying, and managing applications and services through a global network of Microsoft-managed data centers.

Mike Cafarella

Mike Cafarella is a computer scientist specializing in database management systems.

Mount (computing)

Mounting is a process by which the operating system makes files and directories on a storage device (such as a hard drive, CD-ROM, or network share) available for users to access via the computer's file system.

Multi-core processor

A multi-core processor is a single computing component with two or more independent processing units called cores, which read and execute program instructions.

Network socket

A network socket is an internal endpoint for sending or receiving data within a node on a computer network.

O'Reilly Media

O'Reilly Media (formerly O'Reilly & Associates) is an American media company established by Tim O'Reilly that publishes books and Web sites and produces conferences on computer technology topics.

OCaml

OCaml, originally named Objective Caml, is the main implementation of the programming language Caml, created by Xavier Leroy, Jérôme Vouillon, Damien Doligez, Didier Rémy, Ascánder Suárez and others in 1996.

Open-source model

The open-source model is a decentralized software-development model that encourages open collaboration.

Operating system

An operating system (OS) is system software that manages computer hardware and software resources and provides common services for computer programs.

Oracle Cloud Platform

Oracle Cloud Platform (OCP) is part of Oracle Cloud.

Oracle Corporation

Oracle Corporation is an American multinational computer technology corporation, headquartered in Redwood Shores, California.

PDF

The Portable Document Format (PDF) is a file format developed in the 1990s to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems.

Petabyte

The petabyte is a multiple of the unit byte for digital information.

Pool (computer science)

In computer science, a pool is a set of resources that are kept ready to use, rather than acquired on use and released afterwards.
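
A pool in this sense can be sketched with Python's standard `queue.Queue`. The class name `ResourcePool` and its `acquire` and `release` methods are hypothetical names chosen for this example. Resources are created once up front and handed back after use instead of being destroyed.

```python
import queue


class ResourcePool:
    """Keep a fixed set of resources ready and reuse them between callers."""

    def __init__(self, factory, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())  # create every resource ahead of time

    def acquire(self, timeout=None):
        # Blocks until a resource is idle; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    def release(self, resource):
        # Return the resource to the pool instead of discarding it.
        self._idle.put(resource)


pool = ResourcePool(factory=list, size=1)
first = pool.acquire()
pool.release(first)
second = pool.acquire()
reused = second is first  # the same object is handed out again
```

With a pool of size one, the second `acquire` returns the very object that was released, showing that nothing new was allocated.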

POSIX

The Portable Operating System Interface (POSIX) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems.

Preemption (computing)

In computing, preemption is the act of temporarily interrupting a task being carried out by a computer system, without requiring its cooperation, and with the intention of resuming the task at a later time.

Programming model

A programming model refers to the style of programming where execution is invoked by making what appear to be library calls.

Quality of service

Quality of service (QoS) is the description or measurement of the overall performance of a service, such as a telephony or computer network or a cloud computing service, particularly the performance seen by the users of the network.

RAID

RAID (Redundant Array of Independent Disks, originally Redundant Array of Inexpensive Disks) is a data storage virtualization technology that combines multiple physical disk drive components into one or more logical units for the purposes of data redundancy, performance improvement, or both.

Redundancy (engineering)

In engineering, redundancy is the duplication of critical components or functions of a system with the intention of increasing reliability of the system, usually in the form of a backup or fail-safe, or to improve actual system performance, such as in the case of GNSS receivers, or multi-threaded computer processing.

Remote procedure call

In distributed computing, a remote procedure call (RPC) is when a computer program causes a procedure (subroutine) to execute in a different address space (commonly on another computer on a shared network), which is coded as if it were a normal (local) procedure call, without the programmer explicitly coding the details for the remote interaction.

Replication (computing)

Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.

Sector/Sphere

Sector/Sphere is an open source software suite for high-performance distributed data storage and processing.

Secure Shell

Secure Shell (SSH) is a cryptographic network protocol for operating network services securely over an unsecured network.

Shell script

A shell script is a computer program designed to be run by the Unix shell, a command-line interpreter.

Slurm Workload Manager

The Slurm Workload Manager (formerly known as Simple Linux Utility for Resource Management or SLURM), or Slurm, is a free and open-source job scheduler for Linux and Unix-like kernels, used by many of the world's supercomputers and computer clusters.

Software framework

In computer programming, a software framework is an abstraction in which software providing generic functionality can be selectively changed by additional user-written code, thus providing application-specific software.

Sqoop

Sqoop is a command-line interface application for transferring data between relational databases and Hadoop.

Stanford University

Stanford University (officially Leland Stanford Junior University, colloquially the Farm) is a private research university in Stanford, California.

Storm (event processor)

Apache Storm is a distributed stream processing computation framework written predominantly in the Clojure programming language.

Supercomputer architecture

Approaches to supercomputer architecture have taken dramatic turns since the earliest systems were introduced in the 1960s.

The New York Times

The New York Times (sometimes abbreviated as The NYT or The Times) is an American newspaper based in New York City with worldwide influence and readership.

Throughput

In general terms, throughput is the maximum rate of production or the maximum rate at which something can be processed.

TIFF

Tagged Image File Format, abbreviated TIFF or TIF, is a computer file format for storing raster graphics images, popular among graphic artists, the publishing industry, and photographers.

Unix

Unix (trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in the 1970s at the Bell Labs research center by Ken Thompson, Dennis Ritchie, and others.

Virtual file system

A Virtual File System (VFS) or virtual filesystem switch is an abstraction layer on top of a more concrete file system.

Web application

In computing, a web application or web app is a client–server computer program in which the client (including the user interface and client-side logic) runs in a web browser.

Yahoo!

Yahoo! is a web services provider headquartered in Sunnyvale, California and wholly owned by Verizon Communications through Oath Inc.

Redirects here:

Amazon Elastic MapReduce, HDFS, Hadoop, Hadoop Distributed File System, Hadoop Distributed Filesystem, Hadoop YARN, Hadoop distributed file system, YARN.

References

[1] https://en.wikipedia.org/wiki/Apache_Hadoop
