The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing.
Hadoop includes these subprojects:
* Hadoop Common: The common utilities that support the other Hadoop subprojects.
* Avro: A data serialization system that provides dynamic integration with scripting languages.
* Chukwa: A data collection system for managing large distributed systems.
* HBase: A scalable, distributed database that supports structured data storage for large tables.
* HDFS: A distributed file system that provides high throughput access to application data.
* Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.
* MapReduce: A software framework for distributed processing of large data sets on compute clusters.
* Pig: A high-level data-flow language and execution framework for parallel computation.
* ZooKeeper: A high-performance coordination service for distributed applications.
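To make the MapReduce model listed above concrete, here is a minimal, single-process sketch in Python of the map/shuffle/reduce phases. This is an illustration of the programming model only, not Hadoop's actual Java API; the function names (`map_word_count`, `reduce_word_count`, `run_local_mapreduce`) are invented for this example.

```python
from collections import defaultdict

def map_word_count(document):
    # Map phase: emit a (word, 1) pair for every word in the input record.
    for word in document.split():
        yield word, 1

def reduce_word_count(word, counts):
    # Reduce phase: combine all values emitted for a single key.
    return word, sum(counts)

def run_local_mapreduce(records, mapper, reducer):
    # Shuffle phase: group mapper output by key, then reduce each group.
    # A real Hadoop cluster distributes these phases across many nodes.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())

result = run_local_mapreduce(["to be or not to be"],
                             map_word_count, reduce_word_count)
# result == {"to": 2, "be": 2, "or": 1, "not": 1}
```

In Hadoop itself the same structure appears as `Mapper` and `Reducer` classes, with HDFS supplying the input records and the framework handling the shuffle between cluster nodes.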