Overview
[Image: The Approaching Storm by Constant Troyon, 1849]
HBase is the Hadoop database. Use it when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware.
HBase is an open-source, distributed, column-oriented store modeled after Google's "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase also includes:
* Convenient base classes for backing Hadoop MapReduce jobs with HBase tables
* Query predicate pushdown via server-side scan and get filters
* Optimizations for realtime queries
* A high-performance Thrift gateway
* A RESTful Web service gateway that supports XML, Protobuf, and binary data encoding options
* Cascading source and sink modules
* Extensible JRuby-based (JIRB) shell
* Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX
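To make the Bigtable-style data model and the idea of filter pushdown more concrete, here is a minimal in-memory sketch in Python. It is not the HBase client API: the class ToyTable, its methods, and the row and column names are all hypothetical and exist only for illustration. It assumes the model described in the Bigtable paper, a sparse sorted map indexed by row key, column key ("family:qualifier"), and timestamp, and it shows a scan that applies a predicate inside the table instead of in the caller.

```python
import bisect
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical toy model for illustration only; NOT the HBase client API.
# A row maps column names ("family:qualifier") to {timestamp: value}.
Row = Dict[str, Dict[int, bytes]]


class ToyTable:
    """In-memory sketch of a sparse, sorted, versioned table."""

    def __init__(self) -> None:
        self._rows: Dict[str, Row] = {}
        self._sorted_keys: List[str] = []  # row keys kept in sorted order

    def put(self, row: str, column: str, value: bytes, ts: int) -> None:
        """Store one versioned cell; absent cells take no space (sparse)."""
        if row not in self._rows:
            bisect.insort(self._sorted_keys, row)
            self._rows[row] = {}
        self._rows[row].setdefault(column, {})[ts] = value

    def get(self, row: str, column: str) -> Optional[bytes]:
        """Random read: return the newest version of a cell, or None."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(
        self,
        start: str = "",
        stop: Optional[str] = None,
        row_filter: Optional[Callable[[str, Row], bool]] = None,
    ) -> Iterator[Tuple[str, Row]]:
        """Yield rows in key order from start (inclusive) to stop (exclusive).

        The filter runs here, next to the data, so rejected rows are never
        handed back to the caller. This mimics predicate pushdown.
        """
        i = bisect.bisect_left(self._sorted_keys, start)
        for key in self._sorted_keys[i:]:
            if stop is not None and key >= stop:
                break
            cells = self._rows[key]
            if row_filter is None or row_filter(key, cells):
                yield key, cells


table = ToyTable()
table.put("com.example/a", "anchor:home", b"Home", ts=1)
table.put("com.example/a", "anchor:home", b"Homepage", ts=2)
table.put("com.example/b", "contents:html", b"<html>", ts=1)
table.put("org.example/x", "contents:html", b"<html>", ts=1)

# Random read of the newest version of a single cell.
latest = table.get("com.example/a", "anchor:home")

# Range scan over all row keys that start with "com.".
com_rows = [key for key, _ in table.scan("com.", "com/")]

# Scan with a filter evaluated inside the table.
with_html = [
    key for key, _ in table.scan(row_filter=lambda k, cells: "contents:html" in cells)
]
```

Because row keys are kept sorted, related rows sit next to each other and a range scan touches only the keys it needs. A real deployment spreads such key ranges across many servers, which this single-process sketch does not attempt to model.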
The most recent version of HBase, 0.20.0, greatly improves on its predecessors:
* No HBase single point of failure
* Rolling restart for configuration changes and minor upgrades
* Random access performance on par with open-source relational databases such as MySQL