
HBase is massively scalable -- and hugely complex

Rick Grehan | April 1, 2014
Apache HBase offers extreme scalability, reliability, and flexibility, but at the cost of many moving parts.

More precisely, a row is a collection of key/value pairs, the key being a column identifier and the value being the content of the cell that exists at the intersection of a specific row and column. However, because HBase is a column-oriented database, no two rows in a table need have the same columns. To complicate matters further, data is versioned in HBase. The actual coordinates of a value (cell) are the tuple {row key, column key, timestamp}. In addition, columns can be grouped into column families, which give a database designer further control over access characteristics, as all columns within a column family will be stored in close proximity to one another.
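To make the data model concrete, here is a rough sketch in Python of a sparse, versioned table where every cell is addressed by {row key, column key, timestamp}. The class, method names, and column names ("cf:name", "cf:email") are illustrative, not part of the HBase API.

```python
# Illustrative model of HBase's logical data layout: each cell is
# addressed by the tuple (row key, column key, timestamp), rows are
# sparse, and reads can request a point-in-time version.

class SparseTable:
    def __init__(self):
        # cells[(row, column)] -> {timestamp: value}
        self.cells = {}

    def put(self, row, column, timestamp, value):
        self.cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before `timestamp` (latest if None)."""
        versions = self.cells.get((row, column), {})
        eligible = [ts for ts in versions if timestamp is None or ts <= timestamp]
        return versions[max(eligible)] if eligible else None

table = SparseTable()
table.put("row1", "cf:name", 100, "Alice")
table.put("row1", "cf:name", 200, "Alicia")            # newer version of the same cell
table.put("row2", "cf:email", 150, "bob@example.com")  # rows need not share columns

print(table.get("row1", "cf:name"))       # latest version: Alicia
print(table.get("row1", "cf:name", 150))  # as of timestamp 150: Alice
```

Note how "row1" and "row2" have entirely different columns, and how the timestamp selects among versions of a single cell.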

A write operation in HBase first records the data to a commit log (a "write-ahead log"), then to an internal memory structure called a MemStore. When the MemStore fills, it is flushed to disk as an entity called an HFile. HFiles are stored as a sequence of data blocks, with an index appended to the file's end. Another index, kept in memory, speeds searches for data in HFiles.
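The write path above can be sketched in miniature: log first, buffer in memory, and flush the buffer as a sorted, indexed file once it fills. The class name, field names, and the flush threshold are invented for illustration; they are not HBase internals.

```python
# Sketch of the HBase write path: every put is appended to a write-ahead
# log, then staged in an in-memory MemStore; when the MemStore reaches a
# size limit, it is flushed as a sorted, indexed "HFile".

class MiniStore:
    def __init__(self, memstore_limit=3):
        self.wal = []            # write-ahead log (commit log)
        self.memstore = {}       # in-memory buffer of recent writes
        self.hfiles = []         # list of (sorted_records, index) pairs
        self.memstore_limit = memstore_limit

    def put(self, key, value):
        self.wal.append((key, value))       # 1. durably log the write
        self.memstore[key] = value          # 2. buffer it in memory
        if len(self.memstore) >= self.memstore_limit:
            self.flush()

    def flush(self):
        # Sort the buffered data, then write it out with a key -> offset index.
        records = sorted(self.memstore.items())
        index = {key: pos for pos, (key, _) in enumerate(records)}
        self.hfiles.append((records, index))
        self.memstore.clear()

store = MiniStore()
for k in ["c", "a", "b", "d"]:
    store.put(k, k.upper())
# The first three puts filled the MemStore and triggered a flush;
# "d" is still buffered in memory, but already safe in the WAL.
```

The point of the WAL is durability: if the process dies before a flush, the buffered writes can be replayed from the log.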

HFiles are immutable once written. If a key is deleted, HBase records a special "tombstone" marker to note the deletion. Tombstones are removed (as is the deleted data) when HFiles are periodically compacted.
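Because HFiles never change in place, a delete is just another write, and the cleanup happens later. A minimal sketch of that compaction step, with dicts standing in for immutable HFiles:

```python
# Sketch of delete-by-tombstone and compaction: HFiles are immutable, so a
# delete writes a marker; a later compaction merges HFiles and drops both
# the marker and the data it shadows.

TOMBSTONE = object()  # sentinel standing in for HBase's delete marker

def compact(hfiles):
    """Merge immutable HFiles (oldest first) into one, purging tombstones."""
    merged = {}
    for hfile in hfiles:            # newer files overwrite older entries
        merged.update(hfile)
    return {k: v for k, v in merged.items() if v is not TOMBSTONE}

older = {"a": "A", "b": "B"}
newer = {"b": TOMBSTONE, "c": "C"}   # "b" was deleted after the first flush

result = compact([older, newer])     # "b" and its tombstone are both gone
```

Until compaction runs, readers must see the tombstone and treat "b" as absent even though its old value still sits in the older HFile.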

HBase attempts to satisfy read operations first through the MemStore. Failing that, HBase checks yet another in-memory structure, the BlockCache, which is a read cache designed to deliver frequently read data from memory, rather than from the disk-based HFiles.
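That read path -- MemStore, then read cache, then disk -- can be sketched as a simple lookup chain. (A real BlockCache caches HFile blocks rather than individual cells; this simplification is deliberate.)

```python
# Sketch of the HBase read path: check the MemStore first, then the
# read cache, and only then fall back to the on-disk HFiles, caching
# whatever is found there for subsequent reads.

def read(key, memstore, block_cache, hfiles):
    if key in memstore:                 # freshest data: unflushed writes
        return memstore[key]
    if key in block_cache:              # recently read data, served from RAM
        return block_cache[key]
    for hfile in reversed(hfiles):      # newest HFile first
        if key in hfile:
            block_cache[key] = hfile[key]   # populate the read cache
            return hfile[key]
    return None

memstore = {"row-a": "in-mem"}
block_cache = {}
hfiles = [{"row-b": "on-disk"}]

print(read("row-a", memstore, block_cache, hfiles))  # served from MemStore
print(read("row-b", memstore, block_cache, hfiles))  # disk read, now cached
```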

HBase shards rows by regions, which are defined by a range of row keys. Every region in an HBase cluster is managed by a RegionServer process. Typically, there is a single RegionServer process per HBase node. As the amount of data grows, HBase splits regions and migrates the associated data to different nodes in the cluster for balancing purposes.
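Locating a region amounts to finding which row-key range contains a given key. A sketch of that lookup, with made-up region boundaries and server names:

```python
# Sketch of region sharding: each region owns a contiguous range of row
# keys, and a client locates the RegionServer whose range contains a key.
import bisect

# Regions sorted by start key; "" marks the beginning of the keyspace.
region_starts = ["", "g", "p"]
region_servers = ["rs-node1", "rs-node2", "rs-node3"]

def locate_region(row_key):
    """Find the RegionServer whose key range contains row_key."""
    i = bisect.bisect_right(region_starts, row_key) - 1
    return region_servers[i]

print(locate_region("apple"))   # rs-node1: range ["", "g")
print(locate_region("kiwi"))    # rs-node2: range ["g", "p")
print(locate_region("zebra"))   # rs-node3: range ["p", end)
```

A region split adds a new boundary to this table; migrating the region to another node changes only the server assignment, not the client-side lookup logic.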

HBase's cluster architecture is not completely symmetrical. For example, every cluster must have a single, active master node. Multiple nodes can (and should) be designated as master nodes, but when the cluster boots, the candidate masters coordinate so that only one is the acting master. It's the master's responsibility to monitor region servers, handle region server failover, and coordinate region splits.

Should the master node crash, the cluster can still operate in a steady-state mode -- managing read and write requests -- but cannot execute any of the operations that require the master's coordination (such as rebalancing). This is why it's a good idea to specify multiple master nodes: if the reigning master fails, it will be quickly replaced.
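The active/standby arrangement can be sketched as follows. In a real HBase cluster this coordination happens through ZooKeeper; the registry class below is a simplified stand-in.

```python
# Sketch of active/standby master failover: the first candidate to
# register becomes the active master, and if it fails, a standby is
# promoted so coordination duties can resume.

class MasterRegistry:
    def __init__(self):
        self.candidates = []
        self.active = None

    def register(self, name):
        self.candidates.append(name)
        if self.active is None:       # first registrant becomes active
            self.active = name

    def fail(self, name):
        self.candidates.remove(name)
        if self.active == name:       # promote a standby, if any remain
            self.active = self.candidates[0] if self.candidates else None

reg = MasterRegistry()
reg.register("master-1")
reg.register("master-2")
reg.fail("master-1")
print(reg.active)   # master-2 takes over
```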

You can run HBase atop a native file system for development purposes, but a deployed HBase cluster runs on HDFS, which -- as mentioned earlier -- seems like a poor playground for HBase. Despite the streaming-oriented underlying file system, HBase achieves fast random I/O. It accomplishes this magic by a combination of batching writes in memory and persisting data to disk using log-structured merge trees. As a result, all random writes are performed in memory, and when data is flushed to disk, the data is first sorted, then written sequentially with an accompanying index. Random reads are first attempted in memory, as mentioned above. If the requested data is not in memory, the subsequent disk search is speedy because the data is sorted and indexed.
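Why sorting before the flush matters: a lookup in a sorted file is a binary search rather than a full scan, which is what keeps reads against flushed data fast. A minimal sketch:

```python
# Sketch of why sorted, indexed flushes make disk reads fast: because a
# flushed file is sorted, a point lookup is a binary search, not a scan.
import bisect

def hfile_lookup(sorted_records, key):
    """Binary-search a flushed (sorted) list of (key, value) records."""
    keys = [k for k, _ in sorted_records]
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return sorted_records[i][1]
    return None

hfile = [("a", 1), ("c", 3), ("e", 5)]   # written sequentially, pre-sorted
print(hfile_lookup(hfile, "c"))  # 3
print(hfile_lookup(hfile, "d"))  # None
```

This is the log-structured merge tree bargain in a nutshell: the sequential write pattern suits HDFS, while sorting plus indexing preserves fast random reads.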

