Facebook announced in a blog post on Thursday that
it has upgraded the Apache HBase database with a new open source system
called HydraBase. Facebook is an avid HBase shop, using it to store
data for various services, including the company’s internal monitoring
system, search indexing, streaming data analysis and data scraping. What
makes HydraBase better than HBase, according to Facebook, is reliability:
it is designed to minimize downtime when servers fail.
With HBase, data is sharded across many regions, with multiple regions
hosted on a set of “region servers.” If a region server goes down, all
the regions it hosts have to migrate to another region server. According
to Facebook, although HBase has automatic failover, that failover can take
a long time to complete.
HydraBase counters this lag by having each region hosted on multiple
region servers, so if a single server goes kaput, the others can act as
backups, significantly improving recovery time compared with HBase. The
company claims HydraBase could limit Facebook to "no more than five
minutes of downtime in a year," roughly 99.999 percent availability.
From the blog post:
The set of region servers serving each region forms a quorum. Each quorum has a leader that services read and write requests from the client. HydraBase uses the RAFT consensus protocol to ensure consistency across the quorum. With a quorum of 2F+1, HydraBase can tolerate up to F failures. Each hosting region server synchronously writes to the WAL corresponding to the modified region, but only a majority of the region servers need to complete their writes to ensure consistency.
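The arithmetic behind that quorum rule is easy to see in a short sketch. This is not Facebook's code, just a minimal Python illustration of the 2F+1 / majority relationship the post describes:

```python
# Minimal sketch of the quorum write rule: with 2F+1 replicas, a write commits
# once a majority (F+1) of region servers have appended it to their write-ahead
# logs, so up to F servers can fail without losing availability or consistency.

def quorum_size(f: int) -> int:
    """Replicas needed to tolerate f failures: 2f + 1."""
    return 2 * f + 1

def majority(replicas: int) -> int:
    """Acks required before a write is considered committed."""
    return replicas // 2 + 1

def write_committed(acks: int, replicas: int) -> bool:
    """True once a majority of region servers have synced the WAL entry."""
    return acks >= majority(replicas)

f = 2                        # failures we want to survive
replicas = quorum_size(f)    # 5 region servers per region
print(replicas, majority(replicas))   # 5 3
print(write_committed(2, replicas))   # False: only 2 of 5 WAL writes finished
print(write_committed(3, replicas))   # True: majority reached, write commits
```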
Facebook is testing HydraBase and the company plans on deploying the system in phases across production clusters.
In addition to HydraBase, Facebook also open sourced HDFS RAID on
Thursday, a way of using erasure codes, a method of data protection, to
cut down on the redundant copies of data Hadoop keeps as backups in case
a copy is lost.
When the company used HDFS RAID in its data warehouse clusters last year, the blog post explains, "the cluster's overall replication factor was reduced enough to represent tens of petabytes of capacity saving."
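A rough way to see where that capacity saving comes from: plain replication stores several full copies of every byte, while an erasure code stores the data once plus a smaller amount of parity. The sketch below is a back-of-the-envelope Python comparison, not Facebook's configuration; the (10, 4) Reed-Solomon parameters are an assumption chosen for illustration.

```python
# Back-of-the-envelope comparison (not from Facebook's post) of storage overhead
# for 3x replication versus an erasure-coded layout. The (10, 4) parameters
# below are illustrative, not a claim about Facebook's exact setup.

def replication_overhead(copies: int) -> float:
    """Bytes stored per byte of user data when keeping n full copies."""
    return float(copies)

def erasure_overhead(data_blocks: int, parity_blocks: int) -> float:
    """Bytes stored per byte of user data with a (data, parity) erasure code."""
    return (data_blocks + parity_blocks) / data_blocks

print(replication_overhead(3))   # 3.0 -> three bytes kept for every byte of data
print(erasure_overhead(10, 4))   # 1.4 -> 40% overhead, yet any 4 lost blocks are recoverable
```

Dropping the effective replication factor from 3.0 toward something like 1.4 bytes per byte is the kind of reduction that, across a warehouse measured in hundreds of petabytes, adds up to the tens of petabytes of savings the post cites.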