Thursday, November 29, 2012

ZK is null on connection event (HBase)

HBase clients (such as the HBase shell) may throw an exception with the message "ZK is null on connection event" under some conditions. In my case, it happened on a slow machine.

This exception is raised when ZooKeeperWatcher (part of the HBase client) fails to connect to ZooKeeper within a certain period of time. The snippet below is an excerpt from ZooKeeperWatcher.java.

// Now, this callback can be invoked before the this.zookeeper is set.
// Wait a little while.
long finished = System.currentTimeMillis() +
   this.conf.getLong("hbase.zookeeper.watcher.sync.connected.wait", 2000);
while (System.currentTimeMillis() < finished) {
  Threads.sleep(1);
  if (this.recoverableZooKeeper != null) break;
}
if (this.recoverableZooKeeper == null) {
  LOG.error("ZK is null on connection event -- see stack trace " +
    "for the stack trace when constructor was called on this zkw",
    this.constructorCaller);
  throw new NullPointerException("ZK is null");
}

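We can see that this code polls once per millisecond until the recoverableZooKeeper field is set. It gives up after the number of milliseconds specified by the property hbase.zookeeper.watcher.sync.connected.wait, or after 2000 milliseconds (the default value) if the property is not set. If the field is still null at that point, it logs the error and throws the NullPointerException.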
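To see why a longer wait helps on a slow machine, the same deadline-polling pattern can be tried in isolation. This is only a minimal sketch, not HBase code: the class DeadlineWaitSketch and the method waitForValue are hypothetical names, an AtomicReference stands in for the recoverableZooKeeper field, and the 300 ms delay is an arbitrary value chosen to imitate slow initialization.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-alone sketch of the wait loop; not part of HBase.
public class DeadlineWaitSketch {

    // Polls ref about once per millisecond until it holds a value or
    // timeoutMs elapses. Returns the value, or null if the wait timed out.
    static <T> T waitForValue(AtomicReference<T> ref, long timeoutMs)
            throws InterruptedException {
        long finished = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < finished) {
            T value = ref.get();
            if (value != null) {
                return value;
            }
            Thread.sleep(1);
        }
        // One last check after the deadline, like the null check in the excerpt.
        return ref.get();
    }

    public static void main(String[] args) throws Exception {
        AtomicReference<String> handle = new AtomicReference<>();

        // Imitate a slow machine: the handle is assigned 300 ms after start.
        Thread setter = new Thread(() -> {
            try {
                Thread.sleep(300);
            } catch (InterruptedException e) {
                return;
            }
            handle.set("connected");
        });
        setter.start();

        // A wait shorter than the delay is expected to time out and yield null.
        System.out.println("short wait: " + waitForValue(handle, 50));
        // A wait longer than the delay should see the assigned value.
        System.out.println("long wait: " + waitForValue(handle, 2000));
        setter.join();
    }
}
```

In this sketch a timeout simply returns null, whereas the HBase excerpt throws a NullPointerException at the same point. Raising the timeout does not make initialization faster; it only gives the slow assignment more time to complete before the check.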

So a simple way to stop the exception is to set the property to a sufficiently large value in hbase-site.xml, as shown below.

<property>
  <name>hbase.zookeeper.watcher.sync.connected.wait</name>
  <value>10000</value>
</property>

At least in my case, the above setting made the exception disappear.