I ran DeployUndeploySearchInLoop (which deploys an index, searches in it, and undeploys it) over an extended period. This emulates long-running use of a LuceneClient. Looking at heap dumps after several hours, I found a
ZooKeeper$ZkWatchManager.existWatches map with 8400 entries, for paths like
/katta/shard-to-nodes/index53#bIndex, which belong to indices that had been undeployed long before.
A ZooKeeper watch is normally removed once an event for it is triggered. ZkClient immediately re-registers the watch if it still has a listener for that path.
In Katta, the Client removes its shard-to-nodes listeners when an index has been removed.
Looking at the dumps, it seems that sometimes the
ZooKeeper$ZkWatchManager.existWatches map has been cleared of entries for undeployed indices and sometimes not. I suspect that this depends on the order of events. If the client gets the index-removed event before the last shard-to-nodes change event, every index-related watch gets removed. If it's the other way around, some obsolete watches stay around forever, presumably because ZkClient re-registered them while the listener still existed.
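The suspected ordering problem can be illustrated with a toy model. This is not Katta, ZooKeeper, or ZkClient code: StaleWatchModel and its methods are hypothetical, and the model assumes only the two behaviours described above, namely that a watch is consumed when its event fires and is re-registered while a listener exists, and that index removal drops the listener without touching the watch.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the suspected leak. StaleWatchModel and its methods are
// hypothetical; they are NOT ZooKeeper or ZkClient classes, only a stand-in
// for ZkWatchManager.existWatches plus ZkClient's listener bookkeeping.
class StaleWatchModel {

    // Paths with a pending one-shot watch (stands in for existWatches).
    private final Set<String> existWatches = new HashSet<>();
    // Paths the client still has a listener for.
    private final Set<String> listeners = new HashSet<>();

    // Client subscribes to a path: keep the listener and register a watch.
    void subscribe(String path) {
        listeners.add(path);
        existWatches.add(path);
    }

    // An event fires: the one-shot watch is consumed, then re-registered
    // only if a listener for the path still exists.
    void fireEvent(String path) {
        if (existWatches.remove(path) && listeners.contains(path)) {
            existWatches.add(path);
        }
    }

    // Cleanup on index removal: drops the listener, leaves the watch alone.
    void unsubscribe(String path) {
        listeners.remove(path);
    }

    int pendingWatches() {
        return existWatches.size();
    }

    // Ordering 1: index-removed is handled before the last shard-to-nodes event.
    static int indexRemovedFirst(String path) {
        StaleWatchModel m = new StaleWatchModel();
        m.subscribe(path);
        m.unsubscribe(path);
        m.fireEvent(path); // watch consumed, not re-registered
        return m.pendingWatches();
    }

    // Ordering 2: the last shard-to-nodes event arrives before index-removed.
    static int changeEventFirst(String path) {
        StaleWatchModel m = new StaleWatchModel();
        m.subscribe(path);
        m.fireEvent(path);   // listener still present, so the watch is re-registered
        m.unsubscribe(path); // nothing removes the re-registered watch
        return m.pendingWatches();
    }

    // Repeats ordering 2 for many deploy/undeploy cycles on one client.
    static int leakAfterCycles(int cycles) {
        StaleWatchModel m = new StaleWatchModel();
        for (int i = 0; i < cycles; i++) {
            String path = "/katta/shard-to-nodes/index" + i + "#bIndex";
            m.subscribe(path);
            m.fireEvent(path);
            m.unsubscribe(path);
        }
        return m.pendingWatches();
    }

    public static void main(String[] args) {
        String path = "/katta/shard-to-nodes/index53#bIndex";
        System.out.println("index-removed first: " + indexRemovedFirst(path));
        System.out.println("change event first: " + changeEventFirst(path));
        System.out.println("after 100 cycles: " + leakAfterCycles(100));
    }
}
```

Running main prints 0, 1 and 100: under the second ordering every deploy/undeploy cycle leaves one watch behind, which would be consistent with the steady growth of existWatches seen in the heap dumps.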
Removing the watches explicitly isn't that easy (see ZOOKEEPER-442).
Re-instantiating the client isn't a solution either, since
ZooKeeper instances don't seem to be properly garbage collected (see
http://mail-archives.apache.org/mod_mbox/zookeeper-user/201102.mbox/browser).