Hi,
I usually see the operator thread being stopped and restarted immediately. Is that expected? In this particular instance, on one of the nodes it stayed stopped for almost 2 hours, and no index deployment happened on this node during that time. 'listNodes' still showed the node as connected, though. I am using Katta 0.6.2.
2011-05-05 00:02:15,793 INFO net.sf.katta.master.OperatorThread:100 - operator thread stopped
2011-05-05 00:02:17,276 WARN org.I0Itec.zkclient.ZkEventThread:78 - Error handling event ZkEvent[State changed to SyncConnected sent to net.sf.katta.protocol.InteractionProtocol$1@64cbdef5]
org.I0Itec.zkclient.exception.ZkNodeExistsException: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /katta/master
at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:55)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
at org.I0Itec.zkclient.ZkClient.create(ZkClient.java:304)
at org.I0Itec.zkclient.ZkClient.createEphemeral(ZkClient.java:328)
at net.sf.katta.protocol.InteractionProtocol.createEphemeral(InteractionProtocol.java:478)
at net.sf.katta.protocol.InteractionProtocol.publishMaster(InteractionProtocol.java:351)
at net.sf.katta.master.Master.becomePrimaryOrSecondaryMaster(Master.java:104)
at net.sf.katta.master.Master.reconnect(Master.java:86)
at net.sf.katta.protocol.InteractionProtocol$1.handleStateChanged(InteractionProtocol.java:95)
at org.I0Itec.zkclient.ZkClient$5.run(ZkClient.java:484)
at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:72)
Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /katta/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:608)
at org.I0Itec.zkclient.ZkConnection.create(ZkConnection.java:87)
at org.I0Itec.zkclient.ZkClient$1.call(ZkClient.java:308)
at org.I0Itec.zkclient.ZkClient$1.call(ZkClient.java:304)
at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
... 9 more
2011-05-05 02:05:50,000 INFO net.sf.katta.protocol.upgrade.UpgradeRegistry:52 - version of distribution 0.6.2
2011-05-05 02:05:50,000 INFO net.sf.katta.protocol.upgrade.UpgradeRegistry:53 - version of cluster 0.6.2
2011-05-05 02:05:50,016 INFO net.sf.katta.master.Master:149 - start managing nodes...
Thanks,
Murali Krishna
Yes, this is from the master log. I have 2 hosts, each running a master and a slave node process. The other node's master log was fine, and the shards continued to deploy there.
Is it possible that it did not receive the ZooKeeper reconnect at all? This is happening multiple times a day and causing stability issues. Any pointers to fix this would be really helpful. I run Hadoop + HBase + ZooKeeper + Katta on 3 hosts.