
Key: KATTA-6
Type: Sub-task
Status: Resolved
Resolution: Won't Fix
Priority: Major
Assignee: Stefan Groschupf
Reporter: Johannes Zillmann
Votes: 0
Watchers: 1
Katta
KATTA-5

refactor cluster start/stop mechanism

Created: 12/Dec/08 02:54 PM   Updated: 19/Feb/10 04:28 PM
Component/s: cluster
Affects Version/s: 0.4
Fix Version/s: 0.6



Description
A shutdown of the cluster (bin/stop-all.sh) leaves the ephemeral nodes (of the master and the nodes) behind.
On the next start, the ephemerals are still there (but not connected to any owner). The master and all nodes delete the unconnected ephemerals with their address and create new ones.
This leads to a lot of node connected / node disconnected events in the master log on startup, and it looks like something is wrong.
Especially on a large cluster the startup log looks very confusing. But redeployments shouldn't happen, thanks to the safe mode.

I think there are several solutions.
It is possible to re-own unconnected ephemerals if the session id and password are known. So if, e.g., a node persisted its ZooKeeper sessionId and password, it could re-own its ephemeral on startup if it still exists.
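
A minimal sketch of the persistence step, assuming the node stores its credentials in a local file. The ZooKeeper Java client exposes the values through `getSessionId()` and `getSessionPasswd()` and has a constructor that accepts both to reattach to a session that has not yet expired on the server. The `SessionCredentials` class and its file layout are hypothetical and not part of Katta; the sketch uses only the Java standard library.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical holder for a ZooKeeper session id and password (not Katta code). */
final class SessionCredentials {
    final long sessionId;
    final byte[] password;

    SessionCredentials(long sessionId, byte[] password) {
        this.sessionId = sessionId;
        this.password = password.clone();
    }

    /** Writes the session id, the password length, and the password bytes to the file. */
    void save(Path file) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            out.writeLong(sessionId);
            out.writeInt(password.length);
            out.write(password);
        }
    }

    /** Reads credentials written by save(); returns null if the file does not exist. */
    static SessionCredentials load(Path file) throws IOException {
        if (!Files.exists(file)) {
            return null;
        }
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            long id = in.readLong();
            byte[] pw = new byte[in.readInt()];
            in.readFully(pw);
            return new SessionCredentials(id, pw);
        }
    }
}
```

On startup a node could call `SessionCredentials.load(...)` and, if the result is not null, pass `sessionId` and `password` to the ZooKeeper client constructor. Whether re-owning is advisable at all is a separate question.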

What is already there is a shutdown hook on the node side. This hook tries, among other things, to delete the node's ephemeral.
The problem with this is the stop-all script, since it stops the master together with the zk process first and then the nodes. So when the hook executes, there is no zk system left to communicate with.
I think it would be a good move to decouple the master and zookeeper processes and stop the zookeeper process last in the stop-all.sh script.
This would also help people who may already have their own zookeeper system running!
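
The effect of the stop order can be illustrated with a toy in-memory model. This is not Katta or ZooKeeper code: `StopOrderModel`, the component names, and the rule that a shutdown hook only succeeds while ZooKeeper is reachable are simplifying assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model (hypothetical, not Katta code) of how stop order affects leftover ephemerals. */
final class StopOrderModel {
    private final Set<String> ephemerals = new TreeSet<>();
    private boolean zkRunning = true;

    /** A starting process registers its ephemeral node. */
    void register(String name) {
        ephemerals.add(name);
    }

    /** The shutdown hook can delete the ephemeral only while ZooKeeper is reachable. */
    void stopProcess(String name) {
        if (zkRunning) {
            ephemerals.remove(name);
        }
    }

    void stopZooKeeper() {
        zkRunning = false;
    }

    /** Stops components in the given order and returns the ephemerals left behind. */
    static List<String> leftoversAfter(List<String> stopOrder) {
        StopOrderModel model = new StopOrderModel();
        model.register("master");
        model.register("node1");
        model.register("node2");
        for (String component : stopOrder) {
            if (component.equals("zookeeper")) {
                model.stopZooKeeper();
            } else {
                model.stopProcess(component);
            }
        }
        return new ArrayList<>(model.ephemerals);
    }
}
```

In this model, stopping "zookeeper" first leaves all three ephemerals behind, while stopping it last leaves none, which is the behaviour the proposed change to stop-all.sh aims for.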



Ted Dunning added a comment - 17/Apr/09 04:27 PM

Re-owning ephemeral nodes is a very bad idea since it violates many of the guarantees that ZK provides.

But the suggestion about separating out ZK from the katta nodes is very important. Doing that allows multiple katta clusters to coexist in a single ZK namespace, which is very nice for high-level diagnostics. In general, ZK should be more persistent than any katta object.

If you have a stop operation and have ZK integrated into the master, then the master should delete all of the ZK related snapshots and transaction logs as it exits. This would prevent any confusion on the next start since ZK would always be empty on start.


Ted Dunning added a comment - 17/Apr/09 04:27 PM

This is related to KATTA-43 since a session expiration event is very similar to a disorderly shutdown.

Stefan Groschupf added a comment - 13/Oct/09 04:20 AM
Duplicated since the problems are also described in KATTA-43.