|
I should also point out that we have done a significant amount of load testing. Results were generally good in that the cluster handled pretty high loads, but also not as good as I would like in that at the highest loads, the cluster did not shed load well. Thanks for posting your experiences here. I am currently working on a coordinated, reproducible load test (starting a number of katta nodes, hitting these nodes from a configurable number of "test searcher" nodes to produce traffic and collecting statistic from these). With this we want to learn how katta deals with heavy load situation and learn about katta's throughput limits (to finally be able to improve on this).
An interesting topic would also be to investigate the ZK session timeout that you have mentioned. HI Ted, Hi Peter,
I already spend quite some time working on that. Some scripts can be found in extras benchmark. I tried working with faban and grinder but run into issues with both. I recently handed the issue to Peter. (Peter can you assign it to you) We discussed and agreed we will do our own little laod generation and statistic collection. (Sad there is nothing that really works). However using zookeeper for this as well, it might be that much work. Stefan I think that the ZK issue is orthogonal to load testing. It probably occurs most in EC2 due to virtualization delays. It should be possible to emulate that effect with a mocked up Zookeeper interface or by setting session timeouts very small. Insofar as load testing is concerned, the only really important points that I would bring up would be: a) should use a large variety of queries to avoid unrealistically good results from caching b) should use a traffic source that approximates Poisson distribution. One easy way to do this is just have a fair number of real or virtual sources, each with substantial random variation in inter-query timing. We used about 100 sources that had uniform distribution from 0 to 100/desired-query-rate. The reason that this is important is that too much structure in the query stream leads to unreasonably good or bad performance estimates. Plotting response time versus query generation rate is an interesting exercise. It will clearly diagnose your performance limit and what sorts of pathology result from over-load. The three kinds of important cases I have seen include: 1) well-behaved systems ... response time is nearly constant for query rates below saturation and increases linearly above saturation. 2) realistic, but still pretty well-behaved systems ... as in (1), but at some point response time is exponential or worse with respect to query rate 3) failure ... the response time shows hysteresis effects so that it looks like (1) or (2) as load approaches a critical point, but when load is decreased after exceeding the critical point, the response time stays very high. Hi Peter,
great contribution! Thanks a lot. Some comments looking through the loadTest code: + not sure why you create a execution service with just one thread. I would leave that null until it is set via startSearch. + we might want to rename TestSearcherMetaData to LoadTestNodeMetaData + also it would be super cool if we can have an ant task that runs 1 master, 2 searchNodes, 2 loadTestNodes on a singel server and than run this within a loadTest ant task and generate a graph. My vison would be that we can run this as part of the release and compare the performance over time. Stefan Hi Ted,
thanks for your valuable input. I have a first cut implementation that uses random variation of inter-query durations, but I didn't have time to implement using a large variety of queries in my first cut implementation, yet. I will have a look at this later. Hi Stefan, + Right. We don't have to create an execution service at the beginning. That was just to avoid a NPE, but I have changed this to make the code clearer and also added a JUnit test for the case where I wanted to avoid the NPE. I will comment on your other topics later. --Peter + also I wonder in generell how we want to analyze the result. I suggest we try to make sure the clocks are sync of this boxes. Than we just write a log file with <timestamp, loadTestNodeID, queryString, duration>. If we than just collect the results log files it would be straight forward to compute queries per second and the duration.
Clock synchronization is not much of an issue. If you use most any of the standard kernels (I like the ones from alestic.com) then the clocks will be close enough to correct. In general, the query rate should be changed slowly (over a period of 10's of seconds) that synchronization is not an issue. + Maybe we can use jfreechart to create such a graph (http://www.jfree.org/jfreechart/images/PriceVolumeDemo1.png I have used jfreechart and found it to produce pretty nice charts at a moderate level of effort. My own feeling is that the primary output should be tab-delimited files to make further analysis easy. I can supply a sample of 20,000 real queries if you like. If you want a realistic corpus, I suggest (some of) wikipedia. I have just checked in a patch which allows to increase the number of threads used during a test. But I feel that we should keep the number of threads constant and just adjust the query rate as Ted suggests. But that would just be a small change.
Clock synchronization is not an issue because whenever I change a parameter (e.g. query rate our thread count) I am doing this by sending a message to all of the test nodes. So the changed parameter set would take affect at all nodes at the (more or less) same time. I am currently just reporting: <thread count> <test duration> and I think it also makes sense to also report <node id> <query rate> <start time> and maybe <queryString>. I will take care of this later. Hi Peter,
Right it is about to increase the queries per second and not really matters how many threads it is. Though somehow this correspond with each other. For example if you want to do 100 queries per second and each query takes 0.5 seconds than you need at least 50 threads and each thread can make 2 queries within the second. I actually would log each query with its parameter (startTime, endTime query, nodeId etc).
Actually, the number does matter a bit. It is important for the distribution of query arrival times to have a relatively large number of threads going. This is even more important when each query starts to take a while. In general, the overall average time between queries should be much less than the average for each thread and the average response time for a single query should be much less than the random variation that the query thread adds.
It would only be an issue if you look at logs from different nodes that report the same events at different times. If you log the message that controls query rate, then you should be in great shape. Likewise, you should be fine with EC2 nodes started by a standard kernel since they usually synchronize on boot and don't survive long after boot time. According to your comments (thanks for the great input) I have made a few changes to the code. During load testing the query rates is increased now. The number of threads is just an internal detail that is computed depending on the query rate. When starting a load test one can specify the following values:
1. Number of test searcher nodes to use 2. Start query rate (in queries per second) 3. End query rate 4. Query rate step 5. Time per test in ms (the period of time before the query rate gets changed, this probably should be something like 10000 or 30000) 6. Query details The query rate gets distributed on the test searcher nodes. E.g. when having 5 nodes and a requested query rate of 100, each node will fire search requests at a rate of 20 queries per second. When running at a query rate of 1 queries per seconds I am using 1 thread and an inter-query time of (1000ms - actual query duration). To randomize this a bit I am using random( 2 * (1000ms - actual query duration) ). I am writing out statistics for each query (query rate, node id, start time, end time, elapsed time, query). So I think this should be a good start. The only part that really is missing is to use a dictionary of queries to avoid Lucene caching, We also need to build a proper example index that is suitable for load tests. I am currently migrating the existing load test code to the new design provided by
Hey Peter, what is the status of this ?
I ported Peters code to the current katta.
I did a simplification in having one thread per query (this was more robust when coming into the limit area), The loadtest is executed on a seperate katta cluster. So to run it you start one zookeeper and two katta clusters with different rootPathes. I will update the documentation |
||||||||||||||||||||||||||||||||||||||||
There are some surprises in store. Many of these center around ZK session timeout issues.