Issue Details (XML | Word | Printable)

Key: KATTA-17
Type: Improvement Improvement
Status: Resolved Resolved
Resolution: Fixed
Priority: Major Major
Assignee: Johannes Zillmann
Reporter: Stefan Groschupf
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Katta

create a load test running on ec2

Created: 08/Jan/09 04:04 PM   Updated: 07/Jan/10 06:55 PM
Component/s: None
Affects Version/s: None
Fix Version/s: 0.6


 Description  « Hide
The idea is to run load tests on ec2 to make performance a first level citizen.
+ write some script that creates a ec2 cluster
+ deploy the current sources on this cluster
+ write a class that generates a test index
+ start katta, deploy the test index
+ run a http://faban.sunsource.net test
+ graph the result
+ shutdown the cluster.

 All   Comments   Change History   git Commits      Sort Order: Ascending order - Click to sort in descending order
Ted Dunning added a comment - 17/Apr/09 04:21 PM
I would be happy to help anyone who works on this. We have been running a large cluster on EC2 for some time.

There are some surprises in store. Many of these center around ZK session timeout issues.


Ted Dunning added a comment - 17/Apr/09 04:22 PM

I should also point out that we have done a significant amount of load testing. Results were generally good in that the cluster handled pretty high loads, but also not as good as I would like in that at the highest loads, the cluster did not shed load well.

Peter Voss added a comment - 17/Apr/09 07:03 PM
Thanks for posting your experiences here. I am currently working on a coordinated, reproducible load test (starting a number of katta nodes, hitting these nodes from a configurable number of "test searcher" nodes to produce traffic and collecting statistic from these). With this we want to learn how katta deals with heavy load situation and learn about katta's throughput limits (to finally be able to improve on this).
An interesting topic would also be to investigate the ZK session timeout that you have mentioned.

Stefan Groschupf added a comment - 17/Apr/09 07:11 PM
HI Ted, Hi Peter,
I already spend quite some time working on that. Some scripts can be found in extras benchmark.
I tried working with faban and grinder but run into issues with both. I recently handed the issue to Peter. (Peter can you assign it to you)
We discussed and agreed we will do our own little laod generation and statistic collection. (Sad there is nothing that really works).
However using zookeeper for this as well, it might be that much work.

Stefan


Ted Dunning added a comment - 17/Apr/09 08:10 PM

I think that the ZK issue is orthogonal to load testing. It probably occurs most in EC2 due to virtualization delays. It should be possible to emulate that effect with a mocked up Zookeeper interface or by setting session timeouts very small.

Insofar as load testing is concerned, the only really important points that I would bring up would be:

a) should use a large variety of queries to avoid unrealistically good results from caching

b) should use a traffic source that approximates Poisson distribution. One easy way to do this is just have a fair number of real or virtual sources, each with substantial random variation in inter-query timing. We used about 100 sources that had uniform distribution from 0 to 100/desired-query-rate. The reason that this is important is that too much structure in the query stream leads to unreasonably good or bad performance estimates.

Plotting response time versus query generation rate is an interesting exercise. It will clearly diagnose your performance limit and what sorts of pathology result from over-load. The three kinds of important cases I have seen include:

1) well-behaved systems ... response time is nearly constant for query rates below saturation and increases linearly above saturation.

2) realistic, but still pretty well-behaved systems ... as in (1), but at some point response time is exponential or worse with respect to query rate

3) failure ... the response time shows hysteresis effects so that it looks like (1) or (2) as load approaches a critical point, but when load is decreased after exceeding the critical point, the response time stays very high.


Stefan Groschupf added a comment - 27/Apr/09 04:33 AM
Hi Peter,
great contribution! Thanks a lot. Some comments looking through the loadTest code:

+ not sure why you create a execution service with just one thread. I would leave that null until it is set via startSearch.
+ we might even want to make the # of threads configurable and also increase them incrementally to see where the point starts that katta starts queuing.
+ We for sure dont want to use Katta.search(_indexNames, _queryString, _count); but the Client object. Using Katta.search creates a client object with each search, what means it creates a zookeeper connection with each search. This is a lot of overhead.

+ we might want to rename TestSearcherMetaData to LoadTestNodeMetaData
+ We need to use than just one word, we might need to use a complete dictionary, see SampleIndexGenerator for an example. Lucene is just great i caching, so a single world will not give us good results.
+ also I wonder in generell how we want to analyze the result. I suggest we try to make sure the clocks are sync of this boxes. Than we just write a log file with <timestamp, loadTestNodeID, queryString, duration>. If we than just collect the results log files it would be straight forward to compute queries per second and the duration. Maybe we can use jfreechart to create such a graph (http://www.jfree.org/jfreechart/images/PriceVolumeDemo1.png)
Where we graph number of queries per sec and there duration.

+ also it would be super cool if we can have an ant task that runs 1 master, 2 searchNodes, 2 loadTestNodes on a singel server and than run this within a loadTest ant task and generate a graph. My vison would be that we can run this as part of the release and compare the performance over time.

Stefan


Peter Voss added a comment - 27/Apr/09 09:23 AM
Hi Ted,

thanks for your valuable input. I have a first cut implementation that uses random variation of inter-query durations, but I didn't have time to implement using a large variety of queries in my first cut implementation, yet. I will have a look at this later.

Hi Stefan,

+ Right. We don't have to create an execution service at the beginning. That was just to avoid a NPE, but I have changed this to make the code clearer and also added a JUnit test for the case where I wanted to avoid the NPE.
+ I am not using Katta.search anymore. Thanks for this hint.
+ TestSearcherMetaData is now called LoadTestNodeMetaData

I will comment on your other topics later.

--Peter


Ted Dunning added a comment - 27/Apr/09 03:57 PM
+ also I wonder in generell how we want to analyze the result. I suggest we try to make sure the clocks are sync of this boxes. Than we just write a log file with <timestamp, loadTestNodeID, queryString, duration>. If we than just collect the results log files it would be straight forward to compute queries per second and the duration.

Clock synchronization is not much of an issue. If you use most any of the standard kernels (I like the ones from alestic.com) then the clocks will be close enough to correct. In general, the query rate should be changed slowly (over a period of 10's of seconds) that synchronization is not an issue.

+ Maybe we can use jfreechart to create such a graph (http://www.jfree.org/jfreechart/images/PriceVolumeDemo1.png) Where we graph number of queries per sec and their duration.

I have used jfreechart and found it to produce pretty nice charts at a moderate level of effort. My own feeling is that the primary output should be tab-delimited files to make further analysis easy.

I can supply a sample of 20,000 real queries if you like. If you want a realistic corpus, I suggest (some of) wikipedia.


Peter Voss added a comment - 27/Apr/09 07:16 PM
I have just checked in a patch which allows to increase the number of threads used during a test. But I feel that we should keep the number of threads constant and just adjust the query rate as Ted suggests. But that would just be a small change.

Clock synchronization is not an issue because whenever I change a parameter (e.g. query rate our thread count) I am doing this by sending a message to all of the test nodes. So the changed parameter set would take affect at all nodes at the (more or less) same time.

I am currently just reporting: <thread count> <test duration> and I think it also makes sense to also report <node id> <query rate> <start time> and maybe <queryString>. I will take care of this later.


Stefan Groschupf added a comment - 27/Apr/09 09:53 PM
Hi Peter,

But I feel that we should keep the number of threads constant and just adjust the query rate as Ted suggests.

Right it is about to increase the queries per second and not really matters how many threads it is. Though somehow this correspond with each other. For example if you want to do 100 queries per second and each query takes 0.5 seconds than you need at least 50 threads and each thread can make 2 queries within the second.
Or you can simplify it and have one thread per query per second. Than you dont really care how long a query takes.

I actually would log each query with its parameter (startTime, endTime query, nodeId etc).
We later than can run different kind of analysis than if we just have the duration and thread count.


Ted Dunning added a comment - 28/Apr/09 09:03 PM

Right it is about to increase the queries per second and not really matters how many threads it is.

Actually, the number does matter a bit. It is important for the distribution of query arrival times to have a relatively large number of threads going. This is even more important when each query starts to take a while.

In general, the overall average time between queries should be much less than the average for each thread and the average response time for a single query should be much less than the random variation that the query thread adds.

Clock synchronization is not an issue because ... I am doing this by sending a message to all of the test nodes. So the changed parameter set would take affect at all nodes at the (more or less) same time.

It would only be an issue if you look at logs from different nodes that report the same events at different times.

If you log the message that controls query rate, then you should be in great shape. Likewise, you should be fine with EC2 nodes started by a standard kernel since they usually synchronize on boot and don't survive long after boot time.


Peter Voss added a comment - 01/May/09 02:45 PM
According to your comments (thanks for the great input) I have made a few changes to the code. During load testing the query rates is increased now. The number of threads is just an internal detail that is computed depending on the query rate. When starting a load test one can specify the following values:
1. Number of test searcher nodes to use
2. Start query rate (in queries per second)
3. End query rate
4. Query rate step
5. Time per test in ms (the period of time before the query rate gets changed, this probably should be something like 10000 or 30000)
6. Query details

The query rate gets distributed on the test searcher nodes. E.g. when having 5 nodes and a requested query rate of 100, each node will fire search requests at a rate of 20 queries per second.

When running at a query rate of 1 queries per seconds I am using 1 thread and an inter-query time of (1000ms - actual query duration). To randomize this a bit I am using random( 2 * (1000ms - actual query duration) ).

I am writing out statistics for each query (query rate, node id, start time, end time, elapsed time, query).
I am also writing out end results for each test iteration (requested query rate, achieved query rate, number of failed tests, average search duration, standard deviation)

So I think this should be a good start. The only part that really is missing is to use a dictionary of queries to avoid Lucene caching, We also need to build a proper example index that is suitable for load tests.


Peter Voss added a comment - 31/Oct/09 05:36 PM
I am currently migrating the existing load test code to the new design provided by KATTA-54 beginning of next week. I hope that I can finish my work beginning of next week.

Johannes Zillmann added a comment - 08/Dec/09 01:47 PM
Hey Peter, what is the status of this ?

Johannes Zillmann added a comment - 07/Jan/10 04:59 PM
I ported Peters code to the current katta.
I did a simplification in having one thread per query (this was more robust when coming into the limit area),

The loadtest is executed on a seperate katta cluster. So to run it you start one zookeeper and two katta clusters with different rootPathes.
For some shortcomings with the current implementation i opened KATTA-108.

I will update the documentation soon.