EC2 is probably not the best environment for running Redis on virtualized hardware, but it is a popular one, and there are a number of points to understand to get the best out of Redis on this platform.
The points below are summarized from the Redis documentation: http://redis.io/topics/persistence
This is not specific to EC2, but Redis is significantly slower when running on a VM (in terms of maximum supported throughput). This is because, for basic operations, Redis does not add much overhead to the epoll/read/write system calls required to handle client connections (like Memcached, or other efficient key/value stores). System calls are typically more expensive on a VM, and they represent a significant part of Redis activity (especially in benchmarks). In those conditions, a 50% decrease in maximum throughput compared to bare metal is not uncommon.
Of course, it also depends on the quality of the hypervisor. For EC2, Xen is used.
Benchmarking in good conditions
Benchmarking is often tricky, especially on a platform like EC2. One point often forgotten is to ensure a proper configuration for both the benchmark client and the server. For instance, do not run redis-benchmark on a CPU-starved micro-instance (which will likely be throttled by Amazon) while targeting your Redis server. Both machines are equally important to get a good maximum throughput.
Actually, to evaluate Redis performance, you need to:
- run redis-benchmark locally (on the same machine as the server), assuming you have more than one vCPU core.
- run redis-benchmark remotely (from a different VM), on a machine whose QoS configuration is equivalent to the server machine.
That way you can evaluate and compare the performance of the machines and of the network.
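The two runs above can be sketched as follows. This is only an illustration: the helper name, the private IP address and the client/request counts are assumptions, not values from the documentation. It just builds the same redis-benchmark command line for the local and the remote case, so that both runs use identical parameters.

```python
import shlex


def benchmark_command(host="127.0.0.1", port=6379, clients=50, requests=100000):
    """Build a redis-benchmark command line (-q prints one summary line per test)."""
    return [
        "redis-benchmark",
        "-h", host,
        "-p", str(port),
        "-c", str(clients),    # number of parallel connections
        "-n", str(requests),   # total number of requests
        "-q",
    ]


# Hypothetical private address of the VM hosting the Redis server.
SERVER_PRIVATE_IP = "10.0.0.12"

local_run = benchmark_command()                         # run on the server VM itself
remote_run = benchmark_command(host=SERVER_PRIVATE_IP)  # run on an equivalent client VM

if __name__ == "__main__":
    print("on the server VM:", shlex.join(local_run))
    print("on the client VM:", shlex.join(remote_run))
```

Comparing the two sets of results gives a rough idea of how much throughput is lost to the network rather than to the machines.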
On EC2, you will probably get the best results with second-generation M3 instances (or high-memory, or cluster compute instances), so that you benefit from HVM (hardware virtualization) rather than relying on slower para-virtualization.
The fork issue
This is not specific to EC2, but to Xen: forking a large process can be really slow on Xen (it looks better with KVM). For Redis, this can be a big problem if you plan to use persistence: both persistence options (RDB or AOF) need the main thread to fork and launch background save or rewrite processes.
In some cases, fork latency will freeze the Redis event loop for several seconds. The more memory the Redis instance manages, the higher the latency.
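To see how bad the fork latency is on a given VM, one option is to look at the latest_fork_usec and used_memory_rss fields reported by the INFO command and normalize the fork time per GB of memory. Here is a minimal sketch, assuming you have saved the INFO output as text (for instance from redis-cli). The sample values are made up for the example, not measurements.

```python
def parse_info(text):
    """Parse the text returned by the Redis INFO command into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Memory"
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def fork_ms_per_gb(info):
    """Duration of the last fork, in milliseconds per GB of resident memory."""
    fork_ms = int(info["latest_fork_usec"]) / 1000.0
    rss_gb = int(info["used_memory_rss"]) / float(1024 ** 3)
    return fork_ms / rss_gb


# Made-up sample: an 8 GB instance whose last fork took 1.92 seconds.
SAMPLE_INFO = """\
# Memory
used_memory_rss:8589934592
# Stats
latest_fork_usec:1920000
"""

rate = fork_ms_per_gb(parse_info(SAMPLE_INFO))
```

With this sample, rate is 240 ms per GB, which would mean an event loop frozen for almost two seconds at each background save.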
On EC2, be sure to use an HVM-enabled instance (M3, high-memory, cluster); it will mitigate the issue.
Then, if you have large memory requirements, and your application can tolerate it, consider running several smaller Redis instances on the same machine, and shard your data across them. This will bring the latency caused by fork operations down to an acceptable level.
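As an illustration of what sharding over several local instances can look like on the client side, here is a minimal sketch based on a CRC32 hash of the key. The list of ports and the shard_for name are assumptions for the example. Many client libraries and proxies provide their own (usually better) sharding scheme.

```python
import zlib

# Hypothetical layout: four small Redis instances on the same machine.
SHARDS = [
    ("127.0.0.1", 6379),
    ("127.0.0.1", 6380),
    ("127.0.0.1", 6381),
    ("127.0.0.1", 6382),
]


def shard_for(key, shards=SHARDS):
    """Map a key to one (host, port) pair; the same key always maps to the same shard."""
    digest = zlib.crc32(key.encode("utf-8"))
    return shards[digest % len(shards)]


host, port = shard_for("user:42")
```

Note that with a plain modulo, changing the number of instances remaps most keys, so the number of shards is better chosen up front.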
If you use RDB, keep in mind that the memory copy-on-write mechanism will start duplicating pages once the background save process has been forked off. So you need to make sure there is enough memory for Redis itself, plus some margin to cope with the COW. The amount of extra memory depends on your workload: the more you write to the instance, the more extra memory you need.
Please note that writing a file may also consume some memory (because of the filesystem cache), so during a Redis background save, you need to account for Redis memory, COW overhead, and the size of the dump file.
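The three terms above can be added up in a back-of-the-envelope estimate. The sketch below is only that: the function name and the dirty_fraction parameter (the share of the dataset pages modified while the save is running) are assumptions, and the real COW overhead has to be measured on your own workload.

```python
GIB = 1024 ** 3


def bgsave_memory_budget(dataset_bytes, dirty_fraction, dump_bytes):
    """Rough upper bound on memory needed during a background save.

    dataset_bytes:  memory used by the Redis instance itself
    dirty_fraction: assumed share of pages written to during the save (0.0 to 1.0)
    dump_bytes:     size of the dump file, which may sit in the filesystem cache
    """
    if not 0.0 <= dirty_fraction <= 1.0:
        raise ValueError("dirty_fraction must be between 0.0 and 1.0")
    cow_bytes = int(dataset_bytes * dirty_fraction)
    return dataset_bytes + cow_bytes + dump_bytes


# Example figures (made up): 10 GiB dataset, 25% of pages rewritten, 4 GiB dump file.
budget = bgsave_memory_budget(10 * GIB, 0.25, 4 * GIB)
```

With these example figures the budget comes to 16.5 GiB, well above the 10 GiB used by the dataset alone.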
The machine running the Redis server must never swap. If it does, the result will be catastrophic. Contrary to other stores, Redis is not virtual memory friendly.
With Linux, take care to set sensible system parameters: vm.overcommit_memory=1 and vm.swappiness=0 (or a very low value anyway). Do not use old kernel versions: they are quite bad at enforcing a low swappiness (resulting in swapping when a huge file is written).
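A quick way to verify these two parameters on a running Linux box is to read them back from /proc/sys/vm. Here is a minimal sketch; the function names and the threshold of 10 used for "a very low value" are assumptions made for the example.

```python
def read_vm_setting(name, root="/proc/sys/vm"):
    """Read an integer kernel parameter such as overcommit_memory or swappiness."""
    with open("%s/%s" % (root, name)) as f:
        return int(f.read().strip())


def check_vm_settings(overcommit_memory, swappiness, max_swappiness=10):
    """Return a list of warnings; an empty list means both settings look fine."""
    problems = []
    if overcommit_memory != 1:
        problems.append("vm.overcommit_memory is %d, expected 1" % overcommit_memory)
    if swappiness > max_swappiness:  # threshold is an assumption, not a Redis rule
        problems.append("vm.swappiness is %d, expected 0 or a very low value" % swappiness)
    return problems


if __name__ == "__main__":
    try:
        warnings = check_vm_settings(
            read_vm_setting("overcommit_memory"),
            read_vm_setting("swappiness"),
        )
    except OSError:
        warnings = ["cannot read /proc/sys/vm (not a Linux machine?)"]
    for warning in warnings:
        print(warning)
```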
If you use AOF, review the fsync options. It is a tradeoff between raw performance and durability of the write operations. You need to make a choice and define a strategy.
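For reference, the appendfsync directive accepts three values. The sketch below lists them with their tradeoff and renders the matching redis.conf lines; the aof_config helper is just an illustration, not part of any Redis tooling.

```python
# The three values accepted by the appendfsync directive in redis.conf.
APPENDFSYNC_POLICIES = {
    "always": "fsync after every write: slowest, most durable",
    "everysec": "fsync about once per second: about one second of writes at risk",
    "no": "let the operating system decide when to flush: fastest, least durable",
}


def aof_config(policy):
    """Render the redis.conf lines enabling AOF with the given fsync policy."""
    if policy not in APPENDFSYNC_POLICIES:
        raise ValueError("unknown appendfsync policy: %r" % policy)
    return "appendonly yes\nappendfsync %s\n" % policy


if __name__ == "__main__":
    for name, tradeoff in APPENDFSYNC_POLICIES.items():
        print("%-9s %s" % (name, tradeoff))
    print(aof_config("everysec"))
```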
You also need to get familiar with the EC2 storage options. On some VMs, you have the choice between ephemeral storage and EBS. On others, you only have EBS.
Ephemeral storage is generally faster, and you will probably get fewer issues than with EBS, but you can easily lose your data in case of disk failure, reboot of the host, etc. You can imagine putting RDB snapshots on ephemeral storage, and then copying the resulting files to EBS directories, as a tradeoff between performance and robustness.
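The copy step could look like the following sketch, meant to be run periodically (from cron for instance). The function name and both paths are hypothetical. The file is copied under a temporary name and then renamed, so that a partially copied snapshot is never visible under its final name.

```python
import os
import shutil


def archive_snapshot(src_path, dest_dir):
    """Copy an RDB file to dest_dir, exposing it only once the copy is complete."""
    os.makedirs(dest_dir, exist_ok=True)
    final_path = os.path.join(dest_dir, os.path.basename(src_path))
    tmp_path = final_path + ".tmp"
    shutil.copy2(src_path, tmp_path)   # slow part: data goes to the EBS volume
    os.replace(tmp_path, final_path)   # atomic rename within the same filesystem
    return final_path


if __name__ == "__main__":
    # Hypothetical mount points: ephemeral disk and EBS volume.
    src = "/mnt/ephemeral/redis/dump.rdb"
    if os.path.exists(src):
        print(archive_snapshot(src, "/mnt/ebs/redis-backups"))
```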
EBS is remote storage: it may eat into the standard network bandwidth allocated to the VM, and impact the maximum throughput of Redis. If you plan to use EBS, think about choosing the "EBS-optimized" option to establish a QoS between the standard network and storage links.
Finally, a very common setup for performance-demanding instances on EC2 is to deactivate persistence on the master, and only activate it on a slave instance. It is probably less safe for the data, but it may prevent a lot of potential latency issues on the master.
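In terms of configuration, this setup boils down to a few redis.conf directives on each side. The sketch below renders them. The helper names and the master address are assumptions, and the choice of AOF with everysec on the slave is just one possible option among those discussed above.

```python
def master_config():
    """redis.conf lines for a master with persistence fully disabled."""
    return 'save ""\nappendonly no\n'


def slave_config(master_host, master_port=6379):
    """redis.conf lines for a slave that replicates the master and persists with AOF."""
    return (
        "slaveof %s %d\n" % (master_host, master_port)
        + "appendonly yes\n"
        + "appendfsync everysec\n"
    )


if __name__ == "__main__":
    # Hypothetical private address of the master VM.
    print(master_config())
    print(slave_config("10.0.0.12"))
```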