A client's Django application is intermittently (about twice a day) throwing RuntimeError("Unable to create a new session key."):
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/dist-packages/django/core/handlers/base.py", line 111, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python2.6/dist-packages/django/contrib/admin/views/decorators.py", line 17, in _checklogin
    if request.user.is_active and request.user.is_staff:
  File "/usr/local/lib/python2.6/dist-packages/django/contrib/auth/middleware.py", line 9, in __get__
    request._cached_user = get_user(request)
  File "/usr/local/lib/python2.6/dist-packages/django/contrib/auth/__init__.py", line 107, in get_user
    user_id = request.session[SESSION_KEY]
  File "/usr/local/lib/python2.6/dist-packages/django/contrib/sessions/backends/base.py", line 47, in __getitem__
    return self._session[key]
  File "/usr/local/lib/python2.6/dist-packages/django/contrib/sessions/backends/base.py", line 195, in _get_session
    self._session_cache = self.load()
  File "/usr/local/lib/python2.6/dist-packages/django/contrib/sessions/backends/cache.py", line 16, in load
    self.create()
  File "/usr/local/lib/python2.6/dist-packages/django/contrib/sessions/backends/cache.py", line 33, in create
    raise RuntimeError("Unable to create a new session key.")
RuntimeError: Unable to create a new session key.
As you can see from the traceback, this happens deep in the bowels of django.contrib.sessions when using the cache session backend with the Memcached cache backend.
A Django Trac ticket (https://code.djangoproject.com/ticket/14093) suggests changing the session key hash from MD5 to UUID4, but that wouldn't help here: the problem is the network. Using tcpdump, I've observed that this exception can occur when the TCP connection from an app server to the Memcached server times out due to packet loss.
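To illustrate why a different key hash wouldn't matter, here is a minimal model of the failure, not Django's actual code. It assumes the session backend keeps generating keys and calling the cache's add() until one succeeds, and that the cache client reports an unreachable server as a falsy return value rather than an exception. The class and function names, and the retry count of 10000, are illustrative assumptions.

```python
import uuid


class WorkingCache(object):
    """Hypothetical in-memory stand-in for a reachable cache server."""

    def __init__(self):
        self.data = {}

    def add(self, key, value):
        # add() only succeeds if the key does not exist yet.
        if key in self.data:
            return False
        self.data[key] = value
        return True


class DeadCache(object):
    """Hypothetical stand-in for a client whose server is unreachable."""

    def add(self, key, value):
        # Assumption: the failure is silent, so it looks like a key collision.
        return False


def create_session(cache, max_tries=10000):
    """Simplified model of a 'try new keys until add() succeeds' loop."""
    for _ in range(max_tries):
        key = uuid.uuid4().hex  # collisions are already vanishingly rare
        if cache.add(key, {}):
            return key
    # Reached when every add() failed, whatever the reason.
    raise RuntimeError("Unable to create a new session key.")
```

With WorkingCache the first key is accepted. With DeadCache every attempt fails no matter how the key is generated, so the loop runs out and raises the same RuntimeError as in the traceback.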
We have two app servers and one Memcached (1.4.2) server, all running in Amazon EC2. During periods of high demand, I've observed one app server exchanging 75,000 packets/second with the Memcached server. During one such period, I saw the SYN packet for a new Memcached connection get lost, resulting in a python-memcache connection timeout (before the kernel even had a chance to retransmit) and a RuntimeError.
I'm at a loss for how to solve this. I'd like to tune Linux's TCP retransmit timer for the initial SYN lower than three seconds, but as far as I can tell it's not tunable. Failing that, I'd like to have python-memcache retry a connection a couple of times before giving up, but it doesn't do that. I see that pylibmc has configurable connect and retry behaviour, but I haven't been able to find a combination of options that works around the packet loss.
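The retry behaviour I have in mind looks roughly like the sketch below. It is not part of python-memcache or pylibmc. connect_with_retry and its parameters are hypothetical names, and wiring it into a cache client would need subclassing or patching that I haven't shown. The idea is that each attempt opens a fresh socket and so sends a fresh SYN, so a short per-attempt timeout recovers from one lost SYN without waiting for the kernel's retransmit.

```python
import socket
import time


def connect_with_retry(address, attempts=3, timeout=1.0, pause=0.05,
                       connect=socket.create_connection):
    """Open a TCP connection, retrying a few times before giving up.

    address  -- (host, port) tuple, e.g. ("10.0.0.5", 11211)
    attempts -- total number of connection attempts
    timeout  -- per-attempt connect timeout in seconds
    pause    -- sleep between failed attempts, in seconds
    connect  -- connection factory; injectable so the logic can be tested
    """
    last_error = None
    for _ in range(attempts):
        try:
            # Each call creates a new socket, so a new SYN goes out.
            return connect(address, timeout)
        except socket.error as exc:  # socket.timeout is a subclass
            last_error = exc
            time.sleep(pause)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

With attempts=3 and timeout=1.0 the worst case is a little over three seconds, about the same as a single attempt that waits for one kernel retransmit, but a single dropped SYN costs roughly one second instead of a failed request.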
Ideas?