Over the last few days we've started experiencing problems with our webservers: they start using all the CPU and the load average goes to 10-14 (it is rarely above 2 under normal conditions). Debugging suggests this is caused by pecl/memcache 3.0.4 or memcached itself.
Running strace on the httpd process reveals a seemingly infinite loop of select() syscalls, waiting for activity on a socket. The socket is a connection to one of the memcached (1.2.6) servers.
The source for pecl/memcache 3.0.4 goes like this:
void mmc_pool_select(mmc_pool_t *pool TSRMLS_DC) /*
    runs one select() round on all scheduled requests {{{ */
{
    ...
    result = select(nfds + 1, &(pool->rfds), &(pool->wfds), NULL, &tv);
    ...
    [lots of code for sending and receiving data]
}
Nothing seems odd about the code in mmc_pool_select(), and there are no loops inside it that could cause select() to be called endlessly. The most likely source of the loop is mmc_pool_run():
void mmc_pool_run(mmc_pool_t *pool TSRMLS_DC) /*
    runs all scheduled requests to completion {{{ */
{
    ...
    while (pool->reading->len || pool->sending->len) {
        mmc_pool_select(pool TSRMLS_CC);
    }
}
It looks like something keeps triggering the socket, so select() returns and mmc_pool_select() runs again, but nothing is ever received or sent. As a result, pool->reading->len and pool->sending->len never both reach 0, and the while loop in mmc_pool_run() never exits.
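One way a socket can keep "triggering" select() without any data being transferred is when the peer has closed the connection: end-of-file counts as readable, and it stays that way until the caller reads or closes the socket. Whether this is what actually happens in pecl/memcache is an assumption, not something the strace output confirms. The sketch below only demonstrates the general select() behaviour on a local socket pair; count_ready_rounds() is a hypothetical helper and not part of pecl/memcache.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper, not part of pecl/memcache: counts how many of
 * `rounds` select() calls report the socket as readable although the
 * caller never reads from it. Returns -1 if the socket pair cannot
 * be created. */
int count_ready_rounds(int rounds)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1;
    }
    close(sv[1]);  /* peer hangs up; sv[0] is now at end-of-file */

    int ready = 0;
    for (int i = 0; i < rounds; i++) {
        fd_set rfds;
        struct timeval tv = { 1, 0 };  /* 1 s timeout, not expected to expire */

        FD_ZERO(&rfds);
        FD_SET(sv[0], &rfds);

        /* EOF counts as "readable", so select() returns right away. */
        if (select(sv[0] + 1, &rfds, NULL, NULL, &tv) > 0 &&
            FD_ISSET(sv[0], &rfds)) {
            ready++;  /* nothing is read, so the condition persists */
        }
    }
    close(sv[0]);
    return ready;
}
```

Calling count_ready_rounds(5) should report the socket as ready in all five rounds without ever waiting for the timeout. A loop of the same shape as the one in mmc_pool_run(), whose exit condition depends on data actually being consumed, would spin at full CPU in this situation.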
What is the most likely source of the problem here: pecl/memcache or memcached itself? If it's the former, we could probably solve it by switching to pecl/memcached.