[peruser] Fix graceful restarts on busy servers with multiple processors
Lazy
lazy404 at gmail.com
Wed May 21 04:20:29 MDT 2008
There are 2 race conditions in peruser.
1) Due to a lack of locking, multiple Multiplexers return from poll()-ing
the listening sockets but only one of them accepts the connection; the
others block in accept(). Normally this is OK, but on a graceful restart
these blocked Multiplexers stay behind, blocking the listening sockets,
and sometimes segfault. With this patch graceful restarts are not
noticeable, as they should be; without it, graceful restarts on our
servers produce around 1 minute of downtime (2x dc Opteron with ~600
Processors).
2) The same thing happens in recv_from_multiplexer(): workers get blocked
in recv_msg() and they can't read the pod.
The patch re-enables locking around poll() for Multiplexers only. To
make this work you have to use 'User nobody' in the Apache config
if your Multiplexers run as nobody.
It fixes 2) by making recv_from_multiplexer() non-blocking; unlucky
workers go back to poll(). Maybe this should instead be fixed by having
the Multiplexer send some kill message through the control pipe during a
graceful restart and making recv_from_multiplexer() blocking again, or
by doing some per-senv locking.
Making it non-blocking isn't optimal for performance, but it
fixes graceful restarts and I can raise ExpireTimeout significantly
because no children are lost anymore.
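
The non-blocking receive can be illustrated with a minimal sketch like
the following. It is not taken from the patch: try_recv() is a
hypothetical helper, and it uses plain recv() with MSG_DONTWAIT where the
real recv_from_multiplexer() may receive its message differently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: try to read a message without blocking.
 * Returns  >0  number of bytes read,
 *           0  nothing available right now (another worker probably
 *              took the message); the caller should go back to poll(),
 *          -1  real error, or the peer closed the socket. */
static ssize_t try_recv(int sock, void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);

    /* MSG_DONTWAIT makes this single call non-blocking without
     * changing the flags of the socket itself. */
    n = recv(sock, buf, len, MSG_DONTWAIT);
    if (n > 0)
        return n;
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
        return 0;
    return -1;  /* n == 0 (peer closed) or a genuine error */
}
```

A worker that gets 0 back simply returns to its poll() loop, where it can
also watch the pod and so still notices a graceful restart.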
--
Michal Grzedzicki
-------------- next part --------------
A non-text attachment was scrubbed...
Name: peruser-locking.patch
Type: text/x-diff
Size: 3844 bytes
Desc: not available
Url : http://www.telana.com/pipermail/peruser/attachments/20080521/cc24cf21/attachment.bin