[peruser] Patch for busy servers

Marcelo Coelho marcelo at tpn.com.br
Wed Jan 5 07:48:04 MST 2011


Hi Taavi,

Thank you for reporting this problem. It is now fixed.

Patch from RC2:
http://opensource.mco2.net/download/apache/peruser/peruser-rc2-to-rc3-v7.patch

Full patch from vanilla Apache 2.2.17:
http://opensource.mco2.net/download/apache/peruser/peruser-rc3-full-v7.patch

Changes (from RC2):

* (v7) Bug fixed: multiplexers can now clone a processor child if all workers are busy.
* (v6) Bug fixed: apachectl graceful now works properly, without "long lost child" errors
* (v5) Not released to public
* (v4) Code cleanup
* (v4) Performance: children are started in ~25 ms, 40 times faster than in RC2 (~1000 ms)
* (v4) Bug fixed: now checking if total_processors is 1 (first access) to start StartProcessors
* (v3) Performance: new child type (CHILD_TYPE_RESERVED) to avoid collisions (two children trying to get the same free slot)
* (v3) Bug fixed: in RC2, wait_timeout was always 0, so the code never slept to wait for new workers.
* (v2) Performance: StartProcessors, new configuration directive to control the number of child processors per vhost at startup
* (v2) Performance: children are started in ~50 ms, 20 times faster than in RC2 (~1000 ms)
* (v1) Performance: faster lookup of free slots (this is important on busy servers with many virtual hosts)
* (v1) Performance: faster processor counting; a single loop counts all processors
* (v1) Bug fixed: bug when MinSpareProcessors is set to 0 (all worker processes are now killed when idle_timeout is reached)
* (v1) Bug fixed: free up slots when a WORKER or PROCESSOR unexpectedly dies

--
Marcelo Coelho
marcelo at mco2.com.br


On Jan 5, 2011, at 8:44 AM, Taavi Sannik wrote:

> Hello again!
> 
> I see that you have added a special case: if MinProcessors is 0, the processor count is allowed to drop below MinSpareProcessors (once IdleTimeout is reached).
> This gets really bad if one of the active workers hangs and all the other workers get killed, because no one would accept new connections and no one would clone new children.
> 
> The steps to reproduce this:
> - use this peruser configuration:
> <IfModule peruser.c>
>    ServerLimit 700
>    MaxClients 700
>    MinSpareProcessors 1
>    MaxSpareProcessors 20
>    MinProcessors 0
>    MaxProcessors 80
>    MaxRequestsPerChild 1000
>    ExpireTimeout 7200
>    IdleTimeout 10
>    MinMultiplexers 3
>    MaxMultiplexers 40
>    MultiplexerIdleTimeout 120
>    ProcessorWaitTimeout 5
> </IfModule>
> - create an infinite sleep script (for example in PHP: <?php while(true) sleep(1); ?>)
> - start the server and run lynx on this script (lynx http://hostname/sleep.php). Lynx will start to wait for the response.
> - if you look at the server-status or ps aux, you can see 2 workers (one of them is handling the sleep script, and the second is idle).
> - wait until IdleTimeout kicks in. Only the "sleeping" worker is left now, and the virtualhost is no longer accessible.
> - run ab -c 100 -n 10000 against the dead virtualhost (you may need to repeat it a couple of times, as ab times out).
> - the whole server is now inaccessible: all multiplexers have been spawned and are trying to forward the requests to the dead virtualhost's workers, but there is no one to accept them.
> 
> There would be 2 ways to fix this:
> - rewrite the child cloning part and make multiplexers able to clone other children
> - disallow setting MinSpareProcessors to 0. If MinProcessors is 0, then kill the idle workers only if there are no active workers and all the workers have reached their IdleTimeout limit.
> 
> 
> Cheers,
> Taavi
> 
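The second option in the quoted message (kill idle workers only when no worker is active and every worker has reached its IdleTimeout) can be written as a small predicate. What follows is only a sketch of that rule for the MinProcessors 0 case, not code from the peruser patch. The Worker class, its fields, and the function name are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    # Hypothetical model of one worker of a virtual host (not a peruser struct).
    busy: bool         # True while the worker is handling a request
    idle_seconds: int  # seconds since the worker last served a request


def may_kill_idle_workers(workers, idle_timeout):
    """Return True if the idle workers of one vhost may be killed.

    Models the rule proposed for MinProcessors 0: kill only when no worker
    is active and every worker has reached the idle timeout.
    """
    if any(w.busy for w in workers):
        # An active (possibly hung) worker exists, so keep the idle ones
        # around to accept new connections.
        return False
    return all(w.idle_seconds >= idle_timeout for w in workers)


# Scenario from the report: one worker stuck in the sleep script and one
# idle worker that has passed IdleTimeout 10.
stuck = Worker(busy=True, idle_seconds=0)
idle = Worker(busy=False, idle_seconds=30)
print(may_kill_idle_workers([stuck, idle], idle_timeout=10))
```

Under this rule the idle worker in the reported scenario is kept, so the virtual host still has a process that can accept connections while the other one hangs.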


