At its core, Passenger is a process manager and HTTP request router. In order to minimize response times, and to distribute load over multiple CPU cores for optimal performance, Passenger load balances requests over processes in a "least busy process first" manner. This article explains the implications and details of our request load balancing mechanism.
4
+
At its core, Passenger is a process manager and HTTP request router. In order to minimize response times, and to distribute load over multiple CPU cores for optimal performance, Passenger load balances requests over processes in an intelligent manner. This article explains the implications and details of our request load balancing mechanism.
5
5
6
6
**Table of contents**
7
7
@@ -15,15 +15,15 @@ At its core, Passenger is a process manager and HTTP request router. In order to
15
15
<%else%>
16
16
<%=language_name%> applications can only handle 1 request at the same time.
17
17
<%end%>
18
-
With Passenger this is improved by running multiple processes of the application using pooled application groups. For each request from the incoming request queue, a non-busy (free) application process is selected to handle the request.
18
+
With Passenger this is improved by running multiple processes of the application using pooled application groups. For each request from the incoming request queue, an application process that is not completely busy (one that still has free capacity) is selected to handle the request.
For thread-safe Ruby apps it is also possible to enable [multithreading](<%=url_for"/references/config_reference/nginx/#passenger_concurrency_model"%>), which allows the application processes to concurrently handle multiple requests at the same time -- up to the amount of threads configured. In this case, Passenger forwards the request to the instance that is currently handling the least number of requests.
23
+
For thread-safe Ruby apps it is also possible to enable [multithreading](<%= url_for "/references/config_reference/nginx/#passenger_concurrency_model" %>), which allows each application process to handle multiple requests concurrently, up to the number of threads configured. In this case, Passenger forwards the request to the instance that meets the criteria described [below](#intelligent-routing).
24
24
<%end%>
25
25
26
-
If all application processes and threads are busy, Passenger spawns a new instance, up to the [process limit](<%=url_for"/references/config_reference/nginx/#passenger_concurrency_model"%>) for the group. The amount of processes of the groups combined may also not exceed the limit for the application pool. If either limit is reached, the request remains in the queue (which has its own [limit](#request-queue-overflow)).
26
+
If all application processes and threads are completely busy, Passenger spawns a new instance, up to the [process limit](<%= url_for "/references/config_reference/nginx/#passenger_concurrency_model" %>) for the group. The combined number of processes across all groups also may not exceed the limit for the application pool. If either limit is reached, the request remains in the queue (which has its own [limit](#request-queue-overflow)).
<%=language_name%> applications normally execute in a single thread/process, using a single CPU core. Passenger enables running multiple instances of the application (multiple processes) using pooled application groups, distributing requests to the process that is currently handling the least amount of requests.
@@ -58,31 +58,70 @@ A core concept in the load balancing algorithm is that of the **maximum process
58
58
59
59
For this reason, load balancing requests between multiple processes is beneficial.
60
60
61
-
## Least-busy-process-first routing
61
+
### App Generations
62
+
63
+
Another core concept in the load balancing algorithm is that of **app generations**. Every time Passenger is asked to [restart an app group](<%= url_for "/advanced_guides/troubleshooting/nginx/restart_app.html" %>), a counter is incremented. Processes created after the restart are labeled with a generation taken from this counter. Passenger usually performs a blocking restart, which means that only one generation of processes is alive at a time. However, if you initiate a [rolling restart](<%= url_for "/advanced_guides/deployment_and_scaling/nginx/zero_downtime_redeployments/ruby/index.html" %>), multiple generations can be alive at the same time, and Passenger takes this into account when load balancing requests.
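The generation counter can be illustrated with a small Python sketch. This is a hypothetical model, not Passenger's actual data structure: the `AppGroup` class, its method names, and the starting counter value of 0 are all assumptions made for illustration. A rolling restart is simplified to "old processes stay alive", without modeling their gradual replacement.

```python
from dataclasses import dataclass, field


@dataclass
class AppGroup:
    """Hypothetical model of an app group (illustration only)."""
    generation: int = 0                      # assumed starting value
    processes: list = field(default_factory=list)

    def spawn(self, name):
        # A new process is labeled with the current counter value.
        proc = {"name": name, "generation": self.generation}
        self.processes.append(proc)
        return proc

    def restart(self, rolling=False):
        # Every restart request increments the counter.
        self.generation += 1
        if not rolling:
            # Blocking restart: modeled here as dropping all old processes,
            # so only one generation is alive afterwards.
            self.processes.clear()

    def live_generations(self):
        # Sorted list of the distinct generations that still have processes.
        return sorted({p["generation"] for p in self.processes})
```

In this model, spawning processes, calling `restart(rolling=True)`, and spawning again leaves two generations alive at once, which is the situation the routing rules have to handle.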
64
+
65
+
## Intelligent routing
62
66
63
67
### Algorithm summary
64
68
65
-
Passenger keeps a list of application processes. For each application process, Passenger keeps track of how many requests it is currently handling. When a new request comes in, Passenger routes the request to the process that is handling the least number of requests (the one that is "least busy").
69
+
Passenger keeps a list of application processes. For each application process, Passenger keeps track of: the app generation that resulted in it being started, when it was started, and how many requests it is currently handling. When a new request comes in, Passenger routes the request to the process that meets the following criteria:
70
+
71
+
<ol>
72
+
<li>is not completely busy,</li>
73
+
<li>is part of the newest available app generation<sup>1</sup> meeting the previous condition,</li>
74
+
<li>is the oldest process meeting the previous conditions,</li>
75
+
<li>is the least busy process meeting the previous conditions.</li>
76
+
</ol>
77
+
78
+
[1] Unless blocked by <a href="<%= url_for "/advanced_guides/deployment_and_scaling/standalone/deployment_error_resistance.html" %>">Deployment Error Resistance</a>.
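The four criteria can be expressed as a filter followed by a single ordered comparison. The following Python sketch is only an illustration of the rules as described here, not Passenger's implementation: the `Process` fields and the `pick_process` function are hypothetical names, and the Deployment Error Resistance exception is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Process:
    """Hypothetical view of an application process (illustration only)."""
    name: str
    generation: int        # app generation the process was spawned in
    spawned_at: float      # spawn timestamp; a smaller value means older
    sessions: int          # requests currently being handled
    max_concurrency: int   # requests it can handle at the same time


def pick_process(processes: List[Process]) -> Optional[Process]:
    # 1. Only processes that are not completely busy are candidates.
    candidates = [p for p in processes if p.sessions < p.max_concurrency]
    if not candidates:
        return None  # the request would stay in the queue
    # 2. newest generation, 3. oldest process, 4. least busy.
    return min(
        candidates,
        key=lambda p: (-p.generation, p.spawned_at, p.sessions),
    )


if __name__ == "__main__":
    now = 100.0
    procs = [
        Process("A", generation=2, spawned_at=now - 30, sessions=0, max_concurrency=1),
        Process("B", generation=3, spawned_at=now - 20, sessions=0, max_concurrency=1),
        Process("C", generation=3, spawned_at=now - 10, sessions=0, max_concurrency=1),
    ]
    chosen = pick_process(procs)
    print(chosen.name if chosen else "queued")
```

Sorting on a tuple keeps the priority order explicit: busyness is compared only when generation and spawn time are equal, which is why it acts purely as a tie breaker.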
66
79
67
80
<a name="algorithm_ordered"></a>
68
81
69
-
### First available process in the list has highest priority
82
+
### Process with the highest app generation in the list has highest priority
83
+
84
+
If there are multiple processes that are not completely busy, then Passenger will pick one with the highest app generation.
85
+
86
+
For example, suppose that there are 3 application processes:
87
+
88
+
Process A: gen 1, spawned 1s ago, handling 0 requests
89
+
Process B: gen 1, spawned 1s ago, handling 0 requests
90
+
Process C: gen 2, spawned 1s ago, handling 0 requests
91
+
92
+
Process C will be chosen for the next incoming request.
93
+
94
+
This increases the rate at which requests are handled by the newer generation of app processes.
95
+
96
+
### Oldest available process in a generation has highest priority
70
97
71
-
If there are multiple processes that have the least busyness, then Passenger will pick the first one in the list. For example, suppose that there are 3 application processes:
98
+
If there are multiple processes from the newest generation that are not completely busy, then Passenger will pick the one that was started first.
72
99
73
-
Process A: handling 1 request
74
-
Process B: handling 0 requests
75
-
Process C: handling 0 requests
100
+
For example, suppose that there are 3 application processes:
76
101
77
-
On the next request, Passenger will always pick B, never C.
102
+
Process A: gen 2, spawned 30s ago, handling 0 requests
103
+
Process B: gen 3, spawned 20s ago, handling 0 requests
104
+
Process C: gen 3, spawned 10s ago, handling 0 requests
78
105
79
-
This property is used by the [dynamic process scaling](<%=url_for"/advanced_guides/in_depth/ruby/dynamic_scaling_of_app_processes/index.html"%>) algorithm. Dynamic process scaling works by shutting down processes that haven't received requests for a while (processes that are "idle"). By routing to the first process with least busyiness (instead of, say, a random one, using round-robin), Passenger gives other processes the chance to become idle and thus eligible for shutdown.
106
+
Process B will be chosen for the next incoming request.
80
107
81
-
Another advantage of picking the first process is that it improves application-level caching. Since the first process is the most likely candidate for load balancing, it will have the most chance to keep its cache warm. Examples of such caches include: in-memory hash tables, JIT caches, etc.
108
+
This property is used by the [dynamic process scaling](<%= url_for "/advanced_guides/in_depth/ruby/dynamic_scaling_of_app_processes/index.html" %>) algorithm. Dynamic process scaling works by shutting down processes that haven't received requests for a while (processes that are "idle"). By routing to the oldest process that is not completely busy (instead of, say, picking one at random or using round-robin), Passenger gives other processes the chance to become idle and thus eligible for shutdown.
109
+
110
+
Another advantage of picking the oldest process is that it improves caching. Since the oldest process is the most likely candidate for load balancing, it has the best chance of keeping its caches warm, and can therefore process requests more quickly.
111
+
112
+
### Least busy process as tie breaker
113
+
114
+
If there are multiple processes that have met all the previous conditions, then Passenger will pick the least busy one. For example, suppose that there are 3 application processes:
115
+
116
+
Process A: gen 3, spawned 10s ago, handling 1 request
117
+
Process B: gen 3, spawned 10s ago, handling 0 requests
118
+
Process C: gen 3, spawned 10s ago, handling 2 requests
119
+
120
+
Process B will be chosen for the next incoming request.
82
121
83
122
### Traffic may appear unbalanced between processes
84
123
85
-
Because Passenger [prefers to load balance to the first request](#algorithm_ordered), traffic may appear unbalanced between processes. Here is an example from `passenger-status`:
124
+
Because Passenger [prefers to load balance to the oldest process in the newest generation](#algorithm_ordered), traffic may appear unbalanced between processes. Here is an example from `passenger-status`:
86
125
87
126
~~~
88
127
/var/www/phusion_blog/current/public:
@@ -113,7 +152,7 @@ Instead, the Passenger implementation can be compared to using a single (shared)
113
152
<% if language_min_concurrency == 1 %>
114
153
### Example with maximum concurrency 1
115
154
116
-
Suppose that you have 3 application processes, and each process's maximum concurrency is 1. When the application is idle, none of the processes are handling any requests:
155
+
Suppose that you have 3 application processes, spawned in order, and each process's maximum concurrency is 1. When the application is idle, none of the processes are handling any requests:
117
156
118
157
Process A [ ]
119
158
Process B [ ]
@@ -125,7 +164,7 @@ When a new request comes in (let's call this α), Passenger will decide to route
125
164
Process B [ ]
126
165
Process C [ ]
127
166
128
-
Suppose that, while α is still in progress, a new requests comes in (which we call β). That request will be load balanced to process B because it is the least busy one:
167
+
Suppose that, while α is still in progress, a new request comes in (which we call β). That request will be load balanced to process B because it is the oldest not-completely-busy process:
129
168
130
169
Process A [α]
131
170
Process B [β]
@@ -143,6 +182,8 @@ If another request comes in (which we call ɣ), that request will be routed to A
143
182
Process B [β]
144
183
Process C [ ]
145
184
185
+
This keeps process A's caches warm, and may allow process C to become idle and be shut down sooner.
186
+
146
187
<%end%>
147
188
<% if language_max_concurrency != 1 %>
148
189
### Example with maximum concurrency 4
@@ -163,15 +204,15 @@ When a new request comes in (which we call α, Passenger will decide to route th
163
204
Process A [α ]
164
205
Process B [ ]
165
206
166
-
Suppose that, while α is still in progress, 1 more request comes in (which we call β). That request will be load balanced to process B because it is the least busy one:
207
+
Suppose that, while α is still in progress, 1 more request comes in (which we call β). That request will be load balanced to process A because it is the oldest not-completely-busy one:
167
208
168
-
Process A [α ]
169
-
Process B [β ]
209
+
Process A [αβ ]
210
+
Process B [ ]
170
211
171
212
Suppose that another request comes in (which we call ɣ). That will be load balanced to process A again, not to B: