simplify codebase by using one thread/conn, instead of preforked procs #218
Conversation
the existing codebase used an elaborate and complex approach for its parallelism: 5 different config file options, namely

- MaxClients
- MinSpareServers
- MaxSpareServers
- StartServers
- MaxRequestsPerChild

were used to steer how (and how many) parallel processes tinyproxy would spin up at start, how many processes at each point needed to be idle, etc. it seems all preforked processes would listen on the server port and compete with each other about who would get assigned the new incoming connections.

since some data needs to be shared across those processes, a half-baked "shared memory" implementation was provided for this purpose. that implementation used files in the filesystem, and since it had a big FIXME comment, the author was well aware of how hackish that approach was.

this entire complexity is now removed. the main thread enters a loop which polls on the listening fds, then spins up a new thread per connection, until the maximum number of connections (MaxClients) is hit. this is the only one of the 5 config options left after this cleanup. since threads share the same address space, the code necessary for shared memory access has been removed. this means that the other 4 config options mentioned above will now produce a parse error when encountered.

currently each thread uses a hardcoded default of 256KB for the thread stack size, which is quite lavish and should be sufficient for even the worst C libraries, but people may want to tweak this value to the bare minimum, thus we may provide a new config option for this purpose in the future. i suspect that on heavily optimized C libraries such as musl, a stack size of 8-16 KB per thread could be sufficient.

since the existing list implementation in vector.c did not provide a way to remove a single item from an existing list, i added my own list implementation from my libulz library which offers this functionality, rather than trying to add an ad-hoc, and perhaps buggy, implementation to the vector_t list code. the sblist code is contained in an 80 line C file and is as simple as it can get, while offering good performance; it has proven bug-free through years of use in other projects.
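for illustration, a minimal sketch of the accept-loop model described above (this is not the actual tinyproxy code; handle_client, MAX_CLIENTS and active_clients are names assumed for this example): the main thread polls the listening fd and hands each accepted connection to a detached worker thread, refusing new clients once the MaxClients-style limit is reached.

/* minimal sketch of a thread-per-connection accept loop -- not the
 * actual tinyproxy code; names are assumed for this example. */
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_CLIENTS 100            /* corresponds to the MaxClients option */

static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static int active_clients;

static void *handle_client(void *arg)
{
        int fd = (int)(intptr_t)arg;
        /* ... serve the proxied connection on fd ... */
        close(fd);
        pthread_mutex_lock(&count_lock);
        active_clients--;
        pthread_mutex_unlock(&count_lock);
        return NULL;
}

static void accept_loop(int listen_fd)
{
        struct pollfd pfd = { .fd = listen_fd, .events = POLLIN };

        for (;;) {
                if (poll(&pfd, 1, -1) <= 0)
                        continue;

                int fd = accept(listen_fd, NULL, NULL);
                if (fd < 0)
                        continue;

                pthread_mutex_lock(&count_lock);
                int full = (active_clients >= MAX_CLIENTS);
                if (!full)
                        active_clients++;
                pthread_mutex_unlock(&count_lock);

                if (full) {                /* MaxClients reached: refuse */
                        close(fd);
                        continue;
                }

                pthread_t t;
                pthread_attr_t attr;
                pthread_attr_init(&attr);
                pthread_attr_setstacksize(&attr, 256 * 1024);   /* lavish default */
                pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
                if (pthread_create(&t, &attr, handle_client,
                                   (void *)(intptr_t)fd) != 0) {
                        close(fd);
                        pthread_mutex_lock(&count_lock);
                        active_clients--;
                        pthread_mutex_unlock(&count_lock);
                }
                pthread_attr_destroy(&attr);
        }
}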
if we don't handle these gracefully, pretty much every existing config file will fail with an error, which is probably not very friendly. the obsoleted config items can be made hard errors after the next release.
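a minimal sketch of such graceful handling, with assumed names (warn_if_obsolete is not the actual tinyproxy parser code): the obsolete directives are recognized, a warning is logged, and the config line is otherwise ignored.

/* sketch of graceful handling of obsolete directives -- assumed names,
 * not the actual tinyproxy config parser. */
#include <stdio.h>
#include <strings.h>   /* strcasecmp */

static const char *const obsolete_directives[] = {
        "MinSpareServers", "MaxSpareServers", "StartServers",
        "MaxRequestsPerChild",
};

/* returns 1 if the directive is obsolete; the caller then skips the line
 * instead of treating it as a parse error. */
static int warn_if_obsolete(const char *directive)
{
        size_t i;
        for (i = 0; i < sizeof(obsolete_directives) / sizeof(obsolete_directives[0]); i++) {
                if (strcasecmp(directive, obsolete_directives[i]) == 0) {
                        fprintf(stderr,
                                "warning: obsolete config directive '%s' ignored\n",
                                directive);
                        return 1;
                }
        }
        return 0;
}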
since the write syscall is used instead of stdio, accesses have been safe already, but it's better to use a mutex anyway to prevent out-of-order writes.
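for illustration, a minimal sketch of a mutex-protected log write (assumed names, not the actual tinyproxy logging code):

#include <pthread.h>
#include <string.h>
#include <unistd.h>

static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

/* emit one already-formatted log line with a single write(); the mutex
 * serializes log output so lines from different threads cannot be
 * interleaved or written out of order. */
static void log_line(int log_fd, const char *line)
{
        pthread_mutex_lock(&log_lock);
        (void)write(log_fd, line, strlen(line));
        pthread_mutex_unlock(&log_lock);
}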
…heck

tinyproxy used to do a full hostname resolution whenever a new client connection happened, which could cause very long delays (as reported in tinyproxy#198). there's only a single place/scenario that actually requires a hostname, and that is when an Allow/Deny rule exists for a hostname or domain, rather than a raw IP address. since it is very likely this feature is not very widely used, it makes absolute sense to only do the costly resolution when it is unavoidable.
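a rough sketch of that idea, with assumed types and names (not the actual tinyproxy ACL code): the costly reverse lookup only happens when at least one Allow/Deny rule is hostname/domain based.

#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>

/* simplified stand-in for an ACL entry; only the flag matters here */
struct acl_rule {
        int is_string_based;   /* 1 if the rule matches a hostname/domain */
};

static int need_hostname(const struct acl_rule *rules, size_t n)
{
        size_t i;
        for (i = 0; i < n; i++)
                if (rules[i].is_string_based)
                        return 1;
        return 0;
}

/* resolve the client's hostname only if some rule actually needs it */
static void client_hostname(const struct sockaddr *sa, socklen_t salen,
                            const struct acl_rule *rules, size_t n,
                            char *host, socklen_t hostlen)
{
        host[0] = '\0';
        if (need_hostname(rules, n))
                (void)getnameinfo(sa, salen, host, hostlen, NULL, 0, 0);
}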
it is quite easy to bring down a proxy server by forcing it to make connections to one of its own ports, because this will result in an endless loop spawning more and more connections, until all available fds are exhausted. since there's a potentially infinite number of potential DNS/ip addresses resolving to the proxy, it is impossible to detect an endless loop by simply looking at the destination ip address and port. what *is* possible though is to record the ip/port tuples assigned to outgoing connections, and then compare them against new incoming connections. if they match, the sender was the proxy itself and therefore needs to reject that connection. fixes tinyproxy#199.
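a simplified sketch of that tuple-matching idea (IPv4 only, assumed names, not the actual tinyproxy implementation; a real version would also remove entries when the outgoing connection closes):

#include <netinet/in.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_TRACKED 4096

static pthread_mutex_t tuple_lock = PTHREAD_MUTEX_INITIALIZER;
static struct sockaddr_in outgoing[MAX_TRACKED];
static size_t outgoing_count;

/* after connect() succeeds, record the local (ip, port) of the outgoing
 * socket, e.g. as obtained via getsockname() */
static void record_outgoing(const struct sockaddr_in *local)
{
        pthread_mutex_lock(&tuple_lock);
        if (outgoing_count < MAX_TRACKED)
                outgoing[outgoing_count++] = *local;
        pthread_mutex_unlock(&tuple_lock);
}

/* for a freshly accepted connection, check whether its peer (ip, port)
 * matches one of our own outgoing sockets -- if so, the proxy connected
 * to itself and the connection must be rejected */
static int is_own_connection(const struct sockaddr_in *peer)
{
        size_t i;
        int own = 0;
        pthread_mutex_lock(&tuple_lock);
        for (i = 0; i < outgoing_count; i++) {
                if (outgoing[i].sin_addr.s_addr == peer->sin_addr.s_addr &&
                    outgoing[i].sin_port == peer->sin_port) {
                        own = 1;
                        break;
                }
        }
        pthread_mutex_unlock(&tuple_lock);
        return own;
}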
Tested rofl0r:threads
on Ubuntu 18.04 via AWS EC2
Installed via Git & Compile
git clone --single-branch --branch threads https://github.com/rofl0r/tinyproxy.git
cd tinyproxy/
./autogen.sh
./configure
nano Makefile
make
make install
nano /usr/local/etc/tinyproxy/tinyproxy.conf
tinyproxy
ss -lntup | grep tinyproxy
Used BasicAuth configuration:
User nobody
Group nogroup
Port 8888
Timeout 600
DefaultErrorFile "/usr/local/share/tinyproxy/default.html"
StatFile "/usr/local/share/tinyproxy/stats.html"
Syslog On
LogLevel Info
MaxClients 100
BasicAuth username password
DisableViaHeader Yes
Outcomes: Worked Great. No issues.
Tested "rofl0r:threads" as reverse proxy on freetz (mips) by changing the makefile and patches. Works as expected, but the memory footprint is high. A config option for this purpose is recommended.
thanks for testing. could you provide some comparison numbers before/after with the same …? btw in src/child.c, there's this block:

if (pthread_attr_init(&attr) == 0) {
        attrp = &attr;
        pthread_attr_setstacksize(attrp, 256*1024);
}
if (pthread_create(&child->thread, attrp, child_thread, child) != 0) {
        sblist_delete(childs, sblist_getsize(childs) - 1);
        free(child);
        goto oom;
}

in the line with …
This solves the issue presented in #237.
Tested this on MacOS: it works with 64*1024 bytes, it crashes at lower values.
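if a config option for the stack size gets added, the value would presumably have to be clamped to the platform minimum, since POSIX exposes it as PTHREAD_STACK_MIN and it differs per libc/OS; a small sketch of such a clamp (set_thread_stack is an assumed helper name, not part of this patch):

#include <limits.h>    /* PTHREAD_STACK_MIN, where defined */
#include <pthread.h>
#include <stddef.h>

/* clamp a requested stack size to the platform minimum before applying it */
static int set_thread_stack(pthread_attr_t *attr, size_t wanted)
{
#ifdef PTHREAD_STACK_MIN
        if (wanted < (size_t)PTHREAD_STACK_MIN)
                wanted = PTHREAD_STACK_MIN;
#endif
        return pthread_attr_setstacksize(attr, wanted);
}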
After restart the memory footprint (no matter which setting for 'MaxClients') is:

old model
- Virtual memory size (932 kB)
- Resident set size (400 kB)

After calling the same URLs from outside, the memory footprint for the old model stays the same.

MaxClients, Virtual memory size, Resident set size

I will try to change the stack size value later...
Sorry, i think i have forgotten the workers for the old model. Every worker consumes 940 kB of extra memory. Does this mean that the footprint of the threaded model is lower than the old model?
yeah, my expectation was that the threaded model would use less memory if the stacksize is set to a low value and a reasonable MaxClients limit is set, because the old model uses …
@rofl0r do you have a time frame for when this will be merged?
totally awesome!
Hello,
In our case (10 000 connections), this branch works better than the master branch. Proxy requests are answered more quickly (the loop proxy detection is important in our specific use).
@limbo127: have you been able to do memory usage comparisons?
after 3 hours, Res memory is 15920 for the tinyproxy.thread (3000 clients // 100 requests/s).
for the other tinyproxy.thread proxies memory is ~6000 (< 200 clients), so the memory seems to increase over time and with requests; I do not know if it is a normal behaviour.
regards
Nicolas
15920 would be KBs, right? so roughly 16 MB? that's not so bad given that each of the 3000 clients has a separate thread (in the old model, many clients would share a pre-forked process)
Yes
hello,
a lot of
ERROR Aug 22 11:06:32 [29486]: read_buffer: read() failed on fd 3: Connection reset by peer
ERROR Aug 22 11:06:33 [29486]: read_buffer: read() failed on fd 3: Connection reset by peer
ERROR Aug 22 11:06:38 [29486]: read_buffer: read() failed on fd 7: Connection reset by peer
ERROR Aug 22 11:06:45 [29486]: read_buffer: read() failed on fd 5: Connection reset by peer
ERROR Aug 22 11:06:52 [29486]: read_buffer: read() failed on fd 5: Connection reset by peer
perhaps a client issue, but it seems to block tinyproxy thread, i need to restart tinyproxy, with message: dren: 2 threads still alive!
nicolas
can you elaborate? i have no idea what "it" is, nor which thread, nor what is being blocked.
I'm sorry, I don't have more debug info. I just need to kill and restart tinyproxy; it seems it does not answer requests.
nicolas
is that new? it seems you were quite happy with this branch a couple weeks ago. if so, what changed in your setup?
Yes, i think it's working better than standard tinyproxy.
I'm trying to reproduce it.
What is the status of this branch? I made a comparison between the current stable version and this branch and found that this solution works without throwing connection timeouts and connection refused errors. The current stable build has some unexpected behaviour when servers are under load. Config I used (for the current stable build):
it's quite complete and master will be switched to this branch soon. just need to find time to revise the SIGHUP config reloading system which could potentially interfere in unexpected ways with the running threads.
Hello,
our tests show us that the threaded tinyproxy crashes sometimes, but there is no log at this time.
regards,
Nicolas Prochazka
Hello, Regards,
after how much time? can you maybe post the contents of /proc/$pid/status when this happens, where $pid is tinyproxy's pid?
merged as 69c86b9...cd005a9. this branch is now the new master branch targeting 1.11.0; master has been renamed to tp-1.10.x and will only receive bugfixes.
I'm also having random "read_buffer: read() failed on fd 8: Connection reset by peer" errors. is there a fixed newer version out there? working on ubuntu 18.04, tinyproxy 1.8.4
@talishka: in order to get the benefits of this branch you need to compile git master from source. there is no release containing the changes yet. if you have further questions/issues, please open a new issue here, thanks.