Conversation

rofl0r
Contributor

@rofl0r rofl0r commented Dec 17, 2018

the existing codebase used an elaborate and complex approach for
its parallelism:

5 different config file options, namely

  • MaxClients
  • MinSpareServers
  • MaxSpareServers
  • StartServers
  • MaxRequestsPerChild

were used to steer how (and how many) parallel processes tinyproxy
would spin up at start, how many processes at each point needed to
be idle, etc.
it seems all preforked processes would listen on the server port
and compete with each other over who would get assigned new
incoming connections.
since some data needs to be shared across those processes, a half-
baked "shared memory" implementation was provided for this purpose.
that implementation used files in the filesystem, and since
it had a big FIXME comment, the author was well aware of how hackish
that approach was.

this entire complexity is now removed. the main thread enters
a loop which polls on the listening fds, then spins up a new
thread per connection, until the maximum number of connections
(MaxClients) is hit. this is the only one of the 5 config options
left after this cleanup. since threads share the same address space,
the code necessary for shared memory access has been removed.
this means that the other 4 config options mentioned above will now
produce a parse error when encountered.
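
to illustrate the new model, here is a minimal sketch (not the actual
tinyproxy code: main_loop and handle_connection are made-up names, and
the real code additionally tracks running threads against MaxClients):

    #include <poll.h>
    #include <pthread.h>
    #include <stdint.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* placeholder connection handler; the real code runs the proxy logic here */
    static void *handle_connection(void *arg)
    {
            int fd = (int)(intptr_t)arg;
            /* ... serve the client ... */
            close(fd);
            return NULL;
    }

    /* main thread: poll the listening fd, spawn one thread per connection */
    static void main_loop(int listenfd)
    {
            struct pollfd pfd = { .fd = listenfd, .events = POLLIN };
            for (;;) {
                    if (poll(&pfd, 1, -1) <= 0)
                            continue;
                    int fd = accept(listenfd, NULL, NULL);
                    if (fd < 0)
                            continue;
                    pthread_t t;
                    if (pthread_create(&t, NULL, handle_connection,
                                       (void *)(intptr_t)fd) != 0) {
                            close(fd);
                            continue;
                    }
                    pthread_detach(t);
            }
    }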

currently a hardcoded default of 256KB is used for the thread stack
size, which is quite lavish and should be sufficient for even the
worst C libraries, but people may want to tweak this value to the
bare minimum, thus we may provide a new config option for this
purpose in the future.
i suspect that on heavily optimized C libraries such as musl, a
stack size of 8-16 KB per thread could be sufficient.

since the existing list implementation in vector.c did not provide
a way to remove a single item from an existing list, i added my
own list implementation from my libulz library, which offers this
functionality, rather than trying to add an ad-hoc and perhaps
buggy implementation to the vector_t list code. the sblist
code is contained in an 80-line C file and is as simple as it can get,
while offering good performance, and has proven bug-free through years
of use in other projects.
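
for reference, a rough sketch of how the list is meant to be used
(sblist_delete and sblist_getsize appear in the code quoted further
down in this thread; sblist_new and sblist_add are assumed here from
libulz, so treat the exact signatures as an approximation):

    #include <pthread.h>
    #include <stddef.h>
    #include "sblist.h"   /* the ~80-line list implementation from libulz */

    struct child {
            pthread_t thread;
            /* ... per-connection state ... */
    };

    static sblist *childs;

    static void childs_init(size_t maxclients)
    {
            /* a list of struct child* pointers */
            childs = sblist_new(sizeof(struct child *), maxclients);
    }

    static void childs_track(struct child *c)
    {
            sblist_add(childs, &c);               /* copies the pointer into the list */
    }

    static void childs_untrack_last(void)
    {
            size_t n = sblist_getsize(childs);    /* number of stored items */
            if (n)
                    sblist_delete(childs, n - 1); /* remove a single item by index */
    }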

@rofl0r rofl0r force-pushed the threads branch 3 times, most recently from 0ab13d8 to 2cbaba5 on December 17, 2018 01:53

if we don't handle these gracefully, pretty much every existing config
file will fail with an error, which is probably not very friendly.

the obsoleted config items can be made hard errors after the next
release.
since the write syscall is used instead of stdio, accesses have been
safe already, but it's better to use a mutex anyway to prevent out-
of-order writes.
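
a minimal sketch of this kind of serialization (hypothetical helper
name; the real change wraps tinyproxy's own logging path):

    #include <pthread.h>
    #include <string.h>
    #include <unistd.h>

    static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

    /* emit one complete log line while holding the lock, so concurrent
       threads cannot interleave or reorder their output on the same fd */
    static void log_line(int fd, const char *line)
    {
            pthread_mutex_lock(&log_lock);
            (void)write(fd, line, strlen(line));
            pthread_mutex_unlock(&log_lock);
    }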
…heck

tinyproxy used to do a full hostname resolution whenever a new client
connection happened, which could cause very long delays (as reported in tinyproxy#198).

there's only a single place/scenario that actually requires a hostname, and
that is when an Allow/Deny rule exists for a hostname or domain, rather than
a raw IP address. since this feature is likely not widely used,
it makes sense to only do the costly resolution when it is unavoidable.
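
a rough sketch of the idea (acl_needs_hostname is a hypothetical
predicate; the real check walks tinyproxy's ACL rules):

    #include <netdb.h>
    #include <stdbool.h>
    #include <sys/socket.h>

    /* hypothetical: true if any Allow/Deny rule names a host or domain
       instead of a raw IP address */
    bool acl_needs_hostname(void);

    /* resolve the client's hostname only when an ACL rule actually requires it */
    static void maybe_resolve(const struct sockaddr *addr, socklen_t addrlen,
                              char *host, socklen_t hostlen)
    {
            host[0] = '\0';
            if (!acl_needs_hostname())
                    return;       /* skip the costly reverse DNS lookup */
            getnameinfo(addr, addrlen, host, hostlen, NULL, 0, NI_NAMEREQD);
    }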
it is quite easy to bring down a proxy server by forcing it to make
connections to one of its own ports, because this will result in an endless
loop spawning more and more connections, until all available fds are exhausted.
since a potentially infinite number of DNS names/ip addresses can
resolve to the proxy, it is impossible to detect an endless loop by
simply looking at the destination ip address and port.

what *is* possible though is to record the ip/port tuples assigned to
outgoing connections, and then compare them against new incoming
connections. if they match, the sender was the proxy itself, and the
connection therefore needs to be rejected.

fixes tinyproxy#199.
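
the idea boils down to something like the following sketch (simplified
data structure with made-up names; the real code keeps the tuples in a
list shared by all threads):

    #include <netinet/in.h>
    #include <pthread.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* (ip, port) tuple of one of our own outgoing connections */
    struct conn_tuple {
            struct in_addr addr;
            in_port_t port;
    };

    #define MAX_TRACKED 4096
    static struct conn_tuple outgoing[MAX_TRACKED];
    static size_t n_outgoing;
    static pthread_mutex_t tuple_lock = PTHREAD_MUTEX_INITIALIZER;

    /* remember the local address/port of a connection we just made */
    static void record_outgoing(const struct sockaddr_in *local)
    {
            pthread_mutex_lock(&tuple_lock);
            if (n_outgoing < MAX_TRACKED) {
                    outgoing[n_outgoing].addr = local->sin_addr;
                    outgoing[n_outgoing].port = local->sin_port;
                    n_outgoing++;
            }
            pthread_mutex_unlock(&tuple_lock);
    }

    /* an incoming connection whose source matches one of our own outgoing
       connections originated from the proxy itself: reject it to break the loop */
    static bool is_loop(const struct sockaddr_in *peer)
    {
            bool match = false;
            pthread_mutex_lock(&tuple_lock);
            for (size_t i = 0; i < n_outgoing; i++) {
                    if (outgoing[i].port == peer->sin_port &&
                        outgoing[i].addr.s_addr == peer->sin_addr.s_addr) {
                            match = true;
                            break;
                    }
            }
            pthread_mutex_unlock(&tuple_lock);
            return match;
    }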

@BrittonWinterrose BrittonWinterrose left a comment

Tested rofl0r:threads on Ubuntu 18.04 via AWS EC2

Installed via Git & Compile

    git clone --single-branch --branch threads https://github.com/rofl0r/tinyproxy.git
    cd tinyproxy/
    ./autogen.sh
    ./configure
    nano Makefile
    make
    make install
    nano /usr/local/etc/tinyproxy/tinyproxy.conf
    tinyproxy
    ss -lntup | grep tinyproxy

Used BasicAuth configuration

User nobody
Group nogroup
Port 8888
Timeout 600
DefaultErrorFile "/usr/local/share/tinyproxy/default.html"
StatFile "/usr/local/share/tinyproxy/stats.html"
Syslog On
LogLevel Info
MaxClients 100
BasicAuth username password
DisableViaHeader Yes

Outcomes: Worked Great. No issues.

@harryboo

harryboo commented Apr 9, 2019

Tested "rofl0r:threads" as reverse proxy on freetz (mips) by changing the make file and patches.

Works as expected but the memory footprint is high. A config option for this purpose is recommended..

@rofl0r
Contributor Author

rofl0r commented Apr 9, 2019

Works as expected, but the memory footprint is high. A config option for this purpose is recommended.

thanks for testing. could you provide some comparison numbers before/after with the same MaxClients setting in the config file?

btw in src/child.c, there's this block:

                if (pthread_attr_init(&attr) == 0) {
                        attrp = &attr;
                        pthread_attr_setstacksize(attrp, 256*1024);
                }

                if (pthread_create(&child->thread, attrp, child_thread, child) != 0) {
                        sblist_delete(childs, sblist_getsize(childs) - 1);
                        free(child);
                        goto oom;
                }

in the line with pthread_attr_setstacksize(attrp, 256*1024);, a stack size of 256 KB is set per client. this was chosen as a really conservative default that should work on any C library implementation, even the worst. i plan to make this a configurable setting. if you want, you could try out some lower values there; if you're using a musl-libc based linux such as openwrt, a value as low as 8*1024 might be sufficient. going even lower than that could potentially lead to crashes.


@robertosgm robertosgm left a comment

This solves the issue presented on: #237

@robertosgm

Tested this on macOS:

        if (pthread_attr_init(&attr) == 0) {
                attrp = &attr;
                pthread_attr_setstacksize(attrp, 256*1024);
        }

        if (pthread_create(&child->thread, attrp, child_thread, child) != 0) {
                sblist_delete(childs, sblist_getsize(childs) - 1);
                free(child);
                goto oom;
        }

It works with 64*1024 bytes; it crashes at lower values.

@harryboo

harryboo commented Apr 9, 2019

After a restart, the memory footprint (no matter which setting for 'MaxClients') is:

old model - Virtual memory size (932 kB) - Resident set size (400 kB)
threaded model - Virtual memory size (1160 kB) - Resident set size (420 kB)

After calling the same URLs from outside, the memory footprint for the old model stays the same.
For the threaded model (all measurements with a 256 kB stack size):

MaxClients, Virtual memory size, Resident set size
20, 2468 kB, 588 kB
50, 2992 kB, 636 kB
100, 2460 kB, 584 kB
150, 2732 kB, 628 kB
200, 2728 kB, 604 kB

I will try to change the stack size value later...

@harryboo

harryboo commented Apr 9, 2019

Sorry, I think I had forgotten the workers for the old model. Every worker consumes 940 kB of extra memory.

Does this mean that the footprint of the threaded model is lower than the old model's?
I only have 1 process of 3332 kB with the threaded model, but 6032 kB with the old model (with 5 StartServers).

@rofl0r
Contributor Author

rofl0r commented Apr 9, 2019

Does this mean that the footprint of the threaded model is lower than the old model's?

yeah, my expectation was that the threaded model would use less memory if the stack size is set to a low value and a reasonable MaxClients limit is set, because the old model uses fork(), which needs a complete copy of the address space of the parent per worker.

@deba12

deba12 commented Apr 9, 2019

@rofl0r do you have a time frame for when this will be merged?

@rofl0r
Contributor Author

rofl0r commented Apr 10, 2019

@rofl0r do you have a time frame for when this will be merged?

after sufficient testing. this is quite an intrusive PR and i want to get the details right before merging it into master. results like those from @harryboo are very important to get the full picture.

@n3mxnet

n3mxnet commented Jun 27, 2019

totally awesome!

@limbo127

limbo127 commented Aug 9, 2019

Hello,
We have an issue with a proxy loop, so we are testing this branch in production, with 10000 clients going through the proxy; I can give feedback then.
Regards,
Nicolas


@limbo127 limbo127 left a comment

In our case, 10 000 connections, this branch works better than the master branch. Proxy requests are answered more quickly (the proxy loop detection is important in our specific use).

@rofl0r
Contributor Author

rofl0r commented Aug 9, 2019

@limbo127: have you been able to do memory usage comparisons?

@limbo127

limbo127 commented Aug 9, 2019 via email

@rofl0r
Contributor Author

rofl0r commented Aug 10, 2019

15920 would be KBs, right? so roughly 16 MB? that's not so bad given that each of the 3000 clients has a separate thread (in the old model, many clients would share a pre-forked process)

@limbo127

limbo127 commented Aug 10, 2019 via email

@limbo127

limbo127 commented Aug 22, 2019 via email

@rofl0r
Contributor Author

rofl0r commented Aug 22, 2019

it seems to block tinyproxy thread

can you elaborate? i have no idea what "it" is, nor which thread, nor what is being blocked.

@limbo127

limbo127 commented Aug 22, 2019 via email

@rofl0r
Contributor Author

rofl0r commented Aug 22, 2019

it does not answer to request

is that new? it seems you were quite happy with this branch a couple of weeks ago.
if so, what changed in your setup?

@limbo127

limbo127 commented Aug 22, 2019 via email

@riston

riston commented Sep 23, 2019

What is the status of this branch? I made a comparison between the current stable version and this branch and found that this solution works without throwing connection timeouts and connection refused errors. The current stable build has some unexpected behaviour when servers are under load.

Config I used (for the current stable build):
MaxClients: 100, MinSpareServers: 5, MaxSpareServers: 20

@rofl0r
Contributor Author

rofl0r commented Sep 23, 2019

What is the status of this branch

it's quite complete and master will be switched to this branch soon. just need to find time to revise the SIGHUP config reloading system which could potentially interfere in unexpected ways with the running threads.

@limbo127

limbo127 commented Sep 23, 2019 via email

@limbo127

limbo127 commented Oct 3, 2019

Hello,
tinyproxy does not process requests anymore after some time:
ERROR Oct 03 08:20:34 [2053]: opensock: Could not retrieve info for https
ERROR Oct 03 08:21:50 [2053]: read_buffer: read() failed on fd 3: Connection reset by peer

Regards,
Nicolas

@rofl0r
Contributor Author

rofl0r commented Oct 3, 2019

tinyproxy does not process requests anymore after some time

after how much time?

can you maybe post the contents of /proc/$pid/status when this happens, where $pid is tinyproxy's pid?

@rofl0r
Contributor Author

rofl0r commented Dec 21, 2019

merged as 69c86b9...cd005a9. this branch is now the new master branch targeting 1.11.0; the old master has been renamed to tp-1.10.x and will only receive bugfixes.

@rofl0r rofl0r closed this Dec 21, 2019
@talishka

talishka commented Jun 9, 2020

I'm also getting random read_buffer: read() failed on fd 8: Connection reset by peer errors. Is there a fixed newer version out there? Running on Ubuntu 18.04, tinyproxy 1.8.4.

@rofl0r
Contributor Author

rofl0r commented Jul 6, 2020

@talishka: in order to get the benefits of this branch you need to compile git master from source. there is no release containing the changes yet. if you have further questions/issues, please open a new issue here, thanks.
