Fix memory corruption and leaks on clisp #196
vibs29 wants to merge 9 commits into cl-plus-ssl:master
Conversation
Files:

ffi-buffer.lisp
ffi-buffer-clisp.lisp
streams.lisp
random.lisp
x509.lisp

Memory corruption bug on clisp

s/b-replace and b/s-replace had bugs that could cause them to miscalculate the buffer's end as being beyond its boundary, or to miscalculate the number of bytes to copy if the buffer's end was specified but the sequence was smaller.

All callers of s/b-replace happened to pass arguments that didn't trigger its bugs. But one caller of b/s-replace (namely stream-write-sequence) could legitimately call it in a way that did trigger one of its bugs. E.g. if the buffer was smaller than the sequence, it would corrupt memory by writing beyond the buffer's bounds.

I have fixed all the bugs, which were in s/b-replace and b/s-replace.

Performance

For clisp:

b/s-replace also copies less. The old version always called subseq, which copies. The new version copies only if the source seq is not a vector.

s/b-replace is not expected to allocate memory proportional to its input arrays, but due to its call to memory-as, it did. Now it doesn't: it allocates O(1) memory, regardless of input array sizes. b/s-replace also allocates O(1) memory now.

For all lisps:

stream-read-sequence: I have made this clearer.

stream-write-sequence: I have rewritten this to be clearer, and faster. The old version could flush a non-full stream. As a pathological case, writing 1 byte, then 2048 times writing 2049 bytes, would cause 4096 flushes. Now that will cause only 2049 flushes.

Memory leak on clisp

There was also a memory leak because foreign buffers were allocated but never freed. I have fixed this by extending the buffer API and having all callers of make-buffer also release the buffer when finished with it.
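The flush arithmetic in the pathological case can be sanity-checked with a small model. This is an illustrative Python sketch, not cl+ssl code; it assumes the default 2048-byte output buffer and models only the new behaviour (flush exactly when the buffer becomes full):

```python
BUF_SIZE = 2048  # assumed default output buffer size

class FlushCounter:
    """Model of the new stream-write-sequence: flush only a *full* buffer."""
    def __init__(self):
        self.fill = 0      # bytes currently buffered
        self.flushes = 0   # number of flushes performed

    def write(self, nbytes):
        while nbytes > 0:
            take = min(BUF_SIZE - self.fill, nbytes)
            self.fill += take
            nbytes -= take
            if self.fill == BUF_SIZE:  # flush only when completely full
                self.flushes += 1
                self.fill = 0

s = FlushCounter()
s.write(1)                 # the pathological case from above:
for _ in range(2048):      # 1 byte, then 2048 writes of 2049 bytes
    s.write(2049)
print(s.flushes)           # → 2049 (the old code reportedly flushed 4096 times)
```

Total bytes are 1 + 2048 × 2049 = 4196353, which is 2049 full buffers plus one leftover byte, hence exactly 2049 flushes.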
re #120
What motivates the changes in this file, and what is the nature of the changes? Note that decode-certificate and decode-certificate-from-file are public functions.
Oh, you're right. Sorry, please do not merge. I mistakenly assumed decode-certificate was private. But it's not and therefore I mustn't change its signature.
The motivation (irrelevant now) was that cffi's with-pointer-to-vector-data (wptvd) is a bad API and cl+ssl's adaptation of it is better; I wanted to move x509.lisp to the new one as an example of how the new one is to be used, including releasing memory, so that all of cl+ssl would use only the new API, rather than parts using cffi's and parts using cl+ssl's. Changing x509.lisp was not necessary for any other reason; it had no bugs or leaks. (Other than that all current uses of cffi's experimental and bad wptvd API have a potential future leak, if cffi ever adds an implementation for a Lisp that requires explicitly freeing the buffer.)
Since decode-certificate is public, x509.lisp must not be changed. Very sorry about this.
I have now checked all the other functions whose signatures I changed or whose bodies I removed, to make sure that none of them are public. They aren't. The files other than x509.lisp are fit to merge.
Ideally, a future version of decode-certificate would take as input a vec parameter specified to be of ordinary Lisp type vector, not specified to be a cffi shareable byte vector or a cl+ssl ffi-buffer; those are implementation details of cl+ssl. Internally, it could create whatever it wanted from that vec. That would reduce the burden on users by not requiring them to know about this foreign-data business, and it would let decode-certificate be written more robustly, i.e. to free the foreign data it had itself created. And the data is small (the size of a certificate file), so it's utterly unimportant to try to avoid making one copy of it from Lisp space to C space. If there's a deprecation process for cl+ssl functions, it can be used to provide a good alternative function and deprecate the old one; if there isn't, then there's nothing one can do. In any case, x509.lisp is not central to this pull request, so it should probably now be ignored and not distract from what is central.
Would you like to have the rest of the files? If so, what's the best administrative method? I can create a fresh pull request with a fresh branch that omits the x509.lisp change. (I suppose I could also look into making a second commit on this branch that restores the original x509.lisp and then git squashing or something, but I'm not a squashing expert and will only look into doing that if you greatly prefer that to my opening a whole new pull request.)
How to avoid unnecessary changes in unmodified files is a secondary question; we will solve that after we have agreed on the final version. But first I need to understand all the changes. I haven't digested your branch yet.

The comments in src/ffi-buffer-clisp.lisp say you got a significant speed up and attribute that to "copying via a single foreign call to MEMORY-AS instead of one foreign call per element via %MEM-REF". Could you point to where this mem-ref per element happens in the old code?

What would be good to have are self-contained test cases that demonstrate the bug in the old code and pass with the new code, also covering all branches of the copying code changed and introduced. Do you see a sufficiently easy and practical way to implement such tests?

Are you open to having a call to help me understand the pull request?
A call will be great. Sent you email.

This is my test file that reliably demonstrates cl+ssl corrupting memory on clisp. As for testing the new code, this is a test for the random function, and this is a test to verify that the various code paths through the new clisp code work correctly. Sorry these tests aren't more orderly and aren't in line with cl+ssl's testing standard. I did briefly look at whether I could make them so, then decided it was more effort than I was willing to make. So here they are as is, since you asked, for whatever they are worth.

Performance: I didn't comment that I had achieved a huge speedup. The commenter above me said they had achieved a huge speedup. I was merely hypothesizing that the reason for that speedup couldn't have been lack of copying, as he'd claimed, because his code also copied; so the reason must have been the style of copying. (The reason I say not to look at the very latest cffi code is that I recently submitted a patch so that that too uses it.)

The reason I started working on this was that cl+ssl would crash the process on clisp, because ffi-buffer-clisp.lisp had miscalculations about buffer boundaries and was writing into C memory well past array boundaries. I improved performance and fixed the memory leak as side effects of working on that primary bug.
I wonder if the problems addressed in this pull request also caused #163 |
I'm unfamiliar with the bio code. I've taken a quick look at the write-puts test, and I don't think what I've done will directly fix that. However, remember that stream-write-sequence on clisp would illegally overwrite arbitrary parts of C memory: if any of the tests called stream-write-sequence, then all subsequent behaviour of that process is unpredictable. It will certainly be worth running the test suite of #163 to see whether its problem has disappeared with this patch. And it's not worth reasoning about any misbehaviour on clisp prior to applying this patch: cl+ssl on clisp is dangerous and should never be used without it.
stream-listen and stream-read-byte are written in a way that can have horrible performance. The reason is that cffi's experimental Shareable Byte Vectors interface is a bad API. It indicates that although with-pointer-to-vector-data (wptvd) with an empty body may be constant time (when the vector really is shared between Lisp and C), it may also be O(len(vector)) on implementations that require copying in/out. Given that cffi is a portability layer, the name make-shareable-byte-vector is misleading if the result isn't necessarily shared between C and Lisp but may require copying. Bad cffi names aside, callers must treat wptvd as having O(len(vector)) overhead, since that is what its documentation indicates.

In that light, stream-listen and stream-read-byte are badly written: they can cause the copying of the entire underlying buffer when they only want to operate on one byte. They are written on the mistaken assumption that wptvd has O(1) overhead, when cffi really specifies that it may have O(len(vector)) overhead.

One solution is for the stream to own a separate one-byte input buffer for these two functions to use. That buffer is only used as a temporary variable within a function and retains no meaningful state between calls, so this solution is easy to implement.

As it happens, with the clisp-specific ffi-buffer-clisp.lisp, the clisp wptvd in cl+ssl is O(1), so there is no problem at present; all wptvd's currently in use by cl+ssl are O(1). But stream-listen and stream-read-byte should still not assume that a general wptvd has O(1) overhead: a new wptvd may be added to cffi, or an old one rewritten, with the allowable O(len(vector)) overhead. As written, they are just lucky to perform well because of the implementation details of the wptvd's currently in existence.

This commit implements the solution of a separate small buffer for stream-listen and stream-read-byte. Performance will not improve today, but this design ensures that cl+ssl doesn't in the future suddenly and mysteriously develop horrible performance by relying on wptvd adding only O(1) overhead when it is really specified to be allowed O(len(vector)) overhead.
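The cost argument can be made concrete with a toy model (Python, purely illustrative; the numbers assume cl+ssl's 2048-byte input buffer): if a copying wptvd costs O(len(vector)) per invocation, then N single-byte reads through the big buffer cost N × 2048 units, while a dedicated one-byte buffer costs N units.

```python
# Toy cost model, illustration only (not cl+ssl code): assume a copying
# with-pointer-to-vector-data (wptvd) costs len(vector) units per call,
# because it may copy the whole vector in and out.
def wptvd_cost(vector_len: int) -> int:
    return vector_len

N = 10_000  # single-byte operations, e.g. via stream-read-byte

# Reading each byte through the shared 2048-byte input buffer:
cost_big = N * wptvd_cost(2048)
# Reading each byte through a dedicated one-byte buffer:
cost_small = N * wptvd_cost(1)

print(cost_big // cost_small)  # → 2048
```

The ratio is just the buffer size: the one-byte buffer removes the dependence of per-byte cost on the big buffer's length.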
This is how I validated the last commit 45c439f "Stabilise performance for byte-sized operations".
  (setf (ssl-stream-peeked-byte stream) nil))
  (handler-case
-     (let ((buf (ssl-stream-input-buffer stream))
+     (let ((buf (ssl-stream-input-buffer-small stream))
Why is reading into a single-byte buffer better than reading one byte into the big buffer in stream-listen and stream-read-byte?
Hi Anton. I've described this over several paragraphs in the commit message; I can also explain it on the call. This isn't as important as the prior commits, so it's fruitless to discuss it before the prior commits have been understood. I can't explain it better in writing than I have in the commit message.
Sorry, I missed the commit message (I did look in the pull request conversation and in the code comments before asking, though). I see now that you address that question in the commit message.
Eventual Design Ideal

I see that cffi and cl+ssl are both MIT licenced. Ideally, cffi would deprecate its own wptvd API and copy cl+ssl's improved one from ffi-buffer.lisp and ffi-buffer-clisp.lisp into its own codebase, renaming the functions etc. with a suffix of 2 to differentiate them from the old API. It could then tell its users,

cl+ssl could then delete its own copies of ffi-buffer.lisp and ffi-buffer-clisp.lisp and use cffi's new API instead. Other users of cffi besides cl+ssl could upgrade to cffi's new API to benefit from the fast (and now correct, O(1)-memory) ffi-buffer-clisp.lisp that cl+ssl has long had for itself. The reason the API is backward incompatible is that (a)
The original versions of s/b-replace and b/s-replace were one-liners and so were declared inline, but not all versions are one-liners now, so not all versions should be inlined.
src/ffi-buffer-clisp.lisp
  (+ buf-start
     (- (or seq-end (length seq))
        seq-start)))
+ (defparameter *mem-max* 1024 "so *-REPLACE require the expected O(1) memory")
The *mem-max* is intended to limit the allocations done by memory-as, right? Question: if we copy 2048 bytes by calling memory-as twice for 1024 bytes, is that really better than a single call for 2048 bytes? The total amount of memory allocated is the same, and if it is garbage collected, collecting two objects may be more work for the GC than collecting one. Is it really beneficial?

Also, in the context of cl+ssl, the maximum size of the arrays copied with s/b-replace and b/s-replace is limited by the buffer size, which defaults to 2048 bytes. Not a big difference from the 1024 *mem-max* here.

Are you thinking of the *-replace functions as more general-purpose utilities that must be prepared for any size of collections copied?
Aha, *mem-max* also limits the intermediate array when b/s-replace is called with a list as the sequence to be copied to the buffer. The same questions apply to that case.
If you make the number 2048, future programmers might not understand this and may think it needs to be kept in sync with the other 2048. Also, yes, the buffer code is at a lower level than code that uses it (e.g. streams.lisp) and should be independent of such code. It really belongs in cffi. (And may someday get there: cffi/cffi#421, but that's a separate matter.)

Whether the garbage collector collects one piece of data or two is immaterial. What's material is that the two numbers be kept separate, even into the future. Make it 2048 and future programmers are less likely to understand this. It's fine to make it 2048 if future programmers can be made to understand this some other way; the simplest way I could manage was what I did. What matters is to keep the two numbers separate and not to think they must be varied together. Anything that forever keeps the two numbers separate is fine.

update:
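For what it's worth, the chunking idea behind *mem-max* can be sketched outside Lisp (an illustrative Python sketch, not the cl+ssl code; MEM_MAX plays the role of *mem-max*): each iteration's temporary allocation is bounded by MEM_MAX, so working memory stays O(1) no matter how large the copy is.

```python
MEM_MAX = 1024  # plays the role of cl+ssl's *mem-max*

def chunked_copy(dst, dst_start, src, src_start, count):
    """Copy COUNT bytes from src into dst in chunks of at most MEM_MAX,
    so the temporary working memory is O(1) regardless of COUNT."""
    copied = 0
    while copied < count:
        n = min(MEM_MAX, count - copied)
        # The temporary `chunk` never exceeds MEM_MAX bytes.
        chunk = bytes(src[src_start + copied : src_start + copied + n])
        dst[dst_start + copied : dst_start + copied + n] = chunk
        copied += n

buf = bytearray(5000)
data = bytes(range(256)) * 20          # 5120 bytes of test data
chunked_copy(buf, 0, data, 0, 5000)
assert bytes(buf) == data[:5000]
```

Whether two 1024-byte temporaries beat one 2048-byte temporary is, as discussed above, a GC question; the point of the chunk bound is only that the temporary does not scale with the input.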
My comments about the Eventual Design Ideal:

In cl+ssl we have a different arrangement: we read into a buffer and then copy from the buffer to another sequence. So we may want better support in CFFI for this use case without deprecating

Next, I think there is no need for CFFI to introduce an opaque buffer abstraction. Applications can just use

The only thing missing is bulk copy between a foreign memory array and a lisp sequence (

A big design question is how generic these bulk copy functions should be with respect to lisp sequence type: support only simple vectors, any vectors, or both vectors and lists. This depends on the possibilities we can reasonably expect most lisp implementations to efficiently provide.
Moreover,

The only problem is that for some lisps

So the question is: can cffi provide more primitive bulk copying operations that are supported on more lisps than those that currently support the zero-copy

A relevant doc: https://github.com/cffi/cffi/blob/master/doc/mem-vector.txt
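As a point of comparison (not a CFFI proposal, just an analogy from another FFI): Python's ctypes exposes exactly such primitive bulk copy operations, one foreign call per direction instead of one foreign access per element.

```python
import ctypes

n = 16
foreign = ctypes.create_string_buffer(n)   # a block of foreign (C) memory
data = bytes(range(n))                     # the managed, "lisp-side" sequence

# Bulk copy managed -> foreign in a single call (cf. one MEMORY-AS call
# instead of one %MEM-REF per element):
ctypes.memmove(foreign, data, n)

# Bulk copy foreign -> managed, again in a single call:
back = ctypes.string_at(foreign, n)
assert back == data
```

Both directions cost one copy each but no per-element foreign-call overhead, which is the property the bulk copy primitives discussed above would give lisps that cannot share vector storage with C.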
I understood what may be the motivation for an opaque buffer abstraction in cffi: if efficient bulk copying operations are not available in all lisps, then some lisps may implement the buffer using shareable-vector/wptvd, and other lisps with foreign-alloc / bulk copy.
s/b-replace had O(n^2) running time for an input seq of length n; now it is O(n). Also, s/b-replace and b/s-replace now correctly signal errors when bounding indices are out of bounds, whereas previously they would sometimes silently shrink illegal arguments into legal ones, which was unintentionally different from how replace behaves. Their behaviour is now modelled on replace's. b/s-replace also didn't work for a zero-length buffer/sequence; now it does. That bug caused no harm, because it is never called with a zero-length buffer/sequence.
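A sketch of the intended boundary semantics (Python, with a hypothetical name; not the actual cl+ssl code): bounding indices are validated up front, the way cl:replace's are, and the number of elements copied is the minimum of the two designated ranges, so zero-length ranges copy nothing instead of misbehaving.

```python
def checked_replace(buf, seq, buf_start=0, buf_end=None, seq_start=0, seq_end=None):
    """Copy from seq into buf with CL:REPLACE-style bounding-index semantics:
    illegal indices signal an error instead of being silently shrunk."""
    buf_end = len(buf) if buf_end is None else buf_end
    seq_end = len(seq) if seq_end is None else seq_end
    if not 0 <= buf_start <= buf_end <= len(buf):
        raise IndexError("bad bounding indices for buf")
    if not 0 <= seq_start <= seq_end <= len(seq):
        raise IndexError("bad bounding indices for seq")
    n = min(buf_end - buf_start, seq_end - seq_start)  # copy the smaller range
    buf[buf_start:buf_start + n] = seq[seq_start:seq_start + n]
    return n

buf = bytearray(4)
assert checked_replace(buf, b"\x01\x02", buf_start=1) == 2
assert buf == bytearray(b"\x00\x01\x02\x00")
assert checked_replace(bytearray(0), b"") == 0   # zero-length ranges are legal
```

The old code's bug class, an out-of-range index being "shrunk" into a legal one, cannot occur here: any index outside its sequence raises before a single byte is copied.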
@vibs29, please help me see the O(n^2) run time of s/b-replace prior to commit 61444d (the commit comment says that's one of its fixes). Is it because calling

PS: I started work on this PR by integrating unit tests based on your examples, but these days I am somewhat busy with other things, so the progress is slow. I will continue in the coming days.
Hi Anton. Yes, exactly right! I felt a bit guilty about adding commits to the same branch, but didn't know what else I could feasibly do. I didn't dare submit my additional tests for the last commit in the comment above, but I think I'll paste them here now, just so everything is out there and not sitting privately on my computer where nobody can see it. They don't use the official testing framework, which I still haven't taken the time to get the hang of. But they were still useful to me to verify that what I'd finally written did expose (latent) problems in the code before the last commit that the last commit solved. It's totally fine to ignore this completely.
…o that failure or exception report are more readable; fix buffer-equal; little more test cases. re #196
… to b/s-replace, more test cases for b/s-replace. re #196
…boundary check test cases suggested by vibs29 in #196 (comment). re #196
…e b/s-replace tests us - generate a separate test name for every case; extend the test cases for sequences of type list. re #196
…lace, including foreign buffer memory corruption by b/s-replace; also guarantee O(1) working memory usage by s/b-replace and b/s-replace, even if the buffer and sequence sizes are huge. re #196
…lace, including foreign buffer memory corruption by b/s-replace; also guarantee O(1) working memory usage by s/b-replace and b/s-replace, even if the buffer and sequence sizes are huge. re #196
…t readtable case (instead of just interning lower case names). re #196
…r version of the pull request by vibs29 (instead of the current constant), so the tests can modify it and it can be controlled dynamically if needed. re #196
…ogical case in the old version: writing 1 byte, then 2048 times writing 2049 bytes, would cause 4096 flushes; now that will cause only 2049 flushes). re #196