Add timeout support for get_n_chars/4 #3136
c70db3b to
8243268
Compare
% - infinity/inf (no timeout, blocks indefinitely)
%
% Returns whatever data is available within the timeout period.
% On timeout, returns partial data (distinguishable from EOF).
a partial string ?
|
This feature would be extremely valuable, thank you a lot for working on this! I have one question: What if the timeout occurs between two bytes of a multi-octet UTF-8 sequence? For instance, if the file or stream content (eventually) is If possible, could you please consider adding a test for this? Thank you a lot! |
|
Regarding the message |
Great question. I have a thought and a question. My thought is that my use case for this was to prevent blocking on idle connections, not active ones. So it would be easy enough to make the default behavior continue to consume while there is data available. Of course, that could interfere with cooperative concurrency, so if we were to enforce a cutoff, what should the behavior be? Should I push the half char back onto the stream?
|
Personally, if I set a timeout, I would expect the system to respect it in all cases, and to read as much as possible within the timeout. In addition, I would expect the API to be implemented in such a way that it reliably yields correct results. In the case of For instance, if the sender sent the character '💜', then I would not expect to receive |
I don't think this is even theoretically possible though, is it? I need to think this through a bit. It seems to me to be entirely possible that there are a number of conditions which means that a half char would be received when a full char was sent. So I think maybe the question resolves to, "how do we know what the senders intent was?" I suppose there are two situations that come to mind, but I'm sure it's nonexhaustive.
In situation 1, it is unclear if waiting longer would result in additional information or not. |
On the byte-level, it is definitely possible to detect whether the prefix of a UTF-8 encoded character is encountered! In the case of |
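As a minimal sketch of that byte-level check, the Rust standard library already reports the difference between "invalid bytes" and "input ended in the middle of a character": `Utf8Error::error_len()` returns `None` in the second case. The helper name `split_incomplete` is hypothetical and is not taken from the PR.

```rust
use std::str::Utf8Error;

/// Splits `buf` into its valid UTF-8 prefix and a trailing incomplete
/// sequence (possibly empty). Bytes that can never become valid UTF-8
/// are reported as an error instead.
fn split_incomplete(buf: &[u8]) -> Result<(&str, &[u8]), Utf8Error> {
    match std::str::from_utf8(buf) {
        // Everything decoded: nothing is pending.
        Ok(s) => Ok((s, &buf[buf.len()..])),
        // error_len() == None means the input ended mid-character.
        Err(e) if e.error_len().is_none() => {
            let (valid, pending) = buf.split_at(e.valid_up_to());
            Ok((std::str::from_utf8(valid)?, pending))
        }
        // Genuinely invalid bytes: waiting longer would not help.
        Err(e) => Err(e),
    }
}
```

For example, feeding it `A` followed by the first two bytes of '💜' (`F0 9F`) yields the text `"A"` and two pending bytes, so a caller can tell that a character has started but not yet fully arrived.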
|
Ah, of course. This is in charsio, we are only interested in full characters. So it would never make sense to receive a half character. |
|
Also, only situation 2 matters: The timeout occurs on a half char. In that case, the implementation must take care to ensure that I can (later) receive the full character (of course presuming it ever arrives). What must never happen is that the predicate tells us: "This is a character that was received", when that character was not sent. |
|
This is going to be somewhat of an expanded footprint (esp. w/regard to tests) in order to maintain sequential serialization of reads and to keep existing predicates from breaking when they try to read from a half char stuck in a stream.
|
If this cannot be supported (yet) for The predicate could throw an error if it is given a stream for which it cannot reliably perform the operation. Maybe a
e) There shall be a Permission Error when it is not permitted to perform a specific operation. It has the form permission_error(Operation, PermissionType, Culprit) where
Operation ∈ { access, create, input, modify, open, output, reposition },
|
src/tests/incomplete_utf8.pl
Outdated
get_n_chars(Out, _, Chars1, 100),
Chars1 = [],
get_char(Out, C1),
C1 = '💜',
A stronger test would be using (==)/2 here, it ensures that the terms are really identical.
(In a wrong implementation of get_char/2, C1 may be a variable after success, and the unification then succeeds.)
src/tests/incomplete_utf8.pl
Outdated
get_n_chars(Out, _, Chars1, 100),
Chars1 = [],
get_code(Out, Code),
Code =:= 128156,
(For comparison to the above, (=:=)/2 here is a good strong test, it requires that Code be ground.)
src/tests/incomplete_utf8.pl
Outdated
get_n_chars(Out, _, Chars1, 100),
Chars1 = [],
get_line_to_chars(Out, Line, []),
Line = ['💜',t,e,s,t,'\n'],
We have a shorter notation for strings, using double quotes.
|
I see you now already implemented it also for If this works correctly, then it will tremendously increase Scryer Prolog's application opportunities, notably for hosting sophisticated web services. |
🤞 🤞 🤞 and maybe even web actors 🤫 🤫 🤫 |
|
It looks very good! The only remaining small issue I noticed is in the test cases, and can be addressed at any time (also later): setup_call_cleanup(process_create(Echo, [Content], [stdout(pipe(Out))]), ..., close(Out)) It doesn't matter much for the tests themselves, but it may be valuable to use and see this pattern more. |
In the test cases or the implementation? |
35bec05 to
c16dec0
Compare
|
|
Thank you! The explicit module qualifications should not be necessary! |
|
I agree they should not be, but unfortunately they are. Perhaps not for the external calls to |
|
Once we have dyadic quads, the current approach along with explicit qualification will no longer be necessary. |
|
I now see the terminology "partial data" is still present: In the context of Prolog, "partial" is associated with "partially instantiated", from the documentation I would have thought the call may now yield partially instantiated lists. But apparently that is not the case? It only may yield fewer chars than asked for, if a timeout occurs, is that correct? |
|
And more importantly: returns partial data (distinguishable from EOF). On EOF, we get |
|
For completeness: We can always manually use |
I was using the informal sense of the word "partial" here. Thanks for pointing this out, I will fix the terminology to be more precise. I am open to suggestions on how to differentiate EOF vs timeout. We could also put an option in the 3rd argument for termination characterization.
Regarding the general API, I have now compared this with the Elisp function
(accept-process-output &optional PROCESS SECONDS MILLISEC JUST-THIS-ONE)
Allow any pending output from subprocesses to be read by Emacs. It is given to their filter functions. ... Optional second argument SECONDS and third argument MILLISEC specify a timeout; return after that much time even if there is no subprocess output.
I think this is the direction you initially envisaged: That the timeout only plays a role if there is nothing to read. On the other hand, if there is data available, then why not read and yield it? In our case, both semantics do not mesh well with the Maybe:
|
This is true, and that was a thought I had. However the immediate reason that comes to mind is a poorly behaved process could flood the server. What about a |
Good point! What I was thinking about initially is an application with real-time requirements, such as audio processing, where another thing must be handled after a number of milliseconds. Therefore, yes, good point, for these reasons I think it makes sense to honor the timeout in all situations. As mentioned, I think there is no need to indicate EOF within this predicate. When needed, that can be tested completely separately, with the standard predicate I think it only remains to document the arguments correctly. |
8b647d4 to
e6e3105
Compare
Implements non-blocking I/O for get_n_chars with configurable timeout,
addressing GitHub discussion mthom#3035. The timeout parameter enables
reading from TCP sockets and process pipes without indefinite blocking.
Key features:
- New get_n_chars/4 predicate with timeout parameter (milliseconds)
- Variable N support: when unbound, N unifies with actual chars read
- Timeout behavior: returns partial data, distinct from EOF
- Stream remains usable after timeout (no EOF marking)
- Native OS timeouts: TCP uses set_read_timeout(), pipes use poll()
- Backward compatible: existing get_n_chars/3 unchanged
Implementation:
- Added GetNCharsWithTimeout instruction (arity 4)
- Separate dispatch handlers for 3-arg and 4-arg versions
- Stream methods: set_read_timeout() and poll_read_ready()
- Direct byte reading with elapsed time tracking for pipe timeouts
- Proper handling of all variable types (Var, AttrVar, StackVar)
Timeout values:
- Integer: timeout in milliseconds
- 0 or 'nonblock': minimal timeout (1ms)
- 'infinity' or 'inf': no timeout (blocks indefinitely)
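The "elapsed time tracking" idea in this commit message can be sketched as a loop that keeps reading until either N bytes have arrived or a fixed deadline passes. This is only an illustration under stated assumptions: an `mpsc` channel stands in for the pipe, and `read_up_to` is a hypothetical name, not the PR's code.

```rust
use std::sync::mpsc::Receiver;
use std::time::{Duration, Instant};

/// Reads up to `n` bytes from `rx`, giving up once `timeout` has elapsed.
/// Returns whatever arrived in time; the channel stays usable afterwards.
fn read_up_to(rx: &Receiver<u8>, n: usize, timeout: Duration) -> Vec<u8> {
    let deadline = Instant::now() + timeout;
    let mut out = Vec::with_capacity(n);
    while out.len() < n {
        // Time left until the deadline; zero once it has passed.
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(byte) => out.push(byte),
            // Timed out, or the sender went away: return what we have.
            Err(_) => break,
        }
    }
    out
}
```

Because the deadline is computed once up front, a slow trickle of bytes cannot extend the total wait beyond `timeout`, which matches the "honor the timeout in all situations" position taken in the discussion.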
Changes:
- Modified timeout=0 to mean "no timeout" (was 1ms)
- Fixed critical buffering bug where CharReader buffering prevented
reading all immediately available data with timeout
- Updated documentation for get_n_chars/4 timeout behavior
- Created comprehensive test suite with 10 passing tests
This is work in progress - tests need to be migrated to follow the
testing guide structure (src/tests/*.pl and CLI tests).
Moved tests from standalone file to proper testing framework:
- Created src/tests/get_n_chars.pl using test_framework module
- Created CLI test at tests/scryer/cli/src_tests/get_n_chars.toml
- Removed old tests_get_n_chars.pl
All 10 tests pass:
1. timeout=0 equals get_n_chars/3
2. Variable N with timeout=0
3. Negative timeout equals no timeout
4. Positive timeout stops reading
5. Infinity atom means no timeout
6. Stream usable after timeout
7. Timeout returns partial data not EOF
8. Multiple reads with timeout=0
9. Read more than available with timeout=0
10. Variable N unifies with actual count
Tests now follow the three-layer approach from TESTING_GUIDE.md:
- Layer 2: Prolog integration tests (src/tests/get_n_chars.pl)
- Layer 3: CLI tests (tests/scryer/cli/src_tests/get_n_chars.toml)
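Read together, the commit messages up to this point suggest the following mapping of the timeout argument: zero, negative integers, `infinity` and `inf` all mean "block", `nonblock` means "return immediately", and a positive integer is a millisecond limit. A small sketch of that mapping, where `TimeoutArg`, `ReadTimeout` and `classify` are hypothetical names and the mapping itself is an interpretation of the messages rather than the PR's actual code:

```rust
use std::time::Duration;

/// Hypothetical, already-decoded form of the Prolog timeout argument.
enum TimeoutArg<'a> {
    Int(i64),
    Atom(&'a str),
}

/// Hypothetical internal representation of the requested behavior.
#[derive(Debug, PartialEq)]
enum ReadTimeout {
    Block,           // no timeout, same as get_n_chars/3
    NonBlock,        // return only what is immediately available
    After(Duration), // wait at most this long
}

fn classify(arg: TimeoutArg<'_>) -> Option<ReadTimeout> {
    match arg {
        // Assumption from the test list: 0 and negative values mean "no timeout".
        TimeoutArg::Int(ms) if ms <= 0 => Some(ReadTimeout::Block),
        TimeoutArg::Int(ms) => Some(ReadTimeout::After(Duration::from_millis(ms as u64))),
        TimeoutArg::Atom("infinity") | TimeoutArg::Atom("inf") => Some(ReadTimeout::Block),
        TimeoutArg::Atom("nonblock") => Some(ReadTimeout::NonBlock),
        // Any other atom is not a valid timeout in this sketch.
        TimeoutArg::Atom(_) => None,
    }
}
```

One practical reason to normalize early: `std::net::TcpStream::set_read_timeout` returns an error when given `Some(Duration::ZERO)`, so a zero value needs separate handling before it reaches the socket layer.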
When get_n_chars/4 timed out mid-character (e.g., after reading the
first byte of a 4-byte UTF-8 character), subsequent stream operations
would fail with syntax_error(invalid_data) because CharReader wasn't
aware of the incomplete bytes saved in the incomplete_utf8 buffer.
Key changes:
- Added CharReader::prepend_bytes() to inject incomplete UTF-8 bytes
into CharReader's buffer before reading new data
- Created specialized StreamLayout<CharReader<PipeReader>> impl that
loads incomplete_utf8 bytes before any read operation
- Ensures incomplete_utf8 buffer is cleared after loading to prevent
double-reading
Fixed operations after timeout mid-character:
- get_char/2, peek_char/2: correctly read/peek the complete character
- get_code/2, peek_code/2: correctly return the character's code point
- get_n_chars/3: continues reading from incomplete character
- read_term/3: works after consuming incomplete character
- get_line_to_chars/3: includes incomplete character in line
- Sequential timeouts: handles multiple incomplete UTF-8 sequences
Tests added:
- src/tests/incomplete_utf8.pl: 8 comprehensive tests using
test_framework, covering all major stream reading predicates
- tests/scryer/cli/src_tests/incomplete_utf8.toml: CLI test config
All tests pass, verifying that stream operations correctly handle
incomplete UTF-8 sequences left by get_n_chars/4 timeouts.
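The "save the incomplete bytes, prepend them before the next read, then clear the buffer" scheme described above can be illustrated in isolation. The sketch below is not the PR's `CharReader`; `Utf8Carry` and `feed` are hypothetical names, and the type only models the carry-over logic on plain byte slices.

```rust
/// Hypothetical decoder that never yields a half character: bytes of an
/// unfinished trailing sequence are held back until the rest arrives.
#[derive(Default)]
struct Utf8Carry {
    pending: Vec<u8>,
}

impl Utf8Carry {
    /// Feeds newly read bytes and returns only complete characters.
    fn feed(&mut self, bytes: &[u8]) -> Result<String, std::str::Utf8Error> {
        // Prepend what the previous call held back; take() also clears
        // `pending`, so the same bytes are never consumed twice.
        let mut buf = std::mem::take(&mut self.pending);
        buf.extend_from_slice(bytes);
        let valid_len = match std::str::from_utf8(&buf) {
            Ok(_) => buf.len(),
            // Input ended mid-character: keep the tail for later.
            Err(e) if e.error_len().is_none() => e.valid_up_to(),
            // Invalid bytes that no later input could repair.
            Err(e) => return Err(e),
        };
        self.pending = buf.split_off(valid_len);
        Ok(String::from_utf8(buf).expect("prefix was validated above"))
    }
}
```

Feeding the four bytes of '💜' across three calls yields two empty results followed by the complete character, which is the property the review asked for: a character is reported only once it has been fully received.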
Replace unification (=) with structural equality (==) for stronger
test assertions. The == operator ensures variables are already bound
to the expected values, preventing false positives from unification.
As suggested by @triska in PR review, this catches potential bugs where
predicates might incorrectly leave variables unbound.
Changed assertions:
- Character comparisons: C1 = '💜' → C1 == '💜'
- List comparisons: Chars1 = [] → Chars1 == []
- Kept numeric comparisons as =:= (already correct)
Replace list notation with double-quote string notation for cleaner,
more idiomatic test assertions.
Changes:
- Line == ['💜',t,e,s,t,'\n'] → Line == "💜test\n"
- Chars2 == ['💜', 'A'] → Chars2 == "💜A"
As suggested by @triska in PR review.
Wrap all process_create calls with setup_call_cleanup/3 to ensure
reliable stream cleanup even if tests fail.
Changes:
- Added library(iso_ext) import for setup_call_cleanup/3
- Wrapped all 8 tests in incomplete_utf8.pl with setup_call_cleanup
- Wrapped all 11 tests in get_n_chars.pl with setup_call_cleanup
- Tests with multiple streams use nested setup_call_cleanup
Pattern used:
    setup_call_cleanup(
        process_create(..., [stdout(pipe(Out))]),
        ( ... test body ... ),
        close(Out)
    )
Note: There appears to be a module loading issue when running these
tests that needs investigation. The tests load correctly before this
change, suggesting the iso_ext import may conflict with the test
framework's module system.
Apply module qualifications to all imported predicates in test files to
work around a testing framework idiosyncrasy where importing iso_ext
causes other modules to become unavailable.
Changes:
- Qualify setup_call_cleanup with iso_ext:
- Qualify process_create with process:
- Qualify get_n_chars, get_char, peek_char, get_code, peek_code, and
get_line_to_chars with charsio:
- Qualify length with lists:
Adds 7 additional tests for nonblock functionality:
- nonblock with fixed N limit
- nonblock with variable N reads all available
- nonblock returns empty when no data ready
- nonblock vs timeout returns immediately
- sequential nonblock reads drain buffer
- nonblock with slow data returns partial snapshot
- variable N: timeout waits, nonblock returns immediately
Total test coverage now:
- 18 tests in get_n_chars.pl (including nonblock)
- 8 tests in incomplete_utf8.pl (all predicates)
- 26 total tests for complete coverage
Add test functions to tests/scryer/src_tests.rs to enable running the
get_n_chars/4 and incomplete UTF-8 handling tests via cargo test.
Tests:
- get_n_chars: 18 tests covering timeout, nonblock, and variable N
- incomplete_utf8: 8 tests covering all predicates with partial UTF-8
Both test suites pass successfully.
When multiple process_create calls are needed in a test, each stream
must have its own setup_call_cleanup wrapper. Previously used a single
setup_call_cleanup with compound setup and cleanup goals, which is
incorrect.
Changed from:
    setup_call_cleanup(
        (create1, create2),
        test_body,
        (close2, close1)
    )
To proper nesting:
    setup_call_cleanup(
        create1,
        setup_call_cleanup(
            create2,
            test_body,
            close2
        ),
        close1
    )
This ensures proper cleanup order even if tests fail.
e6e3105 to
9b28c5c
Compare
|
Ok I think I unbuggered everything after a failed rebase, could use another set of eyes |
% - nonblock (atom for minimal non-blocking behavior)
%
% Returns whatever data is available within the timeout period.
% On timeout, returns partial data (distinguishable from EOF).
"Partial data" in the sense that it may be less than N chars?
Summary
This PR implements timeout support for get_n_chars/4, enabling non-blocking character reading from TCP sockets and process pipes as requested in #3035.
Details
Usage Examples
Testing