Add timeout support for get_n_chars/4 #3136

jjtolton · 2025-10-25T04:26:10Z

Summary

This PR implements timeout support for get_n_chars/4, enabling non-blocking character reading from TCP sockets and process pipes as requested in #3035.

Details

Usage Examples

% Read with 5 second timeout
?- tcp_open_socket(Host, Port, Stream),
   get_n_chars(Stream, 1024, Request, 5000),
   process_request(Request).

% Read what's available in 1 second (Variable N)
?- process_create(Cmd, Args, [stdout(pipe(Out))]),
   get_n_chars(Out, N, Data, 1000),
   format("Got ~d bytes~n", [N]).

% Wait indefinitely (like get_n_chars/3)
?- get_n_chars(Stream, 100, Chars, 0).

Testing

# Run Prolog tests
./target/release/scryer-prolog -f --no-add-history src/tests/get_n_chars.pl \
  -f -g "use_module(library(get_n_chars_tests)), get_n_chars_tests:main_quiet(get_n_chars_tests)"

# Run CLI tests
TRYCMD=dump cargo test --test scryer get_n_chars

Closes Implementing goal timeouts? #3035

triska · 2025-10-25T14:02:50Z

src/lib/charsio.pl

+%   - infinity/inf (no timeout, blocks indefinitely)
+%
+% Returns whatever data is available within the timeout period.
+% On timeout, returns partial data (distinguishable from EOF).


a partial string ?

triska · 2025-10-25T14:10:25Z

This feature would be extremely valuable, thank you a lot for working on this!

I have one question: What if the timeout occurs between two bytes of a multi-octet UTF-8 sequence? For instance, if the file or stream content (eventually) is "\xf0\\x9f\\x92\\x9c\", which is the UTF-8 encoding of a single character, is the read always "atomic" in the sense that it reliably reads a full character, if any, also if a timeout occurs?

If possible, could you please consider adding a test for this? Thank you a lot!

triska · 2025-10-25T14:12:11Z

Regarding the message format("Got ~d bytes~n", [N]).: It should be chars, shouldn't it?

jjtolton · 2025-10-25T15:22:04Z

> This feature would be extremely valuable, thank you a lot for working on this!
>
> I have one question: What if the timeout occurs between two bytes of a multi-octet UTF-8 sequence? For instance, if the file or stream content (eventually) is "\xf0\\x9f\\x92\\x9c\", which is the UTF-8 encoding of a single character, is the read always "atomic" in the sense that it reliably reads a full character, if any, also if a timeout occurs?
>
> If possible, could you please consider adding a test for this? Thank you a lot!

Great question. I have a thought and a question. My thought is that my usecase for this was to prevent blocking on idle connections, not active ones. So, it would be easy enough to make the default behavior continue to consume while there is data available.

Of course, that could interfere with cooperative concurrency, so if we were to enforce a cutoff, what should the behavior be?

should I push the half char back onto the stream?

triska · 2025-10-25T16:02:11Z

Personally, if I set a timeout, I would expect the system to respect it in all cases, and to read as much as possible within the timeout.

In addition, I would expect the API to be implemented in such a way that it reliably yields correct results. In the case of get_n_chars/4, this means that I expect to get what the sender transmitted.

For instance, if the sender sent the character '💜', then I would not expect to receive ['\xF0\'|Rs], because '\xf0\' is a different character (even though its byte value is part of the prefix of the UTF-8 encoding of '💜').

jjtolton · 2025-10-25T16:32:08Z

> Personally, if I set a timeout, I would expect the system to respect it in all cases, and to read as much as possible within the timeout.
>
> In addition, I would expect the API to be implemented in such a way that it reliably yields correct results. In the case of get_n_chars/4, this means that I expect to get what the sender transmitted.
>
> For instance, if the sender sent the character '💜', then I would not expect to receive ['\xF0\'|Rs], because '\xf0\' is a different character (even though its byte value is part of the prefix of the UTF-8 encoding of '💜').

I don't think this is even theoretically possible though, is it? I need to think this through a bit. It seems to me to be entirely possible that there are a number of conditions under which a half char would be received when a full char was sent.

So I think maybe the question resolves to, "how do we know what the sender's intent was?"

I suppose there are two situations that come to mind, but I'm sure the list is nonexhaustive.

1. A transmission is "stalled" on a half char and THEN the timeout occurs
2. The timeout occurs on a half char

In situation 1, it is unclear if waiting longer would result in additional information or not.
In situation 2, it seems that truncating the half char would be correct but possibly incomplete?

triska · 2025-10-25T16:38:55Z

> I don't think this is even theoretically possible though, is it?

On the byte-level, it is definitely possible to detect whether the prefix of a UTF-8 encoded character is encountered! In the case of get_n_chars/4, this detection cannot happen on the Prolog level, because this predicate yields chars, i.e., it presumes completely received characters.
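To illustrate the byte-level detection described above, here is a minimal sketch in Rust, the language Scryer Prolog is implemented in. It is not code from this PR: the helper name `split_complete_utf8` is hypothetical. It relies only on the standard library's `std::str::from_utf8` and on `Utf8Error::error_len()`, which returns `None` when the input ends in the middle of a character.

```rust
use std::str;

/// Hypothetical helper (not from the PR): splits `bytes` into the longest
/// valid UTF-8 prefix and a trailing incomplete sequence, if any.
/// Returns `Err` when the input is invalid rather than merely incomplete.
fn split_complete_utf8(bytes: &[u8]) -> Result<(&str, &[u8]), str::Utf8Error> {
    match str::from_utf8(bytes) {
        // Everything decoded: the incomplete tail is empty.
        Ok(s) => Ok((s, &bytes[bytes.len()..])),
        // `error_len() == None` means the input stopped mid-character.
        Err(e) if e.error_len().is_none() => {
            let (valid, rest) = bytes.split_at(e.valid_up_to());
            let s = str::from_utf8(valid).expect("prefix validated by valid_up_to");
            Ok((s, rest))
        }
        // A byte that can never start or continue a character.
        Err(e) => Err(e),
    }
}
```

With the four bytes F0 9F 92 9C of '💜', passing only the first byte yields an empty string plus a one-byte tail, so no character is reported that the sender did not transmit.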

jjtolton · 2025-10-25T16:39:12Z

Ah, of course. This is in charsio, we are only interested in full characters. So it would never make sense to receive a half character.

triska · 2025-10-25T16:41:38Z

Also, only situation 2 matters: The timeout occurs on a half char.

In that case, the implementation must take care to ensure that I can (later) receive the full character (of course presuming it ever arrives).

What must never happen is that the predicate tells us: "This is a character that was received", when that character was not sent.

jjtolton · 2025-10-25T19:49:53Z

This is going to be somewhat of an expanded footprint (esp. w/regard to tests) in order to maintain sequential serialization of reads and to keep any existing predicates from breaking when they try to read from a half char stuck in a stream.

triska · 2025-10-25T20:05:22Z

If this cannot be supported (yet) for text streams, then one option may be to make timeouts only work for binary streams? For binary streams, bytes can be read individually, and there is no risk of breaking up a byte incorrectly.

The predicate could throw an error if it is given a stream for which it cannot reliably perform the operation.

Maybe a permission_error on the grounds that it is not allowed to read input from a text stream with a timeout?

  e) There shall be a Permission Error when it is not
  permitted to perform a specific operation. It has the form
  permission_error(Operation, PermissionType,
  Culprit) where

     Operation ∈ {
       access,
       create,
       input,
       modify,
       open,
       output,
       reposition
     },

triska · 2025-10-25T20:34:56Z

src/tests/incomplete_utf8.pl

+    get_n_chars(Out, _, Chars1, 100),
+    Chars1 = [],
+    get_char(Out, C1),
+    C1 = '💜',


A stronger test would be using (==)/2 here, it ensures that the terms are really identical.

(In a wrong implementation of get_char/2, C1 may be a variable after success, and the unification then succeeds.)

triska · 2025-10-25T20:35:52Z

src/tests/incomplete_utf8.pl

+    get_n_chars(Out, _, Chars1, 100),
+    Chars1 = [],
+    get_code(Out, Code),
+    Code =:= 128156,


(For comparison to the above, (=:=)/2 here is a good strong test, it requires that Code be ground.)

triska · 2025-10-25T20:36:35Z

src/tests/incomplete_utf8.pl

+    get_n_chars(Out, _, Chars1, 100),
+    Chars1 = [],
+    get_line_to_chars(Out, Line, []),
+    Line = ['💜',t,e,s,t,'\n'],


We have a shorter notation for strings, using double quotes.

triska · 2025-10-25T20:39:39Z

I see you now already implemented it also for text streams! Incredible contribution, thank you a lot!

If this works correctly, then it will tremendously increase Scryer Prolog's application opportunities, notably for hosting sophisticated web services.

jjtolton · 2025-10-25T20:49:19Z

> I see you now already implemented it also for text streams! Incredible contribution, thank you a lot!
>
> If this works correctly, then it will tremendously increase Scryer Prolog's application opportunities, notably for hosting sophisticated web services.

🤞 🤞 🤞

and maybe even web actors 🤫 🤫 🤫

triska · 2025-10-25T20:53:08Z

It looks very good!

The only remaining small issue I noticed is in the test cases, and can be addressed at any time (also later): setup_call_cleanup/3 may be useful to reliably close started processes:

setup_call_cleanup(process_create(Echo, [Content], [stdout(pipe(Out))]), ..., close(Out))

It doesn't matter much for the tests themselves, but it may be valuable to use and see this pattern more.

jjtolton · 2025-10-25T21:03:36Z

> It looks very good!
>
> The only remaining small issue I noticed is in the test cases, and can be addressed at any time (also later): setup_call_cleanup/3 may be useful to reliably close started processes:
>
> setup_call_cleanup(process_create(Echo, [Content], [stdout(pipe(Out))]), ..., close(Out))
>
> It doesn't matter much for the tests themselves, but it may be valuable to use and see this pattern more.

In the test cases or the implementation?

jjtolton · 2025-10-25T21:31:54Z

> It looks very good!
>
> The only remaining small issue I noticed is in the test cases, and can be addressed at any time (also later): setup_call_cleanup/3 may be useful to reliably close started processes:
>
> setup_call_cleanup(process_create(Echo, [Content], [stdout(pipe(Out))]), ..., close(Out))
>
> It doesn't matter much for the tests themselves, but it may be valuable to use and see this pattern more.

@triska c16dec0

triska · 2025-10-26T06:21:09Z

Thank you! The explicit module qualifications should not be necessary!

jjtolton · 2025-10-26T06:23:38Z

I agree they should not be, but unfortunately they are. Perhaps not for the external calls to iso_ext:setup_call_cleanup/3, but the arguments to it must be module qualified, otherwise they will error as undefined.

jjtolton · 2025-10-26T06:25:04Z

Once we have dyadic quads, the current approach along with explicit qualification will no longer be necessary.

triska · 2025-10-26T06:56:06Z

I now see the terminology "partial data" is still present: In the context of Prolog, "partial" is associated with "partially instantiated", from the documentation I would have thought the call may now yield partially instantiated lists. But apparently that is not the case? It only may yield fewer chars than asked for, if a timeout occurs, is that correct?

triska · 2025-10-26T07:02:28Z

And more importantly:

> returns partial data (distinguishable from EOF).

On EOF, we get [], and on timeout, we may also get [], so how are the cases distinguishable? From the documentation, I thought that the timeout case is distinguished from EOF by a partially instantiated list, or (in case nothing at all arrived within the timeout) with a variable.

triska · 2025-10-26T07:04:21Z

For completeness: We can always manually use at_end_of_stream/1, so get_n_chars/4 itself need not even provide special facilities to distinguish this condition!

jjtolton · 2025-10-26T07:12:58Z

> And more importantly:
>
> returns partial data (distinguishable from EOF).
>
> On EOF, we get [], and on timeout, we may also get [], so how are the cases distinguishable? From the documentation, I thought that the timeout case is distinguished from EOF by a partially instantiated list, or (in case nothing at all arrived within the timeout) with a variable.

I was using the informal sense of the word "partial" here. Thanks for pointing this out, I will fix the terminology to be more precise.

I am open to suggestions on how to differentiate EOF vs timeout.

We could also put an option in the 3rd argument for termination characterization.

triska · 2025-10-26T07:27:51Z

> suggestions on how to differentiate EOF vs timeout.

Regarding the general API, I have now compared this with the Elisp function accept-process-output, which says:

(accept-process-output &optional PROCESS SECONDS MILLISEC JUST-THIS-ONE)

Allow any pending output from subprocesses to be read by Emacs.
It is given to their filter functions.
...

Optional second argument SECONDS and third argument MILLISEC
specify a timeout; return after that much time even if there is
no subprocess output.

I think this is the direction you initially envisaged: That the timeout only plays a role if there is nothing to read. On the other hand, if there is data available, then why not read and yield it?

In our case, both semantics do not mesh well with the N argument of get_n_chars/4. So maybe a slightly different API would be useful for reading with timeouts, analogous to Elisp's accept-process-output?

Maybe: get_chars_timeout(+Stream, -Chars, +Timeout): Yield as many Chars as can be read from Stream within Timeout?

jjtolton · 2025-10-26T12:37:46Z

This is true, and that was a thought I had. However the immediate reason that comes to mind is a poorly behaved process could flood the server.

What about a get_n_chars/5 where the final argument is EOF or timeout?

triska · 2025-10-26T12:53:50Z

> However the immediate reason that comes to mind is a poorly behaved process could flood the server.

Good point! What I was thinking about initially is an application with real-time requirements, such as audio processing, where another thing must be handled after a number of milliseconds.

Therefore, yes, good point, for these reasons I think it makes sense to honor the timeout in all situations.

As mentioned, I think there is no need to indicate EOF within this predicate. When needed, that can be tested completely separately, with the standard predicate at_end_of_stream/1. Note that the existing predicate get_n_chars/3 also does not indicate EOF.

I think it only remains to document the arguments correctly.

Implements non-blocking I/O for get_n_chars with configurable timeout, addressing GitHub discussion mthom#3035. The timeout parameter enables reading from TCP sockets and process pipes without indefinite blocking.

Key features:
- New get_n_chars/4 predicate with timeout parameter (milliseconds)
- Variable N support: when unbound, N unifies with actual chars read
- Timeout behavior: returns partial data, distinct from EOF
- Stream remains usable after timeout (no EOF marking)
- Native OS timeouts: TCP uses set_read_timeout(), pipes use poll()
- Backward compatible: existing get_n_chars/3 unchanged

Implementation:
- Added GetNCharsWithTimeout instruction (arity 4)
- Separate dispatch handlers for 3-arg and 4-arg versions
- Stream methods: set_read_timeout() and poll_read_ready()
- Direct byte reading with elapsed time tracking for pipe timeouts
- Proper handling of all variable types (Var, AttrVar, StackVar)

Timeout values:
- Integer: timeout in milliseconds
- 0 or 'nonblock': minimal timeout (1ms)
- 'infinity' or 'inf': no timeout (blocks indefinitely)
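How a timed-out read differs from EOF at the OS level can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `ReadOutcome`, `read_once`, and the `Stalled` test double are hypothetical names. The only library facts used are that `Read::read` returns `Ok(0)` at end of input, and that a reader with a read timeout set (for example via `TcpStream::set_read_timeout`) reports an expired timeout as an error of kind `WouldBlock` or `TimedOut`, depending on the platform.

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical classification of a single read attempt.
#[derive(Debug, PartialEq)]
enum ReadOutcome {
    Data(Vec<u8>),
    Eof,
    TimedOut,
}

/// Performs one read of at most `max` bytes and classifies the result.
fn read_once<R: Read>(reader: &mut R, max: usize) -> io::Result<ReadOutcome> {
    let mut buf = vec![0u8; max];
    match reader.read(&mut buf) {
        // Zero bytes for a non-empty request signals end of stream.
        Ok(0) if max > 0 => Ok(ReadOutcome::Eof),
        Ok(n) => {
            buf.truncate(n);
            Ok(ReadOutcome::Data(buf))
        }
        // An expired read timeout surfaces as one of these two kinds.
        Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => {
            Ok(ReadOutcome::TimedOut)
        }
        Err(e) => Err(e),
    }
}

/// Test double standing in for a socket whose read timeout expired.
struct Stalled;

impl Read for Stalled {
    fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> {
        Err(io::Error::new(ErrorKind::WouldBlock, "no data before the timeout"))
    }
}
```

Because a timeout is reported as an error kind and not as a zero-length read, the stream does not have to be marked as ended and can be read again later.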

Changes:
- Modified timeout=0 to mean "no timeout" (was 1ms)
- Fixed critical buffering bug where CharReader buffering prevented reading all immediately available data with timeout
- Updated documentation for get_n_chars/4 timeout behavior
- Created comprehensive test suite with 10 passing tests

This is work in progress - tests need to be migrated to follow the testing guide structure (src/tests/*.pl and CLI tests).

Moved tests from standalone file to proper testing framework:
- Created src/tests/get_n_chars.pl using test_framework module
- Created CLI test at tests/scryer/cli/src_tests/get_n_chars.toml
- Removed old tests_get_n_chars.pl

All 10 tests pass:
1. timeout=0 equals get_n_chars/3
2. Variable N with timeout=0
3. Negative timeout equals no timeout
4. Positive timeout stops reading
5. Infinity atom means no timeout
6. Stream usable after timeout
7. Timeout returns partial data not EOF
8. Multiple reads with timeout=0
9. Read more than available with timeout=0
10. Variable N unifies with actual count

Tests now follow the three-layer approach from TESTING_GUIDE.md:
- Layer 2: Prolog integration tests (src/tests/get_n_chars.pl)
- Layer 3: CLI tests (tests/scryer/cli/src_tests/get_n_chars.toml)

When get_n_chars/4 timed out mid-character (e.g., after reading the first byte of a 4-byte UTF-8 character), subsequent stream operations would fail with syntax_error(invalid_data) because CharReader wasn't aware of the incomplete bytes saved in the incomplete_utf8 buffer.

Key changes:
- Added CharReader::prepend_bytes() to inject incomplete UTF-8 bytes into CharReader's buffer before reading new data
- Created specialized StreamLayout<CharReader<PipeReader>> impl that loads incomplete_utf8 bytes before any read operation
- Ensures incomplete_utf8 buffer is cleared after loading to prevent double-reading

Fixed operations after timeout mid-character:
- get_char/2, peek_char/2: correctly read/peek the complete character
- get_code/2, peek_code/2: correctly return the character's code point
- get_n_chars/3: continues reading from incomplete character
- read_term/3: works after consuming incomplete character
- get_line_to_chars/3: includes incomplete character in line
- Sequential timeouts: handles multiple incomplete UTF-8 sequences

Tests added:
- src/tests/incomplete_utf8.pl: 8 comprehensive tests using test_framework, covering all major stream reading predicates
- tests/scryer/cli/src_tests/incomplete_utf8.toml: CLI test config

All tests pass, verifying that stream operations correctly handle incomplete UTF-8 sequences left by get_n_chars/4 timeouts.
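The idea of holding incomplete bytes back and prepending them to the next read can be sketched in isolation. This is a hypothetical, simplified model, not the `CharReader::prepend_bytes()` code from the commit: the type `PendingUtf8` and its `feed` method are invented for illustration and use only standard-library calls.

```rust
/// Hypothetical decoder that carries an incomplete trailing UTF-8 sequence
/// from one read to the next (a simplified stand-in for the mechanism the
/// commit describes, not the actual implementation).
#[derive(Default)]
struct PendingUtf8 {
    pending: Vec<u8>,
}

impl PendingUtf8 {
    /// Prepends bytes held over from the previous call, returns all complete
    /// characters, and keeps any incomplete trailing sequence for next time.
    /// On invalid (not merely incomplete) input the buffered bytes are dropped.
    fn feed(&mut self, chunk: &[u8]) -> Result<String, std::str::Utf8Error> {
        // Taking `pending` also clears it, so held-over bytes are used once.
        let mut buf = std::mem::take(&mut self.pending);
        buf.extend_from_slice(chunk);
        let valid_len = match std::str::from_utf8(&buf) {
            Ok(_) => buf.len(),
            // `error_len() == None`: the buffer ends mid-character.
            Err(e) if e.error_len().is_none() => e.valid_up_to(),
            Err(e) => return Err(e),
        };
        // Hold back the incomplete tail; hand out only whole characters.
        self.pending = buf.split_off(valid_len);
        Ok(String::from_utf8(buf).expect("prefix was validated above"))
    }
}
```

Feeding the first byte of '💜' returns an empty string, and feeding the remaining three bytes afterwards returns the complete character, which mirrors the behavior the tests in incomplete_utf8.pl check at the Prolog level.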

@triska

Replace unification (=) with structural equality (==) for stronger test assertions. The == operator ensures variables are already bound to the expected values, preventing false positives from unification. As suggested by @triska in PR review, this catches potential bugs where predicates might incorrectly leave variables unbound.

Changed assertions:
- Character comparisons: C1 = '💜' → C1 == '💜'
- List comparisons: Chars1 = [] → Chars1 == []
- Kept numeric comparisons as =:= (already correct)

@triska

Replace list notation with double-quote string notation for cleaner, more idiomatic test assertions.

Changes:
- Line == ['💜',t,e,s,t,'\n'] → Line == "💜test\n"
- Chars2 == ['💜', 'A'] → Chars2 == "💜A"

As suggested by @triska in PR review.

Wrap all process_create calls with setup_call_cleanup/3 to ensure reliable stream cleanup even if tests fail.

Changes:
- Added library(iso_ext) import for setup_call_cleanup/3
- Wrapped all 8 tests in incomplete_utf8.pl with setup_call_cleanup
- Wrapped all 11 tests in get_n_chars.pl with setup_call_cleanup
- Tests with multiple streams use nested setup_call_cleanup

Pattern used:

  setup_call_cleanup(
      process_create(..., [stdout(pipe(Out))]),
      ( ... test body ... ),
      close(Out)
  )

Note: There appears to be a module loading issue when running these tests that needs investigation. The tests load correctly before this change, suggesting the iso_ext import may conflict with the test framework's module system.

Apply module qualifications to all imported predicates in test files to work around testing framework idiosyncrasy where importing iso_ext causes other modules to become unavailable.

Changes:
- Qualify setup_call_cleanup with iso_ext:
- Qualify process_create with process:
- Qualify get_n_chars, get_char, peek_char, get_code, peek_code, and get_line_to_chars with charsio:
- Qualify length with lists:

Adds 7 additional tests for nonblock functionality:
- nonblock with fixed N limit
- nonblock with variable N reads all available
- nonblock returns empty when no data ready
- nonblock vs timeout returns immediately
- sequential nonblock reads drain buffer
- nonblock with slow data returns partial snapshot
- variable N: timeout waits, nonblock returns immediately

Total test coverage now:
- 18 tests in get_n_chars.pl (including nonblock)
- 8 tests in incomplete_utf8.pl (all predicates)
- 26 total tests for complete coverage
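Read together, the commit messages suggest the following mapping from the Timeout argument to waiting behavior: 0, negative integers, and infinity/inf wait indefinitely, positive integers wait up to that many milliseconds, and nonblock returns immediately. The sketch below encodes that reading. It is an interpretation of the commit text, not the PR's code, and `TimeoutSpec`, `Wait`, and `resolve` are hypothetical names.

```rust
use std::time::Duration;

/// Hypothetical model of the Timeout argument of get_n_chars/4.
#[derive(Debug, PartialEq)]
enum TimeoutSpec {
    Millis(i64),
    Infinity, // the atoms infinity / inf
    Nonblock, // the atom nonblock
}

/// How long a read is allowed to wait for data.
#[derive(Debug, PartialEq)]
enum Wait {
    Forever,
    NoWait,
    UpTo(Duration),
}

/// Assumed semantics, taken from the commit messages and test names:
/// 0 and negative values behave like get_n_chars/3 (no timeout).
fn resolve(spec: &TimeoutSpec) -> Wait {
    match spec {
        TimeoutSpec::Infinity => Wait::Forever,
        TimeoutSpec::Nonblock => Wait::NoWait,
        TimeoutSpec::Millis(ms) if *ms <= 0 => Wait::Forever,
        TimeoutSpec::Millis(ms) => Wait::UpTo(Duration::from_millis(*ms as u64)),
    }
}
```

Keeping "no timeout" as its own case fits Rust's `TcpStream::set_read_timeout`, which takes `None` for an indefinite wait and rejects a zero `Duration` with an error.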

Add test functions to tests/scryer/src_tests.rs to enable running the get_n_chars/4 and incomplete UTF-8 handling tests via cargo test.

Tests:
- get_n_chars: 18 tests covering timeout, nonblock, and variable N
- incomplete_utf8: 8 tests covering all predicates with partial UTF-8

Both test suites pass successfully.

When multiple process_create calls are needed in a test, each stream must have its own setup_call_cleanup wrapper. Previously used a single setup_call_cleanup with compound setup and cleanup goals, which is incorrect.

Changed from:

  setup_call_cleanup(
      (create1, create2),
      test_body,
      (close2, close1)
  )

To proper nesting:

  setup_call_cleanup(
      create1,
      setup_call_cleanup(
          create2,
          test_body,
          close2
      ),
      close1
  )

This ensures proper cleanup order even if tests fail.

jjtolton · 2025-10-26T18:09:17Z

Ok I think I unbuggered everything after a failed rebase, could use another set of eyes

triska · 2025-10-26T19:34:09Z

src/lib/charsio.pl

+%   - nonblock (atom for minimal non-blocking behavior)
+%
+% Returns whatever data is available within the timeout period.
+% On timeout, returns partial data (distinguishable from EOF).


"Partial data" in the sense that it may be less than N chars?

jjtolton force-pushed the discussion-3035 branch 2 times, most recently from c70db3b to 8243268 Compare October 25, 2025 04:30

triska reviewed Oct 25, 2025


jjtolton marked this pull request as ready for review October 25, 2025 21:29

jjtolton force-pushed the discussion-3035 branch from 35bec05 to c16dec0 Compare October 25, 2025 21:31

jjtolton force-pushed the discussion-3035 branch 2 times, most recently from 8b647d4 to e6e3105 Compare October 26, 2025 15:23

jjtolton marked this pull request as draft October 26, 2025 15:47

jjtolton added 11 commits October 26, 2025 13:19

jjtolton force-pushed the discussion-3035 branch from e6e3105 to 9b28c5c Compare October 26, 2025 18:05

jjtolton marked this pull request as ready for review October 26, 2025 18:08

triska reviewed Oct 26, 2025

