Conversation

@jjtolton

@jjtolton jjtolton commented Oct 25, 2025

Summary

This PR implements timeout support for get_n_chars/4, enabling non-blocking character reading from TCP sockets and process pipes as requested in #3035.

Details

Usage Examples

% Read with 5 second timeout
?- tcp_open_socket(Host, Port, Stream),
   get_n_chars(Stream, 1024, Request, 5000),
   process_request(Request).

% Read what's available in 1 second (Variable N)
?- process_create(Cmd, Args, [stdout(pipe(Out))]),
   get_n_chars(Out, N, Data, 1000),
   format("Got ~d bytes~n", [N]).

% Wait indefinitely (like get_n_chars/3)
?- get_n_chars(Stream, 100, Chars, 0).

Testing

# Run Prolog tests
./target/release/scryer-prolog -f --no-add-history src/tests/get_n_chars.pl \
  -f -g "use_module(library(get_n_chars_tests)), get_n_chars_tests:main_quiet(get_n_chars_tests)"

# Run CLI tests
TRYCMD=dump cargo test --test scryer get_n_chars

@jjtolton jjtolton force-pushed the discussion-3035 branch 2 times, most recently from c70db3b to 8243268 Compare October 25, 2025 04:30
% - infinity/inf (no timeout, blocks indefinitely)
%
% Returns whatever data is available within the timeout period.
% On timeout, returns partial data (distinguishable from EOF).
Contributor

A partial string?

@triska
Contributor

triska commented Oct 25, 2025

This feature would be extremely valuable, thank you a lot for working on this!

I have one question: What if the timeout occurs between two bytes of a multi-octet UTF-8 sequence? For instance, if the file or stream content (eventually) is "\xf0\\x9f\\x92\\x9c\", which is the UTF-8 encoding of a single character, is the read always "atomic" in the sense that it reliably reads a full character, if any, also if a timeout occurs?

If possible, could you please consider adding a test for this? Thank you a lot!

@triska
Contributor

triska commented Oct 25, 2025

Regarding the message format("Got ~d bytes~n", [N]).: It should be chars, shouldn't it?

@jjtolton
Author

> This feature would be extremely valuable, thank you a lot for working on this!
>
> I have one question: What if the timeout occurs between two bytes of a multi-octet UTF-8 sequence? For instance, if the file or stream content (eventually) is "\xf0\\x9f\\x92\\x9c\", which is the UTF-8 encoding of a single character, is the read always "atomic" in the sense that it reliably reads a full character, if any, also if a timeout occurs?
>
> If possible, could you please consider adding a test for this? Thank you a lot!

Great question. I have a thought and a question. My thought is that my usecase for this was to prevent blocking on idle connections, not active ones. So, it would be easy enough to make the default behavior continue to consume while there is data available.

Of course, that could interfere with cooperative concurrency, so if we were to enforce a cutoff, what should the behavior be?

should I push the half char back onto the stream?

@triska
Contributor

triska commented Oct 25, 2025

Personally, if I set a timeout, I would expect the system to respect it in all cases, and to read as much as possible within the timeout.

In addition, I would expect the API to be implemented in such a way that it reliably yields correct results. In the case of get_n_chars/4, this means that I expect to get what the sender transmitted.

For instance, if the sender sent the character '💜', then I would not expect to receive ['\xF0\'|Rs], because '\xf0\' is a different character (even though its byte value is part of the prefix of the UTF-8 encoding of '💜').

@jjtolton
Author

> Personally, if I set a timeout, I would expect the system to respect it in all cases, and to read as much as possible within the timeout.
>
> In addition, I would expect the API to be implemented in such a way that it reliably yields correct results. In the case of get_n_chars/4, this means that I expect to get what the sender transmitted.
>
> For instance, if the sender sent the character '💜', then I would not expect to receive ['\xF0\'|Rs], because '\xf0\' is a different character (even though its byte value is part of the prefix of the UTF-8 encoding of '💜').

I don't think this is even theoretically possible though, is it? I need to think this through a bit. It seems entirely possible to me that there are a number of conditions under which a half char would be received when a full char was sent.

So I think maybe the question resolves to, "how do we know what the sender's intent was?"

I suppose there are two situations that come to mind, but I'm sure the list is not exhaustive.

  1. A transmission is "stalled" on a half char and THEN the timeout occurs
  2. The timeout occurs on a half char

In situation 1, it is unclear whether waiting longer would yield additional information.
In situation 2, it seems that truncating the half char would be correct but possibly incomplete?

@triska
Contributor

triska commented Oct 25, 2025

> I don't think this is even theoretically possible though, is it?

On the byte-level, it is definitely possible to detect whether the prefix of a UTF-8 encoded character is encountered! In the case of get_n_chars/4, this detection cannot happen on the Prolog level, because this predicate yields chars, i.e., it presumes completely received characters.
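The byte-level detection described here can be sketched in Rust, the language the commit messages in this PR refer to. This is only an illustration under stated assumptions: `split_incomplete_utf8` is a hypothetical name, not a function from this PR. It relies on the standard library's `Utf8Error::error_len()`, which returns `None` when the input ended in the middle of a multi-byte sequence, and `Utf8Error::valid_up_to()`, which gives the length of the valid prefix.

```rust
use std::str;

/// Splits `bytes` into its longest valid UTF-8 prefix and a trailing,
/// still-incomplete multi-byte sequence (empty if the input ends on a
/// character boundary). Bytes that can never become valid are an error.
/// Hypothetical helper, for illustration only.
fn split_incomplete_utf8(bytes: &[u8]) -> Result<(&str, &[u8]), str::Utf8Error> {
    match str::from_utf8(bytes) {
        // Everything decoded: there are no pending bytes.
        Ok(text) => Ok((text, &bytes[bytes.len()..])),
        // `error_len() == None` means the input ended mid-character.
        Err(err) if err.error_len().is_none() => {
            let (valid, pending) = bytes.split_at(err.valid_up_to());
            let text = str::from_utf8(valid).expect("prefix was already validated");
            Ok((text, pending))
        }
        // A byte sequence that is invalid no matter what follows.
        Err(err) => Err(err),
    }
}
```

With only the first byte of '💜' (0xF0) available, the sketch yields an empty string plus one pending byte, rather than reporting a character that was never sent.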

@jjtolton
Author

Ah, of course. This is in charsio, we are only interested in full characters. So it would never make sense to receive a half character.

@triska
Contributor

triska commented Oct 25, 2025

Also, only situation 2 matters: The timeout occurs on a half char.

In that case, the implementation must take care to ensure that I can (later) receive the full character (of course presuming it ever arrives).

What must never happen is that the predicate tells us: "This is a character that was received", when that character was not sent.

@jjtolton
Author

This is going to be a somewhat expanded footprint (esp. with regard to tests) in order to maintain sequential serialization of reads and to keep existing predicates from breaking when they try to read from a half char stuck in a stream.

@triska
Contributor

triska commented Oct 25, 2025

If this cannot be supported (yet) for text streams, then one option may be to make timeouts only work for binary streams? For binary streams, bytes can be read individually, and there is no risk of breaking up a byte incorrectly.

The predicate could throw an error if it is given a stream for which it cannot reliably perform the operation.

Maybe a permission_error on the grounds that it is not allowed to read input from a text stream with a timeout?

  e) There shall be a Permission Error when it is not
  permitted to perform a specific operation. It has the form
  permission_error(Operation, PermissionType,
  Culprit) where

     Operation ∈ {
       access,
       create,
       input,
       modify,
       open,
       output,
       reposition
     },

get_n_chars(Out, _, Chars1, 100),
Chars1 = [],
get_char(Out, C1),
C1 = '💜',
Contributor

A stronger test would be using (==)/2 here, it ensures that the terms are really identical.

(In a wrong implementation of get_char/2, C1 may be a variable after success, and the unification then succeeds.)

get_n_chars(Out, _, Chars1, 100),
Chars1 = [],
get_code(Out, Code),
Code =:= 128156,
Contributor

(For comparison to the above, (=:=)/2 here is a good strong test, it requires that Code be ground.)

get_n_chars(Out, _, Chars1, 100),
Chars1 = [],
get_line_to_chars(Out, Line, []),
Line = ['💜',t,e,s,t,'\n'],
Contributor

We have a shorter notation for strings, using double quotes.

@triska
Contributor

triska commented Oct 25, 2025

I see you now already implemented it also for text streams! Incredible contribution, thank you a lot!

If this works correctly, then it will tremendously increase Scryer Prolog's application opportunities, notably for hosting sophisticated web services.

@jjtolton
Author

jjtolton commented Oct 25, 2025

> I see you now already implemented it also for text streams! Incredible contribution, thank you a lot!
>
> If this works correctly, then it will tremendously increase Scryer Prolog's application opportunities, notably for hosting sophisticated web services.

🤞 🤞 🤞

and maybe even web actors 🤫 🤫 🤫

@triska
Contributor

triska commented Oct 25, 2025

It looks very good!

The only remaining small issue I noticed is in the test cases, and can be addressed at any time (also later): setup_call_cleanup/3 may be useful to reliably close started processes:

setup_call_cleanup(process_create(Echo, [Content], [stdout(pipe(Out))]), ..., close(Out))

It doesn't matter much for the tests themselves, but it may be valuable to use and see this pattern more.

@jjtolton
Author

> It looks very good!
>
> The only remaining small issue I noticed is in the test cases, and can be addressed at any time (also later): setup_call_cleanup/3 may be useful to reliably close started processes:
>
> setup_call_cleanup(process_create(Echo, [Content], [stdout(pipe(Out))]), ..., close(Out))
>
> It doesn't matter much for the tests themselves, but it may be valuable to use and see this pattern more.

In the test cases or the implementation?

@jjtolton jjtolton marked this pull request as ready for review October 25, 2025 21:29
@jjtolton
Author

jjtolton commented Oct 25, 2025

> It looks very good!
>
> The only remaining small issue I noticed is in the test cases, and can be addressed at any time (also later): setup_call_cleanup/3 may be useful to reliably close started processes:
>
> setup_call_cleanup(process_create(Echo, [Content], [stdout(pipe(Out))]), ..., close(Out))
>
> It doesn't matter much for the tests themselves, but it may be valuable to use and see this pattern more.

@triska c16dec0

@triska
Contributor

triska commented Oct 26, 2025

Thank you! The explicit module qualifications should not be necessary!

@jjtolton
Author

I agree they should not be, but unfortunately they are. Perhaps not for the external calls to iso_ext:setup_call_cleanup/3, but the arguments to it must be module-qualified; otherwise they will error as undefined.

@jjtolton
Author

Once we have dyadic quads, the current approach along with explicit qualification will no longer be necessary.

@triska
Copy link
Contributor

triska commented Oct 26, 2025

I now see the terminology "partial data" is still present. In the context of Prolog, "partial" is associated with "partially instantiated", so from the documentation I would have thought the call may now yield partially instantiated lists. But apparently that is not the case? It may only yield fewer chars than asked for if a timeout occurs, is that correct?

@triska
Copy link
Contributor

triska commented Oct 26, 2025

And more importantly:

> returns partial data (distinguishable from EOF).

On EOF, we get [], and on timeout, we may also get [], so how are the cases distinguishable? From the documentation, I thought that the timeout case is distinguished from EOF by a partially instantiated list, or (in case nothing at all arrived within the timeout) with a variable.

@triska
Copy link
Contributor

triska commented Oct 26, 2025

For completeness: We can always manually use at_end_of_stream/1, so get_n_chars/4 itself need not even provide special facilities to distinguish this condition!

@jjtolton
Author

jjtolton commented Oct 26, 2025

> And more importantly:
>
> > returns partial data (distinguishable from EOF).
>
> On EOF, we get [], and on timeout, we may also get [], so how are the cases distinguishable? From the documentation, I thought that the timeout case is distinguished from EOF by a partially instantiated list, or (in case nothing at all arrived within the timeout) with a variable.

I was using the informal sense of the word "partial" here. Thanks for pointing this out, I will fix the terminology to be more precise.

I am open to suggestions on how to differentiate EOF vs timeout.

We could also put an option in the 3rd argument for termination characterization.

@triska
Contributor

triska commented Oct 26, 2025

> suggestions on how to differentiate EOF vs timeout.

Regarding the general API, I have now compared this with the Elisp function accept-process-output, which says:

(accept-process-output &optional PROCESS SECONDS MILLISEC JUST-THIS-ONE)

Allow any pending output from subprocesses to be read by Emacs.
It is given to their filter functions.
...

Optional second argument SECONDS and third argument MILLISEC
specify a timeout; return after that much time even if there is
no subprocess output. 

I think this is the direction you initially envisaged: That the timeout only plays a role if there is nothing to read. On the other hand, if there is data available, then why not read and yield it?

In our case, both semantics do not mesh well with the N argument of get_n_chars/4. So maybe a slightly different API would be useful for reading with timeouts, analogous to Elisp's accept-process-output?

Maybe: get_chars_timeout(+Stream, -Chars, +Timeout): Yield as many Chars as can be read from Stream within Timeout?

@jjtolton
Author

This is true, and that was a thought I had. However, the immediate concern that comes to mind is that a poorly behaved process could flood the server.

What about a get_n_chars/5 where the final argument is EOF or timeout?

@triska
Contributor

triska commented Oct 26, 2025

> However, the immediate concern that comes to mind is that a poorly behaved process could flood the server.

Good point! What I was thinking about initially is an application with real-time requirements, such as audio processing, where another thing must be handled after a number of milliseconds.

Therefore, yes, good point, for these reasons I think it makes sense to honor the timeout in all situations.

As mentioned, I think there is no need to indicate EOF within this predicate. When needed, that can be tested completely separately, with the standard predicate at_end_of_stream/1. Note that the existing predicate get_n_chars/3 also does not indicate EOF.

I think it only remains to document the arguments correctly.
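For readers wondering how EOF and timeout differ below the Prolog level, here is a minimal Rust sketch. It is not code from this PR: `ReadOutcome` and `read_once` are hypothetical names. It relies only on documented standard-library behavior: `Read::read` returns `Ok(0)` at end of stream (for a non-empty buffer), while a read on a socket with a read timeout set fails with `ErrorKind::WouldBlock` or `ErrorKind::TimedOut`, depending on the platform.

```rust
use std::io::{self, ErrorKind, Read};

/// Possible results of a single read attempt (hypothetical type).
#[derive(Debug, PartialEq)]
enum ReadOutcome {
    /// `n` bytes were read into the buffer.
    Data(usize),
    /// The stream has ended; no more data will ever arrive.
    Eof,
    /// Nothing arrived in time; the stream is still usable.
    TimedOut,
}

/// Performs one read and classifies the result (hypothetical helper).
fn read_once<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<ReadOutcome> {
    match reader.read(buf) {
        // Zero bytes into a non-empty buffer signals end of stream.
        Ok(0) if !buf.is_empty() => Ok(ReadOutcome::Eof),
        Ok(n) => Ok(ReadOutcome::Data(n)),
        // Unix typically reports WouldBlock on a timed-out read, Windows TimedOut.
        Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => {
            Ok(ReadOutcome::TimedOut)
        }
        Err(e) => Err(e),
    }
}
```

At the Prolog level both cases can surface as an empty list, which is why the thread settles on checking at_end_of_stream/1 separately instead of encoding the distinction in the result of get_n_chars/4.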

@jjtolton jjtolton force-pushed the discussion-3035 branch 2 times, most recently from 8b647d4 to e6e3105 Compare October 26, 2025 15:23
@jjtolton jjtolton marked this pull request as draft October 26, 2025 15:47

Implements non-blocking I/O for get_n_chars with configurable timeout,
addressing GitHub discussion mthom#3035. The timeout parameter enables reading
from TCP sockets and process pipes without indefinite blocking.
Key features:
- New get_n_chars/4 predicate with timeout parameter (milliseconds)
- Variable N support: when unbound, N unifies with actual chars read
- Timeout behavior: returns partial data, distinct from EOF
- Stream remains usable after timeout (no EOF marking)
- Native OS timeouts: TCP uses set_read_timeout(), pipes use poll()
- Backward compatible: existing get_n_chars/3 unchanged
Implementation:
- Added GetNCharsWithTimeout instruction (arity 4)
- Separate dispatch handlers for 3-arg and 4-arg versions
- Stream methods: set_read_timeout() and poll_read_ready()
- Direct byte reading with elapsed time tracking for pipe timeouts
- Proper handling of all variable types (Var, AttrVar, StackVar)
Timeout values:
- Integer: timeout in milliseconds
- 0 or 'nonblock': minimal timeout (1ms)
- 'infinity' or 'inf': no timeout (blocks indefinitely)

Changes:
- Modified timeout=0 to mean "no timeout" (was 1ms)
- Fixed critical buffering bug where CharReader buffering prevented
  reading all immediately available data with timeout
- Updated documentation for get_n_chars/4 timeout behavior
- Created comprehensive test suite with 10 passing tests
This is work in progress - tests need to be migrated to follow
the testing guide structure (src/tests/*.pl and CLI tests).

Moved tests from standalone file to proper testing framework:
- Created src/tests/get_n_chars.pl using test_framework module
- Created CLI test at tests/scryer/cli/src_tests/get_n_chars.toml
- Removed old tests_get_n_chars.pl
All 10 tests pass:
1. timeout=0 equals get_n_chars/3
2. Variable N with timeout=0
3. Negative timeout equals no timeout
4. Positive timeout stops reading
5. Infinity atom means no timeout
6. Stream usable after timeout
7. Timeout returns partial data not EOF
8. Multiple reads with timeout=0
9. Read more than available with timeout=0
10. Variable N unifies with actual count
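The timeout semantics exercised by the tests listed above can be summarized in a small Rust sketch. Everything here is an assumption made for illustration: the `Timeout` enum, `parse_timeout`, and `to_read_timeout` are hypothetical names, the argument is modeled as a plain string instead of a Prolog term, and the 1 ms value for `nonblock` is taken from the "minimal timeout (1ms)" wording in the commit message. The result type `Option<Duration>` matches what `TcpStream::set_read_timeout` accepts, where `None` means block indefinitely and a zero `Duration` is rejected with an error.

```rust
use std::time::Duration;

/// Hypothetical model of the timeout argument of get_n_chars/4.
#[derive(Debug, PartialEq)]
enum Timeout {
    /// Block until data or EOF arrives (0, negative, infinity, inf).
    Infinite,
    /// Return almost immediately with whatever is available.
    NonBlock,
    /// Wait at most this many milliseconds.
    Millis(u64),
}

/// Parses a textual timeout; returns `None` for unrecognized input.
fn parse_timeout(arg: &str) -> Option<Timeout> {
    match arg {
        "infinity" | "inf" => Some(Timeout::Infinite),
        "nonblock" => Some(Timeout::NonBlock),
        other => {
            let ms: i64 = other.parse().ok()?;
            if ms <= 0 {
                // Per the test list: 0 and negative values mean "no timeout".
                Some(Timeout::Infinite)
            } else {
                Some(Timeout::Millis(ms as u64))
            }
        }
    }
}

/// Maps a timeout to the `Option<Duration>` form used by socket APIs.
fn to_read_timeout(timeout: &Timeout) -> Option<Duration> {
    match timeout {
        Timeout::Infinite => None,
        // Assumed 1 ms floor, since a zero Duration is not accepted.
        Timeout::NonBlock => Some(Duration::from_millis(1)),
        Timeout::Millis(ms) => Some(Duration::from_millis(*ms)),
    }
}
```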
Tests now follow the three-layer approach from TESTING_GUIDE.md:
- Layer 2: Prolog integration tests (src/tests/get_n_chars.pl)
- Layer 3: CLI tests (tests/scryer/cli/src_tests/get_n_chars.toml)

When get_n_chars/4 timed out mid-character (e.g., after reading the
first byte of a 4-byte UTF-8 character), subsequent stream operations
would fail with syntax_error(invalid_data) because CharReader wasn't
aware of the incomplete bytes saved in the incomplete_utf8 buffer.

Key changes:
- Added CharReader::prepend_bytes() to inject incomplete UTF-8 bytes
  into CharReader's buffer before reading new data
- Created specialized StreamLayout<CharReader<PipeReader>> impl that
  loads incomplete_utf8 bytes before any read operation
- Ensures incomplete_utf8 buffer is cleared after loading to prevent
  double-reading
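The idea behind these changes, carrying the bytes of an unfinished character over to the next read and clearing them once consumed, can be sketched as follows. This is a simplified illustration and not the PR's code: `Utf8Carry` and `feed` are hypothetical names and do not reproduce `CharReader::prepend_bytes()` or the `StreamLayout` specialization.

```rust
use std::str;

/// Hypothetical decoder that keeps the bytes of an unfinished UTF-8
/// character between reads, so that a later read can complete it.
#[derive(Default)]
struct Utf8Carry {
    pending: Vec<u8>,
}

impl Utf8Carry {
    /// Decodes `chunk`, prepending bytes left over from the previous call.
    /// Only complete characters are returned; an unfinished tail is saved.
    fn feed(&mut self, chunk: &[u8]) -> Result<String, str::Utf8Error> {
        // Take the saved bytes, which also clears them so they are
        // never read twice.
        let mut buf = std::mem::take(&mut self.pending);
        buf.extend_from_slice(chunk);
        match str::from_utf8(&buf) {
            Ok(text) => Ok(text.to_owned()),
            // The input ended mid-character: keep the tail for later.
            Err(err) if err.error_len().is_none() => {
                self.pending = buf.split_off(err.valid_up_to());
                Ok(String::from_utf8(buf).expect("prefix was already validated"))
            }
            // Invalid bytes that no further input can repair.
            Err(err) => Err(err),
        }
    }
}
```

Feeding the first byte of '💜' returns an empty string and stores 0xF0; feeding the remaining three bytes afterwards returns the complete character.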

Fixed operations after timeout mid-character:
- get_char/2, peek_char/2: correctly read/peek the complete character
- get_code/2, peek_code/2: correctly return the character's code point
- get_n_chars/3: continues reading from incomplete character
- read_term/3: works after consuming incomplete character
- get_line_to_chars/3: includes incomplete character in line
- Sequential timeouts: handles multiple incomplete UTF-8 sequences

Tests added:
- src/tests/incomplete_utf8.pl: 8 comprehensive tests using
  test_framework, covering all major stream reading predicates
- tests/scryer/cli/src_tests/incomplete_utf8.toml: CLI test config

All tests pass, verifying that stream operations correctly handle
incomplete UTF-8 sequences left by get_n_chars/4 timeouts.

Replace unification (=) with structural equality (==) for stronger
test assertions. The == operator ensures variables are already bound
to the expected values, preventing false positives from unification.

As suggested by @triska in PR review, this catches potential bugs
where predicates might incorrectly leave variables unbound.

Changed assertions:
- Character comparisons: C1 = '💜' → C1 == '💜'
- List comparisons: Chars1 = [] → Chars1 == []
- Kept numeric comparisons as =:= (already correct)

Replace list notation with double-quote string notation for cleaner,
more idiomatic test assertions.

Changes:
- Line == ['💜',t,e,s,t,'\n'] → Line == "💜test\n"
- Chars2 == ['💜', 'A'] → Chars2 == "💜A"

As suggested by @triska in PR review.

Wrap all process_create calls with setup_call_cleanup/3 to ensure
reliable stream cleanup even if tests fail.

Changes:
- Added library(iso_ext) import for setup_call_cleanup/3
- Wrapped all 8 tests in incomplete_utf8.pl with setup_call_cleanup
- Wrapped all 11 tests in get_n_chars.pl with setup_call_cleanup
- Tests with multiple streams use nested setup_call_cleanup

Pattern used:
  setup_call_cleanup(
      process_create(..., [stdout(pipe(Out))]),
      ( ... test body ... ),
      close(Out)
  )

Note: There appears to be a module loading issue when running these
tests that needs investigation. The tests load correctly before this
change, suggesting the iso_ext import may conflict with the test
framework's module system.

Apply module qualifications to all imported predicates in test files
to work around a testing framework idiosyncrasy where importing iso_ext
causes other modules to become unavailable.

Changes:
- Qualify setup_call_cleanup with iso_ext:
- Qualify process_create with process:
- Qualify get_n_chars, get_char, peek_char, get_code, peek_code, and
  get_line_to_chars with charsio:
- Qualify length with lists:

Adds 7 additional tests for nonblock functionality:
- nonblock with fixed N limit
- nonblock with variable N reads all available
- nonblock returns empty when no data ready
- nonblock vs timeout returns immediately
- sequential nonblock reads drain buffer
- nonblock with slow data returns partial snapshot
- variable N: timeout waits, nonblock returns immediately

Total test coverage now:
- 18 tests in get_n_chars.pl (including nonblock)
- 8 tests in incomplete_utf8.pl (all predicates)
- 26 total tests for complete coverage

Add test functions to tests/scryer/src_tests.rs to enable running
the get_n_chars/4 and incomplete UTF-8 handling tests via cargo test.

Tests:
- get_n_chars: 18 tests covering timeout, nonblock, and variable N
- incomplete_utf8: 8 tests covering all predicates with partial UTF-8

Both test suites pass successfully.

When multiple process_create calls are needed in a test, each stream
must have its own setup_call_cleanup wrapper. Previously used a single
setup_call_cleanup with compound setup and cleanup goals, which is
incorrect.

Changed from:
  setup_call_cleanup(
      (create1, create2),
      test_body,
      (close2, close1)
  )

To proper nesting:
  setup_call_cleanup(
      create1,
      setup_call_cleanup(
          create2,
          test_body,
          close2
      ),
      close1
  )

This ensures proper cleanup order even if tests fail.

@jjtolton jjtolton marked this pull request as ready for review October 26, 2025 18:08
@jjtolton
Author

Ok I think I unbuggered everything after a failed rebase, could use another set of eyes

% - nonblock (atom for minimal non-blocking behavior)
%
% Returns whatever data is available within the timeout period.
% On timeout, returns partial data (distinguishable from EOF).
Contributor

"Partial data" in the sense that it may be less than N chars?
