Fix edge cases for int() casting of strings #1589

Draft
wants to merge 2 commits into
base: main
6 changes: 6 additions & 0 deletions integration_tests/test_str_to_int.py
@@ -20,4 +20,10 @@ def f():
    i = i32(int(s))
    assert i == -1234

    assert i32(int(" 3 ")) == 3
    assert i32(int("+3")) == 3
    assert i32(int("\n3")) == 3
    assert i32(int("3\n")) == 3
    assert i32(int("\r\t\n3\r\t\n")) == 3
Collaborator

One of these tests raises an AssertionError.
Please check that every case gives the correct output.


f()
18 changes: 16 additions & 2 deletions src/lpython/semantics/python_intrinsic_eval.h
@@ -77,20 +77,34 @@ struct IntrinsicNodeHandler {
char *c = ASR::down_cast<ASR::StringConstant_t>(
ASRUtils::expr_value(arg))->m_s;
int ival = 0;
std::string str;
char *ch = c;
-if (*ch == '-') {
+if (*ch == '-' || *ch == '+') {
ch++;
}
Comment on lines +82 to 84
Collaborator

@ubaidsk ubaidsk Mar 25, 2023

It seems the + and - signs are currently being ignored. We get an error at the following place:

(lp) lpython$ git diff                                    
diff --git a/integration_tests/test_str_to_int.py b/integration_tests/test_str_to_int.py
index 765dd1e0f..b67dcab32 100644
--- a/integration_tests/test_str_to_int.py
+++ b/integration_tests/test_str_to_int.py
@@ -3,21 +3,27 @@ from ltypes import i32
 def f():
     i: i32
     i = i32(int("314"))
+    print(i)
     assert i == 314
     i = i32(int("-314"))
+    print(i)
     assert i == -314
     s: str
     s = "123"
     i = i32(int(s))
+    print(i)
     assert i == 123
     s = "-123"
     i = i32(int(s))
+    print(i)
     assert i == -123
     s = "    1234"
     i = i32(int(s))
+    print(i)
     assert i == 1234
     s = "    -1234   "
     i = i32(int(s))
+    print(i)
     assert i == -1234
 
     assert i32(int("  3   ")) == 3
@@ -25,5 +31,6 @@ def f():
     assert i32(int("\n3")) == 3
     assert i32(int("3\n")) == 3
     assert i32(int("\r\t\n3\r\t\n")) == 3
+    print("Done")
 
 f()
(lp) lpython$ lpython integration_tests/test_str_to_int.py
314
314
AssertionError

Collaborator

Also, since we check for the +/- sign only at the very start of the string, I think the following could fail (it currently does):

(lp) lpython$ python3 examples/expr2.py
-3
3
(lp) lpython$ lpython examples/expr2.py
semantic error: invalid literal for int() with base 10: '    -3    '
 --> examples/expr2.py:4:19
  |
4 |     print(i32(int("    -3    ")))
  |                   ^^^^^^^^^^^^ 


Note: if any of the above error or warning messages are not clear or are lacking
context please report it to us (we consider that a bug that must be fixed).
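
The two reports above (the sign being dropped, and a sign that follows leading whitespace being rejected) both come down to the order of the steps. A minimal sketch of one possible ordering follows: trim whitespace first, then read an optional sign, then validate the digits. The helper name `normalize_int_literal` is hypothetical and not part of LPython, and `std::invalid_argument` stands in for `SemanticError`, which in the real handler also takes a location. It does not try to cover everything CPython's `int()` accepts.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not LPython code): reduces an int() string literal to
// an optional '-' followed by decimal digits, or throws on invalid input.
std::string normalize_int_literal(const std::string &s) {
    const std::string msg =
        "invalid literal for int() with base 10: '" + s + "'";

    // Step 1: skip leading and trailing whitespace (space, \t, \n, \r, ...).
    std::size_t begin = 0, end = s.size();
    while (begin < end && std::isspace(static_cast<unsigned char>(s[begin]))) {
        begin++;
    }
    while (end > begin && std::isspace(static_cast<unsigned char>(s[end - 1]))) {
        end--;
    }

    // Step 2: read an optional sign and keep '-' in the output, so that the
    // later integer conversion still sees it.
    std::string out;
    std::size_t i = begin;
    if (i < end && (s[i] == '-' || s[i] == '+')) {
        if (s[i] == '-') out += '-';
        i++;
    }

    // Step 3: everything that remains must be a decimal digit.
    const std::size_t first_digit = i;
    for (; i < end; i++) {
        if (s[i] < '0' || s[i] > '9') throw std::invalid_argument(msg);
        out += s[i];
    }
    // A literal with no digits at all (for example " " or "-") is invalid.
    if (i == first_digit) throw std::invalid_argument(msg);
    return out;
}
```

Under these assumptions, `std::stoi(normalize_int_literal("    -3    "))` yields -3, and whitespace in the middle of the literal (as in `"- 3"`) is rejected rather than skipped.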

while (*ch) {
if(*ch == ' '){
ch++;
continue;
}
if(*ch == '\\'){
ch++;
if (*ch == 'n' || *ch == 'r' || *ch == 't') {
ch++;
continue;
}
throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc);
}
if (*ch == '.') {
throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc);
}
if (*ch < '0' || *ch > '9') {
throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc);
}
Comment on lines +86 to 103
Collaborator

I think the above could be simplified as follows:

Suggested change
-if(*ch == ' '){
-    ch++;
-    continue;
-}
-if(*ch == '\\'){
-    ch++;
-    if (*ch == 'n' || *ch == 'r' || *ch == 't') {
-        ch++;
-        continue;
-    }
-    throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc);
-}
-if (*ch == '.') {
-    throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc);
-}
-if (*ch < '0' || *ch > '9') {
-    throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc);
-}
+if(*ch == ' ' || *ch == '\n' || *ch == '\r' || *ch == '\t'){
+    ch++;
+    continue;
+}
+if (*ch < '0' || *ch > '9') {
+    throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc);
+}

Collaborator

I think *ch holds the whole escaped character as a single char, so we need not check for \n (or the others) in two steps. The approach above performs the check in one step.

Example:

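#include <iostream>

int main()
{
    // String literals are const in C++, so point at them through const char *.
    // Each escape sequence below is already a single char in the array.
    const char *mystr = "abc def \n\r\t pqr xyz";
    const char *ch = mystr;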
    while (*ch) {
        if (*ch == ' ') {
            // do not print anything
        } else if (*ch == '\n') {
            std::cout << "newline character" << std::endl;
        } else if (*ch == '\r') {
            std::cout << "carriage return" << std::endl;
        } else if (*ch == '\t') {
            std::cout << "tab character" << std::endl;
        } else {
            std::cout << *ch << std::endl;
        }
        ch++;
    }
    return 0;
}

Output:

a
b
c
d
e
f
newline character
carriage return
tab character
p
q
r
x
y
z
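
The same point can be checked by counting characters. The sketch below assumes, as the comment above suggests, that the string reaching the handler already has its escape sequences decoded; the helper name `count_skippable` is hypothetical. It counts the chars that the one-step check would skip, and shows that a literal backslash followed by `n` is two ordinary chars that the check does not treat as whitespace.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts the chars that the suggested one-step
// whitespace check (' ', '\n', '\r', '\t') would skip over.
std::size_t count_skippable(const std::string &s) {
    std::size_t n = 0;
    for (char ch : s) {
        if (ch == ' ' || ch == '\n' || ch == '\r' || ch == '\t') {
            n++;
        }
    }
    return n;
}
```

For example, `std::string("\n3")` has size 2 and one skippable char, while `std::string("\\n3")` has size 3 and none, because `'\\'` and `'n'` are stored separately.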

str+=*ch;
Collaborator

(Minor: I guess a space on each side of += would look good here.)

Suggested change
-str+=*ch;
+str += *ch;

ch++;
}
-ival = std::stoi(c);
+ival = std::stoi(str);
Contributor Author

Edge case to be looked into:

def f():
    print(int(' '))    # Only whitespace without any numeric character.
f()

(lp) C:\Users\kunni\lpython>python try.py
Traceback (most recent call last):
  File "C:\Users\kunni\lpython\try.py", line 3, in <module>
    f()
  File "C:\Users\kunni\lpython\try.py", line 2, in f
    print(int(' '))
ValueError: invalid literal for int() with base 10: ' '

(lp) C:\Users\kunni\lpython>src\bin\lpython try.py
std::exception: invalid stoi argument

Been a bit busy with university exams, will handle this soon.

Collaborator

Does this case throw an error?

Contributor Author

Yes, this case throws an error: std::stoi rejects the invalid input and throws an exception. I'll have to add an if condition that throws a corresponding SemanticError. That's it, right?

Collaborator

Yes, I think we have to check and throw a SemanticError.
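
A minimal sketch of the check discussed here, assuming the digits have already been collected into a separate string as in the diff. `LiteralError` and `digits_to_int` are hypothetical names; `LiteralError` stands in for LPython's `SemanticError`, which in the real handler also takes a source location.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for SemanticError (assumption: the real type also carries a location).
struct LiteralError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical helper: `literal` is the original string argument of int(),
// `digits` is what remains after whitespace has been skipped.
int digits_to_int(const std::string &literal, const std::string &digits) {
    // std::stoi throws std::invalid_argument when there is nothing to convert,
    // so guard first and report the Python-style message instead.
    if (digits.empty()) {
        throw LiteralError(
            "invalid literal for int() with base 10: '" + literal + "'");
    }
    return std::stoi(digits);
}
```

With this guard, a whitespace-only literal such as `' '` produces the same message as CPython instead of surfacing a raw `std::exception` from `std::stoi`.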

Contributor Author

Other than this edge case, I feel all cases are handled!

return (ASR::asr_t *)ASR::down_cast<ASR::expr_t>(ASR::make_IntegerConstant_t(al,
loc, ival, to_type));
}