Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Fix edge cases for int() casting of strings #1589

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

faze-geek
Copy link
Contributor

All of these cases below throw the invalid literal semantic error on master as of now.

On branch

def foo():        
    print(int("  3   "))
    print(int("+3"))
    print(int("\n3"))
    print(int("3\n"))
    print(int("\r\t\n3\r\t\n"))


foo()

(lp) C:\Users\kunni\lpython>python try.py
3
3
3
3
3

(lp) C:\Users\kunni\lpython>src\bin\lpython try.py
3
3
3
3
3

@faze-geek faze-geek marked this pull request as draft March 20, 2023 07:10
ch++;
}
ival = std::stoi(c);
ival = std::stoi(str);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edge case to be looked into :

def f():
    print(int(' '))    # Only whitespace without any numeric character.
f()

(lp) C:\Users\kunni\lpython>python try.py
Traceback (most recent call last):
  File "C:\Users\kunni\lpython\try.py", line 3, in <module>
    f()
  File "C:\Users\kunni\lpython\try.py", line 2, in f
    print(int(' '))
ValueError: invalid literal for int() with base 10: ' '

(lp) C:\Users\kunni\lpython>src\bin\lpython try.py
std::exception: invalid stoi argument

Been a bit busy with university exams, will handle this soon.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this case throw an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this case throws an error - The stoi catches invalid input and throws an error. I'll have to add a if condition and throws a corresponding semantic error. Thats it right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think, we have to check and throw a SemanticError.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than this edge case I feel all cases are handled !

@faze-geek faze-geek marked this pull request as ready for review March 25, 2023 15:04
@faze-geek
Copy link
Contributor Author

@czgdp1807 @Thirumalai-Shaktivel Kindly consider this pull request ready for review. The CI logs fail as of now, but I'll handle it at once after having collective reviews.

@Thirumalai-Shaktivel
Copy link
Collaborator

LGTM!

@faze-geek
Copy link
Contributor Author

LGTM!

Have a look at the CI logs once please. I'm not sure if I had updated the references after the latest commit.

assert i32(int("+3")) == 3
assert i32(int("\n3")) == 3
assert i32(int("3\n")) == 3
assert i32(int("\r\t\n3\r\t\n")) == 3
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an AssertionError among these tests.
Check all the cases if it gives the correct output.

Comment on lines +82 to 84
if (*ch == '-' || *ch == '+') {
ch++;
}
Copy link
Collaborator

@ubaidsk ubaidsk Mar 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems currently + and - signs are being ignored. We have error at the following place.

(lp) lpython$ git diff                                    
diff --git a/integration_tests/test_str_to_int.py b/integration_tests/test_str_to_int.py
index 765dd1e0f..b67dcab32 100644
--- a/integration_tests/test_str_to_int.py
+++ b/integration_tests/test_str_to_int.py
@@ -3,21 +3,27 @@ from ltypes import i32
 def f():
     i: i32
     i = i32(int("314"))
+    print(i)
     assert i == 314
     i = i32(int("-314"))
     i = i32(int("-314"))
+    print(i)
     assert i == -314
     s: str
     s = "123"
     i = i32(int(s))
+    print(i)
     assert i == 123
     s = "-123"
     i = i32(int(s))
+    print(i)
     assert i == -123
     s = "    1234"
     i = i32(int(s))
+    print(i)
     assert i == 1234
     s = "    -1234   "
     i = i32(int(s))
+    print(i)
     assert i == -1234
 
     assert i32(int("  3   ")) == 3
@@ -25,5 +31,6 @@ def f():
     assert i32(int("\n3")) == 3
     assert i32(int("3\n")) == 3
     assert i32(int("\r\t\n3\r\t\n")) == 3
+    print("Done")
 
 f()
(lp) lpython$ lpython integration_tests/test_str_to_int.py
314
314
AssertionError

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, since we check for +/- sign for only at the start, I think the following could fail (it fails currently):

(lp) lpython$ python3 examples/expr2.py
-3
3
(lp) lpython$ lpython examples/expr2.py
semantic error: invalid literal for int() with base 10: '    -3    '
 --> examples/expr2.py:4:19
  |
4 |     print(i32(int("    -3    ")))
  |                   ^^^^^^^^^^^^ 


Note: if any of the above error or warning messages are not clear or are lacking
context please report it to us (we consider that a bug that must be fixed).

Comment on lines +86 to 103
if(*ch == ' '){
ch++;
continue;
}
if(*ch == '\\'){
ch++;
if (*ch == 'n' || *ch == 'r' || *ch == 't') {
ch++;
continue;
}
throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc);
}
if (*ch == '.') {
throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc);
}
if (*ch < '0' || *ch > '9') {
throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the above could be simplified as follows:

Suggested change
if(*ch == ' '){
ch++;
continue;
}
if(*ch == '\\'){
ch++;
if (*ch == 'n' || *ch == 'r' || *ch == 't') {
ch++;
continue;
}
throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc);
}
if (*ch == '.') {
throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc);
}
if (*ch < '0' || *ch > '9') {
throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc);
}
if(*ch == ' ' || *ch == '\n' || *ch == '\r' || *ch == '\t'){
ch++;
continue;
}
if (*ch < '0' || *ch > '9') {
throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ch holds the whole escaped character. So, we need not check for \n (or others) in two steps. The above approach performs the check in one step.

Example:

#include <iostream>
#include <cstring>
using namespace std;

int main()
{
    char *mystr = "abc def \n\r\t pqr xyz";
    char *ch = mystr;
    while (*ch) {
        if (*ch == ' ') {
            // do not print anything
        } else if (*ch == '\n') {
            std::cout << "newline character" << std::endl;
        } else if (*ch == '\r') {
            std::cout << "carriage return" << std::endl;
        } else if (*ch == '\t') {
            std::cout << "tab character" << std::endl;
        } else {
            std::cout << *ch << std::endl;
        }
        ch++;
    }
    return 0;
}

Output:

a
b
c
d
e
f
newline character
carriage return
tab character
p
q
r
x
y
z

if (*ch == '.') {
throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc);
}
if (*ch < '0' || *ch > '9') {
throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc);
}
str+=*ch;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Minor: I guess two spaces here would look good).

Suggested change
str+=*ch;
str += *ch;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants