-
Notifications
You must be signed in to change notification settings - Fork 171
Fix edge cases for int()
casting of strings
#1589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
ch++; | ||
} | ||
ival = std::stoi(c); | ||
ival = std::stoi(str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edge case to be looked into :
def f():
print(int(' ')) # Only whitespace without any numeric character.
f()
(lp) C:\Users\kunni\lpython>python try.py
Traceback (most recent call last):
File "C:\Users\kunni\lpython\try.py", line 3, in <module>
f()
File "C:\Users\kunni\lpython\try.py", line 2, in f
print(int(' '))
ValueError: invalid literal for int() with base 10: ' '
(lp) C:\Users\kunni\lpython>src\bin\lpython try.py
std::exception: invalid stoi argument
Been a bit busy with university exams, will handle this soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this case throw an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this case throws an error - The stoi
catches invalid input and throws an error. I'll have to add a if condition and throws a corresponding semantic error. Thats it right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think, we have to check and throw a SemanticError.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than this edge case I feel all cases are handled !
@czgdp1807 @Thirumalai-Shaktivel Kindly consider this pull request ready for review. The CI logs fail as of now, but I'll handle it at once after having collective reviews. |
LGTM! |
Have a look at the CI logs once please. I'm not sure if I had updated the references after the latest commit. |
assert i32(int("+3")) == 3 | ||
assert i32(int("\n3")) == 3 | ||
assert i32(int("3\n")) == 3 | ||
assert i32(int("\r\t\n3\r\t\n")) == 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an AssertionError
among these tests.
Check all the cases if it gives the correct output.
if (*ch == '-' || *ch == '+') { | ||
ch++; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems currently +
and -
signs are being ignored. We have error at the following place.
(lp) lpython$ git diff
diff --git a/integration_tests/test_str_to_int.py b/integration_tests/test_str_to_int.py
index 765dd1e0f..b67dcab32 100644
--- a/integration_tests/test_str_to_int.py
+++ b/integration_tests/test_str_to_int.py
@@ -3,21 +3,27 @@ from ltypes import i32
def f():
i: i32
i = i32(int("314"))
+ print(i)
assert i == 314
i = i32(int("-314"))
i = i32(int("-314"))
+ print(i)
assert i == -314
s: str
s = "123"
i = i32(int(s))
+ print(i)
assert i == 123
s = "-123"
i = i32(int(s))
+ print(i)
assert i == -123
s = " 1234"
i = i32(int(s))
+ print(i)
assert i == 1234
s = " -1234 "
i = i32(int(s))
+ print(i)
assert i == -1234
assert i32(int(" 3 ")) == 3
@@ -25,5 +31,6 @@ def f():
assert i32(int("\n3")) == 3
assert i32(int("3\n")) == 3
assert i32(int("\r\t\n3\r\t\n")) == 3
+ print("Done")
f()
(lp) lpython$ lpython integration_tests/test_str_to_int.py
314
314
AssertionError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, since we check for +
/-
sign for only at the start, I think the following could fail (it fails currently):
(lp) lpython$ python3 examples/expr2.py
-3
3
(lp) lpython$ lpython examples/expr2.py
semantic error: invalid literal for int() with base 10: ' -3 '
--> examples/expr2.py:4:19
|
4 | print(i32(int(" -3 ")))
| ^^^^^^^^^^^^
Note: if any of the above error or warning messages are not clear or are lacking
context please report it to us (we consider that a bug that must be fixed).
if(*ch == ' '){ | ||
ch++; | ||
continue; | ||
} | ||
if(*ch == '\\'){ | ||
ch++; | ||
if (*ch == 'n' || *ch == 'r' || *ch == 't') { | ||
ch++; | ||
continue; | ||
} | ||
throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc); | ||
} | ||
if (*ch == '.') { | ||
throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc); | ||
} | ||
if (*ch < '0' || *ch > '9') { | ||
throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the above could be simplified as follows:
if(*ch == ' '){ | |
ch++; | |
continue; | |
} | |
if(*ch == '\\'){ | |
ch++; | |
if (*ch == 'n' || *ch == 'r' || *ch == 't') { | |
ch++; | |
continue; | |
} | |
throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc); | |
} | |
if (*ch == '.') { | |
throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc); | |
} | |
if (*ch < '0' || *ch > '9') { | |
throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc); | |
} | |
if(*ch == ' ' || *ch == '\n' || *ch == '\r' || *ch == '\t'){ | |
ch++; | |
continue; | |
} | |
if (*ch < '0' || *ch > '9') { | |
throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ch
holds the whole escaped character. So, we need not check for \n
(or others) in two steps. The above approach performs the check in one step.
Example:
#include <iostream>
#include <cstring>
using namespace std;
int main()
{
char *mystr = "abc def \n\r\t pqr xyz";
char *ch = mystr;
while (*ch) {
if (*ch == ' ') {
// do not print anything
} else if (*ch == '\n') {
std::cout << "newline character" << std::endl;
} else if (*ch == '\r') {
std::cout << "carriage return" << std::endl;
} else if (*ch == '\t') {
std::cout << "tab character" << std::endl;
} else {
std::cout << *ch << std::endl;
}
ch++;
}
return 0;
}
Output:
a
b
c
d
e
f
newline character
carriage return
tab character
p
q
r
x
y
z
if (*ch == '.') { | ||
throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc); | ||
} | ||
if (*ch < '0' || *ch > '9') { | ||
throw SemanticError("invalid literal for int() with base 10: '"+ std::string(c) + "'", arg->base.loc); | ||
} | ||
str+=*ch; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Minor: I guess two spaces here would look good).
str+=*ch; | |
str += *ch; |
All of these cases below throw the
invalid literal
semantic error on master as of now.On branch