Refactor gitno_extract_url_parts
#4563
Force-pushed from 63d0b2e to 2acde00
Force-pushed from 2acde00 to f2a3d68
src/buffer.c (outdated)
    size_t str_len)
    {
        size_t str_pos, new_size;
        int error = 0;
This variable is unused
Right, it's unnecessary.
        if (str[str_pos] == '%' &&
        str_len > str_pos + 2 &&
        isxdigit(str[str_pos + 1]) &&
        isxdigit(str[str_pos + 2])) {
I'm still in favour of indenting the continuation lines of if-statements by four extra spaces. It makes it more obvious what still belongs to the condition and what doesn't. Anyway, no need to settle that issue now.
    for (str_pos = 0; str_pos < str_len; buf->size++, str_pos++) {
        if (str[str_pos] == '%' &&
        str_len > str_pos + 2 &&
Hm. Doesn't this need to be >=? As written, I'd expect a "%xx" at the end of the string not to be decoded.
I don't think so. Consider a string of length 4 (so str_len is 4). If we used >= instead, the comparison would become str_len >= str_pos + 2, which is true for str_pos = 2. Say the string is ab%f. When str_pos = 2, the first condition (str[str_pos] == '%') is true, the second (str_len >= str_pos + 2) is true, the third (isxdigit(str[3])) is true, and the fourth would then read str[str_pos + 2], i.e. str[4], walking off the end of the array. With >, a trailing "%xx" starts at str_pos = str_len - 3, and str_len > str_len - 1 still holds, so it does get decoded.
Agreed
Convinced, thanks!
    expect_decode_pass("github.com", "github.com");
    expect_decode_pass("github.com", "githu%62.com");
    expect_decode_pass("github.com", "github%2ecom");
    expect_decode_pass("foo bar baz", "foo%20bar%20baz");
You should probably also add a test case for a trailing "%xx".
Sure.
    git_buf_oom(&port) ||
    git_buf_oom(&path) ||
    git_buf_oom(&username) ||
    git_buf_oom(&password))
This leaks memory if only some of these buffers OOM'd.
Yes. We made the decision a long time ago to leak on small allocation failures: it's not worth cleaning up 3 bytes if you're OOM. You're going to fail. (We do clean up large allocations, though, since if you've just allocated 1 GB and hit OOM, there's a chance you could actually do something useful after freeing that 1 GB. 3 bytes, not so much.)
We could revisit this decision, I suppose, but a) I think that there are many more pressing needs and b) let's not revisit this decision until 0.27 gets out the door.
Okay
Okay
src/netops.c (outdated)
    url_userinfo = url + u.field_data[UF_USERINFO].off;
    url_userinfo_len = u.field_data[UF_USERINFO].len;

    if (has_host)
Dunno. Wouldn't this function be easier to read if the buffers were all set in one spot? E.g.
    if (!!(u.field_set & (1 << UF_HOST)))
        git_buf_decode_percent(&host, url + u.field_data[UF_HOST].off, u.field_data[UF_HOST].len);
Far fewer variables, and all handling of that information happens in exactly one spot.
The lines get soooooo long then. It is much cleaner when we use some temporary variables.
Certainly true. On the other hand, you could just have local variables inside that block.
Force-pushed from f2a3d68 to 7bbd741
I added a test for the trailing "%xx".
    expect_decode_pass("github.com", "github%2ecom");
    expect_decode_pass("foo bar baz", "foo%20bar%20baz");
    expect_decode_pass("foo bar ", "foo%20bar%20");
👍
Looks good to me. I'm still a bit on the fence about all those variables being declared at the start of the function. Making them local to the blocks of their conditions would clean that up a bit, I'd guess. Not blocking this PR for it, though. Feel free to merge and do an rc3.
Force-pushed from 7bbd741 to 6533dfc
Yeah, that seems like a nice compromise. I did this, and left the
Perfect, that looks a lot better, thanks! One issue, though: I think I recall a compiler that treats an empty line between two blocks of variable declarations as non-ISO-C90-compliant and produces an error. You have that in one place (I can't point it out, since I cannot comment while GitHub's JavaScript is broken for me). Other than that, I'm happy with this PR. Also, all CI jobs are currently failing due to an access issue with our Bitbucket repository. We should try to resolve that soonish.
I'm curious what compiler you're thinking of; I've never heard of this before. MSVC (pre-2015) is the canonical shitty compiler, but even it didn't have this problem. I'll fix it, though I suspect it's not a problem. (I also suspect that we have whitespace between declarations somewhere.) I'm trying to figure out the Bitbucket issue. I'm hoping we can get access back to the repositories, since (I think) we're validating some of the refs and such. (Sigh.)
Force-pushed from 6533dfc to 3f45b1f
What I don't understand is why the tests sometimes succeed. 🤔
No idea either. Maybe it depends on the server: their load balancer may occasionally redirect us to faulty instances. As for the compiler, I'm completely clueless which one it was. I'm quite sure I ran into it at some point, though.
We have whitespace between declarations all over the codebase. Here's just one example that has been in the codebase for years now. If there were a compiler so broken that it couldn't parse whitespace, then thankfully nobody has actually used it to compile us or filed an issue about it. We have a lot of problems keeping us from moving forward that we should be focusing on instead.
Introduce a function to take a percent-encoded string (URI encoded, described by RFC 1738) and decode it into a `git_buf`.
Use `git_buf_decode_percent` so that we can avoid allocating a temporary buffer.
Now that we can decode percent-encoded strings as part of `git_buf`s, use that decoder in `gitno_extract_url_parts`.
RFC 3986 says that hostnames can be percent-encoded. Percent-decode hostnames in our URLs.
Force-pushed from 3f45b1f to 9108959
`gitno_extract_url_parts` seems a little deficient in a few ways:
- … `git_buf`s. This lets the SSH transport avoid an unnecessary small, temporary `malloc`.
- `gitno_extract_url_parts` uses the same buffer-based decoding …

This builds on #4557.