@ethomson
Member

@ethomson ethomson commented Mar 3, 2018

`gitno_extract_url_parts` seems a little deficient in a few ways. This PR:

  • Moves percent decoding into `git_buf`s, which lets the SSH transport avoid an unnecessary small, temporary malloc.
  • Makes `gitno_extract_url_parts` use the same buffer-based decoding.
  • Percent-decodes usernames even when passwords are not provided.
  • Percent-decodes hostnames, per RFC 3986.

This builds on #4557.

@ethomson ethomson force-pushed the ethomson/ssh-unescape branch 2 times, most recently from 63d0b2e to 2acde00, March 4, 2018 11:06
@ethomson ethomson mentioned this pull request Mar 10, 2018
@ethomson ethomson force-pushed the ethomson/ssh-unescape branch from 2acde00 to f2a3d68, March 14, 2018 15:08
pks-t
pks-t previously requested changes Mar 15, 2018
src/buffer.c (outdated)
size_t str_len)
{
size_t str_pos, new_size;
int error = 0;
Member

This variable (`error`) is unused.

Member Author

Right, it's unnecessary.

if (str[str_pos] == '%' &&
str_len > str_pos + 2 &&
isxdigit(str[str_pos + 1]) &&
isxdigit(str[str_pos + 2])) {
Member

I'm still in favour of indenting continuation lines of if-statements by four spaces. It makes it more obvious what still belongs to the condition and what doesn't. Anyway, no need to settle that issue now.


for (str_pos = 0; str_pos < str_len; buf->size++, str_pos++) {
if (str[str_pos] == '%' &&
str_len > str_pos + 2 &&
Member

Hm. Doesn't this need to be `>=`? As written, I'd expect that a "%xx" at the end of the string won't be decoded.

Member Author

I don't think so. Consider a string of length 4 (so `str_len` is 4). If we were to use `>=` instead, the comparison becomes `str_len >= str_pos + 2`, which is true for `str_pos = 2`. Say the string is `ab%1`. When `str_pos = 2`, the first condition (`str[str_pos] == '%'`) is true, the second (`str_len >= str_pos + 2`) is true, the third (`isxdigit(str[str_pos + 1])`) is true, and then we would walk off the end of the array in the fourth (`str[str_pos + 2]`).

Member

Agreed

Member

Convinced, thanks!

expect_decode_pass("github.com", "github.com");
expect_decode_pass("github.com", "githu%62.com");
expect_decode_pass("github.com", "github%2ecom");
expect_decode_pass("foo bar baz", "foo%20bar%20baz");
Member

You should probably also add a test case for a trailing "%xx".

Member Author

Sure.

git_buf_oom(&port) ||
git_buf_oom(&path) ||
git_buf_oom(&username) ||
git_buf_oom(&password))
Member

This leaks memory in case only some of these buffers OOM'd

Member Author

@ethomson ethomson Mar 16, 2018

Yes. We made the decision a long time ago to leak on small allocation failures: it's not worth cleaning up 3 bytes if you're OOM. You're going to fail. (We do clean up large allocations, though, since if you've just allocated 1 GB and OOM, there's a possibility that you could actually do something useful after freeing that 1 GB. 3 bytes, not so much.)

We could revisit this decision, I suppose, but a) I think there are many more pressing needs, and b) let's not revisit it until 0.27 gets out the door.

Member

Okay

Member

Okay

src/netops.c (outdated)
url_userinfo = url + u.field_data[UF_USERINFO].off;
url_userinfo_len = u.field_data[UF_USERINFO].len;

if (has_host)
Member

Dunno. Wouldn't this function be easier to read if all the buffers were set in one spot? E.g.

if (!!(u.field_set & (1 << UF_HOST)))
        git_buf_decode_percent(&host, url + u.field_data[UF_HOST].off, u.field_data[UF_HOST].len);

Far fewer variables, and all handling of that information happens in exactly one spot.

Member Author

The lines are soooooo long then. It is much cleaner when we use some temporary variables.

Member

Certainly true. On the other hand, you could just have local variables inside of that block.

@ethomson ethomson force-pushed the ethomson/ssh-unescape branch from f2a3d68 to 7bbd741, March 16, 2018 10:42
@ethomson
Member Author

I added a test for a trailing %xx and changed the tests a bit to ensure that the decoder reads only `len` bytes, by adding trailing garbage to each string past its first `len` bytes. This ensures that we actually honor `len` instead of looking for a NUL terminator.

expect_decode_pass("github.com", "github%2ecom");
expect_decode_pass("foo bar baz", "foo%20bar%20baz");
expect_decode_pass("foo bar baz", "foo%20bar%20baz");
expect_decode_pass("foo bar ", "foo%20bar%20");
Member

👍

@pks-t
Member

pks-t commented Mar 16, 2018

Looks good to me. I'm still a bit on the fence with regard to all those variables being declared at the start of the function. Making them block-local to the conditions would clean that up a bit, I'd guess. Not blocking this PR for it, though. Feel free to merge and do an rc3.

@ethomson ethomson force-pushed the ethomson/ssh-unescape branch from 7bbd741 to 6533dfc, March 17, 2018 14:00
@ethomson
Member Author

Making them block-local to the conditions would clean that up a bit, I'd guess.

Yeah, that seems like a nice compromise. I did this, and left the bool declarations the way they were. I think it's an improvement.

@pks-t
Member

pks-t commented Mar 17, 2018

Perfect, looks a lot better like that, thanks! One issue, though: I think I recall a compiler that treats an empty line between two blocks of variable declarations as non-ISO-C90-compliant and produces an error. You had that in one place (I can't point it out, as I cannot comment due to GitHub's JavaScript being broken for me). Other than that, I'm happy with this PR.

Also, all CI jobs are currently failing due to an access issue with our Bitbucket repository. We should try to resolve that issue soonish.

@ethomson
Member Author

ethomson commented Mar 17, 2018

I'm curious what compiler you're thinking of; I've never heard of this before. MSVC (pre-2015) is the canonical shitty compiler, but even it didn't have this problem. I'll fix it, though I suspect that it's not a problem. (I also suspect that we have whitespace between declarations somewhere.)

I'm trying to figure out the Bitbucket issue. I'm hoping that we can get access back to the repositories, since (I think) we're validating some of the refs and such. (Sigh.)

@ethomson ethomson force-pushed the ethomson/ssh-unescape branch from 6533dfc to 3f45b1f, March 17, 2018 19:54
@ethomson
Member Author

What I don't understand is why the tests succeed sometimes. 🤔

@pks-t
Member

pks-t commented Mar 19, 2018

No idea either. Maybe it depends on the server: their load balancer may sometimes redirect us to faulty instances.

Regarding the compiler, I'm completely clueless as to which one it was. I'm really sure I experienced it at some point, though.

@ethomson
Member Author

Regarding the compiler, I'm completely clueless as to which one it was. I'm really sure I experienced it at some point, though.

We have whitespace between decls all over the codebase. Here's just one example that has been in the codebase for years now.

If there's a compiler so broken that it can't parse whitespace, then thankfully nobody has actually used it to compile us or filed an issue about it. We have a lot of problems keeping us from moving forward, and those are what we should be focusing on instead.

emmax86 and others added 9 commits March 19, 2018 16:08

  • Introduce a function to take a percent-encoded string (URI encoded, described by RFC 1738) and decode it into a `git_buf`.
  • Use `git_buf_decode_percent` so that we can avoid allocating a temporary buffer.
  • Now that we can decode percent-encoded strings as part of `git_buf`s, use that decoder in `gitno_extract_url_parts`.
  • RFC 3986 says that hostnames can be percent encoded. Percent decode hostnames in our URLs.

@ethomson ethomson force-pushed the ethomson/ssh-unescape branch from 3f45b1f to 9108959, March 19, 2018 23:08
@ethomson ethomson merged commit 5585e35 into master Mar 20, 2018
@ethomson ethomson deleted the ethomson/ssh-unescape branch October 26, 2018 13:37

4 participants