Refactor `gitno_extract_url_parts` #4563

ethomson · 2018-03-03T21:54:58Z

`gitno_extract_url_parts` seems a little deficient in a few ways. This PR makes the following changes:

- Move percent decoding into `git_buf`s. This lets the SSH transport avoid an unnecessary small, temporary malloc.
- Make `gitno_extract_url_parts` use the same buffer-based decoding.
- Percent decode usernames even when passwords are not provided.
- Percent decode hostnames, per RFC 3986.

This builds on #4557.

pks-t · 2018-03-15T12:30:36Z

src/buffer.c

+	size_t str_len)
+{
+	size_t str_pos, new_size;
+	int error = 0;


This variable is unused

Right, it's unnecessary.

pks-t · 2018-03-15T12:31:34Z

src/buffer.c

+		if (str[str_pos] == '%' &&
+			str_len > str_pos + 2 &&
+			isxdigit(str[str_pos + 1]) &&
+			isxdigit(str[str_pos + 2])) {


I'm still in favour of indenting additional lines for if-statements by four spaces. Makes it more obvious what still belongs to the if-statement and what not. Anyway, no need to settle on that issue now.

pks-t · 2018-03-15T12:36:03Z

src/buffer.c

+
+	for (str_pos = 0; str_pos < str_len; buf->size++, str_pos++) {
+		if (str[str_pos] == '%' &&
+			str_len > str_pos + 2 &&


Hm. Doesn't this need to be >=? As written, I'd expect that a "%xx" at the end of the string won't be decoded.

I don't think so. Consider a string of length 4 (so str_len is 4), say ab%4. If we used >= instead, the comparison would become str_len >= str_pos + 2, which is true for str_pos = 2. At that point in the loop the first condition (str[str_pos] == '%') is true, the second (str_len >= str_pos + 2) is true, the third (isxdigit(str[str_pos + 1])) is true, and then the fourth reads str[str_pos + 2] and walks off the end of the array.

Convinced, thanks!
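To make the bounds argument above concrete, here is a minimal sketch of length-bounded percent decoding. It is not the libgit2 implementation: `sketch_decode_percent` and `hex_value` are hypothetical names, and the sketch writes into a caller-supplied array rather than a `git_buf`. The guard `str_len > str_pos + 2` is the one under discussion; with the 4-byte input `ab%4` it is false at `str_pos = 2`, so `str[4]` is never read, while a complete trailing `%20` still satisfies it.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Convert one hex digit to its value; the caller guarantees isxdigit(c).
 * Assumes an ASCII-compatible character set. */
static int hex_value(unsigned char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	return tolower(c) - 'a' + 10;
}

/*
 * Hypothetical helper (not the libgit2 function): decode the first
 * str_len bytes of str into out, which must hold str_len + 1 bytes.
 * Returns the decoded length; out is always NUL-terminated.
 */
static size_t sketch_decode_percent(char *out, const char *str, size_t str_len)
{
	size_t str_pos, out_pos = 0;

	assert(out != NULL && str != NULL);

	for (str_pos = 0; str_pos < str_len; str_pos++, out_pos++) {
		/* Strictly greater: both hex digits must lie inside str_len. */
		if (str[str_pos] == '%' &&
		    str_len > str_pos + 2 &&
		    isxdigit((unsigned char)str[str_pos + 1]) &&
		    isxdigit((unsigned char)str[str_pos + 2])) {
			out[out_pos] = (char)((hex_value((unsigned char)str[str_pos + 1]) << 4) |
			    hex_value((unsigned char)str[str_pos + 2]));
			str_pos += 2; /* skip the two digits just consumed */
		} else {
			/* Incomplete or malformed escapes are copied verbatim. */
			out[out_pos] = str[str_pos];
		}
	}

	out[out_pos] = '\0';
	return out_pos;
}
```

Passing the length explicitly, instead of relying on a NUL terminator, is what makes the off-by-one matter: the byte after `str_len` may not belong to the caller's string at all.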

pks-t · 2018-03-15T12:41:17Z

tests/buf/percent.c

+	expect_decode_pass("github.com", "github.com");
+	expect_decode_pass("github.com", "githu%62.com");
+	expect_decode_pass("github.com", "github%2ecom");
+	expect_decode_pass("foo bar baz", "foo%20bar%20baz");


You should probably also add the case of trailing "%xx"

pks-t · 2018-03-15T12:42:39Z

src/netops.c

+		git_buf_oom(&port) ||
+		git_buf_oom(&path) ||
+		git_buf_oom(&username) ||
+		git_buf_oom(&password))


This leaks memory in case only some of these buffers OOM'd

Yes. We made the decision a long time ago to leak on small allocation failures - it's not worth cleaning up 3 bytes if you're OOM. You're going to fail. (We clean up large allocations though, since if you've just allocated 1 GB and OOM, then there's a possibility that you could actually do something useful after freeing that 1 GB. 3 bytes, not so much.)

We could revisit this decision, I suppose, but a) I think that there are many more pressing needs and b) let's not revisit this decision until 0.27 gets out the door.
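For comparison, this is roughly what cleaning up after a partial failure could look like. It is only a sketch under stated assumptions: `sketch_buf` and its helpers are hypothetical stand-ins, not libgit2's `git_buf`, and the sticky `oom` flag is an assumption made so that several buffers can be checked in one place, as the hunk above does with `git_buf_oom`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer with a sticky out-of-memory flag. */
typedef struct {
	char *ptr;
	size_t size;
	int oom; /* set on the first failed allocation, never cleared by put */
} sketch_buf;

#define SKETCH_BUF_INIT { NULL, 0, 0 }

/* Append len bytes; after a failed allocation every later call is a no-op. */
static void sketch_buf_put(sketch_buf *buf, const char *data, size_t len)
{
	char *grown;

	if (buf->oom)
		return;

	grown = realloc(buf->ptr, buf->size + len + 1);
	if (grown == NULL) {
		buf->oom = 1;
		return;
	}

	memcpy(grown + buf->size, data, len);
	buf->size += len;
	grown[buf->size] = '\0';
	buf->ptr = grown;
}

static int sketch_buf_oom(const sketch_buf *buf)
{
	return buf->oom;
}

static void sketch_buf_dispose(sketch_buf *buf)
{
	free(buf->ptr);
	buf->ptr = NULL;
	buf->size = 0;
	buf->oom = 0;
}

/* Split "host:port" into two buffers and check for OOM once at the end. */
static int sketch_split_host_port(sketch_buf *host, sketch_buf *port, const char *hostport)
{
	const char *colon = strchr(hostport, ':');
	size_t host_len = colon ? (size_t)(colon - hostport) : strlen(hostport);
	const char *port_str = colon ? colon + 1 : "";

	sketch_buf_put(host, hostport, host_len);
	sketch_buf_put(port, port_str, strlen(port_str));

	if (sketch_buf_oom(host) || sketch_buf_oom(port)) {
		/* Stricter variant: release whichever buffer did succeed. */
		sketch_buf_dispose(host);
		sketch_buf_dispose(port);
		return -1;
	}

	return 0;
}
```

The extra cleanup costs two calls on a path that, per the reply above, is about to fail anyway, which is the trade-off being described.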

pks-t · 2018-03-15T12:48:23Z

src/netops.c

+	url_userinfo = url + u.field_data[UF_USERINFO].off;
+	url_userinfo_len = u.field_data[UF_USERINFO].len;
+
+	if (has_host)


Dunno. Wouldn't this function be easier to read if the buffers were all set in one spot? E.g.

if (!!(u.field_set & (1 << UF_HOST)))
	git_buf_decode_percent(&host, url + u.field_data[UF_HOST].off, u.field_data[UF_HOST].len);

A lot fewer variables, and all handling of that information is in exactly one spot.

The lines are soooooo long then. It is much cleaner when we use some temporary variables.

Certainly true. On the other hand, you could just have local variables inside of that block.

ethomson · 2018-03-16T10:43:40Z

I added a test for the trailing %xx and changed the tests a bit to ensure that the tests only read the len bytes (by adding trailing garbage at the end of the strings, past len bytes). This ensures that we actually look at the len bytes instead of looking for a NUL terminator.

fixed

pks-t · 2018-03-16T12:05:31Z

tests/buf/percent.c

+	expect_decode_pass("github.com", "github%2ecom");
+	expect_decode_pass("foo bar baz", "foo%20bar%20baz");
+	expect_decode_pass("foo bar baz", "foo%20bar%20baz");
+	expect_decode_pass("foo bar ", "foo%20bar%20");


pks-t · 2018-03-16T12:07:46Z

Looks good to me. I'm still a bit on the fence about all those variables being declared at the start of the function. Making them block-local to the conditions would clean that up a bit, I'd guess. Not blocking this PR for it, though. Feel free to merge and do an rc3.

ethomson · 2018-03-17T14:01:09Z

> Making them block-local to the conditions would clean that up a bit, I'd guess.

Yeah, that seems like a nice compromise. I did this, and left the bool declarations the way they were. I think it's an improvement.
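A sketch of that compromise, with the presence flags declared up front and the offset and length temporaries scoped to each block. The names here are hypothetical: `struct sk_url` only mimics the `field_set` / `field_data` layout seen in the hunk above, and `sk_copy_field` stands in for the real decoding call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the parsed-URL fields used in the PR. */
enum { SK_UF_HOST = 0, SK_UF_USERINFO = 1, SK_UF_MAX = 2 };

struct sk_url {
	unsigned field_set; /* bit i set => field i is present */
	struct { size_t off, len; } field_data[SK_UF_MAX];
};

/* Copy at most dst_size - 1 bytes of one URL field and NUL-terminate. */
static void sk_copy_field(char *dst, size_t dst_size, const char *src, size_t len)
{
	if (dst_size == 0)
		return;
	if (len >= dst_size)
		len = dst_size - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
}

/* host and userinfo must each point to at least `size` bytes (size > 0). */
static void sk_extract(char *host, char *userinfo, size_t size,
	const char *url, const struct sk_url *u)
{
	/* The flags stay at function scope... */
	int has_host = !!(u->field_set & (1u << SK_UF_HOST));
	int has_userinfo = !!(u->field_set & (1u << SK_UF_USERINFO));

	host[0] = '\0';
	userinfo[0] = '\0';

	/* ...while the temporaries live only in the block that needs them. */
	if (has_host) {
		const char *url_host = url + u->field_data[SK_UF_HOST].off;
		size_t url_host_len = u->field_data[SK_UF_HOST].len;

		sk_copy_field(host, size, url_host, url_host_len);
	}

	if (has_userinfo) {
		const char *url_userinfo = url + u->field_data[SK_UF_USERINFO].off;
		size_t url_userinfo_len = u->field_data[SK_UF_USERINFO].len;

		sk_copy_field(userinfo, size, url_userinfo, url_userinfo_len);
	}
}
```

This keeps the short lines that the temporaries buy while putting each field's handling in one spot.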

pks-t · 2018-03-17T18:22:12Z

Perfect, looks a lot better like that, thanks! One issue though. I think I recall one compiler that treats a blank line between two blocks of variable declarations as non-ISO-C90 compliant, producing an error. You had that in one place (I can't point it out, as I cannot comment due to GitHub's JavaScript being broken for me). Other than that, I'm happy with this PR.

Also, all CI jobs are currently failing due to an access issue with our Bitbucket repository. We should try to resolve that issue soonish.

ethomson · 2018-03-17T19:52:02Z

I'm curious what compiler you're thinking of. I've never heard of this before. MSVC (pre-2015) is the canonical shitty compiler, but even it didn't have this problem. I'll fix it though, but I suspect that it's not a problem. (I also suspect that we have whitespace between declarations somewhere.)

I'm trying to figure out the bitbucket issue. I'm hoping that we can get access back to the repositories, since (I think) we're validating some of the refs and such. (sigh).

ethomson · 2018-03-17T20:36:23Z

What I don't understand is why the tests succeed sometimes. 🤔

pks-t · 2018-03-19T15:42:03Z

No idea either. Maybe it depends on the server, where we get redirected to faulty instances by their load balancer on some occasions.

Regarding the compiler, I'm completely clueless which one it was. I'm really sure I experienced it at some point in time, though.

ethomson · 2018-03-19T22:43:22Z

> Regarding the compiler I'm completely clueless which one it was. I'm really sure to have experienced it at some point in time, though

We have whitespace between decls all over the codebase. Here's just one example that has been in the codebase for years now.

If there's a compiler so broken that it can't parse whitespace, then thankfully nobody's actually used it to compile us or filed an issue about it. We have a lot of problems that keep us from moving things forward that we should be focusing on instead of that.

ethomson mentioned this pull request Mar 3, 2018

Unescape special characters in SSH repo paths #4557

Closed

ethomson force-pushed the ethomson/ssh-unescape branch 2 times, most recently from 63d0b2e to 2acde00 on March 4, 2018 11:06

ethomson mentioned this pull request Mar 10, 2018

Release v0.27.0 & v0.26.1 #4465

Closed

13 tasks

ethomson force-pushed the ethomson/ssh-unescape branch from 2acde00 to f2a3d68 on March 14, 2018 15:08

pks-t previously requested changes Mar 15, 2018

ethomson force-pushed the ethomson/ssh-unescape branch from f2a3d68 to 7bbd741 on March 16, 2018 10:42

pks-t reviewed Mar 16, 2018

ethomson force-pushed the ethomson/ssh-unescape branch from 7bbd741 to 6533dfc on March 17, 2018 14:00

ethomson force-pushed the ethomson/ssh-unescape branch from 6533dfc to 3f45b1f on March 17, 2018 19:54

emmax86 and others added 9 commits March 19, 2018 16:08

Rename unescape and make non-static

8a2cdbd

Unescape repo before constructing ssh request

1621087

Update tests

30333e8

Introduce git_buf_decode_percent

8070a35

Introduce a function to take a percent-encoded string (URI encoded, described by RFC 1738) and decode it into a `git_buf`.

ssh urls: use git_buf_decode_percent

6f57790

Use `git_buf_decode_percent` so that we can avoid allocating a temporary buffer.

gitno_extract_url_parts: use git_bufs

60e7848

Now that we can decode percent-encoded strings as part of `git_buf`s, use that decoder in `gitno_extract_url_parts`.

Remove now unnecessary gitno_unescape

05551ca

gitno_extract_url_parts: decode hostnames

0e4f3d9

RFC 3986 says that hostnames can be percent encoded. Percent decode hostnames in our URLs.

buf: add tests for percent decoding

9108959

ethomson force-pushed the ethomson/ssh-unescape branch from 3f45b1f to 9108959 on March 19, 2018 23:08

ethomson merged commit 5585e35 into master Mar 20, 2018

ethomson deleted the ethomson/ssh-unescape branch October 26, 2018 13:37

ethomson mentioned this pull request Jul 25, 2019

clone: don't decode URL percent encodings #5187

Merged
