refactor: use coder/slog + minor go style changes #107
base: main
Signed-off-by: Callum Styan <[email protected]>
Just commenting for now, since it sounded like there might be some more changes you want to make, but I'm okay with making these changes
And to be clear, when I'm asking a question, that's not me trying to be defensive – I'm just trying to understand how big the gap is between my TypeScript way of doing things and how a Gopher usually does stuff
  lineScanner := bufio.NewScanner(strings.NewReader(trimmed))
  for lineScanner.Scan() {
      lineNum++
-     nextLine := lineScanner.Text()
+     nextLine = lineScanner.Text()
I'm not sure I understand the point of this change, since `nextLine` isn't ever used outside the loop. I'm not a fan of scope pollution, and try to keep scoping as aggressively small as possible, even in the same function
- Is this mostly a memory optimization?
- I feel like I see code all the time in Go that looks just like what we used to have. Especially with `range` loops. Does the below example have the same problems as the old approach, where we're declaring new block-scoped variables on the stack once per iteration?
    for i, value := range exampleSlice {
        // Stuff
    }
  Is there an optimization that `range` loops have that doesn't exist with other loops?
Yeah, this is just an optimization to reduce memory allocations. Very minor in this case since I doubt this loop has a lot of iterations, but without this a new string for `nextLine` is allocated for each iteration of the loop.
The Go compiler already applies the same optimization to `for thing := range anotherThing`, assigning to the same var for each iteration rather than allocating a new one every time.
var (
    err    error
    subDir os.FileInfo
)
Is this something that Go engineers typically do? I guess I just expected these parentheses declarations to be mainly used for declaring groups of related variables. Right now, the variables aren't directly related (aside from being scoped to the function), and take up more lines total now
Personal preference in some cases. The convention is to use it either when the variables are logically related to each other, or to help with readability, such as when there are multiple variables declared near each other and you want to avoid repeating the `var` keyword.
In this case, I wanted both to get their respective default values and allow for turning the previous lines 17-18 into one line.
The other option was to do:
    var subDir os.FileInfo
    var err error
}

errs := []error{}
Question I've had for a bit: does it matter whether `errs` is defined with an allocation or as a nil slice, since we're not serializing it as JSON?
I know that Go recommends that you don't differentiate between a nil slice and an empty, allocated slice aside from JSON output, but aside from JSON, are there ever any times when you'd want to do an allocation for a slice that might stay empty?
> Question I've had for a bit: does it matter whether errs is defined with an allocation or as a nil slice, since we're not serializing it as JSON?

Do you mean empty slice, as opposed to nil slice? Rather than doing `slice := make(...)`?
Usually I prefer doing `make(...)` with some specified length/capacity since that allows for either starting out with a slice of the size you need, or at least of some reasonable size. Every call to `append` when the underlying memory no longer has remaining capacity for what you're trying to append results in reallocation of the slice with a larger capacity (roughly 2x the current capacity for small slices).
When we don't know what the final length might be and it is possible that it could be 0, using `[]error{}` ensures we don't allocate any space for the item storage portion of the slice.
Even if we did `errs := make([]error, 0, 10)` the underlying item storage would still be allocated for 10 items.
In general it's best to avoid nil slices for return values, though they can be used for function parameters/optional values.
One important point to note is that under the hood you can append to a nil slice; it will be treated as a 0 length empty slice on the first append.
var (
    userDirs []os.DirEntry
    err      error
)
if userDirs, err = os.ReadDir(rootRegistryPath); err != nil {
Is the point of this change to help insulate the function from needing to worry about when variables are created as things get refactored over time?
I feel like we got a lot out of that `:=` before, especially the type inference it gives
Just to allow for the assignment and err check in one line, while still being able to use the value of `userDirs` later on in the function. I may have gone a bit overboard with this type of change in this PR 😂
> I feel like we got a lot out of that `:=` before, especially the type inference it gives

True, though it's safe in this case since we're calling a function from the stdlib. The function signature for `os.ReadDir` will not change without a major version change in Go. If changes were required before a Go 2.0, a new function would be introduced rather than `os.ReadDir` being modified in a way that changed its return types.
cmd/readmevalidation/readmefiles.go
Outdated
  const (
      rootRegistryPath = "./registry"
      fence            = "---"

      // validationPhaseFileStructureValidation indicates when the entire Registry
      // directory is being verified for having all files be placed in the file
      // system as expected.
      validationPhaseFileStructureValidation validationPhase = "File structure validation"

- var supportedAvatarFileFormats = []string{".png", ".jpeg", ".jpg", ".gif", ".svg"}
      // validationPhaseFileLoad indicates when README files are being read from
      // the file system.
      validationPhaseFileLoad = "Filesystem reading"

      // validationPhaseReadmeParsing indicates when a README's frontmatter is
      // being parsed as YAML. This phase does not include YAML validation.
      validationPhaseReadmeParsing = "README parsing"

- // readme represents a single README file within the repo (usually within the
- // top-level "/registry" directory).
      // validationPhaseAssetCrossReference indicates when a README's frontmatter
      // is having all its relative URLs be validated for whether they point to
      // valid resources.
      validationPhaseAssetCrossReference = "Cross-referencing relative asset URLs"
  )
On the topic of using declarations to group related variables, I feel like I'd want two `const` declarations here – one for the phases, and one for everything else
Two other things:
- I'm hesitant about referencing something above where it's defined if I can help it (in this case, defining a string in terms of `validationPhase` before the type is declared), even if it'll still work. I generally like being able to read the code top-to-bottom, even if that means that files don't follow a pattern of always defining constants first
- The phases were previously defined as ints via `iota`. I'm just now realizing: with the current setup, only the first phase is defined strictly as `validationPhase`, right? Everything else in the series has the type `untyped string`?
> On the topic of using declarations to group related variables, I feel like I'd want two const declarations here – one for the phases, and one for everything else

Can you elaborate as to why? It's idiomatic, but not required, to group together const/var definitions like this. We could separate the const definitions, or simply separate them within the same parens block with a comment.

> I'm hesitant about referencing something above where it's defined if I can help it (in this case, defining a string in terms of validationPhase before the type is declared), even if it'll still work. I generally like being able to read the code top-to-bottom, even if that means that files don't follow a pattern of always defining constants first

I agree with you, for sure. I missed that `validationPhaseFileStructureValidation` was a `validationPhase`. I'll see if there's another refactor we can make here to keep the top to bottom organization in place.

> The phases were previously defined as ints via iota. I'm just now realizing: with the current setup, only the first phase is defined strictly as validationPhase, right? Everything else in the series has the type untyped string?

Well, string type, not untyped, but yes. How important is it that the `validationPhase` type exist? Could we use `iota` again or have these all just be strings?
@@ -318,19 +310,18 @@ func validateAllContributorFiles() error {
          return err
      }

-     log.Printf("Processing %d README files\n", len(allReadmeFiles))
+     logger.Info(context.Background(), "Processing README files", "num_files", len(allReadmeFiles))
I'm still new to structured logging. Is there any special behavior/benefit you get if you use the same key multiple times? I guess I'm just wondering how much of a concern it is to make sure you're using the same keys each time you describe the same "resource", particularly for a function call that takes a variadic slice of empty interfaces (so basically zero type-safety)
It's not the end of the world if you don't use the same key, but it does make searching for logs in some kind of log aggregation system much easier.
For example, a system I used to work on referred to the same internal `tenant` type within the system as variations of `user`, `tenant`, `id`, etc. Remembering which key was used on which logged lines complicated searches when I knew I needed to see info for `tenant="1234"` but on some lines the logging was `user="1234"`.
Again, this is likely less important in the case of the registry but still a good practice.
Signed-off-by: Callum Styan <[email protected]>
Changes are broken down into multiple commits to hopefully make reviewing easy. 1 commit for the slog change and then a commit per Go file for style changes.
Style changes are generally:
- `call function, check if err != nil` blocks as possible
- `err` or `errs` for all return type names, previously used `problems` in some cases but `errs` in others
- `Todo` -> `TODO`, sometimes also useful to do `TODO (name):` to make it easier to find things a specific author meant to follow up on
- `// FunctionName/TypeName ...` though I'm now seeing places I didn't update that

In general there are very few tests for the Go code here. Would we like more, or is there some testing that spins up the entire registry to validate things? I didn't see any makefile.