Member

@cbandy cbandy commented Sep 10, 2025

Checklist:

  • Have you added an explanation of what your changes do and why you'd like them to be included?
  • Have you updated or added documentation for the change, as applicable?
  • Have you tested your changes on all related environments with successful results, as applicable?
    • Have you added automated tests?

Type of Changes:

  • New feature

What is the current behavior (link to any open issues here)?

When log_directory is unspecified, Postgres logs to the log directory inside its data directory. That location is difficult to coordinate with the OpenTelemetry Collector, and log files can be lost during replica creation and major upgrades.

What is the new behavior (if this is a feature change)?

  • Breaking change (fix or feature that would cause existing functionality to change)

When log_directory is unspecified, Postgres now logs outside the data directory, on one of the attached persistent volumes. The allowed values for the log_directory parameter have been reduced to paths that start with one of a handful of absolute prefixes, and each prefix must refer to a volume that is also declared in the spec.

Users who want to keep logging inside the data directory must set the log_directory parameter to log. That value is safe to assign even before this change.
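The validation described above can be sketched in Go. The review discussion indicates the PR expresses these checks as CEL rules on the CRD, so the code below only mirrors the logic and is not the PR's implementation. The prefixes `/pgdata/` and `/pgwal/`, the `volumePrefixes` map, and the `validateLogDirectory` function are all assumptions made for illustration; the actual list of allowed prefixes is not shown in this conversation.

```go
package main

import (
	"fmt"
	"strings"
)

// volumePrefixes maps a hypothetical absolute path prefix to the name of the
// volume that must be declared in the spec. These entries are assumptions for
// this sketch, not the list used by the PR.
var volumePrefixes = map[string]string{
	"/pgdata/": "pgdata",
	"/pgwal/":  "pgwal",
}

// validateLogDirectory returns nil when value is "log" (the directory inside
// the data directory) or when it starts with a known prefix whose volume is
// present in declared. Any other value is rejected.
func validateLogDirectory(value string, declared map[string]bool) error {
	if value == "log" {
		return nil
	}
	for prefix, volume := range volumePrefixes {
		if strings.HasPrefix(value, prefix) {
			if !declared[volume] {
				return fmt.Errorf("log_directory %q requires the %s volume", value, volume)
			}
			return nil
		}
	}
	return fmt.Errorf("log_directory %q does not start with an allowed prefix", value)
}

func main() {
	// Only the pgdata volume is declared in this hypothetical spec.
	declared := map[string]bool{"pgdata": true}
	for _, v := range []string{"log", "/pgdata/logs/postgres", "/pgwal/logs", "/tmp/logs"} {
		fmt.Printf("%-24s -> %v\n", v, validateLogDirectory(v, declared))
	}
}
```

In this sketch the second check plays the role of the spec-level rule: a value under `/pgwal/` is only accepted when a pgwal volume is declared.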

Other Information:

Issue: PGO-2558

cbandy and others added 3 commits September 11, 2025 12:28
This is a tightening of validation compared to v1beta1. The parameter
must have one of a handful of prefixes. A spec-level rule ensures the
value refers to a volume declared in the spec.

Co-authored-by: Benjamin Blattberg <[email protected]>
Co-authored-by: Drew Sessler <[email protected]>
Issue: PGO-2558
This is further protection against Postgres data loss due to a
misconfigured or maliciously configured log directory.

Co-authored-by: Benjamin Blattberg <[email protected]>
Co-authored-by: Drew Sessler <[email protected]>
Issue: PGO-2558
This has been the recommended practice for ages, but PGO left this
parameter at Postgres' packaged default. Being outside the data
directory makes the log directory and files:

 - easier to create
 - easier to consume from the OpenTelemetry Collector
 - easier to exclude from backups
 - persistent across failures that recover by recreating the data directory

Users that wish to log to the default directory can
set "spec.config.parameters.log_directory" to "log" on PostgresCluster.

Co-authored-by: Benjamin Blattberg <[email protected]>
Co-authored-by: Drew Sessler <[email protected]>
Issue: PGO-2558
@cbandy cbandy force-pushed the postgres-log-directory branch from fb42b48 to a3078e0 Compare September 11, 2025 17:30
snyk-io bot commented Sep 11, 2025

🎉 Snyk checks have passed. No issues have been found so far.

security/snyk check is complete. No issues have been found. (View Details)

license/snyk check is complete. No issues have been found. (View Details)

Rules []PostgresHBARuleSpec `json:"rules,omitempty"`
}

type PostgresConfigSpec struct {
Contributor

We're only using this struct right now; but are you thinking it's easier to copy over all the v1beta1 structs in this file so that we can keep them aligned (up until we drop v1beta1 or let them become unaligned)?

Member Author

I wasn't thinking at all! Which seems better to you presently?

Contributor

At first I was going to say, "bring over everything so we can keep things aligned" and then I realized that the reason to have a v1 is to change things. Here we have a new CEL rule on parameters, and an editing of existing CEL rules (mostly reorganizing messages as the last part, I think) for PostgresHBARuleSpec -- but then we're not using the v1.PostgresHBARuleSpec anywhere...

So now I lean lightly towards "just bring over the things that have changed."

}
}

// sensitiveAbsolutePath is used by [sanitizeLogDirectory].
Contributor

Might've helped me reading at first to understand sensitiveAbsolutePath and sensitiveRelativePath were disallowed paths.

Contributor

@benjaminjb benjaminjb left a comment

No blockers. I like that we're getting logs out of the data directory by default.

The only question I have is about sanitizing. We have the CEL rules around log_directory for v1, so theoretically any value we get should be safe to use for this cluster. (e.g., if the log_directory is in pgwal, they have a pgwal volume set up.)

So the sanitizing is really for v1beta1, yeah?

But also, is this our first instance of doing some real sanitizing of user-set configs?

Member Author

cbandy commented Sep 11, 2025

> The only question I have is about sanitizing. We have the CEL rules around log_directory for v1, so theoretically any value we get should be safe to use for this cluster. (e.g., if the log_directory is in pgwal, they have a pgwal volume set up.)
>
> So the sanitizing is really for v1beta1, yeah?
>
> But also, is this our first instance of doing some real sanitizing of user-set configs?

I'm conflicted about this, too. I don't have all our validation in my head, but I don't think much of it so far has been about safety. When I skim, most of what I see are riffs on optional, required, and max-items. The label and subdomain validation is an early detection of values rejected (and not acted on) by Kubernetes.

I do recall:

  • these are about SQL injection, repeated in Go.
  • this about accessing files, again in Go.
  • this about identities, also in Go.

> So the sanitizing is really for v1beta1, yeah?

Yeah, I guess so.

Most of my concern is that (without "sanitize") there are values the controller code cannot handle. There are lots of values for log directory inside PGDATA that won't bootstrap, or won't scrape, etc. The code now tries to adjust to known good values, but another approach could be to error out completely: "donk; invalid!"

There's also an almost-zero chance that someone could poke a CRD to remove some validation, then submit a resource our controller enacts, which allows or does something "bad." 🤔 Rather pedantic.
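A minimal sketch of the "adjust to known good values" approach, for comparison with erroring out. This is not the PR's `sanitizeLogDirectory`; the `sanitize` function, the `dataDirectory` and `fallback` constants, and the rule that every location inside the data directory other than `log` is replaced are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Hypothetical locations chosen for this sketch only.
const (
	dataDirectory = "/pgdata/pg17"
	fallback      = "/pgdata/logs/postgres"
)

// insideDataDirectory reports whether a cleaned absolute path is the data
// directory itself or somewhere below it.
func insideDataDirectory(p string) bool {
	return p == dataDirectory || strings.HasPrefix(p, dataDirectory+"/")
}

// sanitize returns a log directory this sketch considers safe, plus whether
// the result differs from the requested value.
func sanitize(value string) (string, bool) {
	dir := fallback
	switch cleaned := path.Clean(value); {
	case value == "":
		// Unspecified: use the fallback outside the data directory.
	case !path.IsAbs(cleaned):
		// Postgres resolves relative values against the data directory.
		// Only "log" is kept; other relative values become the fallback.
		if cleaned == "log" {
			dir = "log"
		}
	case !insideDataDirectory(cleaned):
		// Absolute paths outside the data directory are kept, cleaned.
		dir = cleaned
	}
	// Absolute paths inside the data directory match no case above,
	// so they are replaced by the fallback.
	return dir, dir != value
}

func main() {
	for _, v := range []string{"", "log", "pg_wal", "/pgdata/pg17/base", "/pgwal/logs"} {
		dir, changed := sanitize(v)
		fmt.Printf("%-20q -> %-24q changed=%v\n", v, dir, changed)
	}
}
```

Returning the second value leaves the policy to the caller: a controller could apply the adjusted directory silently, surface the adjustment somewhere visible, or treat any adjustment of a non-empty request as an error.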

@benjaminjb
Contributor

> Most of my concern is that (without "sanitize") there are values the controller code cannot handle. There are lots of values for log directory inside PGDATA that won't bootstrap, or won't scrape, etc. The code now tries to adjust to known good values, but another approach could be to error out completely: "donk; invalid!"

OK, if I remember that a spec is a request, then fixing that request seems reasonable. And I suppose someone who wanted to find the logs shouldn't look at the spec as the source of truth, but could get the location of the logs from PG / the settings configmap. (Maybe we make that suggestion in the docs.)

But do you think it's worth emitting an event? Or adding it to the status? (People probably don't look at events unless something goes wrong, but status feels a little too much to me right now.)
