Add Prometheus alert metrics #1140

Open
MemoMeto35 wants to merge 1 commit into
pgmoneta:mainfrom
MemoMeto35:alert-metrics

Conversation

@MemoMeto35

This PR solves #1096

@jesperpedersen
Member

@MemoMeto35 Remember the authors files - see DEVELOPERS.md

@jesperpedersen
Member

@MemoMeto35 And, the manual

@MemoMeto35
Author

Hello @jesperpedersen, thanks for the feedback. I wanted to ask about your comment regarding the manual. Do we need a new chapter for alerts, or should it go in prometheus.md? I believe I added it to prometheus.md in my last commit.

@Abdelrhmansersawy
Collaborator

@MemoMeto35

Hello, we need to add a section to the manual on how users can configure alert rules with Grafana, similar to pgexporter:
https://github.com/pgexporter/pgexporter/blob/main/doc/manual/en/09-grafana.md#alerting-with-grafana

We also need to add an introductory guide explaining alerts and why they are important (similar to the first part of pgexporter here: https://github.com/pgexporter/pgexporter/blob/main/doc/ALERT.md).

I think we can keep all of this in prometheus.md, without needing a separate alert.md, @jesperpedersen, WDYT?

@jesperpedersen
Member

Yes, it belongs in the Prometheus chapter, and in doc/PROMETHEUS.md

@MemoMeto35
Author

Hi @jesperpedersen @Abdelrhmansersawy, thanks for your feedback. I applied the requested changes. Let me know if any further modifications are needed!

Collaborator

@Abdelrhmansersawy left a comment

Great work, I have just left a few comments.

Comment thread doc/PROMETHEUS.md
Comment thread doc/PROMETHEUS.md
All alert metrics carry three labels: `server` (the server identifier), `alert` (the alert
name), and `type` (the alert type, currently `state`).

# Alerting with Grafana
Collaborator

Could you add images illustrating each step, similar to pgexporter?
https://github.com/pgexporter/pgexporter/blob/main/doc/manual/en/09-grafana.md#alerting-with-grafana

We try to keep things consistent across projects.

Comment thread src/include/configuration.h Outdated
Comment thread doc/CONFIGURATION.md Outdated
| libev | `auto` | String | No | Select the [libev](http://software.schmorp.de/pkg/libev.html) backend to use. Valid options: `auto`, `select`, `poll`, `epoll`, `iouring`, `devpoll` and `port` |
| max_rate | 0 | Int | No | The maximum backup transfer rate in bytes per second. Use 0 to disable |
| progress | off | Bool | No | Enable backup progress tracking |
| alert | off | Bool | No | Enable Prometheus alert metrics |
Collaborator

same

Comment thread doc/CONFIGURATION.md Outdated
@Abdelrhmansersawy
Collaborator

@MemoMeto35
Please don't forget to squash your commits.
See https://github.com/pgmoneta/pgmoneta/blob/main/doc/DEVELOPERS.md#multiple-commits

Follow the commit message format [#issue_id] ...

@MemoMeto35 force-pushed the alert-metrics branch 2 times, most recently from c1bec03 to 4b3f10a May 13, 2026 16:34
@MemoMeto35
Author

@Abdelrhmansersawy, your requested changes should be done now. Please let me know if any further modifications are needed.

Comment thread src/libpgmoneta/configuration.c Outdated
Comment thread src/libpgmoneta/configuration.c Outdated
Comment thread src/libpgmoneta/configuration.c Outdated
config->progress = false;
}
}
else if (!strcmp(key, "alert"))
Collaborator

same

Please make sure the name, alert(s), is spelled the same way everywhere (keep it consistent between the docs and the codebase).

Comment thread src/libpgmoneta/configuration.c Outdated
Comment thread src/libpgmoneta/configuration.c Outdated
Comment thread doc/PROMETHEUS.md
@MemoMeto35
Author

Hello @Abdelrhmansersawy, the requested changes have been applied.

Collaborator

@Abdelrhmansersawy left a comment

Good job, just a few things from my side.

config = (struct main_configuration*)shmem;

/* pgmoneta_alert_server_down */
data = pgmoneta_append(data, "#HELP pgmoneta_alert_server_down Alert: server is not online (1 = down, 0 = up)\n");
Collaborator

There should be a space between `#` and `HELP`.

It should be `# HELP`.
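
For reference, the Prometheus text exposition format writes metadata lines as `# HELP` and `# TYPE`, with a space after the `#`. A minimal sketch of the expected output follows. The helper `format_alert_metric` and the label values used with it are hypothetical and not part of pgmoneta; the three labels (`server`, `alert`, `type`) follow the description in doc/PROMETHEUS.md.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not a pgmoneta function): writes the HELP line,
 * the TYPE line and one gauge sample into buf.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int
format_alert_metric(char* buf, size_t size, const char* name, const char* help,
                    const char* server, const char* alert, int value)
{
   int n = snprintf(buf, size,
                    "# HELP %s %s\n"
                    "# TYPE %s gauge\n"
                    "%s{server=\"%s\",alert=\"%s\",type=\"state\"} %d\n",
                    name, help, name, name, server, alert, value);

   /* snprintf reports the length it wanted to write; treat truncation as an error */
   if (n < 0 || (size_t)n >= size)
   {
      return -1;
   }

   return n;
}
```

Calling it with `"pgmoneta_alert_server_down"` as the name would produce one `# HELP` line, one `# TYPE` line and one sample line carrying the three labels.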


/* pgmoneta_alert_server_down */
data = pgmoneta_append(data, "#HELP pgmoneta_alert_server_down Alert: server is not online (1 = down, 0 = up)\n");
data = pgmoneta_append(data, "#TYPE pgmoneta_alert_server_down gauge\n");
Collaborator

same

data = NULL;

/* pgmoneta_alert_wal_streaming_down */
data = pgmoneta_append(data, "#HELP pgmoneta_alert_wal_streaming_down Alert: WAL streaming is not active (1 = down, 0 = streaming)\n");
Collaborator

Same for all of the others...

}
int critical = 0;

if (total_s > 0 && free_s < total_s / 10)
Collaborator

Avoid the magic number. Please add #define PGMONETA_ALERT_DISK_CRITICAL_THRESHOLD 10 at the top.
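
A minimal sketch of how the check could read with the constant in place. The helper `disk_is_critical` is hypothetical and only here to keep the example self-contained; the macro is used as a divisor, matching the `total_s / 10` expression in the diff, so 10 means "less than one tenth free".

```c
#include <stdbool.h>

/* Divisor: the alert fires when free space drops below total / 10 */
#define PGMONETA_ALERT_DISK_CRITICAL_THRESHOLD 10

/* Hypothetical helper, not part of pgmoneta */
static bool
disk_is_critical(unsigned long free_s, unsigned long total_s)
{
   /* total_s == 0 means the size is unknown, so no alert is raised */
   return total_s > 0 && free_s < total_s / PGMONETA_ALERT_DISK_CRITICAL_THRESHOLD;
}
```

If the threshold is ever meant to be a percentage instead of a divisor, the macro name and the expression would both need to change together.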

Collaborator

Replace pgexporter-slack with pgmoneta-slack.

Comment thread doc/PROMETHEUS.md
Collaborator

We should also add the Alert sections to the manual's 10-prometheus.md,
in both en and es.

srv.workers = -1;
srv.max_rate = -1;
srv.progress_enabled = -1;
srv.alert_enabled = -1;
Collaborator

We also need to set the default value: config->alerts = false.

continue;
}
int stale = 0;
int retention = config->common.servers[i].retention_days;
Collaborator

Do we also need to consider retention_weeks/months/years, or should we just skip the check when no retention is configured?

if (backup_ts > 0)
{
time_t now = time(NULL);
double age_days = difftime(now, backup_ts) / 86400.0;
Collaborator

Replace the magic number 86400 with a #define.
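
Something along these lines, as a sketch. The macro name `PGMONETA_SECONDS_PER_DAY` and the helper `backup_age_days` are hypothetical; pick whatever naming fits the existing headers.

```c
#include <time.h>

/* Hypothetical name for the constant; 24 * 60 * 60 seconds */
#define PGMONETA_SECONDS_PER_DAY 86400.0

/* Hypothetical helper: age of a backup in days, relative to now */
static double
backup_age_days(time_t now, time_t backup_ts)
{
   return difftime(now, backup_ts) / PGMONETA_SECONDS_PER_DAY;
}
```

Passing `now` in as a parameter, instead of calling `time(NULL)` inside the helper, keeps the calculation easy to test.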

Comment on lines +1764 to +1765
free_s = pgmoneta_free_space(base_dir);
total_s = pgmoneta_total_space(base_dir);
Collaborator

free_s and total_s are computed once from config->base_dir, then emitted
for every server with only the server= label changing. All servers report
the exact same value.
Could we move the measurement inside the loop and use each server's own backup
path? Something like:

for (int i = 0; i < config->common.number_of_servers; i++)
{
   if (!pgmoneta_is_alert_enabled(i))
   {
      continue;
   }

   char* server_path = pgmoneta_get_server_backup(i);
   unsigned long free_s = pgmoneta_free_space(server_path);
   unsigned long total_s = pgmoneta_total_space(server_path);
   free(server_path);

   int critical = (total_s > 0 && free_s < total_s / 10) ? 1 : 0;

   /* ... emit metric ... */
}
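
For illustration, here is a self-contained sketch of a per-path measurement built on POSIX `statvfs`. The helper `path_space` is hypothetical and only stands in for `pgmoneta_free_space()` / `pgmoneta_total_space()`, whose actual implementation may differ.

```c
#include <sys/statvfs.h>

/*
 * Hypothetical stand-in for a per-path measurement.
 * Returns 0 on success and fills in free and total bytes, 1 on error.
 */
static int
path_space(const char* path, unsigned long long* free_b, unsigned long long* total_b)
{
   struct statvfs st;

   if (statvfs(path, &st) != 0)
   {
      return 1;
   }

   /* f_bavail: blocks available to unprivileged users, f_frsize: fragment size */
   *free_b = (unsigned long long)st.f_bavail * st.f_frsize;
   *total_b = (unsigned long long)st.f_blocks * st.f_frsize;

   return 0;
}
```

Note that servers whose backup directories sit on the same filesystem would still report identical numbers; measuring per server only changes the result when the directories are on different mounts.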

Labels

feature New feature

3 participants