ADBDEV-5487: gplogfilter: fix timerange validation #1134
Conversation
e3c6ae7 to 6afd8a8
That's the default way cherry-picked commits work (preserving the PR link from the old PR). Should it always be changed to the new PR?
6afd8a8 to 87b3a20
Looks like --find (-f) is not working in GPDB 7.
GPDB 6 behaviour:
gpadmin@gpdb6u:~/src/arenadata.sh$ ls -l /home/gpadmin/.data/qddir/demoDataDir-1/pg_log/
total 68
-rw------- 1 gpadmin gpadmin 14836 Dec 3 13:37 gpdb-2024-12-03_163647.csv
-rw------- 1 gpadmin gpadmin 11744 Dec 3 13:37 gpdb-2024-12-03_163710.csv
-rw------- 1 gpadmin gpadmin 34809 Dec 3 13:37 gpdb-2024-12-03_163713.csv
-rw------- 1 gpadmin gpadmin 714 Dec 3 13:37 startup.log
gpadmin@gpdb6u:~/src/arenadata.sh$ head -n 3 /home/gpadmin/.data/qddir/demoDataDir-1/pg_log/gpdb-2024-12-03_163647.csv
2024-12-03 16:36:47.681750 MSK,,,p18626,th704555456,,,,0,,,seg-1,,,,,"LOG","00000","database system was shut down at 2024-12-03 16:36:46 MSK",,,,,,,0,,"xlog.c",6602,
2024-12-03 16:36:47.681838 MSK,,,p18626,th704555456,,,,0,,,seg-1,,,,,"LOG","00000","end of transaction log location is 0/5DF4550",,,,,,,0,,"xlog.c",7715,
2024-12-03 16:36:47.681849 MSK,,,p18626,th704555456,,,,0,,,seg-1,,,,,"LOG","00000","latest completed transaction id is 713 and next transaction id is 714",,,,,,,0,,"xlog.c",8089,
gpadmin@gpdb6u:~/src/arenadata.sh$ gplogfilter -b "2024-12-03 16:36:46" -e "2024-12-03 16:36:48" -f 'database system was shut down'
requested timestamp range from 2024-12-03 16:36:46 to 2024-12-03 16:36:48
---------- /home/gpadmin/.data/qddir/demoDataDir-1/pg_log/gpdb-2024-12-03_163647.csv ----------
2024-12-03 16:36:47.681750 MSK|||p18626|th704555456||||0|||seg-1|||||LOG: |00000|database system was shut down at 2024-12-03 16:36:46 MSK|||||||0||xlog.c|6602|
in: 56 lines, 56 log entries; timestamps from 2024-12-03 16:36:47.681750 to 2024-12-03 16:37:03.185090
time ok: 9 lines, 9 log entries; timestamps from 2024-12-03 16:36:47.681750 to 2024-12-03 16:36:47.686136
match: 1 lines, 1 log entries; timestamps from 2024-12-03 16:36:47.681750 to 2024-12-03 16:36:47.681750
out: 1 lines, 1 log entries; timestamps from 2024-12-03 16:36:47.681750 to 2024-12-03 16:36:47.681750
SKIP file: /home/gpadmin/.data/qddir/demoDataDir-1/pg_log/gpdb-2024-12-03_163710.csv
SKIP file: /home/gpadmin/.data/qddir/demoDataDir-1/pg_log/gpdb-2024-12-03_163713.csv
---------- /home/gpadmin/.data/qddir/demoDataDir-1/pg_log/startup.log ----------
in: 3 lines, 3 log entries; timestamps from 2024-12-03 16:36:47.680281 to 2024-12-03 16:37:13.336233
time ok: 1 lines, 1 log entries; timestamps from 2024-12-03 16:36:47.680281 to 2024-12-03 16:36:47.680281
match: 0 lines
out: 0 lines, 0 log entries
GPDB 7 behaviour:
gpadmin@gpdb7u:~/src/arenadata.sh$ ls -l /home/gpadmin/.data/qddir/demoDataDir-1/log/
total 40
-rw------- 1 gpadmin gpadmin 7158 Dec 3 13:40 gpdb-2024-12-03_164049.csv
-rw------- 1 gpadmin gpadmin 4823 Dec 3 13:41 gpdb-2024-12-03_164059.csv
-rw------- 1 gpadmin gpadmin 19975 Dec 3 13:41 gpdb-2024-12-03_164102.csv
-rw------- 1 gpadmin gpadmin 3235 Dec 3 13:41 startup.log
gpadmin@gpdb7u:~/src/arenadata.sh$ head -n 3 /home/gpadmin/.data/qddir/demoDataDir-1/log/gpdb-2024-12-03_164049.csv
2024-12-03 16:40:49.925554 MSK,,,p234151,th-1828210240,,,,0,,,seg-1,,,,,"LOG","00000","database system was shut down at 2024-12-03 16:40:49 MSK",,,,,,,0,,"xlog.c",6626,
2024-12-03 16:40:49.925582 MSK,,,p234151,th-1828210240,,,,0,,,seg-1,,,,,"LOG","00000","end of transaction log location is 0/471D7A0",,,,,,,0,,"xlog.c",7861,
2024-12-03 16:40:49.925591 MSK,,,p234151,th-1828210240,,,,0,,,seg-1,,,,,"LOG","00000","latest completed transaction id is 524 and next transaction id is 525",,,,,,,0,,"xlog.c",8250,
gpadmin@gpdb7u:~/src/arenadata.sh$ gplogfilter -b "2024-12-03 16:40:48" -e "2024-12-03 16:40:50" -f 'database system was shut down'
requested timestamp range from 2024-12-03 16:40:48 to 2024-12-03 16:40:50
---------- /home/gpadmin/.data/qddir/demoDataDir-1/log/gpdb-2024-12-03_164049.csv ----------
in: 30 lines, 1 log entries; timestamps from 2024-12-03 16:40:49.925554 to 2024-12-03 16:40:49.925554
time ok: 0 lines
match: 0 lines
out: 0 lines, 0 log entries
SKIP file: /home/gpadmin/.data/qddir/demoDataDir-1/log/gpdb-2024-12-03_164059.csv
SKIP file: /home/gpadmin/.data/qddir/demoDataDir-1/log/gpdb-2024-12-03_164102.csv
---------- /home/gpadmin/.data/qddir/demoDataDir-1/log/startup.log ----------
in: 15 lines, 15 log entries; timestamps from 2024-12-03 16:40:49.912866 to 2024-12-03 16:41:02.150204
time ok: 4 lines, 4 log entries; timestamps from 2024-12-03 16:40:49.913130 to 2024-12-03 16:40:49.923857
match: 0 lines
out: 0 lines, 0 log entries
Is this expected behavior, and what should we do about it?
This is actually -b/-e failing to filter lines because CsvFlatten outputs nonsense on Python 3. I suggest fixing it here, even though this probably affected other things too.
32753f4 to f8311ea
f8311ea to 795dc93
Upgraded the behave tests to check the actual log lines as well (this should detect the previous issue).
795dc93 to 5e668a3
Previously, time range filtering of log files was incorrect: the check for whether a log file belongs to the specified time range worked like a logical `AND` (if time < begin AND time > end - skip file), whereas something like a logical `OR` is needed here. As a result, file filtering never worked. In addition, this validation ran after the file was opened, so a failed validation caused redundant `open/close` operations.

This patch fixes that behavior by:
- moving the check of whether a log file name falls within the user-specified range into a separate function (this also helps to cover corner cases)
- skipping a file that does not belong to the range at the beginning of the loop, to prevent redundant `open/close` operations

Changes from original commit:
1. Fix print() syntax for Python 3
2. Add StopIteration handler (required by Python 3.7)
3. Add StringIO.seek(0) after StringIO.truncate(0) for Python 3
4. Make sure gplogfilter doesn't skip the first line (with a chain iterator)
5. Update behave test for GPDB 7 (use COORDINATOR_DATA_DIRECTORY and /log)
6. Add file content checking to behave test (counting lines that were read)

(cherry picked from commit e971286)
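Items 2 and 4 of the change list above can be illustrated with a minimal sketch. This is not the code from the patch: `peek_first_line` is a hypothetical helper that reads the first line (for example, to inspect it) and then uses `itertools.chain` to put it back in front of the remaining lines, so that later processing does not skip it. The explicit `StopIteration` handler covers empty input; under PEP 479 (the default from Python 3.7), a `StopIteration` that escapes a generator body is turned into a `RuntimeError`.

```python
from itertools import chain


def peek_first_line(lines):
    """Return (first_line, iterator); the iterator still yields first_line.

    Hypothetical helper for illustration only, not part of gplogfilter.
    """
    iterator = iter(lines)
    try:
        first = next(iterator)
    except StopIteration:
        # Empty input: handle it here instead of letting StopIteration
        # propagate into a surrounding generator.
        return None, iter(())
    # Re-attach the consumed line in front of the rest of the stream.
    return first, chain([first], iterator)


first, lines = peek_first_line(["line 1", "line 2", "line 3"])
replayed = list(lines)  # contains all three lines, including the first
```

Without the `chain` step, the consumer of `iterator` would start at the second line, which matches the symptom discussed in this conversation.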
5e668a3 to c7dac99
c7dac99 to c8a154d
Yes, probably. This probably wasn't noticed because the first line is normally "database system was shut down", and nobody searches for that. I think we shouldn't create a separate ticket for GPDB 7 though, only one to backport the new fixes to GPDB 6.
Previously, time range filtering of log files was incorrect: the check for whether a log file belongs to the specified time range worked like a logical `AND` (if time < begin AND time > end - skip file), whereas something like a logical `OR` is needed here. As a result, file filtering never worked. In addition, this validation ran after the file was opened, so a failed validation caused redundant `open/close` operations.

This patch fixes that behavior by:
- moving the check of whether a log file name falls within the user-specified range into a separate function (this also helps to cover corner cases)
- skipping a file that does not belong to the range at the beginning of the loop, to prevent redundant `open/close` operations

Changes from original commit:
1. Fix print() syntax for Python 3
2. Add StringIO.seek(0) after StringIO.truncate(0) for Python 3
3. Make sure gplogfilter doesn't skip the first line.
4. Update behave test for GPDB 7 (use COORDINATOR_DATA_DIRECTORY and /log)
5. Add file content checking to behave test (counting lines that were read)

(cherry picked from commit e971286)
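The `StringIO.seek(0)` item in the change list above reflects a Python 3 behavior of `io.StringIO` that is easy to reproduce in isolation. The sketch below is independent of gplogfilter, and both function names are hypothetical: `truncate(0)` empties the buffer but does not move the stream position, so the next write lands at the old offset and the gap is filled with NUL characters.

```python
import io


def reuse_without_seek(first, second):
    """Reuse a buffer with truncate(0) only (stream position is not reset)."""
    buf = io.StringIO()
    buf.write(first)
    buf.truncate(0)  # size becomes 0, position stays at len(first)
    buf.write(second)
    return buf.getvalue()


def reuse_with_seek(first, second):
    """Reuse a buffer with truncate(0) followed by seek(0)."""
    buf = io.StringIO()
    buf.write(first)
    buf.truncate(0)
    buf.seek(0)  # move the position back to the start before writing
    buf.write(second)
    return buf.getvalue()


padded = reuse_without_seek("abc", "xy")  # '\x00\x00\x00xy'
clean = reuse_with_seek("abc", "xy")      # 'xy'
```

Any code that reuses one buffer per log row would emit NUL-padded text without the `seek(0)` call.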
c8a154d to 3c819cb
Correction: GPDB 6 isn't affected because it uses a
3c819cb to 05fb8c2
No, already known (ADBDEV-6013)
hilltracer
left a comment
Ready to approve, please just fix typos and add comments.
29220f3 to 2db4324
gplogfilter: fix timerange validation
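Previously, time range filtering of log files was incorrect: the check for whether a log file belongs to the specified time range worked like a logical `AND` (if time < begin AND time > end - skip file), whereas something like a logical `OR` is needed here. As a result, file filtering never worked. In addition, this validation ran after the file was opened, so a failed validation caused redundant `open/close` operations.

This patch fixes that behavior by:
- moving the check of whether a log file name falls within the user-specified range into a separate function (this also helps to cover corner cases)
- skipping a file that does not belong to the range at the beginning of the loop, to prevent redundant `open/close` operations

Changes from original commit:
1. Fix print() syntax for Python 3
2. Add StringIO.seek(0) after StringIO.truncate(0) for Python 3
3. Make sure gplogfilter doesn't skip the first line.
4. Update behave test for GPDB 7 (use COORDINATOR_DATA_DIRECTORY and /log)
5. Add file content checking to behave test (counting lines that were read)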
(cherry picked from commit e971286)
Note: do not squash to preserve authorship
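The `AND` versus `OR` point in the description can be checked with a small sketch. It only models the boolean condition exactly as the description words it; the function names are hypothetical, and the real helper in the patch is said to cover corner cases that are not attempted here (for instance, a file whose name timestamp precedes the requested begin time may still contain later entries). The file name pattern is taken from the listings pasted earlier in this conversation.

```python
from datetime import datetime


def file_start_time(file_name):
    """Parse the timestamp embedded in a name like gpdb-2024-12-03_163647.csv."""
    return datetime.strptime(file_name, "gpdb-%Y-%m-%d_%H%M%S.csv")


def skip_with_and(t, begin, end):
    """Old condition as described: can never be true when begin <= end."""
    return t < begin and t > end


def skip_with_or(t, begin, end):
    """Condition as described in the fix: skip when t is outside [begin, end]."""
    return t < begin or t > end


begin = datetime(2024, 12, 3, 16, 36, 46)
end = datetime(2024, 12, 3, 16, 36, 48)
names = [
    "gpdb-2024-12-03_163647.csv",
    "gpdb-2024-12-03_163710.csv",
    "gpdb-2024-12-03_163713.csv",
]
old_skips = [skip_with_and(file_start_time(n), begin, end) for n in names]
new_skips = [skip_with_or(file_start_time(n), begin, end) for n in names]
```

With the `AND` form no file is ever skipped, while the `OR` form skips the two later files, which is consistent with the `SKIP file` lines in the GPDB 6 output pasted above.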