@silent-observer silent-observer commented Nov 27, 2024

gplogfilter: fix timerange validation

Previously, filtering log files by time range was incorrect. The check for whether a log file belongs to the specified time range worked like a logical AND (if time < begin AND time > end, skip the file), which can never be true for a valid range; something like a logical OR is needed here. Thus file filtering never worked. This validation was also executed after the file was opened, so a failed validation caused redundant open/close operations. This patch fixes this behavior by:

  • moving the logic that checks whether a log file name falls within the user-specified range into a separate function (this also helps to cover corner cases)
  • skipping a file that doesn't belong to the range at the beginning of the loop, to prevent redundant open/close operations
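
The difference between the two checks can be sketched roughly as follows. This is only an illustration, not the code from the patch: `file_timestamp`, `old_should_skip` and `should_skip` are hypothetical names, the file name pattern is inferred from the `gpdb-YYYY-MM-DD_HHMMSS.csv` names shown later in this thread, and the `OR` condition is a literal reading of the description above (the real function may treat corner cases differently).

```python
import re
from datetime import datetime

# Assumed pattern, based on names such as gpdb-2024-12-03_163647.csv.
_NAME_RE = re.compile(r"gpdb-(\d{4}-\d{2}-\d{2}_\d{6})\.csv$")


def file_timestamp(path):
    """Return the timestamp encoded in a log file name, or None if absent."""
    match = _NAME_RE.search(path)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d_%H%M%S")


def old_should_skip(ts, begin, end):
    # Old-style check: when begin <= end, ts cannot be both before begin
    # and after end, so this never returns True.
    return ts < begin and ts > end


def should_skip(path, begin, end):
    """Decide from the file name alone, before the file is opened."""
    ts = file_timestamp(path)
    if ts is None:
        # No timestamp in the name (e.g. startup.log): cannot rule it out.
        return False
    # Literal reading of the description; a file that starts before `begin`
    # can still hold later entries, so a real check may need to be looser.
    return ts < begin or ts > end
```

With such a helper, the loop over files can call `should_skip()` first and `continue` before any `open()` happens.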

Changes from original commit:

  1. Fix print() syntax for Python 3
  2. Add StringIO.seek(0) after StringIO.truncate(0) for Python 3
  3. Make sure gplogfilter doesn't skip the first line.
  4. Update behave test for GPDB 7 (use COORDINATOR_DATA_DIRECTORY and /log)
  5. Add file content checking to behave test (counting lines that were read)
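
For context on item 2, here is a minimal standalone sketch of the Python 3 `io.StringIO` behavior involved. It is not taken from gplogfilter; the function names are made up for illustration. In Python 3, `truncate(0)` drops the content but leaves the stream position where it was, so the next write lands past the end and the gap is padded.

```python
import io


def reuse_without_seek():
    """Reuse a buffer the Python 2 way: truncate only."""
    buf = io.StringIO()
    buf.write("first")
    buf.truncate(0)   # content is gone, but the position is still 5
    buf.write("second")
    return buf.getvalue()


def reuse_with_seek():
    """Reuse a buffer with the position reset as well."""
    buf = io.StringIO()
    buf.write("first")
    buf.truncate(0)
    buf.seek(0)       # move back to the start before writing again
    buf.write("second")
    return buf.getvalue()
```

`reuse_with_seek()` returns just `"second"`, while `reuse_without_seek()` returns `"second"` preceded by padding characters, which is the kind of garbage a reused buffer can leak into its output.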

(cherry picked from commit e971286)


Note: do not squash to preserve authorship

@silent-observer silent-observer marked this pull request as ready for review November 28, 2024 11:38
@RekGRpth

This comment was marked as resolved.

RekGRpth previously approved these changes Nov 29, 2024
@silent-observer
Author

That's the default way cherry-picked commits work (preserving the PR link from the old PR). Should it always be changed to the new PR?

@RekGRpth

This comment was marked as resolved.

@silent-observer silent-observer changed the title from "ADBDEV-4349: gplogfilter: fix timerange validation (#627)" to "ADBDEV-4349: gplogfilter: fix timerange validation" Nov 29, 2024
@hilltracer hilltracer self-requested a review December 2, 2024 10:42
@hilltracer

hilltracer commented Dec 3, 2024

Looks like --find (-f) is not working in GPDB 7

GPDB 6 behaviour
gpadmin@gpdb6u:~/src/arenadata.sh$ ls -l /home/gpadmin/.data/qddir/demoDataDir-1/pg_log/
total 68
-rw------- 1 gpadmin gpadmin 14836 Dec  3 13:37 gpdb-2024-12-03_163647.csv
-rw------- 1 gpadmin gpadmin 11744 Dec  3 13:37 gpdb-2024-12-03_163710.csv
-rw------- 1 gpadmin gpadmin 34809 Dec  3 13:37 gpdb-2024-12-03_163713.csv
-rw------- 1 gpadmin gpadmin   714 Dec  3 13:37 startup.log
gpadmin@gpdb6u:~/src/arenadata.sh$ head -n 3 /home/gpadmin/.data/qddir/demoDataDir-1/pg_log/gpdb-2024-12-03_163647.csv
2024-12-03 16:36:47.681750 MSK,,,p18626,th704555456,,,,0,,,seg-1,,,,,"LOG","00000","database system was shut down at 2024-12-03 16:36:46 MSK",,,,,,,0,,"xlog.c",6602,
2024-12-03 16:36:47.681838 MSK,,,p18626,th704555456,,,,0,,,seg-1,,,,,"LOG","00000","end of transaction log location is 0/5DF4550",,,,,,,0,,"xlog.c",7715,
2024-12-03 16:36:47.681849 MSK,,,p18626,th704555456,,,,0,,,seg-1,,,,,"LOG","00000","latest completed transaction id is 713 and next transaction id is 714",,,,,,,0,,"xlog.c",8089,
gpadmin@gpdb6u:~/src/arenadata.sh$ gplogfilter -b "2024-12-03 16:36:46" -e "2024-12-03 16:36:48" -f 'database system was shut down'
requested timestamp range from 2024-12-03 16:36:46 to 2024-12-03 16:36:48
----------  /home/gpadmin/.data/qddir/demoDataDir-1/pg_log/gpdb-2024-12-03_163647.csv ---------- 
2024-12-03 16:36:47.681750 MSK|||p18626|th704555456||||0|||seg-1|||||LOG: |00000|database system was shut down at 2024-12-03 16:36:46 MSK|||||||0||xlog.c|6602|
       in:      56 lines,      56 log entries; timestamps from 2024-12-03 16:36:47.681750 to 2024-12-03 16:37:03.185090
  time ok:       9 lines,       9 log entries; timestamps from 2024-12-03 16:36:47.681750 to 2024-12-03 16:36:47.686136
    match:       1 lines,       1 log entries; timestamps from 2024-12-03 16:36:47.681750 to 2024-12-03 16:36:47.681750
      out:       1 lines,       1 log entries; timestamps from 2024-12-03 16:36:47.681750 to 2024-12-03 16:36:47.681750
SKIP file: /home/gpadmin/.data/qddir/demoDataDir-1/pg_log/gpdb-2024-12-03_163710.csv
SKIP file: /home/gpadmin/.data/qddir/demoDataDir-1/pg_log/gpdb-2024-12-03_163713.csv
----------  /home/gpadmin/.data/qddir/demoDataDir-1/pg_log/startup.log ---------- 
       in:       3 lines,       3 log entries; timestamps from 2024-12-03 16:36:47.680281 to 2024-12-03 16:37:13.336233
  time ok:       1 lines,       1 log entries; timestamps from 2024-12-03 16:36:47.680281 to 2024-12-03 16:36:47.680281
    match:       0 lines
      out:       0 lines,       0 log entries
GPDB 7 behaviour
gpadmin@gpdb7u:~/src/arenadata.sh$ ls -l /home/gpadmin/.data/qddir/demoDataDir-1/log/
total 40
-rw------- 1 gpadmin gpadmin  7158 Dec  3 13:40 gpdb-2024-12-03_164049.csv
-rw------- 1 gpadmin gpadmin  4823 Dec  3 13:41 gpdb-2024-12-03_164059.csv
-rw------- 1 gpadmin gpadmin 19975 Dec  3 13:41 gpdb-2024-12-03_164102.csv
-rw------- 1 gpadmin gpadmin  3235 Dec  3 13:41 startup.log
gpadmin@gpdb7u:~/src/arenadata.sh$ head -n 3 /home/gpadmin/.data/qddir/demoDataDir-1/log/gpdb-2024-12-03_164049.csv
2024-12-03 16:40:49.925554 MSK,,,p234151,th-1828210240,,,,0,,,seg-1,,,,,"LOG","00000","database system was shut down at 2024-12-03 16:40:49 MSK",,,,,,,0,,"xlog.c",6626,
2024-12-03 16:40:49.925582 MSK,,,p234151,th-1828210240,,,,0,,,seg-1,,,,,"LOG","00000","end of transaction log location is 0/471D7A0",,,,,,,0,,"xlog.c",7861,
2024-12-03 16:40:49.925591 MSK,,,p234151,th-1828210240,,,,0,,,seg-1,,,,,"LOG","00000","latest completed transaction id is 524 and next transaction id is 525",,,,,,,0,,"xlog.c",8250,
gpadmin@gpdb7u:~/src/arenadata.sh$ gplogfilter -b "2024-12-03 16:40:48" -e "2024-12-03 16:40:50" -f 'database system was shut down'
requested timestamp range from 2024-12-03 16:40:48 to 2024-12-03 16:40:50
---------- /home/gpadmin/.data/qddir/demoDataDir-1/log/gpdb-2024-12-03_164049.csv ----------
       in:      30 lines,       1 log entries; timestamps from 2024-12-03 16:40:49.925554 to 2024-12-03 16:40:49.925554
  time ok:       0 lines
    match:       0 lines
      out:       0 lines,       0 log entries
SKIP file: /home/gpadmin/.data/qddir/demoDataDir-1/log/gpdb-2024-12-03_164059.csv
SKIP file: /home/gpadmin/.data/qddir/demoDataDir-1/log/gpdb-2024-12-03_164102.csv
---------- /home/gpadmin/.data/qddir/demoDataDir-1/log/startup.log ----------
       in:      15 lines,      15 log entries; timestamps from 2024-12-03 16:40:49.912866 to 2024-12-03 16:41:02.150204
  time ok:       4 lines,       4 log entries; timestamps from 2024-12-03 16:40:49.913130 to 2024-12-03 16:40:49.923857
    match:       0 lines
      out:       0 lines,       0 log entries

Is this expected behavior, and what should we do about it?

@silent-observer
Author

This is actually -b/-e failing to filter lines because CsvFlatten outputs nonsense on Python 3. I suggest fixing it here, even though this probably affected other things too.

@silent-observer silent-observer changed the title from "ADBDEV-4349: gplogfilter: fix timerange validation" to "ADBDEV-5487: gplogfilter: fix timerange validation" Dec 5, 2024
@silent-observer
Author

Upgraded the behave tests to check the actual log lines as well (this should detect the previous issue)

@RekGRpth

This comment was marked as resolved.

@whitehawk

This comment was marked as resolved.

silent-observer pushed a commit that referenced this pull request Dec 6, 2024
Previously, filtering log files by time range was incorrect: the check
for whether a log file belongs to the specified time range worked like
a logical `AND` (if time < begin AND time > end, skip the file), which
can never be true for a valid range; something like a logical `OR` is
needed here. Thus file filtering never worked. This validation was also
executed after the file was opened, so a failed validation caused
redundant `open/close` operations. This patch fixes this behavior by:
- moving the logic that checks whether a log file name falls within the
  user-specified range into a separate function (this also helps to
  cover corner cases)
- skipping a file that doesn't belong to the range at the beginning of
  the loop, to prevent redundant `open/close` operations

Changes from original commit:
1. Fix print() syntax for Python 3
2. Add StopIteration handler (required by Python 3.7)
3. Add StringIO.seek(0) after StringIO.truncate(0) for Python 3
4. Make sure gplogfilter doesn't skip the first line (with a chain iterator)
5. Update behave test for GPDB 7 (use COORDINATOR_DATA_DIRECTORY and /log)
6. Add file content checking to behave test (counting lines that were read)

(cherry picked from commit e971286)
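
Items 2 and 4 of this list describe a general Python 3 pattern that can be sketched as below. This is an assumption about the shape of the fix, not the actual gplogfilter code; `entries` and `entries_unguarded` are hypothetical names. Reading the first item with `next()` consumes it, so it has to be put back (here with `itertools.chain`), and since Python 3.7 a `StopIteration` that escapes inside a generator is turned into a `RuntimeError`, so empty input needs an explicit handler.

```python
from itertools import chain


def entries_unguarded(lines):
    """Peek at the first line without handling empty input."""
    it = iter(lines)
    first = next(it)   # raises StopIteration when `lines` is empty
    yield first
    yield from it


def entries(lines):
    """Peek at the first line, then iterate over all lines including it."""
    it = iter(lines)
    try:
        first = next(it)
    except StopIteration:
        return         # empty input: finish the generator cleanly
    # chain() puts the already-consumed first line back in front.
    for line in chain([first], it):
        yield line
```

Iterating over `entries([])` simply yields nothing, whereas `list(entries_unguarded([]))` fails with `RuntimeError` on Python 3.7 and later.
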
@RekGRpth

This comment was marked as resolved.

silent-observer pushed a commit that referenced this pull request Dec 6, 2024
@silent-observer
Author

Yes, probably. This probably wasn't noticed because the first line is normally "database system was shut down", and nobody searches for that. I think we shouldn't create a separate ticket for GPDB 7 though, only one to backport the new fixes to GPDB 6.

@RekGRpth

This comment was marked as resolved.

silent-observer pushed a commit that referenced this pull request Dec 6, 2024
@silent-observer
Author

Correction: GPDB 6 isn't affected because it uses a while True loop. It was broken in GPDB 7 by commit f5c285a. I think no additional ticket is needed after all.

silent-observer pushed a commit that referenced this pull request Dec 6, 2024
@RekGRpth

This comment was marked as resolved.

@silent-observer
Author

No, already known (ADBDEV-6013)

RekGRpth previously approved these changes Dec 9, 2024

@hilltracer hilltracer left a comment

Ready to approve, please just fix typos and add comments.

@RekGRpth

This comment was marked as resolved.

@silent-observer silent-observer enabled auto-merge (rebase) December 12, 2024 07:04
@silent-observer silent-observer merged commit 7e49463 into adb-7.2.0 Dec 12, 2024
1 check passed
@silent-observer silent-observer deleted the ADBDEV-5487 branch December 12, 2024 09:11