@bimboterminator1
Member

bimboterminator1 commented Feb 25, 2025

The proposed code contains the main framework for rebalance execution.
Some options are not fully implemented and are expected to be finished in
follow-up tasks.

The code implements the following segment movement approach. First, we create
a movement plan: a list of simple steps, each stating which segment moves to
which host. A step in the plan can be one of the following types:

  1. Mirror only moves.
  2. Both primary and mirror are moved to different hosts.
  3. Primary only moves.
  4. Primary and mirror are swapped.

For each type of movement we determine the target ports and the target
directories on the target hosts that have enough free space to hold the moved
segment. The DiskFree and DiskUsage commands are used for this.
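
The directory selection described above could look roughly like the following sketch. It assumes the free space per candidate directory and the size of the segment have already been collected (in the PR these come from the DiskFree and DiskUsage commands). The function name `pick_target_dir`, the `safety_margin` parameter, and the sample paths are hypothetical and are not taken from the PR.

```python
from typing import Dict, Optional


def pick_target_dir(free_bytes_by_dir: Dict[str, int],
                    segment_bytes: int,
                    safety_margin: float = 0.1) -> Optional[str]:
    """Return the candidate directory with the most free space that can hold
    the segment plus a safety margin, or None if no candidate is big enough.

    free_bytes_by_dir: free space per candidate directory (hypothetical input).
    segment_bytes: on-disk size of the segment to be moved.
    safety_margin: extra headroom as a fraction of the segment size (assumption).
    """
    required = int(segment_bytes * (1 + safety_margin))
    fitting = {path: free for path, free in free_bytes_by_dir.items()
               if free >= required}
    if not fitting:
        return None
    # Prefer the directory with the most free space among those that fit.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    candidates = {"/data/primary1": 100, "/data/primary2": 500}
    print(pick_target_dir(candidates, segment_bytes=200))
```

Returning `None` rather than raising keeps the decision about what to do with an unplaceable segment in the caller, which is one possible design among several.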

The movements, in turn, are composite and involve extra actions, including
segment role switching.

  1. A mirror-only move uses a single gprecoverseg call.
  2. If both the primary and the mirror of a pair are moved, the strategy is as
    follows. The mirror is first moved via gprecoverseg to the primary's target
    host. Then the roles are switched. Then the ex-primary (the new mirror) is
    moved to the mirror's target host.
  3. A primary-only move involves two role switches: switch, move, switch.
  4. A primary/mirror swap is executed similarly to type 2. The mirror is moved
    to the primary dir on its own host. Switch. The ex-primary is moved to the
    mirror dir on its own host.
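
As a rough illustration of the composite movements listed above, the sketch below maps each movement type to its ordered actions. The enum `MoveKind`, the `STEPS` table, and `expand` are hypothetical names for this example only, and the step strings paraphrase the description rather than reflect the PR's actual classes.

```python
from enum import Enum
from typing import Dict, List


class MoveKind(Enum):
    MIRROR_ONLY = 1
    PRIMARY_AND_MIRROR = 2
    PRIMARY_ONLY = 3
    SWAP = 4


# Ordered actions per movement type, paraphrased from the description.
STEPS: Dict[MoveKind, List[str]] = {
    MoveKind.MIRROR_ONLY: [
        "move mirror (single gprecoverseg call)",
    ],
    MoveKind.PRIMARY_AND_MIRROR: [
        "move mirror to the primary's target host (gprecoverseg)",
        "switch roles",
        "move ex-primary (new mirror) to the mirror's target host",
    ],
    MoveKind.PRIMARY_ONLY: [
        "switch roles",
        "move",
        "switch roles",
    ],
    MoveKind.SWAP: [
        "move mirror to the primary dir",
        "switch roles",
        "move ex-primary to the mirror dir",
    ],
}


def expand(kind: MoveKind) -> List[str]:
    """Return a copy of the ordered actions for one planned movement."""
    return list(STEPS[kind])


if __name__ == "__main__":
    for kind in MoveKind:
        print(kind.name, "->", "; ".join(expand(kind)))
```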

The status management is implemented only in general terms and may contain errors.
The cleanup in the finally block of the try statement in the gprebalance file is left in for testing purposes.

How to test until behave tests are written: create a test multi-host
cluster (either in Docker or in the cloud) and generate imbalanced configurations
via gpinitsystem -I conffile

The config file is built according to the template:
hostname~address~port~datadir~dbid~content~replication_port

QD_PRIMARY_ARRAY=mdw~mdw~5432~/home/gpadmin/master/gpseg-1~1~-1~0
declare -a PRIMARY_ARRAY=(
sdw1~sdw1~10000~/home/gpadmin/primary/gpseg0~2~0~11000
sdw1~sdw1~10001~/home/gpadmin/primary/gpseg1~3~1~11001
sdw2~sdw2~10002~/home/gpadmin/primary/gpseg2~9~2~11002
sdw2~sdw2~10003~/home/gpadmin/primary/gpseg3~10~3~11003
sdw3~sdw3~10004~/home/gpadmin/primary/gpseg4~13~4~11004
sdw3~sdw3~10005~/home/gpadmin/primary/gpseg5~14~5~11005
sdw3~sdw3~10006~/home/gpadmin/primary/gpseg6~17~6~11006
sdw3~sdw3~10007~/home/gpadmin/primary/gpseg7~18~7~11007
sdw3~sdw3~10008~/home/gpadmin/primary/gpseg8~19~8~11008
)
declare -a MIRROR_ARRAY=(
sdw1~sdw1~10503~/home/gpadmin/mirror/gpseg3~4~3~11503
sdw1~sdw1~10504~/home/gpadmin/mirror/gpseg4~5~4~11504
sdw1~sdw1~10506~/home/gpadmin/mirror/gpseg6~6~6~11506
sdw1~sdw1~10507~/home/gpadmin/mirror/gpseg7~7~7~11507
sdw1~sdw1~10508~/home/gpadmin/mirror/gpseg8~8~8~11508
sdw2~sdw2~10500~/home/gpadmin/mirror/gpseg0~11~0~11500
sdw2~sdw2~10505~/home/gpadmin/mirror/gpseg5~12~5~11505
sdw3~sdw3~10501~/home/gpadmin/mirror/gpseg1~15~1~11501
sdw3~sdw3~10502~/home/gpadmin/mirror/gpseg2~16~2~11502
)
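
To see how imbalanced the configuration above is, the entries can be parsed according to the `hostname~address~port~datadir~dbid~content~replication_port` template and counted per host. This is only a helper sketch for inspecting the file; `SegmentEntry` and `parse_entry` are hypothetical names, and the parser assumes all seven fields are present.

```python
from collections import Counter
from typing import List, NamedTuple


class SegmentEntry(NamedTuple):
    hostname: str
    address: str
    port: int
    datadir: str
    dbid: int
    content: int
    replication_port: int


def parse_entry(line: str) -> SegmentEntry:
    """Split one '~'-separated entry that follows the seven-field template."""
    hostname, address, port, datadir, dbid, content, repl_port = \
        line.strip().split("~")
    return SegmentEntry(hostname, address, int(port), datadir,
                        int(dbid), int(content), int(repl_port))


# The PRIMARY_ARRAY entries from the config above.
PRIMARY_ARRAY = """
sdw1~sdw1~10000~/home/gpadmin/primary/gpseg0~2~0~11000
sdw1~sdw1~10001~/home/gpadmin/primary/gpseg1~3~1~11001
sdw2~sdw2~10002~/home/gpadmin/primary/gpseg2~9~2~11002
sdw2~sdw2~10003~/home/gpadmin/primary/gpseg3~10~3~11003
sdw3~sdw3~10004~/home/gpadmin/primary/gpseg4~13~4~11004
sdw3~sdw3~10005~/home/gpadmin/primary/gpseg5~14~5~11005
sdw3~sdw3~10006~/home/gpadmin/primary/gpseg6~17~6~11006
sdw3~sdw3~10007~/home/gpadmin/primary/gpseg7~18~7~11007
sdw3~sdw3~10008~/home/gpadmin/primary/gpseg8~19~8~11008
""".split()

primaries: List[SegmentEntry] = [parse_entry(line) for line in PRIMARY_ARRAY]
primaries_per_host = Counter(entry.hostname for entry in primaries)
print(dict(primaries_per_host))  # {'sdw1': 2, 'sdw2': 2, 'sdw3': 5}
```

With this config, sdw3 carries five of the nine primaries, which is the imbalance gprebalance is expected to correct.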

@RekGRpth

This comment was marked as resolved.

@RekGRpth
Member

How to test until behave tests are not written: just create a test multi host cluster (either in docker or in cloud) and generate imbalanced configurations via gpinitsystem -I conffile

config file is built according to template:
hostname~address~port~datadir~dbid~content~replication_port

QD_PRIMARY_ARRAY=mdw~mdw~5432~/home/gpadmin/master/gpseg-1~1~-1~0
declare -a PRIMARY_ARRAY=(
sdw1~sdw1~10000~/home/gpadmin/primary/gpseg0~2~0~11000
sdw1~sdw1~10001~/home/gpadmin/primary/gpseg1~3~1~11001
sdw2~sdw2~10002~/home/gpadmin/primary/gpseg2~9~2~11002
sdw2~sdw2~10003~/home/gpadmin/primary/gpseg3~10~3~11003
sdw3~sdw3~10004~/home/gpadmin/primary/gpseg4~13~4~11004
sdw3~sdw3~10005~/home/gpadmin/primary/gpseg5~14~5~11005
sdw3~sdw3~10006~/home/gpadmin/primary/gpseg6~17~6~11006
sdw3~sdw3~10007~/home/gpadmin/primary/gpseg7~18~7~11007
sdw3~sdw3~10008~/home/gpadmin/primary/gpseg8~19~8~11008
)
declare -a MIRROR_ARRAY=(
sdw1~sdw1~10503~/home/gpadmin/mirror/gpseg3~4~3~11503
sdw1~sdw1~10504~/home/gpadmin/mirror/gpseg4~5~4~11504
sdw1~sdw1~10506~/home/gpadmin/mirror/gpseg6~6~6~11506
sdw1~sdw1~10507~/home/gpadmin/mirror/gpseg7~7~7~11507
sdw1~sdw1~10508~/home/gpadmin/mirror/gpseg8~8~8~11508
sdw2~sdw2~10500~/home/gpadmin/mirror/gpseg0~11~0~11500
sdw2~sdw2~10505~/home/gpadmin/mirror/gpseg5~12~5~11505
sdw3~sdw3~10501~/home/gpadmin/mirror/gpseg1~15~1~11501
sdw3~sdw3~10502~/home/gpadmin/mirror/gpseg2~16~2~11502
)

I tried to test using a file
conffile

but got
error

What did I do wrong?

Still managed to successfully launch rebalance in one container!

Had to fix paths and ports.

QD_PRIMARY_ARRAY=cdw~cdw~7000~/home/gpadmin/.data/qddir/demoDataDir-1~1~-1~0
declare -a PRIMARY_ARRAY=(
sdw1~sdw1~10100~/home/gpadmin/.data/sdw1/primary/gpseg0~2~0~11100
sdw1~sdw1~10110~/home/gpadmin/.data/sdw1/primary/gpseg1~3~1~11110
sdw2~sdw2~10220~/home/gpadmin/.data/sdw2/primary/gpseg2~9~2~11220
sdw2~sdw2~10230~/home/gpadmin/.data/sdw2/primary/gpseg3~10~3~11230
sdw3~sdw3~10340~/home/gpadmin/.data/sdw3/primary/gpseg4~13~4~11340
sdw3~sdw3~10350~/home/gpadmin/.data/sdw3/primary/gpseg5~14~5~11350
sdw3~sdw3~10360~/home/gpadmin/.data/sdw3/primary/gpseg6~17~6~11360
sdw3~sdw3~10370~/home/gpadmin/.data/sdw3/primary/gpseg7~18~7~11370
sdw3~sdw3~10380~/home/gpadmin/.data/sdw3/primary/gpseg8~19~8~11380
)
declare -a MIRROR_ARRAY=(
sdw1~sdw1~50130~/home/gpadmin/.data/sdw1/mirror/gpseg3~4~3~51130
sdw1~sdw1~50140~/home/gpadmin/.data/sdw1/mirror/gpseg4~5~4~51140
sdw1~sdw1~50160~/home/gpadmin/.data/sdw1/mirror/gpseg6~6~6~51160
sdw1~sdw1~50170~/home/gpadmin/.data/sdw1/mirror/gpseg7~7~7~51170
sdw1~sdw1~50180~/home/gpadmin/.data/sdw1/mirror/gpseg8~8~8~51180
sdw2~sdw2~50200~/home/gpadmin/.data/sdw2/mirror/gpseg0~11~0~51200
sdw2~sdw2~50250~/home/gpadmin/.data/sdw2/mirror/gpseg5~12~5~51250
sdw3~sdw3~50310~/home/gpadmin/.data/sdw3/mirror/gpseg1~15~1~51310
sdw3~sdw3~50320~/home/gpadmin/.data/sdw3/mirror/gpseg2~16~2~51320
)

But why so slow?! Even on an empty cluster!

gprebalance -n 1
20250227:10:03:30:473322 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.30.gba7c7afd9fc build dev'
20250227:10:03:30:473322 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.30.gba7c7afd9fc build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Feb 27 2025 04:40:54 (with assert checking) Bhuvnesh C.'
20250227:10:03:30:473322 gprebalance:cdw:gpadmin-[INFO]:-Querying gprebalance schema for current state
20250227:10:03:30:473322 gprebalance:cdw:gpadmin-[INFO]:-Validation of rebalance possibility...

You haven't specified desirable mirroring strategy.  Spread mirroring places
a given hosts mirrored segments each on a separate host.  You must be 
using more hosts than the number of segments per host for spread mirroring. 
Grouped mirroring places all of a given hosts segments on a single 
mirrored host.  You must be using at least 2 hosts for grouped strategy.



What type of mirroring strategy would you like?
 spread|grouped (default=grouped):
> 
20250227:10:03:32:473322 gprebalance:cdw:gpadmin-[INFO]:-Validation passed. Preparing rebalance...
20250227:10:03:32:473322 gprebalance:cdw:gpadmin-[INFO]:-Planning rebalance...
20250227:10:03:32:473322 gprebalance:cdw:gpadmin-[INFO]:-Creating expansion schema
20250227:10:03:36:473322 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for move dbid=15
20250227:10:05:01:473322 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for move dbid=16
20250227:10:06:06:473322 gprebalance:cdw:gpadmin-[INFO]:-Executing role swaps for 2 segments
20250227:10:06:16:473322 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for move dbid=18
20250227:10:07:20:473322 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for move dbid=19
20250227:10:08:26:473322 gprebalance:cdw:gpadmin-[INFO]:-Dropping rebalance schema
20250227:10:08:26:473322 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...

Is this expected behavior?
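
To quantify where the time goes, the gaps between consecutive log lines can be computed from the timestamps. The sketch below assumes the log prefix format seen above (`YYYYMMDD:HH:MM:SS:pid ...-[LEVEL]:-message`); `parse_line` and `step_gaps` are hypothetical helper names. Each gap is only the time until the next log line, so it includes whatever else happened in between.

```python
from datetime import datetime
from typing import List, Tuple

# Lines copied from the gprebalance output above.
LOG = """\
20250227:10:03:36:473322 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for move dbid=15
20250227:10:05:01:473322 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for move dbid=16
20250227:10:06:06:473322 gprebalance:cdw:gpadmin-[INFO]:-Executing role swaps for 2 segments
20250227:10:06:16:473322 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for move dbid=18
20250227:10:07:20:473322 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for move dbid=19
20250227:10:08:26:473322 gprebalance:cdw:gpadmin-[INFO]:-Dropping rebalance schema
"""


def parse_line(line: str) -> Tuple[datetime, str]:
    """Extract the timestamp and the message from one log line."""
    fields = line.split(":", 4)  # date, hour, minute, second, rest
    stamp = datetime.strptime(":".join(fields[:4]), "%Y%m%d:%H:%M:%S")
    message = line.split("]:-", 1)[1]
    return stamp, message


def step_gaps(log: str) -> List[Tuple[str, int]]:
    """Pair each message with the number of seconds until the next line."""
    entries = [parse_line(line) for line in log.splitlines() if line.strip()]
    return [(msg, int((nxt - ts).total_seconds()))
            for (ts, msg), (nxt, _) in zip(entries, entries[1:])]


for message, seconds in step_gaps(LOG):
    print(f"{seconds:4d}s  {message}")
```

On this log, the gap after each gprecoverseg line is between 64 and 85 seconds, while the gap after the role swap line is 10 seconds.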

@bimboterminator1
Member Author

bimboterminator1 commented Feb 27, 2025

But why so slow?! Even on an empty cluster!

Parallel execution is now fixed, but all movements are done via gprecoverseg, which launches pg_basebackup, so at the moment I am running out of ideas for how to speed things up. It is quite likely that a rebalance on a loaded cluster will take hours.

@RekGRpth
Member

will take hours

or days/weeks/months


EXECNAME = os.path.split(__file__)[-1]
MAX_BATCH_SIZE = 128
MAX_PARALLEL_WORKERS = 96

Why did you move it instead of changing it in its original place?

Member Author

For the MVP, I suggest focusing on the general and intrinsic aspects of the rebalance feature rather than on where the constants are defined. I can make such cosmetic improvements later, once no crucial questions are left.

parser.add_option('-f', '--target-hosts', metavar='<hosts_file>', dest='filename',
help='yaml containing target hosts configuration')
parser.add_option('-p', '--show-plan', dest='show_plan', action='store_true', default=False,
help='show rebalance plan')

Why was it removed?

Member Author

It's the result of conflict resolution while preparing the current branch for the PR. As we can see, the -p option is still present in the code.

parser.add_option('-e', '--end', type='datetime', metavar='datetime',
help="ending date and time in the format 'YYYY-MM-DD hh:mm:ss'.")
-parser.add_option('-n', '--parallel', type="int", default=1, metavar="<parallel_processes>",
+parser.add_option('-n', '--parallel', type="int", default=4, metavar="<parallel_processes>",

Why did you change the default value?

Member Author

It seems better to have a default parallel degree other than 1.
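
For reference, a minimal standalone sketch of how such a default behaves with `optparse`. The option definition mirrors the diff line under review; the help text, the `parallel_degree` helper, and the range check against `MAX_PARALLEL_WORKERS` are assumptions for illustration, not the PR's actual validation.

```python
from optparse import OptionParser
from typing import List

MAX_PARALLEL_WORKERS = 96  # constant shown in the diff earlier in this review


def build_parser() -> OptionParser:
    parser = OptionParser()
    # Same option as in the diff; the help text is a placeholder.
    parser.add_option('-n', '--parallel', type="int", default=4,
                      metavar="<parallel_processes>",
                      help="number of segment moves to run in parallel")
    return parser


def parallel_degree(argv: List[str]) -> int:
    """Parse argv and return the parallel degree, rejecting values outside
    1..MAX_PARALLEL_WORKERS (this check is an assumption)."""
    options, _args = build_parser().parse_args(argv)
    if not 1 <= options.parallel <= MAX_PARALLEL_WORKERS:
        raise ValueError("parallel degree must be between 1 and %d"
                         % MAX_PARALLEL_WORKERS)
    return options.parallel


if __name__ == "__main__":
    print(parallel_degree([]))           # default applies when -n is omitted
    print(parallel_degree(["-n", "1"]))  # explicit value overrides the default
```

With a default of 4, a run without `-n` is parallel, and anyone who wants the previous serial behavior has to pass `-n 1` explicitly, as in the test run earlier in this thread.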

return balancer.getPlan(balancer.balance())

def save_plan(self, plan: Plan):
# picke the plan in conf directory

typo

Member Author

fixed

@whitehawk

This comment was marked as outdated.

@whitehawk

Is it Ok that after gprebalance the mirrors are not balanced?

gpadmin@cdw:~$ psql -U gpadmin -h localhost -p 7000 postgres -c "select * from gp_segment_configuration where role='m' order by address, content;"
 dbid | content | role | preferred_role | mode | status | port | hostname | address |                                 datadir
------+---------+------+----------------+------+--------+------+----------+---------+--------------------------------------------------------------------------
   20 |      -1 | m    | m              | s    | u      | 7001 | cdw      | cdw     | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/standby
   11 |       0 | m    | m              | s    | u      | 7050 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0
   12 |       1 | m    | m              | s    | u      | 7051 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1
   13 |       2 | m    | m              | s    | u      | 7052 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2
   14 |       3 | m    | m              | s    | u      | 7053 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir3
   15 |       4 | m    | m              | s    | u      | 7054 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir4
   16 |       5 | m    | m              | s    | u      | 7055 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir5
   17 |       6 | m    | m              | s    | u      | 7056 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir6
    9 |       7 | m    | m              | s    | u      | 7019 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/gpseg7
   19 |       8 | m    | m              | s    | u      | 7058 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir8
(10 rows)
Full log:
psql -U gpadmin -h localhost -p 7000 postgres -c "select * from gp_segment_configuration order by address, content;"

        export PGPORT=7000
        export COORDINATOR_DATA_DIRECTORY=/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/qddir/demoDataDir-1
        export MASTER_DATA_DIRECTORY=/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/qddir/demoDataDir-1
 dbid | content | role | preferred_role | mode | status | port | hostname | address |                                 datadir
------+---------+------+----------------+------+--------+------+----------+---------+--------------------------------------------------------------------------
    1 |      -1 | p    | p              | n    | u      | 7000 | cdw      | cdw     | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/qddir/demoDataDir-1
   20 |      -1 | m    | m              | s    | u      | 7001 | cdw      | cdw     | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/standby
    2 |       0 | p    | p              | s    | u      | 7002 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0
   11 |       0 | m    | m              | s    | u      | 7050 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0
   12 |       1 | m    | m              | s    | u      | 7051 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1
    3 |       1 | p    | p              | s    | u      | 7003 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1
   13 |       2 | m    | m              | s    | u      | 7052 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2
   14 |       3 | m    | m              | s    | u      | 7053 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir3
   15 |       4 | m    | m              | s    | u      | 7054 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir4
    4 |       2 | p    | p              | s    | u      | 7004 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2
    5 |       3 | p    | p              | s    | u      | 7005 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir3
   16 |       5 | m    | m              | s    | u      | 7055 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir5
   17 |       6 | m    | m              | s    | u      | 7056 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir6
    6 |       4 | p    | p              | s    | u      | 7006 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir4
    7 |       5 | p    | p              | s    | u      | 7007 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir5
    8 |       6 | p    | p              | s    | u      | 7008 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir6
   18 |       7 | m    | m              | s    | u      | 7057 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir7
    9 |       7 | p    | p              | s    | u      | 7009 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir7
   10 |       8 | p    | p              | s    | u      | 7010 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir8
   19 |       8 | m    | m              | s    | u      | 7058 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir8
(20 rows)

gpadmin@cdw:~$ gprebalance
20250228:02:17:56:012838 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.32.ga642f22d08 build commit:a642f22d085e0edd2863cedea5f7cf425e50e7fb'
20250228:02:17:56:012838 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.32.ga642f22d08 build commit:a642f22d085e0edd2863cedea5f7cf425e50e7fb) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Feb 28 2025 01:59:58 (with assert checking) Bhuvnesh C.'
20250228:02:17:56:012838 gprebalance:cdw:gpadmin-[INFO]:-Querying gprebalance schema for current state
20250228:02:17:56:012838 gprebalance:cdw:gpadmin-[INFO]:-Validation of rebalance possibility...

You haven't specified desirable mirroring strategy.  Spread mirroring places
a given hosts mirrored segments each on a separate host.  You must be
using more hosts than the number of segments per host for spread mirroring.
Grouped mirroring places all of a given hosts segments on a single
mirrored host.  You must be using at least 2 hosts for grouped strategy.



What type of mirroring strategy would you like?
 spread|grouped (default=grouped):
>
20250228:02:17:58:012838 gprebalance:cdw:gpadmin-[INFO]:-Validation passed. Preparing rebalance...
20250228:02:17:58:012838 gprebalance:cdw:gpadmin-[INFO]:-Planning rebalance...
20250228:02:17:58:012838 gprebalance:cdw:gpadmin-[INFO]:-Creating expansion schema
20250228:02:18:02:012838 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 18, content = 7) sdw3|7057|/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir7 sdw1|7016|/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/gpseg7
20250228:02:19:12:012838 gprebalance:cdw:gpadmin-[INFO]:-Executing role swaps for 2 segments
20250228:02:19:25:012838 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 9, content = 7) sdw3|7009|/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir7 sdw2|7019|/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/gpseg7
20250228:02:19:25:012838 gprebalance:cdw:gpadmin-[INFO]:-About to run gprecoverseg for mirror move (dbid = 10, content = 8) sdw3|7010|/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir8 sdw2|7021|/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/gpseg8
20250228:02:20:45:012838 gprebalance:cdw:gpadmin-[INFO]:-Executing role swaps for 1 segments
20250228:02:20:59:012838 gprebalance:cdw:gpadmin-[INFO]:-Dropping rebalance schema
20250228:02:20:59:012838 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...
gpadmin@cdw:~$ psql -U gpadmin -h localhost -p 7000 postgres -c "select * from gp_segment_configuration order by address, content;"
 dbid | content | role | preferred_role | mode | status | port | hostname | address |                                 datadir
------+---------+------+----------------+------+--------+------+----------+---------+--------------------------------------------------------------------------
    1 |      -1 | p    | p              | n    | u      | 7000 | cdw      | cdw     | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/qddir/demoDataDir-1
   20 |      -1 | m    | m              | s    | u      | 7001 | cdw      | cdw     | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/standby
    2 |       0 | p    | p              | s    | u      | 7002 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0
   11 |       0 | m    | m              | s    | u      | 7050 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0
    3 |       1 | p    | p              | s    | u      | 7003 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1
   12 |       1 | m    | m              | s    | u      | 7051 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1
   13 |       2 | m    | m              | s    | u      | 7052 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2
   14 |       3 | m    | m              | s    | u      | 7053 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir3
   15 |       4 | m    | m              | s    | u      | 7054 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir4
   18 |       7 | p    | p              | s    | u      | 7016 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/gpseg7
    4 |       2 | p    | p              | s    | u      | 7004 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2
    5 |       3 | p    | p              | s    | u      | 7005 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir3
   16 |       5 | m    | m              | s    | u      | 7055 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir5
   17 |       6 | m    | m              | s    | u      | 7056 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir6
    9 |       7 | m    | m              | s    | u      | 7019 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/gpseg7
   10 |       8 | p    | p              | s    | u      | 7021 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/gpseg8
    6 |       4 | p    | p              | s    | u      | 7006 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir4
    7 |       5 | p    | p              | s    | u      | 7007 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir5
    8 |       6 | p    | p              | s    | u      | 7008 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir6
   19 |       8 | m    | m              | s    | u      | 7058 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir8
(20 rows)

gpadmin@cdw:~$ psql -U gpadmin -h localhost -p 7000 postgres -c "select * from gp_segment_configuration where role='p' order by address, content;"
 dbid | content | role | preferred_role | mode | status | port | hostname | address |                              datadir
------+---------+------+----------------+------+--------+------+----------+---------+-------------------------------------------------------------------
    1 |      -1 | p    | p              | n    | u      | 7000 | cdw      | cdw     | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/qddir/demoDataDir-1
    2 |       0 | p    | p              | s    | u      | 7002 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0
    3 |       1 | p    | p              | s    | u      | 7003 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1
   18 |       7 | p    | p              | s    | u      | 7016 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/gpseg7
    4 |       2 | p    | p              | s    | u      | 7004 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2
    5 |       3 | p    | p              | s    | u      | 7005 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir3
   10 |       8 | p    | p              | s    | u      | 7021 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/gpseg8
    6 |       4 | p    | p              | s    | u      | 7006 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir4
    7 |       5 | p    | p              | s    | u      | 7007 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir5
    8 |       6 | p    | p              | s    | u      | 7008 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir6
(10 rows)

gpadmin@cdw:~$ psql -U gpadmin -h localhost -p 7000 postgres -c "select * from gp_segment_configuration where role='m' order by address, content;"
 dbid | content | role | preferred_role | mode | status | port | hostname | address |                                 datadir
------+---------+------+----------------+------+--------+------+----------+---------+--------------------------------------------------------------------------
   20 |      -1 | m    | m              | s    | u      | 7001 | cdw      | cdw     | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/standby
   11 |       0 | m    | m              | s    | u      | 7050 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0
   12 |       1 | m    | m              | s    | u      | 7051 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1
   13 |       2 | m    | m              | s    | u      | 7052 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2
   14 |       3 | m    | m              | s    | u      | 7053 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir3
   15 |       4 | m    | m              | s    | u      | 7054 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir4
   16 |       5 | m    | m              | s    | u      | 7055 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir5
   17 |       6 | m    | m              | s    | u      | 7056 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir6
    9 |       7 | m    | m              | s    | u      | 7019 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/gpseg7
   19 |       8 | m    | m              | s    | u      | 7058 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir8
(10 rows)
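
The imbalance in the mirror listing above can be checked mechanically by counting mirror rows per host in the psql output. This is only an inspection sketch. `mirrors_per_host` is a hypothetical helper, and it assumes the ten-column `gp_segment_configuration` layout shown above, with the standby coordinator (content -1) excluded.

```python
from collections import Counter

# Rows copied from the post-rebalance mirror listing above.
PSQL_OUTPUT = """\
 dbid | content | role | preferred_role | mode | status | port | hostname | address |                                 datadir
------+---------+------+----------------+------+--------+------+----------+---------+--------------------------------------------------------------------------
   20 |      -1 | m    | m              | s    | u      | 7001 | cdw      | cdw     | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/standby
   11 |       0 | m    | m              | s    | u      | 7050 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0
   12 |       1 | m    | m              | s    | u      | 7051 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1
   13 |       2 | m    | m              | s    | u      | 7052 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2
   14 |       3 | m    | m              | s    | u      | 7053 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir3
   15 |       4 | m    | m              | s    | u      | 7054 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir4
   16 |       5 | m    | m              | s    | u      | 7055 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir5
   17 |       6 | m    | m              | s    | u      | 7056 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir6
    9 |       7 | m    | m              | s    | u      | 7019 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/gpseg7
   19 |       8 | m    | m              | s    | u      | 7058 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir8
(10 rows)
"""


def mirrors_per_host(psql_output: str) -> Counter:
    """Count segment mirrors per hostname in psql's aligned table output."""
    counts: Counter = Counter()
    for line in psql_output.splitlines():
        cols = [col.strip() for col in line.split("|")]
        # Skip the header, the separator, and the "(N rows)" footer.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        content, role, hostname = int(cols[1]), cols[2], cols[7]
        if role == "m" and content >= 0:  # content -1 is the standby
            counts[hostname] += 1
    return counts


print(dict(mirrors_per_host(PSQL_OUTPUT)))  # {'sdw1': 5, 'sdw2': 3, 'sdw3': 1}
```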

@bimboterminator1
Member Author

bimboterminator1 commented Mar 2, 2025

I got an error

Didn't reproduce. What is in gprecoverseg log files?

@bimboterminator1
Member Author

Is it Ok that after gprebalance the mirrors are not balanced?

Fixed

@RekGRpth
Member

RekGRpth commented Mar 3, 2025

With the following config file:
Config file

I got an error:
Output

How do you run rebalance? In docker? In one container or in several? I have provided a working config for running in one container above.

@RekGRpth
Member

RekGRpth commented Mar 3, 2025

With the following config file:
Config file
I got an error:
Output

How do you run rebalance? In docker? In one container or in several? I have provided a working config for running in one container above.

In cloud, with several vms. I'll try docker setup

This was a response to a comment.

@bimboterminator1
Member Author

bimboterminator1 commented Mar 3, 2025

This was a response to a comment.

Yep, sorry. I misinterpreted it.

@whitehawk

How do you run rebalance? In docker? In one container or in several? I have provided a working config for running in one container #1236 (comment).

Didn't reproduce. What is in gprecoverseg log files?

I guess it is some issue with my local setup, maybe a resource-related one. I tested with 4 containers (1 for the coordinator and 3 for segments). When I tried the same on another machine (in the cloud), I couldn't reproduce the problem. So I'll keep testing on the cloud machine.

@whitehawk

When I tried to rebalance a cluster with the following configuration:

 dbid | content | role | preferred_role | mode | status | port | hostname | address |                                 datadir
------+---------+------+----------------+------+--------+------+----------+---------+--------------------------------------------------------------------------
    1 |      -1 | p    | p              | n    | u      | 7000 | cdw      | cdw     | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/qddir/demoDataDir-1
   20 |      -1 | m    | m              | s    | u      | 7001 | cdw      | cdw     | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/standby
    2 |       0 | p    | p              | s    | u      | 7002 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0
    3 |       1 | p    | p              | s    | u      | 7003 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1
    4 |       2 | p    | p              | s    | u      | 7004 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2
    5 |       3 | p    | p              | s    | u      | 7005 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir3
    6 |       4 | p    | p              | s    | u      | 7006 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir4
    7 |       5 | p    | p              | s    | u      | 7007 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir5
    8 |       6 | p    | p              | s    | u      | 7008 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir6
    9 |       7 | p    | p              | s    | u      | 7009 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir7
   10 |       8 | p    | p              | s    | u      | 7010 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir8
   11 |       0 | m    | m              | s    | u      | 7050 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0
   12 |       1 | m    | m              | s    | u      | 7051 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1
   13 |       2 | m    | m              | s    | u      | 7052 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2
   14 |       3 | m    | m              | s    | u      | 7053 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir3
   15 |       4 | m    | m              | s    | u      | 7054 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir4
   16 |       5 | m    | m              | s    | u      | 7055 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir5
   17 |       6 | m    | m              | s    | u      | 7056 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir6
   18 |       7 | m    | m              | s    | u      | 7057 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir7
   19 |       8 | m    | m              | s    | u      | 7058 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir8
(20 rows)

I got an error:

20250304:07:34:57:012809 gprebalance:cdw:gpadmin-[ERROR]:-gprebalance failed: Host sdw1 does not have any valid primary datadirs for segment SegmentId(dbid=15, contentid=4)
Full configuration file used for cluster init
TRUSTED_SHELL=ssh
ENCODING=UNICODE
SEG_PREFIX=demoDataDir
HEAP_CHECKSUM=on
HBA_HOSTNAMES=0
QD_PRIMARY_ARRAY=cdw~cdw~7000~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/qddir/demoDataDir-1~1~-1~0
declare -a PRIMARY_ARRAY=(
sdw1~sdw1~7002~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0~2~0
sdw2~sdw2~7003~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1~3~1
sdw2~sdw2~7004~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2~4~2
sdw2~sdw2~7005~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir3~5~3
sdw2~sdw2~7006~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir4~6~4
sdw2~sdw2~7007~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir5~7~5
sdw2~sdw2~7008~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir6~8~6
sdw2~sdw2~7009~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir7~9~7
sdw2~sdw2~7010~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir8~10~8
)
declare -a MIRROR_ARRAY=(
sdw3~sdw3~7050~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0~11~0
sdw3~sdw3~7051~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1~12~1
sdw3~sdw3~7052~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2~13~2
sdw3~sdw3~7053~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir3~14~3
sdw3~sdw3~7054~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir4~15~4
sdw3~sdw3~7055~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir5~16~5
sdw3~sdw3~7056~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir6~17~6
sdw3~sdw3~7057~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir7~18~7
sdw3~sdw3~7058~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir8~19~8
)
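
For anyone reproducing this, a minimal sketch of how the entries in this config file can be read, assuming the field order from the PR description (`hostname~address~port~datadir~dbid~content`, optionally followed by a replication port). `SegmentEntry` and `parse_segment_entry` are hypothetical names used only for illustration; they are not part of gprebalance or gpinitsystem.

```python
from typing import NamedTuple


class SegmentEntry(NamedTuple):
    """One '~'-delimited entry from QD_PRIMARY_ARRAY, PRIMARY_ARRAY or MIRROR_ARRAY."""
    hostname: str
    address: str
    port: int
    datadir: str
    dbid: int
    content: int


def parse_segment_entry(line: str) -> SegmentEntry:
    """Parse a single config entry; any fields after the sixth are ignored."""
    fields = line.strip().split("~")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 fields, got {len(fields)}: {line!r}")
    hostname, address, port, datadir, dbid, content = fields[:6]
    return SegmentEntry(hostname, address, int(port), datadir, int(dbid), int(content))


# The mirror entry that the error message refers to (dbid=15, content=4).
entry = parse_segment_entry(
    "sdw3~sdw3~7054~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir4~15~4"
)
print(entry.hostname, entry.dbid, entry.content)
```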
Full log of gprebalance
gpadmin@cdw:~$ gprebalance

psql -U gpadmin -h localhost -p 7000 postgres -c "select * from gp_segment_configuration order by address, content;"
psql -U gpadmin -h localhost -p 7000 postgres -c "select * from gp_segment_configuration where role='p' order by address, content;"
psql -U gpadmin -h localhost -p 7000 postgres -c "select * from gp_segment_configuration where role='m' order by address, content;"
psql -U gpadmin -h localhost -p 7000 postgres -c "select content, role, address from gp_segment_configuration order by content, role;"
20250304:07:34:48:012809 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.35.g486f725235 build commit:486f725235283f7421f6864319c622f778bc7c0e'
20250304:07:34:48:012809 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.35.g486f725235 build commit:486f725235283f7421f6864319c622f778bc7c0e) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar  4 2025 06:39:29 (with assert checking) Bhuvnesh C.'
20250304:07:34:48:012809 gprebalance:cdw:gpadmin-[INFO]:-Querying gprebalance schema for current state
20250304:07:34:48:012809 gprebalance:cdw:gpadmin-[INFO]:-Validation of rebalance possibility...

You haven't specified desirable mirroring strategy.  Spread mirroring places
a given hosts mirrored segments each on a separate host.  You must be
using more hosts than the number of segments per host for spread mirroring.
Grouped mirroring places all of a given hosts segments on a single
mirrored host.  You must be using at least 2 hosts for grouped strategy.



What type of mirroring strategy would you like?
 spread|grouped (default=grouped):
>
20250304:07:34:52:012809 gprebalance:cdw:gpadmin-[INFO]:-Validation passed. Preparing rebalance...
20250304:07:34:52:012809 gprebalance:cdw:gpadmin-[INFO]:-Planning rebalance...
20250304:07:34:52:012809 gprebalance:cdw:gpadmin-[INFO]:-Creating expansion schema
20250304:07:34:57:012809 gprebalance:cdw:gpadmin-[ERROR]:-gprebalance failed: Host sdw1 does not have any valid primary datadirs for segment SegmentId(dbid=15, contentid=4)

Exiting...
20250304:07:34:57:012809 gprebalance:cdw:gpadmin-[INFO]:-Dropping rebalance schema
20250304:07:34:57:012809 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...
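
The mirroring prompt in the log above states two host-count constraints. A minimal sketch of those constraints as a check, assuming they hold exactly as worded in the prompt. `check_mirroring_strategy` is a hypothetical helper, not the tool's actual validation code.

```python
def check_mirroring_strategy(strategy: str, num_hosts: int, segments_per_host: int) -> None:
    """Raise ValueError if the host count cannot satisfy the chosen strategy."""
    if strategy == "spread":
        # Each of a host's mirrors goes to a different host, so there must be
        # more hosts than segments per host.
        if num_hosts <= segments_per_host:
            raise ValueError(
                f"spread mirroring needs more than {segments_per_host} hosts, got {num_hosts}"
            )
    elif strategy == "grouped":
        # All mirrors of one host go to a single other host.
        if num_hosts < 2:
            raise ValueError("grouped mirroring needs at least 2 hosts")
    else:
        raise ValueError(f"unknown mirroring strategy: {strategy!r}")


# 3 segment hosts with 3 primaries each: grouped is possible, spread is not.
check_mirroring_strategy("grouped", num_hosts=3, segments_per_host=3)
```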

@bimboterminator1
Member Author

I got an error

fixed

@whitehawk

I got an error

fixed

My expectation as the user was that the tool would change the original configuration:

 dbid | content | role | preferred_role | mode | status | port | hostname | address |                                 datadir
------+---------+------+----------------+------+--------+------+----------+---------+--------------------------------------------------------------------------
    1 |      -1 | p    | p              | n    | u      | 7000 | cdw      | cdw     | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/qddir/demoDataDir-1
   20 |      -1 | m    | m              | s    | u      | 7001 | cdw      | cdw     | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/standby
    2 |       0 | p    | p              | s    | u      | 7002 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0
    3 |       1 | p    | p              | s    | u      | 7003 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1
    4 |       2 | p    | p              | s    | u      | 7004 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2
    5 |       3 | p    | p              | s    | u      | 7005 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir3
    6 |       4 | p    | p              | s    | u      | 7006 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir4
    7 |       5 | p    | p              | s    | u      | 7007 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir5
    8 |       6 | p    | p              | s    | u      | 7008 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir6
    9 |       7 | p    | p              | s    | u      | 7009 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir7
   10 |       8 | p    | p              | s    | u      | 7010 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir8
   11 |       0 | m    | m              | s    | u      | 7050 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0
   12 |       1 | m    | m              | s    | u      | 7051 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1
   13 |       2 | m    | m              | s    | u      | 7052 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2
   14 |       3 | m    | m              | s    | u      | 7053 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir3
   15 |       4 | m    | m              | s    | u      | 7054 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir4
   16 |       5 | m    | m              | s    | u      | 7055 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir5
   17 |       6 | m    | m              | s    | u      | 7056 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir6
   18 |       7 | m    | m              | s    | u      | 7057 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir7
   19 |       8 | m    | m              | s    | u      | 7058 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir8
(20 rows)

into something like:


 dbid | content | role | preferred_role | mode | status | port | hostname | address |                                 datadir
------+---------+------+----------------+------+--------+------+----------+---------+--------------------------------------------------------------------------
   20 |      -1 | m    | m              | s    | u      | 7001 | cdw      | cdw     | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/standby
   11 |       0 | m    | m              | s    | u      | 7050 | sdw2     | sdw2    | /data/mirror/gpseg0
   12 |       1 | m    | m              | s    | u      | 7051 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1
   13 |       2 | m    | m              | s    | u      | 7052 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2
   14 |       3 | m    | m              | s    | u      | 7053 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir3
    6 |       4 | m    | m              | s    | u      | 7054 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir4
    7 |       5 | m    | m              | s    | u      | 7055 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir5
    8 |       6 | m    | m              | s    | u      | 7056 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir6
    9 |       7 | m    | m              | s    | u      | 7057 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir7
   10 |       8 | m    | m              | s    | u      | 7058 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir8
    1 |      -1 | p    | p              | n    | u      | 7000 | cdw      | cdw     | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/qddir/demoDataDir-1
    2 |       0 | p    | p              | s    | u      | 7002 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0
    3 |       1 | p    | p              | s    | u      | 7003 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1
    4 |       2 | p    | p              | s    | u      | 7004 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2
    5 |       3 | p    | p              | s    | u      | 7005 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir3
   15 |       4 | p    | p              | s    | u      | 7006 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir4
   16 |       5 | p    | p              | s    | u      | 7007 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir5
   17 |       6 | p    | p              | s    | u      | 7008 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir6
   18 |       7 | p    | p              | s    | u      | 7009 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir7
   19 |       8 | p    | p              | s    | u      | 7010 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir8
(20 rows)

But I got:

 dbid | content | role | preferred_role | mode | status | port | hostname | address |                                 datadir
------+---------+------+----------------+------+--------+------+----------+---------+--------------------------------------------------------------------------
   20 |      -1 | m    | m              | s    | u      | 7001 | cdw      | cdw     | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/standby
   11 |       0 | m    | m              | s    | u      | 7013 | sdw2     | sdw2    | /data/mirror/gpseg0
   12 |       1 | m    | m              | s    | u      | 7051 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1
   13 |       2 | m    | m              | s    | u      | 7052 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2
   14 |       3 | m    | m              | s    | u      | 7053 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir3
    6 |       4 | m    | m              | s    | u      | 7011 | sdw2     | sdw2    | /data/mirror/gpseg4
    7 |       5 | m    | m              | s    | u      | 7013 | sdw1     | sdw1    | /data/mirror/gpseg5
    8 |       6 | m    | m              | s    | u      | 7015 | sdw2     | sdw2    | /data/mirror/gpseg6
    9 |       7 | m    | m              | s    | u      | 7017 | sdw1     | sdw1    | /data/mirror/gpseg7
   10 |       8 | m    | m              | s    | u      | 7019 | sdw1     | sdw1    | /data/mirror/gpseg8
    1 |      -1 | p    | p              | n    | u      | 7000 | cdw      | cdw     | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/qddir/demoDataDir-1
    2 |       0 | p    | p              | s    | u      | 7002 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0
    3 |       1 | p    | p              | s    | u      | 7003 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1
    4 |       2 | p    | p              | s    | u      | 7004 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2
    5 |       3 | p    | p              | s    | u      | 7005 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir3
   15 |       4 | p    | p              | s    | u      | 7010 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/gpseg4
   16 |       5 | p    | p              | s    | u      | 7060 | sdw3     | sdw3    | /data/primary/gpseg5
   17 |       6 | p    | p              | s    | u      | 7014 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/gpseg6
   18 |       7 | p    | p              | s    | u      | 7064 | sdw3     | sdw3    | /data/primary/gpseg7
   19 |       8 | p    | p              | s    | u      | 7066 | sdw3     | sdw3    | /data/primary/gpseg8
(20 rows)

  1. Why can't we use the datadirs that were specified in the config file for each segment?
  2. Why do the new datadirs ignore the SEG_PREFIX=demoDataDir setting?
  3. Some primary segments now have a port number that was previously specified for a mirror (and vice versa). Is that OK?

This patch implements the --clean option of the gprebalance utility. It cleans
the status-file, the gprebalance schema in the database, and completely removes
the rebalance directory from the coordinator.

Ticket: ADBDEV-6856

@whitehawk

Is it OK that, after gprebalance has failed, the second run doesn't produce any meaningful message?

gpadmin@cdw:~$ gprebalance
20250310:08:13:24:011741 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.44.g944229a529 build commit:944229a5297268272bb6dd08702c701e57a1d68a'
20250310:08:13:24:011741 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.44.g944229a529 build commit:944229a5297268272bb6dd08702c701e57a1d68a) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 10 2025 07:42:27 (with assert checking) Bhuvnesh C.'
20250310:08:13:24:011741 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...
Full log:
gpadmin@cdw:~$ gprebalance
20250310:08:13:06:011727 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.44.g944229a529 build commit:944229a5297268272bb6dd08702c701e57a1d68a'
20250310:08:13:06:011727 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.44.g944229a529 build commit:944229a5297268272bb6dd08702c701e57a1d68a) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 10 2025 07:42:27 (with assert checking) Bhuvnesh C.'
20250310:08:13:06:011727 gprebalance:cdw:gpadmin-[INFO]:-Validation of rebalance possibility...

You haven't specified desirable mirroring strategy.  Spread mirroring places
a given hosts mirrored segments each on a separate host.  You must be
using more hosts than the number of segments per host for spread mirroring.
Grouped mirroring places all of a given hosts segments on a single
mirrored host.  You must be using at least 2 hosts for grouped strategy.



What type of mirroring strategy would you like?
 spread|grouped (default=grouped):
>
20250310:08:13:08:011727 gprebalance:cdw:gpadmin-[ERROR]:-gprebalance failed: Cannot evenly distribute 8 segments across 3 hosts.

Exiting...
20250310:08:13:08:011727 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...
gpadmin@cdw:~$ gprebalance
20250310:08:13:24:011741 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.44.g944229a529 build commit:944229a5297268272bb6dd08702c701e57a1d68a'
20250310:08:13:24:011741 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.44.g944229a529 build commit:944229a5297268272bb6dd08702c701e57a1d68a) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 10 2025 07:42:27 (with assert checking) Bhuvnesh C.'
20250310:08:13:24:011741 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...
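
The error in the first run is what one would expect from a divisibility check. A minimal sketch, assuming the tool requires the same number of segments on every host. `segments_per_host` is a hypothetical name and the real validation may differ.

```python
def segments_per_host(num_segments: int, num_hosts: int) -> int:
    """Return the per-host segment count, or raise if the split is uneven."""
    if num_hosts <= 0:
        raise ValueError("at least one host is required")
    per_host, remainder = divmod(num_segments, num_hosts)
    if remainder:
        raise ValueError(
            f"Cannot evenly distribute {num_segments} segments across {num_hosts} hosts."
        )
    return per_host


print(segments_per_host(9, 3))  # 9 segments fit evenly on 3 hosts
try:
    segments_per_host(8, 3)     # the case from the log above
except ValueError as exc:
    print(exc)
```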

@RekGRpth
Member

Is it OK that, after gprebalance has failed, the second run doesn't produce any meaningful message?

You need to run a cleanup first.

@whitehawk

You need to run a cleanup first.

It exits without suggesting that. I don't think that is user-friendly.

@bimboterminator1
Member Author

bimboterminator1 commented Mar 10, 2025

gprebalance failed: Host sdw3 does not have any valid primary datadirs for segment SegmentId

You need to create the directory if it does not exist, or provide a valid path.

It exits without suggesting that. I don't think that is user-friendly.

I'll add that

@whitehawk

gprebalance failed: Host sdw3 does not have any valid primary datadirs for segment SegmentId

You need to create the directory if it does not exist, or provide a valid path.

In my case, /data/mirror and /data/primary exist on the segment hosts.

log:
gpadmin@cdw:~$ source /usr/local/greenplum-db-devel/greenplum_path.sh
export PGPORT=7000
export COORDINATOR_DATA_DIRECTORY=/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/qddir/demoDataDir-1
export MASTER_DATA_DIRECTORY=/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/qddir/demoDataDir-1

psql -U gpadmin -h localhost -p 7000 postgres -c "select * from gp_segment_configuration order by address, content;"
 dbid | content | role | preferred_role | mode | status | port | hostname | address |                                 datadir
------+---------+------+----------------+------+--------+------+----------+---------+--------------------------------------------------------------------------
    1 |      -1 | p    | p              | n    | u      | 7000 | cdw      | cdw     | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/qddir/demoDataDir-1
   14 |      -1 | m    | m              | s    | u      | 7001 | cdw      | cdw     | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/standby
    2 |       0 | p    | p              | s    | u      | 7002 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0
    8 |       0 | m    | m              | s    | u      | 7007 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0
    3 |       1 | p    | p              | s    | u      | 7002 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1
    9 |       1 | m    | m              | s    | u      | 7008 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1
    4 |       2 | p    | p              | s    | u      | 7003 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2
   10 |       2 | m    | m              | s    | u      | 7009 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2
    5 |       3 | p    | p              | s    | u      | 7004 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir3
    6 |       4 | p    | p              | s    | u      | 7005 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir4
    7 |       5 | p    | p              | s    | u      | 7006 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir5
   11 |       3 | m    | m              | s    | u      | 7007 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir3
   12 |       4 | m    | m              | s    | u      | 7008 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir4
   13 |       5 | m    | m              | s    | u      | 7009 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir5
(14 rows)

gpadmin@cdw:~$ ssh sdw1 /data/*
bash: line 1: /data/gpdata: Is a directory
gpadmin@cdw:~$ ssh sdw1 ls -l /data/*
total 0
gpadmin@cdw:~$ ssh sdw2 ls -l /data/*
total 0
gpadmin@cdw:~$ ssh sdw3 ls -l /data/*
total 0
gpadmin@cdw:~$ ssh sdw1 mkdir -p /data/mirror
gpadmin@cdw:~$ ssh sdw1 mkdir -p /data/primary
gpadmin@cdw:~$ ssh sdw2 mkdir -p /data/mirror
gpadmin@cdw:~$ ssh sdw2 mkdir -p /data/primary
gpadmin@cdw:~$ ssh sdw3 mkdir -p /data/mirror
gpadmin@cdw:~$ ssh sdw3 mkdir -p /data/primary
gpadmin@cdw:~$ ssh sdw1 ls -l /data/*
total 0
gpadmin@cdw:~$ ssh sdw1 ls -l /data/
total 12
drwxrwxrwx 2 root    root    4096 Mar 10 12:05 gpdata
drwxr-xr-x 2 gpadmin gpadmin 4096 Mar 10 12:10 mirror
drwxr-xr-x 2 gpadmin gpadmin 4096 Mar 10 12:10 primary
gpadmin@cdw:~$ ssh sdw2 ls -l /data/
total 12
drwxrwxrwx 2 root    root    4096 Mar 10 12:05 gpdata
drwxr-xr-x 2 gpadmin gpadmin 4096 Mar 10 12:10 mirror
drwxr-xr-x 2 gpadmin gpadmin 4096 Mar 10 12:10 primary
gpadmin@cdw:~$ ssh sdw3 ls -l /data/
total 12
drwxrwxrwx 2 root    root    4096 Mar 10 12:05 gpdata
drwxr-xr-x 2 gpadmin gpadmin 4096 Mar 10 12:11 mirror
drwxr-xr-x 2 gpadmin gpadmin 4096 Mar 10 12:11 primary
gpadmin@cdw:~$ gprebalance
20250310:12:16:01:009624 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.46.ga6717d979d build commit:a6717d979dba5b42eae8780d3a7104e09d81c93e'
20250310:12:16:01:009624 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.46.ga6717d979d build commit:a6717d979dba5b42eae8780d3a7104e09d81c93e) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 10 2025 11:48:06 (with assert checking) Bhuvnesh C.'
20250310:12:16:01:009624 gprebalance:cdw:gpadmin-[INFO]:-Validation of rebalance possibility...

You haven't specified desirable mirroring strategy.  Spread mirroring places
a given hosts mirrored segments each on a separate host.  You must be
using more hosts than the number of segments per host for spread mirroring.
Grouped mirroring places all of a given hosts segments on a single
mirrored host.  You must be using at least 2 hosts for grouped strategy.



What type of mirroring strategy would you like?
 spread|grouped (default=grouped):
> spread
20250310:12:16:06:009624 gprebalance:cdw:gpadmin-[INFO]:-Validation passed. Preparing rebalance...
20250310:12:16:06:009624 gprebalance:cdw:gpadmin-[INFO]:-Planning rebalance...
20250310:12:16:06:009624 gprebalance:cdw:gpadmin-[INFO]:-Creating expansion schema

The segment (dbid=5, content=3) is about to be moved to host sdw1, but no mirror datadirs are specified for the host.

Enter the mirror datadir prefix (default=/data/mirror):
>

The segment (dbid=7, content=5) is about to be moved to host sdw3, but no primary datadirs are specified for the host.

Enter the primary datadir prefix (default=/data/primary):
>
20250310:12:16:21:009624 gprebalance:cdw:gpadmin-[ERROR]:-gprebalance failed: Host sdw3 does not have any valid primary datadirs for segment SegmentId(dbid=12, contentid=4).None of the set() either exists or has enough free space for segment movement

Exiting...
20250310:12:16:21:009624 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...
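
Note that the message prints `set()`, which suggests the set of candidate datadirs was empty for sdw3, rather than that existing directories failed the check. A minimal local sketch of the kind of filter the message describes (directory exists and has enough free space). The PR description says the tool uses the DiskFree and DiskUsage commands on the target hosts; here `os.path.isdir` and `shutil.disk_usage` stand in for them, and `usable_datadirs` is a hypothetical name.

```python
import os
import shutil
from typing import Iterable, List


def usable_datadirs(candidates: Iterable[str], required_bytes: int) -> List[str]:
    """Keep only candidate directories that exist and have enough free space."""
    usable = []
    for path in candidates:
        if not os.path.isdir(path):
            continue  # missing directory: not a valid target
        if shutil.disk_usage(path).free >= required_bytes:
            usable.append(path)
    return usable


# An empty candidate set always yields no usable datadirs, whatever exists on disk.
print(usable_datadirs(set(), required_bytes=1))
```

If the candidate set is empty, creating /data/primary on the host cannot help, because the directory is never considered.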

@whitehawk

When only the mirrors are unbalanced, gprebalance reports that the cluster is already balanced:

psql -U gpadmin -h localhost -p 7000 postgres -c "select * from gp_segment_configuration order by address, content;"
 dbid | content | role | preferred_role | mode | status | port | hostname | address |                                 datadir
------+---------+------+----------------+------+--------+------+----------+---------+--------------------------------------------------------------------------
    1 |      -1 | p    | p              | n    | u      | 7000 | cdw      | cdw     | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/qddir/demoDataDir-1
   20 |      -1 | m    | m              | s    | u      | 7001 | cdw      | cdw     | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/standby
    2 |       0 | p    | p              | s    | u      | 7002 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0
   11 |       0 | m    | m              | s    | u      | 7050 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0
   12 |       1 | m    | m              | s    | u      | 7051 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1
    3 |       1 | p    | p              | s    | u      | 7003 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1
    4 |       2 | p    | p              | s    | u      | 7004 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2
   13 |       2 | m    | m              | s    | u      | 7052 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2
   14 |       3 | m    | m              | s    | u      | 7053 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir3
   15 |       4 | m    | m              | s    | u      | 7054 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir4
   16 |       5 | m    | m              | s    | u      | 7055 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir5
   17 |       6 | m    | m              | s    | u      | 7056 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir6
   18 |       7 | m    | m              | s    | u      | 7057 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir7
   19 |       8 | m    | m              | s    | u      | 7058 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir8
    5 |       3 | p    | p              | s    | u      | 7005 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir3
    6 |       4 | p    | p              | s    | u      | 7006 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir4
    7 |       5 | p    | p              | s    | u      | 7007 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir5
    8 |       6 | p    | p              | s    | u      | 7008 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir6
    9 |       7 | p    | p              | s    | u      | 7009 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir7
   10 |       8 | p    | p              | s    | u      | 7010 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir8
(20 rows)

gpadmin@cdw:~$
gpadmin@cdw:~$
gpadmin@cdw:~$
gpadmin@cdw:~$ gprebalance
20250311:03:40:08:012918 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.46.ga6717d979d build commit:a6717d979dba5b42eae8780d3a7104e09d81c93e'
20250311:03:40:08:012918 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.46.ga6717d979d build commit:a6717d979dba5b42eae8780d3a7104e09d81c93e) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 10 2025 11:48:06 (with assert checking) Bhuvnesh C.'
20250311:03:40:09:012918 gprebalance:cdw:gpadmin-[INFO]:-Validation of rebalance possibility...

You haven't specified desirable mirroring strategy.  Spread mirroring places
a given hosts mirrored segments each on a separate host.  You must be
using more hosts than the number of segments per host for spread mirroring.
Grouped mirroring places all of a given hosts segments on a single
mirrored host.  You must be using at least 2 hosts for grouped strategy.



What type of mirroring strategy would you like?
 spread|grouped (default=grouped):
>
20250311:03:40:11:012918 gprebalance:cdw:gpadmin-[INFO]:-Cluster is already balanced
20250311:03:40:11:012918 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...

I think it should take mirrors into account as well.

Config file:
TRUSTED_SHELL=ssh
ENCODING=UNICODE
SEG_PREFIX=demoDataDir
HEAP_CHECKSUM=on
HBA_HOSTNAMES=0
QD_PRIMARY_ARRAY=cdw~cdw~7000~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/qddir/demoDataDir-1~1~-1~0
declare -a PRIMARY_ARRAY=(
sdw1~sdw1~7002~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0~2~0
sdw1~sdw1~7003~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1~3~1
sdw1~sdw1~7004~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2~4~2
sdw2~sdw2~7005~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir3~5~3
sdw2~sdw2~7006~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir4~6~4
sdw2~sdw2~7007~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir5~7~5
sdw3~sdw3~7008~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir6~8~6
sdw3~sdw3~7009~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir7~9~7
sdw3~sdw3~7010~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir8~10~8
)
declare -a MIRROR_ARRAY=(
sdw1~sdw1~7050~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0~11~0
sdw1~sdw1~7051~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1~12~1
sdw1~sdw1~7052~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2~13~2
sdw1~sdw1~7053~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir3~14~3
sdw1~sdw1~7054~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir4~15~4
sdw1~sdw1~7055~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir5~16~5
sdw1~sdw1~7056~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir6~17~6
sdw1~sdw1~7057~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir7~18~7
sdw1~sdw1~7058~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir8~19~8
)
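
For anyone scripting such test configurations, here is a minimal sketch of parsing one array entry. It assumes the `~`-separated field order given in the PR description (`hostname~address~port~datadir~dbid~content`, with an optional trailing replication port that this sketch ignores). `SegmentEntry` and `parse_segment_line` are hypothetical names, not part of gpinitsystem or gprebalance.

```python
from typing import NamedTuple


class SegmentEntry(NamedTuple):
    """One entry of PRIMARY_ARRAY / MIRROR_ARRAY (hypothetical helper type)."""
    hostname: str
    address: str
    port: int
    datadir: str
    dbid: int
    content: int


def parse_segment_line(line: str) -> SegmentEntry:
    """Split a '~'-separated entry; fields beyond the sixth are ignored."""
    fields = line.strip().split("~")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 fields, got {len(fields)}: {line!r}")
    hostname, address, port, datadir, dbid, content = fields[:6]
    return SegmentEntry(hostname, address, int(port), datadir, int(dbid), int(content))


entry = parse_segment_line(
    "sdw1~sdw1~7050~/home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0~11~0"
)
print(entry.hostname, entry.port, entry.dbid, entry.content)
```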

@bimboterminator1
Member Author

bimboterminator1 commented Mar 11, 2025

When only mirrors are not balanced, gprebalance says that the cluster is in balanced state:

I discussed this with the architect. He agreed that it is an oversight, but he allowed us to neglect it and keep the current behavior (cluster balance = primary balance) in the MVP.
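
To make the gap concrete, here is a rough sketch (not the gprebalance implementation) that checks balance separately for primaries and mirrors, using the layout from the config file posted above. The function names and the "host counts differ by at most one" criterion are assumptions made for illustration.

```python
from collections import Counter


def per_host_counts(segments, role):
    """Count segments with the given role ('p' or 'm') on each host."""
    return Counter(host for host, seg_role in segments if seg_role == role)


def is_balanced(counts, hosts):
    """Assumed criterion: per-host counts differ by at most one."""
    values = [counts.get(host, 0) for host in hosts]
    return max(values) - min(values) <= 1


hosts = ["sdw1", "sdw2", "sdw3"]
# Layout from the posted config: 3 primaries per host, all 9 mirrors on sdw1.
segments = [(host, "p") for host in hosts for _ in range(3)] + [("sdw1", "m")] * 9

primaries_ok = is_balanced(per_host_counts(segments, "p"), hosts)
mirrors_ok = is_balanced(per_host_counts(segments, "m"), hosts)
print(f"primaries balanced: {primaries_ok}, mirrors balanced: {mirrors_ok}")
```

With a primary-only check this layout is reported as balanced, while a mirror-aware check would flag it.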

In my case /data/mirror and /data/primary exist on segments

Fixed

Why can't we use the datadirs that were specified in the config file for each segment?
Why do the new datadirs ignore the SEG_PREFIX=demoDataDir setting?
Some primary segments now have a port number that was previously specified for a mirror (and vice versa). Is that OK?

For now it's out of scope. I'll ask the architect to confirm this in the parent ticket

@whitehawk

It exits without suggesting it. It is not user friendly I think.

I'll add that

Hm, it is still quietly shutting down:

log
gpadmin@cdw:~$ gprebalance
20250312:05:01:54:012832 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.50.g494f0f045a build commit:494f0f045aff954d3d38e3120d1819c97c668370'
20250312:05:01:54:012832 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.50.g494f0f045a build commit:494f0f045aff954d3d38e3120d1819c97c668370) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 12 2025 04:11:02 (with assert checking) Bhuvnesh C.'
20250312:05:01:54:012832 gprebalance:cdw:gpadmin-[INFO]:-Validation of rebalance possibility...

You haven't specified desirable mirroring strategy.  Spread mirroring places
a given hosts mirrored segments each on a separate host.  You must be
using more hosts than the number of segments per host for spread mirroring.
Grouped mirroring places all of a given hosts segments on a single
mirrored host.  You must be using at least 2 hosts for grouped strategy.



What type of mirroring strategy would you like?
 spread|grouped (default=grouped):
> spread
20250312:05:01:57:012832 gprebalance:cdw:gpadmin-[ERROR]:-gprebalance failed: Cannot support spread mirroring strategy on given configuration. Use cluster utilities like gpresize or gpexpand to get desired cluster configuration

Exiting...
20250312:05:01:57:012832 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...
gpadmin@cdw:~$ gprebalance
20250312:05:02:00:012844 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.50.g494f0f045a build commit:494f0f045aff954d3d38e3120d1819c97c668370'
20250312:05:02:00:012844 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.50.g494f0f045a build commit:494f0f045aff954d3d38e3120d1819c97c668370) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 12 2025 04:11:02 (with assert checking) Bhuvnesh C.'
20250312:05:02:00:012844 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...
gpadmin@cdw:~$
gpadmin@cdw:~$
gpadmin@cdw:~$
gpadmin@cdw:~$ gprebalance
20250312:05:05:23:012876 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.50.g494f0f045a build commit:494f0f045aff954d3d38e3120d1819c97c668370'
20250312:05:05:23:012876 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.50.g494f0f045a build commit:494f0f045aff954d3d38e3120d1819c97c668370) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 12 2025 04:11:02 (with assert checking) Bhuvnesh C.'
20250312:05:05:23:012876 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...
gpadmin@cdw:~$

Not a big issue though for the MVP...

@whitehawk

If one of the primary segments is down, gprebalance should show an error, but it should also list the 'd' segments and ask the user whether to continue. Instead, it now shows an irrelevant error:

postgres=# select * from gp_segment_configuration ;
 dbid | content | role | preferred_role | mode | status | port | hostname | address |                                 datadir
------+---------+------+----------------+------+--------+------+----------+---------+--------------------------------------------------------------------------
    1 |      -1 | p    | p              | n    | u      | 7000 | cdw      | cdw     | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/qddir/demoDataDir-1
    8 |       6 | p    | p              | s    | u      | 7008 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir6
   17 |       6 | m    | m              | s    | u      | 7056 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir6
    4 |       2 | p    | p              | s    | u      | 7004 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2
   13 |       2 | m    | m              | s    | u      | 7052 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2
    6 |       4 | p    | p              | s    | u      | 7006 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir4
   15 |       4 | m    | m              | s    | u      | 7054 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir4
    2 |       0 | p    | p              | s    | u      | 7002 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0
   11 |       0 | m    | m              | s    | u      | 7050 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0
    9 |       7 | p    | p              | s    | u      | 7009 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir7
   18 |       7 | m    | m              | s    | u      | 7057 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir7
    7 |       5 | p    | p              | s    | u      | 7007 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir5
   16 |       5 | m    | m              | s    | u      | 7055 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir5
    5 |       3 | p    | p              | s    | u      | 7005 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir3
   14 |       3 | m    | m              | s    | u      | 7053 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir3
   10 |       8 | p    | p              | s    | u      | 7010 | sdw3     | sdw3    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast3/demoDataDir8
   19 |       8 | m    | m              | s    | u      | 7058 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir8
   20 |      -1 | m    | m              | s    | u      | 7001 | cdw      | cdw     | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/standby
    3 |       1 | m    | p              | n    | d      | 7003 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast2/demoDataDir1
   12 |       1 | p    | m              | n    | u      | 7051 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1
(20 rows)

postgres=# \q
gpadmin@cdw:~$ gprebalance
20250312:05:28:05:012886 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.50.g494f0f045a build commit:494f0f045aff954d3d38e3120d1819c97c668370'
20250312:05:28:05:012886 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.50.g494f0f045a build commit:494f0f045aff954d3d38e3120d1819c97c668370) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 12 2025 04:11:02 (with assert checking) Bhuvnesh C.'
20250312:05:28:05:012886 gprebalance:cdw:gpadmin-[INFO]:-Validation of rebalance possibility...
20250312:05:28:05:012886 gprebalance:cdw:gpadmin-[ERROR]:-gprebalance failed: 'Logger' object has no attribute 'eror'

Exiting...
20250312:05:28:05:012886 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...
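
A minimal sketch of the behavior requested above, assuming the rows of `gp_segment_configuration` are already available as dictionaries. The helper names and the prompt wording are hypothetical and not taken from the PR.

```python
def find_down_segments(rows):
    """Return the gp_segment_configuration rows whose status is 'd'."""
    return [row for row in rows if row["status"] == "d"]


def format_down_report(down):
    """Build a human-readable list of down segments for the prompt."""
    lines = ["The following segments are down:"]
    for row in down:
        lines.append(
            f"  dbid={row['dbid']} content={row['content']} "
            f"host={row['hostname']} port={row['port']}"
        )
    return "\n".join(lines)


def confirm_continue(down, ask=input):
    """Ask the user whether to proceed; 'ask' is injectable for testing."""
    if not down:
        return True
    answer = ask(format_down_report(down) + "\nContinue anyway? (y/n): ")
    return answer.strip().lower() in ("y", "yes")


# Two rows taken from the query output above (content 1).
rows = [
    {"dbid": 3, "content": 1, "status": "d", "hostname": "sdw2", "port": 7003},
    {"dbid": 12, "content": 1, "status": "u", "hostname": "sdw2", "port": 7051},
]
down = find_down_segments(rows)
print(format_down_report(down))
```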

@whitehawk

According to the functional requirements, item #3, gprebalance should support rebalancing a cluster where one or several segments have preferred_role != role, but I'm getting an error in such a case:

gpadmin@cdw:~$ psql -U gpadmin -h localhost -p 7000 postgres -c "
select * from gp_segment_configuration where role<>preferred_role;
"
 dbid | content | role | preferred_role | mode | status | port | hostname | address |                                 datadir
------+---------+------+----------------+------+--------+------+----------+---------+--------------------------------------------------------------------------
   11 |       0 | m    | p              | s    | u      | 7050 | sdw2     | sdw2    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir0
    2 |       0 | p    | m              | s    | u      | 7002 | sdw1     | sdw1    | /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0
(2 rows)

gpadmin@cdw:~$
gpadmin@cdw:~$
gpadmin@cdw:~$
gpadmin@cdw:~$ gprebalance
20250312:06:22:46:012791 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.50.g494f0f045a build commit:494f0f045aff954d3d38e3120d1819c97c668370'
20250312:06:22:46:012791 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.50.g494f0f045a build commit:494f0f045aff954d3d38e3120d1819c97c668370) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 12 2025 04:11:02 (with assert checking) Bhuvnesh C.'
20250312:06:22:46:012791 gprebalance:cdw:gpadmin-[INFO]:-Validation of rebalance possibility...
20250312:06:22:46:012791 gprebalance:cdw:gpadmin-[ERROR]:-gprebalance failed: 'Logger' object has no attribute 'eror'

Exiting...
20250312:06:22:46:012791 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...

@RekGRpth
Member

object has no attribute 'eror'

typo?
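
For reference, the message can be reproduced with the standard `logging` module, which suggests a misspelled method name. This is only a sketch: the logger name is made up, and gprebalance's own logger setup may differ.

```python
import logging

logger = logging.getLogger("gprebalance_example")  # hypothetical logger name

try:
    logger.eror("segment is down")  # misspelled: Logger has no 'eror' method
except AttributeError as exc:
    message = str(exc)

print(message)

# The correctly spelled call works as expected.
logger.error("segment is down")
```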

@whitehawk

If gprebalance has timed out, shouldn't it return a non-zero exit status (as we haven't successfully rebalanced the cluster)?

20250312:07:01:46:012805 gprebalance:cdw:gpadmin-[INFO]:-Execution timeout is reached. Waiting the existing jobs to finish and stopping rebalance.
20250312:07:02:30:012805 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 17): /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror1/demoDataDir6
20250312:07:02:31:012805 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 18): /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir7
20250312:07:02:31:012805 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 13): /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror3/demoDataDir2
20250312:07:02:31:012805 gprebalance:cdw:gpadmin-[INFO]:-Removing old segment's datadir (dbidi = 12): /home/gpadmin/gpdb_src/gpAux/gpdemo/datadirs/dbfast_mirror2/demoDataDir1
20250312:07:02:31:012805 gprebalance:cdw:gpadmin-[INFO]:-Rebalance stopped due to timeout
20250312:07:02:31:012805 gprebalance:cdw:gpadmin-[INFO]:-Shutting down gprebalance...
gpadmin@cdw:~$ echo $?
0
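
One possible shape for this, as a sketch only: map the run outcome to an exit status and pass it to `sys.exit` at the end of the script. The constant `EXIT_INCOMPLETE`, its value, and the function names are assumptions; the PR does not define them.

```python
import sys

EXIT_SUCCESS = 0
EXIT_INCOMPLETE = 1  # hypothetical value; not defined anywhere in the PR


def exit_status(timed_out, pending_moves):
    """Return non-zero when the timeout hit before all planned moves finished."""
    if timed_out and pending_moves > 0:
        return EXIT_INCOMPLETE
    return EXIT_SUCCESS


def finish(timed_out, pending_moves):
    """Log the outcome and return the status a real script would exit with."""
    status = exit_status(timed_out, pending_moves)
    if status != EXIT_SUCCESS:
        print("Rebalance stopped due to timeout", file=sys.stderr)
    return status


# Example: the timeout was reached while 5 planned moves were still pending.
status = finish(timed_out=True, pending_moves=5)
print(status)
# A real entry point would end with: sys.exit(status)
```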

@whitehawk

I suppose that gprebalance should provide a specific message if the cluster is started in 'master instance only' mode, not just an error message saying that it couldn't connect to the database:

20250312:07:23:46:013405 gpstart:cdw:gpadmin-[INFO]:-Coordinator Started...
gpadmin@cdw:~$ psql postgres
psql: error: FATAL:  System was started in single node mode - only utility mode connections are allowed
gpadmin@cdw:~$ gprebalance -m grouped
20250312:07:24:12:013428 gprebalance:cdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 7.2.0_arenadata7+dev.50.g494f0f045a build commit:494f0f045aff954d3d38e3120d1819c97c668370'
20250312:07:24:12:013428 gprebalance:cdw:gpadmin-[INFO]:-coordinator Greenplum Version: 'PostgreSQL 12.12 (Greenplum Database 7.2.0_arenadata7+dev.50.g494f0f045a build commit:494f0f045aff954d3d38e3120d1819c97c668370) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Mar 12 2025 04:11:02 (with assert checking) Bhuvnesh C.'
20250312:07:24:12:013428 gprebalance:cdw:gpadmin-[ERROR]:-Failed to connect to database.  Make sure the Greenplum instance you wish to expand is running and that your environment is correct, then rerun gprebalance-m grouped

But that could be left for the final implementation, not for MVP, I guess...

@bimboterminator1
Member Author

bimboterminator1 commented Mar 13, 2025

@whitehawk

The comments in the parent ticket for the MVP contain confirmation that the current version is enough for the MVP. Thanks for the review and the useful observations; they will be included in the final SRS.

@whitehawk

OK, approving this PR on the assumption that all issues not covered here will be handled in the final implementation.

@bimboterminator1 bimboterminator1 merged commit 5dc85b0 into feature/ADBDEV-6608 Mar 13, 2025
1 check passed
@bimboterminator1 bimboterminator1 deleted the ADBDEV-6855 branch March 13, 2025 11:34