Hello!
I also posted this usage question on the Elastic forum:
https://discuss.elastic.co/t/curator-shard-has-exceeded-the-maximum-number-of-retries-1/290059
When Curator tries to allocate a replica shard of the shrunken index, I get this error:
{
  "index" : "example-index-2021-09-29-shrink",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2021-11-23T12:26:19.515Z",
    "failed_allocation_attempts" : 1,
    "details" : "failed shard on node [8r_zhRD4RDm2peWnDun_3w]: failed recovery, failure RecoveryFailedException[[example-index-2021-09-29-shrink][0]: Recovery failed from {node15}{nWOPSov3TFKUunoiooVxMQ}{PSAfiXvZQx-NLyKpnXGs1A}{192.168.0.164}{192.168.0.164:9300}{ml.machine_memory=135291469824, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} into {node13}{8r_zhRD4RDm2peWnDun_3w}{KU0HhEPMQ_ilSV3RCe4XNw}{192.168.0.162}{192.168.0.162:9300}{ml.machine_memory=135291469824, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]; nested: RemoteTransportException[[node15][172.17.0.3:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [85] files with total size of [24.8gb]]; nested: ReceiveTimeoutTransportException[[node13][192.168.0.162:9300][internal:index/shard/recovery/file_chunk] request_id [1586168734] timed out after [899897ms]]; ",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "8r_zhRD4RDm2peWnDun_3w",
      "node_name" : "node13",
      "transport_address" : "192.168.0.162:9300",
      "node_attributes" : {
        "ml.machine_memory" : "135291469824",
        "xpack.installed" : "true",
        "ml.max_open_jobs" : "20",
        "ml.enabled" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [1] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-11-23T12:26:19.515Z], failed_attempts[1], delayed=false, details[failed shard on node [8r_zhRD4RDm2peWnDun_3w]: failed recovery, failure RecoveryFailedException[[example-index-2021-09-29-shrink][0]: Recovery failed from {node15}{nWOPSov3TFKUunoiooVxMQ}{PSAfiXvZQx-NLyKpnXGs1A}{192.168.0.164}{192.168.0.164:9300}{ml.machine_memory=135291469824, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} into {node13}{8r_zhRD4RDm2peWnDun_3w}{KU0HhEPMQ_ilSV3RCe4XNw}{192.168.0.162}{192.168.0.162:9300}{ml.machine_memory=135291469824, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]; nested: RemoteTransportException[[node15][172.17.0.3:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [85] files with total size of [24.8gb]]; nested: ReceiveTimeoutTransportException[[node13][192.168.0.162:9300][internal:index/shard/recovery/file_chunk] request_id [1586168734] timed out after [899897ms]]; ], allocation_status[no_attempt]]]"
        }
      ]
    }
  ]
}
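The max_retry decider's explanation names a manual workaround. Below is a minimal sketch of that request using only the Python standard library, assuming an unauthenticated cluster at http://localhost:9200 (the address and the helper name build_retry_failed_request are illustrative, not taken from this report). It only builds the request. Sending it retries allocation of the shards blocked by too many failed attempts, so the same recovery timeout may simply happen again.

```python
import urllib.request

# Illustrative address; adjust for your cluster (TLS, credentials, etc.).
ES_URL = "http://localhost:9200"


def build_retry_failed_request(base_url: str = ES_URL) -> urllib.request.Request:
    """Build (but do not send) POST /_cluster/reroute?retry_failed=true,
    the call suggested by the max_retry decider's explanation."""
    return urllib.request.Request(
        url=f"{base_url}/_cluster/reroute?retry_failed=true",
        data=None,  # no body needed; the query parameter does the work
        method="POST",
    )


if __name__ == "__main__":
    req = build_retry_failed_request()
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req).read()
```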
Is there a way to increase "index.allocation.max_retries" through the Curator settings?
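For an index that already exists, index.allocation.max_retries is a dynamic index setting. One option outside Curator is therefore to raise it with the update index settings API. Below is a minimal sketch that again assumes an unauthenticated cluster at http://localhost:9200. The helper build_max_retries_update is hypothetical, and the request is built but not sent.

```python
import json
import urllib.parse
import urllib.request

# Illustrative address; adjust for your cluster.
ES_URL = "http://localhost:9200"


def build_max_retries_update(index: str, max_retries: int,
                             base_url: str = ES_URL) -> urllib.request.Request:
    """Build (but do not send) PUT /<index>/_settings that raises
    index.allocation.max_retries on an existing index."""
    body = {"index": {"allocation": {"max_retries": max_retries}}}
    return urllib.request.Request(
        url=f"{base_url}/{urllib.parse.quote(index, safe='')}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # ES 6.x requires a JSON content type
        method="PUT",
    )


if __name__ == "__main__":
    req = build_max_retries_update("example-index-2021-09-29-shrink", 5)
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Once the limit is above the recorded failed attempt count (1 here), the max_retry decider should stop blocking the replica.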
Action file:
actions:
  1:
    action: shrink
    description: >-
      Shrink selected indices on the node with the most available space.
      Delete source index after successful shrink, then reroute the shrunk
      index with the provided parameters.
    options:
      ignore_empty_list: True
      shrink_node: DETERMINISTIC
      node_filters:
        permit_masters: True
      number_of_shards: 1
      number_of_replicas: ${REPLICA_COUNT:1}
      shrink_prefix:
      shrink_suffix: '-shrink'
      copy_aliases: True
      delete_after: True
      wait_for_active_shards: 1
      extra_settings:
        settings:
          index.codec: best_compression
      wait_for_completion: True
      wait_for_rebalance: True
      wait_interval: 9
      max_wait: -1
    filters:
    - filtertype: pattern
      kind: prefix
      value: ${INDEX_PREFIX}
    - filtertype: age
      source: name
      direction: older
      timestring: ${TIMESTAMP:'%Y-%m-%d'}
      unit: ${PERIOD:days}
      unit_count: ${PERIOD_COUNT}
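The Elasticsearch shrink API (POST /<source>/_shrink/<target>) accepts settings for the target index in its request body. The settings output for the shrunken index shows "codec" : "best_compression", so extra_settings does appear to reach the target index. If that also holds for other keys, then adding index.allocation.max_retries next to index.codec under extra_settings and settings might set the limit when the index is created. This is an untested assumption. The sketch below only builds the equivalent request body in Python, and the value 5 is an arbitrary example.

```python
import json


def build_shrink_body(number_of_shards: int = 1,
                      number_of_replicas: int = 1,
                      max_retries: int = 5) -> dict:
    """Request body for POST /<source>/_shrink/<target>.

    Mirrors number_of_shards, number_of_replicas and index.codec from the
    action file, plus an explicit (illustrative) max_retries for the target."""
    return {
        "settings": {
            "index.number_of_shards": number_of_shards,
            "index.number_of_replicas": number_of_replicas,
            "index.codec": "best_compression",
            # Assumed addition: the retry limit set on the target index itself.
            "index.allocation.max_retries": max_retries,
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_shrink_body(), indent=2))
```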
Curator version: 5.8.4
OS: CentOS 7
I also tried creating an index template:
"shrink" : {
  "order" : 0,
  "index_patterns" : [
    "*-shrink"
  ],
  "settings" : {
    "index" : {
      "allocation" : {
        "max_retries" : "5"
      }
    }
  }
}
But it doesn't help.
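A quick sanity check is whether the template's pattern matches the shrunken index names at all. The sketch below uses Python's fnmatch.fnmatchcase as a stand-in for Elasticsearch's wildcard matching. This is only an approximation: fnmatch also treats ? and [...] as special, and index_patterns does not. The helper matching_patterns is hypothetical.

```python
from fnmatch import fnmatchcase

# Pattern copied from the template above.
TEMPLATE_PATTERNS = ("*-shrink",)


def matching_patterns(index_name: str, patterns=TEMPLATE_PATTERNS) -> list:
    """Return the patterns that match index_name.

    fnmatchcase approximates Elasticsearch's '*' wildcard here; unlike
    index_patterns it also gives '?' and '[...]' special meaning."""
    return [p for p in patterns if fnmatchcase(index_name, p)]


for name in ("example-index-2021-09-29-shrink", "example-index-shrink"):
    print(name, matching_patterns(name))
```

Both index names from this report match "*-shrink". The pattern itself therefore does not look like the reason the template has no effect.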
Here are the index settings after a successful shrink:
GET /example-index-shrink/_settings
{
  "example-index-shrink" : {
    "settings" : {
      "index" : {
        "allocation" : {
          "max_retries" : "1"
        },
        "shrink" : {
          "source" : {
            "name" : "example-index",
            "uuid" : "mecKKzDDTzu77ViMv5N3EA"
          }
        },
        "blocks" : {
          "write" : null
        },
        "provided_name" : "example-index-shrink",
        "creation_date" : "1637751350836",
        "number_of_replicas" : "1",
        "uuid" : "MI_wbW35R8ubkYZOySfp1g",
        "version" : {
          "created" : "6080899",
          "upgraded" : "6080899"
        },
        "codec" : "best_compression",
        "routing" : {
          "allocation" : {
            "initial_recovery" : {
              "_id" : "nWOPSov3TFKUunoiooVxMQ"
            },
            "require" : {
              "_name" : null
            }
          }
        },
        "number_of_shards" : "1",
        "routing_partition_size" : "1",
        "resize" : {
          "source" : {
            "name" : "example-index",
            "uuid" : "mecKKzDDTzu77ViMv5N3EA"
          }
        }
      }
    }
  }
}
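To check the effective value from a script, the sketch below reads index.allocation.max_retries from a GET /<index>/_settings response. It assumes the default nested format shown above, not ?flat_settings=true. Settings values come back as strings, hence the int() conversion. The helper max_retries_from_settings is hypothetical.

```python
import json
from typing import Optional


def max_retries_from_settings(response: dict, index: str) -> Optional[int]:
    """Read index.allocation.max_retries from a GET /<index>/_settings
    response (default nested format). Returns None when it is not set."""
    index_settings = response[index]["settings"]["index"]
    value = index_settings.get("allocation", {}).get("max_retries")
    # Settings values are returned as strings, e.g. "1".
    return int(value) if value is not None else None


# Trimmed copy of the response above.
sample = json.loads("""
{"example-index-shrink": {"settings": {"index": {
  "allocation": {"max_retries": "1"},
  "number_of_replicas": "1",
  "codec": "best_compression"}}}}
""")
print(max_retries_from_settings(sample, "example-index-shrink"))  # 1
```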
Thanks in advance