
shard has exceeded the maximum number of retries [1] #1630

@kazukiyashiro

Description

Hello!

For usage questions and help, I also posted here:

https://discuss.elastic.co/t/curator-shard-has-exceeded-the-maximum-number-of-retries-1/290059

When Curator tries to allocate a replica shard of the shrunken index, I get this error:

{
  "index" : "example-index-2021-09-29-shrink",
  "shard" : 0,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2021-11-23T12:26:19.515Z",
    "failed_allocation_attempts" : 1,
    "details" : "failed shard on node [8r_zhRD4RDm2peWnDun_3w]: failed recovery, failure RecoveryFailedException[[example-index-2021-09-29-shrink][0]: Recovery failed from {node15}{nWOPSov3TFKUunoiooVxMQ}{PSAfiXvZQx-NLyKpnXGs1A}{192.168.0.164}{192.168.0.164:9300}{ml.machine_memory=135291469824, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} into {node13}{8r_zhRD4RDm2peWnDun_3w}{KU0HhEPMQ_ilSV3RCe4XNw}{192.168.0.162}{192.168.0.162:9300}{ml.machine_memory=135291469824, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]; nested: RemoteTransportException[[node15][172.17.0.3:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [85] files with total size of [24.8gb]]; nested: ReceiveTimeoutTransportException[[node13][192.168.0.162:9300][internal:index/shard/recovery/file_chunk] request_id [1586168734] timed out after [899897ms]]; ",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "8r_zhRD4RDm2peWnDun_3w",
      "node_name" : "node13",
      "transport_address" : "192.168.0.162:9300",
      "node_attributes" : {
        "ml.machine_memory" : "135291469824",
        "xpack.installed" : "true",
        "ml.max_open_jobs" : "20",
        "ml.enabled" : "true"
      },
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [1] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-11-23T12:26:19.515Z], failed_attempts[1], delayed=false, details[failed shard on node [8r_zhRD4RDm2peWnDun_3w]: failed recovery, failure RecoveryFailedException[[example-index-2021-09-29-shrink][0]: Recovery failed from {node15}{nWOPSov3TFKUunoiooVxMQ}{PSAfiXvZQx-NLyKpnXGs1A}{192.168.0.164}{192.168.0.164:9300}{ml.machine_memory=135291469824, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} into {node13}{8r_zhRD4RDm2peWnDun_3w}{KU0HhEPMQ_ilSV3RCe4XNw}{192.168.0.162}{192.168.0.162:9300}{ml.machine_memory=135291469824, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]; nested: RemoteTransportException[[node15][172.17.0.3:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [85] files with total size of [24.8gb]]; nested: ReceiveTimeoutTransportException[[node13][192.168.0.162:9300][internal:index/shard/recovery/file_chunk] request_id [1586168734] timed out after [899897ms]]; ], allocation_status[no_attempt]]]"
        }
      ]
    }
  ]
}

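As the `max_retry` decider's explanation itself suggests, the failed allocation can be retried manually (console style, same as the `GET` request further down):

```
POST /_cluster/reroute?retry_failed=true
```

This only retries the allocation once; it does not change the `max_retries` limit itself.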
Is there a way to increase `index.allocation.max_retries` in the Curator settings?

Action file:

actions:
  1:
    action: shrink
    description: >-
      Shrink selected indices on the node with the most available space.
      Delete source index after successful shrink, then reroute the shrunk
      index with the provided parameters.
    options:
      ignore_empty_list: True
      shrink_node: DETERMINISTIC
      node_filters:
        permit_masters: True
      number_of_shards: 1
      number_of_replicas: ${REPLICA_COUNT:1}
      shrink_prefix:
      shrink_suffix: '-shrink'
      copy_aliases: True
      delete_after: True
      wait_for_active_shards: 1
      extra_settings:
        settings:
          index.codec: best_compression
      wait_for_completion: True
      wait_for_rebalance: True
      wait_interval: 9
      max_wait: -1
    filters:
     - filtertype: pattern
       kind: prefix
       value: ${INDEX_PREFIX}
     - filtertype: age
       source: name
       direction: older
       timestring: ${TIMESTAMP:'%Y-%m-%d'}
       unit: ${PERIOD:days}
       unit_count: ${PERIOD_COUNT}
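For reference, since the action already passes `extra_settings`, the retry limit could in principle be supplied the same way. I have not verified that Curator/Elasticsearch honors this at shrink time, so this is an untested sketch:

```yaml
      extra_settings:
        settings:
          index.codec: best_compression
          # untested assumption: raise the allocation retry limit on the target index
          index.allocation.max_retries: 5
```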

Curator version: 5.8.4
OS: CentOS 7

I've tried creating an index template:

"shrink" : {
    "order" : 0,
    "index_patterns" : [
      "*-shrink"
    ],
    "settings" : {
      "index" : {
        "allocation" : {
          "max_retries" : "5"
        }
      }
    }
  }

But it doesn't help.
Here are the index settings after a successful shrink:

GET /example-index-shrink/_settings

{
  "example-index-shrink" : {
    "settings" : {
      "index" : {
        "allocation" : {
          "max_retries" : "1"
        },
        "shrink" : {
          "source" : {
            "name" : "example-index",
            "uuid" : "mecKKzDDTzu77ViMv5N3EA"
          }
        },
        "blocks" : {
          "write" : null
        },
        "provided_name" : "example-index-shrink",
        "creation_date" : "1637751350836",
        "number_of_replicas" : "1",
        "uuid" : "MI_wbW35R8ubkYZOySfp1g",
        "version" : {
          "created" : "6080899",
          "upgraded" : "6080899"
        },
        "codec" : "best_compression",
        "routing" : {
          "allocation" : {
            "initial_recovery" : {
              "_id" : "nWOPSov3TFKUunoiooVxMQ"
            },
            "require" : {
              "_name" : null
            }
          }
        },
        "number_of_shards" : "1",
        "routing_partition_size" : "1",
        "resize" : {
          "source" : {
            "name" : "example-index",
            "uuid" : "mecKKzDDTzu77ViMv5N3EA"
          }
        }
      }
    }
  }
}
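As a manual workaround on an already-shrunken index, the setting can be changed with the update-settings API (same console style as the `GET` above). This changes the live index only; it does not affect indices Curator shrinks later:

```
PUT /example-index-shrink/_settings
{
  "index.allocation.max_retries": 5
}
```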

Thanks in advance
