Hive catalog: Add retry logic for hive locking #701
HonahX merged 3 commits into apache:main from frankliee:lock
Conversation
kevinjqliu
left a comment
Thanks for the contribution! I added a few comments on TableProperties and testing.
pyiceberg/catalog/hive.py (outdated)

```python
DEFAULT_LOCK_CHECK_MIN_WAIT_TIME = 2
DEFAULT_LOCK_CHECK_MAX_WAIT_TIME = 30
DEFAULT_LOCK_CHECK_RETRIES = 5
DEFAULT_LOCK_CHECK_MULTIPLIER = 2
```
wdyt about grouping these configs into TableProperties, along with their default values?
iceberg-python/pyiceberg/table/__init__.py
Line 200 in 7bd5d9e
Hi @kevinjqliu. I don't think these should be grouped in TableProperties. TableProperties controls the behavior of a specific table, while CatalogProperties controls the behavior of the catalog instance. In this case, these properties control the behavior of HiveCatalog's lock and thus should be classified as CatalogProperties.
Currently, the convention is to put each catalog's properties in their own files. In this case, they can be in hive.py.
Does this sound good to you? :)
In Iceberg-Java, these props are in TableProperties, like this:

```
iceberg.hive.lock-check-min-wait-ms=xxx
iceberg.hive.lock-check-max-wait-ms=xxx
iceberg.hive.lock-timeout-ms=xxx
```

So, would it be better to unify these settings?
In Java, they are in the HadoopConfiguration. The hadoop configuration for a hive catalog is set at the catalog level:
https://github.com/apache/iceberg/blob/817a5e1be1616af77329965ac3742c14ca3ae116/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java#L630
The metastore parses these properties from the hadoop configuration.
I call them "CatalogProperties" since pyiceberg does not have a hadoop configuration. Sorry if that creates any confusion.
The only hive-lock-related config in TableProperties is engine.hive.lock-enabled, which can enable/disable the lock for a single table. But that is for https://issues.apache.org/jira/browse/HIVE-26882; see the "warn" section at the end of https://iceberg.apache.org/docs/nightly/configuration/#hadoop-configuration. I don't think we need that in pyiceberg now.
WDYT?
You are right, I see this setting comes from the HadoopConfiguration; "CatalogProperties" seems more suitable.
tests/integration/test_reads.py (outdated)

```python
def another_task() -> None:
    lock1: LockResponse = open_client.lock(session_catalog_hive._create_lock_request(database_name, table_name))
    time.sleep(5)
```
nit: time.sleep in a test is typically an anti-pattern; this will add at least 5 seconds to the test suite in the future.
It might be easier to mock the lock and check_lock functions instead of relying on the timing of the function calls.
Also, maybe add a test case for when _wait_for_lock fails to acquire the lock after retries.
https://stackoverflow.com/questions/47906671/python-retry-with-tenacity-disable-wait-for-unittest
This might be helpful to override the waiting behavior in retry.
I have added a new unit test test_hive_wait_for_lock in test_hive.py that uses mocked lock and check_lock.
But I still keep an integration test based on the real hive metastore to simulate real-world cases.
In order to reduce the test latency, I use fine-grained sleep time instead.
WDYT?
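A mocked unit test along those lines might look like the following sketch (the client shape, LockState values, and the polling helper are hypothetical stand-ins for the Thrift metastore client, not the actual test_hive_wait_for_lock code):

```python
from unittest import mock

class LockState:  # hypothetical stand-in for the Thrift LockState enum
    WAITING = "WAITING"
    ACQUIRED = "ACQUIRED"

def acquire(client):
    """Lock once, then poll check_lock until the lock is ACQUIRED."""
    response = client.lock("lock-request")
    while response.state != LockState.ACQUIRED:
        response = client.check_lock(response.lockid)
    return response.state

# Mock the metastore client: lock() returns WAITING, and check_lock()
# reports WAITING once before the lock is finally ACQUIRED.
client = mock.Mock()
client.lock.return_value = mock.Mock(state=LockState.WAITING, lockid=1)
client.check_lock.side_effect = [
    mock.Mock(state=LockState.WAITING),
    mock.Mock(state=LockState.ACQUIRED),
]

state = acquire(client)  # polls check_lock twice, no real metastore needed
```

The mock removes any dependence on wall-clock timing, so the test is deterministic and fast, while the integration test against a real metastore still covers the end-to-end path.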
HonahX
left a comment
Thanks @frankliee for working on this and @kevinjqliu for reviewing!
pyiceberg/catalog/hive.py (outdated)

```python
acquire_lock_timeout = (
    properties.get(TableProperties.HIVE_ACQUIRE_LOCK_TIMEOUT_MS, TableProperties.HIVE_ACQUIRE_LOCK_TIMEOUT_MS_DEFAULT)
) / 1000.0
lock_check_min_wait_time = (
    properties.get(TableProperties.HIVE_LOCK_CHECK_MIN_WAIT_MS, TableProperties.HIVE_LOCK_CHECK_MIN_WAIT_MS_DEFAULT)
) / 1000.0
lock_check_max_wait_time = (
    properties.get(TableProperties.HIVE_LOCK_CHECK_MAX_WAIT_MS, TableProperties.HIVE_LOCK_CHECK_MAX_WAIT_MS_DEFAULT)
) / 1000.0
```
We could use PropertyUtil.property_as_int to get the values. This util method throws a clear error message when it fails to parse the property.
BTW: you would probably need # type: ignore at the end of each PropertyUtil.property_as_int call to pass the linter.
I have added property_as_float to allow fine-grained setting.
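Such a helper could be sketched like this, mirroring the parse-error behavior described for property_as_int (the exact signature and error message in pyiceberg may differ):

```python
from typing import Dict, Optional

Properties = Dict[str, str]

def property_as_float(
    properties: Properties, property_name: str, default: Optional[float] = None
) -> Optional[float]:
    """Parse a property value as a float, raising a clear error on bad input."""
    if value := properties.get(property_name):
        try:
            return float(value)
        except ValueError as e:
            raise ValueError(f"Could not parse property {property_name} to a float: {value}") from e
    return default
```

Accepting floats lets callers express sub-second wait times (e.g. "0.5") instead of being limited to whole units.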
pyiceberg/catalog/hive.py (outdated)

```python
LOCK_CHECK_MIN_WAIT_TIME = "lock_check_min_wait_time"
LOCK_CHECK_MAX_WAIT_TIME = "lock_check_max_wait_time"
LOCK_CHECK_RETRIES = "lock_check_retries"
DEFAULT_LOCK_CHECK_MIN_WAIT_TIME = 2
DEFAULT_LOCK_CHECK_MAX_WAIT_TIME = 30
DEFAULT_LOCK_CHECK_RETRIES = 10
```
Good idea, I have updated these default values.
Looks like all the review comments are addressed. I'll merge this. Thanks everyone!

In the current hive catalog implementation, locking is only attempted once. If another task has already locked the target table, the lock acquisition inevitably fails. Since Iceberg-Java has implemented retry in org.apache.iceberg.hive.MetastoreLock, we can refer to it to improve Iceberg-python.

This PR adds retry and wait logic for the hive catalog with the following modifications:
- Add _wait_for_lock on the hive catalog, which re-checks the lock state with a wait between attempts.
- Raise WaitingForLockException while the lock has not yet been acquired, so the retry logic knows to keep polling.