Lambda invocation loop rework #8970


Merged · 110 commits · Sep 12, 2023
Commits
db52c40
wip
dominikschubert Jun 13, 2023
97d7aba
First working invoke
joe4dev Jun 14, 2023
fd86117
Only execute lambda tests (temporarily)
joe4dev Jun 14, 2023
40163e7
Add stop version todo
joe4dev Jun 14, 2023
0f02a77
fix circleci config
dominikschubert Jun 15, 2023
1995cc3
fix formatting
dominikschubert Jun 15, 2023
37de492
wip
dfangl Jul 5, 2023
b9d6cc5
wip
dominikschubert Jul 11, 2023
27b5848
Rework reserved and unreserved concurrency
joe4dev Jul 11, 2023
326b71d
Add discussion comments
joe4dev Jul 11, 2023
9c544ed
Add invocation encoder WIP
joe4dev Jul 12, 2023
1bdd973
Create internal async queue infrastructure
joe4dev Jul 12, 2023
694b2fd
Add provisioned concurrency tracker
joe4dev Jul 28, 2023
e869174
Fix payload JSON encoding
joe4dev Jul 28, 2023
9f084a6
Remove debug sleep
joe4dev Jul 28, 2023
fe9603d
Re-use environments
joe4dev Jul 28, 2023
38932b2
Add provisioned concurrency planning (WIP)
joe4dev Jul 28, 2023
8fee073
Put provisioned concurrency working
joe4dev Aug 2, 2023
3735b80
Add most simple provisioned concurrency update
joe4dev Aug 2, 2023
6657a87
Notify assignment service upon function keepalive timeout
joe4dev Aug 2, 2023
4675d5d
Fix linter error
joe4dev Aug 2, 2023
f14c2b0
Fix resource cleanup upon stopping environments
joe4dev Aug 2, 2023
73f29ac
Fix lambda cleanup of active function breaking CI
joe4dev Aug 2, 2023
a2ff598
First queue-based invoke working
joe4dev Aug 2, 2023
7c3b333
Add SQS invocation with retry field
joe4dev Aug 2, 2023
f66fb14
Async SQS message handling (WIP)
joe4dev Aug 3, 2023
5824622
Complete async failure handling (retries need fixing)
joe4dev Aug 3, 2023
75bf4eb
Add hacky workaround for broken delay seconds
joe4dev Aug 3, 2023
686091b
Disable sleep workaround for broken delay seconds
joe4dev Aug 3, 2023
ea2d177
Fix delay seconds and add thread pool
joe4dev Aug 4, 2023
2b1aa82
Handle and log exceptions
joe4dev Aug 4, 2023
a03b04f
Clarify defaults and sources of event handling implementation
joe4dev Aug 4, 2023
205c01f
Handle event_invoke_config == None
joe4dev Aug 8, 2023
fb452c1
Fix approx invocation count for reserved concurrency 0
joe4dev Aug 8, 2023
e08e371
Handle exception retries (WIP)
joe4dev Aug 8, 2023
e0914bd
Stop event manager and handle exception cases
joe4dev Aug 8, 2023
60a0ec6
Fix event source listener callback
joe4dev Aug 9, 2023
d27f887
Fix SQS => Lambda DLQ test by reducing retries
joe4dev Aug 9, 2023
761f3f6
Fix service exception types
joe4dev Aug 9, 2023
82d4aff
Fix stopping Lambda environment for provisioned concurrency
joe4dev Aug 9, 2023
fdf1ed3
Draft locking design
joe4dev Aug 9, 2023
6dc5ded
readd shutdown, refactor counting service to allow locking
dfangl Aug 9, 2023
50a4d01
Fix warn logging deprecations
joe4dev Aug 10, 2023
2b94685
Remove implemented event manager todo.py
joe4dev Aug 10, 2023
5b89d50
Fix Lambda => SNS DLQ => SQS test by reducing Lambda retries
joe4dev Aug 10, 2023
484a0c4
Fix provisioned concurrency tests and exceptions
joe4dev Aug 10, 2023
0dc1dbb
Re-activate other AWS tests
joe4dev Aug 10, 2023
38d5a52
Fix concurrency quota assumptions for provisioned concurrency test
joe4dev Aug 11, 2023
9122dfd
Fix limits testing for reserved concurrency
joe4dev Aug 11, 2023
3e15001
Re-enable all tests
joe4dev Aug 11, 2023
73146a7
Add more logging info for Lambda poller shutdown error
joe4dev Aug 11, 2023
b9d2e65
Add test for invoking non-existing function
joe4dev Aug 11, 2023
3cbe3bc
Fix locking scope and cleanup concurrency tracking
joe4dev Aug 11, 2023
16dba97
Remove draft of irrelevant counting service view
joe4dev Aug 11, 2023
681a845
Remove dead code in lambda service
joe4dev Aug 11, 2023
0b1b682
Fix snapshot skips for old provider
joe4dev Aug 11, 2023
af22ac5
Remove planning notes file
joe4dev Aug 11, 2023
2cca0d0
Fix init lock and exception handling
joe4dev Aug 22, 2023
ca8753f
Skip failing SQS DLQ test for old provider
joe4dev Aug 22, 2023
9942925
Fixing poller shutdown (WIP)
joe4dev Aug 22, 2023
2dbe962
add more debug output, reorder to avoid missing cleanups
dfangl Aug 22, 2023
f7eb882
Add botoconfig to disable retries for poller queue delete
joe4dev Aug 23, 2023
5403117
Handle runtime environment startup errors
joe4dev Aug 23, 2023
3d2eeb9
Re-generate snapshot for test_invoke_exceptions
joe4dev Aug 23, 2023
6d67d94
Skip unsupported test for old provider
joe4dev Aug 23, 2023
7b40804
Handle running executor endpoint future
joe4dev Aug 29, 2023
fefc7f1
Improve thread naming
joe4dev Aug 29, 2023
2637f40
Shut down provisioning thread
joe4dev Aug 29, 2023
caa8f5d
Improve thread naming
joe4dev Aug 29, 2023
a10ae4e
Guard invoke during version shutdown and cleanup version manager
joe4dev Aug 29, 2023
43b510b
add todo and exception suppressing code which is currently inactive
dfangl Aug 29, 2023
2888083
Remove debug logs
joe4dev Aug 29, 2023
432bc67
Clarify Lambda retry base delay configuration
joe4dev Aug 29, 2023
2ed6186
Fix or clarify more TODOs
joe4dev Aug 29, 2023
d6b4331
Fix log storing positional argument
joe4dev Aug 29, 2023
d528f1c
Resolve more TODOs or clarify
joe4dev Aug 29, 2023
3feed82
Fix Lambda runtime startup deadlock
joe4dev Aug 30, 2023
14610f3
Add failing test for wrapper not found case
joe4dev Aug 30, 2023
c460be1
Add failing test for Lambda exit
joe4dev Aug 30, 2023
35ca5f0
Temporary CI fix until the moto request dispatching fix is merged
joe4dev Aug 30, 2023
6e7ded6
Match different retry attempts
joe4dev Aug 31, 2023
16cf218
Revert "Temporary CI fix until the moto request dispatching fix is me…
joe4dev Sep 4, 2023
2f5dc73
Fix async invoke type test timing
joe4dev Sep 4, 2023
5507ef5
Add additional logging if enqueuing events fails
joe4dev Sep 4, 2023
db24612
Unify stop logging terminology
joe4dev Sep 4, 2023
5cb3248
Unify skipif condition and update snapshot
joe4dev Sep 4, 2023
7015883
Add handler error test in one place
joe4dev Sep 4, 2023
e6b25f4
Make internal queue region explicit and internal resource account conf…
joe4dev Sep 5, 2023
ca4bbd1
Improve exception messages
joe4dev Sep 5, 2023
5379d5a
Fix internal resource account imports
joe4dev Sep 5, 2023
e1df762
Add Lambda delete during invocation cleanup test
joe4dev Sep 5, 2023
a5ce1ca
Skip legacy tests that leak Lambda resources due to bad cleanup
joe4dev Sep 5, 2023
a57e429
Disable retries for timeout exception testing
joe4dev Sep 5, 2023
37e3cb2
Improve execution environment exception messages
joe4dev Sep 5, 2023
71cdab7
Add testcase for segfault during runtime startup
joe4dev Sep 6, 2023
3347e39
Print execution environment logs upon timeout
joe4dev Sep 6, 2023
9b2be42
Handle startup timeout separately and adjust logging and exception ha…
joe4dev Sep 6, 2023
41f97ab
Unify log and exception messages for execution environment
joe4dev Sep 7, 2023
dcf0b55
Add lambda log prefix with executor environment id
joe4dev Sep 11, 2023
dbabcd4
Unify version and execution environment performance logging
joe4dev Sep 11, 2023
936f0c5
Increase read timeout of client config
joe4dev Sep 12, 2023
a64fc3c
Reduce scope of scheduled lambda fixture causing flaky tests
joe4dev Sep 12, 2023
02a62b6
Avoid lock in provisioned concurrency case
joe4dev Sep 12, 2023
eb313a3
Replace for-else construct with clearer implementation
joe4dev Sep 12, 2023
7f10be9
Add test for lambda runtime startup error
joe4dev Sep 12, 2023
a575da6
Isolate execution environment status locks
joe4dev Sep 12, 2023
1fafc51
Make invoker pool shutdown async
joe4dev Sep 12, 2023
cda8426
Add waiter after function update
joe4dev Sep 12, 2023
6dc07f4
Add locking for provisioned state update
joe4dev Sep 12, 2023
1ec503f
Ensure to cancel startup timer
joe4dev Sep 12, 2023
7 changes: 7 additions & 0 deletions localstack/config.py
@@ -735,6 +735,11 @@ def legacy_fallback(envar_name: str, default: T) -> T:
DOCKER_BRIDGE_IP = ip
break

# AWS account used to store internal resources such as Lambda archives or internal SQS queues.
# It should not be modified by the user, or visible to them, except through a presigned URL
# returned by the get-function call.
INTERNAL_RESOURCE_ACCOUNT = os.environ.get("INTERNAL_RESOURCE_ACCOUNT") or "949334387222"

# -----
# SERVICE-SPECIFIC CONFIGS BELOW
# -----
@@ -985,9 +990,11 @@ def legacy_fallback(envar_name: str, default: T) -> T:

# INTERNAL: 60 (default matching AWS) only applies to new lambda provider
# Base delay in seconds for async retries. Further retries use: NUM_ATTEMPTS * LAMBDA_RETRY_BASE_DELAY_SECONDS
# 300 (5min) is the maximum because NUM_ATTEMPTS can be at most 3 and SQS has a message timer limit of 15 min.
# For example:
# 1x LAMBDA_RETRY_BASE_DELAY_SECONDS: delay between initial invocation and first retry
# 2x LAMBDA_RETRY_BASE_DELAY_SECONDS: delay between the first retry and the second retry
# 3x LAMBDA_RETRY_BASE_DELAY_SECONDS: delay between the second retry and the third retry
LAMBDA_RETRY_BASE_DELAY_SECONDS = int(os.getenv("LAMBDA_RETRY_BASE_DELAY") or 60)

# PUBLIC: 0 (default)
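The config comment above describes a linear back-off schedule: retry *n* waits `n * LAMBDA_RETRY_BASE_DELAY_SECONDS` before firing. A minimal sketch of how those delays add up, assuming the default base delay of 60 seconds and the maximum of 3 retry attempts named in the comment (the helper `retry_delay` is hypothetical, for illustration only):

```python
# Hypothetical sketch of the linear retry schedule described in config.py.
# BASE_DELAY mirrors LAMBDA_RETRY_BASE_DELAY_SECONDS; MAX_RETRIES mirrors
# the at-most-3 retry attempts mentioned in the comment.
BASE_DELAY = 60
MAX_RETRIES = 3

def retry_delay(attempt: int) -> int:
    """Delay in seconds before retry number `attempt` (1-based)."""
    return attempt * BASE_DELAY

delays = [retry_delay(n) for n in range(1, MAX_RETRIES + 1)]
print(delays)       # [60, 120, 180]
print(sum(delays))  # 360 seconds of total back-off across all retries
```

With the maximum base delay of 300 seconds, the final retry waits 900 seconds, which is exactly the SQS message timer limit of 15 minutes mentioned in the comment.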
54 changes: 25 additions & 29 deletions localstack/services/lambda_/event_source_listeners/adapters.py
@@ -3,7 +3,6 @@
import logging
import threading
from abc import ABC
from concurrent.futures import Future
from functools import lru_cache
from typing import Callable, Optional

@@ -13,7 +12,7 @@
from localstack.aws.protocol.serializer import gen_amzn_requestid
from localstack.services.lambda_ import api_utils
from localstack.services.lambda_.api_utils import function_locators_from_arn, qualifier_is_version
from localstack.services.lambda_.invocation.lambda_models import InvocationError, InvocationResult
from localstack.services.lambda_.invocation.lambda_models import InvocationResult
from localstack.services.lambda_.invocation.lambda_service import LambdaService
from localstack.services.lambda_.invocation.models import lambda_stores
from localstack.services.lambda_.lambda_executors import (
@@ -23,6 +22,7 @@
from localstack.utils.aws.client_types import ServicePrincipal
from localstack.utils.json import BytesEncoder
from localstack.utils.strings import to_bytes, to_str
from localstack.utils.threads import FuncThread

LOG = logging.getLogger(__name__)

@@ -143,29 +143,26 @@ def __init__(self, lambda_service: LambdaService):
self.lambda_service = lambda_service

def invoke(self, function_arn, context, payload, invocation_type, callback=None):
def _invoke(*args, **kwargs):
# split ARN ( a bit unnecessary since we build an ARN again in the service)
fn_parts = api_utils.FULL_FN_ARN_PATTERN.search(function_arn).groupdict()

# split ARN ( a bit unnecessary since we build an ARN again in the service)
fn_parts = api_utils.FULL_FN_ARN_PATTERN.search(function_arn).groupdict()

ft = self.lambda_service.invoke(
# basically function ARN
function_name=fn_parts["function_name"],
qualifier=fn_parts["qualifier"],
region=fn_parts["region_name"],
account_id=fn_parts["account_id"],
invocation_type=invocation_type,
client_context=json.dumps(context or {}),
payload=to_bytes(json.dumps(payload or {}, cls=BytesEncoder)),
request_id=gen_amzn_requestid(),
)

if callback:
result = self.lambda_service.invoke(
# basically function ARN
function_name=fn_parts["function_name"],
qualifier=fn_parts["qualifier"],
region=fn_parts["region_name"],
account_id=fn_parts["account_id"],
invocation_type=invocation_type,
client_context=json.dumps(context or {}),
payload=to_bytes(json.dumps(payload or {}, cls=BytesEncoder)),
request_id=gen_amzn_requestid(),
)

def mapped_callback(ft_result: Future[InvocationResult]) -> None:
if callback:
try:
result = ft_result.result(timeout=10)
error = None
if isinstance(result, InvocationError):
if result.is_error:
error = "?"
callback(
result=LegacyInvocationResult(
@@ -187,7 +184,8 @@ def mapped_callback(ft_result: Future[InvocationResult]) -> None:
error=e,
)

ft.add_done_callback(mapped_callback)
thread = FuncThread(_invoke)
thread.start()

def invoke_with_statuscode(
self,
@@ -204,7 +202,7 @@
fn_parts = api_utils.FULL_FN_ARN_PATTERN.search(function_arn).groupdict()

try:
ft = self.lambda_service.invoke(
result = self.lambda_service.invoke(
# basically function ARN
function_name=fn_parts["function_name"],
qualifier=fn_parts["qualifier"],
@@ -218,11 +216,10 @@

if callback:

def mapped_callback(ft_result: Future[InvocationResult]) -> None:
def mapped_callback(result: InvocationResult) -> None:
try:
result = ft_result.result(timeout=10)
error = None
if isinstance(result, InvocationError):
if result.is_error:
error = "?"
callback(
result=LegacyInvocationResult(
@@ -243,11 +240,10 @@
error=e,
)

ft.add_done_callback(mapped_callback)
mapped_callback(result)

# they're always synchronous in the ASF provider
result = ft.result(timeout=900)
if isinstance(result, InvocationError):
if result.is_error:
return 500
else:
return 200
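The adapter diff above drops the `Future.add_done_callback` wiring: since invokes are now synchronous in the new provider, the adapter runs the blocking call on a background `FuncThread` and invokes the callback directly with the result. A minimal sketch of that pattern, using stdlib `threading` and a stand-in `InvocationResult` with the new `is_error` flag (both names here are simplified assumptions, not the PR's actual classes):

```python
import threading
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class InvocationResult:
    payload: bytes
    is_error: bool  # replaces the old isinstance(result, InvocationError) check

def invoke_async(
    blocking_invoke: Callable[[], InvocationResult],
    callback: Optional[Callable[[InvocationResult], None]] = None,
) -> threading.Thread:
    """Run a synchronous invoke on a background thread, then call back."""
    def _run() -> None:
        result = blocking_invoke()  # blocks until the invocation completes
        if callback:
            callback(result)
    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    return thread
```

This keeps the event-source listener's calling convention (fire and forget, optional callback) without threading `Future` objects through the adapter.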
158 changes: 158 additions & 0 deletions localstack/services/lambda_/invocation/assignment.py
@@ -0,0 +1,158 @@
import contextlib
import logging
from collections import defaultdict
from concurrent.futures import Future, ThreadPoolExecutor
from typing import ContextManager

from localstack.services.lambda_.invocation.execution_environment import (
EnvironmentStartupTimeoutException,
ExecutionEnvironment,
InvalidStatusException,
)
from localstack.services.lambda_.invocation.executor_endpoint import StatusErrorException
from localstack.services.lambda_.invocation.lambda_models import (
FunctionVersion,
InitializationType,
OtherServiceEndpoint,
)

LOG = logging.getLogger(__name__)


class AssignmentException(Exception):
pass


class AssignmentService(OtherServiceEndpoint):
"""
scope: LocalStack global
"""

# function_version (fully qualified function ARN) => runtime_environment_id => runtime_environment
environments: dict[str, dict[str, ExecutionEnvironment]]

# Global pool for spawning and killing provisioned Lambda runtime environments
provisioning_pool: ThreadPoolExecutor

def __init__(self):
self.environments = defaultdict(dict)
self.provisioning_pool = ThreadPoolExecutor(thread_name_prefix="lambda-provisioning-pool")

@contextlib.contextmanager
def get_environment(
self, function_version: FunctionVersion, provisioning_type: InitializationType
) -> ContextManager[ExecutionEnvironment]:
version_arn = function_version.qualified_arn
applicable_envs = (
env
for env in self.environments[version_arn].values()
if env.initialization_type == provisioning_type
)
execution_environment = None
for environment in applicable_envs:
try:
environment.reserve()
execution_environment = environment
break
except InvalidStatusException:
pass

if execution_environment is None:
if provisioning_type == "provisioned-concurrency":
raise AssignmentException(
"No provisioned concurrency environment available despite lease."
)
elif provisioning_type == "on-demand":
execution_environment = self.start_environment(function_version)
self.environments[version_arn][execution_environment.id] = execution_environment
execution_environment.reserve()
else:
raise ValueError(f"Invalid provisioning type {provisioning_type}")

try:
yield execution_environment
execution_environment.release()
except InvalidStatusException as invalid_e:
LOG.error("InvalidStatusException: %s", invalid_e)
except Exception as e:
LOG.error("Failed invocation %s", e)
self.stop_environment(execution_environment)
raise e

def start_environment(self, function_version: FunctionVersion) -> ExecutionEnvironment:
LOG.debug("Starting new environment")
execution_environment = ExecutionEnvironment(
function_version=function_version,
initialization_type="on-demand",
on_timeout=self.on_timeout,
)
try:
execution_environment.start()
except StatusErrorException:
raise
except EnvironmentStartupTimeoutException:
raise
except Exception as e:
message = f"Could not start new environment: {e}"
raise AssignmentException(message) from e
return execution_environment

def on_timeout(self, version_arn: str, environment_id: str) -> None:
"""Callback for deleting environment after function times out"""
del self.environments[version_arn][environment_id]

def stop_environment(self, environment: ExecutionEnvironment) -> None:
version_arn = environment.function_version.qualified_arn
try:
environment.stop()
self.environments.get(version_arn).pop(environment.id)
except Exception as e:
LOG.debug(
"Error while stopping environment for lambda %s, environment: %s, error: %s",
version_arn,
environment.id,
e,
)

def stop_environments_for_version(self, function_version: FunctionVersion):
# We have to materialize the list before iterating due to concurrency
environments_to_stop = list(
self.environments.get(function_version.qualified_arn, {}).values()
)
for env in environments_to_stop:
self.stop_environment(env)

def scale_provisioned_concurrency(
self, function_version: FunctionVersion, target_provisioned_environments: int
) -> list[Future[None]]:
version_arn = function_version.qualified_arn
current_provisioned_environments = [
e
for e in self.environments[version_arn].values()
if e.initialization_type == "provisioned-concurrency"
]
# TODO: refine scaling loop to re-use existing environments instead of re-creating all
# current_provisioned_environments_count = len(current_provisioned_environments)
# diff = target_provisioned_environments - current_provisioned_environments_count

# TODO: handle case where no provisioned environment is available during scaling
# Most simple scaling implementation for now:
futures = []
# 1) Re-create new target
for _ in range(target_provisioned_environments):
execution_environment = ExecutionEnvironment(
function_version=function_version,
initialization_type="provisioned-concurrency",
on_timeout=self.on_timeout,
)
self.environments[version_arn][execution_environment.id] = execution_environment
futures.append(self.provisioning_pool.submit(execution_environment.start))
# 2) Kill all existing
for env in current_provisioned_environments:
# TODO: think about concurrent updates while deleting a function
futures.append(self.provisioning_pool.submit(self.stop_environment, env))

return futures

def stop(self):
self.provisioning_pool.shutdown(cancel_futures=True)
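The `get_environment` context manager above encodes a lease protocol: an environment is reserved before the invocation, released on success, and stopped (torn down) if the invocation raises. A stripped-down sketch of that contract, with a toy `Environment` state machine standing in for `ExecutionEnvironment` (names and states here are illustrative assumptions):

```python
import contextlib

class Environment:
    """Toy stand-in for ExecutionEnvironment with a 3-state lifecycle."""
    def __init__(self) -> None:
        self.state = "idle"

    def reserve(self) -> None:
        if self.state != "idle":
            raise RuntimeError("environment not reservable")
        self.state = "reserved"

    def release(self) -> None:
        self.state = "idle"

    def stop(self) -> None:
        self.state = "stopped"

@contextlib.contextmanager
def lease(env: Environment):
    # Reserve for the duration of the with-block; release on success,
    # stop on failure so a broken environment is never reused.
    env.reserve()
    try:
        yield env
        env.release()
    except Exception:
        env.stop()
        raise
```

The real `AssignmentService` adds the pool lookup by initialization type on top of this: on-demand invokes may spawn a fresh environment when none is reservable, while provisioned-concurrency invokes must find one already warm or fail with `AssignmentException`.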