Draft: Add ResourceFilter composition #10328
| if hasattr(base, "apply_filters"): | ||
| stmt = base.apply_filters(self, stmt) |
There was a problem hiding this comment.
This raises the question to me: do we want to call each extension's filter, or only if any of their filter fields are set?
There was a problem hiding this comment.
I feel like we can call it always. The extension's apply_filter should (need to?) have logic to determine if they are being called or not.
There was a problem hiding this comment.
This way we can also inject some logic even if the extension does not have any filter present
| if hasattr(base, "apply_filters"): | ||
| stmt = base.apply_filters(self, stmt) |
There was a problem hiding this comment.
base.apply_filters(self, stmt)
Good enough for the PoC, but something to figure out how we'll clean it up for the actual implementation. StrawberryFilter defines apply_filter as an instance method. You typically call them on an instance. That is what you would expect. But you now forcefully call them on the class and have to pass in the instance.
I think that one solution would be the separation between data and logic that I drafted, but that had complications of its own. Another could perhaps be to use a class method instead.
| class ExampleResourceFilter(StrawberryFilter): | ||
| my_attr: StrFilter | None = None | ||
|
|
||
| def apply_filters[*Ts](self, stmt: Select[tuple[*Ts]]) -> Select[tuple[*Ts]]: |
There was a problem hiding this comment.
Great for the draft. For the eventual interface it's still unclear to me what we want this to be exactly. Mostly does it modify the select or does it simply return a join directive.
There was a problem hiding this comment.
Like a cte that we then join to filter on?
I feel like modifying the select might be clearer. wdyt?
| """ | ||
|
|
||
| loader = StrawberrySQLAlchemyLoader(async_bind_factory=get_session_factory()) | ||
| ResourceFilter = context.graphql_service.build_resource_filter() |
There was a problem hiding this comment.
mypy still hates this
There was a problem hiding this comment.
Is casting this as typing.Any the best solution?
|
|
||
|
|
||
| @strawberry.input | ||
| class BaseResourceFilter(ABC): |
There was a problem hiding this comment.
Maybe ResourceFilterABC is better
150b928 to
7e6b539
Compare
sanderr
left a comment
There was a problem hiding this comment.
I still wonder if we should / can restrict what apply_filters() can do. And I'm still leaning towards yes. But, I think that perhaps we should carry on like this first, and once we have a clear idea of what lsm needs exactly and how it will all fit together (could even be after the first merge), we can come back to this consideration and see if we can streamline that interface just a bit more.
| for filter_cls in self.all_filters: | ||
| stmt = filter_cls.apply_filter(stmt=stmt, filter_instance=filter_instance) |
There was a problem hiding this comment.
I wonder if the order ever matters here (apart from that it should be deterministic). I think not? i.e. we could safely call this on the lsm filters first and only then on the core ones?
There was a problem hiding this comment.
I believe that the order does not matter.
| def update_resource_filter(self, ext_filter: type[ResourceFilterABC]) -> None: | ||
| """ | ||
| Adds an extension's ResourceFilterABC implementation to be composed into the ResourceFilter that is exposed to the user | ||
| """ | ||
| self.resource_filter_engine.register_extension_filter(ext_filter) | ||
| self._composed_resource_filter_cls = None |
There was a problem hiding this comment.
Do we really need to be this dynamic? I think it could be a bit more robust / consistent if we can make sure that we initialize once to start with, and reject additional filters afterwards. Wdyt?
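One way the "initialize once, reject afterwards" idea could look, as a sketch only. `FilterRegistry`, `freeze()` and the two placeholder filter classes are hypothetical names, not part of the PR.

```python
class FilterRegistry:
    """Hypothetical registry: open for registration until frozen, closed afterwards."""

    def __init__(self) -> None:
        self._filters: dict[str, type] = {}
        self._frozen = False

    def register(self, extension_name: str, filter_cls: type) -> None:
        if self._frozen:
            raise RuntimeError(
                f"Too late to register a filter for {extension_name!r}: registry is frozen"
            )
        self._filters[extension_name] = filter_cls

    def freeze(self) -> tuple[type, ...]:
        # Idempotent: every call returns the components registered before the first freeze.
        self._frozen = True
        return tuple(self._filters.values())


class CoreFilter: ...


class LsmFilter: ...


registry = FilterRegistry()
registry.register("core", CoreFilter)
components = registry.freeze()

try:
    registry.register("lsm", LsmFilter)
    rejected = False
except RuntimeError:
    rejected = True
```

Compared to resetting a cached composed class on every registration, a late registration here fails loudly instead of silently changing the schema.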
| from strawberry.types.execution import ExecutionResult | ||
|
|
||
|
|
||
| class ResourceFilterEngine: |
There was a problem hiding this comment.
If we move this into graphql.schema (and perhaps make it a singleton by operating at class level rather than instance level), can we get rid of the whole passing around of the slice and engine that's going on there? Or am I missing something?
|
|
||
| environment: uuid.UUID | ||
| is_orphan: bool | None = strawberry.UNSET | ||
| purged: bool | None = strawberry.UNSET |
There was a problem hiding this comment.
Why do we need purged at this level?
| """ | ||
|
|
||
| environment: uuid.UUID | ||
| is_orphan: bool | None = strawberry.UNSET |
There was a problem hiding this comment.
Do we still need is_orphan at this level, considering what we discussed earlier this week? i.e. lsm filters are always about the latest intent (we may still need to figure out how to enforce that, but that's a different matter).
There was a problem hiding this comment.
I put is_orphan and purged at this level to make mypy happy because they are used in the resources query.
There was a problem hiding this comment.
I see.
- I worry that `is_orphan` might cause us trouble in the next stage, but let's see about that when we get there. For now it's fine here.
- I'm pretty sure that we'll have to update the condition that we use to determine whether or not we can use the simplified count (a complication I hadn't thought of until just now). Once we have addressed that, if we're lucky `purged` will have disappeared from the `resources` query.
|
|
||
| @classmethod | ||
| @abstractmethod | ||
| def apply_filter[*Ts](cls, stmt: Select[tuple[*Ts]], filter_instance: typing.Self) -> Select[tuple[*Ts]]: ... |
There was a problem hiding this comment.
Apparently this Self isn't really safe. Took it to Slack.
| :param purged: If we want to filter on the "purged" attribute or not. | ||
| """ | ||
|
|
||
| environment: uuid.UUID |
There was a problem hiding this comment.
With this move we'll have to make sure to not reintroduce the bug where we omit the environment filter. But I guess the test you expanded will protect us from that.
| """ | ||
| if filter is not None and filter is not strawberry.UNSET: | ||
| stmt = filter.apply_filters(stmt) | ||
| if isinstance(filter, ResourceFilterABC): |
There was a problem hiding this comment.
I still need to give this some more thought to see if we can do anything about the clunkiness here. No immediate ideas yet.
| def register_extension_filter(self, extension_name: str, filter_cls: type[ResourceFilterABC]) -> None: | ||
| """ | ||
| Register an extension filter. | ||
| This is only possible if the composed ResourceFilter has yet to be generated |
There was a problem hiding this comment.
I think this can be improved to better address the caller. I.e. "the composed ResourceFilter has yet to be generated" is an implementation detail. The corresponding client contract is that it must be called before this slice starts, so during the prestart stage, or the start stage of a slice that registers this one as a dependency.
| async def start(self) -> None: | ||
| assert self.compiler_service is not None | ||
| self.schema = get_schema( | ||
| GraphQLContext(compiler_service=self.compiler_service, resource_filter_components=list(self.all_filters.values())) |
There was a problem hiding this comment.
Why pass this through the context? It requires quite some wiring, and I don't think it really offers us anything?
There was a problem hiding this comment.
I think I see. You use the context so you can access it in the resources method. But wouldn't it be a lot simpler to simply store it as a class var on the Query object? Or even just use the value from the closure, though I think I prefer to make it explicit on Query.
There was a problem hiding this comment.
I tried setting it as a variable on Query but got some errors: it needs to be a valid GraphQL type, and I got some conflicts there. And I feel like context is a good "container" for all the small things that we get from the slice to the schema.
| assert isinstance(compiler_service, CompilerService) | ||
| self.context = GraphQLContext(compiler_service=compiler_service) | ||
| self.compiler_service = compiler_service | ||
| self.all_filters[SLICE_GRAPHQL] = CoreResourceFilter |
There was a problem hiding this comment.
Bit of a nitpick, and I don't care too strongly, but I think it could turn out slightly cleaner if we move this responsibility to get_schema. i.e. the slice knows about extensions, the schema knows about core and how to extend it with extensions. Apart from the slightly debatable semantical improvement, I think it offers the following benefits:
- We get rid of the back-and-forth where we import a filter from the schema here, to pass it back down to the schema. This flow feels a bit weird to me.
- We make the `typing.Annotated` a bit more safe, because if `get_schema` is the one that injects the core filter, then we actually know that the generated type will be an instance of it. Otherwise we introduce more coupling, because the `typing.Annotated` depends on the assumption that the slice injects the core resource filter.
There was a problem hiding this comment.
Not sure what you mean by this. So we wouldn't have all_filters as a concept on the slice? Or do you mean that GraphQLSlice does not register CoreResourceFilter but keeps all the other filters, and then we just append CoreResourceFilter on get_schema?
There was a problem hiding this comment.
Pretty much. Well, I think prepend would be safest, but that's beside the point.
| resource_filter_instances: list[ResourceFilterABC] = [] | ||
| for filter_type in info.context.get("resource_filter_components"): | ||
| filter_fields = { | ||
| field.name: getattr(filter, field.name, strawberry.UNSET) for field in dataclasses.fields(filter_type) |
There was a problem hiding this comment.
Why use getattr()? To be extra robust? It normally shouldn't ever be absent, should it?
There was a problem hiding this comment.
I'm not sure what the alternative would look like. Or do you mean the default to strawberry.UNSET?
There was a problem hiding this comment.
I was somewhat mistaken here, sorry. What I meant indeed boils down to the default.
There was a problem hiding this comment.
I remember I ran into some issues without the default. But that may have been from a previous iteration where we were converting to dict.
The tests seem to pass without the default, I'll remove it so we can more easily diagnose if something goes wrong
| if filter is not None and filter is not strawberry.UNSET: | ||
| stmt = filter.apply_filters(stmt) | ||
| for filter_instance in filter: | ||
| if filter_instance is not strawberry.UNSET: |
There was a problem hiding this comment.
Not necessarily in scope for this PR, but why do we check for UNSET actually? It's not of type StrawberryFilter, is it?
There was a problem hiding this comment.
I think this is just an iteration artifact. I check for UNSET before I call this when applicable
|
I updated the draft to match the design. There are still some mypy issues that I might need some assistance with. EDIT: I tried on the return object of the resources query |
sanderr
left a comment
There was a problem hiding this comment.
I tried to do a broad, high level review and to only comment on things that are relevant at that level.
Some things that come to mind that are still missing (there are probably others, I didn't cross check with the design):
- support for different version filters (we still branch our query on `include_orphans`)
| """ | ||
|
|
||
| @classmethod | ||
| def get_resource_filter_input_class(cls) -> type[ResourceFilterABC] | None: |
There was a problem hiding this comment.
Interesting. I believe I proposed something like this at some point and we didn't go this way. Why did you change it? I assume it has to do with the fact that we need both input and output? How / when do they differ in their fields?
EDIT: Ah, I guess "meta" filters like includeOwned... We could probably also use annotations on the fields to determine whether they're in/out/both. But let's go with this approach for now and see how it ends up. If it looks clean, this is probably the simplest. If it turns out to be too repetitive / coupled, perhaps we can reconsider.
| return None | ||
|
|
||
| @classmethod | ||
| def get_context_loaders(cls) -> dict[str, object]: |
There was a problem hiding this comment.
What's this? The docstring doesn't make it clear to me and I don't see it being used anywhere.
|
|
||
| Resource: type = build_resource_return_obj(extension_contributions) | ||
|
|
||
| class CustomInfo(Info[StrawberryInfoContextDict, object]): |
There was a problem hiding this comment.
I thought we were moving away from this context object?
There was a problem hiding this comment.
(Also related to the comment above) I don't think we can/should. The way we fetch lsm related information will be through custom resolvers (like the purged attribute). I want the extensions to inject stuff that they require here to help in the logic of the custom resolvers. This is an example that Claude made:
```python
@dataclasses.dataclass
class _LsmResourceData:
    service_entity: str
    service_instance_id: uuid.UUID
    lifecycle_state: str
```

Shared DataLoader helper:

```python
async def _load_lsm_data(root: Any, info: Info) -> Optional[_LsmResourceData]:
    make_loader = info.context.get("lsm_resource_info_loader_factory")
    if make_loader is None:
        return None
    cache_key = f"_lsm_info_loader_{root.environment}"
    loader = info.context.get(cache_key)
    if loader is None:
        loader = make_loader(root.environment)
        info.context[cache_key] = loader
    return await loader.load(root.resource_id)
```

Three individual field resolvers (each call _load_lsm_data, extract one attribute):

```python
async def _get_lsm_service_entity(root, info) -> Optional[str]: ...
async def _get_lsm_service_instance_id(root, info) -> Optional[uuid.UUID]: ...
async def _get_lsm_lifecycle_state(root, info) -> Optional[str]: ...
```

Mixin class:

```python
class _LsmResourceMixin:
    lsm_service_entity: Optional[str] = strawberry.field(resolver=..., description="...")
    lsm_service_instance_id: Optional[uuid.UUID] = strawberry.field(resolver=..., description="...")
    lsm_lifecycle_state: Optional[str] = strawberry.field(resolver=..., description="...")
```

Performance: No impact from having 3 resolvers instead of 1. The DataLoader batches all loader.load(resource_id) calls within the same async tick into a single SQL query, and caches results by key; subsequent load() calls for the same key return the cached _LsmResourceData without a database round-trip. Result: still one SQL query per environment per page regardless of which or how many inline LSM fields are requested.

LsmResourceFilterFields

```python
@strawberry.input
class LsmResourceFilterFields:
    service_entity: Optional[StrFilter] = strawberry.UNSET
    service_instance: Optional[list[uuid.UUID]] = strawberry.UNSET
    lifecycle_state: Optional[StrFilter] = strawberry.UNSET
    service_instance_version: Optional[int] = strawberry.UNSET
    include_owned: Optional[bool] = strawberry.UNSET
```

DataLoader for inline LSM fields

get_context_loaders() returns {"lsm_resource_info_loader_factory": make_loader} where make_loader is a per-environment factory creating a DataLoader on first use. One DataLoader instance per environment per request; all resources on the same page share one batch SQL query.

The batch query fetches all three values in one shot (selecting more columns than a caller might need is negligible cost compared to an extra round-trip):

```sql
SELECT
    elem AS resource_id,
    lt.resource_id_value AS instance_id,
    lt.attributes->>'service_entity' AS service_entity,
    si.state AS lifecycle_state
FROM resource lt
JOIN resource_set_configuration_model rscm ON ...
JOIN (SELECT MAX(version) AS version FROM configurationmodel
      WHERE environment = $1 AND released = TRUE) lv ON rscm.model = lv.version
LEFT JOIN lsm_serviceinstance si
    ON si.id::text = lt.resource_id_value AND si.environment = lt.environment
CROSS JOIN LATERAL jsonb_array_elements_text(lt.attributes->'resources') elem
WHERE lt.environment = $1
    AND lt.resource_type = 'lsm::LifecycleTransfer'
    AND lt.attributes->>'resources' != '<<undefined>>'
    AND elem = ANY($2)
```

| return strawberry.input( | ||
| dataclasses.dataclass(kw_only=True)(type("ResourceFilter", resource_filter_components, {})) |
There was a problem hiding this comment.
Wouldn't it be simpler to move this part to the caller so that we don't have to return a tuple? Optionally make it a second helper method tuple[type[ResourceFilterABC], ...] -> type if you don't want the complexity in get_schema()?
| _resource_annotations: dict[str, object] = {} | ||
| _resource_attrs: dict[str, object] = {} | ||
| _skip_attrs = {"__dict__", "__weakref__", "__doc__", "__annotations__", "__module__"} | ||
| for _base in (CoreResourceBase, *_resource_mixins): | ||
| _resource_annotations.update(_base.__dict__.get("__annotations__", {})) | ||
| for _k, _v in _base.__dict__.items(): | ||
| if _k not in _skip_attrs: | ||
| _resource_attrs[_k] = _v |
There was a problem hiding this comment.
This level of meta-programming scares me a bit with respect to stability. I need to think if I see a better way.
There was a problem hiding this comment.
Yeah, I agree that this is one of the parts that scares me the most, Claude wrote it, for better or worse
| if _k not in _skip_attrs: | ||
| _resource_attrs[_k] = _v | ||
| # Can't do the same as the resource filter because the mixins can't have the mapper.type decorator and that is required | ||
| return mapper.type(models.Resource)(type("Resource", (), {"__annotations__": _resource_annotations, **_resource_attrs})) |
There was a problem hiding this comment.
If we use mapper.type(models.Resource), won't it prevent the query from returning anything other than a models.Resource object?
Open question. I don't feel like I understand the framework sufficiently to give an answer myself.
There was a problem hiding this comment.
No. It creates a strawberry type based on the attributes of models.Resource, but it is indeed not the same object, since we can add fields to it (we already do the same with the purged attribute, for example).
| self.schema = get_schema( | ||
| compiler_service=self.compiler_service, extension_contributions=list(self.extension_contributions.values()) | ||
| ) |
There was a problem hiding this comment.
Nice. I don't recall if I've already reviewed since this change, but I prefer it a lot over the previous approach.
Co-authored-by: Sander Van Balen <[email protected]>
…ecks in the GraphQL schema with an `is_set` TypeGuard helper. (PR #10496)

# Description

Factored out of #10328. Introduces an `is_set` TypeGuard helper in `inmanta.graphql.schema` and uses it to replace the repeated `x is not None and x is not strawberry.UNSET` checks scattered through the GraphQL filter/pagination code.

```python
def is_set[T](value: T | None) -> typing.TypeGuard[T]:
    return value is not None and value is not strawberry.UNSET
```

Because it's a `TypeGuard`, mypy narrows away `None`/`strawberry.UNSET` in the positive branch exactly like the inline check did, so there is no behavioural change; this is a pure readability refactor.

Note: the one spot that passed `getattr(...)` (typed `Any`) now annotates `attr: CustomFilter | None` so the TypeGuard narrows to a concrete type instead of `Never`.

# Self Check:

- [x] Attached issue to pull request (N/A: refactor split out of #10328)
- [x] Changelog entry
- [x] Type annotations are present
- [x] Code is clear and sufficiently documented
- [x] No (preventable) type errors (verified with mypy: no new errors vs master)
- [x] Sufficient test cases (no behaviour change; existing GraphQL filter/pagination tests cover the touched code)
- [x] Correct, in line with design

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Description
Introduces `ResourceFilterABC`, which any extension can implement and register on the `GraphQLSlice`. These implementations are then composed into a single `ResourceFilter` that is then exposed to the user on GraphQL.

closes Add ticket reference here
Self Check:
Strike through any lines that are not applicable (`~~line~~`) then check the box