Issue/3608 alt 1 #3643
Pull request overview
Implements the new semantic index query API access-control flow (#3608) by adding a Registry access-filter endpoint and updating semantic search to use a two-phase search (broad vector search + access filtering, then a fallback search constrained to accessible record IDs).
Changes:
- Added Registry `/records/access-filter` endpoint and supporting request/response models + auth/tenant isolation tests.
- Added Phase 1/Phase 2 access-control logic to semantic search service, plus query-builder support for recordId-scoped vector queries.
- Added TypeScript API clients and updated semantic-search API wiring/tests to pass JWT + tenant context.
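
The two-phase flow described above could be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `Deps` interface, the `twoPhaseSearch` function, and the rule that the fallback runs when Phase 1 leaves fewer than `k` visible hits are all assumptions made for the example.

```typescript
// Hypothetical types; the models in the PR may differ.
interface Hit {
  recordId: string;
  score: number;
}

// Hypothetical dependencies standing in for the vector index,
// the Registry access-filter client and the Search API client.
interface Deps {
  // Vector search; when recordIds is given, the search is limited to those records.
  vectorSearch(query: string, k: number, recordIds?: string[]): Promise<Hit[]>;
  // Returns the subset of the given IDs that the current user may read.
  filterAccessible(recordIds: string[]): Promise<string[]>;
  // Returns IDs of datasets the current user can access for this query.
  accessibleDatasetIds(query: string): Promise<string[]>;
}

async function twoPhaseSearch(deps: Deps, query: string, k: number): Promise<Hit[]> {
  // Phase 1: broad vector search, then drop hits the user cannot read.
  const broad = await deps.vectorSearch(query, k);
  const allowed = new Set(await deps.filterAccessible(broad.map((h) => h.recordId)));
  const visible = broad.filter((h) => allowed.has(h.recordId));
  if (visible.length >= k) {
    return visible.slice(0, k);
  }

  // Phase 2 (assumed trigger: too few visible hits): search again,
  // constrained to record IDs the user is known to have access to.
  const ids = await deps.accessibleDatasetIds(query);
  if (ids.length === 0) {
    return visible;
  }
  const scoped = await deps.vectorSearch(query, k, ids);
  return scoped.slice(0, k);
}
```

Injecting the three dependencies keeps the flow testable with simple stubs, which is in the spirit of the unit tests listed for `SemanticSearchService.ts`.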
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| magda-typescript-common/src/SearchApiClient.ts | New client for Search API dataset search (used for Phase 2 fallback). |
| magda-typescript-common/src/RegistryApiClient.ts | New client for Registry access-filter endpoint (used for Phase 1 filtering). |
| magda-semantic-search-api/src/service/SemanticSearchService.ts | Implements two-phase access-aware semantic search flow. |
| magda-semantic-search-api/src/service/queryBuilder.ts | Adds recordId-scoped KNN query builder. |
| magda-semantic-search-api/src/api/createApiRouter.ts | Plumbs session + tenant headers into SearchParams. |
| magda-semantic-search-api/src/model.ts | Extends search params to carry jwt + tenantId. |
| magda-semantic-search-api/src/index.ts | Wires new clients into service + adds Search API URL CLI option. |
| magda-semantic-search-api/src/test/service/semanticSearchService.spec.ts | Adds unit tests for phase 1 filtering + phase 2 fallback behavior. |
| magda-semantic-search-api/src/test/service/queryBuilder.spec.ts | Adds coverage for new query-builder branches and recordId-scoped queries. |
| magda-semantic-search-api/src/test/searchRoute.spec.ts | Ensures headers are forwarded into service params + adds /retrieve route test. |
| magda-scala-common/src/main/scala/au/csiro/data61/magda/model/Registry.scala | Adds shared request/response case classes for access-filter endpoint. |
| magda-registry-api/src/main/scala/au/csiro/data61/magda/registry/RecordsServiceRO.scala | Adds the access-filter route to Registry API (read-only service). |
| magda-registry-api/src/test/scala/au/csiro/data61/magda/registry/RecordServiceAuthSpec.scala | Adds auth + tenant isolation + input sanitization tests for access-filter. |
| CHANGES.md | Notes the new semantic index query API/access-control changes. |
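
As an illustration of what a recordId-scoped KNN query might look like, here is a minimal sketch. It assumes an OpenSearch-style k-NN query with a `filter` clause; the field names `embedding` and `recordId` and the `buildKnnQuery` helper are hypothetical and may not match `queryBuilder.ts`.

```typescript
// Hypothetical field names; the real index mapping may use different ones.
const VECTOR_FIELD = "embedding";
const RECORD_ID_FIELD = "recordId";

interface KnnClause {
  vector: number[];
  k: number;
  // Optional filter restricting candidates to specific records.
  filter?: { terms: Record<string, string[]> };
}

interface KnnQueryBody {
  size: number;
  query: { knn: Record<string, KnnClause> };
}

// Builds a k-NN query body; when recordIds is provided, only those
// records are eligible (an empty list therefore matches nothing).
function buildKnnQuery(vector: number[], k: number, recordIds?: string[]): KnnQueryBody {
  const clause: KnnClause = { vector, k };
  if (recordIds !== undefined) {
    clause.filter = { terms: { [RECORD_ID_FIELD]: recordIds } };
  }
  return { size: k, query: { knn: { [VECTOR_FIELD]: clause } } };
}
```

Calling `buildKnnQuery(vector, k)` without IDs would give the broad query, while passing the accessible IDs gives the scoped variant.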
t83714 left a comment
Well done 👍 Thanks for pulling this together & congrats on your first PR 🎉 - I've left a few comments. Please let me know your thoughts.
…ry/RecordsServiceRO.scala Co-authored-by: Jacky Jiang <[email protected]>
…ry/RecordsServiceRO.scala Co-authored-by: Jacky Jiang <[email protected]>
120e67c to 5a16d6a
t83714 left a comment
@chensuihui thanks for the updates - left a few comments - I think we're very close 😄
A few extra points:
- noticed that the latest commit was force pushed - can we avoid force pushes in future? Force pushing is generally considered dangerous because it rewrites shared history. It can also confuse diffs / PR discussions, since they suddenly change. Instead, when your branch is out of sync with the remote, merge the remote branch, e.g. `git merge --no-commit --no-ff origin/issue/3608-alt-1`
- the PR now has merge conflicts that require fixing
- I sent your latest commit to CI for checking and got some errors. Can you have a look? https://gitlab.com/magda-data/magda/-/pipelines/2467160390
t83714 left a comment
Great! I can see all pipelines passed 🚀
Left one comment (re: response status code) for a last-minute tweak~
After that I guess we're ready to merge the PR 🎉
    }
    const parsed = Number(raw);
    if (!Number.isInteger(parsed) || parsed < 0) {
        throw new Error("Invalid X-Magda-Tenant-Id");
Sorry - I guess my last review missed this - can we throw a BadRequestError here instead of Error? Right now, the Error is forwarded to the generic error middleware, which returns 500 (Internal Server Error).
We can probably throw a BadRequestError, catch it in the error middleware (in this file), and respond with a proper 400 (Bad Request) status code.
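
A minimal sketch of the suggestion, under stated assumptions: the `BadRequestError` class, `parseTenantId`, and `statusFor` below are defined locally for illustration and are not the project's actual classes or middleware. Treating a missing header as `undefined` is also an assumption, since that part of the diff is not shown.

```typescript
// Hypothetical error type carrying an HTTP status code.
class BadRequestError extends Error {
  readonly statusCode = 400;
  constructor(message: string) {
    super(message);
    this.name = "BadRequestError";
    // Keep instanceof working when compiled to older JS targets.
    Object.setPrototypeOf(this, BadRequestError.prototype);
  }
}

// Parses the X-Magda-Tenant-Id header value.
// Assumption: a missing or empty header yields undefined.
function parseTenantId(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") {
    return undefined;
  }
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed < 0) {
    throw new BadRequestError("Invalid X-Magda-Tenant-Id");
  }
  return parsed;
}

// What an error middleware could do: map known client errors to 400
// and everything else to 500.
function statusFor(err: unknown): number {
  return err instanceof BadRequestError ? err.statusCode : 500;
}
```

In an Express-style error handler, the result of `statusFor(err)` would be the status passed to the response, so an invalid tenant ID is reported as a client error rather than a server failure.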
@chensuihui I created a test release: https://github.com/magda-io/magda/releases/tag/v6.0.0-alpha.15 from your branch. Once the release job (https://gitlab.com/magda-data/magda/-/pipelines/2507504434) is done, you can use the release version number
@chensuihui when you have time, please let me know how you went with local deployment testing of https://github.com/magda-io/magda/releases/tag/v6.0.0-alpha.16
What this PR does
Fixes #3608
This PR implements the new semantic index query API access control flow.
Specifically, it:
- Adds a filter records by access endpoint to the Registry API so a list of record IDs can be filtered based on the current user's read access

Checklist