Feature: v1 endpoints for address stats with pagination and filtering #894
Pull request overview
This PR introduces a new v1 API endpoint /api/v1/addresses/stats.json that provides address statistics with enhanced filtering, sorting, and pagination capabilities. The implementation leverages PostgreSQL materialized views with trigram indexing for efficient substring search.
Key changes:
- Database materialized view address_total_message_stats with trigram indexing for fast address substring search
- New query parameter schema with support for filtering by message type counts, sorting options, and pagination
- Backend functions for fetching address statistics and finding matching addresses
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| deployment/migrations/versions/0040_d6539a42cd51_create_address_summary_view.py | Creates materialized view for address message counts with trigram index for substring search |
| src/aleph/schemas/addresses_query_params.py | Defines query parameters schema with filtering, sorting, and pagination support |
| src/aleph/db/accessors/messages.py | Adds fetch_stats_for_addresses and find_matching_addresses functions with SQL-based queries |
| src/aleph/web/controllers/accounts.py | Implements new v2 endpoint handler with pagination and custom JSON encoding for Decimal types |
| src/aleph/web/controllers/routes.py | Registers new v1 endpoint route |
| tests/db/test_address_stats.py | Comprehensive test coverage for address stats functions including filtering, sorting, and pagination |
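
The summary mentions custom JSON encoding for Decimal types. As a rough illustration of why that is needed (json.dumps rejects Decimal by default), here is a minimal standard-library sketch. The encode_decimal helper and the row fields are hypothetical and are not taken from the PR.

```python
import json
from decimal import Decimal


def encode_decimal(value):
    """Fallback for json.dumps: turn Decimal into a plain JSON number."""
    if isinstance(value, Decimal):
        # Whole numbers become int; anything else becomes float,
        # which may lose precision for very large or very precise values.
        return int(value) if value == value.to_integral_value() else float(value)
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


# Hypothetical row shaped like an address-stats record.
row = {"address": "0xabc", "messages": Decimal("42"), "ratio": Decimal("0.5")}
payload = json.dumps(row, default=encode_decimal)
```

Aggregates such as SUM() commonly come back from PostgreSQL drivers as Decimal, which is presumably why the handler needs an encoder of this kind.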
nesitor
left a comment
Some parts of this were done with AI and need to be redone properly, following the patterns we already have and avoiding security issues.
deployment/migrations/versions/0041_d6539a42cd51_create_address_summary_view.py
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.
deployment/migrations/versions/0040_d6539a42cd51_create_address_summary_view.py
d2ab60a to
d07f760
Compare
odesenfans
left a comment
Found a few bugs.
deployment/migrations/versions/0041_d6539a42cd51_create_address_summary_view.py
src/aleph/db/accessors/address.py
    ).group_by(AddressStats.address)

    # Filter by address (list)
    if addresses:
There's a bug here. If find_matching_addresses returns an empty list (= no match), this will instead query all of the addresses. You should:
- Check for if addresses is not None
- Detect the case where there is no match and just not enter this function at all (no need for a DB query if we know that there is no match).
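
The two points above can be sketched with plain Python. This is an illustration only: select_addresses stands in for the real DB query, and stats_for_search is a hypothetical caller, neither is the PR's actual code.

```python
from typing import List, Optional, Sequence


def select_addresses(
    all_addresses: Sequence[str], addresses: Optional[Sequence[str]]
) -> List[str]:
    """Stand-in for the DB query: None means no filter, [] means match nothing."""
    # `if addresses:` would treat [] like None and return everything.
    if addresses is not None:
        wanted = set(addresses)
        return [a for a in all_addresses if a in wanted]
    return list(all_addresses)


def stats_for_search(
    all_addresses: Sequence[str], matches: Optional[Sequence[str]]
) -> List[str]:
    """Hypothetical caller: skip the query entirely when the search matched nothing."""
    if matches is not None and len(matches) == 0:
        return []
    return select_addresses(all_addresses, matches)
```

Keeping None and the empty list distinct is what lets the caller tell "no search term given" apart from "search term given, nothing found".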
Wouldn't it be better to return a 404 here if matched_address is an empty list?
        self,
        session: DbSession,
        filters: Optional[Dict[str, Any]] = None,
    ) -> int:
What's the logic behind this caching? Is querying the materialized view too slow?
The caching was there to avoid recomputing the total number of addresses every time.
If a user browses the explorer, switching pages would re-query the pagination total on every request.
It was mostly for the filter I added earlier, but if we remove the filter part we might also want to remove the cache. The query shouldn't be that expensive, even if we run it 10 times.
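
For context on why the total matters on every page change, here is a small sketch of the arithmetic a paginated endpoint typically does with it. The page_window name and its return shape are assumptions for illustration; only the convention that per_page=0 disables pagination comes from the PR's code comments.

```python
import math
from typing import Optional, Tuple


def page_window(total: int, page: int, per_page: int) -> Tuple[int, Optional[int], int]:
    """Return (offset, limit, total_pages) for a 1-based page number."""
    if per_page == 0:
        # Pagination disabled: one page containing everything, no LIMIT.
        return 0, None, 1
    total_pages = max(1, math.ceil(total / per_page))
    offset = (page - 1) * per_page
    return offset, per_page, total_pages
```

The offset and limit depend only on the request, so the total is needed solely to report the page count. That is the value the cache was meant to avoid recomputing.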
    enum_filters = {SortBy(k): v for k, v in filters.items()}

    # Pass per_page=0 to disable pagination for the count query
    stmt = make_fetch_stats_address_query(filters=enum_filters, per_page=0)
You never specify address_contains here, so you always query all addresses.
src/aleph/db/accessors/address.py
    address_query = (
        select(AddressTotalMessages.address)
        .where(AddressTotalMessages.address.ilike(pattern))
Use lower() to make sure that you hit the GIN index.
Suggested change:
- .where(AddressTotalMessages.address.ilike(pattern))
+ .where(func.lower(AddressTotalMessages.address).like(pattern.lower()))
src/aleph/db/accessors/address.py
    def find_matching_addresses(
        session: DbSession, address_contains: str, limit: int = 5000
5000 is a bit big. Could result in performance issues when using the result in later queries, especially for ones with lots of potential matches. It's not a huge problem right now, but maybe a subquery could be better? i.e. pass address_contains to make_fetch_stats_address_query and use a subquery if it is present?
    filters: Dict[SortBy, int] | None = Field(
        default=None,
        description="Minimum values required for each sort category. Example: { 'POST': 3 }",
    )
What is the point of this field? Defining it as a dictionary is also weird, how are you supposed to pass it as query parameters?
If I remember correctly, it was something like:
?filters[POST]=300
The goal was to filter by a minimum value. For example, if you want addresses that sent STORE messages, you can just filter on that.
But I guess this isn't really useful, so I can remove it.
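
To make the bracketed form concrete, here is a sketch of how a query string such as ?filters[POST]=300 could be turned into a dictionary using only the standard library. parse_min_filters is a hypothetical helper, not part of the PR, and it raises ValueError on non-integer values rather than validating them.

```python
import re
from typing import Dict
from urllib.parse import parse_qsl

# Matches keys of the form filters[NAME] and captures NAME.
_BRACKET_KEY = re.compile(r"^filters\[(\w+)\]$")


def parse_min_filters(query_string: str) -> Dict[str, int]:
    """Collect filters[NAME]=N pairs from a raw query string into {NAME: N}."""
    filters: Dict[str, int] = {}
    for key, value in parse_qsl(query_string):
        match = _BRACKET_KEY.match(key)
        if match:
            filters[match.group(1)] = int(value)
    return filters
```

This also shows the reviewer's concern: a dictionary field has no native query-string representation, so the endpoint needs a custom convention like this one.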
        return web.json_response(output, dumps=lambda v: json.dumps(v))


    async def addresses_stats_view_v2(request: web.Request):
Add at least a simple integration test that uses most features of the endpoint.
src/aleph/web/controllers/routes.py
    )

    app.router.add_get("/api/v0/addresses/stats.json", accounts.addresses_stats_view)
    app.router.add_get("/api/v1/addresses/stats.json", accounts.addresses_stats_view_v2)
Naming: a bit weird to have v1 use a function called v2. Name the original addresses_stats_view_v0 and the new one addresses_stats_view_v1.
…w and address_total_message_stats
… addresses_stats_view_v2 to addresses_stats_view_v1
…for faster search
7aa2dc4 to
d3bf43f
Compare
src/aleph/db/accessors/address.py
        Subquery defining the set of addresses to include.
        Only used when address filtering is requested.
        """
        pattern = f"%{address_contains.lower()}%"
Missed this, this is a risk for SQL injection. Remove this and use func.lower(AddressStats.address).contains(address_contains.lower()) instead.
fixed
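
One related point on user-supplied search terms: even when the pattern is passed as a bound parameter, the characters % and _ in the term still act as LIKE wildcards unless they are escaped. The sketch below shows this with the standard-library sqlite3 module as a stand-in for PostgreSQL. The table contents and the escape_like and search helpers are made up for illustration. In SQLAlchemy, contains() accepts autoescape=True for the same purpose.

```python
import sqlite3


def escape_like(term: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards so the term is matched literally."""
    return (
        term.replace(escape, escape + escape)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


# In-memory table with made-up addresses, standing in for the real view.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address_stats (address TEXT)")
conn.executemany(
    "INSERT INTO address_stats VALUES (?)",
    [("0xAbC_1",), ("0xabcd1",), ("0xfff",)],
)


def search(term: str):
    """Case-insensitive substring search; the pattern is a bound parameter."""
    pattern = f"%{escape_like(term.lower())}%"
    rows = conn.execute(
        "SELECT address FROM address_stats "
        "WHERE lower(address) LIKE ? ESCAPE '\\' ORDER BY address",
        (pattern,),
    )
    return [r[0] for r in rows]
```

Without the escaping step, a search for abc_ would also match 0xabcd1, because the underscore would match any single character.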
src/aleph/db/models/address.py
    class ViewBase:
        pass


    # Create the base with the class as a template
    Base = declarative_base(cls=ViewBase)
Why do you declare a new Base class? Any reason not to reuse the one that's in db/models/base.py?
The goal of this PR is to add a new endpoint, /api/v1/addresses/stats.json, to handle address stats with filtering and pagination.
Related Clickup or Jira tickets : ALEPH-XXX
Self proofreading checklist
Changes
This pull request introduces a new, efficient, and flexible system for querying address statistics, including a new API endpoint, database materialized views, and backend logic. The main focus is to enable advanced filtering, sorting, and substring search of addresses, along with robust pagination and improved performance. Comprehensive tests are also added to ensure correctness.
The most important changes are:
Database and Backend Infrastructure:
- address_total_message_stats materialized view, which aggregates total message counts per address and includes indexes (including a trigram index) to support fast substring search and efficient sorting/filtering.
- New functions in messages.py:
  - fetch_stats_for_addresses for advanced address stats queries with filtering, sorting, and pagination.
  - find_matching_addresses for fast substring search using the new trigram index.

API and Schema Enhancements:
- AddressesQueryParams schema, supporting flexible query parameters for filtering, sorting, and pagination of address statistics, including substring search.
- /api/v1/addresses/stats.json with the addresses_stats_view_v2 handler, which leverages the new backend logic and schema for efficient address stats queries.

Testing: