
@1yam 1yam commented Dec 15, 2025

The goal of this PR is to add a new endpoint, /api/v1/addresses/stats.json, that serves address stats with filtering and pagination.

Related Clickup or Jira tickets : ALEPH-XXX

Self proofreading checklist

  • Is my code clear enough and well documented
  • Are my files well typed
  • New translations have been added or updated if new strings have been introduced in the frontend
  • Database migration files are included
  • Are there enough tests
  • Documentation has been included (for new feature)

Changes

This pull request introduces a new, efficient, and flexible system for querying address statistics, including a new API endpoint, database materialized views, and backend logic. The main focus is to enable advanced filtering, sorting, and substring search of addresses, along with robust pagination and improved performance. Comprehensive tests are also added to ensure correctness.

The most important changes are:

Database and Backend Infrastructure:

  • Added a new Alembic migration to create the address_total_message_stats materialized view, which aggregates total message counts per address and includes indexes (including a trigram index) to support fast substring search and efficient sorting/filtering.
  • Updated the backend logic in messages.py to:
    • Add fetch_stats_for_addresses for advanced address stats queries with filtering, sorting, and pagination.
    • Add find_matching_addresses for fast substring search using the new trigram index.
    • Ensure materialized views are refreshed together for up-to-date stats.

API and Schema Enhancements:

  • Introduced the AddressesQueryParams schema, supporting flexible query parameters for filtering, sorting, and pagination of address statistics, including substring search.
  • Added a new API endpoint /api/v1/addresses/stats.json with the addresses_stats_view_v2 handler, which leverages the new backend logic and schema for efficient address stats queries. [1] [2]

Testing:

  • Added comprehensive tests for address stats functions, covering substring search, filtering, sorting, and pagination to ensure correctness and robustness of the new querying system.

Copilot AI review requested due to automatic review settings December 15, 2025 14:05
Copilot AI left a comment

Pull request overview

This PR introduces a new v1 API endpoint /api/v1/addresses/stats.json that provides address statistics with enhanced filtering, sorting, and pagination capabilities. The implementation leverages PostgreSQL materialized views with trigram indexing for efficient substring search.

Key changes:

  • Database materialized view address_total_message_stats with trigram indexing for fast address substring search
  • New query parameter schema with support for filtering by message type counts, sorting options, and pagination
  • Backend functions for fetching address statistics and finding matching addresses

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 13 comments.

| File | Description |
| --- | --- |
| deployment/migrations/versions/0040_d6539a42cd51_create_address_summary_view.py | Creates materialized view for address message counts with trigram index for substring search |
| src/aleph/schemas/addresses_query_params.py | Defines query parameters schema with filtering, sorting, and pagination support |
| src/aleph/db/accessors/messages.py | Adds fetch_stats_for_addresses and find_matching_addresses functions with SQL-based queries |
| src/aleph/web/controllers/accounts.py | Implements new v2 endpoint handler with pagination and custom JSON encoding for Decimal types |
| src/aleph/web/controllers/routes.py | Registers new v1 endpoint route |
| tests/db/test_address_stats.py | Comprehensive test coverage for address stats functions including filtering, sorting, and pagination |

Member

@nesitor nesitor left a comment

Some parts of this were done with AI and need to be redone properly, following the patterns we already have and avoiding security issues.

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.



@1yam 1yam force-pushed the 1yam-address-improvment branch from d2ab60a to d07f760 Compare December 22, 2025 09:31
Collaborator

@odesenfans odesenfans left a comment

Found a few bugs.

).group_by(AddressStats.address)

# Filter by address (list)
if addresses:
Collaborator

There's a bug here. If find_matching_addresses returns an empty list (i.e. no match), this will query all of the addresses instead. You should:

  1. Check for addresses is not None instead of relying on truthiness.
  2. Detect the no-match case and skip this function entirely (no need for a DB query when we already know there is no match).
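
For illustration, a minimal sketch of the difference between the two checks. This is plain Python with a hypothetical in-memory list (ALL_ADDRESSES) standing in for the stats table; the function names are made up for the example and are not the code in this PR.

```python
from __future__ import annotations

from typing import Optional, Sequence

# Hypothetical stand-in for the rows of the address stats view.
ALL_ADDRESSES = ["0xAAA", "0xBBB", "0xCCC"]


def select_buggy(addresses: Optional[Sequence[str]]) -> list[str]:
    # Truthiness check: an empty list is falsy, so "no match" skips the
    # filter and every address comes back.
    if addresses:
        return [a for a in ALL_ADDRESSES if a in addresses]
    return list(ALL_ADDRESSES)


def select_fixed(addresses: Optional[Sequence[str]]) -> list[str]:
    # None means "no address filter was requested".
    if addresses is None:
        return list(ALL_ADDRESSES)
    # Empty list means "a filter was requested and nothing matched":
    # return early instead of running a query.
    if not addresses:
        return []
    return [a for a in ALL_ADDRESSES if a in addresses]
```

With an empty match list, select_buggy([]) returns all three addresses while select_fixed([]) returns nothing, which is the behaviour the comment asks for.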


self,
session: DbSession,
filters: Optional[Dict[str, Any]] = None,
) -> int:
Collaborator

What's the logic behind this caching? Is querying the materialized view too slow?

Member Author

The caching was there to avoid recomputing the total number of addresses every time.
When a user browses the explorer, each page switch would otherwise re-query the pagination total on every request.

It was mostly for the filter I added earlier, but if we remove the filter part we might also want to remove the cache. The query shouldn't be that expensive, even if we run it 10 times.

enum_filters = {SortBy(k): v for k, v in filters.items()}

# Pass per_page=0 to disable pagination for the count query
stmt = make_fetch_stats_address_query(filters=enum_filters, per_page=0)
Collaborator

You never specify address_contains here, so you always query all addresses.
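
A small sketch of why this matters for pagination, assuming a hypothetical in-memory list (ADDRESSES) in place of the materialized view. The helper names are invented for the example; the point is only that the count must apply the same substring filter as the page query.

```python
from __future__ import annotations

# Hypothetical in-memory stand-in for the address stats view.
ADDRESSES = ["0xAbC1", "0xdef2", "0xABC3", "0x9999"]


def matching(address_contains: str | None) -> list[str]:
    """Apply the optional case-insensitive substring filter."""
    if address_contains is None:
        return list(ADDRESSES)
    needle = address_contains.lower()
    return [a for a in ADDRESSES if needle in a.lower()]


def fetch_page(
    address_contains: str | None, page: int = 1, per_page: int = 2
) -> list[str]:
    """Return one page of filtered rows."""
    rows = matching(address_contains)
    start = (page - 1) * per_page
    return rows[start : start + per_page]


def count_unfiltered(address_contains: str | None) -> int:
    # Mirrors the issue above: the filter is accepted but never applied.
    return len(matching(None))


def count_filtered(address_contains: str | None) -> int:
    # The total is computed over the same filtered set as the page.
    return len(matching(address_contains))
```

For the search term "abc" the page holds two rows, count_filtered reports 2, and count_unfiltered reports 4, so a client relying on the unfiltered total would offer pages that do not exist.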


address_query = (
select(AddressTotalMessages.address)
.where(AddressTotalMessages.address.ilike(pattern))
Collaborator

Use lower() to make sure that you hit the GIN index.

Suggested change
- .where(AddressTotalMessages.address.ilike(pattern))
+ .where(func.lower(AddressTotalMessages.address).like(pattern.lower()))



def find_matching_addresses(
session: DbSession, address_contains: str, limit: int = 5000
Collaborator

5000 is a bit big. Could result in performance issues when using the result in later queries, especially for ones with lots of potential matches. It's not a huge problem right now, but maybe a subquery could be better? i.e. pass address_contains to make_fetch_stats_address_query and use a subquery if it is present?
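
A rough sketch of the subquery idea, using the standard-library sqlite3 module rather than PostgreSQL and SQLAlchemy. The table and column names (address_totals, address_stats, nb) are assumptions made for the example, and SQLite's instr() stands in for the indexed substring match.

```python
from __future__ import annotations

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE address_totals (address TEXT PRIMARY KEY, total INTEGER);
    CREATE TABLE address_stats (address TEXT, type TEXT, nb INTEGER);
    """
)
conn.executemany(
    "INSERT INTO address_totals VALUES (?, ?)",
    [("0xAbC1", 5), ("0xdef2", 3), ("0xABC3", 1)],
)
conn.executemany(
    "INSERT INTO address_stats VALUES (?, ?, ?)",
    [
        ("0xAbC1", "POST", 4),
        ("0xAbC1", "STORE", 1),
        ("0xdef2", "POST", 3),
        ("0xABC3", "STORE", 1),
    ],
)


def stats_for(address_contains: str) -> list[tuple[str, int]]:
    """Sum message counts for addresses whose name contains the term.

    The matching addresses are selected by a subquery inside the same
    statement, so no intermediate list is fetched into Python.
    """
    sql = """
        SELECT s.address, SUM(s.nb)
        FROM address_stats AS s
        WHERE s.address IN (
            SELECT t.address
            FROM address_totals AS t
            WHERE instr(lower(t.address), lower(?)) > 0
        )
        GROUP BY s.address
        ORDER BY s.address
    """
    return conn.execute(sql, (address_contains,)).fetchall()
```

The database resolves the match and the aggregation in one round trip, which avoids both the arbitrary limit and the large IN list built from a separate query.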

Comment on lines 57 to 60
filters: Dict[SortBy, int] | None = Field(
default=None,
description="Minimum values required for each sort category. Example: { 'POST': 3 }",
)
Collaborator

What is the point of this field? Defining it as a dictionary is also weird, how are you supposed to pass it as query parameters?

Member Author

If I remember correctly, it was something like:

?filters[POST]=300

The goal was to filter by a minimum value per message type. For example, if you want addresses that have sent STORE messages, you can just filter on that.
But I guess this isn't really useful, so I can remove it.
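
To make the bracket syntax concrete, here is a hand-rolled sketch of how such keys could be turned into a dictionary, using only urllib.parse from the standard library. The parse_filters helper is hypothetical; it is not how the schema in this PR parsed the field.

```python
from __future__ import annotations

import re
from urllib.parse import parse_qsl

# Matches keys of the form filters[NAME] and captures NAME.
_BRACKET_KEY = re.compile(r"^filters\[(\w+)\]$")


def parse_filters(query_string: str) -> dict[str, int]:
    """Collect filters[KEY]=N pairs from a raw query string."""
    filters: dict[str, int] = {}
    # parse_qsl percent-decodes keys and values before returning them.
    for key, value in parse_qsl(query_string):
        match = _BRACKET_KEY.match(key)
        if match:
            filters[match.group(1)] = int(value)
    return filters
```

For example, parse_filters("filters[POST]=300&page=2") yields {"POST": 300}. The extra parsing step is part of why a dictionary-typed query parameter is awkward compared with flat parameters.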

return web.json_response(output, dumps=lambda v: json.dumps(v))


async def addresses_stats_view_v2(request: web.Request):
Collaborator

Add at least a simple integration test that uses most features of the endpoint.

)

app.router.add_get("/api/v0/addresses/stats.json", accounts.addresses_stats_view)
app.router.add_get("/api/v1/addresses/stats.json", accounts.addresses_stats_view_v2)
Collaborator

Naming: a bit weird to have v1 use a function called v2. Name the original addresses_stats_view_v0 and the new one addresses_stats_view_v1.

@1yam 1yam force-pushed the 1yam-address-improvment branch from 7aa2dc4 to d3bf43f Compare January 6, 2026 15:37
Subquery defining the set of addresses to include.
Only used when address filtering is requested.
"""
pattern = f"%{address_contains.lower()}%"
Collaborator

Missed this, this is a risk for SQL injection. Remove this and use func.lower(AddressStats.address).contains(address_contains.lower()) instead.

Member Author

fixed
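
As a side illustration of one hazard with hand-built LIKE patterns, the sketch below uses the standard-library sqlite3 module (not the SQLAlchemy call suggested above) to show that wildcard characters in user input change what matches unless they are escaped. The addr table and the escape_like helper are assumptions for the example. SQLAlchemy's contains() accepts an autoescape flag that serves the same purpose.

```python
from __future__ import annotations

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addr (address TEXT)")
conn.executemany(
    "INSERT INTO addr VALUES (?)", [("0xabc",), ("0xdef",), ("0x1%2",)]
)


def escape_like(term: str, escape: str = "\\") -> str:
    """Escape the LIKE wildcards % and _ (and the escape character itself)."""
    return (
        term.replace(escape, escape + escape)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


def search_raw(term: str) -> list[str]:
    # User input goes straight into the pattern, so % and _ act as wildcards.
    pattern = f"%{term.lower()}%"
    rows = conn.execute(
        "SELECT address FROM addr WHERE lower(address) LIKE ? ORDER BY address",
        (pattern,),
    )
    return [row[0] for row in rows]


def search_escaped(term: str) -> list[str]:
    # Wildcards in the input are escaped, so they only match themselves.
    pattern = f"%{escape_like(term.lower())}%"
    rows = conn.execute(
        "SELECT address FROM addr WHERE lower(address) LIKE ? ESCAPE '\\' "
        "ORDER BY address",
        (pattern,),
    )
    return [row[0] for row in rows]
```

Searching for "%" with search_raw returns every row, while search_escaped returns only the address that actually contains a percent sign.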

* use same model for "data" in v0 and v1
* return ints in all cases
* reduce duplication between v0 and v1
Comment on lines 6 to 11
class ViewBase:
pass


# Create the base with the class as a template
Base = declarative_base(cls=ViewBase)
Collaborator

Why do you declare a new Base class? Any reason not to reuse the one that's in db/models/base.py?

@odesenfans odesenfans merged commit cfa8466 into main Jan 7, 2026
5 checks passed
@odesenfans odesenfans deleted the 1yam-address-improvment branch January 7, 2026 10:44