Optimize(getItemParent): optimize pgsql getItemParent func to avoid scan full table #3203

Closed
cw1427 wants to merge 1 commit into fossology:master from cw1427:master

cw1427 commented Dec 24, 2025

This optimizes the pgsql getItemParent function to take an extra parameter, uploadId, which pins the parent lookup to that upload's table and avoids a full table scan.

Description

Related discussion here: #3194

Changes

List the changes done to fix a bug or introducing a new feature.

How to test

Describe the steps required to test the changes proposed in the pull request.

Please consider using the closing keyword if the pull request is proposed to
fix an issue already created in the repository
(https://help.github.com/articles/closing-issues-using-keywords/)

@AnujRewar
Contributor

@cw1427
Nice work on this.
Just a small suggestion: it might be good to align the commit message format with the usual FOSSology style, for consistency with other PRs that pass all checks.
You can refer to the documentation or to recently merged PRs with all green checks for examples.

cw1427 changed the title from "Optimize getItemParent psql func to avoid full table scan" to "Optimize(getItemParent): optimize pgsql getItemParent func to avoid scan full table" on Dec 25, 2025
Author

cw1427 commented Dec 26, 2025

@AnujRewar, can this be verified by the workflow? I updated the commit message on the PR.

Contributor

AnujRewar commented Dec 26, 2025

No. Just take a look at the commits in other PRs: the last line must also contain a Signed-off-by trailer, e.g.:
Signed-off-by: Anuj Rewar [email protected]

Another failure appears in docker-test: the service is not yet listening on port 24693 when fo_cli -S runs.

I will figure out a solution for the second issue soon.
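
One common way to avoid this kind of startup race is to poll the port until it accepts connections before invoking the client. The Python sketch below only illustrates the idea: `wait_for_port` and its parameters are hypothetical and are not part of FOSSology or its CI setup. Only the port number 24693 and the `fo_cli -S` call come from the comment above.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout.

    Hypothetical helper for illustration; not part of FOSSology.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(interval)
    return False


# Assumed usage in a test script: only run `fo_cli -S` after
# wait_for_port("localhost", 24693) has returned True.
```

Whether a wait like this belongs in the test script or in the container start-up is a design choice for the maintainers; the sketch only shows the polling pattern.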

…table scan during ununpack db update

Avoids a full table scan by refactoring the getItemParent function to take an additional upload_fk parameter,
improving the performance of the ununpack DB update.
Fixes: fossology#3194


Signed-off-by: cw1427 ([email protected])
@Kaushl2208
Member

Hi @cw1427, your commit lint check is failing. The issues are shown in the attached screenshot.

I will go through the changes; meanwhile, you can fix the commit message :)

Also, @AnujRewar: ideally you should sign your commits for DCO, using SSH or GPG signing. You can take a look here: Sign your commit

Comment on lines +1041 to +1076
CREATE OR REPLACE FUNCTION getItemParent(itemId Integer, uploadId Integer)
RETURNS Integer AS $$
DECLARE
  target_table text;
  query_sql text;
  result_id integer;
BEGIN
  target_table := \'uploadtree_\' || uploadId;
  query_sql := format(\'
    WITH RECURSIVE file_tree(uploadtree_pk, parent, jump, path, cycle) AS (
      SELECT ut.uploadtree_pk, ut.parent,
        true,
        ARRAY[ut.uploadtree_pk],
        false
      FROM %I ut
      WHERE ut.uploadtree_pk = %L
      UNION ALL
      SELECT ut.uploadtree_pk, ut.parent,
        ut.ufile_mode & (1<<28) != 0,
        path || ut.uploadtree_pk,
        ut.uploadtree_pk = ANY(path)
      FROM %I ut, file_tree ft
      WHERE ut.uploadtree_pk = ft.parent
        AND jump AND NOT cycle
    )
    SELECT uploadtree_pk FROM file_tree ft WHERE NOT jump
  \', target_table, itemId, target_table);
  BEGIN
    EXECUTE query_sql INTO result_id;
  EXCEPTION WHEN undefined_table THEN
    RETURN NULL;
  END;
  RETURN result_id;
END;
$$ LANGUAGE plpgsql STABLE STRICT;
';
Member

Hi @cw1427,

This change introduces a PL/pgSQL function with dynamic SQL where a static SQL function was sufficient.
The use of EXECUTE forces runtime parsing and planning on every call, prevents plan caching, and adds unnecessary overhead.
While dynamic table selection could be justified in a genuinely sharded schema, the current uploadtree_* tables are created based on file size and are not an inherent sharding strategy.
As a result, this change increases complexity and reduces performance without a clear architectural benefit.

If future work introduces true sharding or tenant-level table isolation, revisiting a dynamic approach would make sense, but that does not appear to be the case today.
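
For readers following the review, the traversal performed by the recursive query can be modelled in a few lines of Python. Starting from an item, it climbs the parent chain, skips every row whose `ufile_mode` has bit 28 set, and stops at the first ancestor without that bit. The in-memory `tree` dictionary and the names `JUMP_BIT` and `get_item_parent` are hypothetical stand-ins for illustration, not FOSSology code. Only the bit mask and the cycle guard are taken from the SQL in the diff.

```python
from typing import Dict, Optional, Tuple

JUMP_BIT = 1 << 28  # same mask the query tests on ufile_mode

# Hypothetical stand-in for one uploadtree table:
# uploadtree_pk -> (parent, ufile_mode)
Tree = Dict[int, Tuple[Optional[int], int]]


def get_item_parent(tree: Tree, item_id: int) -> Optional[int]:
    """Return the nearest ancestor of item_id whose JUMP_BIT is clear."""
    if item_id not in tree:
        return None
    visited = {item_id}          # mirrors the query's `path` array
    current = tree[item_id][0]   # start at the item's direct parent
    while current is not None and current in tree:
        parent, mode = tree[current]
        if (mode & JUMP_BIT) == 0:
            return current       # first row with `NOT jump`
        if current in visited:
            return None          # cycle detected: stop climbing
        visited.add(current)
        current = parent
    return None


# Item 3 sits under row 2, which has the bit set, so the lookup
# skips row 2 and lands on row 1.
example: Tree = {1: (None, 0), 2: (1, JUMP_BIT), 3: (2, 0)}
result = get_item_parent(example, 3)  # -> 1
```

The model says nothing about performance; it only shows that the lookup follows one parent pointer per step, so each step is a primary-key lookup.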

last_pk,
batch_size);

result = PQexec(pgConn, SQL);
Member

Similarly here, the batch processing and the realparent assignment do not bring any actual benefit.

There is also no logical connection to the issue for which this PR was opened.

@Kaushl2208
Member

Thanks for the contribution, @cw1427. I will be closing this PR now. The suggested changes don't align with the issue mentioned or with the overall FOSSology architecture.
