Optimize(getItemParent): optimize pgsql getItemParent func to avoid scan full table #3203
cw1427 wants to merge 1 commit into fossology:master
@AnujRewar can it be verified with the workflow? I updated the commit message on the PR.
No, just take a look at other PRs' commits: in the last line you also have to write Signed-off-by. Another failure appears in docker-test: the service is not yet listening on port 24693 when fo_cli -S runs. I will figure out a solution for the second issue soon.
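One common way to handle a "not yet listening" race like the docker-test failure described above is to poll the port before running the client. The following is only a minimal sketch of that idea in Python; the helper name `wait_for_port`, the timeout values, and the idea of calling it before `fo_cli -S` are assumptions for illustration, not part of the FOSSology test setup.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Hypothetical helper: return True once host:port accepts a TCP
    connection, or False if `timeout` seconds pass without success."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A test script could call `wait_for_port("localhost", 24693)` and only then invoke `fo_cli -S`, failing early with a clear message if the helper returns False.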
…table scan during ununpack db update Avoids a full table scan by refactor the getItemParent func to increase upload_fk parms improving performance of ununpack DB update. Fixes: fossology#3194 Signed-off-by: cw1427 ([email protected])
Hi @cw1427, your commit lint check is failing. I will go through the changes; meanwhile, you can fix the commit message :) And also: @AnujRewar Ideally you should sign your commit for DCO. Using
CREATE OR REPLACE FUNCTION getItemParent(itemId Integer, uploadId Integer)
RETURNS Integer AS $$
DECLARE
  target_table text;
  query_sql text;
  result_id integer;
BEGIN
  target_table := \'uploadtree_\' || uploadId;
  query_sql := format(\'
    WITH RECURSIVE file_tree(uploadtree_pk, parent, jump, path, cycle) AS (
      SELECT ut.uploadtree_pk, ut.parent,
        true,
        ARRAY[ut.uploadtree_pk],
        false
      FROM %I ut
      WHERE ut.uploadtree_pk = %L
      UNION ALL
      SELECT ut.uploadtree_pk, ut.parent,
        ut.ufile_mode & (1<<28) != 0,
        path || ut.uploadtree_pk,
        ut.uploadtree_pk = ANY(path)
      FROM %I ut, file_tree ft
      WHERE ut.uploadtree_pk = ft.parent
        AND jump AND NOT cycle
    )
    SELECT uploadtree_pk from file_tree ft WHERE NOT jump
  \', target_table, itemId, target_table);
  BEGIN
    EXECUTE query_sql INTO result_id;
  EXCEPTION WHEN undefined_table THEN
    RETURN NULL;
  END;
  RETURN result_id;
END;
$$ LANGUAGE plpgsql STABLE STRICT;
';
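For readers skimming the diff, the recursive CTE above can be read as a walk up the parent chain: it starts at the item, keeps climbing while the row it reaches has bit 28 of ufile_mode set, and returns the first ancestor where that bit is clear. The sketch below restates that walk in plain Python over an in-memory dict. It is only an illustration: the dict layout, the constant name `ARTIFACT_BIT` (reading bit 28 as the artifact flag is an assumption), and the function name are hypothetical stand-ins, not FOSSology code.

```python
ARTIFACT_BIT = 1 << 28  # same mask the CTE tests on ufile_mode (assumed artifact flag)


def get_item_parent(rows, item_id):
    """rows maps uploadtree_pk -> (parent, ufile_mode): a hypothetical
    in-memory stand-in for one uploadtree table."""
    if item_id not in rows:
        return None                    # CTE base case finds no row
    path = [item_id]                   # mirrors the CTE's path array
    current = rows[item_id][0]         # start from the item's parent
    while current in rows:
        parent, mode = rows[current]
        cycle = current in path        # mirrors ut.uploadtree_pk = ANY(path)
        if (mode & ARTIFACT_BIT) == 0:
            return current             # first row with jump = false
        if cycle:
            return None                # stop instead of looping forever
        path.append(current)
        current = parent
    return None                        # ran off the top of the tree
```

With `rows = {1: (None, 0), 2: (1, ARTIFACT_BIT), 3: (2, 0)}`, calling `get_item_parent(rows, 3)` skips the flagged row 2 and lands on row 1.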
Hi @cw1427 ,
This change introduces a PL/pgSQL function with dynamic SQL where a static SQL function was sufficient.
The use of EXECUTE forces runtime parsing and planning on every call, prevents plan caching, and adds unnecessary overhead.
While dynamic table selection could be justified in a genuinely sharded schema, the current uploadtree_* tables are created based on file size and are not an inherent sharding strategy.
As a result, this change increases complexity and reduces performance without a clear architectural benefit.
If future work introduces true sharding or tenant-level table isolation, revisiting a dynamic approach would make sense, but that does not appear to be the case today.
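To make the static-versus-dynamic contrast in this review concrete, here is a rough sketch of a fixed query text with the item and upload ids bound as parameters, so no table name is spliced in at run time. It uses Python's sqlite3 module purely as a runnable stand-in for PostgreSQL. The table layout, the use of an upload_fk filter to narrow the search, and the function name are all assumptions for illustration, not what the reviewer or FOSSology prescribes. The path-array cycle guard from the diff is replaced by UNION (which drops repeated rows) because SQLite has no array type.

```python
import sqlite3

# Fixed query text: only the bound values change between calls.
STATIC_QUERY = """
WITH RECURSIVE file_tree(uploadtree_pk, parent, jump) AS (
    SELECT ut.uploadtree_pk, ut.parent, 1
    FROM uploadtree ut
    WHERE ut.uploadtree_pk = :item AND ut.upload_fk = :upload
  UNION
    SELECT ut.uploadtree_pk, ut.parent, (ut.ufile_mode & (1 << 28)) != 0
    FROM uploadtree ut
    JOIN file_tree ft ON ut.uploadtree_pk = ft.parent
    WHERE ft.jump AND ut.upload_fk = :upload
)
SELECT uploadtree_pk FROM file_tree WHERE NOT jump
"""


def get_item_parent_static(conn, item_id, upload_id):
    """Hypothetical helper: run the fixed query with bound parameters."""
    row = conn.execute(
        STATIC_QUERY, {"item": item_id, "upload": upload_id}
    ).fetchone()
    return row[0] if row else None
```

Whether a filter like this actually avoids a full scan depends on the indexes available on the real table, which this sketch does not model.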
    last_pk,
    batch_size);

result = PQexec(pgConn, SQL);
Similarly here, the batch processing and the realparent assignment do not bring actual benefits.
There is also no logical connection to the issue for which this PR was opened.
Thanks for the contribution, @cw1427. I will be closing this PR now. The suggested changes don't align with the issue mentioned or with the overall FOSSology architecture.
Optimizes the pgsql getItemParent function to take one extra parameter, uploadId, which selects the target table to search for the parent id and so avoids a full table scan.
Description
Related discussion here: #3194
Changes
List the changes done to fix a bug or introducing a new feature.
How to test
Describe the steps required to test the changes proposed in the pull request.
Please consider using the closing keyword if the pull request is proposed to
fix an issue already created in the repository
(https://help.github.com/articles/closing-issues-using-keywords/)