UT_OUTPUT_CLOB_BUFFER_TMP_PK violation on parallel #1128

Closed
@al-hexagon

Description

We're getting this error when running massive unit test runs:

ORA-00001: unique constraint (UT3.UT_OUTPUT_CLOB_BUFFER_TMP_PK) violated
ORA-06512: at "UT3.UT_RUNNER", line 180
ORA-06512: at "UT3.UT_OUTPUT_CLOB_TABLE_BUFFER", line 40
ORA-06512: at "UT3.UT_OUTPUT_REPORTER_BASE", line 53
ORA-06512: at "UT3.UT_DEBUG_REPORTER", line 54
ORA-06512: at "UT3.UT_EVENT_MANAGER", line 70
ORA-06512: at "UT3.UT_EVENT_MANAGER", line 77
ORA-06512: at "UT3.UT_EXECUTABLE", line 167
ORA-06512: at "UT3.UT_EXECUTABLE", line 44
ORA-06512: at "UT3.UT_EXECUTABLE_TEST", line 178
ORA-06512: at "UT3.UT_EXECUTABLE_TEST", line 38
ORA-06512: at "UT3.UT_TEST", line 79
ORA-06512: at "UT3.UT_SUITE_ITEM", line 49
ORA-06512: at "UT3.UT_SUITE", line 66
ORA-06512: at "UT3.UT_RUN", line 67
ORA-06512: at "UT3.UT_SUITE_ITEM", line 49
ORA-06512: at "UT3.UT_RUNNER", line 172
ORA-06512: at line 1

The scenario is quite complex.

  1. You need a lot of tests (we actually run more than 8000).
  2. You need to run them in parallel in multiple sessions (the DB is fully stressed for at least 60 minutes).
  3. Each session runs a "suite" that represents a "module" in the software and takes a couple of minutes to finish when run stand-alone.
  4. The test runs have overlapping code (e.g. they partly cover the same "core" packages across different modules).

When running the unit test suites, the point at which each suite ends is random because of DB performance, execution time, etc.
Looking at the code behind the mentioned table and type, it looks like the system runs into a race condition that triggers the error above.

Please review the code (type body ut_output_clob_table_buffer) and assess whether this can happen.

As far as I can see, the error arises as follows:

The type body ut_output_clob_table_buffer controls the content of the mentioned table.

The problem I see is that when multiple UT suite runs start at the same time, the value of message_id is set for each instance.
During execution, even though the writes are covered by autonomous transactions, the message_id is reassigned quite often by the functions

send_lines, send_clob

There we see self.last_message_id := self.last_message_id + sql%rowcount;

So, in theory, what happens to self.last_message_id if sql%rowcount is added while another session is doing the same at the same moment in the same $$plsql_unit with a lower initialized message_id?
How is that kept in sync?
Because of self.last_message_id := self.last_message_id + 1 and self.last_message_id + rownum in other parallel running sessions, it seems the message_id values can overlap.
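
To make the hypothesis above concrete, here is a minimal model in Python. It is not utPLSQL code: the classes SharedTable and Writer, the output_id field and the helper collides are all hypothetical names, and the assumption that rows are keyed by (output_id, message_id) is mine and has not been checked against the real table definition. The sketch only shows when per-instance counters can collide in a shared table.

```python
# Hypothetical model of the scenario described above; not utPLSQL code.

class SharedTable:
    """Stands in for a buffer table with a unique key (an assumption)."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, text):
        # Mimics a unique constraint: the same key cannot be inserted twice.
        if key in self.rows:
            raise KeyError(f"unique constraint violated for key={key}")
        self.rows[key] = text


class Writer:
    """Each writer keeps its own last_message_id, like one object per session."""

    def __init__(self, table, output_id):
        self.table = table
        self.output_id = output_id  # hypothetical per-buffer identifier
        self.last_message_id = 0

    def send_lines(self, lines):
        for offset, line in enumerate(lines, start=1):
            key = (self.output_id, self.last_message_id + offset)
            self.table.insert(key, line)
        # Mirrors: self.last_message_id := self.last_message_id + sql%rowcount
        self.last_message_id += len(lines)


def collides(output_id_a, output_id_b):
    """Return True if the second writer hits the unique constraint."""
    table = SharedTable()
    a = Writer(table, output_id_a)
    b = Writer(table, output_id_b)
    a.send_lines(["a1", "a2"])
    try:
        b.send_lines(["b1"])
    except KeyError:
        return True
    return False


print(collides("same", "same"))  # True: both writers start counting at 1
print(collides("a", "b"))        # False: the keys never overlap
```

In this model the counters only collide when two writers end up with the same output_id, so it would be worth checking whether two sessions can ever share a buffer identifier in the failing runs.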

Furthermore, there is an issue with get_lines in the same type body.
In the scenario above, the waiting for output (chunking) is not optimal either.
When the data is read by process 1, there is a small wait time on each chunk.
The session is put to sleep for a short time; does it therefore lock the table or the data?
But why, given that it is an autonomous transaction and the code ultimately wants to delete the data?
