Description
We're getting this error when running unit tests at massive scale:
ORA-00001: unique constraint (UT3.UT_OUTPUT_CLOB_BUFFER_TMP_PK) violated
ORA-06512: at "UT3.UT_RUNNER", line 180
ORA-06512: at "UT3.UT_OUTPUT_CLOB_TABLE_BUFFER", line 40
ORA-06512: at "UT3.UT_OUTPUT_REPORTER_BASE", line 53
ORA-06512: at "UT3.UT_DEBUG_REPORTER", line 54
ORA-06512: at "UT3.UT_EVENT_MANAGER", line 70
ORA-06512: at "UT3.UT_EVENT_MANAGER", line 77
ORA-06512: at "UT3.UT_EXECUTABLE", line 167
ORA-06512: at "UT3.UT_EXECUTABLE", line 44
ORA-06512: at "UT3.UT_EXECUTABLE_TEST", line 178
ORA-06512: at "UT3.UT_EXECUTABLE_TEST", line 38
ORA-06512: at "UT3.UT_TEST", line 79
ORA-06512: at "UT3.UT_SUITE_ITEM", line 49
ORA-06512: at "UT3.UT_SUITE", line 66
ORA-06512: at "UT3.UT_RUN", line 67
ORA-06512: at "UT3.UT_SUITE_ITEM", line 49
ORA-06512: at "UT3.UT_RUNNER", line 172
ORA-06512: at line 1
The scenario is quite complex.
- You need a lot of tests (we actually run more than 8000).
- You need to run them in parallel in multiple sessions (the DB is fully stressed for at least 60 minutes).
- Each session runs a "suite" which represents a "module" in the software and takes a couple of minutes to finish if run "stand alone".
- The test runs have overlapping code (e.g. partly covering the same "core" packages across different modules).
When running the unit test suites, the moment at which each suite ends is random because of DB performance, execution time, etc.
Checking the code behind the mentioned table / type: it looks like the system runs into a race condition which triggers the error above.
Please review the code and assess whether this can happen (type body ut_output_clob_table_buffer).
As far as I can see, the cause is the following:
the content of the mentioned table is "controlled" in type body ut_output_clob_table_buffer.
The problem I see is that when multiple UT suite runs are started at the same time, the value of message_id is set separately for each instance.
During execution, even though it is covered by autonomous transactions, the message_id is changed quite often by the functions
send_lines, send_clob
Here we see self.last_message_id := self.last_message_id + sql%rowcount;
So, in theory: what happens to self.last_message_id if sql%rowcount is added while another session is doing the same at the same moment in the same $$plsql_unit with a lower initialized message_id?
How is that kept in sync?
Because of self.last_message_id := self.last_message_id + 1 and self.last_message_id + rownum in other sessions running in parallel, it evidently happens that the message_id values overlap.
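To make the suspicion concrete, here is a small Python model of it. This is not the utPLSQL code: the class names, the `output_id` value and the `key_includes_output_id` switch are made up for illustration, and I have not verified which columns UT_OUTPUT_CLOB_BUFFER_TMP_PK actually covers. The model only shows that two instances with private counters collide if the unique key is the message_id alone, and do not collide with each other if the key also contains a per-instance identifier.

```python
class UniqueConstraintViolated(Exception):
    """Stands in for ORA-00001 in this model."""


class SharedBufferTable:
    """Toy stand-in for the buffer table; rows are stored under the assumed unique key."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, text):
        if key in self.rows:
            raise UniqueConstraintViolated(key)
        self.rows[key] = text


class BufferInstance:
    """One output buffer as seen by one session, with its own private counter."""

    def __init__(self, table, output_id, key_includes_output_id):
        self.table = table
        self.output_id = output_id  # hypothetical per-instance identifier
        self.key_includes_output_id = key_includes_output_id
        self.last_message_id = 0  # private to the instance, never synchronized

    def send_line(self, text):
        # Mirrors the pattern "last_message_id := last_message_id + 1".
        self.last_message_id += 1
        if self.key_includes_output_id:
            key = (self.output_id, self.last_message_id)
        else:
            key = self.last_message_id
        self.table.insert(key, text)


def run_two_sessions(key_includes_output_id):
    """Two instances write one line each into the same shared table."""
    table = SharedBufferTable()
    a = BufferInstance(table, "A", key_includes_output_id)
    b = BufferInstance(table, "B", key_includes_output_id)
    try:
        a.send_line("line from session A")
        b.send_line("line from session B")
    except UniqueConstraintViolated:
        return "collision"
    return "ok"
```

If the real key does include a per-instance identifier, the overlap would have to come from two writers sharing the same identifier rather than from two independent instances, so the table definition is worth checking alongside the type body.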
Furthermore, there is an issue with get_lines in the same type body.
In the scenario above, the "waiting" for output / chunking is not optimal either.
When the data is read by process 1, there is a small wait time on each chunk.
The session is sent to sleep for a short time; does it therefore keep a lock on the table / data?
But why? It's an autonomous transaction, and in the end the code wants to delete the data.