
@laceysanderson
Member

@laceysanderson laceysanderson commented Apr 2, 2024

Tripal 4 Core Dev Task

Issue #1716

Tripal Version: 4

Description

This PR fixes a problem that occurs when two publish jobs run in the same call to the Tripal job launcher. Specifically, when you publish two content types in the same run, the values from the first publish leak into the second publish.

I believe this is happening due to some unwritten Drupal rule in how the TripalPublish service is written. I think it's related to the fact that this service uses an init() method to set everything up rather than typical dependency injection with a constructor... but I originally got so caught up in writing a test to demonstrate the problem that I couldn't get my brain to move on to finding the fix.

Luckily, after posting this as a draft PR and asking for help, @dsenalik realized that we could just reset the leaked values in the init() 😅 which is a really clean solution. It may not be ideal and we may run into other issues with this service in the future... but based on my understanding of what is happening, I do think it is a good, reliable fix :-) And I don't know what the ideal fix is so I say we go with this! 🤪

What exactly does this PR do?

  1. Updates TripalPublish::init() so that the class variables it relies on are reset on subsequent runs of this service.
  2. Creates two simple tests that actually run the publish: these were failing before fix no. 1 and are now passing. They are not ideal and do not have as many assertions as I would like, but they will help us catch situations where the publish job straight up fails, so they are worth keeping.
  3. Creates a new testing helper method to help set up content types with fields based on the YAML. This replaces the current process where you have to first set up the content type and then attach the fields in the test. Plus it ensures the content type you are testing has all the default fields.

Testing?

Set up testing environment

  1. Create a fresh site/docker
  2. Use Drush php:cli or devel php code to add records to Chado. The following is how I do this for organisms, contacts and projects. Alternatively you could load a GFF3 file.
// Get the Chado database connection service.
$connection = \Drupal::service('tripal_chado.database');

// Add 10 organisms; uniqid() keeps each species name unique.
for ($i = 1; $i <= 10; $i++) {
  $species = 'databasica' . uniqid();
  $infra_name = 'postgresquelus' . $i;
  $connection->insert('1:organism')
    ->fields([
      'genus' => 'Tripalus',
      'species' => $species,
      'infraspecific_name' => $infra_name,
      'type_id' => 500,
    ])->execute();
}

// Add 5 contacts with random descriptions.
for ($i = 1; $i <= 5; $i++) {
  $name = 'Tripal' . uniqid() . ' collaborator #' . $i;
  $description = md5((string) mt_rand()) . ' ' . md5((string) mt_rand());
  $connection->insert('1:contact')
    ->fields([
      'name' => $name,
      'description' => $description,
      'type_id' => 500,
    ])->execute();
}

// Add 15 projects with random descriptions.
for ($i = 1; $i <= 15; $i++) {
  $name = 'Project TestingPublish' . uniqid() . ' iteration #' . $i;
  $description = md5((string) mt_rand()) . ' ' . md5((string) mt_rand());
  $connection->insert('1:project')
    ->fields([
      'name' => $name,
      'description' => $description,
    ])->execute();
}

Single publish job per call to job launcher works

  1. Run tripal jobs to see that the queue is empty.
  2. Go to Admin > Tripal > Content > Publish and choose a random content type with unpublished data. If you set up as I did above then you can use "contact".
  3. Immediately run the tripal job to complete the publishing and then confirm the content was published.
  4. Repeat the above steps exactly for the other content types and confirm that all general content types publish without errors.

When multiple publish jobs are in the queue, the second one fails.

From a fresh install of Tripal or a fresh docker container

  1. Run tripal jobs to see that the queue is empty.
  2. Go to Admin > Tripal > Content > Publish and choose a random content type with unpublished data. If you set up as I did above, then you can use "organism".
  3. WITHOUT RUNNING THE JOB LAUNCHER, repeat step 2 for another content type. If you set up as I did above then you can use "project".
  4. Now run the tripal jobs and confirm you see a big error and the second content type, regardless of what you chose, failed because it tried to publish data from fields for the first content type.

@laceysanderson laceysanderson added bug - confirmed For issues where a core developer has confirmed a bug exists. Tripal 4 Follow-up Needed Any issue needing follow up from one of the core developers. Group 1 - Tripal Content Types | Terms | Fields Any issue relating to Tripal Content including types, terms, and fields. Group 2 - Data Storage | Tripal DBX | Chado Any issue relating to biological data storage, Tripal DBX and Chado integration, Materialized Views labels Apr 2, 2024
@dsenalik
Contributor

dsenalik commented Apr 4, 2024

I think what is happening is that some class variables retain their values from one job to the next.
I don't understand why, but a simple fix is to just reinitialize them all in the init() function.
tripal/src/api/tripal.publish.api.php calls this like so:

    $publish = \Drupal::service('tripal.publish');
...
    $publish->init($bundle, $datastore, $options, $job);
    $publish->publish();

Commit 256956c fixes this, so you can now run two consecutive publish jobs.
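To illustrate the pattern, here is a minimal standalone sketch, not the actual TripalPublish code. It assumes the leak comes from both jobs receiving the same service object, which would be consistent with services from \Drupal::service() being shared instances by default. The class names, the `$field_info` property and the `addField()` method are all hypothetical and exist only for this example.

```php
<?php
// Hypothetical stand-in for a shared service. NOT the real TripalPublish class.
class LeakyPublishSketch {
  // State that accumulates while a job runs.
  protected array $field_info = [];
  protected ?string $bundle = NULL;

  // Only assigns the bundle; anything left over from a previous run stays.
  public function init(string $bundle): void {
    $this->bundle = $bundle;
  }

  // Records a field as belonging to the current bundle.
  public function addField(string $field_name): void {
    $this->field_info[$field_name] = $this->bundle;
  }

  // Returns the names of the fields this run would try to publish.
  public function publish(): array {
    return array_keys($this->field_info);
  }
}

// Same service, but init() resets the accumulated state first.
class ResettingPublishSketch extends LeakyPublishSketch {
  public function init(string $bundle): void {
    $this->field_info = [];
    parent::init($bundle);
  }
}

// Simulates two publish jobs that reuse one service object in a single run.
function run_two_jobs(LeakyPublishSketch $service): array {
  $service->init('organism');
  $service->addField('organism_genus');
  $service->publish();

  $service->init('project');
  $service->addField('project_name');
  return $service->publish();
}

$leaky = run_two_jobs(new LeakyPublishSketch());
$fixed = run_two_jobs(new ResettingPublishSketch());

echo implode(', ', $leaky), PHP_EOL; // organism_genus, project_name
echo implode(', ', $fixed), PHP_EOL; // project_name
```

In the leaky variant the second job still sees the organism field, which mirrors the reported failure where the second content type tried to publish fields from the first. Resetting inside init() keeps the fix in one place without changing how callers use the service.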

Still to be decided is whether we want to be idealistic and finish the tests, or accept this change and move on to other work.

@dsenalik dsenalik marked this pull request as ready for review April 8, 2024 16:12
@laceysanderson
Member Author

I just ran through a manual test again and this is working beautifully! As such, I'm opening it up for review. Thanks for the help @dsenalik ❤️

@pdtouch
Contributor

pdtouch commented Apr 16, 2024

I tested both single and multiple publish with both the main Tripal 4 branch and the tv4g1-issue1716-dataLeakBetweenPublishCalls branch. With the tv4g1-issue1716-dataLeakBetweenPublishCalls branch there is no error message for either single or multiple content publications, and there is no data leak between publish calls. The error is still visible with the current Tripal 4 main branch.

@laceysanderson
Member Author

Thanks @pdtouch for the review! That means this is ready to merge!

@laceysanderson laceysanderson merged commit bbcebb2 into 4.x Apr 16, 2024
@laceysanderson laceysanderson deleted the tv4g1-issue1716-dataLeakBetweenPublishCalls branch April 16, 2024 18:48

Development

Successfully merging this pull request may close these issues.

Trying to publish 2+ content types in the same call to the Tripal job launcher fails

4 participants