@sakim8048

This PR includes two major updates. The updated restart option restarts optimization or vibration calculations if previous runs were unexpectedly terminated. This PR also allows users to run Pynta efficiently on the ALCF Polaris machine. Details are described below:


  1. Restart Option

Running restart() retrieves the Fireworks workflow information, including the workflow ID number, task ID number, task states, and the launch directories where the unexpectedly terminated calculations were running. Before Fireworks reruns the incomplete runs, all necessary files, such as optimization trajectory files or vib folders, are copied to the destination directory.

If task states are not completed (e.g., fizzled or lost runs), the optimization runs will restart from the last geometry of the optimization trajectory file in the previous launch directory. In the case of a vibration restart, empty vibration JSON files will be deleted from the vib folder before rerunning the vibration.
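The vibration cleanup step described above could look roughly like the sketch below. It is not the code in this PR: the helper name `remove_empty_json_files` is hypothetical, and it assumes "empty" means a zero-byte file directly inside the vib folder.

```python
from pathlib import Path


def remove_empty_json_files(vib_dir):
    """Delete zero-byte *.json files in vib_dir and return their names.

    Hypothetical helper for illustration only; "empty" is assumed to
    mean a file of size 0 left behind by an interrupted run.
    """
    removed = []
    for path in sorted(Path(vib_dir).glob("*.json")):
        if path.is_file() and path.stat().st_size == 0:
            path.unlink()  # drop the unusable cache file
            removed.append(path.name)
    return removed
```

Non-empty JSON files are left in place, so a rerun would only need to redo the displacements whose files were removed.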

  2. Running Pynta on ALCF Polaris

Thanks to Raymundo's efforts, this PR allows Pynta to run on ALCF Polaris with a single queue allocation. Raymundo updated the way Pynta maps tasks to nodes on ALCF machines. Each task runs on a different Fireworker, and each Fireworker is associated with a node. This is available for the multilauncher. For best results, set num_jobs in the Pynta input script to the number of nodes.
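To illustrate why num_jobs should match the node count, here is a minimal sketch of a one-worker-per-node mapping. It is not Pynta's implementation: the function names, the `worker_<i>` labels, and the assumption that the scheduler provides a node file with one hostname per line are all illustrative.

```python
def read_unique_nodes(nodefile_text):
    """Return hostnames in first-seen order, dropping repeated lines."""
    nodes = []
    for line in nodefile_text.splitlines():
        host = line.strip()
        if host and host not in nodes:
            nodes.append(host)
    return nodes


def assign_workers(nodes, num_jobs):
    """Pair each of num_jobs workers with a node (hypothetical scheme).

    Workers cycle over the nodes, so the mapping is one-to-one only
    when num_jobs equals the number of nodes.
    """
    if not nodes:
        raise ValueError("no nodes available")
    return {f"worker_{i}": nodes[i % len(nodes)] for i in range(num_jobs)}
```

With num_jobs larger than the node count, two workers would share a node in this scheme; with fewer, some nodes would sit idle.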

rayhe88 and others added 8 commits March 24, 2025 14:25
…start job. Delete empty vibration cache file in vib folder for vibration restart job
add machine keyword in firework task

Function copyDataAndSave() is added to copy a file from a origin subdirectory to a destination subdirectory

add machine keyword to vibration keywords in setup_adsorbates
Function copyDataAndSave() is added to copy a file from a origin subdirectory to a destination subdirectory

add machine keywords to optimization and vibration firework tasks

add machine keyword in firework task

Function copyDataAndSave() is added to copy a file from a origin subdirectory to a destination subdirectory

add machine keyword to vibration keywords in setup_adsorbates

add machine keywords to optimization and vibration firework tasks
…t task and delete empty json file in vib folder for vib task
@sakim8048 sakim8048 requested review from mjohnson541 and tdprice-858 and removed request for mjohnson541 March 28, 2025 00:24

@mjohnson541 mjohnson541 left a comment

Thanks! I went over most of it.

import os
from fireworks.core.fworker import FWorker

def createCommand(node, software):

I'm okay with support for NERSC and general ALCF machines, but expecting users to write a whole new file for each machine is not tenable. If this is Polaris-specific, we need to redesign. It looks more like a hack to get this to work.

We can simply establish a better protocol to pass this information. I would recommend we write commands in the format:
"mpiexec --hosts {node} ......... {binary} PREFIX...." Then we can simply tell users in the documentation that if they put {node} or {binary} in the command, Pynta will automatically fill it in. Then we don't need this.
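A minimal sketch of the placeholder protocol suggested above, under stated assumptions: the helper name `fill_command` is hypothetical, and the hostname and binary in the usage line are made-up examples. It uses `str.replace` instead of `str.format` so that any other braces in a user's command are left untouched.

```python
def fill_command(template, node, binary):
    """Substitute the {node} and {binary} placeholders in a command string.

    Hypothetical helper for illustration; placeholders that do not
    appear in the template are simply ignored.
    """
    return template.replace("{node}", node).replace("{binary}", binary)


# Example with an invented hostname and binary name:
command = fill_command("mpiexec --hosts {node} {binary} PREFIX", "node001", "pw.x")
```

The user-facing contract would then be documentation only: write `{node}` or `{binary}` in the command and they are filled in at launch time.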

import os
from fireworks.core.fworker import FWorker

def createCommand(node, software):

Function names should be snake_case (also below).

@@ -0,0 +1,297 @@
"""

Clearly this code was taken from somewhere. How is this code licensed? Can we put it in Pynta? Where is it from and is there a reason this isn't included in fireworks?



# TODO: why is loglvl a required parameter??? Also nlaunches and sleep_time could have a sensible default??
def launch_multiprocess2(

This definitely needs a different name. What distinguishes this from launch_multiprocess?

from fireworks.core.fworker import FWorker
import fireworks.fw_config
import logging
#restart RHE

Remove these attribution comments. They are confusing, especially here, and anyone who wants to know who edited a section of the code can look at the blame.

del constraint_dict["type"]
return constructor(**constraint_dict)

def copyDataAndSave(origin, destination, file):

snake case


def copyDataAndSave(origin, destination, file):
'''
Function to copy a file from a origin subdirectory to a destination subdirectory.

I don't understand why this function needs to exist. What makes it unviable to just use shutil.copy?
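For comparison, a rough sketch of what a direct `shutil.copy` call covers. It is not the PR's code: the body of copyDataAndSave is not shown here, so this only mirrors what its docstring describes, and the name `copy_file` is hypothetical.

```python
import os
import shutil


def copy_file(origin, destination, file):
    """Copy `file` from the origin directory into the destination directory.

    Hypothetical stand-in for illustration. Returns the path of the
    new copy, as reported by shutil.copy.
    """
    os.makedirs(destination, exist_ok=True)  # assumption: create the target if missing
    return shutil.copy(os.path.join(origin, file), destination)
```

If the destination directory is known to exist, the whole helper reduces to the single `shutil.copy` line.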


vib_obj_dict = {"software": self.software, "label": adsname, "software_kwargs": software_kwargs,
"machine": self.machine, "constraints": ["freeze up to "+str(self.nslab)]}
vib_obj_dict = {"software": self.software, "label": ad, "software_kwargs": software_kwargs,

Something is messed up with the changes in this line. I think you were making changes in one commit and then undoing them in another.

pynta/main.py Outdated
print(f'No directory named "vib" found in {src}.')
print('No vibration calculations executed: Check optimization runs are finished and optimized geometries are collected.')


Remove these whitespace changes.

pynta/main.py Outdated
xyz = os.path.join(self.path,"Adsorbates",adsname,str(prefix),str(prefix)+".xyz")
xyzs.append(xyz)
fwopt = optimize_firework(os.path.join(self.path,"Adsorbates",adsname,str(prefix),str(prefix)+"_init.xyz"),
self.machine,self.software,"weakopt_"+str(prefix),

Most of this should probably have been squashed with the changes in main, which may help resolve some of the do-then-undo changes.

4 participants