Conversation

@a270105
Collaborator

@a270105 a270105 commented Jan 29, 2025

Tracer parallelisation can be switched on via __usetp in fesom2/src/CMakeLists.txt and is only used when recom is on.

ogurses and others added 8 commits October 15, 2024 15:08
Define new variables to track tracer changes
due to advection and diffusion.

We want to save for now diffusion and advection
contribution to the tracer changes. Horizontal and
vertical diffusion includes Redi
parametrization (if it is set .true.).
Fill __ciso directive to ensure that
carbon isotope code works. Medusa interface is
added.
@a270105
Collaborator Author

a270105 commented Jan 29, 2025

While compiling I found an issue regarding recompilation: I need to manually delete the executable fesom.x in fesom2.build/bin before compiling the model. Otherwise, even after a successful compilation, I still find the old fesom.x in fesom2/bin.
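A minimal sketch of the manual workaround described above, assuming a POSIX-like shell. The helper name `remove_stale_exe` is hypothetical and not part of FESOM; the directories are passed in explicitly because the exact layout may differ between checkouts.

```shell
# Sketch only: delete old copies of an executable from the given directories
# so that a rebuild cannot silently leave a stale binary behind.
# Usage: remove_stale_exe <exe-name> <dir> [<dir> ...]
remove_stale_exe() {
    local exe="$1"
    shift
    local removed=0
    local dir
    for dir in "$@"; do
        if [ -f "$dir/$exe" ]; then
            rm -f "$dir/$exe"
            removed=$((removed + 1))
        fi
    done
    # Print how many stale copies were removed.
    echo "$removed"
}
```

For example, running `remove_stale_exe fesom.x fesom2/bin fesom2.build/bin` before recompiling would clear both locations mentioned in the comment.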

Collaborator

@JanStreffing JanStreffing left a comment

Summary of review:
There are a few places that need changing, and some that may not need to change but require a closer look to confirm that everything is indeed OK. There are also a number of style improvements to be made. All comments that start with ! kh date should drop the name and date. The same goes for !YY, !O:G, !OG and !.OG. Git blame can tell you this info.

In addition to the standalone FESOM2 CI tests, this branch needs to be tested by me as part of AWI-CM3, by @ackerlar as part of AWI-ESM2, by @sebastianbeyer or @suvarchal as part of IFS-FESOM, and by @mbutzin with active tracers. I'm assuming here that you have already tested this branch with recom and without __usetp.

set(USE_ICEPACK OFF CACHE BOOL "compile fesom with the Iceapck modules for sea ice column physics.")
set(OPENMP_REPRODUCIBLE OFF CACHE BOOL "serialize OpenMP loops that are critical for reproducible results")
-set(RECOM_COUPLED OFF CACHE BOOL "compile fesom including biogeochemistry, REcoM3")
+set(RECOM_COUPLED ON CACHE BOOL "compile fesom including biogeochemistry, REcoM3")
Collaborator

This should be turned back for the main branch.

integer :: AB_order=2
integer :: ID
!___________________________________________________________________________
! TODO: Make it as a part of namelist.tra
Collaborator

Should this be done before the merge?

Collaborator Author

need to clarify with Özgür

@JanStreffing
Collaborator

In addition to the changes from the review, the merge conflicts should be resolved. Most of them look like simple double additions of new features that both have to be kept, e.g. parallel tracers and icebergs.

@JanStreffing JanStreffing added the enhancement New feature or request label Feb 3, 2025
@ackerlar
Collaborator

ackerlar commented Feb 5, 2025

Is there any template fesom-recom yaml that I can use for testing?

@ackerlar
Collaborator

ackerlar commented Feb 5, 2025

I get this error during compilation:

/home/a/a270124/model_codes/testing_and_debugging/awiesm-2.6/fesom-2.6/src/associate_mesh_ass.h(57): warning #5117: Bad # preprocessor line
#if defined(__async_icebergs)
-^
/home/a/a270124/model_codes/testing_and_debugging/awiesm-2.6/fesom-2.6/src/associate_mesh_ass.h(59): warning #5117: Bad # preprocessor line
#endif
-^
/home/a/a270124/model_codes/testing_and_debugging/awiesm-2.6/fesom-2.6/src/cpl_driver.F90(363): error #6784: The number of actual arguments cannot be greater than the number of dummy arguments.   [OASIS_INIT_COMP]
    CALL oasis_init_comp(comp_id, comp_name, ierror, num_program_groups = num_fesom_groups)
---------^
/home/a/a270124/model_codes/testing_and_debugging/awiesm-2.6/fesom-2.6/src/cpl_driver.F90(961): error #6404: This name does not have a type, and must have an explicit type.   [MPI_COMM_FESOM_SAME_RANK_IN_GROUPS]
      call MPI_Bcast(action, 1, MPI_LOGICAL, 0, MPI_COMM_FESOM_SAME_RANK_IN_GROUPS, MPIerr)
------------------------------------------------^
/home/a/a270124/model_codes/testing_and_debugging/awiesm-2.6/fesom-2.6/src/cpl_driver.F90(961): error #6404: This name does not have a type, and must have an explicit type.   [MPIERR]
      call MPI_Bcast(action, 1, MPI_LOGICAL, 0, MPI_COMM_FESOM_SAME_RANK_IN_GROUPS, MPIerr)
------------------------------------------------------------------------------------^
/home/a/a270124/model_codes/testing_and_debugging/awiesm-2.6/fesom-2.6/src/cpl_driver.F90(976): error #6404: This name does not have a type, and must have an explicit type.   [MYDIM_NOD2D]
          call MPI_Bcast(data_array, myDim_nod2d, MPI_DOUBLE_PRECISION, 0, MPI_COMM_FESOM_SAME_RANK_IN_GROUPS, MPIerr)
-------------------------------------^
compilation aborted for /home/a/a270124/model_codes/testing_and_debugging/awiesm-2.6/fesom-2.6/src/cpl_driver.F90 (code 1)
make[2]: *** [src/CMakeFiles/fesom.dir/build.make:205: src/CMakeFiles/fesom.dir/cpl_driver.F90.o] Error 1

@a270105
Collaborator Author

a270105 commented Feb 5, 2025

I am still testing the code. So far I can run ocean-only without __usetp in CMakeLists and compile the model with __usetp. But I have not yet tested the coupled setup.

@a270105
Collaborator Author

a270105 commented Feb 5, 2025

I get this error during compilation:

/home/a/a270124/model_codes/testing_and_debugging/awiesm-2.6/fesom-2.6/src/associate_mesh_ass.h(57): warning #5117: Bad # preprocessor line
#if defined(__async_icebergs)
-^
/home/a/a270124/model_codes/testing_and_debugging/awiesm-2.6/fesom-2.6/src/associate_mesh_ass.h(59): warning #5117: Bad # preprocessor line
#endif
-^
/home/a/a270124/model_codes/testing_and_debugging/awiesm-2.6/fesom-2.6/src/cpl_driver.F90(363): error #6784: The number of actual arguments cannot be greater than the number of dummy arguments.   [OASIS_INIT_COMP]
    CALL oasis_init_comp(comp_id, comp_name, ierror, num_program_groups = num_fesom_groups)
---------^
/home/a/a270124/model_codes/testing_and_debugging/awiesm-2.6/fesom-2.6/src/cpl_driver.F90(961): error #6404: This name does not have a type, and must have an explicit type.   [MPI_COMM_FESOM_SAME_RANK_IN_GROUPS]
      call MPI_Bcast(action, 1, MPI_LOGICAL, 0, MPI_COMM_FESOM_SAME_RANK_IN_GROUPS, MPIerr)
------------------------------------------------^
/home/a/a270124/model_codes/testing_and_debugging/awiesm-2.6/fesom-2.6/src/cpl_driver.F90(961): error #6404: This name does not have a type, and must have an explicit type.   [MPIERR]
      call MPI_Bcast(action, 1, MPI_LOGICAL, 0, MPI_COMM_FESOM_SAME_RANK_IN_GROUPS, MPIerr)
------------------------------------------------------------------------------------^
/home/a/a270124/model_codes/testing_and_debugging/awiesm-2.6/fesom-2.6/src/cpl_driver.F90(976): error #6404: This name does not have a type, and must have an explicit type.   [MYDIM_NOD2D]
          call MPI_Bcast(data_array, myDim_nod2d, MPI_DOUBLE_PRECISION, 0, MPI_COMM_FESOM_SAME_RANK_IN_GROUPS, MPIerr)
-------------------------------------^
compilation aborted for /home/a/a270124/model_codes/testing_and_debugging/awiesm-2.6/fesom-2.6/src/cpl_driver.F90 (code 1)
make[2]: *** [src/CMakeFiles/fesom.dir/build.make:205: src/CMakeFiles/fesom.dir/cpl_driver.F90.o] Error 1

I just compiled it with FESOM_COUPLED ON and didn't get any error. Did you compile the model with esm-tools?

@ackerlar
Collaborator

ackerlar commented Feb 5, 2025

@a270105 yes, I used esm_tools: esm_master get-awiesm-2.6, changed the FESOM branch to fesom2.6_recom_tp and then esm_master comp-awiesm-2.6

which oasis version are you using?

@ackerlar
Copy link
Collaborator

ackerlar commented Feb 5, 2025

At least I got rid of this error

/home/a/a270124/model_codes/testing_and_debugging/awiesm-2.6/fesom-2.6/src/cpl_driver.F90(363): error #6784: The number of actual arguments cannot be greater than the number of dummy arguments.   [OASIS_INIT_COMP]
    CALL oasis_init_comp(comp_id, comp_name, ierror, num_program_groups = num_fesom_groups)

when switching to oasis branch feat/multi-group-support. Before I used 2.8mct-awiesm-2.1. However, the other errors remain

call cpl_oasis3mct_init(f%partit,f%partit%MPI_COMM_FESOM)
! call cpl_oasis3mct_init(f%partit,f%partit%MPI_COMM_FESOM)
! kh 02.12.21
#if defined(__recom) && defined(__usetp)
Collaborator

"localCommunicator" is neither defined in fesom_module nor in MOD_PARTIT. Should this really be "localCommunicator" or "f%partit%MPI_COMM_FESOM"?

Collaborator Author

I can't really answer this. These lines already existed in fesom2.5-recom and in another earlier version where Özgür and I merged their complex recom version and my simpler paleo version. But I can't find them in my paleo version, where Kai first implemented tracer parallelisation. There this part of the code is still in fvom_main.f90 and only contains:
#if defined (__oasis)

! kh 21.03.22 pass num_fesom_groups to coupler
call cpl_oasis3mct_init(MPI_COMM_FESOM, num_fesom_groups)
#endif
t1 = MPI_Wtime()

call par_init

......

So I think it should be for the case with tracer parallelisation:
call cpl_oasis3mct_init(f%partit, f%partit%MPI_COMM_FESOM, num_fesom_groups)

@ackerlar
Collaborator

ackerlar commented Feb 5, 2025

FESOM compiled for me with some minor changes in MPI calls. I also changed

-        call cpl_oasis3mct_init(f%partit, f%partit%localCommunicator, num_fesom_groups)
+        call cpl_oasis3mct_init(f%partit, f%partit%MPI_COMM_FESOM, num_fesom_groups)

in fesom_module as localCommunicator is not defined outside of cpl_driver. Please check whether this makes sense.

However, the model crashes as FESOM is missing some entries in namelist.config. Is there a recom specific namelist.config?

@a270105
Collaborator Author

a270105 commented Feb 5, 2025

@a270105 yes, I used esm_tools: esm_master get-awiesm-2.6, changed the FESOM branch to fesom2.6_recom_tp and then esm_master comp-awiesm-2.6

which oasis version are you using?

@a270105 a270105 closed this Feb 5, 2025
@a270105 a270105 reopened this Feb 5, 2025
@a270105
Collaborator Author

a270105 commented Feb 5, 2025

FESOM compiled for me with some minor changes in MPI calls. I also changed

-        call cpl_oasis3mct_init(f%partit, f%partit%localCommunicator, num_fesom_groups)
+        call cpl_oasis3mct_init(f%partit, f%partit%MPI_COMM_FESOM, num_fesom_groups)

in fesom_module as localCommunicator is not defined outside of cpl_driver. Please check whether this makes sense.

However, the model crashes as FESOM is missing some entries in namelist.config. Is there a recom specific namelist.config?

Do you compile with __usetp? If not, there is no additional entry needed in namelist.config.

@ackerlar
Collaborator

ackerlar commented Feb 6, 2025

Do you compile with __usetp? If not, there is no additional entry needed in namelist.config.

Yes, I set RECOM_COUPLED=ON which gives

if(${RECOM_COUPLED})
#   target_compile_definitions(${PROJECT_NAME} PRIVATE __recom USE_PRECISION=2 __usetp)
#   target_compile_definitions(${PROJECT_NAME} PRIVATE __recom USE_PRECISION=2)
   target_compile_definitions(${PROJECT_NAME} PRIVATE __recom USE_PRECISION=2 __3Zoo2Det __coccos __usetp)
endif()

in src/CMakeLists.txt and __usetp should be used if I understand correctly.
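Since the CMake block above mixes commented-out and active lines, a quick check like the following can confirm whether __usetp is actually passed to the compiler. This is only a sketch: the function name `usetp_enabled` is hypothetical, and it simply assumes that any line whose first non-blank character is `#` is a CMake comment.

```shell
# Sketch only: succeed (exit status 0) if __usetp appears on a line of the
# given CMakeLists.txt that is not commented out.
# Usage: usetp_enabled <path-to-CMakeLists.txt>
usetp_enabled() {
    local cmakelists="$1"
    # First drop comment lines, then look for the definition in what is left.
    grep -v '^[[:space:]]*#' "$cmakelists" | grep -q '__usetp'
}
```

It could be called as `usetp_enabled src/CMakeLists.txt && echo "TP is compiled in"`. Note that it does not evaluate the surrounding `if(${RECOM_COUPLED})`, so it only tells you what would be set when that option is ON.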

@JanStreffing
Collaborator

Good progress @a270105. @suvarchal, or maybe @sebastianbeyer, any chance you can try to run a day of ifs-fesom with this branch this week?

@a270105
Collaborator Author

a270105 commented Oct 28, 2025

The issue with copying recom restart files is fixed now and the changes are committed in the awiesm-2.6-recom branch.
A 5-day run without TP, with daily restarts, has finished.

@JanStreffing
Collaborator

JanStreffing commented Oct 28, 2025

I tried the AWI-ESM3 part. I get stuck compiling:

/work/ab0246/a270092/model_codes/awiesm3-develop/fesom-2.6/src/cpl_driver.F90(362): error #6627: This is an actual argument keyword name, and not a dummy argument name.   [NUM_PROGRAM_GROUPS]
    CALL oasis_init_comp(comp_id, comp_name, ierror, num_program_groups = num_fesom_groups)
-----------------------------------------------------^
compilation aborted for /work/ab0246/a270092/model_codes/awiesm3-develop/fesom-2.6/src/cpl_driver.F90 (code 1)
make[2]: *** [src/CMakeFiles/fesom.dir/build.make:205: src/CMakeFiles/fesom.dir/cpl_driver.F90.o] Error 1
make[2]: *** Waiting for unfinished jobs....

Does that look familiar to you @a270105 ?

@a270105
Collaborator Author

a270105 commented Oct 28, 2025

No, I have never had a problem with this line.
I checked the oasis code and found that 'num_program_groups' is defined for the subroutine oasis_init_comp in oasis/lib/psmile/src/mod_oasis_method.F90
Maybe you are using a different oasis version?
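One way to tell quickly whether a given OASIS checkout knows the keyword is to search the file named above. A minimal sketch, assuming a POSIX-like shell; the helper name `oasis_has_group_keyword` is hypothetical, and a plain text match cannot prove that the argument is really part of the oasis_init_comp interface.

```shell
# Sketch only: succeed if the given OASIS source file mentions the
# num_program_groups keyword (case-insensitive, since Fortran is).
# Usage: oasis_has_group_keyword <path-to-mod_oasis_method.F90>
oasis_has_group_keyword() {
    local src="$1"
    grep -qi 'num_program_groups' "$src"
}
```

For example: `oasis_has_group_keyword oasis/lib/psmile/src/mod_oasis_method.F90 || echo "this OASIS version lacks multi-group support"`.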

@JanStreffing
Collaborator

Yes, we use oasis3mct5, not oasis3mct2.8

@a270105
Collaborator Author

a270105 commented Oct 28, 2025

Can you use the branch origin/feat/multi-group-support? Kai's changes for using multiple groups of the same program were added there in 2022.

@JanStreffing
Collaborator

Can you link to that branch?
I can't see that online

@a270105
Collaborator Author

a270105 commented Oct 28, 2025

@JanStreffing
Collaborator

JanStreffing commented Oct 28, 2025

This branch is many years out of date. We can not run the latest AWI-ESM3 with that. Even if we make that run, it would not help us for the merge request at hand. We need this to work with new oasis3mct versions that contain required features, such as CONSERV remapping. Maybe this is something Kai can tackle?

I am running with: https://git.smhi.se/ec-earth/vendor/oasis/oasis3-mct-5.git

@a270105
Collaborator Author

a270105 commented Oct 28, 2025

I am not suggesting that you use that branch, just that you look at where Kai changed things related to tracer parallelisation. Maybe it is not too difficult to add those changes to your oasis code.

@JanStreffing
Collaborator

@a270105
Collaborator Author

a270105 commented Oct 28, 2025

yes, I also found his changes in these 3 files

@JanStreffing
Collaborator

Can you give this a try @a270105 & @chrisdane? I might not have the time to do so before the model freeze.

@a270105
Collaborator Author

a270105 commented Oct 29, 2025

I can't run awiesm-2.6-recom with TP with the current oasis (from the branch 'multi-group-support'); I get a segmentation fault at the step FESOM: calling exchange_roots. So, at the moment the code works with TP compiled and num_fesom_groups = 1 (only one tracer group, i.e. no MPI applied for tracers). For further steps I need help from Kai or someone else.

@JanStreffing
Collaborator

I can't run awiesm-2.6-recom with TP with the current oasis (from the branch 'multi-group-support'); I get a segmentation fault at the step FESOM: calling exchange_roots. So, at the moment the code works with TP compiled and num_fesom_groups = 1 (only one tracer group, i.e. no MPI applied for tracers). For further steps I need help from Kai or someone else.

Good to know. This does not block the merge, as that would be a new feature.

@JanStreffing
Collaborator

I tried the AWI-ESM3 part. I get stuck compiling:

/work/ab0246/a270092/model_codes/awiesm3-develop/fesom-2.6/src/cpl_driver.F90(362): error #6627: This is an actual argument keyword name, and not a dummy argument name.   [NUM_PROGRAM_GROUPS]
    CALL oasis_init_comp(comp_id, comp_name, ierror, num_program_groups = num_fesom_groups)
-----------------------------------------------------^
compilation aborted for /work/ab0246/a270092/model_codes/awiesm3-develop/fesom-2.6/src/cpl_driver.F90 (code 1)
make[2]: *** [src/CMakeFiles/fesom.dir/build.make:205: src/CMakeFiles/fesom.dir/cpl_driver.F90.o] Error 1
make[2]: *** Waiting for unfinished jobs....

Does that look familiar to you @a270105 ?

I no longer get this error using the group-enabled version of OASIS3MCT5. Next I got errors with preprocessing flags in the .h files. Basically, the Intel compiler seems to ignore preprocessing flags inside .h files. When __usetp is not set, it then tries to access pieces of partit that are not there, because the working preprocessor statements in the F90 code removed them:

/work/ab0246/a270092/model_codes/awiesm3-develop-cc/fesom-2.6/src/associate_part_ass.h(2): error #6460: This is not a component name that is defined in the encompassing structure.   [MPI_COMM_FESOM_WORLD]
MPI_COMM_FESOM_WORLD                => partit%MPI_COMM_FESOM_WORLD
----------------------------------------------^

I thought I was able to fix this by switching from #ifdef to !DEC statements, but I think that broke the code even more. Consider reverting. We need a solution that allows preprocessor statements in the .h files. Otherwise we need two versions and include those, but that would add over 300 preprocessor statements and be a clusterfuck. No way I want to do that.

Any idea how to do this @suvarchal?

@a270105
Collaborator Author

a270105 commented Oct 30, 2025

I just looked back in this merge request. Yes, I had the same issue, and Lars did as well. I tried !DIR, since I thought it would be good not to be compiler-specific, but for some reason it did not work. So we decided to remove the preprocessor directives and leave those pointers always defined. The same is done for the iceberg part. I just have these lines at the beginning of my associate_part_ass.h:
MPI_COMM_FESOM_WORLD => partit%MPI_COMM_FESOM_WORLD
MPI_COMM_FESOM_SAME_RANK_IN_GROUPS => partit%MPI_COMM_FESOM_SAME_RANK_IN_GROUPS
MPI_COMM_FESOM => partit%MPI_COMM_FESOM
MPI_COMM_FESOM_IB => partit%MPI_COMM_FESOM_IB
Do you have a different file?

@JanStreffing
Collaborator

I have the version that is pushed onto the repo here.
Can you pull and fix that again? Sorry for the mess.

@a270105
Collaborator Author

a270105 commented Oct 31, 2025

The preprocessor directives in those two .h files are deleted and I compiled the code with usetp on and off.
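To guard against directives creeping back into the included headers, a small check along these lines might help. It is only a sketch: the helper name `count_directive_lines` is hypothetical, and it treats every line whose first non-blank character is `#` as a preprocessor directive.

```shell
# Sketch only: print the number of lines in a file that look like
# preprocessor directives (first non-blank character is '#').
# Usage: count_directive_lines <file>
count_directive_lines() {
    local file="$1"
    # grep -c prints 0 but returns a non-zero status when nothing matches,
    # so "|| true" keeps the function from failing on a clean file.
    grep -c '^[[:space:]]*#' "$file" || true
}
```

Running it on src/associate_mesh_ass.h and src/associate_part_ass.h should print 0 for both once the directives are removed.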

@JanStreffing
Collaborator

JanStreffing commented Oct 31, 2025

Adding the tests to the end, so they are easier to find:

Here are the tests we need to pass once before we can merge this branch:

Compiling:

  • FESOM2, standalone, without RECOM, without TP [@a270105 ]
  • FESOM2, standalone, with RECOM, without TP [@a270105 ]
  • FESOM2, standalone, with RECOM, with TP [@a270105 ]
  • AWI-ESM2, without RECOM, without TP [@a270105 ]
  • AWI-ESM2, with RECOM, without TP [@a270105 ]
  • AWI-ESM2, with RECOM, with TP [@a270105 ] (nice to have but not blocking this PR if it does not work)
  • AWI-CM3, without RECOM, without TP [@JanStreffing]
  • AWI-ESM3, with RECOM, without TP [@a270105]
  • AWI-ESM3, with RECOM, with TP [@a270105] (nice to have but not blocking this PR if it does not work)
  • IFS-FESOM, without RECOM, without TP [@suvarchal]

Running for 1 day:

  • FESOM2, standalone, without RECOM, without TP [@a270105 ]
  • FESOM2, standalone, with RECOM, without TP [@a270105 ]
  • FESOM2, standalone, with RECOM, with TP [@a270105 ]
  • AWI-ESM2, without RECOM, without TP [@a270105 ]
  • AWI-ESM2, with RECOM, without TP [@a270105 ]
  • AWI-ESM2, with RECOM, with TP [@a270105 ] (nice to have but not blocking this PR if it does not work)
  • AWI-CM3, without RECOM, without TP [@JanStreffing]
  • AWI-ESM3, with RECOM, without TP [@chrisdane]
  • AWI-ESM3, with RECOM, with TP [@chrisdane] (nice to have but not blocking this PR if it does not work)
  • IFS-FESOM, without RECOM, without TP [@suvarchal]

Update: AWI-ESM3, with RECOM, without TP and AWI-ESM3, with RECOM, with TP compile.

@JanStreffing
Collaborator

Update, I was able to run AWI-ESM3 without TP on this branch, after a local merge of our CO2 coupling developments done on fesom2.6_recom_awiesm3_co2_coupling

I will not test AWI-ESM3 with TP. It will not work yet, but it does not have to for this PR.

@a270105
Collaborator Author

a270105 commented Nov 4, 2025

I have to report something really odd:
I added a new feature in the recom code (src/int_recom/recom_sms.F90) and recompiled fesom. However, a fully identical executable file was created (the build folder was deleted before compilation). This did not happen when I changed the fesom code (or the cmake options in src/CMakeLists for recom). Somehow the recom files were not updated during recompilation. Others in the recom group have the same problem with other branches such as fesom-2.6-recom-tra-diags.
This might need to be fixed before the release, right?
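A simple way to reproduce this kind of observation is to keep a copy of the executable before the rebuild and compare it byte by byte afterwards. The sketch below assumes a POSIX-like shell with `cmp`; the helper name `exe_changed` is hypothetical. A difference does not prove the changed source was picked up (embedded timestamps can also change a binary), but identical files after a code change strongly suggest that it was not recompiled.

```shell
# Sketch only: succeed (exit status 0) if the two files differ byte-wise.
# Usage: exe_changed <old-copy> <new-executable>
exe_changed() {
    # cmp -s is silent and returns 0 for identical files, so negate it.
    ! cmp -s "$1" "$2"
}
```

For example: `cp bin/fesom.x /tmp/fesom.x.old`, rebuild, then `exe_changed /tmp/fesom.x.old bin/fesom.x || echo "executable is unchanged"`.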

Labels

enhancement New feature or request RECOM

Development

Successfully merging this pull request may close these issues.

Porting fast FESOM/RECOM to main branch

9 participants