Conversation

@markus-jehl (Contributor) commented Feb 28, 2023

Relates to issue: #1167

@markus-jehl (Contributor, Author)

Before the change, the following calls were timed as:

  • make_fan_data_remove_gaps: 2.8 s
  • apply_efficiencies: 0.7 s
  • apply_geo_norm: 2.1 s
  • set_fan_data_add_gaps: 3.1 s

After the change, they were much faster:

  • make_fan_data_remove_gaps: 1 s
  • apply_efficiencies: 0.1 s
  • apply_geo_norm: 0.4 s
  • set_fan_data_add_gaps: 0.6 s

@KrisThielemans (Collaborator) left a comment

I haven't checked but this is probably fine. The tricky (i.e. virtually impossible) one would be iterate_efficiencies as that updates efficiencies continuously in the loop. We tried a "one step late" scheme but it doesn't converge (see the old proceedings by Darren Hogg).
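
(For context, a schematic of the kind of in-loop dependence being described; the names and the update formula below are simplified stand-ins, not the actual iterate_efficiencies code:)

    // Schematic only: each efficiency update reads efficiencies that earlier
    // iterations of the same sweep may already have rewritten, so the
    // iterations cannot simply be distributed over threads.
    for (int a = 0; a < num_detectors; ++a)
      {
        float denom = 0.F;
        for (int b = 0; b < num_detectors; ++b)
          denom += model(a, b) * efficiencies[b];      // may read freshly updated values
        efficiencies[a] = measured_fan_sum[a] / denom; // written inside the same sweep
      }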

@KrisThielemans (Collaborator) left a comment

Unfortunately, I think a lot of this is not thread-safe. Lines like

fan_data(new_ra, new_a, new_rb, new_b) =
fan_data(new_rb, new_b, new_ra, new_a) =

are not guaranteed to work. See for instance https://stackoverflow.com/a/41614045/15030207

Unfortunately, atomic will likely not work for lines like that, as they go via a class member; it only works for plain vector access. Even then, it will be version-specific (but I'd be entirely fine only parallelising for compilers that support a recent enough OpenMP). If atomic doesn't work, I think it'll need a critical section, essentially killing the speed-up. The alternative would be to create writable variables for every thread, which would probably break all the encapsulation of FanData etc.
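
(A minimal sketch of the two patterns being contrasted, with made-up names; per the later comments in this thread, atomic is accepted for plain std::vector element access but not for updates that go through the fan-data class, leaving a critical section as the fallback:)

    #include <vector>

    struct FanDataSketch {                    // stand-in for the real fan-data class
      std::vector<float> buf;
      int n;
      explicit FanDataSketch(int num) : buf(num * num, 0.F), n(num) {}
      float& operator()(int a, int b) { return buf[a * n + b]; }
    };

    void accumulate(std::vector<float>& effs, FanDataSketch& fan_data, int n)
    {
    #pragma omp parallel for
      for (int a = 0; a < n; ++a)
        for (int b = 0; b < n; ++b)
          {
    #pragma omp atomic                        // accepted: plain scalar update via operator[]
            effs[b] += 1.F;

            // atomic cannot be used for the next statement, as the update goes
            // through FanDataSketch::operator(); a (named) critical section
            // works, but serialises the update across all threads.
    #pragma omp critical(fan_data_update)
            fan_data(a, b) += 1.F;
          }
    }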

For the loops that use set_segment, it would be alright, as each segment is independent per thread, and set_segment is thread-safe.
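
(Schematic of that per-segment pattern; the calls are the ones mentioned in this thread or close to them, but the signatures here are simplified:)

    // Each thread owns one segment: it reads the shared data, fills its own
    // thread-local segment, and hands it over via set_segment, which (per the
    // discussion above) is assumed to serialise internally.
    #pragma omp parallel for
    for (int seg_num = proj_data.get_min_segment_num(); seg_num <= proj_data.get_max_segment_num(); ++seg_num)
      {
        auto segment = proj_data.get_empty_segment_by_sinogram(seg_num); // thread-local
        // ... fill 'segment' from fan_data, reading shared data only ...
        proj_data.set_segment(segment);
      }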

@markus-jehl (Contributor, Author)

Your comment makes sense. But the sections I've parallelised should be fine, I think. As you say, parallelising on segments should be alright. And in the places where we parallelise over ra and a, the threads only access entries at the ra and a locations. The only one that looks problematic is the first of the three collapse(2) parallelisations: there we indeed compute new indices and use them to write into the "work" variable.

@KrisThielemans (Collaborator)

I'm not sure I agree. The lines I quoted are in the 1st loop and update new_fan_data; they are problematic. There's a loop here (the 3rd one?) which updates work. The next one updates fan_data. No?

@markus-jehl (Contributor, Author)

Yes, the lines you originally quoted are in the first loop, but that loop only parallelises across segments. The first new link you sent is the one I also think is problematic, but the second link I think is fine again: it does update fan_data, but only at location (ra, a, ...), therefore ensuring that each thread writes to a different section of fan_data.

@KrisThielemans (Collaborator)

But updating fan_data is just as problematic, certainly since we update it symmetrically (fan_data(new_ra, new_a, new_rb, new_b) = fan_data(new_rb, new_b, new_ra, new_a) = ...). How do we know that another thread isn't accessing the "symmetric" version, or just a neighbouring one, with therefore the potential for memory corruption?

It might be alright as all the index access in the 4D array is read-only, and we only update the 1D vectors, but how do we know that those 1D arrays are not adjacent for different segments?

Of course, it seems pretty unlikely that this would generate a race condition, but I don't think we have a solid guarantee.
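
(A stripped-down illustration of that hazard, with a hypothetical mirror index in place of the real gap-removal index arithmetic:)

    #include <vector>

    // Two iterations i and j with mirror[i] == j (and mirror[j] == i) can run on
    // different threads and write the same two elements concurrently: a data
    // race, even if the values written happen to be identical.
    void symmetric_update(std::vector<float>& fan, const std::vector<int>& mirror)
    {
    #pragma omp parallel for
      for (int i = 0; i < static_cast<int>(fan.size()); ++i)
        {
          const float value = 0.5F * (fan[i] + fan[mirror[i]]);
          fan[i] = fan[mirror[i]] = value;
        }
    }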

I don't have access to any tools to check thread-safety sadly.

@markus-jehl (Contributor, Author)

Darn, I only noticed this symmetry issue now. I ran TSAN over it, and even the parallelisation across segments seems not to be thread-safe.

@KrisThielemans (Collaborator)

By the way, how reliable is TSAN (presumably with clang) for OpenMP? (I've tried it with gcc, but I had to build my own instrumented OpenMP library (or was it gcc?), so I gave up.)

@markus-jehl (Contributor, Author)

I haven't used it with OpenMP before, and the more I look into it here, the less I trust it. I think it might be getting confused by the various layers of bounds checking and shared pointers when obtaining segments by sinogram.

@KrisThielemans (Collaborator) commented Mar 6, 2023

Yeah... By the way, there's no reason for this code:

 shared_ptr<SegmentBySinogram<float> > segment_ptr;
 segment_ptr.reset(new SegmentBySinogram<float>(proj_data.get_segment_by_sinogram(bin.segment_num())));

The following is essentially equivalent, but much clearer:

   const auto segment(proj_data.get_segment_by_sinogram(bin.segment_num()));

(I'm hoping I didn't write those lines)

Note that this is going to make TSAN problems disappear.

@markus-jehl (Contributor, Author)

This does look much cleaner indeed! But I still get TSAN problems... I'll have to look into this some more.

@markus-jehl (Contributor, Author)

I finally managed to get OpenMP working with TSAN for a very simple dummy for loop that just prints something to cout. However, when parallelising the least problematic for loop over segments, where only the segment is modified, TSAN already throws a lot of warnings. Furthermore, using "atomic read" on fan_data doesn't even compile, because OpenMP expects it to be applied to simple operations such as "v = x", not to complex function calls. It even complained about the "[]"-style indexing in VectorWithOffset.

I'm afraid we'll have to live with the slow serial implementation for now...

@KrisThielemans (Collaborator)

> I managed to get OpenMP working with TSAN for a very simple dummy for loop just printing something to cout.

Well, that's a bit strange. Writing to cout is most definitely not thread-safe without a critical section, so it should have complained! I've tried to find some documentation on clang/TSAN/OpenMP but gave up. The only info I could find (but it's old) is that you need to build your own libgomp and then LD_PRELOAD it.

"atomic read" on the fan_data doesn't even compile, because it expects this to work on simple operations such as "v=x", not on complex functions. Even on the "[]"-style indexing in VectorWithOffset it complained

I'm not surprised by the atomic restrictions for fan_data, but I am disappointed it cannot handle VectorWithOffset::operator[]. Does it work with std::vector::operator[], by the way?

We don't need atomic reads anywhere. They are only necessary when someone else can write to that memory. So, in the second loop, we read fan_data in parallel (fine), write to a thread-local segment (fine), and then call set_segment, which should have its own internal critical section, so that should be fine as well. You can always check by adding your own critical section around the set_segment call.
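
(As a sketch of that check, reusing the simplified names from the segment-loop sketch earlier in this thread:)

    // Wrap the hand-over in an explicit named critical section; if the TSAN
    // warnings (or the results) change, set_segment's own locking was not
    // sufficient on its own.
    #pragma omp critical(set_segment_call)
    proj_data.set_segment(segment);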

Updating fan_data though is harder.

@markus-jehl (Contributor, Author)

Yes, it's all very mysterious.

> I've tried to find some documentation on clang/TSAN/OpenMP but gave up. The only info I could find (but it's old) is that you need to build your own libgomp and then LD_PRELOAD it.

Yes, this is how I got it to work eventually.

> I'm not surprised by the atomic restrictions for fan_data, but am disappointed it cannot handle VectorWithOffset::operator[]. Does it work with std::vector::operator[] by the way?
>
> We don't need atomic reads anywhere. They are only necessary when someone else can write to that memory. So, in the second loop, we read fan_data in parallel (fine), and write to a thread-local segment (fine), and then call set_segment, which should have its own internal critical section, so should be fine as well. You can always check by adding your own critical section for the set_segment.

I can try this a bit later this week!

@markus-jehl (Contributor, Author)

As discussed, atomic works with std::vector access, but not with VectorWithOffset. So there seems to be no way to use atomic here, and TSAN still complains about all the parallelisation (therefore TSAN can't be trusted here).

I have now also tried adding an atomic_read function to FanData that puts a critical section around the read, but that thoroughly kills performance.
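
(Roughly what that looks like as a sketch; the actual code on the branch may differ:)

    // Illustrative only: every read takes the same lock, so parallel readers
    // end up serialising on it and the speed-up disappears.
    float atomic_read(const std::vector<float>& buf, int index)
    {
      float value;
    #pragma omp critical(fan_data_read)
      value = buf[index];
      return value;
    }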

Therefore, the best option is to parallelise only the loop that looks safe for now.

@KrisThielemans changed the title from "Parallelising functions in ML_norm.cxx to improve performance." to "Parallelising fan-data to proj-data function in ML_norm.cxx" on Mar 29, 2023
@KrisThielemans (Collaborator) left a comment

Looks good now. Can you just add something to the release notes? Thanks!

@KrisThielemans merged commit a7e6d56 into UCL:master on Mar 30, 2023
@markus-jehl deleted the issue/1167-methods-in-ML_norm-are-not-parallelised-yet branch on January 29, 2024