[Flang][OpenMP]Support for lowering task_reduction and in_reduction to MLIR #111155
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username. If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.
@llvm/pr-subscribers-flang-semantics @llvm/pr-subscribers-flang-fir-hlfir
Author: Kaviya Rajendiran (kaviya2510)
Changes
This patch supports lowering of task_reduction and in_reduction to MLIR.
Patch is 22.01 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/111155.diff
8 Files Affected:
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
index a4d2524bccf5c3..95ab51809dcf94 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
@@ -1063,7 +1063,7 @@ bool ClauseProcessor::processReduction(
llvm::SmallVector<mlir::Attribute> reductionDeclSymbols;
llvm::SmallVector<const semantics::Symbol *> reductionSyms;
ReductionProcessor rp;
- rp.addDeclareReduction(
+ rp.addDeclareReduction<omp::clause::Reduction>(
currentLocation, converter, clause, reductionVars, reduceVarByRef,
reductionDeclSymbols, outReductionSyms ? &reductionSyms : nullptr);
@@ -1085,6 +1085,75 @@ bool ClauseProcessor::processReduction(
});
}
+bool ClauseProcessor::processTaskReduction(
+ mlir::Location currentLocation, mlir::omp::TaskReductionClauseOps &result,
+ llvm::SmallVectorImpl<mlir::Type> *outReductionTypes,
+ llvm::SmallVectorImpl<const semantics::Symbol *> *outReductionSyms) const {
+ return findRepeatableClause<omp::clause::TaskReduction>(
+ [&](const omp::clause::TaskReduction &clause, const parser::CharBlock &) {
+ llvm::SmallVector<mlir::Value> taskReductionVars;
+ llvm::SmallVector<bool> taskReductionByref;
+ llvm::SmallVector<mlir::Attribute> taskReductionDeclSymbols;
+ llvm::SmallVector<const semantics::Symbol *> taskReductionSyms;
+ ReductionProcessor rp;
+ rp.addDeclareReduction<omp::clause::TaskReduction>(
+ currentLocation, converter, clause, taskReductionVars, taskReductionByref,
+ taskReductionDeclSymbols, outReductionSyms ? &taskReductionSyms : nullptr);
+
+ // Copy local lists into the output.
+ llvm::copy(taskReductionVars, std::back_inserter(result.taskReductionVars));
+ llvm::copy(taskReductionByref, std::back_inserter(result.taskReductionByref));
+ llvm::copy(taskReductionDeclSymbols,
+ std::back_inserter(result.taskReductionSyms));
+
+ if (outReductionTypes) {
+ outReductionTypes->reserve(outReductionTypes->size() +
+ taskReductionVars.size());
+ llvm::transform(taskReductionVars, std::back_inserter(*outReductionTypes),
+ [](mlir::Value v) { return v.getType(); });
+ }
+
+ if (outReductionSyms)
+ llvm::copy(taskReductionSyms, std::back_inserter(*outReductionSyms));
+ });
+}
+
+bool ClauseProcessor::processInReduction(
+ mlir::Location currentLocation, mlir::omp::InReductionClauseOps &result,
+ llvm::SmallVectorImpl<mlir::Type> *outReductionTypes,
+ llvm::SmallVectorImpl<const semantics::Symbol *> *outReductionSyms) const {
+ return findRepeatableClause<omp::clause::InReduction>(
+ [&](const omp::clause::InReduction &clause,
+ const parser::CharBlock &source) {
+ llvm::SmallVector<mlir::Value> inReductionVars;
+ llvm::SmallVector<bool> inReductionByref;
+ llvm::SmallVector<mlir::Attribute> inReductionDeclSymbols;
+ llvm::SmallVector<const semantics::Symbol *> inReductionSyms;
+ ReductionProcessor rp;
+ rp.addDeclareReduction<omp::clause::InReduction>(
+ currentLocation, converter, clause, inReductionVars,
+ inReductionByref, inReductionDeclSymbols,
+ outReductionSyms ? &inReductionSyms : nullptr);
+
+ // Copy local lists into the output.
+ llvm::copy(inReductionVars, std::back_inserter(result.inReductionVars));
+ llvm::copy(inReductionByref, std::back_inserter(result.inReductionByref));
+ llvm::copy(inReductionDeclSymbols,
+ std::back_inserter(result.inReductionSyms));
+
+ if (outReductionTypes) {
+ outReductionTypes->reserve(outReductionTypes->size() +
+ inReductionVars.size());
+ llvm::transform(inReductionVars,
+ std::back_inserter(*outReductionTypes),
+ [](mlir::Value v) { return v.getType(); });
+ }
+
+ if (outReductionSyms)
+ llvm::copy(inReductionSyms, std::back_inserter(*outReductionSyms));
+ });
+}
+
bool ClauseProcessor::processTo(
llvm::SmallVectorImpl<DeclareTargetCapturePair> &result) const {
return findRepeatableClause<omp::clause::To>(
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.h b/flang/lib/Lower/OpenMP/ClauseProcessor.h
index 0c8e7bd47ab5a6..04416a927a1c37 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.h
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.h
@@ -129,6 +129,16 @@ class ClauseProcessor {
llvm::SmallVectorImpl<mlir::Type> *reductionTypes = nullptr,
llvm::SmallVectorImpl<const semantics::Symbol *> *reductionSyms =
nullptr) const;
+ bool processTaskReduction(
+ mlir::Location currentLocation, mlir::omp::TaskReductionClauseOps &result,
+ llvm::SmallVectorImpl<mlir::Type> *taskReductionTypes = nullptr,
+ llvm::SmallVectorImpl<const semantics::Symbol *> *taskReductionSyms =
+ nullptr) const;
+ bool processInReduction(
+ mlir::Location currentLocation, mlir::omp::InReductionClauseOps &result,
+ llvm::SmallVectorImpl<mlir::Type> *inReductionTypes = nullptr,
+ llvm::SmallVectorImpl<const semantics::Symbol *> *inReductionSyms =
+ nullptr) const;
bool processTo(llvm::SmallVectorImpl<DeclareTargetCapturePair> &result) const;
bool processUseDeviceAddr(
lower::StatementContext &stmtCtx,
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index 60c83586e468b6..850f32ff0bf030 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -1244,29 +1244,34 @@ static void genTaskClauses(lower::AbstractConverter &converter,
semantics::SemanticsContext &semaCtx,
lower::StatementContext &stmtCtx,
const List<Clause> &clauses, mlir::Location loc,
- mlir::omp::TaskOperands &clauseOps) {
+ mlir::omp::TaskOperands &clauseOps,
+ llvm::SmallVectorImpl<mlir::Type> &inReductionTypes,
+ llvm::SmallVectorImpl<const semantics::Symbol *> &inReductionSyms) {
ClauseProcessor cp(converter, semaCtx, clauses);
cp.processAllocate(clauseOps);
cp.processDepend(clauseOps);
cp.processFinal(stmtCtx, clauseOps);
cp.processIf(llvm::omp::Directive::OMPD_task, clauseOps);
+ cp.processInReduction(loc, clauseOps, &inReductionTypes,
+ &inReductionSyms);
cp.processMergeable(clauseOps);
cp.processPriority(stmtCtx, clauseOps);
cp.processUntied(clauseOps);
// TODO Support delayed privatization.
- cp.processTODO<clause::Affinity, clause::Detach, clause::InReduction>(
+ cp.processTODO<clause::Affinity, clause::Detach>(
loc, llvm::omp::Directive::OMPD_task);
}
static void genTaskgroupClauses(lower::AbstractConverter &converter,
- semantics::SemanticsContext &semaCtx,
- const List<Clause> &clauses, mlir::Location loc,
- mlir::omp::TaskgroupOperands &clauseOps) {
+ semantics::SemanticsContext &semaCtx,
+ const List<Clause> &clauses, mlir::Location loc,
+ mlir::omp::TaskgroupOperands &clauseOps,
+ llvm::SmallVectorImpl<mlir::Type> &taskReductionTypes,
+ llvm::SmallVectorImpl<const semantics::Symbol *> &taskReductionSyms) {
ClauseProcessor cp(converter, semaCtx, clauses);
cp.processAllocate(clauseOps);
- cp.processTODO<clause::TaskReduction>(loc,
- llvm::omp::Directive::OMPD_taskgroup);
+ cp.processTaskReduction(loc, clauseOps, &taskReductionTypes, &taskReductionSyms);
}
static void genTaskwaitClauses(lower::AbstractConverter &converter,
@@ -1866,13 +1871,26 @@ genTaskOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
ConstructQueue::const_iterator item) {
lower::StatementContext stmtCtx;
mlir::omp::TaskOperands clauseOps;
- genTaskClauses(converter, semaCtx, stmtCtx, item->clauses, loc, clauseOps);
+ llvm::SmallVector<mlir::Type> inReductionTypes;
+ llvm::SmallVector<const semantics::Symbol *> inreductionSyms;
+ genTaskClauses(converter, semaCtx, stmtCtx, item->clauses, loc, clauseOps,
+ inReductionTypes, inreductionSyms);
+
+ auto reductionCallback = [&](mlir::Operation *op) {
+ genReductionVars(op, converter, loc, inreductionSyms, inReductionTypes);
+ return inreductionSyms;
+ };
- return genOpWithBody<mlir::omp::TaskOp>(
+ auto taskOp = genOpWithBody<mlir::omp::TaskOp>(
OpWithBodyGenInfo(converter, symTable, semaCtx, loc, eval,
llvm::omp::Directive::OMPD_task)
- .setClauses(&item->clauses),
+ .setClauses(&item->clauses)
+ .setGenRegionEntryCb(reductionCallback),
queue, item, clauseOps);
+ // Add reduction variables as arguments
+ llvm::SmallVector<mlir::Location> blockArgLocs(inReductionTypes.size(), loc);
+ taskOp->getRegion(0).addArguments(inReductionTypes, blockArgLocs);
+ return taskOp;
}
static mlir::omp::TaskgroupOp
@@ -1882,13 +1900,21 @@ genTaskgroupOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
const ConstructQueue &queue,
ConstructQueue::const_iterator item) {
mlir::omp::TaskgroupOperands clauseOps;
- genTaskgroupClauses(converter, semaCtx, item->clauses, loc, clauseOps);
+ llvm::SmallVector<mlir::Type> taskReductionTypes;
+ llvm::SmallVector<const semantics::Symbol *> taskReductionSyms;
+ genTaskgroupClauses(converter, semaCtx, item->clauses, loc, clauseOps,
+ taskReductionTypes, taskReductionSyms);
- return genOpWithBody<mlir::omp::TaskgroupOp>(
+ auto taskgroupOp = genOpWithBody<mlir::omp::TaskgroupOp>(
OpWithBodyGenInfo(converter, symTable, semaCtx, loc, eval,
llvm::omp::Directive::OMPD_taskgroup)
.setClauses(&item->clauses),
queue, item, clauseOps);
+
+ // Add reduction variables as arguments
+ llvm::SmallVector<mlir::Location> blockArgLocs(taskReductionSyms.size(), loc);
+ taskgroupOp->getRegion(0).addArguments(taskReductionTypes, blockArgLocs);
+ return taskgroupOp;
}
static mlir::omp::TaskwaitOp
@@ -2764,7 +2790,9 @@ static void genOMP(lower::AbstractConverter &converter, lower::SymMap &symTable,
!std::holds_alternative<clause::ThreadLimit>(clause.u) &&
!std::holds_alternative<clause::Threads>(clause.u) &&
!std::holds_alternative<clause::UseDeviceAddr>(clause.u) &&
- !std::holds_alternative<clause::UseDevicePtr>(clause.u)) {
+ !std::holds_alternative<clause::UseDevicePtr>(clause.u) &&
+ !std::holds_alternative<clause::TaskReduction>(clause.u) &&
+ !std::holds_alternative<clause::InReduction>(clause.u)) {
TODO(clauseLocation, "OpenMP Block construct clause");
}
}
diff --git a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
index 9da15ba303a475..deb25b4fff3792 100644
--- a/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ReductionProcessor.cpp
@@ -24,6 +24,7 @@
#include "flang/Parser/tools.h"
#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
#include "llvm/Support/CommandLine.h"
+#include <type_traits>
static llvm::cl::opt<bool> forceByrefReduction(
"force-byref-reduction",
@@ -34,6 +35,38 @@ namespace Fortran {
namespace lower {
namespace omp {
+// Explicit template instantiations
+template void ReductionProcessor::addDeclareReduction<omp::clause::Reduction>(
+ mlir::Location currentLocation,
+ lower::AbstractConverter &converter,
+ const omp::clause::Reduction &reduction,
+ llvm::SmallVectorImpl<mlir::Value> &reductionVars,
+ llvm::SmallVectorImpl<bool> &reduceVarByRef,
+ llvm::SmallVectorImpl<mlir::Attribute> &reductionDeclSymbols,
+ llvm::SmallVectorImpl<const semantics::Symbol *> *reductionSymbols
+ );
+
+template void ReductionProcessor::addDeclareReduction<omp::clause::TaskReduction>(
+ mlir::Location currentLocation,
+ lower::AbstractConverter &converter,
+ const omp::clause::TaskReduction &reduction,
+ llvm::SmallVectorImpl<mlir::Value> &reductionVars,
+ llvm::SmallVectorImpl<bool> &reduceVarByRef,
+ llvm::SmallVectorImpl<mlir::Attribute> &reductionDeclSymbols,
+ llvm::SmallVectorImpl<const semantics::Symbol *> *reductionSymbols
+ );
+
+template void ReductionProcessor::addDeclareReduction<omp::clause::InReduction>(
+ mlir::Location currentLocation,
+ lower::AbstractConverter &converter,
+ const omp::clause::InReduction &reduction,
+ llvm::SmallVectorImpl<mlir::Value> &reductionVars,
+ llvm::SmallVectorImpl<bool> &reduceVarByRef,
+ llvm::SmallVectorImpl<mlir::Attribute> &reductionDeclSymbols,
+ llvm::SmallVectorImpl<const semantics::Symbol *> *reductionSymbols
+ );
+
+
ReductionProcessor::ReductionIdentifier ReductionProcessor::getReductionType(
const omp::clause::ProcedureDesignator &pd) {
auto redType = llvm::StringSwitch<std::optional<ReductionIdentifier>>(
@@ -716,24 +749,26 @@ static bool doReductionByRef(mlir::Value reductionVar) {
return false;
}
-void ReductionProcessor::addDeclareReduction(
- mlir::Location currentLocation, lower::AbstractConverter &converter,
- const omp::clause::Reduction &reduction,
+template <class T>
+void ReductionProcessor::addDeclareReduction(mlir::Location currentLocation,
+ lower::AbstractConverter &converter,
+ const T &reduction,
llvm::SmallVectorImpl<mlir::Value> &reductionVars,
llvm::SmallVectorImpl<bool> &reduceVarByRef,
llvm::SmallVectorImpl<mlir::Attribute> &reductionDeclSymbols,
llvm::SmallVectorImpl<const semantics::Symbol *> *reductionSymbols) {
fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder();
-
- if (std::get<std::optional<omp::clause::Reduction::ReductionModifier>>(
- reduction.t))
- TODO(currentLocation, "Reduction modifiers are not supported");
+ if constexpr (std::is_same<T, omp::clause::Reduction>::value) {
+ if (std::get<std::optional<typename T::ReductionModifier>>(
+ reduction.t))
+ TODO(currentLocation, "Reduction modifiers are not supported");
+ }
mlir::omp::DeclareReductionOp decl;
- const auto &redOperatorList{
- std::get<omp::clause::Reduction::ReductionIdentifiers>(reduction.t)};
- assert(redOperatorList.size() == 1 && "Expecting single operator");
- const auto &redOperator = redOperatorList.front();
+ const auto &redOperatorList{
+ std::get<typename T::ReductionIdentifiers>(reduction.t)};
+ assert(redOperatorList.size() == 1 && "Expecting single operator");
+ const auto &redOperator = redOperatorList.front();
const auto &objectList{std::get<omp::ObjectList>(reduction.t)};
if (!std::holds_alternative<omp::clause::DefinedOperator>(redOperator.u)) {
diff --git a/flang/lib/Lower/OpenMP/ReductionProcessor.h b/flang/lib/Lower/OpenMP/ReductionProcessor.h
index 0ed5782e5da1b7..d34db0618c7cda 100644
--- a/flang/lib/Lower/OpenMP/ReductionProcessor.h
+++ b/flang/lib/Lower/OpenMP/ReductionProcessor.h
@@ -120,9 +120,10 @@ class ReductionProcessor {
/// Creates a reduction declaration and associates it with an OpenMP block
/// directive.
+ template <class T>
static void addDeclareReduction(
mlir::Location currentLocation, lower::AbstractConverter &converter,
- const omp::clause::Reduction &reduction,
+ const T &reduction,
llvm::SmallVectorImpl<mlir::Value> &reductionVars,
llvm::SmallVectorImpl<bool> &reduceVarByRef,
llvm::SmallVectorImpl<mlir::Attribute> &reductionDeclSymbols,
diff --git a/flang/test/Lower/OpenMP/task_array_reduction.f90 b/flang/test/Lower/OpenMP/task_array_reduction.f90
new file mode 100644
index 00000000000000..74693343744c26
--- /dev/null
+++ b/flang/test/Lower/OpenMP/task_array_reduction.f90
@@ -0,0 +1,50 @@
+! RUN: bbc -emit-hlfir -fopenmp -fopenmp-version=50 -o - %s 2>&1 | FileCheck %s
+! RUN: %flang_fc1 -emit-hlfir -fopenmp -fopenmp-version=50 -o - %s 2>&1 | FileCheck %s
+
+! CHECK-LABEL: omp.declare_reduction @add_reduction_byref_box_Uxf32 : !fir.ref<!fir.box<!fir.array<?xf32>>> alloc {
+! [...]
+! CHECK: omp.yield
+! CHECK-LABEL: } init {
+! [...]
+! CHECK: omp.yield
+! CHECK-LABEL: } combiner {
+! [...]
+! CHECK: omp.yield
+! CHECK-LABEL: } cleanup {
+! [...]
+! CHECK: omp.yield
+! CHECK: }
+
+! CHECK-LABEL: func.func @_QPtaskreduction
+! CHECK-SAME: (%[[VAL_0:.*]]: !fir.box<!fir.array<?xf32>> {fir.bindc_name = "x"}) {
+! CHECK: %[[VAL_1:.*]] = fir.dummy_scope : !fir.dscope
+! CHECK: %[[VAL_2:.*]]:2 = hlfir.declare %[[VAL_0]] dummy_scope %[[VAL_1]]
+! CHECK-SAME: {uniq_name = "_QFtaskreductionEx"} : (!fir.box<!fir.array<?xf32>>, !fir.dscope) -> (!fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>>)
+! CHECK: omp.parallel {
+! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.array<?xf32>>
+! CHECK: fir.store %[[VAL_2]]#1 to %[[VAL_3]] : !fir.ref<!fir.box<!fir.array<?xf32>>>
+! CHECK: omp.taskgroup task_reduction(byref @add_reduction_byref_box_Uxf32 %[[VAL_3]] -> %[[VAL_4:.*]]: !fir.ref<!fir.box<!fir.array<?xf32>>>) {
+! CHECK: %[[VAL_5:.*]] = fir.alloca !fir.box<!fir.array<?xf32>>
+! CHECK: fir.store %[[VAL_2]]#1 to %[[VAL_5]] : !fir.ref<!fir.box<!fir.array<?xf32>>>
+! CHECK: omp.task in_reduction(byref @add_reduction_byref_box_Uxf32 %[[VAL_5]] -> %[[VAL_6:.*]] : !fir.ref<!fir.box<!fir.array<?xf32>>>) {
+! [...]
+! CHECK: omp.terminator
+! CHECK: }
+! CHECK: omp.terminator
+! CHECK: }
+! CHECK: omp.terminator
+! CHECK: }
+! CHECK: return
+! CHECK: }
+
+subroutine taskReduction(x)
+ real, dimension(:) :: x
+ !$omp parallel
+ !$omp taskgroup task_reduction(+:x)
+ !$omp task in_reduction(+:x)
+ x = x + 1
+ !$omp end task
+ !$omp end taskgroup
+ !$omp end parallel
+end subroutine
+
diff --git a/flang/test/Lower/OpenMP/task_in_reduction.f90 b/flang/test/Lower/OpenMP/task_in_reduction.f90
new file mode 100644
index 00000000000000..26c079d5ac8aa5
--- /dev/null
+++ b/flang/test/Lower/OpenMP/task_in_reduction.f90
@@ -0,0 +1,48 @@
+! RUN: bbc -emit-hlfir -fopenmp -fopenmp-version=50 -o - %s 2>&1 | FileCheck %s
+! RUN: %flang_fc1 -emit-hlfir -fopenmp -fopenmp-version=50 -o - %s 2>&1 | FileCheck %s
+
+!CHECK-LABEL: omp.declare_reduction
+!CHECK-SAME: @[[RED_I32_NAME:.*]] : i32 init {
+!CHECK: ^bb0(%{{.*}}: i32):
+!CHECK: %[[C0_1:.*]] = arith.constant 0 : i32
+!CHECK: omp.yield(%[[C0_1]] : i32)
+!CHECK: } combiner {
+!CHECK: ^bb0(%[[ARG0:.*]]: i32, %[[ARG1:.*]]: i32):
+!CHECK: %[[RES:.*]] = arith.addi %[[ARG0]], %[[ARG1]] : i32
+!CHECK: omp.yield(%[[RES]] : i32)
+!CHECK: }
+
+!CHECK-LABEL: func.func @_QPin_reduction() {
+!CHECK: %[[VAL_0:.*]] = fir.alloca i32 {bindc_name = "x", uniq_name = "_QFin_reductionEx"}
+!CHECK: %[[VAL_1:.*]]:2 = hlfir.declare %[[VAL_0]] {uniq_name = "_QFin_reductionEx"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK: %[[VAL_2:.*]] = arith.constant 0 : i32
+!CHECK: hlfir.assign %[[VAL_2]] to %[[VAL_1]]#0 : i32, !fir.ref<i32>
+!CHECK: omp.parallel {
+!CHECK: omp.taskgroup task_reduction(@[[RED_I32_NAME]] %[[VAL_1]]#0 -> %[[VAL_3:.*]] : !fir.ref<i32>) {
+!CHECK: omp.task in_reduction(@[[RED_I32_NAME]] %[[VAL_1]]#0 -> %[[VAL_4:.*]] : !fir.ref<i32>) {
+!CHECK: %[[VAL_5:.*]]:2 = hlfir.declare %[[VAL_4]] {uniq_name = "_QFin_reductionEx"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+!CHECK: %[[VAL_6:.*]] = fir.load %[[VAL_5]]#0 : !fir.ref<i32>
+!CHECK: %[[VAL_7:.*]] = arith.constant 1 : i32
+!CHECK: %[[VAL_8:.*]] = arith.addi %[[VAL_6]], %[[VAL_7]] : i32
+!CHECK: hlfir.assign %[[VAL_8]] to %[[VAL_5]]#0 : i32, !fir.ref<i32>
+!CHECK: omp.terminator
+!CHECK: }
+!CHECK: omp.terminator
+!CHECK: }
+!CHECK: omp.terminator
+!CHECK: }
+!CHECK: return
+!CHECK: }
+
+subroutine in_reduction
+ integer :: x
+ x = 0
+ !$omp parallel
+ !$omp taskgroup task_reduction(+:x)
+ !$omp task in_reduction(+:x)
+ x = x + 1
+ !$omp end task
+ !$omp end taskgroup
+ !$omp end parallel
+end subroutine
+
diff --git...
[truncated]
|
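The ReductionProcessor hunks in the diff above turn addDeclareReduction into a template over the clause type, guard the modifier check with `if constexpr`, and explicitly instantiate the template for the three clause types. The following is a minimal standalone sketch of that pattern only. The clause structs, the string-based output, and the boolean return value are hypothetical stand-ins chosen for illustration; they are not Flang's real types or the real signature.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for the three clause types (not Flang's classes).
// In this sketch only Reduction carries an optional modifier.
struct Reduction {
  std::optional<std::string> modifier;
  std::string op;
  std::vector<std::string> objects;
};
struct TaskReduction {
  std::string op;
  std::vector<std::string> objects;
};
struct InReduction {
  std::string op;
  std::vector<std::string> objects;
};

// One templated routine shared by all three clause kinds. The modifier
// check is compiled only when T is Reduction, so the other two types do
// not need a `modifier` member at all.
template <class T>
bool addDeclareReduction(const T &clause, std::vector<std::string> &outVars) {
  if constexpr (std::is_same_v<T, Reduction>) {
    if (clause.modifier)
      return false; // modifiers are treated as unsupported in this sketch
  }
  for (const std::string &obj : clause.objects)
    outVars.push_back(clause.op + ":" + obj);
  return true;
}

// Explicit instantiations: these let the template body live in a single
// source file while callers in other files only see the declaration.
template bool addDeclareReduction<Reduction>(const Reduction &,
                                             std::vector<std::string> &);
template bool addDeclareReduction<TaskReduction>(const TaskReduction &,
                                                 std::vector<std::string> &);
template bool addDeclareReduction<InReduction>(const InReduction &,
                                               std::vector<std::string> &);
```

Calling `addDeclareReduction(TaskReduction{"+", {"x"}}, vars)` appends one entry to `vars`, while a `Reduction` with a modifier set is rejected before anything is appended.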
✅ With the latest revision this PR passed the C/C++ code formatter.
Please add the translation from OpenMP dialect to LLVMIR before lowering to MLIR. Otherwise this will manifest as a crash.
Sure, I will do it.
Force-pushed from 8aeb419 to 711fe33.
Thank you @kaviya2510 for this work. The approach seems fine to me, just adding some comments to hopefully help you rebase this patch whenever it's time.
Force-pushed from b401393 to e31a990.
Thanks for the review @skatrak. I rebased and also addressed your comments.
Thank you Kaviya, I only have a few small comments.
@@ -104,6 +104,9 @@ class ClauseProcessor {
bool processIsDevicePtr(
mlir::omp::IsDevicePtrClauseOps &result,
llvm::SmallVectorImpl<const semantics::Symbol *> &isDeviceSyms) const;
+ bool processInReduction(
+ mlir::Location currentLocation, mlir::omp::InReductionClauseOps &result,
+ llvm::SmallVectorImpl<const semantics::Symbol *> &InReductionSyms) const;
Nit: I think it's better to specify that it's intended as an output rather than restating the full clause name. Feel free to disagree, but in that case please fix capitalization.
- llvm::SmallVectorImpl<const semantics::Symbol *> &InReductionSyms) const;
+ llvm::SmallVectorImpl<const semantics::Symbol *> &outReductionSyms) const;
Thanks for the reviews @skatrak.
Yeah, I understand. I will modify it.
Looking at it again I changed my mind on this 😅, so feel free to use whichever way you prefer, as long as the first letter is lowercase.
Force-pushed from e31a990 to 00f3f5d.
Force-pushed from cef327b to 72d2230.
Force-pushed from 72d2230 to 1b5a47e.
Sorry for the delay getting back to this, I hope I haven't blocked you too much. I think there is one minor problem at the moment, which will probably make you have to update some of the tests, but other than that I just have some small nits.
converter.collectSymbolSet(eval, allSymbols, flag,
- /*collectSymbols=*/true,
+ /*collectSymbols=*/collectSymbols,
Nit: The comment is redundant when passing a variable instead of a constant.
- /*collectSymbols=*/collectSymbols,
+ collectSymbols,
flang/lib/Lower/OpenMP/OpenMP.cpp (outdated)
+ OpWithBodyGenInfo genInfo =
OpWithBodyGenInfo(converter, symTable, semaCtx, loc, eval,
llvm::omp::Directive::OMPD_task)
.setClauses(&item->clauses)
.setDataSharingProcessor(&dsp)
- .setGenRegionEntryCb(genRegionEntryCB),
- queue, item, clauseOps);
+ .setGenRegionEntryCb(genRegionEntryCB);
+
+ auto taskOp =
+ genOpWithBody<mlir::omp::TaskOp>(genInfo, queue, item, clauseOps);
+ return taskOp;
Nit: It doesn't look like this change is necessary. Since the general convention in this file is to return the result of genOpWithBody directly and construct the OpWithBodyGenInfo structure parameter inside of the call whenever possible, I think this should be left as it was.
flang/lib/Lower/OpenMP/OpenMP.cpp (outdated)
+ OpWithBodyGenInfo genInfo =
OpWithBodyGenInfo(converter, symTable, semaCtx, loc, eval,
llvm::omp::Directive::OMPD_taskgroup)
- .setClauses(&item->clauses),
- queue, item, clauseOps);
+ .setClauses(&item->clauses)
+ .setGenRegionEntryCb(genRegionEntryCB);
+
+ auto taskgroupOp =
+ genOpWithBody<mlir::omp::TaskgroupOp>(genInfo, queue, item, clauseOps);
+ return taskgroupOp;
Nit: Same comment as above. For consistency, return the result of genOpWithBody directly and construct the genInfo argument within the argument list.
flang/lib/Lower/OpenMP/OpenMP.cpp (outdated)
genEntryBlock(converter.getFirOpBuilder(), taskgroupArgs, op->getRegion(0));
return llvm::to_vector(taskgroupArgs.getSyms());
This is missing the binding of the symbols to the new entry block arguments:
- genEntryBlock(converter.getFirOpBuilder(), taskgroupArgs, op->getRegion(0));
- return llvm::to_vector(taskgroupArgs.getSyms());
+ genEntryBlock(converter.getFirOpBuilder(), taskgroupArgs, op->getRegion(0));
+ bindEntryBlockArgs(converter,
+ llvm::cast<mlir::omp::BlockArgOpenMPOpInterface>(op),
+ taskgroupArgs);
+ return llvm::to_vector(taskgroupArgs.getSyms());
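To illustrate why the binding step matters, here is a small standalone model, not Flang code. It assumes a symbol map from Fortran variable names to values and a region whose entry block has one argument per reduction symbol. The names `Value`, `Region`, `SymbolMap`, and the two functions below are hypothetical simplifications that only mirror the names used in the review comment. Without the binding step, lookups inside the region still resolve to the outer value, which is the symptom the later review comments point out in the tests.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model: a value is identified by an integer id, and a region
// owns the arguments of its entry block.
struct Value {
  int id;
};
struct Region {
  std::vector<Value> entryArgs;
};
using SymbolMap = std::unordered_map<std::string, Value>;

// Step 1: create one entry block argument per reduction symbol.
void genEntryBlock(Region &region, const std::vector<std::string> &syms,
                   int &nextId) {
  for (std::size_t i = 0; i < syms.size(); ++i)
    region.entryArgs.push_back(Value{nextId++});
}

// Step 2: rebind each symbol to its block argument, so code generated for
// the region body resolves the symbol to the argument and not to the value
// defined outside the region.
void bindEntryBlockArgs(SymbolMap &symMap, const Region &region,
                        const std::vector<std::string> &syms) {
  assert(region.entryArgs.size() == syms.size());
  for (std::size_t i = 0; i < syms.size(); ++i)
    symMap.insert_or_assign(syms[i], region.entryArgs[i]);
}
```

In this model, skipping `bindEntryBlockArgs` leaves the map entry for a symbol pointing at the outer value even though the region has a matching argument.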
! CHECK: %[[VAL_3:.*]] = fir.alloca !fir.box<!fir.array<?xf32>>
! CHECK: fir.store %[[VAL_2]]#1 to %[[VAL_3]] : !fir.ref<!fir.box<!fir.array<?xf32>>>
! CHECK: omp.taskgroup task_reduction(byref @add_reduction_byref_box_Uxf32 %[[VAL_3]] -> %[[VAL_4:.*]]: !fir.ref<!fir.box<!fir.array<?xf32>>>) {
! CHECK: %[[VAL_5:.*]] = fir.alloca !fir.box<!fir.array<?xf32>>
This doesn't look right to me, but I might be wrong. Shouldn't we be passing the result of an hlfir.declare of VAL_4 to omp.task, rather than creating an alloca for it? Perhaps this is related to the missing binding of Fortran symbols to entry block arguments during the creation of the region for omp.taskgroup.
!CHECK: %[[VAL_0:.*]] = fir.alloca i32 {bindc_name = "res", uniq_name = "_QFomp_taskgroup_task_reductionEres"}
!CHECK: %[[VAL_1:.*]]:2 = hlfir.declare %[[VAL_0]] {uniq_name = "_QFomp_taskgroup_task_reductionEres"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
!CHECK: omp.taskgroup task_reduction(@[[RED_I32_NAME]] %[[VAL_1]]#0 -> %[[VAL_2:.*]] : !fir.ref<i32>) {
!CHECK: %[[VAL_3:.*]] = fir.load %[[VAL_1]]#0 : !fir.ref<i32>
The omp.taskgroup region shouldn't have references to VAL_1. References to that variable should be done through VAL_2.
!CHECK-LABEL: func.func @_QPin_reduction() {
! [...]
!CHECK: omp.taskgroup task_reduction(@[[RED_I32_NAME]] %[[VAL_1:.*]]#0 -> %[[VAL_3:.*]] : !fir.ref<i32>) {
!CHECK: omp.task in_reduction(@[[RED_I32_NAME]] %[[VAL_1]]#0 -> %[[VAL_4:.*]] : !fir.ref<i32>) {
Same problem here: VAL_1 shouldn't be used inside of the omp.taskgroup region.
! CHECK: return
! CHECK: }

subroutine taskReduction(x)
Nit: For consistency with other tests.
- subroutine taskReduction(x)
+ subroutine task_reduction(x)
@skatrak It's fine. This PR is dependent on the translation patches for in_reduction and task_reduction being merged first, so no problem.
…uments in taskgroup construct and fixed testcases
@skatrak Apologies for the delayed response. I thought that this patch depended on the LLVM translation patch for task reduction and in-reduction and that it might crash without the translation part, so I decided to keep it on hold and wait for the translation patch to merge first. Later, I learned that you had added a patch which throws an error for unhandled clauses during LLVM translation. Realizing that this patch does not have that dependency, I resumed working on it.
I have addressed all your review comments in the recent patch. Could you please take a look at it and let me know if you have any comments?
Thanks for your work on this. Looks pretty good overall. Just some minor points.
…th setEntryBlockArgs()
Thanks for the review @tblah.
LGTM, thanks!
Hi @skatrak, could you please review the patch and provide your approval?
Thank you for the approval @tblah.
Thank you Kaviya, LGTM! I have a minimal nit, but no need for another review by me before merging.
According to the OpenMP specification, the rules for variables with implicitly determined data-sharing attributes are:
subroutine omp_task_in_reduction()
integer i
i = 0
!$omp task in_reduction(+:i)
i = i + 1
!$omp end task
end subroutine omp_task_in_reduction
In the above example, without any explicit data-sharing attribute, the variable [...]
With the current flow, the compiler incorrectly marks the data-sharing attribute of [...]
omp.private {type = firstprivate} @_QFomp_task_in_reductionEi_firstprivate_ref_i32 : !fir.ref<i32> alloc {
^bb0(%arg0: !fir.ref<i32>):
%0 = fir.alloca i32 {bindc_name = "i", pinned, uniq_name = "_QFomp_task_in_reductionEi"}
%1:2 = hlfir.declare %0 {uniq_name = "_QFomp_task_in_reductionEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
omp.yield(%1#0 : !fir.ref<i32>)
} copy {
^bb0(%arg0: !fir.ref<i32>, %arg1: !fir.ref<i32>):
%0 = fir.load %arg0 : !fir.ref<i32>
hlfir.assign %0 to %arg1 : i32, !fir.ref<i32>
omp.yield(%arg1 : !fir.ref<i32>)
}
I feel that my earlier fix in DataSharingProcessor.cpp is not correct. Instead of skipping symbol collection in DataSharingProcessor.cpp, I added a new change which detects the DSA of the variable [...]
@tblah, could you please take a look at it and review my changes?
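The data-sharing problem described in this comment can be modelled with a short standalone sketch. It is a deliberately simplified reading of the OpenMP rules for a task construct: a variable listed in an in_reduction clause has an explicitly determined attribute, and only the remaining variables fall back to the implicit rule (shared if shared in the enclosing context, firstprivate otherwise). The enum, the function name, and its parameters are hypothetical. The sketch ignores the default clause and every other explicit clause, and it is not the change made in this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum class DataSharing { Shared, FirstPrivate, Reduction };

// Hypothetical helper: decide the data-sharing attribute of `var` on a task
// construct, given the list items of its in_reduction clauses and whether
// the variable is shared in the enclosing context.
DataSharing determineTaskDsa(const std::string &var,
                             const std::vector<std::string> &inReductionVars,
                             bool sharedInEnclosingContext) {
  // An explicit in_reduction clause takes precedence over implicit rules,
  // so the variable must not be treated as firstprivate.
  if (std::find(inReductionVars.begin(), inReductionVars.end(), var) !=
      inReductionVars.end())
    return DataSharing::Reduction;
  // Implicit rule for tasks in this simplified model.
  return sharedInEnclosingContext ? DataSharing::Shared
                                  : DataSharing::FirstPrivate;
}
```

Under this model, the variable `i` from the example above is classified as a reduction variable, whereas a variable that appears in no clause and is not shared in the enclosing context would be firstprivate.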
The new approach sounds good. Sorry I didn't catch this on the last round of review.
Thanks for the review.
Thanks for the review @skatrak.
@kaviya2510 Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
…o MLIR (llvm#111155)
This patch:
- Added support for lowering of task_reduction to MLIR
- Added support for lowering of in_reduction to MLIR
- Fixed incorrect DSA handling for variables in the presence of 'in_reduction' clause.
Added support for lowering of task_reduction and in_reduction to MLIR.
Fixed the issue below, which was observed while lowering in_reduction to MLIR.
The testcase below does not generate the expected MLIR.
It adds some omp.private-related information (mentioned below) to the MLIR lowering; this issue is not observed when in_reduction is enclosed inside a parallel construct. This issue has also been resolved as part of this PR.