Hetero subgraph with dispatching #43
Conversation
Codecov Report

@@           Coverage Diff           @@
##           master      #43   +/-   ##
==========================================
- Coverage   97.27%   96.51%   -0.76%
==========================================
  Files          10       12       +2
  Lines         220      287      +67
==========================================
+ Hits          214      277      +63
- Misses          6       10       +4

for more information, see https://pre-commit.ci
void fill(const scalar_t* nodes_data, const scalar_t size) {
  if (use_vec) {
    for (scalar_t i = 0; i < size; ++i)
    for (scalar_t i = 0; i < size; ++i) {
Let me post my question here. From the documents I have read, my understanding is that scalar_t covers float, double, int32, and int64 at compile time. But in many of our use cases we are iterating over integers. How does PyTorch avoid compiling the float instantiations of these functions? Is there a better way to be more specific about the data types here?
There are some helper functions like is_integral for a dtype, but IMO that is mostly runtime checking. We could also use some STL type traits for compile-time checks.
The AT_DISPATCH_INTEGRAL_TYPES call handles which types scalar_t can take (during compile time).
});

return std::make_tuple(out_rowptr, out_col, out_edge_id);
return subgraph_bipartite(rowptr, col, nodes, nodes, return_edge_id);
The code structure looks a little weird to me, because csrc/sampler/cpu/subgraph_kernel exists to register TORCH_LIBRARY_IMPL, yet it uses a general implementation in csrc/sampler/subgraph.cpp. How about reorganizing the code like this:

csrc
- ops
# all ops exposed to PyTorch.
- sampler
# all general graph operations.
- sampler

We don't need to refactor the code structure now, but I want to hear your opinion.
nvm, it seems subgraph.cpp also defines a library. Why not merge them together, since sampler/subgraph.cpp also runs on CPU only?
We could follow the style in other PyG repos: put CPU/GPU-specific implementations in separate folders and provide a common interface in a higher-level directory.
auto res = subgraph_with_mapper<scalar_t>(rowptr, col, src_nodes,
                                          mapper, return_edge_id);
out_rowptr = std::get<0>(res);
Or maybe we could do std::tie(out_rowptr, out_col, out_edge_id) = res?
+1
pyg_lib/csrc/sampler/subgraph.cpp
Outdated
for (const auto& kv : rowptr) {
  const auto& edge_type = kv.key();
  bool pass = filter_args_by_edge(edge_type, src_nodes_arg, dst_nodes_arg,
I'd still prefer

pass = src_nodes_args.filter_by_edge(edge_type) && dst_nodes_args.filter_by_edge(edge_type) && edge_id_arg.filter_by_edge(edge_type)

or, from an efficiency point of view:

auto dst = get_dst(edge_type);
auto src = get_src(edge_type);
bool pass = return_edge_id.counts(edge_type) > 0 && src_nodes.counts(src) > 0 && dst_nodes.counts(dst) > 0;
pyg_lib/csrc/sampler/subgraph.cpp
Outdated
const auto& r = rowptr.at(edge_type);
const auto& c = col.at(edge_type);
res.insert(edge_type,
           subgraph_bipartite(r, c, std::get<0>(vals), std::get<1>(vals),
and here it would just be
subgraph_bipartite(r, c, src_nodes.at(src), dst_nodes.at(dst), return_edge_id.at(edge_type));
pyg_lib/csrc/sampler/subgraph.cpp
Outdated
return op.call(rowptr, col, nodes, return_edge_id);
}

c10::Dict<utils::edge_t,
I actually would have expected that we return a tuple of dictionaries, similar to how the input looks.
offset++;
}
AT_DISPATCH_INTEGRAL_TYPES(
    nodes.scalar_type(), "subgraph_kernel_with_mapper", [&] {
Can we make this a one-liner again?
TORCH_LIBRARY_IMPL(pyg, CPU, m) {
  m.impl(TORCH_SELECTIVE_NAME("pyg::subgraph"), TORCH_FN(subgraph_kernel));
  m.impl(TORCH_SELECTIVE_NAME("pyg::subgraph_bipartite"),
Any reason we want to expose that? Looks more like an internal function to me.
If the user wants to build a subgraph of a bipartite graph, they can use it.
}

c10::Dict<utils::EdgeType,
          std::tuple<at::Tensor, at::Tensor, c10::optional<at::Tensor>>>
IMO, the output should be a tuple of dictionaries (similar to the input).
if (pass) {
  const auto& r = rowptr.at(edge_type);
  const auto& c = col.at(edge_type);
  res.insert(edge_type, subgraph_bipartite(
Shouldn't we use the mapper here? Otherwise, we will re-map across every edge type.
Yes, it has a cost, but the mapper is more read-intensive. I will add a TODO here.
inline NodeType get_dst(const EdgeType& e) {
  return e.substr(e.find_last_of(SPLIT_TOKEN) + 1);
}
We could also add a function that maps tuples to strings and vice versa.
Good idea.
std::tuple<at::Tensor, at::Tensor, c10::optional<at::Tensor>>>
hetero_subgraph(const utils::EdgeTensorDict& rowptr,
                const utils::EdgeTensorDict& col,
                const utils::NodeTensorDict& src_nodes,
Not sure why we have both src_nodes and dst_nodes. IMO, these can be safely merged as in https://pytorch-geometric.readthedocs.io/en/latest/modules/data.html#torch_geometric.data.HeteroData.subgraph.
Separating src and dst just gives some flexibility. We could also have the merged API, though.
Co-authored-by: Matthias Fey <[email protected]>