Conversation

@ZenoTan (Member) commented May 4, 2022

No description provided.

codecov-commenter commented May 5, 2022

Codecov Report

❌ Patch coverage is 93.50649% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.51%. Comparing base (7b599eb) to head (663a675).
⚠️ Report is 403 commits behind head on master.

Files with missing lines               Patch %   Lines
pyg_lib/csrc/utils/hetero_dispatch.h   77.77%    4 Missing ⚠️
pyg_lib/csrc/sampler/cpu/mapper.h      75.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #43      +/-   ##
==========================================
- Coverage   97.27%   96.51%   -0.76%     
==========================================
  Files          10       12       +2     
  Lines         220      287      +67     
==========================================
+ Hits          214      277      +63     
- Misses          6       10       +4     

☔ View full report in Codecov by Sentry.

@ZenoTan changed the title from "[WIP] Hetero subgraph API" to "Hetero subgraph API" May 5, 2022
@ZenoTan changed the title from "Hetero subgraph API" to "Hetero subgraph with dispatching" May 5, 2022
@ZenoTan self-assigned this May 6, 2022
@ZenoTan requested review from rusty1s and yaoyaowd and removed request for rusty1s and yaoyaowd May 6, 2022 12:05
yaoyaowd previously approved these changes May 6, 2022
@yaoyaowd dismissed their stale review May 6, 2022 17:30

Was checking out the wrong commit this morning.

@ZenoTan requested a review from yaoyaowd May 12, 2022 12:25
void fill(const scalar_t* nodes_data, const scalar_t size) {
if (use_vec) {
for (scalar_t i = 0; i < size; ++i)
for (scalar_t i = 0; i < size; ++i) {
Contributor:

Let me post my question here. I read some documentation, and based on my understanding scalar_t covers float, double, int32, and int64 at compile time. But in a lot of our use cases we are iterating over integers. How does PyTorch avoid compiling the float instantiations of these functions? Is there a better way to be more specific about the data types here?

Member Author:

There are helper functions like is_integral for a dtype, but IMO that is mostly runtime checking. We could also use STL type traits for compile-time checks.
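
For illustration, a compile-time check could look like the following sketch (the function is hypothetical, not code from this PR):

  #include <type_traits>

  template <typename scalar_t>
  void fill_checked(const scalar_t* nodes_data, scalar_t size) {
    // Reject non-integer instantiations at compile time instead of at runtime.
    static_assert(std::is_integral<scalar_t>::value,
                  "fill_checked() only supports integral scalar types");
    for (scalar_t i = 0; i < size; ++i) {
      // ... integer-only work on nodes_data[i] ...
    }
  }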

Member:

The AT_DISPATCH_INTEGRAL_TYPES call handles which types scalar_t can take (during compile time).
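
For reference, AT_DISPATCH_INTEGRAL_TYPES instantiates the lambda only for the integral dtypes (uint8, int8, int16, int32, int64), so the float/double versions are never compiled. A minimal sketch (the tensor name and lambda body are illustrative):

  // The macro switches on nodes.scalar_type() at runtime, binds scalar_t to the
  // matching integer type at compile time, and throws for non-integral dtypes.
  AT_DISPATCH_INTEGRAL_TYPES(nodes.scalar_type(), "fill_example", [&] {
    const auto* nodes_data = nodes.data_ptr<scalar_t>();
    for (int64_t i = 0; i < nodes.numel(); ++i) {
      // ... integer-only work on nodes_data[i] ...
    }
  });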

});

return std::make_tuple(out_rowptr, out_col, out_edge_id);
return subgraph_bipartite(rowptr, col, nodes, nodes, return_edge_id);
Contributor:

The code structure looks a little weird to me, because csrc/sampler/cpu/subgraph_kernel exists to register TORCH_LIBRARY_IMPL but it uses a general implementation in csrc/sampler/subgraph.cpp. How about reorganizing the code like this:

csrc
  - ops
    # all ops exposed to PyTorch
    - sampler
  # all general graph operations
  - sampler

We don't need to refactor the code structure now, but I'd like to hear your opinion.

Contributor:

Never mind, it seems subgraph.cpp also defines the library. Why not merge them together, since sampler/subgraph.cpp also runs on CPU only?

Member Author:

We could follow the style in other PyG repos: put CPU/GPU-specific implementations in separate folders and provide a common interface in a higher-level directory.
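
A rough sketch of that layout (file names are illustrative only):

csrc
  - sampler
    subgraph.cpp          # op definition + common interface
    - cpu
      subgraph_kernel.cpp # CPU-specific TORCH_LIBRARY_IMPL registration
    - cuda
      subgraph_kernel.cu  # GPU-specific implementation, once it exists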

@ZenoTan requested a review from yaoyaowd May 13, 2022 22:17

auto res = subgraph_with_mapper<scalar_t>(rowptr, col, src_nodes,
                                          mapper, return_edge_id);
out_rowptr = std::get<0>(res);
Contributor:

Or maybe we could do std::tie(out_rowptr, out_col, out_edge_id) = res?

Member:

+1
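
A minimal sketch of the suggested form, reusing the names from the snippet above (illustrative only):

  // Unpack the tuple returned by subgraph_with_mapper() in a single statement
  // instead of three separate std::get<>() calls.
  std::tie(out_rowptr, out_col, out_edge_id) = subgraph_with_mapper<scalar_t>(
      rowptr, col, src_nodes, mapper, return_edge_id);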


for (const auto& kv : rowptr) {
  const auto& edge_type = kv.key();
  bool pass = filter_args_by_edge(edge_type, src_nodes_arg, dst_nodes_arg,
Contributor:

I'd still prefer

  pass = src_nodes_args.filter_by_edge(edge_type) &&
         dst_nodes_args.filter_by_edge(edge_type) &&
         edge_id_arg.filter_by_edge(edge_type);

or, from an efficiency point of view:

  auto dst = get_dst(edge_type);
  auto src = get_src(edge_type);
  bool pass = return_edge_id.counts(edge_type) > 0 &&
              src_nodes.counts(src) > 0 && dst_nodes.counts(dst) > 0;

const auto& r = rowptr.at(edge_type);
const auto& c = col.at(edge_type);
res.insert(edge_type,
           subgraph_bipartite(r, c, std::get<0>(vals), std::get<1>(vals),
Contributor:

and here it would just be

  subgraph_bipartite(r, c, src_nodes.at(src), dst_nodes.at(dst),
                     return_edge_id.at(edge_type));

return op.call(rowptr, col, nodes, return_edge_id);
}

c10::Dict<utils::edge_t,
Member:

I actually would have expected us to return a tuple of dictionaries, similar to how the input looks.

  offset++;
}
AT_DISPATCH_INTEGRAL_TYPES(
    nodes.scalar_type(), "subgraph_kernel_with_mapper", [&] {
Member:

Can we make this a one-liner again?


TORCH_LIBRARY_IMPL(pyg, CPU, m) {
  m.impl(TORCH_SELECTIVE_NAME("pyg::subgraph"), TORCH_FN(subgraph_kernel));
  m.impl(TORCH_SELECTIVE_NAME("pyg::subgraph_bipartite"),
Member:

Any reason we want to expose that? Looks more like an internal function to me.

Member Author:

If a user wants to build a subgraph of a bipartite graph, they can use it.

}

c10::Dict<utils::EdgeType,
std::tuple<at::Tensor, at::Tensor, c10::optional<at::Tensor>>>
Member:

IMO, the output should be a tuple of dictionaries (similar to the input).
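
To make the two shapes concrete, a hedged sketch (type aliases follow the snippets in this thread; the actual signature in the PR may differ, and how the optional edge IDs are carried in the second form is left open):

  // Current shape: one dictionary whose values are per-edge-type tuples.
  c10::Dict<utils::EdgeType,
            std::tuple<at::Tensor, at::Tensor, c10::optional<at::Tensor>>>
  hetero_subgraph(/* ... */);

  // Suggested shape: a tuple of dictionaries, mirroring the input layout.
  std::tuple<c10::Dict<utils::EdgeType, at::Tensor>,  // rowptr per edge type
             c10::Dict<utils::EdgeType, at::Tensor>,  // col per edge type
             c10::Dict<utils::EdgeType, at::Tensor>>  // edge_id per edge type
  hetero_subgraph(/* ... */);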

if (pass) {
  const auto& r = rowptr.at(edge_type);
  const auto& c = col.at(edge_type);
  res.insert(edge_type, subgraph_bipartite(
Member:

Shouldn't we use the mapper here? Otherwise, we will re-map across every edge type.

Member Author:

Yes, it has a cost, but the mapper is more read-intensive. I will add a TODO here.


inline NodeType get_dst(const EdgeType& e) {
  return e.substr(e.find_last_of(SPLIT_TOKEN) + 1);
}
Member (rusty1s), May 15, 2022:

We could also add a function that maps tuples to strings and vice versa.

Member Author:

Good idea.
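
A hedged sketch of such a helper pair, assuming SPLIT_TOKEN is a single-character separator (matching get_dst() above) and that NodeType/EdgeType are string aliases; the names are illustrative:

  inline EdgeType to_edge_type(
      const std::tuple<NodeType, NodeType, NodeType>& t) {
    return std::get<0>(t) + SPLIT_TOKEN + std::get<1>(t) + SPLIT_TOKEN +
           std::get<2>(t);
  }

  inline std::tuple<NodeType, NodeType, NodeType> to_tuple(const EdgeType& e) {
    const auto first = e.find(SPLIT_TOKEN);
    const auto last = e.find_last_of(SPLIT_TOKEN);
    return std::make_tuple(e.substr(0, first),
                           e.substr(first + 1, last - first - 1),
                           e.substr(last + 1));
  }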

std::tuple<at::Tensor, at::Tensor, c10::optional<at::Tensor>>>
hetero_subgraph(const utils::EdgeTensorDict& rowptr,
                const utils::EdgeTensorDict& col,
                const utils::NodeTensorDict& src_nodes,
Member:

Not sure why we have both src_nodes and dst_nodes. IMO, these can be safely merged as in https://pytorch-geometric.readthedocs.io/en/latest/modules/data.html#torch_geometric.data.HeteroData.subgraph.

Member Author:

Separating src and dst is just to give some flexibility. We could also have the merged API though.
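
For reference, a hedged sketch of the merged variant (one nodes dict keyed by node type, as in HeteroData.subgraph; the return_edge_id parameter type here is an assumption):

  c10::Dict<utils::EdgeType,
            std::tuple<at::Tensor, at::Tensor, c10::optional<at::Tensor>>>
  hetero_subgraph(const utils::EdgeTensorDict& rowptr,
                  const utils::EdgeTensorDict& col,
                  const utils::NodeTensorDict& nodes,
                  const c10::Dict<utils::EdgeType, bool>& return_edge_id);
  // For each edge type, the src/dst node sets would be looked up from `nodes`
  // via get_src()/get_dst() instead of from two separate dictionaries.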
