Conversation

goungoun

@goungoun goungoun commented Sep 13, 2025

What changes were proposed in this pull request?

This PR implements fixed-length pattern matching, e.g. (u)-[*3]->(v).
See the openCypher 9 reference, page 15: https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf

Why are the changes needed?

Without this feature, users must write long patterns out by hand. With this change, the parser handles the pattern by creating intermediate vertices and edges. For example, (u)-[*3]->(v) is converted internally to (u)-[]->(_v1);(_v1)-[]->(_v2);(_v2)-[]->(v);
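The expansion described above can be sketched in a few lines. This is only an illustration of the idea, not the parser code from the PR: the function name `expand_fixed_length` is hypothetical, and the sketch joins the single-hop patterns with `;` without emitting a trailing separator.

```python
def expand_fixed_length(src: str, dst: str, hops: int) -> str:
    """Expand (src)-[*hops]->(dst) into a chain of single-hop patterns."""
    if hops < 1:
        raise ValueError("hops must be at least 1")
    # Intermediate vertices are named _v1, _v2, ... as in the example above.
    vertices = [src] + [f"_v{i}" for i in range(1, hops)] + [dst]
    # Pair each vertex with its successor to form one anonymous edge per hop.
    return ";".join(f"({a})-[]->({b})" for a, b in zip(vertices, vertices[1:]))


print(expand_fixed_length("u", "v", 3))
# (u)-[]->(_v1);(_v1)-[]->(_v2);(_v2)-[]->(v)
```

A pattern with n hops needs n - 1 intermediate vertices, so a single hop expands to the plain pattern (u)-[]->(v).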

@SemyonSinchenko
Collaborator

JFYI, we have a pre-commit hook configured.

@goungoun goungoun marked this pull request as ready for review September 13, 2025 11:17
@goungoun
Author

@SemyonSinchenko Thanks for letting me know. It seems the check failed because of formatting. I pushed the result of running sbt scalafmtAll.

@SemyonSinchenko
Collaborator

> @SemyonSinchenko Thanks for letting me know. It seems the check failed because of formatting. I pushed the result of running sbt scalafmtAll.

We have quite strict rules for scalafix... Just add something like case _ => throw new GraphFramesUnreachableException(); that should fix it.

Collaborator

@SemyonSinchenko SemyonSinchenko left a comment

LGTM! Thanks @goungoun !

@goungoun
Author

Thanks! @SemyonSinchenko.

Collaborator

@rjurney rjurney left a comment

Please add one more unit test.

@goungoun goungoun changed the title from "feat: variable length pattern matching" to "feat: long length pattern matching" Sep 14, 2025
@goungoun goungoun changed the title from "feat: long length pattern matching" to "feat: fixed length pattern matching" Sep 15, 2025
@goungoun
Author

There are two ways to handle quantified path patterns. Everyone wants to use patterns such as (u)-[*2..3]->(v), but my approach supports only (u)-[*3]->(v). The last commit distinguishes fixed-length patterns from variable-length patterns in the parser. The error message states the limitation: variable-length patterns are not supported.

@rjurney
Collaborator

rjurney commented Sep 15, 2025

> There are two ways to handle quantified path patterns. Everyone wants to use patterns such as (u)-[*2..3]->(v), but my approach supports only (u)-[*3]->(v). The last commit distinguishes fixed-length patterns from variable-length patterns in the parser. The error message states the limitation: variable-length patterns are not supported.

This is excellent. Do you think you know how to implement variable length patterns in another PR? That is a highly desirable, requested feature!

@goungoun
Author

@rjurney I think a variable-length pattern can be split into fixed-length patterns. (u)-[*2..5]->(v) is (u)-[*2]->(v) or (u)-[*3]->(v) or (u)-[*4]->(v) or (u)-[*5]->(v). The union of all those results is then the expected result. Is that right?
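The decomposition proposed above can be illustrated as follows. This is a sketch under stated assumptions, not the PR's implementation: `split_variable_length` and the regular expression are hypothetical, and the sketch only handles a single pattern of the form (u)-[*lo..hi]->(v).

```python
import re
from typing import List

# Matches one pattern of the form (src)-[*lo..hi]->(dst); an assumption made
# for this sketch, not the grammar used by the actual parser.
_RANGE = re.compile(r"\((\w+)\)-\[\*(\d+)\.\.(\d+)\]->\((\w+)\)")


def split_variable_length(pattern: str) -> List[str]:
    """Split a variable-length pattern into one fixed-length pattern per hop count."""
    m = _RANGE.fullmatch(pattern)
    if m is None:
        raise ValueError(f"not a bounded variable-length pattern: {pattern!r}")
    src, dst = m.group(1), m.group(4)
    low, high = int(m.group(2)), int(m.group(3))
    if low < 1 or low > high:
        raise ValueError("bounds must satisfy 1 <= low <= high")
    # Both bounds are inclusive, so *2..5 yields four fixed-length patterns.
    return [f"({src})-[*{n}]->({dst})" for n in range(low, high + 1)]


for fixed in split_variable_length("(u)-[*2..5]->(v)"):
    print(fixed)
```

Each fixed-length pattern would then be matched on its own, and the per-length results combined into one result set.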

@goungoun
Author

goungoun commented Sep 16, 2025

I haven't implemented the parser side yet; it is still tricky. But it is better to look at the code to validate the approach.

@goungoun
Author

The variable-length pattern relies on DataFrame unionByName. It is handled within the find method, not on the parser side. With that minimal change, (u)-[*3..5]->(v) now works.
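To show why a union by column name fits here, the following toy model uses plain Python dictionaries instead of Spark DataFrames. Spark's `DataFrame.unionByName` resolves columns by name rather than by position; the helper `union_by_name` below is a hypothetical stand-in that also fills absent columns with `None`, which corresponds to Spark's `allowMissingColumns=True` option. Whether the PR keeps the intermediate columns or fills missing ones this way is an assumption.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def union_by_name(left: List[Row], right: List[Row]) -> List[Row]:
    """Concatenate two row sets, aligning columns by name (toy model, not Spark)."""
    # Collect column names in first-seen order across both inputs.
    columns: List[str] = []
    for row in left + right:
        for name in row:
            if name not in columns:
                columns.append(name)
    # Align every row to the merged column list; absent columns become None.
    return [{name: row.get(name) for name in columns} for row in left + right]


# Hypothetical results for 2-hop and 3-hop matches: the 3-hop rows carry
# an extra intermediate vertex column (_v2) that the 2-hop rows lack.
two_hops = [{"u": "a", "_v1": "b", "v": "c"}]
three_hops = [{"u": "a", "_v1": "b", "_v2": "c", "v": "d"}]

for row in union_by_name(two_hops, three_hops):
    print(row)
```

Because results for different hop counts can have different sets of columns, a positional union would misalign them, while a union by name keeps each column's meaning intact.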

@goungoun goungoun changed the title feat: fixed length pattern matching feat: variable length pattern matching Sep 17, 2025
@goungoun
Author

I've added more tests on the edge cases and checked the error message.
Supported:
(u)-[*5]->(v)
(u)-[*3..5]->(v)
(u)-[*10]->(v) // fixed the parser pattern to include 0

Not supported:
(u)-[*0]->(v)
(u)-[*..5]->(v)
(u)-[*3..]->(v)
(u)-[*]->(v)
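The supported and unsupported quantifiers listed above can be captured by a small validator. This is a sketch, not the parser from the PR: `parse_quantifier` and the regular expression are hypothetical. It also illustrates the kind of issue the `*10` case hints at: a digit class without 0 (for example `[1-9]+`) would reject 10, although that this was the original bug is an assumption.

```python
import re
from typing import Tuple

# A positive integer without a leading zero: matches 5 and 10, rejects 0.
_POSITIVE = r"[1-9][0-9]*"
# Either *N or *N..M; open-ended forms such as *, *..5 and *3.. do not match.
_QUANTIFIER = re.compile(rf"\*({_POSITIVE})(?:\.\.({_POSITIVE}))?")


def parse_quantifier(text: str) -> Tuple[int, int]:
    """Return the inclusive (low, high) hop bounds for a supported quantifier."""
    m = _QUANTIFIER.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported quantifier: {text!r}")
    low = int(m.group(1))
    # A fixed-length quantifier such as *5 is the range 5..5.
    high = int(m.group(2)) if m.group(2) else low
    if low > high:
        raise ValueError(f"lower bound exceeds upper bound: {text!r}")
    return low, high


for q in ["*5", "*3..5", "*10", "*0", "*..5", "*3..", "*"]:
    try:
        print(q, "->", parse_quantifier(q))
    except ValueError as err:
        print(q, "-> rejected:", err)
```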

@goungoun
Author

@rjurney @SemyonSinchenko, I pushed several commits to cover more cases and edge cases, so it is not easy to track the changes from the original branch. How can I make it easier for you to review? Is it better to open a new PR with a new branch?

@SemyonSinchenko
Collaborator

> How can I make it easier for you to review? Is it better to open a new PR with a new branch?

At least once we merge your first PR, I will no longer need to approve each CI run :)

@SemyonSinchenko SemyonSinchenko linked an issue Sep 18, 2025 that may be closed by this pull request
@SemyonSinchenko
Collaborator

@goungoun I would like to mention again that we have pre-commit hooks that run the same set of checks as CI. You can read about them here: https://github.com/graphframes/graphframes/blob/master/CONTRIBUTING.md#styleguides

@goungoun
Author

goungoun commented Sep 18, 2025

@SemyonSinchenko Thanks for your help. By the way, the named edge pattern from issue #539 is not supported.
Supported:
(a)-[*1..2]->(b)

Not supported:
(a)-[e1..2]->(b)
(a)-[e*1..2]->(b)

@SemyonSinchenko
Collaborator

@goungoun Tbh I'm fine with it. There will be another, more Cypher-like API built on top of the PropertyGraphFrame that will cover that. I would like to keep GraphFrames pattern matching focused on motif finding rather than MATCH-like queries.

@SemyonSinchenko
Collaborator

@rjurney what do you think?

@goungoun
Author

@SemyonSinchenko I referred to issue #539 and added support for named edges. If the edge is named e, it generates _e1, _e2, _e3.

Supported:
(a)-[e*2]->(b)
(a)-[e*1..2]->(b)

Not supported:
(a)-[e1..2]->(b) // wrong syntax
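The named-edge behaviour described above can be sketched for the fixed-length case. This is an illustration under assumptions, not the PR's code: `expand_named` and the regular expression are hypothetical, and combining the `_e1, _e2, ...` edge names with `_v1, _v2, ...` intermediate vertex names in one output string is a guess based on the PR description.

```python
import re

# (src)-[name*N]->(dst), where the edge name is optional and N >= 1.
# The literal * is required, so (a)-[e1..2]->(b) is rejected as wrong syntax.
_PATTERN = re.compile(r"\((\w+)\)-\[([A-Za-z_]\w*)?\*([1-9][0-9]*)\]->\((\w+)\)")


def expand_named(pattern: str) -> str:
    """Expand a fixed-length pattern, numbering a named edge as _e1, _e2, ..."""
    m = _PATTERN.fullmatch(pattern)
    if m is None:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    src, edge, hops, dst = m.group(1), m.group(2), int(m.group(3)), m.group(4)
    vertices = [src] + [f"_v{i}" for i in range(1, hops)] + [dst]
    steps = []
    for i, (a, b) in enumerate(zip(vertices, vertices[1:]), start=1):
        # An unnamed edge stays anonymous; a named edge gets a numbered name.
        label = f"_{edge}{i}" if edge else ""
        steps.append(f"({a})-[{label}]->({b})")
    return ";".join(steps)


print(expand_named("(a)-[e*2]->(b)"))
# (a)-[_e1]->(_v1);(_v1)-[_e2]->(b)
```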

Successfully merging this pull request may close these issues.

feat: variable length motif finding
3 participants