Use string index in String #sub/#gsub when String pattern passed without creating a Regexp #5952
I can provide a |
(Also, I'm going out on a limb and assuming that we don't want to replicate the inconsistency in output from MRI's |
`String#sub` and `#gsub` called with a String arg should use string indexing and not compile a Regexp. This requires:

* Constructing MatchData instances appropriately so that `$~` et al get populated correctly (but only in those situations where it would happen).
* Using the `Regexp` static method `regsub()` to perform the replacement string parsing so that special sequences (`\0`, `\'`, etc) are correctly expanded and inserted.

@enebo wrote a new `StringSupport.index()` method that makes the position indexing easier, which we use here.

As a style refactor, the `gsubCommon19()` methods were renamed `gsubCommon()`, since there's no 1.9 distinction anymore and the methods are all private.
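To make the idea in the commit message concrete, here is a minimal sketch of literal-pattern `sub`/`gsub` done with `indexOf` and no compiled regex. It is not the PR's code: `LiteralSubSketch`, `expand`, `sub` and `gsub` are hypothetical names, it works on plain Java `String`s rather than JRuby's byte lists, and it only expands `\0`, `` \` ``, `\'` and `\\` (anything else is copied through unchanged, which is a simplification).

```java
/** Sketch only: literal-pattern sub/gsub via indexOf, with no regex compiled. Not JRuby's code. */
final class LiteralSubSketch {
    /** Expands \0 (match), \` (pre-match), \' (post-match) and \\ in a replacement template. */
    static String expand(String template, String src, int begin, int end) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            if (c != '\\' || i + 1 == template.length()) {
                out.append(c);
                continue;
            }
            char next = template.charAt(++i);
            switch (next) {
                case '0': out.append(src, begin, end); break;           // whole match
                case '`': out.append(src, 0, begin); break;             // text before the match
                case '\'': out.append(src, end, src.length()); break;   // text after the match
                case '\\': out.append('\\'); break;                     // escaped backslash
                default: out.append('\\').append(next); break;          // assumption: leave as-is
            }
        }
        return out.toString();
    }

    /** Replaces the first literal occurrence of pattern, or returns src unchanged. */
    static String sub(String src, String pattern, String template) {
        int begin = src.indexOf(pattern);
        if (begin < 0) return src;
        int end = begin + pattern.length();
        return src.substring(0, begin) + expand(template, src, begin, end) + src.substring(end);
    }

    /** Replaces every literal occurrence of pattern, scanning left to right. */
    static String gsub(String src, String pattern, String template) {
        StringBuilder out = new StringBuilder();
        int from = 0;
        int begin;
        while (from <= src.length() && (begin = src.indexOf(pattern, from)) >= 0) {
            int end = begin + pattern.length();
            out.append(src, from, begin).append(expand(template, src, begin, end));
            if (end == begin) {
                // Empty pattern: copy one character so the scan always advances.
                if (end < src.length()) out.append(src.charAt(end));
                from = end + 1;
            } else {
                from = end;
            }
        }
        if (from < src.length()) out.append(src, from, src.length());
        return out.toString();
    }
}
```

Note that the pattern is never interpreted: `gsub("a.b.c", ".", "-")` only touches the literal dots, which is the behaviour a String argument is supposed to have.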
Force-pushed: 68e7c90 to 33bb5ff
Squashed the code changes down and left the new benchmark as its own commit, to make before/after runs easier for others. Benchmark results before/after from my machine: https://gist.github.com/fidothe/7d76f60cf2adbb8a2aab8adf21f40192
As I said before, this is the first Java code I've written, so I imagine it's full of style problems. (I found some redundant casting and the like, but assume there's more of that which I didn't recognise.) Let me know if there's anything you want changed (and why, so I learn...) and I will happily oblige.
@fidothe Awesome job! Getting some more perf out of a common method is hopefully a pretty satisfying start to hacking on JRuby. As far as style goes, I made a few very minor changes: 457d12b. Unfortunately, RubyString is not very consistent in style. Some people heavily favor marking anything which will not change as final, whereas others (myself included) feel final is more useful as a marker when that finalness is important or helps signal intent. We have some other oddities in this file, like following MRI naming vs using more obvious full-word variable names. There were 3 places where the else was not joined with the previous } in the commit above. For conditionals we always use {} and they are on the same line as the if/else. Single-line ifs require {} if you want the if and body on different lines, or no {} if the body is on the same line as the if.
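A small illustration of the brace conventions described above, as I read them. `BraceStyleSketch` and `classify` are made-up names for the example and nothing here is taken from RubyString.

```java
/** Illustration only: the brace layout described in the review comment. */
final class BraceStyleSketch {
    static String classify(int n) {
        // Single-line if: braces may be dropped when the body shares the line with the if.
        if (n < 0) return "negative";

        // Body on its own line: braces are required, and else joins the closing brace.
        if (n == 0) {
            return "zero";
        } else {
            return "positive";
        }
    }
}
```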
Matt, your Java is very nice. I would certainly have assumed you're an experienced programmer based on this!
Aw, thanks 😊 @kares |
This PR addresses #5905. It's still pretty naive, and I'm really not a Java programmer, so there are probably a few dumb mistakes and style things aside from the big MatchData problem.
The big known problem remaining is how to populate the `$~` `MatchData`. In MRI the `MatchData` from a string-only `String#sub` or `#gsub` will return a `Regexp` from `MatchData#regexp`. I don't think we currently have a way of doing that, so that means changes to `MatchData`.

MRI is clearly doing something odd here, because if you call `#inspect` on the `MatchData` returned from a string pattern `#sub`, it's in a different form to what you get from invoking it with a Regexp pattern (this from MRI 2.6.3):

Once I know what the best approach to handling that is, I can fill the hole in the `#sub` implementation and `#gsub`'s returned `MatchData`s will behave as expected.
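One possible shape for this, sketched purely as an assumption and not as what JRuby or MRI actually does: record the match position from the index search, and only build a regex from the quoted pattern if something asks for it. `LiteralMatchSketch` and its methods are hypothetical names. `Pattern.quote` is the standard `java.util.regex` call for matching a string literally.

```java
import java.util.regex.Pattern;

/** Sketch only: match info for a literal search, with the regex built lazily on demand. */
final class LiteralMatchSketch {
    private final String source;
    private final String pattern;
    private final int begin;
    private Pattern regexp; // stays null until regexp() is first called

    private LiteralMatchSketch(String source, String pattern, int begin) {
        this.source = source;
        this.pattern = pattern;
        this.begin = begin;
    }

    /** Returns match info for the first literal occurrence, or null when there is none. */
    static LiteralMatchSketch find(String source, String pattern) {
        int begin = source.indexOf(pattern);
        return begin < 0 ? null : new LiteralMatchSketch(source, pattern, begin);
    }

    String preMatch() { return source.substring(0, begin); }

    String matched() { return source.substring(begin, begin + pattern.length()); }

    String postMatch() { return source.substring(begin + pattern.length()); }

    /** Compiles the quoted pattern only when a caller actually wants a regex. */
    Pattern regexp() {
        if (regexp == null) regexp = Pattern.compile(Pattern.quote(pattern));
        return regexp;
    }
}
```

The point of the lazy field is that the common path (replace and move on) never pays for regex compilation, while callers that do inspect the match still get a regex equivalent to the literal pattern.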