parse.y: add heredoc <<~ syntax (Feature #9098) #878
Yay! |
👍 |
What happens if there are blank lines in the here doc (not necessarily with leading whitespace) ? |
@Fryguy currently those would be considered lines with no indentation, so they would cause the entire heredoc to be flush left. That means that the documentation I just pushed is incorrect, but before I fix it, do you think it's better to ignore blank lines, or treat them as lines with no indentation? |
I was originally thinking ignore them for the purposes of figuring out the strip size. As a user of the method, my least surprise would be with this:
class FancyHello
def self.hello
puts <<~README.inspect
Hello
World!
README
end
end
FancyHello.hello # => "Hello\n\n World!\n"
Not 100% sure though...what do others think? @avdi? |
With this last commit, lines which are blank (empty or consisting only of tabs and spaces) will not be used to find the base indentation level. On a blank line, any amount of indentation less than the heredoc's base indentation level will be ignored, while any additional indentation will be preserved. |
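The blank-line rule described in this comment can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the patch's C code: the helper name `dedent_ignoring_blank_lines` is hypothetical, and every tab or space is counted as one column.

```ruby
# Hypothetical helper, not part of the patch: dedents text following the rule
# described in the comment above.
def dedent_ignoring_blank_lines(text)
  lines = text.lines

  # Base indentation: the smallest run of leading spaces/tabs among the
  # non-blank lines. Blank lines (empty, or only tabs and spaces) are skipped.
  widths = lines.reject { |line| line.strip.empty? }
                .map { |line| line[/\A[ \t]*/].size }
  base = widths.min
  return text if base.nil? || base.zero?

  # Remove at most `base` leading whitespace characters from every line.
  # A blank line with less indentation just loses what it has, while any
  # indentation beyond `base` is preserved.
  lines.map { |line| line.sub(/\A[ \t]{0,#{base}}/, "") }.join
end

raw = "    Hello\n\n      World!\n        \n"
puts dedent_ignoring_blank_lines(raw).inspect
# prints "Hello\n\n  World!\n    \n"
```

The empty second line does not pull the base indentation down to zero, and the last line, which holds eight spaces, keeps the four that exceed the base level.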
I expect that literally written spaces/tabs would be stripped, but not escaped ones, such as |
@nobu |
I've thought about this a lot, and I've ended up with two options that I would find acceptable:
#1. Indent is based on shortest-indented non-whitespace line. So:
class FancyHello
def self.hello
puts <<~README.inspect
Hello
World!
README
end
end
outputs:
#2. Final indent is based on the indent level of the closing marker (
Of the two, I suspect #1 is less likely to surprise people. In both cases, blank lines are ignored for the purpose of indent. |
Escaped spaces seem fine. |
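For readers on Ruby 2.3 or later, where `<<~` is available, the distinction between literal and escaped whitespace can be checked directly. This is a small sketch of the behavior as I understand it from released Ruby, not a test taken from this patch: the two literal leading spaces are removed, while the escaped `\t` is treated as content and kept.

```ruby
# Requires Ruby 2.3 or later, where the squiggly heredoc is available.
text = <<~EOS
  \tstarts with an escaped tab
  plain line
EOS

p text
# prints "\tstarts with an escaped tab\nplain line\n"
```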
BTW, it's better to adopt the existing coding style (indent, braces, etc.) to send patches, even if it is far from your favorites. |
@nobu I definitely didn't intend to introduce style inconsistencies! I guess you specifically meant where I was using just spaces for indentation, instead of tabs and then spaces ... if so, I think I have fixed it with this last commit. I'll work on the other issues later this week. Thanks for all the feedback! |
@nobu With the changes you mentioned above, interpolation seems to be working now. I also updated ripper to provide the dedented string and added support for backticks. |
👍 This would solve a long-time annoyance I (and presumably many others) have had with the heredoc syntax. Rails has had a solution to this for a while, but for non-Rails code the need to remove indentation from heredoc strings has been rather irritating. |
Hi! I'm guessing this would need a rebase if it was to be merged, but... is it still being considered? I would personally find it very handy to have it in core. |
As suggested by nobu, this eliminates one of the passes through the string and also allows us to give different treatment to escape sequences.
doing this required a few other changes:
* parser_params now includes a parser_heredoc_indent element
* the amount of indentation to remove from a heredoc is now found in parser_tokadd_string rather than parser_heredoc_indent
* parser_heredoc_dedent is now called from parser_here_document rather than parser_str_new
some cleanup happened in this commit as well:
* removed unneeded parser_heredoc_dedent signature
* starting size for parser->parser_heredoc_indent is now INT_MAX
* parser_heredoc_dedent now calls dispose_string on the input string, unless parser->parser_heredoc_indent is 0, in which case it returns the input string
* added an interpolation test case, which still fails
* eliminated STR_FUNC_DEDENT, heredoc_dedent is sufficient
* changed heredoc_dedent() to accept and return NODE's
* call heredoc_dedent() and reset heredoc_dedent from parser rules
* save and restore heredoc_dedent around compstmt in string_content
* set heredoc_indent as soon as a squiggly heredoc starts parsing

* make heredoc_line_indent a member of parser_params (needed because a line can be broken into multiple nodes by an interpolated expression)
* drop reading_indentation from parser_tokadd_string in favor of heredoc_line_indent
* rewrite heredoc_dedent() to walk the AST and rewrite indentation across string fragments
* add failing case of interpolated string, fix it by tracking yet-to-be-removed indent for the count process and for the copy process separately
* remove carriage return handling

* extract actual dedenting activity from parser_heredoc_dedent() to parser_heredoc_dedent_string()
* add parser_heredoc_dedent_ripper() for use in ripper
* add squiggly heredoc tests
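The commits above note that an interpolated expression can break one source line into several string nodes, so indentation may only be counted and removed at true line starts. The following Ruby sketch models that idea in a simplified form. It is not the C implementation: the fragment representation (`[:str, text]` and `[:interp, value]` pairs) and the name `dedent_fragments` are assumptions made for illustration, and adjacent literal fragments are assumed to have been merged already.

```ruby
# Hypothetical model: a heredoc body as a list of fragments. [:str, text] is
# literal source text; [:interp, value] stands for an interpolated value.
def dedent_fragments(fragments)
  # Count pass: measure indentation only where a source line really begins.
  # Text that follows an interpolation is mid-line and is never measured.
  width = nil
  at_line_start = true
  fragments.each do |kind, text|
    if kind == :interp
      at_line_start = false
      next
    end
    text.each_line do |line|
      # Whitespace without a trailing newline is followed by an interpolation,
      # so that line is not blank.
      blank = line.strip.empty? && line.end_with?("\n")
      if at_line_start && !blank
        indent = line[/\A[ \t]*/].size
        width = indent if width.nil? || indent < width
      end
      at_line_start = line.end_with?("\n")
    end
  end
  return fragments if width.nil? || width.zero?

  # Copy pass: strip up to `width` columns, again only at real line starts.
  at_line_start = true
  fragments.map do |kind, text|
    if kind == :interp
      at_line_start = false
      next [kind, text]
    end
    dedented = text.each_line.map do |line|
      piece = at_line_start ? line.sub(/\A[ \t]{0,#{width}}/, "") : line
      at_line_start = line.end_with?("\n")
      piece
    end.join
    [kind, dedented]
  end
end

fragments = [[:str, "    Hello, "], [:interp, "world"], [:str, "!\n      bye\n"]]
puts dedent_fragments(fragments).map(&:last).join.inspect
# prints "Hello, world!\n  bye\n"
```

Both passes keep their own `at_line_start` state, which loosely mirrors the commit note about tracking the count process and the copy process separately.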
35dbcd5 to eb7f824
I rebased this branch and made a first attempt at @matz's request regarding the handling of hard tabs. It should now do something sensible for any indentation other than spaces followed by tabs on a single line. The build error seems to be unrelated, something in |
@bjmllr I re-ran Travis CI. |
I'm confused. Why are tabs being treated as equivalent to spaces at all? E.g. If I write:
def hello
puts <<~README.inspect
<tab>Hello
<space><space><space><space><space><space><space><space>World!
README
end
end
Are you saying that should be accepted by the compiler? Why? Why should that be any less invalid than:
def hello
puts <<~README.inspect
<tab>Hello
<space><space><space><space>World!
README
end
end
or
def hello
puts <<~README.inspect
<space><space><space><space>Hello
<space><space>World!
README
end
end
Shouldn't we just throw an error in all of those cases? Is there ever a legitimate reason why you'd want to allow inconsistent indentation in one of these blocks? What happens when someone has their editor set to display tabs as 4 spaces, and writes:
def hello
puts <<~README.inspect
<space><space><space><space>Hello
<tab>World!
README
end
end
Why should that result in:
hello #=> " Hello\n\n\tWorld!"
I certainly wouldn't expect that result intuitively. In such a case, wouldn't a well-written error message explaining that I'm mixing tabs and spaces be much more helpful for me as a developer? |
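The alternative raised in this comment, rejecting mixed indentation with an explanatory error, could look roughly like the sketch below. It illustrates the suggestion only and is not behavior implemented by this patch: `MixedIndentationError` and `check_indentation!` are hypothetical names, and the check covers only tab/space mixing, not differing indentation depths.

```ruby
# Hypothetical check, not part of the patch: raises when the non-blank lines
# of a heredoc body do not all use the same indentation character.
class MixedIndentationError < StandardError; end

def check_indentation!(body)
  seen = nil
  body.each_line.with_index(1) do |line, number|
    next if line.strip.empty?        # blank lines carry no information
    indent = line[/\A[ \t]*/]
    next if indent.empty?            # flush-left lines cannot conflict

    if indent.include?(" ") && indent.include?("\t")
      raise MixedIndentationError, "line #{number} mixes tabs and spaces"
    end

    kind = indent.include?("\t") ? :tabs : :spaces
    seen ||= kind
    if kind != seen
      raise MixedIndentationError,
            "line #{number} is indented with #{kind}, earlier lines use #{seen}"
    end
  end
  body
end

begin
  check_indentation!("\tHello\n        World!\n")
rescue MixedIndentationError => e
  puts e.message
  # prints "line 2 is indented with spaces, earlier lines use tabs"
end
```

A body indented consistently with spaces (or consistently with tabs) passes through unchanged, so the check could run before any dedenting step.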
Closing this since the feature was added in 9a28a29 |
Allows heredocs to be nicely indented in Ruby source code, with the indentation removed during parsing.
Original proposal: https://bugs.ruby-lang.org/issues/9098
Uses the syntax suggested by Avdi Grimm (<<~), and should have the same semantics as String#strip_heredoc from ActiveSupport, that is, the indentation of the least-indented line is removed from each line of the string.
No attempt was made to deal with inconsistent indentation (tabs are considered equal to spaces).
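To make the intended semantics concrete, here is a small comparison that runs on Ruby 2.3 or later, where `<<~` is available. The helper `strip_least_indent` is a hypothetical stand-in written for this example so that no gem is needed. It approximates the described behavior of `String#strip_heredoc` and is not ActiveSupport's actual code. As in the description, tabs and spaces are counted alike.

```ruby
# Hypothetical stand-in for the behavior described above: remove the
# indentation of the least-indented non-blank line from every line.
def strip_least_indent(text)
  width = text.scan(/^[ \t]*(?=\S)/).map(&:size).min
  return text if width.nil? || width.zero?
  text.gsub(/^[ \t]{#{width}}/, "")
end

# Old style: <<- only allows the terminator to be indented, so the body
# keeps its leading whitespace until it is stripped by hand.
manual = strip_least_indent(<<-TEXT)
    Hello
      World!
TEXT

# New style: <<~ removes the common indentation while parsing.
squiggly = <<~TEXT
    Hello
      World!
TEXT

p squiggly            # prints "Hello\n  World!\n"
p manual == squiggly  # prints true
```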
Please let me know if I can improve this patch. Thanks!