parse.y: add heredoc <<~ syntax (Feature #9098) #878

bjmllr · 2015-04-20T11:39:38Z

Allows heredocs to be indented naturally in Ruby source code; the indentation is removed during parsing.

Original proposal: https://bugs.ruby-lang.org/issues/9098

Uses the syntax suggested by Avdi Grimm (<<~), and should have the same semantics as String#strip_heredoc from ActiveSupport, that is, the indentation of the least-indented line is removed from each line of the string.
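For illustration, a minimal sketch of the intended behavior. The method name `usage` and the heredoc contents are made up for this example; it assumes a Ruby version in which the `<<~` syntax is available.

```ruby
# Hypothetical example: the heredoc body is indented to match the
# surrounding code, and the least-indented line sets the amount removed.
def usage
  <<~TEXT
    Usage: tool [options]
      -h  Show help
  TEXT
end

puts usage
```

The first line of the body ends up flush left, and the second keeps only its two extra spaces of relative indentation.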

No attempt was made to deal with inconsistent indentation (tabs are considered equal to spaces).

Please let me know if I can improve this patch. Thanks!

avdi · 2015-04-22T19:52:06Z

Yay!

arbox · 2015-04-22T19:58:11Z

👍

Fryguy · 2015-04-22T22:51:14Z

What happens if there are blank lines in the heredoc (not necessarily with leading whitespace)?

bjmllr · 2015-04-22T22:57:05Z

@Fryguy currently those would be considered lines with no indentation, so they would cause the entire heredoc to be flush left. That means that the documentation I just pushed is incorrect, but before I fix it, do you think it's better to ignore blank lines, or treat them as lines with no indentation?

Fryguy · 2015-04-22T23:09:06Z

I was originally thinking ignore them for the purposes of figuring out the strip size. As a user of the method, my least surprise would be with this:

class FancyHello
  def self.hello
    puts <<~README.inspect
      Hello

        World!
    README
  end
end

FancyHello.hello # => "Hello\n\n  World!\n"

Not 100% sure though...what do others think? @avdi?

bjmllr · 2015-04-23T00:11:19Z

With this last commit, lines which are blank (empty or consisting only of tabs and spaces) will not be used to find the base indentation level. On a blank line, any amount of indentation less than the heredoc's base indentation level will be ignored, while any additional indentation will be preserved.
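As a rough sketch of that rule in plain Ruby (not the C code in parse.y): the helper name `dedent_ignoring_blank_lines` is hypothetical, and it counts each tab or space as one column, which is an assumption for this illustration only.

```ruby
# Hypothetical helper, not part of the patch: applies the blank-line rule
# described above to an ordinary string.
def dedent_ignoring_blank_lines(text)
  lines = text.lines
  # Base indentation: smallest leading-whitespace width among non-blank lines.
  base = lines.reject { |l| l.strip.empty? }
              .map { |l| l[/\A[ \t]*/].size }
              .min || 0
  # Remove at most `base` leading tabs/spaces from every line, so a blank
  # line with less indentation becomes empty and extra indentation survives.
  lines.map { |l| l.sub(/\A[ \t]{0,#{base}}/, "") }.join
end

p dedent_ignoring_blank_lines("    Hello\n\n      World!\n")
```

The empty middle line does not pull the base indentation down to zero, so the relative indentation of the last line is kept.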

nobu · 2015-04-23T01:22:19Z

I expect that literally written spaces/tabs would be stripped, but not escaped ones, such as \ (backslash space), \s, \t, \040, \x09, and so on.
If we use your approach, line_indent should be counted while parsing each line, not after the whole heredoc, I think.

bjmllr · 2015-04-23T08:14:46Z

@nobu \ (backslash space) is now preserved when it appears at the start of a line (other escape sequences should be the same). To achieve this, I moved the counting of line_indent inside parser_tokadd_string. Is this more or less what you had in mind?
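A small sketch of what preserving escapes looks like from the user's side, assuming a Ruby version that includes `<<~`. The variable name `text` and the contents are illustrative only.

```ruby
# Only the two literal leading spaces count as indentation; the escaped
# tab (\t) is treated as content and survives the dedent.
text = <<~EOS
  \tkept
  flush
EOS

p text
```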

avdi · 2015-04-23T22:20:48Z

I've thought about this a lot, and I've ended up with two options that I would find acceptable:

#1. Indent is based on shortest-indented non-whitespace line. So:

class FancyHello
  def self.hello
    puts <<~README.inspect
      Hello

        World!
    README
  end
end

outputs:

Hello

  World!

#2. Final indent is based on the indent level of the closing marker (README, in this example). Output would be:

  Hello

    World!

Of the two, I suspect #1 is less likely to surprise people. In both cases, blank lines are ignored for the purpose of indent.
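To make the two options concrete, here is a sketch that applies each rule to the body of the example as an ordinary string. The names `body`, `strip_indent`, `option1` and `option2` are hypothetical, and the width of 4 for option #2 is taken from the indentation of README in the example.

```ruby
# Heredoc body from the example above, written as a plain string.
body = "      Hello\n\n        World!\n"

# Hypothetical helper: remove up to `width` leading spaces from every line.
strip_indent = ->(text, width) { text.lines.map { |l| l.sub(/\A {0,#{width}}/, "") }.join }

# Option #1: width comes from the least-indented non-blank line (6 here).
least_indent = body.lines.reject { |l| l.strip.empty? }.map { |l| l[/\A */].size }.min
option1 = strip_indent.call(body, least_indent)

# Option #2: width comes from the indentation of the closing marker (4 here).
option2 = strip_indent.call(body, 4)

puts option1
puts option2
```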

bjmllr · 2015-04-23T23:31:06Z

@avdi Seems like we're all leaning toward #1, which is the behavior implemented in this PR.

nobu · 2015-04-26T05:08:30Z

Escaped spaces seem fine.
Still not working well with string interpolation, #{}.
You'll need to reset heredoc_indent at the beginning of a heredoc but not for each fragment, as well as lex_strterm, and dedent them at the rule string1.
Also, heredoc_indent needs to be saved/restored around compstmt in string_content.

nobu · 2015-04-26T05:14:02Z

BTW, it's better to adopt the existing coding style (indent, braces, etc.) when sending patches, even if it is far from your preference.
This is not a MUST and usually won't be the only reason to reject a patch for ruby, but it is recommended in general.

bjmllr · 2015-04-26T19:13:22Z

@nobu I definitely didn't intend to introduce style inconsistencies! I guess you specifically meant where I was using just spaces for indentation, instead of tabs and then spaces ... if so, I think I have fixed it with this last commit. I'll work on the other issues later this week. Thanks for all the feedback!

bjmllr · 2015-05-04T02:16:33Z

@nobu With the changes you mentioned above, interpolation seems to be working now. I also updated ripper to provide the dedented string and added support for backticks.
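A short sketch of interpolation inside a dedented heredoc, assuming a Ruby version that includes `<<~`. The variable names are illustrative; the second heredoc shows that whitespace coming from an interpolated value is not counted as indentation.

```ruby
name = "World"

# Indentation is measured on the literal text around the interpolation.
greeting = <<~EOS
  Hello
    #{name}!
EOS

# Leading spaces inside an interpolated value are content, not indentation.
padded = "    padded"
mixed = <<~EOS
  #{padded}
  flush
EOS

p greeting
p mixed
```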

Ajedi32 · 2015-07-15T15:53:22Z

👍 This would solve a long-time annoyance I (and presumably many others) have had with the heredoc syntax. Rails has had a solution to this for a while, but for non-rails code the need to remove indentation from heredoc strings has been rather irritating.

deivid-rodriguez · 2015-11-04T18:21:04Z

Hi! I'm guessing this would need a rebase if it was to be merged, but... is it still being considered? I would personally find it very handy to have it in core.

sikachu · 2015-11-23T14:32:35Z

@bjmllr thank you so much for implementing this patch. I gave up a while back since I don't know C and the lexer well enough and couldn't finish the patch.

Per @avdi's comment, the original intention was to produce the output of example #1 as well. I'm glad that this is being adopted.

(Feature ruby#9098)

As suggested by nobu, this eliminates one of the passes through the string and also allows us to give different treatment to escape sequences.

Doing this required a few other changes:

* parser_params now includes a parser_heredoc_indent element
* the amount of indentation to remove from a heredoc is now found in parser_tokadd_string rather than parser_heredoc_indent
* parser_heredoc_dedent is now called from parser_here_document rather than parser_str_new

Some cleanup happened in this commit as well:

* removed unneeded parser_heredoc_dedent signature
* starting size for parser->parser_heredoc_indent is now INT_MAX
* parser_heredoc_dedent now calls dispose_string on the input string, unless parser->parser_heredoc_indent is 0, in which case it returns the input string

* added an interpolation test case, which still fails
* eliminated STR_FUNC_DEDENT, heredoc_dedent is sufficient
* changed heredoc_dedent() to accept and return NODE's
* call heredoc_dedent() and reset heredoc_dedent from parser rules
* save and restore heredoc_dedent around compstmt in string_content
* set heredoc_indent as soon as a squiggly heredoc starts parsing

* make heredoc_line_indent a member of parser_params (needed because a line can be broken into multiple nodes by an interpolated expression)
* drop reading_indentation from parser_tokadd_string in favor of heredoc_line_indent
* rewrite heredoc_dedent() to walk the AST and rewrite indentation across string fragments
* add failing case of interpolated string, fix it by tracking yet-to-be-removed indent for the count process and for the copy process separately
* remove carriage return handling

* extract actual dedenting activity from parser_heredoc_dedent() to parser_heredoc_dedent_string()
* add parser_heredoc_dedent_ripper() for use in ripper
* add squiggly heredoc tests

bjmllr · 2015-11-28T01:36:49Z

I rebased this branch and made a first attempt at @matz's request regarding the handling of hard tabs. It should now do something sensible for any indentation other than spaces followed by tabs on a single line.
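One way to picture tab handling is to measure indentation in columns rather than characters. The sketch below is plain Ruby, not the code in parse.y; the name `indent_width` is hypothetical, and the choice to advance a tab to the next multiple of 8 (rather than always adding 8) is an assumption based on the commit title "dedented heredoc treats tab as 8 spaces".

```ruby
TAB_WIDTH = 8

# Hypothetical helper: width of a line's leading whitespace in columns,
# with each tab advancing to the next multiple of TAB_WIDTH.
def indent_width(line)
  width = 0
  line.each_char do |ch|
    case ch
    when " "  then width += 1
    when "\t" then width = (width / TAB_WIDTH + 1) * TAB_WIDTH
    else break # first non-whitespace character ends the indentation
    end
  end
  width
end

p indent_width("\tHello")         # one tab
p indent_width("        World!")  # eight spaces
```

Under this measure a leading tab and eight leading spaces are the same indentation, so a heredoc mixing the two would still have a single base width.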

The build error seems to be unrelated, something in test_fork.rb?

hsbt · 2015-11-28T03:07:21Z

@bjmllr I re-ran Travis CI.

Ajedi32 · 2015-11-28T03:12:09Z

I'm confused. Why are tabs being treated as equivalent to spaces at all? E.g. If I write:

def hello
  puts <<~README.inspect
<tab>Hello

<space><space><space><space><space><space><space><space>World!
  README
end

Are you saying that should be accepted by the compiler? Why? Why should that be any less invalid than:

def hello
  puts <<~README.inspect
<tab>Hello

<space><space><space><space>World!
  README
end

or

def hello
  puts <<~README.inspect
<space><space><space><space>Hello

<space><space>World!
  README
end

Shouldn't we just throw an error in all of those cases? Is there ever a legitimate reason why you'd want to allow inconsistent indentation in one of these blocks? What happens when someone has their editor set to display tabs as 4 spaces, and writes:

def hello
  puts <<~README.inspect
<space><space><space><space>Hello

<tab>World!
  README
end

Why should that result in:

hello #=> "    Hello\n\n\tWorld!"

I certainly wouldn't expect that result intuitively. In such a case, wouldn't a well-written error message explaining that I'm mixing tabs and spaces be much more helpful for me as a developer?
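For illustration only, a sketch of the kind of check proposed here, written in plain Ruby rather than in the parser. The name `inconsistent_indentation?` is hypothetical and nothing like it is part of this PR; it only detects tabs and spaces both being used for indentation across the non-blank lines of a heredoc body.

```ruby
# Hypothetical check, not part of the patch: true when some non-blank lines
# are indented with tabs and some with spaces.
def inconsistent_indentation?(text)
  indents = text.lines.reject { |l| l.strip.empty? }.map { |l| l[/\A[ \t]*/] }
  uses_tabs = indents.any? { |i| i.include?("\t") }
  uses_spaces = indents.any? { |i| i.include?(" ") }
  uses_tabs && uses_spaces
end

p inconsistent_indentation?("    Hello\n\n\tWorld!\n")   # spaces on one line, a tab on another
p inconsistent_indentation?("    Hello\n\n      World!\n") # spaces only
```

A parser could raise an error or emit a warning when such a check is true, instead of picking a tab width.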

bjmllr · 2015-12-08T10:26:08Z

Closing this since the feature was added in 9a28a29

bjmllr force-pushed the tildoc branch from 61a35ad to a59c3af Compare April 20, 2015 12:20

bjmllr closed this Apr 20, 2015

bjmllr reopened this Apr 20, 2015

bjmllr force-pushed the tildoc branch from 2d2c2df to aab34a4 Compare April 22, 2015 22:29

bjmllr force-pushed the tildoc branch from 0b0545d to bf88013 Compare April 23, 2015 08:08

bjmllr force-pushed the tildoc branch from 4aeaf2b to 7f9c35f Compare May 3, 2015 04:04

bjmllr closed this May 3, 2015

bjmllr reopened this May 3, 2015

bjmllr added 5 commits November 27, 2015 10:21

parse.y: add heredoc <<~ syntax (Feature ruby#9098)

a94728d

literals.rdoc: describe heredoc <<~ syntax

f0fa559

(Feature ruby#9098)

parse.y: blank line handling for <<~ heredoc

c9cc036

(Feature ruby#9098)

undo indentation inconsistencies in parse.y

5e1b044

bjmllr added 4 commits November 27, 2015 10:30

add heredoc dedenting to ripper

388f8ca

* extract actual dedenting activity from parser_heredoc_dedent() to parser_heredoc_dedent_string()
* add parser_heredoc_dedent_ripper() for use in ripper
* add squiggly heredoc tests

dedent squiggly heredocs when using backticks

af42d44

bjmllr force-pushed the tildoc branch 2 times, most recently from 35dbcd5 to eb7f824 Compare November 28, 2015 01:13

parse.y: dedented heredoc treats tab as 8 spaces

eb7f824

bjmllr closed this Dec 8, 2015

edward mentioned this pull request Jun 14, 2016

Nokogiri HTML refactor Shopify/erb_lint#6

Merged

1 task

arharovets mentioned this pull request Aug 19, 2019

Display a product with its repositories SUSE/rmt#458

Closed

arharovets mentioned this pull request Aug 31, 2019

Display a product with its repositories SUSE/rmt#464

Merged

sharpobject mentioned this pull request Sep 25, 2020

Dedenter should handle tabs sorbet/sorbet#44

Closed
