Thanks to visit codestin.com
Credit goes to github.com

Skip to content

parse.y: add heredoc <<~ syntax (Feature #9098) #878

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 10 commits into from

Conversation

bjmllr
Copy link

@bjmllr bjmllr commented Apr 20, 2015

Allows for the use of heredocs which appear nicely indented in ruby source code, but the indentation is removed during parsing.

Original proposal: https://bugs.ruby-lang.org/issues/9098

Uses the syntax suggested by Avdi Grimm (<<~), and should have the same semantics as String#strip_heredoc from ActiveSupport, that is, the indentation of the least-indented line is removed from each line of the string.

No attempt was made to deal with inconsistent indentation (tabs are considered equal to spaces).

Please let me know if I can improve this patch. Thanks!

@avdi
Copy link
Contributor

avdi commented Apr 22, 2015

Yay!

@arbox
Copy link
Contributor

arbox commented Apr 22, 2015

👍

@Fryguy
Copy link

Fryguy commented Apr 22, 2015

What happens if there are blank lines in the here doc (not necessarily with leading whitespace) ?

@bjmllr
Copy link
Author

bjmllr commented Apr 22, 2015

@Fryguy currently those would be considered lines with no indentation, so they would cause the entire heredoc to be flush left. That means that the documentation I just pushed is incorrect, but before I fix it, do you think it's better to ignore blank lines, or treat them as lines with no indentation?

@Fryguy
Copy link

Fryguy commented Apr 22, 2015

I was originally thinking ignore them for the purposes of figuring out the strip size. As a user of the method, my least surprise would be with this:

class FancyHello
  def self.hello
    puts <<~README.inspect
      Hello

        World!
    README
  end
end

FancyHello.hello # => "Hello\n\n  World!\n"

Not 100% sure though...what do others think? @avdi?

@bjmllr
Copy link
Author

bjmllr commented Apr 23, 2015

With this last commit, lines which are blank (empty or consisting only of tabs and spaces) will not be used to find the base indentation level. On a blank line, any amount of indentation less than the heredoc's base indentation level will be ignored, while any additional indentation will be preserved.

@nobu
Copy link
Member

nobu commented Apr 23, 2015

I expect that literally written spaces/tabs would be stripped, but not escaped ones, such as \, \s, \t, \040, \x09, and so on.
If we use your approach, line_indent should be counted at parsing each lines, but not after the whole here doc, I think.

@bjmllr
Copy link
Author

bjmllr commented Apr 23, 2015

@nobu \ (backslash space) is now preserved when it appears at the start of a line (other escape sequences should be the same). To achieve this, I moved the counting of line_indent inside parser_tokadd_string. Is this more or less what you had in mind?

@avdi
Copy link
Contributor

avdi commented Apr 23, 2015

I've thought about this a lot, and I've ended up with two options that I would find acceptable:

#1. Indent is based on shortest-indented non-whitespace line. So:

class FancyHello
  def self.hello
    puts <<~README.inspect
      Hello

        World!
    README
  end
end

outputs:

Hello

  World!

#2. Final indent is based on the indent level of the closing marker (README, in this example). Output would be:

  Hello

    World!

Of the two, I suspect #1 is less likely to surprise people. In both cases, blank lines are ignored for the purpose of indent.

@bjmllr
Copy link
Author

bjmllr commented Apr 23, 2015

@avdi Seems like we're all leaning toward #1, that's the behavior implemented in this PR.

@nobu
Copy link
Member

nobu commented Apr 26, 2015

Escaped spaces seem fine.
Still not working well with string interpolation, #{}.
You'll need to reset heredoc_indent at the beginning of a heredoc but not for each fragments, as well as lex_strterm, and dedent them at the rule string1.
Also, heredoc_indent needs to be saved/restored around compstmt in string_content.

@nobu
Copy link
Member

nobu commented Apr 26, 2015

BTW, it's better to adopt the existing coding style (indent, braces, etc.) to send patches, even if it is far from your favorites.
This is not MUST and won't be the only reason to reject for ruby usually, but recommended in general.

@bjmllr
Copy link
Author

bjmllr commented Apr 26, 2015

@nobu I definitely didn't intend to introduce style inconsistencies! I guess you specifically meant where I was using just spaces for indentation, instead of tabs and then spaces ... if so, I think I have fixed it with this last commit. I'll work on the other issues later this week. Thanks for all the feedback!

@bjmllr
Copy link
Author

bjmllr commented May 4, 2015

@nobu With the changes you mentioned above, interpolation seems to be working now. I also updated ripper to provide the dedented string and added support for backticks.

@Ajedi32
Copy link

Ajedi32 commented Jul 15, 2015

👍 This would solve a long-time annoyance I (and presumably many others) have had with the heredoc syntax. Rails has had a solution to this for a while, but for non-rails code the need to remove indentation from heredoc strings has been rather irritating.

@deivid-rodriguez
Copy link
Contributor

Hi! I'm guessing this would need a rebase if it was to be merged, but... is it still being considered? I would personally find it very handy to have it in core.

@sikachu
Copy link
Contributor

sikachu commented Nov 23, 2015

@bjmllr thank you so much for implementing this patch. I gave up a while back since I don't know C and lexer so well and couldn't finish the patch.

Per @avdi's comment, the original intention is to have the output as example #1 as well. I'm glad that this is being adopted.

As suggested by nobu, this eliminates one of the passes through the
string and also allows us to give different treatment to escape
sequences.

doing this required a few other changes:
* parser_params now includes a parser_heredoc_indent element
* the amount of indentation to remove from a heredoc is now found in
  parser_tokadd_string rather than parser_heredoc_indent
* parser_heredoc_dedent is now called from parser_here_document
  rather than parser_str_new

some cleanup happened in this commit as well:
* removed unneeded parser_heredoc_dedent signature
* starting size for parser->parser_heredoc_indent is now INT_MAX
* parser_heredoc_dedent now calls dispose_string on the input string,
  unless parser->parser_heredoc_indent is 0, in which case it returns
  the input string
* added an interpolation test case, which still fails
* eliminated STR_FUNC_DEDENT, heredoc_dedent is sufficient
* changed heredoc_dedent() to accept and return NODE's
* call heredoc_dedent() and reset heredoc_dedent from parser rules
* save and restore heredoc_dedent around compstmt in string_content
* set heredoc_indent as soon as a squiggly heredoc starts parsing
* make heredoc_line_indent a member of parser_params (needed because a
  line can be broken into multiple nodes by an interpolated expression)
* drop reading_indentation from parser_tokadd_string in favor of
  heredoc_line_indent
* rewrite heredoc_dedent() to walk the AST and rewrite indentation across
  string fragments
* add failing case of interpolated string, fix it by tracking yet-to-be-removed
  indent for the count process and for the copy process separately
* remove carriage return handling
* extract actual dedenting activity from parser_heredoc_dedent() to
  parser_heredoc_dedent_string()
* add parser_heredoc_dedent_ripper() for use in ripper
* add squiggly heredoc tests
@bjmllr bjmllr force-pushed the tildoc branch 2 times, most recently from 35dbcd5 to eb7f824 Compare November 28, 2015 01:13
@bjmllr
Copy link
Author

bjmllr commented Nov 28, 2015

I rebased this branch and made a first attempt at @matz 's request regarding the handling of hard tabs. It should now do something sensible for any indentation other than spaces followed by tabs on a single line.

The build error seems to be unrelated, something in test_fork.rb?

@hsbt
Copy link
Member

hsbt commented Nov 28, 2015

@bjmllr I re-runned Travis CI.

@Ajedi32
Copy link

Ajedi32 commented Nov 28, 2015

I'm confused. Why are tabs being treated as equivalent to spaces at all? E.g. If I write:

def hello
  puts <<~README.inspect
<tab>Hello

<space><space><space><space><space><space><space><space>World!
    README
  end
end

Are you saying that should be accepted by the compiler? Why? Why should that be any less invalid than:

def hello
  puts <<~README.inspect
<tab>Hello

<space><space><space><space>World!
    README
  end
end

or

def hello
  puts <<~README.inspect
<space><space><space><space>Hello

<space><space>World!
    README
  end
end

Shouldn't we just throw an error in all of those cases? Is there ever a legitimate reason why you'd want to allow inconsistent indentation in one of these blocks? What happens when someone has their editor set to display tabs as 4 spaces, and writes:

def hello
  puts <<~README.inspect
<space><space><space><space>Hello

<tab>World!
    README
  end
end

Why should that result in:

hello #=> "    Hello\n\n\tWorld!"

I certainly wouldn't expect that result intuitively. In such a case, wouldn't a well-written error message explaining that I'm mixing tabs and spaces be much more helpful for me as a developer?

@bjmllr
Copy link
Author

bjmllr commented Dec 8, 2015

Closing this since the feature was added in 9a28a29

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants