[WIP, 2025-10-28] This is an experiment in clarifying some aspects of Ruby syntax and semantics. For that we're going to introduce an alternative Lisp-based syntax for Ruby, preserving Ruby semantics.
The goal is to define a comprehensive, trivially-parsable and sugar-free syntax.
As I started working on this, I had to find a better explanation for some aspects of Ruby than what is available in standard documentation. So we also discuss some aspects of standard Ruby syntax and semantics.
Table of Contents
- Array literals: full version
- Single-variable assignment
- Multi-variable assignment
- Logical operators
- Control flow
- Rubysyn: literals
For some reason, the standard documentation does not explain the full syntax of array literals.
The most common cases of array literals are extremely well known:
- empty array: [];
- array of three elements: [1, 2, 3];
- string-array literals: %w(...) and %W(...);
- symbol-array literals: %i(...) and %I(...).
Additionally, array literals support the so-called "constructing array splat" syntax:
[1, 2, *foo, 3]
The asterisk before the value replaces it with zero or more values, depending on what is in foo:
- if foo is an array, *foo is replaced by its elements:
foo = [10, 11]
[1, 2, *foo, 3]
# [1, 2, 10, 11, 3]
- if foo responds to the to_a method, that method is called, and *foo is replaced by the resulting array (see below for some examples);
- finally, for all other values *foo is replaced by the value of foo:
foo = "hello"
[1, 2, *foo, 3]
# [1, 2, "hello", 3]
In particular, nil.to_a returns an empty array:
foo = nil
[1, 2, *foo, 3]
# [1, 2, 3]
If foo is a hash, *foo is replaced by a list of two-element arrays, one for each key-value pair:
foo = { foo: :bar, quux: 23 }
[1, 2, *foo, 3]
# [1, 2, [ :foo, :bar ], [ :quux, 23 ], 3]
For some reason, this is not explained in standard Ruby documentation:
- This syntax is used in the "Implicit Array Assignment" section, but in a very confusing way (more on that below).
- This syntax has nothing to do with assignment: it works everywhere you can use array literals.
NB: Do not confuse it with the "destructuring array splat" syntax, which is very different; see below.
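The to_a cases mentioned above can be checked with a few literals. This is only a small sketch: the `Pair` class is hypothetical and exists just to show a user-defined `to_a`, and the variable names are made up for the illustration.

```ruby
# Hypothetical class, defined only for this illustration: it responds to to_a.
class Pair
  def initialize(left, right)
    @left = left
    @right = right
  end

  def to_a
    [@left, @right]
  end
end

# A Range responds to to_a, so it is expanded into its elements.
from_range = [0, *(1..3), 4]

# A user-defined object with to_a is expanded the same way.
from_pair = [0, *Pair.new(:x, :y), 4]

# No assignment is involved in the splats below: they sit inside nested
# literals whose sizes are then computed.
nested_sizes = [[*nil], [*"hello"], [*(1..2)]].map(&:size)

p from_range    # [0, 1, 2, 3, 4]
p from_pair     # [0, :x, :y, 4]
p nested_sizes  # [0, 1, 2]
```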
Constructing array splat is pure syntactic sugar. You can easily implement it as a simple Ruby function:
def array_splat(arr, chunk)
  case
  when chunk.is_a?(Array)
    # Arrays are spliced in element by element.
    return arr.concat(chunk)
  when chunk.respond_to?(:to_a)
    # Anything convertible via to_a is converted first, then spliced in.
    tmp = chunk.to_a
    if tmp.is_a?(Array)
      return arr.concat(tmp)
    else
      raise TypeError, "can't convert #{chunk.class} to Array (#{chunk.class}#to_a gives #{tmp.class})"
    end
  else
    # Everything else is appended as a single element.
    return arr.append(chunk)
  end
end
Note that the semantics of this function were specified in the standard documentation only very recently: "Unpacking Positional Arguments".
Also, there does not seem to be an existing function with the same semantics as array_splat.
Note that [] is itself sugar for the Array.[] class method:
Array.[](2, 3, 4)
# [2, 3, 4]
So it's possible that constructing array splat actually stems from function argument processing.
However, for now we consider constructing array splat in array literals an independent syntactic construct.
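The possible link to argument processing can be observed directly. The sketch below compares the literal with explicit calls; `collect` is a hypothetical helper that simply returns its positional arguments.

```ruby
foo = [10, 11]

# Hypothetical helper: returns whatever positional arguments it received.
def collect(*args)
  args
end

via_literal  = [1, 2, *foo, 3]
via_method   = Array.[](1, 2, *foo, 3)
via_brackets = Array[1, 2, *foo, 3]
via_args     = collect(1, 2, *foo, 3)

p via_literal   # [1, 2, 10, 11, 3]
p via_method    # [1, 2, 10, 11, 3]
p via_brackets  # [1, 2, 10, 11, 3]
p via_args      # [1, 2, 10, 11, 3]
```

In all four forms the splat expands foo in the same way, which is consistent with the idea that the literal and the argument list share one mechanism.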
Having considered all that, we realize that we need to handle only the most trivial case; everything else is syntactic sugar.
(array <value>...)
Here are some examples:
| Ruby | Rubysyn |
|---|---|
| [] | (array) |
| [ 1, 2, 3 ] | (array 1 2 3) |
We also define the array-splat function with the same semantics as the array_splat function defined above.
(array-splat arr chunk)
Here are some examples:
| Ruby | Rubysyn |
|---|---|
| [1, 2, *foo] | (array-splat (array 1 2) foo) |
| [ 3, 4, *bar, 5, 6 ] | (array-splat (array-splat (array 3 4) bar) (array 5 6)) |
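The second row of the table can be checked in plain Ruby. The sketch below repeats a compact version of array_splat so that it runs on its own, and compares the nested calls with the literal for a few sample values of bar; the sample values are chosen only for illustration.

```ruby
# Compact restatement of array_splat, repeated so the example is self-contained.
def array_splat(arr, chunk)
  return arr.concat(chunk) if chunk.is_a?(Array)
  return arr.append(chunk) unless chunk.respond_to?(:to_a)

  converted = chunk.to_a
  unless converted.is_a?(Array)
    raise TypeError, "can't convert #{chunk.class} to Array"
  end
  arr.concat(converted)
end

samples = [[7, 8], nil, "hello", { k: 1 }, 1..2]

results = samples.map do |bar|
  desugared = array_splat(array_splat([3, 4], bar), [5, 6])
  literal = [3, 4, *bar, 5, 6]
  desugared == literal
end

p results  # [true, true, true, true, true]
```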
Single-variable assignment has a very simple base syntax:
a = 3
# 3
On the right side of the equals sign there is always a single expression, but there is extra syntactic sugar that automatically creates arrays from comma-separated expressions.
a = 3, 4, 5
# [3, 4, 5]
This is completely equivalent to the usual:
a = [3, 4, 5]
# [3, 4, 5]
Another way to trigger automatic creation of arrays is to use the constructing array splat syntax:
a = *3
# [3]
This is completely equivalent to:
a = [3]
# [3]
Variable assignment automatically declares the variable in the current binding, if it was not already declared.
Newly declared variables have a value of nil.
We'll clarify what "binding" means below.
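The array-building sugar on the right-hand side described above combines with the splat rules from the array-literal section. A short sketch, with variable names chosen only for this example:

```ruby
foo = [2, 3]
opts = { k: 1 }

a = 1, *foo   # comma list combined with a splat
b = *nil      # nil.to_a is empty, so the result is an empty array
c = *foo      # splatting an array yields an array with the same elements
d = *opts     # a hash is expanded into key-value pairs

p a  # [1, 2, 3]
p b  # []
p c  # [2, 3]
p d  # [[:k, 1]]
```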
Note that the right-hand side of the assignment is evaluated after the left-hand variable has been declared and initialized to nil. For example:
a = a
# nil
b = b.class
# NilClass
Having considered all of this, we decouple variable declaration from variable assignment.
(var <var>)
Declares listed variables in the current binding and initializes them to nil.
(var) also returns nil.
(var a)
# nil
(assign var value) assigns a single value to a single variable. The variable must
be declared by (var); otherwise a runtime exception is raised.
(assign) returns the value as the result.
Example:
(var a)
(assign a 3)
# 3
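The intended behavior of (var) and (assign) can be modelled in Ruby. This is a minimal sketch under stated assumptions: the `Scope` class and its method names are hypothetical, re-declaring an existing variable is assumed to leave its value untouched, and `NameError` is an arbitrary choice for the unspecified runtime exception.

```ruby
# Hypothetical model of a binding; not part of Rubysyn itself.
class Scope
  def initialize
    @values = {}
  end

  # (var name): declare the variable, initialize it to nil, return nil.
  # Assumption: an already declared variable keeps its current value.
  def var(name)
    @values[name] = nil unless @values.key?(name)
    nil
  end

  # (assign name value): store the value and return it.
  # Assumption: NameError stands in for the unspecified runtime exception.
  def assign(name, value)
    raise NameError, "undeclared variable: #{name}" unless @values.key?(name)
    @values[name] = value
  end

  def fetch(name)
    @values.fetch(name)
  end
end

scope = Scope.new
p scope.var(:a)        # nil
p scope.assign(:a, 3)  # 3
p scope.fetch(:a)      # 3
```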
Multi-variable assignment seems to be a completely different construct compared to single-variable assignment.
a, b, c = 1, 2, 3
# [1, 2, 3]
[a, b, c]
# [1, 2, 3]
On the left side of the assignment operator (=) there is a list of two or more variable names. Note that the variables do not need to be unique:
a, a, a = 1, 2, 3
# [1, 2, 3]
a
# 3
On the right side of the assignment operator there is always a single array value. The size of that array is arbitrary and need not match the number of variables. There is also extra syntactic sugar that automatically creates arrays from comma-separated values. Additionally, a single non-array value is converted to a one-element array.
a, b, c = 3, 4, 5
# [3, 4, 5]
[a, b, c]
# [3, 4, 5]
This is completely equivalent to:
a, b, c = [3, 4, 5]
# [3, 4, 5]
[a, b, c]
# [3, 4, 5]
A single non-array value is almost equivalent to a one-element array; only the return value of the operator itself is different:
a, b, c = 1
# 1
[a, b, c]
# [1, nil, nil]
a, b, c = [1]
# [1]
[a, b, c]
# [1, nil, nil]
Constructing array splat syntax works the same way as in single-variable assignment.
foo = [2, 3]
a, b, c = 1, *foo
# [1, 2, 3]
[a, b, c]
# [1, 2, 3]
If there are fewer variables than values, unused values are ignored.
a, b = [1, 2, 3]
# [1, 2, 3]
[a, b]
# [1, 2]
If there are more variables than values, extra variables are set to nil.
a, b, c = [1, 2]
# [1, 2]
[a, b, c]
# [1, 2, nil]
The assignment operator works in several steps. First, all variables are added to the current binding, unless they are already declared.
Second, the right-hand array values are evaluated, using the current binding.
Third, the variables are bound to evaluated values. (This part is intentionally vague, to be clarified later.)
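The three steps can be spelled out by hand for a simple case. The sketch below desugars `a, b = b, a + b` using a temporary array; the name `tmp` is made up for the illustration, and the real implementation may store intermediate values differently.

```ruby
a = 1
b = 2

# Sugared form: a, b = b, a + b
# Step 1: both variables are already declared above.
# Step 2: evaluate every right-hand value into a temporary.
tmp = [b, a + b]
# Step 3: bind the variables to the stored values.
a = tmp[0]
b = tmp[1]
p [a, b]  # [2, 3]

# The built-in form gives the same result from the same starting point.
x = 1
y = 2
x, y = y, x + y
p [x, y]  # [2, 3]
```

Because every right-hand value is stored before any variable is rebound, the order of the final assignments does not matter.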
This allows us to swap two variables without using a third, for example:
a = 1
b = 2
a, b = b, a
[a, b]
# [2, 1]
Also, just-declared variables can be used on the right-hand side (here a and b are assumed to be previously undeclared):
a, b = b, 1
# [nil, 1]
[a, b]
# [nil, 1]
One, and only one, variable on the left-hand side can be marked with
the special "*" (asterisk) syntax. This variable is assigned an
array that contains all values left over after the other variables are
assigned.
a, b, *c, d = 1, 2, 3, 4, 5, 6, 7
# [1, 2, 3, 4, 5, 6, 7]
[a, b, c, d]
# [1, 2, [3, 4, 5, 6], 7]
See that a got assigned the first value, b got assigned the second
value, and d got assigned the last value. The remaining values were put
into an array ([3, 4, 5, 6]) and assigned to the splat variable c.
Normal variables are assigned first; the splat variable is assigned last.
If there are not enough values, the splat variable is assigned an empty array.
a, *b, c = 1, 2
# [1, 2]
[a, b, c]
# [1, [], 2]
If there are not enough values even for the normal variables, they are
assigned nil, as usual.
There could be no values at all:
a, *b, c = []
# []
[a, b, c]
# [nil, [], nil]
There is a special syntactic case that at the moment may be too tedious to incorporate into the general rules of multi-assignment.
A single splat variable without any other variables is also a variant of multi-assignment.
*a = 1, 2, 3
# [1, 2, 3]
a
# [1, 2, 3]
It is a multi-assignment because the splat variable still receives an array, even when there is only one value on the right-hand side:
*a = 1
# 1
a
# [1]
In Rubysyn, multi-assignment looks like this:
(assign-multi var1... expr)
The splat variable is marked by (splat-var var):
(assign-multi a (splat-var b) c (array 1 2 3))
It seems that (assign-multi) is not a proper Lisp function, but a syntactic
macro that generates code that:
- declares and initializes variables to be assigned;
- uses temporary variables to evaluate and store right-hand side values;
- assigns temporary variables;
- returns the expr as a result.
Later we'll see that the "assigns temporary variables" step can look different depending on the type of assignment.
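The distribution of values described in this section can be sketched as an ordinary Ruby function. Everything here is an assumption made for illustration: the name `assign_multi`, the representation of targets as symbols, the `[:splat, name]` marker for the splat variable, and the returned pair of bindings and result. The sketch handles only arrays and plain values on the right-hand side.

```ruby
# Targets are symbols; a splat target is written as [:splat, name].
# Both conventions are hypothetical and exist only for this sketch.
def assign_multi(targets, rhs)
  values = rhs.is_a?(Array) ? rhs : [rhs]  # a single value becomes [value]
  bindings = {}
  splat_at = targets.index { |target| target.is_a?(Array) }

  if splat_at.nil?
    # No splat: pair variables with values; missing values become nil.
    targets.each_with_index { |name, i| bindings[name] = values[i] }
    return [bindings, rhs]
  end

  before = targets[0...splat_at]
  after = targets[(splat_at + 1)..]
  rest = values.drop(before.size)
  tail_count = [after.size, rest.size].min
  tail = rest.last(tail_count)

  # Normal variables first, the splat variable last.
  before.each_with_index { |name, i| bindings[name] = values[i] }
  after.each_with_index { |name, i| bindings[name] = tail[i] }
  bindings[targets[splat_at][1]] = rest.first(rest.size - tail_count)
  [bindings, rhs]
end

bindings, result = assign_multi([:a, :b, [:splat, :c], :d], [1, 2, 3, 4, 5, 6, 7])
p bindings.values_at(:a, :b, :c, :d)  # [1, 2, [3, 4, 5, 6], 7]
p result                              # [1, 2, 3, 4, 5, 6, 7]

bindings, result = assign_multi([[:splat, :a]], 1)
p bindings[:a]  # [1]
p result        # 1
```

Returning the original right-hand side, rather than the converted array, mirrors the difference between `a, b, c = 1` and `a, b, c = [1]` shown earlier.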
(not <expr>) implements logical operator NOT. It evaluates
<expr>, and returns true if the value is false or nil, and
false otherwise.
This corresponds to Ruby operator not.
Note that Ruby operator ! is different, see "Method-based operators".
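The truth table implied by this definition can be written down in Ruby. `rubysyn_not` is a hypothetical helper that models (not <expr>) on an already evaluated value; it is compared with Ruby's own `not` for a few ordinary values.

```ruby
# Hypothetical helper modelling (not <expr>) on an already evaluated value.
def rubysyn_not(value)
  value.equal?(nil) || value.equal?(false)
end

inputs = [nil, false, true, 0, "", []]

modelled = inputs.map { |v| rubysyn_not(v) }
built_in = inputs.map { |v| (not v) }

p modelled  # [true, true, false, false, false, false]
p built_in  # [true, true, false, false, false, false]
```

Note that 0, the empty string and the empty array are all truthy in Ruby, so their negation is false.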
Fun fact: not is not described in the standard Ruby documentation:
"Logical Operators".
(seq <expr>...) implements a simple execution sequence. The provided
expressions are evaluated one by one. If control flow reaches the
end of (seq), the value of the last element is returned as the result.
(seq) corresponds to the almost invisible syntax in Ruby: new lines
and semicolons
(see "Ending an Expression").
Empty (seq) is a no-op. It returns nil as the result.
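In plain Ruby, a begin/end block is a convenient way to observe the same behavior, since it makes the otherwise invisible sequence explicit. The variable names below are made up for the example.

```ruby
log = []

# Expressions run in order; the value of the last one is the result.
result = begin
  log << :first
  log << :second
  :last
end

# An empty sequence evaluates to nil.
empty = begin
end

p log     # [:first, :second]
p result  # :last
p empty   # nil
```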
(if <expr> <true-branch> [<false-branch>]) implements the if operator as defined in Ruby.
First, <expr> is evaluated. If its value is truthy (neither false nor nil),
<true-branch> is executed and its value is returned as the result.
If the <false-branch> exists, all the (var) variable declarations
are gathered from its body and executed.
Otherwise, if the <false-branch> exists, it is
executed and its value is returned as the result; without a <false-branch>
the result is nil, as in Ruby. Before returning,
all the (var) variable declarations are gathered from the
<true-branch> body and executed.
All of this is needed because variable declarations in Ruby take effect even if they are in a branch that was never taken. E.g.:
if true
# do nothing
else
a = 2
end
a
# => nil
Here the a variable is declared even though the "else" branch of
this if was never taken. This behavior is recursive: you can define
more if's and other constructs in a never-taken branch, and all of
those variables will be declared after the end of the top-level if.
In Rubysyn this code corresponds to:
(if true (seq)
(seq (var a) (assign a 2)))
a
;; => nil
In this example, we can analyze the "else" branch and see that it
contains a declaration of a variable. This analysis is completely
static and works on a syntax level. The original code is rewritten
like this:
(if true (seq (var a)) ;; <--- (var a) inserted here
(seq (var a) (assign a 2)))
a
;; => nil
This "declaration gathering" is explained in more detail below.
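The static analysis can be sketched in Ruby by modelling Rubysyn forms as nested arrays. This representation, and the name `gather_vars`, are assumptions made only for this example; the sketch also ignores scope boundaries, which a full implementation would presumably have to respect.

```ruby
# Rubysyn forms are modelled as nested arrays, e.g. [:var, :a].
# Collects every variable named in a (var ...) form, at any depth.
def gather_vars(expr)
  return [] unless expr.is_a?(Array)
  return expr.drop(1) if expr.first == :var

  expr.flat_map { |child| gather_vars(child) }.uniq
end

# The never-taken branch, with one more if nested inside it.
else_branch = [:seq,
               [:var, :a], [:assign, :a, 2],
               [:if, false, [:seq, [:var, :b], [:assign, :b, 1]]]]

names = gather_vars(else_branch)
p names  # [:a, :b]

# The taken branch is rewritten to start with the gathered declarations.
rewritten_true_branch = [:seq, *names.map { |name| [:var, name] }]
p rewritten_true_branch  # [:seq, [:var, :a], [:var, :b]]
```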
Ruby ternary operator a ? b : c is implemented as (if a b c).
elsif is equivalent to else if.
unless is equivalent to if not.
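These three equivalences can be checked side by side in plain Ruby. The method names below are hypothetical and serve only to pair each sugared form with its expanded form.

```ruby
# elsif versus a nested else/if.
def sign_elsif(n)
  if n < 0
    :negative
  elsif n == 0
    :zero
  else
    :positive
  end
end

def sign_nested(n)
  if n < 0
    :negative
  else
    if n == 0
      :zero
    else
      :positive
    end
  end
end

# unless versus if not.
def label_unless(flag)
  unless flag
    :off
  else
    :on
  end
end

def label_if_not(flag)
  if not flag
    :off
  else
    :on
  end
end

# Ternary versus if.
def pick_ternary(flag)
  flag ? :yes : :no
end

def pick_if(flag)
  if flag then :yes else :no end
end

numbers = [-5, 0, 7]
flags = [true, false, nil]

p numbers.map { |n| sign_elsif(n) == sign_nested(n) }   # [true, true, true]
p flags.map { |f| label_unless(f) == label_if_not(f) }  # [true, true, true]
p flags.map { |f| pick_ternary(f) == pick_if(f) }       # [true, true, true]
```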
String literals in Rubysyn are double-quoted. Only a small number of
escape sequences is supported: \", \\, \n, \r, \t,
\u{nnnnn}, and \xnn. Other symbols after backslash are not
allowed.
All other Ruby syntax for string construction, including here-documents, is syntactic sugar and is not supported.
Example:
(var foo)
(assign foo "Hello, world!")
String interpolation is implemented as a helper function:
(string-interpolate "<template>" <value>...)
<template> is a string literal with two active components: %s and
%%. All other symbols after percent sign are not allowed.
For each value, its #to_s method is called, and the resulting value is
inserted into the template in place of the next %s.
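A possible Ruby model of this helper is sketched below. The function name mirrors the Rubysyn form, but the error handling is an assumption: the text above does not say what happens when the number of values does not match the number of %s components, so the sketch raises `ArgumentError` in those cases, as well as for unsupported characters after a percent sign.

```ruby
# Hypothetical Ruby model of (string-interpolate "<template>" <value>...).
def string_interpolate(template, *values)
  remaining = values.dup

  result = template.gsub(/%(.?)/m) do
    case Regexp.last_match(1)
    when "s"
      # Assumption: running out of values is an error.
      raise ArgumentError, "too few values" if remaining.empty?
      remaining.shift.to_s
    when "%"
      "%"
    else
      raise ArgumentError, "unsupported sequence after percent sign"
    end
  end

  # Assumption: leftover values are an error as well.
  raise ArgumentError, "too many values" unless remaining.empty?
  result
end

p string_interpolate("Hello, %s! %s%%", "world", 100)  # "Hello, world! 100%"
```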
String literals correspond to instances of class String. We discuss
memory allocation of such instances elsewhere.