Quoting and symbols

Understanding quoting

Quoting is a concept that frequently confuses beginners in languages from the Lisp family. Yet, it is an essential tool in practice. This part takes a slightly more theoretical approach than before in order to dissect this feature.

In the part about lists, we already met the basic syntax of quoting:

'(0 1 2 3 "Hello")

This expression is (almost, see below) equivalent to (list 0 1 2 3 "Hello"). It evaluates to a list whose elements are 0, 1, 2, 3 and "Hello".

However, quoting is not all that similar to the list function:

(define four 4)

(list four (+ 2 3))
⇒ (4 5)

'(four (+ 2 3))
⇒ (four (+ 2 3))

What happened?

To understand this, we have to dive into fundamental aspects of the language that remained more or less hidden or implicit up to now.

The design principle to remember is that Scheme strives for maximum simplicity. One domain this extends to is how programs are represented. To understand this, it is useful to compare with another language, for example, Python. In Python, a program is represented using an “abstract syntax tree”, which can be inspected using the ast module. For the expression 1+2, this gives:

>>> import ast
>>> ast.dump(ast.parse("1+2", mode="eval"))
'Expression(body=BinOp(left=Constant(value=1), op=Add(), right=Constant(value=2)))'

The Python interpreter represents the expression 1+2 in a relatively complex way: an object of type Expression whose body attribute is an object of type BinOp, which in turn has the attributes left, op and right.

None of this exists in Scheme. The representation of expressions is a lot simpler: rather than using special data types (with classes, attributes, etc.), it just reuses the basic data types. You have already met some of the basic data types: numbers, strings, booleans, and lists. Observe how the Scheme interpreter displays a list in a format that is very similar to how Scheme expressions are formed. In both cases, there are elements separated by spaces inside a pair of parentheses.

(+ 1 2) ; computation
(0 1 2) ; list

This resemblance between notations is no coincidence. If the expression (+ 1 2) looks a lot like a list, this is because it is, in fact, a list, whose elements are +, 1 and 2. This is Scheme’s much simpler representation for what Python represents as an AST node. In Python syntax, this list would roughly correspond to ["+", 1, 2].

One frequent source of misunderstandings is the difference between an expression (or a datum, in Scheme speak), and the value it evaluates to. How is (+ 1 2) both a list and the number 3? Of course, it is not; it is a list, which evaluates to the number 3.

One way to see this concept of using simple data types for programs is that the designers of the Scheme language were so lazy that they could not even be bothered to pick a syntax for their language. They only picked a data format for the basic values of their language (with parentheses, and all that), and required all programs to be written using data structures in that data format. It’s a bit as if all Python programs were written with a syntax like

["def", "f", ["x"],
  ["return", ["+", "x", 1]]]

instead of

def f(x):
    return x+1

When a Scheme file is executed, first it is loaded as a data structure by the Scheme syntax parser (more commonly called the reader; it’s so simple that it doesn’t even deserve the name of a parser); then the data structure is evaluated. Let us review the different data types and how they evaluate:

  • Numbers, like -5.5. These are dubbed “self-evaluating” because evaluating a number gives the same number. When you write -5.5, you get -5.5. This seems trivial, but it is important. When you write (+ 1 2), you do not get (+ 1 2), because (+ 1 2), as a list, is not self-evaluating.

  • Strings, like "abcd". Also self-evaluating.

  • Booleans: #t and #f. Also self-evaluating.

  • Lists, like (display "foo"). Most of the time, a list is evaluated by first evaluating all its elements, then applying the value of the first element, as a function, to the values of the remaining elements. The exceptions are special forms such as define, which follow their own evaluation rules.

There is one remaining, very fundamental data type that you have not seen mentioned explicitly so far, yet is ubiquitous: the type of display in (display "foo"). This is called a symbol. At first glance, symbols are similar to strings: sequences of characters. However, besides being written without quotes, symbols are fundamentally different from strings because they are not self-evaluating. When a symbol is evaluated, it is interpreted as the name of a variable. That variable is looked up, and its value is returned as the value the symbol evaluates to.

"my-variable" ; string, evaluates to itself
⇒ "my-variable"

my-variable ; symbol
⇒ error because my-variable is not defined

(define my-variable "foo") ; let's define it
my-variable ; now the symbol can be evaluated without error
⇒ "foo"

Similarly, in (+ 1 2), + is a symbol. It is predefined by the Scheme interpreter as the addition function.

At this point, you should understand that when you write a Scheme program, you are in fact writing a big data structure which is the representation of that program in Scheme’s notation for data structures. But this raises a small problem: whenever you write a data structure, it is evaluated. Yet, often, you just want to use a data structure as itself. This is where quoting is useful. Quoting prevents the evaluation of an expression and returns it as-is, as a plain data structure.

(+ 1 2) ; list evaluated through a function call
⇒ 3

'(+ 1 2) ; list is not evaluated, returned as-is
⇒ (+ 1 2)
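
To make the "program as data" idea concrete, the following minimal sketch takes the quoted expression apart with ordinary list functions and then hands it back to the evaluator. The variable name expr is hypothetical, and the call to eval assumes Guile, where (interaction-environment) denotes the current top-level environment.

```scheme
;; expr is a hypothetical name chosen only for this illustration.
(define expr '(+ 1 2))

(list? expr)          ; ⇒ #t, the quoted expression is an ordinary list
(length expr)         ; ⇒ 3
(car expr)            ; ⇒ +  (a symbol, not the addition function)
(symbol? (car expr))  ; ⇒ #t
(cdr expr)            ; ⇒ (1 2)

;; Evaluating the list explicitly gives the value of the expression.
;; Assumption: Guile, where interaction-environment is available.
(eval expr (interaction-environment))  ; ⇒ 3
```

The list and the number 3 remain two distinct things: expr holds the datum, and only the explicit evaluation turns it into a value.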

In practice, quoting is a very convenient way of entering many lists. Moreover, it is the main way to obtain a symbol (without evaluating it as a variable). This syntax is so common that it will look familiar to you if you use LilyPond frequently:

\tag #'edition (
\override NoteHead.style = #'cross
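
The quoted names in these snippets are ordinary Scheme symbols. As a small sketch using only standard procedures, the following expressions show that a quoted name is a symbol rather than a string, and how to convert between the two:

```scheme
(symbol? 'cross)          ; ⇒ #t
(string? 'cross)          ; ⇒ #f
(symbol? "cross")         ; ⇒ #f, this one is a string

(symbol->string 'cross)   ; ⇒ "cross"
(string->symbol "cross")  ; ⇒ cross
```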

A more explicit syntax for quoting is available. Writing a single quote before an expression is actually a shorthand for wrapping that expression in (quote ...). These are strictly equivalent:

'(1 2.4 "Hello")

and

(quote (1 2.4 "Hello"))

Quasiquoting

Quasiquoting syntax is a way to evaluate selected subexpressions within a quoted expression. For this to work, quote needs to be replaced with quasiquote. Expressions to be evaluated are then wrapped in unquote.

(quasiquote (0 1 2 (unquote (+ 1 2)) "Hello"))
⇒ (0 1 2 3 "Hello")

Just like there is a shorthand ' (single quote) for quote, there is a shorthand ` (backtick) for quasiquote, and , (comma) for unquote.

`(0 1 2 ,(+ 1 2) "Hello")
⇒ (0 1 2 3 "Hello")

This syntax is frequently used to create lists that contain symbols. Rather than

(list 'moveto x y)

you would often write

`(moveto ,x ,y)
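
As a quick check that the two forms build the same list, here is a minimal sketch. The values given to x and y are hypothetical and only serve the illustration.

```scheme
;; Hypothetical coordinates, defined only for this illustration.
(define x 10)
(define y 20)

(list 'moveto x y)  ; ⇒ (moveto 10 20)
`(moveto ,x ,y)     ; ⇒ (moveto 10 20)
'(moveto x y)       ; ⇒ (moveto x y), plain quote leaves x and y as symbols

(equal? (list 'moveto x y) `(moveto ,x ,y))  ; ⇒ #t
```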

Identity of symbols

At first glance, symbols and strings are similar, the difference being that strings are self-evaluating while symbols evaluate to the value of the variable they name. There is another essential difference, however. When you create two strings, even equal strings, memory is allocated for each string independently, and they are stored in different places. Symbols, in contrast, are unique. A symbol is never stored twice in two different places. When a symbol already known to the interpreter is read, the already allocated symbol is reused automatically.

With equal?, this makes no difference at all, since equal? tests structural equality of objects. For two strings, equal? tests whether they have exactly the same characters. There is another test, called eq?, to determine whether two objects are actually the same object in memory. equal? is an equality test, whereas eq? is an identity test.

(equal? "hello" "hello")
⇒ #t
(eq? "hello" "hello")
⇒ #f
(equal? 'hello 'hello)
⇒ #t
(eq? 'hello 'hello)
⇒ #t

The advantage of eq? is that it’s very fast. It suffices to test whether the addresses of the two objects are equal. Unlike eq?, equal? can take significant time; the larger the objects it compares, the costlier it is. Thus, the lesson to remember is that symbols can be compared with eq?, and it’s a good idea to always compare them with eq?, even though it’s not really a problem to use equal?.
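
The uniqueness of symbols also holds for symbols created at run time, not only for quoted ones. The following sketch illustrates this with string->symbol, and uses string-copy to obtain strings that are guaranteed to be freshly allocated. The names s1, s2, y1 and y2 are hypothetical.

```scheme
;; Two freshly allocated strings with the same characters.
(define s1 (string-copy "hello"))
(define s2 (string-copy "hello"))
(equal? s1 s2)   ; ⇒ #t, same characters
(eq? s1 s2)      ; ⇒ #f, two distinct objects in memory

;; Two symbols with the same name, built in different ways.
(define y1 (string->symbol "hello"))
(define y2 (string->symbol (string-append "hel" "lo")))
(eq? y1 y2)      ; ⇒ #t, one and the same symbol
(eq? y1 'hello)  ; ⇒ #t, also identical to the quoted symbol
```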

Identity and immutability of literal data

There is a golden rule related to quoting that is useful to know about. This will guard you against hard-to-debug mistakes in the future.

Thou shalt not mutate literal data.

First, what does it mean to mutate? Put simply, it means modifying an existing value. Scheme has some functions that do this, although they are not used so often and not touched upon much in this tutorial. One example is the set-car! function, which changes the first element of a list.

(define lst (list 1 2))
(set-car! lst 0)
lst
⇒ (0 2)

Also, what is “literal data”? This means any data structure that is written directly in the program. The most obvious example is a quote: the value returned by the code '(0 1 2) is literal because it was present in the code (the (0 1 2) part), and returned as-is thanks to the quote operator, unlike the code (list 0 1 2). Another example is literal strings: the code "foo" evaluates to a literal string because it comes from the code directly, unlike (string-append "f" "oo"). Of course, literal numbers, booleans and symbols are also literal data. The rule does not matter for them, however, because these are immutable: Scheme provides no operations for mutating them in the first place.

Bringing back the Python analogy, Python’s [1, 2, 3] is more similar to (list 1 2 3) in Scheme than to '(1 2 3). This is because [1, 2, 3], like (list 1 2 3), creates a new list every time it is evaluated. In contrast, because '(1 2 3) is literal data, it always evaluates to the same list.

(define (build-list-using-list-function)
  (list 1 2 3))

(eq? (build-list-using-list-function) (build-list-using-list-function))
⇒ #f

(define (build-list-using-quoting)
  '(1 2 3))

(eq? (build-list-using-quoting) (build-list-using-quoting))
⇒ #t

Note that this is not the same thing as with symbols. When you get symbols with the same name from two different places, you always get the same symbol, which is not the case with other literal data like lists and strings. On the other hand, when you get a literal twice from the same place, then the two values are indeed identical.

(If you are a C/C++ programmer, think of a Scheme string literal as a C string literal, a const pointer into the program data section, and of a Scheme list literal as a static const array.)

When you write a call like

(function "foo string")

you don’t want that call to suddenly start passing "bar string" to the function, just because the previous call to that function mutated the literal string. To enforce sanity on this, Scheme mandates that literal data be treated as immutable.

If you use functions like set-car! on a literal datum, an error may occur. Unfortunately, it may also not occur, due to limitations in the Guile implementation. Such programs are, nevertheless, invalid.
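
When a mutable version of some literal data is needed, one way to stay within the rule is to copy the literal first and mutate only the copy. The sketch below assumes Guile, where list-copy and string-copy are available by default. The names template, working and name are hypothetical.

```scheme
;; Literal data: must never be mutated.
(define template '(1 2 3))

;; list-copy allocates a fresh list with the same elements
;; (a shallow copy), which may be mutated safely.
(define working (list-copy template))
(set-car! working 0)
working   ; ⇒ (0 2 3)
template  ; ⇒ (1 2 3), the literal is untouched

;; The same idea for strings: copy the literal, then mutate the copy.
(define name (string-copy "foo"))
(string-set! name 0 #\b)
name      ; ⇒ "boo"
```

Building the value with (list 1 2 3) in the first place achieves the same thing when the elements are known in advance.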