Minimal syntax for rose trees

Darius J Chuck

2021-11-27

In this article I propose a minimal syntax definition for representing rose trees.

This is a step on a path to explaining how a minimal syntax like Jevko has direct correspondence to familiar tree structures and thus can be used to represent them with minimal accidental complexity.

What is a rose tree?

A rose (or multiway) tree is a tree data structure where each node contains a label/value alongside a number of branches/subtrees. The number of branches is variable and unbounded.

A rose tree can be defined as follows in Haskell:

data Tree a = Node {
        label :: a,
        branches :: [Tree a]
    }

Haskell’s lazy semantics are irrelevant here, so the following TypeScript definition is analogous for our purposes:

type Tree<a> = {
  label: a,
  branches: Tree<a>[],
}

The proposed syntactical definition

The above definitions have the following minimal syntactical analog (in ABNF):

Tree     = label branches
branches = *(open Tree close)

where we might define:

open  = "["
close = "]"
label = *character

where character is any Unicode character excluding the square brackets.

The definition on top of a Dyck language

Note that if we take the original definition:

Tree     = label branches
branches = *(open Tree close)

then drop label:

Tree     = branches
branches = *(open Tree close)

then inline branches:

Tree = *(open Tree close)

then inline both open and close:

Tree = *("[" Tree "]")

we are left with a definition that describes the Dyck language.

We can then rewirte the original definition as follows:

Tree  = label *("[" Tree "]")
label = *character

to see that it is the same as a Dyck language extended with a (possibly empty) label that goes with each (possibly empty) sequence of bracket-delimited branches.

Example

The following example tree (source):

1
|
+- 2
|  |
|  +- 4
|  |
|  `- 5
|
`- 3
   |
   +- 6
   |
   `- 7

can be represented with our syntax as follows:

1[2[4][5]][3[6][7]]

Appendix: the details of labels

Our label definition allows using almost arbitrary strings as labels.

We might complete it with a simple escape mechanism (as in the definition of Jevko data) to effectively allow arbitrary Unicode strings.

To represent a value of type a we would serialize it to such a string.

Alternatively, if we weren’t building up to a definition of a generic syntax, we could restrict the label definition to allow only specific kinds of values.