Tutorial: Turning the Tables …

Introduction

Lua’s only rich native type is the table.

The table is the only game in town, so you will use it to implement every non-trivial data structure you need in any Lua project.

In this article, we will gradually build scribe, a Lua module that converts tables (and other Lua types) to readable strings.

Converting arbitrary Lua tables into descriptive strings is more complex than it initially appears. We’ll examine the issues that arise and how scribe addresses some of the pitfalls.

We will start with a trivial implementation in a dozen lines of Lua. Over time, we will evolve that code into a production-ready Lua module that handles the most complex tables with cycles and shared references. We will also see how to support multiple output formats in a single function.

This blow-by-blow description and the liberally documented final product, scribe.lua, should be a helpful tutorial, especially for those who are new to Lua but have experience in other languages.

This is not an introduction to Lua. Think of it as more Lua 201 than Lua 101.

This article is long, but we have tried to make it worthwhile.
And, of course, we hope you find scribe itself as helpful as we do!

Lua Types

As with just about every other programming language, the classic first Lua script is:

str = "Hello World"
print(str)

And, hey presto, it works! On your terminal, the output is:

Hello World

That handy print function works as you’d expect for many Lua types.

Lua always aims for minimalism and has only eight types in total.
Compare that to Rust, which has twelve types just for integers!

By the way, Lua’s tostring function is a companion to print and converts any Lua type to a string.

Simple Types

The four most straightforward Lua types are number, boolean, string and nil:

str    = "Cinderella"    -- (1)
answer = 42              -- (2)
pi     = 3.14            -- (3)
flag   = false           -- (4)
oops   = nil             -- (5)
print(str, answer, pi, flag, oops)

1: A string.
2: A number that is an integer.
3: This number is a float, but Lua uses one type for integers and floats.
4: A boolean.
5: The special value nil (the only value of type nil) signals "not found", failure, etc.

In each case, you get very reasonable results on your screen:

Cinderella  42 3.14 false nil

We can use print to dump recognisable values from number, boolean, string and even nil.
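Callout 3 says Lua uses one number type for integers and floats. That is true as far as type() is concerned, but since Lua 5.3 the standard math.type function reports which internal subtype a number has. A quick check, assuming Lua 5.3 or later (math.type does not exist in 5.1 or 5.2):

```lua
-- Assumes Lua 5.3 or later: math.type does not exist in Lua 5.1 or 5.2.
local answer, pi = 42, 3.14

-- type() reports the single "number" type for both values...
print(type(answer))        --> number
print(type(pi))            --> number

-- ...while math.type() reveals the internal subtype.
print(math.type(answer))   --> integer
print(math.type(pi))       --> float

-- An integer and a float with the same mathematical value compare equal.
print(42 == 42.0)          --> true
```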

The simplest form of debugging is to sprinkle print statements liberally throughout your code, so the more types print works on, the better. Sure, it’s not elegant, but every programmer uses print statements when things go awry. That is even more true in a dynamic language like Lua with no separate compile step, where adding a print statement and rerunning happens as fast as you can type.

Lua has four additional types beyond number, string, boolean, and nil.
These are function, userdata, thread and table.

Lua Functions

The Lua functions you write or import all have the type function.
Let’s look at a simple function example:

function answer() return 42 end
print(answer())    -- (1)
print(answer)      -- (2)

1: This prints whatever is returned from our answer function.
2: This prints what Lua thinks of as the function itself.

Output:

42
function: 0x600003e6cca0    (1)

1: The part after the colon will vary from run to run.

The string "function" is descriptive enough, but the 0x... string after the colon is opaque. It is the memory address where Lua stores its internal form of the function. That address stays the same for the whole run, so if you print the function twice:

function answer() return 42 end
print(answer)
print(answer)

The code outputs the same string twice, e.g.

function: 0x6000032a8ca0
function: 0x6000032a8ca0

However, the next time you run the program, you’ll get something else, such as

function: 0x600002650ca0
function: 0x600002650ca0

We don’t usually write things like print(answer) in our code except by accident! When we do, it’s likely a bug. We probably meant to write print(answer()), where the parentheses () tell Lua to execute the answer function and pass its result to print. So, while the output from print(answer) is opaque, it’s generally followed by an “oops, I forgot some parentheses!”

Two Opaque Types

One of Lua’s great strengths is its ability to interface with things written in other languages. Lua’s userdata type is commonly associated with this ability.

When you try to print something implemented in another language, it is hardly surprising that Lua can only say, “I see that as a piece of user data located at this address in memory.”

You can’t expect much more; if you need something more descriptive, you’d expect to perform that action in another language.

Lua also has a thread type, which is used to implement coroutines. Again, this is an opaque type, so print will only say, “I see that as a thread …” and give you a memory address.
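As a sanity check on the claim that Lua has exactly eight types, here is a small sketch that builds one value of each type and asks type() about it. It assumes the standard io and coroutine libraries are loaded, as they are in the stand-alone lua interpreter. io.stdout appears only because file handles happen to be implemented as userdata.

```lua
-- One sample value for each of Lua's eight basic types.
-- io.stdout is used only because file handles are implemented as userdata.
local samples = {
    { 'nil',      nil },
    { 'boolean',  false },
    { 'number',   42 },
    { 'string',   'Cinderella' },
    { 'function', function() return 42 end },
    { 'table',    { 'Tom', 'Dick', 'Harry' } },
    { 'thread',   coroutine.create(function() end) },
    { 'userdata', io.stdout },
}

for _, sample in ipairs(samples) do
    local expected, value = sample[1], sample[2]
    -- Fail loudly if type() disagrees with the label we attached.
    assert(type(value) == expected, 'unexpected type for ' .. expected)
    -- Show the type name next to whatever tostring makes of the value.
    print(expected, tostring(value))
end
```

The first four lines of output are readable values. The remaining ones show the opaque “kind plus address” style discussed above.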

Array Tables

Finally, we come to the all-important table type, starting with Lua arrays, a subset of this type.

The table type is Lua’s only “complex” native data type and is amazingly versatile. Once you use Lua for anything beyond trivial scripts, you will inevitably build and interpret many tables.

Tables can contain all Lua types, including Lua functions and other tables, which can refer to each other in cycles, etc.

But let’s start with a simple array example:

gents = {'Tom', 'Dick', 'Harry'}
print(gents)

The corresponding output will be something like:

table: 0x600001d32980

This output is similar in spirit to what we got by calling print on that Lua function shown above. Lua recognises the gents object as a table at some memory address, and that’s all it reveals.

To emphasise the point, note that assigning a table to another variable does not copy the table; the new variable refers to the same table:

gents = {'Tom', 'Dick', 'Harry'}
aka = gents
print(gents)
print(aka)

This outputs:

table: 0x600002e96940    (1)
table: 0x600002e96940    (2)

1: The variables gents and aka refer to the same table at the same memory address.
2: The specific memory location will vary from run to run.

Of course, this output is not helpful and isn’t what you’d naively expect!
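Because assignment copies only the reference, anything done through aka is also done to gents. Here is a small sketch that makes the sharing visible. The replacement name Thomas is just for illustration.

```lua
local gents = { 'Tom', 'Dick', 'Harry' }
local aka = gents                  -- copies the reference, not the table

-- For tables, == compares identity, and both names refer to one table.
print(aka == gents)                --> true

-- A change made through one name is visible through the other.
aka[1] = 'Thomas'
print(gents[1])                    --> Thomas

-- A separately built table with the same contents is a different table.
print(gents == { 'Thomas', 'Dick', 'Harry' })   --> false
```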

You search for “How do I print a Lua array?” and find an answer like:

print(table.concat(gents, ", "))

And sure enough, out pops the string “Tom, Dick, Harry”.

At this point, you may feel aggrieved!

Why didn’t print(gents) output something like "Tom", "Dick", "Harry" in the first place? What is that table.concat(...) call? Everybody would prefer the second output over being told that Lua recognises gents as a table that resides at some address in memory. There must be a better way!

Key-Value Tables

Things get even more screwy when you try to print a more general Lua table that isn’t an array:

mouse = {    -- (1)
    first = 'Minnie',
    last = 'Mouse'
}

1: This is a Lua table with two name-value pairs.

Lua adheres to Mies van der Rohe’s “less is more” mantra. It likes to keep things simple!

For example, we saw earlier that the Lua number type encompasses all classes of integers and all classes of floating-point numbers. Other “system-level” computer languages distinguish between them, as every piece of computer hardware has different paths for the types at the chip level. Programmers of those languages must understand and care about the differences between integers and floats. That distinction makes sense if you want to squeeze the maximum performance from every CPU nanosecond.

Lua has different goals. It is still efficient, but it is willing to spend a few compute cycles to limit type complexity for the programmer. If you code in Lua, you simply use generic “numbers” and trust Lua to handle them efficiently, whatever form they take.

The Lua table type is similar, encompassing simple arrays, like the gents example, and more general hash map tables with explicit keys and values, like the mouse example. This combination seems odd if you have done any programming before encountering Lua.

The other “real” computer languages you learnt all distinguish between arrays and dictionaries. In those languages, arrays are part of the core language. A long, early manual chapter will expound on their use. The description for the name-value dictionary-type container will be in the back of the book in the section dedicated to the language’s “standard” library. This division reflects that the hardware paths for the two container types are generally very different. Arrays are considered more fundamental than dictionaries of name-value pairs.

Lua, in effect, says:

Trust me, build that table however makes the most sense to you, and let me worry about efficiency.

Overall, this works remarkably well. Lua internally splits tables into an array part that zips along the high-speed lane of the hardware highway and a dictionary part that is necessarily over on a lower-speed lane. Again, the trade-off is between programming simplicity with a “trust me, I’ll get you almost the same speed” clause and the maximum performance per nanosecond.

Given our lack of success at getting something useful out of print for an array, we aren’t going to be surprised to see similar nonsense from print(mouse):

table: 0x6000027d9b00

Lua tells you that mouse is a table residing at a specific memory location.
True, but not very helpful!

If we try our earlier trick:

print(table.concat(mouse, ", "))

Lua outputs a blank line. Well, you just learnt something: table.concat only works on array-like tables. It concatenates the elements at indices 1 to #mouse, and #mouse is 0.

A Lua array has implicit keys with successive integers starting at 1. General Lua hash tables have explicit keys, such as the strings first and last in the mouse example. The keys can be any Lua value except nil (or NaN), not just strings.

Of course, we can unpack our table and write:

print(mouse.first, mouse.last)

Then we get “Minnie Mouse”.

Another quick search provides an answer for tables with an arbitrary number of key-value pairs:

for k, v in pairs(mouse) do
    print(k,v)
end

When I ran it the first time, it printed:

last    Mouse
first   Minnie

The output is a valid representation of the data but not in a natural order. Running the script a few more times may eventually give a better order:

first   Minnie
last    Mouse

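Lua stores the key-value part of a table in an unspecified order, which can vary from run to run. The pairs function iterates in that internal storage order, so the order you see is not stable. The array part behaves better in practice, because the reference implementation visits indices 1, 2, 3, ... in increasing order. Strictly speaking, though, the Lua manual does not specify the traversal order of pairs, even for numeric keys. Use ipairs or a numeric for loop when order matters.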
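If you need a repeatable order for a table with string keys, one common workaround is to collect the keys, sort them, and then walk the sorted list. The helper name sorted_pairs is our own invention for this sketch. It assumes every key is a string, because table.sort raises an error when it has to compare, say, a string with a number.

```lua
-- Hypothetical helper: an iterator that visits string keys in sorted order.
-- Assumes all keys are strings; mixed key types would need a custom comparator.
local function sorted_pairs(tbl)
    local keys = {}
    for k in pairs(tbl) do keys[#keys + 1] = k end
    table.sort(keys)
    local i = 0
    return function()
        i = i + 1
        local k = keys[i]
        if k ~= nil then return k, tbl[k] end   -- returning nothing ends the loop
    end
end

local mouse = { first = 'Minnie', last = 'Mouse' }
for k, v in sorted_pairs(mouse) do
    print(k, v)    -- always "first" before "last"
end
```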

First Shot at Tables

At this point in your Lua journey, you probably search for “How do I convert a Lua table to a string?”. You will find a lot of suggestions, some quite good and some not so good.

But suppose you wish to build your very own solution based on the discovery that you can use the pairs function to iterate through a table.

Well, you know that recursion is the touch of the hand of God, and your Spidey sense is telling you this is the place to use it!

With a little spare time on your hands, you come up with code along the lines of:

function table_string(tbl)                        -- (1)
    local indent = '    '                         -- (2)
    local retval = '{\n'                          -- (3)
    for k, v in pairs(tbl) do
        retval = retval .. indent                 -- (4)
        retval = retval .. tostring(k) .. ' = '   -- (5)
        if type(v) ~= 'table' then
            retval = retval .. tostring(v)        -- (6)
        else
            retval = retval .. table_string(v)    -- (7)
        end
        retval = retval .. ',\n'                  -- (8)
    end
    retval = retval .. '\n}'                      -- (9)
    return retval
end

1: A descriptive function name. However, we should check that tbl is a Lua table!
2: We hard code the indent to four spaces.
This is a parameter the user will want to set.
3: Start the return string with a { and a newline character.
The user might want to set the table delimiters to something other than braces.
4: Indent every key-value pair inside the table.
5: Add the key k as a string and an assignment =.
Another potentially user-settable parameter.
6: The value v isn’t a table. We can use tostring and add it to the return value.
7: A sub-table! “Look, Ma, that’s recursion. I’m a real programmer!”
8: End the table element with a separator , followed by a newline character.
9: Finally, close the string with a newline character and a matching table end-delimiter }.

While we have begun handling nested sub-tables using recursion, this version will not get the indentation right. We’ll come back to that problem shortly.

You try it out on our little mouse by calling print(table_string(mouse)), which prints:

{
    first = Minnie,
    last = Mouse,    (1)

}

1: That’s an annoying extra comma and newline character after the final table element.

Overall, it’s not bad! There is that extra comma and new line that looks a bit off, and of course, if you run that print(table_string(mouse)) a few times, you will see that the print order of the elements changes:

{
    last = Mouse,
    first = Minnie,    (1)

}

1: The element order changed, but the extra comma and newline character remains firmly in place.

Making `indent` a Parameter

Before we tackle the extra comma and newline character, let’s make indent a parameter. This is easy to do by adding a second optional argument to the function:

function table_string(tbl, indent)    -- (1)
    indent = indent or '    '         -- (2)
    ...

1: We add a second argument to the function, which should be a string.
2: If the user doesn’t provide a value for indent, we default to four spaces.

Only multiline formats will ever use indentation. If the function is called with indent set to the empty string, the output should be a single line. We can use this check to switch between inline and multiline output:

function table_string(tbl, indent)
    indent  = indent or '    '

    local nl     = indent == '' and '' or '\n'          -- (1)
    local retval = '{' .. nl                            -- (2)
    for k, v in pairs(tbl) do
        retval = retval .. indent
        retval = retval .. tostring(k) .. ' = '
        if type(v) ~= 'table' then
            retval = retval .. tostring(v)
        else
            retval = retval .. table_string(v, indent)  -- (3)
        end
        retval = retval .. ',' .. nl                    -- (4)
    end
    retval = retval .. nl .. '}'                        -- (5)
    return retval
end

1: We parametrise the “newline character” nl and set it to the empty string for inline outputs.
2: Instead of hard-coding the newline character, we add nl to the opening brace.
3: We pass indent to the recursive call.
4: We add nl to the separator ,.
5: Finally, we add nl to the closing brace.

Whenever you change the calling signature of a recursive function, you must update the recursive call to match. From experience, this is a common source of bugs.

Now, if you call print(table_string(mouse, '')), you will get:

{first = Minnie,last = Mouse,}

That’s a single line with no newlines or indentation, though there is an extra trailing comma we need to eliminate.

Anatomy of a Table

Although our current output string is flawed, it highlights the general structure of any table:

table-begin-delimiter
    content
table-end-delimiter

In our first attempt, the table_begin and table_end delimiters are the opening and closing braces surrounding the table content. The table delimiters should be user-configurable.

The table content is a sequence of zero or more elements:

table-begin-delimiter
    element,
    element,
    ...
table-end-delimiter

Each element includes a key, possibly an assignment operator, and a value. Array “keys” are the array indices and are often not shown as they are implicit in the ordering of the values.

In some formats like JSON, the keys must be enclosed in double-quotes. We can accommodate that requirement by introducing key delimiters, key_begin and key_end. The assignment operator can always be incorporated as part of key_end.

Elements also have begin and end delimiters, though those vary according to context. In our current implementation, the element beginning delimiter is some indentation. The element ending delimiter is the comma character followed by a new line. This is the separator between elements in the table.

The indentation amount and the element separator should be user-configurable.

Using this terminology, we can rewrite our table_string function:

function table_string(tbl, indent)
    indent = indent or '    '

    local nl          = indent == '' and '' or '\n'
    local table_begin = '{' .. nl                     -- (1)
    local table_end   = nl .. '}'
    local key_begin   = ''                            -- (2)
    local key_end     = ' = '
    local sep         = ',' .. nl                     -- (3)

    local content = ''                                -- (4)
    for k, v in pairs(tbl) do
        local k_string = key_begin .. tostring(k) .. key_end                            -- (5)
        local v_string = type(v) ~= 'table' and tostring(v) or table_string(v, indent)  -- (6)
        content = content .. indent .. k_string .. v_string .. sep                      -- (7)
    end
    return table_begin .. content .. table_end        -- (8)
end

1: We introduce the table delimiters as parameters.
2: We introduce the key delimiters as parameters.
3: We introduce the element separator as a parameter.
4: Capture the table content in content.
5: Appropriate delimiters surround the key string.
We might cause this to disappear entirely if tbl is a Lua array.
6: The value string may need to be found using recursion.
7: Add the current element to the content.
8: Finally, surround the table content with table delimiters.

At first blush, this does not look like an improvement. It is undoubtedly more verbose. However, it is a step towards the goal of supporting many different output formats in one function.

If we set key_begin and key_end to '"' and '": ' respectively, we get:

{
    "last": Mouse,
    "first": Minnie,

}

This is a good start on JSON output, but we still have the trailing comma problem, and the string values are not enclosed in double-quotes. We’ll return to this later.

Formatting Options

There are already quite a few parameters at the top of the table_string function that the user might want to set, and more are to come.

Formatting problems like this one, much like the UI settings of many programs, are notorious for having numerous settable parameters. If a parameter is missing, it should default to some reasonable value.

We could continue adding arguments to the function, but that’s not a great idea.

table_string(tbl, indent, table_begin, table_end, key_begin, key_end, sep)

This calling signature is not user-friendly. It is too verbose and error-prone. It’s easy to forget the arguments’ order or leave one out.

Some languages have the idea of named arguments, which greatly help in this situation. Lua doesn’t directly support named parameters but has a versatile table object. We can pack all the formatting options into a table and pass that table as a single argument:

table_string(tbl, opts)

opts is a table that holds all our formatting parameters. For example, we might query opts.indent for the desired tab size, etc.

The opts argument itself should be optional. For now, we’ll assume that if it is present, it has all the fields we need—it is fully defined.
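Lua even has a little syntactic sugar that makes an options table read like named arguments: when the only argument is a table constructor, f{...} means f({...}). Here is a sketch with a hypothetical describe function that falls back to defaults using the or idiom.

```lua
-- Hypothetical function for illustration: every field of args acts like a named argument.
local function describe(args)
    local indent = args.indent or '    '   -- default used when the field is missing
    local sep    = args.sep    or ','
    return indent .. args.name .. sep
end

-- f{...} is shorthand for f({...}), and the field order does not matter.
print(describe{ name = 'Minnie' })
print(describe{ sep = ';', indent = '', name = 'Minnie' })
```

The first call picks up both defaults (four spaces and a comma). The second call overrides them. An empty string is truthy in Lua, so indent = '' survives the or. Only nil and false fall back to the default, which also means this idiom cannot handle an option whose legitimate value is false.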

Let’s set up a default fallback table of formatting options that might look like this:

local pretty_options = {
    indent      = '    ',
    table_begin = '{',
    table_end   = '}',
    key_begin   = '',
    key_end     = ' = ',
    sep         = ','
}

We should have a few different sets of formatting options. For example, we would like a multiline version, as well as a more compact, inline version. We can set up a table of options for each of these, so let’s start with that pretty version:

local options = {}       -- (1)
options.pretty = {       -- (2)
    indent      = '    ',
    table_begin = '{',
    table_end   = '}',
    key_begin   = '',
    key_end     = ' = ',
    sep         = ','
}

1: We set up a table to hold all our tables of formatting parameters.
2: We set up a sub-table options.pretty of options for the pretty version.

To use this, our primary table_string function becomes:

function table_string(tbl, opts)                      -- (1)
    opts = opts or options.pretty                     -- (2)

    local indent = opts.indent                        -- (3)
    local nl     = indent == '' and '' or '\n'
    local tb     = opts.table_begin .. nl             -- (4)
    local te     = nl .. opts.table_end
    local kb, ke = opts.key_begin, opts.key_end       -- (5)
    local sep    = opts.sep .. nl                     -- (6)

    local content = ''
    for k, v in pairs(tbl) do
        local k_string = kb .. tostring(k) .. ke
        local v_string = type(v) ~= 'table' and tostring(v) or table_string(v, opts)  -- (7)
        content = content .. indent .. k_string .. v_string .. sep
    end
    return tb .. content .. te
end

1: We changed the calling signature to incorporate an optional table of formatting parameters.
2: We use the options.pretty table if opts is absent.
3: Grab the indent field from the opts table.
4: We unpack the opts table into local variables for convenience where tb is table_begin, etc.
5: We unpack the opts table into local variables for convenience where kb is key_begin, etc.
6: Localise the element separator.
7: Remember to pass the opts table to the recursive call!

We can now call print(table_string(mouse)) and get the same output as before:

{
    last = Mouse,
    first = Minnie,

}

Let’s add a set of options that is specifically for one-line output. We start with a little function to make a shallow clone of any table:

local function table_clone(tbl)
    local retval = {}
    for k,v in pairs(tbl) do retval[k] = v end
    return retval
end
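Shallow means that only the top level is copied, so a nested table is shared between the original and the clone. That is harmless for our option tables, whose fields are all strings, but it is worth seeing once. This sketch repeats table_clone so it runs on its own. The nested field is made up for the demonstration.

```lua
local function table_clone(tbl)
    local retval = {}
    for k, v in pairs(tbl) do retval[k] = v end
    return retval
end

local original = { indent = '    ', nested = { depth = 1 } }
local copy = table_clone(original)

copy.indent = ''           -- a top-level field: only the copy changes
copy.nested.depth = 2      -- a nested table: shared, so the original sees it too

print(original.indent == '    ')        --> true
print(original.nested.depth)            --> 2
print(copy.nested == original.nested)   --> true
```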

Then we can easily set up options.inline:

options.inline = table_clone(options.pretty)    -- (1)
options.inline.indent = ''                      -- (2)

1: We make a shallow copy of options.pretty and then override the fields we want to change.
2: We set indent to an empty string.

Now we can call print(table_string(mouse, options.inline)) and get:

{last = Mouse,first = Minnie,}    (1)

1: Still have that pesky trailing comma, but we’ll fix that soon.

The inline version looks cramped. One way to improve things is to add some spaces to the table delimiters and element separator:

options.inline = table_clone(options.pretty)
options.inline.indent      = ''
options.inline.table_begin = '{ '    -- (1)
options.inline.table_end   = ' }'
options.inline.sep         = ', '    -- (2)

1: Add some breathing room between the table delimiters and the content.
2: Space out the table elements.

An alternate approach is to add those spaces on the fly when needed. Some inline formats want to be as compact as possible, so we can make adding those spaces a formatting option:

options.pretty = {
    indent        = '    ',
    table_begin   = '{',
    table_end     = '}',
    key_begin     = '',
    key_end       = ' = ',
    sep           = ',',
    inline_spacer = ' '    -- (1)
}

options.inline = table_clone(options.pretty)
options.inline.indent = ''

1: As the name suggests, inline_spacer controls how generous the spacing is for the inline version of a set of formatting options.

Here’s how we use that new formatting field:

function table_string(tbl, opts)
    opts = opts or options.pretty

    local tb, te = opts.table_begin, opts.table_end
    local kb, ke = opts.key_begin, opts.key_end
    local sep    = opts.sep
    local indent = opts.indent

    local nl = indent == '' and opts.inline_spacer or '\n'    -- (1)
    sep = sep .. nl
    tb  = tb  .. nl
    te  = nl  .. te

    local content = ''
    for k, v in pairs(tbl) do
        local k_string = kb .. tostring(k) .. ke
        local v_string = type(v) ~= 'table' and tostring(v) or table_string(v, opts)
        content = content .. indent .. k_string .. v_string .. sep
    end
    return tb .. content .. te
end

1: If there is an indentation, nl is a newline character; otherwise, it’s the user-configurable spacer.

Finally, we add a couple of convenience functions that package table_string with a specific set of options:

function pretty(tbl)
    return table_string(tbl, options.pretty)
end

function inline(tbl)
    return table_string(tbl, options.inline)
end

For example, print(inline(mouse)) now prints:

{ last = Mouse, first = Minnie,  }

print(pretty(mouse)) prints:

{
    last = Mouse,
    first = Minnie,

}

Adding small facade functions like pretty and inline can make the API more user-friendly. Providing a few of these functions for everyday use cases is a good idea.

The Comma Problem

It’s time to eliminate the “comma” problem, which is done by not adding the element separator after the last element.

Let’s start with Lua arrays, which are tables you can iterate through using indices:

for i = 1, #tbl do    -- (1)
    ...
end

1: #tbl uses Lua’s built-in length operator #. For a proper sequence (an array with no holes), it returns the number of elements.

For arrays, we always know when we are at the last element.

We can replace the line that looks like this:

    ...
        content = content .. indent .. k_string .. v_string .. sep
    ...

with

    ...
        content = content .. indent .. k_string .. v_string
        if i < #tbl then content = content .. sep end    -- (1)
    ...

1: We are using i as the current element index, and if we’re at the end of the array, we avoid adding a separator.
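Put together, the array-only version can be sketched as a small stand-alone function. The name array_string is hypothetical, and the sketch handles only proper sequences of non-table values.

```lua
-- Hypothetical array-only formatter: no separator after the last element.
local function array_string(tbl, sep)
    sep = sep or ', '
    local n = #tbl                 -- length operator; reliable only for sequences
    local content = ''
    for i = 1, n do
        content = content .. tostring(tbl[i])
        if i < n then content = content .. sep end
    end
    return '{ ' .. content .. ' }'
end

print(array_string({ 'Tom', 'Dick', 'Harry' }))   --> { Tom, Dick, Harry }
print(array_string({}))                           --> {  }  (empty tables need separate handling)
```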

However, we want to handle all Lua tables, which may or may not be arrays. Unfortunately, we cannot rely on #tbl to return the number of elements in a general tbl. If we have the Lua array of strings:

local friends = { "Mickey", "Goofy" }

Then #friends will return 2.

If, instead, we have a general table that happens to have some key-value elements like:

local mouse_in_characters =
{
    'a', 'b', first = "Minnie", last = "Mouse", 'c', 'd'
}

Then #mouse_in_characters returns 4!

Even though we have deliberately written mouse_in_characters as a couple of key-value elements surrounded by straight array elements, Lua will aggregate the array elements {a, b, c, d} into an array part for the table and, under the covers, keep the two key-value elements in a separate hash map. If you try:

for i = 1, #mouse_in_characters do
    print(mouse_in_characters[i])
end

Out pops:

a
b
c
d

We cannot access the “dictionary” part of the table this way!

Lua tables can be arrays, dictionaries, or both in a single instance! This makes Lua tables very flexible, but it can also be a source of confusion. I suspect it wasn’t a great design decision, as it makes it harder to write general-purpose functions that work with arrays and dictionaries, which are very different data structures. It is what it is, and we must work with it.

Using an Extra Pass

However, we know that the pairs function will access all the table elements:

for k, v in pairs(mouse_in_characters) do
    print('key', k, 'value', v)
end

Yields

key 1       value   a        (1)
key 2       value   b
key 3       value   c
key 4       value   d
key last    value   Mouse    (2)
key first   value   Minnie

1: In practice, the “array” elements come first, in their natural order. That is how the reference implementation behaves, although the Lua manual does not guarantee it.
2: The general key-value elements come next but in an undefined order that changes from run to run.

So, for the price of an extra pass, we can compute the number of elements in any table:

local function table_size(tbl)
    local size = 0
    for _,_ in pairs(tbl) do size = size + 1 end
    return size
end

Then print(table_size(mouse_in_characters)) prints 6.
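Here is a quick side-by-side check of the length operator and table_size on the mixed table from earlier. Both are repeated here so the snippet runs on its own.

```lua
local function table_size(tbl)
    local size = 0
    for _ in pairs(tbl) do size = size + 1 end   -- count every key-value pair
    return size
end

local mouse_in_characters = {
    'a', 'b', first = "Minnie", last = "Mouse", 'c', 'd'
}

print(#mouse_in_characters)              --> 4  (array part only)
print(table_size(mouse_in_characters))   --> 6  (every key-value pair)
```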

We can use table_size in our table_string function:

function table_string(tbl, opts)
    opts = opts or options.pretty

    local tb, te = opts.table_begin, opts.table_end
    local kb, ke = opts.key_begin, opts.key_end
    local sep    = opts.sep
    local indent = opts.indent

    local nl = indent == '' and opts.inline_spacer or '\n'
    sep = sep .. nl
    tb  = tb  .. nl
    te  = nl  .. te

    local content = ''
    local i, size = 0, table_size(tbl)    -- (1)
    for k, v in pairs(tbl) do
        i = i + 1                         -- (2)
        local k_string = kb .. tostring(k) .. ke
        local v_string = type(v) ~= 'table' and tostring(v) or table_string(v, opts)
        content = content .. indent .. k_string .. v_string
        if i < size then content = content .. sep end    -- (3)
    end
    return tb .. content .. te
end

1: i is the current element index, running from 1 to size.
2: Increment the element “index”.
3: Add the separator if we are not at the last element.

With this version:

print(pretty(mouse))

Yields:

{
    first = Minnie,
    last = Mouse    (1)
}

1: Yeah! That extra comma is gone!

print(inline(mouse)) is also correct:

{ first = Minnie, last = Mouse }

Using a Guard

Using the table_size function means we make an extra pass through the table.

We can avoid the extra pass by using a guard variable. While we cannot know when we are at the last element, we do know when we are at the first element. All elements except the first element have a preceding element separator. With that in mind, we can rearrange the main loop in table_string:

function table_string(tbl, opts)
    opts = opts or options.pretty

    local tb, te = opts.table_begin, opts.table_end
    local kb, ke = opts.key_begin, opts.key_end
    local sep    = opts.sep
    local indent = opts.indent

    local nl = indent == '' and opts.inline_spacer or '\n'
    sep = sep .. nl
    tb  = tb  .. nl
    te  = nl  .. te

    local content = ''
    local first_element = true    -- (1)
    for k, v in pairs(tbl) do
        if first_element then first_element = false else content = content .. sep end    -- (2)
        local k_string = kb .. tostring(k) .. ke
        local v_string = type(v) ~= 'table' and tostring(v) or table_string(v, opts)
        content = content .. indent .. k_string .. v_string
    end
    return tb .. content .. te
end

1: We initialize first_element to true.
2: If we’re not at the first element, we start by adding an element-end delimiter before the current element.

This is a common idiom, in Lua and elsewhere, for iterations where you would otherwise need to do something special for the final element. Instead, you do something special for the first element and then do the usual thing for all subsequent elements.
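The guard idiom is not tied to table_string. Here is a sketch of a stand-alone helper, hypothetically named join, that shows the pattern in isolation: the separator goes before every element except the first.

```lua
-- Hypothetical helper: join the values of any table using a first-element guard.
-- Because it uses pairs, it also works on dictionaries, where we cannot know
-- in advance which element will come last.
local function join(tbl, sep)
    local result = ''
    local first = true
    for _, v in pairs(tbl) do
        if first then first = false else result = result .. sep end
        result = result .. tostring(v)
    end
    return result
end

print(join({ first = 'Minnie', last = 'Mouse' }, ', '))  -- Minnie, Mouse or Mouse, Minnie
print(join({}, ', '))                                    -- an empty line: no stray separator
```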

This code version avoids the extra pass and still eliminates the trailing comma.

print(pretty(mouse))

Yields:

{
    first = Minnie,
    last = Mouse
}

Computing the size of tbl does require an extra pass. However, as we shall see shortly, we can use that pass to gather other useful information, so we are happy enough to pay the price of some extra compute cycles.

Empty Tables

We have one more issue to address. print(pretty({})) prints:

{

}

print(inline({})) prints:

{  }

We would prefer to see {} in both cases. If we know the size of tbl, we can add a quick check for an early return at the top of the function:

local function empty_table_string(opts)                                  -- (1)
    local retval = (opts.table_begin .. opts.table_end):gsub('%s+', '')  -- (2)
    return retval
end

function table_string(tbl, opts)
    opts = opts or options.pretty

    local size = table_size(tbl)
    if size == 0 then return empty_table_string(opts) end    -- (3)

    local tb, te = opts.table_begin, opts.table_end
    local kb, ke = opts.key_begin, opts.key_end
    local sep    = opts.sep
    local indent = opts.indent

    local nl = indent == '' and opts.inline_spacer or '\n'
    sep = sep .. nl
    tb  = tb  .. nl
    te  = nl  .. te
    local content = ''
    local i = 0
    for k, v in pairs(tbl) do
        i = i + 1
        local k_string = kb .. tostring(k) .. ke
        local v_string = type(v) ~= 'table' and tostring(v) or table_string(v, opts)
        content = content .. indent .. k_string .. v_string
        if i < size then content = content .. sep end
    end
    return tb .. content .. te
end

1: We add a helper function to return a string for an empty table, taking into account the table delimiters.
2: It does that by concatenating the table delimiters and then using gsub to remove all whitespace.
3: In our table_string function, we look for an early exit for empty tables.

With this change in place, print(pretty({})) and print(inline({})) both return {}.
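Why does empty_table_string assign the gsub result to a local before returning it? string.gsub returns two values: the new string and the number of substitutions it made. Returning the call directly would pass both along, so print(pretty({})) would show a stray count after the braces. Here is a small sketch comparing the two versions. leaky_empty_table_string is a hypothetical name for the version to avoid.

```lua
local function empty_table_string(opts)
    -- Assigning to a local keeps only gsub's first result (the string).
    local retval = (opts.table_begin .. opts.table_end):gsub('%s+', '')
    return retval
end

-- Hypothetical "don't do this" version: returns the string AND the count.
local function leaky_empty_table_string(opts)
    return (opts.table_begin .. opts.table_end):gsub('%s+', '')
end

local inline_opts = { table_begin = '{ ', table_end = ' }' }

print(empty_table_string(inline_opts))                      --> {}
print(leaky_empty_table_string(inline_opts))                -- prints {} followed by a tab and 1
print(select('#', leaky_empty_table_string(inline_opts)))   --> 2
```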

Arrays vs. Tables

Lua has one type of table. It can be an array, a dictionary, or a mix of both. Under the covers, Lua keeps the array part separate from the dictionary part for efficiency.

Most programming languages have a distinct array type, and differentiating between arrays and dictionaries is often crucial.

For example, JSON is a popular human-readable data exchange format with a separate array type. In JSON, arrays are always ordered and have implicit keys that are consecutive integers. They are represented by square brackets [ ... ] to distinguish them from dictionaries represented by curly braces { ... }.

We can easily write a small function to determine whether a table is an array or a dictionary:

local function table_is_array(tbl)
    local size = 0
    for _,_ in pairs(tbl) do
        size = size + 1
1        if tbl[size] == nil then return false end
    end
2    return true
end

1: Arrays are indexed by consecutive integers from 1. If we find a hole, we know that tbl is not an array.
2: If we make it through the loop without finding a hole, we know that tbl is an array.
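Why is this check enough? After the k-th pass through the loop, we have checked every index from 1 to k. If the table has n entries and indices 1 to n are all present, those n entries must be exactly the keys 1 to n, so the table is an array. A quick check on a few inputs:

```lua
local function table_is_array(tbl)
    local size = 0
    for _, _ in pairs(tbl) do
        size = size + 1
        -- A missing index 1..size means there is a hole or a non-integer key.
        if tbl[size] == nil then return false end
    end
    return true
end

print(table_is_array({ 'Mickey', 'Goofy' }))        --> true
print(table_is_array({ first = 'Minnie' }))         --> false
print(table_is_array({ [1] = 'a', [3] = 'c' }))     --> false (hole at index 2)
print(table_is_array({ 'a', first = 'Minnie' }))    --> false (mixed table)
print(table_is_array({}))                           --> true
```

Note that the empty table passes vacuously. That is harmless here, because table_string deals with empty tables before it ever looks at the array flag.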

If tbl is a Lua array, a complete pass through tbl is required to confirm it is an array. We can add the check to our existing table_size function, which we rename metadata:

local function metadata(tbl)
    local size = 0
1    local array = true
    for _,_ in pairs(tbl) do
        size = size + 1
2        if array and tbl[size] == nil then array = false end
    end
3    return size, array
end

1: We assume tbl is an array until we find otherwise.
2: If we find a “hole”, then tbl is not an array.
3: Return both the computed size and array values.

Lua functions can return multiple values. This feature can be handy, but you don’t want to overdo it, as the function’s caller needs to get the order of the returned values right. Correct ordering is not a problem for two or even three values. After that, it is best to put the returns in a name-value table.
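Here is a small sketch of both styles. count_and_sum and summary are hypothetical helpers written only for this illustration.

```lua
-- Hypothetical helper returning two values. Callers must get the order right.
local function count_and_sum(arr)
    local n, total = 0, 0
    for _, x in ipairs(arr) do
        n, total = n + 1, total + x
    end
    return n, total
end

local n, total = count_and_sum({ 1, 2, 3 })
print(n, total)                     --> 3  6

-- With more results, a name-value table frees callers from remembering an order.
local function summary(arr)
    local count, sum = count_and_sum(arr)
    return { count = count, sum = sum, mean = count > 0 and sum / count or nil }
end

local s = summary({ 1, 2, 3 })
print(s.count, s.sum, s.mean)
```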

We use the name metadata to indicate that we are returning more than the table size. We will add other bits of metadata as we go along. Do not confuse this with Lua’s metatable concept, which allows you to override the behaviour of standard operators like +, -, etc., and of functions like tostring and print.
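To make the contrast concrete, here is a minimal metatable example. Point is a hypothetical table that defines the __tostring and __add metamethods, so tostring, print and + all pick up custom behaviour for tables that use it as their metatable:

```lua
local Point = {}

-- Called by tostring (and therefore by print).
Point.__tostring = function(p)
    return '(' .. p.x .. ', ' .. p.y .. ')'
end

-- Called when + is applied to a table with this metatable.
Point.__add = function(a, b)
    return setmetatable({ x = a.x + b.x, y = a.y + b.y }, Point)
end

local p = setmetatable({ x = 1, y = 2 }, Point)
print(p)          --> (1, 2)
print(p + p)      --> (2, 4)
```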

We can add some array delimiters to our option tables:

options.pretty = {
    indent        = '    ',
    table_begin   = '{',
    table_end     = '}',
1    array_begin   = '[',
    array_end     = ']',
    key_begin     = '',
    key_end       = ' = ',
    sep           = ',',
    inline_spacer = ' '
}

1: We will differentiate arrays by using square bracket delimiters.

Let’s put the new metadata method to use in the main event:

function table_string(tbl, opts)
    opts = opts or options.pretty

1    local size, array = metadata(tbl)
    if size == 0 then return empty_table_string(opts) end

2    local tb     = array and opts.array_begin or opts.table_begin
    local te     = array and opts.array_end or opts.table_end
    local kb, ke = opts.key_begin, opts.key_end
    local sep    = opts.sep
    local indent = opts.indent

    local nl = indent == '' and opts.inline_spacer or '\n'
    sep = sep .. nl
    tb  = tb  .. nl
    te  = nl  .. te

    local content = ''
    local i = 0
    for k, v in pairs(tbl) do
        i = i + 1
        local k_string = kb .. tostring(k) .. ke
        local v_string = type(v) ~= 'table' and tostring(v) or table_string(v, opts)
        content = content .. indent .. k_string .. v_string
        if i < size then content = content .. sep end
    end
    return tb .. content .. te
end

1: metadata returns the size of tbl and whether it is an array.
The order of the two return values is fixed.
2: We can pick suitable table delimiters depending on whether tbl is an array.

Now print(pretty(mouse)) returns:

{
    last = Mouse,
    first = Minnie
}

while print(pretty(friends)) returns:

1[
2    1 = Mickey,
    2 = Goofy
]

1: Arrays are now delimited with square brackets.
2: However, we are outputting the array indices 1, 2, ..., which is generally unnecessary.

Lua has “keys” for all table elements. In the case of arrays, those keys are the array indices, which are consecutive integers starting at 1. You don’t usually need to see those, so we alter our function to show keys only if tbl is not an array.

function table_string(tbl, opts)
    ...
    for k, v in pairs(tbl) do
        ...
1        if not array then content = content .. kb .. tostring(k) .. ke end
        ...
    end
    ...
    return tb .. content .. te
end

1: Now, we don’t show keys for array tables.

Now print(pretty(friends)) returns:

[
    Mickey,
    Goofy
]

The output from print(pretty(mouse)) remains unchanged:

{
    last = Mouse,
    first = Minnie
}

Sometimes you do need to see the keys of an array, for example, when you are debugging and want to check the array indices. Let’s add an option to show the keys for arrays:

options.pretty = {
    indent        = '    ',
    table_begin   = '{',
    table_end     = '}',
    array_begin   = '[',
    array_end     = ']',
    key_begin     = '',
    key_end       = ' = ',
    sep           = ',',
    inline_spacer = ' ',
1    show_indices  = false
}

1: Typically, we suppress seeing array indices.

The corresponding change to table_string is straightforward:

function table_string(tbl, opts)
    opts = opts or options.pretty

    local size, array = metadata(tbl)
    if size == 0 then return empty_table_string(opts) end
1    local show_keys = not array and true or opts.show_indices
    ...
    for k, v in pairs(tbl) do
        i = i + 1
2        local k_string = show_keys and kb .. tostring(k) .. ke or ''
        ...
    end
    ...
    return tb .. content .. te
end

1: We set show_keys to true unless we are dealing with an array, in which case we use whatever is dictated by opts.show_indices.
2: We only show keys if show_keys is true.
That is always the case for non-arrays and is user-settable for arrays.

With that change, print(inline(friends)) returns [ Mickey, Goofy ]. If you set opts.show_indices = true, then print(inline(friends)) returns [ 1 = Mickey, 2 = Goofy ].
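The show_keys line relies on Lua’s cond and x or y idiom, which acts like a ternary operator only when x is truthy. In table_string, the middle value is the literal true, so the idiom is safe, and the expression is equivalent to the simpler not array or opts.show_indices. The sketch below, with hypothetical helpers pick and show_keys, shows both the pitfall and the equivalence:

```lua
-- A hypothetical helper that spells out the and/or idiom.
local function pick(cond, x, y)
    return cond and x or y
end

print(pick(true, 'x', 'y'))     --> x
print(pick(false, 'x', 'y'))    --> y
print(pick(true, false, 'y'))   --> y   (the pitfall: x is false, so y wins)

-- A hypothetical helper mirroring the show_keys line in table_string.
local function show_keys(array, show_indices)
    return not array and true or show_indices
end

print(show_keys(false, false))  --> true  (key-value tables always show keys)
print(show_keys(true, false))   --> false
print(show_keys(true, true))    --> true
```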

Finally, let’s add a couple of sets of formatting options that don’t include separate array delimiters. This is the style you most often see in Lua code, so it is handy to have it available.

options.classic = table_clone(options.pretty)
1options.classic.array_begin = '{'
options.classic.array_end   = '}'

2function classic(tbl)
    return table_string(tbl, options.classic)
end

1: All tables use the same delimiters { ... }.
2: We add a convenience function, classic, that uses options.classic.

Now print(classic(friends)) returns:

{
    Mickey,
    Goofy
}

Adding Indentation

Earlier, we noted that while our solution handles nested sub-tables by recursion, it gets the indentation wrong in the process.

Suppose we introduce a table that captures Minnie’s “user profile” and try to print it:

local user =
{
    first = "Minnie",
    last = "Mouse",
1    friends = { "Mickey", "Goofy" }
}

1: Minnie’s friends are captured in an array.

Then, print(pretty(user)) might yield:

{
    first = Minnie,
1    friends = [
    Mickey,
    Goofy
],
    last = Mouse
}

1: We see friends as a nice array, but the indentation is incorrect.

Ideally, we’d like to see:

{
    friends = [
        Mickey,
        Goofy
    ],
    first = Minnie,
    last = Mouse
}

Our current output is readable, but it becomes less so as tables grow larger and more deeply nested. Deeper nesting requires more indentation! We had better fix that next.

The most straightforward idea is to add indentation to the string returned from the recursive call table_string(v, opts).

We can make a function that adds indentation line-by-line to any Lua string:

local function indent_string(str, indent)
1    if not indent or indent == "" or not str or str == "" then return str end
2    local ends_with_newline = str:sub(-1) == "\n"
    local indented_str = ""
3    local first_line = true
4    for line in str:gmatch("([^\n]*)\n?") do
5        if not first_line then indented_str = indented_str .. "\n" end
        indented_str = indented_str .. indent .. line
        first_line = false
    end
6    if ends_with_newline then indented_str = indented_str .. "\n" end
    return indented_str
end

1: Handle some edge cases, as we do not need to do anything if the indent is the empty string. This check allows downstream methods to call indent_string without worrying that it will do something stupid.
2: We will add the indentation line-by-line. If the input str ends with a new line, the output should also.
3: This looks like that guard “trick” we discussed earlier.
4: Here, we iterate through str line-by-line with an unknown number of hits using Lua’s pattern search function gmatch.
5: Add newline characters to all but the first line.
6: Match the input — if it ends with a new line, the output will also.

Aside: Lua Patterns

The gmatch method added to the string class is another type of iterator. In this case, it looks for a pattern in the string str and returns the next match. When it can find no more matches, it returns nil and the iteration loop finishes.

Lua string patterns are like regular expressions in other languages, though they use fewer features. For example, if we have the string "ho, ho, ho" then the pattern "ho" matches the literal character 'h' followed immediately by 'o'. We might use it like this:

local str = "ho, ho, ho"
local count = 0
for _ in str:gmatch("ho") do
    count = count + 1
    print("Found", count)
end

That will output:

Found 1
Found 2
Found 3

Of course, if gmatch and friends could only find literal matches, they wouldn’t be powerful enough for most applications. While Lua’s pattern-matching library is slim, fortunately, it’s not that slim. Lua patterns can encompass classes of characters instead of literal ones.

In the indent_string function, the pattern we successively match on is "([^\n]*)\n?". This has many characteristic elements of a regular expression: it is terse and full of punctuation characters!

If you remove the parentheses, you have "[^\n]*\n?". In patterns, square brackets create a character set, so "[xyz]" matches a single 'x' or 'y' or 'z'. A caret '^' immediately after the opening bracket complements the set, so "[^\n]" matches any single character except the newline character '\n'. (Outside a set, a caret at the very start of a pattern means something different: it anchors the match to the start of the subject string.) The next magic character, '*', means “zero or more repetitions of the previous item, as many as possible,” so "[^\n]*" matches a run of characters up to, but not including, the next newline. Finally, the magic character '?' means “zero or one occurrence of the previous item,” so "\n?" says the newline character is optional.

In all, the "[^\n]*\n?" pattern matches a run of non-newline characters followed by an optional newline. Because gmatch resumes each search where the previous match ended, successive matches walk through str one line at a time, each match ending at a newline character or at the end of the string.

The only thing missing is telling the pattern-matching engine which bits of the pattern constitute the substring we want. What should the pattern matcher capture?

That is what the parentheses are used for. The engine will capture whatever you put inside parentheses. In this case, the parentheses in "([^\n]*)\n?" surround the run of non-newline characters, so we capture the text of each line without its trailing newline. In other words, we capture a line in the string. The g in gmatch stands for “global,” so it doesn’t stop at the first line but keeps iterating through the whole string line by line.
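We can check each piece of the pattern with string.match, which returns the captures from the first match (or the whole match when the pattern has no captures), and with string.find, which also reports where the match starts and ends:

```lua
local text = "Mickey\nGoofy"

-- "[xyz]" is a set: it matches a single 'x', 'y' or 'z'.
print(("zebra"):match("[xyz]"))                 --> z

-- "[^\n]" is a complemented set (any character but a newline), and "*"
-- repeats it as often as possible, so the match stops at the newline.
print(text:match("[^\n]*"))                     --> Mickey

-- "\n?" optionally consumes the newline. With no captures, match returns
-- the whole match, newline included.
print(text:match("[^\n]*\n?") == "Mickey\n")    --> true

-- The parentheses capture just the line, without its newline.
print(text:match("([^\n]*)\n?"))                --> Mickey

-- string.find also reports where the whole match starts and ends.
print(text:find("([^\n]*)\n?"))                 --> 1  7  Mickey
```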

Indenting Tables

With the indent_string method in place, we can rewrite our primary function:

function table_string(tbl, opts)
    ...
    for k, v in pairs(tbl) do
        ...
        local v_string = ''
        if type(v) == 'table' then
            v_string = table_string(v, opts)
            v_string = indent_string(v_string, indent)
        else
            v_string = tostring(v)
        end
        ...
    end
    ...
    return tb .. content .. te
end

With those changes, we can call print(pretty(user)) and get:

{
    friends =     [
        Mickey,
        Goofy
    ],
    last = Mouse,
    first = Minnie
}

The elements in the friends array are now indented correctly, but the opening bracket is also indented.

We can alter our indent_string function to ignore the first line optionally:

1local function indent_string(str, indent, ignore_first_line)
2    ignore_first_line = ignore_first_line or false
    if not indent or indent == "" or not str or str == "" then return str end
    local ends_with_newline = str:sub(-1) == "\n"
    local indented_str = ""
    local first_line = true
    for line in str:gmatch("([^\n]*)\n?") do
        if not first_line then indented_str = indented_str .. '\n' end
        local tab = first_line and ignore_first_line and '' or indent
        indented_str = indented_str .. tab .. line
        first_line = false
    end
    if ends_with_newline then indented_str = indented_str .. "\n" end
    return indented_str
end

1: We have added an optional boolean parameter ignore_first_line to the function.
2: If the user doesn’t provide a value for ignore_first_line, we default to false.
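The indent_string above depends on how Lua 5.4’s gmatch handles empty matches (the %p format specifier used later in this article also assumes Lua 5.4). Before Lua 5.4, gmatch with "([^\n]*)\n?" also yields an empty match at the very end of the string, which would add a stray indented line. If you need the same results on older versions, here is a sketch of a variant, our own rather than scribe’s code: it drops one trailing newline, appends a sentinel "\n" so that every match consumes at least one character, and joins the lines with table.concat. The usage shows ignore_first_line at work on the friends sub-table:

```lua
-- Portable variant of indent_string (a sketch, not scribe's code).
local function indent_string(str, indent, ignore_first_line)
    if not indent or indent == '' or not str or str == '' then return str end
    local ends_with_newline = str:sub(-1) == '\n'
    -- Drop one trailing newline; it is restored at the end.
    local body = ends_with_newline and str:sub(1, -2) or str
    local out = {}
    -- The sentinel "\n" means every match ends in a newline, so gmatch
    -- never produces an empty final match on any Lua version.
    for line in (body .. '\n'):gmatch('([^\n]*)\n') do
        local tab = (#out == 0 and ignore_first_line) and '' or indent
        out[#out + 1] = tab .. line
    end
    local result = table.concat(out, '\n')
    if ends_with_newline then result = result .. '\n' end
    return result
end

local friends = '[\n    Mickey,\n    Goofy\n]'
print('friends = ' .. indent_string(friends, '    ', true))
--> friends = [
-->         Mickey,
-->         Goofy
-->     ]
```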

With those changes, we can call print(pretty(user)) and get:

{
    friends = [
        Mickey,
        Goofy
    ],
    last = Mouse,
    first = Minnie
}

The inline format print(inline(user)) is also correct:

{ last = Mouse, first = Minnie, friends = [ Mickey, Goofy ] }

Other Output Formats

We will look at a few other formats commonly used for viewing tables.

Indentation Only

Another commonly used multiline table format avoids delimiters and instead relies on indentation to show the structure. Here is how our user table would look in this format:

last: Mouse,
first: Minnie,
1friends:
    Mickey,
    Goofy

1: This all looks straightforward, but this format is tricky to implement.

We add a new set of formatting options for this format:

options.alt = table_clone(options.pretty)
options.alt.table_begin = ''
options.alt.table_end   = ''
options.alt.array_begin = ''
options.alt.array_end   = ''
options.alt.key_end     = ': '

Nothing too wild here; we start with options.pretty and set the table/array delimiters to blank strings. We also set up colons to act as the assignment operators.

We also add the usual convenience function that packages those formatting options with table_string:

function alt(tbl)
    return table_string(tbl, options.alt)
end

If we try print(alt(user)) we get something like:

1    first: Minnie,
    last: Mouse,
    friends:
        Mickey,
        Goofy

1: An extra indentation layer isn’t needed when the table delimiters are blank.
2: There are also some extra newlines at the end of the output.

A first attempt at fixing this format is to remove the indentation from the top-level elements. We can do this by adding a check for a blank table begin-delimiter:

function table_string(tbl, opts)
    opts = opts or options.pretty

    local size, array = metadata(tbl)
    if size == 0 then return empty_table_string(opts) end
    local show_keys = not array and true or opts.show_indices

    local tb     = array and opts.array_begin or opts.table_begin
    local te     = array and opts.array_end or opts.table_end
    local kb, ke = opts.key_begin, opts.key_end
    local sep    = opts.sep
    local indent = opts.indent
    local nl     = indent == '' and opts.inline_spacer or '\n'

1    if tb ~= '' then tb = tb .. nl end
2    if te ~= '' then te = nl .. te end
3    sep = sep .. nl

    local no_delims = tb == ''
4    if no_delims then indent = '' end

    local content = ''
    local i = 0
    for k, v in pairs(tbl) do
        i = i + 1
        local k_string = show_keys and kb .. tostring(k) .. ke or ''
        local v_string = ''
        if type(v) == 'table' then
            v_string = table_string(v, opts)
5            v_string = indent_string(v_string, opts.indent, true)
        else
            v_string = tostring(v)
        end
        content = content .. indent .. k_string .. v_string
        if i < size then content = content .. sep end
    end
    return tb .. content .. te
end

1: We add a new line to the table begin-delimiter if we use multiline output and the table begin-delimiter is not blank.
2: We add a new line to the table end-delimiter if we use multiline output and the table end-delimiter is not blank.
3: We add a new line to the separator if we are using multiline output.
4: If the table begin-delimiter is blank, we don’t indent the top-level elements in tbl.
5: We still indent any sub-table elements with the “real” indentation amount from the formatting options.

With that in place, print(alt(user)) returns something unindented at the outermost level and without the extra newlines at the end:

first: Minnie,
last: Mouse,
1friends: Mickey,
    Goofy

1: There should be a new line after friends here.

We are missing a newline character before the sub-array of friends. It should only be present if the table is multiline and the begin-delimiter is blank. This suggests a small addition to the table_string function:

function table_string(tbl, opts)
    ...
    for k, v in pairs(tbl) do
        ...
        if type(v) == 'table' then
            v_string = table_string(v, opts)
1            if tb == '' then v_string = nl .. v_string end
        ...
    end
    return tb .. content .. te
end

1: The suggested fix.

This doesn’t quite work as expected, though; print(alt(user)) now returns:

last: Mouse,
first: Minnie,
friends:
1Mickey,
    Goofy

1: We’re missing an indentation on the Mickey line.

Fortunately, we can fix this by using that third ignore_first_line argument in indent_string:

function table_string(tbl, opts)
    ...
    for k, v in pairs(tbl) do
        ...
        if type(v) == 'table' then
            v_string = table_string(v, opts)
1            v_string = indent_string(v_string, opts.indent, not no_delims)
2            if no_delims and show_keys then v_string = nl .. v_string end
        ...
    end
    return tb .. content .. te
end

1: We skip indenting the first line of the sub-table unless the table begin-delimiter is blank.
2: We add a newline character if the table begin-delimiter is blank and we are showing keys.

The full table_string function now looks like:

function table_string(tbl, opts)
    opts = opts or options.pretty

    local size, array = metadata(tbl)
    if size == 0 then return empty_table_string(opts) end
    local show_keys = not array and true or opts.show_indices

    local tb     = array and opts.array_begin or opts.table_begin
    local te     = array and opts.array_end or opts.table_end
    local kb, ke = opts.key_begin, opts.key_end
    local sep    = opts.sep
    local indent = opts.indent
    local nl     = indent == '' and opts.inline_spacer or '\n'

    sep = sep .. nl
    if tb ~= '' then tb = tb .. nl end
    if te ~= '' then te = nl .. te end

    local no_delims = tb == ''
    if no_delims then indent = '' end

    local content = ''
    local i = 0
    for k, v in pairs(tbl) do
        i = i + 1
        local k_string = show_keys and kb .. tostring(k) .. ke or ''
        local v_string = ''
        if type(v) == 'table' then
            v_string = table_string(v, opts)
            v_string = indent_string(v_string, opts.indent, not no_delims)
            if no_delims and show_keys then v_string = nl .. v_string end
        else
            v_string = tostring(v)
        end
        content = content .. indent .. k_string .. v_string
        if i < size then content = content .. sep end
    end
    return tb .. content .. te
end

With this change in place print(alt(user)) returns something like:

1last: Mouse,
first: Minnie,
friends:
    Mickey,
    Goofy

1: The elements can be ordered differently.

The other formats still work as expected. print(pretty(user)) returns:

{
    last = Mouse,
    first = Minnie,
    friends = [
        Mickey,
        Goofy
    ]
}

print(inline(user)) returns:

{ last = Mouse, first = Minnie, friends = [ Mickey, Goofy ] }

JSON

JSON is a popular format for exchanging data between systems. Like our pretty format, JSON delimits tables with curly braces and arrays with square brackets. It surrounds keys with double quotes and uses colons to separate keys from values.

Let’s add a new set of formatting options for JSON:

options.json = table_clone(options.pretty)
options.json.key_begin = '"'
options.json.key_end   = '": '

We also add the usual convenience function that packages those formatting options with table_string:

function json(tbl)
    return table_string(tbl, options.json)
end

If we try print(json(user)) we get:

{
    "last": Mouse,
    "first": Minnie,
    "friends": [
        Mickey,
        Goofy
    ]
}

This isn’t quite JSON, as JSON requires string values to be surrounded by double quotes.

In fact, it is a good idea to always surround string values with double quotes. Lua’s string class has a string.format method that is perfect for this task.

For example, string.format("Hello, %s!", "world") returns "Hello, world!". The %s is a placeholder for a string value that is passed as a trailing argument to string.format. string.format is a wrapper around the venerable C function sprintf and uses almost all the same format specifiers. So %s is used for strings, %d for integers, and %f for floating-point numbers etc.

One of Lua’s primary use cases is dealing with large amounts of text that often includes multiline strings. It is useful to be able to see those in their raw form. For that reason, Lua has a special format specifier %q that is used to quote strings. It is similar to %s but it adds double quotes around the string and escapes any special characters. For example, string.format("%q", 'Hello, "world"!') returns '"Hello, \"world\"!"'.
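A few string.format calls show the specifiers mentioned above. The last one illustrates how %q writes a newline: as a backslash followed by a real newline, which Lua can read back as a string literal.

```lua
print(string.format("Hello, %s!", "world"))      --> Hello, world!
print(string.format("%d friends", 2))            --> 2 friends
print(string.format("%.2f", 3.14159))            --> 3.14
print(string.format("%q", 'Hello, "world"!'))    --> "Hello, \"world\"!"

-- %q escapes a newline as a backslash followed by an actual newline.
local quoted = string.format("%q", "two\nlines")
print(quoted == '"two\\\nlines"')                --> true
```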

We can use this format specifier to good effect. While we are at it, we will add a simple_string counterpart to table_string to take any Lua object and return a simple string representation.

1local function simple_string(obj)
    if obj == nil then return 'nil' end
    local obj_type = type(obj)
2    if obj_type == 'number' or obj_type == 'boolean' then
        return tostring(obj)
    elseif obj_type == 'string' then
3        return string.format("%q", obj)
    elseif obj_type == 'table' then
4        return string.format("%p", obj)
    elseif obj_type == 'function' then
        return '<function>'
    elseif obj_type == 'userdata' then
        return '<userdata>'
    elseif obj_type == 'thread' then
        return '<thread>'
    else
5        return '<UNKNOWN type: ' .. tostring(obj) .. '>'
    end
end

1: The new function simple_string takes any Lua object and returns a simple string representation of it.
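2: We let tostring handle numbers and booleans. Note that type returns the string 'nil' for a nil value, never nil itself, which is why nil gets its own check on the first line.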
3: We use string.format with the %q format specifier to quote strings.
4: We use string.format with the %p format specifier (added in Lua 5.4) to print the memory address of a table.
We will usually defer table conversion to table_string.
5: We should never reach this point, but we add a catch-all for any unknown types that future Lua versions might introduce.

We can now use simple_string in our table_string function:

function table_string(tbl, opts)
    ...
    local i, content = 0, ''
    for k, v in pairs(tbl) do
        i = i + 1
1        local k_string = show_keys and kb .. tostring(k) .. ke or ''
        local v_string = ''
        if type(v) == 'table' then
            ...
        else
2            v_string = simple_string(v)
        end
        ...
    end
    return tb .. content .. te
end

1: We still use tostring to convert keys to strings and rely on key delimiters to add quotes if needed.
2: We use simple_string to convert non-table values to strings, so string values always get double quotes.

With this change in place print(json(user)) returns:

{
    "last": "Mouse",
    "first": "Minnie",
    "friends": [
        "Mickey",
        "Goofy"
    ]
}
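One caveat: %q produces Lua escapes, not JSON escapes. The two agree for simple strings like the names in user, but a newline comes out as a backslash followed by an actual newline, and other control characters become Lua decimal escapes such as \1, none of which is valid inside a JSON string. If you need strictly valid JSON, a hypothetical json_quote helper along these lines could stand in for the %q call in the JSON formats. It is only a sketch, and it assumes the input is valid UTF-8, which JSON allows unescaped:

```lua
-- JSON's short escapes; any other control character becomes \u00XX.
local json_escapes = {
    ['"'] = '\\"', ['\\'] = '\\\\', ['\b'] = '\\b', ['\f'] = '\\f',
    ['\n'] = '\\n', ['\r'] = '\\r', ['\t'] = '\\t',
}

-- Hypothetical helper: quote a Lua string as a JSON string literal.
local function json_quote(str)
    -- %c matches control characters; the set also catches quotes and backslashes.
    local escaped = str:gsub('[%c"\\]', function(c)
        return json_escapes[c] or string.format('\\u%04x', c:byte())
    end)
    return '"' .. escaped .. '"'
end

print(json_quote('Minnie'))          --> "Minnie"
print(json_quote('say "hi"\n'))      --> "say \"hi\"\n"
print(json_quote('a\1b'))            --> "a\u0001b"
```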

Compact JSON

While JSON is often used in its pretty format, it is common to use a more compact format where all extra spaces and newlines are removed.

We can add a new set of formatting options for inline JSON:

options.inline_json = table_clone(options.json)
options.inline_json.indent        = ''
options.inline_json.key_end       = '":'
1options.inline_json.inline_spacer = ''

1: In this case, we remove the inline spacer as well to make the output even more compact.

We also add the usual convenience function that packages those formatting options with table_string:

function inline_json(tbl)
    return table_string(tbl, options.inline_json)
end

If we try print(inline_json(user)) we get:

{"last":"Mouse","first":"Minnie","friends":["Mickey","Goofy"]}

This is also valid JSON, but it is harder for humans to read.

Debug Format

We can add a set of formatting options that makes the structure of the table explicit. This can be useful when you are trying to add a custom set of formatting options:

options.debug = table_clone(options.pretty)
options.debug.indent        = ' INDENT '
options.debug.table_begin   = 'TABLE BEGIN'
options.debug.table_end     = 'TABLE END'
options.debug.array_begin   = 'ARRAY BEGIN'
options.debug.array_end     = 'ARRAY END'
options.debug.key_begin     = ' KEY BEGIN '
options.debug.key_end       = ' KEY END = '
options.debug.sep           = ' SEP '
options.debug.show_indices  = true

As usual, we add the convenience function that packages those formatting options with table_string:

function debug(tbl)
    return table_string(tbl, options.debug)
end

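Be aware that this global function replaces Lua’s standard debug library, so in real code you would make it local or store it in a module table. If we try print(debug(user)) we get: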

TABLE BEGIN
 INDENT  KEY BEGIN first KEY END = "Minnie" SEP
 INDENT  KEY BEGIN last KEY END = "Mouse" SEP
 INDENT  KEY BEGIN friends KEY END = ARRAY BEGIN
 INDENT  INDENT  KEY BEGIN 1 KEY END = "Mickey" SEP
 INDENT  INDENT  KEY BEGIN 2 KEY END = "Goofy"
 INDENT ARRAY END
TABLE END

Ordered Output

Lua has a single table type. However, as we have discussed several times now, under the covers Lua distinguishes between the array part of a table and any dictionary part it might contain. The elements in a Lua array have a fixed order, so that if:

local arr = { 'a', 'b', 'c' }

Then, print(inline(arr)) prints [ "a", "b", "c" ], with the elements in index order.

In contrast, the element order in a general key-value table is not defined or constant. If we have:

local mouse = { first = 'Minnie', last = 'Mouse' }

Then, print(inline(mouse)) will sometimes display { last = "Mouse", first = "Minnie" } and other times { first = "Minnie", last = "Mouse" }.

Jumping around like that can be disconcerting.

So far, we have used the Lua standard pairs function to traverse through the key-value pairs in all tables.

    for k, v in pairs(tbl) do
        ...
    end

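Lua provides an efficient iterator function, ipairs, specifically for arrays. Unlike pairs, whose traversal order the Lua manual leaves unspecified even for numeric keys, ipairs is guaranteed to visit the elements in index order. We can alter our iteration based on whether the table is an array or a key-value table and get a little performance boost.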

    local iter = array and ipairs or pairs
    for k, v in iter(tbl) do
        ...
    end
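One subtlety is worth spelling out: ipairs stops at the first nil, while pairs visits every key. The table below has a hole, so the two loops count different numbers of elements. Because metadata only reports a table as an array when indices 1 through size are all present, switching to ipairs for arrays never skips an element.

```lua
-- The constructor stores keys 1, 2 and 4; index 3 is a hole.
local holey = { 'a', 'b', nil, 'd' }

local via_ipairs, via_pairs = 0, 0
for _ in ipairs(holey) do via_ipairs = via_ipairs + 1 end   -- stops at index 3
for _ in pairs(holey) do via_pairs = via_pairs + 1 end      -- visits all keys

print(via_ipairs, via_pairs)    --> 2  3
```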

Of course, ipairs doesn’t solve the problem of inconsistent output for key-value tables.

Fortunately, Lua lets us define custom iterator functions, and we can create one to iterate over the keys in a consistent order.

1    local iter = array and ipairs or ordered_pairs
    for k, v in iter(tbl) do
        ...
    end

1: We have replaced the standard pairs iterator with a custom ordered_pairs function.
We still use ipairs for arrays.

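A custom iterator is usually created by a factory function, just as pairs creates one: the factory is passed the table and returns an iterator function. The generic for loop then calls that iterator repeatedly, and each call should return the “next” key-value pair. The loop ends when the iterator returns nil as its first value. You are free to determine what “next” means in this context.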
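To see the protocol the generic for loop follows, here is a sketch of a stateless iterator. reverse_ipairs and reverse_step are hypothetical helpers that walk an array from its last element to its first. The factory returns three values: the iterator function, a state value (the array) and an initial control value. The loop calls the iterator with the state and the latest control value, and stops when the first returned value is nil. The ordered_pairs function that follows keeps its state in a closure instead, which works just as well.

```lua
-- Iterator: called with (state, control), returns the next (index, value).
local function reverse_step(arr, i)
    i = i - 1
    if i > 0 then return i, arr[i] end
    -- Falling off the end returns nothing, i.e. nil, which ends the loop.
end

-- Factory: returns iterator, state and initial control value.
local function reverse_ipairs(arr)
    return reverse_step, arr, #arr + 1
end

for i, v in reverse_ipairs({ 'a', 'b', 'c' }) do print(i, v) end

-- Roughly what the generic for does behind the scenes:
local f, s, c = reverse_ipairs({ 'a', 'b', 'c' })
while true do
    local i, v = f(s, c)
    if i == nil then break end
    c = i
    print(i, v)
end
```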

Here is a simple implementation of ordered_pairs:

local function ordered_pairs(tbl)
    local keys = {}
1    for k in pairs(tbl) do table.insert(keys, k) end
2    table.sort(keys)
    local i = 0
3    return function()
4        i = i + 1
5        return keys[i], tbl[keys[i]]
    end
end

1: We capture all the keys from tbl in the keys array.
2: By default, table.sort orders the keys with the < operator, which sorts strings alphabetically and numbers numerically.
However, table.sort can take a comparison function as a second argument if you want to sort the keys in a different order.
3: The ordered_pairs function returns an iterator which is itself a function.
4: The iterator function is a closure, so it has access to the keys and the current index i from the enclosing function.
5: The iterator increments the index i and returns the corresponding key-value pair from tbl.
The iterator will return nil, nil when there are no more elements, but you could put in an explicit check on i if you wanted to.

This version of ordered_pairs assumes that the keys are all the same type, which is too limiting. If they aren’t, the table.sort call raises an error such as “attempt to compare number with string”. A comparison function takes two arguments and returns true if the first argument should come before the second. We can make a default one that works for all types:

local function compare(a, b)
    local ta, tb = type(a), type(b)
    if ta ~= tb then
        return ta < tb
    elseif ta == 'number' or ta == 'string' then
        return a < b
    else
        -- booleans, tables, functions, userdata, threads: < would raise an error
        return tostring(a) < tostring(b)
    end
end

This function sorts keys first by type and then by value. We note that alphabetically, number comes before string, so we will see numbers before strings, which is the standard convention.
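The sketch below sorts keys of mixed types with compare (restated here so the snippet runs on its own; in this version only numbers and strings are compared directly with <, and every other type falls back to tostring). It also confirms that table.sort without a comparator cannot cope with such keys: pcall catches the error raised when < meets a number and a string.

```lua
-- Order by type name first, then by value.
local function compare(a, b)
    local ta, tb = type(a), type(b)
    if ta ~= tb then
        return ta < tb
    elseif ta == 'number' or ta == 'string' then
        return a < b
    else
        return tostring(a) < tostring(b)
    end
end

local keys = { 3, 'b', true, 1, 'a', false }
table.sort(keys, compare)

local parts = {}
for i, k in ipairs(keys) do parts[i] = tostring(k) end
print(table.concat(parts, ' '))     --> false true 1 3 a b

-- Without a comparator, table.sort uses < and fails on mixed types.
local ok = pcall(table.sort, { 1, 'a' })
print(ok)                           --> false
```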

We could use this function in ordered_pairs:

local function ordered_pairs(tbl)
    ...
1    table.sort(keys, compare)
    ...
end

1: We sort the keys using the comparison function compare.

However, the user may want to define a custom comparison function. For example, they might want to sort the keys case-insensitively or in reverse alphabetical order.

Ideally, we want the user to be able to pass a comparison function to ordered_pairs and have it return an iterator maker that can use that comparator to iterate over any table in a consistent order.

An extra level of indirection is required:

1local function ordered_pairs(comparator)
2    if comparator == false then return pairs end
3    comparator = comparator or compare
4    return function(tbl)
        local keys = {}
        for k, _ in pairs(tbl) do table.insert(keys, k) end
5        table.sort(keys, comparator)
        local i = 0
6        return function()
            i = i + 1
            return keys[i], tbl[keys[i]]
        end
    end
end
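Custom comparators are where this design pays off. Below are two hypothetical ones, case_insensitive and descending, exercised with table.sort directly and assuming string keys; either could be handed to ordered_pairs or stored in the comparator field of an options table. table.sort expects a strict “less than” relation, so a comparator that returns true for equal keys (one built on <=, say) can make it raise an “invalid order function for sorting” error. That is why case_insensitive breaks ties between 'Apple' and 'apple' with an ordinary case-sensitive comparison. The expected output assumes the default C locale, in which uppercase letters sort before lowercase ones.

```lua
-- Hypothetical comparator for string keys: ignore case, then break ties
-- case-sensitively so the ordering stays strict and deterministic.
local function case_insensitive(a, b)
    local la, lb = a:lower(), b:lower()
    if la ~= lb then return la < lb end
    return a < b
end

-- Hypothetical comparator: reverse of the default order.
local function descending(a, b)
    return a > b
end

local fruit = { 'banana', 'Apple', 'cherry', 'apple' }
table.sort(fruit, case_insensitive)
print(table.concat(fruit, ' '))     --> Apple apple banana cherry

local letters = { 'b', 'c', 'a' }
table.sort(letters, descending)
print(table.concat(letters, ' '))   --> c b a
```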

1: We have added a comparator argument, which should be a function that takes two keys and returns true if the first key should come before the second.
2: If comparator is explicitly set to false, we return the standard pairs iterator.
3: If comparator is missing, we fall back to compare.
4: We return a function that takes a table and returns an iterator function for that table using the sorted keys.
5: We sort the keys using comparator, which will be set by now.
6: The iterator function is a closure with access to the sorted keys and the current index.

Adding a layer of indirection is another typical pattern in programming. Our ordered_pairs is a function that returns a function that returns a function.

We add a comparator field to the options.pretty table:

local options = {}
options.pretty = {
    indent        = '    ',
    table_begin   = '{',
    table_end     = '}',
    array_begin   = '[',
    array_end     = ']',
    key_begin     = '',
    key_end       = ' = ',
    sep           = ',',
    inline_spacer = ' ',
    show_indices  = false,
1    comparator    = compare
}

1: We use the default comparison function unless the user specifies otherwise.

The user can set the comparator field to false if they want to use the standard pairs iterator.

Aside: nil vs. false

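Like many older languages, Lua treats nil as false in a conditional test. Unlike C, though, nil and false are the only values that count as false; 0 and the empty string are both true.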

However, false is a distinct value in Lua. It is a boolean that is false in a conditional test. In Lua, nil represents the absence of a value. false represents a value that is explicitly false.

Choosing to treat nil as false in a conditional test probably seemed convenient. It is a common idiom in many languages, particularly C, where 0 can represent false. Many newer languages, such as Rust, have moved away from this and require an explicit boolean in a condition.

This conflating of nil and false can lead to subtle bugs. This is particularly true in Lua, where you will likely have functions with optional arguments. The common idiom for optional arguments looks like this:

local function foo(arg)
    arg = arg or 'default'
    print(arg)
end

If arg is missing or nil, it will be set to 'default'. If arg is explicitly false, it will still be set to 'default' which is probably not what you want.

Try it:

foo()           -- prints 'default'
foo(nil)        -- prints 'default'
foo('hello')    -- prints 'hello'
1foo(false)      -- prints 'default'

1: This is not what you want!

From personal experience, this will bite you at some point.

Sometimes you will want to distinguish between a missing argument and one that is explicitly false. We can rewrite foo to handle this:

local function foo(arg)
1    if arg == nil then arg = 'default' end
    print(arg)
end

foo()           -- prints 'default'
foo(nil)        -- prints 'default'
foo('hello')    -- prints 'hello'
foo(false)      -- prints 'false'

1: We only substitute the default when arg is nil, so an explicit false passes through untouched.

Ordered Output Resolved

The change to table_string is quite small:

function table_string(tbl, opts)
    ...
1    local iter = array and ipairs or ordered_pairs(opts.comparator)
    for k, v in iter(tbl) do
        ...
    end
    ...
end

1: For non-arrays, we have replaced the pairs iterator with ordered_pairs, using the comparison function stored in opts.comparator.

Now if you try print(pretty(user)) you always get:

1{
    first = "Minnie",
2    friends = [
        "Mickey",
        "Goofy"
    ],
    last = "Mouse"
}

1: user is a key-value table, and the elements are shown with the keys in alphabetical order.
2: friends is a sub-array with the elements shown in index order.

Inlining Simple Sub-Tables

A nice feature of some pretty-printers is the ability to inline “simple” sub-tables. This option can make the output more readable and compact.

Of course, we need to define what “simple” means. It could be a small table that fits inside a set number of characters. Or it could be a table with a certain number of elements.

For our purposes, we will consider a table “simple” if it has no sub-tables. We will also add an optional limit on the number of elements to this definition.

We can alter our metadata function to return the number of sub-tables in a table:

local function metadata(tbl)
    local size = 0
    local array = true
1    local subs = 0
    for _, v in pairs(tbl) do
        size = size + 1
        if array and tbl[size] == nil then array = false end
2        if type(v) == 'table' then subs = subs + 1  end
    end
3    local md = { size = size, array = array, subs = subs }
4    return md
end

1: subs will be the number of sub-tables.
2: If we find a sub-table, we increment subs.
3: Instead of returning three values, we create a table with three fields.
4: We return the metadata table.

If you haven’t seen this coding style before, the md table is created with a table constructor, a shorthand way to create a table with some initial values. Assignments of the form tbl = { x = x } look odd, but they are a common idiom in Lua. The constructor is shorthand for creating an empty table and then setting tbl["x"] = x (equivalently, tbl.x = x): the key is the string "x", and the value is whatever the variable x holds, which can be of any type.
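A short check of that equivalence, using the size and array names from metadata (the values are arbitrary). Note the contrast with square brackets, which use the value of a variable as the key:

```lua
local size, array = 3, true

-- Constructor form used in metadata: the keys are the *names* size and array.
local a = { size = size, array = array }

-- Equivalent long-hand form.
local b = {}
b["size"] = size      -- same as b.size = size
b["array"] = array    -- same as b.array = array

assert(a.size == b.size and a.array == b.array)

-- Square brackets use the *value* of the variable as the key instead.
local c = { [size] = size }
assert(c[3] == 3 and c.size == nil)
```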

We can now use the subs field in our table_string function to decide whether to inline a sub-table.

However, whether or not to inline simple tables should also be user-configurable. To accommodate that, we can add another field to our options table.

local options = {}
options.pretty = {
    indent        = '    ',
    table_begin   = '{',
    table_end     = '}',
    array_begin   = '[',
    array_end     = ']',
    key_begin     = '',
    key_end       = ' = ',
    sep           = ',',
    inline_spacer = ' ',
    show_indices  = false,
    comparator    = compare,
1    inline_size   = math.huge
}

options.classic = table_clone(options.pretty)
options.classic.array_begin     = '{'
options.classic.array_end       = '}'
2options.classic.inline_size     = 0

1: A simple table will be inlined if it has no sub-tables and strictly fewer than inline_size elements.
2: In the classic format, we never inline simple tables.

So, by default, simple tables are always inlined in the pretty format and never in the classic format. If you set inline_size to 6 in the pretty format, we inline simple tables if they have fewer than six elements.
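The rule is easy to pin down with a few cases. The is_simple predicate below is a hypothetical stand-in for the one-line test inside table_string, applied to metadata tables with made-up sizes:

```lua
-- Hypothetical predicate mirroring the rule: no sub-tables and
-- strictly fewer than inline_size elements.
local function is_simple(md, inline_size)
    return md.subs == 0 and md.size < inline_size
end

-- pretty default: inline_size = math.huge, so any flat table is inlined.
assert(is_simple({ size = 1000, subs = 0 }, math.huge))
-- classic: inline_size = 0, so nothing is ever inlined.
assert(not is_simple({ size = 1, subs = 0 }, 0))
-- pretty with inline_size = 6: five elements inline, six do not.
assert(is_simple({ size = 5, subs = 0 }, 6))
assert(not is_simple({ size = 6, subs = 0 }, 6))
-- Any sub-table disqualifies a table, whatever its size.
assert(not is_simple({ size = 2, subs = 1 }, math.huge))
```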

Given our current setup, it only takes a small tweak to our existing code to accommodate this new feature:

function table_string(tbl, opts)
    opts = opts or options.pretty

1    local md = metadata(tbl)
2    local size   = md.size
    local array  = md.array
3    local simple = md.subs == 0 and md.size < opts.inline_size

    if size == 0 then return empty_table_string(opts) end
    local show_keys = not array and true or opts.show_indices

    local tb     = array and opts.array_begin or opts.table_begin
    local te     = array and opts.array_end or opts.table_end
    local kb, ke = opts.key_begin, opts.key_end
    local sep    = opts.sep
4    local indent = simple and '' or opts.indent
    local nl     = indent == '' and opts.inline_spacer or '\n'
    local delims = tb ~= ''

    sep = sep .. nl
    if delims then tb, te = tb .. nl, nl .. te  else indent = '' end

    local content = ''
    local i = 0
    local iter = array and ipairs or ordered_pairs(opts.comparator)
    for k, v in iter(tbl) do
        i = i + 1
        local k_string = show_keys and kb .. tostring(k) .. ke or ''
        local v_string = ''
        if type(v) == 'table' then
            v_string = table_string(v, opts)
            v_string = indent_string(v_string, opts.indent, delims)
            if delims == false and show_keys then v_string = nl .. v_string end
        else
            v_string = simple_string(v)
        end
        content = content .. indent .. k_string .. v_string
        if i < size then content = content .. sep end
    end
    return tb .. content .. te
end

1: metadata returns a table instead of a couple of values.
2: Extract the size and array values from the md table.
3: If there are no sub-tables and the table is small enough, we consider it simple.
4: Apart from computing simple, this is the only change needed to use the new metadata about tbl.

Looking at print(pretty(user)) we get:

{
    first = "Minnie",
1    friends = [ "Mickey", "Goofy" ],
    last = "Mouse"
}

1: Now, the friends array is printed inline as it has no sub-tables.

A more interesting example is:

local matrix = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} }

The print(classic(matrix)) gives:

{
    {
        1,
        2,
        3
    },
    {
        4,
        5,
        6
    },
    {
        7,
        8,
        9
    }
}

With our tweaks, print(pretty(matrix)) yields a much more readable result:

[
    [ 1, 2, 3 ],
    [ 4, 5, 6 ],
    [ 7, 8, 9 ]
]

And print(alt(matrix)) yields:

    1, 2, 3,
    4, 5, 6,
    7, 8, 9

Table Metadata

Our current scheme computes each table’s metadata on the fly. When we start with the root table, or when we recurse into a sub-table, we call metadata on the table that is currently under the microscope:

function table_string(tbl, opts)
    opts = opts or options.pretty

1    local md = metadata(tbl)
    local size   = md.size
    ...

1: The current table of interest is tbl, and metadata(tbl) returns a metadata table for it.

However, tables can reference other tables and even have references to themselves. For example, we might build a website with Disney characters and have a gallery where visitors can flip from one star to the next and back to the previous one, etc.

A doubly linked list is one data structure to model this type of interaction. In its most dumbed-down, minimal form, we might have:

local stars =
{
    c1 = { first = "Mickey", last = "Mouse" },
    c2 = { first = "Minnie", last = "Mouse" }
}
stars.c1.next = stars.c2
stars.c2.prev = stars.c1
stars.home = stars

Here, c1, c2, … are characters. Each has a table of associated data (more realistically, a table of image links and the like).

The characters are connected by their next and prev links. To cap it all, we have a “home” link back to the original table: a self-reference.

If you try print(pretty(stars)) with our current implementation, the program will chase its tail and die of pure embarrassment at the rubbish state of table_string.

Before we get to that, we will first alter our metadata function significantly.

Instead of treating each table as it comes along and passing back some associated metadata, we will view the table as a whole entity in one go.

Our current metadata(tbl) returns md, a table with three fields, size, array, and subs, that tell you something about tbl.

In our new implementation, metadata(tbl) will return md as a table of tables. If t is tbl itself or any sub-table of tbl, then:

Field	Description
`md[t].size`	The number of top-level elements in `t`.
`md[t].array`	This will be `true` if `t` is a Lua array, otherwise `false`.
`md[t].subs`	The number of sub-tables in `t`.
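That layout relies on Lua tables being usable as keys of other tables. Keys are matched by identity, not by contents, so two tables that look identical still get separate entries in md. A small check (the metadata values are made up for illustration):

```lua
-- Two tables with identical contents are still two distinct keys.
local mickey = { first = "Mickey", last = "Mouse" }
local clone  = { first = "Mickey", last = "Mouse" }

local md = {}
md[mickey] = { size = 2, array = false, subs = 0 }
md[clone]  = { size = 2, array = false, subs = 0 }

assert(md[mickey] ~= md[clone])      -- two separate metadata entries

local count = 0
for _ in pairs(md) do count = count + 1 end
assert(count == 2)
```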

Here is what our new call-it-once-and-be-done metadata function looks like:

1local function metadata(tbl, md)
2    md = md or {}
3    md[tbl] = {}
    local size, array, subs = 0, true, 0
    for _, v in pairs(tbl) do
        size = size + 1
        if array and tbl[size] == nil then array = false end
        if type(v) == 'table' then
            subs = subs + 1
4            if not md[v] then metadata(v, md) end
        end
    end
5    md[tbl].size  = size
    md[tbl].array = array
    md[tbl].subs  = subs
    return md
end

1: We’ve added md to the calling signature. It will be missing on the first call.
2: If md is completely missing, we set it up as an empty table.
3: We set up md[tbl] as an empty sub-table of md.
4: As we iterate through tbl, we may come across a new sub-table v, which is handled by recursion.
5: Record the three bits of metadata for tbl in the md[tbl] sub-table.
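As a sanity check, here is that metadata function, copied from above, run on the doubly linked stars table with its home self-reference. Because md[tbl] is created before the loop, a table that is already in md is never revisited, so the cycles do not cause runaway recursion:

```lua
-- The recursive metadata function from above, copied as-is.
local function metadata(tbl, md)
    md = md or {}
    md[tbl] = {}
    local size, array, subs = 0, true, 0
    for _, v in pairs(tbl) do
        size = size + 1
        if array and tbl[size] == nil then array = false end
        if type(v) == 'table' then
            subs = subs + 1
            if not md[v] then metadata(v, md) end
        end
    end
    md[tbl].size  = size
    md[tbl].array = array
    md[tbl].subs  = subs
    return md
end

-- The doubly linked stars example, including the self-reference.
local stars = {
    c1 = { first = "Mickey", last = "Mouse" },
    c2 = { first = "Minnie", last = "Mouse" }
}
stars.c1.next = stars.c2
stars.c2.prev = stars.c1
stars.home = stars

local md = metadata(stars)                                 -- terminates
assert(md[stars].size == 3 and md[stars].subs == 3)        -- c1, c2, home
assert(md[stars].array == false)
assert(md[stars.c1].size == 3 and md[stars.c1].subs == 1)  -- first, last, next
```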

To use this new metadata method, we also need to alter table_string. That can be done a couple of different ways. One way to go is to make table_string a little wrapper around a recursive closure that does most of the work:

1function table_string(root_tbl, opts)
    opts = opts or options.pretty
2    local md = metadata(root_tbl)

3    local function process(tbl)
4        local size   = md[tbl].size
        if size == 0 then return empty_table_string(opts) end

        local array  = md[tbl].array
        local show_keys = not array and true or opts.show_indices

        local simple = md[tbl].subs == 0 and size < opts.inline_size
        local indent = simple and '' or opts.indent

        local tb     = array and opts.array_begin or opts.table_begin
        local te     = array and opts.array_end or opts.table_end
        local kb, ke = opts.key_begin, opts.key_end
        local nl     = indent == '' and opts.inline_spacer or '\n'
        local sep    = opts.sep .. nl

        local delims = tb ~= ''
        if delims then tb, te = tb .. nl, nl .. te  else indent = '' end

        local content = ''
        local i = 0
        local iter = array and ipairs or ordered_pairs(opts.comparator)
        for k, v in iter(tbl) do
            i = i + 1
            local k_string = show_keys and kb .. tostring(k) .. ke or ''
            local v_string = ''
            if type(v) == 'table' then
5                v_string = process(v)
                v_string = indent_string(v_string, opts.indent, delims)
                if delims == false and show_keys then v_string = nl .. v_string end
            else
                v_string = simple_string(v)
            end
            content = content .. indent .. k_string .. v_string
            if i < size then content = content .. sep end
        end
        return tb .. content .. te
    end

6    local retval = process(root_tbl)
    return retval
end

1: Now, table_string is primarily a wrapper around the inner process function.
We have changed the first argument to root_tbl to clarify that this is the root table.
2: We compute the metadata for root_tbl and every table reachable from it, and store it in md.
3: The process function is a closure and can access the enclosed md table.
4: md[tbl] is a sub-table, currently with three fields, size, array, and subs.
5: If we hit a sub-table, we recurse using process. The md table does not need recomputing and continues to be available as we process v.
6: Most of the source lines in table_string are in the private process sub-function. We have md and get the ball rolling by running process on root_tbl.
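The shape of that wrapper is worth isolating. Below, max_depth is a hypothetical example (not part of scribe) with the same structure: an outer function sets up shared state, and an inner local function that recurses over sub-tables reads and updates it as an upvalue. The local function form matters: it is sugar for local process; process = function ... end, so the name is already in scope inside the body. Writing local process = function ... end instead would make the recursive call look up a global process. Like table_string at this stage, the sketch assumes there are no cycles.

```lua
-- Hypothetical example: depth of the deepest nested table (no cycle handling).
local function max_depth(root)
    local deepest = 0               -- upvalue shared by every call below

    -- "local function" puts process in scope inside its own body.
    local function process(tbl, depth)
        if depth > deepest then deepest = depth end
        for _, v in pairs(tbl) do
            if type(v) == 'table' then process(v, depth + 1) end
        end
    end

    process(root, 1)
    return deepest
end

assert(max_depth({ 1, { 2, { 3 } } }) == 3)
assert(max_depth({}) == 1)
```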

Cyclical References

If we look at a simple linked list example:

local stars =
{
    c1 = { first = "Mickey", last = "Mouse"},
    c2 = { first = "Minnie", last = "Mouse"},
}
stars.c1.next = stars.c2

Then print(pretty(stars)) returns:

{
    c1 = {
        first = "Mickey",
        last = "Mouse",
        next = { first = "Minnie", last = "Mouse" }
    },
    c2 = { first = "Minnie", last = "Mouse" }
}

We see two definitions of c2!
One is in the next field for c1 and another when we get to c2 by itself. That’s not ideal.

Things get worse if we use a doubly linked list by adding:

stars.c2.prev = stars.c1

Now, when we try print(pretty(stars)), the program crashes with a message like:

1/path/to/script: stack overflow
stack traceback:
 /path/to/script:49: in function 'table_size_and_type'
 /path/to/script:98: in function 'table_string'
 /path/to/script: in function 'table_string'
 /path/to/script: in function 'table_string'
 /path/to/script: in function 'table_string'
 /path/to/script: in function 'table_string'
 /path/to/script: in function 'table_string'
 /path/to/script: in function 'table_string'
 /path/to/script: in function 'table_string'
 /path/to/script: in function 'table_string'
2 ... (skipping 58803 levels)
 /path/to/script: in function 'table_string'
 /path/to/script: in function 'table_string'
 /path/to/script: in function 'table_string'
 /path/to/script: in function 'table_string'
 /path/to/script: in function 'table_string'
 /path/to/script: in function 'table_string'
 /path/to/script: in function 'table_string'
 /path/to/script: in function 'table_string'
 /path/to/script: in function 'table_string'
3 (...tail calls...)

1: Lua’s interpreter has run out of room.
2: That’s a lot of skipping!
3: It’s more like tail chasing in this instance!

It is easy to see what the issue is. When we convert c1 to a string, we encounter c1.next, which is the table c2. Our function then calls itself with a request to convert c2 to a string. That call, in turn, encounters c2.prev = c1 and sees that c1 is a table. It handles that by calling itself with a request to convert c1 to a string. And round and round we go!

Our current solution doesn’t handle tables with shared references well. Even if it manages to complete, the shared table will be defined multiple times. The situation is even worse if there are cycles to be navigated. Those cause the program to crash with a stack overflow.

Lua makes it very easy to have tables with multiple references and cycles. Under the covers, the assignment c2.prev = c1 sets up another pointer to c1. No copying is done; everything is very efficient.
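A few lines make the no-copying point concrete: after the assignment, c2.prev and c1 are the same table, so a change made through one name is visible through the other:

```lua
local c1 = { first = "Mickey" }
local c2 = { first = "Minnie" }
c2.prev = c1                       -- stores a reference to c1; nothing is copied

assert(c2.prev == c1)              -- the very same table, not an equal copy
c1.first = "Mickey Mouse"
assert(c2.prev.first == "Mickey Mouse")   -- the change shows up through prev

-- A genuine copy is a different table, even with identical contents.
local copy = { first = c1.first }
assert(copy ~= c1)
```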

That’s great for many algorithms you might use beyond the most straightforward, plain old data tables. We still need to examine and view those tables without crashes.

Crash Proofing

The key to handling tables with cycles and shared references is to mark the tables for which we have already output a full definition. If we see a marked table again, we can do something more sensible than defining it again and potentially going around in circles.

Our metadata function returns a metadata table for each table and sub-table it encounters. Currently, there are just three fields in that metadata table: size, array, and subs. We can add a fourth field, processed, that will be true if we have already seen and processed that table. If the processed field is true, we can print a simple reference to the table instead of trying to define it again. If the field is missing, we can define the table as we do now.

Here is what the table_string function looks like with the processed field added:

function table_string(root_tbl, opts)
    opts = opts or options.pretty
    local md = metadata(root_tbl)

    local function process(tbl)
1        md[tbl].processed = true
        ...
        for k, v in iter(tbl) do
            i = i + 1
            local k_string = show_keys and kb .. tostring(k) .. ke or ''
            local v_string = ''
            if type(v) == 'table' then
                if md[v].processed then
2                    v_string = simple_string(v)
                else
3                    v_string = process(v)
                    v_string = indent_string(v_string, opts.indent, delims)
                    if delims == false and show_keys then v_string = nl .. v_string end
                end
            ...
        end
        return tb .. content .. te
    end

    local retval = process(root_tbl)
    return retval
end

1: We are about to process tbl, so we mark it as processed in case it has a self-reference.
2: We have seen v before and can do something else instead of recursing.
Here, we print a reference to the table’s address.
3: Recurse into v and build up a complete definition for it.

Now, if you try print(pretty(stars)) on our doubly linked list of stars, you get something like this:

{
    c1 = {
        first = "Mickey",
        last = "Mouse",
        next = {
            first = "Minnie",
            last = "Mouse",
            prev = 0x600002ec0ec0
        }
    },
    c2 = 0x600002ec0f00
}

The shared references are just table addresses, which isn’t user-friendly but better than crashing!

We can even add a self-reference to the stars table like this:

stars.home = stars

Then print(pretty(stars)) yields:

{
    c1 = {
        first = "Mickey",
        last = "Mouse",
        next = {
            first = "Minnie",
            last = "Mouse",
            prev = 0x6000012ecec0
        }
    },
    c2 = 0x6000012ecf00,
    home = 0x6000012ece80
}
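The processed flag is an instance of a general technique: keep a visited set and mark each table before recursing into it. Here it is stripped to the bone in count_tables, a hypothetical helper (not part of scribe) that counts the distinct tables reachable from a root, cycles and all:

```lua
-- Hypothetical helper: count the distinct tables reachable from root.
-- The visited set plays the role of the processed flag described above.
local function count_tables(root)
    local visited = {}
    local count = 0
    local function walk(tbl)
        if visited[tbl] then return end   -- seen before: do not recurse again
        visited[tbl] = true               -- mark *before* visiting children
        count = count + 1
        for _, v in pairs(tbl) do
            if type(v) == 'table' then walk(v) end
        end
    end
    walk(root)
    return count
end

local stars = { c1 = { first = "Mickey" }, c2 = { first = "Minnie" } }
stars.c1.next = stars.c2
stars.c2.prev = stars.c1
stars.home = stars
assert(count_tables(stars) == 3)          -- stars, c1 and c2, each counted once
```

Marking before recursing is what makes the self-reference safe: by the time the walk reaches stars.home, stars is already in visited.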

Paths

That output is not very user-friendly.

How should we see those references? Ideally, we should see an understandable description of the reference.

Every table has a unique address in Lua, which we could use. However, as we saw above, that’s not very user-friendly. We could use the key in the table that points to the shared table. That is better, but still not great. We could use a path to the table from the top-level root table. This is the best option.

Then, in the case where there is no self-reference, we might see:

{
    c2 = {
        first = "Minnie",
        prev = {
            first = "Mickey",
1            next = <c2>,
            last = "Mouse"
        },
        last = "Mouse"
    },
2    c1 = <c2.prev>
}

1: The value of next refers to the table at the path c2.
2: The value of c1 refers to the table at the path c2.prev.

If the root table is tbl, then the path "<foo.bar.baz>" refers to the value tbl.foo.bar.baz. Thus, tbl.foo is a sub-table of tbl, tbl.foo.bar is a sub-table of tbl.foo, and baz is a key in tbl.foo.bar.
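Reading a path back is mechanical. Here is a sketch of a resolver, resolve_path, which is a hypothetical helper and not part of scribe. It takes the path without its < and > delimiters and assumes a single-character separator; it also treats all-digit segments as array indices, so a string key such as "2" cannot be reached:

```lua
-- Hypothetical helper: follow a path such as "c2.prev" from root.
local function resolve_path(root, path, sep)
    sep = sep or '.'
    local node = root
    -- Escape sep so that '.' is not treated as a pattern wildcard.
    local segment_pattern = '[^' .. sep:gsub('%p', '%%%0') .. ']+'
    for segment in path:gmatch(segment_pattern) do
        if type(node) ~= 'table' then return nil end
        -- Assumption: all-digit segments are array indices.
        local key = segment:match('^%d+$') and tonumber(segment) or segment
        node = node[key]
    end
    return node
end

local stars = { c1 = { first = "Mickey" }, c2 = { first = "Minnie" } }
stars.c1.next = stars.c2
stars.c2.prev = stars.c1

assert(resolve_path(stars, 'c2.prev') == stars.c1)
assert(resolve_path(stars, 'c1.next.prev.next') == stars.c2)
print(resolve_path(stars, 'c2.prev.first'))   --> Mickey
```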

If there is a self-reference, such as stars.home = stars, we might see:

1<table> = {
    c2 = {
        first = "Minnie",
        prev = {
            first = "Mickey",
2            next = <c2>,
            last = "Mouse"
        },
        last = "Mouse"
    },
    c1 = <c2.prev>,
3    home = <table>
}

1: We only put out the <table> = ... line if there is a self-reference.
2: We could use the full path, <table.c2>, here, but that is generally overkill.
3: The value of home refers to the table itself.

In this representation, there are some obvious user-settable options:

- The string used for the root table if there are any top-level self-references. In the example, we use table for that.
- The separator to use in the path string to sub-sub-tables, etc. In the example, we use ".".
- Perhaps the delimiters to use for path strings, which in the example are < and >.

Let’s add those to our options.pretty table:

local options = {}
options.pretty = {
    indent        = '    ',
    table_begin   = '{',
    table_end     = '}',
    array_begin   = '[',
    array_end     = ']',
    key_begin     = '',
    key_end       = ' = ',
    sep           = ',',
    inline_spacer = ' ',
    show_indices  = false,
    comparator    = compare,
    inline_size   = math.huge,
1    path_root     = 'table',
2    path_sep      = '.',
3    path_begin    = '<',
    path_end      = '>'
}

1: The string for the root table if there are any top-level self-references.
2: The separator used in the path string to sub-sub-tables, etc.
3: The delimiters used for the path string.

With that in place, we can modify the table_string function as follows:

function table_string(root_tbl, opts)
    opts = opts or options.pretty
    local md = metadata(root_tbl)

1    local function process(tbl, path)
2        md[tbl].path = path

        local size   = md[tbl].size
        if size == 0 then return empty_table_string(opts) end

        local array  = md[tbl].array
        local show_keys = not array and true or opts.show_indices

        local simple = md[tbl].subs == 0 and size < opts.inline_size
        local indent = simple and '' or opts.indent

        local tb     = array and opts.array_begin or opts.table_begin
        local te     = array and opts.array_end or opts.table_end
        local kb, ke = opts.key_begin, opts.key_end
3        local pb, pe = opts.path_begin, opts.path_end
        local nl     = indent == '' and opts.inline_spacer or '\n'
        local sep    = opts.sep .. nl

        local delims = tb ~= ''
        if delims then tb, te = tb .. nl, nl .. te  else indent = '' end

        local content = ''
        local i = 0
        local iter = array and ipairs or ordered_pairs(opts.comparator)
        for k, v in iter(tbl) do
            i = i + 1
            local k_string = show_keys and kb .. tostring(k) .. ke or ''
            local v_string = ''
            if type(v) == 'table' then
                if md[v].path then
4                    v_string = pb .. md[v].path .. pe
                else
5                    local v_path = path .. opts.path_sep .. tostring(k)
6                    v_string = process(v, v_path)
                    v_string = indent_string(v_string, opts.indent, delims)
                    if delims == false and show_keys then v_string = nl .. v_string end
                end
            else
                v_string = simple_string(v)
            end
            content = content .. indent .. k_string .. v_string
            if i < size then content = content .. sep end
        end
        return tb .. content .. te
    end

7    local retval = process(root_tbl, opts.path_root)
    return retval
end

1: We have added an extra path argument.
2: We record the path to this table tbl as the value under the metadata key path in md[tbl].
3: Localise the path-begin and path-end delimiters.
4: If we have seen v before, we use the path string we stored in md for v, formatted with the delimiters.
5: v is a new table, so we need a path to v, which we get by appending the key k to the current path.
6: We recurse processing the contents of v using that new path string.
7: Kick off the process with the root table and path.

Now, if you try print(pretty(stars)) on our doubly linked list of stars, we get:

{
    c1 = {
        first = "Mickey",
        last = "Mouse",
        next = {
            first = "Minnie",
            last = "Mouse",
1            prev = <table.c1>
        }
    },
    c2 = <table.c1.next>,
2    home = <table>
}

1: The value of prev refers to the path table.c1.
2: The value of home refers to the table itself.

In a reference like <table.c1.next>, the root path prefix table. isn’t necessary. We will remove it in the next iteration.

Complete self-references like our home = <table> line are uncommon, but we would like to have that <table> defined if it does occur. Something along these lines:

<table> = {
 ...
}

However, that extra <table> = should only be present if there is a self-reference.

We can alter table_string as follows:

function table_string(root_tbl, opts)
    opts = opts or options.pretty
    local md = metadata(root_tbl)

1    local root = root_tbl
2    local root_ref = false

3    local kb, ke = opts.key_begin, opts.key_end
    local pb, pe = opts.path_begin, opts.path_end

    local function process(tbl, path)
        md[tbl].path = path
4        local path_prefix = path == opts.path_root and '' or path .. opts.path_sep

        local size = md[tbl].size
        if size == 0 then return empty_table_string(opts) end

        local array = md[tbl].array
        local show_keys = not array and true or opts.show_indices

        local simple = md[tbl].subs == 0 and size < opts.inline_size
        local indent = simple and '' or opts.indent

        local tb = array and opts.array_begin or opts.table_begin
        local te = array and opts.array_end or opts.table_end
        local nl = indent == '' and opts.inline_spacer or '\n'
        local sep = opts.sep .. nl

        local delims = tb ~= ''
        if delims then tb, te = tb .. nl, nl .. te else indent = '' end

        local content = ''
        local i = 0
        local iter = array and ipairs or ordered_pairs(opts.comparator)
        for k, v in iter(tbl) do
            i = i + 1
            local k_string = show_keys and kb .. tostring(k) .. ke or ''
            local v_string = ''
            if type(v) == 'table' then
                if md[v].path then
                    v_string = pb .. md[v].path .. pe
5                    if v == root then root_ref = true end
                else
6                    local v_path = path_prefix .. tostring(k)
                    v_string = process(v, v_path)
                    v_string = indent_string(v_string, opts.indent, delims)
                    if delims == false and show_keys then v_string = nl .. v_string end
                end
            else
                v_string = simple_string(v)
            end
            content = content .. indent .. k_string .. v_string
            if i < size then content = content .. sep end
        end
        return tb .. content .. te
    end

    local retval = process(root_tbl, opts.path_root)
7    if root_ref then
        retval = pb .. opts.path_root .. pe .. ' = ' .. retval
    end
    return retval
end

1: We capture the root table in root.
2: We capture whether there is a self-reference to the root table in root_ref.
3: Localise some delimiters that never vary by context (hoist these constant lines from the process function).
4: If this is not the root table, we will prepend any new path with a path prefix.
5: We record the self-reference to the root table if v is the root table.
6: We prepend the path with the path prefix if tbl is not the root table.
7: If there is a self-reference to the root table, we prepend the return string with <table> =.
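Callout 4 relies on a subtle point from the nil vs. false aside: a and b or c acts like a conditional expression only when b is truthy. Here b is the empty string, which Lua treats as true, so the idiom is safe. A quick check (prefix_for and broken are illustrative names):

```lua
local path_root, path_sep = 'table', '.'

-- The idiom from process: returns '' for the root path, else path .. sep.
-- It works because '' is truthy in Lua, so 'or' never replaces it.
local function prefix_for(path)
    return path == path_root and '' or path .. path_sep
end

assert(prefix_for('table') == '')
assert(prefix_for('c1') == 'c1.')

-- The same shape misbehaves when the middle value is false (or nil):
-- 'flag and false' is false, so 'or' always falls through to the fallback.
local function broken(flag)
    return flag and false or 'fallback'
end

assert(broken(true) == 'fallback')   -- we wanted false here
```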

Here’s the output from the latest version of print(pretty(stars)):

1<table> = {
    c1 = {
        first = "Mickey",
        last = "Mouse",
        next = {
            first = "Minnie",
            last = "Mouse",
            prev = <c1>
        }
    },
2    c2 = <c1.next>,
3    home = <table>
}

1: There is a self-reference to the root stars table, so we have prepended the string with <table> =.
2: This looks better than <table.c1.next>.
3: Here is the self-reference to the root table, which reads quite naturally.

If we remove the stars.home = stars assignment then print(pretty(stars)) returns:

1{
    c1 = {
        first = "Mickey",
        last = "Mouse",
        next = {
            first = "Minnie",
            last = "Mouse",
            prev = <c1>
        }
    },
    c2 = <c1.next>
}

1: There is no self-reference, so we do not need that <table> = we saw earlier.

Breadth First Traversal

While that last output is undoubtedly valid, it fails the readability test.

That c2 = <c1.next> is perfectly correct, but you have to go back and find the definition of c1 to understand what c1.next actually is. It would be much better to see the definition of c2 right there, not nested inside c1. We are after something that looks like this:

<table> = {
    c1 = {
        first = "Mickey",
        last = "Mouse",
        next = <c2>
    },
    c2 = {
        first = "Minnie",
        last = "Mouse",
        prev = <c1>
    },
    home = <table>
}

We would like to see the full definition of tables at the shallowest possible depth.

The root problem is that we are traversing tables depth-first.

We process all the elements in c1 before getting to c2. So when we see c1.next, we print the full definition of what c2 really is. Then, later, when we get to c2, we see that we have already processed it and output it as a reference to <c1.next>. That is ass-backwards: c1.next should be the reference to <c2>, and the definition of c2 should be deferred until later.

All the table-to-string implementations we have looked at on the web seem to have this problem. Depth-first traversal is a natural choice, but it doesn’t produce the most readable output.

We need to change the table traversal to be breadth-first. Then, we process the elements of tbl in the order they appear at the top level. If we encounter a sub-table, we defer turning it into a string until after processing all the top-level elements.

To demonstrate, let’s see how breadth-first traversal works for the simpler metadata function:

local function metadata(tbl, md)
    md = md or {}
    md[tbl] = {}
    local size, array, subs = 0, true, 0
1    local children = {}
    for _, v in pairs(tbl) do
        size = size + 1
        if array and tbl[size] == nil then array = false end
        if type(v) == 'table' then
            subs = subs + 1
2            if not md[v] then table.insert(children, v) end
        end
    end
    md[tbl].size, md[tbl].array, md[tbl].subs = size, array, subs

3    for _, child in ipairs(children) do metadata(child, md) end
    return md
end

1: We keep a list of the sub-tables we encounter.
2: If we encounter a sub-table, we add it to the list of children and defer immediate processing.
3: After processing all the top-level elements, we then process the children.
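The deferral above happens within each call: a call handles its own top-level entries first, but it then finishes one child's entire subtree before moving on to the next child. It is therefore not a strict level-by-level traversal, and the same holds for the table_string version below, where a table reachable at depth 2 through a later sibling can be claimed at depth 3 first. If you want every table to get a shallowest path, the standard alternative is an explicit first-in, first-out queue. This is a sketch of that alternative, not scribe's code; shallowest_paths is a hypothetical name, and the empty string stands for the root path:

```lua
-- Sketch: assign each reachable table the path at which a breadth-first
-- walk first reaches it, which is always a shallowest path.
local function shallowest_paths(root, sep)
    sep = sep or '.'
    local paths = { [root] = '' }        -- the root gets the empty path
    local queue, head = { root }, 1      -- FIFO queue with a moving head index
    while head <= #queue do
        local tbl = queue[head]
        head = head + 1
        local prefix = paths[tbl] == '' and '' or paths[tbl] .. sep
        for k, v in pairs(tbl) do
            if type(v) == 'table' and paths[v] == nil then
                paths[v] = prefix .. tostring(k)
                queue[#queue + 1] = v    -- visited after everything already queued
            end
        end
    end
    return paths
end

-- T is reachable as a.x.z (depth 3) and as b.y (depth 2).
local T = {}
local nested = { a = { x = { z = T } }, b = { y = T } }
local paths = shallowest_paths(nested)
print(paths[T])   --> b.y
```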

Changing the processing order in metadata doesn’t change the output. print(pretty(stars)) still gives:

<table> = {
    c1 = {
        first = "Mickey",
        last = "Mouse",
        next = {
            first = "Minnie",
            last = "Mouse",
            prev = <c1>
        }
    },
    c2 = <c1.next>,
    home = <table>
}

We need to apply similar changes to the more complex table_string function:

function table_string(root_tbl, opts)
    ...
    local function process(tbl, path)
        ...
        -- Sub-tables first seen here; their definitions are deferred.
        local children = {}
        ...
        for k, v in iter(tbl) do
            ...
            if type(v) == 'table' then
                if md[v].path then
                    v_string = pb .. md[v].path .. pe
                    if v == root then root_ref = true end
                else
                    -- Claim the path for v now, but emit a placeholder
                    -- (simple_string gives the table's address) for the moment.
                    local v_path = path_prefix .. tostring(k)
                    v_string = simple_string(v)
                    md[v].path = v_path
                    children[v] = v_path
                    if delims == false and show_keys then v_string = nl .. v_string end
                end
            else
                v_string = simple_string(v)
            end
            content = content .. indent .. k_string .. v_string
            if i < size then content = content .. sep end
        end
        local retval = tb .. content .. te

        -- With all of tbl's own entries done, define each deferred child
        -- and splice its definition in place of its placeholder.
        for child_table, child_path in pairs(children) do
            local child_string = process(child_table, child_path)
            child_string = indent_string(child_string, opts.indent, delims)
            -- Escape the placeholder so gsub matches it literally, and use a
            -- function replacement so any '%' in child_string is kept as is.
            local placeholder = simple_string(child_table):gsub('%p', '%%%0')
            retval = retval:gsub(placeholder, function() return child_string end)
        end
        return retval
    end

    local retval = process(root_tbl, opts.path_root)
    if root_ref then retval = pb .. opts.path_root .. pe .. ' = ' .. retval end
    return retval
end

With that change, print(pretty(stars)) now gives:

<table> = {
    c1 = {
        first = "Mickey",
        last = "Mouse",
        next = <c2>
    },
    c2 = {
        first = "Minnie",
        last = "Mouse",
        prev = <c1>
    },
    home = <table>
}
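The table_string above splices each deferred child definition back in with gsub, which treats its second argument as a Lua pattern and a string third argument as a template in which % is special. A table definition can contain arbitrary string values, so the safest approach is to escape the placeholder and pass the definition through a function, whose return value is inserted verbatim. A small stand-alone demonstration (the address and the definition are made up):

```lua
local text        = 'next = table: 0x55d1c0a3b2f0'
local placeholder = 'table: 0x55d1c0a3b2f0'
local definition  = '{ rate = "100%" }'   -- a value containing a % sign

-- Escape punctuation so the placeholder is matched literally, not as a pattern.
local literal = placeholder:gsub('%p', '%%%0')

-- A function's return value is inserted verbatim, with no % processing.
-- Passing definition directly as the replacement string would make gsub
-- misinterpret the '%' (recent Lua versions raise an error instead).
local result = text:gsub(literal, function() return definition end)
assert(result == 'next = { rate = "100%" }')
```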

Arrays

That last table is very readable. Every shared reference like c1.next = <c2> has an easily identifiable right-hand side value, the value associated with the key c2 in this case.

However, we have gone to some lengths to suppress showing explicit keys for Lua tables that happen to be arrays. If we have an array of tables with shared references, the paths will lack clarity.

For example, perhaps you are coding a Cluedo-type murder mystery game set in a big house with many rooms stored as an array. Each room might have a potential murder weapon in it:

local rooms = {
    { name = "Library", weapon = "Lead Pipe" },
    { name = "Kitchen", weapon = "Knife"     },
    { name = "Lounge",  weapon = "Poison"    },
    { name = "Bedroom", weapon = "Garrotte"  }
}

The user will move from room to room in a fashion that might be randomly generated or set by the game’s storyline. To keep it simple, we add next and prev fields to each room as follows:

rooms[1].next, rooms[2].next, rooms[3].next, rooms[4].next = rooms[2], rooms[3], rooms[4], rooms[1]
rooms[1].prev, rooms[2].prev, rooms[3].prev, rooms[4].prev = rooms[4], rooms[1], rooms[2], rooms[3]

Now if we print(pretty(rooms)) we get:

[
    {
        name = "Library",
        next = <2>,
        prev = <4>,
        weapon = "Lead Pipe"
    },
    {
        name = "Kitchen",
        next = <3>,
        prev = <1>,
        weapon = "Knife"
    },
    {
        name = "Lounge",
        next = <4>,
        prev = <2>,
        weapon = "Poison"
    },
    {
        name = "Bedroom",
        next = <1>,
        prev = <3>,
        weapon = "Garrotte"
    }
]

rooms is an array printed without showing the indices. The problem is that path references like next = <1> don’t make much sense.

If the value associated with an index is shared, we want to see that index explicitly.

The current implementation makes this difficult. The main loop in table_string looks like this:

    ...
        for k, v in iter(tbl) do
            i = i + 1
            local k_string = show_keys and kb .. tostring(k) .. ke or ''
            local v_string = ''
            if type(v) == 'table' then
    ...

We are creating the key string k_string before we know whether the associated value v is a table, let alone a shared table. We also emit each key-value pair at the depth where it occurs, but a path reference to that value may appear at a different depth.

The solution is two-fold. First, add a new metadata field, refs, for each table and sub-table. md[t].refs will be the number of references seen for the table t. If md[t].refs is greater than 1, then t is a shared table.

We can compute the reference count in the metadata function. We also restructure that function around an inner process closure that does all the work. Tables are still traversed depth-first.

local function metadata(root_tbl)
1    local md = {}
2    md[root_tbl] = { refs = 1 }

3    local function process(tbl)
        local size, array, subs  = 0, true, 0
        local children = {}
        for _, v in pairs(tbl) do
            size = size + 1
            if array and tbl[size] == nil then array = false end
            if type(v) == 'table' then
                subs = subs + 1
                if md[v] then
4                    md[v].refs = md[v].refs + 1
                else
5                    table.insert(children, v)
6                    md[v] = { refs = 1 }
                end
            end
        end
        md[tbl].size, md[tbl].array, md[tbl].subs = size, array, subs
7        for _, child in ipairs(children) do process(child) end
    end

8    process(root_tbl)
    return md
end

1: We set up the metadata table that will be accessible inside the process closure.
2: We immediately add an entry for the root table, as it might be referenced by its immediate children.
3: process is the recursive function that does all the heavy lifting.
4: If we’ve seen v before, we increment its reference count.
5: Otherwise, we add v to the list of sub-tables to process later.
6: We add a metadata entry for v here in case it is referenced by an immediate sibling.
7: Go ahead and process the grandchildren, etc.
8: We kick things off by processing the root table.
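To see the reference counts in action, here is a self-contained copy of the metadata function above applied to the rooms example. Nothing in it goes beyond the code in the text; only the final print loop is added. Each room should be referenced three times: once by the rooms array, once by its predecessor’s next field and once by its successor’s prev field.

```lua
-- Self-contained copy of metadata() from the text.
local function metadata(root_tbl)
    local md = {}
    md[root_tbl] = { refs = 1 }

    local function process(tbl)
        local size, array, subs = 0, true, 0
        local children = {}
        for _, v in pairs(tbl) do
            size = size + 1
            if array and tbl[size] == nil then array = false end
            if type(v) == 'table' then
                subs = subs + 1
                if md[v] then
                    md[v].refs = md[v].refs + 1   -- seen before: one more reference
                else
                    table.insert(children, v)     -- new: process it later
                    md[v] = { refs = 1 }
                end
            end
        end
        md[tbl].size, md[tbl].array, md[tbl].subs = size, array, subs
        for _, child in ipairs(children) do process(child) end
    end

    process(root_tbl)
    return md
end

local rooms = {
    { name = "Library", weapon = "Lead Pipe" },
    { name = "Kitchen", weapon = "Knife"     },
    { name = "Lounge",  weapon = "Poison"    },
    { name = "Bedroom", weapon = "Garrotte"  }
}
rooms[1].next, rooms[2].next, rooms[3].next, rooms[4].next = rooms[2], rooms[3], rooms[4], rooms[1]
rooms[1].prev, rooms[2].prev, rooms[3].prev, rooms[4].prev = rooms[4], rooms[1], rooms[2], rooms[3]

local md = metadata(rooms)
for i, room in ipairs(rooms) do
    print(i, room.name, md[room].refs)   -- refs is 3 for every room
end
```

Because every room has refs greater than 1, every room is a shared table, which is exactly the condition table_string needs in order to force its index to be shown.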

Of course, we must tweak our table_string method:

function table_string(root_tbl, opts)
    ...
    local function process(tbl, path)
        ...
        for k, v in iter(tbl) do
            i = i + 1
1            local show_key = show_keys
            local v_string = ''
            if type(v) == 'table' then
                if md[v].path then
                    v_string = pb .. md[v].path .. pe
                    if v == root then root_ref = true end
                else
2                    if md[v].refs > 1 then show_key = true end
                    local v_path = path_prefix .. tostring(k)
                    v_string = simple_string(v)
                    md[v].path = v_path
                    children[v] = v_path
                    if delims == false and show_key then v_string = nl .. v_string end
                end
            else
                v_string = v_string .. simple_string(v)
            end
3            local k_string = show_key and kb .. tostring(k) .. ke or ''
            content = content .. indent .. k_string .. v_string
            if i < size then content = content .. sep end
        end
        ...
    end
    ...
end

1: By default we show this key based on the value of show_keys.
2: If v is new and has a reference count greater than 1, we show the corresponding key even when show_keys is false. We must do that so that any path references to v make sense.
3: Now that we know the state of play, we can finally set the string for this key.

With this change, print(pretty(rooms)) gives:

[
    1 = {
        name = "Library",
1        next = <2>,
        prev = <4>,
        weapon = "Lead Pipe"
    },
    2 = {
        name = "Kitchen",
        next = <3>,
        prev = <1>,
        weapon = "Knife"
    },
    3 = {
        name = "Lounge",
        next = <4>,
        prev = <2>,
        weapon = "Poison"
    },
    4 = {
        name = "Bedroom",
        next = <1>,
        prev = <3>,
        weapon = "Garrotte"
    }
]

1: The path reference <2> now makes perfect sense.

Here’s what we get for print(alt(rooms)):

1:
    name: "Library",
    next: <2>,
    prev: <4>,
    weapon: "Lead Pipe",
2:
    name: "Kitchen",
    next: <3>,
    prev: <1>,
    weapon: "Knife",
3:
    name: "Lounge",
    next: <4>,
    prev: <2>,
    weapon: "Poison",
4:
    name: "Bedroom",
    next: <1>,
    prev: <3>,
    weapon: "Garrotte"

This output is also very readable.

One Small Tweak

Our current definition of a “simple” table is one that has no sub-tables. But what is a sub-table?

We can very slightly alter our metadata function so that it does not count path references as distinct sub-tables.

local function metadata(root_tbl)
    ...
    local function process(tbl)
        ...
        for _, v in pairs(tbl) do
            ...
            if type(v) == 'table' then
1                -- subs = subs + 1
                if md[v] then
                    md[v].refs = md[v].refs + 1
                else
2                    subs = subs + 1
                    table.insert(children, v)
                    md[v] = { refs = 1 }
                end
            end
        end
        ...
    end
end

1: We move this line
2: to here.

With that change, only “real” sub-tables count towards the subs total.

print(pretty(rooms)) now gives the more compact but still readable:

[
    1 = { name = "Library", next = <2>, prev = <4>, weapon = "Lead Pipe" },
    2 = { name = "Kitchen", next = <3>, prev = <1>, weapon = "Knife" },
    3 = { name = "Lounge", next = <4>, prev = <2>, weapon = "Poison" },
    4 = { name = "Bedroom", next = <1>, prev = <3>, weapon = "Garrotte" }
]
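Why does every room now fit on one line? The sketch below is a trimmed copy of the tweaked metadata function that tracks only refs and subs. Because the outer rooms array is scanned first, all four rooms are registered before any room is processed, so every next and prev value inside a room is already known and counts as a path reference rather than a sub-table.

```lua
-- Trimmed copy of the tweaked metadata(): only refs and subs are tracked,
-- and a sub-table counts towards subs only the first time it is seen.
local function metadata(root_tbl)
    local md = { [root_tbl] = { refs = 1 } }
    local function process(tbl)
        local subs, children = 0, {}
        for _, v in pairs(tbl) do
            if type(v) == 'table' then
                if md[v] then
                    md[v].refs = md[v].refs + 1   -- a path reference, not a new sub-table
                else
                    subs = subs + 1               -- a "real" sub-table
                    table.insert(children, v)
                    md[v] = { refs = 1 }
                end
            end
        end
        md[tbl].subs = subs
        for _, child in ipairs(children) do process(child) end
    end
    process(root_tbl)
    return md
end

local rooms = {
    { name = "Library", weapon = "Lead Pipe" },
    { name = "Kitchen", weapon = "Knife"     },
    { name = "Lounge",  weapon = "Poison"    },
    { name = "Bedroom", weapon = "Garrotte"  }
}
rooms[1].next, rooms[2].next, rooms[3].next, rooms[4].next = rooms[2], rooms[3], rooms[4], rooms[1]
rooms[1].prev, rooms[2].prev, rooms[3].prev, rooms[4].prev = rooms[4], rooms[1], rooms[2], rooms[3]

local md = metadata(rooms)
print(md[rooms].subs)                  -- 4
for _, room in ipairs(rooms) do
    print(room.name, md[room].subs)    -- 0 for every room
end
```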

Scribe Facade

Introduction

After the first attempt at table_string(tbl), we commented that, while the method name was descriptive, we needed to check that the tbl argument is an actual table.

Instead of doing that, we will create another “facade” function scribe that will return a string for any Lua object. The user will call this function, and we will make table_string a private function only called by scribe when the object is a table. Currently, our table_string function starts as follows:

1function table_string(root_tbl, opts)
2    opts = opts or options.pretty
    local md = metadata(root_tbl)
    ...
end

1: table_string is a global function that is available to the user.
2: It has to check if opts is provided; if not, set it to the default options.pretty.

We will change this to:

1local function table_string(root_tbl, opts)
2    local md = metadata(root_tbl)
    ...
end

1: We make table_string a local function.
2: We remove the opts check as we know that scribe will always provide it.

In a later section, we will discuss the difference between global and local functions.

In the meantime, we introduce scribe as follows:

1function scribe(obj, opts)
2    if type(obj) ~= 'table' then return simple_string(obj) end
3    opts = opts or options.pretty
4    return table_string(obj, opts)
end

1: obj can be any Lua object, and opts is an optional table of formatting options.
2: We handle non-table objects up-front by calling simple_string.
3: We set the opts to the default options.pretty if it is not provided.
4: If we get here, we know that obj is a table, so we call the private table_string method to convert it to a string.

Of course, our other public facade functions will also call scribe instead of table_string directly. For example, pretty_string will now look like this:

function pretty_string(tbl)
1    return scribe(tbl, options.pretty)
end

1: We call scribe with the options.pretty table.

Health and Safety

We have now added a layer of protection to our table_string function by ensuring that it is only called by scribe when the object is a table.

However, we still need to check that the opts table is complete. Each of those many fields in the options table must be present, or table_string will fail.

Of course, we are sure that the standard options tables we provide are complete, but what if the user provides their own options table?

We start by adding a “marker” to our own options tables to indicate that they are complete:

local options = {}
options.pretty = {
    indent        = '    ',
    table_begin   = '{',
    table_end     = '}',
    array_begin   = '[',
    array_end     = ']',
    key_begin     = '',
    key_end       = ' = ',
    sep           = ',',
    inline_spacer = ' ',
    show_indices  = false,
    comparator    = compare,
    inline_size   = math.huge,
    path_root     = 'table',
    path_sep      = '.',
    path_begin    = '<',
    path_end      = '>',
1    COMPLETE      = true
}

1: If the user provides their own options table, we will check for the presence of this field to determine if it is complete.

We also add a function that adds missing fields to an options table:

1local function complete_options_table(options, from)
    for k, v in pairs(from) do
2        if options[k] == nil then options[k] = v end
    end
end

1: This function takes two arguments: the opts table to complete and the from table to use as a template.
2: We add missing fields from the from table to the opts table.

complete_options_table is a private function that is only called by scribe, so it does not need to guard against edge cases. For example, we can be confident that there will be two arguments and that the second argument will be a complete options table.

We call this function from scribe:

function scribe(obj, opts)
    if type(obj) ~= 'table' then return simple_string(obj) end
1    opts = opts or options.pretty
2    if not opts.COMPLETE then
3        local from = opts.indent == '' and options.inline or options.pretty
4        complete_options_table(opts, from)
    end
5    return table_string(obj, opts)
end

1: If the user does not provide any options table, we use the options.pretty table.
2: If the user provides a custom options table, we ensure it’s complete before calling table_string.
3: We use the options.inline table if the indent field is empty. Otherwise, we use the options.pretty table.
4: We call the complete_options_table function to add any missing fields to the opts table.
5: We can safely call table_string with the complete opts table.

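Because our own options tables carry the COMPLETE marker, scribe skips the completion step for them entirely. The common case costs a single field lookup, while user-supplied tables still get checked, so the code stays robust.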

There is a caveat to this approach. If the user provides their own incomplete options table, then the first time we see it, we alter it. Generally, changing things under the covers is a bad idea, but in this case, the user pays the completion cost only once. All in all, it is a reasonable trade-off.
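The following is a minimal sketch of that trade-off, trimmed to a few option fields. The names defaults, resolve and completions are hypothetical stand-ins for options.pretty, the opts-handling part of scribe and a counter we add purely for illustration. It shows that the user’s table is completed in place, that values the user set are kept, and that the copied COMPLETE flag lets the second call skip the work.

```lua
-- Hypothetical stand-in for options.pretty, trimmed to a few fields.
local defaults = { indent = '    ', sep = ',', key_end = ' = ', COMPLETE = true }

local function complete_options_table(options, from)
    for k, v in pairs(from) do
        if options[k] == nil then options[k] = v end
    end
end

local completions = 0   -- counts how often the copy actually runs (illustration only)

-- Hypothetical stand-in for the opts handling at the top of scribe.
local function resolve(opts)
    opts = opts or defaults
    if not opts.COMPLETE then
        completions = completions + 1
        complete_options_table(opts, defaults)
    end
    return opts
end

local my_options = { indent = '  ' }
resolve(my_options)
resolve(my_options)

print(my_options.indent == '  ')   -- true: the user's value is kept
print(my_options.sep)              -- , (filled in from defaults)
print(my_options.COMPLETE)         -- true (copied from defaults)
print(completions)                 -- 1: the second call skipped the copy
```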

Here is an example of how the user can provide their own minimal options table that sets the indent to two spaces:

local my_options = { indent = '  ' }
local user =
{
    first = "Minnie",
    last = "Mouse",
    friends = { "Mickey", "Goofy" }
}
print(scribe(user, my_options))

This will output:

{
  first = "Minnie",
  last = "Mouse",
  friends = { "Mickey", "Goofy" }
}

After that first call, the my_options table is complete as far as table_string is concerned. We can inspect it with a call to print(classic(my_options)), which outputs:

{
1    COMPLETE = true,
    array_begin = "[",
    array_end = "]",
2    comparator = <function>,
3    indent = "  ",
4    inline_size = inf,
    inline_spacer = " ",
    key_begin = "",
    key_end = " = ",
    path_begin = "<",
    path_end = ">",
    path_root = "table",
    path_sep = ".",
    sep = ",",
    show_indices = false,
    table_begin = "{",
    table_end = "}"
}

1: The COMPLETE field is present and set to true; all the other fields are present and mostly set to the default values from the options.pretty table.
2: The comparator field is shown as <function>.
3: The indent field is set to two spaces as provided by the user.
4: inf means infinity, accessible in Lua as math.huge.

The next time we call scribe with the my_options table, it will be complete and we will not have to call complete_options_table again.

Overrides

We also want to allow the user to override one or more options in any of the pre-canned options tables.

The signature of our main scribe function will now look like this:

1function scribe(obj, opts, overrides)
    ...
end

1: We add a third argument, overrides, which is an optional table of options to override.

Now, both the second opts argument and the third overrides argument are optional. A moment’s thought will convince you that if the opts argument is missing, the overrides argument is also.

Here is the full scribe function:

function scribe(obj, opts, overrides)
1    if type(obj) ~= 'table' then return simple_string(obj) end

2    if opts == nil then return table_string(obj, options.pretty) end

3    if not opts.COMPLETE then
        local from = opts.indent == '' and options.inline or options.pretty
        complete_options_table(opts, from)
    end
4    if overrides == nil then return table_string(obj, opts) end

    if not overrides.COMPLETE then complete_options_table(overrides, opts) end
5    return table_string(obj, overrides)
end

1: As usual, we handle non-table objects up-front.
2: If the user does not provide an opts table, we use the options.pretty table and are done.
3: We complete an incomplete opts table if the user provides it.
4: If the user does not provide an overrides table, we use the opts table and are done.
5: If the user provides an overrides table, we complete it from the opts table and use it.
By the time we get here, we can be sure that the opts table is complete.
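Here is a trimmed sketch of that three-way resolution in isolation. The tables pretty and inline and the function resolve are hypothetical stand-ins for options.pretty, options.inline and the option-handling part of scribe; instead of calling table_string, resolve simply returns the options table that would be used. It shows that an override wins for the fields it sets and that every other field is filled in from the base options table, which itself is left untouched.

```lua
-- Hypothetical stand-ins for options.pretty and options.inline.
local pretty = { indent = '    ', sep = ',', COMPLETE = true }
local inline = { indent = '',     sep = ',', COMPLETE = true }

local function complete_options_table(options, from)
    for k, v in pairs(from) do
        if options[k] == nil then options[k] = v end
    end
end

-- Mirrors scribe's logic but returns the options table instead of a string.
local function resolve(opts, overrides)
    if opts == nil then return pretty end
    if not opts.COMPLETE then
        complete_options_table(opts, opts.indent == '' and inline or pretty)
    end
    if overrides == nil then return opts end
    if not overrides.COMPLETE then complete_options_table(overrides, opts) end
    return overrides
end

local eight = resolve(pretty, { indent = '        ' })
print(#eight.indent)            -- 8: the override wins
print(eight.sep)                -- , (everything else comes from the base table)
print(#pretty.indent)           -- 4: the base table is not modified
print(resolve(nil) == pretty)   -- true
```

Because the overrides table is completed in place and marked COMPLETE, reusing one overrides table with a different base table later keeps the fields copied the first time; passing a fresh table literal, as in the examples, sidesteps that.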

We also alter the facade functions to permit an overrides table. For example:

function pretty_string(tbl, overrides)
1    return scribe(tbl, options.pretty, overrides)
end

1: The main options table is options.pretty, and we also pass along any user-provided overrides table.

Here is an example of how the user can take one of the pre-canned formats and override just its indent field:

local user =
{
    first = "Minnie",
    last = "Mouse",
    friends = { "Mickey", "Goofy" }
}
print(classic(user, { indent = '        ' }))

Output:

{
        first = "Minnie",
        last = "Mouse",
        friends = { "Mickey", "Goofy" }
}

Metamethods

We mentioned earlier that any Lua table can have an associated metatable.

The metatable is a regular table with arbitrary data and methods like any other table. However, if a table tbl has a metatable mt, Lua will check for specially named methods, metamethods, in mt and use those in place of its built-in default operations.

Metamethods, particularly the __index metamethod, are the keys to understanding how to use prototype and object-oriented methodologies in Lua. However, that isn’t the topic for today.

The one metamethod that interests us here is __tostring. (The names of all Lua’s metamethods start with a double underscore.)

Here’s an example where we create a metatable with a __tostring method inside it:

1local count = 0
2local mt = {}
3function mt.__tostring(tbl)
    count = count + 1
4    return 'This is print number: ' .. tostring(count) .. ' for an array of size: ' .. #tbl
end

1: count will get incremented every time the __tostring metamethod is called.
2: mt is just a regular empty Lua table.
3: We add a function __tostring to mt.
4: Every time mt.__tostring is called, we increment count and return a string with the latest count.

You will frequently see the equivalent definition:

mt.__tostring = function(tbl)
    count = count + 1
    return 'This is print number: ' .. tostring(count) .. ' for an array of size: ' .. #tbl
end

The former style is more in keeping with most other programming languages. If you plan on expanding your horizons beyond Lua, stick with that look. However, both styles are perfectly acceptable and produce identical byte code.

For this metamethod to have any effect, we must attach its containing metatable to a Lua table using the setmetatable method:

local arr = { 1, 2, 3 }
setmetatable(arr, mt)

If you just give arr a __tostring field directly, Lua will not redirect any calls to it. For Lua to see a metamethod, you must put it in a metatable and attach that metatable to the object. The setmetatable call endows arr with a hidden metatable, and the existence of that metatable is what triggers Lua to redirect some of its operations to your custom definitions.

Let’s exercise that metamethod:

print(tostring(arr))
print(tostring(arr))
print(tostring(arr))
print(tostring(arr))

This yields:

This is print number: 1 for an array of size: 3
This is print number: 2 for an array of size: 3
This is print number: 3 for an array of size: 3
This is print number: 4 for an array of size: 3

The built-in tostring method now redirects calls to the mt.__tostring method. If we remove the metatable:

setmetatable(arr, nil)

Then tostring(arr) reverts to something like:

table: 0x15f852480
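The short sketch below checks those claims side by side: a __tostring function stored as an ordinary field is just data, the same function placed in an attached metatable is honoured, and detaching the metatable restores the default text. The variable names are illustrative.

```lua
-- A __tostring field stored directly in the table is just data...
local plain = { 1, 2, 3 }
plain.__tostring = function() return 'custom text' end
print(tostring(plain))        -- still the default "table: 0x..." form

-- ...but the same kind of function in an attached metatable is honoured.
local arr = { 1, 2, 3 }
setmetatable(arr, { __tostring = function(t) return 'array of size ' .. #t end })
local with_mt = tostring(arr)
print(with_mt)                -- array of size 3

-- Detaching the metatable restores the default behaviour.
setmetatable(arr, nil)
print(tostring(arr))          -- "table: 0x..." again
```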

Well, suppose the user is sophisticated enough to have added a custom __tostring metamethod to return a custom string for a particular table or class of tables. In that case, we should honour their effort by using that method.

We can add a call to the top of table_string to check for a custom __tostring metamethod and, if present, use that instead of our paltry efforts.

However, it is best to make that optional, which we do by adding a field to our options table:

local options = {}
options.pretty = {
    indent        = '    ',
    table_begin   = '{',
    table_end     = '}',
    array_begin   = '[',
    array_end     = ']',
    key_begin     = '',
    key_end       = ' = ',
    sep           = ',',
    inline_spacer = ' ',
    show_indices  = false,
    comparator    = compare,
    inline_size   = math.huge,
    path_root     = 'table',
    path_sep      = '.',
    path_begin    = '<',
    path_end      = '>',
1    use_metatable = true,
    COMPLETE      = true
}

1: If true and if there is a custom __tostring metamethod, then we redirect the table conversion to that method.

With that change, the top of table_string looks like this:

local function table_string(root_tbl, opts)
    ...
    local function process(tbl, path)
1        if opts.use_metatable then
2            local mt = getmetatable(tbl)
3            if mt and mt.__tostring then return mt.__tostring(tbl) end
        end
        ...

1: Check whether we are allowed to use metamethods.
2: Check whether tbl has a metatable.
3: If tbl has an associated __tostring metamethod, invoke it and return early.

For example, if:

local count = 0
local mt = {}
function mt.__tostring(tbl)
    count = count + 1
    return 'This is print number: ' .. tostring(count) .. ' for a table of size: ' .. #tbl
end
local tbl = { 1, 2, 3 }
setmetatable(tbl, mt)

Then print(pretty(tbl)) yields:

This is print number: 1 for a table of size: 3

Why Optional?

Can you guess why we made using any custom __tostring metamethod controllable as a format option? When wouldn’t we want to use it?

Metamethods like __tostring are usually attached to a whole class of tables instead of a particular instance. The method might do something specific to the class as a whole and then defer much of the work back to scribe to convert the instance data to a string.

You then run into the danger of chasing your tail. The custom __tostring method calls table_string, which then calls the __tostring method and so on, ad infinitum!

In this case, the custom method must call scribe with use_metatable set to false to break the cycle.

Here’s an example:

local count = 0
local mt = {}
function mt.__tostring(tbl)
    count = count + 1
1    local tbl_options = { use_metatable = false }
    local tbl_string  = inline(tbl, tbl_options)
    return 'Print: ' .. tostring(count) .. ' for table: ' .. tbl_string
end

1: Without this override, the following line would cause a stack overflow.

Then:

local tbl = { 1, 2, 3 }
setmetatable(tbl, mt)
print(pretty(tbl))
print(pretty(tbl))
print(pretty(tbl))

Yields:

Print: 1 for table: [ 1, 2, 3 ]
Print: 2 for table: [ 1, 2, 3 ]
Print: 3 for table: [ 1, 2, 3 ]
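To make the failure mode concrete, here is a sketch that reproduces both cases with a tiny formatter. The function fmt is a hypothetical stand-in for scribe that honours __tostring only when its use_metatable argument is true, and the guarded flag exists only so the same metatable can demonstrate both behaviours. With the guard, the metamethod defers to the plain formatter; without it, each call re-enters __tostring until Lua runs out of stack, which pcall catches as an error.

```lua
-- Hypothetical stand-in for scribe: honours __tostring only if use_metatable is true.
local function fmt(tbl, use_metatable)
    if use_metatable then
        local mt = getmetatable(tbl)
        if mt and mt.__tostring then return mt.__tostring(tbl) end
    end
    return '[ ' .. table.concat(tbl, ', ') .. ' ]'
end

local mt = {}
local guarded = true   -- illustration only: toggles the cycle-breaking option

function mt.__tostring(tbl)
    -- When guarded is false, this asks fmt to use the metatable again: a cycle.
    return 'Custom: ' .. fmt(tbl, not guarded)
end

local tbl = setmetatable({ 1, 2, 3 }, mt)
print(fmt(tbl, true))                  -- Custom: [ 1, 2, 3 ]

guarded = false
local ok = pcall(fmt, tbl, true)       -- chases its own tail until the stack overflows
print(ok)                              -- false
guarded = true
```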

The `scribe` Module

In Lua, if you have a file where you set:

answer = 42

You are creating a global variable answer with the value 42. Once that line has run, answer is visible to all other Lua code running in the same Lua state, including any Lua files loaded afterwards.

On the other hand, if you write:

local answer = 42

You are creating a local variable answer that is only available in the current file.

The same thing applies to functions. If you write:

function bump(a)
    answer = 42
    return a + answer
end

Then bump is a global function that can be called from any other Lua file. Moreover, even though answer is set in the bump function, it is a global variable that can be accessed and modified from anywhere.

On the other hand, if you write:

local function bump(a)
    local answer = 42
    return a + answer
end

Then bump is a local function that can only be called from within the current file. answer is a local variable that can only be accessed and modified within the function bump.
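You can see the difference by looking in _G, the table that holds Lua’s global variables. In this sketch, bump_global, bump_local and secret are illustrative names: the global version leaks both the function and answer into _G, while the local version leaks nothing.

```lua
-- Global version: neither the function nor 'answer' is declared local,
-- so both land in the global table _G.
function bump_global(a)
    answer = 42
    return a + answer
end

-- Local version: the function and 'secret' stay private.
local function bump_local(a)
    local secret = 42
    return a + secret
end

print(bump_global(10), bump_local(10))   -- 52 and 52
print(_G.answer, _G.bump_global)         -- 42 and a function
print(_G.secret, _G.bump_local)          -- nil and nil
```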

Prepending local to variables and functions confines them to the enclosing scope.

This is a good practice because it reduces the chance of inadvertently modifying variables or functions that are used elsewhere. It also makes the intent of the code much clearer.

In general, you should always use local unless you have a good reason not to.

In Lua, the local keyword declares variables and functions as local to the block in which they are declared. I suspect that, with the benefit of hindsight, Lua’s designers would make local the default and add some other keyword to make variables global. You will have many more local variables than global ones in your code, so that switch would be very beneficial. However, that is not how Lua is designed, so you must remember to use local to keep your code clean and maintainable.

We have been fairly careful to use local in our code to this point.

Modules

There is a further level of encapsulation that we have not yet discussed: modules.

A module is a collection of functions and variables that are grouped together in a single Lua table. The table is returned by the module and can be used to access the functions and variables within it.

Here is a simple example of a module in a file called answer.lua:

1local M = {}

2local answer = 42

3function M.bump(a)
    return a + answer
end

4return M

1: We create a local table M to hold our module.
The name M is a common convention and has nothing to do with how the module is stored or used.
2: answer is a local variable that is only accessible within the module (within answer.lua).
3: We define a function bump within the module. It will become publicly accessible.
4: We export the module at the end of the file where it’s defined.
The return M statement makes the module available to any other Lua file that requires it.

To use the module in another file, you would write:

1local answer = require 'answer'
2print(answer.bump(10))

1: require is a built-in Lua function that loads a module and returns the table that the module exports.
2: We call the bump function from the answer module to print 52.

Notice that the answer module is a self-contained unit. It has its own local variables (and potentially local functions) that are private and not accessible from outside the module. The only way to interact with the module is through the functions and variables that it exports. Generally, the only thing that a module exports is a table that contains the functions and variables that you want to make available to the outside world. What you call the module internally is up to you, but the convention is to use M.

Typically, the user of the module will import the module into a local variable with the same name as the module’s file (without the .lua extension) though that is not a requirement.

Modules are a powerful way to organize your code and keep it clean and maintainable.
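If you want to try the module pattern without creating a second file, you can register the loader in package.preload, which require consults before it searches package.path. The sketch below recreates the answer module that way. It also shows two properties described above: the module’s local answer is not exported, and require caches the module table in package.loaded, so a second require returns the same table.

```lua
-- Register the answer module in memory instead of in answer.lua.
-- require checks package.preload before it searches package.path.
package.preload['answer'] = function()
    local M = {}
    local answer = 42                      -- private to the module
    function M.bump(a) return a + answer end
    return M
end

local answer = require 'answer'
print(answer.bump(10))                     -- 52
print(answer.answer)                       -- nil: the private local is not exported
print(require('answer') == answer)         -- true: the module is cached in package.loaded
```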

The `scribe` Module

Here is a sketch of how we can turn our current code into a module defined in a file called scribe.lua:

1local M = {}

2local function indent_string(str, indent, ignore_first_line)
    ...
end

local function compare(a, b)              ... end
local function ordered_pairs(comparator)             ... end
local function simple_string(obj)                    ... end
local function empty_table_string(opts)              ... end
local function metadata(root_tbl)                    ... end
local function table_string(root_tbl, opts)          ... end
local function table_clone(tbl)                      ... end
local function complete_options_table(options, from) ... end

3M.options = {}

4M.options.pretty      = { ... }
5M.options.inline      = table_clone(M.options.pretty)
...
M.options.classic     = table_clone(M.options.pretty)
...
M.options.alt         = table_clone(M.options.pretty)
...
M.options.json        = table_clone(M.options.pretty)
...
M.options.inline_json = table_clone(M.options.json)
...
M.options.debug       = table_clone(M.options.pretty)
...
M.options.default     = M.options.inline
...

6function M.scribe(obj, opts, overrides)
    if type(obj) ~= 'table' then return simple_string(obj) end

    if opts == nil then return table_string(obj, M.options.default) end

    if not opts.COMPLETE then
        local from = opts.indent == '' and M.options.inline or M.options.pretty
        complete_options_table(opts, from)
    end
    if overrides == nil then return table_string(obj, opts) end

    if not overrides.COMPLETE then complete_options_table(overrides, opts) end
    return table_string(obj, overrides)
end

7function M.pretty(tbl, overrides)
    return M.scribe(tbl, M.options.pretty, overrides)
end

function M.inline(tbl, overrides)       ... end
function M.classic(tbl, overrides)      ... end
function M.alt(tbl, overrides)          ... end
function M.json(tbl, overrides)         ... end
function M.inline_json(tbl, overrides)  ... end
function M.debug(tbl, overrides)        ... end

8return M

1: We create a local table M to hold our module.
It will contain all of the functions and variables that we want to export.
2: We define all the private helper functions that we need for our module.
These functions are declared as local and are not accessible from outside the module.
3: We create a table M.options to hold all of the options that we will use in our module.
These will all be accessible from the outside as we want the user to be able to modify them.
4: Where before we had options.pretty = { ... }, we now have M.options.pretty = { ... }.
5: And so on for the other tables of formatting parameters.
6: The main scribe function is now a member of the module.
It is shown in full so you can see how it uses both public options data and private helper functions.
7: The same goes for all our convenience facade functions, like pretty, inline, classic, etc.; they are now members of the module too.
8: We finish by exporting the module by returning the table M.

Here is how you would use the scribe module in another file:

1local scribe = require 'scribe'
2print(scribe.pretty({a = 1, b = 2}))

1: We import the scribe module into a local variable scribe.
2: We call the pretty function from the module to print a nicely formatted table.

This yields:

{
    a = 1,
    b = 2
}

A Little Bonus

Once you’ve loaded the scribe module, you can access the pretty function as scribe.pretty and so on. If you use the pretty function a lot, you can make it available as a local variable in your file:

local scribe = require 'scribe'
local pretty = scribe.pretty
local inline = scribe.inline

It would also be nice to have a shorthand for scribe.scribe.

We add a __call metamethod to the scribe table to do that. Lua calls this metamethod when you treat the table as a function (i.e. when you use scribe(...)).

Metamethods do not go in the module table itself. Instead, you give the module table a metatable that contains the metamethods. Judging by the number of questions about it on the internet, this extra level of indirection confuses many people.

In our case, we add the __call metamethod to the metatable of the scribe module as follows:

1local mt = {}
2function mt.__call(_, ...) return M.scribe(...) end
3setmetatable(M, mt)

return M

1: Start with an ordinary empty table mt.
2: Add the __call metamethod to the table.
The first argument to the metamethod is the table itself, but we don’t need it so we use _.
The ... collects all the arguments passed to the function.
3: We endow our module table M with the metatable mt that contains the metamethods.

You can use _ as a placeholder for any argument you don’t need. Also, note that ... is Lua’s vararg expression: it stands for all the remaining arguments passed to the function, and here it forwards them to M.scribe unchanged.

With that addition, you can now use scribe as a function:

local scribe = require 'scribe'
print(scribe({a = 1, b = 2}))

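Since we pass no options table, scribe falls back to M.options.default, which is the inline format, so this prints the table on a single line: { a = 1, b = 2 }.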
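Here is the __call trick on its own, in a sketch you can run without scribe.lua. The module M and its scribe function are hypothetical stand-ins; the point is that calling the table forwards to M.scribe, and that __call lives in the metatable rather than in the module table itself.

```lua
-- Hypothetical stand-in module with a single scribe function.
local M = {}
function M.scribe(obj) return 'scribed: ' .. tostring(obj) end

local mt = {}
-- The first argument is the table being called (M itself); we ignore it.
function mt.__call(_, ...) return M.scribe(...) end
setmetatable(M, mt)

print(M(42))          -- scribed: 42
print(M.scribe(42))   -- scribed: 42
print(M.__call)       -- nil: the metamethod is not a field of M
```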

`require` Gotcha

require is a built-in Lua function that loads a module and returns whatever the module exports.

It looks for the module’s source file using Lua’s package.path variable. This is a long string of path templates that Lua tries, in order, when you require a module. The templates in package.path are separated by semicolons.

Running Lua from the command line and typing:

print(package.path)

I get something like:

1/usr/local/share/lua/5.4/?.lua;
/usr/local/share/lua/5.4/?/init.lua;
/usr/local/lib/lua/5.4/?.lua;
/usr/local/lib/lua/5.4/?/init.lua;
2./?.lua;
./?/init.lua

1: Actually, the output is on a single line, but I have broken it up for clarity.
2: The . refers to the current directory.

The first four entries are the system directories where Lua looks for modules. Those were set when Lua was installed. The ./?.lua entry tells Lua to also look for modules in the “current” directory.

By the way, the ? is a wildcard that Lua replaces with the module name you pass to require (any dots in that name become directory separators).
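In Lua 5.2 and later, you can watch that substitution happen with package.searchpath(name, path), which performs the same kind of search over a path string and, when nothing is found, returns nil plus a message listing every file it tried. The module name my.mod and the directory /no/such/dir below are made up so that the search is guaranteed to fail.

```lua
-- Search two made-up templates for a made-up dotted module name.
local found, err = package.searchpath('my.mod', '/no/such/dir/?.lua;/no/such/dir/?/init.lua')
print(found)   -- nil
print(err)     -- lists each file tried, e.g. no file '/no/such/dir/my/mod.lua'
```

The same kind of list is what you see in a failed require message.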

With this setup, you can drop scribe.lua into the same directory as your main Lua file and require it. Everything works fine as long as you launch Lua from that directory.

However, these days you are quite likely to run Lua from an IDE or perhaps via a plugin in another application. For example, I sometimes run Lua from ZeroBrane Studio, a free, lightweight IDE for Lua with a full-featured debugger (it’s cross-platform and highly recommended). Other times I run Lua from VSCode with the Lua for Visual Studio Code extension.

In both cases, the current directory is often not the directory where your Lua files are! Instead, it may be the directory where the IDE or plugin is installed.

When you run Lua from these environments, you will get an error when you try to require a module in the same directory as your main Lua file. The error will be something like:

module 'scribe' not found:
    no field package.preload['scribe']
1    no file './scribe.lua'
    no file '/usr/local/share/lua/5.4/scribe.lua'
    no file '/usr/local/share/lua/5.4/scribe/init.lua'
    ...

1: This no file line will make you scratch your head!

It appears that Lua is looking for ./scribe.lua and not finding it, even though it is clearly in the same directory as your main Lua file. You’ll probably double- and triple-check that the file is there and that you have spelled the name correctly. Nothing will help.

The confusion arises because you think . is the directory containing your main Lua file, but to the IDE or plugin it is its own working directory, often the directory where it is installed.

The solution is to add the script’s directory to package.path. You could hardcode that directory name and add it to package.path, but that’s clunky. If you change the directory structure of your project, you will have to remember to change the hardcoded path.

Instead, you can use Lua’s debug library to get the directory of the current source file. Here is how you can do that:

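1local source_dir = debug.getinfo(1, 'S').source:match [[^@?(.*[\/])[^\/]-$]] or './'
2package.path = source_dir .. "?.lua;" .. package.path

1: This magic incantation gets the directory of the current source file. For code loaded from a file, debug.getinfo(1, 'S').source is an @ followed by the file name, and the pattern captures everything up to and including the last / or \. If the name has no directory part, the match fails, so we fall back to ./.
2: This line prepends the directory to package.path, so it is searched before the system directories.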

You can put these lines at the top of your main Lua file and they will ensure that require works correctly.

This isn’t terribly elegant, but it is a portable way to ensure that your modules are found in the “current” directory when you run Lua from an IDE or plugin.
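To see what the pattern actually extracts, the sketch below wraps it in a hypothetical source_dir helper and feeds it a few typical source strings. The ./ fallback for names without a directory part is an addition in this sketch, not something the pattern provides on its own.

```lua
-- Hypothetical helper wrapping the pattern from the text.
-- Captures everything up to and including the last '/' or '\';
-- falls back to './' when the file name has no directory part.
local function source_dir(source)
    return source:match [[^@?(.*[\/])[^\/]-$]] or './'
end

print(source_dir('@/home/me/project/main.lua'))   --> /home/me/project/
print(source_dir('@main.lua'))                    --> ./
print(source_dir([[@C:\proj\main.lua]]))          --> C:\proj\
```

The third case shows why the character set contains both slashes: the same line works with Windows-style paths.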

LuaRocks

scribe, like many Lua modules, is available via LuaRocks.

LuaRocks is the package manager for Lua modules and, when you install LuaRocks, it makes sure that any modules you install using it are available to Lua via require. It adds some LuaRocks standard directories to package.path so that Lua can find the modules.

If you install scribe using LuaRocks, you won’t have to worry about the require gotcha. LuaRocks will take care of everything for you.

Summary

At this point, we have developed a production-ready version of scribe. It produces readable output for complex tables, including those with cyclical references. scribe also supports options for customizing the output in many ways.

Our module also comes with pre-packaged styles for common output formats and simple, user-friendly functions for printing tables in those formats. For the most part, users can just call pretty, json, etc. and get a good result without having to worry about the details.

Formatted Output

Stringing together messages using concatenation quickly becomes cumbersome.

Lua provides a simple way to format strings using the string.format method, similar to the sprintf function in C.

print(string.format("The value of %s is %.2f", 'pi', math.pi))

This prints The value of pi is 3.14 to your screen.

The format string "The value of %s is %.2f" is a template containing placeholders for the values you want to insert. It is a recipe for baking a string by replacing the placeholders with the trailing arguments to string.format.

The general form for calling string.format is:

string.format(format_string, arg1, arg2, ...)

The first argument is the format string; the rest are the values that string.format will insert into the placeholders. It is a variadic function, which means it can take any number of arguments after the format string.

Placeholders like %s and %f are format specifiers that tell string.format to look for a trailing argument that is a string and another that is a floating point number. The .2 in %.2f is a format modifier, and it tells string.format to round the floating point number to two decimal places. The placeholders are replaced by the trailing arguments in the order they appear in the format string.

string.format is closely modeled on the venerable sprintf function in C and supports almost all the same format specifiers and modifiers. As we mentioned earlier, it adds a couple of extra format specifiers, like %q, which are not available in C, and it drops a few of the more esoteric ones that are rarely used in practice.
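Here are a few of the standard specifiers and modifiers side by side, plus %q, whose output is a quoted string that Lua itself can read back. This is just an illustration and assumes Lua 5.2 or later, where load accepts a string chunk.

```lua
-- Width, precision, left-justification and zero-padding, then hex.
local row = string.format("[%5.1f] [%-6s] [%03d] [%x]", 3.14159, "ab", 7, 255)
print(row)      --> [  3.1] [ab    ] [007] [ff]

-- %q quotes a string and escapes what needs escaping.
local quoted = string.format("%q", 'say "hi"')
print(quoted)   --> "say \"hi\""

-- Compiling the quoted text as a Lua expression gives back the original.
print(load("return " .. quoted)() == 'say "hi"')   --> true
```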

At some point, everyone recreates the same wrapper around string.format that looks like this:

1function printf(format_string, ...)
    print(string.format(format_string, ...))
end

1: The name used here is printf to mimic the C function of the same name.

You can use this function to print formatted strings like this:

printf("The value of %s is %.2f", 'pi', math.pi)

Creating formatted output using string.format is a big step up from concatenation, but string.format has no concept of a Lua table. The underlying C code knows nothing about Lua’s data structures: a %s placeholder falls back on tostring, which, unless the table has a __tostring metamethod, just gives you the table’s address, something like table: 0x600000f30040.
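Here is that limitation in action, assuming Lua 5.2 or later, where %s accepts any value and converts it with tostring. A __tostring metamethod is one way around it, but you would have to attach one to every table you ever want to print.

```lua
local person = { name = 'Alice', age = 42 }

-- Without a metamethod, %s falls back on tostring and shows only an address.
local plain = string.format("Data: %s", person)
print(plain)       -- something like: Data: table: 0x600000f30040 (address varies)

-- A __tostring metamethod changes what tostring, and therefore %s, produces.
setmetatable(person, {
    __tostring = function(t) return t.name .. ' (' .. t.age .. ')' end,
})
local described = string.format("Data: %s", person)
print(described)   --> Data: Alice (42)
```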

Adding Tables to the Mix

Scribe provides a scribe.format function that is a drop-in replacement for string.format with the added ability to format Lua tables.

local person = {name = 'Alice', age = 42}
print(scribe.format("Data: %t", person))

This prints Data: { age = 42, name = "Alice" } to your screen.

We do this by adding a new format specifier, %t, that tells scribe.format to format the trailing argument as a table. We have added several new format specifiers that allow you to format Lua tables in various ways.

It happens that %t, %T, %j, and %J were not already claimed as specifiers by string.format. Moreover, those specifiers are mnemonic and easy to remember:

%t formats a table as an inline string.
%T formats a table as a multiline string.
%j formats a table as a compact inline JSON string.
%J formats a table as a pretty-printed multiline JSON string.

So, uppercase %T and %J are for multiline output, while lowercase %t and %j are for inline output.

The signature for scribe.format is the same as string.format:

function M.format(template, ...)
...
end

The first argument is the format string; the rest are the values we will insert into the placeholders.

We know that all placeholders have the form %<modifier><specifier>, where <specifier> is the only required part. Our new format specifiers %t, %T, %j, and %J are no different.

Our custom format method looks for those new specifiers in the format string. If none exist, it calls string.format with the same arguments and returns the result.

If it finds any new specifier, it formats the trailing table argument as a string according to the specifier. It can then replace the custom placeholder like %t in the format string with a %s. It also replaces the table argument with its formatted string description. At this point, it calls string.format with the modified format string and the rest of the arguments.

The tricky part is using Lua’s pattern matching to find the custom specifiers in the format string.

function M.format(template, ...)
1    if template == nil then return "" end

2    local percent_rx = '%%+'
    local modifier_rx = '[%-%+ #0]?%d*%.?[%d%*]*[hljztL]?[hl]?'
3    local specifier_rx = '[diuoxXfFeEgGaAcspqtTjJ]'
4    local placeholder_rx = string.format('%s(%s)(%s)', percent_rx, modifier_rx, specifier_rx)
5    local table_rx = percent_rx .. '%d*[tTjJ]'

6    if not template:find(table_rx) then return string.format(template, ...) end

7    local table_placeholders = {}
    local n_placeholders = 0
8    for mod, spec in template:gmatch(placeholder_rx) do
        n_placeholders = n_placeholders + 1
        if spec == 't' or spec == 'T' or spec == 'j' or spec == 'J' then
            insert(table_placeholders, { n_placeholders, mod, spec })
        end
    end

9    local args = { n = select('#', ...), ... }  -- n also counts trailing nil arguments
    if args.n ~= n_placeholders then
        return string.format("[FORMAT ERROR]: %q -- needs %d args but you sent %d!\n", template, n_placeholders, args.n)
    end

10    for i = 1, #table_placeholders do
        local index, mod, spec = unpack(table_placeholders[i])
        local full_spec = mod .. spec

        if full_spec == 't' then
            args[index] = M.inline(args[index])
        elseif full_spec == 'T' then
            args[index] = M.pretty(args[index])
        elseif full_spec == 'J' then
            args[index] = M.json(args[index])
        elseif full_spec == 'j' then
            args[index] = M.inline_json(args[index])
        else
            return string.format("[FORMAT ERROR]: %q -- unknown table specifier: %q\n", template, full_spec)
        end
    end

11    template = template:gsub(table_rx, '%%s')
12    return string.format(template, unpack(args, 1, args.n))
end

1: An edge case: if the format string is nil, we return an empty string.
2: The pattern for matching one or more percent signs.
3: The pattern for matching a format specifier.
4: The pattern for matching a placeholder.
5: The pattern for matching our table specifiers.
6: If the format string contains no table specifiers, we can call string.format and return the result.
7: We create space to store the positions of the table placeholders.
8: We iterate over the placeholders in the format string and store the position of any table specifiers.
9: We store the trailing arguments in a local variable.
10: We iterate over the table placeholders and format the table arguments according to the specifier.
11: We replace the table specifiers with %s in the format string.
12: We call string.format with the modified format string and the rest of the arguments.

A lot is going on here, but the key points are:

We use Lua’s pattern matching to find the placeholders in the format string.
We store the positions of any table specifiers.
We format the table arguments according to the specifier.
We replace the table specifiers with %s in the format string.
We call string.format with the modified format string and the rest of the arguments.
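To see the placeholder-rewriting trick without the rest of scribe, here is a stripped-down sketch. It handles only a bare %t, ignores %% escapes, and uses a hypothetical toy_inline function (flat tables with string keys only) in place of scribe’s M.inline.

```lua
-- Sketch only: supports a bare %t and ignores %% escapes.
local unpack = table.unpack or unpack

-- Hypothetical stand-in for scribe's inline formatter:
-- flat tables with string keys only, printed in sorted key order.
local function toy_inline(t)
    local keys = {}
    for k in pairs(t) do keys[#keys + 1] = k end
    table.sort(keys)
    local parts = {}
    for _, k in ipairs(keys) do
        local v = t[k]
        local shown = type(v) == 'string' and string.format('%q', v) or tostring(v)
        parts[#parts + 1] = k .. ' = ' .. shown
    end
    return '{ ' .. table.concat(parts, ', ') .. ' }'
end

local function toy_format(template, ...)
    -- Keep an explicit count so nil arguments are not lost.
    local args = { n = select('#', ...), ... }

    -- Walk the placeholders in order; the capture is the specifier letter.
    local index = 0
    for spec in template:gmatch('%%[%-%+ #0]?%d*%.?%d*(%a)') do
        index = index + 1
        -- Replace each table argument with its string description.
        if spec == 't' then args[index] = toy_inline(args[index]) end
    end

    -- Every %t now has a string argument, so it can become a plain %s.
    template = template:gsub('%%t', '%%s')
    return string.format(template, unpack(args, 1, args.n))
end

print(toy_format('Data: %t, pi = %.2f', { name = 'Alice', age = 42 }, math.pi))
--> Data: { age = 42, name = "Alice" }, pi = 3.14
```

The real M.format adds what the sketch leaves out: all four table specifiers, an argument-count check and a friendlier error message.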

More Facades

We have added a few more facades to the scribe module to make it easier to work with formatted output. For example:

function M.put(template, ...)
1    io.stdout:write(M.format(template, ...))
end

1: The put function is a simple wrapper around scribe.format that writes the formatted string to the standard output.

A matching putln function appends a newline character to the same output.

function M.putln(template, ...)
    io.stdout:write(M.format(template, ...), '\n')
end

Corresponding eput, eputln, fput, and fputln functions write to the standard error stream and to files.
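The text doesn’t show these functions, so the following is only a sketch of how they might look. The (file, template, ...) argument order for fput and fputln is an assumption, and string.format stands in for scribe.format so the block runs on its own.

```lua
-- Hypothetical sketch: string.format stands in for scribe.format.
local M = { format = string.format }

-- Assumed signature: the target file handle comes first.
function M.fput(file, template, ...)
    file:write(M.format(template, ...))
end

function M.fputln(file, template, ...)
    file:write(M.format(template, ...), '\n')
end

-- The stderr versions just fix the file handle.
function M.eput(template, ...)   M.fput(io.stderr, template, ...) end
function M.eputln(template, ...) M.fputln(io.stderr, template, ...) end

-- Usage: write to a temporary file, then read it back.
local f = assert(io.tmpfile())
M.fputln(f, 'answer = %d', 42)
f:seek('set')
local contents = f:read('*a')
f:close()
print(contents)   --> answer = 42
```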

A Subtle Bug!

Introduction

scribe has existed for a while now.

The obvious use case is to print tables to the console as a simple form of debugging. It is overkill for that purpose unless you deal with tables containing cycles. But scribe is a small, stand-alone module that is very simple to use, so why not?

We use the module to add custom __tostring metamethods to various classes in a larger Lua codebase. Those methods use the ability to customise the formatting options and exercise the full range of features.

scribe is entirely self-contained and does not require any external dependencies. This makes it a great candidate for a little tutorial project like this one.

It also makes it a playground for exploring the capabilities of LLMs.

Pairs Programming with AI

We recently looked into the capabilities of Cursor and Windsurf to see how well they can help us write code.

Both are forks of the ubiquitous VSCode editor, designed to give LLMs broader access to the codebase. Under the covers, they use the same LLMs that power Copilot, but they are a step up from the current iteration of that technology.

We prompted Cursor to write some tests for scribe using the Busted testing framework.

Without going into too much detail, the tests are many little functions that call scribe with different arguments and check the results.

Here is an example:

1describe('basic functionality', function()
2    it('should handle simple arrays', function()
3         assert.are.equal('[1, 2, 3]', scribe({ 1, 2, 3 }))
    end)
end)

1: The describe function groups related tests together.
2: The it function defines a single test.
3: The assert.are.equal function checks that the result of calling scribe with the given arguments equals the expected golden result.

Cursor generated many tests like this, saved them to a test file and then asked me to run them. Most of the tests failed!

A quick look at the tests revealed that Cursor had generated perfectly reasonable tests but got the golden results wrong.

Did you spot the error? That last example should have read:

1        assert.are.equal('[ 1, 2, 3 ]', scribe({ 1, 2, 3 }))

1: Note the space between the [ and the 1 and the space between the 3 and the ].

I then prompted Cursor to fix the issue (though I realise now that I should have asked it to fix the tests). It mused, “I see the issue, I will fix it,” and then altered scribe.lua to match the incorrect golden results!

Fortunately, the tool has a built-in timeline view, so I could easily undo the changes it made.

Prompting AI to write code can be a great way to change existing code in unexpected ways!

A Real Bug

Once we were over that hurdle, we were able to get almost all the tests to pass. The critical word there is “almost”!

Busted reported a failure on the following test:

it('should handle tables with shared references', function()
    local shared = { x = 1 }
    local input = { a = { b = shared }, c = { d = shared } }
    local expected = '{\n a = {\n b = { x = 1 }\n },\n c = { d = <a.b> }\n}'
    assert.are.equal(expected, scribe.pretty(input))
end)

Given the earlier issues with the LLM’s idea of golden results, agreeing there was a real problem here took a while. In hindsight, it is obvious there is one!

If we extract the various bits of Busted machinery, the test looks like this:

local shared = { x = 1 }
local input = { a = { b = shared }, c = { d = shared } }
print(scribe.pretty(input))

With a bit of thought, it is clear that the output should be:

{
    a = {                           -- <1>
        b = { x = 1 }               -- <2>
    },
    c = { d = <a.b> }               -- <3>
}

a has a sub-table, so its fields should go on new lines.
b is a simple table, so its fields go on a single line.
Because a comes alphabetically before c, shared will already have been printed (as a.b) by the time we reach c.
This means that c has no “real” sub-table, so it should appear on one line.

When we ran the test, we first got the expected output! That was a bit disconcerting, and I cast another accusatory glance at Cursor!

However, when we reran the test, we got the following output:

{
    a = { b = { x = 1 } },       -- <1>
    c = {
        d = <a.b>                -- <2>
    }
}

This is way off, as clearly a has a sub-table and should have fields on new lines.
Moreover, c is a simple table and should have all its fields printed inline!

Quick Resolution

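Rerunning the test, the output oscillated between the correct and incorrect results! That gives off a particular odour: we seem to be iterating over the table in a different order each time, even though table_string uses the ordered_pairs iterator. Lua makes no promises about the order in which pairs visits keys, and modern Lua versions randomise the seed used to hash strings each time the interpreter starts, so that order can change from one run to the next.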

We did have a shot at getting the LLM to fix the problem, but it just rewrote a lot of code and introduced new bugs. The LLM is not yet ready to solve this subtle problem.

Once we had established that the problem was due to the iteration order, it was time to fix it. Because table_string uses ordered_pairs, we expected to get the table elements in the same order each time. Looking through the code, we found that the metadata method was not using that iterator.
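Why does the order matter so much? A stripped-down sketch makes it concrete. The hypothetical count_subs function below mimics only the subs counting in metadata: a sub-table counts toward its parent only the first time it is discovered. Forcing the root’s keys into each of the two possible orders shows that the shared table is credited to whichever of a or c is visited first.

```lua
-- Hypothetical, stripped-down version of the subs counting in metadata.
-- root_order fixes the order in which the root table's keys are visited;
-- nested tables are visited in sorted key order.
local function count_subs(root, root_order)
    local seen, subs = { [root] = true }, {}

    local function keys_of(tbl)
        if tbl == root then return root_order end
        local keys = {}
        for k in pairs(tbl) do keys[#keys + 1] = k end
        table.sort(keys)
        return keys
    end

    local function process(tbl)
        local children = {}
        subs[tbl] = 0
        for _, k in ipairs(keys_of(tbl)) do
            local v = tbl[k]
            -- Only a table we have not met before counts as a "real" sub-table.
            if type(v) == 'table' and not seen[v] then
                seen[v] = true
                subs[tbl] = subs[tbl] + 1
                children[#children + 1] = v
            end
        end
        for _, child in ipairs(children) do process(child) end
    end

    process(root)
    return subs
end

local shared = { x = 1 }
local input = { a = { b = shared }, c = { d = shared } }

local a_first = count_subs(input, { 'a', 'c' })
local c_first = count_subs(input, { 'c', 'a' })
print(a_first[input.a], a_first[input.c])   -- prints 1 and 0: the correct layout
print(c_first[input.a], c_first[input.c])   -- prints 0 and 1: the bad layout
```

Since table_string always prints a before c, a run in which metadata happens to visit c first makes layout decisions that no longer match the printed order, which is exactly the wrong output we saw.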

A quick fix is to slightly alter the signature of metadata to take an optional comparator function. We can then use that and move the main loop from pairs to ordered_pairs:

1local function metadata(root_tbl, comparator)
    ...
    local function process(tbl)
        local size, array, subs = 0, true, 0
        local children = {}
2        local iter = ordered_pairs(comparator)
        for _, v in iter(tbl) do
            ...
        end
        md[tbl].size, md[tbl].array, md[tbl].subs = size, array, subs
3        for _, child in ipairs(children) do process(child) end
    end
    process(root_tbl)
    return md
end

local function table_string(root_tbl, opts)
4    local md = metadata(root_tbl, opts.comparator)
    ...
end

1: The metadata method is now passed the comparator function.
2: The ordered_pairs function is used to iterate over the table.
3: The ipairs function is fine here as we are iterating over an array.
4: table_string passes the comparator from the options table to the metadata method.

This change fixed the problem, and the test now passes. Repeating the test, we get the correct output every time.

A More Elegant Solution

scribe aims to be a general solution for viewing arbitrary tables in the best human-readable format. It does not aspire to be maximally efficient in pursuit of that goal. However, our quick solution suffers from an obvious efficiency issue.

We are instantiating ordered_pairs twice: once in table_string and once in metadata. The table keys must be extracted and sorted each time we do this.

Well, we have a metadata method already. Why not make the appropriate iterator another piece of metadata for each table?

We can then use that iterator in the table_string method. This way, we only pay the price for sorting the keys once.

There are several possibilities for the iterator:

We should use ipairs if the table is an array.
We should use pairs if the comparator is false.
We should use some form of ordered_pairs if we have a comparator function.

Before we iterate through the depths of a table, we first write a helper function to get some metadata fields for the top level of a table:

local function top_level_metadata(tbl, comparator)
    ...
end

This function will return three values:

A boolean indicating if the table is an array.
The size of the table.
The appropriate iterator to use to iterate over the table.

We can then use this helper function to get the metadata fields all the way down the table:

local function metadata(root_tbl, comparator)
    local md = {}
    md[root_tbl] = { refs = 1 }

    local function process(tbl)
1        md[tbl].array, md[tbl].size, md[tbl].iter = top_level_metadata(tbl, comparator)

        local subs, sub_tables = 0, {}
2        local iter = md[tbl].iter
        for _, v in iter(tbl) do
            if type(v) == 'table' then
                if md[v] then
                    md[v].refs = md[v].refs + 1
                else
                    subs = subs + 1
                    table.insert(sub_tables, v)
                    md[v] = { refs = 1 }
                end
            end
        end
        md[tbl].subs = subs
        for _, sub_table in ipairs(sub_tables) do process(sub_table) end
    end

    process(root_tbl)
    return md
end

1: The top_level_metadata function is called to get three metadata fields for the top level of the table.
2: One of those three values is the iterator appropriate for this table, and we use it here.

The table_string method is then updated to use the iterator from the metadata:

local function table_string(root_tbl, opts)
    local md = metadata(root_tbl, opts.comparator)
    ...
    local function process(tbl, path)
        ...
        local i, iter = 0, md[tbl].iter
1        for k, v in iter(tbl) do
            i = i + 1
            ...
        end
    end
    process(root_tbl, '')
end

1: The iter function is reused here. It already has the sorted array of table keys if needed.

It only remains to look at the top_level_metadata function shown here in full:

local function top_level_metadata(tbl, comparator)
1    if comparator == false then
        local array, size = true, 0
        for _ in pairs(tbl) do
            size = size + 1
            if array and tbl[size] == nil then array = false end
        end
2        local iter = array and ipairs or pairs
3        return array, size, iter
    end

4    local array, size, keys = true, 0, {}
    for k, _ in pairs(tbl) do
        size = size + 1
        if array and tbl[size] == nil then array = false end
        table.insert(keys, k)
    end
5    if array then return array, size, ipairs end

6    if comparator == nil then comparator = compare end
    table.sort(keys, comparator)

7    local iter = function(t)
        local i = 0
        return function()
            i = i + 1
            return keys[i], t[keys[i]]
        end
    end
    return array, size, iter
end

1: If the comparator is false, we will use a default Lua iterator.
2: If the table is an array, we will use ipairs; otherwise, we will use pairs.
3: We return three bits of metadata — an array flag, the size of the table and the iterator.
4: If the comparator is anything but false, we may have to sort the table keys so we collect them along the way.
5: If the table is an array, we don’t need the keys and return ipairs as the appropriate iterator.
6: Sort the keys using the comparator function if one was provided or the default one.
7: iter is a closure, a local function with access to the keys variable. It is the appropriate iterator for the table in question.

Note that the iter function takes a table as an argument, just like ipairs and pairs. In practice, iter only makes sense when called with tbl, because the keys it walks come from tbl. However, because it shares the calling shape of a generic iterator, the metadata and table_string methods can write for k, v in iter(tbl) do ... end and not worry about whether iter is ipairs, pairs or something completely custom.
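Here is a small sketch of that uniform calling shape. The hypothetical visit function below neither knows nor cares whether it is handed ipairs or a sorted-key closure built the same way as the one in top_level_metadata (sorted_iter_for is likewise a made-up name).

```lua
-- Hypothetical helper: a pairs-style iterator over tbl's keys in sorted
-- order, built like the closure in top_level_metadata.
local function sorted_iter_for(tbl, comparator)
    local keys = {}
    for k in pairs(tbl) do keys[#keys + 1] = k end
    table.sort(keys, comparator)
    return function(t)
        local i = 0
        return function()
            i = i + 1
            return keys[i], t[keys[i]]
        end
    end
end

-- visit relies only on the shared shape: iter(tbl) drives a generic for.
local function visit(tbl, iter)
    local seen = {}
    for k, v in iter(tbl) do seen[#seen + 1] = tostring(k) .. '=' .. tostring(v) end
    return table.concat(seen, ' ')
end

local array = { 'a', 'b', 'c' }
local map = { zeta = 1, alpha = 2, mid = 3 }

print(visit(array, ipairs))                 --> 1=a 2=b 3=c
print(visit(map, sorted_iter_for(map)))     --> alpha=2 mid=3 zeta=1
```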

Conclusion

Lots of fun with AI!

If you do let something like Cursor loose to fix bugs in your codebase, you need to be comfortable using the git history or perhaps the timeline tool to backtrack! It’s probably a good idea to have the LLM work in its own git branch and not mix its changes with your code until you approve them.

On the other hand, Cursor did come up with a simple test that pointed out an actual issue with the code. Its ability to generate lots of tests from a prompt is impressive. Its tests are at least a little independent of the ones you will likely have written yourself.

A Small Optimisation

A comment on Reddit suggested that the indent_string function could be optimised by storing the individual indented lines in a table and combining them using table concatenation at the end of the method.

The original version builds up the indented string bit by bit on the fly:

local function indent_string_orig(str, indent, ignore_first_line)
    ignore_first_line = ignore_first_line or false
    if not indent or indent == "" or not str or str == "" then return str end
    local ends_with_newline = str:sub(-1) == "\n"

    local indented_str = ""
    local first_line = true
    for line in str:gmatch("([^\n]*)\n?") do
        if not first_line then indented_str = indented_str .. '\n' end
        local tab = first_line and ignore_first_line and '' or indent
1        indented_str = indented_str .. tab .. line
        first_line = false
    end
    if ends_with_newline then indented_str = indented_str .. "\n" end
    return indented_str
end

1: We keep appending to indented_str which eventually gets returned at the end of the function.

Here is an alternate version that stores the individual indented lines in a table and only joins them together at the end of the method:

local function indent_string(str, indent, ignore_first_line)
    ignore_first_line = ignore_first_line or false
    if not indent or indent == "" or not str or str == "" then return str end
    local ends_with_newline = str:sub(-1) == "\n"

    -- Set aside one trailing newline (restored at the end), then add a sentinel
    -- newline so that "(.-)\n" matches every line, empty ones included.
    local body = ends_with_newline and str:sub(1, -2) or str
1    local lines = {}
    local first_line = true
    for line in (body .. "\n"):gmatch("(.-)\n") do
        local tab = first_line and ignore_first_line and '' or indent
2        table.insert(lines, tab .. line)
        first_line = false
    end
3    local retval = table.concat(lines, "\n")
4    if ends_with_newline then retval = retval .. "\n" end
    return retval
end

1: lines is a table used to store the individual indented lines from str.
2: We iterate through the lines in str and insert indented versions into that table.
3: We join all the lines together using Lua’s standard table.concat method.
4: If the input string ended with a newline character then so should the output string.

Here is a simple function to benchmark the two versions:

local function benchmark_indents()
1    local test_str = string.rep('line\n', 10000)
2    local iterations = 100

    -- Benchmark the `indent_string_orig` method:
    print('Benchmarking `indent_string_orig` ...')
    local start_time = os.clock()
    for i = 1, iterations do
        indent_string_orig(test_str, '    ')
    end
    local t1 = os.clock() - start_time

    -- Benchmark the `indent_string` method:
    print('Benchmarking `indent_string` ...')
    start_time = os.clock()
    for i = 1, iterations do
        indent_string(test_str, '    ')
    end
    local t2 = os.clock() - start_time

    -- Print and compare the results:
    print(string.format("indent_string_orig took: %.4f seconds", t1))
    print(string.format("indent_string took:      %.4f seconds", t2))
    -- Divide the larger time by the smaller so the ratio reads naturally either way.
    local faster = t2 < t1
    local ratio = faster and t1 / t2 or t2 / t1
    print(string.format("indent_string is %.2fx %s than indent_string_orig",
                        ratio, faster and "faster" or "slower"))
end

1: Create a largish test string with lots of lines to indent.
2: We run enough iterations of the two methods to consume a decent interval of time.
That way we’re sure that any loop overhead is not an important factor in the comparison.

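On my machine, the optimised version is faster by a factor of 10 or more. The test is rough and ready, but the table version is clearly worth moving to! The gap is what you would expect: Lua strings are immutable, so every .. in the original loop copies the whole string built so far, and the total work grows quadratically with the number of lines. table.concat, by contrast, assembles the result in roughly linear time.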
