Tutorial: Turning the Tables …
Introduction
Lua’s only rich native type is the table.
The table is the only game in town, so you will use it to implement every non-trivial data structure you need in any Lua project.
In this article, we will gradually build scribe, a Lua module that converts tables (and other Lua types) to readable strings.
Converting arbitrary Lua tables into descriptive strings is more complex than it initially appears. We’ll examine the issues that arise and how scribe addresses some of the pitfalls.
We will start with a trivial implementation in a dozen lines of Lua. Over time, we will evolve that code into a production-ready Lua module that handles the most complex tables with cycles and shared references. We will also see how to support multiple output formats in a single code block.
This blow-by-blow description and the liberally documented final product, scribe.lua, should be a helpful tutorial, at least for those new to Lua, especially those with experience in other languages.
This is not an introduction to Lua. Think of it as more Lua 201 than Lua 101.
This article is long, but we have tried to make it worthwhile.
And, of course, we hope you find scribe itself as helpful as we do!
Lua Types
As in almost every other programming language, the classic first Lua script is:
str = "Hello World"
print(str)
And, hey presto, it works! On your terminal, the output is:
Hello World
That handy print function works as you’d expect for many Lua types.
Lua always aims for the minimal and has only eight types in total. Compare that to Rust, which has twelve types just for integers!
By the way, Lua’s tostring function is a companion to print and converts any Lua type to a string.
Simple Types
The four most straightforward Lua types are number, boolean, string and nil:
str = "Cinderella"    -- (1)
answer = 42           -- (2)
pi = 3.14             -- (3)
flag = false          -- (4)
oops = nil            -- (5)
print(str, answer, pi, flag, oops)
(1) A string.
(2) A number that is an integer.
(3) This number is a float, but Lua uses one type for integers and floats.
(4) A boolean.
(5) A special nil type indicates not-founds, fails, etc.
In each case, you get very reasonable results on your screen:
Cinderella 42 3.14 false nil
We can use print to dump recognisable values from number, boolean, string and even nil.
The simplest form of debugging is to sprinkle print statements throughout your code liberally, so the more types print works on, the better. Sure, it’s not elegant, but every programmer uses print statements when things go awry. Even more so in a non-compiled, dynamic language like Lua, where adding a print statement and rerunning happens as fast as you can type.
Lua has four additional types beyond number, string, boolean, and nil.
These are function, userdata, thread and table.
Lua Functions
Every Lua function you write or import has the type function.
Let’s look at a simple function example:
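A minimal sketch of such an example, assuming the same answer function that appears later in this section:

```lua
-- Sketch only: `answer` is the function used again later in this section.
function answer() return 42 end

print(answer())   -- calls the function and prints its return value
print(answer)     -- prints Lua's description of the function value itself
```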
(1) The first print shows whatever is returned from our answer function.
(2) The second print shows what Lua thinks of as the function itself.
Output:
42
function: 0x600003e6cca0
The part after the colon will vary from run to run.
The string "function" is descriptive enough, but the string 0x... that follows the colon is opaque. It is the address in memory where Lua stores its form of the function in question. That is consistent for a single run, so if you print the function twice:
function answer() return 42 end
print(answer)
print(answer)
The code outputs the same string twice, e.g.
function: 0x6000032a8ca0
function: 0x6000032a8ca0
However, the next time you run the program, you’ll get something else, such as
function: 0x600002650ca0
function: 0x600002650ca0
We don’t usually write things like print(answer) in our code except by accident! When we do, it’s likely a bug. We probably meant to write print(answer()) with those parentheses () that tell Lua to please execute the answer function and capture the result. So, while the output from print(answer) is opaque, it’s generally followed by an “oops, I forgot some parentheses!”
Two Non-Native Types
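One of Lua’s great strengths is its ability to interface with things written in other languages. The userdata type exists for exactly that purpose: it wraps data owned by the host program, typically written in C. The thread type is Lua’s handle for a coroutine. Neither prints as anything more than a type name and an address.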
When you try to print something implemented in another language, it is hardly surprising that Lua can only say, “I see that as a piece of user data located at this address in memory.”
You can’t expect much more; if you need something more descriptive, you’d expect to perform that action in another language.
Array Tables
Finally, we come to the all-important table type, starting with Lua arrays, a subset of this type.
The table type is Lua’s only “complex” native data type and is amazingly versatile. Once you use Lua for anything beyond trivial scripts, you will inevitably build and interpret many tables.
Tables can contain all Lua types, including Lua functions and other tables, which can refer to each other in cycles, etc.
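As a small illustration of such a cycle, two tables can each hold a reference to the other. The names a and b and the field other are arbitrary choices for this sketch:

```lua
local a = { name = 'a' }
local b = { name = 'b', other = a }
a.other = b                   -- a -> b -> a: a reference cycle

print(a.other.other == a)     -- following the links leads back to a
print(a.other.other.name)     -- the name stored in a
```

A converter that blindly recurses into every sub-table would never finish on a structure like this.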
But let’s start with a simple array example:
gents = {'Tom', 'Dick', 'Harry'}
print(gents)
The corresponding output will be something like:
table: 0x600001d32980
This output is similar in spirit to what we got by calling print on that Lua function shown above. Lua recognises the gents object as a table at some memory address, and that’s all it reveals.
To emphasise the point, we note that the Lua assignment operator for tables creates another variable that points to the same table:
gents = {'Tom', 'Dick', 'Harry'}
aka = gents
print(gents)
print(aka)
This prints the same table: 0x... line twice, because the variables gents and aka are really pointers to the same memory address. The specific memory location will vary from run to run.
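A short sketch makes the aliasing concrete: a change made through one name is visible through the other, because the assignment copies nothing. The variable names follow the example above.

```lua
local gents = {'Tom', 'Dick', 'Harry'}
local aka = gents          -- no copy: both names refer to one table

aka[1] = 'Thomas'          -- modify the table through the second name
print(gents[1])            -- the change is visible through the first name
print(gents == aka)        -- tables compare by identity, so this is true
```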
Of course, this output is not helpful and isn’t what you’d naively expect!
You search for “How do I print a Lua array?” and find an answer like:
print(table.concat(gents, ", "))
And sure enough, out pops the string “Tom, Dick, Harry”.
At this point, you may feel aggrieved!
Why didn’t print(gents) show something like "Tom", "Dick", "Harry" in the first place? What is that table.concat(...) call? Everybody would prefer the second output over being told that Lua recognises gents as a table that resides at some address in memory. There must be a better way!
Key-Value Tables
Things get even more screwy when you try to print a more general Lua table that isn’t an array:
mouse = {                 -- (1)
    first = 'Minnie',
    last = 'Mouse'
}
(1) This is a Lua table with two name-value pairs.
Lua adheres to Mies van der Rohe’s “less is more” mantra. It likes to keep things simple!
For example, we saw earlier that the Lua number type encompasses all classes of integers and all classes of floating-point numbers. Other “system-level” computer languages distinguish between them, as every piece of computer hardware has different paths for the types at the chip level. Programmers of those languages must understand and care about the differences between integers and floats. That distinction makes sense if you want to squeeze the maximum performance from every CPU nanosecond.
Lua has different goals. It is still efficient, but it is willing to spare a few compute cycles to limit type complexity for the programmer. If you code in Lua, you only use generic “numbers” and trust that Lua handles them efficiently, whatever form those numbers take.
The Lua table type is similar, encompassing simple arrays, like the gents example, and more general hash map tables with explicit keys and values, like the mouse example. This combination seems odd if you have done any programming before encountering Lua.
The other “real” computer languages you learnt all distinguish between arrays and dictionaries. In those languages, arrays are part of the core language. A long, early manual chapter will expound on their use. The description for the name-value dictionary-type container will be in the back of the book in the section dedicated to the language’s “standard” library. This division reflects that the hardware paths for the two container types are generally very different. Arrays are considered more fundamental than dictionaries of name-value pairs.
Lua, in effect, says:
Trust me, build that table however makes the most sense to you, and let me worry about efficiency.
Overall, this works remarkably well. Lua internally splits tables into an array part that zips along the high-speed lane of the hardware highway and a dictionary part that is necessarily over on a lower-speed lane. Again, the trade-off is between programming simplicity with a “trust me, I’ll get you almost the same speed” clause and the maximum performance per nanosecond.
Given our lack of success at getting something useful out of print for an array, we aren’t going to be surprised to see similar nonsense from print(mouse):
table: 0x6000027d9b00
Lua tells you that mouse is a table residing at a specific memory location.
True, but not very helpful!
If we try our earlier trick
print(table.concat(mouse, ", "))
Lua outputs a blank line. Well, you just learnt something: apparently, table.concat only works on Lua array-like tables.
A Lua array has implicit keys with successive integers starting at 1. General Lua hash tables have explicit keys, such as the strings first and last in the mouse example. The keys can be any Lua object, not just strings.
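To see why the dictionary case prints a blank line, note that table.concat only joins the elements at integer positions 1 to #tbl. A small sketch, using the gents and mouse tables from above:

```lua
local gents = {'Tom', 'Dick', 'Harry'}
local mouse = {first = 'Minnie', last = 'Mouse'}

-- Three array elements are joined into one string.
print(#gents, table.concat(gents, ", "))
-- mouse has no array elements, so its length is 0 and the result is the empty string.
print(#mouse, table.concat(mouse, ", "))
```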
Of course, we can unpack our table and write:
print(mouse.first, mouse.last)
Then we get “Minnie Mouse”.
Another quick search provides an answer for tables with an arbitrary number of key-value pairs:
for k, v in pairs(mouse) do
print(k,v)
end
When I ran it the first time, this output:
last Mouse
first Minnie
The output is a valid representation of the data but not in a natural order. Running the script a few more times may eventually give a better order:
first Minnie
last Mouse
Lua stores key-value tables in an undefined order, which can vary from run to run. The pairs function iterates through the key-value pairs in storage order, so it’s not constant. Arrays, on the other hand, are always stored in the natural increasing index order.
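If a stable order matters, one common approach is to collect the keys, sort them, and iterate over the sorted list. The sketch below assumes every key is a string (mixed key types cannot be compared with <). The name sorted_pairs is a hypothetical helper for illustration, not part of Lua.

```lua
-- Hypothetical helper: iterate a table in sorted key order.
-- Assumes every key is a string, so the default `<` comparison works.
local function sorted_pairs(tbl)
    local keys = {}
    for k in pairs(tbl) do keys[#keys + 1] = k end
    table.sort(keys)
    local i = 0
    return function()
        i = i + 1
        local k = keys[i]
        if k ~= nil then return k, tbl[k] end
    end
end

local mouse = {first = 'Minnie', last = 'Mouse'}
for k, v in sorted_pairs(mouse) do
    print(k, v)    -- "first" is always printed before "last"
end
```

The price is an extra pass over the table plus a sort, which is usually acceptable for debugging output.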
First Shot at Tables
At this point in your Lua journey, you probably search for “How do I convert a Lua table to a string?”. You will find a lot of suggestions, some quite good and some not so good.
But suppose you wish to build your very own solution based on the discovery that you can use the pairs function to iterate through a table.
Well, you know that recursion is the touch of the hand of God and that Spidey sense is telling you this is the place to use it!
With a little spare time on your hands, you come up with code along the lines of:
function table_string(tbl)                          -- (1)
    local indent = '    '                           -- (2)
    local retval = '{\n'                            -- (3)
    for k, v in pairs(tbl) do
        retval = retval .. indent                   -- (4)
        retval = retval .. tostring(k) .. ' = '     -- (5)
        if type(v) ~= 'table' then
            retval = retval .. tostring(v)          -- (6)
        else
            retval = retval .. table_string(v)      -- (7)
        end
        retval = retval .. ',\n'                    -- (8)
    end
    retval = retval .. '\n}'                        -- (9)
    return retval
end
(1) A descriptive function name. However, we should check that tbl is a Lua table!
(2) We hard code the indent to four spaces. This is a parameter the user will want to set.
(3) Start the return string with a { and a newline character. The user might want to set the table delimiters to something other than braces.
(4) Indent every key-value pair inside the table.
(5) Add the key k as a string and an assignment ' = '. Another potentially user-settable parameter.
(6) The value v isn’t a table. We can use tostring and add it to the return value.
(7) A sub-table! “Look, Ma, that’s recursion. I’m a real programmer!”
(8) End the table element with a separator ',' followed by a newline character.
(9) Finally, close the string with a newline character and a matching table end-delimiter }.
While we have begun handling nested sub-tables using recursion, this version will not get the indentation right. We’ll come back to that problem shortly.
You try it out on our little mouse by calling print(table_string(mouse)), which returns:
{
    first = Minnie,
    last = Mouse,

}
That’s an annoying extra comma and newline character after the final table element.
Overall, it’s not bad! There is that extra comma and new line that looks a bit off, and of course, if you run that print(table_string(mouse)) a few times, you will see that the print order of the elements changes:
{
    last = Mouse,
    first = Minnie,

}
The element order changed, but the extra comma and newline character remain firmly in place.
Making indent a Parameter
Before we tackle the extra comma and newline character, let’s make indent a parameter. This is easy to do by adding a second optional argument to the function, which should be a string. If the user doesn’t provide a value for indent, we default to four spaces.
Only multiline formats will ever use indentation. The output should be a single line if the function is called with indent set to the empty string. We can use this check to trigger inline versus multiline output:
function table_string(tbl, indent)
    indent = indent or '    '
    local nl = indent == '' and '' or '\n'              -- (1)
    local retval = '{' .. nl                            -- (2)
    for k, v in pairs(tbl) do
        retval = retval .. indent
        retval = retval .. tostring(k) .. ' = '
        if type(v) ~= 'table' then
            retval = retval .. tostring(v)
        else
            retval = retval .. table_string(v, indent)  -- (3)
        end
        retval = retval .. ',' .. nl                    -- (4)
    end
    retval = retval .. nl .. '}'                        -- (5)
    return retval
end
(1) We parametrise the “newline character” nl and set it to the empty string for inline outputs.
(2) Instead of hard-coding the newline character, we add nl to the opening brace.
(3) We pass indent to the recursive call.
(4) We add nl to the separator ','.
(5) Finally, we add nl to the closing brace.
Whenever you change the calling signature of a recursive function, you must update the recursive call to match. From experience, this is a common source of bugs.
Now, if you call print(table_string(mouse, '')), you will get:
{first = Minnie,last = Mouse,}
That’s a single line with no newlines or indentation, though there is an extra trailing comma we need to eliminate.
Anatomy of a Table
Although our current output string is flawed, it highlights the general structure for any table:
table-begin-delimiter
    content
table-end-delimiter
In our first attempt, the table_begin and table_end delimiters are the opening and closing braces surrounding the table content. The table delimiters should be user-configurable.
The table content is a sequence of zero or more elements:
table-begin-delimiter
    element,
    element,
    ...
table-end-delimiter
Each element includes a key, possibly an assignment operator, and a value. Array “keys” are the array indices and are often not shown as they are implicit in the ordering of the values.
In some formats like JSON, the keys must be enclosed in double-quotes. We can accommodate that requirement by introducing key delimiters, key_begin and key_end. The assignment operator can always be incorporated as part of key_end.
Elements also have begin and end delimiters, though those vary according to context. In our current implementation, the element beginning delimiter is some indentation. The element ending delimiter is the comma character followed by a new line. This is the separator between elements in the table.
The indentation amount and the element separator should be user-configurable.
Using this terminology, we can rewrite our table_string function:
function table_string(tbl, indent)
    indent = indent or '    '
    local nl = indent == '' and '' or '\n'
    local table_begin = '{' .. nl                                   -- (1)
    local table_end = nl .. '}'
    local key_begin = ''                                            -- (2)
    local key_end = ' = '
    local sep = ',' .. nl                                           -- (3)
    local content = ''                                              -- (4)
    for k, v in pairs(tbl) do
        local k_string = key_begin .. tostring(k) .. key_end        -- (5)
        local v_string = type(v) ~= 'table' and tostring(v) or table_string(v, indent)  -- (6)
        content = content .. indent .. k_string .. v_string .. sep  -- (7)
    end
    return table_begin .. content .. table_end                      -- (8)
end
(1) We introduce the table delimiters as parameters.
(2) We introduce the key delimiters as parameters.
(3) We introduce the element separator as a parameter.
(4) Capture the table content in content.
(5) Appropriate delimiters surround the key string. We might cause this to disappear entirely if tbl is a Lua array.
(6) The value string may need to be found using recursion.
(7) Add the current element to the content.
(8) Finally, surround the table content with table delimiters.
At first blush, this does not look like an improvement. It is undoubtedly more verbose. However, it is a step towards the goal of supporting many different output formats in one function.
If we set key_begin and key_end to '"' and '": ' respectively, we get:
{
    "last": Mouse,
    "first": Minnie,

}
This is a good start on JSON output, but we still have the trailing comma problem, and the string values are not enclosed in double-quotes. We’ll return to this later.
Formatting Options
There are already quite a few parameters at the top of the table_string function that the user might want to set, and more are to come.
Formatting problems like this one, and UI settings for many programs, are notorious for having numerous settable parameters. If a parameter is missing, it should default to some reasonable value.
We could continue adding arguments to the function, but that’s not a great idea.
table_string(tbl, indent, table_begin, table_end, key_begin, key_end, sep)
This calling signature is not user-friendly. It is too verbose and error-prone. It’s easy to forget the arguments’ order or leave one out.
Some languages have the idea of named arguments, which greatly help in this situation. Lua doesn’t directly support named parameters but has a versatile table object. We can pack all the formatting options into a table and pass that table as a single argument:
table_string(tbl, opts)
opts is a table that holds all our formatting parameters. For example, we might query opts.indent for the desired tab size, etc.
The opts argument itself should be optional. For now, we’ll assume that if it is present, it has all the fields we need, i.e. it is fully defined.
Let’s set up a default fallback table of formatting options that might look like this:
local pretty_options = {
    indent = '    ',
    table_begin = '{',
    table_end = '}',
    key_begin = '',
    key_end = ' = ',
    sep = ','
}
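The “fully defined” assumption could be relaxed by filling in any missing fields from this fallback table. The following is only a sketch of one way to do that: with_defaults is a hypothetical helper name, and pretty_options is repeated so the snippet runs on its own.

```lua
-- The fallback table from above, repeated so this snippet is self-contained.
local pretty_options = {
    indent      = '    ',
    table_begin = '{',
    table_end   = '}',
    key_begin   = '',
    key_end     = ' = ',
    sep         = ','
}

-- Hypothetical helper: return a new options table that takes the caller's
-- values where given and the fallback values everywhere else.
local function with_defaults(opts)
    local result = {}
    for k, v in pairs(pretty_options) do result[k] = v end
    for k, v in pairs(opts or {}) do result[k] = v end
    return result
end

local opts = with_defaults({ indent = '', sep = ';' })
print(opts.sep, opts.table_begin, opts.key_end)
```

Copying into a fresh table means neither the caller’s table nor the fallback table is modified.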
We should have a few different sets of formatting options. For example, we would like a multiline version, as well as a more compact, inline version. We can set up a table of options for each of these, so let’s start with that pretty version:
local options = {}            -- (1)
options.pretty = {            -- (2)
    indent = '    ',
    table_begin = '{',
    table_end = '}',
    key_begin = '',
    key_end = ' = ',
    sep = ','
}
(1) We set up a table to hold all our tables of formatting parameters.
(2) We set up a sub-table options.pretty of options for the pretty version.
To use this, our primary table_string function becomes:
function table_string(tbl, opts)                        -- (1)
    opts = opts or options.pretty                       -- (2)
    local indent = opts.indent                          -- (3)
    local nl = indent == '' and '' or '\n'
    local tb = opts.table_begin .. nl                   -- (4)
    local te = nl .. opts.table_end
    local kb, ke = opts.key_begin, opts.key_end         -- (5)
    local sep = opts.sep .. nl                          -- (6)
    local content = ''
    for k, v in pairs(tbl) do
        local k_string = kb .. tostring(k) .. ke
        local v_string = type(v) ~= 'table' and tostring(v) or table_string(v, opts)  -- (7)
        content = content .. indent .. k_string .. v_string .. sep
    end
    return tb .. content .. te
end
(1) We changed the calling signature to incorporate an optional table of formatting parameters.
(2) We use the options.pretty table if opts is absent.
(3) Grab the indent field from the opts table.
(4) We unpack the opts table into local variables for convenience, where tb is table_begin, etc.
(5) Likewise, kb is key_begin and ke is key_end.
(6) Localise the element separator.
(7) Remember to pass the opts table to the recursive call!
We can now call print(table_string(mouse)) and get the same output as before:
{
    last = Mouse,
    first = Minnie,

}
Let’s add a set of options that is specifically for one-line output. We start with a little function to make a shallow clone of any table:
local function table_clone(tbl)
    local retval = {}
    for k, v in pairs(tbl) do retval[k] = v end
    return retval
end
Then we can easily set up options.inline:
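A minimal sketch of that setup, consistent with the listings that follow. The table_clone function and the options.pretty table are repeated here only so the snippet runs on its own:

```lua
local function table_clone(tbl)
    local retval = {}
    for k, v in pairs(tbl) do retval[k] = v end
    return retval
end

local options = {}
options.pretty = {
    indent      = '    ',
    table_begin = '{',
    table_end   = '}',
    key_begin   = '',
    key_end     = ' = ',
    sep         = ','
}

options.inline = table_clone(options.pretty)   -- shallow copy of the pretty options
options.inline.indent = ''                     -- an empty indent selects one-line output
```

Because the copy is a separate table, changing options.inline leaves options.pretty untouched.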
(1) We make a shallow copy of options.pretty and then override the fields we want to change.
(2) We set indent to an empty string.
Now we can call print(table_string(mouse, options.inline)) and get:
{last = Mouse,first = Minnie,}
We still have that pesky trailing comma, but we’ll fix that soon.
The inline version looks cramped. One way to improve things is to add some spaces to the table delimiters and element separator:
options.inline = table_clone(options.pretty)
options.inline.indent = ''
options.inline.table_begin = '{ '     -- (1)
options.inline.table_end = ' }'
options.inline.sep = ', '             -- (2)
(1) Add some breathing room between the table delimiters and the content.
(2) Space out the table elements.
An alternate approach is to add those spaces on the fly when needed. Some inline formats want to be as compact as possible, so we can make adding those spaces a formatting option:
options.pretty = {
    indent = '    ',
    table_begin = '{',
    table_end = '}',
    key_begin = '',
    key_end = ' = ',
    sep = ',',
    inline_spacer = ' '       -- (1)
}
options.inline = table_clone(options.pretty)
options.inline.indent = ''
(1) As the name suggests, inline_spacer controls how generous the spacing is for the inline version of a set of formatting options.
Here’s how we use that new formatting field:
function table_string(tbl, opts)
    opts = opts or options.pretty
    local tb, te = opts.table_begin, opts.table_end
    local kb, ke = opts.key_begin, opts.key_end
    local sep = opts.sep
    local indent = opts.indent
    local nl = indent == '' and opts.inline_spacer or '\n'     -- (1)
    sep = sep .. nl
    tb = tb .. nl
    te = nl .. te
    local content = ''
    for k, v in pairs(tbl) do
        local k_string = kb .. tostring(k) .. ke
        local v_string = type(v) ~= 'table' and tostring(v) or table_string(v, opts)
        content = content .. indent .. k_string .. v_string .. sep
    end
    return tb .. content .. te
end
(1) If there is an indentation, then nl is a newline character; otherwise it’s the user-configurable spacer.
Finally, we add a couple of convenience functions that package table_string with a specific set of options:
function pretty(tbl)
    return table_string(tbl, options.pretty)
end
function inline(tbl)
    return table_string(tbl, options.inline)
end
For example, print(inline(mouse)) now returns:
{ last = Mouse, first = Minnie, }
print(pretty(mouse)) returns:
{
    last = Mouse,
    first = Minnie,

}
Adding small facade functions like pretty and inline can make the API more user-friendly. Providing a few of these functions for everyday use cases is a good idea.
The Comma Problem
It’s time to eliminate the “comma” problem, which is done by not adding the element separator after the last element.
Let’s start with Lua arrays, which are tables you can iterate through using indices:
for i = 1, #tbl do        -- (1)
    ...
end
(1) # is Lua’s length operator; #tbl returns the number of elements in the array part of tbl.
For arrays, we always know when we are at the last element.
We can replace the line that looks like this:
...
content = content .. indent .. k_string .. v_string .. sep
...
with
...
content = content .. indent .. k_string .. v_string
if i < #tbl then content = content .. sep end      -- (1)
...
(1) We are using i as the current element index, and if we’re at the end of the array, we avoid adding a separator.
However, we want to handle all Lua tables, which may or may not be arrays. Unfortunately, we cannot rely on #tbl to return the number of elements in a general tbl. If we have the Lua array of strings:
local friends = { "Mickey", "Goofy" }
Then #friends will return 2.
If, instead, we have a general table that happens to have some key-value elements like:
local mouse_in_characters =
{
    'a', 'b', first = "Minnie", last = "Mouse", 'c', 'd'
}
Then #mouse_in_characters returns 4!
Even though we have deliberately written mouse_in_characters as a couple of key-value elements surrounded by straight array elements, Lua will aggregate the array elements {'a', 'b', 'c', 'd'} into an array part for the table and, under the covers, keep the two key-value elements in a separate hash map. If you try:
for i = 1, #mouse_in_characters do
    print(mouse_in_characters[i])
end
Out pops:
a
b
c
d
We cannot access the “dictionary” part of the table this way!
Lua tables can be arrays, dictionaries, or both in a single instance! This makes Lua tables very flexible, but it can also be a source of confusion. I suspect it wasn’t a great design decision, as it makes it harder to write general-purpose functions that work with arrays and dictionaries, which are very different data structures. It is what it is, and we must work with it.
Using an Extra Pass
However, we know that the pairs function will access all the table elements:
for k, v in pairs(mouse_in_characters) do
    print('key', k, 'value', v)
end
Yields
key 1 value a
key 2 value b
key 3 value c
key 4 value d
key last value Mouse
key first value Minnie
The “array” elements come first and in the natural order. This is how the standard Lua implementation behaves, although the language itself does not promise any particular order for pairs.
The general key-value elements come next but in an undefined order that changes from run to run.
So, for the price of an extra pass, we can compute the number of elements in any table:
local function table_size(tbl)
    local size = 0
    for _, _ in pairs(tbl) do size = size + 1 end
    return size
end
Then print(table_size(mouse_in_characters)) will return 6.
We can use table_size in our table_string function:
function table_string(tbl, opts)
    opts = opts or options.pretty
    local tb, te = opts.table_begin, opts.table_end
    local kb, ke = opts.key_begin, opts.key_end
    local sep = opts.sep
    local indent = opts.indent
    local nl = indent == '' and opts.inline_spacer or '\n'
    sep = sep .. nl
    tb = tb .. nl
    te = nl .. te
    local content = ''
    local i, size = 0, table_size(tbl)                      -- (1)
    for k, v in pairs(tbl) do
        i = i + 1                                           -- (2)
        local k_string = kb .. tostring(k) .. ke
        local v_string = type(v) ~= 'table' and tostring(v) or table_string(v, opts)
        content = content .. indent .. k_string .. v_string
        if i < size then content = content .. sep end       -- (3)
    end
    return tb .. content .. te
end
(1) i is the current element index running from 1 to size.
(2) Increment the element “index”.
(3) Add the separator if we are not at the last element.
With this version:
print(pretty(mouse))
Yields:
{
    first = Minnie,
    last = Mouse
}
Yeah! That extra comma is gone!
print(inline(mouse)) is also correct:
{ first = Minnie, last = Mouse }
Using a Guard
Using the table_size function means we make an extra pass through the table.
We can avoid the extra pass by using a guard variable. While we cannot know when we are at the last element, we do know when we are at the first element. All elements except the first element have a preceding element separator. With that in mind, we can rearrange the main loop in table_string:
function table_string(tbl, opts)
    opts = opts or options.pretty
    local tb, te = opts.table_begin, opts.table_end
    local kb, ke = opts.key_begin, opts.key_end
    local sep = opts.sep
    local indent = opts.indent
    local nl = indent == '' and opts.inline_spacer or '\n'
    sep = sep .. nl
    tb = tb .. nl
    te = nl .. te
    local content = ''
    local first_element = true                              -- (1)
    for k, v in pairs(tbl) do
        if first_element then first_element = false else content = content .. sep end  -- (2)
        local k_string = kb .. tostring(k) .. ke
        local v_string = type(v) ~= 'table' and tostring(v) or table_string(v, opts)
        content = content .. indent .. k_string .. v_string
    end
    return tb .. content .. te
end
(1) We initialize first_element to true.
(2) If we’re not at the first element, we start by adding an element-end delimiter before the current element.
This is a common idiom in Lua for handling iterations where you must do something special for the final element. Instead, you do something special for the first element and then do the usual thing for all subsequent elements.
This code version avoids the extra pass and still eliminates the trailing comma.
print(pretty(mouse))
Yields:
{
    first = Minnie,
    last = Mouse
}
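A different technique that sidesteps the separator question altogether is to collect the element strings in a Lua array and join them with table.concat, which places the separator only between elements. This is an alternative to the guard, not what the code above does. The name join_elements is hypothetical, and the sketch handles flat tables only.

```lua
-- Hypothetical sketch: build "k = v" strings, then join them in one step.
-- table.concat inserts `sep` only between elements, never after the last one.
local function join_elements(tbl, sep)
    local parts = {}
    for k, v in pairs(tbl) do
        parts[#parts + 1] = tostring(k) .. ' = ' .. tostring(v)
    end
    return table.concat(parts, sep)
end

print('{ ' .. join_elements({ first = 'Minnie' }, ', ') .. ' }')
print('{ ' .. join_elements({ first = 'Minnie', last = 'Mouse' }, ', ') .. ' }')
```

Joining once at the end also avoids building a new intermediate string on every pass through the loop.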
Computing the size of tbl does require an extra pass. However, as we shall see shortly, we can use that pass to gather other useful information, so we are happy enough to pay the price of some extra compute cycles.
Empty Tables
We have one more issue to address. print(pretty({})) returns:
{

}
print(inline({})) returns:
{  }
We would prefer to see {} in both cases. If we know the size of tbl, then we can add a quick check for an early return at the top of the function:
local function empty_table_string(opts)                                        -- (1)
    local retval = (opts.table_begin .. opts.table_end):gsub('%s+', '')        -- (2)
    return retval
end
function table_string(tbl, opts)
    opts = opts or options.pretty
    local size = table_size(tbl)
    if size == 0 then return empty_table_string(opts) end                      -- (3)
    local tb, te = opts.table_begin, opts.table_end
    local kb, ke = opts.key_begin, opts.key_end
    local sep = opts.sep
    local indent = opts.indent
    local nl = indent == '' and opts.inline_spacer or '\n'
    sep = sep .. nl
    tb = tb .. nl
    te = nl .. te
    local content = ''
    local i = 0
    for k, v in pairs(tbl) do
        i = i + 1
        local k_string = kb .. tostring(k) .. ke
        local v_string = type(v) ~= 'table' and tostring(v) or table_string(v, opts)
        content = content .. indent .. k_string .. v_string
        if i < size then content = content .. sep end
    end
    return tb .. content .. te
end
(1) We add a helper function to return a string for an empty table, taking into account the table delimiters.
(2) It does that by concatenating the table delimiters and then using gsub to remove all whitespace.
(3) In our table_string function we look for an early exit for empty tables.
With this change in place, print(pretty({})) and print(inline({})) both return {}.
Arrays vs. Tables
Lua has one type of table. It can be an array, a dictionary, or a mix of both. Under the covers, Lua keeps the array part separate from the dictionary part for efficiency.
Most programming languages have a distinct array type, and differentiating between arrays and dictionaries is often crucial.
For example, JSON is a popular human-readable data exchange format with a separate array type. In JSON, arrays are always ordered and have implicit keys that are consecutive integers. They are represented by square brackets [ ... ] to distinguish them from dictionaries represented by curly braces { ... }.
We can easily write a small function to determine whether a table is an array or a dictionary:
local function table_is_array(tbl)
    local size = 0
    for _, _ in pairs(tbl) do
        size = size + 1
        if tbl[size] == nil then return false end     -- (1)
    end
    return true                                       -- (2)
end
(1) Arrays are indexed by consecutive integers from 1. If we find a hole, we know that tbl is not an array.
(2) If we make it through the loop without finding a hole, we know that tbl is an array.
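A quick check of this function against the tables used earlier (friends, mouse, and the mixed mouse_in_characters) might look like the sketch below. The function body is repeated so the snippet runs on its own.

```lua
local function table_is_array(tbl)
    local size = 0
    for _, _ in pairs(tbl) do
        size = size + 1
        if tbl[size] == nil then return false end
    end
    return true
end

local friends = { "Mickey", "Goofy" }
local mouse = { first = "Minnie", last = "Mouse" }
local mouse_in_characters = { 'a', 'b', first = "Minnie", last = "Mouse", 'c', 'd' }

print(table_is_array(friends))               -- true
print(table_is_array(mouse))                 -- false
print(table_is_array(mouse_in_characters))   -- false: six elements, but only 1 to 4 are array slots
print(table_is_array({}))                    -- true: an empty table counts as an array here
```

Note the last case: with this definition an empty table is reported as an array, which matters once arrays and dictionaries get different delimiters.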
If tbl is a Lua array, a complete pass through tbl is required to confirm it is an array. We can add the check to our existing table_size function, which we rename metadata:
local function metadata(tbl)
    local size = 0
    local array = true                                            -- (1)
    for _, _ in pairs(tbl) do
        size = size + 1
        if array and tbl[size] == nil then array = false end      -- (2)
    end
    return size, array                                            -- (3)
end
(1) We assume tbl is an array until we find otherwise.
(2) If we find a “hole”, then tbl is not an array.
(3) Return both the computed size and array values.
Lua functions can return multiple values. This feature can be handy, but you don’t want to overdo it, as the function’s caller needs to get the order of the returned values right. Correct ordering is not a problem for two or even three values. After that, it is best to put the returns in a name-value table.
We use metadata to indicate that we are returning more than the table size. We will add other bits of metadata as we go along. Do not confuse this with Lua’s metatable concept, which allows you to override the behaviour of standard operators like +, -, etc. and the behaviour of functions like tostring, print, etc.
We can add some array delimiters to our option tables:
options.pretty = {
indent = ' ',
table_begin = '{',
table_end = '}',
1array_begin = '[',
array_end = ']',
key_begin = '',
key_end = ' = ',
sep = ',',
inline_spacer = ' '
}
- 1
- We will differentiate arrays by using square bracket delimiters.
Let’s put the new metadata
method to use in the main event:
function table_string(tbl, opts)
opts = opts or options.pretty
1local size, array = metadata(tbl)
if size == 0 then return empty_table_string(opts) end
2local tb = array and opts.array_begin or opts.table_begin
local te = array and opts.array_end or opts.table_end
local kb, ke = opts.key_begin, opts.key_end
local sep = opts.sep
local indent = opts.indent
local nl = indent == '' and opts.inline_spacer or '\n'
sep = sep .. nl
tb = tb .. nl
te = nl .. te
local content = ''
local i = 0
for k, v in pairs(tbl) do
i = i + 1
local k_string = kb .. tostring(k) .. ke
local v_string = type(v) ~= 'table' and tostring(v) or table_string(v, opts)
content = content .. indent .. k_string .. v_string
if i < size then content = content .. sep end
end
return tb .. content .. te
end
- 1
-
metadata
returns the size and the array flag for tbl
.
The order is fixed. - 2
-
We can pick suitable table delimiters depending on whether
tbl
is an array.
Now print(pretty(mouse))
returns:
{
last = Mouse,
first = Minnie }
while print(pretty(friends))
returns:
- 1
- Arrays are now delimited with square brackets.
- 2
-
However, we are outputting the array indices
1
,2
,...
, which is generally unnecessary.
Lua has “keys” for all table elements. In the case of arrays, those keys are the array indices, which are consecutive integers starting at 1. You don’t usually need to see those, so we alter our function only to show keys if tbl
is not an array.
function table_string(tbl, opts)
...
for k, v in pairs(tbl) do
...
1if not array then content = content .. kb .. tostring(k) .. ke end
...
end
...
return tb .. content .. te
end
- 1
- Now, we don’t show keys for array tables.
Now print(pretty(friends))
returns:
[
Mickey,
Goofy ]
The output from print(pretty(mouse))
remains unchanged:
{
last = Mouse,
first = Minnie }
Sometimes, you need to see the keys for an array, for example, when you are debugging and want to check the array indices. Let’s add an option to show the keys for arrays:
options.pretty = {
indent = ' ',
table_begin = '{',
table_end = '}',
array_begin = '[',
array_end = ']',
key_begin = '',
key_end = ' = ',
sep = ',',
inline_spacer = ' ',
1show_indices = false
}
- 1
- Typically, we suppress seeing array indices.
The corresponding change to table_string
is straightforward:
function table_string(tbl, opts)
opts = opts or options.pretty
local size, array = metadata(tbl)
if size == 0 then return empty_table_string(opts) end
1local show_keys = not array and true or opts.show_indices
...
for k, v in pairs(tbl) do
i = i + 1
2local k_string = show_keys and kb .. tostring(k) .. ke or ''
...
end
...
return tb .. content .. te
end
- 1
-
We set
show_keys
to true
unless we are dealing with an array, in which case we use whatever is dictated by opts.show_indices
. - 2
-
We only show keys if
show_keys
is true
.
That is always the case for non-arrays and is user-settable for arrays.
With that change, print(inline(friends))
returns [ Mickey, Goofy ]
. If you set opts.show_indices = true
, then print(inline(friends))
returns [ 1 = Mickey, 2 = Goofy ]
.
Finally, let’s add a couple of sets of formatting options that don’t include separate array delimiters. This is the style you most often see in Lua code, so it is handy to have it available.
options.classic = table_clone(options.pretty)
1options.classic.array_begin = '{'
options.classic.array_end = '}'
2function classic(tbl)
return table_string(tbl, options.classic)
end
- 1
-
All tables use the same delimiters
{ ... }
. - 2
-
We add a convenience function,
classic
, that uses the options.classic formatting options
.
Now print(classic(friends))
returns
{
Mickey,
Goofy }
Adding Indentation
Earlier, we alluded to the fact that while our solution handles nested sub-tables by recursion, it gets the indentation wrong in the process.
Suppose we introduce a table that captures Minnie’s “user profile” and try to print it:
local user =
{
first = "Minnie",
last = "Mouse",
1friends = { "Mickey", "Goofy" }
}
- 1
- Minnie’s friends are captured in an array.
Then, print(pretty(user))
might yield:
{
first = Minnie,1
friends = [
Mickey,
Goofy
],
last = Mouse }
- 1
-
We see
friends
as a nice array, but the indentation is incorrect.
Ideally, we’d like to see:
{
friends = [
Mickey,
Goofy
],
first = Minnie,
last = Mouse }
Our current output is readable, but it becomes less and less so with larger tables and more nesting. Deeper nesting requires more indentation! We had better fix that next.
The most straightforward idea is to add indentation to the string returned from the recursive call table_string(v, opts)
.
We can make a function that adds indentation line-by-line to any Lua string:
local function indent_string(str, indent)
1if not indent or indent == "" or not str or str == "" then return str end
2local ends_with_newline = str:sub(-1) == "\n"
local indented_str = ""
3local first_line = true
4for line in str:gmatch("([^\n]*)\n?") do
5if not first_line then indented_str = indented_str .. "\n" end
indented_str = indented_str .. indent .. line
first_line = false
end
6if ends_with_newline then indented_str = indented_str .. "\n" end
return indented_str
end
- 1
-
Handle some edge cases, as we do not need to do anything if the
indent
is the empty string. This check allows downstream methods to call indent_string
without worrying that it will do something stupid. - 2
-
We will add the indentation line-by-line. If the input
str
ends with a new line, the output should also. - 3
- This looks like that guard “trick” we discussed earlier.
- 4
-
Here, we iterate through
str
line-by-line with an unknown number of hits using Lua’s pattern search function gmatch
. - 5
- Add newline characters to all but the first line.
- 6
- Match the input — if it ends with a new line, the output will also.
Aside: Lua Patterns
The gmatch
method added to the string class is another type of iterator. In this case, it looks for a pattern in the string str
and returns the next match. When it can find no more matches, it returns nil
and the iteration loop finishes.
Lua string patterns are like regular expressions in other languages, though they use fewer features. For example, if we have the string "ho, ho, ho"
then the pattern "ho"
matches the literal character 'h'
followed immediately by 'o'
. We might use it like this:
local str = "ho, ho, ho"
local count = 0
for _ in str:gmatch("ho") do
count = count + 1
print("Found", count)
end
That will output:
Found 1
Found 2
Found 3
Of course, if gmatch
and friends could only find literal matches, they wouldn’t be powerful enough for most applications. While Lua’s pattern-matching library is slim, fortunately, it’s not that slim. Lua patterns can encompass classes of characters instead of literal ones.
In the indent_string
function, the pattern we successively match on is "([^\n]*)\n?"
. This has many characteristic elements of a regular expression: it is terse and full of punctuation characters!
If you remove the parentheses, you have "[^\n]*\n?". The first part, "[^\n]", is a character class. In patterns, square brackets define a set of alternatives, so "[xyz]" will match on 'x' or 'y' or 'z'. When the caret '^' is the first character inside the brackets, it complements the set, so "[^\n]" matches any single character except the newline character '\n'. (A caret at the very start of a pattern, outside any brackets, has a different meaning: it anchors the match to the beginning of the string.)
The next character, '*', is another magic character. It says to match zero or more repetitions of the preceding class, taking as many as possible. The final part, "\n?", uses one more magic incantation: the '?' tells the pattern matcher that the previous character (the newline character) is optional.
In all, the "[^\n]*\n?" pattern says to match the longest possible run of non-newline characters, followed by a newline character if there is one. The match finishes when you hit a newline character or run out of string.
The only thing missing is telling the pattern-matching engine which bits of the pattern constitute the substring we want. What should the pattern matcher capture?
That is what the parentheses are used for. The engine will capture whatever you put inside parentheses. In this case, we have parentheses around the first bit, "([^\n]*)\n?", so we capture the run of characters up to, but not including, the newline character or the end of the string. In other words, we capture a line in the string. The g in gmatch stands for “global,” so it doesn’t stop at the first line but keeps iterating through the whole string line by line.
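The pieces of that pattern can be tried out one at a time. The following is a minimal sketch: the sample strings are invented for illustration, and it uses string.match and string.find rather than gmatch so that each piece can be checked in isolation.

```lua
local text = "first line\nsecond line"

-- A set in square brackets matches any one of the listed characters.
print(("lynx"):match("[xyz]"))    --> y
-- A leading caret complements the set: any character NOT listed.
print(("lynx"):match("[^xyz]"))   --> l

-- '*' repeats the preceding class zero or more times, as many as possible.
print(text:match("[^\n]*"))       --> first line

-- '\n?' then consumes the line's newline if there is one. find reports
-- the span of the whole match, which includes that newline ...
print(text:find("[^\n]*\n?"))     --> 1  11
-- ... while the parentheses capture only the text of the line itself.
print(text:match("([^\n]*)\n?"))  --> first line
```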
Indenting Tables
With the indent_string
method in place, we can rewrite our primary function:
function table_string(tbl, opts)
...
for k, v in pairs(tbl) do
...
local v_string = ''
if type(v) == 'table' then
v_string = table_string(v, opts)
v_string = indent_string(v_string, indent)
else
v_string = tostring(v)
end
...
end
...
return tb .. content .. te
end
With those changes, we can call print(pretty(user))
and get:
{
friends = [
Mickey,
Goofy
],
last = Mouse,
first = Minnie }
The elements in the friends
array are now indented correctly, but the opening brace is also indented.
We can alter our indent_string
function to ignore the first line optionally:
1local function indent_string(str, indent, ignore_first_line)
2ignore_first_line = ignore_first_line or false
if not indent or indent == "" or not str or str == "" then return str end
local ends_with_newline = str:sub(-1) == "\n"
local indented_str = ""
local first_line = true
for line in str:gmatch("([^\n]*)\n?") do
if not first_line then indented_str = indented_str .. '\n' end
local tab = first_line and ignore_first_line and '' or indent
indented_str = indented_str .. tab .. line
first_line = false
end
if ends_with_newline then indented_str = indented_str .. "\n" end
return indented_str
end
- 1
-
We have added an optional boolean parameter
ignore_first_line
to the function. - 2
-
If the user doesn’t provide a value for
ignore_first_line
, we default to false
.
With those changes, we can call print(pretty(user))
and get:
{
friends = [
Mickey,
Goofy
],
last = Mouse,
first = Minnie }
The inline format print(inline(user))
is also correct:
{ last = Mouse, first = Minnie, friends = [ Mickey, Goofy ] }
Other Output Formats
We will look at a few other formats commonly used for viewing tables.
Indentation Only
Another commonly used multiline table format avoids delimiters and instead relies on indentation to show the structure. Here is how our user
table would look in this format:
last: Mouse,
first: Minnie,1
friends:
Mickey, Goofy
- 1
- This all looks straightforward, but this format is tricky to implement.
We add a new set of formatting options for this format:
options.alt = table_clone(options.pretty)
options.alt.table_begin = ''
options.alt.table_end = ''
options.alt.array_begin = ''
options.alt.array_end = ''
options.alt.key_end = ': '
Nothing too wild here; we start with options.pretty
and set the table/array delimiters to blank strings. We also set up colons to act as the assignment operators.
We also add the usual convenience function that packages those formatting options with table_string
:
function alt(tbl)
return table_string(tbl, options.alt)
end
If we try print(alt(user))
we get something like:
1
first: Minnie,
last: Mouse,
friends:
Mickey, Goofy
- 1
- An extra indentation layer isn’t needed when the table delimiters are blank.
- 2
- There are also some extra newlines at the end of the output.
A first attempt at fixing this format is to remove the indentation from the top-level elements. We can do this by adding a check for a blank table begin-delimiter:
function table_string(tbl, opts)
opts = opts or options.pretty
local size, array = metadata(tbl)
if size == 0 then return empty_table_string(opts) end
local show_keys = not array and true or opts.show_indices
local tb = array and opts.array_begin or opts.table_begin
local te = array and opts.array_end or opts.table_end
local kb, ke = opts.key_begin, opts.key_end
local sep = opts.sep
local indent = opts.indent
local nl = indent == '' and opts.inline_spacer or '\n'
1if tb ~= '' then tb = tb .. nl end
2if te ~= '' then te = nl .. te end
3sep = sep .. nl
local no_delims = tb == ''
4if no_delims then indent = '' end
local content = ''
local i = 0
for k, v in pairs(tbl) do
i = i + 1
local k_string = show_keys and kb .. tostring(k) .. ke or ''
local v_string = ''
if type(v) == 'table' then
v_string = table_string(v, opts)
5v_string = indent_string(v_string, opts.indent, true)
else
v_string = tostring(v)
end
content = content .. indent .. k_string .. v_string
if i < size then content = content .. sep end
end
return tb .. content .. te
end
- 1
- We add a new line to the table begin-delimiter if we use multiline output and the table begin-delimiter is not blank.
- 2
- We add a new line to the table end-delimiter if we use multiline output and the table end-delimiter is not blank.
- 3
- We add a new line to the separator if we are using multiline output.
- 4
-
If the table begin-delimiter is blank, we don’t indent the top-level elements in
tbl
. - 5
- We still indent any sub-table elements with the “real” indentation amount from the formatting options.
With that in place, print(alt(user))
returns something unindented at the outermost level and without the extra newlines at the end:
first: Minnie,
last: Mouse,1
friends: Mickey, Goofy
- 1
-
There should be a new line after
friends
here.
We are missing a newline character before the sub-array of friends. It should only be present if the table is multiline and the begin-delimiter is blank. This suggests a small addition to the table_string
function:
function table_string(tbl, opts)
...
for k, v in pairs(tbl) do
...
if type(v) == 'table' then
v_string = table_string(v, opts)
1if tb == '' then v_string = nl .. v_string end
...
end
return tb .. content .. te
end
- 1
- The suggested fix.
However, this doesn’t quite work as expected as print(alt(user))
now returns:
last: Mouse,
first: Minnie,
friends:1
Mickey, Goofy
- 1
-
We’re missing an indentation on the
Mickey
line.
We can fix this by using that third ignore_first_line
argument in indent_string
:
function table_string(tbl, opts)
...
for k, v in pairs(tbl) do
...
if type(v) == 'table' then
v_string = table_string(v, opts)
1v_string = indent_string(v_string, opts.indent, not no_delims)
2if no_delims and show_keys then v_string = nl .. v_string end
...
end
return tb .. content .. te
end
- 1
- We skip indenting the first line of the sub-table unless the table begin-delimiter is blank.
- 2
- We add a newline character if the table begin-delimiter is blank and we are showing keys.
The full table_string
function now looks like:
function table_string(tbl, opts)
opts = opts or options.pretty
local size, array = metadata(tbl)
if size == 0 then return empty_table_string(opts) end
local show_keys = not array and true or opts.show_indices
local tb = array and opts.array_begin or opts.table_begin
local te = array and opts.array_end or opts.table_end
local kb, ke = opts.key_begin, opts.key_end
local sep = opts.sep
local indent = opts.indent
local nl = indent == '' and opts.inline_spacer or '\n'
sep = sep .. nl
if tb ~= '' then tb = tb .. nl end
if te ~= '' then te = nl .. te end
local no_delims = tb == ''
if no_delims then indent = '' end
local content = ''
local i = 0
for k, v in pairs(tbl) do
i = i + 1
local k_string = show_keys and kb .. tostring(k) .. ke or ''
local v_string = ''
if type(v) == 'table' then
v_string = table_string(v, opts)
v_string = indent_string(v_string, opts.indent, not no_delims)
if no_delims and show_keys then v_string = nl .. v_string end
else
v_string = tostring(v)
end
content = content .. indent .. k_string .. v_string
if i < size then content = content .. sep end
end
return tb .. content .. te
end
With this change in place print(alt(user))
returns something like:
1
last: Mouse,
first: Minnie,
friends:
Mickey, Goofy
- 1
- The elements can be ordered differently.
The other formats still work as expected. print(pretty(user))
returns:
{
last = Mouse,
first = Minnie,
friends = [
Mickey,
Goofy
] }
print(inline(user))
returns:
{ last = Mouse, first = Minnie, friends = [ Mickey, Goofy ] }
JSON
The JSON format is a popular format for exchanging data between systems. Like our pretty
format, JSON delimits tables with curly braces and arrays with square brackets. It surrounds keys with double quotes and uses colons to separate keys from values.
Let’s add a new set of formatting options for JSON:
options.json = table_clone(options.pretty)
options.json.key_begin = '"'
options.json.key_end = '": '
We also add the usual convenience function that packages those formatting options with table_string
:
function json(tbl)
return table_string(tbl, options.json)
end
If we try print(json(user))
we get:
{
"last": Mouse,
"first": Minnie,
"friends": [
Mickey,
Goofy
]
}
This isn’t quite JSON, as JSON requires string values to be surrounded by double quotes.
In fact, it is a good idea to always surround string values with double quotes. Lua’s string
class has a string.format
method that is perfect for this task.
For example, string.format("Hello, %s!", "world")
returns "Hello, world!"
. The %s
is a placeholder for a string value that is passed as a trailing argument to string.format
. string.format
is a wrapper around the venerable C function sprintf
and uses almost all the same format specifiers. So %s
is used for strings, %d
for integers, and %f
for floating-point numbers etc.
One of Lua’s primary use cases is dealing with large amounts of text that often includes multiline strings. It is useful to be able to see those in their raw form. For that reason, Lua has a special format specifier %q
that is used to quote strings. It is similar to %s
but it adds double quotes around the string and escapes any special characters. For example, string.format("%q", 'Hello, "world"!')
returns '"Hello, \"world\"!"'
.
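As a quick illustration of those format specifiers, here is a short sketch. The variable names and sample values are made up for the example.

```lua
local name, count, ratio = "Minnie", 3, 0.5

-- %s inserts a string, %d an integer, and %.2f a float with two decimals.
print(string.format("%s has %d friends", name, count))  --> Minnie has 3 friends
print(string.format("%.2f", ratio))                     --> 0.50

-- %s copies the string as it is; %q adds the quotes and the escapes.
print(string.format("%s", 'say "hi"'))  --> say "hi"
print(string.format("%q", 'say "hi"'))  --> "say \"hi\""
```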
We can use this format specifier to good effect. While at it, we will add a simple_string
counterpart to table_string
to take any Lua object and return a simple string representation.
1local function simple_string(obj)
if obj == nil then return 'nil' end
local obj_type = type(obj)
2if obj_type == 'number' or obj_type == 'boolean' or obj_type == 'nil' then
return tostring(obj)
elseif obj_type == 'string' then
3return string.format("%q", obj)
elseif obj_type == 'table' then
4return string.format("%p", obj)
elseif obj_type == 'function' then
return '<function>'
elseif obj_type == 'userdata' then
return '<userdata>'
elseif obj_type == 'thread' then
return '<thread>'
else
5return '<UNKNOWN type: ' .. tostring(obj) .. '>'
end
end
- 1
-
The new function
simple_string
takes any Lua object and returns a simple string representation of it. - 2
-
We let
tostring
handle numbers, booleans, and nil
values. - 3
-
We use
string.format
with the%q
format specifier to quote strings. - 4
-
We use
string.format
with the%p
format specifier, which is available from Lua 5.4 onwards, to print the memory address of a table.
We will usually defer table conversion to table_string
. - 5
- We should never reach this point, but add a catch-all for unknown types that Lua might introduce.
We can now use simple_string
in our table_string
function:
function table_string(tbl, opts)
...
local i, content = 0, ''
for k, v in pairs(tbl) do
i = i + 1
1local k_string = show_keys and kb .. tostring(k) .. ke or ''
local v_string = ''
if type(v) == 'table' then
...
else
2v_string = simple_string(v)
end
...
end
return tb .. content .. te
end
- 1
-
We still use
tostring
to convert keys to strings and rely on key delimiters to add quotes if needed. - 2
-
We use
simple_string
to convert non-table values to strings, so we always get double quotes around strings.
With this change in place print(json(user))
returns:
{
"last": "Mouse",
"first": "Minnie",
"friends": [
"Mickey",
"Goofy"
]
}
Compact JSON
While JSON is often used in its pretty format, it is common to use a more compact format where all extra spaces and newlines are removed.
We can add a new set of formatting options for inline JSON:
options.inline_json = table_clone(options.json)
options.inline_json.indent = ''
options.inline_json.key_end = '":'
1options.inline_json.inline_spacer = ''
- 1
- In this case, we remove the inline spacer as well to make the output even more compact.
We also add the usual convenience function that packages those formatting options with table_string
:
function inline_json(tbl)
return table_string(tbl, options.inline_json)
end
If we try print(inline_json(user))
we get:
{"last":"Mouse","first":"Minnie","friends":["Mickey","Goofy"]}
This is also a valid JSON format, but it is harder to read for humans.
Debug Format
We can add a set of formatting options that makes the structure of the table explicit. This can be useful when you are trying to add a custom set of formatting options:
options.debug = table_clone(options.pretty)
options.debug.indent = ' INDENT '
options.debug.table_begin = 'TABLE BEGIN'
options.debug.table_end = 'TABLE END'
options.debug.array_begin = 'ARRAY BEGIN'
options.debug.array_end = 'ARRAY END'
options.debug.key_begin = ' KEY BEGIN '
options.debug.key_end = ' KEY END = '
options.debug.sep = ' SEP '
options.debug.show_indices = true
As usual, we add the convenience function that packages those formatting options with table_string
:
function debug(tbl)
return table_string(tbl, options.debug)
end
If we try print(debug(user))
we get:
TABLE BEGIN
INDENT KEY BEGIN first KEY END = "Minnie" SEP
INDENT KEY BEGIN last KEY END = "Mouse" SEP
INDENT KEY BEGIN friends KEY END = ARRAY BEGIN
INDENT INDENT KEY BEGIN 1 KEY END = "Mickey" SEP
INDENT INDENT KEY BEGIN 2 KEY END = "Goofy"
INDENT ARRAY END TABLE END
Ordered Output
Lua has a single table
type. However, as discussed several times now, under the covers, Lua distinguishes between the array part of a table and any dictionary part it might contain. The elements in a Lua array are in a fixed order so that if:
local arr = { 'a', 'b', 'c' }
Then, print(inline(arr))
will always print [ "a", "b", "c" ]
.
In contrast, the element order in a general key-value table is not defined or constant. If we have:
local mouse = { first = 'Minnie', last = 'Mouse' }
Then, print(inline(mouse))
will sometimes display { last = "Mouse", first = "Minnie" }
, other times { first = "Minnie", last = "Mouse" }
.
Jumping around like that can be disconcerting.
So far, we have used the Lua standard pairs
function to traverse through the key-value pairs in all tables.
for k, v in pairs(tbl) do
...
end
Lua provides an efficient iterator function, ipairs
, specifically for arrays. We can alter our iteration based on whether the table is an array or a key-value table and get a little performance boost.
local iter = array and ipairs or pairs
for k, v in iter(tbl) do
...
end
Of course, ipairs
doesn’t solve the problem of inconsistent output for key-value tables.
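The difference between the two standard iterators is easy to check. In this small sketch, the table t is a made-up example with a hole at index 3 and one named key.

```lua
local t = { "a", "b", nil, "d", name = "extra" }

-- ipairs walks indices 1, 2, 3, ... and stops at the first nil value.
local from_ipairs = {}
for _, v in ipairs(t) do from_ipairs[#from_ipairs + 1] = v end

-- pairs visits every key that holds a non-nil value, in no particular order.
local from_pairs = 0
for _ in pairs(t) do from_pairs = from_pairs + 1 end

print(#from_ipairs)  --> 2
print(from_pairs)    --> 4
```

This is why table_string should only switch to ipairs when metadata has confirmed that the table is an array.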
Fortunately, Lua lets us define custom iterator functions, and we can create one to iterate over the keys in a consistent order.
1local iter = array and ipairs or ordered_pairs
for k, v in iter(tbl) do
...
end
- 1
-
We have replaced the standard
pairs
iterator with a custom ordered_pairs
function.
We still use ipairs
for arrays.
A custom iterator function is passed a table and should return the “next” key-value pair in the table. The function should return nil
if no more key-value pairs exist. You are free to determine what “next” means in this context.
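Before looking at ordered_pairs, it may help to see the same mechanism on a smaller problem. The sketch below defines a hypothetical reverse_ipairs (not part of scribe) that defines “next” as “the element before the current one”.

```lua
-- Hypothetical helper: visit an array's elements from last to first.
local function reverse_ipairs(arr)
    local i = #arr + 1
    -- The returned closure remembers arr and i between calls.
    return function()
        i = i - 1
        if i >= 1 then return i, arr[i] end
        -- Falling off the end returns nothing, which ends the for loop.
    end
end

local seen = {}
for i, v in reverse_ipairs({ "a", "b", "c" }) do
    seen[#seen + 1] = i .. "=" .. v
end
print(table.concat(seen, " "))  --> 3=c 2=b 1=a
```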
Here is a simple implementation of ordered_pairs
:
local function ordered_pairs(tbl)
local keys = {}
1for k in pairs(tbl) do table.insert(keys, k) end
2table.sort(keys)
local i = 0
3return function()
4i = i + 1
5return keys[i], tbl[keys[i]]
end
end
- 1
-
We capture all the keys from
tbl
in the keys
array. - 2
-
By default,
table.sort
compares elements with the < operator, which sorts strings alphabetically and numbers numerically.
However, table.sort
can take a comparison function as a second argument if you want to sort the keys in a different order. - 3
-
The
ordered_pairs
function returns an iterator which is itself a function. - 4
-
The iterator function is a closure, so it has access to the
keys
and the current index i
from the enclosing function. - 5
-
The iterator increments the index
i
and returns the corresponding key-value pair from tbl
.
The iterator will return nil, nil
when there are no more elements, but you could put in an explicit check on i
if you wanted to.
This version of ordered_pairs
assumes that the keys are all the same type, which is too limiting. The table.sort
call will fail if they aren’t. A comparison function takes two arguments and returns true
if the first argument should come before the second. We can make a default one that works for all types:
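local function compare(a, b)
local ta, tb = type(a), type(b)
if ta ~= tb then
-- Different types: order by type name, so numbers come before strings.
return ta < tb
elseif ta == 'number' or ta == 'string' then
-- Numbers and strings have a natural order.
return a < b
else
-- Booleans, tables, functions, etc. have no default < ordering.
return tostring(a) < tostring(b)
end
end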
This function sorts keys first by type and then by value. We note that alphabetically, number
comes before string
, so we will see numbers before strings, which is the standard convention.
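The following sketch shows the problem and the fix side by side. The compare function here is a variant written in the same spirit as the one above (it falls back to tostring for every type other than numbers and strings, which is an assumption of this example), and the keys array is invented for illustration.

```lua
local function compare(a, b)
    local ta, tb = type(a), type(b)
    if ta ~= tb then
        return ta < tb                    -- order by type name first
    elseif ta == 'number' or ta == 'string' then
        return a < b                      -- natural order within a type
    else
        return tostring(a) < tostring(b)  -- fall back to the printed form
    end
end

-- Without a comparator, mixing numbers and strings raises an error,
-- which pcall catches and reports as false.
local ok = pcall(table.sort, { "b", 2, "a", 1 })
print(ok)  --> false

-- With the comparator, the same keys sort cleanly.
local keys = { "b", 2, "a", 1 }
table.sort(keys, compare)
print(keys[1], keys[2], keys[3], keys[4])  --> 1  2  a  b
```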
We could use this function in ordered_pairs
:
local function ordered_pairs(tbl)
...
1table.sort(keys, compare)
...
end
- 1
-
We sort the keys using the comparison function
compare
.
However, the user may want to define a custom comparison function. For example, they might want to sort the keys case-insensitively or in reverse alphabetical order.
Ideally, we want the user to be able to pass a comparison function to ordered_pairs
and have it return an iterator maker that can use that comparator to iterate over any table in a consistent order.
An extra level of indirection is required:
1local function ordered_pairs(comparator)
2if comparator == false then return pairs end
3comparator = comparator or compare
4return function(tbl)
local keys = {}
for k, _ in pairs(tbl) do table.insert(keys, k) end
5table.sort(keys, comparator)
local i = 0
6return function()
i = i + 1
return keys[i], tbl[keys[i]]
end
end
end
- 1
-
We have added a
comparator
argument, which should be a function that takes two keys and returns true
if the first key should come before the second. - 2
-
If
comparator
is explicitly set to false
, we return the standard pairs
iterator. - 3
-
If
comparator
is missing, we use the default compare
. - 4
- We return a function that takes a table and returns an iterator function for that table using the sorted keys.
- 5
-
We sort the keys using
comparator
, which will be set by now. - 6
- The iterator function is a closure with access to the sorted keys and the current index.
Adding a layer of indirection is another typical pattern in programming. Our ordered_pairs is a function that returns a function that returns a function.
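Here is a sketch of how a caller might use that extra level of indirection. It repeats ordered_pairs so that it runs on its own, but the default comparator is a simplified stand-in that only handles keys of a single type. The comparators reverse and ignore_case and the helper key_order are hypothetical names used only for this example.

```lua
local function ordered_pairs(comparator)
    if comparator == false then return pairs end
    -- Simplified default for this sketch: plain < on same-typed keys.
    comparator = comparator or function(a, b) return a < b end
    return function(tbl)
        local keys = {}
        for k in pairs(tbl) do keys[#keys + 1] = k end
        table.sort(keys, comparator)
        local i = 0
        return function()
            i = i + 1
            return keys[i], tbl[keys[i]]
        end
    end
end

-- Hypothetical comparators a user might supply.
local function reverse(a, b) return a > b end
local function ignore_case(a, b) return a:lower() < b:lower() end

-- Collect the keys in the order the iterator visits them.
local function key_order(tbl, comparator)
    local out = {}
    for k in ordered_pairs(comparator)(tbl) do out[#out + 1] = k end
    return table.concat(out, " ")
end

local fruit = { banana = 1, cherry = 2, apple = 3 }
print(key_order(fruit))           --> apple banana cherry
print(key_order(fruit, reverse))  --> cherry banana apple

local mixed = { banana = 1, Cherry = 2, apple = 3 }
print(key_order(mixed, ignore_case))  --> apple banana Cherry
```

Calling ordered_pairs(comparator) once gives back an iterator maker that can be reused for any number of tables.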
We add a comparator
field to the options.pretty
table:
local options = {}
options.pretty = {
indent = ' ',
table_begin = '{',
table_end = '}',
array_begin = '[',
array_end = ']',
key_begin = '',
key_end = ' = ',
sep = ',',
inline_spacer = ' ',
show_indices = false,
1comparator = compare
}
- 1
- We use the default comparison function unless the user specifies otherwise.
The user can set the comparator field to false if they want to use the standard pairs iterator.
Aside: nil
vs. false
Like many older languages, Lua treats nil
as false in a conditional test.
However, false
is a distinct value in Lua. It is a boolean that is false
in a conditional test. In Lua, nil
represents the absence of a value. false
represents a value that is explicitly false
.
Choosing to treat nil
as false
in a conditional test probably seemed convenient. It is a common idiom in many languages, particularly C, where 0
can represent false. Modern languages have moved away from this.
This conflating of nil
and false
can lead to subtle bugs. This is particularly true in Lua, where you will likely have functions with optional arguments. The common idiom for optional arguments looks like this:
local function foo(arg)
arg = arg or 'default'
print(arg)
end
If arg
is missing or nil
, it will be set to 'default'
. If arg
is explicitly false
, it will still be set to 'default'
which is probably not what you want.
Try it:
foo() -- prints 'default'
foo(nil) -- prints 'default'
foo('hello') -- prints 'hello'
1foo(false) -- prints 'default'
- 1
- This is not what you want!
From personal experience, this will bite you at some point.
You sometimes might want to distinguish between the absence of an argument and an explicitly false
argument. We can rewrite foo
to handle this:
local function foo(arg)
1if arg == false then print('false') return end
arg = arg or 'default'
print(arg)
end
foo() -- prints 'default'
foo(nil) -- prints 'default'
foo('hello') -- prints 'hello'
foo(false) -- prints 'false'
- 1
-
We added a check for
arg
being explicitly false
.
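If you need this distinction in several places, one option is to wrap the nil test in a small helper. The sketch below uses a hypothetical function called default_to, which is not part of scribe, and compares it with the plain or idiom.

```lua
-- Hypothetical helper: fall back to a default only when value is nil.
local function default_to(value, fallback)
    if value == nil then return fallback end
    return value
end

local function describe(verbose)
    local with_or   = verbose or true            -- false is silently replaced
    local with_test = default_to(verbose, true)  -- false is preserved
    return with_or, with_test
end

print(describe())       --> true  true
print(describe(false))  --> true  false
print(describe(true))   --> true  true
```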
Ordered Output Resolved
The change to table_string
is quite small:
function table_string(tbl, opts)
...
1local iter = array and ipairs or ordered_pairs(opts.comparator)
for k, v in iter(tbl) do
...
end
...
end
- 1
-
We have replaced the
pairs
iterator with ordered_pairs
using a user-defined comparison function for non-arrays.
Now if you try print(pretty(user))
you always get:
- 1
-
user
is a key-value table, and the elements are shown with the keys alphabetically. - 2
-
friends
is a sub-array with the elements shown in index order.
Inlining Simple Sub-Tables
A nice feature of some pretty-printers is the ability to inline “simple” sub-tables. This option can make the output more readable and compact.
Of course, we need to define what “simple” means. It could be a small table that fits inside a set number of characters. Or it could be a table with a certain number of elements.
For our purposes, we will consider a table “simple” if it has no sub-tables. We will also add an optional limit on the number of elements to this definition.
We can alter our metadata
function to return the number of sub-tables in a table:
local function metadata(tbl)
local size = 0
local array = true
1local subs = 0
for _, v in pairs(tbl) do
size = size + 1
if array and tbl[size] == nil then array = false end
2if type(v) == 'table' then subs = subs + 1 end
end
3local md = { size = size, array = array, subs = subs }
4return md
end
- 1
-
subs
will be the number of sub-tables. - 2
-
If we find a sub-table, we increment
subs
. - 3
- Instead of returning three values, we create a table with three fields.
- 4
- We return the metadata table.
If you haven’t seen this coding style before, the md
table is created with a table constructor. It is a shorthand way to create a table with some initial values. Assignments of the form tbl = { x = x }
look odd, but they are a common idiom in Lua. The constructor is shorthand for creating an empty table and then setting tbl["x"] = x
where the key is the string "x"
and the value is whatever the variable x
holds, which can be any type.
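A short sketch makes the equivalence concrete. The values assigned to size and array below are arbitrary sample data.

```lua
local size, array = 3, true

-- Field names on the left are string keys; the right-hand sides are the
-- values of the local variables that happen to share those names.
local md = { size = size, array = array }

-- The same table built one field at a time.
local same = {}
same["size"] = size
same.array = array  -- dot syntax is sugar for same["array"]

print(md.size, md["array"])      --> 3  true
print(same.size, same["array"])  --> 3  true
```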
We can now use the subs
field in our table_string
method to decide whether to inline a sub-table.
However, whether or not to inline simple tables should also be user-configurable. To accommodate that, we can add another field to our options table.
local options = {}
options.pretty = {
indent = ' ',
table_begin = '{',
table_end = '}',
array_begin = '[',
array_end = ']',
key_begin = '',
key_end = ' = ',
sep = ',',
inline_spacer = ' ',
show_indices = false,
comparator = compare,
1inline_size = math.huge
}
options.classic = table_clone(options.pretty)
options.classic.array_begin = '{'
options.classic.array_end = '}'
2options.classic.inline_size = 0
- 1
-
A simple table will be inlined if it has no sub-tables and strictly fewer than
inline_size
elements. - 2
-
In the
classic
format, we never inline simple tables.
So, by default, simple tables are always inlined in the pretty
format and never in the classic
format. If you set inline_size
to 6
in the pretty
format, we inline simple tables if they have fewer than six elements.
Given our current setup, it only takes a small tweak to our existing code to accommodate this new feature:
function table_string(tbl, opts)
opts = opts or options.pretty
1local md = metadata(tbl)
2local size = md.size
local array = md.array
3local simple = md.subs == 0 and md.size < opts.inline_size
if size == 0 then return empty_table_string(opts) end
local show_keys = not array and true or opts.show_indices
local tb = array and opts.array_begin or opts.table_begin
local te = array and opts.array_end or opts.table_end
local kb, ke = opts.key_begin, opts.key_end
local sep = opts.sep
4local indent = simple and '' or opts.indent
local nl = indent == '' and opts.inline_spacer or '\n'
local delims = tb ~= ''
sep = sep .. nl
if delims then tb, te = tb .. nl, nl .. te else indent = '' end
local content = ''
local i = 0
local iter = array and ipairs or ordered_pairs(opts.comparator)
for k, v in iter(tbl) do
i = i + 1
local k_string = show_keys and kb .. tostring(k) .. ke or ''
local v_string = ''
if type(v) == 'table' then
v_string = table_string(v, opts)
v_string = indent_string(v_string, opts.indent, delims)
if delims == false and show_keys then v_string = nl .. v_string end
else
v_string = simple_string(v)
end
content = content .. indent .. k_string .. v_string
if i < size then content = content .. sep end
end
return tb .. content .. te
end
- 1
- metadata returns a table instead of a couple of values.
- 2
- Extract the size and array values from the md table.
- 3
- If there are no sub-tables and the table is small enough, we consider it simple.
- 4
- This is the only change needed to incorporate that new metadata about tbl.
Looking at print(pretty(user))
we get:
{
first = "Minnie",1
friends = [ "Mickey", "Goofy" ],
last = "Mouse" }
- 1
-
Now, the
friends
array is printed inline as it has no sub-tables.
A more interesting example is:
local matrix = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} }
The print(classic(matrix))
gives:
{
{
1,
2,
3
},
{
4,
5,
6
},
{
7,
8,
9
} }
With our tweaks print(pretty(matrix))
yields a much more readable:
[
[ 1, 2, 3 ],
[ 4, 5, 6 ],
[ 7, 8, 9 ] ]
And print(alt(matrix))
yields
1, 2, 3,
4, 5, 6, 7, 8, 9
Table Metadata
Our current scheme computes each table’s metadata on the fly. When we start our process with the root table, or when we recurse into a sub-table, we have the call to compute the metadata for the table that is currently under the microscope:
function table_string(tbl, opts)
opts = opts or options.pretty
1local md = metadata(tbl)
local size = md.size
...
- 1
- The current table of interest is tbl. metadata(tbl) returns a metadata table for tbl.
However, tables can reference other tables and even have references to themselves. For example, we might build a website with Disney characters and have a gallery where visitors can flip from one star to the next and back to the previous one, etc.
A doubly linked list is one data structure to model this type of interaction. In the most dumbed down, minimal version, we might have:
local stars =
{
c1 = { first = "Mickey", last = "Mouse" },
c2 = { first = "Minnie", last = "Mouse" }
}
stars.c1.next = stars.c2
stars.c2.prev = stars.c1
stars.home = stars
Here, c1
, c2
, … are characters. Each has a table of associated data (more realistically, a table of image links and the like).
The characters are connected by their next and previous links. To cap it all, we have a “home” link back to the original table — a self-reference.
If you try print(pretty(stars))
with our current implementation, the program will chase its tail and die of pure embarrassment at the rubbish state of table_string
.
Before we get to that, we will first alter our metadata
function significantly.
Instead of treating each table as it comes along and passing back some associated metadata, we will view the table as a whole entity in one go.
Our current metadata(tbl)
returns md
, a table with three fields, size
, array
and subs
, that tell you something about tbl
.
In our new implementation, metadata(tbl)
will return md
as a table of tables. If t
is tbl
itself or any sub-table of tbl
, then
Field | Description |
---|---|
md[t].size |
The number of top-level elements in t . |
md[t].array |
This will be true if t is a Lua array, otherwise false . |
md[t].subs |
The number of sub-tables in t . |
Here is what our new call-it-once-and-be-done metadata
function looks like:
1local function metadata(tbl, md)
2md = md or {}
3md[tbl] = {}
local size, array, subs = 0, true, 0
for _, v in pairs(tbl) do
size = size + 1
if array and tbl[size] == nil then array = false end
if type(v) == 'table' then
subs = subs + 1
4if not md[v] then metadata(v, md) end
end
end
5md[tbl].size = size
md[tbl].array = array
md[tbl].subs = subs
return md
end
- 1
- We’ve added md to the calling signature. It will be missing on the first call.
- 2
- If md is missing, we set it up as an empty table.
- 3
- We set up md[tbl] as an empty sub-table of md.
- 4
- As we iterate through tbl, we may come across a new sub-table v, which is handled by recursion.
- 5
- Record the three bits of metadata for tbl in the md[tbl] sub-table.
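As a quick check of the shape of the result, the sketch below repeats the function above (without the annotation markers, so the snippet runs on its own) and looks up the metadata for a table and one of its sub-tables. The user table is just sample data.

```lua
local function metadata(tbl, md)
    md = md or {}
    md[tbl] = {}
    local size, array, subs = 0, true, 0
    for _, v in pairs(tbl) do
        size = size + 1
        if array and tbl[size] == nil then array = false end
        if type(v) == 'table' then
            subs = subs + 1
            if not md[v] then metadata(v, md) end
        end
    end
    md[tbl].size, md[tbl].array, md[tbl].subs = size, array, subs
    return md
end

local user = { first = "Minnie", last = "Mouse", friends = { "Mickey", "Goofy" } }
local md = metadata(user)

-- The metadata is keyed by the tables themselves.
print(md[user].size, md[user].array, md[user].subs)                          --> 3  false  1
print(md[user.friends].size, md[user.friends].array, md[user.friends].subs)  --> 2  true  0

-- Because md[tbl] is created before the loop, a self-reference does not recurse forever.
user.home = user
local md2 = metadata(user)
print(md2[user].size, md2[user].subs)                                        --> 4  2
```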
To use this new metadata
method, we also need to alter table_string
. That can be done a couple of different ways. One way to go is to make table_string
a little wrapper around a recursive closure that does most of the work:
1function table_string(root_tbl, opts)
opts = opts or options.pretty
2local md = metadata(root_tbl)
3local function process(tbl)
4local size = md[tbl].size
if size == 0 then return empty_table_string(opts) end
local array = md[tbl].array
local show_keys = not array and true or opts.show_indices
local simple = md[tbl].subs == 0 and size < opts.inline_size
local indent = simple and '' or opts.indent
local tb = array and opts.array_begin or opts.table_begin
local te = array and opts.array_end or opts.table_end
local kb, ke = opts.key_begin, opts.key_end
local nl = indent == '' and opts.inline_spacer or '\n'
local sep = opts.sep .. nl
local delims = tb ~= ''
if delims then tb, te = tb .. nl, nl .. te else indent = '' end
local content = ''
local i = 0
local iter = array and ipairs or ordered_pairs(opts.comparator)
for k, v in iter(tbl) do
i = i + 1
local k_string = show_keys and kb .. tostring(k) .. ke or ''
local v_string = ''
if type(v) == 'table' then
5v_string = process(v)
v_string = indent_string(v_string, opts.indent, delims)
if delims == false and show_keys then v_string = nl .. v_string end
else
v_string = simple_string(v)
end
content = content .. indent .. k_string .. v_string
if i < size then content = content .. sep end
end
return tb .. content .. te
end
6local retval = process(root_tbl)
return retval
end
- 1
- Now, table_string is primarily a wrapper around the inner process function. We have changed the first argument to root_tbl to clarify that this is the root table.
- 2
- We compute the metadata for the root table root_tbl and store it in md.
- 3
- The process function is a closure and can access the enclosed md table.
- 4
- md[tbl] is a sub-table, currently with three fields: size, array and subs.
- 5
- If we hit a sub-table, we recurse using process. The md table does not need recomputing and continues to be available as we process v.
- 6
- Most of the source lines in table_string are in the private process sub-function. We have md and get the ball rolling by running process on root_tbl.
Cyclical References
If we look at a simple linked list example:
local stars =
{
c1 = { first = "Mickey", last = "Mouse"},
c2 = { first = "Minnie", last = "Mouse"},
}
stars.c1.next = stars.c2
Then print(pretty(stars))
returns:
{
c1 =
{
next = {
first = Minnie,
last = Mouse
},
first = Mickey,
last = Mouse
},
c2 = {
first = Minnie,
last = Mouse
} }
We see two definitions of c2
!
One is in the next
field for c1
and another when we get to c2
by itself. That’s not ideal.
Things get worse if we use a doubly linked list by adding:
stars.c2.prev = stars.c1
Now, when we try print(pretty(stars))
the program will crash with a message like
1
/path/to/script: stack overflow
stack traceback:
/path/to/script:49: in function 'table_size_and_type'
/path/to/script:98: in function 'table_string'
/path/to/script: in function 'table_string'
/path/to/script: in function 'table_string'
/path/to/script: in function 'table_string'
/path/to/script: in function 'table_string'
/path/to/script: in function 'table_string'
/path/to/script: in function 'table_string'
/path/to/script: in function 'table_string'
/path/to/script: in function 'table_string'2
... (skipping 58803 levels)
/path/to/script: in function 'table_string'
/path/to/script: in function 'table_string'
/path/to/script: in function 'table_string'
/path/to/script: in function 'table_string'
/path/to/script: in function 'table_string'
/path/to/script: in function 'table_string'
/path/to/script: in function 'table_string'
/path/to/script: in function 'table_string'
/path/to/script: in function 'table_string'3 (...tail calls...)
- 1
- Lua’s interpreter has run out of room.
- 2
- That’s a lot of skipping!
- 3
- It’s more like tail chasing in this instance!
It is easy to see what the issue is. When we convert c1
to a string, it encounters a sub-table c2
. Our function then calls itself with a request to convert c2
to a string. That call, in its turn, will encounter c2.prev = c1
and see that c1
is a table. It handles that by calling itself with a request to convert c1
to a string. And round and round we go!
Our current solution doesn’t handle tables with shared references well. Even if it manages to complete, the shared table will be defined multiple times. The situation is even worse if there are cycles to be navigated. Those cause the program to crash with a stack overflow.
Lua makes it very easy to have tables with multiple references and cycles. Under the covers, the assignment c2.prev = c1
sets up another pointer to c1
. No copying is done; everything is very efficient.
That’s great for many algorithms you might use beyond the most straightforward, plain old data tables. We still need to examine and view those tables without crashes.
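A short sketch makes the reference semantics visible. The local names c1, c2 and copy are only for illustration.

```lua
local c1 = { first = "Mickey", last = "Mouse" }
local c2 = { first = "Minnie", last = "Mouse" }
c2.prev = c1                 -- stores a reference to c1, not a copy

print(c2.prev == c1)         --> true  (both names refer to the same table)

c1.last = "M."               -- change the table through one reference...
print(c2.prev.last)          --> M.    (...and the change is visible through the other)

local copy = { first = c1.first, last = c1.last }
print(copy == c1)            --> false (same contents, but a different table)
```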
Crash Proofing
The key to handling tables with cycles and shared references is marking those tables we have already put out a full string definition for. If we see those marked tables again, we can do something more sensible than trying to define them again and potentially going around in circles.
Our metadata
function returns a metadata table for each table and sub-table it encounters. Currently, there are just three fields in that metadata table: size
, array
, and subs
. We can add a fourth field, processed
, that will be true
if we have already seen and processed that table. If the processed
field is true
, we can print a simple reference to the table instead of trying to define it again. If the field is missing, we can define the table as we do now.
Here is what the table_string
function looks like with the processed
field added:
function table_string(root_tbl, opts)
opts = opts or options.pretty
local md = metadata(root_tbl)
local function process(tbl)
1md[tbl].processed = true
...
for k, v in iter(tbl) do
i = i + 1
local k_string = show_keys and kb .. tostring(k) .. ke or ''
local v_string = ''
if type(v) == 'table' then
if md[v].processed then
2v_string = simple_string(v)
else
3v_string = process(v)
v_string = indent_string(v_string, opts.indent, delims)
if delims == false and show_keys then v_string = nl .. v_string end
end
...
end
return tb .. content .. te
end
local retval = process(root_tbl)
return retval
end
- 1
- We are about to process tbl, so we mark it as processed in case it has a self-reference.
- 2
- We have seen v before and can do something else instead of recursing. Here, we print a reference to the table’s address.
- 3
- Recurse into v and build up a complete definition for it.
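The mark-before-recursing idea is easier to see in isolation. The following is a stripped-down sketch, not scribe code: count_tables is a hypothetical helper that counts the distinct tables reachable through the values of a root table, and it terminates on cycles for the same reason the processed flag works.

```lua
-- Hypothetical helper: count the distinct tables reachable from root.
local function count_tables(root)
    local seen = {}
    local count = 0
    local function visit(tbl)
        seen[tbl] = true          -- mark first, so a self-reference cannot recurse forever
        count = count + 1
        for _, v in pairs(tbl) do
            if type(v) == 'table' and not seen[v] then visit(v) end
        end
    end
    visit(root)
    return count
end

local stars = {
    c1 = { first = "Mickey", last = "Mouse" },
    c2 = { first = "Minnie", last = "Mouse" }
}
stars.c1.next = stars.c2
stars.c2.prev = stars.c1
stars.home = stars

print(count_tables(stars))        --> 3  (stars, c1 and c2, each counted once)
```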
Now, if you try print(pretty(stars))
on our doubly linked list of stars, you get something like this:
{
c1 = {
first = "Mickey",
last = "Mouse",
next = {
first = "Minnie",
last = "Mouse",
prev = 0x600002ec0ec0
}
},
c2 = 0x600002ec0f00 }
The shared references are just table addresses, which isn’t user-friendly but better than crashing!
We can even add a self-reference to the stars
table like this:
stars.home = stars
Then print(pretty(stars))
yields:
{
c1 = {
first = "Mickey",
last = "Mouse",
next = {
first = "Minnie",
last = "Mouse",
prev = 0x6000012ecec0
}
},
c2 = 0x6000012ecf00,
home = 0x6000012ece80 }
Paths
That output is not very user-friendly.
How should we see those references? Ideally, we should see an understandable description of the reference.
Every table has a unique address in Lua, which we could use. However, as we saw above, that’s not very user-friendly. We could use the key in the table that points to the shared table. That is better, but still not great. We could use a path to the table from the top-level root table. This is the best option.
Then, in the case where there is no self-reference, we might see:
{
c2 = {
first = Minnie,
prev = {
first = Mickey,1
next = <c2>,
last = Mouse
},
last = Mouse
},2
c1 = <c2.prev> }
- 1
- The value of next refers to the table at the path c2.
- 2
- The value of c1 refers to the table at the path c2.prev.
If the root table is tbl
, then the path "<foo.bar.baz>"
refers to the value tbl.foo.bar.baz
. Thus, foo
is a sub-table of tbl
, bar
is a sub-table of foo
, and baz
is a value in bar
.
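To pin down what a path means, here is a minimal sketch of a function that follows a path string back to the value it names. The helper resolve_path is hypothetical and not part of scribe; it assumes the separator is a single punctuation character and that keys are strings or array indices.

```lua
-- Hypothetical helper: follow a path such as "foo.bar.baz" down from root.
local function resolve_path(root, path, sep)
    sep = sep or '.'
    local value = root
    -- Each match is a run of characters that are not the separator.
    for key in path:gmatch('[^%' .. sep .. ']+') do
        if type(value) ~= 'table' then return nil end
        local v = value[key]
        -- Array indices are numbers, so retry with a numeric key if the string key fails.
        if v == nil then v = value[tonumber(key)] end
        value = v
    end
    return value
end

local stars = {
    c1 = { first = "Mickey", last = "Mouse" },
    c2 = { first = "Minnie", last = "Mouse" }
}
stars.c1.next = stars.c2
stars.c2.prev = stars.c1

print(resolve_path(stars, "c1.next") == stars.c2)   --> true
print(resolve_path(stars, "c2.prev.first"))         --> Mickey
```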
If there is a self-reference, such as stars.home = stars
, we might see:
1
<table> = {
c2 = {
first = Minnie,
prev = {
first = Mickey,2
next = <c2>,
last = Mouse
},
last = Mouse
},
c1 = <c2.prev>,3
home = <table> }
- 1
- We only put out the <table> = ... line if there is a self-reference.
- 2
- We could use the full path, <table.c2>, here, but that is generally overkill.
- 3
- The value of home refers to the table itself.
In this representation, there are some obvious user-settable options:
- The string used for the root table if there are any top-level self-references. In the example, we use table for that.
- The separator to use in the path string to sub-sub-tables, etc. In the example, we use ".".
- Perhaps the delimiters to use for path strings, which in the example are < and >.
Let’s add those to our options.pretty
table:
local options = {}
options.pretty = {
indent = ' ',
table_begin = '{',
table_end = '}',
array_begin = '[',
array_end = ']',
key_begin = '',
key_end = ' = ',
sep = ',',
inline_spacer = ' ',
show_indices = false,
comparator = compare,
inline_size = math.huge,
1path_root = 'table',
2path_sep = '.',
3path_begin = '<',
path_end = '>'
}
- 1
- The string for the root table if there are any top-level self-references.
- 2
- The separator used in the path string to sub-sub-tables, etc.
- 3
- The delimiters used for the path string.
With that in place, we can modify the table_string
function as follows:
function table_string(root_tbl, opts)
opts = opts or options.pretty
local md = metadata(root_tbl)
1local function process(tbl, path)
2md[tbl].path = path
local size = md[tbl].size
if size == 0 then return empty_table_string(opts) end
local array = md[tbl].array
local show_keys = not array and true or opts.show_indices
local simple = md[tbl].subs == 0 and size < opts.inline_size
local indent = simple and '' or opts.indent
local tb = array and opts.array_begin or opts.table_begin
local te = array and opts.array_end or opts.table_end
local kb, ke = opts.key_begin, opts.key_end
3local pb, pe = opts.path_begin, opts.path_end
local nl = indent == '' and opts.inline_spacer or '\n'
local sep = opts.sep .. nl
local delims = tb ~= ''
if delims then tb, te = tb .. nl, nl .. te else indent = '' end
local content = ''
local i = 0
local iter = array and ipairs or ordered_pairs(opts.comparator)
for k, v in iter(tbl) do
i = i + 1
local k_string = show_keys and kb .. tostring(k) .. ke or ''
local v_string = ''
if type(v) == 'table' then
if md[v].path then
4v_string = pb .. md[v].path .. pe
else
5local v_path = path .. opts.path_sep .. tostring(k)
v_string = process(v, v_path)
6v_string = indent_string(v_string, opts.indent, delims)
if delims == false and show_keys then v_string = nl .. v_string end
end
else
v_string = simple_string(v)
end
content = content .. indent .. k_string .. v_string
if i < size then content = content .. sep end
end
return tb .. content .. te
end
7local retval = process(root_tbl, opts.path_root)
return retval
end
- 1
- We have added an extra path argument.
- 2
- We record the path to this table tbl as the value under the metadata key path in md[tbl].
- 3
- Localise the path-begin and path-end delimiters.
- 4
- If we have seen v before, we use the path string we stored in md for v, formatted with the delimiters.
- 5
- v is a new table, so we need a path to v, which we get by appending the key k to the current path.
- 6
- We recurse, processing the contents of v using that new path string.
- 7
- Kick off the process with the root table and path.
Now, if you try print(pretty(stars))
on our doubly linked list of stars, we get:
{
c1 = {
first = Mickey,
last = Mouse,
next = {
first = Minnie,
last = Mouse,1
prev = <table.c1>
}
},
c2 = <table.c1.next>,2
home = <table> }
- 1
- The value of prev refers to the path table.c1.
- 2
- The value of home refers to the table itself.
In a reference like <table.c1.next>
, the root path prefix table.
isn’t necessary. We will remove it in the next iteration.
Complete self-references like our home = <table>
line are uncommon, but we would like to have that <table>
defined if it does occur. Something along these lines:
<table> = {
... }
However, that extra <table> =
should only be present if there is a self-reference.
We can alter table_string
as follows:
function table_string(root_tbl, opts)
opts = opts or options.pretty
local md = metadata(root_tbl)
1local root = root_tbl
2local root_ref = false
3local kb, ke = opts.key_begin, opts.key_end
local pb, pe = opts.path_begin, opts.path_end
local function process(tbl, path)
md[tbl].path = path
4local path_prefix = path == opts.path_root and '' or path .. opts.path_sep
local size = md[tbl].size
if size == 0 then return empty_table_string(opts) end
local array = md[tbl].array
local show_keys = not array and true or opts.show_indices
local simple = md[tbl].subs == 0 and size < opts.inline_size
local indent = simple and '' or opts.indent
local tb = array and opts.array_begin or opts.table_begin
local te = array and opts.array_end or opts.table_end
local nl = indent == '' and opts.inline_spacer or '\n'
local sep = opts.sep .. nl
local delims = tb ~= ''
if delims then tb, te = tb .. nl, nl .. te else indent = '' end
local content = ''
local i = 0
local iter = array and ipairs or ordered_pairs(opts.comparator)
for k, v in iter(tbl) do
i = i + 1
local k_string = show_keys and kb .. tostring(k) .. ke or ''
local v_string = ''
if type(v) == 'table' then
if md[v].path then
v_string = pb .. md[v].path .. pe
5if v == root then root_ref = true end
else
6local v_path = path_prefix .. tostring(k)
v_string = process(v, v_path)
v_string = indent_string(v_string, opts.indent, delims)
if delims == false and show_keys then v_string = nl .. v_string end
end
else
v_string = simple_string(v)
end
content = content .. indent .. k_string .. v_string
if i < size then content = content .. sep end
end
return tb .. content .. te
end
local retval = process(root_tbl, opts.path_root)
7if root_ref then
retval = pb .. opts.path_root .. pe .. ' = ' .. retval
end
return retval
end
- 1
-
We capture the root table in
root
. - 2
-
We capture whether there is a self-reference to the root table in
root_ref
. - 3
-
Localise some delimiters that never vary by context (hoist these constant lines from the
process
function). - 4
- If this is not the root table, we will prepend any new path with a path prefix.
- 5
-
We record the self-reference to the root table if
v
is the root table. - 6
-
We prepend the path with the path prefix if
tbl
is not the root table. - 7
-
If there is a self-reference to the root table, we prepend the return string with
<table> =
.
Here’s the output from the latest version of print(pretty(stars))
:
1
<table> = {
c1 = {
first = Mickey,
last = Mouse,
next = {
first = Minnie,
last = Mouse,
prev = <c1>
}
},2
c2 = <c1.next>,3
home = <table> }
- 1
- There is a self-reference to the stars parent table, so we have prepended the string with <table> =.
- 2
- This looks better than <table.c1.next>.
- 3
- Here is the self-reference to the root table, which reads quite naturally.
If we remove the stars.home = stars
assignment then print(pretty(stars))
returns:
1
{
c1 = {
next = {
first = Minnie,
prev = <c1>,
last = Mouse
},
first = Mickey,
last = Mouse
},
c2 = <c1.next> }
- 1
-
There is no self-reference, so we do not need that
<table> =
we saw earlier.
Breadth First Traversal
While that last output is undoubtedly valid, it fails the readability test.
That c2 = <c1.next>
is perfectly correct, but you have to go back and find the definition of c1
to understand what c1.next
actually is. It would be much better to see the definition of c2
right there, not nested inside c1
. We are after something that looks like this:
{
c1 =
{
first = Mickey,
last = Mouse,
next = <c2>
},
c2 =
{
first = Minnie,
last = Mouse,
prev = <c1>
},
home = <table> }
We would like to see the full definition of tables at the shallowest possible depth.
The root problem is that we are traversing tables depth-first.
We process all the elements in c1
before getting to c2
. So when we see c1.next
, we print the full definition of what c2
really is. Then, later, when we get to c2
, we see that we have already processed it and output it as a reference to <c1.next>
. That is ass-backwards and c1.next
should be the reference to <c2>
, and the definition of c2
should be deferred to later.
All the table-to-string implementations that are available on the web seem to have this problem. The depth-first traversal is a natural choice, but it doesn’t provide the most readable output.
We need to change the table traversal to be breadth-first. Then, we process the elements of tbl
in the order they appear at the top level. If we encounter a sub-table, we will defer turning it to a string until after processing all the top-level elements.
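For comparison, here is a minimal sketch of a textbook breadth-first walk that uses an explicit queue to find the shallowest path to every table. It is not the approach the article's code takes (that code defers children inside a recursive function), and the helper name shallowest_paths and its defaults are made up for illustration.

```lua
-- Hypothetical helper: map every reachable table to the shallowest path leading to it.
local function shallowest_paths(root, root_name, sep)
    root_name, sep = root_name or 'table', sep or '.'
    local paths = { [root] = root_name }
    local queue, head = { root }, 1
    while queue[head] do
        local tbl = queue[head]
        head = head + 1
        -- Collect the keys that lead to tables and sort them for a repeatable order.
        local keys = {}
        for k, v in pairs(tbl) do
            if type(v) == 'table' then keys[#keys + 1] = k end
        end
        table.sort(keys, function(a, b) return tostring(a) < tostring(b) end)
        for _, k in ipairs(keys) do
            local v = tbl[k]
            if not paths[v] then
                -- First visit is the shallowest one: record the path and queue the table.
                local prefix = tbl == root and '' or paths[tbl] .. sep
                paths[v] = prefix .. tostring(k)
                queue[#queue + 1] = v
            end
        end
    end
    return paths
end

local stars = {
    c1 = { first = "Mickey", last = "Mouse" },
    c2 = { first = "Minnie", last = "Mouse" }
}
stars.c1.next = stars.c2
stars.c2.prev = stars.c1
stars.home = stars

local paths = shallowest_paths(stars)
print(paths[stars], paths[stars.c1], paths[stars.c2])   --> table  c1  c2
```

Because every table at depth one is given a path before any table at depth two is looked at, c2 gets the path c2 and never c1.next.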
To demonstrate, let’s see how breadth first traversal works for the simpler metadata
method:
local function metadata(tbl, md)
md = md or {}
md[tbl] = {}
local size, array, subs = 0, true, 0
1local children = {}
for _, v in pairs(tbl) do
size = size + 1
if array and tbl[size] == nil then array = false end
if type(v) == 'table' then
subs = subs + 1
2if not md[v] then table.insert(children, v) end
end
end
md[tbl].size, md[tbl].array, md[tbl].subs = size, array, subs
3for _, child in ipairs(children) do metadata(child, md) end
return md
end
- 1
- We keep a list of the sub-tables we encounter.
- 2
- If we encounter a sub-table, we add it to the list of children and defer immediate processing.
- 3
- After processing all the top-level elements, we then process the children.
Changing the processing order in metadata
doesn’t change the output. print(pretty(stars))
still gives:
<table> = {
c1 = {
first = "Mickey",
last = "Mouse",
next = {
first = "Minnie",
last = "Mouse",
prev = <c1>
}
},
c2 = <c1.next>,
home = <table> }
We need to apply similar changes to the more complex table_string
function:
function table_string(root_tbl, opts)
...
local function process(tbl, path)
...
local children = {}
...
for k, v in iter(tbl) do
...
if type(v) == 'table' then
if md[v].path then
v_string = pb .. md[v].path .. pe
if v == root then root_ref = true end
else
local v_path = path_prefix .. tostring(k)
v_string = simple_string(v)
md[v].path = v_path
children[v] = v_path
if delims == false and show_keys then v_string = nl .. v_string end
end
else
v_string = v_string .. simple_string(v)
end
content = content .. indent .. k_string .. v_string
if i < size then content = content .. sep end
end
local retval = tb .. content .. te
for child_table, child_path in pairs(children) do
local child_string = process(child_table, child_path)
child_string = indent_string(child_string, opts.indent, delims)
retval = retval:gsub(simple_string(child_table), child_string)
end
return retval
end
local retval = process(root_tbl, opts.path_root)
if root_ref then retval = pb .. opts.path_root .. pe .. ' = ' .. retval end
return retval
end
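One thing to watch in the gsub call above: the first argument to gsub is a Lua pattern, and % is special in a replacement string. Whether that matters depends on what simple_string returns for a table and on what the child tables contain, so treat the following as a hedged sketch rather than a required fix. The helper replace_literal is hypothetical; it swaps one piece of text for another with no pattern or escape interpretation.

```lua
-- Hypothetical helper: replace every literal occurrence of target in s.
local function replace_literal(s, target, replacement)
    -- Escape punctuation so target is matched literally rather than as a pattern.
    local pattern = target:gsub('%p', '%%%0')
    -- A function replacement is used as-is, so '%' inside replacement is not special.
    return (s:gsub(pattern, function() return replacement end))
end

local text = 'next = table: 0x1234'
print(replace_literal(text, 'table: 0x1234', '{ discount = "50%" }'))
--> next = { discount = "50%" }
```

The extra parentheses around the final gsub call drop its second return value, the substitution count.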
With that change, print(pretty(stars))
now gives:
<table> = {
c1 = {
first = "Mickey",
last = "Mouse",
next = <c2>
},
c2 = {
first = "Minnie",
last = "Mouse",
prev = <c1>
},
home = <table> }
Arrays
That last table is very readable. Every shared reference like c1.next = <c2>
has an easily identifiable right-hand side value, the value associated with the key c2
in this case.
However, we have gone to some lengths to suppress showing explicit keys for Lua tables that happen to be arrays. If we have an array of arrays with shared references, the paths will lack clarity.
For example, perhaps you are coding a Cluedo-type murder mystery game set in a big house with many rooms stored as an array. Each room might have a potential murder weapon in it:
local rooms = {
{ name = "Library", weapon = "Lead Pipe" },
{ name = "Kitchen", weapon = "Knife" },
{ name = "Lounge", weapon = "Poison" },
{ name = "Bedroom", weapon = "Garrotte" }
}
The user will move from room to room in a fashion that might be randomly generated or set by the game’s storyline. To keep it simple, we add next
and prev
fields to each room as follows:
rooms[1].next, rooms[2].next, rooms[3].next, rooms[4].next = rooms[2], rooms[3], rooms[4], rooms[1]
rooms[1].prev, rooms[2].prev, rooms[3].prev, rooms[4].prev = rooms[4], rooms[1], rooms[2], rooms[3]
Now if we print(pretty(rooms))
we get:
[
{
name = "Library",
next = <2>,
prev = <4>,
weapon = "Lead Pipe"
},
{
name = "Kitchen",
next = <3>,
prev = <1>,
weapon = "Knife"
},
{
name = "Lounge",
next = <4>,
prev = <2>,
weapon = "Poison"
},
{
name = "Bedroom",
next = <1>,
prev = <3>,
weapon = "Garrotte"
} ]
rooms
is an array printed without showing the indices. The problem is that path references like next = <1>
don’t make much sense.
If the value associated with an index is shared, we want to see that index explicitly.
The current implementation makes this difficult. The main loop in table_string
looks like this:
...
for k, v in iter(tbl) do
i = i + 1
local k_string = show_keys and kb .. tostring(k) .. ke or ''
local v_string = ''
if type(v) == 'table' then
local k_string = show_keys and kb .. tostring(k) .. ke or ''
...
We are creating the key string k_string
before we know whether the associated value v
is a table, let alone a shared table. We also put out the key-value pair at one depth, but any shared reference may be at a different depth.
The solution is two-fold. First, add a new metadata field, refs
, for each table and sub-table. md[t].refs
will be the number of references seen for the table t
. If md[t].refs
is greater than 1
, then t
is a shared table.
We can compute the reference count field using the metadata
method. We also switch the style of the function to having an inner hidden process
closure that does all the work. Tables are still getting traversed depth-first.
local function metadata(root_tbl)
1local md = {}
2md[root_tbl] = { refs = 1 }
3local function process(tbl)
local size, array, subs = 0, true, 0
local children = {}
for _, v in pairs(tbl) do
size = size + 1
if array and tbl[size] == nil then array = false end
if type(v) == 'table' then
subs = subs + 1
if md[v] then
4md[v].refs = md[v].refs + 1
else
5table.insert(children, v)
6md[v] = { refs = 1 }
end
end
end
md[tbl].size, md[tbl].array, md[tbl].subs = size, array, subs
7for _, child in ipairs(children) do process(child) end
end
8process(root_tbl)
return md
end
- 1
- We set up the metadata table that will be accessible inside the process closure.
- 2
- We immediately add an entry for the root table, as it might be referenced by its immediate children.
- 3
- process is the recursive function that does all the heavy lifting.
- 4
- If we’ve seen v before, we increment its reference count.
- 5
- Otherwise, we add v to the list of sub-tables to process later.
- 6
- We add a metadata entry for v here in case it is referenced by an immediate sibling.
- 7
- Go ahead and process the grandchildren, etc.
- 8
- We kick things off by processing the root table.
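To see the reference counts in action, the sketch below repeats the function above without the annotation markers, so that it runs on its own, and applies it to a cut-down rooms array. The three-room sample data is made up for illustration.

```lua
local function metadata(root_tbl)
    local md = {}
    md[root_tbl] = { refs = 1 }
    local function process(tbl)
        local size, array, subs = 0, true, 0
        local children = {}
        for _, v in pairs(tbl) do
            size = size + 1
            if array and tbl[size] == nil then array = false end
            if type(v) == 'table' then
                subs = subs + 1
                if md[v] then
                    md[v].refs = md[v].refs + 1
                else
                    table.insert(children, v)
                    md[v] = { refs = 1 }
                end
            end
        end
        md[tbl].size, md[tbl].array, md[tbl].subs = size, array, subs
        for _, child in ipairs(children) do process(child) end
    end
    process(root_tbl)
    return md
end

local rooms = { { name = "Library" }, { name = "Kitchen" }, { name = "Attic" } }
rooms[1].next, rooms[2].prev = rooms[2], rooms[1]

local md = metadata(rooms)
for i, room in ipairs(rooms) do
    -- A count above 1 marks a shared table whose index should be shown.
    print(i, room.name, md[room].refs, md[room].refs > 1)
end
--> 1  Library  2  true
--> 2  Kitchen  2  true
--> 3  Attic    1  false
```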
Of course, we must tweak our table_string
method:
function table_string(root_tbl, opts)
...
local function process(tbl, path)
...
for k, v in iter(tbl) do
i = i + 1
1local show_key = show_keys
local v_string = ''
if type(v) == 'table' then
if md[v].path then
v_string = pb .. md[v].path .. pe
if v == root then root_ref = true end
else
2if md[v].refs > 1 then show_key = true end
local v_path = path_prefix .. tostring(k)
v_string = simple_string(v)
md[v].path = v_path
children[v] = v_path
if delims == false and show_key then v_string = nl .. v_string end
end
else
v_string = v_string .. simple_string(v)
end
3local k_string = show_key and kb .. tostring(k) .. ke or ''
content = content .. indent .. k_string .. v_string
if i < size then content = content .. sep end
end
...
end
...
end
- 1
- By default, we show this key based on the value of show_keys.
- 2
- If v is new and has a reference count greater than 1, we will show the corresponding key even if show_keys is false. We must do that so that any path references to v make sense.
- 3
- Now that we know the state of play, we can finally set the string for this key.
With this change, print(pretty(rooms))
gives:
[
1 = {
name = "Library",1
next = <2>,
prev = <4>,
weapon = "Lead Pipe"
},
2 = {
name = "Kitchen",
next = <3>,
prev = <1>,
weapon = "Knife"
},
3 = {
name = "Lounge",
next = <4>,
prev = <2>,
weapon = "Poison"
},
4 = {
name = "Bedroom",
next = <1>,
prev = <3>,
weapon = "Garrotte"
} ]
- 1
-
The path reference
<2>
now makes perfect sense.
Here’s what we get for print(alt(rooms))
:
1:
name: "Library",
next: <2>,
prev: <4>,
weapon: "Lead Pipe",
2:
name: "Kitchen",
next: <3>,
prev: <1>,
weapon: "Knife",
3:
name: "Lounge",
next: <4>,
prev: <2>,
weapon: "Poison",
4:
name: "Bedroom",
next: <1>,
prev: <3>, weapon: "Garrotte"
This output is also very readable.
One Small Tweak
Our current definition of a “simple” table is one that has no sub-tables. But what is a sub-table?
We can very slightly alter our metadata function to not count path references as distinct sub-tables.
local function metadata(root_tbl)
...
local function process(tbl)
...
for _, v in pairs(tbl) do
...
if type(v) == 'table' then
1-- subs = subs + 1
if md[v] then
md[v].refs = md[v].refs + 1
else
2subs = subs + 1
table.insert(children, v)
md[v] = { refs = 1 }
end
end
end
...
end
end
- 1
- We move this line
- 2
- to here.
With that change, only “real” sub-tables count towards the subs
total.
print(pretty(rooms))
now gives the more compact but still readable:
[
1 = { name = "Library", next = <2>, prev = <4>, weapon = "Lead Pipe" },
2 = { name = "Kitchen", next = <3>, prev = <1>, weapon = "Knife" },
3 = { name = "Lounge", next = <4>, prev = <2>, weapon = "Poison" },
4 = { name = "Bedroom", next = <1>, prev = <3>, weapon = "Garrotte" } ]
Scribe Facade
Introduction
After the first attempt at table_string(tbl)
, we commented that, while the method name was descriptive, we needed to check that the tbl
argument is an actual table.
Instead of doing that, we will create another “facade” function scribe
that will return a string for any Lua object. The user will call this function, and we will make table_string
a private function only called by scribe
when the object is a table. Currently, our table_string
function starts as follows:
1function table_string(root_tbl, opts)
2opts = opts or options.pretty
local md = metadata(root_tbl)
...
end
- 1
- table_string is a global function that is available to the user.
- 2
- It has to check if opts is provided; if not, it sets it to the default options.pretty.
We will change this to:
- 1
- We make table_string a local function.
- 2
- We remove the opts check as we know that scribe will always provide it.
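The listing that those two annotations describe does not appear here. Going by the annotations alone, it presumably looks something like the sketch below. The stub metadata function and the one-line body are placeholders added only so that the snippet runs on its own; they are not the real implementation.

```lua
-- Placeholder standing in for the real metadata function developed earlier.
local function metadata(root_tbl) return { [root_tbl] = { size = 0 } } end

-- Now a local function: only code in the same file, such as scribe, can call it.
local function table_string(root_tbl, opts)
    -- No `opts = opts or options.pretty` line: the caller guarantees opts is present.
    local md = metadata(root_tbl)
    -- ... the rest of the function is unchanged ...
    return opts.table_begin .. opts.table_end   -- placeholder body, for illustration only
end

print(table_string({}, { table_begin = '{', table_end = '}' }))   --> {}
```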
In a later chapter, we will discuss the difference between global and local functions.
In the meantime, we introduce scribe
as follows:
1function scribe(obj, opts)
2if type(obj) ~= 'table' then return simple_string(obj) end
3opts = opts or options.pretty
4return table_string(obj, opts)
end
- 1
- obj can be any Lua object and opts is an optional table of options.
- 2
- We handle non-table objects up-front by calling simple_string.
- 3
- We set opts to the default options.pretty if it is not provided.
- 4
- If we get here, we know that obj is a table, so we call the private table_string function to convert it to a string.
Of course, our other public facade functions will also call scribe
instead of table_string
directly. For example, pretty_string
will now look like this:
function pretty_string(tbl, opts)
1return scribe(tbl, options.pretty)
end
- 1
- We call scribe with the options.pretty table.
Health and Safety
We have now added a layer of protection to our table_string
function by ensuring that it is only called by scribe
when the object is a table.
However, we still need to check that the opts
table is complete. Each of those many fields in the options table must be present, or table_string
will fail.
Of course, we are sure that the standard options tables we provide are complete, but what if the user provides their own options table?
We start by adding a “marker” to our own options tables to indicate that they are complete:
local options = {}
options.pretty = {
indent = ' ',
table_begin = '{',
table_end = '}',
array_begin = '[',
array_end = ']',
key_begin = '',
key_end = ' = ',
sep = ',',
inline_spacer = ' ',
show_indices = false,
comparator = compare,
inline_size = math.huge,
path_root = 'table',
path_sep = '.',
path_begin = '<',
path_end = '>',
1COMPLETE = true
}
- 1
- If the user provides their own options table, we will check for the presence of this field to determine if it is complete.
We also add a function that adds missing fields to an options table:
1local function complete_options_table(opts, from)
for k, v in pairs(from) do
2if opts[k] == nil then opts[k] = v end
end
end
- 1
- This function takes two arguments: the opts table to complete and the from table to use as a template.
- 2
- We add missing fields from the from table to the opts table.
complete_options_table
is a private function that is only called by scribe
, so we are sure that edge cases are handled correctly. For example, we can be confident that there will be two arguments and that the second argument will be a complete options table.
We call this function from scribe
:
function scribe(obj, opts)
if type(obj) ~= 'table' then return simple_string(obj) end
1opts = opts or options.pretty
2if not opts.COMPLETE then
3local from = opts.indent == '' and options.inline or options.pretty
4complete_options_table(opts, from)
end
5return table_string(obj, opts)
end
- 1
- If the user does not provide any options table, we use the options.pretty table.
- 2
- If the user provides a custom options table, we ensure it’s complete before calling table_string.
- 3
- We use the options.inline table if the indent field is empty. Otherwise, we use the options.pretty table.
- 4
- We call the complete_options_table function to add any missing fields to the opts table.
- 5
- We can safely call table_string with the complete opts table.
Adding that COMPLETE
field to our options tables lets scribe skip the completion step for any table it has already seen, which avoids most of the performance cost while keeping the code robust.
There is a caveat to this approach. If the user provides their own incomplete options table, then the first time we see it, we alter it. Generally, changing things under the covers is a bad idea, but in this case, it means the user only pays the completion cost once. All in all, it is a reasonable trade-off.
Here is an example of how the user can provide their own minimal options table that sets the indent to two spaces:
local my_options = { indent = '  ' }
local user =
{
first = "Minnie",
last = "Mouse",
friends = { "Mickey", "Goofy" }
}
print(scribe(user, my_options))
This will output:
{
  first = "Minnie",
  last = "Mouse",
  friends = { "Mickey", "Goofy" }
}
The my_options
table is complete as far as table_string
is concerned. We can inspect it by a call to print(classic(my_options))
which will output:
{1
COMPLETE = true,
array_begin = "[",
array_end = "]",2
comparator = <function>,3
indent = "  ",4
inline_size = inf,
inline_spacer = " ",
key_begin = "",
key_end = " = ",
path_begin = "<",
path_end = ">",
path_root = "table",
path_sep = ".",
sep = ",",
show_indices = false,
table_begin = "{",
table_end = "}" }
- 1
- The COMPLETE field is present and set to true; all the other fields are present and mostly set to the default values from the options.pretty table.
- 2
- The comparator field is shown as <function>.
- 3
- The indent field is set to two spaces as provided by the user.
- 4
- inf means infinity, accessible in Lua as math.huge.
The next time we call scribe
with the my_options
table, it will be complete and we will not have to call complete_options_table
again.
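The complete-once behaviour can be sketched in isolation. The template below is a deliberately tiny, hypothetical stand-in for options.pretty (including a show_indices default of true, chosen only to make a point), and ensure_complete and the completions counter are illustrative names, not part of scribe.

```lua
-- Hypothetical, cut-down template; the real options.pretty has many more fields.
local template = { indent = '    ', sep = ',', show_indices = true, COMPLETE = true }

-- Copy every field the user left out from the template.
-- Comparing against nil (rather than testing truthiness) preserves a user's `false`.
local function complete_options_table(opts, from)
    for k, v in pairs(from) do
        if opts[k] == nil then opts[k] = v end
    end
end

local completions = 0
local function ensure_complete(opts)
    if not opts.COMPLETE then
        completions = completions + 1
        complete_options_table(opts, template)
    end
    return opts
end

local my_options = { indent = '  ', show_indices = false }
ensure_complete(my_options)   -- fills in sep and COMPLETE, keeps the user's fields
ensure_complete(my_options)   -- COMPLETE is now set, so nothing is copied again
print(my_options.sep, my_options.show_indices, completions)
```

Note that the COMPLETE marker is copied along with everything else, which is what makes the second call a no-op.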
Overrides
We also want to allow the user to override one or more options in any of the pre-canned options tables.
The signature of our main scribe
function will now look like this:
1function scribe(obj, opts, overrides)
...
end
- 1
- We add a third argument, overrides, which is an optional table of options to override.
Now, both the second opts
argument and the third overrides
argument are optional. A moment’s thought will convince you that if the opts
argument is missing, the overrides
argument is also.
Here is the full scribe
function:
function scribe(obj, opts, overrides)
1if type(obj) ~= 'table' then return simple_string(obj) end
2if opts == nil then return table_string(obj, options.pretty) end
3if not opts.COMPLETE then
local from = opts.indent == '' and options.inline or options.pretty
complete_options_table(opts, from)
end
4if overrides == nil then return table_string(obj, opts) end
if not overrides.COMPLETE then complete_options_table(overrides, opts) end
5return table_string(obj, overrides)
end
- 1
- As usual, we handle non-table objects up-front.
- 2
- If the user does not provide an opts table, we use the options.pretty table and are done.
- 3
- We complete an incomplete opts table if the user provides it.
- 4
- If the user does not provide an overrides table, we use the opts table and are done.
- 5
- If the user provides an overrides table, we complete it from the opts table and use it. By the time we get here, we can be sure that the opts table is complete.
We also alter the facade functions to permit an overrides
table. For example:
function pretty_string(tbl, overrides)
1return scribe(tbl, options.pretty, overrides)
end
- 1
- The main options table is options.pretty, and we also pass along any user-provided overrides table.
Here is an example of how the user can pick a pre-canned style and override just its indent
field:
local user =
{
first = "Minnie",
last = "Mouse",
friends = { "Mickey", "Goofy" }
}
print(classic(user, { indent = ' ' }))
Output:
{
first = "Minnie",
last = "Mouse",
friends = { "Mickey", "Goofy" } }
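The layering of overrides on top of a base style can be sketched on its own. The classic table here is a hypothetical, cut-down stand-in for the real options.classic, and resolve is an illustrative name for the last few lines of scribe.

```lua
local function complete_options_table(opts, from)
    for k, v in pairs(from) do
        if opts[k] == nil then opts[k] = v end
    end
end

-- Hypothetical, cut-down style table standing in for options.classic.
local classic = { indent = '    ', table_begin = '{', table_end = '}', COMPLETE = true }

-- Mirrors the last step of scribe: the overrides win, the base style fills the rest.
local function resolve(opts, overrides)
    if overrides == nil then return opts end
    if not overrides.COMPLETE then complete_options_table(overrides, opts) end
    return overrides
end

local effective = resolve(classic, { indent = '  ' })
print(effective.indent == '  ', effective.table_begin, classic.indent == '    ')
```

One consequence worth keeping in mind, at least for the logic as shown: the overrides table is itself completed and marked COMPLETE, so if you reuse the same overrides table later with a different base style, it keeps the fields it picked up the first time. Passing a fresh table literal on each call, as in the example above, sidesteps that.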
Metamethods
We mentioned that any Lua table can have an associated metatable.
The metatable is a regular table with arbitrary data and methods like any other table. However, if a table tbl
has a metatable mt
, Lua will check for specially named methods, metamethods, in mt
and use those in place of its built-in default operations.
Metamethods, particularly the __index
metamethod, are the keys to understanding how to use prototype and object-oriented methodologies in Lua. However, that isn’t the topic for today.
The one metamethod that interests us here is the __tostring
function. (All Lua’s metamethods start with double underscores).
Here’s an example where we create a metatable with a __tostring
method inside it:
1local count = 0
2local mt = {}
3function mt.__tostring(tbl)
count = count + 1
4return 'This is print number: ' .. tostring(count) .. ' for an array of size: ' .. #tbl
end
- 1
- count will get incremented every time the __tostring metamethod is called.
- 2
- mt is just a regular empty Lua table.
- 3
- We add a function __tostring to mt.
- 4
- Every time mt.__tostring is called, we increment count and return a string with the latest count.
You will frequently see the equivalent definition:
mt.__tostring = function(tbl)
count = count + 1
return 'This is print number: ' .. tostring(count) .. ' for an array of size: ' .. #tbl
end
The former style is more in keeping with most other programming languages. If you plan on expanding your horizons beyond Lua, stick with that look. However, both styles are perfectly acceptable and produce identical byte code.
For this metamethod to have any effect, we must attach its containing metatable to a Lua table using the setmetatable
method:
local arr = { 1, 2, 3 }
setmetatable(arr, mt)
If you just give arr a __tostring method directly, Lua will not make any redirection calls to it. For Lua to see a metamethod, you must put it in a metatable and attach the metatable to the parent object. The setmetatable call endows arr with a hidden metatable. The existence of that metatable is what triggers Lua to redirect some of its operations to your custom definitions. Just adding metamethods directly to a table does nothing.
Let’s exercise that metamethod:
print(tostring(arr))
print(tostring(arr))
print(tostring(arr))
print(tostring(arr))
This yields:
This is print number: 1 for an array of size: 3
This is print number: 2 for an array of size: 3
This is print number: 3 for an array of size: 3
This is print number: 4 for an array of size: 3
The built-in tostring
method now redirects calls to the mt.__tostring
method. If we remove the metatable:
setmetatable(arr, nil)
Then tostring(arr)
reverts to something like:
table: 0x15f852480
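The difference between a __tostring stored directly in a table and one stored in its metatable is easy to check for yourself. The following is a small self-contained sketch; the table names are arbitrary.

```lua
local direct = { 1, 2, 3 }
-- A __tostring stored as an ordinary field is never consulted by tostring().
direct.__tostring = function() return "direct field" end

local attached = { 1, 2, 3 }
-- The same kind of function stored in a metatable is found and used.
setmetatable(attached, { __tostring = function(t) return "array of size " .. #t end })

local s1 = tostring(direct)     -- something like "table: 0x..."
local s2 = tostring(attached)   -- the custom string

setmetatable(attached, nil)     -- detach the metatable again
local s3 = tostring(attached)   -- back to something like "table: 0x..."

print(s1)
print(s2)
print(s3)
```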
Well, suppose the user is sophisticated enough to have added a custom __tostring
metamethod to return a custom string for a particular table or class of tables. In that case, we should honour their effort by using that method.
We can add a call to the top of table_string
to check for a custom __tostring
metamethod and, if present, use that instead of our paltry efforts.
However, it is best to make that optional, which we do by adding a field to our options table:
local options = {}
options.pretty = {
indent = ' ',
table_begin = '{',
table_end = '}',
array_begin = '[',
array_end = ']',
key_begin = '',
key_end = ' = ',
sep = ',',
inline_spacer = ' ',
show_indices = false,
comparator = compare,
inline_size = math.huge,
path_root = 'table',
path_sep = '.',
path_begin = '<',
path_end = '>',
1use_metatable = true,
COMPLETE = true
}
- 1
- If true and if there is a custom __tostring metamethod, then we redirect the table conversion to that method.
With that change, the top of table_string
looks like this:
local function table_string(root_tbl, opts)
...
local function process(tbl, path)
1if opts.use_metatable then
2local mt = getmetatable(tbl)
3if mt and mt.__tostring then return mt.__tostring(tbl) end
end
...
- 1
- Check whether we are allowed to use metamethods.
- 2
- Check whether tbl has a metatable.
- 3
- If tbl has an associated __tostring metamethod, invoke it and return early.
For example, if:
local count = 0
local mt = {}
function mt.__tostring(tbl)
count = count + 1
return 'This is print number: ' .. tostring(count) .. ' for a table of size: ' .. #tbl
end
local tbl = { 1, 2, 3 }
setmetatable(tbl, mt)
Then print(pretty(tbl))
yields:
This is print number: 1 for a table of size: 3
Why Optional?
Can you guess why we made using any custom __tostring
metamethod controllable as a format option? When wouldn’t we want to use it?
Metamethods like __tostring
are usually attached to a whole class of tables instead of a particular instance. The method might do something specific to the class as a whole and then defer much of the work back to scribe
to convert the instance data to a string.
You then run into the danger of chasing your tail. The custom __tostring
method calls table_string
, which then calls the __tostring
method and so on, ad infinitum!
In this case, we must set the opts.use_metatable
to false
to break the cycle.
Here’s an example:
local count = 0
local mt = {}
function mt.__tostring(tbl)
count = count + 1
1local tbl_options = { use_metatable = false }
local tbl_string = inline(tbl, tbl_options)
return 'Print: ' .. tostring(count) .. ' for table: ' .. tbl_string
end
- 1
- Without this override, the following line would cause a stack overflow.
Then:
local tbl = { 1, 2, 3 }
setmetatable(tbl, mt)
print(pretty(tbl))
print(pretty(tbl))
print(pretty(tbl))
Yields:
Print: 1 for table: [ 1, 2, 3 ]
Print: 2 for table: [ 1, 2, 3 ]
Print: 3 for table: [ 1, 2, 3 ]
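To see the tail-chasing for real without pulling in the whole module, here is a self-contained sketch. The inline function is a hypothetical miniature that handles flat arrays only; it honours use_metatable in the way described for table_string, and the allow_metatable switch exists purely so the demo can show both outcomes.

```lua
-- Hypothetical stand-in for scribe's inline(): formats a flat array and
-- defers to a custom __tostring when use_metatable is set.
local function inline(tbl, opts)
    if opts.use_metatable then
        local mt = getmetatable(tbl)
        if mt and mt.__tostring then return mt.__tostring(tbl) end
    end
    local parts = {}
    for i, v in ipairs(tbl) do parts[i] = tostring(v) end
    return '[ ' .. table.concat(parts, ', ') .. ' ]'
end

local allow_metatable = false   -- flipped below to show the failure mode
local mt = {}
function mt.__tostring(tbl)
    -- The class-level method adds a prefix and hands the instance data back to inline().
    return 'Point ' .. inline(tbl, { use_metatable = allow_metatable })
end
local point = setmetatable({ 1, 2 }, mt)

local safe = tostring(point)           -- use_metatable = false breaks the cycle

allow_metatable = true
local ok = pcall(tostring, point)      -- now __tostring and inline call each other forever

print(safe, ok)
```

With the flag off, the conversion terminates normally; with it on, the mutual recursion only stops when Lua raises an error, which pcall reports as a failure.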
The scribe
Module
In Lua, if you have a file where you set:
answer = 42
You are creating a global variable answer
with the value 42
. This means that answer
is available to all other Lua files that are loaded after this one.
On the other hand, if you write:
local answer = 42
You are creating a local variable answer
that is only available in the current file.
The same thing applies to functions. If you write:
function bump(a)
answer = 42
return a + answer
end
Then bump
is a global function that can be called from any other Lua file. Moreover, even though answer
is set in the bump
function, it is a global variable that can be accessed and modified from anywhere.
On the other hand, if you write:
local function bump(a)
local answer = 42
return a + answer
end
Then bump
is a local function that can only be called from within the current file. answer
is a local variable that can only be accessed and modified within the function bump
.
Prepending local
to variables and functions confines them to the enclosing scope.
This is a good practice because it reduces the chance of inadvertently modifying variables or functions that are used elsewhere. It also makes the intent of the code much clearer.
In general, you should always use local
unless you have a good reason not to.
In Lua, the local keyword declares variables and functions as local to the block in which they are declared. I suspect that, with the benefit of hindsight, Lua’s designers would choose to make local the default and add some other keyword to make variables global. You will have many more local variables than global ones in your code, so that switch would be very beneficial. However, that is not the way Lua is designed, so you must remember to use local to keep your code clean and maintainable.
We have been fairly careful to use local
in our code to this point.
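A quick way to convince yourself of the difference is to look in the global table _G after calling each kind of function. This sketch assumes the code runs as an ordinary chunk with the default global environment; the function and variable names are made up for the demonstration.

```lua
local function leaky()
    leaked = 42          -- no 'local': this creates (or overwrites) a global
end

local function tidy()
    local kept = 42      -- confined to this function
    return kept
end

leaky()
tidy()

print(rawget(_G, 'leaked'))  -- the assignment inside leaky() escaped
print(rawget(_G, 'kept'))    -- nothing escaped from tidy()
```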
Modules
There is a further level of encapsulation that we have not yet discussed: modules.
A module is a collection of functions and variables that are grouped together in a single Lua table. The table is returned by the module and can be used to access the functions and variables within it.
Here is a simple example of a module in a file called answer.lua
:
- 1
- We create a local table M to hold our module. The name M is a common convention and has nothing to do with how the module is stored or used.
- 2
- answer is a local variable that is only accessible within the module (within answer.lua).
- 3
- We define a function bump within the module. It will become publicly accessible.
- 4
- We export the module at the end of the file where it’s defined. The return M statement makes the module available to any other Lua file that requires it.
To use the module in another file, you would write:
- 1
- require is a built-in Lua function that loads a module and returns the table that the module exports.
- 2
- We call the bump function from the answer module to print 52.
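Here is a minimal sketch of a module and a client that fit the notes above. To keep it runnable as a single chunk, the would-be contents of answer.lua are registered through package.preload, a standard table of loader functions that require consults before searching the file system. The exact listing, the value 42, and the argument 10 are assumptions chosen to be consistent with the notes.

```lua
-- Stand-in for the file answer.lua: require() checks package.preload first.
package.preload['answer'] = function()
    local M = {}              -- the module table

    local answer = 42         -- private: invisible outside the module

    function M.bump(a)        -- public: reachable through the returned table
        return a + answer
    end

    return M                  -- export the module
end

-- What a client file would contain.
local answer = require 'answer'
print(answer.bump(10))
```

The private answer variable lives on as an upvalue of M.bump, but nothing outside the loader function can name it.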
Notice that the answer
module is a self-contained unit. It has its own local variables (and potentially local functions) that are private and not accessible from outside the module. The only way to interact with the module is through the functions and variables that it exports. Generally, the only thing that a module exports is a table that contains the functions and variables that you want to make available to the outside world. What you call the module internally is up to you, but the convention is to use M
.
Typically, the user of the module will import the module into a local variable with the same name as the module’s file (without the .lua
extension) though that is not a requirement.
Modules are a powerful way to organize your code and keep it clean and maintainable.
The scribe
Module
Here is a sketch of how we can turn our current code into a module defined in a file called scribe.lua
:
1local M = {}
2local function indent_string(str, indent, ignore_first_line)
...
end
local function compare(a, b) ... end
local function ordered_pairs(comparator) ... end
local function simple_string(obj) ... end
local function empty_table_string(opts) ... end
local function metadata(root_tbl) ... end
local function table_string(root_tbl, opts) ... end
local function table_clone(tbl) ... end
local function complete_options_table(options, from) ... end
3M.options = {}
4M.options.pretty = { ... }
5M.options.inline = table_clone(M.options.pretty)
...
M.options.classic = table_clone(M.options.pretty)
...
M.options.alt = table_clone(M.options.pretty)
...
M.options.json = table_clone(M.options.pretty)
...
M.options.inline_json = table_clone(M.options.json)
...
M.options.debug = table_clone(M.options.pretty)
...
M.options.default = M.options.inline
...
6function M.scribe(obj, opts, overrides)
if type(obj) ~= 'table' then return simple_string(obj) end
if opts == nil then return table_string(obj, M.options.default) end
if not opts.COMPLETE then
local from = opts.indent == '' and M.options.inline or M.options.pretty
complete_options_table(opts, from)
end
if overrides == nil then return table_string(obj, opts) end
if not overrides.COMPLETE then complete_options_table(overrides, opts) end
return table_string(obj, overrides)
end
7function M.pretty(tbl, overrides)
return M.scribe(tbl, M.options.pretty, overrides)
end
function M.inline(tbl, overrides) ... end
function M.classic(tbl, overrides) ... end
function M.alt(tbl, overrides) ... end
function M.json(tbl, overrides) ... end
function M.inline_json(tbl, overrides) ... end
function M.debug(tbl, overrides) ... end
8return M
- 1
- We create a local table M to hold our module. It will contain all of the functions and variables that we want to export.
- 2
- We define all the private helper functions that we need for our module. These functions are declared as local and are not accessible from outside the module.
- 3
- We create a table M.options to hold all of the options that we will use in our module. These will all be accessible from the outside as we want the user to be able to modify them.
- 4
- Where before we had options.pretty = { ... }, we now have M.options.pretty = { ... }.
- 5
- And so on for the other tables of formatting parameters.
- 6
- The main scribe function is now a member of the module. It is shown in full so you can see how it uses both public options data and private helper functions.
- 7
- This is true for all our convenience facade functions, like pretty, inline, classic, etc.
- 8
- We finish by exporting the module by returning the table M.
Here is how you would use the scribe
module in another file:
- 1
- We import the scribe module into a local variable scribe.
- 2
- We call the classic function from the module to print a nicely formatted table.
This yields:
{
a = 1,
b = 2 }
A Little Bonus
Once you’ve loaded the scribe
module, you can access the pretty
function as scribe.pretty
and so on. If you care about using the pretty
function a lot, you can make it available as a local variable in your file:
local scribe = require 'scribe'
local pretty = scribe.pretty
local inline = scribe.inline
It would also be nice to have a shorthand for scribe.scribe
.
We add a __call
metamethod to the scribe
table to do that. Lua calls this metamethod when you treat the table as a function (i.e. when you use scribe(...)
).
Metamethods do not go in the module table itself. Instead, you give the module table a metatable that contains the metamethods. This extra level can seem confusing to judge by the number of questions about it on the internet.
In our case, we add the __call
metamethod to the metatable of the scribe
module as follows:
- 1
- Start with an ordinary empty table mt.
- 2
- Add the __call metamethod to the table. The first argument to the metamethod is the table itself, but we don’t need it so we use _. The ... collects all the arguments passed to the function.
- 3
- We endow our module table M with the metatable mt that contains the metamethods.
You can use _
as a placeholder for any argument you don’t need. Also, note that ...
is the vararg expression: it stands for all the extra arguments passed to a function, so you can forward them unchanged.
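The three steps in the notes can be sketched with a miniature module. The M.scribe body here is a hypothetical placeholder (it just tags its argument), since the point is only to show how __call makes the module table itself callable.

```lua
-- Hypothetical miniature module standing in for scribe.
local M = {}

function M.scribe(obj)
    return 'scribed: ' .. tostring(obj)
end

local mt = {}                  -- an ordinary empty table
function mt.__call(_, ...)     -- the first argument is M itself; we ignore it
    return M.scribe(...)
end
setmetatable(M, mt)            -- attach the metatable to the module table

print(M.scribe(42))            -- the long way
print(M(42))                   -- the shorthand, routed through __call
```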
With that addition, you can now use scribe
as a function:
local scribe = require 'scribe'
print(scribe({a = 1, b = 2}))
This will print the same table as before: {a = 1, b = 2}
.
require
Gotcha
require
is a built-in Lua function that loads a module and returns whatever the module exports.
It looks for the module’s source file using Lua’s package.path
variable. This is a long string of directories that Lua searches for files when you require
them. The different directories in package.path
are separated by semicolons.
Running Lua from the command line and typing:
print(package.path)
I get something like:
1
/usr/local/share/lua/5.4/?.lua;
/usr/local/share/lua/5.4/?/init.lua;
/usr/local/lib/lua/5.4/?.lua;
/usr/local/lib/lua/5.4/?/init.lua;2
./?.lua; ./?/init.lua
- 1
- Actually, the output is on a single line, but I have broken it up for clarity.
- 2
- The . refers to the current directory.
The first four entries are the system directories where Lua looks for modules. Those were set when Lua was installed. The ./?.lua
entry tells Lua to also look for modules in the “current” directory.
By the way, the ?
is a wildcard that Lua replaces with the file name you are searching for.
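The substitution can be imitated in a few lines. The candidates function below is a simplified illustration, not what require does internally: among other things, the real search also turns dots in the module name into directory separators and actually tries to open each file.

```lua
-- Expand each template in a package.path-style string for a given module name.
local function candidates(name, path)
    local files = {}
    for template in path:gmatch('[^;]+') do
        -- The parentheses keep only gsub's first result (the new string).
        files[#files + 1] = (template:gsub('%?', name))
    end
    return files
end

local path = './?.lua;/usr/local/share/lua/5.4/?/init.lua'
local files = candidates('scribe', path)
for _, file in ipairs(files) do
    print(file)
end
```

In Lua 5.2 and later, the built-in package.searchpath performs the real version of this search and returns the first file it can open.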
With this setup, you can drop scribe.lua
in the same directory as your main Lua file and you can require
it. Everything will work fine.
However, these days you are quite likely to run Lua from an IDE or perhaps via a plugin in another application. For example, I sometimes run Lua from ZeroBrane Studio, which is a free lightweight IDE for Lua with a full-featured debugger (it’s cross-platform and highly recommended). Other times I run Lua from Visual Studio Code with the Lua for Visual Studio Code extension.
In both these cases, the current directory is not the directory where your Lua files are! Instead, it is the directory where the IDE or plugin is installed.
When you run Lua from these environments, you will get an error when you try to require
a module in the same directory as your main Lua file. The error will be something like:
module 'scribe' not found:
no field package.preload['scribe']1
no file './scribe.lua'
no file '/usr/local/share/lua/5.4/scribe.lua'
no file '/usr/local/share/lua/5.4/scribe/init.lua' ...
- 1
- This no file line will make you scratch your head!
It appears that Lua is looking for ./scribe.lua
and not finding it even though it is clearly in the same directory as your main Lua file. You’ll probably double and triple check the file is there and that you have spelled the name correctly. Nothing will help.
The confusion arises because you think .
is the directory where your main Lua file is but the IDE or plugin sees it as the directory where the IDE or plugin is installed.
The solution is to add the script’s directory to package.path
. You could hardcode that directory name and append it to package.path
but that’s clunky. If you change the directory structure of your project, you will have to remember to change the hardcoded path.
Instead, you can use Lua’s debug
library to get the directory of the current source file. Here is how you can do that:
1local source_dir = debug.getinfo(1, 'S').source:match [[^@?(.*[\/])[^\/]-$]]
2package.path = source_dir .. "?.lua;" .. package.path
- 1
- This magic incantation gets the directory of the current source file.
- 2
- This line prepends the directory to package.path.
You can put these lines at the top of your main Lua file and they will ensure that require
works correctly.
This isn’t terribly elegant, but it is a portable way to ensure that your modules are found in the “current” directory when you run Lua from an IDE or plugin.
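It is worth seeing what the pattern extracts from a few sample chunk names. The helper name source_dir_of and the './' fallback are additions for this sketch: when the chunk name contains no directory separator at all (for example, when a script is launched by bare file name from its own directory), the match returns nil, and concatenating nil would raise an error.

```lua
-- Extract the directory part of a chunk name such as "@/home/me/proj/main.lua".
-- Falls back to "./" when the name has no directory separator at all.
local function source_dir_of(source)
    return source:match [[^@?(.*[\/])[^\/]-$]] or './'
end

print(source_dir_of('@/home/me/proj/main.lua'))   -- Unix-style path
print(source_dir_of([[@C:\proj\main.lua]]))       -- Windows-style path
print(source_dir_of('@main.lua'))                 -- no directory part

-- Typical use at the top of a main file:
local source_dir = source_dir_of(debug.getinfo(1, 'S').source)
package.path = source_dir .. '?.lua;' .. package.path
```

Because the new entry goes at the front of package.path, modules sitting next to the main script take precedence over same-named modules in the system directories.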
LuaRocks
scribe
, like many Lua modules, is available via LuaRocks.
LuaRocks is the package manager for Lua modules and, when you install LuaRocks, it makes sure that any modules you install using it are available to Lua via require
. It adds some LuaRocks standard directories to package.path
so that Lua can find the modules.
If you install scribe
using LuaRocks, you won’t have to worry about the require
gotcha. LuaRocks will take care of everything for you.
Summary
At this point, we have developed a production-ready version of scribe
. It produces readable outputs for complex tables with cyclical references. scribe
also supports options for customizing the output in many ways.
Our module also comes with pre-packaged styles for common output formats and simple, user-friendly functions for printing tables in those formats. For the most part, the user can just call pretty
or json
, etc. and get a good result without having to worry about the details.
Formatted Output
Stringing together messages using concatenation quickly becomes cumbersome.
Lua provides a simple way to format strings using the string.format
method, similar to the sprintf
function in C.
print(string.format("The value of %s is %.2f", 'pi', math.pi))
This prints The value of pi is 3.14
to your screen.
The format string "The value of %s is %.2f"
is a template containing placeholders for the values you want to insert. It is a recipe for baking a string by replacing the placeholders with the trailing arguments to string.format
.
The general form for calling string.format
is:
string.format(format_string, arg1, arg2, ...)
The first argument is the format string; the rest are the values that string.format
will insert into the placeholders. It is a variadic function, which means it can take any number of arguments after the format string.
Placeholders like %s
and %f
are format specifiers that tell string.format
to look for a trailing argument that is a string and another that is a floating point number. The .2
in %.2f
is a format modifier, and it tells string.format
to round the floating point number to two decimal places. The placeholders are replaced by the trailing arguments in the order they appear in the format string.
string.format
is modelled on the venerable sprintf
function in C, and it supports almost all the same format specifiers and modifiers. We already mentioned that it adds a couple of extra format specifiers, like %q
, which are not available in C. It drops a few of the more esoteric format specifiers rarely used in practice.
At some point, everyone recreates the same wrapper around string.format
that looks like this:
1function printf(format_string, ...)
print(string.format(format_string, ...))
end
- 1
- The name used here is printf to mimic the C function of the same name.
You can use this function to print formatted strings like this:
printf("The value of %s is %.2f", 'pi', math.pi)
Creating formatted output using string.format
is a big step up from concatenation, but it suffers from the problem of having no concept of a Lua table. The underlying C function is unaware of Lua’s data structures, so it sees tables as a blob of memory and prints their address.
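A few concrete calls make both the strengths and the limitation visible. This sketch uses a variant of the wrapper built on io.write instead of print, so it adds no newline of its own; that choice is ours, made to stay closer to C's printf. The table example assumes Lua 5.2 or later, where %s converts its argument with tostring.

```lua
-- A printf variant built on io.write: the caller controls line endings.
local function printf(format_string, ...)
    io.write(string.format(format_string, ...))
end

printf("The value of %s is %.2f\n", 'pi', math.pi)
printf("%5d|%-5s|\n", 42, 'ab')     -- width 5, right- and left-justified

-- Tables are where string.format runs out of road.
local person = { name = 'Alice', age = 42 }
local shown = string.format("%s", person)
printf("%s\n", shown)               -- something like "table: 0x..."
```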
Adding Tables to the Mix
Scribe provides a scribe.format
function that is a drop-in replacement for string.format
with the added ability to format Lua tables.
local person = {name = 'Alice', age = 42}
print(scribe.format("Data: %t", person))
This prints Data: { age = 42, name = "Alice" }
to your screen.
We do this by adding a new format specifier, %t
, that tells scribe.format
to format the trailing argument as a table. We have added several new format specifiers that allow you to format Lua tables in various ways.
It happens that %t
, %T
, %j
, and %J
were not already claimed as specifiers by string.format
. Moreover, those specifiers are mnemonic and easy to remember:
- %t formats a table as an inline string.
- %T formats a table as a multiline string.
- %j formats a table as a compact inline JSON string.
- %J formats a table as a pretty-printed multiline JSON string.
So, uppercase %T
and %J
are for multiline output, while lowercase %t
and %j
are for inline output.
The signature for scribe.format
is the same as string.format
:
function M.format(template, ...)
...
end
The first argument is the format string; the rest are the values we will insert into the placeholders.
We know that all placeholders have the form %<modifier><specifier>
, where <specifier>
is the only required part. Our new format specifiers %t
, %T
, %j
, and %J
are no different.
Our custom format
method looks for those new specifiers in the format string. If none exist, it calls string.format
with the same arguments and returns the result.
If it finds any new specifier, it formats the trailing table argument as a string according to the specifier. It can then replace the custom placeholder like %t
in the format string with a %s
. It also replaces the table argument with its formatted string description. At this point, it calls string.format
with the modified format string and the rest of the arguments.
The tricky part is using Lua’s pattern matching to find the custom specifiers in the format string.
function M.format(template, ...)
1if template == nil then return "" end
2local percent_rx = '%%+'
local modifier_rx = '[%-%+ #0]?%d*%.?[%d%*]*[hljztL]?[hl]?'
3local specifier_rx = '[diuoxXfFeEgGaAcspqtTjJ]'
4local placeholder_rx = string.format('%s(%s)(%s)', percent_rx, modifier_rx, specifier_rx)
5local table_rx = percent_rx .. '%d*[tTjJ]'
6if not template:find(table_rx) then return string.format(template, ...) end
7local table_placeholders = {}
local n_placeholders = 0
8for mod, spec in template:gmatch(placeholder_rx) do
n_placeholders = n_placeholders + 1
if spec == 't' or spec == 'T' or spec == 'j' or spec == 'J' then
insert(table_placeholders, { n_placeholders, mod, spec })
end
end
9local args = { ... }
if #args ~= n_placeholders then
return string.format("[FORMAT ERROR]: %q -- needs %d args but you sent %d!\n", template, n_placeholders, #args)
end
10for i = 1, #table_placeholders do
local index, mod, spec = unpack(table_placeholders[i])
local full_spec = mod .. spec
if full_spec == 't' then
args[index] = M.inline(args[index])
elseif full_spec == 'T' then
args[index] = M.pretty(args[index])
elseif full_spec == 'J' then
args[index] = M.json(args[index])
elseif full_spec == 'j' then
args[index] = M.inline_json(args[index])
else
return string.format("[FORMAT ERROR]: %q -- unknown table specifier: %q\n", template, full_spec)
end
end
11template = template:gsub(table_rx, '%%s')
12return string.format(template, unpack(args))
end
- 1
- An edge case: if the format string is nil, we return an empty string.
- 2
- The pattern for matching one or more percent signs.
- 3
- The pattern for matching a format specifier.
- 4
- The pattern for matching a placeholder.
- 5
- The pattern for matching our table specifiers.
- 6
- If the format string contains no table specifiers, we can call string.format and return the result.
- 7
- We create space to store the positions of the table placeholders.
- 8
- We iterate over the placeholders in the format string and store the position of any table specifiers.
- 9
- We store the trailing arguments in a local variable.
- 10
- We iterate over the table placeholders and format the table arguments according to the specifier.
- 11
- We replace the table specifiers with %s in the format string.
- 12
- We call string.format with the modified format string and the rest of the arguments.
A lot is going on here, but the key points are:
- We use Lua’s pattern matching to find the placeholders in the format string.
- We store the positions of any table specifiers.
- We format the table arguments according to the specifier.
- We replace the table specifiers with %s in the format string.
- We call string.format with the modified format string and the rest of the arguments.
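The core idea (rewrite the custom placeholder to %s and swap the table argument for its string form) can be shown in a much smaller space. The sketch below is a single-pass variation using gsub with a function, not the implementation above: it supports only %t, does no argument-count checking, and its inline helper is a hypothetical stand-in that handles flat tables with string keys.

```lua
local unpack = table.unpack or unpack   -- Lua 5.2+ or 5.1/LuaJIT

-- Hypothetical stand-in for scribe.inline: flat tables with string keys only.
local function inline(tbl)
    local keys = {}
    for k in pairs(tbl) do keys[#keys + 1] = k end
    table.sort(keys)
    local parts = {}
    for i, k in ipairs(keys) do
        local v = tbl[k]
        v = type(v) == 'string' and string.format('%q', v) or tostring(v)
        parts[i] = k .. ' = ' .. v
    end
    return '{ ' .. table.concat(parts, ', ') .. ' }'
end

-- Single-pass variant: rewrite each %t to %s and stringify its argument.
local function format(template, ...)
    local args, n = { ... }, 0
    local pattern = '%%([%-%+ #0]*%d*%.?%d*)([%a%%])'
    local rewritten = template:gsub(pattern, function(mod, spec)
        if spec == '%' then return nil end   -- '%%' is a literal percent: no argument
        n = n + 1                            -- this placeholder consumes argument n
        if spec == 't' then
            args[n] = inline(args[n])
            return '%' .. mod .. 's'
        end
        return nil                           -- leave every other placeholder alone
    end)
    return string.format(rewritten, unpack(args))
end

print(format("Data: %t", { name = 'Alice', age = 42 }))
print(format("%d%% of %t", 50, { a = 1 }))
```

Returning nil from the replacement function tells gsub to keep the original match, which is how ordinary placeholders and the literal %% pass through untouched.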
More Facades
We have added a few more facades to the scribe
module to make it easier to work with formatted output. For example:
function M.put(template, ...)
1io.stdout:write(M.format(template, ...))
end
- 1
- The put function is a simple wrapper around scribe.format that writes the formatted string to the standard output.
A matching putln
function appends a newline character to the same output.
function M.putln(template, ...)
io.stdout:write(M.format(template, ...), '\n')
end
Corresponding eput
, eputln
, fput
, and fputln
functions write to the standard error stream and to files.
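As a rough idea of what such wrappers might look like, here is a self-contained sketch. The signatures are assumptions (in particular, that the file variants take the file handle as their first argument), and string.format stands in for scribe.format so the example runs on its own.

```lua
-- Stand-in for M.format so the sketch runs on its own.
local format = string.format

-- Hypothetical signatures: the file variants accept the file handle first.
local function fput(file, template, ...)
    file:write(format(template, ...))
end

local function fputln(file, template, ...)
    file:write(format(template, ...), '\n')
end

-- The stderr variants simply fix the stream.
local function eput(template, ...)   fput(io.stderr, template, ...)   end
local function eputln(template, ...) fputln(io.stderr, template, ...) end

-- Usage: write to a temporary file and read the result back.
local file = assert(io.tmpfile())
fput(file, "x = %d, ", 1)
fputln(file, "y = %d", 2)
file:seek('set')
local contents = file:read('*a')
file:close()

io.write(contents)
eputln("warning: %s", "this line goes to standard error")
```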