Lua LPEG — Extensions

Introduction

The lulu.xpeg module extends the standardlpeg module with predefined patterns and valuable functions.

Patterns

If you have imported the module as

local lpeg = require 'lulu.xpeg'

You can access many “standard” LPEG patterns in the lpeg.patterns table.

For convenience, define:

local p = lpeg.patterns

Then:

Pattern Description
p.any Matches any character.
p.eos Matches the end of the subject or string.
p.esc Matches the standard escape character — the backslash.
p.alpha Matches alphabetic characters.
p.digit Matches decimal digits.
p.alphanumeric Matches alphanumeric characters.
p.lower Matches lowercase letters.
p.upper Matches uppercase letters.
p.graph Matches printable characters.
p.punctuation Matches non-alphanumeric printable chars.
p.hs Matches horizontal space.
p.ws Matches whitespace including newlines.
p.non_hs Matches non-horizontal space characters.
p.non_ws Matches non-whitespace characters.
p.nl Matches newlines (DOS and Unix).
p.non_nl Matches non-newline characters (DOS & Unix).
p.eol Matches end-of-line or end-of-subject.
p.non_eol Matches non-end-of-line characters.
p.sign Matches plus or minus signs.
p.bin_digit Matches binary digits without digit separators.
p.oct_digit Matches octal digits without digit separators.
p.dec_digit Matches decimal digits without digit separators.
p.hex_digit Matches hexadecimal digits without digit separators.
p.dec Matches any decimal number without digit separators.
p.hex Matches any hexadecimal number without digit separators.
p.oct Matches an octal number without digit separators.
p.bin Matches any binary number without digit separators.
p.int Matches any integer without digit separators.
p.float Matches any floating point number without digit separators.
p.num Matches any number without digit separators.
p.ws_trim Capture pattern that ignores leading and trailing whitespace.
p.ws_collapse Substitution capture that trims and collapses all contiguous interior white-spaces to a single space.
p.ws_delete Substitution capture that removes all whitespace.
p.blocks Table capture all the text blocks/paragraphs in a subject string.
A block/paragraph is text followed by one or more empty lines or the end of the string.
p.single_quoted Matches single quoted strings.
p.double_quoted Matches double-quoted strings.
p.quoted Matches either single or double-quoted strings.
p.single_quoted_content Capture content inside single quotes.
p.double_quoted_content Capture content inside double quotes.
p.quoted_content Capture content inside single or double quotes.
p.careful_collapse_ws Substitution capture that trims and collapses contiguous interior white spaces to one space but not inside quotes.

Methods

If you have imported the module as

local lpeg = require 'lulu.lpeg'

You also have access to the following methods:

Pattern Description
lpeg.dec_number(sep) Match unsigned decimal integers with optional digit group separators.
lpeg.hex_number(sep) Match unsigned hexadecimal integers with optional digit group separators.
lpeg.oct_number(sep) Match unsigned octal integers with optional digit group separators.
lpeg.bin_number(sep) Match unsigned binary integers with optional digit group separators.
lpeg.int_number(sep) Match any form of signed integer with optional digit group separators.
lpeg.float_number(sep) Match any form of signed float with optional digit group separators.
lpeg.number(sep) Match any form of signed number with optional digit group separators.
lpeg.is_pattern(ptn) Query to see if an argument is an lpeg pattern.
lpeg.anywhere(ptn) Creates a pattern that allows ptn to work anywhere in a subject string.
lpeg.change(ptn,to) Creates a substitution pattern that acts on strings so matches to ptn are changed to to.
lpeg.is_escaped(esc) Creates a pattern that checks whether a string starts with an escape character.
The default escape character is the backslash.
lpeg.unescape(chars, esc) Creates a pattern that turns escaped characters into unescaped ones.
The default chars is ", the default esc is the backslash.
lpeg.to_eol(from) Creates a pattern that matches starting with string/pattern from until the end of the line.
lpeg.tokenizer(sep) Creates a table capture pattern to split a string into tokens based on a separator/pattern.
lpeg.before(sep) Creates a capture pattern for all content before the first occurrence of a separator/pattern.
lpeg.after(sep) Creates a capture pattern for all content after the first occurrence of a separator/pattern.
lpeg.delimited(l,r,line) Creates a pattern to match delimited content.
lpeg.delimited_content(l,r,line) Creates a pattern to capture delimited content.
lpeg.after_set(set,ptn,skip) Creates a new pattern where ptn is only matched if it is preceded by a character in the set string. The skip string is used to skip over characters between the set and the ptn match (default whitespace).
lpeg.after_newline(ptn, allow_indent) Creates a new pattern where ptn is only matched if it is preceded by a newline character.

Notes

An example of the sep character for the various lpeg.*_number(sep) might be a comma.

In the lpeg.delimited*(l, r, line) methods, l is the left delimiter, r is the right delimiter.
If the final boolean line is true (default is false) then the match stops at the first newline.

  • If neither delimiter is given then we default to using double quotes: “…”.
  • If only one delimiter is given and it’s a single character (e.g. “‘“) then we use that for both:’…’.
  • If only one delimiter is given with an even number of characters (e.g. “{}”) then we split it in half ‘{’ and ‘}’.
  • You can of course specify the left and right delimiters separately.
  • If l == r then escaped delimiters aren’t matched: “inner "quote" here” matches the whole string.
  • If l != r then we ‘balance’ so “{}” applied to “{123{45}67}” matches on ‘123{45}67’.

See Also

The patterns and methods in this module can be used on a stand-alone basis. In many cases, they act as the engine driving methods in the lulu.string module, which extends Lua’s string class.

lulu.string

Back to top