Lua LPEG — Extensions

Introduction

The `lulu.xpeg` module extends the standard `lpeg` module with predefined patterns and useful functions.

Patterns

If you have imported the module as

local lpeg = require 'lulu.xpeg'

then you can access many “standard” LPEG patterns in the `lpeg.patterns` table.

For convenience, define:

local p = lpeg.patterns

Then:

Pattern	Description
`p.any`	Matches any character.
`p.eos`	Matches the end of the subject string.
`p.esc`	Matches the standard escape character — the backslash.
`p.alpha`	Matches alphabetic characters.
`p.digit`	Matches decimal digits.
`p.alphanumeric`	Matches alphanumeric characters.
`p.lower`	Matches lowercase letters.
`p.upper`	Matches uppercase letters.
`p.graph`	Matches printable characters.
`p.punctuation`	Matches non-alphanumeric printable characters.
`p.hs`	Matches horizontal space.
`p.ws`	Matches whitespace including newlines.
`p.non_hs`	Matches non-horizontal space characters.
`p.non_ws`	Matches non-whitespace characters.
`p.nl`	Matches newlines (DOS and Unix).
`p.non_nl`	Matches non-newline characters (DOS and Unix).
`p.eol`	Matches end-of-line or end-of-subject.
`p.non_eol`	Matches non-end-of-line characters.
`p.sign`	Matches plus or minus signs.
`p.bin_digit`	Matches binary digits without digit separators.
`p.oct_digit`	Matches octal digits without digit separators.
`p.dec_digit`	Matches decimal digits without digit separators.
`p.hex_digit`	Matches hexadecimal digits without digit separators.
`p.dec`	Matches any decimal number without digit separators.
`p.hex`	Matches any hexadecimal number without digit separators.
`p.oct`	Matches any octal number without digit separators.
`p.bin`	Matches any binary number without digit separators.
`p.int`	Matches any integer without digit separators.
`p.float`	Matches any floating point number without digit separators.
`p.num`	Matches any number without digit separators.
`p.ws_trim`	Capture pattern that ignores leading and trailing whitespace.
`p.ws_collapse`	Substitution capture that trims the subject and collapses each run of interior whitespace to a single space.
`p.ws_delete`	Substitution capture that removes all whitespace.
`p.blocks`	Table capture of all the text blocks (paragraphs) in a subject string. A block is text followed by one or more empty lines or by the end of the string.
`p.single_quoted`	Matches single-quoted strings.
`p.double_quoted`	Matches double-quoted strings.
`p.quoted`	Matches either single- or double-quoted strings.
`p.single_quoted_content`	Captures the content inside single quotes.
`p.double_quoted_content`	Captures the content inside double quotes.
`p.quoted_content`	Captures the content inside single or double quotes.
`p.careful_collapse_ws`	Substitution capture that trims the subject and collapses each run of interior whitespace to a single space, except inside quotes.
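The whitespace captures above can be approximated in plain LPeg. The sketch below is not the `lulu.xpeg` source. It assumes that "whitespace" means space, tab, carriage return and newline. The local names `ws_trim`, `ws_collapse` and `ws_delete` are illustrative stand-ins for the `p.*` entries.

```lua
local lpeg = require 'lpeg'
local P, S, C, Cs = lpeg.P, lpeg.S, lpeg.C, lpeg.Cs

-- Assumed whitespace set; the module's own definition may differ.
local ws  = S' \t\r\n'
local nws = P(1) - ws                 -- any single non-whitespace character

-- Trim: skip leading whitespace, capture up to the last non-whitespace
-- run, then skip the trailing whitespace.
local ws_trim = ws^0 * C((ws^0 * nws^1)^0) * ws^0 * -1

-- Collapse: drop leading whitespace, turn each interior run into one
-- space, and drop the trailing run (a run not followed by a non-space).
local ws_collapse = Cs(ws^0 / '' * (nws^1 + ws^1 * #nws / ' ' + ws^1 / '')^0)

-- Delete: remove every whitespace character.
local ws_delete = Cs((ws / '' + 1)^0)

print(ws_trim:match('  hello  world  '))    --> hello  world
print(ws_collapse:match('  a   b \t c  '))  --> a b c
print(ws_delete:match(' a b\tc\n '))        --> abc
```

Inside `Cs`, a `/ ''` or `/ ' '` capture replaces the text it matched. Any text matched without a capture is copied through unchanged, which is what makes `Cs` useful for this kind of rewriting.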

Methods

If you have imported the module as

local lpeg = require 'lulu.xpeg'

then you also have access to the following methods:

Method	Description
`lpeg.dec_number(sep)`	Match unsigned decimal integers with optional digit group separators.
`lpeg.hex_number(sep)`	Match unsigned hexadecimal integers with optional digit group separators.
`lpeg.oct_number(sep)`	Match unsigned octal integers with optional digit group separators.
`lpeg.bin_number(sep)`	Match unsigned binary integers with optional digit group separators.
`lpeg.int_number(sep)`	Match any form of signed integer with optional digit group separators.
`lpeg.float_number(sep)`	Match any form of signed float with optional digit group separators.
`lpeg.number(sep)`	Match any form of signed number with optional digit group separators.
`lpeg.is_pattern(ptn)`	Checks whether `ptn` is an LPEG pattern.
`lpeg.anywhere(ptn)`	Creates a pattern that matches `ptn` anywhere in a subject string, not only at its start.
`lpeg.change(ptn,to)`	Creates a substitution pattern that replaces every match of `ptn` in a string with `to`.
`lpeg.is_escaped(esc)`	Creates a pattern that checks whether a string starts with an escape character. The default escape character is the backslash.
`lpeg.unescape(chars, esc)`	Creates a pattern that turns escaped characters into unescaped ones. The default `chars` is `"`, the default `esc` is the backslash.
`lpeg.to_eol(from)`	Creates a pattern that matches from the string/pattern `from` up to the end of the line.
`lpeg.tokenizer(sep)`	Creates a table capture pattern to split a string into tokens based on a separator/pattern.
`lpeg.before(sep)`	Creates a capture pattern for all content before the first occurrence of a separator/pattern.
`lpeg.after(sep)`	Creates a capture pattern for all content after the first occurrence of a separator/pattern.
`lpeg.delimited(l,r,line)`	Creates a pattern to match delimited content.
`lpeg.delimited_content(l,r,line)`	Creates a pattern to capture delimited content.
`lpeg.after_set(set,ptn,skip)`	Creates a new pattern where `ptn` is only matched if it is preceded by a character in the `set` string. The `skip` string lists the characters that may be skipped between the `set` character and the `ptn` match (default: whitespace).
`lpeg.after_newline(ptn, allow_indent)`	Creates a new pattern where `ptn` is only matched if it is preceded by a newline character.
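Several of these helpers correspond to idioms from the LPeg documentation. The sketch below rebuilds a few of them on top of the plain `lpeg` module, not `lulu.xpeg`. The function bodies are illustrative assumptions. The `unescape` defaults simply follow the table above, and the module's actual implementation may differ.

```lua
local lpeg = require 'lpeg'
local P, S, C, Cs, Ct, V = lpeg.P, lpeg.S, lpeg.C, lpeg.Cs, lpeg.Ct, lpeg.V

-- is_pattern: lpeg.type returns "pattern" for patterns and nil otherwise.
local function is_pattern(x)
  return lpeg.type(x) == 'pattern'
end

-- anywhere: try ptn at the current position; if it fails, skip one
-- character and try again (a one-rule grammar).
local function anywhere(ptn)
  return P{ P(ptn) + 1 * V(1) }
end

-- change: copy the subject, replacing each match of ptn with `to`.
local function change(ptn, to)
  return Cs((P(ptn) / to + 1)^0)
end

-- unescape: drop the escape character in front of any character in `chars`.
local function unescape(chars, esc)
  chars, esc = chars or '"', esc or '\\'
  return Cs((P(esc) / '' * S(chars) + 1)^0)
end

-- tokenizer: split on sep and collect the pieces in a table.
local function tokenizer(sep)
  sep = P(sep)
  local token = C((1 - sep)^0)
  return Ct(token * (sep * token)^0)
end

print(is_pattern(P'x'))                                   --> true
print(is_pattern('x'))                                    --> false
print(anywhere(C(lpeg.R'09'^1)):match('abc 123 def'))     --> 123
print(change('-', '_'):match('a-b-c'))                    --> a_b_c
print(unescape():match('say \\"hi\\"'))                   --> say "hi"
print(table.concat(tokenizer(','):match('a,b,,c'), '|'))  --> a|b||c
```

`lpeg.match` normally anchors at the start of the subject. The self-referencing grammar in `anywhere` lifts that restriction by advancing one character at a time until `ptn` succeeds.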

Notes

For the `lpeg.*_number(sep)` methods, `sep` is the digit group separator; a typical choice is a comma, as in `1,000,000`.
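A separator-aware number matcher can be sketched in plain LPeg. This version rests on an assumption: it accepts a separator only between runs of digits. It rejects leading, trailing or doubled separators and does not enforce groups of three, a rule the real `lpeg.dec_number(sep)` may or may not apply. The names `dec_number` and `int_number` here are local, hypothetical stand-ins.

```lua
local lpeg = require 'lpeg'
local P, R, S, C = lpeg.P, lpeg.R, lpeg.S, lpeg.C

-- Hypothetical sketch: digits, with an optional separator allowed only
-- between digit runs. Group sizes are not checked.
local function dec_number(sep)
  local digits = R'09'^1
  if sep == nil then return digits end
  return digits * (P(sep) * digits)^0
end

-- A signed variant in the spirit of lpeg.int_number(sep).
local function int_number(sep)
  return S'+-'^-1 * dec_number(sep)
end

-- Anchor with -1 so the whole subject must be a number.
local whole = C(int_number(',')) * -1
print(whole:match('-1,234,567'))  --> -1,234,567
print(whole:match('1,,234'))      --> nil
```

The `-1` anchor matters because LPeg patterns match a prefix of the subject. Without it, `int_number(',')` would still succeed on `1,,234` by matching just the leading `1`.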

In the `lpeg.delimited*(l, r, line)` methods, `l` is the left delimiter and `r` is the right delimiter.
If the final boolean argument `line` is true (default: false), the match stops at the first newline.

If neither delimiter is given, we default to double quotes: `"…"`.
If only one delimiter is given and it is a single character (e.g. `'`), we use it for both sides: `'…'`.
If only one delimiter is given and it has an even number of characters (e.g. `{}`), we split it in half: `{` and `}`.
You can, of course, specify the left and right delimiters separately.
If `l == r`, escaped delimiters do not end the match, so `"inner \"quote\" here"` matches the whole string.
If `l` and `r` differ, we balance nested pairs, so `{}` applied to `{123{45}67}` matches on `123{45}67`.
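The delimiter rules in these notes can be sketched in plain LPeg. The helpers `resolve` and `delimited_content` are hypothetical stand-ins, not the module's code. The sketch assumes a backslash escape character, models the `line` argument by excluding newlines from the content, and rejects odd-length delimiter strings, a case the notes do not cover.

```lua
local lpeg = require 'lpeg'
local P, C, V = lpeg.P, lpeg.C, lpeg.V

-- Resolve the delimiters following the rules in the notes above.
local function resolve(l, r)
  if l == nil then return '"', '"' end      -- default: double quotes
  if r ~= nil then return l, r end          -- both given explicitly
  if #l == 1 then return l, l end           -- "'" is used on both sides
  assert(#l % 2 == 0, 'a single delimiter string must have even length')
  local half = math.floor(#l / 2)           -- "{}" splits into "{" and "}"
  return l:sub(1, half), l:sub(half + 1)
end

-- Hypothetical sketch of lpeg.delimited_content(l, r, line), assuming a
-- backslash escape character.
local function delimited_content(l, r, line)
  l, r = resolve(l, r)
  local L, R, esc = P(l), P(r), P'\\'
  local stop = line and P'\n' or P(false)   -- P(false) never matches
  if l == r then
    -- Same delimiter: an escaped delimiter does not end the content.
    local body = (esc * (1 - stop) + (1 - (R + stop)))^0
    return L * C(body) * R
  end
  -- Different delimiters: nested pairs must balance.
  local plain  = 1 - (L + R + stop)
  local nested = P{ L * (plain + V(1))^0 * R }
  return L * C((plain + nested)^0) * R
end

print(delimited_content('{}'):match('{123{45}67}'))  --> 123{45}67
print(delimited_content('{}'):match('{12{3}'))       --> nil
```

Balancing needs the recursive rule `nested`, which a plain repetition cannot express. The same-delimiter case needs no recursion, only a rule that lets an escaped delimiter through.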
