Lexer
A Lexer is a stateful stream generator in that every time it is advanced, it returns the next token in the Source. Assuming the source lexes, the final Token emitted by the lexer will be of kind EOF, after which the lexer will repeatedly return the same EOF token whenever called.
The algorithm is O(N) in both memory and time, where N is the length of the source.
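The advance-until-EOF contract described above can be illustrated with a toy lexer. This is only a sketch: `SketchLexer` and `SketchToken` are hypothetical names, the token kinds are simplified, and the code is not the library's implementation. It assumes PHP 8.0 or later for constructor property promotion.

```php
<?php
declare(strict_types=1);

// Hypothetical minimal token: just a kind and an optional value.
final class SketchToken
{
    public function __construct(
        public string $kind,
        public string $value = ''
    ) {}
}

// Hypothetical lexer that only demonstrates the "stateful stream" contract.
final class SketchLexer
{
    private int $position = 0;
    private ?SketchToken $eof = null;

    public function __construct(private string $body) {}

    public function advance(): SketchToken
    {
        // Once EOF has been produced, hand back the very same token object.
        if ($this->eof !== null) {
            return $this->eof;
        }

        // Skip spaces, tabs, newlines and commas (treated as ignorable here).
        $this->position += strspn($this->body, " \t\r\n,", $this->position);

        if ($this->position >= strlen($this->body)) {
            return $this->eof = new SketchToken('EOF');
        }

        // Names: [_A-Za-z][_0-9A-Za-z]*, anchored at the cursor with \G.
        if (preg_match('/\G[_A-Za-z][_0-9A-Za-z]*/', $this->body, $m, 0, $this->position) === 1) {
            $this->position += strlen($m[0]);
            return new SketchToken('Name', $m[0]);
        }

        // Anything else is emitted as a one-byte punctuator in this sketch.
        return new SketchToken('Punctuator', $this->body[$this->position++]);
    }
}

$lexer = new SketchLexer('{ hero }');
$kinds = [];
do {
    $token = $lexer->advance();
    $kinds[] = $token->kind;
} while ($token->kind !== 'EOF');

echo implode(' ', $kinds), PHP_EOL;        // Punctuator Name Punctuator EOF
var_dump($lexer->advance() === $token);    // bool(true): same EOF token again
```

Each byte of the input is visited a bounded number of times, which is where the linear time bound comes from in this sketch.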
Table of Contents
- TOKEN_AMP = 38
- TOKEN_AT = 64
- TOKEN_BANG = 33
- TOKEN_BRACE_L = 123
- TOKEN_BRACE_R = 125
- TOKEN_BRACKET_L = 91
- TOKEN_BRACKET_R = 93
- TOKEN_COLON = 58
- TOKEN_DOLLAR = 36
- TOKEN_DOT = 46
- TOKEN_EQUALS = 61
- TOKEN_HASH = 35
- TOKEN_PAREN_L = 40
- TOKEN_PAREN_R = 41
- TOKEN_PIPE = 124
- $lastToken : Token
- The previously focused non-ignored token.
- $line : int
- The (1-indexed) line containing the current token.
- $lineStart : int
- The character offset at which the current line begins.
- $options : array<string|int, bool>
- $source : Source
- $token : Token
- The currently focused non-ignored token.
- $byteStreamPosition : int
- Current cursor position for ASCII representation of the source
- $position : int
- Current cursor position for UTF8 encoding of the source
- __construct() : mixed
- advance() : Token
- lookahead() : mixed
- assertValidBlockStringCharacterCode() : mixed
- assertValidStringCharacterCode() : mixed
- moveStringCursor() : self
- Moves internal string cursor position
- positionAfterWhitespace() : mixed
- Reads from the body, starting at the current position, until it finds a non-whitespace or commented character, then places the cursor at the position of that character.
- readBlockString() : mixed
- Reads a block string token from the source file.
- readChar() : array<string|int, string|int>
- Reads the next UTF-8 character from the byte stream, starting from $byteStreamPosition.
- readChars() : array<string|int, string|int>
- Reads the next $charCount UTF-8 characters from the byte stream, starting from $byteStreamPosition.
- readComment() : Token
- Reads a comment token from the source file.
- readDigits() : mixed
- Returns a string containing the consecutive digits at the cursor and moves the string cursor to the first character after them.
- readName() : Token
- Reads an alphanumeric + underscore name from the source.
- readNumber() : Token
- Reads a number token from the source file, either a float or an int depending on whether a fractional part or an exponent appears.
- readString() : Token
- readToken() : Token
- unexpectedCharacterMessage() : mixed
Constants
TOKEN_AMP
private
mixed
TOKEN_AMP
= 38
TOKEN_AT
private
mixed
TOKEN_AT
= 64
TOKEN_BANG
private
mixed
TOKEN_BANG
= 33
TOKEN_BRACE_L
private
mixed
TOKEN_BRACE_L
= 123
TOKEN_BRACE_R
private
mixed
TOKEN_BRACE_R
= 125
TOKEN_BRACKET_L
private
mixed
TOKEN_BRACKET_L
= 91
TOKEN_BRACKET_R
private
mixed
TOKEN_BRACKET_R
= 93
TOKEN_COLON
private
mixed
TOKEN_COLON
= 58
TOKEN_DOLLAR
private
mixed
TOKEN_DOLLAR
= 36
TOKEN_DOT
private
mixed
TOKEN_DOT
= 46
TOKEN_EQUALS
private
mixed
TOKEN_EQUALS
= 61
TOKEN_HASH
private
mixed
TOKEN_HASH
= 35
TOKEN_PAREN_L
private
mixed
TOKEN_PAREN_L
= 40
TOKEN_PAREN_R
private
mixed
TOKEN_PAREN_R
= 41
TOKEN_PIPE
private
mixed
TOKEN_PIPE
= 124
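The constant values above are the ASCII codes of the punctuator characters they name. A small sketch makes the mapping visible; the `$tokenCodes` array is illustrative only and simply copies the values listed above.

```php
<?php
declare(strict_types=1);

// Values copied from the constant list above; the array itself is illustrative.
$tokenCodes = [
    'TOKEN_AMP'       => 38,
    'TOKEN_AT'        => 64,
    'TOKEN_BANG'      => 33,
    'TOKEN_BRACE_L'   => 123,
    'TOKEN_BRACE_R'   => 125,
    'TOKEN_BRACKET_L' => 91,
    'TOKEN_BRACKET_R' => 93,
    'TOKEN_COLON'     => 58,
    'TOKEN_DOLLAR'    => 36,
    'TOKEN_DOT'       => 46,
    'TOKEN_EQUALS'    => 61,
    'TOKEN_HASH'      => 35,
    'TOKEN_PAREN_L'   => 40,
    'TOKEN_PAREN_R'   => 41,
    'TOKEN_PIPE'      => 124,
];

foreach ($tokenCodes as $name => $code) {
    // chr() converts a byte value (0-255) to its one-character string.
    printf("%-16s %3d  %s\n", $name, $code, chr($code));
}

// ord() goes the other way, e.g. when dispatching on the next character.
var_dump(ord('{') === $tokenCodes['TOKEN_BRACE_L']); // bool(true)
```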
Properties
$lastToken
The previously focused non-ignored token.
public
Token
$lastToken
$line
The (1-indexed) line containing the current token.
public
int
$line
$lineStart
The character offset at which the current line begins.
public
int
$lineStart
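Taken together, $line and $lineStart are enough to derive a column for any offset on the current line. The sketch below assumes columns are 1-indexed and computed as offset minus line start plus one; that convention, and the helper name `locate`, are assumptions rather than documented behaviour. It handles only "\n" line endings and ASCII input, where byte and character offsets coincide.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: walks $body up to $offset and returns
 * [line, lineStart, column] for that offset.
 */
function locate(string $body, int $offset): array
{
    $line = 1;       // 1-indexed, as documented for $line
    $lineStart = 0;  // offset at which the current line begins

    for ($i = 0; $i < $offset; $i++) {
        if ($body[$i] === "\n") {
            $line++;
            $lineStart = $i + 1;
        }
    }

    // Assumed convention: columns are 1-indexed.
    return [$line, $lineStart, $offset - $lineStart + 1];
}

$body = "query {\n  hero\n}";
[$line, $lineStart, $col] = locate($body, strpos($body, 'hero'));

echo "line $line, lineStart $lineStart, column $col", PHP_EOL;
// prints "line 2, lineStart 8, column 3"
```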
$options
public
array<string|int, bool>
$options
$source
public
Source
$source
$token
The currently focused non-ignored token.
public
Token
$token
$byteStreamPosition
Current cursor position for ASCII representation of the source
private
int
$byteStreamPosition
$position
Current cursor position for UTF8 encoding of the source
private
int
$position
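The two private cursors appear to track the same location in two different units. The sketch below assumes that $position counts characters while $byteStreamPosition counts bytes, and shows how the two drift apart once a multi-byte UTF-8 character is read. The function `readUtf8Char` and its return shape are hypothetical, and the input is assumed to be well-formed UTF-8.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical reader: decodes one UTF-8 character starting at byte offset
 * $bytePos and returns [character, code point, byte length].
 */
function readUtf8Char(string $body, int $bytePos): array
{
    $lead = ord($body[$bytePos]);

    // The high bits of the lead byte give the sequence length.
    if ($lead < 0x80) {
        $length = 1;
    } elseif (($lead & 0xE0) === 0xC0) {
        $length = 2;
    } elseif (($lead & 0xF0) === 0xE0) {
        $length = 3;
    } else {
        $length = 4;
    }

    // Strip the length marker from the lead byte, then append 6 bits
    // from each continuation byte.
    $code = $length === 1 ? $lead : ($lead & (0xFF >> ($length + 1)));
    for ($i = 1; $i < $length; $i++) {
        $code = ($code << 6) | (ord($body[$bytePos + $i]) & 0x3F);
    }

    return [substr($body, $bytePos, $length), $code, $length];
}

$body = "\u{00E9}{";       // U+00E9 takes 2 bytes in UTF-8, "{" takes 1
$position = 0;             // counts characters (assumption)
$byteStreamPosition = 0;   // counts bytes (assumption)

while ($byteStreamPosition < strlen($body)) {
    [$char, $code, $bytes] = readUtf8Char($body, $byteStreamPosition);
    echo "char #$position at byte $byteStreamPosition: $char (code $code)", PHP_EOL;
    $position += 1;
    $byteStreamPosition += $bytes;
}
```

After the loop the character cursor is at 2 while the byte cursor is at 3, which is why a lexer over UTF-8 input may need to keep both.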
Methods
__construct()
public
__construct(Source $source[, array<string|int, bool> $options = [] ]) : mixed
Parameters
- $source : Source
- $options : array<string|int, bool> = []
Return values
mixed
advance()
public
advance() : Token
Return values
Token
lookahead()
public
lookahead() : mixed
Return values
mixed
assertValidBlockStringCharacterCode()
private
assertValidBlockStringCharacterCode(mixed $code, mixed $position) : mixed
Parameters
- $code : mixed
- $position : mixed
Return values
mixed
assertValidStringCharacterCode()
private
assertValidStringCharacterCode(mixed $code, mixed $position) : mixed
Parameters
- $code : mixed
- $position : mixed
Return values
mixed
moveStringCursor()
Moves internal string cursor position
private
moveStringCursor(int $positionOffset, int $byteStreamOffset) : self
Parameters
- $positionOffset : int
- $byteStreamOffset : int
Return values
self
positionAfterWhitespace()
Reads from the body, starting at the current position, until it finds a non-whitespace or commented character, then places the cursor at the position of that character.
private
positionAfterWhitespace() : mixed
Return values
mixed
readBlockString()
Reads a block string token from the source file.
private
readBlockString(mixed $line, mixed $col, Token $prev) : mixed
"""("?"?(\\"""|\\(?!=""")|[^"\\]))*"""
Parameters
- $line : mixed
- $col : mixed
- $prev : Token
Return values
mixed
readChar()
Reads the next UTF-8 character from the byte stream, starting from $byteStreamPosition.
private
readChar([bool $advance = false ][, int $byteStreamPosition = null ]) : array<string|int, string|int>
Parameters
- $advance : bool = false
- $byteStreamPosition : int = null
Return values
array<string|int, string|int>
readChars()
Reads the next $charCount UTF-8 characters from the byte stream, starting from $byteStreamPosition.
private
readChars(int $charCount[, bool $advance = false ][, null $byteStreamPosition = null ]) : array<string|int, string|int>
Parameters
- $charCount : int
- $advance : bool = false
- $byteStreamPosition : null = null
Return values
array<string|int, string|int>
readComment()
Reads a comment token from the source file.
private
readComment(int $line, int $col, Token $prev) : Token
#[\u0009\u0020-\uFFFF]*
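The comment grammar above matches a `#` followed by anything except control characters other than tab, so a comment ends at the first line terminator. A minimal sketch of the same pattern in PCRE syntax follows; the `\x{...}` spelling and the sample text are illustrative, not taken from the library.

```php
<?php
declare(strict_types=1);

// PCRE spelling of #[\u0009\u0020-\uFFFF]* ; the /u flag enables \x{...} code points.
$commentPattern = '/#[\x{0009}\x{0020}-\x{FFFF}]*/u';

$body = "# fetch the hero\nhero { name }";
preg_match($commentPattern, $body, $m);

// The match stops before "\n" because U+000A is outside the character class.
echo $m[0], PHP_EOL; // prints "# fetch the hero"
```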
Parameters
- $line : int
- $col : int
- $prev : Token
Return values
Token
readDigits()
Returns a string containing the consecutive digits at the cursor and moves the string cursor to the first character after them.
private
readDigits() : mixed
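The "return the digits and move the cursor" behaviour can be sketched with a free function that takes the cursor by reference. `readDigitsSketch` is a hypothetical stand-in, and throwing when no digit is present is an assumption made for the sketch, not something this page documents.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for a digit reader: returns the run of ASCII digits
 * starting at $position and advances $position (passed by reference) past it.
 */
function readDigitsSketch(string $body, int &$position): string
{
    // strspn() counts how many leading bytes (from the offset) are digits.
    $length = strspn($body, '0123456789', $position);
    if ($length === 0) {
        // Assumed error behaviour for this sketch only.
        throw new UnexpectedValueException("Expected a digit at offset $position");
    }

    $digits = substr($body, $position, $length);
    $position += $length;

    return $digits;
}

$body = '2024, rest';
$position = 0;
$digits = readDigitsSketch($body, $position);

echo "$digits / cursor now at $position", PHP_EOL; // prints "2024 / cursor now at 4"
```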
Return values
mixed
readName()
Reads an alphanumeric + underscore name from the source.
private
readName(int $line, int $col, Token $prev) : Token
[_A-Za-z][_0-9A-Za-z]*
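The name grammar above can be tried directly with `preg_match`. In this sketch the pattern is anchored with `\G` so that it must match exactly at the given offset, which mimics reading from a cursor; `matchName` is a hypothetical helper, not a library function.

```php
<?php
declare(strict_types=1);

// The name grammar from above, anchored with \G so it must match at $offset.
$namePattern = '/\G[_A-Za-z][_0-9A-Za-z]*/';

/** Hypothetical helper: returns the name starting at $offset, or null. */
function matchName(string $body, int $offset, string $pattern): ?string
{
    return preg_match($pattern, $body, $m, 0, $offset) === 1 ? $m[0] : null;
}

var_dump(matchName('hero_2 { name }', 0, $namePattern)); // string(6) "hero_2"
var_dump(matchName('2fast', 0, $namePattern));           // NULL: a name cannot start with a digit
var_dump(matchName('2fast', 1, $namePattern));           // string(4) "fast"
```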
Parameters
- $line : int
- $col : int
- $prev : Token
Return values
Token
readNumber()
Reads a number token from the source file, either a float or an int depending on whether a fractional part or an exponent appears.
private
readNumber(int $line, int $col, Token $prev) : Token
Int: -?(0|[1-9][0-9]*)
Float: -?(0|[1-9][0-9]*)(\.[0-9]+)?((E|e)(+|-)?[0-9]+)?
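The Int and Float grammars above can be folded into a single pattern whose optional groups decide the token kind. The sketch assumes the integer part is meant to read `0|[1-9][0-9]*` (no leading zeros) and that the fractional dot is a literal `\.`; `classifyNumber` is a hypothetical helper that validates a whole string rather than reading from a cursor.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical classifier built on the grammars above. Returns 'Int', 'Float',
 * or null when the whole string is not a number literal.
 */
function classifyNumber(string $literal): ?string
{
    // Group 1 is the fractional part, group 2 the exponent part.
    $pattern = '/^-?(?:0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?$/D';
    if (preg_match($pattern, $literal, $m) !== 1) {
        return null;
    }

    // Unmatched groups are either empty strings or missing from $m.
    $hasFraction = ($m[1] ?? '') !== '';
    $hasExponent = ($m[2] ?? '') !== '';

    return $hasFraction || $hasExponent ? 'Float' : 'Int';
}

foreach (['0', '-12', '1.5', '1e3', '6.02E+23', '01', '1.', '.5'] as $literal) {
    printf("%-9s %s\n", $literal, classifyNumber($literal) ?? 'not a number');
}
```

Under these grammars an exponent on its own, as in `1e3`, is enough to make the literal a Float, while `01`, `1.` and `.5` are rejected.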
Parameters
- $line : int
- $col : int
- $prev : Token
Return values
Token
readString()
private
readString(int $line, int $col, Token $prev) : Token
Parameters
- $line : int
- $col : int
- $prev : Token
Return values
Token
readToken()
private
readToken(Token $prev) : Token
Parameters
- $prev : Token
Return values
Token
unexpectedCharacterMessage()
private
unexpectedCharacterMessage(mixed $code) : mixed
Parameters
- $code : mixed