The embedded language may be a source <code>(client | server)</code> followed by a language name
<code>(javascript | php | python | basic)</code>.
This may be extended in the future with other programming languages and style-definition languages like CSS.
</p>
<h3>Status</h3>
<p>
The statuses may be <code>(error | unused | predefined | inactive)</code>.<br/>
The <code>error</code> status is used for lexical states that indicate errors in the source code such as unterminated quoted strings.<br/>
The <code>unused</code> status may indicate a gap in the lexical states, possibly because an old lexical class is no longer used or an upcoming lexical class may fill that position.<br/>
The <code>predefined</code> status indicates a style in the range 32..39 that is used for non-lexical purposes in Scintilla.<br/>
The <code>inactive</code> status is used for text that is not currently interpreted such as C++ code that is contained within a '#if 0' preprocessor block.
</p>
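<p>
As an illustration of the <code>predefined</code> and <code>unused</code> statuses, the following minimal Python sketch classifies a style number.
The helper name <code>status_for_style</code> and the <code>lexer_styles</code> set are hypothetical and not part of Scintilla; only the 32..39 range comes from the text above.
</p>

```python
# Styles 32..39 are reserved by Scintilla for non-lexical purposes.
PREDEFINED_STYLES = range(32, 40)


def status_for_style(style, lexer_styles):
    """Return 'predefined', 'unused', or None for a style that is in use.

    lexer_styles is a hypothetical set of the style numbers that a
    lexer actually assigns to text; it is an assumption of this sketch.
    """
    if style in PREDEFINED_STYLES:
        return "predefined"
    if style not in lexer_styles:
        # A gap in the lexical states: retired or reserved for later.
        return "unused"
    return None


# Example: a lexer that uses styles 0..5 except for 3.
example_styles = {0, 1, 2, 4, 5}
print(status_for_style(33, example_styles))
print(status_for_style(3, example_styles))
```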
<h3>Basic Types</h3>
<p>
The basic types for programming languages are <code>(default | operator | keyword | identifier | literal | comment | preprocessor | label)</code>.<br/>
The <code>default</code> type is commonly used for spaces and tabs between tokens although it may cover other characters in some languages.
</p>
<p>
Assembler languages add <code>(instruction | register)</code> to the basic types from programming languages.<br/>
</p>
<p>
The basic types for markup languages are <code>(default | tag | attribute | comment | preprocessor)</code>.<br/>
</p>
<p>
The basic types for data languages are <code>(default | key | data | comment)</code>.<br/>
</p>
<h3>Comments</h3>
<p>
Programming languages may differentiate between line and stream comments and treat documentation comments as distinct from other comments.
Documentation comments may be marked up with documentation keywords.<br/>
The additional attributes commonly used are <code>(line | documentation | keyword | taskmarker)</code>.
</p>
<h3>Literals</h3>
<p>
Programming and assembler languages contain a rich set of literals including numbers like <code>7</code> and <code>3.89e23</code>; strings like <code>"string\n"</code>; and <code>nullptr</code>.
Differentiating between these is often wanted.<br/>
The common literal types are <code>(numeric | boolean | string | regex | date | time | uuid | nil | compound)</code>.<br/>
Numeric literal types are subdivided into <code>(integer | real)</code>.<br/>
String literal types may add (perhaps multiple) further attributes from <code>(heredoc | character | escapesequence | interpolated | multiline | raw)</code>.<br/>
</p>
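<p>
The way literal tags combine can be sketched as a small Python check.
This is only an illustration: the function name <code>check_literal_tags</code> is hypothetical, and the ordering it assumes (<code>literal</code>, then a literal type, then further attributes) is inferred from the vocabularies above rather than mandated by this scheme.
</p>

```python
# Vocabularies copied from the lists in this section.
LITERAL_TYPES = {"numeric", "boolean", "string", "regex", "date",
                 "time", "uuid", "nil", "compound"}
NUMERIC_SUBTYPES = {"integer", "real"}
STRING_ATTRIBUTES = {"heredoc", "character", "escapesequence",
                     "interpolated", "multiline", "raw"}


def check_literal_tags(tags):
    """Return True if a space-separated tag string describes a literal.

    Assumed ordering (not mandated by the scheme): 'literal', then an
    optional literal type, then any subtype or attributes.
    """
    words = tags.split()
    if not words or words[0] != "literal":
        return False
    if len(words) == 1:
        return True
    kind, extras = words[1], words[2:]
    if kind not in LITERAL_TYPES:
        return False
    if kind == "numeric":
        # At most one of integer or real.
        return len(extras) in (0, 1) and all(e in NUMERIC_SUBTYPES for e in extras)
    if kind == "string":
        # Any number of distinct string attributes.
        return (all(e in STRING_ATTRIBUTES for e in extras)
                and len(set(extras)) == len(extras))
    # Other literal types take no further attributes in this sketch.
    return not extras


print(check_literal_tags("literal string heredoc escapesequence"))
print(check_literal_tags("literal numeric heredoc"))
```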
<p>
An escape sequence within an interpolated heredoc may thus be <code>literal string heredoc escapesequence</code>.
<tr><td><code>raw</code></td><td>String type that avoids interpretation: may be used for regular expressions in languages without a specific regex type</td></tr>
<tr><td><code>real</code></td><td>Numeric literal which may have a fraction or exponent like '3.84e-15'</td></tr>
<tr><td><code>regex</code></td><td>Regular expression literal like '^[a-z]+'</td></tr>
<tr><td><code>register</code></td><td>CPU register in assembler languages</td></tr>
<tr><td><code>server</code></td><td>Script executed on server</td></tr>
<tr><td><code>string</code></td><td>Sequence of characters</td></tr>
<tr><td><code>tag</code></td><td>Markup tag like '&lt;br /&gt;'</td></tr>
<tr><td><code>taskmarker</code></td><td>Word in comment that marks future work like 'FIXME'</td></tr>
<tr><td><code>time</code></td><td>Literal representing a time such as '9:34:31'</td></tr>
<tr><td><code>unused</code></td><td>Style that is not currently used</td></tr>
<tr><td><code>uuid</code></td><td>Universally unique identifier often used in interface definition files which may look like '{098f2470-bae0-11cd-b579-08002b30bfeb}'</td></tr>
</table>
<h2>
Extension
</h2>
<p>
Each element in this scheme may be extended in the future. This may be done by revising this document to provide a common approach to new features.
Individual lexers may also choose to expose unique language features through new tags.
</p>
<h2>
Translation
</h2>
<p>
Tags could be exposed directly in user interfaces or configuration languages.
However, an application may also translate these to match its naming schema.
Capitalization and punctuation could be different (like <code>Here-Doc</code> instead of <code>heredoc</code>),
terminology changed ("constant" instead of "literal"),
or human language changed from English to Chinese or Spanish.
</p>
<p>
Starting from a common set of tags makes these modifications tractable.
</p>
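<p>
A translation layer of this kind might look like the following Python sketch.
The <code>DISPLAY_NAMES</code> table and the <code>translate_tags</code> function are hypothetical names chosen for illustration; an application would supply its own mapping and fallback rule.
</p>

```python
# Hypothetical per-application overrides: changed terminology
# ("Constant" for "literal") and changed punctuation ("Here-Doc").
DISPLAY_NAMES = {
    "literal": "Constant",
    "heredoc": "Here-Doc",
}


def translate_tags(tags, names=DISPLAY_NAMES):
    """Map each tag in a space-separated string to a display name.

    Tags without an override fall back to simple capitalization,
    which is an assumption of this sketch.
    """
    return " ".join(names.get(word, word.capitalize()) for word in tags.split())


print(translate_tags("literal string heredoc"))
```

<p>
Because every lexer starts from the same tag vocabulary, one such table can serve all lexers, and a table for another human language can be substituted without changing the lexers.
</p>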
<h2>
Open issues
</h2>
<p>
The C++ lexer (for example) has inactive states and dynamically allocated substyles.
These should be exposed through the metadata mechanism but are not currently.