
STXT Documents

1. Introduction
2. Terminology
3. Document Encoding
4. Syntactic Unit: Node
5. Container nodes, INLINE type
6. Text block nodes, BLOCK type
7. Namespaces
8. Indentation and Hierarchy
9. Comments
10. Whitespace normalization
11. Error Rules
12. Conformance
13. File Extension and Media Type
14. Normative Examples
15. Security Considerations
16. Appendix A — Grammar (Informal)
17. Appendix B — Interaction with `@stxt.schema`
18. Appendix C — Interaction with `@stxt.template`
19. End of Document

1. Introduction

This document defines the specification of the STXT (Semantic Text) language.

STXT is a Human-First language, designed so that its natural form is readable, clear, and comfortable for people, while at the same time maintaining a precise and easily machine-processable structure.

STXT is a hierarchical and semantic textual format oriented to:

Representing documents and data clearly.
Being extremely simple to read and write.
Being trivial to parse in any language.
Allowing both structured content and free text.
Extending its semantics through @stxt.schema or @stxt.template.
Making parsers easy to write while minimizing security risks.

This document describes the base syntax of the language.

2. Terminology

The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" in this document are to be interpreted as described in RFC 2119.

3. Document Encoding

An STXT document SHOULD be encoded in UTF-8 without BOM.

A parser:

SHOULD accept documents that begin with a BOM.
MAY emit a warning for documents that begin with a BOM.
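As a minimal sketch of this BOM policy, the following Python function decodes raw bytes as UTF-8, strips a leading BOM, and emits a warning through the standard warnings module. The name decode_stxt is hypothetical, and rejecting bytes that are not valid UTF-8 is an assumption rather than a rule of this specification.

```python
import warnings

UTF8_BOM = b"\xef\xbb\xbf"


def decode_stxt(data: bytes) -> str:
    """Decode an STXT document, accepting an optional leading UTF-8 BOM."""
    if data.startswith(UTF8_BOM):
        # A parser SHOULD accept the BOM and MAY warn about it.
        warnings.warn("STXT document begins with a UTF-8 BOM")
        data = data[len(UTF8_BOM):]
    # Assumption: strict decoding, so invalid UTF-8 raises UnicodeDecodeError.
    return data.decode("utf-8")
```

Python's built-in "utf-8-sig" codec also removes a leading BOM, but it does so silently, so an explicit check like the one above is needed if a warning is wanted.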

4. Syntactic Unit: Node

Each non-empty line of the document that is neither a comment nor part of a >> block defines a node.

There are two forms of node:

Inline container node (INLINE node): Node name: Inline value
Text block node (BLOCK node): Node name >>

The node name cannot be empty. A line with only : or >> is not valid.

Example with INLINE nodes:

Node 1: Inline value
	Node 2 without value:
	Node 3 with another value: this is the other value

Example with BLOCK node:

Block node >>
	This is the content
	of the text block:

	  - Leading spaces and line breaks are preserved
	  - Right trim is applied
	  - Left trim is NOT applied

A node may optionally include a namespace:

Name (normal.namespace):
Name (@special.namespace):

4.1 Normalization of the node name

The node name is taken from the text between:

The first character that does not belong to the indentation, and
The first occurrence of any of the following (whichever comes first):
- The opening parenthesis ( of a namespace
- The : character
- The >> operator

On that fragment, the following is applied:

Removal of leading and trailing spaces and tabs (trim).
Compaction of consecutive spaces into a single space.

The result of this normalization is the node name.

A node whose logical name is the empty string ("") is invalid and MUST cause a parse error.

Equivalent examples at the Node name level:

Node name:
Node name: value
Node  name   : value
Node  name (@a.special.namespace):
Node name(a.normal.namespace):
Node  name >>
Node name>>

The definition of a node MUST always include either : (INLINE container node) or >> (BLOCK text node), always preceded by a non-empty name.
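The following Python sketch extracts and normalizes the node name from a line whose indentation has already been removed. The function name split_node_line is hypothetical; it returns the normalized name together with the remainder of the line, which starts at the namespace, the : or the >>.

```python
import re
from typing import Tuple


def split_node_line(content: str) -> Tuple[str, str]:
    """Return (normalized name, rest of line) for a node line without indentation.

    The name ends at the first '(', ':' or '>>', whichever comes first.
    """
    ends = [p for p in (content.find("("), content.find(":"), content.find(">>")) if p != -1]
    if not ends:
        raise ValueError("a node line must contain ':' or '>>'")
    end = min(ends)
    # Trim spaces and tabs, then compact runs of spaces into a single space.
    name = re.sub(r" +", " ", content[:end].strip(" \t"))
    if not name:
        raise ValueError("the node name must not be empty")
    return name, content[end:]
```

Because the earliest separator wins, a value such as "Name: a (b)" is not mistaken for a namespace.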

4.2 Restrictions on the node name

The node name may only contain alphanumeric characters, spaces, and the characters -, _ and . (dot). Names with diacritics are allowed, as are uppercase and lowercase letters.

4.3 Canonical node name

The canonical name is formed from the node name through the following process:

Unicode decomposition (NFKD)
Conversion to lowercase
Removal of diacritics
Compaction of spaces (not necessary on an already normalized name)
Replacement of every character outside [a-z0-9] with -. Two or more consecutive hyphens are not allowed; they must be compacted into a single one (-)
Removal of hyphens (-) at the beginning and at the end, if any

The canonical name is used to determine whether two nodes have the same name. It is also used internally by all search and validation operations to decide whether two references denote the same element.

Examples of transformation:

A namé with äccent: a-name-with-accent
A NAME with äccent: a-name-with-accent
SIZe number 2__ and 3: size-number-2-and-3
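A minimal Python sketch of this process, using only the standard unicodedata and re modules. Replacing each run of characters outside [a-z0-9] with a single hyphen performs the replacement and the hyphen compaction in one step, which also makes the separate space compaction unnecessary. The function name canonical_name is hypothetical.

```python
import re
import unicodedata


def canonical_name(name: str) -> str:
    """Compute the canonical form of a node name."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    lowered = decomposed.lower()
    # Drop the combining marks (diacritics) left by the decomposition.
    stripped = "".join(ch for ch in lowered if not unicodedata.combining(ch))
    # Each run of characters outside [a-z0-9] becomes a single hyphen.
    hyphenated = re.sub(r"[^a-z0-9]+", "-", stripped)
    # Remove hyphens at the beginning and at the end.
    return hyphenated.strip("-")
```

Two names then refer to the same node when their canonical names are equal, e.g. canonical_name("A NAME with äccent") == canonical_name("A namé with äccent").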

4.4 Style rules

The recommended style rules are the following:

Separate the name from the namespace definition with a single space
Separate : from the value with a single space
Place : immediately after the name, or after the namespace if present
Write nothing after >>
Put a single space between the node name (or the namespace) and >>
Do not use more than one consecutive space in names
Write namespaces without spaces inside the parentheses (namespace.def)

Examples of correct style:

Name with value: The value
Name without value:
Name with namespace (the.namespace):
Text node >>

5. Container nodes, INLINE type

The form with : defines an INLINE container node with the following characteristics:

It may have a value (optional).
It may have no value (empty node).
It may have children (nested nodes).
Its structured content includes:
- The node's own line.
- Its descendants with greater indentation.

Examples:

Title: Report
Author: Joan
Node:
Node: Value
Node:
    SubNode 1: 123
    Another subnode: 456

5.1 Value normalization

The value of an INLINE node must be normalized by trimming it on both sides (left and right trim).

Example:

Name: value 1
Name:    value 1
# in both cases, the inline value of Name is "value 1", although in the
# second there are spaces before and after.

Strong normalization applies only to structural identifiers. Values are literal, although a simple normalization is applied: right and left trim.

6. Text block nodes, BLOCK type

The form with >> defines a block of literal text.

Valid examples:

Description >>
    Line 1
    Line 2

Section>>
    Accepts the operator without space

6.1 Formal rules

The >> node line MUST NOT contain significant content after >>, except optional spaces.
All lines with indentation strictly greater than that of the >> node belong to the textual content of the block.
Inside the block:
- The parser MUST NOT interpret any line as a structured node, even if it contains : or other STXT syntax.
- The parser MUST NOT interpret lines beginning with # as comments; all lines are literal text.
The block ends when a non-empty line appears whose indentation is less than or equal to the indentation of the >> node.
A comment line with indentation less than or equal to that of the >> node is outside the block and is processed as a normal comment.
Empty lines inside the block are preserved and MUST NOT close the block, regardless of their indentation.
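The boundary rules above can be sketched in Python as follows. The sketch compares indentation in columns (each tab advancing to the next multiple of 4, as in section 8) and treats lines containing only spaces or tabs as empty; both are assumptions about how an implementation might measure lines inside a block. The names indent_columns and block_lines are hypothetical.

```python
from typing import List


def indent_columns(line: str) -> int:
    """Width of the leading indentation; a tab advances to the next multiple of 4."""
    col = 0
    for ch in line:
        if ch == " ":
            col += 1
        elif ch == "\t":
            col += 4 - col % 4
        else:
            break
    return col


def block_lines(lines: List[str], node_index: int) -> List[str]:
    """Return the raw lines belonging to the >> block opened at lines[node_index]."""
    node_col = indent_columns(lines[node_index])
    body: List[str] = []
    for line in lines[node_index + 1:]:
        if line.strip(" \t") == "" or indent_columns(line) > node_col:
            # Empty lines never close the block; deeper lines are literal text,
            # even if they contain ':' or '>>' or start with '#'.
            body.append(line)
        else:
            break  # first non-empty line at <= indentation ends the block
    return body
```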

6.2 Example

Block >>
    Text
        Child: value YES allowed, it is text, it is not parsed
        Another child: YES allowed
    # This is also text
Next Node: value

In this example:

Everything indented below Block >> is literal text.
The lines starting with Child: and Another child: are not nodes, but text; so is the line starting with #.
Next Node: value is outside the >> block.

7. Namespaces

A namespace is optional and is specified like this:

Node (com.example.docs):
Another node (another.namespace):
More nodes (@a.special.name):

Rules:

A namespace MAY start with @.
It MUST use hierarchical format (a.b.c), with at least 2 elements (a.b).
The effective namespace of a root node that does not specify a namespace is the empty namespace "".
A child node that does not specify a namespace inherits the effective namespace of its parent.
The empty namespace cannot be written explicitly: Node name () is not valid.
A child node may redefine its namespace by indicating (another.namespace), in which case it uses that namespace instead of the inherited one.
Each element of a namespace may contain only characters in the range [a-z0-9] (after conversion to lowercase); elements are separated by dots, and an optional @ at the beginning indicates a special namespace.
A parser MUST internally convert a namespace to lowercase. For example, Name (COM.DEMO.DOCS) becomes Name (com.demo.docs).
By style rules, a namespace should be written in lowercase.
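A possible Python sketch of namespace normalization and inheritance, following the rules above and the informal grammar of Appendix A (so no spaces are accepted inside the parentheses). The names normalize_namespace and effective_namespace are hypothetical; declared is the text found between the parentheses, or None when the node does not declare a namespace.

```python
import re
from typing import Optional

# Optional '@', then at least two dot-separated elements made of [a-z0-9].
NAMESPACE_RE = re.compile(r"@?[a-z0-9]+(?:\.[a-z0-9]+)+")


def normalize_namespace(raw: str) -> str:
    """Lowercase a declared namespace and check its format."""
    namespace = raw.lower()
    if not NAMESPACE_RE.fullmatch(namespace):
        raise ValueError(f"invalid namespace: {raw!r}")
    return namespace


def effective_namespace(declared: Optional[str], parent: Optional[str]) -> str:
    """Return the namespace of a node; parent is None for a root node."""
    if declared is not None:
        # An explicit namespace replaces the inherited one.
        return normalize_namespace(declared)
    # Otherwise inherit from the parent, or use "" for a root node.
    return parent if parent is not None else ""
```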

8. Indentation and Hierarchy

Indentation defines the structured hierarchy of the document.

8.1 Allowed indentation

An STXT document:

MAY use spaces or tabs for indentation.
It is not recommended to mix spaces and tabs on the same line. A parser MAY emit a warning in that case.
If spaces and tabs are mixed on the same line, the effective indentation MUST be calculated from left to right:
- Each tab completes the current level up to the next multiple of 4 columns.
- Spaces add one column each.
- The final result MUST be exactly equivalent to an integer number of levels. Following the Human-First principle, this rule seeks to ensure that a document that looks correct is also really correct.
If it uses spaces:
- It MUST use multiples of 4 spaces to increase a level.
If it uses tabs:
- Each tab represents exactly 1 level.
The following are equivalent as an increase of 1 level from an aligned column:
- 4 spaces
- 1 tab
- 1, 2, or 3 spaces followed by 1 tab
Once a full level has been reached, the calculation continues from that new base column.
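A minimal Python sketch of this column calculation for a non-empty line outside a >> block (the function name indentation_level is hypothetical). Spaces advance one column, each tab advances to the next multiple of 4, and the result must land exactly on a level boundary.

```python
def indentation_level(line: str) -> int:
    """Return the indentation level of a line, or raise ValueError."""
    col = 0
    for ch in line:
        if ch == " ":
            col += 1                 # a space adds one column
        elif ch == "\t":
            col += 4 - col % 4       # a tab completes the level up to the next multiple of 4
        else:
            break
    if col % 4 != 0:
        raise ValueError(f"indentation of {col} columns is not a whole number of levels")
    return col // 4
```

Applied to the mixed lines of section 8.2, for example, "   \t  \t" gives 3 → 4 → 6 → 8 columns, that is, level 2.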

8.2 Special indentation examples

In the following examples, . represents a space and |--> represents a tab. A tab is drawn as extending to the next column that is a multiple of 4, as a text editor would display it.

Example with tabs:

Level 0 node: Level 0 value
|-->Level 1 node:
|-->Another level 1 node:
|-->|-->Level 2:
|-->|-->Level 2:
|-->Level 1:
|-->Level 1:

Example with spaces:

Level 0 node: Level 0 value
....Level 1 node:
....Another level 1 node:
........Level 2:
........Level 2:
....Level 1:
....Level 1:

Example with a mix of spaces and tabs.

Allowed, although not recommended by style. A parser MAY give a warning about mixing on the same line. This example has the same indentation as the previous two.

Level 0 node: Level 0 value
.|-->Level 1 node: 1 space + 1 TAB: level 1
..|-->Another level 1 node: 2 spaces + 1 TAB: level 1
...|-->..|-->Level 2: 3 spaces + 1 TAB, 2 spaces + 1 TAB: level 2
|-->....Level 2: 1 TAB, 4 spaces: level 2
..|-->Level 1: 2 spaces + 1 TAB: level 1
.|-->Level 1: 1 space + 1 TAB: level 1

8.3 Level errors

A parser MUST give a parse error in the following cases:

Non-consecutive levels:

Level 0:
....Level 1:
............Level3: ERROR, you cannot go from level 1 to level 3

Not reaching a multiple of 4 when using spaces or a mix:

Level 0:
....Level 1:
...Almost level 1: ERROR: 3 spaces (does not reach 4)

Level 0:
....Level 1:
.|-->..Almost level 2: ERROR: 1 space + 1 TAB, 2 spaces

Level 0:
....Level 1:
..........More than level 2: ERROR: 4 spaces, 4 spaces, 2 spaces

8.4 Hierarchy

Indentation MUST increase consecutively (jumps are not allowed).
Child nodes MUST have greater indentation than their parent.
Indentation inside a >> block does not affect structural hierarchy: it is simply text.
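Given the levels of the node lines in document order (for example, as computed from their indentation), the hierarchy can be rebuilt with a stack of open ancestors. This Python sketch is illustrative: the name parent_indices is hypothetical, and it assumes that the first node of the document must be at level 0.

```python
from typing import List, Optional


def parent_indices(levels: List[int]) -> List[Optional[int]]:
    """For each node, return the index of its parent node (None for root nodes)."""
    stack: List[int] = []              # stack[k] is the index of the open node at level k
    parents: List[Optional[int]] = []
    for index, level in enumerate(levels):
        if level > len(stack):
            # Levels may only increase by one from one node to the next.
            raise ValueError(f"node {index}: jump to level {level}")
        del stack[level:]              # close the previous sibling and its descendants
        parents.append(stack[-1] if stack else None)
        stack.append(index)
    return parents
```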

9. Comments

Outside >> blocks, a line is a comment if, after its indentation, the first character is #.

Example:

# Root comment
Node:
    # Inner comment

9.1 Comments inside `>>` blocks

Inside a >> block:

Any line with indentation strictly greater than that of the >> node MUST be treated as literal text, even if it starts with #.
A non-empty line with indentation less than or equal to that of the >> node is outside the block.

Example:

# A normal comment (level 0)
Document:
	# Another normal comment
			# This is also a comment! Outside block >>
    Text >>
        # This is text
        Normal line
            # This is also normal text
    # This one is a comment
# This is also a comment
    	Here the text of the node continues

9.2 Style for comments

It is recommended to place a comment at the same level as the node that follows it, since comments usually describe the following node.
Lines that look like comments (starting with #) inside a text block are not recommended, since they are visually confusing: they are part of the text, not comments.

10. Whitespace normalization

This section defines how whitespace must be normalized in order to guarantee that different implementations produce the same logical representation from the same STXT text.

10.1 Inline values (`:`)

When parsing an INLINE node (with :):

The parser takes all characters from immediately after : to the end of the line.
The inline value MUST be normalized by applying:
- Removal of leading spaces and tabs (left trim).
- Removal of trailing spaces and tabs (right trim).

This implies that the following lines are equivalent at the parsing level, regardless of any spaces or tabs that follow the value (not visible here):

Name: Joan
Name:     Joan
Name: Joan
Name:     Joan

In all cases, the logical value of the Name node is "Joan".

If after the trim the value is empty, the inline value is considered the empty string ("").
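Continuing from the point where the node name ends (section 4.1), the rest of a node line can be split as in the following Python sketch. The name parse_tail and its return shape are hypothetical; the namespace is returned as written, without the checks of section 7.

```python
from typing import Optional, Tuple


def parse_tail(tail: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Split the text after a node name into (namespace, kind, inline value).

    kind is "INLINE" or "BLOCK"; the inline value is None for BLOCK nodes.
    """
    namespace = None
    if tail.startswith("("):
        close = tail.find(")")
        if close == -1:
            raise ValueError("unterminated namespace")
        namespace = tail[1:close]          # returned as written, not validated here
        tail = tail[close + 1:].lstrip(" \t")
    if tail.startswith(":"):
        # Everything after ':' to the end of the line, with left and right trim.
        return namespace, "INLINE", tail[1:].strip(" \t")
    if tail.startswith(">>"):
        if tail[2:].strip(" \t"):
            raise ValueError("significant content after '>>'")
        return namespace, "BLOCK", None
    raise ValueError("expected ':' or '>>'")
```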

10.2 Lines inside `>>` blocks

For each line that belongs to a >> block:

The parser determines the content of the line from the text that follows the minimum indentation of the block: it removes only the block indentation and preserves any additional indentation as part of the text.
On that content, the parser MUST remove all trailing spaces and tabs (right trim).
Empty lines are preserved in all cases.

Example of line canonicalization:

Block >>
    Hello
        World

Logical representation of the block content:

Line 1: "Hello"
Line 2: "    World" (the 4 additional spaces after the minimum indentation are preserved; trailing spaces are removed)

10.3 Empty lines in `>>` blocks

Empty lines inside the block, whether intermediate or final, MUST be preserved as empty lines ("") in the logical representation of the text.
Only right trim is applied on each individual line (removal of spaces and tabs at the end of the line).
No empty line is removed, neither intermediate nor final.

Example:

Text >>
    Line 1

    Line 2

Logical content of the block:

Line 1: "Line 1"
Line 2: ""
Line 3: "Line 2"
Line 4: ""
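The per-line rules of sections 10.2 and 10.3 can be sketched in Python as below. The sketch assumes that the block indentation is exactly one level (4 columns) deeper than the >> node and that tabs advance to the next multiple of 4; the name block_content is hypothetical.

```python
from typing import List


def block_content(raw_lines: List[str], node_level: int) -> List[str]:
    """Return the logical lines of a >> block whose node sits at node_level."""
    base = (node_level + 1) * 4        # assumed block indentation, in columns
    logical: List[str] = []
    for line in raw_lines:
        col = i = 0
        # Remove at most `base` columns of indentation (a tab advances to the next multiple of 4).
        while i < len(line) and col < base and line[i] in " \t":
            col += 1 if line[i] == " " else 4 - col % 4
            i += 1
        # Keep any extra indentation, right-trim spaces and tabs; empty lines become "".
        logical.append(line[i:].rstrip(" \t"))
    return logical
```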

11. Error Rules

A document is invalid if any of these conditions occurs:

Indentation that does not correspond to a whole number of levels (for example, a number of spaces that is not a multiple of 4).
Jumps in indentation levels.
A >> node contains significant inline content on the same line as >>.
A node contains neither : nor >>.

A conforming parser MUST reject the document.

12. Conformance

An STXT implementation is conforming if:

It implements the syntax described in this document.
It applies the strict indentation and hierarchy rules.
It correctly interprets nodes with : and >> blocks.
It interprets comments outside >> blocks.
It treats everything inside >> blocks as literal text.
It applies the whitespace normalization rules of section 10.
It rejects invalid documents according to section 11.

13. File Extension and Media Type

13.1 File Extension

STXT documents SHOULD use the extension: .stxt

13.2 Media Type (MIME)

Official media type: text/stxt
Compatible alternative: text/plain

14. Normative Examples

14.1 Valid document

Document (com.example.docs):
    Author: Joan
    Date: 2025/12/03
    Summary >>
        This is a text block.
        With several lines.
    Config:
        Mode: Active

14.2 Block with empty lines

Text>>

    Line 2

Logical content of the block:

""
"Line 2"

14.3 Comments inside and outside blocks

Document:
    Body >>
        # This is text
        More text
    # This one is a comment

15. Security Considerations

STXT has been designed with parsing security as a fundamental priority, minimizing the attack surface compared to other structured textual formats.

A conforming STXT parser is inherently resistant to common classes of vulnerabilities:

Immune to entity expansion attacks (such as "billion laughs" or XXE): the format does not define entities, external references, or inclusion of remote resources.
Immune to arbitrary code execution: there are no dynamic features, custom tags, loaders, or object deserialization. The only resulting structure is a simple tree of nodes and textual values.
Immune to injection inside literal blocks: all content inside a >> node is treated as literal text without any interpretation, even if it contains :, >>, #, or other STXT syntax.
Low risk of denial of service: the strict rules of consecutive indentation and the absence of circular references or anchors limit structural complexity. Implementations SHOULD impose a reasonable limit on nesting depth (recommended: ≤ 100 levels) and total document size.
Optional external schemas: semantic validation is a separate layer. A basic parser MAY operate without loading external schemas, eliminating risks associated with their resolution.

Consequently, STXT is especially suitable for processing documents from untrusted sources (remote configurations, user input, data exchange) where parser security is critical.

Implementations MUST reject invalid documents according to section 11 and MUST NOT introduce extensions that allow external loading or dynamic evaluation without explicit security measures.
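A sketch of the resource limits recommended in this section, in Python. The 100-level depth limit comes from the text above; the 1,000,000-byte size limit, the names enforce_limits and LimitExceeded, and passing node levels as an iterable (so the check can run while parsing) are assumptions for illustration.

```python
from typing import Iterable


class LimitExceeded(ValueError):
    """Raised when a document exceeds a configured resource limit."""


def enforce_limits(data: bytes, node_levels: Iterable[int],
                   max_bytes: int = 1_000_000, max_depth: int = 100) -> None:
    """Reject oversized documents and overly deep nesting."""
    # Check the raw size first, before any decoding or parsing work is done.
    if len(data) > max_bytes:
        raise LimitExceeded(f"document has {len(data)} bytes (limit {max_bytes})")
    for level in node_levels:
        # Levels start at 0, so max_depth levels means levels 0 .. max_depth - 1.
        if level >= max_depth:
            raise LimitExceeded(f"nesting level {level} exceeds the limit of {max_depth} levels")
```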

16. Appendix A — Grammar (Informal)

Document          = { Line }

Line              = [Indentation] ( Comment | Node | BlockContinuation | EmptyLine )

Node              = Name [Namespace] ( Inline | BlockStart )
Inline            = ":" [Space] [InlineText]
BlockStart        = [Space] ">>" [TrailingSpaces]

Namespace         = "(" ["@"] Ident "." Ident { "." Ident } ")"
Ident             = [a-z0-9]+   ; only lowercase letters and digits, according to the style and normalization rules

Comment           = "#" { any character until end of line }

BlockContinuation = IndentationGreaterThanBlockNode { any text }   ; literal text

Indentation       = Allowed mix of spaces and tabs according to section 8
                    - Pure spaces: exact multiples of 4 per level
                    - Pure tabs: 1 tab = 1 level
                    - Mixed on a line: calculation by columns; each tab completes up to the next multiple of 4

Name              = Normalized text (trim + space compaction) according to section 4.1

Key notes for implementers:

The parser must process the document line by line, maintaining state of:
- Current indentation level of the parent node.
- Base indentation and state of the active >> block (if any).
- Current inherited namespace.
Basic parsing flow:
1. Read line and calculate its effective indentation (according to the rules of section 8).
2. If there is an active >> block:
  - If the line is empty → add empty line to the block.
  - If indentation > indentation of the >> node → add line as literal text (right trim).
  - If indentation ≤ indentation of the >> node → close block and process the line outside the block.
3. If there is no active block:
  - Empty line → ignore (does not affect hierarchy).
  - Starts with # → comment.
  - Otherwise → new node (normalize name, detect namespace, type : or >>).
Namespace inheritance:
- The effective namespace of the root node is empty by default.
- Each child node without an explicit namespace inherits the effective namespace of its parent.
- If a node defines its own namespace within (), this replaces the inherited one for it and all its descendants.
Additional normalization:
- Node names: according to sections 4.1–4.3.
- Namespaces: internally converted to lowercase (section 7).
- Inline values: left and right trim (section 10.1).
- Block lines: preserve relative indentation + right trim + preserve all empty lines (sections 10.2–10.3).
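To make the flow above concrete, here is a compact, non-authoritative Python sketch of a line-by-line parser. It follows the listed steps, but several details are assumptions: indentation inside blocks is compared in columns, lines containing only spaces or tabs count as empty, a leading BOM (U+FEFF) is simply dropped, and errors are reported as ValueError. It does not compute canonical names, apply schemas, or enforce resource limits. The names Node and parse are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Optional '@' followed by at least two dot-separated [a-z0-9] elements.
NAMESPACE_RE = re.compile(r"@?[a-z0-9]+(?:\.[a-z0-9]+)+")


@dataclass
class Node:
    name: str
    namespace: str
    kind: str                                         # "INLINE" or "BLOCK"
    value: Optional[str] = None                       # inline value (INLINE only)
    lines: List[str] = field(default_factory=list)    # literal text (BLOCK only)
    children: List["Node"] = field(default_factory=list)


def columns(line: str) -> int:
    """Indentation width; a tab advances to the next multiple of 4."""
    col = 0
    for ch in line:
        if ch == " ":
            col += 1
        elif ch == "\t":
            col += 4 - col % 4
        else:
            break
    return col


def strip_columns(line: str, width: int) -> str:
    """Remove at most `width` columns of leading indentation."""
    col = i = 0
    while i < len(line) and col < width and line[i] in " \t":
        col += 1 if line[i] == " " else 4 - col % 4
        i += 1
    return line[i:]


def parse(text: str) -> List[Node]:
    if text.startswith("\ufeff"):
        text = text[1:]                      # tolerate a leading BOM
    roots: List[Node] = []
    stack: List[Node] = []                   # stack[k] is the open node at level k
    block: Optional[Node] = None             # active >> node, if any
    block_level = 0
    for number, raw in enumerate(text.splitlines(), start=1):
        empty = raw.strip(" \t") == ""
        col = columns(raw)
        if block is not None:
            if empty or col > block_level * 4:
                # Literal text: drop the block indentation, right-trim, keep empty lines.
                block.lines.append(strip_columns(raw, (block_level + 1) * 4).rstrip(" \t"))
                continue
            block = None                     # a non-empty line at <= indentation closes it
        content = raw.lstrip(" \t")
        if empty or content.startswith("#"):
            continue                         # empty lines and comments outside blocks
        if col % 4:
            raise ValueError(f"line {number}: indentation is not a whole number of levels")
        level = col // 4
        if level > len(stack):
            raise ValueError(f"line {number}: indentation jumps more than one level")
        del stack[level:]
        parent = stack[-1] if stack else None
        # The name ends at the first '(', ':' or '>>'.
        ends = [p for p in (content.find("("), content.find(":"), content.find(">>")) if p != -1]
        if not ends:
            raise ValueError(f"line {number}: node has neither ':' nor '>>'")
        name = re.sub(r" +", " ", content[:min(ends)].strip(" \t"))
        if not name:
            raise ValueError(f"line {number}: empty node name")
        rest = content[min(ends):]
        namespace = parent.namespace if parent else ""    # inherited by default
        if rest.startswith("("):
            close = rest.find(")")
            namespace = rest[1:close].lower()
            if close == -1 or not NAMESPACE_RE.fullmatch(namespace):
                raise ValueError(f"line {number}: invalid namespace")
            rest = rest[close + 1:].lstrip(" \t")
        if rest.startswith(":"):
            node = Node(name, namespace, "INLINE", value=rest[1:].strip(" \t"))
        elif rest.startswith(">>") and not rest[2:].strip(" \t"):
            node = Node(name, namespace, "BLOCK")
            block, block_level = node, level
        else:
            raise ValueError(f"line {number}: expected ':' or '>>' after the name")
        (parent.children if parent else roots).append(node)
        stack.append(node)
    return roots
```

For example, parse("Name: value")[0].value is "value".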

17. Appendix B — Interaction with `@stxt.schema`

The schema system allows adding semantic validation to STXT documents without modifying the base syntax of the language.

The STXT core does not define how an implementation should react: the behavior belongs exclusively to the schema system (STXT-SCHEMA-SPEC).

A schema is an STXT document whose namespace is @stxt.schema and whose purpose is to define the structural rules, value types, and cardinalities of the nodes belonging to a specific namespace.

The STXT core does not interpret these rules; it only defines how they are expressed and how they are combined through namespaces.

17.1. Associating a schema to a namespace

To associate a schema with the namespace com.example.docs, a document is written:

Schema (@stxt.schema): com.example.docs
	Node: Email
		Children:
			Child: From
			Child: To
			Child: Cc
			Child: Bcc
			Child: Title
				Max: 1
			Child: Body Content
				Min: 1
				Max: 1
			Child: Metadata (org.example.meta)
				Max: 1
	Node: From
	Node: To
	Node: Cc
	Node: Bcc
	Node: Title
	Node: Body Content
		Type: TEXT

17.2. Application to STXT documents

A document that declares the same namespace:

Document (com.example.docs):
    Field1: value
    Text: one
    Text: two

can be validated by an implementation that supports STXT schemas:

Validating the presence of nodes according to Node in the schema.
Validating value types (TEXT, DATE, NUMBER, etc.).
Validating cardinalities defined in Child.

17.3. Core independence

STXT MUST NOT impose semantic rules coming from schemas. The schema system is a separate and optional component that operates on the already parsed STXT.

The schema system MAY also run as part of the parsing process, in which case it SHOULD be loosely coupled to the parser. This makes it possible to detect errors without waiting until parsing has finished.

18. Appendix C — Interaction with `@stxt.template`

The template system allows adding semantic validation to STXT documents without modifying the base syntax of the language.

The STXT core does not define how an implementation should react: the behavior belongs exclusively to the template system (STXT-TEMPLATE-SPEC).

A template is an STXT document whose namespace is @stxt.template and whose purpose is to define the structural rules, value types, and cardinalities of the nodes belonging to a specific namespace.

The template system is analogous to schemas, but with a simplified syntax oriented toward rapid prototyping. Even so, it is a perfectly valid system for all kinds of documents. It can be considered syntactic sugar, since internally it can use the same representation as a schema.

The template system MAY coexist alongside a system with schemas, since in the end a template defines the same information as a schema.

18.1. Associating a template to a namespace

To associate a template with the namespace com.example.docs, a document is written:

Template (@stxt.template): com.example.docs
	Structure >>
		Email (com.example.docs):
			From:
			To:
			Cc:
			Bcc:
			Title: (?)
			Body    Content: (1) TEXT
			Metadata (org.example.meta): (?)

Once defined, a template fulfills the same function as a schema. If an implementation finds several schemas or templates applicable to the same namespace, it SHOULD define a clear and deterministic priority policy. For a given validation, a single effective semantic source MUST be selected: either a schema or a template.

19. End of Document
