STXT - Semantic Text
Built for humans. Reliable for machines.

STXT Documents

1. Introduction
2. Terminology
3. Document Encoding
4. Syntactic Unit: Node
5. Nodes with `:` (container nodes, allow inline value)
6. Nodes with `>>` (text block)
7. Namespaces
8. Indentation and Hierarchy
9. Comments
10. Whitespace normalization
11. Error Rules
12. Conformance
13. File Extension and Media Type
14. Normative Examples
15. Security Considerations
16. Appendix A — Grammar (Informal)
17. Appendix B — Interaction with `@stxt.schema`
18. Appendix C — Interaction with `@stxt.template`
19. End of Document

1. Introduction

This document defines the specification of the STXT (Semantic Text) language.

STXT is a Human-First language: its natural form is readable, clear, and comfortable for people, while it keeps a precise structure that machines can process easily.

STXT is a hierarchical and semantic textual format aimed at:

This document describes the base syntax of the language.

2. Terminology

The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" are to be interpreted as described in RFC 2119.

3. Document Encoding

An STXT document SHOULD be encoded in UTF-8 without BOM.

A parser:

4. Syntactic Unit: Node

Each non-empty line of the document that is neither a comment nor part of a >> block defines a node.

There are two forms of node:

  1. Inline container node (INLINE text node): Node name: Inline value
  2. Text block node (BLOCK text node): Node name >>

The node name cannot be empty. A line with only : or >> is not valid.

Example with INLINE nodes:

Node 1: Inline value
	Node 2 without value:
	Node 3 with another value: this is the other value

Example with a BLOCK node:

Block node >>
	This is the content
	of the text block:

	  - Leading spaces and line breaks are preserved
	  - Right trim is applied
	  - Left trim is NOT applied

A node may optionally include a namespace:

Name (namespace.normal):
Name (@namespace.special):

4.1 Node name normalization

The node name is taken from the text between:

On that fragment, apply:

The result of this normalization is the node name.

A node whose logical name is the empty string ("") is invalid and MUST cause a parse error.

Equivalent examples at the Node name level:

Node name:
Node name: value
Node  name   : value
Node  name (@a.special.namespace):
Node name(a.normal.namespace):
Node  name >>
Node name>>

The definition of a node must always include either : (INLINE container node) or >> (BLOCK text node), always preceded by a non-empty name.
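
The equivalences above can be illustrated with a minimal Python sketch. It assumes the line has already had its indentation removed, and it takes the namespace syntax from the informal grammar in Appendix A. The names `NodeLine`, `NODE_RE`, and `parse_node_line` are hypothetical and not part of this specification.

```python
import re
from typing import NamedTuple, Optional


class NodeLine(NamedTuple):
    name: str                 # normalized node name
    namespace: Optional[str]  # e.g. "a.normal.namespace" or "@a.special.namespace"
    operator: str             # ":" (INLINE) or ">>" (BLOCK)
    value: str                # trimmed inline value, "" for block nodes


# Assumed shape of a node line: name, optional "(namespace)", then ":" or ">>".
NODE_RE = re.compile(
    r"^(?P<name>[^:()>]*?)\s*"
    r"(?:\((?P<ns>@?[a-z0-9]+(?:\.[a-z0-9]+)*)\))?\s*"
    r"(?P<op>:|>>)(?P<rest>.*)$"
)


def parse_node_line(line: str) -> NodeLine:
    """Parse one node line whose indentation has already been removed."""
    match = NODE_RE.match(line)
    if match is None:
        raise ValueError(f"not a valid node line: {line!r}")
    # Trim the name and compact inner runs of whitespace to one space.
    name = " ".join(match["name"].split())
    if name == "":
        raise ValueError(f"empty node name: {line!r}")
    rest = match["rest"].strip(" \t")
    if match["op"] == ">>" and rest:
        raise ValueError(f"inline content after '>>': {line!r}")
    return NodeLine(name, match["ns"], match["op"], rest)
```

With this sketch, `parse_node_line("Node  name   : value")` and `parse_node_line("Node name: value")` yield the same name, "Node name", while a line containing only `:` or `>>` raises an error.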

4.2 Node name restrictions

The node name will only allow alphanumeric characters and the characters -, _, . Names with diacritics, uppercase and lowercase are allowed.

4.3 Canonical node name

The canonical name is formed from the node name through the following process:

The canonical name determines whether two nodes have the same name. All search and check operations also use it internally to decide whether two elements are the same.

Transformation examples:

A namé with äccent: a-name-with-accent
A NAME with äccent: a-name-with-accent
SIZe number 2__ and 3: size-number-2-and-3
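
A possible canonicalization can be sketched in Python as follows. The steps are an assumption inferred from the transformation examples: remove diacritics, convert to lowercase, and collapse every run of other characters into a single `-`. The function name `canonical_name` is hypothetical.

```python
import re
import unicodedata


def canonical_name(name: str) -> str:
    """Sketch of a canonical name; the exact steps are an assumption."""
    # Decompose accented letters (NFD) and drop the combining marks.
    decomposed = unicodedata.normalize("NFD", name)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, then collapse each run of non-alphanumerics into one "-".
    dashed = re.sub(r"[^a-z0-9]+", "-", unaccented.lower())
    # Drop any "-" left at the start or end.
    return dashed.strip("-")
```

Under these assumptions, `canonical_name("A namé with äccent")` returns `"a-name-with-accent"`.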

4.4 Style rules

The recommended style rules are as follows:

Examples of correct style:

Name with value: The value
Name without value:
Name with namespace (the.namespace):
Text node >>

5. Nodes with `:` (container nodes, allow inline value)

The form with : defines a node that:

Examples:

Title: Report
Author: Joan
Node:
Node: Value
Node:
    SubNode 1: 123
    Another subnode: 456

5.1 Value normalization

The (INLINE) value of a node must be normalized with a trim (right and left).

Example:

Name: value 1
Name:    value 1
# in both cases, the inline value of Name is "value 1", even though in the
# second there are spaces before and after.

Strong normalization applies only to structural identifiers. Values are literals, although a simple normalization is applied: left and right trim.

6. Nodes with `>>` (text block)

The form with >> defines a literal text block.

Valid examples:

Description >>
    Line 1
    Line 2
Section>>
    Accepts the operator without a space

6.1 Formal rules

6.2 Example

Block >>
    Text
        Child: value YES allowed, it is text, it is not parsed
        Another child: YES allowed
    # This is also text
Next Node: value

In this example:

7. Namespaces

A namespace is optional and is specified like this:

Node (com.example.docs):
Another node (another.namespace):
More nodes (@a.special.name):

Rules:

8. Indentation and Hierarchy

Indentation defines the structured hierarchy of the document.
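
As an illustration, the following Python sketch turns a sequence of (level, text) pairs into a tree. It assumes that each node becomes a child of the nearest preceding node one level shallower, and that the level of each line has already been computed. The names `Node` and `build_tree` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    text: str
    children: List["Node"] = field(default_factory=list)


def build_tree(lines: List[Tuple[int, str]]) -> List[Node]:
    """Turn (level, text) pairs into a forest using a stack of open nodes."""
    roots: List[Node] = []
    stack: List[Node] = []  # stack[i] is the currently open node at level i
    for level, text in lines:
        if level > len(stack):
            # A line may be at most one level deeper than the previous node.
            raise ValueError(f"indentation jump to level {level}: {text!r}")
        del stack[level:]  # close every node at this level or deeper
        node = Node(text)
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)
    return roots
```

A stack keeps the sketch linear in the number of lines, and the jump check falls out of comparing the new level with the stack depth.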

8.1 Allowed Indentation

An STXT document:

8.2 Special indentation examples

In the following examples, . represents a space and |--> represents a tab. A tab is drawn with only as many characters as it takes to reach the next tab stop, as a text editor would display it.

Example with tabs:

Node level 0: Value level 0
|-->Node level 1:
|-->Another node level 1:
|-->|-->Level 2:
|-->|-->Level 2:
|-->Level 1:
|-->Level 1:

Example with spaces:

Node level 0: Value level 0
....Node level 1:
....Another node level 1:
........Level 2:
........Level 2:
....Level 1:
....Level 1:

Example with a mix of spaces and tabs.

Allowed, though not recommended by style. A parser MAY warn about mixing on the same line. This example has the same indentation as the two previous ones.

Node level 0: Value level 0
.|->Node level 1: Space + 1 TAB: level 1
..|>Another node level 1: 2 Spaces + 1 TAB: level 1
...>..|>Level 2: 3 Spaces + 1 TAB, 2 Spaces + 1 TAB: level 2
|-->....Level 2: 1 TAB, 4 Spaces: level 2
..|>Level 1: 2 Spaces + 1 TAB: level 1
.|->Level 1: 1 Space + 1 TAB: level 1

8.3 Level errors

A parser MUST raise a parse error in the following cases:

Level 0:
....Level 1:
............Level 3: ERROR, you cannot go from level 1 to level 3

Level 0:
....Level 1:
...Almost level 1: ERROR: 3 spaces (does not reach 4)

Level 0:
....Level 1:
.|->..Almost level 2: ERROR: 1 space + 1 TAB, 2 spaces

Level 0:
....Level 1:
..........More than level 2: ERROR: 4 spaces, 4 spaces, 2 spaces
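
The indentation examples in sections 8.2 and 8.3 are consistent with the following Python sketch. It assumes the counting rule summarized in Appendix A: four spaces complete a level, a tab completes a level and discards fewer than four pending spaces, and leftover spaces are an error. The function name `indent_level` is hypothetical, and level jumps between lines have to be checked separately by the caller.

```python
def indent_level(line: str) -> int:
    """Return the indentation level of a non-empty line, or raise ValueError."""
    level = 0
    pending = 0  # spaces seen since the last completed level
    for ch in line:
        if ch == " ":
            pending += 1
            if pending == 4:  # four spaces complete one level
                level += 1
                pending = 0
        elif ch == "\t":
            level += 1   # a tab completes a level ...
            pending = 0  # ... and discards fewer than 4 pending spaces
        else:
            break  # first character of the node itself
    if pending:
        raise ValueError(f"{pending} stray space(s) in indentation: {line!r}")
    return level
```

For example, `indent_level("   \t  \tLevel 2:")` (3 spaces + tab, 2 spaces + tab) returns 2, while a line indented with exactly 3 spaces raises an error.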

8.4 Hierarchy

9. Comments

Outside >> blocks, a line is a comment if, after its indentation, the first character is #.

Example:

# Root comment
Node:
    # Inner comment

9.1 Comments inside `>>` blocks

Inside a >> block:

Example:

# A normal comment (level 0)
Document:
	# Another normal comment
			# This is also a comment! Outside >> block
    Text >>
        # This is text
        Normal line
            # This is also normal text
    # This is a comment
# This is also a comment
    	Here the node text continues

9.2 Comment style

10. Whitespace normalization

This section defines how whitespace must be normalized to ensure that different implementations produce the same logical representation from the same STXT text.

10.1 Inline values (`:`)

When parsing a node that uses the : form:

  1. The parser takes all characters from immediately after : to the end of the line.

  2. The inline value MUST be normalized by applying:

    • Removal of leading spaces and tabs (left trim).
    • Removal of trailing spaces and tabs (right trim).

This implies that the following lines are equivalent at the parsing level:

Name: Joan
Name:     Joan
Name: Joan
Name:     Joan

In all cases, the logical value of the Name node is "Joan".

If after trim the value is empty, the inline value is considered the empty string ("").
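
A minimal Python sketch of this normalization follows. It assumes the line is already known to be a `:` node, so the first `:` separates the name from the value. The function name `inline_value` is hypothetical.

```python
def inline_value(line: str) -> str:
    """Return the normalized inline value of a ':' node line."""
    # Everything after the first ":" up to the end of the line is the raw value.
    _, separator, raw = line.partition(":")
    if not separator:
        raise ValueError(f"no ':' in line: {line!r}")
    # Left and right trim of spaces and tabs only.
    return raw.strip(" \t")
```

Passing `" \t"` to `strip` restricts the trim to spaces and tabs, matching the rule above; any other character is kept as part of the literal value.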

10.2 Lines inside `>>` blocks

For each line that belongs to a >> block:

  1. The parser determines the content of the line from the text that follows the minimum indentation of the block (i.e., it removes only the block indentation, but preserves any additional indentation as part of the text).
  2. On that content, the parser MUST remove all trailing spaces and tabs (right trim).
  3. Empty lines are always preserved. Real comment lines, whose indentation is lower than that of the block, are excluded from the content.
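
The steps above can be sketched in Python as follows. This is a simplification that assumes pure-space indentation with 4 spaces per level and that the caller passes only the lines belonging to the block. The function name `block_content` and its parameters are hypothetical.

```python
from typing import List


def block_content(raw_lines: List[str], content_level: int) -> List[str]:
    """Normalize the lines of a >> block (assumes 4-space indentation)."""
    prefix = "    " * content_level  # indentation shared by the block content
    content = []
    for raw in raw_lines:
        text = raw.rstrip(" \t")  # right trim only
        if text == "":
            content.append("")  # empty lines are preserved
        elif text.startswith(prefix):
            # Remove only the block indentation; extra indentation is text.
            content.append(text[len(prefix):])
        elif text.lstrip(" \t").startswith("#"):
            continue  # real comment with lower indentation: not content
        else:
            raise ValueError(f"line is not part of the block: {raw!r}")
    return content
```

For a block node at level 0, `block_content(["    Hello", "        World"], 1)` returns `["Hello", "    World"]`.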

Example of line canonicalization:

Block >>
    Hello
        World

Logical representation of the block content:

  1. "Hello"
  2. "    World"

10.3 Empty lines in `>>` blocks

Example:

Text >>
    Line 1

    Line 2

Logical content of the block:

  1. "Line 1"
  2. ""
  3. "Line 2"

11. Error Rules

A document is invalid if any of these conditions occur:

  1. Spaces that are not multiples of 4 (when spaces are used for indentation).
  2. Jumps in indentation levels.
  3. A >> node contains meaningful inline content on the same line as >>.
  4. A node contains neither : nor >>.

A conforming parser MUST reject the document.

12. Conformance

An STXT implementation is conforming if:

13. File Extension and Media Type

13.1 File Extension

STXT documents SHOULD use the extension: .stxt

13.2 Media Type (MIME)

14. Normative Examples

14.1 Valid document

Document (com.example.docs):
    Author: Joan
    Date: 2025/12/03
    Summary >>
        This is a text block.
        With multiple lines.
    Config:
        Mode: Active

14.2 Block with empty lines

Text>>

    Line 2

Logical content of the block:

  1. ""
  2. "Line 2"

14.3 Comments inside and outside blocks

Document:
    Body >>
        # This is text
        More text
    # This is a comment
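
The way a parser might tell comments from block text in this example can be sketched in Python. The sketch assumes pure-space indentation with 4 spaces per level and does not validate indentation. The function name `classify` and the labels it returns are hypothetical.

```python
from typing import List, Optional


def classify(lines: List[str]) -> List[str]:
    """Label each line as 'node', 'text', 'comment' or 'empty'."""
    kinds = []
    block_level: Optional[int] = None  # level of the open >> node, if any
    for line in lines:
        stripped = line.strip(" \t")
        level = (len(line) - len(line.lstrip(" "))) // 4
        if stripped == "":
            kinds.append("empty")
        elif block_level is not None and level > block_level:
            kinds.append("text")  # inside the block, even if it starts with "#"
        elif stripped.startswith("#"):
            kinds.append("comment")  # a comment does not close an open block
        else:
            kinds.append("node")
            # A block start ends with ">>" and contains no ":".
            is_block = stripped.endswith(">>") and ":" not in stripped
            block_level = level if is_block else None
    return kinds
```

Applied to the five lines of the example, the sketch yields `["node", "node", "text", "text", "comment"]`.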

15. Security Considerations

STXT has been designed with parsing security as a fundamental priority, minimizing the attack surface compared to other structured textual formats.

A conforming STXT parser is inherently resistant to common classes of vulnerabilities:

Consequently, STXT is especially suitable for processing documents from untrusted sources (remote configurations, user input, data exchange) where parser security is critical.

Implementations MUST reject invalid documents according to section 11 and MUST NOT introduce extensions that allow external loading or dynamic evaluation without explicit security measures.

16. Appendix A — Grammar (Informal)

Document          = { Line }

Line              = [Indentation] ( Comment | Node | BlockContinuation | EmptyLine )

Node              = Name [Namespace] ( Inline | BlockStart )
Inline            = ":" [Space] [InlineText]
BlockStart        = [Space] ">>" [TrailingSpaces]

Namespace         = "(" ["@"] Ident { "." Ident } ")"
Ident             = [a-z0-9]+   ; lowercase and numbers only according to style and normalization rules

Comment           = "#" { any character until end of line }

BlockContinuation = IndentationGreaterThanPreviousBlock { any text }   ; literal text

Indentation       = Allowed mix of spaces and tabs according to section 8
                    - Pure spaces: exact multiples of 4 per level
                    - Pure tabs: 1 tab = 1 level
                    - Mixed in line: tab wins, spaces <4 are ignored

Name              = Normalized text (trim + space compaction) according to section 4.1

Key notes for implementers:

17. Appendix B — Interaction with `@stxt.schema`

The schema system allows adding semantic validation to STXT documents without modifying the base syntax of the language.

The STXT core does not define how an implementation should react: the behavior belongs exclusively to the schema system (STXT-SCHEMA-SPEC).

A schema is an STXT document whose namespace is: @stxt.schema

and whose goal is to define the structural rules, value types, and cardinalities of nodes belonging to a specific namespace.

The STXT core does not interpret these rules; it only defines how they are expressed and how they are combined via namespaces.

17.1. Associating a schema to a namespace

To associate a schema to the namespace com.example.docs, write a document:

Schema (@stxt.schema): com.example.docs
	Node: Email
		Children:
			Child: From
			Child: To
			Child: Cc
			Child: Bcc
			Child: Title
				Max: 1
			Child: Body Content
				Min: 1
				Max: 1
			Child: Metadata (com.google)
				Max: 1
	Node: From
	Node: To
	Node: Cc
	Node: Bcc
	Node: Title
	Node: Body Content
		Type: TEXT

17.2. Application to STXT documents

A document that declares the same namespace:

Document (com.example.docs):
    Field1: value
    Text: one
    Text: two

can be validated by an implementation that supports STXT schemas:

17.3. Core independence

STXT MUST NOT impose semantic rules coming from schemas. The schema system is a separate and optional component that operates on the already-parsed STXT.

The schema system MAY also act as part of the parsing process. In that case it SHOULD be only weakly coupled to the parser. This allows errors to be detected without waiting until the end of parsing.

18. Appendix C — Interaction with `@stxt.template`

The template system allows adding semantic validation to STXT documents without modifying the base syntax of the language.

The STXT core does not define how an implementation should react: the behavior belongs exclusively to the template system (STXT-TEMPLATE-SPEC).

A template is an STXT document whose namespace is: @stxt.template

and whose goal is to define the structural rules, value types, and cardinalities of nodes belonging to a specific namespace.

The template system is analogous to the schema system but uses a simplified syntax oriented toward rapid prototyping. Even so, it is a perfectly valid system for all kinds of documents. It can be considered syntactic sugar, since internally it can use the same representation as a schema.

The template system MAY coexist alongside a schema system, since in the end a Template defines the same information as a schema.

18.1. Associating a template to a namespace

To associate a schema to the namespace com.example.docs with templates, write a document:

Template (@stxt.template): com.example.docs
	Structure >>
		Email:
			From:
			To:
			Cc:
			Bcc:
			Title: (?)
			Body    Content: (1) TEXT
			Metadata (com.google): (?)

Once declared, templates fulfill the same function as schemas. A standard validator SHOULD prioritize a schema over a template.

19. End of Document