STXT - Semantic Text
Built for humans. Reliable for machines.

STXT Documents

1. Introduction
2. Terminology
3. Document Encoding
4. Syntactic Unit: Node
5. Container nodes, INLINE type
6. Text block nodes, BLOCK type
7. Namespaces
8. Indentation and Hierarchy
9. Comments
10. Whitespace normalization
11. Error Rules
12. Conformance
13. File Extension and Media Type
14. Normative Examples
15. Security Considerations
16. Appendix A — Grammar (Informal)
17. Appendix B — Interaction with `@stxt.schema`
18. Appendix C — Interaction with `@stxt.template`
19. End of Document

1. Introduction

This document defines the specification of the STXT (Semantic Text) language.

STXT is a Human-First language, designed so that its natural form is readable, clear, and comfortable for people, while at the same time maintaining a precise and easily machine-processable structure.

STXT is a hierarchical and semantic textual format oriented to:

This document describes the base syntax of the language.

2. Terminology

The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY" are to be interpreted as described in RFC 2119.

3. Document Encoding

An STXT document SHOULD be encoded in UTF-8 without BOM.

A parser:

4. Syntactic Unit: Node

Each non-empty line of the document that is neither a comment nor part of a >> block defines a node.

There are two forms of node:

  1. Inline container node (INLINE node): Node name: Inline value
  2. Text block node (BLOCK node): Node name >>

The node name cannot be empty. A line with only : or >> is not valid.

Example with INLINE nodes:

Node 1: Inline value
	Node 2 without value:
	Node 3 with another value: this is the other value

Example with BLOCK node:

Block node >>
	This is the content
	of the text block:

	  - Leading spaces and line breaks are preserved
	  - Right trim is applied
	  - Left trim is NOT applied

A node may optionally include a namespace:

Name (normal.namespace):
Name (@special.namespace):

4.1 Normalization of the node name

The node name is taken from the text between:

On that fragment, the following is applied:

The result of this normalization is the node name.

A node whose logical name is the empty string ("") is invalid and MUST cause a parse error.

Equivalent examples at the Node name level:

Node name:
Node name: value
Node  name   : value
Node  name (@a.special.namespace):
Node name(a.normal.namespace):
Node  name >>
Node name>>

The definition of a node MUST always include either : (INLINE container node) or >> (BLOCK text node), always preceded by a non-empty name.
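The rules of this section can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name parse_node_line and the returned tuple are hypothetical, indentation is assumed to have been removed already, and the first : or >> on the line is assumed to be the operator.

```python
import re

# Hypothetical pattern: a name followed by one parenthesized namespace.
_NAMESPACE = re.compile(r"^(?P<name>[^()]*)\((?P<ns>[^()]*)\)\s*$")


def parse_node_line(line: str):
    """Return (name, namespace, kind, value) for one node line.

    Assumption: the first ':' or '>>' found on the line is the operator.
    """
    colon = line.find(":")
    block = line.find(">>")
    if colon == -1 and block == -1:
        raise ValueError("node contains neither ':' nor '>>'")
    if block != -1 and (colon == -1 or block < colon):
        kind, head, tail = "BLOCK", line[:block], line[block + 2:]
        if tail.strip(" \t"):
            raise ValueError("inline content on the same line as '>>'")
        value = None
    else:
        kind, head, tail = "INLINE", line[:colon], line[colon + 1:]
        value = tail.strip(" \t")  # left and right trim of the inline value
    namespace = None
    match = _NAMESPACE.match(head)
    if match:
        head, namespace = match.group("name"), match.group("ns").strip()
    name = " ".join(head.split())  # trim + compaction of inner whitespace
    if not name:
        raise ValueError("empty node name")
    return name, namespace, kind, value
```

With this sketch, all the equivalent examples above yield the name "Node name", and a line containing only : or >> raises an error.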

4.2 Restrictions on the node name

A node name can contain only alphanumeric characters and the characters -, _, and . (hyphen, underscore, and period). Letters with diacritics, uppercase letters, and lowercase letters are all allowed.

4.3 Canonical node name

The canonical name is formed from the node name through the following process:

The canonical name determines whether two nodes have the same name. All search and checking operations also use it internally to decide whether two names refer to the same element.

Examples of transformation:

A namé with äccent: a-name-with-accent
A NAME with äccent: a-name-with-accent
SIZe number 2__ and 3: size-number-2-and-3
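A minimal Python sketch consistent with the accent examples above. The function name canonical_name and the exact steps (remove diacritics, lowercase, collapse every run of other characters into a single hyphen) are assumptions inferred from the examples, not the normative process.

```python
import re
import unicodedata


def canonical_name(name: str) -> str:
    """Derive a canonical name from a node name (assumed steps, see lead-in)."""
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    lowered = without_marks.lower()
    # Collapse every run of characters outside [a-z0-9] into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", lowered).strip("-")
```

Under these assumptions, "A namé with äccent" and "A NAME with äccent" both map to "a-name-with-accent", so the two names are treated as the same element.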

4.4 Style rules

The recommended style rules are the following:

Examples of correct style:

Name with value: The value
Name without value:
Name with namespace (the.namespace):
Text node >>

5. Container nodes, INLINE type

The form with : defines an INLINE container node with the following characteristics:

Examples:

Title: Report
Author: Joan
Node:
Node: Value
Node:
    SubNode 1: 123
    Another subnode: 456

5.1 Value normalization

The (INLINE) value of a node must be normalized with a trim (right and left).

Example:

Name: value 1
Name:    value 1
# in both cases, the inline value of Name is "value 1", although in the
# second there are spaces before and after.

Strong normalization applies only to structural identifiers. Values are literal, although a simple normalization is applied: right and left trim.

6. Text block nodes, BLOCK type

The form with >> defines a block of literal text.

Valid examples:

Description >>
    Line 1
    Line 2
Section>>
    Accepts the operator without space

6.1 Formal rules

6.2 Example

Block >>
    Text
        Child: value YES allowed, it is text, it is not parsed
        Another child: YES allowed
    # This is also text
Next Node: value

In this example:

7. Namespaces

A namespace is optional and is specified like this:

Node (com.example.docs):
Another node (another.namespace):
More nodes (@a.special.name):

Rules:

8. Indentation and Hierarchy

Indentation defines the structured hierarchy of the document.

8.1 Allowed indentation

An STXT document:

8.2 Special indentation examples

In the following examples, . is shown to identify a space, and |--> to identify a tab. The tab is represented as occupying up to the next column that is a multiple of 4, as a text editor would do.

Example with tabs:

Level 0 node: Level 0 value
|-->Level 1 node:
|-->Another level 1 node:
|-->|-->Level 2:
|-->|-->Level 2:
|-->Level 1:
|-->Level 1:

Example with spaces:

Level 0 node: Level 0 value
....Level 1 node:
....Another level 1 node:
........Level 2:
........Level 2:
....Level 1:
....Level 1:

Example with a mix of spaces and tabs.

Mixing is allowed, although the style rules do not recommend it. A parser MAY emit a warning when spaces and tabs are mixed on the same line. This example has the same indentation levels as the previous two:

Level 0 node: Level 0 value
.|-->Level 1 node: 1 space + 1 TAB: level 1
..|-->Another level 1 node: 2 spaces + 1 TAB: level 1
...|-->..|-->Level 2: 3 spaces + 1 TAB, 2 spaces + 1 TAB: level 2
|-->....Level 2: 1 TAB, 4 Spaces: level 2
..|-->Level 1: 2 spaces + 1 TAB: level 1
.|-->Level 1: 1 space + 1 TAB: level 1

8.3 Level errors

A parser MUST give a parse error in the following cases:

Level 0:
....Level 1:
............Level 3: ERROR, you cannot go from level 1 to level 3

Level 0:
....Level 1:
...Almost level 1: ERROR: 3 spaces (does not reach 4)

Level 0:
....Level 1:
.|-->..Almost level 2: ERROR: 1 space + 1 TAB, 2 spaces

Level 0:
....Level 1:
..........More than level 2: ERROR: 4 spaces, 4 spaces, 2 spaces
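The column-based calculation and the level errors above can be sketched in Python. This is an illustration under stated assumptions: the names TAB_WIDTH, indentation_level, and node_levels are hypothetical, the input contains node lines only (no comments and no >> block text), and the first node is assumed to sit at level 0.

```python
TAB_WIDTH = 4  # columns per indentation level, as in section 8.2


def indentation_level(line: str) -> int:
    """Return the indentation level of a line, computed by columns."""
    column = 0
    for ch in line:
        if ch == " ":
            column += 1
        elif ch == "\t":
            # A tab advances to the next column that is a multiple of 4.
            column += TAB_WIDTH - (column % TAB_WIDTH)
        else:
            break
    if column % TAB_WIDTH != 0:
        raise ValueError(
            f"indentation of {column} columns is not a multiple of {TAB_WIDTH}"
        )
    return column // TAB_WIDTH


def node_levels(node_lines: list[str]) -> list[int]:
    """Return the level of each node line, rejecting jumps of more than one level."""
    levels = []
    previous = -1  # assumption: the first node must be at level 0
    for line in node_lines:
        level = indentation_level(line)
        if level > previous + 1:
            raise ValueError(f"level jump to {level} is not allowed here")
        levels.append(level)
        previous = level
    return levels
```

For instance, a line indented with 1 space + 1 tab is at level 1, while 3 spaces, or 1 space + 1 tab + 2 spaces, raise an error because the resulting column is not a multiple of 4.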

8.4 Hierarchy

9. Comments

Outside >> blocks, a line is a comment if, after its indentation, the first character is #.

Example:

# Root comment
Node:
    # Inner comment

9.1 Comments inside `>>` blocks

Inside a >> block:

Example:

# A normal comment (level 0)
Document:
	# Another normal comment
			# This is also a comment! Outside block >>
    Text >>
        # This is text
        Normal line
            # This is also normal text
    # This one is a comment
# This is also a comment
    	Here the text of the node continues
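One reading of the example above, sketched in Python. The behavior is inferred from the examples in this document only and is an assumption, not a normative rule: a line indented at least as far as the block content is text, a less-indented line starting with # is a comment that does not close the block, an empty line is text, and any other less-indented line ends the block. The names leading_columns and classify_block_lines are hypothetical.

```python
def leading_columns(line: str, tab_width: int = 4) -> int:
    """Count indentation columns; a tab advances to the next multiple of tab_width."""
    column = 0
    for ch in line:
        if ch == " ":
            column += 1
        elif ch == "\t":
            column += tab_width - (column % tab_width)
        else:
            break
    return column


def classify_block_lines(lines, header_level: int, tab_width: int = 4):
    """Label the lines after a '>>' header as 'text', 'comment', or 'end'."""
    content_columns = (header_level + 1) * tab_width
    labels = []
    for line in lines:
        stripped = line.strip(" \t")
        if not stripped:
            labels.append("text")  # empty lines stay part of the block
        elif leading_columns(line, tab_width) >= content_columns:
            labels.append("text")  # includes lines that start with '#'
        elif stripped.startswith("#"):
            labels.append("comment")  # assumed not to close the block
        else:
            labels.append("end")  # a less-indented node line closes the block
            break
    return labels
```

Applied to the lines that follow Text >> in the example (header at level 1), the sketch labels the first three lines as text, the next two as comments, and the last line as text again.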

9.2 Style for comments

10. Whitespace normalization

This section defines how whitespace must be normalized in order to guarantee that different implementations produce the same logical representation from the same STXT text.

10.1 Inline values (`:`)

When parsing an INLINE node (the form with :):

  1. The parser takes all characters from immediately after : to the end of the line.

  2. The inline value MUST be normalized by applying:

    • Removal of leading spaces and tabs (left trim).
    • Removal of trailing spaces and tabs (right trim).

This implies that the following lines are equivalent at the parsing level:

Name: Joan
Name:     Joan
Name: Joan
Name:     Joan

In all cases, the logical value of the Name node is "Joan".

If after the trim the value is empty, the inline value is considered the empty string ("").
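A minimal Python sketch of this normalization. The function name inline_value is illustrative, and splitting at the first : on the line is an assumption.

```python
def inline_value(line: str) -> str:
    """Return the normalized inline value of an INLINE node line."""
    # Take everything after the first ':' up to the end of the line.
    _, _, rest = line.partition(":")
    # Left and right trim of spaces and tabs only.
    return rest.strip(" \t")
```

Here inline_value("Name:     Joan   ") and inline_value("Name: Joan") both return "Joan", and inline_value("Name:") returns the empty string.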

10.2 Lines inside `>>` blocks

For each line that belongs to a >> block:

  1. The parser determines the content of the line from the text that follows the minimum indentation of the block (that is, it removes only the block indentation, but preserves any additional indentation as part of the text).
  2. On that content, the parser MUST remove all trailing spaces and tabs (right trim).
  3. Empty lines are preserved in all cases.
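The three steps above can be sketched in Python as follows. The function name strip_block_indent and its parameters are hypothetical, and content_level is assumed to be the level of the >> node plus one.

```python
def strip_block_indent(line: str, content_level: int, tab_width: int = 4) -> str:
    """Remove only the block indentation from a line, then right-trim it."""
    target = content_level * tab_width
    column = 0
    index = 0
    # Consume indentation characters until the block's column is reached.
    while index < len(line) and column < target:
        ch = line[index]
        if ch == " ":
            column += 1
        elif ch == "\t":
            column += tab_width - (column % tab_width)
        else:
            break
        index += 1
    # Extra indentation beyond the block column is kept as part of the text.
    return line[index:].rstrip(" \t")


def block_content(lines, content_level: int):
    """Return the logical lines of a block; empty lines are preserved as ''."""
    return [strip_block_indent(line, content_level) for line in lines]
```

For a block at level 0, block_content(["    Hello", "        World"], 1) keeps the four extra spaces in front of World and drops only the four columns that belong to the block.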

Example of line canonicalization:

Block >>
    Hello
        World

Logical representation of the block content:

  1. "Hello"
  2. "    World"

10.3 Empty lines in `>>` blocks

Example:

Text >>
    Line 1

    Line 2

Logical content of the block:

  1. "Line 1"
  2. ""
  3. "Line 2"

11. Error Rules

A document is invalid if any of these conditions occurs:

  1. Space-based indentation whose width is not a multiple of 4.
  2. Jumps in indentation levels.
  3. A >> node contains significant inline content on the same line as >>.
  4. A node contains neither : nor >>.

A conforming parser MUST reject an invalid document.

12. Conformance

An STXT implementation is conforming if:

13. File Extension and Media Type

13.1 File Extension

STXT documents SHOULD use the extension: .stxt

13.2 Media Type (MIME)

14. Normative Examples

14.1 Valid document

Document (com.example.docs):
    Author: Joan
    Date: 2025/12/03
    Summary >>
        This is a text block.
        With several lines.
    Config:
        Mode: Active

14.2 Block with empty lines

Text>>

    Line 2

Logical content of the block:

  1. ""
  2. "Line 2"

14.3 Comments inside and outside blocks

Document:
    Body >>
        # This is text
        More text
    # This one is a comment

15. Security Considerations

STXT has been designed with parsing security as a fundamental priority, minimizing the attack surface compared to other structured textual formats.

A conforming STXT parser is inherently resistant to common classes of vulnerabilities:

Consequently, STXT is especially suitable for processing documents from untrusted sources (remote configurations, user input, data exchange) where parser security is critical.

Implementations MUST reject invalid documents according to section 11 and MUST NOT introduce extensions that allow external loading or dynamic evaluation without explicit security measures.

16. Appendix A — Grammar (Informal)

Document          = { Line }

Line              = [Indentation] ( Comment | Node | BlockContinuation | EmptyLine )

Node              = Name [Namespace] ( Inline | BlockStart )
Inline            = ":" [Space] [InlineText]
BlockStart        = [Space] ">>" [TrailingSpaces]

Namespace         = "(" ["@"] Ident { "." Ident } ")"
Ident             = [a-z0-9]+   ; only lowercase letters and digits, according to style and normalization rules

Comment           = "#" { any character until end of line }

BlockContinuation = IndentationGreaterThanPreviousBlock { any text }   ; literal text

Indentation       = Allowed mix of spaces and tabs according to section 8
                    - Pure spaces: exact multiples of 4 per level
                    - Pure tabs: 1 tab = 1 level
                    - Mixed on line: calculation by columns; each tab completes up to the next multiple of 4

Name              = Normalized text (trim + space compaction) according to section 4.1
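The Namespace and Ident productions map directly onto a regular expression. A minimal Python sketch follows; the names NAMESPACE_RE and is_valid_namespace are illustrative, and the pattern covers only the text between the parentheses.

```python
import re

# Optional '@', then one or more dot-separated identifiers of [a-z0-9]+.
NAMESPACE_RE = re.compile(r"@?[a-z0-9]+(?:\.[a-z0-9]+)*")


def is_valid_namespace(text: str) -> bool:
    """Check the text found between '(' and ')' against the informal grammar."""
    return NAMESPACE_RE.fullmatch(text) is not None
```

Under this pattern, com.example.docs and @stxt.schema are accepted, while an empty namespace, a trailing dot, or uppercase letters are rejected.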

Key notes for implementers:

17. Appendix B — Interaction with `@stxt.schema`

The schema system allows adding semantic validation to STXT documents without modifying the base syntax of the language.

The STXT core does not define how an implementation should react: the behavior belongs exclusively to the schema system (STXT-SCHEMA-SPEC).

A schema is an STXT document whose namespace is: @stxt.schema

and whose objective is to define the structural rules, value types, and cardinalities of the nodes belonging to a specific namespace.

The STXT core does not interpret these rules; it only defines how they are expressed and how they are combined through namespaces.

17.1 Associating a schema with a namespace

To associate a schema with the namespace com.example.docs, a document is written:

Schema (@stxt.schema): com.example.docs
	Node: Email
		Children:
			Child: From
			Child: To
			Child: Cc
			Child: Bcc
			Child: Title
				Max: 1
			Child: Body Content
				Min: 1
				Max: 1
			Child: Metadata (org.example.meta)
				Max: 1
	Node: From
	Node: To
	Node: Cc
	Node: Bcc
	Node: Title
	Node: Body Content
		Type: TEXT

17.2 Application to STXT documents

A document that declares the same namespace:

Document (com.example.docs):
    Field1: value
    Text: one
    Text: two

can be validated by an implementation that supports STXT schemas:

17.3 Core independence

STXT MUST NOT impose semantic rules coming from schemas. The schema system is a separate and optional component that operates on the already parsed STXT.

The schema system MAY also run as part of the parsing process, in which case it SHOULD be loosely coupled to the parser. This makes it possible to detect errors without waiting until the end of parsing.

18. Appendix C — Interaction with `@stxt.template`

The template system allows adding semantic validation to STXT documents without modifying the base syntax of the language.

The STXT core does not define how an implementation should react: the behavior belongs exclusively to the template system (STXT-TEMPLATE-SPEC).

A template is an STXT document whose namespace is: @stxt.template

and whose objective is to define the structural rules, value types, and cardinalities of the nodes belonging to a specific namespace.

The template system is analogous to schemas, but with a simplified syntax, oriented toward rapid prototypes. Even so, it is a perfectly valid system for all kinds of documents. It could be considered syntactic sugar, since internally it can use the same representation as a schema.

The template system MAY coexist alongside a system with schemas, since in the end a template defines the same information as a schema.

18.1 Associating a template with a namespace

To associate a template with the namespace com.example.docs, a document is written:

Template (@stxt.template): com.example.docs
	Structure >>
		Email (com.example.docs):
			From:
			To:
			Cc:
			Bcc:
			Title: (?)
			Body    Content: (1) TEXT
			Metadata (org.example.meta): (?)

Once defined, a template fulfills the same function as a schema. If an implementation finds several schemas or templates applicable to the same namespace, it SHOULD define a clear and deterministic priority policy. For a specific validation, a single effective semantic source MUST be selected: either a schema or a template.

19. End of Document