What are nodes in YAML?

In YAML, a node is a single unit of data: either one value or a collection of other nodes. Nodes play roughly the same role as elements in XML or values (objects, arrays, strings, numbers) in JSON, and together they make up the hierarchical content of a YAML document.

YAML's flexibility comes from combining these nodes to represent complex data in a human-readable format. A YAML document is usually pictured as a tree in which every piece of data, whether a simple value or a collection, is a node.

Types of YAML Nodes

YAML defines three kinds of nodes for organizing data:

Scalars: These are the simplest nodes, representing single, atomic values such as strings, numbers, booleans, or nulls. They do not contain other nodes.
- Examples:
```
name: John Doe
age: 30
isStudent: true
price: 9.99
description: This is a string scalar.
```
Mappings (Dictionaries/Objects): A mapping is YAML's equivalent of a dictionary or object in other programming languages. A mapping node is an unordered set of key-value pairs in which every key is unique. Both keys and values are themselves nodes, most often scalars, though either can also be a mapping or a sequence.
- Key Characteristics:
  - Keys are unique within a given mapping.
  - Values can be any other node type, allowing for nested structures.
  - Written as key: value pairs, with nested content indented under its key.
- Examples:
```
person:
  firstName: Alice
  lastName: Smith
  occupation: Engineer
```
Sequences (Lists/Arrays): These nodes represent an ordered collection of other nodes, similar to lists or arrays. Each item in a sequence is a separate node, which can be a scalar, mapping, or another sequence.
- Key Characteristics:
  - Items maintain their order.
  - In block style, each item starts with a hyphen followed by a space (- ).
- Examples:
```
fruits:
  - Apple
  - Banana
  - Cherry
tasks:
  - buy groceries
  - clean house
  - read book
```
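
To make the three kinds concrete, here is a minimal Python sketch. It assumes the YAML has already been loaded into native Python objects, which is what loaders such as PyYAML's yaml.safe_load return (dicts for mappings, lists for sequences, plain values for scalars). The function name node_kind is hypothetical and not part of any library, and the sample values are written by hand rather than parsed.

```python
from collections.abc import Mapping, Sequence


def node_kind(value):
    """Classify an already-loaded value as one of the three YAML node kinds."""
    # Python strings are sequences of characters, but they correspond to
    # YAML scalars, so they are ruled out before the Sequence check.
    if isinstance(value, (str, bytes)):
        return "scalar"
    if isinstance(value, Mapping):
        return "mapping"
    if isinstance(value, Sequence):
        return "sequence"
    # Everything else (int, float, bool, None, ...) is a single atomic value.
    return "scalar"


# Hand-written Python equivalents of the examples above.
print(node_kind("John Doe"))                     # scalar
print(node_kind(30))                             # scalar
print(node_kind({"firstName": "Alice"}))         # mapping
print(node_kind(["Apple", "Banana", "Cherry"]))  # sequence
```

The string check comes first on purpose: without it, a scalar such as John Doe would be reported as a sequence.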

Understanding Node Structure

The power of YAML nodes lies in their ability to nest, creating complex hierarchical data structures. A mapping can contain sequences, which in turn can contain other mappings or scalars, and so on. This nesting is primarily managed through indentation, where increased indentation signifies a deeper level in the node hierarchy.

Consider this example demonstrating nested nodes:

```
config:
  database:
    type: PostgreSQL
    host: localhost
    port: 5432
  users:
    - id: 1
      name: Alice
      email: alice@example.com  # placeholder address
    - id: 2
      name: Bob
      email: bob@example.com  # placeholder address
  settings:
    darkMode: true
    notifications:
      email: true
      sms: false
```

In this example:

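The document's root is a mapping node with a single key, config, whose value is also a mapping node.
database, users, and settings are keys within the config mapping. The values of database and settings are mapping nodes, and the value of users is a sequence node. Inside settings, notifications is a key whose value is another mapping node.
The users sequence contains two mapping nodes, each representing a user.
id, name, email, type, host, port, darkMode, email (under notifications), and sms are scalar nodes used as keys, and each of their values is a scalar node as well.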
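
The same breakdown can be produced mechanically. The sketch below walks a hand-written Python equivalent of the example (the structure a typical loader would return, with the email fields left out for brevity) and reports the kind of every value node. The names walk and config_doc and the $-prefixed path notation are illustrative choices only. Keys are nodes too, but this sketch reports only the values they point to.

```python
def walk(node, path="$"):
    """Yield (path, kind) for every value node, depth-first."""
    if isinstance(node, dict):
        yield path, "mapping"
        for key, value in node.items():
            yield from walk(value, f"{path}.{key}")
    elif isinstance(node, list):
        yield path, "sequence"
        for index, item in enumerate(node):
            yield from walk(item, f"{path}[{index}]")
    else:
        yield path, "scalar"


# Hand-written equivalent of the nested example (email fields omitted).
config_doc = {
    "config": {
        "database": {"type": "PostgreSQL", "host": "localhost", "port": 5432},
        "users": [
            {"id": 1, "name": "Alice"},
            {"id": 2, "name": "Bob"},
        ],
        "settings": {
            "darkMode": True,
            "notifications": {"email": True, "sms": False},
        },
    }
}

for path, kind in walk(config_doc):
    print(f"{kind:8} {path}")
```

Each step deeper in a printed path corresponds to one more level of indentation in the YAML source.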

Why Nodes Matter

Understanding YAML nodes is crucial for:

Parsing and Generating YAML: Knowing the node types helps in correctly interpreting or constructing YAML documents programmatically.
Data Validation: It allows for defining schemas that validate the structure and types of data within a YAML file.
Troubleshooting: Identifying specific node types can help pinpoint issues in configuration files or data serialization.

For a deeper dive into the YAML specification, you can refer to the YAML 1.2 Specification.