Graphviz: The DOT Language and Rendering Pipeline

Overview

Graphviz is the diagram substrate for every research page on this site. Eighty-five .dot files render to PNG via the dot layout engine. This page documents the language, the rendering pipeline, and the integration with Emacs org-babel for literate programming.

Installed version: 14.1.1 (FreeBSD pkg, Dec 2025). Current upstream: 14.1.4 (graphviz.org/download/source).

The DOT Language

Formal grammar

The DOT language grammar from graphviz.org/doc/info/lang.html:

graph     : [ 'strict' ] ('graph' | 'digraph') [ ID ] '{' stmt_list '}'
stmt_list : [ stmt [ ';' ] stmt_list ]
stmt      : node_stmt | edge_stmt | attr_stmt | ID '=' ID | subgraph
attr_stmt : ('graph' | 'node' | 'edge') attr_list
attr_list : '[' [ a_list ] ']' [ attr_list ]
a_list    : ID '=' ID [ (';' | ',') a_list ]
edge_stmt : (node_id | subgraph) edgeRHS [ attr_list ]
edgeRHS   : edgeop (node_id | subgraph) [ edgeRHS ]
node_stmt : node_id [ attr_list ]
node_id   : ID [ port ]
port      : ':' ID [ ':' compass_pt ] | ':' compass_pt
subgraph  : [ 'subgraph' [ ID ] ] '{' stmt_list '}'
compass_pt: 'n' | 'ne' | 'e' | 'se' | 's' | 'sw' | 'w' | 'nw' | 'c' | '_'
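
As a rough illustration of how these productions appear in practice, the sketch below exercises most of them in one small graph. The graph and node names are hypothetical and chosen only for the example.

```dot
strict digraph grammar_tour {          // graph: 'strict' 'digraph' ID '{' stmt_list '}'
    rankdir = LR;                      // stmt: ID '=' ID
    node [shape=box];                  // attr_stmt
    a [label="start"];                 // node_stmt with an attr_list
    a -> b -> c [color=gray40];        // edge_stmt with a chained edgeRHS
    a -> { d e };                      // edgeRHS whose target is an anonymous subgraph
    d:s -> e:n;                        // node_id with a compass_pt port
    subgraph cluster_tail { c; e; }    // subgraph with an ID
}
```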

ID types

Four forms of identifier:

Form           Syntax                               Example
Alphanumeric   [a-zA-Z_][a-zA-Z_0-9]*               my_node
Numeral        [-]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)     3.14
Quoted string  "..." with \" escape                 "Node A"
HTML string    <...> with matched angle brackets    <B>bold</B>

Quoted strings support concatenation with + and backslash-continued newlines. HTML strings do not.
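
A small sketch of the four ID forms used side by side, plus string concatenation. The node names are made up for illustration; the HTML label assumes a Graphviz build with HTML-label support, which is the usual case.

```dot
digraph id_forms {
    my_node;                                       // alphanumeric ID
    3.14;                                          // numeral ID
    "Node A";                                      // quoted string ID
    html [label=<<B>bold</B>>];                    // HTML string as an attribute value
    long [label="first half, " + "second half"];   // quoted strings joined with '+'
    my_node -> 3.14 -> "Node A" -> html -> long;
}
```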

Keywords

Case-insensitive: node, edge, graph, digraph, subgraph, strict.

Edge operators

Graph type   Operator
Directed     ->
Undirected   --

Comment syntax

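Three forms: /* ... */ block comments and // line comments, as in C and C++, plus lines beginning with #, which are treated as C preprocessor output and discarded.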
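
A minimal sketch showing each comment form in one file. The graph itself is a placeholder; note that the # form is kept at the start of the line, which is where the language spec describes it.

```dot
/* Block comment: may span
   several lines. */
digraph comments {
    // Line comment: runs to the end of the line.
    a -> b;  // A trailing line comment after a statement.
# A line beginning with '#' is treated as preprocessor output and discarded.
    b -> c;
}
```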

The // line comment is the form org-babel uses for tangle breadcrumbs.
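
For orientation, a tangled file with :comments link might look roughly like the sketch below. The org file name and heading are hypothetical, and the exact link text depends on the Org version and tangle settings.

```dot
// [[file:graphviz.org::*Rendering pipeline][Rendering pipeline:1]]
digraph pipeline {
    source -> parser -> layout -> renderer;
}
// Rendering pipeline:1 ends here
```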

These comments are harmless to the renderer: dot -Tpng ignores them, and dot -Tdot omits them from its canonical output. The breadcrumbs never affect the rendered image, but they let org-babel-detangle map tangled code back to its source block.

Attribute inheritance

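Default attributes propagate forward in the file: a node [...], edge [...], or graph [...] statement applies to objects declared after it, not to those declared before.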
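
A short sketch of that ordering rule. The node names and colors are illustrative and are not taken from the wal.sh palette.

```dot
digraph inheritance {
    early;                                // declared before any default: keeps the built-in ellipse
    node [shape=box, style=filled, fillcolor=lightblue];
    edge [color=gray40];
    a -> b;                               // a and b are filled boxes; the edge is gray
    node [shape=ellipse];                 // overrides only the shape from here on
    c;                                    // ellipse, still filled lightblue
    early -> c;                           // edge default still applies: gray
}
```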

This is why the wal.sh style guide places node [...] and edge [...] defaults at the top of every graph — they establish the palette for all nodes that follow.

Clusters

A subgraph whose name starts with cluster is laid out as a visual container by the dot engine. This is a convention, not a language rule — other layout engines may ignore it.
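
A minimal sketch contrasting a cluster with an ordinary subgraph, assuming the dot engine. All names are placeholders.

```dot
digraph clusters {
    subgraph cluster_frontend {      // "cluster" prefix: drawn as a labeled box
        label = "frontend";
        ui -> api_client;
    }
    subgraph cluster_backend {
        label = "backend";
        api -> db;
    }
    subgraph plain_group {           // no "cluster" prefix: no box is drawn
        cache;
    }
    api_client -> api;
    api -> cache;
}
```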

Rendering pipeline

diagram-rendering-pipeline.png

The pipeline: source text → libcgraph parser (builds an in-memory graph AST) → layout engine (assigns coordinates to every node and edge) → renderer (serializes to the target format).

dot -Tdot skips the renderer and emits the positioned graph as canonical DOT with coordinates — useful for debugging layout and for verifying that a source file parses correctly.

Layout engines

Engine     Algorithm                               Use case
dot        Hierarchical (Sugiyama)                 Directed graphs, layered diagrams (default for wal.sh)
neato      Spring model (Kamada-Kawai)             Undirected graphs, network topologies
fdp        Force-directed (Fruchterman-Reingold)   Large undirected graphs
sfdp       Scalable force-directed                 Very large graphs (10K+ nodes)
circo      Circular layout                         Ring topologies, cyclic structures
twopi      Radial layout                           Trees with a central root
osage      Clustered rectangles                    Treemaps, space-filling
patchwork  Squarified treemap                      Area-proportional visualization

All 85 wal.sh diagrams use dot (hierarchical). The rankdir attribute controls the primary axis: TB (top-to-bottom) for layer stacks, LR (left-to-right) for pipelines and state machines.
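
As a sketch of the rankdir choice, the pipeline-style graph below lays its ranks out left to right. The stage names are placeholders; changing LR to TB (the default) would stack the same nodes top to bottom.

```dot
digraph pipeline {
    rankdir = LR;                // ranks advance left to right
    node [shape=box];
    source -> parser -> layout -> renderer;
}
```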

Org-babel integration

ob-dot

Emacs ob-dot enables dot src blocks in org files:

#+begin_src dot :file diagram.png :exports results :cmdline -Tpng
digraph G { a -> b; }
#+end_src

:file is required — ob-dot writes to this path. :cmdline -Tpng passes flags to the dot command. :exports results shows only the image in HTML output, not the source.

Tangle/detangle workflow

The round-trip workflow for editing diagrams:

  1. Tangle: gmake tangle exports #+begin_src dot :tangle file.dot blocks from .org to standalone .dot files. With :comments link, the tangled file includes // [[file:...]] breadcrumbs.
  2. Edit: modify the .dot file directly (agents or manual).
  3. Compile: run dot -Tpng file.dot -o file.png to verify it renders.
  4. Detangle: gmake detangle reads the breadcrumbs and pulls changes back into the org src blocks.
  5. Verify: re-tangle to confirm the round-trip is stable.

For this to work, graphviz-dot-mode must set comment-start to "// " so that org-babel knows how to write the breadcrumb comments. This is configured in project-config.el:

(add-to-list 'org-babel-tangle-lang-exts '("dot" . "dot"))
(add-hook 'graphviz-dot-mode-hook
          (lambda () (setq-local comment-start "// ")
                     (setq-local comment-end "")))

Canonical validation

dot -Tdot parses a source file and emits the canonical form with computed positions. Comments are stripped. This is useful for:

  • Verifying a file parses without errors
  • Comparing two files for structural equivalence (ignore formatting)
  • Debugging layout issues (the pos attributes show exact coordinates)

Batch validation of all diagrams:

# Parse every diagram and report the files that dot rejects.
find site/research -name '*.dot' | while IFS= read -r f; do
  dot -Tdot "$f" > /dev/null 2>&1 || echo "FAIL: $f"
done

wal.sh diagram inventory

As of 2026-05-20: 85 dot files across site/research/, all rendering without errors. See diagram-style-guide for the canonical palette and the six diagram archetypes (pipeline, layer stack, comparison, state machine, process loop, ecosystem map).

Emacs built-in support (31.0.50)

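Emacs ships four built-in subsystems relevant to DOT, none of which requires an external package. graphviz-dot-mode, covered last in this section, is the one external addition.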

ob-dot (org-babel)

Built-in since Emacs 24. Evaluates #+begin_src dot blocks.

Header arg  Default     Purpose
:results    file        Output is always a file (image)
:exports    results     Show image, not source
:file       (required)  Output path, extension sets format
:cmdline    -T<ext>     Passed to dot command
:cmd        dot         Layout engine (neato, fdp, etc.)

The execute function writes the body to a temp file, calls dot <tmpfile> <cmdline> -o <outfile>, and returns the file path. Variables ($name) are interpolated from :var headers.

No session support — each evaluation is a fresh process.

ob-tangle + comment-start

Tangle writes DOT src blocks to standalone .dot files. With :comments link, breadcrumb comments enable detangle (round-trip).

Requires comment-start set to "// " in the target buffer. Our project-config.el registers this via graphviz-dot-mode-hook and adds ("dot" . "dot") to org-babel-tangle-lang-exts.

wisent (LALR parser generator)

Built-in since Emacs 23 (part of CEDET/Semantic). Accepts BNF grammars in .wy files, generates Emacs Lisp LALR(1) parsers.

The DOT grammar translates mechanically from the spec's EBNF:

EBNF               wisent BNF
[ X ]              opt_X : /* empty */ | X ;
( A | B )          Separate alternatives
: production op    Same syntax

Compile path: M-x semantic-grammar-batch-build-packages on a .wy file produces a *-wy.el parser table.

semantic-lex (lexer framework)

Built-in companion to wisent. Defines lexer rules as define-lex-regex-analyzer forms. The DOT lexer needs:

  • // line comments (trivial)
  • # preprocessor lines (trivial)
  • /* */ block comments (via syntax table)
  • Quoted strings with \" escape and backslash-NL continuation
  • HTML strings with balanced <...> nesting
  • Numeric IDs (-?(\.[0-9]+|[0-9]+(\.[0-9]*)?))
  • Case-insensitive keywords (graph = Graph = GRAPH)

The first three are trivial; the last four are the real work.

graphviz-dot-mode (external package)

Not built-in. Installed from MELPA via project-config.el. Provides:

  • Syntax highlighting for .dot files
  • comment-start / comment-end (needed for tangle breadcrumbs)
  • Indentation
  • compile integration (C-c C-c runs dot)

Wisent grammar for DOT

The full wisent grammar is at ../../research/graphviz/dot.wy (if tangled from the companion note). It matches the spec at graphviz.org/doc/info/lang.html production by production and has been verified against the 85 .dot files on this site.

Key design choice: compass_pt ({n, ne, e, …}) is folded into id rather than a separate token class. The closed set is enforced in a semantic pass, not the grammar. This avoids a shift-reduce conflict where a:n could be port-name or compass-point.
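
The ambiguity can be seen in a small sketch like the one below, where a record field is deliberately named n. The node and field names are hypothetical; the point is only that both edge statements are the same token sequence as far as the grammar is concerned.

```dot
digraph ports {
    node [shape=record];
    a [label="<out> out|<n> n"];     // record field deliberately named "n"
    b [label="<in> in"];
    a:out:e -> b:in:w;               // port ID followed by a compass_pt
    a:n -> b;                        // ':' ID or ':' compass_pt? The tokens alone cannot say
}
```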

Author: Jason Walsh

jwalsh@nexus

Last Updated: 2026-05-20 09:55:00

build: 2026-05-20 10:38 | sha: 5d12aea