Graphviz DOT: Spec BNF → Wisent Grammar
Table of Contents
1. Framing
graphviz lang.html publishes EBNF-flavored productions (':' as production operator, '[ ]' optional, '( | )' grouping, single-quoted terminals). wisent accepts BNF only, no EBNF sugar. Mechanical translation:
| EBNF | wisent |
|---|---|
| [ X ] | opt_X : * empty * | X ; |
| ( A | B ) | inline as separate alternatives |
| ':' op | ':' is the wisent rule operator; same shape |
Editing path: open the .wy in wisent-grammar-mode (built-in). Compile
path: M-x semantic-grammar-batch-build-packages produces dot-wy.el.
2. Grammar
;;; dot.wy -- DOT language grammar for wisent -*- mode: wisent-grammar -*-
;; spec: https://graphviz.org/doc/info/lang.html
%package dot-wy
%languagemode dot-mode
%start graph
%token <keyword> STRICT "strict"
%token <keyword> GRAPH "graph"
%token <keyword> DIGRAPH "digraph"
%token <keyword> NODE "node"
%token <keyword> EDGE "edge"
%token <keyword> SUBGRAPH "subgraph"
%token LBRACE "{"
%token RBRACE "}"
%token LBRACKET "["
%token RBRACKET "]"
%token SEMI ";"
%token COMMA ","
%token COLON ":"
%token EQ "="
%token ARROW "->"
%token DASH "--"
%token <symbol> ID ;; bare identifier or numeral
%token <string> QSTRING ;; double-quoted, may span lines via backslash-NL
%token <string> HSTRING ;; HTML-like, balanced <...>
%%
graph
: opt_strict graph_kind opt_id LBRACE stmt_list RBRACE
;
opt_strict : | STRICT ;
graph_kind : GRAPH | DIGRAPH ;
opt_id : | id ;
;; ID in the spec subsumes bare, numeric, quoted, and HTML strings.
id : ID | QSTRING | HSTRING ;
stmt_list
:
| stmt stmt_list
| stmt SEMI stmt_list
;
stmt
: node_stmt
| edge_stmt
| attr_stmt
| id EQ id
| subgraph
;
attr_stmt : attr_kind attr_list ;
attr_kind : GRAPH | NODE | EDGE ;
attr_list
: LBRACKET RBRACKET opt_attr_list
| LBRACKET a_list RBRACKET opt_attr_list
;
opt_attr_list : | attr_list ;
a_list
: id EQ id
| id EQ id sep
| id EQ id sep a_list
;
sep : SEMI | COMMA ;
edge_stmt : edge_lhs edge_rhs opt_attr_list ;
edge_lhs : node_id | subgraph ;
edge_rhs : edgeop edge_lhs
| edgeop edge_lhs edge_rhs
;
edgeop : ARROW | DASH ;
node_stmt : node_id opt_attr_list ;
node_id : id
| id port
;
port : COLON id
| COLON id COLON id
;
subgraph
: LBRACE stmt_list RBRACE
| SUBGRAPH LBRACE stmt_list RBRACE
| SUBGRAPH id LBRACE stmt_list RBRACE
;
%%
3. Lexer (sketch only)
wisent does not generate a lexer. semantic-lex (also bundled) is the intended companion. Minimal scaffolding below; QSTRING/HSTRING/numeric ID rules are the actual work and are left as named gaps.
;;; dot-lex.el -- semantic-lex rules for DOT -*- lexical-binding: t -*- (require 'semantic/lex) (define-lex-regex-analyzer dot-lex-line-comment "// to end of line" "//[^\n]*" (forward-comment 1)) (define-lex-regex-analyzer dot-lex-hash-line "C-pp passthrough lines starting with #" "^#[^\n]*" (forward-line 1)) ;; gaps: ;; - QSTRING: \" escape, backslash-newline continuation, '+' concat ;; - HSTRING: balanced <...> with nesting ;; - numeric ID: -?(\.[0-9]+|[0-9]+(\.[0-9]*)?) ;; - keyword case-folding: 'Graph' == 'graph' == 'GRAPH' (define-lex dot-lexer "DOT lexer." semantic-lex-ignore-whitespace semantic-lex-ignore-newline dot-lex-line-comment dot-lex-hash-line semantic-lex-ignore-comments ;; /* ... */ via syntax table ;; punctuation, keywords, ID/QSTRING/HSTRING here semantic-lex-default-action)
4. Refutation conditions
- port / compass_pt collision [H]. spec carves compass_pt as a closed
set {n,ne,e,se,s,sw,w,nw,c,_}. grammar above keeps them in ID and
defers to a semantic pass. refute: parse
a:n,a:n:e,a:foo,a:foo:bar; all four must accept, and a downstream tagger must classify positions 2 and 3 as compass-or-not correctly. - numeric ID vs DASH operator [H].
-1.5is a numeric ID;a -- bis a DASH. lexer disambiguates by surrounding context (post-EQ-> prefer numeric). refute:x = -1; a -- bparses with three tokens on the RHS ofx, not five. - keyword case-insensitivity [E]. spec mandates fold; handle in lexer,
not grammar. refute: a fixture with
Graph,DIGRAPH,Strictproduces identical AST to its lowercase twin. - anonymous subgraph at depth [E]. bare
{ ... }as stmt collides with nothing in stmt context; LALR(1) lookahead suffices. refute: nested anonymous subgraphs at depth >= 3 build a tree without shift-reduce conflicts in the generated parser. - line-leading
#[E]. discard, do not treat as comment. cpp artifact.
5. Boundary: what wisent does not give you
| concern | wisent | gap |
|---|---|---|
| parse acceptance | LALR(1), yes | error recovery is crude |
| AST | via semantic-tag actions you write | actions not provided here |
| editing the .wy | wisent-grammar-mode, built-in | - |
| editing .dot files | not the same thing as parsing them | still need font-lock + smie/treesit |
| performance | acceptable for small graphs | not incremental |
6. Fork
Two endpoints this scaffold can serve, and the lexer effort differs:
- grammar-as-document. dot.wy is the canonical, diffable, machine- checkable transcription of the spec. wisent-grammar-mode handles editing. No parser generation; lexer stays sketch.
- grammar-as-parser.
M-x semantic-grammar-batch-build-packagesemitsdot-wy.el. Wire into a derived dot-mode that calls the parser; semantic-tag stream drives imenu/which-func/speedbar. Lexer must be production-grade, especially QSTRING and numeric ID.
Which end? Answer determines whether the named lexer gaps are real work or annotations.
7. Related
- Graphviz: DOT Language and Rendering Pipeline — parent research page
- DOT Language specification — the EBNF this grammar translates
- wal.sh Diagram Style Guide — the visual conventions these grammars support