This is semantic-langdev.info, produced by makeinfo version 4.3 from
lang-support-guide.texi.

This manual documents Application Development with Semantic.

Copyright (C) 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007 Eric M. Ludlam
Copyright (C) 2001, 2002, 2003, 2004 David Ponce
Copyright (C) 2002, 2003 Richard Y. Kim

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with the
Invariant Sections being list their titles, with the Front-Cover Texts
being list, and with the Back-Cover Texts being list.  A copy of the
license is included in the section entitled "GNU Free Documentation
License".

INFO-DIR-SECTION Emacs
START-INFO-DIR-ENTRY
* Semantic Language Writer's guide: (semantic-langdev).
END-INFO-DIR-ENTRY

This file documents Language Support Development with Semantic.

_Infrastructure for parser based text analysis in Emacs_

Copyright (C) 1999, 2000, 2001, 2002, 2003, 2004 Eric M. Ludlam, David
Ponce, and Richard Y. Kim


File: semantic-langdev.info, Node: Top, Next: Tag Structure, Up: (dir)

Language Support Developer's Guide
**********************************

Semantic is bundled with support for several languages such as C, C++,
Java, Python, etc.  However, one of the primary goals of semantic is
to provide a framework in which anyone can add support for other
languages easily.  In order to support a new language, one typically
has to provide a lexer and a parser, along with appropriate semantic
actions that produce the end result of the parser - the semantic tags.

This chapter first discusses the semantic tag data structure to
familiarize the reader with the goal.  Then all the components
necessary for supporting a language are discussed, starting with
writing the lexer, then the parser, the semantic rules, etc.  Finally,
several parsers bundled with semantic are discussed as case studies.

* Menu:

* Tag Structure::
* Language Support Overview::
* Writing Lexers::
* Writing Parsers::
* Parsing a language file::
* Debugging::
* Parser Error Handling::
* GNU Free Documentation License::
* Index::


File: semantic-langdev.info, Node: Tag Structure, Next: Language Support Overview, Prev: Top, Up: Top

Tag Structure
*************

The end result of the parser for a buffer is a list of tags.
Currently each tag is a list with up to five elements:

     ("NAME" CLASS ATTRIBUTES PROPERTIES OVERLAY)

CLASS represents what kind of tag this is.  Common CLASS values
include `variable', `function', or `type'.  *note (semantic-appdev)Tag
Basics::.

ATTRIBUTES is a slot filled with language specific options for the
tag.  Function arguments, return type, and other flags are all stored
in attributes.  A language author fills in the ATTRIBUTES with the tag
constructor, which is parser style dependent.

PROPERTIES is a slot generated by the semantic parser harness, and
need not be provided by a language author.  Programmatically access
tag properties with `semantic--tag-put-property',
`semantic--tag-put-property-no-side-effect' and
`semantic--tag-get-property'.

OVERLAY represents positional information for this tag.  It is
automatically generated by the semantic parser harness, and need not
be provided by the language author, unless they provide a tag
expansion function via `semantic-tag-expand-function'.

The OVERLAY property is accessed via several functions returning the
beginning, end, and buffer of a token.
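For instance, here is a minimal sketch (Emacs Lisp) that reads these
slots through the standard accessor functions; it assumes the buffer
has already been set up for semantic and parsed, so that a tag exists
under point:

     ;; A minimal sketch: inspect the tag under point.  Assumes the
     ;; buffer was set up for semantic parsing and parsed at least once.
     (let ((tag (semantic-current-tag)))
       (list (semantic-tag-name tag)    ; the "NAME" slot
             (semantic-tag-class tag)   ; the CLASS slot, e.g. `function'
             (semantic-tag-start tag)   ; start position, from OVERLAY
             (semantic-tag-end tag)))   ; end position, from OVERLAY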
Use these functions unless the overlay is really needed (see *note
(semantic-appdev)Tag Query::).  Relying on the overlay in a program
can be dangerous, because sometimes the overlay is replaced with an
integer pair [ START END ] when the buffer the tag belongs to is not
in memory.  This happens when a user has activated the Semantic
Database.  *note (semantic-appdev)semanticdb::.

To create tags for a functional or object oriented language, you can
use a series of tag creation functions.  *note
(semantic-appdev)Creating Tags::.


File: semantic-langdev.info, Node: Language Support Overview, Next: Writing Lexers, Prev: Tag Structure, Up: Top

Language Support Overview
*************************

Starting with version 2.0, semantic provides many ways to add support
for a language to the semantic framework.  The primary means of
customizing how semantic works is to implement language specific
versions of overloadable functions.  Semantic has a specialized,
mode-bound way to do this.  *Note Semantic Overload Mechanism::.

The parser has several parts, all of which are also overloadable.  The
primary entry point into the parser is `semantic-fetch-tags', which
calls `semantic-parse-region', which returns a list of semantic tags
that gets set into `semantic--buffer-cache'.  `semantic-parse-region'
is the first "overloadable" function.  Its default behavior is simply
to call `semantic-lex', then pass the lexical token list to
`semantic-repeat-parse-whole-stream'.  At each stage, another more
focused layer provides a means of overloading.

The parser is not the only layer that provides overloadable methods.
Application APIs (*note (semantic-appdev)top::) provide many overload
functions as well.

* Menu:

* Semantic Overload Mechanism::
* Semantic Parser Structure::
* Application API Structure::


File: semantic-langdev.info, Node: Semantic Overload Mechanism, Next: Semantic Parser Structure, Up: Language Support Overview

Semantic Overload Mechanism
===========================

One of semantic's goals is to provide a framework for supporting a
wide range of languages.  Writing parsers for some languages is very
simple, e.g., for any dialect of the Lisp family, such as Emacs Lisp
and Scheme.  Parsers for many languages, such as C, Java, and Python,
can be written with context free grammars.  On the other hand, it is
impossible to specify context free grammars for other languages, such
as Texinfo.  Yet semantic already provides parsers for all these
languages.

In order to support such a wide range of languages, a mechanism for
customizing the parser engine was needed, one that maximizes code
reuse yet gives each programmer the flexibility of customizing the
parser engine at many levels of granularity.  The solution that
semantic provides is the function overloading mechanism, which allows
one to intercept and customize the behavior of many of the functions
in the parser engine.

First, the parser engine breaks down the task of parsing a language
into several steps.  Each step is represented by an Emacs Lisp
function.  Some of these are `semantic-parse-region', `semantic-lex',
`semantic-parse-stream', `semantic-parse-changes', etc.  Many built-in
semantic functions are declared as over-loadable functions, i.e.,
functions that do reasonable things for most languages, but can be
customized to suit the particular needs of a given language.  All
over-loadable functions can then easily be over-ridden if necessary.
The rest of this section provides details on this overloading
mechanism.
Over-loadable functions are created by defining functions with the
`define-overload' macro rather than the usual `defun'.
`define-overload' is a thin wrapper around `defun' that sets up the
function so that it can be overloaded.  An over-loadable function can
then be over-ridden in one of two ways:
`define-mode-overload-implementation' and
`semantic-install-function-overrides'.

Let's look at a couple of examples.  `semantic-parse-region' is one of
the top level functions in the parser engine, defined via
`define-overload':

     (define-overload semantic-parse-region
       (start end &optional nonterminal depth returnonerror)
       "Parse the area between START and END, and return any tokens found.
     ... tokens.")

The documentation string was truncated in the middle above, since it
is not relevant here.  The macro invocation above defines the
`semantic-parse-region' Emacs Lisp function, which first checks
whether there is an overloaded implementation.  If one is found, then
that is called.  If a mode specific implementation is not found, then
the default implementation is called, which in this case is
`semantic-parse-region-default', i.e., a function with the same name
but with a trailing `-default'.  That function needs to be written
separately, and must take the same arguments as the entry created with
`define-overload'.

One way to overload `semantic-parse-region' is via
`semantic-install-function-overrides'.  An example from the
`semantic-texi.el' file is shown below:

     (defun semantic-default-texi-setup ()
       "Set up a buffer for parsing of Texinfo files."
       ;; This will use our parser.
       (semantic-install-function-overrides
        '((parse-region . semantic-texi-parse-region)
          (parse-changes . semantic-texi-parse-changes)))
       ...
       )

     (add-hook 'texinfo-mode-hook 'semantic-default-texi-setup)

The above function is called whenever a buffer is set up in Texinfo
mode.  The `semantic-install-function-overrides' call above indicates
that `semantic-texi-parse-region' is to over-ride the default
implementation of `semantic-parse-region'.  Note the use of the
`parse-region' symbol, which is `semantic-parse-region' without the
leading `semantic-' prefix.

Another way to over-ride a built-in semantic function is via
`define-mode-overload-implementation'.  An example from the
`wisent-python.el' file is shown below.

     (define-mode-overload-implementation semantic-parse-region
       python-mode (start end &optional nonterminal depth returnonerror)
       "Over-ride in order to initialize some variables."
       (let ((wisent-python-lexer-indent-stack '(0))
             (wisent-python-explicit-line-continuation nil))
         (semantic-parse-region-default
          start end nonterminal depth returnonerror)))

The above over-rides `semantic-parse-region' so that for buffers whose
major mode is `python-mode', the code specified above is executed
rather than the default implementation.

Why not use advice
------------------

One may wonder why semantic needs its own overload mechanism when
Emacs already has advice.  *Note (elisp)Advising Functions::.
Advising is generally considered a mechanism of last resort for
modifying or hooking into an existing package without modifying its
source file.  Over-loadable functions, by contrast, advertise that
they are meant to be overloaded, and define syntactic sugar for doing
so.


File: semantic-langdev.info, Node: Semantic Parser Structure, Next: Application API Structure, Prev: Semantic Overload Mechanism, Up: Language Support Overview

Semantic Parser Structure
=========================

NOTE: describe the functions that do parsing, and how to overload
each.
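Until that description is written, here is a rough sketch of the
default parse path, assembled from the function names mentioned in
this overview; the exact plumbing may differ between semantic
versions:

     ;; Rough sketch of the default parse path (function names from
     ;; this manual; exact plumbing may differ between versions):
     ;;
     ;; (semantic-fetch-tags)
     ;;   -> (semantic-parse-region start end ...)       ; overloadable
     ;;        -> (semantic-lex start end depth)         ; overloadable
     ;;        -> (semantic-repeat-parse-whole-stream ...)
     ;;             -> (semantic-parse-stream ...)       ; overloadable
     ;;
     ;; Incremental reparses instead go through:
     ;; (semantic-parse-changes)                         ; overloadable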
File: semantic-langdev.info, Node: Application API Structure, Prev: Semantic Parser Structure, Up: Language Support Overview

Application API Structure
=========================

NOTE: improve this: how to program against the data structures created
by semantic is covered in the application programming guide.  Read
that guide to get a feel for the specifics of what you can customize.
*note (semantic-appdev)top::

Here is a list of applications, and the specific APIs that you will
need to overload to make them work properly with your language.

`imenu'
`speedbar'
`ecb'
     These tools require that the `semantic-format' methods create
     correct strings.  *note (semantic-appdev)Format Tag::

`semantic-analyze'
     The analysis tool requires that the `semanticdb' tool is active,
     and that the searching methods are overloaded.  In addition, a
     `semanticdb' system database could be written to provide symbols
     from the global environment of your language.  *note
     (semantic-appdev)System Databases::

     In addition, the analyzer requires that the `semantic-ctxt'
     methods are overloaded.  These methods allow the analyzer to look
     at the context of the cursor in your language, and predict the
     type of the location of the cursor.  *note
     (semantic-appdev)Derived Context::.

`semantic-idle-summary-mode'
`semantic-idle-completions-mode'
     These tools use the semantic analysis tool.  *note
     (semantic-appdev)Context Analysis::.

* Menu:

* Semantic Analyzer Support::


File: semantic-langdev.info, Node: Semantic Analyzer Support, Up: Application API Structure

Semantic Analyzer Support
-------------------------


File: semantic-langdev.info, Node: Writing Lexers, Next: Writing Parsers, Prev: Language Support Overview, Up: Top

Writing Lexers
**************

In order to reduce a source file into a tag table, it must first be
converted into a token stream.  Tokens are syntactic elements such as
whitespace, symbols, strings, lists, and punctuation.

The lexer uses the major-mode's syntax table for conversion.  *Note
Syntax Tables: (elisp)Syntax Tables.  As long as that is set up
correctly (along with the important `comment-start' and
`comment-start-skip' variables) the lexer should already work for your
language.

The primary entry point of the lexer is the `semantic-lex' function
shown below.  Normally, you do not need to call this function.  It is
usually called by `semantic-fetch-tags' for you.

 - Function: semantic-lex start end &optional depth length
     Lexically analyze text in the current buffer between START and
     END.  Optional argument DEPTH indicates at what level to scan
     over entire lists.  The last argument, LENGTH, specifies that
     `semantic-lex' should only return LENGTH tokens.  The return
     value is a token stream.  Each element is a list of the form

          (SYMBOL START-EXPRESSION . END-EXPRESSION)

     where SYMBOL denotes the token type.  See the
     `semantic-lex-tokens' variable for details on token types.  END
     does not mark the end of the text scanned, only the end of the
     beginning of text scanned.  Thus, if a string extends past END,
     the end of the returned token will be larger than END.  To truly
     restrict scanning, use `narrow-to-region'.

* Menu:

* Lexer Overview::              What is a Lexer?
* Lexer Output::                Output of a Lexical Analyzer
* Lexer Construction::          Constructing your own lexer
* Lexer Built In Analyzers::    Built in analyzers you can use
* Lexer Analyzer Construction:: Constructing your own analyzers
* Keywords::                    Specialized lexical tokens.
* Keyword Properties::
File: semantic-langdev.info, Node: Lexer Overview, Next: Lexer Output, Up: Writing Lexers

Lexer Overview
==============

The lexer converts the text of a buffer into a stream of semantic
tokens.  This process is based mostly on regular expressions, which in
turn depend on the syntax table of the buffer's major mode being set
up properly.  *Note Major Modes: (emacs)Major Modes.  *Note Syntax
Tables: (elisp)Syntax Tables.  *Note Regexps: (emacs)Regexps.

The top level lexical function `semantic-lex' calls the function
stored in `semantic-lex-analyzer'.  The default value is the function
`semantic-flex', from version 1.4 of Semantic.  This will eventually
be deprecated.

In the default lexer, the following regular expressions, which rely on
syntax tables, are used:

`\\s-'
     whitespace characters
`\\sw'
     word constituent
`\\s_'
     symbol constituent
`\\s.'
     punctuation character
`\\s<'
     comment starter
`\\s>'
     comment ender
`\\s\\'
     escape character
`\\s)'
     close parenthesis character
`\\s$'
     paired delimiter
`\\s"'
     string quote
`\\s''
     expression prefix

In addition, Emacs' built-in features such as `comment-start-skip',
`forward-comment', `forward-list', and `forward-sexp' are employed.


File: semantic-langdev.info, Node: Lexer Output, Next: Lexer Construction, Prev: Lexer Overview, Up: Writing Lexers

Lexer Output
============

The lexer (see `semantic-lex' above) scans the content of a buffer and
returns a token list.  Let's illustrate this using this simple
example.

     00: /*
     01:  * Simple program to demonstrate semantic.
     02:  */
     03:
     04: #include <stdio.h>
     05:
     06: int i_1;
     07:
     08: int
     09: main(int argc, char** argv)
     10: {
     11:   printf("Hello world.\n");
     12: }

Evaluating `(semantic-lex (point-min) (point-max))' within the buffer
with the code above returns the following token list.  The input line
and string that produced each token is shown after each semi-colon.

     ((punctuation 52 . 53)       ; 04: #
      (INCLUDE 53 . 60)           ; 04: include
      (punctuation 61 . 62)       ; 04: <
      (symbol 62 . 67)            ; 04: stdio
      (punctuation 67 . 68)       ; 04: .
      (symbol 68 . 69)            ; 04: h
      (punctuation 69 . 70)       ; 04: >
      (INT 72 . 75)               ; 06: int
      (symbol 76 . 79)            ; 06: i_1
      (punctuation 79 . 80)       ; 06: ;
      (INT 82 . 85)               ; 08: int
      (symbol 86 . 90)            ; 08: main
      (semantic-list 90 . 113)    ; 08: (int argc, char** argv)
      (semantic-list 114 . 147)   ; 09-12: body of main function
     )

As shown above, the token list is a list of "tokens".  Each token in
turn is a list of the form

     (TOKEN-TYPE BEGINNING-POSITION . ENDING-POSITION)

where TOKEN-TYPE is a symbol, and the other two are integers
indicating the buffer positions that delimit the token, such that

     (buffer-substring BEGINNING-POSITION ENDING-POSITION)

would return the string form of the token.

Note that one line (line 4 above) can produce seven tokens, while the
whole body of the function produces a single token.  This is because
the DEPTH parameter of `semantic-lex' was not specified.  Let's see
the output when DEPTH is set to 1.  Evaluate `(semantic-lex
(point-min) (point-max) 1)' in the same buffer.  Note the third
argument of `1'.

     ((punctuation 52 . 53)       ; 04: #
      (INCLUDE 53 . 60)           ; 04: include
      (punctuation 61 . 62)       ; 04: <
      (symbol 62 . 67)            ; 04: stdio
      (punctuation 67 . 68)       ; 04: .
      (symbol 68 . 69)            ; 04: h
      (punctuation 69 . 70)       ; 04: >
      (INT 72 . 75)               ; 06: int
      (symbol 76 . 79)            ; 06: i_1
      (punctuation 79 . 80)       ; 06: ;
      (INT 82 . 85)               ; 08: int
      (symbol 86 . 90)            ; 08: main
      (open-paren 90 . 91)        ; 08: (
      (INT 91 . 94)               ; 08: int
      (symbol 95 . 99)            ; 08: argc
      (punctuation 99 . 100)      ; 08: ,
      (CHAR 101 . 105)            ; 08: char
      (punctuation 105 . 106)     ; 08: *
      (punctuation 106 . 107)     ; 08: *
      (symbol 108 . 112)          ; 08: argv
      (close-paren 112 . 113)     ; 08: )
      (open-paren 114 . 115)      ; 10: {
      (symbol 120 . 126)          ; 11: printf
      (semantic-list 126 . 144)   ; 11: ("Hello world.\n")
      (punctuation 144 . 145)     ; 11: ;
      (close-paren 146 . 147)     ; 12: }
     )

The DEPTH parameter "peeled away" one more level of "list", delimited
by matching parentheses or braces.  The depth parameter can be
specified to be any number.  However, the parser needs to be able to
handle the extra tokens.

This is an interesting benefit of the lexer having the full resources
of Emacs at its disposal.  Skipping over matched parentheses is
achieved by simply calling the built-in functions `forward-list' and
`forward-sexp'.


File: semantic-langdev.info, Node: Lexer Construction, Next: Lexer Built In Analyzers, Prev: Lexer Output, Up: Writing Lexers

Lexer Construction
==================

While using the default lexer is certainly an option, particularly for
grammars written in semantic 1.4 style, it is usually more efficient
to create a custom lexer for your language.  You can create a new
lexer with `define-lex'.

 - Function: define-lex name doc &rest analyzers
     Create a new lexical analyzer with NAME.  DOC is a documentation
     string describing this analyzer.  ANALYZERS are small code
     snippets of analyzers to use when building the new NAMED
     analyzer.  Only use analyzers which are written to be used in
     `define-lex'.  Each analyzer should be an analyzer created with
     `define-lex-analyzer'.

     Note: The order in which analyzers are listed is important.  If
     two analyzers can match the same text, order the analyzers so
     that the one you want to match first occurs first.  For example,
     it is good to put a number analyzer in front of a symbol
     analyzer, which might otherwise mistake a number for a symbol.

The list of ANALYZERS needed here can consist of the built in
analyzers described in the next section, or analyzers of your own
construction, as sketched below.
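For instance, a lexer assembled entirely from built-in analyzers might
look like the following sketch.  The name `foo-lexical-analyzer' and
the particular selection and order of analyzers are illustrative only:

     ;; A minimal sketch assembled from built-in analyzers.  The name
     ;; and the selection/order of analyzers are illustrative only.
     (define-lex foo-lexical-analyzer
       "Lexical analyzer for a hypothetical FOO language."
       semantic-lex-ignore-whitespace
       semantic-lex-ignore-newline
       semantic-lex-ignore-comments
       semantic-lex-number
       semantic-lex-symbol-or-keyword
       semantic-lex-paren-or-list
       semantic-lex-close-paren
       semantic-lex-string
       semantic-lex-punctuation
       semantic-lex-default-action)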
File: semantic-langdev.info, Node: Lexer Built In Analyzers, Next: Lexer Analyzer Construction, Prev: Lexer Construction, Up: Writing Lexers

Lexer Built In Analyzers
========================

 - Special Form: semantic-lex-default-action
     The default action when no other lexical actions match text.
     This action will just throw an error.

 - Special Form: semantic-lex-beginning-of-line
     Detect and create a beginning of line token (BOL).

 - Special Form: semantic-lex-newline
     Detect and create newline tokens.

 - Special Form: semantic-lex-newline-as-whitespace
     Detect and create newline tokens.  Use this ONLY if newlines are
     not whitespace characters (such as when they are comment end
     characters) AND when you want whitespace tokens.

 - Special Form: semantic-lex-ignore-newline
     Detect and skip over newline tokens.  Use this ONLY if newlines
     are not whitespace characters (such as when they are comment end
     characters).

 - Special Form: semantic-lex-whitespace
     Detect and create whitespace tokens.

 - Special Form: semantic-lex-ignore-whitespace
     Detect and skip over whitespace tokens.

 - Special Form: semantic-lex-number
     Detect and create number tokens.  Number tokens are matched via
     this variable:

 - Variable: semantic-lex-number-expression
     Regular expression for matching a number.  If this value is
     `nil', no number extraction is done during lex.  This expression
     tries to match C and Java like numbers.

          DECIMAL_LITERAL:
              [1-9][0-9]*
            ;
          HEX_LITERAL:
              0[xX][0-9a-fA-F]+
            ;
          OCTAL_LITERAL:
              0[0-7]*
            ;
          INTEGER_LITERAL:
              <DECIMAL_LITERAL>[lL]?
            | <HEX_LITERAL>[lL]?
            | <OCTAL_LITERAL>[lL]?
            ;
          EXPONENT:
              [eE][+-]?[0-9]+
            ;
          FLOATING_POINT_LITERAL:
              [0-9]+[.][0-9]*<EXPONENT>?[fFdD]?
            | [.][0-9]+<EXPONENT>?[fFdD]?
            | [0-9]+<EXPONENT>[fFdD]?
            | [0-9]+<EXPONENT>?[fFdD]
            ;

 - Special Form: semantic-lex-symbol-or-keyword
     Detect and create symbol and keyword tokens.

 - Special Form: semantic-lex-charquote
     Detect and create charquote tokens.

 - Special Form: semantic-lex-punctuation
     Detect and create punctuation tokens.

 - Special Form: semantic-lex-punctuation-type
     Detect and create a punctuation type token.  Recognized
     punctuations are defined in the current table of lexical types,
     as the value of the `punctuation' token type.

 - Special Form: semantic-lex-paren-or-list
     Detect open parenthesis.  Return either a paren token or a
     semantic list token depending on `semantic-lex-current-depth'.

 - Special Form: semantic-lex-open-paren
     Detect and create an open parenthesis token.

 - Special Form: semantic-lex-close-paren
     Detect and create a close paren token.

 - Special Form: semantic-lex-string
     Detect and create a string token.

 - Special Form: semantic-lex-comments
     Detect and create a comment token.

 - Special Form: semantic-lex-comments-as-whitespace
     Detect comments and create a whitespace token.

 - Special Form: semantic-lex-ignore-comments
     Detect comments and skip over them.


File: semantic-langdev.info, Node: Lexer Analyzer Construction, Next: Keywords, Prev: Lexer Built In Analyzers, Up: Writing Lexers

Lexer Analyzer Construction
===========================

Each of the previous built in analyzers is constructed using a set of
analyzer construction macros.  The root construction macro is:

 - Function: define-lex-analyzer name doc condition &rest forms
     Create a single lexical analyzer NAME with DOC.  When an analyzer
     is called, the current buffer and point are positioned in a
     buffer at the location to be analyzed.  CONDITION is an
     expression which returns `t' if FORMS should be run.  Within
     CONDITION and FORMS, backquote can be used to evaluate
     expressions at compile time.  While FORMS are running, the
     following variables will be locally bound:

    `semantic-lex-analysis-bounds'
          The bounds of the current analysis, of the form
          (START . END).

    `semantic-lex-maximum-depth'
          The maximum depth of semantic-list for the current analysis.

    `semantic-lex-current-depth'
          The current depth of `semantic-list' that has been
          descended.

    `semantic-lex-end-point'
          End point after match.  Analyzers should set this to a
          buffer location if their match string does not represent
          the end of the matched text.

    `semantic-lex-token-stream'
          The token list being collected.  Add new lexical tokens to
          this list.

     Proper action in FORMS is to move the value of
     `semantic-lex-end-point' to after the location of the analyzed
     entry, and to add any discovered tokens at the beginning of
     `semantic-lex-token-stream'.  This can be done by using
     `semantic-lex-push-token'.

Additionally, a simple regular expression based analyzer can be built
with:

 - Function: define-lex-regex-analyzer name doc regexp &rest forms
     Create a lexical analyzer with NAME and DOC that will match
     REGEXP.  FORMS are evaluated upon a successful match.  See
     `define-lex-analyzer' for more about analyzers.

 - Function: define-lex-simple-regex-analyzer name doc regexp toksym
          &optional index &rest forms
     Create a lexical analyzer with NAME and DOC that matches REGEXP.
     TOKSYM is the symbol to use when creating a semantic lexical
     token.  INDEX is the index into the match that defines the bounds
     of the token.  INDEX should be a plain integer, and not specified
     in the macro as an expression.  FORMS are evaluated upon a
     successful match BEFORE the new token is created.  It is valid to
     ignore FORMS.  See `define-lex-analyzer' for more about
     analyzers.
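Here is a sketch of a hand-written analyzer following the conventions
described for `define-lex-analyzer' above.  The token class `shebang'
is invented for this example:

     ;; A hedged sketch: recognize "#!" interpreter lines at buffer
     ;; start and record them as a token of the invented class
     ;; 'shebang.  Per the conventions above: push the token, then
     ;; move `semantic-lex-end-point' past the match.
     (define-lex-analyzer foo-lex-shebang
       "Detect #! interpreter lines and push a 'shebang token."
       (and (bobp) (looking-at "#![^\n]*"))
       (semantic-lex-push-token
        (semantic-lex-token 'shebang (match-beginning 0) (match-end 0)))
       (setq semantic-lex-end-point (match-end 0)))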
Regular expression analyzers are the simplest to create and manage.
Often, a majority of your lexer can be built this way.  The analyzer
for matching punctuation looks like this:

     (define-lex-simple-regex-analyzer semantic-lex-punctuation
       "Detect and create punctuation tokens."
       "\\(\\s.\\|\\s$\\|\\s'\\)" 'punctuation)

More complex analyzers, for matching larger units of text to optimize
the speed of parsing and analysis, are built by matching blocks.

 - Function: define-lex-block-analyzer name doc spec1 &rest specs
     Create a lexical analyzer NAME for paired delimiter blocks.  It
     detects a paired delimiters block, or the corresponding open or
     close delimiter, depending on the value of the variable
     `semantic-lex-current-depth'.  DOC is the documentation string of
     the lexical analyzer.  SPEC1 and SPECS specify the token symbols
     and open, close delimiters used.  Each SPEC has the form:

          (BLOCK-SYM (OPEN-DELIM OPEN-SYM) (CLOSE-DELIM CLOSE-SYM))

     where BLOCK-SYM is the symbol returned in a block token.
     OPEN-DELIM and CLOSE-DELIM are respectively the open and close
     delimiters identifying a block.  OPEN-SYM and CLOSE-SYM are
     respectively the symbols returned in open and close tokens.

These blocks are what make semantic's Emacs Lisp based parsers fast.
For example, by defining all text inside { braces } as a block, the
parser does not need to know the contents of those braces while
parsing, and can skip them altogether.


File: semantic-langdev.info, Node: Keywords, Next: Keyword Properties, Prev: Lexer Analyzer Construction, Up: Writing Lexers

Keywords
========

Another important piece of the lexer is the keyword table (see *Note
Writing Parsers::).  Your language will want to set up a keyword table
for fast conversion of symbol strings to language terminals.  The
keyword table can also be used to store additional information about
those keywords.  The following programming functions can be useful
when examining text in a language buffer.

 - Function: semantic-lex-keyword-p name
     Return non-`nil' if a keyword with NAME exists in the keyword
     table.  Return `nil' otherwise.

 - Function: semantic-lex-keyword-put name property value
     For keyword with NAME, set its PROPERTY to VALUE.

 - Function: semantic-lex-keyword-get name property
     For keyword with NAME, return its PROPERTY value.

 - Function: semantic-lex-map-keywords fun &optional property
     Call function FUN on every semantic keyword.  If optional
     PROPERTY is non-`nil', call FUN only on every keyword which has a
     PROPERTY value.  FUN receives a semantic keyword as argument.

 - Function: semantic-lex-keywords &optional property
     Return a list of semantic keywords.  If optional PROPERTY is
     non-`nil', return only keywords which have a PROPERTY set.
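A sketch of these functions in action, assuming the active keyword
table (normally set up by your grammar) already defines an `if'
keyword; the summary string is illustrative:

     ;; A minimal sketch.  Assumes the current keyword table already
     ;; defines the keyword "if"; the summary string is illustrative.
     (when (semantic-lex-keyword-p "if")
       (semantic-lex-keyword-put "if" 'summary
                                 "if (<condition>) { code }"))

     (semantic-lex-keyword-get "if" 'summary)
          => "if (<condition>) { code }"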
Keyword properties can be set up in a grammar file for ease of
maintenance.  While examining the text in a language buffer, this can
provide an easy and quick way of storing details about text in the
buffer.


File: semantic-langdev.info, Node: Keyword Properties, Prev: Keywords, Up: Writing Lexers

Standard Keyword Properties
===========================

Keywords in a language can have multiple properties.  These properties
can be used to associate the string that is the keyword with
additional information.  Currently available properties are:

`summary'
     The summary property is used by `semantic-summary-mode' as a help
     string for the keyword specified.

Notes: Possible future properties.  This is just me musing:

`face'
     Face used for highlighting this keyword, differentiating it from
     the keyword face.
`template'
`skeleton'
     Some sort of tempo/skel template for inserting the programmatic
     structure associated with this keyword.
`abbrev'
     As with template.
`action'
`menu'
     Perhaps the keyword is clickable and some action would be useful.


File: semantic-langdev.info, Node: Writing Parsers, Next: Parsing a language file, Prev: Writing Lexers, Up: Top

Writing Parsers
***************

When converting a source file into a tag table, it is important to
specify rules to accomplish this.  The rules are stored in the buffer
local variable `semantic--parse-table'.

While it is certainly possible to write this table yourself, it is
most likely that you will want to use the *Note Grammar Programming
Environment::.

There are three choices for parsing your language.

Bovine Parser
     The "bovine" parser is the original semantic parser, and is an
     implementation of an LL parser.  For more information, *note the
     Bovine Parser Manual: (bovine)top.

Wisent Parser
     The "wisent" parser is a port of the GNU Compiler Compiler Bison
     to Emacs Lisp.  Wisent includes the iterative error handler of
     the bovine parser, and has the same error correction as
     traditional LALR parsers.  For more information, *note the Wisent
     Parser Manual: (wisent)top.

External Parser
     External parsers, such as the texinfo parser, can be implemented
     using any means.  This allows the use of a regular expression
     parser for non-regular languages, or of external programs for
     speed.

* Menu:

* External Parsers::                Writing an external parser
* Grammar Programming Environment:: Using the grammar writing environment
* Parser Backend Support::          Lisp needed to support a grammar.


File: semantic-langdev.info, Node: External Parsers, Next: Grammar Programming Environment, Up: Writing Parsers

External Parsers
================

The texinfo parser in `semantic-texi.el' is an example of an external
parser.  To make your parser work, you need to have a setup function.

Note: Finish this.


File: semantic-langdev.info, Node: Grammar Programming Environment, Next: Parser Backend Support, Prev: External Parsers, Up: Writing Parsers

Grammar Programming Environment
===============================

Semantic grammar files in `.by' or `.wy' format have their own
programming mode.  This mode provides indentation and coloring
services for those languages.  In addition, the grammar languages are
also supported by semantic tools such as imenu or speedbar.  For more
information, *note the Grammar Framework Manual: (grammar-fw)top.


File: semantic-langdev.info, Node: Parsing a language file, Next: Debugging, Prev: Writing Parsers, Up: Top

Parsing a language file
***********************

The best way to call the parser from programs is via
`semantic-fetch-tags'.  This, in turn, uses other internal API
functions which plug-in parsers can take advantage of.

 - Function: semantic-fetch-tags
     Fetch semantic tags from the current buffer.  If the buffer cache
     is up to date, return that.  If the buffer cache is out of date,
     attempt an incremental reparse.  If the buffer has not been
     parsed before, or if the incremental reparse fails, then parse
     the entire buffer.  If a lexical error had been previously
     discovered and the buffer was marked unparseable, then do
     nothing, and return the cache.
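For example, here is a minimal sketch of programmatic use;
`semantic-tag-name' is the tag accessor documented in the application
development manual:

     ;; A minimal sketch: fetch (or reuse) the tag table of the
     ;; current buffer and list the names of the top level tags.
     (dolist (tag (semantic-fetch-tags))
       (message "Found tag: %s" (semantic-tag-name tag)))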
Another approach is to let Emacs call the parser on idle time, when
needed, then use `semantic-fetch-available-tags' to retrieve and
process only the available tags, provided that the
`semantic-after-*-hook' hooks have been set up to synchronize with new
tags when they become available.

 - Function: semantic-fetch-available-tags
     Fetch available semantic tags from the current buffer.  That is,
     return tags currently in the cache without parsing the current
     buffer.  Parse operations happen asynchronously when needed on
     Emacs idle time.  Use the
     `semantic-after-toplevel-cache-change-hook' and
     `semantic-after-partial-cache-change-hook' hooks to synchronize
     with new tags when they become available.

 - Command: semantic-clear-toplevel-cache
     Clear the toplevel tag cache for the current buffer.  Clearing
     the cache will force a complete reparse next time a token stream
     is requested.


File: semantic-langdev.info, Node: Parser Backend Support, Prev: Grammar Programming Environment, Up: Writing Parsers

Parser Backend Support
======================

Once you have written a grammar file and it has been compiled into
Emacs Lisp code, additional glue needs to be written to finish
connecting the generated parser into the Emacs framework.  Large
portions of this glue are generated automatically, but they will
probably need additional modification to get things to work properly.

Typically, a grammar file `foo.wy' will create the file `foo-wy.el'.
It is then useful to also create a file `wisent-foo.el' (or
`semantic-foo.el') to contain the parser back end, i.e. the glue that
completes the semantic support for the language.

* Menu:

* Example Backend File::
* Tag Expansion::


File: semantic-langdev.info, Node: Example Backend File, Next: Tag Expansion, Up: Parser Backend Support

Example Backend File
--------------------

Typical structure for this file is:

     ;;; semantic-foo.el -- parser support for FOO.

     ;;; Your copyright Notice

     (require 'foo-wy)  ;; The generated parser
     (require 'foo)     ;; Major mode definition for FOO

     ;;; Code:

     ;;; Lexical Analyzer
     ;;
     ;; OPTIONAL
     ;; It is possible to define your lexical analyzer completely in
     ;; your grammar file.
     (define-lex foo-lexical-analyzer
       "Create a lexical analyzer."
       ...)

     ;;; Expand Function
     ;;
     ;; OPTIONAL
     ;; Not all languages are so complex as to need this function.
     ;; See `semantic-tag-expand-function' for more details.
     (defun foo-tag-expand-function (tag)
       "Expand TAG into multiple tags if needed."
       ...)

     ;;; Parser Support
     ;;
     ;; OPTIONAL
     ;; If you need some specialty routines inside your grammar file,
     ;; you can add some here.  The process may be to take diverse
     ;; info and reorganize it.
     ;;
     ;; It is also appropriate to write these functions in the
     ;; prologue of the grammar file.
     (defun foo-do-something-hard (...)
       "...")

     ;;; Overload methods
     ;;
     ;; OPTIONAL
     ;; To allow your language to be fully supported by all the
     ;; applications that use semantic, it is important, but not
     ;; necessary, to create implementations of overload methods.
     (define-mode-overload-implementation some-semantic-function
       foo-mode (tag)
       "Implement some-semantic-function for FOO."
       )

     ;;;###autoload
     (defun semantic-default-foo-setup ()
       "Set up a buffer for semantic parsing of the FOO language."
       (semantic-foo-by--install-parser)
       (setq semantic-tag-expand-function #'foo-tag-expand-function
             ;; Many other language specific settings can be done here
             ;; as well.
             )
       ;; This may be optional.
       (setq semantic-lex-analyzer #'foo-lexical-analyzer)
       )

     ;;;###autoload
     (add-hook 'foo-mode-hook 'semantic-default-foo-setup)

     (provide 'semantic-foo)

     ;;; semantic-foo.el ends here


File: semantic-langdev.info, Node: Tag Expansion, Prev: Example Backend File, Up: Parser Backend Support

Tag Expansion
-------------

In any language with compound tag types, you will need to implement an
_expand function_.  Once written, assign it to this variable.

 - Variable: semantic-tag-expand-function
     Function used to expand a tag.  It is passed each tag production,
     and must return a list of tags derived from it, or `nil' if it
     does not need to be expanded.

     Languages with compound definitions should use this function to
     expand from one compound symbol into several.  For example, in C
     or Java the following definition is easily parsed into one tag:

          int a, b;

     This function should take this compound tag and turn it into two
     tags, one for A, and the other for B.

Additionally, you can use the expand function in conjunction with your
language for other types of compound statements.  For example, in the
Common Lisp Object System, you can have a definition:

     (defclass classname nil
       (slots ...)
       ...)

This will create both the datatype `classname' and the functional
constructor `classname'.  Each slot may have a `:accessor' method as
well.

You can create a special compounded tag in your rule, for example:

     classdef: LPAREN DEFCLASS name semantic-list semantic-list RPAREN
               (TAG "custom" 'compound-class
                    :value (list (TYPE-TAG $3 "class" ...)
                                 (FUNCTION-TAG $3 ...)))
       ;

and in your expand function, you would write:

     (defun my-tag-expand (tag)
       "Expand tags for my language."
       (when (semantic-tag-of-class-p tag 'compound-class)
         (remq nil (semantic-tag-get-attribute tag :value))))

This will cause the custom tag to be replaced by the tags created in
the `:value' attribute of the specially constructed tag.
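Returning to the `int a, b;' example, a minimal expand function might
look like the sketch below.  It assumes your grammar's actions stored
the comma-separated names as a list in the name slot of a single
`variable' tag; that representation is a choice of your grammar, not
something semantic mandates:

     ;; A hedged sketch for the `int a, b;' case.  Assumes the grammar
     ;; action produced one 'variable tag whose name slot holds the
     ;; list of names, e.g. (("a" "b") variable ...).
     (defun foo-tag-expand (tag)
       "Expand TAG when its name slot contains a list of names."
       (when (listp (semantic-tag-name tag))
         (mapcar (lambda (name)
                   ;; `semantic-tag-clone' copies TAG with a new NAME.
                   (semantic-tag-clone tag name))
                 (semantic-tag-name tag))))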
File: semantic-langdev.info, Node: Debugging, Next: Parser Error Handling, Prev: Parsing a language file, Up: Top

Debugging
*********

Grammars can be tricky things to debug.  There are several types of
tools for debugging in Semantic, and different types of problems call
for different types of tools.

* Menu:

* Lexical Debugging::
* Parser Output tools::
* Bovine Parser Debugging::
* Wisent Parser Debugging::
* Overlay Debugging::
* Incremental Parser Debugging::
* Debugging Analysis::
* Semantic 1.4 Doc::


File: semantic-langdev.info, Node: Lexical Debugging, Next: Parser Output tools, Up: Debugging

Lexical Debugging
=================

The first major problem you may encounter is with lexical analysis.
If the text is not transformed into the expected token stream, no
parser will understand it.  You can step through the lexical analyzer
with the following command:

 - Command: semantic-lex-debug arg
     Debug the semantic lexer in the current buffer.  Argument ARG
     specifies whether to analyze the whole buffer, or start at point.
     While engaged, each token identified by the lexer will be
     highlighted in the target buffer.  A description of the current
     token will be displayed in the minibuffer.  Press `SPC' to move
     to the next lexical token.

For an example of what the `semantic-lex' function should return, see
*Note Lexer Output::.


File: semantic-langdev.info, Node: Parser Output tools, Next: Bovine Parser Debugging, Prev: Lexical Debugging, Up: Debugging

Parser Output tools
===================

There are several tools which can be used to see what the parser
output is.  These will work for any type of parser, including the
bovine parser and the wisent parser.

The first and easiest is a minor mode which highlights text the parser
did not understand.

 - Command: semantic-show-unmatched-syntax-mode &optional arg
     Minor mode to highlight unmatched lexical syntax tokens.  When a
     parser executes, some elements in the buffer may not match any
     parser rules.  These text characters are considered unmatched
     syntax.  Often, the display of unmatched syntax can expose coding
     problems before the compiler is run.  With prefix argument ARG,
     turn on if positive, otherwise off.  The minor mode can be turned
     on only if the semantic feature is available and the current
     buffer was set up for parsing.  Return non-`nil' if the minor
     mode is enabled.

          `key'       binding
          `C-c ,'     Prefix Command
          `C-c , `'   semantic-show-unmatched-syntax-next

Another interesting mode will display a line between all the tags in
the current buffer, to make it more obvious where boundaries lie.  You
can enable this as a minor mode.

 - Command: semantic-show-tag-boundaries-mode &optional arg
     Minor mode to display a boundary in front of tags.  The boundary
     is displayed using an overline in Emacs 21.  With prefix argument
     ARG, turn on if positive, otherwise off.  The minor mode can be
     turned on only if the semantic feature is available and the
     current buffer was set up for parsing.  Return non-`nil' if the
     minor mode is enabled.

Another interesting mode helps if you are worried about specific
attributes: you can use this minor mode to highlight different tokens
in different ways based on the attributes you are most concerned with.

 - Command: semantic-highlight-by-attribute-mode &optional arg
     Minor mode to highlight tags based on some attribute.  By
     default, the protection of a tag will give it a different
     background color.  With prefix argument ARG, turn on if positive,
     otherwise off.  The minor mode can be turned on only if the
     semantic feature is available and the current buffer was set up
     for parsing.  Return non-`nil' if the minor mode is enabled.

Another tool that can be used is a dump of the current list of tags.
This shows the actual Lisp representation of the tags generated, in a
rather bland dump.  This can be useful if text was successfully
parsed, and you want to be sure that the correct information was
captured.

 - Command: bovinate &optional clear
     Bovinate the current buffer.  Show output in a temp buffer.
     Optional argument CLEAR will clear the cache before bovinating.
     If CLEAR is negative, it will do a full reparse, and will not
     display the output buffer.
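While debugging a grammar, it can help to turn several of these modes
on at once.  A minimal sketch, using the minor mode commands from this
section with a positive argument to force them on:

     ;; A sketch: enable the parser output diagnostics described above.
     (semantic-show-unmatched-syntax-mode 1)
     (semantic-show-tag-boundaries-mode 1)
     (semantic-highlight-by-attribute-mode 1)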
File: semantic-langdev.info, Node: Bovine Parser Debugging, Next: Wisent Parser Debugging, Prev: Parser Output tools, Up: Debugging

Bovine Parser Debugging
=======================

The bovine parser is described in *note (bovine)top::.  Aside from
using a traditional Emacs Lisp debugger on the functions you provide
for token expansion, there is one other means of debugging, which
interactively steps over the rules in your grammar file.

 - Command: semantic-debug
     Parse the current buffer and run in debug mode.

Once the parser is activated in this mode, the current tag cache is
flushed, and the parser started.  At each stage of the parse, the
current rule and match step are highlighted in your parser source
buffer.  In a second window, the text being parsed is shown, and the
lexical token found is highlighted.  A clue of the current stack of
saved data is displayed in the minibuffer.

There is a wide range of keybindings that can be used to execute code
in your buffer.  (Not all are implemented.)

`n'
`SPC'
     Next.
`s'
     Step.
`u'
     Up.  (Not implemented yet.)
`d'
     Down.  (Not implemented yet.)
`f'
     Fail Match.  Pretend the current match element and the token in
     the buffer is a failed match, even if it is not.
`h'
     Print information about the current parser state.
`s'
     Jump to the source buffer.
`p'
     Jump to the parser buffer.
`q'
     Quit.  Exits this debug session and the parser.
`a'
     Abort.  Aborts one level of the parser, possibly exiting the
     debugger.
`g'
     Go.  Stop debugging, and just start parsing.
`b'
     Set Breakpoint.  (Not implemented yet.)
`e'
     `eval-expression'.  Lets you execute some random Emacs Lisp
     command.

Note: While the core of `semantic-debug' is a generic debugger
interface for rule based grammars, only the bovine parser has a
specific backend implementation.  If someone wants to implement a
debugger backend for wisent, that would be spiff.


File: semantic-langdev.info, Node: Wisent Parser Debugging, Next: Overlay Debugging, Prev: Bovine Parser Debugging, Up: Debugging

Wisent Parser Debugging
=======================

While wisent does not implement a backend for `semantic-debug', it
does have some debugging commands for use in rule actions.  You can
read about them in the wisent manual.  *note (wisent)Grammar
Debugging::


File: semantic-langdev.info, Node: Overlay Debugging, Next: Incremental Parser Debugging, Prev: Wisent Parser Debugging, Up: Debugging

Overlay Debugging
=================

Once a buffer has been parsed into a tag table, the next most
important step is getting those tags activated for a buffer, and
storable in a `semanticdb' backend.  *note
(semantic-appdev)semanticdb::.  These two activities depend on the
ability of every tag in the table to be linked and unlinked to the
current buffer with an overlay.  *note (semantic-appdev)Tag Overlay::
*note (semantic-appdev)Tag Hooks::

In this case, the most important function that must be written is:

 - Function: semantic-tag-components-with-overlays tag
     Return the list of top level components belonging to TAG.
     Children are any sub-tags which contain overlays.  The default
     behavior is to get `semantic-tag-components', in addition to the
     components of anonymous types (if applicable).

     Note for language authors: If a mode defines a language tag that
     has tags in it with overlays, you should still return them with
     this function.  Ignoring this step will prevent several features
     from working correctly.

     This function can be overridden in semantic using the symbol
     `tag-components-with-overlays'.

If you are successfully building a tag table, and errors occur saving
or restoring tags from semanticdb, this is the most likely cause of
the problem.
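As a sketch, a hypothetical FOO mode whose `type' tags keep
overlay-bearing child tags in an invented `:members' attribute might
override it like this, using the overload mechanism described earlier
in this manual:

     ;; A hedged sketch for a hypothetical FOO mode whose tags store
     ;; overlay-bearing child tags in a :members attribute (the
     ;; attribute name is invented for this example).
     (define-mode-overload-implementation
       semantic-tag-components-with-overlays foo-mode (tag)
       "Return all components of TAG that have overlays."
       (append (semantic-tag-components tag)
               (semantic-tag-get-attribute tag :members)))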
File: semantic-langdev.info, Node: Incremental Parser Debugging, Next: Debugging Analysis, Prev: Overlay Debugging, Up: Debugging

Incremental Parser Debugging
============================

The incremental parser is a highly complex engine for quickly
refreshing the tag table of a buffer after some set of changes has
been made to that buffer by a user.  There is no debugger or other
interface to the incremental parser; however, there are a few minor
modes which can help you identify issues if you think there are
problems while incrementally parsing a buffer.

The first stage of the incremental parser is tracking the changes the
user makes to a buffer.  You can visibly track these changes too.

 - Command: semantic-highlight-edits-mode &optional arg
     Minor mode for highlighting changes made in a buffer.  Changes
     are tracked by semantic so that the incremental parser can work
     properly.  This mode will highlight those changes as they are
     made, and clear them when the incremental parser accounts for
     those edits.  With prefix argument ARG, turn on if positive,
     otherwise off.  The minor mode can be turned on only if the
     semantic feature is available and the current buffer was set up
     for parsing.  Return non-`nil' if the minor mode is enabled.

Another important aspect of the incremental parser involves tracking
the current parser state of the buffer.  You can track this state
also.

 - Command: semantic-show-parser-state-mode &optional arg
     Minor mode for displaying the parser cache state in the modeline.
     The cache can be in one of three states: up to date, partial
     reparse needed, and full reparse needed.  The state is indicated
     in the modeline with the following characters:

    `-'
          The cache is up to date.
    `!'
          The cache requires a full update.
    `^'
          The cache needs to be incrementally parsed.
    `%'
          The cache is not currently parseable.
    `@'
          Auto-parse in progress (not set here.)

     With prefix argument ARG, turn on if positive, otherwise off.
     The minor mode can be turned on only if the semantic feature is
     available and the current buffer was set up for parsing.  Return
     non-`nil' if the minor mode is enabled.

When the incremental parser starts updating the tag buffer, you can
also enable a set of messages to help identify how the incremental
parser is merging changes with the main buffer.

 - Variable: semantic-edits-verbose-flag
     Non-`nil' means the incremental parser is verbose.  If `nil',
     errors are still displayed, but informative messages are not.
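To watch the incremental parser at work, the settings from this
section can be combined.  A minimal sketch:

     ;; A sketch: visualize edits and parser state, and make the
     ;; incremental parser verbose.
     (semantic-highlight-edits-mode 1)
     (semantic-show-parser-state-mode 1)
     (setq semantic-edits-verbose-flag t)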