This is semantic-langdev.info, produced by makeinfo version 4.3 from
lang-support-guide.texi.

This manual documents Application Development with Semantic.

Copyright (C) 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007 Eric M. Ludlam
Copyright (C) 2001, 2002, 2003, 2004 David Ponce
Copyright (C) 2002, 2003 Richard Y. Kim

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with the
Invariant Sections being list their titles, with the Front-Cover Texts
being list, and with the Back-Cover Texts being list.  A copy of the
license is included in the section entitled "GNU Free Documentation
License".

INFO-DIR-SECTION Emacs
START-INFO-DIR-ENTRY
* Semantic Language Writer's guide: (semantic-langdev).
END-INFO-DIR-ENTRY

This file documents Language Support Development with Semantic.

_Infrastructure for parser based text analysis in Emacs_

Copyright (C) 1999, 2000, 2001, 2002, 2003, 2004 Eric M. Ludlam, David
Ponce, and Richard Y. Kim


File: semantic-langdev.info, Node: Top, Next: Tag Structure, Up: (dir)

Language Support Developer's Guide
**********************************

Semantic is bundled with support for several languages such as C, C++,
Java, Python, etc.  However, one of the primary goals of semantic is
to provide a framework in which anyone can add support for other
languages easily.  In order to support a new language, one typically
has to provide a lexer and a parser, along with appropriate semantic
actions that produce the end result of the parser - the semantic tags.

This chapter first discusses the semantic tag data structure to
familiarize the reader with the goal.  Then all the components
necessary for supporting a language are discussed, starting with
writing the lexer, then the parser, the semantic rules, etc.  Finally,
several parsers bundled with semantic are discussed as case studies.

* Menu:

* Tag Structure::
* Language Support Overview::
* Writing Lexers::
* Writing Parsers::
* Parsing a language file::
* Debugging::
* Parser Error Handling::
* GNU Free Documentation License::
* Index::


File: semantic-langdev.info, Node: Tag Structure, Next: Language Support Overview, Prev: Top, Up: Top

Tag Structure
*************

The end result of the parser for a buffer is a list of tags.
Currently each tag is a list with up to five elements:

     ("NAME" CLASS ATTRIBUTES PROPERTIES OVERLAY)

CLASS represents what kind of tag this is.  Common CLASS values
include `variable', `function', or `type'.  *note (semantic-appdev)Tag
Basics::.

ATTRIBUTES is a slot filled with language specific options for the
tag.  Function arguments, return type, and other flags are all stored
in attributes.  A language author fills in the ATTRIBUTES with the tag
constructor, which is parser style dependent.

PROPERTIES is a slot generated by the semantic parser harness, and
need not be provided by a language author.  Programmatically access
tag properties with `semantic--tag-put-property',
`semantic--tag-put-property-no-side-effect' and
`semantic--tag-get-property'.

OVERLAY represents positional information for this tag.  It is
automatically generated by the semantic parser harness, and need not
be provided by the language author, unless they provide a tag
expansion function via `semantic-tag-expand-function'.

The OVERLAY property is accessed via several functions returning the
beginning, end, and buffer of a token.
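For instance, here is a minimal sketch (Emacs Lisp) that reads these
slots through the standard accessor functions; it assumes the buffer
has already been set up for semantic and parsed, so that a tag exists
under point:

     ;; A minimal sketch: inspect the tag under point.  Assumes the
     ;; buffer was set up for semantic parsing and parsed at least once.
     (let ((tag (semantic-current-tag)))
       (list (semantic-tag-name tag)    ; the "NAME" slot
             (semantic-tag-class tag)   ; the CLASS slot, e.g. `function'
             (semantic-tag-start tag)   ; start position, from OVERLAY
             (semantic-tag-end tag)))   ; end position, from OVERLAY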
Use these functions unless the overlay is really needed (see *note
(semantic-appdev)Tag Query::).  Relying on the overlay in a program
can be dangerous, because sometimes the overlay is replaced with an
integer pair [ START END ] when the buffer the tag belongs to is not
in memory.  This happens when a user has activated the Semantic
Database.  *note (semantic-appdev)semanticdb::.

To create tags for a functional or object oriented language, you can
use a series of tag creation functions.  *note
(semantic-appdev)Creating Tags::.


File: semantic-langdev.info, Node: Language Support Overview, Next: Writing Lexers, Prev: Tag Structure, Up: Top

Language Support Overview
*************************

Starting with version 2.0, semantic provides many ways to add support
for a language to the semantic framework.  The primary means of
customizing how semantic works is to implement language specific
versions of overloadable functions.  Semantic has a specialized,
mode-bound way to do this.  *Note Semantic Overload Mechanism::.

The parser has several parts, all of which are also overloadable.  The
primary entry point into the parser is `semantic-fetch-tags', which
calls `semantic-parse-region', which returns a list of semantic tags
that gets set into `semantic--buffer-cache'.  `semantic-parse-region'
is the first "overloadable" function.  Its default behavior is simply
to call `semantic-lex', then pass the lexical token list to
`semantic-repeat-parse-whole-stream'.  At each stage, another more
focused layer provides a means of overloading.

The parser is not the only layer that provides overloadable methods.
Application APIs (*note (semantic-appdev)top::) provide many overload
functions as well.

* Menu:

* Semantic Overload Mechanism::
* Semantic Parser Structure::
* Application API Structure::


File: semantic-langdev.info, Node: Semantic Overload Mechanism, Next: Semantic Parser Structure, Up: Language Support Overview

Semantic Overload Mechanism
===========================

One of semantic's goals is to provide a framework for supporting a
wide range of languages.  Writing parsers for some languages is very
simple, e.g., for any dialect of the Lisp family, such as Emacs Lisp
and Scheme.  Parsers for many languages, such as C, Java, and Python,
can be written with context free grammars.  On the other hand, it is
impossible to specify context free grammars for other languages, such
as Texinfo.  Yet semantic already provides parsers for all these
languages.

In order to support such a wide range of languages, a mechanism for
customizing the parser engine was needed, one that maximizes code
reuse yet gives each programmer the flexibility of customizing the
parser engine at many levels of granularity.  The solution that
semantic provides is the function overloading mechanism, which allows
one to intercept and customize the behavior of many of the functions
in the parser engine.

First, the parser engine breaks down the task of parsing a language
into several steps.  Each step is represented by an Emacs Lisp
function.  Some of these are `semantic-parse-region', `semantic-lex',
`semantic-parse-stream', `semantic-parse-changes', etc.  Many built-in
semantic functions are declared as over-loadable functions, i.e.,
functions that do reasonable things for most languages, but can be
customized to suit the particular needs of a given language.  All
over-loadable functions can then easily be over-ridden if necessary.
The rest of this section provides details on this overloading
mechanism.
Over-loadable functions are created by defining functions with the
`define-overload' macro rather than the usual `defun'.
`define-overload' is a thin wrapper around `defun' that sets up the
function so that it can be overloaded.  An over-loadable function can
then be over-ridden in one of two ways:
`define-mode-overload-implementation' and
`semantic-install-function-overrides'.

Let's look at a couple of examples.  `semantic-parse-region' is one of
the top level functions in the parser engine, defined via
`define-overload':

     (define-overload semantic-parse-region
       (start end &optional nonterminal depth returnonerror)
       "Parse the area between START and END, and return any tokens found.
     ... tokens.")

The documentation string was truncated in the middle above, since it
is not relevant here.  The macro invocation above defines the
`semantic-parse-region' Emacs Lisp function, which first checks
whether there is an overloaded implementation.  If one is found, then
that is called.  If a mode specific implementation is not found, then
the default implementation is called, which in this case is
`semantic-parse-region-default', i.e., a function with the same name
but with a trailing `-default'.  That function needs to be written
separately, and must take the same arguments as the entry created with
`define-overload'.

One way to overload `semantic-parse-region' is via
`semantic-install-function-overrides'.  An example from the
`semantic-texi.el' file is shown below:

     (defun semantic-default-texi-setup ()
       "Set up a buffer for parsing of Texinfo files."
       ;; This will use our parser.
       (semantic-install-function-overrides
        '((parse-region . semantic-texi-parse-region)
          (parse-changes . semantic-texi-parse-changes)))
       ...
       )

     (add-hook 'texinfo-mode-hook 'semantic-default-texi-setup)

The above function is called whenever a buffer is set up in Texinfo
mode.  The `semantic-install-function-overrides' call above indicates
that `semantic-texi-parse-region' is to over-ride the default
implementation of `semantic-parse-region'.  Note the use of the
`parse-region' symbol, which is `semantic-parse-region' without the
leading `semantic-' prefix.

Another way to over-ride a built-in semantic function is via
`define-mode-overload-implementation'.  An example from the
`wisent-python.el' file is shown below.

     (define-mode-overload-implementation semantic-parse-region
       python-mode (start end &optional nonterminal depth returnonerror)
       "Over-ride in order to initialize some variables."
       (let ((wisent-python-lexer-indent-stack '(0))
             (wisent-python-explicit-line-continuation nil))
         (semantic-parse-region-default
          start end nonterminal depth returnonerror)))

The above over-rides `semantic-parse-region' so that for buffers whose
major mode is `python-mode', the code specified above is executed
rather than the default implementation.

Why not use advice
------------------

One may wonder why semantic needs its own overload mechanism when
Emacs already has advice.  *Note (elisp)Advising Functions::.
Advising is generally considered a mechanism of last resort for
modifying or hooking into an existing package without modifying its
source file.  Over-loadable functions, by contrast, advertise that
they are meant to be overloaded, and define syntactic sugar for doing
so.


File: semantic-langdev.info, Node: Semantic Parser Structure, Next: Application API Structure, Prev: Semantic Overload Mechanism, Up: Language Support Overview

Semantic Parser Structure
=========================

NOTE: describe the functions that do parsing, and how to overload
each.
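Until that description is written, here is a rough sketch of the
default parse path, assembled from the function names mentioned in
this overview; the exact plumbing may differ between semantic
versions:

     ;; Rough sketch of the default parse path (function names from
     ;; this manual; exact plumbing may differ between versions):
     ;;
     ;; (semantic-fetch-tags)
     ;;   -> (semantic-parse-region start end ...)       ; overloadable
     ;;        -> (semantic-lex start end depth)         ; overloadable
     ;;        -> (semantic-repeat-parse-whole-stream ...)
     ;;             -> (semantic-parse-stream ...)       ; overloadable
     ;;
     ;; Incremental reparses instead go through:
     ;; (semantic-parse-changes)                         ; overloadable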
File: semantic-langdev.info, Node: Application API Structure, Prev: Semantic Parser Structure, Up: Language Support Overview

Application API Structure
=========================

NOTE: improve this: how to program against the data structures created
by semantic is covered in the application programming guide.  Read
that guide to get a feel for the specifics of what you can customize.
*note (semantic-appdev)top::

Here is a list of applications, and the specific APIs that you will
need to overload to make them work properly with your language.

`imenu'
`speedbar'
`ecb'
     These tools require that the `semantic-format' methods create
     correct strings.  *note (semantic-appdev)Format Tag::

`semantic-analyze'
     The analysis tool requires that the `semanticdb' tool is active,
     and that the searching methods are overloaded.  In addition, a
     `semanticdb' system database could be written to provide symbols
     from the global environment of your language.  *note
     (semantic-appdev)System Databases::

     In addition, the analyzer requires that the `semantic-ctxt'
     methods are overloaded.  These methods allow the analyzer to look
     at the context of the cursor in your language, and predict the
     type of the location of the cursor.  *note
     (semantic-appdev)Derived Context::.

`semantic-idle-summary-mode'
`semantic-idle-completions-mode'
     These tools use the semantic analysis tool.  *note
     (semantic-appdev)Context Analysis::.

* Menu:

* Semantic Analyzer Support::


File: semantic-langdev.info, Node: Semantic Analyzer Support, Up: Application API Structure

Semantic Analyzer Support
-------------------------


File: semantic-langdev.info, Node: Writing Lexers, Next: Writing Parsers, Prev: Language Support Overview, Up: Top

Writing Lexers
**************

In order to reduce a source file into a tag table, it must first be
converted into a token stream.  Tokens are syntactic elements such as
whitespace, symbols, strings, lists, and punctuation.

The lexer uses the major-mode's syntax table for conversion.  *Note
Syntax Tables: (elisp)Syntax Tables.  As long as that is set up
correctly (along with the important `comment-start' and
`comment-start-skip' variables) the lexer should already work for your
language.

The primary entry point of the lexer is the `semantic-lex' function
shown below.  Normally, you do not need to call this function.  It is
usually called by `semantic-fetch-tags' for you.

 - Function: semantic-lex start end &optional depth length
     Lexically analyze text in the current buffer between START and
     END.  Optional argument DEPTH indicates at what level to scan
     over entire lists.  The last argument, LENGTH, specifies that
     `semantic-lex' should only return LENGTH tokens.  The return
     value is a token stream.  Each element is a list of the form

          (SYMBOL START-EXPRESSION . END-EXPRESSION)

     where SYMBOL denotes the token type.  See the
     `semantic-lex-tokens' variable for details on token types.  END
     does not mark the end of the text scanned, only the end of the
     beginning of text scanned.  Thus, if a string extends past END,
     the end of the returned token will be larger than END.  To truly
     restrict scanning, use `narrow-to-region'.

* Menu:

* Lexer Overview::              What is a Lexer?
* Lexer Output::                Output of a Lexical Analyzer
* Lexer Construction::          Constructing your own lexer
* Lexer Built In Analyzers::    Built in analyzers you can use
* Lexer Analyzer Construction:: Constructing your own analyzers
* Keywords::                    Specialized lexical tokens.
* Keyword Properties::
File: semantic-langdev.info, Node: Lexer Overview, Next: Lexer Output, Up: Writing Lexers

Lexer Overview
==============

The lexer converts the text of a buffer into a stream of semantic
tokens.  This process is based mostly on regular expressions, which in
turn depend on the syntax table of the buffer's major mode being set
up properly.  *Note Major Modes: (emacs)Major Modes.  *Note Syntax
Tables: (elisp)Syntax Tables.  *Note Regexps: (emacs)Regexps.

The top level lexical function `semantic-lex' calls the function
stored in `semantic-lex-analyzer'.  The default value is the function
`semantic-flex', from version 1.4 of Semantic.  This will eventually
be deprecated.

In the default lexer, the following regular expressions, which rely on
syntax tables, are used:

`\\s-'
     whitespace characters
`\\sw'
     word constituent
`\\s_'
     symbol constituent
`\\s.'
     punctuation character
`\\s<'
     comment starter
`\\s>'
     comment ender
`\\s\\'
     escape character
`\\s)'
     close parenthesis character
`\\s$'
     paired delimiter
`\\s"'
     string quote
`\\s''
     expression prefix

In addition, Emacs' built-in features such as `comment-start-skip',
`forward-comment', `forward-list', and `forward-sexp' are employed.


File: semantic-langdev.info, Node: Lexer Output, Next: Lexer Construction, Prev: Lexer Overview, Up: Writing Lexers

Lexer Output
============

The lexer (see `semantic-lex' above) scans the content of a buffer and
returns a token list.  Let's illustrate this using this simple
example.

     00: /*
     01:  * Simple program to demonstrate semantic.
     02:  */
     03:
     04: #include <stdio.h>
     05:
     06: int i_1;
     07:
     08: int
     09: main(int argc, char** argv)
     10: {
     11:   printf("Hello world.\n");
     12: }

Evaluating `(semantic-lex (point-min) (point-max))' within the buffer
with the code above returns the following token list.  The input line
and string that produced each token is shown after each semi-colon.

     ((punctuation 52 . 53)       ; 04: #
      (INCLUDE 53 . 60)           ; 04: include
      (punctuation 61 . 62)       ; 04: <
      (symbol 62 . 67)            ; 04: stdio
      (punctuation 67 . 68)       ; 04: .
      (symbol 68 . 69)            ; 04: h
      (punctuation 69 . 70)       ; 04: >
      (INT 72 . 75)               ; 06: int
      (symbol 76 . 79)            ; 06: i_1
      (punctuation 79 . 80)       ; 06: ;
      (INT 82 . 85)               ; 08: int
      (symbol 86 . 90)            ; 08: main
      (semantic-list 90 . 113)    ; 08: (int argc, char** argv)
      (semantic-list 114 . 147)   ; 09-12: body of main function
     )

As shown above, the token list is a list of "tokens".  Each token in
turn is a list of the form

     (TOKEN-TYPE BEGINNING-POSITION . ENDING-POSITION)

where TOKEN-TYPE is a symbol, and the other two are integers
indicating the buffer positions that delimit the token, such that

     (buffer-substring BEGINNING-POSITION ENDING-POSITION)

would return the string form of the token.

Note that one line (line 4 above) can produce seven tokens, while the
whole body of the function produces a single token.  This is because
the DEPTH parameter of `semantic-lex' was not specified.  Let's see
the output when DEPTH is set to 1.  Evaluate `(semantic-lex
(point-min) (point-max) 1)' in the same buffer.  Note the third
argument of `1'.

     ((punctuation 52 . 53)       ; 04: #
      (INCLUDE 53 . 60)           ; 04: include
      (punctuation 61 . 62)       ; 04: <
      (symbol 62 . 67)            ; 04: stdio
      (punctuation 67 . 68)       ; 04: .
      (symbol 68 . 69)            ; 04: h
      (punctuation 69 . 70)       ; 04: >
      (INT 72 . 75)               ; 06: int
      (symbol 76 . 79)            ; 06: i_1
      (punctuation 79 . 80)       ; 06: ;
      (INT 82 . 85)               ; 08: int
      (symbol 86 . 90)            ; 08: main
      (open-paren 90 . 91)        ; 08: (
      (INT 91 . 94)               ; 08: int
      (symbol 95 . 99)            ; 08: argc
      (punctuation 99 . 100)      ; 08: ,
      (CHAR 101 . 105)            ; 08: char
      (punctuation 105 . 106)     ; 08: *
      (punctuation 106 . 107)     ; 08: *
      (symbol 108 . 112)          ; 08: argv
      (close-paren 112 . 113)     ; 08: )
      (open-paren 114 . 115)      ; 10: {
      (symbol 120 . 126)          ; 11: printf
      (semantic-list 126 . 144)   ; 11: ("Hello world.\n")
      (punctuation 144 . 145)     ; 11: ;
      (close-paren 146 . 147)     ; 12: }
     )

The DEPTH parameter "peeled away" one more level of "list", delimited
by matching parentheses or braces.  The depth parameter can be
specified to be any number.  However, the parser needs to be able to
handle the extra tokens.

This is an interesting benefit of the lexer having the full resources
of Emacs at its disposal.  Skipping over matched parentheses is
achieved by simply calling the built-in functions `forward-list' and
`forward-sexp'.


File: semantic-langdev.info, Node: Lexer Construction, Next: Lexer Built In Analyzers, Prev: Lexer Output, Up: Writing Lexers

Lexer Construction
==================

While using the default lexer is certainly an option, particularly for
grammars written in semantic 1.4 style, it is usually more efficient
to create a custom lexer for your language.  You can create a new
lexer with `define-lex'.

 - Function: define-lex name doc &rest analyzers
     Create a new lexical analyzer with NAME.  DOC is a documentation
     string describing this analyzer.  ANALYZERS are small code
     snippets of analyzers to use when building the new NAMED
     analyzer.  Only use analyzers which are written to be used in
     `define-lex'.  Each analyzer should be an analyzer created with
     `define-lex-analyzer'.

     Note: The order in which analyzers are listed is important.  If
     two analyzers can match the same text, order the analyzers so
     that the one you want to match first occurs first.  For example,
     it is good to put a number analyzer in front of a symbol
     analyzer, which might otherwise mistake a number for a symbol.

The list of ANALYZERS needed here can consist of the built in
analyzers described in the next section, or analyzers of your own
construction, as sketched below.
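For instance, a lexer assembled entirely from built-in analyzers might
look like the following sketch.  The name `foo-lexical-analyzer' and
the particular selection and order of analyzers are illustrative only:

     ;; A minimal sketch assembled from built-in analyzers.  The name
     ;; and the selection/order of analyzers are illustrative only.
     (define-lex foo-lexical-analyzer
       "Lexical analyzer for a hypothetical FOO language."
       semantic-lex-ignore-whitespace
       semantic-lex-ignore-newline
       semantic-lex-ignore-comments
       semantic-lex-number
       semantic-lex-symbol-or-keyword
       semantic-lex-paren-or-list
       semantic-lex-close-paren
       semantic-lex-string
       semantic-lex-punctuation
       semantic-lex-default-action)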
File: semantic-langdev.info, Node: Lexer Built In Analyzers, Next: Lexer Analyzer Construction, Prev: Lexer Construction, Up: Writing Lexers

Lexer Built In Analyzers
========================

 - Special Form: semantic-lex-default-action
     The default action when no other lexical actions match text.
     This action will just throw an error.

 - Special Form: semantic-lex-beginning-of-line
     Detect and create a beginning of line token (BOL).

 - Special Form: semantic-lex-newline
     Detect and create newline tokens.

 - Special Form: semantic-lex-newline-as-whitespace
     Detect and create newline tokens.  Use this ONLY if newlines are
     not whitespace characters (such as when they are comment end
     characters) AND when you want whitespace tokens.

 - Special Form: semantic-lex-ignore-newline
     Detect and skip over newline tokens.  Use this ONLY if newlines
     are not whitespace characters (such as when they are comment end
     characters).

 - Special Form: semantic-lex-whitespace
     Detect and create whitespace tokens.

 - Special Form: semantic-lex-ignore-whitespace
     Detect and skip over whitespace tokens.

 - Special Form: semantic-lex-number
     Detect and create number tokens.  Number tokens are matched via
     this variable:

 - Variable: semantic-lex-number-expression
     Regular expression for matching a number.  If this value is
     `nil', no number extraction is done during lex.  This expression
     tries to match C and Java like numbers.

          DECIMAL_LITERAL:
              [1-9][0-9]*
            ;
          HEX_LITERAL:
              0[xX][0-9a-fA-F]+
            ;
          OCTAL_LITERAL:
              0[0-7]*
            ;
          INTEGER_LITERAL:
              <DECIMAL_LITERAL>[lL]?
            | <HEX_LITERAL>[lL]?
            | <OCTAL_LITERAL>[lL]?
            ;
          EXPONENT:
              [eE][+-]?[0-9]+
            ;
          FLOATING_POINT_LITERAL:
              [0-9]+[.][0-9]*<EXPONENT>?[fFdD]?
            | [.][0-9]+<EXPONENT>?[fFdD]?
            | [0-9]+<EXPONENT>[fFdD]?
            | [0-9]+<EXPONENT>?[fFdD]
            ;

 - Special Form: semantic-lex-symbol-or-keyword
     Detect and create symbol and keyword tokens.

 - Special Form: semantic-lex-charquote
     Detect and create charquote tokens.

 - Special Form: semantic-lex-punctuation
     Detect and create punctuation tokens.

 - Special Form: semantic-lex-punctuation-type
     Detect and create a punctuation type token.  Recognized
     punctuations are defined in the current table of lexical types,
     as the value of the `punctuation' token type.

 - Special Form: semantic-lex-paren-or-list
     Detect open parenthesis.  Return either a paren token or a
     semantic list token depending on `semantic-lex-current-depth'.

 - Special Form: semantic-lex-open-paren
     Detect and create an open parenthesis token.

 - Special Form: semantic-lex-close-paren
     Detect and create a close paren token.

 - Special Form: semantic-lex-string
     Detect and create a string token.

 - Special Form: semantic-lex-comments
     Detect and create a comment token.

 - Special Form: semantic-lex-comments-as-whitespace
     Detect comments and create a whitespace token.

 - Special Form: semantic-lex-ignore-comments
     Detect comments and skip over them.


File: semantic-langdev.info, Node: Lexer Analyzer Construction, Next: Keywords, Prev: Lexer Built In Analyzers, Up: Writing Lexers

Lexer Analyzer Construction
===========================

Each of the previous built in analyzers is constructed using a set of
analyzer construction macros.  The root construction macro is:

 - Function: define-lex-analyzer name doc condition &rest forms
     Create a single lexical analyzer NAME with DOC.  When an analyzer
     is called, the current buffer and point are positioned in a
     buffer at the location to be analyzed.  CONDITION is an
     expression which returns `t' if FORMS should be run.  Within
     CONDITION and FORMS, backquote can be used to evaluate
     expressions at compile time.  While FORMS are running, the
     following variables will be locally bound:

    `semantic-lex-analysis-bounds'
          The bounds of the current analysis, of the form
          (START . END).

    `semantic-lex-maximum-depth'
          The maximum depth of semantic-list for the current analysis.

    `semantic-lex-current-depth'
          The current depth of `semantic-list' that has been
          descended.

    `semantic-lex-end-point'
          End point after match.  Analyzers should set this to a
          buffer location if their match string does not represent
          the end of the matched text.

    `semantic-lex-token-stream'
          The token list being collected.  Add new lexical tokens to
          this list.

     Proper action in FORMS is to move the value of
     `semantic-lex-end-point' to after the location of the analyzed
     entry, and to add any discovered tokens at the beginning of
     `semantic-lex-token-stream'.  This can be done by using
     `semantic-lex-push-token'.

Additionally, a simple regular expression based analyzer can be built
with:

 - Function: define-lex-regex-analyzer name doc regexp &rest forms
     Create a lexical analyzer with NAME and DOC that will match
     REGEXP.  FORMS are evaluated upon a successful match.  See
     `define-lex-analyzer' for more about analyzers.

 - Function: define-lex-simple-regex-analyzer name doc regexp toksym
          &optional index &rest forms
     Create a lexical analyzer with NAME and DOC that matches REGEXP.
     TOKSYM is the symbol to use when creating a semantic lexical
     token.  INDEX is the index into the match that defines the bounds
     of the token.  INDEX should be a plain integer, and not specified
     in the macro as an expression.  FORMS are evaluated upon a
     successful match BEFORE the new token is created.  It is valid to
     ignore FORMS.  See `define-lex-analyzer' for more about
     analyzers.
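Here is a sketch of a hand-written analyzer following the conventions
described for `define-lex-analyzer' above.  The token class `shebang'
is invented for this example:

     ;; A hedged sketch: recognize "#!" interpreter lines at buffer
     ;; start and record them as a token of the invented class
     ;; 'shebang.  Per the conventions above: push the token, then
     ;; move `semantic-lex-end-point' past the match.
     (define-lex-analyzer foo-lex-shebang
       "Detect #! interpreter lines and push a 'shebang token."
       (and (bobp) (looking-at "#![^\n]*"))
       (semantic-lex-push-token
        (semantic-lex-token 'shebang (match-beginning 0) (match-end 0)))
       (setq semantic-lex-end-point (match-end 0)))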
Regular expression analyzers are the simplest to create and manage.
Often, a majority of your lexer can be built this way.  The analyzer
for matching punctuation looks like this:

     (define-lex-simple-regex-analyzer semantic-lex-punctuation
       "Detect and create punctuation tokens."
       "\\(\\s.\\|\\s$\\|\\s'\\)" 'punctuation)

More complex analyzers, for matching larger units of text to optimize
the speed of parsing and analysis, are built by matching blocks.

 - Function: define-lex-block-analyzer name doc spec1 &rest specs
     Create a lexical analyzer NAME for paired delimiter blocks.  It
     detects a paired delimiters block, or the corresponding open or
     close delimiter, depending on the value of the variable
     `semantic-lex-current-depth'.  DOC is the documentation string of
     the lexical analyzer.  SPEC1 and SPECS specify the token symbols
     and open, close delimiters used.  Each SPEC has the form:

          (BLOCK-SYM (OPEN-DELIM OPEN-SYM) (CLOSE-DELIM CLOSE-SYM))

     where BLOCK-SYM is the symbol returned in a block token.
     OPEN-DELIM and CLOSE-DELIM are respectively the open and close
     delimiters identifying a block.  OPEN-SYM and CLOSE-SYM are
     respectively the symbols returned in open and close tokens.

These blocks are what make semantic's Emacs Lisp based parsers fast.
For example, by defining all text inside { braces } as a block, the
parser does not need to know the contents of those braces while
parsing, and can skip them altogether.


File: semantic-langdev.info, Node: Keywords, Next: Keyword Properties, Prev: Lexer Analyzer Construction, Up: Writing Lexers

Keywords
========

Another important piece of the lexer is the keyword table (see *Note
Writing Parsers::).  Your language will want to set up a keyword table
for fast conversion of symbol strings to language terminals.  The
keyword table can also be used to store additional information about
those keywords.  The following programming functions can be useful
when examining text in a language buffer.

 - Function: semantic-lex-keyword-p name
     Return non-`nil' if a keyword with NAME exists in the keyword
     table.  Return `nil' otherwise.

 - Function: semantic-lex-keyword-put name property value
     For keyword with NAME, set its PROPERTY to VALUE.

 - Function: semantic-lex-keyword-get name property
     For keyword with NAME, return its PROPERTY value.

 - Function: semantic-lex-map-keywords fun &optional property
     Call function FUN on every semantic keyword.  If optional
     PROPERTY is non-`nil', call FUN only on every keyword which has a
     PROPERTY value.  FUN receives a semantic keyword as argument.

 - Function: semantic-lex-keywords &optional property
     Return a list of semantic keywords.  If optional PROPERTY is
     non-`nil', return only keywords which have a PROPERTY set.
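A sketch of these functions in action, assuming the active keyword
table (normally set up by your grammar) already defines an `if'
keyword; the summary string is illustrative:

     ;; A minimal sketch.  Assumes the current keyword table already
     ;; defines the keyword "if"; the summary string is illustrative.
     (when (semantic-lex-keyword-p "if")
       (semantic-lex-keyword-put "if" 'summary
                                 "if (<condition>) { code }"))

     (semantic-lex-keyword-get "if" 'summary)
          => "if (<condition>) { code }"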
Keyword properties can be set up in a grammar file for ease of
maintenance.  While examining the text in a language buffer, this can
provide an easy and quick way of storing details about text in the
buffer.


File: semantic-langdev.info, Node: Keyword Properties, Prev: Keywords, Up: Writing Lexers

Standard Keyword Properties
===========================

Keywords in a language can have multiple properties.  These properties
can be used to associate the string that is the keyword with
additional information.  Currently available properties are:

`summary'
     The summary property is used by `semantic-summary-mode' as a help
     string for the keyword specified.

Notes: Possible future properties.  This is just me musing:

`face'
     Face used for highlighting this keyword, differentiating it from
     the keyword face.
`template'
`skeleton'
     Some sort of tempo/skel template for inserting the programmatic
     structure associated with this keyword.
`abbrev'
     As with template.
`action'
`menu'
     Perhaps the keyword is clickable and some action would be useful.


File: semantic-langdev.info, Node: Writing Parsers, Next: Parsing a language file, Prev: Writing Lexers, Up: Top

Writing Parsers
***************

When converting a source file into a tag table, it is important to
specify rules to accomplish this.  The rules are stored in the buffer
local variable `semantic--parse-table'.

While it is certainly possible to write this table yourself, it is
most likely that you will want to use the *Note Grammar Programming
Environment::.

There are three choices for parsing your language.

Bovine Parser
     The "bovine" parser is the original semantic parser, and is an
     implementation of an LL parser.  For more information, *note the
     Bovine Parser Manual: (bovine)top.

Wisent Parser
     The "wisent" parser is a port of the GNU Compiler Compiler Bison
     to Emacs Lisp.  Wisent includes the iterative error handler of
     the bovine parser, and has the same error correction as
     traditional LALR parsers.  For more information, *note the Wisent
     Parser Manual: (wisent)top.

External Parser
     External parsers, such as the texinfo parser, can be implemented
     using any means.  This allows the use of a regular expression
     parser for non-regular languages, or of external programs for
     speed.

* Menu:

* External Parsers::                Writing an external parser
* Grammar Programming Environment:: Using the grammar writing environment
* Parser Backend Support::          Lisp needed to support a grammar.


File: semantic-langdev.info, Node: External Parsers, Next: Grammar Programming Environment, Up: Writing Parsers

External Parsers
================

The texinfo parser in `semantic-texi.el' is an example of an external
parser.  To make your parser work, you need to have a setup function.

Note: Finish this.


File: semantic-langdev.info, Node: Grammar Programming Environment, Next: Parser Backend Support, Prev: External Parsers, Up: Writing Parsers

Grammar Programming Environment
===============================

Semantic grammar files in `.by' or `.wy' format have their own
programming mode.  This mode provides indentation and coloring
services for those languages.  In addition, the grammar languages are
also supported by semantic tools such as imenu or speedbar.  For more
information, *note the Grammar Framework Manual: (grammar-fw)top.


File: semantic-langdev.info, Node: Parsing a language file, Next: Debugging, Prev: Writing Parsers, Up: Top

Parsing a language file
***********************

The best way to call the parser from programs is via
`semantic-fetch-tags'.  This, in turn, uses other internal API
functions which plug-in parsers can take advantage of.

 - Function: semantic-fetch-tags
     Fetch semantic tags from the current buffer.  If the buffer cache
     is up to date, return that.  If the buffer cache is out of date,
     attempt an incremental reparse.  If the buffer has not been
     parsed before, or if the incremental reparse fails, then parse
     the entire buffer.  If a lexical error had been previously
     discovered and the buffer was marked unparseable, then do
     nothing, and return the cache.
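For example, here is a minimal sketch of programmatic use;
`semantic-tag-name' is the tag accessor documented in the application
development manual:

     ;; A minimal sketch: fetch (or reuse) the tag table of the
     ;; current buffer and list the names of the top level tags.
     (dolist (tag (semantic-fetch-tags))
       (message "Found tag: %s" (semantic-tag-name tag)))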
Another approach is to let Emacs call the parser on idle time, when
needed, then use `semantic-fetch-available-tags' to retrieve and
process only the available tags, provided that the
`semantic-after-*-hook' hooks have been set up to synchronize with new
tags when they become available.

 - Function: semantic-fetch-available-tags
     Fetch available semantic tags from the current buffer.  That is,
     return tags currently in the cache without parsing the current
     buffer.  Parse operations happen asynchronously when needed on
     Emacs idle time.  Use the
     `semantic-after-toplevel-cache-change-hook' and
     `semantic-after-partial-cache-change-hook' hooks to synchronize
     with new tags when they become available.

 - Command: semantic-clear-toplevel-cache
     Clear the toplevel tag cache for the current buffer.  Clearing
     the cache will force a complete reparse next time a token stream
     is requested.


File: semantic-langdev.info, Node: Parser Backend Support, Prev: Grammar Programming Environment, Up: Writing Parsers

Parser Backend Support
======================

Once you have written a grammar file and it has been compiled into
Emacs Lisp code, additional glue needs to be written to finish
connecting the generated parser into the Emacs framework.  Large
portions of this glue are generated automatically, but they will
probably need additional modification to get things to work properly.

Typically, a grammar file `foo.wy' will create the file `foo-wy.el'.
It is then useful to also create a file `wisent-foo.el' (or
`semantic-foo.el') to contain the parser back end, i.e. the glue that
completes the semantic support for the language.

* Menu:

* Example Backend File::
* Tag Expansion::


File: semantic-langdev.info, Node: Example Backend File, Next: Tag Expansion, Up: Parser Backend Support

Example Backend File
--------------------

Typical structure for this file is:

     ;;; semantic-foo.el -- parser support for FOO.

     ;;; Your copyright Notice

     (require 'foo-wy)  ;; The generated parser
     (require 'foo)     ;; Major mode definition for FOO

     ;;; Code:

     ;;; Lexical Analyzer
     ;;
     ;; OPTIONAL
     ;; It is possible to define your lexical analyzer completely in
     ;; your grammar file.
     (define-lex foo-lexical-analyzer
       "Create a lexical analyzer."
       ...)

     ;;; Expand Function
     ;;
     ;; OPTIONAL
     ;; Not all languages are so complex as to need this function.
     ;; See `semantic-tag-expand-function' for more details.
     (defun foo-tag-expand-function (tag)
       "Expand TAG into multiple tags if needed."
       ...)

     ;;; Parser Support
     ;;
     ;; OPTIONAL
     ;; If you need some specialty routines inside your grammar file,
     ;; you can add some here.  The process may be to take diverse
     ;; info and reorganize it.
     ;;
     ;; It is also appropriate to write these functions in the
     ;; prologue of the grammar file.
     (defun foo-do-something-hard (...)
       "...")

     ;;; Overload methods
     ;;
     ;; OPTIONAL
     ;; To allow your language to be fully supported by all the
     ;; applications that use semantic, it is important, but not
     ;; necessary, to create implementations of overload methods.
     (define-mode-overload-implementation some-semantic-function
       foo-mode (tag)
       "Implement some-semantic-function for FOO."
       )

     ;;;###autoload
     (defun semantic-default-foo-setup ()
       "Set up a buffer for semantic parsing of the FOO language."
       (semantic-foo-by--install-parser)
       (setq semantic-tag-expand-function #'foo-tag-expand-function
             ;; Many other language specific settings can be done here
             ;; as well.
             )
       ;; This may be optional.
       (setq semantic-lex-analyzer #'foo-lexical-analyzer)
       )

     ;;;###autoload
     (add-hook 'foo-mode-hook 'semantic-default-foo-setup)

     (provide 'semantic-foo)

     ;;; semantic-foo.el ends here


File: semantic-langdev.info, Node: Tag Expansion, Prev: Example Backend File, Up: Parser Backend Support

Tag Expansion
-------------

In any language with compound tag types, you will need to implement an
_expand function_.  Once written, assign it to this variable.

 - Variable: semantic-tag-expand-function
     Function used to expand a tag.  It is passed each tag production,
     and must return a list of tags derived from it, or `nil' if it
     does not need to be expanded.

     Languages with compound definitions should use this function to
     expand from one compound symbol into several.  For example, in C
     or Java the following definition is easily parsed into one tag:

          int a, b;

     This function should take this compound tag and turn it into two
     tags, one for A, and the other for B.

Additionally, you can use the expand function in conjunction with your
language for other types of compound statements.  For example, in the
Common Lisp Object System, you can have a definition:

     (defclass classname nil
       (slots ...)
       ...)

This will create both the datatype `classname' and the functional
constructor `classname'.  Each slot may have a `:accessor' method as
well.

You can create a special compounded tag in your rule, for example:

     classdef: LPAREN DEFCLASS name semantic-list semantic-list RPAREN
               (TAG "custom" 'compound-class
                    :value (list (TYPE-TAG $3 "class" ...)
                                 (FUNCTION-TAG $3 ...)))
       ;

and in your expand function, you would write:

     (defun my-tag-expand (tag)
       "Expand tags for my language."
       (when (semantic-tag-of-class-p tag 'compound-class)
         (remq nil (semantic-tag-get-attribute tag :value))))

This will cause the custom tag to be replaced by the tags created in
the `:value' attribute of the specially constructed tag.
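Returning to the `int a, b;' example, a minimal expand function might
look like the sketch below.  It assumes your grammar's actions stored
the comma-separated names as a list in the name slot of a single
`variable' tag; that representation is a choice of your grammar, not
something semantic mandates:

     ;; A hedged sketch for the `int a, b;' case.  Assumes the grammar
     ;; action produced one 'variable tag whose name slot holds the
     ;; list of names, e.g. (("a" "b") variable ...).
     (defun foo-tag-expand (tag)
       "Expand TAG when its name slot contains a list of names."
       (when (listp (semantic-tag-name tag))
         (mapcar (lambda (name)
                   ;; `semantic-tag-clone' copies TAG with a new NAME.
                   (semantic-tag-clone tag name))
                 (semantic-tag-name tag))))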
File: semantic-langdev.info, Node: Debugging, Next: Parser Error Handling, Prev: Parsing a language file, Up: Top

Debugging
*********

Grammars can be tricky things to debug.  There are several types of
tools for debugging in Semantic, and different types of problems call
for different types of tools.

* Menu:

* Lexical Debugging::
* Parser Output tools::
* Bovine Parser Debugging::
* Wisent Parser Debugging::
* Overlay Debugging::
* Incremental Parser Debugging::
* Debugging Analysis::
* Semantic 1.4 Doc::


File: semantic-langdev.info, Node: Lexical Debugging, Next: Parser Output tools, Up: Debugging

Lexical Debugging
=================

The first major problem you may encounter is with lexical analysis.
If the text is not transformed into the expected token stream, no
parser will understand it.  You can step through the lexical analyzer
with the following command:

 - Command: semantic-lex-debug arg
     Debug the semantic lexer in the current buffer.  Argument ARG
     specifies whether to analyze the whole buffer, or start at point.
     While engaged, each token identified by the lexer will be
     highlighted in the target buffer.  A description of the current
     token will be displayed in the minibuffer.  Press `SPC' to move
     to the next lexical token.

For an example of what the `semantic-lex' function should return, see
*Note Lexer Output::.


File: semantic-langdev.info, Node: Parser Output tools, Next: Bovine Parser Debugging, Prev: Lexical Debugging, Up: Debugging

Parser Output tools
===================

There are several tools which can be used to see what the parser
output is.  These will work for any type of parser, including the
bovine parser and the wisent parser.

The first and easiest is a minor mode which highlights text the parser
did not understand.

 - Command: semantic-show-unmatched-syntax-mode &optional arg
     Minor mode to highlight unmatched lexical syntax tokens.  When a
     parser executes, some elements in the buffer may not match any
     parser rules.  These text characters are considered unmatched
     syntax.  Often, the display of unmatched syntax can expose coding
     problems before the compiler is run.  With prefix argument ARG,
     turn on if positive, otherwise off.  The minor mode can be turned
     on only if the semantic feature is available and the current
     buffer was set up for parsing.  Return non-`nil' if the minor
     mode is enabled.

          `key'       binding
          `C-c ,'     Prefix Command
          `C-c , `'   semantic-show-unmatched-syntax-next

Another interesting mode will display a line between all the tags in
the current buffer, to make it more obvious where boundaries lie.  You
can enable this as a minor mode.

 - Command: semantic-show-tag-boundaries-mode &optional arg
     Minor mode to display a boundary in front of tags.  The boundary
     is displayed using an overline in Emacs 21.  With prefix argument
     ARG, turn on if positive, otherwise off.  The minor mode can be
     turned on only if the semantic feature is available and the
     current buffer was set up for parsing.  Return non-`nil' if the
     minor mode is enabled.

Another interesting mode helps if you are worried about specific
attributes: you can use this minor mode to highlight different tokens
in different ways based on the attributes you are most concerned with.

 - Command: semantic-highlight-by-attribute-mode &optional arg
     Minor mode to highlight tags based on some attribute.  By
     default, the protection of a tag will give it a different
     background color.  With prefix argument ARG, turn on if positive,
     otherwise off.  The minor mode can be turned on only if the
     semantic feature is available and the current buffer was set up
     for parsing.  Return non-`nil' if the minor mode is enabled.

Another tool that can be used is a dump of the current list of tags.
This shows the actual Lisp representation of the tags generated, in a
rather bland dump.  This can be useful if text was successfully
parsed, and you want to be sure that the correct information was
captured.

 - Command: bovinate &optional clear
     Bovinate the current buffer.  Show output in a temp buffer.
     Optional argument CLEAR will clear the cache before bovinating.
     If CLEAR is negative, it will do a full reparse, and will not
     display the output buffer.
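While debugging a grammar, it can help to turn several of these modes
on at once.  A minimal sketch, using the minor mode commands from this
section with a positive argument to force them on:

     ;; A sketch: enable the parser output diagnostics described above.
     (semantic-show-unmatched-syntax-mode 1)
     (semantic-show-tag-boundaries-mode 1)
     (semantic-highlight-by-attribute-mode 1)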
File: semantic-langdev.info, Node: Bovine Parser Debugging, Next: Wisent Parser Debugging, Prev: Parser Output tools, Up: Debugging

Bovine Parser Debugging
=======================

The bovine parser is described in *note (bovine)top::.  Aside from
using a traditional Emacs Lisp debugger on the functions you provide
for token expansion, there is one other means of debugging, which
interactively steps over the rules in your grammar file.

 - Command: semantic-debug
     Parse the current buffer and run in debug mode.

Once the parser is activated in this mode, the current tag cache is
flushed, and the parser started.  At each stage of the parse, the
current rule and match step are highlighted in your parser source
buffer.  In a second window, the text being parsed is shown, and the
lexical token found is highlighted.  A clue of the current stack of
saved data is displayed in the minibuffer.

There is a wide range of keybindings that can be used to execute code
in your buffer.  (Not all are implemented.)

`n'
`SPC'
     Next.
`s'
     Step.
`u'
     Up.  (Not implemented yet.)
`d'
     Down.  (Not implemented yet.)
`f'
     Fail Match.  Pretend the current match element and the token in
     the buffer is a failed match, even if it is not.
`h'
     Print information about the current parser state.
`s'
     Jump to the source buffer.
`p'
     Jump to the parser buffer.
`q'
     Quit.  Exits this debug session and the parser.
`a'
     Abort.  Aborts one level of the parser, possibly exiting the
     debugger.
`g'
     Go.  Stop debugging, and just start parsing.
`b'
     Set Breakpoint.  (Not implemented yet.)
`e'
     `eval-expression'.  Lets you execute some random Emacs Lisp
     command.

Note: While the core of `semantic-debug' is a generic debugger
interface for rule based grammars, only the bovine parser has a
specific backend implementation.  If someone wants to implement a
debugger backend for wisent, that would be spiff.


File: semantic-langdev.info, Node: Wisent Parser Debugging, Next: Overlay Debugging, Prev: Bovine Parser Debugging, Up: Debugging

Wisent Parser Debugging
=======================

While wisent does not implement a backend for `semantic-debug', it
does have some debugging commands for use in rule actions.  You can
read about them in the wisent manual.  *note (wisent)Grammar
Debugging::


File: semantic-langdev.info, Node: Overlay Debugging, Next: Incremental Parser Debugging, Prev: Wisent Parser Debugging, Up: Debugging

Overlay Debugging
=================

Once a buffer has been parsed into a tag table, the next most
important step is getting those tags activated for a buffer, and
storable in a `semanticdb' backend.  *note
(semantic-appdev)semanticdb::.  These two activities depend on the
ability of every tag in the table to be linked and unlinked to the
current buffer with an overlay.  *note (semantic-appdev)Tag Overlay::
*note (semantic-appdev)Tag Hooks::

In this case, the most important function that must be written is:

 - Function: semantic-tag-components-with-overlays tag
     Return the list of top level components belonging to TAG.
     Children are any sub-tags which contain overlays.  The default
     behavior is to get `semantic-tag-components', in addition to the
     components of anonymous types (if applicable).

     Note for language authors: If a mode defines a language tag that
     has tags in it with overlays, you should still return them with
     this function.  Ignoring this step will prevent several features
     from working correctly.

     This function can be overridden in semantic using the symbol
     `tag-components-with-overlays'.

If you are successfully building a tag table, and errors occur saving
or restoring tags from semanticdb, this is the most likely cause of
the problem.
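As a sketch, a hypothetical FOO mode whose `type' tags keep
overlay-bearing child tags in an invented `:members' attribute might
override it like this, using the overload mechanism described earlier
in this manual:

     ;; A hedged sketch for a hypothetical FOO mode whose tags store
     ;; overlay-bearing child tags in a :members attribute (the
     ;; attribute name is invented for this example).
     (define-mode-overload-implementation
       semantic-tag-components-with-overlays foo-mode (tag)
       "Return all components of TAG that have overlays."
       (append (semantic-tag-components tag)
               (semantic-tag-get-attribute tag :members)))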
File: semantic-langdev.info, Node: Incremental Parser Debugging, Next: Debugging Analysis, Prev: Overlay Debugging, Up: Debugging

Incremental Parser Debugging
============================

The incremental parser is a highly complex engine for quickly
refreshing the tag table of a buffer after some set of changes has
been made to that buffer by a user.  There is no debugger or other
interface to the incremental parser; however, there are a few minor
modes which can help you identify issues if you think there are
problems while incrementally parsing a buffer.

The first stage of the incremental parser is tracking the changes the
user makes to a buffer.  You can visibly track these changes too.

 - Command: semantic-highlight-edits-mode &optional arg
     Minor mode for highlighting changes made in a buffer.  Changes
     are tracked by semantic so that the incremental parser can work
     properly.  This mode will highlight those changes as they are
     made, and clear them when the incremental parser accounts for
     those edits.  With prefix argument ARG, turn on if positive,
     otherwise off.  The minor mode can be turned on only if the
     semantic feature is available and the current buffer was set up
     for parsing.  Return non-`nil' if the minor mode is enabled.

Another important aspect of the incremental parser involves tracking
the current parser state of the buffer.  You can track this state
also.

 - Command: semantic-show-parser-state-mode &optional arg
     Minor mode for displaying the parser cache state in the modeline.
     The cache can be in one of three states: up to date, partial
     reparse needed, and full reparse needed.  The state is indicated
     in the modeline with the following characters:

    `-'
          The cache is up to date.
    `!'
          The cache requires a full update.
    `^'
          The cache needs to be incrementally parsed.
    `%'
          The cache is not currently parseable.
    `@'
          Auto-parse in progress (not set here.)

     With prefix argument ARG, turn on if positive, otherwise off.
     The minor mode can be turned on only if the semantic feature is
     available and the current buffer was set up for parsing.  Return
     non-`nil' if the minor mode is enabled.

When the incremental parser starts updating the tag buffer, you can
also enable a set of messages to help identify how the incremental
parser is merging changes with the main buffer.

 - Variable: semantic-edits-verbose-flag
     Non-`nil' means the incremental parser is verbose.  If `nil',
     errors are still displayed, but informative messages are not.
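To watch the incremental parser at work, the settings from this
section can be combined.  A minimal sketch:

     ;; A sketch: visualize edits and parser state, and make the
     ;; incremental parser verbose.
     (semantic-highlight-edits-mode 1)
     (semantic-show-parser-state-mode 1)
     (setq semantic-edits-verbose-flag t)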