XParse Overview

XParse is a domain-specific language that attempts to combine the power of tools like lex and yacc, which generate efficient parsers from declarative specifications, with the convenience, safety, and usability of text- or XML-processing languages such as Perl or XSLT. XParse is a standalone language which provides lex-style regular expression matching and yacc-style LALR(1) parsing.

Existing parsing tools such as lex and yacc can be difficult to learn, are usually tied to a particular host language, and often vary across languages and platforms. Parsers are therefore difficult to create and reuse, so they are usually developed per application rather than per language. Unlike the semantic actions of traditional, language-dependent parser tools, those in XParse denote XML fragments rather than uninterpreted source code. This design allows XParse programs to be typechecked and analyzed independently, without first translating them to some other programming language, as is the norm for existing parsing tools. Furthermore, since XML already enjoys wide support, XParse programs can be reused in many environments without modification.
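To make the idea of XML-valued semantic actions concrete, here is a minimal sketch in Python. It is not XParse syntax and not how xlex or xyacc work internally: the grammar, the token names, and the functions tokenize, parse_sum, and parse_to_xml are all hypothetical, and the parser is a small hand-written recursive-descent one rather than a generated LALR(1) parser. It only illustrates the general approach of lex-style regular expression matching feeding rules whose result is an XML fragment.

```python
import re
import xml.etree.ElementTree as ET

# Lex-style token rules: (token name, regular expression).
# These names and patterns are illustrative assumptions, not XParse rules.
TOKEN_RULES = [
    ("NUM", r"\d+"),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES)
)


def tokenize(text):
    """Split text into (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse_sum(tokens):
    """Parse  sum := NUM (PLUS NUM)*  and return an XML element.

    Each "semantic action" builds an XML fragment instead of running
    arbitrary host-language code.
    """
    def number(index):
        if index >= len(tokens) or tokens[index][0] != "NUM":
            raise SyntaxError("expected a number")
        node = ET.Element("num")
        node.text = tokens[index][1]
        return node, index + 1

    left, index = number(0)
    while index < len(tokens) and tokens[index][0] == "PLUS":
        right, index = number(index + 1)
        add = ET.Element("add")  # action for  sum PLUS NUM
        add.append(left)
        add.append(right)
        left = add
    if index != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return left


def parse_to_xml(text):
    """Return the parse result serialized as an XML string."""
    return ET.tostring(parse_sum(tokenize(text)), encoding="unicode")


print(parse_to_xml("1 + 2"))  # <add><num>1</num><num>2</num></add>
```

Because the result is plain XML, any XML-aware tool can consume it without knowing which language the parser was written in.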

Currently, XParse is implemented as two separate tools, xlex and xyacc. We have developed several parsing applications with these tools, including a more user-friendly concrete syntax for XSLT, a parser for Java, and parsers for xlex and xyacc specifications themselves.

In the future, I plan to merge the two sublanguages of XParse into a single language and to implement a strong type system for XParse, according to a design described in this paper, to make it easier to develop correct programs. Assistance with this task would be very welcome.

Download XParse

The current version of XParse is 0.4. It is available for download as (largely undocumented) source code here.