Portable PDP-8 and DG Nova Cross-assembler

dpa is intended as a basic example of an assembler for two simple architectures, written with the aid of lex and yacc (or GNU flex/bison). Being small, it should lend itself to learning, extension and modification. The source code is released under the GPL and has been built and tested on NetBSD, Linux and OS X.

Building and Running

(First install a Subversion client if you don't have one.)
Check out the source code: svn checkout http://telegraphics.com.au/svn/dpa/trunk dpa
Build: cd dpa; make
Test: make test (assembles all included examples)

More information

See the PDP-8 FAQ. If you don't own a PDP-8 (sadly, nor do I), I can highly recommend Bernhard Baehr's slick PDP-8/E Simulator for Macintosh, which apart from being attractive and usable, has the impressive virtue of running on all system software versions from 2 through OS X. Also available is Bob Supnik's outstanding simh emulator for these, and many other, machines (including software kits).

The assembler also targets the DG Nova, a related architecture with a 16-bit word length.

Why lex and yacc?

It has been said that writing an assembler using lex and yacc is "cheating". I can clearly show that it is, in fact, a very rational choice of tools.

The huge payoff of using lex and yacc: they free the assembler writer from the very tedious and error-prone activities of designing and building the lexical analyser and parser, and allow the author to concentrate on the relevant parts of the project: those that pertain to an assembler in general, and those that pertain to the specific architecture at hand.

(For simplicity I will refer to flex and GNU bison below rather than the original UNIX lex and yacc. Also note that all line counts run roughshod over comments, whitespace, etc.)

The magnitude of this payoff can be shown by a quick measurement of the source code:

dpa 1.95                              lines   approx. fraction per architecture
architecture independent code           785   47%
PDP-8-specific code and tables          695   42%
PDP-8 lexer (.l)                         70    4%
PDP-8 parser (.y)                       109    7%
Nova-specific code and tables          1071   42%
Nova lexer (.l)                         191    8%
Nova parser (.y)                        476   19%
Total                                  3397

Jones/Coon/Baehr pal.c                lines   approx. fraction
lexing/parsing related code             692   58%
other code and tables                   508   42%
Total                                  1200

(In particular I believe it is fair to include the large switch in onepass() in the lexing/parsing figure, as this structure directly corresponds to the form of the parser.y I have used for dpa.)

flex 2.5.4a source                   17,033
bison 1.875 source                   21,871

(I am being only slightly unfair here, since flex and bison are general purpose tools, not tailored for a specific project. This is not a very relevant objection however, because dpa ends up exercising a significant part of their feature set, and also a custom-designed lexer/parser is by definition a one-time effort that cannot be amortised over multiple projects.)

For the PDP-8 it is clear that the effort is being expended where it should be, on the higher-level task: nearly half on architecture-independent code (the stuff that any assembler does), and most of the rest on architecture specifics. The intricacies of lexing and parsing do not weigh the project down unnecessarily: the lexer and parser together account for only about one tenth of the PDP-8 assembler, and about a quarter of the Nova assembler. The benefit to maintainability is hard to overstate.

In the case of the excellent pal.c assembler, the lexical analyser and parsing logic, as elegant as it is, overwhelms the actual task at hand, at nearly 60% of the total.

Because flex and bison are mature, high performance and debugged products, a resulting assembler (for any non-trivial input syntax) is far easier to maintain, more reliable, and very likely faster than one using a handbuilt lexer/parser. One important reason for this is that lexical analysers and parsers are particularly tedious machines to design, build, test and make fully correct; it's an entire field of study in itself. An assembler is a relatively simple thing.

In the Larry Wall sense, sometimes I am a "lazy" programmer who would rather solve a problem than reinvent a very intricate set of wheels. I often judge a toolset by how closely the size of a solution reflects the size of a problem. If the solution is somehow disproportionate to the difficulty of the problem, then likely the wrong tool is being used, or the tool is badly designed. (Examples are legion.)

In short, if I sit down to write an assembler, I should be thinking about the stuff an assembler does, and the architecture for which I am assembling. If I can leverage the drudgework put in by the geniuses who designed lex and yacc and their descendants, I will.
