How to Add a New Parser to Exuberant Ctags

I have been asked more than once how one might write a new parser for Exuberant Ctags. Therefore, I put this short note together to get you started.

Operational background

As ctags reads each supplied file name, it looks at the file extension and maps it to a language using the Option.langMap[] table. The file is then opened and the parser for that language X, createXTags(), is called; it reads the file as a stream using fileGetc(). Whenever the parser finds an interesting token, it populates a structure and passes it to makeTagEntry().

To do its job, the parser reads the file stream using fileGetc() and can put back a character using fileUngetc(). To create a tag, the parser defines a local variable of type tagEntryInfo and initializes it using initTagEntry(), which sets defaults and fills in the current line number and the file position of the beginning of the line. After filling in the information defining the current entry (and possibly overriding the file position or other defaults), the parser passes this structure to makeTagEntry().

This is all there is to it. All other details (and the mess you see when you look at eiffel.c, fortran.c, or parse.c as examples) are specific to the parser and how it wants to do its job. There are some support functions which can take care of some commonly needed parsing tasks and these will be mentioned below.
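
The read-and-tag flow described above can be sketched in a few dozen lines. This is only an illustration: fileGetc(), fileUngetc(), initTagEntry(), and makeTagEntry() are replaced here by in-memory stand-ins so the example compiles on its own, and their signatures, the fields of tagEntryInfo, the helpers openInput() and readWord(), and the toy "def NAME" grammar are all assumptions rather than the real ctags source.

```c
/* Sketch only: in-memory stand-ins for the routines named in the text.
 * Signatures and struct fields are assumptions, not the real ctags API. */
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_TAGS 16
#define MAX_NAME 64

typedef struct {                  /* hypothetical, cut-down tagEntryInfo */
    char name[MAX_NAME];          /* tag name */
    unsigned long lineNumber;     /* line on which the tag was found */
    long filePosition;            /* offset of the beginning of that line */
} tagEntryInfo;

/* Stand-in for the input file: a string plus line bookkeeping. */
static const char *Input = "";
static size_t InputPos = 0;
static unsigned long LineNumber = 1;
static long LineStart = 0;
static int AtNewLine = 0;         /* last character consumed was '\n' */
static int Ungot = EOF;           /* one-character pushback buffer */

/* Stand-in for the tag file: emitted entries are collected here. */
static tagEntryInfo Tags[MAX_TAGS];
static size_t TagCount = 0;

static void openInput(const char *text)        /* hypothetical helper */
{
    Input = text; InputPos = 0; LineNumber = 1; LineStart = 0;
    AtNewLine = 0; Ungot = EOF; TagCount = 0;
}

static int fileGetc(void)
{
    int c;
    if (Ungot != EOF) { c = Ungot; Ungot = EOF; return c; }
    if (Input[InputPos] == '\0') return EOF;
    if (AtNewLine) {              /* first character of a new line */
        ++LineNumber; LineStart = (long)InputPos; AtNewLine = 0;
    }
    c = (unsigned char)Input[InputPos++];
    if (c == '\n') AtNewLine = 1;
    return c;
}

static void fileUngetc(int c) { Ungot = c; }

static void initTagEntry(tagEntryInfo *e, const char *name)
{
    assert(strlen(name) < MAX_NAME);
    strcpy(e->name, name);
    e->lineNumber = LineNumber;   /* defaults taken from the current line */
    e->filePosition = LineStart;
}

static void makeTagEntry(const tagEntryInfo *e)
{
    if (TagCount < MAX_TAGS) Tags[TagCount++] = *e;
}

/* Reads one identifier into buf and returns its length (0 if none). */
static size_t readWord(char *buf, size_t size)
{
    size_t n = 0;
    int c = fileGetc();
    while (c != EOF && (isalnum(c) || c == '_')) {
        if (n + 1 < size) buf[n++] = (char)c;
        c = fileGetc();
    }
    fileUngetc(c);                /* put back the character that ended it */
    buf[n] = '\0';
    return n;
}

/* Toy parser for a made-up language X: "def NAME" introduces a tag. */
static void createXTags(void)
{
    char word[MAX_NAME];
    int c;
    for (;;) {
        if (readWord(word, sizeof word) == 0) {
            if (fileGetc() == EOF) break;      /* skip one non-word char */
            continue;
        }
        if (strcmp(word, "def") != 0) continue;
        while ((c = fileGetc()) == ' ' || c == '\t')
            ;                                  /* skip blanks after "def" */
        fileUngetc(c);
        if (readWord(word, sizeof word) > 0) {
            tagEntryInfo e;
            initTagEntry(&e, word);            /* fills line and position */
            makeTagEntry(&e);
        }
    }
}
```

Calling openInput("def alpha\nx = 1\n  def beta\n") followed by createXTags() leaves two entries in Tags: alpha on line 1 and beta on line 3. The stand-in fileGetc() advances the line number only when the first character of the next line is read, so a parser that has just put back the newline ending a name still records the line the name was on.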

Integrating a new parser

Let's assume that I want to add support for my new language, Swine, the successor to Perl (i.e. Perl before Swine <wince>).

First, I create a new module, swine.c, and add one externally visible function to it, extern void createSwineTags(void), and add its prototype to parse.h.

Then I add an enumerator for my language, LANG_SWINE, to the langType enumeration in ctags.h. Next I add a list of default extensions, static const char *const SwineExtensions[], to options.c and add this list to the DefaultLanguageMap[] table. I also add the textual name of my language to the list of names in getLanguageName() at the index corresponding to the value of the new langType enumerator. Lastly, I add a call to createSwineTags() to createTagsForFile(), to be made when the value of language matches LANG_SWINE. This completes the basic integration of my new parser. If I compile and link ctags with my new module and run ctags, supplying it with a file name that has a corresponding extension, my entry point, createSwineTags(), is called.
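
To show how these integration points fit together, here is a single-file imitation of them. Only the identifiers quoted in the text come from ctags; the table layouts, the extensions "sw" and "swine", the LANG_IGNORE and LANG_COUNT enumerators, the lookupLanguage() helper, and the signature of createTagsForFile() are assumptions made for illustration, so treat this as a sketch and check the real ctags.h, options.c, and parse.h before copying anything.

```c
/* Sketch only: a self-contained imitation of the integration points.
 * Table layouts and signatures are assumptions, not the real ctags source. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ctags.h: one enumerator per language; LANG_SWINE is the new one. */
typedef enum {
    LANG_IGNORE = -1,             /* hypothetical "no language" value */
    LANG_C, LANG_EIFFEL, LANG_SWINE,
    LANG_COUNT                    /* hypothetical count of languages */
} langType;

/* options.c: default extensions, NULL-terminated (layout assumed). */
static const char *const CExtensions[]      = { "c", "h", NULL };
static const char *const EiffelExtensions[] = { "e", NULL };
static const char *const SwineExtensions[]  = { "sw", "swine", NULL };

/* Indexed by langType, so the order must match the enumeration. */
static const char *const *const DefaultLanguageMap[LANG_COUNT] = {
    CExtensions, EiffelExtensions, SwineExtensions
};

/* The name sits at the index equal to the enumerator's value. */
static const char *getLanguageName(langType language)
{
    static const char *const names[LANG_COUNT] = { "c", "eiffel", "swine" };
    assert(language >= 0 && language < LANG_COUNT);
    return names[language];
}

/* Hypothetical helper: map a file name's extension to a language. */
static langType lookupLanguage(const char *fileName)
{
    const char *dot = strrchr(fileName, '.');
    int lang;
    if (dot == NULL) return LANG_IGNORE;
    for (lang = 0; lang < LANG_COUNT; ++lang) {
        const char *const *ext;
        for (ext = DefaultLanguageMap[lang]; *ext != NULL; ++ext)
            if (strcmp(*ext, dot + 1) == 0) return (langType)lang;
    }
    return LANG_IGNORE;
}

/* Parser entry points; in ctags each would be an extern function in its
 * own module with a prototype in parse.h. These only count calls. */
static int SwineCalls = 0;
static void createCTags(void)      { }
static void createEiffelTags(void) { }
static void createSwineTags(void)  { ++SwineCalls; }

/* Dispatch on the language; returns 1 if a parser was run. */
static int createTagsForFile(const char *fileName)
{
    switch (lookupLanguage(fileName)) {
        case LANG_C:      createCTags();      return 1;
        case LANG_EIFFEL: createEiffelTags(); return 1;
        case LANG_SWINE:  createSwineTags();  return 1;
        default:          return 0;           /* unknown extension */
    }
}
```

With these definitions, createTagsForFile("pig.sw") reaches createSwineTags(), while a name such as "notes.txt" matches no table and is skipped. Keeping the name list and the extension map in the same order as the enumeration is what lets a plain array index stand in for a lookup.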

Now all that is left is to implement the parser using fileGetc(), fileUngetc(), and makeTagEntry(). How createSwineTags() actually parses the contents of the file is entirely up to you--it can be as crude or elegant as you would like. I have lately settled on an approach, used in the newer Eiffel and Fortran modules, but this specific approach is not required.

It should be apparent that adding a new parser is actually quite easy, although it may seem like a lot at first, until you can see the forest for the trees. Almost everything is already taken care of automatically for you by the infrastructure. Writing the parser is the hardest part, but it is not constrained by any need to conform to anything in ctags other than what is mentioned above.

Some keyword table management code is available in keyword.c, and you can look at eiffel.c to see how it is used.

