A universal format for code base metadata
Master Project of Jonathan Dönszelmann


Problem statement

Many developers use tools such as GitHub to explore unfamiliar code bases. GitHub therefore invests in editor services such as syntax highlighting, code navigation (jumping from a reference to its declaration), and hover information (signatures and documentation) for the most common programming languages it supports. However, at its scale, GitHub is limited in the kinds of analyses it can perform on a code base. As a result, less common languages, DSLs, and languages with complex (semantic) resolution rules (such as Agda) cannot benefit from the simple static analyses that GitHub can perform (using, say, Stack Graphs).

Project description

The goal of this project is a universal format for the semantic metadata of a code base. Tools such as GitHub can use this format to provide semantic highlighting, accurate code navigation, inline documentation, hover signatures, and similar editor services for any language for which a tool can generate metadata in this format. Language-specific formats already exist, such as Agda’s interface files, which contain resolution information, and DWARF debug symbols for compiled binaries, but there is no universal format yet. The metadata could be generated by external tools, or as a by-product of compiling or analyzing the program. Given this metadata, GitHub and other tools could offer these editor services to users exploring a code base, even for obscure or complex languages, without having to implement complex analyses themselves.
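
To make the idea more concrete, the sketch below shows one way such metadata might be modelled and queried. It is purely illustrative: the format itself is still to be designed, and every name here (`Span`, `Symbol`, `Metadata`, `symbol_at`, `go_to_definition`, the file `Main.agda`, and the character offsets) is a hypothetical assumption rather than part of any existing tool or specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    """A half-open character range [start, end) within one file."""
    path: str
    start: int
    end: int

    def contains(self, path: str, offset: int) -> bool:
        return self.path == path and self.start <= offset < self.end


@dataclass
class Symbol:
    """One occurrence of a name in the code base (hypothetical layout)."""
    span: Span
    kind: str                            # e.g. "function"; could drive semantic highlighting
    definition: Optional[Span] = None    # navigation target; None on the declaration itself
    signature: Optional[str] = None      # text shown on hover
    documentation: Optional[str] = None  # inline documentation


@dataclass
class Metadata:
    """All symbol occurrences produced by a language-specific tool."""
    symbols: List[Symbol] = field(default_factory=list)

    def symbol_at(self, path: str, offset: int) -> Optional[Symbol]:
        # Linear scan for clarity; a real consumer would likely index by file.
        for symbol in self.symbols:
            if symbol.span.contains(path, offset):
                return symbol
        return None

    def go_to_definition(self, path: str, offset: int) -> Optional[Span]:
        symbol = self.symbol_at(path, offset)
        return symbol.definition if symbol else None


# Hypothetical metadata for a file with one declaration and one reference.
decl = Span("Main.agda", 0, 2)
ref = Span("Main.agda", 40, 42)
meta = Metadata([
    Symbol(decl, "function", signature="id : A → A",
           documentation="Identity function."),
    Symbol(ref, "function", definition=decl, signature="id : A → A"),
])

target = meta.go_to_definition("Main.agda", 41)
```

In this sketch the consumer never parses or type-checks the source. It only looks up precomputed records, so all language-specific resolution logic stays in the tool that generated the metadata.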

Contacts for the project



Student: Jonathan Dönszelmann
Supervisor(s): Daniël Pelsmaeker, Jesper Cockx