LJClang -- A LuaJIT-based interface to libclang

Philipp Kutin
:max-width: 56em

Introduction

:LuaJIT: http://luajit.org/
:libclang: http://clang.llvm.org/doxygen/group__CINDEX.html
:luaclang-parser: https://github.com/mkottman/luaclang-parser

LJClang is an interface to {libclang}[libclang] for {LuaJIT}[LuaJIT], modeled after and mostly API-compatible with {luaclang-parser}[luaclang-parser] by Michal Kottman.

NOTE: Development currently happens on the staging branch. The plan is to eventually merge it back into master.

Requirements

:LJDownload: http://luajit.org/download.html

  • {LJDownload}[LuaJIT 2.0] (latest Git HEAD of the master branch recommended)
  • LLVM/Clang -- read the http://clang.llvm.org/get_started.html[getting started] guide to find out how to obtain Clang from source. libclang is built and installed along with the Clang compiler.

Building and usage

:Clang-Win32: http://www.ishani.org/web/articles/code/clang-win32/

Most of LJClang is written in Lua (extensively using LuaJIT's FFI), but due to currently existing limitations, a support C library has to be built.

In the provided Makefile, adjust the libclang include path, then run make to build libljclang_support.so.

NOTE: LJClang has been tested on Ubuntu Linux and Windows (using {Clang-Win32}[Clang-Win32]), but only minor modifications to the build process should be necessary to get it working with other OSes or configurations.

From here on, LJClang can be used with LuaJIT by issuing a require for "ljclang". One likely wants to use LJClang from its development directory without installing it to a system-wide path. Because it expects to find libljclang_support.so and several supporting Lua files, one approach is to wrap client programs into scripts starting LuaJIT with an environment containing appropriate LD_LIBRARY_PATH and LUA_PATH entries. For example, given the following function in .bashrc,


    # "LuaJIT with added path of the script directory"
    ljwp ()
    {
        local scriptdir=$(cd "$(dirname "$1")" && pwd)
        LUA_PATH=";;$scriptdir/?.lua" LD_LIBRARY_PATH="$scriptdir" luajit "$@"
    }


and assuming that LJClang resides in ~/dl/ljclang, the extractdecls.lua program described below could be run from anywhere like this:


$~/some/other/dir: ljwp ~/dl/ljclang/extractdecls.lua [args...]

Overview

LJClang provides a cursor-based, callback-driven API to the abstract syntax tree (AST) of C/C++ source files. These are the main classes:

  • Index -- represents a set of translation units that could be linked together
  • TranslationUnit -- a source file together with everything included by it either directly or transitively
  • Cursor -- an element in the AST in a translation unit such as a typedef declaration or a statement
  • Type -- the type of an element (for example, that of a variable, structure member, or a function's input argument or return value)

To make something interesting happen, you usually create a single Index object, parse into it one or many translation units, and define a callback function to be invoked on each visit of a Cursor by libclang.
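The workflow just described can be sketched as follows. This is a minimal illustration rather than code shipped with LJClang: the function name `list_functions` is made up, the module is passed in as `cl` (the value returned by `require("ljclang")`), and error handling for a failed parse is left out because the failure behaviour of `Index:parse()` is not described here.

```lua
-- Sketch: list the names of all top-level function declarations of one file.
-- `cl` is the module returned by require("ljclang").
local function list_functions(cl, sourceFile, clangArgs)
    local index = cl.createIndex()
    local tu = index:parse(sourceFile, clangArgs or {})
    local names = {}

    local visitor = cl.regCursorVisitor(function(cursor, parent)
        if cursor:haskind("FunctionDecl") then
            -- Store the name (a string), not the short-lived cursor itself.
            names[#names + 1] = cursor:name()
        end
        -- Stay on the top level: do not descend into the children.
        return cl.ChildVisitResult.Continue
    end)

    tu:cursor():children(visitor)
    return names
end

-- Possible use, assuming a file foo.c exists:
--   local names = list_functions(require("ljclang"), "foo.c", { "-I." })
--   for _, name in ipairs(names) do print(name) end
```

Passing the module as an argument keeps the sketch independent of where LJClang is installed. A longer-lived program would probably register the visitor once and reuse its handle instead of registering a new one on every call.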

Example program

:CXCursorKind: http://clang.llvm.org/doxygen/group__CINDEX.html#gaaccc432245b4cd9f2d470913f9ef0013

The extractdecls.lua script that accompanies LJClang can be used to extract various kinds of C declarations from (usually) headers and print them in various forms usable as FFI C declarations or descriptive tables with LuaJIT.


    Usage: ./extractdecls.lua [our_options...] <file.h> [clang_options...]
      -p
      -x [-x ] ...
      -s
      -1
      -2
      -C: print lines like
           static const int membname = 123;  (enums/macros only)
      -R: reverse mapping, only if one-to-one. Print lines like
           [123] = "membname";  (enums/macros only)
      -f : user-provided body for formatting function (enums/macros only)
           Accepts args `k', `v'; `f' is string.format. Must return a formatted line.
           Example: "return f('%s = %s%s,', k, k:find('KEY_') and '65536+' or '', v)"
           Incompatible with -C or -R.
      -Q: be quiet
      -w: extract what? Can be
           EnumConstantDecl (default), TypedefDecl, FunctionDecl, MacroDefinition
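To make the contract of the -f option more concrete, here is a small plain-Lua sketch of what such a function body evaluates to. How extractdecls.lua compiles the body internally is an assumption; `make_formatter` is a hypothetical helper, and the sample names and values are made up.

```lua
-- Hypothetical helper: wraps a -f style body into a function of (k, v),
-- with `f` bound to string.format as the usage text describes.
local function make_formatter(body)
    local compile = loadstring or load  -- LuaJIT/Lua 5.1 vs. Lua 5.2+
    local chunk = assert(compile(
        "local f = string.format\n" ..
        "return function(k, v) " .. body .. " end"))
    return chunk()
end

-- The example body from the usage text.
local fmt = make_formatter(
    "return f('%s = %s%s,', k, k:find('KEY_') and '65536+' or '', v)")

-- Made-up sample names and values:
print(fmt("KEY_A", 97))      --> KEY_A = 65536+97,
print(fmt("MOD_SHIFT", 1))   --> MOD_SHIFT = 1,
```

Names containing `KEY_` get the `65536+` prefix in front of their value, all others are printed unchanged.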


In fact, the file ljclang_cursor_kind.lua is generated by this program and is used by LJClang to map values of the enumeration {CXCursorKind}[enum CXCursorKind] to their names. The bootstrap target in the Makefile extracts the relevant information using these options:


-R -p '^CXCursor_' -x '_First' -x '_Last' -x '_GCCAsmStmt' -x '_MacroInstantiation' -s '^CXCursor_'
-1 'return { name={' -2 '}, }' -Q


Thus, the enumeration constant names are filtered to those beginning with ``++CXCursor_++'', and all ``secondary'' names aliasing the one considered the main one are rejected. (For example, CXCursor_AsmStmt and CXCursor_GCCAsmStmt have the same value.) Finally, the prefix is stripped (-s) to yield lines like


[215] = "AsmStmt";
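As an illustration of how such output can be consumed, the following plain-Lua sketch loads a chunk shaped like the one the options above would produce. The exact layout of the generated file is an assumption pieced together from the -1 and -2 strings and the sample line; only the single entry quoted in the text is included.

```lua
-- Assumed shape of the generated file: the -1 string, the mapping lines,
-- then the -2 string.
local generated = [[
return { name={
[215] = "AsmStmt";
}, }
]]

local compile = loadstring or load  -- LuaJIT/Lua 5.1 vs. Lua 5.2+
local kinds = assert(compile(generated))()

-- Map a numeric cursor kind value back to its name.
print(kinds.name[215])   --> AsmStmt
```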

Reference

:clang_createIndex: http://clang.llvm.org/doxygen/group__CINDEX.html#func-members
:CXChildVisitResult: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__TRAVERSAL.html#ga99a9058656e696b622fbefaf5207d715
:clang_parseTranslationUnit: http://clang.llvm.org/doxygen/group__CINDEX__TRANSLATION__UNIT.html#ga2baf83f8c3299788234c8bce55e4472e
:clang_createTranslationUnit: http://clang.llvm.org/doxygen/group__CINDEX__TRANSLATION__UNIT.html#gaa2e74f6e28c438692fd4f5e3d3abda97

The module returned by require("ljclang") contains the following:

createIndex([excludePch : boolean [, showDiagnostics : boolean]]) -> Index::

Binding for {clang_createIndex}[clang_createIndex]. Will create an Index into which you can parse ++TranslationUnit++s. Both input arguments are optional and default to false.
+
NOTE: Loading pre-compiled translation units is not implemented.

[[ChildVisitResult]] ChildVisitResult::

An object containing a mapping of names to values permissible as values
{CXChildVisitResult}[returned] from cursor visitor callbacks: `Break`,
`Continue`, `Recurse`.

[[regCursorVisitor]] regCursorVisitor(visitorfunc) -> vf_handle::

Registers a child visitor callback function visitorfunc with LJClang, returning a handle which can be passed to Cursor:children(). The callback function receives two input arguments, (cursor, parent) -- the cursor of the currently visited entity and that of its parent -- and must return a value from the ChildVisitResult enumeration to indicate whether or how libclang should carry on AST visiting.
+
CAUTION: The cursor passed to the visitor callback is only valid during one particular callback invocation. If it is to be used after the function has returned, it must be copied using the Cursor constructor mentioned below.

Cursor([cur : Cursor]) -> Cursor::

A constructor to create a permanent cursor from that received by the visitor callback.
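The caution about cursor lifetime can be illustrated with a short sketch. The function name `collect_typedefs` is made up; `cl` stands for the module returned by `require("ljclang")` and `root` for any cursor, for example the one obtained from `TranslationUnit:cursor()`.

```lua
-- Sketch: keep permanent copies of all typedef cursors directly below `root`.
local function collect_typedefs(cl, root)
    local kept = {}

    local visitor = cl.regCursorVisitor(function(cursor, parent)
        if cursor:haskind("TypedefDecl") then
            -- `cursor` is only valid during this call: copy it before storing.
            kept[#kept + 1] = cl.Cursor(cursor)
        end
        return cl.ChildVisitResult.Continue
    end)

    -- The boolean result tells whether the traversal was cut short by Break.
    local aborted = root:children(visitor)
    return kept, aborted
end
```

Storing `cursor` itself instead of `cl.Cursor(cursor)` would leave the table holding objects that are no longer valid once the callback has returned.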

Index

:TUFlags: http://clang.llvm.org/doxygen/group__CINDEX__TRANSLATION__UNIT.html#enum-members

Index:parse(sourceFile : string, args : table [, opts : table]) -> TranslationUnit::

Binding for {clang_parseTranslationUnit}[clang_parseTranslationUnit]. This will parse a given source file sourceFile with the command line arguments args, which would be given to the compiler for compilation, containing e.g. include paths or defines. If sourceFile is the empty string, the source file is expected to be named in args.
+
The last optional argument opts is expected to be a sequence containing {TUFlags}[CXTranslationUnit_*] enum names without the "CXTranslationUnit_" prefix, for example { "DetailedPreprocessingRecord" }.
+
NOTE: Both args and opts (if given) must not contain an element at index 0.

//////////
Index:load(astFile : string) -> TranslationUnit::

Binding for
{clang_createTranslationUnit}[clang_createTranslationUnit]. This will load
the translation unit from an AST file which was constructed using `clang
-emit-ast`. Useful when repeatedly processing large sets of files (like
frameworks).

//////////
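A call to `Index:parse()` with all three arguments might look like the sketch below. The helper name `parse_with_macros` and its `includeDirs` parameter are made up for illustration; `index` is an object returned by `createIndex()`.

```lua
-- Sketch: parse a file with include directories and a detailed
-- preprocessing record (needed, for instance, to see macro definitions).
local function parse_with_macros(index, sourceFile, includeDirs)
    local args = {}
    for _, dir in ipairs(includeDirs or {}) do
        args[#args + 1] = "-I" .. dir
    end
    -- Both tables are plain sequences starting at index 1, so neither
    -- has an element at index 0.
    local opts = { "DetailedPreprocessingRecord" }
    return index:parse(sourceFile, args, opts)
end

-- Possible use:
--   local tu = parse_with_macros(index, "foo.h", { "/usr/include" })
```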

TranslationUnit

:clang_getTranslationUnitCursor: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__MANIP.html#gaec6e69127920785e74e4a517423f4391
:clang_getFile: http://clang.llvm.org/doxygen/group__CINDEX__FILES.html#gaa0554e2ea48ecd217a29314d3cbd2085
:clang_getDiagnostic: http://clang.llvm.org/doxygen/group__CINDEX__DIAG.html#ga3f54a79e820c2ac9388611e98029afe5
:code_completion_API: http://clang.llvm.org/doxygen/group__CINDEX__CODE__COMPLET.html
:clang_visitChildren: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__TRAVERSAL.html#ga5d0a813d937e1a7dcc35f206ad1f7a91

TranslationUnit:cursor() -> Cursor::

Binding for
{clang_getTranslationUnitCursor}[clang_getTranslationUnitCursor]. Returns
the `Cursor` representing a given translation unit, which provides access
to information about e.g. functions and types defined in a given file.

//////////
TranslationUnit:file(fileName : string) -> string, number::
//////////
TranslationUnit:file(fileName : string) -> string::

Binding for {clang_getFile}[clang_getFile]. Returns the absolute file path of fileName.
+
NOTE: Unlike in luaclang-parser, the last modification time is currently not returned.
//////////
and a time_t last modification time
//////////

TranslationUnit:diagnostics() -> { Diagnostic* }::

Binding for {clang_getDiagnostic}[clang_getDiagnostic]. Returns a table
array of `Diagnostic`, which represent warnings and errors. Each diagnostic
is a table indexable by these keys: `text` -- the diagnostic message, and
`category` -- a diagnostic category (also a string).

//////////
TranslationUnit:codeCompleteAt(file : string, line : number, column : number) -> { Completion* }, { Diagnostics* }::

Binding for {code_completion_API}[code completion API]. Returns the
available code completion options at a given location using prior
content. Each `Completion` is a table consisting of several chunks, each of
which has a text and a {chunk kind}[chunk kind] without the
`CXCompletionChunk_` prefix. If there are any annotations, the
`annotations` key is a table of strings:

    completion = {
         priority = number, priority of given completion
         chunks = {
             kind = string, chunk kind
             text = string, chunk text
         },
         [annotations = { string* }]
    }

//////////
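Since each diagnostic is a plain table with `text` and `category` keys, reporting them takes only a few lines. The sketch below uses a made-up function name, `format_diagnostics`, and an output format chosen purely for illustration.

```lua
-- Sketch: turn the diagnostics of a translation unit into printable lines.
local function format_diagnostics(tu)
    local lines = {}
    for i, diag in ipairs(tu:diagnostics()) do
        lines[i] = string.format("[%s] %s", diag.category, diag.text)
    end
    return lines
end

-- Possible use, with `tu` obtained from Index:parse():
--   for _, line in ipairs(format_diagnostics(tu)) do print(line) end
```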

Cursor

:clang_getCursorSemanticParent: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__MANIP.html#gabc327b200d46781cf30cb84d4af3c877
:clang_getCursorLexicalParent: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__MANIP.html#gace7a423874d72b3fdc71d6b0f31830dd
:clang_getCursorSpelling: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__XREF.html#gaad1c9b2a1c5ef96cebdbc62f1671c763
:clang_getCursorDisplayName: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__XREF.html#gac3eba3224d109a956f9ef96fd4fe5c83
:cursor_kind: http://clang.llvm.org/doxygen/group__CINDEX.html#gaaccc432245b4cd9f2d470913f9ef0013
:clang_Cursor_getArgument: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga673c5529d33eedd0b78aca5ac6fc1d7c
:clang_getCursorResultType: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga6995a2d6352e7136868574b299005a63
:clang_getCursorExtent: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__SOURCE.html#ga79f6544534ab73c78a8494c4c0bc2840
:clang_getCursorReferenced: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__XREF.html#gabf059155921552e19fc2abed5b4ff73a
:clang_getCursorDefinition: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__XREF.html#gafcfbec461e561bf13f1e8540bbbd655b
:clang_getSpellingLocation: http://clang.llvm.org/doxygen/group__CINDEX__LOCATIONS.html#ga01f1a342f7807ea742aedd2c61c46fa0
:clang_getPresumedLocation: http://clang.llvm.org/doxygen/group__CINDEX__LOCATIONS.html#ga03508d9c944feeb3877515a1b08d36f9
:clang_getEnumConstantDeclValue: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga6b8585818420e7512feb4c9d209b4f4d
:clang_getEnumConstantUnsignedDeclValue: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#gaf7cbd4f2d371dd93e8bc997c951a1aef
:clang_getTypedefDeclUnderlyingType: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga8de899fc18dc859b6fe3b97309f4fd52
:clang_Cursor_getTranslationUnit: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__MANIP.html#ga529f1504710a41ce358d4e8c3161848d
:clang_isCursorDefinition: http://clang.llvm.org/doxygen/group__CINDEX__CURSOR__XREF.html#ga6ad05634a73e693217088eaa693f0010

You can test whether two ++Cursor++s represent the same element using the standard Lua == operator. Comparisons with any other type yield false.

Cursor:children() -> { Cursor* }::
Cursor:children(vf_handle) -> boolean::

Binding over {clang_visitChildren}[clang_visitChildren]. This is the main function for AST traversal. The first form collects the direct descendants of the given cursor in a table, returning an empty one if none are found. The second, preferred form accepts a handle of a visitor function previously registered with <<regCursorVisitor,regCursorVisitor()>> instead. Here, the returned value indicates whether the traversal was aborted prematurely due to the callback returning +<<ChildVisitResult,ChildVisitResult>>.Break+.
+
NOTE: Currently, the recommended procedure is to encapsulate the logic of one particular ``analysis'' into one visitor callback, which may run different portions of code, e.g. conditional on the cursor's kind, rather than calling Cursor:children(visitor_function_handle) with a different visitor function while another invocation of it is active.

//////////
Traverses the direct descendants of a given cursor and collects them in a table. If no child cursors are found, returns an empty table.
//////////

Cursor:parent() -> Cursor::

Binding for
{clang_getCursorSemanticParent}[clang_getCursorSemanticParent]. Returns a
cursor to the semantic parent of a given element. For example, for a method
cursor, returns its class. For a global declaration, returns the
translation unit cursor.

Cursor:lexicalParent() -> Cursor::

Binding for
{clang_getCursorLexicalParent}[clang_getCursorLexicalParent]. Returns a
cursor to the lexical parent of a given element.

Cursor:name() -> string::

Binding over {clang_getCursorSpelling}[clang_getCursorSpelling]. Returns
the name of the entity referenced by cursor. `Cursor` also has `__tostring`
set to this method.

Cursor:displayName() -> string::

Binding over
{clang_getCursorDisplayName}[clang_getCursorDisplayName]. Returns the
display name of the entity, which for example is a function signature.

Cursor:kind() -> string::

Returns the {cursor_kind}[cursor kind] without the `CXCursor_` prefix,
e.g. `"FunctionDecl"`.

Cursor:haskind(kind : string) -> boolean::

Checks whether the cursor has kind given by `kind`, which must be a string
of {CXCursorKind}[`enum CXCursorKind`] names without the `CXCursor_`
prefix. For instance, `if (cur:haskind("TypedefDecl")) then --[[ do
something ]] end` .

//////////
kindnum
//////////

Cursor:arguments() -> { Cursor* }::

Binding of {clang_Cursor_getArgument}[clang_Cursor_getArgument]. Returns a
table array of ++Cursor++s representing arguments of a function or a
method. Returns an empty table if a cursor is not a method or function.

Cursor:translationUnit() -> TranslationUnit::

Binding for
{clang_Cursor_getTranslationUnit}[clang_Cursor_getTranslationUnit]. Returns
the translation unit that a cursor originated from.

Cursor:resultType() -> Type::

Binding for {clang_getCursorResultType}[clang_getCursorResultType]. For a
function or a method cursor, returns the return type of the function.

Cursor:typedefType() -> Type::

If the cursor references a typedef declaration, returns its
{clang_getTypedefDeclUnderlyingType}[underlying type].

//////////
XXX: Make error instead? Otherwise, returns nil.
//////////

Cursor:type() -> Type::

Returns the `Type` of a given element or *nil* if not available.

Cursor:location([linesfirst : boolean]) -> string, number, number, number, number [, number, number]::

Binding for {clang_getCursorExtent}[clang_getCursorExtent] and
{clang_getSpellingLocation}[clang_getSpellingLocation]. Returns the _file
name_, _starting line_, _starting column_, _ending line_ and _ending
column_ of the given cursor. If the optional argument `linesfirst` is true,
the numbers are ordered like _starting line_, _ending line_, _starting
column_, _ending column_, _starting offset_, _ending offset_ instead. If
`linesfirst` has the string value `'offset'`, only _starting offset_,
_ending offset_ are returned.

Cursor:presumedLocation([linesfirst : boolean]) -> string, number, number, number, number::

Binding for {clang_getCursorExtent}[clang_getCursorExtent] and {clang_getPresumedLocation}[clang_getPresumedLocation].

//////////
XXX: Better provide an API around CXSourceRange. This can be used to look up the text a cursor consists of.
//////////

Cursor:definition() -> Cursor::

Binding for {clang_getCursorDefinition}[clang_getCursorDefinition]. For a
reference or declaration, returns a cursor to the definition of the entity,
otherwise returns *nil*.

Cursor:referenced() -> Cursor::

Binding for {clang_getCursorReferenced}[clang_getCursorReferenced]. For a
reference type, returns a cursor to the element it references, otherwise
returns *nil*.

Cursor:access() -> string::

When cursor kind is `"AccessSpecifier"`, returns one of `"private"`,
`"protected"` and `"public"`.

Cursor:isDefinition() -> boolean::

Binding for {clang_isCursorDefinition}[clang_isCursorDefinition]. Determine
whether the declaration pointed to by this cursor is also a definition of
that entity.

Cursor:isVirtual() -> boolean::

For a C++ method, returns whether the method is virtual.

Cursor:isStatic() -> boolean::

For a C++ method, returns whether the method is static.

Cursor:enumValue([unsigned : boolean]) -> enum cdata::

If the cursor represents an enumeration constant (CXCursor_EnumConstantDecl), returns its numeric value as a {clang_getEnumConstantDeclValue}[signed] 64-bit integer, or as an {clang_getEnumConstantUnsignedDeclValue}[unsigned] 64-bit integer if unsigned is true.
+
NOTE: In C99, an enumeration constant must be in the range of values representable by an int (6.7.2.2#2). LJClang does not check for this constraint.

Cursor:enumval([unsigned : boolean]) -> number::

Returns the cdata obtained from `enumValue()` as a Lua number, converted
using `tonumber()`. Again, no checking of any kind is carried out.
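Several of the `Cursor` methods above can be combined without any visitor callback by using the table-returning form of `Cursor:children()`. The following sketch, with the made-up name `collect_enum_constants`, gathers every enumeration constant below a cursor together with its value and starting line.

```lua
-- Sketch: recursively collect enumeration constants below `cur`.
-- Returns a sequence of { name, value, file, line } records.
local function collect_enum_constants(cur, out)
    out = out or {}
    for _, child in ipairs(cur:children()) do
        if child:haskind("EnumConstantDecl") then
            -- location() returns the file name and starting line first.
            local file, line = child:location()
            out[#out + 1] = {
                name = child:name(),
                value = child:enumval(),  -- Lua number, unchecked
                file = file,
                line = line,
            }
        else
            -- Descend, e.g. from the translation unit into an enum declaration.
            collect_enum_constants(child, out)
        end
    end
    return out
end

-- Possible use, with `tu` obtained from Index:parse():
--   for _, c in ipairs(collect_enum_constants(tu:cursor())) do
--       print(c.file, c.line, c.name, c.value)
--   end
```

For large translation units the visitor-based form of `Cursor:children()` is the preferred one; the table-based form is used here only because it keeps the recursion short.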

Type

:clang_getTypeKindSpelling: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga6bd7b366d998fc67f4178236398d0666
:clang_getCanonicalType: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#gaa9815d77adc6823c58be0a0e32010f8c
:clang_getPointeeType: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#gaafa3eb34932d8da1358d50ed949ff3ee
:clang_isPODType: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga3e7fdbe3d246ed03298bd074c5b3703e
:clang_isConstQualifiedType: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga8c3f8029254d5862bcd595d6c8778e5b
:clang_getTypeDeclaration: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga0aad74ea93a2f5dea58fd6fc0db8aad4
:clang_getArrayElementType: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga718591f4b07d9d4861557a3ed8b29713
:clang_getArraySize: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#ga91521260817054f153b5f1295056192d

:CXTypeKind: http://clang.llvm.org/doxygen/group__CINDEX__TYPES.html#gaad39de597b13a18882c21860f92b095a

You can test whether two ++Type++s represent the same type using the standard Lua == operator. Comparisons with any other type yield false.

Type:name() -> string::

Binding of {clang_getTypeKindSpelling}[clang_getTypeKindSpelling]. Returns
one of {CXTypeKind}[`CXTypeKind`] as a string without the `CXType_`
prefix. `Type` also has `__tostring` set to this method.

Type:canonical() -> Type::

Binding of {clang_getCanonicalType}[clang_getCanonicalType]. Returns the underlying type with all typedefs removed.
+
NOTE: Unlike luaclang-parser, LJClang does not dispatch to clang_getPointeeType() for pointer types.

//////////
XXX: What was the intention of that? Test out stuff...
//////////

Type:pointee() -> Type::

Binding of {clang_getPointeeType}[clang_getPointeeType]. For a pointer
type, returns the type of the pointee.

Type:isPod() -> boolean::

Binding of {clang_isPODType}[clang_isPODType]. Returns true if the type is
a ``Plain Old Data'' type.

Type:isConst() -> boolean::
Type:isConstQualified() -> boolean::

Binding of
{clang_isConstQualifiedType}[clang_isConstQualifiedType]. Returns true if
the type has a `const` qualifier.

Type:declaration() -> Cursor::

Binding of {clang_getTypeDeclaration}[clang_getTypeDeclaration]. Returns a
`Cursor` to the declaration of a given type, or *nil*.

Type:arrayElementType() -> Type::

Binding of {clang_getArrayElementType}[clang_getArrayElementType].

Type:arraySize() -> Type::

Binding of {clang_getArraySize}[clang_getArraySize].
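Because `Type:name()` returns the kind of a type rather than its spelling, walking through pointer types with `Type:pointee()` is a common need. The sketch below uses a made-up function name, `pointer_chain`, and assumes that pointer types report the kind name `"Pointer"` (that is, `CXType_Pointer` without its prefix).

```lua
-- Sketch: follow a chain of pointer types and record the kind names seen,
-- ending with the first non-pointer type.
local function pointer_chain(ty)
    local kinds = {}
    while ty do
        local kind = ty:name()
        kinds[#kinds + 1] = kind
        if kind ~= "Pointer" then
            break
        end
        ty = ty:pointee()
    end
    return kinds
end

-- Possible use, with `cur` being a cursor for a declaration:
--   print(table.concat(pointer_chain(cur:type()), " -> "))
```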

License

Copyright (C) 2013 Philipp Kutin

(Portions of the documentation copied or adapted from luaclang-parser, Copyright (C) 2012 Michal Kottman)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
