In a previous post, Grepping functions with ANTLR, we looked at parsing source code so that we could query it with XPath. For that post, I implemented a custom parser using ANTLR. In this post we look at an alternative: srcML, a software project that converts source code to XML so that we can query it with XPath.

Recap

As described in the previous post, we want to find functions that have the HttpPost attribute but not the ValidateAntiForgeryToken attribute, in order to find actions that are vulnerable to CSRF. Normal search tools such as grep don’t work here, since you can’t search for an attribute that is not there. Therefore, we want to parse the source code and query the resulting tree.

About srcML

SrcML converts source code to XML. It supports C, C++, C#, and Java. The XML it generates is largely independent of the language. Functions in C are converted to a <function> tag, and methods in Java are also converted to a <function> tag. That makes it easier to use queries across languages without having to learn a new object model.
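To illustrate the point about a shared object model, here is a minimal sketch in Python. The two XML fragments are hand-written, simplified stand-ins for what srcML might emit for a C function and a Java method; they are assumptions for illustration, not real srcML output. The helper `function_names` is a hypothetical name. The same lookup works on both fragments because both use the <function> tag.

```python
import xml.etree.ElementTree as ET

# srcML namespace in ElementTree's "{uri}tag" notation.
SRC = "{http://www.srcML.org/srcML/src}"

# Hand-written, simplified fragments (assumed shape, not real srcML output).
C_FRAGMENT = """<unit xmlns="http://www.srcML.org/srcML/src" language="C">
<function><type><name>int</name></type> <name>add</name><parameter_list>()</parameter_list><block>{}</block></function>
</unit>"""

JAVA_FRAGMENT = """<unit xmlns="http://www.srcML.org/srcML/src" language="Java">
<class>class <name>Calc</name> <block>{
<function><type><name>int</name></type> <name>add</name><parameter_list>()</parameter_list><block>{}</block></function>
}</block></class>
</unit>"""


def function_names(xml_text):
    """Return the name of every <function> element, at any depth."""
    root = ET.fromstring(xml_text)
    # The function's own name is the <name> that is a direct child;
    # the return type's <name> is nested inside <type> and is skipped.
    return [fn.findtext(SRC + "name") for fn in root.iter(SRC + "function")]


print(function_names(C_FRAGMENT))
print(function_names(JAVA_FRAGMENT))
```

The lookup never mentions the source language, which is what makes a query reusable across code bases.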

Using srcML

To test srcML, I used a random C# project from GitHub, ASCOM.NETStandard. First, we tell srcML to convert the project's files to XML:

srcml ~/dev/ASCOM.NETStandard -o ascom.xml

This creates a 15 MB XML file, which contains the syntax tree for all given files. Here is the part of the XML file that corresponds to a single action method, Create:

<comment type="line">// POST: DomeState/Create</comment>
<function><attribute>[<expr><name>HttpPost</name></expr>]</attribute><attribute>[<expr><name>ValidateAntiForgeryToken</name></expr>]</attribute><specifier>public</specifier><type><name>ActionResult</name></type><name>Create</name><parameter_list>(<parameter><decl><type><name>IFormCollection</name></type><name>collection</name></decl></parameter>)</parameter_list><block>{
    <try>try
    <block>{
        <comment type="line">// TODO: Add insert logic here</comment>

        <return>return <expr><call><name>RedirectToAction</name><argument_list>(<argument><expr><call><name>nameof</name><argument_list>(<argument><expr><name>Index</name></expr></argument>)</argument_list></call></expr></argument>)</argument_list></call></expr>;</return>
    }</block>
    <catch>catch
    <block>{
        <return>return <expr><call><name>View</name><argument_list>()</argument_list></call></expr>;</return>
    }</block></catch></try>
}</block></function>

Querying using xmllint

Now that we have an XML representation, we can use any XPath tool to query it. One tool that is often installed and can do XPath queries is xmllint. Because our document has a namespace, we have to use the shell function of xmllint to specify the namespace, and remember to add a prefix to every tag. The following command queries the syntax tree:

xmllint --shell ascom.xml
/ > setns src=http://www.srcML.org/srcML/src
/ > xpath //src:function[src:attribute//src:name='HttpPost' and not(src:attribute//src:name='ValidateAntiForgeryToken')]/src:name/text()
Object is a Node Set :
Set contains 36 nodes:
1  TEXT
    content=SetSerialTrace
2  TEXT
    content=SetSerialTraceFile
3  TEXT
    content=SetState
...

This returns a list of 36 function names that have the HttpPost attribute, but not the ValidateAntiForgeryToken attribute.
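If xmllint is not available, the same filter can be sketched with Python's standard library. This is a minimal sketch rather than a drop-in replacement: `xml.etree.ElementTree` supports only a subset of XPath, so the "has HttpPost but not ValidateAntiForgeryToken" condition is checked in Python instead of inside the XPath expression. The `SAMPLE` document is a hand-written, heavily trimmed stand-in for ascom.xml, and the helper names `attribute_names` and `unprotected_posts` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the xmllint session above.
SRC_URI = "http://www.srcML.org/srcML/src"
NS = {"src": SRC_URI}

# Hand-written, trimmed stand-in for ascom.xml (illustration only).
SAMPLE = """<unit xmlns="http://www.srcML.org/srcML/src">
<function><attribute>[<expr><name>HttpPost</name></expr>]</attribute><attribute>[<expr><name>ValidateAntiForgeryToken</name></expr>]</attribute><name>Create</name></function>
<function><attribute>[<expr><name>HttpPost</name></expr>]</attribute><name>SetState</name></function>
<function><name>Index</name></function>
</unit>"""


def attribute_names(function):
    """Collect the text of every <name> nested in the function's <attribute> children."""
    return {
        name.text
        for attribute in function.findall("src:attribute", NS)
        for name in attribute.iter("{%s}name" % SRC_URI)
    }


def unprotected_posts(root):
    """Names of functions that have HttpPost but lack ValidateAntiForgeryToken."""
    result = []
    for function in root.iter("{%s}function" % SRC_URI):
        names = attribute_names(function)
        if "HttpPost" in names and "ValidateAntiForgeryToken" not in names:
            # Direct-child <name> is the function name, as in the XPath query.
            result.append(function.findtext("src:name", namespaces=NS))
    return result


root = ET.fromstring(SAMPLE)
print(unprotected_posts(root))
```

To run it against the real file, replace the last two lines with `root = ET.parse("ascom.xml").getroot()` and print the result as before.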

Conclusion

SrcML offers an XML tree specification and a tool to convert source code to XML, which makes it possible to query source code using XPath.
