C++ code metrics with pygccxml
A while back I blogged about the very cool Perl module PPI. This allows Perl code to be treated as data, making questions such as “What is the average number of lines in a function in this program?” trivial to answer. To do the same kind of thing with C or C++, the best free tool I’ve found so far is pygccxml. This uses gccxml to generate an XML description of a C++ program from GCC’s internal representation. pygccxml then provides a relatively easy-to-use Python interface to the gccxml output. gccxml itself is a patched version of the GCC C++ front-end, which is a neat way of sidestepping the complexity of building a C++ parser.
Unfortunately, pygccxml doesn’t provide all the functionality of PPI, as gccxml is not able to parse function bodies. So pygccxml allows answering questions like “What are the member functions declared in class X?”, but can’t tell you how long those functions are, or what other functions are called by them.
As a first experiment with pygccxml, I implemented a short script to calculate the Weighted Methods per Class (WMC) metric, first proposed by Chidamber and Kemerer. In the simplest case every method gets a weighting of 1, so WMC is just the number of methods in a class, but there are variants such as giving a weighting of 1 to public methods and 0 to private methods.
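To make the weighting idea concrete, here is a minimal sketch of WMC as a weighted sum. It is not the wmc.py script and doesn't touch pygccxml at all; the Method record, the wmc() and public_only() functions and the sample method names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Method:
    """Hypothetical stand-in for a member function declaration."""
    name: str
    access: str  # "public" or "private"


def wmc(methods, weight=lambda m: 1):
    """Weighted Methods per Class: sum the weight of every method.

    The default weight of 1 per method makes this a plain method count.
    """
    return sum(weight(m) for m in methods)


def public_only(m):
    """Variant weighting: 1 for public methods, 0 for everything else."""
    return 1 if m.access == "public" else 0


# Made-up class contents, purely for illustration.
methods = [
    Method("area", "public"),
    Method("radius", "public"),
    Method("recompute", "private"),
]

print(wmc(methods))               # 3
print(wmc(methods, public_only))  # 2
```

Passing the weighting in as a function means other variants can be tried without changing the summing code.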
The first step was to get gccxml and pygccxml installed. gccxml has not had an official release since February 2004, so the current codebase is only available via CVS. To satisfy my packaging fetish I created a gccxml rpm from a CVS checkout I did in June. Note that this is not an official gccxml release, in spite of the 0.7 version number. As it turned out, making an rpm was unnecessary as gccxml will run happily from wherever it’s built and doesn’t need to be installed in a system location.
I also created an rpm for the latest release of pygccxml (0.9) using checkinstall.
With everything installed, I hacked up the example.py script provided with pygccxml to create wmc.py. This script calculates the WMC metric for one or all of the classes declared in a list of C++ header files. Running the script on the Constraint.h header from my Springysim project gives the following output:
$ python wmc.py -I /usr/lib/qt-3.3/include/ \
/home/carl/workspace/springy_sim/src/Constraint.h
Circle 10
Constraint 8
ConstraintSystem 13
Data 3
Dimension 10
Distance 16
The output shows that there are 6 classes declared in Constraint.h. The number of member functions (including constructors and operators) in each class is also shown.
Note also that because we’re effectively running gcc, the location of any include directories must be provided. Since Constraint.h includes Qt header files, the Qt include directory path must be passed to gccxml via the wmc.py script and pygccxml. Without this, gccxml produces the sort of error messages you’d expect when gcc can’t locate a header file.
Using gcc as a parser introduces another wrinkle: the C preprocessor has been run, so the XML output from gccxml does not correspond exactly to the code that’s been lovingly crafted in vim. For example, Springysim makes use of Qt’s Meta Object Compiler (MOC). Classes that take advantage of Qt’s meta object functionality use the Q_OBJECT macro:
class MyClass : public QObject
{
Q_OBJECT
public:
MyClass(QObject *parent=0, const char *name=0);
~MyClass();
// rest of class declaration follows ...
The Q_OBJECT macro expands to a bunch of function declarations, as seen in this output from wmc.py:
$ python wmc.py -I /usr/lib/qt-3.3/include/ \
-c ParticleField \
/home/carl/workspace/springy_sim/src/ParticleField.h
ParticleField
ParticleField
ParticleField
metaObject
className
qt_cast
qt_invoke
...
The last 4 functions in the list above are all created by Q_OBJECT, so the number of member functions in the ParticleField class is overstated. Even though metaObject(), className() etc. are fully-fledged members of class ParticleField, they shouldn’t be included in the WMC metric since they do not contribute to the psychological complexity of the class. I’m not sure if it’s possible to work around this aspect of gccxml.
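One crude possibility, which I haven't verified against gccxml or pygccxml, is to filter the member names after the fact. The sketch below assumes the caller already has a list of member function names and knows whether the class uses Q_OBJECT (say, from grepping the header). MOC_GENERATED and wmc_without_moc() are hypothetical names, and the set only contains the four functions visible in the output above, so it's almost certainly incomplete.

```python
# Names seen in the wmc.py output above as coming from Q_OBJECT (Qt 3.3).
# That output is truncated, so treat this set as incomplete.
MOC_GENERATED = {"metaObject", "className", "qt_cast", "qt_invoke"}


def wmc_without_moc(member_names, uses_q_object):
    """Count member functions, skipping known Q_OBJECT-generated names.

    The filter is only applied when the class is known to use Q_OBJECT,
    so a hand-written className() in an ordinary class still counts.
    """
    if not uses_q_object:
        return len(member_names)
    return sum(1 for name in member_names if name not in MOC_GENERATED)


# The visible part of the ParticleField listing above.
names = [
    "ParticleField", "ParticleField", "ParticleField",
    "metaObject", "className", "qt_cast", "qt_invoke",
]

print(wmc_without_moc(names, uses_q_object=True))   # 3
print(wmc_without_moc(names, uses_q_object=False))  # 7
```

This works on names alone, so a class that uses Q_OBJECT and also happens to declare its own member with one of these names would be undercounted.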
pygccxml is pretty damn cool, though it might not be the right tool for calculating code metrics, particularly because function bodies cannot be parsed. There’s a somewhat outdated patch on the gccxml mailing list that adds this feature, so I’m gonna try that out next.
Here are the links if you want to brew your own metrics at home (rpms built for Fedora Core 6):