C++ code metrics with pygccxml

A while back I blogged about the very cool Perl module PPI. This allows Perl code to be treated as data, making questions such as “What is the average number of lines in a function in this program?” trivial to answer. To do the same kind of thing with C or C++, the best free tool I’ve found so far is pygccxml. This uses gccxml to generate an XML description of a C++ program from GCC’s internal representation. pygccxml then provides a relatively easy to use Python interface to the gccxml output. gccxml itself is a patched version of the GCC C++ front-end, which is a neat way of sidestepping the complexity of building a C++ parser.

Unfortunately, pygccxml doesn’t provide all the functionality of PPI, as gccxml is not able to parse function bodies. So pygccxml allows answering questions like “What are the member functions declared in class X?”, but can’t tell you how long those functions are, or what other functions are called by them.

As a first experiment with pygccxml, I implemented a short script to calculate the Weighted Methods per Class (WMC) metric, first proposed by Chindamber and Kemerer. In the simplest case, this just means the number of methods in a class, i.e. the weighting is 1, but there are variants such as giving a weighting of 1 to public methods and 0 to private methods.

The first step was to get gccxml and pygccxml installed. gccxml has not had an official release since February 2004, so the current codebase is only available via CVS. To satisfy my packaging fetish I created a gccxml rpm from a CVS checkout I did in June. Note that this is not an official gccxml release, in spite of the 0.7 version number. As it turned out, making an rpm was unnecessary as gccxml will run happily from wherever it’s built and doesn’t need to be installed in a system location.

I also created an rpm for the latest release of pygccxml (0.9) using checkinstall.

With everything installed, I hacked up the example.py script provided with pygccxml to create wmc.py. This script calculates the WMC metric for one or all of the classes declared in a list of C++ header files. Running the script on the Constraint.h header from my Springysim project gives the following output:


$ python wmc.py -I /usr/lib/qt-3.3/include/ \
    /home/carl/workspace/springy_sim/src/Constraint.h

Circle                        10
Constraint                    8
ConstraintSystem              13
Data                          3
Dimension                     10
Distance                      16

The output shows that there are 6 classes declared in Constraint.h. The number of member functions (including constructors and operators) in each class is also shown.

Note also that because we’re effectively running gcc, the location of any include directories must be provided. Since Constraint.h includes Qt header files, the Qt include directory path must be passed to gccxml via the wmc.py script and pygccxml. Without this, gccxml produces the sort of error messages you’d expect when gcc can’t locate a header file.

Using gcc as a parser introduces another wrinkle: the C preprocessor has been run, so the XML output from gccxml does not correspond exactly to the code that’s been lovingly crafted in vim. For example, Springysim makes use of Qt’s Meta Object Compiler (MOC). Classes that take advantage of Qt’s meta object functionality use the Q_OBJECT macro:


class MyClass : public QObject
{
  Q_OBJECT
  public:
    MyClass(QObject *parent=0, const char *name=0);
    ~MyClass();
  // rest of class declaration follows ...

The Q_OBJECT macro expands to a bunch of functions declarations, as seen in this output from wmc.py:


$ python wmc.py -I /usr/lib/qt-3.3/include/ \
  -c ParticleField \
  /home/carl/workspace/springy_sim/src/ParticleField.h
ParticleField
        ParticleField
        ParticleField
        metaObject
        className
        qt_cast
        qt_invoke
        ...

The last 4 functions in the list above are all created by Q_OBJECT, so the number of member functions in the ParticleField class is overstated. Even though metaObject(), className() etc. are fully-fledged members of class ParticleField, they shouldn’t be included in the WMC metric since they do not contribute to the psychological complexity of the class. I’m not sure if it’s possible work around this aspect of gccxml.

pygccxml is pretty damn cool though it might not be the right tool for calculating code metrics, particularly because function bodies cannot be parsed. There’s a somewhat outdated patch on the gccxml mailing list that adds this feature, so I’m gonna try that out next.

Here’s the links if you want to brew your own metrics at home: (rpms built for Fedora Core 6):

gccxml-0.7-1.fc6.i386.rpm

gccxml-0.7-1.fc6.src.rpm

pygccxml-0.9.0-1.i386.rpm

wmc.py

SLUG rocks Puppet, Flex and Bison

July’s SLUG meeting was once again awesome with SLUG President Lindsay Holmwood giving an introduction to Puppet, a tool for automating system administration tasks on a variety of Unix-like platforms. Erik de Castro Lopo also gave an introductory talk on Flex and Bison, two commonly used free software tools for parser generation.

I don’t do much system administration if I can help it, and Puppet seems to be the perfect tool for people who would rather be doing something else. Last week saw me reverse engineering one of the build servers at work so that I could get nightly builds running on a test box. This would hopefully allow me to avoid the murderous rage of the rest of the development team caused by my build system “improvements” breaking the build.

While hunting down the exotic combination of environment variables that allowed everything to work, it occurred to me that there are probably lots of servers that nobody really understands the configuration of- and that nobody really cares about either, until a random piece of hardware fails and chaos ensues. In short, if you don’t want this sort of hilarity on your network, get Puppet. Lindsay’s brief introduction made it seem like an elegant solution to a very messy problem.

Erik’s talk on Flex and Bison was very timely as I need a new data file format for one of my personal projects. These files will usually be written by hand and read by a program, so using Flex and Bison to create a parser makes sense. I downloaded Erik’s sample code and together with this tutorial it wasn’t difficult to get a basic parser working.

Learning about new tools is always a good idea, and SLUG meetings are an easy way to do it- someone else has already made the effort to figure out how the damn thing works. I’m often stumbling across code that makes me think: “Why didn’t you just use library x or standard module y rather than write this?”, and I try to avoid creating those moments for other programmers. It’s difficult because you need to know what’s already out there, and out there is really, really big.

C++ code metrics with cccc

After my last post I thought I should look a little deeper into code metrics. Unsurprisingly, a lot has been done in this area- researchers have been investigating metrics since at least the mid-70s. I’m not sure how active the field is today.

There are numerous commercial offerings of tools that will generate metrics for a codebase, but relatively few open source ones, at least for C and C++. Presumably this is because of the difficulty of developing a parser for the tortured syntax of C++. The best open-source tool I found was cccc which unfortunately is no longer under active development. cccc was written by Tim Littlefair for his PhD at Edith Cowan University in Perth, making it home-grown open source. Cool! It uses PCCTS (The Purdue Compiler-Compiler Tool Set) as a parser and generates XML and HTML files containing the calculated metrics.

The range of metrics calculated is good, although the HTML output is fairly basic (sorry Tim), and there’s no graphs. I ran cccc over my pet project Springysim, the resulting output is here.

The metrics produced by cccc are divided into three groups: procedural, object-oriented and structural:

Procedural metrics include Lines of Code (LOC), Lines of Comment (COM), McCabe’s cyclomatic complexity measure and various ratios of these numbers. The concept of cyclomatic complexity was introduced by McCabe in his 1976 paper and the cccc documentation has this to say about it:

The formal definition of cyclomatic complexity is that it is the count of linearly independent paths through a flow of control graph derived from a subprogram. A pragmatic approximation to this can be found by counting language keywords and operators which introduce extra decision outcomes. This can be shown to be quite accurate in most cases. In the case of C++, the count is incremented for each of the following tokens: ‘if’,'while’,'for’,’switch’,'break’,'&&’,'||’

This intuitively seems like a useful metric, although I’d like to read some studies validating it in practice.

Objec-oriented metrics produced by cccc for each class include:

  • Weighted methods per class (WMC). In the simplest case the weighting of each method is just one. cccc also provides WMCv, which only counts public and protected methods.
  • Depth of inheritance tree (DIT)
  • Number of children (NOC)
  • Coupling between objects (CBO). This is the number of other classes that are coupled to a class either as clients or a suppliers.

All these metrics were originally proposed by Chindamber and Kemerer in their 1994 paper A Metrics Suite for Object Oriented Design. It’s not a bad read, but does spend quite some time proving that the proposed metrics satisfy various formal properties proposed by Weyuker in her 1988 paper Evaluating Software Complexity Measures; these parts might be a little dry for some. But it’s not all ivory tower stuff, they also evaluated the metrics by collecting empirical samples at two different software development organisations. However, no attempt was made to correlate the code metrics with project outcomes such as defect rates or maintenance costs.

Unfortunately cccc does not calculate the 5th and 6th metrics suggested by Chindamber and Kemerer. The 6th metric, Lack of Cohesion in Methods (LCOM), examines which instance variables are used by which methods of a class. A class with a single instance variable that is used by all methods has high cohesion, while a class with many instance variables each used by few methods will have a low cohesion. This seems like an interesting metric for OO designers to know.

The structural metrics calculated by cccc are:

  • Fan-in: The number of other modules that pass information into a module.
  • Fan-out: The number of other modules that a module passes information to.
  • An “Information Flow measure” calculated as the square of the product of the fan-in and fan-out of a single module.

These metrics were proposed by Henry and Kafura in their 1981 paper Software Structure Metrics based on Information Flow, this unfortunately does not seem to be freely available. This paper is super-cool as the code base they use for evaluating the metrics is UNIX, version 6. The Lions book is cited as a reference- even cooler!

Tragic fawning over old-school UNIX aside, the paper shows that the information flow measure described above is strongly correlated with the occurrence of changes in the UNIX sources. That is, modules with a high value of the metric also had many changes made to them. The number of changes in a module is used as a proxy for the number of errors in a module, on the assumption that these two measures are strongly correlated.

cccc looks like an interesting tool, or at least the beginning of one. To be useful during development, it would be nice to see how these metrics are changing over time, and cccc doesn’t provide any facilities for that.

Perl code statistics with PPI

Sometimes there’s nothing better to do on a Sunday morning than read the Google Testing Blog. A recent post suggested that methods (or functions, subroutines, etc.) should be made shorter to make testing easier- because a short method does less than a longer one, it’s usually easier to test.

Normally I would cite improving readability and flexibility as the main reasons to prefer short methods, but ease of testing seems just as good. Code that’s difficult to read or difficult to test will always be difficult and costly to maintain. In fact right now, all over the world, there are thousands of maintenance programmers bent over in prayer reciting their litany: Please, write code that can be read rather than deciphered…

The post made me speculate on how long my methods are. The only code I’ve written recently that’s wholly my own work is a test harness in Perl. This is the confidential intellectual property of my employer and is busy delivering sustainable competitive advantage, so I can’t post it on this blog. Hopefully some statistics on the code won’t dilute shareholder value too much.

It turns out there’s a Perl module for doing just this sort of thing: PPI.There’s a cool introduction to PPI by Adam Kennedy (the module author) on perl.com. PPI makes doing something basic like finding the average length of subroutines very easy, so I hacked up a script to do just that.

My test harness script is a bit less than 1000 lines:

$ wc -l foo.pl
914 foo.pl

Running the sublength.pl script on it gave the following results:

$ ./sublength.pl foo.pl
Number of subroutines: 46
Length of longest sub: 50
Length of shortest sub: 4
Average length of sub: 15.76

My subroutines are pretty short with an average length of just 15.76 lines, which includes the header and the opening and closing braces on separate lines. As I’ve said before, I find long, deeply-nested methods difficult to understand, so I just don’t write them like that.

More generally, I like the idea of measuring the complexity or readability of a codebase with metrics. You could even analyse a repository commit-by-commit and see whether each commit has increased or decreased readability. Then you can have graphs showing the change in readability over time. Awesome! Sure, it sounds like overkill, but folks much smarter than me have emphasised the importance of readability:

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

xvidcap

Sometimes I think “I really need an app that does x“, and then ten minutes later I’ve found an app that does x. And it’s free!

That’s not what happened with xvidcap, but it was pretty close. I needed something that would create a screen capture video, and xvidcap does just that. It only took a few minutes to find, but the 1.1.4 RPM package I installed from the Fedora livna repository segfaulted as soon as I tried to do a capture. I built the more recent 1.1.5 release from source, and that worked fine.

xvidcap seems to be under intensive development - I just checked back at Sourceforge and 1.1.6 was released yesterday.

And yeah, that means there will be some videos up soon.

Readline commands

As someone who spends much of their life at a bash prompt, I should really use GNU readline commands more often. The only one I ever use is Ctrl-R, to search backwards through the history. The full list of bindable readline commands is here.

A few that look particularly useful:

beginning-of-history (Alt-<)
  Move to the first line in the history.

end-of-history (Alt->)
  Move to the end of the input history, i.e.,
  the line currently being entered.

unix-line-discard (Ctrl-u)
  Kill backward from the cursor to the
  beginning of the current line.

kill-line (Ctrl-k)
  Kill the text from point to the end of
  the line.

forward-word (Alt-f)
  Move forward to the end of the next word.
  Words are composed of letters and digits.

backward-word (Atl-b)
  Move back to the start of the current or
  previous word. Words are composed of letters
  and digits.

NetKernel

One of the many bad habits I’m yet to break is slumming it through Slashdot comment threads. They’re mostly dross, of course, but occasionally there’s something useful. Today there was a post on Optimizing PHP and Apache. Like any good slashdotter I didn’t bother to read the article, but I did find a comment that claimed:

We have recently ported Sugar CRM PHP/Apache to NetKernel and lost over 95% of the code and subsecond response times … For me performance is important but maintainability is equal. The less code the easier to maintain. There is a great white paper from the NK guys here.

As a firm believer in the maxim that the way to programming nirvana is to write the absolute minimum amount of code, this sounded interesting. I had a read of the whitepaper. It’s overly wordy and has a generally breathless tone, e.g.:

To demonstrate that these statements are in fact simple facts, the paper introduces and builds upon a foundation of fundamental principles. It is likely that these principles will challenge your understanding of the nature of computation.

A foundation of fundamental principles? Eww.

But there was some interesting stuff in there. The white paper introduces a model termed Resource-Oriented Computing (ROC), which in a sentence is like the web crossed with shell pipes. Yahoo pipes is used as an example of a Resource-Oriented System. The idea also seemed to have something in common with Service-oriented architecture, particularly the emphasis on combining loosely coupled, interoperable services. NetKernel is the framework built by 1060 Research to support the implementation of ROC applications.

The white paper didn’t really give me a picture of what a full scale Resource-Oriented application would look like, but I’m looking forward to the promised next installments. Very excellent blogger Jon Udell took a look at NetKernel a few years ago and seemed to be quite impressed.

Web app performance tuning

I’ve done a fair bit of performance optimization on server applications and device drivers, but not much with web applications, since I don’t normally do this type of development. Today was different.

The problem was that Roundup, our issue tracking system, was slow. Really slow. Loading a simple page could take up to 10 seconds. When it’s a tool you use all day, this starts to be annoying very quickly.

One of the many cool things about Roundup is that it can use a number of different storage backends- sqlite, mySQL, PostgreSQL, etc. We are using sqlite, and I suspected that since the number of issues is growing rapidly, we had outgrown it and needed to move to a “proper” database.

I decided to migrate to the MySQL backend, as it’s the database I’m most familiar with. I duly exported all the data from sqlite, set up a test installation of MySQL, and then imported everything into MySQL as described in the Roundup documentation.

I rushed to my browser to check the speed improvement, but it was pretty much the same. More thought was required.

I hypothesized that since Roundup is running on same server as our version control system, the slow page loads were being caused by the heavy disk IO associated with checkouts, updates, tagging and the like. I begged my friendly neighborhood system administrator for a spare machine and installed Roundup and MySQL on that.

The spare machine has dual 2.80GHz Xeons with Hyperthreading, 512KB cache and 2GB of memory, so it should have plenty of grunt for a little issue tracker like ours. Unfortunately, it didn’t improve the page load times either.

Hmmm…..

I installed the wonderful htop on the spare machine to get a better picture of what was going on. Requesting a page in my browser, I could watch the Roundup server process and mysqld using up massive gobs of CPU over the full 5-10 seconds that the request took to run. I’m thinking: what the hell is it doing?

I spent some time wandering around the Roundup source tree, but nothing seemed to leap out at me. I checked out the database using phpMyAdmin, but all the tables seemed to have indexes where they should be. Not that I would really know…

I decided it would help if I knew what queries were running on the database. A few minutes looking at the MySQL manual led me to this page that describes how to activate the general query log. When enabled, all queries run by msqld will be logged to a specified file. For some reason this must be done with a command-line parameter rather than using the configuration file, so I edited /etc/init.d/mysqld to pass the -l flag to mysqld when it starts.

This was a good trick. I requested one page. The resulting query log file was was 1.4 MB in size and full of select statements. Lots of select statements:

$ grep -i select query.log | wc -l
15492

So rendering a single page required the database to run 15,000 queries! That might, just might, explain the slow page loads.

Once I had the log, it was fairly easy to figure out what was generating all the queries. On the “issue” page, there is a drop-down menu that allows specifying the issue’s parent issue. Creating a parent/child relationship betwen issues is often useful to break down large units of work into smaller, more managable tasks. The problem here was that the drop-down menu was populated with the “title” of every issue in the system - all 2,500 of them. Due to the way the database abstraction works in Roundup, around 6 queries are necessary to retrieve each issue.

Commenting out the offending menu in the page template (Roundup uses the Template Attribute Language from Zope) reduced the number of queries required to display a page from 15,000 to 200. Page load times were correspondingly improved.

If I has more experience with web applications I would have been able to sort this problem out much faster. Using the MySQL query log was the key step that allowed me to understand the problem. It’s my own fault though- I disobeyed the first rule of performance optimization - measure first, then make changes. Don’t assume you know where the problem is.

UPDATE: Looks like this is a known issue.

Bringing an XP MBR back from the dead

When I installed my laptop at work, I divided the hard drive for a dual boot setup with Windows XP and Redhat EL4. I never actually dual booted since, well, it sucks and I have another RHEL machine I can SSH into anyway.

A couple of weeks ago I ran out of space on the NTFS partition, mostly because 25GB is used storing my MP3 collection. So I needed to wipe the multiple ext3 partitions I created for the RHEL installation and combine them into a single NTFS partition.

I’ve never had to do this under Windows before, but a small amount of Googling led me to a Microsoft support page that describes how to use the XP disk management utility. This utility is really pretty good. Go Redmond! It was very easy to make the changes I needed, and within 10 minutes or less I had a shiny new NTFS partition as my E: drive providing a much-needed 40GB of additional storage.

The only problem was that GRUB was still installed on the MBR, so next time I booted the machine, nothing happened. Disappointing.

I fished out my XP installation CD and tried to do a recovery. I tried various commands, but none of them worked. More Googling led to this comment that describes the command sequence required:

> fixboot c:
> fixmbr

Using either of these commands seperately doesn’t work, at least not in my situation.

daemon(1) woes resolved

I asked about my previously discussed problems with daemon(1) and ssh on the daemon-users list, and received a very prompt reply from Raf (thanks Raf!). Apparently the solution is to use the -t option to ssh. This is described in the man page as:

Force pseudo-tty allocation. This can be used to execute arbitrary screen-based programs on a remote machine, which can be very useful, e.g., when implementing menu services. Multiple -t options force tty allocation, even if ssh has no local tty.

Because -t forces ssh to allocate a pseudo-terminal, daemon’s stdin will no longer be a socket, so it won’t think it’s being started from inetd. It will then fork() and the ssh session will end immediately as desired.