Enlarge your debugger

Sometimes my (amazing) coder friends and I organize a private, informal meeting, not unlike the small hacker meetings described in this post at hackerfactor.com, to talk about the cool stuff we have been doing lately. The last one was a few weeks ago, and I gave a short talk about how you can use Python to extend and customize the functionality of gdb.

There is no recording of the talk, but you can read the slides. I hope you like them :D (Use the left/right arrows to go to the previous/next slide. On some slides you can also press the down arrow to view “extra” slides with additional content; press Escape to see which slides have extras.)


Poor man’s tracepoints with gdb

(This article is an adaptation of this thing in Spanish I wrote in 2008. It’s nothing groundbreaking, but a good friend of mine told me a few days ago that he still found it useful, so I decided to put it here. The new title is a reference to the poor man’s profiler, which you will probably like if you like this post.)

All of us have resorted to printf-based debugging at some point. Traces can be very useful, either because you want to see at a glance how a given value changes over time, or because stopping a thread at a breakpoint to examine some values can make the very race condition you are trying to debug disappear. Nevertheless, inserting printf statements is tedious, and you have to recompile every time you want to change the set of values you want to see.

Ideally, we could use tracepoints in our favorite debugger to collect some values and dump them to a log later. gdb does support them, but as the documentation points out:

The tracepoint facility is currently available only for remote targets. See Targets. In addition, your remote target must know how to collect trace data. This functionality is implemented in the remote stub; however, none of the stubs distributed with gdb support tracepoints as of this writing.

So, if our platform doesn’t support tracepoints… are we stuck with printf()? Well, yes and no. It turns out gdb allows us to specify actions to be executed when a breakpoint is hit, so you can tell it to print whatever you want and continue running every time execution reaches some point in your program. It’s not as good as the real deal, because a real tracepoint would be more lightweight, but it can be helpful nonetheless.

Let’s see how to do it. Consider the following C program:

#include <stdio.h>

int main()
{
        int i;
        double x = 2.0;
        for (i = 0; i < 64; i++)
                x *= 2;         /* line 8: the breakpoint below is set here */
        return 0;
}

If we compile it with debug information (gcc -g a.c) and launch it in gdb, we can log the values of x like this:

(gdb) break 8
Breakpoint 1 at 0x400463: file a.c, line 8.
(gdb) commands 1
Type commands for when breakpoint 1 is hit, one per line.
End with a line saying just "end".
>silent
>printf "x=%g\n",x
>cont
>end
(gdb) r
Starting program: /home/slack/a.out
x=2
x=4
x=8

The “silent” keyword as the first action tells gdb not to print the information about the current stack frame that it usually prints when stopping at a breakpoint, and the rest should be pretty self-explanatory. In addition to gdb’s printf command, you can also use more complex gdb expressions, so you could, for instance, traverse a list and log its values :)

Happy coding!


How to use debug information in separate files with gdb

One of the things that surprised me when I started using Visual Studio, coming from a Linux programming background, was the number of files it generates when you create and compile a project. One of these files is usually named project.pdb and contains the debug information for the final executable, along with information for incremental linking (Microsoft calls it the “program database”, hence the PDB extension). It turns out that keeping the debug information in a separate file can be useful when you want to distribute your program in binary form but still be able to debug it when you get crash reports.

I was curious whether the GNU toolchain could do the same, and the answer is yes. According to the gdb documentation, there are two ways of using debug information from another file:

  • Adding a debug link to the executable file: build the binary with embedded debug information as usual, copy the debug information to a separate file, strip it from the binary and add the debug link:
    objcopy --only-keep-debug program program.debug
    strip program
    objcopy --add-gnu-debuglink=program.debug program

    The debug link contains the file name and a CRC of the full contents of the debug file. When you load the executable, gdb looks for the debug information in program.debug and .debug/program.debug (paths relative to the directory containing the executable). This method works with any executable format that can contain a section called .gnu_debuglink; I tested it on Cygwin.

  • Build ID: ld, the GNU linker, can embed a special section with a build identifier, which can be an MD5 or SHA1 hash of the output file contents, a random number, or any given bit string (see the documentation of the --build-id option). This ID is copied and preserved when a debug file is created or the binary is stripped. When a Build ID is present, gdb will try to load a debug file named <debug-dir>/.build-id/xx/yyyyyyyy.debug, where:
      - <debug-dir> is each directory specified with set debug-file-directory,
      - xx is the first byte of the Build ID, written as two hexadecimal digits, and
      - yyyyyyyy is the rest of the Build ID in hexadecimal.

    This method is supported only on some platforms using ELF format for binaries and the GNU binutils.
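
To make the two lookup schemes above a bit more concrete, here is a minimal C++ sketch of the data gdb works with in each case. The helper names (crc32, debuglink_contents, build_id_debug_path) are hypothetical and not part of gdb or binutils, and the CRC is written little-endian on the assumption of an x86-style target. The .gnu_debuglink layout follows the gdb manual: the debug file's name without directories, a NUL byte, zero to three bytes of padding up to a four-byte boundary, and a 32-bit CRC stored in the executable's own byte order.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Standard CRC-32 (reflected polynomial 0xEDB88320), the checksum used by
// .gnu_debuglink. Computed bit by bit for brevity instead of with a table.
std::uint32_t crc32(const std::vector<unsigned char>& data)
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// Hypothetical helper: builds the contents of a .gnu_debuglink section that
// points to `debug_path`, whose file contents have checksum `crc`.
// Assumption: little-endian target, so the CRC is stored low byte first.
std::vector<unsigned char> debuglink_contents(const std::string& debug_path, std::uint32_t crc)
{
    // Only the base name is stored; leading directories are dropped.
    std::string::size_type slash = debug_path.find_last_of('/');
    std::string base = (slash == std::string::npos) ? debug_path : debug_path.substr(slash + 1);

    std::vector<unsigned char> out(base.begin(), base.end());
    out.push_back('\0');                  // NUL terminator
    while (out.size() % 4 != 0)           // pad to a four-byte boundary
        out.push_back('\0');
    for (int i = 0; i < 4; ++i)           // CRC, least significant byte first
        out.push_back(static_cast<unsigned char>((crc >> (8 * i)) & 0xFFu));
    return out;
}

// Hypothetical helper: the file gdb tries for a given Build ID (as a hex
// string) under one debug-file-directory.
std::string build_id_debug_path(const std::string& debug_dir, const std::string& build_id_hex)
{
    return debug_dir + "/.build-id/" + build_id_hex.substr(0, 2) + "/" +
           build_id_hex.substr(2) + ".debug";
}
```

In practice objcopy --add-gnu-debuglink computes and writes the section for you; the sketch only shows what ends up in the binary and where gdb will look.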


Automatic memoization in C++0x

Memoization is a pretty well-known optimization technique that consists of “remembering” (i.e. caching) the results of previous calls to a function, so that repeated calls with the same parameters are resolved without repeating the original computation.

Some days ago, while trying to show a colleague the benefits of a modern high-level language like Python over C++, I came up with the following snippet:

def memoize(fn):
    cache = {}
    def memoized_fn(*args):
        if args not in cache:
            cache[args] = fn(*args)
        return cache[args]
    return memoized_fn

It is a small function which takes a function as its only parameter, and returns a memoized version of that function. It is short, it shows some interesting Python features, like built-in dictionaries and tuples, or functions as first-class objects, and it should be pretty readable.

To make a fair comparison I needed to code a C++ version too. I was thinking about writing, just to prove my point, the classic boilerplate-filled template to create a function object, and using typelists and compile-time recursion to allow an arbitrary number of parameters. But it would have been quite boring. Also, it turns out that, with the upcoming C++ standard supporting lambda functions, tuples and variadic templates, it is possible to get rid of most of the boilerplate and use pretty much the same functional approach. Moreover, gcc 4.5 already supports these things, so I decided to give it a go:

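#include <functional>
#include <map>
#include <tuple>

template <typename ReturnType, typename... Args>
std::function<ReturnType (Args...)> memoize(std::function<ReturnType (Args...)> func)
{
    // The cache is copied into the lambda, which owns it from then on
    std::map<std::tuple<Args...>, ReturnType> cache;
    // Explicit return type: strict C++0x only deduces it for a lone return statement
    return ([=](Args... args) mutable -> ReturnType {
            std::tuple<Args...> t(args...);
            if (cache.find(t) == cache.end())
                cache[t] = func(args...);
            return cache[t];
    });
}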

Tricky things to note about the C++ version:

  • The new lambda syntax: the equals sign in [=] means “capture local variables in the surrounding scope by value”, which is needed because we are returning the lambda function, and the local variable cache goes out of scope when memoize() returns, so we can’t hold a reference to it. As we are capturing by value and we intend to modify the captured copy, the lambda must be marked as mutable, because a lambda’s call operator is const by default (see “Appearing and disappearing consts in C++” by Scott Meyers)
  • Lambda functions are function objects of implementation-dependent type, so we need to use std::function as the return type from memoize() to wrap our lambda.
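
One practical consequence of taking a std::function parameter: template argument deduction does not look through the implicit conversion from a plain function or a lambda, so a call like memoize(slow_add) does not compile, and the argument has to be wrapped explicitly. A minimal usage sketch follows; slow_add, call_count and make_memoized_add are hypothetical names used only to show that the wrapped function runs once per distinct argument tuple. It repeats memoize() (with an explicit -> ReturnType on the lambda) so that it compiles on its own.

```cpp
#include <functional>
#include <map>
#include <tuple>

template <typename ReturnType, typename... Args>
std::function<ReturnType (Args...)> memoize(std::function<ReturnType (Args...)> func)
{
    std::map<std::tuple<Args...>, ReturnType> cache;
    return [=](Args... args) mutable -> ReturnType {
        std::tuple<Args...> t(args...);
        if (cache.find(t) == cache.end())
            cache[t] = func(args...);
        return cache[t];
    };
}

// Hypothetical example function that counts how many times it really runs.
int call_count = 0;

int slow_add(int a, int b)
{
    ++call_count;
    return a + b;
}

std::function<int (int, int)> make_memoized_add()
{
    // memoize(slow_add) would fail to deduce ReturnType and Args,
    // so the function pointer is wrapped in std::function explicitly.
    return memoize(std::function<int (int, int)>(slow_add));
}
```

Calling the result twice with (2, 3) runs slow_add only once; calling it with (3, 2) runs it again, since that is a different key in the cache.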

I still like the Python version better, as it looks cleaner to me, but I’m glad the new features can help us reduce the amount of boilerplate where switching to newer languages is not possible (sometimes you just NEED the extra speed). Kudos to the C++ standards committee for the improvements and the gcc team for keeping up with them.

EDIT (2011/03/21): Thanks everyone for the feedback, both in the comments here and in the reddit thread. Some additional notes:

  • Here you have the complete sample file. I tested it under g++ 4.5.2 on MinGW, with -std=c++0x.
  • This is a proof of concept, which I did to become familiar with the new language features. As some people pointed out, a map is not the best data structure to use as a cache (it has O(log n) lookups). You will probably do better if you use a hash map (like the new std::unordered_map in C++0x). I chose map just for the sake of clarity. Also, you will need to define an operator< for any type you would like to use (or a hash function for unordered_map).
  • There were also suggestions to use lower_bound or equal_range to avoid the second map lookup and use the resulting iterator also as an insertion hint. I thought about saving the result in a local variable to avoid another lookup, but I wanted it to be as close as possible to the python version, just for clarity. I also didn’t know about these functions, so thanks for the tip! :D
  • Some people also pointed out that this example doesn’t work with recursive functions. That’s completely true. In this post on stackoverflow the user larsmans suggests that I’m leaving the implementation of a fixed-point combinator as an exercise to the reader. Maybe it would be a good exercise for the writer, too… if I’m able to write something like that it will surely deserve its own post ;D
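
A version with a single lookup per call, following the lower_bound suggestion, might look like the sketch below. lower_bound() returns the first entry whose key is not less than the requested one, so the call is a cache hit only if that entry's key is also not greater; otherwise the same iterator is passed to emplace_hint() as the insertion position. It still uses std::map, so it keeps the O(log n) lookups and the operator< requirement; an unordered_map version would additionally need a hash function for std::tuple, which the standard library does not provide.

```cpp
#include <functional>
#include <map>
#include <tuple>

template <typename ReturnType, typename... Args>
std::function<ReturnType (Args...)> memoize(std::function<ReturnType (Args...)> func)
{
    std::map<std::tuple<Args...>, ReturnType> cache;
    return [=](Args... args) mutable -> ReturnType {
        std::tuple<Args...> key(args...);
        // First entry whose key is not less than `key` (or end()).
        auto it = cache.lower_bound(key);
        // A miss unless that entry's key is also not greater than `key`.
        if (it == cache.end() || cache.key_comp()(key, it->first))
            it = cache.emplace_hint(it, key, func(args...));  // reuse `it` as the hint
        return it->second;
    };
}
```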

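One way around the recursion problem, short of writing a general fixed-point combinator, is open recursion: the function receives as an extra parameter the callable to use for its recursive calls, and the memoizer passes in a memoized version of it. The sketch below is limited to one argument for brevity, and memoize_rec is a hypothetical name. The inner std::function self is rebuilt on every top-level call, but the cache it consults lives in the returned function object, so results persist between calls.

```cpp
#include <functional>
#include <map>

// Hypothetical helper: memoizes a one-argument recursive function written in
// "open recursion" style, i.e. taking the function to call recursively as its
// first parameter instead of calling itself by name.
template <typename R, typename Arg>
std::function<R (Arg)> memoize_rec(std::function<R (const std::function<R (Arg)>&, Arg)> body)
{
    std::map<Arg, R> cache;  // owned by the returned function object
    return [=](Arg arg) mutable -> R {
        // `self` checks the cache and only runs `body` on a miss; since body
        // receives `self`, its recursive calls go through the cache too.
        std::function<R (Arg)> self;
        self = [&](Arg a) -> R {
            auto it = cache.find(a);
            if (it != cache.end())
                return it->second;
            R result = body(self, a);
            cache.emplace(a, result);
            return result;
        };
        return self(arg);
    };
}
```

With this, a Fibonacci body written as n < 2 ? n : self(n - 1) + self(n - 2) is evaluated once per value of n, so computing fib(50) takes 51 evaluations instead of an exponential number of calls.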

Welcome!

I’m currently moving here from my old website, so expect things to change or move around over the next few days/weeks. I hope to move the music, demoscene productions and personal projects soon, but I still haven’t decided what to do with the posts in my old blog. Maybe I’ll import them, maybe I won’t.

In the meantime, please check out the music section for some of my creations :)

Stay tuned!
