Function Pointers in C and C++

Posted on February 26, 2019

Argument Order in C

Functional programmers will often point out that in their languages the order of arguments is significant because of currying. A function that takes two arguments (e.g. map, which takes a function to apply and a list to apply it to) actually takes the first argument and returns a function that takes the second argument and returns the final result. This makes it convenient to write a lambda where the second argument is the unknown parameter: \x -> map someFunc x can be written as map someFunc, whereas \f -> map f someValue has no such convenient shorthand (flip map someValue is actually clunkier).

To this, I sometimes respond that the order of arguments is significant in C (and thus in its hipper cousin, C++) as well. This is most obvious in a function that takes variable arguments, like printf: the first argument tells the function what to expect from the others. If you write printf("%s %i\n", "foo", 3);, we know from the first argument that a char* and an int follow. If, however, we just have printf("Hi!\n");, no further arguments are expected.

The C mechanism used to do this, called “varargs,” works from left to right only. You declare the function as int printf(const char *fmt, ...);, and then, inside the function, decide at runtime what the further arguments are. You could not instead make the format string the last argument and use it to determine how many arguments came before it. C allows functions to determine dynamically what arguments they take, but only from left to right.

ABI Considerations

This has consequences for the ABI, which specifies for each platform how C function calls are represented as assignments to registers or writes to stack memory. For any function that takes varargs, this left-to-right dynamic argument reading must be supported. This means that if an ABI passes the first argument in r2 when a varargs function is called with one argument, it had better pass it in r2 when the function is called with that argument plus another. If it passes the first four arguments in registers when there are only four, it had better use the same registers when there are more than four.

And, in practice, this doesn’t just apply to varargs functions: ordinary functions follow the same convention. The standard doesn’t explicitly require this, but C does allow traditional K&R declarations (int printf();) or even implicit function declarations (in older C standards that are still common enough to be worth considering), so at the call site you might not be able to tell what a function’s official signature is, or whether it takes a variable number of arguments. On most common ABIs, the arguments of printf("%s %i\n", "foo", 3); end up in the same registers and stack slots whether printf was declared int printf(const char *fmt, ...);, int printf(const char *fmt, const char *arg1, int arg2);, or int printf();. (Apple’s arm64 ABI, which passes variadic arguments on the stack, is a notable exception.)

The principle is always the same: You never need to know anything about the latter arguments to access the former arguments. Number of former arguments, the type of the former arguments — fair game. Latter arguments? Right out.

Function Pointers and Callbacks

This has an interesting consequence for function pointers. What follows is not, strictly speaking, endorsed by the standard, but the standard is written in such a way that ABI designers have to make it work, and I haven’t seen a compiler optimization yet that breaks it.

Let’s say you have a function pointer used as a callback. Let’s say it gets called whenever data comes in on a socket. It would receive perhaps a pointer to the buffer of the incoming data, and a size indicating how much data, and would return how much of the data it had consumed. It would therefore have a signature that would look something like this:

size_t (*process_data_cb)(const char *buff, size_t size, void *context);

The arguments and return value make sense for what it does, and are all absolutely necessary for a callback that acts like that, except for one, context. The context parameter is a convention in C that allows the same function to serve as a callback for different situations.

For example, if we wanted to write the data that came into the socket to a file, but wanted to write to different files based on which socket the data had come into, the context might indicate which file to write to, and perhaps even what to do in case of a write error (which, if it is a function pointer, might similarly require a context):

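#include <stddef.h>  // size_t
#include <unistd.h>  // write(), ssize_t

struct callback_data {
  int fd;                                // file to write incoming data to
  void (*error_callback)(void *context); // called if write() fails
  void *context;                         // passed through to error_callback
};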

size_t write_to_file_callback(const char *buff, size_t size, void *context) {
  struct callback_data *data = context; // No cast required in C
  ssize_t res = write(data->fd, buff, size);
  if (res < 0) {
    data->error_callback(data->context);
    return 0;
  }
  return (size_t)res;
}

And then we’d register the callback along with the callback_data it corresponds to, which would then be stored by whatever socket library we were using, without any knowledge of what that data would mean.
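The library side of this pattern might look something like the following minimal sketch. Everything here except the callback signature is hypothetical (struct data_source, source_set_callback, source_deliver, struct byte_counter and count_bytes_cb are made-up names). The library stores the context as an opaque void * and hands it back unchanged, so a single callback function can serve many sources, each with its own state.

```c
#include <assert.h>
#include <stddef.h>

/* The callback type from the text. */
typedef size_t (*process_data_cb_t)(const char *buff, size_t size, void *context);

/* Hypothetical library side: stores a callback plus an opaque context
 * pointer and never looks inside the context. */
struct data_source {
    process_data_cb_t cb;
    void *context;
};

void source_set_callback(struct data_source *src, process_data_cb_t cb, void *context) {
    src->cb = cb;
    src->context = context;
}

/* Stands in for "data arrived": pass the stored context back unchanged. */
size_t source_deliver(struct data_source *src, const char *buff, size_t size) {
    assert(src->cb != NULL);
    return src->cb(buff, size, src->context);
}

/* Hypothetical application side: one callback, one context per source. */
struct byte_counter {
    size_t total;
};

size_t count_bytes_cb(const char *buff, size_t size, void *context) {
    struct byte_counter *counter = context; /* no cast required in C */
    (void)buff;                             /* only the size matters here */
    counter->total += size;
    return size;                            /* report everything as consumed */
}
```

Registering count_bytes_cb on two sources with two different byte_counter contexts keeps their totals separate, even though the library knows nothing about byte_counter.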

Now, let’s say that you have a function that just prints the data to the screen, and doesn’t care which context was used:

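size_t print_data(const char *buff, size_t size) {
  ssize_t res = write(STDOUT_FILENO, buff, size); // from <unistd.h>
  return res < 0 ? 0 : (size_t)res;              // don't turn -1 into SIZE_MAX
}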

Or, for a more extreme example, let’s say that you have a function that panic-quits the program, that you want to be able to pass to any function that takes a callback, no matter what type of callback it takes:

#include <stdlib.h> // abort()

__attribute__((noreturn)) size_t panic(void) {
  abort(); // Or skip panic and pass the library's abort function directly
}

Can you use these functions as the callback, if the callback type is defined as process_data_cb is above?

Officially, the answer is no. Certainly, this sort of thing won’t compile:

size_t (*process_data_cb)(const char *buff, size_t size, void *context);
process_data_cb = panic;

But, if you include a cast, it will:

typedef size_t (*process_data_cb_t)(const char*, size_t, void*);
process_data_cb_t cb = (process_data_cb_t)panic;

And will it work? Well, try it! You will find that it will.

Why? Because the function we’re calling takes a prefix of the parameters we’re calling it with, and so we’ll be writing to the right registers for that function to read. It just won’t read the registers with the parameters that it doesn’t have — which is fine, it didn’t have to anyway.
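Here is a small sketch of the cast in action, using a hypothetical two-parameter function, count_newlines, in place of print_data so the result can be checked without doing I/O (as_process_data_cb is likewise a made-up helper). On x86-64 System V, for instance, the first three integer or pointer arguments are passed in rdi, rsi and rdx, so count_newlines finds buff and size exactly where it expects them and simply never looks at rdx. As discussed, calling through the converted pointer is undefined behavior as far as the C standard is concerned, and GCC’s -Wcast-function-type (enabled by -Wextra) may warn about the cast.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t (*process_data_cb_t)(const char *buff, size_t size, void *context);

/* Takes only a prefix of the callback's parameters: no context. */
static size_t count_newlines(const char *buff, size_t size) {
    size_t n = 0;
    for (size_t i = 0; i < size; ++i)
        if (buff[i] == '\n')
            ++n;
    return n;
}

/* Converting between function pointer types is allowed; *calling* through
 * the result is undefined behavior per the C standard. This relies on the
 * ABI putting buff and size in the same places whether or not a third
 * argument follows. */
static process_data_cb_t as_process_data_cb(size_t (*fn)(const char *, size_t)) {
    return (process_data_cb_t)fn;
}

/* Usage:
 *   process_data_cb_t cb = as_process_data_cb(count_newlines);
 *   assert(cb("a\nb\n", 4, NULL) == 2);   // context is passed but ignored
 */
```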

And the return type is the same. This matters because the varargs reasoning above says nothing about return types: in some ABIs, returning a struct adds a hidden first parameter (a pointer to where the result should be stored), which shifts every other parameter into a different register.

Implications for Programmers

Is this a horrible hack? Perhaps. Is this officially allowed by the standard? No: calling a function through a pointer of an incompatible type is undefined behavior. Still, it works on all compilers and platforms I’ve tested it on, which is all the ones I’ve developed on.

It certainly wouldn’t be the end of the world to avoid this nonsense and write wrapper functions:

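size_t panic_cb(const char *buff, size_t size, void *context) {
  // C (before C23) requires named parameters; cast to void to mark them unused
  (void)buff; (void)size; (void)context;
  abort();
}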

There are two problems I have with this. First, it can create a lot of boilerplate for the very lightweight operation of turning an existing function into a callback. C++ lambdas help with that (though they’re not available in C), yielding lightweight, low-boilerplate results:

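// With lambdas (the explicit -> size_t is needed so the captureless
// lambda converts to process_data_cb_t rather than a void-returning pointer)
register_callback(some_socket, [](const char *, size_t, void *) -> size_t { abort(); });
// With a cast (abort returns void, but since it never returns, the
// mismatched return type is harmless in practice)
register_callback(some_socket, reinterpret_cast<process_data_cb_t>(abort));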

But then again, C++ already has better mechanisms than this void *context pattern for callback functions. std::function handles these things anyway for situations where the callback must be stored, and templates can be used to take functors when the callback need not be.
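In C, which lacks lambdas, one standard-conforming way to cut down the wrapper boilerplate is a macro that stamps out wrappers. This is a minimal sketch with hypothetical names (DEFINE_CONTEXTLESS_CB, count_spaces); the generated function has exactly the callback’s type, so no cast is needed, though it does keep the extra function call that the cast avoids.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t (*process_data_cb_t)(const char *buff, size_t size, void *context);

/* Hypothetical helper: defines a callback `name` with exactly the
 * process_data_cb_t signature that forwards (buff, size) to `fn` and
 * ignores the context. */
#define DEFINE_CONTEXTLESS_CB(name, fn)                                 \
    static size_t name(const char *buff, size_t size, void *context) {  \
        (void)context; /* the wrapped function has no use for it */     \
        return fn(buff, size);                                          \
    }

/* An existing two-parameter function we want to use as a callback. */
static size_t count_spaces(const char *buff, size_t size) {
    size_t n = 0;
    for (size_t i = 0; i < size; ++i)
        if (buff[i] == ' ')
            ++n;
    return n;
}

/* Expands to a conforming wrapper named count_spaces_cb. */
DEFINE_CONTEXTLESS_CB(count_spaces_cb, count_spaces)

/* Usage (no cast, so no undefined behavior):
 *   process_data_cb_t cb = count_spaces_cb;
 *   assert(cb("a b c", 5, NULL) == 2);
 */
```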

The other problem is a little harder to avoid: performance. Casting instead of wrapping saves an extra function call. In most situations, this doesn’t matter and wouldn’t justify a hack (if it is a hack). But in some situations every little bit of performance matters, and calls through function pointers like these are hard for the compiler to optimize.

Specifically, most C++ standard library implementations could improve the overall performance of std::function by adopting a variant of this trick, but more on that in a future post.

My Personal Opinions

I will program as if this ABI feature were completely required by the C standard, and by the C++ standard for POD types. I think the improvement in terms of eliminating boilerplate is well worth it in C, optimization benefits notwithstanding. In C++, it doesn’t come up very often, and so I don’t use it — but I would if the right situation were to come up.

I know it’s borderline, though. Certainly I wouldn’t use it if it would offend the leader of whatever project I was working on, and depending on the type of project, I might raise it with them explicitly.

I think the standards of both programming languages should be amended to require this. In fact, I think calling a function with extra arguments in general should only be a warning, and that functions with fewer arguments should be able to override functions with more arguments in C++. Unfortunately — or fortunately — that is not my call to make.

More important than all of this, I think this fact about C and C++ ABIs is something that every serious C or C++ programmer should be aware of. And I think it should be used within the standard library (in the implementation of std::function) wherever the platform is known, readability matters less (the standard library is maintained by C++ experts), and performance improvements can help every user of that library.