C++: Uniquely addressed

In C++, it’s often useful to write functions as global function objects. We can pass them to other functions easily, and decorate them with new enhancements. In general, it’s good practice to initialize global function objects using static initialization in order to avoid the static initialization order fiasco. Since C++11, we can just use constexpr for this (the snippet below also relies on C++14 return type deduction):

struct sum_f
{
    template<class T, class U>
    auto operator()(T x, U y) const
    {
        return x+y;
    }  
};

static constexpr auto sum = sum_f();
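
As a quick illustration of passing such an object to another function, here is a minimal sketch that hands sum to std::accumulate. The total and total_example helpers are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

struct sum_f
{
    template<class T, class U>
    auto operator()(T x, U y) const
    {
        return x+y;
    }
};

static constexpr auto sum = sum_f();

// Hypothetical helper: folds a vector using the global function object.
inline int total(const std::vector<int>& xs)
{
    // sum is an ordinary object, so it can be passed by value to an algorithm.
    return std::accumulate(xs.begin(), xs.end(), 0, sum);
}

// Hypothetical usage check.
inline void total_example()
{
    assert(total({1, 2, 3, 4}) == 10);
}
```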

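One problem with this approach is the possibility of ODR (One Definition Rule) violations. Using static gives the object internal linkage, so every translation unit gets its own copy and most ODR problems are masked. However, if we take the address of the object, it will be different in each translation unit, which can violate the ODR (for example, when an inline function defined in a header uses that address).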
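
The address problem can be sketched with a header-style snippet. Assume the code below lives in a header that is included from several translation units; address_of_sum and single_tu_check are hypothetical helpers added only for illustration. Within one translation unit everything is consistent, which is all a single file can show, but each translation unit would see its own sum, so the definitions of the inline function would not agree.

```cpp
#include <cassert>

struct sum_f
{
    template<class T, class U>
    auto operator()(T x, U y) const
    {
        return x+y;
    }
};

// Internal linkage: every translation unit including this gets its own copy.
static constexpr auto sum = sum_f();

// Hypothetical helper. An inline function has external linkage and must mean
// the same thing everywhere, yet &sum names a different object in each
// translation unit that includes this code.
inline const sum_f* address_of_sum()
{
    return &sum;
}

// Inside a single translation unit the address is, of course, consistent.
inline void single_tu_check()
{
    assert(address_of_sum() == &sum);
}
```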

More important than the ODR itself (perhaps most users won’t ever take the address of the object), this can lead to bloat in the executable, since there can be multiple copies of the same object. Most compilers inline calls through these objects, so in general there is no bloat. However, if the compiler reaches some internal inlining limit, bloat in the executable is still a real possibility. For small examples, the bloat can be observed by disabling inlining in the compiler.

Static class variable

In N4381, Eric Niebler describes a mechanism we can use to avoid ODR violations (pending CWG issue 2104) and executable bloat. He uses a static class variable to store the function object, and then creates a reference to that static variable at global scope:

template<class T>
struct static_const_storage
{
    static constexpr T value = T();
};

template<class T>
constexpr T static_const_storage<T>::value;


template<class T>
constexpr const T& static_const()
{
    return static_const_storage<T>::value;
}

static constexpr auto& sum = static_const<sum_f>();

Since a class has to be unique across translation units, a static variable declared at class scope is given a unique address across translation units as well. So in addition to the unique address, only one copy of the function object ends up in the executable (even when inlining is disabled).
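
A single file cannot demonstrate several translation units, so the following sketch only checks the property that makes the scheme work: every reference obtained through static_const<sum_f>() names the one static member. The refers_to_shared_object helper is a hypothetical name used for illustration.

```cpp
#include <cassert>

struct sum_f
{
    template<class T, class U>
    auto operator()(T x, U y) const
    {
        return x+y;
    }
};

template<class T>
struct static_const_storage
{
    static constexpr T value = T();
};

template<class T>
constexpr T static_const_storage<T>::value;

template<class T>
constexpr const T& static_const()
{
    return static_const_storage<T>::value;
}

static constexpr auto& sum = static_const<sum_f>();

// Hypothetical helper: the reference and the static member are one object.
inline bool refers_to_shared_object()
{
    return &sum == &static_const<sum_f>()
        && &sum == &static_const_storage<sum_f>::value;
}

// Hypothetical usage check: the reference is still callable like a function.
inline void shared_object_example()
{
    assert(refers_to_shared_object());
    assert(sum(1, 2) == 3);
}
```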

Lambdas

Another nice way to declare function objects is to use lambdas. Lambdas don’t have constexpr constructors, but as I discussed in a previous post, we can still statically initialize them with constexpr by using a convenient STATIC_LAMBDA macro:

#include <type_traits> // std::is_empty, std::remove_reference
#include <utility>     // std::forward

template<class F>
struct wrapper
{
    static_assert(std::is_empty<F>(), "Lambdas must be empty");
    template<class... Ts>
    decltype(auto) operator()(Ts&&... xs) const
    {
        return reinterpret_cast<const F&>(*this)(std::forward<Ts>(xs)...);
    }
};

struct wrapper_factory
{
    template<class F>
    constexpr wrapper<F> operator += (F*)
    {
        return {};
    }
};

struct addr_add
{
    template<class T>
    friend typename std::remove_reference<T>::type *operator+(addr_add, T &&t) 
    {
        return &t;
    }
};

#define STATIC_LAMBDA wrapper_factory() += true ? nullptr : addr_add() + []

static constexpr auto add_one = STATIC_LAMBDA(auto x)
{
    return x + 1;
};
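
The core of the macro is the expression true ? nullptr : addr_add() + [], which may look odd at first. As a rough sketch of what it does: the conditional always selects nullptr, but its type is taken from the other branch, so the result is a null pointer to the closure type, and the lambda itself is never evaluated. The is_typed_null and points_to_class helpers below are hypothetical and exist only to inspect that pointer.

```cpp
#include <cassert>
#include <type_traits>

struct addr_add
{
    template<class T>
    friend typename std::remove_reference<T>::type *operator+(addr_add, T &&t)
    {
        return &t;
    }
};

// Hypothetical helper: checks that the pointer value is null.
template<class F>
constexpr bool is_typed_null(F* p)
{
    return p == nullptr;
}

// Hypothetical helper: checks that the pointee is a class type, which is
// what a closure type is.
template<class F>
constexpr bool points_to_class(F*)
{
    return std::is_class<F>::value;
}

// The untaken branch only contributes its type, so this is usable in a
// constant expression even though the lambda has no constexpr constructor.
static constexpr bool got_null =
    is_typed_null(true ? nullptr : addr_add() + [](int x) { return x + 1; });
static constexpr bool got_closure =
    points_to_class(true ? nullptr : addr_add() + [](int x) { return x + 1; });

static_assert(got_null, "the conditional yields a null pointer");
static_assert(got_closure, "the pointer is typed as pointer-to-closure");

// Hypothetical usage check at run time.
inline void typed_null_example()
{
    assert(got_null && got_closure);
}
```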

We could update wrapper_factory to return a unique address using the static_const function:

struct wrapper_factory
{
    template<class F>
    constexpr const wrapper<F>& operator += (F*)
    {
        return static_const<wrapper<F>>();
    }
};

So then we could declare add_one like this:

static constexpr auto& add_one = STATIC_LAMBDA(auto x)
{
    return x + 1;
};

However, despite our efforts, add_one will still have a different address across translation units. This is because every lambda expression has its own distinct closure type, even when the same lambda appears in different translation units, so wrapper<F> is a different class in each one. Of course, this is C++, so we shouldn’t give up yet.
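
The distinct-type rule is easy to see even inside one file. In this small sketch, two textually identical lambdas stand in for the "same" lambda compiled in two translation units; same_closure_type and call_both are hypothetical helpers.

```cpp
#include <cassert>
#include <type_traits>

// Two lambda expressions with identical text.
static auto first = [](int x) { return x + 1; };
static auto second = [](int x) { return x + 1; };

// Each lambda expression still produces its own closure type.
static_assert(!std::is_same<decltype(first), decltype(second)>::value,
              "each lambda expression has a distinct closure type");

// Hypothetical helper exposing the same comparison at run time.
inline bool same_closure_type()
{
    return std::is_same<decltype(first), decltype(second)>::value;
}

// Hypothetical helper: both lambdas behave identically despite the types.
inline int call_both(int x)
{
    assert(first(x) == second(x));
    return first(x) + second(x);
}
```

Any template instantiated on the closure type, such as wrapper<F>, is therefore a different class for each lambda.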

Forcing a unique address

We already rely on being able to reinterpret_cast lambdas (as an implementation detail, non-capturing lambdas are empty and free of side effects), so we could take advantage of this by doing a reinterpret_cast on an address that will be unique across translation units. So let’s update the wrapper_factory to take a placeholder type that will be unique across translation units:

template<class T>
struct wrapper_factory
{
    template<class F>
    constexpr const wrapper<F>& operator += (F*)
    {
        return reinterpret_cast<const wrapper<F>&>(static_const<T>());
    }
};

Then we can update the macro to take this type as a parameter:

#define STATIC_LAMBDA(T) wrapper_factory<T>() += true ? nullptr : addr_add() + []

So we create the add_one_t class as a placeholder address and then declare the add_one function like this:

struct add_one_t {};
static constexpr auto& add_one = STATIC_LAMBDA(add_one_t)(auto x)
{
    return x + 1;
};

Of course, this doesn’t work. A reinterpret_cast is not allowed in a constant expression. However, most compilers can still do constant folding with reinterpret_cast, since there are valid use cases for reinterpret_cast in a constant expression (see this gcc bug report for more info).

Surprisingly, both gcc and clang support an extension using the __builtin_constant_p builtin that allows us to do constant folding of these expressions in a constexpr context. When we write __builtin_constant_p(expr) ? (expr) : (expr), the compiler constant folds expr first, regardless of whether it meets the rules for a constant expression (see here for a more thorough explanation). So we can write a macro to encapsulate this:

#define CONST_FOLD(x) (__builtin_constant_p(x) ? (x) : (x))

Then we update wrapper_factory to use CONST_FOLD:

template<class T>
struct wrapper_factory
{
    template<class F>
    constexpr const wrapper<F>& operator += (F*)
    {
        return CONST_FOLD(reinterpret_cast<const wrapper<F>&>(static_const<T>()));
    }
};

And now it compiles, and add_one will have a unique address across translation units.

Conclusion

Of course, this is fairly non-portable C++, although it works on more than one compiler. Ideally, C++ should be fixed so these workarounds are not necessary. Non-capturing lambdas should have a constexpr default constructor and be unique across translation units (just like functions). Since it’s possible to twist implementations to work like this, it should not be too much of a stretch to require it.