Exposing aligned Eigen::Quaternionf with boost::python

16 Apr 2015 | Programming

Introduction

This post is about how to expose C++ structs/classes that have alignment requirements using boost::python.

In the general case, the easiest solution is to use a shared_ptr<Type> as the holder (second argument to bp::class_) and make sure the class has an operator new that enforces alignment.

But sometimes, you want to expose a class that requires alignment by value, without using a shared_ptr<Type> (typically because you don’t want to write tons of wrappers for your existing C++ codebase). That is what we do here with Eigen::Quaternion.

If you are not interested in the analysis, go directly to the solution near the end.

Exposing Eigen::Quaternion

I was trying to expose an Eigen::Quaternion using boost::python:

bp::class_<Quaternionf>("Quaternion", bp::init<float, float, float, float>())
  .def(bp::init<Matrix3f>())
  .add_property("w", get_prop_const(&Quaternionf::w))
  .add_property("x", get_prop_const(&Quaternionf::x))
  .add_property("y", get_prop_const(&Quaternionf::y))
  .add_property("z", get_prop_const(&Quaternionf::z))
  .def("matrix", &Quaternionf::matrix)
  .def("rotvec", &quaternion_to_rotvec);

Now, I also want to be able to export vector<Quaternionf>. So I want to expose the Quaternion by value, and I do not want to use a shared_ptr<Quaternionf> as the holder.

The problem is that an Eigen Quaternion needs to be aligned to 16 bytes.

With the code above, there is a high probability that you’ll trigger an assertion like the following (meaning Eigen detects that the Quaternion coefficients are not aligned):

Assertion failed: ((reinterpret_cast<size_t>(array) & 0xf) == 0 && "this assertion is explained here: " "http://eigen.tuxfamily.org/dox-devel/group__TopicUnalignedArrayAssert.html" " **** READ THIS WEB PAGE !!! ****"), function plain_array, file /Users/julien/shapetwin/dev/shapy/libs/_install/_all/include/eigen3/Eigen/src/Core/DenseStorage.h, line 86.
Process 93520 stopped

The full stacktrace is here.

Problem analysis

In the stack trace, frame #15 is boost/python/object/make_instance.hpp:71:

template <class T, class Holder>
struct make_instance
    : make_instance_impl<T, Holder, make_instance<T,Holder> >
{
    template <class U>
    static inline PyTypeObject* get_class_object(U&)
    {
        return converter::registered<T>::converters.get_class_object();
    }
    
    static inline Holder* construct(void* storage, PyObject* instance, reference_wrapper<T const> x)
    {
->        return new (storage) Holder(instance, x);
    }
};

The -> marks the line in question. So the problem is that boost is trying to construct a new Quaternion in a memory area (storage) that is not guaranteed to be aligned. To fix this, we have to find out how to tell boost to allocate storage for our Quaternion using Eigen’s aligned_allocator.

The make_instance::construct method is called (frame #16) from make_instance.hpp:45:

template <class T, class Holder, class Derived>
struct make_instance_impl
{
    typedef objects::instance<Holder> instance_t;
      
    template <class Arg>
    static inline PyObject* execute(Arg& x)
    {
        BOOST_MPL_ASSERT((mpl::or_<is_class<T>, is_union<T> >));

        PyTypeObject* type = Derived::get_class_object(x);

        if (type == 0)
            return python::detail::none();

        PyObject* raw_result = type->tp_alloc(
            type, objects::additional_instance_size<Holder>::value);
        
        if (raw_result != 0)
        {
            python::detail::decref_guard protect(raw_result);
          
            instance_t* instance = (instance_t*)raw_result;
          
            // construct the new C++ object and install the pointer
            // in the Python object.
      ->    Derived::construct(&instance->storage, 
                (PyObject*)instance, x)->install(raw_result);
            
            // Note the position of the internally-stored Holder,
            // for the sake of destruction
            Py_SIZE(instance) = offsetof(instance_t, storage);

            // Release ownership of the python object
            protect.cancel();
        }
        return raw_result;
    }
};

We see that it passes &instance->storage as the storage, where instance is raw_result, which is allocated with type->tp_alloc. The requested size is determined by objects::additional_instance_size<Holder>::value.

In our case, we find the type of Holder from the second template param to make_instance_impl :

frame #16: 0x00000001029d81c1 volumit_human.so`_object* boost::python::objects::make_instance_impl<
    Eigen::Quaternion<float, 0>, 
    boost::python::objects::value_holder<Eigen::Quaternion<float, 0> >, 
    boost::python::objects::make_instance<
        Eigen::Quaternion<float, 0>, 
        boost::python::objects::value_holder<Eigen::Quaternion<float, 0>>
    >
>::execute<boost::reference_wrapper<Eigen::Quaternion<float, 0> const> const>(x=0x00007fff5fbfd618) + 145 at make_instance.hpp:45

So our Holder is

boost::python::objects::value_holder<Eigen::Quaternion<float, 0>>

Now, if we look at boost/python/object/instance.hpp:40, where additional_instance_size is defined :

template <class Data>
struct additional_instance_size
{
    typedef instance<Data> instance_data;
    typedef instance<char> instance_char;
    BOOST_STATIC_CONSTANT(
        std::size_t, value = sizeof(instance_data)
                           - BOOST_PYTHON_OFFSETOF(instance_char,storage));
};

This computes the additional size needed by the particular instance. Subtracting the offset of storage in a char instance leaves only the additional size, without the common members (dict, weakrefs, objects).

offsetof(type, var) returns the offset in bytes of the variable inside the struct of the given type.
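As a rough illustration of this computation, here is a minimal, stdlib-only sketch. Header, Instance, Vec4 and additional_size are hypothetical stand-ins invented for this example, not the boost::python types (the real instance also starts with PyObject_VAR_HEAD).

```cpp
#include <cstddef>

// Hypothetical stand-in for the members common to every instance.
struct Header { void* dict; void* weakrefs; void* objects; };

// Simplified analogue of instance<Data>: common members, then storage
// that is sized and aligned for Data.
template <class Data = char>
struct Instance {
    Header head;
    alignas(Data) char storage[sizeof(Data)];
};

// Hypothetical 16-byte aligned payload.
struct alignas(16) Vec4 { float v[4]; };

// Same idea as additional_instance_size: total size of the instance
// minus the offset of storage in the char instantiation.
template <class Data>
constexpr std::size_t additional_size() {
    return sizeof(Instance<Data>) - offsetof(Instance<char>, storage);
}
```

Because the subtraction uses the offset of storage in the char instantiation, any padding inserted before storage to satisfy the alignment of Data is counted in the result, so the value can be larger than sizeof(Data).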

With a bit more digging, we see that additional_instance_size uses the instance class which is defined just above in the same file :

// Each extension instance will be one of these
template <class Data = char>
struct instance
{
    PyObject_VAR_HEAD
    PyObject* dict;
    PyObject* weakrefs; 
    instance_holder* objects;

    typedef typename type_with_alignment<
        ::boost::alignment_of<Data>::value
    >::type align_t;
        
    union
    {
        align_t align;
        char bytes[sizeof(Data)];
    } storage;
};

This, in turn, figures out the type alignment using boost.typetraits and specifically alignment_of.

The union is a way to force the alignment: a union (like a struct) must be aligned at least as strictly as its most strictly aligned member, which for power-of-two alignments is the lowest common multiple of the alignments of all its members. See the gcc doc about alignment.
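The effect of the union can be checked with a small stdlib-only sketch. Aligned16, AlignTag and Storage are hypothetical names made up for this example; AlignTag only plays the role of boost's type_with_alignment.

```cpp
#include <cstddef>

// Hypothetical 16-byte aligned payload, standing in for an Eigen type.
struct alignas(16) Aligned16 { float v[4]; };

// Dummy type whose only purpose is to carry an alignment of N bytes.
template <std::size_t N>
struct alignas(N) AlignTag { char pad; };

// Same shape as the storage union in instance<Data>: the align member
// forces the alignment, the bytes member provides the room.
template <class Data>
union Storage {
    AlignTag<alignof(Data)> align;
    char bytes[sizeof(Data)];
};

static_assert(alignof(Storage<Aligned16>) == 16, "union inherits the 16-byte alignment");
static_assert(alignof(Storage<char>) == 1, "no extra alignment for char");
static_assert(sizeof(Storage<Aligned16>) >= sizeof(Aligned16), "room for the payload");
```

Note that this only fixes where the storage sits relative to the start of the enclosing object. The address is aligned at run time only if the enclosing object itself starts at a suitably aligned address, which depends on the allocator.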

So it looks like we can just specialize alignment_of for our type.

But it is already correct. Adding the following log (QuaternionfHolder is a typedef for bp::objects::value_holder<Eigen::Quaternionf>)

LOG(INFO) << "Alignment of Quaternion : " << boost::alignment_of<Quaternionf>::value;
LOG(INFO) << "Alignment of QuaternionHolder : " << boost::alignment_of<QuaternionfHolder>::value;

prints

I0414 18:06:27.344784 2072736528 human_mod.cc:1072] Alignment of Quaternion : 16
I0414 18:06:27.344817 2072736528 human_mod.cc:1073] Alignment of QuaternionHolder : 16

If we specialize make_instance to LOG when a Quaternion is created, as follows:

namespace boost { namespace python { namespace objects {

//template <class T, class Holder>
template<>
struct make_instance<Quaternionf, QuaternionfHolder>
  : make_instance_impl<Quaternionf, QuaternionfHolder, make_instance<Quaternionf,QuaternionfHolder> >
{
    template <class U>
    static inline PyTypeObject* get_class_object(U&)
    {
        return converter::registered<Quaternionf>::converters.get_class_object();
    }

    static inline QuaternionfHolder* construct(void* storage, PyObject* instance, reference_wrapper<Quaternionf const> x)
    {
        LOG(INFO) << "Into make_instance";
        LOG(INFO) << "storage : " << storage;
        LOG(INFO) << "&x : " << x.get_pointer();
        LOG(INFO) << "&x alignment (0 = aligned): " << (reinterpret_cast<size_t>(x.get_pointer()) & 0xf);
        QuaternionfHolder* new_holder = new (storage) QuaternionfHolder(instance, x);
        LOG(INFO) << "&new_holder : " << new_holder;
        return new_holder;
    }
};

}}} // namespace boost::python::objects

We get the following log

I0416 11:49:48.403352 2072736528 quaternionf_specializations.h:103] Into make_instance
I0416 11:49:48.403367 2072736528 quaternionf_specializations.h:104] storage : 0x10593a698
I0416 11:49:48.403373 2072736528 quaternionf_specializations.h:105] &x : 0x1060d10e0
I0416 11:49:48.403378 2072736528 quaternionf_specializations.h:106] &x alignment (0 = aligned): 0

Now, I think that’s the problem. Our storage at 0x10593a698 is not 16-byte aligned. A 16-byte aligned address has its last 4 bits (the last hex digit) equal to 0, and here the last hex digit is 8.
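The check used in the log (address & 0xf) can be written as a small helper. This is a minimal sketch; address_is_aligned and pointer_is_aligned are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true when `address` is a multiple of `alignment`.
// `alignment` must be a non-zero power of two.
inline bool address_is_aligned(std::uint64_t address, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (address & (alignment - 1)) == 0;
}

// Convenience wrapper for real pointers.
inline bool pointer_is_aligned(const void* p, std::size_t alignment) {
    return address_is_aligned(reinterpret_cast<std::uintptr_t>(p), alignment);
}
```

With the addresses from the log, address_is_aligned(0x10593a698, 16) is false (storage, last hex digit 8) and address_is_aligned(0x1060d10e0, 16) is true (&x, last hex digit 0).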

First idea - using an aligned python allocation function

The CPython PyTypeObject allows us to specify a custom allocation function, tp_alloc.

Ok so the real problem is the call to

PyObject* raw_result = type->tp_alloc(
    type, objects::additional_instance_size<Holder>::value);

in make_instance_impl (defined in make_instance.hpp). This means that the boost::python::objects::instance holding the storage, where our QuaternionfHolder (and therefore the Quaternion) is constructed with placement new, is allocated using Python allocators. But we actually need it to be allocated using an aligned allocator, because it holds our Quaternion instance and, as explained in the Eigen docs, a structure holding an aligned type should be aligned itself.

Remark: even if we succeeded in doing this, we would need to use EIGEN_MAKE_ALIGNED_OPERATOR_NEW in the structure holding our quaternion.

Ok so we could specify the usage of a custom tp_alloc like this :

#define GC_UNTRACKED                    _PyGC_REFS_UNTRACKED

static PyObject*
aligned_PyObject_malloc(Py_ssize_t size) {
    void* memptr;
    int ret = posix_memalign(&memptr, 16, size);
    if (ret != 0) {
        return PyErr_NoMemory();
    }
    return (PyObject*)memptr;
}


// https://github.com/python/cpython/blob/2.7/Modules/gcmodule.c

// https://github.com/python/cpython/blob/2.7/Objects/typeobject.c
PyObject *
aligned_PyType_GenericAlloc(PyTypeObject *type, Py_ssize_t nitems)
{
    PyObject *obj;
    const size_t size = _PyObject_VAR_SIZE(type, nitems+1);
    /* note that we need to add one, for the sentinel */

    // TODO: use posix_aligned_malloc
    if (PyType_IS_GC(type)) {
      // This would involve rewriting _PyObject_GC_malloc with posix_memalign.
      // The problem is that _PyObject_GC_malloc access some gc-internal
      // static variables that we cannot access here. So we do not support
      // objects with special GC requirements (boost class type doesn't
      // have PyType_IS_GC anyway)
      // https://docs.python.org/2/c-api/gcsupport.html#c.PyObject_GC_NewVar
        //obj = aligned_PyObject_GC_malloc(size);
        LOG(ERROR) << "Aligned allocator does not support GC objects";
        return PyErr_NoMemory();
    } else
        obj = aligned_PyObject_malloc(size);

    if (obj == NULL)
        return PyErr_NoMemory();

    memset(obj, '\0', size);

    if (type->tp_flags & Py_TPFLAGS_HEAPTYPE)
        Py_INCREF(type);

    if (type->tp_itemsize == 0)
        (void)PyObject_INIT(obj, type);
    else
        (void) PyObject_INIT_VAR((PyVarObject *)obj, type, nitems);

    if (PyType_IS_GC(type))
        _PyObject_GC_TRACK(obj);
    return obj;
}
  
template<>
struct make_instance<Quaternionf, QuaternionfHolder>
    : make_instance_impl<Quaternionf, QuaternionfHolder, make_instance<Quaternionf,QuaternionfHolder> >
{
    template <class U>
    static inline PyTypeObject* get_class_object(U&)
    {
        //return converter::registered<Quaternionf>::converters.get_class_object();
        PyTypeObject* type = converter::registered<Quaternionf>::converters.get_class_object();
        type->tp_alloc = aligned_PyType_GenericAlloc;
        return type;
    }

    static inline QuaternionfHolder* construct(void* storage, PyObject* instance, reference_wrapper<Quaternionf const> x)
    {
      LOG(INFO) << "Into make_instance";
      LOG(INFO) << "storage : " << storage;
      LOG(INFO) << "&x : " << x.get_pointer();
      LOG(INFO) << "&x alignment (0 = aligned): " << (reinterpret_cast<size_t>(x.get_pointer()) & 0xf);

      // From the specialized make_instance_impl above, we are guaranteed to
      // be able to align our storage
      //void* aligned_storage = reinterpret_cast<void*>(
          //(reinterpret_cast<size_t>(storage) & ~(size_t(15))) + 16);
      QuaternionfHolder* new_holder = new (storage) QuaternionfHolder(instance, x);
      LOG(INFO) << "&new_holder : " << new_holder;
      return new_holder;
      //return new (storage) QuaternionfHolder(instance, x);
    }
};

The problem is that this only works for Python types that are not PyType_IS_GC. But boost::python classes are managed by the cyclic garbage collector (in boost class.cpp, tp_is_gc is a function that returns true). So we would have to define our own _PyObject_GC_malloc, but we cannot do that because it relies on some static variables in the Python gcmodule.

There are some discussions on allowing aligned allocators in Python, and for Python 3.4 we could use PyMem_SetAllocator, but there seems to be no solution for 2.7.

Solution - Specialize boost::python::make_instance

We can specialize some of the classes in boost/python/object/make_instance.hpp. We will make sure we get a storage area large enough that we can align our Quaternion instance inside it. To do so, we’ll request the size of our QuaternionfHolder plus 16 bytes. This means that we’ll use a bit more memory than what is strictly required.

We need to make three modifications:

Force instance<QuaternionfHolder> to allocate 16 bytes more than the size of QuaternionfHolder. This will allow us to align our QuaternionfHolder:

union
{
  align_t align;
  char bytes[sizeof(Data) + 16];
} storage;

Have make_instance_impl correctly set the size field of the Python object (Py_SIZE), which records the position of the holder:

Holder* holder = Derived::construct(
  &instance->storage, (PyObject*)instance, x);
holder->install(raw_result);
   
// Note the position of the internally-stored Holder,
// for the sake of destruction
// Since the holder is not necessarily allocated at the start of
// storage (to respect alignment), we have to add the holder
// offset relative to storage
size_t holder_offset = reinterpret_cast<size_t>(holder)
                        - reinterpret_cast<size_t>(&instance->storage)
                        + offsetof(instance_t, storage);
Py_SIZE(instance) = holder_offset;

Have make_instance::construct placement-new our QuaternionfHolder in an aligned memory block:

static inline QuaternionfHolder* construct(void* storage, PyObject* instance, reference_wrapper<Quaternionf const> x)
{
  // From the specialized make_instance_impl above, we are guaranteed to
  // be able to align our storage
  void* aligned_storage = reinterpret_cast<void*>(
    (reinterpret_cast<size_t>(storage) & ~(size_t(15))) + 16);
  QuaternionfHolder* new_holder = new (aligned_storage) 
    QuaternionfHolder(instance, x);
  return new_holder;
}
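The three modifications can be tried out in isolation, without boost or Python. The sketch below is a simplified model under stated assumptions: Payload stands in for QuaternionfHolder, Block for instance<QuaternionfHolder>, and the holder_offset field for Py_SIZE(instance). All of these names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical 16-byte aligned payload (stand-in for the holder).
struct alignas(16) Payload { float v[4]; };

// Stand-in for the Python instance: an offset field plus storage that is
// 16 bytes larger than the payload (modification 1).
struct Block {
    std::size_t holder_offset;
    char storage[sizeof(Payload) + 16];
};

// Same arithmetic as in construct: round down to a multiple of 16, then
// add 16. The result is always inside the 16 extra bytes of storage.
inline void* align_up_past(void* storage) {
    std::uintptr_t a = reinterpret_cast<std::uintptr_t>(storage);
    return reinterpret_cast<void*>((a & ~std::uintptr_t(15)) + 16);
}

// Placement-new the payload at the aligned address (modification 3) and
// record its offset from the start of the block (modification 2).
inline Payload* construct_in(Block& b) {
    void* aligned = align_up_past(b.storage);
    Payload* p = new (aligned) Payload();
    char* end = reinterpret_cast<char*>(p) + sizeof(Payload);
    assert(end <= b.storage + sizeof(b.storage));
    b.holder_offset = static_cast<std::size_t>(
        reinterpret_cast<char*>(p) - reinterpret_cast<char*>(&b));
    return p;
}

// Find the payload again from the block and the recorded offset.
inline Payload* recover(Block& b) {
    return reinterpret_cast<Payload*>(
        reinterpret_cast<char*>(&b) + b.holder_offset);
}
```

Since the rounding always adds 16, the payload starts between 1 and 16 bytes after the start of storage, which is why exactly 16 extra bytes are enough.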

Here is the full code :

typedef bp::objects::value_holder<Eigen::Quaternionf> QuaternionfHolder;

namespace boost { namespace python { namespace objects {

using namespace Eigen;

//template <class Data = char>
template<>
struct instance<QuaternionfHolder>
{
    typedef QuaternionfHolder Data;
    PyObject_VAR_HEAD
    PyObject* dict;
    PyObject* weakrefs;
    instance_holder* objects;

    typedef typename type_with_alignment<
        ::boost::alignment_of<Data>::value
    >::type align_t;

    union
    {
        align_t align;
        char bytes[sizeof(Data) + 16];
    } storage;
};


// Adapted from boost/python/object/make_instance.hpp

//template <class T, class Holder, class Derived>
template<class Derived>
struct make_instance_impl<Quaternionf, QuaternionfHolder, Derived>
{
    typedef Quaternionf T;
    typedef QuaternionfHolder Holder;

    typedef objects::instance<Holder> instance_t;

    template <class Arg>
    static inline PyObject* execute(Arg& x)
    {
        BOOST_MPL_ASSERT((mpl::or_<is_class<T>, is_union<T> >));

        PyTypeObject* type = Derived::get_class_object(x);

        if (type == 0)
            return python::detail::none();

        PyObject* raw_result = type->tp_alloc(
            type, objects::additional_instance_size<Holder>::value);

        if (raw_result != 0)
        {
            python::detail::decref_guard protect(raw_result);

            instance_t* instance = (instance_t*)raw_result;

            // construct the new C++ object and install the pointer
            // in the Python object.
            //Derived::construct(&instance->storage, (PyObject*)instance, x)->install(raw_result);
            Holder* holder = Derived::construct(
                &instance->storage, (PyObject*)instance, x);
            holder->install(raw_result);

            // Note the position of the internally-stored Holder,
            // for the sake of destruction
            // Since the holder is not necessarily allocated at the start of
            // storage (to respect alignment), we have to add the holder
            // offset relative to storage
            size_t holder_offset = reinterpret_cast<size_t>(holder)
                                 - reinterpret_cast<size_t>(&instance->storage)
                                 + offsetof(instance_t, storage);
            Py_SIZE(instance) = holder_offset;

            // Release ownership of the python object
            protect.cancel();
        }
        return raw_result;
    }
};


//template <class T, class Holder>
template<>
struct make_instance<Quaternionf, QuaternionfHolder>
    : make_instance_impl<Quaternionf, QuaternionfHolder, make_instance<Quaternionf,QuaternionfHolder> >
{
    template <class U>
    static inline PyTypeObject* get_class_object(U&)
    {
        return converter::registered<Quaternionf>::converters.get_class_object();
    }

    static inline QuaternionfHolder* construct(void* storage, PyObject* instance, reference_wrapper<Quaternionf const> x)
    {
      LOG(INFO) << "Into make_instance";
      LOG(INFO) << "storage : " << storage;
      LOG(INFO) << "&x : " << x.get_pointer();
      LOG(INFO) << "&x alignment (0 = aligned): " << (reinterpret_cast<size_t>(x.get_pointer()) & 0xf);

      // From the specialized make_instance_impl above, we are guaranteed to
      // be able to align our storage
      void* aligned_storage = reinterpret_cast<void*>(
          (reinterpret_cast<size_t>(storage) & ~(size_t(15))) + 16);
      QuaternionfHolder* new_holder = new (aligned_storage) QuaternionfHolder(instance, x);
      LOG(INFO) << "&new_holder : " << new_holder;
      return new_holder;
      //return new (storage) QuaternionfHolder(instance, x);
    }
};


}}} // namespace boost::python::objects

Alternative solution - disable Quaternion alignment

This requires changes in the code, but one option is to use Quaternion<float, Eigen::DontAlign> instead of Quaternionf. This disables alignment and should solve the problem.

References

boost_python_headers.hpp

python gcmalloc.c
