Python import hooks

Today I worked with William on the promising ironclad project, which allows you to use CPython extensions such as numpy under IronPython. Ironclad needs to set up some import hooks to allow the loading of .pyd files. Here are some findings:

ihooks is old

One way to do import hooks is to use the ihooks module. However, this is the old way: it replaces the builtin import, and reimplements some of the functionality of the import statement. It's not deprecated, so you can still use it if it fits your needs. But there's probably a better way out there.

Enter PEP 302

PEP 302 defines two new, more granular ways to extend the import mechanism: sys.path_hooks and sys.meta_path.

path_hooks

I'll get path_hooks out of the way first, because it seems to be the less useful of the two (at least to my understanding). If you add your own path hook, it gets called with each entry of sys.path as the import machinery searches it. The result is cached per entry in sys.path_importer_cache, so the hook is not literally called on every import. You claim a path entry by returning an importer object for it, or decline it by raising ImportError. Unfortunately, claiming an entry means that your importer replaces the built-in mechanism for that path, so you'll have to handle the loading of the plain Python modules and packages there as well.
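To make the path hook protocol a bit more concrete, here is a minimal sketch. It is not ironclad code: the path entry "hookdemo://virtual", the module name hookdemo_from_path and all class names are made up for illustration. It also assumes a current Python 3, where the finder method is spelled find_spec and the loader defines exec_module, rather than the find_module/load_module pair from PEP 302.

```python
import importlib.abc
import importlib.util
import sys

VIRTUAL_ENTRY = "hookdemo://virtual"  # hypothetical sys.path entry


class VirtualLoader(importlib.abc.Loader):
    def create_module(self, spec):
        return None  # let the import system create a default module object

    def exec_module(self, module):
        module.GREETING = "hello from %s" % module.__name__


class VirtualPathFinder:
    """Finder that is responsible for every lookup under VIRTUAL_ENTRY."""

    def find_spec(self, fullname, target=None):
        if fullname == "hookdemo_from_path":
            return importlib.util.spec_from_loader(
                fullname, VirtualLoader(), origin=VIRTUAL_ENTRY)
        return None  # nothing else lives under this entry


def virtual_path_hook(path_entry):
    # Decline every entry except our own by raising ImportError
    if path_entry != VIRTUAL_ENTRY:
        raise ImportError("not handled by virtual_path_hook")
    return VirtualPathFinder()


sys.path_hooks.insert(0, virtual_path_hook)
sys.path.append(VIRTUAL_ENTRY)
try:
    import hookdemo_from_path
finally:
    # Undo the global changes so the sketch leaves no trace behind
    sys.path.remove(VIRTUAL_ENTRY)
    sys.path_hooks.remove(virtual_path_hook)
    sys.path_importer_cache.pop(VIRTUAL_ENTRY, None)
```

Note how the hook only ever sees path entries, never module names: the decision "is this my directory?" is made once per entry, and everything below that entry is then up to the finder it returned.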

meta_path

Meta path is much more interesting and useful, and it's what we used in ironclad. Basically, your handler's find_module method gets called for every module that is not already loaded, with the full dotted name of the module to be imported. So for example, when you do: from numpy.core import multiarray you will first get called with numpy, then with numpy.core and finally with numpy.core.multiarray. You also get an extra path argument, which is None for top-level modules/packages, or the parent package's __path__ in the case of subpackages/submodules.
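The call sequence is easy to observe with a finder that only records what it is asked for and then declines. This is a sketch under a few assumptions: it builds a throwaway package called hookdemo_pkg (a made-up name) in a temporary directory instead of using numpy, and it uses the Python 3 spelling find_spec in place of PEP 302's find_module.

```python
import importlib
import os
import sys
import tempfile


class RecordingFinder:
    """Meta path finder that observes imports and never handles them."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.calls = []

    def find_spec(self, fullname, path=None, target=None):
        if fullname.startswith(self.prefix):
            self.calls.append((fullname, path))
        return None  # decline, so the normal import machinery takes over


def build_demo_package(root):
    """Create hookdemo_pkg/sub/leaf.py under root (hypothetical names)."""
    sub_dir = os.path.join(root, "hookdemo_pkg", "sub")
    os.makedirs(sub_dir)
    for directory in (os.path.dirname(sub_dir), sub_dir):
        with open(os.path.join(directory, "__init__.py"), "w"):
            pass  # an empty __init__.py is enough to mark a package
    with open(os.path.join(sub_dir, "leaf.py"), "w") as handle:
        handle.write("VALUE = 42\n")


root = tempfile.mkdtemp()
build_demo_package(root)
finder = RecordingFinder("hookdemo_pkg")
sys.path.insert(0, root)
sys.meta_path.insert(0, finder)
try:
    importlib.invalidate_caches()
    from hookdemo_pkg.sub import leaf
finally:
    sys.meta_path.remove(finder)
    sys.path.remove(root)

names = [name for name, _ in finder.calls]
paths = [path for _, path in finder.calls]
```

After this runs, names holds the package, the subpackage and the module in that order, and only the first entry of paths is None.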

If you can handle the loading of the requested module, you return a loader instance (which defines load_module). If you can't, and this is the interesting bit, you just return None and the normal import mechanism will take place. You can also raise ImportError to completely block the loading of a module.
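The three possible outcomes (handle, decline, block) can be sketched in a few lines. The module names hookdemo_virtual and hookdemo_blocked and both classes are invented for this example, and the code assumes a current Python 3, so the finder implements find_spec and the loader exec_module instead of PEP 302's find_module and load_module.

```python
import importlib.abc
import importlib.util
import sys


class ConstantLoader(importlib.abc.Loader):
    """Loader that fills a module with a fixed set of attributes."""

    def __init__(self, attributes):
        self.attributes = attributes

    def create_module(self, spec):
        return None  # let the import system create a default module object

    def exec_module(self, module):
        module.__dict__.update(self.attributes)


class DemoFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path=None, target=None):
        if fullname == "hookdemo_blocked":
            # Block: the import fails without consulting any other finder
            raise ImportError("%s is blocked by DemoFinder" % fullname)
        if fullname == "hookdemo_virtual":
            # Handle: hand back a spec that carries our loader
            loader = ConstantLoader({"ANSWER": 42})
            return importlib.util.spec_from_loader(fullname, loader)
        # Decline: the normal import mechanism takes over
        return None


finder = DemoFinder()
sys.meta_path.insert(0, finder)
try:
    import hookdemo_virtual
    try:
        import hookdemo_blocked
        blocked = False
    except ImportError:
        blocked = True
    import json  # declined by DemoFinder, imported the usual way
finally:
    sys.meta_path.remove(finder)
```

Because the finder sits at the front of sys.meta_path, it sees each name before the built-in finders do, which is what makes the blocking case work.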

Example code

This is what we are now doing in ironclad. There used to be an ihooks-based solution, but since IronPython does some extra trickery for imports to allow importing .NET namespaces, things were a bit broken.

import sys
import os

# _mapper is added by deep black magic - pls ignore

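def loader(path):
    # Factory: the returned Loader closes over the path to the .pyd file,
    # because load_module itself only receives the module name.
    class Loader(object):
        def load_module(self, name):
            # Reuse an already-loaded module, as PEP 302 requires
            if name not in sys.modules:
                _mapper.LoadModule(path, name)
                module = _mapper.GetModule(name)
                module.__file__ = path
                sys.modules[name] = module
                if '.' in name:
                    # Bind the submodule as an attribute of its parent package
                    parent_name, child_name = name.rsplit('.', 1)
                    setattr(sys.modules[parent_name], child_name, module)
            return sys.modules[name]
    return Loader()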

class MetaImporter(object):
    def find_module(self, fullname, path=None):
        if fullname == 'numpy' or fullname.startswith('numpy.'):
            _mapper.PerpetrateNumpyFixes()
        # Raising ImportError here blocks these modules outright
        if fullname in ('_hashlib', 'ctypes'):
            raise ImportError('%s is not available in ironclad yet' % fullname)

        # Look for <lastname>.pyd on the package path, or on sys.path
        # for top-level names
        lastname = fullname.rsplit('.', 1)[-1]
        for d in (path or sys.path):
            pyd = os.path.join(d, lastname + '.pyd')
            if os.path.exists(pyd):
                return loader(pyd)

        # Not ours: fall through to the normal import mechanism
        return None


# Note: this replaces any meta_path hooks that were already installed
sys.meta_path = [MetaImporter()]

A minor annoyance is that load_module isn't called with any path, so you would have to walk sys.path again and somehow know if this is a relative import. We avoid this trickery by having a factory function that creates a loader closing over the path to the .pyd file of the module we want to import.
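The same idea works without a closure if the finder hands the path to the loader's constructor. The sketch below is a stand-in, not the ironclad code: it loads plain text files with a made-up .hooktxt extension instead of .pyd files, all names in it are hypothetical, and it assumes a current Python 3 (find_spec and exec_module instead of find_module and load_module).

```python
import importlib.abc
import importlib.util
import os
import sys
import tempfile


class TextFileLoader(importlib.abc.Loader):
    """Loader that remembers which file the finder located."""

    def __init__(self, file_path):
        self.file_path = file_path

    def create_module(self, spec):
        return None  # let the import system create a default module object

    def exec_module(self, module):
        with open(self.file_path) as handle:
            module.TEXT = handle.read()


class TextFileFinder(importlib.abc.MetaPathFinder):
    EXTENSION = ".hooktxt"  # hypothetical extension standing in for .pyd

    def find_spec(self, fullname, path=None, target=None):
        lastname = fullname.rsplit(".", 1)[-1]
        for directory in (path or sys.path):
            candidate = os.path.join(directory, lastname + self.EXTENSION)
            if os.path.exists(candidate):
                # The loader carries the path, so no second search is needed
                loader = TextFileLoader(candidate)
                return importlib.util.spec_from_loader(
                    fullname, loader, origin=candidate)
        return None


demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "hookdemo_notes" + TextFileFinder.EXTENSION)
with open(demo_file, "w") as handle:
    handle.write("loaded via a custom finder")

finder = TextFileFinder()
sys.path.insert(0, demo_dir)
sys.meta_path.insert(0, finder)
try:
    import hookdemo_notes
finally:
    sys.meta_path.remove(finder)
    sys.path.remove(demo_dir)
```

Whether you prefer the closure or the constructor argument is mostly a matter of taste; either way the path is resolved exactly once, in the finder.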

I hope that this post will help someone struggling to add their own custom import hook to Python. You can find more details, plus other niceties of the new hooks, in PEP 302.