PEP: 382
Title: Namespace Packages
Version: $Revision$
Last-Modified: $Date$
Author: Martin v. Löwis
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 02-Apr-2009
Python-Version: 3.2
Post-History:

Abstract
========

Namespace packages are a mechanism for splitting a single Python
package across multiple directories on disk. In current Python
versions, each namespace package must supply its own code to compute
the package's ``__path__``. With the enhancement proposed here, the
import machinery itself will construct the list of directories that
make up the package.

Terminology
===========

Within this PEP, the term package refers to Python packages as defined
by Python's import statement. The term distribution refers to
separately installable sets of Python modules as stored in the Python
Package Index, and installed by distutils or setuptools. The term
vendor package refers to groups of files installed by an operating
system's packaging mechanism (e.g. Debian or Red Hat packages
installed on Linux systems).

The term portion refers to a set of files in a single directory
(possibly stored in a zip file) that contribute to a namespace
package.

Namespace packages today
========================

Python currently provides ``pkgutil.extend_path`` to denote a package
as a namespace package. The recommended way of using it is to put::

    from pkgutil import extend_path
    __path__ = extend_path(__path__, __name__)

in the package's ``__init__.py``. Every distribution needs to provide
the same contents in its ``__init__.py``, so that ``extend_path`` is
invoked regardless of which portion of the package gets imported
first. As a consequence, the package's ``__init__.py`` cannot
practically define any names: which portion is imported first depends
on the order of the package fragments on ``sys.path``. As a special
feature, ``extend_path`` reads files named ``*.pkg``, which allow
declaring additional portions.
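
To make the ``extend_path`` idiom concrete, the following sketch builds
two temporary directories, each holding one portion of a hypothetical
package ``nsdemo``. The names ``nsdemo``, ``alpha``, ``beta``,
``make_portion`` and ``demo`` are illustrative only. Both portions ship
the identical ``__init__.py`` quoted above; the sketch assumes a Python 3
interpreter that provides ``pkgutil.extend_path``.

```python
import importlib
import os
import sys
import tempfile

# The shared __init__.py that every portion ships (the idiom quoted above).
INIT_SOURCE = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_portion(root, package, module, source):
    """Write root/<package>/__init__.py plus one module for this portion."""
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w", encoding="utf-8") as f:
        f.write(INIT_SOURCE)
    with open(os.path.join(pkg_dir, module + ".py"), "w", encoding="utf-8") as f:
        f.write(source)


def demo(package="nsdemo"):
    """Install two portions into separate directories and import from both."""
    # Temporary directories stand in for two distributions' install locations.
    # They are not removed, because the imported modules still live there.
    root_a, root_b = tempfile.mkdtemp(), tempfile.mkdtemp()
    make_portion(root_a, package, "alpha", "NAME = 'alpha'\n")
    make_portion(root_b, package, "beta", "NAME = 'beta'\n")
    saved_path = sys.path[:]
    sys.path[:0] = [root_a, root_b]
    importlib.invalidate_caches()  # the directories were created at runtime
    try:
        alpha = importlib.import_module(package + ".alpha")  # found in root_a
        beta = importlib.import_module(package + ".beta")    # found in root_b
    finally:
        sys.path[:] = saved_path
    return list(sys.modules[package].__path__), alpha.NAME, beta.NAME
```

Calling ``demo()`` returns the package's two-entry ``__path__`` together
with the names read from each portion. Only the ``__init__.py`` in the
first directory on ``sys.path`` actually runs, which is why every portion
has to ship the same file.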

setuptools provides a similar function,
``pkg_resources.declare_namespace``, that is used in the form::

    import pkg_resources
    pkg_resources.declare_namespace(__name__)

In the portion's ``__init__.py``, no assignment to ``__path__`` is
necessary, as ``declare_namespace`` modifies the package's
``__path__`` through ``sys.modules``. As a special feature,
``declare_namespace`` also supports zip files, and registers the
package name internally so that future additions to ``sys.path`` by
setuptools can properly add additional portions to each package.

setuptools allows declaring namespace packages in a distribution's
``setup.py``, so that distribution developers don't need to put the
magic ``__path__`` modification into ``__init__.py`` themselves.

Rationale
=========

The current imperative approach to namespace packages has led to
multiple slightly incompatible mechanisms for providing namespace
packages. For example, pkgutil supports ``*.pkg`` files; setuptools
doesn't. Likewise, setuptools supports inspecting zip files, and
supports adding portions to its ``_namespace_packages`` variable,
whereas pkgutil doesn't.

In addition, the current approach causes problems for system vendors.
Vendor packages typically must not provide overlapping files, and an
attempt to install a vendor package that has a file already on disk
will fail or cause unpredictable behavior. As vendors might choose to
package distributions such that they all end up in a single directory
for the namespace package, all portions would contribute conflicting
``__init__.py`` files.

Specification
=============

Rather than using an imperative mechanism for importing packages, a
declarative approach is proposed here, as an extension to the existing
``*.pth`` mechanism available on the top-level Python path.

The import statement is extended so that it directly considers
``*.pth`` files during import; a directory is considered a package if
it contains either a file named ``__init__.py`` or a file whose name
ends with ``.pth``.
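
The package test proposed above can be expressed as a small predicate.
This is a minimal sketch, not part of the proposed import machinery:
``is_package_directory`` and ``classify_examples`` are hypothetical
helpers, and the sketch assumes that only regular files named
``__init__.py`` or ending in ``.pth`` count as markers.

```python
import os
import tempfile


def is_package_directory(directory):
    """Apply the package test proposed in this PEP to one directory.

    Hypothetical helper: a directory counts as a package if it contains a
    file named __init__.py or any file whose name ends with ".pth".
    """
    try:
        names = os.listdir(directory)
    except OSError:
        return False  # missing or unreadable: not a package
    for name in names:
        if not os.path.isfile(os.path.join(directory, name)):
            continue  # only regular files count as markers
        if name == "__init__.py" or name.endswith(".pth"):
            return True
    return False


def classify_examples():
    """Build a few sample directories and report which ones are packages."""
    layouts = {
        "regular": ["__init__.py", "mod.py"],
        "portion": ["foo-dist.pth", "mod.py"],
        "both": ["__init__.py", "foo-dist.pth"],
        "plain": ["mod.py"],
    }
    results = {}
    with tempfile.TemporaryDirectory() as root:
        for label, files in layouts.items():
            directory = os.path.join(root, label)
            os.mkdir(directory)
            for filename in files:
                # Empty files suffice: only the names matter for the test.
                open(os.path.join(directory, filename), "w").close()
            results[label] = is_package_directory(directory)
    return results
```

In ``classify_examples()``, only the ``plain`` directory, which holds a
module but neither marker, is rejected.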

Unlike ``.pth`` files on the top level, lines starting with "import"
are not supported in per-package ``.pth`` files.

In addition, the format of the ``*.pth`` file is extended: a line with
the single character ``*`` indicates that the entire ``sys.path`` will
be searched for portions of the namespace package at the time the
namespace package is imported. For a sub-package, the parent
package's ``__path__`` is searched (instead of ``sys.path``).

Importing a package will immediately compute the package's
``__path__``; the ``*.pth`` files are not considered anymore after the
initial import. If a ``*.pth`` file contains an asterisk, this
asterisk is prepended to the package's ``__path__`` to indicate that
the package is a namespace package (and that thus further extensions
to ``sys.path`` might also want to extend ``__path__``). At most one
such asterisk gets prepended to the path. In addition, the (possibly
dotted) names of all namespace packages are added to the set
``sys.namespace_packages``.

If the first directory found on the path only contains an
``__init__.py`` and no ``*.pth`` file, searching the path stops; in
other words, namespace packages must include at least one ``.pth``
file. If both ``*.pth`` files and an ``__init__.py`` have been found,
the search continues looking for further ``.pth`` files.

No other change to the importing mechanism is made; searching modules
(including ``__init__.py``) will continue to stop at the first module
encountered.

In summary, the process of importing a package ``foo`` works as
follows:

1. ``sys.path`` is searched for a directory ``foo``, or a file
   ``foo.<ext>``. If a file is found and no directory, it is treated
   as a module, and imported.

2. If it is a directory, the import machinery checks for both
   ``*.pth`` files and an ``__init__`` file. If it finds only
   ``*.pth`` files, a package is created, and its ``__path__`` is
   extended.

3. If neither a ``*.pth`` file nor an ``__init__.py`` was found, the
   directory is skipped, and the search for the module/package
   continues. If an ``__init__.py`` was found, further search only
   looks for ``*.pth`` files.

4. If an ``__init__`` module was found, it is imported, with
   ``__path__`` being initialized to the path computed from the
   ``*.pth`` files.

Impact on Import Hooks
----------------------

Both loaders and finders as defined in PEP 302 will need to be
changed to support namespace packages. Failure to conform to the
protocol below might cause a package not to be recognized as a
namespace package; loaders and finders not supporting this protocol
must raise AttributeError when the functions below get accessed.

Finders need to support looking for ``*.pth`` files in step 1 of the
above algorithm. To do so, a finder must support a method::

    finder.find_path(fullname, path=None)

This method will be called in the same manner as ``find_module``, and
it must return a list of strings, each representing the contents of
one ``*.pth`` file. If ``fullname`` is not found, is not a package, or
does not have any ``*.pth`` files, None must be returned.

If any ``*.pth`` files are found, but no loader was returned from
``find_module``, a package is created and initialized with the path.

If a loader was returned, but no ``*.pth`` files, ``load_module`` is
called as defined in PEP 302.

If both ``*.pth`` files were found and a loader was returned, a new
method is called on the loader::

    loader.load_module_with_path(load_module, path)

where the ``path`` parameter is the list of strings containing the
contents of all ``*.pth`` files.

Discussion
==========

With the addition of ``*.pth`` files to the import mechanism,
namespace packages can stop filling out the namespace package's
``__init__.py``. As a consequence, ``extend_path`` and
``declare_namespace`` become obsolete.

It is recommended that distributions put a file
``<distribution>.pth`` into their namespace packages, with a single
asterisk. This allows vendor packages to install multiple portions of
a namespace package into a single directory, with no risk of
overlapping files.
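
The following sketch models the path computation from the
Specification for a top-level package and exercises the
single-directory layout recommended above. It illustrates one reading
of the text under stated assumptions and is not the proposed
implementation: ``read_pth_lines``, ``compute_package_path`` and
``run_layout`` are hypothetical names; plain modules (``foo.py``) are
ignored; lines other than ``*``, comments and ``import`` lines are
assumed to name directories relative to the portion, in the spirit of
top-level ``.pth`` files; and the sketch always scans the whole search
path, so a ``*`` line only adds the leading asterisk. For a
sub-package, the parent package's ``__path__`` would be passed as
``search_path``.

```python
import os
import tempfile


def read_pth_lines(pth_file):
    """Return the meaningful lines of one per-package .pth file."""
    lines = []
    with open(pth_file, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # blank lines and comments, as in top-level .pth files
            if line.startswith(("import ", "import\t")):
                continue  # unsupported per package; this sketch skips them
            lines.append(line)
    return lines


def compute_package_path(name, search_path):
    """Sketch of the proposed lookup for package *name* on *search_path*.

    Returns (init_dir, path): init_dir holds the __init__.py to import (or
    None) and path is the computed __path__; (None, None) if nothing found.
    """
    init_dir = None
    portions = []
    namespace = False
    for entry in search_path:
        candidate = os.path.join(entry, name)
        if not os.path.isdir(candidate):
            continue  # plain modules are out of scope for this sketch
        files = [n for n in os.listdir(candidate)
                 if os.path.isfile(os.path.join(candidate, n))]
        has_init = "__init__.py" in files
        pth_files = sorted(n for n in files if n.endswith(".pth"))
        if not has_init and not pth_files:
            continue  # step 3: skip this directory and keep searching
        if has_init and not pth_files and init_dir is None and not portions:
            return candidate, [candidate]  # plain package: the search stops
        if has_init and init_dir is None:
            init_dir = candidate  # step 4: the first __init__.py is imported
        elif not pth_files:
            continue  # a later bare __init__.py is ignored
        if candidate not in portions:
            portions.append(candidate)
        for pth in pth_files:
            for line in read_pth_lines(os.path.join(candidate, pth)):
                if line == "*":
                    namespace = True
                    continue
                # Assumption: other lines name directories relative to the portion.
                extra = os.path.normpath(os.path.join(candidate, line))
                if os.path.isdir(extra) and extra not in portions:
                    portions.append(extra)
    if init_dir is None and not portions:
        return None, None
    # At most one asterisk is prepended to mark a namespace package.
    return init_dir, (["*"] if namespace else []) + portions


def run_layout(layout, name="foo"):
    """Create {site: {filename: text}} in a temp dir and run the sketch.

    Directories in the result are relative to the temp dir.
    """
    with tempfile.TemporaryDirectory() as root:
        sites = []
        for site, files in layout.items():
            pkg = os.path.join(root, site, name)
            os.makedirs(pkg)
            for filename, text in files.items():
                with open(os.path.join(pkg, filename), "w", encoding="utf-8") as f:
                    f.write(text)
            sites.append(os.path.join(root, site))
        init_dir, path = compute_package_path(name, sites)

        def rel(p):
            return p if p == "*" else os.path.relpath(p, root)

        return (None if init_dir is None else rel(init_dir),
                None if path is None else [rel(p) for p in path])
```

For instance, a single ``site/foo`` directory into which two vendor
packages installed ``dist-a.pth`` and ``dist-b.pth`` (each containing
``*``) yields one portion and exactly one leading asterisk, and the two
vendor packages share no file.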

Namespace packages can start providing non-trivial ``__init__.py``
implementations; to do so, it is recommended that a single
distribution provides a portion with just the namespace package's
``__init__.py`` (and potentially other modules that belong to the
namespace package proper).

The mechanism is mostly compatible with the existing namespace
mechanisms. ``extend_path`` will be adjusted to this specification;
any other mechanism might cause portions to get added twice to
``__path__``.

It has been proposed to also add this feature to Python 2.7. Given
that 2.x is reaching its end of life, it is questionable whether
adding the feature would do more good than harm (by leading users and
tools to start special-casing 2.7). Prospective users of this feature
are encouraged to comment on this particular question.

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: