PEP: 3122
Title: Delineation of the main module
Version: $Revision$
Last-Modified: $Date$
Author: Brett Cannon
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Created: 27-Apr-2007
Post-History:

.. attention::
   This PEP has been rejected. Guido views running scripts within a
   package as an anti-pattern [#guido-rejection]_.

Abstract
========

Because of how name resolution works for relative imports in a world
where PEP 328 is implemented, executing modules within a package
ceases to be possible. This failing stems from the fact that the
module being executed as the "main" module replaces its ``__name__``
attribute with ``"__main__"`` instead of leaving it as the absolute
name of the module. This breaks import's ability to resolve relative
imports from the main module into absolute names.

In order to resolve this issue, this PEP proposes to change how the
main module is delineated. Leaving the ``__name__`` attribute in a
module alone and setting ``sys.main`` to the name of the main module
will allow at least some instances of executing a module within a
package that uses relative imports.

This PEP does not address the idea of introducing a module-level
function that is automatically executed like PEP 299 proposes.

The Problem
===========

With the introduction of PEP 328, relative imports became dependent on
the ``__name__`` attribute of the module performing the import. This
is because the dots in a relative import are used to strip away parts
of the calling module's name to calculate where in the package
hierarchy an import should fall (prior to PEP 328 relative imports
could fail and would fall back on absolute imports which had a chance
of succeeding).
For instance, consider the import ``from .. import spam`` made from
the ``bacon.ham.beans`` module (``bacon.ham.beans`` is not a package
itself, i.e., does not define ``__path__``).
Name resolution of the relative import takes the caller's name
(``bacon.ham.beans``), splits on dots, and then slices off the last n
parts based on the level (which is 2). In this example both ``ham``
and ``beans`` are dropped and ``spam`` is joined with what is left
(``bacon``). This leads to the proper import of the module
``bacon.spam``.

This reliance on the ``__name__`` attribute of a module when handling
relative imports becomes an issue when executing a script within a
package. Because the executing script has its name set to
``'__main__'``, import cannot resolve any relative imports, leading to
an ``ImportError``.

For example, assume we have a package named ``bacon`` with an
``__init__.py`` file containing::

    from . import spam

Also create a module named ``spam`` within the ``bacon`` package (it
can be an empty file). Now if you try to execute the ``bacon``
package (either through ``python bacon/__init__.py`` or
``python -m bacon``) you will get an ``ImportError`` about trying to
do a relative import from within a non-package. Obviously the import
is valid, but because of the setting of ``__name__`` to ``'__main__'``
import thinks that ``bacon/__init__.py`` is not in a package since no
dots exist in ``__name__``. To see how the algorithm works in more
detail, see ``importlib.Import._resolve_name()`` in the sandbox
[#importlib]_.

Currently a work-around is to remove all relative imports in the
module being executed and make them absolute. This is unfortunate,
though, as one should not be required to use a specific type of
resource in order to make a module in a package be able to be
executed.

The Solution
============

The solution to the problem is to not change the value of ``__name__``
in modules. But there still needs to be a way to let executing code
know it is being executed as a script. This is handled with a new
attribute in the ``sys`` module named ``main``.

When a module is being executed as a script, ``sys.main`` will be set
to the name of the module. This changes the current idiom of::

    if __name__ == '__main__':
        ...

to::

    import sys
    if __name__ == sys.main:
        ...

The newly proposed solution does introduce an added line of
boilerplate, which is a module import. But as the solution does not
introduce a new built-in or module attribute (as discussed in
`Rejected Ideas`_) it has been deemed worth the extra line.

Another issue with the proposed solution (which also applies to all
rejected ideas) is that it does not directly solve the problem of
discovering the module name of a file. Consider
``python bacon/spam.py``. By the file name alone it is not obvious
whether ``bacon`` is a package. To find this out properly, the
current directory must be on ``sys.path`` and ``bacon/__init__.py``
must exist.

But this is the simple example. Consider ``python ../spam.py``. From
the file name alone it is not at all clear if ``spam.py`` is in a
package or not. One possible solution is to find out the absolute
name of ``..``, check if a file named ``__init__.py`` exists there,
and then check whether the directory is on ``sys.path``. If it is
not, then continue to walk up the directory tree until no more
``__init__.py`` files are found or the directory is found on
``sys.path``.

This could potentially be an expensive process. If the package depth
happens to be deep then it could require a large amount of disk access
to discover where the package is anchored on ``sys.path``, if at all.
The stat calls alone can be expensive if the file system the executed
script is on is something like NFS.

Because of these issues, ``__name__`` will be left as the module's
actual name only when the ``-m`` command-line argument (introduced by
PEP 338) is used. Otherwise the fallback semantics of setting
``__name__`` to ``"__main__"`` will occur. ``sys.main`` will still be
set to the proper value, regardless of what ``__name__`` is set to.
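
As a rough illustration of the directory walk described above, the
following sketch guesses a dotted module name from a file path. It is
not part of this proposal: the helper name ``guess_module_name`` and
its ``search_path`` parameter are hypothetical, and the sketch assumes
plain directories on disk (no symlinks, zip imports, or custom
importers)::

    import os
    import sys


    def guess_module_name(script_path, search_path=None):
        """Guess the dotted name of a script inside a package.

        Hypothetical helper for illustration only.  Returns None when
        the script is not in a package anchored on the search path.
        """
        if search_path is None:
            search_path = sys.path
        # An empty sys.path entry stands for the current directory.
        anchors = {os.path.abspath(entry or os.curdir)
                   for entry in search_path}

        directory, filename = os.path.split(os.path.abspath(script_path))
        name = os.path.splitext(filename)[0]
        # A package's __init__.py is named after its directory alone.
        parts = [] if name == '__init__' else [name]

        # Walk up while each directory is a package, i.e. it contains
        # an __init__.py file.
        while os.path.isfile(os.path.join(directory, '__init__.py')):
            parent, package = os.path.split(directory)
            if not package:
                # Reached the file system root without finding an anchor.
                return None
            parts.insert(0, package)
            directory = parent
            if directory in anchors:
                return '.'.join(parts)
        return None

Every pass through the loop costs at least one file system check,
which is the expense noted above for deep packages or for file
systems such as NFS.
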

Implementation
==============

When the ``-m`` option is used, ``sys.main`` will be set to the
argument passed in. ``sys.argv`` will be adjusted as it is currently.
Then the equivalent of ``__import__(sys.main)`` will occur. This
differs from current semantics as the ``runpy`` module fetches the
code object for the file specified by the module name in order to
explicitly set ``__name__`` and other attributes. This is no longer
needed as import can perform its normal operation in this situation.

If a file name is specified, then ``sys.main`` will be set to
``"__main__"``. The specified file will then be read and have a code
object created and then be executed with ``__name__`` set to
``"__main__"``. This mirrors current semantics.

Transition Plan
===============

In order for Python 2.6 to be able to support both the current
semantics and the proposed semantics, ``sys.main`` will always be set
to ``"__main__"``. Otherwise no change will occur for Python 2.6.
This unfortunately means that no benefit from this change will occur
in Python 2.6, but it maximizes compatibility for code that is to work
as much as possible with 2.6 and 3.0.

To help transition to the new idiom, 2to3 [#2to3]_ will gain a rule to
transform the current ``if __name__ == '__main__': ...`` idiom to the
new one. This will not help with code that checks ``__name__``
outside of the idiom, though.

Rejected Ideas
==============

``__main__`` built-in
---------------------

A counter-proposal was to introduce a built-in named ``__main__``.
The value of the built-in would be the name of the module being
executed (just like the proposed ``sys.main``). This would lead to a
new idiom of::

    if __name__ == __main__:
        ...

A drawback is that the syntactic difference is subtle: only the
quotes around "__main__" are dropped. Some believe that for existing
Python programmers bugs will be introduced where the quotation marks
will be put on by accident.
But one could argue that the bug would be discovered quickly through
testing as it is a very shallow bug.

While the name of the built-in could obviously be different (e.g.,
``main``) the other drawback is that it introduces a new built-in.
With a simple solution such as ``sys.main`` being possible without
adding another built-in to Python, this proposal was rejected.

``__main__`` module attribute
-----------------------------

Another proposal was to add a ``__main__`` attribute to every module.
For the one that was executing as the main module, the attribute would
have a true value while all other modules had a false value. This has
the nice consequence of simplifying the main module idiom to::

    if __main__:
        ...

The drawback was the introduction of a new module attribute. It also
required more integration with the import machinery than the proposed
solution.

Use ``__file__`` instead of ``__name__``
----------------------------------------

Any of the proposals could be changed to use the ``__file__``
attribute on modules instead of ``__name__``, including the current
semantics. The problem with this is that with the proposed solutions
there is the issue of modules having no ``__file__`` attribute defined
or having the same value as other modules.

The problem that comes up with the current semantics is you still have
to try to resolve the file path to a module name for the import to
work.

Special string subclass for ``__name__`` that overrides ``__eq__``
------------------------------------------------------------------

One proposal was to define a subclass of ``str`` that overrode the
``__eq__`` method so that it would compare equal to ``"__main__"`` as
well as the actual name of the module. In all other respects the
subclass would be the same as ``str``. This was rejected as it seemed
like too much of a hack.

References
==========

.. [#2to3] 2to3 tool
   (http://svn.python.org/view/sandbox/trunk/2to3/)

.. [#importlib] importlib
   (http://svn.python.org/view/sandbox/trunk/import_in_py/importlib.py?view=markup)

.. [#guido-rejection] Python-Dev email: "PEP to change how the main
   module is delineated"
   (http://mail.python.org/pipermail/python-3000/2007-April/006793.html)

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: