PEP: 256
Title: Docstring Processing System Framework
Version: $Revision$
Last-Modified: $Date$
Author: dgoodger@bigfoot.com (David Goodger)
Discussions-To: doc-sig@python.org
Status: Draft
Type: Standards Track
Requires: PEP 257 Docstring Conventions
          PEP 258 DPS Generic Implementation Details
Created: 01-Jun-2001
Post-History:

Abstract

Python modules, classes and functions have a string attribute called __doc__. If the first expression inside the definition is a literal string, that string is assigned to the __doc__ attribute; such a string is called a documentation string, or docstring. It is often used to summarize the interface of the module, class or function.

There is no standard format (markup) for docstrings, nor are there standard tools for extracting docstrings and transforming them into useful structured formats (e.g., HTML, DocBook, TeX). Those tools that do exist are, for the most part, unmaintained and unused. The issues surrounding docstring processing have been contentious and difficult to resolve.

This PEP proposes a Docstring Processing System (DPS) framework. It separates out the components (program and conceptual), enabling the resolution of individual issues either through consensus (one solution) or through divergence (many). It promotes standard interfaces which will allow a variety of plug-in components (e.g., input parsers and output formatters) to be used.

This PEP presents the concepts of a DPS framework independently of implementation details.

Rationale

Python lends itself to inline documentation. With its built-in docstring syntax, a limited form of Literate Programming [2] is easy to do in Python. However, there are no satisfactory standard tools for extracting and processing Python docstrings. The lack of a standard toolset is a significant gap in Python's infrastructure; this PEP aims to fill the gap.

There are standard inline documentation systems for some other languages.
For example, Perl has POD (plain old documentation) and Java has Javadoc, but neither of these meshes with the Pythonic way. POD is very explicit, but takes after Perl in terms of readability. Javadoc is HTML-centric; except for '@field' tags, raw HTML is used for markup. There are also general tools such as Autoduck and Web (Tangle & Weave), useful for multiple languages.

There have been many attempts to write autodocumentation systems for Python (not an exhaustive list):

- Marc-Andre Lemburg's doc.py [3]
- Daniel Larsson's pythondoc & gendoc [4]
- Doug Hellmann's HappyDoc [5]
- Laurence Tratt's Crystal [6]
- Ka-Ping Yee's htmldoc & pydoc [7] (pydoc.py is now part of the Python standard library; see below)
- Tony Ibbs' docutils [8]

These systems, each with different goals, have had varying degrees of success. A problem with many of the above systems was over-ambition. They provided a self-contained set of components: a docstring extraction system, an input parser, an internal processing system and one or more output formatters. Inevitably, one or more components had serious shortcomings, preventing the system from being adopted as a standard tool.

Throughout the existence of the Python Documentation Special Interest Group (Doc-SIG) [9], consensus on a single standard docstring format has never been reached. A lightweight, implicit markup has been sought, for the following reasons (among others):

1. Docstrings written within Python code are available from within the interactive interpreter, and can be 'print'ed. Hence the preference for plaintext, which is easy to read.

2. Programmers want to add structure to their docstrings without sacrificing raw docstring readability. Unadorned plaintext cannot be transformed ('up-translated') into useful structured formats.

3. Explicit markup (like XML or TeX) has been widely considered unreadable by the uninitiated.

4. Implicit markup is aesthetically compatible with the clean and minimalist Python syntax.
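Point 1 above is easy to see in practice. The following sketch is written for a present-day Python 3 interpreter, not the Python 2.1 of this PEP's era. It shows a function whose first statement is a string literal, the resulting __doc__ attribute, and two standard-library ways of extracting docstrings: inspect.getdoc(), which returns the text with indentation and surrounding blank lines cleaned up, and ast.get_docstring(), which reads docstrings from source text without importing (and therefore without executing) the module. The area() function and the sample source are illustrative only.

```python
import ast
import inspect


def area(width, height):
    """Return the area of a rectangle.

    width and height must be non-negative numbers.
    """
    return width * height


# The string literal that opens the function body is stored as __doc__,
# so it can be printed from the interactive interpreter.
print(area.__doc__)

# inspect.getdoc() returns the same text with indentation and leading or
# trailing blank lines cleaned up, which suits plaintext display.
print(inspect.getdoc(area))

# ast.get_docstring() extracts docstrings from source text, so the code
# never has to be imported or executed.
source = '''
"""Module docstring."""

def f():
    """Function docstring."""
'''
tree = ast.parse(source)
print(ast.get_docstring(tree))          # the module's docstring
print(ast.get_docstring(tree.body[1]))  # docstring of the function f
```

Extracting from the syntax tree rather than from live objects is one possible extraction strategy; which rules a DPS should actually follow is left to the companion implementation PEP.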
Early on, variants of Setext (Structure Enhanced Text) [10], including Digital Creations' StructuredText [11], were proposed for Python docstring formatting. Hereafter we will collectively call these variants 'STexts'. Although used by some (including in most of the above-listed autodocumentation tools), these markup schemes have failed to become standard because:

- STexts have been incomplete: lacking 'essential' constructs that people want to use in their docstrings, STexts fall short of their users' needs. Note that these 'essential' constructs are not universal; everyone has their own requirements.

- STexts have sometimes been surprising: bits of text are marked up unexpectedly, leading to user frustration.

- SText implementations have been buggy.

- Some STexts have had no formal specification except for the implementation itself. A buggy implementation meant a buggy spec, and vice versa.

- There has been no mechanism to get around the SText markup rules when a markup character is used in a non-markup context.

Recognizing the deficiencies of STexts, some people have proposed using explicit markup of some kind. There have been proposals for using XML, HTML, TeX, POD, and Javadoc at one time or another. Proponents of STexts have vigorously opposed these proposals, and the debates have continued off and on for at least five years.

It has become clear (to this author, at least) that the "all or nothing" approach cannot succeed, since no all-encompassing proposal could possibly be agreed upon by all interested parties. A modular component approach, where each component may have multiple implementations, is the only chance at success. By separating out the issues, we can form consensus more easily (smaller fights ;-), and accept divergence more readily.

Each of the components of a docstring processing system should be developed independently. A 'best of breed' system should be chosen and/or developed and eventually included in Python's standard library.
Pydoc & Other Existing Systems

Pydoc is part of the Python 2.1 standard library. It extracts and displays docstrings from within the Python interactive interpreter, from the shell command line, and from a GUI window into a web browser (HTML). In the case of GUI/HTML, except for some heuristic hyperlinking of identifier names, no formatting of the docstrings is done. They are presented within <pre> tags to avoid unwanted line wrapping. Unfortunately, the result is not pretty.

The functionality proposed in this PEP could be added to or used by pydoc when serving HTML pages. However, the proposed docstring processing system's functionality is much more than pydoc needs (in its current form). Either an independent tool could be developed (which pydoc may or may not use), or pydoc could be expanded to encompass this functionality and *become* the docstring processing system (or one such system). That decision is beyond the scope of this PEP.

Similarly, the authors of other existing docstring processing systems may or may not choose to make them compatible with this framework. However, if this framework is accepted and adopted as the Python standard, compatibility will become an important consideration in these systems' future.

Specification

The docstring processing system framework consists of the following components:

1. Docstring conventions. Documents issues such as:

   - What should be documented where.
   - First line is a one-line synopsis.

   PEP 257, Docstring Conventions [12], documents these issues.

2. Docstring processing system generic implementation details. Documents issues such as:

   - High-level spec: what a DPS does.
   - Command-line interface for executable script.
   - System Python API.
   - Docstring extraction rules.
   - Input parser API.
   - Intermediate internal data structure: output from input parser, input to output formatter.
   - Output formatter API.
   - Output management.

   These issues are applicable to any docstring processing system implementation. PEP 258, DPS Generic Implementation Details [13], documents these issues.

3. Docstring processing system implementation.

4. Input markup specifications: docstring syntax.

5. Input parser implementations.

6. Output formats (HTML, XML, TeX, DocBook, info, etc.).

7. Output formatter implementations.
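The separation of input parser, intermediate data structure, and output formatter is what lets components be developed and replaced independently. The sketch below, in present-day Python 3, shows one way such plug-in interfaces might look. Every name in it (Node, Parser, Formatter, PlainParser, HTMLFormatter, process) is a hypothetical illustration; the actual APIs are the business of PEP 258 and are not defined here. PlainParser deliberately does no markup processing at all, so it stands in for whatever markup parser is eventually chosen.

```python
from __future__ import annotations

import inspect
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from html import escape


@dataclass
class Node:
    """Hypothetical intermediate data structure: a minimal document tree."""
    kind: str                                  # e.g. "document", "paragraph"
    text: str = ""
    children: list[Node] = field(default_factory=list)


class Parser(ABC):
    """Input parser interface: docstring text -> intermediate tree."""

    @abstractmethod
    def parse(self, docstring: str) -> Node: ...


class Formatter(ABC):
    """Output formatter interface: intermediate tree -> output text."""

    @abstractmethod
    def format(self, tree: Node) -> str: ...


class PlainParser(Parser):
    """Treats blank-line-separated blocks as paragraphs; no markup at all."""

    def parse(self, docstring: str) -> Node:
        doc = Node("document")
        for block in re.split(r"\n\s*\n", inspect.cleandoc(docstring)):
            if block.strip():
                # Join wrapped lines into a single paragraph string.
                doc.children.append(Node("paragraph", " ".join(block.split())))
        return doc


class HTMLFormatter(Formatter):
    """Renders each paragraph node as an HTML <p> element."""

    def format(self, tree: Node) -> str:
        return "\n".join(f"<p>{escape(p.text)}</p>" for p in tree.children)


def process(obj, parser: Parser, formatter: Formatter) -> str:
    """Extract obj's docstring, parse it, and hand the tree to a formatter."""
    docstring = inspect.getdoc(obj) or ""
    return formatter.format(parser.parse(docstring))


# Usage: any Parser can be paired with any Formatter.
def greet(name):
    """Return a greeting.

    The <name> argument is inserted verbatim.
    """
    return f"Hello, {name}"


print(process(greet, PlainParser(), HTMLFormatter()))
```

Because process() depends only on the two abstract interfaces, a real markup parser, or a formatter for TeX or DocBook, could be substituted without touching the rest of the pipeline. That substitutability, rather than this particular class layout, is the point of the framework.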
Components 1, 2, and 3 will be the subject of individual companion PEPs, although they may be merged into this PEP once consensus is reached. If there is only one implementation, PEPs for components 2 and 3 can be combined. Multiple PEPs will be necessary for each of components 4, 5, 6, and 7. An alternative to the PEP mechanism may be used instead, since these are not directly related to the Python language.

The following diagram shows an overview of the framework. Interfaces are indicated by double borders. The ASCII diagram is very wide; please turn off line wrapping to view it:

                                                        +========================+
                                                        | Command-Line Interface |
                                                        +========================+
                                                        |   Executable Script    |
                                                        +------------------------+
                                                                     |
                                                                     | calls
                                                                     v
                                               +===========================================+  returns  +---------+
                                               |             System Python API             |==========>| output  |
                         +--------+            +===========================================+           | objects |
       _      writes     | Python |   reads    |        Docstring Processing System        |           +---------+
      / \ ==============>| module |<===========|                                           |
      \_/                +--------+            |   input    | transformation, |   output   |            +--------+
       |            +-------------+  follows   |  docstring |  integration,   |   object   |   writes   | output |
     --+-- consults |  docstring  |<-----------| extraction |     linking     | management |===========>| files  |
       |  --------->| conventions |            +============+=====+=====+=====+============+            +--------+
      / \           +-------------+            |    parser API    |     |  formatter API   |
     /   \          +-------------+            +===========+======+     +======+===========+            +--------+
    author consults |    markup   | implements |   input   |   intermediate    |   output  | implements | output |
          --------->| syntax spec |<-----------|  parser   |  data structure   | formatter |----------->| format |
                    +-------------+            +-----------+-------------------+-----------+            +--------+

Project Web Site

A SourceForge project has been set up for this work at http://docstring.sf.net.
References and Footnotes

[1] PEP 216, Docstring Format, Zadka
    http://www.python.org/peps/pep-0216.html

[2] http://www.literateprogramming.com/

[3] http://www.lemburg.com/files/python/SoftwareDescriptions.html#doc.py

[4] http://starship.python.net/crew/danilo/pythondoc/

[5] http://happydoc.sf.net/

[6] http://www.btinternet.com/~tratt/comp/python/crystal/index.html

[7] http://www.lfw.org/python/

[8] http://homepage.ntlworld.com/tibsnjoan/docutils/

[9] http://www.python.org/sigs/doc-sig/

[10] http://www.bsdi.com/setext/

[11] http://dev.zope.org/Members/jim/StructuredTextWiki/FrontPage/

[12] PEP 257, Docstring Conventions, Goodger, Van Rossum
     http://www.python.org/peps/pep-0257.html

[13] PEP 258, DPS Generic Implementation Details, Goodger
     http://www.python.org/peps/pep-0258.html

Copyright

This document has been placed in the public domain.

Acknowledgements

This document borrows text from PEP 216, Docstring Format, by Moshe Zadka [1]. It is intended as a reorganization of PEP 216 and its approach.

This document also borrows ideas from the archives of the Python Doc-SIG. Thanks to all members past and present.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: