This file describes differences between PEP 3101 and the C implementation in this directory, and describes the reasoning behind the differences.

PEP 3101 is a well-thought-out, excellent starting point for advanced string formatting, but as one might expect, there are a few gaps in it which were not noticed until implementation, and there are almost certainly gaps in the implementation which will not be noticed until the code is widely used.

Fortunately, the schedules for both Python 2.6 and Python 3.0 have enough slack in them that, if we work diligently, we can widely distribute a working implementation, not just a theoretical document, well in advance of the code freeze dates. This should allow for a robust discussion about the merits or drawbacks of some of the fine points of the PEP and the implementation by people who are actually **using** the code.

This nice schedule has made at least one of the implementers bold enough to consider the first cut of the implementation "experimental" in the sense that, since there is time to correct any problems, the implementation can diverge from the PEP (in well-documented ways!) both to address perceived flaws in the PEP and to add minor enhancements. The code is being structured so that it should be easy to modify its operation later to conform to consensus opinion.

GOALS:

Replace %

The primary goal of the advanced string formatting is to replace the % operator, though not in a coercive fashion. The goal is to be good enough that nobody wants to use the % operator.

Modular design for subfunction reuse

The PEP explicitly disclaims any attempt to replace string.Template, concentrating exclusively on the % operator.
While this narrow focus is very useful in removing things like compiling/caching and arbitrary expressions from the discussion about the PEP, if the PEP is successful, there is a good chance that its syntax will become the "de facto" syntax for Python string templates. The design of the implementation therefore adds the goal of exposing the lower-level field formatting functionality for subsequent reuse in compatible templating systems.

Efficiency

It is not claimed that the initial implementation is particularly efficient, but it is desirable to tweak the specification in such a fashion that an efficient implementation IS possible. Since the goal is to replace the % operator, it is particularly important that the formatting of small strings is not prohibitively expensive. (The primary divergence between the PEP and the implementation due to this goal is that the implementation, by default, does not perform any dictionary lookups other than those explicitly requested by the format string.)

Security

Security is a stated goal of the PEP, with an apparent goal of being able to accept a string from J. Random User and format it without potential adverse consequences. This may or may not be an achievable goal (this author is by no means a security expert, so cannot know). The PEP certainly has some features that should help with this, such as the restricted number of operators, and the implementation has some additional features, such as not allowing leading underscores on attributes by default. These may, however, be attempts to solve an intractable problem, similar to the original restricted Python execution mode.

In any case, security is a goal, and anything reasonable we can do to support it should be done. Unreasonable things to support security include things which would be very costly in terms of execution time, and things which rely on the by now thoroughly discredited "security through obscurity" approach.

Older Python Versions

Some of the implementers have very strong desires to use this formatting on older Python versions, and Guido has mentioned that any 3.0 features which do not break backward compatibility are potential candidates for inclusion in 2.6. This could almost certainly include additional string and unicode methods.

No global state

The PEP states "The string formatting system has two error handling modes, which are controlled by the value of a class variable." As has been discussed on the developers' list, this might be problematic, especially in large systems where components are being aggregated from multiple sources. One component might deliberately throw and catch exceptions in the string processing, and disabling this on a global basis might cause this component to stop working properly.

If the ability to control this on a global basis is truly desirable, it is easy enough to add later. If it turns out not to be desirable, however, removing the capability after the fact could break user code which has grown to rely on the feature.

FORMATTING METADATA

The basic desired operation of the PEP is to be able to write:

    'some format control string'.format(param1, param2, keyword1=whatever, ...)

Unfortunately, there needs to be some mechanism to handle out-of-band data for some formatting and error handling options. This could be really costly if multiple options are looked up in the **keywords on every single call, even for short strings, so some tweaks in the initial implementation are designed to reduce the overhead of looking up metadata. Two techniques are used:

1) Lazy evaluation where possible. For example, the code does not need to look up error-handling options until an error occurs.

2) Metadata embedded in the string where appropriate. This saves a dictionary lookup on every call.
However, this is only appropriate when (a) the metadata arguably relates to the actual control string and not the function where it is being used; and (b) there are no security implications.

DIFFERENCES BETWEEN PEP AND INITIAL IMPLEMENTATION:

Support for old Python versions

The original PEP is Python 3000 only, which implies a lack of regular string support (unicode only). To make the code compatible with 2.6, it has been written to support regular strings as well. To make the code compatible with earlier versions, it has been written to be usable as an extension module as well as, or instead of, a string method:

    from pep3101 import format
    format('control string', parameter1, ...)

Support for centering alignment

In addition to left, right, and sign alignment ('<', '>', and '=', respectively), support has been added for center alignment, using '^'.

format_item function

A large portion of the code in the new advanced formatter is the code which formats a single field according to the given format specifier. (Thanks, Eric!) This code is useful on its own, especially for template systems or other custom formatting solutions. The initial implementation will have a format_item function which takes a format specifier and a single object and returns a formatted result for that object and specifier.

Comments

The PEP does not have a mechanism for comments embedded in the format strings. The usefulness of comments inside format strings may be debatable, but the implementation is simple and easy to understand:

    {#This is a comment}

Actually, one of the best uses for comments is not as comments, per se, but as delimiters that make it possible to break up long source lines in the format string (whitespace, including newlines, is allowed inside comments).

Errors and exceptions

The PEP defines a global flag for "strict" or "lenient" mode.
The implementation eschews the use of a global flag (see the goals section, above, for more information), and splits the various error features discussed by the PEP into different options. It also adds an option for disallowing identifiers with leading underscores.

The first error option is controlled by the optional _allow_leading_underscores keyword argument. If this is present and evaluates to a true (non-zero) value, then leading underscores are allowed on identifiers and attributes in the format string. The implementation will lazily look for this argument the first time it encounters a leading underscore.

The next error option is controlled by metadata embedded in the string. If "{!useall}" appears in the string, then a check is made that all arguments are converted. The decision to embed this metadata in the string can certainly be changed later; the reasons for doing it this way in the initial implementation are as follows:

1) In the original % operator, the exception reporting that an extra argument is present is orthogonal to the exception reporting that not enough arguments are present. Both of these errors are easy to commit, because it is hard to count the number of arguments and the number of % specifiers in your string and make sure they match. In theory, the new string formatting should make it easier to get the arguments right, because all arguments in the format string are numbered or even named. With the new string formatting, the corresponding error is that a _specific_ argument is missing, not just "you didn't supply enough."

2) It is arguably not Pythonic to check that all arguments to a function are actually used by the execution of the function, and format() is, after all, just another function. So it seems that the default should be to not check that all the arguments are used. In fact, the reasons for not using all the arguments here are similar to those for any other function.
For example, for customization, the format method of a string might be called with a superset of all the information which might be useful to view.

3) Assuming that the normal case is to not check all arguments, it is computationally much cheaper (especially for small strings) to notice the {! and process the metadata in the strings that want it than it is to look for a keyword argument for every string.

The final error option concerns the ability to handle exceptions by catching them and embedding the exception information in the resultant output string, rather than by passing them up to the caller.

The original PEP distinguishes between references to missing or invalid arguments, and exceptions "raised by the underlying formatter." This is a difficult distinction. An attribute lookup can cause any arbitrary Python machinery to be invoked, so an exception could occur deep in the bowels of some nested function. "Lenient" handling according to the PEP would report this as a simple "TypeError" in the output string, rather than pass the exception through to the calling function, which might be counterproductive in debugging the problem. Conversely, a simple editing error in the specifier portion of a string which produces an invalid specifier would cause an exception "raised by the underlying formatter" and would always be passed back to the calling function, rather than displayed to the user, even in "lenient" mode.

The error handling proposed by one of the implementers (but not yet quite implemented) is as follows:

1) Hard-to-recover-from errors (memory allocation errors, or errors where it would be hard to know how to display useful information in the string) will always raise exceptions up to the caller.

2) Other error conditions are controlled by the _exception_display keyword argument.
The value of this argument should be one of:

    0 - always raise exceptions up to the caller
    1 - dump simple exception information in the string where the field would have been displayed
    2 - dump more comprehensive exception information in the string at exactly the location where the error was noticed (e.g. display the portion of the format field preceding the error, and also display traceback information, if any)

Getattr and getindex rely on underlying object exceptions

For attribute and index lookup, the PEP specifies that digits will be treated as numeric values, and non-digits should be valid Python identifiers. The implementation does not rigorously enforce this, instead deferring to the object's getattr or getindex to throw an exception for an invalid lookup. The only exception is leading underscores, which are disallowed by default.

User-defined Python format function

The PEP specifies that an additional string method, cformat, can be used to call the same formatting machinery, but with a "hook" function that can intercept formatting on a per-field basis.

The implementation does not have an additional cformat function/method. Instead, user format hooks are accomplished as follows:

1) A format hook function, with call signature and semantics as described in the PEP, may be passed to format() as the keyword argument _hook. This argument will be lazily evaluated the first time it is needed.

2) If "{!hook}" appears in the string, then the hook function will be called on every single format field.

3) If the last character (the type specifier) in a format field is "h" (for hook), then the hook function will be called for that field, even if "{!hook}" has not been specified.

User-specified dictionary

The call machinery to deal with keyword arguments is quite expensive, especially for large numbers of arguments. For this reason, the implementation supports the ability to pass in a dictionary as the _dict argument.
The _dict argument will be lazily retrieved the first time the template requests a named parameter which was not passed in as a keyword argument.

Name mapping

To support the user-specified dictionary, a name mapper will first look up names in the passed keyword arguments, then in the passed _dict (if any).

User-specified tuple of dictionaries

Since we need a name mapper to look up items in the keywords dictionary, then in the passed-in dictionary, it is only a small feature creep to allow _dict itself to be a tuple of dictionaries. This is particularly useful for passing both locals() and globals() in to the format function.

Automatic locals/globals lookup

This is likely to be a contentious feature, but it seems quite useful, so in it goes for the initial implementation. For security reasons, this happens only if format() is called with no parameters. Since the whole purpose of format() is to apply parameters to a string, a call to format() without any parameters would otherwise be a silly thing to do. We can turn this degenerate case into something useful by using the caller's locals and globals. An example from Ian Bicking:

    assert x < 3, "x has the value of {x} (should be < 3)".format()

The argument against doing this is EIBTI ("explicit is better than implicit"). If it is truly believed that format() should not have automatic locals()/globals() lookup, then for Python 3000 (where many features of the language are being perfected), this feature should be reevaluated for eval() as well, because the arguments for or against automatic locals()/globals() lookups for eval() and ''.format() appear to be identical.

Syntax modes

The PEP correctly notes that the mechanism used to delineate markup vs. text is likely to be one of the most controversial features, and gives reasons why the chosen mechanism is better than others. The chosen mechanism is quite readable and reasonable, but different problem domains might have differing requirements.
For example, C code generated using the current mechanism could get quite ugly with a large number of "{" and "}" characters.

The initial implementation supports the notion of different syntax modes. This is bad from the "more than one way to do it" perspective, but is not quite so bad if the template itself has to indicate when it is not using the default mechanism. To give reviewers an idea of how this could work, the implementation supports 4 different modes:

    "{!syntax0}" -- the mode as described in the PEP
    "{!syntax1}" -- same as mode 0, except close-braces do not need to be doubled
    "{!syntax2}" -- uses "${" for escape to markup, "$${" for literal "${"
    "{!syntax3}" -- like syntax0, uses "{" for escape to markup, except that a literal "{" is denoted by "{ " or "{\n" (where the space is removed but the newline isn't)

Syntax for metadata in strings

There have been several examples in this document of metadata embedded inside strings, for "hook", "useall", and "syntax". The basic metadata syntax is "{!<name>}". To allow more readable templates, if the closing "}" of a metadata field is immediately followed by "\n" or "\r\n", that line ending will not appear in the formatted output.
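
The newline rule for metadata fields can be illustrated with a small pure-Python sketch. This is not the C implementation: the helper name split_metadata and the regular expression are hypothetical, and the sketch assumes that metadata names consist only of letters, digits, and underscores and that metadata may appear anywhere in the template. It collects the metadata names and removes each field together with a line ending that immediately follows it.

```python
import re

# Hypothetical pattern: "{!name}" followed by at most one "\n" or "\r\n".
# Assumes names are made of letters, digits, and underscores.
_METADATA = re.compile(r"\{!(\w+)\}(?:\r\n|\n)?")


def split_metadata(template):
    """Return (names, text) for a template string.

    names is the list of metadata names in order of appearance;
    text is the template with each metadata field removed, along
    with a line ending that directly follows its closing brace.
    """
    names = _METADATA.findall(template)
    text = _METADATA.sub("", template)
    return names, text


names, text = split_metadata("{!useall}\n{!syntax1}\r\nHello, {0}!\n")
print(names)       # ['useall', 'syntax1']
print(repr(text))  # 'Hello, {0}!\n'
```

Only a line ending that immediately follows the closing brace is dropped. Other whitespace, and the newline at the end of the ordinary text, is left alone, which is what lets a template put each metadata field on its own line without adding blank lines to the output.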