This file describes differences between PEP 3101 and the C implementation
in this directory, and describes the reasoning behind the differences.

PEP 3101 is a well-thought-out, excellent starting point for advanced string
formatting, but as one might expect, there are a few gaps in it which were
not noticed until implementation, and there are almost certainly gaps in
the implementation which will not be noticed until the code is widely used.
Fortunately, the schedules for both Python 2.6 and Python 3.0 have enough
slack in them that if we work diligently, we can widely distribute a working
implementation, not just a theoretical document, well in advance of the code
freeze dates.  This should allow for a robust discussion about the merits or
drawbacks of some of the fine points of the PEP and the implementation by
people who are actually **using** the code.

This nice schedule has made at least one of the implementers bold enough
to consider the first cut of the implementation "experimental" in the sense
that, since there is time to correct any problems, the implementation can
diverge from the PEP (in well-documented ways!) both for perceived flaws in
the PEP, and also to add minor enhancements.  The code is being structured
so that it should be easy to subsequently modify the operation to conform
to consensus opinion.


GOALS:

    Replace %

The primary goal of the advanced string formatting is to replace the %
operator.  Not in a coercive fashion.  The goal is to be good enough
that nobody wants to use the % operator.


    Modular design for subfunction reuse

The PEP explicitly disclaims any attempt to replace string.Template,
concentrating exclusively on the % operator.  While this narrow focus
is very useful in removing things like compiling/caching and arbitrary
expressions from the discussion about the PEP, if the PEP is successful,
there is a good chance the syntax provided will become the "de facto"
syntax for Python string templates, so the design of the implementation
adds the goal of being able to expose the lower-level field formatting
functionality for subsequent reuse in compatible templating systems.


    Efficiency

It is not claimed that the initial implementation is particularly
efficient, but it is desirable to tweak the specification in such
a fashion that an efficient implementation IS possible.  Since the
goal is to replace the % operator, it is particularly important
that the formatting of small strings is not prohibitively expensive.
(The primary divergence between the PEP and the implementation
due to this goal is that the implementation, by default, does not
perform any sort of dictionary lookups other than those explicitly
requested by the format string.)


    Security

Security is a stated goal of the PEP, with an apparent goal of being
able to accept a string from J. Random User and format it without
potential adverse consequences.  This may or may not be an achievable
goal (this author is by no means a security expert so cannot know);
the PEP certainly has some features that should help with this,
such as the restricted number of operators, and the implementation has
some additional features, such as not allowing leading underscores
on attributes by default, but these may be attempts to solve an
intractable problem, similar to the original restricted Python
execution mode.

In any case, security is a goal, and anything reasonable we can do to
support it should be done.  Unreasonable things to support security
include things which would be very costly in terms of execution time,
and things which rely on the by now very much discredited "security
through obscurity" approach.


    Older Python Versions

Some of the implementers have very strong desires to use this formatting
on older Python versions, and Guido has mentioned that any 3.0 features
which do not break backward compatibility are potential candidates for
inclusion in 2.6.  This could almost certainly include additional string
and unicode methods.


    No global state

The PEP states "The string formatting system has two error handling modes,
which are controlled by the value of a class variable."  As has been
discussed on the developers' list, this might be problematic, especially in
large systems where components are being aggregated from multiple sources.
One component might deliberately throw and catch exceptions in the string
processing, and disabling this on a global basis might cause this component
to stop working properly.  If the ability to control this on a global
basis is truly desirable, it is easy enough to add in later, but if it is
not desirable, then deciding that after the fact and removing the capability
from the method could break user code which has grown to rely on the feature.


FORMATTING METADATA

The basic desired operation of the PEP is to be able to write:

 'some format control string'.format(param1, param2, keyword1=whatever, ...)

Unfortunately, there needs to be some mechanism to handle out of band
data for some formatting and error handling options.  This could
be really costly, if multiple options are looked up in the **keywords
on every single call on even short strings, so some tweaks on the
initial implementation are designed to reduce the overhead of looking
up metadata.  Two techniques are used:

    1) Lazy evaluation where possible.  For example, the code does not
       need to look up error-handling options until an error occurs.

    2) Metadata embedded in the string where appropriate.  This
       saves a dictionary lookup on every call.  However this
       is only appropriate when (a) the metadata arguably relates
       to the actual control string and not the function where it
       is being used; and (b) there are no security implications.
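
As a rough illustration of the first technique, the sketch below defers an
option lookup until the option is actually needed.  The LazyOptions class
and its lookups counter are hypothetical names invented for this example;
they are not part of the implementation.

```python
class LazyOptions:
    """Hypothetical helper: fetch an option from the keywords on first use only."""

    def __init__(self, keywords):
        self._keywords = keywords
        self._cache = {}
        self.lookups = 0  # lookups made in the keyword dictionary

    def get(self, name, default=None):
        if name not in self._cache:
            self.lookups += 1
            self._cache[name] = self._keywords.get(name, default)
        return self._cache[name]


options = LazyOptions({"_exception_display": 1})
# A call that never hits an error never asks for the option at all.
untouched = options.lookups
# On the first error the option is fetched; later errors reuse the cached value.
mode = options.get("_exception_display", 0)
mode_again = options.get("_exception_display", 0)
```

The point of the design is that the common, error-free call pays nothing
for options it never consults.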


DIFFERENCES BETWEEN PEP AND INITIAL IMPLEMENTATION:

    Support for old Python versions

The original PEP is Python 3000 only, which implies a lack of regular
string support (unicode only).  To make the code compatible with 2.6,
it has been written to support regular strings as well, and to make
the code compatible with earlier versions, it has been written to be
usable as an extension module as well as, or instead of, a string method:

                from pep3101 import format
                format('control string', parameter1, ...)


    Support for centering alignment

In addition to left, right, and sign alignment ('<', '>', and '=',
respectively), support has been added for center alignment, using '^'.
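
As a small illustration, the snippet below uses the str.format method of
current Python, which accepts these same four alignment characters.  The
'{0:^10}' field spelling is that of current Python and is only assumed to
match this implementation.

```python
# Left, right, and center alignment in a 10-character field.
left = "{0:<10}".format("hi")
right = "{0:>10}".format("hi")
center = "{0:^10}".format("hi")

# '=' puts the padding between the sign and the digits (numbers only).
signed = "{0:=6}".format(-42)
```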


    format_item function

A large portion of the code in the new advanced formatter is the code
which formats a single field according to the given format specifier.
(Thanks, Eric!)  This code is useful on its own, especially for template
systems or other custom formatting solutions.  The initial implementation
will have a format_item function which takes a format specifier and a
single object and returns a formatted result for that object and specifier.
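
To suggest how such a function might be reused, the sketch below builds a
tiny column formatter on top of it.  Here format_item is a hypothetical
stand-in written with the built-in format() of current Python, and
render_row is an invented example; neither is the implementation's code.

```python
def format_item(spec, obj):
    """Hypothetical stand-in: format one object with one specifier."""
    return format(obj, spec)


def render_row(specs, values):
    """A small custom formatter: one specifier per column, joined by '|'."""
    return "|".join(format_item(s, v) for s, v in zip(specs, values))


row = render_row(["<6", ">5.1f", "^5"], ["abc", 3.14159, "x"])
```

A template system could call format_item in the same way for each field it
parses, without going through the full format-string machinery.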


    comments

The PEP does not have a mechanism for comments embedded in the format
strings.  The usefulness of comments inside format strings may be
debatable, but the feature is easy to implement and easy to understand:

                {#This is a comment}

Actually, one of the best uses for comments is not as comments,
per se, but as delimiters to be able to break up long source
lines in the format string (whitespace including newlines is
allowed inside comments).
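
The line-breaking use can be sketched as follows.  This is only a model of
the behavior: strip_comments is a hypothetical helper, and it assumes a
comment runs from "{#" to the next "}" and cannot itself contain a "}".

```python
import re

# Assumption: a comment is "{#", then anything except "}", then "}".
_COMMENT = re.compile(r"\{#[^}]*\}")


def strip_comments(template):
    """Remove {#...} comments, including any newlines inside them."""
    return _COMMENT.sub("", template)


# The newline sits inside the comment, so it vanishes from the output.
template = ("Dear {0}, {#greeting ends here\n}"
            "your order {1} has shipped.")
cleaned = strip_comments(template)
message = cleaned.format("Ann", 7)
```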

    errors and exceptions

The PEP defines a global flag for "strict" or "lenient" mode.  The
implementation eschews the use of a global flag (see more information
in the goals section, above), and splits out the various error
features discussed by the PEP into different options.  It also adds
an option for disallowing identifiers with leading underscores.

The first error option is controlled by the optional _allow_leading_underscores
keyword argument.  If this is present and evaluates non-zero, then leading
underscores are allowed on identifiers and attributes in the format string.
The implementation will lazily look for this argument the first time it
encounters a leading underscore.
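
A minimal sketch of that lazy check is shown below.  The function name
check_identifier and the choice of ValueError are assumptions made for the
example; the text above does not say which exception is raised.

```python
def check_identifier(name, keywords):
    """Hypothetical check: reject leading underscores unless allowed."""
    if name.startswith("_"):
        # The keyword is consulted only here, the first time it matters.
        if not keywords.get("_allow_leading_underscores", 0):
            raise ValueError("leading underscore not allowed: %r" % name)
    return name


plain = check_identifier("name", {})
private = check_identifier("_name", {"_allow_leading_underscores": 1})
```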

The next error option is controlled by metadata embedded in the string.
If "{!useall}" appears in the string, then a check is made that all
arguments are converted.  The decision to embed this metadata in the
string can certainly be changed later; the reasons for doing it this
way in the initial implementation are as follows:

      1) In the original % operator, the exception reporting that an
         extra argument is present is orthogonal to the exception reporting
         that not enough arguments are present.  Both these errors are
         easy to commit, because it is hard to count the number of arguments
         and the number of % specifiers in your string and make sure they
         match. In theory, the new string formatting should make it easier
         to get the arguments right, because all arguments in the format
         string are numbered or even named, and with the new string
         formatting, the corresponding error is that a _specific_
         argument is missing, not just "you didn't supply enough."

      2) It is arguably not Pythonic to check that all arguments to
         a function are actually used by the execution of the function,
         and format() is, after all, just another function.  So it seems
         that the default should be to not check that all the arguments
         are used.  In fact, there are similar reasons for not using
         all the arguments here as with any other function.  For example,
         for customization, the format method of a string might be called
         with a superset of all the information which might be useful to
         view.

      3) Assuming that the normal case is to not check all arguments,
         it is computationally much cheaper (especially for small
         strings) to notice the {! and process the metadata in the
         strings that want it than it is to look for a keyword argument
         for every string.
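
The opt-in check can be modeled with string.Formatter from the current
standard library, which offers a check_unused_args method to override.
This is a sketch of the idea only, not the C implementation; the names
UseAllFormatter and format_useall, and the use of ValueError, are
assumptions.

```python
import string

MARKER = "{!useall}"


class UseAllFormatter(string.Formatter):
    """Hypothetical formatter that can insist every argument is used."""

    strict = False  # switched on per call when the marker is seen

    def check_unused_args(self, used_args, args, kwargs):
        if not self.strict:
            return  # default: unused arguments are fine
        unused = [i for i in range(len(args)) if i not in used_args]
        unused += [k for k in kwargs if k not in used_args]
        if unused:
            raise ValueError("unused arguments: %r" % (unused,))


def format_useall(template, *args, **kwargs):
    formatter = UseAllFormatter()
    if MARKER in template:  # one substring scan, no keyword lookup
        formatter.strict = True
        template = template.replace(MARKER, "")
    return formatter.vformat(template, args, kwargs)


lenient = format_useall("{0} and {1}", "a", "b", "c")
strict_ok = format_useall("{!useall}{0} and {name}", "a", name="b")
```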

The final error option concerns the ability to handle exceptions by
catching them and embedding the exception information in the resultant
output string rather than by passing them up to the caller.  The original
PEP distinguishes between references to missing or invalid arguments,
and exceptions "raised by the underlying formatter."  This is a difficult
distinction.  An attribute lookup can cause any arbitrary Python machinery
to be invoked, so an exception could occur deep in the bowels of some
nested function.  "Lenient" handling according to the PEP would report
this as a simple "TypeError" in the output string, rather than pass the
exception through to the calling function, which might be counterproductive
in debugging the problem.  Conversely, a simple editing error in the
specifier portion of a string which produces an invalid specifier would
cause an exception "raised by the underlying formatter" and would always
be an exception passed back to the calling function, rather than displayed
to the user, even in "lenient" mode.

The error handling proposed by one of the implementers (but not yet quite
implemented) is as follows:

    1) Hard-to-recover-from errors (memory allocation failures, or errors
       where it would be hard to know how to display useful information in
       the string) will always raise exceptions up to the caller.
    2) Other error conditions are controlled by the _exception_display
       keyword argument.  The value of this argument should either be:
           0 - always raise exceptions up to caller
           1 - dump simple exception information in the string where
               the field would have been displayed
           2 - dump more comprehensive exception information in the
               string at exactly the location where the error was
               noticed (e.g. display the portion of the format field
               preceding the error, and also display traceback information,
               if any).
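
Since this scheme is not yet implemented, the following is only a guess at
its shape.  render_field is a hypothetical per-field wrapper, the "<...>"
output format is invented, and mode 2 is simplified to the exception
message (no partial field text or traceback).

```python
def render_field(compute, exception_display=0):
    """Hypothetical wrapper: compute() returns the text for one field."""
    try:
        return compute()
    except MemoryError:
        raise  # hard-to-recover-from errors always reach the caller
    except Exception as exc:
        if exception_display == 0:
            raise  # mode 0: always raise up to the caller
        if exception_display == 1:
            return "<%s>" % type(exc).__name__  # mode 1: brief note
        # mode 2 (simplified here): exception name plus its message
        return "<%s: %s>" % (type(exc).__name__, exc)


def bad_field():
    raise ValueError("bad spec")


brief = render_field(bad_field, exception_display=1)
detailed = render_field(bad_field, exception_display=2)
```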


    Getattr and getindex rely on underlying object exceptions

For attribute and index lookup, the PEP specifies that digits will be
treated as numeric values, and non-digits should be valid Python
identifiers.  The implementation does not rigorously enforce this,
instead deferring to the object's getattr or getindex to throw an
exception for an invalid lookup.  The only time this is not true
is for leading underscores, which are disallowed by default.


    User-defined Python format function

The PEP specifies that an additional string method, cformat, can be
used to call the same formatting machinery, but with a "hook" function
that can intercept formatting on a per-field basis.

The implementation does not have an additional cformat function/method.
Instead, user format hooks are accomplished as follows:

        1) A format hook function, with call signature and semantics
           as described in the PEP, may be passed to format() as the
           keyword argument _hook.  This argument will be lazily evaluated
           the first time it is needed.

        2) If "{!hook}" appears in the string, then the hook function
           will be called on every single format field.

        3) If the last character (the type specifier) in a format field
           is "h" (for hook) then the hook function will be called for
           that field, even if "{!hook}" has not been specified.
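
The three rules above can be sketched with string.Formatter from the
current standard library.  Several things here are assumptions made for
illustration: the HookFormatter class, the hook_all flag standing in for
"{!hook}", and a hook signature of hook(value, spec) that returns a string
or None to request default formatting.

```python
import string


class HookFormatter(string.Formatter):
    """Hypothetical: route selected fields through a user hook."""

    def __init__(self, hook, hook_all=False):
        super().__init__()
        self.hook = hook
        self.hook_all = hook_all  # stands in for "{!hook}" in the template

    def format_field(self, value, format_spec):
        wants_hook = format_spec.endswith("h")
        if wants_hook:
            format_spec = format_spec[:-1]  # drop the 'h' type specifier
        if wants_hook or self.hook_all:
            result = self.hook(value, format_spec)
            if result is not None:
                return result
        # No hook requested, or the hook declined: default formatting.
        return super().format_field(value, format_spec)


def shout(value, spec):
    """Example hook: upper-case strings, decline everything else."""
    return value.upper() if isinstance(value, str) else None


per_field = HookFormatter(shout).format("{0:h} {1}", "hi", "there")
every_field = HookFormatter(shout, hook_all=True).format("{0} {1}", "a", 1)
```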


    User-specified dictionary

The call machinery to deal with keyword arguments is quite expensive,
especially for large numbers of arguments.  For this reason, the
implementation supports the ability to pass in a dictionary as the
_dict argument.  The _dict argument will be lazily retrieved the first
time the template requests a named parameter which was not passed
in as a keyword argument.


    Name mapping

To support the user-specified dictionary, a name mapper will first
look up names in the passed keyword arguments, then in the passed
_dict (if any).


    User-specified tuple of dictionaries

Since we need a name mapper to look up items in the keywords dictionary,
then in the passed-in dictionary, it is only a small feature creep to
allow _dict itself to be a tuple of dictionaries.  This is particularly
useful for passing both locals() and globals() in to the format function.
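
The lookup order described in the last three sections might look like the
sketch below.  lookup_name is a hypothetical helper and raising KeyError
for an unknown name is an assumption; only the order (keywords first, then
each dictionary in _dict) comes from the text above.

```python
def lookup_name(name, keywords):
    """Hypothetical name mapper: keywords first, then _dict."""
    if name in keywords:
        return keywords[name]
    # _dict is fetched only after the keyword lookup has missed.
    extra = keywords.get("_dict", ())
    if isinstance(extra, dict):
        extra = (extra,)  # treat a single dictionary as a 1-tuple
    for mapping in extra:
        if name in mapping:
            return mapping[name]
    raise KeyError(name)


keywords = {"x": 1, "_dict": ({"x": 2, "y": 3}, {"y": 4, "z": 5})}
from_keywords = lookup_name("x", keywords)   # keyword wins over _dict
from_first = lookup_name("y", keywords)      # first dictionary wins
from_second = lookup_name("z", keywords)
```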

    Automatic locals/globals lookup

This is likely to be a contentious feature, but it seems quite useful,
so in it goes for the initial implementation.  For security reasons,
this happens only if format() is called with no parameters.  Since
the whole purpose of format() is to apply parameters to a string,
a call to format() without any parameters would otherwise be a
silly thing to do.  We can turn this degenerate case into something
useful by using the caller's locals and globals.  An example from
Ian Bicking:

            assert x < 3, "x has the value of {x} (should be < 3)".format()

The argument against doing this is EIBTI ("explicit is better than implicit"),
but if it is truly believed that format() should not have automatic
locals()/globals() lookup, then
for Python 3000 (where many features of the language are being perfected),
this feature should be reevaluated for eval() as well, because it seems
that the arguments for or against automatic locals()/globals() lookups for
eval and ''.format() are identical.
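
In pure Python the fallback could be approximated as below.  auto_format
is a hypothetical wrapper, and it relies on sys._getframe, which is
specific to CPython; the C implementation would find the caller's
namespaces by its own means.

```python
import sys


def auto_format(template, *args, **kwargs):
    """Hypothetical: use the caller's namespace when no parameters are given."""
    if not args and not kwargs:
        frame = sys._getframe(1)           # the caller's frame
        namespace = dict(frame.f_globals)
        namespace.update(frame.f_locals)   # locals shadow globals
        return template.format_map(namespace)
    return template.format(*args, **kwargs)


def demo():
    x = 5
    return auto_format("x has the value of {x} (should be < 3)")


message = demo()
```

Because the fallback is taken only when no parameters at all are passed, a
call that supplies its own arguments never sees the caller's variables.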


    Syntax modes

The PEP correctly notes that the mechanism used to delineate markup
vs. text is likely to be one of the most controversial features,
and gives reasons why the chosen mechanism is better than others.

The chosen mechanism is quite readable and reasonable, but different
problem domains might have differing requirements.  For example,
C code generated using the current mechanism could get quite ugly
with a large number of "{" and "}" characters.

The initial implementation supports the notion of different syntax
modes.  This is bad from the "more than one way to do it" perspective,
but is not quite so bad if the template itself has to indicate if it
is not using the default mechanism.  To give reviewers an idea of
how this could work, the implementation supports 4 different modes:

        "{!syntax0}"   -- the mode as described in the PEP
        "{!syntax1}"   -- same as mode 0, except close-braces
                          do not need to be doubled
        "{!syntax2}"   -- Uses "${" for escape to markup, "$${" for
                          literal "${"
        "{!syntax3}"   -- Like syntax0 "{" for escape to markup,
                          except literal "{" is denoted by "{ "
                          or "{\n" (where the space is removed but
                          the newline isn't).
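
To give a feel for syntax2, the sketch below rewrites a syntax2 template
into the default brace syntax so that current Python's str.format can
render it.  syntax2_to_syntax0 is a hypothetical helper, and it assumes
fields contain no nested braces; the implementation presumably parses each
mode directly rather than translating.

```python
import re

# "$${" is a literal "${"; "${...}" is a field; any other brace is text.
_TOKEN = re.compile(r"\$\$\{|\$\{([^{}]*)\}|[{}]")


def syntax2_to_syntax0(template):
    """Translate "${field}" markup into "{field}", doubling literal braces."""
    def convert(match):
        text = match.group(0)
        if text == "$${":
            return "${{"                       # literal "${"
        if text.startswith("${"):
            return "{" + match.group(1) + "}"  # markup field
        return text * 2                        # lone brace: double it
    return _TOKEN.sub(convert, template)


c_template = "int f() { return ${0}; }"
translated = syntax2_to_syntax0(c_template)
rendered = translated.format(42)
```

The C-generation case mentioned above is where this mode helps: the braces
of the generated code need no escaping in the template.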


    Syntax for metadata in strings

There have been several examples in this document of metadata
embedded inside strings, for "hook", "useall", and "syntax".

The basic metadata syntax is "{!<keyword>}".  To allow more readable
templates, if the "}" of a metadata field is immediately followed by
"\n" or "\r\n", that line ending will not appear in the formatted
output.
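
A sketch of that rule: split_metadata below is a hypothetical helper that
collects the metadata keywords and removes each marker together with one
immediately following line ending.  It assumes keywords consist of letters,
digits, and underscores.

```python
import re

# "{!keyword}" optionally followed by one line ending, which is swallowed.
_METADATA = re.compile(r"\{!(\w+)\}(?:\r\n|\n)?")


def split_metadata(template):
    """Return (list of metadata keywords, template with markers removed)."""
    keywords = _METADATA.findall(template)
    return keywords, _METADATA.sub("", template)


keywords, body = split_metadata("{!useall}\n{!syntax1}\r\nHello {0}")
```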