Examples of how to use the PEP3101 sandbox implementation.

This does not show the nuances of using different specifiers. We should add
another file for those. For now, you can see the unittests in
../test_simpleformat.py for comprehensive yet terse examples.

>>> from pep3101 import format as f

A null string is OK.

>>> print f('')

A string with no parameters is OK.

>>> print f('Hi there')
Hi there

A string with parameters is OK, even if it doesn't use them.

>>> print f('Hi there', 1, name='Joe')
Hi there

Unless the string declares that it needs to use all of them.

>>> print f('{!useall}Hi there', 1, name='Joe')
Traceback (most recent call last):
ValueError: Not all arguments consumed at end of format_string

Positional parameters are accessed by number, and the escaping mechanism
is {}. For literal { or }, use {{ or }}.

>>> print f('{{{0}}}: My name is {0}. Are you {1}?', 'Fred', 'George')
{Fred}: My name is Fred. Are you George?

Keyword parameters can also be used.

>>> print f('{{{actor2}}}: My name is {actor2}. Are you {actor1}?',
...         actor2='Fred', actor1='George')
{Fred}: My name is Fred. Are you George?

If no parameters are given, the caller's locals() and globals() can be
used. We may change this if it isn't a big enough convenience win,
because EIBTI.

>>> actor1, actor2 = 'George', 'Fred'
>>> print f('{{{actor2}}}: My name is {actor2}. Are you {actor1}?')
{Fred}: My name is Fred. Are you George?

We also have the capability to explicitly specify the dictionary.

>>> print f('{{{actor2}}}: My name is {actor2}. Are you {actor1}?',
...         _dict=locals())
{Fred}: My name is Fred. Are you George?

If you have a real need, you could even specify multiple dictionaries.

>>> print f('{{{actor2}}}: My name is {actor2}. Are you {actor1}?',
...         _dict=(dict(actor1=actor1), dict(whatever=3), dict(), dict(actor2=actor2)))
{Fred}: My name is Fred. Are you George?

An attempt to use a non-existent argument will cause an exception.

>>> print f("There is no {4} arg", 42, 24)
Traceback (most recent call last):
ValueError: Not enough positional arguments at format_string[13]

>>> print f("There is no {foo} arg", 42, 24)
Traceback (most recent call last):
ValueError: Keyword argument not found at format_string[13]

For multi-line strings, the error message tries to help you out a bit more
by figuring out the line and column.

>>> print f("Test\nTest\nTest\nThere is no {4} arg\nTest", 42, 24)
Traceback (most recent call last):
ValueError: Not enough positional arguments in format_string at line 4, column 14

Arbitrarily complex attributes and indices can be accessed.

>>> class Foo(object):
...     pass
...
>>> x, y, z = Foo(), Foo(), Foo()
>>> x.a = [3, 4, 5, 42, 7, 2, 9, 6]
>>> y.b = [1, x, 5]
>>> z.c = [10, 11, 12, 13, 14, y, 16, 17, 1, 9]
>>> print f("{z.c[5].b[1].a[3]}")
42

As any Python programmer knows, access to deeply embedded attributes can
create lengthy source lines. The best way to keep format string source
lines to a reasonable length (while still allowing arbitrary-length output
lines) is to embed non-printing comments inside the format string.

>>> print f("""The product of {z.c[5].b[1].a[7]} and {z.c[5].b[1].a[4]} {#
... This is a comment. It starts with {# and ends with the next }.
... As you can see, I have nested { and }, which only works if each
... nested { has a corresponding } somewhere after it in the comment.
... }is {z.c[5].b[1].a[3]}.""")
The product of 6 and 7 is 42.

Attributes and indices can even be retrieved from other variables.
NB: Unless somebody has a really good use case, this capability is probably
going away. It was added to try to satisfy Ian's requests that dictionary
lookup capabilities of the % operator still be supported, but that was
before the _dict parameter was added, so it's probably not that necessary
any more.

>>> print f("{y.b[{z.c[8]}].a[3]}")
42

Indices don't have to be numbers.

>>> m = dict(foo='Sam')
>>> print f("{0[foo]} went to the fair.", m)
Sam went to the fair.

Actually, the current rule (which is subject to debate) is that indices and
attributes can contain anything which won't confuse the basic format string
lexer. For example, they can't contain "{", "}", "[", "]", or ".", but
pretty much anything else is fair game. (Note, however, that if we beef up
the lexer for any reason, the list of disallowed characters could grow. For
example, whitespace could easily be added to this list.)

>>> m['-a?#$% 03'] = 4
>>> setattr(x, '27', 2)
>>> print f("{m[-a?#$% 03]}{x.27}")
42

The only processing performed is that an index (but not an attribute) with
a leading digit is converted to an integer.

>>> m[0] = 4
>>> setattr(x, '0', 2)
>>> print f("{m[0]}{x.0}")
42

The reason this behavior might be reasonable is that, in general, objects
will protect themselves from "unauthorized" access by the simple fact that
an exception will be thrown when the bad attribute or index is tried. In
other words, we rely on the underlying object to throw an exception.

>>> print f("{m[1]}")
Traceback (most recent call last):
KeyError: 1

>>> print f("{x.1}")
Traceback (most recent call last):
AttributeError: 'Foo' object has no attribute '1'

I lied a teensy bit about all that. Objects won't protect themselves from
unauthorized access to "private" attributes, so the parser disallows
leading underscores; it assumes you're trying to pull a fast one if you use
one of these.

>>> x._evil_access = "Don't try this at home"
>>> print f("{x._evil_access}")
Traceback (most recent call last):
ValueError: Leading underscores not allowed in attribute/index strings at format_string[3]

The programmer (but not the format string itself) can override this
security behavior.

>>> _allow_leading_underscores = 1
>>> print f("{x._evil_access}")
Don't try this at home

Any object can provide a custom __format__ method to control its own
formatting. In 3.0, some of the builtin types will probably have this
method added to them and we can reduce the code in unicodeformat.c.

>>> class Custom(object):
...     def __format__(self, specifiers):
...         return specifiers.upper()
>>> custom = Custom()
>>> print f("{0: Almost any specifier text you want}", custom)
ALMOST ANY SPECIFIER TEXT YOU WANT

Since the specifier can be wholly or partially retrieved from a variable,
this can give great flexibility in what is printed out.

>>> print f("The{0: answer is {1}{2}, or} so I hear", custom, 4, 2)
The ANSWER IS 42, OR so I hear

The custom format hook is similar to the custom __format__ method, in that
the format function will call out to external Python code to help with the
formatting. If you use the {!hook} directive in the format string, the hook
function is called for every replacement field.

>>> def hook(obj, specifier):
...     print "Hook received", obj, specifier
...     if obj == 42:
...         return "the answer"
...     if obj == 66:
...         return None  # returning None lets the default specifier handle it
...     return '%s %s' % (obj, specifier)
...
>>> _hook = hook
>>> print f("{!hook}Call out to a {0:function} to print {1}.", "hook", 42)
Hook received hook function
Hook received 42
Call out to a hook function to print the answer.

If the hook function decides it doesn't want to perform the formatting, it
can defer back to the internal formatter by returning None.

>>> print f("{!hook}I think {0} is {1:x}.", 42, 66)
Hook received 42
Hook received 66 x
I think the answer is 42.

Using the hook function in this fashion gives great flexibility in how
things are displayed. However, in many applications, the hook function is
only required for a few fields, so you can explicitly request the hook
function by placing an "h" (for hook) in the field specifier type.

>>> print f("In fact, I'm sure {0:h} is {0}!", 42)
Hook received 42 h
In fact, I'm sure the answer is 42!

The hook function could still decide to defer execution to the internal
formatter, but no good can come of this, since the internal formatter
doesn't know anything about the "h" type.

XXX -- I don't like the error message here -- need to debug and fix.

>>> print f("{0:h} trombones led the big parade.", 66)
Traceback (most recent call last):
ValueError: Invalid conversion character at end of format_string

Format strings may contain comments (which are not inserted into the output
string). Comments are specified in a fairly intuitive way. One of the best
uses for comments is to break up source lines which are too long.

>>> print f("""The product of {z.c[5].b[1].a[7]} and {z.c[5].b[1].a[4]}{#
...
... Don't really need any text here, but it's in a comment so it
... doesn't matter. The point is that we can wrap a line before
... it gets to 80 characters if we wish to do so.
...
... } is {z.c[5].b[1].a[3]}.""")
The product of 6 and 7 is 42.

The implementation currently has the ability to support a few variations on
the characters used to transition from text to markup. It might be unwieldy
(and require inefficient code) to support too many different markup
syntaxes, but the currently supported ones are similar enough to be parsed
by the same C loop quite easily.

It may seem that the choice of syntax comes down to personal taste (and
there is undoubtedly a lot of that), but different problem domains arguably
lend themselves to different syntaxes. One thing that is certain is that,
if different syntaxes are to be supported, the template itself should
announce if it is not using the default syntax, both so that automated
tools can be used to analyze templates, and also because the syntax chosen
is definitely a part of the template coding itself, rather than a part of
the underlying Python objects to be displayed.

The default syntax, as we have seen, uses {} to delineate markup from
normal text.

>>> print f("""{!syntax0}
... In syntax 0, literal {{ and }} characters are inserted by doubling them.""")
In syntax 0, literal { and } characters are inserted by doubling them.

It was pointed out to me that XML, for instance, requires a literal < to be
escaped, but not a literal >. Obviously, for some sorts of text, it will be
useful to see {{ balance }}, but perhaps for other sorts of text it is
useful to not need to escape }. I don't know, but it's easy enough to do.

>>> print f("""{!syntax1}
... Syntax 1 is the same as syntax 0, except } characters are not doubled.""")
Syntax 1 is the same as syntax 0, except } characters are not doubled.

In some text with a lot of braces, but few markup substitutions, it might
be difficult to visually find the markup inside the text.
Syntax 2 handles these situations handily by requiring a more traditional
${} approach to markup.

>>> print f("""{!syntax2}
... Syntax ${0} requires $${} for markup. Use $$$${ for a literal $${.""", 2)
Syntax 2 requires ${} for markup. Use $${ for a literal ${.

This brings us to the last syntax supported by the sandbox implementation.
Syntax 3 would send any non-Python programmer running, because it depends
on significant whitespace. It works great for things like C templates where
most left braces are followed by a newline.

>>> print f("""{!syntax3}
... Syntax {0} requires { } for markup. Use "{ " for a literal "{ ",
... or use { by itself followed by a newline: {
... The trailing space will be "eaten" but the trailing newline will not.""",
... 3)
Syntax 3 requires {} for markup. Use "{ " for a literal "{",
or use { by itself followed by a newline: {
The trailing space will be "eaten" but the trailing newline will not.
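
Finally, a small recap of the hook protocol described earlier: the hook
receives the object and its specifier, and returning None hands the field
back to the default formatter. The pure-Python stand-in below is only a
sketch of that dispatch, not the sandbox's internals; render() is a
hypothetical helper, and the '%'-based fallback merely stands in for the
built-in formatter. It reuses the hook function defined above.

>>> def render(obj, specifier):
...     result = hook(obj, specifier)       # try the hook first
...     if result is None:                  # None means "defer to the default"
...         result = ('%' + specifier) % obj
...     return result
...
>>> print render(42, 'd')
Hook received 42 d
the answer
>>> print render(66, 'x')
Hook received 66 x
42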