Friday, December 31, 2010

Charming Python: Python elegance and warts: Attributes and methods

David Mertz, Ph.D. (mertz@gnosis.cx), Developer, Gnosis Software, Inc.

Summary:  In this series of two articles, David discusses the non-obvious features and misfeatures that have been added to the last several Python versions, with the goal of helping part-time Python programmers uncover the gems while avoiding the pitfalls. This installment adds attributes and methods, descriptors, and properties to the discussion.

The first installment in this series covers sequences and comparisons. This installment builds on those topics.

In most object-oriented languages, methods and attributes are almost, but not quite, the same thing. Both can be attached to a class and/or to an instance. Aside from the details of implementation, there is a key difference: attached to an object, methods are things you can call to initiate actions and calculations; attributes simply have values that can be retrieved (and perhaps modified).

For some languages (the Java™ language, for example), this distinction is pretty much the end of the story. Attributes are attributes, methods are methods. The Java language, by convention, puts a heavy emphasis on encapsulation and data hiding, thus encouraging the use of "setters" and "getters" to access otherwise private attribute data. To the Java way of thinking, using explicit method calls covers in advance the case where you might want to add computation or side-effects to data access or modification. Of course, the result of the Java attitude is verbosity, and the imposition of a sometimes artificial-seeming discipline: you wind up writing foo.getBar() instead of foo.bar, and foo.setBar(value) instead of foo.bar=value.

Ruby is worth mentioning as something of an odd creature in this regard. Ruby actually insists on hiding data to an even greater degree than Java does: all attributes are always "private"; you can never directly access instance data. At the same time, Ruby uses some syntax conventions that make method calls look like attribute access looks in other languages. The first element of this is Ruby's optional parentheses in method calls; the second part is its use of semi-special naming of methods with symbols that are operators in most languages. So in Ruby, foo.bar is just a shorter option for calling foo.bar(); and "setting" foo.bar=value is shorthand for foo.bar=(value). Behind the scenes, everything goes through a method call.

Python is much more flexible than either Java or Ruby, but that fact is as much a criticism as it is praise. If you access foo.bar, or set foo.bar=value in Python, you might be using a simple data value, or you might be calling some semi-hidden code. Moreover, in the latter case, there are at least a half-dozen different ways you might reach that block of code, each one having slightly different behavior from the others, with dizzying subtleties and nuances. This deluge of options harms the regularity of Python and makes it harder to understand for non-experts (and even for experts). I know why things have gotten where they are: new capabilities have been added to Python's object-oriented underpinnings in several steps. But I am not terribly pleased by the chaos.

The old fashioned way

Since the old days (before Python 2.1), Python has had a magic method called .__getattr__() that a class could define to return computed values rather than simple data accesses. Correspondingly, the magic methods .__setattr__() and .__delattr__() could cause code to run when "attributes" were set or deleted. The problem with this old system was that you never really knew whether or not the code would actually be called, since it depended on whether an attribute with the same name as the one accessed existed in obj.__dict__. You could try to create .__setattr__() and .__delattr__() methods that controlled what wound up there, but even that did not prevent direct manipulation of obj.__dict__ by other code. Both changed inheritance trees and passing objects to external functions could often make it non-obvious whether a method would or would not actually run when working with an object. For example:

Listing 1. Will the method run?

                
>>> class Foo(object):
...     def __getattr__(self, name):
...         return "Value of %s" % name
...
>>> foo = Foo()
>>> foo.just_this = "Some value"
>>> foo.just_this
'Some value'
>>> foo.something_else
'Value of something_else'

Accessing foo.just_this skips the method code, while accessing foo.something_else runs it; other than the fact that this shell session is short, nothing makes the difference obvious. In fact, asking the obvious hasattr() question gives you a misleading answer:

Listing 2. Ambiguity using hasattr()

                
>>> hasattr(foo, 'never_mentioned')
True
>>> 'never_mentioned' in foo.__dict__ # this works
False
>>> 'just_this' in foo.__dict__
True

The slots hack

With Python 2.2, we gained a new mechanism for creating "restricted" classes. Exactly what the new-style class .__slots__ attribute is supposed to be used for is nowhere made clear. For the most part, the Python documentation advises using .__slots__ only for performance optimization in classes that might have a very large number of instances -- but specifically not as a way to declare attributes. Nonetheless, the latter is what slots do: they create a class without a .__dict__ attribute that only has the attributes explicitly named (methods, however, are still declared as normal within the class body). It is peculiar, but this gives you a way to ensure that method code is called on attribute access:

Listing 3. Ensuring method execution using .__slots__

                
>>> class Foo2(object):
...     __slots__ = ('just_this',)
...     def __getattr__(self, name):
...         return "Value of %s" % name
...
>>> foo2 = Foo2()
>>> foo2.just_this = "I'm slotted"
>>> foo2.just_this
"I'm slotted"
>>> foo2.something_else = "I'm not slotted"
AttributeError: 'Foo2' object has no attribute 'something_else'
>>> foo2.something_else
'Value of something_else'
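
To see the restriction on its own, without .__getattr__() muddying the picture, here is a small sketch. The class name Slotted is made up for illustration, and the code is written for a current Python rather than the 2.2-era interpreter of the listings.

```python
class Slotted(object):
    # A one-element tuple needs the trailing comma; ('just_this') is merely a string
    __slots__ = ("just_this",)

s = Slotted()
s.just_this = "I'm slotted"

# Slotted instances carry no per-instance dictionary
has_dict = hasattr(s, "__dict__")

try:
    s.something_else = "I'm not slotted"
    rejected = False
except AttributeError:
    rejected = True

print(has_dict, rejected)  # False True
```

Python does accept a bare string as a single slot name, but spelling the declaration as a tuple avoids surprises once a second name is added.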

The declaration of .__slots__ guarantees that only those attributes you specify can be accessed directly; everything else will go through the .__getattr__() call. If you have also created a .__setattr__() method, you can make an assignment do something other than raise an AttributeError (but be sure to let the "slotted" value pass through on assignment). For example:

Listing 4. Using .__setattr__ along with .__slots__

                
>>> class Foo3(object):
...     __slots__ = ('x',)
...     def __setattr__(self, name, val):
...         if name in Foo3.__slots__:
...             object.__setattr__(self, name, val)
...     def __getattr__(self, name):
...         return "Value of %s" % name
...
>>> foo3 = Foo3()
>>> foo3.x
'Value of x'
>>> foo3.x = 'x'
>>> foo3.x
'x'
>>> foo3.y
'Value of y'
>>> foo3.y = 'y' # Doesn't do anything, but doesn't raise exception
>>> foo3.y
'Value of y'

The .__getattribute__() method

In Python 2.2 and later, you have the option of using the method .__getattribute__() instead of the similarly and confusingly named, old-style .__getattr__(). Well, you have that option if you use new-style classes (which you generally should anyway). The .__getattribute__() method is more powerful than its sibling in that it intercepts all attribute access, whether or not an attribute is defined in obj.__dict__ or obj.__slots__. A drawback of using .__getattribute__() is that all access goes through the method. If you use this method, a bit of special programming is needed if you want to return (or manipulate) the "real" value of the attribute: usually you do this by calling .__getattribute__() on the superclass (usually object). For example:

Listing 5. Returning a "real" .__getattribute__ value         

>>> class Foo4(object):
...     def __getattribute__(self, name):
...         try:
...             return object.__getattribute__(self, name)
...         except AttributeError:
...             return "Value of %s" % name
...
>>> foo4 = Foo4()
>>> foo4.x = 'x'
>>> foo4.x
'x'
>>> foo4.y
'Value of y'

In all versions of Python, .__setattr__() and .__delattr__() also intercept all the write and delete access to attributes, rather than merely those absent from obj.__dict__.
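
A brief sketch of that asymmetry, using a made-up class Positive: the second assignment to x below is intercepted even though x already lives in the instance dictionary, which is exactly the case where .__getattr__() would have stayed silent.

```python
class Positive(object):
    def __setattr__(self, name, value):
        # Runs on every assignment, including names already in __dict__
        if not isinstance(value, (int, float)) or value <= 0:
            raise ValueError("%s must be a positive number" % name)
        object.__setattr__(self, name, value)

    def __delattr__(self, name):
        # Runs on every del, whether or not the attribute exists
        raise AttributeError("cannot delete %s" % name)

p = Positive()
p.x = 3
p.x = 4          # intercepted again, although x is already stored
try:
    p.x = -1
except ValueError as err:
    print(err)   # x must be a positive number
print(p.x)       # 4
```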

Descriptors

We are moving along nicely in an enumeration of ways to make attributes act like methods. Within these magic methods, you can examine the specific attribute name being accessed, assigned, or deleted. In fact, if you like, you can check names by regular expression or by other computation. In principle, you can make all sorts of runtime decisions about how to handle the use of some given pseudo-attribute. For example, perhaps you do not simply want to compare the attribute name to a string pattern, but actually look up whether the name is an attribute that has been stored in a persistent database.
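
As a small illustration of checking names by regular expression, consider the following sketch. The class Flags and its is_something naming scheme are hypothetical, chosen only to show the idea.

```python
import re

class Flags(object):
    # Names of the form is_<word> are treated as computed pseudo-attributes
    _is_query = re.compile(r"^is_([a-z]+)$")

    def __init__(self, *names):
        self.enabled = set(names)

    def __getattr__(self, name):
        match = self._is_query.match(name)
        if match:
            return match.group(1) in self.enabled
        # Anything else is a genuine miss
        raise AttributeError(name)

flags = Flags("admin")
print(flags.is_admin)           # True
print(flags.is_guest)           # False
print(hasattr(flags, "other"))  # False
```

Raising AttributeError for names that do not match keeps hasattr() honest for everything outside the pattern.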

Much of the time, however, you would just like for a few "attributes" to act in a special manner but let other attributes operate as plain attributes. The plain attributes should neither trigger any special code nor suffer the time penalty of working through method code. In these cases, you can use descriptors for attributes. Or, you can define properties, closely related to descriptors. Behind the scenes, properties and descriptors amount to the same thing, but the syntax of defining them is rather different. And with the difference in definition styles, as you might have guessed, you get advantages and disadvantages.

Let's first look at a descriptor. The idea here is to assign an instance of a special kind of class to an attribute within another class. This special "descriptor" class is a new-style class that contains methods called .__get__(), .__set__(), and .__delete__() (or at least a subset of those). If the descriptor class implements at least the first two, it is called a "data descriptor"; if it implements only the first, it is called a "non-data descriptor."
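
The distinction is not just terminology; it changes lookup order. In the sketch below (all class names are invented), an entry in the instance dictionary hides a non-data descriptor of the same name, but it does not hide a data descriptor.

```python
class NonData(object):
    def __get__(self, obj, type=None):
        return "from non-data descriptor"

class Data(object):
    def __get__(self, obj, type=None):
        return "from data descriptor"
    def __set__(self, obj, value):
        raise AttributeError("read-only")

class Holder(object):
    plain = NonData()
    guarded = Data()

h = Holder()
# Write straight into the instance dictionary, bypassing __set__
h.__dict__["plain"] = "from instance dict"
h.__dict__["guarded"] = "from instance dict"

print(h.plain)    # from instance dict
print(h.guarded)  # from data descriptor
```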

The non-data descriptor is likely to be used to return a callable object. A non-data descriptor is, in a sense, often a fancy name for a method -- but the particular method returned by descriptor access could be determined at runtime. This starts to get into the scary realm of things similar to metaclasses and decorators, which I have written about before in this column (see Resources for links). Of course, a regular method can also decide what code to run based on runtime conditions, so there is nothing fundamentally new about this concept of runtime determination of what a "method" does.
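
Here is one way such a runtime-selected "method" might look. Everything named in it (Voice, Person, shout, whisper) is hypothetical. The sketch leans on the fact that plain functions are themselves non-data descriptors, so calling their .__get__() produces a bound method.

```python
def shout(self):
    return self.name.upper()

def whisper(self):
    return self.name.lower()

class Voice(object):
    """Non-data descriptor: defines only __get__."""
    def __get__(self, obj, type=None):
        if obj is None:
            return self  # accessed on the class, not an instance
        func = shout if obj.loud else whisper
        # Bind the chosen function to the instance
        return func.__get__(obj, type)

class Person(object):
    speak = Voice()
    def __init__(self, name, loud):
        self.name = name
        self.loud = loud

print(Person("Ann", True).speak())   # ANN
print(Person("Bob", False).speak())  # bob
```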

In any case, a data descriptor is more general, so I'll show you one. Such a descriptor could return something callable -- any Python function or method can return anything, after all. But our example just deals with simple values (and side effects). We want to make any use of a few attributes log the action to STDERR:

Listing 6. An example data descriptor

                

>>> import sys
>>> class ErrWriter(object):
...     def __get__(self, obj, type=None):
...         print >> sys.stderr, "get", self, obj, type
...         return self.data
...     def __set__(self, obj, value):
...         print >> sys.stderr, "set", self, obj, value
...         self.data = value
...     def __delete__(self, obj):
...         print >> sys.stderr, "delete", self, obj
...         del self.data
...
>>> class Foo(object):
...     this = ErrWriter()
...     that = ErrWriter()
...     other = 4
...
>>> foo = Foo()
>>> foo.this = 5
set <__main__.ErrWriter object at 0x5cec90> <__main__.Foo object at 0x5cebf0> 5
>>> print foo.this
get <__main__.ErrWriter object at 0x5cec90> <__main__.Foo object at 0x5cebf0> <class '__main__.Foo'>
5
>>> print foo.other
4
>>> foo.other = 6
>>> print foo.other
6

The class Foo defines this and that as descriptors of the ErrWriter class. The attribute other is just a plain class attribute. Actually, there is a caveat here. On first access to foo.other, we read the class attribute; after we assign a value, we actually read the instance attribute instead. The class attribute is still there, just hidden, i.e.:

Listing 7. The class attribute vs. the instance attribute

                
>>> foo.other
6
>>> foo.__class__.other
4

In contrast, the descriptor remains a class-level object, even though you can access it through the instance. This has the usually undesirable effect of making the descriptor something like a singleton. For example:

Listing 8. The descriptor as a singleton

                
>>> foo2 = Foo()
>>> foo2.this
get <__main__.ErrWriter object at 0x5cec90> <__main__.Foo object at 0x5cebf0> <class '__main__.Foo'>
5

To simulate a usual "per instance" behavior, you would need to make use of the obj passed into ErrWriter magic methods. This obj is the instance that has the descriptor. So you might define a non-singleton descriptor like:

Listing 9. Defining a non-singleton descriptor

                
class ErrWriter(object):
    def __init__(self):
        self.inst = {}
    def __get__(self, obj, type=None):
        return self.inst[obj]
    def __set__(self, obj, value):
        self.inst[obj] = value
    def __delete__(self, obj):
        del self.inst[obj]
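
One caveat with a plain dictionary keyed by obj is that the descriptor holds a reference to every instance it has ever seen, so those instances are never collected. A possible variation, sketched here under the assumption that the instances are hashable and weak-referenceable (ordinary classes without .__slots__ are), swaps in weakref.WeakKeyDictionary. The name PerInstance is my own.

```python
import weakref

class PerInstance(object):
    def __init__(self):
        # Entries vanish when the owning instance is garbage-collected
        self.inst = weakref.WeakKeyDictionary()
    def __get__(self, obj, type=None):
        if obj is None:
            return self  # accessed on the class itself
        try:
            return self.inst[obj]
        except KeyError:
            raise AttributeError("value not set for this instance")
    def __set__(self, obj, value):
        self.inst[obj] = value
    def __delete__(self, obj):
        del self.inst[obj]

class Foo(object):
    this = PerInstance()

a, b = Foo(), Foo()
a.this = 5
b.this = 7
print(a.this, b.this)  # 5 7
```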

Properties

Properties work like descriptors, but are generally defined inside a particular class rather than being created as "utility descriptors" that various classes might utilize. As with "regular" descriptors, the idea is to define "getters," "setters," and "deleters". After that, you use the special function property() to turn those methods into a descriptor. For those readers who are paying a bit closer attention: property is not really a function but a type -- don't worry about it.

Oddly, properties bring us full circle to the brief description I gave at the top about how the Ruby programming language works. A property is just a thing that looks like an attribute syntactically (as used), but is defined by defining all the getters, setters, and so on. If you wanted to, you could impose complete "Ruby-discipline" in Python, and never access "real" attributes. More likely, you will want to mix-and-match though. Here's how a property works:

Listing 10. How properties work

                
class FooP(object):
    def getX(self): return self.__x
    def setX(self, value): self.__x = value
    def delX(self): del self.__x
    x = property(getX, setX, delX, "I'm the 'x' property.")

The names of the getter, setter, and deleter are not reserved. Usually you will want to use sensible names like the above. What they actually do can be anything, but often it is reasonable to use double-underscore versions of names for the attributes. These attributes get attached to the instance, just with the usual Python name mangling for "semi-hidden" attributes. Moreover, the methods remain perfectly usable too:

Listing 11. Using methods

                
>>> foop = FooP()
>>> foop.x = 'FooP x'
>>> foop.getX()
'FooP x'
>>> foop._FooP__x
'FooP x'
>>> foop.x
'FooP x'
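
For comparison, Python 2.6 and later also accept a decorator spelling of the same property. The sketch below is an alternative to the listing above, not a replacement for it; note that in this form the separate getX(), setX(), and delX() method names no longer exist, and the docstring is taken from the getter.

```python
class FooP(object):
    @property
    def x(self):
        "I'm the 'x' property."
        return self.__x

    @x.setter
    def x(self, value):
        self.__x = value

    @x.deleter
    def x(self):
        del self.__x

foop = FooP()
foop.x = 'FooP x'
print(foop.x)                     # FooP x
print(foop._FooP__x)              # FooP x
print(isinstance(FooP.x, property))  # True
```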

Let confusion reign

In this installment I have shown far too many ways to make Python instance attributes act like (or be) method calls, but I really do not have any clear advice on how to cut through the complexity. I'd like to be able to tell you simply to choose one of the described techniques and ignore the others as inferior or less general. Unfortunately, each technique has both strengths and weaknesses. Each is pretty reasonable for certain programming contexts, even though the syntax and semantics of each are so radically different from the others.

Moreover, while I have not described it in this article, I have thought (vaguely) of a number of other even more obscure ways that a programmer might use metaclasses, class factories, and decorators to obtain similar effects to the "standard" half-dozen techniques I have outlined. Those ideas would truly probe into some dark corners of Python metaprogramming.

It would be nice if all the things I described were possible, but the variations among them were simply parameterized in some straightforward way rather than using wholly different syntax and organization. The grand goal of Python 3000 is a simplification along these lines; but I have not seen any concrete proposals on how such unification and simplification of attributes-as-methods might work. One idea that occurs to me is that Python might enable decorators for classes (along with the current use in methods and functions), and also provide some standard module of decorators for the most common "magic attribute" behaviors. This is speculation, and I do not know exactly how it might work, but I can just imagine such a thing could hide the complexity from the 95% of Python programmers who really do not wish to worry too much about Python internals and cryptic mojo.
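
Class decorators did become available in Python 2.6 and 3.0, so the speculation can at least be sketched. What follows is purely illustrative: the decorator logged and its helper are hypothetical names of my own, not part of any standard module, and the design (properties that record each access in a list on the class) is just one guess at what a "magic attribute" decorator might do.

```python
def _logging_property(name):
    hidden = "_" + name  # where the real value is kept on the instance
    def getter(self):
        self.log.append(("get", name))
        return getattr(self, hidden)
    def setter(self, value):
        self.log.append(("set", name))
        setattr(self, hidden, value)
    return property(getter, setter)

def logged(*names):
    """Hypothetical class decorator: make the named attributes log their use."""
    def decorate(cls):
        cls.log = []  # shared record of (action, name) pairs
        for name in names:
            setattr(cls, name, _logging_property(name))
        return cls
    return decorate

@logged("this", "that")
class Foo(object):
    other = 4

foo = Foo()
foo.this = 5
foo.this
foo.other  # plain attribute, nothing logged
print(Foo.log)  # [('set', 'this'), ('get', 'this')]
```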

Resources

Learn


