2012-01-16
Understanding isinstance() on Python classes
Suppose that you have:
    class A(object): pass
    class B(A): pass
As previously mentioned, the type of classes is type, which is to say that class objects are instances of type:
    >>> isinstance(A, type)
    True
    >>> isinstance(B, type)
    True
Both A and B are clearly subclasses of object; A is a direct subclass and B is indirectly a subclass through A. In fact every new-style Python class is a subclass of object, since object is the root of the class inheritance tree. However, class type is not the same as class inheritance:
    >>> issubclass(B, A)
    True
    >>> isinstance(B, A)
    False
Although B is a subclass of A, it is not an instance of A; it is a direct instance of type (we can see this with 'type(B)'). Now, given that A and B are instances of type, one might expect that they would not be instances of object, since they merely inherit from it (just as B merely inherits from A and is not an instance of A):
    >>> isinstance(A, object)
    True
Well, how about that. We're wrong (well, I'm wrong; you may already have known the correct answer). Here is why:
    >>> issubclass(type, object)
    True
A and B are instances of type and, like all other classes and types, type is a subclass of object. So A and B are also instances of object (at least in an abstract, Python-level view of things), in the same way that an instance of B would also be an instance of A.
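To make this chain of reasoning concrete, here is a small Python 3 sketch. The helper isinstance_via_mro() is hypothetical and only approximates the real isinstance(), which also honours __class__ and __instancecheck__ hooks; it simply asks whether cls appears in the method resolution order of type(obj), which is the 'instance of a subclass' logic described above.

```python
class A(object):
    pass


class B(A):
    pass


def isinstance_via_mro(obj, cls):
    """Hypothetical approximation of isinstance(): is cls anywhere in
    the MRO of obj's type? (Ignores __class__ and __instancecheck__.)"""
    return cls in type(obj).__mro__


# A and B are direct instances of type ...
assert type(A) is type and type(B) is type
# ... and type inherits from object, so its MRO ends in object.
assert type.__mro__ == (type, object)

# Hence a class object is an instance of object, for the same
# reason that an instance of B is an instance of A.
print(isinstance_via_mro(A, object))   # True
print(isinstance_via_mro(B(), A))      # True
# But B itself is not an instance of A: A is not in type's MRO.
print(isinstance_via_mro(B, A))        # False
```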
I believe that this implies that 'isinstance(X, object)' is always true for anything at all in Python 2. Even old-style classes and their instances have types (types.ClassType and types.InstanceType) that are themselves new-style types and thus subclasses of object. The corollary is that isinstance() can't tell you whether the random object you are dealing with is an old-style class; issubclass() is the (almost) surefire test for that:

    class C: pass
    >>> issubclass(C, object)
    False
    >>> isinstance(C, object)
    True
(This goes away in Python 3, where there are only new-style classes and there is much rejoicing, along with people no longer having to explicitly inherit from object for everything.)
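As a quick Python 3 illustration of that last point (assuming a Python 3 interpreter, since Python 2's old-style classes can't be demonstrated there), a class written without any base class still inherits from object, so the issubclass() test above no longer finds anything:

```python
class C:          # no explicit base class
    pass


# In Python 3 every class implicitly inherits from object ...
assert C.__bases__ == (object,)
assert issubclass(C, object)

# ... and isinstance(x, object) is True for every value at all.
samples = [C, C(), 42, None, type, object, len, lambda: 0]
print(all(isinstance(x, object) for x in samples))  # True
```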
PS: as originally noted by Peter Donis in a comment here, object is also an instance of type because object is itself a class. type is an instance of itself in addition to being a subclass of object. Try not to think about the recursion too much.
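If you do want to think about the recursion, it can at least be checked mechanically. This is a Python 3 sketch; the same relationships hold for new-style classes in Python 2.

```python
# object is a class, so it is an instance of type ...
assert isinstance(object, type)
# ... type is an instance of itself ...
assert type(type) is type
# ... and type is also a subclass (and so an instance) of object.
assert issubclass(type, object)
assert isinstance(type, object)
# object is the root of the inheritance tree: it has no bases.
assert object.__bases__ == ()
print("all checks passed")
```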
(This isinstance() surprise is an easy thing to get wrong, which is why I'm writing it down; I almost made this mistake in another entry I'm working on.)
Sidebar: isinstance() and metaclasses
If A (or B) has a metaclass, it is an instance of the metaclass instead of a direct instance of type. In any sane Python program, 'isinstance(A, type)' will continue to be True because A's metaclass will itself be a subclass of type.
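Here is a minimal Python 3 sketch of this, using the Python 3 'metaclass=' class keyword (Python 2 spelled this as a __metaclass__ class attribute instead). Meta is a hypothetical, do-nothing metaclass made up for illustration.

```python
class Meta(type):
    """A hypothetical, do-nothing metaclass; it subclasses type."""
    pass


class A(metaclass=Meta):
    pass


# A is now a direct instance of Meta, not of type ...
assert type(A) is Meta
assert type(A) is not type
# ... but because Meta subclasses type, isinstance(A, type) is still True.
assert issubclass(Meta, type)
assert isinstance(A, type)
print(type(A).__name__)  # Meta
```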
(I'm not even sure it's possible to create a working metaclass class that doesn't directly or indirectly subclass type (cf), but I'm not going to bet against it.)
This implies that I was dead wrong when I said, back in ClassesAndTypes, that 'type(type(obj))' would always be 'type' for any arbitrary Python object, as Daniel Martin noted at the time and I never acknowledged (my bad). In the presence of metaclasses, type(type(obj)) can be the metaclass instead of type itself. Since metaclasses can themselves have metaclasses, there is no guarantee that any fixed number of type() invocations will wind up at type.
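A Python 3 sketch of a metaclass that itself has a metaclass shows how the chain of type() calls can get longer. MetaMeta, Meta and the type_chain() helper are all hypothetical names for illustration; the loop terminates because type is its own type.

```python
class MetaMeta(type):
    """Hypothetical metaclass for metaclasses."""


class Meta(type, metaclass=MetaMeta):
    """Hypothetical metaclass whose own metaclass is MetaMeta."""


class A(metaclass=Meta):
    pass


def type_chain(obj):
    """Follow type() from obj until reaching type, which is its own type."""
    chain = [type(obj)]
    while chain[-1] is not type:
        chain.append(type(chain[-1]))
    return chain


print([t.__name__ for t in type_chain(A())])  # ['A', 'Meta', 'MetaMeta', 'type']
print([t.__name__ for t in type_chain(42)])   # ['int', 'type']
```

Here four type() calls are needed to reach type from an instance of A, versus two for an ordinary object like 42.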
What you can find out about the memory usage of your Linux programs
Recently I wound up reading Linux Memory Reporting (via Hacker News), where Robert Haas talks about Linux's lack of clear reporting on process memory use. Today I'm going to sort of answer his question by covering what information Linux gives you about the various sorts of memory usage that you could be curious about. My primary focus is going to be on the numbers that you can get with ps, top, and smem. The background information on general Unix memory management will be helpful.
So, what you can get:
- the total amount of virtual address space that your process currently has allocated and mapped is the 'virtual size' of your process; ps reports this as VSZ and top reports it as VIRT. This includes anything the process has mapped, regardless of how it got there: the program's code, shared libraries, (System V) shared memory areas, mmap()'d files, mmap()'d private anonymous memory areas (which are often used by the C library for malloc()), everything.

  If your program is in a steady state but your VSZ keeps increasing, you have some sort of allocation leak. It may not strictly be a memory leak; you might be forgetting to unmap files or unload dynamically loaded code or something. (You can check at least some of this with lsof.)

- how much RAM would be immediately freed up if this process exited is smem's USS ('unique set size') field; this counts pages of RAM that the process is the only user of. These pages may be private pages (pages that will never be accessible by anyone else), or they may be shared pages that are only actively used by this process.

  (smem gets this information from the per-process smaps proc file.)

- how much RAM your program has looked at recently (which is roughly how much RAM it needs to be happy if it wasn't sharing anything) is the 'resident set size', reported as RSS by ps and smem and as RES by top. The resident set size doesn't care whether or not some of that RAM is also used by other processes; each process counts it up separately.

  (In the terminology of my basic Unix memory management entry, a process's RSS is just how many page table entries in its virtual memory areas point to real RAM.)

  Your process's RSS increases every time it looks at a new piece of memory (and thereby establishes a page table entry for it). It decreases as the kernel removes PTEs that haven't been used sufficiently recently; how fast this happens depends on how much memory pressure the overall system is under. The more memory pressure, the more the kernel tries to steal pages from processes and decrease their RSS.

  If you have a memory leak it's routine for your RSS to stay constant while your VSZ grows. After all, you aren't looking at that leaked memory any more.

  A large RSS on an active system (one under memory pressure) means that your process touches a lot of memory (often rapidly) during its operation. A growing RSS means that it is increasing the amount of memory it touches. A constant RSS doesn't mean that the process is touching the same memory over and over; it just means that it's touching about the same amount of memory per unit time.
- the process's fair share of currently in use RAM is smem's PSS ('proportional set size') field. This prorates shared pages of RAM by charging each process for 1/Nth of the page, where N is how many processes currently have a page table entry for the page (the degenerate case is that you are charged the full page if you are the only user, ie this would be counted as part of your USS). Note that this is not how many processes have the shared resource mapped into their address space; it is how many processes have touched the page recently (ie, have it in their RSS). Mapping a shared resource is free (except to your VSZ); looking at it is what costs you here.

  It follows that the more processes actively look at pages of a shared resource, the lower each of their PSS goes for it (because more and more processes have the same pages from it in their RSS).

  (Like USS, smem gets this information from the per-process smaps proc file.)
Because of how it's defined, summing the per-process PSS for a resource over all of the processes using that resource will tell you how much RAM that resource is using. Smem can do this (for some resources) with 'smem -m', although you need to know a certain amount about how Linux gives names to various resources in order to understand smem's output here.
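The summation itself is simple. Here is a hedged Python 3 sketch of the same idea as 'smem -m', assuming that resources are identified by the pathname column of each smaps mapping header; pss_by_mapping(), all_smaps() and the '[anon]' label are hypothetical names, and smem's own naming of resources may well differ.

```python
import os
import re
from collections import defaultdict

HEADER_RE = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s")
PSS_RE = re.compile(r"^Pss:\s+(\d+) kB$")


def pss_by_mapping(smaps_texts):
    """Sum Pss (kB) per mapping name over several processes' smaps text."""
    totals = defaultdict(int)
    for text in smaps_texts:
        name = None
        for line in text.splitlines():
            if HEADER_RE.match(line):
                parts = line.split(None, 5)
                name = parts[5] if len(parts) == 6 else "[anon]"
            else:
                m = PSS_RE.match(line)
                if m and name is not None:
                    totals[name] += int(m.group(1))
    return dict(totals)


def all_smaps():
    """Yield smaps text for every process we are allowed to read."""
    for pid in os.listdir("/proc"):
        if pid.isdigit():
            try:
                with open(f"/proc/{pid}/smaps") as fh:
                    yield fh.read()
            except OSError:  # process exited, or permission denied
                continue
```

On a Linux machine, 'pss_by_mapping(all_smaps())' gives a rough per-resource total, subject to only seeing the processes you have permission to read.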
(If you have all of the processes of interest running under a single userid, you can also use 'smem -u'. Smem doesn't currently have an option to aggregate reporting by program, so you can't do things like see how much memory your httpd processes are collectively using.)
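If you need per-program numbers anyway, a rough workaround is to total each process's Pss and group the totals by program name. This Python 3 sketch assumes that /proc/PID/comm is a good enough program name (it is truncated and can be changed by the process); total_pss(), pss_by_program() and read_processes() are hypothetical helpers, not anything smem provides.

```python
import os
import re
from collections import defaultdict

PSS_RE = re.compile(r"^Pss:\s+(\d+) kB$", re.MULTILINE)


def total_pss(smaps_text):
    """Total Pss in kB for one process's smaps text."""
    return sum(int(kb) for kb in PSS_RE.findall(smaps_text))


def pss_by_program(processes):
    """processes: iterable of (program name, smaps text) pairs."""
    totals = defaultdict(int)
    for name, text in processes:
        totals[name] += total_pss(text)
    return dict(totals)


def read_processes():
    """Yield (comm, smaps text) for every process we can read."""
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/comm") as fh:
                name = fh.read().strip()
            with open(f"/proc/{pid}/smaps") as fh:
                yield name, fh.read()
        except OSError:  # process exited, or permission denied
            continue
```

Then 'pss_by_program(read_processes()).get("httpd")' would be a rough figure for your httpd processes collectively.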
As far as I know, Linux has no per-process or global number for how much of your virtual address space has ever been looked at (my second question in the six different meanings of memory usage). Nor can you get per-process information on how much memory the operating system might need to provide if your process wrote to everything it was entitled to (the sixth question), although you can get system-wide information on committed address space.
Top reports a SHR number but it's not clear to me how useful this is, partly because top doesn't document where it gets this information from. If I am reading the kernel code correctly, the most likely source is the (process) RSS for memory areas that were mmap()'d from files. I am not sure if this includes things like System V shared memory areas, and it certainly understates the potential sharing between, say, fork()'d processes. This is also only potential sharing, since it says nothing about whether or not any other process has mmap()'d the same object.
(Ie, if your single process mmap()'s a private two gigabyte file and then scans all of it, I believe that your SHR will be two gigabytes and change.)
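One way to look at this number directly is /proc/PID/statm, whose seven fields (size, resident, shared, text, lib, data, dt) are counts of pages. This Python 3 sketch assumes, as I believe is the case for the procps version of top, that SHR comes from the third ('shared') field; parse_statm() and read_statm() are hypothetical helpers.

```python
import os

STATM_FIELDS = ("size", "resident", "shared", "text", "lib", "data", "dt")


def parse_statm(text, page_size):
    """Convert /proc/PID/statm (counts of pages) into bytes per field."""
    pages = [int(x) for x in text.split()]
    return dict(zip(STATM_FIELDS, (n * page_size for n in pages)))


def read_statm(pid="self"):
    with open(f"/proc/{pid}/statm") as fh:
        return parse_statm(fh.read(), os.sysconf("SC_PAGE_SIZE"))


if __name__ == "__main__" and os.path.exists("/proc/self/statm"):
    m = read_statm()
    print(f"VSZ {m['size'] // 1024} kB, RSS {m['resident'] // 1024} kB, "
          f"SHR {m['shared'] // 1024} kB")
```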
Sidebar: answers to Bruce Momjian's questions
From his comment on Robert Haas's entry:
There are various methods for representing memory that is shared, either via SysV shared memory, fork's copy-on-write, or shared libraries. Does every process get charged the full amount, or do they split it among themselves, e.g. if five processes use shared memory, is each process charged 20% of the total size? (If another process attaches, does your percentage decrease?) What happens when you map in a large shared memory area but only access part of it? When do you stop using that memory?
Each process is charged the full amount to VSZ, but not to other numbers. When you map a large area but only refer to some of it, your VSZ goes up by the full amount but your RSS only goes up by the amount you access (and then goes down again at some rate if you don't access it and the system is under memory pressure). Your PSS is the only number that goes down if other people attach to the shared resource and also actually look at pages of that shared resource that you are also looking at (if they attach but don't look, your PSS doesn't change). If five processes all map the same shared memory segment but look at five different portions of it, each of them will be charged separately for their portion (their PSS for the segment will be the same as their USS); if they all look at the same portion, their PSS is 1/5th of the size of the portion.
(Your RSS never changes when people attach or detach from a shared resource.)
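To check the arithmetic in these answers, here is a toy Python 3 model of how a single shared segment gets charged. It is a simplification under stated assumptions, not how the kernel actually does its accounting: every listed process is assumed to have mapped the whole segment, and 'touched' is the set of segment pages currently in that process's RSS. The charges() helper is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def charges(segment_pages, touched):
    """Toy model of charging one shared segment to several processes.

    segment_pages: size of the mapped segment, in pages.
    touched: dict of process name -> set of page numbers in its RSS.
    """
    users = Counter(page for pages in touched.values() for page in pages)
    return {
        proc: {
            "VSZ": segment_pages,                              # mapping costs the full size
            "RSS": len(pages),                                 # only pages looked at
            "USS": sum(1 for p in pages if users[p] == 1),     # nobody else touches them
            "PSS": sum(Fraction(1, users[p]) for p in pages),  # 1/N of each page
        }
        for proc, pages in touched.items()
    }


# Five processes look at five different 100-page portions of a 1000-page segment.
apart = charges(1000, {i: set(range(i * 100, (i + 1) * 100)) for i in range(5)})
print(apart[0]["PSS"], apart[0]["USS"], apart[0]["RSS"])  # 100 100 100

# Five processes all look at the same 100-page portion.
same = charges(1000, {i: set(range(100)) for i in range(5)})
print(same[0]["PSS"], same[0]["USS"], same[0]["RSS"])  # 20 0 100

# A sixth process maps the segment but never looks at it: nothing else changes.
lurker = charges(1000, {**{i: set(range(100)) for i in range(5)}, 5: set()})
print(lurker[0]["PSS"], lurker[5]["VSZ"], lurker[5]["PSS"])  # 20 1000 0
```

Using Fraction keeps the 1/Nth charges exact, so the per-process PSS values for a segment always sum back to the number of distinct pages that anyone has touched.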