This is an idea which I think Thomas came up with first: it would be
very nice if one could limit the memory usage to, say, 200 MiB.
The program should run as normal, but when the memory usage grows above
the limit, the remaining parts of the program should be delayed. When
memory has been released, the program resumes.
There are two questions to consider:
* Is it safe?
* How should it work?
The first question is a matter of avoiding deadlocks. The Deferreds
used in VIFF can depend on other Deferreds, and we want to avoid
cycles in the dependency graph.
Locally I think it is okay since we always make the Deferreds wait on
each other, and ultimately wait on incoming network traffic. So that
should work.
Globally is a different matter: can P1 be in a situation where it
needs data from P2, and P2 needs data from P1? I hope not :-)
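Both the local and the global question come down to whether the "waits on" graph contains a cycle. As a minimal sketch, assuming the dependencies could be written down as a plain dict from each node (a Deferred, or a player such as P1) to the nodes it waits on, a depth-first search finds such a cycle. The dict representation and the name `find_cycle` are hypothetical; VIFF does not expose its Deferreds this way, so treat this as a thinking or debugging aid, not something the runtime does.

```python
def find_cycle(waits_on):
    """Return a list of nodes forming a cycle, or None if there is none.

    waits_on is a hypothetical dict mapping each node to the nodes it
    waits for. Nodes that only appear as targets (e.g. "net" for incoming
    network traffic) are treated as waiting on nothing.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for dep in waits_on.get(node, ()):
            state = colour.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path: a wait cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(waits_on):
        if colour.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

For example, Deferreds that all end up waiting on the network give `None`, while `{"P1": ["P2"], "P2": ["P1"]}` gives `["P1", "P2", "P1"]`, which is exactly the global deadlock described above.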
As for how this could be implemented as seamlessly as possible, I
just got an idea! It is too late to "stop" or "delay" the code at the
time when it tries to create a new Deferred. I don't see any good way
to suspend the rest of the program at that moment.
But what we can do is to stop triggering already created Deferreds! If
we have reached the memory limit, the code in stringReceived (which
handles incoming network traffic) would simply store the data received
and not yet trigger the waiting Deferred. That would mean that the
program is starved for data -- it would appear to the program as if
the data has simply not arrived yet.
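A minimal sketch of the receiving side, under stated assumptions: `ThrottledReceiver`, `expect`, `on_data` and `release` are hypothetical names, plain callbacks stand in for the waiting Deferreds, and each message is passed with a key saying which Deferred it is for (the real stringReceived gets a single string and would have to unpack that itself). Incoming data is always queued first and then delivered in arrival order, so new data never overtakes data that was held back while over the limit.

```python
from collections import deque


class ThrottledReceiver:
    """Hypothetical model of the delivery logic inside stringReceived.

    Callbacks registered with expect() stand in for waiting Deferreds.
    over_limit is any zero-argument function returning True while memory
    usage is above the limit.
    """

    def __init__(self, over_limit):
        self.over_limit = over_limit
        self.waiting = {}          # key -> callback of the Deferred waiting for it
        self.held_back = deque()   # (key, data) received but not yet delivered

    def expect(self, key, callback):
        # Assumption: the program registers interest before the data arrives.
        self.waiting[key] = callback

    def on_data(self, key, data):
        """Called for each incoming message: store it, then try to deliver."""
        self.held_back.append((key, data))
        self.release()

    def release(self):
        """Trigger waiting callbacks in arrival order until the limit is hit.

        Returns how many messages were delivered. Call it again once
        memory usage has dropped.
        """
        delivered = 0
        while self.held_back and not self.over_limit():
            key, data = self.held_back.popleft()
            self.waiting.pop(key)(data)
            delivered += 1
        return delivered
```

While `over_limit()` returns True, `on_data` only grows the queue, so to the rest of the program it looks as if the data has not arrived yet.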
We should then periodically check the memory usage and when it has
dropped we can start triggering some more Deferreds with the received
data.
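For the periodic check, one possible sketch uses the standard library's `tracemalloc` as the memory probe. That choice is an assumption: tracemalloc only counts memory allocated through Python's allocators after tracing starts, so memory that extension modules allocate directly with malloc is not included, and an operating-system figure such as the resident set size may be closer to what a 200 MiB limit is meant to cap. `periodic_check` and `deliver` are hypothetical names. Each tick hands over at most `batch` held-back items and re-measures memory before each one, matching "triggering some more Deferreds" rather than draining everything at once.

```python
import tracemalloc
from collections import deque

MEMORY_LIMIT = 200 * 1024 * 1024  # 200 MiB, the example limit from the text


def current_usage():
    """Bytes currently allocated through Python's allocators.

    Assumption: tracemalloc is the probe; tracing is started on first use,
    so earlier allocations are not counted.
    """
    if not tracemalloc.is_tracing():
        tracemalloc.start()
    current, _peak = tracemalloc.get_traced_memory()
    return current


def periodic_check(backlog: deque, deliver, batch=100,
                   limit=MEMORY_LIMIT, usage=current_usage):
    """One tick of the periodic check.

    While usage is below the limit, pass up to `batch` held-back items to
    `deliver` (a hypothetical function that triggers the waiting Deferred).
    Returns the number of items delivered in this tick.
    """
    delivered = 0
    while backlog and delivered < batch and usage() < limit:
        deliver(backlog.popleft())
        delivered += 1
    return delivered
```

In a Twisted program this could presumably be run on a timer with `task.LoopingCall(periodic_check, backlog, deliver).start(1.0)` from `twisted.internet`; the one-second interval is arbitrary.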
This idea relies on an assumption about how much outstanding data can
be in transit at any given time. It also relies on programs being
structured in such a way that they do not allocate Deferreds for the
entire calculation up front, but instead allocate them during the
computation. So it won't solve the problem of doing
  large_result_list = []
  for (x, y) in zip(very_large_list, another_large_list):
      z = x * y
      large_result_list.append(z)
since that would still immediately allocate a Deferred for each product.
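To make the contrast concrete, here is a toy model in plain Python. `Placeholder`, `up_front` and `in_batches` are hypothetical names: a `Placeholder` stands in for the Deferred holding one product, and it is resolved synchronously, unlike a real Deferred that fires when network data arrives. The model counts how many placeholders exist at the same time, which is what the memory limit cannot help with once they have all been created.

```python
import itertools


class Placeholder:
    """Hypothetical stand-in for a Deferred that will hold x * y."""

    live = 0  # placeholders created but not yet resolved

    def __init__(self, x, y):
        self.x, self.y = x, y
        Placeholder.live += 1

    def resolve(self):
        Placeholder.live -= 1
        return self.x * self.y


def up_front(xs, ys):
    """Like the loop above: all placeholders exist before any is resolved."""
    pending = [Placeholder(x, y) for x, y in zip(xs, ys)]
    peak = Placeholder.live
    return [p.resolve() for p in pending], peak


def in_batches(xs, ys, batch_size):
    """Create at most batch_size placeholders, resolve them, then continue."""
    pairs = zip(xs, ys)
    results, peak = [], 0
    while True:
        chunk = [Placeholder(x, y)
                 for x, y in itertools.islice(pairs, batch_size)]
        if not chunk:
            return results, peak
        peak = max(peak, Placeholder.live)
        results.extend(p.resolve() for p in chunk)
```

With 1000 pairs, `up_front` peaks at 1000 live placeholders while `in_batches(..., 10)` peaks at 10, and both return the same products. In VIFF the batched version would presumably have to wait for each batch's Deferreds (for example with Twisted's `gatherResults`) before starting the next one, which is the kind of restructuring the paragraph above says programs would need.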