Issue3

Title Pre-processing
Type feature Status resolved
Importance
Superseder Nosy List ivan, mg, mk, tpj
Assigned To mg Keywords design

Created on 2008-03-13.13:10:26 by mk, last changed 2008-09-23.16:19:17 by tpj.

Files
File name Uploaded Type Edit Remove
preproc.diff mg, 2008-04-08.22:31:56 text/x-diff
Messages
msg88 (view) Author: mg Date: 2008-04-13.23:31:18
I have pushed revision e41ce22ed035 which implements pre-processing
using the ideas discussed here.
msg79 (view) Author: mg Date: 2008-04-09.16:40:03
Right, the patch currently requires you to explicitly indicate which
program counters you wish to generate data for. But that is tracked as
well -- here I have deliberately "forgotten" to generate the first
four triples and this is the result:

  Starting parallel test.
  ****************************************************************
  Started
  No data for (2003, 1, 0) <function get_triple at 0xb7a3117c>
  No data for (2004, 1, 0) <function get_triple at 0xb7a3117c>
  No data for (2005, 1, 0) <function get_triple at 0xb7a3117c>
  No data for (2006, 1, 0) <function get_triple at 0xb7a3117c>

  Total time used: 3.381 sec
  Time per operation: 3.381 ms
  ******
  Finished, synchronizing shutdown.
  {('generate_triples', (<class 'viff.field.GFElement'>,)):
     [(2003, 1, 0), (2004, 1, 0), (2005, 1, 0), (2006, 1, 0)]}

The final piece output is almost the correct code needed for
generating the missing preprocessing data. The correct code would be
something similar to this:

  {('generate_triples', (GF(123... 65-bit prime),)):
     [(2003, 1, 0), (2004, 1, 0), (2005, 1, 0), (2006, 1, 0)]}

The problem is that fields (those produced by calls to the GF
function) cannot be pickled right now since the classes are created
dynamically. Apart from that, then I think it should be a simple
matter of saving this description of the needed data and loading it at
the beginning of a run.
msg78 (view) Author: mk Date: 2008-04-09.13:30:04
No, that is actually quite impressive. You still have to know when they're going
to be used to get the program counters, but I guess you would in many cases.
msg77 (view) Author: mg Date: 2008-04-08.22:31:56
The attached patch shows my first attempt at implementing preprocessing
in a somewhat automated fashion... It still needs some work, but I
wanted to show you guys what I've come up with.

First, benchmark.py has been hard-coded to preprocess 1000 triples. They
happen to have program counters (i, 1, 0) for i in range 2003-3003.
Those numbers were found by running the benchmark with no preprocessed
data and printing rt._needed_data just before shutting down.

Running the benchmark on thyra{01,02,03} I get results like this:

  Seeding random generator with random seed 9148
  I am player 1, will mul-active 1000 numbers
  Using TLS
  Will connect to <Player 2: thyra02:7500>
  Will connect to <Player 3: thyra03:8500>
  #### Starting reactor ###
  Starting preprocessing
  Generating data using generate_triples (need 1000 items)
  Runtime ready, starting protocol
  Synchronizing test start.
  Starting test in 3
  Starting test in 2
  Starting test in 1
  Starting test now
  Starting parallel test.
  ****************************************************************
  Started

  Total time used: 3.278 sec
  Time per operation: 3.278 ms
  ******
  Finished, synchronizing shutdown.
  Shutdown.
  Stopped VIFF Runtime.
  Initiating shutdown sequence.

So the time drops from around 12 ms to a little more than 3 ms! Not so
bad, I think...
msg76 (view) Author: mk Date: 2008-04-08.14:05:03
My point is that in some sense we assume in VIFF for preprocessing that we know
what we are going to be doing, but when it comes to program counters and data
transmission, we don't have that assumption. It's not exactly what happens, and
I'm not sure I have any better solutions, but basically this is what bothers me.

The simple case where we have an interactive protocol which prepares for all
sorts of operations in its idle time makes perfect sense. The interactive
property does not conflict with the idea of preprocessing.

This is my way of pressuring people. Sometimes making people explain things to
you slowly reveals that there's something THEY haven't thought about. Actually
I've found that very often this is the case :)
msg75 (view) Author: mg Date: 2008-04-08.11:40:04
> Mikkel Krøigård added the comment:
>
> Actually my question is this: Doesn't the need for program counters
> attached to the precomputed data actually go against it being used to
> emulate an interactive functionality?

I think it is the idea of preprocessing that clashes with the idea of an
interactive functionality: if you are allowed to change your mind during
the program, then preprocessing does not make much sense.

> If we just want to run a specific program that we already know in
> advance, it's a completely different story. But then, like I wrote
> earlier, couldn't you even get rid of program counters in that case?

Sure -- a fixed progrem ought to do a fixed pattern of send/receive and
so we could simply number them.

> Don't get me wrong, I understand the need to take data in the same
> order everywhere, but again I find myself struggling to accept the
> whole program counter concept and trying to figure out something else
> instead. But as usual, no big ideas :)

Hehe :-) But really, please keep thinking! The program counters were
only my initial solution to the problem...
msg74 (view) Author: mg Date: 2008-04-08.11:35:03
> Mikkel Krøigård added the comment:
>
> If we could run through the code in advance in general, could we not
> just name the variables before we run the program and skip the program
> counter completely?

Yeah, you could do that in principle... my idea is exactly that every
program run has the same trace of program counter, so one could replace
that trace with a list of numbers. But how do you get from the list of
numbers back to the places in the code where they are needed?

A fancy static analysis that would unwind loops and inline function
calls could procude a nice flat program representation where every
"send" is used just once. But with the current code I don't see how...

> [...] The cool thing about having a pool of precomputed stuff is that
> you can work on producing more when you've got some free time and just
> use it when you need it. [...]

I remember that we talked about having the players generate triples in
their "free time", but wont that mean that the adversary has many
chances to disrupt the calculation? He might even be able to halt it at
a time where something sensitive has been revealed in an earlier part of
the calculation.
msg73 (view) Author: mg Date: 2008-04-08.11:25:03
It is still my hope that it is possible to know in advance how much
preprocessed data we need.

This is because the preprocessed data is independent of the actual input
data (otherwise we could not calculate it in a preprocessing step). This
means that, say, the benchmark will use the same amount of preprocessed
data in every run. I've tested it, and it really does produce the same
trace of program counters every time.

Things can get messy if a program makes choices that depend on values
opened along the way. I think there are two cases here:

1. Interactive programs where the users decide at runtime that they want
   to do this or that next (which, for some reason, is known as "SecRas"
   among SIMAP people...). Here I think we should see the sub-programs
   as independent units and simply make sure that we know in advance how
   much preprocessed data each sub-program needs and then generate some
   amount of data.

   There is really nothing else to do: a static analysis of such a
   program should also reach the conclusion that we need an infinite
   amount of preprocessed data to run it safely.

   So we should actually have slower protocols which we can fallback to
   in case we run out of preprocessed data...

2. Programs like the double auction that branch on opened values. That
   can actually be quite easy: simply make sure that the program
   executes the same number of branches each time, i.e., run all the way
   through the binary search tree. I believe we already do this for the
   double auction implementation in VIFF.

I have been worried about prss_share_random: it might recurse and so
generate more program counters. But it might not be a problem since
these program counters wont influence the counters in the rest of the
program trace and since prss_share_random uses no preprocessed data.

Also, in a protocol secure against active adversaries, one would
probably use one of the random numbers from the multiplication triples
instead of prss_share_random.
msg72 (view) Author: mk Date: 2008-04-08.11:05:03
Actually my question is this: Doesn't the need for program counters attached to
the precomputed data actually go against it being used to emulate an
interactive functionality?

If we just want to run a specific program that we already know in
advance, it's a completely different story. But then, like I wrote earlier,
couldn't you even get rid of program counters in that case?

Don't get me wrong, I understand the need to take data in the same order
everywhere, but again I find myself struggling to accept the whole program
counter concept and trying to figure out something else instead. But as usual,
no big ideas :)
msg71 (view) Author: tpj Date: 2008-04-08.10:35:03
Here's my initial thoughts regarding preprocessing:

For non-interactive programs where input is not known in advance,
isn't it computationally impossible in the general case to generate an
exact description of the data that can be preprocessed? As I
understand it, the best we can hope for is an estimate, e.g. a
conservative estimate stating that at most this much preprocessed data
will be needed or an estimate of the average.

It would be nice to be able to generate this automatically. An
"average" estimate of the preprocessing needed could be calculated
dynamically, if we have a collection of "representative" input. But I
guess that "representative" input is often hard to define. And as Ivan
notes, we need to ensure that at least a certain amount of data is
guaranteed to be preprocessed in the active case.

Another approach would be a static analysis. This would certainly be
easier for viff to carry out if the implementor of a protocol has
annotated the protocols in some way regarding the need for
preprocessing. In my opinion it would be ok to require the implementor
of a low level protocol (say comparison) to annotate it with info,
say, about the minimal, the maximal, and maybe even the average amount
of preprocessing needed for that particular protocol. This would make
it easier for viff to automate the analysis for higher-level
applications such as a double auction.

Another thing to consider, as Mikkel points out, is the case of
interactive programs. Here, it might be possible to do some analysis
that allows viff to use the "idle-time", e.g. where the program waits
for input, to do further preprocessing. But againn, as Ivan notes, we
need to take special care of this in the case of active security.
msg70 (view) Author: mk Date: 2008-04-08.09:20:04
If we could run through the code in advance in general, could we not just name
the variables before we run the program and skip the program counter
completely? I think the reason that we did not do this was agreed to be the
fact that we do not know the whole program in advance - meaning commands may be
given interactively based on past results.

I hope I remembered this correctly anyway, because right now it makes sense to
me. But the thing about preprocessing is exactly the same - once we are done
with some initial set of actions, we have a problem. The cool thing about
having a pool of precomputed stuff is that you can work on producing more when
you've got some free time and just use it when you need it. The
stuff-with-program-counters solution demands that we must know exactly why we
are preprocessing. Which is generally not the case!

Well, it annoys me anyway. Maybe someone can explain to me why this is not a
problem :)
msg69 (view) Author: mg Date: 2008-04-07.18:10:08
(I'm adding Thomas to the nosy list since I guess this discussion is
interesting for him too.)

I have been looking at this today, and it is a bit more complicated than
simply having a pool of data where we can draw elements from as needed.

* The preprocessed data needs to be taken from the pool in the same
  order by all players => it needs to be indexed by program counter.

* Some preprocessing functions (like generate_triples) return data for
  more than one operation. We obviously want to call generate_triples as
  few times as possible => we need to know beforehand how many triples
  we need and for which program counters.

My initial plan was to simply dump preprocessed data in one run (with
dummy data) and use it for real in a later run. But I now think we need
more than that:

* A run with dummy data records the program counters where preprocessed
  data is needed and the function that returns it.

* In the preprocessing step we run the functions as many times as
  necessary to obtain the needed data.

* The data is associated with the program counters. If generate_triples
  produces two triples, then these might be assigned to program counters
  that (at runtime) are encountered at different branches of the
  execution tree.

* The calculation can then begin.

So instead of having a dump of the data, we end up with a *description*
of it, and based on that we can generate it again as efficiently as
possible. Also, we can generate just the needed data without having to
run the entire computation on dummy data.
msg7 (view) Author: mg Date: 2008-03-13.18:16:59
Ivan Damgard <ivan@daimi.au.dk> writes:

> One should remember in the design of this that some asynchronous
> protocols allow the adversary to stop the whole thing if he acts
> before a certain point. This point is typically when the preprocessing
> is just finished. If the adversary does not interfere, the protocol is
> guaranteed to terminate. In such a protocol, it will be crucial that
> enough material is produced in the preprocessing for the rest of the
> computation to actually finish. If we start to generate more stuff
> on-line, we give the adversary another chance to interfere. This will
> mean that we are implementing a functionality different from what the
> original protocol promised..
>
> - Ivan

Yes, we should carefully decide how much preprocessed stuff we need at
the beginning of the real protocol run. The idea with the two-phased
approach is that it is easy to implement, and that it must work even
when we measure the needed randomness by using dummy data.

My intuition is that we can use dummy data instead of real data since
the protocol must be secure, and this means that it must proceed in an
identical manner no matter what data it is presented with.

I am not entirely sure if this intuition holds if stuff is opened in the
middle of a protocol, for then the dummy-data-run will differ from the
real-data-run... Hmm, this might be more complicated after all :-)
msg3 (view) Author: mk Date: 2008-03-13.13:10:26
With the use of deferreds, some amount of preprocessing is already being done
before the protocol begins proper. But this kind of preprocessing does nothing
for the double auction case where the program branches during execution.

So a general mechanism is needed by which one can preprocess a given number of
multiplications, comparisons, etc.

A possible design could be one in which this is done dynamically: when a
protocol is executed, preprocessed stuff is taken from a pool as needed. If the
pool runs dry, then more stuff is generated online. At the end of a run, the
program will know how much stuff was needed -- this information can then be
dumped to a file.

A separate preprocessing script can read this information from the file and
produce another file with a pickled pool of preprocessed stuff.
History
Date User Action Args
2008-09-23 16:19:17tpjsettype: feature
2008-04-13 23:31:18mgsetstatus: in-progress -> resolved
messages: + msg88
title: Preprocessing -> Pre-processing
2008-04-09 16:40:03mgsetmessages: + msg79
2008-04-09 13:30:04mksetmessages: + msg78
2008-04-08 22:31:56mgsetstatus: chatting -> in-progress
files: + preproc.diff
messages: + msg77
2008-04-08 14:05:03mksetmessages: + msg76
2008-04-08 11:40:04mgsetmessages: + msg75
2008-04-08 11:35:04mgsetmessages: + msg74
2008-04-08 11:25:04mgsetmessages: + msg73
2008-04-08 11:05:03mksetmessages: + msg72
2008-04-08 10:35:03tpjsetmessages: + msg71
2008-04-08 09:20:04mksetmessages: + msg70
2008-04-08 07:45:13mgsetassignedto: mg
2008-04-07 18:10:08mgsetnosy: + tpj
messages: + msg69
title: Support preprocessing -> Preprocessing
2008-03-13 18:17:00mgsetnosy: + ivan
messages: + msg7
2008-03-13 16:39:47mgsetmessages: - msg4
2008-03-13 16:35:04mgsetstatus: unread -> chatting
nosy: + mg
messages: + msg4
2008-03-13 13:10:26mkcreate