The color of gray

Thursday, 07 November, Year 5 d.Tr. | Author: Mircea Popescu

Matthew Skala's excellent "Color of Bits" article is, ten years later, still the best translation of copyright law for the use of computer people.i

Meditation upon it has also inspired the current article, which is a proposition of a better torrent system, sadly bereft of technical detail of implementationii. To understand the matters proposed one has to have some familiarity with number-theoretic mathematics, or at least enough practical understanding of cryptography to know what malleability is.

I. Torrenting as it works now. A large number X is split into fixed length numbers x1...xn. An index list of these x1...xn items is held somewhereiii. Clients interested in acquiring X will obtain the list of x1...xn fragments and the list of active other clients, and proceed to exchange the xi they have for the xi others have until each i from 1 to n is represented at least once in their collection, at which point "the download is complete". Most systems rely on some upload redundancy, in the sense that most clients should give out more fragments than the n they take in, generally by a margin of about 50%.
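
The mechanics of I can be sketched in a few lines of Python - everything here is illustrative, not any real client's API:

```python
# Hypothetical sketch of today's torrenting model.
FRAGMENT_SIZE = 4  # bytes; real clients use pieces of 16 KB and up

def split(x: bytes, size: int = FRAGMENT_SIZE) -> list[bytes]:
    """Cut the large number X into fixed-length fragments x1..xn."""
    return [x[i:i + size] for i in range(0, len(x), size)]

def complete(held: dict[int, bytes], n: int) -> bool:
    """The download is complete once every index 0..n-1 is held."""
    return all(i in held for i in range(n))

x = b"the large number X, cut into pieces"
fragments = split(x)
n = len(fragments)

# Two peers start with partial, complementary sets...
alice = {i: f for i, f in enumerate(fragments) if i % 2 == 0}
bob = {i: f for i, f in enumerate(fragments) if i % 2 == 1}
assert not complete(alice, n) and not complete(bob, n)

# ...and swap the xi they have for the xi the other lacks:
alice.update(bob)
bob.update(alice)
assert complete(alice, n) and complete(bob, n)
assert b"".join(alice[i] for i in range(n)) == x
```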

II. Torrenting as it is proposed to work. A large number X is split into fixed length numbers x1... xn. Each of these xi numbers is put through mystery function F, with the property that while F(xi) exists for every i in [1, n], an F' does not exist so that F'(F(xi)) = xi for any i in [1, n], and moreover an F'', Γiv pair also doesn't exist so that F''(Γ(F(xi), i in [1..n])) = X. In plain human language, one needs to be able to calculate a hash of each fragment but not be able to recoup the fragment from the hash nor the whole number from the whole collection of fragment hashes.
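
For the one-way half of the requirement, any cryptographic hash serves as an illustration. The minimal sketch below uses SHA-256 as a stand-in for F - it satisfies the no-F' and no-F'' conditions, but emphatically not the joint-recovery property introduced in the next paragraph:

```python
# Minimal sketch, assuming plain SHA-256 stands in for F. It gives the
# one-way properties (no F' recovers xi, no F'' recovers X from the
# hash collection) but NOT the joint recovery the scheme also demands.
import hashlib

def F(fragment: bytes) -> bytes:
    """Hash one fragment; cheap to compute, infeasible to invert."""
    return hashlib.sha256(fragment).digest()

fragments = [b"fragment x1", b"fragment x2", b"fragment x3"]
hashes = [F(xi) for xi in fragments]

# Fixed-length digests, none of which leaks its fragment:
assert all(len(h) == 32 for h in hashes)
assert len(set(hashes)) == len(fragments)  # distinct inputs, distinct digests
```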

This process is repeated by a number of different clients, each using his own F1... Fm, all of which satisfy the same requirements above. And now the fun part starts : the requirement is that for any m over a reasonable value (perhaps 10 ?), a reverse function G DOES exist together with a Γ so that G(Γ(Fj(xi), i in [1..n], j in [1..k])) = X, with k < m. In plain human language, one needs to be able to recover the original X from the combined hash collections of k different users.
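
One hedged candidate shape for F and G - a sketch of the desired property, not a claim to satisfy every requirement (the shares here are fragment-sized, for one) - is Shamir's secret sharing over a prime field: user j's "hash" of fragment xi is one polynomial share, any single user's collection reveals nothing, and any K users' collections reconstruct X:

```python
# Illustrative only: Shamir shares as a stand-in for the Fj, Lagrange
# interpolation as a stand-in for G. Constants are arbitrary.
import random

P = 2**61 - 1  # a Mersenne prime, larger than any fragment value here
K = 3          # collections needed to reconstruct (the article's k)

def shares_for(secret: int, user_ids: list[int]) -> dict[int, int]:
    """Fj(xi): evaluate a random degree-(K-1) polynomial with
    constant term xi at point j -- one share per hashing user."""
    coeffs = [secret] + [random.randrange(P) for _ in range(K - 1)]
    return {j: sum(c * pow(j, e, P) for e, c in enumerate(coeffs)) % P
            for j in user_ids}

def recover(points: list[tuple[int, int]]) -> int:
    """G: Lagrange interpolation at 0 from any K distinct shares."""
    secret = 0
    for j, yj in points:
        num = den = 1
        for m, _ in points:
            if m != j:
                num = num * (-m) % P
                den = den * (j - m) % P
        secret = (secret + yj * num * pow(den, P - 2, P)) % P
    return secret

fragments = [1234, 5678, 9999]          # stand-ins for x1..xn
users = [1, 2, 3, 4, 5]                 # m = 5 hashing users
collections = {j: [] for j in users}
for xi in fragments:
    s = shares_for(xi, users)
    for j in users:
        collections[j].append((j, s[j]))  # user j keeps only his share

# Any one collection is useless (each fragment is 2 shares short of K),
# but any K users' collections bust X back out:
rebuilt = [recover([collections[j][i] for j in (1, 3, 5)])
           for i in range(len(fragments))]
assert rebuilt == fragments
```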

So in practice : you take Smart Money and cut it into KB-sized items. You hash all of these. All your friends do the same. You cannot obtain your original film from your collection of hashes, nor can they obtain their copy back from their collections of hashes. However, if you obtain a sufficient variety of such bits, say 60% of your set and 55% of Joe's set and 35% of Moe's set and so on and so forth, you will eventually be able to bust the film back out of there.

In theory I imagine the case where m = n (so there are as many different users doing the hashing as there are hash fragments) might result in very close to 100% efficiency (i.e., you can actually obtain X out of n fragments), whereas lower sample sizes might require more redundancy in the available bits (but ideally even if you only have 10ish sources, you shouldn't need more than say 1.3x to 1.5x n bits to decode X).

The disadvantage of this system, quite obviously, is that it'd need more CPU (or perhaps GPU ?) cycles, and more bandwidth to deliver the same copyright infringing experience - all those things we really have plenty of. The advantage, in case it's not obvious, is complete prosecutorial immunity for any copyright infringement whatsoever, and more generally, for any data sharing of any kindv - all those things we desperately need.

There is no doubt in my mind this can be done. The question, of course, is what will function F be ? I have no idea, quite frankly, I imagine it'd probably be something close to some sort of malleable cypher, perhaps with or perhaps without some help from schemes such as the Lamportized Blockchain.

So now, who's the bright young fellow that's going to define F for us ?

  1. I have a lot of respect for people who verify in practice the ability to translate key concepts of one intellectual system into such terms as to allow consistency of understanding for practitioners skilled in another intellectual system. It is, as far as I can see, the foremost intellectual activity, and certainly to these people belongs the future. They are the new traders of the world, except where the old traders that made the old new world traded spices and slaves for gold and silver, the modern masters of all they survey trade simply ideas. Oh, but what a glorious trade it is, and how nothing else may ever come close! []
  2. Because I'm not good enough to do that part. []
  3. Originally by a centralised service called a "tracker", but from what I gather now the respective database is also distributed - you'll have to pardon my very approximate understanding of these things as pretty much everything I know on the topic comes from Bitcoin really. []
  4. Think of gamma as some sort of intuitive "addition" as conveniently defined for the application. []
  5. Inasmuch as you can't obtain the alleged infringed/forbidden material from the actual data shared by the actual individual, you will have to change the law to be able to prosecute this activity. And the problem with so changing the law is that you'd have to change it in ways it can't be changed, so this is never happening. []

11 Responses

  1. The Freenet project actually does this, if by slightly different means.

  2. And as for the prosecutorial immunity, the whole thing won't work without sharing a key or address or whatever, which you can use to get a sufficient number of pieces together. So, someone or some group who put the copyrighted data in must either make it available to a network indexing engine under a recognizable name for you to find, or they have to pass you some access gizmo. As the original article explains, *this* is the magic that makes the bits colored (and shows your unlawful intent), and it can't be washed away with some clever maths. The best we can achieve is to make enforcing the copyright unfeasible, which has practically already happened anyway.

  3. jurov,

    All P2P nets are arguably based on the concept of transmuting 'heavy elements' (warez) into 'light' ones (the necessary search strings.)

    The objective here is to create a situation where a network user can only be shown to be sharing 'generic warezstuff' rather than any particular piece of warez.

    Likewise, while it may be quite impossible to arrange things so that an enemy capturing the packets a user downloads can never learn which warez the latter is interested in, this can be forced into NP-completeness - the enemy must try all possible reconstitution keys. And naturally this can only be done for keys which are public (or that he was able to get hold of through subterfuge.)

    Copyrasts would be forced into trying to ban the very act of accessing this particular P2P net - something they have not, AFAIK, managed to pull off in any civilized country.

  4. A possible solution to the puzzle follows.

    F is a 'sufficiently noisy' [1] version of Shamir's secret-splitter.

    G is a 'convolutional code', e.g. trellis, 'turbo,' or other 'forward error correcting' code.

    Γ is carried out by including a 'tag' field in every fragment which, with a probability P, 0 < P < 1, results in that fragment being returned given a certain query. Bonus points if you can cough up a Γ where said field is part of the noisy payload, such that every node must compute a little convolution prior to being able to answer any queries at all. This would prevent a malicious node from refusing to store arbitrary fragments based on their tags. Or at least simplify the unmasking of such nodes.

    [1] How noisy? Enough to make reconstitution from the number of fragments published by one typical user impossible. The exact noise requirement would depend on the size of the payload and that of the fragments, as well as the number of fragments published by each node. One would have to play with the constants.
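
    The tag idea can be sketched as follows - a deliberately tiny tag space forces many fragments to answer any one query, so a reply never pins down a specific W. All names and constants below are hypothetical:

```python
# Toy version of the tag field: TAG_BITS is deliberately tiny so that
# many unrelated fragments share each tag.
import hashlib

TAG_BITS = 4  # 16 possible tags for 256 fragments => heavy collisions

def tag(data: bytes) -> int:
    """Truncated hash: the per-fragment (or per-query) tag field."""
    return hashlib.sha256(data).digest()[0] >> (8 - TAG_BITS)

fragments = [b"frag-%d" % i for i in range(256)]
store = [(tag(f), f) for f in fragments]

def answer(query: bytes) -> list[bytes]:
    """A node returns every fragment whose tag collides with the
    query's tag - roughly 1 in 2**TAG_BITS of all fragments."""
    t = tag(query)
    return [f for ft, f in store if ft == t]

# A reply mixes ~16 unrelated fragments on average, so serving one
# proves nothing about which particular W the node was queried for.
hits = answer(b"some warez W")
assert all(tag(f) == tag(b"some warez W") for f in hits)
```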

  5. It is entirely essential that any given fragment has a high probability of being sent back in reply to more than one distinct query.

    Otherwise the scheme degenerates to a comically-wasteful version of BitTorrent.

  6. Mircea Popescu
     Thursday, 7 November 2013

    @jurov Freenet did pop up in the preliminary discussion of this article. I'm personally unfamiliar with it, but what I gathered was that 1. the project is practically dead, for whatever reasons ; 2. it only superficially does something similar, but not actually this.

    And no, the whole thing is supposed to work without sharing any keys. That's the point of it. I am certain that your "won't work" is misplaced. After all, people manage to share Bitcoin with others without sharing their keys all the time. Don't tell me it can only be done successfully if one doesn't mean to do it.

    Moreover, if you want things not to work there's little point in reading articles about how they will in fact work, neh ?

    @Stanislav Datskovskiy Actually, there is no such thing as "sharing warez generally". The problem with the legal system is that ambiguity is poison in that context, and one can only show specific things in an acceptable manner.

    I think actually you're right, and a neglected part of the solution will have to be a fuzzer of some sort, a noise function with very narrowly controlled characteristics. Perhaps it should actually come in the shape of a very well defined pseudorandom function used as the key rng ? A sort of NSA diddling applied to actually useful purposes. In that case the 1st part would be satisfied by the same one user KNOWING which sort of pseudo-randomness he already used and avoiding it, whereas other users not knowing what other users used would end up with overlap.

    I'm entirely unsure how your degeneration point works. Expound ?

  7. Re: degeneration to BitTorrent: it is important that 'user sent fragment F' cannot be used to readily deduce the fact 'user possessed warez W.' Hence there must be maximal reuse of fragments (if a noisy frag is by any stretch of the imagination applicable to multiple potential queries, it is returned in response to said queries.)

  8. Mircea Popescu
     Thursday, 7 November 2013

    How could the fact that a user sent F be used to readily deduce that the user possessed W, if F is only sent as part of the download process for W ?

  9. MP, 'fragment F sent if and only if queried for W' is the current situation in BitTorrent. What we want is 'F is sent if query is W1, W2, ... Wn - or maybe just because node felt like it.'

  10. Freenet *does* strive for complete plausible deniability both for stuff you are storing and stuff you're transferring. It is not dead, just not very popular, something like bitcoin in 2010. I don't see what is superficial therein, I only see arguments of "not invented here" type plus "it's in java, slooow".

  11. Mircea Popescu
      Thursday, 7 November 2013

    @Stanislav Datskovskiy Inasmuch as you can't obtain W from F it's moot that F is only sent in response to a request for W. Otherwise responding with "Please don't infringe copyright 128301298310" would be equally prosecutable.

    Not saying that it wouldn't be better to have ambiguous F-ing, but I don't see it is actually necessary.

    @jurov Well ok, so it strives. It's in fact much older than Bitcoin, right ? As in, decade+ older than Bitcoin. Wouldn't that make the comparison to Bitcoin in 2010 quite ungermane ?

    I will despise any implementation of anything in Java no matter what anyone else says. If Jesus saves in Java I won't be saved. So that settles that. The "not invented here" part is a little heavy if the best that can be said for this freenet thing is that it "strives" to accomplish the same thing that this here thing actually does accomplish. The "not invented here" is strictly reserved for things that actually work, not for things that intend to have worked.
