MP's fabulous hash function, and its family

Thursday, 05 January, Year 9 d.Tr. | Author: Mircea Popescu

This proposal stems from a recent discussion in the forum of The Most Serene Republic, and consists of a readaptation of an older idea of Stan's to the task at handi.

The item here contemplated is then an algorithmic, rather than algebraic, hash function. It consists of three elements and four operations, which we proceed to define :

Elements of the fabulous hash function :

  • One element is the message to be hashed, M, which is a field of bits of unspecified length.
  • Another element is the so called state machine, S, which starts as one null bit and grows to an unspecified length.
  • The last element is the result of the hashing, R, which is a field of bits of user-specified length.

Operations of the fabulous hash function :

  • One operation is the bit flip, let it be called flip. This operation consists of toggling one specified bit of either S or R.
  • Another operation is the inversion, let it be called invert. This operation consists of toggling all the bits of either S or R.
  • A third operation is the flipping of a number of bits in either S or R, let it be called the screw. This operation consists of taking the bit count of either S or R, iterating over that value, at each step multiplying the iterator with the current position in M, calculating the remainder of that product against the bit count of R or S respectively, and flipping the remainder-th bit in R or S respectively. A half-screw will take half (floor) the bit count of S or R instead.
  • The fourth operation is a shift of the state machine, call it expand. This operation consists of adding one null bit at the end of S.
  • The last operation is a position rewind, call it a rewind. This operation consists of decreasing our position in the message M by one, except in the case our position is already 0.

On this basis we can now proceed to define our function :

I. The function starts by allocating memory : one bit for S, and the size specified by the user for R. All allocated bits are zero.

II. The function starts at position 0 in M and iterates over each bit in M. These iterations are called steps. During each step, the function considers whether the position-th bit in M is 0 or 1, and executes a defined set of operations in either case. Once the operations have been executed, the position is incremented by one. Once the position is larger than the size of M, the function returns R as the hashed value of M.

So far we have defined a whole class of hash functions : the Fabulous Hash Function family. Numerous kinds and types could be devised on the basis described, and any willing operator of a numeric machine is certainly encouraged to do so for his own needs and uses.

Let us now consider the canonical example of a FHF : MP's own FHF. It goes like so :

  • if the bit in M is 0, expand and screw S in R. If the bit in R found at the position equal to the remainder of the division of our position in M by the size of R is 0
    • That bit in R is flipped and we rewind.
    • else, that bit in R is flipped and S is inverted.
  • if the bit in M is 1, half-screw S in R. If the bit in R found at the position equal to the remainder of the division of our position in M by the size of R is equal to the bit in S found at the position equal to the remainder of the division of our position in M by the size of S
    • We expand and then screw R in S.
    • else, that bit in R is flipped.

There are a number of important properties to be underlined.

  1. The very last bit of M can produce significant change in R, potentially touching all its bits.
  2. The fabulous hash-function is message-size indifferent, being operable of messages of any size within the limitations of the hardware only.
  3. The fabulous hash-function outputs hashes of any arbitrary length equally well.
  4. It is not possible to say, on the basis of a hash, how much memory, or how much CPU was consumed to process the message.
  5. The cheapest way to calculate how much memory, or how much CPU will be needed to process a message is to process the message.
  6. Because the fabulous hash function does not use blocks, it does not require any kind of padding.

Diligent work by ben_vulpes and sina allows us no less than three canonical implementations to choose from : lisp, go, python.

———
  1. Which is, exactly as he states it, to resolve the major weakness that the hash function can be written as an equation.

    The fundamental, intrinsic and so far nor escaped weakness of all currently known hash functions is that a system of equations equal to the count of their possible outputs could be written ; and resolved. No, the usual obscurantism passing for "analysis" in the field does not resolve this problem anymore than sprinkling some truly strong holy water cures lesbianism. []

Comments feed : RSS 2.0. Leave your own comment below, or send a trackback.

26 Responses

  1. > Once the position is larger than the size of M, the function returns R as the hashed value of M.

    Why not waltz full-circle (modularly) around the underflow ?
    i.e. nextpos

  2. Mircea Popescu`s avatar
    2
    Mircea Popescu 
    Friday, 6 January 2017

    Because you gotta stop sometime.

  3. Mircea Popescu`s avatar
    3
    Mircea Popescu 
    Friday, 13 January 2017

    Since alf decided to vaguely open the discussion in the log,

    asciilifeform: !~later tell mircea_popescu while sleeping i had a thought re http://trilema.com/2017/towards-a-better-hash-function and my earlier attempt thereof likewise : that the thing may not in fact be ~collision-resistant~ (it may not be so difficult to take an arbitrary desired output and find a tape that hashes to it.)

    let me point out that I won't entertain discussions pertaining to this hash outside of comments here ; or else articles on blogs which have the capacity to pingback or something. Finding discussions in #trilema logs is already a hard enough problem ; no need to make it even harder for absolutely no reason through lazily misplacing there discussions that do have a natural genesis otherwise.

    As to the subject of collision resistance : every hash function has collisions as a matter of definition : if the message M is 65 bits long and the hash result R is set to 64 bits, then there will be on average 264 = 18`446`744`073`709`551`616 collisions over the entire space, or about every other M will collide with something. This results from elementary properties of numbers, you can't shove four apples in two boxes so that there's one apple per box.

    So : if R is not shorter than (most) M then the function is not a hashing function (I suspect USGtards would call it a "PRNG") ; if R is shorter than (some) M then there's going to be (some) collision going on. These are the definitions of our terms.

    The problem of finding collisions is an entirely different kettle of fish.

    Should the length of R be set to one bit, then in a list of arbitrarily chosen Ms every other one would collide, on average, and irrespective of length. Finding collisions would require calculating slightly more than one hash (by a formula produced by l'Hospital, not going to go into detail here).

    Should the length of R be set to something reasonable (which, for this hash, as opposed to any other, is very easy to do), and should the length of the message M not be known by the attacker, then the problem he faces consists of finding a message M' so that MPfhf(M) = MPfhf(M'). The brute force method to do this is to take as final S all numeric values from 0 to an arbitrary number T, and work backwards to a legal state. It is to be expected that some of these final values for S the attacker chooses will produce a legal final state (ie, with S = "0") after a finite number of steps. It is however also to be expected that some won't, and as far as I know there isn't currently a way known to establish whether the brute force hash inversor stops after a finite number of steps. (Then again this problem may in fact be mitigated by limiting the hash reverser to a number of steps chosen on the basis of assumptions wrt to the length of the message, as suggested by the examples given in the article).

    In particular, should the length of M be known to the attacker as an upper bound (M is no longer than 64 bytes, say, or about 2% of this comment) then this method reduces to taking all values from 0 to 21024 and working backwards, for a total of 21025-1 tries. If the attacker knows the exact length of M, then the values to be taken are from 22lM-1 to 22lM -1, or roughly speaking 22lM.

    The variance seen in practice, where a short 63 bit (8 characters) message required from a minimum of 231 to a maximum of 264 steps and ended with a S from a minimum size of 194 to a maximum size of 228 bits would suggest that assumptions to be made in reversing the hash can't be very narrow. That 228 to 194 bit differential in final S accounts for an immense numeric space, longer certainly than the entire strength of any and all hashing schemes currently in use.

    It should also be pointed out that with each successive step, the possible message space doubles, so if the attacker is expecting to find a collision for a hash of a message that is precisely 64 bytes long then he is looking at 21024 distinct messages to be processed against the 21025 machine states (because at every successive step, the message bit corresponding could be either 0 or 1). A nontrivial task as such.

    It would be very interesting to see a better-than-bruteforce attack described for this hashing scheme, and I strongly urge all cryptographers, of the actual as opposed to the commonly seen pretend kind, to set to work and make a name for themselves through presenting such an attack.

    It would also be very interesting to see efficient hash and reverse-hash processes implemented, so actual timing (in both terms of microseconds and cycles) can be calculated. So far there's been a remarkable resistence to any sort of useful work doing on this front, notwithstanding that unlike most other activities, this is both approachable and

    That is all.

  4. This is a very good likbez re subj of hash collisions.

    However, my own research has not turned up anything encouraging that would differentiate the current state of the art re: collision-resistance of any known hash algo, from the quite similarly pseudoscientifically alleged attack-resistance of any block ciphers. Exactly same arguments from "I find it confusing, therefore the enemy must be similarly confused", "let's mix it up! and again! and again!" and other claptrap in the same vein.

    In so far as is known to me, no known hash algo has any (publicly known, various USG algos may well have nonpublic proofs of weakness...) actual scientific basis whatsoever. Nor are any statements that have been uttered regarding their collision resistance, supported by actual logic of whatever kind. (Wishful thinking, and "not broken yet!" are not logic.) The only thing that is easily turned up is the record of hash algos known to be broken (collisions known to have been found on demand at reasonable expense.) Which is not particularly useful.

  5. Mircea Popescu`s avatar
    5
    Mircea Popescu 
    Friday, 13 January 2017

    This is fair, and I readily admit that I have no further basis to suspect the scheme described may be workable than "it feels this way to me". I know of no proper approach to the matter, which is why I've yet to do anymore than that myself.

    It would be not at all an easy task to undertake defending the proposition that sha-512 say is any stronger than the above.

  6. The only thing that I was able to establish with certainty is that if ~anyone~ knows of a basis to logically support the strength of even one yet-unbroken hash algo -- they aren't telling. And that discussing the subj with so-called "professional" cryptographers is an activity of strictly entomological interest.

  7. Mircea Popescu`s avatar
    7
    Mircea Popescu 
    Friday, 13 January 2017

    Myeah. I don't see much path forward other than actually doing the basic work of implementation.

  8. Should not flipping 2nd (0th, 1st, 2nd) bit of 010 result in 011, and is not remainder of 15 against 4 3?

  9. Mircea Popescu`s avatar
    9
    Mircea Popescu 
    Wednesday, 1 March 2017

    Do you actually mean that

    The second step, we multiply 2 with our position in M, to obtain the value 10, and then remainder it against the bit count of S, obtaining 1.

    should in fact read

    The second step, we multiply 2 with our position in M, to obtain the value 10, and then remainder it against the bit count of S, obtaining 2.

    and that the rest of the operations are incorrect from there on ?

    I'd say this is not the case, the bitcount of S is 3, and 10 % 3 = 1.

    If you actually mean something else please give explicit difs, it's rather impossible to guess-follow otherwise.

  10. Sorry, I'll try again.

    The last step, we multiply 4 with our position in M, to obtain the value 20, and then remainder it against the bit count of S, obtaining 2. We thereby flip the 2nd bit in S, thus S becomes "000".

    Should read:

    The last step, we multiply 4 with our position in M, to obtain the value 20, and then remainder it against the bit count of S, obtaining 2. We thereby flip the 2nd bit in S, thus S becomes "011".

    "2nd bit" being in the last position, as we zero-index (0th, 1st, 2nd).

    And...

    The last step, we multiply 3 with our position in M, to obtain the value 15, and then remainder it against the bit count of R, obtaining 1. We thereby flip the 1st bit in R, thus R becomes "1001".

    Should read:

    The last step, we multiply 3 with our position in M, to obtain the value 15, and then remainder it against the bit count of R, obtaining 3. We thereby flip the 3rd bit in R, thus R becomes "1100".

    As the bit-count of R is four; 15 mod 4 returns 3.

  11. Mircea Popescu`s avatar
    11
    Mircea Popescu 
    Wednesday, 1 March 2017

    Ah right you are, seems I fucked it up.

  12. Mircea Popescu`s avatar
    12
    Mircea Popescu 
    Saturday, 10 June 2017

    The spec was updated as per discussion in the logz.

  13. #!/usr/bin/env python
    from sys import argv
    
    def stringify(field):
        return ''.join(map(str, field))
    
    def flip(field, index):
        field[index] = 0 if field[index] == 1 else 1
    
    def invert(field):
        for i in range(0, len(field)):
            flip(field, i)
    
    def screw(a, b, half):
        count = len(a)/2 if half else len(a)
        for i in range(0, count):
            flip(b, ((i * M_pos) % len(b)))
    
    def expand():
        S.append(0)
    
    message = argv[1]
    R_len = int(argv[2])
    
    M = list(map(int, ''.join(map(lambda p: format(p, 'b'), map(ord, message)))))
    S = [0]
    R = [0] * R_len
    M_pos = 0
    step = 0
    
    while M_pos < len(M):
        if M[M_pos] == 0:
            expand()
            screw(S, R, False)
            if R[M_pos%R_len] == 0:
                flip(R, M_pos%R_len)
                if M_pos != 0: M_pos -= 1
            else:
                flip(R, M_pos%R_len)
                invert(S)
        else:
            screw(S, R, True)
            if R[M_pos%R_len] == S[M_pos%len(S)]:
                expand()
                screw(R, S, False)
            else:
                flip(R, M_pos%R_len)
        M_pos += 1
        step += 1
    
    print(stringify(R))
  14. package main
    
    import (
        "flag"
        "fmt"
        "strings"
        "strconv"
        "runtime"
    )
    
    func toBinArray(message string) []int {
        binString := ""
        var binArray []int
    
        for _, c := range message {
            binString = fmt.Sprintf("%s%b", binString, c)
        }
    
        for _, i := range strings.Split(binString, "") {
            j, err := strconv.Atoi(i)
            if err != nil {
                panic(err)
            }
            binArray = append(binArray, j)
        }
    
        return binArray
    }
    
    func flip(field []int, index int) {
        if field[index] == 0 {
            field[index] = 1
        } else {
            field[index] = 0
        }
    }
    
    func invert(field []int) {
        for i, _ := range field {
            flip(field, i)
        }
    }
    
    func screw(a []int, b []int, M_pos int, half bool) {
        count := 0
        if half {
            count = len(a)/2
        } else {
            count = len(a)
        }
    
        for i := 0; i
  15. Mircea Popescu`s avatar
    15
    Mircea Popescu 
    Saturday, 10 June 2017

    Thanx for posting!

  16. single threaded mpfhf-golang can provide a 32-bit mpfhf hash of GPLv2 in ~7m

    http://wotpaste.cascadianhacker.com/pastes/nfTbm/?raw=true

  17. Mircea Popescu`s avatar
    17
    Mircea Popescu 
    Saturday, 10 June 2017

    Sounds just about as it should be.

  18. In the article, footnote 2, it describes the screw going from 1 to the number of bits, but Sina's python implementation posted goes from 0 to number of bits - 1. Should one of these be corrected?

  19. Mircea Popescu`s avatar
    19
    Mircea Popescu 
    Saturday, 10 June 2017

    Yes, the footnote. Basically we encountered some strange in terms of counterintuitive behaviour, which frankly is still being investigated because wtf.

  20. Mircea Popescu`s avatar
    20
    Mircea Popescu 
    Friday, 30 June 2017

    The article was substantially edited today. I expect we have a mature item now, for which reason the title changed also.

  21. The wording for the screw operation is slightly ambiguous. When you say "screw R in S", do you mean you are flipping bits of R or flipping bits of S?

  22. Mircea Popescu`s avatar
    22
    Mircea Popescu 
    Thursday, 6 July 2017

    In S -- the first variable of the screw function provides indexing and the second gets operated on. (That's why they're ordered S, R and then R, S. Also that's why the word respectively is in there, this being the convention to avoid ordering ambiguities with a minimum of effort.)

  1. [...] Mircea Popescu Various people pointed out to me that they can't get to the better hash function article through google. Everything else, yes, that -- [...]

  2. [...] public key, it is required to also provide a padding scheme as part of its specification. [↩]MP's Fabulous Hash Function. [↩]Specifically no NIST "standards" are being contemplated or seriously considered. All [...]

  3. [...] the basis of randomly selecting one byte from the UIS, one byte from the ISR, concatenating them, MPFHF-ing the two byte product to obtain a one byte mask. The UMM will never be divulged to the [...]

  4. [...] stuck believing me, I keep detailed logs of this sort of thing), I wrote an implementation of Mircea Popescu's Fabulous Hash Function, in the process finding a few mistakes in his [...]

Add your cents! »
    If this is your first comment, it will wait to be approved. This usually takes a few hours. Subsequent comments are not delayed.