# Towards a better hash function

Thursday, 05 January, Year 9 d.Tr. | Author: Mircea Popescu

This proposal stems from a recent discussion in the forum of The Most Serene Republic, and consists of a readaptation of an older idea of Stan's to the task at handi.

The item here contemplated is then an algorithmic, rather than algebraic, hash function. It consists of three elements and four operations, which we proceed to define :

Elements of the fabulous hash function :

• One element is the message to be hashed, M, which is a field of bits of unspecified length.
• Another element is the so called state machine, S, which starts as one null bit and grows to an unspecified length.
• The last element is the result of the hashing, R, which is a field of bits of user-specified length.

Operations of the fabulous hash function :

• One operation is the bit flip, let it be called flip. This operation consists of toggling one specified bit of either S or R.
• Another operation is the inversion, let it be called invert. This operation consists of toggling all the bits of either S or R.
• A third operation is the flipping of a number of bits in either S or R, let it be called the screw. This operation consists of taking the bit count of either S or R, iterating over that value, at each step multiplying the iterator with the current position in M, calculating the remainder of that product against the bit count of R or S respectively, and flipping the remainder-th bit in R or S respectively. A half-screw will take half the bit count of S or R instead.ii
• The fourth operation is a shift of the state machine, call it expand. This operation consists of adding one null bit at the end of S.
• The last operation is a position rewind, call it a rewind. This operation consists of decreasing our position in the message M by one, except in the case our position is already 0.

On this basis we can now proceed to define our function :

I. The function starts by allocating memory : one bit for S, and the size specified by the user for R. All allocated bits are zero.

II. The function starts at position 0 in M and iterates over each bit in M. These iterations are called steps. During each step, the function considers whether the position-th bit in M is 0 or 1, and executes a defined set of operations in either case. Once the operations have been executed, the position is incremented by one. Once the position is larger than the size of M, the function returns R as the hashed value of M.

One example implementationiii of the fabulous hash function was presented as a prototype. The stepping rule is :

• if 0, expand and screw S in R. If the bit in R found at the position equal to the remainder of the division of our position in M by the size of R is 0
• That bit in R is flipped and we rewind.
• else, that bit in R is flipped and S is inverted.

• if 1, half-screw S in R. If the bit in R found at the position equal to the remainder of the division of our position in M by the size of R is equal to the bit in S found at the position equal to the remainder of the division of our position in M by the size of S
• We expand and then screw R in S.
• else, that bit in R is flipped.

Here's some typical output :

Our work has ended, in 264 steps and using 228 bits for the state machine. The message was
110010011100111100110111001111010001101011110101011001011110010
and the resulting hash is
1111100000011010111001000010111000001111110011001001111011000000

Our work has ended, in 234 steps and using 197 bits for the state machine. The message was
110010011100111100110111001111010001101011110101011001011110011
and the resulting hash is
1101001010111010110011101010000000101111111001100001011011101000

Our work has ended, in 233 steps and using 197 bits for the state machine. The message was
110010011100111100110111001111010001101011110101011001011110100
and the resulting hash is
0001100101111001010001101101111110001110100000000011001000000010

Our work has ended, in 233 steps and using 196 bits for the state machine. The message was
110010011100111100110111001111010001101011110101011001011110101
and the resulting hash is
0011011011111111001001010011110011101010110100101100101110010010

Our work has ended, in 235 steps and using 198 bits for the state machine. The message was
110010011100111100110111001111010001101011110101011001011110110
and the resulting hash is
1001001110010110110101001110100100010001001111010111101111100000

Our work has ended, in 234 steps and using 196 bits for the state machine. The message was
110010011100111100110111001111010001101011110101011001011110111
and the resulting hash is
0011100100111100011111100100001110111011100101111101000101111110

Our work has ended, in 233 steps and using 197 bits for the state machine. The message was
110010011100111100110111001111010001101011110101011001011111000
and the resulting hash is
1100100010001110110101111101010100110111001110111101111011011000

Our work has ended, in 233 steps and using 196 bits for the state machine. The message was
110010011100111100110111001111010001101011110101011001011111001
and the resulting hash is
0110001000000110110111111111011110111111000100111101010001111100

Our work has ended, in 231 steps and using 194 bits for the state machine. The message was
110010011100111100110111001111010001101011110101011001011111010
and the resulting hash is
0100001001110111000110010001100011101101100110010010101011111010

There are a number of important properties to be underlined.

1. The very last bit of M can produce significant change in R, potentially touching all its bits.
2. The fabulous hash-function is message-size indifferent, being operable of messages of any size within the limitations of the hardware only.
3. The fabulous hash-function outputs hashes of any arbitrary length equally well.
4. It is not possible to say, on the basis of a hash, how much memory, or how much CPU was consumed to process the message.
5. The cheapest way to calculate how much memory, or how much CPU will be needed to process a message is to process the message.
6. Because the fabulous hash function does not use blocks, it does not require any kind of padding.

It seems unlikely doing so would be useful, but nevertheless alternative operations could readily be defined.

The actual composition of the step iterator is entirely open, allowing the operator to mix and match operations as he desires, in arbitrarily long chainsiv, with RAM and CPU footprints best adequate to his usecase.

The prototype implementation proposed always terminates ; it is possible to create alternative implementations that do not always terminate, should this be a desirable property (adding more rewinds is a sure ticket in this direction).

———
1. Which is, exactly as he states it, to resolve the major weakness that the hash function can be written as an equation.

The fundamental, intrinsic and so far nor escaped weakness of all currently known hash functions is that a system of equations equal to the count of their possible outputs could be written ; and resolved. No, the usual obscurantism passing for "analysis" in the field does not resolve this problem anymore than sprinkling some truly strong holy water cures lesbianism. []

2. To understand by example, suppose our position in M is 5, R = "1011" and S = "101". If we now screw R in S we will take R's bit count (4), and iterate over it.
• The first step, we multiply 1 with our position in M, to obtain the value 5, and then remainder it against the bit count of S, obtaining 2. We thereby flip the 2nd bit in S, thus S becomes "100".
• The second step, we multiply 2 with our position in M, to obtain the value 10, and then remainder it against the bit count of S, obtaining 1. We thereby flip the 1st bit in S, thus S becomes "110" (was "100" at previous step).
• The third step, we multiply 3 with our position in M, to obtain the value 15, and then remainder it against the bit count of S, obtaining 0. We thereby flip the 0th bit in S, thus S becomes "010".
• The last step, we multiply 4 with our position in M, to obtain the value 20, and then remainder it against the bit count of S, obtaining 2. We thereby flip the 2nd bit in S, thus S becomes "000".

So therefore, screwing R in S at M-position 5 when R = "1011" and S = "101" results in S becoming "000".

Conversely, if we were to screw S in R we take S's bit count (3), and iterate over it.

• The first step, we multiply 1 with our position in M, to obtain the value 5, and then remainder it against the bit count of R, obtaining 1. We thereby flip the 1st bit in R, thus R becomes "1111".
• The second step, we multiply 2 with our position in M, to obtain the value 10, and then remainder it against the bit count of R, obtaining 2. We thereby flip the 2nd bit in R, thus R becomes "1101".
• The last step, we multiply 3 with our position in M, to obtain the value 15, and then remainder it against the bit count of R, obtaining 1. We thereby flip the 1st bit in R, thus R becomes "1001".

So therefore, screwing S in R at M-position 5 when R = "1011" and S = "101" results in R becoming "1001". []

3. Approximate ; the concept has evolved somewhat since that prototype. []
4. The example given is two branch, which we would note 1+1. There's nothing (outside of code readability / implementation complexity concerns) to prevent the usage of a 3 + 2 + 3 scheme, whereby each step consists of 3 ifs, each of which branches into two ifs, each of which branches into three more. []

23 Responses

1. > Once the position is larger than the size of M, the function returns R as the hashed value of M.

Why not waltz full-circle (modularly) around the underflow ?
i.e. nextpos

2. 2
Mircea Popescu
Friday, 6 January 2017

Because you gotta stop sometime.

3. 3
Mircea Popescu
Friday, 13 January 2017

Since alf decided to vaguely open the discussion in the log,

asciilifeform: !~later tell mircea_popescu while sleeping i had a thought re http://trilema.com/2017/towards-a-better-hash-function and my earlier attempt thereof likewise : that the thing may not in fact be ~collision-resistant~ (it may not be so difficult to take an arbitrary desired output and find a tape that hashes to it.)

let me point out that I won't entertain discussions pertaining to this hash outside of comments here ; or else articles on blogs which have the capacity to pingback or something. Finding discussions in #trilema logs is already a hard enough problem ; no need to make it even harder for absolutely no reason through lazily misplacing there discussions that do have a natural genesis otherwise.

As to the subject of collision resistance : every hash function has collisions as a matter of definition : if the message M is 65 bits long and the hash result R is set to 64 bits, then there will be on average 264 = 18`446`744`073`709`551`616 collisions over the entire space, or about every other M will collide with something. This results from elementary properties of numbers, you can't shove four apples in two boxes so that there's one apple per box.

So : if R is not shorter than (most) M then the function is not a hashing function (I suspect USGtards would call it a "PRNG") ; if R is shorter than (some) M then there's going to be (some) collision going on. These are the definitions of our terms.

The problem of finding collisions is an entirely different kettle of fish.

Should the length of R be set to one bit, then in a list of arbitrarily chosen Ms every other one would collide, on average, and irrespective of length. Finding collisions would require calculating slightly more than one hash (by a formula produced by l'Hospital, not going to go into detail here).

Should the length of R be set to something reasonable (which, for this hash, as opposed to any other, is very easy to do), and should the length of the message M not be known by the attacker, then the problem he faces consists of finding a message M' so that MPfhf(M) = MPfhf(M'). The brute force method to do this is to take as final S all numeric values from 0 to an arbitrary number T, and work backwards to a legal state. It is to be expected that some of these final values for S the attacker chooses will produce a legal final state (ie, with S = "0") after a finite number of steps. It is however also to be expected that some won't, and as far as I know there isn't currently a way known to establish whether the brute force hash inversor stops after a finite number of steps. (Then again this problem may in fact be mitigated by limiting the hash reverser to a number of steps chosen on the basis of assumptions wrt to the length of the message, as suggested by the examples given in the article).

In particular, should the length of M be known to the attacker as an upper bound (M is no longer than 64 bytes, say, or about 2% of this comment) then this method reduces to taking all values from 0 to 21024 and working backwards, for a total of 21025-1 tries. If the attacker knows the exact length of M, then the values to be taken are from 22lM-1 to 22lM -1, or roughly speaking 22lM.

The variance seen in practice, where a short 63 bit (8 characters) message required from a minimum of 231 to a maximum of 264 steps and ended with a S from a minimum size of 194 to a maximum size of 228 bits would suggest that assumptions to be made in reversing the hash can't be very narrow. That 228 to 194 bit differential in final S accounts for an immense numeric space, longer certainly than the entire strength of any and all hashing schemes currently in use.

It should also be pointed out that with each successive step, the possible message space doubles, so if the attacker is expecting to find a collision for a hash of a message that is precisely 64 bytes long then he is looking at 21024 distinct messages to be processed against the 21025 machine states (because at every successive step, the message bit corresponding could be either 0 or 1). A nontrivial task as such.

It would be very interesting to see a better-than-bruteforce attack described for this hashing scheme, and I strongly urge all cryptographers, of the actual as opposed to the commonly seen pretend kind, to set to work and make a name for themselves through presenting such an attack.

It would also be very interesting to see efficient hash and reverse-hash processes implemented, so actual timing (in both terms of microseconds and cycles) can be calculated. So far there's been a remarkable resistence to any sort of useful work doing on this front, notwithstanding that unlike most other activities, this is both approachable and

That is all.

4. This is a very good likbez re subj of hash collisions.

However, my own research has not turned up anything encouraging that would differentiate the current state of the art re: collision-resistance of any known hash algo, from the quite similarly pseudoscientifically alleged attack-resistance of any block ciphers. Exactly same arguments from "I find it confusing, therefore the enemy must be similarly confused", "let's mix it up! and again! and again!" and other claptrap in the same vein.

In so far as is known to me, no known hash algo has any (publicly known, various USG algos may well have nonpublic proofs of weakness...) actual scientific basis whatsoever. Nor are any statements that have been uttered regarding their collision resistance, supported by actual logic of whatever kind. (Wishful thinking, and "not broken yet!" are not logic.) The only thing that is easily turned up is the record of hash algos known to be broken (collisions known to have been found on demand at reasonable expense.) Which is not particularly useful.

5. 5
Mircea Popescu
Friday, 13 January 2017

This is fair, and I readily admit that I have no further basis to suspect the scheme described may be workable than "it feels this way to me". I know of no proper approach to the matter, which is why I've yet to do anymore than that myself.

It would be not at all an easy task to undertake defending the proposition that sha-512 say is any stronger than the above.

6. The only thing that I was able to establish with certainty is that if ~anyone~ knows of a basis to logically support the strength of even one yet-unbroken hash algo -- they aren't telling. And that discussing the subj with so-called "professional" cryptographers is an activity of strictly entomological interest.

7. 7
Mircea Popescu
Friday, 13 January 2017

Myeah. I don't see much path forward other than actually doing the basic work of implementation.

8. Should not flipping 2nd (0th, 1st, 2nd) bit of 010 result in 011, and is not remainder of 15 against 4 3?

9. 9
Mircea Popescu
Wednesday, 1 March 2017

Do you actually mean that

The second step, we multiply 2 with our position in M, to obtain the value 10, and then remainder it against the bit count of S, obtaining 1.

The second step, we multiply 2 with our position in M, to obtain the value 10, and then remainder it against the bit count of S, obtaining 2.

and that the rest of the operations are incorrect from there on ?

I'd say this is not the case, the bitcount of S is 3, and 10 % 3 = 1.

If you actually mean something else please give explicit difs, it's rather impossible to guess-follow otherwise.

10. Sorry, I'll try again.

The last step, we multiply 4 with our position in M, to obtain the value 20, and then remainder it against the bit count of S, obtaining 2. We thereby flip the 2nd bit in S, thus S becomes "000".

The last step, we multiply 4 with our position in M, to obtain the value 20, and then remainder it against the bit count of S, obtaining 2. We thereby flip the 2nd bit in S, thus S becomes "011".

"2nd bit" being in the last position, as we zero-index (0th, 1st, 2nd).

And...

The last step, we multiply 3 with our position in M, to obtain the value 15, and then remainder it against the bit count of R, obtaining 1. We thereby flip the 1st bit in R, thus R becomes "1001".

The last step, we multiply 3 with our position in M, to obtain the value 15, and then remainder it against the bit count of R, obtaining 3. We thereby flip the 3rd bit in R, thus R becomes "1100".

As the bit-count of R is four; 15 mod 4 returns 3.

11. 11
Mircea Popescu
Wednesday, 1 March 2017

Ah right you are, seems I fucked it up.

12. 12
Mircea Popescu
Saturday, 10 June 2017

The spec was updated as per discussion in the logz.

13. #!/usr/bin/env python
from sys import argv

def stringify(field):
return ''.join(map(str, field))

def flip(field, index):
field[index] = 0 if field[index] == 1 else 1

def invert(field):
for i in range(0, len(field)):
flip(field, i)

def screw(a, b, half):
count = len(a)/2 if half else len(a)
for i in range(0, count):
flip(b, ((i * M_pos) % len(b)))

def expand():
S.append(0)

message = argv[1]
R_len = int(argv[2])

M = list(map(int, ''.join(map(lambda p: format(p, 'b'), map(ord, message)))))
S = [0]
R = [0] * R_len
M_pos = 0
step = 0

while M_pos < len(M):
if M[M_pos] == 0:
expand()
screw(S, R, False)
if R[M_pos%R_len] == 0:
flip(R, M_pos%R_len)
if M_pos != 0: M_pos -= 1
else:
flip(R, M_pos%R_len)
invert(S)
else:
screw(S, R, True)
if R[M_pos%R_len] == S[M_pos%len(S)]:
expand()
screw(R, S, False)
else:
flip(R, M_pos%R_len)
M_pos += 1
step += 1

print(stringify(R))
14. package main

import (
"flag"
"fmt"
"strings"
"strconv"
"runtime"
)

func toBinArray(message string) []int {
binString := ""
var binArray []int

for _, c := range message {
binString = fmt.Sprintf("%s%b", binString, c)
}

for _, i := range strings.Split(binString, "") {
j, err := strconv.Atoi(i)
if err != nil {
panic(err)
}
binArray = append(binArray, j)
}

return binArray
}

func flip(field []int, index int) {
if field[index] == 0 {
field[index] = 1
} else {
field[index] = 0
}
}

func invert(field []int) {
for i, _ := range field {
flip(field, i)
}
}

func screw(a []int, b []int, M_pos int, half bool) {
count := 0
if half {
count = len(a)/2
} else {
count = len(a)
}

for i := 0; i
15. 15
Mircea Popescu
Saturday, 10 June 2017

Thanx for posting!

16. single threaded mpfhf-golang can provide a 32-bit mpfhf hash of GPLv2 in ~7m

17. 17
Mircea Popescu
Saturday, 10 June 2017

Sounds just about as it should be.

18. In the article, footnote 2, it describes the screw going from 1 to the number of bits, but Sina's python implementation posted goes from 0 to number of bits - 1. Should one of these be corrected?

19. 19
Mircea Popescu
Saturday, 10 June 2017

Yes, the footnote. Basically we encountered some strange in terms of counterintuitive behaviour, which frankly is still being investigated because wtf.

1. [...] Mircea Popescu Various people pointed out to me that they can't get to the better hash function article through google. Everything else, yes, that -- [...]

2. [...] public key, it is required to also provide a padding scheme as part of its specification. [↩]MP's Fabulous Hash Function. [↩]Specifically no NIST "standards" are being contemplated or seriously considered. All [...]

3. [...] the basis of randomly selecting one byte from the UIS, one byte from the ISR, concatenating them, MPFHF-ing the two byte product to obtain a one byte mask. The UMM will never be divulged to the [...]

4. [...] stuck believing me, I keep detailed logs of this sort of thing), I wrote an implementation of Mircea Popescu's Fabulous Hash Function, in the process finding a few mistakes in his [...]

»
If this is your first comment, it will wait to be approved. This usually takes a few hours. Subsequent comments are not delayed.