MatPlus.Net

 Website founded by
Milan Velimirović
in 2006

1:00 UTC
ISC 2024
 
  Forum*
 
 
 
 

Username:

Password:

Remember me

 
Forgot your
password?
Click here!
SIGN IN
to create your account if you don't already have one.
CHESS
SOLVING

Tournaments
Rating lists
1-Oct-2024

B P C F





 
 
MatPlus.Net Forum Internet and Computing Recursive compression idea for tablebases (requires high computing power)
 
You can only view this page!
(1) Posted by Sarah Hornecker [Friday, Sep 7, 2018 00:12]; edited by Sarah Hornecker [18-09-07]

Recursive compression idea for tablebases (requires high computing power)


Dear all,

I don't know if this already exists as compression method. There would be two filetypes used in this method. One is a filetree, the other is the compressed file. The filetree would essentially be a file with a folder and files etc. if unpacked. You know what I mean if you click on a for example ZIP file.

The compressed file would consist of three or more parts. First it would have an unique identifier for the format followed by FF. Then it would have an indicator of the number of compressions (base 254) plus the separator FF. So the beginning of the file would be for example:

54 43 46 FF 00 05 FF

After every layer of uncompression the resulting file can be checked by seeing that it has the beginning. This would allow for ovver 65k layers of compression to make it futureproof. xx xx would be one smaller after each decompression layer, with 00 00 indicating that the filetree and not another compression layer follows.

54 43 46 FF xx xx FF

After that beginning there follows a series of mathematical operators based on the n^2 system. It would take each n^2 number as follows (hex vs. what it means)
00 = 0
01 = 1
02 = 2
03 = 4
04 = 8
05 = 16
06 = 32
07 = 64
08 = 128
09 = 256
0A = 512
0B = 1024
etc.

By writing like this, instead of assigning 00 - FF to 0 - 256 we save a lot of space already. Unfortunately we need some space for mathematical operators, which is why we need a lower base than 256 for this. Let us say we use 224, meaning that the room from E0 to FF is assigned for operators. So for example

03 E8

would not mean the 1000th binary number, but the third one (the number 4) which is used with the operator specified at E8 (for example, addition, subtraction, division, multiplication, factorial, etc.).

So by using those mathematical operations one would output the result(which would be a very large decimal number) and convert it to a hexadecimal number, which is the code of the next iteration. So if for example the operation 03 E2 09 is carried out [in practice this wouldn't happen, as 0B can be used instead], the result is 1024, which would be converted into hex as 04 00, which would be the hex code in the next iteration of the file. Now this is of course no valid file (it needs to start with the file format denomnator and FF), but only for example.

I hope it became clear what I mean. Thanks to Hauke Reddmann for improving my idea (I wanted to use primes at first, but he told me that the n^2 numbers works better).
Possibly it can be refined by using n^2+1 instead of n^2, but that would be something mathematicians would need to look at (I'm not sure if I had understood Hauke correctly).

But I believe that this way very large numbers can be compressed into rather small numbers. The question is if a table of the n^2 numbers is already added (which takes space but saves time) or is generated manually by the end-user. Maybe both options should be given. The issue is that the table necessarily would be very large, otherwise the compression would be ineffective.

Let me know your thoughts and if you think this is a good idea. With the Syzygy technology this isn't necessary for 7-piece EGTB, at least.

The big questions would be:
How can we make a computer find the smallest possible number of operators needed to get to each result? For example, how does he know that 16+2+1 is more efficient than 8+1*2+1 (operaors would be carried out in order, no parentheses or prefereence of multiplication/division, in standard notation it would be (8+1)*2+1 of course.) In the above example it is clear, but with more complex and huge numbers that could be an isue. How does the computer for example when compressing know that 4! is more efficient spacewise [but requires more processing power] than 16+8?
 
(Read Only)pid=16786
(2) Posted by Dmitri Turevski [Saturday, Sep 8, 2018 08:52]

 QUOTE 
The big questions would be:
How can we make a computer find the smallest possible number of operators needed to get to each result?


It was proven that no such program can exist, this result is called "the uncomputability of Kolmogorov complexity".
Wikipedia article on "Kolmogorov complexity" describes all that pretty nicely.
 
(Read Only)pid=16793

No more posts


MatPlus.Net Forum Internet and Computing Recursive compression idea for tablebases (requires high computing power)