How many different possible combinations are there for a random megabyte?

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Originally posted by: sao123
2 ^ 1048576
too large for any 32bit computer to compute.

That's incorrect; 1,048,756 is the number of bytes in a megabyte (well, a mebibyte, if you want to get specific; a "mega"byte should technically be 1,000,000 bytes), not the number of bits. The number of unique bit combinations in a mebibyte is:

2 ^ ((2^8) * (2^20)) = 2 ^ (2 ^ 28) = 2^(268,435,456)

That's a really, really big number.
 

sao123

Lifer
May 27, 2002
12,653
205
106
I realized my mistake as soon as i walked away from my computer...and went to lunch...

would not the correct value be
2 ^ (8 * 1024 * 1024) or
2 ^ 8388608
 

imported_Nail

Senior member
May 23, 2004
218
1
0
I've decided to chop this into sections of about nine digits for consumption.

2^8388608 == 256^1048576 == 256 * 256 * 256... == (256 * ( 2*10^2 )) + (256 * (5*10^1)) + (256 * (6*10^0))...

So I'll slam all of this into an array and add in sections. Is there an easier way?


Actually, I have a 64-bit machine, yet I don't know how to use the 64-bit functions (oy!). I doubt it would help much either way.
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Originally posted by: sao123
I realized my mistake as soon as i walked away from my computer...and went to lunch...

would not the correct value be
2 ^ (8 * 1024 * 1024) or
2 ^ 8388608

2 ^ (256 * 1024 * 1024), not (8 * 1024 * 1024). There are 2^8 = 256 possible values per byte, not 8.

Nail, I'm not sure what you're trying to do with this. You have the number... or do you just want it in decimal?
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Originally posted by: Nail
2 ^ (256 * 1024 * 1024), not (8 * 1024 * 1024).
It's 2^(1024^2) * (2^8 = 256)

Ah, you're right. I put the 2^8 in the exponent instead of 2^3. Grrrr. I did know the first one was wrong, though. :p

8 (2^3) bits per byte, 256 (2^8) permutations per byte. So it's 2^(2^3 * 2^20), or 2^(2^23) = 2^(8,388,608), as given above. However, it's not 2^(2^20) * 2^8, which would be 2^(2^20 + 8), which is not 2^(2^23).

do you just want it in decimal?
Yes.

Well, you can do this:

x = 2^(2^23)
log2(x) = 2^23

And then convert from log2 to log10 (I'm blanking on what you do here), and then get it in the form x = 10^(a) (that is, scientific notation). That would be pretty close. Trying to calculate it out on a computer would be a PITA, as the number of bits required to represent this number is on the order of several megabytes (at least 2^23 bits!)
 

cquark

Golden Member
Apr 4, 2004
1,741
0
0
Originally posted by: Nail
So I'll slam all of this into an array and add in sections. Is there an easier way?

Actually, I have a 64-bit machine, yet I don't know how to use the 64-bit functions (oy!). I doubt it would help much either way.

64-bit integers are much too small for these calculations, as they only go up to around 10^18.

While you could write your own bigint library, you could use one that's already been written. Python will automatically use its bigint library for integers larger than machine size. Perl and Java come with bigint libraries too. It won't be fast, though, and you probably don't want all the decimal places anyway, so I would suggest approximation the way Matthias indicates above.

While python's calculating the result of 2^(2^23), I'll do the approximation by hand:

log_10(2^(2^23)) = log_2(2^(2^23)) / log_2 (10) = 2^23 / 3.32 =~ 2/3 * 2^22

so 2^(2^23) =~ 4.64 * 10^(2^22)

which is a number of about four million digits. The python interpreter takes under half an hour to calculate this on an 400MHz Ultrasparc II, so if you really need the full expansion, it should only take a few minutes on a modern processor.
 

TuxDave

Lifer
Oct 8, 2002
10,571
3
71
You could create your own datatype and program it. Make one variable represent 1-10 digit integers
another variable represent the digits from 11-20, etc.....

Now you just have to make your own add functions or multiply functions and you're good to go. With 10 of such variables, you can make numbers up to a google.
 

sao123

Lifer
May 27, 2002
12,653
205
106
I am performing the calculation...using an array based multiplier.
The array im using has (1024 * 1024 * 4) digits in it. I'm hoping that
a) I wont run out of memory.
b) That will have enough didgits for the calculation.

I suspect it will take about 4-6 hours to complete this task.
 

TuxDave

Lifer
Oct 8, 2002
10,571
3
71
Originally posted by: sao123
I am performing the calculation...using an array based multiplier.
The array im using has (1024 * 1024 * 4) digits in it. I'm hoping that
a) I wont run out of memory.
b) That will have enough didgits for the calculation.

I suspect it will take about 4-6 hours to complete this task.

Good luck. I was trying to cut down on time by instead of multiplying by 2 every time by 2^12 every time. Should shorten it the computation a bit unless you're using a loop friendly software. (Matlab hates loops)
 

Shalmanese

Platinum Member
Sep 29, 2000
2,157
0
0
I have the answer but I can't post it as the number is over 2MB long. If your interested, the starting digits are: 4264487423... and there are exactly 2525223 digits.

I calculated it using the language Haskell which supports arbitrarily large integers.

If anybody is REALLY interested in the results, email me at shalmanese AT dart DOT net DOT au with an offer of webspace and I'll post it up somewhere.
 

sao123

Lifer
May 27, 2002
12,653
205
106
What i want to know is.... if you put this entire number in a notepad (.txt) file, what is the size of that file?
 

f95toli

Golden Member
Nov 21, 2002
1,547
0
0
Each digit(char) requires one byte
So 2525223 bytes I guess.

And btw the latest versions of Malab handles loops quite well (it uses a JIT compiler)
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Originally posted by: sao123
What i want to know is.... if you put this entire number in a notepad (.txt) file, what is the size of that file?

Depends on the encoding. If you take the ASCII representation of it (as f95toli suggests), it will be 2525223 bytes. If it's stored in a binary encoding, an n-bit number requires log2(n) bits to store.
 

gsellis

Diamond Member
Dec 4, 2003
6,061
0
0
Originally posted by: Matthias99
Originally posted by: Nail
2 ^ (256 * 1024 * 1024), not (8 * 1024 * 1024).
It's 2^(1024^2) * (2^8 = 256)

Ah, you're right. I put the 2^8 in the exponent instead of 2^3. Grrrr. I did know the first one was wrong, though. :p

8 (2^3) bits per byte, 256 (2^8) permutations per byte. So it's 2^(2^3 * 2^20), or 2^(2^23) = 2^(8,388,608), as given above. However, it's not 2^(2^20) * 2^8, which would be 2^(2^20 + 8), which is not 2^(2^23).

do you just want it in decimal?
Yes.

Well, you can do this:

x = 2^(2^23)
log2(x) = 2^23

And then convert from log2 to log10 (I'm blanking on what you do here), and then get it in the form x = 10^(a) (that is, scientific notation). That would be pretty close. Trying to calculate it out on a computer would be a PITA, as the number of bits required to represent this number is on the order of several megabytes (at least 2^23 bits!)

Minus 1. ;)
 

TuxDave

Lifer
Oct 8, 2002
10,571
3
71
Originally posted by: Matthias99
Originally posted by: sao123
What i want to know is.... if you put this entire number in a notepad (.txt) file, what is the size of that file?

Depends on the encoding. If you take the ASCII representation of it (as f95toli suggests), it will be 2525223 bytes. If it's stored in a binary encoding, an n-bit number requires log2(n) bits to store.

But the binary encoding method would be very compressable since it would just be a 1 with a whole bunch of zeros after it.
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Originally posted by: TuxDave
Originally posted by: Matthias99
Originally posted by: sao123
What i want to know is.... if you put this entire number in a notepad (.txt) file, what is the size of that file?

Depends on the encoding. If you take the ASCII representation of it (as f95toli suggests), it will be 2525223 bytes. If it's stored in a binary encoding, an n-bit number requires log2(n) bits to store.

But the binary encoding method would be very compressable since it would just be a 1 with a whole bunch of zeros after it.

For this particular number (since it's of the form 2^x), that's true. Essentially all you're storing is the exponent (which itself is of the form 2^x). So essentially you only have to save the "x" (which is not of the form 2^x, and so you have to store a full binary expansion of it), and make a note that the actual number you want is 2^(2^x).
 

Shalmanese

Platinum Member
Sep 29, 2000
2,157
0
0
Surprisingly, the resulting text file I have is very uncompressable using RAR compression, something of the order of 50%, I would have expected something closer to 95% since we are only really using 10/256 of the ASCII charecter range. Odd.
 

glugglug

Diamond Member
Jun 9, 2002
5,340
1
81
This C code isn't as optimized as it could be but it will do the trick.
After you compile it just put the exponent as a command-line param, i.e.

2ToTheN 8388608

You can sort of watch the progress by looking at VM Size in task manager -- i.e. if the program is taking 1MB of virtual memory that means it is somewhere around 2^1000000

Code:
#include <iostream>
#include <string>

using namespace std;

//assumes "digits" in string are ascii \0 through ascii 9 (cntrl-i) rather than the characters '0'-'9'
//for speed purposes
void doubleStringValue(string&amp; s)
{
	long len = s.length();
	int carry = 0;
	for(long pos=len-1;pos>=0;pos--)
	{
		char *c=&amp;(s[pos]);
		int digval = (*c << 1) | carry;
		carry = (digval>9);
		*c = (carry)?digval-10:Digval;
	}
	if(carry) s=string("\01")+s;
}	

int main(int argc, char *argv[])
{
	if(argc<2)
	{
		cout << "Usage: " << argv[0] << " <n>\n";
		return 0;
	}

	int n = atoi(argv[1]);
	
	string result="\01";
	for(int x=0;x<n;x++)
	{
		doubleStringValue(result);
	}
	int len=result.length();
	for(x=0;x<len;x++)
	{
		result[x] = '0'+result[x];
	}
	cout << result << "\n";
	return 0;
}
 

glugglug

Diamond Member
Jun 9, 2002
5,340
1
81
I made a version of this a bit over 2 orders of magnitude faster by modifying a bigint class I found online. (the bigint class was pretty slow to start with, biggest changes I made is the way the numbers are stored during calculation instead of being an array of chars it's now a vector<int>, with each vector element representing 9 digits, and I added multilpy and power functions. The power function shifts off the bits of the exponent, essentially processing it in binary, and for each x^(2^n), x^(2^(n+1)) is found by squaring it, which gives the values for each of the 1 bits in the exponent. These are then multiplied together.

I think it should be possible for an additional 4x improvement on AMD64 by changing the storage type from vector<int> to vector<__int64>, and changing the vector<__int64> now used to hold the result before carrys are processed to a vector<__int128>. How do you post/attach code here without it getting UBB processed?

I know it sounds counter-intuitive that the improvement should be more than 2x for doubling the number of bits per register, but the number of operations involved in multiplying is proportional to the number of digits squared, or in this case number of blocks of 9 digits squared, and 64-bit would change those blocks of 9 digits to blocks of 19 digits. Also, working with larger blocks in this case would almost cut the memory and cache bandwidth usage in half due to fewer outer loop iterations.

Anyways, it looks like the best compression for this is UHARC, which gets a 41.7% ratio from the original, beating 7zip by 50K. This is extremely surprising because if you take a RANDOM series of digits, you get the ratio down to 41.524% (1/log 256) just by converting it from decimal chars to binary. This has me thinking just returning digit sections from 2^(somehugenum) is actually an excellent random number generator.

It occurs to me that with the "randomness" of the digits, the branch prediction in determining when to carry has got to be terrible, probably a mispredict rate around 35%, even taking the nearly-always-taken loop branches into account. Made a slight modification to make it essentially always carry (often carrying a zero) to avoid branches. This amazingly yielded more than another order of magnitude improvement by not having to flush pipelines from branch mispredicts so often. (so it's now literally over 1000x faster than it was originally), at least until it starts cache thrashing (around 2^1000000 on a Thunderbird core w/256K L2).

modified bigint.cpp
modified bigint.h
compiled binary (save as and run from command line)
 

SonicIce

Diamond Member
Apr 12, 2004
4,771
0
76
Name Abbr. Size
Kilo K 2^10 = 1,024
Mega M 2^20 = 1,048,576
Giga G 2^30 = 1,073,741,824
Tera T 2^40 = 1,099,511,627,776
Peta P 2^50 = 1,125,899,906,842,624
Exa E 2^60 = 1,152,921,504,606,846,976
Zetta Z 2^70 = 1,180,591,620,717,411,303,424
Yotta Y 2^80 = 1,208,925,819,614,629,174,706,176