
Opening/Reading arbitrary files in binary?

TecHNooB

Is there a very straightforward way to do this in C? Can I just open something and look at all the 1s and 0s?
 
Sure. You are supposed to use mode "rb" with fopen to ensure "binary" mode. Then just use fread to fill a buffer with data. fread expects record size and count parameters to be supplied, so it can be conveniently used to read structs directly.
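For illustration, here is a minimal sketch of the "rb" plus fread approach described above. The function names (byte_to_bits, dump_file) and the 4096-byte buffer size are made up for this example; it simply prints each byte as hex alongside its 1s and 0s.

```c
#include <stdio.h>
#include <stddef.h>

/* Write the 8 bits of one byte, most significant first, into out[9]. */
void byte_to_bits(unsigned char byte, char out[9])
{
    for (int i = 0; i < 8; i++)
        out[i] = (byte & (0x80u >> i)) ? '1' : '0';
    out[8] = '\0';
}

/* Dump a file as hex and binary, one byte per line.
   Returns the number of bytes read, or -1 if the file cannot be opened. */
long dump_file(const char *path, FILE *out)
{
    FILE *fp = fopen(path, "rb");   /* "rb": binary mode, no newline translation */
    if (fp == NULL)
        return -1;

    unsigned char buffer[4096];     /* arbitrary chunk size for this sketch */
    char bits[9];
    long total = 0;
    size_t n;

    /* Record size 1, count sizeof buffer: fread returns the bytes read. */
    while ((n = fread(buffer, 1, sizeof buffer, fp)) > 0) {
        for (size_t i = 0; i < n; i++) {
            byte_to_bits(buffer[i], bits);
            fprintf(out, "%02x  %s\n", (unsigned)buffer[i], bits);
        }
        total += (long)n;
    }
    fclose(fp);
    return total;
}
```

Calling dump_file("some.txt", stdout) from main would print the whole file this way. Using unsigned char for the buffer avoids sign extension when a byte above 0x7f is printed with %x.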
 
Alright, at the moment I just arbitrarily chose a txt file to read and it outputs the hex correctly. As an aside, why are these statements not equivalent?
Code:
printf("%c  %x\n", *buffer, *buffer); *buffer++

and

Code:
printf("%c  %x\n", *buffer, *buffer++);
 
I'll assume you meant "buffer++;" at the end there.

The reason is that C does not specify the order of evaluation of arguments to a function (much like Scheme, oddly). That means the compiler is free to choose to execute "*buffer++" before "*buffer" if it so desires.

In general, do not rely on side-effects that occur inside the arguments to a function call. Do it separately and explicitly, like in the first statement.
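A small sketch of the "separately and explicitly" version, with a hypothetical helper name (print_and_advance) and snprintf used instead of printf so the result can be inspected. The read of *buffer and the increment live in different statements, so the result does not depend on argument evaluation order.

```c
#include <stdio.h>

/* Format one byte as "<char>  <hex>" into out, then advance the pointer.
   Returns a pointer to the next byte. */
const unsigned char *print_and_advance(const unsigned char *buffer,
                                       char *out, size_t outsz)
{
    /* Both arguments only read *buffer; nothing is modified here. */
    snprintf(out, outsz, "%c  %x", *buffer, (unsigned)*buffer);

    /* The side effect happens in its own statement. */
    buffer++;
    return buffer;
}
```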
 
Sorry, but no. http://en.wikipedia.org/wiki/Operators_in_C_and_C%2B%2B

ANSI C has a very strict order-of-operations definition for every base operator (including ++ and --). ++ and -- just so happen to be at the top of the operator food chain when used as postfix; as a prefix they are on the same level as *, which means that ++*thing is different from *++thing (which, by the way, is valid ANSI C). Operators get evaluated before function calls.

So yes, while the order the variables are pushed on the stack isn't defined by C, the order of evaluation is very well defined.

Though, I do agree with you. I would never write *thing++; that just leads to too many headaches for just about everyone.
 
I wasn't talking about the operators, just the arguments to the function. Perhaps I should have been clearer when I said it:

That means the compiler is free to choose to execute "*buffer++" (from argument 2) before "*buffer" (from argument 1) if it so desires.
(fixed)

You wrote:
So yes, while the order the variables are pushed on the stack isn't defined by C, the order of evaluation is very well defined.

Right, this is the way you phrased it, though I would not have talked about variables being pushed on the stack because that implies a specific ABI, and also it glosses over the fact that the arguments were expressions which needed further evaluation (and not just variable look-up).

It is enough to say that "the order of evaluation of function arguments is not defined". At least, in PL theory, which is my field.

Here's how they phrase it in ISO C 9899 6.5.2.2 (10):
The order of evaluation of the function designator, the actual arguments, and subexpressions within the actual arguments is unspecified, but there is a sequence point before the actual call.
 
I love a good PL quote in the morning. In other words:
If you play around with side effects, you get burned. Don't do it -- even when you know what you're doing.
 
Another question: what's a good way of collecting the bits for each symbol for a Huffman encoder if you already have the tree assembled? I'm able to draw the tree and figure out what everything is by hand, but if I wanted to do it in code, I would have to link the leaves and the root together such that you can travel back up the tree. Is there a better way?
 
You use bit-wise operations to set/clear bits in a byte or word, and write that to the file.
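One possible way to avoid parent links altogether is to walk down from the root and record the path taken, so each leaf's code is known by the time it is reached. The sketch below assumes a hypothetical node layout (struct node with left, right and symbol), a code length cap of 64, and codes kept as '0'/'1' strings for readability; pack_bits then shows the bitwise step of turning such a string into bytes.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CODE_LEN 64   /* assumed cap on code length for this sketch */

/* Hypothetical node layout: a leaf has both children NULL. */
struct node {
    struct node *left, *right;
    unsigned char symbol;
};

/* Walk down from the root, appending '0' for left and '1' for right.
   On reaching a leaf, the path so far is that symbol's code.
   path must have room for MAX_CODE_LEN + 1 characters. */
void assign_codes(const struct node *n, char *path, size_t depth,
                  char codes[256][MAX_CODE_LEN + 1])
{
    if (n == NULL || depth > MAX_CODE_LEN)
        return;
    if (n->left == NULL && n->right == NULL) {
        path[depth] = '\0';
        strcpy(codes[n->symbol], path);
        return;
    }
    path[depth] = '0';
    assign_codes(n->left, path, depth + 1, codes);
    path[depth] = '1';
    assign_codes(n->right, path, depth + 1, codes);
}

/* Pack a string of '0'/'1' characters into bytes, most significant bit first.
   Returns the number of bytes used; the last byte is zero-padded. */
size_t pack_bits(const char *bits, unsigned char *out)
{
    size_t nbits = 0;
    for (; bits[nbits] != '\0'; nbits++) {
        if (nbits % 8 == 0)
            out[nbits / 8] = 0;                 /* start a fresh byte */
        if (bits[nbits] == '1')
            out[nbits / 8] |= (unsigned char)(0x80u >> (nbits % 8));
    }
    return (nbits + 7) / 8;
}
```

Start the walk with assign_codes(root, path, 0, codes), where path is a scratch array of MAX_CODE_LEN + 1 chars. A real encoder would probably store codes as integers plus a length rather than strings, and would need to record how many padding bits the last byte contains.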
 
🙁 This is going to be a pita
Not if done right. If you are doing bitwise operations, I recommend the following.

1. Your file usage flow should look something like this: open file -> load file into a buffer (unless it is too big) -> close file -> modify buffer, performing all the bit operations you like -> write the buffer to the file/new file.

I do not recommend an open file -> until finished (read byte -> write byte) -> close file flow. That just results in WAY too many gets and prints, which ends up being slower than it should be. You are most likely not going to run out of memory, so you might as well use it. (Plus, the first method lends itself quite nicely to an async file I/O setup.)

2. If the specific bits mean something (they should if you are modifying them), use enums or macros. They are great tools for making unreadable bitwise operations much more readable, e.g.

flags = isHere | wasthere | wantThere;

Much better than

flags = 0x1 | 0x2 | 0x4;
or worse
flags = 0x7;

Don't worry about operator performance here. Compilers are smart enough to combine everything so that the extra |s disappear.

3. Learn what the different bitwise operators can do. Specifically, you might want to read http://graphics.stanford.edu/~seander/bithacks.html You can do some amazing stuff without branching, e.g. count the number of 1s in a 32-bit word (seriously, you don't need ifs, whiles, or loops to do this, and the code is just about as long to write).
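To make points 2 and 3 concrete, here is a rough sketch. The flag names and helper functions are invented for illustration, and popcount32 is one common branch-free bit-counting formulation; it assumes unsigned int is 32 bits wide, which is typical but not guaranteed.

```c
#include <stdint.h>

/* Hypothetical flag names, one bit each. */
enum {
    IS_HERE    = 1u << 0,
    WAS_THERE  = 1u << 1,
    WANT_THERE = 1u << 2
};

/* Small helpers so call sites read as intent rather than raw masks. */
unsigned set_flag(unsigned flags, unsigned f)   { return flags | f; }
unsigned clear_flag(unsigned flags, unsigned f) { return flags & ~f; }
int      has_flag(unsigned flags, unsigned f)   { return (flags & f) != 0; }

/* Count the 1 bits in a 32-bit word with no branches or loops.
   Each step sums neighbouring groups of bits: pairs, nibbles, bytes. */
unsigned popcount32(uint32_t v)
{
    v = v - ((v >> 1) & 0x55555555u);                 /* 2-bit sums */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); /* 4-bit sums */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums */
    return (uint32_t)(v * 0x01010101u) >> 24;         /* add the four bytes */
}
```

With these names, flags = IS_HERE | WAS_THERE | WANT_THERE; produces the same value as 0x7 but says what it means.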
 
I recommend looking into C bitfields.
e.g.,
http://publications.gbdirect.co.uk/c_book/chapter6/bitfields.html

A C compiler will do the masking and shifting and such for you, if you use bit fields properly. Bear in mind that no C compiler has access to a true single-bit operation -- it will generally read/write bytes, though only change the bits you want. It may also do it out of apparent memory order, and cannot make changes atomically, so use caution when clobbering bits in control registers.
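As a rough illustration of bit fields, here is a sketch with a made-up struct status and field names. The compiler generates the shifts and masks for each member access. Note that how the fields are laid out within the storage unit is implementation-defined, so this is an assumption-laden convenience for in-memory use rather than a portable file format.

```c
/* Hypothetical status word: two 1-bit flags and a 3-bit mode. */
struct status {
    unsigned ready : 1;
    unsigned error : 1;
    unsigned mode  : 3;   /* holds values 0..7 */
};

/* Build a status value; inputs are masked to the width of each field. */
struct status make_status(unsigned ready, unsigned error, unsigned mode)
{
    struct status s = {0};
    s.ready = ready & 1u;
    s.error = error & 1u;
    s.mode  = mode & 7u;
    return s;
}
```

Assigning an out-of-range value to an unsigned bit field wraps modulo the field width, so storing 9 in the 3-bit mode field leaves 1.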

As for how/when to open/close your file, it depends on when you need the bits changed with respect to how other processes perceive the file. If you care to elaborate on your needs, we can comment in some detail.
 