Issue
I’ve been delving deeper into Linux and C, and I’m curious how functions are stored in memory.
I have the following function:
void test(){
    printf( "test\n" );
}
Simple enough. When I run objdump on the executable that has this function, I get the following:
08048464 <test>:
8048464: 55 push %ebp
8048465: 89 e5 mov %esp,%ebp
8048467: 83 ec 18 sub $0x18,%esp
804846a: b8 20 86 04 08 mov $0x8048620,%eax
804846f: 89 04 24 mov %eax,(%esp)
8048472: e8 11 ff ff ff call 8048388 <[email protected]>
8048477: c9 leave
8048478: c3 ret
Which all looks right.
The interesting part is when I run the following piece of code:
int main( void ) {
    char data[20];
    int i;
    memset( data, 0, sizeof( data ) );
    memcpy( data, test, 20 * sizeof( char ) );
    for( i = 0; i < 20; ++i ) {
        printf( "%x\n", data[i] );
    }
    return 0;
}
I get the following (which is incorrect):
55
ffffff89
ffffffe5
ffffff83
ffffffec
18
ffffffc7
4
24
10
ffffff86
4
8
ffffffe8
22
ffffffff
ffffffff
ffffffff
ffffffc9
ffffffc3
If I opt to leave out the memset( data, 0, sizeof( data ) ); line, then the right-most byte is correct, but some of them still have the leading 1s.
Can anyone explain:
- why using memset to clear my array results in an incorrect (or inaccurate) representation of the function, and
- what these bytes are stored as in memory: ints? chars? I don’t quite understand what’s going on here. (Clarification: what type of pointer would I use to traverse such data in memory?)
My immediate thought is that this is a result of x86 having instructions that don’t end on a byte or half-byte boundary. But that doesn’t make a whole lot of sense, and shouldn’t cause any problems.
Solution
Here is a much simpler version of the code you were trying to write:
int main( void ) {
    unsigned char *data = (unsigned char *)test;
    int i;
    for( i = 0; i < 20; ++i ) {
        printf( "%02x\n", data[i] );
    }
    return 0;
}
The changes I made were to remove your superfluous buffer and use a pointer to test instead, to use unsigned char instead of char, and to change the printf format to %02x so that it always prints two digits. Note that %02x alone would not fix the ‘negative’ numbers coming out as ffffff89 or so; that is fixed by the unsigned on the data pointer. Plain char is signed on this platform, so any byte of 0x80 or above is sign-extended when it is promoted to int for printf.
All instructions in x86 end on byte boundaries, and the compiler will often insert extra "padding-instructions" to make sure branch-targets are aligned to 4, 8 or 16-byte boundaries for efficiency.
Answered By – Mats Petersson
Answer Checked By – Mildred Charles (BugsFixing Admin)