C strings: replacing a substring with another string

Reel

Diamond Member
Jul 14, 2001
4,484
0
76
I know this is possible with an actual string object, but I am attempting to do it with a char array. Is there a fancy, easy way to do this with a function such as the string replace function? I want to take a string such as "I am on top of the world" and replace "on top of" with "under". I asked Google but couldn't find anything.
 

DaveSimmons

Elite Member
Aug 12, 2001
40,730
670
126
Nothing built-in. In-place is problematic since the length could increase and (a) overwrite part of the following text and (b) cause a buffer overrun.

Write your own as something like
char* CharReplace( char* dest, const char* source, const char* search, const char* replace, int dest_max ) ;

Use a character-by-character loop: copy each character unless strncmp shows a match, in which case copy the replacement and skip past the matched text.
 

Reel


void replace(char str[], char from[], char to[])
{
    char str2[strlen(str)];
    strcpy(str2, str);
    strcpy(strstr(str2, from), to);
    strcat(str2, str + strlen(str2) + strlen(from) - strlen(to));
    strcpy(str, str2);
}

This is what I did and it seems to work ok.
 

BFG10K

Lifer
Aug 14, 2000
22,709
3,000
126
What does the ASM look like for that line? The compiler must be calling malloc automatically or something.
 

Reel

Originally posted by: BFG10K
What does the ASM look like for that line? The compiler must be calling malloc automatically or something.

I don't know how to get the ASM out of a C program. If you tell me how, I'll let you know.
 

BFG10K

What is your development platform?

Also there's another problem with that line: you haven't added an extra element for the null byte. When you copy str into str2 it'll crash.
 

Reel

I am using Red Hat Linux: Linux version 2.4.20-8 (gcc version 3.2.2).
 

BFG10K

Can you see .asm files in the makefile directory?

Also are you sure that program is working? Because it doesn't look like it should ever work for the reason I listed above.
 

Reel

I should note that all the replacement strings I use are shorter than the substrings they replace. It fails when the replacement is longer.
 

BFG10K

That strcpy is always overwriting memory it shouldn't. It might not crash every time because sometimes it's overwriting memory that's been allocated to your program.
 

DaveSimmons

Originally posted by: ReelC00L
void replace(char str[], char from[], char to[])
{
    char str2[strlen(str)];
    strcpy(str2, str);
    strcpy(strstr(str2, from), to);
    strcat(str2, str + strlen(str2) + strlen(from) - strlen(to));
    strcpy(str, str2);
}

This is what I did and it seems to work ok.
Problems:
1. You are corrupting 1 byte of the stack when you do the initial strcpy.
2. If strlen(to) > strlen(from), you will corrupt the stack by the difference in the lengths.

This is the kind of code Microsoft programmers wrote for IE ;)

(The fix is, of course, to allocate more space to str2. Otherwise it's a good design for fixing small strings with a single occurrence of the substring to replace.)
 

Reel

So if I allocate str2's size to be strlen(str) + 1 and ensure always strlen(to) < strlen(from), I am ok?
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: ReelC00L
So if I allocate str2's size to be strlen(str) + 1 and ensure always strlen(to) < strlen(from), I am ok?

Yes.

Originally posted by: BFG10K
What does the ASM look like for that line? The compiler must be calling malloc automatically or something.

No, I'm guessing it places the buffer on the stack (hence the stack corruption). You don't need to malloc. If you look at the assembly, you'll probably see a line like "sub $0x8,%esp", except the 8 will be computed at run time rather than a constant... though I'm speculating, because I didn't know you could do dynamic array lengths ;).
 

Reel

If someone can tell me how to generate assembly code or where to find it if it is already generated somewhere else, I can let you know how it looks. The only output I get from running gcc is the executable file.

Thanks, all, for checking my code. I don't profess to be a C expert; I stumble my way through the coding.
 

Reel

08048c58 <replace>:
8048c58: 55 push %ebp
8048c59: 89 e5 mov %esp,%ebp
8048c5b: 57 push %edi
8048c5c: 56 push %esi
8048c5d: 53 push %ebx
8048c5e: 83 ec 0c sub $0xc,%esp
8048c61: 89 e7 mov %esp,%edi
8048c63: 83 ec 0c sub $0xc,%esp
8048c66: ff 75 08 pushl 0x8(%ebp)
8048c69: e8 52 f7 ff ff call 80483c0 <_init+0x68>
8048c6e: 83 c4 10 add $0x10,%esp
8048c71: 40 inc %eax
8048c72: 83 c0 0f add $0xf,%eax
8048c75: c1 e8 04 shr $0x4,%eax
8048c78: c1 e0 04 shl $0x4,%eax
8048c7b: 29 c4 sub %eax,%esp
8048c7d: 89 e3 mov %esp,%ebx
8048c7f: 83 ec 08 sub $0x8,%esp
8048c82: ff 75 08 pushl 0x8(%ebp)
8048c85: 53 push %ebx
8048c86: e8 85 f7 ff ff call 8048410 <_init+0xb8>
8048c8b: 83 c4 10 add $0x10,%esp
8048c8e: 83 ec 08 sub $0x8,%esp
8048c91: ff 75 10 pushl 0x10(%ebp)
8048c94: 83 ec 0c sub $0xc,%esp
8048c97: ff 75 0c pushl 0xc(%ebp)
8048c9a: 53 push %ebx
8048c9b: e8 10 f7 ff ff call 80483b0 <_init+0x58>
8048ca0: 83 c4 14 add $0x14,%esp
8048ca3: 50 push %eax
8048ca4: e8 67 f7 ff ff call 8048410 <_init+0xb8>
8048ca9: 83 c4 10 add $0x10,%esp
8048cac: 83 ec 08 sub $0x8,%esp
8048caf: 83 ec 04 sub $0x4,%esp
8048cb2: 53 push %ebx
8048cb3: e8 08 f7 ff ff call 80483c0 <_init+0x68>
8048cb8: 83 c4 08 add $0x8,%esp
8048cbb: 89 c6 mov %eax,%esi
8048cbd: 03 75 08 add 0x8(%ebp),%esi
8048cc0: 83 ec 04 sub $0x4,%esp
8048cc3: ff 75 0c pushl 0xc(%ebp)
8048cc6: e8 f5 f6 ff ff call 80483c0 <_init+0x68>
8048ccb: 83 c4 08 add $0x8,%esp
8048cce: 01 c6 add %eax,%esi
8048cd0: 83 ec 04 sub $0x4,%esp
8048cd3: ff 75 10 pushl 0x10(%ebp)
8048cd6: e8 e5 f6 ff ff call 80483c0 <_init+0x68>
8048cdb: 83 c4 08 add $0x8,%esp
8048cde: 29 c6 sub %eax,%esi
8048ce0: 89 f0 mov %esi,%eax
8048ce2: 50 push %eax
8048ce3: 53 push %ebx
8048ce4: e8 f7 f6 ff ff call 80483e0 <_init+0x88>
8048ce9: 83 c4 10 add $0x10,%esp
8048cec: 83 ec 08 sub $0x8,%esp
8048cef: 53 push %ebx
8048cf0: ff 75 08 pushl 0x8(%ebp)
8048cf3: e8 18 f7 ff ff call 8048410 <_init+0xb8>
8048cf8: 83 c4 10 add $0x10,%esp
8048cfb: 89 fc mov %edi,%esp
8048cfd: 8d 65 f4 lea 0xfffffff4(%ebp),%esp
8048d00: 5b pop %ebx
8048d01: 5e pop %esi
8048d02: 5f pop %edi
8048d03: c9 leave
8048d04: c3 ret

This seems to be the function in assembly. I have not read through it much myself. Sorry if the forum eats the spacing.
 

CTho9305

I'm thinking this "allocates" space on the stack:
8048c7b: 29 c4 sub %eax,%esp

(Actually a whole bunch of lines do, but this one is allocating a non-constant amount.) I don't see where eax gets set, though.
 

DaveSimmons

If you want to see the stack corruption, make your own "guard bytes" like this:

int a = 0x0abcd;
char str2[strlen(str)];
int b = 0x0def1;

Print them out, or step through and watch them in the debugger, and notice how one of the two changes after the line

strcpy(str2,str);


(This assumes your compiler doesn't rearrange the order of local variables on the stack; you might need to use single char values instead to stop any rearranging.)