A couple of C++ questions about string manipulation (strtok, strcat)

Sophia

Senior member
Apr 26, 2001
680
0
0
Two quick questions:

Using g++ (gcc-2.95.1) to compile, I've been having no luck with strtok. So I dug up a simple strtok example from the internet to see what's up (see below). This compiles without error, but when I run the executable (under FreeBSD) I get the message: Bus error (core dumped). What might the problem be?

#include <string.h>
#include <stdio.h>
#include <iostream.h>

int main()
{
char* s1 = "tokenize.me. i am a.tokenizer string";
char* s2 = ". "; // delimiters are '.' and ' ' (space)
char* next_token = strtok(s1, s2);
while (next_token) // strtok returns 0 if no tokens left
{
// print each token to the screen
cout << next_token << endl;
next_token = strtok(0, s2); // must pass null pointer to
// continue to tokenize s1,
// or else tokenization restarts
}

return 1;
}

Also, I'd like to be able to "strcat" an integer to a character string (using a variable such as "i-1", not a fixed "5", for example), but the "itoa" function is unsupported in g++. Is there an easy workaround to convert an integer into a character string?


 

Turkey

Senior member
Jan 10, 2000
839
0
0
I think the problem may be in the delimiters... try adding \0 in there and see what happens. Or just run it in gdb... it'll stop at the erroneous line.

But the key question is, why are you not using c++? ;)
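If "using C++" is read as "use std::string instead of the C string functions", a minimal sketch could look like the following. The function name `split` is made up for illustration; it only relies on the standard `find_first_of` / `find_first_not_of` members and never writes to its input, so it works on string literals too.

```cpp
#include <string>
#include <vector>

// Split `text` on any character in `delims`, skipping empty tokens
// (the same way strtok treats runs of delimiters).
// The input is taken by const reference and is never modified.
std::vector<std::string> split(const std::string& text, const std::string& delims)
{
    std::vector<std::string> tokens;
    std::string::size_type start = text.find_first_not_of(delims);
    while (start != std::string::npos) {
        std::string::size_type end = text.find_first_of(delims, start);
        // substr clamps the count, so end == npos takes the rest of the string
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(delims, end);
    }
    return tokens;
}
```

Called as split("tokenize.me. i am a.tokenizer string", ". "), this should give the seven tokens the original program was trying to print.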
 

manly

Lifer
Jan 25, 2000
13,076
3,835
136
Code compiles okay w/ g++ 2.95.3

Running it on my Linux box results in a segfault. Running it in gdb results in correct behavior. :confused:

I think the standard C equivalent to itoa is sprintf.
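A rough sketch of the sprintf route, for what it's worth. The helper name `append_int` is hypothetical, and it uses snprintf (the bounded variant of sprintf) and std::string rather than a raw strcat, which assumes a reasonably recent compiler:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: return `prefix` with the decimal text of `value` appended.
std::string append_int(const std::string& prefix, int value)
{
    char digits[32];  // ample room for any int, its sign, and the '\0'
    std::snprintf(digits, sizeof digits, "%d", value);
    return prefix + digits;
}
```

With i equal to 5, append_int("file", i - 1) should produce "file4". The same snprintf call into a large enough char buffer works for plain C strings if you then strcat the buffer yourself.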
 

skriefal

Golden Member
Apr 10, 2000
1,424
3
81
Hmm. Note the following from the strtok() man page:


<< BUGS
Never use these functions. If you do, note that:

These functions modify their first argument.

The identity of the delimiting character is lost.

These functions cannot be used on constant strings.

The strtok() function uses a static buffer while parsing, so it's
not thread safe. Use strtok_r() if this matters to you.
>>



On a hunch, I replaced your declaration of s1 and s2 with the following and reran the program... and it worked:



<<
char* s1 = (char*) malloc( 40 * sizeof(char) );
char* s2 = (char*) malloc( 3 * sizeof(char) );
strcpy( s1, "tokenize.me. i am a.tokenizer string" );
strcpy( s2, ". " );
>>



It's been a while since I've used C++ for much, but my suspicion is that your declaration of the char* using string literals results in the usage of const strings. And as the man page snippet above states, that won't work. Anyone know if this is correct?
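Here is a sketch of the same idea without malloc, assuming the only requirement is that strtok's first argument be writable. The name `tokenize_copy` is invented for this example. Note that the delimiter string does not need to be copied, since strtok only reads it.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Tokenize a copy of `text` with strtok. The copy lives in a writable
// buffer, so strtok is free to overwrite delimiters with '\0'.
std::vector<std::string> tokenize_copy(const char* text, const char* delims)
{
    // Copy the characters plus the terminating '\0' into writable storage.
    std::vector<char> buffer(text, text + std::strlen(text) + 1);
    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.push_back(tok);  // copy the token out before the buffer goes away
    }
    return tokens;
}
```

In the original program the smallest change along these lines would be declaring s1 as an array (char s1[] = "...";) instead of a pointer, which also gives strtok a writable copy.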
 

MGMorden

Diamond Member
Jul 4, 2000
3,348
0
76


<< [...] my suspicion is that your declaration of the char* using string literals results in the usage of const strings. And as the man page snippet above states, that won't work. Anyone know if this is correct?
>>



strcpy() automatically puts in a null byte ('\0') to terminate the string. I'm almost positive that just saying:
char * string = "stuff for string"
leaves out the null byte, so the program goes past the end of the string looking for data (doesn't know when to stop) and ultimately seg faults.
 

manly

Lifer
Jan 25, 2000
13,076
3,835
136


<<

strcpy() automatically puts in a null byte ('\0') to terminate the string. I'm almost positive that just saying:
char * string = "stuff for string"
leaves out the null byte, so the program goes past the end of the string looking for data (doesn't know when to stop) and ultimately seg faults.
>>



I'm not much of a C hacker these days, but I'm pretty sure C string literals are null-terminated. If it didn't do so automatically (and the programmer didn't null-terminate the buffer), most of the string library functions would break.

skriefal's commentary looks very plausible though. The problem isn't with literals per se, but with literal C strings that are optimized into pointers into a constant string pool.

I also mentioned that strangely enough, the above code works correctly under GDB, but not otherwise. Does this mean that GDB links the code to a debug runtime C library?
 

manly

Lifer
Jan 25, 2000
13,076
3,835
136
Also, if a C string literal weren't null-terminated, then the canonical strcpy implementation would tromp all over the destination buffer as well. Just think about it for a minute. :)
 

skriefal

Golden Member
Apr 10, 2000
1,424
3
81


<< I'm pretty sure C string literals are null-terminated >>



Yes, string literals are definitely null-terminated. It's quite easy to verify this; just create a string literal (as in the s1 of the original post). Then loop through each character and dump the int value of the character -- and you'll see that there's always a null after the last character.
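Something like this would do the check described above. It is only a sketch, and `char_values` is a made-up name; it takes the literal by array reference so the loop can also visit the terminator instead of stopping in front of it.

```cpp
#include <cstddef>
#include <vector>

// Return the integer value of every char in the array, including the
// terminator that the compiler appends to a string literal.
template <std::size_t N>
std::vector<int> char_values(const char (&literal)[N])
{
    std::vector<int> values;
    for (std::size_t i = 0; i < N; ++i) {
        values.push_back(static_cast<int>(literal[i]));
    }
    return values;
}
```

For char_values("hello") the result has six entries, and the last one is 0.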

Or you could simply accept manly's "just think about it for a minute" proof :).
 

PCHPlayer

Golden Member
Oct 9, 2001
1,053
0
0
Actually just write this program.

#include <stdio.h>

int main()
{
char str[] = "hello";
printf("%lu\n", (unsigned long) sizeof(str)); // 5 letters plus the terminating '\0'
return 0;
}

It will output 6 as the size of str.
 

HigherGround

Golden Member
Jan 9, 2000
1,827
0
0


<<

main()
{
char str[] = "hello";
printf("%d\n", sizeof(str));
}

>>



char str[] = "hello";

and

char* str = "hello";

are different declarations (the first creates an array initialized from the literal, the second creates a pointer to a null-terminated string literal). Either way, both give you a valid null-terminated string. The difference is, as skriefal suggested, that the second declaration points at the literal itself, which has to be treated as const. Since strtok butchers its first argument by writing a null terminator after each token, the program dies because write access to that const object is prohibited.
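To make the distinction concrete, here is a small sketch. Both function names are invented for the example. The array form owns a writable copy of the characters, while the pointer form only refers to the literal and is therefore declared const here.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Array form: `str` is a private, writable copy of the literal (6 bytes).
std::string modify_array_copy()
{
    char str[] = "hello";
    str[0] = 'H';  // fine, the array belongs to this function
    return str;
}

// Pointer form: `str` only points at the literal, which must not be written.
std::size_t literal_length()
{
    const char* str = "hello";  // with const, a write through str will not compile
    return std::strlen(str);    // reading is fine
}
```

Declaring the pointer as const char* is what lets the compiler reject a call like strtok(str, ". ") up front instead of letting it crash at run time.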
 

Armitage

Banned
Feb 23, 2001
8,086
0
0


<< BUGS
Never use these functions. ...
>>



LoL, I think that says it all!
That's one of the things I like about Linux/open source: the brutally honest documentation. It's always nice to get a chuckle when sorting through docs for a tough problem.
When is the last time you got a laugh out of MFC documentation?