Piping in C

hans030390

Diamond Member
Feb 3, 2005
7,326
2
76
I'm working on creating a basic Unix shell in C. So far, it can read in commands and arguments and execute them (ex: "man ls" brings up the man page for ls like it should). It also supports inputting from or outputting to other files instead of just stdin/stdout. So, I've been using fork(), dup() and dup2(), execvp(), and so on.

I am now trying to implement piping into the shell. I've found a lot of information on how to pipe two or three commands together, but barely anything on 'n' number of commands ('n' being as many commands as I want to enter in the shell).

Currently, my code sets up a linked list that contains all the separate commands I enter into the command line and the number of arguments for each command. Using the shell code I had before, I have it currently set up to loop through each separate command in the linked list and execute them. So, for example, if I type "man ls | man ls | man ls", it will execute that 3 times (one after the other). I only did this to ensure that my linked list was working properly.

Does anyone have any tips on how I might be able to approach this multiple piping problem? I'm going to keep searching around and try to use what I do know to get it working.
 

Markbnj

Elite Member, Moderator Emeritus
Moderator
Sep 16, 2005
15,682
14
81
www.markbetz.net
If one process A is to pipe data to another process B then essentially you need to spawn A and capture stdout to a buffer, then spawn B and feed the buffer contents to stdin. I'm not a unix guy specifically, so I can't help much with the platform specific issues, but there must be a ton of stuff written on how this is done in Linux, for example. In fact you can go get the source for bash and look through it.
 

dinkumthinkum

Senior member
Jul 3, 2008
203
0
0
That's how MS-DOS used to do it. However, Unix-based systems with real multiprocessing (and presumably newer Windows) can set up a direct pipe from one process's stdout to another's stdin, where the OS handles buffering and concurrency.

The linked list is a good start, though minor parsing quibble: in general you parse concrete syntax into an abstract syntax tree. A linked list is a special case of a tree where each node has only a single child. You might find it more beneficial to generalize your parser to trees when handling more complex, nested, commands.

Since you say that you can use file descriptor functions to connect files to stdin and stdout, it seems like you are most of the way to a solution. Instead of connecting files, you need a pipe. The pipe provides you with an input and an output fd. You just need to get the file descriptor of the first process's stdout and dup2 one end of the pipe to it, then do the same with the second process's stdin and the other end of the pipe. Once you can do this for two processes, you can do it for any number by applying the same technique recursively down the list.
 

hans030390

I understand that much about piping. For now, I'm just trying to pipe two commands together. I can't get it to work! I guess I'm just not very clear on how the forking works when it comes down to the first command (parent) and the second (child). There's that and waiting, executing, all while messing with piping and dup2. For some reason, I just can't figure out how to piece things together. I've even looked at tons of example functions that pipe two commands together, but none of those seem to integrate with my code.

For example, I was looking at this link:

http://www.cse.ohio-state.edu/~mamrak/CIS762/pipes_lab_notes.html

It makes sense! But as soon as I try to sit down, I just can't figure it out. I have no idea why, either. Programming everything else in the shell up to now has been pretty straightforward. For now, I'm going to take a break...try to look at it again tomorrow...
 

dinkumthinkum

When you create a pipe you get two file descriptors: the pipe's input fd (the write end) and its output fd (the read end). When you fork a child process, it inherits all the file descriptors of the parent. And when you use dup2, you make a copy of a file descriptor onto a given number. Using exec*() functions won't wipe it out either. So all you need to do is fork a new child, and in the child use dup2 to copy the input fd onto fd 1 (stdout), then call exec for program1. Then back in the parent shell, fork another child and use dup2 to copy the output fd onto fd 0 (stdin), then call exec for program2. Once both children are forked, the shell should close its own copies of both pipe fds, otherwise program2 never sees end-of-file.
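To make the inheritance and dup2 steps concrete, here is a minimal sketch for a single command whose output the parent reads back through a pipe. The function name capture_stdout and the exit codes 126/127 are made up for this illustration; they are not from anyone's shell in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] with its stdout connected to a pipe and
 * copies up to cap-1 bytes of its output into buf (NUL-terminated).
 * Returns the number of bytes stored, or -1 if pipe() or fork() failed. */
ssize_t capture_stdout(char *const argv[], char *buf, size_t cap)
{
    int pfd[2];               /* pfd[0] = read end, pfd[1] = write end */

    assert(argv != NULL && argv[0] != NULL && buf != NULL && cap > 0);

    if (pipe(pfd) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: it inherited both ends but only needs the write end. */
        close(pfd[0]);
        if (dup2(pfd[1], STDOUT_FILENO) < 0)
            _exit(126);
        close(pfd[1]);        /* fd 1 now refers to the pipe */
        execvp(argv[0], argv);
        perror("execvp");     /* reached only if exec failed */
        _exit(127);
    }

    /* Parent: close the write end so read() returns 0 once the child is done. */
    close(pfd[1]);
    size_t total = 0;
    ssize_t n;
    while (total < cap - 1 &&
           (n = read(pfd[0], buf + total, cap - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(pfd[0]);

    int status;
    waitpid(pid, &status, 0);
    return (ssize_t)total;
}
```

Calling it with an argv of {"echo", "hello", NULL} should leave "hello\n" in the buffer. For a real two-command pipe the second child takes the place of the parent's read loop, with dup2(pfd[0], STDIN_FILENO) before its exec.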
 

hans030390

Thanks! Managed to get two commands piped together. Here's the function I created. It takes two argv type arrays. There's a function I call in the code that creates the full path for execvp, in case you're wondering. I understand this isn't necessary. I just want to make sure I got this down correctly before I attempt piping N number of commands together.

Code:
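#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Defined elsewhere in the shell; signature assumed from the calls below. */
char *lookupPath(char *name);

void runPipedCommands(char **command1, char **command2){

    int pfd[2];
    pid_t pid, pid2;
    int status, status2;
    char *execPath;

    if(pipe(pfd) < 0){
        fprintf(stderr, "Error creating the pipe. %s\n", strerror(errno));
        return;
    }

    pid = fork();
    if(pid < 0){
        fprintf(stderr, "Error in the parent forking process. %s\n", strerror(errno));
        close(pfd[0]);
        close(pfd[1]);
        return;
    }
    else if(pid == 0){
        /* First command: stdout becomes the write end of the pipe. */
        close(pfd[0]);
        dup2(pfd[1], STDOUT_FILENO);
        close(pfd[1]);
        execPath = lookupPath(command1[0]);
        execvp(execPath, command1);
        /* execvp only returns on failure. Report on stderr: stdout is the pipe now. */
        fprintf(stderr, "Error running parent command. %s\n", strerror(errno));
        exit(1);
    }

    pid2 = fork();
    if(pid2 < 0){
        fprintf(stderr, "Error in the child forking process. %s\n", strerror(errno));
    }
    else if(pid2 == 0){
        /* Second command: stdin becomes the read end of the pipe. */
        close(pfd[1]);
        dup2(pfd[0], STDIN_FILENO);
        close(pfd[0]);
        execPath = lookupPath(command2[0]);
        execvp(execPath, command2);
        fprintf(stderr, "Error running child command. %s\n", strerror(errno));
        exit(1);
    }

    /* The shell closes both ends, or the second command never sees end-of-file. */
    close(pfd[0]);
    close(pfd[1]);

    /* Wait only after both commands are running, so a full pipe cannot deadlock. */
    waitpid(pid, &status, 0);
    if(pid2 > 0)
        waitpid(pid2, &status2, 0);
}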

If that looks good, the logic behind piping multiple commands together using this is really confusing me. And again, thanks for your help so far!

Edit: I was just told by someone else that I should fork once in the shell for the parent process, and then fork within that for the child process. I was also told I need to do recursion so that the Nth command is n generations down within forks (which is still a bit vague).
 

dinkumthinkum

All this talk of parent and child I think is causing confusion because you are using it in two different senses: one for pipes, and one for forks. Let's keep parent/child process terminology usage where it belongs: fork. I am referring to the two commands to be piped as "program1" and "program2".

Now remember that you want the shell (parent) to retain its process ID indefinitely. Therefore new programs that you run must be exec'ed in the child process that you spawn using fork.

Next, you want to strengthen your function's type. Right now it is tailored to running only two commands and piping them, and nothing else. Try to think of a way to generalize this function so that it might be composed with itself or other functions. For example, why not add a parameter that lets you call the function with a file descriptor providing the input to command1? And similarly, you could return a file descriptor corresponding to command2's output fd. You can even generalize "command" from a string to a struct which contains the information you need to run a program and its current status -- or do something else with the child process entirely, even.
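One way the generalization could look, as a rough sketch rather than the only design: a function that starts a single command, takes the fd to use as its stdin, and hands back the fd the next command should read from. Calling it in a loop then gives a pipeline of any length. The names spawn_stage, run_pipeline and MAX_STAGES are invented for this example, and it calls execvp directly instead of going through lookupPath.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_STAGES 16   /* arbitrary limit for this sketch */

/* Starts one command with stdin taken from in_fd. If has_next is set, the
 * command's stdout goes into a new pipe and the pipe's read end is stored in
 * *next_in for the following call; otherwise *next_in is set to -1.
 * in_fd is closed in the calling process unless it is stdin.
 * Returns the child's pid, or -1 on failure. */
pid_t spawn_stage(char *const argv[], int in_fd, int has_next, int *next_in)
{
    int pfd[2] = { -1, -1 };
    pid_t pid = -1;

    assert(argv != NULL && argv[0] != NULL && next_in != NULL);

    if (!has_next || pipe(pfd) == 0)
        pid = fork();

    if (pid == 0) {
        if (in_fd != STDIN_FILENO) {
            dup2(in_fd, STDIN_FILENO);
            close(in_fd);
        }
        if (has_next) {
            close(pfd[0]);               /* the next command reads, not this one */
            dup2(pfd[1], STDOUT_FILENO);
            close(pfd[1]);
        }
        execvp(argv[0], argv);
        perror(argv[0]);                 /* reached only if exec failed */
        _exit(127);
    }

    /* Shell side: release every descriptor the shell no longer needs. */
    if (in_fd != STDIN_FILENO)
        close(in_fd);
    if (pfd[1] != -1)
        close(pfd[1]);
    if (pid < 0 && pfd[0] != -1) {
        close(pfd[0]);
        pfd[0] = -1;
    }
    *next_in = pfd[0];
    return pid;
}

/* Runs cmds[0] | cmds[1] | ... | cmds[n-1]. Returns the exit status of the
 * last command, or -1 if it could not be started or was killed by a signal. */
int run_pipeline(char **cmds[], int n)
{
    pid_t pids[MAX_STAGES];
    int in_fd = STDIN_FILENO, started = 0, result = -1;

    assert(cmds != NULL && n > 0 && n <= MAX_STAGES);

    for (int i = 0; i < n; i++) {
        int next_in;
        pid_t pid = spawn_stage(cmds[i], in_fd, i < n - 1, &next_in);
        in_fd = next_in;
        if (pid < 0)
            break;
        pids[started++] = pid;
    }

    /* Wait only after every command is running, so a full pipe cannot stall. */
    for (int i = 0; i < started; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) == pids[i] &&
            i == n - 1 && WIFEXITED(status))
            result = WEXITSTATUS(status);
    }
    return result;
}
```

With three argv arrays for echo hello, tr a-z A-Z and grep -q HELLO, run_pipeline(cmds, 3) should return grep's exit status. All forks happen in the shell process itself, so no nesting of forks is needed for a flat pipeline.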
 

hans030390

Ok, that makes sense. Did you happen to have anything to say about my edit at the bottom of that regarding using nested forks instead of doing it one after the other? I can't make sense of nesting forking for this, but what I have makes perfect sense to me.
 

dinkumthinkum

I don't think it is necessary for this particular case. Something to keep in mind for nested commands, perhaps.
 

veri745

Golden Member
Oct 11, 2007
1,163
4
81
Code tags are useful
 

hans030390

Ok, sounds good.

Another question...as you mentioned, say I want the function that pipes two commands together to return a file descriptor so that I can take that output and run it through more commands. Can I just create a second pipe (not using the same array as the first), set the command (in this case, command 2) to output to the input of the 2nd pipe, return the output of the 2nd pipe, and then use that as an input file descriptor when I call another function that runs a command(s)? I'm just not sure if that will "transfer" properly between two separate function calls. Or, would it be better to have a global array to be used for piping?

I'm also thinking I just need the piping function to execute one command. It would use the input from either stdin (if it's the first command) or a file descriptor from a previous call to that function. At the end of the function, it would do the pipe business if there are more commands. I just figure that if I can do this piping stuff "between" function calls, then I really only need the function to execute commands one at a time.

I'll be experimenting with it shortly, but if it's not possible or a bad idea, it will give me a chance to stop before I run into too many problems.
 

dinkumthinkum

Sounds like good ideas. Remember file descriptors are global to a process, so it doesn't matter what function does what, so long as it's the same process. Of course, when you fork, the child is working with a copy of the file descriptor table, so any changes made to that table in the child will not be reflected in the parent (and conversely).
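A small sketch of the second point, in case it helps to see it run: a forked child closes a descriptor, and the parent's copy stays open. The function name fd_survives_child_close is made up for this illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: forks a child that closes fd, then checks whether fd is
 * still open in the calling process.
 * Returns 1 if it is, 0 if it is not, -1 if fork() failed. */
int fd_survives_child_close(int fd)
{
    assert(fd >= 0);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(fd);            /* changes only the child's descriptor table */
        _exit(0);
    }

    int status;
    waitpid(pid, &status, 0);

    /* fcntl(F_GETFD) returns -1 only if fd is not open in this process. */
    return fcntl(fd, F_GETFD) != -1;
}
```

Passing either end of a freshly created pipe should give 1. The same reasoning is why the shell has to close its own copies of the pipe ends: a close in one of the children never does that for it.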
 

hans030390

Cool, I managed to get the piping to work! I'm pretty happy with my solution, as it's fairly simple.

Now I'm having trouble reintegrating redirects into it (< and >). Currently I have a function that searches a command for a >. If it finds one, it makes note of that index. It then takes the file name at the next index, opens it, and returns the file descriptor. Before returning, though, it removes the elements related to redirecting from the array. I then pass the file descriptor like I would for all my other code, like when using piping.

Well, it works in the sense that it will create the output file I specify. But then that file is empty. This worked in my previous version of the shell, but that did not have piping. That version also had very messy code, so I'm trying to clean things up and move more things to dedicated functions.

I might post the code later. I'm gonna work on it for a while longer and see what I can figure out.
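For readers following along, the redirect step described in this post could be sketched roughly like this. It is not the poster's actual function, which was never shown in the thread; the name take_output_redirect, the return codes, and the 0644 file mode are assumptions made for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: looks for ">" followed by a file name in a
 * NULL-terminated argv. If found, opens the file for writing, removes both
 * elements from argv, and returns the descriptor.
 * Returns -1 if there is no redirect, -2 if the redirect is malformed or
 * open() fails. */
int take_output_redirect(char **argv)
{
    assert(argv != NULL);

    for (int i = 0; argv[i] != NULL; i++) {
        if (strcmp(argv[i], ">") != 0)
            continue;
        if (argv[i + 1] == NULL)
            return -2;        /* ">" with no file name after it */

        int fd = open(argv[i + 1], O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -2;

        /* Shift the rest of argv, including the final NULL, left by two. */
        int j = i;
        do {
            argv[j] = argv[j + 2];
        } while (argv[j++] != NULL);
        return fd;
    }
    return -1;
}
```

The returned descriptor only has an effect if the child that runs the command calls dup2(fd, STDOUT_FILENO) before exec. An output file that gets created but stays empty usually means the open happened and that dup2 step did not.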
 

dinkumthinkum

I think you should work on having a proper parser that translates concrete syntax (a stream of characters) into abstract syntax (normally a tree data structure) before anything else. Your shell semantics should be implemented in terms of abstract syntax, not concrete syntax. This will help a lot as you add features. This advice is exactly the same as for implementing any language, and a command-line shell defines a language for interaction with the computer. You don't need to use bison or antlr; a simple hand-coded parser should be able to do the trick for you.

Now, it sounds like you have the right idea with implementing the semantics of redirection. A file descriptor is a file descriptor, whether it is created by open or pipe. There is probably a small detail that you missed somewhere. I also recommend that you check all system calls for errors. Even if you simply print the error to the screen and exit, that will help. You can also try using "strace" on your shell to see the results of system calls.

In general, Unix-based systems require programmers to check and act on the result of system call error codes for proper operation, though in practice many people get away with being lazy. But it's a good habit to not be lazy here, especially when learning to write system software like a shell.
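As a small illustration of the habit, here is one possible shape for a checked call. The wrapper name checked_dup2 and its message format are invented for this example; the same pattern applies to pipe, fork, open and the rest.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wrapper: calls dup2 and reports any failure on stderr,
 * including the errno text. Returns 0 on success, -1 on failure with errno
 * preserved for the caller. */
int checked_dup2(int oldfd, int newfd, const char *what)
{
    assert(what != NULL);

    if (dup2(oldfd, newfd) < 0) {
        int saved = errno;    /* fprintf may overwrite errno */
        fprintf(stderr, "%s: dup2(%d, %d) failed: %s\n",
                what, oldfd, newfd, strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}
```

Writing the message to stderr matters in a shell, because by the time something fails in a child, stdout may already point at a pipe or a file. When running the shell under strace, the -f option makes it follow the forked children as well.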
 

hans030390

I figured it out, and it was a really dumb mistake. I forgot to modify my runCommand function to change to the redirected output if it received a file descriptor. It previously just always went to the piping file descriptor because I wasn't checking things properly in my code! Oh well, at least it works now!

Thanks again for the help. :)