
bash scripting

tidehigh

Senior member
I've got a script named script composed of a huge list of sed commands... the script works fine.

But I don't know how to execute it on every file in a directory. Ideally it would execute all the way down the directory tree.

I run it like this: script.sh file.cpp

If I try script.sh . then I get the error ". is a directory".
 
ls -1 | while read filename
do
echo "$filename"
# do sed commands on $filename
done

*********************

ls -1 puts one filename on each line. Piping that into while read filename takes care of filenames with spaces.

You don't have to pass args either, unless you want to list a directory other than the one the script is running in. If you want to pass the directory as an arg, just change the line to:

ls -1 "$1" | while read filename
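
A minimal sketch of the same pipe with a slightly stricter read, assuming a hypothetical helper named list_entries. Plain read trims leading and trailing blanks and treats backslashes as escapes, so IFS= and -r are added here. Names containing newlines would still break any line-based approach.

```shell
#!/bin/sh
# list_entries DIR: print each entry of DIR in brackets, one per line.
# "list_entries" is a hypothetical helper used only for illustration.
list_entries() {
    ls -1 "$1" | while IFS= read -r filename; do
        # IFS= keeps leading/trailing blanks; -r keeps backslashes literal.
        printf '[%s]\n' "$filename"
    done
}
```

Calling list_entries . prints the entries of the current directory; the printf line is where the sed commands would go.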
 
for filename in *
do
...

is a bit more idiomatic.

But shouldn't ls automatically go to 1 per line because it knows it's feeding into a pipe?

tidehigh, did you try "script *"? Can you tell by reading it if it can handle multiple files in one invocation?
 
script * gives me the error "directory" is a directory. So I guess I need to change the script to go into the directory and work with the files. I'd like to run this script once on a large file tree.
 

You have to do a test -d to make sure it is not a directory. If you want to traverse a directory's subdirectories, the easiest way is with recursion.
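
A rough sketch of that recursion in plain sh, assuming a hypothetical function named walk that takes the command to run and the starting directory. It is only an illustration: it skips symlinks to avoid loops, and "*" does not match hidden files by default.

```shell
#!/bin/sh
# walk CMD DIR: run CMD on every regular file under DIR, recursing into
# subdirectories. "walk" is a hypothetical name used only for illustration.
# The body is a subshell ( ... ), so cmd/dir/entry stay private to each call.
walk() (
    cmd=$1
    dir=$2
    for entry in "$dir"/*; do
        if [ -L "$entry" ]; then
            continue                  # skip symlinks to avoid loops
        elif [ -d "$entry" ]; then
            walk "$cmd" "$entry"      # recurse into the subdirectory
        elif [ -f "$entry" ]; then
            "$cmd" "$entry"           # regular file: process it
        fi
    done
)
```

Something like walk ./script.sh . would then process the whole tree below the current directory.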
 
Ok, so without recursion:

for file in *; do [ -d "$file" ] || script "$file"; done

The quotes are there because when I tested it, I had files with spaces in their names. Run that from within the directory that you want to process.
 
Originally posted by: tidehigh
here's the script.

cat $1 | sed '' | ... | sed '' > tmp
mv tmp $1

Ummmm I'd ignore most of the previous responses, those aren't very robust ways to iterate file sets.
Try this:

find_script.sh:
#!/bin/sh
basepath=somedir

mkdir -p "${basepath}"
cat > ${basepath}/foo <<EOF
dog
bird
cat
EOF

cat > ${basepath}/bar <<EOF
nutria
snake
cat
weasel
EOF

find ${basepath} -type f -print0 | xargs -0 -n1 --no-run-if-empty ./sed_script.sh

sed_script.sh:
#!/bin/sh
filename=$1
tmpfile=$(mktemp)
echo "file ${filename}"
echo "tmp ${tmpfile}"
# Quote every expansion so names with spaces or special characters survive.
sed 's/cat/dog/' < "${filename}" > "${tmpfile}" && mv "${tmpfile}" "${filename}" || rm -f "${tmpfile}"
 
Originally posted by: QuixoticOne
Ummmm I'd ignore most of the previous responses, those aren't very robust ways to iterate file sets.
Please explain how they break.

find ${basepath} -type f -print0 | xargs -0 -n1 --no-run-if-empty ./sed_script.sh
At least the previous examples were portable 😛

Of course, my last example should have been stricter:
for file in *; do [ -f "$file" ] && script "$file"; done
 

For one, mine recurses down the directory tree as the OP desires, while many of the other code fragments do not.
But i don't know how to execute it on every file in a directory. Ideally it would execute all the way down the directory tree.

Secondarily, you've corrected one of the problems in your new code suggestion by using [ -f ] as opposed to:
for file in *; do [ -d "$file" ] || script "$file"; done

...where [ -d ... ] || command will match things that are not directories, but of course that doesn't mean they're regular files either. They could be devices, etc.
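
To make the difference concrete, here is a small sketch assuming a hypothetical helper named classify. It reports whether a path would get past the "not a directory" test and whether it is a regular file.

```shell
#!/bin/sh
# classify PATH: show how the two tests from the thread treat PATH.
# "classify" is a hypothetical helper used only for illustration.
#   not_dir=yes  means  [ -d PATH ] || cmd  would run cmd
#   regular=yes  means  [ -f PATH ] && cmd  would run cmd
classify() {
    if [ -d "$1" ]; then not_dir=no; else not_dir=yes; fi
    if [ -f "$1" ]; then regular=yes; else regular=no; fi
    printf '%s not_dir=%s regular=%s\n' "$1" "$not_dir" "$regular"
}
```

For a named pipe created with mkfifo, the first test lets it through and the second does not, which is the case the stricter [ -f ] guards against.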

And although you used quotes to deal with the possibility that files could have spaces in their names, other examples weren't as robust. So I suggested "find ... -print0" and "xargs -0", which use null termination to delimit the filenames and are immune to the general problems with filenames containing spaces or special characters. Of course, the invoked sed script still has to quote the filenames properly so that any whitespace or special characters in them don't cause problems. I don't recall if my sed script example did that, but I was assuming that was irrelevant except for demonstration, since the OP already has a working sed script that they're happy with and only wanted to know how to invoke it.
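
A small sketch of why the null delimiter matters, assuming two hypothetical helpers. Both count the regular files under a directory, one by counting newline-terminated records and one by counting NUL terminators. It relies on find supporting -print0, as GNU and BSD find do.

```shell
#!/bin/sh
# count_by_lines DIR: count regular files by counting newline-terminated
# records. A name that itself contains a newline is counted more than once.
count_by_lines() {
    find "$1" -type f -print | wc -l | tr -d ' '
}

# count_by_nul DIR: count regular files by counting NUL terminators.
# NUL cannot appear inside a filename, so the count stays correct.
# Both function names are hypothetical, used only for illustration.
count_by_nul() {
    find "$1" -type f -print0 | tr -dc '\000' | wc -c | tr -d ' '
}
```

On a directory holding one ordinary file and one file with a newline in its name, the first helper reports one file too many while the second reports the true count.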

Besides lacking the desired recursion (which is the principal problem), I'm not sure "for file in *" is going to work properly, depending on how many directory entries there are and how long their names are. It is possible that the shell could run out of space if "*" expands to a very large number of characters / words. I suppose that's an implementation-dependent detail, though usually you do have to be careful about argument length limits and can't always expect to invoke a script like ./foo.sh * because of them. Using the 'for' iterator may be more forgiving, since the list never has to be passed to another program, but that depends on how it is implemented.
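
As a hedged illustration of the distinction: the limit reported by getconf ARG_MAX applies when an external program is executed with an argument list, whereas a for loop over "*" is handled inside the shell itself. The sketch below, with a hypothetical helper named count_glob, iterates a glob without ever building a command line for another program. How far this scales still depends on the shell and on available memory.

```shell
#!/bin/sh
# count_glob DIR: count the entries matched by "*" in DIR using a for loop.
# The loop runs inside the shell, so no exec() argument list is built.
# "count_glob" is a hypothetical helper used only for illustration.
count_glob() (
    cd "$1" || exit 1
    n=0
    for entry in *; do
        [ -e "$entry" ] || continue   # unmatched "*" stays literal; skip it
        n=$((n + 1))
    done
    echo "$n"
)
```

Comparing the total length of the names in a directory against getconf ARG_MAX gives a rough idea of whether ./foo.sh * could fail there while the loop still works.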

Also depending on your shell's implementation, "*" as an invocation of pathname expansion may not match files whose names begin with "." as in ".ignore" or whatever. Those are presumably valid files to be processed, so one must be careful in setting the shell parameters to match them as desired.

man bash
...
Pathname Expansion
After word splitting, unless the -f option has been set, bash scans each word for the characters *, ?, and [. If one of these characters appears, then the word is regarded as a
pattern, and replaced with an alphabetically sorted list of file names matching the pattern. If no matching file names are found, and the shell option nullglob is disabled, the word is
left unchanged. If the nullglob option is set, and no matches are found, the word is removed. If the failglob shell option is set, and no matches are found, an error message is
printed and the command is not executed. If the shell option nocaseglob is enabled, the match is performed without regard to the case of alphabetic characters. When a pattern is used
for pathname expansion, the character "." at the start of a name or immediately following a slash must be matched explicitly, unless the shell option dotglob is set. When matching a
pathname, the slash character must always be matched explicitly. In other cases, the "." character is not treated specially. See the description of shopt below under SHELL BUILTIN
COMMANDS for a description of the nocaseglob, nullglob, failglob, and dotglob shell options.

The GLOBIGNORE shell variable may be used to restrict the set of file names matching a pattern. If GLOBIGNORE is set, each matching file name that also matches one of the patterns in
GLOBIGNORE is removed from the list of matches. The file names "." and ".." are always ignored when GLOBIGNORE is set and not null. However, setting GLOBIGNORE to a non-null
value has the effect of enabling the dotglob shell option, so all other file names beginning with a "." will match. To get the old behavior of ignoring file names beginning with a
".", make ".*" one of the patterns in GLOBIGNORE. The dotglob option is disabled when GLOBIGNORE is unset.

Of course you'd have to be careful with IFS etc. to get the desired word splitting behavior wherever the names produced by "*" are later expanded unquoted:

man bash
...
Word Splitting
The shell scans the results of parameter expansion, command substitution, and arithmetic expansion that did not occur within double quotes for word splitting.

The shell treats each character of IFS as a delimiter, and splits the results of the other expansions into words on these characters. If IFS is unset, or its value is exactly
<space><tab><newline>, the default, then any sequence of IFS characters serves to delimit words. If IFS has a value other than the default, then sequences of the whitespace characters
space and tab are ignored at the beginning and end of the word, as long as the whitespace character is in the value of IFS (an IFS whitespace character). Any character in IFS that is
not IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters is also treated as a delimiter. If the value of IFS is
null, no word splitting occurs.

Explicit null arguments ("" or '') are retained. Unquoted implicit null arguments, resulting from the expansion of parameters that have no values, are removed. If a parameter with no
value is expanded within double quotes, a null argument results and is retained.

Note that if no expansion occurs, no splitting is performed.
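
A minimal sketch of how this plays out for filenames, assuming two hypothetical helpers. Since pathname expansion happens after word splitting, a name produced directly by "*" stays one word even if it contains spaces. The same name expanded unquoted from a variable is split on IFS characters.

```shell
#!/bin/sh
# count_words_glob DIR: count the words produced by an unquoted "*" in DIR.
# Each matching name is one word, whatever characters it contains.
count_words_glob() (
    cd "$1" || exit 1
    set -- *
    echo "$#"
)

# count_split STRING: count the words produced when STRING is expanded
# unquoted, so that IFS word splitting applies. Globbing is switched off
# with set -f so that only splitting is shown. Both function names are
# hypothetical, used only for illustration.
count_split() (
    set -f
    set -- $1        # deliberately unquoted
    echo "$#"
)
```

With the default IFS, a directory holding the single file "a b c" gives one word from the glob, while the string "a b c" expanded unquoted gives three, which is why "$file" needs its quotes inside the loop.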


As for the non-portability of --no-run-if-empty: true, though I'd expect you would know (or quickly find out when you hit the error message!) whether it's supported on your platform and, if not, change your script to handle the possibility of getting no file arguments.
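
One way to sidestep the xargs portability question, sketched here with a hypothetical wrapper named run_on_tree, is to let find run the command itself. The -exec ... {} \; form is specified by POSIX, passes each name as a single argument, and runs nothing at all when no files match.

```shell
#!/bin/sh
# run_on_tree CMD DIR: run CMD once per regular file under DIR, using only
# POSIX find options. "run_on_tree" is a hypothetical name used only for
# illustration.
run_on_tree() {
    find "$2" -type f -exec "$1" {} \;
}
```

For example, run_on_tree ./sed_script.sh somedir. The \; form starts one process per file, which matches a script that takes a single filename; -exec ... {} + would batch several names per invocation instead.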


 
Thanks quixotic, nice explanation. You're right, yours handles recursion much better. I've been able to use "for foo in *" before when * expanded to more than the number of args allowed to a program, but I've never pushed it to see how far it'll go or how it reacts across shells.

"for foo in * .*" should be enough to fix the . matching issue, although it means you'll also get . and .., even though those will get filtered out by [ -f "$foo" ].

For --no-run-if-empty, -r seems to be at least a bit more portable. It's supported on OpenBSD but not, oddly enough, on OS X. I thought OS X had mostly taken the GNU toolset.
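
On the hidden-file matching: a common plain-sh idiom avoids picking up . and .. in the first place by using three patterns instead of two. The sketch below assumes a hypothetical helper named list_all and is only an illustration.

```shell
#!/bin/sh
# list_all DIR: print every regular file in DIR, hidden ones included,
# without ever visiting "." or "..". "list_all" is a hypothetical helper.
#   *       names not starting with a dot
#   .[!.]*  names starting with a dot followed by a non-dot character
#   ..?*    names starting with two dots followed by at least one character
list_all() (
    cd "$1" || exit 1
    for f in * .[!.]* ..?*; do
        # Patterns that match nothing stay literal and fail the -f test.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
)
```

The [ -f ] test is kept because any of the three patterns that matches nothing is left as a literal string.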
 