BASH Script, Viewing data from a specific line...

WarDemon666

Platinum Member
Nov 28, 2000
2,224
0
0
I have it working completely, but it's very slow; the slowest part seems to be fetching a specific line of a file...

Does anyone know of a way to speed up the process?

Attached code is what I have so far...

Please recommend what I could do to make it faster!!

thanks a lot!

It's not very well commented, but I think you can get the point... the first and last variables are the first and last names. The script takes the first 3 characters of the first name and the first 3 characters of the last name, and if that user ID already exists it appends a number (a sketch of the idea follows the example below),

so you'd get

jontom
jontom1
jontom2
harpot
bobdly

etc etc
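Here is a minimal sketch of that ID scheme in bash; the variable names and the ids.txt file are assumptions for illustration, not the actual attached code:

#!/bin/bash
# Sketch of the ID scheme described above (hypothetical names and file).
first="jonathan"
last="tomson"

base="${first:0:3}${last:0:3}"   # first 3 chars of each name
id="$base"
n=0
# If the ID is already taken, append 1, 2, 3, ... until it is unique.
while grep -qx "$id" ids.txt 2>/dev/null; do
    n=$((n + 1))
    id="${base}${n}"
done
echo "$id" >> ids.txt
echo "Assigned ID: $id"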

If something isn't clear let me know; this slow code is really disappointing... anyone know of a faster way? :)
 

bersl2

Golden Member
Aug 2, 2004
1,617
0
0
So, you want to access text from a file by line number? How about:

awk "NR == $LINE_NUMBER { print \$0 }"
 

WarDemon666

Platinum Member
Nov 28, 2000
2,224
0
0
Originally posted by: bersl2
So, you want to access text from a file by line number? How about:

awk "NR == $LINE_NUMBER { print \$0 }"

Alright, I tried that; it saved me 2 seconds (11 compared to 13).

is there another way I could speed up this process?

(My teacher told me to redirect the words file to the pwd because it would save time, but I cannot see why. I'm pretty sure /usr/share/dict/words is on the local machine and is not a networked drive. He just wanted to sound smart :p)

hehehe....

ideas?
 

oog

Golden Member
Feb 14, 2002
1,721
0
0
my guess (and it's only a guess) is that if you wrote it all in something like perl you would get an improvement in performance because you're not starting up a new process for each action you're taking (perl, sed, fgrep, etc).
 

WarDemon666

Platinum Member
Nov 28, 2000
2,224
0
0
Originally posted by: bersl2
What's the deal with running /usr/share/dict/words through sed?

It was my original way of retrieving data from a line of the file.



Originally posted by: oog
my guess (and it's only a guess) is that if you wrote it all in something like perl you would get an improvement in performance because you're not starting up a new process for each action you're taking (perl, sed, fgrep, etc).

I guess that could be a solution, but I'd rather it be a shell script. Is there a way to make it faster?

The real slow part is still getting the password from the words file (it supposedly has over 450,000 words in it).

The awk part took less time...

If I remove the password part, the script takes < 3 seconds to complete, I believe... so it's the whole password part that's slow. Getting a rand # isn't that bad, but that's also kind of slow. Any ideas?
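A sketch of one way to handle the random part: bash's $RANDOM only goes up to 32767, so picking a line out of a ~450,000-line file needs two calls combined (the variable names here are assumptions for illustration):

# Pick a random line number in 1..NUM_WORDS; $RANDOM alone only covers 0..32767.
NUM_WORDS=$(wc -l < /usr/share/dict/words)
LINE_NUMBER=$(( (RANDOM * 32768 + RANDOM) % NUM_WORDS + 1 ))
PASSWORD=$(awk -v n="$LINE_NUMBER" 'NR == n { print; exit }' /usr/share/dict/words)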
 

n0cmonkey

Elite Member
Jun 10, 2001
42,936
1
0
I recently converted a bash script that ran for a good 48 hours to perl. It took a LOT less time. It went from days to hours. :)
 

oog

Golden Member
Feb 14, 2002
1,721
0
0
Originally posted by: WarDemon666

The real slow part is still getting the password from the words file (it supposedly has over 450,000 words in it).

The awk part took less time...

If I remove the password part, the script takes < 3 seconds to complete, I believe... so it's the whole password part that's slow. Getting a rand # isn't that bad, but that's also kind of slow. Any ideas?

Sorry, which part is the password part?
 

WarDemon666

Platinum Member
Nov 28, 2000
2,224
0
0
Originally posted by: oog
Originally posted by: WarDemon666

The real slow part is still getting the password from the words file (it supposedly has over 450,000 words in it).

The awk part took less time...

If I remove the password part, the script takes < 3 seconds to complete, I believe... so it's the whole password part that's slow. Getting a rand # isn't that bad, but that's also kind of slow. Any ideas?

Sorry, which part is the password part?

The password part is getting the lines from the words file, i.e. the awk with NR as suggested by bersl2, or using sed the way I did... that part seems to take the longest. Getting IDs is kind of slow too, so...

I don't know what to do anymore.
 

oog

Golden Member
Feb 14, 2002
1,721
0
0
instead of doing sed twice, can't you find a way to grab two lines at the same time and concatenate them? that way, instead of going through the words file twice, you go through it once. there should be a way of doing that with awk.
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Your bottleneck is going to be the calls to sed/awk/whatever method you use to grab a line out of the file. Because each line in the file can be a different length, the only way to get a specified line is to read through the file until you have counted that many lines (about n/2 line reads on average, i.e. O(n), where n is the number of lines in the file). This is extremely wasteful if there are a large number of lines in the file and you are repeating this operation many times. The program ends up doing O(m * n) work, where n is the number of lines in the file and m is the number of random passwords you need. Since n in your case is quite large (~450,000), this will execute very slowly, and will scale VERY poorly.

You could make it a *lot* faster by writing a C/Java program (or maybe a Perl script, although I don't know if it has built-in facilities for this sort of thing) that would parse the file into a big array in memory (allowing much more efficient access to the random word list). This would reduce it to O(n + m) time, where m is the number of random words you need, and would scale a lot better.

Basically, it'll never be "fast" if you have to reread the dictionary file every time you want a new word out of it (even if the file is cached entirely in RAM, you'd be wasting a ton of CPU time scanning through the whole thing). Eliminating that step will reduce the asymptotic complexity of the program considerably.
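For what it's worth, bash 4+ can slurp the whole file into an array in one pass with mapfile, which is one way to get the in-memory lookup described above (a sketch, not the thread's actual code):

# Read the dictionary into memory once; after that each word is an O(1) lookup.
mapfile -t words < /usr/share/dict/words
count=${#words[@]}

index=$(( (RANDOM * 32768 + RANDOM) % count ))
password="${words[$index]}"
echo "$password"

A while-read loop that appends to an array one line at a time is far slower than mapfile on a file this size, so how the array gets filled matters as much as using one.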
 

WarDemon666

Platinum Member
Nov 28, 2000
2,224
0
0
Originally posted by: Matthias99
Your bottleneck is going to be the calls to sed/awk/whatever method you use to grab a line out of the file. Because each line in the file can be a different length, the only way to get a specified line is to read through the file until you have counted that many lines (about n/2 line reads on average, i.e. O(n), where n is the number of lines in the file). This is extremely wasteful if there are a large number of lines in the file and you are repeating this operation many times. The program ends up doing O(m * n) work, where n is the number of lines in the file and m is the number of random passwords you need. Since n in your case is quite large (~450,000), this will execute very slowly, and will scale VERY poorly.

You could make it a *lot* faster by writing a C/Java program (or maybe a Perl script, although I don't know if it has built-in facilities for this sort of thing) that would parse the file into a big array in memory (allowing much more efficient access to the random word list). This would reduce it to O(n + m) time, where m is the number of random words you need, and would scale a lot better.


I've tried to put it into an array using a bash script; it took way too long just to read it into an array. Don't know why, but it was pretty crazy...



OOG:

I still have to get two random words from the file, so technically the only way is if I start off at 0, go until the first random line number is found, then keep going to the next random number, then quit. That would be the fastest way IMO. I don't want to add any C/Java/Perl to this; it has to be made completely with bash.

Submitting the project tonight so I need this code cleaned up!!

ideas?!?
 

M00T

Golden Member
Mar 12, 2000
1,214
1
0
Originally posted by: oog
my guess (and it's only a guess) is that if you wrote it all in something like perl you would get an improvement in performance because you're not starting up a new process for each action you're taking (perl, sed, fgrep, etc).

 

oog

Golden Member
Feb 14, 2002
1,721
0
0
Originally posted by: WarDemon666

OOG:

I still have to get two random words from the file, so technically the only way is if I start off at 0, go until the first random line number is found, then keep going to the next random number, then quit. That would be the fastest way IMO. I don't want to add any C/Java/Perl to this; it has to be made completely with bash.

Submitting the project tonight so I need this code cleaned up!!

ideas?!?

as i said, awk can print two lines with a single call. instead of making two separate calls, have you tried doing something like this:

awk "NR == $LINE_NUMBER { print \$0 } NR == $LINE_NUMBER2 { print \$0 }"

it would return the lines at positions $LINE_NUMBER and $LINE_NUMBER2. i have to believe that a single call like this would be faster than two separate calls.
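A small variation on that call (just a sketch) also stops reading as soon as both lines have been printed, which helps when both line numbers fall early in a large file. Note that the two words come out in file order, not the order the numbers were given:

# Print the two requested lines, then stop instead of scanning to the end.
awk -v a="$LINE_NUMBER" -v b="$LINE_NUMBER2" \
    'NR == a || NR == b { print } NR >= a && NR >= b { exit }' /usr/share/dict/words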
 

Matthias99

Diamond Member
Oct 7, 2003
8,808
0
0
Originally posted by: WarDemon666
Originally posted by: Matthias99
Your bottleneck is going to be the calls to sed/awk/whatever method you use to grab a line out of the file. Because each line in the file can be a different length, the only way to get a specified line is to read through the file until you have counted that many lines (about n/2 line reads on average, i.e. O(n), where n is the number of lines in the file). This is extremely wasteful if there are a large number of lines in the file and you are repeating this operation many times. The program ends up doing O(m * n) work, where n is the number of lines in the file and m is the number of random passwords you need. Since n in your case is quite large (~450,000), this will execute very slowly, and will scale VERY poorly.

You could make it a *lot* faster by writing a C/Java program (or maybe a Perl script, although I don't know if it has built-in facilities for this sort of thing) that would parse the file into a big array in memory (allowing much more efficient access to the random word list). This would reduce it to O(n + m) time, where m is the number of random words you need, and would scale a lot better.


I've tried to put it into an array using a bash script; it took way too long just to read it into an array. Don't know why, but it was pretty crazy...

If you tried to pull each line out separately (rather than reading through the file once and parsing it), it would, indeed, take a VERY long time on a ~450K line file (it then becomes O(n^2), where n is the number of lines). And I'm not sure how well bash handles VERY large arrays. This is somewhere that Perl (or an actual compiled language) would be much better most of the time. I mean, shell scripting is nice and all, but you gotta use the right tool for the job.

Also, if you just want to grab a few lines out of the file, you wouldn't save much by trying to read it into memory (and the huge amount of memory needed for this particular file might offset any such time savings). But if you need to get thousands and thousands of random words like that, then it will be a lot faster. Any implementation that relies on using repeated sed/awk calls will NOT scale well.