Java gurus, can I get some IO help here?

AFB

Lifer
Jan 10, 2004
10,718
3
0
I am trying to make a download manager. I am using a Download class that will do most of the work and will allow you to use multiple connections, like FlashGet and similar programs.

Here is a proof of concept class that copies a file using more than one thread at the same time. The only problem I have is that I can't figure out how to change the number of threads once the download has started.

Current algorithm:

1) Call copy()
2) It makes x threads using Thread(this) and starts them
3) Check if the length of the destination file is set; if not, set it
4) Divide the file length by the number of threads; multiply that by the thread's id to get the end position, and by id - 1 to get the start position
5) Synchronize on the source file and seek to the current position
6) Read
7) Synchronize on the destination file and seek to the current position
8) Write
9) Go to 5 until you have read up to the stop position (see the sketch below)
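
Roughly, each thread runs something like this (a sketch from memory; the class and field names here are made up, not my actual code):

```
import java.io.IOException;
import java.io.RandomAccessFile;

// Each thread owns one slice [start, end) of the file and copies it.
public class SliceCopier implements Runnable {
    private final RandomAccessFile src;
    private final RandomAccessFile dst;
    private long pos;        // current position within this thread's slice
    private final long end;  // exclusive end of this thread's slice

    public SliceCopier(RandomAccessFile src, RandomAccessFile dst,
                       long start, long end) {
        this.src = src;
        this.dst = dst;
        this.pos = start;
        this.end = end;
    }

    public void run() {
        byte[] buf = new byte[8192];
        try {
            while (pos < end) {
                int want = (int) Math.min(buf.length, end - pos);
                int got;
                synchronized (src) {      // steps 5-6: seek, then read
                    src.seek(pos);
                    got = src.read(buf, 0, want);
                }
                if (got < 0) break;
                synchronized (dst) {      // steps 7-8: seek, then write
                    dst.seek(pos);
                    dst.write(buf, 0, got);
                }
                pos += got;               // step 9: loop until the slice is done
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```

Each slice gets kicked off with new Thread(new SliceCopier(...)).start(), one per partition.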

Anyone got any ideas on how to allow it to resize the partitions and add/remove threads?

Edit: Maybe I should have posted the full code :eek:

 

manly

Lifer
Jan 25, 2000
13,341
4,102
136
I'm not really following (and I'm a bit too lazy to understand your code). You want a fixed-size or dynamically-sized thread pool?
 

AFB

Lifer
Jan 10, 2004
10,718
3
0
Originally posted by: manly
I'm not really following (and I'm a bit too lazy to understand your code). You want a fixed-size or dynamically-sized thread pool?

I want to be able to dynamically add and kill threads. Each thread will copy part of the file. When you add another thread, I need to figure out how to reallocate the partitions.
 

kamper

Diamond Member
Mar 18, 2003
5,513
0
0
So you want to be in the middle of copying a file in parts, then add new threads and have each thread that's already running adjust its place in the file and keep copying, but somehow not recopy what may have been done before the switch? May I ask what the point is? Why even have multiple threads at all? It won't improve performance because you're I/O bound anyway.
 

AFB

Lifer
Jan 10, 2004
10,718
3
0
Originally posted by: kamper
So you want to be in the middle of copying a file in parts, then add new threads and have each thread that's already running adjust its place in the file and keep copying, but somehow not recopy what may have been done before the switch? May I ask what the point is? Why even have multiple threads at all? It won't improve performance because you're I/O bound anyway.

Sure it will. This isn't the final code; I am trying to do this for downloads, so you can download different parts of a file in different threads. You can usually get full speed in each thread.
 

kamper

Diamond Member
Mar 18, 2003
5,513
0
0
Well, alright then.

If you start all the threads at evenly spaced intervals and then want to change those intervals after they've started without recopying anything I think you're looking at mind-boggling algorithmic complexity and it'd just be pointless.

What I'd do is, instead of spacing the threads evenly, logically divide the file into, say, 100 parts. Then start the threads on the first X parts and when each one finishes it jumps to another part of the file. If you want to add more threads they just jump in and take another part. You could have one thread act as a sort of server that keeps track of which parts of the file are being worked on and dishes out new parts to each thread that requests one. Another possible advantage of that: if you've got a file that's gigabytes in size (and assuming it's stored contiguously) then your reads will tend to be concentrated in a smaller area of the disk.
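
For what it's worth, here's roughly what I mean (just a sketch; all the class and method names are invented):

```
import java.util.BitSet;

// The "server": tracks which parts of the file have been handed out
// and gives each worker the next free one.
public class PartDispatcher {
    public static final int NO_MORE_PARTS = -1;

    private final int totalParts;
    private final BitSet handedOut;

    public PartDispatcher(int totalParts) {
        this.totalParts = totalParts;
        this.handedOut = new BitSet(totalParts);
    }

    // A worker calls this each time it finishes a part: it gets back
    // the next untouched part, or NO_MORE_PARTS once everything has
    // been handed out.
    public synchronized int nextPart() {
        int next = handedOut.nextClearBit(0);
        if (next >= totalParts) {
            return NO_MORE_PARTS;
        }
        handedOut.set(next);
        return next;
    }
}

// A worker: grab a part, copy it, jump to the next free part.
class PartWorker extends Thread {
    private final PartDispatcher dispatcher;

    PartWorker(PartDispatcher dispatcher) {
        this.dispatcher = dispatcher;
    }

    public void run() {
        int part;
        while ((part = dispatcher.nextPart()) != PartDispatcher.NO_MORE_PARTS) {
            copyPart(part);
        }
    }

    private void copyPart(int part) {
        // seek to part * partSize and copy partSize bytes,
        // same as the per-slice loop in your proof of concept
    }
}
```

Adding a thread is then just new PartWorker(dispatcher).start(); removing one means letting it finish its current part and not ask for another.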

Then you could fiddle with changing the size of the slices that you use. Bigger slices would allow for more uninterrupted work but wouldn't be as flexible. Smaller slices should keep reads closer together. With this concurrent writing of the file, I'd be interested to see how incredibly fragmented the result would be.

I notice that you are synchronized around source and destination. That means if you have a lot of threads, they're just going to spend most of their time waiting to use the files. So again I ask: what's the point? It still looks like the theoretical maximum is one thread occupying the source and one thread occupying the destination, so the best performance would be had with two, maybe three threads.
 

AFB

Lifer
Jan 10, 2004
10,718
3
0
Originally posted by: kamper
Well, alright then.

If you start all the threads at evenly spaced intervals and then want to change those intervals after they've started without recopying anything I think you're looking at mind-boggling algorithmic complexity and it'd just be pointless.

What I'd do is, instead of spacing the threads evenly, logically divide the file into, say, 100 parts. Then start the threads on the first X parts and when each one finishes it jumps to another part of the file. If you want to add more threads they just jump in and take another part. You could have one thread act as a sort of server that keeps track of which parts of the file are being worked on and dishes out new parts to each thread that requests one. Another possible advantage of that: if you've got a file that's gigabytes in size (and assuming it's stored contiguously) then your reads will tend to be concentrated in a smaller area of the disk.

Then you could fiddle with changing the size of the slices that you use. Bigger slices would allow for more uninterrupted work but wouldn't be as flexible. Smaller slices should keep reads closer together. With this concurrent writing of the file, I'd be interested to see how incredibly fragmented the result would be.

I notice that you are synchronized around source and destination. That means if you have a lot of threads, they're just going to spend most of their time waiting to use the files. So again I ask: what's the point? It still looks like the theoretical maximum is one thread occupying the source and one thread occupying the destination, so the best performance would be had with two, maybe three threads.

I have found the best number is four or five. Yeah, that's kind of what I was thinking with the controller to dish it out. I was also thinking about using a map internally to make a kind of file map of who was working on what part of the file so I could make it redundant. I was thinking of 100KB slices, or maybe having it be dynamic depending on the file size.
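
For the dynamic sizing, maybe something like this (the constants are pulled out of the air):

```
// Illustrative only: aim for roughly 100 parts, but clamp the part
// size between 100 KB and 1 MB so tiny and huge files both behave.
static long partSize(long fileLength) {
    long size = fileLength / 100;
    return Math.max(100 * 1024L, Math.min(size, 1024 * 1024L));
}
```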


You're the best kamper :beer: :)
 

kamper

Diamond Member
Mar 18, 2003
5,513
0
0
Originally posted by: amdfanboy

I have found the best number is four or five.
Geez, I guess you've put work into this; you must know what you're doing (it still seems incredibly weird to me).
Yeah, that's kind of what I was thinking with the controller to dish it out. I was also thinking about using a map internally to make a kind of file map of who was working on what part of the file so I could make it redundant.
Make the copying redundant? Man, I so don't understand your requirements :confused: If the first one got it right and the second one got it wrong, you'd still have the wrong result.
I was thinking of 100KB slices, or maybe having it be dynamic depending on the file size.


You're the best kamper :beer: :)
Oh, I know :p
 

AFB

Lifer
Jan 10, 2004
10,718
3
0
Ahh, fusk it. I think I may just do a single NIO SocketChannel download and be done with it. I can't even imagine how I could pause this because I would have to keep track of so much local data it isn't even funny. I'm sure there is a slick way to do this, but I haven't found it.
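
For reference, the single-connection version is about this simple (hypothetical host and path, and it naively writes the HTTP response headers into the file too, so the real thing would need to strip them):

```
import java.io.FileOutputStream;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

// Bare-bones single-connection download over an NIO SocketChannel.
public class SingleChannelDownload {
    public static void main(String[] args) throws Exception {
        SocketChannel channel = SocketChannel.open(
                new InetSocketAddress("example.com", 80));
        String request = "GET /file.bin HTTP/1.0\r\n"
                + "Host: example.com\r\n\r\n";
        channel.write(ByteBuffer.wrap(request.getBytes("US-ASCII")));

        FileOutputStream out = new FileOutputStream("file.bin");
        ByteBuffer buf = ByteBuffer.allocate(8192);
        while (channel.read(buf) != -1) {
            buf.flip();                          // switch from filling to draining
            while (buf.hasRemaining()) {
                out.getChannel().write(buf);     // note: headers land in the file too
            }
            buf.clear();
        }
        out.close();
        channel.close();
    }
}
```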

Thanks for your help manly and kamper. :beer: :D:p:)
 

kamper

Diamond Member
Mar 18, 2003
5,513
0
0
Pausing it would be simple. If you have this client/server thread model, then when you want to pause, just stop the server from handing out more sections. The client threads will assume that the file is finished and may as well just die. Then, when you want to resume, tell the server that it can hand out more work (it should still have all its info) and inject some more threads into the system.
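
On top of that dispatcher sketch from earlier, it could be as small as this (names still invented):

```
// Same dispatcher, plus a paused flag. While paused, nextPart()
// hands out nothing, so each worker finishes its current part, sees
// NO_MORE_PARTS and dies on its own. Resuming clears the flag; the
// dispatcher still remembers which parts were already handed out,
// so you just start fresh worker threads.
public class PausableDispatcher extends PartDispatcher {
    private boolean paused = false;

    public PausableDispatcher(int totalParts) {
        super(totalParts);
    }

    public synchronized void pause()  { paused = true; }
    public synchronized void resume() { paused = false; }

    public synchronized int nextPart() {
        return paused ? NO_MORE_PARTS : super.nextPart();
    }
}
```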