Powershell scripting, copying a lot of files...

TridenT

Lifer
Sep 4, 2006
So, we have some networked drives. There are a ton of folders and files on them. I have a program that goes through certain folders and copies them to other locations.

The problem I'm having right now is that when it encounters files it has already copied, it goes hellishly slow. I mean, 1 file every 3-5 seconds. It'll punch out the first few fast, but after that... it crawls.

I'm not sure why. I'm using the Copy-Item cmdlet.

Code:
try {
    # Create the destination folder; -ErrorAction Stop turns the
    # "already exists" case into a catchable exception
    md "$tht\T Drive\" -ErrorAction Stop
    #$stream.WriteLine("Make directory $tht\T Drive\ was successful")
}
catch {
    #$stream.WriteLine("Folder already exists at $tht\T Drive\")
}
try {
    Copy-Item $tdrive\* "$tht\T Drive\" -Recurse -ErrorAction Stop
}
catch {
    if ($error[0].Exception.ToString().EndsWith("already exists.")) {
        #$stream.WriteLine("Some files may already exist in the $tht\T Drive\")
        # Retry without a terminating error so files that already exist
        # don't abort the rest of the copy
        Copy-Item $tdrive\* "$tht\T Drive\" -Recurse #-ea 0
        $error.Clear()
    }
    elseif ($error[0].Exception.ToString().EndsWith("does not exist.")) {
        #$stream.WriteLine("The folder $tdrive was not found")
    }
    else {
        #$stream.WriteLine($error[0].Exception.ToString())
    }
}
I'm doing this more or less. The #$stream lines are commented out right now because I was testing whether that was what was slowing it down. It wasn't. :( The $error.Clear() is there because I was testing whether the $error variable was getting bogged down. Nope... didn't fix it.

I'm transferring files from one networked server to another. There are tens of thousands of them. The first time this program ran it actually went pretty fast, but it missed a lot of files. I don't have a clue as to why... -Recurse should include every file and folder within the folders I'm starting at, yes? Regardless of file name or folder name? Well, it didn't...

It's pissing me off. I don't have a lot of experience with PowerShell, so I'm baffled.

We could do drag and drop, but we're talking hundreds of folders being copied, so it wouldn't be efficient in the long run. That's why I'm trying to create this script to do it for us. And no, we cannot just drag the parent folder of all of it... There are certain folders that are supposed to be copied and others that are not. (The ones we are copying belong to users who no longer use our system; we're backing up their stuff. Normally this is done manually, but if we just copy and paste a huge list of names, this program can automate the task really well. These people have their folders hidden in a lot of different areas, and there's a system to it. Outside of the code I showed you, there's a larger system at work.)
 

LumbergTech

Diamond Member
Sep 15, 2005
Why are you using a script to do this when you could just use a backup program like Synkron or SyncToy?
 

KB

Diamond Member
Nov 8, 1999
You should perform a check to see if the file already exists instead of allowing the code to throw an exception when it exists. Exceptions are very slow to handle.
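Something like this, for example (untested sketch; I'm borrowing the $tdrive and $tht variables from your post, and whether a per-file Test-Path actually beats the exception handling is something worth measuring):
Code:
# Only copy items whose relative path doesn't already exist at the destination.
# Directories get created first because -Recurse lists them before their contents.
Get-ChildItem $tdrive -Recurse | ForEach-Object {
    $dest = Join-Path "$tht\T Drive" $_.FullName.Substring($tdrive.Length)
    if (-not (Test-Path $dest)) {
        Copy-Item $_.FullName $dest
    }
}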
 

Zargon

Lifer
Nov 3, 2009
You can probably use robocopy inside PowerShell as opposed to just a batch script job.



I use .bat files to schedule SyncToy jobs.
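Something like this as a starting point (the switches and log path are guesses at what you'd want, not tested against your setup; robocopy also skips files that already exist with the same size and timestamp, which is exactly the slow part you're hitting):
Code:
# /E   = include subfolders, even empty ones
# /XO  = don't re-copy files that are older than the destination copy
# /R:1 /W:1 = don't spend ages retrying locked files
# The log path is made up -- point it wherever you keep logs
robocopy "$tdrive" "$tht\T Drive" /E /XO /R:1 /W:1 /LOG+:"C:\temp\copylog.txt"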
 

TridenT

Lifer
Sep 4, 2006
Why are you using a script to do this when you could just use a backup program like Synkron or SyncToy?

Because the system that's in place is a lot more complicated than what those programs are made for. We only copy certain folders from certain people, and they're in all kinds of places depending on what the person does for their job. There's a parser that figures out the location of each person's folders based on a larger system. It's likely not possible with those tools unless I can still use PowerShell to pass the folder locations through. (Even then, it's likely to be slower just because you'd have to launch and close so many programs so many times. A lot of the time the folders don't even exist because the users never used them.)

You should perform a check to see if the file already exists instead of allowing the code to throw an exception when it exists. Exceptions are very slow to handle.

I was going to try that, but I figured throwing exceptions would be about the same speed. I don't know the command to check whether a file exists; I'll have to look it up.
 

sourceninja

Diamond Member
Mar 8, 2005
Actually, this is a perfect example of where rsync would solve all your problems.
 

ThinkOfTheCode

Junior Member
Aug 6, 2012
It's always hard to debug over email but you might want to take a look at http://stackoverflow.com/questions/4676977/powershell-performance-issue. The real meat can be found at http://blogs.msdn.com/b/powershell/archive/2009/11/04/why-is-get-childitem-so-slow.aspx.

They're worried about pull (get-childitem) over a network while you're worried about a push (copy-item) but there may be enough there to give you some ideas.

You can always crank up Microsoft Network Monitor to see what's going on under the covers.

You mention that when first run it goes pretty fast. I'm wondering if it's doing something like the following:

Code:
foreach ($file in Get-ChildItem $tdrive -Recurse) {
    # re-list the destination for every single source file
    $listOfFiles = Get-ChildItem "$tht\T Drive\" -Recurse
    if (($listOfFiles | ForEach-Object { $_.Name }) -notcontains $file.Name) {
        Copy-Item $file.FullName "$tht\T Drive\"
    }
}
 

TridenT

Lifer
Sep 4, 2006
There are multiple problems I'm seeing now. One of them is that sometimes the file path is too long for Copy-Item. The limit is 260 characters... I wasn't seeing that one before. Now I am. So, that's a problem. I'll have to add a separate test for the path being too long. If it's too long, then I'll run something else? What should I run?

I've managed to get it more efficient overall, though. I think it only bogs down when it finds files that are very large. (100 MB+; there are files over 1 GB in some directories.) Which is weird. It shouldn't bog down if it's just checking whether the file exists.

Here is what I am using, more or less, right now:
Code:
function dostuff($tolocation, $fromlocation) {
    # Create the destination folder only if it doesn't exist yet
    if (-not (Test-Path $tolocation)) {
        md $tolocation
    }
    if (Test-Path $fromlocation) {
        testandmove $tolocation $fromlocation
    }
    else {
        Write-Host "$fromlocation was not found." -ForegroundColor Magenta
    }
}

function testandmove($thepitfolder, $specialdrive) {
    # Walk every file and folder under the source location
    $ultraarray = Get-ChildItem $specialdrive\* -Recurse
    foreach ($superitem in $ultraarray) {
        # Path of the item relative to the source root
        $tempstring = $superitem.FullName.Substring($specialdrive.Length)
        if (Test-Path $thepitfolder\$tempstring) {
            Write-Host "$thepitfolder\$tempstring already exists."
        }
        else {
            Copy-Item $superitem.FullName $thepitfolder\$tempstring
        }
    }
}
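One thing I'm thinking of trying for the too-long paths (just a sketch, untested; the 248-character cutoff, the helper name, and the robocopy switches are guesses, and it assumes $superitem is a file object from Get-ChildItem): check the length first and hand anything too long to robocopy, which isn't bound by the 260-character limit.
Code:
function copyoneitem($superitem, $destpath) {
    # ~260 chars is the classic MAX_PATH limit; directories start failing around 248
    if ($destpath.Length -ge 248 -or $superitem.FullName.Length -ge 248) {
        # robocopy works directory-to-directory, so copy from the parent folder
        # and filter down to just this one file
        $srcdir  = $superitem.DirectoryName
        $destdir = Split-Path $destpath
        robocopy $srcdir $destdir $superitem.Name /NJH /NJS /NP | Out-Null
    }
    elseif (-not (Test-Path $destpath)) {
        Copy-Item $superitem.FullName $destpath
    }
}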
 

TridenT

Lifer
Sep 4, 2006
so map your shares further up the directory tree?

That isn't a solution to the wider problem; that's a solution for one specific user. I'm dealing with hundreds of users with tens of thousands of files... and considering a lot of them have a knack for making extremely long file names, it wouldn't solve the problem.
 

Zargon

Lifer
Nov 3, 2009
The character limit is a Windows thing; there are ways around it:

http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx

You can also easily script the creation and deletion of mapped shares inside the script using variables, etc.
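For example, something like this (a sketch only; the share path and drive letter are made up, and a real mapping via net use is what actually shortens the path the filesystem sees):
Code:
# Map a real drive letter deep in the tree so the remaining path stays short,
# do the copy, then drop the mapping
net use Z: "\\server\share\deep\userfolder"
Copy-Item "Z:\*" "$tht\T Drive\" -Recurse
net use Z: /delete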

I'm not sure you are in a place to be telling anyone what the solutions are or aren't, since you didn't even know about the character limit in the first place :p


Whatever files it missed are probably due to errors. Is the script running in silent mode, or are you kicking errors out to a log file?
 

TridenT

Lifer
Sep 4, 2006
The character limit is a Windows thing; there are ways around it:

http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx

You can also easily script the creation and deletion of mapped shares inside the script using variables, etc.

I'm not sure you are in a place to be telling anyone what the solutions are or aren't, since you didn't even know about the character limit in the first place :p


Whatever files it missed are probably due to errors. Is the script running in silent mode, or are you kicking errors out to a log file?

It was kicking the errors out to a log file, but it isn't anymore. It just shoots them out at the PowerShell prompt now. I know what errors I'm getting at this point.

And considering robocopy is supposedly able to deal with folder and file paths as long as 32,000 characters... it shouldn't be doing this. For some reason it is, though. That's really annoying.

Again, mapping the drives to a location further down the chain is not going to solve the problem.
 

ThinkOfTheCode

Junior Member
Aug 6, 2012
Note: I'm still assuming that copy time is still your main issue and that long paths are your second issue.

Without giving out any PII can you answer the following:
  1. How many files are you copying?
  2. What is the approximate total size of the files being copied?
  3. How long does it take to copy the files?
  4. How long does it take to copy a single 1 GB file?
  5. What is the speed of the network card on the sending and receiving machines? 100 Mbit? 1 Gbit? (This is a trick question, since your network is only as fast as its slowest piece and you're probably going through a hop or two.)
You'll know your needs better than I would, but does it make sense to filter the number of files that you copy? Maybe limit to copying only the files that changed today?
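If a date filter does make sense, a quick sketch (reusing the $tdrive variable from your script; the one-day window is just an example):
Code:
# Only files (not folders) modified in the last day
Get-ChildItem $tdrive -Recurse |
    Where-Object { -not $_.PSIsContainer -and $_.LastWriteTime -gt (Get-Date).AddDays(-1) }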
 

Zargon

Lifer
Nov 3, 2009
What switches are you using with robocopy? Any chance you can paste a few errors?
 

TridenT

Lifer
Sep 4, 2006
Note: I'm still assuming that copy time is still your main issue and that long paths are your second issue.
Without giving out any PII can you answer the following:
  1. How many files are you copying?
  2. What is the approximate total size of the files being copied?
  3. How long does it take to copy the files?
  4. How long does it take to copy a single 1 GB file?
  5. What is the speed of the network card on the sending and receiving machines? 100 Mbit? 1 Gbit? (This is a trick question, since your network is only as fast as its slowest piece and you're probably going through a hop or two.)
You'll know your needs better than I would, but does it make sense to filter the number of files that you copy? Maybe limit to copying only the files that changed today?

1. It ranges anywhere from 0 to 10,000+ files per user.
2. From 0 to 15 GB. It depends on the user.
3. A reasonable time, depending on the user. I'd say the speed is probably 8-15 MB/second.
4. No idea. I don't really keep tabs. I let the program run in the background. It pops out errors when it needs to.
5. I don't think this detail matters.

Again, we're backing up everything. These are users who are leaving, and we have to back up all their stuff in case the apocalypse happens and the group I work with needs the info for some reason.


what switches are you using with robocopy? and chance you can paste a few errors?

I'm not using any switches. It should work with long file names and long paths by default. If I used /256 then it would have issues with long paths. (At least, that's what I read in the documentation.)

If I use robocopy, it's just:
Code:
robocopy $fromsomeplace $tosomedestination

The errors are generally file/folder too long.

Ah, wait. I just read one. It's actually Test-Path that is having the trouble. I misread. Well, I feel dumb. Still, is there something besides Test-Path that I can use? (I use Test-Path to check whether the file is already on the backup server.)
Code:
Test-Path : The specified path, file name, or both are too long. blahblahblah 260 characters.
What's interesting is that every time robocopy is used, red text comes out saying "used robocopy" (intentional, my doing). But it comes out six times, while this Test-Path error comes out only once. This is for the same user, over and over; I get the same messages no matter how many times I run the program. It does the exact same thing. Robocopy shouldn't be used if the file copied over successfully and Test-Path comes back 'true'... and shouldn't I get multiple exceptions thrown, instead of just one, if Test-Path is failing six times and then robocopy fails to copy six times?
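One thing I might try (sketch only; the log path is made up): skip the per-file Test-Path entirely for the problem folders and let robocopy decide what to copy. By default it already skips files whose size and timestamp match the destination, and it isn't limited to 260 characters.
Code:
# robocopy compares size + timestamp itself and skips unchanged files,
# so no separate "does it exist" test is needed
robocopy "$specialdrive" "$thepitfolder" /E /NP /NJH /NJS /LOG+:"C:\temp\backup-errors.log"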

[Screenshot attached: whatiseei.jpg]
I blacked out some personal info.
 