I was having trouble with backups on two of my VM clients, which appeared to hang while NetBackup was backing them up. The primary symptom was that the clients became inaccessible over the network: ping tests failed, and even if I had an RDP session open, I was unable to do anything in it. This caused communication failures between the clients and other machines on the network. The problem usually lasted about 5 minutes, after which the machines were accessible again. There were no errors in the Windows event logs.
There were no failures of the job itself, so there were no error codes in NetBackup. The job appeared to run normally, and completed just fine.
If I looked in my vCenter console, I could see the backup activity as it occurred. The problem appeared during the tail end of the "remove snapshot" phase.
The ESX servers are version 4.0.0, build 208167.
The VM client version is 4.0.
Other VM clients share the same hosts and version numbers, and their backups ran just fine, with no communication issues while they ran.
The solution was to clean out the existing VM snapshots on the machines in question. In this case, each machine had around 100 GB of snapshots from a very long time ago. I removed them (which took several hours), and afterwards the backup jobs not only worked better, but the clients no longer dropped off the network.
I discovered the old snapshots while trying to isolate issues with snapshots on those VMs in general. I figured I should try running a manual snapshot in VMware to see if I got similar results, and there they were.
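If you want to check for leftover snapshot data without opening each VM in the snapshot manager, something like the following might help. It is only a rough sketch, not what I actually ran: it assumes the VM's datastore directory is readable from a machine with Python 3, and it relies on the usual VMFS naming for snapshot delta disks (for example "myvm-000001-delta.vmdk"). The function names are made up for illustration.

```python
import os
import re

# Snapshot delta disks on VMFS are normally named like "myvm-000001-delta.vmdk".
# Assumption: the VM directory follows that convention.
DELTA_PATTERN = re.compile(r".+-\d{6}-delta\.vmdk$")


def find_snapshot_deltas(vm_dir):
    """Return a sorted list of (filename, size_in_bytes) for delta disks in vm_dir."""
    results = []
    for name in sorted(os.listdir(vm_dir)):
        if DELTA_PATTERN.match(name):
            size = os.path.getsize(os.path.join(vm_dir, name))
            results.append((name, size))
    return results


def total_delta_bytes(vm_dir):
    """Total size in bytes of all snapshot delta disks found in vm_dir."""
    return sum(size for _, size in find_snapshot_deltas(vm_dir))
```

Call total_delta_bytes() with the path to the VM's folder on the datastore. A large total on a VM that is not supposed to have any snapshots is a hint that old snapshots are still hanging around and worth cleaning up before the next backup.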
Other things I tried (to no avail):
Uninstalling and reinstalling the VMware client.
Removing the VSS driver from VMware Tools (which had fixed a different failing backup client issue).