Find orphaned vmdk files via workflow

Use PowerCLI to extract the list of “probably” orphaned files (vmdk, vmx, etc.) across multiple datastores in parallel using a PowerShell workflow.

There are already many great resources on how to find orphaned vmdk files in a VMware environment.
The logic of the script below was initially inspired by a post from Jason Coleman, which was itself inspired by a script from HJA von Bokhoven modified by Luc Dekens.

This logic worked well with one datastore at a time.
However, the script was not fast enough for a datastore cluster with many large datastores.
A possible solution to this speed issue was inspired by the post “PowerCLI and PowerShell Workflows” from Luc Dekens.

The initial script has been “slightly” modified and is now based on four functions.
Get-FilesIdentifiedAsAssociatedToAllVMs

Function Get-FilesIdentifiedAsAssociatedToAllVMs{
<#
.SYNOPSIS
Get the files associated with each VM via the API. However, some files, such as "ctk.vmdk", will not be reported.

.NOTES
Author: Christophe Calvet
Blog: http://www.thecrazyconsultant.com/

#>
	process{
			try{
				Get-View -ViewType VirtualMachine | foreach-object{
				$VMName = $_.Name
				$VMinstanceUuid = $_.config.instanceUuid
				$Template = $_.config.template
					$_.layoutex.file | foreach-object{
						$Output = New-Object -Type PSObject -Prop ([ordered]@{
							'VMName'= $VMname
							'VMinstanceUuid' = $VMinstanceUuid
							'IsTemplate' = $Template
							'FileKey' = $_.Key
							'FileName' =  $_.Name
							'FileSize' = $_.Size
							'FileType' = $_.Type
							'FileUniqueSize' = $_.UniqueSize
						})					
						Return $Output
					}
				}
			}
			Catch{
					Write-error $_
			}
	}
}

This function will extract the “majority” of the files that are associated with the virtual machines and templates in a vCenter server.
The key point here is “majority”: some files associated with a VM will not be extracted.

Get-FileInDatastore

  Function Get-FileInDatastore{
<#
.SYNOPSIS
Extract the list of all files in datastore(S).

.NOTES
Author: Christophe Calvet
Blog: http://www.thecrazyconsultant.com/

.PARAMETER Datastore
Pipe one or many PowerCLI datastore objects

.PARAMETER matchPattern
This is the search parameter. By default "*" but it can be replaced by "*.vmdk" or "*.vmx" for example
#>  
  
param(
	[Parameter(Mandatory=$true,ValueFromPipeline=$true)]
	[VMware.VimAutomation.ViCore.Impl.V1.DatastoreManagement.DatastoreImpl]$Datastore,
	[string]$matchPattern = "*"
)
	process{
		try{
			$HostDatastoreBrowserSearchSpec = New-Object VMware.Vim.HostDatastoreBrowserSearchSpec
			$HostDatastoreBrowserSearchSpec.matchPattern = $matchPattern
			$HostDatastoreBrowserSearchSpec.sortFoldersFirst = $true
				$fileQueryFlags = New-Object VMware.Vim.FileQueryFlags
				$fileQueryFlags.fileOwner = $True
				$fileQueryFlags.fileSize = $True
				$fileQueryFlags.fileType = $True
				$fileQueryFlags.modification = $True
			$HostDatastoreBrowserSearchSpec.details = $fileQueryFlags

			$DatastoreName = $Datastore.extensiondata.Name
			$DatastoreUrl = $Datastore.extensiondata.info.url
			$DatastoreBrowser = Get-view -id ($Datastore.extensiondata.Browser)
			$datastorePath = "[" + $DatastoreName + "]"
			$HostDatastoreBrowserSearchResults = $DatastoreBrowser.SearchDatastoreSubFolders($datastorePath,$HostDatastoreBrowserSearchSpec)
				$HostDatastoreBrowserSearchResults | foreach-object{
				$FolderPath = $_.FolderPath
					$_.file | foreach-object{
					$FileTypeFullName = ($_.gettype()).FullName		
						If($FileTypeFullName -ne "VMware.Vim.FolderFileInfo"){			
								$Output = New-Object -Type PSObject -Prop ([ordered]@{
								'DatastoreName'= $DatastoreName
								'DatastoreUrl' = $DatastoreUrl
								'FolderPath' = $FolderPath
								'Path' = $_.Path
								'FullPath' = $FolderPath + $_.Path
								'FileSize' = $_.FileSize
								'Modification' = $_.Modification
								'Owner' = $_.Owner
								'FileTypeFullName' = $FileTypeFullName
								})
						
								Return $Output
						}
					}
				}	
				
		}
		Catch{
				Write-error $_
		}
	}	
}

This function can be used independently and will provide you with the list of all files in one or many datastores.
It is also possible to modify the search criteria according to your needs.
For example, to extract the list and location of all ISO files across all datastores:
Get-Datastore | Get-FileInDatastore -matchPattern "*.iso" | ogv
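
The objects returned by Get-FileInDatastore can also be summarised with standard cmdlets instead of being sent to a grid view. The sketch below is only an illustration: it uses hand-made sample objects (hypothetical datastore names, paths and sizes) that mimic the DatastoreName, FullPath and FileSize properties, so it runs without a vCenter connection. With a live connection you would pipe the real output of the function instead.

```powershell
# Sample objects that mimic the output of Get-FileInDatastore (all values are made up)
$IsoFiles = @(
    [pscustomobject]@{ DatastoreName = 'SSD01'; FullPath = '[SSD01] iso/win2012.iso'; FileSize = 4GB }
    [pscustomobject]@{ DatastoreName = 'SSD01'; FullPath = '[SSD01] iso/centos7.iso'; FileSize = 1GB }
    [pscustomobject]@{ DatastoreName = 'SSD02'; FullPath = '[SSD02] tools/esxi.iso';  FileSize = 512MB }
)

# One line per datastore: number of files and total size in GB
$IsoSummary = $IsoFiles | Group-Object DatastoreName | ForEach-Object {
    $TotalBytes = ($_.Group | Measure-Object -Property FileSize -Sum).Sum
    [pscustomobject]@{
        DatastoreName = $_.Name
        FileCount     = $_.Count
        TotalSizeGB   = [math]::Round($TotalBytes / 1GB, 2)
    }
}
$IsoSummary | Format-Table -AutoSize
```

The property names come from the function above; everything else (variable names, sample values) is an assumption made for the example.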

get-FileInDatastoreWithWorkflow

workflow get-FileInDatastoreWithWorkflow{
<#
.SYNOPSIS
Get all files across multiple datastores using a workflow to increase the speed.

.NOTES
Author: Christophe Calvet
Blog: http://www.thecrazyconsultant.com/

.PARAMETER vCenter
The vCenter name

.PARAMETER session
An existing vCenter session ($global:DefaultVIServer.SessionSecret)

.PARAMETER matchPattern
This is the search parameter. By default "*" but it can be replaced by "*.vmdk" or "*.vmx" for example

.PARAMETER Datastores
An array containing the names of all datastores to analyse.

#>
   param(
   [Parameter(Mandatory=$true)]
	[string]$vcenter,
	[Parameter(Mandatory=$true)]
	[string]$session,
	[string]$matchPattern = "*",
	[Parameter(Mandatory=$true)]
    [string[]]$Datastores

   )

    foreach -parallel ($Datastore in $Datastores){
	
		$DatastoreFiles = InlineScript{
		
	   
  Function Get-FileInDatastore{
<#
.SYNOPSIS
Extract the list of all files in datastore(S).

.NOTES
Author: Christophe Calvet
Blog: http://www.thecrazyconsultant.com/

.PARAMETER Datastore
Pipe one or many PowerCLI datastore objects

.PARAMETER matchPattern
This is the search parameter. By default "*" but it can be replaced by "*.vmdk" or "*.vmx" for example
#>  
  
param(
	[Parameter(Mandatory=$true,ValueFromPipeline=$true)]
	[VMware.VimAutomation.ViCore.Impl.V1.DatastoreManagement.DatastoreImpl]$Datastore,
	[string]$matchPattern = "*"
)
	process{
		try{
			$HostDatastoreBrowserSearchSpec = New-Object VMware.Vim.HostDatastoreBrowserSearchSpec
			$HostDatastoreBrowserSearchSpec.matchPattern = $matchPattern
			$HostDatastoreBrowserSearchSpec.sortFoldersFirst = $true
				$fileQueryFlags = New-Object VMware.Vim.FileQueryFlags
				$fileQueryFlags.fileOwner = $True
				$fileQueryFlags.fileSize = $True
				$fileQueryFlags.fileType = $True
				$fileQueryFlags.modification = $True
			$HostDatastoreBrowserSearchSpec.details = $fileQueryFlags

			$DatastoreName = $Datastore.extensiondata.Name
			$DatastoreUrl = $Datastore.extensiondata.info.url
			$DatastoreBrowser = Get-view -id ($Datastore.extensiondata.Browser)
			$datastorePath = "[" + $DatastoreName + "]"
			$HostDatastoreBrowserSearchResults = $DatastoreBrowser.SearchDatastoreSubFolders($datastorePath,$HostDatastoreBrowserSearchSpec)
				$HostDatastoreBrowserSearchResults | foreach-object{
				$FolderPath = $_.FolderPath
					$_.file | foreach-object{
					$FileTypeFullName = ($_.gettype()).FullName		
						If($FileTypeFullName -ne "VMware.Vim.FolderFileInfo"){			
								$Output = New-Object -Type PSObject -Prop ([ordered]@{
								'DatastoreName'= $DatastoreName
								'DatastoreUrl' = $DatastoreUrl
								'FolderPath' = $FolderPath
								'Path' = $_.Path
								'FullPath' = $FolderPath + $_.Path
								'FileSize' = $_.FileSize
								'Modification' = $_.Modification
								'Owner' = $_.Owner
								'FileTypeFullName' = $FileTypeFullName
								})
						
								Return $Output
						}
					}
				}	
				
		}
		Catch{
				Write-error $_
		}
	}	
}
  
			
		Add-PSSnapin VMware.VimAutomation.Core
		 Connect-VIServer -Server $Using:vcenter -Session $Using:session | Out-Null
         Get-datastore -name $using:Datastore | Get-FileInDatastore -matchPattern $using:matchPattern 
		}
		$DatastoreFiles 
		
	}	

}	

Please check the post of Luc Dekens to understand the logic of workflows.
In this case we pass an array of datastore names as a parameter.
You will notice that the function Get-FileInDatastore is defined inside the InlineScript.
This way the function is executed in parallel across many datastores.
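
The pattern used by the workflow above can be reduced to a few lines. The sketch below is a minimal illustration with no PowerCLI dependency; the names Test-ParallelInlineScript and Get-Label are hypothetical. It assumes Windows PowerShell 3.0 to 5.1, because workflows are not available in PowerShell 6 and later.

```powershell
# Requires Windows PowerShell 3.0 to 5.1 (workflows do not exist in PowerShell 6 and later)
workflow Test-ParallelInlineScript {
    param(
        [Parameter(Mandatory=$true)]
        [string[]]$Names,
        [string]$Prefix = "scan"
    )
    # Each item of $Names is processed in its own parallel branch
    foreach -parallel ($Name in $Names) {
        InlineScript {
            # The helper is declared inside the InlineScript block so that it is
            # available in the session where the block actually runs
            function Get-Label { param($Item, $Label) "$Label-$Item" }
            # $Using: brings workflow variables into the InlineScript scope
            Get-Label -Item $Using:Name -Label $Using:Prefix
        }
    }
}

# The order of the results is not guaranteed, hence the sort
$Labels = Test-ParallelInlineScript -Names "ds1","ds2","ds3" | Sort-Object
$Labels
```

The same three ingredients appear in get-FileInDatastoreWithWorkflow: a foreach -parallel loop over names, a function declared inside the InlineScript, and $Using: to pass the workflow parameters in.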

get-ProbablyOrphanedFile

function get-ProbablyOrphanedFile{
<#
.SYNOPSIS
Get all files that are probably orphaned.

.NOTES
Author: Christophe Calvet
Blog: http://www.thecrazyconsultant.com/

.PARAMETER Datastores
Pipe one or many PowerCLI datastore objects

.PARAMETER matchPattern
This is the search parameter. By default "*" but it can be replaced by "*.vmdk" or "*.vmx" for example

.PARAMETER SafeSearch
Enabled by default. It should contain only the type of file that can be identified as orphaned. (No ctk.vmdk for example)
When disabled it will report all files not identified as associated to any VMs, it means that they can be associated to some VMs (Like ctk.vmdk) 

#>
	param(
	[Parameter(Mandatory=$true,ValueFromPipeline=$true)]
	$Datastores,
	$matchPattern = "*",
	[boolean]$SafeSearch = $True
	)
	begin{
		# Collect the name of every datastore received from the pipeline.
		# Calling the workflow once per pipeline item would scan the datastores one after the other.
		$DatastoresName = @()
	}
	process{
		$DatastoresName += $Datastores.Name
	}
	end{
		if ($global:DefaultVIServers.Count -gt 1 -OR $global:DefaultVIServers.RefCount -gt 1 ) {
		Write-error "Only one connection to vCenter allowed"
		}
		Else{
			Try{
				# Datastore files first, VM files second, to limit false positives
				$DatastoreFiles = get-FileInDatastoreWithWorkflow -Datastores $DatastoresName -matchPattern $matchPattern -vcenter $global:DefaultVIServer.Name -session $global:DefaultVIServer.SessionSecret
				$FilesAssociatedToAllVMs = Get-FilesIdentifiedAsAssociatedToAllVMs
				# Extract the file names once instead of on every iteration
				$AssociatedFileNames = $FilesAssociatedToAllVMs.FileName

				$FilesNotIdentifiedAsAssociatedToAnyVM = $DatastoreFiles  | foreach-object{
					$FullPath = $_.FullPath
						If ($AssociatedFileNames -notcontains $FullPath){
						Return $_
						}
				}	
					if 	($SafeSearch) {
					# -match uses regular expressions, so the dots are escaped to match a literal "."
					$ProbablyOrphanedFiles = $FilesNotIdentifiedAsAssociatedToAnyVM | where {
						$_.FileTypeFullName -match "^VMware\.Vim\.Vm" -OR
						($_.FileTypeFullName -eq "VMware.Vim.FileInfo" -AND (
							$_.Fullpath -match "\.vmsd" -OR
							$_.Fullpath -match "\.vmxf" -OR
							$_.Fullpath -match "aux\.xml" -OR
							$_.Fullpath -match "\.vswp" -OR
							($_.Fullpath -match "\.vmdk" -AND $_.Fullpath -notmatch "ctk\.vmdk") -OR
							($_.Fullpath -match "\.vmx" -AND $_.Fullpath -notmatch "\.vmx~" -AND $_.Fullpath -notmatch "\.vmx\.lck")
						))
					}
					$ProbablyOrphanedFiles
					}
					else{
					$FilesNotIdentifiedAsAssociatedToAnyVM
					}
			}
			Catch{
				Write-error $_
			}	
		}
	}
	
}

The final function glues together all the functions presented above.
The function “Get-FilesIdentifiedAsAssociatedToAllVMs” is executed AFTER extracting the list of files in the datastore(s). This reduces the number of “false positives”.
If a Storage vMotion happens while this function runs, the files reported as orphaned will be those at the location before the migration, not those at the post-migration location used in production.
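
The comparison between the two lists is the core of this function. With many thousands of files, testing each path with -notcontains against the full list of VM files can become slow. The sketch below shows one possible alternative, a case-insensitive HashSet lookup, on made-up sample data. It is not part of the original script, the variable names are hypothetical, and the ::new() syntax assumes PowerShell 5.0 or later.

```powershell
# Sample data shaped like the output of the two functions (all values are made up)
$FilesAssociatedToAllVMs = @(
    [pscustomobject]@{ FileName = '[SSD01] vm1/vm1.vmx' }
    [pscustomobject]@{ FileName = '[SSD01] vm1/vm1.vmdk' }
)
$DatastoreFiles = @(
    [pscustomobject]@{ FullPath = '[SSD01] vm1/vm1.vmx' }
    [pscustomobject]@{ FullPath = '[SSD01] VM1/VM1.vmdk' }
    [pscustomobject]@{ FullPath = '[SSD01] old/old.vmdk' }
)

# Build the lookup once; the comparer keeps the case-insensitive behaviour of -notcontains
$KnownFiles = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)
foreach ($File in $FilesAssociatedToAllVMs) {
    [void]$KnownFiles.Add($File.FileName)
}

# Keep only the datastore files that no VM claims
$NotAssociated = @($DatastoreFiles | Where-Object { -not $KnownFiles.Contains($_.FullPath) })
$NotAssociated.FullPath
```

Only the lookup changes; the order of the two extractions, datastore files first and VM files second, stays the same.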

More explanation regarding the “SafeSearch” parameter.
“Get-FilesIdentifiedAsAssociatedToAllVMs” will report most, but not all, of the files associated with the VMs.
All files that the datastore browser reports with a type VMware.Vim.Vm* will be identified as associated to a VM, for example the “.log” files of type “VMware.Vim.VmLogFileInfo”.
Files that the datastore browser reports with the generic type “VMware.Vim.FileInfo” are more challenging.
The following files will not be identified as associated to a VM:
ctk.vmdk
.hlog
.vmx.lck
.vmx~
However the following files will be identified as associated to a VM:
.vmsd / snapshotList
.vmx / config
.vmxf / extendedConfig
.vmdk / diskDescriptor (for RDM)
-rdmp.vmdk / diskExtent (for RDM)
aux.xml / snapshotManifestList
.vswp / swap or uwswap
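
The rules above can be expressed as a small standalone test. The sketch below is an illustration only: the function name Test-SafeSearchCandidate is hypothetical, and it anchors the patterns at the end of the path, which is slightly stricter than the filter used in get-ProbablyOrphanedFile.

```powershell
function Test-SafeSearchCandidate {
    # Returns $true when a file that matches no VM can reasonably be reported as orphaned
    param(
        [string]$FullPath,
        [string]$FileTypeFullName
    )
    # Typed VM files (VmDiskFileInfo, VmLogFileInfo, ...) are always candidates
    if ($FileTypeFullName -like 'VMware.Vim.Vm*') { return $true }
    # Anything other than the generic FileInfo type is ignored
    if ($FileTypeFullName -ne 'VMware.Vim.FileInfo') { return $false }
    # Files that the API never reports for a VM cannot be judged
    if ($FullPath -match 'ctk\.vmdk$|\.hlog$|\.vmx\.lck$|\.vmx~$') { return $false }
    # Generic files that the API does report for a VM
    return ($FullPath -match '\.vmsd$|\.vmxf$|aux\.xml$|\.vswp$|\.vmdk$|\.vmx$')
}

# Made-up paths to exercise the rules
Test-SafeSearchCandidate -FullPath '[SSD01] vm1/vm1.vmx'      -FileTypeFullName 'VMware.Vim.FileInfo'
Test-SafeSearchCandidate -FullPath '[SSD01] vm1/vm1-ctk.vmdk' -FileTypeFullName 'VMware.Vim.FileInfo'
Test-SafeSearchCandidate -FullPath '[SSD01] vm1/vmware.log'   -FileTypeFullName 'VMware.Vim.VmLogFileInfo'
```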

How to use it?
Connect-VIServer -Server "testVC"
#To identify all probably orphaned files
Get-Datastore | where {$_.name -like "SSD*"} | get-ProbablyOrphanedFile -matchpattern "*" | ogv
#To identify all probably orphaned vmdk files
Get-Datastore | where {$_.name -like "SSD*"} | get-ProbablyOrphanedFile -matchpattern "*.vmdk" | ogv
#To identify an orphaned VM (handy if someone has removed a VM from the inventory by mistake)
Get-Datastore | where {$_.name -like "SSD*"} | get-ProbablyOrphanedFile -matchpattern "*.vmx" | ogv
#To identify all ISOs while searching in parallel across multiple datastores
Get-Datastore | where {$_.name -like "SSD*"} | get-ProbablyOrphanedFile -matchpattern "*.iso" -SafeSearch $false | ogv
Disconnect-VIServer -Server "testVC" -confirm:$False

Should you automatically delete the orphaned files?
It is a legitimate question, and the short answer is NO.
A datastore can be shared across multiple vCenter servers.
I have seen a VM that wrongly reported no disks associated with it.
If operations such as a snapshot or a log rotation happen at the storage level while the script runs, you will end up with “false positives”.
So a manual check is strongly recommended.
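
One way to prepare that manual check is to export the candidates to a CSV file and leave out the files modified very recently, since those are the most likely to belong to an operation in progress. The sketch below runs on made-up sample objects with the same properties as the script output. The 7-day threshold, the variable names and the report path are assumptions for the example, not recommendations from the original script.

```powershell
# Sample candidates shaped like the output of get-ProbablyOrphanedFile (all values are made up)
$Now = Get-Date
$Candidates = @(
    [pscustomobject]@{ DatastoreName = 'SSD01'; FullPath = '[SSD01] old/old.vmdk';        FileSize = 40GB; Modification = $Now.AddDays(-90) }
    [pscustomobject]@{ DatastoreName = 'SSD01'; FullPath = '[SSD01] vm1/vm1-000001.vmdk'; FileSize = 2GB;  Modification = $Now.AddMinutes(-5) }
)

# Assumed threshold: ignore anything modified during the last 7 days
$MinimumAgeDays = 7
$Cutoff = $Now.AddDays(-$MinimumAgeDays)
$ToReview = @($Candidates | Where-Object { $_.Modification -lt $Cutoff } | Sort-Object FileSize -Descending)

# Write the list to a CSV file for manual review; nothing is deleted
$ReportPath = Join-Path ([System.IO.Path]::GetTempPath()) 'ProbablyOrphanedFiles.csv'
$ToReview | Export-Csv -Path $ReportPath -NoTypeInformation
$ReportPath
```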

Known issues:
The workflow will be limited to 5 parallel executions, even when increasing the throttle limit.
This is due to the “InlineScript” and the maximum number of processes.
So far I have not found a way to increase this number above 5 for a local “workflow”.

Time out errors?
I ended up with a time out error when working with a very large NetApp NFS datastore.
Daniel Jensen has described this issue and a possible solution in great detail in his post “Orphaned vmdk search return exception on large Datastores”.
I will update this post soon with another workaround.
