Accelerate Your PowerShell Code with Parallel Processes

(source: http://openclipart.org/people/Sirrob01/Cartoon_Robot.svg)

Most systems in Windows-based environments can be queried and/or manipulated with PowerShell. It can be used to report on workstations & servers across an organization, modify user accounts, dig through registry hives, and so much more. The extensive reach & capabilities of PowerShell are the reason it is so often used to interface with systems on a large scale.

However, despite its immense power to operate at scale, PowerShell executes scripts sequentially by default. Code runs line by line, advancing to the next line only when the current one has completed. This can be an impediment when applying PowerShell to large collections of tasks, because each task, no matter how awesome & far reaching, executes one at a time.

Fortunately, there is a way around the sequential nature of PowerShell: parallelizing tasks. This is not a straightforward feature, but it is doable, and learning to do it is worthwhile. In my scripts, I accomplish parallel execution using what PowerShell calls 'jobs'. Jobs let us launch a number of simultaneous background tasks; [optionally] wait for them to complete; and then [optionally] capture their output. Common PowerShell cmdlets are available when running jobs, and local (custom) functions can also be utilized when referenced properly (via modules).
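
At its simplest, the job workflow looks like this (a minimal sketch; the URL is just a placeholder):

$job = Start-Job -ScriptBlock { Test-NetConnection 'google.com' }   #launch a background task
Wait-Job $job | Out-Null    #[optionally] block until it finishes
$output = Receive-Job $job  #[optionally] capture its output
Remove-Job $job             #clean up the job object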

I often query large numbers of devices by running tasks in parallel. For example, when running WMI queries against thousands of computers, each query can be fairly time consuming, depending on the nature of the WMI query, the network connection, and the remote device's time to respond. A list of 5,000 devices can require 120 minutes to complete sequentially, or approximately 0.024 minutes per device (120/5,000).

When tasks run in parallel batches, the time to complete each individual task stays the same. Setting aside the local processor drag from the increased workload, let us generalize anyhow to illustrate the efficiency gains. Take the same example: 5,000 devices, separated into batches of 50 devices, for a total of 100 batches. Each batch of 50 runs concurrently and finishes in roughly 0.024 minutes, so 100 batches take about 100 × 0.024 = 2.4 minutes for all 5,000 devices. In reality, each parallel task requires time to initialize, and the processor drain has a considerable impact on performance. Maybe in the vacuum of outer space it would hit that theoretical number, but for many tasks within the atmosphere I am able to trim 120 minutes down to less than 20, a reduction of roughly 83%.

Although it is tempting to launch all background jobs at once, a large number of jobs could crash your computer. Even if it only crashes PowerShell, it may leave a large number of jobs running in the background, and you may have to reboot your machine to terminate them. Or worse, running too many remote tasks at once could set off some alarms and catch the attention of your IT Security team. So test your code locally, then on a small scale, before reaching too far & wide, and notify the appropriate parties prior to execution.
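
To make the arithmetic concrete, here is the back-of-the-envelope estimate in PowerShell (illustrative numbers from above; real runs will be slower due to job start-up overhead and CPU contention):

$deviceCount  = 5000
$minPerDevice = 120 / $deviceCount                             #0.024 minutes per device, measured sequentially
$batchSize    = 50
$batchCount   = [math]::Ceiling($deviceCount / $batchSize)     #100 batches
$idealMinutes = $batchCount * $minPerDevice                    #100 * 0.024 = 2.4 minutes (theoretical best case)
"Ideal parallel runtime: $idealMinutes minutes"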

The code below performs some network-related tasks in parallel.

[int]$maxConcurrentJobs = 40
[int]$timeLimitperJob = 5 #minutes
[int]$statusUpdateInterval = 1 #seconds. Applies while throttled at the max job count and while waiting for jobs to complete.

$moduleDefinition =
{
    Function Get-NetworkConnectionInfo($url)
    {
        $returnArray = @();

        #Run a connection test, then a traceroute; Select-Object flattens the round-trip time into a top-level property.
        $result = Test-NetConnection $url | Select-Object PingSucceeded, RemoteAddress, @{Name = "RoundTripTime"; Expression = { $_.PingReplyDetails.RoundTripTime }}
        $hopCount = (Test-NetConnection -TraceRoute -ComputerName $url).TraceRoute.Count

        $item = [PSCustomObject]@{
            URL           = $url
            PingSucceeded = $null
            RemoteAddress = $null
            RoundTripTime = $null
            HopCount      = $null
        }

        if($result)
        {
            $item.PingSucceeded = $result.PingSucceeded
            $item.RemoteAddress = $result.RemoteAddress
            $item.RoundTripTime = $result.RoundTripTime
        }
        else
        {
            #The connection test returned nothing; record the failure and fall back to a plain DNS lookup.
            $item.PingSucceeded = $false;
            $item.RemoteAddress = (Resolve-DnsName $url -Type A -ErrorAction SilentlyContinue)[0].IPAddress
        }

        if($hopCount)
        {
            $item.HopCount = $hopCount
        }

        $returnArray += $item;
        return $returnArray;
    }
}
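
#Optional sanity check: load the module definition locally and call the function once before parallelizing.
#(This mirrors what each background job does at startup.)
#  New-Module -Name MyFunctions -ScriptBlock $moduleDefinition | Out-Null
#  Get-NetworkConnectionInfo 'google.com'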

Function PrintStatus()
{
    Param([int]$NumOfItems)

    $completedjobs = (Get-Job -State Completed).Count
    $runningjobs = (Get-Job -State Running).Count

    Write-Host "[Running Jobs: $($runningjobs)]/[Completed Jobs: $($completedjobs)]/[Total Items: $($NumOfItems)] | [$([math]::Round((100*($completedjobs/$NumOfItems)), 2))%]"
}
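
#PrintStatus emits a line like the following (illustrative numbers):
#  [Running Jobs: 40]/[Completed Jobs: 120]/[Total Items: 5000] | [2.4%]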

Function CancelOutStandingJobs()
{
    $runningjobs = Get-Job -State Running

    foreach($job in $runningjobs)
    {
        $job.StopJob();
        $job.Dispose();
    }
    Get-Job | Remove-Job
}


Function ThrottleJobs()
{
    Param([int]$NumOfItems)

    #The number of concurrent jobs allowed is set by an integer at the beginning of script.
    #This loop essentially blocks the start of new jobs whenever the maximum number of jobs is already running.
    while((Get-Job -State Running).Count -ge $maxConcurrentJobs)
    {
        $runningjobs = Get-Job -State Running

        PrintStatus -NumOfItems $NumOfItems

        foreach($job in $runningjobs)
        {
            #Dispose of jobs that are stuck. The time limit is defined at the start of the script.
            if((Get-Date).AddMinutes(-$timeLimitperJob) -ge $job.PSBeginTime)
            {
                $job.StopJob();
                $job.Dispose();
            }
        }
        Start-Sleep $statusUpdateInterval
    }
}


function WaitforCompletion()
{
    Param([int]$NumOfItems)

    #Wait for running jobs to complete.
    while(Get-Job -State Running)
    {
        $runningjobs = Get-Job -State Running

        PrintStatus -NumOfItems $NumOfItems

        foreach($job in $runningjobs)
        {
            #Dispose of jobs that are stuck. The time limit is defined at the start of the script.
            if((Get-Date).AddMinutes(-$timeLimitperJob) -ge $job.PSBeginTime)
            {
                $job.StopJob();
                $job.Dispose();
            }
        }
        Start-Sleep $statusUpdateInterval
    }
}

function runJobs()
{   
    Param([array]$listOfUrls)

    $arrayofJobs = @();
    $arrayofData = @();
    $NumOfItems = $listOfUrls.Count

    for($index=0; $index -lt $listOfUrls.Count ; $index++)
    {
         
        $thisUrl = $listOfUrls[$index].ToString().Trim();

        PrintStatus -NumOfItems $NumOfItems    
        
        #Call function to assess job count, and throttle if applicable.   
        ThrottleJobs -NumOfItems $NumOfItems

        #Create a new background job. Because PowerShell does not allow controlling the ID attribute of a job, we instead
        #name each job with a controllable integer. Since we are stepping through an array of elements, each job is named by its $index.
        $arrayofJobs +=
        Start-Job -Name $index -ScriptBlock {
            #Recreate the module definition inside the job, then load it so the custom function is available here.
            $modDef = [ScriptBlock]::Create($Using:moduleDefinition)
            New-Module -Name MyFunctions -ScriptBlock $modDef | Out-Null

            Get-NetworkConnectionInfo @args
        } -ArgumentList $thisUrl
    }

    #Wait for running jobs to complete.
    WaitforCompletion -NumOfItems $NumOfItems

    Write-Host "100% Complete"

    #Collect each job's output by name. Jobs that were stopped for exceeding the time limit return no data.
    foreach($job in $arrayofJobs)
    {
        $arrayofData += Receive-Job -Name $job.Name
    }

    return $arrayofData
}

#Import a list of URLs from an external .txt file in the same folder as this script.
$listofUrls = Get-Content -Path "$($PSScriptRoot)\listOfUrls.txt"

#Terminate any already-running jobs prior to execution. Note: this clears ALL jobs in the current PowerShell session.
CancelOutStandingJobs

runJobs -listOfUrls $listofUrls | Select-Object URL, PingSucceeded, RemoteAddress, RoundTripTime, HopCount | Format-Table

#Terminate any lingering jobs - there should not be any though. 
CancelOutStandingJobs

The script ingests a .txt file (listOfUrls.txt, saved alongside the script) that contains one URL per line. Example:

google.com 
youtube.com 
facebook.com 
wikipedia.org 
Nbc.com 
taobao.com 
tmall.com 
yahoo.com 
amazon.com 
twitter.com 
live.com 
instagram.com 
reddit.com 
sina.com.cn 
yandex.ru 
360.cn 
login.tmall.com
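
As an aside: if you are running PowerShell 7 or later, ForEach-Object -Parallel offers a simpler built-in route to the same idea. A minimal sketch (not a drop-in replacement for the script above, since it skips the per-job timeout and status reporting):

$results = Get-Content "$PSScriptRoot\listOfUrls.txt" | ForEach-Object -Parallel {
    Test-NetConnection $_.Trim() | Select-Object ComputerName, PingSucceeded, RemoteAddress
} -ThrottleLimit 40
$results | Format-Table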

Please do not hesitate to contact me if you have any questions, and feel free to comment below!