Issue
UPDATE (06/21/22): See my updated script below, which utilizes some of the answer.
I am building a script to search for $name
through a large batch of CSV files. These files can be as big as 67,000 KB. This is my script that I use to search the files:
Powershell Script
Essentially, I use Import-Csv
. I change a few things depending on the file name, however. For example, some files don't have headers, or they may use a different delimiter. Then I store all the matches in $results
and then return that variable. This is all put in a function called CSVSearch
for ease of running.
#create function called CSV Search
function CSVSearch{
#prompt
$name = Read-Host -Prompt 'Input name'
#set path to root folder
$path = 'Path\to\root\folder\'
#get the file path for each CSV file in root folder
$files = Get-ChildItem $path -Filter *.csv | Select-Object -ExpandProperty FullName
#count files in $files
$filesCount = $files.Count
#create empty array, $results
$results= @()
#count for write-progress
$i = 0
foreach($file in $files){
Write-Progress -Activity "Searching files: $i out of $filesCount searched. $resultsCount match(es) found" -PercentComplete (($i/$files.Count)*100)
#import method changes depending on CSV file name found in $file (headers, delimiters).
if($file -match 'File1*'){$results += Import-Csv $file -Header A, Name, C, D -Delimiter '|' | Select-Object *,@{Name='FileName';Expression={$file}} | Where-Object { $_.'Name' -match $name}}
if($file -match 'File2*'){$results += Import-Csv $file -Header A, B, Name -Delimiter '|' | Select-Object *,@{Name='FileName';Expression={$file}} | Where-Object { $_.'Name' -match $name}}
if($file -match 'File3*'){$results += Import-Csv $file | Select-Object *,@{Name='FileName';Expression={$file}} | Where-Object { $_.'Name' -match $name}}
if($file -match 'File4*'){$results += Import-Csv $file | Select-Object *,@{Name='FileName';Expression={$file}} | Where-Object { $_.'Name' -match $name}}
$i++
$resultsCount = $results.Count
}
#if the loop ends and $results array is empty, return "No matches."
if(!$results){Write-Host 'No matches found.' -ForegroundColor Yellow}
#return results stored in $results variable
else{$results
Write-Host $resultsCount 'matches found.' -ForegroundColor Green
Write-Progress -Activity "Completed" -Completed}
}
CSVSearch
Below are what the CSV files look like. Obviously, the amount of the data below is not going to equate to the actual size of the files. But below is the basic structure:
CSV files
File1.csv
1|Moonknight|QWEPP|L
2|Star Wars|QWEPP|T
3|Toy Story|QWEPP|U
File2.csv
JKLH|1|Moonknight
ASDF|2|Star Wars
QWER|3|Toy Story
File3.csv
1,Moonknight,AA,DDD
2,Star Wars,BB,CCC
3,Toy Story,CC,EEE
File4.csv
1,Moonknight,QWE
2,Star Wars,QWE
3,Toy Story,QWE
The script works great. Here is an example of the output I would receive if $name = Moonknight
:
Example of results
A : 1
Name : Moonknight
C: QWE
FileName: Path\to\root\folder\File4.csv
A : 1
Name : Moonknight
B : AA
C : DDD
FileName: Path\to\root\folder\File3.csv
A : JKLH
B : 1
Name : Moonknight
FileName: Path\to\root\folder\File2.csv
A : 1
Name : Moonknight
C : QWEPP
D : L
FileName: Path\to\root\folder\File1.csv
4 matches found.
However, it is very slow, and I have a lot of files to search through. Any ideas on how to speed my script up?
Edit: I must mention. I tried importing the data into a hash table and then searching the hash table, but that was much slower.
UPDATED SCRIPT - My Solution (06/21/22):
This update utilizes some of Santiago's script below. I was having a hard time decoding everything he did, as I am new to PowerShell. So I sort of jerry rigged my own solution, that used a lot of his script/ideas.
The one thing that made a huge difference was outputting $results[$i]
which returns the most recent match as the script is running. Probably not the most efficient way to do it, but it works for what I'm trying to do. Thanks!
function CSVSearch{
[cmdletbinding()]
param(
[Parameter(Mandatory)]
[string] $Name
)
$files = Get-ChildItem 'Path\to\root\folder\' -Filter *.csv -Recurse | %{$_.FullName}
$results = @()
$i = 0
foreach($file in $files){
if($file -like '*File1*'){$results += Import-Csv $file -Header A, Name, C, D -Delimiter '|' | Where-Object { $_.'Name' -match $Name} | Select-Object *,@{Name='FileName';Expression={$file}}}
if($file -like' *File2*'){$results += Import-Csv $file -Header A, B, Name -Delimiter '|' | Where-Object { $_.'Name' -match $Name} | Select-Object *,@{Name='FileName';Expression={$file}}}
if($file -like '*File3*'){$results += Import-Csv $file | Where-Object { $_.'Name' -match $Name} | Select-Object *,@{Name='FileName';Expression={$file}}}
if($file -like '*File4*'){$results += Import-Csv $file | Where-Object { $_.'Name' -match $Name} | Select-Object *,@{Name='FileName';Expression={$file}}}
$results[$i]
$i++
}
if(-not $results) {
Write-Host 'No matches found.' -ForegroundColor Yellow
return
}
Write-Host "$($results.Count) matches found." -ForegroundColor Green
}
Solution
Give this one a try, it should be a bit faster. Select-Object
has to reconstruct your object, if you use it before filtering, you're actually recreating your entire CSV, you want to filter first (Where-Object
/ .Where
) before reconstructing it.
.Where
should be a faster than Where-Object
here, the caveat is that the intrinsic method requires that the collections already exists in memory, there is no pipeline processing and no streaming.
Write-Progress
will only slow down your script, better remove it.
Lastly, you can use splatting to avoid having multiple if
conditions.
function CSVSearch {
[cmdletbinding()]
param(
[Parameter(Mandatory)]
[string] $Name,
[Parameter()]
[string] $Path = 'Path\to\root\folder\'
)
$param = @{
File1 = @{ Header = 'A', 'Name', 'C', 'D'; Delimiter = '|' }
File2 = @{ Header = 'A', 'B', 'Name' ; Delimiter = '|' }
File3 = @{}; File4 = @{} # File3 & 4 should have headers ?
}
$results = foreach($file in Get-ChildItem . -Filter file*.csv) {
$thisparam = $param[$file.BaseName]
$thisparam['LiteralPath'] = $file.FullName
(Import-Csv @thisparam).where{ $_.Name -match $name } |
Select-Object *, @{Name='FileName';Expression={$file}}
}
if(-not $results) {
Write-Host 'No matches found.' -ForegroundColor Yellow
return
}
Write-Host "$($results.Count) matches found." -ForegroundColor Green
$results
}
CSVSearch -Name Moonknight
If you want the function to stream results as they're found, you can use a Filter, this is a very efficient filtering technique, certainly faster than Where-Object
:
function CSVSearch {
[cmdletbinding()]
param(
[Parameter(Mandatory)]
[string] $Name,
[Parameter()]
[string] $Path = 'Path\to\root\folder\'
)
begin {
$param = @{
File1 = @{ Header = 'A', 'Name', 'C', 'D'; Delimiter = '|' }
File2 = @{ Header = 'A', 'B', 'Name' ; Delimiter = '|' }
File3 = @{}; File4 = @{} # File3 & 4 should have headers ?
}
$counter = [ref] 0
filter myFilter {
if($_.Name -match $name) {
$counter.Value++
$_ | Select-Object *, @{N='FileName';E={$file}}
}
}
}
process {
foreach($file in Get-ChildItem $path -Filter *.csv) {
$thisparam = $param[$file.BaseName]
$thisparam['LiteralPath'] = $file.FullName
Import-Csv @thisparam | myFilter
}
}
end {
if(-not $counter.Value) {
Write-Host 'No matches found.' -ForegroundColor Yellow
return
}
Write-Host "$($counter.Value) matches found." -ForegroundColor Green
}
}
Answered By - Santiago Squarzon Answer Checked By - Candace Johnson (PHPFixing Volunteer)
0 Comments:
Post a Comment
Note: Only a member of this blog may post a comment.