Python... beats... Battlestar PowerShell

I last wrote about my PowerShell explorations as they related to scouring Excel files for certain values. Well, let me tell you… for this job, it’s garbage. It took several minutes to process even 10 files. Knowing I was up against many thousands, I had to regroup. And what I found was Python, and a pretty swell module called “openpyxl”.

By using this module, I was able to use some very efficient Open XML-based hooks to hammer through over 20,000 Excel files and return the results within hours. It helps to know that modern Office files are essentially zip archives containing XML-based document information, which is what makes a Python module like this possible. By removing the need to drive Excel directly, it went from 0 to 100 in speed, and what a life saver.
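To see the "zip archive full of XML" point for yourself, here is a minimal sketch using only the Python standard library. The function names `list_parts` and `shared_strings` are made up for illustration. It assumes the usual .xlsx layout, where text cells are typically stored in the `xl/sharedStrings.xml` part; numbers and inline strings live elsewhere, so this is a peek under the hood and not a replacement for openpyxl.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML parts such as xl/sharedStrings.xml
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def list_parts(path):
    """Return the names of the files packed inside an Office document."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


def shared_strings(path):
    """Return the text entries of the shared string table, if there is one."""
    with zipfile.ZipFile(path) as archive:
        if "xl/sharedStrings.xml" not in archive.namelist():
            return []
        root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
    # Each <si> is one string; rich text splits it across several <t> runs
    return [
        "".join(t.text or "" for t in si.iterfind(".//m:t", NS))
        for si in root.iterfind("m:si", NS)
    ]
```

Calling `list_parts("report.xlsx")` on a real workbook (the file name is a placeholder) lists parts such as `[Content_Types].xml` and the files under `xl/`, all readable without Excel being installed.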

Here is the library and accompanying info: Link
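For a rough idea of what a scan like this can look like with openpyxl, here is a minimal sketch. It is not the script used for the job described here; `find_value`, the folder and the search value are hypothetical, and it assumes an exact match on cell values in .xlsx files. It opens each workbook in openpyxl's read-only mode, which streams rows instead of loading the whole file into memory.

```python
from pathlib import Path

from openpyxl import load_workbook


def find_value(root, needle):
    """Yield (file, sheet, cell) for every cell under root equal to needle."""
    for path in sorted(Path(root).rglob("*.xlsx")):
        try:
            # data_only returns cached formula results instead of formulas
            wb = load_workbook(path, read_only=True, data_only=True)
        except Exception as exc:
            # Corrupt or otherwise unreadable file: report it and move on
            print(f"skipped {path}: {exc}")
            continue
        try:
            for ws in wb.worksheets:
                for row in ws.iter_rows():
                    for cell in row:
                        if cell.value is not None and cell.value == needle:
                            yield str(path), ws.title, cell.coordinate
        finally:
            # Read-only workbooks keep the file open until closed
            wb.close()
```

Something like `for hit in find_value("C:/reports", "ACME"): print(hit)` would then print one line per match (both arguments are placeholders). Skipping unreadable files rather than stopping matters when the run covers thousands of workbooks.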

If you ever find yourself having to scan and list a lot of files quickly, and they are Office files, this is a no-brainer.