Three Sets of Chisels Followup

Tags: programming tools

Yesterday’s post got a great follow-up over IM today.

I was wondering what command you’d use to grep through an .xls file on Linux?

That’s a moment where knowing my favorite tools well helped me rule them out quickly. The text processing tools of the Unix shell are terrible for searching through binary files like Excel spreadsheets.

If I had a single Excel spreadsheet with a ton of data to extract, I might convert it to a CSV file and manipulate that in my shell, but even then I’d probably just open it with LibreOffice or Gnumeric and call it a day.
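
For the CSV route, here is a minimal sketch of what "grepping" an exported sheet could look like in Python, assuming the spreadsheet has already been saved as CSV. The `grep_csv` name and the sample data are made up for illustration. A real CSV parser is used instead of plain line matching because quoted cells can contain commas.

```python
import csv
import io
import re


def grep_csv(lines, pattern):
    """Yield (row_number, column_number, cell) for each cell matching pattern.

    `lines` is any iterable of CSV text lines, such as an open file.
    Row and column numbers start at 1, like a spreadsheet.
    """
    regex = re.compile(pattern)
    for row_number, row in enumerate(csv.reader(lines), start=1):
        for column_number, cell in enumerate(row, start=1):
            if regex.search(cell):
                yield row_number, column_number, cell


# Hypothetical sample data standing in for an exported sheet.
sample = io.StringIO(
    "name,city,notes\n"
    'Ada,London,"likes gears, hates meetings"\n'
    "Linus,Helsinki,kernel\n"
)

for hit in grep_csv(sample, r"gears|kernel"):
    print(hit)
```

To search a real export, pass an open file instead of the sample, for example `open("export.csv", newline="")`, where `export.csv` is a placeholder name.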

What if there were tens or hundreds of spreadsheets to sift through?

  1. I’d talk to one of my Windows + .NET friends to see if PowerShell would work.

  2. Or, if PowerShell failed us, I’d ask them about C# APIs that do the same.

  3. Or, as a very last resort, I’d look for an open source tool to do the necessary ETL work.

It’s strange writing “open source” and “last resort” in the same sentence. I’m the weirdo who kept my music collection encoded with Ogg Vorbis for most of a decade, and I haven’t played a commercial computer game since Unreal Tournament ran on my old Gentoo desktop. Maybe it’s old age that’s moving me from zealous to pragmatic, but I think I’ve realized that work isn’t about software for the sake of software. It’s about software for the sake of getting shit done.