How can I parse CSV files on the Linux command line? [closed]

How can I parse CSV files on the Linux command line? [closed]

How can I parse CSV files on the Linux command line?
To do things like:
csvparse -c 2,5,6 filename

to extract fields from columns 2, 5 and 6 from all rows.
It should be able to handle the csv file format: http://tools.ietf.org/html/rfc4180 which means quoting fields and escaping inner quotes as appropriate, so for an example row with 3 fields:
field1,”field, number “”2″”, has inner quotes and a comma”,field3

so that if I request field 2 for the row above I get:
field, number “2”, has inner quotes and a comma

I appreciate that there are numerous solutions, Perl, Awk (etc.) to this problem but I would like a native bash command line tool that does not require me to invoke some other scripting environment or write any additional code(!).

Solutions/Answers:

Solution 1:

My FOSS CSV stream editor CSVfix does exactly what you want. There is a binary installer for Windows, and a compilable version (via a makefile) for UNIX/Linux.

Solution 2:

csvtool is really good. Available in Debian / Ubuntu (apt-get install csvtool). Example:

csvtool namedcol Account,Cost input.csv > output.csv

See the CSVTool manual page for usage tips.

Solution 3:

As suggested by @Jonathan in a comment, there is a module for python that provides the command line tool csvfilter. It works like cut, but properly handles CSV column quoting:

csvfilter -f 1,3,5 in.csv > out.csv

If you have python (and you should), you can install it simply like this:

pip install csvfilter

More info at https://github.com/codeinthehole/csvfilter/

Solution 4:

I found csvkit to be useful, it is based on python csv module and has quite a lot of options for parsing complex csv files.

Related:  How to use grep with large (millions) number of files to search for string and get result in few minutes

Although it seems to be a bit slow. I am getting 4MB/s (with 100% cpu) when extracting one field from a 7GB csv with 5 columns.

To extract 4th column from file.csv

csvcut -c 4 file.csv

Solution 5:

Try crush-tools, they are great at manipulating delimited data.
It sounds like exactly what you’re looking for.

Solution 6:

I wrote one of these tools too (UNIX only) called csvprintf. It can also converts to XML in an online fashion.

References