How can I parse CSV files on the Linux command line? [closed]
How can I parse CSV files on the Linux command line?
To do things like:
csvparse -c 2,5,6 filename
to extract fields from columns 2, 5 and 6 from all rows.
It should be able to handle the csv file format: http://tools.ietf.org/html/rfc4180 which means quoting fields and escaping inner quotes as appropriate, so for an example row with 3 fields:
field1,”field, number “”2″”, has inner quotes and a comma”,field3
so that if I request field 2 for the row above I get:
field, number “2”, has inner quotes and a comma
I appreciate that there are numerous solutions, Perl, Awk (etc.) to this problem but I would like a native bash command line tool that does not require me to invoke some other scripting environment or write any additional code(!).
My FOSS CSV stream editor CSVfix does exactly what you want. There is a binary installer for Windows, and a compilable version (via a makefile) for UNIX/Linux.
csvtool is really good. Available in Debian / Ubuntu (
apt-get install csvtool). Example:
csvtool namedcol Account,Cost input.csv > output.csv
See the CSVTool manual page for usage tips.
As suggested by @Jonathan in a comment, there is a module for python that provides the command line tool csvfilter. It works like cut, but properly handles CSV column quoting:
csvfilter -f 1,3,5 in.csv > out.csv
If you have python (and you should), you can install it simply like this:
pip install csvfilter
More info at https://github.com/codeinthehole/csvfilter/
I found csvkit to be useful, it is based on python csv module and has quite a lot of options for parsing complex csv files.
Although it seems to be a bit slow. I am getting 4MB/s (with 100% cpu) when extracting one field from a 7GB csv with 5 columns.
To extract 4th column from
csvcut -c 4 file.csv
Try crush-tools, they are great at manipulating delimited data.
It sounds like exactly what you’re looking for.
I wrote one of these tools too (UNIX only) called csvprintf. It can also converts to XML in an online fashion.