Lesson 2: Piping commands#
We’ve been using one command at a time, but what if we want to use multiple?
For example let’s say I want to only see what fruits there are and their availability from fruits_extended.csv
in a nicely formatted table.
Selecting the columns#
Take a brief look at qsv select
:
qsv select -h
Select columns from CSV data efficiently.
This command lets you manipulate the columns in CSV data. You can re-order,
duplicate, reverse or drop them. Columns can be referenced by index or by
name if there is a header row (duplicate column names can be disambiguated with
more indexing). Column ranges can also be specified. Finally, columns can be
selected using regular expressions.
Select the first and fourth columns:
$ qsv select 1,4
Select the first 4 columns (by index and by name):
$ qsv select 1-4
$ qsv select Header1-Header4
Ignore the first 2 columns (by range and by omission):
$ qsv select 3-
$ qsv select '!1-2'
Select the third column named 'Foo':
$ qsv select 'Foo[2]'
Select the first and last columns, _ is a special character for the last column:
$ qsv select 1,_
Reverse the order of columns:
$ qsv select _-1
Sort the columns lexicographically (i.e. by their byte values)
$ qsv select 1- --sort
Select some columns and then sort them:
$ qsv select 1,4,5-7 --sort
Randomly shuffle the columns:
$ qsv select 1- --random
# with a seed
$ qsv select 1- --random --seed 42
Select some columns and then shuffle them with a seed:
$ qsv select 1,4,5-7 --random --seed 42
Select columns using a regex using '/<regex>/':
# select columns starting with 'a'
$ qsv select /^a/
# select columns with a digit
$ qsv select '/^.*\d.*$/'
# remove SSN, account_no and password columns
$ qsv select '!/SSN|account_no|password/'
Re-order and duplicate columns arbitrarily using different types of selectors:
$ qsv select 3-1,Header3-Header1,Header1,Foo[2],Header1
Quote column names that conflict with selector syntax:
$ qsv select '\"Date - Opening\",\"Date - Actual Closing\"'
For more examples, see https://github.com/jqnatividad/qsv/blob/master/tests/test_select.rs.
Usage:
qsv select [options] [--] <selection> [<input>]
qsv select --help
select arguments:
<selection> The columns to select.
You can select columns by index, by name, by range, by regex and
any combination of these. If the first character is '!', the
selection will be inverted. If the selection contains embedded
spaces or characters that conflict with selector syntax, it must
be quoted. See examples above.
select options:
These options only apply to the `select` command, not the `--select` option in other commands.
-R, --random Randomly shuffle the columns in the selection.
--seed <number> Seed for the random number generator.
-S, --sort Sort the selected columns lexicographically,
i.e. by their byte values.
Common options:
-h, --help Display this message
-o, --output <file> Write output to <file> instead of stdout.
-n, --no-headers When set, the first row will not be interpreted
as headers. (i.e., They are not searched, analyzed,
sliced, etc.)
-d, --delimiter <arg> The field delimiter for reading CSV data.
Must be a single character. (default: ,)
There are several ways to select the columns we want. Let’s take a look at the headers first:
qsv headers fruits_extended.csv
1 fruit
2 price
3 size
4 availability
We only want data within the fruit
and availability
columns. Let’s try selecting the data within those columns:
qsv select 1,4 fruits_extended.csv
fruit,availability
apple,available
banana,available
strawberry,available
orange,out of stock
pineapple,available
grape,out of stock
mango,available
watermelon,available
pear,out of stock
Great, we got the column data that we’re looking for. But how do we run this data through qsv table
?
Command redirection#
If you’re not sure what stdin
, stdout
, and stderr
are then we recommend reading the Command data streams section in the Appendix.
For example let’s say we want to display the previous output with qsv table
. We can run the following to pipe the output into qsv table
:
qsv select 1,4 fruits_extended.csv | qsv table
fruit availability
apple available
banana available
strawberry available
orange out of stock
pineapple available
grape out of stock
mango available
watermelon available
pear out of stock
Now we’ve got what we were looking for!
Notice that the output of the first command qsv select 1,4 fruits_extended.csv
was used as the input for qsv table
.
Exercise 2: Piping commands example#
Pipe the first and second columns of fruits_extended.csv
into qsv table
.
After running that pipeline and viewing the output, try adding qsv transpose
before qsv table
in the pipeline and see what the output looks like.
qsv select --help
Select columns from CSV data efficiently.
This command lets you manipulate the columns in CSV data. You can re-order,
duplicate, reverse or drop them. Columns can be referenced by index or by
name if there is a header row (duplicate column names can be disambiguated with
more indexing). Column ranges can also be specified. Finally, columns can be
selected using regular expressions.
Select the first and fourth columns:
$ qsv select 1,4
Select the first 4 columns (by index and by name):
$ qsv select 1-4
$ qsv select Header1-Header4
Ignore the first 2 columns (by range and by omission):
$ qsv select 3-
$ qsv select '!1-2'
Select the third column named 'Foo':
$ qsv select 'Foo[2]'
Select the first and last columns, _ is a special character for the last column:
$ qsv select 1,_
Reverse the order of columns:
$ qsv select _-1
Sort the columns lexicographically (i.e. by their byte values)
$ qsv select 1- --sort
Select some columns and then sort them:
$ qsv select 1,4,5-7 --sort
Randomly shuffle the columns:
$ qsv select 1- --random
# with a seed
$ qsv select 1- --random --seed 42
Select some columns and then shuffle them with a seed:
$ qsv select 1,4,5-7 --random --seed 42
Select columns using a regex using '/<regex>/':
# select columns starting with 'a'
$ qsv select /^a/
# select columns with a digit
$ qsv select '/^.*\d.*$/'
# remove SSN, account_no and password columns
$ qsv select '!/SSN|account_no|password/'
Re-order and duplicate columns arbitrarily using different types of selectors:
$ qsv select 3-1,Header3-Header1,Header1,Foo[2],Header1
Quote column names that conflict with selector syntax:
$ qsv select '\"Date - Opening\",\"Date - Actual Closing\"'
For more examples, see https://github.com/jqnatividad/qsv/blob/master/tests/test_select.rs.
Usage:
qsv select [options] [--] <selection> [<input>]
qsv select --help
select arguments:
<selection> The columns to select.
You can select columns by index, by name, by range, by regex and
any combination of these. If the first character is '!', the
selection will be inverted. If the selection contains embedded
spaces or characters that conflict with selector syntax, it must
be quoted. See examples above.
select options:
These options only apply to the `select` command, not the `--select` option in other commands.
-R, --random Randomly shuffle the columns in the selection.
--seed <number> Seed for the random number generator.
-S, --sort Sort the selected columns lexicographically,
i.e. by their byte values.
Common options:
-h, --help Display this message
-o, --output <file> Write output to <file> instead of stdout.
-n, --no-headers When set, the first row will not be interpreted
as headers. (i.e., They are not searched, analyzed,
sliced, etc.)
-d, --delimiter <arg> The field delimiter for reading CSV data.
Must be a single character. (default: ,)
Solution
For the first part, you may run:
qsv select 1,2 fruits_extended.csv | qsv table
The output should be:
fruit price
apple 2.50
banana 3.00
strawberry 1.50
orange 2.00
pineapple 3.50
grape 4.00
mango 1.80
watermelon 6.00
pear 2.20
The second part is adding qsv transpose
within the pipeline:
qsv select 1,2 fruits_extended.csv | qsv transpose | qsv table
The output should be:
fruit apple banana strawberry orange pineapple grape mango watermelon pear
price 2.50 3.00 1.50 2.00 3.50 4.00 1.80 6.00 2.20