Lesson 2: Piping commands#

We’ve been using one command at a time, but what if we want to use multiple?

For example let’s say I want to only see what fruits there are and their availability from fruits_extended.csv in a nicely formatted table.

Selecting the columns#

Take a brief look at qsv select:

qsv select -h
Select columns from CSV data efficiently.

This command lets you manipulate the columns in CSV data. You can re-order,
duplicate, reverse or drop them. Columns can be referenced by index or by
name if there is a header row (duplicate column names can be disambiguated with
more indexing). Column ranges can also be specified. Finally, columns can be
selected using regular expressions.

  Select the first and fourth columns:
  $ qsv select 1,4

  Select the first 4 columns (by index and by name):
  $ qsv select 1-4
  $ qsv select Header1-Header4

  Ignore the first 2 columns (by range and by omission):
  $ qsv select 3-
  $ qsv select '!1-2'

  Select the third column named 'Foo':
  $ qsv select 'Foo[2]'

  Select the first and last columns, _ is a special character for the last column:
  $ qsv select 1,_

  Reverse the order of columns:
  $ qsv select _-1

  Sort the columns lexicographically (i.e. by their byte values)
  $ qsv select 1- --sort

  Select some columns and then sort them:
  $ qsv select 1,4,5-7 --sort

  Randomly shuffle the columns:
  $ qsv select 1- --random
  # with a seed
  $ qsv select 1- --random --seed 42

  Select some columns and then shuffle them with a seed:
  $ qsv select 1,4,5-7 --random --seed 42

  Select columns using a regex using '/<regex>/':
  # select columns starting with 'a'
  $ qsv select /^a/
  # select columns with a digit
  $ qsv select '/^.*\d.*$/'
  # remove SSN, account_no and password columns
  $ qsv select '!/SSN|account_no|password/'

  Re-order and duplicate columns arbitrarily using different types of selectors:
  $ qsv select 3-1,Header3-Header1,Header1,Foo[2],Header1

  Quote column names that conflict with selector syntax:
  $ qsv select '\"Date - Opening\",\"Date - Actual Closing\"'

For more examples, see https://github.com/jqnatividad/qsv/blob/master/tests/test_select.rs.

Usage:
    qsv select [options] [--] <selection> [<input>]
    qsv select --help

select arguments:
    <selection>            The columns to select. 
                           You can select columns by index, by name, by range, by regex and
                           any combination of these. If the first character is '!', the
                           selection will be inverted. If the selection contains embedded
                           spaces or characters that conflict with selector syntax, it must
                           be quoted. See examples above.

select options:
These options only apply to the `select` command, not the `--select` option in other commands.

    -R, --random           Randomly shuffle the columns in the selection.
    --seed <number>        Seed for the random number generator.

    -S, --sort             Sort the selected columns lexicographically,
                           i.e. by their byte values.

Common options:
    -h, --help             Display this message
    -o, --output <file>    Write output to <file> instead of stdout.
    -n, --no-headers       When set, the first row will not be interpreted
                           as headers. (i.e., They are not searched, analyzed,
                           sliced, etc.)
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)

There are several ways to select the columns we want. Let’s take a look at the headers first:

qsv headers fruits_extended.csv
1   fruit
2   price
3   size
4   availability

We only want data within the fruit and availability columns. Let’s try selecting the data within those columns:

qsv select 1,4 fruits_extended.csv
fruit,availability
apple,available
banana,available
strawberry,available
orange,out of stock
pineapple,available
grape,out of stock
mango,available
watermelon,available
pear,out of stock

Great, we got the column data that we’re looking for. But how do we run this data through qsv table?

Command redirection#

If you’re not sure what stdin, stdout, and stderr are then we recommend reading the Command data streams section in the Appendix.

For example let’s say we want to display the previous output with qsv table. We can run the following to pipe the output into qsv table:

qsv select 1,4 fruits_extended.csv | qsv table
fruit       availability
apple       available
banana      available
strawberry  available
orange      out of stock
pineapple   available
grape       out of stock
mango       available
watermelon  available
pear        out of stock

Now we’ve got what we were looking for!

Notice that the output of the first command qsv select 1,4 fruits_extended.csv was used as the input for qsv table.

Exercise 2: Piping commands example#

Binder

Pipe the first and second columns of fruits_extended.csv into qsv table.

After running that pipeline and viewing the output, try adding qsv transpose before qsv table in the pipeline and see what the output looks like.

Here we show the usage text of qsv select for your reference. Solve this exercise using Thebe, Binder or locally.

qsv select --help
Select columns from CSV data efficiently.

This command lets you manipulate the columns in CSV data. You can re-order,
duplicate, reverse or drop them. Columns can be referenced by index or by
name if there is a header row (duplicate column names can be disambiguated with
more indexing). Column ranges can also be specified. Finally, columns can be
selected using regular expressions.

  Select the first and fourth columns:
  $ qsv select 1,4

  Select the first 4 columns (by index and by name):
  $ qsv select 1-4
  $ qsv select Header1-Header4

  Ignore the first 2 columns (by range and by omission):
  $ qsv select 3-
  $ qsv select '!1-2'

  Select the third column named 'Foo':
  $ qsv select 'Foo[2]'

  Select the first and last columns, _ is a special character for the last column:
  $ qsv select 1,_

  Reverse the order of columns:
  $ qsv select _-1

  Sort the columns lexicographically (i.e. by their byte values)
  $ qsv select 1- --sort

  Select some columns and then sort them:
  $ qsv select 1,4,5-7 --sort

  Randomly shuffle the columns:
  $ qsv select 1- --random
  # with a seed
  $ qsv select 1- --random --seed 42

  Select some columns and then shuffle them with a seed:
  $ qsv select 1,4,5-7 --random --seed 42

  Select columns using a regex using '/<regex>/':
  # select columns starting with 'a'
  $ qsv select /^a/
  # select columns with a digit
  $ qsv select '/^.*\d.*$/'
  # remove SSN, account_no and password columns
  $ qsv select '!/SSN|account_no|password/'

  Re-order and duplicate columns arbitrarily using different types of selectors:
  $ qsv select 3-1,Header3-Header1,Header1,Foo[2],Header1

  Quote column names that conflict with selector syntax:
  $ qsv select '\"Date - Opening\",\"Date - Actual Closing\"'

For more examples, see https://github.com/jqnatividad/qsv/blob/master/tests/test_select.rs.

Usage:
    qsv select [options] [--] <selection> [<input>]
    qsv select --help

select arguments:
    <selection>            The columns to select. 
                           You can select columns by index, by name, by range, by regex and
                           any combination of these. If the first character is '!', the
                           selection will be inverted. If the selection contains embedded
                           spaces or characters that conflict with selector syntax, it must
                           be quoted. See examples above.

select options:
These options only apply to the `select` command, not the `--select` option in other commands.

    -R, --random           Randomly shuffle the columns in the selection.
    --seed <number>        Seed for the random number generator.

    -S, --sort             Sort the selected columns lexicographically,
                           i.e. by their byte values.

Common options:
    -h, --help             Display this message
    -o, --output <file>    Write output to <file> instead of stdout.
    -n, --no-headers       When set, the first row will not be interpreted
                           as headers. (i.e., They are not searched, analyzed,
                           sliced, etc.)
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)