Lesson 0: Exploring qsv help messages and syntax

Listing all commands

This may be your first time using qsv, so let’s see what qsv has to offer. We’ll run qsv with the --list flag.

qsv --list
Installed commands (55):
    apply       Apply series of transformations to a column
    behead      Drop header from CSV file
    cat         Concatenate by row or column
    count       Count records
    datefmt     Format date/datetime strings
    dedup       Remove redundant rows
    describegpt Infer extended metadata using a LLM
    diff        Find the difference between two CSVs
    enum        Add a new column enumerating CSV lines
    excel       Exports an Excel sheet to a CSV
    exclude     Excludes the records in one CSV from another
    explode     Explode rows based on some column separator
    extdedup    Remove duplicates rows from an arbitrarily large text file
    extsort     Sort arbitrarily large text file
    fetch       Fetches data from web services for every row using HTTP Get.
    fetchpost   Fetches data from web services for every row using HTTP Post.
    fill        Fill empty values
    fixlengths  Makes all records have same length
    flatten     Show one field per line
    fmt         Format CSV output (change field delimiter)
    foreach     Loop over a CSV file to execute bash commands (*nix only)
    frequency   Show frequency tables
    geocode     Geocodes a location against the Geonames cities database.
    headers     Show header names
    help        Show this usage message
    index       Create CSV index for faster access
    input       Read CSVs w/ special quoting, skipping, trimming & transcoding rules
    join        Join CSV files
    joinp       Join CSV files using the Pola.rs engine
    jsonl       Convert newline-delimited JSON files to CSV
    luau        Execute Luau script on CSV data
    partition   Partition CSV data based on a column value
    pseudo      Pseudonymise the values of a column
    rename      Rename the columns of CSV data efficiently
    replace     Replace patterns in CSV data
    reverse     Reverse rows of CSV data
    safenames   Modify a CSV's header names to db-safe names
    sample      Randomly sample CSV data
    schema      Generate JSON Schema from CSV data
    search      Search CSV data with a regex
    searchset   Search CSV data with a regex set
    select      Select, re-order, duplicate or drop columns
    slice       Slice records from CSV
    snappy      Compress/decompress data using the Snappy algorithm
    sniff       Quickly sniff CSV metadata
    sort        Sort CSV data in alphabetical, numerical, reverse or random order
    sortcheck   Check if a CSV is sorted
    split       Split CSV data into many files
    sqlp        Run a SQL query against several CSVs using the Pola.rs engine
    stats       Infer data types and compute summary statistics
    table       Align CSV data into columns
    tojsonl     Convert CSV to newline-delimited JSON
    to          Convert CSVs to PostgreSQL/XLSX/Parquet/SQLite/Data Package
    transpose   Transpose rows/columns of CSV data
    validate    Validate CSV data for RFC4180-compliance or with JSON Schema

sponsored by datHere - Data Infrastructure Engineering (https://qsv.datHere.com)

Here we see a list of the installed commands, each with a brief description.[1]

Viewing a command’s help message

You may view a command’s help message by running:

qsv <command> --help

For example, we can run the following to view the help message for the headers command:

qsv headers --help
Prints the fields of the first row in the CSV data.

These names can be used in commands like 'select' to refer to columns in the
CSV data.

Note that multiple CSV files may be given to this command. This is useful with
the --intersect flag.

For examples, see https://github.com/jqnatividad/qsv/blob/master/tests/test_headers.rs.

Usage:
    qsv headers [options] [<input>...]
    qsv headers --help

headers arguments:
    <input>...             The CSV file(s) to read. Use '-' for standard input.
                           If input is a directory, all files in the directory will
                           be read as input.
                           If the input is a file with a '.infile-list' extension,
                           the file will be read as a list of input files.
                           If the input are snappy-compressed files(s), it will be
                           decompressed automatically.

headers options:
    -j, --just-names       Only show the header names (hide column index).
                           This is automatically enabled if more than one
                           input is given.
    --intersect            Shows the intersection of all headers in all of
                           the inputs given.
    --trim                 Trim space & quote characters from header name.

Common options:
    -h, --help             Display this message
    -d, --delimiter <arg>  The field delimiter for reading CSV data.
                           Must be a single character. (default: ,)

Help messages for other qsv commands usually follow a similar structure:

  • A description of the command

  • More details

  • Examples and/or a link to them

  • Usage format

  • Subcommands[2]

  • Arguments

  • Options (flags)

Displaying headers of a CSV

Let’s try viewing the headers in the fruits.csv file located in lessons/0. Based on the command format in the “Usage” section of the help message for qsv headers, we’ll run:

qsv headers fruits.csv
1   fruit
2   price

Recap

In this lesson we’ve covered how to:

  • List all available qsv commands with qsv --list

  • View the help message for an individual command with qsv <command> --help

  • Interpret the parts of a command help message

  • Run a command on a CSV file, for example getting its headers with qsv headers <filepath>

Now it’s your turn to take on the first exercise.

Exercise 0: Total rows


Using a qsv command, get the total number of rows in the fruits.csv file.

For your reference, the list of qsv commands is repeated below. You can solve this exercise using Thebe, Binder, or a local installation of qsv.

qsv --list
Installed commands (55):
    apply       Apply series of transformations to a column
    behead      Drop header from CSV file
    cat         Concatenate by row or column
    count       Count records
    datefmt     Format date/datetime strings
    dedup       Remove redundant rows
    describegpt Infer extended metadata using a LLM
    diff        Find the difference between two CSVs
    enum        Add a new column enumerating CSV lines
    excel       Exports an Excel sheet to a CSV
    exclude     Excludes the records in one CSV from another
    explode     Explode rows based on some column separator
    extdedup    Remove duplicates rows from an arbitrarily large text file
    extsort     Sort arbitrarily large text file
    fetch       Fetches data from web services for every row using HTTP Get.
    fetchpost   Fetches data from web services for every row using HTTP Post.
    fill        Fill empty values
    fixlengths  Makes all records have same length
    flatten     Show one field per line
    fmt         Format CSV output (change field delimiter)
    foreach     Loop over a CSV file to execute bash commands (*nix only)
    frequency   Show frequency tables
    geocode     Geocodes a location against the Geonames cities database.
    headers     Show header names
    help        Show this usage message
    index       Create CSV index for faster access
    input       Read CSVs w/ special quoting, skipping, trimming & transcoding rules
    join        Join CSV files
    joinp       Join CSV files using the Pola.rs engine
    jsonl       Convert newline-delimited JSON files to CSV
    luau        Execute Luau script on CSV data
    partition   Partition CSV data based on a column value
    pseudo      Pseudonymise the values of a column
    rename      Rename the columns of CSV data efficiently
    replace     Replace patterns in CSV data
    reverse     Reverse rows of CSV data
    safenames   Modify a CSV's header names to db-safe names
    sample      Randomly sample CSV data
    schema      Generate JSON Schema from CSV data
    search      Search CSV data with a regex
    searchset   Search CSV data with a regex set
    select      Select, re-order, duplicate or drop columns
    slice       Slice records from CSV
    snappy      Compress/decompress data using the Snappy algorithm
    sniff       Quickly sniff CSV metadata
    sort        Sort CSV data in alphabetical, numerical, reverse or random order
    sortcheck   Check if a CSV is sorted
    split       Split CSV data into many files
    sqlp        Run a SQL query against several CSVs using the Pola.rs engine
    stats       Infer data types and compute summary statistics
    table       Align CSV data into columns
    tojsonl     Convert CSV to newline-delimited JSON
    to          Convert CSVs to PostgreSQL/XLSX/Parquet/SQLite/Data Package
    transpose   Transpose rows/columns of CSV data
    validate    Validate CSV data for RFC4180-compliance or with JSON Schema

sponsored by datHere - Data Infrastructure Engineering (https://qsv.datHere.com)