The Reads stage involves cleaning up the raw read sequences, as well as other processing steps. Modules to manipulate reads. Use -h after any command for a list of options.

sample_reads

Subsample reads using seqtk (documentation). Input is reads in FASTQ format. Output is sampled reads in FASTQ format. You do not have to have all read options (i.e., read1, read2 AND unpaired reads). You can have a combination of any of those.

Usage:

haphpipe sample_reads [OPTIONS] [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> --fqU <FASTQ> [--outdir]

(or):

hp_sample_reads [OPTIONS] [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> --fqU <FASTQ> [--outdir]

Output files:
sample_1.fastq
sample_2.fastq

Input/Output Arguments:

Option Description
--fq1 Fastq file with read 1.
--fq2 Fastq file with read 2.
--fqU Fastq file with unpaired reads.
--outdir Output directory (default: current directory).

Options:

Option Description
--nreads Number of reads to sample. If greater than the number of reads in file, all reads will be sampled.
--frac Fraction of reads to sample, between 0 and 1. Each read has [frac]
--seed Seed for random number generator.

Settings:

Option Description
--quiet Do not write output to console (silence stdout and stderr), default is False.
--logfile Append console output to this file.
--debug Print commands but do not run, default is False.

Example usage:

This pulls 1000 reads from these paired end files with a starting seed of 1234.

haphpipe sample_reads --fq1 read_1.fastq --fq2 read_2.fastq --nreads 1000 --seed 1234

--

trim_reads

Trim reads using Trimmomatic (documentation). Input is reads in FASTQ format. Output is trimmed reads in FASTQ format.

Usage:

haphpipe trim_reads [OPTIONS] [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> --fqU <FASTQ> [--outdir]

(or):

hp_trim_reads [OPTIONS] [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> --fqU <FASTQ> [--outdir]

Output files:
trimmed_1.fastq
trimmed_2.fastq
trimmed_U.fastq
trimmomatic_summary.out

Input/Output Arguments:

Option Description
--fq1 Fastq file with read 1.
--fq2 Fastq file with read 2.
--fqU Fastq file with unpaired reads.
--outdir Output directory (default: current directory).

Options:

Option Description
--adapter_file Adapter file.
--trimmers Trim commands for trimmomatic (default: ['LEADING:3', 'TRAILING:3', 'SLIDINGWINDOW:4:15', 'MINLEN:36']).
--encoding Quality score encoding.

Settings:

Option Description
--ncpu Number of CPU to use (default: 1).
--quiet Do not write output to console (silence stdout and stderr) (default: False).
--logfile Append console output to this file.
--debug Print commands but do not run (default: False).

Example usage:

This trims paired end read files 1 and 2.

haphpipe trim_reads --fq1 read_1.fastq --fq2 read_2.fastq 

--

join_reads

Join reads using FLASH (paper). Input is reads in FASTQ format. Output is joined reads in FASTQ format.

Usage:

haphpipe join_reads [OPTIONS] [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> [--outdir]

(or):

hp_join_reads [OPTIONS] [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> [--outdir]

Output files:
joined.fastq
notjoined_1.fastq
notjoined_2.fastq

Input/Output Arguments:

Option Description
--fq1 Fastq file with read 1.
--fq2 Fastq file with read 2.
--outdir Output directory (default: current directory).

Settings:

Option Description
--min_overlap The minimum required overlap length between two reads to provide a confident overlap (default: 10).
--max_overlap Maximum overlap length expected in approximately 90% of read pairs, longer overlaps are penalized.
--allow_outies Also try combining read pairs in the "outie" orientation (default: False).
--encoding Quality score encoding.

Settings:

Option Description
--ncpu Number of CPU to use (default: 1).
--keep_tmp Keep temporary directory (default: False).
--quiet Do not write output to console (silence stdout and stderr) (default: False).
--logfile Append console output to this file.
--debug Print commands but do not run (default: False).

Example usage:

haphpipe join_reads --fq1 trimmed_1.fastq --fq2 trimmed_2.fastq

ec_reads

Error correction using SPAdes (documentation). Input is reads in FASTQ format. Output is error-corrected reads in FASTQ format. Remember that HAPHPIPE is intended for Illumina reads, therefore the error correction is based on Illumina sequencing errors.

Usage:

haphpipe ec_reads [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> --fqU <FASTQ> [--outdir]

(or):

hp_ec_reads [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> --fqU <FASTQ> [--outdir]

Output files:
corrected_1.fastq
corrected_2.fastq
corrected_U.fastq

Input/Output Arguments:

Option Description
--fq1 Fastq file with read 1.
--fq2 Fastq file with read 2.
--fqU Fastq file with unpaired reads.
--outdir Output directory (default: current directory).

Settings:

Option Description
--ncpu Number of CPU to use (default: 1).
--keep_tmp Keep temporary directory (default: False).
--quiet Do not write output to console (silence stdout and stderr) (default: False).
--logfile Append console output to this file.
--debug Print commands but do not run, default is False.

Example usage:

haphpipe ec_reads --fq1 trimmed_1.fastq --fq2 trimmed_2.fastq