The Reads stage cleans up raw read sequences and performs other read-processing steps. The modules in this stage manipulate reads. Add -h after any command to list its options.
sample_reads
Subsample reads using seqtk (documentation). Input is reads in FASTQ format. Output is sampled reads in FASTQ format. You do not need to supply every read input (read 1, read 2, and unpaired reads); any combination of them is accepted.
Usage:
haphpipe sample_reads [OPTIONS] [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> --fqU <FASTQ> [--outdir]
(or):
hp_sample_reads [OPTIONS] [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> --fqU <FASTQ> [--outdir]
Output files:
sample_1.fastq
sample_2.fastq
Input/Output Arguments:
Option | Description |
---|---|
--fq1 | Fastq file with read 1. |
--fq2 | Fastq file with read 2. |
--fqU | Fastq file with unpaired reads. |
--outdir | Output directory (default: current directory). |
Options:
Option | Description |
---|---|
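--nreads | Number of reads to sample. If greater than the number of reads in the file, all reads are sampled. |
--frac | Fraction of reads to sample, between 0 and 1. Each read has [frac] probability of being sampled, so the number of sampled reads is not exactly [frac] times the total number of reads. |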
--seed | Seed for random number generator. |
Settings:
Option | Description |
---|---|
--quiet | Do not write output to console (silence stdout and stderr) (default: False). |
--logfile | Append console output to this file. |
--debug | Print commands but do not run (default: False). |
Example usage:
This samples 1000 reads from the paired-end files read_1.fastq and read_2.fastq, using 1234 as the random seed.
haphpipe sample_reads --fq1 read_1.fastq --fq2 read_2.fastq --nreads 1000 --seed 1234
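The two sampling modes in the options table can be illustrated with a minimal Python sketch. It assumes that --frac behaves like seqtk's fraction mode (each read kept independently with probability frac) and --nreads like seqtk's integer mode (a fixed-size uniform sample, done here with reservoir sampling). seqtk's own C implementation differs in detail, the function names are hypothetical, and reads are plain strings rather than full FASTQ records.

```python
import random


def sample_by_fraction(records, frac, seed):
    """Keep each record independently with probability frac (Bernoulli sampling)."""
    rng = random.Random(seed)
    return [rec for rec in records if rng.random() < frac]


def sample_n(records, n, seed):
    """Keep n records chosen uniformly at random (reservoir sampling, Algorithm R).

    If n is at least the number of records, every record is kept.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < n:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < n:
                reservoir[j] = rec
    return reservoir


if __name__ == "__main__":
    # Hypothetical read names for a paired-end library of 100 pairs.
    reads1 = [f"pair{i}/1" for i in range(100)]
    reads2 = [f"pair{i}/2" for i in range(100)]
    # The same seed makes both files consume random numbers identically,
    # so the same pairs are selected from read 1 and read 2.
    sub1 = sample_n(reads1, 10, seed=1234)
    sub2 = sample_n(reads2, 10, seed=1234)
    print([r.split("/")[0] for r in sub1] == [r.split("/")[0] for r in sub2])  # True
```

The design point the sketch highlights is that both mates of a paired library must be sampled with the same random stream. Otherwise, read 1 and read 2 would lose their pairing.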
--
trim_reads
Trim reads using Trimmomatic (documentation). Input is reads in FASTQ format. Output is trimmed reads in FASTQ format.
Usage:
haphpipe trim_reads [OPTIONS] [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> --fqU <FASTQ> [--outdir]
(or):
hp_trim_reads [OPTIONS] [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> --fqU <FASTQ> [--outdir]
Output files:
trimmed_1.fastq
trimmed_2.fastq
trimmed_U.fastq
trimmomatic_summary.out
Input/Output Arguments:
Option | Description |
---|---|
--fq1 | Fastq file with read 1. |
--fq2 | Fastq file with read 2. |
--fqU | Fastq file with unpaired reads. |
--outdir | Output directory (default: current directory). |
Options:
Option | Description |
---|---|
--adapter_file | Adapter file. |
--trimmers | Trim commands for trimmomatic (default: ['LEADING:3', 'TRAILING:3', 'SLIDINGWINDOW:4:15', 'MINLEN:36']). |
--encoding | Quality score encoding. |
Settings:
Option | Description |
---|---|
--ncpu | Number of CPUs to use (default: 1). |
--quiet | Do not write output to console (silence stdout and stderr) (default: False). |
--logfile | Append console output to this file. |
--debug | Print commands but do not run (default: False). |
Example usage:
This trims the paired-end read files read_1.fastq and read_2.fastq with the default trimmers.
haphpipe trim_reads --fq1 read_1.fastq --fq2 read_2.fastq
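The default --trimmers value runs four Trimmomatic steps in order. A minimal Python sketch of what they do to one read's quality scores follows. It assumes Phred+33 encoding (the usual encoding for Illumina 1.8+ data) and simplifies SLIDINGWINDOW to cut at the start of the first low-quality window; Trimmomatic's own rules are more detailed. The function names are hypothetical.

```python
def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default; older Illumina data used 64)."""
    return [ord(ch) - offset for ch in qual]


def trim(quals, leading=3, trailing=3, window=4, win_qual=15, minlen=36):
    """Return the (start, end) slice of bases to keep, or None if the read is dropped.

    Mirrors the default trimmers LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15, MINLEN:36.
    """
    start, end = 0, len(quals)
    # LEADING: remove bases from the 5' end while their quality is below `leading`.
    while start < end and quals[start] < leading:
        start += 1
    # TRAILING: remove bases from the 3' end while their quality is below `trailing`.
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW (simplified): scan windows from the 5' end and cut at the
    # start of the first window whose mean quality is below `win_qual`.
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < win_qual:
            end = i
            break
    # MINLEN: drop the read entirely if fewer than `minlen` bases remain.
    if end - start < minlen:
        return None
    return start, end


if __name__ == "__main__":
    # In Phred+33, '#' = Q2, '?' = Q30, '+' = Q10.
    qual = "##" + "?" * 40 + "+" * 8 + "#"
    print(trim(phred_scores(qual)))  # (2, 42)
```

In the example, LEADING removes the two Q2 bases, TRAILING removes the final Q2 base, and SLIDINGWINDOW cuts where the Q10 tail begins. The 40 bases that remain pass MINLEN:36.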
--
join_reads
Join (merge) overlapping read pairs using FLASH (paper). Input is paired-end reads in FASTQ format. Output is joined reads, plus the pairs that could not be joined, in FASTQ format.
Usage:
haphpipe join_reads [OPTIONS] [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> [--outdir]
(or):
hp_join_reads [OPTIONS] [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> [--outdir]
Output files:
joined.fastq
notjoined_1.fastq
notjoined_2.fastq
Input/Output Arguments:
Option | Description |
---|---|
--fq1 | Fastq file with read 1. |
--fq2 | Fastq file with read 2. |
--outdir | Output directory (default: current directory). |
Options:
Option | Description |
---|---|
--min_overlap | Minimum overlap length between two reads required for a confident overlap (default: 10). |
--max_overlap | Maximum overlap length expected in approximately 90% of read pairs; longer overlaps are penalized. |
--allow_outies | Also try combining read pairs in the "outie" orientation (default: False). |
--encoding | Quality score encoding. |
Settings:
Option | Description |
---|---|
--ncpu | Number of CPUs to use (default: 1). |
--keep_tmp | Keep temporary directory (default: False). |
--quiet | Do not write output to console (silence stdout and stderr) (default: False). |
--logfile | Append console output to this file. |
--debug | Print commands but do not run (default: False). |
Example usage:
This joins the paired-end reads trimmed by trim_reads.
haphpipe join_reads --fq1 trimmed_1.fastq --fq2 trimmed_2.fastq
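FLASH merges a pair by aligning read 1 with the reverse complement of read 2 and choosing the best overlap of at least --min_overlap bases. The Python sketch below shows that idea under simplifying assumptions. It scores overlaps only by mismatch density, using a hypothetical cutoff parameter set to FLASH's default maximum mismatch density of 0.25. It also ignores base qualities when reads disagree, and does not model the --max_overlap penalty or "outie" pairs. Function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1, r2, min_overlap=10, max_mismatch_density=0.25):
    """Join read 1 with the reverse complement of read 2, or return None.

    Overlap lengths are tried from longest to shortest; the one with the lowest
    mismatch density wins, and ties go to the longer overlap.
    """
    r2rc = revcomp(r2)
    best = None  # (mismatch density, overlap length)
    shortest = max(min_overlap, 1)  # guard against a zero-length overlap
    for olen in range(min(len(r1), len(r2rc)), shortest - 1, -1):
        mismatches = sum(a != b for a, b in zip(r1[-olen:], r2rc[:olen]))
        density = mismatches / olen
        if density <= max_mismatch_density and (best is None or density < best[0]):
            best = (density, olen)
    if best is None:
        return None  # not joined, analogous to notjoined_1.fastq / notjoined_2.fastq
    olen = best[1]
    # Keep read 1's bases inside the overlap; FLASH instead uses base qualities.
    return r1 + r2rc[olen:]


if __name__ == "__main__":
    fragment = "ACGTACGGTTCAGGCATTAGCCGATAGGCTTACG"  # hypothetical 34-bp insert
    r1 = fragment[:25]
    r2 = revcomp(fragment[-25:])  # read 2 is sequenced from the opposite strand
    print(merge_pair(r1, r2) == fragment)  # True
```

Here two 25-bp reads from a 34-bp insert overlap by 16 bases, so the merged read reconstructs the whole insert.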
--
ec_reads
Correct sequencing errors in reads using SPAdes (documentation). Input is reads in FASTQ format. Output is error-corrected reads in FASTQ format. Because HAPHPIPE is intended for Illumina reads, error correction targets Illumina sequencing errors.
Usage:
haphpipe ec_reads [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> --fqU <FASTQ> [--outdir]
(or):
hp_ec_reads [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> --fqU <FASTQ> [--outdir]
Output files:
corrected_1.fastq
corrected_2.fastq
corrected_U.fastq
Input/Output Arguments:
Option | Description |
---|---|
--fq1 | Fastq file with read 1. |
--fq2 | Fastq file with read 2. |
--fqU | Fastq file with unpaired reads. |
--outdir | Output directory (default: current directory). |
Settings:
Option | Description |
---|---|
--ncpu | Number of CPUs to use (default: 1). |
--keep_tmp | Keep temporary directory (default: False). |
--quiet | Do not write output to console (silence stdout and stderr) (default: False). |
--logfile | Append console output to this file. |
--debug | Print commands but do not run (default: False). |
Example usage:
This error-corrects the paired-end reads trimmed by trim_reads.
haphpipe ec_reads --fq1 trimmed_1.fastq --fq2 trimmed_2.fastq
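SPAdes performs read error correction with its BayesHammer module, which is considerably more sophisticated than what can be shown here. The Python sketch below illustrates only the k-mer spectrum idea behind many Illumina error correctors. A substitution error creates k-mers that are rare across the dataset, so positions covered by low-count ("weak") k-mers point to likely errors. The k-mer size, the min_count threshold, and the function names are hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmer_starts(read, counts, k, min_count=2):
    """Start positions of k-mers in `read` seen fewer than `min_count` times overall."""
    return [i for i in range(len(read) - k + 1) if counts[read[i:i + k]] < min_count]


if __name__ == "__main__":
    good = "ACGTACGTGA"
    bad = "ACGTTCGTGA"  # substitution A->T at position 4
    reads = [good] * 5 + [bad]
    counts = kmer_counts(reads, k=4)
    print(weak_kmer_starts(good, counts, k=4))  # []
    print(weak_kmer_starts(bad, counts, k=4))   # [1, 2, 3, 4]
```

The only base covered by all four weak k-mers is position 4, the substituted base. A corrector built on this idea would then try replacements at that position that make every overlapping k-mer frequent again.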