The assembly stage constructs consensus sequence(s). Input reads (in FASTQ format) are assembled using either de novo assembly or reference-based alignment, and the resulting consensus can then be refined. Use -h after any command for a list of its options.
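The two routes through this stage can be sketched as ordered lists of command lines. The helpers below are hypothetical Python sketches, not part of HAPHPIPE. They only assemble argument lists from flags documented on this page. They assume each step writes the documented output file names into its --outdir, and that vcf_to_consensus accepts the compressed VCF written by call_variants. The commands are printed rather than executed.

```python
import os
import shlex
from typing import List

Cmd = List[str]


def plan_denovo(fq1: str, fq2: str, ref_fa: str, outdir: str = ".",
                debug: bool = False) -> List[Cmd]:
    """Hypothetical helper: de novo route (assemble -> scaffold -> refine -> finalize)."""
    reads = ["--fq1", fq1, "--fq2", fq2]
    out = ["--outdir", outdir]
    # Assumed locations of each step's documented output file.
    contigs = os.path.join(outdir, "denovo_contigs.fna")
    scaffold = os.path.join(outdir, "scaffold_assembly.fa")
    refined = os.path.join(outdir, "refined.fna")
    cmds = [
        ["haphpipe", "assemble_denovo", *reads, *out],
        ["haphpipe", "assemble_scaffold", "--contigs_fa", contigs, "--ref_fa", ref_fa, *out],
        ["haphpipe", "refine_assembly", *reads, "--ref_fa", scaffold, *out],
        ["haphpipe", "finalize_assembly", *reads, "--ref_fa", refined, *out],
    ]
    # --debug is a documented setting for all four commands: print, do not run.
    return [cmd + ["--debug"] if debug else cmd for cmd in cmds]


def plan_reference(fq1: str, fq2: str, ref_fa: str, outdir: str = ".") -> List[Cmd]:
    """Hypothetical helper: reference route (align -> call variants at all sites -> consensus)."""
    reads = ["--fq1", fq1, "--fq2", fq2]
    out = ["--outdir", outdir]
    bam = os.path.join(outdir, "aligned.bam")
    vcf = os.path.join(outdir, "variants.vcf.gz")
    return [
        ["haphpipe", "align_reads", *reads, "--ref_fa", ref_fa, *out],
        # vcf_to_consensus expects a VCF created with all sites, hence --emit_all.
        ["haphpipe", "call_variants", "--aln_bam", bam, "--ref_fa", ref_fa, "--emit_all", *out],
        ["haphpipe", "vcf_to_consensus", "--vcf", vcf, *out],
    ]


if __name__ == "__main__":
    # Print the planned commands. Each list could be passed to subprocess.run(cmd, check=True).
    for cmd in plan_denovo("corrected_1.fastq", "corrected_2.fastq", "HIV_B.K03455.HXB2.fasta"):
        print(shlex.join(cmd))
```

Building argument lists instead of shell strings avoids quoting problems with unusual file names. Running each stage separately also makes it easy to stop and inspect intermediate files.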

assemble_denovo

Assemble reads de novo using SPAdes. Input is reads in FASTQ format. Output is contigs in FASTA format (.fna).

Usage:

haphpipe assemble_denovo [OPTIONS] [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> --fqU <FASTQ> [--outdir]

(or):

hp_assemble_denovo [OPTIONS] [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> --fqU <FASTQ> [--outdir]

Output files:
denovo_contigs.fna
denovo_summary.txt

Input/Output Arguments:

Option Description
--fq1 Fastq file with read 1.
--fq2 Fastq file with read 2.
--fqU Fastq file with unpaired reads.
--outdir Output directory (default: current directory).

Options:

Option Description
--no_error_correction Do not perform error correction (default: False).
--subsample Use a subsample of reads for assembly.
--seed Seed for random number generator (ignored if not subsampling).

Settings:

Option Description
--ncpu Number of CPUs to use (default: 1).
--keep_tmp Keep temporary directory (default: False).
--quiet Do not write output to console (silence stdout and stderr) (default: False).
--logfile Append console output to this file.
--debug Print commands but do not run (default: False).

Example usage:

haphpipe assemble_denovo --fq1 corrected_1.fastq --fq2 corrected_2.fastq --outdir denovo_assembly --no_error_correction
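--subsample and --seed work together for reproducibility. A fixed seed should yield the same subset on every run, and for paired-end data the same read pairs must be drawn from both files. The function below is a hypothetical illustration of that idea using Python's random module on in-memory read pairs. It is not HAPHPIPE's subsampling code, which is not described here.

```python
import random
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[str, str]  # (read 1 record, read 2 record)


def subsample_pairs(pairs: Sequence[Pair], n: int, seed: Optional[int] = None) -> List[Pair]:
    """Draw n read pairs without replacement, keeping mates together.

    With the same seed, the same pairs are returned on every call.
    """
    if n >= len(pairs):
        return list(pairs)  # nothing to subsample
    rng = random.Random(seed)  # private generator, so global random state is untouched
    # Sample indices once and reuse them for both mates, preserving file order.
    keep = sorted(rng.sample(range(len(pairs)), n))
    return [pairs[i] for i in keep]
```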

assemble_amplicons

Assemble contigs from de novo assembly using both a reference sequence and amplicon regions with MUMmer 3+. Input is contigs and a reference sequence in FASTA format, plus amplicon regions in GTF format. Output is the amplicon assembly in FASTA format.

Usage:

haphpipe assemble_amplicons [OPTIONS] [SETTINGS] --contigs_fa <FASTA> --ref_fa <FASTA> --ref_gtf <GTF> [--outdir]

(or):

hp_assemble_amplicons [OPTIONS] [SETTINGS] --contigs_fa <FASTA> --ref_fa <FASTA> --ref_gtf <GTF> [--outdir]

Output files:
amplicon_assembly.fna

Input/Output Arguments:

Option Description
--contigs_fa Fasta file with assembled contigs.
--ref_fa Fasta file with reference genome to scaffold against.
--ref_gtf GTF format file containing amplicon regions.
--outdir Output directory (default: current directory).

Scaffold Options:

Option Description
--sample_id Sample ID (default: sampleXX).
--padding Bases to include outside reference annotation (default: 50).

Settings:

Option Description
--keep_tmp Keep temporary directory (default: False).
--quiet Do not write output to console (silence stdout and stderr) (default: False).
--logfile Append console output to this file.
--debug Print commands but do not run (default: False).

Example usage:

haphpipe assemble_amplicons --contigs_fa denovo_contigs.fna --ref_fa HIV_B.K03455.HXB2.fasta --ref_gtf HIV_B.K03455.HXB2.gtf
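The --padding option widens each amplicon region from the GTF by a fixed number of bases. The sketch below shows one way to compute such a window from a single GTF line. It assumes symmetric padding clipped at the reference ends. GTF start and end (columns 4 and 5) are 1-based and inclusive. The function name and the clipping behavior are assumptions for illustration, not HAPHPIPE internals.

```python
from typing import Tuple


def padded_region(gtf_line: str, ref_len: int, padding: int = 50) -> Tuple[str, int, int]:
    """Return (seqname, start, end) of a GTF feature widened by `padding` bases.

    Coordinates stay 1-based and inclusive, clipped to [1, ref_len].
    Hypothetical helper: symmetric padding and clipping are assumptions.
    """
    fields = gtf_line.rstrip("\n").split("\t")
    # GTF columns: seqname, source, feature, start, end, score, strand, frame, attributes
    seqname, start, end = fields[0], int(fields[3]), int(fields[4])
    return seqname, max(1, start - padding), min(ref_len, end + padding)
```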

assemble_scaffold

Scaffold contigs against a reference sequence with MUMmer 3+. Input is contigs in FASTA format and a reference sequence in FASTA format. Output is the scaffold assembly, aligned scaffold, imputed scaffold, and padded scaffold in FASTA format.

Usage:

haphpipe assemble_scaffold [OPTIONS] [SETTINGS] --contigs_fa <FASTA> --ref_fa <FASTA> [--outdir]

(or):

hp_assemble_scaffold [OPTIONS] [SETTINGS] --contigs_fa <FASTA> --ref_fa <FASTA> [--outdir]

Output files:
scaffold_aligned.fa
scaffold_assembly.fa
scaffold_imputed.fa
scaffold_padded.out

Input/Output Arguments:

Option Description
--contigs_fa Fasta file with assembled contigs.
--ref_fa Fasta file with reference genome to scaffold against.
--outdir Output directory (default: current directory).

Options:

Option Description
--seqname Name to append to scaffold sequence (default: sample01).

Settings:

Option Description
--keep_tmp Keep temporary directory (default: False).
--quiet Do not write output to console (silence stdout and stderr) (default: False).
--logfile Append console output to this file.
--debug Print commands but do not run (default: False).

Example usage:

haphpipe assemble_scaffold --contigs_fa denovo_contigs.fna --ref_fa HIV_B.K03455.HXB2.fasta
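Scaffolding places contigs in reference order. The sketch below is a toy illustration of that idea, not HAPHPIPE's MUMmer-based procedure. It assumes each contig is already oriented to the reference and that its 0-based start on the reference is known, for example from an alignment. Contigs are then ordered by position and the gaps between them are filled with N. Overlaps are handled crudely by dropping the overlapping prefix of the later contig, which is also an assumption of the sketch.

```python
from typing import List, Tuple


def toy_scaffold(placed: List[Tuple[int, str]], ref_len: int) -> str:
    """Order (ref_start, seq) contigs by 0-based ref_start and fill gaps with 'N'.

    Where a contig overlaps what is already placed, its overlapping prefix is dropped.
    Hypothetical helper: assumes contigs are already oriented to the reference.
    """
    out: List[str] = []
    pos = 0  # next reference position (0-based) still to be filled
    for start, seq in sorted(placed):
        if start > pos:
            out.append("N" * (start - pos))  # gap between contigs
            pos = start
        overlap = pos - start
        if overlap < len(seq):  # skip contigs fully contained in what is placed
            out.append(seq[overlap:])
            pos = start + len(seq)
    if pos < ref_len:
        out.append("N" * (ref_len - pos))  # trailing gap up to the reference end
    return "".join(out)
```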

align_reads

Map reads to a reference sequence (instead of running de novo assembly) using Bowtie2 and Picard. Input is reads in FASTQ format and a reference sequence in FASTA format. Output is the alignment in BAM format.

Usage:

haphpipe align_reads [OPTIONS] [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> --fqU <FASTQ> --ref_fa <FASTA> [--outdir]

(or):

hp_align_reads [OPTIONS] [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> --fqU <FASTQ> --ref_fa <FASTA> [--outdir]

Output files:
aligned.bam
aligned.bt2.out

Input/Output Arguments:

Option Description
--fq1 Fastq file with read 1.
--fq2 Fastq file with read 2.
--fqU Fastq file with unpaired reads.
--ref_fa Reference fasta file.
--outdir Output directory (default: current directory).

Options:

Option Description
--bt2_preset {very-fast,fast,sensitive,very-sensitive,very-fast-local,fast-local,sensitive-local,very-sensitive-local} Bowtie2 preset to use.
--sample_id Sample ID. Used as read group ID in BAM (default: sampleXX).
--no_realign Do not realign indels (default: False).
--remove_duplicates Remove duplicates from final alignment. Otherwise duplicates are marked but not removed (default: False).
--encoding {Phred+33,Phred+64} Quality score encoding.

Settings:

Option Description
--ncpu Number of CPUs to use (default: 1).
--xmx Maximum heap size for Java VM, in GB (default: 32).
--keep_tmp Keep temporary directory (default: False).
--quiet Do not write output to console (silence stdout and stderr) (default: False).
--logfile Append console output to this file.
--debug Print commands but do not run (default: False).

Example usage:

haphpipe align_reads --fq1 corrected_1.fastq --fq2 corrected_2.fastq --ref_fa HIV_B.K03455.HXB2.fasta
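The --encoding option matters because the same quality character decodes to different scores under the two offsets. The helpers below are a hypothetical sketch of the standard conversion, not HAPHPIPE code. The score is the character code minus 33 (Phred+33) or minus 64 (Phred+64), and a score Q corresponds to an error probability of 10^(-Q/10).

```python
from typing import List

OFFSETS = {"Phred+33": 33, "Phred+64": 64}


def decode_quals(qual: str, encoding: str = "Phred+33") -> List[int]:
    """Convert a FASTQ quality string to Phred scores for the given encoding."""
    offset = OFFSETS[encoding]
    scores = [ord(ch) - offset for ch in qual]
    if any(q < 0 for q in scores):
        # A character below the offset cannot occur in this encoding.
        raise ValueError(f"quality string is not valid {encoding}")
    return scores


def error_prob(q: int) -> float:
    """Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)
```

For example, "I" decodes to Q40 under Phred+33. Under Phred+64, Q40 is written as "h", and "!" (the lowest Phred+33 character) is rejected.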

call_variants

Call variants from an alignment using GATK. Input is an alignment file in BAM format and a reference sequence in FASTA format (either the reference from reference-based assembly or the final consensus sequence from de novo assembly). Output is a Variant Call Format (VCF) file.

Usage:

haphpipe call_variants [OPTIONS] [SETTINGS] --aln_bam <BAM> --ref_fa <FASTA> [--outdir]

(or):

hp_call_variants [OPTIONS] [SETTINGS] --aln_bam <BAM> --ref_fa <FASTA> [--outdir]

Output files:
variants.vcf.gz

Input/Output Arguments:

Option Description
--aln_bam Alignment file.
--ref_fa Reference fasta file.
--outdir Output directory (default: current directory).

Options:

Option Description
--emit_all Output calls for all sites (default: False).
--min_base_qual Minimum base quality required to consider a base for calling (default: 15).

Settings:

Option Description
--ncpu Number of CPUs to use (default: 1).
--xmx Maximum heap size for Java VM, in GB (default: 32).
--keep_tmp Keep temporary directory (default: False).
--quiet Do not write output to console (silence stdout and stderr) (default: False).
--logfile Append console output to this file.
--debug Print commands but do not run (default: False).

Example usage:

haphpipe call_variants --aln_bam aligned.bam --ref_fa HIV_B.K03455.HXB2.fasta
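call_variants writes a compressed VCF. For a quick look at the calls without extra tools, a minimal Python reader like the one below can help. It relies only on the VCF's fixed first five columns (CHROM, POS, ID, REF, ALT) and skips header lines. Python's gzip module can read bgzip-compressed files because they are valid multi-member gzip. This is a sketch for inspection, not a full VCF parser.

```python
import gzip
from typing import Iterator, Tuple


def vcf_records(path: str) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (CHROM, POS, REF, ALT) for each data line of a plain or gzipped VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # skip meta-information (##) and column header (#CHROM) lines
            chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
            yield chrom, int(pos), ref, alt
```

Usage: sum(1 for _ in vcf_records("variants.vcf.gz")) counts the records.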

vcf_to_consensus

Generate a consensus sequence from a VCF file. Input is a VCF file. Output is the consensus sequence in FASTA format.

Usage:

haphpipe vcf_to_consensus [OPTIONS] [SETTINGS] --vcf <VCF> [--outdir] [--sampidx]

(or):

hp_vcf_to_consensus [OPTIONS] [SETTINGS] --vcf <VCF> [--outdir] [--sampidx]

Output files:
consensus.fna

Input/Output Arguments:

Option Description
--vcf VCF file (created with all sites).
--outdir Output directory (default: current directory).
--sampidx Index for sample if multi-sample VCF (default: 0).

Options:

Option Description
--min_DP Minimum depth to call site (default: 1).
--major Allele fraction to make unambiguous call (default: 0.5).
--minor Allele fraction to make ambiguous call (default: 0.2).

Settings:

Option Description
--keep_tmp Keep temporary directory (default: False).
--quiet Do not write output to console (silence stdout and stderr) (default: False).
--logfile Append console output to this file.

Example usage:

haphpipe vcf_to_consensus --vcf variants.vcf
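The --min_DP, --major and --minor options describe a per-site decision. The sketch below is a simplified model of such a rule, written for illustration only:
- A site below min_DP becomes N.
- An allele whose fraction exceeds major is called outright.
- Otherwise, all alleles at or above minor are merged into an IUPAC ambiguity code.

Whether HAPHPIPE uses strict or inclusive comparisons, and how it treats indels, are not stated here. Treat those details, and the function itself, as assumptions.

```python
from typing import Dict

# IUPAC nucleotide codes keyed by the set of bases they represent.
IUPAC = {frozenset(bases): code for bases, code in {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}.items()}


def call_site(counts: Dict[str, int], min_dp: int = 1,
              major: float = 0.5, minor: float = 0.2) -> str:
    """Call one consensus base from per-allele read counts (toy model, not HAPHPIPE code)."""
    depth = sum(counts.values())
    if depth == 0 or depth < min_dp:
        return "N"  # not enough coverage to call this site
    fracs = {base: n / depth for base, n in counts.items()}
    top = max(fracs, key=fracs.get)
    if fracs[top] > major:
        return top  # one allele dominates: unambiguous call
    # Otherwise merge every allele at or above the minor threshold.
    alleles = frozenset(b for b, f in fracs.items() if f >= minor)
    return IUPAC.get(alleles, "N")  # unknown combinations fall back to N
```

Under this model, a 50/50 C/T site becomes Y. With the default thresholds, a site that is 90% A is called as plain A.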

refine_assembly

Map reads to a de novo assembly or reference alignment, iteratively updating the assembly or alignment. Input is reads in FASTQ format and a reference sequence (assembly or reference alignment) in FASTA format. Output is the refined assembly in FASTA format.

Usage:

haphpipe refine_assembly [OPTIONS] [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> --fqU <FASTQ> --ref_fa <FASTA> [--outdir]

(or):

hp_refine_assembly [OPTIONS] [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> --fqU <FASTQ> --ref_fa <FASTA> [--outdir]

Output files:
refined.fna

Input/Output Arguments:

Option Description
--fq1 Fastq file with read 1.
--fq2 Fastq file with read 2.
--fqU Fastq file with unpaired reads.
--ref_fa Reference fasta file.
--outdir Output directory (default: current directory).

Options:

Option Description
--max_step Maximum number of refinement steps (default: 1).
--subsample Use a subsample of reads for refinement.
--seed Seed for random number generator (ignored if not subsampling).
--sample_id Sample ID. Used as read group ID in BAM (default: sampleXX).

Settings:

Option Description
--ncpu Number of CPUs to use (default: 1).
--xmx Maximum heap size for Java VM, in GB (default: 32).
--keep_tmp Keep temporary directory (default: False).
--quiet Do not write output to console (silence stdout and stderr) (default: False).
--logfile Append console output to this file.
--debug Print commands but do not run (default: False).

Example usage:

haphpipe refine_assembly --fq1 corrected_1.fastq --fq2 corrected_2.fastq --ref_fa HIV_B.K03455.HXB2.fasta
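The iterative update can be pictured as a loop bounded by --max_step. In the hypothetical sketch below, refine_once stands in for one round of mapping reads and rebuilding the consensus. Stopping early when a round leaves the sequence unchanged is an assumption of this sketch, not documented HAPHPIPE behavior.

```python
from typing import Callable


def iterative_refine(ref: str, refine_once: Callable[[str], str], max_step: int = 1) -> str:
    """Apply refine_once up to max_step times, stopping early if nothing changes.

    refine_once is a hypothetical stand-in for one map-reads-then-rebuild-consensus round.
    """
    for _step in range(max_step):
        new_ref = refine_once(ref)
        if new_ref == ref:
            break  # converged: another round would not change the consensus
        ref = new_ref
    return ref
```

Raising max_step only helps while each round still changes the consensus. Under the early-stop assumption, extra steps after convergence cost nothing.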

finalize_assembly

Finalize the consensus, map reads to it, and call variants. Input is reads in FASTQ format and a reference sequence in FASTA format. Output is the finalized consensus sequence, alignment, and variants (in FASTA, BAM, and VCF formats, respectively).

Usage:

haphpipe finalize_assembly [OPTIONS] [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> --fqU <FASTQ> --ref_fa <FASTA> [--outdir]

(or):

hp_finalize_assembly [OPTIONS] [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> --fqU <FASTQ> --ref_fa <FASTA> [--outdir]

Output files:
final.fna
final.bam
final.vcf.gz

Input/Output Arguments:

Option Description
--fq1 Fastq file with read 1.
--fq2 Fastq file with read 2.
--fqU Fastq file with unpaired reads.
--ref_fa Consensus fasta file.
--outdir Output directory (default: current directory).

Options:

Option Description
--bt2_preset {very-fast,fast,sensitive,very-sensitive,very-fast-local,fast-local,sensitive-local,very-sensitive-local} Bowtie2 preset to use (default: very-sensitive).
--sample_id Sample ID (default: sampleXX).

Settings:

Option Description
--ncpu Number of CPUs to use (default: 1).
--keep_tmp Keep temporary directory (default: False).
--quiet Do not write output to console (silence stdout and stderr) (default: False).
--logfile Append console output to this file.
--debug Print commands but do not run (default: False).

Example usage:

haphpipe finalize_assembly --fq1 corrected_1.fastq --fq2 corrected_2.fastq --ref_fa refined.fna
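Because finalize_assembly produces three files, a pipeline script may want to confirm they were all written before moving on. The check below is a small hypothetical sketch. It looks in --outdir for the output names documented above and treats empty files as missing. It does not validate file contents.

```python
from pathlib import Path
from typing import List

# Output names as listed for finalize_assembly.
FINAL_OUTPUTS = ("final.fna", "final.bam", "final.vcf.gz")


def missing_outputs(outdir: str = ".") -> List[str]:
    """Return the documented finalize_assembly outputs that are absent or empty."""
    missing = []
    for name in FINAL_OUTPUTS:
        path = Path(outdir) / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Usage: if missing_outputs("final_dir") is non-empty, rerun the stage with --logfile set so the failure is recorded.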