The haplotype stage in HAPHPIPE implements PredictHaplo (paper), although other haplotype reconstruction programs can be utilized outside of HAPHPIPE using the final output of HAPHPIPE, typically with the final consensus sequence (FASTA) file, reads (raw, trimmed, and/or corrected), and/or final alignment (BAM) file as input. Use -h after any command for a list of options.
Note: PredictHaplo is not compatable with read files including read IDs (.1 and .2 appended at the end of read names for read 1 and read 2, respectively). If your file has read IDs, use the following commands to create new files with the read IDs taken out before running PredictHaplo:
cat corrected_1.fastq | sed 's/\.1 / /' > corrected_1_fixed.fastq
cat corrected_2.fastq | sed 's/\.1 / /' > corrected_2_fixed.fastq
predict_haplo
Haplotype identification with PredictHaplo. Input is reads in FASTQ format and and reference sequence in FASTA format. Output is the longest global haplotype file and corresponding HTML file. Note: PredictHaplo must be installed separately before running this stage.
Usage:
haphpipe predict_haplo [OPTIONS] [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> --ref_fa <FASTA> --interval_txt [TXT] [--outdir]
(or):
hp_predict_haplo [OPTIONS] [SETTINGS] --fq1 <FASTQ> --fq2 <FASTQ> --ref_fa <FASTA> --interval_txt [TXT] [--outdir]
Output files:
best.fa
Input/Output Arguments:
Option | Description |
---|---|
--fq1 | Fastq file with read 1. |
--fq2 | Fastq file with read 2. |
--ref_fa | Reference sequence used to align reads (Fasta). |
--interval_txt | File with intervals to perform haplotype reconstruction. |
--outdir | Output directory (default: current directory). |
Options:
Option | Description |
---|---|
--min_readlength | Minimum read length passed to PredictHaplo (default: 36). |
Settings:
Option | Description |
---|---|
--keep_tmp | Keep temporary directory (default: False). |
--quiet | Do not write output to console (silence stdout and stderr) (default: False). |
--logfile | Append console output to this file. |
--debug | Print commands but do not run (default: False). |
Example usage:
haphpipe predict_haplo corrected_1.fastq --fq2 corrected_2.fastq --ref_fa final.fna
ph_parser
Returns PredictHaplo output as a correctly formatted FASTA file. Input is the output file from predict_haplo (longest global .fas file). Output is a correctly formatted FASTA file.
Usage:
haphpipe ph_parser [OPTIONS] [SETTINGS] --haplotypes_fa <best.fas> [--outdir]
(or):
hp_ph_parser [OPTIONS] [SETTINGS] --haplotypes_fa <best.fas> [--outdir]
Output files:
ph_summary.txt
ph_haplotypes.fna
Input/Output Arguments:
Option | Description |
---|---|
--haplotypes_fa | Haplotype file created by PredictHaplo. |
--outdir | Output directory (default: current directory). |
Options:
Option | Description |
---|---|
--prefix | Prefix to add to sequence names. |
--keep_gaps | Do not remove gaps from alignment (default: False). |
Settings:
Option | Description |
---|---|
--quiet | Do not write output to console (silence stdout and stderr) (default: False). |
--logfile | Append console output to this file. |
Example usage:
haphpipe ph_parser PH01.best_1_864.fas
Example:
By default, PredictHaplo outputs their own unique version of a fasta file. It includes the frequency, some information regarding their unqiue overlapping scores, and their unique confidence scores. This file is always named PH#.best_#_#.fas
, where the first number is the reconstructed haplotype number and the next numbers are the start and end of the longest haplotype reconstructed by PredictHaplo.
PredictHaplo Output
$ cat PH01.best_1_1208.fas
>reconstructed_0
;Freq:0.190638
;Overlap quality scores (5,10):1,1
;Confidence scores
;~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~n7~~~y~~~~~t~~~i~jjk~~zz
;~~~~{~~~~~N{~{~Sx~~~~~~z~K~F~~~~~~~y~~~~~~~|~~F~wx|~~{~~|~~s<|]~~~~~~
;~m~kj|{|v~{|_`~~~z~~~~~~~jy{y~~~~~a~__~~|~~~~~{wXZ~}~~~qm~xV~~~~~~~}~
;Q~}~y||~~~}}~~~z~~~~~~{}A~}b|~~u~~|}}|~}}z~}bx~~n||~~||{~}~~d}~bz~~~}
;|~}}~~~}~~~{|}}g{~~~}~r~}}~~~~u~~{kx{~}~}~|~}~}{}~}~}||~~~~[~}~}}~~~~
;~~}~~~U||U}~|}}~~}}~~~~}u~b|}~w~~~~~{}|wv~}Dxzp{|}~@~~P~}~}~V~~z}~|ry
;q~}|}~}~~}t~o~}~f}~~}{~~~}~{~~~~~~~}~~}~~|~~~M~~}~}x~}c~}^v~~~yzA~}~}
;y}~z}~~~~~~~~~{z~~}~~~}~{}~~~~~}~~~~~~~|~~v~~}~~|y~]|{~||~~~~~~~~||~~
;Y||~~|Q~|~~~|~~~~~|~~~z~~z~{{{y~~~~~~~~~~w{{~wz|~Z|~z|~~}p|~~|}}~~x}}
;z}~}}|a|}}}}{}|~~}}~}}~}~{~|~}}~}{}|}|}}}~|}}}}{}}}|}}|}|}}}|}}}}~}}}
;||}}}}{}}~}~}}}}}}}}}}}~}}}}}}}}}}}}}}}}}y}~}}}}}}}}}}~|}}}}|}}}}|}}}
;}}}}}}}}|}}}}}}}}~}}}}}}|}}}}}}}}|}}}}}|}}}}|~|}}}}{}}}}}}}}}}}}}}}}}
;}}=}}}{}}}}}}}}}}}}}}}}}}}}}}}}~||}}|~}}}}}{}}}}}|}}|}}}}}}}}}}}}}|}}
;|}~}}}}|}}}}}}}}|}|~J}}}}}}~}}}}}}}}}}}}}}}}}}}}|}}`~}}}}|}}}}}}}}}}~
;|}}}}}}}}}}}~}}}|}}}}}}}}}}|}}}}}}}}}}}t}}{}`}}}}}}|}~}~}|~}}}|k}}}}}
;|}|~}}|}}~}}~}|}}z}}}}}}}}}}}}}}~}}~}}}~~}|}}}~}}}}}}}}~}~}}||~~}~z}}
;~~}~}||~|~}|{}||~|z~~||}~|~}}|}~~}|}}~}}|}z~~~~{~}{}~y~~~~~{|}}~~|y~~
;~|~~||~~~~~~~|n~~{~~~~~~~~~~~~~~~
;EndOfComments
ACCAAATGAAAGATTGTACTGAGAGACAGGCTAATTTTTTAGGGAATATCTGGCCTTCCCGCAAGGGAAG
GCCAGGGAACTTTCCTCAGAGCAGGCTAGAGCCAACAGCCCCACCAGAAGAGAGCTTCAGGCTTGGGGAA
GCAACAACAAAGCCCTCTCAGAAACAGGAGCCAATAGACAAGGAAATGTATCCTTTAACTTCCCTCAGAT
CACTCTTTGGCAACGACCCCTTGTCACAATAAAGATAGGGGAGCAACTAAAGGAAGCCCTGTTAGATACA
GGAGCAGATGATACAGTATTAGAAGAAATGAATTTGCCAGGAAGATGGAAACCAAGAATGATAGGGGGAA
TCGGGGGTTTTATCAAAGTAAGACAGTATGATCAGATAGCCATAGACATCTGTGGCCATAAAGCTATAGG
TACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGCTTGGTTGCACT
TTAAATTTTCCCATTAGTCCTATTGAAACTGTACCAGTAAAATTGAAGCCAGGAATGGATGGCCCAAAGG
TTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAGTAGAAATTTGTACAGAAATGGAAAAAGA
AGGGAAAATTTCAAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCCATAAGGAAAAAAGAC
AGTACTAAATGGAGAAAATTAGTAGATTTCAGAGAACTTAATAAGAGAACTCAGGACTTCTGGGAAGTCC
AATTAGGAATACCACATCCCTCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTGGATGCGGGTGATGC
ATATTTTTCAGTTCCCTTAGATGAAGACTTCAGAAAGTATACTGCATTTACCATACCTAGTGTAAACAAT
GAGACACTAGGGATTAGGTATCAGTACAATGTGCTTCCACAAGGATGGAAAGGATCACCAGCAATATTCC
AAGCTAGCATGACAAAAATCTTAGAGCCTTTTAGAAAACAAAATCCAGACATAGTTATCTATCAATACAT
GGATGATCTGTATGTAGGATCTGACCTAGAAATAGGGCAGCATAGAACAAAAATAGAGGAACTGAGAGAA
CATCTGTTGAGGTGGGGGTTTTGCACACCAGACAAGAAACATCAGAAGGAACCTCCATTCCTTTGGATGG
GTTATGAACTCCATCCTT
>reconstructed_1
;Freq:0.294104
;Overlap quality scores (5,10):1,1
;Confidence scores
;~~~~~~~~z~~~~~~~~~|~y~~~~u}|~~~~~}}~~~~~~~ya~|~~}}~~~y~~~j~YXH~~s~
;~~~~z~~|~~b|}^}xZ}}~}z~t~r~{}~}x}}yb}~~~}~}u~~}~gu|}}{~|}}~ozsw}|}}h~
;|l}v|^][t}zz}vw}}s}}}}~v}|~|~~}}}~~}}}}}~}}}}}}zo\}}}~}vn}iv}}}}}}}}}
;v}}~|u}}}}}}}}}|}}}}}}|}^}}[}~}r}}}{|}}|t|}}{z}}pz}}{}}}}}}}v~}y|~}}}
;}}}}~}}~}}}tz}}`}}}~}}q}}}}}~}y}}|r||}}}}~}}~}}{}}}}}}}}}}}y~}}}}t}}}
;}}}}}}{|}|}}{r}}}}tv~}}}t}m|~}h}}}}}v}}qt}|y|{|u~}}|}~_}~|~~|}~|}~}|e
;Y}~}}~}}~}v}t}}}j}}~}}}~}}~|}}}}|}}|}}}~}}}}};}}~}}x}}|}}yu}}}wy`}|~~
;{}}x}}}}}~}}}~{v}}}}}{}~}}~}~}}|}}}}}}}}}}u}}y~~{g}L}}}}}|~}{~|}|}|}}
;l}}}}{_}}}}~z~}|}~~|~~x|~}~z~~|~}~~~}~~~}y}~~|k}}Pf}w~~~~T}}}F~~}~z}{
;{~|}|zZ~s~|}r~}}~~|}}}}}~|}}}}n}}}~}}}}~}}}}}}}}}}}|~}}}}}}~~}}~~~~~}
;N}}}}}|~}}~}}}}{}}}~}}|~~|}|}}}~|}~}}}}}}D}~}}}}}}}}}}}}}}}}|}}z}|}}}
;}k}}}}}}}}}}}}}}}}}}}|}}}}}|}}}}}|}}}}}|}}}}|~{~}}}G}~}}}}}~}}}}}}}}}
;}}h}}~|}|}}}}}}}}}}}}}}}}}}}}}}}}{}}|}}}}|~|}}}}}}}M}}}}}}}~}}}}}}}}}
;}}~|}}}}}~~~}}}~}}|}g}}}}}}}}~|}}}}}}}}}}}}}}}}}|}~t~}|}}}}}}}||~}}}}
;|}}}}~}}}}}}~}}}gi}}~}}}}}}}}}}}}|}}~}}g}~}}v}}}}}|{}}}}}}~}}~|e}}}}}
;}}|}}~}~}~}}~}|~}|}~|||}}}}|~}}~}}}}}}}}|}}~}}~}|~}|y~}~}}}}}|}}~~}~}
;}}~}~}|~}}}~}}}}~}{}}~}}~}}}||~~}||~}x|}{~|}|~~|~}}}|{~}}~}{}}||~}}}~
;|~}~}}}~|}}~~|,~~~~~~~~~~~~~~m~~~
;EndOfComments
ACCAAATGAAAGATTGTACTGAGAGACAGGCTAATTTTTTAGGGAAGATCTGGCCTTCCCGCGGCGGAAG
GCCAGGGAATTTTCTTCAGAGCAGACCAGAGCCAACAGCCCCACCAGAAGAGAGCTTCAGGTTTGGGGAG
GAGACAACAACTCCCTCTCAGAAGCAGGAGCCGAAGGACAAGGAAACATATCCTTTAGCTTCCCTCAAAT
CACTCTTTGGCAACGACCCCTTGTTACAATAAAGATAGGGGGGCAGCTAAAGGAAGCTCTATTAGATACA
GGAGCAGATGACACAGTATTAGAAGAAATGGATTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGAA
TTGGAGGTTTTATCAAAGTAAAACAGTATGATCAGATACCCATAGAAATTTGTGGACATAAAGTGATAGG
TACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGCTTGGTTGCACT
TTAAATTTTCCCATTAGTCCTATTGAAACTGTACCAGTAAAATTGAAGCCAGGAATGGATGGCCCAAAGG
TTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAATAGAAATCTGTGCAGAAATGGAAAAGGA
AGGGAAAATTTCAAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCCATAAGAAAAAAAGAC
AGTACTAAATGGAGAAAATTAGTAGATTTCAAAGAACTTAATAAGAGAACTCAGGACTTCTGGGAAGTCC
AATTAGGAATACCACATCCCTCAGGGTTAAAAAAGAAAAAGTCAGTAACAGTACTGGATGTGGGTGATGC
ATATTTTTCAGTTCCCTTAGATGAAGACTTCAGAAAGTACACTGCATTTACCATACCTAGTGTAAACAAT
GAGACACCAGGGATTAGGTATCAGTACAATGTGCTTCCACAAGGATGGAAAGGATCACCAGCAATATTCC
AATGTAGCATGACAAAAATCTTAGAACCTTTTAGAAAACAAAATCCAGATATAGTTATCTATCAATACAT
GGATGATCTGTATGTAGGATCTGACCTAGAAATAGGGCAGCATAGAACAAAAATAGAGGAACTGAGAGAA
CATCTGTTGAGGTGGGGGTTTTGCACACCAGACAAGAAACATCAGAAGGAACCTCCATTCCTTTGGATGG
GTTATGAACTCCATCCTT
>reconstructed_2
;Freq:0.515259
;Overlap quality scores (5,10):4,4
;Confidence scores
;~~~~~~~w~~zz~~~~z~~w~|wy~|~~{|~~|~~~~~~~~}~{~hD~~~}|~|}}t~~_;~tue}}S}
;~}~~p}}l}~P}}ZlvL~z}}h~g~A~x}}}s}}{S~}~}}}}l}~|}SOq}}q}w}}~i\pz~w~~z~
;}7~|vzUu4|smtgm}~W}}~~}r}n`f`}{PZLSI:Vs_V_}}}}_<db}}}}}CQ}YJ}t}}}~}}}
;T}}~|}}}}}}}~}}k}}}}}}q}L}}]}}}L}}}tt|~}{q}}{M}lAIu}}}s{}}}~y~}st}|}}
;z}}}}|}}}}}{q|}Z}}}}}}P~}|}|}}E}}cLbi}vu}}|}}|}g}}}}}}|}~}~{}~}|}{}}}
;}~}|}}}{{|~}pp}~~~{|}}~}[yEx}}\}}}{}N}uoK}}pauSy}}}v~~J}}q}}v~}c}~}\i
;]|}}~}|}~}r}Y}}}B}}~||~}}}}{}}}|}|~|}~~~~}}~{w}{}~|R|}s|}k{~}}}ra}q}}
;o}}d}~}}}}}~}}|X~}}~}e}}}}}}}|}y|}}}}~}|~}c~~b~~nL~a}}}}}~}}m}}~{}}}}
;g}}}}uLm~l}}`~|n}}~}}|d~~y~u}|tz~~~~|}}~~t}}}tx}|Tu~v}}~}^|~~w~|~}m{}
;u}}}|w;}{}{}r}||}}z|}|~|}z}||}y}}}~}}}|}}~}}}|}y}}}}}}}~}||}}}}}}~}}|
;v{}|}}{}}}|~|}}|}}}}}}}}}}|}}}}}|}~}}}}}}p}}||}}}}}~}}}{}}}}}}}z}}}}}
;}|}}~}}}|}}}}}}}}}}}}}}}}}}}}}}|}||}}|}|}}}}|}|}}}}u}}}}}}}}}}}}}}}}}
;}}g}}}{}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}y}}}}}|}m}}}}}}}}}}}}}}}}}
;}}}}}}}}}}}}}}}}}}|}i}}}}}}}}}}|}}}}}}}}}}}}}}}}}}}H}}}}}}}}}|}{}}}}}
;|}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}y}}|}H}}}}}}}}}}}}}}}}}}u}}}}}
;}}}}}}}}}}}}}}}}}z}}}}}}}}}}}}}}}}|}}}|}}}}}}}}}|}}}}}}}}}}}}|}}}}|}}
;}}}}}}}}}}}}||}|}}|}}}}}}|}}|}}}~}|}|}}}|}{}}}}|}}}}|{|}}}}x||}}}|{}}
;}}}}}}}~}}~}~|n}~}}}}}~}}~~|~~~y~,
;EndOfComments
ACCAAATGAAAGATTGTACTGAGAGACAGGCTAATTTTTTAGGGAAAATCTGGCCTTCCCACAAGGGAAG
GCCAGGGAATTTTCTTCAGAGCAGACCAGAGCCAACAGCCCCACCAGAAGAGAGCTTCAGGTTTGGGGAA
GAGACAACAACTCCCTCTCAGAAGCAGGAGCCGATAGACAAGGAAATGTATCCTTTAGCTTCCCTCAAAT
CACTCTTTGGCAACGACCCCTCGTCACAATAAAGATAGGGGGGCAACTAAAGGAAGCTCTATTAGATACA
GGAGCAGATGATACAGTATTAGAAGAAATGAATTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGAA
TTGGAGGTTTTATCAAAGTAAAACAGTATGATCAGATACCCATAGAAATCTGTGGACATAAAGCTATAGG
TACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGATTGGTTGCACT
TTAAATTTTCCCATTAGTCCTATTGAAACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAAG
TTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAGTAGAAATTTGTACAGAAATGGAAAAGGA
AGGGAAAATTTCAAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCCATAAGGAAAAAAGAC
AGTACTAAATGGAGAAAATTAGTAGATTTCAGAGAACTTAATAAGAGAACTCAAGACTTCCGGGAAGTCC
AATTAGGAATACCACATCCCTCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTGGATGTGGGTGATGC
ATATTTTTCAGTTCCCTTAGATGAAGACTTCAGAAAGTATACTGCATTTACCATACCTAGTGTAAACAAT
GAGACACCAGGGATTAGGTATCAGTACAATGTGCTTCCACAAGGATGGAAAGGATCACCAGCAATATTCC
AAGCTAGCATGACAAAAATCTTAGAGCCTTTTAGAAAACAAAATCCAGACATAGTTATCTATCAATACAT
GGATGATCTGTATGTAGGATCTGACCTAGAAATAGGGCAGCATAGAACAAAAATAGAGGAACTGAGAGAA
CATCTGTTGAGGTGGGGGTTTTGCACACCAGACAAGAAACATCAGAAGGAACCTCCATTCCTTTGGATGG
GTTATGAACTCCATCCCC
ph_parser takes this output and creates a proper fasta file with each resontructed haplotype and a text file that has the hpalotype diversity estimate.
PH Parser Output
$ cat ph_summary.txt
PH_num_hap 3
PH_hap_diversity 0.611668153059
PH_seq_len 1208
$ cat ph_haplotypes.fna
>reconstructed_0 Freq=0.190638
ACCAAATGAAAGATTGTACTGAGAGACAGGCTAATTTTTTAGGGAATATCTGGCCTTCCCGCAAGGGAAG
GCCAGGGAACTTTCCTCAGAGCAGGCTAGAGCCAACAGCCCCACCAGAAGAGAGCTTCAGGCTTGGGGAA
GCAACAACAAAGCCCTCTCAGAAACAGGAGCCAATAGACAAGGAAATGTATCCTTTAACTTCCCTCAGAT
CACTCTTTGGCAACGACCCCTTGTCACAATAAAGATAGGGGAGCAACTAAAGGAAGCCCTGTTAGATACA
GGAGCAGATGATACAGTATTAGAAGAAATGAATTTGCCAGGAAGATGGAAACCAAGAATGATAGGGGGAA
TCGGGGGTTTTATCAAAGTAAGACAGTATGATCAGATAGCCATAGACATCTGTGGCCATAAAGCTATAGG
TACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGCTTGGTTGCACT
TTAAATTTTCCCATTAGTCCTATTGAAACTGTACCAGTAAAATTGAAGCCAGGAATGGATGGCCCAAAGG
TTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAGTAGAAATTTGTACAGAAATGGAAAAAGA
AGGGAAAATTTCAAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCCATAAGGAAAAAAGAC
AGTACTAAATGGAGAAAATTAGTAGATTTCAGAGAACTTAATAAGAGAACTCAGGACTTCTGGGAAGTCC
AATTAGGAATACCACATCCCTCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTGGATGCGGGTGATGC
ATATTTTTCAGTTCCCTTAGATGAAGACTTCAGAAAGTATACTGCATTTACCATACCTAGTGTAAACAAT
GAGACACTAGGGATTAGGTATCAGTACAATGTGCTTCCACAAGGATGGAAAGGATCACCAGCAATATTCC
AAGCTAGCATGACAAAAATCTTAGAGCCTTTTAGAAAACAAAATCCAGACATAGTTATCTATCAATACAT
GGATGATCTGTATGTAGGATCTGACCTAGAAATAGGGCAGCATAGAACAAAAATAGAGGAACTGAGAGAA
CATCTGTTGAGGTGGGGGTTTTGCACACCAGACAAGAAACATCAGAAGGAACCTCCATTCCTTTGGATGG
GTTATGAACTCCATCCTT
>reconstructed_1 Freq=0.294104
ACCAAATGAAAGATTGTACTGAGAGACAGGCTAATTTTTTAGGGAAGATCTGGCCTTCCCGCGGCGGAAG
GCCAGGGAATTTTCTTCAGAGCAGACCAGAGCCAACAGCCCCACCAGAAGAGAGCTTCAGGTTTGGGGAG
GAGACAACAACTCCCTCTCAGAAGCAGGAGCCGAAGGACAAGGAAACATATCCTTTAGCTTCCCTCAAAT
CACTCTTTGGCAACGACCCCTTGTTACAATAAAGATAGGGGGGCAGCTAAAGGAAGCTCTATTAGATACA
GGAGCAGATGACACAGTATTAGAAGAAATGGATTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGAA
TTGGAGGTTTTATCAAAGTAAAACAGTATGATCAGATACCCATAGAAATTTGTGGACATAAAGTGATAGG
TACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGCTTGGTTGCACT
TTAAATTTTCCCATTAGTCCTATTGAAACTGTACCAGTAAAATTGAAGCCAGGAATGGATGGCCCAAAGG
TTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAATAGAAATCTGTGCAGAAATGGAAAAGGA
AGGGAAAATTTCAAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCCATAAGAAAAAAAGAC
AGTACTAAATGGAGAAAATTAGTAGATTTCAAAGAACTTAATAAGAGAACTCAGGACTTCTGGGAAGTCC
AATTAGGAATACCACATCCCTCAGGGTTAAAAAAGAAAAAGTCAGTAACAGTACTGGATGTGGGTGATGC
ATATTTTTCAGTTCCCTTAGATGAAGACTTCAGAAAGTACACTGCATTTACCATACCTAGTGTAAACAAT
GAGACACCAGGGATTAGGTATCAGTACAATGTGCTTCCACAAGGATGGAAAGGATCACCAGCAATATTCC
AATGTAGCATGACAAAAATCTTAGAACCTTTTAGAAAACAAAATCCAGATATAGTTATCTATCAATACAT
GGATGATCTGTATGTAGGATCTGACCTAGAAATAGGGCAGCATAGAACAAAAATAGAGGAACTGAGAGAA
CATCTGTTGAGGTGGGGGTTTTGCACACCAGACAAGAAACATCAGAAGGAACCTCCATTCCTTTGGATGG
GTTATGAACTCCATCCTT
>reconstructed_2 Freq=0.515259
ACCAAATGAAAGATTGTACTGAGAGACAGGCTAATTTTTTAGGGAAAATCTGGCCTTCCCACAAGGGAAG
GCCAGGGAATTTTCTTCAGAGCAGACCAGAGCCAACAGCCCCACCAGAAGAGAGCTTCAGGTTTGGGGAA
GAGACAACAACTCCCTCTCAGAAGCAGGAGCCGATAGACAAGGAAATGTATCCTTTAGCTTCCCTCAAAT
CACTCTTTGGCAACGACCCCTCGTCACAATAAAGATAGGGGGGCAACTAAAGGAAGCTCTATTAGATACA
GGAGCAGATGATACAGTATTAGAAGAAATGAATTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGGAA
TTGGAGGTTTTATCAAAGTAAAACAGTATGATCAGATACCCATAGAAATCTGTGGACATAAAGCTATAGG
TACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGATTGGTTGCACT
TTAAATTTTCCCATTAGTCCTATTGAAACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAAG
TTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAGTAGAAATTTGTACAGAAATGGAAAAGGA
AGGGAAAATTTCAAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCCATAAGGAAAAAAGAC
AGTACTAAATGGAGAAAATTAGTAGATTTCAGAGAACTTAATAAGAGAACTCAAGACTTCCGGGAAGTCC
AATTAGGAATACCACATCCCTCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTGGATGTGGGTGATGC
ATATTTTTCAGTTCCCTTAGATGAAGACTTCAGAAAGTATACTGCATTTACCATACCTAGTGTAAACAAT
GAGACACCAGGGATTAGGTATCAGTACAATGTGCTTCCACAAGGATGGAAAGGATCACCAGCAATATTCC
AAGCTAGCATGACAAAAATCTTAGAGCCTTTTAGAAAACAAAATCCAGACATAGTTATCTATCAATACAT
GGATGATCTGTATGTAGGATCTGACCTAGAAATAGGGCAGCATAGAACAAAAATAGAGGAACTGAGAGAA
CATCTGTTGAGGTGGGGGTTTTGCACACCAGACAAGAAACATCAGAAGGAACCTCCATTCCTTTGGATGG
GTTATGAACTCCATCCCC
cliquesnv
A reference-based reconstruction of viral variants from NGS data(documentation). Input is read files in Fastq format and reference Fasta file. Input reads are aligned and outputted to SAM format using BWA and SAMtools prior to running CliqueSNV. Output is inferred viral variants with respective frequencies and diversity. Please see the CliqueSNV documentation for a full description of CliqueSNV options.
Usage:
haphpipe cliquesnv [CliqueSNV OPTIONS] [HAPHPIPE OPTIONS] --fq1 <FASTQ> --fq2 <FASTQ> (or) --fqU <FASTQ> [--outdir]
Output files:
File | Description |
---|---|
cs[NUM]_[REGION].fasta | Reconstructed haplotypes in FASTA format. |
cs[NUM]_[REGION].txt | CliqueSNV output summary file. |
cs[NUM]_[REGION]_summary.txt | HAPHPIPE-generated output summary file with sequence length and diversity. |
Input/Output Arguments:
Option | CliqueSNV Equivalent | Description |
---|---|---|
--fq1 SEQS | NA | Input reads in FASTQ format (read 1) |
--fq2 SEQS | NA | Input reads in FASTQ format (read 2) |
--fqU SEQS | NA | Input reads in FASTQ format (unpaired reads) |
--ref_fa REF_FA | NA | Reference FASTA |
--outdir OUTDIR | --prefix | Output directory (default: .) |
CliqueSNV Options:
Option | CliqueSNV Equivalent | Description |
---|---|---|
--jardir JARDIR | -jar | Path to clique-snv.jar |
--O22min O22MIN | -t | Minimum threshold for O22 value |
--O22minfreq --O22MINFREQ | -tf | Minimum threshold for O22 frequency relative to read coverage |
--printlog | -log | Print log data to console |
--merging MERGING | -cm | Cliques merging algorithm: accurate or fast |
--outputstart OUTPUTSTART | -os | Output start position |
--outputend OUTPUTEND | -oe | Output end position |
--fasta_format FASTA_FORMAT | -fdf | Fasta defline format: short or extended, add number at end to adjust precision of frequency |
Options:
Option | Description |
---|---|
--keep_tmp | Keep temporary directory |
--quiet | Do not write output to console (silence stdout and stderr) (default: False) |
--logfile LOGFILE | Name for log file (output) |
--debug | Print commands but do not run (default: False) |
--ncpu NCPU | Number of CPU to use (default: 1) |
Example usage:
haphpipe cliquesnv --fq1 corrected_1.fastq --fq2 corrected_2.fastq --ref_fa final.fna