Added option to read regions from a text file.#840
Conversation
| " -o, --output FILE Write FASTA to file.\n" | ||
| " -n, --length INT Length of FASTA sequence line. [60]\n" | ||
| " -c, --continue Continue after trying to retrieve missing region.\n" | ||
| " -r, --region_file FILE File of regions. One per line.\n" |
There was a problem hiding this comment.
It should be stated that a line in the region file has the format reg:from-to (chr1:12209228-12209246) and it is 1-based, so to be clear that is not a BED file or any other TAB delimited format.
There was a problem hiding this comment.
Is there a reason it couldn't additionally accept the regions as a BED file? This would be useful and in line with other samtools commands (though starting to duplicate the already-existing bedtools getfasta).
While you're editing the usage… wouldn't putting FILE/INT/etc beside their options be more readable?
" -o, --output FILE Write FASTA to file\n"
…etc…
" -r, --region_file FILE File of regions. One per line\n"
There was a problem hiding this comment.
Oh, also usual would be --region-file with hyphen rather than underscore.
There was a problem hiding this comment.
Supporting BED files as well could be useful, although if supported it should have its own option. I see both -b (used by samtools depth) and -L (samtools view) are available. This might count as future work, though. We'd have to look into whether the interface in bedidx.h works in the way we would want it to. This PR is simply to get around the command line limitations on how many regions you can ask for, and works pretty much in the way you would expect for that purpose.
The FILE/INT positioning was presumably so they all lined up in the same column. It probably made more sense when there were only short options. And yes, it should be hyphens not underscores to match the other long options.
There was a problem hiding this comment.
Regions have their own section in the man page and neither the current faidx usage or man entry defines them. I will add a little bit more to the usage but not a full definition. BED files are a different issue and if anyone wants to raise one then I will look at it. Yes it should be --region-file.
7ac7ce5 to
466752b
Compare
This commit also fixes a bug when writing output to file - the sequence names were written to stdout instead of the output file. The tests have been updated accordingly.
466752b to
68b1e87
Compare
This commit also fixes a bug in writing output to file. The tests have been updated accordingly. Closes #753 and is an alternative solution to the one provided in #754.