GLnexus
Scalable datastore for population genome sequencing, with on-demand joint genotyping
#include <BCFKeyValueData.h>
Public Member Functions | |
Status | contigs (std::vector< std::pair< std::string, size_t > > &ans) const override |
Status | sampleset_samples (const std::string &sampleset, std::shared_ptr< const std::set< std::string > > &ans) const override |
Status | sample_dataset (const std::string &sample, std::string &ans) const override |
Status | all_samples_sampleset (std::string &ans) override |
Status | sample_count (size_t &ans) const override |
Return the count of all samples in the database. | |
Status | new_sampleset (MetadataCache &metadata, const std::string &sampleset, const std::set< std::string > &samples) |
std::shared_ptr< StatsRangeQuery > | getRangeStats () |
Status | dataset_header (const std::string &dataset, std::shared_ptr< const bcf_hdr_t > &hdr) const override |
Retrieve the BCF header for a data set. | |
Status | dataset_range (const std::string &dataset, const bcf_hdr_t *hdr, const range &pos, std::vector< std::shared_ptr< bcf1_t > > &records) override |
Status | sampleset_range (const MetadataCache &metadata, const std::string &sampleset, const range &pos, std::shared_ptr< const std::set< std::string >> &samples, std::shared_ptr< const std::set< std::string >> &datasets, std::vector< std::unique_ptr< RangeBCFIterator >> &iterators) override |
Status | sampleset_range_base (const MetadataCache &metadata, const std::string &sampleset, const range &pos, std::shared_ptr< const std::set< std::string >> &samples, std::shared_ptr< const std::set< std::string >> &datasets, std::vector< std::unique_ptr< RangeBCFIterator >> &iterators) |
Status | import_gvcf (MetadataCache &metadata, const std::string &dataset, const std::string &filename, std::set< std::string > &samples) |
virtual Status | dataset_range_and_header (const std::string &dataset, const range &pos, std::shared_ptr< const bcf_hdr_t > &hdr, std::vector< std::shared_ptr< bcf1_t > > &records) |
Static Public Member Functions | |
static Status | InitializeDB (KeyValue::DB *db, const std::vector< std::pair< std::string, size_t > > &contigs, int interval_len=30000) |
static Status | Open (KeyValue::DB *db, std::unique_ptr< BCFKeyValueData > &ans) |
Open an existing database. | |
Implements the Metadata and BCFData interfaces with everything stored in a given key-value database. One imported gVCF file (potentially with multiple samples) becomes a data set. The key schema permits efficient retrieval by genomic range across data sets.
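To make the idea of a range-friendly key schema concrete, here is a minimal sketch. It is not the actual GLnexus schema: the `bucket_prefix`, `bucket_key` and `bucket_prefixes` helpers and the fixed-width text layout are assumptions made for illustration. The only detail taken from this page is the default `interval_len` of 30000 from `InitializeDB`.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical key prefix: zero-padded contig index (rid), then zero-padded
// bucket start. Zero padding makes lexicographic key order agree with
// genomic order in an ordered key-value store.
std::string bucket_prefix(int rid, long bucket_start) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%05d:%010ld:", rid, bucket_start);
    return std::string(buf);
}

// Hypothetical full key for a record starting at `pos` in `dataset`.
std::string bucket_key(int rid, long pos, const std::string& dataset,
                       long interval_len = 30000) {
    long bucket_start = (pos / interval_len) * interval_len;
    return bucket_prefix(rid, bucket_start) + dataset;
}

// Prefixes of every bucket touched by the half-open query range [beg, end).
std::vector<std::string> bucket_prefixes(int rid, long beg, long end,
                                         long interval_len = 30000) {
    std::vector<std::string> ans;
    if (beg >= end) return ans;  // empty query
    for (long b = (beg / interval_len) * interval_len; b < end; b += interval_len) {
        ans.push_back(bucket_prefix(rid, b));
    }
    return ans;
}
```

Under this layout, a range query becomes a handful of prefix scans, one per bucket. A real schema would also need a rule for records that begin in an earlier bucket and extend into the query range, which this sketch leaves out.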
Status GLnexus::BCFKeyValueData::all_samples_sampleset (std::string &ans) [override, virtual]
Return the name of a sample set representing all samples currently available. This may either create a new sample set if needed, or return an existing one if available. As always, the sample set is immutable: it will not include samples added to the database later (but one could call all_samples_sampleset again to get a different sample set including them).
Implements GLnexus::Metadata.
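The snapshot behavior described above can be sketched with a toy in-memory store. This is not the GLnexus implementation: the `ToyStore` class, its `add_sample` method and the `*@N` naming scheme are hypothetical, and the sketch assumes samples are only ever added, never removed.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical in-memory stand-in for the metadata store.
class ToyStore {
    std::set<std::string> samples_;
    std::map<std::string, std::set<std::string>> samplesets_;

public:
    void add_sample(const std::string& s) { samples_.insert(s); }

    // Return the name of a sample set holding every sample present right now.
    // The name encodes the sample count, so an existing snapshot is reused
    // when nothing has been added since the previous call.
    std::string all_samples_sampleset() {
        std::string name = "*@" + std::to_string(samples_.size());
        samplesets_.emplace(name, samples_);  // no-op if the snapshot exists
        return name;
    }

    // Look up the (immutable) contents of a previously returned sample set.
    const std::set<std::string>& sampleset_samples(const std::string& name) const {
        return samplesets_.at(name);
    }
};
```

Calling `all_samples_sampleset()` before and after `add_sample("B")` yields two different names; the first set still contains only the samples that existed when it was created.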
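Status GLnexus::BCFKeyValueData::contigs (std::vector< std::pair< std::string, size_t > > &ans) const [override, virtual]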
Get the reference contigs.
The index of each contig in the vector is the "rid" used in range().
Implements GLnexus::Metadata.
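As a small illustration of the index-as-rid convention, the sketch below looks up a contig's rid by name in a vector shaped like the one `contigs` fills in. The `ContigList` alias and `find_rid` helper are hypothetical, and treating the `size_t` in each pair as the contig length is an assumption, since this page does not say what it holds.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Same shape as the `ans` argument of contigs(): (name, size_t) pairs.
using ContigList = std::vector<std::pair<std::string, std::size_t>>;

// Return the position of the named contig in the list (its "rid"),
// or -1 if no contig has that name.
int find_rid(const ContigList& contigs, const std::string& name) {
    for (std::size_t i = 0; i < contigs.size(); ++i) {
        if (contigs[i].first == name) return static_cast<int>(i);
    }
    return -1;
}
```

For a list `{{"chr1", 1000}, {"chr2", 800}}` (made-up values), `find_rid(list, "chr2")` returns 1.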
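Status GLnexus::BCFKeyValueData::dataset_range (const std::string &dataset, const bcf_hdr_t *hdr, const range &pos, std::vector< std::shared_ptr< bcf1_t > > &records) [override, virtual]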
Retrieve all BCF records in the data set overlapping a range.
Each record x will already have been "unpacked" with bcf_unpack(x,BCF_UN_ALL). The records may be shared, so they must not be mutated. (They aren't declared const because some vcf.h accessor functions don't take const bcf1_t*.)
The provided header must match the data set; otherwise the behavior is undefined!
Implements GLnexus::BCFData.
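What "overlapping a range" means can be sketched with a simple predicate. The `Range` struct below is a hypothetical stand-in for the `range` type in the signatures, and the half-open `[beg, end)` convention is an assumption rather than something this page states.

```cpp
// Hypothetical stand-in for the `range` type: contig index plus a
// half-open interval [beg, end).
struct Range {
    int rid;
    long beg;
    long end;
};

// Two ranges overlap when they are on the same contig and each one
// begins before the other ends.
bool overlaps(const Range& a, const Range& b) {
    return a.rid == b.rid && a.beg < b.end && b.beg < a.end;
}
```

With half-open intervals, ranges that merely touch (one ends where the other begins) do not overlap.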
Status GLnexus::BCFKeyValueData::import_gvcf (MetadataCache &metadata, const std::string &dataset, const std::string &filename, std::set< std::string > &samples)
Import a new data set (a gVCF file, possibly containing multiple samples). The data set name must be unique. The sample names in the data set (gVCF column names) must be unique. All samples are immediately added to the sample set "*".
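static Status GLnexus::BCFKeyValueData::InitializeDB (KeyValue::DB *db, const std::vector< std::pair< std::string, size_t > > &contigs, int interval_len=30000)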
Initialize a brand-new database, which SHOULD be empty to begin with. Contigs are stored and an empty sample set "*" is created.
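Status GLnexus::BCFKeyValueData::sample_dataset (const std::string &sample, std::string &ans) const [override, virtual]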
Find the data set containing the sample.
The data set may contain other samples.
Implements GLnexus::Metadata.
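Status GLnexus::BCFKeyValueData::sampleset_range (const MetadataCache &metadata, const std::string &sampleset, const range &pos, std::shared_ptr< const std::set< std::string >> &samples, std::shared_ptr< const std::set< std::string >> &datasets, std::vector< std::unique_ptr< RangeBCFIterator >> &iterators) [override, virtual]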
Get iterators for BCF records overlapping the given range in all datasets containing at least one sample in the designated sample set. To facilitate parallelization, the implementation may yield multiple iterators, each of which will produce a range-based disjoint subset of the relevant records. Each iterator will yield results for each relevant data set (possibly yielding zero records in some steps) – that is, they will all reach their end after the same number of steps. The iterators together will produce each relevant record exactly once.
Reimplemented from GLnexus::BCFData.
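One way to obtain range-based disjoint subsets that together cover each record exactly once is sketched below. It is an illustration only, not how `sampleset_range` is implemented: the `split_range` and `owner_of` helpers are hypothetical, and assigning each record to a piece by its start position is an assumed rule.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

using Piece = std::pair<long, long>;  // half-open [first, second)

// Split [beg, end) into consecutive pieces of at most `step` positions.
// The pieces are disjoint and their union is exactly [beg, end).
std::vector<Piece> split_range(long beg, long end, long step) {
    std::vector<Piece> parts;
    if (step <= 0) return parts;  // reject a non-positive step
    for (long b = beg; b < end; b += step) {
        parts.emplace_back(b, std::min(b + step, end));
    }
    return parts;
}

// Index of the single piece that owns a record starting at `record_beg`,
// or -1 if the start position lies outside every piece.
int owner_of(const std::vector<Piece>& parts, long record_beg) {
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (parts[i].first <= record_beg && record_beg < parts[i].second) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Because every start position falls in exactly one piece, a record that spans a piece boundary is still produced by only one worker, which is the property the description above requires.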
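Status GLnexus::BCFKeyValueData::sampleset_samples (const std::string &sampleset, std::shared_ptr< const std::set< std::string > > &ans) const [override, virtual]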
List the samples in a sample set.
The resulting data structure may be shared, so the strings must not be mutated. They aren't declared const because... C++ (see http://stackoverflow.com/a/21365478).
Implements GLnexus::Metadata.