GLnexus
Scalable datastore for population genome sequencing, with on-demand joint genotyping
GLnexus::BCFKeyValueData Class Reference

#include <BCFKeyValueData.h>

Inheritance diagram for GLnexus::BCFKeyValueData: base classes GLnexus::Metadata and GLnexus::BCFData.

Public Member Functions

Status contigs (std::vector< std::pair< std::string, size_t > > &ans) const override
 
Status sampleset_samples (const std::string &sampleset, std::shared_ptr< const std::set< std::string > > &ans) const override
 
Status sample_dataset (const std::string &sample, std::string &ans) const override
 
Status all_samples_sampleset (std::string &ans) override
 
Status sample_count (size_t &ans) const override
 Return the count of all samples in the database.
 
Status new_sampleset (MetadataCache &metadata, const std::string &sampleset, const std::set< std::string > &samples)
 
std::shared_ptr< StatsRangeQuery > getRangeStats ()
 
Status dataset_header (const std::string &dataset, std::shared_ptr< const bcf_hdr_t > &hdr) const override
 Retrieve the BCF header for a data set.
 
Status dataset_range (const std::string &dataset, const bcf_hdr_t *hdr, const range &pos, std::vector< std::shared_ptr< bcf1_t > > &records) override
 
Status sampleset_range (const MetadataCache &metadata, const std::string &sampleset, const range &pos, std::shared_ptr< const std::set< std::string >> &samples, std::shared_ptr< const std::set< std::string >> &datasets, std::vector< std::unique_ptr< RangeBCFIterator >> &iterators) override
 
Status sampleset_range_base (const MetadataCache &metadata, const std::string &sampleset, const range &pos, std::shared_ptr< const std::set< std::string >> &samples, std::shared_ptr< const std::set< std::string >> &datasets, std::vector< std::unique_ptr< RangeBCFIterator >> &iterators)
 
Status import_gvcf (MetadataCache &metadata, const std::string &dataset, const std::string &filename, std::set< std::string > &samples)
 
- Public Member Functions inherited from GLnexus::BCFData
virtual Status dataset_range_and_header (const std::string &dataset, const range &pos, std::shared_ptr< const bcf_hdr_t > &hdr, std::vector< std::shared_ptr< bcf1_t > > &records)
 

Static Public Member Functions

static Status InitializeDB (KeyValue::DB *db, const std::vector< std::pair< std::string, size_t > > &contigs, int interval_len=30000)
 
static Status Open (KeyValue::DB *db, std::unique_ptr< BCFKeyValueData > &ans)
 Open an existing database.
 

Detailed Description

Implements the Metadata and BCFData interfaces with everything stored in a given key-value database. One imported gVCF file (potentially with multiple samples) becomes a data set. The key schema permits efficient retrieval by genomic range across the datasets.
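The key schema itself is not described on this page. As a rough illustration only, one common way to make a key-value store serve range queries is to encode keys so that their byte order follows genomic order. The sketch assumes a hypothetical layout of contig rid, position bucket (bucket width assumed to be `interval_len`, by analogy with the InitializeDB parameter) and data set name; `encode_be32`, `bucket_of` and `range_key` are illustrative names, not GLnexus functions.

```cpp
#include <cstdint>
#include <string>

// Hypothetical key layout, NOT GLnexus's actual schema: fixed-width
// big-endian integers so that byte-wise key order equals
// (contig rid, bucket, dataset) order.
std::string encode_be32(std::uint32_t v) {
    std::string out(4, '\0');
    for (int i = 3; i >= 0; --i) {  // least significant byte goes last
        out[i] = static_cast<char>(v & 0xFF);
        v >>= 8;
    }
    return out;
}

// Bucket holding a 0-based position, assuming buckets of interval_len bases.
std::uint32_t bucket_of(std::uint64_t pos, std::uint32_t interval_len) {
    return static_cast<std::uint32_t>(pos / interval_len);
}

// Key under which a record from `dataset` starting at `pos` might be stored.
std::string range_key(std::uint32_t rid, std::uint64_t pos,
                      std::uint32_t interval_len, const std::string& dataset) {
    return encode_be32(rid) + encode_be32(bucket_of(pos, interval_len)) + dataset;
}
```

With keys like these, the entries for one bucket across all data sets sit next to each other, so a range query could become a bounded scan starting at the key of the first overlapping bucket. std::string comparison is byte-wise on unsigned char, which is what makes the big-endian encoding sort correctly.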

Member Function Documentation

Status GLnexus::BCFKeyValueData::all_samples_sampleset ( std::string &  ans)
override virtual

Return the name of a sample set representing all samples currently available. This may either create a new sample set if needed, or return an existing one if available. As always, the sample set is immutable: it will not include samples added to the database later (but one could call all_samples_sampleset again to get a different sample set including them).

Implements GLnexus::Metadata.
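A minimal sketch of the snapshot semantics described above, using a hypothetical in-memory class (`SampleSetCatalog`) and a made-up naming scheme (`*@<count>`); the real method returns a Status and persists sample sets in the database. A snapshot is reused while no samples have been added; otherwise a new immutable set is created.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical in-memory stand-in illustrating the all_samples_sampleset
// contract; names and the naming scheme are assumptions.
class SampleSetCatalog {
public:
    void add_sample(const std::string& s) { all_.insert(s); }

    // Name of an immutable sample set covering all samples present right now.
    // Reuses the latest snapshot if nothing was added since; else makes a new one.
    std::string all_samples_sampleset() {
        if (!last_name_.empty() && sets_.at(last_name_) == all_) {
            return last_name_;
        }
        // Samples are only ever added, so the count yields a fresh name.
        last_name_ = "*@" + std::to_string(all_.size());
        sets_[last_name_] = all_;  // stored copy is never modified afterwards
        return last_name_;
    }

    const std::set<std::string>& samples(const std::string& name) const {
        return sets_.at(name);
    }

private:
    std::set<std::string> all_;
    std::map<std::string, std::set<std::string>> sets_;
    std::string last_name_;
};
```

Samples added after a snapshot do not appear in it; calling all_samples_sampleset again returns a different name that includes them.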

Status GLnexus::BCFKeyValueData::contigs ( std::vector< std::pair< std::string, size_t > > &  ans) const
override virtual

Get the reference contigs.

Each contig's index in the vector is its "rid", as used in range().

Implements GLnexus::Metadata.
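For example, a caller holding the vector filled in by contigs() can recover a contig's rid from its position in that vector. `rid_of` is a hypothetical helper for illustration, not part of the GLnexus API.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: the rid of a contig is its index in the vector
// returned by contigs(), so a linear search by name recovers it.
std::optional<int> rid_of(const std::vector<std::pair<std::string, std::size_t>>& contigs,
                          const std::string& name) {
    for (std::size_t i = 0; i < contigs.size(); ++i) {
        if (contigs[i].first == name) return static_cast<int>(i);
    }
    return std::nullopt;  // contig not present in the database
}
```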

Status GLnexus::BCFKeyValueData::dataset_range ( const std::string &  dataset,
const bcf_hdr_t *  hdr,
const range &  pos,
std::vector< std::shared_ptr< bcf1_t > > &  records 
)
override virtual

Retrieve all BCF records in the data set overlapping a range.

Each record x will already have been "unpacked" with bcf_unpack(x, BCF_UN_ALL). The records may be shared, so they must not be mutated. (They aren't declared const because some vcf.h accessor functions don't take a const bcf1_t*.)

The provided header must match the data set; otherwise, the behavior is undefined.

Implements GLnexus::BCFData.
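"Overlapping" matters for gVCF reference blocks, which can begin before the query range yet still cover it. The sketch assumes a half-open, 0-based interval [beg, end) on contig rid and uses simplified stand-in structs (`Range`, `Record`) rather than GLnexus's range or htslib's bcf1_t, whose rid/pos/rlen fields they loosely mirror; the interval convention is an assumption, not taken from this page.

```cpp
#include <cstdint>
#include <vector>

// Stand-ins (assumptions): a query on contig rid covering [beg, end), and a
// record occupying [pos, pos + rlen) on contig rid.
struct Range { int rid; std::int64_t beg, end; };
struct Record { int rid; std::int64_t pos, rlen; };

// True if the record shares at least one base with the query range.
bool overlaps(const Range& q, const Record& r) {
    return r.rid == q.rid && r.pos < q.end && q.beg < r.pos + r.rlen;
}

// Linear filter; a real store would first narrow the scan using its key index.
std::vector<Record> records_overlapping(const std::vector<Record>& all, const Range& q) {
    std::vector<Record> out;
    for (const auto& r : all) {
        if (overlaps(q, r)) out.push_back(r);
    }
    return out;
}
```

A reference block at pos 100 with rlen 500 is returned for a query [300, 310) even though it starts well before the range.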

Status GLnexus::BCFKeyValueData::import_gvcf ( MetadataCache &  metadata,
const std::string &  dataset,
const std::string &  filename,
std::set< std::string > &  samples 
)

Import a new data set (a gVCF file, possibly containing multiple samples). The data set name must be unique. The sample names in the data set (gVCF column names) must be unique. All samples are immediately added to the sample set "*".
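The import rules can be modeled with a small in-memory catalog. This is a hypothetical sketch: `Catalog`, `import_dataset` and `dataset_of` are illustrative names, bool replaces Status, and it assumes sample names must be unique across the whole database (consistent with sample_dataset mapping each sample to a single data set). `dataset_of` plays the role of sample_dataset.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory catalog (not GLnexus code) enforcing the import rules.
class Catalog {
public:
    Catalog() { samplesets_["*"] = {}; }  // as after InitializeDB: empty "*"

    bool import_dataset(const std::string& dataset,
                        const std::vector<std::string>& samples) {
        if (datasets_.count(dataset)) return false;        // data set name taken
        std::set<std::string> seen;
        for (const auto& s : samples) {
            if (!seen.insert(s).second) return false;       // duplicate column name
            if (sample_to_dataset_.count(s)) return false;  // sample already imported
        }
        // All checks passed: record the data set and its samples.
        datasets_.insert(dataset);
        for (const auto& s : samples) {
            sample_to_dataset_[s] = dataset;
            samplesets_["*"].insert(s);                     // added to "*" right away
        }
        return true;
    }

    // Analogue of sample_dataset; nullptr if the sample is unknown.
    const std::string* dataset_of(const std::string& sample) const {
        auto it = sample_to_dataset_.find(sample);
        return it == sample_to_dataset_.end() ? nullptr : &it->second;
    }

    const std::set<std::string>& sampleset(const std::string& name) const {
        return samplesets_.at(name);
    }

private:
    std::set<std::string> datasets_;
    std::map<std::string, std::string> sample_to_dataset_;
    std::map<std::string, std::set<std::string>> samplesets_;
};
```

Validation runs before any insertion, so a rejected import leaves the catalog unchanged.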

Status GLnexus::BCFKeyValueData::InitializeDB ( KeyValue::DB *  db,
const std::vector< std::pair< std::string, size_t > > &  contigs,
int  interval_len = 30000 
)
static

Initialize a brand-new database, which SHOULD be empty to begin with. Contigs are stored and an empty sample set "*" is created.

Status GLnexus::BCFKeyValueData::sample_dataset ( const std::string &  sample,
std::string &  ans 
) const
override virtual

Find the data set containing the sample.

The data set may contain other samples.

Implements GLnexus::Metadata.

Status GLnexus::BCFKeyValueData::sampleset_range ( const MetadataCache &  metadata,
const std::string &  sampleset,
const range &  pos,
std::shared_ptr< const std::set< std::string >> &  samples,
std::shared_ptr< const std::set< std::string >> &  datasets,
std::vector< std::unique_ptr< RangeBCFIterator >> &  iterators 
)
override virtual

Get iterators for BCF records overlapping the given range in all data sets containing at least one sample in the designated sample set. To facilitate parallelization, the implementation may yield multiple iterators, each producing a disjoint, range-based subset of the relevant records. Each iterator yields results for each relevant data set (possibly zero records in some steps); that is, all iterators reach their end after the same number of steps. Together, the iterators produce each relevant record exactly once.

Reimplemented from GLnexus::BCFData.
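One way an implementation could split a range across iterators and still produce each record exactly once is to partition the query interval and assign each overlapping record only to the piece containing its clipped start. This is a hypothetical illustration of that policy, not necessarily what BCFKeyValueData does; `split_range` and `owner_of` are made-up names, and intervals are assumed half-open.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Split [beg, end) into up to k contiguous, disjoint pieces, one per iterator.
std::vector<std::pair<std::int64_t, std::int64_t>> split_range(std::int64_t beg,
                                                               std::int64_t end, int k) {
    std::vector<std::pair<std::int64_t, std::int64_t>> parts;
    std::int64_t len = end - beg;
    for (int i = 0; i < k; ++i) {
        std::int64_t a = beg + len * i / k;
        std::int64_t b = beg + len * (i + 1) / k;
        if (a < b) parts.emplace_back(a, b);  // skip empty pieces
    }
    return parts;
}

// A record starting at pos may overlap several pieces. To emit it exactly
// once, give it to the piece containing max(pos, query_beg).
// Returns -1 if that point lies outside every piece.
int owner_of(const std::vector<std::pair<std::int64_t, std::int64_t>>& parts,
             std::int64_t query_beg, std::int64_t pos) {
    std::int64_t start = std::max(pos, query_beg);
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (parts[i].first <= start && start < parts[i].second) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

A record that begins before the query (such as a long gVCF reference block) is clipped to the query start and therefore always lands in the first piece.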

Status GLnexus::BCFKeyValueData::sampleset_samples ( const std::string &  sampleset,
std::shared_ptr< const std::set< std::string > > &  ans 
) const
override virtual

List the samples in a sample set.

The resulting data structure may be shared, so the strings must not be mutated. (They aren't declared const because of a C++ quirk; see http://stackoverflow.com/a/21365478.)

Implements GLnexus::Metadata.
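The sharing contract can be illustrated with a hypothetical cache that stores each sample set once and hands out `std::shared_ptr<const std::set<std::string>>` copies, so every caller points at the same object and none can modify it through that pointer. `SampleSetCache` is illustrative, and bool stands in for Status.

```cpp
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Hypothetical cache (not GLnexus code) handing out one shared, read-only set
// per name, matching the out-parameter type of sampleset_samples.
class SampleSetCache {
public:
    void put(const std::string& name, std::set<std::string> samples) {
        // shared_ptr<set> converts implicitly to shared_ptr<const set>.
        cache_[name] = std::make_shared<std::set<std::string>>(std::move(samples));
    }

    // Returns false if the sample set is unknown; ans is left untouched then.
    bool sampleset_samples(const std::string& name,
                           std::shared_ptr<const std::set<std::string>>& ans) const {
        auto it = cache_.find(name);
        if (it == cache_.end()) return false;
        ans = it->second;  // shares ownership; the set itself is not copied
        return true;
    }

private:
    std::map<std::string, std::shared_ptr<const std::set<std::string>>> cache_;
};
```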


The documentation for this class was generated from the following files: