GLnexus
Scalable datastore for population genome sequencing, with on-demand joint genotyping
GLnexus::BCFData Class Reference (abstract)

#include <data.h>

Inherited by GLnexus::BCFKeyValueData.

Public Member Functions

virtual Status dataset_header (const std::string &dataset, std::shared_ptr< const bcf_hdr_t > &hdr) const =0
 Retrieve the BCF header for a data set.
 
virtual Status dataset_range (const std::string &dataset, const bcf_hdr_t *hdr, const range &pos, std::vector< std::shared_ptr< bcf1_t > > &records)=0
 
virtual Status dataset_range_and_header (const std::string &dataset, const range &pos, std::shared_ptr< const bcf_hdr_t > &hdr, std::vector< std::shared_ptr< bcf1_t > > &records)
 
virtual Status sampleset_range (const MetadataCache &metadata, const std::string &sampleset, const range &pos, std::shared_ptr< const std::set< std::string >> &samples, std::shared_ptr< const std::set< std::string >> &datasets, std::vector< std::unique_ptr< RangeBCFIterator >> &iterators)
 

Detailed Description

Abstract interface to stored BCF data sets. The implementation is responsible for any suitable caching.

Member Function Documentation

virtual Status GLnexus::BCFData::dataset_range ( const std::string &  dataset,
const bcf_hdr_t *  hdr,
const range &  pos,
std::vector< std::shared_ptr< bcf1_t > > &  records 
)
pure virtual

Retrieve all BCF records in the data set overlapping a range.

Each record x will already have been "unpacked" with bcf_unpack(x,BCF_UN_ALL). The records may be shared, so they must not be mutated. (They are not declared const because some vcf.h accessor functions do not take const bcf1_t*.)

The provided header must match the data set; otherwise, the behavior is undefined!

Implemented in GLnexus::BCFKeyValueData.
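
Because the returned records may be shared, a caller that needs to modify one would work on a private copy instead. The following is only a minimal sketch of that pattern: `Record`, `private_copy` and `shifted` are hypothetical names introduced for illustration, with `Record` standing in for `bcf1_t` so the example compiles on its own. With htslib, the copy step would correspond to calling `bcf_dup` on the record.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for bcf1_t, present only so the sketch compiles alone.
struct Record {
    int pos;
    std::string ref;
};

// Make a private, mutable copy of a record that may be shared with other callers.
// With htslib, the analogous step would be bcf_dup() on the bcf1_t.
std::shared_ptr<Record> private_copy(const std::shared_ptr<Record>& shared) {
    assert(shared != nullptr);
    return std::make_shared<Record>(*shared);
}

// Shift every position by delta on copies, leaving the shared originals intact.
std::vector<std::shared_ptr<Record>> shifted(
    const std::vector<std::shared_ptr<Record>>& records, int delta) {
    std::vector<std::shared_ptr<Record>> out;
    out.reserve(records.size());
    for (const auto& r : records) {
        std::shared_ptr<Record> copy = private_copy(r);
        copy->pos += delta;
        out.push_back(std::move(copy));
    }
    return out;
}
```

The vector returned by `shifted` owns fresh objects, so any other holder of the original pointers continues to see unmodified records.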

Status GLnexus::BCFData::dataset_range_and_header ( const std::string &  dataset,
const range &  pos,
std::shared_ptr< const bcf_hdr_t > &  hdr,
std::vector< std::shared_ptr< bcf1_t > > &  records 
)
virtual

Wrapper for dataset_range that first fetches the appropriate header (useful if the caller does not already have the header in hand).
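
A wrapper of this kind could be composed from the two pure virtual methods roughly as sketched below. This is an illustration under stated assumptions, not the library's actual code: `Status`, `Header`, `Record`, `Range`, `DataSketch` and `InMemoryData` are hypothetical stand-ins for `GLnexus::Status`, `bcf_hdr_t`, `bcf1_t`, `range`, `BCFData` and a concrete implementation, and intervals are assumed to be half-open.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for GLnexus::Status, bcf_hdr_t, bcf1_t and range.
enum class Status { OK, NotFound };
struct Header { std::string dataset; };
struct Record { int beg; int end; };   // assumed half-open interval [beg, end)
struct Range  { int beg; int end; };   // assumed half-open interval [beg, end)

class DataSketch {
public:
    virtual ~DataSketch() = default;

    virtual Status dataset_header(const std::string& dataset,
                                  std::shared_ptr<const Header>& hdr) const = 0;

    virtual Status dataset_range(const std::string& dataset, const Header* hdr,
                                 const Range& pos,
                                 std::vector<std::shared_ptr<Record>>& records) = 0;

    // Wrapper: look up the header, then delegate to dataset_range with it.
    virtual Status dataset_range_and_header(const std::string& dataset, const Range& pos,
                                            std::shared_ptr<const Header>& hdr,
                                            std::vector<std::shared_ptr<Record>>& records) {
        Status s = dataset_header(dataset, hdr);
        if (s != Status::OK) return s;
        return dataset_range(dataset, hdr.get(), pos, records);
    }
};

// Toy in-memory implementation used only to exercise the wrapper.
class InMemoryData : public DataSketch {
    struct Entry {
        std::shared_ptr<const Header> hdr;
        std::vector<std::shared_ptr<Record>> records;
    };
    std::map<std::string, Entry> store_;

public:
    void add(const std::string& dataset, const std::vector<Record>& records) {
        Entry& e = store_[dataset];
        e.hdr = std::make_shared<Header>(Header{dataset});
        for (const Record& r : records) e.records.push_back(std::make_shared<Record>(r));
    }

    Status dataset_header(const std::string& dataset,
                          std::shared_ptr<const Header>& hdr) const override {
        auto it = store_.find(dataset);
        if (it == store_.end()) return Status::NotFound;
        hdr = it->second.hdr;
        return Status::OK;
    }

    Status dataset_range(const std::string& dataset, const Header* hdr, const Range& pos,
                         std::vector<std::shared_ptr<Record>>& records) override {
        assert(hdr != nullptr && hdr->dataset == dataset);  // header must match the data set
        auto it = store_.find(dataset);
        if (it == store_.end()) return Status::NotFound;
        records.clear();
        for (const auto& r : it->second.records) {
            if (r->beg < pos.end && r->end > pos.beg) records.push_back(r);  // overlap test
        }
        return Status::OK;
    }
};
```

In this sketch a failed header lookup short-circuits the wrapper, so `dataset_range` is never called with a header that does not belong to the data set.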

Status GLnexus::BCFData::sampleset_range ( const MetadataCache &  metadata,
const std::string &  sampleset,
const range &  pos,
std::shared_ptr< const std::set< std::string >> &  samples,
std::shared_ptr< const std::set< std::string >> &  datasets,
std::vector< std::unique_ptr< RangeBCFIterator >> &  iterators 
)
virtual

Get iterators for BCF records overlapping the given range in all data sets containing at least one sample in the designated sample set. To facilitate parallelization, the implementation may yield multiple iterators, each of which will produce a range-based disjoint subset of the relevant records. Each iterator will yield a result for every relevant data set (possibly with zero records in some steps), so all iterators reach their end after the same number of steps. Together, the iterators will produce each relevant record exactly once.

Reimplemented in GLnexus::BCFKeyValueData.
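
The contract described above can be illustrated with a small consumer that drains every iterator, counts steps, and totals records. This is a sketch under assumptions: `RangeIteratorSketch`, `Step`, `Status`, `Record` and `drain_all` are hypothetical stand-ins, and the real `RangeBCFIterator` interface may differ (for example in how it signals the end of iteration).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real RangeBCFIterator interface may differ.
enum class Status { OK, NotFound };
struct Record { int beg; int end; };
using Step = std::pair<std::string, std::vector<Record>>;  // (data set, its records)

class RangeIteratorSketch {
    std::vector<Step> steps_;
    std::size_t next_ = 0;

public:
    explicit RangeIteratorSketch(std::vector<Step> steps) : steps_(std::move(steps)) {}

    // Yield the next data set and its (possibly empty) records; NotFound marks the end.
    Status next(std::string& dataset, std::vector<Record>& records) {
        if (next_ == steps_.size()) return Status::NotFound;
        dataset = steps_[next_].first;
        records = steps_[next_].second;
        ++next_;
        return Status::OK;
    }
};

// Drain every iterator and return the total number of records seen.
// Each iterator is independent, so each could instead be handed to its own worker.
std::size_t drain_all(std::vector<std::unique_ptr<RangeIteratorSketch>>& iterators) {
    std::size_t total = 0;
    std::size_t expected_steps = 0;
    bool first = true;
    for (auto& it : iterators) {
        std::size_t steps = 0;
        std::string dataset;
        std::vector<Record> records;
        while (it->next(dataset, records) == Status::OK) {
            ++steps;                  // one step per relevant data set
            total += records.size();  // zero-record steps still count as steps
        }
        // All iterators are expected to finish after the same number of steps.
        assert(first || steps == expected_steps);
        expected_steps = steps;
        first = false;
    }
    return total;
}
```

Since the iterators cover disjoint sub-ranges and each record is produced exactly once, summing the per-iterator counts gives the number of relevant records without any deduplication.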


The documentation for this class was generated from the following files: