Euphoria Audio, LLC

Audio, Video and Data Consulting

Hypertable C++ Thrift Tutorial


Recently I was looking at different NoSQL solutions for storing huge amounts of data efficiently, so I started digging into Bigtable-like systems including Cassandra, HBase and Hypertable. All three are fairly similar in operation and access. The first two are written in Java, while Hypertable is written in C++, which may give it a slight performance benefit over the Java apps. While it's much faster to access the servers through their native interfaces, they all support Thrift, which is necessary if your language doesn't have a native driver (or a Thrift wrapper) yet and you need to get coding. This is a brief tutorial that should get you started writing a low-level Thrift interface for Hypertable.

First off, many thanks to Padraig O'Sullivan, whose Cassandra example I found and used to get all of my Thrift code working. This example uses the Boost libraries as they make C++ coding much easier. You'll also want to start by compiling the Hypertable C++ libraries using the Thrift code generator. We'll start with the header file:

hypertable.h

#ifndef MYHYPERTABLE_H_
#define MYHYPERTABLE_H_

#include <string>
#include <vector>
#include <boost/shared_ptr.hpp>
#include "Thrift.h"
#include "TLogging.h"
#include "transport/TSocket.h"
#include "transport/TTransport.h"
#include "transport/TBufferTransports.h"
#include "protocol/TProtocol.h"
#include "protocol/TBinaryProtocol.h"
#include "ClientService.h"

using namespace std;
using namespace apache::thrift::transport;
using namespace Hypertable;
using namespace Hypertable::ThriftGen;

struct MyKV{
	string key;
	string value;
};

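class MyHypertable {
public:
	// host and port of the server; ns is the namespace to work in.
	MyHypertable(const string host, const int port, const string ns);
	virtual ~MyHypertable();
	// Stores or updates the given key/value pairs under column_family.
	bool Write_Data(const string column_family, const vector<MyKV> &data);
	// Row scan: fills data with the values found between start_key and end_key.
	bool Read_Data(const string start_key, const string end_key, vector<MyKV> &data);
private:
	string htNamespace;	// namespace used in all methods
	ClientServiceClient *client;	// the Hypertable connection
	boost::shared_ptr<TTransport> transport;	// Thrift transport
	Namespace nsid;	// namespace ID
	bool initialized;	// whether the class has been initialized
};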

#endif /* MYHYPERTABLE_H_ */

We include a number of Thrift headers in this file, mostly dealing with the transports and protocols that will be used to communicate with Hypertable. The only Hypertable header we include is "ClientService.h", which is generated by the Thrift code generator. This example uses vectors which, while not the fastest structures to work with, are easy to use and understand; you may want to use something else to speed up your code. I also defined a struct called MyKV which represents a simple key/value mapping. Both fields are strings because the Thrift interface returns data as strings, and you have to convert a value to another type yourself if required.
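Since every value comes back as a string, a small conversion helper can save some repetition. The following is only a sketch: `from_string` and `to_string_value` are hypothetical names of my own, not part of Thrift or Hypertable, and the strict "reject trailing characters" behavior is an assumption about what you would want.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Same shape as the MyKV struct in hypertable.h.
struct MyKV {
	std::string key;
	std::string value;
};

// Hypothetical helper (not a Thrift or Hypertable call): parse a value
// that arrived as a string into a typed value such as int or double.
template <typename T>
T from_string(const std::string &text) {
	std::istringstream in(text);
	T result;
	// Fail if nothing parses, or if non-whitespace characters are left over.
	if (!(in >> result) || !(in >> std::ws).eof()) {
		throw std::invalid_argument("cannot convert value: " + text);
	}
	return result;
}

// Hypothetical inverse helper: format a typed value as a string for storage.
template <typename T>
std::string to_string_value(const T &value) {
	std::ostringstream out;
	out << value;
	return out.str();
}
```

With this in place, a numeric cell could be read as `from_string<int>(kv.value)` and written back with `to_string_value(count)`. If you already depend on Boost, `boost::lexical_cast` covers the same ground.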

The constructor takes the host, port and a namespace to work with. (Think of the namespace, in this case, as the database name in MySQL or a similar relational DB.) The destructor follows, and the connection to the server is set up during initialization. Write_Data takes a column family and a vector of data objects to store or update in the table. Read_Data performs a row scan and grabs any values between start_key and end_key, storing the results in the data vector. For fields, we have the namespace that will be used in all methods, a pointer to the ClientServiceClient representing the Hypertable connection, a shared pointer to the transport, the namespace ID and a flag that records whether the class has been initialized.
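To make the calling convention concrete before any Thrift code is involved, here is a rough in-memory stand-in with the same Write_Data and Read_Data shapes. Everything in it is an assumption for illustration: `FakeTable` is a hypothetical class that talks to no server, it treats each MyKV key as a row key, it treats both ends of the scan range as inclusive, and it appends to the output vector rather than clearing it. The real class may differ on all of these points.

```cpp
#include <map>
#include <string>
#include <vector>

// Same shape as the MyKV struct in hypertable.h.
struct MyKV {
	std::string key;
	std::string value;
};

// Hypothetical in-memory stand-in for MyHypertable; no server is contacted.
class FakeTable {
public:
	// Store or overwrite each key/value pair under the given column family.
	bool Write_Data(const std::string column_family, const std::vector<MyKV> &data) {
		for (const MyKV &kv : data) {
			rows[kv.key][column_family] = kv.value;
		}
		return true;
	}

	// Append every value whose row key lies in [start_key, end_key].
	// Inclusive bounds are an assumption of this sketch.
	bool Read_Data(const std::string start_key, const std::string end_key,
	               std::vector<MyKV> &data) const {
		if (end_key < start_key) {
			return false;	// empty or reversed range
		}
		auto last = rows.upper_bound(end_key);
		for (auto row = rows.lower_bound(start_key); row != last; ++row) {
			for (const auto &cell : row->second) {
				MyKV kv;
				kv.key = row->first;
				kv.value = cell.second;
				data.push_back(kv);
			}
		}
		return true;
	}

private:
	// row key -> (column family -> value), kept sorted by row key
	std::map<std::string, std::map<std::string, std::string> > rows;
};
```

A caller would fill a `std::vector<MyKV>`, pass it to Write_Data along with a column family name, and later pass an empty vector to Read_Data with the first and last row keys of interest. Keeping rows in a sorted map mirrors the idea of a row scan over an ordered key range.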
