BigTable As A Web Service

BigTable

BigTable is a sparse, distributed, scalable database created by Google. It is used by many of their popular services. Until now it has only been available internally for Googlers, but there are some open source projects to duplicate it. For detailed design information, read Google's white paper.

I'm exposing the BigTable interface through a web service. Anyone can register for an account and get free, unlimited* access to the service immediately. Boom. Talk about a game changer.

* I'm using the Comcast definition of unlimited: 500 megabytes / the number of users of the system.

API

The API is pseudo-RESTful and based on HBase's Thrift API.

Tables exist in a global namespace. You must first register your table on the tables page. Rows belong to tables. Columns belong to rows and consist of two parts separated by a colon (":"): the family and the column. Querying with both family and column returns only cells that match both, whereas specifying just the family matches all columns that belong to that family. In other words, querying dean:howard would return only cells matching dean:howard. On the other hand, querying for dean: would return cells matching dean:howard and dean:kamen, as well as just dean:. Unlike the true BigTable, I don't require column families to be defined a priori.

Tables, rows, and columns are strings. The value of each cell is a blob. Additionally, each cell has a timestamp associated with it. The timestamp is an ISO 8601 formatted string. Two formats are valid: 2008-04-14T01:16:02 and 2008-04-14T01:16:02.123456
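To make the matching rule and the two timestamp formats concrete, here is a minimal client-side sketch in Python 3. The function names column_matches and parse_timestamp are made up for illustration, and this is only a reading of the rules described above, not the server's actual implementation.

```python
from datetime import datetime

def column_matches(query, column):
    """Return True if a stored column name matches a family[:column] query."""
    family, _, qualifier = query.partition(":")
    cell_family, _, cell_qualifier = column.partition(":")
    if family != cell_family:
        return False
    # An empty qualifier ("dean:") matches every column in the family.
    return qualifier == "" or qualifier == cell_qualifier

def parse_timestamp(text):
    """Parse either of the two accepted ISO 8601 forms into a datetime."""
    for fmt in ("%Y-%m-%dT%H:%M:%S.%f", "%Y-%m-%dT%H:%M:%S"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("unsupported timestamp format: %r" % text)

print(column_matches("dean:howard", "dean:kamen"))  # False
print(column_matches("dean:", "dean:kamen"))        # True
print(parse_timestamp("2008-04-14T01:16:02.123456"))
```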

JSON is used as the wire format. All methods return an array of cells. Each cell is a map. All parts are returned as strings. The value of the cell is returned as a base64 encoded value of the blob the cell holds. The map contains the following keys: table, row, column, value, timestamp.
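As a rough illustration of the wire format, the following Python 3 sketch parses a response body and base64-decodes each cell's value. The helper name decode_cells is hypothetical, and the sample body is built by hand from the keys listed above rather than captured from the service.

```python
import base64
import json

def decode_cells(body):
    """Parse a JSON array of cells and replace each base64 value with raw bytes."""
    cells = json.loads(body)
    for cell in cells:
        cell["value"] = base64.b64decode(cell["value"])
    return cells

# Hand-built sample response containing a single cell.
sample = json.dumps([{
    "table": "test",
    "row": "foo",
    "column": "bar:baz",
    "value": "SG93IGFyZSB5b3U/",
    "timestamp": "2008-04-14T09:11:17.144274",
}])

cells = decode_cells(sample)
print(cells[0]["value"])  # b'How are you?'
```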

Destructive methods (put and delete) require authentication in the form of email:secret_key. The secret key is configured on the account page. This is sent over the wire in cleartext format. DO NOT USE YOUR GMAIL PASSWORD!

Get

All these methods use the HTTP GET method and return up to 100 cells.

http://bigtable.appspot.com/get/<table>/<row>/<column>
Returns the matching cell with the most recent timestamp.

http://bigtable.appspot.com/getRow/<table>/<row>
Returns all cells from the specified row with the most recent timestamp.

http://bigtable.appspot.com/getRowTs/<table>/<row>/<timestamp>
Returns all cells from the specified row with the specified timestamp.

http://bigtable.appspot.com/getVer/<table>/<row>/<column>/<count>
Returns the count most recent cells.

http://bigtable.appspot.com/getVerTs/<table>/<row>/<column>/<timestamp>/<count>
Returns the count most recent cells with timestamps less than that specified.
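The URL patterns above can be assembled mechanically. This is a small Python 3 sketch with hypothetical helpers build_url and fetch_cells. It assumes that path components should be percent-encoded (with ":" left alone) and that the response is UTF-8 JSON; neither is confirmed by the service. Nothing is sent over the network unless fetch_cells is called.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://bigtable.appspot.com"

def build_url(method, *parts):
    """Join a method name and its path components into a request URL."""
    # Assumption: percent-encode each component, leaving ":" intact so the
    # family:column separator and ISO 8601 timestamps stay readable.
    encoded = [quote(str(part), safe=":") for part in parts]
    return "/".join([BASE_URL, method] + encoded)

def fetch_cells(url):
    """GET the URL and parse the JSON array of cells (assumes UTF-8 JSON)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))

latest = build_url("get", "people", "andrew.hitchcock", "photo:hello.jpg")
history = build_url("getVerTs", "people", "andrew.hitchcock", "photo:",
                    "2008-04-14T01:16:02", 5)
print(latest)
print(history)
```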

Put

All these methods use the HTTP PUT method. The cell value is sent as the body of the request.

http://bigtable.appspot.com/put/<auth>/<table>/<row>/<column>
Inserts the specified value, timestamped with the current time.

http://bigtable.appspot.com/putTs/<auth>/<table>/<row>/<column>/<timestamp>
Inserts the specified value with the specified timestamp.
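Here is one way a put could be prepared with nothing but the Python 3 standard library. The function make_put_request is a hypothetical helper, and the percent-encoding of path components is an assumption. The sketch only builds the request; passing it to urllib.request.urlopen would actually send it, with the secret key in cleartext as noted earlier.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://bigtable.appspot.com"

def make_put_request(email, secret_key, table, row, column, value, timestamp=None):
    """Build a PUT request for put (no timestamp) or putTs (explicit timestamp)."""
    parts = ["%s:%s" % (email, secret_key), table, row, column]
    method = "put"
    if timestamp is not None:
        method = "putTs"
        parts.append(timestamp)
    # Assumption: components are percent-encoded, keeping ":" and "@" readable.
    path = "/".join(quote(part, safe=":@") for part in parts)
    # The cell value (bytes) travels as the request body.
    return Request("%s/%s/%s" % (BASE_URL, method, path), data=value, method="PUT")

request = make_put_request("test@example.com", "test", "people",
                           "will.mortensen", "relationship:", b"Will is cool!")
print(request.get_method(), request.full_url)
```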

Delete

All these methods use the HTTP DELETE method and return the deleted values, up to 100 at a time.

http://bigtable.appspot.com/deleteAll/<auth>/<table>/<row>/<column>
Delete all matching cells.

http://bigtable.appspot.com/deleteAllRow/<auth>/<table>/<row>
Delete all matching cells.

http://bigtable.appspot.com/deleteAllRowTs/<auth>/<table>/<row>/<timestamp>
Delete all matching cells.

http://bigtable.appspot.com/deleteAllTs/<auth>/<table>/<row>/<column>/<timestamp>
Delete all matching cells with a timestamp less than or equal to that specified.
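Because each delete call returns at most 100 cells, clearing a large row may take several calls. The Python 3 sketch below repeats a delete until a call comes back empty. It assumes an empty array means nothing is left to delete, which the description above implies but does not state. Both delete_row_batch and drain are hypothetical names, and the network call only happens if delete_row_batch is invoked.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://bigtable.appspot.com"

def delete_row_batch(email, secret_key, table, row):
    """Send one deleteAllRow request and return the cells it reports as deleted."""
    parts = ["%s:%s" % (email, secret_key), table, row]
    path = "/".join(quote(part, safe=":@") for part in parts)
    request = Request("%s/deleteAllRow/%s" % (BASE_URL, path), method="DELETE")
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

def drain(delete_batch):
    """Call delete_batch() until it returns no cells; return everything deleted."""
    deleted = []
    while True:
        batch = delete_batch()
        if not batch:
            return deleted
        deleted.extend(batch)

# Against the live service this would look like:
# removed = drain(lambda: delete_row_batch("test@example.com", "test",
#                                          "people", "will.mortensen"))
```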

Additional Features

In addition to the JSON response, there is another option for the get method to return just the binary data. This allows it to be called by browsers directly with no wrapper or client. To do this, just append ?content-type=<type> to the URL. For example, here is an image I uploaded:
http://bigtable.appspot.com/get/people/andrew.hitchcock/photo:hello.jpg?content-type=image/jpeg
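A tiny Python 3 helper can build such a URL from any get URL. The name raw_url is made up for illustration, and leaving "/" unescaped in the type is an assumption chosen to match the form of the example above.

```python
from urllib.parse import quote

def raw_url(cell_url, content_type):
    """Append the content-type option so the response is the raw blob, not JSON."""
    # "/" is left unescaped to match the form used in the example URL.
    return "%s?content-type=%s" % (cell_url, quote(content_type, safe="/"))

url = raw_url(
    "http://bigtable.appspot.com/get/people/andrew.hitchcock/photo:hello.jpg",
    "image/jpeg",
)
print(url)
```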

Connecting

Manual (Telnet)

Since the authenticated requests don't use HTTP GET, they can be hard to construct using a browser. Telnet makes it easy to do simple testing. Here is an example put request using telnet:

PUT /put/test@example.com:test/people/will.mortensen/relationship: HTTP/1.1
Host: bigtable.appspot.com
Content-Length: 13

Will is cool!

And one for delete:

DELETE /deleteAllRow/test@example.com:test/people/will.mortensen HTTP/1.1
Host: bigtable.appspot.com

For either of these examples, open up your shell and type telnet bigtable.appspot.com 80. Once it connects, paste in one of the above and hit enter. You have to be fast because the server aggressively closes connections.
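If pasting into telnet fast enough is a hassle, the same raw request can be sent from a short Python 3 script. This is only a sketch: build_raw_put and send_raw are hypothetical helpers, and the Connection: close header is my addition so the read loop terminates. Note that Content-Length is computed from the body, and that exactly one blank line separates the headers from the body.

```python
import socket

def build_raw_put(host, path, body):
    """Assemble the bytes of an HTTP/1.1 PUT request with a correct Content-Length."""
    head = (
        "PUT %s HTTP/1.1\r\n"
        "Host: %s\r\n"
        "Content-Length: %d\r\n"
        "Connection: close\r\n"  # assumption: ask the server to close when done
        "\r\n"                   # a single blank line ends the headers
    ) % (path, host, len(body))
    return head.encode("ascii") + body

def send_raw(host, request, port=80):
    """Send the raw request and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)

raw_request = build_raw_put(
    "bigtable.appspot.com",
    "/put/test@example.com:test/people/will.mortensen/relationship:",
    b"Will is cool!",
)
print(raw_request.decode("ascii"))
# To actually send it: send_raw("bigtable.appspot.com", raw_request)
```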

Python

In order to ease adoption, I've created a Python client for use with the service: BigTable.py. Read the class to see the methods; it is pretty simple. It requires httplib2 and json-py. Here is an example usage:

>>> import bigtable
>>> bt = bigtable.BigTable('localhost', '8080', 'test@example.com', 'test')
>>> bt.put('test', 'foo', 'bar:baz', 'How are you?')
{'column': 'bar:baz', 'table': 'test', 'row': 'foo', 'value': 'SG93IGFyZSB5b3U/', 'timestamp': '2008-04-14T09:11:17.144274'}
>>>

To upload a file:

>>> f = open('/User/Andrew/hello.jpg', 'rb')  # binary mode, so the image bytes are read unmodified
>>> bt.put('people', 'andrew.hitchcock', 'photo:hello.jpg', f.read())

Java

I hurriedly put together a Java client, which can be found at BigTable.jar, so expect there to be bugs. There are no JavaDocs associated with it, but the methods are pretty similar to those outlined above, and you should be able to figure it out from the method signatures. The client is org.andrewhitchcock.bigtable.BigTableClient and you can construct it by passing in your email and secret key as Strings. It requires stringtree-json as well as Commons httpclient, logging, and codec. It has the iharder base64 codec built in.

BigTableClient client = new BigTableClient();
Collection<Cell> cells = client.getRow("people", "andrew.hitchcock");

About

Created by Andrew Hitchcock one weekend. A shout out to my friend Richard for his Python help.