Friday, May 25, 2012

A data communication standard - OData vs. GData vs. homebrew

Today I discovered OData.  Then I discovered GData.  Then I discovered that I don't like either one.

The idea of a data standardization protocol is appealing on the surface, but the two "official" implementations of a common web protocol for data consumption are not my cup of tea.  I do like tea, however.  But I'm picky about my tea and I'm picky about how I communicate with web servers.

First off, I don't like complicated protocols.  I find them annoying as do most other programmers.  This is one of the reasons why SOAP slipped on itself and fell on its derriere at the starting gate.  /hahagreatjoke

I've been working with JSON a lot lately.  It benefits greatly from having native support in every popular scripting language and easy-to-integrate libraries in the few languages that don't.  So, if we want to develop a standard protocol, JSON is the starting and ending point for modern development.

For some strange, bizarre reason, when a company builds a "standardization protocol", they also waste a lot of time building into said protocol a "discovery service".  Let's face it:  The concept of a "discovery service" is just broken.  It is a LOT easier to skip that mess and just apply a healthy amount of documentation, which works equally well.  "Works equally well" (aka Laziness) trumps innovation.

One of the things I do is build a success vs. failure indicator into my JSON replies:

{
  "success" : false,
  "error" : "The API key is invalid."
}

{
  "success" : true,
  "results" : [ ... ]
}

This makes it easy to test, from any programming language, whether or not a specific reply was successful without having to check for the existence of an error message.
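Here is a minimal Python sketch of how a client might consume replies shaped like the two above.  The function name `parse_reply` is made up for illustration; only the "success", "error" and "results" keys come from the examples.

```python
import json


def parse_reply(body: str):
    """Return the results list on success; raise with the server's message on failure."""
    reply = json.loads(body)
    # One boolean decides the branch; no need to probe for an "error" key first.
    if reply.get("success") is True:
        return reply.get("results", [])
    # Fall back to a generic message if the server omitted "error" (an assumption).
    raise RuntimeError(reply.get("error", "Unknown error"))
```

The caller either gets a list back or an exception carrying the server's error string, so the success check lives in exactly one place.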

My JSON isn't very complex, but I imagine that if I were pulling a ton of data down, I'd possibly break it up with pagination.  However, OData's "__next" implementation is not what I'd do.  I'd return the LIMIT-ed set of result IDs and offer a mechanism to select 'x' results from a subset of those IDs with another query.  A server has no business dictating to a client how it should paginate results, and the server would benefit from performance gains as well.  After all, most pagination results in running the same expensive query multiple times, whereas selecting a bunch of rows by ID can use an index (typically the primary key).
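A rough sketch of that ID-first pagination idea, using Python's built-in sqlite3 module and an in-memory table.  The `items` table, its columns, and the function names `matching_ids` and `fetch_by_ids` are all hypothetical, and the price filter just stands in for whatever the expensive query would be.

```python
import sqlite3


def matching_ids(db, min_price, limit=1000):
    """Run the (stand-in) expensive filter once and return only the matching IDs."""
    rows = db.execute(
        "SELECT id FROM items WHERE price >= ? ORDER BY id LIMIT ?",
        (min_price, limit),
    )
    return [row[0] for row in rows]


def fetch_by_ids(db, ids):
    """Fetch full rows for a client-chosen subset of IDs via primary-key lookups."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = db.execute(
        f"SELECT id, name, price FROM items WHERE id IN ({placeholders}) ORDER BY id",
        list(ids),
    )
    return rows.fetchall()


# Hypothetical sample data: 100 items priced 1.0 through 100.0.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
db.executemany(
    "INSERT INTO items (id, name, price) VALUES (?, ?, ?)",
    [(i, f"item{i}", float(i)) for i in range(1, 101)],
)

ids = matching_ids(db, min_price=50)            # the filter runs once
page_size = 10                                  # the client picks its own page size
first_page = fetch_by_ids(db, ids[:page_size])  # cheap lookup by ID
```

The client holds the ID list and slices it however it likes, so the server never has to re-run the filter for page two.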

Another thing I do is version many of the APIs that I write.  "RESTful" interfaces are overrated and are kind of annoying when a query string works just as well, perhaps better.  So I just stick the version of the client API into my query string.  I generally do version breakage - if I upgrade the version of the server, I immediately break all clients using the older version.  If I were a large organization like Google, I might keep old API versions around for a few months.  However, I'm just one programmer, so trying to support multiple versions of an API would be a waste of my time.  I want you on the latest and greatest version, not some old version.

But I did just come up with a good idea:  Add an "api_expires" field to every reply from the server.  Set it to "never" initially.  When you want to deploy a new version of an API or terminate an obsolete version, change it to some date/time in the near future.  This gives application developers fair warning and lets them programmatically do something about it (e.g. send an e-mail to themselves).
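A minimal sketch of what a client-side check for "api_expires" could look like in Python.  The ISO 8601 date format, the 30-day warning window, and the function name `expiry_warning` are assumptions for illustration; the idea above only specifies "never" or a date/time.

```python
from datetime import datetime, timedelta, timezone


def expiry_warning(reply, now=None, warn_within=timedelta(days=30)):
    """Return a warning string if the API version expires soon, otherwise None."""
    expires = reply.get("api_expires", "never")
    if expires == "never":
        return None
    # Assumed format: ISO 8601, e.g. "2012-07-01T00:00:00+00:00".
    deadline = datetime.fromisoformat(expires)
    if deadline.tzinfo is None:
        # Treat timestamps without an offset as UTC (another assumption).
        deadline = deadline.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    if deadline - now <= warn_within:
        return f"API version expires at {deadline.isoformat()}"
    return None
```

An application could call this on every reply and, when it returns a string, e-mail it to the developer or write it to a log.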

The last thing I do is encrypt some of my JSON requests and replies using shared secrets.  This allows for more secure communication between client and server even over plain-ol' HTTP.  I'll save this discussion for a separate post.

So there you have it - my homebrew protocol.  Simple is better.
