using json-schema for exploring api servers

A while ago Google introduced an API Explorer for many of their APIs.  At Mozilla Messaging, James Burke had created a similar API browser for Raindrop, which we ended up using in the early days of F1 as well.  I’ve been meaning to make our API browser generic so that it could be used with any API server, and had played with it off and on over the past couple of months.  In the last couple of weeks I’ve found a few hours here and there to get that idea to a working state, at least for Python.  Here is an example of the API browser, along with a response from testing the API (lower part of the right panel).

Following Google’s information on API Discovery, I used json-schema as the base data format.  I then hacked up a UI from another project (design by Andy Chung) to show documentation based on that schema, and to allow interacting with the API through a form.  Since it’s based on json-schema, it can at least display the documentation for any API that provides a json-schema file.

Another big benefit of using json-schema is that you should be able to use the generic API clients that Google has created against your own API servers; the only thing lacking is a pure JavaScript implementation (they use GWT).

Here’s the same frontend showing the Google Books API:

In the server-side Python code, I use a set of decorators for defining the top-level application documentation and the documentation for each API endpoint.  Information from Routes is used to find the API documentation, and the json-schema is put together and cached on the first call to the server.  A nice benefit is that I also use the schema decorators to validate API input against the json-schema.  You can see my full implementation in the apibase project on github.
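
To make that concrete, here is a minimal sketch of the idea, not the actual apibase code: the api_doc decorator name, the schema layout, and the use of the jsonschema package for validation are all my own assumptions here.

# Minimal sketch, not the actual apibase implementation: the decorator name,
# schema layout and the jsonschema package are assumptions for illustration.
import functools
import jsonschema

_endpoint_schemas = {}   # collected per-endpoint schemas, served from a schema controller

def api_doc(description, params=None):
    """Attach json-schema style documentation to an endpoint and
    validate incoming parameters against it on every call."""
    def wrap(func):
        schema = {
            "description": description,
            "type": "object",
            "properties": params or {},
        }
        _endpoint_schemas[func.__name__] = schema

        @functools.wraps(func)
        def inner(self, body, *args, **kwargs):
            # reject requests whose parameters do not match the declared schema
            jsonschema.validate(body, schema)
            return func(self, body, *args, **kwargs)
        return inner
    return wrap

class AccountController(object):
    @api_doc("Fetch a single account by id",
             params={"id": {"type": "string"}})
    def get(self, body):
        return {"id": body["id"]}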

Provides:

  • decorators for Paste/Routes based applications
  • input validation for API endpoints
  • controller for retrieving schema
  • simple browser to the API
  • ability to use generic clients, no more writing client side api modules

Missing, or what might be next when I have time:

  • Authentication controls for protected APIs (e.g. Google’s Explorer handles client-side OAuth)
  • Generally needs work and cleanup
  • Support for more than one API at a time in the browser, perhaps support remote servers that support CORS
  • Support for parameter constraints in the API browser’s test form (e.g. min/max values for integers), although validation of those should be working on the server
  • Possibly use more information from Routes in generating json-schema
  • Expose more documentation in the browser (e.g. HTTP method of call)
  • Create a pure Javascript client library to work against json-schema enabled APIs
  • Servers for other languages? (e.g. node.js)
  • other?


user management for oauth in firefox

Last week I worked on a little experiment around giving users better management of their OAuth permissions. There are a couple of issues I wanted to tackle: the first is how to better educate users about what it means to give a website OAuth access; the second is how to provide users with an interface that lets them easily see which sites have access to their data, and manage that access.

For the first issue, you can read my previous rant about Huffington Post linking their Twitter Follow button to OAuth, basically getting full access to users’ Twitter accounts when users follow them through their site. My solution is partially inspired by the KnowMore extension, which lets users know more about a company’s behavior when they visit a site (e.g. this company is known to use child labor). I wanted to see if having a mechanism in Firefox that could tell users more about what sites do with OAuth access to their accounts would be interesting. Right now it doesn’t really tell them anything beyond what the normal OAuth allow pages do, but it could.

In the image below, you can see I added a notification bar. This will appear anytime an OAuth authorization request is made.

The second issue is also a hard user problem. Users might give a lot of different sites access to different accounts (Facebook, Google, Twitter, etc.). It is non-obvious where a user can go to see what has access to their data. I added an about:oauth page (see below; yes, it’s fugly right now) that shows which sites have access to which accounts (assuming the addon was installed when that access was given) and provides a link to the account’s revoke/manage page. Unfortunately I have to hard-code the revoke URL since there is no discoverability for that.

Where could it go from here?

The notification bar could link to some user maintained pages that describe any abuses that sites might do, or it could simply point to a user education page that explains privacy issues around giving access to their account data.

We could work with organizations to get discoverability and APIs for OAuth token management, and have a single point of management in the browser.

I suppose it just depends on whether there are enough tin-foil hats in the room.

If you’re interested in the code, you can find it on github.

What I would change in OAuth

Now that I’ve had my rant about the misuse of OAuth, I thought I’d mention a couple of things that I think should be fixed in OAuth. I’m kind of skipping the “why” here, just to keep things short. And no, this doesn’t fix the problems from my rant; I don’t see how those issues can be fixed at the protocol level.

1. Enhance permissions via attribute exchange

OpenID has a couple of extensions, Attribute Exchange and Simple Registration Extension. Adding a similar simple attribute-exchange capability to OAuth would allow for finer permissions control. While OAuth 2 has added a scope parameter for requesting permissions, it still lacks the ability to distinguish required permissions from optional permissions, along with information about what those permissions will be used for. A core/common set of social permissions should also be developed (e.g. write-wall). Oh, and please stop pretending that write access also means read access.
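
To illustrate the gap, here is a hypothetical sketch, not a real API: OAuth 2 gives you a single opaque scope value, and the required/optional split plus per-permission explanations below are the kind of thing I mean, invented parameter names and all.

# Hypothetical sketch: the required/optional parameters and scope names below are
# invented to illustrate the point; OAuth 2 only defines a single "scope" value.
try:
    from urllib.parse import urlencode   # Python 3
except ImportError:
    from urllib import urlencode         # Python 2

def build_authorize_url(endpoint, client_id, redirect_uri):
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        # what OAuth 2 gives you today: one flat list of scopes
        "scope": "read-profile write-wall",
        # what finer permission control could look like (invented parameters):
        # "scope_required": "write-wall",
        # "scope_optional": "read-contacts",
        # "scope_reason.write-wall": "post the stories you choose to share",
    }
    return endpoint + "?" + urlencode(params)

print(build_authorize_url("https://provider.example/oauth/authorize",
                          "my-client-id", "https://app.example/callback"))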

2. Add a ‘3rd party authentication’ setting

It seems some sites are using OAuth in place of OpenID as a way to authenticate users, usually because they also want access to some account data/APIs. I see plenty of confused comments around the two. Since it seems OAuth will get used this way, OAuth should just provide an authentication id like OpenID does. It is a bit out of scope for OAuth, but really sites just want a “Connect with X” capability where they can authenticate a user and easily ask them for more information. OpenID+OAuth could be used as well, but it’s just more complexity.

Maybe what I really want is OpenID Connect

3. Better UI handling

Google at least supports a way to tell them what size/kind of UI you want for the OAuth authorization page; this should be a standard part of OAuth. Having done two systems already that attempt to put that flow into a dialog, it’s easy to see that every single site does something different enough to make the process awkward.

4. Management discovery

Ever try to revoke access to an application? Know where to start? Knowing and managing what access you have given to which applications/web sites should be easy for the user. That management aspect is not really a part of OAuth, but discovery of the management URL should be defined in OAuth. I might go a step further and say that sites implementing OAuth should also have an OAuth management API. Then it would be simple to provide a centralized management UI in a Firefox addon. It would also be nice to know what data has been retrieved and by whom.
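
To make the discovery idea concrete, here is a purely hypothetical sketch; the well-known path and the JSON keys are invented, nothing like this exists in the OAuth specs today.

# Purely hypothetical: the well-known path and document keys are invented here to
# illustrate what management discovery could look like; they are not part of any
# OAuth specification.
import json
try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib2 import urlopen          # Python 2

def discover_management(provider):
    """Fetch a (hypothetical) discovery document describing where a user can
    review and revoke the applications they have authorized."""
    raw = urlopen(provider + "/.well-known/oauth-management").read()
    doc = json.loads(raw.decode("utf-8"))
    # e.g. {"revoke_page": "https://provider.example/settings/connections",
    #       "management_api": "https://provider.example/oauth/tokens"}
    return doc.get("revoke_page"), doc.get("management_api")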

Huffington Post: an example of social privacy problems

A few weeks ago I ran across an article (via some social site) on Huffington Post. I read articles there from time to time, and I thought, why not follow them on Twitter? I found the Twitter icon, clicked on it and got presented with a small dialog giving me a few options.

Well, I didn’t want to log in to their site using Twitter, so I clicked on the big Follow button. What happened next surprised me: I ended up on the Twitter OAuth page…with its tantalizing big blue “Allow” button.

For those who may not know what OAuth is, put simply, it is a way to authorize one web site to access your data from another website without giving away your password. Generally the first web site will ask for certain kinds of access, such as posting to your wall (Facebook), reading your contacts, or accessing your profile information (e.g. email address, age, etc).

Despite Twitter saying they “take your privacy very seriously”, when you give Twitter OAuth access to a site, the web site gets access to everything in your Twitter account, including reading your direct messages (kind of like private messages between two Twitter users), the people you follow, and the people who follow you. Basically, a site using OAuth with Twitter can do everything you can do on Twitter, they are, in fact, YOU.

Well, I was a bit shocked that I was being asked for access to my Twitter account just so I could follow their tweets; it’s unnecessary. My next thought was: why do they want this access? Their login page really doesn’t explain, or provide me a way to find out.

I looked at their privacy statement: no mention of Twitter. I looked at the user agreement: way too long and too much legalese to digest. I finally thought of looking in the FAQ, and while it doesn’t explicitly state what they will do with my Twitter account, I kind of figured it out (they’ll use my contacts to show them what I’m viewing on the site, and likewise let me see what my contacts are viewing).

Well, that would be kind of OK if that were what I was trying to do, but all I wanted was to follow their tweets. I wonder how many of their 785K followers gave them full access to their accounts.

Then I wondered about what they would get from other accounts. With Google they get my Gmail address and my contacts. With Yahoo!, they get access to my status, my updates, my contacts and my profile. I didn’t even bother looking at what they get from Facebook; it would be too much.

All I wanted to do was follow their tweets.

Why is this a problem? Well, I do a bit of work with OAuth and OpenID and understand what can be obtained from using these. I think they are great technologies when used correctly. That’s the problem. When used incorrectly, typical non-technical users are not going to understand the implications. My hunch is that the typical user will give that access away without necessarily understanding what is happening.

Is the problem OAuth?

While there are problems that should be fixed in OAuth, the scenario above is not fixable by the OAuth protocol. The scenario above is an example of two organizations doing the wrong thing with the OAuth protocol. Twitter simply does not provide enough controls, tossing out the baby with the bathwater. Huffington Post appears to be attempting to gain subscribers by relying on the lack of understanding that the general population has around the technologies involved. Yes, given some knowledge and digging I feel like I know what will happen with my data; No, I don’t feel like either organization is Evil, just Wrong.

I decided not to follow Huffington Post and feel somewhat deflated.

sidenote: yes, a website may limit its access to only reading your Twitter data, but that still gives it access to all your data.

wsgi middlewares for profiling and debugging

A while back I implemented a debugging middleware and a profiling middleware, and I’ve been using them with Pylons recently. I think they’re pretty useful, so I’ve wrapped them up into an installable egg that provides Paste interfaces in setup.py. This allows you to easily insert the middleware into any existing Paste project (e.g. a Pylons project). I’m basically going to use this middlewares project as a dumping ground for middleware I find useful. It currently contains a debug (via DBGP) middleware, a profiler and a csrf middleware. There is nothing that says you must use Paste with these; Paste just makes it easier. The csrf middleware is currently tied to using Beaker.
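
For anyone unfamiliar with the shape of these things, a WSGI middleware is just a callable that wraps the application; the snippet below is a rough sketch of a profiling wrapper along those lines, not the actual code in the middlewares project.

# Rough sketch of the WSGI middleware shape, not the actual code in the project:
# wrap the application callable and profile each request with cProfile.
import cProfile
import pstats
import sys
try:
    from cStringIO import StringIO   # Python 2
except ImportError:
    from io import StringIO          # Python 3

class ProfileMiddleware(object):
    """Profile each request and dump the hottest calls (by cumulative time) to stderr."""
    def __init__(self, app, limit=20):
        self.app = app
        self.limit = limit

    def __call__(self, environ, start_response):
        profiler = cProfile.Profile()
        # this profiles the call into the wrapped app, not iteration of the response body
        result = profiler.runcall(self.app, environ, start_response)
        stream = StringIO()
        pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(self.limit)
        sys.stderr.write(stream.getvalue())
        return result

# Wiring it up is just wrapping the WSGI app, e.g.:
# app = ProfileMiddleware(app)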

Here’s a screenshot of Komodo IDE debugging my Pylons app. Lots of other debuggers support DBGP, but for many reasons I like Komodo.

Below is partial output from line-profiling the csrf middleware call handler. The profiler can do line or call profiling. I find line profiling handy when I want to focus on a specific area. While debugging and call profiling require no changes to your code, line profiling does require you to decorate the function(s) you want to profile.

The profiler currently requires a patch if you want to do line profiling with Pylons. I’ve sent the patch to the maintainer of line_profiler.

Timer unit: 1e-06 s

File: ...../csrf.py
Function: __call__ at line 40
Total time: 0.004621 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    40                                               @profile
    41                                               def __call__(self, environ, start_response):
    42         1           29     29.0      0.6          request = Request(environ)
    43         1            3      3.0      0.1          session = environ['beaker.session']
    44         1          476    476.0     10.3          csrf_token = session.get('csrf')
    45         1            4      4.0      0.1          if not csrf_token:
    46                                                       csrf_token = session['csrf'] = str(random.getrandbits(128))
    47                                                       session.save()
    48
    49         1            7      7.0      0.2          if request.method == 'POST':

database migrations for SQLAlchemy part deux

Well, as I was looking at making miruku more reliant on sqlalchemy-migrate, I discovered the experimental command: migrate update_db_from_model! So much for an afternoon’s work, but at least I’m much more familiar with the migration tools now. So here’s how I’ve implemented an auto upgrade for Pylons.

First, easy_install sqlalchemy-migrate

Now, in your pylons development.ini, add the following to app:main:

# SQLAlchemy migration
# if managed, the migration repository is here
migrate.repository = %(here)s/changes
# automatically do database upgrades
migrate.auto = 1

Then, in PRJNAME.config.environment, in load_environment, after the call to init_model add the following:

    
    # sqlalchemy auto migration
    if asbool(config.get('migrate.auto')):
        try:
            # managed upgrades
            cschema = schema.ControlledSchema.create(engine, config['migrate.repository'])
            cschema.update_db_from_model(meta.Base.metadata)
        except exceptions.InvalidRepositoryError, e:
            # unmanaged upgrades
            diff = schemadiff.getDiffOfModelAgainstDatabase(
                meta.Base.metadata, engine, excludeTables=None)
            genmodel.ModelGenerator(diff).applyModel()

Of course, don’t forget the imports you need:

from paste.deploy.converters import asbool
from migrate.versioning.util import load_model
from migrate.versioning import exceptions, genmodel, schemadiff, schema

Run your app: paster serve --reload development.ini

Now, with most basic changes to the model, your database will be updated to reflect the new model when Paste reloads. This can of course fail sometimes, such as when adding a new column with nullable=False.
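
For example (a sketch with made-up model names, following the declarative setup assumed above), adding a plain nullable column is the kind of change that gets picked up on the next reload:

# Sketch with made-up model names: the kind of change the auto-upgrade handles.
from sqlalchemy import Column, Integer, Unicode
from PRJNAME.model import meta

class Person(meta.Base):
    __tablename__ = 'person'
    id = Column(Integer, primary_key=True)
    name = Column(Unicode(255))
    # newly added column; left nullable (the default) so the upgrade can succeed
    # on existing rows -- nullable=False is the failure case mentioned above
    nickname = Column(Unicode(255))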

I’m only using the unmanaged upgrades right now, so the managed section may need some tweaking; I’ll see when I get there.

database migrations for SQLAlchemy

It was a dark and stormy day, so I skipped stepping outside and worked a while. There’s Vancouver for you.

At Mozilla Messaging we’ve been using Pylons and SQLAlchemy on a couple of projects.  One of the features this setup misses that Django and Rails provide is database migration.  Looking around, there’s but one choice for SA since it’s not built in: sqlalchemy-migrate.  Oh, there’s that other project, miruku.  Hmm.

At first glance, miruku seems much simpler, and it is actually a layer on top of sqlalchemy-migrate (sort of).  It doesn’t have upgrade/downgrade versioning (you can only upgrade), but you also don’t have to build up a set of migration scripts.  Just change the table, run the upgrade command, move on.  That sounds much better than figuring out, building and maintaining a set of migration scripts (of course it may not be), at least during development.  So that is where I focused my time.

The first problem I ran into is that it didn’t work with SA declarative classes.  The second problem is that, despite being saved from oblivion, it doesn’t seem to have any active maintenance.

Damn, the gauntlet is thrown.  There goes my Sunday afternoon.

The result is a new, working miruku.  I’ve only tested it lightly in a simple Pylons app, and its unit tests are specific to using Elixir, but it upgraded my tables for me, even correctly handling a drop column in SQLite (which doesn’t support dropping columns; miruku does some heavy lifting to make this work).

To use it with Pylons, I have to add a section to the ini files (how have ini files survived this long?):
[app:miruku]
sqlalchemy.PRJNAME.url = sqlite://path
miruku.PRJNAME.metadata = PRJNAME.model.meta:Base.metadata

Then setup the miruku support by running miruku:

miruku create --config my.ini --section app:miruku

After that, I can run upgrades with:

miruku upgrade --config my.ini --section app:miruku

There’s a bit more work I plan on doing, then it may well bit-rot again, but here’s my personal wish list for miruku:

  • support altering column properties, miruku only supports add and drop column
  • use more of sqlalchemy-migrate to reduce code size
  • examine table level changes to see if anything major is missing in miruku
  • paster command support
  • better Pylons/Paste integration for configs

If you’re interested in trying it out, drop me a note and let me know how it works for you. You can find miruku in my bitbucket repo.

can I stumble any faster?

I’m thinking about web performance again…

Since taking a look at a little project which uses the async, event-driven Tornado server, I thought I’d re-investigate performance data around Python web servers.  Often when I see something in action I question my previous choices and have to look into it.  I’ve been working on Raindrop for a while, and one decision I made was to use Pylons with SQLAlchemy as the basis for the API server, which is essentially a WSGI application.  I wanted something that gave me a simple framework to work with, but I didn’t need or want all the features of Django.  Async servers like Tornado have a few benefits that make them attractive, but you have to write your apps differently, and my main question is: is it worth it?

Conclusions first…

While I found a few interesting things, I have come to the same conclusion once again: it doesn’t really matter what server you use; 99% of the time perf is an app issue.  That, combined with the LOLapps setup and what benchmarking I’ve seen, leaves me pretty resolved that Pylons+SQA is a good balance between not writing too low-level and not having a large framework.  Probably not the best performer, but not bad.  I don’t think it’s worth reworking an existing app to fit the Tornado model, but it might be fun to try out a new app there sometime.

Here’s some of what I looked at:

Ian Bicking pointed out this blog post, and it’s probably one of the better benchmarking posts I’ve seen to date (specific to Python web servers).  Ian’s post of course was calling for more in depth benchmarking, hopefully someone hears his call.

I also found this set of 3 posts (find the later 2 from the first) doing some varied simple tests with Apache Bench.

What I felt was really interesting is a talk from LOLapps at PyCON (see the video), probably due to the fuzzy feeling of validation about Pylons :).

I’ll summarize the video: they use Pylons + Paste + SQA, experimented with Tornado, and decided to stick with Pylons for “most” systems, though they’ll try to use Tornado in some places.

They do some interesting caching with SQA that we can try at some point in the future.

A suggestion I don’t recall running into before was to disable the Nagle algorithm (nice explanation).  Of course, doing that right now may be premature optimization.
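
For reference, turning Nagle off is a one-line setsockopt on the connection; the generic sketch below isn’t tied to Pylons, Paste or Tornado, since where you get hold of the socket depends on the server.

# Generic sketch: disable Nagle's algorithm on a connected socket via TCP_NODELAY.
# Where you get the socket from depends on the server; this is not specific to
# Pylons, Paste or Tornado.
import socket

def disable_nagle(conn):
    """Turn off Nagle's algorithm so small writes are sent immediately instead
    of being buffered while waiting to coalesce with more data."""
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# e.g. apply it to each accepted connection:
# conn, addr = listening_socket.accept()
# disable_nagle(conn)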

They also have some good input on profiling performance, which reminded me again that I need to dig out the wsgi middleware I used for profiling.

Anyway, I was writing this as an email, and thought, why not blog?  It’s been a while.

revisiting addons, tb sync updated

Over the past couple of months I’ve been distracted by a little project called Raindrop and have neglected the Thunderbird addons a bit, but this week I’ve updated sync.  You can pull the latest from my bitbucket: get the tb-sync branch in weave-ext, and the default branch of weaver.  The oauth branch for Contacts also got a couple of small fixes for Thunderbird.

addon roundup

I thought I’d make some XPIs available for some of the addons I’ve been working on.  While they are all available in either bitbucket or hg.mozilla.org, sometimes an XPI is just easier.  They will be released onto AMO soon, but for the adventurous out there, here are the links:

Available on AMO now:

Mailing list filter: Auto filter your email lists into search folders, and optionally create real folders to get them out of your inbox.

Bulk list filter: Auto filter a bunch of common site notifications into some search folders

Experimental:

These are not officially released and may not even work for you; they are here for the adventurous.  If you install these, you will have to uninstall them later to update.

Attachments: an experiment at searching and viewing attachments; it currently consumes lots of CPU on accounts with large amounts of email.

Thunderbird Sync: Same as Firefox Sync (aka Weave) but for Thunderbird, and adds sync for Address Books. Requires the install of two addons, weave-ext and weaver.

Contacts for Thunderbird: A snapshot of Contacts addon for Thunderbird, requires the install of oauthorizer and Contacts.

Overview: An addon from David Ascher that gives a summary page for folders; I cleaned up a couple of things and here is the result.