Internet developers day – part 1
I was invited today to the Internet Developer Summit (#devkon) (thanks for the invite, Reto) to present our vision of, and work on, the Web of Things. I’ll try to blog as much as I can, because there are lots of tips & tricks for internet developers and the program is juicy. Sorry for the rough notes; hopefully they’re usable enough (not only were the talks so good that I had to listen, but I also had to translate from German, so please be kind).
Find more about the talk & slides here (in German).
The Web is an ecosystem (APIs, networked data, Twitter, extreme network effects). It is interesting to analyze how things grow (or don’t). He introduced Pipes and FriendFeed -> republication (retweets), and platforms such as “thunder”. Flickr pictures are sorted by time. Schools now have high-speed internet but extremely bad computers (the colorful, rounded early iMacs). YSlow (Yahoo), Page Speed (Google) & other performance tips.
Good APIs are nice and useful, self-describing and stateless, and the client decides the return format (using REST, of course). It makes sense to outsource delivery (CDN tips): outsource media data (AWS, etc.; several hostnames mean more parallel sockets) to get better infrastructure, with the exception of central repositories (e.g. data on Google Code). His favorite is Amazon Web Services (with or without CloudFront). Look at Facebook: if you want to build all that infrastructure from scratch to support time-ordered storage of multimedia data, good luck.
Relevance means “here and now” for him, so how does a search engine know the current here and now (the context)? Twitter’s real-time search only keeps a recent time index, and everything runs from memory. Check out the docs about the Facebook/Twitter infrastructure: key/value pairs held in RAM and simply distributed.
The realities of distributed systems: disk, memory and I/O errors, and transactions that must be synchronized. This requires new algorithms, partitioning (separation of concerns), simple and non-invasive interfaces, no central data, and BASE instead of ACID (Eric Brewer, “Towards Robust Distributed Systems”). BASE stands for Basically Available, Soft state, Eventual consistency. CAP theorem: of consistency, availability and tolerance to network partitions, you can choose only two.
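“The client decides the return format” is what HTTP content negotiation does: the client sends an `Accept` header and the server picks a representation it supports. Below is a minimal sketch using only the Python standard library; the function names, the JSON/XML pair and the flat-dict resource are assumptions for illustration, not anything shown in the talk. It also simplifies the HTTP rules: it ignores specificity (`text/*` vs `*/*`) and explicit `q=0` exclusions hidden behind a wildcard.

```python
import json
from xml.etree import ElementTree as ET


def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        media_range, *params = [p.strip() for p in part.split(";")]
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((media_range.lower(), q))
    # sorted() is stable, so ranges with equal q keep their header order
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def range_matches(media_range, media_type):
    """True if a range such as */*, text/* or application/json covers media_type."""
    if media_range == "*/*":
        return True
    r_type, _, r_sub = media_range.partition("/")
    t_type, _, t_sub = media_type.partition("/")
    return r_type == t_type and r_sub in ("*", t_sub)


def negotiate(accept_header, available):
    """Pick the first available media type the client accepts, or None (-> 406)."""
    if not accept_header:
        return available[0]  # no preference stated: use the server default
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for media_type in available:
            if range_matches(media_range, media_type):
                return media_type
    return None


def render(resource, media_type):
    """Serialize a flat dict in the negotiated format."""
    if media_type == "application/json":
        return json.dumps(resource)
    if media_type == "application/xml":
        root = ET.Element("resource")
        for key, value in resource.items():
            ET.SubElement(root, key).text = str(value)
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported media type: {media_type}")
```

For example, `negotiate("application/xml;q=0.9, application/json", ["application/json", "application/xml"])` returns `"application/json"`, because the client ranks JSON higher. A `None` result is where a real service would answer `406 Not Acceptable`.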
Feeds are an output point: make content available as feeds (RSS/Atom) with stateful URLs. Polling sucks. Support feed discovery. Take-home message: offer APIs (& use them), offer notifications across platforms, use scalable & elastic infrastructure, and design for cacheability & data storage.
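Feed discovery usually relies on the autodiscovery convention: a page advertises its feeds with `<link rel="alternate" type="application/rss+xml" href="...">` (or `application/atom+xml`) in its HTML. The sketch below finds those links with the standard-library HTML parser; `discover_feeds` and `FeedLinkParser` are hypothetical names, and fetching the page is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects <link rel="alternate"> elements that point to RSS/Atom feeds."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <link ... /> via handle_startendtag
        if tag != "link":
            return
        attributes = {name: (value or "") for name, value in attrs}
        rels = attributes.get("rel", "").lower().split()
        feed_type = attributes.get("type", "").lower()
        href = attributes.get("href")
        if "alternate" in rels and feed_type in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL
            absolute = urljoin(self.base_url, href)
            self.feeds.append((feed_type, absolute, attributes.get("title", "")))


def discover_feeds(html, base_url):
    """Return (type, absolute_url, title) for every advertised feed."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.feeds
```

A client would run this once on a site’s home page and then subscribe to the returned URLs instead of scraping the page repeatedly.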
Christian presented Twitcheln, an application for giving virtual presents over Twitter. It’s an app built on top of Twitter’s 3 APIs (normal, search & streaming), with rate limiting. It is fully RESTful, supports different formats, has a classic API description and uses Twitter features (OAuth, lists, autofollow, etc.). They used CouchDB so it could scale in the cloud. On Twitter you can see a list of all the users who have used it. What a cool and simple idea, programmed very quickly (2 weeks, thanks to the easy & cool Twitter API); Twitter reacted quickly (they whitelisted them), and no special libraries were needed other than pecl/oauth & a few other PHP libs. Then he briefly introduced his real-world real-time web example I heard about a while ago (check it out). I love his vision of a future Web that is fully automated, connected and linked. He talked about how this could be realized and discussed variants such as weblog pings, SUP, PubSubHubbub, XMPP and Prowl. He showed and demoed his setup, where he takes a picture, which gets tweeted, FriendFeeded, Flickred, etc. I would love to play with his stuff.
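PubSubHubbub replaces polling with push: a subscriber asks a hub to deliver updates for a topic URL to a callback URL, and before activating that subscription the hub verifies intent with a GET carrying `hub.mode`, `hub.topic` and `hub.challenge`; the subscriber confirms by echoing the challenge and refuses with `404`. The sketch below shows only that callback step. It is not Christian’s code: `verify_intent` and the `pending_requests` set are hypothetical bookkeeping, and answering malformed requests with `400` is an assumption.

```python
from urllib.parse import parse_qs


def verify_intent(query_string, pending_requests):
    """Answer a hub's verification-of-intent GET on the subscriber's callback.

    pending_requests: set of (mode, topic) pairs this subscriber actually asked
    the hub for (a hypothetical bookkeeping structure for this sketch).
    Returns (http_status, response_body).
    """
    # parse_qs returns lists; the hub sends each parameter once
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    mode = params.get("hub.mode")
    topic = params.get("hub.topic")
    challenge = params.get("hub.challenge")
    if mode not in ("subscribe", "unsubscribe") or not topic or not challenge:
        return 400, ""  # malformed request (status choice is an assumption)
    if (mode, topic) not in pending_requests:
        return 404, ""  # we never asked for this: refuse by answering 404
    # Echoing the challenge confirms we really requested this (un)subscription
    return 200, challenge
```

The check against `pending_requests` is what stops a third party from subscribing your callback to feeds you never asked for.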
The slides of his talk are here.
Start here. Then create an app and give it a name & a callback URL (where the app is actually running). Facebook acts as the intermediary: it requests data from your server (the app) and relays it back to the client. Choose a client library and include it; then you can use the API (>200 API methods for authentication, management, users, queries, photos, etc.). You have access to more than 80 data fields about users, and you can set up permissions. You can also use over 100 FBML tags (fb:header, fb:bookmark), and FQL, a pseudo-SQL for querying data. The API keeps changing completely, so be rigorous about following it. You can link data, operations, pictures, etc. There is a developer FAQ, a wiki and a forum. Check the live health status of Facebook (because it’s not necessarily your app that is buggy). Then he showed a little demo and code examples of how easily one can build an app.
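To give a feel for FQL being “pseudo SQL”: a classic FQL query for the current user’s friends was `SELECT uid, name FROM user WHERE uid IN (SELECT uid2 FROM friend WHERE uid1 = me())`. FQL only ran on Facebook’s servers, so the sketch below reproduces the same query shape against a local SQLite stand-in. The tables mirror FQL’s `user` and `friend` tables, but the sample rows and the helper names (`make_demo_db`, `friends_of`) are made up for illustration.

```python
import sqlite3


def make_demo_db():
    """In-memory stand-in for FQL's user and friend tables (made-up sample rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE user (uid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE friend (uid1 INTEGER, uid2 INTEGER);
        INSERT INTO user VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
        INSERT INTO friend VALUES (1, 2), (1, 3), (2, 1);
        """
    )
    return conn


def friends_of(conn, uid):
    # FQL would read: SELECT uid, name FROM user
    #                 WHERE uid IN (SELECT uid2 FROM friend WHERE uid1 = me())
    # SQLite has no me(), so the current user's id is passed as a parameter.
    cursor = conn.execute(
        "SELECT uid, name FROM user "
        "WHERE uid IN (SELECT uid2 FROM friend WHERE uid1 = ?) ORDER BY uid",
        (uid,),
    )
    return cursor.fetchall()
```

The subquery pattern is the interesting part: FQL of that era did not support joins, so related tables were typically combined with `IN (SELECT ...)`.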
You can find the slides here.
Single Sign-On with Facebook, Twitter and Google ID, Dani Niklaus, CEO netlive.
He introduced the issue of SSO using the Facebook Connect login on blick.ch. There are three types of SSO. First, pure OpenID providers: when you create an OpenID account with a provider, you get a unique ID that you can use elsewhere (Google, Flickr, Yahoo, etc.). However, Facebook & Twitter have their own systems (which is problematic, as they have most of the users of social networks). Windows Live… hmmm… yeah, we’re waiting for them to upgrade. With Facebook Connect you see a nice window with the look and feel of Facebook, but the domain name might be www.facebook(.got.your.password.ru), so pay attention. You also risk attacks such as cross-site scripting and request forgery (XSS/CSRF). Use HTTPS! When users have accounts on different ID systems, how can you link them all to a single ID? You need to find duplicates. How do you set that up, and how do you deal with lost passwords? There are many, many Facebook starter kits for developers in many languages, and you can use OpenID libraries for login. Then, who provides what? He told us about the SuisseID project (electronic ID & valid electronic signature).
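The www.facebook(.got.your.password.ru) trick works because a host name only belongs to a domain if it *ends* with that domain, not if it starts with it. The sketch below encodes the check a careful user (or a login widget) should make: HTTPS plus a host equal to, or a subdomain of, the trusted domain. `is_trusted_login_url` is a hypothetical helper and `facebook.com` is just the default for illustration.

```python
from urllib.parse import urlsplit


def is_trusted_login_url(url, trusted_domain="facebook.com"):
    """True only for https URLs whose host is trusted_domain or one of its subdomains."""
    parts = urlsplit(url)
    # .hostname is lower-cased and excludes any port or user:password@ prefix
    host = (parts.hostname or "").rstrip(".")
    is_same_or_subdomain = host == trusted_domain or host.endswith("." + trusted_domain)
    return parts.scheme == "https" and is_same_or_subdomain
```

The leading dot in `"." + trusted_domain` matters: without it, a look-alike such as `evilfacebook.com` would pass.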
Find more info here and slides/docs here. As developers we have two issues that shouldn’t be mixed up: performance (the leopard) & scalability (flocks of birds). He started with SOA and what the front-end/back-end architecture looks like (using the example of local.ch, where Patrice was a front-end developer). Key points: dependencies are local to services (DB, queues, synchronization), data replication/rsync. Try a simple service that works. Services are the new classes, or maybe the new packages, or DB tables. Motivation: modularity (team separation, clear boundaries, easier migrations, replacements, versioning), service/code reuse, the best tool for each job (language, dependencies, tools), scalability, & competence centers (each module knows how to do its own job). On the other hand, it’s more expensive, the setup is complex (training, deployment, automation), it’s hard to see the big picture of the app & runtime, performance suffers, and data joins go over the network (they should be avoided anyway). HTTP has already solved a lot of problems, so don’t re-implement your own RPC protocol (caching, compression, error handling, authentication, content negotiation, documentation). Bonus: load balancing, proxying (Squid), encryption (SSL). 500 Internal Server Error is cool, because it says “we don’t want to find out what the real error is”. 402 Payment Required is also nice. He gave a little intro to caching, with ETags, If-None-Match, Expires, etc.
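The ETag/If-None-Match mechanism is a good example of a problem HTTP already solves: the server labels each response with an entity tag, the client sends it back in `If-None-Match`, and if nothing changed the server answers `304 Not Modified` with an empty body. Below is a minimal framework-free sketch; the function names, the SHA-1 content hash as ETag and the 60-second `max_age` are assumptions for illustration, not what local.ch used.

```python
import hashlib
import time
from email.utils import formatdate


def make_etag(body: bytes) -> str:
    """Strong ETag derived from the response body (a content hash)."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def _opaque(tag: str) -> str:
    """Drop a W/ prefix: If-None-Match uses weak comparison."""
    return tag[2:] if tag.startswith("W/") else tag


def etag_matches(if_none_match: str, etag: str) -> bool:
    if if_none_match.strip() == "*":
        return True
    # Simplification: split on commas (fine for hex-digest ETags like ours)
    candidates = [tag.strip() for tag in if_none_match.split(",")]
    return any(_opaque(tag) == _opaque(etag) for tag in candidates)


def conditional_get(body: bytes, request_headers: dict, max_age: int = 60, now=None):
    """Return (status, headers, body), answering 304 when the client's copy is current."""
    now = time.time() if now is None else now
    headers = {
        "ETag": make_etag(body),
        "Cache-Control": f"max-age={max_age}",
        "Expires": formatdate(now + max_age, usegmt=True),  # HTTP-date in GMT
    }
    if etag_matches(request_headers.get("If-None-Match", ""), headers["ETag"]):
        return 304, headers, b""  # client can reuse its cached copy
    return 200, headers, body
```

Because this is plain HTTP, any proxy in between (Squid, for example) can use the same headers, which is exactly why a home-grown RPC protocol loses out.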
Then he showed the mnemonic architecture: how the different machines are connected and together make up the final application. They have developed a Python framework (on top of WSGI). It seems a clean tool for building distributed apps that communicate entirely over HTTP.
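Their framework’s own API wasn’t shown in enough detail to reproduce, so here is only the layer underneath it: a plain WSGI application, which is just a callable taking `environ` and `start_response`. It acts as one small JSON service reachable over HTTP. The `/listings/<id>` route, the `LISTINGS` data and port 8000 are hypothetical.

```python
import json
from wsgiref.simple_server import make_server

# Hypothetical in-memory data for one small service
LISTINGS = {"1": {"id": "1", "name": "Example Pizzeria", "city": "Zurich"}}
PREFIX = "/listings/"


def app(environ, start_response):
    """A single-purpose JSON service: GET /listings/<id>."""
    method = environ.get("REQUEST_METHOD", "GET")
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "application/json")]
    listing_id = path[len(PREFIX):] if path.startswith(PREFIX) else None
    if method != "GET":
        status, payload = "405 Method Not Allowed", {"error": "only GET is supported"}
        headers.append(("Allow", "GET"))
    elif listing_id in LISTINGS:
        status, payload = "200 OK", LISTINGS[listing_id]
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    headers.append(("Content-Length", str(len(body))))
    start_response(status, headers)
    return [body]


if __name__ == "__main__":
    # Development server from the standard library; use a real WSGI server in production
    with make_server("localhost", 8000, app) as server:
        server.serve_forever()
```

Because every service speaks plain HTTP, the standard status codes (404, 405) and the caching headers from the previous talk come for free, and any HTTP client can call it.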
NoSQL or Not only SQL, Dr. Michael Marth (@michaelmarth), founder of marth.software.services.
NoSQL? A bunch of projects with common persistence requirements; relational overlaps with non-relational. Two focuses: data models and scalability. For scalability: no single point of failure, transparent to the app, horizontal scaling (more servers = more capacity). Data first, structure later. Data model: granular, binary, node-level access control. He mentioned Jackrabbit as an example; its sweet spot is web content. CouchDB uses a schemaless data model (just JSON). No ODBC, no JDBC, just HTTP. He showed a CouchDB example. No SQL, but map-reduce (for queries in CouchDB), and he gave an example. Parallelizable (cool word, eh?)! CouchDB is designed for replication (incremental & on demand), for example to synchronize address books across mobile phones, computers & apps (hear that, Apple?). Sweet spot: distributed databases with schemaless data. Apache Cassandra (initially built for Facebook’s inbox search, now also used by Twitter and Digg). Data model: key-value on steroids. High availability (suited for data centers), eventually consistent, with tunable trade-offs between consistency and availability. Sweet spot: really large data sets & high availability. When it comes to performance, hmm…, well, it depends on what we want to compare.
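CouchDB views are defined by a map function (written in JavaScript, calling `emit(key, value)` per document) and an optional reduce function; results are sorted by key, and with `group=true` the reduce runs once per key. The in-process Python emulation below only mirrors those semantics to show why it is “parallelizable”: each map call looks at a single document. `run_view`, `presents_by_recipient` and the sample documents are hypothetical, and CouchDB’s rereduce step is omitted.

```python
from collections import defaultdict


def run_view(docs, map_fn, reduce_fn=None, group=True):
    """Emulate a CouchDB-style view: map each doc, sort by key, optionally reduce."""
    emitted = []
    for doc in docs:
        # Each map call sees one document only, so this loop could run in parallel
        for key, value in map_fn(doc):
            emitted.append((key, value))
    emitted.sort(key=lambda kv: kv[0])  # views are ordered by key
    if reduce_fn is None:
        return emitted
    if not group:
        return reduce_fn([value for _, value in emitted])  # one total for the view
    groups = defaultdict(list)
    for key, value in emitted:
        groups[key].append(value)
    # dicts keep insertion order, so groups come out in key order
    return [(key, reduce_fn(values)) for key, values in groups.items()]


def presents_by_recipient(doc):
    """Map function: emit (recipient, 1) for every 'present' document."""
    if doc.get("type") == "present":
        yield doc["to"], 1


docs = [
    {"type": "present", "to": "bob"},
    {"type": "present", "to": "alice"},
    {"type": "present", "to": "bob"},
    {"type": "user", "name": "carol"},  # ignored by the map function
]
```

With `reduce_fn=sum`, `run_view(docs, presents_by_recipient, sum)` gives a count per recipient, much like a CouchDB view using the built-in `_sum` reduce with `group=true`.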
The rest of the talks will follow later.