Data Portability

Social network portability is one of several user-interface ideas and suggestions in the area of data portability. As users, our identity, photos, videos and other forms of personal data should be discoverable by, and shared between, our chosen (and trusted) tools or vendors. When you join a new site, you should be able to import, or preferably subscribe to, your profile information and your social network from any existing profile of yours. We need a DHCP for identity and a distributed file system for data. The technologies already exist; we simply need a complete reference design to put the pieces together. The problem can be addressed with a number of existing technologies and initiatives: Microformats, OpenID, OAuth, RDF, RSS, OPML and APML.

Data Portability Technologies

Data Portability’s mission is to put all existing technologies and initiatives in context to create a reference design for end-to-end data portability, and to promote that design to the developer, vendor and end-user community.

This post serves as a brief primer on each of these technologies.

Microformats

Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards. Instead of throwing away what works today, microformats intend to solve simpler problems first by adapting to current behaviours and usage patterns (e.g. XHTML, blogging).

Examples include:

People and Organizations
hCard
Calendars and Events
hCalendar
Opinions, Ratings and Reviews
VoteLinks, hReview
Social Networks
XFN
Licenses
rel-license
Tags, Keywords, Categories
rel-tag
Lists and Outlines
XOXO

If you use Flickr, Technorati, Upcoming, Last.fm, Twitter, Cork’d or any number of other services, you can conceivably share data between the different service providers automatically.
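To make the idea concrete, here is a minimal sketch of how a consuming service might read an hCard out of a profile page, using only the Python standard library. The markup, the example.com URL and the names `HCardParser` and `parse_hcard` are invented for illustration; only the class names `vcard`, `fn`, `url` and `org` come from hCard itself. It is a deliberate simplification: it does not check that elements sit inside a `vcard` root and it ignores void elements such as `<br>`.

```python
from html.parser import HTMLParser

# Hypothetical profile markup. The class names are hCard properties;
# everything else (names, URL) is made up for this example.
SAMPLE = """
<div class="vcard">
  <a class="fn url" href="http://example.com/alice">Alice Example</a>
  <span class="org">Example Ltd</span>
</div>
"""


class HCardParser(HTMLParser):
    """Collects a few hCard properties by class name (simplified sketch)."""

    TEXT_PROPERTIES = {"fn", "org"}

    def __init__(self):
        super().__init__()
        self.card = {}
        # One entry per open element: the text properties that element carries.
        self._prop_stack = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        # For class="url" the value lives in the href attribute, not the text.
        if "url" in classes and "href" in attrs:
            self.card["url"] = attrs["href"]
        self._prop_stack.append([c for c in classes if c in self.TEXT_PROPERTIES])

    def handle_endtag(self, tag):
        if self._prop_stack:
            self._prop_stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._prop_stack:
            for name in self._prop_stack[-1]:
                self.card[name] = text


def parse_hcard(html):
    """Return a dict of the hCard properties found in the given HTML."""
    parser = HCardParser()
    parser.feed(html)
    return parser.card


card = parse_hcard(SAMPLE)
print(card)
```

Because the data stays in ordinary HTML, the same page serves people in a browser and software like this parser, which is the "humans first, machines second" principle in practice.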

More details can be found on the microformats website.

OpenID

OpenID is an open, decentralized framework for user-centric digital identity. OpenID takes advantage of already existing internet technology (URI, HTTP, SSL, Diffie-Hellman) and realizes that people are already creating identities for themselves whether it be at their blog, photostream, profile page, etc. With OpenID you can easily transform one of these existing URIs into an account which can be used at sites which support OpenID logins.

In other words, OpenID allows users to log in using shared credentials across different services. It also allows users to decide what information to share between services. For example, you can allow the use of your address on one service, but not another. You can think of OpenID as an extension of the single sign-on used by Google or Yahoo! to access their various services.

More details can be found on the OpenID website.

OAuth

The OAuth protocol is less about authentication, which is the realm of OpenID, and more about authorisation. OAuth is an open protocol that allows secure API authorisation in a simple and standard way from desktop and web applications. For Consumer developers, OAuth is a method to publish and interact with protected data. For Service Provider developers, OAuth gives users access to their data while protecting their account credentials.

A number of services have already implemented OAuth. These include Fire Eagle, Open Social, Pownce, Get Satisfaction and Magnolia.

More details can be found on the OAuth website.

Resource Description Framework (RDF)

RDF is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata model but which has come to be used as a general method of modeling information, through a variety of syntax formats.

The RDF metadata model is based upon the idea of making statements about resources in the form of subject-predicate-object expressions, called triples in RDF terminology. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, one way to represent the notion “The sky has the color blue” in RDF is as the triple: a subject denoting “the sky”, a predicate denoting “has the color”, and an object denoting “blue”. RDF is an abstract model with several serialization formats (i.e. file formats), and so the particular way in which a resource or triple is encoded varies from format to format.
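The triple model can be sketched in a few lines of Python. This is an illustration of the data model only, not an RDF library: the `http://example.org/` identifiers, the `Triple` type and the `match` and `to_ntriples` helpers are all assumptions made for the example, and the serializer only imitates the general shape of the N-Triples format without handling escaping or datatypes.

```python
from collections import namedtuple

# A statement is a (subject, predicate, object) triple.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical identifiers; example.org is used purely for illustration.
EX = "http://example.org/"

triples = [
    Triple(EX + "sky", EX + "hasColor", "blue"),        # "The sky has the color blue"
    Triple(EX + "sky", EX + "isAbove", EX + "ground"),
    Triple(EX + "grass", EX + "hasColor", "green"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that match every given field; None is a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


def to_ntriples(t):
    """Write one triple as a line: URIs in angle brackets, literals in quotes."""
    if t.object.startswith("http://"):
        obj = "<" + t.object + ">"
    else:
        obj = '"' + t.object + '"'
    return "<" + t.subject + "> <" + t.predicate + "> " + obj + " ."


for t in match(triples, subject=EX + "sky"):
    print(to_ntriples(t))
```

The same list of triples could be written out in any other serialization without changing the statements themselves, which is the sense in which RDF is an abstract model rather than a file format.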

This mechanism for describing resources is a major component in what is proposed by the W3C’s Semantic Web activity: an evolutionary stage of the World Wide Web in which automated software can store, exchange, and use machine-readable information distributed throughout the web, in turn enabling users to deal with the information with greater efficiency and certainty. RDF’s simple data model and ability to model disparate, abstract concepts has also led to its increasing use in knowledge management applications unrelated to Semantic Web activity.

More details can be found on the W3C website.

Really Simple Syndication (RSS)

RSS is a family of Web feed formats used to publish frequently updated content including, but not limited to, blog entries, news headlines, and podcasts. An RSS document, which is called a “feed” or “web feed” or “channel”, contains either a summary of content from an associated web site or the full text. RSS makes it possible for people to keep up with web sites in an automated manner that can be piped into special programs or filtered displays.

RSS content can be read using software called an “RSS reader”, “feed reader” or an “aggregator”. The user subscribes to a feed by entering the feed’s link into the reader or by clicking an RSS icon in a browser that initiates the subscription process. The reader checks the user’s subscribed feeds regularly for new content, downloading any updates that it finds.
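The "check for new content" step that a feed reader performs can be sketched as follows, using Python's standard XML parser. The feed is a made-up RSS 2.0 document held in a string (a real reader would download it from the feed's link), and the function name `new_items` and the choice to remember `guid` values in a set are assumptions for the example rather than anything the RSS specification prescribes.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <description>Hypothetical feed used for illustration</description>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <guid>http://example.com/2</guid>
    </item>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <guid>http://example.com/1</guid>
    </item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_guids):
    """Return (title, link) pairs for items not seen before.

    seen_guids is a set that is updated in place, so calling the function
    again with the same feed returns nothing new.
    """
    channel = ET.fromstring(feed_xml).find("channel")
    fresh = []
    for item in channel.findall("item"):
        # Fall back to the link when an item carries no guid.
        guid = item.findtext("guid") or item.findtext("link")
        if guid not in seen_guids:
            seen_guids.add(guid)
            fresh.append((item.findtext("title"), item.findtext("link")))
    return fresh


seen = {"http://example.com/1"}          # the reader has already shown this one
print(new_items(SAMPLE_FEED, seen))      # only the unseen item is reported
```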

More details can be found on the RSS Board website.

Outline Processor Mark-up Language (OPML)

OPML is an XML format for outlines. Originally developed by UserLand as a native file format for the outliner in its Radio UserLand product, it has since been adopted for other uses, the most common being to exchange lists of web feeds between web feed aggregators.

The OPML specification defines an outline as a hierarchical, ordered list of arbitrary elements. The specification is fairly open which makes it suitable for many types of list data.
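As a rough illustration of both uses, the sketch below walks a small OPML outline as a hierarchy and then flattens it into a list of feed subscriptions. The subscription list is invented, and the helper names `walk` and `feed_urls` are assumptions for the example; the `outline` element and its `text`, `type` and `xmlUrl` attributes are the ones commonly used for subscription lists.

```python
import xml.etree.ElementTree as ET

# Hypothetical subscription list; the feed URLs are placeholders.
SAMPLE_OPML = """<opml version="2.0">
  <head><title>Hypothetical subscriptions</title></head>
  <body>
    <outline text="Tech">
      <outline text="Example Blog" type="rss" xmlUrl="http://example.com/feed.xml"/>
      <outline text="Web">
        <outline text="Another Blog" type="rss" xmlUrl="http://example.org/rss"/>
      </outline>
    </outline>
    <outline text="News" type="rss" xmlUrl="http://example.net/news.rss"/>
  </body>
</opml>"""


def walk(node, depth=0):
    """Yield (depth, text) for every outline element, preserving order."""
    for child in node.findall("outline"):
        yield depth, child.get("text")
        yield from walk(child, depth + 1)


def feed_urls(opml_xml):
    """Return (text, xmlUrl) for every outline that points at a feed."""
    body = ET.fromstring(opml_xml).find("body")
    return [
        (o.get("text"), o.get("xmlUrl"))
        for o in body.iter("outline")
        if o.get("xmlUrl")
    ]


body = ET.fromstring(SAMPLE_OPML).find("body")
for depth, text in walk(body):
    print("  " * depth + text)           # indented view of the hierarchy

print(feed_urls(SAMPLE_OPML))            # flat list an aggregator could import
```

Folders such as "Tech" are simply outlines without an `xmlUrl`, which is why the same format works for arbitrary lists as well as for feed subscriptions.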

More details can be found on the OPML website.

Attention Profiling Mark-up Language (APML)

APML allows you to share your own personal Attention Profile in much the same way that OPML allows the exchange of reading lists between News Readers. The idea is to compress all forms of Attention Data into a portable file format containing a description of your ranked interests.

Services that have adopted APML include Bloglines, Cluztr, Dandelife, Engagd, Idiomag, OpenLink Data Spaces and Particls.

More details can be found on the APML website.

Securely transferring personal data around the web has become an increasingly important concept, not only to users of the web but also to service providers. Both Plaxo and Six Apart have been working on systems to allow the transfer of data. However, since Google announced Open Social and the Open Social API, the mantle has been handed over and there is now a strong commitment to realising data portability.

In the late 1990s, a large multi-national technology corporation, hoping to become a major force in online advertising, bought a small start-up in a sector that was believed to be the next big thing. That corporation was Microsoft and the start-up was Hotmail. Hotmail and Microsoft established web-based email as a must-have application for personal use. The addition of Hotmail to the Microsoft inventory promised to increase the company’s online revenues, which were being dominated by Yahoo!, Google and AOL amongst a host of others.

A decade later it was the turn of a much-evolved AOL to speculate with the purchase of a small, up-and-coming social networking website, Bebo, for $850m (£425m). This has raised a number of eyebrows: AOL has been a struggling web portal since its merger with Time Warner, and the real value of social networking has yet to be realised or understood.

Social Networking Websites

Both deals, in their respective decades, offer the casual observer a paradox of the Internet revolution. Whilst both email and social networking hold the promise of being the next big thing that aids revenue generation, it is dangerous to assume that each service can stand alone and generate revenue in its own right. Webmail, now over a decade old, illustrates this perfectly. Microsoft, Yahoo!, Google and AOL all have their respective webmail services with advertisements strategically placed to entice the user to click through, but these are a small part of the bigger networks. The offer of email, free archiving, address book and calendar is cheap to deliver, but its primary purpose is to keep the user engaged with the brand and its associated websites, making users more likely to visit the affiliated pages where advertising is more effective.

For instance, I am a fully signed-up member of Google and access their email, chat, documents, analytics, webmasters, AdSense, AdWords, calendar and checkout applications, among others, some of which carry advertising and all of which support the core Google search pages through branding. The same can be said of Yahoo!: I frequently use Yahoo!’s MyBlogLog, Flickr and Upcoming services, which serve to reinforce the Yahoo! brand and web portal.

Social networking will become a ubiquitous feature of online life, but that does not mean it is a business.

From whence came webmail now comes social networking. The implicit values of social networking services such as MySpace, Facebook and Bebo have been increased by the big internet and media companies, such as News Corporation, with its purchase of MySpace for $580m (£290m) in 2005, and Microsoft, with its $240m (£120m) investment for a 1.6% share in Facebook in late 2007 (valuing it at an enormous $15bn/£7.5bn). But valuing these online services so highly does not mean that there is a valuable revenue model; Facebook’s revenue for 2007 was a mere $150m (£75m). Sergey Brin of Google also admitted that the monetisation of their Orkut service, and of social networking in general, was proving to be problematic (Google also has a contractual agreement with News Corporation to provide advertising on MySpace).

Facebook has also met with criticism and difficulty when trying to monetise its service with a project called Beacon. Facebook’s idea was to inform users’ networks whenever an item was purchased, thereby creating what is in effect a recommendation system, or algorithmic word-of-mouth. Users rebelled and privacy advocates shouted loudly; the service was axed and Mark Zuckerberg, Facebook’s founder, was left to apologise for an innovative idea badly implemented.

Whilst social networking does have opportunities to make money, it is unlikely that it will be pots and pots of money. The value of the service, however, is not monetary but, as its genre suggests, social. We have already seen how people can connect to past and present friends, but a social network’s strength is in its ability to forge new relationships, business or personal. Social networking has made explicit the connections between people, which has led to a whole ecosystem of applications built on the networks’ APIs which allow users to interact.

But should users really have to visit a specific website to be social?

I often comment that there is something profoundly wrong when people are forced to spend their lives updating their profile to keep in touch with their so-called friends. What happened to the good old-fashioned telephone? Why don’t people simply arrange to meet up and go for a drink to keep in touch? Of course, with everyone’s increasingly busy lives, it is possible to argue that posting a tweet via Twitter, posting an article on a blog or updating your Facebook profile allows you to continue a real relationship with your friends, whilst not actually needing to see them every Friday or Saturday night. This is a good thing, right?

Another problem presented by today’s social networks is that they are an enclosed ecosystem, at least to users. Whilst Facebook and LinkedIn, in addition to a whole host of others, have provided APIs for developers to encourage them to interact with their services (this has been particularly successful with Facebook), the same openness has not been extended to users. The various social networks, until recently, have been reluctant to allow users to pass data between competing services; after all, this data is core to the success, or indeed failure, of a site. This is understandable, since the networks’ huge valuations depend on the sites maximising revenues and page views, so they need to maintain tight control. As a result, keen Internet users maintain a plethora of online accounts.

2008 will see a change in how people access social networks.

The opening up of social networks, led by Google with their Open Social API, is set to bring about an evolution in this medium. This change follows the historical standardisation of popular services. First it was email with webmail, which in the early days was restricted to individual ecosystems, for example AOL and CompuServe; then it was instant messaging, with individual services provided by Microsoft, Yahoo!, Google, AOL and Skype.

Further developments include the Data Portability Working Group, whose mission is to put all existing technologies and initiatives in context to create a reference design for end-to-end data portability; in short, to allow users to move their data between competing services. Others are pushing OpenID, a plan to create a single, federated online sign-on system that people can use to access many websites.

Data Portability

The opening of social networks is likely to accelerate thanks to the first tentative, yet bold, steps made by webmail, the first social network. As a technology, webmail has become old-fashioned, but its younger sibling, the social network, will revitalise not only webmail but online communication and advertising. Through social intelligence, marketers and advertisers will be able to target adverts for items that we are more likely to want. This will not only improve the user’s online experience, but also provide a more targeted revenue stream.

The fight for social networking dominance has been running for several years now, and it shows no sign of letting up.

The term Web 2.0, popularised by Tim O’Reilly back in 2004, describes a cluster of web-based services with a social collaboration and sharing component, where the community as a whole contributes, takes control, votes and ranks content and contributors. Web 2.0 services include social networking sites, wikis, communication tools, weblogs, social bookmarking, podcasts, RSS feeds (and other forms of many-to-many publishing), social software, and folksonomies. Central to this new Web is the idea of tagging: the adding of keywords to a digital object (e.g. a website, picture, audio file or video clip) to categorise it. This activity is effectively subject indexing but generally without a controlled vocabulary.

The following list provides examples of sites which include some form of user-based tagging:

Blogs
Technorati: http://technorati.com
Bookmarks
Delicious: http://del.icio.us
Books
LibraryThing: http://www.librarything.com
Emails
Gmail: http://mail.google.com
Events
GoingToMeet: http://www.goingtomeet.com
People
Tagalag: http://www.tagalag.com
Pictures
Flickr: http://www.flickr.com
Podcasts
Odeo: http://odeo.com
Videos
YouTube: http://www.youtube.com

Folksonomic Websites

Tagging of course is not a new concept, especially to librarians, indexers and classification professionals. What is new is that the tagging is being done by everyone, no longer by only a small group of experts, and that the tags are being made public and shared. This is the concept of Folksonomy.

A folksonomy is a user-generated taxonomy used to categorize and retrieve web content such as Web pages, photographs and Web links, using open-ended labels called tags. Typically, folksonomies are Internet-based, but their use may occur in other contexts. The folksonomic tagging is intended to make a body of information increasingly easy to search, discover, and navigate over time. A well-developed folksonomy is ideally accessible as a shared vocabulary that is both originated by, and familiar to, its primary users.

In contrast, in the realm of the Web, taxonomy can be defined as:

the laws or principles of classification;

controlled vocabulary used primarily for the creation of navigation structures for websites

The development of the Internet and the Web, and of search engines, led to users doing their own searching. In the Web 2.0 environment users are now also doing their own content creation and information management.

Because folksonomies develop in Internet-mediated social environments, users can often discover who created a given folksonomy tag, and see the other tags that this person created. In this way, folksonomy users often discover the tag sets of another user who tends to interpret and tag content in a way that makes sense to them. The result is often an immediate and rewarding gain in the user’s capacity to find related content. Part of the appeal of folksonomy is its inherent subversiveness: when faced with the choice of the search tools that Web sites provide, folksonomies can be seen as a rejection of the search engine status quo in favour of tools that are created by the community.
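The discovery process described above can be sketched with a few dictionaries. The users, URLs and tags below are invented, and the function names `build_indexes` and `related_tags` are assumptions for the example; real tagging sites will differ in how they store and rank tags. The only normalisation applied is lower-casing, since a folksonomy has no controlled vocabulary to map tags onto.

```python
from collections import defaultdict

# Hypothetical tag assignments: (user, item, tag).
ASSIGNMENTS = [
    ("alice", "http://example.com/photo1", "sunset"),
    ("alice", "http://example.com/photo1", "beach"),
    ("bob", "http://example.com/photo2", "Sunset"),
    ("bob", "http://example.com/article", "opml"),
    ("carol", "http://example.com/photo1", "sunset"),
]


def build_indexes(assignments):
    """Index the assignments by tag and by user."""
    items_by_tag = defaultdict(set)
    users_by_tag = defaultdict(set)
    tags_by_user = defaultdict(set)
    for user, item, tag in assignments:
        tag = tag.strip().lower()   # free-form tags; only the case is normalised
        items_by_tag[tag].add(item)
        users_by_tag[tag].add(user)
        tags_by_user[user].add(tag)
    return items_by_tag, users_by_tag, tags_by_user


def related_tags(tag, users_by_tag, tags_by_user):
    """Other tags used by the people who used `tag`.

    Returns a dict mapping each other tag to the number of those people
    who used it.
    """
    counts = defaultdict(int)
    for user in users_by_tag.get(tag, ()):
        for other in tags_by_user[user]:
            if other != tag:
                counts[other] += 1
    return dict(counts)


items_by_tag, users_by_tag, tags_by_user = build_indexes(ASSIGNMENTS)
print(sorted(items_by_tag["sunset"]))    # everything anyone tagged "sunset"
print(sorted(users_by_tag["sunset"]))    # who created that tag
print(related_tags("sunset", users_by_tag, tags_by_user))
```

Looking up who used a tag, and then what else those people tagged, is the step that lets one user benefit from another user's way of labelling content.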

Folksonomy creation and searching tools are not part of the underlying World Wide Web protocols. Folksonomies arise in Web-based communities where special provisions are made at the site level for creating and using tags. These communities are established to enable Web users to label and share user-generated content, such as photographs (e.g. Flickr), or to collaboratively label existing content, such as Web sites (e.g. Technorati), books (e.g. LibraryThing), works in the scientific and scholarly literatures, and blog entries (e.g. WordPress).

Web 2.0 will alter the way that businesses develop and apply innovative ideas.

During the 1990s business leaders and venture capitalists grappled with how they would make money from the web. This was typified by the two VCs, Kleiner Perkins and Sequoia Capital, investing $25 million in Google in the late 1990s; they knew the search engine created by Sergey Brin and Larry Page was a winning formula, even though the pair had not yet monetised search. Bricks and mortar companies were deemed “old hat” as the dotcom bubble was expanding. Companies such as eBay, Amazon and Yahoo! were at the forefront of every investor’s chequebook. Every company needed a 21st Century “Blue Sky” web strategy; every company needed to do e-commerce. However, the bubble burst and everyone was brought down with a bang. Boo.com is a classic example of the fallout from the over-speculation.

Today, the reality has shifted from solely bricks and mortar or dotcom, to a balance between the real world and cyberspace, of traditional business operations complemented by the universality provided by web-based technologies. The web has given businesses a greater understanding of their customers. With Web 2.0 a new type of web is emerging, further enhancing the understanding of a user or customer through the creation of online communities, where information is shared and new ideas evolve.

There are numerous examples of web communities, from the early FriendsReunited to MySpace and the more specific Islandoo for the Channel 4 TV programme Shipwrecked. Web 2.0 is all about collaborative networks, typified by Flickr, del.icio.us, Wikipedia and YouTube. So far, Web 2.0 has primarily been used in the consumer arena, as these examples show, but the use of such technologies has far-reaching implications based on understanding how people interact with the technologies and behave online. Linking people across countries, time zones and company boundaries will enable them to work together without hierarchical boundaries, bringing people together as one team to collate the best input. This is exemplified by the wiki, whereby any end-user can make changes to the shared resource without the need for specialist software and expensive training. This makes sharing knowledge extremely easy.

Another area of Web 2.0 is the approach identified by the term “folksonomy”. Wikipedia defines a folksonomy as:

… an Internet-based information retrieval methodology consisting of collaboratively generated, open-ended labels that categorize content such as Web pages, online photographs, and Web links. A folksonomy is most notably contrasted from a taxonomy in that the authors of the labeling system are often the main users (and sometimes originators) of the content to which the labels are applied. The labels are commonly known as tags and the labeling process is called tagging.

While it takes time for an expert to create a taxonomy specific to a particular organisation in order to categorise or define data, folksonomies do not require fixed taxonomies. Instead, users define their own descriptions of the data by applying tags to it, whether it is a bookmark on del.icio.us, an image on Flickr, a video on YouTube or a document in a company repository. Over time, these tags can be amended by other users, resulting in a definition that is more specific. This enables users to find information with relative ease, without having to type the exact keyword.

Web 2.0 will bring a whole host of issues into the business arena. While there are clear benefits from establishing communities and social networks, people with different views, be they political or religious, can drive the agenda. Further complications arise from the need to audit changes to the data and to ensure that the data is indeed accurate (Wikipedia has had cases where people have maliciously altered data to either enhance their own profile or devalue the significance of historical events).