The new InfoBubbles project is an easy-to-use tool that measures how statistically similar the current page is to known text corpora, or “bubbles”. It is built on Mozilla’s new WebExtensions add-on interface. You can build a bubble dataset from text you're interested in and share it with other users.
Users can get a bubble summary, or explore in more detail.
Page classifications are initially user-generated. A user can mark any URL as either an example or a counter-example of a given bubble. Pluggable text classifiers can then match pages against these classification corpora and report their results to the user through a common UI. Sets of corpus feeds and algorithms will be published as configurations on “bubble servers”, and users can choose which servers to load configurations from.
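To make the pluggable-classifier idea concrete, here is a minimal sketch of how a bubble corpus and a common classifier interface might fit together. It is an illustration only, not the project's design: the names `Bubble`, `Classifier`, and `CosineClassifier` are hypothetical, bag-of-words cosine similarity stands in for whatever algorithms the project ends up shipping, and Python is used for brevity even though a WebExtension itself would be written in JavaScript.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Protocol


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class Bubble:
    """Hypothetical corpus: text of pages a user marked for or against a bubble."""
    name: str
    examples: list[str] = field(default_factory=list)
    counter_examples: list[str] = field(default_factory=list)


class Classifier(Protocol):
    """Assumed common interface that every pluggable classifier implements."""
    def score(self, page_text: str, bubble: Bubble) -> float: ...


class CosineClassifier:
    """Similarity to the examples minus similarity to the counter-examples."""
    def score(self, page_text: str, bubble: Bubble) -> float:
        page = tokenize(page_text)
        positive = cosine(page, tokenize(" ".join(bubble.examples)))
        negative = cosine(page, tokenize(" ".join(bubble.counter_examples)))
        return positive - negative
```

A positive score suggests the page resembles the bubble's examples, and a negative score suggests it resembles the counter-examples. Because every classifier exposes the same `score` method, a shared UI could display results from any of them without knowing which algorithm produced the number.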
This project will be designed to work with Google Chrome as well as Firefox, although initial development will be done in Firefox. All text handling will be client-side, with no text sent to the server.
If you're interested in applications of text classification, you can use InfoBubbles to add bubble recognition to social games, productivity tools, recommendation tools, and self-discipline or quantified-self tasks.