Sunday, October 11th, 2009
Since I introduced myself to Google AppEngine, I’ve always liked it. It’s truly a joy to develop on the AppEngine platform. Due to a hectic work schedule, I hadn’t found the time to think up a new toy project that would bring me back to AppEngine. Well, this weekend I decided to take a break from work-work and build something on the side for a change.
When I was working on the NewsXperiment project, I was neck deep in the RSS/Atom feed world. All the news feed sources I had accumulated for NewsXperiment were sitting around, waiting to be used for another purpose. So came “News Fishing”.
I needed a way to quickly peek at what’s happening “right now” without getting lost in a jungle of “stuff” on a web page or in an RSS reader app. I wanted to see “one” news item at a time, and if my interest was piqued, dig in further by clicking the link to the original page. If not, keep fishing. That was my initial and only requirement, and it turned out to be the premise of “News Fishing”.
- As I’m typing this, two mobile apps (for iPhone and Android phones) for News Fishing are in the works.
Sunday, August 10th, 2008
As stupid as it looks, and although it “does NOT make any sense” from many angles, NewsXperiment bears a few interesting software technologies and paradigms.
The NewsXperiment project consists of two parts: the NewsXperiment Scrambler Engine (NSE) and the Web frontend.
The NewsXperiment Scrambler Engine runs offline. It gathers and processes news items, scrambles them, and outputs a zip file containing the scrambled news items as pickles.
Once executed, NSE goes through its categorized feed repository and retrieves the feeds, thanks to Mark Pilgrim’s excellent “feedparser” library.
Now that the feeds are read, the engine performs the following:
- randomly picks a certain number of news items from each category as base feeds.
- randomly associates a certain number of scrambler feeds to each base feed.
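The two selection steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not NSE’s actual code: it assumes the feeds have already been parsed into a hypothetical dictionary mapping each category to a list of entry dicts, and it assumes scrambler items are drawn from the whole pool across all categories (the post does not say which pool they come from). The function name and parameters are hypothetical.

```python
import random


def pick_base_and_scramblers(entries_by_category, bases_per_category,
                             scramblers_per_base, rng=None):
    """Return a list of (base_entry, [scrambler_entries]) pairs.

    entries_by_category: hypothetical {category: [entry_dict, ...]} mapping,
    e.g. built from parsed RSS/Atom feeds.
    """
    rng = rng or random.Random()
    # Assumption: scramblers may come from any category.
    all_entries = [e for entries in entries_by_category.values() for e in entries]
    pairs = []
    for entries in entries_by_category.values():
        # Randomly pick up to `bases_per_category` base items per category.
        k = min(bases_per_category, len(entries))
        for base in rng.sample(entries, k):
            # A base item should never be used to scramble itself.
            candidates = [e for e in all_entries if e is not base]
            n = min(scramblers_per_base, len(candidates))
            pairs.append((base, rng.sample(candidates, n)))
    return pairs


if __name__ == "__main__":
    feeds = {
        "world": [{"title": "World story %d" % i} for i in range(4)],
        "tech": [{"title": "Tech story %d" % i} for i in range(4)],
    }
    for base, scramblers in pick_base_and_scramblers(feeds, 1, 2, random.Random(7)):
        print(base["title"], "<-", [s["title"] for s in scramblers])
```

Passing a seeded `random.Random` makes a run reproducible, which helps when tuning how many bases and scramblers produce readable results.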
At this point, the engine has the initial data in place. Then comes the scrambling. However, before anything is scrambled, all the entries picked for scrambling need to be tagged, chunked, and chinked.
- Using NLTK, all the titles and summaries that were read are tagged, chunked, and chinked. (I love this part.)
- Based on the chunked and chinked data, each base feed item’s title and summary are scrambled with the set of items destined to be the scramblers for that base. Of course, this does not always result in a well-constructed sentence.
- Once the scrambling process is complete, it is time to generate the output file.
- The output file is built from the scrambled items and consists of a list of titles, summaries, and links back to the news items used to create them. The file is a pickle dump of dictionary elements.
- The output file is datestamped and zipped. A zip file because, doh!, it’s compressed. Plus, I couldn’t find a way to upload the raw pickle content to Google AppEngine. Very likely a MIME type issue, but I didn’t dig deep into it. A zipped pickle dump was all I needed, and I had it.
Very well, I have the zipped pickles, what do I do with them? If I cannot get them up to Google AppEngine’s data store, how could I possibly share them?
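The “datestamped zipped pickle” output can be sketched with the standard library alone: `pickle` for the dump, `zipfile` for compression, and a date in the file name. This is an illustrative sketch, not the engine’s real code. The `scrambled-YYYYMMDD` naming scheme and both function names are hypothetical, and the reader function takes raw bytes on the assumption that the server side receives the uploaded zip as a byte string.

```python
import datetime
import io
import os
import pickle
import tempfile
import zipfile


def write_scrambled_zip(items, directory=".", today=None):
    """Pickle a list of scrambled-item dicts into a datestamped zip file."""
    today = today or datetime.date.today()
    name = "scrambled-%s" % today.strftime("%Y%m%d")  # hypothetical naming scheme
    zip_path = os.path.join(directory, name + ".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name + ".pickle",
                    pickle.dumps(items, protocol=pickle.HIGHEST_PROTOCOL))
    return zip_path


def read_scrambled_zip(data):
    """Load the scrambled items back from the zip file's raw bytes.

    Only unpickle data you produced yourself: pickle is not safe for
    untrusted input.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        (member,) = zf.namelist()  # expect exactly one pickle per zip
        return pickle.loads(zf.read(member))


if __name__ == "__main__":
    items = [{"title": "Scrambled title", "summary": "Scrambled summary",
              "links": ["http://example.com/a", "http://example.com/b"]}]
    path = write_scrambled_zip(items, tempfile.mkdtemp())
    with open(path, "rb") as f:
        print(read_scrambled_zip(f.read()) == items)
```

Compressing with `ZIP_DEFLATED` is what makes the zip smaller than the raw pickle; the default `ZIP_STORED` would only wrap it.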
Wednesday, July 30th, 2008
I started playing with Google AppEngine a few months ago. First, I tried to port my work on SillyDomainNames.com over to AppEngine, but gave up after a short while. Always on the lookout for new and interesting ideas, I somehow came up with this experimental mash-up site concept: NewsXperiment. Not your everyday mashup site, but something different and unique. I spent some time experimenting with the code locally, and once I brought the Natural Language Toolkit (NLTK) into the mix, it immediately gained some traction. Brilaps was looking for a project to test out the new Google AppEngine, and it made sense to let NewsXperiment.com be the guinea pig. Google AppEngine turned out to be a great choice, and it didn’t take long to bring this project from an idea to its first beta release. I haven’t come across anything similar yet, so if something like it exists, please let me/them know.
So what is NewsXperiment? What can I do there? What is the roadmap and what were the challenges during development? I’ll try to answer those questions in this blog post.
What is NewsXperiment?
NewsXperiment is a news scrambler/generator site. In the simplest possible terms, NewsXperiment reads a bunch of RSS feeds, approximately 200, from a number of highly respected sources and scrambles the titles and summaries of their news items using Natural Language Processing techniques. The idea is to create interesting, funny, and/or timely new stories based on actual real-time events as reported by news sources of all kinds across the Internet. The mash often produces comical stories such as “Princess Di Dancing with the Polar Bears at Golden Gate Bridge”. How would it come up with such a story? Well, at the time of our scrambling there was probably some unrelated news about Princess Di, Dancing with the Stars, Polar Bears, and the Golden Gate Bridge. We randomly select and break apart each story, scramble the pieces, and rebuild them into amusing, well-structured stories. The magic is in the reconstruction. The engine is still in beta, so the scrambled title/summary text still needs some refinement, but the site is worth a bookmark and a glance every day or so, as it already generates some pretty interesting mashups several times a day.
You can simply poke around and glance at a few news entries. Or, if you feel like digging in more, you can rate some stories and/or comment on them. Better yet, you can write your own version of a scrambled story using the references provided for that news item. On top of all that, you can provide feedback and become a true NewsXperiment star.
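To make “break apart, scramble, rebuild” concrete, here is a toy noun-phrase swap. It is a stand-in, not the engine’s algorithm: NewsXperiment used NLTK for tagging, chunking, and chinking (in NLTK that is typically a `RegexpParser` grammar), whereas this sketch assumes sentences arrive already tagged with Penn Treebank tags and uses a hand-rolled rule: a run of determiners, adjectives, and nouns is a noun phrase only if it contains a noun, and runs without a noun are chinked back out. Every noun phrase in the base title is then replaced by a random noun phrase from the scrambler titles. All names here are hypothetical.

```python
import random

NP_TAGS = {"DT", "JJ", "NN", "NNS", "NNP", "NNPS"}


def np_chunks(tagged):
    """Split [(word, tag), ...] into ("NP", words) and ("O", words) segments."""
    segments = []
    current = []

    def flush():
        if current:
            if any(tag.startswith("NN") for _, tag in current):
                segments.append(("NP", [w for w, _ in current]))
            else:
                # Chink: a determiner/adjective run with no noun is not an NP.
                segments.extend(("O", [w]) for w, _ in current)
            current.clear()

    for word, tag in tagged:
        if tag in NP_TAGS:
            # A determiner or adjective after a noun starts a new phrase.
            if not tag.startswith("NN") and any(t.startswith("NN") for _, t in current):
                flush()
            current.append((word, tag))
        else:
            flush()
            segments.append(("O", [word]))
    flush()
    return segments


def scramble(base_tagged, scrambler_tagged_list, rng):
    """Rebuild the base sentence, swapping each NP for a random scrambler NP."""
    pool = [words for tagged in scrambler_tagged_list
            for kind, words in np_chunks(tagged) if kind == "NP"]
    out = []
    for kind, words in np_chunks(base_tagged):
        if kind == "NP" and pool:
            out.extend(rng.choice(pool))
        else:
            out.extend(words)
    return " ".join(out)


if __name__ == "__main__":
    base = [("Princess", "NNP"), ("Di", "NNP"), ("visits", "VBZ"),
            ("the", "DT"), ("bridge", "NN")]
    others = [[("Polar", "NNP"), ("Bears", "NNPS"), ("dance", "VBP")],
              [("the", "DT"), ("Golden", "NNP"), ("Gate", "NNP"), ("Bridge", "NNP")]]
    print(scramble(base, others, random.Random()))
```

Because only noun phrases are swapped, the verbs of the base sentence survive, which is one way a rebuilt title can still read as a (mostly) grammatical headline.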
Roadmap and the challenges during development?
As of Aug 3rd, 2008 the basic functionality of an interactive website is in place.
The Scrambler Engine, News Upload, admin-level CRUD operations, Visitor Comments, and Visitor Rating are all implemented.
Some tech specs about NewsXperiment project:
- http://newsXperiment.com redirects to http://newsXperiment.appspot.com
- Built with Python
- NewsXperiment hits Flickr per news item and grabs a relevant image. (This is the fun part.)
- Utilizes NLTK libraries within the scrambler engine that runs offline.
- The generated output is a “zipped pickle” file and it is uploaded to Google AppEngine using appcfg.py.
- Runs on Google AppEngine.
- Uses Django for server-side rendering.
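The post doesn’t say how NewsXperiment queried Flickr for an image. A minimal sketch, assuming the public Flickr REST API’s `flickr.photos.search` method with JSON output and assuming the search text comes from the news title; the API key is a placeholder, the helper names are hypothetical, and the actual HTTP request is left out so the snippet runs offline.

```python
import json
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"


def flickr_search_url(api_key, text, per_page=1):
    """Build a flickr.photos.search request URL for the given search text."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,      # placeholder: your own Flickr API key
        "text": text,            # assumption: keywords taken from the news title
        "sort": "relevance",
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,     # plain JSON instead of a JSONP wrapper
    }
    return FLICKR_REST + "?" + urlencode(params)


def photo_url(photo):
    """Build a static image URL from one photo record of the JSON response."""
    return "https://live.staticflickr.com/%s/%s_%s.jpg" % (
        photo["server"], photo["id"], photo["secret"])


def first_photo_url(response_text):
    """Return the first result's image URL, or None when nothing matched."""
    photos = json.loads(response_text)["photos"]["photo"]
    return photo_url(photos[0]) if photos else None
```

On AppEngine the URL would be fetched with the platform’s URL fetch service and the response body passed to `first_photo_url`.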
What’s in the bag for near-future development:
- Sometime in the near future, a “Fork This News” feature will be added. It will let visitors make a copy of an existing news entry and write their own version, which can be rated, commented on, and forked again and again. Currently, visitors can simulate this using the “Comment” form attached to each news item.
- A better front-end design would be nice, but I highly doubt I’ll lose sleep over it. I absolutely wouldn’t mind if someone with good design skills took a stab at it.
- NewsXperiment surely needs a new logo.
I’ll leave the challenges and the technical mumbo jumbo to another post… Any feedback is appreciated. Please feel free to comment here. If you prefer an email communiqué, see the “About” link on NewsXperiment.com for contact info.
Tuesday, July 19th, 2011
pubhubsubhub is a data (news) aggregator that delivers your “topics of interest” to you as instant messages.
You can think of it almost as an RSS reader with the convenience of IM.
Anytime I come up with (or come across) an idea, if it’s Web-applicable, Google’s App Engine has been my platform of choice. While glancing through the SDK documents I came across Prospective Search, and further reading landed me on XMPP.
Long story short, after reading through the docs and looking at a few samples, pubhubsubhub was born: a news aggregator capable of delivering subscribed topics (search results) as instant messages, which may turn into something more.
It works quite nicely with Adium plus Growl notifications. If you use the GTalk widget in GMail, I’d recommend popping out the pubhubsubhub chat window.
Currently, the search data comes from approximately 1000 recently popular sources. That’s why I personally find it very useful for keeping an eye on recent and trending events.
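Prospective search turns ordinary search around: queries are stored as subscriptions, and each incoming document is matched against them. The snippet below is a plain-Python stand-in for that idea, not the App Engine Prospective Search API. A hypothetical `TopicMatcher` stores topic subscriptions per IM address (JID) and, for each incoming news item, returns the (JID, message) pairs that would then be handed to the XMPP service for delivery. The matching rule (every word of the topic must appear in the title) is an assumption for illustration.

```python
import re
from collections import defaultdict


class TopicMatcher:
    """In-memory stand-in for prospective search: store queries, match docs."""

    def __init__(self):
        self._subs = defaultdict(set)  # lowercased topic -> subscriber JIDs

    def subscribe(self, jid, topic):
        self._subs[topic.lower()].add(jid)

    def unsubscribe(self, jid, topic):
        self._subs[topic.lower()].discard(jid)

    def match(self, title, link):
        """Return (jid, message) pairs for every subscription the item matches."""
        words = set(re.findall(r"\w+", title.lower()))
        outbox = []
        for topic, jids in self._subs.items():
            terms = set(re.findall(r"\w+", topic))
            # Assumption: a topic matches when all of its words are in the title.
            if terms and terms <= words:
                for jid in sorted(jids):
                    outbox.append((jid, "[%s] %s %s" % (topic, title, link)))
        return outbox


if __name__ == "__main__":
    m = TopicMatcher()
    m.subscribe("alice@example.com", "App Engine")
    m.subscribe("bob@example.com", "python")
    for jid, body in m.match("Google App Engine adds XMPP support",
                             "http://example.com/1"):
        print(jid, body)
```

Matching each incoming item against the stored topics, instead of re-running every user’s search, is what lets new items be pushed out as messages the moment they arrive.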