140kit

From BenningtonWiki

Overview

For my 2011 independent study I will continue work on 140kit, an online platform for collecting and analyzing Twitter data. Over the seven weeks, my work will culminate in a second release of the site and its code.

The initial release was too ambitious: it had too many features, none of which worked very well and some of which did not work at all. Talking to 140kit users has also taught me that a solid collection and export system matters much more than a rich feature set. Instead of adding features, this second release will focus on making 140kit's collection and export engine more stable, reliable, and easy to use. It will also fix several bugs that have piled up since the initial release.

Currently, the live site is a mess. It also crashes frequently because of a server problem. In Fall 2010 I started a fork of the code (ian) that uses a simpler database model. The fork is not yet ready to be pushed live, so I will continue working in it. When it is ready, it will replace master and be pushed live.

To Do

Setup

  • reinstall server software on Linode (Apache or Nginx)
  • set up online development environment on Linode?
  • back up database?

Backend

  • fix locking
    • bug: duplicate analysis jobs are being created because there's no locking on curations
    • abstract out a locking method that can be used on any object
    • add locking columns to instance table
  • fix track collecting so that it works off of a txt file of terms instead of a string of API params
  • implement REST (user) collecting
  • implement sample feed collecting
  • implement curation creation
  • add ORM (AR or DataMapper)
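
The locking items above could be sketched roughly as follows. This is a minimal plain-Ruby illustration, not the 140kit implementation: the names Lockable, LOCKS, lock!, unlock! and the Curation struct are hypothetical, and an in-memory hash stands in for the locking columns that would live in a database table.

```ruby
# Hypothetical sketch: Lockable, LOCKS and Curation are illustrative names,
# not taken from the 140kit code.
module Lockable
  LOCKS = {}          # stands in for locking columns in a database table
  MUTEX = Mutex.new   # guards the in-memory table against concurrent access

  # Key identifying this object, e.g. "Curation:7".
  def lock_key
    "#{self.class.name}:#{id}"
  end

  # Claim the object for +owner+. Returns true if the claim succeeded
  # (or the owner already held it), false if someone else holds it.
  def lock!(owner)
    MUTEX.synchronize do
      holder = LOCKS[lock_key]
      if holder.nil? || holder == owner
        LOCKS[lock_key] = owner
        true
      else
        false
      end
    end
  end

  # Release the object, but only if +owner+ is the current holder.
  def unlock!(owner)
    MUTEX.synchronize do
      LOCKS.delete(lock_key) if LOCKS[lock_key] == owner
    end
  end

  def locked?
    MUTEX.synchronize { LOCKS.key?(lock_key) }
  end
end

# Any object with an id can mix the module in.
Curation = Struct.new(:id) { include Lockable }

curation = Curation.new(1)
if curation.lock!("analyzer-1")
  # ... create the analysis job here; a second worker would be refused ...
  curation.unlock!("analyzer-1")
end
```

Because a worker only creates an analysis job after lock! returns true, two workers looking at the same curation cannot both create one. In a real database the check-and-claim step would need to be a single atomic update rather than a Ruby mutex.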

Frontend

  • make changes corresponding to backend changes (REST, sampling, curations)
  • fix Google charts

Beta Release Features/Fixes

  • site doesn't crash every day
  • adding a new collection is easy and intuitive
  • status/progress of collection is transparent
  • all Google charts work and display information correctly
  • network graph analytic is added only by choice
  • CSV and SQL exporters are likewise added only by choice
  • new feature: sample random tweets
  • cleaner backend with stable (non-homebrew) ORM
  • simpler, lighter database schema
  • all other previous features that don't work at least 99.9% of the time will be pulled

Updates

Jan 14, 2011

Done

  • reinstalled everything on Linode
    • now using Nginx instead of Apache
    • created dev site at dev.140kit.com
  • abstracted out locking, fixed all locking problems
  • moved backend into Rails app
    • ditched homebrew ORM and SQL libraries for ActiveRecord
    • cleaned up code (deleted whole files and hundreds of lines of code!)
    • new model: streamers, analyzers, and resters inherit from the Instance class
    • 'script/runner slave/streamer.rb' to run streaming instance
  • added an at_exit hook so an instance exits or fails gracefully, unlocking whatever it was working on
  • added functionality for following users and tracking locations through the Streaming API (yes!)
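
As a rough illustration of the new model and the at_exit cleanup described above, here is a plain-Ruby sketch. Only the Instance class name and its streamer subclass come from the notes; the methods claim, release_all and run, and the "dataset:1" label, are assumptions made for the example, and the real classes live in the Rails app.

```ruby
# Hypothetical sketch: method names and the "dataset:1" label are
# assumptions, not the actual 140kit code.
class Instance
  attr_reader :name, :held

  def initialize(name)
    @name = name
    @held = []   # things this instance currently has locked
    # Kernel#at_exit runs when the process ends, whether it finished
    # normally or died with an uncaught exception.
    at_exit do
      release_all.each { |item| warn "#{name}: unlocked #{item} on exit" }
    end
  end

  # Remember that this instance holds a lock on +item+.
  def claim(item)
    @held << item
  end

  # Release everything held and return the list of released items.
  def release_all
    released = @held.dup
    @held.clear
    released
  end

  # Subclasses supply the actual work.
  def run
    raise NotImplementedError, "#{self.class} must implement run"
  end
end

class Streamer < Instance
  def run
    claim("dataset:1")
    # ... read from the stream and store tweets ...
  end
end
```

An analyzer or rester would subclass Instance the same way and inherit the cleanup. Note that at_exit does not run if the process is killed with SIGKILL, so stale locks would still need some other form of expiry.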

To do next

  • fix bug where stream connections drop without warning or error (tweets don't come through)
    • TweetStream seems to handle this really well... (borrow their code?)
  • add relation tables for tweets and users?
    • one tweet or user can belong to many datasets
  • add front-end forms for follow and location tracking
  • convert analyzer and rester to use ActiveRecord and follow new model
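
One possible way to notice a silently dropped stream, sketched under assumptions: track when the last tweet arrived and treat a long silence as a stall that warrants reconnecting. The StallWatchdog class and its 30-second threshold are hypothetical; this is not how TweetStream handles it, only an illustration of the idea.

```ruby
# Hypothetical sketch: StallWatchdog is an illustrative name; the clock is
# injectable so the behaviour can be checked without waiting.
class StallWatchdog
  def initialize(max_silence, clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @max_silence = max_silence   # seconds of silence tolerated
    @clock = clock
    @last_seen = @clock.call
  end

  # Call whenever a tweet (or keep-alive) arrives on the stream.
  def touch
    @last_seen = @clock.call
  end

  # True when nothing has arrived for longer than max_silence.
  def stalled?
    @clock.call - @last_seen > @max_silence
  end
end

watchdog = StallWatchdog.new(30)
watchdog.touch                    # in the streaming loop, once per message
reconnect_needed = watchdog.stalled?
```

A separate thread or timer could poll stalled? and restart the connection when it returns true.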

Links

140kit

GitHub repo

Linode

DataMapper