For my 2011 independent study, I will continue work on 140kit, an online platform for collecting and analyzing Twitter data. My work over the seven weeks will culminate in a second release of the site and its code.
The initial release was too ambitious: it had too many features, none of which worked very well, if at all. Additionally, through talking to 140kit users, I've learned that a solid collection and exportation system is much more important than a rich feature set. Instead of adding more features, this second release will focus on making the collection and exportation engine of 140kit more stable, reliable, and easier to use. This also includes fixes for several bugs that have piled up since the initial release.
Currently, the live site is a mess. It also crashes frequently because of an unresolved server problem. In Fall 2010 I started a fork of the code (ian) that uses a simpler database model. This fork is not yet ready to be pushed live, so I will continue working in it; when it is ready, it will replace master and go live.

To do
- reinstall server software on Linode (Apache or Nginx)
- set up online development environment on Linode?
- backup database?
- fix locking
- bug: duplicate analysis jobs are being created because there's no locking on curations
- abstract out a locking method that can be used on any object
- add locking columns to instance table
- fix track collecting so that it works off of a txt file of terms instead of a string of API params
- implement REST (user) collecting
- implement sample feed collecting
- implement curation creation
- add ORM (AR or DataMapper)
- make changes corresponding to backend changes (REST, sampling, curations)
- fix Google charts
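The locking abstraction above could look roughly like this. This is a plain-Ruby sketch, not the actual 140kit code: the names (`Lockable`, `locked_by`, `locked_at`) are assumptions, and in the real app they would be database columns checked inside a transaction rather than in-memory attributes.

```ruby
# Sketch of a reusable locking mixin that any object (a curation, an
# instance row, etc.) can include. Plain attributes stand in for what
# would be locked_by/locked_at columns in the database.
module Lockable
  attr_reader :locked_by, :locked_at

  # Try to claim the object for a worker; returns false if already taken.
  def lock(worker_id)
    return false if locked?
    @locked_by = worker_id
    @locked_at = Time.now
    true
  end

  # Release the claim so another worker can pick the object up.
  def unlock
    @locked_by = nil
    @locked_at = nil
  end

  def locked?
    !@locked_by.nil?
  end
end

# Hypothetical model mixing the behavior in.
class Curation
  include Lockable
end
```

With something like this in place, an analyzer would call `lock` before starting a job and skip curations that are already claimed, which is what should stop duplicate analysis jobs from being created.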
Beta Release Features/Fixes
- site doesn't crash every day
- adding a new collection is easy and intuitive
- status/progress of collection is transparent
- all Google charts work and display information correctly
- network graph analytic is added only by choice
- CSV and SQL exporter functions added only by choice as well
- new feature: sample random tweets
- cleaner backend with stable (non-homebrew) ORM
- simpler, lighter database schema
- all other/previous features that don't work 99.9% of the time will be pulled
Jan 14, 2011
- reinstalled everything on Linode
- now using nginx instead of apache
- created dev site at dev.140kit.com
- abstracted out locking, fixed all locking problems
- moved backend into rails app
- ditched homebrew ORM and SQL libraries for ActiveRecord
- cleaned up code (deleted whole files and hundreds of lines of code!)
- new model: streamers, analyzers, and resters inherit from the Instance class
- 'script/runner slave/streamer.rb' to run streaming instance
- added at_exit to exit/fail gracefully by unlocking what it was working on
- added functionality for following users and tracking locations through the streaming api (yes!)
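The inheritance model and the `at_exit` cleanup described above could be sketched like this. Class and method names here (`Instance`, `release_current_work`) are illustrative stand-ins, not the actual 140kit code, and the real unlock would clear lock columns in the database.

```ruby
# Base class for all worker processes (streamers, analyzers, resters).
class Instance
  attr_accessor :current_job

  # Unlock whatever this instance had claimed. In the real app this
  # would clear the lock columns on the dataset/curation row.
  def release_current_work
    @current_job = nil
  end
end

# A streaming worker. On construction it registers an at_exit hook so
# that a crash or clean shutdown still releases its lock, letting
# another instance pick the work back up.
class Streamer < Instance
  def initialize
    super
    at_exit { release_current_work }
  end
end

class Analyzer < Instance; end
class Rester < Instance; end
```

Run under `script/runner`, each worker would then fail gracefully: whatever dataset it was streaming gets unlocked when the process dies.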
To do next
- fix bug where stream connections drop without warning or error (tweets don't come through)
- TweetStream seems to handle this really well... (borrow their code?)
- add relation tables for tweets and users?
- one tweet or user can belong to many datasets
- add front-end forms for follow and location tracking
- convert analyzer and rester to use ActiveRecord and follow new model
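The relation-table idea above amounts to a many-to-many join: one tweet (or user) row shared by several datasets instead of being duplicated per dataset. In Rails this would be `has_and_belongs_to_many` or `has_many :through`; the plain-Ruby sketch below just illustrates the shape, and all names (`DatasetTweet`, `link`) are assumptions.

```ruby
require 'set'

# Stand-in for the join table: (dataset_id, tweet_id) pairs. A Set
# keeps each pairing unique, so linking the same tweet to a dataset
# twice is a no-op -- the tweet row itself is never duplicated.
class DatasetTweet
  @rows = Set.new

  class << self
    attr_reader :rows

    # Associate an existing tweet row with a dataset.
    def link(dataset_id, tweet_id)
      @rows << [dataset_id, tweet_id]
    end

    # All tweet ids belonging to one dataset.
    def tweet_ids_for(dataset_id)
      @rows.select { |d, _| d == dataset_id }.map { |_, t| t }
    end
  end
end
```

The payoff is storage and consistency: a popular tweet collected by ten datasets lives in the tweets table once, with ten cheap join rows pointing at it.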