I've been meaning to fiddle with Node.js for a while, and for lack of a better idea I've decided to dive into what seems to be the staple for Node.js newbies: a simple app that streams updates from Twitter.
That's how it started, at least. What follows is my attempt at making it slightly more challenging, although the actual sentiment analysis part is rudimentary. When the time comes, and with a lot of further research, the plan is to change the algorithm from negative/positive scoring (or "looks like randomly assigned green or red") to "80% accurate detection of sentiment".
Getting Node.js up and running is as simple as grabbing the installer for your platform from the Node.js site. Installation is pretty straightforward, and no configuration is needed for now.
To test that everything is working, use the usual Hello World example:
// Load the http module to create an http server.
var http = require('http');

// Configure our HTTP server to respond with Hello World to all requests.
var server = http.createServer(function (request, response) {
  response.writeHead(200, {"Content-Type": "text/plain"});
  response.end("Hello World\n");
});

// Listen on port 8000 (with no host given, Node accepts connections on any address)
server.listen(8000);

// Put a friendly message on the terminal
console.log("Server running at http://127.0.0.1:8000/");
Slap it all in a file called "hello.js" and execute it from the console with:
node hello.js
First things first: create a folder for the project:
mkdir twittest
Now create a file named package.json within that folder. The package.json file is used to detail the project, including any dependencies (used later on). This is a basic package.json to begin with:
{
  "name": "TwitTest",
  "version": "0.0.1",
  "private": true
}
The next step is to define the dependencies for the project. For this project I'll be using socket.io, ntwitter, Express, Jade and mysql.
The package.json file now looks like this:
{
  "name": "TwitTest",
  "version": "0.0.1",
  "private": true,
  "dependencies": {
    "socket.io": "*",
    "ntwitter": "*",
    "express": "3.x",
    "jade": "*",
    "mysql": "2.0.0-alpha9"
  }
}
Once happy with the package file, or if it's ever changed, a simple:
npm install
in the project folder will grab all the dependencies for us.
Sticking with the Twitter theme (definitely not because of my expert design skillz), the TwitTest project will be using Twitter Bootstrap for the frontend. While mucking about in the project folder, Express will need a folder to serve static assets from, and Jade will need a folder for views. The folder structure for TwitTest now looks like:
twittest
|_node_modules
|_public
  |_css
  |_js
  |_fonts
  |_img
|_views
node_modules is the folder created by the npm install command, containing any dependencies/modules for the project. Bootstrap assets are placed under the /public/ folder.
Finally time to get going with some actual code! Wait, ntwitter requires Twitter API keys? Fine. Off to https://dev.twitter.com/apps, and with a bit of prodding they've coughed up some keys for use in the project.
Go ahead and create a file called apikeys.js, inserting values where appropriate:
var apikeys = {
  consumer_key: '[your consumer_key]',
  consumer_secret: '[your consumer_secret]',
  access_token_key: '[your access_token_key]',
  access_token_secret: '[your access_token_secret]'
};

module.exports = apikeys;
Then create twittest.js in the project folder:
//Required modules
var twitter = require('ntwitter');
var apikeys = require('./apikeys.js');

//Create a new twitter object using our keys
var t = new twitter({
  consumer_key: apikeys.consumer_key,
  consumer_secret: apikeys.consumer_secret,
  access_token_key: apikeys.access_token_key,
  access_token_secret: apikeys.access_token_secret
});

//Log the text of every tweet in the sample stream
t.stream('statuses/sample', function(stream){
  stream.on('data', function(tweet){
    console.log(tweet.text);
  });
});
Save the file and give it a run with:
node twittest.js
Ctrl+C will kill the app, saving you from whatever depths of hell Twitter pulls its sample stream from.
Now that we've recovered from the Twitter spam, it's time to get a frontend up and running in a browser. Express seems like a great framework for Node.js, but for now we will just be using it to get a server up and running with a single route on '/':
//Extra modules that are required
var express = require('express');
var app = express()
  , http = require('http')
  , server = http.createServer(app);

//Listen on port 8000
server.listen(8000);

//Serve static files from /public
app.use(express.static(__dirname + '/public'));

//On getting /, respond with Hello World
app.get('/', function(request, response){
  response.writeHead(200, {"Content-Type": "text/plain"});
  response.end("Hello World\n");
});
As you can see, setting up a simple HTTP server that uses Express is almost as simple as our first Hello World example. The first benefit of Express so far is the routing; the second is being able to use Jade for the templating/view system:
[...]
//Listen on port 8000
server.listen(8000);

app.set('views', __dirname + '/views');
app.set('view engine', 'jade');

//Serve static files from /public
app.use(express.static(__dirname + '/public'));
[...]
Jade does templating quite well in my opinion, but for a better understanding I recommend reading the docs linked above. For now I will just be listing the templates that were created for this project, starting with layout.jade. In this file, I've recreated the starter template from Bootstrap using Jade:
!!!5
html
  head
    meta(charset='utf-8')
    meta(name='viewport', content='width=device-width, initial-scale=1.0')
    title #{title}
    link(rel='stylesheet', href='/css/bootstrap.min.css')
    link(rel='stylesheet', href='/css/starter-template.css')
    script(src='/js/jquery.js')
    script(src='/js/bootstrap.min.js')
  body
    div.navbar.navbar-inverse.navbar-fixed-top
      div.container
        div.navbar-header
          button.navbar-toggle(type='button', data-toggle='collapse', data-target='.navbar-collapse')
            span.icon-bar
            span.icon-bar
            span.icon-bar
          a.navbar-brand(href='#') TestTwit
        div.collapse.navbar-collapse
          ul.nav.navbar-nav
            li.active
              a(href='#') Home
    div.container
      block content
Then there is index.jade:
extends layout

block content
  div(id='tweets')
And finally, to get the view to render, we make the following changes to the '/' route:
app.get('/', function(request, response){
  response.render('index', {
    "title": "TwitTest",
    "content": "testing"
  });
});
Loading up the site in a browser should now look like this:
Using socket.io couldn't be any easier. The first thing we need to change is to add it as a required module and set it to listen on our HTTP server in twittest.js:
var app = express()
  , http = require('http')
  , server = http.createServer(app)
  , io = require('socket.io').listen(server);
The browser will need a way to connect to our server, so create a file called index.js under /public/js:
var socket = io.connect(); //Simple!
and include that in index.jade:
extends layout

block content
  div(id='tweets')
  script(src='js/index.js')
Back in twittest.js, we need to change the code around a bit to only start the Twitter stream when a socket.io connection is opened:
//We'll hold the current twitter stream here, so that we can destroy it if
//the client disconnects
var currentStream;

//On a new connection, we start a twitter stream
io.sockets.on('connection', function(socket){
  //Open the database connection (`connection` is the MySQL connection that
  //is created in the sentiment analysis section further down).
  //Really should have better error handling, TODO!
  connection.connect(function(err){
    if(err){}
  });

  //Code here is the same as before, except now we emit the tweet
  //using socket.io, instead of logging to the console.
  //(also checking for English tweets only, helps with the spam ;)
  t.stream(
    'statuses/sample',
    function(stream){
      stream.on('data', function(tweet){
        if(tweet.lang == 'en'){
          socket.emit('tweet', tweet.text);
        }
      });
      currentStream = stream;
    }
  );

  //When the client disconnects, make sure we destroy the stream
  socket.on('disconnect', function(){
    if(currentStream){
      currentStream.destroy();
      currentStream = false;
    }
    connection.end();
  });
});
Finally, back in index.js we need to handle the tweet emitted from the server:
//On receiving a tweet (at this stage the server emits just the tweet text)
socket.on('tweet', function(data){
  //Add the tweet text to the #tweets div.
  $('#tweets').prepend('<div class="row"><div class="col-md-12">' + data + '</div></div>');
});
This is where things get a bit tricky. Thus far, I think the above covers most of what is needed to get a simple Node.js stack up and running, so my initial goal of learning a bit of Node.js has been covered.
What we are attempting in the following section is to score tweets as either "negative" or "positive", depending on the words used.
I am still researching better ways to do accurate sentiment analysis, but for now the method below will serve as a base for Part 2.
In theory, what we're doing is splitting a tweet up into individual words and scoring each word on three criteria: negativity, positivity and objectivity, each on a scale from 0 to 1. Negativity and positivity are both quite obvious; objectivity is used as a type of "weighting", so that words which are not clear on their meaning will not sway the score by a large amount.
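To make the weighting a bit more concrete, here is a minimal sketch of the arithmetic with a tiny in-memory word list. The `lexicon` object, its values and the `scoreWords` function are all made up for illustration (the real app reads its words from MySQL), and lower-casing the words is my own assumption:

```javascript
// Hypothetical word list; every value is on the 0-to-1 scale described above.
var lexicon = {
  love:     { positive: 0.9, negative: 0.0, objective: 0.1 },
  terrible: { positive: 0.0, negative: 0.8, objective: 0.2 },
  weather:  { positive: 0.1, negative: 0.1, objective: 0.9 }
};

// Sum the positive and negative scores of all known words, scaling each
// word by (1 - objective) so that highly objective words barely count.
function scoreWords(words, lexicon) {
  var score = { positive: 0, negative: 0 };
  words.forEach(function (word) {
    var key = word.toLowerCase();
    // Skip words that are not in the list (and inherited properties)
    if (!Object.prototype.hasOwnProperty.call(lexicon, key)) {
      return;
    }
    var entry = lexicon[key];
    var weight = 1 - entry.objective;
    score.positive += entry.positive * weight;
    score.negative += entry.negative * weight;
  });
  return score;
}

var score = scoreWords(['Love', 'this', 'terrible', 'weather'], lexicon);
// A positive difference suggests a positive tweet, a negative one the opposite
console.log(score.positive - score.negative);
```

With these made-up numbers, "weather" adds almost nothing to either side because its objectivity is 0.9, while "love" and "terrible" carry nearly their full scores.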
If trained correctly, this system should be accurate enough to indicate whether a tweet is positive or negative. Unfortunately, my current word list is not trained very well - so we only see an accurate prediction about 30% of the time.
With that said, let's have a look at the code:
In twittest.js, we add a database connection:
var mysql = require('mysql');
var connection = mysql.createConnection({
  host: '[host]',
  user: '[user]',
  password: '[password]'
});
And functions to parse a tweet, followed by getting the 'score' of a tweet:
function parseTweet(tweet, socket){
  //Strip all non alpha characters from the tweet
  tweet.stripped = tweet.text.replace(/[^a-zA-Z ]/g, "");

  //Get all the words
  var tweet_words = tweet.stripped.split(" ");

  //Only use words longer than 3 characters, we won't find much meaning in smaller words
  var words = [];
  for(var i = 0; i < tweet_words.length; i++){
    if(tweet_words[i].length > 3){
      words.push(tweet_words[i]);
    }
  }
  tweet.words = words;

  //Get the score
  getScore(tweet, socket);
}

function getScore(tweet, socket){
  //Build the database query
  //We really should sanitize this ;) Fine for demo purposes
  var query = 'select word, positive, negative, objective from sentiment.word where word in("' + tweet.words.join('","') + '")';

  connection.query(query, function(err, rows, fields){
    if(err){
      return;
    }

    var positive = 0;
    var negative = 0;

    //For each of the rows returned, add to the score,
    //weighting each word by (1 - objective)
    for(var i = 0; i < rows.length; i++){
      positive += rows[i].positive * (1 - rows[i].objective);
      negative += rows[i].negative * (1 - rows[i].objective);
    }

    tweet.score = {};
    tweet.score.positive = positive;
    tweet.score.negative = negative;

    //Finally emit the tweet, along with the score
    //(the screen name lives on the tweet's user object)
    socket.emit('tweet', {text: tweet.text, user: tweet.user.screen_name, score: tweet.score});
  });
}
Don't forget to destroy the database connection when socket.io disconnects, and change the stream function to use the parseTweet function instead of simply emitting the tweet.
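On the "we really should sanitize this" comment: one possible way out is to let the database driver do the escaping by using `?` placeholders. The sketch below only builds the query text and the values array; `buildWordQuery` is a hypothetical helper of mine, and I'm assuming the pinned mysql version accepts the `connection.query(sql, values, callback)` form, so check that against its README first:

```javascript
// Hypothetical helper: builds the same lookup as getScore, but with one "?"
// placeholder per word instead of concatenating the words into the SQL.
function buildWordQuery(words) {
  // Nothing to look up, so there is no query to run
  if (words.length === 0) {
    return null;
  }
  var placeholders = words.map(function () { return '?'; }).join(',');
  return {
    sql: 'select word, positive, negative, objective ' +
         'from sentiment.word where word in (' + placeholders + ')',
    values: words
  };
}

var q = buildWordQuery(['love', 'hate']);
console.log(q.sql);
console.log(q.values);

// Assumed usage inside getScore (not run here, it needs a live connection):
// connection.query(q.sql, q.values, function (err, rows) { /* score rows */ });
```

Returning `null` for an empty word list also gives getScore an easy way to skip the database round trip for tweets with no usable words.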
In index.js, we need to add a bit of extra code to highlight tweets in red or green:
//Running totals of negative and positive scores
var negtotal = 0;
var postotal = 0;

socket.on('tweet', function(data){
  //Get the 'final' score
  var diff = data.score.positive - data.score.negative;
  var r = 0, g = 0, b = 0;
  var style = '';

  //Diff < 0 is a negative tweet
  if(diff < 0){
    //Change the shade of red depending on the score
    r = Math.floor(40 * diff * -1) + 140;
    g = Math.floor(40 * diff * -1);
    b = Math.floor(30 * diff * -1);
    style = 'color:#000;background-color:rgb(' + r + ',' + g + ',' + b + ');';
    negtotal += (diff * -1);
  }else if(diff > 0){
    //Change the shade of green depending on the score
    r = Math.floor(40 * diff);
    g = Math.floor(40 * diff) + 140;
    b = Math.floor(30 * diff);
    style = 'color:#000;background-color:rgb(' + r + ',' + g + ',' + b + ');';
    postotal += diff;
  }

  $('#tweets').prepend('<div class="row"><div class="col-md-12" style="' + style + '">' + data.text + '<br>Score: ' + diff + '</div></div>');
});
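If you want to poke at the colour maths outside the browser, the calculation can be pulled out into a small pure function. This is just a sketch: `scoreToStyle` and `clamp` are names I've made up, and the clamping to 0-255 is my own addition for tweets with unusually large scores:

```javascript
// Keep a colour channel inside the valid 0-255 range
function clamp(value) {
  return Math.max(0, Math.min(255, value));
}

// Hypothetical helper: turns the positive-minus-negative difference into the
// same inline style string the handler above builds.
function scoreToStyle(diff) {
  if (diff === 0) {
    return ''; // neutral tweets get no highlight
  }
  var strength = Math.abs(diff);
  var strong = clamp(Math.floor(40 * strength) + 140); // dominant channel
  var weak = clamp(Math.floor(40 * strength));         // the other of red/green
  var blue = clamp(Math.floor(30 * strength));
  // Negative tweets lean red, positive tweets lean green
  var r = diff < 0 ? strong : weak;
  var g = diff < 0 ? weak : strong;
  return 'color:#000;background-color:rgb(' + r + ',' + g + ',' + blue + ');';
}

console.log(scoreToStyle(-0.5));
console.log(scoreToStyle(1));
```

The handler would then only need to call `scoreToStyle(diff)` and update the running totals.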
And with a bit of luck, when we run the app again we get to see this in the browser:
As I said, not the most accurate - but it certainly is interesting to watch ;)
A few extra ideas for the above test app:
t.stream( 'statuses/filter', { track: ['love', 'hate'] },
So possibly add a way to enter the keywords + start/stop the stream from the browser.
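As a rough starting point for the start/stop idea, here is a sketch of a small controller that owns at most one stream at a time. Everything in it is hypothetical: `createStreamController` is my own name, and `openStream` is a stand-in for whatever wraps the `t.stream('statuses/filter', ...)` call; I'm assuming it returns an object with a `destroy()` method. A fake stream is used below so the sketch runs without Twitter keys:

```javascript
// Hypothetical controller: starts a keyword stream and guarantees the
// previous one is destroyed before a new one is opened.
function createStreamController(openStream) {
  var current = null;

  function stop() {
    if (current) {
      current.destroy();
      current = null;
    }
  }

  function start(keywords, onTweet) {
    stop(); // never leave an old stream running
    // Tidy up whatever the browser sent: trim and drop empty keywords
    var cleaned = keywords
      .map(function (k) { return String(k).trim(); })
      .filter(function (k) { return k.length > 0; });
    if (cleaned.length === 0) {
      return false; // nothing to track, so no stream is opened
    }
    current = openStream(cleaned, onTweet);
    return true;
  }

  function isRunning() {
    return current !== null;
  }

  return { start: start, stop: stop, isRunning: isRunning };
}

// Fake stream for demonstration: delivers one tweet and counts destroy() calls
var destroyed = 0;
function fakeOpenStream(keywords, onTweet) {
  onTweet({ text: 'I ' + keywords[0] + ' this' });
  return { destroy: function () { destroyed++; } };
}

var controller = createStreamController(fakeOpenStream);
var seen = [];
controller.start(['love', ' hate ', ''], function (tweet) { seen.push(tweet.text); });
controller.stop();
console.log(seen, destroyed);
```

On the server, the idea would be to create one controller per socket.io connection, call `start` and `stop` from socket event handlers, and call `stop` on disconnect, so that one client cannot overwrite another client's stream.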