Real time sentiment analysis

Part 1

I've been meaning to make the effort to fiddle with Node.js for a while, and for lack of a better idea I've decided to dive into what seems to be the staple for Node.js newbies: a simple app that will stream updates from Twitter.

That's how it started, at least. What follows is my attempt at making it slightly more challenging, although the actual sentiment analysis part is rudimentary. When the time comes, and with a lot of further research, the plan is to change the algorithm from negative/positive scoring (or "looks like randomly assigned green or red") to "80% accurate detection of sentiment".

Installing Node.js

The process to get Node.js up and running is as simple as grabbing the installer for your platform from the Node.js site. Installation is pretty straightforward, and no configuration is needed for now.

To test that everything is working, use the usual Hello World example:


// Load the http module to create an http server.
var http = require('http');
 
// Configure our HTTP server to respond with Hello World to all requests.
var server = http.createServer(function (request, response) {
    response.writeHead(200, {"Content-Type": "text/plain"});
    response.end("Hello World\n");
});
 
// Listen on port 8000 (with no host given, Node accepts connections on any address)
server.listen(8000);
 
// Put a friendly message on the terminal
console.log("Server running at http://127.0.0.1:8000/");

Slap it all in a file called "hello.js" and execute it from the console with:

node hello.js

Creating a new project

First things first; create a folder for the project:

mkdir twittest

Now create a file named package.json within that folder. The package.json file is used to detail the project, including any dependencies (used later on). This is a basic package.json to begin with:

{
    "name": "TwitTest",
    "version": "0.0.1",
    "private": true
}

Next step is to define the dependencies for the project. For this project I'll be using the following:

  • ntwitter - Asynchronous Twitter REST/stream/search client API for node.js; we wouldn't have much of an app without this.
  • Express - Sinatra inspired web development framework for node.js; used later in the project for all of the HTTP gruntwork.
  • Jade - Template engine used for the frontend.
  • socket.io - Library for pushing data to and from the client. To quote "It's care-free realtime 100% in JavaScript."
  • mysql - MySQL client.

The package.json file now looks like this:

{
    "name": "TwitTest",
    "version": "0.0.1",
    "private": true,
    "dependencies": {
        "socket.io": "*",
        "ntwitter": "*",
        "express": "3.x",
        "jade": "*",
        "mysql": "2.0.0-alpha9"
    }
}

Once happy with the package file, or if it's ever changed, a simple:

npm install

in the project folder will grab all the dependencies for us.

Sticking with the Twitter theme (definitely not because of my expert design skillz), the TwitTest project will be using Twitter Bootstrap for the frontend. While mucking about in the project folder, Express will need a folder to serve static assets from, and Jade will need a folder for views. The folder structure for TwitTest now looks like:

twittest
  |_node_modules
  |_public
    |_css
    |_js
    |_fonts
    |_img
  |_views

node_modules is the folder created by the npm install command, containing any dependencies/modules for the project. Bootstrap assets are placed under the /public/ folder.

Streaming to the console

Finally time to get going with some actual code! Wait, ntwitter requires Twitter API keys? Fine. Off to https://dev.twitter.com/apps, and with a bit of prodding they've coughed up some keys for use in the project.

Go ahead and create a file called apikeys.js, inserting values where appropriate:

var apikeys = {
    consumer_key: '[your consumer_key]',
    consumer_secret: '[your consumer_secret]',
    access_token_key: '[your access_token_key]',
    access_token_secret: '[your access_token_secret]'
};
module.exports = apikeys;
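
If you would rather not keep real keys in a file that might end up in version control, one option is to read them from environment variables instead. This is only a minimal sketch: the variable names (TWITTER_CONSUMER_KEY and friends) and the loadKeys helper are my own invention for the example, not something ntwitter looks for.

```javascript
//apikeys.js, reading the keys from environment variables instead.
//The variable names below are made up for this example.
var REQUIRED = {
	consumer_key: 'TWITTER_CONSUMER_KEY',
	consumer_secret: 'TWITTER_CONSUMER_SECRET',
	access_token_key: 'TWITTER_ACCESS_TOKEN_KEY',
	access_token_secret: 'TWITTER_ACCESS_TOKEN_SECRET'
};

//Build the same object shape as before, failing loudly if a key is missing
function loadKeys(env){
	var keys = {};
	Object.keys(REQUIRED).forEach(function(name){
		var value = env[REQUIRED[name]];
		if(!value){
			throw new Error('Missing environment variable ' + REQUIRED[name]);
		}
		keys[name] = value;
	});
	return keys;
}

//In apikeys.js the last line would then become:
//module.exports = loadKeys(process.env);
```

The rest of the project would not need to change, since the exported object keeps the same four property names.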

And a file called twittest.js, adding ntwitter as a requirement along with our apikeys:


//Required modules
var twitter = require('ntwitter');
var apikeys = require('./apikeys.js');

Because we are just logging straight to the console, we won't require any additional modules. We will need to create the stream, though:


//Create a new twitter object using our keys
var t = new twitter({
    consumer_key: apikeys.consumer_key,
    consumer_secret: apikeys.consumer_secret,
    access_token_key: apikeys.access_token_key,
    access_token_secret: apikeys.access_token_secret
});
 
t.stream('statuses/sample',
         function(stream){
            stream.on('data', function(tweet){
                console.log(tweet.text);
            });
        });
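
One thing to watch for: the streaming API mixes other messages (delete notices, for example) in with the tweets, and those have no text property, so the code above will happily log "undefined" now and then. A small guard takes care of it. The sketch below uses a plain EventEmitter as a stand-in for the ntwitter stream so it runs without API keys; isTweet and onTweet are names I've made up for the example.

```javascript
var EventEmitter = require('events').EventEmitter;

//True only for messages that look like an actual tweet
function isTweet(message){
	return !!message && typeof message.text === 'string';
}

//Attach a 'data' handler that passes only real tweets on to the callback
function onTweet(stream, callback){
	stream.on('data', function(message){
		if(isTweet(message)){
			callback(message);
		}
	});
}

//Stand-in for the ntwitter stream, so the sketch runs on its own
var fakeStream = new EventEmitter();
var seen = [];
onTweet(fakeStream, function(tweet){ seen.push(tweet.text); });
fakeStream.emit('data', {text: 'Hello from the sample stream'});
fakeStream.emit('data', {'delete': {status: {id: 1}}});
console.log(seen);
```

In twittest.js the same helper would be called from inside the t.stream callback, with the real stream in place of fakeStream.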

Save the file and give it a run with:

node twittest.js

Ctrl+C will kill the app, saving you from whatever depths of hell Twitter pulls its sample stream from.

Express and Jade for a simple frontend stack

Now that we've recovered from the Twitter spam, it's time to get a frontend up and running in a browser. Express seems like a great framework for Node.js, but for now we will just be using it to get a server up and running with a single route on '/':

//Extra modules that are required
var express = require('express');
var app = express()
  , http = require('http')
  , server = http.createServer(app);
 
//Listen on port 8000
server.listen(8000);
 
//Serve static files from /public
app.use(express.static(__dirname + '/public'));
 
//On getting /, respond with Hello World
app.get('/', function(request, response){
	response.writeHead(200, {"Content-Type": "text/plain"});
	response.end("Hello World\n");	
});

As you can see, setting up a simple HTTP server that uses Express is almost as simple as our first Hello World example. The first benefit so far of using Express is the routing, the second being able to use Jade for the templating/view system:

[...]
//Listen on port 8000
server.listen(8000);
 
app.set('views', __dirname + '/views');
app.set('view engine', 'jade');
 
//Serve static files from /public
app.use(express.static(__dirname + '/public'));
[...]

Jade does templating quite well in my opinion, but for a better understanding I recommend reading the docs linked above. For now I will just be listing the templates that were created for this project, starting with layout.jade. In this file, I've recreated the starter template from Bootstrap using Jade:

doctype html
html
  head
    meta(charset='utf-8')
    meta(name='viewport', content='width=device-width, initial-scale=1.0')
    title #{title}
    link(rel='stylesheet', href='/css/bootstrap.min.css')
    link(rel='stylesheet', href='/css/starter-template.css')
    script(src='/js/jquery.js')
    script(src='/js/bootstrap.min.js')
  body
    div.navbar.navbar-inverse.navbar-fixed-top
      div.container
        div.navbar-header
          button.navbar-toggle(type='button', data-toggle='collapse', data-target='.navbar-collapse')
            span.icon-bar
            span.icon-bar
            span.icon-bar
          a.navbar-brand(href='#') TwitTest
        div.collapse.navbar-collapse
          ul.nav.navbar-nav
            li.active
              a(href='#') Home
    div.container
      block content

Then there is index.jade:

extends layout
block content
  div(id='tweets')

And finally, to get the view to render, we make the following changes to the '/' route:

app.get('/', function(request, response){
	response.render('index', {
		"title": "TwitTest",
		"content": "testing"
	});
});

Loading up the site in a browser should now look like this:

Streaming to the browser with socket.io

Using socket.io couldn't be any easier. The first thing we need to do is add it as a required module and set it to listen on our HTTP server in twittest.js:

var app = express()
  , http = require('http')
  , server = http.createServer(app)
  , io = require('socket.io').listen(server);

The browser will need a way to connect to our server, so create a file called index.js under /public/js:

var socket = io.connect(); //Simple!

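and include that in index.jade, after the client library that socket.io serves from /socket.io/socket.io.js (without it, io is not defined in the browser):

extends layout
block content
  div(id='tweets')
  script(src='/socket.io/socket.io.js')
  script(src='js/index.js')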

Back in twittest.js, we need to change the code around a bit to only start the Twitter stream when a socket.io connection is opened:

//We'll hold the current Twitter stream here, so that we can destroy it if
//the client disconnects
var currentStream;
 
//On a new connection, we start a Twitter stream
io.sockets.on('connection', function(socket){
	//Code here is the same as before, except now we emit the tweet
	//using socket.io, instead of logging to the console.
	//(also checking for English tweets only, helps with the spam ;)
	t.stream(
		'statuses/sample',
		function(stream){
			stream.on('data', function(tweet){
				if(tweet.lang == 'en'){
					//Emit an object, since index.js reads data.text
					socket.emit('tweet', {text: tweet.text});
				}
			});
			currentStream = stream;
		}
	);
 
	//When the client disconnects, make sure we destroy the stream
	socket.on('disconnect', function(){
		if(currentStream){
			currentStream.destroy();
			currentStream = false;
		}
	});
});
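
A caveat with the single currentStream variable: it is shared by every connection, so a second browser tab overwrites it and the first tab's stream is never destroyed. One way around that is to keep the stream inside the connection handler's closure. The sketch below is just that idea in isolation: attachStream is a name I've made up, startStream is a stand-in for the t.stream call, and the sockets and streams are plain EventEmitters so it runs without Twitter or socket.io.

```javascript
var EventEmitter = require('events').EventEmitter;

//Wire up one socket so that it owns its own stream.
//startStream is an assumption of this sketch: it takes a callback and
//hands it a stream, the way t.stream('statuses/sample', cb) does.
function attachStream(socket, startStream){
	var stream = null; //scoped to this connection only
	var closed = false;

	startStream(function(s){
		if(closed){
			//The client went away before the stream was ready
			s.destroy();
			return;
		}
		stream = s;
		stream.on('data', function(tweet){
			if(tweet.lang == 'en'){
				socket.emit('tweet', {text: tweet.text});
			}
		});
	});

	socket.on('disconnect', function(){
		closed = true;
		if(stream){
			stream.destroy();
			stream = null;
		}
	});
}

//Fake pieces so the sketch runs standalone
function fakeStream(){
	var s = new EventEmitter();
	s.destroyed = false;
	s.destroy = function(){ s.destroyed = true; };
	return s;
}

var streamA = fakeStream(), streamB = fakeStream();
var socketA = new EventEmitter(), socketB = new EventEmitter();
attachStream(socketA, function(cb){ cb(streamA); });
attachStream(socketB, function(cb){ cb(streamB); });

//Only the first client disconnects
socketA.emit('disconnect');
console.log(streamA.destroyed, streamB.destroyed);
```

In twittest.js the connection handler would then shrink to a single call along the lines of attachStream(socket, function(cb){ t.stream('statuses/sample', cb); }). Bear in mind that Twitter limits how many streams one set of keys can hold open, so this tidies up the bookkeeping rather than making the app multi-user.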

Finally, back in index.js we need to handle the tweet emitted from the server:

//On receiving a tweet
socket.on('tweet', function(data){
	//Add the tweet text to the #tweets div.
	$('#tweets').prepend('<div class="row"><div class="col-md-12">' + data.text + '</div></div>');
});

Running the app and opening it in a browser should display something like this:


Basic positive / negative scoring

This is where things get a bit tricky. I think the above covers most of what is needed to get a simple Node.js stack up and running, so my initial goal of learning a bit of Node.js has been covered.

What we are attempting in the following section is to score tweets as either "negative" or "positive", depending on the words used.

I am still researching better ways to do accurate sentiment analysis, but for now the method below will serve as a base for Part 2.

In theory, what we're doing is splitting a tweet up into individual words and scoring each word with the following criteria:

  • Positivity
  • Negativity
  • Objectivity

Each is on a scale from 0 to 1. Negativity and positivity are both quite obvious; objectivity will be used as a type of "weighting", so that words which are not clear in their meaning will not sway the score by a large amount.
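
To make the weighting concrete, here is a small worked example that does the whole thing in memory. It is only a sketch: the three words and their numbers are invented for illustration, the lexicon object stands in for a real word list, and the lowercasing is my own addition so that the lookups match. Each word contributes its positive and negative values multiplied by (1 - objectivity).

```javascript
//A tiny in-memory word list. The words and numbers are invented
//purely for illustration.
var lexicon = {
	love:  {positive: 0.75, negative: 0,    objective: 0.25},
	hate:  {positive: 0,    negative: 0.75, objective: 0.25},
	today: {positive: 0.25, negative: 0.25, objective: 0.75}
};

//Split text into lowercase words longer than 3 characters
function extractWords(text){
	return text.replace(/[^a-zA-Z ]/g, '').toLowerCase().split(' ').filter(function(word){
		return word.length > 3;
	});
}

//Sum positive and negative scores, each weighted by (1 - objectivity)
function scoreText(text){
	var score = {positive: 0, negative: 0};
	extractWords(text).forEach(function(word){
		//hasOwnProperty avoids matching built-ins such as "constructor"
		if(Object.prototype.hasOwnProperty.call(lexicon, word)){
			var entry = lexicon[word];
			score.positive += entry.positive * (1 - entry.objective);
			score.negative += entry.negative * (1 - entry.objective);
		}
	});
	return score;
}

console.log(scoreText('I love Node.js today!'));
```

Here "love" is fairly subjective, so it adds 0.75 * 0.75 = 0.5625 to the positive side, while the mostly objective "today" only adds 0.25 * 0.25 = 0.0625 to each side. The tweet ends up at 0.625 positive against 0.0625 negative.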

If trained correctly, this system should be accurate enough to indicate whether a tweet is positive or negative. Unfortunately, my current word list is not trained very well - so we only see an accurate prediction about 30% of the time.

With that said, let's have a look at the code:

In twittest.js, we add a database connection:

var mysql = require('mysql');
var connection = mysql.createConnection({
	host: '[host]',
	user: '[user]',
	password: '[password]'
});

And functions to parse a tweet, followed by getting the 'score' of a tweet:

function parseTweet(tweet, socket){
	//Strip all non alpha characters from the tweet
	tweet.stripped = tweet.text.replace(/[^a-zA-Z ]/g, "");
	//Get all the words
	var tweet_words = tweet.stripped.split(" ");
	//Only use words longer than 3 characters, we won't find much meaning in smaller words
	var words = [];
	for(var i = 0; i < tweet_words.length; i++){
		if(tweet_words[i].length > 3){
			words.push(tweet_words[i]);
		}
	}
	tweet.words = words;
	//Get the score
	getScore(tweet,socket);
}
 
function getScore(tweet, socket){
	//Build the database query
	//We really should sanitize this ;) Fine for demo purposes
	var query = 'select word, positive, negative, objective from sentiment.word where word in("' + tweet.words.join('","') + '")';
	connection.query(query, function(err, rows, fields){
		if(err){ return; }
		var positive = 0;
		var negative = 0;
		//For each of the rows returned, add to the score
		for(var i = 0; i < rows.length; i++){
			positive += rows[i].positive * (1 - rows[i].objective);
			negative += rows[i].negative * (1 - rows[i].objective);
		}
		tweet.score = {};
		tweet.score.positive = positive;
		tweet.score.negative = negative;
		//Finally emit the tweet, along with the score
		//(the screen name lives on the tweet's user object)
		socket.emit('tweet', {text: tweet.text, user: tweet.user.screen_name, score: tweet.score});
	});
}
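
On the "we really should sanitize this" comment: because parseTweet strips everything except letters and spaces, the words can't contain quotes, which is why the string concatenation gets away with it here. A tidier option is to let the driver do the escaping by using ? placeholders. The sketch below only builds the query text and the values to bind; buildWordQuery is a name I've made up, and I'm assuming the installed mysql module accepts connection.query(sql, values, callback), which is worth checking against the README of the version you end up with.

```javascript
//Build a query with one ? placeholder per word, plus the values to bind.
//Returns null when there are no words, since "in ()" is not valid SQL.
function buildWordQuery(words){
	if(words.length === 0){
		return null;
	}
	var placeholders = words.map(function(){ return '?'; }).join(',');
	return {
		sql: 'select word, positive, negative, objective from sentiment.word where word in (' + placeholders + ')',
		values: words
	};
}

var example = buildWordQuery(['love', 'hate']);
console.log(example.sql);
console.log(example.values);

//In getScore this would be used roughly as:
//connection.query(example.sql, example.values, function(err, rows){ ... });
```

Returning null for an empty list also gives getScore an easy way to skip the database round trip for tweets with no usable words.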

Don't forget to destroy the database connection when socket.io disconnects, and change the stream function to use the parseTweet function instead of simply emitting the tweet.

In index.js, we need to add a bit of extra code to highlight tweets in red or green:

//Running totals of the positive and negative scores seen so far
var postotal = 0;
var negtotal = 0;
 
socket.on('tweet', function(data){
	//Get the 'final' score
	var diff = data.score.positive - data.score.negative;
	var r = 0, g = 0, b = 0;
	var style = '';
 
	//Diff < 0 is a negative tweet
	if(diff < 0){
		//Change the shade of red depending on the score
		r = Math.floor(40 * diff * -1) + 140;
		g = Math.floor(40 * diff * -1);
		b = Math.floor(30 * diff * -1);
		style = 'color:#000;background-color:rgb(' + r + ',' + g + ',' + b + ');';
		negtotal += (diff * -1);
	}else if(diff > 0){
		//Change the shade of green depending on the score
		r = Math.floor(40 * diff);
		g = Math.floor(40 * diff) + 140;
		b = Math.floor(30 * diff);
		style = 'color:#000;background-color:rgb(' + r + ',' + g + ',' + b + ');';
		postotal += diff;
	}
 
	$('#tweets').prepend('<div class="row"><div class="col-md-12" style="' + style + '">' + data.text + '<br>Score: ' + diff + '</div></div>');
});
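
The colour maths is easier to poke at when it is pulled out into its own function. The sketch below does the same calculation as the handler above, with one addition of my own: each channel is clamped to the 0-255 range, since a tweet with a large enough score would otherwise produce values above 255. scoreToStyle and clamp are names made up for the example.

```javascript
//Keep a colour channel inside the valid 0-255 range
function clamp(value){
	return Math.max(0, Math.min(255, value));
}

//Turn a score difference into an inline style string
function scoreToStyle(diff){
	var strength = Math.abs(diff);
	var strong = clamp(Math.floor(40 * strength) + 140);
	var weak = clamp(Math.floor(40 * strength));
	var blue = clamp(Math.floor(30 * strength));
	if(diff < 0){
		//Negative tweet: red is the dominant channel
		return 'color:#000;background-color:rgb(' + strong + ',' + weak + ',' + blue + ');';
	}
	if(diff > 0){
		//Positive tweet: green is the dominant channel
		return 'color:#000;background-color:rgb(' + weak + ',' + strong + ',' + blue + ');';
	}
	//Neutral tweet: no highlighting
	return '';
}

console.log(scoreToStyle(-0.5));
console.log(scoreToStyle(0.5));
```

A difference of -0.5 gives rgb(160,20,15) and +0.5 gives rgb(20,160,15), so the two cases mirror each other with red and green swapped.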

And with a bit of luck, when we run the app again we get to see this in the browser:

As I said, not the most accurate - but it certainly is interesting to watch ;)

Additional ideas

A few extra ideas for the above test app:

  • You can use statuses/filter instead of statuses/sample, like so:

t.stream('statuses/filter', { track: ['love', 'hate'] }, function(stream){
	//Handle the stream exactly as before
});

So possibly add a way to enter the keywords + start/stop the stream from the browser.
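
For the keyword entry, the browser would most likely send whatever was typed into a text box, so the server needs to turn that into a clean list before handing it to track. A minimal sketch, assuming comma-separated input; parseKeywords is a made-up helper, and how the text reaches the server (a socket.io event, say) is left open.

```javascript
//Turn user input such as "love, hate" into a list for the track parameter
function parseKeywords(input){
	return input.split(',')
		.map(function(word){ return word.trim().toLowerCase(); })
		.filter(function(word, index, all){
			//Drop empty entries and duplicates
			return word.length > 0 && all.indexOf(word) === index;
		});
}

console.log(parseKeywords(' Love, hate,, love '));
```

The result can go straight into { track: ... }, and an empty list is a good reason not to start a stream at all.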

  • A graph is a nice touch: