Using Groovy to decipher adolescent Internet use

I’ve been meaning to write about my experiences using the Groovy dynamic language in educational research for a while, but haven’t found time to do so due to a grueling work schedule and burnout.

This post talks about how the Groovy dynamic language is used in an educational research setting, helping us better understand the way in which Internet use plays a role in adolescent development.

The TeenTech Project

Before coming back to the wonderful life of a programmer for hire, I spent some time pursuing a Master’s in Human Development at UBC. The biggest project I was involved with there was the TeenTech research project, which sought to explore the positive and negative links between Internet use and adolescent development.

Now you would think that this would be something that would be studied to death. With all the talk in the media about cyberbullying and World of Warcraft addiction, there must be heaps and heaps of information about this stuff. The sad reality is that this field remains understudied. We don’t really know much about any of it. ( Yes, all that stuff in the media is moral panic. )

What was unique about the TeenTech project was the use of real-time monitoring data. Instead of asking kids to remember how much time they spent online or how they socialized on the Internet, we simply kept track of it. If I asked you to remember how much time you spent on the Internet this week, you would only be able to give me an approximate answer. You might also want to omit or exaggerate details about how often you talk to friends online or visit porn sites.

The project found families in BC that were willing to subscribe to our own ISP. Anytime a teenager logged on, their Internet use was logged via a Wingate server. All their Internet usage was tracked with their consent. Now if you were at home using the Internet, you might be aware that you were being tracked, but knowing that some anonymous research people were looking at your usage wouldn’t deter you from downloading stuff or talking to your friends on Facebook.

Groovy to the Rescue!

In our pilot study, we followed a total of 19 participants over a year and yielded 44 gigabytes of data transferred ( 13 million connections ). [ See the poster here in PDF: Preliminary Results from the TeenTech Research Project ].

Three years later, we had 4 years’ worth of data that took up 17 gigabytes of disk space. This data needed to be cleaned, scrubbed, filtered and categorized so that our local SPSS gurus could do all sorts of Hierarchical Linear Modelling magic to it. It was quite a daunting task.

Our logs were stored in the DBF data format. In 2005, when I first approached this problem, I wrote a Java program to parse this data into a MySQL database and filter out our users by name. Due to the hacky way in which JDBC connections had to be set up, I remember spending two days just getting the program right.

In 2008, I wrote a simple Groovy script in less than two hours:

import com.linuxense.javadbf.DBFReader
import groovy.sql.Sql

// connect to db: JDBC URL, user, password, driver class
def db = [
    "jdbc:mysql://localhost:3306/teentech",
    "root",
    "",
    "com.mysql.jdbc.Driver"
]
def sql = Sql.newInstance( *db )
def webuse = sql.dataSet( "web" )

def teens = [ ] // list of teen ids here
def directoryLocation = new File( "/Volumes/Time Machine Backups/history/" )

directoryLocation.eachFile { f ->
    // skip hidden files (names starting with a dot)
    if( !f.name.startsWith( "." ) ){
        // withInputStream closes the file once the closure returns
        f.withInputStream { input ->
            DBFReader reader = new DBFReader( input )
            Object[] r
            while( (r = reader.nextRecord()) != null ){
                // r[3] is the username; keep only the records of our participants
                if( teens.contains( r[3].trim() ) ){
                    // STARTTIME: r[8] is multiplied up to milliseconds, then shifted back three hours
                    webuse.add(
                        DESCRIPTION: r[7],
                        USERNAME: r[3].trim(),
                        TYPE: r[6],
                        DURATION: r[9].longValue(),
                        BYTESIN: r[10].longValue(),
                        BYTESOUT: r[11].longValue(),
                        IP_NUMBER: r[4].trim(),
                        STARTTIME: new Date( ( r[8] * 1000 ).longValue() - 1000 * 60 * 60 * 3 )
                    )
                }
            }
        }
    }
}

This script highlights three major advantages of the Groovy language:

  1. Groovy is Java – In 2005, I wrote my insert code in Java and then took the data out of the database in Python. Today, I can use the same Java library I used to parse the DBF files ( http://sarovar.org/projects/javadbf/ ) in a dynamic language.
  2. Built-in JDBC support – Groovy has very powerful database features that make inserting content into a database an afterthought. In this script, I used these features to convert the usage data into indexed MySQL tables.
  3. Powerful language features – You can see that I use the eachFile() method to walk through every file in my 17 gigs’ worth of data. Without this, I would have spent some more time troubleshooting and trying to get this right.
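To make the third point concrete, here is a minimal sketch of the eachFile() traversal on its own, with no database involved. The helper name visibleFileNames and the file names are made up for illustration; only eachFile() and the skip-hidden-files check come from the script above.

```groovy
import java.nio.file.Files

// Hypothetical helper: collect the names of visible files in a directory,
// skipping hidden entries whose names start with a dot.
List<String> visibleFileNames( File dir ) {
    def names = []
    dir.eachFile { f ->
        if( f.isFile() && !f.name.startsWith( "." ) ){
            names << f.name
        }
    }
    return names.sort()
}

// Usage: build a throwaway directory with made-up file names
def tmp = Files.createTempDirectory( "history" ).toFile()
new File( tmp, "jan.dbf" ).text = ""
new File( tmp, "feb.dbf" ).text = ""
new File( tmp, ".DS_Store" ).text = ""

def found = visibleFileNames( tmp )
println found

// clean up the temporary directory
tmp.deleteDir()
```

The closure passed to eachFile() runs once per directory entry, so the filtering logic sits right next to the work done on each file.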

The Road Ahead

This work is not done, obviously. I still need to write custom functions to retrieve the usage data from my new database and filter it by date, year and type of usage. This data would then be handed off to my buddy Brent the stats guy so he can make sense of this relationship.
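As a rough idea of what such a filtering function might look like, here is a sketch that works on an in-memory list of rows instead of a live database. The rows, the TYPE values and the helper name usageFor are all invented for illustration; only the column names match the script above.

```groovy
// Made-up rows standing in for what a query on the "web" table might return
def rows = [
    [ USERNAME: "teen01", TYPE: "HTTP", DURATION: 120L,
      STARTTIME: new GregorianCalendar( 2007, Calendar.MARCH, 14, 16, 5 ).time ],
    [ USERNAME: "teen02", TYPE: "HTTP", DURATION: 300L,
      STARTTIME: new GregorianCalendar( 2007, Calendar.NOVEMBER, 2, 20, 30 ).time ],
    [ USERNAME: "teen01", TYPE: "FTP", DURATION: 45L,
      STARTTIME: new GregorianCalendar( 2007, Calendar.JUNE, 1, 9, 0 ).time ],
    [ USERNAME: "teen02", TYPE: "HTTP", DURATION: 60L,
      STARTTIME: new GregorianCalendar( 2008, Calendar.JANUARY, 9, 18, 45 ).time ]
]

// Hypothetical helper: keep rows of one usage type that started in a given year
List usageFor( List rows, int year, String type ) {
    rows.findAll { row ->
        def cal = Calendar.instance
        cal.time = row.STARTTIME
        cal.get( Calendar.YEAR ) == year && row.TYPE == type
    }
}

def http2007 = usageFor( rows, 2007, "HTTP" )
println http2007*.USERNAME
println http2007.sum { it.DURATION }
```

On the full data set the same filter would more sensibly be pushed into a SQL WHERE clause; the in-memory version just shows the shape of the logic.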

This is where Groovy gets interesting to me as a research tool. Because of the ever changing landscape of the Internet, it becomes very hard to buy a commercial application that would correctly classify Internet traffic. If we had bought a tool in 2005, we would not have been able to correctly capture traffic made by AJAX requests or understand the way in which streaming media like YouTube FLV files show up in our data.
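A home-grown classifier can be as small as a map of patterns. The sketch below is only an assumption about how one might be structured: the category names, the patterns and the classify helper are hypothetical, and it assumes the DESCRIPTION column holds something URL-like.

```groovy
// Hypothetical rules: category name -> pattern matched against a DESCRIPTION value.
// Rules are checked in the order they are written; the first match wins.
def rules = [
    video : ~/(?i)youtube\.com|\.flv\b/,
    social: ~/(?i)facebook\.com|myspace\.com/,
    games : ~/(?i)worldofwarcraft\.com/
]

// Return the name of the first rule whose pattern is found, or "other"
String classify( String description, Map rules ) {
    def hit = rules.find { name, pattern -> pattern.matcher( description ).find() }
    return hit ? hit.key : "other"
}

println classify( "http://www.youtube.com/watch?v=abc", rules )
println classify( "http://cdn.example.net/clip.flv", rules )
println classify( "http://www.facebook.com/home.php", rules )
println classify( "http://www.ubc.ca/", rules )
```

Because the rules are plain data, a new kind of traffic only needs a new map entry rather than a new version of a commercial tool.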

High productivity programming languages like Python, Ruby, Groovy and Perl allow educational researchers like me to quickly whip up a custom tool that will look at this connectivity data and make sense of it. In 2005, I used Python. But now thanks to Groovy, I don’t have to use two or three separate languages. I have the power of Java right at my fingertips.

Thanks Groovy team, you’ve made my life easier.

Groovy, Grails and the Future of Psychological Research Tools

It’s very refreshing being back in the programming world and seeing the possibilities in education research afforded by rapid development tools like Groovy, Grails and Flex. One of the projects I worked with at UBC was setting up a social network to facilitate classroom discussions but also enable students who were in teacher education programs to interact with one another. We used the Ning social network, but found it lacking in many respects. Seeing how quick it is to build a social network in Grails, I almost want to rewind time so I can set up this project again.

Another ripe area of potential research is the development of survey tools with Adobe Flex, AIR and server side technologies like Grails. There are a lot of survey tools out there, mostly written in PHP. They suck. Some of the projects I worked on have tried using SurveyMonkey or another generic tool, only to find that the way responses are aggregated for marketing doesn’t work well for the individual granularity needed for psychological research. The tools that work are costly, and not very good. Some are Java based and hard to use, others are HTML-based and lack flexibility ( think basic tables ). Seeing the power of Flex and RIAs, I wish I had more time to devote to the development of these types of tools.

A final possibility is the use of spidering tools to tie in Internet usage with the real world. Groovy has very powerful and simple web spidering techniques ( see, for example this ). These types of technological tools have never been explored in the study of the social worlds of teenagers and young adults. For my last hurrah before becoming a Grad School dropout, I investigated how Facebook usage and physical geography were tied together. The idea was to see the effect of social or geographical factors ( rural vs. urban, wealthy vs. middle class ) on social network participation. Given that 1 in 50 people in the world today is on Facebook, there must be something interesting going on there… [ if you want to see some of the ramblings, check out the other blog: http://geographyoffacebook.wordpress.com/ ]
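On the spidering idea, here is a very small sketch of the link-extraction half of a spider. It is not the technique linked above, just an illustration: the extractLinks helper and the sample HTML are made up, and a regular expression like this is only good enough for quick experiments, not for messy real-world pages.

```groovy
// Hypothetical helper: pull every href value out of a chunk of HTML
List<String> extractLinks( String html ) {
    def matcher = html =~ /href\s*=\s*["']([^"']+)["']/
    // each match is a list: [ whole match, captured href value ]
    return matcher.collect { it[1] }
}

// Made-up page content; a real spider could fetch it with new URL( address ).text
def page = '''<a href="http://example.com/a">A</a> <a href='/b'>B</a>'''

def links = extractLinks( page )
println links
```

A real spider would then resolve relative links against the page address and queue them up for the next fetch.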
