Description of Twitter Data Streamed Using Twitter4j

Streamed Data vs. Database Query

There are two ways to get the data we need:

• Stream it from Twitter (the method we chose)
  • Advantage: It is free and takes no time to obtain.
  • Disadvantage: It is subject to availability. Twitter releases only part of its stream data to personal accounts and more to organization accounts. We do not know exactly how much data Twitter releases to the public.
• Query Twitter's database
  • Advantage: The data is more complete than the streamed data.
  • Disadvantage: Queries take too much time and are not free.

Crawler Tools

• The API available: Twitter provides a public API, currently at version 1.1. It comprises REST endpoints and a Streaming API; the latter lets users connect to Twitter's API server and receive streamed data.
• The tool we used: We used Twitter4j to get the data. It is an open-source Java library that implements many of the Twitter API's functions. Its author is Yusuke Yamamoto. Website: http://twitter4j.org/en/index.html

Twitter4j Functions

• ConfigurationBuilder(): Builds a configuration object from the OAuth consumer key, OAuth consumer secret, access token, and access token secret. All four must be obtained by registering with Twitter through its OAuth authentication system. We registered a personal account through this system.
• TwitterStreamFactory(): Takes the authentication information as input and establishes a connection with the streaming server.
• StatusListener(): The event catcher. It receives each incoming status from the stream as a Status object.
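The three pieces above fit together roughly as follows. This is a minimal sketch rather than the code used for the crawl: it assumes a Twitter4j 3.x or 4.x jar on the classpath, the four credential strings are placeholders, and the choice of `sample()` (the public random sample stream) is an assumption, since the slides do not say which stream endpoint was used. The class name `StreamSketch` is hypothetical.

```java
import twitter4j.StallWarning;
import twitter4j.Status;
import twitter4j.StatusDeletionNotice;
import twitter4j.StatusListener;
import twitter4j.TwitterStream;
import twitter4j.TwitterStreamFactory;
import twitter4j.conf.ConfigurationBuilder;

public class StreamSketch {
    public static void main(String[] args) {
        // Placeholder credentials (assumption): replace with the four values
        // issued when registering with Twitter.
        ConfigurationBuilder cb = new ConfigurationBuilder()
                .setOAuthConsumerKey("CONSUMER_KEY")
                .setOAuthConsumerSecret("CONSUMER_SECRET")
                .setOAuthAccessToken("ACCESS_TOKEN")
                .setOAuthAccessTokenSecret("ACCESS_TOKEN_SECRET");

        // The factory takes the built configuration and returns a stream client.
        TwitterStream stream = new TwitterStreamFactory(cb.build()).getInstance();

        // The listener is called once per incoming status.
        StatusListener listener = new StatusListener() {
            @Override public void onStatus(Status status) {
                System.out.println(status.getUser().getScreenName() + "\t" + status.getText());
            }
            @Override public void onDeletionNotice(StatusDeletionNotice notice) { }
            @Override public void onTrackLimitationNotice(int numberOfLimitedStatuses) { }
            @Override public void onScrubGeo(long userId, long upToStatusId) { }
            @Override public void onStallWarning(StallWarning warning) { }
            @Override public void onException(Exception ex) { ex.printStackTrace(); }
        };

        stream.addListener(listener);
        stream.sample(); // assumption: the public random sample stream
    }
}
```

The program keeps running and printing statuses until the process is stopped; a real crawler would write each status to a file instead of standard output.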

Twitter4j Functions (Continued)

• Status(): A Status object contains the user name, user ID, status ID, text content of the status, creation time, the kind of device the user posted from, the user's geographical location when the status was posted (only if the user chose to release it), the number of times the status has been retweeted, the user who posted the status, and some other information.
• User(): A User object holds information about the user who posted the status.
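To make the field list concrete, here is a small sketch of how one entry might be flattened into a single line of a txt file. `TweetRecord` is a hypothetical plain Java class that mirrors the fields listed above; it is not part of Twitter4j, and the tab-separated layout is an assumption, not the file format actually used.

```java
/** Hypothetical holder for the Status fields listed above; not a Twitter4j class. */
class TweetRecord {
    final long statusId;
    final long userId;
    final String userName;
    final String text;
    final String createdAt;
    final String source;      // device or client used to post
    final Double latitude;    // null when the user did not share a location
    final Double longitude;   // null when the user did not share a location
    final int retweetCount;

    TweetRecord(long statusId, long userId, String userName, String text,
                String createdAt, String source,
                Double latitude, Double longitude, int retweetCount) {
        this.statusId = statusId;
        this.userId = userId;
        this.userName = userName;
        this.text = text;
        this.createdAt = createdAt;
        this.source = source;
        this.latitude = latitude;
        this.longitude = longitude;
        this.retweetCount = retweetCount;
    }

    /** Replaces tabs and line breaks so free text cannot break the one-line layout. */
    static String clean(String s) {
        return s == null ? "" : s.replaceAll("[\\t\\r\\n]+", " ");
    }

    /** Serializes the record as nine tab-separated fields; missing coordinates stay empty. */
    String toLine() {
        String lat = latitude == null ? "" : latitude.toString();
        String lon = longitude == null ? "" : longitude.toString();
        return String.join("\t",
                Long.toString(statusId), Long.toString(userId),
                clean(userName), clean(text), clean(createdAt), clean(source),
                lat, lon, Integer.toString(retweetCount));
    }
}
```

Keeping the location fields empty rather than omitting them means every line has the same number of columns, which simplifies later parsing.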

The Data

We crawled data for 3 days. The data are arranged into 3 types, all in txt format:

A Side Note on the Raw Data

The raw data is about 17 times larger per entry than the Sina data: each of our txt files contains 4,000 entries and takes around 9 MB of disk space, while each Sina data file contains 70,000 entries and takes about 8 MB. (These rounded file sizes imply a per-entry ratio closer to 20.) We can download about 6 GB of data per day, so many files will be created to hold it. The large number of files will slow down both file transfer and later processing of the data for research.
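The size comparison can be checked with a few lines of arithmetic. The sketch below uses only the rounded figures quoted above and assumes decimal units (1 MB = 1,000,000 bytes, 1 GB = 1,000 MB); the class and method names are hypothetical.

```java
class SizeEstimate {
    /** Average bytes per entry for a file of the given size, assuming 1 MB = 1,000,000 bytes. */
    static double bytesPerEntry(double fileMegabytes, long entries) {
        return fileMegabytes * 1_000_000 / entries;
    }

    /** Number of files needed per day, assuming 1 GB = 1,000 MB and rounding up. */
    static long filesPerDay(double dailyGigabytes, double fileMegabytes) {
        return (long) Math.ceil(dailyGigabytes * 1000 / fileMegabytes);
    }

    public static void main(String[] args) {
        double twitter = bytesPerEntry(9, 4000);   // 2250 bytes per Twitter entry
        double sina = bytesPerEntry(8, 70000);     // about 114 bytes per Sina entry
        System.out.printf("Twitter: %.0f bytes/entry%n", twitter);
        System.out.printf("Sina:    %.0f bytes/entry%n", sina);
        System.out.printf("Ratio:   %.1f%n", twitter / sina);          // about 19.7
        System.out.println("Files per day: " + filesPerDay(6, 9));    // 667
    }
}
```

With these rounded inputs the per-entry ratio comes out near 20, somewhat above the quoted 17, and a 6 GB day at about 9 MB per file produces several hundred files.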
