It was Friday at work and I didn't have much work to do.
While surfing SCN, I came across a blog by Hillary Bliss:
Sharknado Social Media Analysis with SAP HANA and Predictive Analysis
I had already planned to watch The Wolf of Wall Street that evening and had heard some good reviews of it
(it had to be, DiCaprio always has good ones... where the hell is his Oscar!!).
So I thought of trying to get the reviews from Twitter using the basic outline of the Text Data Processing Blueprints.
Here are the steps and my explorations.
Data Extraction
Twitter offers an open Search API that can retrieve "popular tweets" in addition to real-time search results.
The source data consists of unstructured text in the form of tweets, which are retrieved from Twitter's REST-based Search API using the search term. You can try this out in the Apigee Twitter API console.
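To make the "popular tweets" option a bit more concrete, here is a minimal sketch of how a search request URL could be put together. It assumes the v1.1 `search/tweets.json` endpoint and its `q`, `result_type`, `count` and `lang` parameters as they were documented at the time; the function name `build_search_url` is only for illustration and is not part of the blueprint. The real call also needs OAuth headers, so this only builds the URL and sends nothing.

```python
from urllib.parse import urlencode

# Assumed endpoint: Twitter REST API v1.1 search (may have changed since).
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(term, result_type="popular", count=100, lang="en"):
    """Return the full search URL for a term; no request is sent."""
    if result_type not in ("mixed", "recent", "popular"):
        raise ValueError("result_type must be mixed, recent, or popular")
    params = {
        "q": term,                   # search term or hashtag
        "result_type": result_type,  # "popular" asks for popular tweets
        "count": count,              # tweets per page
        "lang": lang,                # restrict to one language
    }
    return SEARCH_URL + "?" + urlencode(params)


url = build_search_url("The Wolf of Wall Street")
print(url)
```

Switching `result_type` to `recent` would ask for the real-time results instead of the popular ones.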
As in the Blueprint:
- Step 1: Create an account on https://dev.twitter.com
- Because Twitter is a third-party application, these steps may change.
- Create or log into your Twitter developer account at http://dev.twitter.com.
- In My Applications, create a new application with a unique name and a placeholder URL
- Open the OAuth settings to locate the Consumer Key and the Consumer Secret values, and add the values to the search.cfg file.
- Create an access token and refresh the page.
- Add the access token and the access token secret values to the search.cfg file.
- Edit the search.cfg configuration file to specify the terms or hashtags that are used for the Twitter search.
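The configuration steps above could look roughly like this on the Python side. This is only a sketch: I am assuming an INI-style layout for search.cfg, and the section names (`oauth`, `search`) and key names are hypothetical, so the file shipped with the blueprint may well look different.

```python
import configparser

# Hypothetical search.cfg contents; the blueprint's real file may differ.
SAMPLE_CFG = """
[oauth]
consumer_key = YOUR_CONSUMER_KEY
consumer_secret = YOUR_CONSUMER_SECRET
access_token = YOUR_ACCESS_TOKEN
access_token_secret = YOUR_ACCESS_TOKEN_SECRET

[search]
terms = The Wolf of Wall Street, #WolfOfWallStreet
"""


def load_search_config(text):
    """Parse the config text into (credentials dict, list of search terms)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    credentials = dict(parser["oauth"])
    # Terms are assumed to be comma separated on a single line.
    raw_terms = parser["search"]["terms"]
    terms = [t.strip() for t in raw_terms.split(",") if t.strip()]
    return credentials, terms


credentials, terms = load_search_config(SAMPLE_CFG)
print(terms)
```

Keeping the keys and secrets in the config file rather than in the transform code means the same job can be pointed at a different movie by editing only the search terms.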
Dataflow Development
- Now that we have the source, we are good to design a Data Services job and dataflow to extract and analyze the sentiment in tweets.
- Create a job in the BODS Designer and develop the dataflows for the process.
Below is the implementation screenshot of the Dataflow
Twitter Search Dataflow:
- This dataflow connects to the Twitter API, extracts the tweets, and loads them into the database tables.
- It uses two User-Defined transforms, which contain the Python code that connects to Twitter:
- GET_SEARCH_TASKS: a User-Defined transform that prepares the inputs for the Twitter Search API by extracting information from the search.cfg file.
- Search Twitter Transform: a User-Defined transform that retrieves the tweets from the Twitter Search API.
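What happens between "retrieve the tweets" and "load them into tables" can be sketched in plain Python as below. This is not the blueprint's transform code. It assumes the v1.1 search response shape (a `statuses` list whose items carry `id_str`, `created_at`, `text` and `user.screen_name`), and the output column names are hypothetical.

```python
import json


def tweets_to_rows(response_text):
    """Turn a search response body into flat dicts, one per tweet."""
    payload = json.loads(response_text)
    rows = []
    for status in payload.get("statuses", []):
        rows.append({
            "tweet_id": status["id_str"],
            "created_at": status.get("created_at", ""),
            "screen_name": status.get("user", {}).get("screen_name", ""),
            # Collapse line breaks so each tweet fits one table row.
            "text": status.get("text", "").replace("\n", " "),
        })
    return rows


# Made-up sample response, only to show the shape.
sample = json.dumps({
    "statuses": [{
        "id_str": "1",
        "created_at": "Fri Jan 10 12:00:00 +0000 2014",
        "user": {"screen_name": "moviefan"},
        "text": "Loved it!\nGo watch.",
    }]
})
rows = tweets_to_rows(sample)
print(rows[0]["text"])
```

Each dict then maps onto one row of the target table, which is the form the sentiment dataflow reads from.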
Sentiment Analysis:
- Twitter Process Dataflow:
- This dataflow uses the Base Entity Extraction transform of Text Data Processing to analyze the sentiment in the tweets.
- Entity_Extraction_Transform:
This transform extracts the basic entities for sentiment analysis.
It uses the English language module and the “english-tf-voc-sentiment.fsm” rule file provided by SAP for the analysis.
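For a feel of what "sentiment in a tweet" means, here is a toy word-list classifier. To be clear, this is a swapped-in technique and not how the Entity Extraction transform or the SAP rule file works; the word lists and the `classify` function are made up for illustration only.

```python
import re

# Tiny hypothetical word lists; the SAP rule file is far richer than this.
POSITIVE = {"good", "great", "loved", "awesome", "brilliant"}
NEGATIVE = {"bad", "boring", "hated", "awful", "long"}


def classify(text):
    """Label a tweet positive, negative or neutral by counting known words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for tweet in ("Loved it, DiCaprio is brilliant",
              "Too long and boring",
              "Watching tonight"):
    print(tweet, "->", classify(tweet))
```

A rule-based extractor goes well beyond this, for example by handling negation and linking the sentiment to the topic it is about, which a plain word count cannot do.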
Run the job and you get reviews in general from the Twitter public stream.
The tables could be used to build a universe and generate detailed reports, but I thought I would try that later...
I had to catch the movie.