Social media analytics is the process of gathering data from stakeholder conversations on digital media and processing it into structured insights that lead to more information-driven business decisions and greater customer centricity for brands and businesses.

With social media monitoring, businesses can also look at how many people follow their presence on Facebook and the number of times people interact with their social profile by sharing or liking their posts. More advanced types of social media analysis involve sentiment analytics. This practice involves sophisticated natural-language-processing machine learning algorithms parsing the text in a person’s social media post about a company to understand the meaning behind that person’s statement. These algorithms can create a quantified score of the public’s feelings toward a company based on social media interactions and give reports to management on how well the company interacts with customers.

Dataval’s Social Media Analytics Framework

The architecture we have designed comprises a user interface where a person can query for an analytics report on a particular topic. A query can be based on a keyword, a trending topic or a business-specific product. The query from the user interface is processed by our analytics engine, which generates a statistical report in the form of graphs and numbers and presents it on the dashboard.

Our Analytics engine consists of three main components:

  • Data Extraction
  • NLP Operations
  • Data Analytics

Data Extraction

To extract data, we used the platform APIs and the public data provided by various social media platforms. These APIs return data specific to a query, but since not all of them are free, the amount of data they return depends on the edition or version used for extraction. One API we have often used is the Twitter API. We have built an algorithm that downloads tweets related to a keyword or hashtag as they are posted online. In addition to the text of the tweets, these APIs can also download a wealth of data and metadata about each tweet and about the user who tweeted or retweeted it, including, but not limited to: time, date, location, language, number of followers, number of accounts followed, date of account creation, profile picture, and the usernames of who made the original tweet and who retweeted it.
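As a rough illustration of the keyword and hashtag filtering described above, the following minimal sketch filters a stream of tweet records as they arrive. It does not call any real platform API: `TweetRecord`, `matches`, `filter_stream` and the sample data are hypothetical names and values introduced only for this example, and the fields are a subset of the metadata listed above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Optional


@dataclass
class TweetRecord:
    """Hypothetical container for one tweet plus some of its metadata."""
    text: str
    created_at: datetime
    username: str
    language: str
    followers: int
    following: int
    location: Optional[str] = None
    retweeted_from: Optional[str] = None  # original author, if this is a retweet


def matches(tweet: TweetRecord, terms: Iterable[str]) -> bool:
    """Return True if the tweet text contains any term (case-insensitive)."""
    text = tweet.text.lower()
    return any(term.lower() in text for term in terms)


def filter_stream(stream: Iterable[TweetRecord],
                  terms: Iterable[str]) -> Iterator[TweetRecord]:
    """Yield matching tweets one by one, in the order they arrive."""
    terms = list(terms)  # materialise once so an iterator argument is reusable
    for tweet in stream:
        if matches(tweet, terms):
            yield tweet


# Made-up sample data standing in for a live API stream.
sample_stream = [
    TweetRecord("Loving my new #Samsung phone", datetime(2020, 1, 1, 9, 30),
                "alice", "en", 120, 80, "Delhi"),
    TweetRecord("Rainy day today", datetime(2020, 1, 1, 9, 31),
                "bob", "en", 15, 40),
    TweetRecord("RT: samsung launch event tonight", datetime(2020, 1, 1, 9, 32),
                "carol", "en", 300, 150, "Mumbai", "alice"),
]

kept = list(filter_stream(sample_stream, ["#samsung", "samsung"]))
print([t.username for t in kept])
```

Because `filter_stream` is a generator, it can sit between a live stream and a database writer without holding every tweet in memory.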

NLP Operations

Depending on the use case, our analytics engine performs a variety of NLP operations using open-source libraries such as spaCy, TextBlob and NLTK. These libraries play an important role in our analytics engine when we want to process natural language, for example to retrieve the sentiment of a text or the popularity of a named entity. Say our use case is to find the top 10 Samsung phones trending on Twitter: using the named entity recognition techniques of the above-mentioned libraries, we can get this information, provided we have already extracted the data using the Twitter API.
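To make the "top phones" use case concrete, here is a minimal sketch of the counting step. Note that it swaps in a simpler technique: instead of a trained named entity recognition model, it matches tweets against a fixed list of product names (a gazetteer). `KNOWN_PHONES`, `count_mentions`, `top_n` and the sample tweets are assumptions made for illustration only.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical product list; a trained NER model would find names itself.
KNOWN_PHONES = ["Galaxy S10", "Galaxy Note 10", "Galaxy A50"]


def count_mentions(texts: Iterable[str], names: Iterable[str]) -> Counter:
    """Count how many texts mention each name (case-insensitive, whole phrase)."""
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in names
    }
    counts: Counter = Counter()
    for text in texts:
        for name, pattern in patterns.items():
            if pattern.search(text):
                counts[name] += 1  # one count per text, even if repeated
    return counts


def top_n(texts: Iterable[str], names: Iterable[str],
          n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most-mentioned names with their counts."""
    return count_mentions(texts, names).most_common(n)


# Made-up tweets for illustration.
sample_tweets = [
    "Just got the Galaxy S10, amazing camera",
    "galaxy s10 battery is great",
    "Thinking about the Galaxy A50",
    "Galaxy Note 10 or Galaxy S10?",
    "Nothing about phones here",
]

mention_counts = count_mentions(sample_tweets, KNOWN_PHONES)
print(top_n(sample_tweets, KNOWN_PHONES, n=1))
```

A fixed list only finds names it already knows; a named entity recognition model is what allows the engine to pick up product names that were not anticipated.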

Data Analytics

Our analytics engine analyzes the data in parallel with the NLP operations (if required). It evaluates the statistical results and visualizes them using libraries such as matplotlib and plotly, then shows all results on the dashboard in an easily understandable form using graphs, charts and tables.

Framework Scope

This framework can be useful in applications where the analysis of social media data plays a vital role in making better decisions. So far, the framework has been tested in two areas:

Brand Data Analysis

Our framework extracted the last 30 days of data from Twitter and Facebook using a query in our framework’s user interface. The Twitter API gave us tweets and other metadata related to the Samsung brand. Our engine stored these in a database, from which it generated analytics reports.
Our engine was able to generate the following analytics reports:

  • Reports showing customer sentiment towards the Samsung brand, classified into three categories: positive, negative and neutral.
  • Samsung’s popularity compared with its counterparts.
  • Word cloud showing popular hashtags that people used in their tweets.
  • Most retweeted tweets about Samsung.
  • Locations that tweeted most about the Samsung brand.
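The aggregation behind reports like these can be sketched in a few lines. This is only an illustration under stated assumptions: each tweet is assumed to already carry a polarity score in the range -1 to 1 (the kind of score an NLP library can supply), the 0.1 neutral threshold is an arbitrary choice, and `sentiment_label` and the sample records are hypothetical.

```python
from collections import Counter

# Made-up records: (text, polarity in [-1, 1], retweet count, location).
# The polarity values are supplied by hand for this example.
tweets = [
    ("Great phone", 0.8, 12, "Delhi"),
    ("Screen cracked already", -0.6, 40, "Mumbai"),
    ("Launch is on Friday", 0.0, 5, "Delhi"),
    ("Battery could be better", -0.2, 3, None),
]


def sentiment_label(polarity: float, threshold: float = 0.1) -> str:
    """Map a polarity score to one of three categories.

    The threshold is an assumed cut-off, not a value from the framework.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Sentiment report: number of tweets per category.
sentiment_counts = Counter(sentiment_label(p) for _, p, _, _ in tweets)

# Most retweeted tweet: the record with the highest retweet count.
most_retweeted = max(tweets, key=lambda t: t[2])[0]

# Locations that tweeted most, skipping tweets without a location.
location_counts = Counter(loc for _, _, _, loc in tweets if loc is not None)

print(dict(sentiment_counts))
print(most_retweeted)
print(location_counts.most_common(1))
```

The resulting counters are plain mappings, so they can be passed directly to a charting library for the dashboard.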

Political Data Analysis

Our framework extracted the last 30 days of data from Twitter using a query that searched for popular hashtags related to politics. Our engine was able to generate the following analytics reports:

  • Reports showing the most talked-about political leader, area-wise.
  • Area-wise sentiment analysis of people towards a political party.
  • Word cloud showing popular hashtags that people used in their tweets.
  • Most talked-about political topics on social media.
  • Location-wise scatter plot showing the regions from which people tweeted.
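As a rough sketch of how area-wise reports and hashtag frequencies (the input to a word cloud) might be computed, the example below groups posts by area and counts mentions. The area labels, the placeholder names "Leader A" and "Leader B", the helper functions and the sample posts are all assumptions made for illustration; simple substring matching stands in for the NLP step.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Made-up (area, text) pairs; names are placeholders, not real people.
posts = [
    ("North", "Leader A spoke well today #election"),
    ("North", "Leader A and Leader B debate #Election #debate"),
    ("South", "Leader B visits the city #rally"),
    ("South", "Leader B again #election"),
]
LEADERS = ["Leader A", "Leader B"]


def leader_counts_by_area(posts: Iterable[Tuple[str, str]],
                          leaders: List[str]) -> Dict[str, Counter]:
    """For each area, count the posts that mention each leader."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for area, text in posts:
        lowered = text.lower()
        for leader in leaders:
            if leader.lower() in lowered:
                counts[area][leader] += 1
    return counts


def most_talked_about(counts: Dict[str, Counter]) -> Dict[str, str]:
    """Pick the leader with the highest count in each area."""
    return {area: c.most_common(1)[0][0] for area, c in counts.items()}


def hashtag_counts(posts: Iterable[Tuple[str, str]]) -> Counter:
    """Count hashtags across all posts, ignoring case."""
    return Counter(
        tag.lower()
        for _, text in posts
        for tag in re.findall(r"#\w+", text)
    )


by_area = leader_counts_by_area(posts, LEADERS)
print(most_talked_about(by_area))
print(hashtag_counts(posts).most_common(1))
```

The same grouping works for area-wise sentiment: replace the mention counter with a counter of sentiment categories per area.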