As per Wikipedia, “Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials. Sentiment analysis is widely applied to reviews and social media for a variety of applications, ranging from marketing to customer services. Generally speaking, sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document. The attitude may be his or her judgment or evaluation, affective state (that is to say, the emotional state of the author when writing), or the intended emotional communication (that is to say, the emotional effect the author wishes to have on the reader).” However, I like to view emotional analysis and sentiment analysis as two separate research topics, with a thin line of difference between them. Some people might argue that they are one and the same, and it's fair for everyone to have their own opinion, right? So for now we will keep the emotional analysis of a text separate and focus on understanding the polarity of the text.

For now, I have identified the following five basic steps for sentiment analysis. Over time, I will implement these steps in combination with the algorithms mentioned below.

  1. Text preprocessing / Noise Removal – Cleaning the data to hand-pick the relevant information, thereby reducing the noise and amplifying the signal.
  2. Named Entity Recognition – This is the most important part of sentiment analysis, as the objective of sentiment analysis is to determine the attitude towards some topic, and named entities identify what that topic is.
  3. Feature Selection – The features can be unigrams, bigrams, or higher-order n-grams, with or without punctuation and stop words, with each feature being associated with a feature scorer. I will try to test hypotheses about the importance of stop words and punctuation through this study. With my current level of understanding, punctuation might help in detecting sarcasm and exclamation, and so could be a leading clue in recognizing context.
  4. Subjectivity Classification – Classifying sentences as subjective or objective, since subjective sentences hold sentiments while objective sentences state facts and figures.
  5. Sentiment Extraction – I am going to use the following algorithms in order to understand their pros and cons, and perform a sentiment analysis with each of them:
  • LogisticRegression
  • KNeighborsClassifier
  • SVC
  • LinearSVC
  • NuSVC
  • DecisionTreeClassifier
  • ExtraTreesClassifier
  • RandomForestClassifier
  • GradientBoostingClassifier
  • Naive Bayes
  • BernoulliNB
  • GaussianNB
  • MultinomialNB
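
As a rough illustration of steps 1 and 3 above, here is a minimal sketch of text cleaning and n-gram feature counting using only the Python standard library. The function names (`tokenize`, `ngram_features`), the tiny `STOP_WORDS` set, and the sample sentence are all my own assumptions for illustration, not a fixed design. The two switches make it easy to test the effect of stop words and punctuation later on.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real study would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "of", "to"}


def tokenize(text, keep_punctuation=False):
    """Lowercase the text and split it into word tokens.

    When keep_punctuation is True, '!' and '?' are kept as separate tokens,
    since they may carry clues about exclamation or sarcasm.
    """
    pattern = r"[a-z0-9']+|[!?]" if keep_punctuation else r"[a-z0-9']+"
    return re.findall(pattern, text.lower())


def ngram_features(text, n_values=(1, 2), remove_stop_words=False,
                   keep_punctuation=False):
    """Count the n-grams of one document for every n in n_values."""
    tokens = tokenize(text, keep_punctuation)
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    features = Counter()
    for n in n_values:
        # Slide a window of length n over the token list.
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


# Unigrams and bigrams, keeping the exclamation mark as its own token.
features = ngram_features("This movie was great!", keep_punctuation=True)
```

The resulting `Counter` maps each n-gram (for example "was great" or "great !") to its count in the document, which is the kind of input a feature scorer or classifier would consume.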

The idea here is to evaluate the algorithms above, discuss their pros and cons in detail, and maybe even devise a new one. Once I am done with them, I will move on to deep learning algorithms, specifically RNNs.
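
To make the evaluation idea concrete, here is a small dependency-free sketch of one of the listed algorithms, multinomial Naive Bayes with add-one smoothing, together with a simple accuracy check on held-out text. The class name `TinyMultinomialNB`, the `accuracy` helper, and the four-sentence toy dataset are made up for illustration. The real experiments would presumably use a library implementation and a proper dataset, with every algorithm scored in the same way.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Start from the log prior of the class.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                if word not in self.vocab:
                    continue  # skip words never seen during training
                count = self.word_counts[label][word]
                # Add-one smoothed log likelihood of the word given the class.
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Toy data, invented purely for this example.
train_texts = ["great fun film", "loved the acting", "boring and slow", "terrible plot"]
train_labels = ["pos", "pos", "neg", "neg"]
test_texts = ["great acting", "slow boring plot"]
test_labels = ["pos", "neg"]

model = TinyMultinomialNB().fit(train_texts, train_labels)
test_accuracy = accuracy(model, test_texts, test_labels)
```

Keeping the `fit` / `predict` shape the same for every algorithm means a single `accuracy` helper can score them all, which should make the pros and cons easier to compare side by side.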

Programming language : Python

Packages : (will be updated)

Dataset : To start with, I will be using an apt dataset from the list of datasets: Machine Learning Datasets For Researchers/Data Scientists