From 2417a6b7603524dc5cd30d29b153f91024b9443d Mon Sep 17 00:00:00 2001 From: Mitja Felicijan Date: Wed, 1 Nov 2023 22:54:27 +0100 Subject: Move to Jekyll --- ...g-sentiment-analysis-for-clickbait-detection.md | 108 --------------------- 1 file changed, 108 deletions(-) delete mode 100644 content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md (limited to 'content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md') diff --git a/content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md b/content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md deleted file mode 100644 index d5729ed..0000000 --- a/content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md +++ /dev/null @@ -1,108 +0,0 @@ ---- -title: Using sentiment analysis for clickbait detection in RSS feeds -url: using-sentiment-analysis-for-clickbait-detection-in-rss-feeds.html -date: 2019-10-19T12:00:00+02:00 -type: post -draft: false ---- - -## Initial thoughts - -One of the things that interested me for a while now is if major well -established news sites use click bait titles to drive additional traffic to -their sites and generate additional impressions. - -Goal is to see how article titles and actual content of article differ from each -other and see if titles are clickbaited. - -## Preparing and cleaning data - -For this example I opted to just use RSS feed from a new website and decided to -go with [The Guardian](https://www.theguardian.com) World news. While this gets -us limited data (~40) articles and also description (actual content) is trimmed -this really doesn't reflect the actual article contents. - -To get better content I could use web scraping and use RSS as link list and -fetch contents directly from website, but for this simple example this will -suffice. - -There are couple of requirements we need to install before we continue: - -- `pip3 install feedparser` (parses RSS feed from url) -- `pip3 install vaderSentiment` (does sentiment polarity analysis) -- `pip3 install matplotlib` (plots chart of results) - -So first we need to fetch RSS data and sanitize HTML content from description. - -```python -import re -import feedparser - -feed_url = "https://www.theguardian.com/world/rss" -feed = feedparser.parse(feed_url) - -# sanitize html -for item in feed.entries: - item.description = re.sub('<[^<]+?>', '', item.description) -``` - -## Perform sentiment analysis - -Since we now have cleaned up data in our `feed.entries` object we can start with -performing sentiment analysis. - -There are many sentiment analysis libraries available that range from rule-based -sentiment analysis up to machine learning supported analysis. To keep things -simple I decided to use rule-based analysis library -[vaderSentiment](https://github.com/cjhutto/vaderSentiment) from -[C.J. Hutto](https://github.com/cjhutto). Really nice library and quite easy to -use. - -```python -from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer -analyser = SentimentIntensityAnalyzer() - -sentiment_results = [] -for item in feed.entries: - sentiment_title = analyser.polarity_scores(item.title) - sentiment_description = analyser.polarity_scores(item.description) - sentiment_results.append([sentiment_title['compound'], sentiment_description['compound']]) -``` - -Now that we have this data in a shape that is compatible with matplotlib we can -plot results to see the difference between title and description sentiment of an -article. - -```python -import matplotlib.pyplot as plt - -plt.rcParams['figure.figsize'] = (15, 3) -plt.plot(sentiment_results, drawstyle='steps') -plt.title('Sentiment analysis relationship between title and description (Guardian World News)') -plt.legend(['title', 'description']) -plt.show() -``` - -## Results and assets - -1. Because of the small sample size further conclusions are impossible to make. -2. Rule-based approach may not be the best way of doing this. By using deep - learning we would be able to get better insights. -3. **Next step would be to** periodically fetch RSS items and store them over a - longer period of time and then perform analysis again and use either machine - learning or deep learning on top of it. - -![Relationship between title and description](/posts/sentiment-analysis/guardian-sa-title-desc-relationship.png) - -Figure above displays difference between title and description sentiment for -specific RSS feed item. 1 means positive and -1 means negative sentiment. - -[ยป Download Jupyter Notebook](/posts/sentiment-analysis/sentiment-analysis.ipynb) - -## Going further - -- [Twitter Sentiment Analysis by Bryan Schwierzke](https://github.com/bswiss/news_mood) -- [AFINN-based sentiment analysis for Node.js by Andrew Sliwinski](https://github.com/thisandagain/sentiment) -- [Sentiment Analysis with LSTMs in Tensorflow by Adit Deshpande](https://github.com/adeshpande3/LSTM-Sentiment-Analysis) -- [Sentiment analysis on tweets using Naive Bayes, SVM, CNN, LSTM, etc. by Abdul Fatir](https://github.com/abdulfatir/twitter-sentiment-analysis) - -- cgit v1.2.3