path: root/content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md
author: Mitja Felicijan <mitja.felicijan@gmail.com> 2023-05-26 00:40:40 +0200
committer: Mitja Felicijan <mitja.felicijan@gmail.com> 2023-05-26 00:40:40 +0200
commit: 43b0708769eb61392050045b881f8e6ba39c5b66
tree: 3939579a13b8325325d5ebb8e05324a41ed78a6d /content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md
parent: 49e7e7d555a6cd9810d81561fa3e98e3d64502be
Massive update to posts, archetypes
Added archetypes for creating notes and posts so that fields are auto-populated. Fixed existing posts so they now follow the 80-column rule.
Diffstat (limited to 'content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md')
-rw-r--r-- content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md | 43
1 files changed, 32 insertions, 11 deletions
diff --git a/content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md b/content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md
index 30b0fd4..995da25 100644
--- a/content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md
+++ b/content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md
@@ -1,21 +1,29 @@
 ---
 title: Using sentiment analysis for clickbait detection in RSS feeds
 url: using-sentiment-analysis-for-clickbait-detection-in-rss-feeds.html
-date: 2019-10-19
+date: 2019-10-19T12:00:00+02:00
 draft: false
 ---
 
 ## Initial thoughts
 
-One of the things that interested me for a while now is if major well established news sites use click bait titles to drive additional traffic to their sites and generate additional impressions.
+One of the things that interested me for a while now is if major well
+established news sites use click bait titles to drive additional traffic
+to their sites and generate additional impressions.
 
-Goal is to see how article titles and actual content of article differ from each other and see if titles are clickbaited.
+Goal is to see how article titles and actual content of article differ from
+each other and see if titles are clickbaited.
 
 ## Preparing and cleaning data
 
-For this example I opted to just use RSS feed from a new website and decided to go with [The Guardian](https://www.theguardian.com) World news. While this gets us limited data (~40) articles and also description (actual content) is trimmed this really doesn't reflect the actual article contents.
+For this example I opted to just use RSS feed from a new website and decided
+to go with [The Guardian](https://www.theguardian.com) World news. While this
+gets us limited data (~40) articles and also description (actual content) is
+trimmed this really doesn't reflect the actual article contents.
 
-To get better content I could use web scraping and use RSS as link list and fetch contents directly from website, but for this simple example this will suffice.
+To get better content I could use web scraping and use RSS as link list and
+fetch contents directly from website, but for this simple example this will
+suffice.
 
 There are couple of requirements we need to install before we continue:
 
@@ -39,9 +47,15 @@ for item in feed.entries:
 
 ## Perform sentiment analysis
 
-Since we now have cleaned up data in our `feed.entries` object we can start with performing sentiment analysis.
+Since we now have cleaned up data in our `feed.entries` object we can start with
+performing sentiment analysis.
 
-There are many sentiment analysis libraries available that range from rule-based sentiment analysis up to machine learning supported analysis. To keep things simple I decided to use rule-based analysis library [vaderSentiment](https://github.com/cjhutto/vaderSentiment) from [C.J. Hutto](https://github.com/cjhutto). Really nice library and quite easy to use.
+There are many sentiment analysis libraries available that range from rule-based
+sentiment analysis up to machine learning supported analysis. To keep things
+simple I decided to use rule-based analysis library
+[vaderSentiment](https://github.com/cjhutto/vaderSentiment) from
+[C.J. Hutto](https://github.com/cjhutto). Really nice library and quite
+easy to use.
 
 ```python
 from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
@@ -54,7 +68,9 @@ for item in feed.entries:
     sentiment_results.append([sentiment_title['compound'], sentiment_description['compound']])
 ```
 
-Now that we have this data in a shape that is compatible with matplotlib we can plot results to see the difference between title and description sentiment of an article.
+Now that we have this data in a shape that is compatible with matplotlib we can
+plot results to see the difference between title and description sentiment of
+an article.
 
 ```python
 import matplotlib.pyplot as plt
@@ -69,12 +85,16 @@ plt.show()
 ## Results and assets
 
 1. Because of the small sample size further conclusions are impossible to make.
-2. Rule-based approach may not be the best way of doing this. By using deep learning we would be able to get better insights.
-3. **Next step would be to** periodically fetch RSS items and store them over a longer period of time and then perform analysis again and use either machine learning or deep learning on top of it.
+2. Rule-based approach may not be the best way of doing this. By using deep
+   learning we would be able to get better insights.
+3. **Next step would be to** periodically fetch RSS items and store them over
+   a longer period of time and then perform analysis again and use either
+   machine learning or deep learning on top of it.
 
 ![Relationship between title and description](/assets/sentiment-analysis/guardian-sa-title-desc-relationship.png)
 
-Figure above displays difference between title and description sentiment for specific RSS feed item. 1 means positive and -1 means negative sentiment.
+Figure above displays difference between title and description sentiment for
+specific RSS feed item. 1 means positive and -1 means negative sentiment.
 
 [» Download Jupyter Notebook](/assets/sentiment-analysis/sentiment-analysis.ipynb)
 
@@ -84,3 +104,4 @@ Figure above displays difference between title and description sentiment for spe
 - [AFINN-based sentiment analysis for Node.js by Andrew Sliwinski](https://github.com/thisandagain/sentiment)
 - [Sentiment Analysis with LSTMs in Tensorflow by Adit Deshpande](https://github.com/adeshpande3/LSTM-Sentiment-Analysis)
 - [Sentiment analysis on tweets using Naive Bayes, SVM, CNN, LSTM, etc. by Abdul Fatir](https://github.com/abdulfatir/twitter-sentiment-analysis)
+
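
Note on the post touched by this commit: it collects `[title, description]` compound-score pairs in `sentiment_results` and plots them, with scores ranging from -1 (negative) to 1 (positive). The following is a minimal sketch, not part of the commit, of how such pairs could be ranked by the gap between title and description sentiment. The function name `title_description_gaps`, the `threshold` of 0.5, and the sample scores are assumptions made for illustration; only the pair layout and the score range come from the post.

```python
def title_description_gaps(sentiment_results, threshold=0.5):
    """Return (index, title_score, description_score, gap) tuples for items
    whose title and description compound scores differ by at least
    `threshold`, sorted with the largest gap first.

    `sentiment_results` is a list of [title_score, description_score] pairs.
    The default threshold is an arbitrary illustrative value.
    """
    flagged = []
    for index, (title_score, description_score) in enumerate(sentiment_results):
        gap = abs(title_score - description_score)
        if gap >= threshold:
            flagged.append((index, title_score, description_score, gap))
    # Sort by the gap (fourth field), largest first.
    flagged.sort(key=lambda row: row[3], reverse=True)
    return flagged


# Made-up scores for three feed items; not real Guardian data.
sample_results = [[0.1, 0.05], [-0.8, 0.2], [0.6, -0.1]]
flagged = title_description_gaps(sample_results)

for index, title_score, description_score, gap in flagged:
    print(f"item {index}: title={title_score:+.2f} "
          f"description={description_score:+.2f} gap={gap:.2f}")
```

A large gap only marks an item as worth a closer look. On its own it does not show that a title is clickbait, especially with descriptions as short as the trimmed RSS ones.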