| author | Mitja Felicijan <mitja.felicijan@gmail.com> | 2023-05-26 00:40:40 +0200 |
|---|---|---|
| committer | Mitja Felicijan <mitja.felicijan@gmail.com> | 2023-05-26 00:40:40 +0200 |
| commit | 43b0708769eb61392050045b881f8e6ba39c5b66 (patch) | |
| tree | 3939579a13b8325325d5ebb8e05324a41ed78a6d /content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md | |
| parent | 49e7e7d555a6cd9810d81561fa3e98e3d64502be (diff) | |
Massive update to posts, archetypes
Added archetypes for creating notes and posts so that fields are
auto-populated.
Fixed existing posts so they now follow the 80-column rule.
Diffstat (limited to 'content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md')
| -rw-r--r-- | content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md | 43 |
1 files changed, 32 insertions, 11 deletions
diff --git a/content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md b/content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md index 30b0fd4..995da25 100644 --- a/content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md +++ b/content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md | |||
| @@ -1,21 +1,29 @@ | |||
| 1 | --- | 1 | --- |
| 2 | title: Using sentiment analysis for clickbait detection in RSS feeds | 2 | title: Using sentiment analysis for clickbait detection in RSS feeds |
| 3 | url: using-sentiment-analysis-for-clickbait-detection-in-rss-feeds.html | 3 | url: using-sentiment-analysis-for-clickbait-detection-in-rss-feeds.html |
| 4 | date: 2019-10-19 | 4 | date: 2019-10-19T12:00:00+02:00 |
| 5 | draft: false | 5 | draft: false |
| 6 | --- | 6 | --- |
| 7 | 7 | ||
| 8 | ## Initial thoughts | 8 | ## Initial thoughts |
| 9 | 9 | ||
| 10 | One of the things that interested me for a while now is if major well established news sites use click bait titles to drive additional traffic to their sites and generate additional impressions. | 10 | One of the things that interested me for a while now is if major well |
| 11 | established news sites use click bait titles to drive additional traffic | ||
| 12 | to their sites and generate additional impressions. | ||
| 11 | 13 | ||
| 12 | Goal is to see how article titles and actual content of article differ from each other and see if titles are clickbaited. | 14 | Goal is to see how article titles and actual content of article differ from |
| 15 | each other and see if titles are clickbaited. | ||
| 13 | 16 | ||
| 14 | ## Preparing and cleaning data | 17 | ## Preparing and cleaning data |
| 15 | 18 | ||
| 16 | For this example I opted to just use RSS feed from a new website and decided to go with [The Guardian](https://www.theguardian.com) World news. While this gets us limited data (~40) articles and also description (actual content) is trimmed this really doesn't reflect the actual article contents. | 19 | For this example I opted to just use RSS feed from a new website and decided |
| 20 | to go with [The Guardian](https://www.theguardian.com) World news. While this | ||
| 21 | gets us limited data (~40) articles and also description (actual content) is | ||
| 22 | trimmed this really doesn't reflect the actual article contents. | ||
| 17 | 23 | ||
| 18 | To get better content I could use web scraping and use RSS as link list and fetch contents directly from website, but for this simple example this will suffice. | 24 | To get better content I could use web scraping and use RSS as link list and |
| 25 | fetch contents directly from website, but for this simple example this will | ||
| 26 | suffice. | ||
| 19 | 27 | ||
| 20 | There are couple of requirements we need to install before we continue: | 28 | There are couple of requirements we need to install before we continue: |
| 21 | 29 | ||
| @@ -39,9 +47,15 @@ for item in feed.entries: | |||
| 39 | 47 | ||
| 40 | ## Perform sentiment analysis | 48 | ## Perform sentiment analysis |
| 41 | 49 | ||
| 42 | Since we now have cleaned up data in our `feed.entries` object we can start with performing sentiment analysis. | 50 | Since we now have cleaned up data in our `feed.entries` object we can start with |
| 51 | performing sentiment analysis. | ||
| 43 | 52 | ||
| 44 | There are many sentiment analysis libraries available that range from rule-based sentiment analysis up to machine learning supported analysis. To keep things simple I decided to use rule-based analysis library [vaderSentiment](https://github.com/cjhutto/vaderSentiment) from [C.J. Hutto](https://github.com/cjhutto). Really nice library and quite easy to use. | 53 | There are many sentiment analysis libraries available that range from rule-based |
| 54 | sentiment analysis up to machine learning supported analysis. To keep things | ||
| 55 | simple I decided to use rule-based analysis library | ||
| 56 | [vaderSentiment](https://github.com/cjhutto/vaderSentiment) from | ||
| 57 | [C.J. Hutto](https://github.com/cjhutto). Really nice library and quite | ||
| 58 | easy to use. | ||
| 45 | 59 | ||
| 46 | ```python | 60 | ```python |
| 47 | from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer | 61 | from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer |
| @@ -54,7 +68,9 @@ for item in feed.entries: | |||
| 54 | sentiment_results.append([sentiment_title['compound'], sentiment_description['compound']]) | 68 | sentiment_results.append([sentiment_title['compound'], sentiment_description['compound']]) |
| 55 | ``` | 69 | ``` |
| 56 | 70 | ||
| 57 | Now that we have this data in a shape that is compatible with matplotlib we can plot results to see the difference between title and description sentiment of an article. | 71 | Now that we have this data in a shape that is compatible with matplotlib we can |
| 72 | plot results to see the difference between title and description sentiment of | ||
| 73 | an article. | ||
| 58 | 74 | ||
| 59 | ```python | 75 | ```python |
| 60 | import matplotlib.pyplot as plt | 76 | import matplotlib.pyplot as plt |
| @@ -69,12 +85,16 @@ plt.show() | |||
| 69 | ## Results and assets | 85 | ## Results and assets |
| 70 | 86 | ||
| 71 | 1. Because of the small sample size further conclusions are impossible to make. | 87 | 1. Because of the small sample size further conclusions are impossible to make. |
| 72 | 2. Rule-based approach may not be the best way of doing this. By using deep learning we would be able to get better insights. | 88 | 2. Rule-based approach may not be the best way of doing this. By using deep |
| 73 | 3. **Next step would be to** periodically fetch RSS items and store them over a longer period of time and then perform analysis again and use either machine learning or deep learning on top of it. | 89 | learning we would be able to get better insights. |
| 90 | 3. **Next step would be to** periodically fetch RSS items and store them over | ||
| 91 | a longer period of time and then perform analysis again and use either | ||
| 92 | machine learning or deep learning on top of it. | ||
| 74 | 93 | ||
| 75 |  | 94 |  |
| 76 | 95 | ||
| 77 | Figure above displays difference between title and description sentiment for specific RSS feed item. 1 means positive and -1 means negative sentiment. | 96 | Figure above displays difference between title and description sentiment for |
| 97 | specific RSS feed item. 1 means positive and -1 means negative sentiment. | ||
| 78 | 98 | ||
| 79 | [» Download Jupyter Notebook](/assets/sentiment-analysis/sentiment-analysis.ipynb) | 99 | [» Download Jupyter Notebook](/assets/sentiment-analysis/sentiment-analysis.ipynb) |
| 80 | 100 | ||
| @@ -84,3 +104,4 @@ Figure above displays difference between title and description sentiment for spe | |||
| 84 | - [AFINN-based sentiment analysis for Node.js by Andrew Sliwinski](https://github.com/thisandagain/sentiment) | 104 | - [AFINN-based sentiment analysis for Node.js by Andrew Sliwinski](https://github.com/thisandagain/sentiment) |
| 85 | - [Sentiment Analysis with LSTMs in Tensorflow by Adit Deshpande](https://github.com/adeshpande3/LSTM-Sentiment-Analysis) | 105 | - [Sentiment Analysis with LSTMs in Tensorflow by Adit Deshpande](https://github.com/adeshpande3/LSTM-Sentiment-Analysis) |
| 86 | - [Sentiment analysis on tweets using Naive Bayes, SVM, CNN, LSTM, etc. by Abdul Fatir](https://github.com/abdulfatir/twitter-sentiment-analysis) | 106 | - [Sentiment analysis on tweets using Naive Bayes, SVM, CNN, LSTM, etc. by Abdul Fatir](https://github.com/abdulfatir/twitter-sentiment-analysis) |
| 107 | |||
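
The post in this diff collects `[title, description]` compound-score pairs in `sentiment_results` and compares them visually. As a rough sketch of how the same pairs could be screened numerically, the snippet below flags items whose title sentiment sits far from the description sentiment. The function name `flag_divergent`, the `0.5` threshold, and the sample scores are hypothetical and are not part of the post or of vaderSentiment; a large gap is only a hint of a possible clickbait title, not a verdict.

```python
from typing import List, Sequence, Tuple


def flag_divergent(
    sentiment_results: Sequence[Sequence[float]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the post
) -> List[Tuple[int, float]]:
    """Return (index, gap) for items whose title and description
    compound scores differ by more than `threshold`."""
    flagged = []
    for index, (title_score, description_score) in enumerate(sentiment_results):
        # Positive gap: title reads more positive than the description.
        gap = title_score - description_score
        if abs(gap) > threshold:
            flagged.append((index, round(gap, 4)))
    return flagged


# Made-up compound scores in the [-1, 1] range, for illustration only.
example_results = [[0.60, 0.55], [-0.80, 0.10], [0.00, -0.70]]
print(flag_divergent(example_results))
```

With real data, `example_results` would be replaced by the `sentiment_results` list built in the post. The threshold would need tuning against a larger sample, which matches the post's own caveat about the small sample size.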
