From be09bb704f8030a072114108fc094f07ee62efba Mon Sep 17 00:00:00 2001
From: Mitja Felicijan
Date: Tue, 22 Oct 2019 13:32:43 +0200
Subject: Added 301 rules for old articles

---
 .../encoding-binary-data-into-dna-sequence.md      |  1 +
 ...ng-python-web-applications-with-visual-tools.md |  1 +
 src/experiments/simple-iot-application.md          |  1 +
 ...digitalocean-spaces-object-storage-with-fuse.md |  1 +
 ...alysis-for-click-bait-detection-in-rss-feeds.md | 86 ++++++++++++++++++++++
 5 files changed, 90 insertions(+)
 create mode 100644 src/experiments/using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md

(limited to 'src/experiments')

diff --git a/src/experiments/encoding-binary-data-into-dna-sequence.md b/src/experiments/encoding-binary-data-into-dna-sequence.md
index cc42bd7..cdff15f 100644
--- a/src/experiments/encoding-binary-data-into-dna-sequence.md
+++ b/src/experiments/encoding-binary-data-into-dna-sequence.md
@@ -1,4 +1,5 @@
 title: Encoding binary data into DNA sequence
+description: Imagine a world where you could go outside and take a leaf from a tree and put it through your personal DNA sequencer and get data like music, videos or computer programs from it
 date: 2019-01-03
 tags: experiment
 hide: false
diff --git a/src/experiments/profiling-python-web-applications-with-visual-tools.md b/src/experiments/profiling-python-web-applications-with-visual-tools.md
index 29e16d7..58d85bf 100644
--- a/src/experiments/profiling-python-web-applications-with-visual-tools.md
+++ b/src/experiments/profiling-python-web-applications-with-visual-tools.md
@@ -1,4 +1,5 @@
 title: Profiling Python web applications with visual tools
+description: Missing link when debugging and profiling python web application
 date: 2017-04-21
 tags: experiment
 hide: false
diff --git a/src/experiments/simple-iot-application.md b/src/experiments/simple-iot-application.md
index b8744e6..1543e52 100644
--- a/src/experiments/simple-iot-application.md
+++ b/src/experiments/simple-iot-application.md
@@ -1,4 +1,5 @@
 title: Simple IOT application supported by real-time monitoring and data history
+description: Develop simple IOT application with Arduino MKR1000 and Python
 date: 2017-08-11
 tags: experiment
 hide: false
diff --git a/src/experiments/using-digitalocean-spaces-object-storage-with-fuse.md b/src/experiments/using-digitalocean-spaces-object-storage-with-fuse.md
index bc00d1e..ab0079f 100644
--- a/src/experiments/using-digitalocean-spaces-object-storage-with-fuse.md
+++ b/src/experiments/using-digitalocean-spaces-object-storage-with-fuse.md
@@ -1,4 +1,5 @@
 title: Using DigitalOcean Spaces Object Storage with FUSE
+description: Using DigitalOcean Spaces Object Storage with FUSE
 date: 2018-01-16
 tags: experiment
 hide: false
diff --git a/src/experiments/using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md b/src/experiments/using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md
new file mode 100644
index 0000000..c27c6d0
--- /dev/null
+++ b/src/experiments/using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md
@@ -0,0 +1,86 @@
+title: Using sentiment analysis for click‑bait detection in RSS feeds
+description: Using Python with sentiment analysis to detect if titles in RSS feeds are click-bait
+date: 2019-10-19
+tags: experiment
+hide: false
+----
+
+## Initial thoughts
+
+One thing that has interested me for a while is whether major, well-established news sites use click-bait titles to drive additional traffic to their sites and generate more impressions.
+
+The goal is to see how article titles differ from the actual content of the articles and whether the titles are click-bait.
+
+## Preparing and cleaning data
+
+For this example I opted to just use the RSS feed of a news website and went with [The Guardian](https://www.theguardian.com) World news. This gets us limited data (~40 articles), and because the description (standing in for the actual content) is trimmed, it doesn't really reflect the full article contents.
+
+To get better content I could use the RSS feed as a list of links and scrape the full text directly from the website, but for this simple example the feed will suffice.
+
+There are a couple of requirements we need to install before we continue:
+
+- `pip3 install feedparser` (parses the RSS feed from a URL)
+- `pip3 install vaderSentiment` (does sentiment polarity analysis)
+- `pip3 install matplotlib` (plots a chart of the results)
+
+First we need to fetch the RSS data and strip HTML tags from each description.
+
+```python
+import re
+import feedparser
+
+feed_url = "https://www.theguardian.com/world/rss"
+feed = feedparser.parse(feed_url)
+
+for item in feed.entries:
+    # strip HTML tags from the description
+    item.description = re.sub('<[^<]+?>', '', item.description)
+```
+
+## Perform sentiment analysis
+
+Now that `feed.entries` holds the cleaned-up data, we can start the sentiment analysis.
+
+There are many sentiment analysis libraries available, ranging from rule-based analysis up to machine-learning-supported analysis. To keep things simple I decided to use the rule-based library [vaderSentiment](https://github.com/cjhutto/vaderSentiment) from [C.J. Hutto](https://github.com/cjhutto). It is a really nice library and quite easy to use.
+
+```python
+from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
+analyser = SentimentIntensityAnalyzer()
+
+sentiment_results = []
+for item in feed.entries:
+    sentiment_title = analyser.polarity_scores(item.title)
+    sentiment_description = analyser.polarity_scores(item.description)
+    sentiment_results.append([sentiment_title['compound'], sentiment_description['compound']])
+```
+
+Now that the data is in a shape matplotlib can consume, we can plot the results to see the difference between the title and description sentiment of each article.
+
+```python
+import matplotlib.pyplot as plt
+
+plt.rcParams['figure.figsize'] = (15, 3)
+plt.plot(sentiment_results, drawstyle='steps')
+plt.title('Sentiment analysis relationship between title and description (Guardian World News)')
+plt.legend(['title', 'description'])
+plt.show()
+```
+
+## Results and assets
+
+1. Because of the small sample size, no firm conclusions can be drawn.
+2. A rule-based approach may not be the best way of doing this. Deep learning could give better insights.
+3. **The next step would be to** fetch RSS items periodically, store them over a longer period of time, repeat the analysis, and apply either machine learning or deep learning on top of it.
+
+![Relationship between title and description](/files/sentiment-analysis/guardian-sa-title-desc-relationship.png)
+
+The figure above displays the difference between title and description sentiment for each RSS feed item. 1 means positive and -1 means negative sentiment.
+
+[» Download Jupyter Notebook](/files/sentiment-analysis/sentiment-analysis.ipynb)
+
+## Going further
+
+- [Twitter Sentiment Analysis by Bryan Schwierzke](https://github.com/bswiss/news_mood)
+- [AFINN-based sentiment analysis for Node.js by Andrew Sliwinski](https://github.com/thisandagain/sentiment)
+- [Sentiment Analysis with LSTMs in Tensorflow by Adit Deshpande](https://github.com/adeshpande3/LSTM-Sentiment-Analysis)
+- [Sentiment analysis on tweets using Naive Bayes, SVM, CNN, LSTM, etc. by Abdul Fatir](https://github.com/abdulfatir/twitter-sentiment-analysis)
--
cgit v1.2.3
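
Editor's sketch, not part of the patch: the article compares the title and description `compound` scores visually, and the same comparison could be reduced to one number per entry. The block below is a minimal illustration under stated assumptions. The function name `flag_sentiment_gaps`, the `threshold` value of 0.5, and the sample scores are all hypothetical and do not come from the article or from vaderSentiment; the input is assumed to have the same shape as the article's `sentiment_results` list.

```python
from typing import List, Sequence, Tuple


def flag_sentiment_gaps(
    results: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[Tuple[int, float]]:
    """Return (index, gap) for entries whose two scores differ by at least `threshold`.

    `results` is assumed to look like the article's `sentiment_results`:
    one [title_compound, description_compound] pair per feed entry.
    The default threshold is an arbitrary illustrative value, not a tuned one.
    """
    flagged = []
    for index, (title_score, description_score) in enumerate(results):
        # Absolute distance between title and description sentiment.
        gap = abs(title_score - description_score)
        if gap >= threshold:
            flagged.append((index, round(gap, 4)))
    return flagged


# Made-up scores for illustration only; these are not measurements from the feed.
sample_results = [[0.6, 0.5], [-0.8, 0.1], [0.0, -0.2]]
print(flag_sentiment_gaps(sample_results))
```

A large gap only says that the title and the trimmed description read differently to a rule-based scorer. Whether that indicates click-bait is exactly the open question the article leaves for a larger sample.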