From be09bb704f8030a072114108fc094f07ee62efba Mon Sep 17 00:00:00 2001
From: Mitja Felicijan <mitja.felicijan@gmail.com>
Date: Tue, 22 Oct 2019 13:32:43 +0200
Subject: Added 301 rules for old articles

---
 .../encoding-binary-data-into-dna-sequence.md      |  1 +
 ...ng-python-web-applications-with-visual-tools.md |  1 +
 src/experiments/simple-iot-application.md          |  1 +
 ...digitalocean-spaces-object-storage-with-fuse.md |  1 +
 ...alysis-for-click-bait-detection-in-rss-feeds.md | 86 ++++++++++++++++++++++
 5 files changed, 90 insertions(+)
 create mode 100644 src/experiments/using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md

(limited to 'src/experiments')

diff --git a/src/experiments/encoding-binary-data-into-dna-sequence.md b/src/experiments/encoding-binary-data-into-dna-sequence.md
index cc42bd7..cdff15f 100644
--- a/src/experiments/encoding-binary-data-into-dna-sequence.md
+++ b/src/experiments/encoding-binary-data-into-dna-sequence.md
@@ -1,4 +1,5 @@
 title: Encoding binary data into DNA sequence
+description: Imagine a world where you could go outside and take a leaf from a tree and put it through your personal DNA sequencer and get data like music, videos or computer programs from it
 date: 2019-01-03
 tags: experiment
 hide: false
diff --git a/src/experiments/profiling-python-web-applications-with-visual-tools.md b/src/experiments/profiling-python-web-applications-with-visual-tools.md
index 29e16d7..58d85bf 100644
--- a/src/experiments/profiling-python-web-applications-with-visual-tools.md
+++ b/src/experiments/profiling-python-web-applications-with-visual-tools.md
@@ -1,4 +1,5 @@
 title: Profiling Python web applications with visual tools
+description: The missing link when debugging and profiling Python web applications
 date: 2017-04-21
 tags: experiment
 hide: false
diff --git a/src/experiments/simple-iot-application.md b/src/experiments/simple-iot-application.md
index b8744e6..1543e52 100644
--- a/src/experiments/simple-iot-application.md
+++ b/src/experiments/simple-iot-application.md
@@ -1,4 +1,5 @@
 title: Simple IOT application supported by real-time monitoring and data history
+description: Develop a simple IOT application with Arduino MKR1000 and Python
 date: 2017-08-11
 tags: experiment
 hide: false
diff --git a/src/experiments/using-digitalocean-spaces-object-storage-with-fuse.md b/src/experiments/using-digitalocean-spaces-object-storage-with-fuse.md
index bc00d1e..ab0079f 100644
--- a/src/experiments/using-digitalocean-spaces-object-storage-with-fuse.md
+++ b/src/experiments/using-digitalocean-spaces-object-storage-with-fuse.md
@@ -1,4 +1,5 @@
 title: Using DigitalOcean Spaces Object Storage with FUSE
+description: Using DigitalOcean Spaces Object Storage with FUSE
 date: 2018-01-16
 tags: experiment
 hide: false
diff --git a/src/experiments/using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md b/src/experiments/using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md
new file mode 100644
index 0000000..c27c6d0
--- /dev/null
+++ b/src/experiments/using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md
@@ -0,0 +1,86 @@
+title: Using sentiment analysis for click&#8209;bait detection in RSS feeds
+description: Using Python with sentiment analysis to detect if titles in RSS feeds are click-bait
+date: 2019-10-19
+tags: experiment
+hide: false
+----
+
+## Initial thoughts
+
+One thing that has interested me for a while is whether major, well-established news sites use click-bait titles to drive additional traffic to their sites and generate additional impressions.
+
+The goal is to see how the sentiment of an article's title differs from that of its actual content, and whether the titles are click-bait.
+
+## Preparing and cleaning data
+
+For this example I opted to just use the RSS feed of a news website and decided to go with [The Guardian](https://www.theguardian.com) World news. This gets us limited data (~40 articles), and the description (standing in for the actual content) is trimmed, so it doesn't really reflect the full article contents.
+
+To get better content I could use the RSS feed as a list of links and scrape the full articles directly from the website, but for this simple example the feed descriptions will suffice.
+
+There are a couple of requirements we need to install before we continue:
+
+- `pip3 install feedparser` (parses an RSS feed from a URL)
+- `pip3 install vaderSentiment` (does sentiment polarity analysis)
+- `pip3 install matplotlib` (plots chart of results)
+
+First we need to fetch the RSS data and strip the HTML markup from each description.
+
+```python
+import re
+import feedparser
+
+feed_url = "https://www.theguardian.com/world/rss"
+feed = feedparser.parse(feed_url)
+
+for item in feed.entries:
+    # strip HTML tags from the description, keeping only the text
+    item.description = re.sub('<[^<]+?>', '', item.description)
+```
+
+## Perform sentiment analysis
+
+Now that the data in our `feed.entries` object is cleaned up, we can start performing sentiment analysis.
+
+There are many sentiment analysis libraries available, ranging from rule-based sentiment analysis to machine-learning-supported analysis. To keep things simple I decided to use the rule-based library [vaderSentiment](https://github.com/cjhutto/vaderSentiment) from [C.J. Hutto](https://github.com/cjhutto). It is a really nice library and quite easy to use.
+
+```python
+from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
+analyser = SentimentIntensityAnalyzer()
+
+sentiment_results = []
+for item in feed.entries:
+    sentiment_title = analyser.polarity_scores(item.title)
+    sentiment_description = analyser.polarity_scores(item.description)
+    sentiment_results.append([sentiment_title['compound'], sentiment_description['compound']])
+```
+
+Now that we have this data in a shape that is compatible with matplotlib, we can plot the results to see the difference between the title and description sentiment of each article.
+
+```python
+import matplotlib.pyplot as plt
+
+plt.rcParams['figure.figsize'] = (15, 3)
+plt.plot(sentiment_results, drawstyle='steps')
+plt.title('Sentiment analysis relationship between title and description (Guardian World News)')
+plt.legend(['title', 'description'])
+plt.show()
+```
+
+## Results and assets
+
+1. Because of the small sample size, no firm conclusions can be drawn.
+2. A rule-based approach may not be the best way of doing this. Deep learning might give better insights.
+3. **The next step would be to** periodically fetch RSS items, store them over a longer period of time, and then perform the analysis again using either machine learning or deep learning on top of it.
+
+![Relationship between title and description](/files/sentiment-analysis/guardian-sa-title-desc-relationship.png)
+
+The figure above displays the difference between title and description sentiment for each RSS feed item. A compound score of 1 means positive and -1 means negative sentiment.
+
+[» Download Jupyter Notebook](/files/sentiment-analysis/sentiment-analysis.ipynb)
+
+## Going further
+
+- [Twitter Sentiment Analysis by Bryan Schwierzke](https://github.com/bswiss/news_mood)
+- [AFINN-based sentiment analysis for Node.js by Andrew Sliwinski](https://github.com/thisandagain/sentiment)
+- [Sentiment Analysis with LSTMs in Tensorflow by Adit Deshpande](https://github.com/adeshpande3/LSTM-Sentiment-Analysis)
+- [Sentiment analysis on tweets using Naive Bayes, SVM, CNN, LSTM, etc. by Abdul Fatir](https://github.com/abdulfatir/twitter-sentiment-analysis)
-- 
cgit v1.2.3