author Mitja Felicijan <mitja.felicijan@gmail.com> 2019-10-22 13:32:43 +0200
committer Mitja Felicijan <mitja.felicijan@gmail.com> 2019-10-22 13:32:43 +0200
commit be09bb704f8030a072114108fc094f07ee62efba
tree b05537d59b1f47398f38bacdbe138f1c83ac9399 /src/experiments
parent 629d3907b89f795667ba5fe2f31b790bf56093cd
Added 301 rules for old articles
Diffstat (limited to 'src/experiments')
-rw-r--r-- src/experiments/encoding-binary-data-into-dna-sequence.md | 1
-rw-r--r-- src/experiments/profiling-python-web-applications-with-visual-tools.md | 1
-rw-r--r-- src/experiments/simple-iot-application.md | 1
-rw-r--r-- src/experiments/using-digitalocean-spaces-object-storage-with-fuse.md | 1
-rw-r--r-- src/experiments/using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md | 86
5 files changed, 90 insertions, 0 deletions
diff --git a/src/experiments/encoding-binary-data-into-dna-sequence.md b/src/experiments/encoding-binary-data-into-dna-sequence.md
index cc42bd7..cdff15f 100644
--- a/src/experiments/encoding-binary-data-into-dna-sequence.md
+++ b/src/experiments/encoding-binary-data-into-dna-sequence.md
@@ -1,4 +1,5 @@
 title: Encoding binary data into DNA sequence
+description: Imagine a world where you could go outside and take a leaf from a tree and put it through your personal DNA sequencer and get data like music, videos or computer programs from it
 date: 2019-01-03
 tags: experiment
 hide: false
diff --git a/src/experiments/profiling-python-web-applications-with-visual-tools.md b/src/experiments/profiling-python-web-applications-with-visual-tools.md
index 29e16d7..58d85bf 100644
--- a/src/experiments/profiling-python-web-applications-with-visual-tools.md
+++ b/src/experiments/profiling-python-web-applications-with-visual-tools.md
@@ -1,4 +1,5 @@
 title: Profiling Python web applications with visual tools
+description: Missing link when debugging and profiling python web application
 date: 2017-04-21
 tags: experiment
 hide: false
diff --git a/src/experiments/simple-iot-application.md b/src/experiments/simple-iot-application.md
index b8744e6..1543e52 100644
--- a/src/experiments/simple-iot-application.md
+++ b/src/experiments/simple-iot-application.md
@@ -1,4 +1,5 @@
 title: Simple IOT application supported by real-time monitoring and data history
+description: Develop simple IOT application with Arduino MKR1000 and Python
 date: 2017-08-11
 tags: experiment
 hide: false
diff --git a/src/experiments/using-digitalocean-spaces-object-storage-with-fuse.md b/src/experiments/using-digitalocean-spaces-object-storage-with-fuse.md
index bc00d1e..ab0079f 100644
--- a/src/experiments/using-digitalocean-spaces-object-storage-with-fuse.md
+++ b/src/experiments/using-digitalocean-spaces-object-storage-with-fuse.md
@@ -1,4 +1,5 @@
 title: Using DigitalOcean Spaces Object Storage with FUSE
+description: Using DigitalOcean Spaces Object Storage with FUSE
 date: 2018-01-16
 tags: experiment
 hide: false
diff --git a/src/experiments/using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md b/src/experiments/using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md
new file mode 100644
index 0000000..c27c6d0
--- /dev/null
+++ b/src/experiments/using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md
@@ -0,0 +1,86 @@
title: Using sentiment analysis for click&#8209;bait detection in RSS feeds
description: Using Python with sentiment analysis to detect if titles in RSS feeds are click-bait
date: 2019-10-19
tags: experiment
hide: false
----

## Initial thoughts

One thing that has interested me for a while is whether major, well-established news sites use click-bait titles to drive additional traffic to their sites and generate additional impressions.

The goal is to see how an article's title differs from its actual content and whether the title is click-bait.

## Preparing and cleaning data

For this example I opted to use just the RSS feed of a news website and went with [The Guardian](https://www.theguardian.com) World news. This gives us limited data (~40 articles), and the description (standing in for the actual content) is trimmed, so it doesn't really reflect the full article contents.

To get better content I could use web scraping, treating the RSS feed as a list of links and fetching the contents directly from the website, but for this simple example the feed will suffice.
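
The scraping idea mentioned above could look roughly like the sketch below. It is only an illustration and was not used in this experiment: the names `ParagraphText`, `extract_paragraphs` and `fetch_article_text` are made up, and it assumes that the article body sits in plain `<p>` elements and that the page is UTF-8 encoded, which may not hold for every site.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class ParagraphText(HTMLParser):
    """Collects the text found inside <p> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False  # True while we are inside a <p> element
        self.paragraphs = []       # one string per paragraph

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        # text outside <p> (scripts, navigation, ...) is ignored
        if self.in_paragraph:
            self.paragraphs[-1] += data


def extract_paragraphs(html):
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return [p.strip() for p in parser.paragraphs if p.strip()]


def fetch_article_text(url):
    # network call; assumes a UTF-8 page (an assumption, not checked here)
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return "\n".join(extract_paragraphs(html))


sample = "<div><p>First <b>bold</b> line.</p><script>x()</script><p>Second.</p></div>"
print(extract_paragraphs(sample))
```

With feedparser, `fetch_article_text(item.link)` could then stand in for `item.description`, at the cost of one extra request per article.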

There are a couple of requirements we need to install before we continue:

- `pip3 install feedparser` (parses the RSS feed from a URL)
- `pip3 install vaderSentiment` (does sentiment polarity analysis)
- `pip3 install matplotlib` (plots a chart of the results)

First we need to fetch the RSS data and strip the HTML markup from each description.

```python
import re
import feedparser

feed_url = "https://www.theguardian.com/world/rss"
feed = feedparser.parse(feed_url)

for item in feed.entries:
    # strip HTML tags from the description
    item.description = re.sub('<[^<]+?>', '', item.description)
```
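
To make the clean-up step concrete, here is the same regular expression applied to a made-up description string. The sample text and the `strip_html` helper are illustrative only. The `html.unescape` call is an optional extra that the loop above does not perform; it decodes entities such as `&amp;`, which the regular expression leaves untouched.

```python
import html
import re

TAG_RE = re.compile(r'<[^<]+?>')  # same pattern as in the loop above


def strip_html(text):
    # remove tags first, then decode entities such as &amp; or &quot;
    return html.unescape(TAG_RE.sub('', text))


sample = '<p>Talks &amp; protests <a href="https://example.com">continue</a></p>'
print(strip_html(sample))
```

A regular expression is good enough for the short, well-formed snippets found in a feed; for full pages a real HTML parser is the safer choice.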

## Perform sentiment analysis

Now that we have cleaned-up data in our `feed.entries` object, we can perform the sentiment analysis.

There are many sentiment analysis libraries available, ranging from rule-based analysis to machine-learning-supported analysis. To keep things simple I decided to use the rule-based library [vaderSentiment](https://github.com/cjhutto/vaderSentiment) from [C.J. Hutto](https://github.com/cjhutto). It is a really nice library and quite easy to use.

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyser = SentimentIntensityAnalyzer()

sentiment_results = []
for item in feed.entries:
    # 'compound' is the normalized overall score, from -1 (negative) to 1 (positive)
    sentiment_title = analyser.polarity_scores(item.title)
    sentiment_description = analyser.polarity_scores(item.description)
    sentiment_results.append([sentiment_title['compound'], sentiment_description['compound']])
```

Now that the data is in a shape that matplotlib can plot directly, we can chart the results to see the difference between the title and description sentiment of each article.

```python
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = (15, 3)
# each row is [title, description], so matplotlib draws one line per column
plt.plot(sentiment_results, drawstyle='steps')
plt.title('Sentiment analysis relationship between title and description (Guardian World News)')
plt.legend(['title', 'description'])
plt.show()
```

## Results and assets

1. Because of the small sample size, no firm conclusions can be drawn.
2. A rule-based approach may not be the best way of doing this. By using deep learning we could get better insights.
3. **Next step would be to** periodically fetch RSS items, store them over a longer period of time, and then perform the analysis again using either machine learning or deep learning on top of it.
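
The third point could start from something like the sketch below, which stores feed items in SQLite so that repeated fetches accumulate over time. The table layout, the `open_store` and `store_entries` helpers, the file name and the choice of the item link as the unique key are all assumptions for illustration; the experiment itself stored nothing.

```python
import sqlite3


def open_store(path="feed-items.db"):
    # path is a hypothetical file name; ":memory:" works for a quick try-out
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS items (
               link TEXT PRIMARY KEY,
               title TEXT NOT NULL,
               description TEXT NOT NULL,
               fetched_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def store_entries(conn, entries):
    """Insert (link, title, description) tuples and return how many were new."""
    before = conn.total_changes
    conn.executemany(
        # OR IGNORE skips rows whose link is already stored
        "INSERT OR IGNORE INTO items (link, title, description) VALUES (?, ?, ?)",
        entries,
    )
    conn.commit()
    return conn.total_changes - before


conn = open_store(":memory:")
batch = [("https://example.com/a", "Title A", "Description A")]
print(store_entries(conn, batch))  # first fetch
print(store_entries(conn, batch))  # the same item fetched again
```

With feedparser, the batch could be built as `[(e.link, e.title, e.description) for e in feed.entries]` and the script run from a scheduler such as cron.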

![Relationship between title and description](/files/sentiment-analysis/guardian-sa-title-desc-relationship.png)

The figure above displays the title and description sentiment for each RSS feed item, which makes the difference between the two visible. A score of 1 means positive and -1 means negative sentiment.
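
Besides eyeballing the chart, the gap between the two scores can be put into numbers. The sketch below ranks items by the absolute difference between the title and description compound scores. The sample values are invented and `largest_gaps` is a hypothetical helper. A large gap only says that the two texts score differently under VADER, not that a title is click-bait.

```python
def largest_gaps(results, top=3):
    """Return (index, title_score, description_score, gap) rows, largest gap first."""
    ranked = [
        (i, title, desc, round(abs(title - desc), 4))
        for i, (title, desc) in enumerate(results)
    ]
    ranked.sort(key=lambda row: row[3], reverse=True)
    return ranked[:top]


# made-up [title, description] compound scores, same shape as sentiment_results
sample_results = [[0.0, 0.1], [-0.6, 0.2], [0.5, 0.4], [-0.3, -0.9]]
for row in largest_gaps(sample_results, top=2):
    print(row)
```

Passing the real `sentiment_results` list instead of `sample_results` gives the indices of the feed items worth reading by hand.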

[» Download Jupyter Notebook](/files/sentiment-analysis/sentiment-analysis.ipynb)

## Going further

- [Twitter Sentiment Analysis by Bryan Schwierzke](https://github.com/bswiss/news_mood)
- [AFINN-based sentiment analysis for Node.js by Andrew Sliwinski](https://github.com/thisandagain/sentiment)
- [Sentiment Analysis with LSTMs in Tensorflow by Adit Deshpande](https://github.com/adeshpande3/LSTM-Sentiment-Analysis)
- [Sentiment analysis on tweets using Naive Bayes, SVM, CNN, LSTM, etc. by Abdul Fatir](https://github.com/abdulfatir/twitter-sentiment-analysis)