Diffstat (limited to 'src/experiments')
5 files changed, 90 insertions, 0 deletions
diff --git a/src/experiments/encoding-binary-data-into-dna-sequence.md b/src/experiments/encoding-binary-data-into-dna-sequence.md index cc42bd7..cdff15f 100644 --- a/src/experiments/encoding-binary-data-into-dna-sequence.md +++ b/src/experiments/encoding-binary-data-into-dna-sequence.md | |||
| @@ -1,4 +1,5 @@ | |||
| 1 | title: Encoding binary data into DNA sequence | 1 | title: Encoding binary data into DNA sequence |
| 2 | description: Imagine a world where you could go outside and take a leaf from a tree and put it through your personal DNA sequencer and get data like music, videos or computer programs from it | ||
| 2 | date: 2019-01-03 | 3 | date: 2019-01-03 |
| 3 | tags: experiment | 4 | tags: experiment |
| 4 | hide: false | 5 | hide: false |
diff --git a/src/experiments/profiling-python-web-applications-with-visual-tools.md b/src/experiments/profiling-python-web-applications-with-visual-tools.md index 29e16d7..58d85bf 100644 --- a/src/experiments/profiling-python-web-applications-with-visual-tools.md +++ b/src/experiments/profiling-python-web-applications-with-visual-tools.md | |||
| @@ -1,4 +1,5 @@ | |||
| 1 | title: Profiling Python web applications with visual tools | 1 | title: Profiling Python web applications with visual tools |
| 2 | description: Missing link when debugging and profiling python web application | ||
| 2 | date: 2017-04-21 | 3 | date: 2017-04-21 |
| 3 | tags: experiment | 4 | tags: experiment |
| 4 | hide: false | 5 | hide: false |
diff --git a/src/experiments/simple-iot-application.md b/src/experiments/simple-iot-application.md index b8744e6..1543e52 100644 --- a/src/experiments/simple-iot-application.md +++ b/src/experiments/simple-iot-application.md | |||
| @@ -1,4 +1,5 @@ | |||
| 1 | title: Simple IOT application supported by real-time monitoring and data history | 1 | title: Simple IOT application supported by real-time monitoring and data history |
| 2 | description: Develop simple IOT application with Arduino MKR1000 and Python | ||
| 2 | date: 2017-08-11 | 3 | date: 2017-08-11 |
| 3 | tags: experiment | 4 | tags: experiment |
| 4 | hide: false | 5 | hide: false |
diff --git a/src/experiments/using-digitalocean-spaces-object-storage-with-fuse.md b/src/experiments/using-digitalocean-spaces-object-storage-with-fuse.md index bc00d1e..ab0079f 100644 --- a/src/experiments/using-digitalocean-spaces-object-storage-with-fuse.md +++ b/src/experiments/using-digitalocean-spaces-object-storage-with-fuse.md | |||
| @@ -1,4 +1,5 @@ | |||
| 1 | title: Using DigitalOcean Spaces Object Storage with FUSE | 1 | title: Using DigitalOcean Spaces Object Storage with FUSE |
| 2 | description: Using DigitalOcean Spaces Object Storage with FUSE | ||
| 2 | date: 2018-01-16 | 3 | date: 2018-01-16 |
| 3 | tags: experiment | 4 | tags: experiment |
| 4 | hide: false | 5 | hide: false |
diff --git a/src/experiments/using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md b/src/experiments/using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md new file mode 100644 index 0000000..c27c6d0 --- /dev/null +++ b/src/experiments/using-sentiment-analysis-for-click-bait-detection-in-rss-feeds.md | |||
| @@ -0,0 +1,86 @@ | |||
| 1 | title: Using sentiment analysis for click‑bait detection in RSS feeds | ||
| 2 | description: Using Python with sentiment analysis to detect if titles in RSS feeds are click-bait | ||
| 3 | date: 2019-10-19 | ||
| 4 | tags: experiment | ||
| 5 | hide: false | ||
| 6 | ---- | ||
| 7 | |||
| 8 | ## Initial thoughts | ||
| 9 | |||
| 10 | One thing that has interested me for a while is whether major, well-established news sites use click-bait titles to drive additional traffic to their sites and generate additional impressions. | ||
| 11 | |||
| 12 | The goal is to see how article titles differ from the actual article content, and whether the titles are click-bait. | ||
| 13 | |||
| 14 | ## Preparing and cleaning data | ||
| 15 | |||
| 16 | For this example I opted to just use the RSS feed from a news website and decided to go with [The Guardian](https://www.theguardian.com) World news. This gets us limited data (~40 articles), and because the description (standing in for the actual content) is trimmed, it doesn't really reflect the full article contents. | ||
| 17 | |||
| 18 | To get better content I could use the RSS feed as a list of links and scrape the full text directly from the website, but for this simple example the feed will suffice. | ||
| 19 | |||
| 20 | There are a couple of requirements we need to install before we continue: | ||
| 21 | |||
| 22 | - `pip3 install feedparser` (parses RSS feed from url) | ||
| 23 | - `pip3 install vaderSentiment` (does sentiment polarity analysis) | ||
| 24 | - `pip3 install matplotlib` (plots chart of results) | ||
| 25 | |||
| 26 | First we need to fetch the RSS data and strip the HTML from each description. | ||
| 27 | |||
| 28 | ```python | ||
| 29 | import re | ||
| 30 | import feedparser | ||
| 31 | |||
| 32 | feed_url = "https://www.theguardian.com/world/rss" | ||
| 33 | feed = feedparser.parse(feed_url) | ||
| 34 | |||
| 35 | for item in feed.entries: | ||
| 36 |     # sanitize html: strip tags from the description | ||
| 37 |     item.description = re.sub('<[^<]+?>', '', item.description) | ||
| 38 | ``` | ||
| 39 | |||
| 40 | ## Perform sentiment analysis | ||
| 41 | |||
| 42 | Since we now have cleaned-up data in our `feed.entries` object, we can start performing sentiment analysis. | ||
| 43 | |||
| 44 | There are many sentiment analysis libraries available, ranging from rule-based analysis to machine-learning-supported analysis. To keep things simple I decided to use the rule-based library [vaderSentiment](https://github.com/cjhutto/vaderSentiment) from [C.J. Hutto](https://github.com/cjhutto). It is a really nice library and quite easy to use. | ||
| 45 | |||
| 46 | ```python | ||
| 47 | from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer | ||
| 48 | analyser = SentimentIntensityAnalyzer() | ||
| 49 | |||
| 50 | sentiment_results = [] | ||
| 51 | for item in feed.entries: | ||
| 52 |     sentiment_title = analyser.polarity_scores(item.title) | ||
| 53 |     sentiment_description = analyser.polarity_scores(item.description) | ||
| 54 |     sentiment_results.append([sentiment_title['compound'], sentiment_description['compound']]) | ||
| 55 | ``` | ||
| 56 | |||
| 57 | Now that we have this data in a shape that is compatible with matplotlib we can plot results to see the difference between title and description sentiment of an article. | ||
| 58 | |||
| 59 | ```python | ||
| 60 | import matplotlib.pyplot as plt | ||
| 61 | |||
| 62 | plt.rcParams['figure.figsize'] = (15, 3) | ||
| 63 | plt.plot(sentiment_results, drawstyle='steps') | ||
| 64 | plt.title('Sentiment analysis relationship between title and description (Guardian World News)') | ||
| 65 | plt.legend(['title', 'description']) | ||
| 66 | plt.show() | ||
| 67 | ``` | ||
| 68 | |||
| 69 | ## Results and assets | ||
| 70 | |||
| 71 | 1. Because of the small sample size (~40 articles), no firm conclusions can be drawn. | ||
| 72 | 2. A rule-based approach may not be the best way of doing this. A deep learning model might give better insights. | ||
| 73 | 3. **Next step would be to** periodically fetch RSS items, store them over a longer period of time, and then repeat the analysis with either machine learning or deep learning on top of it. | ||
| 74 | |||
| 75 |  | ||
| 76 | |||
| 77 | The figure above displays the difference between title and description sentiment for each RSS feed item. 1 means positive and -1 means negative sentiment. | ||
| 78 | |||
| 79 | [» Download Jupyter Notebook](/files/sentiment-analysis/sentiment-analysis.ipynb) | ||
| 80 | |||
| 81 | ## Going further | ||
| 82 | |||
| 83 | - [Twitter Sentiment Analysis by Bryan Schwierzke](https://github.com/bswiss/news_mood) | ||
| 84 | - [AFINN-based sentiment analysis for Node.js by Andrew Sliwinski](https://github.com/thisandagain/sentiment) | ||
| 85 | - [Sentiment Analysis with LSTMs in Tensorflow by Adit Deshpande](https://github.com/adeshpande3/LSTM-Sentiment-Analysis) | ||
| 86 | - [Sentiment analysis on tweets using Naive Bayes, SVM, CNN, LSTM, etc. by Abdul Fatir](https://github.com/abdulfatir/twitter-sentiment-analysis) | ||
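
The post strips HTML from each description and then compares title and description sentiment per feed item. A minimal stdlib-only sketch of those two steps follows. The helper names `strip_html` and `rank_by_gap` are hypothetical and not part of the committed post, the sample scores are made up, and treating a large title/description gap as a hint of click-bait is an assumption rather than a result from the post.

```python
import html
import re

# Same tag-stripping pattern the post uses.
TAG_RE = re.compile(r"<[^<]+?>")


def strip_html(text):
    """Remove tags with the post's regex, then decode entities such as &amp;."""
    return html.unescape(TAG_RE.sub("", text)).strip()


def rank_by_gap(sentiment_results):
    """Return (index, gap) pairs, largest absolute title/description gap first.

    `sentiment_results` is a list of [title_compound, description_compound]
    pairs, the same shape the post builds for plotting.
    """
    gaps = [(i, title - desc) for i, (title, desc) in enumerate(sentiment_results)]
    return sorted(gaps, key=lambda pair: abs(pair[1]), reverse=True)


# Made-up compound scores for illustration only, not real Guardian data.
sample = [[0.25, 0.0], [-0.75, -0.25], [0.5, 0.5]]
ranked = rank_by_gap(sample)
cleaned = strip_html("<p>Fish &amp; chips</p>")
```

With real data, `sentiment_results` from the post could be passed to `rank_by_gap` directly, and the top entries would be the items whose title sentiment departs most from the description sentiment.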
