1 file changed, 25 insertions, 25 deletions
diff --git a/content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md b/content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md
index 995da25..e7324bb 100644
--- a/content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md
+++ b/content/posts/2019-10-19-using-sentiment-analysis-for-clickbait-detection.md
@@ -7,22 +7,22 @@ draft: false
 ## Initial thoughts
-One of the things that interested me for a while now is  if major well 
+One of the things that interested me for a while now is if major well
-established news sites use click bait titles to drive additional traffic 
+established news sites use click bait titles to drive additional traffic to
-to their sites and generate additional impressions.
+their sites and generate additional impressions.
-Goal is to see how article titles and actual content of article differ from 
+Goal is to see how article titles and actual content of article differ from each
-each other and see if titles are clickbaited.
+other and see if titles are clickbaited.
 ## Preparing and cleaning data
-For this example I opted to just use RSS feed from a new website and decided 
+For this example I opted to just use RSS feed from a new website and decided to
-to go with [The Guardian](https://www.theguardian.com) World news. While this 
+go with [The Guardian](https://www.theguardian.com) World news. While this gets
-gets us limited data (~40) articles and also description (actual content) is 
+us limited data (~40) articles and also description (actual content) is trimmed
-trimmed this really doesn't reflect the actual article contents.
+this really doesn't reflect the actual article contents.
-To get better content I could use web scraping and use RSS as link list and 
+To get better content I could use web scraping and use RSS as link list and
-fetch contents directly from website, but for this simple example this will 
+fetch contents directly from website, but for this simple example this will
 suffice.
 There are couple of requirements we need to install before we continue:
@@ -50,12 +50,12 @@ for item in feed.entries:
 Since we now have cleaned up data in our `feed.entries` object we can start with
 performing sentiment analysis.
-There are many sentiment analysis libraries available that range from rule-based 
+There are many sentiment analysis libraries available that range from rule-based
-sentiment analysis up to machine learning supported analysis. To keep things 
+sentiment analysis up to machine learning supported analysis. To keep things
-simple I decided to use rule-based analysis library 
+simple I decided to use rule-based analysis library
-[vaderSentiment](https://github.com/cjhutto/vaderSentiment) from 
+[vaderSentiment](https://github.com/cjhutto/vaderSentiment) from
-[C.J. Hutto](https://github.com/cjhutto). Really nice library and quite 
+[C.J. Hutto](https://github.com/cjhutto). Really nice library and quite easy to
-easy to use.
+use.
 ```python
 from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
@@ -68,9 +68,9 @@ for item in feed.entries:
    sentiment_results.append([sentiment_title['compound'], sentiment_description['compound']])
 ```
-Now that we have this data in a shape that is compatible with matplotlib we can 
+Now that we have this data in a shape that is compatible with matplotlib we can
-plot results to see the difference between title and description sentiment of 
+plot results to see the difference between title and description sentiment of an
-an article.
+article.
 ```python
 import matplotlib.pyplot as plt
@@ -85,15 +85,15 @@ plt.show()
 ## Results and assets
 1. Because of the small sample size further conclusions are impossible to make.
-2. Rule-based approach may not be the best way of doing this. By using deep 
+2. Rule-based approach may not be the best way of doing this. By using deep
   learning we would be able to get better insights.
-3. **Next step would be to** periodically fetch RSS items and store them over 
+3. **Next step would be to** periodically fetch RSS items and store them over a
-   a longer period of time and then perform analysis again and use either 
+   longer period of time and then perform analysis again and use either machine
-   machine learning or deep learning on top of it.
+   learning or deep learning on top of it.
 ![Relationship between title and description](/assets/sentiment-analysis/guardian-sa-title-desc-relationship.png)
-Figure above displays difference between title and description sentiment for 
+Figure above displays difference between title and description sentiment for
 specific RSS feed item. 1 means positive and -1 means negative sentiment.
 [» Download Jupyter Notebook](/assets/sentiment-analysis/sentiment-analysis.ipynb)
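
Review note: the post collects `[title_compound, description_compound]` pairs in `sentiment_results`, but the hunks above only show the plotting step. As a minimal sketch of how those pairs could be turned into a clickbait signal, the snippet below computes the gap between title and description scores and flags large gaps. The helper names, the `0.5` threshold, and the sample scores are all assumptions made for illustration; none of them come from the post or the notebook.

```python
from statistics import mean


def sentiment_gaps(sentiment_results):
    """Return title score minus description score for each feed item."""
    return [title - description for title, description in sentiment_results]


def flag_divergent(sentiment_results, threshold=0.5):
    """Return indices of items whose title and description scores differ
    by at least `threshold` (a hypothetical cutoff, not a tuned value)."""
    gaps = sentiment_gaps(sentiment_results)
    return [i for i, gap in enumerate(gaps) if abs(gap) >= threshold]


# Made-up compound scores in the same [title, description] shape the post uses.
sample_results = [[0.6, 0.5], [-0.8, 0.1], [0.0, -0.2], [0.7, -0.4]]

gaps = sentiment_gaps(sample_results)
flagged = flag_divergent(sample_results)
mean_abs_gap = mean(abs(gap) for gap in gaps)

print("flagged items:", flagged)
print("mean absolute gap:", round(mean_abs_gap, 3))
```

A large gap only means the title and the trimmed description read differently to a rule-based scorer. With roughly 40 items it is a prompt for manual inspection, not evidence of clickbait.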

99	[» Download Jupyter Notebook](/assets/sentiment-analysis/sentiment-analysis.ipynb)	99	[» Download Jupyter Notebook](/assets/sentiment-analysis/sentiment-analysis.ipynb)