From 2417a6b7603524dc5cd30d29b153f91024b9443d Mon Sep 17 00:00:00 2001
From: Mitja Felicijan <mitja.felicijan@gmail.com>
Date: Wed, 1 Nov 2023 22:54:27 +0100
Subject: Move to Jekyll

---
 ...lysis-for-clickbait-detection-in-rss-feeds.html | 88 ----------------------
 1 file changed, 88 deletions(-)
 delete mode 100755 public/using-sentiment-analysis-for-clickbait-detection-in-rss-feeds.html

(limited to 'public/using-sentiment-analysis-for-clickbait-detection-in-rss-feeds.html')
diff --git a/public/using-sentiment-analysis-for-clickbait-detection-in-rss-feeds.html b/public/using-sentiment-analysis-for-clickbait-detection-in-rss-feeds.html
deleted file mode 100755
index 7a70590..0000000
--- a/public/using-sentiment-analysis-for-clickbait-detection-in-rss-feeds.html
+++ /dev/null
@@ -1,88 +0,0 @@
-<!doctype html><html lang=en-us><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><meta name=generator content="JBMAFP - github.com/mitjafelicijan/jbmafp"><link href="data:image/x-icon;base64,AAABAAEAEBAAAAEAIABoBAAAFgAAACgAAAAQAAAAIAAAAAEAIAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAL69vf8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv76+/8LBwQkAAAAAAAAAAAAAAAC+vb3/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAL+9vf/Bv78JAAAAAAAAAAAAAAAAu7q6/wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7ubr/vr29CAAAAAAAAAAAy8nJAZ6foP8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAnqGj/6GipAoAAAAAHLjU/xcXHf/BwsL/I8XY/yPK3v8XGiD/IbjL/yPF2f8XGiD/Fxkf/yLF2f8gnK3/Fxog/62ztv8fwNf/FRcd/x271v8mz93/GRsi/xkXHf8p097/GiIp/xobIv8p0t3/KdPe/xocIv8fYmr/KNPe/xoZH/8aHCL/J87c/xy81/8VFxz/IsPZ/8zS0/8XGiD/Ir/R/yPH2/8XGiD/Fxkf/yPH2/8dd4T/GBog/yPJ3f8jyNr/uru9/xcUGv8cudb/EhITDKi5vRKlvMP/RUpOERwcHRAdOj4QHTk8EBwdHRAdNTgQHTo/EBwcHRAcHB0QSGduEKW4vf+koqQfHzg+EBqz0ewSFRv7EyMr/xq51vsTERb7ExUb+xq41fsau9j7ExUb+xiPp/sZudb7ExUb+xMVG/sZuNX/GKvI/BIUGfMdvdn/IrfL/xcaIP8n1eb/J9Dh/xkcIf8ZGR7/J8/f/xxCSv8ZGyH/J9Dg/ybQ4P8ZHCL/FSQs/yPK3/8UExj/GE1b/ybS5P8ZGB7/Ghwj/ynW5P8p2Ob/Ghwi/yWrtv8p1eH/Ghwi/xocIv8p1uT/J8XT/xkcIv8m1un/Hb7d/xUYH/8hzOr/HtHu/xcaIf8XGB//I8vi/xgxOv8XGSD/I8rg/yPK4P8XGiD/GUFL/yPP6f8SERj/Fhkh/x3A4f8AAAAAJ2f9/ydr//8mZPH/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAlYu38J2v//ydo/f8AAAAAAAAAAAd8/fkFqf//Iob8sAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMY39awWr//8FfP3/AAAAAAAAAAAFm/7/SfD//wR+/f8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAOB/f9B7v//BaX+/wAAAAAAAAAAQ878SAyZ/v9n1v4KAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADu9v8DDJb+/z3N/XgAAAAA3/sAAN/7AADf+wAA3/sAAAAAAAAAAAAAAAAAAN/7AAAAAAAAAAAAAAAAAAAAAAAAj/EAAI/5AACP8QAA3/sAAA==" rel=icon type=image/x-icon><title>Using sentiment analysis for clickbait detection in RSS feeds</title><meta name=description content="Initial thoughtsOne of the things that interested me for a while now is if major wellestablished 
news sites use click bait titles to drive additional traffic totheir sites and generate additional impressions."><meta name=author content="Mitja Felicijan"><link rel=alternate type=application/rss+xml title="Mitja Felicijan's posts" href=https://mitjafelicijan.com/index.xml><link rel=alternate type=application/rss+xml title="Mitja Felicijan's notes" href=https://mitjafelicijan.com/notes.xml><style>:root{--border-color:gainsboro;--border-size:2px;--link-color:blue;--bg-color:#eee}*::selection{background:var(--link-color);color:#fff}*::-moz-selection{background:var(--link-color);color:#fff}*::-webkit-selection{background:var(--link-color);color:#fff}body{padding:2.5rem;max-width:1900px;background:#fff;font-family:sans-serif;line-height:1.35rem;font-size:16px}hr{border:0;border-bottom:var(--border-size)solid var(--border-color);margin-block-start:1.5rem}a{color:var(--link-color);text-decoration:none}h1,h2,h3{line-height:initial}h1{font-size:xx-large}footer{margin-block-start:2rem}cap{text-transform:capitalize}blockquote{font-style:italic}table{max-width:100%;border:var(--border-size)solid var(--border-color);border-collapse:separate;border-spacing:0}table thead tr th{border-bottom:var(--border-size)solid var(--border-color);text-align:left}table th,table td{padding:.5em .8em}ul.list li{padding:.2em 0}ul{line-height:1.35em}pre{text-wrap:nowrap;overflow-x:auto;padding:0 1em;border:var(--border-size)solid var(--border-color)}code{padding:0 3px;font-size:14px;border:0;background:var(--bg-color)}pre code{line-height:1.3em;background:#fff}pre,code,pre *,code *{font-family:monospace}figure{margin-inline-start:0;margin-inline-end:0}figcaption{width:800px;max-width:100%;text-align:center}figcaption p{margin:.3em 0 1.5em;font-style:italic}img,video,audio{width:800px;max-width:100%;border:var(--border-size)solid var(--border-color);padding:.5em}header nav{display:flex;gap:.9rem}article iframe{margin:0!important}audio::-webkit-media-controls-enclosure{border-radius:0}@media only 
screen and (max-width:600px){body{padding:.5em;word-wrap:break-word}header nav{gap:.7rem}header nav .hob{display:none}a{word-wrap:break-word}img,video,audio{padding:0}}</style><header><nav class=main itemscope itemtype=http://schema.org/SiteNavigationElement role=navigation aria-label="Main navigation"><a href=/>Home</a>
-<a href=/#posts>Posts</a>
-<a href=/#notes>Notes</a>
-<a href=/#sideprojects class=hob>Side Projects</a>
-<a href=/vault.html>Vault</a>
-<a href=https://github.com/mitjafelicijan target=_blank>Code</a>
-<a href=/mitjafelicijan.pgp.pub.txt target=_blank class=hob>PGP</a>
-<a href=/curriculum-vitae.html>CV</a>
-<a href=/index.xml target=_blank class=hob>RSS</a></nav></header><main role=main><article itemtype=http://schema.org/Article><h1 itemtype=headline>Using sentiment analysis for clickbait detection in RSS feeds</h1><p><cap>post</cap>, Oct 19, 2019 on <a href=https://mitjafelicijan.com>Mitja Felicijan's blog</a><div><h2 id=initial-thoughts>Initial thoughts</h2><p>One of the things that interested me for a while now is if major well
-established news sites use clickbait titles to drive additional traffic to
-their sites and generate additional impressions.<p>The goal is to see how article titles and the actual content of the articles differ from each
-other, and whether the titles are clickbait.<h2 id=preparing-and-cleaning-data>Preparing and cleaning data</h2><p>For this example I opted to just use the RSS feed of a news website and decided to
-go with <a href=https://www.theguardian.com>The Guardian</a> World news. This gets
-us limited data (~40 articles), and the description (the actual content) is trimmed,
-so it doesn't really reflect the actual article contents.<p>To get better content I could use web scraping, using the RSS feed as a list of links and
-fetching the contents directly from the website, but for this simple example this will
-suffice.<p>There are a couple of requirements we need to install before we continue:<ul><li><code>pip3 install feedparser</code> (parses the RSS feed from a URL)<li><code>pip3 install vaderSentiment</code> (does sentiment polarity analysis)<li><code>pip3 install matplotlib</code> (plots a chart of the results)</ul><p>So first we need to fetch the RSS data and strip the HTML tags from each description.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span><span style=color:#00f>import</span> re
-</span></span><span style=display:flex><span><span style=color:#00f>import</span> feedparser
-</span></span><span style=display:flex><span>
-</span></span><span style=display:flex><span>feed_url = <span style=color:#a31515>&#34;https://www.theguardian.com/world/rss&#34;</span>
-</span></span><span style=display:flex><span>feed = feedparser.parse(feed_url)
-</span></span><span style=display:flex><span>
-</span></span><span style=display:flex><span><span style=color:green># sanitize html</span>
-</span></span><span style=display:flex><span><span style=color:#00f>for</span> item <span style=color:#00f>in</span> feed.entries:
-</span></span><span style=display:flex><span>    item.description = re.sub(<span style=color:#a31515>&#39;&lt;[^&lt;]+?&gt;&#39;</span>, <span style=color:#a31515>&#39;&#39;</span>, item.description)
-</span></span></code></pre><h2 id=perform-sentiment-analysis>Perform sentiment analysis</h2><p>Now that we have cleaned-up data in our <code>feed.entries</code> object, we can start
-performing sentiment analysis.<p>There are many sentiment analysis libraries available, ranging from rule-based
-analysis to machine-learning-based analysis. To keep things
-simple I decided to use the rule-based library
-<a href=https://github.com/cjhutto/vaderSentiment>vaderSentiment</a> from
-<a href=https://github.com/cjhutto>C.J. Hutto</a>. It is a really nice library and quite easy to
-use.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span><span style=color:#00f>from</span> vaderSentiment.vaderSentiment <span style=color:#00f>import</span> SentimentIntensityAnalyzer
-</span></span><span style=display:flex><span>analyser = SentimentIntensityAnalyzer()
-</span></span><span style=display:flex><span>
-</span></span><span style=display:flex><span>sentiment_results = []
-</span></span><span style=display:flex><span><span style=color:#00f>for</span> item <span style=color:#00f>in</span> feed.entries:
-</span></span><span style=display:flex><span>    sentiment_title = analyser.polarity_scores(item.title)
-</span></span><span style=display:flex><span>    sentiment_description = analyser.polarity_scores(item.description)
-</span></span><span style=display:flex><span>    sentiment_results.append([sentiment_title[<span style=color:#a31515>&#39;compound&#39;</span>], sentiment_description[<span style=color:#a31515>&#39;compound&#39;</span>]])
-</span></span></code></pre><p>Now that we have the data as a list of [title, description] score pairs, a shape matplotlib can plot directly as two series, we can
-plot the results to see the difference between the title and description sentiment of each
-article.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span><span style=color:#00f>import</span> matplotlib.pyplot <span style=color:#00f>as</span> plt
-</span></span><span style=display:flex><span>
-</span></span><span style=display:flex><span>plt.rcParams[<span style=color:#a31515>&#39;figure.figsize&#39;</span>] = (15, 3)
-</span></span><span style=display:flex><span>plt.plot(sentiment_results, drawstyle=<span style=color:#a31515>&#39;steps&#39;</span>)
-</span></span><span style=display:flex><span>plt.title(<span style=color:#a31515>&#39;Sentiment analysis relationship between title and description (Guardian World News)&#39;</span>)
-</span></span><span style=display:flex><span>plt.legend([<span style=color:#a31515>&#39;title&#39;</span>, <span style=color:#a31515>&#39;description&#39;</span>])
-</span></span><span style=display:flex><span>plt.show()
-</span></span></code></pre><h2 id=results-and-assets>Results and assets</h2><ol><li>Because of the small sample size, no firm conclusions can be drawn.<li>A rule-based approach may not be the best way of doing this. Using deep
-learning might give better insights.<li><strong>The next step would be to</strong> periodically fetch RSS items, store them over a
-longer period of time, and then repeat the analysis using either machine
-learning or deep learning on top of it.</ol><figure><img src=/posts/sentiment-analysis/guardian-sa-title-desc-relationship.png alt="Relationship between title and description"></figure><p>The figure above displays the difference between title and description sentiment for each
-specific RSS feed item. 1 means positive and -1 means negative sentiment.<p><a href=/posts/sentiment-analysis/sentiment-analysis.ipynb>» Download Jupyter Notebook</a><h2 id=going-further>Going further</h2><ul><li><a href=https://github.com/bswiss/news_mood>Twitter Sentiment Analysis by Bryan Schwierzke</a><li><a href=https://github.com/thisandagain/sentiment>AFINN-based sentiment analysis for Node.js by Andrew Sliwinski</a><li><a href=https://github.com/adeshpande3/LSTM-Sentiment-Analysis>Sentiment Analysis with LSTMs in Tensorflow by Adit Deshpande</a><li><a href=https://github.com/abdulfatir/twitter-sentiment-analysis>Sentiment analysis on tweets using Naive Bayes, SVM, CNN, LSTM, etc. by Abdul Fatir</a></ul></div></article></main><section><hr><h2>Posts from blogs I follow around the net</h2><ul><li><a href=https://utcc.utoronto.ca/~cks/space/blog/linux/NFSv4ServerLockClients target=_blank rel=noopener>Finding which NFSv4 client owns a lock on a Linux NFS(v4) server</a> — <a href=https://utcc.utoronto.ca/~cks/space/blog/>Chris's Wiki :: blog</a><div>A while back I wrote an entry about finding which NFS client owns
-a lock on a Linux NFS server, which turned
-out to be specific to NFS v3 (which I really should have seen coming,
-since it involved NLM and lockd). Finding the NFS v4 client that
-owns a lock is, depending on your perspective, either simpl…<li><a href=http://www.landley.net/notes-2023.html#28-10-2023 target=_blank rel=noopener>October 28, 2023</a> — <a href=http://www.landley.net/notes-2023.html>Rob Landley's Blog Thing for 2023</a><div>Oh good grief, two of my least favorite licensing people, Larry Rosen
-and Bradley Kuhn, are interacting on the OSI's license-discuss
-list where the're doing
-bad computer history and insisting that a guy Larry Rosen
-coincidentally interviewed for a book years ago is clearly the origin of
-somethin…<li><a href="http://offbeatpursuit.com:80/blog/?id=25" target=_blank rel=noopener>A fix by any other name</a> — <a href=http://offbeatpursuit.com:80/blog/>WLOG - blog</a><div>tags:
-i2c, plan9
-Another month, another file system.
-Well, if you can’t fix it in software, fix it in hardware (looking at
-you, bme680, we’re not
-done yet). The show must go on, as they say, and I would like my
-experiments to go on.
-So a “new” addition to the environmental sensor family connected to
-the h…<li><a href=https://mirzapandzo.com/next-image-url-parameter-is-valid-but-upstream-response-is-invalid target=_blank rel=noopener>Next/Image "url" parameter is valid but upstream response is invalid</a> — <a href=https://mirzapandzo.com/>Mirza Pandzo's Blog</a><div>Getting "url" parameter is valid but upstream response is invalid error with Next/Image on WSL2<li><a href=https://drewdevault.com/2023/10/13/Going-off-script.html target=_blank rel=noopener>Going off-script</a> — <a href=https://drewdevault.com>Drew DeVault's blog</a><div>There is a phenomenon in society which I find quite bizarre. Upon our entry to
-this mortal coil, we are endowed with self-awareness, agency, and free will.
-Each of the 8 billion members of this human race represents a unique person, a
-unique worldview, and a unique agency. Yet, many of us have the sam…<li><a href=https://szymonkaliski.com/writing/2023-10-02-building-a-diy-pen-plotter/ target=_blank rel=noopener>Building a DIY Pen Plotter</a> — <a href=http://github.com/dylang/node-rss>Szymon Kaliski</a><div>This article documents my learnings from designing and building a DIY Pen Plotter during the summer of 2023.
-My ultimate goal is to build my…<li><a href=https://neil.computer/notes/chart-of-accounts-for-startups-and-saas-companies/ target=_blank rel=noopener>Chart of Accounts for Startups and SaaS Companies</a> — <a href=https://neil.computer/>Neil Panchal</a><div>Accounting is fundamental to starting a business. You need to have a basic understanding of accounting principles and essential bookkeeping. I had to learn it. There was no choice. For filing taxes, your CPA is going to ask you for an Income Statement (also known as P/L statement). If<li><a href=https://journal.valeriansaliou.name/deploy-a-nomad-cluster-on-alpine-linux-with-vultr/ target=_blank rel=noopener>Deploy a Nomad Cluster on Alpine Linux with Vultr</a> — <a href=https://journal.valeriansaliou.name/>Valerian Saliou</a><div>After spending countless hours trying to understand how to deploy my apps on Kubernetes for the first time to host Mirage, an AI API service that I run, I ended up making myself a promise that the next app I work on would be using a more productive & simpler<li><a href=https://jcs.org/2023/10/25/wifi_da target=_blank rel=noopener>BlueSCSI Wi-Fi Desk Accessory 1.0 Released</a> — <a href=https://jcs.org/>joshua stein</a><div>BlueSCSI Wi-Fi Desk Accessory
-1.0 has been released:
-wifi_da-1.0.sit
-(StuffIt 3 archive)
-SHA256: ccfc9d27dd5da7412d10cef73b81119a1fec3848e4d1d88ff652a07ffdc6a69aSHA1: ff124972f202ceda6d7fa4788110a67ccda6a13a
-This is the initial public release of my BlueSCSI Wi-Fi Desk Accessory for
-classic MacOS.<li><a href=https://michael.stapelberg.ch/posts/2023-10-25-my-all-flash-zfs-network-storage-build/ target=_blank rel=noopener>My 2023 all-flash ZFS NAS (Network Storage) build</a> — <a href=https://michael.stapelberg.ch/>Michael Stapelbergs Website</a><div>For over 10 years now, I run two self-built NAS (Network Storage) devices which serve media (currently via Jellyfin) and run daily backups of all my PCs and servers.
-In this article, I describe my goals, which hardware I picked for my new build (and why) and how I set it up.
-Design Goals
-I use my netw…</ul><p>Generated with <a href=https://git.sr.ht/~sircmpwn/openring target=_blank rel=noopener>openring</a>.</section><footer><hr><p><big><strong>Want to comment or have something to add?</strong></big><p>You can write me an email
-at <a href=mailto:mitja.felicijan@gmail.com>mitja.felicijan@gmail.com</a> or
-catch up with me <a href=https://telegram.me/mitjafelicijan target=_blank>on Telegram</a>.<hr><p>This website does not track you. Content is made available under the <a href=https://creativecommons.org/licenses/by/4.0/ target=_blank rel=noreferrer>CC BY 4.0 license</a> unless
-specified otherwise. Blog is also available as <a href=/index.xml target=_blank>RSS feed</a>.</footer><script>
-	    window.va = window.va || function () { (window.vaq = window.vaq || []).push(arguments); };
-	  </script><script defer src=/_vercel/insights/script.js></script>
\ No newline at end of file
-- 
cgit v1.2.3