| author | Mitja Felicijan <mitja.felicijan@gmail.com> | 2023-10-29 14:41:39 +0100 |
|---|---|---|
| committer | Mitja Felicijan <mitja.felicijan@gmail.com> | 2023-10-29 14:41:39 +0100 |
| commit | 2836163e54e3b94342113314e70ee564c456c43e (patch) | |
| tree | 59b82fc69e83cc6d92846a8e9f510b0bb865cf3b /public/what-i-ve-learned-developing-ad-server.html | |
| parent | d50ea4053ea04abb3a455606d4591a8283af0677 (diff) | |
| download | mitjafelicijan.com-2836163e54e3b94342113314e70ee564c456c43e.tar.gz | |
Added public folder to git so it can get deployed on Vercel
Diffstat (limited to 'public/what-i-ve-learned-developing-ad-server.html')
| -rwxr-xr-x | public/what-i-ve-learned-developing-ad-server.html | 140 |
1 files changed, 140 insertions, 0 deletions
diff --git a/public/what-i-ve-learned-developing-ad-server.html b/public/what-i-ve-learned-developing-ad-server.html new file mode 100755 index 0000000..ca94d08 --- /dev/null +++ b/public/what-i-ve-learned-developing-ad-server.html | |||
| @@ -0,0 +1,140 @@ | |||
| 1 | <!doctype html><html lang=en-us><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><link href="data:image/x-icon;base64,AAABAAEAEBAAAAEAIABoBAAAFgAAACgAAAAQAAAAIAAAAAEAIAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAL69vf8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv76+/8LBwQkAAAAAAAAAAAAAAAC+vb3/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAL+9vf/Bv78JAAAAAAAAAAAAAAAAu7q6/wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7ubr/vr29CAAAAAAAAAAAy8nJAZ6foP8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAnqGj/6GipAoAAAAAHLjU/xcXHf/BwsL/I8XY/yPK3v8XGiD/IbjL/yPF2f8XGiD/Fxkf/yLF2f8gnK3/Fxog/62ztv8fwNf/FRcd/x271v8mz93/GRsi/xkXHf8p097/GiIp/xobIv8p0t3/KdPe/xocIv8fYmr/KNPe/xoZH/8aHCL/J87c/xy81/8VFxz/IsPZ/8zS0/8XGiD/Ir/R/yPH2/8XGiD/Fxkf/yPH2/8dd4T/GBog/yPJ3f8jyNr/uru9/xcUGv8cudb/EhITDKi5vRKlvMP/RUpOERwcHRAdOj4QHTk8EBwdHRAdNTgQHTo/EBwcHRAcHB0QSGduEKW4vf+koqQfHzg+EBqz0ewSFRv7EyMr/xq51vsTERb7ExUb+xq41fsau9j7ExUb+xiPp/sZudb7ExUb+xMVG/sZuNX/GKvI/BIUGfMdvdn/IrfL/xcaIP8n1eb/J9Dh/xkcIf8ZGR7/J8/f/xxCSv8ZGyH/J9Dg/ybQ4P8ZHCL/FSQs/yPK3/8UExj/GE1b/ybS5P8ZGB7/Ghwj/ynW5P8p2Ob/Ghwi/yWrtv8p1eH/Ghwi/xocIv8p1uT/J8XT/xkcIv8m1un/Hb7d/xUYH/8hzOr/HtHu/xcaIf8XGB//I8vi/xgxOv8XGSD/I8rg/yPK4P8XGiD/GUFL/yPP6f8SERj/Fhkh/x3A4f8AAAAAJ2f9/ydr//8mZPH/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAlYu38J2v//ydo/f8AAAAAAAAAAAd8/fkFqf//Iob8sAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMY39awWr//8FfP3/AAAAAAAAAAAFm/7/SfD//wR+/f8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAOB/f9B7v//BaX+/wAAAAAAAAAAQ878SAyZ/v9n1v4KAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADu9v8DDJb+/z3N/XgAAAAA3/sAAN/7AADf+wAA3/sAAAAAAAAAAAAAAAAAAN/7AAAAAAAAAAAAAAAAAAAAAAAAj/EAAI/5AACP8QAA3/sAAA==" rel=icon type=image/x-icon><title>What I've learned developing ad server</title><meta name=description content="For the past year and half I have been developing native advertising server thatcontextually matches ads and displays them in different template forms onvariety of websites."><link 
rel=alternate type=application/rss+xml title="Mitja Felicijan's posts" href=https://mitjafelicijan.com/index.xml><link rel=alternate type=application/rss+xml title="Mitja Felicijan's notes" href=https://mitjafelicijan.com/notes.xml><style>body{padding:1rem;max-width:760px;background:#fff;font-family:sans-serif;line-height:1.35rem;font-size:16px;margin:0 auto}hr{margin-block-start:1.5rem}h1,h2,h3{line-height:initial}h1{font-size:xx-large}footer{margin-block-start:2rem}cap{text-transform:capitalize}table{max-width:100%;width:100%;border-collapse:separate;border-spacing:2px;border:1px solid #000;border-left:1px solid #999;border-top:1px solid #999}blockquote{font-style:italic}table thead{background:#eee}ul.list li{padding:.2em 0}ul{line-height:1.4em}td,th{border:1px solid #000;padding:4px;border-right:1px solid #999;border-bottom:1px solid #999;text-align:left}pre{text-wrap:nowrap;overflow-x:auto;padding:0 1em;border:1px solid #dcdcdc}code{padding:0 3px;font-size:14px;border:0}pre code{line-height:1.3em}pre,code,pre *,code *{font-family:monospace}figure{margin-inline-start:0;margin-inline-end:0}figcaption{text-align:center}figcaption p{margin:.3em 0 0}img,video,audio{max-width:100%}header{display:flex;flex-direction:row;gap:3rem}nav{display:flex;gap:.75rem}nav.main{flex-grow:1}.pstatus-orange{background:gold}.pstatus-green{background:#9acd32}.pstatus-red{background:#cd5c5c}@media only screen and (max-width:600px){body{padding:15px}header{flex-direction:column;gap:1rem}a{word-wrap:break-word}}</style><header><nav class=main itemscope itemtype=http://schema.org/SiteNavigationElement role=toolbar><a href=/>Home</a> | ||
| 2 | <a href=https://git.mitjafelicijan.com/ target=_blank>Git</a> | ||
| 3 | <a href=https://files.mitjafelicijan.com/ target=_blank>Files</a> | ||
| 4 | <a href=/radio.pls target=_blank>Radio</a> | ||
| 5 | <a href=/mitjafelicijan.pgp.pub.txt target=_blank>PGP</a> | ||
| 6 | <a href=/curriculum-vitae.html>CV</a> | ||
| 7 | <a href=/index.xml target=_blank>RSS</a></nav></header><main role=main><article itemtype=http://schema.org/Article><h1 itemtype=headline>What I've learned developing ad server</h1><p><cap>post</cap>, Apr 17, 2017 on <a href=https://mitjafelicijan.com>Mitja Felicijan's blog</a><div><p>For the past year and a half I have been developing a native advertising server that | ||
| 8 | contextually matches ads and displays them in different template forms on a | ||
| 9 | variety of websites. This project grew from serving thousands of ads per day to | ||
| 10 | millions.<p>The system is made up of a few core components:<ul><li>API for serving ads,<li>Utils - cronjobs and queue management tools,<li>Dashboard UI.</ul><p>The initial release used <a href=https://www.mongodb.com/>MongoDB</a> for full-text | ||
| 11 | search, which was later replaced by <a href=https://www.elastic.co/>Elasticsearch</a> for | ||
| 12 | better CPU utilization and better search performance. This also gave us access to many | ||
| 13 | amazing features of <a href=https://www.elastic.co/>Elasticsearch</a>. You should | ||
| 14 | check it out if you do any search-related operations.<p>Because the premise of the server is to provide a native ad experience, ads are | ||
| 15 | rendered on the client side via a simple templating engine. This ensures that ads | ||
| 16 | can be displayed in a number of different ways based on the visual style of the | ||
| 17 | page. It also makes the JavaScript client library quite complex.<p>Now that you know the basics of the product, let's get into the | ||
| 18 | lessons we learned.<h2 id=aggregate-everything>Aggregate everything</h2><p>After the beta version was released, everything (impressions, clicks, etc.) was | ||
| 19 | written to the database at nanosecond resolution. At that time we were using | ||
| 20 | <a href=https://www.postgresql.org/>PostgreSQL</a> and the database quickly grew well above | ||
| 21 | 200 GB of disk space. That was problematic: statistics took a disturbingly long | ||
| 22 | time to aggregate, and indexes on the stats table were no help | ||
| 23 | once we reached 500 million data points.<blockquote><p>There is marketing product information and there is real-life experience. | ||
| 24 | And they tend to be quite the opposite.</blockquote><p>This is why everything is now aggregated on a daily basis, and this | ||
| 25 | data is then fed to Elastic in the form of a daily summary. With this in place we | ||
| 26 | can now track many more dimensions such as zone, channel and platform | ||
| 27 | information. With this information we can adapt the occurrence of ads in | ||
| 28 | specific places more precisely.<p>We have also adopted <a href=https://redis.io/>Redis</a> as a first-class citizen in our | ||
| 29 | stack. Because Redis can also persist its data to a local disk, we have some sort | ||
| 30 | of backup if the server suffers a failure.<p>All the real-time statistics for ad serving and redirecting are kept as | ||
| 31 | counters in a Redis instance, then extracted daily and pushed to Elastic.<h2 id=measure-everything>Measure everything</h2><p>The thing about software is that we really don't know how well it is performing | ||
| 32 | under load until such a load is present. When testing locally everything is fine, | ||
| 33 | but in production things tend to fall apart.<p>To address this we measure everything we can: function execution | ||
| 34 | time (by wrapping functions with timers), server performance (CPU, memory, | ||
| 35 | disk, etc.), and Nginx and <a href=https://uwsgi-docs.readthedocs.io/>uWSGI</a> performance. | ||
| 36 | We sacrifice a bit of performance for the sake of this information. And we store | ||
| 37 | all this information for later analysis.<p><strong>Example of function execution time</strong><pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>{ | ||
| 38 | </span></span><span style=display:flex><span> "get_final_filtered_ads": { | ||
| 39 | </span></span><span style=display:flex><span> "counter": 1931250, | ||
| 40 | </span></span><span style=display:flex><span> "avg": 0.0066143431, | ||
| 41 | </span></span><span style=display:flex><span> "elapsed": 12773.9500310003 | ||
| 42 | </span></span><span style=display:flex><span> }, | ||
| 43 | </span></span><span style=display:flex><span> "store_keywords_statistics": { | ||
| 44 | </span></span><span style=display:flex><span> "counter": 1931011, | ||
| 45 | </span></span><span style=display:flex><span> "avg": 0.0004605267, | ||
| 46 | </span></span><span style=display:flex><span> "elapsed": 889.2821669996 | ||
| 47 | </span></span><span style=display:flex><span> }, | ||
| 48 | </span></span><span style=display:flex><span> "match_by_context": { | ||
| 49 | </span></span><span style=display:flex><span> "counter": 1931011, | ||
| 50 | </span></span><span style=display:flex><span> "avg": 0.0055960716, | ||
| 51 | </span></span><span style=display:flex><span> "elapsed": 10806.0758889999 | ||
| 52 | </span></span><span style=display:flex><span> }, | ||
| 53 | </span></span><span style=display:flex><span> "match_by_high_performance": { | ||
| 54 | </span></span><span style=display:flex><span> "counter": 262, | ||
| 55 | </span></span><span style=display:flex><span> "avg": 0.0152770229, | ||
| 56 | </span></span><span style=display:flex><span> "elapsed": 4.00258 | ||
| 57 | </span></span><span style=display:flex><span> }, | ||
| 58 | </span></span><span style=display:flex><span> "store_impression_stats": { | ||
| 59 | </span></span><span style=display:flex><span> "counter": 1931250, | ||
| 60 | </span></span><span style=display:flex><span> "avg": 0.0006189991, | ||
| 61 | </span></span><span style=display:flex><span> "elapsed": 1195.4419869999 | ||
| 62 | </span></span><span style=display:flex><span> } | ||
| 63 | </span></span><span style=display:flex><span>} | ||
| 64 | </span></span></code></pre><p>We have also started profiling with <a href=https://pymotw.com/2/profile/>cProfile</a> | ||
| 65 | and then visualizing with <a href=http://kcachegrind.sourceforge.net/>KCachegrind</a>. | ||
| 66 | This provides a much more detailed look into code execution.<h2 id=cache-control-is-your-friend>Cache control is your friend</h2><p>Because we use a JavaScript library for rendering ads, we rely on this script | ||
| 67 | extensively, and when needed we must be able to change the behavior of the script | ||
| 68 | quickly.<p>In our case we cannot simply replace the JavaScript URL in the HTML code. It usually | ||
| 69 | takes a day or two for the people who maintain the sites to change the code or add | ||
| 70 | a ?ver=xxx query parameter. This makes rapid deployment and testing very difficult | ||
| 71 | and time-consuming. There is a limit to how much you can test locally.<p>We are now in the process of integrating <a href=https://www.google.com/analytics/tag-manager/>Google Tag | ||
| 72 | Manager</a>, but a couple of websites | ||
| 73 | are developed on the ASP.NET platform, which has some problems with Tag Manager. With | ||
| 74 | the solution below we are certain that we are serving the latest version of the | ||
| 75 | script.<p>It only takes one mistake for users to have the script cached, and if it is | ||
| 76 | cached for 1 year you probably know where the problem is.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span><span style=color:green># nginx ➜ /etc/nginx/sites-available/default | ||
| 77 | </span></span></span><span style=display:flex><span><span style=color:green></span><span style=color:#00f>location</span> <span style=color:#a31515>/static/</span> { | ||
| 78 | </span></span><span style=display:flex><span> <span style=color:#00f>alias</span> <span style=color:#a31515>/path-to-static-content/</span>; | ||
| 79 | </span></span><span style=display:flex><span> <span style=color:#00f>autoindex</span> off; | ||
| 80 | </span></span><span style=display:flex><span> <span style=color:#00f>charset</span> <span style=color:#a31515>utf-8</span>; | ||
| 81 | </span></span><span style=display:flex><span> <span style=color:#00f>gzip</span> on; | ||
| 82 | </span></span><span style=display:flex><span> <span style=color:#00f>gzip_types</span> <span style=color:#a31515>text/plain</span> <span style=color:#a31515>application/javascript</span> <span style=color:#a31515>application/x-javascript</span> <span style=color:#a31515>text/javascript</span> <span style=color:#a31515>text/xml</span> <span style=color:#a31515>text/css</span>; | ||
| 83 | </span></span><span style=display:flex><span> <span style=color:#00f>location</span> ~<span style=color:#a31515>*</span> <span style=color:#a31515>\.(ico|gif|jpeg|jpg|png|woff|ttf|otf|svg|woff2|eot)</span>$ { | ||
| 84 | </span></span><span style=display:flex><span> <span style=color:#00f>expires</span> <span style=color:#a31515>1y</span>; | ||
| 85 | </span></span><span style=display:flex><span> <span style=color:#00f>add_header</span> <span style=color:#a31515>Pragma</span> <span style=color:#a31515>public</span>; | ||
| 86 | </span></span><span style=display:flex><span> <span style=color:#00f>add_header</span> <span style=color:#a31515>Cache-Control</span> <span style=color:#a31515>"public"</span>; | ||
| 87 | </span></span><span style=display:flex><span> } | ||
| 88 | </span></span><span style=display:flex><span> <span style=color:#00f>location</span> ~<span style=color:#a31515>*</span> <span style=color:#a31515>\.(css|js|txt)</span>$ { | ||
| 89 | </span></span><span style=display:flex><span> <span style=color:#00f>expires</span> <span style=color:#a31515>3600s</span>; | ||
| 90 | </span></span><span style=display:flex><span> <span style=color:#00f>add_header</span> <span style=color:#a31515>Pragma</span> <span style=color:#a31515>public</span>; | ||
| 91 | </span></span><span style=display:flex><span> <span style=color:#00f>add_header</span> <span style=color:#a31515>Cache-Control</span> <span style=color:#a31515>"public,</span> <span style=color:#a31515>must-revalidate"</span>; | ||
| 92 | </span></span><span style=display:flex><span> } | ||
| 93 | </span></span><span style=display:flex><span>} | ||
| 94 | </span></span></code></pre><p>Also be careful when redirecting to a URL in your Python code. We noticed that if | ||
| 95 | we didn't precisely set up the cache-control and expires headers in the response, we didn't | ||
| 96 | always get the request on the server (the browser could reuse a cached redirect) and therefore couldn't measure clicks. So when | ||
| 97 | redirecting, do as follows and there will be no problems.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span><span style=color:green># python ➜ bottlepy web micro-framework</span> | ||
| 98 | </span></span><span style=display:flex><span>response = bottle.HTTPResponse(status=302) | ||
| 99 | </span></span><span style=display:flex><span>response.set_header(<span style=color:#a31515>"Cache-Control"</span>, <span style=color:#a31515>"no-store, no-cache, must-revalidate"</span>) | ||
| 100 | </span></span><span style=display:flex><span>response.set_header(<span style=color:#a31515>"Expires"</span>, <span style=color:#a31515>"Thu, 01 Jan 1970 00:00:00 GMT"</span>) | ||
| 101 | </span></span><span style=display:flex><span>response.set_header(<span style=color:#a31515>"Location"</span>, url) | ||
| 102 | </span></span><span style=display:flex><span><span style=color:#00f>return</span> response | ||
| 103 | </span></span></code></pre><blockquote><p>Cache control in browsers is quite aggressive and you need to be precise to | ||
| 104 | avoid future problems. We learned that lesson the hard way.</blockquote><h2 id=learn-nginx>Learn NGINX</h2><p>When deciding on a web server we went with Nginx as a reverse proxy for our | ||
| 105 | applications. We adopted a microservice-oriented architecture early in the | ||
| 106 | project to ensure that when we scale we can easily add additional servers to our | ||
| 107 | cluster. Nginx was crucial for load balancing and static content | ||
| 108 | delivery.<p>At first our config file was quite simple, and later it grew larger. After patching | ||
| 109 | and adding new settings I sat down and learned more about the guts of Nginx. | ||
| 110 | This proved to be very useful and we were able to squeeze much more out of our | ||
| 111 | setup. So I advise you to take your time and read through the | ||
| 112 | <a href=https://nginx.org/en/docs/>documentation</a>. This saved us a lot of headaches. | ||
| 113 | Googling for solutions only goes so far.<h2 id=use-redismemcached>Use Redis/Memcached</h2><p>As explained above, we use caching for basically everything. It is the | ||
| 114 | cornerstone of our services. At first we were very careful about the quantity | ||
| 115 | of things we stored in <a href=https://redis.io/>Redis</a>. But we later found out that | ||
| 116 | the memory footprint is very low even when storing a large amount of data in it.<p>So we gradually increased our usage, up to caching whole HTML outputs of the dashboard. | ||
| 117 | This improved our performance by an order of magnitude. And native TTL | ||
| 118 | support goes hand in hand with our needs.<p>The reason we chose <a href=https://redis.io/>Redis</a> over | ||
| 119 | <a href=https://memcached.org/>Memcached</a> was how well Redis scales out | ||
| 120 | of the box. But all this can be achieved with Memcached.<h2 id=conclusion>Conclusion</h2><p>There are a lot more details that could have been written, and every single topic | ||
| 121 | in here deserves its own post, but you probably got the idea about the problems | ||
| 122 | we faced.</div></article></main><footer><hr><p><big><strong>Want to comment or have something to add?</strong></big><p>You can write me an email | ||
| 137 | at <a href=mailto:m@mitjafelicijan.com>m@mitjafelicijan.com</a> or | ||
| 138 | catch up with me <a href=https://telegram.me/mitjafelicijan target=_blank>on Telegram</a>.<hr><p>This website does not track you. Content is made available under | ||
| 139 | the <a href=https://creativecommons.org/licenses/by/4.0/ target=_blank rel=noreferrer>CC BY 4.0 license</a> unless specified | ||
| 140 | otherwise. Blog is also available as <a href=/index.xml target=_blank>RSS feed</a>.</footer> \ No newline at end of file | ||
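The function timers mentioned under "Measure everything" can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the ad server's actual code: the `timed` decorator, the `STATS` dictionary and the body of `match_by_context` are hypothetical, chosen only to mirror the `counter`, `avg` and `elapsed` fields shown in the post's example output.

```python
import functools
import json
import time
from collections import defaultdict

# Hypothetical in-process store; each entry mirrors the fields of the post's JSON example.
STATS = defaultdict(lambda: {"counter": 0, "avg": 0.0, "elapsed": 0.0})


def timed(func):
    """Wrap func so every call updates its call counter, total elapsed seconds and average."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are measured too.
            entry = STATS[func.__name__]
            entry["counter"] += 1
            entry["elapsed"] += time.perf_counter() - start
            entry["avg"] = entry["elapsed"] / entry["counter"]
    return wrapper


@timed
def match_by_context(keywords):
    # Placeholder body (assumption): the real matching logic is not shown in the post.
    return sorted(keywords)


if __name__ == "__main__":
    for _ in range(3):
        match_by_context(["b", "a"])
    print(json.dumps(STATS, indent=2))
```

Keeping the totals in a plain dictionary keeps the per-call overhead to two clock reads and a few additions; a real deployment would presumably flush these numbers somewhere durable for later analysis, as the post describes.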
