path: root/public/what-i-ve-learned-developing-ad-server.html
authorMitja Felicijan <mitja.felicijan@gmail.com>2023-10-29 14:41:39 +0100
committerMitja Felicijan <mitja.felicijan@gmail.com>2023-10-29 14:41:39 +0100
commit2836163e54e3b94342113314e70ee564c456c43e (patch)
tree59b82fc69e83cc6d92846a8e9f510b0bb865cf3b /public/what-i-ve-learned-developing-ad-server.html
parentd50ea4053ea04abb3a455606d4591a8283af0677 (diff)
downloadmitjafelicijan.com-2836163e54e3b94342113314e70ee564c456c43e.tar.gz
Added public folder to git so it can get deployed on Vercel
Diffstat (limited to 'public/what-i-ve-learned-developing-ad-server.html')
-rwxr-xr-xpublic/what-i-ve-learned-developing-ad-server.html140
1 files changed, 140 insertions, 0 deletions
diff --git a/public/what-i-ve-learned-developing-ad-server.html b/public/what-i-ve-learned-developing-ad-server.html
new file mode 100755
index 0000000..ca94d08
--- /dev/null
+++ b/public/what-i-ve-learned-developing-ad-server.html
@@ -0,0 +1,140 @@
<!doctype html><html lang=en-us><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><title>What I've learned developing ad server</title><meta name=description content="For the past year and a half I have been developing a native advertising server that contextually matches ads and displays them in different template forms on a variety of websites."><link rel=alternate
type=application/rss+xml title="Mitja Felicijan's posts" href=https://mitjafelicijan.com/index.xml><link rel=alternate type=application/rss+xml title="Mitja Felicijan's notes" href=https://mitjafelicijan.com/notes.xml><style>body{padding:1rem;max-width:760px;background:#fff;font-family:sans-serif;line-height:1.35rem;font-size:16px;margin:0 auto}hr{margin-block-start:1.5rem}h1,h2,h3{line-height:initial}h1{font-size:xx-large}footer{margin-block-start:2rem}cap{text-transform:capitalize}table{max-width:100%;width:100%;border-collapse:separate;border-spacing:2px;border:1px solid #000;border-left:1px solid #999;border-top:1px solid #999}blockquote{font-style:italic}table thead{background:#eee}ul.list li{padding:.2em 0}ul{line-height:1.4em}td,th{border:1px solid #000;padding:4px;border-right:1px solid #999;border-bottom:1px solid #999;text-align:left}pre{text-wrap:nowrap;overflow-x:auto;padding:0 1em;border:1px solid #dcdcdc}code{padding:0 3px;font-size:14px;border:0}pre code{line-height:1.3em}pre,code,pre *,code *{font-family:monospace}figure{margin-inline-start:0;margin-inline-end:0}figcaption{text-align:center}figcaption p{margin:.3em 0 0}img,video,audio{max-width:100%}header{display:flex;flex-direction:row;gap:3rem}nav{display:flex;gap:.75rem}nav.main{flex-grow:1}.pstatus-orange{background:gold}.pstatus-green{background:#9acd32}.pstatus-red{background:#cd5c5c}@media only screen and (max-width:600px){body{padding:15px}header{flex-direction:column;gap:1rem}a{word-wrap:break-word}}</style><header><nav class=main itemscope itemtype=http://schema.org/SiteNavigationElement role=toolbar><a href=/>Home</a>
<a href=https://git.mitjafelicijan.com/ target=_blank>Git</a>
<a href=https://files.mitjafelicijan.com/ target=_blank>Files</a>
<a href=/radio.pls target=_blank>Radio</a>
<a href=/mitjafelicijan.pgp.pub.txt target=_blank>PGP</a>
<a href=/curriculum-vitae.html>CV</a>
<a href=/index.xml target=_blank>RSS</a></nav></header><main role=main><article itemtype=http://schema.org/Article><h1 itemtype=headline>What I've learned developing ad server</h1><p><cap>post</cap>, Apr 17, 2017 on <a href=https://mitjafelicijan.com>Mitja Felicijan's blog</a><div><p>For the past year and a half I have been developing a native advertising server that
contextually matches ads and displays them in different template forms on
a variety of websites. This project grew from serving thousands of ads per day to
millions.<p>The system is made up of a couple of core components:<ul><li>API for serving ads,<li>Utils - cronjobs and queue management tools,<li>Dashboard UI.</ul><p>The initial release used <a href=https://www.mongodb.com/>MongoDB</a> for full-text
search, but it was later replaced by <a href=https://www.elastic.co/>Elasticsearch</a> for
better CPU utilization and better search performance. The switch also gave us
access to many other useful features of <a href=https://www.elastic.co/>Elasticsearch</a>. You should
check it out if you do any search-related work.<p>Because the premise of the server is to provide a native ad experience, ads are
rendered on the client side via a simple templating engine. This ensures that ads
can be displayed in a number of different ways based on the visual style of the
page. It also makes the JavaScript client library quite complex.<p>Now that you know the basics of the product, let's get into the
lessons we learned.<h2 id=aggregate-everything>Aggregate everything</h2><p>After the beta version was released, everything (impressions, clicks, etc.) was
written to the database at nanosecond resolution. At that time we were using
<a href=https://www.postgresql.org/>PostgreSQL</a>, and the database quickly grew well above
200GB on disk. That was a problem: statistics took a disturbingly long
time to aggregate, and indexes on the stats table were no help
after we reached 500 million data points.<blockquote><p>There is marketing product information and there is real-life experience.
And they tend to be quite the opposite.</blockquote><p>This is why everything is now aggregated on a daily basis, and the
aggregated data is then fed to Elastic in the form of a daily summary. As a result, we
can now track many more dimensions, such as zone, channel and platform
information. With this information we can adjust how often ads appear in
specific places more precisely.<p>We have also adopted <a href=https://redis.io/>Redis</a> as a first-class citizen in our
stack. Because Redis also persists its data to the local disk, we have some sort
of backup if the server suffers a failure.<p>All the real-time statistics for ad serving and redirecting are kept as
counters in a Redis instance, then extracted daily and pushed to Elastic.<h2 id=measure-everything>Measure everything</h2><p>The thing about software is that we really don't know how well it performs
under load until that load arrives. When testing locally everything is fine,
but in production things tend to fall apart.<p>To deal with this we measure everything we can: function execution
time (by wrapping functions with timers), server performance (CPU, memory,
disk, etc.), and Nginx and <a href=https://uwsgi-docs.readthedocs.io/>uWSGI</a> performance.
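<p>As a minimal sketch of what wrapping functions with timers can look like, the decorator below keeps a call counter and the total elapsed time per function and derives the average from them. The names <code>timed</code>, <code>TIMINGS</code> and <code>report</code>, and the placeholder function body, are assumptions made for illustration, not our production code.<pre tabindex=0 style=background-color:#fff><code>
```python
import functools
import json
import time
from collections import defaultdict

# Hypothetical in-process store: function name -> call counter and total seconds.
TIMINGS = defaultdict(lambda: {"counter": 0, "elapsed": 0.0})


def timed(func):
    """Wrap func so that every call adds to its counter and elapsed time."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Record the call even when func raises.
            stats = TIMINGS[func.__name__]
            stats["counter"] += 1
            stats["elapsed"] += time.perf_counter() - start

    return wrapper


def report():
    """Return the collected timings with a derived per-call average."""
    return {
        name: {**stats, "avg": stats["elapsed"] / stats["counter"]}
        for name, stats in TIMINGS.items()
    }


@timed
def match_by_context(keywords):
    # Placeholder body; the real matching logic is not shown in this post.
    return sorted(keywords)


if __name__ == "__main__":
    match_by_context(["ads", "native"])
    print(json.dumps(report(), indent=2))
```
</code></pre><p>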
We sacrifice a bit of performance for the sake of this information. And we store
all this information for later analysis.<p><strong>Example of function execution time</strong><pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>{
</span></span><span style=display:flex><span>  &#34;get_final_filtered_ads&#34;: {
</span></span><span style=display:flex><span>    &#34;counter&#34;: 1931250,
</span></span><span style=display:flex><span>    &#34;avg&#34;: 0.0066143431,
</span></span><span style=display:flex><span>    &#34;elapsed&#34;: 12773.9500310003
</span></span><span style=display:flex><span>  },
</span></span><span style=display:flex><span>  &#34;store_keywords_statistics&#34;: {
</span></span><span style=display:flex><span>    &#34;counter&#34;: 1931011,
</span></span><span style=display:flex><span>    &#34;avg&#34;: 0.0004605267,
</span></span><span style=display:flex><span>    &#34;elapsed&#34;: 889.2821669996
</span></span><span style=display:flex><span>  },
</span></span><span style=display:flex><span>  &#34;match_by_context&#34;: {
</span></span><span style=display:flex><span>    &#34;counter&#34;: 1931011,
</span></span><span style=display:flex><span>    &#34;avg&#34;: 0.0055960716,
</span></span><span style=display:flex><span>    &#34;elapsed&#34;: 10806.0758889999
</span></span><span style=display:flex><span>  },
</span></span><span style=display:flex><span>  &#34;match_by_high_performance&#34;: {
</span></span><span style=display:flex><span>    &#34;counter&#34;: 262,
</span></span><span style=display:flex><span>    &#34;avg&#34;: 0.0152770229,
</span></span><span style=display:flex><span>    &#34;elapsed&#34;: 4.00258
</span></span><span style=display:flex><span>  },
</span></span><span style=display:flex><span>  &#34;store_impression_stats&#34;: {
</span></span><span style=display:flex><span>    &#34;counter&#34;: 1931250,
</span></span><span style=display:flex><span>    &#34;avg&#34;: 0.0006189991,
</span></span><span style=display:flex><span>    &#34;elapsed&#34;: 1195.4419869999
</span></span><span style=display:flex><span>  }
</span></span><span style=display:flex><span>}
</span></span></code></pre><p>We have also started profiling with <a href=https://pymotw.com/2/profile/>cProfile</a>
and then visualizing with <a href=http://kcachegrind.sourceforge.net/>KCachegrind</a>.
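<p>For readers who have not used it, a small sketch of profiling a single call with the standard library <code>cProfile</code> and <code>pstats</code> modules might look like this. The workload <code>build_index</code> and the helper <code>profile_call</code> are hypothetical stand-ins, not code from our server.<pre tabindex=0 style=background-color:#fff><code>
```python
import cProfile
import pstats


def build_index(n):
    # Hypothetical workload standing in for a slow request handler.
    return {i: str(i) * 3 for i in range(n)}


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, Stats object)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    return result, pstats.Stats(profiler)


if __name__ == "__main__":
    result, stats = profile_call(build_index, 100000)
    # Print the ten most expensive entries by cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
```
</code></pre><p>The collected statistics can also be written to a file with <code>stats.dump_stats()</code>; to open them in KCachegrind they first have to be converted to the callgrind format, for example with a converter such as pyprof2calltree.<p>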
This provides a much more detailed look into code execution.<h2 id=cache-control-is-your-friend>Cache control is your friend</h2><p>Because we use a JavaScript library for rendering ads, we rely on this script
extensively, and when needed we must be able to change its behavior
quickly.<p>In our case we cannot simply replace the JavaScript URL in the HTML code. It usually
takes a day or two for the people who maintain the sites to change the code or add
a ?ver=xxx query parameter. This makes rapid deployment and testing very difficult
and time consuming. There is a limit to how much you can test locally.<p>We are now in the process of integrating <a href=https://www.google.com/analytics/tag-manager/>Google Tag
Manager</a>, but a couple of websites
are built on the ASP.NET platform, which has some problems with Tag Manager. With
the configuration below, browsers keep the script for at most an hour, so we can be confident that a new version of the
script reaches users quickly.<p>It only takes one mistake for users to end up with a stale cached script, and if
it was cached for 1 year, you probably know where the problem is.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span><span style=color:green># nginx ➜ /etc/nginx/sites-available/default
</span></span></span><span style=display:flex><span><span style=color:green></span><span style=color:#00f>location</span> <span style=color:#a31515>/static/</span> {
</span></span><span style=display:flex><span>  <span style=color:#00f>alias</span> <span style=color:#a31515>/path-to-static-content/</span>;
</span></span><span style=display:flex><span>  <span style=color:#00f>autoindex</span> off;
</span></span><span style=display:flex><span>  <span style=color:#00f>charset</span> <span style=color:#a31515>utf-8</span>;
</span></span><span style=display:flex><span>  <span style=color:#00f>gzip</span> on;
</span></span><span style=display:flex><span>  <span style=color:#00f>gzip_types</span> <span style=color:#a31515>text/plain</span> <span style=color:#a31515>application/javascript</span> <span style=color:#a31515>application/x-javascript</span> <span style=color:#a31515>text/javascript</span> <span style=color:#a31515>text/xml</span> <span style=color:#a31515>text/css</span>;
</span></span><span style=display:flex><span>  <span style=color:#00f>location</span> ~<span style=color:#a31515>*</span> <span style=color:#a31515>\.(ico|gif|jpeg|jpg|png|woff|ttf|otf|svg|woff2|eot)</span>$ {
</span></span><span style=display:flex><span>    <span style=color:#00f>expires</span> <span style=color:#a31515>1y</span>;
</span></span><span style=display:flex><span>    <span style=color:#00f>add_header</span> <span style=color:#a31515>Pragma</span> <span style=color:#a31515>public</span>;
</span></span><span style=display:flex><span>    <span style=color:#00f>add_header</span> <span style=color:#a31515>Cache-Control</span> <span style=color:#a31515>&#34;public&#34;</span>;
</span></span><span style=display:flex><span>  }
</span></span><span style=display:flex><span>  <span style=color:#00f>location</span> ~<span style=color:#a31515>*</span> <span style=color:#a31515>\.(css|js|txt)</span>$ {
</span></span><span style=display:flex><span>    <span style=color:#00f>expires</span> <span style=color:#a31515>3600s</span>;
</span></span><span style=display:flex><span>    <span style=color:#00f>add_header</span> <span style=color:#a31515>Pragma</span> <span style=color:#a31515>public</span>;
</span></span><span style=display:flex><span>    <span style=color:#00f>add_header</span> <span style=color:#a31515>Cache-Control</span> <span style=color:#a31515>&#34;public,</span> <span style=color:#a31515>must-revalidate&#34;</span>;
</span></span><span style=display:flex><span>  }
</span></span><span style=display:flex><span>}
</span></span></code></pre><p>Also be careful when redirecting to a URL in your Python code. We noticed that if
we didn't explicitly set the cache control and expiry headers on the response, browsers could reuse the cached redirect, so we didn't
get the request on the server and therefore couldn't measure clicks. So when
redirecting, set the headers as follows.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span><span style=color:green># python ➜ bottlepy web micro-framework</span>
</span></span><span style=display:flex><span><span style=color:#00f>import</span> bottle
</span></span><span style=display:flex><span>
</span></span><span style=display:flex><span>
</span></span><span style=display:flex><span><span style=color:#00f>def</span> redirect_without_cache(url):
</span></span><span style=display:flex><span>    <span style=color:green># Hypothetical helper name: a 302 that must never be served from cache.</span>
</span></span><span style=display:flex><span>    response = bottle.HTTPResponse(status=302)
</span></span><span style=display:flex><span>    response.set_header(<span style=color:#a31515>&#34;Cache-Control&#34;</span>, <span style=color:#a31515>&#34;no-store, no-cache, must-revalidate&#34;</span>)
</span></span><span style=display:flex><span>    response.set_header(<span style=color:#a31515>&#34;Expires&#34;</span>, <span style=color:#a31515>&#34;Thu, 01 Jan 1970 00:00:00 GMT&#34;</span>)
</span></span><span style=display:flex><span>    response.set_header(<span style=color:#a31515>&#34;Location&#34;</span>, url)
</span></span><span style=display:flex><span>    <span style=color:#00f>return</span> response
</span></span></code></pre><blockquote><p>Cache control in browsers is quite aggressive and you need to be precise to
avoid future problems. We learned that lesson the hard way.</blockquote><h2 id=learn-nginx>Learn NGINX</h2><p>When deciding on a web server we went with Nginx as a reverse proxy for our
applications. We adopted a microservice-oriented architecture early in the
project to ensure that when we scale we can easily add servers to our
cluster. Nginx was crucial for load balancing and static content
delivery.<p>At first our config file was quite simple, but it later grew larger. After patching
and adding new settings I sat down and learned more about the guts of Nginx.
This proved to be very useful and we were able to squeeze much more out of our
setup. So I advise you to take your time and read through the
<a href=https://nginx.org/en/docs/>documentation</a>. This saved us a lot of headaches.
Googling for solutions only goes so far.<h2 id=use-redismemcached>Use Redis/Memcached</h2><p>As explained above, we use caching for basically everything. It is the
cornerstone of our services. At first we were very careful about the quantity
of things we stored in <a href=https://redis.io/>Redis</a>. But we later found out that
the memory footprint is very low even when storing a large amount of data in it.<p>So we gradually increased our usage, up to caching whole HTML outputs of the dashboard.
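<p>The pattern of caching rendered HTML with an expiry can be sketched roughly as follows. To keep the example self-contained it uses a tiny in-memory class instead of Redis; <code>TTLCache</code>, <code>cached_page</code>, the key name and the renderer are all hypothetical. With Redis itself the same idea maps to <code>SET key value EX seconds</code> and <code>GET key</code>.<pre tabindex=0 style=background-color:#fff><code>
```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a Redis-style cache with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (expires_at, value)

    def set(self, key, value, ttl):
        self._items[key] = (self._clock() + ttl, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped lazily on read.
            del self._items[key]
            return None
        return value


def cached_page(cache, key, ttl, render):
    """Return cached HTML for key, rendering and storing it on a miss."""
    html = cache.get(key)
    if html is None:
        html = render()
        cache.set(key, html, ttl)
    return html


def render_dashboard():
    # Hypothetical renderer; the real dashboard output is far more expensive.
    return "[dashboard html]"


if __name__ == "__main__":
    cache = TTLCache()
    print(cached_page(cache, "dashboard:overview", 60, render_dashboard))
    # The second call within 60 seconds is served from the cache.
    print(cached_page(cache, "dashboard:overview", 60, render_dashboard))
```
</code></pre><p>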
This improved our performance by an order of magnitude. And Redis's native TTL
support goes hand in hand with our needs.<p>The reason we chose <a href=https://redis.io/>Redis</a> over
<a href=https://memcached.org/>Memcached</a> was how well Redis scales out
of the box. But all of this can be achieved with Memcached as well.<h2 id=conclusion>Conclusion</h2><p>There are a lot more details that could have been written, and every single topic
in here deserves its own post, but you probably got the idea about the problems
we faced.</div></article></main><footer><hr><p><big><strong>Want to comment or have something to add?</strong></big><p>You can write me an email
at <a href=mailto:m@mitjafelicijan.com>m@mitjafelicijan.com</a> or
catch up with me <a href=https://telegram.me/mitjafelicijan target=_blank>on Telegram</a>.<hr><p>This website does not track you. Content is made available under
the <a href=https://creativecommons.org/licenses/by/4.0/ target=_blank rel=noreferrer>CC BY 4.0 license</a> unless specified
otherwise. Blog is also available as <a href=/index.xml target=_blank>RSS feed</a>.</footer> \ No newline at end of file