<!doctype html><html lang=en-us><meta charset=utf-8><meta name=viewport content="width=device-width,initial-scale=1"><meta name=generator content="JBMAFP - github.com/mitjafelicijan/jbmafp"><link href="data:image/x-icon;base64,AAABAAEAEBAAAAEAIABoBAAAFgAAACgAAAAQAAAAIAAAAAEAIAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAL69vf8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAv76+/8LBwQkAAAAAAAAAAAAAAAC+vb3/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAL+9vf/Bv78JAAAAAAAAAAAAAAAAu7q6/wAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC7ubr/vr29CAAAAAAAAAAAy8nJAZ6foP8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAnqGj/6GipAoAAAAAHLjU/xcXHf/BwsL/I8XY/yPK3v8XGiD/IbjL/yPF2f8XGiD/Fxkf/yLF2f8gnK3/Fxog/62ztv8fwNf/FRcd/x271v8mz93/GRsi/xkXHf8p097/GiIp/xobIv8p0t3/KdPe/xocIv8fYmr/KNPe/xoZH/8aHCL/J87c/xy81/8VFxz/IsPZ/8zS0/8XGiD/Ir/R/yPH2/8XGiD/Fxkf/yPH2/8dd4T/GBog/yPJ3f8jyNr/uru9/xcUGv8cudb/EhITDKi5vRKlvMP/RUpOERwcHRAdOj4QHTk8EBwdHRAdNTgQHTo/EBwcHRAcHB0QSGduEKW4vf+koqQfHzg+EBqz0ewSFRv7EyMr/xq51vsTERb7ExUb+xq41fsau9j7ExUb+xiPp/sZudb7ExUb+xMVG/sZuNX/GKvI/BIUGfMdvdn/IrfL/xcaIP8n1eb/J9Dh/xkcIf8ZGR7/J8/f/xxCSv8ZGyH/J9Dg/ybQ4P8ZHCL/FSQs/yPK3/8UExj/GE1b/ybS5P8ZGB7/Ghwj/ynW5P8p2Ob/Ghwi/yWrtv8p1eH/Ghwi/xocIv8p1uT/J8XT/xkcIv8m1un/Hb7d/xUYH/8hzOr/HtHu/xcaIf8XGB//I8vi/xgxOv8XGSD/I8rg/yPK4P8XGiD/GUFL/yPP6f8SERj/Fhkh/x3A4f8AAAAAJ2f9/ydr//8mZPH/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAlYu38J2v//ydo/f8AAAAAAAAAAAd8/fkFqf//Iob8sAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMY39awWr//8FfP3/AAAAAAAAAAAFm/7/SfD//wR+/f8AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAOB/f9B7v//BaX+/wAAAAAAAAAAQ878SAyZ/v9n1v4KAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADu9v8DDJb+/z3N/XgAAAAA3/sAAN/7AADf+wAA3/sAAAAAAAAAAAAAAAAAAN/7AAAAAAAAAAAAAAAAAAAAAAAAj/EAAI/5AACP8QAA3/sAAA==" rel=icon type=image/x-icon><title>What I've learned developing ad server</title><meta name=description content="For the past year and half I have been developing native advertising server thatcontextually matches ads and displays them 
in different template forms onvariety of websites."><link rel=alternate type=application/rss+xml title="Mitja Felicijan's posts" href=https://mitjafelicijan.com/index.xml><link rel=alternate type=application/rss+xml title="Mitja Felicijan's notes" href=https://mitjafelicijan.com/notes.xml><style>:root{--border-color:gainsboro;--border-size:2px;--link-color:blue}body{padding:2.5rem;max-width:1900px;background:#fff;font-family:sans-serif;line-height:1.35rem;font-size:16px}hr{border:0;border-bottom:var(--border-size)solid var(--border-color);margin-block-start:1.5rem}a{color:var(--link-color);text-decoration:none}h1,h2,h3{line-height:initial}h1{font-size:xx-large}footer{margin-block-start:2rem}cap{text-transform:capitalize}blockquote{font-style:italic}table{max-width:100%;border:var(--border-size)solid var(--border-color);border-collapse:separate;border-spacing:0}table thead tr th{border-bottom:var(--border-size)solid var(--border-color)}table th,table td{padding:.5em .8em}ul.list li{padding:.2em 0}ul{line-height:1.35em}pre{text-wrap:nowrap;overflow-x:auto;padding:0 1em;border:var(--border-size)solid var(--border-color)}code{padding:0 3px;font-size:14px;border:0}pre code{line-height:1.3em}pre,code,pre *,code *{font-family:monospace}figure{margin-inline-start:0;margin-inline-end:0}figcaption{text-align:center}figcaption p{margin:.3em 0 0}img,video,audio{width:800px;max-width:100%}header nav{display:flex;gap:.9rem}audio::-webkit-media-controls-enclosure{border-radius:0}@media only screen and (max-width:600px){body{padding:.5em;word-wrap:break-word}header nav{gap:.7rem}header nav .hob{display:none}a{word-wrap:break-word}}</style><header><nav class=main itemscope itemtype=http://schema.org/SiteNavigationElement role=toolbar><a href=/>Home</a>
<a href=/#posts>Posts</a>
<a href=/#notes>Notes</a>
<a href=/#sideprojects class=hob>Side Projects</a>
<a href=/vault.html>Vault</a>
<a href=https://github.com/mitjafelicijan target=_blank>Code</a>
<a href=/mitjafelicijan.pgp.pub.txt target=_blank class=hob>PGP</a>
<a href=/curriculum-vitae.html>CV</a>
<a href=/index.xml target=_blank class=hob>RSS</a></nav></header><main role=main><article itemtype=http://schema.org/Article><h1 itemtype=headline>What I've learned developing ad server</h1><p><cap>post</cap>, Apr 17, 2017 on <a href=https://mitjafelicijan.com>Mitja Felicijan's blog</a><div><p>For the past year and a half I have been developing a native advertising server that
contextually matches ads and displays them in different template forms on a
variety of websites. The project grew from serving thousands of ads per day to
serving millions.<p>The system is made up of a couple of core components:<ul><li>API for serving ads,<li>Utils: cron jobs and queue management tools,<li>Dashboard UI.</ul><p>The initial release used <a href=https://www.mongodb.com/>MongoDB</a> for full-text
search, but we later replaced it with <a href=https://www.elastic.co/>Elasticsearch</a> for
better CPU utilization and better search performance. The switch also gave us
many other useful features of <a href=https://www.elastic.co/>Elasticsearch</a>. You should
check it out if you do any search-related operations.<p>Because the premise of the server is to provide a native ad experience, the ads are
rendered on the client side via a simple templating engine. This ensures that ads
can be displayed in a number of different ways based on the visual style of the
page. It also makes the JavaScript client library quite complex.<p>Now that you know the basics of the product, let's get into the
lessons we learned.<h2 id=aggregate-everything>Aggregate everything</h2><p>After the beta version was released, everything (impressions, clicks, etc.) was
written to the database at nanosecond resolution. At that time we were using
<a href=https://www.postgresql.org/>PostgreSQL</a>, and the database quickly grew well above
200 GB on disk. That was a problem: statistics took a disturbingly long
time to aggregate, and indexes on the stats table no longer helped
once we passed 500 million data points.<blockquote><p>There is marketing product information and there is real-life experience.
And they tend to be quite the opposite.</blockquote><p>This is why everything is now aggregated on a daily basis, and this
data is then fed to Elasticsearch in the form of a daily summary. As a result we
can now track many more dimensions, such as zone, channel and platform
information. With this information we can adjust the occurrence of ads in
specific places more precisely.<p>We have also adopted <a href=https://redis.io/>Redis</a> as a first-class citizen in our
stack. Because Redis also persists data to the local disk, we have some sort
of backup if the server suffers a failure.<p>All the real-time statistics for ad serving and redirecting are kept as
counters in a Redis instance, then extracted daily and pushed to Elasticsearch.<h2 id=measure-everything>Measure everything</h2><p>The thing about software is that we don't really know how well it performs
under load until that load actually arrives. When testing locally everything is fine,
but in production things tend to fall apart.<p>Our solution is to measure everything we can: function execution
time (by wrapping functions with timers), server performance (CPU, memory,
disk, etc.), and Nginx and <a href=https://uwsgi-docs.readthedocs.io/>uWSGI</a> performance.
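<p>A minimal sketch of what wrapping functions with timers could look like, assuming the <code>counter</code>, <code>avg</code> and <code>elapsed</code> fields in the statistics example in this section are a call count, a mean duration and a total duration. The names <code>timed</code> and <code>STATS</code> are hypothetical, and this is not the ad server's actual implementation.<pre tabindex=0 style=background-color:#fff><code>import functools
import time

# Hypothetical in-process store: function name -> running totals.
STATS = {}

def timed(func):
    """Record the call count and total elapsed seconds of func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Update the totals even when func raises.
            entry = STATS.setdefault(func.__name__, {"counter": 0, "elapsed": 0.0})
            entry["counter"] += 1
            entry["elapsed"] += time.perf_counter() - start
            entry["avg"] = entry["elapsed"] / entry["counter"]
    return wrapper

@timed
def match_by_context(keywords):
    # Stand-in body; the real matching logic is not shown in the post.
    return sorted(keywords)

match_by_context(["b", "a"])
match_by_context(["c"])
print(STATS["match_by_context"])
</code></pre><p>Keeping only running totals per function keeps the bookkeeping cost of each call small.<p>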
We sacrifice a bit of performance for the sake of this information. And we store
all this information for later analysis.<p><strong>Example of function execution time</strong><pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span>{
</span></span><span style=display:flex><span>  &#34;get_final_filtered_ads&#34;: {
</span></span><span style=display:flex><span>    &#34;counter&#34;: 1931250,
</span></span><span style=display:flex><span>    &#34;avg&#34;: 0.0066143431,
</span></span><span style=display:flex><span>    &#34;elapsed&#34;: 12773.9500310003
</span></span><span style=display:flex><span>  },
</span></span><span style=display:flex><span>  &#34;store_keywords_statistics&#34;: {
</span></span><span style=display:flex><span>    &#34;counter&#34;: 1931011,
</span></span><span style=display:flex><span>    &#34;avg&#34;: 0.0004605267,
</span></span><span style=display:flex><span>    &#34;elapsed&#34;: 889.2821669996
</span></span><span style=display:flex><span>  },
</span></span><span style=display:flex><span>  &#34;match_by_context&#34;: {
</span></span><span style=display:flex><span>    &#34;counter&#34;: 1931011,
</span></span><span style=display:flex><span>    &#34;avg&#34;: 0.0055960716,
</span></span><span style=display:flex><span>    &#34;elapsed&#34;: 10806.0758889999
</span></span><span style=display:flex><span>  },
</span></span><span style=display:flex><span>  &#34;match_by_high_performance&#34;: {
</span></span><span style=display:flex><span>    &#34;counter&#34;: 262,
</span></span><span style=display:flex><span>    &#34;avg&#34;: 0.0152770229,
</span></span><span style=display:flex><span>    &#34;elapsed&#34;: 4.00258
</span></span><span style=display:flex><span>  },
</span></span><span style=display:flex><span>  &#34;store_impression_stats&#34;: {
</span></span><span style=display:flex><span>    &#34;counter&#34;: 1931250,
</span></span><span style=display:flex><span>    &#34;avg&#34;: 0.0006189991,
</span></span><span style=display:flex><span>    &#34;elapsed&#34;: 1195.4419869999
</span></span><span style=display:flex><span>  }
</span></span><span style=display:flex><span>}
</span></span></code></pre><p>We have also started profiling with <a href=https://pymotw.com/2/profile/>cProfile</a>
and then visualizing with <a href=http://kcachegrind.sourceforge.net/>KCachegrind</a>.
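<p>A minimal sketch of collecting a profile with the standard-library <code>cProfile</code> and <code>pstats</code> modules. The function <code>match_ads</code> and the helper <code>profile_call</code> are hypothetical stand-ins, not code from the ad server.<pre tabindex=0 style=background-color:#fff><code>import cProfile
import io
import pstats

def match_ads(n):
    # Hypothetical stand-in for real ad-matching work.
    return sum(i * i for i in range(n))

def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the five entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()

result, report = profile_call(match_ads, 1000)
print(report)
</code></pre><p>Calling <code>profiler.dump_stats(path)</code> writes the raw data to a file instead. KCachegrind needs that file converted to the callgrind format first, for example with a tool such as pyprof2calltree; the post does not say which converter was used.<p>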
This provides a much more detailed look into code execution.<h2 id=cache-control-is-your-friend>Cache control is your friend</h2><p>Because we use a JavaScript library for rendering ads, we rely on this script
extensively, and when needed we must be able to change its behavior
quickly.<p>In our case we cannot simply replace the JavaScript URL in the HTML code. It usually
takes a day or two for the people who maintain the sites to change the code or add a
?ver=xxx query parameter. This makes rapid deployment and testing very difficult
and time-consuming. There is a limit to how much you can test locally.<p>We are now in the process of integrating <a href=https://www.google.com/analytics/tag-manager/>Google Tag
Manager</a>, but a couple of the websites
are built on the ASP.NET platform, which has some problems with Tag Manager. With
the solution below we can be sure browsers pick up the latest version of the
script within an hour.<p>It only takes one mistake for users to end up with a stale script in their cache, and if
it was cached for 1 year you can probably see where the problem is.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span><span style=color:green># nginx ➜ /etc/nginx/sites-available/default
</span></span></span><span style=display:flex><span><span style=color:green></span><span style=color:#00f>location</span> <span style=color:#a31515>/static/</span> {
</span></span><span style=display:flex><span>  <span style=color:#00f>alias</span> <span style=color:#a31515>/path-to-static-content/</span>;
</span></span><span style=display:flex><span>  <span style=color:#00f>autoindex</span> off;
</span></span><span style=display:flex><span>  <span style=color:#00f>charset</span> <span style=color:#a31515>utf-8</span>;
</span></span><span style=display:flex><span>  <span style=color:#00f>gzip</span> on;
</span></span><span style=display:flex><span>  <span style=color:#00f>gzip_types</span> <span style=color:#a31515>text/plain</span> <span style=color:#a31515>application/javascript</span> <span style=color:#a31515>application/x-javascript</span> <span style=color:#a31515>text/javascript</span> <span style=color:#a31515>text/xml</span> <span style=color:#a31515>text/css</span>;
</span></span><span style=display:flex><span>  <span style=color:#00f>location</span> ~<span style=color:#a31515>*</span> <span style=color:#a31515>\.(ico|gif|jpeg|jpg|png|woff|ttf|otf|svg|woff2|eot)</span>$ {
</span></span><span style=display:flex><span>    <span style=color:#00f>expires</span> <span style=color:#a31515>1y</span>;
</span></span><span style=display:flex><span>    <span style=color:#00f>add_header</span> <span style=color:#a31515>Pragma</span> <span style=color:#a31515>public</span>;
</span></span><span style=display:flex><span>    <span style=color:#00f>add_header</span> <span style=color:#a31515>Cache-Control</span> <span style=color:#a31515>&#34;public&#34;</span>;
</span></span><span style=display:flex><span>  }
</span></span><span style=display:flex><span>  <span style=color:#00f>location</span> ~<span style=color:#a31515>*</span> <span style=color:#a31515>\.(css|js|txt)</span>$ {
</span></span><span style=display:flex><span>    <span style=color:#00f>expires</span> <span style=color:#a31515>3600s</span>;
</span></span><span style=display:flex><span>    <span style=color:#00f>add_header</span> <span style=color:#a31515>Pragma</span> <span style=color:#a31515>public</span>;
</span></span><span style=display:flex><span>    <span style=color:#00f>add_header</span> <span style=color:#a31515>Cache-Control</span> <span style=color:#a31515>&#34;public,</span> <span style=color:#a31515>must-revalidate&#34;</span>;
</span></span><span style=display:flex><span>  }
</span></span><span style=display:flex><span>}
</span></span></code></pre><p>Also be careful when redirecting to a URL in your Python code. We noticed that if
we didn't precisely set the cache control and expiry headers in the response, the
request never reached our server and we therefore couldn't measure clicks. So when
redirecting, set the headers as follows.<pre tabindex=0 style=background-color:#fff><code><span style=display:flex><span><span style=color:green># python ➜ bottlepy web micro-framework</span>
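</span></span><span style=display:flex><span><span style=color:#00f>import</span> bottle
</span></span><span style=display:flex><span>
</span></span><span style=display:flex><span><span style=color:#00f>def</span> redirect_without_caching(url):
</span></span><span style=display:flex><span>    <span style=color:green># Hypothetical helper name; builds a 302 that browsers must not cache.</span>
</span></span><span style=display:flex><span>    response = bottle.HTTPResponse(status=302)
</span></span><span style=display:flex><span>    response.set_header(<span style=color:#a31515>&#34;Cache-Control&#34;</span>, <span style=color:#a31515>&#34;no-store, no-cache, must-revalidate&#34;</span>)
</span></span><span style=display:flex><span>    response.set_header(<span style=color:#a31515>&#34;Expires&#34;</span>, <span style=color:#a31515>&#34;Thu, 01 Jan 1970 00:00:00 GMT&#34;</span>)
</span></span><span style=display:flex><span>    response.set_header(<span style=color:#a31515>&#34;Location&#34;</span>, url)
</span></span><span style=display:flex><span>    <span style=color:#00f>return</span> response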
</span></span></code></pre><blockquote><p>Cache control in browsers is quite aggressive and you need to be precise to
avoid future problems. We learned that lesson the hard way.</blockquote><h2 id=learn-nginx>Learn NGINX</h2><p>When deciding on a web server, we went with Nginx as a reverse proxy for our
applications. We adopted a microservice-oriented architecture early in the
project so that, as we scale, we can easily add servers to our
cluster. Nginx was crucial for load balancing and static content
delivery.<p>At first our config file was quite simple, but it grew over time. After a round of patching
and adding new settings, I sat down and learned more about the internals of Nginx.
This proved very useful, and we were able to squeeze much more out of our
setup. So I advise you to take your time and read through the
<a href=https://nginx.org/en/docs/>documentation</a>. It saved us a lot of headaches.
Googling for solutions only goes so far.<h2 id=use-redismemcached>Use Redis/Memcached</h2><p>As explained above, we use caching for basically everything. It is the
cornerstone of our services. At first we were very careful about how much
we stored in <a href=https://redis.io/>Redis</a>, but we later found out that
the memory footprint stays low even when storing a large amount of data in it.<p>So we gradually expanded our usage, up to caching the whole HTML output of the dashboard.
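<p>A minimal sketch of caching rendered pages with an expiry time. To stay runnable without a server it uses a hypothetical <code>InMemoryStore</code> that mimics only two Redis operations, <code>setex</code> and <code>get</code>. The helper <code>cached_page</code>, the key name and the TTL value are assumptions as well, not the dashboard's actual code.<pre tabindex=0 style=background-color:#fff><code>import time

class InMemoryStore:
    """Stand-in for a Redis client; only setex() and get() are mimicked."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}

    def setex(self, name, ttl_seconds, value):
        # Store the value together with its expiry time.
        self._data[name] = (value, self._clock() + ttl_seconds)

    def get(self, name):
        value, expires_at = self._data.get(name, (None, 0.0))
        if value is None or self._clock() >= expires_at:
            # Missing or expired: drop the entry and report a miss.
            self._data.pop(name, None)
            return None
        return value

def cached_page(store, key, render, ttl_seconds=60):
    """Return the cached HTML for key, rendering and caching it on a miss."""
    html = store.get(key)
    if html is None:
        html = render()
        store.setex(key, ttl_seconds, html)
    return html

store = InMemoryStore()
calls = []

def render_dashboard():
    # Hypothetical renderer; counts how often it really runs.
    calls.append(1)
    return "[dashboard html]"

print(cached_page(store, "dashboard:overview", render_dashboard))  # first call renders
print(cached_page(store, "dashboard:overview", render_dashboard))  # second call is served from the cache
print(len(calls))
</code></pre><p>With a real Redis client such as redis-py, the same two calls (<code>setex(name, time, value)</code> and <code>get(name)</code>) would replace the stand-in, and Redis would expire the keys itself.<p>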
This improved our performance by an order of magnitude. And Redis's native TTL
support fits our needs well, since cached entries expire on their own.<p>The reason we chose <a href=https://redis.io/>Redis</a> over
<a href=https://memcached.org/>Memcached</a> was that Redis scales well out
of the box. But all of this can be achieved with Memcached too.<h2 id=conclusion>Conclusion</h2><p>There are a lot more details that could have been written, and every single topic
in here deserves its own post, but you probably got the idea about the problems
we faced.</div></article></main><footer><hr><p><big><strong>Want to comment or have something to add?</strong></big><p>You can write me an email
at <a href=mailto:mitja.felicijan@gmail.com>mitja.felicijan@gmail.com</a> or
catch up with me <a href=https://telegram.me/mitjafelicijan target=_blank>on Telegram</a>.<hr><p>This website does not track you. Content is made available under the <a href=https://creativecommons.org/licenses/by/4.0/ target=_blank rel=noreferrer>CC BY 4.0 license</a> unless
specified otherwise. Blog is also available as <a href=/index.xml target=_blank>RSS feed</a>.</footer><script>
	    window.va = window.va || function () { (window.vaq = window.vaq || []).push(arguments); };
	  </script><script defer src=/_vercel/insights/script.js></script>