---
title: Running Golang application as PID 1 with Linux kernel
url: running-golang-application-as-pid1.html
date: 2021-12-25T12:00:00+02:00
type: post
draft: false
---

## Unikernels, kernels, and alike

I have been reading a lot about
[unikernels](https://en.wikipedia.org/wiki/Unikernel) lately and find them
very intriguing. When you push away all the marketing speak and look at the
idea, it makes a lot of sense.

> A unikernel is a specialized, single address space machine image constructed
> by using library operating systems. ([Wikipedia](https://en.wikipedia.org/wiki/Unikernel))

I really like the explanation from the article
[Unikernels: Rise of the Virtual Library Operating System](https://queue.acm.org/detail.cfm?id=2566628).
It is really worth a read.

If we compare a normal operating system to a unikernel side by side, they
look something like this.

![Virtual machines vs Containers vs Unikernels](/assets/pid1/unikernels.png)

From this image, we can see how much the complexity decreases with the use of
unikernels. This comes at a price, of course. Unikernels are hard to get
running and require a lot of work, since there is no full kernel running in
the background to provide network access, drivers, and so on.

So as a half step towards a simpler stack, I started looking into using the
Linux kernel as a base and going from there. I came across this
[YouTube video talking about Building the Simplest Possible Linux System](https://www.youtube.com/watch?v=Sk9TatW9ino)
by [Rob Landley](https://landley.net), and apart from statically compiling the
application to be run as PID 1, there were really no other obstacles.

## What is PID 1?

PID 1 is the first userspace process that the Linux kernel starts once it has
finished booting. It has a couple of properties that no other process has.

- When the process with PID 1 of a PID namespace (a container, for example)
  dies for any reason, all other processes in that namespace are killed with
  the KILL signal.
- When any process that has children dies for any reason, its children are
  re-parented to the process with PID 1.
- Signals whose default action is Term are not delivered to PID 1 unless it
  has installed a handler for them.
- When the process with PID 1 of the whole system dies for any reason, the
  kernel panics, which results in a system crash.

PID 1 is usually an init application, which takes care of starting and
supervising other services like:

- sshd,
- nginx,
- pulseaudio,
- etc.

If you are on a Linux machine, you can check which process has PID 1
by running the following.

```sh
$ cat /proc/1/status
Name: systemd
Umask: 0000
State: S (sleeping)
Tgid: 1
Ngid: 0
Pid: 1
PPid: 0
...
```

As we can see, on my machine the process with an ID of 1 is [systemd](https://systemd.io/),
which is a software suite that provides an array of system components for Linux
operating systems. If you look closely, you can also see that the `PPid`
(process ID of the parent process) is `0`, which confirms that
this process doesn't have a parent.

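The same check can be done from Go by reading `/proc/1/status` and splitting
each line at the first colon. This is only a sketch: `parseStatus` is a
made-up helper name, and `strings.Cut` assumes Go 1.18 or newer.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// parseStatus (hypothetical helper) turns the "Key: value" lines of a
// /proc/<pid>/status file into a map.
func parseStatus(content string) map[string]string {
	fields := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		key, value, ok := strings.Cut(scanner.Text(), ":")
		if !ok {
			continue // not a "Key: value" line
		}
		fields[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return fields
}

func main() {
	data, err := os.ReadFile("/proc/1/status")
	if err != nil {
		fmt.Println("cannot read /proc/1/status:", err)
		return
	}
	status := parseStatus(string(data))
	fmt.Printf("PID 1 is %q, parent PID %s\n", status["Name"], status["PPid"])
}
```
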
## So why even run application as PID 1 instead of just using a container?

Containers are wonderful, but they come with a lot of baggage. Because they
are layered by nature, the images require quite a lot of space and also a
lot of additional software to handle them. They are not as lightweight as they
seem, and many popular images require more than 500 MB of disk space.

Running the application as PID 1 results in a significantly smaller footprint,
as we will see later in the post.

> You could run a simple init system inside a Docker container, as described
> in the article [Docker and the PID 1 zombie reaping problem](https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/).

## The master plan

1. Compile the Linux kernel with the default configuration.
2. Prepare a Hello World application in Golang that is statically compiled.
3. Run it with [QEMU](https://www.qemu.org/), providing the Golang application
   as the init application / PID 1.

For the sake of simplicity, we will not be cross-compiling any of it and will
just use the 64-bit version.

## Compiling Linux kernel

```sh
$ wget https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.15.7.tar.xz
$ tar xf linux-5.15.7.tar.xz

$ cd linux-5.15.7

$ make clean

# read more about this https://stackoverflow.com/a/41886394
$ make defconfig

$ time make -j `nproc`

$ cd ..
```

At this point we have a kernel image located at
`linux-5.15.7/arch/x86_64/boot/bzImage`. We will use this in QEMU later.

To make our lives a bit easier, let's move the kernel image to another place.
Let's create a folder `bin/` in the root of our project with `mkdir -p bin`.

At this point we can copy `bzImage` to the `bin/` folder with
`cp linux-5.15.7/arch/x86_64/boot/bzImage bin/bzImage`.

The folder structure of this experiment should look like this.

```
pid1/
  bin/
    bzImage
  linux-5.15.7/
  linux-5.15.7.tar.xz
```

## Preparing PID 1 application in Golang

This step is relatively easy. The only thing we must keep in mind is that we
need to compile the binary as a static one.

Let's create an `init.go` file in the root of the project.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	for {
		fmt.Println("Hello from Golang")
		time.Sleep(1 * time.Second)
	}
}
```

As you may notice, we have an infinite loop in `main`, with a sleep of 1
second so that we don't overwhelm the CPU. This is because PID 1 should never
complete and/or exit. That would result in a kernel panic. Which is BAD!

There are two ways of compiling a Golang application: statically and
dynamically.

To statically compile the binary, use the following command. For a program
like this one, which uses no cgo, the Go toolchain already produces a static
binary; the `-extldflags=-static` option only matters when the external linker
is involved.

```sh
$ go build -ldflags="-extldflags=-static" init.go
```

We can also check if the binary is statically compiled with:

```sh
$ file init
init: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=Ypu8Zw_4NBxm1Yxg2OYO/H5x721rQ9uTPiDVh-VqP/vZN7kXfGG1zhX_qdHMgH/9vBfmK81tFrygfOXDEOo, not stripped

$ ldd init
not a dynamic executable
```

At this point, we need to create an [initramfs](https://www.linuxfromscratch.org/blfs/view/svn/postlfs/initramfs.html)
(short for "initial RAM file system", the successor of initrd: a cpio archive
of the initial file system that gets loaded into memory during the Linux
startup process).

```sh
$ echo init | cpio -o --format=newc > initramfs
$ mv initramfs bin/initramfs
```

The project at this stage should look like this.

```
pid1/
  bin/
    bzImage
    initramfs
  linux-5.15.7/
  linux-5.15.7.tar.xz
  init.go
```

## Running all of it with QEMU

[QEMU](https://www.qemu.org/) is a free and open-source machine emulator and
virtualizer. It emulates the machine's processor through dynamic binary
translation and provides a set of different hardware and device models for the
machine, enabling it to run a variety of guest operating systems.

We pass the kernel and the initramfs directly with `-kernel` and `-initrd`,
give the machine 128 MB of RAM with `-m 128`, and use
`-append "console=ttyS0"` together with `-serial stdio` so that the kernel
console ends up in our terminal.

```sh
$ qemu-system-x86_64 -serial stdio -kernel bin/bzImage -initrd bin/initramfs -append "console=ttyS0" -m 128
```

The output should look something like this.

```sh
$ qemu-system-x86_64 -serial stdio -kernel bin/bzImage -initrd bin/initramfs -append "console=ttyS0" -m 128
[ 0.000000] Linux version 5.15.7 (m@khan) (gcc (GCC) 11.2.1 20211203 (Red Hat 11.2.1-7), GNU ld version 2.37-10.fc35) #7 SMP Mon Dec 13 10:23:25 CET 2021
[ 0.000000] Command line: console=ttyS0
[ 0.000000] x86/fpu: x87 FPU will use FXSAVE
[ 0.000000] signal: max sigframe size: 1440
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000007fdffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000007fe0000-0x0000000007ffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] SMBIOS 2.8 present.
[ 0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-6.fc35 04/01/2014
[ 0.000000] tsc: Fast TSC calibration failed
...
[ 2.016106] ALSA device list:
[ 2.016329] No soundcards found.
[ 2.053176] Freeing unused kernel image (initmem) memory: 1368K
[ 2.056095] Write protecting the kernel read-only data: 20480k
[ 2.058248] Freeing unused kernel image (text/rodata gap) memory: 2032K
[ 2.058811] Freeing unused kernel image (rodata/data gap) memory: 500K
[ 2.059164] Run /init as init process
Hello from Golang
[ 2.386879] tsc: Refined TSC clocksource calibration: 3192.032 MHz
[ 2.387114] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2e02e31fa14, max_idle_ns: 440795264947 ns
[ 2.387380] clocksource: Switched to clocksource tsc
[ 2.587895] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3
Hello from Golang
Hello from Golang
Hello from Golang
```

The whole [log file is here](/assets/pid1/qemu.log).

## Size comparison

The cool thing about this approach is that the Linux kernel and the application
together only take around 12 MB, which is impressive as hell. Keep in mind that
the size of bzImage (the Linux kernel) could be greatly decreased by going into
`make menuconfig` and removing a ton of features from the kernel. I managed to
get the kernel size down to 2 MB with everything still working properly.

```sh
total 12M
-rw-r--r--. 1 m m 9.3M Dec 13 10:24 bzImage
-rw-r--r--. 1 m m 1.9M Dec 27 01:19 initramfs
```

## Creating ISO image and running it with Gnome Boxes

First we need to create the proper folder structure with `mkdir -p iso/boot/grub`.

Then we need to download the [grub binary](https://github.com/littleosbook/littleosbook/raw/master/files/stage2_eltorito).
You can read more about this program on https://github.com/littleosbook/littleosbook.

```sh
$ wget -O iso/boot/grub/stage2_eltorito https://github.com/littleosbook/littleosbook/raw/master/files/stage2_eltorito
```

Let's copy the kernel and the initramfs into the proper folders. The GRUB
binary is already in place, because `wget -O` saved it straight into
`iso/boot/grub/`.

```sh
$ cp bin/bzImage iso/boot/
$ cp bin/initramfs iso/boot/
```

Let's create a GRUB config file at `iso/boot/grub/menu.lst` (for example with
`nano iso/boot/grub/menu.lst`) with the following contents.

```ini
default=0
timeout=5

title GoAsPID1
kernel /boot/bzImage
initrd /boot/initramfs
```

The `iso/boot/` folder should now look like this.

```sh
$ tree iso/boot/
iso/boot/
├── bzImage
├── grub
│   ├── menu.lst
│   └── stage2_eltorito
└── initramfs
```

Let's create the ISO file by using genisoimage:

```sh
genisoimage -R \
  -b boot/grub/stage2_eltorito \
  -no-emul-boot \
  -boot-load-size 4 \
  -A os \
  -input-charset utf8 \
  -quiet \
  -boot-info-table \
  -o GoAsPID1.iso \
  iso
```

This will produce `GoAsPID1.iso`, which you can use with [VirtualBox](https://www.virtualbox.org/)
or [GNOME Boxes](https://apps.gnome.org/app/org.gnome.Boxes/).

<video src="/assets/pid1/boxes.mp4" controls></video>

## Is running applications as PID 1 even worth it?

Well, the answer to this is not as simple as one would think. Sometimes it is
and sometimes it's not. For embedded systems and very specialized applications
it is worth it for sure. But for normal use, I don't think so. It was an
interesting exercise in compiling kernels and looking at the guts of the Linux
kernel, but sticking to containers for most things is the better option in my
opinion.

An interesting experiment would be creating an image that supports networking
and could be deployed to AWS as an EC2 instance, and observing how it fares.
But in that case, we would need to write some sort of supervisor, running on a
separate EC2 instance, that checks whether the other EC2 instances are running
properly. Remember that if your application fails, the kernel panics and the
whole machine becomes inoperable.
