From c142732367dde6a2e350b33dfebbfbcd8bd891ce Mon Sep 17 00:00:00 2001 From: Mitja Felicijan Date: Mon, 27 Dec 2021 01:30:05 +0100 Subject: New post: Golang as PID 1 --- Makefile | 3 + assets/pid1/qemu.log | 320 ++++++++++++ assets/pid1/unikernels.png | Bin 0 -> 48567 bytes assets/pid1/unikernels.svg | 578 +++++++++++++++++++++ ...021-12-25-running-golang-application-as-pid1.md | 229 ++++++++ 5 files changed, 1130 insertions(+) create mode 100644 assets/pid1/qemu.log create mode 100644 assets/pid1/unikernels.png create mode 100644 assets/pid1/unikernels.svg create mode 100644 posts/2021-12-25-running-golang-application-as-pid1.md diff --git a/Makefile b/Makefile index 450f700..049c04b 100644 --- a/Makefile +++ b/Makefile @@ -33,6 +33,9 @@ build: alternator --build rm template/openring-build.html +server: + python3 -m http.server 8000 --directory public + deploy: build cd public && scp -r * root@165.22.87.180:/var/www/html/mitjafelicijan.com/ ssh root@165.22.87.180 chown www-data:www-data /var/www/html/mitjafelicijan.com/ -Rf diff --git a/assets/pid1/qemu.log b/assets/pid1/qemu.log new file mode 100644 index 0000000..11be312 --- /dev/null +++ b/assets/pid1/qemu.log @@ -0,0 +1,320 @@ +[ 0.000000] Linux version 5.15.7 (m@khan) (gcc (GCC) 11.2.1 20211203 (Red Hat 11.2.1-7), GNU ld version 2.37-10.fc35) #7 SMP Mon Dec 13 10:23:25 CET 2021 +[ 0.000000] Command line: console=ttyS0 +[ 0.000000] x86/fpu: x87 FPU will use FXSAVE +[ 0.000000] signal: max sigframe size: 1440 +[ 0.000000] BIOS-provided physical RAM map: +[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable +[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved +[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved +[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000007fdffff] usable +[ 0.000000] BIOS-e820: [mem 0x0000000007fe0000-0x0000000007ffffff] reserved +[ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved +[ 
0.000000] NX (Execute Disable) protection: active +[ 0.000000] SMBIOS 2.8 present. +[ 0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-6.fc35 04/01/2014 +[ 0.000000] tsc: Fast TSC calibration failed +[ 0.000000] last_pfn = 0x7fe0 max_arch_pfn = 0x400000000 +[ 0.000000] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT +[ 0.000000] found SMP MP-table at [mem 0x000f5c40-0x000f5c4f] +[ 0.000000] RAMDISK: [mem 0x07e06000-0x07fdffff] +[ 0.000000] ACPI: Early table checksum verification disabled +[ 0.000000] ACPI: RSDP 0x00000000000F5A80 000014 (v00 BOCHS ) +[ 0.000000] ACPI: RSDT 0x0000000007FE1905 000034 (v01 BOCHS BXPC 00000001 BXPC 00000001) +[ 0.000000] ACPI: FACP 0x0000000007FE17B9 000074 (v01 BOCHS BXPC 00000001 BXPC 00000001) +[ 0.000000] ACPI: DSDT 0x0000000007FE0040 001779 (v01 BOCHS BXPC 00000001 BXPC 00000001) +[ 0.000000] ACPI: FACS 0x0000000007FE0000 000040 +[ 0.000000] ACPI: APIC 0x0000000007FE182D 000078 (v01 BOCHS BXPC 00000001 BXPC 00000001) +[ 0.000000] ACPI: HPET 0x0000000007FE18A5 000038 (v01 BOCHS BXPC 00000001 BXPC 00000001) +[ 0.000000] ACPI: WAET 0x0000000007FE18DD 000028 (v01 BOCHS BXPC 00000001 BXPC 00000001) +[ 0.000000] ACPI: Reserving FACP table memory at [mem 0x7fe17b9-0x7fe182c] +[ 0.000000] ACPI: Reserving DSDT table memory at [mem 0x7fe0040-0x7fe17b8] +[ 0.000000] ACPI: Reserving FACS table memory at [mem 0x7fe0000-0x7fe003f] +[ 0.000000] ACPI: Reserving APIC table memory at [mem 0x7fe182d-0x7fe18a4] +[ 0.000000] ACPI: Reserving HPET table memory at [mem 0x7fe18a5-0x7fe18dc] +[ 0.000000] ACPI: Reserving WAET table memory at [mem 0x7fe18dd-0x7fe1904] +[ 0.000000] No NUMA configuration found +[ 0.000000] Faking a node at [mem 0x0000000000000000-0x0000000007fdffff] +[ 0.000000] NODE_DATA(0) allocated [mem 0x07e02000-0x07e05fff] +[ 0.000000] Zone ranges: +[ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff] +[ 0.000000] DMA32 [mem 0x0000000001000000-0x0000000007fdffff] +[ 0.000000] Normal empty +[ 0.000000] 
Movable zone start for each node +[ 0.000000] Early memory node ranges +[ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009efff] +[ 0.000000] node 0: [mem 0x0000000000100000-0x0000000007fdffff] +[ 0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x0000000007fdffff] +[ 0.000000] On node 0, zone DMA: 1 pages in unavailable ranges +[ 0.000000] On node 0, zone DMA: 97 pages in unavailable ranges +[ 0.000000] On node 0, zone DMA32: 32 pages in unavailable ranges +[ 0.000000] ACPI: PM-Timer IO Port: 0x608 +[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1]) +[ 0.000000] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23 +[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) +[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level) +[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) +[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level) +[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level) +[ 0.000000] ACPI: Using ACPI (MADT) for SMP configuration information +[ 0.000000] ACPI: HPET id: 0x8086a201 base: 0xfed00000 +[ 0.000000] smpboot: Allowing 1 CPUs, 0 hotplug CPUs +[ 0.000000] PM: hibernation: Registered nosave memory: [mem 0x00000000-0x00000fff] +[ 0.000000] PM: hibernation: Registered nosave memory: [mem 0x0009f000-0x0009ffff] +[ 0.000000] PM: hibernation: Registered nosave memory: [mem 0x000a0000-0x000effff] +[ 0.000000] PM: hibernation: Registered nosave memory: [mem 0x000f0000-0x000fffff] +[ 0.000000] [mem 0x08000000-0xfffbffff] available for PCI devices +[ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns +[ 0.000000] setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:1 nr_node_ids:1 +[ 0.000000] percpu: Embedded 52 pages/cpu s174360 r8192 d30440 u2097152 +[ 0.000000] Built 1 zonelists, mobility grouping on. 
Total pages: 31968 +[ 0.000000] Policy zone: DMA32 +[ 0.000000] Kernel command line: console=ttyS0 +[ 0.000000] Dentry cache hash table entries: 16384 (order: 5, 131072 bytes, linear) +[ 0.000000] Inode-cache hash table entries: 8192 (order: 4, 65536 bytes, linear) +[ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off +[ 0.000000] Memory: 94464K/130552K available (14350K kernel code, 2582K rwdata, 3596K rodata, 1368K init, 1488K bss, 35828K reserved, 0K cma-reserved) +[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1 +[ 0.000000] rcu: Hierarchical RCU implementation. +[ 0.000000] rcu: RCU event tracing is enabled. +[ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=1. +[ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies. +[ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1 +[ 0.000000] NR_IRQS: 4352, nr_irqs: 256, preallocated irqs: 16 +[ 0.000000] random: get_random_bytes called from start_kernel+0x492/0x65f with crng_init=0 +[ 0.000000] Console: colour VGA+ 80x25 +[ 0.000000] printk: console [ttyS0] enabled +[ 0.000000] ACPI: Core revision 20210730 +[ 0.000000] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns +[ 0.002000] APIC: Switch to symmetric I/O mode setup +[ 0.005000] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 +[ 0.013000] tsc: Unable to calibrate against PIT +[ 0.014000] tsc: using HPET reference calibration +[ 0.014000] tsc: Detected 3189.099 MHz processor +[ 0.001005] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x2df8103a89b, max_idle_ns: 440795220785 ns +[ 0.002672] Calibrating delay loop (skipped), value calculated using timer frequency.. 6378.19 BogoMIPS (lpj=3189099) +[ 0.002960] pid_max: default: 32768 minimum: 301 +[ 0.003627] LSM: Security Framework initializing +[ 0.004329] SELinux: Initializing. 
+[ 0.005051] Mount-cache hash table entries: 512 (order: 0, 4096 bytes, linear) +[ 0.005202] Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes, linear) +[ 0.020479] process: using AMD E400 aware idle routine +[ 0.020699] Last level iTLB entries: 4KB 512, 2MB 255, 4MB 127 +[ 0.020832] Last level dTLB entries: 4KB 512, 2MB 255, 4MB 127, 1GB 0 +[ 0.021165] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization +[ 0.021438] Spectre V2 : Mitigation: Full AMD retpoline +[ 0.021586] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch +[ 0.238228] Freeing SMP alternatives memory: 44K +[ 0.242641] random: fast init done +[ 0.350203] smpboot: CPU0: AMD QEMU Virtual CPU version 2.5+ (family: 0xf, model: 0x6b, stepping: 0x1) +[ 0.355136] Performance Events: PMU not available due to virtualization, using software events only. +[ 0.356607] rcu: Hierarchical SRCU implementation. +[ 0.360890] smp: Bringing up secondary CPUs ... +[ 0.361082] smp: Brought up 1 node, 1 CPU +[ 0.361253] smpboot: Max logical packages: 1 +[ 0.361394] smpboot: Total of 1 processors activated (6378.19 BogoMIPS) +[ 0.371481] devtmpfs: initialized +[ 0.378162] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns +[ 0.378478] futex hash table entries: 256 (order: 2, 16384 bytes, linear) +[ 0.381522] PM: RTC time: 00:19:47, date: 2021-12-27 +[ 0.384915] NET: Registered PF_NETLINK/PF_ROUTE protocol family +[ 0.387403] audit: initializing netlink subsys (disabled) +[ 0.391765] audit: type=2000 audit(1640564386.402:1): state=initialized audit_enabled=0 res=1 +[ 0.392916] thermal_sys: Registered thermal governor 'step_wise' +[ 0.392950] thermal_sys: Registered thermal governor 'user_space' +[ 0.393202] cpuidle: using governor menu +[ 0.394085] ACPI: bus type PCI registered +[ 0.396583] PCI: Using configuration type 1 for base access +[ 0.415012] Kprobes globally optimized +[ 0.416844] HugeTLB registered 
2.00 MiB page size, pre-allocated 0 pages +[ 0.420649] cryptomgr_test (20) used greatest stack depth: 15680 bytes left +[ 0.426071] ACPI: Added _OSI(Module Device) +[ 0.426182] ACPI: Added _OSI(Processor Device) +[ 0.426279] ACPI: Added _OSI(3.0 _SCP Extensions) +[ 0.426376] ACPI: Added _OSI(Processor Aggregator Device) +[ 0.426606] ACPI: Added _OSI(Linux-Dell-Video) +[ 0.426709] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio) +[ 0.426821] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics) +[ 0.439511] ACPI: 1 ACPI AML tables successfully acquired and loaded +[ 0.452709] ACPI: Interpreter enabled +[ 0.453468] ACPI: PM: (supports S0 S3 S4 S5) +[ 0.453603] ACPI: Using IOAPIC for interrupt routing +[ 0.454022] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug +[ 0.455266] ACPI: Enabled 2 GPEs in block 00 to 0F +[ 0.480013] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff]) +[ 0.480702] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI HPX-Type3] +[ 0.481425] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge. 
+[ 0.483666] PCI host bridge to bus 0000:00 +[ 0.483848] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window] +[ 0.484096] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window] +[ 0.484237] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window] +[ 0.484480] pci_bus 0000:00: root bus resource [mem 0x08000000-0xfebfffff window] +[ 0.484578] pci_bus 0000:00: root bus resource [mem 0x100000000-0x17fffffff window] +[ 0.484870] pci_bus 0000:00: root bus resource [bus 00-ff] +[ 0.486588] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000 +[ 0.492625] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100 +[ 0.493621] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180 +[ 0.495015] pci 0000:00:01.1: reg 0x20: [io 0xc040-0xc04f] +[ 0.495760] pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io 0x01f0-0x01f7] +[ 0.495936] pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io 0x03f6] +[ 0.496095] pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io 0x0170-0x0177] +[ 0.496598] pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io 0x0376] +[ 0.497793] pci 0000:00:01.3: [8086:7113] type 00 class 0x068000 +[ 0.498219] pci 0000:00:01.3: quirk: [io 0x0600-0x063f] claimed by PIIX4 ACPI +[ 0.498384] pci 0000:00:01.3: quirk: [io 0x0700-0x070f] claimed by PIIX4 SMB +[ 0.499487] pci 0000:00:02.0: [1234:1111] type 00 class 0x030000 +[ 0.500186] pci 0000:00:02.0: reg 0x10: [mem 0xfd000000-0xfdffffff pref] +[ 0.500569] pci 0000:00:02.0: reg 0x18: [mem 0xfebf0000-0xfebf0fff] +[ 0.502569] pci 0000:00:02.0: reg 0x30: [mem 0xfebe0000-0xfebeffff pref] +[ 0.508052] pci 0000:00:03.0: [8086:100e] type 00 class 0x020000 +[ 0.508590] pci 0000:00:03.0: reg 0x10: [mem 0xfebc0000-0xfebdffff] +[ 0.509075] pci 0000:00:03.0: reg 0x14: [io 0xc000-0xc03f] +[ 0.511015] pci 0000:00:03.0: reg 0x30: [mem 0xfeb80000-0xfebbffff pref] +[ 0.517286] ACPI: PCI: Interrupt link LNKA configured for IRQ 10 +[ 0.518032] ACPI: PCI: Interrupt link LNKB configured for IRQ 10 +[ 0.518504] ACPI: PCI: Interrupt 
link LNKC configured for IRQ 11 +[ 0.518920] ACPI: PCI: Interrupt link LNKD configured for IRQ 11 +[ 0.519208] ACPI: PCI: Interrupt link LNKS configured for IRQ 9 +[ 0.521412] iommu: Default domain type: Translated +[ 0.521589] iommu: DMA domain TLB invalidation policy: lazy mode +[ 0.524448] pci 0000:00:02.0: vgaarb: setting as boot VGA device +[ 0.524569] pci 0000:00:02.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none +[ 0.524633] pci 0000:00:02.0: vgaarb: bridge control possible +[ 0.524846] vgaarb: loaded +[ 0.526151] SCSI subsystem initialized +[ 0.528124] ACPI: bus type USB registered +[ 0.528600] usbcore: registered new interface driver usbfs +[ 0.528917] usbcore: registered new interface driver hub +[ 0.529156] usbcore: registered new device driver usb +[ 0.529593] pps_core: LinuxPPS API ver. 1 registered +[ 0.529693] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti +[ 0.529916] PTP clock support registered +[ 0.531428] Advanced Linux Sound Architecture Driver Initialized. 
+[ 0.538313] NetLabel: Initializing +[ 0.538413] NetLabel: domain hash size = 128 +[ 0.538513] NetLabel: protocols = UNLABELED CIPSOv4 CALIPSO +[ 0.539300] NetLabel: unlabeled traffic allowed by default +[ 0.540192] PCI: Using ACPI for IRQ routing +[ 0.541336] hpet: 3 channels of 0 reserved for per-cpu timers +[ 0.541742] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0 +[ 0.541934] hpet0: 3 comparators, 64-bit 100.000000 MHz counter +[ 0.547124] clocksource: Switched to clocksource tsc-early +[ 0.589778] VFS: Disk quotas dquot_6.6.0 +[ 0.590116] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes) +[ 0.591999] pnp: PnP ACPI init +[ 1.348853] pnp: PnP ACPI: found 6 devices +[ 1.363393] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns +[ 1.364026] NET: Registered PF_INET protocol family +[ 1.364871] IP idents hash table entries: 2048 (order: 2, 16384 bytes, linear) +[ 1.369722] tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 4096 bytes, linear) +[ 1.369973] TCP established hash table entries: 1024 (order: 1, 8192 bytes, linear) +[ 1.370241] TCP bind hash table entries: 1024 (order: 2, 16384 bytes, linear) +[ 1.370483] TCP: Hash tables configured (established 1024 bind 1024) +[ 1.371348] UDP hash table entries: 256 (order: 1, 8192 bytes, linear) +[ 1.371835] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear) +[ 1.373053] NET: Registered PF_UNIX/PF_LOCAL protocol family +[ 1.374701] RPC: Registered named UNIX socket transport module. +[ 1.375153] RPC: Registered udp transport module. +[ 1.375280] RPC: Registered tcp transport module. +[ 1.375386] RPC: Registered tcp NFSv4.1 backchannel transport module. 
+[ 1.377429] pci_bus 0000:00: resource 4 [io 0x0000-0x0cf7 window] +[ 1.377567] pci_bus 0000:00: resource 5 [io 0x0d00-0xffff window] +[ 1.377738] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff window] +[ 1.377893] pci_bus 0000:00: resource 7 [mem 0x08000000-0xfebfffff window] +[ 1.378032] pci_bus 0000:00: resource 8 [mem 0x100000000-0x17fffffff window] +[ 1.378574] pci 0000:00:01.0: PIIX3: Enabling Passive Release +[ 1.378817] pci 0000:00:00.0: Limiting direct PCI/PCI transfers +[ 1.378993] pci 0000:00:01.0: Activating ISA DMA hang workarounds +[ 1.379296] pci 0000:00:02.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff] +[ 1.379537] PCI: CLS 0 bytes, default 64 +[ 1.385473] Unpacking initramfs... +[ 1.394653] Initialise system trusted keyrings +[ 1.395898] workingset: timestamp_bits=56 max_order=15 bucket_order=0 +[ 1.400517] Freeing initrd memory: 1896K +[ 1.409899] NFS: Registering the id_resolver key type +[ 1.410240] Key type id_resolver registered +[ 1.410358] Key type id_legacy registered +[ 1.436299] Key type asymmetric registered +[ 1.436505] Asymmetric key parser 'x509' registered +[ 1.436899] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251) +[ 1.437334] io scheduler mq-deadline registered +[ 1.437848] io scheduler kyber registered +[ 1.440723] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0 +[ 1.443386] ACPI: button: Power Button [PWRF] +[ 1.445654] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled +[ 1.447264] 00:04: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A +[ 1.450740] Non-volatile memory driver v1.3 +[ 1.451106] Linux agpgart interface v0.103 +[ 1.467087] loop: module loaded +[ 1.474468] scsi host0: ata_piix +[ 1.476252] scsi host1: ata_piix +[ 1.476701] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc040 irq 14 +[ 1.476882] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc048 irq 15 +[ 1.481539] libphy: Fixed MDIO Bus: probed +[ 1.482188] e100: Intel(R) 
PRO/100 Network Driver +[ 1.482313] e100: Copyright(c) 1999-2006 Intel Corporation +[ 1.482507] e1000: Intel(R) PRO/1000 Network Driver +[ 1.482702] e1000: Copyright (c) 1999-2006 Intel Corporation. +[ 1.616439] ACPI: \_SB_.LNKC: Enabled at IRQ 11 +[ 1.649465] ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100 +[ 1.664135] scsi 1:0:0:0: CD-ROM QEMU QEMU DVD-ROM 2.5+ PQ: 0 ANSI: 5 +[ 1.693021] sr 1:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray +[ 1.693338] cdrom: Uniform CD-ROM driver Revision: 3.20 +[ 1.723925] sr 1:0:0:0: Attached scsi generic sg0 type 5 +[ 1.946674] e1000 0000:00:03.0 eth0: (PCI:33MHz:32-bit) 52:54:00:12:34:56 +[ 1.947107] e1000 0000:00:03.0 eth0: Intel(R) PRO/1000 Network Connection +[ 1.947650] e1000e: Intel(R) PRO/1000 Network Driver +[ 1.947749] e1000e: Copyright(c) 1999 - 2015 Intel Corporation. +[ 1.947947] sky2: driver version 1.30 +[ 1.948805] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver +[ 1.948993] ehci-pci: EHCI PCI platform driver +[ 1.949218] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver +[ 1.949394] ohci-pci: OHCI PCI platform driver +[ 1.949636] uhci_hcd: USB Universal Host Controller Interface driver +[ 1.950082] usbcore: registered new interface driver usblp +[ 1.950302] usbcore: registered new interface driver usb-storage +[ 1.951012] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12 +[ 1.954333] serio: i8042 KBD port at 0x60,0x64 irq 1 +[ 1.954634] serio: i8042 AUX port at 0x60,0x64 irq 12 +[ 1.957984] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1 +[ 1.960071] rtc_cmos 00:05: RTC can wake from S4 +[ 1.964738] rtc_cmos 00:05: registered as rtc0 +[ 1.965357] rtc_cmos 00:05: alarms up to one day, y3k, 242 bytes nvram, hpet irqs +[ 1.966676] device-mapper: ioctl: 4.45.0-ioctl (2021-03-22) initialised: dm-devel@redhat.com +[ 1.967364] hid: raw HID events driver (C) Jiri Kosina +[ 1.968571] usbcore: registered new interface driver usbhid +[ 
1.968750] usbhid: USB HID core driver +[ 1.974818] Initializing XFRM netlink socket +[ 1.975673] NET: Registered PF_INET6 protocol family +[ 1.981212] Segment Routing with IPv6 +[ 1.981421] In-situ OAM (IOAM) with IPv6 +[ 1.982292] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver +[ 1.984278] NET: Registered PF_PACKET protocol family +[ 1.984857] Key type dns_resolver registered +[ 1.985989] IPI shorthand broadcast: enabled +[ 1.986261] sched_clock: Marking stable (1999028700, -13430834)->(1985937339, -339473) +[ 1.987965] registered taskstats version 1 +[ 1.988095] Loading compiled-in X.509 certificates +[ 1.991283] PM: Magic number: 1:335:305 +[ 1.991523] tty tty34: hash matches +[ 1.991951] printk: console [netcon0] enabled +[ 1.992067] netconsole: network logging started +[ 1.994549] cfg80211: Loading compiled-in X.509 certificates for regulatory database +[ 2.004972] kworker/u2:2 (64) used greatest stack depth: 14856 bytes left +[ 2.012521] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7' +[ 2.013924] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2 +[ 2.014318] cfg80211: failed to load regulatory.db +[ 2.016106] ALSA device list: +[ 2.016329] No soundcards found. 
+[ 2.053176] Freeing unused kernel image (initmem) memory: 1368K +[ 2.056095] Write protecting the kernel read-only data: 20480k +[ 2.058248] Freeing unused kernel image (text/rodata gap) memory: 2032K +[ 2.058811] Freeing unused kernel image (rodata/data gap) memory: 500K +[ 2.059164] Run /init as init process +Hello from Golang +[ 2.386879] tsc: Refined TSC clocksource calibration: 3192.032 MHz +[ 2.387114] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2e02e31fa14, max_idle_ns: 440795264947 ns +[ 2.387380] clocksource: Switched to clocksource tsc +[ 2.587895] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3 +Hello from Golang +Hello from Golang +Hello from Golang diff --git a/assets/pid1/unikernels.png b/assets/pid1/unikernels.png new file mode 100644 index 0000000..16026ed Binary files /dev/null and b/assets/pid1/unikernels.png differ diff --git a/assets/pid1/unikernels.svg b/assets/pid1/unikernels.svg new file mode 100644 index 0000000..77d1547 --- /dev/null +++ b/assets/pid1/unikernels.svg @@ -0,0 +1,578 @@ +[Figure source: unikernels.svg. Panel VIRTUAL MACHINE: HARDWARE, HOST OS / HYPERVISOR, two VIRTUAL MACHINE boxes each with GUEST OS and TARGET SOFTWARE. Panel CONTAINERS: HARDWARE, HOST OS, two CONTAINER boxes each with PROGRAMS & LIBRARIES and TARGET SOFTWARE. Panel UNIKERNELS: HARDWARE, HYPERVISOR, two UNIKERNEL & APPLICATION boxes.] diff --git a/posts/2021-12-25-running-golang-application-as-pid1.md b/posts/2021-12-25-running-golang-application-as-pid1.md new file mode 100644 index 0000000..1eef97b --- /dev/null +++ b/posts/2021-12-25-running-golang-application-as-pid1.md @@ -0,0 +1,229 @@ +--- +Title: Running Golang application as PID 1 with Linux kernel +Description: Running Golang application as PID 1 with Linux kernel +Slug: running-golang-application-as-pid1 +Listing: true 
+Created: 2021-12-25 +Tags: [] +--- + + + +I have been reading a lot about [unikernels](https://en.wikipedia.org/wiki/Unikernel) lately and found them very intriguing. When you push aside all the marketing speak and look at the idea, it makes a lot of sense. + +> A unikernel is a specialized, single address space machine image constructed by using library operating systems. ([Wikipedia](https://en.wikipedia.org/wiki/Unikernel)) + +I really like the explanation from the article [Unikernels: Rise of the Virtual Library Operating System](https://queue.acm.org/detail.cfm?id=2566628). Really worth a read. + +If we compare a normal operating system to a unikernel side by side, they would look something like this. + +![Virtual machines vs Containers vs Unikernels](/assets/pid1/unikernels.png) + +From this image, we can see how the complexity significantly decreases with the use of unikernels. This comes with a price, of course. Unikernels are hard to get running and require a lot of work since you don't have an actual proper kernel running in the background providing network access, drivers, etc. + +So as a half step to make the stack simpler, I started looking into using the Linux kernel as a base and going from there. I came across this [YouTube video talking about Building the Simplest Possible Linux System](https://www.youtube.com/watch?v=Sk9TatW9ino) by [Rob Landley](https://landley.net) and apart from statically compiling the application to be run as PID 1 there were really no other obstacles. + +## What is PID 1? + +PID 1 is the first process that the Linux kernel starts after the boot process. It also has a few properties that are unique to it. + +- Inside a PID namespace (for example a container), when the process with PID 1 dies for any reason, all other processes in that namespace are killed with the KILL signal. +- When any process having children dies for any reason, its children are re-parented to the process with PID 1. +- Many signals whose default action is Term have no default action for PID 1, so they are ignored unless PID 1 installs a handler for them. 
+- On a regular system (the initial PID namespace), when the process with PID 1 dies for any reason, the kernel panics, which results in a system crash. + +PID 1 is considered the init application, which takes care of starting and supervising other services like: + +- sshd, +- nginx, +- pulseaudio, +- etc. + +If you are on a Linux machine, you can check which process has PID 1 by running the following. + +```sh +$ cat /proc/1/status +Name: systemd +Umask: 0000 +State: S (sleeping) +Tgid: 1 +Ngid: 0 +Pid: 1 +PPid: 0 +... +``` + +As we can see, on my machine the process with an ID of 1 is [systemd](https://systemd.io/), which is a software suite that provides an array of system components for Linux operating systems. If you look closely you can also see that the `PPid` (process ID of the parent process) is `0`, which additionally confirms that this process doesn't have a parent. + +## So why even run an application as PID 1 instead of just using a container? + +Containers are wonderful, but they come with a lot of baggage. And because they are layered by nature, the images require quite a lot of space and also a lot of additional software to handle them. They are not as lightweight as they seem, and many popular images require more than 500 MB of disk space. + +Running the application as PID 1 results in a significantly smaller footprint, as we will see later in the post. + +> You could also run a simple init system inside a Docker container, as described in the article [Docker and the PID 1 zombie reaping problem](https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/). + +## The plan + +1. Compile the Linux kernel with the default configuration. +2. Prepare a Hello World application in Golang that is statically compiled. +3. Run it with [QEMU](https://www.qemu.org/), providing the Golang application as the init application / PID 1. + +For the sake of simplicity we will not be cross-compiling any of it and will just use the 64-bit version. 
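The PID 1 properties listed earlier (orphans are re-parented to PID 1, and PID 1 must never exit) can be sketched in Go roughly as follows. This is a minimal sketch and not part of the original experiment: the helper names `isInit` and `reapChildren` are hypothetical, and it assumes a Linux target where `syscall.Wait4` with `WNOHANG` is available.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
	"time"
)

// isInit reports whether the given process ID is that of the init process.
// (Hypothetical helper, introduced only for this illustration.)
func isInit(pid int) bool {
	return pid == 1
}

// reapChildren collects the exit status of every child that has already
// terminated, without blocking, and returns how many were reaped.
// (Hypothetical helper, introduced only for this illustration.)
func reapChildren() int {
	reaped := 0
	for {
		var status syscall.WaitStatus
		pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
		if err != nil || pid <= 0 {
			// Either there are no children at all or none has exited yet.
			return reaped
		}
		reaped++
	}
}

func main() {
	pid := os.Getpid()
	fmt.Printf("running as pid %d (init: %v)\n", pid, isInit(pid))

	for {
		if n := reapChildren(); n > 0 {
			fmt.Printf("reaped %d exited children\n", n)
		}
		if !isInit(pid) {
			// On a normal machine, do a single pass and exit.
			return
		}
		// As PID 1 we must never return, so keep looping.
		time.Sleep(1 * time.Second)
	}
}
```

Run on a regular machine, the program does one pass and exits; booted as `/init`, it would loop forever and clean up any re-parented processes that have exited, so they do not linger as zombies.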
+ +## Compiling Linux kernel + +```sh +wget https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.15.7.tar.xz +tar xf linux-5.15.7.tar.xz + +cd linux-5.15.7 + +make clean + +# read more about this https://stackoverflow.com/a/41886394 +make defconfig + +time make -j `nproc` + +cd .. +``` + +At this point we have a kernel image located at `arch/x86_64/boot/bzImage` inside the kernel source tree. We will use this in QEMU later. + +To make our lives a bit easier, let's move the kernel image to another place. Let's create a folder `bin/` in the root of our project with `mkdir -p bin`. + + +At this point we can copy `bzImage` to the `bin/` folder with `cp linux-5.15.7/arch/x86_64/boot/bzImage bin/bzImage`. + +The folder structure of this experiment should look like this. + +``` +pid1/ + bin/ + bzImage + linux-5.15.7/ + linux-5.15.7.tar.xz +``` + +## Preparing PID 1 application in Golang + +This step is relatively easy. The only thing we must keep in mind is that we need to compile the binary as a static one. + +Let's create an `init.go` file in the root of the project. + +```go +package main + +import ( + "fmt" + "time" +) + +func main() { + for { + fmt.Println("Hello from Golang") + time.Sleep(1 * time.Second) + } +} +``` + +Notice that we have an infinite loop in `main`, with a simple sleep of 1 second to not overwhelm the CPU. The loop also keeps the process from ever exiting, which matters because it will run as PID 1. + +There are two ways of compiling a Golang application: statically and dynamically. + +To statically compile the binary, use the following command. 
+ +```sh +go build -ldflags="-extldflags=-static" init.go +``` + +We can also check if the binary is statically compiled with: + +```sh +$ file init +init: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=Ypu8Zw_4NBxm1Yxg2OYO/H5x721rQ9uTPiDVh-VqP/vZN7kXfGG1zhX_qdHMgH/9vBfmK81tFrygfOXDEOo, not stripped + +$ ldd init +not a dynamic executable +``` + +At this point, we need to create an [initramfs](https://www.linuxfromscratch.org/blfs/view/svn/postlfs/initramfs.html) (short for "initial RAM file system"), the successor of initrd. It is a cpio archive of the initial file system that gets loaded into memory during the Linux startup process. + +```sh +echo init | cpio -o --format=newc > initramfs +mv initramfs bin/initramfs +``` + +The project at this stage should look like this. + +``` +pid1/ + bin/ + bzImage + initramfs + linux-5.15.7/ + linux-5.15.7.tar.xz + init.go +``` + +## Running all of it with QEMU + +[QEMU](https://www.qemu.org/) is a free and open-source hypervisor. It emulates the machine's processor through dynamic binary translation and provides a set of different hardware and device models for the machine, enabling it to run a variety of guest operating systems. 
+ +```sh +qemu-system-x86_64 -serial stdio -kernel bin/bzImage -initrd bin/initramfs -append "console=ttyS0" -m 128 +``` + +```sh +$ qemu-system-x86_64 -serial stdio -kernel bin/bzImage -initrd bin/initramfs -append "console=ttyS0" -m 128 +[ 0.000000] Linux version 5.15.7 (m@khan) (gcc (GCC) 11.2.1 20211203 (Red Hat 11.2.1-7), GNU ld version 2.37-10.fc35) #7 SMP Mon Dec 13 10:23:25 CET 2021 +[ 0.000000] Command line: console=ttyS0 +[ 0.000000] x86/fpu: x87 FPU will use FXSAVE +[ 0.000000] signal: max sigframe size: 1440 +[ 0.000000] BIOS-provided physical RAM map: +[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable +[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved +[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved +[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000007fdffff] usable +[ 0.000000] BIOS-e820: [mem 0x0000000007fe0000-0x0000000007ffffff] reserved +[ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved +[ 0.000000] NX (Execute Disable) protection: active +[ 0.000000] SMBIOS 2.8 present. +[ 0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-6.fc35 04/01/2014 +[ 0.000000] tsc: Fast TSC calibration failed +... +[ 2.016106] ALSA device list: +[ 2.016329] No soundcards found. 
+[ 2.053176] Freeing unused kernel image (initmem) memory: 1368K +[ 2.056095] Write protecting the kernel read-only data: 20480k +[ 2.058248] Freeing unused kernel image (text/rodata gap) memory: 2032K +[ 2.058811] Freeing unused kernel image (rodata/data gap) memory: 500K +[ 2.059164] Run /init as init process +Hello from Golang +[ 2.386879] tsc: Refined TSC clocksource calibration: 3192.032 MHz +[ 2.387114] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2e02e31fa14, max_idle_ns: 440795264947 ns +[ 2.387380] clocksource: Switched to clocksource tsc +[ 2.587895] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3 +Hello from Golang +Hello from Golang +Hello from Golang +``` + +The whole [log file is here](/assets/pid1/qemu.log). + +## Size comparison + +The cool thing about this approach is that the Linux kernel and the application together only take around 12 MB, which is impressive as hell. Also note that the size of bzImage (the Linux kernel) could be greatly decreased by going into `make menuconfig` and removing a ton of features from the kernel. I managed to get the kernel size down to 2 MB and it still worked properly. + +```sh +total 12M +-rw-r--r--. 1 m m 9.3M Dec 13 10:24 bzImage +-rw-r--r--. 1 m m 1.9M Dec 27 01:19 initramfs +``` + +## Is running applications as PID 1 even worth it? + +Well, the answer to this is not as simple as one would think. Sometimes it is and sometimes it's not. For embedded systems and very specialized applications it is worth it for sure. But for normal use, I don't think so. It was an interesting exercise in compiling kernels and looking at the guts of the Linux kernel, but sticking to containers for most things is a better option in my opinion. + +An interesting experiment would be creating an image that supports networking and could be deployed to AWS as an EC2 instance and observing how it fares. 
But in that case, we would need to write some sort of supervisor that would run on a separate EC2 instance and check whether the other EC2 instances are running properly. Remember that if your application exits or crashes, the kernel panics and the whole machine becomes inoperable. -- cgit v1.2.3