A Linux system call in C without a standard library
This article is submitted by my student:
When we learn C, we are taught that main is the first function called in a C program. But in reality, main is simply a convention of the standard library.
root@raminfp:# cat main.c
#include <stdio.h>

int main(int argc, char* argv[])
{
    printf("hello world\n");
    return 0;
}
root@raminfp:# gcc main.c
root@raminfp:# ./a.out
hello world
root@raminfp:#
Now let's inspect a.out with gdb:
root@raminfp:# gdb a.out
Starting program: ~/a.out
[----------------------------------registers-----------------------------------]
RAX: 0x5555555546a0 (<main>: push rbp)
RBX: 0x0
RCX: 0x0
RDX: 0x7fffffffddf8 --> 0x7fffffffe1da ("LC_PAPER=fa_IR")
RSI: 0x7fffffffdde8 --> 0x7fffffffe18d ("/syscall/C_syscall_without_standard_library_linux/a.out")
RDI: 0x1
RBP: 0x7fffffffdd00 --> 0x5555555546d0 (<__libc_csu_init>: push r15)
RSP: 0x7fffffffdd00 --> 0x5555555546d0 (<__libc_csu_init>: push r15)
RIP: 0x5555555546a4 (<main+4>: sub rsp,0x10)
R8 : 0x555555554740 (<__libc_csu_fini>: repz ret)
R9 : 0x7ffff7de8bd0 (<_dl_fini>: push rbp)
R10: 0x10000000000
R11: 0x7ffff7ffa19c (mov ch,BYTE PTR [rdx])
R12: 0x555555554570 (<_start>: xor ebp,ebp)
R13: 0x7fffffffdde0 --> 0x1
R14: 0x0
R15: 0x0
EFLAGS: 0x246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x55555555469b <frame_dummy+43>: jmp 0x5555555545e0 <register_tm_clones>
   0x5555555546a0 <main>: push rbp
   0x5555555546a1 <main+1>: mov rbp,rsp
=> 0x5555555546a4 <main+4>: sub rsp,0x10
   0x5555555546a8 <main+8>: mov DWORD PTR [rbp-0x4],edi
   0x5555555546ab <main+11>: mov QWORD PTR [rbp-0x10],rsi
   0x5555555546af <main+15>: lea rdi,[rip+0x9e] # 0x555555554754
   0x5555555546b6 <main+22>: call 0x555555554560
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffdd00 --> 0x5555555546d0 (<__libc_csu_init>: push r15)
0008| 0x7fffffffdd08 --> 0x7ffff7a313f1 (<__libc_start_main+241>: mov edi,eax)
0016| 0x7fffffffdd10 --> 0x7ffff7dce798 --> 0x7ffff7a30d30 (<init_cacheinfo>: push r15)
0024| 0x7fffffffdd18 --> 0x7fffffffdde8 --> 0x7fffffffe18d ("/home/raminfp/Desktop/syscall/C_syscall_without_standard_library_linux/a.out")
0032| 0x7fffffffdd20 --> 0x1f7b9a888
0040| 0x7fffffffdd28 --> 0x5555555546a0 (<main>: push rbp)
0048| 0x7fffffffdd30 --> 0x0
0056| 0x7fffffffdd38 --> 0xdac473773e2a1848
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Breakpoint 1, 0x00005555555546a4 in main ()
Notice in the gdb output above that R12 holds the address of _start. That function, not main, is the real entry point of the program:
0x555555554570 (<_start>: xor ebp,ebp)
Now if we try to compile our current code with gcc's -nostdlib option, we run into linker errors as shown below:
root@raminfp:# gcc -s -O2 -nostdlib main.c
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000000310
/tmp/ccqHCAhy.o: In function `main':
main.c:(.text.startup+0xc): undefined reference to `puts'
collect2: error: ld returned 1 exit status
The linker warns that the entry symbol _start is missing, and it fails with an undefined reference to puts, which is a libc function. (The source calls printf, but gcc replaces a printf call whose string has no format specifiers and ends in a newline with the cheaper puts.)
So how do we print "hello world" without puts?
The answer is that the Linux kernel exposes a set of syscalls (system calls): functions (APIs) that user-space
programs use to interact with the OS. You can find the x86-64 syscall table here:
https://github.com/torvalds/ ... /syscalls/syscall_64.tbl
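The numbers in that table are also available to C programs as SYS_* constants. The following is a small sketch, assuming an x86-64 Linux system with the kernel headers installed; the struct and array names are hypothetical and only serve to show where values such as 1 (write) and 60 (exit) come from.

```c
#include <sys/syscall.h>   /* defines the SYS_* constants on Linux */

/* Hypothetical lookup table, for illustration only. */
struct syscall_entry {
    const char *name;
    long number;
};

/* On x86-64 these expand to 1, 60 and 231; other architectures differ. */
static const struct syscall_entry known_syscalls[] = {
    { "write",      SYS_write },
    { "exit",       SYS_exit },
    { "exit_group", SYS_exit_group },
};
```

The values are architecture-specific, so a number taken from syscall_64.tbl should not be reused on, say, a 32-bit x86 or ARM build.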
Let's find out which syscall puts uses. For that we can use the strace tool.
root@raminfp:# cat puts.c
#include <stdio.h>

int main(int argc, char* argv[])
{
    puts("hello");
    return 0;
}
root@raminfp:# gcc puts.c
root@raminfp:# strace ./a.out > /dev/null
execve("./a.out", ["./a.out"], [/* 69 vars */]) = 0
brk(NULL) = 0x557f38db6000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda079d0000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=127890, ...}) = 0
mmap(NULL, 127890, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fda079b0000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\5\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1856752, ...}) = 0
mmap(NULL, 3959200, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fda073e8000
mprotect(0x7fda075a5000, 2097152, PROT_NONE) = 0
mmap(0x7fda077a5000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bd000) = 0x7fda077a5000
mmap(0x7fda077ab000, 14752, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fda077ab000
close(3) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda079ae000
arch_prctl(ARCH_SET_FS, 0x7fda079ae700) = 0
mprotect(0x7fda077a5000, 16384, PROT_READ) = 0
mprotect(0x557f38ce9000, 4096, PROT_READ) = 0
mprotect(0x7fda079d3000, 4096, PROT_READ) = 0
munmap(0x7fda079b0000, 127890) = 0
fstat(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
ioctl(1, TCGETS, 0x7ffd9aaa7a40) = -1 ENOTTY (Inappropriate ioctl for device)
brk(NULL) = 0x557f38db6000
brk(0x557f38dd8000) = 0x557f38dd8000
write(1, "hello\n", 6) = 6
exit_group(0) = ?
+++ exited with 0 +++
Near the end of the output above we can see write(1, "hello\n", 6): puts ends up in the write syscall on file descriptor 1 (stdout), passing the string and its length in bytes. So we can replace puts() with the write() API, as shown below.
root@raminfp:# whatis write
write (2) - write to a file descriptor
write (1) - send a message to another user
root@raminfp:# man 2 write
root@raminfp:# cat write.c
#include <unistd.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
    write(1, "hello world\n", 12);
    return 0;
}
root@raminfp:# gcc -s -O2 -nostdlib write.c
write.c: In function ‘main’:
write.c:12:5: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result]
     write(1, "hello world\n", 12);
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000000310
/tmp/ccYMZ2gc.o: In function `main':
write.c:(.text.startup+0x16): undefined reference to `write'
collect2: error: ld returned 1 exit status
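Oops! Even the write function is part of the standard library. So we have to issue the syscall ourselves. The System V x86-64 ABI describes the Linux kernel calling convention as follows: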
1. User-level applications pass integer arguments in the register sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9. The kernel interface uses %rdi, %rsi, %rdx, %r10, %r8 and %r9 instead (%r10 takes the place of %rcx, which the syscall instruction overwrites).
2. A system call is made via the syscall instruction. The kernel destroys registers %rcx and %r11.
3. The number of the syscall has to be passed in register %rax.
4. System calls are limited to six arguments; no argument is passed directly on the stack.
5. On return from the syscall, register %rax contains the result of the system call. A value in the range between -4095 and -1 indicates an error; it is -errno.
6. Only values of class INTEGER or class MEMORY are passed to the kernel.
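Rule 5 matters most to C callers, because a raw syscall reports failure differently from the libc wrappers (which return -1 and set errno). A minimal sketch of decoding a raw result follows; the function names are hypothetical and not part of the article's code.

```c
/* Hypothetical helpers for interpreting the value a raw syscall leaves in rax. */

/* Returns 1 if the raw result lies in the error range -4095..-1, else 0. */
static int syscall_failed(long raw_result)
{
    return raw_result < 0 && raw_result >= -4095;
}

/* Returns the positive errno value for a failed call, or 0 on success. */
static int syscall_errno(long raw_result)
{
    return syscall_failed(raw_result) ? (int)-raw_result : 0;
}
```

For instance, a raw result of -9 decodes to errno 9 (EBADF on Linux), while a result of 6 from write simply means six bytes were written.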
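This will be our syscall wrapper (in Intel x86 syntax). It is called from C, so its arguments arrive in the user-level registers and have to be shifted into the registers the kernel expects: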
root@raminfp:# cat syscall.S
.intel_syntax noprefix
.text
.globl syscall

syscall:
    mov rax,rdi
    mov rdi,rsi
    mov rsi,rdx
    mov rdx,rcx
    mov r10,r8
    mov r8,r9
    syscall
    ret

After the moves, the kernel sees:
rax = syscall number (1 for write)
rdi = param1
rsi = param2
rdx = param3
r10 = param4
r8  = param5
r9  = param6 (not loaded by this wrapper; a sixth parameter would arrive on the stack)
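The assembly wrapper covers five parameters. As a hedged alternative, the same register mapping can be written in C with GCC-style inline assembly, here extended to all six kernel arguments. The name raw_syscall6 is hypothetical, and the sketch assumes x86-64 Linux with GCC or Clang.

```c
/* Hypothetical helper, not part of the article's code.
 * Assumes x86-64 Linux and a compiler with GCC-style inline assembly. */
static long raw_syscall6(long number, long p1, long p2, long p3,
                         long p4, long p5, long p6)
{
    long result;

    /* r10, r8 and r9 have no single-letter constraint, so pin them explicitly. */
    register long r10 __asm__("r10") = p4;
    register long r8  __asm__("r8")  = p5;
    register long r9  __asm__("r9")  = p6;

    __asm__ volatile (
        "syscall"
        : "=a"(result)                      /* result comes back in rax */
        : "a"(number),                      /* syscall number in rax */
          "D"(p1), "S"(p2), "d"(p3),        /* rdi, rsi, rdx */
          "r"(r10), "r"(r8), "r"(r9)
        : "rcx", "r11", "memory"            /* the kernel clobbers rcx and r11 */
    );
    return result;
}
```

With this helper, raw_syscall6(1, 1, (long)"hello world\n", 12, 0, 0, 0) would issue the same write as the assembly wrapper, without a separate .S file for the wrapper itself.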
Now we can combine assembly and C for our new hello world program:
root@raminfp:# cat assm_syscall.S
// Putting it all together, our _start function needs to:
// - zero rbp
// - put argc into rdi (1st parameter for main)
// - put the stack address of argv[0] into rsi (2nd param for main),
//   which will be interpreted as an array of char pointers.
// - align stack to 16-bytes
// - call main
.intel_syntax noprefix
.text
.globl _start, syscall

_start:                 // _start function
    xor rbp,rbp         /* xoring a value with itself = 0 */
    pop rdi             /* rdi = argc */
                        /* the pop instruction already added 8 to rsp */
    mov rsi,rsp         /* rest of the stack as an array of char ptr */
    and rsp,-16
    call main           // call main function

    // _EXIT
    // man 2 _EXIT
    mov rdi,rax         /* syscall param 1 = rax (ret value of main) */
    mov rax,60          /* SYS_exit */
    syscall
    ret

syscall:
    mov rax,rdi
    mov rdi,rsi
    mov rsi,rdx
    mov rdx,rcx
    mov r10,r8
    mov r8,r9
    syscall
    ret
root@raminfp:# cat assm_syscall.c
void* syscall(
    void* syscall_number,
    void* param1,
    void* param2,
    void* param3,
    void* param4,
    void* param5
);

typedef unsigned long int uintptr; /* size_t */
typedef long int intptr; /* ssize_t */

static intptr write(int fd, void const* data, uintptr nbytes)
{
    return (intptr) syscall(
        (void*)1, /* SYS_write */
        (void*)(intptr)fd,
        (void*)data,
        (void*)nbytes,
        0, /* ignored */
        0  /* ignored */
    );
}

int main(int argc, char* argv[])
{
    write(1, "hello world\n", 12); /* 12 bytes: the NUL terminator is not written */
    return 0;
}
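The stack layout that _start relies on can be sketched in C. At process entry on x86-64 Linux, the kernel leaves argc at the top of the stack, followed by the argv pointers, a NULL, and then the environment pointers. The struct and function names below are hypothetical; the sketch only mirrors what pop rdi and mov rsi,rsp do in the assembly.

```c
/* Hypothetical view of the values found on the stack at process entry. */
struct entry_args {
    long argc;
    char **argv;
    char **envp;
};

/* stack_top points at the word that holds argc (the initial rsp). */
static struct entry_args parse_initial_stack(long *stack_top)
{
    struct entry_args args;

    args.argc = stack_top[0];                 /* what "pop rdi" loads */
    args.argv = (char **)(stack_top + 1);     /* what "mov rsi,rsp" then points at */
    args.envp = args.argv + args.argc + 1;    /* skip argv and its NULL terminator */
    return args;
}
```

The article's _start passes only argc and argv to main; the envp pointer is shown to make clear what follows argv on the stack.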
Now if we compile the two source files, assm_syscall.S and assm_syscall.c, as shown below, we get the same output as the standard libc
printf() (or write(1, ...)) version:
root@raminfp:# gcc -s -O2 -nostdlib assm_syscall.S assm_syscall.c
root@raminfp:# ./a.out
hello world
So this is how you can break down standard libc APIs into the syscalls behind them and write C code without the standard library. If required, you can write your own
custom libraries based on this technique!
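As one small step toward such a custom library, a length helper removes the need to count bytes by hand when calling write. This is only a sketch; string_length is a hypothetical name standing in for libc's strlen, which is unavailable under -nostdlib.

```c
/* Hypothetical stand-in for strlen: counts bytes up to, not including, the NUL. */
static unsigned long string_length(const char *text)
{
    unsigned long length = 0;

    while (text[length] != '\0')
        length++;
    return length;
}
```

A call such as write(1, msg, string_length(msg)) then sends exactly the visible bytes of msg. Depending on the compiler and optimization level, a loop like this may be recognized and turned back into a library call, so it is worth checking that the link step still succeeds.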
The code from this article is published on GitHub:
C source code and Assembly
linux system call