

A Linux system call in C without a standard library

This article is submitted by my student:

11 - Ramin Farajpour Cami, Iran
Senior Cyber Security Researcher, EDG (Engineering Development Group) of CCI (Center for Cyber Intelligence).


When we learn C, we are taught that main is the first function called in a C program. In reality, main is only a convention of the C runtime: the startup code that is linked in together with the standard library runs first and then calls main.

root@raminfp:# cat main.c
#include <stdio.h>

int main(int argc, char* argv[])
{
    printf("hello world\n");

    return 0;
}

root@raminfp:# gcc main.c
root@raminfp:# ./a.out
hello world
root@raminfp:#

Now let's inspect a.out with gdb, stopping at a breakpoint on main:

root@raminfp:# gdb a.out

Starting program: ~/a.out 

 [----------------------------------registers-----------------------------------]
RAX: 0x5555555546a0 (<main>:	push   rbp)
RBX: 0x0 
RCX: 0x0 
RDX: 0x7fffffffddf8 --> 0x7fffffffe1da ("LC_PAPER=fa_IR")
RSI: 0x7fffffffdde8 --> 0x7fffffffe18d ("/syscall/C_syscall_without_standard_library_linux/a.out")
RDI: 0x1 
RBP: 0x7fffffffdd00 --> 0x5555555546d0 (<__libc_csu_init>:	push   r15)
RSP: 0x7fffffffdd00 --> 0x5555555546d0 (<__libc_csu_init>:	push   r15)
RIP: 0x5555555546a4 (<main+4>:	sub    rsp,0x10)
R8 : 0x555555554740 (<__libc_csu_fini>:	repz ret)
R9 : 0x7ffff7de8bd0 (<_dl_fini>:	push   rbp)
R10: 0x10000000000 
R11: 0x7ffff7ffa19c (mov    ch,BYTE PTR [rdx])
R12: 0x555555554570 (<_start>:	xor    ebp,ebp)
R13: 0x7fffffffdde0 --> 0x1 
R14: 0x0 
R15: 0x0
EFLAGS: 0x246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x55555555469b <frame_dummy+43>:	jmp    0x5555555545e0 <register_tm_clones>
   0x5555555546a0 <main>:	push   rbp
   0x5555555546a1 <main+1>:	mov    rbp,rsp
=> 0x5555555546a4 <main+4>:	sub    rsp,0x10
   0x5555555546a8 <main+8>:	mov    DWORD PTR [rbp-0x4],edi
   0x5555555546ab <main+11>:	mov    QWORD PTR [rbp-0x10],rsi
   0x5555555546af <main+15>:	lea    rdi,[rip+0x9e]        # 0x555555554754
   0x5555555546b6 <main+22>:	call   0x555555554560
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffdd00 --> 0x5555555546d0 (<__libc_csu_init>:	push   r15)
0008| 0x7fffffffdd08 --> 0x7ffff7a313f1 (<__libc_start_main+241>:	mov    edi,eax)
0016| 0x7fffffffdd10 --> 0x7ffff7dce798 --> 0x7ffff7a30d30 (<init_cacheinfo>:	push   r15)
0024| 0x7fffffffdd18 --> 0x7fffffffdde8 --> 0x7fffffffe18d ("/home/raminfp/Desktop/syscall/C_syscall_without_standard_library_linux/a.out")
0032| 0x7fffffffdd20 --> 0x1f7b9a888 
0040| 0x7fffffffdd28 --> 0x5555555546a0 (<main>:	push   rbp)
0048| 0x7fffffffdd30 --> 0x0 
0056| 0x7fffffffdd38 --> 0xdac473773e2a1848 
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value

Breakpoint 1, 0x00005555555546a4 in main ()

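Notice the gdb output above. At the breakpoint in main, R12 still holds the address of _start, and the stack holds a return address inside __libc_start_main. So main is not the first function to run: the real entry point is _start, which calls __libc_start_main, which in turn calls main.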

0x555555554570 (<_start>:	xor    ebp,ebp)
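
We can also check this from inside a normally linked program. The sketch below is an illustration, not part of the original article: it assumes glibc (or another libc that provides getauxval) and a regular gcc link where the startup files define _start. The helper name entry_point_is_start is made up for this example.

```c
#include <stdint.h>
#include <sys/auxv.h>   /* getauxval(), AT_ENTRY */

/* _start is defined by the C runtime startup object that gcc links in
   by default; we only declare it here so we can take its address. */
extern void _start(void);

/* Hypothetical helper: returns 1 when the entry point reported by the
   kernel in the auxiliary vector is the address of _start, 0 otherwise. */
int entry_point_is_start(void)
{
    unsigned long entry = getauxval(AT_ENTRY);
    return entry == (unsigned long)(uintptr_t)&_start;
}
```

Calling entry_point_is_start() from main and printing the result is enough to try it; the comparison uses the runtime address, so it should not matter whether the binary is position independent.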

Now if we try to compile our current code with -nostdlib gcc option, we will run into linker errors as shown below:

root@raminfp:# gcc -s -O2 -nostdlib main.c
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000000310
/tmp/ccqHCAhy.o: In function `main':
main.c:(.text.startup+0xc): undefined reference to `puts'
collect2: error: ld returned 1 exit status

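The linker complains about two things. First, it cannot find the entry symbol _start, because the startup files are no longer linked in. Second, puts is undefined: it is a libc function, and gcc replaced our printf call with puts because the string is a plain literal that ends in a newline. So how do we print "hello world" without puts?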

The answer is that the Linux kernel exposes a set of syscalls (system calls): entry points that user-space programs use to request services from the OS. You can find the x86-64 syscall table here: https://github.com/torvalds/ ... /syscalls/syscall_64.tbl
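
To get a feel for calling the kernel by number before we drop libc entirely, here is a small sketch that still links against libc but bypasses the write() wrapper by using libc's generic syscall() function. The function name write_by_number is an assumption made for this example; SYS_write comes from <sys/syscall.h>.

```c
#include <sys/syscall.h>  /* SYS_write */
#include <unistd.h>       /* syscall() */

/* Hypothetical helper: issue the write system call by its number.
   Follows libc's convention: returns the byte count, or -1 with errno set. */
long write_by_number(int fd, const void *buf, unsigned long len)
{
    return syscall(SYS_write, (long)fd, buf, len);
}
```

For example, write_by_number(1, "hello\n", 6) asks the kernel to write six bytes to standard output.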

Let's find out which syscall puts uses. For that we can use the strace tool.

root@raminfp:# cat puts.c

#include <stdio.h>

int main(int argc, char* argv[])
{
    puts("hello");

    return 0;
}

root@raminfp:# gcc puts.c
root@raminfp:# strace ./a.out > /dev/null

execve("./a.out", ["./a.out"], [/* 69 vars */]) = 0
brk(NULL)                               = 0x557f38db6000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda079d0000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=127890, ...}) = 0
mmap(NULL, 127890, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fda079b0000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\5\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1856752, ...}) = 0
mmap(NULL, 3959200, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fda073e8000
mprotect(0x7fda075a5000, 2097152, PROT_NONE) = 0
mmap(0x7fda077a5000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bd000) = 0x7fda077a5000
mmap(0x7fda077ab000, 14752, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fda077ab000
close(3)                                = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda079ae000
arch_prctl(ARCH_SET_FS, 0x7fda079ae700) = 0
mprotect(0x7fda077a5000, 16384, PROT_READ) = 0
mprotect(0x557f38ce9000, 4096, PROT_READ) = 0
mprotect(0x7fda079d3000, 4096, PROT_READ) = 0
munmap(0x7fda079b0000, 127890)          = 0
fstat(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
ioctl(1, TCGETS, 0x7ffd9aaa7a40)        = -1 ENOTTY (Inappropriate ioctl for device)
brk(NULL)                               = 0x557f38db6000
brk(0x557f38dd8000)                     = 0x557f38dd8000
write(1, "hello\n", 6)                  = 6
exit_group(0)                           = ?
+++ exited with 0 +++

In the output above we can see write(1, "hello\n", 6): puts ends up in the write system call on file descriptor 1 (standard output). That means we can replace puts() with the write() API, as shown below.

root@raminfp:# whatis write
write (2)            - write to a file descriptor
write (1)            - send a message to another user

root@raminfp:# man 2 write
root@raminfp:# cat write.c 
#include <unistd.h>
#include <stdio.h>

int main(int argc, char* argv[])
{

    write(1, "hello world\n", 12);

    return 0;
}

root@raminfp:# gcc -s -O2 -nostdlib write.c 
write.c: In function ‘main’:
write.c:12:5: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result]
     write(1, "hello world\n", 12);
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000000310
/tmp/ccYMZ2gc.o: In function `main':
write.c:(.text.startup+0x16): undefined reference to `write'
collect2: error: ld returned 1 exit status

Oops! Even the write() function is part of the standard library: it is a thin libc wrapper around the system call of the same name.

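So we have to issue the system call ourselves, following the kernel's calling convention (https://en.wikipedia.org/wiki/Calling_convention). For x86-64 Linux the rules are as follows (see http://stackoverflow.com/questions/ ...):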

1. User-level applications pass integer arguments in the registers %rdi, %rsi, %rdx, %rcx, %r8 and %r9, in that order. The kernel interface uses %rdi, %rsi, %rdx, %r10, %r8 and %r9.

2. A system-call is done via the syscall instruction. The kernel destroys registers %rcx and %r11.

3. The number of the syscall has to be passed in register %rax.

4. System-calls are limited to six arguments, no argument is passed directly on the stack.

5. On return from the syscall, register %rax contains the result of the system-call. A value in the range between -4095 and -1 indicates an error; it is -errno.

6. Only values of class INTEGER or class MEMORY are passed to the kernel.
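
Rule 5 is easy to get wrong, so here is a minimal sketch of how a raw return value could be decoded. The helper names raw_syscall_failed and raw_syscall_errno are made up for this example; only the -4095 to -1 range comes from the rules above.

```c
/* Hypothetical helper: a raw kernel return value in [-4095, -1]
   encodes -errno; anything else is a normal result. */
static inline int raw_syscall_failed(long ret)
{
    return ret >= -4095L && ret <= -1L;
}

/* Hypothetical helper: the positive errno value of a failed raw call,
   or 0 when the call succeeded. */
static inline int raw_syscall_errno(long ret)
{
    return raw_syscall_failed(ret) ? (int)-ret : 0;
}
```

This is the translation libc performs for us when it returns -1 and sets errno; without libc, our own wrappers have to do it.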

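This will be our syscall wrapper (x86-64 assembly in Intel syntax). It is called like a C function and moves each argument into the register the kernel expects: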
root@raminfp:# cat syscall.S
.intel_syntax noprefix

.text
    .globl syscall

    syscall:
        mov rax,rdi
        mov rdi,rsi
        mov rsi,rdx
        mov rdx,rcx
        mov r10,r8
        mov r8,r9
        syscall
        ret

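After the wrapper has moved the registers around, the kernel sees:

rax = syscall number (1 for write)
rdi = param1
rsi = param2
rdx = param3
r10 = param4 (not rcx, which the syscall instruction overwrites)
r8  = param5
r9  = param6 (not set by this wrapper, which forwards at most five parameters)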
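
As an alternative to a separate .S file, the same register mapping can be written with GCC extended inline assembly. This is only a sketch under stated assumptions: GCC or Clang, x86-64 Linux, and at most three parameters. The names raw_syscall3 and raw_write are made up for this example and do not appear in the original code.

```c
/* Hypothetical helper: a three-parameter system call via inline assembly.
   x86-64 Linux only. Returns the raw kernel result (a negative value in
   the range -4095..-1 means -errno). */
static inline long raw_syscall3(long number, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)                  /* result comes back in rax */
        : "a"(number),               /* rax = syscall number */
          "D"(a1), "S"(a2), "d"(a3)  /* rdi, rsi, rdx = param1..param3 */
        : "rcx", "r11", "memory"     /* the kernel destroys rcx and r11 */
    );
    return ret;
}

/* Hypothetical write() replacement built on the helper above. */
static inline long raw_write(int fd, const void *buf, unsigned long len)
{
    return raw_syscall3(1 /* SYS_write */, fd, (long)buf, (long)len);
}
```

The constraint letters pin each C value to one register, so no explicit mov instructions are needed; the clobber list tells the compiler about rule 2 above.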


Now we can combine assembly and C for our new hello world program and compile both together:

root@raminfp:# cat assm_syscall.S
// Putting it all together, our _start function needs to:
// - zero rbp
// - put argc into rdi (1st parameter for main)
// - put the stack address of argv[0] into rsi (2nd param for main),
//   which will be interpreted as an array of char pointers.
// - align stack to 16-bytes
// - call main

.intel_syntax noprefix
.text
    .globl _start, syscall

    _start:
        /* program entry point */

        xor rbp,rbp  /* xoring a value with itself = 0 */
        pop rdi      /* rdi = argc */
                     /* the pop instruction already added 8 to rsp */
        mov rsi,rsp  /* rest of the stack as an array of char ptr */

        and rsp,-16  /* align the stack to 16 bytes */
        call main    /* call main function */

        /* exit: see man 2 _exit */
        mov rdi,rax  /* syscall param 1 = rax (ret value of main) */
        mov rax,60   /* SYS_exit */
        syscall
        ret          /* not reached: exit does not return */

    syscall:
        mov rax,rdi
        mov rdi,rsi
        mov rsi,rdx
        mov rdx,rcx
        mov r10,r8
        mov r8,r9
        syscall
        ret
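
The pop rdi / mov rsi,rsp pair in _start relies on how the kernel lays out the initial stack: argc first, then the argv pointers, a NULL, then the environment pointers and another NULL. The sketch below expresses that layout in C. The function names are made up for this example, and sp stands for the value rsp has on entry to _start, viewed as an array of pointer-sized slots.

```c
#include <stdint.h>

/* Hypothetical helpers describing the initial stack:
   slot 0           = argc
   slots 1..argc    = argv[0] .. argv[argc-1]
   next slot        = NULL (end of argv)
   following slots  = envp[], terminated by NULL */

long argc_from_stack(char **sp)
{
    return (long)(intptr_t)sp[0];
}

char **argv_from_stack(char **sp)
{
    return sp + 1;                            /* right above argc */
}

char **envp_from_stack(char **sp)
{
    return sp + 1 + argc_from_stack(sp) + 1;  /* skip argc, argv[], NULL */
}
```

This is also why a _start that wants to pass envp to main as a third parameter could compute it from argc and the argv address without any help from libc.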



root@raminfp:# cat assm_syscall.c
void* syscall(
    void* syscall_number,
    void* param1,
    void* param2,
    void* param3,
    void* param4,
    void* param5
);

typedef unsigned long int uintptr; /* size_t */
typedef long int intptr; /* ssize_t */

static
intptr write(int fd, void const* data, uintptr nbytes)
{
    return (intptr)
        syscall(
            (void*)1, /* SYS_write */
            (void*)(intptr)fd,
            (void*)data,
            (void*)nbytes,
            0, /* ignored */
            0  /* ignored */
        );
}

int main(int argc, char* argv[])
{
    write(1, "hello world\n", 12); /* 12 bytes: the text plus the newline */

    return 0;
}


Now if we compile the two source files, assm_syscall.S and assm_syscall.c, as shown below, the program prints the same output as the libc printf() (or write(1, ...)) version.

root@raminfp:# gcc -s -O2 -nostdlib assm_syscall.S assm_syscall.c
root@raminfp:# ./a.out
hello world


So this is how you can break down standard libc APIs and write C code without them. If required, you can write your own custom libraries based on this technique!
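
As a small illustration of such a custom library, here is a sketch of two routines that need no headers at all. Everything in it is an assumption made for this example (the tiny_ names, the inline assembly variant of the syscall wrapper, the x86-64 Linux target); it is not the code from the repository.

```c
/* Freestanding sketch: no #include lines, x86-64 Linux only. */
typedef unsigned long tiny_size;

/* Hypothetical three-parameter syscall helper (raw result or -errno). */
static long tiny_syscall3(long number, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(number), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret;
}

/* Length of a NUL-terminated string, without libc's strlen. */
tiny_size tiny_strlen(const char *s)
{
    tiny_size n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Write s and a trailing newline to fd.
   Returns the total number of bytes written, or -errno on failure. */
long tiny_puts_fd(int fd, const char *s)
{
    long n = tiny_syscall3(1 /* SYS_write */, fd, (long)s,
                           (long)tiny_strlen(s));
    if (n < 0)
        return n;
    long m = tiny_syscall3(1 /* SYS_write */, fd, (long)"\n", 1);
    return m < 0 ? m : n + m;
}
```

With these in place, main could call tiny_puts_fd(1, "hello world") instead of passing a hand-counted length to write. When building with -nostdlib, keep in mind that an optimizing compiler may still emit calls to routines such as memcpy or memset on its own, so a growing library of this kind usually ends up providing those as well.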

The same code is published on GitHub: C source code and Assembly linux system call

