CASACURSOSESTUDANTESDOAÇÕESVÍDEOSEVENTOSTUTORIAISLINKSNOTÍCIACONTATO


TUTORIALS 》 A Linux system call in C without a standard library

When we learn C, we are taught that main is the first function called in a C program. But in reality, main is simply a convention of the standard library.

[email protected]:# cat main.c
#include <stdio.h>

int main(int argc, char* argv[])
{
    printf("hello world\n");

    return 0;
}

[email protected]:# gcc main.c
[email protected]:# a.out
hello world
[email protected]:#

Now we decode a.out with gdb tools:

[email protected]:# gdb a.out

Starting program: ~/a.out 

 [----------------------------------registers-----------------------------------]
RAX: 0x5555555546a0 (<main>:	push   rbp)
RBX: 0x0 
RCX: 0x0 
RDX: 0x7fffffffddf8 --> 0x7fffffffe1da ("LC_PAPER=fa_IR")
RSI: 0x7fffffffdde8 --> 0x7fffffffe18d ("/syscall/C_syscall_without_standard_library_linux/a.out")
RDI: 0x1 
RBP: 0x7fffffffdd00 --> 0x5555555546d0 (<__libc_csu_init>:	push   r15)
RSP: 0x7fffffffdd00 --> 0x5555555546d0 (<__libc_csu_init>:	push   r15)
RIP: 0x5555555546a4 (<main+4>:	sub    rsp,0x10)
R8 : 0x555555554740 (<__libc_csu_fini>:	repz ret)
R9 : 0x7ffff7de8bd0 (<_dl_fini>:	push   rbp)
R10: 0x10000000000 
R11: 0x7ffff7ffa19c (mov    ch,BYTE PTR [rdx])
R12: 0x555555554570 (<_start>:	xor    ebp,ebp)
R13: 0x7fffffffdde0 --> 0x1 
R14: 0x0 
R15: 0x0
EFLAGS: 0x246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x55555555469b <frame_dummy+43>:	jmp    0x5555555545e0 <register_tm_clones>
   0x5555555546a0 <main>:	push   rbp
   0x5555555546a1 <main+1>:	mov    rbp,rsp
=> 0x5555555546a4 <main+4>:	sub    rsp,0x10
   0x5555555546a8 <main+8>:	mov    DWORD PTR [rbp-0x4],edi
   0x5555555546ab <main+11>:	mov    QWORD PTR [rbp-0x10],rsi
   0x5555555546af <main+15>:	lea    rdi,[rip+0x9e]        # 0x555555554754
   0x5555555546b6 <main+22>:	call   0x555555554560
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffdd00 --> 0x5555555546d0 (<__libc_csu_init>:	push   r15)
0008| 0x7fffffffdd08 --> 0x7ffff7a313f1 (<__libc_start_main+241>:	mov    edi,eax)
0016| 0x7fffffffdd10 --> 0x7ffff7dce798 --> 0x7ffff7a30d30 (<init_cacheinfo>:	push   r15)
0024| 0x7fffffffdd18 --> 0x7fffffffdde8 --> 0x7fffffffe18d ("/home/raminfp/Desktop/syscall/C_syscall_without_standard_library_linux/a.out")
0032| 0x7fffffffdd20 --> 0x1f7b9a888 
0040| 0x7fffffffdd28 --> 0x5555555546a0 (<main>:	push   rbp)
0048| 0x7fffffffdd30 --> 0x0 
0056| 0x7fffffffdd38 --> 0xdac473773e2a1848 
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value

Breakpoint 1, 0x00005555555546a4 in main ()

notice the output of gdb run command. The first function in reality is _start

0x555555554570 (<_start>:	xor    ebp,ebp)

Now if we try to compile our current code with -nostdlib gcc option, we will run into linker errors as shown below:

[email protected]:# gcc -s -O2 -nostdlib main.c
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000000310
/tmp/ccqHCAhy.o: In function `main':
main.c:(.text.startup+0xc): undefined reference to `puts'
collect2: error: ld returned 1 exit status

The linker is complaining about missing _start. We have a linker error on puts, which is a libc function. So how do we print "hello world" without puts?

The answer is Linux kernel exposes a bunch of syscalls (system-calls), which are functions(APIs) that user-space programs can use to interact with the OS. You find listd of syscall table: https://github.com/torvalds/ ... /syscalls/syscall_64.tbl

Lets find out which syscall uses puts. For that we can use tools strace.

[email protected]:# cat puts.c

#include <stdio.h>

int main(int argc, char* argv[])
{
    puts("hello");

    return 0;
}

[email protected]:# gcc puts.c
[email protected]:# strace ./a.out > /dev/null

execve("./a.out", ["./a.out"], [/* 69 vars */]) = 0
brk(NULL)                               = 0x557f38db6000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda079d0000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=127890, ...}) = 0
mmap(NULL, 127890, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fda079b0000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\5\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1856752, ...}) = 0
mmap(NULL, 3959200, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fda073e8000
mprotect(0x7fda075a5000, 2097152, PROT_NONE) = 0
mmap(0x7fda077a5000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bd000) = 0x7fda077a5000
mmap(0x7fda077ab000, 14752, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fda077ab000
close(3)                                = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fda079ae000
arch_prctl(ARCH_SET_FS, 0x7fda079ae700) = 0
mprotect(0x7fda077a5000, 16384, PROT_READ) = 0
mprotect(0x557f38ce9000, 4096, PROT_READ) = 0
mprotect(0x7fda079d3000, 4096, PROT_READ) = 0
munmap(0x7fda079b0000, 127890)          = 0
fstat(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
ioctl(1, TCGETS, 0x7ffd9aaa7a40)        = -1 ENOTTY (Inappropriate ioctl for device)
brk(NULL)                               = 0x557f38db6000
brk(0x557f38dd8000)                     = 0x557f38dd8000
write(1, "hello world\n", 6)            = 6
exit_group(0)                           = ?
+++ exited with 0 +++

In output as shown above, we can see write(1, "hello world\n", 6). Which means it is fine to replace puts() to write() API as shown below.

[email protected]:# whatis write
write (2)            - write to a file descriptor
write (1)            - send a message to another user

[email protected]:# man 2 write
[email protected]:# cat write.c 
#include <unistd.h>
#include <stdio.h>

int main(int argc, char* argv[])
{

    write(1, "hello world\n", 6);

    return 0;
}

[email protected]:# gcc -s -O2 -nostdlib write.c 
write.c: In function ‘main’:
write.c:12:5: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result]
     write(1, "hello world\n", 13);
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000000310
/tmp/ccYMZ2gc.o: In function `main':
write.c:(.text.startup+0x16): undefined reference to `write'
collect2: error: ld returned 1 exit status

Oops! even the "write" function is a part of the standard library !

So we can use Calling convention https://en.wikipedia.org/wiki/Calling_convention for it: http://stackoverflow.com/questions/ ...

1. User-level applications use integer registers for passing the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9. The kernel interface uses %rdi, %rsi, %rdx, %r10, %r8 and %r9.

2. A system-call is done via the syscall instruction. The kernel destroys registers %rcx and %r11.

3. The number of the syscall has to be passed in register %rax.

4. System-calls are limited to six arguments, no argument is passed directly on the stack.

5. Returning from the syscall, register %rax contains the result of the system-call. A value in the range between -4095 and -1 indicates an error, it is -errno.

6. Only values of class INTEGER or class MEMORY are passed to the kernel.

this will be our syscall wrapper (an Intel x86 syntax):
[email protected]:# cat syscall.S
.intel_syntax noprefix

.text
    .globl syscall

    syscall:
        mov rax,rdi
        mov rdi,rsi
        mov rsi,rdx
        mov rdx,rcx
        mov r10,r8
        mov r8,r9
        syscall
        ret

rax = syscall number here for write is 1
rdi = param1
rsi = param2
rdx = param3
rcx = param4
r8  = param5
r9  = param6


Now we can use Assm(Assembly) and C for our new hello world program and compile the same:

[email protected]:# cat assm_syscall.S
// Putting it all together, our _start function needs to:
// - zero rbp
// - put argc into rdi (1st parameter for main)
// - put the stack address of argv[0] into rsi (2nd param for main),
//   which will be interpreted as an array of char pointers.
// - align stack to 16-bytes
// - call main

.intel_syntax noprefix
.text
    .globl _start, syscall

    _start:
	// _start function
	
        xor rbp,rbp  /* xoring a value with itself = 0 */
        pop rdi      /* rdi = argc */
        	     /* the pop instruction already added 8 to rsp */
        mov rsi,rsp  /* rest of the stack as an array of char ptr */

        and rsp,-16
        call main    // call main function 

	// _EXIT
	// man 2 _EXIT
	mov rdi,rax /* syscall param 1 = rax (ret value of main) */
        mov rax,60 /* SYS_exit */
        syscall
	ret

    syscall:
        mov rax,rdi
        mov rdi,rsi
        mov rsi,rdx
        mov rdx,rcx
        mov r10,r8
        mov r8,r9
        syscall
        ret



[email protected]:# cat assm_syscall.c
void* syscall(
    void* syscall_number,
    void* param1,
    void* param2,
    void* param3,
    void* param4,
    void* param5
);

typedef unsigned long int uintptr; /* size_t */
typedef long int intptr; /* ssize_t */

static
intptr write(int fd, void const* data, uintptr nbytes)
{
    return (intptr)
        syscall(
            (void*)1, /* SYS_write */
            (void*)(intptr)fd,
            (void*)data,
            (void*)nbytes,
            0, /* ignored */
            0  /* ignored */
        );
}

int main(int argc, char* argv[])
{
    write(1, "hello world\n", 13);

    return 0;
}


Now if we compile the two source file assm_syscall.S and assm_syscall.c files as shown below, we get the same output as any standard libc printf() (or write(1,...)) output as you can see below.

[email protected]:# gcc -s -O2 -nostdlib assm_syscall.S assm_syscall.c
[email protected]:#./a.out
hello world


So this is how you can breakdown standard libc APIs and write a C code without using the same. And if required you can write your own custom libraries based on this technique !

Here is the same I published in Github: C source code and Assembly linux system call


Featured Video:



Suggested Topics:


☆ Tutorials :: Arduino UNO Projects ↗


☆ Tutorials :: Network Software Development ↗


☆ Tutorials :: Research and Projects ↗


☆ Tutorials :: Linux (user-space), Systems Architecture ↗


☆ Tutorials :: Linux Kernel Software Development ↗


☆ Tutorials :: Linux Kernel Internals (PDFs) - by Ramin Farajpour ↗


☆ Tutorials :: Software Development (Programming) Tools ↗


☆ Tutorials :: Embedded Projects ↗

Join The Linux Channel :: Facebook Group ↗

Visit The Linux Channel :: on Youtube ↗


Join a course:

💎 Linux, Kernel, Networking and Device Drivers: PDF Brochure
💎 PhD or equivalent (or Post Doctoral) looking for assistance: Details
💎 ... or unlimited life-time mentorship: Details


💗 Help shape the future: Sponsor/Donate


Tópicos recomendados:
Featured Video:
Assista no Youtube - [4588//0] 351 Kernel customization via make menuconfig - Linux Kernel Compilation (or a Kernel Build) ↗

Roadmap - How to become Linux Kernel Developer - Device Drivers Programmer and a Systems Software Expert ↗
Friday' 10-Jul-2020
Many viewers and even sometimes my students ask me how I can become a kernel programmer or just device driver developer and so on. So I shot this video (and an add-on video) where I summarized steps and a roadmap to become a full-fledged Linux Kernel Developer.

Compiling a C Compiler with a C Compilter | Compile gcc with gcc ↗
Friday' 10-Jul-2020
The fundamental aspect of a programming language compiler is to translate code written from language to other. But most commonly compilers will compile code written in high-level human friendly language such as C, C++, Java, etc. to native CPU architecture specific (machine understandable) binary code which is nothing but sequence of CPU instructions. Hence if we see that way we should able to compile gcc Compiler source code with a gcc Compiler binary.

AT&T Archives: The UNIX Operating System ↗
Friday' 10-Jul-2020

CEO, CTO Talk ↗
Friday' 10-Jul-2020

Arduino UNO - RO Water Purifier Controller ↗
Friday' 10-Jul-2020
Here is a Youtube VLOG of my DIY RO Water Purifier Controller done via Arduino UNO. I want the Arduino UNO to control the RO pump, so that it pumps for a specific duration and stops automatically. This is done via Opto-isolated 4 Channel 5V 10A Relay Board meant for Arduino UNO, Raspberry Pi or similar SoC boards which offers GPIO pins. To this relay I have connected the RO water purifier booster pump which works at 24V DC connected via 220V AC to 24V DC power supply adaptar. I have also connected a small active 5V buzzer to notify the progress and completion as it fills the tank/canister.

CYRIL INGÉNIERIE - CoreFreq Linux CPU monitoring software ↗
Friday' 10-Jul-2020

Programming Language Performance and Overheads ↗
Friday' 10-Jul-2020
A detailed Youtube video series of various programming language performance and overheads - a big picture

Nmap Network Scanning ↗
Friday' 10-Jul-2020

Telnet installation and remote access ↗
Friday' 10-Jul-2020

Linux Kernel Network Programming - Transport Layer L4 TCP/UDP Registration - Protocol APIs ↗
Friday' 10-Jul-2020


Trending Video:
Assista no Youtube - [803//0] 214 Introduction and code-walk - Linux Kernel struct dst_entry datastructure - ep1 ↗

Linux Software Development and Tools ↗
Friday' 10-Jul-2020



Recommended Video:
Assista no Youtube - [440//0] 0x169 Synchronization in Linux user-space - Architecting multi-process and multi-threads ↗