Reading and writing files is one of the most frequent operations in software, so speeding it up has a large effect on overall performance. Why, then, can manipulating files through mmap be faster than using ordinary system calls?
When a program runs on an operating system, it works with two areas: user space and kernel space. Programs can access user space freely but cannot access kernel space directly. This separation is good for security, but it is also a limitation, because work that involves hardware, such as reading and writing files, can only be performed in kernel space.
System calls bridge this gap, letting a program request work that must be done in kernel space. For example, to load a file, a program calls the open system call to obtain a file descriptor, then calls the read system call to copy the file's data from that descriptor into a buffer, where the data can be manipulated.
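The open-then-read procedure can be sketched in Python, whose os.open and os.read functions are thin wrappers around the corresponding system calls. This is only an illustration under assumptions: the function name read_with_syscalls, the 4096-byte default block size, and the temporary demo file are hypothetical and do not come from the measurements discussed in this article.

```python
import os
import tempfile

def read_with_syscalls(path, block_size=4096):
    """Read a whole file with open/read/close, one block per read call."""
    fd = os.open(path, os.O_RDONLY)          # open: returns a file descriptor
    chunks = []
    try:
        while True:
            # read: the kernel copies up to block_size bytes into a user-space buffer
            block = os.read(fd, block_size)
            if not block:                    # empty result means end of file
                break
            chunks.append(block)
    finally:
        os.close(fd)
    return b"".join(chunks)

# Hypothetical demo file, created only so the sketch can run on its own.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello mmap" * 1000)
    demo_path = tmp.name

data = read_with_syscalls(demo_path)
os.unlink(demo_path)
```

Each pass through the loop crosses from user space into kernel space and back, which is the per-call cost the rest of the article examines.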
This is the usual procedure for working with files, but files can also be read and written through the mmap system call. mmap maps a file into the process's virtual memory, so the program can read and write the file through the mapped addresses without issuing a separate system call for each access.
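A minimal sketch of file access through a mapping, using Python's standard mmap module. The helper names read_with_mmap and overwrite_with_mmap and the demo file are hypothetical, chosen only for illustration.

```python
import mmap
import os
import tempfile

def read_with_mmap(path, offset, length):
    """Return `length` bytes starting at `offset` by slicing a memory mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing copies bytes out of the mapped pages; no read call per access.
            return mapped[offset:offset + length]

def overwrite_with_mmap(path, offset, payload):
    """Overwrite bytes in place through a writable mapping (file size is unchanged)."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as mapped:
            mapped[offset:offset + len(payload)] = payload
            mapped.flush()                   # ask the OS to write the changes back

# Hypothetical demo file: 4096 bytes cycling through the values 0..255.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 16)
    demo_path = tmp.name

before = read_with_mmap(demo_path, 256, 4)
overwrite_with_mmap(demo_path, 256, b"ABCD")
after = read_with_mmap(demo_path, 256, 4)
os.unlink(demo_path)
```

Once the mapping exists, reads and writes are ordinary memory accesses. The operating system loads pages from the file on demand and writes modified pages back.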
To compare file operation speed between ordinary system calls and mmap, sequential and random read speeds were measured for each method at block sizes of 4KB, 8KB, and 16KB. In the sequential read measurements, mmap was faster when the data was already in the buffer cache.
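The shape of such a comparison can be sketched as follows. This is not the benchmark behind the reported results; it is a small harness written under assumptions. The 4 MiB file size, the function names, and the use of os.pread (available on Unix-like systems) are all hypothetical choices. In Python, interpreter overhead will likely dominate the timings, so the numbers it prints should not be expected to match the figures discussed in this article.

```python
import mmap
import os
import random
import tempfile
import time

BLOCK_SIZES = (4096, 8192, 16384)    # 4KB, 8KB, 16KB, as in the text
FILE_SIZE = 4 * 1024 * 1024          # hypothetical test file size (4 MiB)

def block_offsets(file_size, block_size, shuffled):
    """Offsets of every block, in file order or in a fixed pseudo-random order."""
    offsets = list(range(0, file_size, block_size))
    if shuffled:
        random.Random(0).shuffle(offsets)    # fixed seed: same order for both methods
    return offsets

def read_syscall(path, offsets, block_size):
    """Read each block with one pread call; returns total bytes read."""
    fd = os.open(path, os.O_RDONLY)
    total = 0
    try:
        for off in offsets:
            total += len(os.pread(fd, block_size, off))
    finally:
        os.close(fd)
    return total

def read_mmap(path, offsets, block_size):
    """Read each block by slicing a mapping of the file; returns total bytes read."""
    total = 0
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        for off in offsets:
            total += len(m[off:off + block_size])
    return total

def run_benchmark(path):
    """Time both methods for every block size and access pattern."""
    results = {}
    file_size = os.path.getsize(path)
    for block_size in BLOCK_SIZES:
        for pattern in ("sequential", "random"):
            offsets = block_offsets(file_size, block_size, pattern == "random")
            for name, fn in (("syscall", read_syscall), ("mmap", read_mmap)):
                start = time.perf_counter()
                nbytes = fn(path, offsets, block_size)
                results[(name, pattern, block_size)] = (nbytes, time.perf_counter() - start)
    return results

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(FILE_SIZE))
    bench_path = tmp.name

results = run_benchmark(bench_path)  # the file was just written, so it is probably still cached
os.unlink(bench_path)

for (name, pattern, block_size), (nbytes, seconds) in sorted(results.items()):
    print(f"{name:8s} {pattern:10s} {block_size // 1024:2d}KB  {nbytes} bytes in {seconds:.4f}s")
```

A more careful measurement would repeat each run several times and control whether the file is cached, since the text notes that the sequential result applies when the data is already in the buffer cache.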
The random read results matched the sequential ones: mmap was faster than ordinary system calls. Profiling a sequential read with a 16KB block size shows where the CPU time goes. With ordinary system calls, copying data from kernel space to user space (copy_user_enhanced_fast_string) accounts for 61% of total execution time, and another 15% is spent in the functions that handle the transition between user space and kernel space (do_syscall_64, entry_SYSCALL_64). With mmap, 61% of the time is spent in __memmove_avx_unaligned_erms. In other words, the difference in read speed is largely determined by how efficient these two dominant routines are: copy_user_enhanced_fast_string for ordinary system calls and __memmove_avx_unaligned_erms for mmap.
That difference in efficiency comes down to AVX, a SIMD instruction set that operates on multiple data elements with a single instruction. __memmove_avx_unaligned_erms uses AVX and can make efficient use of memory bandwidth, whereas copy_user_enhanced_fast_string does not use AVX and so cannot exploit the available bandwidth as fully. This is why mmap can perform file operations faster than ordinary system calls.
The copy routine used by ordinary system calls does not use AVX because the AVX registers would have to be saved and restored on every system call, which would increase the cost of crossing between user space and kernel space. These findings suggest that applications may run faster if file operations performed through ordinary system calls are replaced with mmap.