Linux
Comparison and analysis of select, poll, and epoll.

Select

#include <sys/select.h>
 
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

The parameter nfds represents the maximum file descriptor value in the set of file descriptors to be monitored plus 1.

The sets readfds, writefds, and exceptfds represent the file descriptors to be monitored for readable, writable, and exceptional events respectively. A value of NULL can be passed for any set that does not need to be monitored.

The parameter timeout specifies the timeout period for the select function.

The return value of select indicates the number of ready file descriptors. A return value of 0 indicates that no file descriptor became ready within the timeout period, while a negative return value indicates an error.

There are three main drawbacks to using select:

  1. Every time select is called, the entire set of file descriptors to be monitored must be copied from user space to kernel space. This overhead increases as the number of file descriptors in the set grows.
  2. select uses polling to check the file descriptor sets for readiness, which can result in high overhead if many file descriptors are not ready.
  3. select can only monitor a limited number of file descriptors at a time. This limit is typically determined by the number of bits used to represent file descriptors (32 or 64). The fd_set is a bitmap where each bit represents a file descriptor. The initial number of bits is only 1024, which may be increased in newer versions, but the limit remains.

poll

#include <poll.h>
 
int poll(struct pollfd *fds, nfds_t nfds, int timeout);
 

The poll function is similar to select in that it is used to wait for any one of a set of file descriptors to become ready. However, instead of using an fd_set, the poll function uses an array of pollfd structures to describe the set of file descriptors.

Each pollfd structure contains information related to a file descriptor, including the file descriptor itself, the events to be waited for, and the events that have occurred. When using the poll function, the file descriptors to be waited on and their corresponding event types are packaged into an array of pollfd structures and passed to the poll function.

One advantage of poll over select is that there is no limit to the maximum number of file descriptors that can be monitored. Another advantage is that the file descriptor set does not need to be re-initialized every time poll is called - only the array of pollfd structures needs to be updated. However, poll is generally less efficient than epoll.

While this solves the issue of the file descriptor limit in select, it still suffers from the following problems:

  1. The pollfd structures must still be copied to kernel space.
  2. Polling is still used to check for readiness.
  3. However, this data structure avoids the problem of too small a limit on the number of file descriptors that can be monitored.

Epoll

#include <sys/epoll.h>
 
int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
 
struct epoll_event {
    uint32_t events; // 文件描述符事件类型,如 EPOLLIN, EPOLLOUT 等
    epoll_data_t data; // 用户私有数据,可以是一个指针或一个整数
};
 
typedef union epoll_data {
    void *ptr;
    int fd;
    uint32_t u32;
    uint64_t u64;
} epoll_data_t;

epoll is an I/O multiplexing mechanism that has higher efficiency in handling a large number of file descriptors compared to select and poll. Introduced in Linux, epoll has the following features:

  1. epoll manages multiple file descriptors with a single file descriptor, stores the events of the file descriptors that the user cares about into a kernel event table, reducing the number of copies between user space and kernel space.
  2. epoll uses a Red-Black Tree to maintain the set of file descriptors, enabling fast insertion and deletion, and uses a doubly-linked list to store the set of ready file descriptors, enabling quick access.
  3. epoll supports both Edge Triggered (ET) and Level Triggered (LT) modes, with LT being the default mode. In LT mode, when a file descriptor is ready, epoll_wait returns the file descriptor, while in ET mode, epoll_wait returns the file descriptor only when it transitions from not ready to ready.
  4. epoll does not decrease efficiency as file descriptors increase, as it is event-driven.

epoll_create is used to create an epoll instance and returns a file descriptor. size represents the number of nodes in the Red-Black Tree of the instance. epoll_ctl is used to add, modify, or delete a file descriptor from the epoll instance. op represents the type of operation and can be one of EPOLL_CTL_ADD, EPOLL_CTL_MOD, or EPOLL_CTL_DEL. fd is the file descriptor to be added, modified, or deleted, and event is the event type of the file descriptor to be added, modified, or deleted. epoll_wait is used to wait for events to occur and returns the list of ready file descriptors.

Compared to other methods,

  1. There is still a need for a kernel transition, but based on the kernel callback mechanism, it avoids frequent context switching.
  2. The use of callbacks instead of polling increases efficiency.
  3. With the ability to add new descriptors, there is no limit to the number of file descriptors.

In addition, both edge-triggered (ET) and level-triggered (LT) modes are supported.

In LT mode, every time epoll_wait() is called, the state of the descriptor is returned if it is ready.

In ET mode, the state of the descriptor must change from not ready to ready in order to be returned during the next epoll_wait() call.

summaries

Fselectpollepoll
timeO(n)O(n)O(1)
FD limit1024(default)no limitno limit
triggeredLTLTET and LT