[Linux Internals] Completely Fair Scheduling(CFS) — CPU Scheduler Algorithm
- Completely Fair Scheduling (CFS) is the default scheduler in the latest kernel versions since 2.6.23; elegant handling of I/O and CPU bound processes
- Both interactive and non-interactive processes can fit into this easily
- As the name suggests, each process receives equally fair CPU time for execution. Idea: If N processes are in the system, each process should have for (100N)% of the CPU time.
How CFS algorithm is incorporated in CPU scheduler?
Each process PCB(process control block) has an entry for ‘virtual runtime(vruntime)’. At every scheduling point, if the process has run for t ms, then its vruntime is incremented by t. So vruntime monotonically increases. i.e vruntime += t
Whenever timer interrupt or context-switch happens, it always chooses the next task with the lowest vruntime (min_vruntime). min_vruntime is a pointer which points to the lowest vruntime. Time slice will be dynamically recomputed; Process executes the task ; context-switch again occurs and cycle continues.
So, how the internal mechanism of CFS picking the next task to run works?
- CFS uses Red-Black tree data structure(self-balancing binary search tree) ; inserting/deleting tasks from the tree is O(logN)
- Each node represents a runnable task
- Nodes are ordered according to vruntime; Nodes in left-side have lower vruntime compared to the nodes on the right-side of the tree; i.e the left most node in the task with the least vruntime and that’s the node is referenced by min_vruntime
- At the point of context-switch:
- It picks the left most node of the tree in O(1) as its cached in min_vruntime
- If the previous process is still runnable, it is re-inserted into the tree with re-computed vruntime in O(logN); Tasks move from left to right side of the tree after its execution completes and hence “Starvation” has been avoided.
Starvation — > When high priority processes keep executing and low priority processes get blocked for indefinite time
- CFS does NOT use any priority-based queues and priority is used to just weigh the vruntime (i,e nice values).
How I/O and CPU bound processes are handled by CFS?
I/O bound processes should get higher priority and get a longer time to execute compared to CPU bound processes. Because I/O processes have low CPU burst time , so they will have lower vruntime. They mainly would appear on the left-side of the RBtree and so ending up with higher priorities!
What happens when a new process is created?
- It gets added to RB-tree
- Starts with initial value of min_vruntime and that ensures that it gets to execute quickly as possible[ Remember, lower vruntime tasks end up in left-side!]