CIS 25 - Notes 0007

In a multiprogramming environment where several processes compete for resources, a situation may arise where a process is waiting for resources that are held by other waiting processes. This situation is called a deadlock.

From your book: "Perhaps the best illustration of a deadlock can be drawn from a law passed by the Kansas legislature early in the 20th century. It said, in part: 'When two trains approach each other at a crossing, both shall come to a full stop and neither will start up again until the other has gone.'"

Introduction

Generally, a system has a finite set of resources (such as memory, IO devices, etc.) and a finite set of processes that need to use these resources.

A process which wishes to use any of these resources makes a request to use that resource. If the resource is free, the process gets it. If it is used by another process, it waits for it to become free.

The assumption is that the resource will eventually become free and the waiting process will continue on to use the resource. But what if the other process is also waiting for some resource?

"A set of processes is in a deadlock state when every process in the set is waiting for an event that can only be caused by another process in the set."

Imagine a situation where two processes have acquired a lock on two tape drivers (or hard drives, etc.), but need two such resources to proceed with execution (for example, to copy something from one to the other). Each is waiting for the other process to release the other tape drive, which will never happen, since the other is also waiting for the same thing.

Deadlock Characterization

A deadlock situation can arise if the following four conditions hold simultaneously in a system:

Mutual Exclusion: At least one resource must be held in a non-sharable mode; that is, only one process at a time can use the resource. If another process requests the resource, the requesting process must be delayed until the resource has been released.

Hold and Wait: A process must be holding at least one resource and waiting to acquire additional resources that are currently being held by other processes.

No Preemption: Resources cannot be preempted; that is, a resource can be released only voluntarily by the process holding it, after that process has completed its task.

Circular Wait: A set {P0, P1, P2, …, Pn} of waiting processes must exist such that P0 is waiting for a resource that is held by P1, P1 is waiting for a resource that is held by P2, …, Pn-1 is waiting for a resource that is held by Pn, and Pn is waiting for a resource that is held by P0.

All four conditions MUST hold for a deadlock to occur.

Resource-Allocation Graph

The book presents a visualization tool for easier illustration of deadlock situations. (refer to section 8.2.2 and Figure 8.1 in the dinosaur book).

The idea is to have a graph. The graph has two different types of nodes, the process nodes and resource nodes (process represented by circles, resource node represented by rectangles). For different instances of a resource, there is a dot in the resource node rectangle. For example, if there are two identical printers, the printer resource might have two dots to indicate that we don't really care which is used, as long as we acquire the resource.

The edges among these nodes represent resource allocation and release. Edges are directed, and if the edge goes from resource to process node that means the process has acquired the resource. If the edge goes from process node to resource node that means the process has requested the resource.

Again, refer to Figure 8.1 on page 247 of the dinosaur book.

We can use these graphs to determine if a deadline has occurred or may occur. If for example, all resources only have one instance (all resource node rectangles have one dot) and the graph is circular, then a deadlock has occurred. If on the other hand some resources have several instances, then a deadlock may occur. If the graph is not circular, a deadlock cannot occur (the circular wait condition wouldn't be satisfied).

Methods for Handling Deadlocks

We can deal with the deadlock issue in several ways:

1. We can use specific protocols to prevent or avoid deadlocks (preventing it).

2. We can detect the deadlock and recover from it (recovering after it has occurred).

3. We can totally ignore the deadlock problem (pretend it doesn't exist). This is what most operating systems do (including UNIX and Windows).

Preventing deadlocks is ensuring that at least one of the necessary four deadlock conditions cannot occur. Avoiding a deadlock is knowing which resources the process will use beforehand and only run the process when all the resources are available for the process to use (not make it wait for individual resources one by one - which may cause a deadlock).

If we do not provide for deadlock prevention or deadlock avoidance, the system may enter a deadlock state. At this point, we may employ some deadlock detection scheme and a recovery (if there is indeed a deadlock).

If we don't prevent nor recover from it, the system will eventually have deadlocks. This is the strategy used by most operating systems. Luckily, deadlocks are usually rare, and the systems that are affected usually suffer from other freezing problems (process not releasing control in a non-preemptive environment, program errors, etc.) that make deadlocks seem unimportant. [ie: if you reboot your Windows every day because something gets messed up, you won't mind rebooting your Windows every year because of a deadlock].

Deadlocks Prevention

We can prevent deadlocks by ensuring that at least one of the four necessary conditions for deadlock cannot occur. Note, that if at least one condition is not satisfied, a deadlock will not occur.

Mutual Exclusion

Mutual exclusion condition must hold for non-sharable resources. For example, only one process can have access to a printer at a time, otherwise the output gets garbled up. Some resources can be made sharable, like a read-only file. Several processes can be granted real-only access to a file without them interfering with each other.

Generally, however, there will always be intrinsically non-sharable resources.

Hold and Wait

To prevent the Hold and Wait condition our process can allocate all resources that it needs before starting execution, that way, it won't have to wait for resources during execution (thus, preventing deadlocks).

Another strategy is to allow processes to allocate resources when they have none. Again, this prevents deadlocks since a process must release whatever resources it has before requesting more resources.

Each of these has some performance or resource utilization issues. If we allocate all resources at the beginning of the program, we will hold them all while the program executes. We may not need all of them all at once though, and so resources end up underused (since no other process can use them while we hold them).

Another problem is starvation. Some processes may never get to execute since some other process always has control of some popular resource.

Preemption

Preemption of resources means that we take away resources from processes when they are waiting for other resources. This could work several ways:

1. As soon as a process requests for a resource that is not available, all of its held resources are released. The process is now waiting for all of previously held resources plus the resource that it requested.

2. When a process requests a resource, if it is available, it gets it; else, a scan is made of waiting processes that hold that resource. The resource is taken away from the waiting process and allocated to the requesting process. If this cannot happen, the process is made to wait (at which point, some other process may grab some of its resources).

Again, there are some limitations of where and to what resources this can be applied.

Circular Wait

To prevent Circular Wait we can define an order by which processes allocate resources. For example, each resource type gets a number, and the processes can only allocate resources in increasing order of those resource numbers.

For example, if we setup the tape drive to have number 1, disk drive to have number 5, and printer to have number 12, a process that wants to read the disk drive and print out the results, will first need to allocate the disk drive, then the printer. It will be prevented from doing it in reverse order.

The book presents a proof why this prevents circular wait. It is not hard to see though, if every process is forced to follows this protocol, then processes will only wait for higher resources to be allocated and never wait for lower resources (and lower resources cannot be allocated by processes that are holding higher resources). This prevents the circular wait condition for a deadlock.

Deadlock Avoidance

Deadlock avoidance deals with processes that declare before execution how many resources they may need during their execution. If given several processes and resources, we can allocate the resources in some order as to prevent the deadlock, the system is said to be in a safe state, else, if a deadlock is possible, the system is said to be in an unsafe state.

The idea of avoiding a deadlock is to simply now allow the system to enter an unsafe state which may cause a deadlock. We can define what makes an unsafe state.

For example, consider a system with 12 tape drives and three processes: P0, P1 and P2. P0 may require 10 tape drives during execution, P1 may require 4, and P2 may require up to 9.

Suppose at that some moment in time, P0 is holding on to 5 tape drives, P1 holds 2 and P2 holds 2 tape drives. The system is said to be in a safe state, since there is a safe sequence that avoids the deadlock.

This sequence implies that P1 can instantly get all of its needed resources (there are 12 total tape drives, and P1 already has 2, so maximum it needs is at least 2, which it can get since there are 3 free tape drives). Once it finishes executing, it releases all 4 resources, which makes for 5 free tape drives, at which point, P0 can execute (since a maximum it needs is 10), after it finishes, P2 can proceed since a maximum it needs is 9.

Now, here's an unsafe sequence: Let's say at some time P2 requests one more resource to make its holding resources 3. Now, the system is in an unsafe state, since only P1 can be allocated resources, and after it returns, it will only leave 4 free resources, which is not enough for either P0 or P2, so the system enters a deadlock.

There are algorithms that determine if a safe state can be maintained after a resource allocation, and the resource will not be allocated that might cause the system to enter an unsafe state.

Your book describes two algorithms that would leave the system in a safe state: The Resource-Allocation Graph Algorithm, and Banker's Algorithm. You should read them on your own.

Deadlock Detection

If the system doesn't use any of the preventive or avoidance methods, it needs to provide for deadlock detection and recovery. Let us first worry about detection.

Single Instance of Each Resource Type

If each resource type has only one instance (There is only 1 printer, 1 hard drive, 1 tape drive, one screen, etc.) then we can use a variant of the resource-allocation graph called a wait-for graph.

The idea is that we collapse all the request and release edges from a resource-allocation graph into wait-for edges. (Refer to figure 8.7 on page 261 of the dinosaur book.) In the graph, processes are not waiting for resources, but for other processes that hold the resources. If the graph contains a cycle, a deadlock has occurred.

This approach requires that a cycle detection algorithm runs from time to time to detect deadlocks. Algorithms that detect cycles are O(N²), where N is the number of nodes.

There is an algorithm that deals with Several Instances of a Resource Type and again, you should read the details of it on your own.

Detection-Algorithm Usage

Detection algorithms need to be executed to detect a deadlock. The frequency and time when we run such algorithm is dependent on how often we assume deadlocks occur and how many processes they may effect.

If deadlocks may happen often, we run the detection often. If it affects many processes, we may decide to run it often so that less processes are affected by the deadlock.

We could run the algorithm on every resource request, but considering that deadlocks are rare, it is not very efficient use of resources. We could run the algorithm from time to time, say every hour, or when CPU utilization crosses some threshold, or at random times during the system execution lifetime.

Deadlock Recovery

We can recover from a deadlock via two approaches: we either kill the processes (that releases all resources for killed process) or take away resources.

Process Termination

When recovering from a deadlock via process termination, we have two approaches. We can terminate all processes involved in a deadlock, or terminate them one by one until the deadlock disappears.

Killing all processes is costly (since some processes may have been doing something important for a long time) and will need to be re-executed again.

Killing a process at a time until deadlock is resolved is also costly, since we must rerun deadlock detection algorithm every time we terminate a process to make sure we got rid of the deadlock.

Also, some priority must be considered when terminating processes, since we don't want to kill an important process when less important processes are available. Priority might also include things like how many resources are being held by that process, or how long has it executed, or how long it has to go before it completes, or how many resources it needs to complete its job, etc.

Resource Preemption

This approach takes resources from waiting processes and gives them to other processes. Obviously, the victim process cannot continue regularly, and we have a choice of how to handle it. We can either terminate that process, or roll it back to some previous state so that it can request the resources again.

Again, there are many factors that determine which process we choose as the victim.

Note: that if the system has resource preemption, by definition, a deadlock cannot occur. The type of resource preemption we are talking about here is non-normal preemption that only occurs when a deadlock detection mechanism detected a deadlock.

More Resources on Deadlocks

Distributed Systems (Chapter 17 of the book) also suffer from deadlocks. Read section 17.5 of the book for an illustration of some of the problems in preventing and detecting deadlocks in such systems.

Summary

In summary, a deadlock is when a process has some resource and is waiting to acquire another resource, while that resource is being held by some process that is waiting to acquire the resource that is being held by the original process.

A deadlock needs four conditions to occur: Mutual Exclusion (2 processes cannot use resource at the same time), Hold and Wait (process holds a resource and requests another), Non-Preemption of resources (the system doesn't take away resources from waiting processes - processes release resources voluntarily after they are finished using them), Circular Waiting (there must exist a waiting chain such that processes are waiting for other processes that are waiting for the original processes to release some resource).

We can handle deadlocks in three major ways: We can prevent them, Handle them when we detect them, or simply ignore the whole deadlock issue altogether.