Project 1

Implement First-Come First-Serve, Shortest Job First, Shorted Remaining Time First, Shortest Job First wtih exponential averaging, and Round-Robin Scheduling algorithms.

You need to have some program generate at least 1000 processes with around (on average) 750 CPU bursts and 250 IO bursts. CPU bursts should be on average 5 to 1000 time units. IO bursts should be on average 500 to 10000 time units. This data is to be generated using a random number generator. (they should not be all 750 CPU bursts, etc., that is just an average. Some processes may have 200 CPU bursts, etc.) The format should be something like the below:

``P1, ARRIVED:10, CPU:20, IO:600, CPU:90, IO:900, CPU:10''
(obviously the data will be much much much bigger; with one process per line).

Everyone must use this file format (I need to be able to feed my file into your simulation); you can then run your test numbers against other student's simulations; and spot bugs in either your version or theirs.

For all algorithms, assume 2 time units to do context switch. For round-robin, use time quanta of 10, 50, 100, 500, 1000, 1000000.

Assume ready-queue capacity of 20, 100, and unbounded (e.g. if capacity is 20, and you have a process arriving, you cannot put it on the ready queue; it has to wait)..

Note that processes must start and end with a CPU burst (not an IO burst), and cannot have 2 IO bursts sequentially; there needs to be more CPU bursts than IO bursts. There can be 2 sequential CPU bursts.

In your simulations, assume the process doesn't have to wait to perform IO (as it has to wait for the CPU). This is not realistic though, as 2 processes cannot access the same IO device at the same time, but is a simplifying assumption for this simulation.

After generating all that data, you need to run it through the algorithms and for each, produce:

CPU utilization (what percentage of the time does the CPU spend doing actual work - does not include context switching or when CPU is idle).

Throughput - How many processes are going through the system per 1000 time units. (you can just count number of process competitions per 1000 time units).

Turnaround time - Average time it takes for a process to enter the system, get executed and exit. From original arrival time to the time the process finishes its last CPU burst.

Waiting time: Average, Standard Deviation, and Median waiting time - how much time the processor spends on the ready queue waiting to execute (this does not include time the processor spends doing IO). Waiting time starts when the process enters the ready queue (not when it enters the system).

In the end, you should have:

1. First Come First Serve

2. Shortest Job First

3. Shortest Remaining Job First

4. Shortest Job First using exponential average to predict the next CPU burst length (instead of the actual computed values). The computed values will be the 'actual' time the process spends, but this simulation determines how good the predictive power of exponential average is compared to pure deterministic approach (where all values are known). See page 159 of dinosaur book for equation to use (it's in the chapter on scheduling).

5. Round Robin, with the quantum 10.

6. Round Robin, with time quantum 50.

7. Round Robin, with time quantum 100, etc., (other quantum numbers).

8. All of the above with different ready queue sizes.

Write a 1-2 page paper describing your project, what problems you've faced and how you've overcome them. In your opinion, what is the best scheduling algorithm? What are practical limitations on that algorithm? (is it best only because it has detailed data, or does it adapt well like the SJF with exponential average prediction?)

What is the effect of increasing the size of the ready queue? What is the effect of decreasing it? What is the effect of decreasing context switch time to 1 time unit? What is the effect of increasing it to 10? What percentage of turnaround time is the process waiting?

Your simulation must be able to run on UNIX/Linux. In other words, if you're using C/C++, make sure the proram compiles using GCC (and provide a makefile). Write portable code. Do not use features that are only found in Windows API. If using Java, make sure file paths will work on Unix base system. Use any programming language (I think I have all of them installed on my computer, if not sure, check with me); if using C#, make sure your code will compile and run under Mono. (I plan to test your code under Linux).

Do not use any GUI interface. Simulation will either accept a filename as argument, or read standard input. The output will go into a file, or be output to standard output (which can be redirected to a file).

Do not make the simulation interactive. Make it scriptable (ie: I should be able to run 1 batch file (or shell script) that will run the entire thing with all the parameters). [there are many students in the class, I can't fiddle with the code of every single student---if it doesn't compile/run just by typing "run.sh" (or something) I'll email it back to you]. Make it easy for me to run.

What to submit:

Submit your initial test data (in some well formatted fashion - make it readable). That should be smaller than the dataset you use to run the simulation. Submit the program that generated the simulation data (for the final numbers).

Submit ALL source code with detailed instructions on how and where to compile it, and how to make it run. You should submit a Makefile to build your code under GCC, Java, or whatever language you use. Note that Visual C++ also supports Makefiles, so if you use that, you can still export a makefile.

Submit your findings about each algorithm; average numbers (waiting times, etc.) (in addition to your 1-2 page paper)

Submit your paper describing the project.

Submit a file named: description.txt that describes what and where all the information is stored. (which file does what, which file contains results of which algorithm, etc.). This is mostly so that I don't get lost in your project directory.

Note: All descriptions and text should be in TEXT format. Do NOT submit MS Word documents, etc. Everything has to be readable without any special programs. (I cannot read Word documents. If something "has" to be formated, use PDF).

You may use any language you wish, but you need to document what language you're using and how to compile code somewhere in your submission. Also comment your code! If I can't read it, it's wrong!

When submitting, you're very likely to have many files. You can compress them into a tar.gz or zip and submit that.

Tips: For many of you, this will be the biggest project you've ever worked on. It helps to organize the code from the start, to document everything, etc. Make the code readable (not just for you, but for me as well). Modularize your code. Work on 1 thing at a time. Start by generating the data and placing it in some file. Then implement basic algorithms and data structures like lists, queues, sorting, etc. Then implement First-Come, First-Served, make sure that works well, then implement SJF, probably reusing some parts of the First-Come, First-Served code, etc.

And most importantly: Organize and design the project and know what you're doing before you start coding. (and don't wait until the last weekend to do it). Approached gradually and systematically this isn't a big project---approached during the last weekend, this is an impossible project---don't wait until the last day.

You are free to discuss the project with your class mates. In fact, you are encouraged to do that! BUT, you are NOT to share code! If I find same code (or code derived from a single source) I will simply split the grade (ie: if 2 people do it, their grade will be cut in half... if 3 people do it, their grade will be split 3 ways, and so on). Just wanted to make that clear. Share ideas, but keep your source code secure from everyone. You have been warned.

Check your results

Easy way to see if your data makes sense is: turnaround time = time process waits to get on the ready queue + time process uses the CPU + time process does IO + time process waits on the ready queue.

This implies that knowing all these, you can easily check whether any number is correct (ie: you cannot have process waiting 55% of the turnaround time doing IO 25% of the turnaround time, and executing 35% of the turnaround time---percentages have to add upto 100%).

proj1_hint.txt