CIS 25 - Notes 0015

One of the major reasons to have a distributed system is to be able to manage the load on the system. If you need to process potentially millions of customers, requests, queries, etc. at the same time, you need to ensure that you can handle the load.

Vertical Scalability

There are several ways of increasing your capability of handling large loads. The most obvious is simply to upgrade the hardware. Put in more memory, get a faster computer, etc. This is called vertical scalability.

You can only go so far with this. There is a rather hard limit on the speed of any single CPU, or of a machine loaded with a bunch of CPUs. For example, you cannot buy a 10,000 GHz processor, no matter how much you're willing to spend.

While vertical scaling becomes more effective as hardware improves over time, the real key to building a scalable distributed system lies in...

Horizontal Scalability

Horizontal scalability refers to the idea of increasing performance, throughput, processing power, etc., by adding more machines.

This is the holy grail of scalability that everyone is always striving for. The idea of "use as much hardware as you need" and scale virtually infinitely with demand is very appealing. The trouble is that it doesn't work like that. In reality, horizontal scalability is very often tied to vertical scalability.

For example, for your distributed system to work, the client needs to connect to a certain machine and be assigned a distributed host to use; that single machine becomes the bottleneck, and speeding it up means scaling it vertically. Very often the database (many machines using a single database) becomes the bottleneck.
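The bottleneck effect can be sketched in a few lines of Python. This is only an illustration: the function name, the per-server rate, and the database rate are all made-up numbers, and the model assumes every request hits the shared database exactly once.

```python
def system_throughput(num_servers, per_server_rps, database_rps):
    """Requests per second the whole system can sustain.

    Illustrative assumption: every request touches the single shared
    database once, so the database caps the total no matter how many
    application servers are added.
    """
    return min(num_servers * per_server_rps, database_rps)


if __name__ == "__main__":
    # Hypothetical numbers: 500 req/s per server, 3000 req/s for the database.
    for n in (1, 2, 4, 8, 16):
        print(n, system_throughput(n, per_server_rps=500, database_rps=3000))
```

Under these assumed numbers, adding servers stops helping once the server farm can produce more requests than the database can absorb; past that point the only way forward is a faster database.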

Role of Operating System/Environment

The role of the operating system is usually trivial. The operating system needs to support basic networking, etc. Usually the bigger question is the role of the infrastructure. For example, you can develop a distributed system, connect to the database, etc., without even being aware that the database itself could be a database cluster, instead of the single entity that you see and use.

In high-end distributed systems the network layout and routing also play a major role. The system needs to be designed in such a way as to minimize network traffic.

Web Site Scalability

So given a simple website, how do you make it handle thousands of users at the same time? Let's say you're providing a service like Yahoo Mail or Hotmail, and you need to be able to service a million customers from all over the world. How do you do it?

The answer is load balancing. The usual form of load balancing on websites takes the form of redirection. You set up a script, servlet, program, or server that accepts the initial user request. Out of a list of available servers, it picks the least busy one and redirects the user to that host. From then on, the user deals with that server and no longer bothers the main website.

The available servers are periodically pinged and queried about their load, so the redirection script has a good idea of what's happening in the server farm.
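The redirection scheme can be sketched roughly as follows. This is a minimal illustration, not a production load balancer: the class name `Redirector`, the `query_load` callback, and the example host names are all hypothetical, and a real deployment would call `refresh()` on a timer and query the servers over the network.

```python
class Redirector:
    """Picks the least busy server and redirects the user to it."""

    def __init__(self, servers, query_load):
        self.servers = list(servers)
        # query_load is a hypothetical callback: host -> load number.
        # It is expected to raise OSError if the host does not answer.
        self.query_load = query_load
        self.loads = {}

    def refresh(self):
        """Poll every server; hosts that do not answer are left out."""
        loads = {}
        for host in self.servers:
            try:
                loads[host] = self.query_load(host)
            except OSError:
                continue
        self.loads = loads

    def pick(self):
        """Return the host with the lowest load seen in the last poll."""
        if not self.loads:
            raise RuntimeError("no servers available")
        return min(self.loads, key=self.loads.get)

    def redirect(self, path):
        """Build an HTTP 302 response pointing the user at the chosen host."""
        host = self.pick()
        return 302, {"Location": f"http://{host}{path}"}


if __name__ == "__main__":
    # Made-up load figures standing in for real measurements.
    fake_loads = {
        "mail1.example.com": 0.72,
        "mail2.example.com": 0.15,
        "mail3.example.com": 0.40,
    }
    redirector = Redirector(fake_loads, lambda host: fake_loads[host])
    redirector.refresh()
    print(redirector.redirect("/inbox"))
```

After the redirect the user talks to the chosen host directly, so the redirector only pays for one small response per new visitor.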

This is software load balancing, and there are many such products on the market (including ones from major software companies like Microsoft). There is also a form of hardware load balancing that monitors the actual physical network traffic and directs requests based on that.

Both approaches have their good points and bad points; low network utilization doesn't mean the server isn't busy, and similarly, the server may be relatively free yet be inputting and outputting lots of data that would clog the network. Usually a combination of these approaches works best.
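One way to picture such a combination is a single score that mixes both measurements. The weighted average below is just an assumption made for illustration (the notes do not prescribe a formula), and the function names and load figures are hypothetical.

```python
def combined_score(cpu_load, net_utilization, cpu_weight=0.5):
    """Blend server load and network utilization (both 0.0 to 1.0).

    The weighted average is an illustrative choice, not a standard formula.
    """
    return cpu_weight * cpu_load + (1 - cpu_weight) * net_utilization


def least_loaded(servers):
    """servers maps a name to a (cpu_load, net_utilization) pair."""
    return min(servers, key=lambda name: combined_score(*servers[name]))


if __name__ == "__main__":
    servers = {
        "a": (0.10, 0.90),  # idle CPU, saturated network
        "b": (0.40, 0.30),  # moderate on both
        "c": (0.80, 0.05),  # busy CPU, quiet network
    }
    print(least_loaded(servers))
```

With these made-up figures, looking at server load alone would choose "a" and looking at network traffic alone would choose "c"; the combined score prefers "b", which is not overloaded on either measure.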

Distributed Scalability

Scalability in distributed systems works pretty much the same as described earlier. The only difference may be the more dynamic nature of the process, which brings more unpredictable situations.

Most distributed architectures have a thing called a naming service. It is a server that allows you to look up service references based on their name. Often this service is modified to pass out service references on different computers; just like a website client may be redirected, this naming service just tells you where you can find the component that you're looking for. In other words, it is exactly the same idea with different technology.
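A toy version of such a naming service might look like the sketch below. The class, its `bind` and `lookup` methods, and the host names are all hypothetical and do not correspond to any particular distributed architecture. For simplicity it hands out hosts in round-robin order instead of asking each host for its load.

```python
import itertools


class NamingService:
    """Maps a service name to one of several hosts that provide it."""

    def __init__(self):
        self._cycles = {}

    def bind(self, name, hosts):
        """Register the hosts that offer the named service."""
        hosts = list(hosts)
        if not hosts:
            raise ValueError("at least one host is required")
        # itertools.cycle repeats the hosts endlessly, in order.
        self._cycles[name] = itertools.cycle(hosts)

    def lookup(self, name):
        """Return the next host for the name, rotating on each call."""
        if name not in self._cycles:
            raise LookupError(f"no service bound to {name!r}")
        return next(self._cycles[name])


if __name__ == "__main__":
    naming = NamingService()
    naming.bind("MailService", ["host1:9000", "host2:9000"])
    for _ in range(3):
        print(naming.lookup("MailService"))
```

Each client asks for the same name but may receive a reference on a different machine, which spreads the work without the client knowing how many machines exist.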