main | forum
March 14th, 2024    

CP207
Main
Overview
Forum

Notes
0001
0002
0003
0004
0005
0006
0007
0008
0009
0010
0011
0012
0013
0014
0015
0016

Tests/Quizes
quiz1
quiz2
quiz3
quiz4
quiz5
midterm
northwoods db
midterm ans
final

Notes 0001

Database Systems

Traditional Database Systems

When companies first began using computers, each application had its own set of data files, which were used only by that application.

For example, a bank may have checking account programs which operated on checking account data files. Savings account programs, which operated on savings account data files, etc.

There are several major problems with this approach. Some of the major ones are:

Data redundancy and inconsistency: For our bank example, your name and address may be stored in both savings account and checking account data files. This wastes space. If you ever changed your phone number or address, the change has to be updated everywhere; if it wasn't, then data becomes inconsistent.

Accessing data difficulties: Sine each application can only access its own files, accessing unexpected data is a major problem. If for example the bank needed to know the names and addresses of every customer who has more than $10,000 in their checking account, unless they already have such a search, it would be very difficult to accomplish (requiring writing a new application component just for that search).

Data isolation: Because data is saved in various places in various formats, it's difficult to write new applications that work with the data.

Integrity problems: The database data must satisfy some consistency constraints. For example, a bank account may never drop below a certain amount, etc. When changes to these constraints change, applications need to be modified/rewritten in order to accommodate them (not a simple task to fix something that’s relatively simple).

Atomicity problems: There is no easy way to ensure atomic transactions. For example, in our bank example, let's say a user wanted to transfer money from their savings account into a checking account – and right in the middle of this process, the system fails. The result could be unpredictable: the savings account may have been deducted, but checking may not have had time to be incremented.

Concurrent Access problems: When several users access such data files, it is very easy for data to become inconsistent simply because of interacting users. For example, one user may notice $100 in the account, and decide to withdraw it; but before the withdrawal could be completed, another user notices the $100 and withdraws it first. Now the 1st user is withdrawing $100 from an empty account.

Security Problems: Not every user of the database should be allowed to access all the data. Enforcing such security with disconnected programs using various data files of various formats is a problem.

Personal Databases

Originally, databases were running on big mainframe computers. These were mostly accessed via dumb terminals. Once PCs became relatively cheap, a new database concept emerged: personal databases.

Now, the user could run the database on their own computer, accessing a network shared data file (note that this has some of the problems point out in the previous section).

There are also other types of personal databases: those meant for a single user environment. Microsoft Access is an example of such a database. It is meant to run on a relatively low powered machine, and is not designed for enterprise data handling in mind (it lacks data locking features, among other things).

Clint/Server Database Systems

To overcome some of the limitations mentioned in previous paragraphs, client/server databases were invented. The idea is fairly basic: you have a database server somewhere on the network. It manages its own data files. If you wanted to insert data, get data, work on data, etc., you simply tell the database server what to do.

The key is that now, instead of you doing all of the data processing on your own machine, the database server does everything, and sends you the results.

Solved Problems

Because the database is now a single separate entity that's responsible for the data, some of the "data file" approach problems were overcome.

Data is now managed in one place, so data redundancy is minimized. You can still make bad database designs that have redundant data, but you can also avoid it easily.

Security is managed in one place. Users can be restricted from allowing certain things on certain data.

Transaction processing has become a reality. Advanced databases such as oracle, etc., maintain a transaction log, so if something happens (database crashes, etc.) the server can return the database to a correct state.

Less advanced databases such as Microsoft Access store the transaction log in memory, which means that if the computer crashes while in the middle of a transaction, the database can be in an inconsistent state.

Error recovery is also improved greatly. If you only have to recover one server, as opposed to many independent programs and their respective data files, you save time.

Relational Databases

Reason

When each program managed its own data, in its own format, it didn't really matter much how the data was stored. As long as some particular program can read it, it's ok. It wasn't like there needed to be a common conceptual data format for everybody to obey.

Now, once data started to be stored in a single place (the database server), the needed to be some common way of storing all the data. Several approaches as possible, among the main ones are: hierarchical or relational.

Hierarchical approach is flexible, but it depends on links between data nodes, and most disk storage media aren't really friendly to linked data structures.

Relational database views data in terms of tables, or matrices with columns and rows. Columns represent different data categories, and rows contain the actual data values.

Columns are also called fields, and rows are called records.



































© 2006, Particle