Was having a chat with an administrator in our offices who for years had dealt with some of our staff when they were off to do Oracle work. Oracle work for the company tends to be the installation of Oracle and then our deploying our software or the migration of Oracle from one server to another.
They said that they would like to understand how Oracle actually works so they had some appreciation of what people were talking about and what people onsite were trying to do. So I wrote the below to try and help them out!
(Rather than draw everything out with Visio I decided to draw the pictures! Then got someone else to re-draw them as it turns out my drawing is as good as my writing)
What’s a database? And what is Oracle?
It’s not the most catchy title in the world but it’s the question that was asked of me. So to put it in terms that most people will understand let’s start with an activity that most businesses and individuals will have done at some point; storing address information.
Typically you will want to store the addresses of your customers (or friends) in a file somewhere much as you would have done before computers even existed! (Remember address books?) To do so we will assume that you create a single file called “ADDRESSES” and put your first address in that file. To be safe, and to let other people see that file, you store it on a server somewhere in your organisation.
So now you have a file which you can open and write to whenever you choose to and that you can share with others.
OK, so this works but what if someone else wants to read the file as well as you? For Microsoft servers such as (but not exclusively) Windows 2008 one person can open a file to write to it whilst another person can open a file to read it. This might work if you only want to share with one other person but is clearly going to be a problem if there is anyone else who might want to read it.
So let’s say that there are now 3 of you and you all want to be able to read and write to the file at the same time. One way to do this is the other two people relay their requests to read and write information to the file through you and you hold the file open permanently. But that’s a total waste of your time so why don’t we get a program to do that, hold open the file permanently and accept requests from multiple different people to read and write to the same file? Enter the database engine.
The database engine holds open your file permanently and you now ask it for the contents of the file and ask it to write to the file. The engine, being built for such things, will allow multiple requests to happen at the same time; like reading from two places in the file at the same time.
We could add in a few more files as well which the database engine could manage like a list of our products or perhaps a list of customers and their phone numbers. You might call these files PRODUCTS and CUSTOMERS as that seems logical.
These files live on the server still and are on a number of hard disks which the database engine reads from each time you ask it for some information (remember that the database engine has our 3 files open all the time). The hard disk(s) can only read and write information at a certain rate which is dictated by the design of the hardware that the server is made up of. This design is something that can make a huge difference to the speed of the return of information and a great deal of time is spent making sure that bottlenecks are removed to ensure high performance. Ultimately though the hardware can only do so much and if many many people are trying to read a lot of information at the same time then problems are likely to happen with the speed of retrieval.
To counter some of this you could move the very busy file (let’s say CUSTOMERS) from the hard disks that ADDRESSES and PRODUCTS are sitting on onto another hard disk. That will stop people using the CUSTOMERS file from affecting the ADDRESSES and PRODUCTS file users and vice versa. The database engine itself takes care of this and, because you only ever ask the database engine for information, and don’t go direct, the move of that file on the server will be transparent to the end user.
Datafiles
So this carries on and more little files are created to store information for the office. This gets a little messy with all these files lurking around on the hard disks and the database engine having to keep open all these little files. So the database creates a really big container (or file) and puts all these smaller files into that container. It can do this by making them virtual files within that big container/file and with some clever code that organises them. This means that the database engine now only holds open a single container/file which, in Oracle terms, is called a “Datafile”.
Now if you remember earlier we said that the CUSTOMERS file (now virtual) is very very busy so we don’t want that in the same Datafile as the ADDRESSES and PRODUCTS so we create a second Datafile and move that virtual file in there. We move that Datafile to another set of disks and we have helped our performance problem. The database doesn’t mind doing this as many times as you like depending on how many disks you have to use, and to the end user it wont make any difference as they don’t see these changes “under the hood”.
Tables
Next problem - the files are getting really chaotic as people add in new information in whatever format they like or they add in duplicates.The files are getting bigger and become increasingly difficult to find the information you are looking for. So we need to organise this information properly and give it structure.
Most of us are familiar with excel spreadsheets and how they can be used to store data. So for this random scribble I will assume that is the case. An excel spreadsheet is a series of columns and rows that makes up a grid. The rows are numbered 1 to lots and the columns are labelled A,B,C etc. Within a database the same thing can except we don’t call them excel spreadsheets and we call them “Tables”.
Within each Table there are a number of rows and columns, but rather than call the columns A,B,C etc we can give them whatever names we choose. So for our ADDRESSES file/Table we might call a column HOUSENUMBER and another one POSTCODE and in those columns we store the house number and post code. So for each row within that Table we transpose the house number and postcode into the correct column giving us nice, ordered, information.
Structured Query Language (SQL)
Our ADDRESSES table is now within our database and has data within it and this is where the second clever part of the database engine comes in. Most people don’t want to scroll through rows and rows of information in order to find what they are looking for instead they want to ask the database engine something like: “can you get me all the postcodes from the addresses table please”. The database engine will then go off and bring back all of the postcodes as you asked from the addresses table. This is great but we can extend this further and say: “can you get me all the postcodes from the addresses table where they are near EC4” (which is London). Again the database will happily return all of the postcodes which are near EC4 for you.
Except that databases don’t really understand English and when i say “near” it wont really understand what i mean because; what do i mean? So a method was created to ask database’s questions in order to retrieve data from them. This is called the “Structure Query Language” (SQL) and is commonly used no matter what database to retrieve data.
Using this language we can change some of my earlier questions into the Structure Query Language and our answers back from the database engine.
“Can you get me all the postcodes from the addresses table please”
SQL: "Select postcodes from ADDRESSES"
The above is straight forward; Select - well you could select things from a shelf couldn’t you? Select the beans from the shelf of tins.
Lets extend this further into the second question:
“Can you get me all the postcodes from the addresses table where they are near EC4”
SQL: "Select postcodes from ADDRESSES table where postcodes like 'EC4%'"
This is very similar to the first one except we are now appending a clause or caveat to our original request; can you only get those postcodes where they are like EC4%. The % symbol says that anything can replace it so it could bring back a postcode of “EC4 Elephant” except we really hope that the postcodes are properly typed in and it will only brings those back that start with EC4 and are really postcodes!
Now we can do some even more clever stuff by asking it to bring back other useful information like:
SQL: Select streetname from addresses where postcode = "EC4A 1AB"
This will now bring us back a whole load of street names in that postcode. Incidentally, if you hadn’t guessed, streetname is another column in the ADDRESSES table which contains the street name for an address. We could call this column “bob” but then it would be difficult to make sense of and the next person who comes along and tries to get information from our table would be mightily confused!
With column naming we are drifting perilously close to name conventions which is something programmers spend a lot of their time doing to ensure that the next person can understand what they have done and often so that they can understand what they have done themselves! I will steer us back away from the naming thing…
So we have our tables with our data in our datafiles.
But i now have two users (James and Alex) who both want their own copy of ADDRESSES because they want their own Christmas card list and don’t want to share. Now we can’t create a copy of the ADDRESSES table next to the other copy because the database wont know which copy you are talking to when you say “get me postcodes from ADDRESSES” and we can’t rename them as both want their’s to be named ADDRESSES. So to solve this we create them a separate area each.
Under Oracle everyone has a login to the database or User if you would prefer. The Users each can store information within their own private area or Schema. In Oracle a User and a Schema are one and the same thing. So we create the JAMES user/schema and the ALEX user/schema. Each of these have their own private storage areas so within the JAMES schema we create an ADDRESSES table and then within the ALEX schema we create another ADDRESSES table. Now they both have their own tables.
But it turns out that ALEX is very very busy and is using up lots of lots of disk speed and making it difficult for JAMES to work. We need a way of moving that table to another area, we need a storage area for JAMES’ tables and ALEX’s tables, we need space to store their tables. So for both of them we create a TABLESPACE.
Each of these tablespaces have their own, private, datafile as we still need to physically write the information in these tables to disk. So if a tablespace gets busy (as our ALEX example is) we can move that datafile, which contains the tablespace, which contains the table, to another disk on the server. The database engine handles all of that and the end users know no different!
So remember that Table -> Tablespace -> datafile -> Disk
Now someone want to access JAMES’ ADDRESSES table? You would need to tell the database engine that you want to access that one in particular and not the ALEX version. To do so you would say:
SQL: "select streetname from JAMES.ADDRESSES"
You can extend this out to lots of other tables in each of the Schema and give permissions to other users/schema to access tables (or not). There are a lot of other clever things that i wont go into here but some of them are:
- Views - a query that is stored in the database so you select from the view and not from the tables, useful if selecting from the tables is complicated or you want it hidden for security reasons.
- Functions/Procedures - pieces of code which sit inside the database and interact with the data.
- Triggers - perform an action once a particular action is undertaken on a table, for instance a new record might well trigger a function.
Oracle manages all of this and has its own schema called “SYSTEM” which just contains tables which has details about everything else, so it might have a table called “USERS” within which it has ALEX and JAMES as records. It also would contain details on the datafiles, tablespaces etc etc.
Databases (and i will use Oracle as an example here) can do clever things with caching to avoid reading values from the disk. This can be particularly useful when you have a value that is read back repeatedly (such as the current VAT rate) and as such Oracle will hold that in memory rather than retrieve it from disk each time. By clever use of caching and disk management the performance of the database can be altered dramatically.
Thus completes the section on how a database, and a little bit of Oracle, works.
TNS Listener
The final, useful, bit to know is how everyone connects into our new database.
If everyone talked straight to the database engine then it could connect people straight into the database and they could start work. But what if we had two databases on the same server, one live and one test? Would the live one handle the connections for the test database or vice versa?
A better way than having the database engines handle the connections (and remember they are very busy processing information requests) is to have a dedicated process creating the connections. This, in Oracle terms, is the TNS Listener.
The TNS Listener accepts connections on behalf of all databases running on a server and then establishes a connection between the connecting client and the target database before stepping out of the process and letting them carry on without it.
Done
So there you go, 2500 words on what a database is and what Oracle is - hope you found it useful.