A wise man* once said, an expert is someone who uses big words and acronyms where simple phrases would do just as nicely. So stand back and listen to this: Database, Relational Database, DBMS, RDBMS, SQL, Sub-queries, normalisation.
[* that wise man was me.]
So now that I’ve established my credentials by bamboozling you with arcane words and capital letters, let me tell you what the purpose of this series of articles is. By the end of it, you will be able to re-read that first paragraph and understand every word; or, if you would prefer that in more practical terms, you will be able to read – and write – SQL, which is the programming language of databases.
A Database is an organised collection of data. Not yet sure what that means? Well, do you own an address book, either on your phone or in a physical book? That’s a database. After all, the addresses and phone numbers are organised – with all friends whose names start with A being grouped separately from people whose names start with B or C or D.
A Relational Database is a database in which the data is organised according to type with the relationships being maintained between the differing types. Okay, that sounds a bit like Greek (or Dutch, if you’re Greek; or German, if you’re Dutch; or Xhosa if you’re German…), but it makes sense if you let me explain.
Dig out your address book again. Imagine all the names grouped together; and all the phone numbers grouped together in another list; and all the addresses in a third. On their own these individual lists might be interesting but not useful; but if we establish the relationship between the lists – this address is where that person lives and that’s their phone number – then our database takes shape.
Make sense? Don’t worry about it too much if it doesn’t; we’ll come back to it a little later. Let’s talk about Oracle now.
Oracle began making RDBMS in the 70s. Today, the Oracle database is, by most metrics, the most popular in the world (it does have some strong competition; we’ll talk about them later. This isn’t like football; now you’re part of Team Oracle, it doesn’t mean you have to hate the opposition. Not much, at least). The latest version of the database is Oracle 12c. You don’t particularly need to remember that now – in fact, you don’t particularly need to remember anything from this section. We’re just painting in the background; the juicy stuff is what comes next.
Remember I’d said the data in databases is organised in groups – all the names over here, the phone numbers over there, the addresses over in that other place? Well, those groupings are called tables.
So in our little database we have a FRIEND_NAME table, a PHONE_NUMBER table, and an ADDRESS table. Got that? Cool.
Tables are made up of vertical columns and horizontal rows. The columns contain data of the same type; while rows contain the data that makes up an item. In our example FRIEND_NAME table, the Last_Name column contains all the surnames – Geller, Bing, Tribiani, Geller-Bing, Green and Buffay – while the rows contain the full names, such as Ross Geller.
Our database will be pretty boring – and not relational – if it contained only one table. Let’s knock up our PHONE_NUMBER and ADDRESS tables.
Figure 1: PHONE_NUMBER
Figure 2: ADDRESS
The Oracle database needs to know the types of all the data you keep. (That way, for instance, if you ask it to subtract the value in the ADDRESS.CITY column from the value in the ADDRESS.HOUSE_NO column, it’ll be able to tell you that you’re crazy.) There is a long list of data types that Oracle recognises, but we’ll only focus on the 3 main types.
NUMBER: This one’s self-explanatory. If a column is created as a NUMBER column, only numbers can be stored in it. It can be whole numbers, decimals, negative or positive.
VARCHAR2: Okay, this one’s a little weird. There’s a lot of history packed into the name of this data type; however, it’s mostly boring, so I won’t go into it. What you need to know is that it stands for VARiable CHARacter and is the data type required to store character strings, such as the data in FRIEND_NAME.FIRST_NAME, FRIEND_NAME.MIDDLE_NAME and FRIEND_NAME.LAST_NAME.
There is one interesting difference between the VARCHAR2 and NUMBER data types, and that is that you can only store numbers in NUMBER columns; however, you can record any string of alphanumeric characters in VARCHAR2 columns. For example, with its combination of numbers and letters, we cannot record ADDRESS.ZIPCODE in a NUMBER column, but we can save it as a VARCHAR2.
DATE: Another self-explanatory data type. We haven’t used any dates in our hypothetical database thus far – but we will; I’m saving that pleasure for later.
Let me explain. Say one day, you’re chilling out at a café and you start chatting with a stranger. Turns out you’ve got lots in common and you really like that joke they told about a politician, a monkey and a water pistol. When eventually, you rise to leave, you exchange names and numbers and promise to stay in touch. They say their name is Ross Geller. You add it to your address book. But you already had a friend named Ross Geller! How will you know which is which when you want to phone them up and laugh about the monkey joke again?
That’s where primary keys come in. Names – even rare ones like Ross Geller - do not uniquely identify a record, so we need something that does. In our NAME table it is the number in FRIEND_ID. We simply need to give the new row, the new Ross, a new – unique – number in the FRIEND_ID column.
Databases rule the world, and thus, primary keys are all around us. Your passport number, your social security number, the number on your driving license – they’re all primary keys.
Our database has a table for names, another for phone numbers, and a third for addresses. However, there is no way of knowing which of our friends lives at what address and when, or what their phone number might be. We’ve built a database, but it’s not yet relational. Let’s create two further tables that address that problem.
Figure 3: FRIEND_ADDRESS
Figure 4: FRIEND_PHONE
Take a minute to study the tables. Notice how useful primary keys are? Instead of typing out the friend’s name in full or typing the full address, all we need is the primary keys. And so, armed with our burgeoning knowledge of databases, we can look at the following:
And after relating this table to the FRIEND_NAME and ADDRESS tables, we know that it is saying between September 1994 and October 2000, Chandler Bing lived at Apartment 19, 90 Bedford Street, New York, NY10014.
And the reason we know that is because we now implicitly understand the concept of foreign keys. Here’s a definition: A Foreign Key is a column (or combination of columns) that uniquely identifies a row in another table.
Foreign keys are the invisible threads that knit all the tables in our database together. It is the foreign keys, telling us how the rows in one table are related to the rows in another table, that turn a database into a relational database. It is the foreign key that takes data and begins to turn it into information.
Without foreign keys, a database is like a dull room, full of bored people.
With foreign keys, it’s a party.
Got that? Great. In the next article, we’ll be learning SQL, the language of databases.
[* that wise man was me.]
So now that I’ve established my credentials by bamboozling you with arcane words and capital letters, let me tell you what the purpose of this series of articles is. By the end of it, you will be able to re-read that first paragraph and understand every word; or, if you would prefer that in more practical terms, you will be able to read – and write – SQL, which is the programming language of databases.
Definitions
Let’s meet the main characters of our story: I’ll give you a couple of definitions; one building on the other.A Database is an organised collection of data. Not yet sure what that means? Well, do you own an address book, either on your phone or in a physical book? That’s a database. After all, the addresses and phone numbers are organised – with all friends whose names start with A being grouped separately from people whose names start with B or C or D.
A Relational Database is a database in which the data is organised according to type with the relationships being maintained between the differing types. Okay, that sounds a bit like Greek (or Dutch, if you’re Greek; or German, if you’re Dutch; or Xhosa if you’re German…), but it makes sense if you let me explain.
Dig out your address book again. Imagine all the names grouped together; and all the phone numbers grouped together in another list; and all the addresses in a third. On their own these individual lists might be interesting but not useful; but if we establish the relationship between the lists – this address is where that person lives and that’s their phone number – then our database takes shape.
Make sense? Don’t worry about it too much if it doesn’t; we’ll come back to it a little later. Let’s talk about Oracle now.
Oracle
You’ve probably heard the word Oracle mentioned in discussions about databases, but you possibly do not know that Oracle is a corporation. It makes software to create and manage databases – so-called Database Management Systems. That’s the DBMS acronym from way back in paragraph 1; and an RDBMS is, of course, a Relational Database Management System.Oracle began making RDBMS in the 70s. Today, the Oracle database is, by most metrics, the most popular in the world (it does have some strong competition; we’ll talk about them later. This isn’t like football; now you’re part of Team Oracle, it doesn’t mean you have to hate the opposition. Not much, at least). The latest version of the database is Oracle 12c. You don’t particularly need to remember that now – in fact, you don’t particularly need to remember anything from this section. We’re just painting in the background; the juicy stuff is what comes next.
Databases
It’s time to roll up our sleeves and get our hands dirty. Go get your address book again.Remember I’d said the data in databases is organised in groups – all the names over here, the phone numbers over there, the addresses over in that other place? Well, those groupings are called tables.
So in our little database we have a FRIEND_NAME table, a PHONE_NUMBER table, and an ADDRESS table. Got that? Cool.
Tables are made up of vertical columns and horizontal rows. The columns contain data of the same type; while rows contain the data that makes up an item. In our example FRIEND_NAME table, the Last_Name column contains all the surnames – Geller, Bing, Tribiani, Geller-Bing, Green and Buffay – while the rows contain the full names, such as Ross Geller.
Our database will be pretty boring – and not relational – if it contained only one table. Let’s knock up our PHONE_NUMBER and ADDRESS tables.
Figure 1: PHONE_NUMBER
Figure 2: ADDRESS
Data Types
You will have noticed that we’ve got different types of data in our tables – from the PHONE_NUMBER table that contains nothing but numbers to FRIEND_NAME and ADDRESS that both contain character strings, numbers and, in the case of the ZIPCODE column, a combination of both.The Oracle database needs to know the types of all the data you keep. (That way, for instance, if you ask it to subtract the value in the ADDRESS.CITY column from the value in the ADDRESS.HOUSE_NO column, it’ll be able to tell you that you’re crazy.) There is a long list of data types that Oracle recognises, but we’ll only focus on the 3 main types.
NUMBER: This one’s self-explanatory. If a column is created as a NUMBER column, only numbers can be stored in it. It can be whole numbers, decimals, negative or positive.
VARCHAR2: Okay, this one’s a little weird. There’s a lot of history packed into the name of this data type; however, it’s mostly boring, so I won’t go into it. What you need to know is that it stands for VARiable CHARacter and is the data type required to store character strings, such as the data in FRIEND_NAME.FIRST_NAME, FRIEND_NAME.MIDDLE_NAME and FRIEND_NAME.LAST_NAME.
There is one interesting difference between the VARCHAR2 and NUMBER data types, and that is that you can only store numbers in NUMBER columns; however, you can record any string of alphanumeric characters in VARCHAR2 columns. For example, with its combination of numbers and letters, we cannot record ADDRESS.ZIPCODE in a NUMBER column, but we can save it as a VARCHAR2.
DATE: Another self-explanatory data type. We haven’t used any dates in our hypothetical database thus far – but we will; I’m saving that pleasure for later.
The One About Primary Keys
I’ve got another term for you: Primary Key. A primary key is a key – a column or combination of columns – that uniquely identifies a row.Let me explain. Say one day, you’re chilling out at a café and you start chatting with a stranger. Turns out you’ve got lots in common and you really like that joke they told about a politician, a monkey and a water pistol. When eventually, you rise to leave, you exchange names and numbers and promise to stay in touch. They say their name is Ross Geller. You add it to your address book. But you already had a friend named Ross Geller! How will you know which is which when you want to phone them up and laugh about the monkey joke again?
That’s where primary keys come in. Names – even rare ones like Ross Geller - do not uniquely identify a record, so we need something that does. In our NAME table it is the number in FRIEND_ID. We simply need to give the new row, the new Ross, a new – unique – number in the FRIEND_ID column.
Databases rule the world, and thus, primary keys are all around us. Your passport number, your social security number, the number on your driving license – they’re all primary keys.
Relational Databases
We now have all the pieces of the puzzle. We can now redefine – and understand – relational databases. A Relational Database is a database in which the data is organised in tables with the relationships being maintained between the different tables.Our database has a table for names, another for phone numbers, and a third for addresses. However, there is no way of knowing which of our friends lives at what address and when, or what their phone number might be. We’ve built a database, but it’s not yet relational. Let’s create two further tables that address that problem.
Figure 3: FRIEND_ADDRESS
Figure 4: FRIEND_PHONE
Take a minute to study the tables. Notice how useful primary keys are? Instead of typing out the friend’s name in full or typing the full address, all we need is the primary keys. And so, armed with our burgeoning knowledge of databases, we can look at the following:
And after relating this table to the FRIEND_NAME and ADDRESS tables, we know that it is saying between September 1994 and October 2000, Chandler Bing lived at Apartment 19, 90 Bedford Street, New York, NY10014.
And the reason we know that is because we now implicitly understand the concept of foreign keys. Here’s a definition: A Foreign Key is a column (or combination of columns) that uniquely identifies a row in another table.
Foreign keys are the invisible threads that knit all the tables in our database together. It is the foreign keys, telling us how the rows in one table are related to the rows in another table, that turn a database into a relational database. It is the foreign key that takes data and begins to turn it into information.
Without foreign keys, a database is like a dull room, full of bored people.
With foreign keys, it’s a party.
Recap
Here’s what I would like you to remember: what is a database? What is a relational database? What are tables, columns and row? What are the main data types? What are primary keys and foreign keys?Got that? Great. In the next article, we’ll be learning SQL, the language of databases.