(This is a Rambling today)
I was working on an issue last week and it got me to thinking. There are so many times throughout my career where I have had to just sit and look at the screen and just think. I know this sounds just really odd, but it’s true. Let me give you an example.
About 10 or 15 years ago (I don’t know exactly), I was working on creating a table that had every company in the US in it. This sounds pretty simple doesn’t it? The creation of the table was easy but then I got to the part of writing queries on it. But that is a different post for a different day. The point that I am trying to make is that I sat there for hours trying to figure out all the different ways that I could create this table that would store 16 million records. Some of the ways that I was thinking about was to:
- Split the 1 table into many tables and have one table for each State. That way when I ran a query on it I could just look at the State and go right to the table (By the way this was way before table partitions in SQL Server).
- Another way that I thought about it was to Group the States by the area of the country they were in. In my thought process I thought wow, this way I could keep the tables close to the same size but would not have the Maintenance that I would have with the previous option that I mentioned.
- At one time I think I considered creating multiple tables based on the first number in the zip code, or the area code in the phone number.
- I am sure there were others, I just don’t remember them now.
The point that I am trying to make here is that the answer was sitting right in front of me. Just leave it as one table and create an index on the table that will work the queries that I need to run. I am not sure if I have a good example up to this point, but what I am getting at is just because you can do something does not mean that you should do it. Any of the suggestions that I had listed may have worked. Each of those suggestions may even be a good suggestion based on a different case. The end result was I spent so much time trying to figure out how to handle this huge (at the time) number of rows that I could not see the forest through the trees.
Good database design starts with knowing how the data is going to be used. There may be more than one answer, and chances are that there is more than one way to do it. The question is can you see the forest.
Maybe we should start looking at some of the database designs that we see and say look at the pictures, look at the lines that represent the keys. Is there to much there? Then maybe it’s too complex to get data out of it. I am not saying that databases with a lot of tables is bad, I am saying maybe it should be reviewed to make sure that it’s the way it should be done.
I leave you with this thought. There use to be rule when I first started working with databases. If I recall you could have up to 32 table joins in a query, but when you talked to the experts they would say if you are using more the 8 joins then the performance would be awful. Today we can join a lot more than 32 tables in one query and I still think that if you are looking at 8 joins, you may want to review how you are doing things.
Sorry about the Ramblings today. I am still trying to figure out when we lost sight of the goal.