In the course of studying the scalability of a system I need to maintain, I am questioning the architectural choices of the initial development team.
My main question is: does it make sense to persist binary data in the database? The database is already several GB in size, and most of that is binary data that could technically just as well reside elsewhere.
In our experience, big databases are harder to work with: importing and exporting data takes longer; adding or removing constraints, or just refactoring the database, is close to impossible because of the downtime involved; and database management requires more expertise (tablespace issues, rollback segments not being big enough, etc.).
Furthermore, we spend a lot of our time getting that binary data out of the database server to store it either on a web server or somewhere else.
On the other hand, the full-database solution has the great advantage of being the single place from which to obtain the data. This is important because we run two servers in parallel to prevent any downtime. I can probably assume the database does a good deal of caching, so I don't have to write a distributed cache layer myself.
I am considering getting rid of the LONG RAW columns (the binary data) and replacing them with a unique ID which would identify a file on a filesystem shared across multiple instances of the server. Possible issues are performance, referential integrity and transactions.
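To make the idea concrete, here is a minimal sketch of what "a unique ID instead of a binary column" could look like. Everything in it is an assumption made for illustration: the `documents` table, the `store_document` and `load_document` helpers, and the use of Python with SQLite as a stand-in for the real database. The shared filesystem is simply a directory path.

```python
import os
import sqlite3
import tempfile
import uuid


def store_document(conn, blob_dir, name, data):
    """Write the bytes to the shared directory, then record the file ID in the database."""
    file_id = uuid.uuid4().hex  # hypothetical ID scheme: a random hex string
    final_path = os.path.join(blob_dir, file_id)
    tmp_path = final_path + ".tmp"
    # Write to a temporary name first so readers never see a half-written file.
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, final_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO documents (name, file_id) VALUES (?, ?)",
                (name, file_id),
            )
    except Exception:
        # The row was not stored, so do not leave the file behind.
        os.remove(final_path)
        raise
    return file_id


def load_document(conn, blob_dir, name):
    """One query to find the file ID, then one filesystem read."""
    row = conn.execute(
        "SELECT file_id FROM documents WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return None
    with open(os.path.join(blob_dir, row[0]), "rb") as f:
        return f.read()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (name TEXT PRIMARY KEY, file_id TEXT NOT NULL)"
    )
    blob_dir = tempfile.mkdtemp()  # stands in for the shared mount
    store_document(conn, blob_dir, "report.pdf", b"%PDF-1.4 example bytes")
    print(load_document(conn, blob_dir, "report.pdf"))
```

Writing the file before inserting the row means a crash can leave a file with no row, but never a row pointing at a missing file, which seems the safer failure mode of the two.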
I think reading from a filesystem (even via NFS) can't be that much worse than querying the database for the binary data, especially if I add some caching. Of course, storing the data in the filesystem means that instead of just one query, I would have one query plus one filesystem read. Enforcing referential integrity and transactions doesn't seem to be too much of an issue if the filesystem is shared across instances. I think the main problem will be making sure files get erased when the database rows are erased.
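The deletion problem could be handled in two steps, sketched below under stated assumptions: a hypothetical `documents` table with `name` and `file_id` columns, files named after their ID in one shared directory, and Python with SQLite standing in for the real database. The helper names `delete_document` and `sweep_orphans` are made up for this example.

```python
import os
import sqlite3


def delete_document(conn, blob_dir, name):
    """Delete the row and commit first, then remove the file it pointed to."""
    row = conn.execute(
        "SELECT file_id FROM documents WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return False
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM documents WHERE name = ?", (name,))
    try:
        os.remove(os.path.join(blob_dir, row[0]))
    except FileNotFoundError:
        pass  # already gone, e.g. removed by another server instance
    return True


def sweep_orphans(conn, blob_dir):
    """Remove files that no row references; returns the removed file names."""
    referenced = {r[0] for r in conn.execute("SELECT file_id FROM documents")}
    removed = []
    for entry in sorted(os.listdir(blob_dir)):
        if entry not in referenced:
            os.remove(os.path.join(blob_dir, entry))
            removed.append(entry)
    return removed
```

Deleting the row before the file means a failure in between leaves an orphaned file rather than a dangling reference, and the periodic sweep picks up the orphans. A real sweep would probably also need to skip very recent files, so that it does not race with a write whose row has not been committed yet.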
Has anyone else studied this tradeoff?
Posted by pgirolami76 at July 2, 2003 10:16 PM