Martin Fowler's MultipleCanonicalModel entry describes exactly what's been happening around here lately.
His point is that implementing conceptual models through shared databases in a large enterprise may not be a winning proposition, whereas message-based integration brings all sorts of side benefits.
I think the same point applies to B2B exchanges.
The company I work for produces, provisions and distributes content to mobile devices. Up to now we've been delivering the provisioning of the content (i.e. what content is available in which node of the client's catalog) every couple of weeks as a huge zip file containing data files and tables. The clients then update their sites' databases according to the tables. As far as I can tell, this is basically a shared database strategy, only across company boundaries.
We are currently moving towards a notification system based on messaging: every time a use case that changes the content master repository is executed, a notification is fired and dispatched to wherever it needs to go. This messaging-based approach to integration helps us decrease the time it takes to update our clients, scales well and makes our lives easier.
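A minimal sketch of the "use case fires a notification" idea, assuming an in-process dispatcher. The names `ContentChanged`, `NotificationDispatcher` and `AddContentToNode` are hypothetical; in a real deployment a subscriber would forward each event to a message queue (JMS or similar) rather than handle it in memory.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical event describing a change to the content master repository.
record ContentChanged(String contentId, String catalogNode, String changeType) {}

// Use cases publish events; subscribers (e.g. a bridge to a message queue,
// one per client) receive every event.
class NotificationDispatcher {
    // Copy-on-write so subscribers can be added while events are dispatched.
    private final List<Consumer<ContentChanged>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<ContentChanged> subscriber) {
        subscribers.add(subscriber);
    }

    void publish(ContentChanged event) {
        for (Consumer<ContentChanged> s : subscribers) {
            s.accept(event);
        }
    }
}

// Hypothetical use case: change the repository, then fire a notification.
class AddContentToNode {
    private final NotificationDispatcher dispatcher;

    AddContentToNode(NotificationDispatcher dispatcher) {
        this.dispatcher = dispatcher;
    }

    void execute(String contentId, String catalogNode) {
        // ... update the content master repository here ...
        dispatcher.publish(new ContentChanged(contentId, catalogNode, "ADDED"));
    }
}
```

The client never sees our tables, only the events, which is what lets each client store the data however it likes.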
It also shields our data model, to some extent, from what the client wants (the client can probably keep everything flat in his database if he likes). This is Fowler's third point.
Our data model is actually richer than what most clients want; this is Fowler's fourth point.
How do you make a legacy system send notifications? More to come...
Laurent's entry about debugging and development technique struck a chord when I read it. At the beginning of the year, we had to bring in a programmer to help us out. He used the debugger all the time (in Java, with EJBs!). It was the first time in years I had seen someone use a debugger, and I realized it had been years (OK... months really, but never as a working method) since I'd used one myself!
To minimize the errors I make, I usually resort to:
i) writing tests (but that depends on how strict your suite is and the general state of the project)
ii) "having the program tell you a story", i.e. logging every choice and every important piece of information so you can track precisely what's going on.
This way you can usually pinpoint where a bug comes from by reproducing the error and reading the log.
In production, switch the log level to WARN rather than DEBUG and you're OK. And yeah... I hate testing whether the log is in DEBUG mode; I can live with the extra time it takes to compute those log strings... I'm using EJBs right now anyway...
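A sketch of the "tell a story" style, assuming `java.util.logging` (the post doesn't name a library), where FINE plays the role of DEBUG and WARNING the role of WARN. Since Java 8, passing a `Supplier<String>` lambda defers building the message until the level is actually enabled, which avoids both the `isDebugEnabled()` guard and the cost of the string concatenation. `PriceCalculator` is a made-up example.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class PriceCalculator {
    private static final Logger LOG = Logger.getLogger(PriceCalculator.class.getName());

    int price(String contentType, boolean subscriber) {
        // "Tell a story": log each decision the code makes.
        // The lambdas are only evaluated if FINE is loggable.
        LOG.fine(() -> "price() called with type=" + contentType + ", subscriber=" + subscriber);
        int base = contentType.equals("video") ? 300 : 100;
        if (subscriber) {
            LOG.fine(() -> "subscriber discount applied to base price " + base);
            return base / 2;
        }
        LOG.fine(() -> "no discount, base price " + base);
        return base;
    }

    public static void main(String[] args) {
        // Production setting: FINE messages are dropped for this logger
        // and their suppliers are never called.
        LOG.setLevel(Level.WARNING);
        System.out.println(new PriceCalculator().price("video", true));
    }
}
```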
What is the point of PerPOJO over just serializing my object graph? OK, so it reads a file and saves it from time to time... do we need a project for this?
Is it thread-safe? I am not an almighty multithreaded programmer, but saveAll() has no synchronization whatsoever with respect to put() or get(). put() has the synchronized keyword, but that only excludes other synchronized methods on the same object; it won't stop saveAll() or get() from running concurrently...
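To make the concern concrete, here is a sketch of "just serializing my object graph" where `put`, `get` and `saveAll` all share one lock, so a save can never observe a half-updated map. `SerializedStore` is hypothetical and is not PerPOJO's code; it uses plain `java.io` serialization and writes to a temporary file before replacing the previous snapshot.

```java
import java.io.*;
import java.nio.file.*;
import java.util.HashMap;

// Hypothetical store: every method synchronizes on the same monitor (this),
// so saveAll() never runs while put() is modifying the map.
class SerializedStore {
    private final Path file;
    private HashMap<String, Serializable> data = new HashMap<>();

    SerializedStore(Path file) throws IOException, ClassNotFoundException {
        this.file = file;
        if (Files.exists(file)) {
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
                @SuppressWarnings("unchecked")
                HashMap<String, Serializable> loaded = (HashMap<String, Serializable>) in.readObject();
                data = loaded;
            }
        }
    }

    synchronized void put(String key, Serializable value) {
        data.put(key, value);
    }

    synchronized Serializable get(String key) {
        return data.get(key);
    }

    // Serializes the whole object graph while holding the lock.
    synchronized void saveAll() throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
            out.writeObject(data);
        }
        // Replace the old snapshot only once the new one is fully written.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Holding the lock during the write blocks writers for the duration of the save; that is the price of a consistent snapshot without copying the map.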
I'm perplexed. Did I miss something?
In the course of studying the scalability of a system I need to maintain, I am questioning the architectural choices of the initial development team.
My main question is: does it make sense to persist binary data in the database? The database is already several GB in size, and most of that is binary data that could just as well reside elsewhere.
In our experience, big databases are harder to work with: importing and exporting data takes longer; adding and removing constraints, or simply refactoring the database, is close to impossible because of the downtime involved; and database administration requires more expertise (tablespace issues, rollback segments that are too small, etc.).
Furthermore, we spend a lot of our time getting that binary data out of the database server to store it on a web server or elsewhere.
On the other hand, the all-in-the-database solution has the great advantage of being the single place from which to obtain the data. This is important because we run two servers in parallel to avoid downtime. I can probably assume a good deal of caching is taking place, so I don't have to write a distributed cache layer myself.
I am considering getting rid of the LONG RAW columns (the binary data) and replacing them with a unique id that identifies a file on a filesystem shared across all instances of the server. The possible issues are performance, referential integrity and transactions.
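A sketch of the proposed layout, assuming a directory mounted at the same path on every instance (e.g. over NFS) and random UUIDs as the "unique id". `SharedFsBlobStore` and its methods are hypothetical. The comments describe one possible ordering for the integrity concern: write the file before inserting the row, and delete the file only after the row's deletion has committed, so the worst case is an orphaned file rather than a row pointing at nothing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical replacement for a LONG RAW column: the table keeps only the
// blob id, and the bytes live in a directory shared by all server instances.
class SharedFsBlobStore {
    private final Path root;

    SharedFsBlobStore(Path root) throws IOException {
        this.root = Files.createDirectories(root);
    }

    // Step 1: write the file. Only insert the returned id into the database
    // after this succeeds. No reader knows the id yet, so a partially
    // written file is never visible.
    String store(byte[] content) throws IOException {
        String id = UUID.randomUUID().toString();
        Files.write(root.resolve(id), content);
        return id;
    }

    byte[] read(String id) throws IOException {
        return Files.readAllBytes(root.resolve(id));
    }

    // Call after the owning row has been deleted and the transaction has
    // committed. If this fails, the file is merely orphaned and a periodic
    // sweep (ids on disk but not in the table) can remove it.
    boolean delete(String id) throws IOException {
        return Files.deleteIfExists(root.resolve(id));
    }
}
```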
I think reading from a file system (even over NFS) can't be much worse than querying the database for the binary data, especially if I add some caching. Of course, storing the data in the filesystem means that instead of one query, I would have one query plus one filesystem read... Enforcing referential integrity and transactions doesn't seem like much of an issue if the filesystem is shared across instances; I think the main problem will be making sure files get erased when the db rows are erased.
Has anyone else studied this tradeoff?
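A sketch of the caching mentioned above, assuming a simple per-instance LRU cache in front of the filesystem reads; `LruBlobCache` is hypothetical. `LinkedHashMap` in access order with `removeEldestEntry` overridden is a standard way to get LRU eviction. If every write produces a new id, the bytes behind an id never change, so the only staleness risk is serving a blob that has been deleted; hence the `evict` hook, which each instance would need to call (or tolerate a short window of stale reads).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical read-through LRU cache in front of the blob reads.
class LruBlobCache {
    private final int maxEntries;
    private final Function<String, byte[]> loader;
    private final Map<String, byte[]> cache;

    LruBlobCache(int maxEntries, Function<String, byte[]> loader) {
        this.maxEntries = maxEntries;
        this.loader = loader;
        // accessOrder = true: iteration order is least- to most-recently used.
        this.cache = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                // Drop the least recently used entry once over capacity.
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached bytes, loading them on a miss.
    synchronized byte[] get(String id) {
        return cache.computeIfAbsent(id, loader);
    }

    // Call when a blob is deleted so stale bytes are not served.
    synchronized void evict(String id) {
        cache.remove(id);
    }
}
```

Since a filesystem read throws the checked `IOException`, the loader would typically wrap it, e.g. `id -> { try { return store.read(id); } catch (IOException e) { throw new UncheckedIOException(e); } }`.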