Anna Filina

Doctrine: saving you some tables and headaches

October 17th, 2013

I like database schema normalization, but sometimes it just creates too much overhead.

For example, you may want to let your users to enter a number of URLs (facebook, linkedin, twitter, blog, company, etc.) The straightforward solution seems to add a Url table with foreign key (user_id=1) to link it to the User. Sometimes you want to reuse the Url table for other entities, such as Project or Organization. In this case, you'd probably go with a Url table containing the name of the entity to which it belongs and an id (entity="User", entity_id=1), which requires me to have some extra logic in my code.

But then I step back and ask myself: "What do I gain by storing URLs in a separate table?" Usually, the answer is "Nothing, really." I often serialize such data to a text column using JSON:

  { "type":"twitter", "url":"" },
  { "type":"linkedin", "url":"" },
  { "type":"company", "url":"" },
  { "type":"company", "url":"" }

It's very natural for me to work with this format, saves me tables and has a positive effect on performance, since I no longer require LEFT JOINs. For years, I wrote my APIs in a way that automatically deserialized JSON when loading from the database and serializing before storing back.

I recently discovered that Doctrine 2 has a nifty column type called "json_array". You basically interact with your model's property as though it was a regular PHP array like this $user->getUrls() and $user->setUrls(). Doctrine then takes care of the serialization and deserialization for you, which makes adding new columns of this type faster and easier than ever.

Reference of Doctrine mapping types:

Happy coding!


Feras October 18th, 2013 I dont agree with you on several points:

1) if this sort of performance overhead from joins matter to you you should NOT be using Doctrine to begin with.

2) Storing values like that undermines the whole point of using a relational database since no operations can be performed on them. If you need to store such unstructured data a non-relational database is what you need such as Couch or mongodb. Actually a lot of applications use that approach and store the relational data in RDBS and non-structured data in a nosql database which obviously makes sense.

3) Depending how complex is that serialized data and what is your schema (indexes, etc), serializing might actually be slower than joins when you have to do it on huge amounts of records.

4) If all your API does is serializing / deserializing some data and store it and performance is key, why would you ever use a sql solution?

I really dont mean to rant on this post but I see so many PHP developers posting similar stuff and doing such premature and unnecessary optimizations on open source projects that just make the code one big pile of crap, essentially ruining it.
Anna October 18th, 2013 I understand your reaction, since this approach doesn't look "pure".

When choosing types of storage, it doesn't have to be either-or. Find the right balance between things that work for your unique case.

Note that in the example, only a small portion is serialized and the data is far from complex, causing too little overhead. A couple CPU cycles during the few write operations are less expensive than a join on a multimillion-record table during the countless disk reads. As long as this approach is used sparingly, I wouldn't worry.

Besides, the optimization is only a positive side-effect. The main reason would be to simplify the schema and code to reduce development time and later maintenance costs. I have seen complex schemas with hundreds of tables that were created for each type of list imaginable (plus the models) and developers were simply lost in that clutter.

Once again, this approach shouldn't be used everywhere, because it doesn't make sense in every case. It's good to know that this method exists and has its advantages in certain cases. In the end, every tool has the potential to be misused, so I have to trust that people would think before making decisions.
gggeek October 18th, 2013 At the cost of being pedantic - trying to make it clear for even the youngest readers:
even in the simple case mentioned, and not taking any scaling/performances considerations into account (hey, not everyone becomes the next twitter), what you loose from this solution is the possibility of querying the url table for analytic/statistical information:
- how many users have a twitter link?
- how many links are tagged twitter?
Making a batch update also becomes much heavier operation.

My advice would be to think about the business logic first, optimize second.
Anna October 18th, 2013 Your data is still searchable when serialized. I have implemented solutions for analytics and batch operations on big recordsets and much more complex data structures than the one illustrated.

Once again, it all depends on how much you want to focus on theoretical features rather than the ones at hand. One can spend countless hours planning for the what-ifs that never happen (premature architecture). You should ask yourself "How likely am I to implement all these other things in the near future?" In the worst case, you can refactor. In the meantime you'd be delivering real value and Return on Investment for your client/employer.

This is my philosophy, and is the one that enabled me to get into the project rescue business. Maybe if I had unlimited time, I could have spent more time "purifying" my code and planning for all potential features that I might or might not implement :)
Feras October 18th, 2013 I find the article useful in the fact that it shows this overlooked featured, but again I would advice against such application. If you read your comment:

"Your data is still searchable when serialized. I have implemented solutions for analytics and batch operations on big recordsets and much more complex data structures than the one illustrated."

What do you think is messier - starting to write those solutions that do magic in memory and trying to do operations on otherwise relational data, or just having a 100 tables as you say in the schema and leave the highly optimized DB deamon do those operations for you... Thats my humble opinion.
Lukas October 19th, 2013 The reality is that there are plenty of use cases where it makes sense to serialize data into an RDBMS. At the most extreme level one can see it as just a transaction enabled file system with well understood solutions for backup and replication.

Now between that and a proper normalized RDBMS there are a lot different use cases. If I just have a few records, do I really want to bring in another database technology that needs to be scaled and made fault tolerant?

Also if I never/rarely need to query the data and if I never/rarely need to do batch operations on the data, does it even matter? Furthermore its not like NoSQL solutions are very good at batch operations, in fact it might be easier to batch update JSON in an RDBMS than in many NoSQL solutions because there is simply no adhoc query language that allows batch updates in many of them.

Furthermore the JSON approach can often also simply be used for denormalization where the application or triggers keep them updated.

So the conclusion: Do not oppose serialized data in an RDBMS on principle.
Anna October 19th, 2013 Lee Davis wrote some interesting things regarding Doctrine. I suggest you take a look at this link he gave me:
Victor P. October 20th, 2013 There are so many reasons why serializing data like this inside an rdbms is terrible but I see that some of the most logical ones have already been listed.

Anna, you can't call it premature architecture, or "purifying" your code. They are best practices for a reason, and there's good reason for data normalization.

The cost of storing this data correctly is not that big and by serializing your data this way "refactoring" your data & schema later may prove to be a lot harder than you originally thought.

I use doctrine, and I make sure that I create the correct relationships and proper models so that my tables meet the 3rd normalized form.
Feras October 20th, 2013 @Lukas
I mostly disagree here:
" it might be easier to batch update JSON in an RDBMS than in many NoSQL solutions because there is simply no adhoc query language that allows batch updates in many of them.". I think you are looking at the problem from the wrong angle, sure if you store the serialize json in NoSQL it might be harder but then the whole point of using a schema-less NOSQL solution is lost, the idea of using the NOSQL for unstructured data is exactly just that - you dont need to serialize the data since it doesnt have to have a fixed structure and making batch operations on it is trivial... Having to do it in RDBMS means having to use json decoding where the software supports it and then manually extracting each json field from the set and performing operations on it. Here is a brief example in MySQL:
Now I dont know how efficient that is and so on, but it looks more like a patch than a solution really. Also I am not against denormalization, but I am for denormalization when it is used for performance gains not to simplify the schema or development time, since its gonna bite you later if you do it for those purposes and if your denormalization is done for performance gains than chances are that you wont mind adding additional software to the stack such as NoSQL since performance gains will generally be larger. Thats my humble opinion.
Victor P. October 21st, 2013 @Feras

Agreed. The correct solution when you have an issue related to dynamic schemas is to use a NoSQL database therefore serialization is not necessary.

And if you really needed this to be in a relational database but still be dynamic enough, you could possibly add a link table that has a many to one relationship with the user table and a type column.

It seems to me that the problem described above is not a schema issue but a design issue.

But a NoSQL approach would be the most correct in a scenario where you truly have a dynamic schema.

I think that it's great that doctrine has this data type but I'm not certain that it was intended to be used like this.
Caponica February 16th, 2014 I have to agree with Lukas and Anna here... there are plenty of use cases where it makes sense to serialize some small amount of data (relative to the whole).

Throwing out an existing system to replace it with NOSQL is not realistic and slavishly following principles of normalisation even where it does not serve your actual needs is not good practice.

Anna, in relation to your original article... have you tried creating an admin or form interface to data serialized in this way? If so, was that complex or are there built in Symfony / Sonata / etc form types that help with this?
Anna March 2nd, 2014 @Caponica
Extracting form data is very easy in such frameworks. In Symfony, I'd call ->getData() to get the form's data as an array.

I would then use json_encode() to serialize the array. If I'm using Doctrine's "json_array" type for a column, then I can simply assign the array to that column and let Doctrine do the serialization.
foo March 22nd, 2014 Postgresql actually supports json as a native data type and also has functionalities to query those json-fields.

I think the example given in the article is good example on case where storing json makes sense.
Xander S July 23rd, 2015 Hi there,
This is a great way to store settings. Imagine an application where you have to save loads of settings in an easy way. Thanks. I'd like to see more about storing json.
Arpita Rana October 14th, 2015 @Anna, Nice discussion.
If I stored form data (form contain 40 to 50 fields) in one field as serialize array as you say.

Then how to display all value from database in form, when I want to edit the form.

Thanks in advance. :)
Arpita Rana October 14th, 2015 This above point is regarding Symfony2 framework.
Anna November 17th, 2015 I had success with injecting the array into a subform. Say "settings" is a json_array. You can have the child CustomerSettingsType contain fields that are properties of "settings".

class CustomerType extends AbstractType
public function buildForm(FormBuilderInterface $builder, array $options)
->add('settings', new CustomerSettingsType(), [
'label' => false,

class CustomerSettingsType extends AbstractType
public function buildForm(FormBuilderInterface $builder, array $options)
->add('setting1', 'text')
->add('setting2', 'text');
Johan December 26th, 2018 Great post thank you.

This page is protected by reCAPTCHA and the Google
Privacy Policy and Terms of Service apply.

Phone: +1 514-918-7866 | E-mail: