JavaScript: Filtering data

Tags: , , , , ,

This entry was posted on Thursday, November 20th, 2008 at 2:11 am and is filed under dojo, javascript. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

Storing data in arrays is a pretty common technique throughout various programming languages, and it’s very straight forward. Until you get to reading them, sorting, filtering and mixing them. A lot of times you have more complexity in your data and filtering them is a very usual task.
Almost every toolkit has some functional approaches to make this easier and the latest JavaScript versions also implement some of them natively. Here I will show two use cases for filtering an array of objects, different approaches and their consequences.

Removing a unique element

Initially I wanted to post a little piece of knowledge to my blog just not to forget it. But while writing it I realized, first that I would find a better approach and second that filtering data can be more interesting than just this first little use case.
Actually I only had the task of removing a certain unique item from an array of objects, like this:

1
var tags = [{id:1, name:"sun"}, {id:2, name:"sea"}, {id:3, name:"mountains"}];

From this array I only wanted to remove the tag where id=2. The closest approach is to get the index of the element and splice the array, as described here:

1
2
3
4
var tags = [{id:1, name:"sun"}, {id:2, name:"sea"}, {id:3, name:"mountains"}];
for (var i=0, l=tags.length; i<l && tags[i].id!=2; i++){}
tags.splice(i, 1)
// Result: [{id:1, name:"sun"}, {id:3, name:"mountains"}]

The same, and that was what I actually wanted to write about, can be achieved by using dojo and one line less of code:

1
2
3
var tags = [{id:1, name:"sun"}, {id:2, name:"sea"}, {id:3, name:"mountains"}];
tags = dojo.filter(tags, "return item.id!=2")
// Result: [{id:1, name:"sun"}, {id:3, name:"mountains"}]

The latter one I actually like best, because I like this string syntax in dojo.filter() for simple functional pieces of code. Using dojo.filter() with the string “return item.id!=2″ as the second parameter relieves me from writing the entire long version “function(item){ return item.id!=2 }”. Yes, it is a little magic, but the scope where the magic applies is so tiny and the actual action is so well visible, that I prefer this. It just makes the code really much more readable.
But be warned, there are traps. If you are dealing with scopes you might have to either reconsider this approach or understand JavaScript’s scoping very well, but that is not part of this article.

The performance

The first approach definitely has quite some advantages. Especially when working on a lot of data or when doing this a lot of times and you want all the performance possible you should stick to the first version. As opposed to the second version it doesn’t create a function, it uses the condition in the for loop for evaluating, not the return value of a function. And what has a lot more impact on the performance is that the for loop is only executed until the element is found, unlike in the second example which always loops over the entire array.

Letting both versions loop over the entire array shows that the first approach is about 10% faster. Adding the early interruption of the first approach you can get a lot more performance.

Finding multiple elements

The previous use case assumed that you search for a unique identifier in the data, which allows to stop searching once the entry was found. If you are searching for all possible matches in the array you have to loop over all the array in both approaches, the native JavaScript approach and the dojo way. Let’s look at the pure JavaScript version first:

1
2
3
var tags = [{id:1, name:"sun"}, {id:2, name:"sea"}, {id:3, name:"mountains"}];
for (var i=tags.length-1; i>=0; i--) if (!tags[i].name.match(/^s/)) { tags.splice(i, 1) }
// Result: [{id:1, name:"sun"}, {id:2, name:"sea"}]

You can see that the loop is a bit special and not very easy to read and understand. You need to loop backwards over the array, to not reduce the array while looping over it. We splice on the way, which is not really great performance-wise. We could select the indexes of the elements on the way and remove them from the array later, but that would also mean a splice for each of them.
Creating a new array might be a better thing to do. Let’s try:

1
2
3
4
var tags = [{id:1, name:"sun"}, {id:2, name:"sea"}, {id:3, name:"mountains"}];
var newTags = [];
for (var i=0, l=tags.length; i<l; i++) if (tags[i].name.match(/^s/)) { newTags.push(tags[i]) }
// Result: newTags = [{id:1, name:"sun"}, {id:2, name:"sea"}]

And yes, this approach only needs about 10% of the time the first one needed. So this is really much faster. But you also need more memory! I tried it with really a lot of elements to get those numbers, I tested this in Firefox 3.0.3, you can get the code here if you want to try it for yourself.

Mmmh, so let’s try the dojo way:

1
2
3
var tags = [{id:1, name:"sun"}, {id:2, name:"sea"}, {id:3, name:"mountains"}];
tags = dojo.filter(tags, "return item.name.match(/^s/)!=-1 ? index : false")
// Result: [{id:1, name:"sun"}, {id:2, name:"sea"}]

This looks so nice. Very readable source code isn’t it? And it is as fast as the previous approach. Well, it has to loop over all elements and the small overhead from dojo for creating a function is really negligible. So I would go for that :-).

In the end

So this was just a little travel into the filtering of simple arrays. I hope it raises awareness for how to use JavaScript for this task and for where dojo can support you. Always make sure to look under the hood and understand what happens there.
Especially maintainablity becomes more interesting the more people work in your team. So get the balance right.

Leave a Reply