Filtering Data using Multiple Criteria with O(n) Complexity
Can we filter data using multiple criteria with O(n) time complexity? Yes, it is easy.
A few days ago, I stumbled upon a question of how to filter an array of object data using multiple criteria with O(n) complexity. I think it's an excellent example of a question to answer with fun.
We have a list of objects representing items with several attributes. We also have a list of criteria. Items having one of those attributes should be excluded from the result. The original list is as below:
There is some discussion about whether the excludes
list is a constant or not. In my opinion, it is evident that excludes list is a dynamic list. Otherwise, there would be no question at all. So to make it obvious, I will add a touch to the question. Write the function as in the following code:
interface Item {
type: string;
name: string;
color: string;
}
interface Exclude {
key: string;
value: string;
}
const someItems: Item[] = [
{ type: "a", name: "a1", color: "yellow" },
{ type: "b", name: "b1", color: "orange" },
{ type: "c", name: "orange", color: "green" },
];
const someExcludes: Exclude[] = [
{ key: "color", value: "yellow" },
{ key: "name", value: "x" },
];
// Create the following function with O(n) complexity.
function excludeItems(items: Item[], excludes: Exclude[]): Item[] {}
excludeItems(items, excludes);
Big O Notation
Wikipedia defines Big O notation as follows:
Big O notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity. Big O is a member of a family of notations invented by Paul Bachmann, Edmund Landau, and others, collectively called Bachmann–Landau notation or asymptotic notation. The letter O was chosen by Bachmann to stand for Ordnung, meaning the order of approximation.
In computer science, big O notation is used to classify algorithms according to how their run time or space requirements grow as the input size grows. In analytic number theory, big O notation is often used to express a bound on the difference between an arithmetical function and a better understood approximation.
In our example, you should solve the problem in a linear order. Roughly, the time we are allowed to spend is the total number of items
and excludes
.
O(n2) Solution
The first solution that comes to mind is as follows:
items.filter((item) =>
!excludes.some((exclude) => item[exclude.key] === exclude.value)
);
However, this has a complexity of O(n2). The time we need is roughly the multiplication of the number of items and the number of criteria.
The following table is a rough comparison of the complexities:
Items | Criteria | O(n2) Time | O(n) Time |
---|---|---|---|
10 | 10 | 100 | 20 |
100 | 100 | 10.000 | 200 |
1.000 | 1.000 | 1.000.000 | 2000 |
1.000.000.000 | 10.000 | 10.000.000.000.000 | 1.000.010.000 |
Solving this problem with O(n) complexity will be much better.
O(n) Solution
Reading the list items
already takes time O(n). Now we need to reduce access time to the excludes
list to O(1). Here we can inspire from database engines. I will create an index by iterating over excludes
list. Since the total steps in the loop only depend on the number of excluded criteria, this is a linear operation completed in O(m) time.
Usually, accessing a value using Hash or Map lookup takes O(1) time. We will iterate all items in O(n) time and check whether its three attributes exist in the excluded index in O(3) time.
interface ExcludeIndex {
type: Set<string>;
name: Set<string>;
color: Set<string>;
}
function excludeItems(items: Item[], excludes: Exclude[]): Item[] {
const excludeIndex: ExcludeIndex = { type: new Set(), name: new Set(), color: new Set() };
// Populate Exclude Index: O(m)
excludes.forEach((exclude) => excludeIndex[exclude.key].add(exclude.value));
// Filter Items: O(n)
return items.filter(
(item) => !excludeIndex.type.has(item.type)
&& !excludeIndex.name.has(item.name)
&& !excludeIndex.color.has(item.color),
);
}
Our total time complexity is O(m) + (O(n) * O(3)) = O(m + 3n)
O(n) === O(m + 3n)
Couldn't we solve the problem in O(n) time complexity? O(n) means a linear solution. In this context, all below terms are equal in complexity:
O(n) = O(3n)
O(n) = O(m + n)
O(n) = O(m + 3n)
Conclusion
Like the example above, most of our daily problems can be solved using a little data transformation.
What do you think about the above solution? Can you find an alternative? Please share with readers in the comments.
Comments ()