Filtering Data using Multiple Criteria with O(n) Complexity

A few days ago, I stumbled upon a question of how to filter an array of object data using multiple criteria with O(n) complexity. I think it's an excellent example of a question to answer with fun.

We have a list of objects representing items with several attributes. We also have a list of criteria. Items having one of those attributes should be excluded from the result. The original list is as below:

const items = [
  { type: "a", name: "a1", color: "yellow" },
  { type: "b", name: "b1", color: "orange" },
];

const excludes = [
  { k: "color", value: "yellow" },
  { k: "name", value: "x" },
];

Original code from the question

There is some discussion about whether the excludes list is a constant or not. In my opinion, it is evident that excludes list is a dynamic list. Otherwise, there would be no question at all. So to make it obvious, I will add a touch to the question. Write the function as in the following code:

interface Item {
  type: string;
  name: string;
  color: string;
}

interface Exclude {
  key: string;
  value: string;
}

const someItems: Item[] = [
  { type: "a", name: "a1", color: "yellow" },
  { type: "b", name: "b1", color: "orange" },
  { type: "c", name: "orange", color: "green" },
];

const someExcludes: Exclude[] = [
  { key: "color", value: "yellow" },
  { key: "name", value: "x" },
];

// Create the following function with O(n) complexity.
function excludeItems(items: Item[], excludes: Exclude[]): Item[] {}

excludeItems(items, excludes);

Big O Notation

Wikipedia defines Big O notation as follows:

Big O notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity. Big O is a member of a family of notations invented by Paul Bachmann, Edmund Landau, and others, collectively called Bachmann–Landau notation or asymptotic notation. The letter O was chosen by Bachmann to stand for Ordnung, meaning the order of approximation.
In computer science, big O notation is used to classify algorithms according to how their run time or space requirements grow as the input size grows. In analytic number theory, big O notation is often used to express a bound on the difference between an arithmetical function and a better understood approximation.

In our example, you should solve the problem in a linear order. Roughly, the time we are allowed to spend is the total number of items and excludes.

O(n2) Solution

The first solution that comes to mind is as follows:

items.filter((item) =>
  !excludes.some((exclude) => item[exclude.key] === exclude.value)
);

However, this has a complexity of O(n2). The time we need is roughly the multiplication of the number of items and the number of criteria.

The following table is a rough comparison of the complexities:

Items Criteria O(n2) Time O(n) Time
10 10 100 20
100 100 10.000 200
1.000 1.000 1.000.000 2000
1.000.000.000 10.000 10.000.000.000.000 1.000.010.000

Solving this problem with O(n) complexity will be much better.

O(n) Solution

Reading the list items already takes time O(n). Now we need to reduce access time to the excludes list to O(1). Here we can inspire from database engines. I will create an index by iterating over excludes list. Since the total steps in the loop only depend on the number of excluded criteria, this is a linear operation completed in O(m) time.

Usually, accessing a value using Hash or Map lookup takes O(1) time. We will iterate all items in O(n) time and check whether its three attributes exist in the excluded index in O(3) time.

interface ExcludeIndex {
  type: Set<string>;
  name: Set<string>;
  color: Set<string>;
}

function excludeItems(items: Item[], excludes: Exclude[]): Item[] {
  const excludeIndex: ExcludeIndex = { type: new Set(), name: new Set(), color: new Set() };

  // Populate Exclude Index: O(m)
  excludes.forEach((exclude) => excludeIndex[exclude.key].add(exclude.value));

  // Filter Items: O(n)
  return items.filter(
    (item) => !excludeIndex.type.has(item.type)
      && !excludeIndex.name.has(item.name)
      && !excludeIndex.color.has(item.color),
  );
}

Our total time complexity is O(m) + (O(n) * O(3)) = O(m + 3n)

O(n) === O(m + 3n)

Couldn't we solve the problem in O(n) time complexity? O(n) means a linear solution. In this context, all below terms are equal in complexity:

O(n) = O(3n)
O(n) = O(m + n)
O(n) = O(m + 3n)

Conclusion

Like the example above, most of our daily problems can be solved using a little data transformation.

What do you think about the above solution? Can you find an alternative? Please share with readers in the comments.