20 Mar 2022, 18:28

Making a Data Structure Iterable in JavaScript/TypeScript

In my last post I explained how to make a Map and Set that allowed you to use coordinates ([number, number] arrays) more like values than references. This means that adding a coordinate array with the same values twice will only result in one entry in the respective data structures. For example, in the CoordinateSet data structure you can see this behaviour where we add the same coordinate twice but the size remains at 0:

coordinateSet.add([51.509865, -0.118092]);
coordinateSet.add([51.509865, -0.118092]);
coordinateSet.size; // 1

Here I realised I wanted to extend CoordinateSet to be closer to the Set class. This meant adding the entries and forEach methods and also handling spreadable and for...of behaviour.

This got me looking at how these work for Set. Turns out set implements Set.prototype[@@iterator](), which allows it to be iterated over. An object is considered iterable if the object implements the iterable protocol - some examples include Arrays, Maps and Sets.

The @@iterator property is accessible via Symbol.iterator, and must return an object that conforms to the iterator protocol. Luckily, using a generator function we can ensure that an iterator object is returned. Generator functions are pretty cool and can be defined by adding an asterix onto the end of a function keyword, i.e. function*. In this specific case we use a computed property, which means we are defining it using a value, in this case Symbol.iterator. This syntax is slightly different as we have to put the asterix before *[Symbol.iterator]().

Let’s assume we have a Class (in our case CoordinateSet from the previous post), here’s how we’d do it:


  // This bit allows the data structure to be iterated over with
  // spread syntax ([...]) and for...of
  *[Symbol.iterator]() {
    for (let entry of this.map.entries()) {
      yield [entry[1], entry[1]];
    }
  }

  // Matching the Set.entries interface
  entries() {
    return this[Symbol.iterator]();
  }

Let’s break down the bits that might be a bit alien here:

  • * denotes a generator - this means that the function is a generator function, where the function returns a Generator object. The Generator object implements both the iterable protocol and the iterator protocol.
  • [Symbol.iterator] - this is the computed property denoted by the square brackets. The Symbol.iterator is a Symbol which specifies the default iterator for a given object. It’s outside the scope of the post to go into Symbols but you can think of them as key that’s guaranteed to be unique. There are a few Symbols that are builtin in, such as Symbol.iterator.
  • yield is similar to return, but instead pauses the generator function execution and the value that is yielded is returned to the generator

There’s quite a bit to take in there if you’re not familiar with generators and Symbols. I would recommend taking a bit of a look at the MDN documentation for these two and the above code will start to make a little more sense!

20 Feb 2022, 18:28

Creating Unique Coordinate Keys

In a previous post, I examined creating a data structure for quickly getting a value based on a coordinate. For context, here a coordinate is meaning a longitude, latitude pair. This pair can often be found as an array ([0.1, 0.2] etc), and sometimes as a property on an object ({ lng: 0.1, lat: 0.2 }).

In this post, I want to dig a bit deeper and examine a new key creation approach that saves on memory and speed whilst also opening up new avenues for more simply implementing data structures such as Sets for coordinates.

Maps, Sets and Strict Equality

First a bit of background. In JavaScript two arrays are only said to be equal if they reference the same array, even if the values are the same. This means can’t use two identical coordinate arrays to exclusively associate to some piece of data. For example:

const map = new Map();
map.set([-0.118092, 51.509865], "a");
map.set([-0.118092, 51.509865], "b");

// For those familiar with JS Map and Set behaviour,
// the following may not be a suprise:

map.get([-0.118092, 51.509865]); // Returns undefined
map.size; // Return 2, as each array creates it's own entry

Now you might be able to see that in some use cases we might think of a coordinate as a unique key and not be concerned about object equality - we just care that the values match. In some languages, tuples of a set length with the same values can be considered equal, and this is the same logic we might want to apply here.

What about toString?

The second problem it gets around is perhaps how one might intuatively try to solve this which is to use toString:

const map = new Map();
const key = [-0.118092, 51.509865].toString();
map.set(key, "a");
map.set(key, "b");

map.get([-0.118092, 51.509865]); // Returns "b"
map.size; // Return 1, as each array creates it's own entry

This gets us closer to what we want and could have been one way to achieve what we wanted in the original blog post. However, unfortunately toString is a somewhat slow operation in JavaScript.

The approach in the original post gets around both of these issues by using nested Maps, so setting in the CoordinateMap creates a Map for the longitude value and then another nested one for the latitude value. This works, but in the worst case means you end up with two Maps for each coordinate. Understandably this could be viewed as excessive.

Is there a fast way to create a unique key?

I’ve been experimenting with a fast way to create a key - the logic works like this: in Mathematics, there is a concept of a pairing function that allows you to uniquely map two natural numbers to a single larger natural number. In our case, we can think of natural numbers as positive integers. This got me thinking, if we can encode our coordinates as positive integers we can use one of the pairing functions to create a single key. If we work on the assumption that we can pin the coordinates to a maximum precision, say a very generous 9 decimals, then thankfully we can fit the numbers into a single digit (JavaScripts maximum integer size is a whopping 9007199254740991 as specified by IEEE 754).

Using Szudzik’s Elegant pairing function we can achieve this doing the following:

export function pairCoordinate(
  longitude: number,
  latitude: number,
  factor = 10000000 // 9 digit precision
) {
  // Make longitude and latitude postive
  // Ensure they are integers
  const x = ((longitude + 180) * factor) | 0;
  const y = ((latitude + 90) * factor) | 0;

  // Szudzik pairing
  return x >= y ? x * x + x + y : y * y + x;
}

This approach for generating unique keys for a coordinate is about 45x faster than using toString. In a future post I will examine how we can reimplement at CoordinateMap and also look at implementing a CoordinateSet to quickly check for presence of a given coordinate.

13 Feb 2022, 18:28

Coordinate Maps and Sets

One problem I was pondering recently is in the geospatial arena - somewhere I spend a lot of my time working. I’ve noticed that it’s relatively easy to do key-value lookups on a single piece of data, i.e. a string or a number. For example, if you want to associate a letter in the alphabet to a score you could use a Map to do something like this:

const map = new Map(["a", 1], ["b", 2], ["c", 3], ["d", 4], ["e", 5], ["f", 6]); //etc

The problem I have been mulling over is if you have a piece of data which is unique but multidemnsional, like a coordinate, how do you handle that? For example this approach now falls apart:

coordinateMap = new Map();
coordinateMap.set([32.95, 83.31], 1);
coordinateMap.set([32.95, 83.31], 2);
coordinateMap.size; // This will return 2, as each array creates it's own entry

Assuming you have a coordinate in the format [number, number], you could simply convert the coordinate to a string using the toString array method. This gives you a unique key for that piece of data:

const coordinateMap = new Map();
const coordinate = [32.95, 83.31];
coordinateMap.set(coordinate.toString(), "some value");

It turns out, the Array toString method is quite slow. However using string concatenation/template strings is quite fast, as fast as using some sort of number pairing function, and as such, we can use something like this:

`${coordinates[0]},${coordinates[1]}`;

Coordinate Map

This got me thinking, can we come up with a data structure that handles this problem elegantly, with a nice Map-like API? This is where I got to:

import { CoordinateDataStructure } from "./coordinate-data-structure";

export type NotUndefined<T> = T extends undefined ? never : T;

export class CoordinateMap<
  T extends U extends any ? NotUndefined<T> : unknown,
  U = any
> extends CoordinateDataStructure {
  private _size = 0;
  private _map: Map<string, T> = new Map();

  // O(1)
  get size() {
    return this._size;
  }

  // Ignore setting the size property publically
  set size(value: number) {}

  // O(1)
  set(coordinates: [number, number], value: T): void {
    this.validateCoordinate(coordinates);

    if (value === undefined) {
      throw new Error("Values can not be set to undefined - use 'remove'");
    }

    const key = this.getKey(coordinates);

    if (!this._map.get(key)) {
      this._size++;
    }

    this._map.set(key, value);
  }

  // O(1)
  get(coordinates: [number, number]): T | undefined {
    this.validateCoordinate(coordinates);
    const key = this.getKey(coordinates);
    return this._map.get(key);
  }

  // O(1)
  delete(coordinates: [number, number]): void {
    this.validateCoordinate(coordinates);
    const key = this.getKey(coordinates);

    if (this._map.get(key) === undefined) {
      throw new Error(
        `Coordinate [${key}] cannot be removed as is not in CoordinateMap`
      );
    }

    this._size--;
    this._map.delete(key);
  }

  // O(1)
  has(coordinates: [number, number]): boolean {
    this.validateCoordinate(coordinates);
    return this._map.get(this.getKey(coordinates)) !== undefined;
  }

  clear() {
    this._map.clear();
    this._size = 0;
  }
}

Here validateCoordinate and getKey comes from a base abstract class which we share with the Set data structure which is outlined below. For completeness, it looks like so:

export abstract class CoordinateDataStructure {
  protected getKey(coordinates: [number, number]) {
    return `${coordinates[0]},${coordinates[1]}`;
  }

  protected validateCoordinate([lng, lat]: [number, number]) {
    if (
      !(typeof lng === "number" && !isNaN(lng) && lng >= -180 && lng <= 180) ||
      !(typeof lat === "number" && !isNaN(lat) && lat >= -90 && lat <= 90)
    ) {
      throw new Error("Longitude and latitude must be valid");
    }
  }
}

Designing the Map API was a fun little challenge - in the end, I tried to match the classic JavaScript Map API as closely as possible.

To demonstrate, let’s say we’re in a testing environment like Jest/Mocha etc, here I can demonstrate the API of the class:

const coordinateMap = new CoordinateMap<number>();

expect(coordinateMap.get([0.0, 1.0])).toBe(undefined);
expect(coordinateMap.has([0.0, 1.0])).toBe(false);

coordinateMap.set([0.0, 1.0], 1);

expect(coordinateMap.get([0.0, 1.0])).toBe(1);
expect(coordinateMap.has([0.0, 1.0])).toBe(true);

coordinateMap.delete([0.0, 1.0]);

expect(coordinateMap.get([0.0, 1.0])).toBe(undefined);
expect(coordinateMap.has([0.0, 1.0])).toBe(false);

Coordinate Set

I also went ahead and implemented an equivalent data structure for the Set data structure:

import { CoordinateDataStructure } from "./coordinate-data-structure";

export class CoordinateSet extends CoordinateDataStructure {
  private _size: number;
  private map: Map<string, [number, number]>;

  constructor(array?: [number, number][]) {
    super();

    if (Array.isArray(array)) {
      this.map = new Map();
      arrOrCoordSet.forEach((c) => {
        this.validateCoordinate(c);
        this.map.set(this.getKey(c), c);
      });
      this._size = arrOrCoordSet.length;
    } else {
      this.map = new Map();
      this._size = 0;
    }
  }

  // O(1)
  get size() {
    return this._size;
  }

  // Ignore setting the size property publically
  set size(value: number) {}

  has(coordinate: [number, number]) {
    return Boolean(this.map.get(this.getKey(coordinate)));
  }

  add(coordinate: [number, number]) {
    const key = this.getKey(coordinate);
    if (this.map.get(key)) {
      return;
    }
    this._size++;
    this.map.set(key, coordinate);
    return this;
  }

  delete(coordinate: [number, number]): boolean {
    const key = this.getKey(coordinate);
    if (!this.map.get(key)) {
      return false;
    }
    this._size--;
    return this.map.delete(key);
  }

  clear() {
    this.map = new Map();
    this._size = 0;
  }
}

This data structure allows you to do the core JavaScript Set operations, like so:

const set = new CoordinateSet();

set.add([180, -90]);
set.add([0.1, 0.2]);
set.delete([180, -90]);
expect(set.size).toBe(1);
set.delete([0.1, 0.2]);
expect(set.size).toBe(0);

A few things could be done to probably improve these data structures, for example adding all the iterable methods such as entries and values. I may extend this at some point and publish it on a library on npm.