16 Aug 2020, 18:28

Getting Started With Handling TypeScript ASTs

TypeScript provides a language which builds on top of JavaScript, adding static types to help catch type mismatches at compile time (and even whilst you write!). This is useful from a development perspective as (without wanting to dive into the dynamic vs static typing debate) it makes it easier to reason about the data flowing through your programs, documenting the shape of data at each stage.

TypeScript is powerful in that it has a solid level of inference, which means you don’t need to explicitly type every variable. For variable assignment it will take what’s called the best common type; this means if you provide an integer or float, it will infer a number type, and if you have an array with multiple element types it will type the variable as an array of the union of those types (e.g. let arr = ["1", 2, null] would be typed as (string | number | null)[]). Alongside this, it also provides what you could call the ‘inverse’ of this: contextual typing. Contextual typing means that the type can be inferred from the location and usage. For example, when you assign a function to environment callbacks like window.onmousemove in the browser, TypeScript can infer that the function’s parameter is of type MouseEvent because it knows onmousemove takes a function with this as its first parameter (the type is inferred from the left-hand side of the assignment).
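
As a small illustration of both kinds of inference, here is a sketch that should run outside the browser. PointerHandler is a hypothetical type standing in for a DOM callback such as window.onmousemove, and the array comment assumes strictNullChecks is enabled.

```typescript
// Best common type: the element type is the union of the element types
// (assumes strictNullChecks; without it, null is dropped from the union).
const arr = ["1", 2, null]; // inferred as (string | number | null)[]

// Integers and floats both infer the single `number` type.
let count = 1; // number
let ratio = 0.5; // number

// Contextual typing: the parameter type flows in from the declared target type.
// `PointerHandler` is a hypothetical stand-in for a DOM callback type.
type PointerHandler = (event: { clientX: number; clientY: number }) => string;

// `event` is never annotated here, yet its properties are fully typed.
const formatPosition: PointerHandler = (event) => `${event.clientX},${event.clientY}`;

console.log(arr.length, count + ratio, formatPosition({ clientX: 3, clientY: 4 }));
```

Hovering over arr or event in an editor shows the inferred types without any annotation on the variables themselves.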

When the TypeScript compiler compiles your code, it creates an Abstract Syntax Tree (AST) of it. Essentially, an AST can be thought of as a tree representation of the syntax of your source code, with each node being a data structure representing a construct in the corresponding source code. The tree contains a node for every element of the source code.

For example, if we took the simple TypeScript program:

console.log("hello world");

We would end up with an AST structure that could be represented textually in this form:

SourceFile
  ExpressionStatement
    CallExpression
      PropertyAccessExpression
        Identifier
        Identifier
      StringLiteral
  EndOfFileToken

Here, SourceFile represents the file itself, which has two child nodes: an ExpressionStatement (console.log("hello world");) and the EndOfFileToken, representing the end of the file. The ExpressionStatement comprises one CallExpression, which has one PropertyAccessExpression composed of two Identifiers, console and its property log. The call expression also contains the StringLiteral which is, in this case, "hello world".

Each of these items in our tree is an AST node, which can be traversed and has metadata attached to it. This data can be useful or interesting to us as developers. As an elementary example, we can determine the kind of a node; in TypeScript this is represented as an enum called SyntaxKind. To demonstrate, the value 243 mapped to FunctionDeclaration in the SyntaxKind enum at the time of writing (the numeric values shift between compiler versions, so it is safer to compare against the enum member than the raw number). This simple use only scratches the surface of what we can do, however, and as we go deeper into this blog post you’ll see the kinds of things we can do with the TypeScript AST.
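
To make the idea of node metadata a little more concrete, here is a minimal sketch that reads the kind and position of the first statement in our example program. It assumes the typescript package is installed; the variable names are just for illustration.

```typescript
import * as ts from "typescript";

const source = 'console.log("hello world");';
// The final `true` sets parent pointers, which position helpers rely on.
const file = ts.createSourceFile("example.ts", source, ts.ScriptTarget.Latest, true);

// The first statement is the ExpressionStatement from the tree above.
const statement = file.statements[0];

// `kind` is a number; indexing the enum with it gives back the readable name.
const kindName = ts.SyntaxKind[statement.kind];

// Nodes also carry their position in the source text.
const start = statement.getStart(file);
const end = statement.getEnd();
const { line, character } = file.getLineAndCharacterOfPosition(start);

console.log(kindName, start, end, line, character);
```

Comparing node.kind against ts.SyntaxKind.ExpressionStatement (or using a guard such as ts.isExpressionStatement) keeps the check independent of the numeric values.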

To get more specific, here are some ideas of things you could do by traversing and manipulating ASTs:

  • Write documentation in markdown or HTML for your TypeScript code
  • Edit code programmatically by using the manipulation features
  • Automatically update code between new versions of a library or framework
  • Write new code, leveraging something like code-block-writer
  • Write types for the return payloads of your APIs for your frontend code
  • Write a custom linter for your TypeScript code

Now, let’s first dive into how we can explore ASTs using the TypeScript compiler API.

Making sense of TypeScript’s AST

The typescript package provides a compiler API, which allows you to access the AST nodes. If we wanted to print the above AST, we could do so using the typescript module itself like so:

import * as ts from 'typescript';

const code = "console.log('hello world')";

// A language version is required; ScriptTarget.Latest accepts the newest syntax
const sourceFile = ts.createSourceFile('temp.ts', code, ts.ScriptTarget.Latest);
let indent = 0;

function printTree(node: ts.Node) {
    // Two spaces per level, matching the tree shown earlier
    console.log(' '.repeat(indent * 2) + ts.SyntaxKind[node.kind]);
    indent++;
    ts.forEachChild(node, printTree);
    indent--;
}
printTree(sourceFile);

Here we create a temporary source file in memory from the code string, then traverse down the tree, printing the kind of each node. This will print out the tree depicted above. Let’s take this a small step further and use the node APIs to only log a node when it’s a string literal:

import * as ts from 'typescript';

const code = "console.log('hello world')";

const sourceFile = ts.createSourceFile('temp.ts', code, ts.ScriptTarget.Latest);

function printTree(node: ts.Node) {
    if (ts.isStringLiteral(node)) {
        // node.text is the literal's content without the surrounding quotes
        console.log("Text:", node.text);
    }
    ts.forEachChild(node, printTree);
}
printTree(sourceFile);

This program would log out Text: hello world, as it ignores any node that isn’t a string literal.
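
Building on the same traversal, here is a sketch of a lint-style check that reports the lines containing console.log calls. findConsoleLogLines is a hypothetical helper written for this post, not part of the compiler API, and it only matches calls written literally as console.log(...).

```typescript
import * as ts from "typescript";

// Hypothetical helper: returns the 1-based line numbers of every `console.log(...)` call.
function findConsoleLogLines(code: string): number[] {
    const sourceFile = ts.createSourceFile("temp.ts", code, ts.ScriptTarget.Latest, true);
    const lines: number[] = [];

    function visit(node: ts.Node): void {
        if (
            ts.isCallExpression(node) &&
            ts.isPropertyAccessExpression(node.expression) &&
            ts.isIdentifier(node.expression.expression) &&
            node.expression.expression.text === "console" &&
            node.expression.name.text === "log"
        ) {
            // Positions are 0-based offsets; convert to a 1-based line number
            const { line } = sourceFile.getLineAndCharacterOfPosition(node.getStart(sourceFile));
            lines.push(line + 1);
        }
        ts.forEachChild(node, visit);
    }

    visit(sourceFile);
    return lines;
}

const sample = [
    "const total = 1 + 2;",
    "console.log(total);",
    "function greet() {",
    "    console.log('hi');",
    "}",
].join("\n");

console.log(findConsoleLogLines(sample)); // [2, 4]
```

The type guards (ts.isCallExpression and friends) narrow the node type, so properties such as expression and name are available without casts.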

This is perhaps one of the most rudimentary things we could do with the compiler API, but there’s a whole host of use cases and more detailed dives into the TypeScript Compiler API in the TypeScript GitHub documentation.

Introducing ts-morph

Some might argue that the TypeScript compiler API can be slightly cumbersome to work with. ts-morph is a package that wraps the compiler API to provide a smoother and easier way to work with TypeScript code.

Let’s rewrite the first program we wrote with the TypeScript compiler API:

import { Node, Project } from "ts-morph";

const project = new Project();
const code = "console.log('hello world')";
const sourceFile = project.createSourceFile("temp.ts", code);

let indent = 0;

function printTree(node: Node) {
    // Two spaces per level, matching the tree shown earlier
    console.log(' '.repeat(indent * 2) + node.getKindName());
    indent++;
    node.forEachChild(printTree);
    indent--;
}
printTree(sourceFile);

The main difference here is that instead of using functions on the ts module, ts-morph wraps each node in an object that has those methods attached. As such we could turn the second program into:

import { Node, Project } from "ts-morph";

const project = new Project();
const code = "console.log('hello world')";
const sourceFile = project.createSourceFile("temp.ts", code);

function printTree(node: Node) {
    // The type guard narrows node to a StringLiteral
    if (Node.isStringLiteral(node)) {
        // getLiteralValue returns the content without the surrounding quotes
        console.log("Text:", node.getLiteralValue());
    }
    node.forEachChild(printTree);
}
printTree(sourceFile);
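
For simple queries like this one, ts-morph also lets us skip the manual recursion. The sketch below uses getDescendantsOfKind to collect every string literal in one call; it assumes ts-morph is installed and uses the in-memory file system option so nothing touches the disk.

```typescript
import { Project, SyntaxKind } from "ts-morph";

const project = new Project({ useInMemoryFileSystem: true });
const sourceFile = project.createSourceFile(
    "temp.ts",
    "console.log('hello world', \"again\");"
);

// Collect every StringLiteral node in the file and read its unquoted value
const literalTexts = sourceFile
    .getDescendantsOfKind(SyntaxKind.StringLiteral)
    .map((literal) => literal.getLiteralValue());

console.log(literalTexts); // ["hello world", "again"]
```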

Putting ts-morph to work

Taking ts-morph a bit further, we can do more interesting things relatively straightforwardly. Say we wanted to get all the classes in a file that contain a certain property name; we could write a function that does that like so:

import { ClassDeclaration, SourceFile } from "ts-morph";

const findClassesWithPropertyName = (sourceFile: SourceFile, name: string) => {

    const classes = sourceFile.getClasses();
    const classesWithProperty: ClassDeclaration[] = [];

    // Check every class declared at the top level of the file
    for (let i = 0; i < classes.length; i++) {
        const tsClass = classes[i];
        const matches = tsClass.getProperties().map((p) => p.getName()).includes(name);
        if (matches) {
            classesWithProperty.push(tsClass);
        }
    }

    return classesWithProperty;

};
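
To see a search like this in action, here is a self-contained sketch with some sample classes. It uses a shorter filter-based variant, classesWithProperty, which is a hypothetical name for this example; the sample classes are made up too.

```typescript
import { ClassDeclaration, Project, SourceFile } from "ts-morph";

// Hypothetical variant of the function above, written with filter and getProperty
const classesWithProperty = (sourceFile: SourceFile, name: string): ClassDeclaration[] =>
    sourceFile.getClasses().filter((c) => c.getProperty(name) !== undefined);

const project = new Project({ useInMemoryFileSystem: true });
const sourceFile = project.createSourceFile(
    "models.ts",
    [
        "class User { id = 1; email = ''; }",
        "class Order { id = 2; total = 0; }",
        "class Logger { level = 'info'; }",
    ].join("\n")
);

// Map the matching declarations to their names for display
const names = classesWithProperty(sourceFile, "id").map((c) => c.getName());
console.log(names); // ["User", "Order"]
```

Returning the ClassDeclaration objects rather than just the names keeps the result useful for further navigation or manipulation.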

Let’s think of another example program; in this case, we are interested in determining the depth of the deepest nested function within a source code file which contains top-level functions. We could achieve this using something like the following code:

import { Node, SourceFile } from "ts-morph";

const getDeepestFunction = (sourceFile: SourceFile) => {
    let deepest = 0;

    // Each entry pairs a node with the number of functions that enclose it
    const stack: { node: Node; depth: number }[] = [{ node: sourceFile, depth: 0 }];

    while (stack.length) {
        const { node, depth } = stack.pop()!;
        const kind = node.getKindName();

        // Entering a function puts everything inside it one level deeper
        const isFunction = kind === 'FunctionDeclaration' || kind === 'ArrowFunction';
        const nodeDepth = isFunction ? depth + 1 : depth;

        if (nodeDepth > deepest) {
            deepest = nodeDepth;
        }

        for (const child of node.getChildren()) {
            stack.push({ node: child, depth: nodeDepth });
        }
    }

    return deepest;
};

Here you can see we create a stack and iterate through the child nodes, adding one to the depth every time we encounter a function declaration or an ES6 arrow function. Each stack entry carries the depth of its enclosing functions, so sibling functions do not add to each other’s depth and the count effectively starts from zero again for every top-level node.
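
The same question can also be answered recursively with the plain compiler API. The following is a sketch of that alternative formulation rather than the ts-morph code above; maxFunctionDepth is a hypothetical helper name.

```typescript
import * as ts from "typescript";

// Hypothetical helper: depth of the most deeply nested function in `code`.
function maxFunctionDepth(code: string): number {
    const sourceFile = ts.createSourceFile("temp.ts", code, ts.ScriptTarget.Latest);

    // `depth` is the number of functions enclosing `node`.
    function visit(node: ts.Node, depth: number): number {
        const isFunction = ts.isFunctionDeclaration(node) || ts.isArrowFunction(node);
        const current = isFunction ? depth + 1 : depth;
        let deepest = current;
        // The callback returns nothing, so forEachChild visits every child
        ts.forEachChild(node, (child) => {
            deepest = Math.max(deepest, visit(child, current));
        });
        return deepest;
    }

    return visit(sourceFile, 0);
}

const nested = [
    "function outer() {",
    "    function middle() {",
    "        const inner = () => 1;",
    "    }",
    "    function sibling() {}",
    "}",
    "const top = () => 2;",
].join("\n");

console.log(maxFunctionDepth(nested)); // 3
```

Because the depth is passed down as an argument, it unwinds naturally when the recursion leaves a function, so sibling is counted at depth 2 rather than stacking on top of middle.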

Replacing and updating AST nodes

Lastly, I want to demonstrate how you could rewrite your code’s AST using ts-morph (although the behaviour is basically identical for the core typescript library). Let’s say we have a code snippet that converts 10000000000 bytes to kilobytes, megabytes and gigabytes. However, we notice that each exponent in the conversion is off by one:

const totalSize = 10000000000;
const totalSizeKB = totalSize / Math.pow(1024,2);
const totalSizeMB = totalSize / Math.pow(1024,3);
const totalSizeGB = totalSize / Math.pow(1024,4);

This is of course slightly contrived and we could certainly fix this issue manually, but it’s useful to demonstrate how we can start doing code transforms and updating our code programmatically. We could write a function that changes the powers from 2, 3 and 4 to be 1, 2 and 3 respectively:

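import { SourceFile, ts } from "ts-morph";

const rewriteMemoryPowers = (sourceFile: SourceFile) => {
    return sourceFile.transform(traversal => {
        const node = traversal.visitChildren(); // Here traversal visits children in postorder
        // Only single-digit literals (e.g. 2, 3, 4) are treated as exponents
        if (ts.isNumericLiteral(node) && node.text.length === 1) {
            // Older TypeScript versions exposed this as ts.createNumericLiteral
            return traversal.factory.createNumericLiteral(String(parseInt(node.text, 10) - 1));
        }
        return node;
    });
};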

Here we use the transform method, whose callback receives a TransformTraversalControl. This, in turn, allows us to traverse down the AST and replace or update nodes; we call visitChildren to make sure we traverse down the child nodes (here ‘traversal’ also has a currentNode property if you are not interested in the node’s children).

The program goes through each node and its children, determines if it is a numeric literal, and if it is and has a length of 1 (e.g. 2, 3, 4) then we decrement it by one and create a new numeric literal in its place. That should fix our issue! Arguably the way this is coded isn’t super robust, but the aim here is to demonstrate how you can use the transform method to recurse down the AST and update the behaviour of the program by changing its nodes.
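
If we wanted something a little more targeted, one option is to only touch the second argument of calls written as Math.pow(...). The sketch below is an alternative that uses ts-morph’s navigation methods and replaceWithText instead of transform, so treat it as one possible approach rather than the canonical one.

```typescript
import { Node, Project, SyntaxKind } from "ts-morph";

const project = new Project({ useInMemoryFileSystem: true });
const sourceFile = project.createSourceFile(
    "conversion.ts",
    [
        "const totalSize = 10000000000;",
        "const totalSizeKB = totalSize / Math.pow(1024,2);",
        "const totalSizeMB = totalSize / Math.pow(1024,3);",
        "const totalSizeGB = totalSize / Math.pow(1024,4);",
    ].join("\n")
);

for (const call of sourceFile.getDescendantsOfKind(SyntaxKind.CallExpression)) {
    // Only touch calls written exactly as Math.pow(...)
    if (call.getExpression().getText() !== "Math.pow") continue;

    // The exponent is the second argument; skip anything that is not a plain number
    const exponent = call.getArguments()[1];
    if (exponent !== undefined && Node.isNumericLiteral(exponent)) {
        exponent.replaceWithText(String(exponent.getLiteralValue() - 1));
    }
}

const rewritten = sourceFile.getFullText();
console.log(rewritten);
```

Since only the exponent’s text is replaced, the rest of the file keeps its original formatting, and unrelated single-digit numbers elsewhere in the file are left alone.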

Saving and emitting to disk

If you have made transforms to your code, or you’re interested in converting it to JavaScript, saving and emitting will be useful features. Thankfully with ts-morph, this is reasonably straightforward. In our previous example, if we wanted to save the file to disk, we could do something like so:

const sourceFile = project.createSourceFile("conversion.ts", code);
rewriteMemoryPowers(sourceFile).saveSync();

Here we use saveSync, but you can use save if you would prefer the write to be asynchronous. On a similar note, assuming we wanted to emit the compiled JavaScript file, we could do:

const sourceFile = project.createSourceFile("conversion.ts", code);
rewriteMemoryPowers(sourceFile).emitSync();

This will write a conversion.js file to disk. Here we have been using in-memory files up until the point we save them, but we can also read files from disk, and even whole directories if we so wish. See the ts-morph docs on navigation and directories.
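
When experimenting, it can be handy to try saving and emitting without touching the real disk. Here is a minimal sketch using ts-morph’s in-memory file system option together with emitToMemory; the file name and contents are only examples.

```typescript
import { Project } from "ts-morph";

// useInMemoryFileSystem keeps every write off the real disk
const project = new Project({ useInMemoryFileSystem: true });
const sourceFile = project.createSourceFile(
    "conversion.ts",
    "const totalSize: number = 10000000000;"
);

// Saving writes to the in-memory file system, which we can read back from
sourceFile.saveSync();
const saved = project.getFileSystem().readFileSync(sourceFile.getFilePath());

// Emitting to memory returns the compiled JavaScript files instead of writing them
const emittedFiles = project.emitToMemory().getFiles();
for (const file of emittedFiles) {
    console.log(file.filePath);
    console.log(file.text);
}
```

The emitted text has the type annotation stripped, which is a quick way to check what the compiler would produce for a transformed file before committing to writing it out.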

Conclusion

Hopefully, this blog post has shown you how you can get started with exploring TypeScript ASTs and how you can manipulate their data structures in useful ways.

One aim was for the ideas at the beginning of this post to give some inspiration for what might be possible. If there is interest, I can look into exploring one of these examples in a further blog post.