OpenRewrite Learning (Part 2): Lossless Semantic Tree (LST)
In the previous article, we introduced LST as the foundation for OpenRewrite’s precise and controllable code modifications. This time, we’ll dive deeper into how OpenRewrite preserves the original semantic structure of the code during parsing.
What is LST?
LST stands for Lossless Semantic Tree. Let’s break down its meaning:
- Lossless: No information is lost when code is parsed into the tree. Whitespace, comments, and formatting are all preserved, so the tree can reproduce the original source exactly.
- Semantic: Represents not just syntactic elements but also the semantic relationships between code fragments.
- Tree: Organizes code elements hierarchically, making traversal, querying, and transformation intuitive and clearly structured.
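To make the "lossless" property concrete, here is a minimal sketch. It is a toy model, not OpenRewrite's real classes: the `Node` record, its `prefix` field, and the `rename` helper are all hypothetical names chosen for illustration. The idea it tries to show is that if every element remembers the whitespace and comments that preceded it, printing the tree gives back the source unchanged, and a targeted change leaves the surrounding formatting alone.

```java
import java.util.List;

class LosslessSketch {
    // Hypothetical node type: 'prefix' holds the whitespace and comments that
    // appeared immediately before this element in the source.
    record Node(String prefix, String text, List<Node> children) {
        static Node leaf(String prefix, String text) {
            return new Node(prefix, text, List.of());
        }
    }

    // Printing emits prefix + text for a node, then its children in order.
    static String print(Node node) {
        StringBuilder out = new StringBuilder(node.prefix()).append(node.text());
        for (Node child : node.children()) {
            out.append(print(child));
        }
        return out.toString();
    }

    // Returns a copy of the tree in which every element whose text equals
    // 'from' is replaced by 'to'. Prefixes are copied untouched.
    static Node rename(Node node, String from, String to) {
        String text = node.text().equals(from) ? to : node.text();
        List<Node> children = node.children().stream()
                .map(child -> rename(child, from, to))
                .toList();
        return new Node(node.prefix(), text, children);
    }

    // Hand-built tree for the source: int   count = 1 /* start */;
    static Node sample() {
        return new Node("", "", List.of(
                Node.leaf("", "int"),
                Node.leaf("   ", "count"),
                Node.leaf(" ", "="),
                Node.leaf(" ", "1"),
                Node.leaf(" /* start */", ";")));
    }

    public static void main(String[] args) {
        Node tree = sample();
        System.out.println(print(tree));
        System.out.println(print(rename(tree, "count", "total")));
    }
}
```

Note that the odd triple space and the comment survive the rename, because the change touches only the text of one element and never its prefix.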
When discussing semantic trees, we must mention the AST.
Abstract Syntax Tree (AST)
An Abstract Syntax Tree (AST) represents the syntactic structure of source code as a tree. ASTs are widely used in compilers to describe program structure.
For example, consider this snippet of the Euclidean algorithm. In its AST, each node corresponds to a construct in the source code:
while b ≠ 0:
    if a > b:
        a := a - b
    else:
        b := b - a
return a
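As a rough illustration of what such a tree might look like, the sketch below builds an AST for the snippet by hand and prints it as an indented outline. The `Node` record and the node labels ("statement sequence", "branch", and so on) are assumptions made for this example, not the output of any particular compiler, and `!=` stands in for ≠.

```java
import java.util.List;

class EuclidAstSketch {
    // Hypothetical, minimal AST node: a label plus child nodes. Note what is
    // absent: no whitespace, no comments, no record of the original layout.
    record Node(String label, List<Node> children) {
        static Node of(String label, Node... children) {
            return new Node(label, List.of(children));
        }
    }

    // Hand-built AST for the subtraction-based Euclidean algorithm.
    static Node euclid() {
        Node a = Node.of("variable a");
        Node b = Node.of("variable b");
        return Node.of("statement sequence",
                Node.of("while",
                        Node.of("compare !=", b, Node.of("constant 0")),
                        Node.of("branch",
                                Node.of("compare >", a, b),
                                Node.of("assign", a, Node.of("binary -", a, b)),
                                Node.of("assign", b, Node.of("binary -", b, a)))),
                Node.of("return", a));
    }

    // Renders the tree as an outline, two spaces of indentation per level.
    static String outline(Node node, int depth) {
        StringBuilder out = new StringBuilder("  ".repeat(depth))
                .append(node.label()).append('\n');
        for (Node child : node.children()) {
            out.append(outline(child, depth + 1));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(outline(euclid(), 0));
    }
}
```

The outline captures the structure of the program, but nothing in it would let you regenerate the original text character for character.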
Lossless Semantic Tree (LST)
OpenRewrite’s LST (Lossless Semantic Tree) boasts unique features that enable accurate code search and transformation even across repositories:
- Preservation of Type Information: Each LST includes detailed type information. For example, in the source code, you might only see a field reference like myField. However, the LST retains the type information for this field. Even if the type is not defined in the current file or project, these attributes remain accessible for querying.
- Lossless Formatting: LST retains formatting details such as spaces, indentation, and line breaks. When rendering the tree structure, it can reproduce the original code format. Additionally, when inserting or restructuring code, new fragments automatically adapt to the surrounding style, ensuring consistency with the original formatting.
The preservation of type information is critical for precise pattern matching. For instance, when searching for specific SLF4J logging statements, the type information allows you to determine whether the logger variable is an SLF4J instance or from another framework like Logback:
logger.info("Hi");
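The sketch below illustrates why the attached type matters. It is a simplified stand-in, assuming a hypothetical `Call` record that carries the receiver's fully qualified type alongside the text you would see in the source. OpenRewrite's own matching API is richer than this. Both calls read identically in source, yet only one of them matches.

```java
class TypeAwareMatchSketch {
    // Hypothetical stand-in for a type-attributed method call in the tree:
    // the receiver and method names as written, plus the receiver's resolved type.
    record Call(String receiver, String method, String receiverType) {}

    // Matches on the resolved type, not on the variable name in the source.
    static boolean isSlf4jInfo(Call call) {
        return call.method().equals("info")
                && call.receiverType().equals("org.slf4j.Logger");
    }

    public static void main(String[] args) {
        // Both calls are written as logger.info("Hi") in the source text.
        Call slf4j = new Call("logger", "info", "org.slf4j.Logger");
        Call logback = new Call("logger", "info", "ch.qos.logback.classic.Logger");

        System.out.println(isSlf4jInfo(slf4j));
        System.out.println(isSlf4jInfo(logback));
    }
}
```

A purely textual search for `logger.info` could not tell these two calls apart, which is the gap that type attribution closes.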
AST vs. LST
Let’s compare the two structures based on the features and examples above:
| Feature | AST | LST |
| --- | --- | --- |
| Information Completeness | Focuses on syntax and semantics; loses formatting and comments. | Retains all details, including formatting, whitespace, and comments. |
| Fine-Grained Control | Limited control over localized changes; output may not match original formatting. | Enables precise localized changes while preserving the original code style. |
| Semantic Richness | Contains syntax and basic semantic info. | Provides deeper semantic fidelity, exposing richer relationships. |
The advantages of LST make it a powerful tool for automated refactoring and transformation while maintaining consistency and minimizing manual intervention.
LST Examples
Here are two examples showcasing the LST structure in Java. In OpenRewrite, all Java LST types implement the J interface, which extends Tree. The Tree interface is the conceptual basis of the LST implementation across languages and data formats.
Example 1
The first example, from the official documentation, shows the LST for a Java class containing definitions for a class, fields, and methods. The root of the tree is of type J.CompilationUnit, and each node represents a J.XXX LST type.
Example 2
Here’s a slightly more complex Java class with constructors, method calls, and field accesses. Using OpenRewrite’s TreeVisitingPrinter.printTree(tree) method, we can print its LST.
Java Code:
package com.atbug.demo;

class FooBar {
    private String greeting = "world";

    public FooBar(String greeting) { this.greeting = greeting; }

    public String hello() { return this.greeting; }

    public String greeting() { return "Hello, " + this.hello() + "!"; }
}
LST Output:
The LST reveals detailed information about code blocks, variable types, method declarations, method calls, and return types for each node in the tree.
Summary
The LST is the core of OpenRewrite, enabling fine-grained control for recipes. To write robust, controllable recipes, it’s essential to understand the LST structure and define precise conditions for specific code paths.
Using TreeVisitingPrinter.printTree(Tree) to visualize the semantic tree allows you to intuitively grasp code structures and design more effective recipes. As we deepen our understanding of LST, we can leverage its capabilities to create highly customized and powerful automated transformations.