mirror of https://github.com/RGBCube/cstree
synced 2025-07-27 09:07:44 +00:00
Merge pull request #17 from domenicquirl/add-docs
This commit is contained in:
commit be8477e5a4
11 changed files with 538 additions and 122 deletions

README.md | 23
</div>

@@ -10,19 +10,34 @@
`cstree` is a library for creating and working with concrete syntax trees (CSTs).

"Traditional" abstract syntax trees (ASTs) usually contain different types of nodes, which represent information about the source text of a document and reduce this information to the minimal amount necessary to correctly interpret it.
In contrast, CSTs are lossless representations of the entire input, where all tree nodes are represented uniformly (i.e. the nodes are _untyped_) but include a `SyntaxKind` field to determine the kind of node.
One of the big advantages of this representation is not only that it can recreate the original source exactly, but also that it lends itself very well to representing _incomplete or erroneous_ trees, which makes it well suited for use in contexts such as IDEs.

The concept of and the data structures for CSTs are inspired in part by Swift's [libsyntax](https://github.com/apple/swift/tree/5e2c815edfd758f9b1309ce07bfc01c4bc20ec23/lib/Syntax).
Trees consist of two layers: the inner tree (called the _green_ tree) contains the actual source text in _position-independent_ green nodes.
Tokens and nodes that appear identically at multiple places in the source text are _deduplicated_ in this representation in order to store the tree efficiently.
This means that the green tree may not structurally be a tree.
To remedy this, the actual syntax tree is constructed on top of the green tree as a secondary tree (called the _red_ tree), which models the exact source structure.
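The deduplication described above can be sketched with standard-library types only. The `GreenNode` and `NodeCache` names below are illustrative stand-ins for `cstree`'s internals, assuming a cache keyed on the node's structure; structurally identical subtrees end up sharing one allocation, which is why the green "tree" is really a DAG:

```rust
use std::collections::HashMap;
use std::rc::Rc;

// A position-independent "green" node: only kind, text, and children.
#[derive(PartialEq, Eq, Hash)]
struct GreenNode {
    kind: u16,
    text: String,
    children: Vec<Rc<GreenNode>>,
}

// Deduplicating cache: structurally identical subtrees map to the same allocation.
#[derive(Default)]
struct NodeCache {
    nodes: HashMap<Rc<GreenNode>, Rc<GreenNode>>,
}

impl NodeCache {
    fn node(&mut self, kind: u16, text: &str, children: Vec<Rc<GreenNode>>) -> Rc<GreenNode> {
        let node = Rc::new(GreenNode {
            kind,
            text: text.to_string(),
            children,
        });
        match self.nodes.get(&node) {
            // Seen this exact subtree before: hand out the shared allocation.
            Some(existing) => Rc::clone(existing),
            None => {
                self.nodes.insert(Rc::clone(&node), Rc::clone(&node));
                node
            }
        }
    }
}

fn main() {
    let mut cache = NodeCache::default();
    let a = cache.node(1, "x", vec![]);
    let b = cache.node(1, "x", vec![]); // identical token: deduplicated
    let c = cache.node(1, "y", vec![]);
    assert!(Rc::ptr_eq(&a, &b)); // same allocation => the green tree is a DAG
    assert!(!Rc::ptr_eq(&a, &c));
}
```

Because the shared green nodes carry no offsets or parent pointers, the red tree layered on top can re-introduce both without duplicating the stored text.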

The `cstree` implementation is a fork of the excellent [`rowan`](https://github.com/rust-analyzer/rowan/), developed by the authors of [rust-analyzer](https://github.com/rust-analyzer/rust-analyzer/), who wrote up a conceptual overview of their implementation in [their repository](https://github.com/rust-analyzer/rust-analyzer/blob/master/docs/dev/syntax.md#trees).
Notable differences between `cstree` and `rowan`:
- Syntax trees (red trees) are created lazily, but are persistent. Once a node has been created, it will remain allocated, while `rowan` re-creates the red layer on the fly. Apart from the trade-off discussed [here](https://github.com/rust-analyzer/rust-analyzer/blob/master/docs/dev/syntax.md#memoized-rednodes), this helps to achieve good tree traversal speed while providing the next points:
- Syntax (red) nodes are `Send` and `Sync`, allowing realized trees to be shared across threads. This is achieved by atomically reference counting syntax trees as a whole, which also removes the need to reference count individual nodes (helping with the point above).
- Syntax nodes can hold custom data.
- `cstree` trees are trees over interned strings. This means `cstree` will deduplicate the text of tokens such as identifiers with the same name. In this position, `rowan` stores each string, with a small-string optimization (see [`SmolStr`](https://crates.io/crates/smol_str)).
- Performance optimizations for tree creation: new nodes are only allocated on the heap if they are not in the cache, and subtrees are not recursively hashed.
- Performance optimizations for tree traversal: persisting red nodes allows tree traversal methods to return references. You can still `clone` to obtain an owned node, but you only pay that cost when you need to.
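The string interning from the fourth point can be sketched with the standard library alone (the real implementation uses the `lasso` crate; the `Interner` and `resolve` names below are illustrative stand-ins): each distinct token text is stored once, and tokens refer to it by a small key.

```rust
use std::collections::HashMap;

// Minimal string interner: each distinct string is stored exactly once.
#[derive(Default)]
struct Interner {
    map: HashMap<String, u32>,
    strings: Vec<String>,
}

impl Interner {
    fn get_or_intern(&mut self, text: &str) -> u32 {
        if let Some(&key) = self.map.get(text) {
            return key; // already interned: tokens share this text
        }
        let key = self.strings.len() as u32;
        self.strings.push(text.to_string());
        self.map.insert(text.to_string(), key);
        key
    }

    // The "resolver" side: turn a key back into the original text.
    fn resolve(&self, key: u32) -> &str {
        &self.strings[key as usize]
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.get_or_intern("ident");
    let b = interner.get_or_intern("ident"); // same name => same key, stored once
    let c = interner.get_or_intern("other");
    assert_eq!(a, b);
    assert_ne!(a, c);
    assert_eq!(interner.resolve(a), "ident");
}
```

Tokens then only carry the fixed-size key, which is what makes deduplicating identifiers across a large tree cheap.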

## Getting Started

The main entry points for constructing syntax trees are `GreenNodeBuilder` and `SyntaxNode::new_root`, for green and red trees respectively.
See `examples/s_expressions` for a guided tutorial to `cstree`.

## AST Layer

While `cstree` is built for concrete syntax trees, applications can quite easily work with either a CST or an AST representation, or freely switch between them.
To do so, use `cstree` to build the syntax tree and its underlying green tree, and provide AST wrappers for your different kinds of nodes.
An example of how this is done can be seen [here](https://github.com/rust-analyzer/rust-analyzer/blob/master/crates/syntax/src/ast/generated.rs) and [here](https://github.com/rust-analyzer/rust-analyzer/blob/master/crates/syntax/src/ast/generated/nodes.rs) (note that the latter file is automatically generated by a task).

See `examples/s_expressions` for a tutorial.
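The wrapper approach can be sketched with a toy untyped node, mirroring the `ast_node!` pattern used in `examples/s_expressions` (the `SyntaxNode` here is a simplified stand-in, not `cstree`'s actual type):

```rust
// A toy untyped syntax node: just a kind plus children.
#[derive(Debug, PartialEq)]
struct SyntaxNode {
    kind: u16,
    children: Vec<SyntaxNode>,
}

const ROOT: u16 = 0;
const ATOM: u16 = 1;

// Typed AST wrapper over the untyped CST node, as an `ast_node!`-style macro would generate.
#[repr(transparent)]
struct Root(SyntaxNode);

impl Root {
    // `cast` only succeeds for nodes of the matching kind,
    // so the typed layer is a zero-cost view over the CST.
    fn cast(node: SyntaxNode) -> Option<Self> {
        if node.kind == ROOT {
            Some(Root(node))
        } else {
            None
        }
    }
}

fn main() {
    let cst = SyntaxNode {
        kind: ROOT,
        children: vec![SyntaxNode { kind: ATOM, children: vec![] }],
    };
    let not_root = SyntaxNode { kind: ATOM, children: vec![] };
    assert!(Root::cast(cst).is_some());
    assert!(Root::cast(not_root).is_none());
}
```

Because the wrapper is `repr(transparent)`, switching between the CST and AST view is just a checked cast, not a conversion.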

## License

`cstree` is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0).

@@ -1,19 +1,13 @@
//! In this tutorial, we will write a parser and evaluator of arithmetic S-expressions, which look
//! like this:
//! ```
//! (+ (* 15 2) 62)
//! ```
//!
//! You may want to follow the conceptual overview of the design alongside this tutorial:
//! https://github.com/rust-analyzer/rust-analyzer/blob/master/docs/dev/syntax.md
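As a baseline for what the evaluator should compute, here is a direct evaluator for such expressions, a standalone sketch without any syntax tree (the tutorial's CST-based version additionally survives invalid input):

```rust
// Evaluate a fully parenthesized S-expression like "(+ (* 15 2) 62)".
// This sketch has no error recovery: malformed input simply yields None.
fn eval_sexp(input: &str) -> Option<i64> {
    // Pad parentheses so we can tokenize by splitting on whitespace.
    let padded = input.replace('(', " ( ").replace(')', " ) ");
    let mut tokens = padded.split_whitespace().peekable();

    fn eval<'a, I: Iterator<Item = &'a str>>(tokens: &mut std::iter::Peekable<I>) -> Option<i64> {
        match tokens.next()? {
            "(" => {
                let op = tokens.next()?;
                // Fold the operator over all operands until the closing paren.
                let mut acc = eval(tokens)?;
                while tokens.peek() != Some(&")") {
                    let rhs = eval(tokens)?;
                    acc = match op {
                        "+" => acc + rhs,
                        "-" => acc - rhs,
                        "*" => acc * rhs,
                        "/" => acc / rhs,
                        _ => return None,
                    };
                }
                tokens.next(); // consume ")"
                Some(acc)
            }
            atom => atom.parse().ok(),
        }
    }

    eval(&mut tokens)
}

fn main() {
    assert_eq!(eval_sexp("(+ (* 15 2) 62)"), Some(92));
    assert_eq!(eval_sexp("92"), Some(92));
}
```

The rest of the tutorial arrives at the same arithmetic, but via a lossless tree that also represents whitespace and errors.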

/// cstree uses `TextSize` and `TextRange` types to represent utf8 offsets and ranges.

/// Let's start with defining all kinds of tokens and composite nodes.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[allow(non_camel_case_types)]
#[repr(u16)]
@@ -29,11 +23,12 @@ enum SyntaxKind {
    ATOM, // `+`, `15`, wraps a WORD token
    ROOT, // top-level node: a list of s-expressions
}
use std::collections::VecDeque;

use SyntaxKind::*;

/// Some boilerplate is needed, as cstree represents kinds as `struct SyntaxKind(u16)` internally,
/// in order to not need the user's `enum SyntaxKind` as a type parameter.
///
/// First, to easily pass the enum variants into cstree via `.into()`:
impl From<SyntaxKind> for cstree::SyntaxKind {
@@ -42,9 +37,9 @@ impl From<SyntaxKind> for cstree::SyntaxKind {
    }
}

/// Second, implementing the `Language` trait teaches cstree to convert between these two SyntaxKind
/// types, allowing for a nicer SyntaxNode API where "kinds" are values from our `enum SyntaxKind`,
/// instead of plain u16 values.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
enum Lang {}
impl cstree::Language for Lang {

@@ -60,17 +55,18 @@ impl cstree::Language for Lang {
    }
}

/// GreenNode is an immutable tree, which caches identical nodes and tokens, but doesn't contain
/// offsets and parent pointers.
/// cstree also deduplicates the actual source string in addition to the tree nodes, so we will need
/// the Resolver to get the real text back from the interned representation.
use cstree::{interning::Resolver, GreenNode};

/// You can construct GreenNodes by hand, but a builder is helpful for top-down parsers: it maintains
/// a stack of currently in-progress nodes.
use cstree::GreenNodeBuilder;

/// The parse results are stored as a "green tree".
/// We'll discuss how to work with the results later.
struct Parse<I> {
    green_node: GreenNode,
    resolver: I,
@@ -80,17 +76,14 @@ struct Parse<I> {

/// Now, let's write a parser.
/// Note that `parse` does not return a `Result`:
/// By design, syntax trees can be built even for completely invalid source code.
fn parse(text: &str) -> Parse<impl Resolver> {
    struct Parser<'input> {
        /// input tokens, including whitespace.
        tokens: VecDeque<(SyntaxKind, &'input str)>,
        /// the in-progress green tree.
        builder: GreenNodeBuilder<'static, 'static>,
        /// the list of syntax errors we've accumulated so far.
        errors: Vec<String>,
    }
@@ -115,10 +108,10 @@ fn parse(text: &str) -> Parse<impl Resolver> {
                SexpRes::RParen => {
                    self.builder.start_node(ERROR.into());
                    self.errors.push("unmatched `)`".to_string());
                    self.bump(); // be sure to advance even in case of an error, so as to not get stuck
                    self.builder.finish_node();
                }
                SexpRes::Ok => {}
            }
        }
        // Don't forget to eat *trailing* whitespace
@@ -126,11 +119,13 @@ fn parse(text: &str) -> Parse<impl Resolver> {
        // Close the root node.
        self.builder.finish_node();

        // Get the green tree from the builder.
        // Note that, since we didn't provide our own interner to the builder, it has
        // instantiated one for us and now returns it together with the tree.
        let (tree, interner) = self.builder.finish();
        Parse {
            green_node: tree,
            resolver: interner.unwrap().into_resolver(),
            errors: self.errors,
        }
    }
@@ -150,7 +145,7 @@ fn parse(text: &str) -> Parse<impl Resolver> {
                    self.bump();
                    break;
                }
                SexpRes::Ok => {}
            }
        }
        // close the list node
@@ -160,8 +155,7 @@ fn parse(text: &str) -> Parse<impl Resolver> {
    fn sexp(&mut self) -> SexpRes {
        // Eat leading whitespace
        self.skip_ws();
        // Either a list, an atom, a closing paren, or an eof.
        let t = match self.current() {
            None => return SexpRes::Eof,
            Some(R_PAREN) => return SexpRes::RParen,
@@ -182,13 +176,13 @@ fn parse(text: &str) -> Parse<impl Resolver> {

    /// Advance one token, adding it to the current branch of the tree builder.
    fn bump(&mut self) {
        let (kind, text) = self.tokens.pop_front().unwrap();
        self.builder.token(kind.into(), text);
    }

    /// Peek at the first unprocessed token
    fn current(&self) -> Option<SyntaxKind> {
        self.tokens.front().map(|(kind, _)| *kind)
    }

    fn skip_ws(&mut self) {
@@ -198,30 +192,29 @@ fn parse(text: &str) -> Parse<impl Resolver> {
        }
    }

    Parser {
        tokens: lex(text),
        builder: GreenNodeBuilder::new(),
        errors: Vec::new(),
    }
    .parse()
}

/// To work with the parse results we need a view into the green tree - the syntax tree.
/// It is also immutable, like a GreenNode, but it contains parent pointers, offsets, and has
/// identity semantics.
type SyntaxNode = cstree::SyntaxNode<Lang>;
#[allow(unused)]
type SyntaxToken = cstree::SyntaxToken<Lang>;
#[allow(unused)]
type SyntaxElement = cstree::SyntaxElement<Lang>;

impl<I> Parse<I> {
    fn syntax(&self) -> SyntaxNode {
        // If we owned `self`, we could use `new_root_with_resolver` instead at this point to attach
        // `self.resolver` to the tree. This simplifies retrieving text and provides automatic
        // implementations for useful traits like `Display`, but also consumes the resolver (it can
        // still be accessed indirectly via the `resolver` method).
        SyntaxNode::new_root(self.green_node.clone())
    }
}
@@ -234,6 +227,7 @@ fn test_parser() {
    let node = parse.syntax();
    let resolver = &parse.resolver;
    assert_eq!(
        // note how, since we didn't attach the resolver in `syntax`, we now need to provide it
        node.debug(resolver, false),
        "ROOT@0..15", // root node, spanning 15 bytes
    );
@@ -259,17 +253,13 @@ fn test_parser() {
}

/// So far, we've been working with a homogeneous untyped tree.
/// That tree is nice to provide generic tree operations, like traversals, but it's a bad fit for
/// semantic analysis. cstree itself does not provide AST facilities directly, but it is possible to
/// layer an AST on top of the `SyntaxNode` API. Let's write a function to evaluate S-expressions.
///
/// For that, let's define AST nodes.
/// It'll be quite a bunch of repetitive code, so we'll use a macro.
///
/// For a real language, you may want to automatically generate the AST implementations with a task.
macro_rules! ast_node {
    ($ast:ident, $kind:ident) => {
        #[derive(PartialEq, Eq, Hash)]
@@ -292,7 +282,7 @@ ast_node!(Root, ROOT);
ast_node!(Atom, ATOM);
ast_node!(List, LIST);

// Sexp is slightly different because it can be both an atom and a list, so let's do it by hand.
#[derive(PartialEq, Eq, Hash)]
#[repr(transparent)]
struct Sexp(SyntaxNode);
@@ -319,8 +309,7 @@ impl Sexp {
    }
}

// Let's enhance AST nodes with ancillary functions and eval.
impl Root {
    fn sexps(&self) -> impl Iterator<Item = Sexp> + '_ {
        self.0.children().cloned().filter_map(Sexp::cast)
@@ -413,9 +402,8 @@ nan
    assert_eq!(res, vec![Some(92), Some(92), None, None, Some(92)])
}

/// Split the input string into a flat list of tokens (such as L_PAREN, WORD, and WHITESPACE)
fn lex(text: &str) -> VecDeque<(SyntaxKind, &str)> {
    fn tok(t: SyntaxKind) -> m_lexer::TokenKind {
        m_lexer::TokenKind(cstree::SyntaxKind::from(t).0)
    }
@@ -445,6 +433,7 @@ fn lex(text: &str) -> Vec<(SyntaxKind, &str)> {
        .into_iter()
        .map(|t| (t.len, kind(t.kind)))
        .scan(0usize, |start_offset, (len, kind)| {
            // reconstruct the item's source text from offset and len
            let s = &text[*start_offset..*start_offset + len];
            *start_offset += len;
            Some((kind, s))
@@ -1,3 +1,7 @@
//! Implementation of the inner, "green" tree.
//! The [`GreenNodeBuilder`] is the main entry point to constructing [`GreenNode`]s and
//! [`GreenToken`]s.

mod builder;
mod element;
mod node;
@@ -18,6 +18,8 @@ use super::{node::GreenNodeHead, token::GreenTokenData};
/// this node into the cache.
const CHILDREN_CACHE_THRESHOLD: usize = 3;

/// A `NodeCache` deduplicates identical tokens and small nodes during tree construction.
/// You can re-use the same cache for multiple similar trees with [`GreenNodeBuilder::with_cache`].
#[derive(Debug)]
pub struct NodeCache<'i, I = Rodeo<Spur, FxBuildHasher>> {
    nodes: FxHashMap<GreenNodeHead, GreenNode>,
@@ -26,6 +28,27 @@ pub struct NodeCache<'i, I = Rodeo<Spur, FxBuildHasher>> {
}

impl NodeCache<'static, Rodeo<Spur, FxBuildHasher>> {
    /// Constructs a new, empty cache.
    ///
    /// By default, this will also create a default interner to deduplicate source text (strings)
    /// across tokens. To re-use an existing interner, see [`with_interner`](NodeCache::with_interner).
    ///
    /// # Examples
    /// ```
    /// # use cstree::*;
    /// # const ROOT: SyntaxKind = SyntaxKind(0);
    /// # const INT: SyntaxKind = SyntaxKind(1);
    /// # fn parse(b: &mut GreenNodeBuilder, s: &str) {}
    /// let mut cache = NodeCache::new();
    /// let mut builder = GreenNodeBuilder::with_cache(&mut cache);
    /// # builder.start_node(ROOT);
    /// # builder.token(INT, "42");
    /// # builder.finish_node();
    /// parse(&mut builder, "42");
    /// let (tree, _) = builder.finish();
    /// assert_eq!(tree.kind(), ROOT);
    /// let int = tree.children().next().unwrap();
    /// assert_eq!(int.kind(), INT);
    /// ```
    pub fn new() -> Self {
        Self {
            nodes: FxHashMap::default(),
@@ -49,6 +72,27 @@ impl<'i, I> NodeCache<'i, I>
where
    I: Interner,
{
    /// Constructs a new, empty cache that will use the given interner to deduplicate source text
    /// (strings) across tokens.
    ///
    /// # Examples
    /// ```
    /// # use cstree::*;
    /// # use lasso::Rodeo;
    /// # const ROOT: SyntaxKind = SyntaxKind(0);
    /// # const INT: SyntaxKind = SyntaxKind(1);
    /// # fn parse(b: &mut GreenNodeBuilder<Rodeo>, s: &str) {}
    /// let mut interner = Rodeo::new();
    /// let mut cache = NodeCache::with_interner(&mut interner);
    /// let mut builder = GreenNodeBuilder::with_cache(&mut cache);
    /// # builder.start_node(ROOT);
    /// # builder.token(INT, "42");
    /// # builder.finish_node();
    /// parse(&mut builder, "42");
    /// let (tree, _) = builder.finish();
    /// assert_eq!(tree.kind(), ROOT);
    /// let int = tree.children().next().unwrap();
    /// assert_eq!(int.kind(), INT);
    /// ```
    pub fn with_interner(interner: &'i mut I) -> Self {
        Self {
            nodes: FxHashMap::default(),
@@ -183,11 +227,32 @@ impl<T: Default> Default for MaybeOwned<'_, T> {
    }
}

/// A checkpoint for maybe wrapping a node. See [`GreenNodeBuilder::checkpoint`] for details.
#[derive(Clone, Copy, Debug)]
pub struct Checkpoint(usize);

/// A builder for green trees.
/// Construct with [`new`](GreenNodeBuilder::new) or [`with_cache`](GreenNodeBuilder::with_cache). To
/// add tree nodes, start them with [`start_node`](GreenNodeBuilder::start_node), add
/// [`token`](GreenNodeBuilder::token)s and then [`finish_node`](GreenNodeBuilder::finish_node). When
/// the whole tree is constructed, call [`finish`](GreenNodeBuilder::finish) to obtain the root.
///
/// # Examples
/// ```
/// # use cstree::*;
/// # const ROOT: SyntaxKind = SyntaxKind(0);
/// # const INT: SyntaxKind = SyntaxKind(1);
/// let mut builder = GreenNodeBuilder::new();
/// builder.start_node(ROOT);
/// builder.token(INT, "42");
/// builder.finish_node();
/// let (tree, interner) = builder.finish();
/// assert_eq!(tree.kind(), ROOT);
/// let int = tree.children().next().unwrap();
/// assert_eq!(int.kind(), INT);
/// let resolver = interner.unwrap().into_resolver();
/// assert_eq!(int.as_token().unwrap().text(&resolver), "42");
/// ```
#[derive(Debug)]
pub struct GreenNodeBuilder<'cache, 'interner, I = Rodeo<Spur, FxBuildHasher>> {
    cache: MaybeOwned<'cache, NodeCache<'interner, I>>,
@@ -196,7 +261,7 @@ pub struct GreenNodeBuilder<'cache, 'interner, I = Rodeo<Spur, FxBuildHasher>> {
}

impl GreenNodeBuilder<'static, 'static, Rodeo<Spur, FxBuildHasher>> {
    /// Creates a new builder with an empty [`NodeCache`].
    pub fn new() -> Self {
        Self {
            cache: MaybeOwned::Owned(NodeCache::new()),
@@ -216,8 +281,8 @@ impl<'cache, 'interner, I> GreenNodeBuilder<'cache, 'interner, I>
where
    I: Interner,
{
    /// Reusing a [`NodeCache`] between multiple builders saves memory, as it allows structural
    /// sharing of the underlying trees.
    pub fn with_cache(cache: &'cache mut NodeCache<'interner, I>) -> Self {
        Self {
            cache: MaybeOwned::Borrowed(cache),
@@ -226,22 +291,21 @@ where
        }
    }

    /// Add a new token to the current branch.
    #[inline]
    pub fn token(&mut self, kind: SyntaxKind, text: &str) {
        let token = self.cache.token(kind, text);
        self.children.push(token.into());
    }

    /// Start a new node of the given `kind` and make it current.
    #[inline]
    pub fn start_node(&mut self, kind: SyntaxKind) {
        let len = self.children.len();
        self.parents.push((kind, len));
    }

    /// Finish the current branch and restore the previous branch as current.
    #[inline]
    pub fn finish_node(&mut self) {
        let (kind, first_child) = self.parents.pop().unwrap();
@@ -250,12 +314,13 @@ where
        self.children.push(node.into());
    }

    /// Prepare for maybe wrapping the next node with a surrounding node.
    ///
    /// The way wrapping works is that you first get a checkpoint, then you add nodes and tokens as
    /// normal, and then you *maybe* call [`start_node_at`](GreenNodeBuilder::start_node_at).
    ///
    /// # Examples
    /// ```
    /// # use cstree::{GreenNodeBuilder, SyntaxKind};
    /// # const PLUS: SyntaxKind = SyntaxKind(0);
    /// # const OPERATION: SyntaxKind = SyntaxKind(1);
@@ -280,8 +345,8 @@ where
        Checkpoint(self.children.len())
    }

    /// Wrap the previous branch marked by [`checkpoint`](GreenNodeBuilder::checkpoint) in a new
    /// branch and make it current.
    #[inline]
    pub fn start_node_at(&mut self, checkpoint: Checkpoint, kind: SyntaxKind) {
        let Checkpoint(checkpoint) = checkpoint;
@@ -300,9 +365,16 @@ where
        self.parents.push((kind, checkpoint));
    }

    /// Complete building the tree.
    ///
    /// Make sure that calls to [`start_node`](GreenNodeBuilder::start_node) /
    /// [`start_node_at`](GreenNodeBuilder::start_node_at) and
    /// [`finish_node`](GreenNodeBuilder::finish_node) are balanced, i.e. that every started node has
    /// been completed!
    ///
    /// If this builder was constructed with [`new`](GreenNodeBuilder::new), this method returns the
    /// interner used to deduplicate source text (strings) as its second return value to allow
    /// resolving tree tokens back to text and re-using the interner to build additional trees.
    #[inline]
    pub fn finish(mut self) -> (GreenNode, Option<I>) {
        assert_eq!(self.children.len(), 1);
@@ -1,6 +1,6 @@
use std::{fmt, hash, mem};

// NOTE from `thin_dst`:
// This MUST be size=1 such that pointer math actually advances the pointer.
type ErasedPtr = *const u8;
@@ -12,7 +12,7 @@ use crate::{
    TextSize,
};

#[repr(align(2))] // to use 1 bit for pointer tagging. NB: this is an at-least annotation
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub(super) struct GreenNodeHead {
    kind: SyntaxKind,
@@ -40,8 +40,8 @@ impl GreenNodeHead {
    }
}

/// Internal node in the immutable "green" tree.
/// It contains other nodes and tokens as its children.
#[derive(Clone)]
pub struct GreenNode {
    pub(super) data: ThinArc<GreenNodeHead, PackedGreenElement>,
@@ -54,7 +54,7 @@ impl std::fmt::Debug for GreenNode {
}

impl GreenNode {
    /// Creates a new Node.
    #[inline]
    pub fn new<I>(kind: SyntaxKind, children: I) -> GreenNode
    where
@@ -103,19 +103,19 @@ impl GreenNode {
        }
    }

    /// [`SyntaxKind`] of this node.
    #[inline]
    pub fn kind(&self) -> SyntaxKind {
        self.data.header.header.kind
    }

    /// Returns the length of text covered by this node.
    #[inline]
    pub fn text_len(&self) -> TextSize {
        self.data.header.header.text_len
    }

    /// Iterator over all children of this node.
    #[inline]
    pub fn children(&self) -> Children<'_> {
        Children {
@@ -139,6 +139,7 @@ impl PartialEq for GreenNode {

impl Eq for GreenNode {}

/// An iterator over a [`GreenNode`]'s children.
#[derive(Debug, Clone)]
pub struct Children<'a> {
    inner: slice::Iter<'a, PackedGreenElement>,
@@ -4,15 +4,15 @@ use std::{fmt, hash, mem::ManuallyDrop, ptr};
use crate::{green::SyntaxKind, interning::Resolver, TextSize};
use lasso::Spur;

#[repr(align(2))] // to use 1 bit for pointer tagging. NB: this is an at-least annotation
#[derive(Debug, PartialEq, Eq, Hash, Copy, Clone)]
pub(super) struct GreenTokenData {
    pub(super) kind: SyntaxKind,
    pub(super) text: Spur,
    pub(super) text_len: TextSize,
}

/// Leaf node in the immutable "green" tree.
pub struct GreenToken {
    ptr: ptr::NonNull<GreenTokenData>,
}
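The `repr(align(2))` annotation guarantees that the low bit of any pointer to this data is always zero, freeing it up as a tag bit. A minimal, self-contained illustration of that low-bit tagging scheme (illustrative only, not `cstree`'s actual code):

```rust
// align(2) guarantees addresses are even, so the low bit is always free.
#[repr(align(2))]
struct Data(u32);

// Set the low bit to mark the pointer.
fn tag(p: *const Data) -> *const Data {
    ((p as usize) | 1) as *const Data
}

// Clear the low bit to recover the real address.
fn untag(p: *const Data) -> *const Data {
    ((p as usize) & !1) as *const Data
}

fn is_tagged(p: *const Data) -> bool {
    (p as usize) & 1 == 1
}

fn main() {
    let d = Data(7);
    let p: *const Data = &d;
    let t = tag(p);
    assert!(is_tagged(t));
    assert!(!is_tagged(p));
    assert_eq!(untag(t), p);
    // the value is still reachable through the untagged pointer
    assert_eq!(unsafe { (*untag(t)).0 }, 7);
}
```

A tagged pointer must always be untagged before dereferencing, which is why accessors route through a helper that strips the tag first.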
@@ -39,9 +39,9 @@ impl GreenToken {
        unsafe { &*Self::remove_tag(self.ptr).as_ptr() }
    }

    /// Creates a new Token.
    #[inline]
    pub(super) fn new(data: GreenTokenData) -> GreenToken {
        let ptr = Arc::into_raw(Arc::new(data));
        let ptr = ptr::NonNull::new(ptr as *mut _).unwrap();
        GreenToken {
@@ -49,13 +49,13 @@ impl GreenToken {
        }
    }

    /// [`SyntaxKind`] of this Token.
    #[inline]
    pub fn kind(&self) -> SyntaxKind {
        self.data().kind
    }

    /// The original source text of this Token.
    #[inline]
    pub fn text<'i, I>(&self, resolver: &'i I) -> &'i str
    where
@@ -64,7 +64,7 @@ impl GreenToken {
        resolver.resolve(&self.data().text)
    }

    /// Returns the length of text covered by this token.
    #[inline]
    pub fn text_len(&self) -> TextSize {
        self.data().text_len

src/lib.rs | 80
@@ -1,14 +1,21 @@
 //! `cstree` is a generic library for creating and working with concrete syntax trees.
-//! The concept of CSTs is inspired in part by Swift's
-//! [libsyntax](https://github.com/apple/swift/tree/5e2c815edfd758f9b1309ce07bfc01c4bc20ec23/lib/Syntax).
-//!
-//! The `cstree` implementation is a fork of the excellent
-//! [`rowan`](https://github.com/rust-analyzer/rowan/), developed by the authors of
-//! [rust-analyzer](https://github.com/rust-analyzer/rust-analyzer/).
-//! While we are building our own documentation, a conceptual overview of their implementation is
-//! available in the [rust-analyzer
-//! repo](https://github.com/rust-analyzer/rust-analyzer/blob/master/docs/dev/syntax.md#trees).
+//! "Traditional" abstract syntax trees (ASTs) usually contain different types of nodes which represent information
+//! about the source text of a document and reduce this information to the minimal amount necessary to correctly
+//! interpret it. In contrast, CSTs are lossless representations of the entire input where all tree nodes are
+//! represented uniformly (i.e. the nodes are _untyped_), but include a [`SyntaxKind`] field to determine the kind of
+//! node.
+//! One of the big advantages of this representation is not only that it can recreate the original source exactly, but
+//! also that it lends itself very well to the representation of _incomplete or erroneous_ trees and is thus very suited
+//! for usage in contexts such as IDEs.
+//!
+//! The concept of and the data structures for CSTs are inspired in part by Swift's [libsyntax](https://github.com/apple/swift/tree/5e2c815edfd758f9b1309ce07bfc01c4bc20ec23/lib/Syntax).
+//! Trees consist of two layers: the inner tree (called _green_ tree) contains the actual source text in _position
+//! independent_ green nodes. Tokens and nodes that appear identically at multiple places in the source text are
+//! _deduplicated_ in this representation in order to store the tree efficiently. This means that the green tree may not
+//! structurally be a tree. To remedy this, the actual syntax tree is constructed on top of the green tree as a
+//! secondary tree (called _red_ tree), which models the exact source structure.
+//!
+//! The `cstree` implementation is a fork of the excellent [`rowan`](https://github.com/rust-analyzer/rowan/), developed by the authors of [rust-analyzer](https://github.com/rust-analyzer/rust-analyzer/) who wrote up a conceptual overview of their implementation in [their repository](https://github.com/rust-analyzer/rust-analyzer/blob/master/docs/dev/syntax.md#trees).
 //! Notable differences of `cstree` compared to `rowan`:
 //! - Syntax trees (red trees) are created lazily, but are persistent. Once a node has been created,
 //!   it will remain allocated, while `rowan` re-creates the red layer on the fly. Apart from the
@@ -24,15 +31,24 @@
 //!   with a small string optimization (see [`SmolStr`](https://crates.io/crates/smol_str)).
 //! - Performance optimizations for tree creation: only allocate new nodes on the heap if they are not in cache, avoid
 //!   recursively hashing subtrees
 //! - Performance optimizations for tree traversal: persisting red nodes allows tree traversal methods to return
 //!   references. You can still `clone` to obtain an owned node, but you only pay that cost when you need to.
 //!
-//! See `examples/s_expressions.rs` for a tutorial.
+//! ## Getting Started
+//! The main entry points for constructing syntax trees are [`GreenNodeBuilder`] and [`SyntaxNode::new_root`] for green
+//! and red trees respectively. See `examples/s_expressions.rs` for a guided tutorial to `cstree`.
+//!
+//! ## AST Layer
+//! While `cstree` is built for concrete syntax trees, applications are quite easily able to work with either a CST or
+//! an AST representation, or freely switch between them. To do so, use `cstree` to build the syntax tree and its
+//! underlying green tree, and provide AST wrappers for your different kinds of nodes. An example of how this is done can be seen [here](https://github.com/rust-analyzer/rust-analyzer/blob/master/crates/syntax/src/ast/generated.rs) and [here](https://github.com/rust-analyzer/rust-analyzer/blob/master/crates/syntax/src/ast/generated/nodes.rs) (note that the latter file is automatically generated by a task).

 #![forbid(
     // missing_debug_implementations,
     unconditional_recursion,
     future_incompatible,
-    // missing_docs,
 )]
-#![deny(unsafe_code)]
+#![deny(unsafe_code, missing_docs)]

 #[allow(unsafe_code)]
 mod green;
@@ -42,8 +58,10 @@ pub mod syntax;
 #[cfg(feature = "serde1")]
 mod serde_impls;
 mod syntax_text;
+#[allow(missing_docs)]
 mod utility_types;

+/// Types and traits for efficient string storage and deduplication.
 pub mod interning {
     pub use lasso::{Interner, Reader, Resolver};
 }
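The `interning` module above only re-exports `lasso`'s traits. As a rough illustration of what those traits provide (a std-only sketch with illustrative names, not `lasso`'s actual implementation), an interner deduplicates equal strings into small integer keys and can resolve a key back to its text:

```rust
use std::collections::HashMap;

// Std-only sketch of string interning: equal strings map to the same key,
// and `resolve` recovers the text. `Interner`, `intern`, and `resolve` here
// are illustrative stand-ins for the re-exported lasso traits.
#[derive(Default)]
struct Interner {
    map:     HashMap<String, u32>,
    strings: Vec<String>,
}

impl Interner {
    fn intern(&mut self, text: &str) -> u32 {
        if let Some(&key) = self.map.get(text) {
            return key; // already stored: hand out the same key (deduplication)
        }
        let key = self.strings.len() as u32;
        self.strings.push(text.to_string());
        self.map.insert(text.to_string(), key);
        key
    }

    fn resolve(&self, key: u32) -> &str {
        &self.strings[key as usize]
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern("fn");
    let b = interner.intern("fn"); // deduplicated: same key as `a`
    assert_eq!(a, b);
    assert_eq!(interner.resolve(a), "fn");
}
```

This deduplication is the same idea the green tree uses for identical tokens: store the text once, refer to it by a cheap key.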
@@ -59,9 +77,47 @@ pub use crate::{
     utility_types::{Direction, NodeOrToken, TokenAtOffset, WalkEvent},
 };

+/// The `Language` trait is the bridge between the internal `cstree` representation and your language
+/// types.
+/// This is essential to providing a [`SyntaxNode`] API that can be used with your types, as in the
+/// `s_expressions` example:
+/// ```
+/// #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
+/// # #[allow(non_camel_case_types)]
+/// #[repr(u16)]
+/// enum SyntaxKind {
+///     ROOT,       // top-level node
+///     ATOM,       // `+`, `15`
+///     WHITESPACE, // whitespace is explicit
+///     #[doc(hidden)]
+///     __LAST,
+/// }
+/// use SyntaxKind::*;
+///
+/// #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
+/// enum Lang {}
+///
+/// impl cstree::Language for Lang {
+///     type Kind = SyntaxKind;
+///
+///     fn kind_from_raw(raw: cstree::SyntaxKind) -> Self::Kind {
+///         assert!(raw.0 <= __LAST as u16);
+///         unsafe { std::mem::transmute::<u16, SyntaxKind>(raw.0) }
+///     }
+///
+///     fn kind_to_raw(kind: Self::Kind) -> cstree::SyntaxKind {
+///         cstree::SyntaxKind(kind as u16)
+///     }
+/// }
+/// ```
 pub trait Language: Sized + Clone + Copy + fmt::Debug + Eq + Ord + std::hash::Hash {
+    /// A type that represents the different kinds of items in your language.
+    /// Typically, this is an `enum` with variants such as `Identifier`, `Literal`, ...
    type Kind: fmt::Debug;

+    /// Construct a semantic item kind from the compact representation.
    fn kind_from_raw(raw: SyntaxKind) -> Self::Kind;

+    /// Convert a semantic item kind into a more compact representation.
    fn kind_to_raw(kind: Self::Kind) -> SyntaxKind;
 }
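The `kind_from_raw`/`kind_to_raw` pair in the example above round-trips between the language's kind enum and its raw `u16`. A standalone sketch of why the bounds-checked `transmute` is sound (std-only, no `cstree` dependency; the raw type is plain `u16` here rather than the `SyntaxKind` newtype):

```rust
// A `#[repr(u16)]` enum converts to its discriminant with `as u16`; the
// reverse direction is a `transmute` guarded by a bounds check.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[allow(non_camel_case_types)]
#[repr(u16)]
enum SyntaxKind {
    ROOT,       // top-level node
    ATOM,       // `+`, `15`
    WHITESPACE, // whitespace is explicit
    __LAST,
}

fn kind_to_raw(kind: SyntaxKind) -> u16 {
    kind as u16
}

fn kind_from_raw(raw: u16) -> SyntaxKind {
    // The assertion makes the `transmute` sound: every value strictly below
    // `__LAST` is a valid discriminant of `SyntaxKind`.
    assert!(raw < SyntaxKind::__LAST as u16);
    unsafe { std::mem::transmute::<u16, SyntaxKind>(raw) }
}

fn main() {
    for kind in [SyntaxKind::ROOT, SyntaxKind::ATOM, SyntaxKind::WHITESPACE] {
        // The raw round-trip is lossless for in-range kinds.
        assert_eq!(kind_from_raw(kind_to_raw(kind)), kind);
    }
}
```

The sentinel `__LAST` variant exists only to give the bounds check an upper limit; it is hidden from documentation and never constructed.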
196	src/syntax.rs
@@ -1,3 +1,10 @@
+//! Implementation of the outer, "red" tree.
+//!
+//! Inner [`SyntaxNode`]s represent only structural information, but can hold additional, user-defined data.
+//! Leaf [`SyntaxToken`]s represent individual pieces of source text.
+//! Use [`SyntaxNode::new_root`] and [`SyntaxNode::new_root_with_resolver`] to construct a syntax
+//! tree on top of a green tree.
+
 use std::{
     cell::UnsafeCell,
     fmt::{self, Write},
@@ -33,6 +40,10 @@ use crate::{
 //
 // - DQ 01/2021

+/// Inner syntax tree node.
+/// Syntax nodes can be shared between threads.
+/// Every syntax tree is reference counted as a whole and nodes are pointer-sized, so copying
+/// individual nodes is relatively cheap.
 #[repr(transparent)]
 pub struct SyntaxNode<L: Language, D: 'static = (), R: 'static = ()> {
     data: *mut NodeData<L, D, R>,
@@ -42,6 +53,7 @@ unsafe impl<L: Language, D: 'static, R: 'static> Send for SyntaxNode<L, D, R> {}
 unsafe impl<L: Language, D: 'static, R: 'static> Sync for SyntaxNode<L, D, R> {}

 impl<L: Language, D, R> SyntaxNode<L, D, R> {
+    #[allow(missing_docs)]
    pub fn debug(&self, resolver: &impl Resolver, recursive: bool) -> String {
        // NOTE: `fmt::Write` methods on `String` never fail
        let mut res = String::new();
@@ -74,6 +86,7 @@ impl<L: Language, D, R> SyntaxNode<L, D, R> {
        res
    }

+    #[allow(missing_docs)]
    pub fn display(&self, resolver: &impl Resolver) -> String {
        let mut res = String::new();
        self.preorder_with_tokens()
@@ -195,6 +208,7 @@ impl<L: Language, D, R> Hash for SyntaxNode<L, D, R> {
    }
 }

+/// Syntax tree token.
 pub struct SyntaxToken<L: Language, D: 'static = (), R: 'static = ()> {
     parent: SyntaxNode<L, D, R>,
     index:  u32,
@@ -228,6 +242,7 @@ impl<L: Language, D, R> PartialEq for SyntaxToken<L, D, R> {
 impl<L: Language, D, R> Eq for SyntaxToken<L, D, R> {}

 impl<L: Language, D, R> SyntaxToken<L, D, R> {
+    #[allow(missing_docs)]
    pub fn debug(&self, resolver: &impl Resolver) -> String {
        let mut res = String::new();
        write!(res, "{:?}@{:?}", self.kind(), self.text_range()).unwrap();
@@ -246,11 +261,13 @@ impl<L: Language, D, R> SyntaxToken<L, D, R> {
        unreachable!()
    }

+    #[allow(missing_docs)]
    pub fn display(&self, resolver: &impl Resolver) -> String {
        self.resolve_text(resolver).to_string()
    }
 }

+/// An element of the tree, which can be either a node or a token.
 pub type SyntaxElement<L, D = (), R = ()> = NodeOrToken<SyntaxNode<L, D, R>, SyntaxToken<L, D, R>>;

 impl<L: Language, D, R> From<SyntaxNode<L, D, R>> for SyntaxElement<L, D, R> {
@@ -266,6 +283,7 @@ impl<L: Language, D, R> From<SyntaxToken<L, D, R>> for SyntaxElement<L, D, R> {
 }

 impl<L: Language, D, R> SyntaxElement<L, D, R> {
+    #[allow(missing_docs)]
    pub fn display(&self, resolver: &impl Resolver) -> String {
        match self {
            NodeOrToken::Node(it) => it.display(resolver),
@@ -274,6 +292,7 @@ impl<L: Language, D, R> SyntaxElement<L, D, R> {
    }
 }

+/// A reference to an element of the tree, which can be either a reference to a node or one to a token.
 pub type SyntaxElementRef<'a, L, D = (), R = ()> = NodeOrToken<&'a SyntaxNode<L, D, R>, &'a SyntaxToken<L, D, R>>;

 impl<'a, L: Language, D, R> From<&'a SyntaxNode<L, D, R>> for SyntaxElementRef<'a, L, D, R> {
@@ -298,6 +317,7 @@ impl<'a, L: Language, D, R> From<&'a SyntaxElement<L, D, R>> for SyntaxElementRe
 }

 impl<'a, L: Language, D, R> SyntaxElementRef<'a, L, D, R> {
+    #[allow(missing_docs)]
    pub fn display(&self, resolver: &impl Resolver) -> String {
        match self {
            NodeOrToken::Node(it) => it.display(resolver),
@@ -356,6 +376,39 @@ impl<L: Language, D, R> NodeData<L, D, R> {
 }

 impl<L: Language, D> SyntaxNode<L, D, ()> {
+    /// Build a new syntax tree on top of a green tree.
+    ///
+    /// # Example
+    /// ```
+    /// # use cstree::*;
+    /// # #[allow(non_camel_case_types)]
+    /// #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
+    /// #[repr(u16)]
+    /// enum SyntaxKind {
+    ///     ROOT,
+    /// }
+    /// #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
+    /// enum Lang {}
+    /// impl cstree::Language for Lang {
+    ///     // ...
+    /// #     type Kind = SyntaxKind;
+    /// #
+    /// #     fn kind_from_raw(raw: cstree::SyntaxKind) -> Self::Kind {
+    /// #         assert!(raw.0 <= SyntaxKind::ROOT as u16);
+    /// #         unsafe { std::mem::transmute::<u16, SyntaxKind>(raw.0) }
+    /// #     }
+    /// #
+    /// #     fn kind_to_raw(kind: Self::Kind) -> cstree::SyntaxKind {
+    /// #         cstree::SyntaxKind(kind as u16)
+    /// #     }
+    /// }
+    /// # let mut builder = GreenNodeBuilder::new();
+    /// # builder.start_node(SyntaxKind(0));
+    /// # builder.finish_node();
+    /// # let (green, _) = builder.finish();
+    /// let root: SyntaxNode<Lang> = SyntaxNode::new_root(green);
+    /// assert_eq!(root.kind(), SyntaxKind::ROOT);
+    /// ```
    pub fn new_root(green: GreenNode) -> Self {
        Self::make_new_root(green, ())
    }
@@ -385,6 +438,45 @@ impl<L: Language, D, R> SyntaxNode<L, D, R> {
        ret
    }

+    /// Build a new syntax tree on top of a green tree and associate a resolver with the tree to
+    /// resolve interned strings.
+    ///
+    /// # Example
+    /// ```
+    /// # use cstree::*;
+    /// # #[allow(non_camel_case_types)]
+    /// #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
+    /// #[repr(u16)]
+    /// enum SyntaxKind {
+    ///     TOKEN,
+    ///     ROOT,
+    /// }
+    /// #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
+    /// enum Lang {}
+    /// impl cstree::Language for Lang {
+    ///     // ...
+    /// #     type Kind = SyntaxKind;
+    /// #
+    /// #     fn kind_from_raw(raw: cstree::SyntaxKind) -> Self::Kind {
+    /// #         assert!(raw.0 <= SyntaxKind::ROOT as u16);
+    /// #         unsafe { std::mem::transmute::<u16, SyntaxKind>(raw.0) }
+    /// #     }
+    /// #
+    /// #     fn kind_to_raw(kind: Self::Kind) -> cstree::SyntaxKind {
+    /// #         cstree::SyntaxKind(kind as u16)
+    /// #     }
+    /// }
+    /// # const ROOT: cstree::SyntaxKind = cstree::SyntaxKind(0);
+    /// # const TOKEN: cstree::SyntaxKind = cstree::SyntaxKind(1);
+    /// # type SyntaxNode<L> = cstree::SyntaxNode<L, (), lasso::Rodeo<lasso::Spur, fxhash::FxBuildHasher>>;
+    /// let mut builder = GreenNodeBuilder::new();
+    /// builder.start_node(ROOT);
+    /// builder.token(TOKEN, "content");
+    /// builder.finish_node();
+    /// let (green, resolver) = builder.finish();
+    /// let root: SyntaxNode<Lang> = SyntaxNode::new_root_with_resolver(green, resolver.unwrap());
+    /// assert_eq!(root.text(), "content");
+    /// ```
    pub fn new_root_with_resolver(green: GreenNode, resolver: R) -> Self
    where
        R: Resolver,
@@ -409,6 +501,8 @@ impl<L: Language, D, R> SyntaxNode<L, D, R> {
        Self::new(data)
    }

+    /// Stores custom data for this node.
+    /// If there was previous data associated with this node, it will be replaced.
    pub fn set_data(&self, data: D) -> Arc<D> {
        let mut ptr = self.data().data.write();
        let data = Arc::new(data);
@@ -416,6 +510,8 @@ impl<L: Language, D, R> SyntaxNode<L, D, R> {
        data
    }

+    /// Stores custom data for this node, but only if no data was previously set.
+    /// If it was, the given data is returned unchanged.
    pub fn try_set_data(&self, data: D) -> Result<Arc<D>, D> {
        let mut ptr = self.data().data.write();
        if ptr.is_some() {
@@ -426,16 +522,19 @@ impl<L: Language, D, R> SyntaxNode<L, D, R> {
        Ok(data)
    }

+    /// Returns the data associated with this node, if any.
    pub fn get_data(&self) -> Option<Arc<D>> {
        let ptr = self.data().data.read();
        (*ptr).as_ref().map(|ptr| Arc::clone(ptr))
    }

+    /// Removes the data associated with this node.
    pub fn clear_data(&self) {
        let mut ptr = self.data().data.write();
        *ptr = None;
    }

+    /// If there is a resolver associated with this tree, returns it.
    pub fn resolver(&self) -> &Arc<R> {
        match &self.root().data().kind {
            Kind::Root(_, resolver) => resolver,
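The data methods above follow replace/fill-if-empty semantics. A std-only sketch of the same behavior, detached from the tree (the `DataSlot` type is illustrative, not `cstree`'s internals, and `std::sync::RwLock` stands in for the lock actually used):

```rust
use std::sync::{Arc, RwLock};

// The node's data slot holds an `Option<Arc<D>>` behind a lock: `set_data`
// always replaces, `try_set_data` only fills an empty slot and otherwise
// hands the value back, `get_data` clones the `Arc`, `clear_data` empties it.
struct DataSlot<D> {
    data: RwLock<Option<Arc<D>>>,
}

impl<D> DataSlot<D> {
    fn new() -> Self {
        DataSlot { data: RwLock::new(None) }
    }

    fn set_data(&self, data: D) -> Arc<D> {
        let data = Arc::new(data);
        *self.data.write().unwrap() = Some(Arc::clone(&data));
        data
    }

    fn try_set_data(&self, data: D) -> Result<Arc<D>, D> {
        let mut slot = self.data.write().unwrap();
        if slot.is_some() {
            return Err(data); // previous data stays; caller gets the value back
        }
        let data = Arc::new(data);
        *slot = Some(Arc::clone(&data));
        Ok(data)
    }

    fn get_data(&self) -> Option<Arc<D>> {
        self.data.read().unwrap().as_ref().map(Arc::clone)
    }

    fn clear_data(&self) {
        *self.data.write().unwrap() = None;
    }
}

fn main() {
    let slot = DataSlot::new();
    assert!(slot.get_data().is_none());
    slot.set_data(1);
    // the slot is occupied, so `try_set_data` refuses and returns the value
    assert_eq!(slot.try_set_data(2), Err(2));
    assert_eq!(*slot.get_data().unwrap(), 1);
    slot.clear_data();
    assert!(slot.get_data().is_none());
}
```

Handing out `Arc<D>` rather than `&D` lets callers keep the data alive independently of the node's lock.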
@@ -548,16 +647,19 @@ impl<L: Language, D, R> SyntaxNode<L, D, R> {
        }
    }

+    /// The internal representation of the kind of this node.
    #[inline]
    pub fn syntax_kind(&self) -> SyntaxKind {
        self.green().kind()
    }

+    /// The kind of this node in terms of your language.
    #[inline]
    pub fn kind(&self) -> L::Kind {
        L::kind_from_raw(self.syntax_kind())
    }

+    /// The range this node covers in the source text, in bytes.
    #[inline]
    pub fn text_range(&self) -> TextRange {
        let offset = match self.data().kind.as_child() {
@@ -567,6 +669,9 @@ impl<L: Language, D, R> SyntaxNode<L, D, R> {
        TextRange::at(offset, self.green().text_len())
    }

+    /// Uses the provided resolver to return an efficient representation of all source text covered
+    /// by this node, i.e. the combined text of all token leaves of the subtree originating in this
+    /// node.
    #[inline]
    pub fn resolve_text<'n, 'i, I>(&'n self, resolver: &'i I) -> SyntaxText<'n, 'i, I, L, D, R>
    where
@@ -575,11 +680,13 @@ impl<L: Language, D, R> SyntaxNode<L, D, R> {
        SyntaxText::new(self, resolver)
    }

+    /// Returns the underlying green tree node of this node.
    #[inline]
    pub fn green(&self) -> &GreenNode {
        unsafe { self.data().green.as_ref() }
    }

+    /// The parent node of this node, unless this node is the root.
    #[inline]
    pub fn parent(&self) -> Option<&SyntaxNode<L, D, R>> {
        match &self.data().kind {
@@ -588,21 +695,29 @@ impl<L: Language, D, R> SyntaxNode<L, D, R> {
        }
    }

+    /// Returns an iterator along the chain of parents of this node.
    #[inline]
    pub fn ancestors(&self) -> impl Iterator<Item = &SyntaxNode<L, D, R>> {
        iter::successors(Some(self), |&node| node.parent())
    }

+    /// Returns an iterator over all nodes that are children of this node.
+    ///
+    /// If you want to also consider leaves, see [`children_with_tokens`](SyntaxNode::children_with_tokens).
    #[inline]
    pub fn children(&self) -> SyntaxNodeChildren<'_, L, D, R> {
        SyntaxNodeChildren::new(self)
    }

+    /// Returns an iterator over child elements of this node, including tokens.
    #[inline]
    pub fn children_with_tokens(&self) -> SyntaxElementChildren<'_, L, D, R> {
        SyntaxElementChildren::new(self)
    }

+    /// The first child node of this node, if any.
+    ///
+    /// If you want to also consider leaves, see [`first_child_or_token`](SyntaxNode::first_child_or_token).
    #[inline]
    #[allow(clippy::map_clone)]
    pub fn first_child(&self) -> Option<&SyntaxNode<L, D, R>> {
@@ -610,12 +725,16 @@ impl<L: Language, D, R> SyntaxNode<L, D, R> {
        self.get_or_add_node(node, index, offset).as_node().map(|node| *node)
    }

+    /// The first child element of this node, if any, including tokens.
    #[inline]
    pub fn first_child_or_token(&self) -> Option<SyntaxElementRef<'_, L, D, R>> {
        let (element, (index, offset)) = self.green().children_from(0, self.text_range().start()).next()?;
        Some(self.get_or_add_element(element, index, offset))
    }

+    /// The last child node of this node, if any.
+    ///
+    /// If you want to also consider leaves, see [`last_child_or_token`](SyntaxNode::last_child_or_token).
    #[inline]
    #[allow(clippy::map_clone)]
    pub fn last_child(&self) -> Option<&SyntaxNode<L, D, R>> {
@@ -627,6 +746,7 @@ impl<L: Language, D, R> SyntaxNode<L, D, R> {
        self.get_or_add_node(node, index, offset).as_node().map(|node| *node)
    }

+    /// The last child element of this node, if any, including tokens.
    #[inline]
    pub fn last_child_or_token(&self) -> Option<SyntaxElementRef<'_, L, D, R>> {
        let (element, (index, offset)) = self
@@ -636,30 +756,47 @@ impl<L: Language, D, R> SyntaxNode<L, D, R> {
        Some(self.get_or_add_element(element, index, offset))
    }
+    /// The first child node of this node starting at the (n + 1)-st, if any.
+    /// Note that even if this method returns `Some`, the contained node may not actually be the (n +
+    /// 1)-st child, but the next child from there that is a node.
+    ///
+    /// If you want to also consider leaves, see [`next_child_or_token_after`](SyntaxNode::next_child_or_token_after).
    #[inline]
    pub fn next_child_after(&self, n: usize, offset: TextSize) -> Option<&SyntaxNode<L, D, R>> {
        let (node, (index, offset)) = filter_nodes(self.green().children_from(n + 1, offset)).next()?;
        self.get_or_add_node(node, index, offset).as_node().copied()
    }

+    /// The first child element of this node starting at the (n + 1)-st, if any.
+    /// If this method returns `Some`, the contained element is the (n + 1)-st child of this node.
    #[inline]
    pub fn next_child_or_token_after(&self, n: usize, offset: TextSize) -> Option<SyntaxElementRef<'_, L, D, R>> {
        let (element, (index, offset)) = self.green().children_from(n + 1, offset).next()?;
        Some(self.get_or_add_element(element, index, offset))
    }

+    /// The last child node of this node up to the nth, if any.
+    /// Note that even if this method returns `Some`, the contained node may not actually be the (n -
+    /// 1)-st child, but the previous child from there that is a node.
+    ///
+    /// If you want to also consider leaves, see [`prev_child_or_token_before`](SyntaxNode::prev_child_or_token_before).
    #[inline]
    pub fn prev_child_before(&self, n: usize, offset: TextSize) -> Option<&SyntaxNode<L, D, R>> {
        let (node, (index, offset)) = filter_nodes(self.green().children_to(n, offset)).next()?;
        self.get_or_add_node(node, index, offset).as_node().copied()
    }

+    /// The last child element of this node up to the nth, if any.
+    /// If this method returns `Some`, the contained element is the (n - 1)-st child.
    #[inline]
    pub fn prev_child_or_token_before(&self, n: usize, offset: TextSize) -> Option<SyntaxElementRef<'_, L, D, R>> {
        let (element, (index, offset)) = self.green().children_to(n, offset).next()?;
        Some(self.get_or_add_element(element, index, offset))
    }

+    /// The node to the right of this one, i.e. the next child node (!) of this node's parent after this node.
+    ///
+    /// If you want to also consider leaves, see [`next_sibling_or_token`](SyntaxNode::next_sibling_or_token).
    #[inline]
    pub fn next_sibling(&self) -> Option<&SyntaxNode<L, D, R>> {
        let (parent, index, _) = self.data().kind.as_child()?;
@@ -673,6 +810,7 @@ impl<L: Language, D, R> SyntaxNode<L, D, R> {
        parent.get_or_add_node(node, index, offset).as_node().copied()
    }

+    /// The tree element to the right of this one, i.e. the next child of this node's parent after this node.
    #[inline]
    pub fn next_sibling_or_token(&self) -> Option<SyntaxElementRef<'_, L, D, R>> {
        let (parent, index, _) = self.data().kind.as_child()?;
@@ -684,6 +822,9 @@ impl<L: Language, D, R> SyntaxNode<L, D, R> {
        Some(parent.get_or_add_element(element, index, offset))
    }

+    /// The node to the left of this one, i.e. the previous child node (!) of this node's parent before this node.
+    ///
+    /// If you want to also consider leaves, see [`prev_sibling_or_token`](SyntaxNode::prev_sibling_or_token).
    #[inline]
    pub fn prev_sibling(&self) -> Option<&SyntaxNode<L, D, R>> {
        let (parent, index, _) = self.data().kind.as_child()?;
@@ -693,6 +834,7 @@ impl<L: Language, D, R> SyntaxNode<L, D, R> {
        parent.get_or_add_node(node, index, offset).as_node().copied()
    }

+    /// The tree element to the left of this one, i.e. the previous child of this node's parent before this node.
    #[inline]
    pub fn prev_sibling_or_token(&self) -> Option<SyntaxElementRef<'_, L, D, R>> {
        let (parent, index, _) = self.data().kind.as_child()?;
@@ -716,6 +858,11 @@ impl<L: Language, D, R> SyntaxNode<L, D, R> {
        self.last_child_or_token()?.last_token()
    }

+    /// Returns an iterator over all sibling nodes of this node in the given `direction`, i.e. all of
+    /// this node's parent's child nodes (!) from this node on to the left or the right. The first
+    /// item in the iterator will always be this node.
+    ///
+    /// If you want to also consider leaves, see [`siblings_with_tokens`](SyntaxNode::siblings_with_tokens).
    #[inline]
    pub fn siblings(&self, direction: Direction) -> impl Iterator<Item = &SyntaxNode<L, D, R>> {
        iter::successors(Some(self), move |node| match direction {
@@ -724,6 +871,9 @@ impl<L: Language, D, R> SyntaxNode<L, D, R> {
        })
    }

+    /// Returns an iterator over all siblings of this node in the given `direction`, i.e. all of this
+    /// node's parent's children from this node on to the left or the right.
+    /// The first item in the iterator will always be this node.
    #[inline]
    pub fn siblings_with_tokens(&self, direction: Direction) -> impl Iterator<Item = SyntaxElementRef<'_, L, D, R>> {
        let me: SyntaxElementRef<'_, L, D, R> = self.into();
@@ -733,6 +883,9 @@ impl<L: Language, D, R> SyntaxNode<L, D, R> {
        })
    }

+    /// Returns an iterator over all nodes (!) in the subtree starting at this node, including this node.
+    ///
+    /// If you want to also consider leaves, see [`descendants_with_tokens`](SyntaxNode::descendants_with_tokens).
    #[inline]
    pub fn descendants(&self) -> impl Iterator<Item = &SyntaxNode<L, D, R>> {
        self.preorder().filter_map(|event| match event {
@@ -741,6 +894,7 @@ impl<L: Language, D, R> SyntaxNode<L, D, R> {
        })
    }

+    /// Returns an iterator over all elements in the subtree starting at this node, including this node.
    #[inline]
    pub fn descendants_with_tokens(&self) -> impl Iterator<Item = SyntaxElementRef<'_, L, D, R>> {
        self.preorder_with_tokens().filter_map(|event| match event {
@@ -870,6 +1024,9 @@ impl<L: Language, D, R> SyntaxNode<L, D, R>
 where
     R: Resolver,
 {
+    /// Uses the resolver associated with this tree to return an efficient representation of all
+    /// source text covered by this node, i.e. the combined text of all token leaves of the subtree
+    /// originating in this node.
    #[inline]
    pub fn text(&self) -> SyntaxText<'_, '_, R, L, D, R> {
        SyntaxText::new(self, self.resolver().as_ref())
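The `descendants` methods documented above filter the enter events of a preorder walk. A std-only sketch of that pattern, with simplified stand-in types (`Node` and the recursive `preorder` here are illustrative; `cstree`'s walk is iterative and its `WalkEvent` carries tree elements):

```rust
// Walk a tree in preorder, emitting enter/leave events; "descendants" is
// then just the sequence of entered nodes.
#[derive(Debug, PartialEq)]
enum WalkEvent<T> {
    Enter(T),
    Leave(T),
}

struct Node {
    name:     &'static str,
    children: Vec<Node>,
}

fn preorder<'a>(node: &'a Node, events: &mut Vec<WalkEvent<&'a str>>) {
    events.push(WalkEvent::Enter(node.name));
    for child in &node.children {
        preorder(child, events);
    }
    events.push(WalkEvent::Leave(node.name));
}

fn main() {
    let tree = Node {
        name: "root",
        children: vec![
            Node { name: "a", children: vec![] },
            Node { name: "b", children: vec![] },
        ],
    };
    let mut events = Vec::new();
    preorder(&tree, &mut events);
    // `descendants` corresponds to keeping only the `Enter` events of the walk.
    let descendants: Vec<&str> = events
        .iter()
        .filter_map(|e| match e {
            WalkEvent::Enter(name) => Some(*name),
            WalkEvent::Leave(_) => None,
        })
        .collect();
    assert_eq!(descendants, ["root", "a", "b"]);
}
```

Keeping the leave events around as well is what lets callers track nesting depth during the same traversal.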
@@ -963,21 +1120,25 @@ impl<L: Language, D, R> SyntaxToken<L, D, R> {
        parent.replace_with(new_parent)
    }

+    /// The internal representation of the kind of this token.
    #[inline]
    pub fn syntax_kind(&self) -> SyntaxKind {
        self.green().kind()
    }

+    /// The kind of this token in terms of your language.
    #[inline]
    pub fn kind(&self) -> L::Kind {
        L::kind_from_raw(self.syntax_kind())
    }

+    /// The range this token covers in the source text, in bytes.
    #[inline]
    pub fn text_range(&self) -> TextRange {
        TextRange::at(self.offset, self.green().text_len())
    }

+    /// Uses the provided resolver to return the source text of this token.
    #[inline]
    pub fn resolve_text<'i, I>(&self, resolver: &'i I) -> &'i str
    where
@@ -986,6 +1147,7 @@ impl<L: Language, D, R> SyntaxToken<L, D, R> {
        self.green().text(resolver)
    }

+    /// Returns the underlying green tree token of this token.
    pub fn green(&self) -> &GreenToken {
        self.parent
            .green()
@@ -996,28 +1158,35 @@ impl<L: Language, D, R> SyntaxToken<L, D, R> {
            .unwrap()
    }

+    /// The parent node of this token.
    #[inline]
    pub fn parent(&self) -> &SyntaxNode<L, D, R> {
        &self.parent
    }

+    /// Returns an iterator along the chain of parents of this token.
    #[inline]
    pub fn ancestors(&self) -> impl Iterator<Item = &SyntaxNode<L, D, R>> {
        self.parent().ancestors()
    }

+    /// The tree element to the right of this one, i.e. the next child of this token's parent after this token.
    #[inline]
    pub fn next_sibling_or_token(&self) -> Option<SyntaxElementRef<'_, L, D, R>> {
        self.parent()
            .next_child_or_token_after(self.index as usize, self.text_range().end())
    }

+    /// The tree element to the left of this one, i.e. the previous child of this token's parent before this token.
    #[inline]
    pub fn prev_sibling_or_token(&self) -> Option<SyntaxElementRef<'_, L, D, R>> {
        self.parent()
            .prev_child_or_token_before(self.index as usize, self.text_range().start())
    }

+    /// Returns an iterator over all siblings of this token in the given `direction`, i.e. all of this
+    /// token's parent's children from this token on to the left or the right.
+    /// The first item in the iterator will always be this token.
    #[inline]
    pub fn siblings_with_tokens(&self, direction: Direction) -> impl Iterator<Item = SyntaxElementRef<'_, L, D, R>> {
        let me: SyntaxElementRef<'_, L, D, R> = self.into();
@@ -1027,7 +1196,8 @@ impl<L: Language, D, R> SyntaxToken<L, D, R> {
        })
    }

-    /// Next token in the tree (i.e, not necessary a sibling)
+    /// Returns the next token in the tree.
+    /// This is not necessarily a direct sibling of this token, but will always be further right in the tree.
    pub fn next_token(&self) -> Option<&SyntaxToken<L, D, R>> {
        match self.next_sibling_or_token() {
            Some(element) => element.first_token(),
@@ -1039,7 +1209,8 @@ impl<L: Language, D, R> SyntaxToken<L, D, R> {
        }
    }

-    /// Previous token in the tree (i.e, not necessary a sibling)
+    /// Returns the previous token in the tree.
+    /// This is not necessarily a direct sibling of this token, but will always be further left in the tree.
    pub fn prev_token(&self) -> Option<&SyntaxToken<L, D, R>> {
        match self.prev_sibling_or_token() {
            Some(element) => element.last_token(),
@@ -1056,6 +1227,7 @@ impl<L: Language, D, R> SyntaxToken<L, D, R>
 where
     R: Resolver,
 {
+    /// Uses the resolver associated with this tree to return the source text of this token.
    #[inline]
    pub fn text(&self) -> &str {
        self.green().text(self.parent().resolver().as_ref())
@@ -1094,6 +1266,7 @@ impl<L: Language, D, R> SyntaxElement<L, D, R> {
        }
    }

+    /// The range this element covers in the source text, in bytes.
    #[inline]
    pub fn text_range(&self) -> TextRange {
        match self {
@@ -1102,6 +1275,7 @@ impl<L: Language, D, R> SyntaxElement<L, D, R> {
        }
    }

+    /// The internal representation of the kind of this element.
    #[inline]
    pub fn syntax_kind(&self) -> SyntaxKind {
        match self {
@@ -1110,6 +1284,7 @@ impl<L: Language, D, R> SyntaxElement<L, D, R> {
        }
    }

+    /// The kind of this element in terms of your language.
    #[inline]
    pub fn kind(&self) -> L::Kind {
        match self {
@@ -1118,6 +1293,7 @@ impl<L: Language, D, R> SyntaxElement<L, D, R> {
        }
    }

+    /// The parent node of this element, unless this element is the root.
    #[inline]
    pub fn parent(&self) -> Option<&SyntaxNode<L, D, R>> {
        match self {
@@ -1126,6 +1302,7 @@ impl<L: Language, D, R> SyntaxElement<L, D, R> {
        }
    }

+    /// Returns an iterator along the chain of parents of this node.
    #[inline]
    pub fn ancestors(&self) -> impl Iterator<Item = &SyntaxNode<L, D, R>> {
        match self {
@@ -1134,6 +1311,7 @@ impl<L: Language, D, R> SyntaxElement<L, D, R> {
        }
    }

+    /// Return the leftmost token in the subtree of this element.
    #[inline]
    pub fn first_token(&self) -> Option<&SyntaxToken<L, D, R>> {
        match self {
@@ -1142,6 +1320,7 @@ impl<L: Language, D, R> SyntaxElement<L, D, R> {
        }
    }

+    /// Return the rightmost token in the subtree of this element.
    #[inline]
    pub fn last_token(&self) -> Option<&SyntaxToken<L, D, R>> {
        match self {
@@ -1150,6 +1329,7 @@ impl<L: Language, D, R> SyntaxElement<L, D, R> {
        }
    }

+    /// The tree element to the right of this one, i.e. the next child of this element's parent after this element.
    #[inline]
    pub fn next_sibling_or_token(&self) -> Option<SyntaxElementRef<'_, L, D, R>> {
        match self {
@@ -1158,6 +1338,7 @@ impl<L: Language, D, R> SyntaxElement<L, D, R> {
        }
    }

+    /// The tree element to the left of this one, i.e. the previous child of this element's parent before this element.
    #[inline]
    pub fn prev_sibling_or_token(&self) -> Option<SyntaxElementRef<'_, L, D, R>> {
        match self {
@@ -1168,6 +1349,7 @@ impl<L: Language, D, R> SyntaxElement<L, D, R> {
 }

 impl<'a, L: Language, D, R> SyntaxElementRef<'a, L, D, R> {
+    /// The range this element covers in the source text, in bytes.
    #[inline]
    pub fn text_range(&self) -> TextRange {
        match self {
@@ -1176,6 +1358,7 @@ impl<'a, L: Language, D, R> SyntaxElementRef<'a, L, D, R> {
        }
    }

+    /// The internal representation of the kind of this element.
    #[inline]
    pub fn syntax_kind(&self) -> SyntaxKind {
        match self {
@@ -1184,6 +1367,7 @@ impl<'a, L: Language, D, R> SyntaxElementRef<'a, L, D, R> {
        }
    }

+    /// The kind of this element in terms of your language.
    #[inline]
    pub fn kind(&self) -> L::Kind {
        match self {
@@ -1192,6 +1376,7 @@ impl<'a, L: Language, D, R> SyntaxElementRef<'a, L, D, R> {
        }
    }

+    /// The parent node of this element, unless this element is the root.
    #[inline]
    pub fn parent(&self) -> Option<&'a SyntaxNode<L, D, R>> {
        match self {
@@ -1200,6 +1385,7 @@ impl<'a, L: Language, D, R> SyntaxElementRef<'a, L, D, R> {
        }
    }

+    /// Returns an iterator along the chain of parents of this node.
    #[inline]
    pub fn ancestors(&self) -> impl Iterator<Item = &'a SyntaxNode<L, D, R>> {
        match self {
@@ -1208,6 +1394,7 @@ impl<'a, L: Language, D, R> SyntaxElementRef<'a, L, D, R> {
        }
    }

+    /// Return the leftmost token in the subtree of this element.
    #[inline]
|
||||
pub fn first_token(&self) -> Option<&'a SyntaxToken<L, D, R>> {
|
||||
match self {
|
||||
|
@ -1216,6 +1403,7 @@ impl<'a, L: Language, D, R> SyntaxElementRef<'a, L, D, R> {
|
|||
}
|
||||
}
|
||||
|
||||
/// Return the rightmost token in the subtree of this element.
|
||||
#[inline]
|
||||
pub fn last_token(&self) -> Option<&'a SyntaxToken<L, D, R>> {
|
||||
match self {
|
||||
|
@ -1224,6 +1412,7 @@ impl<'a, L: Language, D, R> SyntaxElementRef<'a, L, D, R> {
|
|||
}
|
||||
}
|
||||
|
||||
/// The tree element to the right of this one, i.e. the next child of this element's parent after this element.
|
||||
#[inline]
|
||||
pub fn next_sibling_or_token(&self) -> Option<SyntaxElementRef<'a, L, D, R>> {
|
||||
match self {
|
||||
|
@ -1232,6 +1421,7 @@ impl<'a, L: Language, D, R> SyntaxElementRef<'a, L, D, R> {
|
|||
}
|
||||
}
|
||||
|
||||
/// The tree element to the left of this one, i.e. the previous child of this element's parent after this element.
|
||||
#[inline]
|
||||
pub fn prev_sibling_or_token(&self) -> Option<SyntaxElementRef<'a, L, D, R>> {
|
||||
match self {
|
||||
|
@ -1280,6 +1470,7 @@ impl<'n> Iter<'n> {
|
|||
}
|
||||
}
|
||||
|
||||
/// An iterator over the child nodes of a [`SyntaxNode`].
|
||||
#[derive(Clone)]
|
||||
pub struct SyntaxNodeChildren<'n, L: Language, D: 'static = (), R: 'static = ()> {
|
||||
inner: Iter<'n>,
|
||||
|
@ -1310,6 +1501,7 @@ impl<'n, L: Language, D, R> Iterator for SyntaxNodeChildren<'n, L, D, R> {
|
|||
}
|
||||
}
|
||||
|
||||
/// An iterator over the children of a [`SyntaxNode`].
|
||||
#[derive(Clone)]
|
||||
pub struct SyntaxElementChildren<'n, L: Language, D: 'static = (), R: 'static = ()> {
|
||||
inner: Iter<'n>,
|
||||
|
|
|
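The accessors above repeatedly follow `parent` links, and `ancestors` packages that chain up as an iterator. A minimal, self-contained sketch of that idea using `std::iter::successors` (this is not cstree's actual implementation; the `Node` type here is a hypothetical stand-in for a syntax node):

```rust
// Sketch: an `ancestors`-style iterator built from repeated `parent` lookups.
// The hypothetical `Node` only carries a name and an optional parent link.

#[derive(Debug)]
struct Node {
    name:   &'static str,
    parent: Option<&'static Node>,
}

// Start from the node itself and keep following `parent` links until `None`,
// mirroring the shape of `SyntaxElement::ancestors`.
fn ancestors(node: &'static Node) -> impl Iterator<Item = &'static Node> {
    std::iter::successors(Some(node), |n| n.parent)
}

static ROOT: Node = Node { name: "root", parent: None };
static CHILD: Node = Node { name: "child", parent: Some(&ROOT) };
static LEAF: Node = Node { name: "leaf", parent: Some(&CHILD) };

fn ancestor_names(node: &'static Node) -> Vec<&'static str> {
    ancestors(node).map(|n| n.name).collect()
}
```

Note that, as in cstree, the chain starts at the element itself rather than at its parent.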
@@ -1,7 +1,61 @@
//! Efficient representation of the source text that is covered by a [`SyntaxNode`].

use std::fmt;

use crate::{interning::Resolver, Language, SyntaxNode, SyntaxToken, TextRange, TextSize};

/// An efficient representation of the text that is covered by a [`SyntaxNode`], i.e. the combined
/// source text of all tokens that are descendants of the node.
///
/// Offers methods to work with the text distributed across multiple [`SyntaxToken`]s while avoiding
/// the construction of intermediate strings where possible.
/// This includes efficient comparisons with itself and with strings, and conversion via `to_string()`.
///
/// # Example
/// ```
/// # use cstree::*;
/// # #[allow(non_camel_case_types)]
/// # #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
/// # #[repr(u16)]
/// # enum SyntaxKind {
/// #     TOKEN,
/// #     ROOT,
/// # }
/// # #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
/// # enum Lang {}
/// # impl cstree::Language for Lang {
/// #     type Kind = SyntaxKind;
/// #
/// #     fn kind_from_raw(raw: cstree::SyntaxKind) -> Self::Kind {
/// #         assert!(raw.0 <= SyntaxKind::ROOT as u16);
/// #         unsafe { std::mem::transmute::<u16, SyntaxKind>(raw.0) }
/// #     }
/// #
/// #     fn kind_to_raw(kind: Self::Kind) -> cstree::SyntaxKind {
/// #         cstree::SyntaxKind(kind as u16)
/// #     }
/// # }
/// # type SyntaxNode = cstree::SyntaxNode<Lang, (), lasso::RodeoResolver<lasso::Spur>>;
/// #
/// # fn parse_float_literal(s: &str) -> SyntaxNode {
/// #     const LITERAL: cstree::SyntaxKind = cstree::SyntaxKind(0);
/// #     let mut builder = GreenNodeBuilder::new();
/// #     builder.start_node(LITERAL);
/// #     builder.token(LITERAL, s);
/// #     builder.finish_node();
/// #     let (root, interner) = builder.finish();
/// #     let resolver = interner.unwrap().into_resolver();
/// #     SyntaxNode::new_root_with_resolver(root, resolver)
/// # }
/// let node = parse_float_literal("2.748E2");
/// let text = node.text();
/// assert_eq!(text.len(), 7.into());
/// assert!(text.contains_char('E'));
/// assert_eq!(text.find_char('E'), Some(5.into()));
/// assert_eq!(text.char_at(1.into()), Some('.'));
/// let sub = text.slice(2.into()..5.into());
/// assert_eq!(sub, "748");
/// ```
#[derive(Clone)]
pub struct SyntaxText<'n, 'i, I: ?Sized, L: Language, D: 'static = (), R: 'static = ()> {
    node: &'n SyntaxNode<L, D, R>,
@@ -15,19 +69,24 @@ impl<'n, 'i, I: Resolver + ?Sized, L: Language, D, R> SyntaxText<'n, 'i, I, L, D
        SyntaxText { node, range, resolver }
    }

    /// The combined length of this text, in bytes.
    pub fn len(&self) -> TextSize {
        self.range.len()
    }

    /// Returns `true` if [`self.len()`](SyntaxText::len) is zero.
    pub fn is_empty(&self) -> bool {
        self.range.is_empty()
    }

    /// Returns `true` if `c` appears anywhere in this text.
    pub fn contains_char(&self, c: char) -> bool {
        self.try_for_each_chunk(|chunk| if chunk.contains(c) { Err(()) } else { Ok(()) })
            .is_err()
    }

    /// If `self.contains_char(c)`, returns `Some(pos)`, where `pos` is the byte position of the
    /// first appearance of `c`. Otherwise, returns `None`.
    pub fn find_char(&self, c: char) -> Option<TextSize> {
        let mut acc: TextSize = 0.into();
        let res = self.try_for_each_chunk(|chunk| {
@@ -41,6 +100,8 @@ impl<'n, 'i, I: Resolver + ?Sized, L: Language, D, R> SyntaxText<'n, 'i, I, L, D
        found(res)
    }

    /// If `offset < self.len()`, returns `Some(c)`, where `c` is the first `char` at or after
    /// `offset` (in bytes). Otherwise, returns `None`.
    pub fn char_at(&self, offset: TextSize) -> Option<char> {
        let mut start: TextSize = 0.into();
        let res = self.try_for_each_chunk(|chunk| {
@@ -55,6 +116,12 @@ impl<'n, 'i, I: Resolver + ?Sized, L: Language, D, R> SyntaxText<'n, 'i, I, L, D
        found(res)
    }

    /// Indexes this text by the given `range` and returns a `SyntaxText` that represents the
    /// corresponding slice of this text.
    ///
    /// # Panics
    /// The end of `range` must be equal to or higher than its start.
    /// Further, `range` must be contained within `0..self.len()`.
    pub fn slice<Ra: private::SyntaxTextRange>(&self, range: Ra) -> Self {
        let start = range.start().unwrap_or_default();
        let end = range.end().unwrap_or_else(|| self.len());
@@ -82,6 +149,12 @@ impl<'n, 'i, I: Resolver + ?Sized, L: Language, D, R> SyntaxText<'n, 'i, I, L, D
        }
    }

    /// Applies the given function to text chunks (from [`SyntaxToken`]s) that are part of this text
    /// as long as it returns `Ok`, starting from the initial value `init`.
    ///
    /// If `f` returns `Err`, the error is propagated immediately.
    /// Otherwise, the result of the current call to `f` will be passed to the invocation of `f` on
    /// the next chunk, producing a final value if `f` succeeds on all chunks.
    pub fn try_fold_chunks<T, F, E>(&self, init: T, mut f: F) -> Result<T, E>
    where
        F: FnMut(T, &str) -> Result<T, E>,
@@ -91,10 +164,21 @@ impl<'n, 'i, I: Resolver + ?Sized, L: Language, D, R> SyntaxText<'n, 'i, I, L, D
        })
    }

    /// Applies the given function to all text chunks that this text is comprised of, in order,
    /// as long as `f` completes successfully.
    ///
    /// If `f` returns `Err`, this method returns immediately and will not apply `f` to any further
    /// chunks.
    ///
    /// See also [`try_fold_chunks`](SyntaxText::try_fold_chunks).
    pub fn try_for_each_chunk<F: FnMut(&str) -> Result<(), E>, E>(&self, mut f: F) -> Result<(), E> {
        self.try_fold_chunks((), move |(), chunk| f(chunk))
    }

    /// Applies the given function to all text chunks that this text is comprised of, in order.
    ///
    /// See also [`try_fold_chunks`](SyntaxText::try_fold_chunks),
    /// [`try_for_each_chunk`](SyntaxText::try_for_each_chunk).
    pub fn for_each_chunk<F: FnMut(&str)>(&self, mut f: F) {
        enum Void {}
        match self.try_for_each_chunk(|chunk| {
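As the doc comments above describe, `contains_char` and `find_char` are built on the fallible chunk traversal: returning `Err` short-circuits the walk as soon as the answer is known. A self-contained sketch of that pattern over a plain `&[&str]` (the helper functions here are hypothetical stand-ins, not cstree's API, which works on token text and `TextSize`):

```rust
// Sketch: a `try_fold_chunks`-style fold that threads an accumulator through
// each chunk and stops at the first `Err`.
fn try_fold_chunks<T, E>(
    chunks: &[&str],
    init: T,
    mut f: impl FnMut(T, &str) -> Result<T, E>,
) -> Result<T, E> {
    let mut acc = init;
    for chunk in chunks {
        acc = f(acc, chunk)?; // propagate the first error immediately
    }
    Ok(acc)
}

// `contains_char` inverts the result: finding the char is the "error" that
// short-circuits, so an `Err` outcome means the character was present.
fn contains_char(chunks: &[&str], c: char) -> bool {
    try_fold_chunks(chunks, (), |(), chunk| {
        if chunk.contains(c) { Err(()) } else { Ok(()) }
    })
    .is_err()
}

// `find_char` additionally tracks the byte offset accumulated so far and
// smuggles the final position out through the error value.
fn find_char(chunks: &[&str], c: char) -> Option<usize> {
    let mut acc = 0usize;
    let res = try_fold_chunks(chunks, (), |(), chunk| match chunk.find(c) {
        Some(pos) => Err(acc + pos),
        None => {
            acc += chunk.len();
            Ok(())
        }
    });
    res.err()
}
```

With the chunks `["2.7", "48E", "2"]` this reproduces the positions from the `SyntaxText` doc example for `"2.748E2"`.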
@@ -1,3 +1,6 @@
/// Convenience type to represent tree elements which may be either a node or a token.
///
/// Used for both red and green trees, references to elements, ...
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum NodeOrToken<N, T> {
    Node(N),
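The `match self { ... }` bodies elided throughout the diff above all share one shape: dispatch on the `NodeOrToken` variant and delegate to the node or the token. A minimal sketch of that pattern with hypothetical `Node`/`Token` stand-ins (cstree's real types carry far more structure):

```rust
// Sketch: variant dispatch on a `NodeOrToken`-style enum, as used by the
// `SyntaxElement` accessors.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NodeOrToken<N, T> {
    Node(N),
    Token(T),
}

// Hypothetical stand-ins; byte ranges are plain `(start, end)` tuples here.
struct Node { range: (usize, usize) }
struct Token { range: (usize, usize) }

impl NodeOrToken<Node, Token> {
    // Mirrors the shape of `SyntaxElement::text_range`: one `match`,
    // one delegation per variant.
    fn text_range(&self) -> (usize, usize) {
        match self {
            NodeOrToken::Node(node) => node.range,
            NodeOrToken::Token(token) => token.range,
        }
    }
}
```

Because every accessor is this same two-arm `match`, marking them `#[inline]`, as the diff does, lets the compiler erase the dispatch at call sites where the variant is known.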