@typhonjs-utils/unicode



Provides a fast, space-efficient, ESM-based Unicode grapheme parser, including an iterable parser.

API documentation

Overview:

There are two resources available that work well in the browser via the fflate compression library:

The main use case presently supported is parsing strings for Unicode grapheme clusters.

The following functions are exported from @typhonjs-utils/unicode:

  • graphemeSplit(string): string[]
  • graphemeIterator(string): IterableIterator<string>

For instance, you can use graphemeIterator as a tokenizer for @typhonjs-svelte/trie-search, allowing the trie to be made up of Unicode graphemes. More work remains on this package, especially toward a complete implementation of graphemeIterator. The current implementation is trivial and eager: it uses graphemeSplit internally. The goal is a graphemeIterator implementation with full Unicode support and, more importantly, the most compact browser-capable implementation possible.
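To illustrate the difference between an eager and a non-eager iterator, here is a minimal sketch. It does not show this package's internals: splitGraphemes is a hypothetical stand-in for graphemeSplit built on the standard Intl.Segmenter API, and the names eagerGraphemeIterator and lazyGraphemeIterator are made up for this example.

```javascript
// Hypothetical stand-in for graphemeSplit, built on the standard
// Intl.Segmenter API. This is an assumption for illustration only,
// not how @typhonjs-utils/unicode parses graphemes.
function splitGraphemes(string) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return Array.from(segmenter.segment(string), (entry) => entry.segment);
}

// Eager: the full array of graphemes is built before the first value
// is yielded, so memory use grows with the length of the input.
function* eagerGraphemeIterator(string) {
  yield* splitGraphemes(string);
}

// Non-eager: each grapheme is produced only when the consumer asks
// for the next value, so iteration can stop early without extra work.
function* lazyGraphemeIterator(string) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  for (const { segment } of segmenter.segment(string)) {
    yield segment;
  }
}

// 'e' followed by U+0301 (combining acute accent) is one grapheme cluster.
const input = 'ae\u0301b';
console.log([...eagerGraphemeIterator(input)].length); // 3
console.log([...lazyGraphemeIterator(input)].length);  // 3
```

Both generators return an IterableIterator of strings, matching the shape of the graphemeIterator signature listed above, so a consumer such as a tokenizer would not need to change if the eager version were swapped for a non-eager one.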

When you bundle this package for the browser, presumably with Rollup or another bundler, remember to configure your bundle for browser support. For instance, when using Rollup and @rollup/plugin-node-resolve, pass { browser: true } to the Node resolve plugin.
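A minimal Rollup configuration along those lines might look like the following sketch. The input and output paths are placeholders chosen for this example; only the { browser: true } option comes from the guidance above.

```javascript
// rollup.config.js
import resolve from '@rollup/plugin-node-resolve';

export default {
  // Placeholder entry point and output file; adjust to your project.
  input: 'src/index.js',
  output: {
    file: 'dist/bundle.js',
    format: 'es'
  },
  plugins: [
    // Prefer the browser entry points of resolved packages.
    resolve({ browser: true })
  ]
};
```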

Roadmap:

  • Complete a non-eager implementation of graphemeIterator.