Creating a regex-based Markdown parser in TypeScript
We will explore the limitations and benefits of using regular expressions to create a simple Markdown parser in TypeScript. Spoiler alert: it is not a good way to do it.
Markdown is a markup language that has gained immense popularity in recent years. It is a convenient way to write content that generates full-blown static websites (via engines such as Gatsby.js and MarkBind), and I have also started to see it used widely in knowledge management systems such as Obsidian and Dendron.
I write articles like this one in Markdown, and this year I am also actively exploring its use in the capacities mentioned above. That led me to dive deep into how Markdown works, which resulted in this article.
I realized that there are two extremes in software projects:
- the most popular/battle-tested/enterprise-grade projects that define the “standard” for a particular domain
  - e.g. for Markdown, markdown-it and marked
- tutorial examples/toy projects for educational purposes
While the former is complex and production-ready, the latter is simple and easy to understand. The problem is that there is a huge gap between creating something simple and creating something complex. Should you want to bridge that gap, there is less help available, and at times you are basically on your own, reading the code to figure out how the complex implementation works. Nonetheless, toy examples have their value, and that is what (and why) I will be going through in this article: a simple, starter-friendly implementation.
To understand how Markdown works, I intend to implement several Markdown parsers based on the tutorials and articles I can find online, working from simple/naive implementations toward (hopefully) a more realistic one that can be used in production. This is the first article in the “series”, hence the elaborate introduction.
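To make the "simple/naive" end of that spectrum concrete, here is a minimal sketch of what a regex-based converter can look like in TypeScript. It is an illustration under stated assumptions, not the implementation developed in this series: the function names (`naiveMarkdownToHtml`, `renderInline`, `escapeHtml`) are hypothetical, and only a tiny subset of Markdown is assumed: ATX headings, paragraphs separated by blank lines, inline code, `**strong**` and `*emphasis*`.

```typescript
// Hypothetical sketch: a deliberately naive, regex-based Markdown-to-HTML converter.
// It assumes only ATX headings, paragraphs, inline code, strong and emphasis.

// Escape characters that are special in HTML so raw text cannot inject tags.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Inline rules applied in sequence. Order matters: `**` must run before `*`,
// otherwise "**bold**" would be mangled by the emphasis rule.
const inlineRules: Array<[RegExp, string]> = [
  [/`([^`]+)`/g, "<code>$1</code>"],
  [/\*\*([^*]+)\*\*/g, "<strong>$1</strong>"],
  [/\*([^*]+)\*/g, "<em>$1</em>"],
];

function renderInline(text: string): string {
  let html = escapeHtml(text);
  for (const [pattern, replacement] of inlineRules) {
    // Each pass rewrites the whole string, so later rules also see the
    // output of earlier ones; this is where the approach breaks down.
    html = html.replace(pattern, replacement);
  }
  return html;
}

function naiveMarkdownToHtml(markdown: string): string {
  const output: string[] = [];
  let paragraph: string[] = [];

  // Emit the paragraph accumulated so far, if any.
  const flushParagraph = (): void => {
    if (paragraph.length > 0) {
      output.push(`<p>${renderInline(paragraph.join("\n"))}</p>`);
      paragraph = [];
    }
  };

  for (const line of markdown.split(/\r?\n/)) {
    // ATX heading: 1 to 6 '#' characters followed by at least one space or tab.
    const heading = /^(#{1,6})[ \t]+(.+?)[ \t]*$/.exec(line);
    if (heading) {
      flushParagraph();
      const level = heading[1].length;
      output.push(`<h${level}>${renderInline(heading[2])}</h${level}>`);
    } else if (line.trim() === "") {
      flushParagraph(); // a blank line ends the current paragraph
    } else {
      paragraph.push(line.trim());
    }
  }
  flushParagraph();
  return output.join("\n");
}

// prints "<h1>Hello <em>world</em></h1>" and
// "<p>Some <strong>bold</strong> and <code>code</code>.</p>" on two lines
console.log(naiveMarkdownToHtml("# Hello *world*\n\nSome **bold** and `code`."));
// prints "<p><code>a<em>b</em>c</code></p>"
console.log(naiveMarkdownToHtml("`a*b*c`"));
```

The second call hints at the spoiler in the subtitle: because each regex pass rewrites the output of the previous one, the asterisks inside the code span are turned into emphasis, whereas CommonMark treats code span contents literally and renders `<p><code>a*b*c</code></p>`.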
What is Markdown
As Markdown was born without a well-defined set of rules or tests, it has evolved to have a few different flavors. The most well-known flavor of Markdown is CommonMark, which provides a standard set of rules for the language. Borrowing their Markdown reference as seen here, a common…