Prose linting with Vale and Emacs

Sep 14, 2020

I have set myself the goal to improve my writing. I read some books and articles on the topic, but I am also looking for real-time feedback. I am not a native english speaker.

I found about Grammarly on twitter, but there is no way I will send my emails and documents to their server as I type. That is how I started to look for an offline solution.

I found proselint but it seems inactive since 2018. As I tried to integrate it with Emacs, I learned about write-good mode. This Emacs mode has two flaws:

Through those projects, I learned about the original article 3 shell scripts to improve your writing, or “My Ph.D. advisor rewrote himself in bash.”.

I found a more sophisticated and extensible tool called write-good, implemented in Javascript/node, which means I will not be able to package it.

I decided to try to write one. I found the Go library prose. It allows to iterate over tokens, entities and sentences. Iterating over the tokens gives access to tags:

for _, tok := range doc.Tokens() {
	fmt.Println(tok.Text, tok.Tag, tok.Label)
	// Go NNP B-GPE
	// is VBZ O
	// an DT O
	// ...
}

For example, the text “At dinner, six shrimp were eaten by Harry” produces the following tags:

Harry PERSON
At IN
dinner NN
, ,
six CD
shrimp NN
were VBD
eaten VBN
by IN
Harry NNP

The combination VBD (verb, past tense )and VBN (verb, past participle) can be used to detect passive voice, one of the guidelines of “good-write”.

Soon I figured out that the tokens do not give access to the locations. I started to see who is using the code, looking for examples.

I realized the author of the library uses the library to power vale, a prose linter implemented in go. What I was trying to write.

The vale documentation revealed the tool is more than I was looking for. It includes write-good definitions as part of its example styles. There are examples for Documentation CI which include how Gitlab, Linode and Homebrew use it to lint their documentation. It even has a Github action.

Nothing left other than to integrate with Emacs. There is no need to write a full mode for that. A simple Flycheck checker should do. Turns out, it already exists, but I could not make it work.

The existing Emacs checker uses vale JSON output (--output JSON), which gives access to all details of the result. We can write the simplest checker from scratch, by recognizing patterns with --output line:

(flycheck-define-checker vale
  "A checker for prose"
  :command ("vale" "--output" "line"
            source)
  :standard-input nil
  :error-patterns
  ((error line-start (file-name) ":" line ":" column ":" (id (one-or-more (not (any ":")))) ":" (message) line-end))
  :modes (markdown-mode org-mode text-mode)
  )
(add-to-list 'flycheck-checkers 'vale 'append)

Note that for this to work, you need vale in your PATH. I packaged it in the Open Build Service. You need also a $HOME/.vale.ini or in the root of your project:

StylesPath = /usr/share/vale/styles
Vocab = Blog
[*.txt]
BasedOnStyles = Vale, write-good
[*.md]
BasedOnStyles = Vale, write-good
[*.org]
BasedOnStyles = Vale, write-good

And with this Emacs works:

emacs.png

Due to --output line not providing severity, every message shows as error.

I am looking forward to integrate vale in some documentation, develop custom styles and why not, investigate and fix the original flycheck-vale project.

Also pending is to expand the configuration to work with my Emacs based mail client, which should be a matter of hooking into mu4e compose mode.

While writing this post, I had to fix the unintended use of passive voice tens of times. Valuable feedback.