I recently created my own syntax highlighting to help me process post-OCR files. Basically, it colors various scripts: for instance, all Latin-derived scripts in green, Cyrillic in orange, and numbers in blue. Another function is to highlight all sorts of possible trash. For instance, some dictionaries use ♦ to mark certain phraseologisms. However, OCR usually renders them as <|>, <ф>, «~► etc. I have a regex for that:
(; [•\<«'~◄\-■][СC0фóJГOО\(\)56\^\*£\$\&§][•\>»'~►\-■] )|(; [•\<«'~◄\-■][СC0фóJГO\(\)56\^\*£\$\&§] )|(; [СC0фóJГO\(\)56\^\*£\$\&§][•\>»'~►\-■] )|(; <> )
The regex works in the search function, but as a syntax-highlighting rule it either works partially (everything is highlighted in red except the letters, which keep their script-based colors) or not at all (I moved the whole block while writing this, and when I moved it back it stopped working entirely).
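To rule out the pattern itself, it can be checked outside the editor. Below is a minimal sketch in Python, assuming the regex is copied verbatim from above; the helper name `find_trash` and the sample string are made up for illustration and are not real OCR output.

```python
import re

# The pattern from the post, copied verbatim and split per alternative.
# Python's `re` accepts the escaped punctuation (\<, \>, \&, ...) as literals.
TRASH = re.compile(
    r"(; [•\<«'~◄\-■][СC0фóJГOО\(\)56\^\*£\$\&§][•\>»'~►\-■] )"
    r"|(; [•\<«'~◄\-■][СC0фóJГO\(\)56\^\*£\$\&§] )"
    r"|(; [СC0фóJГO\(\)56\^\*£\$\&§][•\>»'~►\-■] )"
    r"|(; <> )"
)


def find_trash(text):
    """Return every substring the trash pattern matches (hypothetical helper)."""
    return [m.group(0) for m in TRASH.finditer(text)]


# Illustrative sample, not taken from a real OCR file.
sample = "word; <ф> phrase; <> other; C► more"
print(find_trash(sample))

# `|` is not in the middle character class, so this one is not matched.
print(find_trash("word; <|> phrase"))
```

If the matches look right here, the pattern is probably fine and the problem is more likely in how the rule interacts with the other rules in the syntax definition.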
Here's the YAML-tmLanguage file on pastebin.
So, how do I make such regexes work? What can I do to make them globally override other syntax definitions?