[SOLVED] How to get the accent/diacritic of a letter in JavaScript?

Issue

I want to get the accent/diacritic of a letter in JavaScript.

For example:

  • ñ -> ~
  • á -> ´
  • è -> `

I tried using .normalize("NFD"), but it doesn’t return the accent/diacritic I expect:

const string = "á"
string.normalize("NFD").split("")
// ['a', '́']
string.normalize("NFD").split("").includes("´") 
// false
'́' === "´"
// false

I want NFD, or any other function, to give the standalone accent/diacritic instead of the combining accent/diacritic.

Solution

The short answer is that COMBINING TILDE and TILDE are different Unicode characters, so comparing one to the other returns false.

Here’s a breakdown of each of the Unicode characters potentially involved in ñ for example:

Symbol Code CodePoint Name
ñ \u00F1 241 LATIN SMALL LETTER N WITH TILDE
n \u006E 110 LATIN SMALL LETTER N
̃ \u0303 771 COMBINING TILDE
~ \u007E 126 TILDE

To separate diacritical marks from the characters they are attached to, you can use string.normalize with "NFD". This gives the "Canonical Decomposition", which breaks a single precomposed character into a base character followed by one or more combining marks that render as the same symbol.
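A minimal sketch of that decomposition, using ñ from the table above. The variable names are only illustrative; normalize and codePointAt are standard String methods.

```javascript
// Start from the precomposed character U+00F1 (ñ).
const composed = "\u00F1";

// NFD splits it into the base letter plus a combining mark.
const decomposed = composed.normalize("NFD");

// Spreading a string iterates by code point; print each one as 4-digit hex.
const codePoints = [...decomposed].map((ch) =>
  ch.codePointAt(0).toString(16).padStart(4, "0")
);

console.log(codePoints);                               // ["006e", "0303"]
console.log(decomposed.length);                        // 2
console.log(decomposed.normalize("NFC") === composed); // true
```

The second code point is U+0303 (COMBINING TILDE), not U+007E (TILDE), which is why a direct comparison with "~" fails.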

The Combining Diacritical Marks block (U+0300 to U+036F) alone contains 112 characters. I can’t find a native way to convert between a combining character and its standalone counterpart. You could look for a library or write the mapping yourself for the marks you want to handle, like this:

const combiningMarks = {
  771: 126, // tilde
  769: 180, // acute accent
  768: 96,  // grave accent
}

Then decompose the string into separate characters and look up the associated mark for each combining character, like this:

const combiningMarks = {
  771: 126, // tilde
  769: 180, // acute accent
  768: 96,  // grave accent
}

const startingString = "ñáè" // "\u00F1\u00E1\u00E8"
const decomposedString = startingString.normalize("NFD") // "\u006E\u0303\u0061\u0301\u0065\u0300"
const codepoints = [...decomposedString].map(c => c.codePointAt(0)) // [110, 771, 97, 769, 101, 768]
const charsWithFullMarks = codepoints.map(c => combiningMarks[c] || c) // [110, 126, 97, 180, 101, 96]
const finalString = String.fromCodePoint(...charsWithFullMarks) // "n~a´e`"
console.log(finalString);
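If you only want the marks themselves, as in the question, the same idea can be wrapped in a small helper. This is a sketch rather than a complete solution: getDiacritics and spacingMarks are hypothetical names, and the mapping covers only the three marks from the example above. It assumes an engine that supports Unicode property escapes in regular expressions (\p{M} with the u flag) and the ?? operator.

```javascript
// Hypothetical mapping from combining marks to their standalone counterparts.
// Only three marks are covered; extend it for the marks you need.
const spacingMarks = new Map([
  ["\u0303", "\u007E"], // COMBINING TILDE        -> TILDE (~)
  ["\u0301", "\u00B4"], // COMBINING ACUTE ACCENT -> ACUTE ACCENT (´)
  ["\u0300", "\u0060"], // COMBINING GRAVE ACCENT -> GRAVE ACCENT (`)
]);

// Hypothetical helper: returns the standalone marks found in the text.
function getDiacritics(text) {
  // \p{M} matches any code point in the Unicode "Mark" category.
  const marks = text.normalize("NFD").match(/\p{M}/gu) ?? [];
  // Marks without a mapping are returned unchanged (still combining).
  return marks.map((mark) => spacingMarks.get(mark) ?? mark);
}

console.log(getDiacritics("\u00F1")); // ["~"]
console.log(getDiacritics("\u00E1")); // ["´"]
console.log(getDiacritics("\u00E8")); // ["`"]
console.log(getDiacritics("n"));      // []
```

A Map keyed by the combining character keeps the lookup readable, and returning an array handles letters that carry more than one mark.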

Answered By – Socko

Answer Checked By – Timothy Miller (BugsFixing Admin)
