Internationalized domain name (IDN) explained
Figure: A Sri Lankan website, සහායපියස.ලංකා, using a Sinhala IDN domain.
An Internationalized Domain Name (IDN) is a domain name that contains at least one character outside the basic a-z, 0-9, and - (hyphen) range -- the characters originally allowed in DNS hostnames. IDN opens domain names to characters from national alphabets: letters with diacritics (é), Chinese characters (指), Sinhala characters (සිං), Arabic script, Cyrillic, and well over a hundred other scripts supported by Unicode.
The concept was designed to overcome a fundamental limitation of the early internet -- the DNS, created in 1983, only understood ASCII [1]. For billions of people whose languages don't use the Latin alphabet, this meant domain names were written in a foreign script. IDN changes that. Users can register and type domains in their native language, which makes web addresses easier to remember and culturally relevant [2].
History and development of IDN
Early experiments (1996-2000)
The idea of non-ASCII domain names appeared almost as soon as the internet went global. In December 1996, Martin Dürst at the University of Zürich proposed UTF-5, the first known ASCII-compatible encoding for domain names -- a concept that would later become central to the entire IDN architecture [3]. It was a rough early sketch, but the principle was right: encode Unicode characters into something DNS can handle.
Real working prototypes came from Singapore. In 1998, Tan Tin Wee and his team at the Internet Research and Development Unit (IRDU) of the National University of Singapore built a functional multilingual domain name system they called iDNS [4]. The team -- including Lim Juay Kwang and Leong Kok Yong -- demonstrated that Chinese, Malay, Tamil, and other scripts could actually work in domain names. An Asia-Pacific testbed was launched under the Asia Pacific Networking Group (APNG), and James Seng, a former student of Tan Tin Wee, led the technical work at a spin-off company called i-DNS.net International, which received significant investment from General Atlantic [5].
These weren't just academic exercises. The i-DNS.net system was operational and handling real queries. But it was proprietary, and other groups around the world were building their own incompatible systems. Multiple commercial vendors began offering "multilingual domain names" in different ways, which inevitably led to confusion and interoperability problems [3].
The IETF standardization process (2000-2003)
In January 2000, the IETF formed the IDN Working Group, chaired by James Seng and Marc Blanchet, to develop a proper standard [3]. The working group had to choose from several competing encoding proposals -- RACE (Row-based ASCII Compatible Encoding), DUDE (Differential Unicode Domain Encoding), LACE, and several variants in the AMC-ACE family [6]. After extensive evaluation, the group settled on a new algorithm designed by Adam Costello: Punycode, specified in RFC 3492 [7].
In March 2003, the IETF published the first complete IDN standard as a set of three RFCs, collectively known as IDNA 2003:
- RFC 3490 -- the core IDNA protocol, defining how applications process internationalized domain names [8]
- RFC 3491 -- Nameprep, a profile of Stringprep (RFC 3454) that handles case folding, normalization, and character mapping [9]
- RFC 3492 -- the Punycode algorithm itself [7]
The protocol chain worked like this: take the Unicode input, apply Nameprep (which normalizes and lowercases it using Unicode normalization form KC), encode the result with Punycode, and prepend the xn-- ACE prefix. The DNS never sees any Unicode -- it just sees ASCII labels starting with xn--.
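The chain described above can be sketched with Python's standard library, whose `encodings.idna` module implements IDNA 2003. This is a minimal illustration for a single label, not a complete ToASCII implementation (it skips the length and hyphen checks from RFC 3490); the helper name `to_ace_label_2003` is made up for this example.

```python
import encodings.idna  # stdlib module implementing IDNA 2003 (RFC 3490)

ACE_PREFIX = "xn--"


def to_ace_label_2003(label: str) -> str:
    """Convert one label to ASCII following the IDNA 2003 chain (simplified)."""
    # Step 1: Nameprep maps and case-folds the label, applies NFKC
    # normalization, and rejects prohibited characters.
    prepped = encodings.idna.nameprep(label)
    # ASCII-only labels need no further encoding.
    if prepped.isascii():
        return prepped
    # Step 2: Punycode-encode the result. Step 3: prepend the ACE prefix.
    return ACE_PREFIX + prepped.encode("punycode").decode("ascii")


print(to_ace_label_2003("Bücher"))   # xn--bcher-kva
print(to_ace_label_2003("example"))  # example
```

The built-in codec performs the same steps in one call: `"Bücher".encode("idna")` returns `b"xn--bcher-kva"`.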
IDNA 2008: A significant revision
IDNA 2003 worked, but it had problems. It was hardwired to Unicode 3.2, so any new characters added in later Unicode versions couldn't be used in domain names without updating the standard. And some of its character mappings were controversial.
The most notorious case: IDNA 2003 mapped the German Eszett (ß, U+00DF) to "ss" and the Greek final sigma (ς, U+03C2) to the regular sigma (σ). This meant German speakers couldn't register a domain with ß in it -- it would silently become "ss" -- and Greek speakers lost the distinction between word-final and word-internal sigma [10]. For a standard meant to respect local languages, this was a real problem.
In August 2010, the IETF published IDNA 2008 as RFCs 5890-5895 [11]. The revision made fundamental changes:
- No more Nameprep/Stringprep. Instead of mapping characters before encoding, IDNA 2008 defines which Unicode code points are valid (PVALID), which are disallowed, and which are context-dependent. The protocol is based on Unicode character properties, so it automatically handles new Unicode versions without protocol updates [12].
- Eszett and final sigma became registrable. Both characters are PVALID in IDNA 2008.
- No implicit case mapping. Applications can still display names however they want, but the protocol itself doesn't silently transform characters.
The transition between the two versions created a compatibility headache, though. A domain registered under IDNA 2008 rules (like one containing ß) would resolve to a completely different name under an IDNA 2003 implementation. The Unicode Consortium published UTS #46 (Unicode IDNA Compatibility Processing) as a bridge, but four characters -- ß, ς, ZWJ, and ZWNJ -- can't be fully bridged [13]. Our Punycode article covers the encoding algorithm and the IDNA pipeline in detail.
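The divergence is easy to reproduce. The sketch below compares Python's built-in `idna` codec, which follows IDNA 2003 and therefore applies the Nameprep mapping, with a hypothetical helper `ace_without_mapping` that Punycode-encodes the label exactly as typed. The helper only approximates what an IDNA 2008 registration would produce: a real implementation would first validate every code point against the RFC 5892 tables.

```python
ACE_PREFIX = "xn--"


def ace_2003(label: str) -> str:
    # Python's built-in "idna" codec implements IDNA 2003, mapping included.
    return label.encode("idna").decode("ascii")


def ace_without_mapping(label: str) -> str:
    # Hypothetical helper: encode the label as typed, with no mapping step.
    # A real IDNA 2008 library would validate the code points first.
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


print(ace_2003("straße"))             # strasse
print(ace_without_mapping("straße"))  # xn--strae-oqa

# The Greek final sigma behaves the same way: the two results differ.
print(ace_2003("κόσμος") != ace_without_mapping("κόσμος"))  # True
```

The two ASCII names for straße are unrelated as far as DNS is concerned, which is why the four deviation characters cannot be bridged by a compatibility layer.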
How IDN works: the technical pipeline
When you type an internationalized domain name into a browser, here's what actually happens:
Step one -- IDNA processing. The application takes the Unicode string and processes it according to IDNA rules. Under IDNA 2003, this meant running Nameprep (normalization, case folding, and prohibited character checks). Under IDNA 2008, the application checks each code point against the derived property tables from RFC 5892 and applies contextual rules from RFC 5893 for bidirectional text [12][14].
Step two -- Punycode encoding. Each domain label (the parts between the dots) that contains non-ASCII characters gets encoded using the Punycode algorithm. ASCII-only labels are left unchanged.
Step three -- ACE prefix. The encoded label is prefixed with xn-- to signal that it's a Punycode-encoded internationalized label [8].
Step four -- DNS lookup. The resulting ASCII string (xn--...) is sent to the DNS resolver, which treats it like any other domain name. The DNS itself has no idea it's dealing with an IDN.
So when someone enters pokémon.com in their browser, the application converts it to xn--pokmon-dva.com and sends that to DNS. The user never sees the Punycode form unless the browser decides to show it (which happens in certain security-sensitive situations -- more on that below).
This entire process is transparent for the user, who just types a domain in their own language and gets to the right website. The ASCII plumbing is completely hidden.
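As a rough sketch of steps two and three applied to a whole domain, plus the reverse conversion a browser performs for display, the functions below use only Python's standard `punycode` codec. Step one is reduced to NFC normalization and lowercasing, which is a simplifying assumption and not a substitute for real IDNA validation; the function names are illustrative.

```python
import unicodedata

ACE_PREFIX = "xn--"


def domain_to_ascii(domain: str) -> str:
    """Encode each non-ASCII label; ASCII labels pass through unchanged."""
    # Simplified stand-in for step one (no code point validation is done).
    domain = unicodedata.normalize("NFC", domain).lower()
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)
        else:
            encoded = label.encode("punycode").decode("ascii")
            labels.append(ACE_PREFIX + encoded)
    return ".".join(labels)


def domain_to_unicode(domain: str) -> str:
    """Decode xn-- labels back to Unicode for display."""
    labels = []
    for label in domain.split("."):
        if label.startswith(ACE_PREFIX):
            raw = label[len(ACE_PREFIX):]
            labels.append(raw.encode("ascii").decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)


print(domain_to_ascii("pokémon.com"))            # xn--pokmon-dva.com
print(domain_to_ascii("münchen.de"))             # xn--mnchen-3ya.de
print(domain_to_unicode("xn--pokmon-dva.com"))   # pokémon.com
```

Only the output of `domain_to_ascii` would be handed to the resolver in step four.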
Internationalized top-level domains (IDN TLDs)
For the first decade of IDN, only the part before the TLD could use international characters. You could have münchen.de, but the .de part was still Latin. The natural next step was internationalizing the TLD itself.
ICANN's Fast Track Process
In October 2009, at its annual meeting in Seoul, the ICANN Board approved the IDN ccTLD Fast Track Process -- a mechanism for countries and territories to request top-level domains in their own scripts [15]. Applications opened on November 16, 2009, and the first countries to pass the evaluation were Egypt, Russia, Saudi Arabia, and the United Arab Emirates [16].
On May 5, 2010, the first IDN ccTLDs went live in the root zone -- all in Arabic script [16]:
- مصر (masr) -- Egypt
- السعودية (AlSaudiah) -- Saudi Arabia
- امارات (Emarat) -- United Arab Emirates
This was a genuinely historic moment. For the first time, a URL could be written entirely in a non-Latin script, from the domain name to the top-level domain. Russia's Cyrillic .рф followed shortly after, going live on May 12, 2010, with президент.рф (president) and правительство.рф (government) as the first accessible sites [17].
Current state of IDN TLDs
As of mid-2025, 151 TLDs have been delegated as IDNs, representing 37 languages across 23 scripts [18]. Some notable examples:
| TLD | Script | Country/Region |
|---|---|---|
| .рф | Cyrillic | Russia |
| .中国 | Chinese | China |
| .भारत | Devanagari | India |
| .مصر | Arabic | Egypt |
| .台灣 | Chinese | Taiwan |
| .한국 | Hangul | South Korea |
| .ไทย | Thai | Thailand |
The Russian .рф is by far the most popular IDN ccTLD. When general registration opened on November 11, 2010, over 240,000 domains were registered on the first day alone [17]. It currently holds around 769,000 registrations [19].
IDN in email addresses
Domain names were only half the problem. Even after IDN was standardized, email addresses were stuck with ASCII -- both in the local part (before the @) and in the domain part. Someone could visit a website at münchen.de, but they couldn't send email to kontakt@münchen.de.
The Email Address Internationalization (EAI) standards, published by the IETF in February 2012, addressed this gap:
- RFC 6530 -- Framework for internationalized email [20]
- RFC 6531 -- SMTP extension (SMTPUTF8) allowing UTF-8 in email addresses and headers [21]
- RFC 6532 -- Internationalized email headers [22]
The key mechanism is the SMTPUTF8 extension. When an SMTP server announces SMTPUTF8 support during the EHLO handshake, both the sender and recipient addresses can contain UTF-8 characters. This works for the local part too -- not just the domain.
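The handshake check can be sketched as a small parser over a raw EHLO reply. The sample response and hostname below are invented for illustration, and the delivery rule in `can_deliver` is deliberately simplified: it treats any non-ASCII address as requiring SMTPUTF8.

```python
# Hypothetical EHLO reply from a server that advertises SMTPUTF8.
SAMPLE_EHLO = (
    "250-mail.example.org\r\n"
    "250-SIZE 10240000\r\n"
    "250-8BITMIME\r\n"
    "250 SMTPUTF8"
)


def server_offers_smtputf8(ehlo_response: str) -> bool:
    """Return True if a raw EHLO reply lists the SMTPUTF8 keyword."""
    for line in ehlo_response.splitlines():
        # Reply lines look like "250-KEYWORD params" or "250 KEYWORD params".
        if line[:3] == "250" and len(line) > 4:
            keyword = line[4:].split(" ", 1)[0]
            if keyword.upper() == "SMTPUTF8":
                return True
    return False


def can_deliver(address: str, ehlo_response: str) -> bool:
    """Simplified rule: non-ASCII addresses need a server with SMTPUTF8."""
    if address.isascii():
        return True
    return server_offers_smtputf8(ehlo_response)


print(server_offers_smtputf8(SAMPLE_EHLO))                  # True
print(can_deliver("kontakt@münchen.de", "250 8BITMIME"))    # False
```

With a live connection, Python's `smtplib` exposes the same information through `SMTP.has_extn("smtputf8")` after `ehlo()`, and `sendmail()` accepts `mail_options=["SMTPUTF8"]` to request the extension.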
Adoption has been slow but steady. In August 2014, Google announced that Gmail could send to and receive from internationalized email addresses (though Gmail accounts themselves still require ASCII usernames) [23]. Postfix added SMTPUTF8 support in version 3.0 (February 2015) [24]. Exim and Sendmail also support SMTPUTF8. But many smaller email providers and corporate mail systems still don't, which means an internationalized email address won't work reliably everywhere.
Security considerations: homograph attacks
IDN's greatest strength -- letting domain names use any script in Unicode -- is also its biggest security weakness. Because Unicode contains characters from well over a hundred scripts, and many of those characters look identical or nearly identical to Latin letters, attackers can craft domain names that visually impersonate legitimate sites. This is the IDN homograph attack, first described by Evgeniy Gabrilovich and Alex Gontmakher in 2002 [25].
The classic example: the Cyrillic letter "а" (U+0430) is visually indistinguishable from the Latin "a" (U+0061). An attacker could register аpple.com (with Cyrillic "а") and it would look exactly like apple.com in the address bar. Our Punycode article's section on homograph attacks covers this topic in depth, including the full table of common homoglyphs and the Punycode encoding that reveals these fakes.
Defenses operate at multiple levels:
Browsers are the first line of defense. Chrome, Firefox, and Edge all implement policies to detect suspicious IDN labels and display them as Punycode instead of Unicode. The general approach: if a domain label mixes scripts (like Latin and Cyrillic), show the raw xn-- form. Browsers also maintain lists of "confusable" characters and apply script-specific rules [26]. Chrome tightened its IDN restrictions significantly starting with version 59 [27].
Registries implement policies at the registration level. Many restrict IDN registrations to a single script per label, or require that registrants match the language community of the TLD. ICANN's IDN Implementation Guidelines, updated to version 4.1 in April 2025, strengthen protections against consumer confusion and DNS abuse [28].
ICANN itself prohibits any IDN TLD from choosing a string that could be visually confused with an existing TLD [15].
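The mixed-script rule that browsers apply can be approximated in a few lines. Python's standard library does not expose the Unicode Script property, so this sketch takes the first word of each character's Unicode name as a stand-in for its script. That is a rough heuristic chosen for illustration and is not how any browser actually implements the check.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Approximate the set of scripts used in a label (heuristic)."""
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue  # digits and hyphens are shared across scripts
        name = unicodedata.name(ch, "")
        if name:
            # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
            scripts.add(name.split(" ")[0])
    return scripts


def display_form(label: str) -> str:
    """Show the raw xn-- form when a label mixes scripts."""
    if len(scripts_in(label)) > 1:
        return "xn--" + label.encode("punycode").decode("ascii")
    return label


spoof = "\u0430pple"  # Cyrillic "а" followed by Latin "pple"
print(sorted(scripts_in(spoof)))  # ['CYRILLIC', 'LATIN']
print(display_form(spoof))        # xn--pple-43d
print(display_form("apple"))      # apple
```

A check like this does nothing against a label written entirely in one look-alike script, which is why browsers combine it with confusable-character lists.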
Universal Acceptance: the gap between standards and reality
Having a standard is one thing. Having it work everywhere is another.
Universal Acceptance (UA) is the principle that all domain names and email addresses -- regardless of script, length, or TLD -- should be accepted, validated, stored, and processed correctly by all internet-enabled applications. In practice, a lot of software fails at this. Forms that reject email addresses with non-ASCII characters, libraries that can't handle long TLDs, websites that truncate or mangle IDN domains -- these are all UA failures [29].
The Universal Acceptance Steering Group (UASG), established by ICANN, spent a decade tracking and advocating for UA readiness across the internet ecosystem. Their annual readiness reports consistently showed underwhelming numbers: as of 2024, only about 28% of mail servers across gTLD domains supported EAI, up from roughly 20% in 2022 [30]. In 2025, ICANN transitioned UASG's work to a new President's Committee on Universal Acceptance, signaling a shift in approach [31].
The core issue isn't technical -- the standards exist and they work. It's inertia. Software developers often hardcode ASCII assumptions into validation logic, database schemas, and email handling. Fixing this across the global software ecosystem is a slow, unglamorous process.
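A hardcoded ASCII assumption often looks like the first pattern below. The regular expression is an illustrative example of a common style of validator, not taken from any particular library, and the more permissive `ua_friendly_is_valid` is only a sketch: it checks the overall shape of the address and lets Python's IDNA 2003 codec vet the domain.

```python
import re

# Illustrative ASCII-only validator: rejects non-ASCII characters and
# any TLD longer than four letters.
NAIVE_EMAIL = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$")


def naive_is_valid(address: str) -> bool:
    return NAIVE_EMAIL.match(address) is not None


def ua_friendly_is_valid(address: str) -> bool:
    """Accept any non-empty local part and any domain the IDNA codec accepts."""
    local_part, at_sign, domain = address.rpartition("@")
    if not at_sign or not local_part or not domain:
        return False
    try:
        domain.encode("idna")  # raises UnicodeError for malformed labels
    except UnicodeError:
        return False
    return True


for address in ("user@example.com", "kontakt@münchen.de", "info@example.photography"):
    print(address, naive_is_valid(address), ua_friendly_is_valid(address))
# user@example.com True True
# kontakt@münchen.de False True
# info@example.photography False True
```

The same kind of audit applies to database column types and length limits, which need to hold both the Unicode form and the longer xn-- form of a name.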
Current adoption and statistics
Despite being standardized for over two decades, IDN adoption remains surprisingly modest. The numbers tell an interesting story.
There are an estimated 4.4 million second-level IDN registrations worldwide, representing about 1.2% of the global domain name market [19]. That's a tiny fraction, considering how much of the world's population writes in non-Latin scripts.
The distribution is heavily skewed. Chinese script accounts for 49% of IDN registrations under gTLDs, followed by Latin script (with diacritics) at 28% [32]. Among ccTLDs, the biggest IDN holders are:
- .рф (Russia) -- ~769,000 domains
- .de (Germany, Latin+diacritics) -- ~648,000
- .cn (China) -- ~537,000
- .中国 (China, IDN TLD) -- ~164,000
- .jp (Japan) -- ~85,000
Support at the registry level is reasonably good: 85% of ccTLDs and 41% of gTLDs offer IDN registrations [19]. The regional breakdown varies -- Europe and Asia lead with around 88% and 87% registry support respectively, while the Americas trail at 68%.
But registrations tell only part of the story. When the IDN World Report asked ccTLD registries to rate end-user awareness of IDNs, the average score was just 2.5 out of 5 [19]. Most internet users simply don't know they can register domains in their own script. And even among those who do, the Universal Acceptance problems described above make IDNs feel unreliable for professional use.
Growth has been essentially flat. Median IDN registration growth across 41 ccTLDs was 0.6% in 2023 [19]. A few registries (.рф, .vn, .hk) saw strong growth, but most barely moved.
68% of ccTLD registries surveyed identified Universal Acceptance as the most important factor for IDN uptake -- even more important than end-user awareness [19]. Which makes sense: there's little point in registering a domain in Arabic or Hindi if half the forms on the internet won't accept it as valid.
Citations
[1] RFC 1035: Domain Names -- Implementation and Specification. Retrieved March 1, 2026
[2] ICANN: Internationalized Domain Names. Retrieved March 1, 2026
[3] ICANN / Tan Tin Wee: The History of Internationalised Domain Names (IDN). Retrieved March 1, 2026
[4] Internet Hall of Fame: Official Biography: Tan Tin Wee. Retrieved March 1, 2026
[5] APNIC Blog: Singapore's Father of IDNs. Retrieved March 1, 2026
[6] IETF: Report of the IDN ACE Design Team. Retrieved March 1, 2026
[7] RFC 3492: Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA). Retrieved March 1, 2026
[8] RFC 3490: Internationalizing Domain Names in Applications (IDNA). Retrieved March 1, 2026
[9] RFC 3491: Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN). Retrieved March 1, 2026
[10] Unicode Consortium: FAQ -- International Domain Names (IDN). Retrieved March 1, 2026
[11] RFC 5890: Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework. Retrieved March 1, 2026
[12] RFC 5892: The Unicode Code Points and Internationalized Domain Names for Applications (IDNA). Retrieved March 1, 2026
[13] Unicode Consortium: UTS #46: Unicode IDNA Compatibility Processing. Retrieved March 1, 2026
[14] RFC 5893: Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA). Retrieved March 1, 2026
[15] ICANN: IDN ccTLD Fast Track Process. Retrieved March 1, 2026
[16] ICANN: First IDN ccTLDs Available. Retrieved March 1, 2026
[17] ICANN: Russian IDN ccTLD .рф Opens for Registrations, Makes History. Retrieved March 1, 2026
[18] ICANN: ICANN Highlights IDN Progress With Release of IDN Annual Report June 2025. Retrieved March 1, 2026
[19] IDN World Report: IDN World Report 2024. Retrieved March 1, 2026
[20] RFC 6530: Overview and Framework for Internationalized Email. Retrieved March 1, 2026
[21] RFC 6531: SMTP Extension for Internationalized Email. Retrieved March 1, 2026
[22] RFC 6532: Internationalized Email Headers. Retrieved March 1, 2026
[23] Google Workspace Updates: Support for third-party internationalized email addresses in Gmail. Retrieved March 1, 2026
[24] Postfix: Postfix SMTPUTF8 support. Retrieved March 1, 2026
[25] Gabrilovich and Gontmakher: The Homograph Attack. Communications of the ACM, 2002
[26] Mozilla Wiki: IDN Display Algorithm. Retrieved March 1, 2026
[27] Chromium: IDN Display Algorithm. Retrieved March 1, 2026
[28] ICANN: IDN Implementation Guidelines Version 4.1. Retrieved March 1, 2026
[29] ICANN: Universal Acceptance (UA). Retrieved March 1, 2026
[30] UASG: FY24 Universal Acceptance-Readiness Report. Retrieved March 1, 2026
[31] ICANN: Universal Acceptance: Aligning Resources and the Path Forward. Retrieved March 1, 2026
[32] ICANN: Internationalized Domain Name (IDN) Report -- June 2024. Retrieved March 1, 2026
Updated: March 1, 2026