
Working with Unicode in JavaScript: The New RegExp 'v' Flag

The new /v (Unicode Sets) flag super-charges JavaScript regular expressions with string-level properties, set operations and more reliable Unicode matching. This guide shows what it unlocks and how to detect support.

June 28, 2025

JavaScript's regular expression engine has historically had limitations when working with Unicode properties. The new 'v' flag for RegExp addresses these limitations, providing more accurate and powerful Unicode support.

The Problem with Unicode Matching

Consider a simple task: matching all uppercase letters in a string that includes non-Latin characters. The traditional approach has limitations:

javascript
const text = 'Hello Σκύλος مرحبا'

// Traditional approach - incomplete
const upperRegex = /[A-Z]/g
console.log(text.match(upperRegex)) // ['H'] - misses 'Σ'

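This approach only matches Latin uppercase letters, missing characters from other scripts. Unicode property escapes such as \p{Lu} solve this problem. They were introduced with the older 'u' flag, and the 'v' flag supports them too while adding further features on top.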

Using the 'v' Flag

The 'v' flag enables more accurate Unicode matching:

javascript
const text = 'Hello Σκύλος مرحبا'
const upperRegex = /\p{Lu}/gv

console.log(text.match(upperRegex)) // ['H', 'Σ']

Key Features

1. Case Folding

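The 'v' flag also fixes edge cases involving complemented classes and case-insensitive matching. Under the 'u' flag, a negated class combined with the 'i' flag can give counterintuitive results, as this double negation shows:

javascript
const text = 'aAbBcC4#'

// [^\P{Lowercase_Letter}] should behave like \p{Lowercase_Letter}
console.log(text.replaceAll(/\p{Lowercase_Letter}/giu, 'X')) // 'XXXXXX4#'

// With 'u' flag - complement fails unexpectedly
console.log(text.replaceAll(/[^\P{Lowercase_Letter}]/giu, 'X')) // 'aAbBcC4#'

// With 'v' flag - complement behaves correctly
console.log(text.replaceAll(/[^\P{Lowercase_Letter}]/giv, 'X')) // 'XXXXXX4#'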

2. Property Escapes

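Unicode property escapes such as \p{Letter} and \p{Script=Arabic} work under 'v' just as they do under 'u'. What 'v' adds is support for properties of strings, for example \p{RGI_Emoji}, which can match sequences of several code points. The familiar single-character properties look like this: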

javascript
const text = 'Hello مرحبا 你好 123'

// Match all letters, regardless of script
const letterRegex = /\p{Letter}/gv
console.log(text.match(letterRegex))
// ['H', 'e', 'l', 'l', 'o', 'م', 'ر', 'ح', 'ب', 'ا', '你', '好']

// Match specific scripts
const arabicRegex = /\p{Script=Arabic}/gv
console.log(text.match(arabicRegex))
// ['م', 'ر', 'ح', 'ب', 'ا']

3. Set Operations

The 'v' flag enables set operations in character classes: difference (--), intersection (&&), and nested classes. The example below uses difference:

javascript
const text = 'Hello1 مرحبا2 你好3'

// Match characters that are letters but not Latin
const nonLatinRegex = /[\p{Letter}--\p{Script=Latin}]/gv
console.log(text.match(nonLatinRegex))
// ['م', 'ر', 'ح', 'ب', 'ا', '你', '好']

Practical Applications

1. Input Validation

Create more accurate validation for international user input:

javascript
function isValidName(name) {
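  // Allow letters from any script, combining marks, whitespace, and common punctuation.
  // Under 'v' a literal hyphen inside a character class must be escaped as \-
  const nameRegex = /^[\p{Letter}\p{Mark}\s'.\-]+$/v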
  return nameRegex.test(name)
}

console.log(isValidName('José García')) // true
console.log(isValidName('محمد علي')) // true
console.log(isValidName('王小明')) // true
console.log(isValidName('John123')) // false

2. Text Analysis

Analyze text content across different writing systems:

javascript
function getScriptDistribution(text) {
  const distribution = new Map()

  const scripts = ['Latin', 'Arabic', 'Han', 'Greek', 'Cyrillic']
  for (const script of scripts) {
    const regex = new RegExp(`\\p{Script=${script}}`, 'gv')
    const matches = text.match(regex) || []
    if (matches.length > 0) {
      distribution.set(script, matches.length)
    }
  }

  return distribution
}

const text = 'Hello Σκύλος مرحبا 你好'
console.log(getScriptDistribution(text))
// Map(4) { 'Latin' => 5, 'Arabic' => 5, 'Han' => 2, 'Greek' => 6 }

3. Advanced Search Functionality

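Implement search features that work across scripts. Case-insensitive matching alone does not ignore accents, so the text and the query are first decomposed with normalize('NFD') and stripped of combining marks:

javascript
function searchIgnoringDiacritics(text, query) {
  // Decompose to NFD and drop combining marks so only base characters remain
  const stripMarks = (s) => s.normalize('NFD').replace(/\p{Mark}/gv, '')
  // Escape regex syntax characters so the query is matched literally
  const escaped = stripMarks(query).replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
  const regex = new RegExp(escaped, 'vi')
  return regex.test(stripMarks(text))
}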

console.log(searchIgnoringDiacritics('résumé', 'resume')) // true
console.log(searchIgnoringDiacritics('Σκύλος', 'σκυλος')) // true

Browser Support

The flag was standardised in ECMAScript 2024. According to MDN's compatibility data it is available in Chrome 112+, Firefox 116+, Safari 17+ and Node.js 20+. To feature-detect the flag at runtime you can use:

javascript
const hasVFlag = (() => {
  try {
    new RegExp('', 'v')
    return true
  } catch {
    return false
  }
})()
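
Building on that check, one possible way to degrade gracefully is to fall back to 'u' for patterns that are valid in both modes. This is only a sketch: supportsFlag() and the variable names are hypothetical, and a pattern that relies on set operations or properties of strings has no 'u' equivalent and would need a different fallback.

```javascript
// Same detection idea as above, wrapped in a hypothetical helper
function supportsFlag(flag) {
  try {
    new RegExp('', flag)
    return true
  } catch {
    return false
  }
}

// Prefer 'v', fall back to 'u' for patterns that are valid in both modes
const unicodeFlag = supportsFlag('v') ? 'v' : 'u'
const upperCase = new RegExp('\\p{Lu}', 'g' + unicodeFlag)

console.log('Hello Σ'.match(upperCase)) // ['H', 'Σ']
```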

Best Practices

  1. Performance: Unicode-aware regular expressions can be slower than simple ASCII matching. Use them when you specifically need Unicode support.

  2. Validation: Always test your regular expressions with a diverse set of input strings from different writing systems.

  3. Maintenance: Document your Unicode patterns well, as they can be less immediately readable than traditional regular expressions.

Conclusion

The RegExp 'v' flag significantly improves JavaScript's Unicode handling capabilities. It enables more accurate text processing across different writing systems, making it easier to build truly international applications. While it adds some complexity, the benefits of proper Unicode support far outweigh the learning curve for applications that need to handle multilingual text.