CharacterSet

CharacterSet (and its reference type counterpart, NSCharacterSet) is a Foundation type used to trim, filter, and search for characters in text.

nshipster.com/characterset/

The article introduces CharacterSet, a Foundation type in Swift for working with Unicode scalar values. Despite its name, it is not a Set<Character>: it conforms to the SetAlgebra protocol, but its elements are Unicode.Scalar values rather than Character values.
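The distinction between scalars and characters can be sketched as follows. This is a minimal illustration rather than code from the article; the `isMember(of:)` helper is a hypothetical name introduced here.

```swift
import Foundation

// CharacterSet membership is tested against Unicode.Scalar values.
let letters = CharacterSet.letters
let scalarA: Unicode.Scalar = "a"
let containsA = letters.contains(scalarA)

// A single Character may be composed of several scalars,
// such as a flag built from two regional indicator symbols.
let flag: Character = "\u{1F1EF}\u{1F1F5}"
let flagScalarCount = flag.unicodeScalars.count

// Hypothetical helper: treat a Character as a member of a set
// only when every one of its scalars is in that set.
extension Character {
    func isMember(of set: CharacterSet) -> Bool {
        unicodeScalars.allSatisfy { set.contains($0) }
    }
}

let letterIsMember = Character("a").isMember(of: .letters)
let digitIsMember = Character("1").isMember(of: .letters)
```

Requiring every scalar to match is one possible design choice; a looser check could use `contains(where:)` instead.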

It details predefined character sets such as alphanumerics and letters, many of which correspond to Unicode General Categories, along with URL-specific sets (e.g., urlQueryAllowed). It also warns about common pitfalls, such as confusing capitalizedLetters (titlecase) with uppercaseLetters.
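The capitalizedLetters pitfall can be seen with a short check. This sketch assumes Foundation's documented definition of capitalizedLetters as General Category Lt (titlecase letters); the sample scalars are chosen here for illustration.

```swift
import Foundation

// U+0041 LATIN CAPITAL LETTER A is in General Category Lu (uppercase).
let upperA: Unicode.Scalar = "A"

// U+01C5 (Dž) is in General Category Lt (titlecase).
let titlecaseDz: Unicode.Scalar = "\u{01C5}"

// An ordinary capital letter is an uppercase letter,
// but it is not a "capitalized" (titlecase) letter.
let aIsUppercase = CharacterSet.uppercaseLetters.contains(upperA)
let aIsCapitalized = CharacterSet.capitalizedLetters.contains(upperA)

// Titlecase digraphs are what capitalizedLetters actually holds.
let dzIsCapitalized = CharacterSet.capitalizedLetters.contains(titlecaseDz)
```

To test for ordinary capital letters, uppercaseLetters is the set to reach for.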

Practical uses include trimming whitespace with whitespacesAndNewlines, percent-encoding URL components, and validating user input, either by building custom sets with formUnion or by using inverted to find disallowed characters. A more advanced example builds a CharacterSet for emoji using Swift 5’s Unicode.Scalar.properties.isEmoji; the resulting set's bitmapRepresentation allows it to be stored efficiently as a 16KB Data object.
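These uses might look like the following sketch. The sample strings, the `isValidUsername` function, and the choice of allowed username characters are assumptions made for illustration, and the scanned code point range is one reasonable choice rather than the article's exact code.

```swift
import Foundation

// Trimming: strip leading and trailing whitespace and newlines.
let padded = "  hello world \n"
let trimmed = padded.trimmingCharacters(in: .whitespacesAndNewlines)

// Percent-encoding: escape anything not allowed in a URL query.
let query = "q=swift characterset"
let encoded = query.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed)

// Validation (hypothetical rule): alphanumerics plus "_" and "-".
// A string is valid when it contains nothing from the inverted set.
func isValidUsername(_ candidate: String) -> Bool {
    var allowed = CharacterSet.alphanumerics
    allowed.formUnion(CharacterSet(charactersIn: "_-"))
    return !candidate.isEmpty
        && candidate.rangeOfCharacter(from: allowed.inverted) == nil
}

// Emoji: collect every scalar whose isEmoji property is true.
// Note that isEmoji is also true for some non-pictographic scalars,
// such as the digits 0-9, "#", and "*".
var emoji = CharacterSet()
let scanned: ClosedRange<UInt32> = 0x0000...0x1FFFF
for codePoint in scanned {
    guard let scalar = Unicode.Scalar(codePoint),
          scalar.properties.isEmoji else { continue }
    emoji.insert(scalar)
}

// The set can be serialized as a bitmap and restored later.
let bitmap: Data = emoji.bitmapRepresentation
let restored = CharacterSet(bitmapRepresentation: bitmap)
print(bitmap.count)
```

Because building the emoji set scans a large range of code points, it is probably best done once and cached, for example by persisting the bitmap.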

The article contrasts CharacterSet with NSCharacterSet, noting its origins in a 16-bit UCS-2 context, in contrast to Swift’s Unicode-compliant String. Even so, CharacterSet remains a performant tool for text processing tasks like normalization and filtering.

