Swift – Natural language recognizer


With iOS 12, Apple released a new framework for language recognition and other interesting stuff. It is called NaturalLanguage, and NLLanguageRecognizer is one of its classes.

Use the framework to perform tasks like language and script identification, tokenization, lemmatization, parts-of-speech tagging, and named entity recognition. You can also use this framework with Create ML to train and deploy custom natural language models.

This framework provides a high-level API for lots of language detection features.

Let’s see some examples:
  • Which language is this?
  • How many words does this phrase contain?
  • Are there names inside? Places? Company names?

Who knows? Not me.


  1. JA, or better JAPANESE
  2. 23 words in this phrase
  3. The English translation is “Hi, I’m Alberto, I live in Italy and I write in an unknown language.”, so yes, there is a name and a place inside.

Let’s try the new iOS/macOS common framework, NaturalLanguage, to see how it works.

Examine this phrase:

let string = "Ciao, sono Alberto, vivo a Bergamo e scrivo in an unknown language. Mi piace la CocaCola."

It’s mixed, Italian / English, so we can use it as a good example.

import NaturalLanguage

let string = "Ciao, sono Alberto, vivo a Bergamo e scrivo in an unknown language. Mi piace la CocaCola."

// create a new recognizer
let languageRecognizer = NLLanguageRecognizer()

// that should read your string
languageRecognizer.processString(string)

// get the most likely language hypotheses (at most 2)
let hypotheses = languageRecognizer.languageHypotheses(withMaximum: 2)

// get the dominant language of the phrase ("und" when nothing is detected)
let language = (languageRecognizer.dominantLanguage ?? .undetermined).rawValue

print("First language is  : \(language)")
print("Other languages are: \(hypotheses)")

output in console:

First language is  : it
Other languages are:
[__C.NLLanguage(_rawValue: it): 0.9752411842346191,
__C.NLLanguage(_rawValue: en): 0.009950380772352219]

We receive the languages and the confidence for each one. Good. Italian is about 0.975 (roughly 97.5%), so we can trust the algorithm.
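
Since the result comes with a confidence value, it can make sense to accept the language only above some cut-off. Here is a minimal sketch of that idea. The helper name `confidentLanguage` and the 0.8 threshold are my own assumptions for illustration, not part of the framework:

```swift
import NaturalLanguage

/// Picks the most probable language from a hypotheses dictionary,
/// or nil when even the best guess is below `threshold`.
/// Hypothetical helper; the 0.8 default is an arbitrary assumption.
func confidentLanguage(from hypotheses: [NLLanguage: Double],
                       threshold: Double = 0.8) -> NLLanguage? {
    guard let best = hypotheses.max(by: { $0.value < $1.value }),
          best.value >= threshold else { return nil }
    return best.key
}

/// Runs the recognizer on `text` and applies the same threshold.
func confidentLanguage(in text: String, threshold: Double = 0.8) -> NLLanguage? {
    let recognizer = NLLanguageRecognizer()
    recognizer.processString(text)
    let hypotheses = recognizer.languageHypotheses(withMaximum: 3)
    return confidentLanguage(from: hypotheses, threshold: threshold)
}
```

With the numbers printed above, the first function would return Italian at the default threshold and nil if you asked for 0.99.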


Let’s count the words (or the paragraphs, or the sentences, or the document…) using NLTokenizer:

// create a new tokenizer
// choose your unit (word, paragraph, sentence, document)
let tokenizer = NLTokenizer(unit: .word)

// set your language (or use the discovered one...)
tokenizer.setLanguage( .italian )  //NLLanguage(language) )

// link your string  

tokenizer.string = string

// get tokens
let tokens = tokenizer.tokens(for: string.startIndex..<string.endIndex)

print("Words: \(tokens.count)")
// Words: 12
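
If you want the words themselves and not only the count, the tokenizer can also enumerate the ranges one by one. A small sketch, assuming a helper of my own called `tokenStrings(in:unit:)` (the name is hypothetical, the `enumerateTokens(in:using:)` call is the framework's):

```swift
import NaturalLanguage

/// Returns the substrings of `text` for the given unit (words by default).
/// Hypothetical helper built on NLTokenizer.enumerateTokens(in:using:).
func tokenStrings(in text: String, unit: NLTokenUnit = .word) -> [String] {
    let tokenizer = NLTokenizer(unit: unit)
    tokenizer.string = text
    var result: [String] = []
    tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { range, _ in
        result.append(String(text[range]))
        return true // keep enumerating until the end of the range
    }
    return result
}

// Print every word of a short phrase on one line
print(tokenStrings(in: "Ciao, sono Alberto."))
```

Passing `.sentence` or `.paragraph` as the unit should give you the bigger pieces in the same way.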

EXTRACT pieces of information:

Another cool feature is tagging: with NLTagger you can extract tagged information such as people's names, cities and other places, and organization names.

Let’s see how:

// create a tagger
let tagger = NLTagger(tagSchemes: [.nameType])

// set the text
tagger.string = string

// select the options
// (assumed here: skip punctuation and whitespace, join multi-word names)
let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]

// and the tags to extract
let tags: [NLTag] = [.personalName, .placeName, .organizationName]
// and much more...

// enumerate all the tags
tagger.enumerateTags(
  in: string.startIndex..<string.endIndex,
  unit: .word,
  scheme: .nameType,
  options: options) { tag, tokenRange in
    if let tag = tag, tags.contains(tag) {
        print("\(tag.rawValue) -> \(string[tokenRange])")
    }
    return true
}


The result is nice… with mixed languages something strange happens, but it’s OK.

PersonalName -> Alberto
PlaceName -> Bergamo
OrganizationName -> an unknown language
OrganizationName -> CocaCola
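
The same tagger also covers the parts-of-speech tagging mentioned at the beginning: you only swap the scheme for `.lexicalClass`. This is a rough sketch, where the helper `partsOfSpeech(in:)` is a name I made up and the tags you get back depend on the language models installed on the system:

```swift
import NaturalLanguage

/// Returns each word of `text` with its lexical class (Noun, Verb, ...).
/// Hypothetical helper; tags depend on the system's language models.
func partsOfSpeech(in text: String) -> [(word: String, tag: String)] {
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = text
    var result: [(word: String, tag: String)] = []
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: .lexicalClass,
                         options: [.omitPunctuation, .omitWhitespace]) { tag, range in
        if let tag = tag {
            result.append((word: String(text[range]), tag: tag.rawValue))
        }
        return true // continue with the next word
    }
    return result
}

// Print one "word -> tag" line per word
for item in partsOfSpeech(in: "I write in an unknown language.") {
    print("\(item.word) -> \(item.tag)")
}
```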

Now that you know the language of the phrase, you can easily speak the text aloud in the correct language using AVSpeechSynthesizer!

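import AVFoundation

let speechUtterance = AVSpeechUtterance(string: string)

//speechUtterance.rate = 0.7

speechUtterance.volume = 1.0

// set your discovered language

speechUtterance.voice = AVSpeechSynthesisVoice(language: language)

// speak the text (keep a reference to the synthesizer while it is speaking)
let synthesizer = AVSpeechSynthesizer()
synthesizer.speak(speechUtterance)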


Instead of using these old techniques that make me laugh now… 😉
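
One detail about the speech step: the recognizer gives back a bare code such as `it`, while the installed voices report codes with a region, such as `it-IT`. If the bare code does not give you the voice you expect, a small lookup can help. This is only a sketch, and `matchingVoiceLanguage(for:available:)` is a hypothetical helper of mine:

```swift
/// Picks the first available voice language that matches a bare
/// language code, e.g. "it" -> "it-IT". Hypothetical helper.
func matchingVoiceLanguage(for code: String, available: [String]) -> String? {
    if available.contains(code) { return code }          // exact match first
    return available.first { $0.hasPrefix(code + "-") }  // then "it-..." variants
}

#if canImport(AVFoundation)
import AVFoundation

// Ask the system which voice languages are installed
let installed = AVSpeechSynthesisVoice.speechVoices().map { $0.language }
if let voiceLanguage = matchingVoiceLanguage(for: "it", available: installed) {
    let utterance = AVSpeechUtterance(string: "Ciao, sono Alberto.")
    utterance.voice = AVSpeechSynthesisVoice(language: voiceLanguage)
    // pass `utterance` to an AVSpeechSynthesizer to hear it
}
#endif
```

The matching is kept apart from AVFoundation on purpose, so it can be checked with a plain list of codes.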

And that’s all for now.

Go deeper into this framework, because it is very interesting.