Swift – Natural language recognizer

Hi!

With iOS 12, Apple released a new framework for language recognition and other interesting stuff. It is called Natural Language, and NLLanguageRecognizer is its class for language detection.

Use the Natural Language framework to perform tasks like language and script identification, tokenization, lemmatization, parts-of-speech tagging, and named entity recognition. You can also use this framework with Create ML to train and deploy custom natural language models.

This framework provides a high-level API for lots of language detection features using text.

Let’s see an example:
こんにちは、私はアルベルトです、私はイタリアに住んでいて、私は不明な言語で書いています。
  • Which language is this?
  • How many words does this phrase contain?
  • Are there names inside? Places? Company names?

Who knows. Not me.

Answers:

  1. JA, or better JAPANESE
  2. 23 words in this phrase
  3. The English translation is “Hi, I’m Alberto, I live in Italy and I write in an unknown language.”, so yes, there is a name and a place inside.

Let’s try it with NaturalLanguage, the new framework common to iOS and macOS, to see how it works.

Examine this phrase:

let string = "Ciao, sono Alberto, vivo a Bergamo e scrivo in an unknown language. Mi piace la CocaCola."

It’s mixed, ITALIAN / ENGLISH, so we can use it as a good example.

DETECTING LANGUAGE(s)

import NaturalLanguage

let string = "Ciao, sono Alberto, vivo a Bergamo e scrivo in an unknown language. Mi piace la CocaCola."

// create a new recognizer
let languageRecognizer = NLLanguageRecognizer()

// that should read your string
languageRecognizer.processString(string)

// get the most likely language hypotheses (at most 2), each with its confidence
let hypotheses = languageRecognizer.languageHypotheses(withMaximum: 2)

// get the dominant language of the phrase
// (dominantLanguage is nil when nothing can be determined, so fall back to "und")
let language = (languageRecognizer.dominantLanguage ?? .undetermined).rawValue

print("First language is  : \(language)")
print("Other languages are: \(hypotheses)")

output in console:

First language is  : it
Other languages are:
[__C.NLLanguage(_rawValue: it): 0.9752411842346191,
__C.NLLanguage(_rawValue: en): 0.009950380772352219]

We receive the languages and the confidence for each one, expressed as a probability between 0 and 1. Good. Italian is about 0.975 (that is, 97.5%), so we can trust the algorithm.
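To make the confidence values easier to read, here is a minimal sketch that sorts the hypotheses and prints them as percentages. The helper name rankedLanguages is mine (it is not part of the framework), and the sample text is only an illustration.

```swift
import Foundation
import NaturalLanguage

/// Hypothetical helper (not part of NaturalLanguage): returns the language
/// hypotheses for `text`, sorted from the most to the least likely.
func rankedLanguages(in text: String, maximum: Int = 3) -> [(code: String, confidence: Double)] {
    let recognizer = NLLanguageRecognizer()
    recognizer.processString(text)
    return recognizer.languageHypotheses(withMaximum: maximum)
        .map { (code: $0.key.rawValue, confidence: $0.value) }
        .sorted { $0.confidence > $1.confidence }
}

let sample = "Ciao, sono Alberto, vivo a Bergamo e scrivo in an unknown language."

for entry in rankedLanguages(in: sample) {
    // The confidence is a probability between 0 and 1: multiply by 100 for a percentage.
    let percent = String(format: "%.1f", entry.confidence * 100)
    print("\(entry.code): \(percent)%")
}

// The dominant language is optional, so unwrap it instead of forcing it.
if let dominant = NLLanguageRecognizer.dominantLanguage(for: sample) {
    print("Dominant language: \(dominant.rawValue)")
} else {
    print("Could not determine the language")
}
```

The class method dominantLanguage(for:) is a shortcut when you only need the single best guess and not the full list of hypotheses.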


TOKENIZE A TEXT:

Let’s count the words (or the paragraphs, or the sentences, or the whole document…) using NLTokenizer:

// create a new tokenizer
// choose your unit (word, sentence, paragraph, document)
let tokenizer = NLTokenizer(unit: .word)

// set your language (or use the discovered one: NLLanguage(language))
tokenizer.setLanguage(.italian)

// link your string
tokenizer.string = string

// get tokens (an array of ranges into the string)
let tokens = tokenizer.tokens(for: string.startIndex..<string.endIndex)

print("Words: \(tokens.count)")
// Words: 12
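The tokenizer gives you ranges, not strings. If you want to look at the tokens themselves, something like the sketch below should work. The helper name tokens(in:unit:) and the sample text are my own assumptions, used only for illustration.

```swift
import NaturalLanguage

/// Hypothetical helper (not part of NaturalLanguage): splits `text`
/// into tokens of the given unit and returns them as strings.
func tokens(in text: String, unit: NLTokenUnit = .word) -> [String] {
    let tokenizer = NLTokenizer(unit: unit)
    tokenizer.string = text

    var result: [String] = []
    tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { range, _ in
        result.append(String(text[range]))
        return true // keep enumerating until the end of the text
    }
    return result
}

let text = "Ciao, sono Alberto. Mi piace la CocaCola."

// The words, one per token
print(tokens(in: text))

// The same text split by sentence instead of by word
print(tokens(in: text, unit: .sentence).count)
```

Printing the tokens is also a quick way to check a word count that looks suspicious.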

EXTRACT pieces of information:

Another cool feature is tagging: with NLTagger you can extract tagged information such as people’s names, cities, places and organization names.

Let’s see how:

// create a tagger
let tagger = NLTagger(tagSchemes: [.nameType])

// set the text
tagger.string = string

// select the options
let options: NLTagger.Options = [
  .omitPunctuation,
  .omitWhitespace,
  .omitOther,
  .joinNames
]

// and the tags to extract
let tags: [NLTag] = [
  .personalName,
  .placeName,
  .organizationName
  // and much more...
]

// enumerate the tags (tags(in:...) returns an array and takes no closure,
// so the closure-based call is enumerateTags)
tagger.enumerateTags(
  in: string.startIndex..<string.endIndex,
  unit: .word,
  scheme: .nameType,
  options: options) { tag, tokenRange in
    if let tag = tag, tags.contains(tag) {
        print("\(tag.rawValue) -> \(string[tokenRange])")
    }
    return true // continue with the next token
}

The result is nice. With mixed languages something strange happens (“an unknown language” is tagged as an organization), but it’s ok.

PersonalName -> Alberto
PlaceName -> Bergamo
OrganizationName -> an unknown language
OrganizationName -> CocaCola
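
If you prefer to collect the entities instead of printing them, a sketch like this one groups them by tag name. The helper namedEntities(in:) is hypothetical, written just for this example, and what the tagger finds depends on the language models installed on the device.

```swift
import NaturalLanguage

/// Hypothetical helper (not part of NaturalLanguage): groups the named
/// entities found in `text` by the raw value of their tag.
func namedEntities(in text: String) -> [String: [String]] {
    let tagger = NLTagger(tagSchemes: [.nameType])
    tagger.string = text

    let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .omitOther, .joinNames]
    let wanted: Set<NLTag> = [.personalName, .placeName, .organizationName]

    var entities: [String: [String]] = [:]
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: .nameType,
                         options: options) { tag, range in
        if let tag = tag, wanted.contains(tag) {
            // Append the matched text under its tag name, e.g. "PlaceName"
            entities[tag.rawValue, default: []].append(String(text[range]))
        }
        return true // continue with the next token
    }
    return entities
}

let found = namedEntities(in: "Ciao, sono Alberto, vivo a Bergamo. Mi piace la CocaCola.")
for (tagName, values) in found {
    print("\(tagName): \(values)")
}
```
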

EXTRA

Since you now know the language of the phrase, you can easily speak the text aloud in the correct language using AVSpeechSynthesizer!

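import AVFoundation

// keep a reference to the synthesizer while it is speaking
let speechSynthesizer = AVSpeechSynthesizer()

let speechUtterance = AVSpeechUtterance(string: string)
//speechUtterance.rate = 0.7
speechUtterance.volume = 1.0

// set your discovered language
speechUtterance.voice = AVSpeechSynthesisVoice(language: language)

// speakUtterance(_:) was renamed to speak(_:) in Swift 3
speechSynthesizer.speak(speechUtterance)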
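
One caveat: AVSpeechSynthesisVoice(language:) is a failable initializer, and the recognizer gives you a short code like "it" while the installed voices use identifiers like "it-IT". I have not checked how every short code is handled, so here is a cautious sketch with a fallback. The helper voice(forLanguageCode:) is my own name, not an AVFoundation API.

```swift
import AVFoundation

/// Hypothetical helper (not part of AVFoundation): looks for an installed
/// voice matching a short language code such as "it".
func voice(forLanguageCode code: String) -> AVSpeechSynthesisVoice? {
    // Try the code as it is first.
    if let exact = AVSpeechSynthesisVoice(language: code) {
        return exact
    }
    // Otherwise fall back to any installed voice whose identifier
    // (for example "it-IT") matches the code or starts with "<code>-".
    return AVSpeechSynthesisVoice.speechVoices().first {
        $0.language == code || $0.language.hasPrefix(code + "-")
    }
}

// Keep a strong reference to the synthesizer while it is speaking.
let synthesizer = AVSpeechSynthesizer()
let utterance = AVSpeechUtterance(string: "Ciao, sono Alberto.")

// A nil voice means the utterance is spoken with the default voice.
utterance.voice = voice(forLanguageCode: "it")
synthesizer.speak(utterance)
```

Run it in a playground or an app: a command-line script would exit before the speech has finished.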

All of this replaces the old techniques from my earlier posts, which make me laugh now… 😉

[ObjectiveC] Text to Speech with Google Translate

[Objective-C] Use Google speech on iPhone

ObjC – Tesla speech for OSX

And that’s all for now.

Go deeper into this framework, because it is very interesting.

 

Alberto Pasca

Software engineer @ Pirelli & C. S.p.A. with a strong passion for mobile  development, security, and connected things.