Check if words sound alike using Natural

InstructorHannah Davis

Share this video with your friends

Send Tweet

In this lesson, we’ll take a look at Natural’s phonetics feature. We’ll learn how to check whether two words sound alike, looking at both the SoundEx and Metaphone algorithms.

Yonatan Shalev
~ 4 years ago

Should I split documents into single sentences or use them as is to train Brain.js text classification model? I was wondering what's the best way to feed the model with training data.

Can i just use the document as is? like this: {"phrase": "First long document with up to 30 sentences", "result": {"label 1": 1}}, {"phrase": "first long document with up to 30 sentences", "result": {"label 2": 1}} {"phrase": "Second long document with up to 30 sentences", "result": {"label 2": 1}}, etc. Or, should I split all documents into sentences and then the data will look like something this: {"phrase": "Sentence 1 out of document 1", "result": {"label 1": 1}}, {"phrase": "Sentence 2 out of document 1", "result": {"label 2": 1}}, etc.

{"phrase": "Sentence 1 out of document 2", "result": {"label 5": 1}}, etc.

{"phrase": "Sentence X out of document X", "result": {"No labels at all": 1}}, etc. Same question about using the model, should I just apply it on the complete document or should I split it to separate sentences then apply the model on each sentence.

What's the best practice?