How do machines learn meaning?

Pratima Rathore
9 min read · Nov 22, 2020


Word Sense Disambiguation (from Sheldon to Chandler 😅)

Spoiler alert 👻: machines are intelligent because humans are terrific at teaching.

Semantic processing is about understanding the meaning of a given piece of text. Computers consist of on/off switches and process meaningless symbols. So how is it that we can hope that computers might understand the meaning of words, products, actions and documents?

Here we will discuss the Word Sense Disambiguation (WSD) problem, which arises because the same word can mean different things in different contexts, and how to deal with it.

In this article, we will go step by step 👣: first we will understand the naive approaches, then discuss how they fail in certain circumstances, and then how to mend that 🦾.

We will use the sentence “I went to the bank to deposit money” throughout the article. We will see how our algorithm distinguishes the water-body sense of bank from the financial-organization sense. We will use the Lesk algorithm, which helps with word sense disambiguation. The word bank can have multiple meanings depending on the surrounding (or context) words; the Lesk algorithm helps find the ‘correct’ meaning.

Let’s start.

Your brain can process sentences meaningfully because it can relate the text to other words and concepts it already knows.

  • It can process meaning in the context of the text and can disambiguate between multiple possible senses of words (identifying the intended meaning of an ambiguous word).
  • Also, your brain has an understanding of the topics being talked about in a text, even when those exact words are not present in the text.

For example, the word ‘bank’ has different meanings in the phrases ‘bank of a river’ and ‘a commercial bank’ because the word happens to be used differently in these contexts.

Terms which appear in similar contexts are similar to each other.

Terms acquire meaning through use in certain contexts, and the meaning of terms may change depending on the context in which they appear.

Let’s understand some terminology.

Say you ask Alexa, ‘What is a Labrador?’; it answers, ‘It is a breed of dog.’

Say you ask, ‘Who is the coach of the Indian cricket team?’; it reveals the name of the coach.

To answer such questions, a system needs some kind of mapping between entities and entity types, i.e. it needs to understand that a Labrador is a dog, a mammal is an animal, a coach is a specific person etc.

This brings us to the concept of associations between entities and entity types. These associations are represented using the notion of predicates. Let’s study the idea of predicates.
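To make this concrete, here is a toy sketch (the triples and helper function are invented for illustration, not a real knowledge base) of entity and entity-type associations represented as ‘is-a’ predicate triples:

```python
# A toy sketch: entity/entity-type associations as "is-a" predicate triples.
# The triples and the is_a() helper below are illustrative only.
triples = [
    ("Labrador", "is-a", "dog"),
    ("dog", "is-a", "mammal"),
    ("mammal", "is-a", "animal"),
]

def is_a(entity, entity_type):
    """Follow 'is-a' links transitively through the toy triples."""
    for subj, pred, obj in triples:
        if subj == entity and pred == "is-a":
            if obj == entity_type or is_a(obj, entity_type):
                return True
    return False

print(is_a("Labrador", "animal"))  # True: Labrador -> dog -> mammal -> animal
```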

For this, we need a framework and a schema.

Schema.org is a joint effort by Google, Yahoo, Bing and Yandex (Russian search engine) to create a large schema relating the most commonly occurring entities on web pages. The main purpose of the schema is to ease search engine querying and improve search performance.

For example, say a web page of a hotel (e.g. Hotel Ginger) contains the words ‘Ginger’ and ‘four stars’. How would a search engine indexing this page know whether the word ‘Ginger’ refers to the plant ginger or Hotel Ginger? Similarly, how would it know whether the phrase ‘four stars’ refers to the rating of a hotel or to astronomical stars?

To solve this problem, schema.org provides a way to explicitly specify the types of entities on web pages. For example, one can explicitly mention that ‘Ginger’ is the name of the hotel and specify various entities such as its rating, price etc. (example HTML shown below).
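The original post showed an HTML snippet here; a minimal, illustrative sketch of what such schema.org microdata can look like (the hotel name and rating value are invented for the example) is:

```html
<!-- A minimal, illustrative sketch of schema.org microdata for a hotel -->
<div itemscope itemtype="https://schema.org/Hotel">
  <span itemprop="name">Hotel Ginger</span>
  <div itemprop="aggregateRating" itemscope
       itemtype="https://schema.org/AggregateRating">
    rated <span itemprop="ratingValue">4</span> stars
  </div>
</div>
```

With markup like this, a search engine no longer has to guess: ‘Ginger’ is explicitly a Hotel name and ‘4 stars’ is explicitly a rating.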

Aboutness

For example:

‘Croatia fought hard before succumbing to France’s deadly attack; lost the finals 2 goals to 4.’

In the above text, if we want the machine to detect that the game is football, we need to formally define the notion of aboutness.

We can, for example, detect that the game is football by defining semantic associations such as “Croatia” is-a “country”, “France” is-a “country”, “finals” is-a “tournament stage”, “goals” is-a “scoring parameter” and so on. By defining such relationships, we can probably infer that the text is talking about football by going through an enormous schema. But you can imagine the kind of search this simple sentence would require. And even if we search through the schema, it doesn’t mean we’ll be able to decide that the game is football.

This leads us to define another semantic association: ‘aboutness’. To understand the ‘aboutness’ of a text basically means to identify the ‘topics’ being talked about in it (a task called Topic Modelling). What makes this problem hard is that the same word (e.g. China) can be used in multiple topics such as politics, the Olympic Games, trading etc.

There are some nomenclatures used to classify types of associations between terms and concepts.

Even after defining such a wide range of association types, one cannot cover the wide range of complexities of natural languages.

For example, consider how two words are often put together to form a phrase. The semantics of the combination can be very different from those of the individual words. Consider the phrase ‘cake walk’: the meanings of the terms ‘cake’ and ‘walk’ are very different from the meaning of their combination. Such cases are said to violate the principle of compositionality.

So now what? How can a machine learn the meaning of a word or a sentence? 😥

Resources for semantic processing 🔮

WordNet

WordNet superficially resembles a thesaurus, in that it groups words together based on their meanings.

WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the browser.

When I typed “free”, WordNet showed the various senses in which ‘free’ can be used.
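You can reproduce this lookup with NLTK’s WordNet interface. A minimal sketch, assuming the wordnet corpus has already been downloaded:

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

# Print every synset (sense) that the word "free" participates in
for syn in wn.synsets('free'):
    print(syn.name(), '-', syn.definition())
```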

So it isn’t magic; a lot of human effort has gone into making machines understand and learn meaning.

ConceptNet

ConceptNet is a freely available semantic network, designed to help computers understand the meanings of the words people use. It deals specifically with assertions between concepts.

That’s the reason I said humans are terrific at teaching.

For example, there is the concept of a “dog”, and the concept of a “kennel”. As a human, we know that a dog lives inside a kennel. ConceptNet records that assertion with /c/en/dog /r/AtLocation /c/en/kennel.
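You can look up this assertion yourself through ConceptNet’s public REST API at api.conceptnet.io. A minimal sketch using the requests library:

```python
import requests

# Ask ConceptNet for AtLocation assertions that start at /c/en/dog
resp = requests.get(
    "http://api.conceptnet.io/query",
    params={"start": "/c/en/dog", "rel": "/r/AtLocation"},
)
for edge in resp.json()["edges"]:
    print(edge["@id"])  # e.g. /a/[/r/AtLocation/,/c/en/dog/,/c/en/kennel/]
```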

Another example:

As you can see, I searched for “apple” and ended up getting all this information:

Related terms, Synonyms, Derived terms, apple is used for…, Properties of apple, Location of apple, apple is a type of…, Types of apple, apple has…, Etymologically derived terms, apple can be…, apple is capable of…, Etymologically related, Distinct terms, Antonyms, Parts of apple, Word forms, Things that require apple, Symbols of apple, Etymological roots of “apple”, Context of this term, apple is part of…, Things created by apple, Terms with this context, apple is near…, apple is a species of… 🤓🤯

It isn’t perfect, though. If you look closely at all the info it provided for the word “apple”, you will understand.

Now we know everything we need to know to arrive at the agenda of our article. Drum rolls 🥁🥁🥁

Word Sense Disambiguation

WSD is basically the solution to the ambiguity that arises from the different meanings of words in different contexts. For example, consider these two sentences:

“The bank will not be accepting cash on Saturdays.”

“The river overflowed the bank.”

The word bank in the first sentence refers to a commercial (finance) bank, while in the second sentence it refers to the river bank. The ambiguity that arises from this is tough for a machine to detect and resolve.

It has received increasing attention due to its promising applications in the fields of Sentiment Analysis, Information Retrieval, Information Extraction, Machine Translation, Knowledge Graph Construction etc.

Word sense disambiguation (WSD) is the task of identifying the correct sense of an ambiguous word such as ‘bank’, ‘bark’, ‘pitch’ etc.

WSD tasks are often classified into two types: lexical sample WSD and all-word WSD. The former focuses on disambiguating only some particular target words, while the latter conducts WSD on every word in a document.

Broadly, the different solutions to WSD can be divided into:

  • Supervised techniques, where word sense disambiguation requires the input words to be tagged with their senses. The sense is the label assigned to the word. WSD is modelled as a classification problem, with each classifier dealing with one target word; each classifier is trained separately on all annotated samples for that particular target word. Examples include Naive Bayes, logistic regression, neural networks, support-vector machines, k-nearest neighbours etc. (a toy sketch follows this list).
  • Knowledge-based (unsupervised) techniques, where the words are not tagged with their senses, which must instead be inferred using other techniques. Knowledge-based approaches have developed rapidly in recent years because they do not depend on an expensive sense-annotated corpus.
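To make the supervised formulation concrete, here is a toy sketch (not from the article’s linked code; the mini-corpus and labels are invented for illustration) that treats disambiguating ‘bank’ as plain text classification:

```python
# A toy sketch of supervised WSD as classification: a tiny hand-labelled
# corpus for the single target word "bank".
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

sentences = [
    "I deposited money at the bank",          # finance sense
    "the bank approved my loan application",  # finance sense
    "we sat on the bank of the river",        # river sense
    "the river overflowed the bank",          # river sense
]
senses = ["finance", "finance", "river", "river"]

# Bag-of-words context features + Naive Bayes, one classifier per target word
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(sentences, senses)

print(clf.predict(["I went to the bank to deposit money"]))  # likely ['finance']
```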

Let’s explore the unsupervised way of handling this problem; it holds a great advantage given the scarcity of labelled data. A popular unsupervised algorithm used for word sense disambiguation is the Lesk algorithm.

Lesk Algorithm

It is an unsupervised WSD algorithm based on comparing the dictionary definitions of a word with its neighbouring words. It is implemented in NLTK.

Given an ambiguous word and the context in which the word occurs, Lesk returns a Synset with the highest number of overlapping words between the context sentence and different definitions from each Synset.

Look at this example:
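A minimal sketch using NLTK’s built-in (simplified) Lesk implementation on our running sentence:

```python
from nltk.wsd import lesk  # requires: nltk.download('wordnet')

sentence = "I went to the bank to deposit money"
sense = lesk(sentence.split(), 'bank')  # returns the best-matching Synset
print(sense, '-', sense.definition())
```

Note that this simplified Lesk relies only on raw gloss overlap, so for a short sentence the sense it returns may not always be the one you expect.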

Here is what is happening: you take the definitions corresponding to the different senses of the ambiguous word and see which definition overlaps most with the neighbouring words of the ambiguous word. The sense with the maximum overlap with the surrounding words is then chosen as the ‘correct’ sense.

Let’s implement this from scratch in Python 🐍

We will follow the steps below (a minimal sketch follows the list):

  1. Tokenize the senses of the word
  2. Tokenize the sentence
  3. Count the overlapping words
  4. Choose the sense with maximum overlap
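Here is a minimal from-scratch sketch of these four steps (the full version linked below may differ in details such as stopword removal or smarter tokenization):

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def simple_lesk(sentence, ambiguous_word):
    """Pick the synset whose definition overlaps most with the sentence."""
    # 2. Tokenize the sentence (lower-cased whitespace split, for simplicity)
    context = set(sentence.lower().split())
    best_sense, max_overlap = None, 0
    for sense in wn.synsets(ambiguous_word):
        # 1. Tokenize the dictionary definition (gloss) of each sense
        gloss = set(sense.definition().lower().split())
        # 3. Count the overlapping words between context and gloss
        overlap = len(context & gloss)
        # 4. Keep the sense with the maximum overlap
        if overlap > max_overlap:
            best_sense, max_overlap = sense, overlap
    return best_sense

sense = simple_lesk("I went to the bank to deposit money", "bank")
print(sense, '-', sense.definition())
```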

You can find the full code here: GitHub. That’s all for now… 🤗

If you liked the article, show your support by clapping for it. This article is basically a compilation of many articles from Machine Learning Mastery, Medium, Analytics Vidhya, upGrad material etc.

If you are also learning machine learning like me, follow me for more articles. Let’s go on this trip together :)

You can also follow me on LinkedIn.
