Unraveling Natural Language Processing: From Sci-Fi Dreams to Real-World Impact

Kushan Madhusanka
9 min read · Feb 18, 2024


In this article we travel from the realms of science fiction to the real-world effects that Natural Language Processing (NLP) has on our lives today. The goal of NLP research is to use the computational power of computers to carry out useful tasks involving human language. These tasks go beyond processing text or speech: they also include improving human-to-human communication and building useful language-based applications.

The emergence of conversational agents (digital systems skilled at conversing and interacting with people) is a striking illustration of NLP’s value. Science fiction writers, including Arthur C. Clarke, foreshadowed this future. In “2001: A Space Odyssey,” Clarke introduced HAL, an artificial agent capable of sophisticated language behavior, from speaking to understanding English. Similar artificial agents appeared on our television screens in shows like “Knight Rider,” as science fiction anticipated the evolution of language-capable machines.

Fast forward to the present, and we see fiction and reality converge in Japan’s Erica android. Erica not only communicates with humans in natural language but also uses body language, bringing us face-to-face with possibilities that were once purely fictional.

In this exploration of NLP, we unravel its historical roots, its practical applications, the building blocks of language, why natural language processing is challenging, and the main approaches to natural language processing.

Practical natural language processing

Semantic Annotation

Semantic annotation involves adding metadata or tags to text to make its meaning explicit for machines. This helps in categorizing and understanding the content at a deeper level. For example, annotating a sentence to identify entities, relationships, or sentiments allows NLP systems to process and analyze the text more effectively.
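
To make this concrete, here is a minimal sketch of what annotated text can look like as data. The sentence, the label names (PERSON, ARTIFICIAL_AGENT, WORK_OF_ART), and the helper locate are all made up for illustration; real annotation schemes and tools differ.

```python
# Hypothetical example sentence with hand-written annotations
# (nothing here is produced by a model).
text = "Arthur C. Clarke introduced HAL in 2001: A Space Odyssey."

# Each annotation names a piece of the text and attaches a label to it.
annotations = [
    {"span": "Arthur C. Clarke", "label": "PERSON"},
    {"span": "HAL", "label": "ARTIFICIAL_AGENT"},
    {"span": "2001: A Space Odyssey", "label": "WORK_OF_ART"},
]


def locate(text, annotations):
    """Attach start/end character offsets to each annotated span."""
    located = []
    for ann in annotations:
        start = text.find(ann["span"])
        if start == -1:
            continue  # span not present in the text; skip it
        located.append({**ann, "start": start, "end": start + len(ann["span"])})
    return located


for ann in locate(text, annotations):
    print(ann["start"], ann["end"], ann["label"], ann["span"])
```

Storing character offsets next to the labels is one common design choice, since it lets a program map every tag back to the exact place in the original text.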

Machine Translation

Machine Translation (MT) is the automated process of translating text or speech from one language to another. NLP techniques are employed to understand the semantics and context of the source language and generate coherent and accurate translations in the target language. Popular examples include Google Translate and other language translation services.

Information Retrieval

Information Retrieval (IR) involves finding relevant information in a large collection of data. NLP techniques help in understanding user queries and retrieving matching documents, web pages, or other data sources. Search engines like Google use NLP to return the most relevant and contextually appropriate results.
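
As a rough illustration of the retrieval idea, the sketch below ranks a tiny made-up corpus against a query using TF-IDF weighting, a classic scoring scheme. The documents, the function names, and the exact weighting formula are assumptions chosen for this example; production search engines are far more sophisticated.

```python
import math
from collections import Counter

# Hypothetical mini-corpus for illustration.
documents = {
    "doc1": "machine translation converts text from one language to another",
    "doc2": "search engines retrieve relevant documents for a user query",
    "doc3": "chatbots hold natural language conversations with users",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def rank(query, documents):
    """Score each document by the summed TF-IDF weight of the query terms."""
    tokenized = {name: tokenize(body) for name, body in documents.items()}
    n_docs = len(tokenized)
    scores = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            # Document frequency: how many documents contain the term.
            df = sum(1 for toks in tokenized.values() if term in toks)
            if df == 0:
                continue  # term appears nowhere in the corpus
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / df) + 1.0  # +1 keeps common terms above zero
            score += tf * idf
        scores[name] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


results = rank("relevant documents for a query", documents)
print(results[0][0])  # name of the best-matching document
```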

Information Extraction

Information Extraction (IE) focuses on extracting specific information from unstructured data, such as text documents or web pages. NLP algorithms identify and extract entities, relationships, and events, converting unstructured information into structured formats. This is particularly useful for aggregating data from diverse sources.
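
A minimal sketch of the "unstructured to structured" step, using hand-written regular expressions. The input snippet, the field names, and the patterns are hypothetical; real IE systems usually combine rules like these with learned models.

```python
import re

# Hypothetical unstructured snippet for illustration.
text = (
    "Invoice 1042 was issued on 2024-02-18 for $1,250.00. "
    "Contact billing@example.com with questions."
)

# Hand-written patterns, one per field we want to pull out.
PATTERNS = {
    "date": r"\b\d{4}-\d{2}-\d{2}\b",
    "amount": r"\$\d[\d,]*(?:\.\d{2})?",
    "email": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
}


def extract(text):
    """Return a structured record mapping each field to the matches found."""
    return {field: re.findall(pattern, text) for field, pattern in PATTERNS.items()}


record = extract(text)
print(record)
```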

Relation Extraction

Relation extraction is a subset of information extraction that specifically involves identifying and classifying relationships between entities in a text. NLP models may identify relationships between people, groups, or ideas that are mentioned in a document. This information can be used to build knowledge graphs and other applications.
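
One simple (and brittle) way to sketch relation extraction is with surface patterns that produce (subject, relation, object) triples, the kind of output a knowledge graph could store. The relation labels and patterns below are invented for this example and only cover these exact phrasings.

```python
import re

# Hypothetical surface patterns; each yields a (subject, relation, object) triple.
RELATION_PATTERNS = [
    ("joint_venture_with",
     r"(?P<subj>[A-Z][\w&. ]+?) formed a joint venture with (?P<obj>[A-Z][\w ]+?)(?=[,.])"),
    ("acquired",
     r"(?P<subj>[A-Z]\w+) bought (?P<obj>the [\w ]+?)(?=[,.])"),
]


def extract_relations(text):
    """Apply every pattern to the text and collect the resulting triples."""
    triples = []
    for relation, pattern in RELATION_PATTERNS:
        for match in re.finditer(pattern, text):
            triples.append((match.group("subj"), relation, match.group("obj")))
    return triples


text = ("Merck & Co. formed a joint venture with Ache Group, of Brazil. "
        "Lenovo bought the PC division of IBM.")
triples = extract_relations(text)
for triple in triples:
    print(triple)
```

Patterns like these break as soon as the wording changes, which is one reason learned models are commonly used for this task.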

Text summarization

Text summarization involves condensing lengthy pieces of text while retaining the essential information and meaning. NLP techniques enable the identification of key sentences or phrases, creating concise summaries. This is useful for quickly understanding the main points of articles, documents, or news stories.
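
The "identify key sentences" idea can be sketched with a simple frequency-based extractive summarizer: sentences whose words occur often in the whole text score higher. The stopword list, the scoring rule, and the sample text are assumptions for illustration; this is one basic technique, not how modern summarizers necessarily work.

```python
import re
from collections import Counter

# Small hand-picked stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "for", "on", "with"}


def content_words(text):
    """Lowercased words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average word frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in tokens) / max(len(tokens), 1)

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(top[:max_sentences])
    return " ".join(sentences[i] for i in keep)


sample = ("NLP systems process text. NLP systems also summarize long text. "
          "The weather was pleasant yesterday.")
print(summarize(sample, max_sentences=1))
```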

Chatbots

Chatbots are computer programs designed to engage in natural language conversations with users. NLP plays a crucial role in enabling chatbots to understand user queries, generate appropriate responses, and simulate human-like interactions. They find applications in customer service, virtual assistants, and various online platforms.

Topic Modeling

Topic modeling is a technique used to identify topics present in a collection of text documents. NLP algorithms analyze the patterns of words and phrases to categorize documents into topics, providing a high-level overview of the content. This is valuable in organizing and understanding large volumes of textual data, such as news articles or research papers.

Summary:

  • Spell Correction E.g.: MS Word or any other editor
  • Search Engines E.g.: Google, Yahoo, Bing, Baidu
  • Question Answering Systems E.g.: Google, IBM’s Watson
  • Natural Language Assistants / Speech Engines E.g.: Apple’s Siri, Google
  • Voice Machine Translators E.g.: Google Translate
  • News Feeds E.g.: Google, Yahoo
  • Text Summarization
  • Named Entity Recognition
  • Automatic Earthquake Reports E.g.: LA Times
  • Spam Classifiers E.g.: Gmail, Hotmail, Yahoo Mail
  • Opinion Mining

Why learn natural language processing

In the fast-paced digital age, more and more of our daily interactions are mediated by technology, generating a huge volume of textual data. Consider your own routine: How many emails do you get in a day? How many of those do you read? How often do you engage with blogs, news articles, and social media platforms? The big question is: how much time do you spend doing all this?

Information Retrieval Challenges

  • Locating Old Emails: Consider the struggle of finding specific information buried in your email archives.
  • Problems Caused by Document Formats: Think about the variety of document formats, such as Word, PDF, and plain text, that you have to search across.

Online Engagement

  • Blogs, Articles, News: Quantify the digital content you consume daily, spanning blogs, articles, and news sites.
  • Social Media Usage: Reflect on your use of instant messaging, Twitter, or Facebook for communication and information sharing.

Search Habits

  • Search Engines: How frequently do you turn to search engines like Google, Yahoo!, or Bing for information?
  • Local Searches: Consider your habits when searching for information locally, whether on your machine or corporate intranet.

Content Production

  • Emails, Reports, and More: Assess the volume of content you generate, including emails, reports, and other documentation.
  • Time Investment: Calculate the time spent on these activities for reading, responding, and creating content.

Because Natural Language Processing can navigate the huge volumes of unstructured data within organizations, it has tremendous commercial potential. Since an estimated 80–90% of data exists in formats that are not inherently structured, natural language processing becomes a powerful tool for understanding this data and drawing conclusions from it. It also improves data accessibility, because it gives organizations the ability to extract and apply important information from unstructured text.

Social media platforms have transformed how people connect by enabling multi-directional conversations, in contrast to the one-way communication of traditional media. As a result, social media is now regarded as a major source of business intelligence, offering companies a wealth of insights drawn from user interactions. In short, natural language processing both reveals the business possibilities hidden in unstructured data and helps organizations keep up with the ever-changing social media landscape.

Building blocks of language

The building blocks of language refer to the fundamental components and elements that constitute human language, enabling communication and expression. These building blocks encompass various linguistic aspects that work together to convey meaning and facilitate understanding. Here are the key building blocks of language:

Phonetics and Phonology

Phonetics is the study of the physical sounds of human speech, including their production and acoustic properties. Phonology is the study of how sounds are organized and patterned in a particular language, including the rules governing their combination.

Morphology

Morphology deals with the structure of words and the formation of meaningful word forms. It explores how words are built from smaller units called morphemes, which are the smallest units of meaning.
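
To illustrate how a word can be broken into morphemes, here is a deliberately naive sketch that peels at most one known prefix and one known suffix off an English word. The tiny affix lists and the function split_morphemes are hypothetical; real morphological analyzers handle spelling changes, irregular forms, and many more affixes.

```python
# Hypothetical, deliberately tiny affix lists.
PREFIXES = ["un", "re", "dis"]
# Longer suffixes are listed first so they win over shorter ones.
SUFFIXES = ["ness", "ing", "ed", "ly", "s"]


def split_morphemes(word, min_stem=3):
    """Split a word into [prefix-, stem, -suffix], keeping a stem of at least min_stem letters."""
    word = word.lower()
    parts = []
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
            parts.append(prefix + "-")
            word = word[len(prefix):]
            break
    suffix_part = None
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            suffix_part = "-" + suffix
            word = word[: -len(suffix)]
            break
    parts.append(word)
    if suffix_part:
        parts.append(suffix_part)
    return parts


for w in ["unkindness", "walked", "replaying", "cats"]:
    print(w, "->", split_morphemes(w))
```

Because it only looks at letters, this sketch also mis-segments words such as "under" (as "un-" plus "der"), which shows why morphology needs more than string matching.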

Syntax

Syntax focuses on the arrangement of words to form grammatical sentences. It involves understanding the rules governing sentence structure, including the relationships between words and the order in which they appear.

Semantics

Semantics is concerned with the meaning of words and sentences. It explores how words combine to convey meaning and how the meaning of sentences can be interpreted in different contexts.

Pragmatics

Pragmatics involves the study of language use in context. It explores how language is influenced by factors such as social context, cultural norms, and the speaker’s intentions. Pragmatics helps interpret implied meanings and understand the use of language in specific situations.

Why is natural language processing challenging?

Several complexities make Natural Language Processing a formidable challenge.

Ambiguity

Syntactic Ambiguity: Syntactic ambiguity arises from the structural relationships between words in a sentence. For instance, consider the sentence, “Every man loves a woman.” Depending on how the two quantifiers are scoped (a case often called scope ambiguity), it can mean either that for every man there is some woman he loves, or that there is one particular woman loved by every man.

Attachment Ambiguity: Attachment ambiguity occurs when a constituent fits more than one position in a parse tree. Take the sentence, “The man saw the girl with the telescope.” The ambiguity arises in determining whether the man saw a girl carrying a telescope or saw her through his telescope.
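
The two readings can be written down as two different parse trees over the same words. The sketch below builds both by hand as nested tuples, using common phrase-structure labels (S, NP, VP, PP); the tree shapes and helper functions are illustrative assumptions, not the output of a real parser.

```python
sentence = "The man saw the girl with the telescope"

# Reading 1: the PP attaches to the noun phrase (the girl has the telescope).
np_attachment = ("S",
    ("NP", "The man"),
    ("VP", ("V", "saw"),
           ("NP", ("NP", "the girl"), ("PP", "with the telescope"))))

# Reading 2: the PP attaches to the verb phrase (the man uses the telescope).
vp_attachment = ("S",
    ("NP", "The man"),
    ("VP", ("V", "saw"),
           ("NP", "the girl"),
           ("PP", "with the telescope")))


def leaves(tree):
    """Collect the words at the leaves of a tree, left to right."""
    _label, *children = tree
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else child.split())
    return words


def parent_of(tree, target_label, parent=None):
    """Return the label of the node directly containing the first target_label node."""
    label, *children = tree
    if label == target_label:
        return parent
    for child in children:
        if isinstance(child, tuple):
            result = parent_of(child, target_label, label)
            if result is not None:
                return result
    return None


print(leaves(np_attachment) == leaves(vp_attachment))  # same words in both trees
print(parent_of(np_attachment, "PP"), parent_of(vp_attachment, "PP"))
```

Both trees cover exactly the same word sequence; only the position of the PP node differs, and that is the whole ambiguity.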

Semantic Ambiguity: Semantic ambiguity arises when the meaning of a sentence can be interpreted in more than one way, even when its structure is clear. In the sentence, “Saman loves her mother, and Piyal does too,” it is unclear whether Piyal loves Saman’s mother or his own mother. Similar ambiguities occur in sentences like “Jack invited Mary to the Halloween ball” (a dance or a round object?) or “The car hit the pole while it was moving” (what was moving, the car or the pole?).

Discourse: Understanding linguistic units larger than a single utterance, known as discourse, adds another layer of complexity. For example, “Merck & Co. formed a joint venture with Ache Group, of Brazil. It will be called Prodome Ltd.” requires connecting information across sentences: the reader must work out that “It” refers to the joint venture rather than to either company.

Pragmatics: Pragmatics involves understanding the relationship between meaning and the goals and intentions of the speaker. The same words can mean very different things depending on context, as in “The CEO was fired up about his new role” versus “The CEO was fired from his new role.” Similarly, “IBM’s PC division was acquired by Lenovo” and “Lenovo bought the PC division of IBM” describe the same event with different emphasis, so interpreting them requires discerning the speaker’s intention.

Even a short phrase like “I love you too” can be taken in a number of ways: as reciprocal love, as a comparison of one’s own love to another’s, as love expressed alongside other emotions, or even as a mix of love and like. This illustrates the complexity of pragmatic ambiguity.

Common knowledge

Navigating common knowledge is an additional layer of complexity in NLP due to the vast array of information assumed to be known by the general population. While humans can rely on shared cultural, historical, or everyday knowledge to interpret language, machines lack this inherent understanding. Expressions such as idioms, cultural references, or implicit meanings require NLP systems to bridge the gap between explicit information and assumed knowledge, making the processing of text more challenging.

Diversity across languages

The diversity inherent in languages poses a significant challenge for NLP systems. Every language has its own syntax, semantics, and cultural nuances. Additionally, the availability and quality of linguistic resources vary across languages, making it challenging to develop universally applicable models. NLP systems must contend with diverse linguistic features and language-specific challenges, requiring continuous refinement and customization for effective cross-linguistic comprehension.

Approaches to natural language processing

The different approaches used to solve NLP problems commonly fall into three categories.

  • Heuristic based approaches

Heuristic-based approaches use explicit rules or guidelines crafted by domain experts to process and understand natural language. These rules are usually defined manually, based on linguistic and contextual knowledge. Such systems are interpretable and effective in specific domains with well-defined patterns.

Example: Creating a set of rules to identify and extract named entities from text based on linguistic patterns.
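
A minimal sketch of such a rule set: one regular expression that treats capitalized words ending in a company-style suffix as organizations, plus a tiny lookup list (a gazetteer) of country names. The suffix list, the gazetteer, and the labels ORG and LOCATION are assumptions for this example and would miss most real-world entities.

```python
import re

# Tiny hypothetical gazetteer of country names.
COUNTRIES = {"Brazil", "Japan"}

# Rule: one or more capitalized words (optionally joined by "&")
# followed by a company-style suffix.
ORG_PATTERN = re.compile(
    r"\b[A-Z]\w+(?:\s(?:&\s)?[A-Z]\w+)*\s(?:&\s)?(?:Co\.|Group|Ltd\.|Inc\.)"
)


def find_entities(text):
    """Return (entity text, label) pairs in the order they appear."""
    entities = [(m.start(), m.group(), "ORG") for m in ORG_PATTERN.finditer(text)]
    for m in re.finditer(r"\b[A-Z][a-z]+\b", text):
        if m.group() in COUNTRIES:
            entities.append((m.start(), m.group(), "LOCATION"))
    return [(span, label) for _, span, label in sorted(entities)]


text = ("Merck & Co. formed a joint venture with Ache Group, of Brazil. "
        "It will be called Prodome Ltd.")
entities = find_entities(text)
for span, label in entities:
    print(label, span)
```

The appeal of this style is that every decision can be traced to a specific rule; the cost is that each new pattern has to be written and maintained by hand.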

  • Machine Learning approaches

Machine learning approaches in NLP involve training algorithms on labeled datasets to learn patterns and relationships within the data. These models can then make predictions or classifications on new, unseen data. They are data-driven, adaptable, and effective for a variety of NLP tasks, but they require labeled training data.

Example: Training a supervised machine learning model to classify sentiment in customer reviews.
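
As a hedged sketch of that example, the code below trains a small multinomial Naive Bayes classifier from scratch on four made-up reviews. Naive Bayes is just one possible supervised model, chosen here because it fits in a few lines; the training data, labels, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled reviews; a real training set would be far larger.
TRAIN = [
    ("great product works perfectly", "pos"),
    ("love it excellent quality", "pos"),
    ("terrible quality broke quickly", "neg"),
    ("awful experience would not recommend", "neg"),
]


def train(examples):
    """Count words per label for a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids log(0).
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict("excellent product", model))
print(predict("terrible experience", model))
```

Unlike the rule-based approach, nothing about "excellent" or "terrible" is written into the code; the model picks up those associations from the labeled examples.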

  • Deep Learning based approaches

Deep learning employs neural networks with multiple layers to automatically learn hierarchical representations of language features. This approach has gained prominence for its ability to capture complex patterns in large datasets. It is effective for complex tasks but requires substantial computational resources and data.

Example: Training a deep neural network for machine translation, where the model learns complex language patterns for translating between languages.

So this is the end of this article. Hope you guys got a good understanding of NLP.

See you in the next blog post. Bye Bye🍻🍸❤️❤️


Written by Kushan Madhusanka

Undergraduate of University of Moratuwa | Faculty of Information Technology
