Our AI Impact

 for the health of people



“The work [with Synaptiq] is unprecedented in its scale and potential impact,” Mortenson Center’s Managing Director Laura MacDonald said. “It ties together our center’s strengths in impact evaluation and sensor deployment to generate evidence that informs development tools, policy, and practice.” 
Read the Case Study ⇢ 


    ⇲ Implement & Scale
    A startup in digital health trained a risk model to open up a robust, precise, and scalable processing pipeline so providers could move faster, and patients could move with confidence after spinal surgery. 
    Read the Case Study ⇢ 


      Thwart errors, relieve intake-form exhaustion, and build a more accurate data picture for patients in chronic pain? Those who prefer the natural albeit comprehensive path to health and wellness said: sign me up. 
      Read the Case Study ⇢ 


        Using a dynamic machine vision solution to detect plaques in the carotid artery and give care teams rapid answers saves lives through early disease detection and monitoring. 
        Read the Case Study ⇢ 


          This global law firm needed to be fast, adaptive, and able to provide unrivaled client service under pressure. Intelligent automation did just that, and it made time for what matters most: meaningful human interactions. 
          Read the Case Study ⇢ 



            Mushrooms, Goats, and Machine Learning: What do they all have in common? You may never know unless you get started exploring the fundamentals of Machine Learning with Dr. Tim Oates, Synaptiq's Chief Data Scientist. You can read and visualize his new book in Python, tinker with inputs, and practice machine learning techniques for free. 

            Start Chapter 1 Now ⇢ 


              How Should My Company Prioritize AIQ™ Capabilities?





                Start With Your AIQ Score

                  9 min read

                  Customer Review Sentiment Analysis: A Business Case for Tokenization


                  Sentiment analysis is a must-have for organizations with a business-to-consumer (B2C) business model. This natural language processing technique can be used to discern the sentiment of customer reviews, revealing valuable insights that would otherwise be lost in a sea of unstructured data. Such insights are key for B2C organizations striving to understand customer stories and remain attuned to consumer needs.

                  Tokenization is a fundamental step in sentiment analysis. It is the process of splitting a single piece of text into multiple smaller units (tokens) for processing. Let’s explore how B2C organizations can use tokenization with a practical business case: performing sentiment analysis on customer reviews.

                  Note: "Tokenization" can also refer to a process in which sensitive data is substituted with a unique non-sensitive equivalent, called a token. We’ll cover that kind of tokenization in a future blog post. ;)

                  Inquire About Sentiment Analysis for Your Business

                  Sourcing a Suitable Dataset

                  Our first step is to gather data suitable for tokenization. We’ve procured an open-source dataset from the online data science community Kaggle that contains about one million customer reviews of Sephora skincare products collected via web scraping. We’ve narrowed down our dataset by filtering for customer reviews specifically related to the product "Lip Sleeping Mask Intense Hydration with Vitamin C," resulting in a subset of 199 customer reviews.
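The filtering step can be sketched with Python's standard library. This is only an illustration: the column names `product_name` and `review_text`, the helper `load_reviews_for_product`, and the sample rows are assumptions made for the example, not the actual schema or contents of the Kaggle dataset.

```python
import csv
import io

# Hypothetical sample data; the column names and rows are assumptions,
# not the real schema or contents of the Kaggle dataset.
SAMPLE_CSV = """product_name,review_text
Lip Sleeping Mask Intense Hydration with Vitamin C,This lip mask is awesome!
Some Other Moisturizer,Too greasy for me.
Lip Sleeping Mask Intense Hydration with Vitamin C,Keeps my lips soft.
"""

TARGET_PRODUCT = "Lip Sleeping Mask Intense Hydration with Vitamin C"


def load_reviews_for_product(csv_file, product_name):
    """Return the review text of every row whose product name matches exactly."""
    reader = csv.DictReader(csv_file)
    return [
        row["review_text"]
        for row in reader
        if row["product_name"] == product_name
    ]


# In practice csv_file would be an open file handle rather than a StringIO.
reviews = load_reviews_for_product(io.StringIO(SAMPLE_CSV), TARGET_PRODUCT)
print(len(reviews), "reviews kept")
```

With the full dataset, the same exact-match filter would be expected to yield the 199-review subset described above, provided the product name is spelled identically in every row.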

                  Learn More About Web Scraping From Our Chief Data Scientist

                  Tokenizing the Customer Reviews

                  Our next step is to tokenize the data we’ve gathered. Tokenization entails dividing a single piece of text (in our case, a customer review) into smaller pieces, or "tokens." Tokens can range in size from whole words to granular units like subword pieces, which are generated by progressively more sophisticated methods such as Byte-Pair Encoding.

                  For the sake of simplicity, let’s settle on word-level tokenization. We've used the Natural Language Toolkit, or "NLTK" (a popular Python package for natural language processing), to split each of our 199 customer reviews into a series of words and punctuation marks. A single review that reads, "This lip mask is awesome!" thus becomes six tokens (five words and an exclamation point): "This" "lip" "mask" "is" "awesome" "!"
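To make the idea concrete, here is a minimal word-level tokenizer written with Python's standard library. It is a simplified stand-in rather than the NLTK code used for this post: the regular expression and the name `tokenize_words` are assumptions for illustration, and NLTK's own tokenizers handle cases such as contractions differently.

```python
import re

# Matches either a word (optionally with one internal apostrophe, e.g. "don't")
# or a single punctuation character. This pattern is an illustrative assumption.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize_words(text):
    """Split text into word tokens and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize_words("This lip mask is awesome!")
print(tokens)  # ['This', 'lip', 'mask', 'is', 'awesome', '!']
```

The example review yields the same six tokens listed above: five words and an exclamation point.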

                  The histogram below displays the 10 most frequently occurring tokens within our dataset. On the x-axis, we have the tokens themselves, and on the y-axis, we have the total number of times each token appears across our collection of 199 customer reviews. We can see that two of these 10 tokens are punctuation marks, and a further seven are what we call "stopwords." Stopwords are common words with very little semantic meaning, such as articles, prepositions, and conjunctions. They are generally not useful in the context of sentiment analysis.

                  10 Most Frequently Occurring Tokens in Customer Review Dataset

                  Note: Reviews were converted to lowercase prior to tokenization.
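The counts behind a chart like this can be produced with `collections.Counter`. The sketch below assumes a simple regex tokenizer and uses three made-up reviews as stand-ins for the real 199; the variable names are illustrative.

```python
import re
from collections import Counter

# Made-up stand-in reviews (assumption); the real dataset has 199 reviews.
reviews = [
    "This lip mask is awesome!",
    "The mask is great, and my lips love it.",
    "It is the best lip product.",
]


def tokenize_words(text):
    """Lowercase the text, then split it into word and punctuation tokens."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text.lower())


# Tally every token across all reviews.
token_counts = Counter()
for review in reviews:
    token_counts.update(tokenize_words(review))

# The ten most frequent tokens, as (token, count) pairs.
top_tokens = token_counts.most_common(10)
print(top_tokens)
```

Lowercasing before tokenizing means "It" and "it" are counted as the same token, which matches the note above.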

                  Removing the punctuation marks and stopwords from our dataset yields a much more interesting histogram (below). It’s no surprise to find the tokens "lips," "lip," and "product" on the x-axis (remember, we’ve tokenized reviews for a lip mask skincare product), but the other seven tokens hint at customer sentiment. For example, the token "dry" could convey a positive sentiment (e.g., "my lips were dry, but this product helped") or a negative sentiment (e.g., "this product made my lips feel dry"), depending on the context in which it appears.

                  10 Most Frequently Occurring Tokens Excluding Stopwords in Customer Review Dataset
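The filtering step can be sketched as follows. The stopword set here is a tiny illustrative assumption, far shorter than the lists shipped with NLP libraries, and `remove_noise` is a hypothetical helper name.

```python
import string
from collections import Counter

# Tiny illustrative stopword set (assumption); real stopword lists are much longer.
STOPWORDS = {"the", "is", "it", "and", "my", "this", "a", "i", "to", "but", "were"}


def remove_noise(tokens):
    """Drop stopwords and tokens made up only of punctuation characters."""
    return [
        token
        for token in tokens
        if token not in STOPWORDS
        and not all(ch in string.punctuation for ch in token)
    ]


# Lowercased tokens from one of the example sentences quoted above.
tokens = ["my", "lips", "were", "dry", ",", "but", "this", "product", "helped", "!"]
content_tokens = remove_noise(tokens)
print(content_tokens)  # ['lips', 'dry', 'product', 'helped']
print(Counter(content_tokens).most_common(10))
```

Note that the surviving token "dry" carries no polarity on its own, which is why the next step evaluates tokens in context.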

                  We’ve used tokenization to turn our customer reviews into bite-sized tokens. Our final step is to perform sentiment analysis to evaluate these tokens in context (where they’re useful) and interpret the results. 

                  Performing Sentiment Analysis

                  Sentiment analysis is a natural language processing technique used to categorize the sentiment expressed in a piece of text as positive, negative, or neutral. We've employed lexicon-based sentiment analysis to categorize a single customer review from our dataset: "It is so moisturizing and keeps my lips super soft and hydrated." This approach involves matching each token in the review against a predefined lexicon to get its sentiment score, a process similar to “looking up” a word in the dictionary to get its definition. We’ve used a domain-specific lexicon tailored to the makeup and beauty market created by the Stanford Natural Language Processing Group.
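A minimal sketch of the lexicon lookup described above. The scores in `LEXICON` are made-up placeholders chosen only to illustrate the mechanics; they are not values from the Stanford lexicon. Summing the per-token scores to label the whole review is also an assumption, since the aggregation rule is not specified here.

```python
import re

# Hypothetical lexicon: these scores are illustrative placeholders,
# NOT values taken from the Stanford lexicon.
LEXICON = {
    "moisturizing": 1.5,
    "super": 1.0,
    "soft": 1.2,
    "hydrated": 1.4,
    "keeps": -0.3,
    "lips": -0.2,
}


def score_review(text, lexicon):
    """Look up each token in the lexicon; return scored tokens, total, and label."""
    tokens = re.findall(r"\w+(?:'\w+)?|[^\w\s]", text.lower())
    # Tokens missing from the lexicon are skipped rather than scored as zero.
    scored = [(token, lexicon[token]) for token in tokens if token in lexicon]
    # Assumption: the review-level score is the simple sum of token scores.
    total = sum(score for _, score in scored)
    if total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "neutral"
    return scored, total, label


review = "It is so moisturizing and keeps my lips super soft and hydrated."
scored, total, label = score_review(review, LEXICON)
for token, score in scored:
    print(f"{token:>12}  {score:+.1f}")
print("label:", label)
```

Other aggregation choices, such as averaging or weighting by token position, are equally plausible; the lookup itself stays the same.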

                  The figure below shows the sentiment scores assigned to each token in the customer review. We can speculate that the tokens "moisturizing," “super,” “soft,” and “hydrated” have a positive sentiment score because they often express desirable qualities in the context of the beauty market, whereas the tokens "keeps" and "lips" have a negative sentiment score because they often appear in complaints (e.g., "keeps drying out my lips").

                  Sentiment Analysis Results of Customer Review #37

                  This example shows how B2C organizations can use sentiment analysis to extract insights from customer reviews. By tokenizing and analyzing customer feedback in aggregate, businesses can understand customer sentiment, identify areas for improvement, and enhance their products or services to better meet consumer needs.





                  About Synaptiq

                  Synaptiq is an AI and data science consultancy based in Portland, Oregon. We collaborate with our clients to develop human-centered products and solutions. We uphold a strong commitment to ethics and innovation. 

                  Contact us if you have a problem to solve, a process to refine, or a question to ask.

                  You can learn more about our story through our past projects, our blog, or our podcast.
