
                  Smart and Safe Innovation: Synthetic Data for Proof-of-Concept Projects


                  In the ever-evolving landscape of technology, innovation and experimentation are key drivers of success. However, the challenges of data privacy, data availability, and data diversity often hinder the rapid development of proof-of-concept and feasibility projects. This is where synthetic data emerges as a useful solution. In this blog, we will delve deep into the world of synthetic data, exploring what it is and why it’s used across different industries. 

                  What is Synthetic Data?

                  Synthetic data is digital information that is created artificially, mimicking real-world data scenarios without compromising the privacy and confidentiality of individuals [1]. Unlike traditional data, synthetic data is generated through computer simulations, algorithms, statistical modeling, and other techniques, offering a safe yet realistic environment for experimentation.

                  To put this in simpler terms, consider the example of data scientists wanting to run experiments on patient data from hospitals. Patient data contains sensitive information, such as details about a patient's medical history, their full name, address, and contact information, that is too vulnerable to include in studies that could be published. As a result, many scientists who work with patient data either attempt to obtain de-identified data, or de-identify the data themselves if they have the right permissions. Obtaining already de-identified data to perform experiments can be difficult. 
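To make de-identification concrete, here is a minimal sketch in Python. The field names and the pseudonymization scheme are hypothetical, invented for illustration; real de-identification follows formal standards such as HIPAA Safe Harbor.

```python
# Minimal de-identification sketch: strip direct identifiers from a record.
# Field names here are hypothetical, not drawn from any real schema.

DIRECT_IDENTIFIERS = {"full_name", "address", "phone", "email"}

def deidentify(record: dict) -> dict:
    """Return a copy of the record with direct identifiers removed
    and the patient ID replaced by an opaque pseudonym."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "patient_id" in cleaned:
        # Pseudonymize rather than keep the raw ID. hash() is illustrative
        # only; production systems use keyed, auditable pseudonymization.
        cleaned["patient_id"] = f"anon-{hash(cleaned['patient_id']) % 10_000:04d}"
    return cleaned

record = {"patient_id": "P123", "full_name": "Jane Doe",
          "address": "1 Main St", "diagnosis": "C50.9", "age": 54}
print(deidentify(record))
```

The medical fields survive intact, so the record remains useful for analysis, while the fields that point back to an individual are gone.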

                  In this case, data scientists would create synthetic data by fabricating personally identifiable information (PII). This would not only allow them to run experiments with as much data as they need, but also protect the privacy of the original patients. 

                  In another related example, a hospital could hire a team of data scientists and data engineers to build a machine learning-based entity linker. While testing the model, the team would likely use synthetic PII such as names, genders, and ages rather than identifiable patient data. 
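Fabricating synthetic PII for a test like this can be as simple as sampling from name pools. A minimal sketch in Python follows; the pools and field names are made up for illustration, and real projects would use richer generators.

```python
import random

# Sketch: fabricate synthetic patient records for testing a pipeline.
# The name pools and field names are invented for illustration.
FIRST_NAMES = ["Alex", "Sam", "Riya", "Chen", "Maria"]
LAST_NAMES = ["Lee", "Okafor", "Garcia", "Novak", "Khan"]

def synthetic_patient(rng: random.Random) -> dict:
    """Build one fake patient record with no link to any real person."""
    return {
        "name": f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
        "gender": rng.choice(["F", "M", "X"]),
        "age": rng.randint(0, 99),
    }

rng = random.Random(42)  # fixed seed so runs are reproducible
patients = [synthetic_patient(rng) for _ in range(100)]
print(patients[0])
```

Because the records are generated on demand, the team can produce as many test cases as the entity linker needs without touching real patient data.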

                  What are Proof-of-Concept Projects & Feasibility Studies?

                  Proof-of-concept projects are a type of feasibility study that serves as the preliminary testing ground for innovative ideas. They allow companies to validate the viability of their concepts before investing substantial resources. However, sourcing, managing, and protecting real-world data can be a daunting task during these projects. Synthetic data steps in as a valuable alternative, providing a secure platform to develop and refine concepts without the risks associated with genuine or proprietary data. This may sound like an exaggeration, but when it comes to health-related data or any kind of personal or governmental information, the dangers are very real. 

                  Why and How is Synthetic Data Used? 

                  Let’s explore a few key applications of synthetic data. 

                  Data Availability and Privacy: A Glimpse into the Future

                  Gartner estimates that 60% of data used in AI and analytics projects will be synthetically generated by 2024 [2]. This shift is driven by the elusive nature of real-world data; it tends to be gated in some way to protect the privacy of the source’s personal information. Synthetic data addresses these challenges by enabling the creation of diverse, realistic datasets that preserve individual privacy.
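One simple way to generate data that preserves privacy while staying statistically realistic is to fit a distribution to the real values and sample fresh ones from it. The sketch below assumes the real data is roughly normal, which is a strong simplification; production-grade synthetic data generators model much richer structure.

```python
import random
import statistics

# Sketch: replace real measurements with samples from a fitted distribution.
# Assumes a normal distribution is an adequate fit, which is a
# simplification; real generators capture far richer structure.
real_ages = [34, 41, 29, 55, 62, 47, 38, 51, 44, 59]

mu = statistics.mean(real_ages)      # 46.0
sigma = statistics.stdev(real_ages)

rng = random.Random(0)
synthetic_ages = [round(rng.gauss(mu, sigma)) for _ in range(1000)]

# Aggregate statistics are approximately preserved, but no synthetic
# value corresponds to any particular real individual.
print(round(statistics.mean(synthetic_ages), 1))
```

Analysts can work with the synthetic sample, and even publish results derived from it, without exposing any individual from the original dataset.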

                  Data Diversity: Enhancing Testing Environments

                  One of the challenges in proof-of-concept projects lies in testing diverse scenarios and edge cases. Edge cases are inputs that cause a model to behave unexpectedly. Sometimes the input is very different from the training data, so the model's learned criteria no longer apply. In other cases, such as with image classification models, an input can look similar to training data according to the model's parameters but actually be unrelated, which can produce a silly result like this one: a model classified a similarly colored photograph of a blueberry muffin as a puppy [3]. While that example is harmless, in higher-stakes applications of AI, inaccurate classifications can have a much bigger impact. How can data scientists help mitigate this issue? 

                  An article in Nature points out that synthetic data's flexibility lets it cover a wide array of situations, ensuring robust testing environments. By creating or using synthetic data while testing and building models, data scientists can increase accuracy and reduce edge-case effects: exposing models to potentially extreme data points reveals where parameters might need to be adjusted. For the image classification model mentioned above, synthetic data could surface the blueberry-muffin edge case and give data scientists the opportunity to adjust parameters accordingly. Models trained on more diverse data have a greater chance of adapting well to real-world complexity, and they also let data scientists monitor how well models perform on more realistic inputs. 
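The idea of probing a model with synthetic extremes can be shown with a toy example. The "model" below is just an invented threshold rule for blood-pressure readings, not any real clinical system; the point is that synthetic out-of-range inputs expose a missing validation branch before deployment does.

```python
import random

# Sketch: probe a toy classifier with synthetic edge-case inputs.
# The rule and its thresholds are invented for illustration only.
def classify_bp(systolic: float) -> str:
    """Toy rule: label a systolic blood-pressure reading."""
    if systolic < 120:
        return "normal"
    if systolic < 140:
        return "elevated"
    return "high"

rng = random.Random(7)
# Synthetic edge cases: values far outside any plausible reading.
edge_cases = ([rng.uniform(-50, 0) for _ in range(5)] +
              [rng.uniform(300, 500) for _ in range(5)])

for x in edge_cases:
    # A negative reading coming back "normal" flags a missing
    # input-validation branch worth fixing before deployment.
    print(f"{x:7.1f} -> {classify_bp(x)}")
```

Here every impossible negative reading is silently labeled "normal", exactly the kind of failure synthetic edge-case data is designed to surface.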

                  Rapid Prototyping and Cost-Efficiency: Accelerating Innovation

                  Developing proof-of-concepts often demands quick iterations and experimentation. Waiting for access to a large volume of real data can slow down the process significantly [5]. Synthetic data, available on-demand, expedites prototyping, saving time and resources. Moreover, its cost-effectiveness makes it particularly appealing for startups and projects with limited budgets. 

                  Data Labeling, Annotation, and Augmentation: Fueling Machine Learning Advancements

                  For machine learning projects that use unstructured, uncleaned real data, data labeling and annotation are imperative yet frequently time-consuming tasks. Synthetic data, equipped with predefined labels, can streamline these processes, allowing data scientists and researchers alike to innovate more efficiently. Additionally, when integrated with real data, synthetic data can augment datasets, enhancing the performance of already-robust machine learning models [6]. For image classification models, augmentation can include adding noise to images, flipping original training data, and even scaling original images to create new examples for models to train with.
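Two of those augmentations, flipping and noise, can be sketched on a tiny grayscale "image" represented as a list of rows. This is a bare-bones illustration; real pipelines typically use library transforms rather than hand-rolled loops.

```python
import random

# Sketch of two classic augmentations on a tiny grayscale "image"
# (a list of pixel rows). Real pipelines use library transforms.
image = [
    [0, 10, 20],
    [30, 40, 50],
    [60, 70, 80],
]

def hflip(img):
    """Mirror each row left-to-right (horizontal flip)."""
    return [list(reversed(row)) for row in img]

def add_noise(img, scale=5, seed=0):
    """Add small uniform noise, clamped to the 0-255 pixel range."""
    rng = random.Random(seed)
    return [[max(0, min(255, px + rng.randint(-scale, scale))) for px in row]
            for row in img]

# Each augmented copy inherits the original image's label for free.
augmented = [hflip(image), add_noise(image)]
print(augmented[0])
```

Because the label of the original image carries over unchanged, each augmentation yields a new labeled training example at essentially zero annotation cost.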

                  Embracing the Future of Innovation

                  In conclusion, synthetic data emerges as a game-changing tool for technology companies, enabling them to innovate safely and efficiently. As the world of AI and analytics continues to evolve, embracing synthetic data in proof-of-concept projects already is, and will continue to be, instrumental in overcoming challenges and fostering a future where innovation knows no bounds. By leveraging the power of synthetic data, businesses can create a safer, more inclusive, and technologically advanced world for us all. 

                  Want to learn more? Watch our video on synthetic data usage and other related data-wrangling topics, featuring our Chief Data Scientist and Co-founder, Dr. Tim Oates. 


                  About Synaptiq

                  Synaptiq is an AI and data science consultancy based in Portland, Oregon. We collaborate with our clients to develop human-centered products and solutions. We uphold a strong commitment to ethics and innovation. 

                  Contact us if you have a problem to solve, a process to refine, or a question to ask.

                  You can learn more about our story through our past projects, blog, or podcast.
