
Multimodal Series: The Surprising Link Between The Human Mind and Machine Perception

By Moments Lab Content Team
March 3, 2020

Human Decision Making: When Do We Use Confidence Levels?

Confidence levels can be used any time one is estimating or predicting something: in business, engineering, medicine, technology... or just day-to-day life.

As humans, we use confidence levels regularly. Whether you duck down an aisle at the grocery store because you thought you saw your chatty neighbor, or weigh evidence and intuition to convict a suspect during jury duty, your mind is in a constant state of perceiving its surroundings. It makes decisions based on those perceptions via an inherent estimate of confidence.

Human Multimodal Decision Making
Human confidence levels in decision making are complex and intricate, sometimes leading to serious consequences.

So how does your mind generate confidence levels? Well... it all happens in a split second. The brain uses an innate multimodal approach to better perceive external scenarios and draw conclusions. In the first phase of the brain’s decision-making process, most people rely on their five senses.

This is called a cognitive shortcut, or heuristic: a technique that is immediate and instinctive, but not guaranteed to be optimal (an educated guess or sensory instinct, for example).

A second layer then cross-checks this analysis against previous experience. This is a more advanced thought process which either validates or invalidates the brain’s initial confidence level in making a decision.

This logic is based on basic evolutionary traits that allowed our ancestors not just to survive, but to live longer. Making correct decisions meant outliving the bad decision-makers.

Human Confidence Levels
Should I eat this? Are the color and shape normal... have I eaten something like this before?

In modern-day science, confidence levels become more statistics-based and less intuitive.

Confidence Level in Statistics vs. Machine Learning

Do you remember your first Statistics class? The heavy textbook you never used, the lecture hall full of students... and (if you were paying attention) a board that looked something like this?

Statistics Confidence Levels
Do you remember taking a statistics class and seeing a board like this?

The image above is an example of estimation statistics. It shows a confidence interval (CI) running from 0.44 to 0.64. For reference, the CI is the range of values expected to contain the true parameter of the population of interest. The confidence level is 95%: if you repeated the experiment many times, roughly 95% of the resulting intervals would contain that true parameter. With this information, we can treat the estimate as “trusted” with 95% confidence.
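To make that concrete, here is a minimal sketch (in Python) of how such an interval could be computed. The sample of 54 successes out of 100 trials is hypothetical, chosen purely so the result lands near the interval on the board:

```python
import math

# Hypothetical sample: 54 successes out of 100 trials.
successes, n = 54, 100
p_hat = successes / n                            # sample proportion
z = 1.96                                         # z-score for a 95% confidence level
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)  # margin of error

print(f"95% CI: [{p_hat - margin:.2f}, {p_hat + margin:.2f}]")
# -> 95% CI: [0.44, 0.64]
```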

Where your statistics professor painstakingly calculated these figures on a blackboard, neural network models now produce results much more quickly, and with less room for human error.

As applied to machine learning, a minimum confidence level is pre-defined for an AI model: results below that threshold are rejected, while those above it are validated (the approach remains empirical).

This solution is still based on estimation statistics - except instead of taking 30 minutes at the blackboard, the data sets are populated instantaneously.
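As a rough sketch of what such a pre-defined threshold might look like in code - the model outputs, labels and cutoff value below are all hypothetical:

```python
MIN_CONFIDENCE = 0.80  # pre-defined minimum confidence level (illustrative)

def validate(predictions):
    """Keep only the predictions that meet the minimum confidence level."""
    return [(label, conf) for label, conf in predictions if conf >= MIN_CONFIDENCE]

results = [("cat", 0.92), ("dog", 0.35), ("car", 0.88)]
print(validate(results))
# -> [('cat', 0.92), ('car', 0.88)]
```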

Neighbors in Tokyo?

That’s kind of heavy, so let’s take a lighter example.

Say you live in New York and take a summer vacation to Tokyo. You just finished watching an amazing sumo match, and now you are walking back to your hotel, enjoying some evening sightseeing.

Facial Recognition_Neighbors in Tokyo
You’re walking through the crowded streets of Tokyo… is that your neighbor Paul?!

Suddenly, someone catches your eye as you’re walking through the crowded streets. It’s a familiar face, but you can’t quite pinpoint his identity. As you pass him and get a closer view, you realize this person looks a lot like your neighbor in Manhattan, Paul Johnson.

But it can’t be, so you keep walking, unfazed. The likelihood of Paul being in the exact same place as you at the same time, so far away from home, is low - nearly impossible in your mind. And well... nearly impossible statistically speaking, too.

... Except that it was Paul.

As a recap, in this case, you:

  1. Recognized a familiar face in an unfamiliar environment
  2. Cross-analyzed what you sensed (saw) with your previous experiences and knowledge
  3. Had doubt, and a limited level of confidence due to conflicting analysis

This example showcases the immense power of the human mind.

The missing element was context: you saw his face, but you did not know that he was indeed on vacation in the same city as you.

In this instance, perhaps your innate confidence level was a 2 out of 10 - showing just how much the human brain relies on context to make correct decisions.

20% confidence level
You may feel 20% confident that you just saw your neighbor... lacking conviction due to unfamiliar context.

When Do We Need Technology's Help? Machine Confidence

Just as the chemistry and neural architecture of the brain help humans draw conclusions and recognize faces, AI is able to use statistics and neural networks to detect various components.

Such elements include faces, objects, language and context, just to name a few.

Similar to how humans use basic senses and contextual information to make decisions and estimate confidence levels, AI algorithms can now replicate this process.

Machines are now able to detect what’s in a video, for example, by cross-analyzing facial recognition, speech-to-text, metadata and more - aligning multiple sources of data to better perceive results, in record-breaking time.
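As a simple illustration of what that cross-analysis can look like, here is a sketch of one common approach: late fusion via a weighted average of per-modality confidence scores. The detectors, scores and weights are all hypothetical:

```python
# Per-modality confidence that the same person appears in a video clip.
modality_scores = {
    "facial_recognition": 0.60,  # visual match on the face
    "speech_to_text":     0.90,  # the person's name was spoken in the clip
    "metadata":           0.40,  # weak hints from the filename and location
}
weights = {
    "facial_recognition": 0.5,
    "speech_to_text":     0.3,
    "metadata":           0.2,
}

fused = sum(score * weights[m] for m, score in modality_scores.items())
print(f"fused confidence: {fused:.2f}")  # -> fused confidence: 0.65
```

No single cue is decisive on its own, but aligning the three produces a more trustworthy overall confidence - much like the brain cross-checking its senses against experience.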

Multimodal Perception_Newsbridge
AI inspired by human neural networks.

That said, just as the human brain has a margin for error, the same can be said for deep learning. After all, these algorithms were created by humans.

In no way does AI take the place of a human brain, but it does emulate the brain’s perception process via multimodal analysis.

Multimodal Analysis and Confidence Levels in Practice: Real-Life Applications

Understanding and perceiving the world around us originates from a mixture of fascination, necessity and survival. In today’s digital-focused society, many organizations understand the importance of taking in information quickly and on multiple levels.

Here are some examples:

Automotive Industry: Tesla Says Autopilot AI Technology Is 9x Safer than Human Drivers

In its third-quarter 2019 update, Tesla stated that Autopilot technology is “9x safer than the average human driver [in the U.S.].” The report supports this statement with the fact that cars on Autopilot were involved in only one accident for every 4.34 million miles (7 million km) driven during the quarter in the U.S., whereas the average human driver is involved in one accident for every half million miles (800 thousand km). In terms of machine decision making, the confidence level of Tesla’s Autopilot technology is not just high-functioning; it is life-saving.
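A quick back-of-the-envelope check of that ratio, using the figures quoted above:

```python
autopilot_miles_per_accident = 4.34e6  # one accident per 4.34 million miles
human_miles_per_accident = 0.5e6       # one accident per half million miles

ratio = autopilot_miles_per_accident / human_miles_per_accident
print(f"~{ratio:.1f}x farther between accidents")  # -> ~8.7x, rounded up to 9x
```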

Tesla Autopilot Multimodal AI
Tesla Autopilot technology aims to reduce crashes by 50 percent.

Mandatory Facial Recognition: Cell Phones in China

Perhaps you heard about it in the news: China is making facial recognition mandatory for all new mobile phone users in order to protect citizens against online fraud. Telecom companies now use artificial intelligence, among other technical methods, to make extremely important decisions about online identity - so it is fair to say that the required confidence levels must be extremely high.

Facial Recognition
Facial recognition is changing the lives of Chinese citizens.

Multimodal Cognitive Video Indexing for Media and Broadcast Entities

Moments Lab created a system of multimodal analysis that uses machine learning to train and adapt its AI algorithms. More specifically, journalists can upload all of their images and videos to a secure cloud-based platform. From there, digital content is automatically tagged and indexed via multimodal facial recognition and object detection (visual) and speech-to-text technology (audio), among others. The technology can also read “open text” sources to improve results across various scenarios.
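As a toy illustration only (not Moments Lab’s actual implementation), merging per-modality tags into a single index entry could look something like this; the detector names and outputs are hypothetical:

```python
from collections import defaultdict

def index_asset(asset_id, detections):
    """Merge per-modality (tag, confidence) results, keeping the best score per tag."""
    tags = defaultdict(float)
    for modality, results in detections.items():
        for tag, conf in results:
            tags[tag] = max(tags[tag], conf)
    return {"asset": asset_id, "tags": dict(tags)}

entry = index_asset("sumo_match.mp4", {
    "face_recognition": [("Paul Johnson", 0.93)],
    "object_detection": [("sumo ring", 0.88), ("crowd", 0.75)],
    "speech_to_text":   [("sumo", 0.96), ("Tokyo", 0.91)],
})
print(entry)
# -> {'asset': 'sumo_match.mp4', 'tags': {'Paul Johnson': 0.93, 'sumo ring': 0.88, ...}}
```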

In Summary

Perhaps one of the main takeaways is that as machine learning and AI continue to advance, human logic and decision-making remain at the heart of machine perception.

In turn, many more technological advancements and revolutions are to come. Just as humans use confidence levels to make important decisions, today’s multimodal AI machines use a similar rationale to make judgements and take action in a short period of time.

This is leading to more profitable, secure and sometimes life-saving results.
