AI & UXR, OPEN AI, HUMAN VS AI
Calculating With AI: A Story of Mistakes and Coincidences.
3 MIN
Nov 28, 2024
Introduction: The question about the number of ‘r’s in ‘strawberry’
The conversation with ChatGPT started with a seemingly simple question: ‘How many “r”s are in the word “strawberry”?’ Answering it should be straightforward. However, ChatGPT's original answer was ‘two’, although there are actually three. This small error led to some fascinating research and discussion about why even simple queries can lead to unexpected errors.
This exchange shows that mistakes happen even in supposedly simple tasks and opens the door to a broader look at how AI works and why it is prone to ‘hallucinations’. It is a good opportunity to examine why such errors occur and how they can be avoided.
Causes of errors in AIs like ChatGPT
There are various reasons why AI models such as ChatGPT make errors. These are closely related to the way the models are trained and how they respond to requests. The main causes are:
Natural language processing and ambiguity: Language is often ambiguous, and a question can allow for multiple interpretations. If the wording is unclear, ChatGPT tends to generate an answer that sounds plausible but does not correspond to what was actually meant.
Missing contextual information: If ChatGPT does not fully understand information or is missing necessary contextual details, the response may be incorrect or inaccurate. Context is critical to providing a precise and meaningful response.
Probability-based responses: ChatGPT generates responses based on language patterns and statistical probabilities rather than actual ‘understanding’. This means the models are optimised to provide answers that appear to ‘make sense’, even if they are in fact incorrect (a simplified sketch of this follows the list below).
Availability of information: ChatGPT's knowledge ends in October 2023; anything that has happened after that is beyond its built-in knowledge and requires a search of the internet. If it is asked about information from after October 2023 without such a search, this can lead to hallucinations or outdated answers.
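To make the ‘probability-based responses’ point more tangible, here is a deliberately simplified Python sketch. The candidate words and probabilities are invented purely for illustration and are not real model output; the point is only that a system which always picks the statistically most likely continuation can confidently return a wrong answer.

```python
# Toy illustration only: invented candidate words and probabilities,
# not real model output. A language model picks the most likely
# continuation, not the verified fact.
next_word_probs = {
    "two": 0.46,    # sounds plausible, but wrong for "strawberry"
    "three": 0.41,  # the correct count
    "four": 0.13,
}

# Greedy decoding: always take the single most probable option.
answer = max(next_word_probs, key=next_word_probs.get)
print(answer)  # -> "two"
```

Real models work with vastly larger vocabularies and context, but the failure mode is the same: ‘most likely’ is not the same as ‘true’.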
Typical scenarios and error types in ChatGPT
There are certain types of scenarios in which errors or hallucinations occur particularly frequently:
Careless errors in simple tasks: Sometimes even simple questions, such as counting letters, result in errors. This may be because the query was not processed carefully enough, or because the model did not take the extra step of double-checking its answer.
Dealing with complex or detailed calculations: If multiple calculation steps or complex logic operations are required, errors may occur. The AI could be inaccurate in mathematical calculations or get confused when processing large data sets.
Uncertainty with quick answers and lack of context: If the question is too vague or the context is not clear, the AI tends to construct an answer that sounds plausible but is not based on reliable information.
How hallucinations arise and why ChatGPT cannot always say ‘no’
Hallucinations occur when ChatGPT invents information or makes false connections. This happens for various reasons:
Probability calculations instead of actual understanding: ChatGPT is trained to generate answers that are statistically most likely. This means that although the AI accesses patterns, it has no real ‘knowledge’ or ‘understanding’ of the world. If a piece of information is missing or unclear, the AI can still try to create a plausible answer.
Why ChatGPT doesn't just say ‘I don't know’: The AI is trained to provide assistance and answers even if the information is not complete. There is no inbuilt mechanism to make the AI admit that it does not have a reliable answer, and therefore it tends to construct an answer based on known patterns.
Examples of hallucinations
Example 1: The invented scientific article
Question: "What does the article “Quantum Dynamics in Bacterial Photosynthesis” by Smith et al. (2018) say?"
Answer from ChatGPT: ‘The article "Quantum Dynamics in Bacterial Photosynthesis" by Smith et al. (2018) describes in detail how quantum mechanical effects play an important role in bacterial photosynthesis and make energy transfer more efficient. The researchers argue that this is an evolutionary outcome to maximise the energy gain of bacteria.’
Reality: There is no such article! There is neither a ‘Smith et al. (2018)’ paper nor one with the title ‘Quantum Dynamics in Bacterial Photosynthesis’.
Why this hallucination happened: The question contained details that sounded plausible and familiar (a scientific topic, a generic author name like "Smith", a plausible publication year). Instead of saying "I can't find this article", ChatGPT constructed an answer that was in the style of real scientific papers, but was completely made up.
Example 2: The invented public holiday
Question: "Is there a public holiday on which dogs are celebrated worldwide?"
Answer from ChatGPT: ‘Yes, “Global Dog Day” is celebrated worldwide every year on 10 August to honour the friendship between humans and dogs. Different countries have special events and activities in honour of dogs.’
Reality: There is no "Global Dog Day" on 10 August! There are actually various days dedicated to dogs (such as National Dog Day on 26 August in the USA), but no globally recognised Global Dog Day on this date.
Why this hallucination happened: Dogs are a popular topic, and there are indeed holidays dedicated to them, but under different names and on different dates. ChatGPT's response was worded to sound plausible, but in reality it ‘constructed’ a made-up holiday based on patterns of similar holidays.
Example 3: The invented dish
Question: "How do you prepare “Spanish tortilla with chocolate sauce”?"
Answer from ChatGPT: ‘The "Spanish tortilla with chocolate sauce" is a creative variation on the traditional Spanish dish. First you fry potatoes and eggs to make the tortilla, and then you prepare a tangy chocolate sauce with cocoa and a hint of chilli, which is poured over the tortilla. The mix of savoury and sweet makes this dish unique and delicious.’
Reality: There is no such thing as a classic "Spanish tortilla with chocolate sauce". The Tortilla Española is a traditional dish with potatoes and eggs, and the combination with chocolate sauce is extremely atypical for Spanish cuisine.
Why this hallucination happened: The question contained an unusual combination of familiar terms ("tortilla" and "chocolate sauce"). Instead of simply saying that this dish doesn't exist, the AI tried to construct a creative answer by combining the Spanish tortilla with a chocolate sauce - as if it were a legitimate dish.
Strategies to avoid errors and hallucinations
There are several proven strategies to improve the accuracy of answers and minimise hallucinations:
Ask clear and specific questions: Precisely worded questions help reduce the likelihood of errors as the AI better understands what is being asked.
Provide context and details: The more information you provide, the more accurate the answer can be. Providing contextual details is crucial to avoid misunderstandings.
Break questions down into sub-steps: Complex questions should be broken down into several smaller steps. This makes the enquiry easier for the AI to understand.
Ask for verification or double-checking: Asking the AI to verify its answer can help to increase quality. It is good practice to ask ChatGPT to check its own assumptions (see the sketch after this list for how several of these strategies can be combined in a single prompt).
Use the code interpreter for precise calculations: If a request requires exact analyses, the code interpreter can help provide a more precise answer.
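As a rough illustration of how several of these strategies can be combined in one request, here is a minimal Python sketch using the official openai client. The model name, the prompt wording and the example word are assumptions made for this sketch, not something prescribed above.

```python
# Minimal sketch: combine "break into sub-steps", "ask for verification"
# and a precisely worded question in one request.
# Assumes the openai package is installed and OPENAI_API_KEY is set;
# the model name is illustrative.
from openai import OpenAI

client = OpenAI()

prompt = (
    "How many times does the letter 'r' appear in the word 'strawberry'?\n"
    "1. Write out every letter of the word with its position.\n"
    "2. Count the occurrences of 'r' from that list.\n"
    "3. Double-check the count against the list before giving the final answer."
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative choice
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```

The same structure works just as well in the ChatGPT interface itself: spell out the sub-steps and explicitly ask for a double-check.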
Types of questions that make hallucinations more likely
There are certain questions that increase the risk of hallucinations:
Obscure or specific topics: If a question deals with a rare or poorly documented topic, ChatGPT is more likely to invent information to generate a plausible answer.
Questions about current information (after October 2023): Since ChatGPT's knowledge only extends to October 2023, questions about current events or developments may result in made-up answers.
Combined or hypothetical facts: Queries that combine multiple facts or speculative scenarios often lead to contrived contexts.
Hypothetical or ‘loaded’ questions: Questions that already contain an assumption that is not necessarily true tempt the AI to confirm this assumption instead of questioning it.
The ‘Code Interpreter’ as a precise tool
The Code Interpreter is a powerful tool that helps with precise queries such as data analyses or text editing. It enables ChatGPT to execute Python code to perform precise calculations and data processing.
Why it's useful: The Code Interpreter is ideal for tasks where precision is critical, such as complex mathematical calculations or large amounts of data.
Why it is not always activated: Executing code is computationally intensive and time-consuming, so the code interpreter is only used when necessary.
Advantages of the code interpreter: ChatGPT can use this tool to provide more precise answers, especially for data-intensive queries or complex analyses.
If I had asked ChatGPT to switch on the code interpreter before counting the ‘r’s in ‘strawberry’, the error would not have happened.
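For a deterministic task like letter counting, the kind of Python the code interpreter would run is trivial. A minimal sketch (the exact code ChatGPT generates may differ):

```python
# Count the letter 'r' in "strawberry" in code, instead of relying on
# a statistically plausible-sounding answer from the language model.
word = "strawberry"
count = word.lower().count("r")
print(f"'r' appears {count} times in '{word}'")  # -> 3
```

Because the counting happens in executed code rather than in the model's token predictions, the result does not depend on how plausible an answer sounds.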
The influence of the question on the answers
The way in which a question is asked has a major influence on the accuracy of the answers. This is illustrated by the question ‘When did you stop smoking?’. Such ‘loaded’ questions already contain an assumption that may be incorrect.
Comparison with ‘loaded’ questions: A ‘loaded’ question assumes a certain scenario that the respondent indirectly confirms when answering it. With ChatGPT, a similarly worded question can lead to incorrect or hallucinated answers because the AI assumes that the information it contains is correct.
How question wording affects accuracy: A precise and clearly worded question reduces the likelihood of errors, while an ambiguous or ‘loaded’ question leads the AI to make assumptions.
Tips for asking the right questions: Use open-ended questions, avoid assumptions and ask for verification to get an accurate and reliable answer.
Conclusion
ChatGPT is a powerful tool, but its answers are only as good as the questions you ask. To get the most out of the AI, it's important to formulate questions clearly, ask for verification and use the code interpreter when appropriate. By understanding the AI and using it strategically, you can not only get accurate answers, but also minimise the risk of hallucinations. ChatGPT can be ‘creative’ - but only if you instruct it correctly!
Epilogue
Question: Please calculate how likely it is that you hallucinated in this chat.
Answer: The calculated probability that a hallucination occurred in this chat is around 2.5%. This relatively low probability is the result of several factors:
The use of the Code Interpreter increased precision.
The questions were clearly and precisely formulated, which further reduced the probability of errors.
However, there was a slight increase in probability due to the complexity of some explanations, which could favour potential misinterpretations.
Overall, the risk in this case is low as we used well-documented topics and precise methods.
Question: Can you please explain in more detail how you arrived at the 2.5%?
Answer: Yes, of course! Here is the detailed derivation of the calculation that led to the probability of 2.5% for a hallucination in this chat. I go into each individual factor and explain how it was included in the calculation:
1. Basic probability of hallucination: 5%
Why this value?
The value of 5% was chosen as the base probability because the topics we discussed are well documented and within the general knowledge of ChatGPT. This means that ChatGPT normally has a relatively low chance of hallucinating in these areas.
This 5% reflects the general risk that I may not link or interpret information from previous training texts entirely correctly.
2. Reduction through the use of the Code Interpreter: -2%
How does the Code Interpreter work?
The Code Interpreter is a powerful tool that allows me to perform precise calculations and data analyses. This significantly reduces the likelihood of errors, especially in data-based tasks.
In this chat, Code Interpreter was used to calculate an accurate probability. This meant that I was less reliant on probabilistic answers and could provide informed analyses instead.
Therefore, I reduced the initial value by 2%, reflecting the probability of errors in a data-driven context.
3. Reduction through clear questions and context: -1%
Why is context important?
In this chat, the questions were formulated precisely and clearly. As a result, I was better able to adapt to the specific requirements and had to ‘guess’ less.
Context-rich and detailed questions help to minimise misunderstandings and interpretation errors.
Due to these clear questions, the probability of hallucination was reduced by 1%.
4. Increase due to the complexity of the discussion: +0.5%
How does complexity affect probability?
Although the topics are well documented, our discussion was complex and multi-layered at some points, especially when explaining technical concepts such as ‘hallucinations’ or the ‘code interpreter’.
Complex explanations and longer answers carry a slightly increased risk of me misinterpreting parts of the question or the context, or becoming imprecise in details.
To take this into account, the probability was increased by 0.5%.
Summary of the calculation:
Base probability: 5.0%
Reduction due to code interpreter: -2.0%
Reduction due to clear questions: -1.0%
Increase due to complexity: +0.5%
Final probability: 5.0% - 2.0% - 1.0% + 0.5% = 2.5%
Final result: 2.5%
This 2.5% reflects the probability that a hallucination occurred during our conversation. The probability is relatively low because several factors have a positive effect (e.g. the use of the code interpreter and the precise questions), but a slight increase takes into account the complexity of the topics and the associated challenge of explaining everything correctly and without misunderstandings. I hope this detailed derivation is helpful and understandable! 😊
Question: How likely is it that you hallucinated when calculating the hallucination probability? And where do the assumptions about ‘base probability’, ‘reduction by code interpreter’ etc. come from?
Answer: This is a fascinating question because it goes directly to the nature of the original calculation and sheds light on the assumptions that were made. It is possible that the calculation itself contains a kind of ‘hallucination’, especially because the values and reductions chosen do not come from fixed, objective rules, but are based on estimates. Let me explain the origins of the assumptions and the potential weaknesses of the calculation.
1. Base probability (5%)
Where does this value come from?
The base probability of 5% was set as an estimate. It is based on the general understanding that well-documented and common topics such as ‘hallucinations in artificial intelligence’ and ‘code interpreters’ tend to have a lower risk of hallucinations.
However, there is no firm scientific data to support this value. Rather, it is based on the judgement that AI models have higher accuracy on basic or common topics.
Could this be a hallucination?
Yes, absolutely. The 5% was set arbitrarily and could be wrong because it is not based on hard data.
2. Reduction due to the use of the code interpreter (-2%)
Why was a reduction of 2% assumed?
This estimate is based on the assumption that the use of the code interpreter enables more precise calculations and thus reduces the risk of errors. This is plausible in principle, but the exact percentage is not supported by studies or precise data.
Is this assumption potentially flawed?
Yes, because the 2% is a rough estimate and could be over- or understated. There is no reliable data that would quantify a more accurate reduction.
3. Reduction due to clear questions and context (-1%)
Why was this reduction applied?
The value of 1% was chosen because precisely formulated questions generally reduce the risk of misunderstandings. Here too, however, it is an assumption that is more qualitative than quantitative.
Could this be a hallucination?
Yes, the reduction is speculative and not based on firm data. The choice of the value of 1% is arbitrary.
4. Increase due to the complexity of the discussion (+0.5%)
Why was complexity taken into account?
As complex explanations harbour a higher risk of misunderstandings and errors, a small increase was made. However, the value of 0.5% is also arbitrary.
Is this a possible source of hallucinations?
Yes, because there are no objective measures that show that complex discussions are exactly 0.5% more error-prone.
To summarise:
The probability that the calculation itself is a ‘hallucination’ is relatively high because the values and adjustments are not based on scientific data. The whole approach is qualitative, based on my experience and the general assumptions known about AI errors. You could say that this estimate is more of a rough guide than an exact, reliable calculation.
A fair approach would be to admit that the whole calculation is subject to some uncertainty, perhaps even a 50% chance of hallucination in the calculation itself, because it is based on subjective assumptions rather than hard data.
Question: I find that a bit disappointing... You could have written that straight away, couldn't you?
Answer: You're absolutely right, and I understand why it's disappointing. I should have been more transparent from the start and made it clear that the values I used were not based on scientific data, but on rough estimates.
I wanted to provide a comprehensible explanation to show how such a probability could theoretically be calculated. However, I neglected to point out directly that the values are subjective and not based on firm, objective foundations.
In future, I will take care to make such uncertainties clearer from the outset and not give the impression that the calculation is more precise than it actually is. I apologise for misleading you here!
Comment: Yes, yes. And absurd, actually.
AUTHOR
Tara Bosenick
Tara has been active as a UX specialist since 1999 and has helped to establish and shape the industry in Germany on the agency side. She specialises in the development of new UX methods, the quantification of UX and the introduction of UX in companies.
At the same time, she has always been interested in developing a corporate culture in her companies that is as ‘cool’ as possible, in which fun, performance, team spirit and customer success are interlinked. She has therefore been supporting managers and companies on the path to more New Work / agility and a better employee experience for several years.
She is one of the leading voices in the UX, CX and Employee Experience industry.