7 Prompt Engineering Methods To Reduce LLM Hallucinations


Large language models (LLMs) are exceptional at analyzing, synthesizing, and creatively producing text. However, they remain vulnerable to the widespread problem of hallucinations: output that sounds confident but is inaccurate, unverifiable, or outright nonsensical.

 

LLMs do not validate their outputs against grounded truth; they synthesize text from complex statistical and probabilistic patterns learned during training. In high-stakes fields such as medicine, law, or finance, this can have serious consequences. One useful tactic for reducing hallucinations is robust prompt engineering: the careful construction of well-structured prompts with instructions, constraints, and context.

 

The seven methods described in this article, each illustrated with a prompt template, show how both standalone LLMs and retrieval augmented generation (RAG) systems can become more resilient to hallucinations simply by incorporating these patterns into your user queries.

 

1. Promote “I Don’t Know” And Abstention Responses

LLMs tend to give responses that sound confident even when they are uncertain, which occasionally leads to fabricated facts. Explicitly permitting abstention steers the LLM away from this false confidence. Let’s examine an example prompt:

 

You work as an assistant for fact-checking. Say something like, “I don’t have enough information to answer that,” if you are unsure of your response. Provide a brief explanation for your response if you are confident in it.

A real query or claim to fact-check would follow the prompt above.

 

The following is an example of an expected response:

“I don’t know enough to respond to that.”

or

“The answer, given the evidence at hand, is …” (reasoning).

Although this is a good first line of defense, nothing prevents an LLM from occasionally ignoring these instructions. Let’s see what else we can do.
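If you want to apply this template programmatically rather than pasting it by hand, here is a minimal sketch in Python. The `call_llm` reference and the sample query are placeholders, not part of any specific library; swap in whatever chat-completion client you use.

```python
# Minimal sketch: wrap any user query in an abstention-friendly system prompt.
# `call_llm` is a placeholder for your own LLM client; nothing here is tied to
# a specific provider.

ABSTAIN_SYSTEM_PROMPT = (
    "You work as an assistant for fact-checking. "
    "If you are unsure of your response, say: "
    '"I don\'t have enough information to answer that." '
    "If you are confident, provide a brief explanation for your answer."
)

def build_abstention_messages(user_query: str) -> list[dict]:
    """Return a chat-style message list that explicitly permits abstention."""
    return [
        {"role": "system", "content": ABSTAIN_SYSTEM_PROMPT},
        {"role": "user", "content": user_query},
    ]

messages = build_abstention_messages("Who won the 2026 Fields Medal?")
# answer = call_llm(messages)  # replace with your provider's chat call
print(messages)
```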

2. Methodical, Sequential Reasoning

Asking a language model to reason sequentially encourages internal consistency and closes the logical gaps that can otherwise lead to hallucinations. In essence, chain-of-thought (CoT) prompting has the model simulate an algorithm: a sequence of steps it must complete one after another to solve the task at hand. Again, the following sample template is meant to be combined with your own problem-specific prompt.

 

“Please consider this issue step-by-step:

1) What details are provided?
2) What presumptions are required?
3) What is the logical conclusion?”

 

An example of the anticipated response:

“1) Known facts: A, B. 2) Assumptions: C. 3) Therefore, the conclusion is D.”
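A simple way to reuse this scaffold in code is to keep it as a template string and fill in the problem at call time. The sketch below is illustrative; the `build_cot_prompt` helper and the sample problem are assumptions, not part of any library.

```python
# Minimal sketch: prepend a step-by-step reasoning scaffold to any problem.

COT_TEMPLATE = """Please consider this issue step-by-step:
1) What details are provided?
2) What presumptions are required?
3) What is the logical conclusion?

Problem: {problem}"""

def build_cot_prompt(problem: str) -> str:
    """Wrap a problem-specific prompt in the chain-of-thought scaffold."""
    return COT_TEMPLATE.format(problem=problem)

print(build_cot_prompt(
    "A train leaves at 09:40 and the journey takes 75 minutes. "
    "All trains are delayed by 20 minutes today. When does it arrive?"
))
```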

 

3. Using “According To” As A Foundation

The purpose of this prompt engineering technique is to tie the desired response to specific sources. This encourages fact-based reasoning and discourages invented content. It pairs well with technique 1 above.

 

Describe the primary causes of antibiotic resistance according to the 2023 World Health Organization (WHO) report. Say “I don’t know” if the report isn’t specific enough.

 

An example of the anticipated response:

The WHO (2023) states that uncontrolled drug sales, inadequate sanitation, and excessive antibiotic use are the primary causes. No additional information is provided.
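The same pattern can be generated from a small template so the source name travels with every query. The helper function and the source string below are illustrative placeholders under that assumption, not a fixed API.

```python
# Minimal sketch: anchor the question to a named source and allow abstention.

GROUNDED_TEMPLATE = (
    "According to {source}, {question} "
    'If the source is not specific enough, say "I don\'t know."'
)

def build_grounded_prompt(question: str, source: str) -> str:
    """Combine a question with an explicit source attribution."""
    return GROUNDED_TEMPLATE.format(question=question, source=source)

print(build_grounded_prompt(
    question="what are the primary causes of antibiotic resistance?",
    source="the 2023 World Health Organization (WHO) report",
))
```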

 

4. RAG With Context And Explicit Instruction

RAG gives the model access to a knowledge base or document store containing up-to-date or verified text. However, hallucinations remain possible in RAG systems unless a well-designed prompt instructs the model to use only the retrieved text.

 

*[Presume X and Y are the two documents that were retrieved]*
List the primary reasons for deforestation in the Amazon basin and associated infrastructure projects using only the data from X and Y. Say “insufficient data” if a point isn’t covered in the docs.

 

An example of the anticipated response:

According to Doc X and Doc Y, illegal logging and agricultural expansion are major contributing factors. Insufficient data for infrastructure projects.
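One way to make the “use only these documents” instruction systematic is to build the prompt from whatever your retriever returns. In the sketch below, `retrieved_docs` stands in for the retrieval step (vector store, BM25, and so on), and the document snippets are invented placeholders.

```python
# Minimal sketch: inject retrieved passages and restrict answers to them.
# `retrieved_docs` stands in for the output of your retriever; the snippets
# below are invented placeholders, not real documents.

def build_rag_prompt(question: str, retrieved_docs: dict[str, str]) -> str:
    """Format retrieved documents plus a use-only-these-sources instruction."""
    context = "\n\n".join(
        f"[Doc {name}]\n{text}" for name, text in retrieved_docs.items()
    )
    return (
        "Answer the question using ONLY the documents below. "
        'If a point is not covered in the documents, say "insufficient data".\n\n'
        f"{context}\n\nQuestion: {question}"
    )

docs = {
    "X": "Illegal logging has expanded along newly opened access roads ...",
    "Y": "Agricultural growth, especially cattle ranching, drives clearing ...",
}
print(build_rag_prompt(
    "What are the main causes of deforestation in the Amazon basin, "
    "including infrastructure projects?",
    docs,
))
```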


5. Limiting Scope And Output Restrictions

Strictly regulating the format and length of generated output helps prevent results that drift away from the source material. This reduces hallucinations that take the form of speculative or tangential statements, such as unsupported causal claims, overly elaborate chains of reasoning, or made-up statistics.

 

Restricting the “degrees of freedom” of the answer space, rather than letting the model fill in the blanks “no matter what,” improves the likelihood of obtaining reliable information.

“Summarize the function of mitochondria in human cells in no more than 100 words. If you’re not sure, say ‘I don’t know.’”

 

An example of the anticipated response:

Through oxidative phosphorylation, mitochondria produce ATP, control calcium levels, and affect apoptosis. I don’t have enough information about other roles.
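Length and format constraints are easy to parameterize so the same guardrail applies to every summary request. The helper below is only a sketch; the word limit and exact wording are assumptions you can tune.

```python
# Minimal sketch: constrain length and content, and give an explicit opt-out.

def build_constrained_prompt(topic: str, max_words: int = 100) -> str:
    """Ask for a bounded summary with no unsupported claims."""
    return (
        f"Summarize {topic} in no more than {max_words} words. "
        "Do not include statistics or causal claims you cannot support. "
        "If you're not sure, say \"I don't know.\""
    )

print(build_constrained_prompt("the function of mitochondria in human cells"))
```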

6. Chain-Of-Verification Correction Loops Based On RAG

This method combines retrieval and self-checking so that the model can challenge and verify its own assertions against external data sources, most commonly in RAG systems. It primarily targets hallucinations caused by overconfidence.

 

“Step 1: Provide a preliminary answer to the query: ‘When did the Berlin Wall fall?’
Step 2: Locate and read relevant passages in a reliable historical database.
Step 3: Compare your answer against the retrieved evidence.
Step 4: If there are discrepancies, correct the answer and cite the retrieved source.”

 

An example of the anticipated response:

“Preliminary answer: 1989.
Retrieved evidence: Historical records confirm that the Berlin Wall was opened on November 9, 1989.
Final verified answer: The Berlin Wall fell on November 9, 1989, when East Germany opened its border crossings.”
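Expressed as code, the loop is just two model calls with a retrieval step in between. Both `call_llm` and `retrieve_evidence` below are placeholders for your own LLM client and knowledge base, and the prompt wording is a sketch of the four steps above.

```python
# Minimal sketch of a draft -> retrieve -> verify-and-correct loop.
# `call_llm` and `retrieve_evidence` are placeholders for your own LLM client
# and knowledge base; they are not real library functions.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your chat-completion client here")

def retrieve_evidence(query: str) -> str:
    raise NotImplementedError("plug in your retriever / database lookup here")

def answer_with_verification(question: str) -> str:
    """Draft an answer, fetch evidence, then ask the model to self-correct."""
    draft = call_llm(f"Provide a preliminary answer to: {question}")
    evidence = retrieve_evidence(question)
    return call_llm(
        "Compare the draft answer with the retrieved evidence. "
        "If they disagree, correct the answer and cite the source.\n\n"
        f"Question: {question}\n"
        f"Draft answer: {draft}\n"
        f"Evidence: {evidence}"
    )

# answer_with_verification("When did the Berlin Wall fall?")
```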


7. Safety Guidelines, Disclaimers, And Domain-Specific Prompts

In high-stakes application domains such as medicine, it is crucial to define narrow domain boundaries and require citations to sources, lowering the risk of speculative assertions that could cause real harm. Here is an example of how to do this:

 

“You are a certified medical information assistant. Describe the first-line treatment for adults with mild persistent asthma, using peer-reviewed research or official guidelines published before 2024. If you cannot name such a guideline, respond with: ‘I cannot provide a recommendation; consult a medical professional.’”

 

An example of the anticipated response:

The Global Initiative for Asthma (GINA) 2023 guideline states that a low-dose inhaled corticosteroid combined with a long-acting β₂-agonist, such as budesonide/formoterol, is the first-line treatment for mild persistent asthma. See a clinician for patient-specific adjustments.
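In code, the domain restriction and the fallback wording live in the system prompt, so they cannot be forgotten at query time. The system prompt below paraphrases the template above and is only a sketch; the cutoff date and fallback text are assumptions to adjust for your own setting.

```python
# Minimal sketch: a domain-scoped system prompt that requires citations and
# falls back to a referral instead of speculating. Wording is illustrative.

MEDICAL_SYSTEM_PROMPT = (
    "You are a medical information assistant. Answer only from peer-reviewed "
    "research or official guidelines published before 2024, and cite them. "
    "If you cannot name such a source, respond exactly with: "
    '"I cannot provide a recommendation; consult a medical professional."'
)

def build_medical_messages(question: str) -> list[dict]:
    """Pair the safety-scoped system prompt with a user question."""
    return [
        {"role": "system", "content": MEDICAL_SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]

print(build_medical_messages(
    "What is the first-line treatment for mild persistent asthma in adults?"
))
```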


This article outlined seven practical prompt engineering techniques, based on flexible templates for various contexts, that can reduce hallucinations (a frequent and sometimes persistent issue in these otherwise powerful models) when applied to LLMs or RAG systems.