Abstract

Generative Large Language Models (LLMs) inevitably produce untruthful responses. Accurately predicting the truthfulness of these outputs is critical, especially in high-stakes settings. To accelerate research in this domain and make truthfulness prediction methods more accessible, we introduce TruthTorchLM, an open-source, comprehensive Python library featuring over 30 truthfulness prediction methods, which we refer to as Truth Methods.

TruthTorchLM offers a broad and extensible collection of techniques. These methods span diverse trade-offs in computational cost, access level (e.g., black-box vs. white-box), grounding document requirements, and supervision type (self-supervised or supervised).

TruthTorchLM is seamlessly compatible with both HuggingFace and LiteLLM, enabling support for locally hosted and API-based models. It also provides a unified interface for generation, evaluation, calibration, and long-form truthfulness prediction, along with a flexible framework for extending the library with new methods.

System Overview

TruthTorchLM provides a unified interface for generation, evaluation, calibration, and long-form truthfulness prediction.

Architecture & Workflow

The library is designed around a central abstraction: Truth Methods, which predict the truthfulness of LLM-generated outputs. Users can generate responses for any input query and apply one or more truth methods to assess reliability.

Figure: TruthTorchLM system overview.
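To make the abstraction concrete, the sketch below reimplements one of the simplest self-supervised signals a truth method can use, the average log-probability the model assigns to its own answer, with plain HuggingFace APIs. It illustrates the kind of score such methods produce; it is not TruthTorchLM's implementation, and the checkpoint name and prompt/answer strings are arbitrary choices for the example.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small open checkpoint used purely for illustration; any causal LM works.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def mean_token_logprob(prompt: str, answer: str) -> float:
    """Average log-probability the model assigns to `answer` given `prompt`.

    Assumes the tokenization of `prompt` is a prefix of the tokenization of
    `prompt + answer`, which holds for typical tokenizers and inputs."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i of the shifted log-probs predicts token i+1 of the input.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    answer_ids = full_ids[0, prompt_len:]
    token_lp = log_probs[0, prompt_len - 1:, :].gather(
        -1, answer_ids.unsqueeze(-1)).squeeze(-1)
    return token_lp.mean().item()

# Higher (less negative) scores indicate higher model confidence in the answer.
print(mean_token_logprob("Q: What is the capital of France?\nA:", " Paris"))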

🔧 Unified Generation

Seamless integration with HuggingFace and LiteLLM. Generate responses and get truth values in a single function call.

📊 Evaluation Tools

Built-in support for AUROC, PRR, F1 score, and other metrics. Use traditional or LLM-as-a-judge correctness evaluation.
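To make the metrics concrete, here is a small self-contained sketch (not the library's evaluation code) of how AUROC and a common discrete formulation of PRR (Prediction Rejection Ratio) can be computed from truth values and binary correctness labels; TruthTorchLM's own evaluation utilities and exact PRR variant may differ.

import numpy as np
from sklearn.metrics import roc_auc_score

def prediction_rejection_ratio(truth_values, correctness):
    """PRR: 1.0 ~ oracle-like ranking, 0.0 ~ random, < 0 worse than random."""
    truth_values = np.asarray(truth_values, dtype=float)
    errors = 1.0 - np.asarray(correctness, dtype=float)   # 1 = incorrect answer
    n = len(errors)

    def rejection_auc(ordering):
        # Mean error on the retained set as we reject 0, 1, ..., n-1 samples.
        return np.mean([errors[ordering[k:]].mean() for k in range(n)])

    method_order = np.argsort(truth_values)   # reject lowest-scored answers first
    oracle_order = np.argsort(-errors)        # reject incorrect answers first
    random_auc = errors.mean()                # flat rejection curve
    return (random_auc - rejection_auc(method_order)) / (
        random_auc - rejection_auc(oracle_order))

# Toy data: a higher truth value should mean the answer is more likely correct.
scores = [0.9, 0.2, 0.7, 0.4, 0.8]
labels = [1, 0, 1, 0, 1]                      # 1 = correct, 0 = incorrect
print("AUROC:", roc_auc_score(labels, scores))
print("PRR:  ", prediction_rejection_ratio(scores, labels))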

⚖️ Calibration

Normalize truth scores to [0,1] using Isotonic Regression or min-max normalization for meaningful comparison and ensembling.
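As an illustration of the two strategies mentioned above, the sketch below (using scikit-learn rather than TruthTorchLM's calibration API, and with made-up toy scores) fits an isotonic map from raw truth scores to [0, 1] on a labeled calibration split and contrasts it with min-max scaling, which preserves ranking but is not a probability estimate.

import numpy as np
from sklearn.isotonic import IsotonicRegression

# Calibration split: raw truth scores and whether each answer was correct.
raw_scores = np.array([-3.2, -1.1, -0.4, -2.5, -0.8, -1.9])
correct = np.array([0, 1, 1, 0, 1, 0])

# Isotonic regression: monotone map from raw score to an estimate of P(correct).
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(raw_scores, correct)

# Min-max normalization: cheaper, rank-preserving rescaling to [0, 1].
lo, hi = raw_scores.min(), raw_scores.max()
def min_max(scores):
    return (np.asarray(scores, dtype=float) - lo) / (hi - lo)

new_scores = np.array([-2.0, -0.5])
print("isotonic:", iso.predict(new_scores))   # calibrated scores in [0, 1]
print("min-max :", min_max(new_scores))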

📝 Long-form Support

Decompose long-form generations into individual claims and assess truthfulness at the claim level using specialized methods.
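The sketch below illustrates the claim-level workflow conceptually; it is not TruthTorchLM's long-form API. A naive sentence split stands in for the library's LLM-based claim decomposition, and a verbalized-confidence prompt sent through LiteLLM (the model name and prompt wording are arbitrary choices here) stands in for its specialized claim-level truth methods.

import re
import litellm

generation = ("Marie Curie was born in Warsaw in 1867. She won two Nobel Prizes. "
              "She was the first woman to win a Nobel Prize.")

# Stand-in decomposition: one claim per sentence.
claims = [c.strip() for c in re.split(r"(?<=[.!?])\s+", generation) if c.strip()]

def verbalized_confidence(claim: str) -> float:
    """Ask an API model for a 0-100 confidence that the claim is true."""
    response = litellm.completion(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": "How confident are you that the following claim is "
                              "factually true? Reply with a single integer from 0 "
                              f"to 100.\nClaim: {claim}"}])
    match = re.search(r"\d+", response.choices[0].message.content)
    return int(match.group()) / 100 if match else 0.5

for claim in claims:
    print(f"{verbalized_confidence(claim):.2f}  {claim}")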

Truth Methods

A representative subset of the available methods, listed with their access level. Methods further differ in whether they require grounding documents, supervision, and multiple sampled generations.

| Truth Method | Access Level |
| --- | --- |
| LARS | Grey-box |
| MARS | Grey-box |
| SelfDetection | Black-box |
| PTrue | Grey-box |
| AttentionScore | White-box |
| CrossExamination | Black-box |
| Eccentricity | Black-box |
| GoogleSearchCheck | Black-box |
| Inside | White-box |
| KernelLanguageEntropy | Black-box |
| MiniCheck | Black-box |
| Matrix-Degree | Black-box |
| SAPLMA | White-box |
| SemanticEntropy | Grey-box |
| MultiLLMCollab | Black-box |
| SAR | Grey-box |
| VerbalizedConfidence | Black-box |
| DirectionalEntailmentGraph | Black-box |

Access levels: Black-box = output only; Grey-box = output probabilities; White-box = internal representations.

Benchmark Results

AUROC and PRR performance on TriviaQA, GSM8K, and FactScore-Bio, with LLaMA-3 8B and GPT-4o-mini as the generating models.

LLaMA-3 8B

| Truth Method | TriviaQA AUROC | TriviaQA PRR | GSM8K AUROC | GSM8K PRR | FactScore-Bio AUROC | FactScore-Bio PRR |
| --- | --- | --- | --- | --- | --- | --- |
| LARS | 0.861 | 0.783 | 0.834 | 0.719 | 0.677 | 0.391 |
| MARS | 0.763 | 0.635 | 0.730 | 0.488 | 0.660 | 0.367 |
| SelfDetection | 0.780 | 0.590 | 0.556 | 0.090 | 0.687 | 0.369 |
| PTrue | 0.727 | 0.485 | 0.654 | 0.307 | 0.670 | 0.368 |
| AttentionScore | 0.523 | 0.092 | 0.503 | -0.024 | 0.644 | 0.263 |
| CrossExamination | 0.664 | 0.377 | 0.585 | 0.187 | 0.683 | 0.361 |
| Eccentricity | 0.809 | 0.645 | 0.703 | 0.450 | 0.695 | 0.415 |
| GoogleSearchCheck | 0.672 | 0.470 | -- | -- | -- | -- |
| Inside | 0.711 | 0.478 | 0.689 | 0.354 | 0.636 | 0.221 |
| KernelLanguageEntropy | 0.792 | 0.596 | 0.662 | 0.296 | 0.680 | 0.396 |
| SAPLMA | 0.850 | 0.726 | 0.815 | 0.642 | 0.651 | 0.347 |
| SemanticEntropy | 0.799 | 0.652 | 0.699 | 0.417 | 0.682 | 0.403 |
| MultiLLMCollab | 0.632 | 0.350 | 0.689 | 0.320 | 0.681 | 0.347 |
| SAR | 0.804 | 0.679 | 0.768 | 0.590 | 0.674 | 0.389 |
| VerbalizedConfidence | 0.759 | 0.547 | 0.579 | 0.234 | 0.698 | 0.460 |
| DirectionalEntailmentGraph | 0.745 | 0.513 | 0.731 | 0.501 | 0.659 | 0.347 |

GPT-4o-mini

| Truth Method | TriviaQA AUROC | TriviaQA PRR | GSM8K AUROC | GSM8K PRR | FactScore-Bio AUROC | FactScore-Bio PRR |
| --- | --- | --- | --- | --- | --- | --- |
| LARS | 0.852 | 0.766 | 0.840 | 0.686 | 0.640 | 0.294 |
| MARS | 0.792 | 0.668 | 0.735 | 0.480 | 0.655 | 0.405 |
| SelfDetection | 0.799 | 0.587 | 0.736 | 0.421 | 0.671 | 0.313 |
| PTrue | 0.772 | 0.509 | 0.833 | 0.636 | 0.658 | 0.372 |
| AttentionScore | -- | -- | -- | -- | -- | -- |
| CrossExamination | 0.718 | 0.483 | 0.768 | 0.551 | 0.635 | 0.289 |
| Eccentricity | 0.817 | 0.632 | 0.754 | 0.455 | 0.671 | 0.421 |
| GoogleSearchCheck | 0.779 | 0.673 | -- | -- | -- | -- |
| Inside | -- | -- | -- | -- | -- | -- |
| KernelLanguageEntropy | 0.820 | 0.635 | 0.706 | 0.349 | 0.678 | 0.397 |
| SAPLMA | -- | -- | -- | -- | -- | -- |
| SemanticEntropy | 0.813 | 0.673 | 0.735 | 0.464 | 0.681 | 0.447 |
| MultiLLMCollab | 0.778 | 0.565 | 0.933 | 0.879 | 0.671 | 0.399 |
| SAR | 0.835 | 0.724 | 0.764 | 0.512 | 0.671 | 0.433 |
| VerbalizedConfidence | 0.836 | 0.740 | 0.652 | 0.369 | 0.717 | 0.514 |
| DirectionalEntailmentGraph | 0.778 | 0.532 | 0.736 | 0.439 | 0.658 | 0.380 |

Note: "--" indicates the method is not applicable in that setting (e.g., white-box methods require internal representations, which API-based models such as GPT-4o-mini do not expose).

Quick Start

Install the package from PyPI and get started with TruthTorchLM in just a few lines of code.

pip install TruthTorchLM

import TruthTorchLM as ttlm
from transformers import AutoModelForCausalLM, AutoTokenizer

# Define truth methods
lars = ttlm.truth_methods.LARS()
confidence = ttlm.truth_methods.Confidence()
self_detection = ttlm.truth_methods.SelfDetection(number_of_questions=5)
truth_methods = [lars, confidence, self_detection]

# Define chat input
chat = [{"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital city of France?"}]

# Load a locally hosted HuggingFace model
# (any causal LM works; the Llama-3 checkpoint below is just an example)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Generate with the HuggingFace model
output = ttlm.generate_with_truth_value(
    model=model, tokenizer=tokenizer,
    messages=chat,
    truth_methods=truth_methods,
    max_new_tokens=100, temperature=0.7)

# Or use an API-based model through LiteLLM
output = ttlm.generate_with_truth_value(
    model="gpt-4o", messages=chat,
    truth_methods=truth_methods)