Abstract

Generative Large Language Models (LLMs) inevitably produce untruthful responses. Accurately predicting the truthfulness of these outputs is critical, especially in high-stakes settings. To accelerate research in this domain and make truthfulness prediction methods more accessible, we introduce TruthTorchLM, an open-source, comprehensive Python library featuring over 30 truthfulness prediction methods, which we refer to as Truth Methods.

TruthTorchLM offers a broad and extensible collection of techniques. These methods span diverse trade-offs in computational cost, access level (e.g., black-box vs. white-box), grounding document requirements, and supervision type (self-supervised or supervised).

TruthTorchLM is seamlessly compatible with both HuggingFace and LiteLLM, enabling support for locally hosted and API-based models. It also provides a unified interface for generation, evaluation, calibration, and long-form truthfulness prediction, along with a flexible framework for extending the library with new methods.

System Overview

TruthTorchLM provides a unified interface for generation, evaluation, calibration, and long-form truthfulness prediction.

Architecture & Workflow

The library is designed around a central abstraction: Truth Methods, which predict the truthfulness of LLM-generated outputs. Users can generate responses for any input query and apply one or more truth methods to assess reliability.

Figure: TruthTorchLM system overview.
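To make the abstraction concrete, the sketch below reimplements one of the simplest self-supervised signals a truth method can use, the average log-probability the model assigns to its own answer, with plain HuggingFace APIs. It illustrates the kind of score such methods produce; it is not TruthTorchLM's implementation, and the checkpoint name and prompt/answer strings are arbitrary choices for the example.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small open checkpoint used purely for illustration; any causal LM works.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def mean_token_logprob(prompt: str, answer: str) -> float:
    """Average log-probability the model assigns to `answer` given `prompt`.

    Assumes the tokenization of `prompt` is a prefix of the tokenization of
    `prompt + answer`, which holds for typical tokenizers and inputs."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i of the shifted log-probs predicts token i+1 of the input.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    answer_ids = full_ids[0, prompt_len:]
    token_lp = log_probs[0, prompt_len - 1:, :].gather(
        -1, answer_ids.unsqueeze(-1)).squeeze(-1)
    return token_lp.mean().item()

# Higher (less negative) scores indicate higher model confidence in the answer.
print(mean_token_logprob("Q: What is the capital of France?\nA:", " Paris"))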

🔧 Unified Generation

Seamless integration with HuggingFace and LiteLLM. Generate responses and get truth values in a single function call.

📊 Evaluation Tools

Built-in support for AUROC, PRR, F1 score, and other metrics. Use traditional or LLM-as-a-judge correctness evaluation.
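To make the metrics concrete, here is a small self-contained sketch (not the library's evaluation code) of how AUROC and a common discrete formulation of PRR (Prediction Rejection Ratio) can be computed from truth values and binary correctness labels; TruthTorchLM's own evaluation utilities and exact PRR variant may differ.

import numpy as np
from sklearn.metrics import roc_auc_score

def prediction_rejection_ratio(truth_values, correctness):
    """PRR: 1.0 ~ oracle-like ranking, 0.0 ~ random, < 0 worse than random."""
    truth_values = np.asarray(truth_values, dtype=float)
    errors = 1.0 - np.asarray(correctness, dtype=float)   # 1 = incorrect answer
    n = len(errors)

    def rejection_auc(ordering):
        # Mean error on the retained set as we reject 0, 1, ..., n-1 samples.
        return np.mean([errors[ordering[k:]].mean() for k in range(n)])

    method_order = np.argsort(truth_values)   # reject lowest-scored answers first
    oracle_order = np.argsort(-errors)        # reject incorrect answers first
    random_auc = errors.mean()                # flat rejection curve
    return (random_auc - rejection_auc(method_order)) / (
        random_auc - rejection_auc(oracle_order))

# Toy data: a higher truth value should mean the answer is more likely correct.
scores = [0.9, 0.2, 0.7, 0.4, 0.8]
labels = [1, 0, 1, 0, 1]                      # 1 = correct, 0 = incorrect
print("AUROC:", roc_auc_score(labels, scores))
print("PRR:  ", prediction_rejection_ratio(scores, labels))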

⚖️ Calibration

Normalize truth scores to [0,1] using Isotonic Regression or min-max normalization for meaningful comparison and ensembling.
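As an illustration of the two strategies mentioned above, the sketch below (using scikit-learn rather than TruthTorchLM's calibration API, and with made-up toy scores) fits an isotonic map from raw truth scores to [0, 1] on a labeled calibration split and contrasts it with min-max scaling, which preserves ranking but is not a probability estimate.

import numpy as np
from sklearn.isotonic import IsotonicRegression

# Calibration split: raw truth scores and whether each answer was correct.
raw_scores = np.array([-3.2, -1.1, -0.4, -2.5, -0.8, -1.9])
correct = np.array([0, 1, 1, 0, 1, 0])

# Isotonic regression: monotone map from raw score to an estimate of P(correct).
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(raw_scores, correct)

# Min-max normalization: cheaper, rank-preserving rescaling to [0, 1].
lo, hi = raw_scores.min(), raw_scores.max()
def min_max(scores):
    return (np.asarray(scores, dtype=float) - lo) / (hi - lo)

new_scores = np.array([-2.0, -0.5])
print("isotonic:", iso.predict(new_scores))   # calibrated scores in [0, 1]
print("min-max :", min_max(new_scores))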

📝 Long-form Support

Decompose long-form generations into individual claims and assess truthfulness at the claim level using specialized methods.
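The sketch below illustrates the claim-level workflow conceptually; it is not TruthTorchLM's long-form API. A naive sentence split stands in for the library's LLM-based claim decomposition, and a verbalized-confidence prompt sent through LiteLLM (the model name and prompt wording are arbitrary choices here) stands in for its specialized claim-level truth methods.

import re
import litellm

generation = ("Marie Curie was born in Warsaw in 1867. She won two Nobel Prizes. "
              "She was the first woman to win a Nobel Prize.")

# Stand-in decomposition: one claim per sentence.
claims = [c.strip() for c in re.split(r"(?<=[.!?])\s+", generation) if c.strip()]

def verbalized_confidence(claim: str) -> float:
    """Ask an API model for a 0-100 confidence that the claim is true."""
    response = litellm.completion(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": "How confident are you that the following claim is "
                              "factually true? Reply with a single integer from 0 "
                              f"to 100.\nClaim: {claim}"}])
    match = re.search(r"\d+", response.choices[0].message.content)
    return int(match.group()) / 100 if match else 0.5

for claim in claims:
    print(f"{verbalized_confidence(claim):.2f}  {claim}")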

Truth Methods

A representative subset of the available methods, listed with their access level. Methods further differ in whether they require grounding documents, supervision, and multiple sampled generations.

| Truth Method | Access Level |
| --- | --- |
| LARS | Grey-box |
| MARS | Grey-box |
| SelfDetection | Black-box |
| PTrue | Grey-box |
| AttentionScore | White-box |
| CrossExamination | Black-box |
| Eccentricity | Black-box |
| GoogleSearchCheck | Black-box |
| Inside | White-box |
| KernelLanguageEntropy | Black-box |
| MiniCheck | Black-box |
| Matrix-Degree | Black-box |
| SAPLMA | White-box |
| SemanticEntropy | Grey-box |
| MultiLLMCollab | Black-box |
| SAR | Grey-box |
| VerbalizedConfidence | Black-box |
| DirectionalEntailmentGraph | Black-box |

Access levels: Black-box = output only; Grey-box = output probabilities; White-box = internal representations.

Benchmark Results

AUROC and PRR performance on TriviaQA, GSM8K, and FactScore-Bio, with LLaMA-3 8B and GPT-4o-mini as the generating models.

LLaMA-3 8B

| Truth Method | TriviaQA AUROC | TriviaQA PRR | GSM8K AUROC | GSM8K PRR | FactScore-Bio AUROC | FactScore-Bio PRR |
| --- | --- | --- | --- | --- | --- | --- |
| LARS | 0.861 | 0.783 | 0.834 | 0.719 | 0.677 | 0.391 |
| MARS | 0.763 | 0.635 | 0.730 | 0.488 | 0.660 | 0.367 |
| SelfDetection | 0.780 | 0.590 | 0.556 | 0.090 | 0.687 | 0.369 |
| PTrue | 0.727 | 0.485 | 0.654 | 0.307 | 0.670 | 0.368 |
| AttentionScore | 0.523 | 0.092 | 0.503 | -0.024 | 0.644 | 0.263 |
| CrossExamination | 0.664 | 0.377 | 0.585 | 0.187 | 0.683 | 0.361 |
| Eccentricity | 0.809 | 0.645 | 0.703 | 0.450 | 0.695 | 0.415 |
| GoogleSearchCheck | 0.672 | 0.470 | -- | -- | -- | -- |
| Inside | 0.711 | 0.478 | 0.689 | 0.354 | 0.636 | 0.221 |
| KernelLanguageEntropy | 0.792 | 0.596 | 0.662 | 0.296 | 0.680 | 0.396 |
| SAPLMA | 0.850 | 0.726 | 0.815 | 0.642 | 0.651 | 0.347 |
| SemanticEntropy | 0.799 | 0.652 | 0.699 | 0.417 | 0.682 | 0.403 |
| MultiLLMCollab | 0.632 | 0.350 | 0.689 | 0.320 | 0.681 | 0.347 |
| SAR | 0.804 | 0.679 | 0.768 | 0.590 | 0.674 | 0.389 |
| VerbalizedConfidence | 0.759 | 0.547 | 0.579 | 0.234 | 0.698 | 0.460 |
| DirectionalEntailmentGraph | 0.745 | 0.513 | 0.731 | 0.501 | 0.659 | 0.347 |

GPT-4o-mini

| Truth Method | TriviaQA AUROC | TriviaQA PRR | GSM8K AUROC | GSM8K PRR | FactScore-Bio AUROC | FactScore-Bio PRR |
| --- | --- | --- | --- | --- | --- | --- |
| LARS | 0.852 | 0.766 | 0.840 | 0.686 | 0.640 | 0.294 |
| MARS | 0.792 | 0.668 | 0.735 | 0.480 | 0.655 | 0.405 |
| SelfDetection | 0.799 | 0.587 | 0.736 | 0.421 | 0.671 | 0.313 |
| PTrue | 0.772 | 0.509 | 0.833 | 0.636 | 0.658 | 0.372 |
| AttentionScore | -- | -- | -- | -- | -- | -- |
| CrossExamination | 0.718 | 0.483 | 0.768 | 0.551 | 0.635 | 0.289 |
| Eccentricity | 0.817 | 0.632 | 0.754 | 0.455 | 0.671 | 0.421 |
| GoogleSearchCheck | 0.779 | 0.673 | -- | -- | -- | -- |
| Inside | -- | -- | -- | -- | -- | -- |
| KernelLanguageEntropy | 0.820 | 0.635 | 0.706 | 0.349 | 0.678 | 0.397 |
| SAPLMA | -- | -- | -- | -- | -- | -- |
| SemanticEntropy | 0.813 | 0.673 | 0.735 | 0.464 | 0.681 | 0.447 |
| MultiLLMCollab | 0.778 | 0.565 | 0.933 | 0.879 | 0.671 | 0.399 |
| SAR | 0.835 | 0.724 | 0.764 | 0.512 | 0.671 | 0.433 |
| VerbalizedConfidence | 0.836 | 0.740 | 0.652 | 0.369 | 0.717 | 0.514 |
| DirectionalEntailmentGraph | 0.778 | 0.532 | 0.736 | 0.439 | 0.658 | 0.380 |

Note: "--" indicates the method is not applicable in that setting (e.g., white-box methods require internal representations, which API-based models such as GPT-4o-mini do not expose).

Quick Start

Install the package from PyPI and get started with TruthTorchLM in just a few lines of code.

pip install TruthTorchLM

import TruthTorchLM as ttlm
from transformers import AutoModelForCausalLM, AutoTokenizer

# Define truth methods
lars = ttlm.truth_methods.LARS()
confidence = ttlm.truth_methods.Confidence()
self_detection = ttlm.truth_methods.SelfDetection(number_of_questions=5)
truth_methods = [lars, confidence, self_detection]

# Define chat input
chat = [{"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital city of France?"}]

# Load a locally hosted HuggingFace model
# (any causal LM works; the Llama-3 checkpoint below is just an example)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Generate with the HuggingFace model
output = ttlm.generate_with_truth_value(
    model=model, tokenizer=tokenizer,
    messages=chat,
    truth_methods=truth_methods,
    max_new_tokens=100, temperature=0.7)

# Or use an API-based model through LiteLLM
output = ttlm.generate_with_truth_value(
    model="gpt-4o", messages=chat,
    truth_methods=truth_methods)