TrustLLM: Trustworthiness in Large Language Models

ICML 2024

Yue Huang1,2 Lichao Sun1 Haoran Wang3 Siyuan Wu4 Qihui Zhang4 Yuan Li5 Chujie Gao4 Yixin Huang6 Wenhan Lyu7 Yixuan Zhang7 Xiner Li8 Hanchi Sun1 Zhengliang Liu9 Yixin Liu1 Yijue Wang10 Zhikun Zhang11 et al.

1. Lehigh University 2. University of Notre Dame 3. Illinois Institute of Technology 4. CISPA Helmholtz Center for Information Security 5. University of Cambridge 6. Institut Polytechnique de Paris 7. William & Mary 8. Texas A&M University 9. University of Georgia 10. Samsung Research America 11. Stanford University

Abstract


Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Ensuring the trustworthiness of LLMs has therefore emerged as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, covering principles for different dimensions of trustworthiness, an established benchmark, an evaluation and analysis of trustworthiness for mainstream LLMs, and a discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight dimensions. Based on these principles, we further establish a benchmark across six dimensions: truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, which comprises over 30 datasets. First, our findings show that, in general, trustworthiness and utility (i.e., functional effectiveness) are positively related. Second, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. Beyond these observations, we uncover key insights into the multifaceted trustworthiness of LLMs and highlight the need for continued research efforts to enhance their reliability and ethical alignment.

Citation

@inproceedings{SZa24,
    author = {Lichao Sun and Yue Huang and Haoran Wang and Siyuan Wu and Qihui Zhang and Yuan Li and Chujie Gao and Yixin Huang and Wenhan Lyu and Yixuan Zhang and Xiner Li and Hanchi Sun and Zhengliang Liu and Yixin Liu and Yijue Wang and Zhikun Zhang and others},
    title = {{TrustLLM: Trustworthiness in Large Language Models}},
    booktitle = {{ICML}},
    publisher = {PMLR},
    year = {2024},
}