#907 Less Defined Knowledge and More True Alarms: Reference-based Phishing Detection without a Pre-defined Reference List



  • Ee-Chien Chang
  • Guoxing Chen
  • Haojin Zhu
  • Jun Han
  • Yan Meng
  • Yu Yu
  • Yun Lin
  • Yuncong Hu

  • All (Shanghai Jiao Tong University)
  • All (National University of Singapore)

R2 Accept on Shepherd Approval -> Accept

[PDF] Final version (2.1MB) Jun 5, 2024, 3:57:11 PM AoE · ade0ae72586e3119d45f3a5b3fee836ae8fc58be6985bb864723a5c84c44ca15


Phishing, a pervasive form of social engineering attack that compromises user credentials, has led to significant financial losses and undermined public trust. Modern phishing detection has gravitated toward reference-based methods for their explainability and robustness against zero-day phishing attacks. These methods maintain and update predefined reference lists that specify domain-brand relationships, and flag a website as phishing when its domain (e.g., payp0l.com) is inconsistent with its intended brand (e.g., PayPal). However, curated lists are limited in practice by their lack of comprehensiveness and high maintenance costs. In this work, we present PhishLLM, a novel reference-based phishing detector that operates without an explicit predefined reference list. Our rationale is that modern LLMs encode far more extensive brand-domain information than any predefined list. Further, detecting many webpage semantics, such as credential-taking intention, is closer to a linguistic problem, yet it is currently treated as a vision problem. Thus, we design PhishLLM to decode (or retrieve) domain-brand relationships from an LLM and to parse the credential-taking intention of a webpage effectively, without the cost of maintaining and updating an explicit reference list. Moreover, to control LLM hallucination, we introduce a search-engine-based validation mechanism that removes misinformation. Our extensive experiments show that PhishLLM significantly outperforms state-of-the-art solutions such as Phishpedia and PhishIntention, improving recall by 21% to 66% at a negligible cost in precision. Our field studies show that PhishLLM discovers (1) 6 times more zero-day phishing webpages than existing approaches such as PhishIntention and (2) close to 2 times more zero-day phishing webpages even when those approaches are enhanced by DynaPhish. Our code is available at https://github.com/code-philia/PhishLLM/.
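The domain-brand consistency check that reference-based detectors rely on can be illustrated with a minimal sketch. This is not PhishLLM's implementation: the `REFERENCE_LIST` dictionary, its single entry, and the function names are hypothetical, and the intended brand is assumed to have been predicted already by some upstream component. The sketch also shows the limitation the abstract points out, since a brand missing from the hand-written list cannot be checked at all.

```python
from urllib.parse import urlparse

# Hypothetical, hand-written reference list (brand -> legitimate domains).
# Real reference-based detectors curate far larger lists.
REFERENCE_LIST = {
    "PayPal": {"paypal.com"},
}


def hostname_of(url: str) -> str:
    """Return the lower-cased host part of a URL, or '' if there is none."""
    return (urlparse(url).hostname or "").lower()


def matches_domain(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def is_inconsistent(url: str, intended_brand: str) -> bool:
    """Flag a page whose host is not a known domain of its intended brand.

    Returns False for brands absent from the list: without a reference
    entry, this check has nothing to compare against.
    """
    domains = REFERENCE_LIST.get(intended_brand)
    if domains is None:
        return False
    host = hostname_of(url)
    return not any(matches_domain(host, d) for d in domains)


if __name__ == "__main__":
    print(is_inconsistent("https://payp0l.com/login", "PayPal"))
    print(is_inconsistent("https://www.paypal.com/signin", "PayPal"))
```

In this sketch, the look-alike domain payp0l.com is flagged because it is neither paypal.com nor a subdomain of it, while a page on www.paypal.com is not flagged.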

R. Liu, Y. Lin, X. Teoh, G. Liu, Z. Huang, J. Dong

  • Social issues and security: Emerging threats, harassment, extremism, and online abuse
  • Social issues and security: Information manipulation, misinformation, and disinformation
  • Social issues and security: Protecting and understanding at-risk users
Internet Defense Prize
Distinguished Paper Award
Artifact Evaluation


Review          EthCon.   2RecDec   WriQua   ConRecDec
Review #907A    1         2         2        3
Review #907B    1         3         3        3
Review #907C    3         3         4        3
Review #907D    1         4         4        2
