Don’t Stop Believin’: A Unified Evaluation Approach for LLM Honeypots
The research area of honeypots is gaining new momentum, driven by advancements in large language models (LLMs). The chat-based applications of generative pretrained transformer (GPT) models seem ideal for use as honeypot backends, especially in request-response protocols like Secure Shell (SSH). By leveraging LLMs, many challenges associated with traditional honeypots – such as high development costs, ease of exposure, and breakout risks – appear to be solved. While early studies have primarily focused on the potential of these models, our research investigates the current limitations of GPT-3.5 by analyzing three datasets of varying complexity. We conducted an expert annotation of over 1,400 request-response pairs, encompassing 230 different base commands. Our findings reveal that while GPT-3.5 struggles to maintain context, incorporating session context into response generation improves the quality of SSH responses.
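To illustrate what "incorporating session context" means in practice, here is a minimal sketch of a session-aware honeypot backend. It assumes the OpenAI chat API with a GPT-3.5 model; the system prompt and class name are illustrative, not our exact setup. The key point is that every prior command and response is replayed to the model, so follow-up commands can reference earlier state.

```python
# Minimal sketch of session-aware SSH response generation (illustrative,
# not our exact implementation). Assumes the official openai Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a Linux server. Reply with the exact terminal output "
    "for each command. Do not explain or add commentary."
)

class HoneypotSession:
    """Keeps the full command/response history so each reply sees prior context."""

    def __init__(self) -> None:
        self.messages = [{"role": "system", "content": SYSTEM_PROMPT}]

    def respond(self, command: str) -> str:
        # Append the attacker's command, then generate with the whole history.
        self.messages.append({"role": "user", "content": command})
        completion = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=self.messages,
        )
        reply = completion.choices[0].message.content
        # Store the model's answer so later commands stay consistent with it.
        self.messages.append({"role": "assistant", "content": reply})
        return reply

session = HoneypotSession()
print(session.respond("whoami"))
print(session.respond("ls -la /tmp"))  # sees the earlier exchange as context
```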
Additionally, we explored whether distinguishing between convincing and non-convincing responses is a metrics issue. We propose a paraphrase-mining approach to address this challenge, which achieved a macro F1 score of 77.85% using cosine distance in our evaluation. This method has the potential to reduce annotation efforts, converge LLM-based honeypot performance evaluation, and facilitate comparisons between new and previous approaches in future research.
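As a rough illustration of the paraphrase-mining idea, the sketch below scores a candidate honeypot response by its cosine distance to a reference response for the same command, using sentence embeddings. The embedding model and the threshold are assumptions for the example, not the values tuned in our evaluation.

```python
# Hedged sketch: judge a response "convincing" if its embedding lies within a
# cosine-distance threshold of a known-good reference response.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
THRESHOLD = 0.35  # illustrative cutoff; in practice tuned on annotated data

def is_convincing(candidate: str, reference: str) -> bool:
    """Return True if the candidate is a near-paraphrase of the reference."""
    emb = model.encode([candidate, reference], convert_to_tensor=True)
    cosine_distance = 1.0 - util.cos_sim(emb[0], emb[1]).item()
    return cosine_distance <= THRESHOLD

reference = "total 0\ndrwxrwxrwt  2 root root 40 Jan  1 00:00 ."
candidate = "total 0\ndrwxrwxrwt 2 root root 40 Jan 1 00:00 ."
print(is_convincing(candidate, reference))  # True: near-paraphrase
```

Treating the decision as paraphrase detection means a single annotated reference per command can label many generated responses automatically, which is where the reduction in annotation effort comes from.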