Аннотация

A. K. Jaiswal

Evaluating RAG System Models Robustness to Hybrid Homoglyph and Emoji-Based Next-Generation Adversarial Prompt Injection

Recent advances in retrieval-augmented generation (RAG) have enabled large language models (LLMs) to reliably answer user queries by retrieving relevant documents. However, adversaries may exploit obfuscation techniques—such as homoglyph substitution, emoji injection, or invisible Unicode characters—to hide attack payloads and induce leakage of sensitive information. In this work, we systematically evaluate five RAG models running inside Docker Desktop containers under a unified pipeline designed to test for such leaks. We have used pdf, text, html and readme file format as a document loader to the RAG model and then we used FAISS vector database for index vectorization system. We generate adversarial prompts against synthetic secrets using templated generators, varied obfuscation types, and measure metrics including adversarial leak count, pass vs block rates, benign leak frequencies, and the overall leak rate. Our experiments show a high leak rate: in one aggregated run over 5 models, approximately 68.3\% of adversarial prompts result in leaks under our threat model. We find that combined obfuscations (e.g. homoglyph + emojis) are especially effective. We further provide per-model comparisons, analyze failure modalities, and suggest practical mitigations including input normalization and detection. We release all artifacts—including Docker separate containerization of the models, prompt generation scripts, secret lists, and evaluation scripts—to support reproducibility and artifact evaluation. Our findings raise important implications for RAG deployment in privacy/security-sensitive applications.

КЛЮЧЕВЫЕ СЛОВА: RAG, adversarial prompts, homoglyphs, emoji injection, LLM leakage, model robustness.