If your AI vendor can't give you RAGAS scores, they can't prove their system works. Here's what the four metrics mean and exactly what to ask for before you sign anything.
In enterprise AI procurement, a new question is appearing in RFPs: what are your RAGAS evaluation scores? Most vendors cannot answer it. The ones who can are building something categorically different.
RAGAS — Retrieval Augmented Generation Assessment — is a framework for evaluating RAG systems across four measurable dimensions. It has become the closest thing the AI engineering industry has to a standardised quality benchmark for knowledge retrieval systems.
1. Faithfulness
Does the answer contain only information present in the retrieved context? A faithfulness score of 1.0 means every claim in the answer can be traced to a specific passage in your documents. A score below 0.8 means roughly one claim in five cannot be traced to your knowledge base: the model is inventing information, which is the technical definition of hallucination.
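To make the idea concrete, here is a deliberately simplified sketch of what a faithfulness check does. Real RAGAS uses an LLM to decompose the answer into individual claims and verify each one against the retrieved context; this toy version only checks whether the content words of each answer sentence appear in the context, which is a crude stand-in for illustration purposes.

```python
import re


def toy_faithfulness(answer: str, contexts: list[str]) -> float:
    """Rough faithfulness proxy: the fraction of answer sentences whose
    content words all appear somewhere in the retrieved contexts.
    (Real RAGAS uses an LLM judge to extract and verify claims.)"""
    context_words = set(re.findall(r"[a-z0-9]+", " ".join(contexts).lower()))
    sentences = [s for s in re.split(r"[.!?]+", answer) if s.strip()]
    if not sentences:
        return 0.0
    supported = sum(
        1
        for s in sentences
        if (words := set(re.findall(r"[a-z0-9]+", s.lower()))) and words <= context_words
    )
    return supported / len(sentences)


contexts = ["The refund window is 30 days from purchase."]
print(toy_faithfulness("The refund window is 30 days.", contexts))            # fully supported
print(toy_faithfulness("The refund window is 90 days. Shipping is free.", contexts))  # invented claims
```

A faithful answer scores 1.0; an answer that invents a 90-day window and a free-shipping claim scores 0.0, because neither sentence can be traced to the context.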
2. Answer Relevance
Does the answer actually address the question asked? A system can retrieve the right documents and still produce an answer that is tangentially related rather than directly responsive.
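The gap between "related" and "responsive" can be sketched with a toy overlap score. Real RAGAS measures answer relevance by generating candidate questions back from the answer with an LLM and comparing embeddings against the original question; this word-overlap version is only an illustration of the intuition.

```python
import re


def toy_answer_relevance(question: str, answer: str) -> float:
    """Crude relevance proxy: what fraction of the question's content
    words does the answer engage with? (Real RAGAS uses LLM-generated
    reverse questions and embedding similarity.)"""
    tokenize = lambda text: set(re.findall(r"[a-z0-9]+", text.lower()))
    q_words, a_words = tokenize(question), tokenize(answer)
    if not q_words:
        return 0.0
    return len(q_words & a_words) / len(q_words)


print(toy_answer_relevance("What is the refund window?", "The refund window is 30 days."))
print(toy_answer_relevance("What is the refund window?", "We ship worldwide."))
```

A direct answer scores high; a tangential answer drawn from the same product documentation scores near zero, which is exactly the failure mode this metric exists to catch.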
3. Context Precision
Of the document passages retrieved, how many were actually useful? Low context precision means the retrieval system is pulling irrelevant context — which dilutes answer quality and increases hallucination risk.
4. Context Recall
Did the retrieval system find all the information needed? High faithfulness with low recall means the system is honest about what it found but missed key information that existed in your knowledge base.
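Context precision and context recall are the classic precision/recall pair applied to retrieved passages. In real RAGAS both are computed with an LLM judging attribution against ground-truth answers; the sketch below assumes instead that passages carry known relevance labels, purely to show the underlying arithmetic.

```python
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Of the passages retrieved, what fraction was actually relevant?"""
    if not retrieved:
        return 0.0
    return sum(1 for p in retrieved if p in relevant) / len(retrieved)


def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Of all relevant passages in the knowledge base, what fraction
    did retrieval actually find?"""
    if not relevant:
        return 1.0
    return sum(1 for p in set(retrieved) if p in relevant) / len(relevant)


retrieved = ["doc1", "doc2", "doc3", "doc4"]   # hypothetical passage ids
relevant = {"doc1", "doc3", "doc5"}            # doc5 was never retrieved

print(context_precision(retrieved, relevant))  # 2 of 4 retrieved were useful
print(context_recall(retrieved, relevant))     # 2 of 3 relevant were found
```

Here retrieval pulled two irrelevant passages (low precision dilutes the context) and missed doc5 entirely (low recall means the answer cannot cover it, no matter how faithful the model is).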
For a production RAG system handling business-critical knowledge, there should be an explicit quality floor, not a vague promise of accuracy. At Scaliq, that floor is 0.85 on all four metrics: no RAG system goes live below it. We run RAGAS evaluation pipelines continuously in production, so quality degradation is caught before users experience it.
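A go-live threshold like this is straightforward to enforce in code. The sketch below is a hypothetical gate function, not Scaliq's actual pipeline; the metric names and the 0.85 floor are assumptions taken from the text above.

```python
FLOOR = 0.85  # assumed minimum score per metric before go-live

REQUIRED_METRICS = (
    "faithfulness",
    "answer_relevance",
    "context_precision",
    "context_recall",
)


def passes_quality_gate(scores: dict[str, float], floor: float = FLOOR) -> tuple[bool, list[str]]:
    """Check a RAGAS score report against the go-live floor.
    Returns (passed, list of failing metrics). A missing metric
    counts as a failure rather than a pass."""
    failing = [m for m in REQUIRED_METRICS if scores.get(m, 0.0) < floor]
    return (not failing, failing)


report = {
    "faithfulness": 0.92,
    "answer_relevance": 0.88,
    "context_precision": 0.90,
    "context_recall": 0.71,  # retrieval is missing relevant passages
}
print(passes_quality_gate(report))  # blocked: context_recall is below the floor
```

Running a check like this on every evaluation cycle is what turns "our system is accurate" from a claim into a verifiable gate.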
Ask any AI vendor building you a knowledge system whether they can provide RAGAS evaluation scores on your data before deployment.
If the answer is that they do not use RAGAS but their system is accurate — that is not an answer. Accuracy without measurement is a claim, not a guarantee.
If the answer is that they will run evaluations post-deployment — that means you are paying to discover whether the system works after it is already live with your users.
The difference between a demo and a production AI system is measurability. RAGAS is how you measure.
Ready to deploy?
Free 30-minute technical scoping call. We scope your AI system live and give you a clear deployment plan.