Can a general-purpose vision model read an aeronautical chart like a pilot?
In this article, I benchmark two vision-language models on a real visual approach chart (Chavenay, LFPX, my home airfield), asking each to extract the ICAO code, coordinates, frequencies, and chart metadata as structured JSON.
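For illustration, here is a minimal sketch of the kind of request involved, assuming an OpenAI-compatible endpoint; the base URL, API key, file name, and prompt wording below are placeholders, not the exact values used in the benchmark:

```python
# Minimal sketch: send the chart image to an OpenAI-compatible vision
# endpoint and ask for structured JSON. Base URL, API key, and file
# name are placeholders, not OVHcloud's actual values.
import base64
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-endpoint>/v1",  # assumption: OpenAI-compatible API
    api_key="YOUR_API_KEY",
)

# Encode the chart image for inline transmission.
with open("lfpx_visual_approach.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

prompt = (
    "Extract the ICAO code, airfield coordinates, radio frequencies, "
    "and chart metadata (title, edition date) from this visual approach "
    "chart. Answer with JSON only."
)

response = client.chat.completions.create(
    model="Qwen2.5-VL-72B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    temperature=0,  # deterministic output makes runs comparable
)
print(response.choices[0].message.content)
```

Sent to both models with the same prompt and temperature, the results diverged sharply: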
- Qwen2.5-VL-72B-Instruct on OVHcloud AI Endpoints: surprisingly strong zero-shot; it nails the ICAO code, coordinates, and most frequencies, and even returns per-field confidence scores.
- Llava-next-mistral-7b: struggles with field confusion, misread dates, and unstable JSON output (handled with the tolerant parsing sketched after this list).
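Unstable JSON is a practical problem when you want to score both models the same way. A hedged sketch of a tolerant parser follows; the field names and values in the example reply are illustrative, not the models' actual output:

```python
# Tolerant parsing sketch: accept clean JSON when the model behaves,
# and fall back to the first {...} span when it wraps JSON in chatter.
import json
import re

def extract_first_json(text: str) -> dict | None:
    """Return the first parseable JSON object in a model reply, if any."""
    # The well-behaved case: the whole reply is valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Fallback: grab the first brace-delimited span and retry.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            return None
    return None

# Placeholder reply; values are illustrative, not taken from the chart.
reply = 'Here is the data: {"icao": "LFPX", "frequencies": {"afis": "123.450"}}'
print(extract_first_json(reply))
```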
The article also argues for hybrid architectures (AI extraction, rule-based validation, human oversight) and explains why accessible cloud endpoints make iterative aerospace experimentation viable at reasonable cost.
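To make the hybrid idea concrete, here is a small sketch of the rule-based validation layer: the model extracts, deterministic checks validate, and anything that fails is routed to a human. The field names are assumptions matching the prompt above; the 118-137 MHz bounds are the standard VHF voice airband.

```python
# Rule-based validation sketch for the hybrid pipeline: empty result
# means "looks plausible", anything else goes to human review.
import re

VHF_AIRBAND = (118.0, 137.0)  # MHz, standard aeronautical voice band

def validate_extraction(data: dict) -> list[str]:
    """Return a list of rule violations found in an extraction result."""
    issues = []
    icao = data.get("icao", "")
    if not re.fullmatch(r"[A-Z]{4}", icao):
        issues.append(f"ICAO code {icao!r} is not four uppercase letters")
    for name, freq in data.get("frequencies", {}).items():
        try:
            mhz = float(freq)
        except (TypeError, ValueError):
            issues.append(f"{name}: frequency {freq!r} is not numeric")
            continue
        if not VHF_AIRBAND[0] <= mhz <= VHF_AIRBAND[1]:
            issues.append(f"{name}: {mhz} MHz outside VHF airband")
    return issues

# An out-of-band frequency fails the check and triggers human oversight.
print(validate_extraction({"icao": "LFPX", "frequencies": {"afis": "999.9"}}))
```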