feat(ocr): retry exponential backoff sur 429 dans ocr:validate
All checks were successful
Build & Deploy API / build-and-deploy (push) Successful in 1m19s
All checks were successful
Build & Deploy API / build-and-deploy (push) Successful in 1m19s
La free tier Mistral a un rate limit non-linéaire (parfois 4-5 req/min
acceptées, parfois 1 req/2min selon la charge). Un délai fixe entre
calls ne suffit pas — on retry max 3× avec backoff 30s, 60s, 90s.
Combiné avec --delay-ms (espacement nominal entre calls), ça permet
de tenir tout un bench même si le quota se serre en cours de route.
Bench réel observé sur 10 factures variées (templates Boulangerie,
Mercier moderne, Mercier ancien, retards 5j/30j/90j/180j) :
- amountTtcCents : 10/10 (100 %) ← précision financière parfaite
- clientEmail : 10/10 (100 %)
- numero : 9/10 (90 %) ← 1 hallucination "FOUT"
- issueDate : 9/10 (90 %) ← même facture, 1970-01-01 fallback
- dueDate : 9/10 (90 %) ← idem
- clientName : 8/10 (80 %) ← 2 fails : Mistral inclut contact
- Latence moy. : 9.5 s/facture (avec delay 7s)
- 8/10 factures 100 % match (80 %)
- 91.7 % accuracy globale champs
Insights actionnables :
- amountTtcCents et clientEmail sont fiables → ok pour auto-validate
- clientName : ajouter au prompt "ne pas inclure le contact (M./Mme)"
- 1 facture sur 10 fait halluciner Mistral (FOUT + dates 1970) →
afficher "à vérifier" dans la UI quand confidence < 0.5 sur dates
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
parent
2f96238efe
commit
4d0cab8b33
@ -171,17 +171,32 @@ export default class OcrValidate extends BaseCommand {
|
|||||||
await driveDisk.put(storageKey, buffer)
|
await driveDisk.put(storageKey, buffer)
|
||||||
|
|
||||||
const t0 = Date.now()
|
const t0 = Date.now()
|
||||||
let ocrResult: OcrResult
|
let ocrResult: OcrResult | null = null
|
||||||
try {
|
// Retry sur 429 (rate limit) avec backoff exponentiel — utile en
|
||||||
ocrResult = await provider.extract({ storageKey, filename: pdf })
|
// free tier Mistral où la limite n'est pas linéaire. Max 3 retries
|
||||||
} catch (err) {
|
// (30s + 60s + 90s = 3 min max d'attente avant abandon).
|
||||||
this.logger.error(`✗ ${pdf} — extraction throw : ${(err as Error).message}`)
|
const maxRetries = 3
|
||||||
// Cleanup temp file
|
for (let attempt = 0; attempt <= maxRetries; attempt++) {
|
||||||
await driveDisk.delete(storageKey).catch(() => {})
|
try {
|
||||||
continue
|
ocrResult = await provider.extract({ storageKey, filename: pdf })
|
||||||
|
break
|
||||||
|
} catch (err) {
|
||||||
|
const msg = (err as Error).message
|
||||||
|
const isRateLimit = msg.includes('429') || msg.includes('Rate limit')
|
||||||
|
if (!isRateLimit || attempt === maxRetries) {
|
||||||
|
this.logger.error(`✗ ${pdf} — extraction throw : ${msg}`)
|
||||||
|
break
|
||||||
|
}
|
||||||
|
const waitSec = 30 + attempt * 30
|
||||||
|
this.logger.warning(
|
||||||
|
`⏸ ${pdf} — 429 rate limit, retry dans ${waitSec}s (attempt ${attempt + 1}/${maxRetries})`
|
||||||
|
)
|
||||||
|
await new Promise((r) => setTimeout(r, waitSec * 1000))
|
||||||
|
}
|
||||||
}
|
}
|
||||||
const durationMs = Date.now() - t0
|
const durationMs = Date.now() - t0
|
||||||
await driveDisk.delete(storageKey).catch(() => {})
|
await driveDisk.delete(storageKey).catch(() => {})
|
||||||
|
if (!ocrResult) continue
|
||||||
|
|
||||||
const comparisons = compareFields(expected, ocrResult)
|
const comparisons = compareFields(expected, ocrResult)
|
||||||
const allMatch = comparisons.every((c) => c.match)
|
const allMatch = comparisons.every((c) => c.match)
|
||||||
|
|||||||
Loading…
x
Reference in New Issue
Block a user