JSON parsing: always fix all incoming json when using _manual_json #551

MarkJGx · 2024-07-14T13:19:17Z

Description

This change set adds non-LLM-based handling of malformed JSON as a preliminary step before falling back to the more resource-intensive LLM-based fixup.

More Details

While running GraphRAG with a local Ollama model, I noticed frequent malformed JSON responses from LLM requests, significantly slowing down the process on an M1 Max MacBook. In a fast, parallel cloud inference system, this issue is manageable, but locally it becomes a bottleneck. After indexing, I found 140 instances of JSON parsing failures.

The json_repair library effectively fixed the malformed JSON in my tests. I opted not to delve into the specific parsing failure cases, as they are mainly LLM-related and predicting every edge case is impractical. This library should be robust enough to handle most local LLM faults.

Related Issues

[Ollama] GraphRAG Community Support for running Ollama #345

Proposed Changes

Add json_repair as a new Poetry dependency.
- https://pypi.org/project/json-repair/
Use json_repair for initial JSON repair in _manual_json, with a fallback to the LLM.
Apply JSON repair when graph search JSON parsing fails.
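As a rough illustration of the repair-then-fallback flow described above (not the code in this PR), here is a minimal sketch. The names `parse_with_repair`, `cheap_repair`, `llm_repair`, and `strip_trailing_commas` are hypothetical. The repair steps are passed in as plain callables so the example runs without any third-party package, and the toy repair function only stands in for a real repair library.

```python
import json
import re
from typing import Any, Callable, Optional


def parse_with_repair(
    raw: str,
    cheap_repair: Callable[[str], str],
    llm_repair: Optional[Callable[[str], str]] = None,
) -> Any:
    """Always run a cheap local repair first; call the LLM only if that fails."""
    # Step 1: cheap, non-LLM repair applied to every incoming response.
    repaired = cheap_repair(raw)
    try:
        return json.loads(repaired)
    except json.JSONDecodeError:
        pass

    # Step 2: expensive fallback. The already-cleaned text is passed on,
    # so the work done in step 1 is not thrown away.
    if llm_repair is None:
        raise ValueError("could not parse JSON and no LLM fallback was given")
    return json.loads(llm_repair(repaired))


def strip_trailing_commas(text: str) -> str:
    """Toy stand-in for a real repair library: drops commas before } or ]."""
    return re.sub(r",\s*([}\]])", r"\1", text)


# A trailing comma is fixed locally, so no LLM call is needed.
result = parse_with_repair('{"name": "graph", "ids": [1, 2,],}', strip_trailing_commas)
```

With json-repair installed, `repair_json` (imported from `json_repair`) takes a string and returns a string, so it could presumably be passed as `cheap_repair` in place of the toy function.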

Checklist

Tested these changes locally.
Reviewed the code changes.
Updated documentation (if necessary).
Added appropriate unit tests (if applicable).

MarkJGx · 2024-07-14T13:25:00Z

@microsoft-github-policy-service agree


MarkJGx · 2024-07-28T21:26:48Z

Hey. I've addressed the oversight caught in the comment and rebased on main, while rehashing the poetry lock with the new library added. Waiting for a review. @AlonsoGuevara @jgbradley1 et al.

natoverse · 2024-08-09T17:29:47Z

We have resolved several issues related to text encoding and JSON parsing that are rolled up into version 0.2.2. Please try again with that version and re-open if this is still an issue.

MarkJGx requested a review from a team as a code owner July 14, 2024 13:19

MarkJGx changed the title ~~JSON parsing: always fix all incoming manual json and search~~ JSON parsing: always fix all incoming json when operating _manual_json mode Jul 14, 2024

s106916 mentioned this pull request Jul 15, 2024

fix the double {{ and }} when model_supports_json: false #503

Closed

4 tasks

s106916 reviewed Jul 17, 2024

graphrag/llm/openai/openai_chat_llm.py (outdated)

s106916 mentioned this pull request Jul 19, 2024

Json validation and fix #617

Closed

4 tasks

MarkJGx force-pushed the markjg/fix-json-parsing branch from bb80399 to 8486380 Compare July 28, 2024 21:20

MarkJGx requested a review from a team as a code owner July 28, 2024 21:20

MarkJGx force-pushed the markjg/fix-json-parsing branch from 8486380 to 2eb3c21 Compare July 28, 2024 21:21

MarkJGx added 2 commits July 29, 2024 00:23

ChatLLM: always fix all incoming manual json

79ebea6

- search: more aggressive cleanup path
- _manual_json: Opt to fix all incoming LLM json using cheaper repair_json from json_repair before running it through the LLM repair path.

_manual_json: fix not chaining clean_up_json output to fix_malformed_…

a375643

…json

MarkJGx force-pushed the markjg/fix-json-parsing branch from 2eb3c21 to a375643 Compare July 28, 2024 21:23

MarkJGx changed the title ~~JSON parsing: always fix all incoming json when operating _manual_json mode~~ JSON parsing: always fix all incoming json when using _manual_json Jul 29, 2024

AlonsoGuevara mentioned this pull request Aug 2, 2024

Repair json when LLM returns faulty responses on non json mode #801

Merged

4 tasks

natoverse closed this Aug 9, 2024
