JSON parsing: always fix all incoming json when using _manual_json #551

Closed
wants to merge 2 commits into from

Conversation

MarkJGx

@MarkJGx MarkJGx commented Jul 14, 2024

Description

This change set adds non-LLM-based handling of malformed JSON as a preliminary step before falling back to the more resource-intensive LLM-based fixup.

More Details

While running GraphRAG with a local Ollama model, I noticed frequent malformed JSON responses from LLM requests, significantly slowing down the process on an M1 Max MacBook. In a fast, parallel cloud inference system, this issue is manageable, but locally it becomes a bottleneck. After indexing, I found 140 instances of JSON parsing failures.

The json_repair library effectively fixed the malformed JSON in my tests. I opted not to delve into the specific parsing failure cases, as they are mainly LLM-related and predicting every edge case is impractical. This library should be robust enough to handle most local LLM faults.

Related Issues

Proposed Changes

  • Add json_repair as a new Poetry dependency.
  • Use json_repair for initial JSON repair in _manual_json, with a fallback to the LLM.
  • Apply JSON repair when graph search JSON parsing fails.
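
For readers who want to see the shape of the change, the repair order described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `parse_with_fallback`, `cheap_repair`, and `llm_repair` are hypothetical names, and the LLM step is represented by a plain callable. The only library call assumed is `repair_json` from json_repair, which takes a string and returns a repaired JSON string; the import is optional so the sketch still runs without the dependency installed.

```python
import json
from typing import Any, Callable, Optional

try:
    # Library named in this PR; repair_json(str) returns a repaired JSON string.
    from json_repair import repair_json
except ImportError:
    # Keep the sketch runnable when the dependency is not installed.
    repair_json = None


def parse_with_fallback(
    text: str,
    cheap_repair: Optional[Callable[[str], str]] = repair_json,
    llm_repair: Optional[Callable[[str], str]] = None,
) -> Any:
    """Parse JSON, trying progressively more expensive repairs (hypothetical helper)."""
    # 1. Fast path: the model output is already valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # 2. Cheap, local, non-LLM repair.
    if cheap_repair is not None:
        try:
            return json.loads(cheap_repair(text))
        except json.JSONDecodeError:
            pass

    # 3. Expensive fallback: ask the LLM to rewrite the payload.
    if llm_repair is not None:
        return json.loads(llm_repair(text))

    raise ValueError("could not parse or repair JSON")
```

The point of this ordering is that the LLM round trip only happens when both the plain parse and the local repair fail, which matters most on slow local inference.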

Checklist

  • Tested these changes locally.
  • Reviewed the code changes.
  • Updated documentation (if necessary).
  • Added appropriate unit tests (if applicable).

@MarkJGx MarkJGx requested a review from a team as a code owner July 14, 2024 13:19
@MarkJGx
Author

MarkJGx commented Jul 14, 2024

@microsoft-github-policy-service agree

@MarkJGx MarkJGx changed the title from "JSON parsing: always fix all incoming manual json and search" to "JSON parsing: always fix all incoming json when operating _manual_json mode" Jul 14, 2024
@s106916 s106916 mentioned this pull request Jul 19, 2024
4 tasks
@MarkJGx MarkJGx requested a review from a team as a code owner July 28, 2024 21:20
- search: more aggressive cleanup path
- _manual_json: Opt to fix all incoming LLM json using cheaper repair_json from json_repair
 before running it through the LLM repair path.
@MarkJGx
Author

MarkJGx commented Jul 28, 2024

Hey. I've addressed the oversight caught in the comment, rebased on main, and regenerated the Poetry lock file with the new library added. Waiting for a review. @AlonsoGuevara @jgbradley1 et al.

@MarkJGx MarkJGx changed the title from "JSON parsing: always fix all incoming json when operating _manual_json mode" to "JSON parsing: always fix all incoming json when using _manual_json" Jul 29, 2024
@natoverse
Collaborator

We have resolved several issues related to text encoding and JSON parsing that are rolled up into version 0.2.2. Please try again with that version and re-open if this is still an issue.

@natoverse natoverse closed this Aug 9, 2024