
[BUG] evaluator keeps failing with promptflow-evals 0.3.1 but works with 0.3.0 #3549

Open
yanggaome opened this issue Jul 15, 2024 · 4 comments
Assignees
Labels
bug Something isn't working promptflow-evals

Comments


yanggaome commented Jul 15, 2024

Describe the bug

After switching promptflow-evals from 0.3.0 to 0.3.1, the same code keeps giving me this error:

2024-07-15 06:52:21 +0000     377 execution.bulk     ERROR    Error occurred while executing batch run. Exception: The input for batch run is incorrect. Input from key 'data' is an empty list, which means we cannot generate a single line input for the flow run. Please rectify the input and try again.

======= Run Summary =======

Run name: "promptflow_evals_evaluators_content_safety_content_safety_contentsafetyevaluator_k3ldgeeq_20240715_065220_068989"
Run status: "Failed"
Start time: "2024-07-15 06:52:20.066987+00:00"
Duration: "0:00:01.488330"
Output path: "/root/.promptflow/.runs/promptflow_evals_evaluators_content_safety_content_safety_contentsafetyevaluator_k3ldgeeq_20240715_065220_068989"

Traceback (most recent call last):
...
    _ = evaluate(
  File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/evals/evaluate/_telemetry/__init__.py", line 111, in wrapper
    result = func(*args, **kwargs)
  File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 365, in evaluate
    raise e
  File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 340, in evaluate
    return _evaluate(
  File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 447, in _evaluate
    evaluator_info["result"] = batch_run_client.get_details(evaluator_info["run"], all_results=True)
  File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/evals/evaluate/_batch_run_client/proxy_client.py", line 33, in get_details
    run = proxy_run.run.result(timeout=BATCH_RUN_TIMEOUT)
  File "/usr/local/miniconda/lib/python3.9/concurrent/futures/_base.py", line 446, in result
    return self.__get_result()
  File "/usr/local/miniconda/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/local/miniconda/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/_sdk/_pf_client.py", line 301, in run
    return self._run(
  File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/_sdk/_pf_client.py", line 226, in _run
    return self.runs.create_or_update(run=run, **kwargs)
  File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/_sdk/_telemetry/activity.py", line 265, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/_sdk/operations/_run_operations.py", line 137, in create_or_update
    self.stream(created_run)
  File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/_sdk/_telemetry/activity.py", line 265, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/_sdk/operations/_run_operations.py", line 228, in stream
    raise InvalidRunStatusError(error_message)
promptflow._sdk._errors.InvalidRunStatusError: First error message is: The input for batch run is incorrect. Input from key 'data' is an empty list, which means we cannot generate a single line input for the flow run. Please rectify the input and try again.

I noticed some changes related to performance/parallelization between 0.3.0 and 0.3.1, e.g. 32115d3:

  1. For the evaluate API, I see the client selection changed from

    use_thread_pool = kwargs.get("_use_thread_pool", True)
    batch_run_client = CodeClient() if use_thread_pool else pf_client

to

    use_pf_client = kwargs.get("_use_pf_client", True)
    batch_run_client = ProxyClient(pf_client) if use_pf_client else CodeClient()

  2. So with default inputs (passing neither _use_thread_pool nor _use_pf_client), batch_run_client accordingly changed from a CodeClient to a ProxyClient.

Is this related to the failure? And which client does the parallelization and is the recommended one to use?

How To Reproduce the bug
Steps to reproduce the behavior, how frequent can you experience the bug:
1.

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Running Information (please complete the following information):

  • Promptflow Package Version using pf -v: [e.g. 0.0.102309906]
  • Operating System: [e.g. Ubuntu 20.04, Windows 11]
  • Python Version using python --version: [e.g. python==3.10.12]

{
"promptflow": "1.13.0",
"promptflow-azure": "1.13.0",
"promptflow-core": "1.13.0",
"promptflow-devkit": "1.13.0",
"promptflow-evals": "0.3.1", or 0.3.0 as comparision
"promptflow-tracing": "1.13.0"
}

Additional context
Add any other context about the problem here.

@yanggaome yanggaome added the bug Something isn't working label Jul 15, 2024

yanggaome commented Jul 15, 2024

One observation: when I created the target user call (a class similar to the evaluator, providing __init__ and __call__ methods),

if I passed the loaded data (~100 lines of JSONL data) as an explicit constructor argument, TargetCallClass(config, input_data), then it hit this error;

if I passed the loaded data via config.input_data instead, i.e. TargetCallClass(config), it behaved better: it succeeded on one machine but still failed on another. Not sure if this is related, but I wanted to share the observation.
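Roughly, the two patterns I am comparing look like this (a hypothetical sketch only; TargetCallClass, its fields, and data.jsonl stand in for my actual code and data):

    import json
    from types import SimpleNamespace

    class TargetCallClass:
        """Hypothetical stand-in for my target class: an __init__ plus a per-line __call__."""

        def __init__(self, config, input_data=None):
            self.config = config
            # Pattern 1: data passed explicitly as a constructor argument.
            # Pattern 2: data carried inside config.input_data instead.
            self.input_data = input_data if input_data is not None else config.input_data

        def __call__(self, *, question: str, **kwargs):
            # Placeholder per-line logic; the real class calls the model here.
            return {"answer": f"response for: {question}"}

    # ~100 lines of JSONL loaded up front (illustrative)
    with open("data.jsonl") as f:
        loaded = [json.loads(line) for line in f]

    config = SimpleNamespace(input_data=loaded)

    target_v1 = TargetCallClass(config, loaded)  # pattern 1: hit the batch-run error
    target_v2 = TargetCallClass(config)          # pattern 2: worked on one machine, not on another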

Do you have any idea about this?

Also, if I set _use_pf_client to False with the 0.3.1 evals package, it worked fine (just like 0.3.0, since both then use CodeClient rather than ProxyClient).
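Concretely, that workaround is just the following (a sketch; data_path and evaluators_set are placeholders for my real inputs, and _use_pf_client is the private kwarg from the 0.3.1 code quoted above):

    from promptflow.evals.evaluate import evaluate

    result = evaluate(
        data=data_path,              # placeholder: path to the JSONL input
        evaluators=evaluators_set,   # placeholder: dict of evaluator instances
        _use_pf_client=False,        # fall back to CodeClient, i.e. the 0.3.0 behavior
    )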

@ninghu ninghu self-assigned this Jul 15, 2024
yanggaome (Author) commented

Hi @ninghu, I think I was able to identify the root cause.

The issue is that when the evaluate API is called, the data path should be an absolute path, not a path relative to the working directory.

    _ = evaluate(
        evaluation_name=evaluation_name,
        data=data_path,
        target=custom_user_call,
        evaluators=evaluators_set,
        evaluator_config=evaluators_config,
    )
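In practice, resolving the path up front before calling evaluate as above was enough; roughly (a sketch, with a placeholder path):

    import os

    data_path = "eval_data.jsonl"            # placeholder: relative path that triggered the error
    data_path = os.path.abspath(data_path)   # absolute path works with ProxyClient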

When using CodeClient, the dataframe read from data_path is used; when using ProxyClient, data (the data path) is passed to the pf client run. However, this still puzzles me: the target call is also a pf client run, and it works fine with a relative data path, yet the path makes a difference when using ProxyClient.

Another observation: with a single evaluator (either built-in or custom), ProxyClient + a relative path still worked.

But with two evaluators, ProxyClient + a relative path didn't work; we had to use an absolute path.


ninghu commented Jul 16, 2024

These are great findings! The behavior you described is interesting and seems to occur only when using more than one evaluator. We'll try to reproduce this issue and get back to you.

github-actions bot commented

Hi, we're sending this friendly reminder because we haven't heard back from you in 30 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 7 days of this comment, the issue will be automatically closed. Thank you!
