
[BUG] evaluator keeps failing with promptflow-evals 0.3.1 but works with 0.3.0 #3549

Open
yanggaome opened this issue Jul 15, 2024 · 4 comments
Assignees
Labels
bug Something isn't working promptflow-evals

Comments


yanggaome commented Jul 15, 2024

Describe the bug

After switching promptflow-evals from 0.3.0 to 0.3.1, the same code keeps giving me this error:

2024-07-15 06:52:21 +0000     377 execution.bulk     ERROR    Error occurred while executing batch run. Exception: The input for batch run is incorrect. Input from key 'data' is an empty list, which means we cannot generate a single line input for the flow run. Please rectify the input and try again.

======= Run Summary =======

Run name: "promptflow_evals_evaluators_content_safety_content_safety_contentsafetyevaluator_k3ldgeeq_20240715_065220_068989"
Run status: "Failed"
Start time: "2024-07-15 06:52:20.066987+00:00"
Duration: "0:00:01.488330"
Output path: "/root/.promptflow/.runs/promptflow_evals_evaluators_content_safety_content_safety_contentsafetyevaluator_k3ldgeeq_20240715_065220_068989"

Traceback (most recent call last):
...
    _ = evaluate(
  File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/evals/evaluate/_telemetry/__init__.py", line 111, in wrapper
    result = func(*args, **kwargs)
  File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 365, in evaluate
    raise e
  File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 340, in evaluate
    return _evaluate(
  File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/evals/evaluate/_evaluate.py", line 447, in _evaluate
    evaluator_info["result"] = batch_run_client.get_details(evaluator_info["run"], all_results=True)
  File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/evals/evaluate/_batch_run_client/proxy_client.py", line 33, in get_details
    run = proxy_run.run.result(timeout=BATCH_RUN_TIMEOUT)
  File "/usr/local/miniconda/lib/python3.9/concurrent/futures/_base.py", line 446, in result
    return self.__get_result()
  File "/usr/local/miniconda/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/local/miniconda/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/_sdk/_pf_client.py", line 301, in run
    return self._run(
  File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/_sdk/_pf_client.py", line 226, in _run
    return self.runs.create_or_update(run=run, **kwargs)
  File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/_sdk/_telemetry/activity.py", line 265, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/_sdk/operations/_run_operations.py", line 137, in create_or_update
    self.stream(created_run)
  File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/_sdk/_telemetry/activity.py", line 265, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/local/miniconda/lib/python3.9/site-packages/promptflow/_sdk/operations/_run_operations.py", line 228, in stream
    raise InvalidRunStatusError(error_message)
promptflow._sdk._errors.InvalidRunStatusError: First error message is: The input for batch run is incorrect. Input from key 'data' is an empty list, which means we cannot generate a single line input for the flow run. Please rectify the input and try again.

I noticed some changes related to performance/parallelization between 0.3.0 and 0.3.1, e.g. 32115d3:

  1. For the evaluate API, I see the client selection changed from

    use_thread_pool = kwargs.get("_use_thread_pool", True)
    batch_run_client = CodeClient() if use_thread_pool else pf_client

to

    use_pf_client = kwargs.get("_use_pf_client", True)
    batch_run_client = ProxyClient(pf_client) if use_pf_client else CodeClient()

  2. So with default inputs (passing neither _use_thread_pool nor _use_pf_client), batch_run_client accordingly changed from a CodeClient to a ProxyClient.

Is this related to the failure? And which client does the parallelization and is the recommended one to use?

How To Reproduce the bug
Steps to reproduce the behavior, how frequent can you experience the bug:
1.

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Running Information (please complete the following information):

  • Promptflow Package Version using pf -v: [e.g. 0.0.102309906]
  • Operating System: [e.g. Ubuntu 20.04, Windows 11]
  • Python Version using python --version: [e.g. python==3.10.12]

{
"promptflow": "1.13.0",
"promptflow-azure": "1.13.0",
"promptflow-core": "1.13.0",
"promptflow-devkit": "1.13.0",
"promptflow-evals": "0.3.1", or 0.3.0 as comparision
"promptflow-tracing": "1.13.0"
}

Additional context
Add any other context about the problem here.

@yanggaome yanggaome added the bug Something isn't working label Jul 15, 2024

yanggaome commented Jul 15, 2024

One observation: when I created the target user call (a class similar to the evaluator, providing __init__ and __call__ methods),

if I passed the loaded data (~100 lines of JSONL data) as an explicit constructor argument, TargetCallClass(config, input_data), then it hit this error;

if I passed the loaded data via config.input_data instead, i.e. TargetCallClass(config), it behaved better: it succeeded on one machine but still failed on another. Not sure if this is related, but I wanted to share the observation.
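Roughly, the two patterns I am comparing look like this (a hypothetical sketch only; TargetCallClass, its fields, and data.jsonl stand in for my actual code and data):

    import json
    from types import SimpleNamespace

    class TargetCallClass:
        """Hypothetical stand-in for my target class: an __init__ plus a per-line __call__."""

        def __init__(self, config, input_data=None):
            self.config = config
            # Pattern 1: data passed explicitly as a constructor argument.
            # Pattern 2: data carried inside config.input_data instead.
            self.input_data = input_data if input_data is not None else config.input_data

        def __call__(self, *, question: str, **kwargs):
            # Placeholder per-line logic; the real class calls the model here.
            return {"answer": f"response for: {question}"}

    # ~100 lines of JSONL loaded up front (illustrative)
    with open("data.jsonl") as f:
        loaded = [json.loads(line) for line in f]

    config = SimpleNamespace(input_data=loaded)

    target_v1 = TargetCallClass(config, loaded)  # pattern 1: hit the batch-run error
    target_v2 = TargetCallClass(config)          # pattern 2: worked on one machine, not on another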

Do you have any idea about this?

Also, if I set _use_pf_client to False with the 0.3.1 evals package, it worked fine (just like 0.3.0, since both then use CodeClient rather than ProxyClient).
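Concretely, that workaround is just the following (a sketch; data_path and evaluators_set are placeholders for my real inputs, and _use_pf_client is the private kwarg from the 0.3.1 code quoted above):

    from promptflow.evals.evaluate import evaluate

    result = evaluate(
        data=data_path,              # placeholder: path to the JSONL input
        evaluators=evaluators_set,   # placeholder: dict of evaluator instances
        _use_pf_client=False,        # fall back to CodeClient, i.e. the 0.3.0 behavior
    )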

@ninghu ninghu self-assigned this Jul 15, 2024
yanggaome (Author) commented

Hi @ninghu, I think I was able to identify the root cause.

The issue is that when the evaluate API is called, the data path should be an absolute path, not a path relative to the working directory.

    _ = evaluate(
        evaluation_name=evaluation_name,
        data=data_path,
        target=custom_user_call,
        evaluators=evaluators_set,
        evaluator_config=evaluators_config,
    )
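In practice, resolving the path up front before calling evaluate as above was enough; roughly (a sketch, with a placeholder path):

    import os

    data_path = "eval_data.jsonl"            # placeholder: relative path that triggered the error
    data_path = os.path.abspath(data_path)   # absolute path works with ProxyClient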

When using CodeClient, the dataframe read from data_path is used; when using ProxyClient, data (the data path) is passed to the pf client run. However, this still puzzles me: the target call is also a pf client run, and it works fine with a relative data path, yet the path makes a difference when using ProxyClient.

Another observation: with a single evaluator (either built-in or custom), ProxyClient + a relative path still worked.

But with two evaluators, ProxyClient + a relative path didn't work; we had to use an absolute path.


ninghu commented Jul 16, 2024

These are great findings! The behavior you described is interesting and seems to occur only when using more than one evaluator. We'll try to reproduce this issue and get back to you.

github-actions bot commented

Hi, we're sending this friendly reminder because we haven't heard back from you in 30 days. We need more information about this issue to help address it. Please be sure to give us your input. If we don't hear back from you within 7 days of this comment, the issue will be automatically closed. Thank you!
