DOC: AI-Gen examples ctypeslib.as_ctypes_types #26827

Merged
merged 1 commit into from
Jul 18, 2024

Conversation

otieno-juma
Contributor

I used AI Llama 3 to help create these.
@bmwoodruff and I reviewed them.
[skip actions] [skip azp] [skip cirrus]

as-ctypes

@matthew-brett
Contributor

Is now the time to ask whether we have a licensing concern about AI-generated code (for example - that it will pick up GPL-3 content)?

@bmwoodruff
Member

I think that's a great question to bring up now. We wanted to make sure anything AI generated was tagged appropriately, so that it's easy to remove if it becomes a problem at some point.

@rgommers
Member

rgommers commented Jul 7, 2024

It would be good to explain how this PR was put together. E.g.:

  1. What was the LLM model prompt you used? And did this come out on the first try, or did you guide it to construct a more relevant example than it could do by itself?
  2. Did you use a code search tool to see if this example came from somewhere else verbatim?

@rgommers
Member

rgommers commented Jul 7, 2024

Logistical comment: please use [docs only] in your commit messages for docs-only PRs, as documented in the contributing guide, to avoid running a lot of unnecessary CI jobs.

@bmwoodruff
Member

bmwoodruff commented Jul 7, 2024

please use [docs only] in your commit messages

We've been using [skip actions] [skip azp] [skip cirrus] while we wait for #26316 to be complete. I double checked the contributing guide to see if I'd missed something. I've been watching for [docs only] to be merged in, and will swap when it's ready. If using [skip actions] [skip azp] [skip cirrus] is incorrect till then, please let me know. That's easy to fix.

I'm pretty sure the current failed tests are related to a different issue which was fixed right before the 5th PR was contributed. I haven't wanted to push an empty commit yet, with the right tags, to run the docs only tests till after a discussion at the triage meeting (and enough time has lapsed for all to weigh in on the mailing list).

We can also make sure to combine examples from several functions into one PR to help reduce CI costs. We could combine all 5 of the currently submitted PRs into one, if you think that's appropriate. I think grouping 3 functions into one PR is what the recommendation was from the last triage meeting. I gave @otieno-juma the option of submitting the PRs he'd already had ready for over a week, or merging them into one before submitting, and he chose to submit each individually.

explain how this PR was put together

The explanation is currently fully visible, yet scattered across many commits and issues at https://github.com/possee-org/genai-numpy/. I'll put together an organized short cohesive narrative early this week. When it's ready, I'll share a link here and post to the ongoing discussion on the mailing list.

@eagunn
Contributor

eagunn commented Jul 8, 2024

I do understand and appreciate the larger issues about AI-generated code and copyright for the codebase as a whole.

However, the code chunks involved in all these related PRs are 3-4 line snippets within the docstrings for a function, intended to demonstrate the use of that function. These should be canonical examples; what any of us who teach coding might put in our lecture notes or type out during a demonstration in front of a class. If they did NOT match other code available out on the net, either character for character or within a few characters, I'd be more concerned.

Meanwhile, I hope that @otieno-juma, @bmwoodruff, and the rest of the POSSE (sp?) team are getting the basic content feedback that they need. Are the variable names clear enough? Should there be comments within the example? Should strings such as 'i4' be assigned to a variable with a meaningful name before being used, so it is clearer what role that value is playing in the example? Unfortunately, I can't do any of that for this example because I, frankly, have no idea what the code is doing.

@charris
Member

charris commented Jul 14, 2024

However, the code chunks involved in all these related PRs are 3-4 line snippets within the docstrings for a function

Agree that licensing problems are unlikely here. I think it would be fine to review these like any other examples.

@bmwoodruff
Member

Here's a write up of the process we used to create the examples:

@rgommers
Member

It seems okay to review and merge this, given the process of putting together this PR that @bmwoodruff posted above. It's in particular good to see that the generated code was:

  • picked by hand from a much larger set of potential examples, and judged as suitable
  • optimized for matching existing numpy examples
  • edited by hand to ensure correctness of outputs

@charris
Member

charris commented Jul 18, 2024

close/reopen

@charris charris closed this Jul 18, 2024
@charris charris reopened this Jul 18, 2024
@charris
Member

charris commented Jul 18, 2024

I think this needs a rebase on current main to fix the CircleCI failure. There was an update to the .circleci/config.yml on 6/13/2024. A simple git rebase main should work from your branch after main is updated from the GitHub repo here.

@otieno-juma otieno-juma force-pushed the ai-examples-ctypeslib-as-ctypes branch from 6694c6a to 0d8832e Compare July 18, 2024 17:51
@charris charris merged commit c8ed7e9 into numpy:main Jul 18, 2024
4 checks passed
@charris
Member

charris commented Jul 18, 2024

Thanks @otieno-juma .

@ngoldbaum
Member

ngoldbaum commented Jul 19, 2024

Hi all, it looks like this broke the "benchmarks" CI, which runs the doctests:

=================================== FAILURES ===================================
___________________ [doctest] numpy.ctypeslib.as_ctypes_type ___________________
499           `ctypes.Structure`\ s
500         - insert padding fields
501 
502         Examples
503         --------
504         Converting a simple dtype:
505 
506         >>> dt = np.dtype('i4')
507         >>> ctype = np.ctypeslib.as_ctypes_type(dt)
508         >>> ctype
Expected:
    <class 'ctypes.c_int32'>
Got:
    <class 'ctypes.c_int'>

We recently updated how the doctests are run. Please make sure that they run against PRs that touch docstrings. Just running the CircleCI doc builder is not sufficient right now.

#26989 has a fix.
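
For readers wondering why the doctest above saw c_int where c_int32 was expected: the fixed-width names in ctypes are bound to a native class of matching size, so the class prints under its native name. The snippet below is only a small illustration using the standard library, not the change made in the fix. The printed name is platform-dependent, so no expected output is given for it.

```python
import ctypes

# Fixed-width names such as c_int32 are bound to a native ctypes class of
# matching size, so the class name that gets printed can differ from the
# name used to look it up (for example c_int where the C int is 32 bits).
fixed_width = ctypes.c_int32
print(fixed_width.__name__)  # platform-dependent name, not asserted here

# Properties that do not depend on the alias are safer to check.
size_in_bytes = ctypes.sizeof(fixed_width)
print(size_in_bytes)  # 4
```

A docstring example that shows the class repr therefore has to show the name the interpreter actually prints on the platform where the doctests run.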

@ngoldbaum
Member

Given the AI hallucinated some of the reprs and other doctesting output, I'd also appreciate it if you could make sure that the real output matches what the AI expects.

There may very well be cases where our reprs could be improved, but we shouldn't add reprs that don't yet exist to the docs before we change them to something nicer.

In this case the repr of the ctypes.Structure subclass that numpy dynamically defines is not ctypes.Structure but is instead just struct because that's the name we give it when we dynamically create the type. This would have been caught if a human had verified that the doctest output the AI generates matches the real output from the library.
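
The point about the dynamically created type can be seen with plain ctypes. This is a hypothetical stand-in, not numpy's own code: the field names are made up for illustration, and only the class name "struct" is taken from the comment above.

```python
import ctypes

# Hypothetical stand-in for a dynamically built structure type; the field
# names "x" and "y" are illustrative only.
fields = [("x", ctypes.c_int32), ("y", ctypes.c_double)]
struct_type = type("struct", (ctypes.Structure,), {"_fields_": fields})

# The class carries the name it was created with, not "Structure".
print(struct_type.__name__)                       # struct
print(issubclass(struct_type, ctypes.Structure))  # True
```

Any repr shown in a docstring reflects that given name, which is why output written by hand or by a model has to be checked against a real run.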

@bmwoodruff
Member

The tests we ran were all before the recent update to the tester. I'll take a look at the code we're using, and update things appropriately. The output of most of the AI generated scripts is hallucinated, so I wrote a script to strip all output, and then insert appropriate output afterwards. The tests passed before the recent change. I'll look into it.
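
The script mentioned above is not shown in this thread. As a rough sketch of the idea only, the standard-library doctest parser can be used to discard the written outputs and replace them with what the interpreter actually prints. The function name regenerate_outputs and the sample text are hypothetical, and the sketch ignores tracebacks, doctest directives, and indentation.

```python
import doctest
import io
from contextlib import redirect_stdout


def regenerate_outputs(docstring_text):
    """Return the doctest examples in the text with outputs recomputed."""
    parser = doctest.DocTestParser()
    namespace = {}
    lines = []
    for example in parser.get_examples(docstring_text):
        source_lines = example.source.rstrip("\n").split("\n")
        lines.append(">>> " + source_lines[0])
        lines.extend("... " + line for line in source_lines[1:])
        buffer = io.StringIO()
        with redirect_stdout(buffer):
            # "single" mode echoes expression values like the interactive prompt.
            exec(compile(example.source, "<example>", "single"), namespace)
        output = buffer.getvalue().rstrip("\n")
        if output:
            lines.append(output)
    return "\n".join(lines)


# Sample text with deliberately wrong outputs (999 and "wrong").
stale = '''
>>> x = 2 + 3
>>> x
999
>>> print("hi")
wrong
'''
refreshed = regenerate_outputs(stale)
print(refreshed)
```

Regenerated output of this kind only reflects the platform and library version it was produced on, so it still needs to be run through the project's current doctest setup.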
