Update `/generate` to not split classes & functions across cells #1158

srdas · 2024-12-18T18:26:33Z

/generate sometimes splits a class or a function across multiple code cells. Tracing the code in generate.py shows that the error arises from unanticipated LLM behavior when the outline object is created and the resulting Jupyter notebook is assembled from it. This is best handled by post-processing the notebook to merge each "hanging" code cell with the preceding code cell.
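To make the failure mode concrete, here is a minimal sketch of how the split on blank lines in create_notebook can cut a function in two. The `section_code` string is a made-up stand-in for LLM output, not taken from a real run, and the log banner only imitates the format mentioned in this description.

```python
# Hypothetical LLM output for one section: the function body contains a blank line.
section_code = (
    "def area(r):\n"
    "    pi = 3.14159\n"
    "\n"
    "    return pi * r * r\n"
    "\n"
    "print(area(2))"
)

# Same split that create_notebook applies to each section's code.
blocks = section_code.split("\n\n")

# Each block would become its own code cell.
for i, block in enumerate(blocks):
    print(f"***** CELL {i} *****")
    print(block)
```

The second block starts with indentation, so it cannot run as a standalone cell. That leading whitespace is the signal the post-processing step relies on.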

The error looks like this:

After the fix, the error is no longer present. See this example:

Here is another example of the rectified notebook generation:

If necessary, add print statements at lines 236, 240, and 245 of the code to check the log, as shown below.

The before and after versions of the logged notebook can be compared at the lines marked ***** CELL 1 *****, as shown here:


dlqqq

@srdas Awesome work! Love the creativity applied here. Few minor comments below.

dlqqq · 2024-12-18T22:46:17Z

packages/jupyter-ai/jupyter_ai/chat_handlers/generate.py

@@ -212,6 +212,15 @@ def create_notebook(outline):
        nb["cells"].append(nbf.new_markdown_cell("## " + section["title"]))
        for code_block in section["code"].split("\n\n"):
            nb["cells"].append(nbf.new_code_cell(code_block))
+
+    # Post process notebook for hanging cells: merge hanging cell with the previous cell
+    nb_cells = []


It may be worth renaming this variable to avoid confusion with nb["cells"].

Suggested change

-    nb_cells = []
+    merged_cells = []

dlqqq · 2024-12-18T22:48:50Z

packages/jupyter-ai/jupyter_ai/chat_handlers/generate.py

+    # Post process notebook for hanging cells: merge hanging cell with the previous cell
+    nb_cells = []
+    for cell in nb["cells"]:
+        if (cell["cell_type"] == "code") and (cell["source"][0] == " "):


The True branch requires nb_cells[-1] to exist, otherwise it will throw an exception. The condition cell["source"][0] == " " also requires cell["source"] to have at least 1 char, which is not guaranteed.

The condition can be updated to avoid these edge cases:

Suggested change

-        if (cell["cell_type"] == "code") and (cell["source"][0] == " "):
+        follows_code_cell = nb_cells and nb_cells[-1]["cell_type"] == "code"
+        is_incomplete = cell["cell_type"] == "code" and cell["source"].startswith(" ")
+        if follows_code_cell and is_incomplete:

Note that nb_cells should be renamed to merged_cells if the previous suggestion is accepted.
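For reference, a self-contained sketch of the merge pass with both suggestions applied. It uses plain dicts in place of nbformat cells so it runs without dependencies. The function name `merge_hanging_cells` and the "\n\n" separator used when rejoining are assumptions, since the merge body is not shown in this thread.

```python
def merge_hanging_cells(cells):
    """Merge code cells that start with a space into the preceding code cell.

    `cells` is a list of dicts with "cell_type" and "source" keys
    (a simplified stand-in for nbformat cells).
    """
    merged_cells = []
    for cell in cells:
        # Guard 1: there must be a previous cell, and it must be a code cell.
        follows_code_cell = bool(merged_cells) and merged_cells[-1]["cell_type"] == "code"
        # Guard 2: startswith() is safe on an empty source, unlike source[0].
        is_incomplete = cell["cell_type"] == "code" and cell["source"].startswith(" ")
        if follows_code_cell and is_incomplete:
            # Assumption: restore the blank line that split("\n\n") removed.
            merged_cells[-1]["source"] += "\n\n" + cell["source"]
        else:
            # Copy so the caller's cells are not mutated by later merges.
            merged_cells.append(dict(cell))
    return merged_cells


cells = [
    {"cell_type": "markdown", "source": "## Section"},
    {"cell_type": "code", "source": "def f():\n    x = 1"},
    {"cell_type": "code", "source": "    return x"},
    {"cell_type": "code", "source": ""},
]
result = merge_hanging_cells(cells)
```

An indented cell at position 0, or one that follows a markdown cell, is left as-is rather than raising. A continuation indented with a tab would not be detected by this check.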

Made all suggested changes! Thanks for the help.

dlqqq · 2024-12-18T23:00:50Z

@srdas One other question: with this change, are you able to run every notebook generated via /generate? It would be helpful to identify & document what issues still exist with /generate so that they can be fixed in the future.


srdas · 2024-12-19T00:57:31Z

During testing across several different /generate use cases, one more issue occurred, though rarely: a few cells contain markdown text but have cell_type "code". I therefore added another sweep through the cells to detect these cases and flip the cell_type to markdown.
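A rough sketch of what such a sweep could look like, again on plain dicts rather than nbformat cells. The detection heuristic here (treat a code cell as markdown if its source fails to parse as Python) and the names `looks_like_markdown` and `fix_cell_types` are assumptions for illustration, not necessarily what generate.py does.

```python
import ast


def looks_like_markdown(source):
    """Hypothetical heuristic: text that is not valid Python is treated as markdown."""
    try:
        ast.parse(source)
    except SyntaxError:
        return True
    return False


def fix_cell_types(cells):
    """Flip code cells whose source does not parse as Python to markdown."""
    fixed = []
    for cell in cells:
        cell = dict(cell)  # copy so the input list is left untouched
        if cell["cell_type"] == "code" and looks_like_markdown(cell["source"]):
            cell["cell_type"] = "markdown"
        fixed.append(cell)
    return fixed


cells = [
    {"cell_type": "code", "source": "This section explains the model."},
    {"cell_type": "code", "source": "x = 1\nprint(x)"},
    {"cell_type": "markdown", "source": "## Title"},
]
fixed = fix_cell_types(cells)
```

This heuristic has known gaps. A cell containing only a line like "## Heading" parses as a Python comment and stays a code cell, while IPython magics such as %matplotlib inline fail to parse and would be flipped. An unmerged hanging cell also fails to parse, so the sweep should run after the merge pass. Real nbformat code cells may also carry code-only fields that need handling when the type changes.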

Additional code to handle this is a new function:

And a second post-processing pass through the notebook to fix cells that should be markdown:

This was checked by adding print statements; the cell_type before and after the rectification is shown below:

See also the corresponding fix in the notebook:

The last example:

In most cases, notebooks run from end to end. The two cases in which they do not are unrelated to the /generate code. These are:

A missing package.
A function in the package being used is deprecated, and the generated code still uses the old version. This simply reflects the fact that the LLM is not up to date.

Update to ensure no hanging code cells in generated notebooks

9a48bd9

srdas added the bug label Dec 18, 2024

pre-commit-ci bot and others added 4 commits December 18, 2024 18:27

[pre-commit.ci] auto fixes from pre-commit.com hooks

4feb4d1

for more information, see https://pre-commit.ci

Update generate.py

1b05277

Merge branch 'main' into fix_generate_hanging_code_cells

bf63ab9

Merge branch 'main' into fix_generate_hanging_code_cells

debb571

srdas marked this pull request as ready for review December 18, 2024 22:30

dlqqq changed the title ~~Update /generate to ensure no splitting of class or function code across cells in generated notebooks~~ Update /generate to not split classes & functions across cells Dec 18, 2024

srdas requested a review from dlqqq December 18, 2024 22:48

dlqqq requested changes Dec 18, 2024


srdas and others added 2 commits December 18, 2024 16:48

Update generate.py

2083eda

[pre-commit.ci] auto fixes from pre-commit.com hooks

3dbc1dc

for more information, see https://pre-commit.ci

srdas requested a review from dlqqq December 19, 2024 00:59
