[SPARK-50618] Make DataFrameReader and DataStreamReader leverage the analyzer more #49238

brkyvz · 2024-12-19T00:14:46Z

What changes were proposed in this pull request?

Introduces two logical nodes:

UnresolvedDataSource
UnresolvedJDBCRelation

DataFrameReader and DataStreamReader now create these unresolved nodes and call the analyzer to resolve them, instead of building resolved relations themselves. The nodes are resolved by the ResolveDataSource rule, and the resolution logic that previously lived in DataFrameReader and DataStreamReader has been moved into that rule.

Some logic remains in the readers for parsing text-based formats from an existing Dataset. I will refactor this in a subsequent PR.
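The flow described above can be illustrated with a toy model. This is only a sketch in plain Python, not Spark code: the class and function names below reuse `UnresolvedDataSource` and borrow the spirit of `ResolveDataSource`, but `Reader`, `ResolvedRelation`, `resolve_data_source`, and `analyze` are hypothetical stand-ins, and the fields on each node are assumptions made for illustration.

```python
from dataclasses import dataclass

# Toy stand-ins only. None of these classes mirror Spark's real signatures.


@dataclass(frozen=True)
class UnresolvedDataSource:
    """Placeholder plan node: records what the caller asked to load."""
    source: str
    options: tuple
    paths: tuple


@dataclass(frozen=True)
class ResolvedRelation:
    """Hypothetical result of resolving an UnresolvedDataSource."""
    source: str
    options: tuple
    paths: tuple


class Reader:
    """Hypothetical reader: builds an unresolved node and resolves nothing itself."""

    def __init__(self):
        self._source = "parquet"  # assumed default, for illustration
        self._options = {}

    def format(self, source):
        self._source = source
        return self

    def option(self, key, value):
        self._options[key] = value
        return self

    def load(self, *paths):
        # The reader only describes the request; resolution happens later.
        options = tuple(sorted(self._options.items()))
        return UnresolvedDataSource(self._source, options, tuple(paths))


def resolve_data_source(plan):
    """Toy analyzer rule: replace unresolved data sources, pass others through."""
    if isinstance(plan, UnresolvedDataSource):
        return ResolvedRelation(plan.source, plan.options, plan.paths)
    return plan


def analyze(plan, rules):
    """Toy analyzer: apply each rule once, in order."""
    for rule in rules:
        plan = rule(plan)
    return plan


unresolved = Reader().format("csv").option("header", "true").load("/tmp/data")
resolved = analyze(unresolved, [resolve_data_source])
```

The point of the design is that every entry point produces the same kind of unresolved node, so a single rule list in the analyzer decides how it is resolved, regardless of which API built the plan.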

Why are the changes needed?

DataFrameReader and DataStreamReader typically create already-analyzed relations in their respective .load() methods.

As a result, the set of Catalyst rules applied to a query plan depends on the API used to build it, for example SQL vs Python or SQL vs Scala.

The goal of this Jira is to refactor the logic in the DataFrameReader and DataStreamReader classes so that they create unresolved plans, which are then analyzed as part of Catalyst.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing unit tests. New tests will also be added.

Was this patch authored or co-authored using generative AI tooling?

No

Make DataFrameReader and DataStreamReader leverage the analyzer more

ed37858

github-actions bot added the SQL and STRUCTURED STREAMING labels Dec 19, 2024

Burak Yavuz added 4 commits December 18, 2024 16:16

forgot

e1014ca

Merge branch 'master' of github.com:apache/spark into unresolvedDS

29f286f

Fix merge conflicts

1534cad

Added unit tests

853a13e
