6.0.0

@ibis-project-bot released this 05 Jul 15:05

6.0.0 (2023-07-05)

⚠ BREAKING CHANGES

  • imports: Use of ibis.udf as a module is removed. Use ibis.legacy.udf instead.

  • The minimum supported Python version is now Python 3.9

  • api: group_by().count() no longer automatically names the count aggregation count. Use relabel to rename columns.

  • backends: Backend.ast_schema is removed. Use expr.as_table().schema() instead.

  • snowflake/postgres: Postgres UDFs now use the new @udf.scalar.python API. This should be a low-effort replacement for the existing API.

  • ir: ops.NullLiteral is removed

  • datatypes: dt.Interval no longer has a default unit; dt.interval is removed

  • deps: snowflake-connector-python's lower bound was increased to 3.0.2, the minimum version needed to avoid a high-severity vulnerability. Please upgrade snowflake-connector-python to at least version 3.0.2.

  • api: Table.difference(), Table.intersection(), and Table.union() now require at least one argument.

  • postgres: Ibis no longer automatically defines first/last reductions on connection to the postgres backend. Use DDL shown in https://wiki.postgresql.org/wiki/First/last_(aggregate) or one of the pgxn implementations instead.

  • api: ibis.examples.<example-name>.fetch no longer forwards arbitrary keyword arguments to read_csv/read_parquet.

  • datatypes: dt.Interval.value_type attribute is removed

  • api: Table.count() is no longer automatically named "count". Use Table.count().name("count") to achieve the previous behavior.

  • trino: The trino backend now requires at least version 0.321 of the trino Python package.

  • backends: removed AlchemyTable, AlchemyDatabase, DaskTable, DaskDatabase, PandasTable, PandasDatabase, PySparkDatabaseTable, use ops.DatabaseTable instead

  • dtypes: temporal unit enums are now available under ibis.common.temporal instead of ibis.common.enums.

  • clickhouse: external_tables can no longer be passed in ibis.clickhouse.connect. Pass external_tables directly in raw_sql/execute/to_pyarrow/to_pyarrow_batches().

  • datatypes: dt.Set is now an alias for dt.Array

  • bigquery: Previously, ibis timestamps always mapped to the BigQuery TIMESTAMP type, with no timezone support. This was incorrect: BigQuery TIMESTAMP values carry a UTC timezone, while DATETIME is the timezone-naive type. The mapping now depends on the timezone: an ibis timestamp with the UTC timezone maps to BigQuery TIMESTAMP, and an ibis timestamp with no timezone maps to BigQuery DATETIME.

  • impala: Cursors are no longer returned from DDL operations to prevent resource leakage. Use raw_sql if you need specialized operations that return a cursor. Additionally, table-based DDL operations now return the table they're operating on.

  • api: Column.first()/Column.last() are now reductions by default. Code running these expressions in isolation will no longer be windowed over the entire table. Code using this function in select-based APIs should function unchanged.

  • bigquery: when using the bigquery backend, casting float to int no longer rounds floats to the nearest integer; the value is truncated instead

  • ops.Hash: The hash method on table columns no longer accepts the how argument. The hashing functions available are highly backend-dependent, and the intention of the hash operation is to provide a fast, consistent (on the same backend only) integer value. If you have been passing a value for how, you can remove it and get the same results as before, because no backend had multiple working hash functions.

  • duckdb: Some CSV files may now have headers that did not have them previously. Set header=False to get the previous behavior.

  • deps: New environments will have a different default setting for compression in the ClickHouse backend due to removal of optional dependencies. Ibis is still capable of using the optional dependencies but doesn't include them by default. Install clickhouse-cityhash and lz4 to preserve the previous behavior.

  • api: Table.set_column() is removed; use Table.mutate(name=expr) instead

  • api: the suffixes argument in all join methods has been removed in favor of lname/rname args. The default renaming scheme for duplicate columns has also changed. To get the exact same behavior as before, pass in lname="{name}_x", rname="{name}_y".

  • ir: IntervalType.unit is now an enum instead of a string

  • type-system: Inferred types of Python objects may be slightly different. Ibis now uses pyarrow to infer the types of pandas DataFrame columns and other in-memory objects.

  • backends: path argument of Backend.connect() is removed, use the database argument instead

  • api: removed Table.sort_by() and Table.groupby(), use .order_by() and .group_by() respectively

  • datatypes: DataType.scalar and column class attributes are now strings.

  • backends: Backend.load_data(), Backend.exists_database() and Backend.exists_table() are removed

  • ir: Value.summary() and NumericValue.summary() are removed

  • schema: Schema.merge() is removed, use the union operator schema1 | schema2 instead

  • api: ibis.sequence() is removed

  • drop support for Python 3.8 (747f4ca)
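
The lname/rname templates mentioned in the join change above can be read as ordinary Python format strings. The following is a minimal stdlib-only sketch of that idea, not ibis's implementation: the helper name rename_colliding is hypothetical, and it assumes that templates are applied with str.format only to column names present on both sides, and that an empty template leaves a name unchanged.

```python
def rename_colliding(left_cols, right_cols, lname="{name}_x", rname="{name}_y"):
    """Apply name templates to columns that appear on both sides of a join.

    Hypothetical helper for illustration only; not part of ibis.
    """
    overlap = set(left_cols) & set(right_cols)

    def apply(template, name):
        # Assumption: only colliding names are renamed, and an empty
        # template means "keep the original name".
        if template and name in overlap:
            return template.format(name=name)
        return name

    left = [apply(lname, col) for col in left_cols]
    right = [apply(rname, col) for col in right_cols]
    return left + right


# The templates from the release note reproduce the old "_x"/"_y" suffixes.
print(rename_colliding(["id", "value"], ["id", "value", "extra"]))
# ['id_x', 'value_x', 'id_y', 'value_y', 'extra']
```

Passing the templates explicitly, as the note suggests, keeps downstream code that expects the "_x"/"_y" column names working regardless of the new defaults.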

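The two BigQuery changes listed above (timestamp type mapping and float-to-int casting) can be summarized as a pair of small rules. This is an illustrative stdlib-only sketch, not code from the BigQuery backend: both function names are hypothetical, the release note does not say how non-UTC timezones are handled (so the sketch refuses them), and truncation toward zero is an assumption about what "truncate" means here.

```python
import math


def bigquery_type_for_timestamp(timezone):
    """Return the BigQuery type name for an ibis timestamp with this timezone.

    Hypothetical helper that mirrors the rule stated in the release note.
    """
    if timezone is None:
        return "DATETIME"  # no timezone: timezone-naive BigQuery type
    if timezone == "UTC":
        return "TIMESTAMP"  # UTC timezone: BigQuery TIMESTAMP
    # The release note does not describe other timezones.
    raise NotImplementedError(f"mapping for timezone {timezone!r} is not specified")


def cast_float_to_int(value):
    """Model the new cast: drop the fractional part instead of rounding."""
    # Assumption: truncation is toward zero.
    return math.trunc(value)


print(bigquery_type_for_timestamp("UTC"))  # TIMESTAMP
print(bigquery_type_for_timestamp(None))   # DATETIME
print(cast_float_to_int(2.7))              # 2 (rounding would have given 3)
```

Queries that relied on the old rounding can apply an explicit round before the cast.
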
Features

  • add dask windowing (9cb920a)
  • add easy type hints to GroupBy (da330b1)
  • add microsecond method to TimestampValue and TimeValue (e9df2da)
  • api: add __dataframe__ implementation (b3d9619)
  • api: add ALL_CAPS option to Table.relabel (c0b30e2)
  • api: add first/last reduction APIs (8c01980)
  • api: add zip operation and api (fecf695)
  • api: allow passing multiple keyword arguments to ibis.interval (22ee854)
  • api: better repr and pickle support for deferred expressions (2b1ec9c)
  • api: exact median (c53031c)
  • api: raise better error on column name collision in joins (e04c38c)
  • api: replace suffixes in join with lname/rname (3caf3a1)
  • api: support abstract type names in selectors.of_type (f6d2d56)
  • api: support list of strings and single strings in the across selector (a6b60e7)
  • api: use create_table to load example data (42e09a4)
  • bigquery: add client and storage_client params to connect (4cf1354)
  • bigquery: enable group_concat over windows (d6a1117)
  • cast: add table-level try_cast (5e4d16b)
  • clickhouse: add array zip impl (efba835)
  • clickhouse: move to clickhouse supported Python client (012557a)
  • clickhouse: set default engine to native file (29815fa)
  • clickhouse: support pyarrow decimal types (7472dd5)
  • common: add a pure python egraph implementation (aed2ed0)
  • common: add pattern matchers (b515d5c)
  • common: add support for start parameter in StringFind (31ce741)
  • common: add Topmost and Innermost pattern matchers (90b48fc)
  • common: implement copy protocol for Immutable base class (e61c66b)
  • create_table: support pyarrow Table in table creation (9dbb25c)
  • datafusion: add string functions (66c0afb)
  • datafusion: add support for scalar pyarrow UDFs (45935b7)
  • datafusion: minimal decimal support (c550780)
  • datafusion: register tables and datasets in datafusion (cb2cc58)
  • datatypes: add support for decimal values with arrow-based APIs (b4ba6b9)
  • datatypes: support creating Timestamp from units (66f2ff0)
  • deps: load examples lazily (4ea0ddb)
  • duckdb: add attach_sqlite method (bd32649)
  • duckdb: add support for native and pyarrow UDFs (7e56fc4)
  • duckdb: expand map support to .values() and map concatenation (ad49a09)
  • duckdb: set header=True by default (e4b515d)
  • duckdb: support 0.8.0 (ae9ae7d)
  • duckdb: support array zip operation (2d14ccc)
  • duckdb: support motherduck (053dc7e)
  • duckdb: warn when querying an already consumed RecordBatchReader (5a013ff)
  • flink: add initial flink SQL compiler (053a6d2)
  • formats: support timestamps in delta output; default to micros for pyarrow conversion (d8d5710)
  • implement read_delta and to_delta for some backends (74fc863)
  • implement read_delta for datafusion (eb4602f)
  • implement try_cast for a few backends (f488f0e)
  • io: add to_torch API (685c8fc)
  • io: add az/gs prefixes to normalize_filename in utils (e9eebba)
  • mysql: add re_extract (5ed40e1)
  • oracle: add oracle backend (c9b038b)
  • oracle: support temporary tables (6e64cd0)
  • pandas: add approx_median (6714b9f)
  • pandas: support passing memtables to create_table (3ea9a21)
  • polars: add any and all reductions (0bd3c01)
  • polars: add argmin and argmax (78562d3)
  • polars: add correlation operation (05ff488)
  • polars: add polars support for identical_to (aab3bae)
  • polars: add support for offset, binary literals, and dropna(how='all') (d2298e9)
  • polars: allow seamless connection for DataFrame as well as LazyFrame (a2a3e45)
  • polars: implement .sql methods (86f2a34)
  • polars: lower-latency column return for non-temporal results (b009563)
  • polars: support pyarrow decimal types (7e6c365)
  • polars: support SQL dialect translation (c87f695)
  • polars: support table registration from multiple parquet files (9c0a8be)
  • postgres: add ApproxMedian aggregation (887f572)
  • pyspark: add zip array impl (6c00cbc)
  • snowflake/postgres: scalar UDFs (dbf5b62)
  • snowflake: implement array zip (839e1f0)
  • snowflake: implement proper approx median (b15a6fe)
  • snowflake: support SSO and other forms of passwordless authentication (23ac53d)
  • snowflake: use the client python version as the UDF runtime where possible (69a9101)
  • sql: allow any SQL dialect accepted by sqlglot in Table.sql and Backend.sql (f38c447)
  • sqlite: add argmin and argmax functions (c8af9d4)
  • sqlite: add arithmetic mode aggregation (6fcac44)
  • sqlite: add ops.DateSub, ops.DateAdd, ops.DateDiff (cfd65a0)
  • streamlit: add support for streamlit connection interface (05c9449)
  • trino: implement zip (cd11daa)

Bug Fixes

  • add issue write permission to assign.yml (9445cee)
  • alchemy: close the cursor on error during dataframe construction (cc7dffb)
  • backends: fix capitalize to lowercase subsequent characters (49978f9)
  • backends: fix notall/notany translation (56b56b3)
  • bigquery: add srid=4326 to the geography dtype mapping (57a825b)
  • bigquery: allow passing both schema and obj in create_table (49cc2c4)
  • bigquery: bigquery timestamp and datetime dtypes (067e8a5)
  • bigquery: ensure that bigquery temporal ops work with the new timeunit/dateunit/intervalunit enums (0e00d86)
  • bigquery: ensure that generated names are used when compiling columns and allow flexible column names (c7044fe)
  • bigquery: fix table naming from count rename removal refactor (5b009d2)
  • bigquery: raise OperationNotDefinedError for IntervalAdd and IntervalSubtract (501aaf7)
  • bigquery: support capture group functionality (3f4f05b)
  • bigquery: truncate when casting float to int (267d8e1)
  • ci: use mariadb-admin instead of mysqladmin in mariadb 11.x (d4ccd3d)
  • clickhouse: avoid generating names for structs (5d11f48)
  • clickhouse: clean up external tables per query to avoid leaking them across queries (6d32edd)
  • clickhouse: close cursors more aggressively (478a40f)
  • clickhouse: use correct functions for milli and micro extraction (49b3136)
  • clickhouse: use named rather than positional group by (1f7e309)
  • clickhouse: use the correct dialect to generate subquery string for Contains operation (f656bd5)
  • common: fix bug in re_extract (6ebaeab), closes #6167
  • core: interval resolution should upcast to smallest unit (f7f844d), closes #6139
  • datafusion: fix incorrect order of predicate -> select compilation (0092304)
  • deps: make pyarrow a required dependency (b217cde)
  • deps: prevent vulnerable snowflake-connector-python versions (6dedb45)
  • deps: support multipledispatch version 1 (805a7d7)
  • deps: update dependency atpublic to v4 (3a44755)
  • deps: update dependency datafusion to v22 (15d8d11)
  • deps: update dependency datafusion to v23 (e4d666d)
  • deps: update dependency datafusion to v24 (c158b78)
  • deps: update dependency datafusion to v25 (c3a6264)
  • deps: update dependency datafusion to v26 (7e84ffe)
  • deps: update dependency deltalake to >=0.9.0,<0.11.0 (9817a83)
  • deps: update dependency pyarrow to v12 (3cbc239)
  • deps: update dependency sqlglot to v12 (5504bd4)
  • deps: update dependency sqlglot to v13 (1485dd0)
  • deps: update dependency sqlglot to v14 (9c40c06)
  • deps: update dependency sqlglot to v15 (f149729)
  • deps: update dependency sqlglot to v16 (46601ef)
  • deps: update dependency sqlglot to v17 (9b50fb4)
  • docs: fix failing doctests (04b9f19)
  • docs: typo in code without selectors (b236893)
  • docs: typo in docstrings and comments (0d3ed86)
  • docs: typo in snowflake do_connect kwargs (671bc31)
  • duckdb: better types for null literals (7b9d85e)
  • duckdb: disable map values and map merge for columns (b5472b3)
  • duckdb: ensure to_timestamp returns a UTC timestamp (0ce0b9f)
  • duckdb: ensure connection lifetime is greater than or equal to record batch reader lifetime (6ed353e)
  • duckdb: ensure that quoted struct field names work (47de1c3)
  • duckdb: ensure that types are inferred correctly across duckdb_engine versions (9c3d173)
  • duckdb: fix check for literal maps (b2b229b)
  • duckdb: fix exporting pyarrow record batches by bumping duckdb to 0.8.1 (aca52ab)
  • duckdb: fix read_csv problem with kwargs (6f71735), closes #6190
  • examples: move lockfile creation to data directory (b8f6e6b)
  • examples: use filelock to prevent pooch from clobbering files when fetching concurrently (e14662e)
  • expr: fix graphviz rendering (6d4a34f)
  • impala: do not cast ca_cert None value to string (bfdfb0e)
  • impala: expose hdfs_connect function as ibis.impala.hdfs_connect (27a0d12)
  • impala: more aggressively clean up cursors internally (bf5687e)
  • impala: replace time_mapping with TIME_MAPPING and backwards compatible check (4c3ca20)
  • ir: force an alias if projecting or aggregating columns (9fb1e88)
  • ir: raise Exception for group by with no keys (845f7ab), closes #6237
  • mssql: dont yield from inside a cursor (4af0731)
  • mysql: do not fail when we cannot set the session timezone (930f8ab)
  • mysql: ensure enum string functions are coerced to the correct type (e499c7f)
  • mysql: ensure that floats and double do not come back as Python Decimal objects (a3c329f)
  • mysql: fix binary literals (e081252)
  • mysql: handle the zero timestamp value (9ac86fd)
  • operations: ensure that self refs have a distinct name from the table they are referencing (bd8eb88)
  • oracle: disable autoload when cleaning up temp tables (b824142)
  • oracle: disable statement cache (41d3857)
  • oracle: disable temp tables to get inserts working (f9985fe)
  • pandas, dask: allow overlapping non-predicate columns in asof join (09e26a0)
  • pandas: fix first and last over windows (9079bc4), closes #5417
  • pandas: fix string translate function (12b9569), closes #6157
  • pandas: grouped aggregation using a case statement (d4ac345)
  • pandas: preserve RHS values in asof join when column names collide (4514668)
  • pandas: solve problem with first and last window function (dfdede5), closes #4918
  • polars: avoid implode deprecation warning (ce3bdad)
  • polars: ensure that to_pyarrow is called from the backend (41bacf2)
  • polars: make list column operations backwards compatible (35fc5f7)
  • postgres: ensure that alias method overwrites view even if types are different (7d5845b)
  • postgres: ensure that backend still works when create/drop first/last aggregates fails (eb5d534)
  • pyspark: enable joining on columns with different names as well as complex predicates (dcee821)
  • snowflake: always use pyarrow for memtables (da34d6f)
  • snowflake: ensure connection lifetime is greater than or equal to record batch reader lifetime (34a0c59)
  • snowflake: ensure that _pandas_converter attribute is resolved correctly (9058bbe)
  • snowflake: ensure that temp tables are only created once (43b8152)
  • snowflake: ensure unnest works for nested struct/object types (fc6ffc2)
  • snowflake: ensure use of the right timezone value (40426bf)
  • snowflake: fix tmpdir construction for python <3.10 (a507ae2)
  • snowflake: fix incorrect arguments to snowflake regexp_substr (9261f70)
  • snowflake: fix invalid attribute access when using pyarrow (bfd90a8)
  • snowflake: handle broken upstream behavior when a table can't be found (31a8366)
  • snowflake: resolve import error from interval datatype refactor (3092012)
  • snowflake: use convert_timezone for timezone conversion instead of invalid postgres AT TIME ZONE syntax (1595e7b)
  • sqlalchemy: ensure that backends don't clobber tables needed by inputs (76e38a3)
  • sqlalchemy: ensure that union_all-generated memtables use the correct column names (a4f546b)
  • sqlalchemy: prepend the table's schema when querying metadata (d8818e2)
  • sqlalchemy: quote struct field names (f5c91fc)
  • tests: ensure that record batch readers are cleaned up (d230a8d)
  • trino: bump lower bound to avoid having to handle experimental_python_types (bf6eeab)
  • trino: ensure that nested array types are inferred correctly (030f76d)
  • trino: fix incorrect version computation (04d3a89)
  • trino: support trino 0.323 special tuple type for struct results (ea1529d)
  • type-system: infer in-memory object types using pyarrow (f7018ee)
  • typehint: update type hint for class instance (2e1e14f)

Documentation

  • across: add documentation for across (b8941d3)
  • add allowed input for memtable constructor (69cdee5)
  • add disclaimer on no row order guarantees (75dd8b0)
  • add examples to if_any and if_all (5015677)
  • add platform comment in conda env creation (e38eacb)
  • add read_delta and related to backends docs (90eaed2)
  • api: ensure all top-level items have a description (c83d783)
  • api: hide dunder methods in API docs (6724b7b)
  • api: manually add inherited mixin methods to timey classes (7dbc96d)
  • api: show source for classes to allow dunder method inspection (4cef0f8)
  • backends: fix typo in pip install command (6a7207c)
  • bigquery: add connection explainer to bigquery backend docs (84caa5b)
  • blog: add Ibis + PyTorch + DuckDB blog post (1ad946c)
  • change plural variable name cols to col (c33a3ed), closes #6115
  • clarify map refers to Python Mapping container (f050a61)
  • css: enable code block copy button, don't select prompt (3510abe)
  • de-template remaining backends (except pandas, dask, impala) (82b7408)
  • describe NULL differences with pandas (688b293)
  • dev-env: remove python 3.8 from environment support matrix (4f89565)
  • drop docker-compose install for conda dev env setup (e19924d)
  • duckdb: add quick explainer on connecting to motherduck (4ef710e)
  • file support: add badge and docstrings for read_* methods (0767b7c)
  • fill out more docstrings (dc0289c)
  • fix errors and add 'table' before 'expression' (096b568)
  • fix some redirects (3a23c1f)
  • fix typo in Table.relabel return description (05cc51e)
  • generic: add docstring examples in types/generic (1d87292)
  • guides: add brief installation instructions at top of notebooks (dc3e694)
  • guides: update ibis-for-dplyr-users.ipynb with latest (1aa172e), closes #6125
  • improve docstrings for BooleanValue and BooleanColumn (30c1009)
  • improve docstrings to map types (72a49b0)
  • install: add quotes to all bracketed installs for shell compatibility (bb5c075)
  • intersphinx: add mapping to autolink pyarrow and pandas refs (cd92019)
  • intro: create Ibis for dplyr users document (e02a6f2)
  • introguides: use DuckDB for intro pandas notebook, remove iris (a7e845a)
  • link to Ibis for dplyr users (6e7c6a2)
  • make pandas.md filename lowercase (4937d45)
  • more group_by() and NULL in pandas guide (486b696)
  • more spelling fixes (564abbe)
  • move API docs to top-level (dcc409f)
  • numeric: add examples to numeric methods (39b470f)
  • oracle: add basic backend documentation (c871790)
  • oracle: add oracle to matrix (89aecf2)
  • python-versions: document how we decide to drop support for Python versions (3474dbc)
  • redirect Pandas to pandas (4074284)
  • remove trailing whitespace (63db643)
  • reorder sections in pandas guide (3b66093)
  • restructure and consistency (351d424)
  • snowflake: add connection explainer to snowflake backend docs (a62bbcd)
  • streamlit: fix ibis-framework install (a8cf773)
  • update copyright and some minor edits (b9aed44)
  • update notany/notall docstrings with arg (a5ec986), closes #5993
  • update structs and fix constructor docstrings (493437a)
  • use lowercase pandas (19b5d10)
  • use to_pandas instead of execute (882949e)

Refactors

  • alchemy: abstract out custom type mapping and fix sqlite (d712e2e)
  • api: consolidate ibis.date(), ibis.time() and ibis.timestamp() functions (20f71bf)
  • api: enforce at least one argument for Table set operations (57e948f)
  • api: remove automatic count name from relations (2cb19ec)
  • api: remove automatic group by count naming (15d9e50)
  • api: remove deprecated ibis.sequence() function (de0bf69)
  • api: remove deprecated Table.set_column() method (aa5ed94)
  • api: remove deprecated Table.sort_by() and Table.groupby() methods (1316635)
  • backends: remove ast_schema method (51b5ef8)
  • backends: remove backend specific DatabaseTable operations (d1bab97)
  • backends: remove deprecated Backend.load_data(), .exists_database() and .exists_table() methods (755555f)
  • backends: remove deprecated path argument of Backend.connect() (6737ea8)
  • bigquery: align datatype conversions with the new convention (70b8232)
  • bigquery: support a broader range of interval units in temporal binary operations (f78ce73)
  • common: add sanity checks for creating ENodes and Patterns (fc89cc3)
  • common: cleanup unit conversions (73de24e)
  • common: disallow unit conversions between days and hours (5619ce0)
  • common: move ibis.collections.DisjointSet to ibis.common.egraph (07dde21)
  • common: move tests for re_extract to general suite (acd1774)
  • common: use an enum as a sentinel value instead of NoMatch class (6674353), closes #6049
  • dask/pandas: align datatype conversions with the new convention (cecc24c)
  • datatypes: make pandas conversion backend specific if needed (544d27c)
  • datatypes: normalize interval values to integers (80a40ab)
  • datatypes: remove Set() in favor of Array() datatype (30a4f7e)
  • datatypes: remove value_type parametrization of the Interval datatype (463cdc3)
  • datatypes: remove direct ir dependency from datatypes (d7f0be0)
  • datatypes: use typehints instead of rules (704542e)
  • deps: remove optional dependency on clickhouse-cityhash and lz4 (736fe26)
  • dtypes: add normalize_datetime() and normalize_timezone() common utilities (c00ab38)
  • dtypes: turn dt.dtype() into lazily dispatched factory function (5261003)
  • formats: consolidate the dataframe conversion logic (53ed88e)
  • formats: encapsulate conversions to TypeMapper, SchemaMapper and DataMapper subclasses (ab35311)
  • formats: introduce a standalone subpackage to deal with common in-memory formats (e8f45f5)
  • impala: rely on impyla cursor for _wait_synchronous (a1b8736)
  • imports: move old UDF implementation to ibis.legacy module (cf93d5d)
  • ir: encapsulate temporal unit handling in enums (1b8fa7b)
  • ir: remove rlz.column_from, rlz.base_table_of and rlz.function_of rules (ed71d51)
  • ir: remove deprecated Value.summary() and NumericValue.summary() expression methods (6cd8050)
  • ir: remove redundant ops.NullLiteral() operation (a881703)
  • ir: simplify Expr._find_backends() implementation by using the ibis.common.graph utilities (91ff8d4)
  • ir: use dt.normalize() to construct literals (bf72f16)
  • ops.Hash: remove how from backend-specific hash operation (46a55fc)
  • pandas: solve and remove stale TODOs (92d979e)
  • polars: align datatype conversion functions with the new convention (5d61159)
  • postgres: fail at execute time for UDFs to avoid db connections in .compile() (e3a4d4d)
  • pyspark: align datatype conversion functions with the new convention (3437bb6)
  • pyspark: remove useless window branching in compiler (ad08da4)
  • replace custom _merge using pd.merge (fe74f76)
  • schema: remove deprecated Schema.merge() method (d307722)
  • schema: use type annotations instead of rules (98cd539)
  • snowflake: add flags to supplemental JavaScript UDFs (054add4)
  • sql: align datatype conversions with the new convention (0ef145b)
  • sqlite: remove roundtripping for DayOfWeekIndex and DayOfWeekName (b5a2bc5)
  • test: cleanup test data (7ae2b24)
  • to-pyarrow-batches: ensure that batch readers are always closed and exhausted (35a391f)
  • trino: always clean up prepared statements created when accessing query metadata (4f3a4cd)
  • util: use base32 to compress uuid table names (ba039a3)

Performance

  • imports: speed up checking for geospatial support (aa601af)
  • snowflake: use pyarrow for all transport (1fb89a1)
  • sqlalchemy: lazily construct the inspector object (8db5624)

Deprecations

  • api: deprecate tuple syntax for order by keys (5ed5110)