6.0.0
6.0.0 (2023-07-05)
⚠ BREAKING CHANGES
- imports: Use of `ibis.udf` as a module is removed. Use `ibis.legacy.udf` instead.
- The minimum supported Python version is now Python 3.9.
- api: `group_by().count()` no longer automatically names the count aggregation `count`. Use `relabel` to rename columns.
- backends: `Backend.ast_schema` is removed. Use `expr.as_table().schema()` instead.
- snowflake/postgres: Postgres UDFs now use the new `@udf.scalar.python` API. This should be a low-effort replacement for the existing API.
- ir: `ops.NullLiteral` is removed.
- datatypes: `dt.Interval` no longer has a default unit; `dt.interval` is removed.
- deps: the lower bound of `snowflake-connector-python` was increased to 3.0.2, the minimum version needed to avoid a high-severity vulnerability. Please upgrade `snowflake-connector-python` to at least version 3.0.2.
- api: `Table.difference()`, `Table.intersection()`, and `Table.union()` now require at least one argument.
- postgres: Ibis no longer automatically defines `first`/`last` reductions on connection to the postgres backend. Use the DDL shown in https://wiki.postgresql.org/wiki/First/last_(aggregate) or one of the `pgxn` implementations instead.
- api: `ibis.examples.<example-name>.fetch` no longer forwards arbitrary keyword arguments to `read_csv`/`read_parquet`.
- datatypes: the `dt.Interval.value_type` attribute is removed.
- api: `Table.count()` is no longer automatically named `"count"`. Use `Table.count().name("count")` to achieve the previous behavior.
- trino: The trino backend now requires at least version 0.321 of the `trino` Python package.
- backends: removed `AlchemyTable`, `AlchemyDatabase`, `DaskTable`, `DaskDatabase`, `PandasTable`, `PandasDatabase`, `PySparkDatabaseTable`; use `ops.DatabaseTable` instead.
- dtypes: temporal unit enums are now available under `ibis.common.temporal` instead of `ibis.common.enums`.
- clickhouse: `external_tables` can no longer be passed in `ibis.clickhouse.connect`. Pass `external_tables` directly in `raw_sql`/`execute`/`to_pyarrow`/`to_pyarrow_batches()`.
- datatypes: `dt.Set` is now an alias for `dt.Array`.
- bigquery: Before this change, an ibis timestamp mapped to the BigQuery TIMESTAMP type, with no timezone support. That mapping was incorrect: the BigQuery TIMESTAMP type is in UTC, while DATETIME is the timezone-naive type. This change therefore breaks the previous mapping: an ibis timestamp with the UTC timezone maps to BigQuery TIMESTAMP, and an ibis timestamp with no timezone maps to BigQuery DATETIME.
- impala: Cursors are no longer returned from DDL operations to prevent resource leakage. Use `raw_sql` if you need specialized operations that return a cursor. Additionally, table-based DDL operations now return the table they're operating on.
- api: `Column.first()`/`Column.last()` are now reductions by default. Code running these expressions in isolation will no longer be windowed over the entire table. Code using these functions in `select`-based APIs should function unchanged.
- bigquery: when using the bigquery backend, casting float to int will no longer round floats to the nearest integer.
- ops.Hash: The `hash` method on table columns no longer accepts the `how` argument. The hashing functions available are highly backend-dependent, and the intention of the hash operation is to provide a fast, consistent (on the same backend only) integer value. If you have been passing in a value for `how`, you can remove it and you will get the same results as before, as there were no backends with multiple hash functions working.
- duckdb: Some CSV files may now have headers that did not have them previously. Set `header=False` to get the previous behavior.
- deps: New environments will have a different default setting for `compression` in the ClickHouse backend due to removal of optional dependencies. Ibis is still capable of using the optional dependencies but doesn't include them by default. Install `clickhouse-cityhash` and `lz4` to preserve the previous behavior.
- api: `Table.set_column()` is removed; use `Table.mutate(name=expr)` instead.
- api: the `suffixes` argument in all join methods has been removed in favor of `lname`/`rname` args. The default renaming scheme for duplicate columns has also changed. To get the exact same behavior as before, pass in `lname="{name}_x", rname="{name}_y"`.
- ir: `IntervalType.unit` is now an enum instead of a string.
- type-system: Inferred types of Python objects may be slightly different. Ibis now uses `pyarrow` to infer the column types of pandas DataFrames and other types.
- backends: the `path` argument of `Backend.connect()` is removed; use the `database` argument instead.
- api: removed `Table.sort_by()` and `Table.groupby()`; use `.order_by()` and `.group_by()` respectively.
- datatypes: `DataType.scalar` and `column` class attributes are now strings.
- backends: `Backend.load_data()`, `Backend.exists_database()` and `Backend.exists_table()` are removed.
- ir: `Value.summary()` and `NumericValue.summary()` are removed.
- schema: `Schema.merge()` is removed; use the union operator `schema1 | schema2` instead.
- api: `ibis.sequence()` is removed.
- drop support for Python 3.8 (747f4ca)
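
The `lname`/`rname` join arguments listed above take a template containing `{name}`. The sketch below is a plain-Python illustration of how such a template expands for colliding column names. It is not the Ibis implementation: the helper `rename_collisions` is hypothetical, and it assumes the templates are applied only to columns present on both sides.

```python
def rename_collisions(left_cols, right_cols, lname="{name}_x", rname="{name}_y"):
    """Hypothetical helper: apply name templates to columns found on both sides."""
    overlap = set(left_cols) & set(right_cols)
    # Only colliding names are rewritten; unique names pass through unchanged.
    left = [lname.format(name=c) if c in overlap else c for c in left_cols]
    right = [rname.format(name=c) if c in overlap else c for c in right_cols]
    return left, right


left, right = rename_collisions(["id", "value"], ["id", "score"])
print(left)   # ['id_x', 'value']
print(right)  # ['id_y', 'score']
```

With these two templates the output matches the old `suffixes=("_x", "_y")` naming, which is why the note above recommends them for code that depends on the previous column names.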
Features
- add dask windowing (9cb920a)
- add easy type hints to GroupBy (da330b1)
- add microsecond method to TimestampValue and TimeValue (e9df2da)
- api: add `__dataframe__` implementation (b3d9619)
- api: add ALL_CAPS option to Table.relabel (c0b30e2)
- api: add first/last reduction APIs (8c01980)
- api: add zip operation and api (fecf695)
- api: allow passing multiple keyword arguments to `ibis.interval` (22ee854)
- api: better repr and pickle support for deferred expressions (2b1ec9c)
- api: exact median (c53031c)
- api: raise better error on column name collision in joins (e04c38c)
- api: replace `suffixes` in `join` with `lname`/`rname` (3caf3a1)
- api: support abstract type names in `selectors.of_type` (f6d2d56)
- api: support list of strings and single strings in the `across` selector (a6b60e7)
- api: use `create_table` to load example data (42e09a4)
- bigquery: add client and storage_client params to connect (4cf1354)
- bigquery: enable group_concat over windows (d6a1117)
- cast: add table-level try_cast (5e4d16b)
- clickhouse: add array zip impl (efba835)
- clickhouse: move to clickhouse supported Python client (012557a)
- clickhouse: set default engine to native file (29815fa)
- clickhouse: support pyarrow decimal types (7472dd5)
- common: add a pure python egraph implementation (aed2ed0)
- common: add pattern matchers (b515d5c)
- common: add support for start parameter in StringFind (31ce741)
- common: add Topmost and Innermost pattern matchers (90b48fc)
- common: implement copy protocol for Immutable base class (e61c66b)
- create_table: support pyarrow Table in table creation (9dbb25c)
- datafusion: add string functions (66c0afb)
- datafusion: add support for scalar pyarrow UDFs (45935b7)
- datafusion: minimal decimal support (c550780)
- datafusion: register tables and datasets in datafusion (cb2cc58)
- datatypes: add support for decimal values with arrow-based APIs (b4ba6b9)
- datatypes: support creating Timestamp from units (66f2ff0)
- deps: load examples lazily (4ea0ddb)
- duckdb: add attach_sqlite method (bd32649)
- duckdb: add support for native and pyarrow UDFs (7e56fc4)
- duckdb: expand map support to `.values()` and map concatenation (ad49a09)
- duckdb: set `header=True` by default (e4b515d)
- duckdb: support 0.8.0 (ae9ae7d)
- duckdb: support array zip operation (2d14ccc)
- duckdb: support motherduck (053dc7e)
- duckdb: warn when querying an already consumed RecordBatchReader (5a013ff)
- flink: add initial flink SQL compiler (053a6d2)
- formats: support timestamps in delta output; default to micros for pyarrow conversion (d8d5710)
- implement read_delta and to_delta for some backends (74fc863)
- implement read_delta for datafusion (eb4602f)
- implement try_cast for a few backends (f488f0e)
- io: add `to_torch` API (685c8fc)
- io: add az/gs prefixes to normalize_filename in utils (e9eebba)
- mysql: add re_extract (5ed40e1)
- oracle: add oracle backend (c9b038b)
- oracle: support temporary tables (6e64cd0)
- pandas: add approx_median (6714b9f)
- pandas: support passing memtables to `create_table` (3ea9a21)
- polars: add any and all reductions (0bd3c01)
- polars: add argmin and argmax (78562d3)
- polars: add correlation operation (05ff488)
- polars: add polars support for `identical_to` (aab3bae)
- polars: add support for `offset`, binary literals, and `dropna(how='all')` (d2298e9)
- polars: allow seamless connection for DataFrame as well as LazyFrame (a2a3e45)
- polars: implement `.sql` methods (86f2a34)
- polars: lower-latency column return for non-temporal results (b009563)
- polars: support pyarrow decimal types (7e6c365)
- polars: support SQL dialect translation (c87f695)
- polars: support table registration from multiple parquet files (9c0a8be)
- postgres: add ApproxMedian aggregation (887f572)
- pyspark: add zip array impl (6c00cbc)
- snowflake/postgres: scalar UDFs (dbf5b62)
- snowflake: implement array zip (839e1f0)
- snowflake: implement proper approx median (b15a6fe)
- snowflake: support SSO and other forms of passwordless authentication (23ac53d)
- snowflake: use the client python version as the UDF runtime where possible (69a9101)
- sql: allow any SQL dialect accepted by sqlglot in `Table.sql` and `Backend.sql` (f38c447)
- sqlite: add argmin and argmax functions (c8af9d4)
- sqlite: add arithmetic mode aggregation (6fcac44)
- sqlite: add ops.DateSub, ops.DateAdd, ops.DateDiff (cfd65a0)
- streamlit: add support for streamlit connection interface (05c9449)
- trino: implement zip (cd11daa)
Bug Fixes
- add issue write permission to assign.yml (9445cee)
- alchemy: close the cursor on error during dataframe construction (cc7dffb)
- backends: fix capitalize to lowercase subsequent characters (49978f9)
- backends: fix notall/notany translation (56b56b3)
- bigquery: add srid=4326 to the geography dtype mapping (57a825b)
- bigquery: allow passing both schema and obj in create_table (49cc2c4)
- bigquery: bigquery timestamp and datetime dtypes (067e8a5)
- bigquery: ensure that bigquery temporal ops work with the new timeunit/dateunit/intervalunit enums (0e00d86)
- bigquery: ensure that generated names are used when compiling columns and allow flexible column names (c7044fe)
- bigquery: fix table naming from `count` rename removal refactor (5b009d2)
- bigquery: raise OperationNotDefinedError for IntervalAdd and IntervalSubtract (501aaf7)
- bigquery: support capture group functionality (3f4f05b)
- bigquery: truncate when casting float to int (267d8e1)
- ci: use mariadb-admin instead of mysqladmin in mariadb 11.x (d4ccd3d)
- clickhouse: avoid generating names for structs (5d11f48)
- clickhouse: clean up external tables per query to avoid leaking them across queries (6d32edd)
- clickhouse: close cursors more aggressively (478a40f)
- clickhouse: use correct functions for milli and micro extraction (49b3136)
- clickhouse: use named rather than positional group by (1f7e309)
- clickhouse: use the correct dialect to generate subquery string for Contains operation (f656bd5)
- common: fix bug in re_extract (6ebaeab), closes #6167
- core: interval resolution should upcast to smallest unit (f7f844d), closes #6139
- datafusion: fix incorrect order of predicate -> select compilation (0092304)
- deps: make pyarrow a required dependency (b217cde)
- deps: prevent vulnerable snowflake-connector-python versions (6dedb45)
- deps: support multipledispatch version 1 (805a7d7)
- deps: update dependency atpublic to v4 (3a44755)
- deps: update dependency datafusion to v22 (15d8d11)
- deps: update dependency datafusion to v23 (e4d666d)
- deps: update dependency datafusion to v24 (c158b78)
- deps: update dependency datafusion to v25 (c3a6264)
- deps: update dependency datafusion to v26 (7e84ffe)
- deps: update dependency deltalake to >=0.9.0,<0.11.0 (9817a83)
- deps: update dependency pyarrow to v12 (3cbc239)
- deps: update dependency sqlglot to v12 (5504bd4)
- deps: update dependency sqlglot to v13 (1485dd0)
- deps: update dependency sqlglot to v14 (9c40c06)
- deps: update dependency sqlglot to v15 (f149729)
- deps: update dependency sqlglot to v16 (46601ef)
- deps: update dependency sqlglot to v17 (9b50fb4)
- docs: fix failing doctests (04b9f19)
- docs: typo in code without selectors (b236893)
- docs: typo in docstrings and comments (0d3ed86)
- docs: typo in snowflake do_connect kwargs (671bc31)
- duckdb: better types for null literals (7b9d85e)
- duckdb: disable map values and map merge for columns (b5472b3)
- duckdb: ensure `to_timestamp` returns a UTC timestamp (0ce0b9f)
- duckdb: ensure connection lifetime is greater than or equal to record batch reader lifetime (6ed353e)
- duckdb: ensure that quoted struct field names work (47de1c3)
- duckdb: ensure that types are inferred correctly across `duckdb_engine` versions (9c3d173)
- duckdb: fix check for literal maps (b2b229b)
- duckdb: fix exporting pyarrow record batches by bumping duckdb to 0.8.1 (aca52ab)
- duckdb: fix read_csv problem with kwargs (6f71735), closes #6190
- examples: move lockfile creation to data directory (b8f6e6b)
- examples: use filelock to prevent pooch from clobbering files when fetching concurrently (e14662e)
- expr: fix graphviz rendering (6d4a34f)
- impala: do not cast `ca_cert` `None` value to string (bfdfb0e)
- impala: expose `hdfs_connect` function as `ibis.impala.hdfs_connect` (27a0d12)
- impala: more aggressively clean up cursors internally (bf5687e)
- impala: replace `time_mapping` with `TIME_MAPPING` and backwards compatible check (4c3ca20)
- ir: force an alias if projecting or aggregating columns (9fb1e88)
- ir: raise Exception for group by with no keys (845f7ab), closes #6237
- mssql: don't yield from inside a cursor (4af0731)
- mysql: do not fail when we cannot set the session timezone (930f8ab)
- mysql: ensure enum string functions are coerced to the correct type (e499c7f)
- mysql: ensure that floats and double do not come back as Python Decimal objects (a3c329f)
- mysql: fix binary literals (e081252)
- mysql: handle the zero timestamp value (9ac86fd)
- operations: ensure that self refs have a distinct name from the table they are referencing (bd8eb88)
- oracle: disable autoload when cleaning up temp tables (b824142)
- oracle: disable statement cache (41d3857)
- oracle: disable temp tables to get inserts working (f9985fe)
- pandas, dask: allow overlapping non-predicate columns in asof join (09e26a0)
- pandas: fix first and last over windows (9079bc4), closes #5417
- pandas: fix string translate function (12b9569), closes #6157
- pandas: grouped aggregation using a case statement (d4ac345)
- pandas: preserve RHS values in asof join when column names collide (4514668)
- pandas: solve problem with first and last window function (dfdede5), closes #4918
- polars: avoid `implode` deprecation warning (ce3bdad)
- polars: ensure that `to_pyarrow` is called from the backend (41bacf2)
- polars: make list column operations backwards compatible (35fc5f7)
- postgres: ensure that `alias` method overwrites view even if types are different (7d5845b)
- postgres: ensure that backend still works when create/drop first/last aggregates fails (eb5d534)
- pyspark: enable joining on columns with different names as well as complex predicates (dcee821)
- snowflake: always use pyarrow for memtables (da34d6f)
- snowflake: ensure connection lifetime is greater than or equal to record batch reader lifetime (34a0c59)
- snowflake: ensure that `_pandas_converter` attribute is resolved correctly (9058bbe)
- snowflake: ensure that temp tables are only created once (43b8152)
- snowflake: ensure unnest works for nested struct/object types (fc6ffc2)
- snowflake: ensure use of the right timezone value (40426bf)
- snowflake: fix `tmpdir` construction for python <3.10 (a507ae2)
- snowflake: fix incorrect arguments to snowflake regexp_substr (9261f70)
- snowflake: fix invalid attribute access when using pyarrow (bfd90a8)
- snowflake: handle broken upstream behavior when a table can't be found (31a8366)
- snowflake: resolve import error from interval datatype refactor (3092012)
- snowflake: use `convert_timezone` for timezone conversion instead of invalid postgres `AT TIME ZONE` syntax (1595e7b)
- sqlalchemy: ensure that backends don't clobber tables needed by inputs (76e38a3)
- sqlalchemy: ensure that union_all-generated memtables use the correct column names (a4f546b)
- sqlalchemy: prepend the table's schema when querying metadata (d8818e2)
- sqlalchemy: quote struct field names (f5c91fc)
- tests: ensure that record batch readers are cleaned up (d230a8d)
- trino: bump lower bound to avoid having to handle `experimental_python_types` (bf6eeab)
- trino: ensure that nested array types are inferred correctly (030f76d)
- trino: fix incorrect `version` computation (04d3a89)
- trino: support trino 0.323 special tuple type for struct results (ea1529d)
- type-system: infer in-memory object types using pyarrow (f7018ee)
- typehint: update type hint for class instance (2e1e14f)
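
The BigQuery cast fix in this list (truncate when casting float to int) changes results for non-integral values. The snippet below is a pure-Python illustration of the difference between truncating and rounding. It does not call BigQuery or Ibis, and the sample values are arbitrary.

```python
import math

values = [2.7, -2.7, 1.2]

# Truncation drops the fractional part (moves toward zero).
truncated = [math.trunc(v) for v in values]

# Rounding picks the nearest integer instead.
rounded = [round(v) for v in values]

print(truncated)  # [2, -2, 1]
print(rounded)    # [3, -3, 1]
```

Code that relied on the old rounding behavior will see different results for values such as 2.7 after upgrading.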
Documentation
- across: add documentation for across (b8941d3)
- add allowed input for memtable constructor (69cdee5)
- add disclaimer on no row order guarantees (75dd8b0)
- add examples to `if_any` and `if_all` (5015677)
- add platform comment in conda env creation (e38eacb)
- add read_delta and related to backends docs (90eaed2)
- api: ensure all top-level items have a description (c83d783)
- api: hide dunder methods in API docs (6724b7b)
- api: manually add inherited mixin methods to timey classes (7dbc96d)
- api: show source for classes to allow dunder method inspection (4cef0f8)
- backends: fix typo in pip install command (6a7207c)
- bigquery: add connection explainer to bigquery backend docs (84caa5b)
- blog: add Ibis + PyTorch + DuckDB blog post (1ad946c)
- change plural variable name cols to col (c33a3ed), closes #6115
- clarify map refers to Python Mapping container (f050a61)
- css: enable code block copy button, don't select prompt (3510abe)
- de-template remaining backends (except pandas, dask, impala) (82b7408)
- describe NULL differences with pandas (688b293)
- dev-env: remove python 3.8 from environment support matrix (4f89565)
- drop `docker-compose` install for conda dev env setup (e19924d)
- duckdb: add quick explainer on connecting to motherduck (4ef710e)
- file support: add badge and docstrings for `read_*` methods (0767b7c)
- fill out more docstrings (dc0289c)
- fix errors and add 'table' before 'expression' (096b568)
- fix some redirects (3a23c1f)
- fix typo in Table.relabel return description (05cc51e)
- generic: add docstring examples in types/generic (1d87292)
- guides: add brief installation instructions at top of notebooks (dc3e694)
- guides: update ibis-for-dplyr-users.ipynb with latest (1aa172e), closes #6125
- improve docstrings for BooleanValue and BooleanColumn (30c1009)
- improve docstrings to map types (72a49b0)
- install: add quotes to all bracketed installs for shell compatibility (bb5c075)
- intersphinx: add mapping to autolink pyarrow and pandas refs (cd92019)
- intro: create Ibis for dplyr users document (e02a6f2)
- introguides: use DuckDB for intro pandas notebook, remove iris (a7e845a)
- link to Ibis for dplyr users (6e7c6a2)
- make pandas.md filename lowercase (4937d45)
- more group_by() and NULL in pandas guide (486b696)
- more spelling fixes (564abbe)
- move API docs to top-level (dcc409f)
- numeric: add examples to numeric methods (39b470f)
- oracle: add basic backend documentation (c871790)
- oracle: add oracle to matrix (89aecf2)
- python-versions: document how we decide to drop support for Python versions (3474dbc)
- redirect Pandas to pandas (4074284)
- remove trailing whitespace (63db643)
- reorder sections in pandas guide (3b66093)
- restructure and consistency (351d424)
- snowflake: add connection explainer to snowflake backend docs (a62bbcd)
- streamlit: fix ibis-framework install (a8cf773)
- update copyright and some minor edits (b9aed44)
- update notany/notall docstrings with arg (a5ec986), closes #5993
- update structs and fix constructor docstrings (493437a)
- use lowercase pandas (19b5d10)
- use to_pandas instead of execute (882949e)
Refactors
- alchemy: abstract out custom type mapping and fix sqlite (d712e2e)
- api: consolidate `ibis.date()`, `ibis.time()` and `ibis.timestamp()` functions (20f71bf)
- api: enforce at least one argument for `Table` set operations (57e948f)
- api: remove automatic `count` name from relations (2cb19ec)
- api: remove automatic group by count naming (15d9e50)
- api: remove deprecated `ibis.sequence()` function (de0bf69)
- api: remove deprecated `Table.set_column()` method (aa5ed94)
- api: remove deprecated `Table.sort_by()` and `Table.groupby()` methods (1316635)
- backends: remove `ast_schema` method (51b5ef8)
- backends: remove backend specific `DatabaseTable` operations (d1bab97)
- backends: remove deprecated `Backend.load_data()`, `.exists_database()` and `.exists_table()` methods (755555f)
- backends: remove deprecated `path` argument of `Backend.connect()` (6737ea8)
- bigquery: align datatype conversions with the new convention (70b8232)
- bigquery: support a broader range of interval units in temporal binary operations (f78ce73)
- common: add sanity checks for creating ENodes and Patterns (fc89cc3)
- common: cleanup unit conversions (73de24e)
- common: disallow unit conversions between days and hours (5619ce0)
- common: move `ibis.collections.DisjointSet` to `ibis.common.egraph` (07dde21)
- common: move tests for re_extract to general suite (acd1774)
- common: use an enum as a sentinel value instead of NoMatch class (6674353), closes #6049
- dask/pandas: align datatype conversions with the new convention (cecc24c)
- datatypes: make pandas conversion backend specific if needed (544d27c)
- datatypes: normalize interval values to integers (80a40ab)
- datatypes: remove `Set()` in favor of `Array()` datatype (30a4f7e)
- datatypes: remove `value_type` parametrization of the Interval datatype (463cdc3)
- datatypes: remove direct `ir` dependency from `datatypes` (d7f0be0)
- datatypes: use typehints instead of rules (704542e)
- deps: remove optional dependency on clickhouse-cityhash and lz4 (736fe26)
- dtypes: add `normalize_datetime()` and `normalize_timezone()` common utilities (c00ab38)
- dtypes: turn dt.dtype() into lazily dispatched factory function (5261003)
- formats: consolidate the dataframe conversion logic (53ed88e)
- formats: encapsulate conversions to TypeMapper, SchemaMapper and DataMapper subclasses (ab35311)
- formats: introduce a standalone subpackage to deal with common in-memory formats (e8f45f5)
- impala: rely on impyla cursor for _wait_synchronous (a1b8736)
- imports: move old UDF implementation to ibis.legacy module (cf93d5d)
- ir: encapsulate temporal unit handling in enums (1b8fa7b)
- ir: remove `rlz.column_from`, `rlz.base_table_of` and `rlz.function_of` rules (ed71d51)
- ir: remove deprecated `Value.summary()` and `NumericValue.summary()` expression methods (6cd8050)
- ir: remove redundant `ops.NullLiteral()` operation (a881703)
- ir: simplify `Expr._find_backends()` implementation by using the `ibis.common.graph` utilities (91ff8d4)
- ir: use `dt.normalize()` to construct literals (bf72f16)
- ops.Hash: remove `how` from backend-specific hash operation (46a55fc)
- pandas: solve and remove stale TODOs (92d979e)
- polars: align datatype conversion functions with the new convention (5d61159)
- postgres: fail at execute time for UDFs to avoid db connections in `.compile()` (e3a4d4d)
- pyspark: align datatype conversion functions with the new convention (3437bb6)
- pyspark: remove useless window branching in compiler (ad08da4)
- replace custom `_merge` using `pd.merge` (fe74f76)
- schema: remove deprecated `Schema.merge()` method (d307722)
- schema: use type annotations instead of rules (98cd539)
- snowflake: add flags to supplemental JavaScript UDFs (054add4)
- sql: align datatype conversions with the new convention (0ef145b)
- sqlite: remove roundtripping for DayOfWeekIndex and DayOfWeekName (b5a2bc5)
- test: cleanup test data (7ae2b24)
- to-pyarrow-batches: ensure that batch readers are always closed and exhausted (35a391f)
- trino: always clean up prepared statements created when accessing query metadata (4f3a4cd)
- util: use base32 to compress uuid table names (ba039a3)
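
The last entry above mentions base32-compressed UUID table names. The sketch below uses only the standard library to show why base32 gives a shorter name than the usual hex form. It is an assumption about the general technique, not a copy of the Ibis utility: the padding stripping and lowercasing are choices made for this example.

```python
import base64
import uuid

u = uuid.uuid4()

# Hex form: 16 bytes rendered as 32 hexadecimal characters.
hex_name = u.hex

# Base32 form: 5 bits per character, so 128 bits need 26 characters
# once the trailing "=" padding is stripped (assumed post-processing).
b32_name = base64.b32encode(u.bytes).decode("ascii").rstrip("=").lower()

print(len(hex_name), len(b32_name))  # 32 26
```

The base32 alphabet is limited to letters and the digits 2-7, so the result contains no punctuation.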
Performance
- imports: speed up checking for geospatial support (aa601af)
- snowflake: use pyarrow for all transport (1fb89a1)
- sqlalchemy: lazily construct the inspector object (8db5624)
Deprecations
- api: deprecate tuple syntax for order by keys (5ed5110)