@code-kern-ai

Kern AI

Building data-centric open-source tools for NLP

Hi there 👋

We are Kern AI, a team of ambitious data engineers and scientists aiming to make your life as a developer a bit easier. Our libraries and tools aim at improving the data-centric AI lifecycle.

We mainly maintain and publish refinery, the data scientist's open-source choice to scale, assess and maintain natural language data. Also, we maintain bricks, a collection of open-source modular NLP enrichments.

🪢 Community and contact

Feel free to join our community spaces, where we discuss recent findings in data-centric AI.

We send out a (mostly) weekly newsletter about recent findings in data-centric AI, product highlights in development and more. You can subscribe to the newsletter here.

Also, you can follow us on Twitter and LinkedIn.

GitHub Discussions · Discord · Twitter · LinkedIn · YouTube · Kern AI Docs · Website

Pinned

  1. refinery Public

    The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.

    Python · 1.4k stars · 69 forks

  2. bricks Public

    Open-source natural language enrichments at your fingertips.

    Python · 454 stars · 23 forks

  3. refinery-python-sdk Public

    Official Python SDK for Kern AI refinery.

    Python · 18 stars · 3 forks

  4. refinery-sample-projects Public

    Contains example projects you can use to test refinery. Select the use case from the branches.

    24 stars · 5 forks

  5. automl-docker Public archive

    CLI-based tool that automatically builds ML models from training data and packages them into a servable Docker container.

    Python · 57 stars · 7 forks

Repositories

Showing 10 of 52 repositories
  • refinery-submodule-model Public

    Data model for refinery. Manages entities and their access for multiple services, e.g. the gateway.

    Python 2 Apache-2.0 1 0 4 Updated Dec 13, 2024
  • cicd-deployment-scripts Public

    Scripts used for Kern AI CI/CD efforts.

    Shell 0 0 1 1 Updated Dec 13, 2024
  • refinery-weak-supervisor Public

    Weak supervision for refinery. Manages the integration of heuristics such as labeling functions, active learners or zero-shot classifiers. Uses the weak-nlp library for the actual integration logic and algorithms.

    Python 0 Apache-2.0 1 0 1 Updated Dec 13, 2024
  • refinery-neural-search Public

    Neural search for refinery. Manages similarity search powered by Qdrant and outlier detection, both based on vector representations of the project records.

    Python 5 Apache-2.0 1 0 2 Updated Dec 13, 2024
  • refinery-embedder Public

    Embedder for refinery. Manages the creation of document- and token-level embeddings using the embedders library.

    Python 1 Apache-2.0 1 0 15 Updated Dec 13, 2024
  • refinery-updater Public

    Updater for refinery. Manages migration logic to new versions if required.

    Python 0 Apache-2.0 1 0 1 Updated Dec 13, 2024
  • refinery-tokenizer Public

    Tokenizer for refinery. Manages the creation and storage of spaCy tokens for text-based record attributes and supports multiple language models. It is used by the gateway.

    Python 1 Apache-2.0 1 0 1 Updated Dec 13, 2024
  • refinery-gateway Public

    Gateway for refinery. Manages incoming requests and holds the workflow logic. To interact with the gateway, the UI or Python SDK can be used.

    Python 0 Apache-2.0 3 2 5 Updated Dec 13, 2024
  • refinery-ml-exec-env Public

    Execution environment for the active learning module in refinery. A containerized function-as-a-service that builds active learning models using scikit-learn and sequence-learn.

    Python 0 Apache-2.0 1 1 12 Updated Dec 13, 2024
  • refinery-submodule-parent-images Public

    Submodule that contains the requirements of the different parent images of refinery.

    Python 1 Apache-2.0 0 0 7 Updated Dec 13, 2024