PKU-Alignment

16 repositories

align-anything
Public
Align Anything: Training All-modality Model with Feedback
chameleon multimodal dpo large-language-models rlhf vision-language-model
Python • Apache License 2.0 • 54 • 272 • 6 • 0 • Updated Dec 18, 2024
ProgressGym
Public
Alignment with a millennium of moral progress. Spotlight@NeurIPS 2024 Track on Datasets and Benchmarks.
Python • MIT License • 3 • 15 • 0 • 0 • Updated Dec 11, 2024
aligner
Public
NeurIPS 2024 Oral: Achieving Efficient Alignment through Learned Correction
Python • 7 • 0 • 0 • 0 • Updated Nov 19, 2024
.github
Public
0 • 0 • 0 • 0 • Updated Nov 9, 2024
Aligner2024.github.io
Public
HTML • 1 • 0 • 0 • 0 • Updated Oct 31, 2024
omnisafe
Public
JMLR: OmniSafe is an infrastructural framework for accelerating SafeRL research.
benchmark machine-learning reinforcement-learning deep-learning deep-reinforcement-learning constraint-satisfaction-problem pytorch safety-critical saferl safe-reinforcement-learning
Python • Apache License 2.0 • 133 • 955 • 9 • 3 • Updated Oct 15, 2024
safe-sora
Public
SafeSora is a human preference dataset designed to support safety alignment research in the text-to-video generation field, aiming to enhance the helpfulness and harmlessness of Large Vision Models (LVMs).
alignment human-preferences text-to-video-generation large-vision-models
Python • 5 • 27 • 1 • 0 • Updated Aug 20, 2024
safe-rlhf
Public
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
reinforcement-learning transformers transformer safety llama gpt datasets beaver alpaca ai-safety
Python • Apache License 2.0 • 119 • 1.4k • 15 • 0 • Updated Jun 13, 2024
llms-resist-alignment
Public
Repo for paper "Language Models Resist Alignment"
alignment llama safe alpaca ai-safety vicuna llm llms rlhf safe-rlhf
Python • 0 • 6 • 0 • 0 • Updated Jun 9, 2024
safety-gymnasium
Public
NeurIPS 2023: Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark
reinforcement-learning constraint-satisfaction-problem safety-critical safety-critical-systems safe-reinforcement-learning safe-reinforcement-learning-environments constraint-rl safe-policy-optimization
Python • Apache License 2.0 • 53 • 410 • 4 • 0 • Updated May 14, 2024
ProAgent
Public
ProAgent: Building Proactive Cooperative Agents with Large Language Models
language-model cooperative human-ai overcooked human-ai-interaction cooperative-ai llm-agent
JavaScript • MIT License • 7 • 63 • 1 • 0 • Updated Apr 8, 2024
SafeDreamer
Public
ICLR 2024: SafeDreamer: Safe Reinforcement Learning with World Models
reinforcement-learning constraint-satisfaction-problem safety-critical-systems safe-reinforcement-learning constraint-rl safe-policy-optimization
Python • Apache License 2.0 • 7 • 51 • 1 • 0 • Updated Apr 8, 2024
Safe-Policy-Optimization
Public
NeurIPS 2023: Safe Policy Optimization: A benchmark repository for safe reinforcement learning algorithms
benchmarks reinforcement-learning-algorithms safe safe-reinforcement-learning constrained-reinforcement-learning
Python • Apache License 2.0 • 45 • 333 • 1 • 0 • Updated Mar 20, 2024
AlignmentSurvey
Public
AI Alignment: A Comprehensive Survey
awesome reinforcement-learning ai deep-learning survey alignment papers interpretability red-teaming large-language-models
0 • 131 • 0 • 0 • Updated Nov 2, 2023
beavertails
Public
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
safety llama gpt datasets language-model beaver ai-safety human-feedback-data llm llms
Makefile • Apache License 2.0 • 5 • 116 • 2 • 0 • Updated Oct 27, 2023
ReDMan
Public
ReDMan is an open-source simulation platform that provides a standardized implementation of safe RL algorithms for Reliable Dexterous Manipulation.
Python • Apache License 2.0 • 2 • 16 • 0 • 0 • Updated May 2, 2023