We think methods like these are promising because language models already learn a great deal about human values during pretraining. Learning about human values is not unlike learning about other subjects, and we should expect larger models to have a more accurate picture of human values and to fi