This note focuses on the multi-GPU OOM problem when submitting ChatGLM2-6B training via python/ray; the code and versions are the same as in the previous post:
python main.py --do_train --train_file ./data.json --validation_file ./data.json --preprocessing_num_workers 1 --prompt_column instruction --response_column output --overwrite_cache --model_name_or_path THUDM/chatglm2-6b --output_dir ./chatglm2-6b-pt2t --overwrite_output_dir --max_source_length 512 --max_target_length 512 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --gradient_accumulation_steps 1 --predict_with_generate --max_steps 12000 --logging_steps 10 --save_steps 1000 --learning_rate 2e-4 --pre_seq_len 128 --quantization_bit 4
Setting --pre_seq_len selects the fp16 path, --quantization_bit selects the quantized path, and with neither set training runs in fp32. As explained in the previous post, each precision consumes a different amount of GPU memory.
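The precision selection above can be sketched as a small decision function. This is an illustrative simplification (the function name and structure are mine, not the actual main.py code; quantization_bit is checked first because it constrains the weight format):

```python
# Sketch of the precision choice driven by the two CLI flags:
#   --quantization_bit -> intX quantized weights (4 or 8 bit)
#   --pre_seq_len      -> fp16 P-Tuning v2
#   neither            -> fp32 full fine-tuning
def select_precision(pre_seq_len=None, quantization_bit=None):
    if quantization_bit is not None:
        return f"int{quantization_bit}"  # quantized weight path
    if pre_seq_len is not None:
        return "fp16"                    # half-precision prefix tuning
    return "fp32"                        # default full precision
```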
1. Because multi-GPU training involves the device_map mapping problem, the code needs to be changed:
import os  # required for reading LOCAL_RANK from the environment

device_map = {"": int(os.environ.get("LOCAL_RANK") or 0)}
model = AutoModel.from_pretrained(
    model_args.model_name_or_path,
    config=config,
    trust_remote_code=True,
    device_map=device_map,
)
2. Note that the visible GPUs should be made explicit via the environment variable; Ray then picks them up automatically.
For example: export CUDA_VISIBLE_DEVICES=1,2,3,4
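How the device_map line interacts with this: each worker places the entire model (the "" key means all modules) on its own local GPU, where LOCAL_RANK is set by the launcher (e.g. torchrun or Ray Train) and indexes into the devices exposed by CUDA_VISIBLE_DEVICES. A minimal sketch (the helper name is mine):

```python
import os

# Each worker maps the whole model ("" = every module) onto its local GPU.
# LOCAL_RANK is injected by the launcher; without it we fall back to GPU 0.
def build_device_map():
    local_rank = int(os.environ.get("LOCAL_RANK") or 0)
    return {"": local_rank}
```

So with CUDA_VISIBLE_DEVICES=1,2,3,4, the worker with LOCAL_RANK=0 runs on physical GPU 1, LOCAL_RANK=1 on GPU 2, and so on.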
3. For intX quantization (--quantization_bit=4 or 8),
the error reported is:
THUDM/chatglm2-6b/quantization.py", line 137, in __init__
self.weight = torch.round(weight / self.weight_scale[:, None]).to(torch.int8)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 214.00 MiB (GPU 0; 15.78 GiB total capacity; 14.36 GiB already allocated; 49.50 MiB free; 14.88 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
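The error message itself points to one mitigation when reserved memory far exceeds allocated memory: capping the splittable block size of PyTorch's caching allocator to reduce fragmentation. A hedged example (128 MiB is only an illustrative value; tune it for your workload):

```shell
# Limit splittable block size (in MiB) in PyTorch's CUDA caching
# allocator to reduce fragmentation-induced OOMs.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
```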
Based on the error, inspect the source and compare how v1 and v2 implement this part of the code differently.
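For intuition on why that exact line OOMs: `weight / self.weight_scale[:, None]` materializes a full-size temporary tensor on GPU 0 alongside the original weights before the int8 cast, so peak memory roughly doubles during quantization. Below is an illustrative NumPy reproduction of the rounding step, not the real quantization.py code (the per-row max-abs scale formula is my assumption about how weight_scale is computed):

```python
import numpy as np

# Illustrative per-output-channel symmetric int8 quantization, mirroring
# the shape of quantization.py line 137:
#   q = round(weight / weight_scale[:, None]).astype(int8)
def quantize_int8(weight):
    # Assumed scale: largest |value| in each row maps to 127.
    weight_scale = np.abs(weight).max(axis=1) / 127.0
    # The division creates a temporary the same size as `weight` --
    # on GPU this temporary plus the original is what triggers the OOM.
    q = np.round(weight / weight_scale[:, None]).astype(np.int8)
    return q, weight_scale
```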