This note focuses on the multi-GPU OOM problem when submitting ChatGLM2-6B training via python/ray; the code and versions are the same as in the previous post:
python main.py --do_train --train_file ./data.json --validation_file ./data.json --preprocessing_num_workers 1 --prompt_column instruction --response_column output --overwrite_cache --model_name_or_path THUDM/chatglm2-6b --output_dir ./chatglm2-6b-pt2t --overwrite_output_dir --max_source_length 512 --max_target_length 512 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --gradient_accumulation_steps 1 --predict_with_generate --max_steps 12000 --logging_steps 10 --save_steps 1000 --learning_rate 2e-4 --pre_seq_len 128 --quantization_bit 4
Setting --pre_seq_len selects the fp16 path, --quantization_bit selects the quantized path, and with neither set training runs in fp32. As explained in the previous post, each precision consumes a different amount of GPU memory.
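The precision selection above can be sketched as a small decision function. This is an illustrative simplification (the function name and structure are mine, not the actual main.py code; quantization_bit is checked first because it constrains the weight format):

```python
# Sketch of the precision choice driven by the two CLI flags:
#   --quantization_bit -> intX quantized weights (4 or 8 bit)
#   --pre_seq_len      -> fp16 P-Tuning v2
#   neither            -> fp32 full fine-tuning
def select_precision(pre_seq_len=None, quantization_bit=None):
    if quantization_bit is not None:
        return f"int{quantization_bit}"  # quantized weight path
    if pre_seq_len is not None:
        return "fp16"                    # half-precision prefix tuning
    return "fp32"                        # default full precision
```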
1. Because multi-GPU training involves the device_map mapping problem, the code needs to be changed:
import os  # required for reading LOCAL_RANK from the environment

device_map = {"": int(os.environ.get("LOCAL_RANK") or 0)}
model = AutoModel.from_pretrained(
    model_args.model_name_or_path,
    config=config,
    trust_remote_code=True,
    device_map=device_map,
)
2. Note that the visible GPUs should be made explicit via the environment variable; Ray then picks them up automatically.
For example: export CUDA_VISIBLE_DEVICES=1,2,3,4
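How the device_map line interacts with this: each worker places the entire model (the "" key means all modules) on its own local GPU, where LOCAL_RANK is set by the launcher (e.g. torchrun or Ray Train) and indexes into the devices exposed by CUDA_VISIBLE_DEVICES. A minimal sketch (the helper name is mine):

```python
import os

# Each worker maps the whole model ("" = every module) onto its local GPU.
# LOCAL_RANK is injected by the launcher; without it we fall back to GPU 0.
def build_device_map():
    local_rank = int(os.environ.get("LOCAL_RANK") or 0)
    return {"": local_rank}
```

So with CUDA_VISIBLE_DEVICES=1,2,3,4, the worker with LOCAL_RANK=0 runs on physical GPU 1, LOCAL_RANK=1 on GPU 2, and so on.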
3. For intX quantization (--quantization_bit=4 or 8),
the error reported is:
THUDM/chatglm2-6b/quantization.py", line 137, in __init__
self.weight = torch.round(weight / self.weight_scale[:, None]).to(torch.int8)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 214.00 MiB (GPU 0; 15.78 GiB total capacity; 14.36 GiB already allocated; 49.50 MiB free; 14.88 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
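The error message itself points to one mitigation when reserved memory far exceeds allocated memory: capping the splittable block size of PyTorch's caching allocator to reduce fragmentation. A hedged example (128 MiB is only an illustrative value; tune it for your workload):

```shell
# Limit splittable block size (in MiB) in PyTorch's CUDA caching
# allocator to reduce fragmentation-induced OOMs.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
```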
Based on the error, inspect the source and compare how v1 and v2 implement this part of the code differently.
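For intuition on why that exact line OOMs: `weight / self.weight_scale[:, None]` materializes a full-size temporary tensor on GPU 0 alongside the original weights before the int8 cast, so peak memory roughly doubles during quantization. Below is an illustrative NumPy reproduction of the rounding step, not the real quantization.py code (the per-row max-abs scale formula is my assumption about how weight_scale is computed):

```python
import numpy as np

# Illustrative per-output-channel symmetric int8 quantization, mirroring
# the shape of quantization.py line 137:
#   q = round(weight / weight_scale[:, None]).astype(int8)
def quantize_int8(weight):
    # Assumed scale: largest |value| in each row maps to 127.
    weight_scale = np.abs(weight).max(axis=1) / 127.0
    # The division creates a temporary the same size as `weight` --
    # on GPU this temporary plus the original is what triggers the OOM.
    q = np.round(weight / weight_scale[:, None]).astype(np.int8)
    return q, weight_scale
```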