Summary of How to Use the Hugging Face Mirror Site

oyxy2019 1,097 2024-11-17

Method 1: Download from the web page

Site address: https://hf-mirror.com/

Method 2: huggingface-cli

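huggingface-cli is the official command-line tool from Hugging Face and has full-featured download support. Recent releases of huggingface_hub expose the same tool as the hf command, which is the form used in the commands below.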

1. Install the dependency

pip install huggingface_hub

2. Set the environment variable
Linux

export HF_ENDPOINT=https://hf-mirror.com

Windows PowerShell

$env:HF_ENDPOINT = "https://hf-mirror.com"

On Linux, it is recommended to add the export line above to ~/.bashrc so that it persists across terminal sessions.
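Before downloading, it can help to confirm that the variable is actually visible to the processes you start. The following is a minimal sketch, not part of any Hugging Face tool: the helper name effective_endpoint is hypothetical, and the fallback to https://huggingface.co when the variable is unset is an assumption about the default behaviour.

```python
import os

# Assumed upstream default when HF_ENDPOINT is not set.
DEFAULT_ENDPOINT = "https://huggingface.co"


def effective_endpoint(env=None):
    """Return the endpoint a tool would likely use, without a trailing slash.

    `env` is any mapping of environment variables; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    value = env.get("HF_ENDPOINT", "").strip()
    return (value or DEFAULT_ENDPOINT).rstrip("/")


# Report what the current shell session has exported.
print("Effective endpoint:", effective_endpoint())
```

If this prints https://huggingface.co after you ran the export command, the variable was probably set in a different terminal session.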
3. Download
Download a model:

hf download gpt2 --local-dir gpt2

Download a dataset:

hf download wikitext --repo-type dataset --local-dir wikitext

Command reference:

$ hf download --help
Usage: hf download [OPTIONS] REPO_ID [FILENAMES]...

  Download files from the Hub.

Arguments:
  REPO_ID         The ID of the repo (e.g. `username/repo-name`).  [required]
  [FILENAMES]...  Files to download (e.g. `config.json`,
                  `data/metadata.jsonl`).

Options:
  --repo-type [model|dataset|space]
                                  The type of repository (model, dataset, or
                                  space).  [default: model]
  --revision TEXT                 Git revision id which can be a branch name,
                                  a tag, or a commit hash.
  --include TEXT                  Glob patterns to include from files to
                                  download. eg: *.json
  --exclude TEXT                  Glob patterns to exclude from files to
                                  download.
  --cache-dir TEXT                Directory where to save files.
  --local-dir TEXT                If set, the downloaded file will be placed
                                  under this directory. Check out https://hugg
                                  ingface.co/docs/huggingface_hub/guides/downl
                                  oad#download-files-to-local-folder for more
                                  details.
  --force-download / --no-force-download
                                  If True, the files will be downloaded even
                                  if they are already cached.  [default: no-
                                  force-download]
  --dry-run / --no-dry-run        If True, perform a dry run without actually
                                  downloading the file.  [default: no-dry-run]
  --token TEXT                    A User Access Token generated from
                                  https://huggingface.co/settings/tokens.
  --quiet / --no-quiet            If True, progress bars are disabled and only
                                  the path to the download files is printed.
                                  [default: no-quiet]
  --max-workers INTEGER           Maximum number of workers to use for
                                  downloading files. Default is 8.  [default:
                                  8]
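The --include and --exclude options in the help text above take glob patterns. The sketch below illustrates how such patterns select files, using Python's standard fnmatch module. It is an approximation for building intuition, not the CLI's actual implementation; the function name select_files and the sample file list are made up for the example.

```python
from fnmatch import fnmatch


def select_files(filenames, include=None, exclude=None):
    """Keep names that match any include pattern (if given) and no exclude pattern."""
    selected = []
    for name in filenames:
        if include and not any(fnmatch(name, pattern) for pattern in include):
            continue  # an include list was given and this name matches none of it
        if exclude and any(fnmatch(name, pattern) for pattern in exclude):
            continue  # explicitly excluded
        selected.append(name)
    return selected


# Hypothetical repository listing, for illustration only.
files = ["config.json", "tokenizer.json", "model.safetensors", "README.md"]

print(select_files(files, include=["*.json"]))
print(select_files(files, exclude=["*.safetensors"]))
```

On the command line, quote the patterns (for example --include "*.json") so that the shell does not expand them before the tool sees them.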

Method 3: Download with hfd

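hfd is a dedicated Hugging Face download tool developed by the mirror site (hf-mirror.com). It builds on the mature tools git and aria2 and provides stable downloads that do not drop mid-transfer.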

0. Install aria2

sudo apt install aria2

If you do not have sudo privileges, or only want aria2 inside a conda virtual environment:

conda install -c conda-forge aria2

1. Download hfd

You can create a new directory first:

mkdir huggingface && cd huggingface
wget https://hf-mirror.com/hfd/hfd.sh
chmod a+x hfd.sh

2. Set the environment variable (the previous two steps only need to be done once; afterwards you can start from here)
Linux:

export HF_ENDPOINT=https://hf-mirror.com

Windows PowerShell:

$env:HF_ENDPOINT = "https://hf-mirror.com"

3. Download

Download a model (without the --local-dir option, files are downloaded directly into the current directory):

./hfd.sh Qwen/Qwen1.5-0.5B-Chat --local-dir ./model

Download a dataset (the --dataset flag is required):

./hfd.sh wikitext --dataset

Some repositories require login before they can be downloaded:

./hfd.sh meta-llama/Llama-2-7b --hf_username YOUR_HF_USERNAME --hf_token hf_***

Command reference:

$ hfd -h
Usage:
  hfd <repo_id> [--include include_pattern] [--exclude exclude_pattern] [--hf_username username] [--hf_token token] [--tool aria2c|wget] [-x threads] [--dataset] [--local-dir path]    

Description:
  Downloads a model or dataset from Hugging Face using the provided repo ID.

Parameters:
  repo_id        The Hugging Face repo ID in the format 'org/repo_name'.
  --include       (Optional) Flag to specify a string pattern to include files for downloading.
  --exclude       (Optional) Flag to specify a string pattern to exclude files from downloading.
  include/exclude_pattern The pattern to match against filenames, supports wildcard characters. e.g., '--exclude *.safetensor', '--include vae/*'.
  --hf_username   (Optional) Hugging Face username for authentication. **NOT EMAIL**.
  --hf_token      (Optional) Hugging Face token for authentication.
  --tool          (Optional) Download tool to use. Can be aria2c (default) or wget.
  -x              (Optional) Number of download threads for aria2c. Defaults to 4.
  --dataset       (Optional) Flag to indicate downloading a dataset.
  --local-dir     (Optional) Local directory path where the model or dataset will be stored.

Example:
  hfd bigscience/bloom-560m --exclude *.safetensors
  hfd meta-llama/Llama-2-7b --hf_username myuser --hf_token mytoken -x 4
  hfd lavita/medical-qa-shared-task-v1-toy --dataset
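If you call hfd from a Python workflow, the options listed above can be assembled programmatically. This is a rough sketch under the assumption that hfd.sh sits in the current directory; build_hfd_command and run_hfd are hypothetical helper names, and only flags that appear in the help text are used.

```python
import subprocess


def build_hfd_command(repo_id, script="./hfd.sh", dataset=False, local_dir=None,
                      include=None, exclude=None, threads=None):
    """Assemble an hfd argument list from the flags shown in the help text."""
    cmd = [script, repo_id]
    if dataset:
        cmd.append("--dataset")
    if local_dir:
        cmd += ["--local-dir", local_dir]
    if include:
        cmd += ["--include", include]
    if exclude:
        cmd += ["--exclude", exclude]
    if threads is not None:
        cmd += ["-x", str(threads)]
    return cmd


def run_hfd(cmd):
    """Run the command; requires hfd.sh and aria2 to be installed."""
    subprocess.run(cmd, check=True)


cmd = build_hfd_command("bigscience/bloom-560m", exclude="*.safetensors", threads=4)
print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string means a pattern such as *.safetensors reaches hfd unchanged instead of being expanded by the shell.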

4. Set an alias for the command

You can create an alias for the command so that it can be used from any location:

cd huggingface
alias hfd="$PWD/hfd.sh"

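(In testing, however, the alias only applies to the current terminal. For regular use, add the alias to ~/.bashrc, using the absolute path to hfd.sh.)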

Method 4: Use an environment variable (non-invasive)

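This approach is non-invasive and covers most cases. The Hugging Face toolchain reads the HF_ENDPOINT environment variable to determine the base URL used for downloading files, so setting this variable is enough to redirect downloads to the mirror.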
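To make the effect of the variable concrete, the sketch below builds the kind of file URL that ends up being requested. It assumes the common /resolve/<revision>/<filename> layout with a datasets/ or spaces/ prefix for non-model repositories; the function resolve_url is a hypothetical illustration, not a library API.

```python
import os
from urllib.parse import quote


def resolve_url(repo_id, filename, repo_type="model", revision="main", endpoint=None):
    """Build a direct file URL under the given (or configured) endpoint.

    The URL layout here is an assumption made for illustration.
    """
    base = endpoint or os.environ.get("HF_ENDPOINT") or "https://huggingface.co"
    base = base.rstrip("/")
    prefix = {"model": "", "dataset": "datasets/", "space": "spaces/"}[repo_type]
    return f"{base}/{prefix}{repo_id}/resolve/{quote(revision, safe='')}/{quote(filename)}"


print(resolve_url("gpt2", "config.json", endpoint="https://hf-mirror.com"))
# → https://hf-mirror.com/gpt2/resolve/main/config.json
```

Only the host part changes between the official site and the mirror, which is why code that respects HF_ENDPOINT needs no other modification.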

Option 1: command line

HF_ENDPOINT=https://hf-mirror.com python your_script.py
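The command-line form above sets the variable for a single process only. A comparable effect from Python, sketched here with the standard subprocess module, is to pass a modified copy of the environment to the child process; run_with_mirror is a hypothetical helper name, and the one-line child script merely echoes the variable to show that it arrived.

```python
import os
import subprocess
import sys


def run_with_mirror(args, endpoint="https://hf-mirror.com"):
    """Run a command with HF_ENDPOINT set for that child process only."""
    env = dict(os.environ, HF_ENDPOINT=endpoint)  # copy; the parent environment is untouched
    return subprocess.run(args, env=env, check=True, capture_output=True, text=True)


# Stand-in for your_script.py: a child that prints the endpoint it sees.
result = run_with_mirror(
    [sys.executable, "-c", "import os; print(os.environ['HF_ENDPOINT'])"]
)
print(result.stdout.strip())
```

To launch a real script, replace the argument list with something like [sys.executable, "your_script.py"].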

Option 2: set the environment variable at the top of your code, before importing huggingface_hub: os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

Python download script:

# pip install huggingface-hub

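import os

# Set the mirror endpoint before importing huggingface_hub, which reads it at import time.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
# Work from the script's own directory so that local_dir below is relative to it.
os.chdir(os.path.dirname(os.path.abspath(__file__)))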

from huggingface_hub import snapshot_download

repo_id = "BAAI/bge-large-zh-v1.5"

folder_path = snapshot_download(
    repo_id=repo_id,
    repo_type="model",  # "model" or "dataset"
    local_dir=repo_id.split("/")[-1],
)
print("folder_path:", folder_path)