[ot][ml] FUSE Git_LFS mount huggingface models

Undescribed Horrific Abuse, One Victim & Survivor of Many gmkarl at gmail.com
Wed Sep 27 11:52:05 PDT 2023


I made this bash script while trying to do other minimally-related things.

https://github.com/xloem/httpfs_lm

# httpfs-lm

Load Git-LFS files without using disk space.

This quick batch script clones Git-LFS repositories from https://huggingface.co
and uses https://github.com/excitoon/httpfs to mount the Git-LFS files via the
network with FUSE, without ever writing them to disk.

This makes it easy to clone and use models without consuming disk space for
them.

Caveats:
- Bytes will be retrieved from the network on every read, so repeated loading
  is just as slow as initial loading.
- There is presently no way to unmount the files. I use `ps -Af | grep httpfs`
  and sigkill all the listed process ids.
- A new mounted filesystem and two processes are constructed for every Git-LFS
  file.
- I have not tested this much, and used it with a tiny model.

Example:
```
$ ./httpfs_lm.bash cerebras/btlm-3b-8k-base at main
Cloning into 'cerebras_btlm-3b-8k-base_main'...
remote: Enumerating objects: 87, done.
remote: Counting objects: 100% (60/60), done.
remote: Compressing objects: 100% (60/60), done.
remote: Total 87 (delta 27), reused 0 (delta 0), pack-reused 27
Unpacking objects: 100% (87/87), 2.71 MiB | 9.20 MiB/s, done.
fatal: a branch named 'main' already exists
branch 'main' set up to track 'origin/main'.
Already up to date.
Mounting cerebras_btlm-3b-8k-base_main/pytorch_model.bin ..
Got 1 parts.
https://huggingface.co/cerebras/btlm-3b-8k-base/resolve/main/pytorch_model.bin
4.9G
...
cerebras_btlm-3b-8k-base_main/pytorch_model.bin ->
/tmp/tmp0y3337k4/pytorch_model.bin

https://huggingface.co/cerebras/btlm-3b-8k-base mounted at
cerebras_btlm-3b-8k-base_main
```

Extending this script to handle Git-LFS repositories on other hosts would just
mean ensuring the URL to the large files is correct.  For example, for GitHub I
think `/resolve/` would simply be changed to `/raw/` in the value assigned to
`httpfspath` in the script.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: httpfs_lm.bash
Type: application/octet-stream
Size: 1868 bytes
Desc: not available
URL: <https://lists.cpunks.org/pipermail/cypherpunks/attachments/20230927/a8d32a1f/attachment.obj>


More information about the cypherpunks mailing list