October Update: Learning for TunnelFS

It has been a while since I announced I would work on an exciting new project, TunnelFS, and I think it’s time for a status update.

In that post, I mentioned I had a lot to learn before I could get anywhere with this project. I had an idea of how much it was, but it turned out to be more. I’ve been working on learning C — I can read C without much trouble, except the really complex cases, and I can actually write a little on my own. For example, I’ve been going through the entire FreeBSD SCTP implementation in an effort to document its sysctl MIBs (there’s still outstanding work on this, but I’ll cycle back to it at some point). I’ve been reading some books on the subject of C, and it makes sense to me. It’ll take a while to really learn the grammar, but the concepts are nothing new. It feels a bit like learning a new language which is similar to your own, but with slightly different grammar and vocabulary.

This week I’ve been trying to figure out FUSE. It’s difficult. Documentation seems to be virtually non-existent, the code that I have found isn’t documented, and what little documentation I have found is on some sourceforge page, or specific to the kernel-side implementation of the API with little or no concern of how one would use that API. I’ve been trying to figure out how to compile some example code using Clang, but have given up and will be using GCC for now. I don’t have to solve every problem all at once. The important thing is getting a grasp of the API. I hope to implement fuse-tunnelfs in C to make it easier to port it as a native file system, keeping the options open for future potential features such as booting from TunnelFS.

I’ve done some thinking on how I want TunnelFS to be implemented, logistically speaking. I’ve come up with two implementation alternatives, but might land on a middle ground if possible. The only difference between the two implementation options is how it handles filesystem I/O on the host side.

TunnelFS-Implementation1The first option has the filesystem I/O be handled by a separate process, which may run as a different user than root if desired, communicating with bhyve over UNIX socket or some other means which I have yet to investigate. This makes the data have to be copied or streamed three times: Between fuse-tunnelfs and the TunnelFS virtio driver (guest side), between TunnelFS VirtIO driver guest side and bhyve side (backend), and between that backend and the separate process. It also has a smaller chance of breaking Bhyve, as it leaves less code running in Bhyve’s context. It could also be slightly more portable, as the filesystem I/O code isn’t directly tied to Bhyve in any way.

TunnelFS-Implementation2The second option has the filesystem I/O implemented into Bhyve itself. This leaves a bigger footprint in Bhyve, meaning there’s a higher chance of breaking Bhyve. But it also means there’s one less data-copying step, as it won’t have to communicate with a separate process. It is possibly less portable than the first implementation, but it could remain easily portable if done right (i.e. not implementing the I/O logic directly in the VirtIO backend code). Filesystem I/O would run in the bhyve context, meaning all operations would be executed as root.

I think this is a performance-vs-security/reliability decision. I’d err on the side of security/reliability, but this is all quite far off into the future, so we’ll see what I find out. If anyone have any input on this please let me know; your knowledge is appreciated.

I think that sums up this month, and the status of TunnelFS pretty well. Hopefully, I’ll have skeleton code by the end of the year, but we’ll see. :)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s