Sapan's PL development page

This page is outdated. My current activities are summarized in PlanetLab's trac system. An up-to-date version can be found on my homepage.

VSys

August 14, 2007  

The 'new vnet', implemented as an iptables extension, is now running on alpha. Two TODO items remain before rolling it out automatically (a sketch of automating step 1 follows these items):

1. Run the following iptables instruction:
    iptables -t nat -A POSTROUTING -j MARK --copy-xid

2. Apply my 3-line patch that exposes --copy-xid to the iptables userspace library
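
A possible way to automate step 1 at node startup, as a rough Python sketch only; it assumes the patched userspace iptables from step 2 is already installed, and uses iptables-save just to avoid adding the rule twice:

    import os

    def ensure_copy_xid_rule():
        # Sketch only: append the --copy-xid rule if it isn't already in the nat table.
        current = os.popen("iptables-save -t nat").read()
        if "--copy-xid" not in current:
            os.system("iptables -t nat -A POSTROUTING -j MARK --copy-xid")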


August 2, 2007  

I just finished documenting the vsys sources. The results are available here: www.cs.princeton.edu/~sapanb/vsys. I also audited the code, and tried to make vsys more robust to system exceptions, so that it recovers and logs errors instead of shutting down.
 
July 28, 2007  
 
Added support for ACLs. The new semantics for offering a service to a restricted set of slices are as follows:
 
(i) Go to the /vsys directory
(ii) Create a file named <myscript.acl>, with the following format:
       princeton_codeen   rw
       some_slice         r
       etc.
(iii) Copy myscript into /vsys (a short sketch of these steps follows).
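
For concreteness, here is a small Python sketch of those steps; the file and slice names are just the ones from the example above, and it assumes nothing more than the .acl file sitting next to the script is needed:

    import shutil

    # Sketch of steps (ii) and (iii) above.
    acl = {"princeton_codeen": "rw",
           "some_slice":       "r"}

    f = open("/vsys/myscript.acl", "w")
    for slice_name, perms in acl.items():
        f.write("%s\t%s\n" % (slice_name, perms))
    f.close()

    # Copying the script itself is what vsys's inotify watch reacts to.
    shutil.copy("myscript", "/vsys/myscript")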
 
 
July 27, 2007 
 
Vsys has been running non-stop on the VICI cluster for the past month or so. I'm curious whether it will scale
and run as well under a PL workload.
 
TODO:
- Clean up the build process
- Run on an alpha node or 2 
- Define a policy to decide who gets to use it, and implement that in the node manager.
  There seem to be two options: (i) run it for everyone, which would give every slice a /vsys directory, allowing us to use it for general PL mechanisms (e.g. reserving a non-privileged port); (ii) run it only for slices that have Proper ops.
 
 
Basics

The idea of vsys is simple if you compare it to its analog on Linux. Vsys is like the sys filesystem (sysfs), which exposes kernel services as files in the /sys directory. Values written to a file are passed to the service as inputs, and outputs generated by the service can be read back from the file.

Vsys does the same across the boundary of a container. Each slice has a /vsys directory, populated with the services we want to make available to that slice (maybe each service would eventually become a slice attribute, or there would be one slice attribute listing a slice's vsys services). A service is made available to a slice by copying the script that implements it into the node's /vsys directory. The vsys daemon uses Linux's new inotify API to intercept this copy event and creates a pair of FIFOs in the vsys directory inside the slice. A user invokes a service by writing into the pertinent FIFO. From the developer's point of view, whatever gets stuffed into that FIFO comes in via STDIN, and things written to STDOUT go out on the other FIFO; STDERR is directed into the vsys log. A service (a script placed in /vsys) is stateful if it runs in a loop, in which case vsys looks up the running instance every time a user invokes it.
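
To make the developer's view concrete, here is a toy service script, a sketch only (the name 'toupper' and the details are hypothetical); it just illustrates the STDIN/STDOUT/STDERR conventions described above:

    #!/usr/bin/env python
    # Toy vsys service: dropped into /vsys on the node, it reads requests
    # arriving on STDIN (the slice's input FIFO), answers on STDOUT (the
    # slice's output FIFO), and logs to STDERR (the vsys log). Running in
    # a loop makes it a stateful service in the sense above.
    import sys

    for line in sys.stdin:
        sys.stdout.write(line.upper())   # a real service would do something privileged here
        sys.stdout.flush()               # flush so the slice sees the reply promptly
        sys.stderr.write("toupper: handled one request\n")

From inside the slice, invoking it is just a matter of writing a line into the service's input FIFO under /vsys and reading the reply from its output FIFO (the exact FIFO names are whatever the vsys daemon creates).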

Anything placed in /vsys shows up inside every slice instantiated on the system. Files placed in /vsys/pvt/myslice show up only in 'myslice'. This is currently the only form of access control. We could add support for ACLs in the future, which will not be difficult.

Status: Alpha testing on the VICI cluster.
TODO for PL integration:
1. Make NM create the /vsys directory and /vserver/slice_name/vsys, and start the vsys daemon. As a hack, /etc/init.d/vsys restart will carry out these steps (a rough sketch follows this list).
2. Make NM use /vsys to give users access to files in the root context, and for the other things Proper was used for.
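
A rough sketch of what item 1 might look like from NM's side; the paths come from the item above, the init-script restart is the hack mentioned there, and everything else is an assumption:

    import os

    def setup_vsys(slice_name):
        # Sketch only: create the node-side and slice-side vsys directories if missing.
        for d in ("/vsys", "/vserver/%s/vsys" % slice_name):
            if not os.path.isdir(d):
                os.makedirs(d)
        # Hack from item 1: (re)start the vsys daemon via its init script.
        os.system("/etc/init.d/vsys restart")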

Initscripts Bugfix


Problem: Several Node Manager processes would get 'stuck' executing initscripts inside just-started-up vservers. Trying to chcontext into these vservers would fail, making it necessary to restart the node.

Synopsis: A vserver has its own PID namespace, so a script that changes into a new vserver context must make sure it doesn't refer to pids from the parent context. Normally this should only lead to a "No such process" error, but in this case NM ran the sequence "pid = fork(); if (pid == 0) { chcontext(); run_initscripts(); } else { wait(pid); }". The chcontext() and wait() would get scheduled alongside each other, so one process could try to fetch the pid while the other's context was changing, possibly leading to a race. The fix hoists the chcontext() to before the fork() and uses python's spawnvp, which combines fork() + execve() + wait().

Strangely, only doing the former (hoisting chcontext) didn't fix the bug, and strace showed that the blocking was happening in glibc and not in the chcontext syscall.
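
For concreteness, a minimal Python sketch of the two patterns, not the actual NM code; the chcontext() argument is a hypothetical stand-in for however NM enters the slice's vserver context:

    import os

    def run_initscript_racy(initscript, chcontext):
        # Buggy pattern (sketch): the parent wait()s on a pid that the child
        # is simultaneously moving into a new vserver/PID namespace.
        pid = os.fork()
        if pid == 0:
            chcontext()                            # child enters the slice context
            os.execv(initscript, [initscript])     # ...then runs the initscript
        else:
            os.waitpid(pid, 0)                     # parent: this is where NM hung

    def run_initscript_fixed(initscript, chcontext):
        # Fixed pattern (sketch): enter the context first, then let spawnvp
        # do fork() + execve() + wait() entirely inside the new context.
        chcontext()
        os.spawnvp(os.P_WAIT, initscript, [initscript])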