Throughout the kernel there are access restrictions relating to jailed processes. Usually, these restrictions only check if the process is jailed, and if so, returns an error. For example:
if (p->p_prison) return EPERM;
System V IPC is based on messages. Processes can send each other these messages which tell them how to act. The functions which deal with messages are: msgsys, msgctl, msgget, msgsend and msgrcv. Earlier, I mentioned that there were certain sysctls you could turn on or off in order to affect the behavior of Jail. One of these sysctls was jail_sysvipc_allowed. On most systems, this sysctl is set to 0. If it were set to 1, it would defeat the whole purpose of having a jail; privleged users from within the jail would be able to affect processes outside of the environment. The difference between a message and a signal is that the message only consists of the signal number.
/usr/src/sys/kern/sysv_msg.c:
msgget(3): msgget returns (and possibly creates) a message descriptor that designates a message queue for use in other system calls.
msgctl(3): Using this function, a process can query the status of a message descriptor.
msgsnd(3): msgsnd sends a message to a process.
msgrcv(3): a process receives messages using this function
In each of these system calls, there is this conditional:
/usr/src/sys/kern/sysv msg.c: if (!jail.sysvipc.allowed && p->p_prison != NULL) return (ENOSYS);
Semaphore system calls allow processes to synchronize execution by doing a set of operations atomically on a set of semaphores. Basically semaphores provide another way for processes lock resources. However, process waiting on a semaphore, that is being used, will sleep until the resources are relinquished. The following semaphore system calls are blocked inside a jail: semsys, semget, semctl and semop.
/usr/src/sys/kern/sysv_sem.c:
semctl(2)(id, num, cmd, arg): Semctl does the specified cmd on the semaphore queue indicated by id.
semget(2)(key, nsems, flag): Semget creates an array of semaphores, corresponding to key.
Key and flag take on the same meaning as they do in msgget.
semop(2)(id, ops, num): Semop does the set of semaphore operations in the array of structures ops, to the set of semaphores identified by id.
System V IPC allows for processes to share memory. Processes can communicate directly with each other by sharing parts of their virtual address space and then reading and writing data stored in the shared memory. These system calls are blocked within a jailed environment: shmdt, shmat, oshmctl, shmctl, shmget, and shmsys.
/usr/src/sys/kern/sysv shm.c:
shmctl(2)(id, cmd, buf): shmctl does various control operations on the shared memory region identified by id.
shmget(2)(key, size, flag): shmget accesses or creates a shared memory region of size bytes.
shmat(2)(id, addr, flag): shmat attaches a shared memory region identified by id to the address space of a process.
shmdt(2)(addr): shmdt detaches the shared memory region previously attached at addr.
Jail treats the socket(2) system call and related lower-level socket functions in a special manner. In order to determine whether a certain socket is allowed to be created, it first checks to see if the sysctl jail.socket.unixiproute.only is set. If set, sockets are only allowed to be created if the family specified is either PF_LOCAL, PF_INET or PF_ROUTE. Otherwise, it returns an error.
/usr/src/sys/kern/uipc_socket.c: int socreate(dom, aso, type, proto, p) ... register struct protosw *prp; ... { if (p->p_prison && jail_socket_unixiproute_only && prp->pr_domain->dom_family != PR_LOCAL && prp->pr_domain->dom_family != PF_INET && prp->pr_domain->dom_family != PF_ROUTE) return (EPROTONOSUPPORT); ... }
The Berkeley Packet Filter provides a raw interface to data link layers in a protocol independent fashion. The function bpfopen() opens an Ethernet device. There is a conditional which disallows any jailed processes from accessing this function.
/usr/src/sys/net/bpf.c: static int bpfopen(dev, flags, fmt, p) ... { if (p->p_prison) return (EPERM); ... }
There are certain protocols which are very common, such as TCP, UDP, IP and ICMP. IP and ICMP are on the same level: the network layer 2. There are certain precautions which are taken in order to prevent a jailed process from binding a protocol to a certain port only if the nam parameter is set. nam is a pointer to a sockaddr structure, which describes the address on which to bind the service. A more exact definition is that sockaddr "may be used as a template for reffering to the identifying tag and length of each address"[2]. In the function in pcbbind, sin is a pointer to a sockaddr.in structure, which contains the port, address, length and domain family of the socket which is to be bound. Basically, this disallows any processes from jail to be able to specify the domain family.
/usr/src/sys/kern/netinet/in_pcb.c: int in.pcbbind(int, nam, p) ... struct sockaddr *nam; struct proc *p; { ... struct sockaddr.in *sin; ... if (nam) { sin = (struct sockaddr.in *)nam; ... if (sin->sin_addr.s_addr != INADDR_ANY) if (prison.ip(p, 0, &sin->sin.addr.s_addr)) return (EINVAL); .... } ... }
You might be wondering what function prison_ip() does. prison.ip is given three arguments, the current process (represented by p), any flags, and an ip address. It returns 1 if the ip address belongs to a jail or 0 if it does not. As you can see from the code, if it is indeed an ip address belonging to a jail, the protcol is not allowed to bind to a certain port.
/usr/src/sys/kern/kern_jail.c: int prison_ip(struct proc *p, int flag, u_int32_t *ip) { u_int32_t tmp; if (!p->p_prison) return (0); if (flag) tmp = *ip; else tmp = ntohl (*ip); if (tmp == INADDR_ANY) { if (flag) *ip = p->p_prison->pr_ip; else *ip = htonl(p->p_prison->pr_ip); return (0); } if (p->p_prison->pr_ip != tmp) return (1); return (0); }
Jailed users are not allowed to bind services to an ip which does not belong to the jail. The restriction is also written within the function in_pcbbind:
/usr/src/sys/net inet/in_pcb.c if (nam) { ... lport = sin->sin.port; ... if (lport) { ... if (p && p->p_prison) prison = 1; if (prison && prison_ip(p, 0, &sin->sin_addr.s_addr)) return (EADDRNOTAVAIL);
Even root users within the jail are not allowed to set any file flags, such as immutable, append, and no unlink flags, if the securelevel is greater than 0.
/usr/src/sys/ufs/ufs/ufs_vnops.c: int ufs.setattr(ap) ... { if ((cred->cr.uid == 0) && (p->prison == NULL)) { if ((ip->i_flags & (SF_NOUNLINK | SF_IMMUTABLE | SF_APPEND)) && securelevel > 0) return (EPERM); }
This, and other documents, can be downloaded from ftp://ftp.FreeBSD.org/pub/FreeBSD/doc/.
For questions about FreeBSD, read the documentation before contacting <questions@FreeBSD.org>.
For questions about this documentation, e-mail <doc@FreeBSD.org>.