solved by kylebot from Shellphish
The challenge executes a binary that players send and the goal is to pwn a echo_srv server in the remote environment.
The echo server is a single-threaded server that sends back whatever you send to it. Fortunately, the service's logic is nice and simple. Unfortunately, it is written in C++ which makes things complicated. Fortunately, we have access to the source code. Unfornately, C++ is so complicated that the author introduced a bug that made our lives much harder.(Oops, I went too far with the fortunately-unfortunately game)
The server uses select function and two fd_set, namely readset and writeset to monitor:
- whether there is a new connection. This is done by adding the socket fd to
readsetandselecton it. - whether there is any data readable from any fd. This is done by adding connection fd to
readsetandselecton it. - whether any fd is ready to write. This is done by adding connection fd to
writesetandselecton it.
And the main logic of the service is a while-loop starting with adding fds into readset/writeset, select on them and following by their handlers. Once it receives exit message from any connection, the server exits.
Worth noticing, the connections are abstracted in the program as ClientCtx. And all ClientCtx are stored in a vector clients.
The moment I saw fd_set and select. I thought about the fd_set bug that I learnt about from shitorrent challenge during DEFCON quals 2019.
Basically, fd_set is just a buffer of size 128 bytes(1024 bits). Each fd "in" it is represented as a bit in the buffer. Clearly, due to the size of the buffer, the largest fd it can hold is 1023. Due to the fact that there is no fd check in fd_set, if a fd larger than 1023 is added to the fd_set, it writes out of the boundary.
More specifically, FD_ZERO clears the fd_set buffer. FD_SET add a fd to the fd_set by fliping the bit representing the fd to 1. For example, add fd=4 to a fd_set means flip the 4th bit in the buffer to 1. By controlling what fds in a fd_set, we are able to control the bits in the buffer. Taking the overflow into account, the vulnerability is basically an out-of-bound write without restriction on the value. ulimit -n returning 4000 on the remote server confirmed my hypothesis.
But then, I noticed that if I create a lot of connections and then close some of them, the server crashes. This is weird. By some debugging, I notice that there is actually a UAF bug during interating through the ClientCtx vector.
for (auto it = clients.begin(), end = clients.end(); it != end; ++it) {
...
clients.erase(it);
...
}
The root cause is: end is not obtained during the iteration, if any element is erased during it, the last element will be accessed twice. If the last element is also erased, that is a UAF.
Now we have two bugs in hand. One is fd_set overflow, the other is a UAF.
By playing with the UAF bug for a bit, I quickly drew a conclusion that it is not exploitable. It's because UAF happens after the ClientCtx is freed meaning the corresponding fd is also closed. We can't interact with the fd any more.
Then I switched to the fd_set overflow bug.
There are two fd_set on the stack. I quickly confirmed that I can easily overflow the fd_set on the stack and overwrite the return address by creating a lot of connections to the server.
But there is a problem: fd_set clears a whole 8 bytes all together when it uses part of that 8 bytes. Think about this, there is a stack canary after the fd_set buffer. Once the fds overflow into canary even if just 1 bit. The 8-byte canary will be cleared to 0. This is a big NO since we need the canary in place when we exits the main function to start ROPing.
This leads to two problems:
- partial overwrite is impossible
- need a way to bypass canary check
These two problems lead to a natural conclusion: there must a way to leak data on stack.
I got stuck here for a while. But after playing with the memory in gdb, I noticed that the overflow region belongs to the writeset which is a set indicating whether the server should write to the fd. If the bit represented by a fd is 1 in writeset, the server will send a greeting message to the fd; if the bit is 0, nothing happens. This can be used as a side-channel attack!
What we can do here is to fill up 1024-bit buffer by creating 1020 connections(there are already 4 fds in use before any connection). And then create a new connection, wait for a bit and then check whether the server sends a greeting message back to us, if yes, that means the bit in memory is 1, if not, it is 0.
A bit-level leak is good, but not good enough. Remember the previous issue that I mentioned: it clears 8 bytes whenever it uses new 8 bytes. By sending only 1 connection at a time, we can only leak 1 bit of the 8 bytes, other 63 bits will be cleared.
By some debugging, I identified that the "bit clearing" logic is performed inside select function. If 64 connections are sent "at the same time", the next select will not clear the new 8 bytes. And the whole 8 bytes can be leaked using the bit-level info leak. This can be done easily locally by setting a break point after the first selcet. But 64 connections is a lot, how to establish this amount of connections between two select calls?
This becomes a race between our exploit process and the server process. We need 64 connections estabilished between 2 select calls so some important 8 bytes won't be destoried and can be leaked to us(canary, libc pointer, etc). I heard about teams using multi-threading stuff or spin up dummy processes to occupy cpu resources trying to increase the race window. But actually, they are not necessary.
The idea that I used was the same as their strategies: enlarge race window, but with a different approach. By examining the race window closely, I quickly noticed that handle_read itself is also a loop. And notice that the server is single-threaded, if we send a loooot of traffic to the server and make it stuck in the handle_read loop and at the same time we open connections, it's very likely that 64 connections can be established within the race window.
It turned out I was right. The success rate of a 8-byte leak in this way is more than 50%(maybe even higher, I didn't calculate it).
Now summarize what we have so far:
- we can perform stack overflow using
fd_setoverflow - we can leak arbitrary value on the stack by racing with the server
As a pwner, you may say, it's done. Right? Not really. There is one more step.
By playing with the leak for a bit, I noticed that the leak "protects" the leaked the value as well. That means after leaking 8 bytes, this 8 bytes will be left on stack, intact, and leaked to us.
This is good when leaking canary. However, it causes trouble when trying to write ROP chain on the stack. Because of this "protection" mechanism, a way is needed to clear bits on stack. That means we need two primitives to write arbitrary value on stack: one to set 1 to 0 and the other to set 0 to 1. Setting 0 to 1 can be done by sending a message to the server. But setting 1 back to 0 means the fd is not writable. That means the fd needs to be closed.
Now, let's recall the UAF bug. This bug prevents a user from closing any fd(In fact, some other teams figured out a way to close fds but I didn't), which kills the easiest way to set 1 to 0.
Then the "clear 8 bytes" logic came to my mind: if I overflow 1 bit into 8 bytes and sleep, the other 63 bits in that 8 bytes will be cleared. Then the 0-to-1 primitive alone is good enough to write almost arbitray value on stack. The least significant bit will be inhereted from the old value on stack, but it is good enough to launch exploitation.
With all the primitives we have, the exploitation logic is pretty straightforward:
- race to preserve canary on stack so it won't cause any trouble later
- race to leak a library pointer so we can calculate the libc address
- place ROP payload on stack
- send "exit" and start ROPing
To leak libc base, there is a pointer from ld library between fd_set buffer and return address. We can leak that pointer and calculate libc base.
But this requires us to know the offset between ld and libc. This is not hard. Although the absolute addresses of libraries may be different across runs, the offset between them are kept the same on Linux. The offset can be obtained by two ways:
- launch a Ubuntu-20.04 docker and debug the service inside it
- perform a separate run to leak both ld address and libc address
As a very cautious person, I chose method 2 since it has the additional advantage that it can confirm that the offset never changes.
Since we have access to the server, libc is known to us. Now it's ROP time. Construct the ROP chain, leak the canary, write the ROP chain onto stack and ...
CRASH!
What happened? Again, by some debugging, I found out I only have 8 gadgets. If I write more than that, it crashes inside libc. 8 gadgets is good enough for a decent hacker to get a shell with dup+system. But I'm a noob. dup just never came to my mind. Instead of doing it using the clean way. I went to the dark side :)
I wrote "/tmp/1" to a buffer and then called system("/tmp/1"). But it didn't work due to the fact that some environment variables got overwritten. Then I went to the last resort: execve. And execve("/tmp/1", nullptr, NULL) worked!
Challenge solved! Horay!
This is a fun challenge. It raises attention to the the fd_set bugs which exist in many debian packages. But it could be better without the irrelevant UAF bug :)
Now the stack is smashed, fun is had. It's time to profit ;)
And finally, thank you so much for the fun CTF. I really enjoyed it.