DOMAIN GENERATION ALGORITHMS (DGAs) - A C&C ATTACK TECHNIQUE

kkalvani
Jan 2
Updated: Jan 3

ABSTRACT

This article focuses on the Domain Generation Algorithm (DGA), an attack technique used by adversaries and APT groups. We discuss what it is, how it works in code with a few examples, the different types of inputs (seeds) attackers may use, and the types of DGAs known today. We then discuss how an attacker uses the generated domains to infect target systems and establish C&C communication, and what they can do afterwards. From there, we look at how DGAs have improved attackers' methods and strategy for quickly establishing C&C. Lastly, we discuss how security teams deal with DGA-based attacks through various advanced methods and tools.

INTRODUCTION

As a security analyst, you may have noticed over previous years of monitoring that malware often relied on hard-coded (static) IP addresses or static domain names for Command and Control (C2) communication between the infected system(s) and the attacker’s server. For example, in Cybereason’s EDR, such activity may be flagged as part of a MalOp (Malicious Operation), requiring immediate attention. Security teams could easily blacklist these known IPs or domains and block traffic to them, effectively disrupting the malware’s communication with its C2 server.

Besides static IPs, some attackers also use Dynamic DNS (DDNS) services to link a domain name to an IP address that can change regularly over time. This IP address belongs to the attacker’s C2 server. This helps evade detection: if one IP is blocked, the attacker simply updates the A record of the registered domain to point to a new IP address. However, despite this smart workaround, security tools can detect the frequent IP address changes associated with DDNS services, flag them as suspicious, and block them, thereby disrupting the C2 communication.

All these earlier attack methods had their own strengths and weaknesses. Attackers depended on a few domains or IPs, so defenders could easily disrupt the communication once a pattern was recognized. Modern malware, however, often uses techniques like domain generation algorithms or encrypted channels, which can bypass static blacklisting and require more advanced detection strategies.

Nowadays, attackers use an effective dynamic attack technique: the Domain Generation Algorithm (DGA), tracked as MITRE ATT&CK sub-technique T1568.002.

WHAT IS A DOMAIN GENERATION ALGORITHM (DGA)?

That’s right! It is an algorithm, but what does it do?

It generates domain names based on inputs. It is a deterministic algorithm: the same input (seed) always produces the same output. If I give it a certain seed value, it generates the same list of domains every time, for as long as the seed stays the same. The output depends entirely on the seed, and it only changes when a different seed is used.

The DGA is programmed into the malware payload and propagated to the target system(s). In the infected system, the malware would generate the domain names.

Meanwhile, the attacker will have a copy of their programmed DGA (the same one from the malware) with the same input/seed and will generate the same list of domains. What do they do with these domains? We will talk about that in the upcoming sections.

HOW DOES A DGA WORK?

There are two main components to this algorithm. Here is an example piece of Python code that adversaries may use to generate a fixed set of domains, with the two components marked on it:

A. INPUTS (SEED)

A seed serves as the starting point or initialization value for a pseudorandom number generator (PRNG). PRNGs generate sequences of numbers that seem random but are actually deterministic, meaning they depend on the seed. As discussed before, when you input the same seed, you’ll get the same sequence of numbers (in this case, domains) every time. This allows both the attacker and the malware to synchronize and use the same list of domains for C2 server communication establishment.
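To make the "same seed, same sequence" property concrete, here is a minimal sketch using Python's standard random module. The function name sample_sequence and the seed values are only illustrative assumptions, not part of any real malware.

```python
import random

def sample_sequence(seed, n=8):
    """Return n pseudorandom integers from a PRNG initialised with `seed`."""
    rng = random.Random(seed)  # private generator, leaves the global PRNG untouched
    return [rng.randint(0, 25) for _ in range(n)]

# Two runs with the same seed give identical sequences.
same_a = sample_sequence(20250101)
same_b = sample_sequence(20250101)

# A different seed gives a different sequence.
other = sample_sequence(20250102)

print(same_a == same_b)
print(same_a == other)
```

Using a private random.Random instance instead of the module-level functions keeps the sequence reproducible even if other code also draws random numbers.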

Here, the attacker provides the algorithm with a starting point or “seed”. Common seeds are specific system dates or times. In Fig.1, we can see that for the current date (2025-01-01), the outputs displayed are generated, and they stay the same each time we run the code. With a different date or time as input, the domain sequence changes, but it stays the same for that specific date or time value.

To generate the domains from the seed, the B component comes in: it takes a set of characters, combines them pseudorandomly, and appends the TLD to create the domains.

Using dates can be advantageous to the attacker as dates are predictable and it’s like a pattern that the attacker and the malware follow for their attack process. The DGA can generate a new set of domains every day (based on the date) so that even if one set of domains is discovered, the next day's set is different. The malware could fetch the current date from the system, and the attacker can generate domains based on the same date.

E.g.,

from datetime import date

current_date = date.today()

seed = int(current_date.strftime("%Y%m%d"))

This takes the current date, converts it into a string in the format YYYYMMDD (e.g., 20250101), and then converts that string into an integer which can be used as a seed.

In this way, the malware on the infected system generates a new set of domains every day, and the attacker knows exactly which domains are generated each day, because the date is predictable and they hold a copy of the same DGA that the malware runs.

Changing the seed value in the code, we see a different set of domains:

FIG.2. CHANGE OF SEED VALUE CREATES A DIFFERENT SET OF DOMAINS.

The other kinds of inputs or seeds that can be used are:

Hardcoded strings or keys: These are predefined inputs embedded in the malware code, known to the attacker but hidden from others.

E.g., seed = "malwarekey12345".

This key is fixed and shared between the attacker and the malware. Again, both parties (attacker and malware) know the generated domain list because they use the same fixed key/seed.

Random seeds: This could be anything: user-specific information, configuration files, or pseudorandom values generated during execution. With a random value as the seed, the domains still look “random” and still follow the same deterministic behaviour. However, the attacker must remember or derive the seed to regenerate the same domains, or else hardcode it into the malware. If an attacker used fresh random seeds all the time, they would risk generating too many domains or losing track of which domains are in use, making it harder to maintain control over their botnet.

B. GENERATION LOGIC

The DGA applies mathematical or cryptographic transformations to the input seed to create seemingly random domain names. Some common techniques can be classified into the following types of DGAs:

1) PRNG DGAs:

If the DGA uses system time, a date, or any fixed seed, as we saw earlier, to drive a PRNG, it falls under the PRNG DGA category. The code example above uses PRNG logic. We use a dynamic input (a date seed), store all the domains in a list, and set the number of domains to generate to 5. In a real attack scenario, a DGA could generate a thousand or more domains per day. Iterating through the for loop, we draw from the 26 letters of the alphabet and pseudorandomly generate five domains, each with a label of length 10 (k=10), all ending with the top-level domain (TLD) .com.
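The description above can be sketched roughly as follows. This is a reconstruction from the prose (date seed, list of 5 domains, 26 lowercase letters, k=10, .com), not a copy of any real malware. The function name generate_domains and its parameters are assumptions made for illustration.

```python
import random
import string
from datetime import date

def generate_domains(day, count=5, length=10, tld=".com"):
    """Generate `count` pseudorandom domains for a given date (illustrative only)."""
    # A. Input (seed): the date formatted as YYYYMMDD, e.g. 20250101
    seed = int(day.strftime("%Y%m%d"))
    rng = random.Random(seed)

    # B. Generation logic: pick `length` letters from a-z and append the TLD
    domains = []
    for _ in range(count):
        label = "".join(rng.choices(string.ascii_lowercase, k=length))
        domains.append(label + tld)
    return domains

# The same date always yields the same list, on any machine running this code.
todays = generate_domains(date(2025, 1, 1))
tomorrows = generate_domains(date(2025, 1, 2))
for domain in todays:
    print(domain)
```

Passing the date in as a parameter, rather than calling date.today() inside the function, also lets a defender regenerate the list for any past or future day.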

2) CHARACTER-BASED DGAs:

Character-based DGAs focus on randomly selecting letters or digits to create domains without requiring dynamic inputs like dates or times. They pull directly from a fixed set of characters (in practice letters, digits, and hyphens, since those are the only characters valid in a hostname label) to generate domains that look random, non-human-readable, and suspicious. These are simpler to implement but easier for defenders to detect due to predictable patterns.

3) HASHING-BASED (MD5 or SHA-256) DGAs:

In this type, the input (e.g., date, time, or other seed values) is passed through a hashing algorithm to produce a deterministic output. This can be done with Python’s hashlib module, which can hash your input with whichever supported algorithm you choose.

The malware can take the date (20250101) and hash it with SHA-256, truncate the hash, and map it to domain characters. This makes the generated domains harder to predict without reverse-engineering the malware to extract the hashing logic and seed.
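A minimal sketch of this idea, assuming the date string is combined with a counter so that one day yields several domains. The counter, the 12-character truncation, and the hex-digit-to-letter mapping are illustrative choices and not taken from a specific malware family.

```python
import hashlib

def hash_dga(date_str, count=3, length=12, tld=".com"):
    """Derive `count` domains from a date string using SHA-256 (illustrative only)."""
    domains = []
    for index in range(count):
        # Hash the seed together with a counter so each domain differs
        data = f"{date_str}-{index}".encode("utf-8")
        digest = hashlib.sha256(data).hexdigest()

        # Truncate the hash and map each hex digit (0-15) to a letter a-p
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:length])
        domains.append(label + tld)
    return domains

hashed = hash_dga("20250101")
for domain in hashed:
    print(domain)
```

Because the mapping only uses the letters a to p, domains from this particular sketch would have a recognisable character distribution, which is the kind of artefact defenders look for.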

4) CHARACTER MANIPULATION-BASED DGAs:

These are simpler algorithms that rely on directly manipulating characters to generate domains.

For example:

We start with a hardcoded base string, such as “attack”.
Then iteratively modify the string to produce domains, e.g., “attack1.com”, “attack2.com”, “attack3.com”, etc.
The modification could involve appending numbers, rotating characters, replacing characters, etc.
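The steps above can be sketched as two small helpers. The names append_counter and rotate_variants are hypothetical, and the base string "attack" is the one from the example.

```python
def append_counter(base, count, tld=".com"):
    """Append 1..count to the base string: attack1.com, attack2.com, ..."""
    return [f"{base}{i}{tld}" for i in range(1, count + 1)]

def rotate_variants(base, tld=".com"):
    """Left-rotate the base string by 1..len(base)-1 characters."""
    return [base[i:] + base[:i] + tld for i in range(1, len(base))]

numbered = append_counter("attack", 3)
rotated = rotate_variants("attack")

print(numbered)  # ['attack1.com', 'attack2.com', 'attack3.com']
print(rotated)   # ['ttacka.com', 'tackat.com', 'ackatt.com', 'ckatta.com', 'kattac.com']
```

Domains built this way share an obvious common stem, so a single pattern match on the base string is often enough to catch the whole family.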

Besides the various coding methods that attackers utilize when creating their payload, there are a few other types of DGAs:

5) DICTIONARY BASED DGAs:

This type of DGA uses words from a dictionary to construct domain names. It combines dictionary words randomly, then appends numbers or TLDs, to make the domains look more legitimate.

E.g.,

FIG.6. EXAMPLE OF USING A LIST OF WORDS TO CREATE DOMAINS (CAN ALSO USE A DEFAULT WORDLIST WHICH CAN HAVE 1000s OF WORDS).

The algorithm would combine 2 words (k=2) like ‘cloud’ and ‘secure’ to generate real-looking domains like ‘securecloud.com’. This makes them harder to detect because they look legitimate rather than random or suspicious. This is very effective if an attacker is trying to bypass detection systems that rely on randomness as an indicator.
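A rough sketch of such a dictionary-based generator is shown here. The word list, the seed value, and the function name dictionary_dga are assumptions for illustration. Whether the original figure used random.sample or random.choices is not known, so random.sample (two distinct words) is used as one reasonable option.

```python
import random

# Small illustrative word list; a real wordlist could hold thousands of words.
WORDS = ["cloud", "secure", "data", "portal", "login", "update", "mail", "sync"]

def dictionary_dga(seed, count=5, words_per_domain=2, tld=".com"):
    """Build domains by joining `words_per_domain` dictionary words (illustrative only)."""
    rng = random.Random(seed)
    domains = []
    for _ in range(count):
        parts = rng.sample(WORDS, k=words_per_domain)  # k=2 distinct words
        domains.append("".join(parts) + tld)
    return domains

wordy = dictionary_dga(seed=20250101)
for domain in wordy:
    print(domain)
```

With only 8 words there are 56 possible two-word combinations, so repeated domains are likely. A larger wordlist makes the output space much bigger.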

6) HIGH-COLLISION DGAs:

These domains are deliberately generated from short common words or characters paired with common TLDs (.net, .org, .info, etc.), in order to exploit the likelihood that a generated domain will overlap with legitimate domain names or with other DGA-generated domains.

In other words, it increases the likelihood of a domain collision (accidentally matching an existing domain). This can result in attackers using legitimate domains that are already registered (causing malware traffic to blend with normal traffic).

To summarize:

A DGA is a technique used by attackers to dynamically generate a large number of domain names (usually random ones). We have seen some ways of how they generate the domains. These domains are used for malicious purposes, mainly for C2 servers for malware communication and to make it harder for security teams to block suspicious IPs and domains simply because there are many of them being used and changed at regular periods.

Here is a typical workflow of a C&C attack with a C2 server. Later, we will apply DGA into this workflow so we can understand it better:

WHAT DO ATTACKERS DO WITH ALL THE DOMAINS?

When the attacker creates their DGA, they keep a copy of it for themselves, from which they obtain the same generated domains.

From their end, the attacker registers only one or a few of the generated domains at a domain registrar (like GoDaddy), to make them live and seem official. This makes it harder for defenders to predict which ones will be active, and it is also cheaper than registering all of them. The selection usually depends on the cost of the TLD and on availability, i.e., whether the domain is already registered by someone else.

If the domain is available, the attacker registers it using fake information to stay anonymous and pays with cryptocurrency or stolen credit cards to avoid linking the transaction back to themselves. They do the same for the other selected, still-unregistered domains.

After registration, the domains that were registered become active and the attacker has two options now:

Configure all the registered domains to point to a single Command-and-Control (C2) server by setting each domain's DNS records to resolve to that server’s IP address. For example, if “example-malicious.com”, “thisisfake.com”, and “securecloud.org” are the registered domains, their DNS A records all point to a single IP address, 192.0.2.123, which is the IP of the attacker's C2 server.
The other option, which adversaries usually go for, is to use multiple C2 servers, where each registered domain points to a different C2 server with its own IP address configured by the attacker. If one of the domains is detected and blocked by security teams, the malware can simply resolve the other registered domains to their respective C2 servers to ensure uninterrupted communication.

Now that the attacker has done their part, they create their malware, which is executed on the target system. This malware uses the same DGA and generates the same domain list.

The malware tries to resolve each domain in its list to an IP address. The non-registered domains do not resolve, because they are inactive (no IP address is configured for them, so the DNS response is typically NXDOMAIN). But when it queries an attacker-registered domain, that domain resolves to the IP address the attacker set up for it, i.e., to their C2 server or servers. The infected system then communicates with the attacker’s C2 server(s).
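For analysts, it helps to see what this lookup loop looks like, since it is exactly the pattern that shows up in DNS logs. The sketch below is a simplified assumption of that behaviour. The function name find_active_domain is hypothetical, and a fake resolver with made-up entries stands in for real DNS so the example runs without network access.

```python
import socket

def find_active_domain(domains, resolve=socket.gethostbyname):
    """Return (domain, ip) for the first domain that resolves, or None."""
    for domain in domains:
        try:
            ip = resolve(domain)
        except OSError:  # socket.gaierror (failed lookup) is a subclass of OSError
            continue     # unregistered domain: move on to the next candidate
        return domain, ip
    return None

def fake_resolver(domain):
    """Stand-in for DNS: only one domain is 'registered' (illustrative data)."""
    table = {"securecloud.org": "192.0.2.123"}
    if domain not in table:
        raise socket.gaierror("name not resolved")
    return table[domain]

candidates = ["qwhzkxvbnm.com", "securecloud.org", "thisisfake.com"]
result = find_active_domain(candidates, fake_resolver)
print(result)  # ('securecloud.org', '192.0.2.123')
```

Every failed lookup before the hit leaves a trace in DNS logs, which is why bursts of unresolved queries from one host are a useful detection signal.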

The attacker can distribute this attack to a botnet of computers. This happens when multiple systems are infected with the malware, and all these systems would communicate with the C2 server(s) as a part of the botnet. This communication can take place in two ways:

One infected system talks to one C2 server at a time, and the rest are backups in case one domain gets disrupted. This is useful for smaller botnets or targeted attacks.
All the infected systems communicate with all of the C2 servers at the same time. This is useful for large botnets, for load balancing.

Now, from the basic C&C attack technique view from Fig.7, when we apply the involvement of DGA, we get:

WHAT CAN ATTACKERS DO NOW?

To be direct, they can do lots of things:

Data Exfiltration: Attackers use the C2 connection to transfer stolen files from the victim’s system to their own servers.
Remote Access and Control: Attackers can execute commands, upload/download files, and interact with the system as if they were physically present. (refer to my other article and video demo to understand more about RAC with remote access trojans (RATs) - https://kkalvani.wixsite.com/my-site/post/cybersecurity-awareness-month).
Install additional malware: Attackers may download and install other malware, such as ransomware, spyware, keyloggers, or cryptocurrency miners.

The list can go on. Doing all this can enable lateral movement, credential harvesting, data encryption (ransomware infection), spying on the victim, persistence and evasion, weaponizing the bot for a DDoS attack, etc.

HOW HAVE DGAs IMPROVED THE ATTACKER'S SITUATION?

As we’ve seen earlier, the attacker could generate thousands of domains using a DGA and only register a subset of them. The malware could try different domains until it found one that was active.

Even if security teams took down/blocked some of the domains or IPs, the attacker could continue using new domains generated on the fly. This made it harder to block the attacker's infrastructure. Therefore, the attacker could maintain their access and persistence. The high number of DGA-generated domains can strain firewalls and other network-filtering solutions.

The attacker could use any seed input type to generate new domains, making the malware more adaptive to different environments such as different country domains, domains based on specific IPs or specific sandboxed environments, etc., thereby making it harder to track.

DGAs made it more difficult for defenders to block the communication because they couldn’t predict which domains would be used next. With traditional static domains or IPs, defenders could simply block known bad addresses, but with DGAs, the domains kept changing regularly. Defenders also may find it challenging to distinguish between legitimate and malicious domains, especially with dictionary-based or high-collision DGAs.

Before, attackers needed to maintain a large number of registered domains or fallback IPs, which could be costly and operationally complex. Now, with DGAs, the attacker registers only a few domains from the DGA-generated list, minimizing the costs. The algorithm's ability to produce numerous potential domains ensures continuity even with minimal registration effort.

WHAT ARE SECURITY TEAMS DOING ABOUT THIS?

Security teams defend against DGA attack techniques in the following ways:

Reverse Engineer the DGA: Analyze the malware to understand the algorithm and predict the domains.
Use DNS Filtering and Analysis: Look for domains with high entropy (randomness) or random patterns.
Behavioural Analysis: Identify unusual DNS activity from infected systems.
Collaborate with Registrars: Work with domain registrars to pre-emptively block or sinkhole predicted domains. Sinkholing works when security teams successfully predict the domain generation pattern from the malware. They then either register the predicted domains or, if these are already registered, take control of them, usually by working with domain registrars. This makes the domains “active”, but instead of resolving to the attacker's C2 server, they resolve to a controlled, safe server (the “sinkhole”).

This redirection process allows security teams to:

Analyze the traffic: They can study the malware's behaviour and track the infected systems.
Prevent communication: Malware cannot establish a connection with the attacker's C2 server, effectively disrupting the attack.
Monitor infections: By observing which systems are attempting to connect to the sinkhole, defenders can identify infected machines in a network.
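Once the DGA has been reverse engineered, predicting domains and spotting infected machines can be sketched as below. The recovered_dga function is a hypothetical stand-in for whatever algorithm the analysis recovers, and the DNS log format (client IP, queried domain) is an assumption for illustration.

```python
import random
import string
from datetime import date, timedelta

def recovered_dga(day, count=5, length=10, tld=".com"):
    """Hypothetical stand-in for a DGA recovered by reverse engineering."""
    rng = random.Random(int(day.strftime("%Y%m%d")))
    return ["".join(rng.choices(string.ascii_lowercase, k=length)) + tld
            for _ in range(count)]

def predict_domains(start, days):
    """Pre-compute every domain the malware would try over the next `days` days."""
    predicted = set()
    for offset in range(days):
        predicted.update(recovered_dga(start + timedelta(days=offset)))
    return predicted

def infected_hosts(dns_log, predicted):
    """Return clients that queried a predicted domain; dns_log holds (client_ip, domain)."""
    return sorted({client for client, domain in dns_log
                   if domain.lower().rstrip(".") in predicted})

blocklist = predict_domains(date(2025, 1, 1), days=7)
sample_domain = recovered_dga(date(2025, 1, 3))[0]
log = [("10.0.0.5", sample_domain), ("10.0.0.9", "example.com")]
print(infected_hosts(log, blocklist))  # ['10.0.0.5']
```

The same predicted set can feed a blocklist, a sinkhole registration request, or a retrospective search through older DNS logs.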

Domain monitoring and blocking

Security teams monitor domain registrations and traffic for patterns indicating DGAs (IoCs), such as non-human-readable domain names or rapid domain lookups, in order to flag them and add them to blocklists.

AI and ML models

These models are trained to distinguish between legitimate and DGA-generated domains based on features like length, character composition, and entropy.
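As a rough idea of what such features can look like, the sketch below computes a few simple ones for a domain label. The feature set and the names shannon_entropy and domain_features are illustrative assumptions. A production model would use many more features and a trained classifier on top.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy in bits per character; 0.0 for an empty string."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def domain_features(domain):
    """Extract simple features from the left-most label of a domain."""
    label = domain.lower().split(".")[0]
    size = max(len(label), 1)  # avoid division by zero on empty labels
    return {
        "length": len(label),
        "entropy": shannon_entropy(label),
        "vowel_ratio": sum(ch in "aeiou" for ch in label) / size,
        "digit_ratio": sum(ch.isdigit() for ch in label) / size,
    }

print(domain_features("google.com"))
print(domain_features("xkqzjwvbtp.com"))
```

Note that dictionary-based DGAs are built to score like normal words on features of this kind, so entropy alone is not a reliable indicator.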

DNS traffic analysis

Security tools analyze DNS query logs for unusual patterns, like high volumes of queries to non-existent (non-registered) domains or domains that don’t resolve. With this, they can track which infected systems attempt to resolve DGA-generated domains.
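A minimal sketch of this kind of check is shown below, assuming the DNS log is available as (client IP, domain, response code) tuples. The log format, the sample entries, and the threshold of 3 are illustrative assumptions. Real thresholds depend on the environment and the time window.

```python
from collections import Counter

def nxdomain_suspects(dns_log, threshold=3):
    """Return {client_ip: failure_count} for clients with many NXDOMAIN responses."""
    failures = Counter(
        client for client, _domain, rcode in dns_log if rcode == "NXDOMAIN"
    )
    return {client: n for client, n in failures.items() if n >= threshold}

dns_log = [
    ("10.0.0.5", "qwhzkxvbnm.com", "NXDOMAIN"),
    ("10.0.0.5", "plmoknijbu.com", "NXDOMAIN"),
    ("10.0.0.5", "zxcvasdfqw.com", "NXDOMAIN"),
    ("10.0.0.5", "securecloud.org", "NOERROR"),
    ("10.0.0.9", "example.com", "NOERROR"),
    ("10.0.0.9", "typo-exmaple.com", "NXDOMAIN"),
]

print(nxdomain_suspects(dns_log))  # {'10.0.0.5': 3}
```

A single failed lookup, like the typo from 10.0.0.9, stays below the threshold, while the burst of failures followed by one successful resolution from 10.0.0.5 matches the DGA pattern described above.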

Threat Intelligence Feeds

Security tools integrate with threat intelligence platforms to keep updated lists of known DGA patterns, algorithms, or active malicious domains.

CYBEREASON EDR SECURITY SOLUTION

Cybereason, an advanced Endpoint Detection and Response (EDR) tool, incorporates multiple countermeasures (like the ones mentioned above) to combat threats like Domain Generation Algorithm (DGA)-based malware.

Detection of Abnormal Behavior: Cybereason performs detection of unusual system and network behaviors, particularly on endpoints. This includes identifying anomalies such as unusual DNS queries or patterns indicative of malware leveraging DGAs.
Graph Database with AI/ML: Cybereason uses a graph database enhanced by artificial intelligence (AI) and machine learning (ML). This approach allows for deep analysis of DNS traffic to pinpoint domains that are likely generated by DGAs. By studying the structure, frequency, and entropy of queried domains, the tool can distinguish between legitimate and suspicious activity. This would enable them to find and investigate different variants of DGAs and constantly update the threat intelligence feed of new IoCs and details.
Integrated Threat Intelligence: The platform integrates real-time threat intelligence, enabling it to cross-reference suspicious domains with known malicious indicators. This helps enhance detection accuracy and provides context for detected threats.
Attack Tree Visualization: One of Cybereason’s standout features is its attack tree, which provides a comprehensive visualization of the attack chain. It helps security analysts understand the full attack picture originating from infected systems. It outlines:

Which processes triggered specific MalOps (malicious operations).
The timeline and order of execution for these processes.
Parent-child relationships between various components of the attack.

By combining these capabilities, Cybereason empowers security teams to detect, analyze, and respond to DGA-based threats effectively.

REFERENCES

Asher-Dotan, L. (no date) What is Domain Generation Algorithm: 8 Real World DGA Variants. Available at: https://www.cybereason.com/blog/what-are-domain-generation-algorithms-dga (Accessed: 31 December 2024).

CEH, S.B.C., CCSP, CISM, OSCP (2024) ‘Protecting Against Cyber Threats: The use of Domain Generation Algorithm (DGA) by threat actors’, Medium, 2 March. Available at: https://osintph.medium.com/protecting-against-cyber-threats-the-role-of-domain-generation-algorithm-dga-80c3ec3cda9f (Accessed: 31 December 2024).

Dynamic Resolution: Domain Generation Algorithms, Sub-technique T1568.002 - Enterprise | MITRE ATT&CK® (no date). Available at: https://attack.mitre.org/techniques/T1568/002/ (Accessed: 1 January 2025).

Team, V. (2024) Demystifying Domain Generation Algorithms, Vercara. Available at: https://vercara.com/resources/demystifying-domain-generation-algorithms (Accessed: 2 January 2025).

What is a DGA? (no date) Search Security. Available at: https://www.techtarget.com/searchsecurity/definition/domain-generation-algorithm-DGA (Accessed: 31 December 2024).

zvelo (2020) ‘(DGAs) Domain Generation Algorithms | What You Need to Know’, 11 August. Available at: https://zvelo.com/domain-generation-algorithms-dgas/ (Accessed: 31 December 2024).