What data is on the Dark Web?

Milan Kyselica
Monday, Jul 5, 2021

Definition of the Dark Web

The dark web is a part of the internet that is accessible only through special software or configuration, and its purpose is to give its users access that is as anonymous as possible. The dark web is a subset of the so-called deep web, which generally means content that is not indexed by search engines such as Google or Bing. These two terms are often confused.

You can read more about the dark web in our short blog post.

Data types

The dark web holds many kinds of data. This time we focus mainly on data leaked from organizations. The images below show leaks, and samples of the data they contained, that have been uploaded to multiple web portals. There are several reasons why such data ends up there. However, the most likely scenario remains extortion of organizations that did not pay the ransom after falling victim to a ransomware attack.

Fig. 1. DoppelPaymer group portal, containing published company data

Fig. 2. Sample content of the *Proof* folder of one of the organizations

Possibilities of data misuse

An attacker has many ways to exploit this data. They depend mainly on the type and size of the data, the name and type of the organization, and, last but not least, the abilities of the attacker. Below, we look in more detail at a few categories into which this published data can be divided.

1. Employee data

This category includes photos of personal documents, such as identity cards, driver’s licenses, or passports. There are also photos of employees, employee ID cards, and contracts that contain sensitive data.

Fig. 3. Passports and other personal documents were published as proof of obtaining data from the organization

The risk of abuse in this case is very high, because this data is very sensitive and in most cases of high quality, which makes it much easier to use. An attacker can sell it directly to other people who know how to misuse it to open various accounts or take out loans, or to “steal the identity” of the victim and make their life considerably harder. With this data, an attacker can also try to bypass KYC (Know Your Customer) verification when registering different accounts (PayPal, Revolut, accounts on stock exchanges and exchange offices) and act under a different identity. Such newly registered, verified accounts have a relatively high value on the dark web.

2. Company data

This category includes, for example, source code belonging to the organization, descriptions of the infrastructure, login credentials, invoices, and more. Source code is especially sought after: competing companies can gain an advantage by learning how parts of the product work, and hackers can use it to identify vulnerabilities in the application more easily and then develop an exploit. They can use the identified vulnerability or exploit to their own advantage, sell it on one of the many platforms that buy 0-day vulnerabilities and exploits, or simply publish it and damage the affected company.

Fig. 4. Example of an invoice that was published as part of a leak

Another relatively common item in published data is the above-mentioned description of the infrastructure: how the network is segmented and which devices are on it. Domain details, identified domain controllers, and operating system versions allow attackers to prepare more thoroughly for certain types of attacks, including phishing and spear-phishing.

Spear-phishing requires more sophisticated preparation than ordinary phishing, and this data makes that preparation easier. At the same time, spear-phishing still gives attackers a relatively high success rate when gaining initial access to organizations.

3. Military files

Leaks also include files belonging to the military units of certain states. These files contain drawings, maps, satellite positions, and data on military equipment, development, and research. This type of data is of great value, especially to hostile armed forces.

4. BlueLeaks - records from police investigations

This category contains data relating mainly to police work, including records of people, and not only from police investigations. This data is highly sensitive, because an attacker can use it to create a spear-phishing campaign aimed either at these people or at their relatives.

Fig. 7. Details of a person suspected of threatening multiple organizations

Above, we looked in more detail at the types of data found on the dark web. In the next section, we show what preparing and launching one of the most common attacks that use this data might look like. We focus on spear-phishing, but first we explain what the term means and how it differs from phishing itself.

Spear-phishing

Spear-phishing is a more sophisticated type of phishing. Unlike ordinary phishing, which sends generic messages to as many recipients as possible, it targets a narrow group of potential victims. Before sending a targeted email campaign, the attacker must take several steps. These usually include a reconnaissance phase, during which the attacker gathers data about the company, its employees, infrastructure, equipment used, and more.

Example - preparing a spear-phishing campaign

In this example, we show how an attacker can use data from the dark web to create a personalized phishing attack.

1. Select a target

At this stage, the attacker looks for a target. The reasons for choosing a particular target may vary, but the basic goals, whether of spear-phishing or of social engineering in general, remain the same. They include:

gaining unauthorized access to the network,
access to systems or information,
industrial espionage,
identity theft and others.

At this stage, an attacker can also target organizations that were attacked in the past, or that fell victim to ransomware and had their data published.

2. Target Analysis & Reconnaissance

Once the attacker has chosen the target organization, it is time to collect and analyze information about its employees, workstations, and infrastructure. In our example, the attacker used an already published leak of the organization’s data, from which it was very easy to obtain:

an up-to-date list of staff,
email addresses,
corporate as well as private telephone numbers.

Based on this data, the attacker decided to impersonate an employee from HR (Human Resources) and chose a few people from the organization’s marketing department as recipients. Employees in this department have access to the company’s social media accounts, which are of high value because they can be used to distribute malicious content.

3. Email Preparation

Creating convincing content can be difficult if the attacker has never seen prior communication between employees and departments of the target organization. In this case, the task was greatly facilitated by the already published data, which often contains email communication: the attacker can find addresses, signatures, and the structure of internal HR emails in it. The email itself contains a link to the SSO (Single Sign-On) portal, which the organization uses to log in to various applications.

At this point, the text must be flawless: it should contain no grammatical or stylistic errors and must blend in with its environment as much as possible. The email should therefore mimic the ordinary emails that recipients in the organization receive regularly, so that nothing about it seems surprising or new.

4. Distribute a campaign

The phishing email is sent through a pre-prepared and tested phishing infrastructure. Since the campaign aims to obtain credentials and the targets are expected to use two-factor authentication (2FA), such an infrastructure is based on a combination of two open-source frameworks, Gophish[1] and evilginx2[2]. With this combination, an attacker can bypass two-factor authentication, obtain the session cookies of the logged-in user, and perform actions on the user’s behalf. This phase also covers analyzing the target’s email solution and the protections around the victims, such as sandboxes, which can partially stop generic phishing campaigns. In a quality spear-phishing campaign, the attacker is expected to use a domain with a positive reputation score and a specific categorization. Another feature of a quality campaign is that the domain was registered well before the actual distribution, so as not to unnecessarily increase the spam score, which takes into account, for example, the age of the domain.
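To illustrate why domain age matters, here is a minimal, hypothetical sketch of how a mail filter might add points to a message’s spam score when the sender’s domain was registered only recently. The function names, age thresholds, and point values are our own assumptions for illustration. Real filters combine many more signals and use their own weights.

```python
from datetime import date

# Hypothetical thresholds: (maximum domain age in days, extra spam points).
# These values are illustrative assumptions, not taken from any real filter.
AGE_PENALTIES = [
    (7, 3.0),    # registered within the last week
    (30, 2.0),   # registered within the last month
    (90, 1.0),   # registered within the last three months
]


def domain_age_penalty(registered: date, today: date) -> float:
    """Return extra spam points for a sender domain based on its age."""
    age_days = (today - registered).days
    if age_days < 0:
        raise ValueError("registration date lies in the future")
    for max_age, points in AGE_PENALTIES:
        if age_days <= max_age:
            return points
    return 0.0  # older domains add nothing under this heuristic


def spam_score(base_score: float, registered: date, today: date) -> float:
    """Combine a base content score with the domain-age penalty."""
    return base_score + domain_age_penalty(registered, today)


if __name__ == "__main__":
    today = date(2021, 7, 5)
    # A domain registered two days earlier versus one registered a year earlier.
    print(spam_score(1.5, date(2021, 7, 3), today))   # 4.5
    print(spam_score(1.5, date(2020, 7, 5), today))   # 1.5
```

The same message content scores noticeably worse when it comes from a freshly registered domain, which is why the age of the sending domain plays a role in how a campaign is scored.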

5. Analysis of results

Once the email campaign has been sent, the analysis begins. The attacker tracks campaign metrics such as:

opened emails,
link clicks,
submitted login data.
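These metrics are usually read as a funnel, with each stage expressed as a share of the emails sent. The minimal sketch below computes such rates. The `CampaignCounts` type, the `funnel_rates` helper, and the sample counts are hypothetical and chosen only for illustration. The same kind of figures are also tracked in phishing simulations run for security awareness training, although individual tools may define them differently.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CampaignCounts:
    """Raw counts for one campaign (hypothetical field names)."""
    sent: int
    opened: int
    clicked: int
    submitted: int


def funnel_rates(c: CampaignCounts) -> Dict[str, float]:
    """Express each stage as a percentage of the emails sent."""
    if c.sent <= 0:
        raise ValueError("no emails were sent")
    return {
        "opened": 100.0 * c.opened / c.sent,
        "clicked": 100.0 * c.clicked / c.sent,
        "submitted": 100.0 * c.submitted / c.sent,
    }


if __name__ == "__main__":
    # Hypothetical numbers for a small, targeted campaign of 20 recipients.
    counts = CampaignCounts(sent=20, opened=15, clicked=8, submitted=3)
    for stage, rate in funnel_rates(counts).items():
        print(f"{stage}: {rate:.0f} %")
```

Because a spear-phishing campaign targets only a handful of people, even a single submission at the last stage can be enough to gain initial access.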

In this final phase, the attacker waits to capture working login credentials together with the second factor, or authenticated session cookies. After the first such capture, the attacker can already gain access to the organization, its accounts, and its internal network. From this point on, the attacker tries to remain undetected for as long as possible and to collect as much data as possible, which enables further attacks on the organization.

Conclusion

In this article, we explained how data obtained from the dark web can significantly help an attacker, for example, in creating spear-phishing campaigns, and how it increases the attacker’s chances of success in future attacks. Of course, this is just one of many possible ways in which data found on the dark web can be used.