# Part 1: Build a DNS query

How do we make a query asking for the IP address for `google.com`?

Well, DNS queries have 2 parts: a **header** and a **question**. So we're going to:

1. Create some Python classes for the header and the question
2. Write a `to_bytes` function to convert those objects into byte strings
3. Write a `build_query(domain_name, type)` function that creates a DNS query



## 1.1: Write the `DNSHeader` and `DNSQuestion` classes


First, our DNS **Header**. This has a query ID, some flags (which we'll mostly ignore), and 4 counts, telling you how many records to expect in each section of a DNS packet. Ignore the `to_bytes` method for now: we'll explain that in a second.

In [1]:
from dataclasses import dataclass
import dataclasses
import struct

@dataclass
class DNSHeader:
    id: int
    flags: int
    num_questions: int = 0
    num_answers: int = 0
    num_authorities: int = 0
    num_additionals: int = 0

    def to_bytes(self):
        fields = dataclasses.astuple(self)
        return struct.pack("!HHHHHH", *fields)

Next, a DNS **Question** just has 3 fields: a name (like `example.com`), a type (like `A`), and a class (which for us is always the same: `IN`, for "internet").


In [2]:
@dataclass
class DNSQuestion:
    name: bytes
    type: int 
    class_: int

    def to_bytes(self):
        return self.name + struct.pack("!HH", self.type, self.class_)

Next, let's talk about those `to_bytes` methods that convert the objects into byte
strings. 

## meet `struct.pack`: how we create byte strings

In the `to_bytes` methods, we converted our Python objects into byte strings
using the `struct` module, which is built into Python. 

Let's see an example of how `struct` can convert Python variables into byte strings:

In [3]:
struct.pack('!HH', 5, 23)

b'\x00\x05\x00\x17'

`H` means "2-byte integer", so `!HH` is saying "format the arguments as two
2-byte integers". `\x00\x05` is 5 and `\x00\x17` is 23. 

### `struct.pack` format strings

In the format string `"!HH"`, there's an `H`, which we just said means "2-byte integer". Here are some more examples of things we'll be using later in our format strings:

* `H`: 2 bytes (as an integer)
* `I`: 4 bytes (as an integer)
* `4s`: 4 bytes (as a byte string)
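
To make those format characters concrete, here's a small sketch that packs one value with each of them and then uses `struct.unpack` to go back the other way. The variable names are just for illustration:

```python
import struct

# H: one 2-byte unsigned integer
two_bytes = struct.pack("!H", 5)
# I: one 4-byte unsigned integer
four_bytes = struct.pack("!I", 5)
# 4s: exactly 4 bytes copied from a byte string
raw = struct.pack("!4s", b"\x08\x08\x08\x08")

print(two_bytes)   # b'\x00\x05'
print(four_bytes)  # b'\x00\x00\x00\x05'
print(raw)         # b'\x08\x08\x08\x08'

# struct.unpack reverses the conversion and always returns a tuple
unpacked = struct.unpack("!HH", b"\x00\x05\x00\x17")
print(unpacked)    # (5, 23)
```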

Here's what an example DNS header looks like converted to bytes:

In [4]:
DNSHeader(id=2329, flags=0, num_questions=1, num_additionals=0, num_authorities=0, num_answers=0).to_bytes()

b'\t\x19\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00'

### a note on byte order

Why is there a `!` at the beginning of the format string `"!HH"`? That's
because anytime you convert an integer into a byte string, there are two
options for how to do it. Let's see the two ways to convert the integer
`0x01020304` (16909060) into a 4-byte string:

In [5]:
int.to_bytes(0x01020304, length=4, byteorder='little')

b'\x04\x03\x02\x01'

In [6]:
int.to_bytes(0x01020304, length=4, byteorder='big')

b'\x01\x02\x03\x04'

These are the reversed versions of each other. `b'\x04\x03\x02\x01'` is the
"little endian" version (the smallest byte comes first) and `b'\x01\x02\x03\x04'` is the "big endian" version. 

The names "little endian" and "big endian" actually have a funny origin:
they're named after two satirical religious sects in Gulliver's Travels. One
sect liked to break eggs on the little end, and the other liked the big end.
They're named after this Gulliver's Travels debate because people used to like
to argue a lot about which byte order was best, even though it didn't make a big
difference.

In network packets, integers are always encoded in a big endian way (though
little endian is the default in most other situations). So `!` is telling
Python "use the byte order for computer networking".
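
You can see the same thing with `struct.pack` directly. In `struct` format strings, `<` means little endian, `>` means big endian, and `!` means network byte order. Here's a quick sketch comparing the three (the variable names are just for illustration):

```python
import struct

value = 0x0102

little = struct.pack("<H", value)   # little endian: smallest byte first
big = struct.pack(">H", value)      # big endian: biggest byte first
network = struct.pack("!H", value)  # network byte order

print(little)   # b'\x02\x01'
print(big)      # b'\x01\x02'
print(network)  # b'\x01\x02'
```

`!` and `>` give the same bytes, because network byte order is big endian.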



## 1.2: encode the name


Now we're ready to build our DNS query.

First, we need to encode the domain name. We don't literally send "google.com".
Instead, each part of the name gets prefixed with its length, so it's translated into `b"\x06google\x03com\x00"`. Here's the code:



In [7]:
def encode_dns_name(domain_name):
    encoded = b""
    for part in domain_name.encode("ascii").split(b"."):
        encoded += bytes([len(part)]) + part
    return encoded + b"\x00"

Let's run it:

In [8]:
encode_dns_name("google.com")

b'\x06google\x03com\x00'

The first byte of the output is `6` (the length of `"google"`):

In [9]:
encode_dns_name("google.com")[0]

6
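
To check your understanding of the format, here's a rough sketch that walks an encoded name in the other direction: read a length byte, take that many bytes as a label, and stop at the `\x00` at the end. `split_labels` is a made-up helper for illustration, and it assumes the simple encoding that `encode_dns_name` produces (nothing fancier):

```python
def split_labels(encoded):
    """Walk a length-prefixed name and return its labels as byte strings."""
    labels = []
    position = 0
    # A zero length byte marks the end of the name
    while encoded[position] != 0:
        length = encoded[position]
        labels.append(encoded[position + 1 : position + 1 + length])
        position += 1 + length
    return labels

print(split_labels(b"\x06google\x03com\x00"))  # [b'google', b'com']
```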

## 1.3: build the query

Finally, let's write our `build_query` function! Our function takes a domain name (like
`google.com`) and the number of a DNS record type (like `1`, which means `A`). 



In [10]:
import random

TYPE_A = 1
CLASS_IN = 1

def build_query(domain_name, record_type):
    name = encode_dns_name(domain_name)
    id = random.randint(0, 65535)
    RECURSION_DESIRED = 1 << 8
    header = DNSHeader(id=id, num_questions=1, flags=RECURSION_DESIRED)
    question = DNSQuestion(name=name, type=record_type, class_=CLASS_IN)
    return header.to_bytes() + question.to_bytes()

This:

1. Defines some constants (`TYPE_A = 1`, `CLASS_IN = 1`)
2. encodes the DNS name with `encode_dns_name`
3. picks a random ID for the query
4. sets the flags to "recursion desired" (which you need to set any time you're talking to a DNS resolver)
5. creates the question
6. concatenates the header and the question together



In [11]:
build_query("example.com", TYPE_A)

b'\x81\x18\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x07example\x03com\x00\x00\x01\x00\x01'
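
To convince yourself those bytes are what we meant to send, here's a sketch that takes the output above apart again with `struct.unpack`. The byte string is copied from the output above (your ID will be different, since it's random), and the variable names are just for illustration:

```python
import struct

query = b"\x81\x18\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x07example\x03com\x00\x00\x01\x00\x01"

# The header is the first 12 bytes: six 2-byte integers
query_id, flags, num_questions, num_answers, num_authorities, num_additionals = \
    struct.unpack("!HHHHHH", query[:12])
print(hex(query_id))    # 0x8118
print(hex(flags))       # 0x100, which is 1 << 8 ("recursion desired")
print(num_questions)    # 1

# The question is the encoded name, then a 2-byte type and a 2-byte class
name = query[12:-4]
record_type, class_ = struct.unpack("!HH", query[-4:])
print(name)                 # b'\x07example\x03com\x00'
print(record_type, class_)  # 1 1
```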

## 1.4: Test our code

Now let's test if our code works!

In [12]:
import socket

query = build_query("www.example.com", TYPE_A)
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(query, ("8.8.8.8", 53))
response, _ = sock.recvfrom(1024)

This sends a query to Google's DNS resolver asking where `www.example.com` is.
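
One thing to watch out for: UDP doesn't guarantee a reply, so `recvfrom` can wait forever if the response gets lost. Here's a minimal sketch of one way to handle that with a socket timeout. `send_query` is a made-up helper (it's not used elsewhere in this guide), and the 2 second default is an arbitrary choice:

```python
import socket

def send_query(query, server=("8.8.8.8", 53), timeout=2.0):
    """Send a UDP query; return the response bytes, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # give up if no reply arrives in time
        sock.sendto(query, server)
        try:
            response, _ = sock.recvfrom(1024)
        except socket.timeout:
            return None
        return response

# Usage, assuming build_query and TYPE_A from above:
# response = send_query(build_query("www.example.com", TYPE_A))
```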

But how can we know that this worked if we don't know how to parse the response
yet? Well, we can run `tcpdump` to see our program making its DNS query:

```
$ sudo tcpdump -ni any port 53
08:31:19.676059 IP 192.168.1.173.62752 > 8.8.8.8.53: 45232+ A? www.example.com. (33)
08:31:19.694678 IP 8.8.8.8.53 > 192.168.1.173.62752: 45232 1/0/0 A 93.184.216.34 (49)
```

It worked! You can see `8.8.8.8`'s answer (`93.184.216.34`) at the end of the second line of tcpdump's output. 

Asking Google's DNS resolver here is cheating, of course -- our final goal is
to **write** a DNS resolver that finds out where `example.com` is ourselves,
instead of asking `8.8.8.8` to do the work for us. But this is a nice easy way
to check that our code for building a DNS query works.

## Success!

In the next part, we'll see how to parse this DNS response we just got back:

In [13]:
response

b'\xc3c\x81\x80\x00\x01\x00\x01\x00\x00\x00\x00\x03www\x07example\x03com\x00\x00\x01\x00\x01\xc0\x0c\x00\x01\x00\x01\x00\x00L\xfc\x00\x04]\xb8\xd8"'