Python struct.pack and struct.unpack for Binary Data

Updated on March 20, 2026
By Vinayak Baranwal

Technical Writer II

Introduction

Python’s struct module converts between Python values and packed binary bytes using format strings that describe a fixed layout similar to C structs. It solves the problem of matching exact byte layouts for network protocols, binary file formats, and interoperability with C code. Binary packing matters whenever wire formats or on-disk records must agree with a specification byte for byte. By the end of this tutorial, you will be able to pack and unpack binary data, control byte order with format prefixes, reuse compiled formats with the Struct class, write into and read from buffers with pack_into and unpack_from, and spot common errors before they corrupt data.

Key Takeaways

  • struct.pack() returns a bytes object that contains the binary representation described by the format string.
  • struct.unpack() reads from a bytes-like buffer and returns a tuple of Python values.
  • struct.pack_into() writes into an existing writable buffer, and it uses an explicit offset to choose where to write.
  • Byte order prefixes (@, =, <, >, !) change how integers and floats are encoded, and mismatches can silently corrupt data.
  • struct.Struct compiles the format once, which saves work when you pack or unpack many records with the same format.
  • For mixed binary formats, struct.calcsize() tells you how many bytes a given format requires.

What Is the Python struct Module

The struct module is part of the Python standard library. It converts Python values into packed binary bytes and back, using format strings that describe a fixed memory layout similar to C structs. Use it when a protocol, file format, or C interface requires exact byte widths and positions.

In production code, struct is a good fit when your protocol or file format is described in terms of fixed-width integers, floats, and fixed-length byte fields, and when you need to convert them deterministically. It is not designed for variable-length fields, optional fields, or deeply nested layouts. For those cases, see the construct library or protobuf in the comparison table later in this tutorial.

The authoritative reference is the Python documentation for struct.

Understanding Format Strings

A format string tells struct how many bytes to allocate for each field and what Python type to map it to. It combines an optional byte order prefix with one or more type codes and optional repeat counts. The order of codes defines the on-wire layout.

Common format characters grouped by category:

Category | Code | C Type | Python Type | Standard Size
Boolean | ? | _Bool | bool | 1 byte
Integer | b | signed char | int | 1 byte
Integer | B | unsigned char | int | 1 byte
Integer | h | short | int | 2 bytes
Integer | H | unsigned short | int | 2 bytes
Integer | i | int | int | 4 bytes
Integer | I | unsigned int | int | 4 bytes
Integer | q | long long | int | 8 bytes
Integer | Q | unsigned long long | int | 8 bytes
Float | f | float | float | 4 bytes
Float | d | double | float | 8 bytes
Bytes | s | char[] | bytes | 1 byte per char
Padding | x | pad byte | no value | 1 byte

Standard sizes apply when you use a byte order prefix (<, >, !, =). Without a prefix, sizes are platform-native and may differ. For the complete list, see Python struct format characters.

A few format-string rules that readers use often:

  • Repeat counts multiply the following type, for example 3H means three unsigned shorts.
  • s formats bytes with an explicit length, for example 8s stores exactly eight bytes.
  • x adds a padding byte: it is written as a zero byte when packing, takes no argument, and produces no value when unpacking.
  • calcsize(fmt) returns the number of bytes the format string will occupy.
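The rules above can be checked with a short sketch. The values and the variable names (three_shorts, tag, padded) are illustrative only, and the > prefix is used so the sizes are the standard ones.

```python
import struct

# 3H: a repeat count on an integer code means three fields, so three values.
three_shorts = struct.pack('>3H', 1, 2, 3)

# 4s: the number is a length, so this is one 4-byte field and one value.
tag = struct.pack('>4s', b'ABCD')

# x: a pad byte that takes no value when packing and yields none when unpacking.
padded = struct.pack('>BxH', 7, 513)

print(three_shorts.hex(), struct.calcsize('>3H'))
print(tag.hex(), struct.calcsize('>4s'))
print(padded.hex(), struct.calcsize('>BxH'))
print(struct.unpack('>BxH', padded))
```

Note how calcsize counts the pad byte even though no value corresponds to it.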

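The s format requires extra attention. Unlike integer codes, 8s packs exactly eight bytes as a single bytes value, not eight separate values. You must encode a Python str to bytes before packing. If the value is shorter than the declared length, struct pads it with null bytes; if it is longer, struct silently truncates it, so pad or validate the length yourself whenever truncation would be a bug.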

import struct

label = b'hello'
padded = label.ljust(8, b'\x00')  # pad to exactly 8 bytes

packed = struct.pack('>8sI', padded, 42)
print('packed_hex', packed.hex())

name_raw, number = struct.unpack('>8sI', packed)
print('name', name_raw.rstrip(b'\x00'))
print('number', number)

Output:

packed_hex 68656c6c6f0000000000002a
name b'hello'
number 42

Notice that struct.unpack returns the full eight bytes including the null padding. Call .rstrip(b'\x00') to recover the original value, provided the original did not itself end in null bytes.
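For text fields, a minimal sketch of the encode, pack, unpack, decode round trip might look like the following. The helper names pack_name and unpack_name, the UTF-8 encoding, and the 8-byte field width are assumptions chosen for illustration, not part of any particular protocol.

```python
import struct

FIELD = struct.Struct('>8s')  # one fixed 8-byte text field (illustrative width)

def pack_name(text):
    """Encode text as UTF-8 and pack it into the 8-byte field."""
    raw = text.encode('utf-8')
    if len(raw) > FIELD.size:
        # struct would truncate silently, so fail loudly instead.
        raise ValueError(f'{text!r} needs {len(raw)} bytes, field holds {FIELD.size}')
    return FIELD.pack(raw)  # shorter values are null-padded by struct

def unpack_name(data):
    """Unpack the field, strip the null padding, and decode it."""
    (raw,) = FIELD.unpack(data)
    return raw.rstrip(b'\x00').decode('utf-8')

packed_name = pack_name('héllo')
print(packed_name.hex())
print(unpack_name(packed_name))

# Without the length check, an oversized value is cut to fit.
truncated = struct.pack('>4s', b'toolong')
print(truncated)
```

The length check is done on the encoded bytes, not on the str, because a non-ASCII character can occupy more than one byte.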

How to Use struct.pack

struct.pack(format, v1, v2, ...) takes Python values and returns a bytes object whose length always equals struct.calcsize(format). The number of values must match the number of fields in the format string.

The signature is:

struct.pack(format, v1, v2, ...) -> bytes

Example: pack three integers using two signed shorts and a signed int:

import struct

packed = struct.pack('>hhi', 5, 10, 15)
print(packed)
print(packed.hex())
print('size_bytes', struct.calcsize('>hhi'))

Output:

b'\x00\x05\x00\n\x00\x00\x00\x0f'
0005000a0000000f
size_bytes 8

In this example, the format string '>hhi' uses big-endian byte order, two signed shorts, and a signed int. The > prefix makes the output identical on any platform, which is important when you compare results against protocol specs.

If you are building a network protocol header, you typically combine struct.pack with socket send and receive code. For a full client/server example, see Python socket programming server-client.

When a format string mixes integers and a bytes field, the value count must still match the number of fields. A repeat count on an integer code adds fields, so 3H needs three values. The s code is the exception: its number is a length, so 4s always consumes exactly one bytes argument.

import struct

# Format: big-endian, uint16 version, 4-byte name, uint32 timestamp
fmt = '>H4sI'
print('expected_size', struct.calcsize(fmt))

packed = struct.pack(fmt, 1, b'node', 1700000000)
print('packed_hex', packed.hex())

Output:

expected_size 10
packed_hex 00016e6f64656553f100

If calcsize returns a number you did not expect, check whether your format uses native types without a prefix. Native types like l and i follow platform alignment rules, which can add padding bytes between fields. Switching to a prefixed format such as > or < gives you fixed, predictable sizes on any machine.
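A quick way to see the effect is to compare calcsize for the same fields under each prefix. This is a sketch only: the '@' result depends on the platform ABI (8 is typical for this layout, but that is an assumption), while the prefixed results are fixed.

```python
import struct

# Same two fields (uint8 then uint32) under each prefix.
for prefix in ['@', '=', '<', '>', '!']:
    print(prefix, struct.calcsize(prefix + 'BI'))

standard_size = struct.calcsize('<BI')  # always 1 + 4
native_size = struct.calcsize('@BI')    # may include alignment padding
print('padding_bytes', native_size - standard_size)
```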

How to Use struct.unpack

struct.unpack(format, buffer) reads exactly struct.calcsize(format) bytes from the buffer and returns a tuple of Python values. The buffer must be exactly the right length, not shorter and not longer.

The signature is:

struct.unpack(format, buffer) -> tuple

Example: pack values, then unpack them back to Python objects:

import struct

fmt = '>hhi'
data = (5, 10, 15)
wire = struct.pack(fmt, *data)
values = struct.unpack(fmt, wire)
print('wire_hex', wire.hex())
print('values', values)

Output:

wire_hex 0005000a0000000f
values (5, 10, 15)

struct.unpack always returns a tuple, even when the format contains a single element. Code that expects a single scalar can still unpack the tuple with tuple unpacking, for example x, = struct.unpack('i', buf).
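Two equivalent ways to pull out a single value are sketched below; the parenthesized form is only a style choice that makes the one-element tuple easier to spot.

```python
import struct

buf = struct.pack('<i', -7)

(value,) = struct.unpack('<i', buf)  # tuple unpacking
same = struct.unpack('<i', buf)[0]   # indexing the one-element tuple
print(value, same)
```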

In a real stream or file, you rarely have a buffer that contains exactly one record. Use struct.calcsize to slice the right number of bytes before calling struct.unpack:

import struct

fmt = '>HH'
record_size = struct.calcsize(fmt)  # 4 bytes per record

# Simulate a stream containing three back-to-back records.
stream = struct.pack(fmt, 1, 100) + struct.pack(fmt, 2, 200) + struct.pack(fmt, 3, 300)

offset = 0
while offset + record_size <= len(stream):
    record = struct.unpack(fmt, stream[offset:offset + record_size])
    print('record', record)
    offset += record_size

Output:

record (1, 100)
record (2, 200)
record (3, 300)

This pattern works for binary files too. Open the file in 'rb' mode, read record_size bytes at a time, and stop when read() returns fewer bytes than record_size.
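The file-reading loop described above could be sketched as follows. io.BytesIO stands in for a file opened in 'rb' mode so the example is self-contained, and the stray trailing byte is an assumption added to show the short-read check.

```python
import io
import struct

fmt = '>HH'
record_size = struct.calcsize(fmt)

# io.BytesIO stands in for open(path, 'rb').
# The trailing b'\x00' simulates a truncated final record.
source = io.BytesIO(struct.pack(fmt, 1, 100) + struct.pack(fmt, 2, 200) + b'\x00')

records = []
while True:
    chunk = source.read(record_size)
    if len(chunk) < record_size:
        break  # end of file, or an incomplete trailing record
    records.append(struct.unpack(fmt, chunk))

print(records)
```

Whether an incomplete trailing record should be ignored or treated as an error depends on the file format; this sketch simply stops.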

Controlling Byte Order and Endianness

A byte order prefix at the start of a format string controls how multi-byte integers and floats are encoded. Without a prefix, struct uses native byte order, which varies by platform and makes output non-portable.

Binary protocols often define a single byte order for multi-byte fields. For example, PNG file headers use big-endian integers, and Windows BMP headers use little-endian integers.

Python struct supports five prefix characters:

Prefix | Name | Byte Order | Size/Alignment
@ | Native with alignment | Native | Native alignment may add padding
= | Standard sizes, native order | Native | Standard sizes, no alignment padding
< | Little-endian | Little-endian | Standard sizes, no alignment padding
> | Big-endian | Big-endian | Standard sizes, no alignment padding
! | Network order | Big-endian | Standard sizes, no alignment padding

To see how each prefix changes the bytes, pack a concrete integer:

import struct

value = 0x12345678
for prefix in ['@', '=', '<', '>', '!']:
    packed = struct.pack(prefix + 'I', value)
    print(prefix, packed.hex())

Output:

@ 78563412
= 78563412
< 78563412
> 12345678
! 12345678

On this example host, native byte order is little-endian. On a big-endian host, @ and = switch to big-endian, while <, >, and ! stay fixed.
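To check which case applies on the machine running your code, sys.byteorder reports the host byte order directly. The sketch below cross-checks it against what struct produces for a native-order pack.

```python
import struct
import sys

print('sys.byteorder', sys.byteorder)

native = struct.pack('=H', 1)
is_little = native == struct.pack('<H', 1)
print('native matches little-endian:', is_little)
```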

A byte order mismatch between sender and receiver does not raise an exception; it silently produces wrong values.

Using the Struct Class for Repeated Operations

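struct.Struct(fmt) compiles a format string once and stores the result. Calling st.pack(...) or st.unpack(...) on the instance reuses that compiled form directly. The module-level functions cache recently used formats, so they do not re-parse on every call either, but they still pay for a cache lookup each time, which adds up in loops and high-throughput packet processing.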

Side-by-side timing example:

import struct
import timeit

fmt = '>Ih'
data = (1, 2)
N = 200000

module_time = timeit.timeit(
    'struct.pack(fmt, *data)',
    number=N,
    globals={'struct': struct, 'fmt': fmt, 'data': data},
)

st = struct.Struct(fmt)
struct_time = timeit.timeit(
    'st.pack(*data)',
    number=N,
    globals={'st': st, 'data': data},
)

speedup = module_time / struct_time
print('module_level_s', round(module_time, 6))
print('Struct_instance_s', round(struct_time, 6))
print('speedup_x', round(speedup, 2))

Output:

module_level_s 0.020752
Struct_instance_s 0.012069
speedup_x 1.72

Exact timings vary by CPU and Python build, but the pattern holds: a Struct instance skips the per-call format lookup that the module-level functions perform.
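Beyond speed, a Struct instance carries its own format and size and can iterate over back-to-back records. The sketch below uses an illustrative two-field record; the names record, blob, and rows are placeholders.

```python
import struct

record = struct.Struct('>HH')  # id (u16), value (u16)
print('format', record.format)
print('size', record.size)

# Three back-to-back records in one buffer.
blob = b''.join(record.pack(i, i * 100) for i in (1, 2, 3))

# iter_unpack requires len(blob) to be a multiple of record.size.
rows = list(record.iter_unpack(blob))
print(rows)
```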

Working With Buffers: struct.pack_into and struct.unpack_from

struct.pack_into(fmt, buffer, offset, v1, ...) writes packed bytes into an existing writable buffer, such as a bytearray or a memoryview over one, at a given byte offset. struct.unpack_from(fmt, buffer, offset=0) reads from any bytes-like object at a given offset, and the buffer may be longer than the format requires.

struct.pack_into writes into an existing buffer and returns None, so you typically inspect the buffer after packing.

Example using bytearray:

import struct

fmt = '>Ih'  # unsigned int, signed short
st = struct.Struct(fmt)

buf = bytearray(st.size)
print('buf_len', len(buf))

st.pack_into(buf, 0, 0x12345678, -2)
print('buf_hex', buf.hex())

values = st.unpack_from(buf, 0)
print('unpacked', values)

Output:

buf_len 6
buf_hex 12345678fffe
unpacked (305419896, -2)

Python’s bytes, bytearray, and memoryview types represent immutable data, writable buffers, and zero-copy views. For background, see the Python data types tutorial.
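The practical consequence of that distinction is sketched below: pack_into needs a writable target, while unpack_from is happy to read from immutable bytes. The variable names are illustrative.

```python
import struct

st = struct.Struct('>H')

frozen = bytes(st.size)       # immutable
scratch = bytearray(st.size)  # writable

try:
    st.pack_into(frozen, 0, 0xBEEF)
    wrote_to_bytes = True
except TypeError as exc:
    wrote_to_bytes = False
    print('bytes rejected:', exc)

st.pack_into(scratch, 0, 0xBEEF)
print('bytearray_hex', scratch.hex())

# Reading works on any bytes-like object, including immutable bytes.
read_back = st.unpack_from(bytes(scratch), 0)
print('read_from_bytes', read_back)
```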

Using offset to pack at a specific location in a larger buffer:

import struct

fmt = '>Ih'
st = struct.Struct(fmt)

big = bytearray(st.size + 4)
offset = 4

st.pack_into(big, offset, 0x01020304, 7)
print('big_hex', big.hex())
print('unpacked_at_offset', st.unpack_from(big, offset))

Output:

big_hex 00000000010203040007
unpacked_at_offset (16909060, 7)

If you already have a slice view, memoryview can avoid extra copies:

import struct

fmt = '<HH'
st = struct.Struct(fmt)

backing = bytearray(st.size + 6)
view = memoryview(backing)[3:]  # view starts at offset three bytes

st.pack_into(view, 0, 0x1122, 0x3344)
out = st.unpack_from(view, 0)
print('backing_hex', backing.hex())
print('unpacked', out)

Output:

backing_hex 00000022114433000000
unpacked (4386, 13124)

Legacy code sometimes uses ctypes.create_string_buffer to allocate writable memory for pack_into. For pure-Python buffer handling, prefer bytearray and memoryview unless you are already integrating with a C ABI memory region.
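For readers who meet the ctypes variant in older code, here is a minimal sketch of what it usually looks like. The format and values are illustrative only.

```python
import ctypes
import struct

st = struct.Struct('<HH')

# create_string_buffer(n) allocates n zeroed, writable bytes.
cbuf = ctypes.create_string_buffer(st.size)
st.pack_into(cbuf, 0, 0x1122, 0x3344)

print('raw_hex', cbuf.raw.hex())  # .raw copies the contents out as bytes
print('unpacked', st.unpack_from(cbuf, 0))
```

Swapping cbuf for bytearray(st.size) gives the same bytes without involving ctypes.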

Practical Examples

The examples below cover three common production patterns: building a binary network header, writing and reading a binary file, and matching a C struct layout for interop.

Network Packet Header Example

The format '>BBHI' packs a four-field packet header in big-endian byte order. Pack it before sending and unpack it on receipt to recover the original field values.

import struct

version = 1
packet_type = 2
payload = b'hello'
length = len(payload)  # payload length in bytes

checksum = sum(payload) & 0xFFFFFFFF

fmt = '>BBHI'  # version (u8), type (u8), length (u16), checksum (u32)
header = struct.pack(fmt, version, packet_type, length, checksum)

print('packed_header_hex', header.hex())

# send simulation
sent = header

# receive simulation
received = sent
v, t, l, c = struct.unpack(fmt, received)
print('unpacked', (v, t, l, c))

Output:

packed_header_hex 0102000500000214
unpacked (1, 2, 5, 532)

When you build real client/server code around this, pair the packed header with socket send and receive calls. For patterns and buffering strategies, see Python socket programming server-client.

If you compute checksums using masking and shifts, Python’s bitwise operators are the same primitives you use for protocol-level arithmetic.
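Extending the header example, a receiver would normally use the length field to slice the payload and the checksum field to validate it. The sketch below assumes the same '>BBHI' header and the same additive checksum as above; the helper names build_packet and parse_packet are hypothetical, and a real protocol would likely use a stronger check such as a CRC.

```python
import struct

HEADER = struct.Struct('>BBHI')  # version, type, payload length, checksum

def simple_checksum(payload):
    # Additive checksum masked to 32 bits; not a real CRC.
    return sum(payload) & 0xFFFFFFFF

def build_packet(version, packet_type, payload):
    header = HEADER.pack(version, packet_type, len(payload), simple_checksum(payload))
    return header + payload

def parse_packet(data):
    # unpack_from tolerates the payload bytes that follow the header.
    version, packet_type, length, checksum = HEADER.unpack_from(data, 0)
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError('truncated payload')
    if simple_checksum(payload) != checksum:
        raise ValueError('checksum mismatch')
    return version, packet_type, payload

packet = build_packet(1, 2, b'hello')
print(packet.hex())
print(parse_packet(packet))
```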

Binary File Parsing Example

Open the file in binary mode ('wb' and 'rb'), write packed bytes directly, and read them back for unpacking. Context managers handle file closing even if an error occurs.

import struct
import tempfile
from pathlib import Path

values = (10, 20, 30, 40)
fmt = '>4I'  # four unsigned ints in big-endian order

with tempfile.TemporaryDirectory() as d:
    path = Path(d) / 'data.bin'

    packed = struct.pack(fmt, *values)
    print('packed_hex', packed.hex())
    print('size_bytes', struct.calcsize(fmt))

    with path.open('wb') as f:
        f.write(packed)

    with path.open('rb') as f:
        raw = f.read()

    unpacked = struct.unpack(fmt, raw)
    print('unpacked', unpacked)

Output:

packed_hex 0000000a000000140000001e00000028
size_bytes 16
unpacked (10, 20, 30, 40)

C Struct Interop Example

The format '<IHBB' matches a fixed-width C record layout using little-endian byte order. Call struct.calcsize first to confirm your Python format string produces the same byte width as the C struct.

Assume a C struct like this:

// C (conceptual)
// uint32_t magic;    // 4 bytes
// uint16_t code;     // 2 bytes
// uint8_t flags;     // 1 byte
// uint8_t reserved;  // 1 byte
//
// Total size is 8 bytes on typical ABIs with this field order.

Python format string:

<IHBB means little-endian, uint32, uint16, uint8, uint8, with standard sizes and no extra alignment padding.

import struct

fmt = '<IHBB'

print('calcsize', struct.calcsize(fmt))

# Bytes received from the wire or a binary file.
wire = bytes.fromhex('ddccbbaa34120100')
print('wire_hex', wire.hex())

decoded = struct.unpack(fmt, wire)
print('decoded', decoded)

Output:

calcsize 8
wire_hex ddccbbaa34120100
decoded (2864434397, 4660, 1, 0)

If you need to match a C struct that includes compiler-inserted padding, check your C compiler ABI rules and consider using @ for native alignment in Python, or define explicit packing in C. The safest path is to confirm sizes with both sides and add tests that parse real fixtures.
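One way to handle compiler padding while keeping a fixed byte order is to spell the padding out with x codes. The sketch below uses a hypothetical C record (a uint8_t followed by a uint32_t) and a ctypes mirror named Sample to ask the local ABI for its size. The three pad bytes are an assumption that holds on typical ABIs and should be confirmed against your compiler.

```python
import ctypes
import struct

# Hypothetical C record with compiler padding:
#   struct sample { uint8_t flag; uint32_t value; };
class Sample(ctypes.Structure):
    _fields_ = [('flag', ctypes.c_uint8), ('value', ctypes.c_uint32)]

c_size = ctypes.sizeof(Sample)      # size under the local C ABI
explicit = struct.Struct('<B3xI')   # 3 pad bytes written out by hand
packed_tight = struct.Struct('<BI') # matches a packed C struct with no padding

print('ctypes sizeof', c_size)
print('explicit', explicit.size, 'tight', packed_tight.size)

data = explicit.pack(1, 0xAABBCCDD)
print(data.hex())
print(explicit.unpack(data))
```

If the ctypes size and the explicit size disagree, the format string does not describe the layout your compiler produces.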

Common Errors and How to Fix Them

Most struct errors fall into three categories: buffer length mismatches, wrong item counts, and silent byte order corruption. The third category raises no exception, which makes it the most dangerous in production.

struct.error: Unpack Requires a Buffer of X Bytes

Error message: struct.error: unpack requires a buffer of 4 bytes

import struct

try:
    struct.unpack('>I', b'\x00\x01')  # 2 bytes, but '>I' needs 4 bytes
except struct.error as e:
    print(e)

Output:

unpack requires a buffer of 4 bytes

Root cause: the length of the input buffer differs from struct.calcsize(format). struct.unpack requires exactly that many bytes, so the same error appears whether the buffer is too short or too long.

Fix: slice or read exactly struct.calcsize(format) bytes, then pass that buffer to struct.unpack. When parsing stream data, buffer until you have enough bytes.
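A minimal sketch of the buffering approach, assuming data arrives in arbitrary chunks (as it would from a socket). The helper name try_unpack and the chunk contents are illustrative.

```python
import struct

HEADER = struct.Struct('>I')

def try_unpack(buffer):
    """Return (value, remaining_bytes), or (None, buffer) if more data is needed."""
    if len(buffer) < HEADER.size:
        return None, buffer
    (value,) = HEADER.unpack_from(buffer, 0)
    return value, buffer[HEADER.size:]

pending = b''
results = []
for chunk in (b'\x00\x01', b'\x02', b'\x03\xff'):  # data arriving in pieces
    pending += chunk
    value, pending = try_unpack(pending)
    if value is not None:
        results.append(value)

print(results, pending)
```

The leftover byte stays in pending until enough data arrives to complete the next field.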

struct.error: Pack Expected X Items for Packing (Y Given)

Error message: struct.error: pack expected 2 items for packing (got 1)

import struct

try:
    struct.pack('>Ih', 123)  # format requires two values
except struct.error as e:
    print(e)

Output:

pack expected 2 items for packing (got 1)

Root cause: the number of Python values passed to struct.pack does not match the number of fields described by the format string.

Fix: provide one value per format element, or expand a tuple/list into positional arguments, for example struct.pack(fmt, *values).

Byte Order Mismatch Between Sender and Receiver

Symptom: no exception is raised. The decoded values are silently wrong, which makes byte order mismatch the hardest of the three errors to catch in production.

import struct

value = 0x1234
wire = struct.pack('>H', value)  # sender uses big-endian

# receiver mistakenly uses little-endian
decoded = struct.unpack('<H', wire)[0]
print('wire_hex', wire.hex())
print('decoded', decoded)

Output:

wire_hex 1234
decoded 13330

Root cause: the receiver interprets multi-byte fields with a different byte order prefix from the sender.

Fix: use the same prefix on both sides. For most network protocols, choose > or !. For host-to-host binary files, decide whether you need native alignment or standard sizes.
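One defensive pattern, not described in the tutorial text and offered here only as a sketch, is to define the format once in shared code and start each frame with a known magic value. The constant MAGIC, the '!HH' frame layout, and the helper names are all assumptions for illustration.

```python
import struct

# One shared definition imported by both sender and receiver.
MAGIC = 0xCAFE
FRAME = struct.Struct('!HH')  # magic (u16), value (u16), network byte order

def encode(value):
    return FRAME.pack(MAGIC, value)

def decode(data):
    magic, value = FRAME.unpack(data)
    if magic != MAGIC:
        raise ValueError(f'bad magic 0x{magic:04x}, possible byte order mismatch')
    return value

wire = encode(0x1234)
print(decode(wire))

# A receiver using the wrong prefix sees a byte-swapped magic value.
wrong_magic, wrong_value = struct.unpack('<HH', wire)
print(hex(wrong_magic), hex(wrong_value))
```

A magic value only helps if it is not a byte-order palindrome: 0xCAFE swaps to 0xFECA, whereas a value like 0xABAB would pass the check either way.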

struct vs. Alternative Approaches

struct is the right tool for fixed-width, C-compatible binary layouts. For more complex needs, Python offers several alternatives with different trade-offs.

Approach | Use Case | Pros | Cons
struct | Fixed layouts, C-like binary packing and unpacking | Standard library, explicit format strings, predictable sizes with <, >, !, and = | Manual format strings, limited for highly nested or variable-length formats
ctypes | Interop with C APIs, mapping foreign memory | Can mirror C structs and call C functions | ABI and alignment differences by platform, more error-prone memory handling
array module | Homogeneous numeric arrays for I/O | Simple for one numeric type, easy to convert to bytes in some workflows | Not designed for mixed field layouts or padding rules
construct library | Declarative parsing and building of complex binary formats | Rich schema language for conditional and nested parsing | Extra dependency, parsing overhead compared to struct in tight loops
protobuf | Message serialization with schemas | Cross-language schema compatibility, versioning support | Not a byte-for-byte match to C struct layouts, uses variable-length encodings
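To make the struct and array rows concrete, the sketch below packs the same three unsigned shorts both ways. It covers only this one homogeneous case; array always uses the native byte order and item size, so it matches struct's '@' mode rather than a portable prefixed format.

```python
import array
import struct

samples = [1, 2, 3]

from_array = array.array('H', samples).tobytes()  # native order, native size
from_struct = struct.pack('@3H', *samples)        # same native layout
portable = struct.pack('<3H', *samples)           # fixed little-endian layout

print(from_array.hex())
print(from_array == from_struct)
print(portable.hex())
```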

Frequently Asked Questions

The questions below cover the most common points of confusion when working with struct.pack, struct.unpack, byte order prefixes, and buffer sizing.

Q: What does struct.pack return in Python?
A: struct.pack(format, ...) returns a bytes object containing the binary representation described by format. The returned length always matches struct.calcsize(format).

Q: What is the difference between struct.pack and struct.pack_into?
A: struct.pack creates and returns a new bytes object. struct.pack_into writes the packed values into an existing writable buffer, using an offset to choose where to write.

Q: Why does struct.unpack return a tuple?
A: struct.unpack returns a tuple because a format string can describe multiple fields. Even if the format has one field, returning a tuple keeps the API consistent.

Q: How do I handle byte order when using Python struct with network data?
A: Network protocols commonly use a fixed network byte order, usually big-endian. Use > or ! in your format strings for multi-byte fields, and ensure both sender and receiver use the same prefix.

Q: What is the difference between native byte order @ and standard byte order = in Python struct?
A: @ uses native byte order and native alignment, which can add padding between fields. = uses native byte order but standard sizes with no alignment padding, so layouts match the documented sizes rather than the platform ABI.

Q: How do I pack a string or bytes object using Python struct?
A: Use the s format with an explicit length, for example 8s. For str, encode to bytes first, then pass the resulting bytes to struct.pack. Shorter values are padded with null bytes and longer values are silently truncated, so check the length yourself if truncation would be a bug.

Q: When should I use the Struct class instead of module-level struct.pack and struct.unpack?
A: Use struct.Struct(fmt) when your code repeatedly packs or unpacks with the same format string. It compiles the format once and holds on to the result, so repeated operations skip the format lookup that the module-level functions perform on each call.

Q: What causes struct.error: unpack requires a buffer of X bytes and how do I fix it?
A: This error happens when the length of the buffer you pass to struct.unpack does not equal struct.calcsize(format). Fix it by buffering until you have enough bytes and by slicing exactly the right number of bytes before calling struct.unpack.

Conclusion

Most binary format problems reduce to three decisions: which fields to pack, what byte order the other side expects, and whether you are calling the same format string often enough to justify a Struct instance. Get those three right and struct rarely surprises you.

The format character table and the byte order prefix table in this tutorial are the two references you will return to most often. The errors section covers the cases that trip up even experienced users, particularly the silent byte order mismatch that produces wrong values with no exception.

From here, a natural next step is to put a packed header onto a real socket. For a working client and server you can extend with your own header format, see Python socket programming server-client.

About the author

Vinayak Baranwal

Building future-ready infrastructure with Linux, Cloud, and DevOps. Full Stack Developer & System Administrator. Technical Writer @ DigitalOcean | GitHub Contributor | Passionate about Docker, PostgreSQL, and Open Source | Exploring NLP & AI-TensorFlow | Nailed over 50+ deployments across production environments.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.