How to put data into the SMPP SM field

In SMPP (Short Message Peer-to-Peer), the short_message (SM) field carries the actual text or binary data of the SMS being transmitted. The data_coding field in the same PDU tells the receiver how to interpret those octets, so the two must always agree.

Common SMPP Encodings

The data_coding field is 1 byte and informs the SMSC how to interpret the message payload.

Hex   Decimal  Encoding               Description
0x00  0        SMSC default alphabet  Usually GSM 7-bit, the standard SMS character set
0x01  1        IA5 / ASCII            7-bit ASCII (basic Latin only)
0x03  3        Latin-1 (ISO 8859-1)   Western European charset
0x04  4        Binary                 Raw 8-bit binary data
0x08  8        UCS2                   Unicode (16-bit, big-endian)
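
As a rough illustration of how a sender might pick a value from the table, the sketch below tries the most compact option first. The function name choose_data_coding is hypothetical, the GSM check covers only the basic GSM 03.38 alphabet (not the extension table), and whether an SMSC accepts 0x03 at all is an assumption to verify with the provider.

```python
# Basic GSM 03.38 default alphabet, indexed by septet value (extension table omitted).
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)
GSM_BASIC_SET = frozenset(GSM_BASIC)


def choose_data_coding(text: str) -> int:
    """Return a data_coding value that can represent every character in text."""
    if all(ch in GSM_BASIC_SET for ch in text):
        return 0x00  # SMSC default alphabet (assumed to be GSM 7-bit)
    try:
        text.encode("latin-1")
        return 0x03  # Latin-1, if the SMSC supports it
    except UnicodeEncodeError:
        return 0x08  # fall back to UCS2


print(hex(choose_data_coding("Hello")))   # 0x0
print(hex(choose_data_coding("fête")))    # 0x3
print(hex(choose_data_coding("مرحبا")))   # 0x8
```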

Examples of Encoded Messages

1. GSM 7-bit (data_coding = 0x00)

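Standard SMS encoding and the most compact one: each character takes 7 bits, so up to 160 characters fit in the 140-octet payload of a single message. The packed form shown below is what travels on the radio interface; over SMPP, many SMSCs expect one character per octet in short_message and do the packing themselves, so check your provider's documentation.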

Text: "Hello"
GSM 7-bit Packed: C8 32 9B FD 06
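
The packed bytes above can be reproduced with a small bit-accumulator. This is a minimal sketch: pack_septets is a hypothetical helper, and it relies on the fact that the letters in "Hello" have the same code values in ASCII and in the GSM 7-bit alphabet, which is not true for every character.

```python
def pack_septets(septets: bytes) -> bytes:
    """Pack 7-bit values into octets, filling each octet from its low bits upward."""
    acc = 0     # bit accumulator
    nbits = 0   # number of valid bits currently held in acc
    out = bytearray()
    for s in septets:
        acc |= (s & 0x7F) << nbits
        nbits += 7
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # final partial octet, padded with zero bits
    return bytes(out)


# ASCII values stand in for GSM septets here (valid for these letters only).
packed = pack_septets("Hello".encode("ascii"))
print(packed.hex(" ").upper())  # C8 32 9B FD 06
```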

2. UCS2 (data_coding = 0x08)

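Used for non-Latin scripts (e.g., Arabic, Chinese). Each character takes 2 bytes, so a single message holds 70 characters. Emojis and other characters outside the Basic Multilingual Plane are not part of UCS2 proper; in practice they are sent as UTF-16 surrogate pairs and count as two characters each.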

Text: "مرحبا"
UCS2 Hex: 0645 0631 062D 0628 0627
Bytes (hex): 06 45 06 31 06 2D 06 28 06 27
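
In Python, the same bytes can be obtained with the built-in "utf-16-be" codec, which writes big-endian 16-bit units without a byte order mark. The helper name encode_ucs2 is only illustrative, and treating UTF-16 output as UCS2 is an assumption that holds for characters in the Basic Multilingual Plane.

```python
def encode_ucs2(text: str) -> bytes:
    """Encode text as big-endian 16-bit units for use with data_coding = 0x08."""
    return text.encode("utf-16-be")


payload = encode_ucs2("مرحبا")
print(payload.hex(" ").upper())  # 06 45 06 31 06 2D 06 28 06 27
print(len(payload))              # 10

# A character outside the BMP becomes a surrogate pair (4 bytes).
print(encode_ucs2("😀").hex(" ").upper())  # D8 3D DE 00
```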

3. ASCII (data_coding = 0x01)

Basic Latin characters only, less space-efficient than GSM 7-bit.

Text: "Hello"
ASCII Hex: 48 65 6C 6C 6F

SMPP PDU Example with UCS2 Encoding

Here is an SMPP submit_sm PDU carrying a Unicode message:

00000037  // Command Length (55 bytes)
00000004  // Command ID (submit_sm)
00000000  // Command Status
00000001  // Sequence Number
74657374  // service_type: "test" (ASCII)
00        // NULL terminator
01        // source_addr_ton: International
01        // source_addr_npi: ISDN
31323334  // source_addr: "1234" (ASCII)
00        // NULL terminator
01        // dest_addr_ton: International
01        // dest_addr_npi: ISDN
35363738  // destination_addr: "5678" (ASCII)
00        // NULL terminator
00        // esm_class
00        // protocol_id
00        // priority_flag
00        // schedule_delivery_time (NULL: immediate delivery)
00        // validity_period (NULL: SMSC default)
00        // registered_delivery
00        // replace_if_present_flag
08        // data_coding: UCS2
00        // sm_default_msg_id
0A        // sm_length: 10 bytes
06450631  // short_message in UCS2: "مرحبا"
062D0628
0627
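
A PDU with these field values can be assembled with Python's struct module. The sketch below is not a full SMPP client: build_submit_sm and cstring are hypothetical helpers, TON/NPI are fixed to 1, optional TLVs are ignored, and command_length is computed from the assembled body instead of being hard-coded.

```python
import struct

SUBMIT_SM = 0x00000004


def cstring(value: str) -> bytes:
    """Encode a C-octet string: ASCII bytes followed by a NULL terminator."""
    return value.encode("ascii") + b"\x00"


def build_submit_sm(sequence: int, source: str, dest: str,
                    short_message: bytes, data_coding: int,
                    service_type: str = "", esm_class: int = 0) -> bytes:
    if len(short_message) > 254:
        raise ValueError("short_message is limited to 254 octets")
    body = (
        cstring(service_type)
        + bytes([0x01, 0x01]) + cstring(source)   # source TON, NPI, address
        + bytes([0x01, 0x01]) + cstring(dest)     # dest TON, NPI, address
        + bytes([esm_class, 0x00, 0x00])          # esm_class, protocol_id, priority_flag
        + b"\x00"                                 # schedule_delivery_time (empty)
        + b"\x00"                                 # validity_period (empty)
        + bytes([0x00, 0x00, data_coding, 0x00])  # registered_delivery, replace_if_present_flag,
                                                  # data_coding, sm_default_msg_id
        + bytes([len(short_message)]) + short_message
    )
    # Header: command_length, command_id, command_status, sequence_number (big-endian).
    header = struct.pack(">IIII", 16 + len(body), SUBMIT_SM, 0, sequence)
    return header + body


pdu = build_submit_sm(1, "1234", "5678", "مرحبا".encode("utf-16-be"),
                      data_coding=0x08, service_type="test")
print(len(pdu))             # 55
print(pdu.hex().upper())
```

Deriving command_length and sm_length from the actual data avoids the most common mistake in hand-built PDUs, which is a length field that no longer matches the bytes that follow it.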

Encoding and Concatenation

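Long messages are split into parts, each prefixed with a UDH (User Data Header), and the esm_class field must have the UDHI bit (0x40) set so the receiver knows the header is present. The 6-octet concatenation header comes out of the 140-octet payload, which reduces the maximum size of each part: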

  • GSM 7-bit: 160 → 153 chars per part
  • UCS2: 70 → 67 chars per part
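
The per-part limits follow from simple arithmetic on the 140-octet payload. The short calculation below is only a worked check of those numbers; chars_per_part is a hypothetical name, and the 6-octet default assumes the 8-bit-reference concatenation header.

```python
MAX_OCTETS = 140  # user data octets available in one SMS


def chars_per_part(udh_octets: int = 6) -> tuple:
    """Return (gsm7_chars, ucs2_chars) that fit next to a UDH of the given size."""
    remaining = MAX_OCTETS - udh_octets
    gsm7_chars = remaining * 8 // 7  # whole septets that fit in the remaining bits
    ucs2_chars = remaining // 2      # whole 16-bit units that fit
    return gsm7_chars, ucs2_chars


print(chars_per_part(0))  # (160, 70)
print(chars_per_part(6))  # (153, 67)
```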

Example UDH for message part:

05 00 03 CC 02 01
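// 05: UDH length (5 octets follow)
// 00: IEI (concatenated short messages, 8-bit reference)
// 03: length of the information element data
// CC: message reference (same for all parts)
// 02: total parts
// 01: current part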
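
Splitting a payload and prefixing each part with this header could look like the sketch below. split_with_udh is a hypothetical helper; it works on already-encoded bytes, defaults to 134 octets per part (the UCS2 case), and does not guard against cutting a UTF-16 surrogate pair in half. Each resulting part would be sent in its own submit_sm with esm_class set to 0x40.

```python
def split_with_udh(payload: bytes, ref: int, max_part: int = 134) -> list:
    """Split payload into chunks and prepend a 6-octet concatenation UDH to each."""
    chunks = [payload[i:i + max_part] for i in range(0, len(payload), max_part)]
    total = len(chunks)
    if total > 255:
        raise ValueError("too many parts for an 8-bit part counter")
    parts = []
    for seq, chunk in enumerate(chunks, start=1):
        # UDHL=05, IEI=00, IEDL=03, then reference, total parts, part number.
        udh = bytes([0x05, 0x00, 0x03, ref & 0xFF, total, seq])
        parts.append(udh + chunk)
    return parts


long_text = "ا" * 100  # 100 UCS2 characters = 200 octets
parts = split_with_udh(long_text.encode("utf-16-be"), ref=0xCC)
print(len(parts))                       # 2
print(parts[0][:6].hex(" ").upper())    # 05 00 03 CC 02 01
print([len(p) for p in parts])          # [140, 72]
```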

Summary

SMPP provides flexible encoding options through the data_coding field. Proper encoding ensures compatibility across global networks, especially when handling multilingual text or binary data. Developers must match encoding types with the content and expected recipients to avoid message corruption.

References

  • SMPP 3.4 Specification
  • GSM 03.38 Character Set
  • Unicode Standard
