Emoji

Challenge prompt

String:

🁳🁣🁲🁩🁰🁴🁃🁔🁆🁻🀳🁭🀰🁪🀱🁟🀳🁮🁣🀰🁤🀱🁮🁧🁟🀱🁳🁟🁷🀳🀱🁲🁤🁟🀴🁮🁤🁟🁦🁵🁮🀡🀱🁥🀴🀶🁤🁽

Description: “Emojis everywhere! Is it a joke? Or something is hiding behind it.”


TL;DR

  • The given text is UTF-8 bytes that were mis-decoded as Windows-1252 (classic mojibake).
  • When repaired, it becomes a sequence of tile emojis: mostly Domino Tiles (assigned codepoints U+1F030–U+1F093), plus one Mahjong Tile (U+1F021) that stands in for “!”.
  • Take the low byte of each emoji’s codepoint (ord(c) & 0xFF) to recover ASCII.
  • Flag: scriptCTF{3m0j1_3nc0d1ng_1s_w31rd_4nd_fun!1e46d}

Step-by-step

1) Recognize mojibake

  • Strings like ðŸ... are a strong signal that UTF-8 got decoded as cp1252/Windows-1252.
  • Goal: reconstruct the original bytes, then decode as proper UTF-8.
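
To see why ðŸ is the telltale prefix, the sketch below simulates the mis-decoding on a single tile. The helper name mojibake is made up for illustration. It assumes a lenient decoder that passes the bytes Windows-1252 leaves undefined (such as 0x81) through as C1 control characters; Python's own cp1252 codec is strict and rejects them.

```python
def mojibake(text: str) -> str:
    """Simulate UTF-8 bytes being mis-read as Windows-1252 (illustrative helper)."""
    out = []
    for b in text.encode("utf-8"):
        try:
            out.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            # Assumption: undefined cp1252 bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D)
            # are passed through as the C1 control with the same value.
            out.append(chr(b))
    return "".join(out)

tile = "\U0001F073"                       # the tile that carries "s"
print(tile.encode("utf-8").hex(" "))      # f0 9f 81 b3
garbled = mojibake(tile)
print([f"U+{ord(c):04X}" for c in garbled])
# ['U+00F0', 'U+0178', 'U+0081', 'U+00B3']  i.e. ð, Ÿ, an invisible control, ³
```

Every codepoint in the U+1F0xx range starts with the bytes F0 9F in UTF-8, which is why each garbled character begins with ðŸ.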

2) Repair the text encoding

  • Strategy: map each displayed character back to its single byte as if it were cp1252, then decode the resulting byte stream as UTF-8.

Quick Python (minimal)

s = "🁳🁣🁲🁩🁰🁴🁃🁔🁆🁻🀳🁭🀰🁪🀱🁟🀳🁮🁣🀰🁤🀱🁮🁧🁟🀱🁳🁟🁷🀳🀱🁲🁤🁟🀴🁮🁤🁟🁦🁵🁮🀡🀱🁥🀴🀶🁤🁽"

# Rebuild original bytes: cp1252 where possible; otherwise use the raw low byte
def rebuild_bytes(text):
    buf = bytearray()
    for ch in text:
        try:
            buf += ch.encode('cp1252')    # preferred reverse-map
        except UnicodeEncodeError:
            code = ord(ch)
            if code > 0xFF:               # no single-byte origin: this is not mojibake
                raise
            buf.append(code)              # undefined cp1252 bytes such as 0x81 appear as U+0081
    return bytes(buf)

try:
    emoji = rebuild_bytes(s).decode('utf-8')   # -> Domino Tile emojis
except UnicodeError:
    emoji = s                             # s was pasted as tiles already; nothing to repair
print(emoji[:10])

You should see a run of Domino Tile glyphs beginning 🁳🁣🁲.

3) Extract the hidden message

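  • Domino Tiles live at codepoints U+1F030..U+1F093; “!” (0x21) falls just below that range, on a Mahjong Tile at U+1F021.
  • The challenge embeds ASCII in the lowest 8 bits of each codepoint: every codepoint is 0x1F000 plus the ASCII value, so U+1F073 & 0xFF = 0x73 = “s”.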

Extract (Python)

flag = ''.join(chr(ord(c) & 0xFF) for c in emoji)
print(flag)
# scriptCTF{3m0j1_3nc0d1ng_1s_w31rd_4nd_fun!1e46d}

How we knew what to try

  • ðŸ sequences scream “UTF-8 → cp1252 mojibake”.
  • After repair, the emojis all share the codepoint prefix 0x1F0 and come from thematically consistent tile blocks (Domino Tiles, plus one Mahjong Tile for “!”) → likely a numeric structure you can bit-mask.
  • Many CTFs hide text by abusing codepoint structure (e.g., low byte, low nibble, or offset from block start).
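
When it is not obvious which of those tricks applies, one option is to try a few and keep whichever yields printable ASCII throughout. The following is a rough sketch rather than a general solver: the function name candidates and the three transforms are illustrative choices, and the sample is built from the first ten flag characters using the 0x1F000 offset observed in this challenge.

```python
def candidates(text: str) -> dict:
    """Try simple codepoint transforms; keep those that give printable ASCII."""
    cps = [ord(c) for c in text]
    base = min(cps)
    transforms = {
        "low byte": lambda cp: cp & 0xFF,
        "low 7 bits": lambda cp: cp & 0x7F,
        "offset from smallest codepoint": lambda cp: cp - base,
    }
    results = {}
    for name, fn in transforms.items():
        values = [fn(cp) for cp in cps]
        if all(32 <= v <= 126 for v in values):
            results[name] = "".join(map(chr, values))
    return results

# Sample: tiles for "scriptCTF{", assuming the 0x1F000 base seen in this challenge.
sample = "".join(chr(0x1F000 + ord(c)) for c in "scriptCTF{")
print(candidates(sample))
# {'low byte': 'scriptCTF{', 'low 7 bits': 'scriptCTF{'}
```

The offset-from-smallest transform is rejected here because the smallest codepoint maps to 0, which is not printable.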

One-liners & alternatives

Python one-liner

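print(''.join(chr(ord(c) & 0xFF) for c in "YOUR_STRING".encode('cp1252').decode('utf-8')))
Strict encode('cp1252') raises UnicodeEncodeError on characters that have no cp1252 byte, such as U+0081. Tiles in U+1F040–U+1F07F encode to F0 9F 81 xx, so that is likely here; in that case use the longer loop from above. Avoid 'replace' and 'ignore': they substitute “?” or drop the byte, and the result is no longer valid UTF-8.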

CyberChef approach (conceptual)

  1. Encode input as Windows-1252 to bytes.
  2. Decode those bytes as UTF-8 → domino emojis.
  3. Map each codepoint to codepoint & 0xFF and cast to ASCII.
    • (Step 3 typically needs a short script; CyberChef’s “JQ”, “Custom code” or “Find/Replace with JS” may do it if your version offers them, or just finish in Python.)

Verification snippet

assert flag.startswith("scriptCTF{") and flag.endswith("}")
assert all(32 <= ord(x) <= 126 for x in flag)  # printable ASCII
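
A round-trip check can complement these assertions: re-encoding the recovered flag should reproduce the tile string. This is a sketch under one assumption, namely that the encoding is "0x1F000 plus the ASCII value", which is inferred from the repaired codepoints rather than stated by the challenge. The names BASE, encode_tiles and decode_tiles are illustrative.

```python
BASE = 0x1F000  # assumption: inferred from the repaired codepoints

def encode_tiles(text: str) -> str:
    """Map each ASCII character onto the codepoint BASE | ord(char)."""
    return "".join(chr(BASE | ord(c)) for c in text)

def decode_tiles(tiles: str) -> str:
    """Recover ASCII from the low byte of each codepoint."""
    return "".join(chr(ord(c) & 0xFF) for c in tiles)

flag = "scriptCTF{3m0j1_3nc0d1ng_1s_w31rd_4nd_fun!1e46d}"
tiles = encode_tiles(flag)
assert decode_tiles(tiles) == flag       # decoding inverts encoding
print(f"U+{ord(tiles[0]):05X}")          # U+1F073
```

If the repaired string from step 2 is still in scope as emoji, comparing tiles == emoji confirms that no character was lost during the repair.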

Final answer

scriptCTF{3m0j1_3nc0d1ng_1s_w31rd_4nd_fun!1e46d}

Lessons learned

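  • Recognizable mojibake patterns (such as ðŸ) quickly identify which mis-decoding happened.
  • Emoji/Unicode blocks are often chosen so that bit-masking yields readable ASCII; here the 0x1F000 base leaves the low byte free to carry the message.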
  • When in doubt, inspect codepoints and try simple bit operations.
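
As a starting point for that kind of inspection, the sketch below lists each character's codepoint, its Unicode name, and the low byte. The function name inspect_codepoints is illustrative, and the names printed depend on the Unicode data bundled with your Python build.

```python
import unicodedata

def inspect_codepoints(text: str, limit: int = 5) -> list:
    """Return (codepoint, name, low byte, low byte as char) for the first few characters."""
    rows = []
    for ch in text[:limit]:
        cp = ord(ch)
        low = cp & 0xFF
        rows.append((f"U+{cp:05X}", unicodedata.name(ch, "<unnamed>"), low, chr(low)))
    return rows

for row in inspect_codepoints("\U0001F073\U0001F063\U0001F072"):
    print(row)
```

For the three tiles above, the last column reads s, c, r, which is the first hint that the low byte carries the message.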