Emoji
Challenge prompt
String:
ðŸ³ðŸ£ðŸ²ðŸ©ðŸ°ðŸ´ðŸƒðŸ”ðŸ†ðŸ»ðŸ€³ðŸðŸ€°ðŸªðŸ€±ðŸŸðŸ€³ðŸ®ðŸ£ðŸ€°ðŸ¤ðŸ€±ðŸ®ðŸ§ðŸŸðŸ€±ðŸ³ðŸŸðŸ·ðŸ€³ðŸ€±ðŸ²ðŸ¤ðŸŸðŸ€´ðŸ®ðŸ¤ðŸŸðŸ¦ðŸµðŸ®ðŸ€¡ðŸ€±ðŸ¥ðŸ€´ðŸ€¶ðŸ¤ðŸ½
Description: “Emojis everywhere! Is it a joke? Or something is hiding behind it.”
TL;DR
- The given text is UTF-8 bytes that were mis-decoded as Windows-1252 (classic mojibake).
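- When repaired, it becomes a run of tile characters, almost all from the Domino Tiles range (U+1F030–U+1F093); the one exception is the character for "!", U+1F021, which sits in the neighbouring Mahjong Tiles block.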
- Take the low byte of each emoji’s codepoint (ord(c) & 0xFF) to recover ASCII.
- Flag: scriptCTF{3m0j1_3nc0d1ng_1s_w31rd_4nd_fun!1e46d}
Step-by-step
1) Recognize mojibake
- Strings like ðŸ... are a strong signal that UTF-8 got decoded as cp1252/Windows-1252.
- Goal: reconstruct the original bytes, then decode as proper UTF-8.
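To see where the ðŸ prefix comes from, the corruption can be run forwards on a single character. This is a minimal sketch; the sample character U+1F033 is picked here only because all four of its UTF-8 bytes are defined in cp1252, and the code points are printed as hex so the output does not depend on your terminal's font.

```python
# Forward direction: encode one tile as UTF-8, then mis-decode it as cp1252.
tile = "\U0001F033"             # sample character, chosen for illustration
raw = tile.encode("utf-8")      # the four UTF-8 bytes
garbled = raw.decode("cp1252")  # each byte turns into one visible character

print(raw.hex(" "))                          # f0 9f 80 b3
print([f"U+{ord(ch):04X}" for ch in garbled])
# ['U+00F0', 'U+0178', 'U+20AC', 'U+00B3'], which render as ð Ÿ € ³
```

The first two characters, ð and Ÿ, are the same for every character in this part of Unicode, which is why the prefix repeats throughout the challenge string.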
2) Repair the text encoding
- Strategy: map each displayed character back to its single byte as if it were cp1252, then decode the resulting byte stream as UTF-8.
Quick Python (minimal)
s = "ðŸ³ðŸ£ðŸ²ðŸ©ðŸ°ðŸ´ðŸƒðŸ”ðŸ†ðŸ»ðŸ€³ðŸðŸ€°ðŸªðŸ€±ðŸŸðŸ€³ðŸ®ðŸ£ðŸ€°ðŸ¤ðŸ€±ðŸ®ðŸ§ðŸŸðŸ€±ðŸ³ðŸŸðŸ·ðŸ€³ðŸ€±ðŸ²ðŸ¤ðŸŸðŸ€´ðŸ®ðŸ¤ðŸŸðŸ¦ðŸµðŸ®ðŸ€¡ðŸ€±ðŸ¥ðŸ€´ðŸ€¶ðŸ¤ðŸ½"
# Rebuild original bytes: cp1252 where possible; otherwise use the raw low byte
buf = bytearray()
for ch in s:
    try:
        buf += ch.encode('cp1252')  # preferred reverse-map
    except UnicodeEncodeError:
        code = ord(ch)
        if code <= 0xFF:  # include undefined cp1252 controls like U+0081 as raw byte
            buf.append(code)
        else:
            raise
emoji = bytes(buf).decode('utf-8')  # -> Domino Tile emojis
print(emoji[:10])
You should see a run of Domino Tile glyphs like 🁳🁣🁲...
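If your terminal font lacks these glyphs, listing codepoints and character names is a more reliable check than eyeballing the output. A small sketch using the standard unicodedata module follows; the helper name describe is made up for this example, and the three sample characters are the glyphs quoted above, written as escapes so the snippet is self-contained.

```python
import unicodedata

sample = "\U0001F073\U0001F063\U0001F072"  # the three glyphs quoted above

def describe(ch):
    """Return 'U+XXXXX NAME' for one character."""
    return f"U+{ord(ch):05X} {unicodedata.name(ch, '<unnamed>')}"

for ch in sample:
    print(describe(ch))
```

Each printed name should begin with DOMINO TILE, and the last two hex digits of each codepoint already hint at the hidden ASCII.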
3) Extract the hidden message
- Domino Tiles live at codepoints U+1F030..U+1F093.
- The challenge embeds ASCII in the lowest 8 bits of each codepoint: every character sits at U+1F000 plus an ASCII value, so U+1F073 yields 0x73, which is "s".
Extract (Python)
flag = ''.join(chr(ord(c) & 0xFF) for c in emoji)
print(flag)
# scriptCTF{3m0j1_3nc0d1ng_1s_w31rd_4nd_fun!1e46d}
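As a sanity check on this reading, the encoding can be run in both directions. The sketch below assumes the challenge author simply added 0x1F000 to each ASCII value, which is consistent with the recovered flag but not confirmed by the challenge; hide and reveal are hypothetical helper names.

```python
BASE = 0x1F000  # assumed offset: codepoint = BASE + ASCII value

def hide(text):
    """Map each ASCII character to the codepoint BASE + its value."""
    return "".join(chr(BASE + ord(ch)) for ch in text)

def reveal(tiles):
    """Recover ASCII by keeping only the low byte of each codepoint."""
    return "".join(chr(ord(c) & 0xFF) for c in tiles)

tiles = hide("CTF{ok}")
print([f"U+{ord(c):05X}" for c in tiles[:3]])  # ['U+1F043', 'U+1F054', 'U+1F046']
print(reveal(tiles))                            # CTF{ok}
```

Because BASE has a zero low byte, masking with 0xFF and subtracting BASE give the same result for ASCII input.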
How we knew what to try
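- ðŸ sequences scream “UTF-8 → cp1252 mojibake”: ð and Ÿ are how cp1252 renders 0xF0 and 0x9F, the two lead bytes of every UTF-8 sequence in U+1F000–U+1FFFF, where most emoji live.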
- After repair, the characters all sit in one tight run of codepoints (almost entirely the Domino Tiles block) → likely a numeric structure you can bit-mask.
- Many CTFs hide text by abusing codepoint structure (e.g., low byte, low nibble, or offset from block start).
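When the right transform is not obvious, it is cheap to try several and see which one prints readable text. A rough triage sketch follows; the function name candidates and the particular set of transforms are illustrative choices, not a standard tool.

```python
def printable(n):
    """Render n as a character if it is printable ASCII, else '.'."""
    return chr(n) if 32 <= n <= 126 else "."

def candidates(text, block_start=0x1F030):
    """Apply a few common codepoint transforms and return each result."""
    points = [ord(c) for c in text]
    return {
        "low byte": "".join(printable(p & 0xFF) for p in points),
        "low 7 bits": "".join(printable(p & 0x7F) for p in points),
        "offset from block start": "".join(printable(p - block_start) for p in points),
        "low nibble (hex)": "".join(f"{p & 0xF:x}" for p in points),
    }

sample = "\U0001F066\U0001F06C\U0001F061\U0001F067"  # illustrative input
for name, decoded in candidates(sample).items():
    print(f"{name:>24}: {decoded}")
```

For this sample only the two masking transforms produce a word; the offset and nibble variants produce noise, which is the signal to look for.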
One-liners & alternatives
Python one-liner
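print(''.join(chr(ord(c) & 0xFF) for c in bytes(ord(ch) if ord(ch) < 256 else ch.encode('cp1252')[0] for ch in "YOUR_STRING").decode('utf-8')))
Plain encode('cp1252') raises on this input because bytes such as 0x81 are undefined in cp1252, and the 'replace' or 'ignore' error handlers would corrupt the UTF-8 sequences and make the decode fail. The expression above therefore passes characters below U+0100 through as raw bytes and uses cp1252 only for the rest, which is the same logic as the longer loop.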
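Another way to cope with characters that cp1252 cannot encode is a custom codec error handler that passes them through as raw bytes. This is a sketch, assuming every unencodable character is a control character below U+0100; the handler name rawbyte and the helper repair are made up for this example.

```python
import codecs

def rawbyte(exc):
    """Encode-error handler: emit codepoints <= 0xFF as single raw bytes."""
    if isinstance(exc, UnicodeEncodeError):
        chunk = exc.object[exc.start:exc.end]
        if all(ord(ch) <= 0xFF for ch in chunk):
            return bytes(ord(ch) for ch in chunk), exc.end
    raise exc  # anything else is a genuine error

codecs.register_error("rawbyte", rawbyte)

def repair(garbled):
    """Undo mojibake caused by reading UTF-8 bytes as cp1252."""
    return garbled.encode("cp1252", errors="rawbyte").decode("utf-8")

# U+1F073 after the mix-up: f0 -> U+00F0, 9f -> U+0178, 81 -> U+0081, b3 -> U+00B3
garbled = "\u00f0\u0178\u0081\u00b3"
print(f"U+{ord(repair(garbled)):05X}")  # U+1F073
```

The handler keeps the repair to a single encode/decode pair, at the cost of registering a process-wide error handler name.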
CyberChef approach (conceptual)
- Encode input as Windows-1252 to bytes.
- Decode those bytes as UTF-8 → domino emojis.
- Map each codepoint to codepoint & 0xFF and cast to ASCII.
(This last step typically needs a short script; if your CyberChef recipe has no operation for per-codepoint arithmetic, just finish in Python.)
Verification snippet
assert flag.startswith("scriptCTF{") and flag.endswith("}")
assert all(32 <= ord(x) <= 126 for x in flag) # printable ASCII
Final answer
scriptCTF{3m0j1_3nc0d1ng_1s_w31rd_4nd_fun!1e46d}
Lessons learned
- Recognizable mojibake patterns (such as a repeating ðŸ prefix) quickly tell you which two encodings were mixed up.
- Emoji/Unicode blocks are often chosen so that bit-masking yields readable ASCII.
- When in doubt, inspect codepoints and try simple bit operations.