feat(mail): support gzip compressed email storage via ENABLE_MAIL_GZIP (#933)

* feat(mail): support gzip compressed email storage in D1 raw_blob column

Add ENABLE_MAIL_GZIP env var to optionally gzip-compress incoming emails
into a new raw_blob BLOB column, saving D1 storage space. Reading is
backward-compatible: prioritizes raw_blob (decompress) with fallback to
plaintext raw field. Includes DB migration v0.0.7, docs, and changelogs.
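
A minimal sketch of the storage idea described above, not the commit's exact code: text is gzip-compressed with the Web-standard CompressionStream, and the read path prefers raw_blob before falling back to raw. The names StoredMail, gzipText, gunzipToText and readRaw are hypothetical, and the sketch assumes a runtime that exposes CompressionStream, Blob and Response as globals (Cloudflare Workers, modern browsers, recent Node.js).

```typescript
// Hypothetical row shape for illustration; the commit's real type is RawMailRow.
type StoredMail = { raw?: string | null; raw_blob?: ArrayBuffer | null };

// Compress a string into gzip bytes using Web-standard streams.
async function gzipText(text: string): Promise<ArrayBuffer> {
  const stream = new Blob([text]).stream().pipeThrough(new CompressionStream("gzip"));
  return new Response(stream).arrayBuffer();
}

// Inverse of gzipText: gzip bytes back to the original string.
async function gunzipToText(buffer: ArrayBuffer): Promise<string> {
  const stream = new Blob([buffer]).stream().pipeThrough(new DecompressionStream("gzip"));
  return new Response(stream).text();
}

// Read path: prefer the compressed column, otherwise use the legacy plaintext column.
async function readRaw(row: StoredMail): Promise<string> {
  if (row.raw_blob) return gunzipToText(row.raw_blob);
  return row.raw ?? "";
}
```

Rows written before the feature was enabled have no raw_blob, so they take the plaintext branch unchanged.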

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: gzip fallback on missing column + decouple resolve from handleListQuery

- email/index.ts: gzip INSERT failure now falls back to plaintext INSERT
  instead of silently losing the email (P1: data loss prevention)
- common.ts: add handleMailListQuery for raw_mails-specific list queries
  with resolveRawEmailList, keeping handleListQuery generic
- Replace handleListQuery → handleMailListQuery in mails_api, admin_mail_api,
  user_mail_api (only raw_mails callers)
- Add e2e test infrastructure: worker-gzip service, wrangler.toml.e2e.gzip,
  api-gzip playwright project, mail-gzip.spec.ts with 4 test cases

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review feedback for gzip feature

- Use destructuring in resolveRawEmailRow to truly remove raw_blob key
- Narrow fallback scope: only fall back to plaintext on compression failure
  or missing raw_blob column, re-throw other DB errors
- Clean unused imports in e2e gzip test
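
The first bullet can be illustrated with a small sketch. It contrasts rest destructuring with one common alternative, assigning undefined, which leaves the key present on the object; what the earlier revision actually did is not shown in this commit, so the alternative is an assumption. Row, blankOutBlob and stripBlob are hypothetical names.

```typescript
// Hypothetical row shape for illustration only.
type Row = { id: number; raw?: string; raw_blob?: unknown };

// Assigning undefined keeps the raw_blob key on the object.
function blankOutBlob(row: Row): Row {
  return { ...row, raw_blob: undefined };
}

// Rest destructuring copies every own property except raw_blob.
function stripBlob(row: Row): Omit<Row, "raw_blob"> {
  const { raw_blob: _unused, ...rest } = row;
  return rest;
}

const sampleRow: Row = { id: 1, raw: "hello", raw_blob: [31, 139] };
console.log("raw_blob" in blankOutBlob(sampleRow)); // true
console.log("raw_blob" in stripBlob(sampleRow));    // false
```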

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add try-catch in resolveRawEmail to prevent single corrupt blob from failing entire list

A corrupted raw_blob would cause decompressBlob to throw, which with
Promise.all in resolveRawEmailList would reject the entire batch query.
Now catches decompression errors and falls back to row.raw field.
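
The failure mode can be reproduced in isolation. In this sketch, decode is a hypothetical stand-in for decompressBlob that throws on rows flagged as corrupt; resolveStrict and resolveTolerant are likewise illustrative names, not functions from this commit.

```typescript
// Hypothetical row: `corrupt` simulates a raw_blob that cannot be decompressed.
type Row = { id: number; raw: string; corrupt?: boolean };

async function decode(row: Row): Promise<string> {
  if (row.corrupt) throw new Error(`row ${row.id}: bad gzip data`);
  return `decoded:${row.raw}`;
}

// Without a per-row catch, a single rejection rejects the whole Promise.all.
function resolveStrict(rows: Row[]): Promise<string[]> {
  return Promise.all(rows.map((row) => decode(row)));
}

// With a per-row catch, a corrupt row degrades to its plaintext field.
function resolveTolerant(rows: Row[]): Promise<string[]> {
  return Promise.all(
    rows.map(async (row) => {
      try {
        return await decode(row);
      } catch {
        return row.raw;
      }
    })
  );
}
```

With rows 1 and 3 healthy and row 2 corrupt, resolveStrict rejects, while resolveTolerant still returns three entries, the second being the plaintext fallback.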

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(mail): align sendAdminInternalMail with gzip storage path

sendAdminInternalMail now respects ENABLE_MAIL_GZIP: compresses to
raw_blob when enabled, with fallback to plaintext on failure.
Added e2e test verifying admin internal mail is readable under gzip.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(e2e): match admin internal mail by body content instead of encoded subject

mimetext base64-encodes the Subject header, so the raw MIME string
does not contain the literal subject text. Match on body content
(balance: 99) which is plaintext.
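
The mismatch can be illustrated with a hand-built header. This sketch assumes the Subject is emitted as an RFC 2047 base64 encoded-word; it does not call mimetext, and the exact bytes mimetext produces may differ. The subject text and the encodeSubject helper are hypothetical.

```typescript
// Build a Subject header as an RFC 2047 "B" (base64) encoded-word.
// Assumption: this approximates what a MIME library emits; it is not mimetext's code.
function encodeSubject(subject: string): string {
  const bytes = new TextEncoder().encode(subject);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return `Subject: =?utf-8?B?${btoa(binary)}?=`;
}

const subjectText = "Balance update"; // hypothetical subject
const rawMessage = `${encodeSubject(subjectText)}\r\n\r\nbalance: 99`;

console.log(rawMessage.includes(subjectText));   // false: the header holds base64, not the literal text
console.log(rawMessage.includes("balance: 99")); // true: the body is stored as plaintext
```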

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(e2e): add WORKER_GZIP_URL guard and length assertions in gzip tests

Address CodeRabbit feedback:
- Skip gzip tests when WORKER_GZIP_URL is not set to prevent false positives
- Assert results array length before accessing [0] for clearer error messages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(mail): narrow gzip fallback scope and fix webhook query compatibility

- sendAdminInternalMail: separate compress vs DB error handling, only
  fall back to plaintext on compression failure or missing raw_blob
  column, rethrow other DB errors (aligns with email/index.ts)
- Webhook test endpoints: use SELECT * instead of explicit raw_blob
  column reference, so pre-migration databases don't 500
- Docs/changelog: clarify that db_migration must run before enabling
  ENABLE_MAIL_GZIP
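
The narrowed fallback rule from the first bullet can be sketched as a small helper. The error-text heuristic mirrors the checks in the diff; isMissingColumnError and insertWithFallback are hypothetical names, and the two callbacks stand in for the real D1 INSERT statements.

```typescript
// Heuristic taken from the diff: an error counts as "column missing" when its
// text mentions raw_blob or "no such column".
function isMissingColumnError(err: unknown): boolean {
  const msg = String(err);
  return msg.includes("raw_blob") || msg.includes("no such column");
}

// Try the compressed insert first; fall back to plaintext only for a missing
// column, and let every other database error propagate to the caller.
async function insertWithFallback(
  insertCompressed: () => Promise<boolean>,
  insertPlaintext: () => Promise<boolean>
): Promise<boolean> {
  try {
    return await insertCompressed();
  } catch (err) {
    if (isMissingColumnError(err)) return insertPlaintext();
    throw err;
  }
}
```

Keeping the rule in one helper is a design choice of this sketch; the commit itself inlines the same check in both email/index.ts and sendAdminInternalMail.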

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(telegram): use generic Record type for raw_mails query result

Align with other query sites — avoid hardcoding raw_blob in the
TypeScript type annotation so the query works with or without the
column after migration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(models): add RawMailRow type and unify raw_mails query typing

Add RawMailRow type to models with raw_blob as optional field, replacing
ad-hoc Record<string, unknown> and inline type annotations across
webhook test endpoints, telegram API, and gzip utilities.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Dream Hunter
2026-04-04 18:46:39 +08:00
committed by GitHub
parent 53c35062c8
commit 7c6d0d7c8a
28 changed files with 563 additions and 40 deletions

View File

@@ -1,5 +1,5 @@
import { Context } from "hono";
-import { handleListQuery } from "../common";
+import { handleMailListQuery } from "../common";
export default {
getMails: async (c: Context<HonoCustomType>) => {
@@ -9,7 +9,7 @@ export default {
const filterQuerys = [addressQuery].filter((item) => item).join(" and ");
const finalQuery = filterQuerys.length > 0 ? `where ${filterQuerys}` : "";
const filterParams = [...addressParams]
-return await handleListQuery(c,
+return await handleMailListQuery(c,
`SELECT * FROM raw_mails ${finalQuery}`,
`SELECT count(*) as count FROM raw_mails ${finalQuery}`,
filterParams, limit, offset
@@ -17,7 +17,7 @@ export default {
},
getUnknowMails: async (c: Context<HonoCustomType>) => {
const { limit, offset } = c.req.query();
-return await handleListQuery(c,
+return await handleMailListQuery(c,
 `SELECT * FROM raw_mails where address NOT IN (select name from address) `,
 `SELECT count(*) as count FROM raw_mails`
 + ` where address NOT IN (select name from address) `,

View File

@@ -9,6 +9,7 @@ CREATE TABLE IF NOT EXISTS raw_mails (
source TEXT,
address TEXT,
raw TEXT,
+raw_blob BLOB,
metadata TEXT,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
@@ -184,6 +185,18 @@ export default {
// migration to v0.0.6: add message_id index on raw_mails
await c.env.DB.exec(`CREATE INDEX IF NOT EXISTS idx_raw_mails_message_id ON raw_mails(message_id);`);
}
+if (version && version <= "v0.0.6") {
+// migration to v0.0.7: add raw_blob column for gzip compressed email storage
+const tableInfo = await c.env.DB.prepare(
+`PRAGMA table_info(raw_mails)`
+).all();
+const hasRawBlob = tableInfo.results?.some(
+(col: any) => col.name === 'raw_blob'
+);
+if (!hasRawBlob) {
+await c.env.DB.exec(`ALTER TABLE raw_mails ADD COLUMN raw_blob BLOB;`);
+}
+}
if (version != CONSTANTS.DB_VERSION) {
// remove all \r and \n characters from the query string
// split by ; and join with a ;\n

View File

@@ -1,7 +1,8 @@
import { Context } from "hono";
import { CONSTANTS } from "../constants";
-import { WebhookSettings } from "../models";
+import { WebhookSettings, RawMailRow } from "../models";
 import { commonParseMail, sendWebhook } from "../common";
+import { resolveRawEmail } from "../gzip";
async function getWebhookSettings(c: Context<HonoCustomType>): Promise<Response> {
const settings = await c.env.KV.get<WebhookSettings>(
@@ -21,10 +22,12 @@ async function saveWebhookSettings(c: Context<HonoCustomType>): Promise<Response
async function testWebhookSettings(c: Context<HonoCustomType>): Promise<Response> {
const settings = await c.req.json<WebhookSettings>();
// random raw email
-const { id: mailId, raw } = await c.env.DB.prepare(
-`SELECT id, raw FROM raw_mails ORDER BY RANDOM() LIMIT 1`
-).first<{ id: string, raw: string }>() || {};
-const parsedEmailContext: ParsedEmailContext = { rawEmail: raw || "" };
+const mailRow = await c.env.DB.prepare(
+`SELECT * FROM raw_mails ORDER BY RANDOM() LIMIT 1`
+).first<RawMailRow>();
+const mailId = mailRow?.id;
+const raw = mailRow ? await resolveRawEmail(mailRow) : "";
+const parsedEmailContext: ParsedEmailContext = { rawEmail: raw };
const parsedEmail = await commonParseMail(parsedEmailContext);
const res = await sendWebhook(settings, {
id: mailId || "0",

View File

@@ -622,6 +622,33 @@ export const handleListQuery = async (
return c.json({ results, count });
}
+/**
+* handleListQuery variant for raw_mails: resolves raw_blob → raw after query.
+*/
+export const handleMailListQuery = async (
+c: Context<HonoCustomType>,
+query: string, countQuery: string, params: string[],
+limit: string | number | undefined | null,
+offset: string | number | undefined | null,
+orderBy?: string
+): Promise<Response> => {
+const { resolveRawEmailList } = await import('./gzip');
+const msgs = i18n.getMessagesbyContext(c);
+if (typeof limit === "string") limit = parseInt(limit);
+if (typeof offset === "string") offset = parseInt(offset);
+if (!limit || limit < 0 || limit > 100) return c.text(msgs.InvalidLimitMsg, 400);
+if (offset == null || offset == undefined || offset < 0) return c.text(msgs.InvalidOffsetMsg, 400);
+const orderClause = orderBy || 'id desc';
+const resultsQuery = `${query} order by ${orderClause} limit ? offset ?`;
+const { results } = await c.env.DB.prepare(resultsQuery).bind(
+...params, limit, offset
+).all();
+const resolvedResults = await resolveRawEmailList(results);
+const count = offset == 0 ? await c.env.DB.prepare(
+countQuery
+).bind(...params).first("count") : 0;
+return c.json({ results: resolvedResults, count });
+}
export const commonParseMail = async (parsedEmailContext: ParsedEmailContext): Promise<{
sender: string,

View File

@@ -3,7 +3,7 @@ export const CONSTANTS = {
// DB Version
DB_VERSION_KEY: 'db_version',
-DB_VERSION: "v0.0.6",
+DB_VERSION: "v0.0.7",
// DB settings
ADDRESS_BLOCK_LIST_KEY: 'address_block_list',

View File

@@ -1,6 +1,6 @@
import { Context } from "hono";
-import { getJsonSetting } from "../utils";
+import { getBooleanValue, getJsonSetting } from "../utils";
import { sendMailToTelegram } from "../telegram_api";
import { auto_reply } from "./auto_reply";
import { isBlocked } from "./black_list";
@@ -11,6 +11,7 @@ import { extractEmailInfo } from "./ai_extract";
import { forwardEmail } from "./forward";
import { EmailRuleSettings } from "../models";
import { CONSTANTS } from "../constants";
+import { compressText } from "../gzip";
async function email(message: ForwardableEmailMessage, env: Bindings, ctx: ExecutionContext) {
@@ -65,11 +66,49 @@ async function email(message: ForwardableEmailMessage, env: Bindings, ctx: Execu
const message_id = message.headers.get("Message-ID");
// save email
try {
-const { success } = await env.DB.prepare(
-`INSERT INTO raw_mails (source, address, raw, message_id) VALUES (?, ?, ?, ?)`
-).bind(
-message.from, message.to, parsedEmailContext.rawEmail, message_id
-).run();
+let success = false;
+if (getBooleanValue(env.ENABLE_MAIL_GZIP)) {
+let compressed: ArrayBuffer | null = null;
+try {
+compressed = await compressText(parsedEmailContext.rawEmail);
+} catch (gzipError) {
+console.error("gzip compression failed, falling back to plaintext", gzipError);
+}
+if (compressed) {
+try {
+({ success } = await env.DB.prepare(
+`INSERT INTO raw_mails (source, address, raw_blob, message_id) VALUES (?, ?, ?, ?)`
+).bind(
+message.from, message.to, compressed, message_id
+).run());
+} catch (dbError) {
+// Fallback to plaintext only if raw_blob column is missing (migration not applied)
+const errMsg = String(dbError);
+if (errMsg.includes('raw_blob') || errMsg.includes('no such column')) {
+console.error("raw_blob column missing, falling back to plaintext", dbError);
+({ success } = await env.DB.prepare(
+`INSERT INTO raw_mails (source, address, raw, message_id) VALUES (?, ?, ?, ?)`
+).bind(
+message.from, message.to, parsedEmailContext.rawEmail, message_id
+).run());
+} else {
+throw dbError;
+}
+}
+} else {
+({ success } = await env.DB.prepare(
+`INSERT INTO raw_mails (source, address, raw, message_id) VALUES (?, ?, ?, ?)`
+).bind(
+message.from, message.to, parsedEmailContext.rawEmail, message_id
+).run());
+}
+} else {
+({ success } = await env.DB.prepare(
+`INSERT INTO raw_mails (source, address, raw, message_id) VALUES (?, ?, ?, ?)`
+).bind(
+message.from, message.to, parsedEmailContext.rawEmail, message_id
+).run());
+}
if (!success) {
message.setReject(`Failed save message to ${message.to}`);
console.error(`Failed save message from ${message.from} to ${message.to}`);

worker/src/gzip.ts (new file, 48 lines added)
View File

@@ -0,0 +1,48 @@
+/**
+* Gzip compression/decompression utilities for D1 BLOB storage.
+* Uses Web Standard CompressionStream/DecompressionStream (native in CF Workers).
+*/
+import { RawMailRow } from "./models";
+export async function compressText(text: string): Promise<ArrayBuffer> {
+const stream = new Blob([text]).stream().pipeThrough(new CompressionStream('gzip'));
+return new Response(stream).arrayBuffer();
+}
+export async function decompressBlob(buffer: ArrayBuffer): Promise<string> {
+const stream = new Blob([buffer]).stream().pipeThrough(new DecompressionStream('gzip'));
+return new Response(stream).text();
+}
+/**
+* Resolve the raw email text from either raw_blob (gzip) or raw (plaintext) field.
+*/
+export async function resolveRawEmail(row: RawMailRow): Promise<string> {
+if (row.raw_blob) {
+try {
+// D1 returns BLOB as Array<number>, convert to ArrayBuffer for decompression
+return await decompressBlob(new Uint8Array(row.raw_blob as ArrayLike<number>).buffer);
+} catch (e) {
+console.error("decompressBlob failed, fallback to raw field", e);
+return row.raw ?? '';
+}
+}
+return row.raw ?? '';
+}
+/**
+* Resolve a single row: decompress raw_blob if present, strip raw_blob from result.
+*/
+export async function resolveRawEmailRow(row: RawMailRow): Promise<RawMailRow> {
+const raw = await resolveRawEmail(row);
+const { raw_blob: _, ...rest } = row;
+return { ...rest, raw };
+}
+/**
+* Batch resolve raw emails for list queries using Promise.all.
+*/
+export async function resolveRawEmailList(rows: RawMailRow[]): Promise<RawMailRow[]> {
+return Promise.all(rows.map(row => resolveRawEmailRow(row)));
+}

View File

@@ -2,8 +2,9 @@ import { Context, Hono } from 'hono'
import i18n from '../i18n';
import { getBooleanValue, getJsonSetting, checkCfTurnstile, getStringValue, getSplitStringListValue, isAddressCountLimitReached } from '../utils';
-import { newAddress, handleListQuery, deleteAddressWithData, getAddressPrefix, getAllowDomains, updateAddressUpdatedAt, generateRandomName } from '../common'
+import { newAddress, handleMailListQuery, deleteAddressWithData, getAddressPrefix, getAllowDomains, updateAddressUpdatedAt, generateRandomName } from '../common'
 import { CONSTANTS } from '../constants'
+import { resolveRawEmailRow } from '../gzip'
import auto_reply from './auto_reply'
import webhook_settings from './webhook_settings';
import s3_attachment from './s3_attachment';
@@ -28,7 +29,7 @@ api.get('/api/mails', async (c) => {
}
const { limit, offset } = c.req.query();
if (Number.parseInt(offset) <= 0) updateAddressUpdatedAt(c, address);
-return await handleListQuery(c,
+return await handleMailListQuery(c,
`SELECT * FROM raw_mails where address = ?`,
`SELECT count(*) as count FROM raw_mails where address = ?`,
[address], limit, offset
@@ -41,7 +42,8 @@ api.get('/api/mail/:mail_id', async (c) => {
const result = await c.env.DB.prepare(
`SELECT * FROM raw_mails where id = ? and address = ?`
).bind(mail_id, address).first();
-return c.json(result);
+if (!result) return c.json(null);
+return c.json(await resolveRawEmailRow(result));
})
api.delete('/api/mails/:id', async (c) => {

View File

@@ -1,7 +1,8 @@
import { Context } from "hono";
import { CONSTANTS } from "../constants";
-import { AdminWebhookSettings, WebhookSettings } from "../models";
+import { AdminWebhookSettings, WebhookSettings, RawMailRow } from "../models";
 import { commonParseMail, sendWebhook } from "../common";
+import { resolveRawEmail } from "../gzip";
import i18n from "../i18n";
@@ -37,10 +38,12 @@ async function testWebhookSettings(c: Context<HonoCustomType>): Promise<Response
const settings = await c.req.json<WebhookSettings>();
const { address } = c.get("jwtPayload");
// random raw email
-const { id: mailId, raw } = await c.env.DB.prepare(
-`SELECT id, raw FROM raw_mails WHERE address = ? ORDER BY RANDOM() LIMIT 1`
-).bind(address).first<{ id: string, raw: string }>() || {};
-const parsedEmailContext: ParsedEmailContext = { rawEmail: raw || "" };
+const mailRow = await c.env.DB.prepare(
+`SELECT * FROM raw_mails WHERE address = ? ORDER BY RANDOM() LIMIT 1`
+).bind(address).first<RawMailRow>();
+const mailId = mailRow?.id;
+const raw = mailRow ? await resolveRawEmail(mailRow) : "";
+const parsedEmailContext: ParsedEmailContext = { rawEmail: raw };
const parsedEmail = await commonParseMail(parsedEmailContext);
const res = await sendWebhook(settings, {
id: mailId || "0",

View File

@@ -190,3 +190,14 @@ export type RoleConfig = {
}
export type RoleAddressConfig = Record<string, RoleConfig>;
+export type RawMailRow = {
+id: number;
+message_id?: string;
+source?: string;
+address?: string;
+raw?: string;
+raw_blob?: unknown;
+metadata?: string;
+created_at?: string;
+}

View File

@@ -3,6 +3,7 @@ import { Jwt } from 'hono/utils/jwt'
import { CONSTANTS } from "../constants";
import { bindTelegramAddress, jwtListToAddressData, tgUserNewAddress, unbindTelegramAddress } from "./common";
import { checkCfTurnstile, checkIsAdmin, getBooleanValue } from "../utils";
+import { resolveRawEmailRow } from "../gzip";
import { TelegramSettings } from "./settings";
import i18n from "../i18n";
@@ -144,7 +145,7 @@ async function getMail(c: Context<HonoCustomType>): Promise<Response> {
if (!result) {
return c.text("Mail not found", 404);
}
-return c.json(result);
+return c.json(await resolveRawEmailRow(result));
}
const userId = await checkTelegramAuth(c, initData);
const jwtList = await c.env.KV.get<string[]>(`${CONSTANTS.TG_KV_PREFIX}:${userId}`, 'json') || [];
@@ -152,13 +153,14 @@ async function getMail(c: Context<HonoCustomType>): Promise<Response> {
const result = await c.env.DB.prepare(
`SELECT * FROM raw_mails where id = ?`
).bind(mailId).first();
+if (!result) return c.json(null);
 const settings = await c.env.KV.get<TelegramSettings>(CONSTANTS.TG_KV_SETTINGS_KEY, "json");
 const superUser = settings?.enableGlobalMailPush && settings?.globalMailPushList.includes(userId);
 if (!superUser) {
-if (result?.address && !(result.address as string in addressIdMap)) {
+if (!(result.address as string in addressIdMap)) {
 return c.text(msgs.TgNoPermissionViewMailMsg, 403);
 }
-const address_id = addressIdMap[result?.address as string];
+const address_id = addressIdMap[result.address as string];
const db_address_id = await c.env.DB.prepare(
`SELECT id FROM address where id = ? `
).bind(address_id).first("id");
@@ -166,7 +168,7 @@ async function getMail(c: Context<HonoCustomType>): Promise<Response> {
return c.text(msgs.TgNoPermissionViewMailMsg, 403);
}
}
-return c.json(result);
+return c.json(await resolveRawEmailRow(result));
}
catch (e) {
return c.text((e as Error).message, 400);

View File

@@ -9,6 +9,8 @@ import { TelegramSettings } from "./settings";
import { sendTelegramAttachments } from "./tg_file_upload";
import { bindTelegramAddress, deleteTelegramAddress, jwtListToAddressData, tgUserNewAddress, unbindTelegramAddress, unbindTelegramByAddress } from "./common";
import { commonParseMail } from "../common";
+import { resolveRawEmail } from "../gzip";
+import { RawMailRow } from "../models";
import { UserFromGetMe } from "telegraf/types";
import i18n from "../i18n";
import { LocaleMessages } from "../i18n/type";
@@ -301,12 +303,15 @@ export function newTelegramBot(c: Context<HonoCustomType>, token: string): Teleg
if (!db_address_id) {
return await ctx.reply(msgs.TgInvalidAddressMsg);
}
-const { raw, id: mailId, created_at } = await c.env.DB.prepare(
+const mailRow = await c.env.DB.prepare(
 `SELECT * FROM raw_mails where address = ? `
 + ` order by id desc limit 1 offset ?`
 ).bind(
 queryAddress, mailIndex
-).first<{ raw: string, id: string, created_at: string }>() || {};
+).first<RawMailRow>();
+const raw = mailRow ? await resolveRawEmail(mailRow) : undefined;
+const mailId = mailRow?.id;
+const created_at = mailRow?.created_at;
const { mail } = raw ? await parseMail(msgs, { rawEmail: raw }, queryAddress, created_at) : { mail: msgs.TgNoMoreMailsMsg };
const settings = await c.env.KV.get<TelegramSettings>(CONSTANTS.TG_KV_SETTINGS_KEY, "json");
const miniAppButtons = []

View File

@@ -98,6 +98,9 @@ type Bindings = {
ENABLE_AI_EMAIL_EXTRACT: string | boolean | undefined
AI_EXTRACT_MODEL: string | undefined
+// gzip compression for raw_mails
+ENABLE_MAIL_GZIP: string | boolean | undefined
// E2E testing
E2E_TEST_MODE: string | boolean | undefined
}

View File

@@ -1,5 +1,5 @@
import { Context } from "hono";
-import { handleListQuery } from "../common";
+import { handleMailListQuery } from "../common";
import UserBindAddressModule from "./bind_address";
export default {
@@ -19,7 +19,7 @@ export default {
const filterQuerys = [addressQuery].filter((item) => item).join(" and ");
const finalQuery = filterQuerys.length > 0 ? `where ${filterQuerys}` : "";
const filterParams = [...addressParams]
-return await handleListQuery(c,
+return await handleMailListQuery(c,
`SELECT * FROM raw_mails ${finalQuery}`,
`SELECT count(*) as count FROM raw_mails ${finalQuery}`,
filterParams, limit, offset

View File

@@ -2,6 +2,7 @@ import { Context } from "hono";
import { createMimeMessage } from "mimetext";
import { UserSettings, RoleAddressConfig } from "./models";
import { CONSTANTS } from "./constants";
+import { compressText } from "./gzip";
export const getJsonObjectValue = <T = any>(
value: string | any
@@ -264,11 +265,41 @@ export const sendAdminInternalMail = async (
data: text
});
const message_id = Math.random().toString(36).substring(2, 15);
-const { success } = await c.env.DB.prepare(
-`INSERT INTO raw_mails (source, address, raw, message_id) VALUES (?, ?, ?, ?)`
-).bind(
-"admin@internal", toMail, msg.asRaw(), message_id
-).run();
+const rawText = msg.asRaw();
+let success = false;
+if (getBooleanValue(c.env.ENABLE_MAIL_GZIP)) {
+let compressed: ArrayBuffer | null = null;
+try {
+compressed = await compressText(rawText);
+} catch (gzipError) {
+console.error("gzip compression failed, falling back to plaintext", gzipError);
+}
+if (compressed) {
+try {
+({ success } = await c.env.DB.prepare(
+`INSERT INTO raw_mails (source, address, raw_blob, message_id) VALUES (?, ?, ?, ?)`
+).bind("admin@internal", toMail, compressed, message_id).run());
+} catch (dbError) {
+const errMsg = String(dbError);
+if (errMsg.includes('raw_blob') || errMsg.includes('no such column')) {
+console.error("raw_blob column missing, falling back to plaintext", dbError);
+({ success } = await c.env.DB.prepare(
+`INSERT INTO raw_mails (source, address, raw, message_id) VALUES (?, ?, ?, ?)`
+).bind("admin@internal", toMail, rawText, message_id).run());
+} else {
+throw dbError;
+}
+}
+} else {
+({ success } = await c.env.DB.prepare(
+`INSERT INTO raw_mails (source, address, raw, message_id) VALUES (?, ?, ?, ?)`
+).bind("admin@internal", toMail, rawText, message_id).run());
+}
+} else {
+({ success } = await c.env.DB.prepare(
+`INSERT INTO raw_mails (source, address, raw, message_id) VALUES (?, ?, ?, ?)`
+).bind("admin@internal", toMail, rawText, message_id).run());
+}
if (!success) {
console.log(`Failed save message from admin@internal to ${toMail}`);
}