Conversation


@ericallam ericallam commented Dec 11, 2025

New batch trigger system with larger payloads, streaming ingestion, larger batch sizes, and a fair processing system.

This PR introduces a new FairQueue abstraction, inspired by our own RunQueue, that enables multi-tenant fair queueing with concurrency limits. The new BatchQueue is built on top of FairQueue and processes batch triggers fairly, with per-environment concurrency limits defined per organization. Additionally, there is a global concurrency limit to prevent the BatchQueue system from creating too many runs too quickly, which can cause downstream issues.
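
The BatchQueue's scheduling is deficit-round-robin (DRR) based. As a rough orientation, the sketch below is a self-contained TypeScript illustration of the general DRR technique only (simplified and in-memory), not the Redis-backed FairQueue implementation in this PR.

type Item = { tenantId: string; payload: unknown };

// One scheduling pass of deficit round robin over per-tenant queues.
// Every tenant gains the same quantum of credit per pass and can only dequeue
// while it has credit, so a busy tenant cannot starve quieter ones.
function drrPass(
  queues: Map<string, Item[]>,
  deficits: Map<string, number>,
  quantum: number,
  costPerItem = 1
): Item[] {
  const selected: Item[] = [];
  for (const [tenantId, queue] of queues) {
    let credit = (deficits.get(tenantId) ?? 0) + quantum;
    while (queue.length > 0 && credit >= costPerItem) {
      selected.push(queue.shift()!);
      credit -= costPerItem;
    }
    // Carry leftover credit only while the tenant still has queued work.
    deficits.set(tenantId, queue.length > 0 ? credit : 0);
  }
  return selected;
}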

Alongside the new BatchQueue, this PR also introduces a completely new batch trigger creation and ingestion flow. Previously, a single endpoint accepted a single JSON body that described the batch along with all of its items.

We're introducing a two-phase batch trigger ingestion system. In the first phase, the BatchTaskRun record is created (and possibly rate limited). In the second phase, a separate endpoint accepts an NDJSON body where each line is a single item/run with its payload and options.

At ingestion time, all items are added to a queue in order and then processed by the BatchQueue system.
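
To make the shape of the flow concrete, here is a rough client-side sketch. The create route path follows the new api.v3.batches route file; the items endpoint path, header values, and any fields other than runCount, id, and isCached are illustrative assumptions rather than the exact API surface.

type BatchItem = { task: string; payload: unknown; options?: Record<string, unknown> };

async function triggerBatch(baseUrl: string, apiKey: string, items: BatchItem[]): Promise<string> {
  // Phase 1: create the BatchTaskRun record (may be rate limited or replayed via an idempotency key).
  const createRes = await fetch(`${baseUrl}/api/v3/batches`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ runCount: items.length }),
  });
  const batch = (await createRes.json()) as { id: string; isCached?: boolean };

  // Phase 2: stream the items as NDJSON, one item/run per line, to a separate endpoint.
  // Skipped when the batch was an idempotent replay. The items path here is hypothetical.
  if (!batch.isCached) {
    const ndjson = items.map((item) => JSON.stringify(item)).join("\n") + "\n";
    await fetch(`${baseUrl}/api/v3/batches/${batch.id}/items`, {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/x-ndjson" },
      body: ndjson,
    });
  }
  return batch.id;
}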

New batch trigger rate limits

This PR implements a new batch-trigger-specific rate limit, configured on the Organization.batchRateLimitConfig column, with defaults taken from these environment variables:

  • BATCH_RATE_LIMIT_REFILL_RATE: defaults to 10
  • BATCH_RATE_LIMIT_REFILL_INTERVAL: the refill interval duration, defaults to "10s"
  • BATCH_RATE_LIMIT_MAX: defaults to 1200

This rate limiter is scoped to the environment ID and controls how many runs can be submitted via batch triggers per interval. The SDK handles retrying when the limit is hit.
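
As a rough illustration of how those defaults combine (the actual shape of Organization.batchRateLimitConfig may differ from this sketch):

// Sketch only: env-var defaults for the batch trigger rate limiter.
const batchRateLimitDefaults = {
  refillRate: Number(process.env.BATCH_RATE_LIMIT_REFILL_RATE ?? 10),
  refillInterval: process.env.BATCH_RATE_LIMIT_REFILL_INTERVAL ?? "10s",
  max: Number(process.env.BATCH_RATE_LIMIT_MAX ?? 1200),
};
// Scoped to the environment ID: controls how many runs can be submitted via batch
// triggers per interval; when the limit is hit, the SDK retries rather than failing.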

Batch queue concurrency limits

The new Organization.batchQueueConcurrencyConfig column defines an org-specific processingConcurrency value, falling back to the BATCH_CONCURRENCY_LIMIT_DEFAULT env var, which defaults to 10. This controls how many batch queue items are processed concurrently per environment.

There is also a global rate limit for the batch queue, set via BATCH_QUEUE_GLOBAL_RATE_LIMIT, which is disabled by default. If set, the entire batch queue system won't process more than BATCH_QUEUE_GLOBAL_RATE_LIMIT items per second, which caps the number of runs created per second via batch triggers.
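
A minimal sketch of the resolution order described above (the property access on the config column is illustrative):

type OrgBatchConfig = { batchQueueConcurrencyConfig?: { processingConcurrency?: number } };

// Per-environment processing concurrency: org-specific value first, then the env-var default.
function resolveProcessingConcurrency(org: OrgBatchConfig): number {
  return (
    org.batchQueueConcurrencyConfig?.processingConcurrency ??
    Number(process.env.BATCH_CONCURRENCY_LIMIT_DEFAULT ?? 10)
  );
}

// Optional global throttle: when set, the whole batch queue system processes at most
// this many items per second; leaving it unset disables the global limit.
const globalRateLimit = process.env.BATCH_QUEUE_GLOBAL_RATE_LIMIT
  ? Number(process.env.BATCH_QUEUE_GLOBAL_RATE_LIMIT)
  : undefined;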

Batch trigger limits

  • STREAMING_BATCH_MAX_ITEMS: the maximum number of items in a single batch
  • STREAMING_BATCH_ITEM_MAXIMUM_SIZE: the maximum size of each item in a batch

@changeset-bot

changeset-bot bot commented Dec 11, 2025

🦋 Changeset detected

Latest commit: e78b619

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 26 packages
Name Type
@trigger.dev/sdk Patch
@trigger.dev/python Patch
d3-chat Patch
references-d3-openai-agents Patch
references-nextjs-realtime Patch
references-realtime-hooks-test Patch
references-realtime-streams Patch
references-telemetry Patch
@trigger.dev/build Patch
@trigger.dev/core Patch
@trigger.dev/react-hooks Patch
@trigger.dev/redis-worker Patch
@trigger.dev/rsc Patch
@trigger.dev/schema-to-json Patch
@trigger.dev/database Patch
@trigger.dev/otlp-importer Patch
trigger.dev Patch
@internal/cache Patch
@internal/clickhouse Patch
@internal/redis Patch
@internal/replication Patch
@internal/run-engine Patch
@internal/schedule-engine Patch
@internal/testcontainers Patch
@internal/tracing Patch
@internal/zod-worker Patch



coderabbitai bot commented Dec 11, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a two‑phase batch system: Phase 1 API to create batches and Phase 2 NDJSON streaming to submit items. Introduces a Redis‑backed BatchQueue with DRR scheduling, FairQueue primitives (master/worker/visibility/concurrency), completion tracking, payload offload to object storage, per‑organization limits, and RunEngine integration for processing and callbacks. Persists batch state and item errors via Prisma migrations and schema updates. Extends web UI, API routes/presenters, SDK/client streaming surfaces, observability (OTel/Prometheus/Grafana), and many tests for queueing, scheduling, visibility, and streaming logic.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

  • Heterogeneous, high‑risk changes across infra, runtime, persistence, API, SDK, and UI.
  • Focus review on:
    • Redis Lua scripts and key schemes (FairQueue, MasterQueue, Visibility, Concurrency, CompletionTracker).
    • Scheduler correctness (DRR/Weighted/RoundRobin) and concurrency/reservation semantics.
    • Batch two‑phase flow (CreateBatchService, StreamBatchItemsService), NDJSON parser, streaming retry/tee logic, and client streaming implementation.
    • Idempotency and rate‑limit integrations (request idempotency, per‑org limits, global rate limiter).
    • Prisma migrations and data model changes (enum additions, new columns/tables) for safety and backward compatibility.
    • RunEngine integration and lifecycle changes (new methods, BatchQueue wiring, removal of RunNumberIncrementer).
    • Payload offload (BatchPayloadProcessor) and object store error/trace handling.
    • Large test suites and containerized Redis usage for determinism and resource cleanup.
    • Observability and docker/Grafana/Prometheus configs for local dev correctness.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 40.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title 'feat(engine): Batch trigger reloaded' is specific and directly describes the main change: a redesigned batch trigger system.
  • Description check ✅ Passed: The PR description provides comprehensive details about the new batch system, rate limits, concurrency controls, and two-phase ingestion. However, the description template sections (Checklist, Testing, Changelog, Screenshots) are not filled in.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/batch-trigger-v2

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fce7874 and e78b619.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (1)
  • packages/trigger-sdk/package.json (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/trigger-sdk/package.json
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (24)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: Analyze (javascript-typescript)


Comment @coderabbitai help to get the list of available commands and usage tips.


@cursor cursor bot left a comment


This is the final PR Bugbot will review for you during this billing cycle

Your free Bugbot reviews will reset on January 3

Details

You are on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 18

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/trigger-sdk/src/v3/batch.ts (1)

26-46: Update JSDoc return type to match RetrieveBatchV2Response

retrieveBatch now correctly returns ApiPromise<RetrieveBatchV2Response>, but the JSDoc still documents RetrieveBatchResponse. This can mislead consumers and tooling that read JSDoc.

Consider updating the comment:

- * @returns {ApiPromise<RetrieveBatchResponse>} A promise that resolves with the batch details
+ * @returns {ApiPromise<RetrieveBatchV2Response>} A promise that resolves with the batch details
🧹 Nitpick comments (48)
apps/webapp/seed.mts (1)

113-156: Consider dynamically building the tenants array.

The tenants array construction is repetitive. A refactored approach would reduce duplication and make it easier to maintain when adding/removing orgs or projects.

-  console.log("tenants.json");
-  console.log(
-    JSON.stringify({
-      apiUrl: "http://localhost:3030",
-      tenants: [
-        {
-          id: org1Project1.project.externalRef,
-          secretKey: org1Project1.environments.find((e) => e.type === "DEVELOPMENT")?.apiKey,
-        },
-        {
-          id: org1Project2.project.externalRef,
-          secretKey: org1Project2.environments.find((e) => e.type === "DEVELOPMENT")?.apiKey,
-        },
-        {
-          id: org1Project3.project.externalRef,
-          secretKey: org1Project3.environments.find((e) => e.type === "DEVELOPMENT")?.apiKey,
-        },
-        {
-          id: org2Project1.project.externalRef,
-          secretKey: org2Project1.environments.find((e) => e.type === "DEVELOPMENT")?.apiKey,
-        },
-        {
-          id: org2Project2.project.externalRef,
-          secretKey: org2Project2.environments.find((e) => e.type === "DEVELOPMENT")?.apiKey,
-        },
-        {
-          id: org2Project3.project.externalRef,
-          secretKey: org2Project3.environments.find((e) => e.type === "DEVELOPMENT")?.apiKey,
-        },
-        {
-          id: org3Project1.project.externalRef,
-          secretKey: org3Project1.environments.find((e) => e.type === "DEVELOPMENT")?.apiKey,
-        },
-        {
-          id: org3Project2.project.externalRef,
-          secretKey: org3Project2.environments.find((e) => e.type === "DEVELOPMENT")?.apiKey,
-        },
-        {
-          id: org3Project3.project.externalRef,
-          secretKey: org3Project3.environments.find((e) => e.type === "DEVELOPMENT")?.apiKey,
-        },
-      ],
-    })
-  );
+  const allProjects = [
+    org1Project1, org1Project2, org1Project3,
+    org2Project1, org2Project2, org2Project3,
+    org3Project1, org3Project2, org3Project3,
+  ];
+
+  const tenants = allProjects.map((p) => ({
+    id: p.project.externalRef,
+    secretKey: p.environments.find((e) => e.type === "DEVELOPMENT")?.apiKey,
+  }));
+
+  console.log("tenants.json");
+  console.log(JSON.stringify({ apiUrl: "http://localhost:3030", tenants }));
internal-packages/run-engine/src/engine/tests/batchTriggerAndWait.test.ts (3)

816-842: Loop iterates over createdRuns but ignores the runId when dequeuing.

The loop variable runId from createdRuns isn't used to select which run to dequeue—it relies on queue ordering. If dequeuedChild.length === 0, the loop silently continues, which could mask timing issues in the test. Consider either:

  1. Failing the test if no run is dequeued when expected, or
  2. Adding a comment clarifying that the loop count just controls iteration count and ordering is assumed
         for (const { runId } of createdRuns) {
           // Dequeue and start child
           await setTimeout(300);
           const dequeuedChild = await engine.dequeueFromWorkerQueue({
             consumerId: "test_12345",
             workerQueue: "main",
           });
 
-          if (dequeuedChild.length === 0) continue;
+          // Expect a run to be dequeued for each created run
+          expect(dequeuedChild.length).toBeGreaterThan(0);

1159-1185: Same dequeue loop pattern as the success test.

Consider adding an assertion that a run is dequeued for each iteration, or documenting why silent continuation is acceptable. This would make test failures more debuggable if timing issues occur.


877-1221: Comprehensive partial failure test.

Good coverage of the partial failure scenario. The test verifies:

  • PARTIAL_FAILED status when some items fail
  • Error records created in BatchTaskRunError
  • Parent still resumes after successful runs complete
  • Batch eventually transitions to COMPLETED

Consider adding a test case for when all items fail (the ABORTED branch at line 983) to complete the status matrix coverage.

Would you like me to help generate a test case for the all-items-fail (ABORTED) scenario?

packages/trigger-sdk/src/v3/shared.ts (3)

1578-1599: Streaming path fully buffers items, negating memory benefits.

The executeBatchTwoPhaseStreaming function collects all items into an array before processing, which eliminates the memory efficiency advantages of streaming for large batches. This is acknowledged in the comment, but users may expect true streaming behavior when passing an AsyncIterable.

Consider documenting this limitation in the public API JSDoc comments (e.g., for batchTriggerById) so users understand that streaming inputs are currently buffered before transmission.


1663-1961: Significant code duplication across transform functions.

The six transform*Stream* functions share ~90% identical logic for building the BatchItemNDJSON options object. The main variations are:

  • Task ID extraction (item.id vs item.task.id vs parameter)
  • lockToVersion source (env var vs taskContext.worker?.version)
  • Optional queue parameter

Consider extracting the common options-building logic into a shared helper to reduce the ~300 lines of duplication:

function buildBatchItemOptions(
  item: { options?: BatchItemOptions },
  payloadPacket: { dataType: string },
  batchItemIdempotencyKey: string | undefined,
  options?: { idempotencyKeyTTL?: string },
  overrides?: { lockToVersion?: string; queue?: string }
): BatchItemNDJSON['options'] {
  return {
    queue: item.options?.queue 
      ? { name: item.options.queue } 
      : overrides?.queue 
      ? { name: overrides.queue } 
      : undefined,
    // ... common fields
    lockToVersion: overrides?.lockToVersion ?? item.options?.version ?? getEnvVar("TRIGGER_VERSION"),
  };
}

611-730: Array path duplicates the streaming transform logic.

The array processing (lines 613-652) duplicates the logic in transformBatchItemsStream. Both paths could use the same transformation logic:

// Unified approach
const asyncItems = Array.isArray(items) 
  ? (async function* () { for (const item of items) yield item; })()
  : normalizeToAsyncIterable(items);

const ndJsonItems: BatchItemNDJSON[] = [];
for await (const item of transformBatchItemsStream(asyncItems, options)) {
  ndJsonItems.push(item);
}

This would reduce duplication and ensure both paths stay in sync. However, the current explicit approach is also valid for clarity.

packages/redis-worker/src/fair-queue/tests/workerQueue.test.ts (1)

26-29: Consider guarding non-null assertions.

While the test expects pop to return a result, using a non-null assertion without a preceding null check can make test failures less informative.

Consider this pattern for clearer test failures:

 const result = await manager.pop("worker-1");
 expect(result).not.toBeNull();
-expect(result!.messageKey).toBe("msg-1:queue-1");
-expect(result!.queueLength).toBe(0);
+if (result) {
+  expect(result.messageKey).toBe("msg-1:queue-1");
+  expect(result.queueLength).toBe(0);
+}
docs/batch-queue-metrics.md (2)

74-78: Add language specifier to fenced code block.

The code block lacks a language specifier. Based on the content (mathematical relationships), consider using text or plaintext to satisfy the linter.

-```
+```text
 batches_enqueued × avg_items_per_batch ≈ items_enqueued
 items_enqueued = items_processed + items_failed + items_pending
 batches_completed ≤ batches_enqueued (lag indicates processing backlog)

180-212: Add promql language specifier to PromQL code blocks.

The PromQL query blocks at lines 180, 194, and 205 are missing language specifiers. Adding promql will enable syntax highlighting and satisfy the linter.

 ### Processing Health
-```
+```promql
 # Throughput
 rate(batch_queue_items_processed_total[5m])

Apply similar changes to the code blocks starting at lines 194 and 205.

packages/redis-worker/src/fair-queue/retry.ts (1)

33-42: Minor redundancy in maxAttempts assignment.

The maxAttempts value is already guaranteed to be set in the this.options object on line 35, making the null coalescing on line 41 redundant.

     this.options = {
       maxAttempts: options?.maxAttempts ?? 12,
       factor: options?.factor ?? 2,
       minTimeoutInMs: options?.minTimeoutInMs ?? 1_000,
       maxTimeoutInMs: options?.maxTimeoutInMs ?? 3_600_000, // 1 hour
       randomize: options?.randomize ?? true,
     };
-    this.maxAttempts = this.options.maxAttempts ?? 12;
+    this.maxAttempts = this.options.maxAttempts!;
apps/webapp/app/v3/batchTriggerWorker.server.ts (1)

24-25: Clarify the initialization pattern.

While the comment on lines 9-10 explains the import, the void engine statement on line 25 could be more explicit about forcing initialization. Consider:

-  // Ensure the engine (and its BatchQueue) is initialized
-  void engine;
+  // Force engine initialization (triggers BatchQueue setup for v2 batches)
+  // The void operator discards the value while ensuring the side effect
+  void engine;
packages/redis-worker/src/fair-queue/tests/drr.test.ts (1)

8-356: DRR scheduler test coverage is solid; consider renaming one test for clarity

The tests exercise deficit lifecycle, queue selection semantics, ordering by deficit, and aggregate deficit retrieval against a real Redis instance, which gives good confidence in DRRScheduler.

Minor nit: the test

redisTest("should skip tenants with insufficient deficit", ...);

ultimately asserts that both tenants are returned after quantum is added, so the name doesn’t quite describe what’s being verified. Renaming it to something like "should include tenants once quantum has been added" would make intent clearer for future maintainers.

internal-packages/database/prisma/migrations/20251205135152_add_columns_for_run_engine_batch_trigger_v2/migration.sql (1)

17-30: Consider adding a unique constraint on (batchTaskRunId, index).

The BatchTaskRunError table allows multiple error entries with the same batchTaskRunId and index combination. If each batch item (identified by index) should have at most one error record, adding a unique constraint would enforce data integrity and prevent duplicate error entries.

 -- CreateIndex
 CREATE INDEX "BatchTaskRunError_batchTaskRunId_idx" ON "public"."BatchTaskRunError"("batchTaskRunId");
+
+-- CreateIndex (optional: enforce one error per batch item)
+CREATE UNIQUE INDEX "BatchTaskRunError_batchTaskRunId_index_key" ON "public"."BatchTaskRunError"("batchTaskRunId", "index");
apps/webapp/app/presenters/v3/BatchPresenter.server.ts (1)

69-71: Consider using a more specific error or returning null.

Throwing a generic Error("Batch not found") may make it harder for the caller to distinguish between different failure modes. Consider using a custom error type or returning null and letting the caller handle the not-found case, which is a common pattern in presenters.

packages/redis-worker/src/fair-queue/tests/fairQueue.test.ts (1)

472-473: Fixed delay before DLQ check could cause flaky tests.

Using a fixed 500ms delay to wait for DLQ processing may be insufficient under load or on slower CI machines. Consider using vi.waitFor with a condition that checks DLQ length, similar to how other assertions in this file are structured.

-        // Give time for DLQ processing
-        await new Promise((resolve) => setTimeout(resolve, 500));
-
-        // Check DLQ
-        const dlqMessages = await queue.getDeadLetterMessages("t1");
-        expect(dlqMessages).toHaveLength(1);
+        // Wait for DLQ processing and verify
+        let dlqMessages: Awaited<ReturnType<typeof queue.getDeadLetterMessages>> = [];
+        await vi.waitFor(
+          async () => {
+            dlqMessages = await queue.getDeadLetterMessages("t1");
+            expect(dlqMessages).toHaveLength(1);
+          },
+          { timeout: 5000 }
+        );
internal-packages/run-engine/src/engine/tests/batchTwoPhase.test.ts (2)

484-508: Child run completion loop may skip runs if timing varies.

The loop continues silently when dequeuedChild.length === 0. While this handles timing issues gracefully, if a child run is never dequeued due to a bug, the test would still pass (it waits for waitpoints to clear, which could happen for other reasons).

Consider adding an assertion after the loop to verify all expected child runs were processed:

// After the loop
expect(createdRuns.filter(r => /* was completed */).length).toBe(2);

18-56: Consider extracting common engine configuration.

The RunEngine configuration is duplicated across all four tests with minor variations. Extract a helper function to reduce duplication and make tests easier to maintain.

function createTestEngine(prisma: PrismaClient, redisOptions: RedisOptions, overrides?: Partial<RunEngineOptions>) {
  return new RunEngine({
    prisma,
    worker: {
      redis: redisOptions,
      workers: 1,
      tasksPerWorker: 10,
      pollIntervalMs: 20,
    },
    queue: {
      redis: redisOptions,
      masterQueueConsumersDisabled: true,
      processWorkerQueueDebounceMs: 50,
    },
    runLock: { redis: redisOptions },
    machines: {
      defaultMachine: "small-1x",
      machines: {
        "small-1x": { name: "small-1x" as const, cpu: 0.5, memory: 0.5, centsPerMs: 0.0001 },
      },
      baseCostInCents: 0.0001,
    },
    batchQueue: {
      redis: redisOptions,
      consumerCount: 2,
      consumerIntervalMs: 50,
      drr: { quantum: 10, maxDeficit: 100 },
    },
    tracer: trace.getTracer("test", "0.0.0"),
    ...overrides,
  });
}

Also applies to: 164-202, 278-316, 547-584

apps/webapp/app/routes/api.v3.batches.ts (1)

84-92: Consider redacting idempotencyKey in logs.

The idempotencyKey is logged directly. If users include sensitive information in their idempotency keys (e.g., user IDs, email-based keys), this could leak PII to logs.

     logger.debug("Create batch request", {
       runCount: body.runCount,
       parentRunId: body.parentRunId,
       resumeParentOnCompletion: body.resumeParentOnCompletion,
-      idempotencyKey: body.idempotencyKey,
+      hasIdempotencyKey: !!body.idempotencyKey,
       triggerVersion,
       isFromWorker,
       triggerClient,
     });
internal-packages/run-engine/src/engine/index.ts (1)

973-975: isBatchQueueEnabled() always returns true.

Since batchQueue is always instantiated in the constructor (lines 325-346), this method will always return true. If the intent is to conditionally enable the BatchQueue based on configuration, consider checking a config flag instead.

 isBatchQueueEnabled(): boolean {
-  return this.batchQueue !== undefined;
+  return this.options.batchQueue !== undefined && this.batchQueue !== undefined;
 }
apps/webapp/app/v3/runEngine.server.ts (1)

354-366: Consider using createMany for batch error insertion.

Creating error records in a sequential loop can be slow for batches with many failures. Consider using Prisma's createMany for better performance.

-      if (failures.length > 0) {
-        for (const failure of failures) {
-          await prisma.batchTaskRunError.create({
-            data: {
-              batchTaskRunId: batchId,
-              index: failure.index,
-              taskIdentifier: failure.taskIdentifier,
-              payload: failure.payload,
-              options: failure.options as Prisma.InputJsonValue | undefined,
-              error: failure.error,
-              errorCode: failure.errorCode,
-            },
-          });
-        }
-      }
+      if (failures.length > 0) {
+        await prisma.batchTaskRunError.createMany({
+          data: failures.map((failure) => ({
+            batchTaskRunId: batchId,
+            index: failure.index,
+            taskIdentifier: failure.taskIdentifier,
+            payload: failure.payload,
+            options: failure.options as Prisma.InputJsonValue | undefined,
+            error: failure.error,
+            errorCode: failure.errorCode,
+          })),
+        });
+      }
packages/redis-worker/src/fair-queue/schedulers/roundRobin.ts (2)

79-89: Sequential capacity checks may impact performance.

With many tenants, calling context.isAtCapacity sequentially in a loop could introduce latency. Consider batching these checks or using Promise.all with a reasonable concurrency limit if the underlying implementation supports it.

// Example: batch capacity checks
const capacityResults = await Promise.all(
  rotatedTenants.map(async (tenantId) => ({
    tenantId,
    atCapacity: await context.isAtCapacity("tenant", tenantId),
  }))
);

const eligibleTenants: TenantQueues[] = capacityResults
  .filter((r) => !r.atCapacity)
  .map(({ tenantId }) => ({
    tenantId,
    queues: queuesByTenant.get(tenantId) ?? [],
  }));

146-149: Consider adding TTL to the lastServed key.

The lastServed index persists indefinitely in Redis. If shards are removed or renamed, stale keys accumulate. Consider setting an expiration or implementing periodic cleanup.

  async #setLastServedIndex(shardKey: string, index: number): Promise<void> {
    const key = this.#lastServedKey(shardKey);
-   await this.redis.set(key, index.toString());
+   // Expire after 24 hours - will be refreshed on each selectQueues call
+   await this.redis.set(key, index.toString(), "EX", 86400);
  }
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.batches.$batchParam/route.tsx (2)

51-73: Redundant error handling: tryCatch wrapped in try/catch.

The tryCatch utility already returns [error, data] tuple, making the outer try/catch unnecessary unless BatchPresenter constructor can throw. The current pattern double-handles errors.

-  try {
-    const presenter = new BatchPresenter();
-    const [error, data] = await tryCatch(
-      presenter.call({
-        environmentId: environment.id,
-        batchId: batchParam,
-        userId,
-      })
-    );
-
-    if (error) {
-      throw new Error(error.message);
-    }
-
-    return typedjson({ batch: data });
-  } catch (error) {
-    console.error(error);
-    throw new Response(undefined, {
-      status: 400,
-      statusText: "Something went wrong, if this problem persists please contact support.",
-    });
-  }
+  const presenter = new BatchPresenter();
+  const [error, data] = await tryCatch(
+    presenter.call({
+      environmentId: environment.id,
+      batchId: batchParam,
+      userId,
+    })
+  );
+
+  if (error) {
+    console.error(error);
+    throw new Response(undefined, {
+      status: 400,
+      statusText: "Something went wrong, if this problem persists please contact support.",
+    });
+  }
+
+  return typedjson({ batch: data });

75-75: Use named function declaration instead of default export.

Per coding guidelines for **/*.{ts,tsx,js,jsx}, prefer function declarations over default exports.

-export default function Page() {
+function Page() {
   // ... component body
 }
+
+export { Page as default };

Or rename to a more descriptive name like BatchDetailPage.

apps/webapp/app/runEngine/concerns/batchPayloads.server.ts (1)

148-163: Silent fallback on serialization failure may hide issues.

When JSON.stringify fails for non-JSON types, returning an empty packet without logging could make debugging difficult.

Consider adding debug logging for the catch case:

     // For other types, try to stringify
     try {
       return { data: JSON.stringify(payload), dataType: payloadType };
     } catch {
+      logger.debug("Failed to stringify payload, returning empty packet", {
+        payloadType,
+      });
       return { dataType: payloadType };
     }
packages/redis-worker/src/fair-queue/schedulers/drr.ts (2)

184-187: Fragile deficit key construction.

The key is built by splitting the master queue key and taking the first segment. This creates tight coupling to the key format and could break if the key structure changes.

Consider adding a dedicated method to the FairQueueKeyProducer interface for the DRR deficit key:

 #deficitKey(): string {
-  // Use a fixed key for DRR deficit tracking
-  return `${this.keys.masterQueueKey(0).split(":")[0]}:drr:deficit`;
+  // Consider adding a deficitKey() method to FairQueueKeyProducer
+  // For now, use a fixed prefix approach
+  const prefix = this.keys.masterQueueKey(0).split(":")[0] ?? "fairqueue";
+  return `${prefix}:drr:deficit`;
 }

Or better, add drrDeficitKey() to the FairQueueKeyProducer interface for consistency with other key methods.


189-215: Hardcoded limit of 1000 queues per shard.

The limit is hardcoded without documentation or configuration option. This could silently drop queues in high-volume scenarios.

Consider making this configurable:

+  private queueFetchLimit: number;
+
   constructor(private config: DRRSchedulerConfig) {
     // ... existing code
+    this.queueFetchLimit = config.queueFetchLimit ?? 1000;
   }

   async #getQueuesFromShard(shardKey: string): Promise<QueueWithScore[]> {
     const now = Date.now();
     const results = await this.redis.zrangebyscore(
       shardKey,
       "-inf",
       now,
       "WITHSCORES",
       "LIMIT",
       0,
-      1000 // Limit for performance
+      this.queueFetchLimit
     );
apps/webapp/app/runEngine/services/streamBatchItems.server.ts (2)

34-37: Constructor doesn't pass the engine option to base class.

The WithRunEngine base class accepts an optional engine parameter, but here only prisma is passed. While this works because the base class has a default, explicitly passing both options would be more consistent with the class design.

-  constructor(protected readonly _prisma: PrismaClientOrTransaction = prisma) {
-    super({ prisma });
+  constructor(protected readonly _prisma: PrismaClientOrTransaction = prisma, engine?: RunEngine) {
+    super({ prisma: _prisma, engine });
     this.payloadProcessor = new BatchPayloadProcessor();
   }

96-148: Consider adding a timeout or item limit to prevent unbounded processing.

The for await loop processes items indefinitely without any timeout or maximum item count check. If a client sends items slowly or never closes the stream, this could hold resources indefinitely.

Consider adding a timeout mechanism or enforcing the batch.runCount limit during streaming:

+        const maxItems = batch.runCount;
         // Process items from the stream
         for await (const rawItem of itemsIterator) {
+          // Safety check: don't accept more items than expected
+          if (itemsAccepted + itemsDeduplicated >= maxItems) {
+            throw new ServiceValidationError(
+              `Received more items than expected runCount ${maxItems}`
+            );
+          }
packages/redis-worker/src/fair-queue/keyProducer.ts (1)

94-102: Consider logging or documenting when fallback is used in extractTenantId.

The fallback to parts[0] ?? "" when the queue ID doesn't match the expected tenant:{tenantId}:... format could silently return incorrect tenant IDs. This might make debugging difficult in production.

The current behavior is reasonable for flexibility, but documenting the expected format clearly or logging when fallback is triggered would help with troubleshooting.

packages/core/src/v3/apiClient/index.ts (2)

427-432: Consider using ApiError for consistency in parse failure.

When the response fails to parse, a generic Error is thrown. Other methods in this class throw ApiError for server-related issues. Consider wrapping this in ApiError for consistent error handling by callers.

     if (!parsed.success) {
-      throw new Error(`Invalid response from server: ${parsed.error.message}`);
+      throw ApiError.generate(
+        response.status,
+        { error: "Invalid response format", details: parsed.error.message },
+        undefined,
+        Object.fromEntries(response.headers.entries())
+      );
     }

1545-1548: JSON.stringify can throw on circular references.

If a BatchItemNDJSON contains circular references (unlikely but possible with user data), JSON.stringify will throw. The error would propagate but might be confusing. Consider wrapping with try-catch for a clearer error message.

-        const line = JSON.stringify(item) + "\n";
-        controller.enqueue(encoder.encode(line));
+        try {
+          const line = JSON.stringify(item) + "\n";
+          controller.enqueue(encoder.encode(line));
+        } catch (err) {
+          controller.error(new Error(`Failed to serialize batch item at index ${index - 1}: ${(err as Error).message}`));
+        }

Also applies to: 1562-1564

packages/redis-worker/src/fair-queue/scheduler.ts (2)

1-1: Unused import: QueueDescriptor.

QueueDescriptor is imported from ./types.js but never used in this file.

-import type { FairScheduler, SchedulerContext, TenantQueues, QueueDescriptor } from "./types.js";
+import type { FairScheduler, SchedulerContext, TenantQueues } from "./types.js";

77-92: Consider parallelizing capacity checks for better performance.

The sequential await inside the loop means N tenants require N round-trips. For large tenant counts, parallel checks would be faster:

   protected async filterAtCapacity(
     tenants: TenantQueues[],
     context: SchedulerContext,
     groupName: string = "tenant"
   ): Promise<TenantQueues[]> {
-    const filtered: TenantQueues[] = [];
-
-    for (const tenant of tenants) {
-      const isAtCapacity = await context.isAtCapacity(groupName, tenant.tenantId);
-      if (!isAtCapacity) {
-        filtered.push(tenant);
-      }
-    }
-
-    return filtered;
+    const capacityChecks = await Promise.all(
+      tenants.map(async (tenant) => ({
+        tenant,
+        isAtCapacity: await context.isAtCapacity(groupName, tenant.tenantId),
+      }))
+    );
+    return capacityChecks.filter(({ isAtCapacity }) => !isAtCapacity).map(({ tenant }) => tenant);
   }
packages/redis-worker/src/fair-queue/masterQueue.ts (1)

24-30: Consider adding error handling for Redis connection failure.

The constructor creates a Redis client but doesn't handle connection errors. If Redis is unavailable, errors will occur on first operation. Consider adding connection verification or at least documenting that the caller should handle connection errors.

   constructor(private options: MasterQueueOptions) {
     this.redis = createRedisClient(options.redis);
     this.keys = options.keys;
     this.shardCount = Math.max(1, options.shardCount);
 
     this.#registerCommands();
+    // Note: Redis connection errors will surface on first operation
   }
packages/redis-worker/src/fair-queue/concurrency.ts (3)

10-14: Consider using type instead of interface per coding guidelines.

Per the TypeScript coding guidelines for this repository, prefer types over interfaces.

-export interface ConcurrencyManagerOptions {
-  redis: RedisOptions;
-  keys: FairQueueKeyProducer;
-  groups: ConcurrencyGroupConfig[];
-}
+export type ConcurrencyManagerOptions = {
+  redis: RedisOptions;
+  keys: FairQueueKeyProducer;
+  groups: ConcurrencyGroupConfig[];
+};

47-62: Sequential capacity check may cause unnecessary round-trips.

The canProcess method performs sequential async checks for each group. Consider batching these checks using a pipeline or the Lua script approach used in reserve for better performance with many groups.


94-107: Release operation is not atomic across groups.

Unlike reserve, the release method uses a pipeline which executes commands sequentially but not atomically. If a failure occurs mid-pipeline, some groups may have the message removed while others retain it.

Consider using a Lua script for atomic release similar to the reserve operation, or document that partial release is acceptable:

   async release(queue: QueueDescriptor, messageId: string): Promise<void> {
+    // Note: Pipeline is sufficient here as partial release is recoverable
+    // (worst case: message is removed from some groups, which is still safe)
     const pipeline = this.redis.pipeline();
packages/redis-worker/src/fair-queue/visibility.ts (1)

5-14: Consider using type instead of interface per coding guidelines.

-export interface VisibilityManagerOptions {
-  redis: RedisOptions;
-  keys: FairQueueKeyProducer;
-  shardCount: number;
-  defaultTimeoutMs: number;
-  logger?: {
-    debug: (message: string, context?: Record<string, unknown>) => void;
-    error: (message: string, context?: Record<string, unknown>) => void;
-  };
-}
+export type VisibilityManagerOptions = {
+  redis: RedisOptions;
+  keys: FairQueueKeyProducer;
+  shardCount: number;
+  defaultTimeoutMs: number;
+  logger?: {
+    debug: (message: string, context?: Record<string, unknown>) => void;
+    error: (message: string, context?: Record<string, unknown>) => void;
+  };
+};
apps/webapp/app/runEngine/services/createBatch.server.ts (1)

173-203: Error handling for P2002 could be more precise.

The current logic assumes that if the target doesn't include "oneTimeUseToken", it must be an idempotency key violation. However, there could be other unique constraints on the table.

Consider checking for the specific constraint name or including additional checks:

           if (
             Array.isArray(target) &&
             target.length > 0 &&
             typeof target[0] === "string" &&
             target[0].includes("oneTimeUseToken")
           ) {
             throw new ServiceValidationError(
               "Cannot create batch with a one-time use token as it has already been used."
             );
-          } else {
+          } else if (
+            Array.isArray(target) &&
+            target.some((t) => typeof t === "string" && t.includes("idempotencyKey"))
+          ) {
             throw new ServiceValidationError(
               "Cannot create batch as it has already been created with the same idempotency key."
             );
+          } else {
+            // Unknown unique constraint violation - re-throw original error
+            throw error;
           }
packages/redis-worker/src/fair-queue/workerQueue.ts (3)

4-11: Consider using type instead of interface per coding guidelines.

-export interface WorkerQueueManagerOptions {
-  redis: RedisOptions;
-  keys: FairQueueKeyProducer;
-  logger?: {
-    debug: (message: string, context?: Record<string, unknown>) => void;
-    error: (message: string, context?: Record<string, unknown>) => void;
-  };
-}
+export type WorkerQueueManagerOptions = {
+  redis: RedisOptions;
+  keys: FairQueueKeyProducer;
+  logger?: {
+    debug: (message: string, context?: Record<string, unknown>) => void;
+    error: (message: string, context?: Record<string, unknown>) => void;
+  };
+};

93-150: Blocking pop has a potential resource leak with abort signal.

The event listener added to the abort signal is never explicitly removed if the operation completes normally before abort. While { once: true } helps, the listener remains attached until either abort or GC.

Consider explicitly removing the listener in the finally block:

   async blockingPop(
     workerQueueId: string,
     timeoutSeconds: number,
     signal?: AbortSignal
   ): Promise<string | null> {
     const workerQueueKey = this.keys.workerQueueKey(workerQueueId);
     const blockingClient = this.redis.duplicate();
+    let cleanup: (() => void) | undefined;

     try {
       if (signal) {
-        const cleanup = () => {
+        cleanup = () => {
           blockingClient.disconnect();
         };
         signal.addEventListener("abort", cleanup, { once: true });

         if (signal.aborted) {
           return null;
         }
       }
       // ... rest of implementation
     } finally {
+      if (signal && cleanup) {
+        signal.removeEventListener("abort", cleanup);
+      }
       await blockingClient.quit().catch(() => {});
     }
   }

231-274: Lua script duplication between private and public methods.

The popWithLength Lua script is defined identically in both #registerCommands() and registerCommands(). Consider extracting the script to a constant to avoid drift.

+const POP_WITH_LENGTH_LUA = `
+local workerQueueKey = KEYS[1]
+local messageKey = redis.call('LPOP', workerQueueKey)
+if not messageKey then
+  return nil
+end
+local queueLength = redis.call('LLEN', workerQueueKey)
+return {messageKey, queueLength}
+`;

 #registerCommands(): void {
   this.redis.defineCommand("popWithLength", {
     numberOfKeys: 1,
-    lua: `
-local workerQueueKey = KEYS[1]
-...
-    `,
+    lua: POP_WITH_LENGTH_LUA,
   });
 }

 registerCommands(redis: Redis): void {
   redis.defineCommand("popWithLength", {
     numberOfKeys: 1,
-    lua: `
-local workerQueueKey = KEYS[1]
-...
-    `,
+    lua: POP_WITH_LENGTH_LUA,
   });
 }
internal-packages/run-engine/src/batch-queue/completionTracker.ts (1)

101-110: Consider validating parsed JSON against BatchMeta schema.

The getMeta method parses JSON and casts to BatchMeta without validation. If corrupted data exists in Redis, this could cause unexpected runtime errors.

Consider using Zod validation for defense-in-depth:

async getMeta(batchId: string): Promise<BatchMeta | null> {
  const key = this.metaKey(batchId);
  const metaJson = await this.redis.get(key);

  if (!metaJson) {
    return null;
  }

  const result = BatchMeta.safeParse(JSON.parse(metaJson));
  if (!result.success) {
    this.logger.error("Invalid batch metadata", { batchId, error: result.error.message });
    return null;
  }
  return result.data;
}
packages/redis-worker/src/fair-queue/index.ts (2)

786-795: Rate limiting waits indefinitely if resetAt is in the future.

When rate limited, the code waits until resetAt before proceeding. This could cause a consumer to block for an extended period. Consider adding a maximum wait time or checking abort signal during the wait.

     if (this.globalRateLimiter) {
       const result = await this.globalRateLimiter.limit();
       if (!result.allowed && result.resetAt) {
-        const waitMs = Math.max(0, result.resetAt - Date.now());
+        const waitMs = Math.min(5000, Math.max(0, result.resetAt - Date.now())); // Cap at 5s
         if (waitMs > 0) {
           this.logger.debug("Global rate limit reached, waiting", { waitMs, loopId });
           await new Promise((resolve) => setTimeout(resolve, waitMs));
         }
       }
     }

1026-1036: Duplicate rate limiting code could be extracted.

The rate limiting logic at lines 786-795 and 1026-1036 is duplicated. Consider extracting to a private helper method.

private async waitForRateLimit(loopId: string): Promise<void> {
  if (!this.globalRateLimiter) return;
  
  const result = await this.globalRateLimiter.limit();
  if (!result.allowed && result.resetAt) {
    const waitMs = Math.max(0, result.resetAt - Date.now());
    if (waitMs > 0) {
      this.logger.debug("Global rate limit reached, waiting", { waitMs, loopId });
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
}
packages/redis-worker/src/fair-queue/types.ts (1)

15-21: Consider using type instead of interface per coding guidelines.

The coding guidelines specify "Use types over interfaces for TypeScript". However, this is a new file with extensive interface usage. Given the scope and that interfaces provide better error messages for complex types, this may be acceptable.

If strict adherence to guidelines is required:

export type GlobalRateLimiter = {
  limit(): Promise<{ allowed: boolean; resetAt?: number }>;
};


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/trigger-sdk/src/v3/batch.ts (1)

31-46: JSDoc return type is outdated.

The JSDoc comment on line 31 still references RetrieveBatchResponse, but the actual return type has been updated to RetrieveBatchV2Response. Update the JSDoc to maintain consistency.

- * @returns {ApiPromise<RetrieveBatchResponse>} A promise that resolves with the batch details
+ * @returns {ApiPromise<RetrieveBatchV2Response>} A promise that resolves with the batch details
♻️ Duplicate comments (13)
internal-packages/BATCH_TRIGGER_LIMITS.md (2)

1-6: Remove AI conversation artifacts from documentation.

Lines 1–6 contain conversational AI artifacts ([10 tools called], "Now I have a comprehensive picture") that should be removed so the document begins directly with the technical analysis.

-Let me analyze the potential issues with increasing batch size limits by examining the key components involved in batch processing:
-
-[10 tools called]
-
-Now I have a comprehensive picture. Let me compile a detailed analysis of the potential issues with increasing batch size limits:
-
 ## Analysis: Potential Issues with Larger Batch Sizes

220-220: Remove trailing AI prompt artifact.

Line 220 contains an AI prompt question that doesn't belong in technical documentation.

 9. **Batch checkpointing**: For batches > 1,000 runs, checkpoint progress to allow recovery without reprocessing
-
-Would you like me to start implementing any of these recommendations?
apps/webapp/app/env.server.ts (1)

943-950: Verify BATCH_QUEUE_DRR_QUANTUM and BATCH_QUEUE_MAX_DEFICIT have safe fallbacks.

These optional env vars are used by the DRR scheduler. Ensure the consuming code provides safe defaults when these are undefined, otherwise a runtime error may occur when the scheduler tries to use them.

#!/bin/bash
# Check how these env vars are consumed and if defaults are provided at the usage site
rg "BATCH_QUEUE_DRR_QUANTUM|BATCH_QUEUE_MAX_DEFICIT" -t ts -B 2 -A 5 | head -80
.env.example (1)

88-94: Add trailing blank line.

The local observability documentation is well-structured. However, the file is missing a trailing newline at the end.

docker/config/grafana/provisioning/dashboards/batch-queue.json (1)

113-118: NaN when no items are processed - previously flagged.

The success rate expression sum(rate(...processed...)) / (sum(rate(...processed...)) + sum(rate(...failed...))) returns NaN when both rates are 0. This was noted in a previous review.

packages/redis-worker/src/fair-queue/schedulers/weighted.ts (3)

246-257: Division by zero risk when all tenants have zero avgAge.

When all queues have a score of 0, maxAge will be 0, causing t.avgAge / maxAge to produce NaN. This was flagged in a previous review.

     // Weighted shuffle to select top N tenants
     const maxAge = Math.max(...tenantAges.map((t) => t.avgAge));
+    if (maxAge === 0) {
+      // All tenants have same age, use equal weights
+      const selectedTenants = new Set(tenantAges.slice(0, this.maximumTenantCount).map(t => t.tenantId));
+      return queues.filter((q) => selectedTenants.has(q.tenantId));
+    }
     const weightedTenants = tenantAges.map((t) => ({

302-314: Division by zero risk when maxLimit is 0.

If all tenants have a concurrency limit of 0, the weight calculation at line 313 will divide by zero. This was flagged in a previous review.

     // Calculate weights
     const maxLimit = Math.max(
       ...tenantIds.map((id) => snapshot.tenants.get(id)!.concurrency.limit)
     );
+    if (maxLimit === 0) {
+      return this.#shuffle(tenantIds);
+    }

344-349: Division by zero risk when all queues have age 0.

If maxAge is 0 (all queues are brand new), the weight calculation will produce NaN. This was flagged in a previous review.

     // Weighted random based on age
     const maxAge = Math.max(...queues.map((q) => q.age));
+    if (maxAge === 0) {
+      return queues.map((q) => q.queueId);
+    }
     const weightedQueues = queues.map((q) => ({
packages/redis-worker/src/fair-queue/visibility.ts (1)

363-372: Member parsing assumes messageId contains no colons.

The #parseMember method uses indexOf(":") to split. If a custom messageId contains colons, parsing produces incorrect results. Use lastIndexOf(":") instead since queueId is appended last, or document the constraint.

packages/redis-worker/src/fair-queue/index.ts (1)

1337-1351: Remove unused shardId when DLQ is disabled (minor).

In the !deadLetterQueueEnabled branch of #moveToDeadLetterQueue, shardId is computed but never used:

if (!this.deadLetterQueueEnabled) {
  // Just complete and discard
  const shardId = this.masterQueue.getShardForQueue(storedMessage.queueId);
  await this.visibilityManager.complete(storedMessage.id, storedMessage.queueId);
  return;
}

You can safely drop the shardId line to avoid linter warnings.

packages/trigger-sdk/src/v3/shared.ts (1)

1517-1564: Expose batchId on Phase 2 failures to aid recovery (non-blocking).

If createBatch succeeds but streamBatchItems throws, the error surfaced from executeBatchTwoPhase doesn’t carry the batch.id, making post-failure inspection or manual recovery harder for callers. Consider attaching the batchId to the thrown error (or wrapping it) so higher-level code can decide whether to inspect/cleanup the batch:

-  const batch = await apiClient.createBatch(
-    {
-      runCount: items.length,
-      parentRunId: options.parentRunId,
-      resumeParentOnCompletion: options.resumeParentOnCompletion,
-      idempotencyKey: options.idempotencyKey,
-    },
-    { spanParentAsLink: options.spanParentAsLink },
-    requestOptions
-  );
-
-  // If the batch was cached (idempotent replay), skip streaming items
-  if (!batch.isCached) {
-    // Phase 2: Stream items
-    await apiClient.streamBatchItems(batch.id, items, requestOptions);
-  }
+  const batch = await apiClient.createBatch(
+    {
+      runCount: items.length,
+      parentRunId: options.parentRunId,
+      resumeParentOnCompletion: options.resumeParentOnCompletion,
+      idempotencyKey: options.idempotencyKey,
+    },
+    { spanParentAsLink: options.spanParentAsLink },
+    requestOptions
+  );
+
+  // If the batch was cached (idempotent replay), skip streaming items
+  if (!batch.isCached) {
+    try {
+      // Phase 2: Stream items
+      await apiClient.streamBatchItems(batch.id, items, requestOptions);
+    } catch (error) {
+      (error as any).batchId = batch.id;
+      throw error;
+    }
+  }

This keeps existing behavior but gives callers more context when Phase 2 fails.

packages/redis-worker/src/fair-queue/types.ts (1)

40-79: Consider unifying metadata types across queue models (optional).

QueueMessage.metadata is typed as Record<string, unknown>, while StoredMessage.metadata and the metadata in EnqueueOptions / EnqueueBatchOptions are Record<string, string>. Since StoredMessage.metadata is directly passed through to QueueMessage.metadata, this asymmetry is safe but a bit surprising.

If metadata is intended to hold arbitrary JSON-like values everywhere, you could simplify by making these all Record<string, unknown>:

-export interface StoredMessage<TPayload = unknown> {
+export interface StoredMessage<TPayload = unknown> {
   ...
-  metadata?: Record<string, string>;
+  metadata?: Record<string, unknown>;
 }

-export interface EnqueueOptions<TPayload = unknown> {
+export interface EnqueueOptions<TPayload = unknown> {
   ...
-  metadata?: Record<string, string>;
+  metadata?: Record<string, unknown>;
 }

-export interface EnqueueBatchOptions<TPayload = unknown> {
+export interface EnqueueBatchOptions<TPayload = unknown> {
   ...
-  metadata?: Record<string, string>;
+  metadata?: Record<string, unknown>;
 }

Not required for correctness, but it keeps the type surface consistent.

Also applies to: 458-492

internal-packages/run-engine/src/batch-queue/index.ts (1)

63-67: Fix logger initialization to honor options.logger.

The constructor still ignores a provided options.logger; both ternary branches create a new Logger instance:

this.logger = options.logger
  ? new Logger("BatchQueue", "info")
  : new Logger("BatchQueue", "info");

You likely want to use the supplied logger when present:

-    this.logger = options.logger
-      ? new Logger("BatchQueue", "info")
-      : new Logger("BatchQueue", "info");
+    this.logger = options.logger ?? new Logger("BatchQueue", "info");
🧹 Nitpick comments (15)
internal-packages/BATCH_TRIGGER_LIMITS.md (1)

212-212: Replace overused intensifier "very" with more specific language.

Lines 212 and 216 use "very" which can be more precisely expressed.

-7. **Streaming batch completion**: Instead of waiting for all runs, allow partial results callback for very large batches
+7. **Streaming batch completion**: Instead of waiting for all runs, allow partial results callback for large batches (>1,000 runs)

-8. **Separate batch runs table**: For very large batches, consider a denormalized `BatchRun` junction table optimized for batch queries
+8. **Separate batch runs table**: For batches exceeding 1,000 runs, consider a denormalized `BatchRun` junction table optimized for batch queries

Also applies to: 216-216

internal-packages/run-engine/src/engine/systems/batchSystem.ts (1)

66-78: Consider extracting the version string to a shared constant.

The magic string "runengine:v2" is used in multiple places across the codebase (batchSystem.ts:68, BatchPresenter.server.ts:74, and test files). Extracting it to a constant (e.g., BATCH_VERSION_RUNENGINE_V2) would improve maintainability and prevent typo-related bugs if this string is referenced elsewhere.
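
A minimal sketch of that suggestion (module location and call-site shape are hypothetical):

// e.g. a shared batchConstants.ts module (hypothetical location)
export const BATCH_VERSION_RUNENGINE_V2 = "runengine:v2";

// Call sites then compare against the constant instead of repeating the literal:
// if (version === BATCH_VERSION_RUNENGINE_V2) { ... }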

apps/webapp/seed.mts (1)

113-156: Consider simplifying tenant array construction and guarding against missing environments.

The repetitive tenant object construction could be simplified, and the secretKey lookup may silently produce undefined if no DEVELOPMENT environment exists.

-  console.log("tenants.json");
-  console.log(
-    JSON.stringify({
-      apiUrl: "http://localhost:3030",
-      tenants: [
-        {
-          id: org1Project1.project.externalRef,
-          secretKey: org1Project1.environments.find((e) => e.type === "DEVELOPMENT")?.apiKey,
-        },
-        {
-          id: org1Project2.project.externalRef,
-          secretKey: org1Project2.environments.find((e) => e.type === "DEVELOPMENT")?.apiKey,
-        },
-        // ... remaining entries
-      ],
-    })
-  );
+  const allProjects = [
+    org1Project1, org1Project2, org1Project3,
+    org2Project1, org2Project2, org2Project3,
+    org3Project1, org3Project2, org3Project3,
+  ];
+
+  const tenants = allProjects.map((p) => {
+    const devEnv = p.environments.find((e) => e.type === "DEVELOPMENT");
+    if (!devEnv) {
+      console.warn(`⚠️  No DEVELOPMENT environment found for project ${p.project.name}`);
+    }
+    return {
+      id: p.project.externalRef,
+      secretKey: devEnv?.apiKey,
+    };
+  });
+
+  console.log("tenants.json");
+  console.log(JSON.stringify({ apiUrl: "http://localhost:3030", tenants }));
docs/batch-queue-metrics.md (1)

180-212: Add language specifiers to PromQL code blocks.

The fenced code blocks containing PromQL queries should have a language specifier for proper syntax highlighting. This was flagged by markdownlint.

-```
+```promql
 # Throughput
 rate(batch_queue_items_processed_total[5m])

Apply the same change to the code blocks at lines 194 and 205.

apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.batches.$batchParam/route.tsx (1)

51-72: Redundant error handling pattern.

The code uses tryCatch from @trigger.dev/core which returns [error, data], but then wraps it in a try/catch block. The inner throw new Error(error.message) at line 62 is caught by the outer catch at line 66, which logs and returns a generic 400. This works but is slightly redundant.

Consider simplifying to use only one error handling approach, or making the error response more specific (e.g., 404 for "Batch not found" vs 400 for other errors).

   try {
     const presenter = new BatchPresenter();
     const [error, data] = await tryCatch(
       presenter.call({
         environmentId: environment.id,
         batchId: batchParam,
         userId,
       })
     );

     if (error) {
-      throw new Error(error.message);
+      console.error("BatchPresenter error:", error);
+      if (error.message === "Batch not found") {
+        throw new Response("Batch not found", { status: 404 });
+      }
+      throw new Response(undefined, {
+        status: 400,
+        statusText: error.message,
+      });
     }

     return typedjson({ batch: data });
   } catch (error) {
+    if (error instanceof Response) {
+      throw error;
+    }
     console.error(error);
     throw new Response(undefined, {
       status: 400,
       statusText: "Something went wrong, if this problem persists please contact support.",
     });
   }
apps/webapp/app/runEngine/concerns/batchPayloads.server.ts (1)

86-104: Consider documenting the inline storage limitation.

When object storage is unavailable but payload exceeds the threshold, the code logs a warning and continues with inline storage. The comment notes this "may fail downstream for very large payloads."

This is a reasonable fallback, but consider whether this should be a hard failure in production environments where R2 is expected to be configured. As-is, large payloads could silently cause issues downstream.
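
If a hard failure is preferred where R2 is expected, a minimal guard could look like the sketch below. The option and helper names are illustrative assumptions, not the module's actual API:

function assertInlineFallbackAllowed(opts: {
  payloadBytes: number;
  inlineThresholdBytes: number;
  hasObjectStore: boolean;
  requireObjectStore: boolean;
}): void {
  const { payloadBytes, inlineThresholdBytes, hasObjectStore, requireObjectStore } = opts;
  // Nothing to do if the payload fits inline or object storage is available.
  if (payloadBytes <= inlineThresholdBytes || hasObjectStore) return;
  if (requireObjectStore) {
    // Hard failure in environments where R2 is expected to be configured.
    throw new Error(
      `Payload of ${payloadBytes} bytes exceeds the inline limit and object storage is not configured`
    );
  }
  // Otherwise keep the current behaviour: warn and fall back to inline storage.
  console.warn("Object storage unavailable; storing oversized payload inline", {
    payloadBytes,
    inlineThresholdBytes,
  });
}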

packages/redis-worker/src/fair-queue/retry.ts (1)

29-52: Minor redundancy in maxAttempts assignment.

maxAttempts is assigned twice - once within the options object (line 35) and again directly on line 41. This works but is slightly redundant.

   constructor(options?: Partial<RetryOptions>) {
     this.options = {
       maxAttempts: options?.maxAttempts ?? 12,
       factor: options?.factor ?? 2,
       minTimeoutInMs: options?.minTimeoutInMs ?? 1_000,
       maxTimeoutInMs: options?.maxTimeoutInMs ?? 3_600_000, // 1 hour
       randomize: options?.randomize ?? true,
     };
-    this.maxAttempts = this.options.maxAttempts ?? 12;
+    this.maxAttempts = this.options.maxAttempts!;
   }
packages/core/src/v3/apiClient/index.ts (1)

394-435: streamBatchItems lacks retry logic available to other API methods.

Unlike other methods that use zodfetch with built-in retry support via mergeRequestOptions, this method uses raw fetch. For large batch streams, transient network failures could cause the entire stream to fail without retry.

Consider documenting this limitation or implementing retry at a higher level for the streaming use case.
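
One option is to retry the whole upload at the call site, since the NDJSON body generally cannot be replayed mid-stream. A rough sketch, assuming the caller can rebuild the request on each attempt (function names are illustrative):

async function streamBatchItemsWithRetry(
  upload: () => Promise<Response>,
  maxAttempts = 3
): Promise<Response> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const response = await upload();
      // Retry only on transient 5xx responses; anything else is returned to the caller.
      if (response.ok || response.status < 500) return response;
      lastError = new Error(`Upload failed with status ${response.status}`);
    } catch (error) {
      lastError = error;
    }
    if (attempt < maxAttempts) {
      // Exponential backoff with a cap between attempts.
      await new Promise((resolve) => setTimeout(resolve, Math.min(1_000 * 2 ** attempt, 10_000)));
    }
  }
  throw lastError;
}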

internal-packages/run-engine/src/batch-queue/tests/index.test.ts (1)

338-340: Avoid fixed sleep in tests; prefer event-driven waits.

Using setTimeout with a fixed 200ms delay is a potential source of flakiness. Consider using a more reliable mechanism like polling getBatchRemainingCount or checking queue state.

-        // Wait a bit - nothing should be processed
-        await new Promise((resolve) => setTimeout(resolve, 200));
-        expect(processedItems).toHaveLength(0);
+        // Verify items are enqueued but not processed (consumers not started)
+        const remaining = await queue.getBatchRemainingCount("batch1");
+        expect(remaining).toBe(3);
+        expect(processedItems).toHaveLength(0);
packages/redis-worker/src/fair-queue/concurrency.ts (2)

47-62: TOCTOU race between canProcess and reserve.

canProcess checks capacity non-atomically before reserve is called. Between these calls, another consumer could reserve slots, causing reserve to fail unexpectedly. This is acceptable if callers handle reserve returning false, but consider documenting this or removing canProcess if reserve already handles the atomic check.
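
If canProcess is kept as a fast-path, callers should still treat reserve as the single source of truth. A sketch of the intended contract (the types are assumptions, not the real interface):

type ConcurrencyLimiter = {
  canProcess(key: string): Promise<boolean>;
  reserve(key: string, messageId: string): Promise<boolean>;
  release(key: string, messageId: string): Promise<void>;
};

async function tryClaim(
  limiter: ConcurrencyLimiter,
  key: string,
  messageId: string
): Promise<boolean> {
  // Optional pre-check: cheap, but never authoritative.
  if (!(await limiter.canProcess(key))) return false;
  // reserve is the real gate; a false return must be handled even after canProcess passed.
  return limiter.reserve(key, messageId);
}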


97-107: Pipeline exec result not checked for errors.

pipeline.exec() returns an array of [error, result] tuples. If any SREM fails, the error is silently ignored. Consider checking for errors or at minimum logging them.

-    await pipeline.exec();
+    const results = await pipeline.exec();
+    if (results) {
+      for (const [err] of results) {
+        if (err) {
+          console.error("Error releasing concurrency:", err);
+        }
+      }
+    }
packages/redis-worker/src/fair-queue/telemetry.ts (1)

248-318: Consider error handling in gauge callbacks.

The gauge callbacks iterate over queues/shards and make async Redis calls. If any call fails, the entire callback throws and potentially disrupts metric collection for other dimensions.

Consider wrapping individual calls in try-catch to ensure partial failures don't prevent other metrics from being observed:

     if (callbacks.getQueueLength && callbacks.observedQueues) {
       const getQueueLength = callbacks.getQueueLength;
       const queues = callbacks.observedQueues;

       this.metrics.queueLength.addCallback(async (observableResult) => {
         for (const queueId of queues) {
-          const length = await getQueueLength(queueId);
-          observableResult.observe(length, {
-            [FairQueueAttributes.QUEUE_ID]: queueId,
-          });
+          try {
+            const length = await getQueueLength(queueId);
+            observableResult.observe(length, {
+              [FairQueueAttributes.QUEUE_ID]: queueId,
+            });
+          } catch {
+            // Skip this queue on error, continue with others
+          }
         }
       });
     }
apps/webapp/app/runEngine/services/streamBatchItems.server.ts (1)

213-270: Reuse TextEncoder instance for better performance.

A new TextEncoder is created for each line (lines 234, 255). Since the stream may process many items, consider hoisting the encoder outside the transform callbacks.

 export function createNdjsonParserStream(
   maxItemBytes: number
 ): TransformStream<Uint8Array, unknown> {
   const decoder = new TextDecoder();
+  const encoder = new TextEncoder();
   let buffer = "";
   let lineNumber = 0;

   return new TransformStream<Uint8Array, unknown>({
     transform(chunk, controller) {
       buffer += decoder.decode(chunk, { stream: true });

       // Split on newlines
       const lines = buffer.split("\n");
       buffer = lines.pop() ?? "";

       for (const line of lines) {
         lineNumber++;
         const trimmed = line.trim();
         if (!trimmed) continue;

         // Check byte size before parsing
-        const lineBytes = new TextEncoder().encode(trimmed).length;
+        const lineBytes = encoder.encode(trimmed).length;
packages/redis-worker/src/fair-queue/workerQueue.ts (1)

231-274: Duplicate Lua script definition.

The popWithLength Lua script is defined twice - in #registerCommands() and registerCommands(). Extract the script to a constant to avoid duplication and potential drift.

+const POP_WITH_LENGTH_LUA = `
+local workerQueueKey = KEYS[1]
+
+-- Pop the first message
+local messageKey = redis.call('LPOP', workerQueueKey)
+if not messageKey then
+  return nil
+end
+
+-- Get remaining queue length
+local queueLength = redis.call('LLEN', workerQueueKey)
+
+return {messageKey, queueLength}
+`;
+
 export class WorkerQueueManager {
   // ...
   
   #registerCommands(): void {
     this.redis.defineCommand("popWithLength", {
       numberOfKeys: 1,
-      lua: `
-local workerQueueKey = KEYS[1]
-...
-      `,
+      lua: POP_WITH_LENGTH_LUA,
     });
   }

   registerCommands(redis: Redis): void {
     redis.defineCommand("popWithLength", {
       numberOfKeys: 1,
-      lua: `
-local workerQueueKey = KEYS[1]
-...
-      `,
+      lua: POP_WITH_LENGTH_LUA,
     });
   }
 }
apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.batches/route.tsx (1)

212-264: Unify row selection styling by passing isSelected into cells (optional).

Right now selection is applied via className on TableRow and only TableCellMenu receives isSelected, so hover/focus styles may differ between the menu cell and the others. For more consistent UX and keyboard focus outlines, consider passing isSelected into the data cells as well and letting the table primitives handle styling:

-              <TableRow key={batch.id} className={isSelected ? "bg-grid-dimmed" : undefined}>
-                <TableCell to={inspectorPath} isTabbableCell>
+              <TableRow key={batch.id}>
+                <TableCell to={inspectorPath} isTabbableCell isSelected={isSelected}>
...
-                <TableCell to={inspectorPath}>
+                <TableCell to={inspectorPath} isSelected={isSelected}>
...
-                <BatchActionsCell runsPath={runsPath} />
+                <BatchActionsCell runsPath={runsPath} isSelected={isSelected} />

and update BatchActionsCell to accept/forward isSelected into TableCellMenu.

Also applies to: 281-291

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/webapp/app/runEngine/services/triggerTask.server.ts (1)

347-359: Use safeParse to prevent validation errors from bypassing the ServiceValidationError path.

TaskRunError.parse(taskRun.error) will throw if taskRun.error doesn't match the expected schema, which exits the try block before reaching your explicit error handling at lines 355–359. This changes the error surface from ServiceValidationError to ZodError. Use safeParse with an explicit success check instead:

-            const error = taskRun.error ? TaskRunError.parse(taskRun.error) : undefined;
+            const parsedError = taskRun.error ? TaskRunError.safeParse(taskRun.error) : undefined;
+            const error = parsedError?.success ? parsedError.data : undefined;
+
+            if (taskRun.error && !parsedError?.success) {
+              logger.error("Failed to parse taskRun.error", {
+                taskId,
+                runFriendlyId,
+              });
+              throw new ServiceValidationError("Run failed with an unparseable error payload.");
+            }
♻️ Duplicate comments (2)
internal-packages/run-engine/src/engine/index.ts (2)

333-333: TLS configuration is incorrectly converted to boolean.

This issue was already identified in a previous review. Line 333 converts the TLS options object to a boolean true instead of passing the actual TLS configuration.


968-973: isBatchQueueEnabled() always returns true.

This issue was already identified in a previous review. Since batchQueue is unconditionally initialized in the constructor (lines 325-346), this method will always return true.
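
If the intent is for the flag to reflect configuration, one option is to construct the BatchQueue conditionally and derive the check from its presence. A minimal sketch, assuming a hypothetical enabled option rather than the engine's current API:

type BatchQueueLike = { close(): Promise<void> };

class EngineSketch {
  private batchQueue?: BatchQueueLike;

  constructor(options: { batchQueue?: { enabled?: boolean } }) {
    // Only construct the queue when enabled, so the check below reflects reality.
    if (options.batchQueue?.enabled ?? true) {
      this.batchQueue = { close: async () => {} };
    }
  }

  isBatchQueueEnabled(): boolean {
    return this.batchQueue !== undefined;
  }
}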

🧹 Nitpick comments (1)
internal-packages/run-engine/src/engine/systems/batchSystem.ts (1)

66-88: Consider extracting the batch version string to a constant.

The hardcoded string "runengine:v2" at line 68 could be extracted to a named constant (e.g., BATCH_VERSION_V2) to prevent typos and improve maintainability if this version string is referenced elsewhere in the codebase.

For example, at the top of the file or in a shared constants file:

const BATCH_VERSION_V2 = "runengine:v2";

Then use it in the comparison:

-  const isNewBatch = batch.batchVersion === "runengine:v2";
+  const isNewBatch = batch.batchVersion === BATCH_VERSION_V2;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3ed008d and 342c9fc.

📒 Files selected for processing (8)
  • apps/webapp/app/runEngine/concerns/runNumbers.server.ts (0 hunks)
  • apps/webapp/app/runEngine/services/triggerTask.server.ts (1 hunks)
  • apps/webapp/app/runEngine/types.ts (0 hunks)
  • apps/webapp/app/v3/services/triggerTask.server.ts (0 hunks)
  • internal-packages/run-engine/src/engine/index.ts (5 hunks)
  • internal-packages/run-engine/src/engine/systems/batchSystem.ts (2 hunks)
  • internal-packages/run-engine/src/engine/tests/batchTwoPhase.test.ts (1 hunks)
  • internal-packages/run-engine/src/engine/types.ts (3 hunks)
💤 Files with no reviewable changes (3)
  • apps/webapp/app/runEngine/types.ts
  • apps/webapp/app/v3/services/triggerTask.server.ts
  • apps/webapp/app/runEngine/concerns/runNumbers.server.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal-packages/run-engine/src/engine/tests/batchTwoPhase.test.ts
  • internal-packages/run-engine/src/engine/types.ts
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/batchSystem.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Files:

  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/batchSystem.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
**/*.{js,ts,jsx,tsx,json,md,css,scss}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier

Files:

  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/batchSystem.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • apps/webapp/app/runEngine/services/triggerTask.server.ts
apps/webapp/app/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

Access all environment variables through the env export of env.server.ts instead of directly accessing process.env in the Trigger.dev webapp

Files:

  • apps/webapp/app/runEngine/services/triggerTask.server.ts
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: When importing from @trigger.dev/core in the webapp, use subpath exports from the package.json instead of importing from the root path
Follow the Remix 2.1.0 and Express server conventions when updating the main trigger.dev webapp

Files:

  • apps/webapp/app/runEngine/services/triggerTask.server.ts
🧠 Learnings (29)
📓 Common learnings
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `tasks.batchTrigger()` to trigger multiple runs of a single task with different payloads
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.trigger()` to trigger multiple different tasks at once from backend code
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerByTaskAndWait()` to batch trigger tasks by passing task instances and wait for results
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerByTask()` to batch trigger tasks by passing task instances for static task sets
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerAndWait()` to batch trigger multiple different tasks and wait for results
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `yourTask.batchTrigger()` to trigger multiple runs of a task from inside another task
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `runs.subscribeToBatch()` to subscribe to changes for all runs in a batch
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `runs.subscribeToBatch()` to subscribe to changes for all runs in a batch

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/batchSystem.ts
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Use the Run Engine 2.0 from `internal/run-engine` for new run lifecycle code in the webapp instead of the legacy run engine

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-08-14T18:35:44.370Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 2390
File: apps/webapp/app/env.server.ts:764-765
Timestamp: 2025-08-14T18:35:44.370Z
Learning: The BoolEnv helper in apps/webapp/app/utils/boolEnv.ts uses z.preprocess with inconsistent default value types across the codebase - some usages pass boolean defaults (correct) while others pass string defaults (incorrect), leading to type confusion. The helper should enforce boolean-only defaults or have clearer documentation.

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.trigger()` to trigger multiple different tasks at once from backend code

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:26:37.432Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-27T16:26:37.432Z
Learning: Applies to internal-packages/database/**/*.{ts,tsx} : Use Prisma for database interactions in internal-packages/database with PostgreSQL

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Leverage the PostgreSQL database through the `trigger.dev/database` Prisma 6.14.0 client in the webapp for all data access patterns

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger.config.ts : Specify runtime environment (node or bun) in trigger.config.ts using the `runtime` property

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:26:37.432Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-27T16:26:37.432Z
Learning: Applies to packages/trigger-sdk/**/*.{ts,tsx} : In the Trigger.dev SDK (packages/trigger-sdk), prefer isomorphic code like fetch and ReadableStream instead of Node.js-specific code

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerAndWait()` to batch trigger multiple different tasks and wait for results

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `yourTask.batchTrigger()` to trigger multiple runs of a task from inside another task

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
  • internal-packages/run-engine/src/engine/systems/batchSystem.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `tasks.batchTrigger()` to trigger multiple runs of a single task with different payloads

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-10-08T11:48:12.327Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 2593
File: packages/core/src/v3/workers/warmStartClient.ts:168-170
Timestamp: 2025-10-08T11:48:12.327Z
Learning: The trigger.dev runners execute only in Node 21 and 22 environments, so modern Node.js APIs like AbortSignal.any (introduced in v20.3.0) are supported.

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `idempotencyKeyTTL` option to define a time window during which duplicate triggers return the original run

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `idempotencyKeys.create()` to create idempotency keys for preventing duplicate task executions

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerByTaskAndWait()` to batch trigger tasks by passing task instances and wait for results

Applied to files:

  • internal-packages/run-engine/src/engine/systems/batchSystem.ts
  • apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Export tasks with unique IDs within the project to enable proper task discovery and execution

Applied to files:

  • apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `tasks.trigger()` with type-only imports to trigger tasks from backend code without importing the task implementation

Applied to files:

  • apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use the `task()` function from `trigger.dev/sdk/v3` to define tasks with id and run properties

Applied to files:

  • apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `schedules.task()` for scheduled/cron tasks instead of regular `task()`

Applied to files:

  • apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerByTask()` to batch trigger tasks by passing task instances for static task sets

Applied to files:

  • apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use metadata methods (set, del, replace, append, remove, increment, decrement, stream, flush) to update metadata during task execution

Applied to files:

  • apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Subscribe to run updates using `runs.subscribeToRun()` for realtime monitoring of task execution

Applied to files:

  • apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Attach metadata to task runs using the metadata option when triggering, and access/update it inside runs using metadata functions

Applied to files:

  • apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger.config.ts : Configure OpenTelemetry instrumentations and exporters in trigger.config.ts for enhanced logging

Applied to files:

  • apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `schemaTask()` from `trigger.dev/sdk/v3` with Zod schema for payload validation

Applied to files:

  • apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `yourTask.trigger()` to trigger a task from inside another task with specified payload

Applied to files:

  • apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `yourTask.batchTriggerAndWait()` to batch trigger tasks and wait for all results from a parent task

Applied to files:

  • apps/webapp/app/runEngine/services/triggerTask.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `yourTask.triggerAndWait()` to trigger a task and wait for its result from a parent task

Applied to files:

  • apps/webapp/app/runEngine/services/triggerTask.server.ts
🧬 Code graph analysis (1)
internal-packages/run-engine/src/engine/systems/batchSystem.ts (1)
apps/webapp/app/runEngine/services/batchTrigger.server.ts (2)
  • batch (521-664)
  • batch (666-722)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (24)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (7)
internal-packages/run-engine/src/engine/systems/batchSystem.ts (1)

47-49: LGTM!

The Prisma selection correctly includes the new fields needed to support v2 batch processing with success/failure counts.

internal-packages/run-engine/src/engine/index.ts (5)

23-23: LGTM!

The import additions correctly bring in the BatchQueue infrastructure and related types needed for the v3 two-phase batch processing system.

Also applies to: 31-38


84-84: LGTM!

The batchQueue field is appropriately declared as private and used consistently throughout the class.


321-332: LGTM!

The BatchQueue initialization is well-structured with sensible defaults and helpful logging. The configuration correctly ties consumer startup to the worker disabled flag.

Also applies to: 334-353


975-1056: LGTM!

The new BatchQueue methods are well-documented and provide a clean API surface for the two-phase batch processing system. The JSDoc comments clearly explain the purpose and usage of each method, especially the distinction between Phase 1 (initialization) and Phase 2 (item streaming).


1320-1321: LGTM!

The BatchQueue cleanup is correctly integrated into the shutdown sequence, ensuring proper resource cleanup and Redis connection closure.

apps/webapp/app/runEngine/services/triggerTask.server.ts (1)

270-274: Good: trace/span attributes are set consistently (queueName + runId).

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (2)
apps/webapp/app/runEngine/services/createBatch.server.ts (1)

119-153: Race condition: idempotency may be saved before batch is fully initialized.

The event at line 121 triggers idempotency key saving asynchronously via Evt.post(), which doesn't await handlers. If initializeBatch (line 153) fails after the idempotency key is saved, subsequent retries will return a cached batch that isn't fully initialized in Redis.

Consider moving onBatchTaskRunCreated.post(batch) after initializeBatch completes, or ensuring idempotency is only saved after all initialization succeeds.
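
A minimal ordering sketch (the function and handler names approximate the service and are not copied from it):

async function createBatchSketch(
  deps: {
    initializeBatch: (batchId: string) => Promise<void>;
    onBatchTaskRunCreated: { post: (batch: { id: string }) => void };
  },
  batch: { id: string }
) {
  // Initialize the batch in Redis first...
  await deps.initializeBatch(batch.id);
  // ...and only then emit the event that persists the idempotency key,
  // so retries never see a cached-but-uninitialized batch.
  deps.onBatchTaskRunCreated.post(batch);
  return batch;
}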

internal-packages/run-engine/src/engine/index.ts (1)

325-346: TLS configuration is incorrectly converted to boolean.

Line 333 converts the TLS options object to true instead of passing the actual TLS configuration, losing any custom settings (certificates, ciphers, etc.).

Apply this diff:

       keyPrefix: `${options.batchQueue?.redis.keyPrefix ?? ""}batch-queue:`,
       enableAutoPipelining: options.batchQueue?.redis.enableAutoPipelining ?? true,
-      tls: options.batchQueue?.redis.tls !== undefined,
+      tls: options.batchQueue?.redis.tls,
     },
🧹 Nitpick comments (1)
apps/webapp/app/entry.server.tsx (1)

1-25: Minor: remove stale inline comment on Remix node import
The trailing comment on the import at line 1 (import { createReadableStreamFromReadable, type EntryContext } from "@remix-run/node"; // or cloudflare/deno) reads like leftover template guidance; prefer removing it to reduce noise.
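
The cleaned-up import would simply be:

import { createReadableStreamFromReadable, type EntryContext } from "@remix-run/node";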

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 342c9fc and ef76ff7.

📒 Files selected for processing (8)
  • apps/webapp/app/entry.server.tsx (2 hunks)
  • apps/webapp/app/routes/api.v3.batches.$batchId.items.ts (1 hunks)
  • apps/webapp/app/routes/api.v3.batches.ts (1 hunks)
  • apps/webapp/app/runEngine/services/createBatch.server.ts (1 hunks)
  • apps/webapp/app/v3/runEngine.server.ts (2 hunks)
  • apps/webapp/app/v3/runEngineHandlers.server.ts (2 hunks)
  • apps/webapp/test/engine/triggerTask.test.ts (0 hunks)
  • internal-packages/run-engine/src/engine/index.ts (5 hunks)
💤 Files with no reviewable changes (1)
  • apps/webapp/test/engine/triggerTask.test.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • apps/webapp/app/routes/api.v3.batches.ts
  • apps/webapp/app/routes/api.v3.batches.$batchId.items.ts
  • apps/webapp/app/v3/runEngine.server.ts
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/runEngine/services/createBatch.server.ts
  • apps/webapp/app/entry.server.tsx
  • internal-packages/run-engine/src/engine/index.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/runEngine/services/createBatch.server.ts
  • apps/webapp/app/entry.server.tsx
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/runEngine/services/createBatch.server.ts
  • apps/webapp/app/entry.server.tsx
  • internal-packages/run-engine/src/engine/index.ts
apps/webapp/app/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

Access all environment variables through the env export of env.server.ts instead of directly accessing process.env in the Trigger.dev webapp

Files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/runEngine/services/createBatch.server.ts
  • apps/webapp/app/entry.server.tsx
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: When importing from @trigger.dev/core in the webapp, use subpath exports from the package.json instead of importing from the root path
Follow the Remix 2.1.0 and Express server conventions when updating the main trigger.dev webapp

Files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/runEngine/services/createBatch.server.ts
  • apps/webapp/app/entry.server.tsx
**/*.{js,ts,jsx,tsx,json,md,css,scss}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier

Files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/runEngine/services/createBatch.server.ts
  • apps/webapp/app/entry.server.tsx
  • internal-packages/run-engine/src/engine/index.ts
🧠 Learnings (26)
📓 Common learnings
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `tasks.batchTrigger()` to trigger multiple runs of a single task with different payloads
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.trigger()` to trigger multiple different tasks at once from backend code
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerByTaskAndWait()` to batch trigger tasks by passing task instances and wait for results
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerAndWait()` to batch trigger multiple different tasks and wait for results
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `runs.subscribeToBatch()` to subscribe to changes for all runs in a batch

Applied to files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/runEngine/services/createBatch.server.ts
  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.trigger()` to trigger multiple different tasks at once from backend code

Applied to files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/runEngine/services/createBatch.server.ts
  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `tasks.batchTrigger()` to trigger multiple runs of a single task with different payloads

Applied to files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/runEngine/services/createBatch.server.ts
  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerByTask()` to batch trigger tasks by passing task instances for static task sets

Applied to files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/runEngine/services/createBatch.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `yourTask.batchTrigger()` to trigger multiple runs of a task from inside another task

Applied to files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/runEngine/services/createBatch.server.ts
  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerByTaskAndWait()` to batch trigger tasks by passing task instances and wait for results

Applied to files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/runEngine/services/createBatch.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerAndWait()` to batch trigger multiple different tasks and wait for results

Applied to files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/runEngine/services/createBatch.server.ts
  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Use the Run Engine 2.0 from `internal/run-engine` for new run lifecycle code in the webapp instead of the legacy run engine

Applied to files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/entry.server.tsx
  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `yourTask.batchTriggerAndWait()` to batch trigger tasks and wait for all results from a parent task

Applied to files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/runEngine/services/createBatch.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Attach metadata to task runs using the metadata option when triggering, and access/update it inside runs using metadata functions

Applied to files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use the `task()` function from `trigger.dev/sdk/v3` to define tasks with id and run properties

Applied to files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `idempotencyKeys.create()` to create idempotency keys for preventing duplicate task executions

Applied to files:

  • apps/webapp/app/runEngine/services/createBatch.server.ts
  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `idempotencyKeyTTL` option to define a time window during which duplicate triggers return the original run

Applied to files:

  • apps/webapp/app/runEngine/services/createBatch.server.ts
  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Applies to apps/webapp/**/*.{ts,tsx} : Follow the Remix 2.1.0 and Express server conventions when updating the main trigger.dev webapp

Applied to files:

  • apps/webapp/app/entry.server.tsx
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Applies to apps/webapp/app/**/*.{ts,tsx} : Access all environment variables through the `env` export of `env.server.ts` instead of directly accessing `process.env` in the Trigger.dev webapp

Applied to files:

  • apps/webapp/app/entry.server.tsx
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Applies to apps/webapp/**/*.{ts,tsx} : When importing from `trigger.dev/core` in the webapp, use subpath exports from the package.json instead of importing from the root path

Applied to files:

  • apps/webapp/app/entry.server.tsx
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Applies to apps/webapp/app/services/**/*.server.{ts,tsx} : Separate testable services from configuration files; follow the pattern of `realtimeClient.server.ts` (testable service) and `realtimeClientGlobal.server.ts` (configuration) in the webapp

Applied to files:

  • apps/webapp/app/entry.server.tsx
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Applies to apps/webapp/app/v3/services/**/*.server.{ts,tsx} : Organize services in the webapp following the pattern `app/v3/services/*/*.server.ts`

Applied to files:

  • apps/webapp/app/entry.server.tsx
📚 Learning: 2025-11-27T16:26:37.432Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-27T16:26:37.432Z
Learning: The webapp at apps/webapp is a Remix 2.1 application using Node.js v20

Applied to files:

  • apps/webapp/app/entry.server.tsx
📚 Learning: 2025-08-14T18:35:44.370Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 2390
File: apps/webapp/app/env.server.ts:764-765
Timestamp: 2025-08-14T18:35:44.370Z
Learning: The BoolEnv helper in apps/webapp/app/utils/boolEnv.ts uses z.preprocess with inconsistent default value types across the codebase - some usages pass boolean defaults (correct) while others pass string defaults (incorrect), leading to type confusion. The helper should enforce boolean-only defaults or have clearer documentation.

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:26:37.432Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-27T16:26:37.432Z
Learning: Applies to internal-packages/database/**/*.{ts,tsx} : Use Prisma for database interactions in internal-packages/database with PostgreSQL

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Leverage the PostgreSQL database through the `trigger.dev/database` Prisma 6.14.0 client in the webapp for all data access patterns

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger.config.ts : Specify runtime environment (node or bun) in trigger.config.ts using the `runtime` property

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-11-27T16:26:37.432Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-27T16:26:37.432Z
Learning: Applies to packages/trigger-sdk/**/*.{ts,tsx} : In the Trigger.dev SDK (packages/trigger-sdk), prefer isomorphic code like fetch and ReadableStream instead of Node.js-specific code

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
📚 Learning: 2025-10-08T11:48:12.327Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 2593
File: packages/core/src/v3/workers/warmStartClient.ts:168-170
Timestamp: 2025-10-08T11:48:12.327Z
Learning: The trigger.dev runners execute only in Node 21 and 22 environments, so modern Node.js APIs like AbortSignal.any (introduced in v20.3.0) are supported.

Applied to files:

  • internal-packages/run-engine/src/engine/index.ts
🧬 Code graph analysis (2)
apps/webapp/app/v3/runEngineHandlers.server.ts (4)
apps/webapp/app/v3/runEngine.server.ts (1)
  • engine (11-11)
internal-packages/tracing/src/index.ts (1)
  • SpanKind (34-34)
apps/webapp/app/v3/services/triggerTask.server.ts (1)
  • TriggerTaskService (51-120)
apps/webapp/app/db.server.ts (1)
  • prisma (101-101)
apps/webapp/app/entry.server.tsx (1)
apps/webapp/app/v3/runEngineHandlers.server.ts (1)
  • setupBatchQueueCallbacks (645-794)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (24)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (11)
apps/webapp/app/runEngine/services/createBatch.server.ts (4)

1-12: LGTM!

Imports are properly structured with subpath imports from @trigger.dev/core/v3 as per coding guidelines.


14-20: LGTM!

Type definition uses type over interface and properly types optional fields with string union for realtimeStreamsVersion.


34-44: LGTM!

Class structure follows the existing service pattern in this codebase, extending WithRunEngine and initializing dependencies in the constructor.


173-203: LGTM!

Error handling properly distinguishes between oneTimeUseToken and idempotencyKey unique constraint violations, providing clear user-facing error messages via ServiceValidationError.

internal-packages/run-engine/src/engine/index.ts (3)

31-38: LGTM!

BatchQueue imports and types are properly structured with .js extensions for ESM compatibility.


964-1048: LGTM!

BatchQueue wrapper methods are well-documented with clear JSDoc comments and provide appropriate delegation to the underlying BatchQueue instance.


1303-1318: LGTM!

BatchQueue is properly closed during engine shutdown, ensuring Redis connections are cleaned up.

apps/webapp/app/v3/runEngineHandlers.server.ts (2)

645-727: Reduce per-item overhead: reuse TriggerTaskService instance
Creating new TriggerTaskService() per item (Line 662) will add avoidable overhead at high throughput. Prefer instantiating once in setupBatchQueueCallbacks() and closing over it.

 export function setupBatchQueueCallbacks() {
+  const triggerTaskService = new TriggerTaskService();
   // Item processing callback - creates a run for each batch item
   engine.setBatchProcessItemCallback(async ({ batchId, friendlyId, itemIndex, item, meta }) => {
…
-          const triggerTaskService = new TriggerTaskService();
-
           // Normalize payload - for application/store (R2 paths), this passes through as-is
           const payload = normalizePayload(item.payload, item.payloadType);


1-19: Fix incomplete environment object passed to determineEngineVersion

The constructed environment object at line 675 is missing the project.engine field required by determineEngineVersion(). This will cause a runtime crash when the batch queue tries to determine the engine version.

The object currently provides only project: { id: ... }, but determineEngineVersion accesses environment.project.engine to determine the RunEngineVersion. Additionally, the as AuthenticatedEnvironment cast masks this incompatibility.

Fetch the full environment from the database (via findEnvironmentFromRun already imported at line 10, or use the prisma query directly) to ensure all required fields are available, rather than constructing a partial object.

⛔ Skipped due to learnings
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `yourTask.batchTrigger()` to trigger multiple runs of a task from inside another task
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.trigger()` to trigger multiple different tasks at once from backend code
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerByTaskAndWait()` to batch trigger tasks by passing task instances and wait for results
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerAndWait()` to batch trigger multiple different tasks and wait for results
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `tasks.batchTrigger()` to trigger multiple runs of a single task with different payloads
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `yourTask.batchTriggerAndWait()` to batch trigger tasks and wait for all results from a parent task
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `runs.subscribeToBatch()` to subscribe to changes for all runs in a batch
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerByTask()` to batch trigger tasks by passing task instances for static task sets
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Attach metadata to task runs using the metadata option when triggering, and access/update it inside runs using metadata functions
apps/webapp/app/entry.server.tsx (2)

241-242: Good: exporting runWithHttpContext enables request-scoped async context
This is a nice plumbing improvement for propagating per-request context across async boundaries.


196-203: This concern is based on a misunderstanding of the execution model. While Worker.init() and bootstrap() are invoked before the singleton() calls, neither is awaited, meaning they return promises that execute asynchronously. The singleton() calls on lines 236-237 execute synchronously and immediately, registering the handlers before the async work in Worker.init() and bootstrap() begins. Since neither function contains synchronous code that emits events, the handlers will always be registered before any events can fire. The proposed reordering is unnecessary.

Likely an incorrect or invalid review comment.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 14

♻️ Duplicate comments (9)
apps/webapp/app/runEngine/concerns/batchLimits.server.ts (1)

56-75: Duration format validation still missing.

The type assertion as Duration on line 60 bypasses validation, as previously flagged. The environment variable BATCH_RATE_LIMIT_REFILL_INTERVAL is only validated as a string without format checking. An invalid duration format would pass validation but fail at runtime.
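
A minimal sketch of validating the interval once at config-resolution time instead of casting; the regex assumes Duration strings look like "10s" or "5 m" (adjust it to whatever rateLimiter.server.ts actually accepts), and the import paths are illustrative:

import { z } from "zod";
import { env } from "~/env.server";
import type { Duration } from "~/services/rateLimiter.server";

const DurationString = z
  .string()
  .regex(/^\d+\s?(ms|s|m|h|d)$/, "Expected a duration like '10s' or '5 m'");

// Fail fast with a clear error at startup instead of at the first rate-limit
// call, then narrow to Duration once the format is known to be valid.
const refillInterval = DurationString.parse(env.BATCH_RATE_LIMIT_REFILL_INTERVAL) as Duration;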

apps/webapp/app/v3/runEngineHandlers.server.ts (3)

734-796: Make batch completion writes idempotent + atomic (transaction + createMany/skipDuplicates)

This is the same issue previously flagged: the callback updates batchTaskRun then inserts errors one-by-one (Lines 749-777). If the callback is re-invoked or partially fails, you can get partial updates and/or duplicate batchTaskRunError rows.
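
A minimal sketch of the suggested shape; the column names and the result fields are illustrative, and it assumes a unique constraint on the error rows (e.g. on batchTaskRunId + index) so skipDuplicates makes re-invocation safe:

await prisma.$transaction(async (tx) => {
  await tx.batchTaskRun.update({
    where: { id: batchId },
    data: {
      status: result.failedCount > 0 ? "PARTIAL_FAILED" : "COMPLETED",
      successfulRunCount: result.successCount,
      failedRunCount: result.failedCount,
    },
  });

  if (result.errors.length > 0) {
    // Bulk insert; skipDuplicates keeps a re-run of the callback idempotent.
    await tx.batchTaskRunError.createMany({
      data: result.errors.map((err) => ({
        batchTaskRunId: batchId,
        index: err.index,
        taskIdentifier: err.taskIdentifier,
        error: err.error,
      })),
      skipDuplicates: true,
    });
  }
});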


812-831: normalizePayload() should not JSON.parse when payloadType is undefined

As previously noted: the current condition (Line 815) allows parsing when payloadType is undefined, which can unintentionally transform plain string payloads.

-  if (payloadType !== "application/json" && payloadType !== undefined) {
+  if (payloadType !== "application/json") {
     return payload;
   }

645-732: Pass planType when calling TriggerTaskService.call() with skipChecks: true (billing/usage correctness)

skipChecks: true is set (line 698), but no planType is provided in the options. The RunEngineTriggerTaskService explicitly requires planType when skipChecks is enabled to ensure proper billing/usage attribution—when omitted, it logs a warning but processes the run with planType === undefined, breaking usage tracking.

The v2 batch trigger (batchTrigger.server.ts) correctly passes planType from batch-level entitlement checks. Adopt the same pattern here: ensure planType is included in the batch meta object during batch creation, then pass meta.planType in the TriggerTaskService.call() options.

           const result = await triggerTaskService.call(
             item.task,
             environment,
             {
               payload,
               options: {
                 ...(item.options as Record<string, unknown>),
                 payloadType: item.payloadType,
                 parentRunId: meta.parentRunId,
                 resumeParentOnCompletion: meta.resumeParentOnCompletion,
                 parentBatch: batchId,
               },
             },
             {
               triggerVersion: meta.triggerVersion,
               traceContext: meta.traceContext as Record<string, unknown> | undefined,
               spanParentAsLink: meta.spanParentAsLink,
               batchId,
               batchIndex: itemIndex,
               skipChecks: true, // Already validated at batch level
+              planType: meta.planType,
               realtimeStreamsVersion: meta.realtimeStreamsVersion,
             },
             "V2"
           );
apps/webapp/app/runEngine/services/streamBatchItems.server.ts (1)

150-170: Response does not indicate whether the batch was sealed (client can’t distinguish partial ingestion)
Lines 154-169 return the same shape as the success path (Lines 194-198), so clients can’t tell “sealed & processing” from “needs retry with missing items”. This matches the prior review feedback.

Also: set span attributes before the early return so telemetry isn’t missing counts on partial ingestions.

         if (enqueuedCount !== batch.runCount) {
+          span.setAttribute("itemsAccepted", itemsAccepted);
+          span.setAttribute("itemsDeduplicated", itemsDeduplicated);
           logger.warn("Batch item count mismatch", {
             batchId: batchFriendlyId,
             expected: batch.runCount,
             received: enqueuedCount,
             itemsAccepted,
             itemsDeduplicated,
           });
packages/trigger-sdk/src/v3/shared.ts (1)

1533-1585: Good: two-phase errors now preserve phase + batchId context. That addresses a big chunk of the “Phase 2 failed after Phase 1 succeeded” debuggability problem. The remaining question is whether you want any explicit server-side cleanup/cancel on stream failure.

packages/redis-worker/src/fair-queue/index.ts (2)

798-832: Incorrect access to workerQueue field (should not be under payload).
This matches a previously reported issue: workerQueue is a top-level field on StoredMessage, not inside payload.

-    const workerQueueId = message.payload.workerQueue ?? queueId;
+    const workerQueueId = message.workerQueue ?? queueId;

1090-1103: Release reserved concurrency when dequeue-time payload validation fails.
This matches a previously reported issue: the early-return path moves to DLQ but doesn’t release the previously reserved concurrency slot.

       if (!result.success) {
@@
         // Move to DLQ
         await this.#moveToDeadLetterQueue(storedMessage, "Payload validation failed");
+        // Release reserved concurrency so the queue/group doesn't get stuck
+        if (this.concurrencyManager) {
+          await this.concurrencyManager.release(descriptor, storedMessage.id).catch((e) => {
+            this.logger.error("Failed to release concurrency after validation failure", {
+              queueId,
+              messageId: storedMessage.id,
+              error: e instanceof Error ? e.message : String(e),
+            });
+          });
+        }
         return;
       }
internal-packages/run-engine/src/batch-queue/index.ts (1)

664-700: Don’t cleanup Redis state if completion callback fails.
This matches an existing report: if the completion callback fails (e.g., DB update), deleting Redis progress makes the batch unrecoverable/stuck.

     if (this.completionCallback) {
       try {
         await this.completionCallback(result);
       } catch (error) {
         this.logger.error("Error in batch completion callback", {
           batchId,
           error: error instanceof Error ? error.message : String(error),
         });
+        // Preserve Redis state for inspection/retry; do NOT cleanup on callback failure.
+        return;
       }
     }

     // Clean up Redis keys for this batch
     await this.completionTracker.cleanup(batchId);

(If you still need eventual cleanup, consider a separate “reaper” job gated on DB state.)

🧹 Nitpick comments (6)
apps/webapp/app/runEngine/services/streamBatchItems.server.ts (2)

95-113: Index/order + payloadType handling is too loose for “in-order streaming”

  • Lines 105-113: only an upper-bound check exists. If the contract expects items to be enqueued in order, enforce that index is an integer >= 0 and (optionally) strictly increasing (or at least non-decreasing).
  • Line 116: deriving the payload type from item.options?.payloadType is brittle (and bypasses schema intent). Prefer an explicit field on the NDJSON schema (if present), or validate that the option value is a string.

Also applies to: 115-133
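
A minimal sketch of a stricter check, assuming the surrounding loop tracks the last seen index; the lastIndex variable, helper name, and use of ServiceValidationError are illustrative:

let lastIndex = -1;

function assertItemIndex(index: number, maxItems: number): void {
  if (!Number.isInteger(index) || index < 0) {
    throw new ServiceValidationError(`Invalid item index: ${index}`);
  }
  if (index >= maxItems) {
    throw new ServiceValidationError(`Item index ${index} exceeds batch size ${maxItems}`);
  }
  if (index <= lastIndex) {
    throw new ServiceValidationError(
      `Items must be streamed in increasing index order (got ${index} after ${lastIndex})`
    );
  }
  lastIndex = index;
}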


276-287: AsyncIterable wrapper should cancel the reader on early exit
If the consumer stops early (break/throw), releasing the lock (Line 285) doesn’t necessarily cancel the underlying stream. Consider await reader.cancel() in finally (best-effort) before releaseLock().
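
Something along these lines (a sketch; the generator shape is illustrative, not the exact wrapper in the file):

async function* toAsyncIterable<T>(stream: ReadableStream<T>): AsyncIterable<T> {
  const reader = stream.getReader();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) return;
      yield value;
    }
  } finally {
    // Runs on early exit too (break/throw in the consumer): best-effort cancel
    // so the underlying source stops producing, then release the lock.
    await reader.cancel().catch(() => {});
    reader.releaseLock();
  }
}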

packages/trigger-sdk/src/v3/shared.ts (1)

613-733: Consider runtime validation for “stream inputs” (clearer errors than a for await TypeError). Today, any non-array value that isn’t actually an AsyncIterable or ReadableStream will fail later with a generic error. A small guard at the branch point would improve DX.

Also applies to: 868-996, 1127-1250, 1387-1518
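
A possible guard at that branch point (a sketch; the helper name and error message are illustrative):

function assertStreamableItems(value: unknown): void {
  const isAsyncIterable =
    typeof value === "object" && value !== null && Symbol.asyncIterator in value;
  const isReadableStream =
    typeof ReadableStream !== "undefined" && value instanceof ReadableStream;

  if (!Array.isArray(value) && !isAsyncIterable && !isReadableStream) {
    throw new Error("Batch items must be an array, an AsyncIterable, or a ReadableStream");
  }
}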

internal-packages/run-engine/src/batch-queue/types.ts (2)

22-40: Consider tightening payloadType/options shape (optional).
If the engine expects "application/json" vs "application/store" semantics, consider at least documenting/encoding those as a union (e.g., z.enum([...]).or(z.string())) to reduce downstream ambiguity while staying permissive.
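
For example, a permissive-but-documented shape might look like this (a sketch; whether other MIME types should be accepted is an open question):

import { z } from "zod";

// Known payload types get first-class names; anything else still passes as a
// plain string so the schema stays permissive.
const PayloadType = z
  .enum(["application/json", "application/store"])
  .or(z.string());

const BatchItemOptions = z
  .object({ payloadType: PayloadType.optional() })
  .passthrough();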


46-79: BatchMeta vs InitializeBatchOptions: keep them intentionally in-sync.
These two structures mirror each other; a drift here will be painful. Consider a shared base type/schema or a single source-of-truth conversion helper.

Also applies to: 133-164
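
One way to do that (a sketch; the field names are illustrative, not the actual BatchMeta/InitializeBatchOptions definitions):

import { z } from "zod";

const BatchMetaBase = z.object({
  environmentId: z.string(),
  runCount: z.number().int().positive(),
  parentRunId: z.string().optional(),
  resumeParentOnCompletion: z.boolean().optional(),
});

// Both shapes extend the same base, so shared fields cannot drift.
export const BatchMeta = BatchMetaBase;
export const InitializeBatchOptions = BatchMetaBase.extend({
  // initialization-only fields go here
});

export type BatchMeta = z.infer<typeof BatchMeta>;
export type InitializeBatchOptions = z.infer<typeof InitializeBatchOptions>;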

packages/core/src/v3/apiClient/index.ts (1)

410-416: Harden retry option handling (avoid maxAttempts!).
Because retryOptions is built with spreads, it’s possible to end up with maxAttempts: undefined (if a caller explicitly provides it as undefined). Prefer normalizing once (e.g., const maxAttempts = retryOptions.maxAttempts ?? DEFAULT...) and removing the non-null assertion.

Also applies to: 1602-1608, 1616-1673
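
A minimal sketch of normalizing once; the identifier names here are illustrative, not the client’s actual ones:

type RetryOptions = { maxAttempts?: number };

const DEFAULT_MAX_ATTEMPTS = 3;

function resolveMaxAttempts(retry?: RetryOptions): number {
  // A spread can leave maxAttempts explicitly undefined, so resolve a concrete
  // number once and drop the `!` assertions downstream.
  return retry?.maxAttempts ?? DEFAULT_MAX_ATTEMPTS;
}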

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef76ff7 and daa0b5b.

⛔ Files ignored due to path filters (1)
  • references/hello-world/src/trigger/batches.ts is excluded by !references/**
📒 Files selected for processing (12)
  • apps/webapp/app/env.server.ts (3 hunks)
  • apps/webapp/app/runEngine/concerns/batchLimits.server.ts (1 hunks)
  • apps/webapp/app/runEngine/services/streamBatchItems.server.ts (1 hunks)
  • apps/webapp/app/v3/runEngineHandlers.server.ts (2 hunks)
  • internal-packages/run-engine/src/batch-queue/index.ts (1 hunks)
  • internal-packages/run-engine/src/batch-queue/types.ts (1 hunks)
  • packages/core/src/v3/apiClient/index.ts (6 hunks)
  • packages/redis-worker/src/fair-queue/index.ts (1 hunks)
  • packages/redis-worker/src/fair-queue/schedulers/weighted.ts (1 hunks)
  • packages/redis-worker/src/fair-queue/tests/concurrency.test.ts (1 hunks)
  • packages/redis-worker/src/fair-queue/types.ts (1 hunks)
  • packages/trigger-sdk/src/v3/shared.ts (13 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/redis-worker/src/fair-queue/tests/concurrency.test.ts
  • packages/redis-worker/src/fair-queue/schedulers/weighted.ts
  • packages/redis-worker/src/fair-queue/types.ts
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

  • packages/core/src/v3/apiClient/index.ts
  • apps/webapp/app/runEngine/concerns/batchLimits.server.ts
  • internal-packages/run-engine/src/batch-queue/types.ts
  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/runEngine/services/streamBatchItems.server.ts
  • internal-packages/run-engine/src/batch-queue/index.ts
  • packages/trigger-sdk/src/v3/shared.ts
  • packages/redis-worker/src/fair-queue/index.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • packages/core/src/v3/apiClient/index.ts
  • apps/webapp/app/runEngine/concerns/batchLimits.server.ts
  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/runEngine/services/streamBatchItems.server.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Files:

  • packages/core/src/v3/apiClient/index.ts
  • apps/webapp/app/runEngine/concerns/batchLimits.server.ts
  • internal-packages/run-engine/src/batch-queue/types.ts
  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/runEngine/services/streamBatchItems.server.ts
  • internal-packages/run-engine/src/batch-queue/index.ts
  • packages/trigger-sdk/src/v3/shared.ts
  • packages/redis-worker/src/fair-queue/index.ts
**/*.{js,ts,jsx,tsx,json,md,css,scss}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier

Files:

  • packages/core/src/v3/apiClient/index.ts
  • apps/webapp/app/runEngine/concerns/batchLimits.server.ts
  • internal-packages/run-engine/src/batch-queue/types.ts
  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/runEngine/services/streamBatchItems.server.ts
  • internal-packages/run-engine/src/batch-queue/index.ts
  • packages/trigger-sdk/src/v3/shared.ts
  • packages/redis-worker/src/fair-queue/index.ts
apps/webapp/app/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

Access all environment variables through the env export of env.server.ts instead of directly accessing process.env in the Trigger.dev webapp

Files:

  • apps/webapp/app/runEngine/concerns/batchLimits.server.ts
  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/runEngine/services/streamBatchItems.server.ts
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: When importing from @trigger.dev/core in the webapp, use subpath exports from the package.json instead of importing from the root path
Follow the Remix 2.1.0 and Express server conventions when updating the main trigger.dev webapp

Files:

  • apps/webapp/app/runEngine/concerns/batchLimits.server.ts
  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/env.server.ts
  • apps/webapp/app/runEngine/services/streamBatchItems.server.ts
packages/trigger-sdk/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

In the Trigger.dev SDK (packages/trigger-sdk), prefer isomorphic code like fetch and ReadableStream instead of Node.js-specific code

Files:

  • packages/trigger-sdk/src/v3/shared.ts
🧠 Learnings (29)
📓 Common learnings
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `tasks.batchTrigger()` to trigger multiple runs of a single task with different payloads
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.trigger()` to trigger multiple different tasks at once from backend code
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerByTaskAndWait()` to batch trigger tasks by passing task instances and wait for results
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerAndWait()` to batch trigger multiple different tasks and wait for results
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerByTask()` to batch trigger tasks by passing task instances for static task sets
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `yourTask.batchTrigger()` to trigger multiple runs of a task from inside another task
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `runs.subscribeToBatch()` to subscribe to changes for all runs in a batch
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Use `trigger.dev/redis-worker` for background job and worker system needs in the webapp and run engine
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.trigger()` to trigger multiple different tasks at once from backend code

Applied to files:

  • packages/core/src/v3/apiClient/index.ts
  • internal-packages/run-engine/src/batch-queue/types.ts
  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • packages/trigger-sdk/src/v3/shared.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `tasks.batchTrigger()` to trigger multiple runs of a single task with different payloads

Applied to files:

  • packages/core/src/v3/apiClient/index.ts
  • internal-packages/run-engine/src/batch-queue/types.ts
  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • packages/trigger-sdk/src/v3/shared.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerAndWait()` to batch trigger multiple different tasks and wait for results

Applied to files:

  • packages/core/src/v3/apiClient/index.ts
  • internal-packages/run-engine/src/batch-queue/types.ts
  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • packages/trigger-sdk/src/v3/shared.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `runs.subscribeToBatch()` to subscribe to changes for all runs in a batch

Applied to files:

  • packages/core/src/v3/apiClient/index.ts
  • internal-packages/run-engine/src/batch-queue/types.ts
  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • internal-packages/run-engine/src/batch-queue/index.ts
  • packages/trigger-sdk/src/v3/shared.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerByTaskAndWait()` to batch trigger tasks by passing task instances and wait for results

Applied to files:

  • packages/core/src/v3/apiClient/index.ts
  • internal-packages/run-engine/src/batch-queue/types.ts
  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • packages/trigger-sdk/src/v3/shared.ts
📚 Learning: 2025-11-14T16:03:06.917Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2681
File: apps/webapp/app/services/platform.v3.server.ts:258-302
Timestamp: 2025-11-14T16:03:06.917Z
Learning: In `apps/webapp/app/services/platform.v3.server.ts`, the `getDefaultEnvironmentConcurrencyLimit` function intentionally throws an error (rather than falling back to org.maximumConcurrencyLimit) when the billing client returns undefined plan limits. This fail-fast behavior prevents users from receiving more concurrency than their plan entitles them to. The org.maximumConcurrencyLimit fallback is only for self-hosted deployments where no billing client exists.

Applied to files:

  • apps/webapp/app/runEngine/concerns/batchLimits.server.ts
  • apps/webapp/app/env.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Limit task duration using the `maxDuration` property (in seconds)

Applied to files:

  • apps/webapp/app/runEngine/concerns/batchLimits.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `schemaTask()` from `trigger.dev/sdk/v3` with Zod schema for payload validation

Applied to files:

  • internal-packages/run-engine/src/batch-queue/types.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `batch.triggerByTask()` to batch trigger tasks by passing task instances for static task sets

Applied to files:

  • internal-packages/run-engine/src/batch-queue/types.ts
  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • packages/trigger-sdk/src/v3/shared.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `yourTask.batchTrigger()` to trigger multiple runs of a task from inside another task

Applied to files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • packages/trigger-sdk/src/v3/shared.ts
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Use the Run Engine 2.0 from `internal/run-engine` for new run lifecycle code in the webapp instead of the legacy run engine

Applied to files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
📚 Learning: 2025-09-03T14:34:41.781Z
Learnt from: myftija
Repo: triggerdotdev/trigger.dev PR: 2464
File: apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.settings/route.tsx:357-371
Timestamp: 2025-09-03T14:34:41.781Z
Learning: When using Zod's safeParse, the .data property is undefined when parsing fails, but TypeScript may still complain about accessing .data without checking .success first. The suggested approach of checking .success before accessing .data improves type safety and code clarity.

Applied to files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
📚 Learning: 2025-11-27T16:26:37.432Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-27T16:26:37.432Z
Learning: Applies to packages/trigger-sdk/**/*.{ts,tsx} : In the Trigger.dev SDK (packages/trigger-sdk), prefer isomorphic code like fetch and ReadableStream instead of Node.js-specific code

Applied to files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • packages/trigger-sdk/src/v3/shared.ts
📚 Learning: 2025-08-14T18:35:44.370Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 2390
File: apps/webapp/app/env.server.ts:764-765
Timestamp: 2025-08-14T18:35:44.370Z
Learning: The BoolEnv helper in apps/webapp/app/utils/boolEnv.ts uses z.preprocess with inconsistent default value types across the codebase - some usages pass boolean defaults (correct) while others pass string defaults (incorrect), leading to type confusion. The helper should enforce boolean-only defaults or have clearer documentation.

Applied to files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • apps/webapp/app/env.server.ts
📚 Learning: 2025-10-08T11:48:12.327Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 2593
File: packages/core/src/v3/workers/warmStartClient.ts:168-170
Timestamp: 2025-10-08T11:48:12.327Z
Learning: The trigger.dev runners execute only in Node 21 and 22 environments, so modern Node.js APIs like AbortSignal.any (introduced in v20.3.0) are supported.

Applied to files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Attach metadata to task runs using the metadata option when triggering, and access/update it inside runs using metadata functions

Applied to files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use the `task()` function from `trigger.dev/sdk/v3` to define tasks with id and run properties

Applied to files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • packages/trigger-sdk/src/v3/shared.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `yourTask.batchTriggerAndWait()` to batch trigger tasks and wait for all results from a parent task

Applied to files:

  • apps/webapp/app/v3/runEngineHandlers.server.ts
  • packages/trigger-sdk/src/v3/shared.ts
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Applies to apps/webapp/app/**/*.{ts,tsx} : Access all environment variables through the `env` export of `env.server.ts` instead of directly accessing `process.env` in the Trigger.dev webapp

Applied to files:

  • apps/webapp/app/env.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Control concurrency using the `queue` property with `concurrencyLimit` option

Applied to files:

  • internal-packages/run-engine/src/batch-queue/index.ts
  • packages/trigger-sdk/src/v3/shared.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `.withStreams()` to subscribe to realtime streams from task metadata in addition to run changes

Applied to files:

  • packages/trigger-sdk/src/v3/shared.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use metadata methods (set, del, replace, append, remove, increment, decrement, stream, flush) to update metadata during task execution

Applied to files:

  • packages/trigger-sdk/src/v3/shared.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Generate example payloads for tasks when possible

Applied to files:

  • packages/trigger-sdk/src/v3/shared.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `yourTask.trigger()` to trigger a task from inside another task with specified payload

Applied to files:

  • packages/trigger-sdk/src/v3/shared.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `tasks.trigger()` with type-only imports to trigger tasks from backend code without importing the task implementation

Applied to files:

  • packages/trigger-sdk/src/v3/shared.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `yourTask.triggerAndWait()` to trigger a task and wait for its result from a parent task

Applied to files:

  • packages/trigger-sdk/src/v3/shared.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `idempotencyKeys.create()` to create idempotency keys for preventing duplicate task executions

Applied to files:

  • packages/trigger-sdk/src/v3/shared.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Use `idempotencyKeyTTL` option to define a time window during which duplicate triggers return the original run

Applied to files:

  • packages/trigger-sdk/src/v3/shared.ts
🧬 Code graph analysis (3)
apps/webapp/app/runEngine/concerns/batchLimits.server.ts (3)
apps/webapp/app/env.server.ts (1)
  • env (1289-1289)
apps/webapp/app/services/rateLimiter.server.ts (3)
  • createRedisRateLimitClient (74-105)
  • RateLimiter (21-72)
  • Duration (17-17)
apps/webapp/app/runEngine/concerns/batchGlobalRateLimiter.server.ts (1)
  • limit (27-33)
apps/webapp/app/runEngine/services/streamBatchItems.server.ts (4)
apps/webapp/app/v3/services/baseService.server.ts (1)
  • WithRunEngine (52-59)
apps/webapp/app/runEngine/concerns/batchPayloads.server.ts (1)
  • BatchPayloadProcessor (28-164)
packages/core/src/v3/isomorphic/friendlyId.ts (1)
  • BatchId (96-96)
internal-packages/run-engine/src/batch-queue/types.ts (2)
  • BatchItem (22-40)
  • BatchItem (41-41)
packages/trigger-sdk/src/v3/shared.ts (9)
packages/core/src/v3/schemas/api.ts (2)
  • BatchItemNDJSON (358-367)
  • BatchItemNDJSON (369-369)
packages/core/src/v3/utils/ioSerialization.ts (1)
  • stringifyIO (88-105)
packages/core/src/v3/idempotencyKeys.ts (2)
  • makeIdempotencyKey (39-53)
  • flattenIdempotencyKey (12-37)
packages/core/src/v3/workers/index.ts (1)
  • getEnvVar (4-4)
packages/core/src/v3/taskContext/index.ts (1)
  • ctx (26-28)
packages/trigger-sdk/src/v3/batch.ts (1)
  • batch (18-24)
packages/core/src/v3/runMetadata/index.ts (1)
  • stream (80-86)
internal-packages/run-engine/src/index.ts (1)
  • BatchItem (16-16)
packages/core/src/v3/types/schemas.ts (1)
  • SchemaParseFn (101-101)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (24)
  • GitHub Check: Cursor Bugbot
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (9)
apps/webapp/app/runEngine/concerns/batchLimits.server.ts (5)

9-16: LGTM!

The BatchLimitsConfig schema is well-defined with appropriate defaults.


18-31: Good use of singleton pattern.

The Redis client singleton addresses the previous concern about connection exhaustion from creating new clients on every call.


81-87: LGTM!

The function provides a clean API for retrieving batch limits and rate limiter for an organization.


89-105: LGTM!

The config resolution logic properly validates and falls back to defaults when needed.


111-123: LGTM!

The error class provides all necessary information for constructing proper 429 responses.

apps/webapp/app/env.server.ts (2)

943-950: Defaults added for DRR settings.

The previously flagged critical issue has been resolved—BATCH_QUEUE_DRR_QUANTUM and BATCH_QUEUE_MAX_DEFICIT now have defaults (5 and 50). The optional CONSUMER_COUNT and CONSUMER_INTERVAL_MS are safe since FairQueue provides defaults.


952-960: LGTM!

Plan-based batch limits and concurrency settings are well-defined with sensible defaults for each tier.

packages/trigger-sdk/src/v3/shared.ts (2)

56-57: Type import for BatchItemNDJSON looks right (keeps SDK isomorphic).


128-130: This concern does not apply—BatchTriggerError is not part of the public API.

BatchTriggerError is exported from shared.ts but never re-exported through index.ts, which is the sole public entry point (per package.json exports). As an internal-only export, the early re-export pattern does not affect consumers and poses no CJS transpilation risk. This same pattern is used consistently elsewhere in the file (e.g., line 124).

Likely an incorrect or invalid review comment.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/trigger-sdk/src/v3/batch.ts (1)

1-9: Update JSDoc @returns annotation to match new return type

The function return type on line 46 has been correctly updated to ApiPromise<RetrieveBatchV2Response>, but the JSDoc @returns annotation on line 31 still references ApiPromise<RetrieveBatchResponse>. Update line 31 to document the actual return type: @returns {ApiPromise<RetrieveBatchV2Response>}.

♻️ Duplicate comments (10)
.env.example (1)

90-94: Add trailing newline at end of file.

This is a duplicate of a previous review comment: the file is missing a trailing newline at EOF, which linting tools and POSIX text-file conventions require.

Apply this fix:

 # Enable local observability stack (requires `pnpm run docker` to start otel-collector)
 # Uncomment these to send metrics to the local Prometheus via OTEL Collector:
 # INTERNAL_OTEL_METRIC_EXPORTER_ENABLED=1
 # INTERNAL_OTEL_METRIC_EXPORTER_URL=http://localhost:4318/v1/metrics
 # INTERNAL_OTEL_METRIC_EXPORTER_INTERVAL_MS=15000
+
docker/config/grafana/provisioning/dashboards/nodejs-runtime.json (1)

1-445: ❌ Unresolved issues from previous review: Prettier formatting and metric definition mismatch.

This review comment was previously flagged and remains unresolved. Two critical issues must be addressed:

  1. Formatting: The JSON file is not Prettier-formatted. Per coding guidelines, all .json files must be formatted with Prettier. This violates the requirement: **/*.{js,ts,jsx,tsx,json,md,css,scss}: Format code using Prettier.

  2. Metrics mismatch: The dashboard references 10 metrics that do not appear to be exported by the application:

    • triggerdotdev_nodejs_event_loop_utilization_ratio
    • triggerdotdev_nodejs_eventloop_lag_p50_seconds, p90_seconds, p99_seconds, max_seconds, mean_seconds
    • triggerdotdev_nodejs_uv_threadpool_size_threads
    • triggerdotdev_nodejs_active_handles_total_handles, active_handles_handles
    • triggerdotdev_nodejs_active_requests_total_requests

    Without these metrics exported, the dashboard queries will return no data, rendering it non-functional.

Required actions:

  1. Run Prettier on the file: prettier --write docker/config/grafana/provisioning/dashboards/nodejs-runtime.json
  2. Verify and confirm that your Node.js instrumentation exports all referenced metrics, or update dashboard queries to match actual exported metric names.
apps/webapp/app/runEngine/concerns/batchPayloads.server.ts (1)

150-170: Double stringification issue has been addressed.

The current implementation correctly checks typeof payload === "string" before calling JSON.stringify for application/json payloads (lines 153-154). Pre-serialized JSON strings from the SDK are now handled correctly without double-encoding.

packages/redis-worker/src/fair-queue/types.ts (1)

62-79: StoredMessage.metadata type alignment confirmed.

The metadata field is now Record<string, unknown> (line 78), consistent with QueueMessage.metadata (line 55). This addresses the type mismatch identified in past reviews.

packages/redis-worker/src/fair-queue/tests/fairQueue.test.ts (1)

700-731: Potential race condition: handler registered after start() and enqueue().

The message handler is registered at lines 714-718, after queue.start() (line 701) and queue.enqueue() (line 708). This ordering could cause the message to be processed before the handler is attached.

While a previous review comment was marked as addressed, the code still shows the handler being registered after the queue starts and messages are enqueued. Consider moving the handler registration before queue.start():

+        const processed: string[] = [];
+        queue.onMessage(async (ctx) => {
+          processed.push(ctx.message.payload.value);
+          await ctx.complete();
+        });
+
         // Start without any messages (will trigger empty dequeues)
         queue.start();

         // Wait a bit for cooloff to kick in
         await new Promise((resolve) => setTimeout(resolve, 500));

         await queue.enqueue({
           queueId: "tenant:t1:queue:q1",
           tenantId: "t1",
           payload: { value: "after-cooloff" },
         });

-        const processed: string[] = [];
-        queue.onMessage(async (ctx) => {
-          processed.push(ctx.message.payload.value);
-          await ctx.complete();
-        });
internal-packages/run-engine/src/batch-queue/index.ts (2)

514-652: Batch item handling and completion callback behavior look robust

#handleMessage correctly:

  • Validates metadata presence.
  • Calls processItemCallback, then records success/failure via the idempotent completion tracker.
  • Always sets processedCount (including unexpected-error path) before completing the FairQueue message.
  • Triggers #finalizeBatch only when processedCount === meta.runCount.

#finalizeBatch now only cleans up Redis after a successful completionCallback, and propagates callback errors so the batch can be retried or inspected with full Redis state, while still cleaning up when no callback is registered. This addresses the earlier “cleanup after callback failure” bug.

Also applies to: 658-698


267-321: Validate envId against batch metadata before enqueueing items

enqueueBatchItem() currently trusts the caller-provided envId for routing:

const meta = await this.completionTracker.getMeta(batchId);
// ...
const queueId = this.#makeQueueId(envId, batchId);
await this.fairQueue.enqueue({ queueId, tenantId: envId, ... });

But meta already contains the authoritative environmentId for the batch. If a caller accidentally passes the wrong envId, items will be enqueued under the wrong environment’s queue while the DB batch still belongs to the original environment, breaking fairness/concurrency guarantees and making debugging very hard.

Add a strict check before routing:

    const meta = await this.completionTracker.getMeta(batchId);
    if (!meta) {
      throw new Error(`Batch ${batchId} not found or not initialized`);
    }
+
+    if (meta.environmentId !== envId) {
+      throw new Error(
+        `Batch ${batchId} belongs to environment ${meta.environmentId} but enqueue was attempted for ${envId}`
+      );
+    }

This keeps routing consistent with stored metadata and protects against cross-environment bugs.

internal-packages/run-engine/src/engine/index.ts (1)

84-85: Guard BatchQueue initialization and fix Redis keyPrefix ordering

Two issues here:

  1. BatchQueue is created even when options.batchQueue is undefined, which means redis becomes { keyPrefix: "batch-queue:" } and createRedisClient will fall back to its own host/port defaults (typically localhost:6379). That’s risky in production: RunEngine will silently connect a second client to the wrong Redis if batchQueue isn’t explicitly configured. Either:

    • Make options.batchQueue required and fail fast if it’s missing, or
    • Treat it as truly optional and skip BatchQueue initialization (and gate the public batch methods accordingly).
  2. The redis.keyPrefix override is backwards:

redis: {
  keyPrefix: `${options.batchQueue?.redis.keyPrefix ?? ""}batch-queue:`,
  ...options.batchQueue?.redis,
},

The spread comes after keyPrefix, so any caller-provided keyPrefix overwrites your "…batch-queue:" suffix. This can cause key collisions with other queues. Flip the order so you always append the suffix:

-    this.batchQueue = new BatchQueue({
-      redis: {
-        keyPrefix: `${options.batchQueue?.redis.keyPrefix ?? ""}batch-queue:`,
-        ...options.batchQueue?.redis,
-      },
+    if (!options.batchQueue) {
+      throw new Error("RunEngine batchQueue configuration is required");
+    }
+
+    this.batchQueue = new BatchQueue({
+      redis: {
+        ...options.batchQueue.redis,
+        keyPrefix: `${options.batchQueue.redis.keyPrefix ?? ""}batch-queue:`,
+      },
       drr: {
-        quantum: options.batchQueue?.drr?.quantum ?? 5,
-        maxDeficit: options.batchQueue?.drr?.maxDeficit ?? 50,
+        quantum: options.batchQueue.drr?.quantum ?? 5,
+        maxDeficit: options.batchQueue.drr?.maxDeficit ?? 50,
       },
       consumerCount: options.batchQueue?.consumerCount ?? 2,
       consumerIntervalMs: options.batchQueue?.consumerIntervalMs ?? 100,
       defaultConcurrency: options.batchQueue?.defaultConcurrency ?? 10,
       globalRateLimiter: options.batchQueue?.globalRateLimiter,
       startConsumers,
       tracer: options.tracer,
       meter: options.meter,
     });

(If you prefer keeping batchQueue optional, drop the throw and only construct BatchQueue when options.batchQueue is set, plus null‑check in the public batch methods.)

Also applies to: 321-348

packages/redis-worker/src/fair-queue/index.ts (2)

762-832: Fix StoredMessage usage: id vs messageId, and direct-mode message processing

There are a few tightly related correctness bugs here:

  1. Using message.messageId instead of message.id

StoredMessage defines id: string but no messageId. In both two-stage and direct modes you use message.messageId for concurrency and worker-queue keys:

// In #claimAndPushToWorkerQueue
const reserved = await this.concurrencyManager.reserve(descriptor, message.messageId);
await this.visibilityManager.release(message.messageId, queueId, queueKey, queueItemsKey);
const messageKey = `${message.messageId}:${queueId}`;

// In #processOneMessage
const reserved = await this.concurrencyManager.reserve(descriptor, message.messageId);
await this.visibilityManager.release(message.messageId, queueId, queueKey, queueItemsKey);

Because message.messageId is undefined, you end up reserving and releasing concurrency under inconsistent keys, and building worker-queue keys like "undefined:queue". This will leak concurrency slots and corrupt worker-queue behavior.

  2. Direct mode calls #processMessage with message.payload instead of the StoredMessage

#processMessage now expects a full StoredMessage:

async #processMessage(loopId: string, storedMessage: StoredMessage<...>, queueId: string)

But in #processOneMessage you call:

await this.#processMessage(loopId, message.payload, queueId);

which passes only the payload. This will fail type-checking and, at runtime, break visibility/concurrency handling inside #processMessage.

A minimal, consistent fix is:

@@ async #claimAndPushToWorkerQueue(...)
-    if (this.concurrencyManager) {
-      const reserved = await this.concurrencyManager.reserve(descriptor, message.messageId);
+    if (this.concurrencyManager) {
+      const reserved = await this.concurrencyManager.reserve(descriptor, message.id);
       if (!reserved) {
         // Release message back to queue
-        await this.visibilityManager.release(message.messageId, queueId, queueKey, queueItemsKey);
+        await this.visibilityManager.release(message.id, queueId, queueKey, queueItemsKey);
         return false;
       }
     }
@@
-    const workerQueueId = message.payload.workerQueue ?? queueId;
+    const workerQueueId = message.workerQueue ?? queueId;
@@
-    const messageKey = `${message.messageId}:${queueId}`;
+    const messageKey = `${message.id}:${queueId}`;
@@ async #processOneMessage(...)
-    if (this.concurrencyManager) {
-      const reserved = await this.concurrencyManager.reserve(descriptor, message.messageId);
+    if (this.concurrencyManager) {
+      const reserved = await this.concurrencyManager.reserve(descriptor, message.id);
       if (!reserved) {
         // Release message back to queue
-        await this.visibilityManager.release(message.messageId, queueId, queueKey, queueItemsKey);
+        await this.visibilityManager.release(message.id, queueId, queueKey, queueItemsKey);
         return false;
       }
     }
@@
-    await this.#processMessage(loopId, message.payload, queueId);
+    await this.#processMessage(loopId, message, queueId);

This makes concurrency reservation/release and worker-key construction consistent with StoredMessage and ensures direct mode uses the same processing path as worker-queue mode.

Also applies to: 1003-1067, 1073-1221, 1223-1349


838-889: Worker-queue routing likely never matches consumer queues

In worker-queue mode:

  • Producers push messages to a queue derived from the StoredMessage:
// After fixing as above:
const workerQueueId = message.workerQueue ?? queueId;
await this.workerQueueManager!.push(workerQueueId, messageKey);
  • But consumers always block-pop from worker-{consumerId}:
const loopId = `worker-${consumerId}`;
const workerQueueId = loopId; // Each consumer has its own worker queue by default

const messageKey = await this.workerQueueManager!.blockingPop(
  workerQueueId,
  this.workerQueueBlockingTimeoutSeconds,
  this.abortController.signal
);

Unless your workerQueueResolver happens to return "worker-0", "worker-1", etc., messages will be pushed to one set of keys (e.g., per-queue or per-tenant) and consumed from a completely different set (worker-{n}), so worker-queue mode will sit idle.

You’ll want a single, consistent routing scheme. Two concrete options:

  • Per-consumer worker queues: keep worker-{consumerId} naming, but have workerQueueResolver (or an internal helper) map each message deterministically to a specific consumer ID and set storedMessage.workerQueue = worker-{id}; or
  • Per-queue/per-tenant worker queues: have consumers block-pop from the same IDs workerQueueResolver writes to (e.g., via a keys.workerQueueKey(...) helper), so all consumers can process from those queues.

Right now the producer and consumer queue IDs don’t line up, so pick one model, align both push() and blockingPop() with it, and add tests to lock the behavior in; a sketch of the per-consumer option follows below.

Also applies to: 1516-1620
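
A sketch of the first option, where a deterministic hash picks the consumer so producers and consumers agree on the queue name; the hash function and the consumerCount wiring are illustrative:

function workerQueueForMessage(tenantId: string, consumerCount: number): string {
  // Any stable hash works, as long as push() and blockingPop() use the same one.
  let hash = 0;
  for (const char of tenantId) {
    hash = (hash * 31 + char.charCodeAt(0)) | 0;
  }
  return `worker-${Math.abs(hash) % consumerCount}`;
}

// Producer side: storedMessage.workerQueue = workerQueueForMessage(message.tenantId, consumerCount);
// Consumer side: keeps block-popping from `worker-${consumerId}` exactly as today.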

🧹 Nitpick comments (11)
apps/webapp/app/routes/api.v1.batches.$batchId.ts (1)

41-52: Response extensions look good; consider slightly more defensive errors access

The new successfulRunCount, failedRunCount, and errors fields are added in a backward‑compatible way and the ?? undefined usage ensures optional fields are omitted from the JSON when not present.

If you want this to be a bit more future‑proof against changes to the include above, you could guard the errors array access with optional chaining:

-      errors:
-        batch.errors.length > 0
-          ? batch.errors.map((err) => ({
+      errors:
+        batch.errors?.length
+          ? batch.errors.map((err) => ({
               index: err.index,
               taskIdentifier: err.taskIdentifier,
               error: err.error,
               errorCode: err.errorCode ?? undefined,
             }))
           : undefined,

Optionally, you might also tweak the comment to reflect that errors will be included for any batch with errors, not just PARTIAL_FAILED statuses.

packages/core/src/v3/idempotencyKeys.ts (1)

50-52: Add JSDoc documentation to makeIdempotencyKey explaining its behavior and scope.

The function makeIdempotencyKey explicitly uses scope: "run" to align with createIdempotencyKey's default behavior. However, unlike createIdempotencyKey, which has comprehensive JSDoc explaining the scope options and implications, makeIdempotencyKey lacks any documentation. This makes it unclear to callers what scope behavior to expect.

Add JSDoc similar to createIdempotencyKey (lines 55-89), along the lines of the sketch after this list, to document:

  • The purpose of the function
  • That it uses run-scoped idempotency keys by default
  • How scope affects key derivation
  • Usage examples for batch and non-batch scenarios
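
Something like this (a sketch; the wording should be adjusted to match what makeIdempotencyKey actually does):

/**
 * Resolves the given idempotency key input into a run-scoped key, matching
 * `createIdempotencyKey`'s default `scope: "run"` behavior, so the derived key
 * is unique per run rather than shared globally.
 *
 * When batch triggering, pass a distinct key per item (for example, derived
 * from the item index) so items are not collapsed into a single run; for a
 * single trigger, one key per logical operation is enough.
 */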
apps/webapp/seed.mts (1)

236-248: Consider reducing verbosity for existing projects (optional).

The environment listing is logged every time findOrCreateProject is called, even when the project already exists. While this is useful for local development to see API keys, it can be verbose when re-running the seed.

Consider only logging environments when the project is newly created, or add a flag to control verbosity:

 console.log(`✅ Project ready: ${project.name} (${project.externalRef})`);

-// list environments for this project
-const environments = await prisma.runtimeEnvironment.findMany({
-  where: { projectId: project.id },
-  select: {
-    slug: true,
-    type: true,
-    apiKey: true,
-  },
-});
-console.log(`   Environments for ${project.name}:`);
-for (const env of environments) {
-  console.log(`   - ${env.type.toLowerCase()} (${env.slug}): ${env.apiKey}`);
-}
+// list environments for this project
+const environments = await prisma.runtimeEnvironment.findMany({
+  where: { projectId: project.id },
+  select: {
+    slug: true,
+    type: true,
+    apiKey: true,
+  },
+});
+
+if (process.env.VERBOSE_SEED) {
+  console.log(`   Environments for ${project.name}:`);
+  for (const env of environments) {
+    console.log(`   - ${env.type.toLowerCase()} (${env.slug}): ${env.apiKey}`);
+  }
+}

 return { project, environments };
apps/webapp/test/engine/streamBatchItems.test.ts (1)

319-343: Clever race condition simulation via Prisma mock.

The racingPrisma object intercepts updateMany to simulate a concurrent request winning the race. The as unknown as PrismaClient cast is necessary for this mocking approach.

Consider extracting this pattern into a reusable test utility if similar race condition tests are needed elsewhere.

packages/redis-worker/src/fair-queue/workerQueue.ts (2)

93-150: Minor: Abort listener not removed on normal completion.

The abort event listener is added with { once: true }, but if blpop completes normally (without abort), the listener remains attached to the signal until it's aborted or garbage collected. For short-lived signals this is fine, but consider explicitly removing the listener in the finally block for cleaner resource management.

   async blockingPop(
     workerQueueId: string,
     timeoutSeconds: number,
     signal?: AbortSignal
   ): Promise<string | null> {
     const workerQueueKey = this.keys.workerQueueKey(workerQueueId);

     // Create a separate client for blocking operation
     // This is required because BLPOP blocks the connection
     const blockingClient = this.redis.duplicate();

+    let cleanup: (() => void) | undefined;
+
     try {
       // Set up abort handler
       if (signal) {
-        const cleanup = () => {
+        cleanup = () => {
           blockingClient.disconnect();
         };
         signal.addEventListener("abort", cleanup, { once: true });

         if (signal.aborted) {
           return null;
         }
       }

       const result = await blockingClient.blpop(workerQueueKey, timeoutSeconds);
       // ... rest of try block
     } catch (error) {
       // ... error handling
     } finally {
+      if (signal && cleanup) {
+        signal.removeEventListener("abort", cleanup);
+      }
       await blockingClient.quit().catch(() => {
         // Ignore quit errors (may already be disconnected)
       });
     }
   }

231-274: Consider extracting duplicated Lua script to a constant.

The popWithLength Lua script is duplicated between #registerCommands() (lines 235-248) and registerCommands() (lines 257-272). Consider extracting to a module-level constant for DRY compliance and easier maintenance.

+const POP_WITH_LENGTH_LUA = `
+local workerQueueKey = KEYS[1]
+
+-- Pop the first message
+local messageKey = redis.call('LPOP', workerQueueKey)
+if not messageKey then
+  return nil
+end
+
+-- Get remaining queue length
+local queueLength = redis.call('LLEN', workerQueueKey)
+
+return {messageKey, queueLength}
+`;
+
 export class WorkerQueueManager {
   // ...

   #registerCommands(): void {
     this.redis.defineCommand("popWithLength", {
       numberOfKeys: 1,
-      lua: `
-local workerQueueKey = KEYS[1]
-...
-      `,
+      lua: POP_WITH_LENGTH_LUA,
     });
   }

   registerCommands(redis: Redis): void {
     redis.defineCommand("popWithLength", {
       numberOfKeys: 1,
-      lua: `
-local workerQueueKey = KEYS[1]
-...
-      `,
+      lua: POP_WITH_LENGTH_LUA,
     });
   }
 }
packages/redis-worker/src/fair-queue/types.ts (1)

461-492: Align metadata types between enqueue and queue message interfaces.

EnqueueOptions.metadata and EnqueueBatchOptions.metadata use Record<string, string> (lines 473, 491), while QueueMessage.metadata, StoredMessage.metadata, and QueueDescriptor.metadata use Record<string, unknown> (lines 37, 55, 78). The metadata is passed through directly without conversion, creating type friction.

Change EnqueueOptions and EnqueueBatchOptions to use Record<string, unknown> for consistency, unless string-only values are required for a specific reason (e.g., Redis serialization constraints), in which case document this constraint and validate on enqueue.

apps/webapp/test/engine/triggerTask.test.ts (1)

782-782: Consider using explicit wait conditions instead of fixed setTimeout(500).

The fixed 500ms delays may cause flaky tests under load or slow CI environments. Consider polling for expected state or using event-based synchronization where possible.

Also applies to: 812-812, 887-887
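
For example, a small poll-until helper (a sketch; names are illustrative) lets the assertions wait on observable state instead of a fixed sleep:

async function waitFor(
  condition: () => boolean | Promise<boolean>,
  { timeoutMs = 5_000, intervalMs = 25 } = {}
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await condition()) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Condition not met within ${timeoutMs}ms`);
}

// e.g. await waitFor(() => processedRuns.length === expectedCount);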

packages/redis-worker/src/fair-queue/tests/workerQueue.test.ts (1)

8-8: Consider moving keys initialization into each test.

The keys variable is declared at describe scope but reassigned in every test. Since each redisTest creates a fresh instance anyway, declaring it locally within each test would be cleaner and more explicit.

 describe("WorkerQueueManager", () => {
-  let keys: FairQueueKeyProducer;
-
   describe("push and pop", () => {
     redisTest(
       "should push and pop a single message",
       { timeout: 10000 },
       async ({ redisOptions }) => {
-        keys = new DefaultFairQueueKeyProducer({ prefix: "test" });
+        const keys = new DefaultFairQueueKeyProducer({ prefix: "test" });
packages/redis-worker/src/fair-queue/schedulers/roundRobin.ts (1)

146-149: Consider adding TTL to lastServed key.

The lastServedIndex key persists indefinitely. If shards are removed or renamed, stale keys could accumulate. Consider adding an expiry (e.g., 24 hours) so stale keys are cleaned up automatically:

  async #setLastServedIndex(shardKey: string, index: number): Promise<void> {
    const key = this.#lastServedKey(shardKey);
-   await this.redis.set(key, index.toString());
+   await this.redis.set(key, index.toString(), "EX", 86400); // 24h TTL
  }
apps/webapp/app/runEngine/services/streamBatchItems.server.ts (1)

39-42: Streaming batch sealing and response semantics look correct now

The service now:

  • Validates batchFriendlyId via parseBatchFriendlyId() and returns a 400-style ServiceValidationError on malformed IDs.
  • Ensures the batch exists, is unsealed, and PENDING before accepting items.
  • Tracks itemsAccepted vs itemsDeduplicated, then compares getBatchEnqueuedCount(batchId) to batch.runCount and, on mismatch, returns a response with sealed: false, enqueuedCount, and expectedCount so clients can retry missing items instead of silently “succeeding”.
  • Seals via a conditional updateMany (sealed=false and status="PENDING"), handles concurrent sealers by re-reading the row, and surfaces an explicit error if the batch ends up in an unexpected state (see the sketch after this comment).

This addresses the earlier race and “no sealed flag” issues cleanly; the flow is consistent with the v3 StreamBatchItemsResponse schema.

One small nit: StreamBatchItemsServiceOptions is accepted but not used inside call(). If size limits are enforced solely in the route via createNdjsonParserStream(maxItemBytes), consider either dropping the options parameter here or documenting it as reserved for future behavior to avoid confusion.

Also applies to: 48-107, 108-190, 192-277
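
For reference, a minimal sketch of that conditional-seal pattern, assuming a Prisma client and the model/field names mentioned in this comment (batchTaskRun, sealed, status); the actual service may differ:

import { PrismaClient } from "@prisma/client";

// Sketch only: seal the batch exactly once, tolerating a concurrent sealer.
async function sealBatch(prisma: PrismaClient, batchId: string): Promise<boolean> {
  // Only one caller can flip sealed from false to true while the batch is still PENDING.
  const result = await prisma.batchTaskRun.updateMany({
    where: { id: batchId, sealed: false, status: "PENDING" },
    data: { sealed: true },
  });

  if (result.count === 1) {
    return true; // this request sealed the batch
  }

  // A concurrent request may have sealed it first; re-read and verify the final state.
  const batch = await prisma.batchTaskRun.findUnique({ where: { id: batchId } });
  if (!batch || !batch.sealed) {
    throw new Error(`Batch ${batchId} is in an unexpected state after the seal attempt`);
  }
  return false;
}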

Comment on lines +194 to +199
if (updates) {
organization = await prisma.organization.update({
where: { id: organization.id },
data: updates,
});
}

⚠️ Potential issue | 🟠 Major

Unconditional update may cause unexpected overwrites.

The updates parameter is applied even when the organization already exists, which means re-running the seed will overwrite existing batchQueueConcurrencyConfig values. This violates the "find or create" semantics and could reset intentional configuration changes made during testing.

Consider applying updates only during creation:

 if (!organization) {
   console.log(`Creating organization: ${title}...`);
   organization = await createOrganization({
     title: title,
     userId: user.id,
     companySize: "1-10",
   });
-}
-
-if (updates) {
-  organization = await prisma.organization.update({
-    where: { id: organization.id },
-    data: updates,
-  });
+  
+  if (updates) {
+    organization = await prisma.organization.update({
+      where: { id: organization.id },
+      data: updates,
+    });
+  }
+} else {
+  console.log(`✅ Organization already exists: ${title}`);
 }

Alternatively, if updates should always be applied, rename the function to findOrCreateAndUpdateOrganization to make the behavior explicit.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In apps/webapp/seed.mts around lines 194-199 the code unconditionally applies
`updates` via prisma.update which overwrites existing organization fields on
re-seed; change behavior to preserve "find or create" semantics by only applying
`updates` when the organization was newly created (i.e., move the update into
the creation branch or set `data` with `...updates` only in the create call), or
if the intent is to always modify existing records, rename the function to
`findOrCreateAndUpdateOrganization` and keep the update but update the function
name and its call sites to reflect the new contract.

error.message.includes("exceeds maximum size")
) {
return json({ error: error.message }, { status: 400 });
}

Bug: Invalid UTF-8 errors return wrong HTTP status code

The NDJSON parser in streamBatchItems.server.ts can throw "Invalid UTF-8" errors when the client sends malformed data, but the error handling in this route only checks for "Invalid JSON" and "exceeds maximum size" to return HTTP 400. Invalid UTF-8 errors fall through to return HTTP 500 with x-should-retry: false, which is semantically incorrect since encoding errors are client-side issues (not server errors). The error.message.includes() check is missing "Invalid UTF-8" as a condition for returning 400.
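
A minimal sketch of the widened check, continuing the catch block quoted above (the surrounding structure and exact error strings are assumed):

// Treat UTF-8 decoding failures as client errors too.
if (
  error instanceof Error &&
  (error.message.includes("Invalid JSON") ||
    error.message.includes("exceeds maximum size") ||
    error.message.includes("Invalid UTF-8"))
) {
  return json({ error: error.message }, { status: 400 });
}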


batchId,
error: error instanceof Error ? error.message : String(error),
});
}

Bug: Batch completion callback swallows errors preventing retry

The batch completion callback catches errors but doesn't re-throw them. The BatchQueue.#finalizeBatch method (in batch-queue/index.ts lines 681-697) is designed to preserve Redis data for retry only when the callback throws an error. When the error is swallowed here, the BatchQueue proceeds to call cleanup(batchId), deleting the Redis data needed for retry. If the Postgres transaction fails (e.g., database temporarily unavailable), the batch gets stuck in PROCESSING state with no recovery path since the completion data is gone.
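
A minimal sketch of a completion callback that surfaces the failure instead of swallowing it (finalizeInDatabase and logger are stand-ins, not names from the PR):

declare function finalizeInDatabase(batchId: string): Promise<void>;
declare const logger: { error(message: string, meta?: Record<string, unknown>): void };

async function onBatchCompleted(batchId: string): Promise<void> {
  try {
    await finalizeInDatabase(batchId);
  } catch (error) {
    logger.error("Failed to finalize batch", {
      batchId,
      error: error instanceof Error ? error.message : String(error),
    });
    // Re-throw so BatchQueue.#finalizeBatch keeps the Redis completion data for retry
    // instead of proceeding to cleanup(batchId).
    throw error;
  }
}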
