ml: ensure context is registered for REGEX type #11231

mirko-lazarevic · 2025-12-01T11:19:43Z

This fix ensures that when the buffer is
flushed, the record carries a proper timestamp
and metadata instead of just the "log" field.


Summary by CodeRabbit

Bug Fixes
- Multiline processing now registers first-line context earlier (including pre-concatenation when starting fresh) and avoids packing metadata for records truncated during processing, preventing metadata loss.
Tests
- Added regression tests to ensure full per-record metadata (time, stream, file, log) is preserved across multiline flushes, slow arrivals, and truncation/continuation boundaries.

coderabbitai · 2025-12-01T11:20:04Z

Walkthrough

The multiline parser now registers the first-line context earlier when the group's first-line buffer is empty and avoids packing metadata for truncated content; new unit tests verify metadata preservation across flushes and truncation boundaries.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Multiline context & metadata logic (`src/multiline/flb_ml.c`) | Register `full_map` as context earlier when `stream_group->mp_sbuf.size == 0` (including ENDSWITH path) and require `!truncated` when deciding to pack metadata, preventing metadata packing for truncated content. |
| Multiline metadata regression tests (`tests/internal/multiline.c`) | Add tests for issue 10576: introduce `metadata_result`, `flush_callback_metadata_check`, `append_log_with_metadata`, `test_issue_10576`, and `test_issue_truncation_10576`; register tests in `TEST_LIST` to assert per-record stream/file metadata presence and behavior across truncation. |
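
The ordering described in the table can be illustrated with a small self-contained sketch. It is not the code from `flb_ml.c`: every name below (`group_sketch`, `register_context_if_empty`, `append_line`, `flush_group`) is hypothetical, `buf_len` only plays the role of `mp_sbuf.size`, and a plain string stands in for the `full_map` object. The point is only that the context is captured when the group starts fresh, so a later flush still has the first record's metadata available.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a multiline stream group (not the Fluent Bit struct). */
struct group_sketch {
    char        buf[256];   /* concatenated multiline content */
    size_t      buf_len;    /* plays the role of mp_sbuf.size */
    const char *context;    /* metadata of the record that started the group */
};

/* Capture the metadata only when the group starts from an empty buffer. */
static void register_context_if_empty(struct group_sketch *g, const char *metadata)
{
    if (g->buf_len == 0) {
        g->context = metadata;
    }
}

/* Append one line; the context is registered before any concatenation. */
static void append_line(struct group_sketch *g, const char *line, const char *metadata)
{
    size_t room = sizeof(g->buf) - g->buf_len - 1;
    size_t n = strlen(line);

    register_context_if_empty(g, metadata);
    if (n > room) {
        n = room;           /* clamp so the sketch never overflows buf */
    }
    memcpy(g->buf + g->buf_len, line, n);
    g->buf_len += n;
    g->buf[g->buf_len] = '\0';
}

/* Flush: return the stored context and reset the group for the next record. */
static const char *flush_group(struct group_sketch *g)
{
    const char *ctx = g->context;

    g->buf_len = 0;
    g->buf[0] = '\0';
    g->context = NULL;
    return ctx;
}
```

In this sketch, appending a first line with one metadata value and a continuation line with another leaves the first value as the group's context; after `flush_group`, the next appended line registers its own context.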

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

- Review placement of `flb_ml_register_context()` calls to ensure they're only invoked when `mp_sbuf.size == 0`.
- Confirm the updated metadata packing guard (`!truncated && processed && metadata != NULL`) doesn't unintentionally drop metadata in valid flows.
- Validate the new tests for determinism and that they correctly simulate truncation/continuation edge cases.
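
The guard quoted in the second item can be written out as a tiny predicate to make its truth table easy to check. This is a sketch only: `should_pack_metadata` is a hypothetical helper, and the parameter types are assumptions (the real variables in `flb_ml.c` may be plain integers and a msgpack object pointer).

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring the quoted guard:
 * metadata is packed only for content that was processed, was not
 * truncated, and actually has a metadata object attached.
 */
static bool should_pack_metadata(bool truncated, bool processed, const void *metadata)
{
    return !truncated && processed && metadata != NULL;
}
```

Only one of the possible input combinations packs metadata: not truncated, processed, and a non-NULL metadata pointer. Any truncated record is skipped regardless of the other two inputs, which is the flow a reviewer would want to compare against the valid, non-truncated paths.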

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The PR title 'ml: ensure context is registered for REGEX type' is partially related to the changeset. The changes involve context registration in multiline processing with conditional logic on empty buffers and truncation boundaries, but the title specifically references 'REGEX type' which is not explicitly mentioned in the file-level summaries and may not represent the main point of the changes. | Clarify whether 'REGEX type' is the key aspect being fixed, or consider a more descriptive title that captures the core issue (e.g., metadata preservation during multiline flush/truncation) if that is the primary change. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%. |

📜 Recent review details

📥 Commits

Reviewing files that changed from the base of the PR and between fa27f8d and 79afb58.

📒 Files selected for processing (2)

src/multiline/flb_ml.c (2 hunks)
tests/internal/multiline.c (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/multiline/flb_ml.c

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

🧬 Code graph analysis (1)

tests/internal/multiline.c (4)

src/multiline/flb_ml.c (1)

flb_ml_append_object (764-863)

src/flb_config.c (1)

flb_config_exit (488-672)

src/multiline/flb_ml_parser.c (4)

flb_ml_parser_create (200-224)

flb_ml_parser_init (131-141)

flb_ml_parser_instance_create (261-312)

flb_ml_parser_instance_set (315-340)

src/multiline/flb_ml_stream.c (1)

flb_ml_stream_create (223-276)

🔇 Additional comments (1)

tests/internal/multiline.c (1)

1646-2090: LGTM! Excellent test coverage for metadata preservation.

The new unit tests comprehensively validate the fix for issue #10576:

- `test_issue_10576` properly simulates slow log arrival by flushing after each line and verifies all records maintain complete metadata (stream and file fields).
- `test_issue_truncation_10576` correctly validates metadata isolation across truncation boundaries, ensuring the second multiline group gets its own fresh metadata rather than inheriting stale values.
- Helper infrastructure (`metadata_result` struct, `flush_callback_metadata_check`, `append_log_with_metadata`) is well-designed, bounds-safe, and properly manages msgpack resources.
- Test assertions are thorough and include helpful diagnostic output.

All previous review feedback (timestamp length, typo, style) has been addressed in prior commits.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

src/multiline/flb_ml.c

mirko-lazarevic · 2025-12-01T11:31:48Z

This pull request should address issue #10576.

For an example Fluent Bit configuration and the steps to reproduce the issue, see #10576.

Output after the fix:

Fluent Bit v4.2.1
* Copyright (C) 2015-2025 The Fluent Bit Authors
* Fluent Bit is a CNCF graduated project under the Fluent organization
* https://fluentbit.io

______ _                  _    ______ _ _             ___   _____
|  ___| |                | |   | ___ (_) |           /   | / __  \
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __/ /| | `' / /'
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / /_| |   / /
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /\___  |_./ /___
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/     |_(_)_____/

             Fluent Bit v4.2 – Direct Routes Ahead
         Celebrating 10 Years of Open, Fluent Innovation!

[2025/12/01 12:30:43.528267000] [ info] [fluent bit] version=4.2.1, commit=10ebd3a354, pid=6123
[2025/12/01 12:30:43.528771000] [ info] [storage] ver=1.5.4, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2025/12/01 12:30:43.528993000] [ info] [simd    ] disabled
[2025/12/01 12:30:43.528998000] [ info] [cmetrics] version=1.0.5
[2025/12/01 12:30:43.529349000] [ info] [ctraces ] version=0.6.6
[2025/12/01 12:30:43.529578000] [ info] [input:tail:tail.0] initializing
[2025/12/01 12:30:43.529585000] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
[2025/12/01 12:30:43.530012000] [ info] [input:tail:tail.0] multiline core started
[2025/12/01 12:30:43.530308000] [ info] [input:tail:tail.0] thread instance initialized
[2025/12/01 12:30:43.530546000] [ info] [filter:multiline:ml-detect] created emitter: emitter_for_ml-detect
[2025/12/01 12:30:43.530591000] [ info] [input:emitter:emitter_for_ml-detect] initializing
[2025/12/01 12:30:43.530596000] [ info] [input:emitter:emitter_for_ml-detect] storage_strategy='memory' (memory only)
[2025/12/01 12:30:43.530916000] [ info] [output:stdout:stdout.0] worker #0 started
[2025/12/01 12:30:43.531683000] [ info] [http_server] listen iface=0.0.0.0 tcp_port=8081
[2025/12/01 12:30:43.531917000] [ info] [sp] stream processor started
[2025/12/01 12:30:43.532206000] [ info] [engine] Shutdown Grace Period=5, Shutdown Input Grace Period=2


[2025/12/01 12:30:49.787352000] [ info] [filter:multiline:ml-detect] created new multiline stream for tail.0_kube.Users.mila.dev.tmp.fluent-bit-10576-go-repro.output.log
[0] kube.Users.mila.dev.tmp.fluent-bit-10576-go-repro.output.log: [[1764588649.296478000, {}], {"time"=>"2025-12-01T12:30:49.296478+01:00", "stream"=>"stdout", "_p"=>"F", "log"=>"Mon Dec  1 11:30:49 UTC 2025 Likely to fail", "file"=>"/Users/mila/dev/tmp/fluent-bit-10576-go-repro/output.log"}]
[0] kube.Users.mila.dev.tmp.fluent-bit-10576-go-repro.output.log: [[1764588654.298018000, {}], {"time"=>"2025-12-01T12:30:54.298018+01:00", "stream"=>"stdout", "_p"=>"F", "log"=>"Mon Dec  1 11:30:54 UTC 2025 Likely to fail", "file"=>"/Users/mila/dev/tmp/fluent-bit-10576-go-repro/output.log"}]
[0] kube.Users.mila.dev.tmp.fluent-bit-10576-go-repro.output.log: [[1764588659.299245000, {}], {"time"=>"2025-12-01T12:30:59.299245+01:00", "stream"=>"stdout", "_p"=>"F", "log"=>"2025-12-01T11:30:59+00:00 should be ok", "file"=>"/Users/mila/dev/tmp/fluent-bit-10576-go-repro/output.log"}]
[0] kube.Users.mila.dev.tmp.fluent-bit-10576-go-repro.output.log: [[1764588667.512873000, {}], {"time"=>"2025-12-01T12:31:07.512873+01:00", "stream"=>"stdout", "_p"=>"F", "log"=>"Mon Dec  1 11:31:07 UTC 2025 Likely to fail", "file"=>"/Users/mila/dev/tmp/fluent-bit-10576-go-repro/output.log"}]
[0] kube.Users.mila.dev.tmp.fluent-bit-10576-go-repro.output.log: [[1764588672.513383999, {}], {"time"=>"2025-12-01T12:31:12.513384+01:00", "stream"=>"stdout", "_p"=>"F", "log"=>"Mon Dec  1 11:31:12 UTC 2025 Likely to fail", "file"=>"/Users/mila/dev/tmp/fluent-bit-10576-go-repro/output.log"}]

patrick-stephens · 2025-12-01T11:41:54Z

@mirko-lazarevic maybe tweak the commit slightly as having ml: in there is redundant and confusing.

Can you add some unit tests as well? I really like to see those as next time the code is refactored/updated it will prevent a similar problem.

patrick-stephens · 2025-12-01T11:42:29Z

The CIFuzz failure is down to something else so can be ignored: #11227

mirko-lazarevic · 2025-12-01T12:00:03Z

@patrick-stephens

> @mirko-lazarevic maybe tweak the commit slightly as having ml: in there is redundant and confusing.

I saw exactly the same commit message style from one of the maintainers, which is why I did the same. Anyway, I removed ml:.

I'll see if I can add some unit tests, although my knowledge in this area is limited.

mirko-lazarevic · 2025-12-02T22:26:16Z

> @mirko-lazarevic maybe tweak the commit slightly as having ml: in there is redundant and confusing.
>
> Can you add some unit tests as well? I really like to see those as next time the code is refactored/updated it will prevent a similar problem.

@patrick-stephens Done

mirko-lazarevic · 2025-12-10T09:28:18Z

> Hey, the patch is good, but one commit does not fit our commit message policy:
>
> ❌ Commit 23a2e18 failed:
> Subject prefix 'multiline:' does not match files changed.
> Expected one of: ml:

Done.

mirko-lazarevic · 2025-12-10T11:43:23Z

@cosmo0920 @patrick-stephens I'm not 100% sure what needs to be done regarding the commit message linter. Do I need to squash my commits into a single one? Thank you.

patrick-stephens · 2025-12-10T12:53:25Z

The prefix should match the files being changed; it looks like tests: is expected for some of them. If you check the output from the CI you can see: https://github.com/fluent/fluent-bit/actions/runs/20093737550/job/57646949415?pr=11231
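
As a rough illustration of "the prefix should match the files being changed", here is a minimal sketch of such a check. The mapping is an assumption inferred only from the linter messages quoted in this thread (`ml:` for `src/multiline/`, `tests:` for `tests/`); `expected_prefix` and `subject_matches` are hypothetical names, and the real linter's rules, especially for commits that touch several areas, are not shown here and may differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical path-to-prefix mapping, inferred from the quoted linter output. */
static const char *expected_prefix(const char *path)
{
    if (strncmp(path, "tests/", 6) == 0) {
        return "tests:";
    }
    if (strncmp(path, "src/multiline/", 14) == 0) {
        return "ml:";
    }
    return NULL; /* unknown area: this sketch makes no claim */
}

/* Returns 1 when the commit subject starts with the prefix expected for path. */
static int subject_matches(const char *subject, const char *path)
{
    const char *prefix = expected_prefix(path);

    return prefix != NULL && strncmp(subject, prefix, strlen(prefix)) == 0;
}
```

Under these assumptions, a subject beginning with `multiline:` would be rejected for `src/multiline/flb_ml.c`, while `ml:` would pass, which matches the first failure reported above.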

cosmo0920 · 2025-12-11T08:59:57Z

The commit linter still complains about one commit:
❌ Commit 01f3911 failed:
Subject prefix 'tests:' does not match files changed.
Expected one of: ml:, tests:
Commit prefix validation failed.
Error: Process completed with exit code 1.

mirko-lazarevic requested review from cosmo0920 and edsiper as code owners December 1, 2025 11:19

github-actions bot added the docs-required label Dec 1, 2025

chatgpt-codex-connector bot reviewed Dec 1, 2025

mirko-lazarevic mentioned this pull request Dec 1, 2025

Multiline filter issue, the timestamp and metadata of some logs is missing #10576

Open

patrick-stephens changed the title ~~multiline: ml: ensure context is registered for REGEX type~~ multiline: ensure context is registered for REGEX type Dec 1, 2025

mirko-lazarevic force-pushed the master branch from a398968 to 23a2e18 Compare December 1, 2025 11:49

mirko-lazarevic requested a review from patrick-stephens December 10, 2025 08:55

mirko-lazarevic changed the title ~~multiline: ensure context is registered for REGEX type~~ ml: ensure context is registered for REGEX type Dec 10, 2025

mirko-lazarevic added 2 commits December 10, 2025 14:36

tests: Add unit tests

ac0ab6a

Signed-off-by: Mirko Lazarevic <mirko.lazarevic@ibm.com>

tests: improve unit tests

128f9e4

Signed-off-by: Mirko Lazarevic <mirko.lazarevic@ibm.com>

mirko-lazarevic force-pushed the master branch from 916c20a to fa27f8d Compare December 10, 2025 13:37

mirko-lazarevic added 3 commits December 11, 2025 11:30

ml: handle TRUNCATED return case

2a27cbb

Addresses PR comments and adds corresponding unit tests

Signed-off-by: Mirko Lazarevic <mirko.lazarevic@ibm.com>

tests: conding style fix

1d8954f

Signed-off-by: Mirko Lazarevic <mirko.lazarevic@ibm.com>

tests: fix typo

79afb58

Signed-off-by: Mirko Lazarevic <mirko.lazarevic@ibm.com>

mirko-lazarevic force-pushed the master branch from fa27f8d to 79afb58 Compare December 11, 2025 10:30
