implement native stack unwind in arm64 #300

Open
zdyj3170101136 opened this issue Jan 10, 2025 · 10 comments

Comments

@zdyj3170101136

On arm64, systems below Linux 6 cannot use fp stack unwinding.

Therefore, I suggest that we implement our own fp stack unwinding to support arm64 below Linux 6.

In addition, for Go, if the call chain is a -> b -> c and fp is used for stack unwinding, then a sample taken while executing in c unwinds to a -> c, one function short.
On arm64, we can use the link register (lr) to solve this problem.

@zdyj3170101136
Author

@rockdaboot @umanwizard

@florianl
Contributor

florianl commented Jan 10, 2025

Hi @zdyj3170101136
This project does not rely on FP to unwind the stack. The minimum supported kernel version on amd64 and arm64 is determined by the availability, on each architecture, of the eBPF helper functions needed to unwind the stack without FP. So the minimum supported Linux kernel version is 4.19 on amd64 and 5.5 on arm64.

Does this resolve your request?

@zdyj3170101136
Author

As far as I know, the eh_frame of Go functions is incomplete, so stack unwinding can only use debug_frame.
But debug_frame requires a lot of memory.
There is also .gopclntab, which can be used for unwinding, but it is very CPU-intensive.
In fact, Go itself has also switched to fp for stack unwinding to reduce CPU consumption:

https://github.com/golang/go/blob/932ec2be8d01e553a768df3709182abf2b579097/src/runtime/tracestack.go#L255

@florianl
Contributor

This project supports unwinding Go via .gopclntab. For details, feel free to check out https://github.com/open-telemetry/opentelemetry-ebpf-profiler/blob/main/nativeunwind/elfunwindinfo/elfgopclntab.go. At some point, unwinding the Go stack using Go's internal FP could be an option for newer Go versions that support it.

@zdyj3170101136
Author

zdyj3170101136 commented Jan 10, 2025

> https://github.com/open-telemetry/opentelemetry-ebpf-profiler/blob/main/nativeunwind/elfunwindinfo/elfgopclntab.go

Go itself replaced .gopclntab with fp for this purpose,

see https://blog.felixge.de/reducing-gos-execution-tracer-overhead-with-frame-pointer-unwinding/.

Also, the lack of arm64 fp unwinding below Linux 6 affects all languages, not only Go.

@florianl
Contributor

In the future, unwinding the Go stack using Go's internal FP could be an optimization for Go versions that support it. But to keep backwards compatibility, I think unwinding the Go stack via .gopclntab will be kept.

@fabled
Contributor

fabled commented Jan 10, 2025

> In arm64, systems below Linux 6 cannot use fp stack unwinding.

What do you mean by "Linux 6"? Linux kernel 6 or some distribution version? Why can't it use fp unwinding?
Some distributions intentionally enable frame pointers for C code, but we don't use them. We use .eh_frame and .gopclntab, if available, for native code.

> Therefore, I suggest that we implement our own fp stack unwinding to support arm64 below linux 6.

We support fp unwinding. In fact, we use fp unwinding for Go by default, see https://github.com/open-telemetry/opentelemetry-ebpf-profiler/blob/main/nativeunwind/elfunwindinfo/elfgopclntab.go#L269-L273.

> In addition, for the go language, if there is a function call chain of a -> b -> c and fp is used for stack unwinding, then when the function is in c, the call chain of fp stack unwinding is a -> c, one function less. On arm64, we can use the lr link register to solve this problem.

This is a traditional problem on architectures without lr. We do support lr unwinding for .eh_frame, but it seems the general heuristic for frame pointer unwinding does not take advantage of it. We should probably add a mode (or modify the existing FP mode) that uses lr when unwinding the first user-mode frame, and fp after that. cc @athre0z @christos68k
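
To make the "lr for the first frame, fp after that" idea concrete, here is a minimal toy model in Go. It is not the profiler's eBPF code: the `regs` and `memory` types, the `unwind` function, and the `useLR` flag are all hypothetical names invented for this sketch. It assumes the standard arm64 frame record layout (caller's fp at `[fp]`, return address at `[fp+8]`) and assumes the caller already knows that the sampled function has not pushed its own frame record; a real unwinder would have to decide that from unwind information or a heuristic.

```go
package main

// regs holds the sampled arm64 user-mode registers relevant to unwinding.
// Hypothetical type for this sketch only.
type regs struct {
	pc, lr, fp uint64
}

// memory is a toy stand-in for user stack memory: address -> 8-byte word.
type memory map[uint64]uint64

// unwind walks arm64 frame records, where [fp] holds the caller's fp and
// [fp+8] holds the return address. When useLR is true, the link register
// supplies the first caller before the frame records are walked. The caller
// must only set useLR when the sampled function has no frame record of its
// own (an assumption of this sketch).
func unwind(r regs, mem memory, useLR bool, maxFrames int) []uint64 {
	trace := []uint64{r.pc}
	if useLR {
		trace = append(trace, r.lr)
	}
	fp := r.fp
	for fp != 0 && len(trace) < maxFrames {
		ret, ok := mem[fp+8]
		if !ok || ret == 0 {
			break // unreadable or empty return address: stop
		}
		trace = append(trace, ret)
		next, ok := mem[fp]
		if !ok || next <= fp {
			break // frame records must move towards higher addresses
		}
		fp = next
	}
	return trace
}
```

With b's frame record at 0x0f00 holding a return address into a, and c sampled before it sets up a frame (so fp still points at b's record and lr points into b), calling `unwind` with `useLR` set to false yields only c and a, while `useLR` set to true yields c, b, a.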

> There is also .gopclntab that can be used for unwinding, but it is very CPU-intensive.
> In fact, go itself has also switched to using fp for stack unwinding to reduce CPU consumption

This does not apply to us. At attach time, the .gopclntab is translated to an internal representation that is fast to look up. Even when not using fp unwinding, we are very fast.

However, for Go, this is a problem because Go executables are statically built and tend to be huge. This also results in millions of unwinding entries, which take a lot of memory. For this reason, we default to fp unwinding of Go on arm64. For x86, we do a mix: we use fp for most of it, but have a heuristic to do stack delta unwinding for functions that need it.
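
As a rough illustration of why a pre-translated table is cheap to query but costly to store, here is a minimal sketch in Go. It is not the profiler's actual data structure: the `stackDelta` type, its fields, and the `lookup` function are hypothetical and assume a table sorted by start address, with each entry valid until the next one begins.

```go
package main

import "sort"

// stackDelta is a hypothetical unwind entry: from addr up to the next
// entry's addr, the caller's frame is found at SP + cfaOffset.
type stackDelta struct {
	addr      uint64
	cfaOffset int32
}

// lookup returns the entry covering pc in a table sorted by addr.
// It runs in O(log n) using binary search.
func lookup(table []stackDelta, pc uint64) (stackDelta, bool) {
	// Find the first entry that starts after pc; the one before it covers pc.
	i := sort.Search(len(table), func(i int) bool { return table[i].addr > pc })
	if i == 0 {
		return stackDelta{}, false // pc lies below the first entry
	}
	return table[i-1], true
}
```

Each lookup is a handful of comparisons, but the table itself grows with the binary: in this layout an entry occupies 16 bytes on a 64-bit target, so a million entries already need about 16 MB.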

@zdyj3170101136
Author

zdyj3170101136 commented Jan 10, 2025

Given a Go program profiled with perf in fp mode, you can see that fp unwinding of the user stack has a problem.

The function bpf_get_stackid could not unwind successfully; I do not know the reason.

perf record -a -g -F 100 -p 742579
perf script
testjson 742580 3029651.742305:          1 cycles:
        ffff8000081d7378 arch_local_irq_enable+0x8 ([kernel.kallsyms])
        ffff800008e390c0 __schedule+0x208 ([kernel.kallsyms])
        ffff800008e39444 schedule+0x4c ([kernel.kallsyms])
        ffff800008e3dd24 do_nanosleep+0x74 ([kernel.kallsyms])
        ffff800008277750 hrtimer_nanosleep+0x90 ([kernel.kallsyms])
        ffff800008277870 __arm64_sys_nanosleep+0x98 ([kernel.kallsyms])
        ffff80000816b00c el0_svc_common.constprop.0+0x84 ([kernel.kallsyms])
        ffff80000816b244 do_el0_svc+0x74 ([kernel.kallsyms])
        ffff800008e3258c el0_svc+0x1c ([kernel.kallsyms])
        ffff800008e32df0 el0_sync_handler+0xa8 ([kernel.kallsyms])
        ffff800008151de8 el0_sync+0x168 ([kernel.kallsyms])
                  46fd78 runtime.usleep.abi0+0x48 (/root/testjson/testjson)

Here is my Linux version:

cat /proc/version
Linux version 5.10.134-17.3.al8.aarch64 ([email protected]) (gcc (GCC) 10.2.1 20200825 (Alibaba 10.2.1-3.8 2.32), GNU ld version2.35-12.3.al8) #1 SMP Thu Oct 31 14:27:09 CST 2024

In fact, Linux 6 already solves the missing leaf-caller problem, see https://github.com/torvalds/linux/blob/2144da25584eb10b84252230319b5783f6a83041/tools/perf/util/arm64-frame-pointer-unwind-support.c#L31

@fabled
Contributor

fabled commented Jan 10, 2025

> the func bpf_get_stackid could not unwind successfully, i do not know the reason.

We never use this function. All unwinding is done by custom eBPF code at:
https://github.com/open-telemetry/opentelemetry-ebpf-profiler/tree/main/support/ebpf

The native unwinder code is at:
https://github.com/open-telemetry/opentelemetry-ebpf-profiler/blob/main/support/ebpf/native_stack_trace.ebpf.c

@zdyj3170101136
Author

If you read fp with bpf_probe_read_user, there seems to be no problem, but I have not tested it.
