性能优化PPT6

发表于 2025/03/29

作者

2 分钟阅读

性能优化PPT6

Session 5: Profiling

Lecturer: Anne Reinarz

🧠 核心目标

在大型代码库中使用 Profiling 识别热点，优化性能瓶颈。

🔍 为什么需要 Profiling？

Performance counters 不适用于大规模代码（太难标注）。
Profiling 帮助我们：
1. 找到耗时最多的部分（hotspots）
2. 量化这些部分的性能
3. 优化它们

📊 Profiling 类型

1. Sampling

✅ 不需要修改源码
⚠️ 只提供统计模型（不够精确）
⚠️ 对小函数或短程序不够详细
✅ 适合长时间运行的程序

特点

程序中断 + 定期采样 + 栈快照
有一定误差

2. Instrumentation

✅ 极度详细，需要源码注释
⚠️ 需预处理源码
⚠️ 对小函数开销大

Tracing

显式测量，精度高但信息少，成本高

🛠 使用 `gprof` 做采样分析

基本流程

编译时加 profiling 选项：

  
gcc -pg -g <source_file> -o <executable_name>

运行程序 → 生成 gmon.out

生成报告：

  
gprof <executable_name> gmon.out        # 平面信息 + 调用图
gprof -A <executable_name> gmon.out     # 带注释的源码

📄 gprof 输出解读

1. Flat Profile

显示各函数耗时、调用次数：

%  cumulative  self  self  total
time seconds   seconds calls s/call s/call name
99.82   5.70     5.70     2   2.85   2.85  basic_gemm

self time：函数本身耗时
total time：包含子函数总耗时

2. Call Graph

展示调用关系及各函数耗时、调用路径：

[1] 100.0 0.00 5.71 1 bench
      5.70 0.00 2/2 basic_gemm

3. Annotated Source

源码中显示每行执行次数，例如：

  
2 -> {
   const int ilim = ...
}

🧩 优化工作流程

识别热点函数
找到相关代码段
确认算法实现
加入 Instrumentation 标记（如 likwid API）
用性能模型做进一步分析

MISCADA, GPU Course

GPU Programming

本文由作者按照 CC BY 4.0 进行授权

Session 5: Profiling

🧠 核心目标

🔍 为什么需要 Profiling？

📊 Profiling 类型

1. Sampling

特点

2. Instrumentation

Tracing

🛠 使用 gprof 做采样分析

基本流程

📄 gprof 输出解读

1. Flat Profile

2. Call Graph

3. Annotated Source

🧩 优化工作流程

热门标签

🛠 使用 `gprof` 做采样分析