Linux Performance Profiling Tools, Part 1: gprof

Posted by 昵稱17328427, 2014-09-11

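1 Introduction

Improving an application's performance is time-consuming work, and it is usually not obvious which functions consume most of the execution time. The GNU toolchain provides a profiler for this purpose, the GNU profiler (gprof). gprof helps locate performance bottlenecks in programs on Linux: it reports how many times each function was called, how much time was spent in it, and the caller/callee relationships between functions. Call counts are exact; timings are estimated by sampling.

gprof user manual: http:///binutils/docs-2.17/gprof/index.html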

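2 Features

gprof is one of the GNU binutils tools and is installed by default on most Linux systems.

1. It can show a "flat profile": the number of calls to each function and the processor time each function consumed.

2. It can show a "call graph": the caller/callee relationships between functions and how much time each function call took.

3. It can show "annotated source": a copy of the program's source code in which each line is marked with the number of times it was executed.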

3 How it works

When a program is compiled and linked with the -pg option, gcc inserts a call to a function named mcount (or "_mcount", or "__mcount", depending on the compiler and operating system) into every function of the application. Every function compiled this way therefore calls mcount, and mcount maintains an in-memory call graph, finding the addresses of the child and parent functions by inspecting the call stack. The same structure also holds the call counts, timing data, and other information associated with each function.

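4 Usage workflow

1. Add the -pg option when compiling and when linking. This is usually done in the makefile.

2. Run the resulting binary, with the same arguments and in the same way as before.

3. A gmon.out file is created in the program's working directory. If a gmon.out file already exists, it is overwritten.

4. Let the process exit normally. The profile data is written to gmon.out at this point.

5. Analyze the gmon.out file with the gprof tool.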

5 Options

* -b Omit the verbose explanation of each field in the report.

* -p Print only the flat profile (the per-function time list).

* -q Print only the call graph.

* -e Name Do not print the call graph entry for function Name and its children (unless they have other parents that are not suppressed). Several -e flags may be given. Each -e flag names exactly one function.

* -E Name Do not print the call graph entry for function Name and its children. This flag is like -e, but it also excludes the time used by Name and its children from the total-time and percentage-time computations.

* -f Name Print the call graph for function Name and its children only. Several -f flags may be given. Each -f flag names exactly one function.

* -F Name Print the call graph for function Name and its children only. This flag is like -f, but the total-time and percentage-time computations use only the time of the routines that are printed. Several -F flags may be given. Each -F flag names exactly one function. -F overrides -E.

* -z Also show routines with zero usage (as computed from call counts and accumulated time).

Typical usage: gprof -b program gmon.out > report.txt

6 Reading the report

Fields of the flat profile produced by gprof:

% time	Cumulative seconds	Self seconds	Calls	Self Ts/call	Total Ts/call	Name
Share of the program's total time consumed by this function	Running total of execution time (only functions that gprof can observe are included)	Time spent in the function itself (summed over all calls)	Number of times the function was called	Average time per call, excluding time spent in the functions it calls	Average time per call, including time spent in the functions it calls	Function name

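Fields of the call graph:

Index	% time	Self	Children	Called	Name
Index number	Share of total time consumed by the function (including its children)	Time spent in the function itself	Time spent in its child functions	Number of times called	Function name

Note:

The cumulative execution time covers only functions that gprof can observe. Code running in kernel mode, and third-party library functions that were not compiled with -pg, cannot be observed by gprof (sleep() is one example).

The full list of gprof options is available via man gprof.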

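7 Shared library support

Profiling support is added by the compiler, so to get profiling information from shared libraries, those libraries must also be compiled with -pg. Some systems provide a version of the C library that was built with profiling support enabled (libc_p.a).

To profile system functions (such as those in libc), replace -lc with -lc_p. The program is then linked against libc_p.so or libc_p.a. This matters because it is the only way to see the time spent in low-level C library functions (for example memcpy(), memset(), sprintf()).

gcc example1.c -pg -lc_p -o example1

Use ldd ./example1 | grep libc to check whether the program is linked against libc.so or libc_p.so.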

8 User time and kernel time

gprof's biggest limitation is that it only measures the user time an application consumes while running; it cannot report the time the program spends in kernel space. An application normally spends some of its time running user code and some running "system code", for example the system calls behind sleep().

One way to see how the run time is split is to run the program under the time command. It shows the application's elapsed (real) time, user-space time, and kernel-space time.

For example: time ./program

Output:

real 2m30.295s
user 0m0.000s
sys 0m0.004s

In this run almost none of the two and a half minutes of elapsed time is user time, so gprof would have almost nothing to report.

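9 Notes

1. With g++, the -pg option must be used in both the compile step and the link step.

2. libc can only be linked statically. Otherwise the profiling code is called before the *.so files have been initialized, which can cause a "segmentation fault". The workaround is to add -static-libgcc or -static when building.

3. If you link directly with ld instead of g++, add the startup file /lib/gcrt0.o, for example ld -o myprog /lib/gcrt0.o myprog.o utils.o -lc_p. On some systems the file is gcrt1.o.

4. To see the time spent in third-party library functions, those libraries must also be compiled with -pg.

5. gprof only measures the user time consumed by the application.

6. The program must not run as a daemon. Otherwise no timing data is collected (call counts are still collected).

7. Running the program under time first is a good way to judge whether gprof will produce useful information.

8. If gprof does not fit your profiling needs, other tools overcome some of its shortcomings, including OProfile and Sysprof.

9. gprof is most useful for CPU-bound programs whose code runs mostly in user space. Programs that spend most of their time in kernel space, or that run slowly because of external factors (for example an overloaded I/O subsystem), are hard to optimize with it.

10. gprof does not support multithreaded applications; in a multithreaded program only the main thread's data is collected. The reason is that gprof relies on the ITIMER_PROF signal, and only the main thread responds to that signal. There is a simple workaround: http://sam./writings/programming/gprof.html

11. gprof can produce its data file (gmon.out) only after the program has exited normally.

a) Reason: gprof writes its results from a function registered with atexit(). An abnormal termination does not run the atexit() handlers, so no gmon.out file is produced.

b) The program should exit by returning from main or by calling exit().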

10 Multithreaded applications

As noted above, gprof collects data only for the main thread of a multithreaded program, because only the main thread responds to the ITIMER_PROF signal.

How can all threads be profiled? The key is to make every thread respond to ITIMER_PROF. This can be done with a stub function that overrides pthread_create.

//////////////////// gprof-helper.c ////////////////////////////

#define _GNU_SOURCE
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
#include <pthread.h>

static void *wrapper_routine(void *);

/* Original pthread function */
static int (*pthread_create_orig)(pthread_t *__restrict,
                                  const pthread_attr_t *__restrict,
                                  void *(*)(void *),
                                  void *__restrict) = NULL;

/* Library initialization function: runs when the shared object is loaded */
void wooinit(void) __attribute__((constructor));

void wooinit(void)
{
    /* Look up the real pthread_create in the next object in search order */
    pthread_create_orig = dlsym(RTLD_NEXT, "pthread_create");
    fprintf(stderr, "pthreads: using profiling hooks for gprof\n");
    if (pthread_create_orig == NULL)
    {
        char *error = dlerror();
        if (error == NULL)
        {
            error = "pthread_create is NULL";
        }
        fprintf(stderr, "%s\n", error);
        exit(EXIT_FAILURE);
    }
}

/* Our data structure passed to the wrapper */
typedef struct wrapper_s
{
    void *(*start_routine)(void *);
    void *arg;
    pthread_mutex_t lock;
    pthread_cond_t wait;
    struct itimerval itimer;
} wrapper_t;

/* The wrapper function in charge of setting the itimer value */
static void *wrapper_routine(void *data)
{
    /* Copy the user data to locals before the creator's stack frame goes away */
    void *(*start_routine)(void *) = ((wrapper_t *)data)->start_routine;
    void *arg = ((wrapper_t *)data)->arg;

    /* Set the profile timer value */
    setitimer(ITIMER_PROF, &((wrapper_t *)data)->itimer, NULL);

    /* Tell the calling thread that we don't need its data anymore */
    pthread_mutex_lock(&((wrapper_t *)data)->lock);
    pthread_cond_signal(&((wrapper_t *)data)->wait);
    pthread_mutex_unlock(&((wrapper_t *)data)->lock);

    /* Call the real function */
    return start_routine(arg);
}

/* Our wrapper function for the real pthread_create() */
int pthread_create(pthread_t *__restrict thread,
                   const pthread_attr_t *__restrict attr,
                   void *(*start_routine)(void *),
                   void *__restrict arg)
{
    wrapper_t wrapper_data;
    int i_return;

    /* Initialize the wrapper structure */
    wrapper_data.start_routine = start_routine;
    wrapper_data.arg = arg;
    getitimer(ITIMER_PROF, &wrapper_data.itimer);
    pthread_cond_init(&wrapper_data.wait, NULL);
    pthread_mutex_init(&wrapper_data.lock, NULL);
    pthread_mutex_lock(&wrapper_data.lock);

    /* The real pthread_create call */
    i_return = pthread_create_orig(thread, attr, &wrapper_routine, &wrapper_data);

    /* If the thread was successfully spawned, wait for the data
     * to be released */
    if (i_return == 0)
    {
        pthread_cond_wait(&wrapper_data.wait, &wrapper_data.lock);
    }

    pthread_mutex_unlock(&wrapper_data.lock);
    pthread_mutex_destroy(&wrapper_data.lock);
    pthread_cond_destroy(&wrapper_data.wait);

    return i_return;
}

///////////////////

Then build it as a shared library: gcc -shared -fPIC gprof-helper.c -o gprof-helper.so -lpthread -ldl

Usage example:

///////////////////// a.c /////////////////////////////

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
#include <string.h>

void fun1(void);
void fun2(void);
void *fun(void *argv);

int main()
{
    int i = 0;
    int id;
    pthread_t thread[100];

    for (i = 0; i < 100; i++)
    {
        id = pthread_create(&thread[i], NULL, fun, NULL);
        if (id != 0)
        {
            fprintf(stderr, "pthread_create failed: %s\n", strerror(id));
            exit(EXIT_FAILURE);
        }
        printf("thread =%d\n", i);
    }

    /* Wait for the workers; returning from main early would end them */
    for (i = 0; i < 100; i++)
    {
        pthread_join(thread[i], NULL);
    }

    printf("dsfsd\n");
    return 0;
}

void *fun(void *argv)
{
    (void)argv;
    fun1();
    fun2();
    return NULL;
}

void fun1(void)
{
    int i = 0;
    while (i < 100)
    {
        i++;
        printf("fun1\n");
    }
}

void fun2(void)
{
    int i = 0;
    while (i < 50)
    {
        i++;
        printf("fun2\n");
    }
}

///////////////

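Compile the example and link it against the helper library:

gcc -pg a.c gprof-helper.so -lpthread

Run the program (the dynamic loader must be able to find gprof-helper.so, here via LD_LIBRARY_PATH):

LD_LIBRARY_PATH=. ./a.out

Analyze gmon.out:

gprof -b a.out gmon.out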
