The .NET EXE Loading Process, Seen Through the Source Code

zqx1000 2006-09-19


Title: [Original] The .NET EXE Loading Process, Seen Through the Source Code
Author: tankaiha
Date: 2006-09-11, 18:24
Link: http://bbs./showthread.php?threadid=31799

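The source code here is of course not the .NET Framework's own source. Microsoft did, however, release the source of an open-source CLI code-named Rotor, which you can think of as a lightweight .NET Framework. Most importantly, the two work in broadly the same way. Today we use the Rotor source to look at the most basic thing in program debugging: how an exe file is loaded at run time. As before, I list my references first so that nobody can accuse me of plagiarism: "Inside the Rotor CLI", and another book, "Shared Source CLI", which unfortunately cannot be found online. You will also need to download the sscli 2.0 archive from the MSDN website.
As under Win32, the system provides a loader that reads the exe in. The sscli ships an example of such a loader: clix.exe. For now we treat it as the system's default loader and look at its source (clix.cpp). Note the call to _CorExeMain2, which was highlighted in red in the original post.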

Code:
DWORD Launch(WCHAR* pFileName, WCHAR* pCmdLine)
{
    WCHAR exeFileName[MAX_PATH + 1];
    DWORD dwAttrs;
    DWORD dwError;
    DWORD nExitCode;
...
// a series of file attribute checks is performed here
...

    if (dwError != ERROR_SUCCESS) {
        // We can't find the file, or there's some other problem. Exit with an error.
        fwprintf(stderr, L"%s: ", pFileName);
        DisplayMessageFromSystem(dwError);
        return 1;   // error
    }
    nExitCode = _CorExeMain2(NULL, 0, pFileName, NULL, pCmdLine);
    // _CorExeMain2 never returns with success
    _ASSERTE(nExitCode != 0);
    DisplayMessageFromSystem(::GetLastError());
    return nExitCode;
}

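Here we meet the famous CorExeMain. Remember that when you open a .NET PE file in a PE editor, it imports only a single function, mscoree.dll!_CorExeMain? Odd, so why is it not _CorExeMain2? That is just one of the small differences between Rotor and the commercial Framework. If you disassemble mscoree.dll with IDA Pro, you will see that _CorExeMain() is merely a relay. The code is as follows: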

Code:
.text:79011B47                 push    offset a_corexemain ; "_CorExeMain"
.text:79011B4C                 push    [ebp+hModule]   ; hModule
.text:79011B4F                 call    ds:__imp__GetProcAddress@8 ; GetProcAddress(x,x)
.text:79011B55                 test    eax, eax
.text:79011B57                 jz      loc_79019B46
.text:79011B5D                 call    eax
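
The relay pattern in the disassembly above (look up an export by name, bail out if it is missing, otherwise call through the returned pointer) can be sketched portably. This is only an illustration under stated assumptions: the `ExportTable` map stands in for the module handle that GetProcAddress searches, and `RuntimeCorExeMain`, `LoadRuntimeModule`, `ShimCorExeMain` and the `-1` error code are hypothetical names and values, not taken from mscoree.dll.

```cpp
#include <cassert>
#include <map>
#include <string>

// Signature of the forwarded entry point (simplified, hypothetical).
using EntryFn = int (*)();

// Hypothetical stand-in for the real entry point exported by the runtime DLL.
int RuntimeCorExeMain() { return 42; }

// Hypothetical export table: plays the role of the module handle that
// GetProcAddress searches by symbol name.
using ExportTable = std::map<std::string, EntryFn>;

ExportTable LoadRuntimeModule() {
    return ExportTable{{"_CorExeMain", &RuntimeCorExeMain}};
}

// The shim itself: resolve the symbol by name, take the failure branch if it
// is missing (the `jz` in the listing), otherwise call it (the `call eax`).
int ShimCorExeMain(const ExportTable& module) {
    ExportTable::const_iterator it = module.find("_CorExeMain");
    if (it == module.end() || it->second == nullptr) {
        return -1;  // assumed error code for "export not found"
    }
    return it->second();
}
```

Calling `ShimCorExeMain(LoadRuntimeModule())` forwards to the stand-in entry point, while an empty table takes the failure branch. The point of the indirection is that the imported shim carries no startup logic of its own.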

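Once inside, it immediately calls _CorExeMain in mscorwks.dll. That function does much the same job as the _CorExeMain2 in Rotor mentioned above, and it is where initialization for loading the exe begins. All of this can be seen by comparing the disassembly with the source code. Back in the sscli, here is the code of _CorExeMain2() (ceemain.cpp):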

Code:
__int32 STDMETHODCALLTYPE _CorExeMain2( // Executable exit code.
    PBYTE   pUnmappedPE,                // -> memory mapped code
    DWORD   cUnmappedPE,                // Size of memory mapped code
    __in LPWSTR  pImageNameIn,          // -> Executable Name
    __in LPWSTR  pLoadersFileName,      // -> Loaders Name
    __in LPWSTR  pCmdLine)              // -> Command Line
{
    // This entry point is used by clix
    BOOL bRetVal = 0;
    //BEGIN_ENTRYPOINT_VOIDRET;
    // Before we initialize the EE, make sure we've snooped for all EE-specific
    // command line arguments that might guide our startup.
    HRESULT result = CorCommandLine::SetArgvW(pCmdLine);
    if (!CacheCommandLine(pCmdLine, CorCommandLine::GetArgvW(NULL))) {
        LOG((LF_STARTUP, LL_INFO10, "Program exiting - CacheCommandLine failed\n"));
        bRetVal = -1;
        goto exit;
    }
    if (SUCCEEDED(result))
        result = CoInitializeEE(COINITEE_DEFAULT | COINITEE_MAIN);
    if (FAILED(result)) {
        VMDumpCOMErrors(result);
        SetLatchedExitCode (-1);
        goto exit;
    }
    // This is here to get the ZAPMONITOR working correctly
    INSTALL_UNWIND_AND_CONTINUE_HANDLER;
    // Load the executable
    bRetVal = ExecuteEXE(pImageNameIn);
...
...

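Most of this code can be skipped. Only two things matter: the EE (execution engine) is initialized, and once that succeeds ExecuteEXE is called with the file name as its argument. The listing also shows clearly which parameters _CorExeMain2() receives. ExecuteEXE() is short and is itself just a trampoline: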

Code:
BOOL STDMETHODCALLTYPE ExecuteEXE(HMODULE hMod)
{
    STATIC_CONTRACT_GC_TRIGGERS;
    _ASSERTE(hMod);
    if (!hMod)
        return FALSE;
    ETWTraceStartup::TraceEvent(ETW_TYPE_STARTUP_EXEC_EXE);
    TIMELINE_START(STARTUP, ("ExecuteExe"));
    EX_TRY_NOCATCH
    {
        // Executables are part of the system domain
        SystemDomain::ExecuteMainMethod(hMod);
    }
    EX_END_NOCATCH;
    ETWTraceStartup::TraceEvent(ETW_TYPE_STARTUP_EXEC_EXE+1);
    TIMELINE_END(STARTUP, ("ExecuteExe"));
    return TRUE;
}

    Again there is only one key line: SystemDomain::ExecuteMainMethod(hMod). Judging by the names, ExecuteMainMethod treats the file passed in as a module. In .NET, in terms of containment, assembly > module > class > method. In other words, each assembly may contain several modules, and at least one module has one and only one MainMethod, namely the entry method.
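
The containment chain assembly > module > class > method can be pictured as nested containers that are walked until the entry method is found. The sketch below is a simplification under stated assumptions: the struct names, the `isEntryPoint` flag and `FindEntryPoint` are hypothetical and are not how the sscli represents metadata (a real image records its entry point as a token in the CLI header rather than as a per-method flag).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of the containment hierarchy.
struct Method   { std::string name; bool isEntryPoint; };
struct Class    { std::string name; std::vector<Method> methods; };
struct Module   { std::string name; std::vector<Class> classes; };
struct Assembly { std::string name; std::vector<Module> modules; };

// Walk assembly > module > class > method and return the first method
// marked as the entry point, or nullptr if the assembly has none
// (for example, a library).
const Method* FindEntryPoint(const Assembly& assembly) {
    for (const Module& mod : assembly.modules) {
        for (const Class& cls : mod.classes) {
            for (const Method& method : cls.methods) {
                if (method.isEntryPoint) {
                    return &method;
                }
            }
        }
    }
    return nullptr;
}
```

A null result corresponds to the case where the loader has nothing to run and must report that no Main method was found.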

    Next we move to the code of ExecuteMainMethod() (assembly.cpp). Note that the listing below is the Assembly::ExecuteMainMethod member.

Code:
INT32 Assembly::ExecuteMainMethod(PTRARRAYREF *stringArgs)
{
    CONTRACTL
    {
        INSTANCE_CHECK;
        THROWS;
        GC_TRIGGERS;
        MODE_ANY;
        ENTRY_POINT;
        INJECT_FAULT(COMPlusThrowOM());
    }
    CONTRACTL_END;
    HRESULT hr = S_OK;
    INT32   iRetVal = 0;
    BEGIN_ENTRYPOINT_THROWS;
    Thread *pThread = GetThread();
    MethodDesc *pMeth;
    {
        // This thread looks like it wandered in -- but actually we rely on it to keep the process alive.
        pThread->SetBackground(FALSE);
    
        GCX_COOP();
        pMeth = GetEntryPoint();
        if (pMeth) {
            RunMainPre();
            hr = ClassLoader::RunMain(pMeth, 1, &iRetVal, stringArgs);
        }
    }
    //RunMainPost is supposed to be called on the main thread of an EXE,
    //after that thread has finished doing useful work.  It contains logic
    //to decide when the process should get torn down.  So, don't call it from
    // AppDomain.ExecuteAssembly()
    if (pMeth) {
        if (stringArgs == NULL)
            RunMainPost();
    }
    else {
        StackSString displayName;
        GetDisplayName(displayName);
        COMPlusThrowHR(COR_E_MISSINGMETHOD, IDS_EE_FAILED_TO_FIND_MAIN, displayName);
    }
    if (FAILED(hr))
        ThrowHR(hr);
    END_ENTRYPOINT_THROWS;
    return iRetVal;
}
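
The control flow of the listing above can be reduced to three decisions: fail if there is no entry point, otherwise run it, and run the post-main step only when no explicit argument array was supplied. The following is a minimal sketch of that flow, not the sscli implementation. `RunLog`, `ExecuteMainMethodSketch` and the use of `std::runtime_error` in place of COMPlusThrowHR are assumptions made for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record of which steps ran, so the flow can be observed.
struct RunLog {
    bool ranMain = false;
    bool ranPost = false;
};

// entry == nullptr models GetEntryPoint() returning NULL.
// stringArgs == nullptr models the "launched as the process EXE" case;
// a non-null pointer models a caller that passes its own argument array.
int ExecuteMainMethodSketch(int (*entry)(),
                            const std::vector<std::string>* stringArgs,
                            RunLog& log) {
    if (entry == nullptr) {
        // Stand-in for COMPlusThrowHR(COR_E_MISSINGMETHOD, ...).
        throw std::runtime_error("failed to find Main");
    }
    int exitCode = entry();       // stand-in for ClassLoader::RunMain
    log.ranMain = true;
    if (stringArgs == nullptr) {
        log.ranPost = true;       // stand-in for RunMainPost()
    }
    return exitCode;
}
```

The asymmetry on `stringArgs` mirrors the comment in the listing: the post-main step decides when the process is torn down, so it is skipped when the caller supplied its own arguments.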

Again there are two key steps: set up the thread environment, then run the Main method. Next we go to clsload.cpp to look at ClassLoader::RunMain, which is also the last stop of this tour.

Code:
HRESULT ClassLoader::RunMain(MethodDesc *pFD ,
                             short numSkipArgs,
                             INT32 *piRetVal,
                             PTRARRAYREF *stringArgs /*=NULL*/)
{
    STATIC_CONTRACT_THROWS;
    _ASSERTE(piRetVal);
    DWORD       cCommandArgs = 0;  // count of args on command line
    DWORD       arg = 0;
    LPWSTR      *wzArgs = NULL; // command line args
    HRESULT     hr = S_OK;
    *piRetVal = -1;
    // The exit code for the process is communicated in one of two ways.  If the
    // entrypoint returns an 'int' we take that.  Otherwise we take a latched
    // process exit code.  This can be modified by the app via setting
    // Environment's ExitCode property.
    if (stringArgs == NULL)
        SetLatchedExitCode(0);
    if (!pFD) {
        _ASSERTE(!"Must have a function to call!");
        return E_FAIL;
    }
    CorEntryPointType EntryType = EntryManagedMain;
    ValidateMainMethod(pFD, &EntryType);
    if ((EntryType == EntryManagedMain) &&
        (stringArgs == NULL)) {
        // If you look at the DIFF on this code then you will see a major change which is that we
        // no longer accept all the different types of data arguments to main.  We now only accept
        // an array of strings.
        wzArgs = CorCommandLine::GetArgvW(&cCommandArgs);
        // In the WindowsCE case where the app has additional args the count will come back zero.
        if (cCommandArgs > 0) {
            if (!wzArgs)
                return E_INVALIDARG;
        }
    }
    ETWTraceStartup::TraceEvent(ETW_TYPE_STARTUP_MAIN);
    TIMELINE_START(STARTUP, ("RunMain"));
    EX_TRY_NOCATCH
    {
        MethodDescCallSite  threadStart(pFD);
        
        PTRARRAYREF StrArgArray = NULL;
        GCPROTECT_BEGIN(StrArgArray);
        // Build the parameter array and invoke the method.
        if (EntryType == EntryManagedMain) {
            if (stringArgs == NULL) {
                // Allocate a COM Array object with enough slots for cCommandArgs - 1
                StrArgArray = (PTRARRAYREF) AllocateObjectArray((cCommandArgs - numSkipArgs), g_pStringClass);
                // Create Stringrefs for each of the args
                for( arg = numSkipArgs; arg < cCommandArgs; arg++) {
                    STRINGREF sref = COMString::NewString(wzArgs[arg]);
                    StrArgArray->SetAt(arg-numSkipArgs, (OBJECTREF) sref);
                }
            }
            else
                StrArgArray = *stringArgs;
        }
#ifdef STRESS_THREAD
        OBJECTHANDLE argHandle = (StrArgArray != NULL) ? CreateGlobalStrongHandle (StrArgArray) : NULL;
        Stress_Thread_Param Param = {pFD, argHandle, numSkipArgs, EntryType, 0};
        Stress_Thread_Start (&Param);
#endif
        ARG_SLOT stackVar = ObjToArgSlot(StrArgArray);
        if (pFD->IsVoid()) 
        {
            // Set the return value to 0 instead of returning random junk
            *piRetVal = 0;
            threadStart.Call(&stackVar);
        }
        else 
        {
            *piRetVal = (INT32)threadStart.Call_RetArgSlot(&stackVar);
            if (stringArgs == NULL) 
            {
                SetLatchedExitCode(*piRetVal);
            }
        }
        GCPROTECT_END();
        fflush(stdout);
        fflush(stderr);
    }
    EX_END_NOCATCH
    ETWTraceStartup::TraceEvent(ETW_TYPE_STARTUP_MAIN+1);
    TIMELINE_END(STARTUP, ("RunMain"));
    return hr;
}

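    This code mainly does the final preparation before the method runs, and then runs it. There are two cases: an entry method with a return value and a void one. What happens after that goes deep into the core of the framework; I will write about it another day once I have read it. The code uses many definitions from COM, which shows how closely .NET and COM are related. The Debugger and Profiler under .NET, for example, are even written directly against COM interfaces. I do not know COM well, though, so I cannot go deeper on this point.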
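
The two preparation steps in RunMain, building the argument array while skipping the first `numSkipArgs` entries and choosing the exit code depending on whether Main is void, can be sketched in plain C++. This is an illustration under assumptions: `EntryPoint`, `BuildManagedArgs`, `RunMainSketch` and `g_latchedExitCode` are hypothetical stand-ins, and ordinary `std::string` vectors replace the managed string array.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the process-wide latched exit code.
static int g_latchedExitCode = 0;

// Hypothetical model of an entry method: either void Main(string[])
// or int Main(string[]). For the void case the body's result is ignored.
struct EntryPoint {
    bool returnsVoid;
    std::function<int(const std::vector<std::string>&)> body;
};

// Copy the command line, skipping the first numSkipArgs entries
// (the program name), as the for loop in RunMain does.
std::vector<std::string> BuildManagedArgs(
        const std::vector<std::string>& commandLine,
        std::size_t numSkipArgs) {
    std::vector<std::string> args;
    for (std::size_t i = numSkipArgs; i < commandLine.size(); ++i) {
        args.push_back(commandLine[i]);
    }
    return args;
}

int RunMainSketch(const EntryPoint& entry,
                  const std::vector<std::string>& commandLine) {
    g_latchedExitCode = 0;  // mirrors SetLatchedExitCode(0)
    const std::vector<std::string> args = BuildManagedArgs(commandLine, 1);
    if (entry.returnsVoid) {
        entry.body(args);
        return 0;           // void Main: report 0 instead of junk
    }
    const int exitCode = entry.body(args);
    g_latchedExitCode = exitCode;  // mirrors SetLatchedExitCode(*piRetVal)
    return exitCode;
}
```

With a command line of `app.exe a b`, the entry method receives only `a` and `b`. An int-returning Main supplies the exit code directly, while a void Main leaves it at 0.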
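    btw: I have posted several .NET articles on Kanxue, mainly because Kanxue has few articles of this kind and few people studying the topic. If you are interested in studying the internals of .NET together, feel free to get in touch with me.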

Last edited by tankaiha on 2006-09-11 20:25
