Loongson Open Source Community


The gp overflow problem

Posted 2008-10-18 20:03:51
GP overflow

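1 Background
1.1 GOT
Global offset table: a table that holds the addresses of data items. It lives in the data segment.

See the System V Application Binary Interface, MIPS RISC Processor Supplement, 3rd Edition.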

1.2 PIC
Position-independent code: a way of generating code.

1.2.1 Definition


In computing, position-independent code (PIC) or position-independent executable (PIE) is machine instruction code that executes properly regardless of where in memory it resides. PIC is commonly used for shared libraries, so that the same library code can be loaded in a location in each program address space where it won't overlap any other uses of memory (for example, other shared libraries).

Position-independent code can be copied to any memory location without modification and executed, unlike relocatable code, which requires special processing by a link editor or program loader to make it suitable for execution at a given location. Code must generally be written or compiled in a special fashion in order to be position independent.




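1.2.2 Characteristics
A key property of PIC is that the code must not contain absolute addresses. To access global and static data, PIC code can therefore use one of two methods:

1. The gp-relative method: the data is read directly at gp+offset.

2. A two-step method: first look up the GOT to obtain the address of the data, then access the contents at that address. This is the GOT method.

PIC is commonly used for shared libraries.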




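1.3 Cost comparison of the memory access methods
Method 1: gp+offset: cost = ld
ld reg,offset(gp)

Method 2: direct addressing: cost = lui + addiu + ld
lui reg,%hi(addr)
addiu reg,reg,%lo(addr)
ld reg,0(reg)

Method 3: GOT: cost = ld + ld (dependent)
ld reg,offset(gp)
ld reg,0(reg)

In terms of cost, gp-relative access is therefore cheaper than absolute addressing, which in turn is cheaper than GOT access (the GOT method needs two memory loads, the second dependent on the first).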
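The three methods can be compared with a small model. The sketch below is only an illustration under stated assumptions: the `Machine` class, the addresses and the stored values are all hypothetical, and "cost" is reduced to counting instructions and memory loads rather than modeling a real pipeline.

```python
class Machine:
    """Toy model: a word-addressed memory dict plus instruction/load counters."""

    def __init__(self, memory, gp):
        self.memory = memory
        self.gp = gp
        self.instructions = 0
        self.loads = 0

    def load(self, address):
        # One load instruction that touches memory once.
        self.instructions += 1
        self.loads += 1
        return self.memory[address]


def gp_relative(machine, offset):
    # Method 1: ld reg,offset(gp)
    return machine.load(machine.gp + offset)


def absolute(machine, addr):
    # Method 2: two instructions build the address, then one load.
    machine.instructions += 2
    return machine.load(addr)


def via_got(machine, got_offset):
    # Method 3: load the address from the GOT, then load the data itself.
    data_addr = machine.load(machine.gp + got_offset)
    return machine.load(data_addr)


def measure(method, argument):
    gp = 0x10008000                      # hypothetical gp value
    memory = {
        gp + 16: 42,                     # a small variable reachable via gp
        0x20000000: 7,                   # a variable in the normal data section
        gp - 32752: 0x20000000,          # GOT slot holding that variable's address
    }
    machine = Machine(memory, gp)
    value = method(machine, argument)
    return value, machine.instructions, machine.loads


print(measure(gp_relative, 16))
print(measure(absolute, 0x20000000))
print(measure(via_got, -32752))
```

Each tuple is (value, instructions, loads). The GOT method is the only one with two loads, and its second load cannot start until the first has returned.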


[ Last edited by matman on 2008-10-18 20:07 ]
Original poster | Posted 2008-10-18 20:05:48
2 GP overflow
See:
http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=0640&db=man&fname=/usr/share/catman/p_man/cat5/gp_overflow.z
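2.1 Cause
Because of the fixed length of MIPS instructions, gp+offset can only reach the 64 KB range from gp-32768 to gp+32767 (the offset is limited to 16 bits). This range is called the global data area.
GP overflow really means that the global data area, that is, the 64 KB region reachable through the gp register plus an offset, is too small. The reasons it runs out are given below.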
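The 64 KB figure follows directly from the signed 16-bit offset field. A minimal sketch of that arithmetic (the function names and the gp value are hypothetical, chosen only for illustration):

```python
GP_OFFSET_MIN = -(1 << 15)       # -32768, smallest signed 16-bit offset
GP_OFFSET_MAX = (1 << 15) - 1    #  32767, largest signed 16-bit offset


def fits_gp_offset(symbol_addr, gp):
    """True if symbol_addr can be reached with a single offset(gp) access."""
    offset = symbol_addr - gp
    return GP_OFFSET_MIN <= offset <= GP_OFFSET_MAX


def window_size():
    """Number of bytes addressable around gp."""
    return GP_OFFSET_MAX - GP_OFFSET_MIN + 1


gp = 0x10008000                  # hypothetical gp value
print(window_size())             # 65536 bytes = 64 KB
print(fits_gp_offset(gp + 32767, gp), fits_gp_offset(gp + 32768, gp))
```

Anything placed one byte beyond either end of this window can no longer be encoded in the instruction, which is what the linker reports as an overflow.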
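2.2 Classification
gp overflow comes in two kinds: gp-relative section overflow and GOT overflow.
2.2.1 gp-relative section overflow
The so-called "GOT overflow" errors we ran into in SPEC2006 all belong to this category of GP overflow.
The gp-relative sections are the .sdata/.sbss sections that hold small data. Data in these two sections can be accessed directly through gp+offset.
When generating PIC there is a compiler optimization: data that would otherwise be accessed through the GOT is placed in these two sections, which lowers the cost (one fewer memory read; see the comparison in 1.3). When compiling each .o, the MIPSpro family of compilers follows the -G option and places small data in .sdata (initialized) and .sbss (uninitialized). -G <num> means that data items no larger than <num> bytes go into these two sections. When the final executable is produced, the .sdata/.sbss sections of all the .o files are merged. Because each .o was compiled independently, with no global view of how the global data area is being used, the merged .sdata and .sbss sections may together exceed 64 KB, which causes an error.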
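The way independently compiled objects can overflow only at link time can be sketched as follows. This is a simplified model, not what any real linker does: the object names and item sizes are made up, alignment padding is ignored, and the "size <= threshold" rule is an assumption taken from the GCC description of -G quoted in the appendix.

```python
GP_WINDOW = 64 * 1024   # bytes reachable through gp+offset


def small_data_bytes(item_sizes, g_threshold):
    """Bytes one object contributes to .sdata/.sbss for a given -G value."""
    return sum(size for size in item_sizes if size <= g_threshold)


def link_small_data(objects, g_threshold):
    """Merge per-object small data and report whether the total overflows."""
    per_object = {
        name: small_data_bytes(sizes, g_threshold)
        for name, sizes in objects.items()
    }
    total = sum(per_object.values())
    return per_object, total, total > GP_WINDOW


# Hypothetical objects: each fits in 64 KB alone, but not together.
objects = {
    "a.o": [8] * 5000,            # 40000 bytes of small items
    "b.o": [8] * 4000 + [4096],   # 32000 bytes small, one large array
}

print(link_small_data(objects, g_threshold=8))
print(link_small_data(objects, g_threshold=0))
```

With a threshold of 8 each object looks harmless in isolation, yet the merged total is 72000 bytes. With a threshold of 0 nothing is placed in the small-data sections, so the merged total is zero.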
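2.2.1.1 Solution
After binutils was upgraded to 2.18.50, the error in the static link of tonto was resolved. All gp overflows in SPEC2006 can now be fixed by adding -Gspace 0 (without IPA) or -IPA:for_GP_nil (with IPA). The fix works by preventing the .sdata/.sbss sections from being generated.
For the patch that adds -IPA:for_GP_nil, see Appendix B.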
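2.2.1.2 Optimization opportunity
This problem can also be solved with a multi-gp approach: the oversized .sdata/.sbss sections are split into several chunks of 64 KB each, so that each chunk can be addressed through its own gp value. However, gp then has to be set up again every time it changes. To reduce that overhead, gp-partition can be used: data used by the same function shares one gp, and functions that call each other frequently also share one gp. In ORC this is on the to-do list.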
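A rough sketch of the multi-gp idea, under loud assumptions: the greedy in-order split, the symbol names and the sizes are all hypothetical, and this is not how ORC or any existing toolchain implements it. It only shows why the grouping of data matters for the number of gp setups.

```python
GP_WINDOW = 64 * 1024   # bytes one gp value can cover


def partition_small_data(items):
    """Greedily split (name, size) items into chunks of at most GP_WINDOW bytes."""
    chunks, current, used = [], [], 0
    for name, size in items:
        if size > GP_WINDOW:
            raise ValueError(f"{name} does not fit in any gp window")
        if used + size > GP_WINDOW:
            chunks.append(current)      # close the full chunk, start a new one
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        chunks.append(current)
    return chunks


def count_gp_setups(chunks, access_sequence):
    """Count how often gp must be (re)loaded for a sequence of data accesses."""
    chunk_of = {name: i for i, chunk in enumerate(chunks) for name in chunk}
    setups, active = 0, None
    for name in access_sequence:
        if chunk_of[name] != active:
            active = chunk_of[name]
            setups += 1
    return setups


items = [("a", 40000), ("b", 30000), ("c", 20000)]   # hypothetical symbols
chunks = partition_small_data(items)
print(chunks)
print(count_gp_setups(chunks, ["a", "b", "a", "c", "b"]))
print(count_gp_setups(chunks, ["b", "c", "b", "a"]))
```

The second access sequence touches the same data but needs fewer gp setups, which is the effect gp-partition tries to achieve by keeping data used together under the same gp.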
Original poster | Posted 2008-10-18 20:06:10
2.2.2 GOT overflow
2.2.2.1 Classification
The status of these cases as of October 2005 is given below; no updates have appeared since. See http://lists.debian.org/debian-mips/2005/10/msg00023.html
1. Overflow of the GOT in one relocatable object file (a .o file)
No existing solution.
2. Overflow of the GOT in an executable
If there are fewer than 16k entries of exported symbols, multi-got can solve it. If there are more than 16k entries of exported symbols, there is no existing solution.
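2.2.2.2 Solutions
1. The multi-got method
The multi-got method uses multiple GOTs when the .o files are linked into an executable. Because its overhead is low, it has replaced xgot as the mainstream method.
2. The xgot method
The xgot method constructs the GOT in a different way and is therefore incompatible with multi-got. A .o compiled with xgot can only be linked against libraries that were also compiled with xgot. For this reason SGI used to ship two sets of libraries (with and without xgot).
From the GNU as source code (the load_address function), the xgot method works as follows: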


/* This is the large GOT case.  If this is a reference to an
   external symbol, and there is no constant, we want
     lui    $tempreg,<sym>              (BFD_RELOC_MIPS_GOT_HI16)
     add    $tempreg,$tempreg,$gp
     lw     $tempreg,<sym>($tempreg)    (BFD_RELOC_MIPS_GOT_LO16)
   or if tempreg is PIC_CALL_REG
     lui    $tempreg,<sym>              (BFD_RELOC_MIPS_CALL_HI16)
     add    $tempreg,$tempreg,$gp
     lw     $tempreg,<sym>($tempreg)    (BFD_RELOC_MIPS_CALL_LO16)
   If we have a small constant, and this is a reference to
   an external symbol, we want
     lui    $tempreg,<sym>              (BFD_RELOC_MIPS_GOT_HI16)
     add    $tempreg,$tempreg,$gp
     lw     $tempreg,<sym>($tempreg)    (BFD_RELOC_MIPS_GOT_LO16)
     addi   $tempreg,$tempreg,<constant>
   If we have a large constant, and this is a reference to
   an external symbol, we want
     lui    $tempreg,<sym>              (BFD_RELOC_MIPS_GOT_HI16)
     addu   $tempreg,$tempreg,$gp
     lw     $tempreg,<sym>($tempreg)    (BFD_RELOC_MIPS_GOT_LO16)
     lui    $at,<hiconstant>
     addi   $at,$at,<loconstant>
     add    $tempreg,$tempreg,$at
   If we have NewABI, and we know it's a local symbol, we want
     lw     $reg,<sym>($gp)             (BFD_RELOC_MIPS_GOT_PAGE)
     addiu  $reg,$reg,<sym>             (BFD_RELOC_MIPS_GOT_OFST)
   otherwise we have to resort to GOT_HI16/GOT_LO16.
*/

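GOT entries are still 32 bits wide. The data area is divided into 4 KB "GOT pages": the high 16 bits of a GOT entry, added to gp, point to the address of the corresponding GOT page, and the low 16 bits of the entry are the offset within that page. Adding the two gives the address of the data. Compared with multi-got, one extra addiu is needed to obtain the address.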
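The lui / addu / lw sequences in the quoted comment can be traced with a short model. This is a sketch under assumptions: it takes GOT_HI16 and GOT_LO16 to be the high and low halves of the symbol's GOT-entry offset from gp (with the low half sign-extended by lw), and the gp value, offsets and addresses are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed immediate."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def got16_reachable(got_offset):
    """True if a single lw reg,offset($gp) can reach this GOT entry."""
    return -0x8000 <= got_offset <= 0x7FFF


def xgot_load(gp, got_offset, memory):
    """Model the three-instruction large-GOT load of a symbol's address."""
    lo = sign_extend16(got_offset)
    hi = ((got_offset - lo) >> 16) & 0xFFFF      # compensates for sign extension
    tempreg = (hi << 16) & MASK32                # lui  $tempreg,hi
    tempreg = (tempreg + gp) & MASK32            # addu $tempreg,$tempreg,$gp
    return memory[(tempreg + lo) & MASK32]       # lw   $tempreg,lo($tempreg)


gp = 0x10008000                                  # hypothetical gp value
memory = {
    gp + 0x12340: 0x20000010,                    # GOT entry beyond the 16-bit range
    gp + 0x18004: 0x20000020,                    # low half has bit 15 set
}

print(got16_reachable(0x12340))
print(hex(xgot_load(gp, 0x12340, memory)))
print(hex(xgot_load(gp, 0x18004, memory)))
```

Both entries lie outside what a single 16-bit offset can encode, yet the three-instruction sequence reaches them, at the price of two extra instructions per GOT access.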
3. Some related optimizations
3.1 gp-partition
See the multi-gp discussion under 2.2.1.
3.2 Optimizing the gp setup prologue
This optimization breaks PIC.
http://sourceware.org/ml/binutils/2004-12/msg00094.html

lui $gp,%hi(_gp_disp)
addiu $gp,$gp,%lo(_gp_disp)
addu $gp,$gp,.cpload argument

is optimized to

lui $gp,%hi(_gp)
addiu $gp,$gp,%lo(_gp)
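Why the shorter prologue gives up position independence can be shown with a little arithmetic. The sketch assumes that _gp_disp is the link-time distance from the function to _gp and that the .cpload argument register holds the function's run-time address; the link addresses and the load bias are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def pic_prologue_gp(gp_link, func_link, load_bias):
    """gp computed by the three-instruction PIC prologue."""
    gp_disp = gp_link - func_link            # fixed at link time
    func_runtime = func_link + load_bias     # value in the .cpload argument register
    return (gp_disp + func_runtime) & MASK32


def absolute_prologue_gp(gp_link):
    """gp computed by the two-instruction prologue: the link-time _gp itself."""
    return gp_link & MASK32


gp_link = 0x10008000       # hypothetical link-time value of _gp
func_link = 0x00400120     # hypothetical link-time address of the function

for bias in (0, 0x10000):
    actual_gp = (gp_link + bias) & MASK32    # where the data really is at run time
    print(hex(bias),
          pic_prologue_gp(gp_link, func_link, bias) == actual_gp,
          absolute_prologue_gp(gp_link) == actual_gp)
```

With no load bias both prologues agree. Once the image is loaded at a different address, only the _gp_disp form still yields the right gp, so the shorter form is suitable only for code that runs at its link-time address.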
Original poster | Posted 2008-10-18 20:06:32
4. Appendix: relevant parts of the gcc options

-mabicalls

-mno-abicalls

Generate (do not generate) code that is suitable for SVR4-style dynamic objects. -mabicalls is the default for SVR4-based systems.

-mshared

-mno-shared

Generate (do not generate) code that is fully position-independent, and that can therefore be linked into shared libraries. This option only affects -mabicalls.

All -mabicalls code has traditionally been position-independent, regardless of options like -fPIC and -fpic. However, as an extension, the GNU toolchain allows executables to use absolute accesses for locally-binding symbols. It can also use shorter GP initialization sequences and generate direct calls to locally-defined functions. This mode is selected by -mno-shared.

-mno-shared depends on binutils 2.16 or higher and generates objects that can only be linked by the GNU linker. However, the option does not affect the ABI of the final executable; it only affects the ABI of relocatable objects. Using -mno-shared will generally make executables both smaller and quicker.

-mshared is the default.


-mxgot

-mno-xgot

Lift (do not lift) the usual restrictions on the size of the global offset table.

GCC normally uses a single instruction to load values from the GOT. While this is relatively efficient, it will only work if the GOT is smaller than about 64k. Anything larger will cause the linker to report an error such as:

relocation truncated to fit: R_MIPS_GOT16 foobar

If this happens, you should recompile your code with -mxgot. It should then work with very large GOTs, although it will also be less efficient, since it will take three instructions to fetch the value of a global symbol.

Note that some linkers can create multiple GOTs. If you have such a linker, you should only need to use -mxgot when a single object file accesses more than 64k's worth of GOT entries. Very few do.

These options have no effect unless GCC is generating position independent code.

-G num


Put global and static items less than or equal to num bytes into the small data or bss section instead of the normal data or bss section.

This allows the data to be accessed using a single instruction.

All modules should be compiled with the same -G num value.
