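When programming in Python, you sometimes need to call a library written in C directly, whether to reuse existing code or for performance. Python can call C code in several ways, including Cython, SWIG, ctypes and cffi. This article uses ctypes as the example and shows how to use the sge libraries from Python.
Download the latest Son of Grid Engine from https://gitlab.com/loveshack/sge (note: this document uses the version at commit 9baeb84cdc883c4ba8b38cfc89ab8262e6d4e1d9 as its example), then unpack it and run the bootstrap script: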
# tar zxf sge-master.tar.gz
# cd sge-master/source
# sh scripts/bootstrap.sh
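Depending on your needs, you can turn on debug options for individual programs or libraries so that they are easier to trace with gdb. For the test_eval_expression program, for example, edit this rule in libs/sgeobj/Makefile: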
test_eval_expression: test_eval_expression.o $(SGEOBJLIB) $(SGEOBJDLIB) $(CULLLIB) $(UTILIB) $(WINGRIDLIB_DEP) $(COMMLIB) $(COMMLISTSLIB)
$(LD_WRAPPER) $(CC) $(CFLAGS) -o test_eval_expression $(LFLAGS) test_eval_expression.o $(SGEOBJLIB) $(SGEOBJDLIB) $(CULLLIB) $(COMMLIB) $(COMMLISTSLIB) $(UTILIB) $(WINGRIDLIB) $(DLLIB) $(SECLIB) $(LIBS)
and add the -g flag:
test_eval_expression: test_eval_expression.o $(SGEOBJLIB) $(SGEOBJDLIB) $(CULLLIB) $(UTILIB) $(WINGRIDLIB_DEP) $(COMMLIB) $(COMMLISTSLIB)
$(LD_WRAPPER) $(CC) $(CFLAGS) -g -o test_eval_expression $(LFLAGS) test_eval_expression.o $(SGEOBJLIB) $(SGEOBJDLIB) $(CULLLIB) $(COMMLIB) $(COMMLISTSLIB) $(UTILIB) $(WINGRIDLIB) $(DLLIB) $(SECLIB) $(LIBS)
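Choose the aimk options according to your needs. For reasons I have not pinned down, the build fails in many places when -shared-libs is enabled. The changes below are the ones I recorded; the list may be incomplete, so adjust it according to the actual error messages you get when compiling.
The aimk file needs the following changes: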
$ diff aimk.orig aimk
2126a2127,2131
> if ( $SHAREDLIBS == 1 ) then
> set SHARED_PATH_NAME = `dist/util/arch -lib`
> setenv $SHARED_PATH_NAME ${SOURCE}/${COMPILE_ARCH}
> endif
>
2221a2227,2230
> if ( $SHAREDLIBS == 1 ) then
> unsetenv $SHARED_PATH_NAME
> endif
>
Many Makefile files need changes as well:
## libs/Makefile
$(SHAREDLD) $(SHARED_LFLAGS) -o libsge$(SHAREDEXT) $(SCHEDLIB_OBJS) $(MIRLIB_OBJS) $(EVCLIB_OBJS) $(GDILIB_OBJS) $(SGEOBJLIB_OBJS) $(SGEOBJDLIB_OBJS) $(KRBLIBS) $(COMMLIB_OBJS) $(COMMLISTSLIB_OBJS) $(CULLLIB_OBJS) $(UTILIB_OBJS) sig_handlers.o $(LOADAVGLIBS) $(LIBS) -lc $(SECLIB)
## libs/sgeobj/Makefile
libsgeobj$(SHAREDEXT): $(SGEOBJLIB_OBJS) $(SGEOBJDLIB) $(COMMLIB) $(CULLLIB) $(UTILIB) version.o sge_gdi_packet.o sge_gdi2.o sge_security.o sge_gdi_packet_internal.o sge_gdi_packet_pb_cull.o sge_gdi_ctx.o qm_name.o
$(SHAREDLD) $(SHARED_LFLAGS) -o libsgeobj$(SHAREDEXT) $(SGEOBJLIB_OBJS) version.o sge_gdi_packet.o sge_gdi2.o sge_security.o sge_gdi_packet_internal.o sge_gdi_packet_pb_cull.o sge_gdi_ctx.o qm_name.o -lsgeobjd -lcomm -lcommlists -lcull -luti $(LIBS) -lc
libsgeobjd$(SHAREDEXT): $(SGEOBJDLIB_OBJS) $(CULLLIB) $(UTILIB) version.o sge_gdi_packet.o sge_gdi2.o
$(SHAREDLD) $(SHARED_LFLAGS) -o libsgeobjd$(SHAREDEXT) $(SGEOBJDLIB_OBJS) version.o sge_gdi_packet.o sge_gdi2.o -lcull -luti $(LIBS) -lc
## libs/comm/Makefile
$(SHAREDLD) $(SHARED_LFLAGS) -o libcomm$(SHAREDEXT) $(COMMLIB_OBJS) -luti $(DLLIB) $(LIBS) -lcrypto -lssl
## libs/uti/Makefile
$(SHAREDLD) $(SHARED_LFLAGS) -o libuti$(SHAREDEXT) $(UTILIB_OBJS) $(LOADAVGLIBS) $(LIBS) -lc -ldl -lsgeobj
## libs/spool/Makefile
test_sge_spooling_utilities: test_sge_spooling_utilities.o $(SPOOLING_DEPS) $(SGEOBJLIB) $(SGEOBJDLIB) $(MIRLIB) $(EVCLIB) $(GDILIB) $(SCHEDLIB) $(CULLLIB) $(COMMLIB) $(COMMLISTSLIB) $(UTILIB) $(WINGRIDLIB_DEP) sig_handlers.o
$(LD_WRAPPER) $(CC) $(CFLAGS) -o test_sge_spooling_utilities $(LFLAGS) test_sge_spooling_utilities.o $(SPOOLING_LIBS) $(SCHEDLIB) $(MIRLIB) $(EVCLIB) $(GDILIB) $(SGEOBJLIB) $(SGEOBJDLIB) $(COMMLIB) $(COMMLISTSLIB) $(CULLLIB) $(UTILIB) $(WINGRIDLIB) $(SECLIB) $(SLIBS) $(LIBS) $(DLLIB) sig_handlers.o
test_spooling_mt: test_spooling_mt.o $(SPOOLING_DEPS) $(SGEOBJLIB) $(SGEOBJDLIB) $(MIRLIB) $(EVCLIB) $(GDILIB) $(SCHEDLIB) $(CULLLIB) $(COMMLIB) $(COMMLISTSLIB) $(UTILIB) $(WINGRIDLIB_DEP) sig_handlers.o
$(CC) $(CFLAGS) -o test_spooling_mt $(LFLAGS) test_spooling_mt.o $(SPOOLING_LIBS) $(SCHEDLIB) $(MIRLIB) $(EVCLIB) $(GDILIB) $(SGEOBJLIB) $(SGEOBJDLIB) $(COMMLIB) $(COMMLISTSLIB) $(CULLLIB) $(UTILIB) $(WINGRIDLIB) $(SECLIB) $(SLIBS) $(LIBS) $(DLLIB) sig_handlers.o
## 3rdparty/tacc_pam_sge/Makefile
$(TACCFOO): $(TACCFOO_OBJS)
$(CC) $(CFLAGS) -o $(TACCFOO) $(TACCFOO_OBJS) -lsge -lsched -levc -lgdi -lsgeobj -lsgeobjd -lcull -lcomm -lcommlists -luti -lc -ldl -lm -lpthread
$(TACCLIB)$(SHAREDEXT): $(TACCLIB_OBJS) $(SGEOBJDLIB) $(COMMLIB) $(COMMLISTSLIB) $(CULLLIB) $(UTILIB)
$(SHAREDLD) $(SHARED_LFLAGS) -o $(TACCLIB)$(SHAREDEXT) $(TACCLIB_OBJS) -lsge -lcull -luti -ldl -lpthread
## libs/cull/Makefile
example1: $(EXAMPLE1_DEPS) sge_dlopen.o
$(LD_WRAPPER) $(CC) $(CFLAGS) -o example1 $(LFLAGS) $(EXAMPLE1_OBJS) $(LIBS) sge_dlopen.o
## libs/cull/Makefile
example1: $(EXAMPLE1_DEPS)
$(LD_WRAPPER) $(CC) $(CFLAGS) -o example1 $(LFLAGS) $(EXAMPLE1_OBJS) $(LIBS) -luti
Then build sge:
# ./aimk -no-java -no-gui-inst -debug -gprof -shared-libs -no-remote -no-qtcsh
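When the build finishes, there is a directory named after the architecture, for example LINUXAMD64, and all the useful build output is in it. Add that directory to the LD_LIBRARY_PATH environment variable, and Python can then use the ctypes module to call into the sge libraries.
The simplest usage looks like this: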
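As a minimal sketch of the loading step, the helper below tries to open a library from the architecture directory and returns None instead of raising when it is missing. The function name load_sge_library and the default file name are assumptions made for illustration, not part of sge.

```python
import ctypes
import os


def load_sge_library(arch_dir, name="libsge.so"):
    """Return a ctypes.CDLL handle for arch_dir/name, or None if loading fails.

    arch_dir is the architecture directory produced by the build
    (for example .../source/LINUXAMD64); name is an assumed default.
    """
    path = os.path.join(arch_dir, name)
    if not os.path.isfile(path):
        return None
    try:
        # CDLL raises OSError when the file or one of its dependencies
        # cannot be resolved by the dynamic linker.
        return ctypes.CDLL(path)
    except OSError:
        return None
```

Loading by full path only locates the first library. Its dependencies are still resolved by the dynamic linker, so LD_LIBRARY_PATH generally needs to be set before the Python process starts.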
>>> import ctypes
>>> libsge = ctypes.CDLL('/path/to/libsge.so')
>>> libsge.sge_eval_expression(6, "a*", "A")
0
>>> libsge.sge_eval_expression(6, "a*", "b")
1
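The corresponding C function prototype is shown below. Note that it takes a fourth argument, answer_list, which the calls above leave out: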
int sge_eval_expression(u_long32 type, const char *expr, const char *value, lList **answer_list)
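More complex expressions with special characters run into trouble. The call below should return 0 rather than an error. The cause is that Python 3 hands a str to ctypes as a wchar_t * by default, so the C code, which expects a char *, sees only the first character of the expression: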
>>> libsge.sge_eval_expression(6, '(sol-*64|linux|hp*)&!sol-sparc', 'hp11', None)
error: Parse error on position 1 of the expression "(".
-1
Passing bytes objects instead fixes it:
>>> libsge.sge_eval_expression(6, b'(sol-*64|linux|hp*)&!sol-sparc', b'hp11', None)
0
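The difference between the two calls can be reproduced without libsge. The sketch below builds the wide-character buffer that ctypes uses for a Python 3 str and then reads it the way C string functions would, stopping at the first NUL byte. The helper name bytes_seen_by_c is an assumption made for illustration.

```python
import ctypes


def bytes_seen_by_c(text):
    """Return what C code reading a `char *` would see for a wchar_t buffer."""
    # create_unicode_buffer() allocates a NUL-terminated array of wchar_t.
    buf = ctypes.create_unicode_buffer(text)
    # string_at() without a size reads bytes up to the first NUL byte,
    # which is how strlen() or strcpy() would treat the same memory.
    return ctypes.string_at(ctypes.addressof(buf))


expr = '(sol-*64|linux|hp*)&!sol-sparc'
print(bytes_seen_by_c(expr))   # truncated view of the wide string
print(expr.encode('ascii'))    # the full expression as a byte string
```

On a little-endian machine the first call yields only the opening parenthesis, which matches the parse error reported at position 1 of the expression "(".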
That does not mean everything works, though. Trying an expression of type TYPE_HOST ends in a segmentation fault:
>>> libsge.sge_eval_expression(7, b'Latte*', b'latte3.czech.sun.com', None)
Segmentation fault
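When a ctypes call takes the interpreter down like this, the standard-library faulthandler module can at least record which Python line made the call. This is a minimal sketch; the helper name enable_crash_trace and the log path are assumptions made for illustration.

```python
import faulthandler


def enable_crash_trace(log_path):
    """Write the Python traceback to log_path if the process receives
    a fatal signal such as SIGSEGV."""
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    # The caller must keep this file object alive: faulthandler writes
    # to its file descriptor at crash time.
    return log_file
```

This only shows the Python-side frames. The C-level frames inside the library still need a debugger.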
Debugging with gdb shows that the crash happens underneath sge_hostmatch():
Program received signal SIGSEGV, Segmentation fault.
bootstrap_get_ignore_fqdn () at ../libs/uti/sge_bootstrap.c:188
188 return bootstrap->get_ignore_fqdn(bootstrap);
#0 bootstrap_get_ignore_fqdn () at ../libs/uti/sge_bootstrap.c:188
#1 0x00007ffff1218d75 in sge_hostcpy (dst=0x7fffffffc050 " \301\377\377\377\177", raw=0x7fffffffc960 "latte*")
at ../libs/uti/sge_hostname.c:1167
#2 0x00007ffff1218f0b in sge_hostmatch (h1=<value optimized out>, h2=0x7fffffffc160 "latte3.czech.sun.com")
at ../libs/uti/sge_hostname.c:1291
#3 0x00007ffff11beb18 in MatchPattern (token_p=<value optimized out>, skip=<value optimized out>) at ../libs/sgeobj/sge_eval_expression.c:409
#4 0x00007ffff11bec0d in SimpleExpression (token_p=0x7fffffffd160, skip=false) at ../libs/sgeobj/sge_eval_expression.c:376
#5 0x00007ffff11bec2e in AndExpression (token_p=0x7fffffffd160, skip=false) at ../libs/sgeobj/sge_eval_expression.c:343
#6 0x00007ffff11beca9 in OrExpression (token_p=0x7fffffffd160, skip=false) at ../libs/sgeobj/sge_eval_expression.c:323
#7 0x00007ffff11bee30 in sge_eval_expression (type=7, expr=<value optimized out>, value=0x7ffff196ab78 "latte3.czech.sun.com",
answer_list=<value optimized out>) at ../libs/sgeobj/sge_eval_expression.c:167
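As the comment in the source explains, sge_hostmatch() is the equivalent of fnmatch() for hostnames, except that configuration settings decide how the hostnames are compared.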
// source/libs/uti/sge_hostname.c
/****** uti/hostname/sge_hostmatch() ********************************************
* NAME
* sge_hostmatch() -- fnmatch() for hostnames
*
* SYNOPSIS
* int sge_hostmatch(const char *h1, const char*h2)
*
* FUNCTION
* fnmatch() for hostnames. Honours some configuration values:
* - Domain name may be ignored
* - Domain name may be replaced by a 'default domain'
* - Hostnames may be used as they are.
*
* INPUTS
* const char *h1 - 1st hostname
* const char *h2 - 2nd hostname
*
* RESULT
* int - 0, 1 or -1
The actual failure occurs when executing return bootstrap->get_ignore_fqdn(bootstrap), which is tied to the ignore_fqdn parameter described in sge_bootstrap(5).
// source/libs/uti/sge_bootstrap.c
bool bootstrap_get_ignore_fqdn(void)
{
sge_bootstrap_state_class_t* bootstrap = NULL;
GET_SPECIFIC(sge_bootstrap_thread_local_t, handle, bootstrap_thread_local_init, sge_bootstrap_thread_local_key,
"bootstrap_get_ignore_fqdn");
bootstrap = handle->current;
return bootstrap->get_ignore_fqdn(bootstrap); // <= the crash happens here
}
#define GET_SPECIFIC(type, variable, init_func, key, func_name) \
type *variable = pthread_getspecific(key); \
if(variable == NULL) { \
int ret; \
variable = sge_malloc(sizeof(type)); \
init_func(variable); \
ret = pthread_setspecific(key, (void*)variable); \
if (ret != 0) { \
fprintf(stderr, "pthread_setspecific(%s) failed: %s\n", func_name, strerror(ret)); \
abort(); \
} \
}
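sge_hostmatch() needs to read the ignore_fqdn and default_domain parameters. Both can only be set at installation time; they cannot be changed on a system that is already running.
The main problem is that this function relies on per-thread (pthread) state. Inside sge that state is initialized through bootstrap_mt_init() and feature_mt_init(), but here Python calls sge_eval_expression() directly without any of that initialization, so fetching the thread-specific data fails.
With the default settings, sge_hostmatch() still does its comparison with fnmatch(); it only adds the special handling shown below. One option is therefore to use a different type in place of TYPE_HOST.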
void sge_hostcpy(char *dst, const char *raw)
{
bool ignore_fqdn = bootstrap_get_ignore_fqdn(); // <= the crash happens here
bool is_hgrp = is_hgroup_name(raw);
const char *default_domain;
if (dst == NULL || raw == NULL) {
return;
}
if (is_hgrp) { // a hostgroup name is compared as-is, without further processing
/* hostgroup name: not in FQDN format, copy the entire string*/
sge_strlcpy(dst, raw, CL_MAXHOSTLEN);
return;
}
if (ignore_fqdn) {
char *s = NULL;
/* standard: simply ignore FQDN */
sge_strlcpy(dst, raw, CL_MAXHOSTLEN);
if ((s = strchr(dst, '.'))) { // for compute-0-0.hpc.cn, only compute-0-0 is kept for the comparison
*s = '\0';
}
return;
}
/* ... skipped ... */
}
Alternatively, you can change the code so that it uses a fixed value instead of looking one up:
# diff ./source/libs/uti/sge_hostname.c.orig source/libs/uti/sge_hostname.c
1167c1167
< bool ignore_fqdn = bootstrap_get_ignore_fqdn();
---
> bool ignore_fqdn = 1;
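After rebuilding with this change, the same call no longer crashes.
The tests also show that the same expression gives different results depending on the type argument. The biggest difference between TYPE_HOST and the other types is that, under the default sge configuration, the domain part of the hostname is not compared.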
>>> libsge.sge_eval_expression(7, b'Latte*', b'latte3.czech.sun.com', None)
0
>>> libsge.sge_eval_expression(7, b'Latte*.hpc.cn', b'latte3.czech.sun.com', None)
0
>>> libsge.sge_eval_expression(2, b'Latte*.hpc.cn', b'latte3.czech.sun.com', None)
1
>>> libsge.sge_eval_expression(6, b'Latte*.hpc.cn', b'latte3.czech.sun.com', None)
1
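The TYPE_HOST results above can be modelled in pure Python. This is a rough sketch of the behaviour described in this article, not a port of the sge code. It assumes that host group names start with '@' and that pattern and value are lowercased before matching (the gdb trace above shows the pattern arriving as "latte*"). The default_domain branch skipped in the listing is left out.

```python
from fnmatch import fnmatchcase


def host_copy(raw, ignore_fqdn=True):
    """Model of sge_hostcpy(): decide which part of a name is compared."""
    if raw.startswith('@'):
        # Assumption: a leading '@' marks a host group, copied unchanged.
        return raw
    if ignore_fqdn:
        # Keep only the part before the first dot.
        return raw.split('.', 1)[0]
    return raw


def host_match(pattern, hostname, ignore_fqdn=True):
    """Return 0 on match and 1 otherwise, like the calls shown above."""
    p = host_copy(pattern.lower(), ignore_fqdn)
    h = host_copy(hostname.lower(), ignore_fqdn)
    return 0 if fnmatchcase(h, p) else 1


print(host_match('Latte*', 'latte3.czech.sun.com'))
print(host_match('Latte*.hpc.cn', 'latte3.czech.sun.com'))
print(host_match('Latte*.hpc.cn', 'latte3.czech.sun.com', ignore_fqdn=False))
```

With ignore_fqdn enabled, both the pattern and the hostname lose their domain part, which is why 'Latte*.hpc.cn' still matches a host in a different domain.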
To be more rigorous, you can declare the function's argument types; see the official Python ctypes documentation for details.
>>> class lList(ctypes.Structure):
... pass
...
>>> libsge = ctypes.CDLL('libsge.so')
>>> libsge.sge_eval_expression.argtypes = [ctypes.c_uint32, ctypes.c_char_p, ctypes.c_char_p, ctypes.POINTER(ctypes.POINTER(lList))]
>>>
>>> TYPE_INT = 1
>>> TYPE_FIRST = TYPE_INT
>>> TYPE_STR = 2
>>> TYPE_TIM = 3
>>> TYPE_MEM = 4
>>> TYPE_BOO = 5
>>> TYPE_CSTR = 6
>>> TYPE_HOST = 7
>>> TYPE_DOUBLE = 8
>>> TYPE_RESTR = 9
>>> TYPE_CE_LAST = TYPE_RESTR
>>>
>>> libsge.sge_eval_expression(TYPE_CSTR, b'(sol-*64|linux|hp*)&!sol-sparc', b'hp11', None)
0
>>> libsge.sge_eval_expression(TYPE_CSTR, b"a*", b"A", None)
0
>>> libsge.sge_eval_expression(TYPE_STR, b"a&", b"a", None)
error: Parse error on position 2 of the expression "a&".
-1
>>> libsge.sge_eval_expression(TYPE_STR, b"a*", b"A", None)
1
>>> libsge.sge_eval_expression(TYPE_HOST, b'Latte*', b'latte3.czech.sun.com', None)
0
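The pieces above can be collected into a small wrapper. This is a sketch under assumptions: the names ExprType, declare_prototype and eval_expression are illustrative, u_long32 is assumed to be a 32-bit unsigned integer, answer_list is always passed as NULL, and the return codes are taken to mean 0 for a match, 1 for no match and -1 for a parse error, as the sessions above suggest.

```python
import ctypes
from enum import IntEnum


class ExprType(IntEnum):
    """Type constants as listed in the session above."""
    INT = 1
    STR = 2
    TIM = 3
    MEM = 4
    BOO = 5
    CSTR = 6
    HOST = 7
    DOUBLE = 8
    RESTR = 9


def declare_prototype(lib):
    """Attach argument and return types to lib.sge_eval_expression."""
    func = lib.sge_eval_expression
    # Assumption: u_long32 is 32 bits wide; answer_list is treated as an
    # opaque pointer because this sketch only ever passes NULL for it.
    func.argtypes = [ctypes.c_uint32, ctypes.c_char_p,
                     ctypes.c_char_p, ctypes.c_void_p]
    func.restype = ctypes.c_int
    return func


def eval_expression(lib, expr_type, expr, value):
    """Return True on match, False on no match; raise ValueError on -1."""
    rc = lib.sge_eval_expression(int(expr_type),
                                 expr.encode('ascii'),
                                 value.encode('ascii'),
                                 None)
    if rc == -1:
        raise ValueError('cannot parse expression: %r' % expr)
    return rc == 0
```

Encoding inside the wrapper means callers can pass ordinary str values and never hit the wchar_t truncation described earlier. With a loaded library, a call would look like eval_expression(libsge, ExprType.CSTR, 'a*', 'A') after declare_prototype(libsge).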