Simon
2018-06-01 00:36:06 UTC
Hello tor developers,
I've had limited contact with nickm and dmr and I was encouraged to
solicit interest and comments via the tor-dev mailing list...
I'm interested in contributing a new feature to little-t tor in the form of
a new log severity level called 'trace'. Currently there are 5 severity
levels [1]; debug, info, notice, warn, and err. trace would be more verbose
than debug, and trace function calls at runtime. It would also be
configurable -- perhaps by environment variables and/or command line
options -- regarding which functions get traced and when at runtime.
Because introducing function level tracing likely also introduces a
performance penalty, I'm suggesting that this feature would be optionally
compiled in at build time.
The use of the new log severity level 'trace' would be for new developers
to better comprehend the tor source code by giving them a further resource
-- in addition to the source code, tests, documentation, and IRC chat --
to learn from. The debug log will be more useful in the context of the
function call tree, and function usage is easier to grasp than battling
through the ~ 165k LOC. In addition, any developer who likes to debug via
the log files will appreciate having the extra function call tree context.
I have a basic, generic working proof of concept which produces output when
running tor. Here is a snippet of the output:
...
T125580 +
directory_initiate_command_routerstatus_rend() { // #63
T125580 + node_get_by_id() { // #278
T125580 }
T125580 + directory_initiate_command_rend() { // #63
T125580 + directory_command_should_use_begindir() { // #63
T125580 + fascist_firewall_allows_address_or()
{ // #5814
T125580 + compare_tor_addr_to_addr_policy() { // #6009
T125580 }
T125580 }
T125580 + directory_fetches_from_authorities() { // #126
T125580 + server_mode() { // #6859
T125580 }
T125580 +
router_my_exit_policy_is_reject_star() { // #128
T125580 + router_get_my_routerinfo() { // #130
T125580 }
T125580 }
T125580 }
T125580 }
...
Explanation of the snippet: '<thread ID> <indent> + <function>() {
#<calls>' means <function> has been called <calls> times. <indent>
represents the function call tree indent level. '}' means the function
returns.
One can introduce extra info not implemented yet such as timestamps,
calling parameters, return value, and extra info specified by the developer.
Regarding regular tor log output: This could be mixed in with the trace
output, e.g.:
...
T125580 + router_get_my_routerinfo() { // #130
T125580 - info 1 of 2: abc <-- non trace
T125580 - info 1 of 2: def <-- non trace
T125580 }
...
This email is mainly to determine if there is a interest for such a tor
feature to be implemented further. AFAIK it does not seem to be implemented
already and does not seem to exist as a documented helpful tool [2]. And
I'm not aware of a third party tool which would 'just work' with tor.
Strategies for implementing the function trace:
I'll now discuss the strategies which I know about for implementing the
function trace, including the one I want to use:
1. Manual instrumentation: Here each function is manually instrumented in
the tor source code with entry and exit macros. This is time consuming and
error prone. And it can make the source code less readable.
2. Using the gcc -finstrument-functions [3] command line switch: There are
many examples [4] [5] of using this to generate a runtime function call
tree. But this mechanism causes each function to make two extra calls
whether you want that or not, so performance might be poor. Plus getting
hold of the actual function names and displaying them can be tricky; only
the function address is given to the entry and exit functions. Also, there
is no way to access the function parameters or return value.
3. Using the llvm xray [6] [7] feature: This was originally contributed to
llvm by Google but is more designed for inspecting very high performance
call trees in a production environment. A sort of dtrace for user level
functions. However, it also doesn't log function parameters or the return
value AFAIK. And logging happens via an intermediate binary file which must
get converted to something ascii / human friendly. This also means it's
tricky to mix in the existing tor log output, for one nice seemless log
file.
4. Using gcc compile to assembler command line switch: AFAIK this technique
was first developed by myself. The idea is to replace gcc with an own
script which pretends to be gcc and invokes the real gcc after making
subtle changes. If a makefile tries to compile a .c file to a .o file then
the pretend gcc will change the behaviour so that an intermediate assembler
file will be generated. However, the assembler file gets changed so that
every function call becomes an indirect function call via a pointer. This
means we can change the pointer to execute either the original function
(with the overhead of an indirect function call), or a wrapper function
with the same declaration. The wrapper function has access to all the
function parameters and return value, can log the entry and exit, while
calling the original function which got wrapped.
I have chosen implementation strategy #4 because it provides the most
flexibility without changing any existing source code or makefile. A
limitation is that it only works currently with gcc on Intel. However, msvc
& llvm might have a similar compile to assembly feature. And in theory the
same technique could be implemented for non-Intel chips if they have an
indirect call assembly instruction.
An interesting side effect of strategy #4 is that it could possibly be used
to enhance testing in two ways: 1. If the test is for a particular function
or set of functions then trace could be preconfigured to provide more
detailed output for just those functions, and any functions they call. 2.
The indirect function pointer can be changed at runtime if desired to
create a kind of general purpose function mocking facility for testing.
However, I believe tor tests already do their own mocking.
Here is an example of the proof of concept auto generated C code for the
function firewall_is_fascist_or():
extern unsigned long long cwrap_count_call_firewall_is_fascist_or;
extern int (* cwrap_indirect_orig_firewall_is_fascist_or) (void);
extern int (* cwrap_indirect_wrap_firewall_is_fascist_or) (void);
extern int (* cwrap_indirect_mock_firewall_is_fascist_or) (void);
static int cwrap_wrap_firewall_is_fascist_or (void)
{
cwrap_count_call_firewall_is_fascist_or ++;
// todo: add per binary/file/function verbosity checking
// todo: show function parameters for simple types, e.g. foo=123, bar=7
if (cwrap_count_call_firewall_is_fascist_or <= 10000) cwrap_log(1, "+
firewall_is_fascist_or() { // #%llu\n",
cwrap_count_call_firewall_is_fascist_or);
int cwrap_result = (* cwrap_indirect_orig_firewall_is_fascist_or)();
if (cwrap_count_call_firewall_is_fascist_or <= 10000) cwrap_log(-1,
"}\n");
// todo: show return value for simple types
return cwrap_result;
}
So the pretend gcc doubles the number of functions, creating a 'wrap'
version of each regular function. While the changed assembler files change
the assembler call function into an indirect call function which uses the
newly created indirect function pointer. In theory the makefile just works
and is unmodified. However, extra files get created during make.
Looking forward to comments, suggestions, and/or votes for and against,
even if it's only to point out that something with similar functionality
exists already.
--
Simon
[1] https://www.torproject.org/docs/tor-manual.html.en
[2] https://gitweb.torproject.org/tor.git/tree/doc/HACKING/HelpfulTools.md
[3]
https://stackoverflow.com/questions/6176284/why-doesnt-finstrument-functions-work-for-me
[4]
https://balau82.wordpress.com/2010/10/06/trace-and-profile-function-calls-with-gcc/
[5]
https://mcuoneclipse.com/2015/04/04/poor-mans-trace-free-of-charge-function-entryexit-trace-with-gnu-tools/
[6] https://www.youtube.com/watch?reload=9&v=jyL-__zOGcU
[7] https://llvm.org/docs/XRay.html
I've had limited contact with nickm and dmr and I was encouraged to
solicit interest and comments via the tor-dev mailing list...
I'm interested in contributing a new feature to little-t tor in the form of
a new log severity level called 'trace'. Currently there are 5 severity
levels [1]; debug, info, notice, warn, and err. trace would be more verbose
than debug, and trace function calls at runtime. It would also be
configurable -- perhaps by environment variables and/or command line
options -- regarding which functions get traced and when at runtime.
Because introducing function level tracing likely also introduces a
performance penalty, I'm suggesting that this feature would be optionally
compiled in at build time.
The use of the new log severity level 'trace' would be for new developers
to better comprehend the tor source code by giving them a further resource
-- in addition to the source code, tests, documentation, and IRC chat --
to learn from. The debug log will be more useful in the context of the
function call tree, and function usage is easier to grasp than battling
through the ~ 165k LOC. In addition, any developer who likes to debug via
the log files will appreciate having the extra function call tree context.
I have a basic, generic working proof of concept which produces output when
running tor. Here is a snippet of the output:
...
T125580 +
directory_initiate_command_routerstatus_rend() { // #63
T125580 + node_get_by_id() { // #278
T125580 }
T125580 + directory_initiate_command_rend() { // #63
T125580 + directory_command_should_use_begindir() { // #63
T125580 + fascist_firewall_allows_address_or()
{ // #5814
T125580 + compare_tor_addr_to_addr_policy() { // #6009
T125580 }
T125580 }
T125580 + directory_fetches_from_authorities() { // #126
T125580 + server_mode() { // #6859
T125580 }
T125580 +
router_my_exit_policy_is_reject_star() { // #128
T125580 + router_get_my_routerinfo() { // #130
T125580 }
T125580 }
T125580 }
T125580 }
...
Explanation of the snippet: '<thread ID> <indent> + <function>() {
#<calls>' means <function> has been called <calls> times. <indent>
represents the function call tree indent level. '}' means the function
returns.
One can introduce extra info not implemented yet such as timestamps,
calling parameters, return value, and extra info specified by the developer.
Regarding regular tor log output: This could be mixed in with the trace
output, e.g.:
...
T125580 + router_get_my_routerinfo() { // #130
T125580 - info 1 of 2: abc <-- non trace
T125580 - info 1 of 2: def <-- non trace
T125580 }
...
This email is mainly to determine if there is a interest for such a tor
feature to be implemented further. AFAIK it does not seem to be implemented
already and does not seem to exist as a documented helpful tool [2]. And
I'm not aware of a third party tool which would 'just work' with tor.
Strategies for implementing the function trace:
I'll now discuss the strategies which I know about for implementing the
function trace, including the one I want to use:
1. Manual instrumentation: Here each function is manually instrumented in
the tor source code with entry and exit macros. This is time consuming and
error prone. And it can make the source code less readable.
2. Using the gcc -finstrument-functions [3] command line switch: There are
many examples [4] [5] of using this to generate a runtime function call
tree. But this mechanism causes each function to make two extra calls
whether you want that or not, so performance might be poor. Plus getting
hold of the actual function names and displaying them can be tricky; only
the function address is given to the entry and exit functions. Also, there
is no way to access the function parameters or return value.
3. Using the llvm xray [6] [7] feature: This was originally contributed to
llvm by Google but is more designed for inspecting very high performance
call trees in a production environment. A sort of dtrace for user level
functions. However, it also doesn't log function parameters or the return
value AFAIK. And logging happens via an intermediate binary file which must
get converted to something ascii / human friendly. This also means it's
tricky to mix in the existing tor log output, for one nice seemless log
file.
4. Using gcc compile to assembler command line switch: AFAIK this technique
was first developed by myself. The idea is to replace gcc with an own
script which pretends to be gcc and invokes the real gcc after making
subtle changes. If a makefile tries to compile a .c file to a .o file then
the pretend gcc will change the behaviour so that an intermediate assembler
file will be generated. However, the assembler file gets changed so that
every function call becomes an indirect function call via a pointer. This
means we can change the pointer to execute either the original function
(with the overhead of an indirect function call), or a wrapper function
with the same declaration. The wrapper function has access to all the
function parameters and return value, can log the entry and exit, while
calling the original function which got wrapped.
I have chosen implementation strategy #4 because it provides the most
flexibility without changing any existing source code or makefile. A
limitation is that it only works currently with gcc on Intel. However, msvc
& llvm might have a similar compile to assembly feature. And in theory the
same technique could be implemented for non-Intel chips if they have an
indirect call assembly instruction.
An interesting side effect of strategy #4 is that it could possibly be used
to enhance testing in two ways: 1. If the test is for a particular function
or set of functions then trace could be preconfigured to provide more
detailed output for just those functions, and any functions they call. 2.
The indirect function pointer can be changed at runtime if desired to
create a kind of general purpose function mocking facility for testing.
However, I believe tor tests already do their own mocking.
Here is an example of the proof of concept auto generated C code for the
function firewall_is_fascist_or():
extern unsigned long long cwrap_count_call_firewall_is_fascist_or;
extern int (* cwrap_indirect_orig_firewall_is_fascist_or) (void);
extern int (* cwrap_indirect_wrap_firewall_is_fascist_or) (void);
extern int (* cwrap_indirect_mock_firewall_is_fascist_or) (void);
static int cwrap_wrap_firewall_is_fascist_or (void)
{
cwrap_count_call_firewall_is_fascist_or ++;
// todo: add per binary/file/function verbosity checking
// todo: show function parameters for simple types, e.g. foo=123, bar=7
if (cwrap_count_call_firewall_is_fascist_or <= 10000) cwrap_log(1, "+
firewall_is_fascist_or() { // #%llu\n",
cwrap_count_call_firewall_is_fascist_or);
int cwrap_result = (* cwrap_indirect_orig_firewall_is_fascist_or)();
if (cwrap_count_call_firewall_is_fascist_or <= 10000) cwrap_log(-1,
"}\n");
// todo: show return value for simple types
return cwrap_result;
}
So the pretend gcc doubles the number of functions, creating a 'wrap'
version of each regular function. While the changed assembler files change
the assembler call function into an indirect call function which uses the
newly created indirect function pointer. In theory the makefile just works
and is unmodified. However, extra files get created during make.
Looking forward to comments, suggestions, and/or votes for and against,
even if it's only to point out that something with similar functionality
exists already.
--
Simon
[1] https://www.torproject.org/docs/tor-manual.html.en
[2] https://gitweb.torproject.org/tor.git/tree/doc/HACKING/HelpfulTools.md
[3]
https://stackoverflow.com/questions/6176284/why-doesnt-finstrument-functions-work-for-me
[4]
https://balau82.wordpress.com/2010/10/06/trace-and-profile-function-calls-with-gcc/
[5]
https://mcuoneclipse.com/2015/04/04/poor-mans-trace-free-of-charge-function-entryexit-trace-with-gnu-tools/
[6] https://www.youtube.com/watch?reload=9&v=jyL-__zOGcU
[7] https://llvm.org/docs/XRay.html