Thanks to visit codestin.com
Credit goes to github.com

Skip to content

profile.sh misdetects login shell #473

@mw-a

Description

@mw-a

Describe the bug

We're seeing error messages from OpenPBS jobs as generated by the reproducer further down. This happens for users whose login shell is /bin/tcsh but override their job to use /bin/bash for executing the job script. It can be traced to /etc/profile.d/modules.sh which sources /usr/share/Modules/init/tcsh because it wrongly detects the executing shell to be tcsh. Further investigation reveals this to be caused by the optimisation/fix introduced in #173 to use the $BASH variable to make detection more efficient and robust. Unfortunately this runs afoul of two very specific circumstances in our scenario:

  • PBS executes /bin/bash with argv[0] set to -bash in order to get it to behave like a login shell and then feeds the path to the job script to execute to it via stdin
  • bash in turn defaults to using the login shell value from getpwent() when called as a login shell (without further information)

To Reproduce

Steps to reproduce the behavior:

  1. Create a user with login shell tcsh:
$ getent passwd $USER
testuser:*:237:1120:test user:/home/testuser:/bin/tcsh
  1. Have them execute the following minimal reproducer command:
$ echo echo '$BASH' | bash -c 'exec -a -bash /bin/bash'
/usr/share/Modules/init/tcsh: eval: line 3: syntax error near unexpected token `alias'
/usr/share/Modules/init/tcsh: eval: line 3: `if ( $?histchars && $?prompt ) alias module 'set _histchars = $histchars; unset histchars; set _prompt=$prompt:q; set prompt=""; eval "`/usr/bin/tclsh /usr/share/Modules/libexec/modulecmd.tcl tcsh \!*:q`"; set _exit="$status"; set histchars = $_histchars; unset _histchars; set prompt=$_prompt:q; unset _prompt; test 0 = $_exit' ;'
/usr/share/Modules/init/tcsh: line 15: syntax error near unexpected token `alias'
/usr/share/Modules/init/tcsh: line 15: `      if ( $?histchars && $?prompt ) alias module 'set _histchars = $histchars; unset histchars; set _prompt=$prompt:q; set prompt=""; eval `/usr/share/Modules/libexec/modulecmd-compat tcsh \!*`; set _exit="$status";  set histchars = $_histchars; unset _histchars; set prompt=$_prompt:q; unset _prompt; test 0 = $_exit' ;'
/bin/tcsh

No modulerc or modulefiles are involved in this problem because it's the initialisation process that's failing.

Expected behavior

The shell should execute the command without error messages.

Error and debugging information

It could be argued whether bash's or PBS's behaviour are strictly correct here. On the other hand, changing their established behaviour, particularly that of bash, is likely to break other things. So any ideas how to work around this problem on the modules level would be very much appreciated.

Modules version and configuration

$ module --version
Modules Release 4.5.2 (2020-07-30)
$ module config --dump-state
Modules Release 4.5.2 (2020-07-30)

- Config. name ---------.- Value (set by if default overridden) ---------------
advanced_version_spec     0
auto_handling             0
avail_indepth             1
avail_report_dir_sym      1
avail_report_mfile_sym    1
collection_pin_version    0
collection_target         <undef>
color                     never
colors                    hi=1:db=2:se=2:er=91:wa=93:me=95:in=94:mp=1;94:di=94:al=96:sy=95:de=4:cm=92
contact                   root@localhost
extended_default          0
extra_siteconfig          <undef>
home                      /usr/share/Modules (env-var)
icase                     never
ignored_dirs              CVS RCS SCCS .svn .git .SYNC .sos
implicit_default          1
locked_configs
ml                        1
pager                     /usr/bin/less -eFKRX
rcfile                    <undef>
run_quarantine            <undef>
search_match              starts_with
set_shell_startup         0
silent_shell_debug        <undef>
siteconfig                /etc/environment-modules/siteconfig.tcl
tcl_ext_lib               /usr/lib64/libtclenvmodules.so
term_background           dark
unload_match_order        returnlast
verbosity                 normal
wa_277                    0

- State name -----------.- Value ----------------------------------------------
cmdline                   /usr/share/Modules/libexec/modulecmd.tcl tcsh config --dump-state --no-pager
extra_siteconfig_loaded   0
force                     0
is_stderr_tty             1
is_win                    0
machine                   x86_64
os                        Linux 4.18.0-372.9.1.el8.x86_64
pager_started             0
paginate                  0
path_separator            :
rc_loaded                 <undef>
siteconfig_loaded         1
shell                     tcsh
shelltype                 csh
subcmd                    config
subcmd_args               --dump-state
tcl_ext_lib_loaded        1
tcl_version               8.6.8
term_columns              158

- Env. variable --------.- Value ----------------------------------------------
LOADEDMODULES
MODULEPATH                /app/modulefiles:/app/modulefiles:/usr/share/Modules/modulefiles:/etc/modulefiles:/usr/share/modulefiles
MODULEPATH_modshare       /usr/share/modulefiles:1:/usr/share/Modules/modulefiles:1:/etc/modulefiles:1
MODULESHOME               /usr/share/Modules
MODULES_CMD               /usr/share/Modules/libexec/modulecmd.tcl
MODULES_USE_COMPAT_VERSION 0

Additional context

  • We're experiencing this on RHEL 8.6 but have no reason to believe this wouldn't affect any newer version of modules as well.
  • In our use-case the error messages are purely cosmetic but annoying and generate support churn. It's also likely that there are cases where this additional error output might confuse software which is parsing the output of commands it runs in a similar fashion.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions