Sunday 12 July 2015

second x11 server

I am trying to start a second X session over ssh.

http://unix.stackexchange.com/questions/85383/how-to-start-a-second-x-session
solution:

run startx -- :1



    Tuesday 7 July 2015

    paraview study

    The reason I decided to start looking at ParaView is that it gives a flexible view of the meshes and everything else in the model.

    View the mesh numbering:
    1. In the layout window choose select mesh, then click and drag to select the desired cells.
    2. Open the View tab -> Selection Display Inspector.
    3. In the newly opened Selection Display Inspector -> Cell Labels, choose the label to display.

    Thursday 25 June 2015

    touch2 progress.

    Got the code. Compiling with the provided makefile fails because the compiler cannot accept the constant 1e50.
    This can be worked around with the gfortran flag:
    -fdefault-real-8

    At the moment the file libaztec.a is missing.
    A download request has been submitted.


    Friday 19 June 2015

    try to use latex to update blogs

    $\Psi$
    \[
    \begin{aligned}
    \nabla \times \vec{\mathbf{B}} -\, \frac1c\, \frac{\partial\vec{\mathbf{E}}}{\partial t} & = \frac{4\pi}{c}\vec{\mathbf{j}} \\
    \nabla \cdot \vec{\mathbf{E}} & = 4 \pi \rho \\
    \nabla \times \vec{\mathbf{E}}\, +\, \frac1c\, \frac{\partial\vec{\mathbf{B}}}{\partial t} & = \vec{\mathbf{0}} \\
    \nabla \cdot \vec{\mathbf{B}} & = 0
    \end{aligned}
    \]
    \chapter{}
    \section{INTRODUCTION}
    \textbf{Duration}: January 2014 to Continuing
    \textbf{Name of the Employer}: School of Civil Engineering
    \textbf{Designation}: Research
    \section{BACKGROUND}
    \N Urban network sewage network serves

    Monday 8 June 2015

    openbox study

    To get used to another window manager, one based on the desktop metaphor, I decided to dig further into Openbox, which can run alongside i3wm. Here are the updates.

    (1) tint2 works very well with Openbox. It provides a taskbar at the bottom of the screen, which makes the desktop feel very much like Windows.
    (2) The following script works perfectly as a keyboard shortcut.
    It should be stored in .config/openbox/rc.xml (see ubuntu in toshiba).
    What took time to configure is that the W-Left binding has to be placed at the end of the keybinding list.

    (3) Current problem: one of the reasons I want this setup is that I can work at home on a Windows machine through tightvnc. However, Windows itself relies heavily on the Windows key, which makes the keybindings inside tightvnc useless. TO 13-06-08



    Sunday 7 June 2015

    tightvnc and vino study

    I decided to work on VNC because I need to work from home with a graphical interface. The main reason to display the whole desktop remotely is that X11 forwarding is still too slow, at least for MATLAB figures.


    1. Install vnc.
    # Make sure Debian is the latest and greatest.
    # Note, for copy and paste between the local OS and the server, see:
    # http://raspberrypi.stackexchange.com/questions/4474/tightvnc-copy-paste-between-local-os-and-raspberry-pi
    apt-get install xorg lxde-core tightvncserver

    # Start VNC to create config file for password (run as a normal user)
    tightvncserver :1
    # Then stop VNC
    tightvncserver -kill :1
    # Edit config file to start session with LXDE:
    vim ~/.vnc/xstartup
    # Add this at the bottom of the file:

    #!/bin/sh
    xrdb $HOME/.Xresources
    #xsetroot -solid grey
    #x-terminal-emulator -geometry 80x24+10+10 -ls -title "$VNCDESKTOP Desktop" &
    #x-window-manager &
    # Fix to make GNOME work
    export XKL_XMODMAP_DISABLE=1
    #/etc/X11/Xsession
    #lxterminal &
    #/usr/bin/lxsession -s LXDE &
    exec openbox-session &
    #--------------  openbox -----------------------------
    #!/bin/sh
    xrdb $HOME/.Xresources
    xsetroot -solid grey
    x-terminal-emulator -geometry 80x24+10+10 -ls -title "$VNCDESKTOP Desktop" &
    openbox &
    # ------   working for twm  ------
    xrdb $HOME/.Xresources
    xsetroot -solid grey
    xterm -geometry 80x24+10+10 -ls -title "$VNCDESKTOP Desktop" &
    twm &
    # --------------------- a working solution for i3----------------------------
    xrdb $HOME/.Xresources
    xsetroot -solid grey
    x-terminal-emulator -geometry 80x24+10+10 -ls -title "$VNCDESKTOP Desktop" &
    i3 &


    and make sure this file is executable.


    # Restart VNC
    tightvncserver :1   # works on a Debian distro but not on Gentoo

    On the client, the steps are:
    1. Open a VNC viewer (for example the app within Chrome) and enter
    192.168.0.4::5901
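
The port in the address above follows from the display number: a VNC server on display :N listens on TCP port 5900 + N. A small sketch of that arithmetic; the function names are made up for illustration, and the `host::port` form is simply the one used in the note above.

```python
VNC_BASE_PORT = 5900  # display :0 maps to port 5900


def vnc_port(display):
    """Return the TCP port used by VNC display number `display`."""
    if display < 0:
        raise ValueError("display number must be non-negative")
    return VNC_BASE_PORT + display


def viewer_address(host, display):
    """Build the host::port string typed into the viewer."""
    return "%s::%d" % (host, vnc_port(display))


if __name__ == "__main__":
    print(viewer_address("192.168.0.4", 1))
```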


    For the office (Gentoo) machine, I use
     vncviewer ae429-1105:1  # from an ssh session connected to the office

    vncserver :1 -geometry 1440x900 -depth 24  # start the server
    pkill Xvnc # stop the server
    vncserver -geometry  1680x1010 -depth 24 :1  # works perfectly with the office machine

    2. Solving the X11 display problem when combining tightvnc with tmux.
    The problem with this combination is that when a tmux session that was started locally (i.e., with DISPLAY=:0) is attached inside tightvnc, all X11 windows pop up on the local screen rather than in the tightvnc window. The way to solve it is:
    a. Run "echo $DISPLAY" in a terminal inside the tightvnc window.
    b. Attach the tmux session.
    c. Set the DISPLAY variable to the value shown in step a, e.g., "export DISPLAY=:1".

    This suggests that every time the working environment changes (e.g., from local use to ssh), the environment variables need to be updated.
    More reading: http://unix.stackexchange.com/questions/75681/why-do-i-have-to-re-set-env-vars-in-tmux-when-i-re-attach
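
To compare the value seen inside the VNC terminal with the stale one in the tmux session, it helps to split DISPLAY into its parts. A minimal sketch, assuming the usual `host:display.screen` form; `parse_display` is a made-up helper, not part of tmux or X11.

```python
def parse_display(value):
    """Split an X11 DISPLAY string into (host, display, screen).

    ":1" means the local machine, display 1, screen 0.
    """
    if ":" not in value:
        raise ValueError("not a DISPLAY value: %r" % value)
    host, _, rest = value.rpartition(":")
    number, _, screen = rest.partition(".")
    return host, int(number), int(screen) if screen else 0


if __name__ == "__main__":
    stale = parse_display(":0")    # value kept by the old tmux session
    wanted = parse_display(":1")   # value shown in the tightvnc terminal
    if stale != wanted:
        print("run inside tmux: export DISPLAY=:%d" % wanted[1])
```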

    3. Is vino a good package?
    Ubuntu uses Vino for remote desktop connections. Although it shows exactly the same desktop as the local session, it is very slow compared with tightvnc (pretty much like qq remote).
    To set up such a connection:
    a. Run vino-preferences and enable desktop sharing.
    b. Tricky part: apt-get install dconf-tool, then go to org->gnome->desktop->applications->remote-access and deselect require-encryption.
    c. Use a VNC application to connect to the desktop remotely.

    4. Scaling in tightvnc viewer.
    The tightvnc viewer does not support changing the resolution dynamically, but the Windows tightvncviewer can scale the display to follow changes of the window size, which is pretty convenient.

    5. Working seamlessly with termserv.
    Somehow the termserv machine does not allow direct connections to port 5901, which is the tightvnc port for display :1. I found a workaround:
    1. Open putty -> tunnel -> source port 5901, destination 127.0.0.1:5901, choose local, choose auto.
    2. In tightvncviewer, connect to 127.0.0.1:5901. You should then see the desktop perfectly.
    In tightvncviewer one can scale the size of the window, so changing the resolution dynamically is not an issue at all.
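
On a Linux client the same tunnel can be opened with OpenSSH instead of putty. The sketch below only builds the argument list for `ssh -L`; the gateway name `user@termserv` is a placeholder, not the real login.

```python
def local_forward_args(local_port, dest_host, dest_port, gateway):
    """Return the ssh command line for a local port forward.

    Mirrors the putty settings: source port `local_port`,
    destination `dest_host:dest_port` as seen from the gateway.
    """
    spec = "%d:%s:%d" % (local_port, dest_host, dest_port)
    # -N: open the tunnel only, do not run a remote command
    return ["ssh", "-N", "-L", spec, gateway]


if __name__ == "__main__":
    # "user@termserv" is hypothetical; substitute the real account and host
    print(" ".join(local_forward_args(5901, "127.0.0.1", 5901, "user@termserv")))
```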
    6. Working seamlessly with linux
    The problem with using Linux as the client that controls a VNC server is that some keys (e.g., the Win key) have to be remapped, so that the host machine and the guest machine each have their own key for the same function. (The user does not want a press of the Win key to trigger the Win action on both host and guest; instead, one key should work for the host machine and another one for the guest.)

    Resolving this problem requires knowledge of the desktop environment on both the client and the host machine. This paragraph only covers the situation where the client runs i3wm (using mod4 as the function key).

    If the server runs Openbox, one can add the following snippet to rc.xml to get aerosnap behaviour:
    <!-- HalfLeftScreen -->
    <keybind key="C-Left">
      <action name="UnmaximizeFull"/>
      <action name="MoveResizeTo"><x>0</x><y>0</y><height>97%</height><width>50%</width></action>
    </keybind>
    <!-- HalfRightScreen -->
    <keybind key="C-Right">
      <action name="UnmaximizeFull"/>
      <action name="MoveResizeTo"><x>-0</x><y>0</y><height>97%</height><width>50%</width></action>
    </keybind>
    If the server runs i3wm, one can change the mod key to the right Windows key.
    Note that Openbox and i3wm differ in how the master key is handled. In i3wm one can assign any key as the master key, so remapping the master key to another key solves the problem. Openbox instead always uses an 'absolute address' to define a key (C represents Control, W represents the Windows key). This is harder to change, unless one defines keybindings that do not conflict with other keybindings, just like the snippet above.




    Saturday 6 June 2015

    mac os study

    Bought a Mac for my wife as the previous PC was almost 6 years old. The other reason is that I can really look into how POSIX is applied across different operating systems.
    Seriously, a Linux/Unix user does not feel any problem in using the Mac system at all. Instead, from my experience, Unity and GNOME 3 are just free replicas of the Mac system!!! Here is the list of how I improved the Mac.

    (1) To make directory paths visible atop Finder windows, open Terminal.app (/Applications/Utilities/) and type the following command:

    defaults write com.apple.finder _FXShowPosixPathInTitle -bool YES


    (2) Never format the disk with a case-sensitive file system, as some applications (e.g., Adobe apps) do not support case-sensitive journaled volumes.



    (3) Make the terminal better looking.
    Follow this website: (http://osxdaily.com/2013/02/05/improve-terminal-appearance-mac-os-x/)
    Execute the following commands in Terminal:

    export PS1="\[\033[36m\]\u\[\033[m\]@\[\033[32m\]\h:\[\033[33;1m\]\w\[\033[m\]\$ "
    export CLICOLOR=1
    export LSCOLORS=ExFxBxDxCxegedabagacad
    alias ls='ls -GFh'


    (4) How to cut (move) files:

    Copy with command+c,
    then go to the destination and paste with command+option+v to move the files.

    (5) At the moment the NTFS driver cannot write. Also, my ext4 journal cannot be read or mounted.
    Solution: install macfuse and ntfs-3g.

    (6) Once an X application is opened from an ssh terminal, the X application gets stuck in a start-and-exit loop.
    Solution:

    https://xquartz.macosforge.org/trac/ticket/589
    rm -rf /tmp/.X11-unix
    mkdir /tmp/.X11-unix

    Thursday 4 June 2015

    flopy

    install:
    pip install flopy --upgrade

    I got stuck setting the path to the MODFLOW executable.
    It seems that the executable file


    >>> mf = flopy.modflow.Modflow(modelname, exe_name='mf2005')
    >>> success, buff = mf.run_model()
    FloPy is using the following executable to run the model: mf2005
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python2.7/site-packages/flopy/mbase.py", line 375, in run_model
        stdout=sp.PIPE, cwd=self.model_ws)
      File "/usr/lib64/python2.7/subprocess.py", line 710, in __init__
        errread, errwrite)
      File "/usr/lib64/python2.7/subprocess.py", line 1335, in _execute_child
        raise child_exception


    OSError: [Errno 2] No such file or directory
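
Errno 2 from subprocess usually means the program named in exe_name could not be found. A minimal sketch of a check to run before run_model(), assuming Python 3 (for `shutil.which`); `resolve_executable` is a made-up helper, not part of flopy.

```python
import shutil


def resolve_executable(name):
    """Return the full path of `name`, or raise if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(
            "%r was not found on PATH; add its directory to PATH "
            "or pass the full path of the executable" % name
        )
    return path


if __name__ == "__main__":
    try:
        print("will run:", resolve_executable("mf2005"))
    except FileNotFoundError as err:
        print(err)
```

The resolved path could then be handed to flopy as exe_name in place of the bare 'mf2005'.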



    Monday 18 May 2015

    swmmout study

    swmmout

    swmmtoolbox catalog sydney_sps.out

    get all the variables in the outfile

    swmmtoolbox extract sydney_sps.out link,link597,1 >> link597.txt
    swmmtoolbox extract sydney_sps.out link,link597,1 link,link597,2 link,link597,3 link,link597,4 >link597




    Link variable indices:
    0 -- flow rate
    1 -- height (flow depth)
    2 -- flow velocity
    3 -- Froude number
    4 -- capacity


    in dropdown,

    listdetail
    getdata        DEPRECATED: Use 'extract' instead.

    swmmtoolbox listdetail sydney_sps.out node





    swmmtoolbox extract sydney_sps.out node,nodeSPS0067,1 >nodeSPS0067_1

    Node variable indices:
    0. depth above invert
    1. hydraulic head
    2. volume stored and ponded
    3. lateral inflow
    4. total inflow
    5. flow lost to flooding

    swmmtoolbox extract sydney_sps.out node,nodeSPS0067,2 node,nodeSPS0067,3 node,nodeSPS0067,4 node,nodeSPS0067,5  >nodeSPS0067_2
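
Typing the `type,name,index` labels by hand gets tedious for many variables. A small sketch that builds the extract command line from the index lists noted above; the helper and dictionary names are made up, and only the command syntax already used in these notes is assumed.

```python
# Index lists copied from the notes above
NODE_VARIABLES = {
    0: "depth above invert",
    1: "hydraulic head",
    2: "volume stored and ponded",
    3: "lateral inflow",
    4: "total inflow",
    5: "flow lost to flooding",
}
LINK_VARIABLE_COUNT = 5  # indices 0..4


def extract_command(outfile, kind, name, indices):
    """Return the argument list for `swmmtoolbox extract`."""
    limit = len(NODE_VARIABLES) if kind == "node" else LINK_VARIABLE_COUNT
    for i in indices:
        if not 0 <= i < limit:
            raise ValueError("no %s variable with index %d" % (kind, i))
    labels = ["%s,%s,%d" % (kind, name, i) for i in indices]
    return ["swmmtoolbox", "extract", outfile] + labels


if __name__ == "__main__":
    print(" ".join(extract_command("sydney_sps.out", "node", "nodeSPS0067", [2, 3, 4, 5])))
```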

    Tuesday 5 May 2015

    slurm study

    Install the slurm-llnl-doc package.

    The documentation is then located at /usr/share/doc/slurm-llnl-doc/html


    http://sphaleron.blogspot.com.au/2011/08/really-super-quick-start-guide-to.html

    Really Super Quick Start Guide to Setting Up SLURM
    SLURM is the awesomely-named Simple Linux Utility for Resource Management written by the good people at LLNL. It's basically a smart task queuing system for clusters. My cluster has always run Sun Grid Engine, but it looks like SGE is more or less dead in the post-Oracle Sun software apocalypse. In light of this and since SGE recently looked at me the wrong way, I'm hoping to ditch it for SLURM. I like pop culture references and software that works.

    The "Super Quick Start Guide" for LLNL SLURM has a lot of words, at least one of which is "make." If you're lazy like me, just do this:

    0. Be using Ubuntu
    1. Install: # apt-get install slurm-llnl
    2. Create key for MUNGE authentication: /usr/sbin/create-munge-key
    3a. Make config file: https://computing.llnl.gov/linux/slurm/configurator.html
    3b. Put config file in: /etc/slurm-llnl/slurm.conf
    4. Start master: # slurmctld
    5. Start node: # slurmd
    6. Test that fool: $ srun -N1 /bin/hostname

    service slurm-llnl restart

    cat slurmctld.log
    [2015-05-06T13:36:48] Job accounting information stored, but details not gathered
    [2015-05-06T13:36:48] error: open /var/log/slurm_jobacct.log: Permission denied
    [2015-05-06T13:36:48] error: Couldn't load specified plugin name for accounting_storage/filetxt: Plugin init() callback failed
    [2015-05-06T13:36:48] error: cannot resolve acct_storage plugin operations
    [2015-05-06T13:36:48] slurmctld version 2.3.2 started on cluster cluster
    [2015-05-06T13:36:48] error: open /var/log/slurm_jobacct.log: Permission denied
    [2015-05-06T13:36:48] error: Couldn't load specified plugin name for accounting_storage/filetxt: Plugin init() callback failed
    [2015-05-06T13:36:48] error: cannot resolve acct_storage plugin operations
    [2015-05-06T13:36:48] fatal: failed to initialize accounting_storage plugin
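
The log repeats the same errors on every restart attempt, so it is easier to read once duplicates are removed. A rough sketch that keeps only the distinct error and fatal messages; the sample lines are copied from the log above and the function name is made up.

```python
import re

# "[timestamp] error: message" or "[timestamp] fatal: message"
LINE = re.compile(r"^\[(?P<time>[^\]]+)\]\s+(?P<level>error|fatal):\s+(?P<msg>.*)$")

SAMPLE = """\
[2015-05-06T13:36:48] Job accounting information stored, but details not gathered
[2015-05-06T13:36:48] error: open /var/log/slurm_jobacct.log: Permission denied
[2015-05-06T13:36:48] error: cannot resolve acct_storage plugin operations
[2015-05-06T13:36:48] slurmctld version 2.3.2 started on cluster cluster
[2015-05-06T13:36:48] error: open /var/log/slurm_jobacct.log: Permission denied
[2015-05-06T13:36:48] fatal: failed to initialize accounting_storage plugin
"""


def unique_problems(lines):
    """Return distinct (level, message) pairs in order of first appearance."""
    seen = []
    for line in lines:
        match = LINE.match(line.strip())
        if match:
            item = (match.group("level"), match.group("msg"))
            if item not in seen:
                seen.append(item)
    return seen


if __name__ == "__main__":
    for level, msg in unique_problems(SAMPLE.splitlines()):
        print("%-5s %s" % (level, msg))
```

Here the first distinct error is the permission problem on /var/log/slurm_jobacct.log, and the later messages follow from it.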
    
    
    
    

    https://paolobertasi.wordpress.com/2011/05/24/how-to-install-slurm-on-debian/

    • /etc/init.d/slurm-llnl start

    Wednesday 8 April 2015

    git study

    git rev-list HEAD --count
    display the number of commits


    put version into git repo


    git log --oneline
    get a one-line-per-commit list of the commit history.


    git log --oneline --graph --decorate --all
    see the whole tree of the current code.
    see  [http://stackoverflow.com/questions/5361019/viewing-full-version-tree-in-git]


    echo "#define GIT_REF \"`git show-ref refs/heads/master | cut -d " " -f 1 | cut -c 31-40`\"" >
    writes one line into the sutra source that records the version number
    [http://stackoverflow.com/questions/2696975/how-do-i-add-revision-and-build-date-to-source]
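
The `cut -c 31-40` step keeps the last ten characters of the 40-character commit hash. The same line can be produced in Python as sketched below; `git_ref_define` and `current_sha` are made-up helper names, and `git rev-parse HEAD` is used here in place of `git show-ref`.

```python
import subprocess


def git_ref_define(sha, macro="GIT_REF"):
    """Return the #define line holding characters 31-40 of the hash."""
    if len(sha) != 40:
        raise ValueError("expected a full 40-character commit hash")
    # cut -c 31-40 is 1-based and inclusive, hence the slice [30:40]
    return '#define %s "%s"' % (macro, sha[30:40])


def current_sha():
    """Ask git for the hash of HEAD (must be run inside a repository)."""
    out = subprocess.check_output(["git", "rev-parse", "HEAD"])
    return out.decode().strip()


if __name__ == "__main__":
    print(git_ref_define("5b01926d4bdc850de431ae3cc1d6098168288826"))
```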


    git checkout 5b01926d4bdc850de431ae3cc1d6098168288826
    go to another version. Do not worry about losing local changes: if you have uncommitted modifications that would be overwritten, the checkout is aborted.


    git clean -f examples/
    delete all untracked files under examples/

    Wednesday 25 March 2015

    segmentation fault

    After refining the mesh for p3, the program ends with a segmentation fault.
    Debugging with idb shows that it crashes at a line that writes results to a file. Apparently this line alone is not the cause. The problem can be reproduced with gfortran.
    It was suggested that I increase the stack size with

       ulimit -s 4000000

    The workaround is to compile the source code on Windows. At least the program is now working.
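
Before raising the limit it is useful to see what it currently is. A short sketch using Python's `resource` module (Unix only); it reports the soft and hard stack limits in KiB, the same unit `ulimit -s` uses in bash. The function name is made up.

```python
import resource


def stack_limit_kib():
    """Return (soft, hard) stack limits in KiB; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)

    def to_kib(value):
        # getrlimit reports bytes, ulimit -s reports KiB
        return None if value == resource.RLIM_INFINITY else value // 1024

    return to_kib(soft), to_kib(hard)


if __name__ == "__main__":
    soft, hard = stack_limit_kib()
    print("soft stack limit (KiB):", soft)
    print("hard stack limit (KiB):", hard)
```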


    case 2

    The second time I saw a segmentation fault, the arguments were not passed consistently from the caller to the subroutine.


    say the caller has: call aaa(a,b,c)

    while aaa is declared as:
    aaa(a,b)
    c=12

    This may cause a segmentation fault.

    gdb (gfortran --debug) debug

      Intel Fortran on Linux is no longer free, which is annoying for a Fortran user. I decided to change to gfortran instead.
      To create an executable that can be debugged, use the '-g' flag.
    TBH, at the very beginning, it is quite annoying to use a command-line interface for debugging, but

    gfortran -g a.F90 -o ab

    1.  use gdb (15-06-29)
    If gdb does not display the lines properly, use the -ggdb flag instead.
    It is also found that all files have to be compiled by the same compiler on the same machine for gdb to show the source code properly.
    gfortran -ggdb a.f -o a.out

    then use the following command to start debug processes.

    gdb ab

    Other useful commands:

    break main

    run  [r]

    step [s]  (step into the subroutine)

    print a [p a]

    where

    quit

    next [n]  go to the next line, stepping over subroutine calls
    n 5       execute next 5 times
    2.  Info command
    info b            list all breakpoints
    info breakpoints  same as info b
    info line         show where the program is currently stopped



    delete 1   delete breakpoint 1 (numbered as in info b)
    disable 1  disable breakpoint 1
    enable 1   enable breakpoint 1

    disable    disable all breakpoints

    break sutra.f:333    % stop at line 333 of file sutra.f

    frame   display the current line

    finish  step out



    http://darkdust.net/files/GDB%20Cheat%20Sheet.pdf

    http://stackoverflow.com/questions/501486/getting-gdb-to-save-a-list-of-breakpoints
    save breakpoints a.txt

    how to load break points --
    source a.txt
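
A file loaded with `source` is just a list of gdb commands, so it can also be generated. A minimal sketch that writes such a command list; `gdb_script` is a made-up helper, and the commands it emits are the ones already listed in these notes.

```python
def gdb_script(breakpoints, run_args=""):
    """Return the text of a gdb command file.

    breakpoints: locations such as "sutra.f:333" or "main".
    """
    lines = ["set breakpoint pending on"]
    lines += ["break %s" % location for location in breakpoints]
    lines.append(("run " + run_args).strip())
    lines.append("where")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with open("cmds.gdb", "w") as handle:
        handle.write(gdb_script(["sutra.f:333"]))
    # then, for example: gdb -batch -x cmds.gdb ./ab
```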


    Error: Dummy argument 'cherin' of procedure 'readif' at (1) has an attribute that requires an explicit interface for this procedure


    set breakpoint pending on


    No symbol table is loaded.  Use the "file" command.


    2.  Correct the program. (15-06-29)
    TIPS:
      (1) If something is wrong in the source code, there is no need to quit gdb. Just correct the file, rebuild it with the debug flag, and rerun the program in gdb using r.

    1. Start the program being debugged.
      Example 1. The program is printch, which can take an optional command line argument. Start it running with no command line argument.
      (gdb) r
      
      Example 2. Start printch running with command line argument A.
      (gdb) r A
      
    2. Execute a single statement. If the statement is a function call, just single step into the function.
      (gdb) s
      
    3. Execute a single statement. If the statement is a function call, execute the entire function and return to the statement just after the call; that is, step over the function.
      (gdb) n
      
    4. Execute from the current point up to the next breakpoint if there is one, otherwise execute until the program terminates.
      (gdb) c
      
    5. Execute the rest of the current function; that is, step out of the function.
      (gdb) finish


    Sunday 15 February 2015

    mflab -- matlab xlsread slow-down problem when reading excel

    1. This is the second time I have found MATLAB doing dodgy things.
    Specifically, mflab is extremely slow on some machines. The line causing the slowdown
    turns out to be
    line 88 h=feval(['COM.' convertedProgID], 'server', machinename, interface);
    in
    C:\Program Files\MATLAB\R2012b\toolbox\matlab\winfun\actxserver.m
    or
    C:\Program Files\MATLAB\R2014a\toolbox\matlab\winfun\actxserver.m

    which is the problematic file. Changing it to

    line 88 h=feval([convertedProgID], 'server', machinename, interface);
    solves the slow-down problem.

    2. area object not working.
    2015-02-16 reverting fdm3.m to the old version makes the area object work.

    1. ET doesn't work, no matter how many times I have checked the input.
    Solution: restart MATLAB, and use mflab in the southern folder.
    10/03/15

    rect_2 is designed to make a very small ET and start from low; now I am trying to make ET very low at the beginning.
    Apparently the flow one does not work at all!!!



    There is a tricky part in the volumetric flow given by the budget, particularly at the cell where the water table is located. The front or right face flux is not the volumetric flow divided by the cross-sectional area of the cell, but the volumetric flow divided by the wetted area of the CELL!!!
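
A small sketch of that conversion, assuming the wetted area of a face is the face width times the saturated thickness (head minus cell bottom, capped at the cell top). The function names and the numbers in the example are made up for illustration.

```python
def saturated_thickness(head, top, bottom):
    """Thickness of the water-filled part of the cell."""
    return max(0.0, min(head, top) - bottom)


def face_flux(q_volumetric, face_width, head, top, bottom):
    """Flux through a cell face: volumetric flow over the wetted area."""
    thickness = saturated_thickness(head, top, bottom)
    if thickness == 0.0:
        raise ValueError("cell is dry, no wetted area")
    return q_volumetric / (face_width * thickness)


if __name__ == "__main__":
    # Water table at 4 m in a cell spanning 0 to 10 m, face 10 m wide
    print(face_flux(2.0, 10.0, head=4.0, top=10.0, bottom=0.0))
```

For a fully saturated cell the wetted area equals the full face area, so both definitions agree there.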




    Comment 1: If there is only one layer of cells, the first layer SHOULD be set as unconfined rather than varying, so that the result is correct. This is an experience from changing one layer in modflow_no_leak
    and modflow_rect_dd_1.



    Comment 2: If the time-variant specified head is not working properly, in particular if the cells above the time-variant specified head are not rewetted, always make sure the cell above the manipulated cell is wet. This can be done either by moving the manipulated cell downward, or by changing the bottom hydraulic head that is being manipulated.


    example study:

    mf2005\TerwisschaPE     lots of stress periods. The Theis equation is considered.
    mf2005\DutchTop\RainLens very good chd package, but the simulation is not working. Need to find a way to get it running, perhaps by going back to a previous version in git.


    Further analysis indicates that mf2005 performs better than mf2000 in handling wetting and drying. Specifically, when WETDRY is positive, meaning that cells in all directions can rewet the dry cell, mf2005 delivers good results while mf2000 cannot converge.



     *** ERROR OPENING FILE "Q2.HDS" ON UNIT    51
           SPECIFIED FILE STATUS: UNKNOWN
           SPECIFIED FILE FORMAT: BINARY
           SPECIFIED FILE ACCESS: SEQUENTIAL
           SPECIFIED FILE ACTION: READWRITE
      -- STOP EXECUTION (SGWF2BAS7OPEN)

    solution:
    1. download modflow for unix (
    2. In openspec.inc, comment out DATA FORM/'BINARY'/ and uncomment DATA FORM/'UNFORMATTED'/.
    3. Build the binary.




    mf2kgmg.h: In function ‘MF2KGMG_BIGH’:
    mf2kgmg.h:448:29: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
    pcgn2.f90:26.6:
      USE PCG_MAIN
          1
    Fatal Error: Can't open module file 'pcg_main.mod' for reading at (1): No such file or directory
    utl7.f:234.14:
             IF ( LVAL.EQ. .FALSE. ) THEN                                
                  1
    Error: Logicals at (1) must be compared with .eqv. instead of .eq.
    utl7.f:263.14:
             IF ( LVAL.EQ. .FALSE. ) THEN                                
                  1
    Error: Logicals at (1) must be compared with .eqv. instead of .eq.


    (1) improve quiver object
    $ grep -r -i --include *.m quiver ~/Projects/mflab/trunk/examples/
    /home/chenming/Projects/mflab/trunk/examples/Analytic/GGOR/Analytic04.m:if isphi==0 % if averag seepage from second aquiver is given
    /home/chenming/Projects/mflab/trunk/examples/mf2k/Qanats/TafiletMdl/mfiles/mf_analyze.m:%gr.quiver(B,1,'k','power',0.5);
    /home/chenming/Projects/mflab/trunk/examples/mf2k/Qanats/TafiletMdl/SCENARIOS/mf_analyze.m:%gr.quiver(B,1,'k','power',0.5);

    samba share

    To access a Linux file system from Windows, it is always best to use a samba share.
    The configuration is as follows:

    1.  Uncomment the following lines in /etc/samba/smb.conf

    #======================= Share Definitions =======================
    
    # Un-comment the following (and tweak the other settings below to suit)
    # to enable the default home directory shares. This will share each 
    # user's home director as \\server\username
    [homes]
       comment = Home Directories
       browseable = no

    2. Create new users with:

    #smbpasswd -a chenming

    3. Restart the samba processes with:

    #systemctl restart smbd
    #systemctl restart nmbd
    4. Test the connection with:
    #smbclient -L localhost
    
    
    Once there was an error:
    ae429-1105 chenming # smbclient -L 10.33.20.12X
    Enter root's password:
    Connection to 10.33.20.120 failed (Error NT_STATUS_CONNECTION_REFUSED)
    The problem was solved by restarting samba.

    5. Now it is time to mount the samba share: right-click My Computer -> Map network drive -> \\10.33.20.120\chenming, then enter the password.


    Tuesday 13 January 2015

    modelmuse study

    Although I have been using sutra for a long time, I still find that a good GUI for preprocessing is lacking. A recent project allows me to search further for an effective tool to create quad meshes.


    First, the system lacks an input prompt for entering accurate coordinates of an object. The workaround is to input them manually in the gpt file.

    vertical exaggeration = vertical / horizontal ratio


    Porosity:   select the biggest object -> double click -> evaluated at nodes -> datasets tab -> initial head,
    Show nodal no ->


    At the moment nature cannot run:
    reason: U-solution inferred from matrix equation a*u=0 solver not called
    so I set another

    nature_reset.UFluxBcs is always accepted, but the other ones are not working at the moment.

    initial head working 9800. * (240-Y)
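
Read as a hydrostatic profile, the working expression gives zero at Y = 240 and increases linearly below it. A tiny sketch of the same formula, assuming 9800 stands for water density times gravity (1000 kg/m3 x 9.8 m/s2) and 240 for the elevation where the value is zero; both readings are assumptions, not stated in the notes.

```python
RHO_G = 9800.0   # assumed: 1000 kg/m3 * 9.8 m/s2
DATUM = 240.0    # assumed: elevation where the value is zero


def initial_value(y, datum=DATUM, rho_g=RHO_G):
    """Evaluate 9800. * (240 - Y) at elevation y."""
    return rho_g * (datum - y)


if __name__ == "__main__":
    for y in (240.0, 230.0, 0.0):
        print(y, initial_value(y))
```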

    Monday 5 January 2015

    Setup torque/maui system _debug the system

    This one follows my previous article on setting up the torque system. However, torque 2.6.1 in Ubuntu turns out to be out of date and not working properly. To circumvent this problem, I decided to move to torque/maui for better scheduling efficiency.
    http://www.adaptivecomputing.com/support/download-center/torque-download/
    It is also noticed that Adaptive Computing is not maintaining torque and maui any more, which means bugs will not be fixed. The ultimate solution really is to move to slurm or Sun Grid Engine.


    First, download torque and maui from their websites.

    maui has to be installed after torque.

    error 1:
    pbs_mom: symbol lookup error: pbs_mom: undefined symbol: log_mutex

    solution:
    echo '/usr/local/lib' > /etc/ld.so.conf.d/torque.conf
    ldconfig


    error 2:
    socket_connect_unix failed: 15137
    qstat: cannot connect to server (null) (errno=15137) could not connect to trqauthd

    solution: make sure trqauthd is running with pbs_mom

    error 3: at the client
    pbs_mom
    pbs_mom: symbol lookup error: pbs_mom: undefined symbol: dis_getc


    error 4 at the client
    ./torque-mom start
     * Starting Torque Mom torque-mom
    /usr/sbin/pbs_mom: symbol lookup error: /usr/sbin/pbs_mom: undefined symbol: dis_getc
       ...fail!
    pbs_mom: symbol lookup error: pbs_mom: undefined symbol: dis_getc

    solution:
    ldd /usr/local/sbin/pbs_mom
            linux-vdso.so.1 =>  (0x00007fff9f7ff000)
            libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f2abbbed000)
            libtorque.so.2 => /usr/local/lib/libtorque.so.2 (0x00007f2abb2f6000)
            libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00007f2abaf99000)
            libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f2abad7c000)
            librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f2abab74000)
            libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f2aba873000)
            libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f2aba577000)
            libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f2aba361000)
            libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2ab9fa1000)
            libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f2ab9d9d000)
            libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f2ab9b86000)
            /lib64/ld-linux-x86-64.so.2 (0x00007f2abbe0b000)

    The solution so far is to reinstall torque 5.0.1. 2015-05-27: it took the whole morning to fix.
    This happened again on 2015-09-28.
    The working binary is located at
    /usr/local/sbin/pbs_mom
    Just running it should be OK.
    The dis_getc error comes from the old package installed by apt-get.
    First, remove the torque package that came from the apt repo: apt-get remove torque-mom
    Now if you run pbs_mom you will see
    ./pbs_mom
    pbs_mom: LOG_ERROR::No such file or directory (2) in chk_file_sec, Security violation with "/var/spool/torque/checkpoint" - /var/spool/torque/checkpoint cannot be lstat'd - errno=2, No such file or directory

    Then reinstall torque-5.0.1-1_4fa836f5:
    torque-package-clients-linux-x86_64.sh  --install
    torque-package-mom-linux-x86_64.sh  --install







    question 1:
    limit the maximum processes per user
    http://docs.adaptivecomputing.com/maui/6.2throttlingpolicies.php


    install pam torque
    libtool --finish /lib64/security
    /lib64/security/ is where the PAM modules are located
    /etc/security/access.conf grants access to whichever users you wish



    set maui to limit the jobs and processes per user

    USERCFG[DEFAULT] MAXPROC=64 MAXJOB=5 #working

    #GROUPCFG[useraid] MAXJOB[USER]=5  # not working
    #CLASSCFG[batch] MAXJOB[USER]=5   # working


    CLASSCFG[batch] MAXJOB[USER]=5 MAXPROC=64  # not working


    Working solution: use PAM to prevent users from logging into compute nodes
    (let some users into the compute nodes while others stay outside)

    versions: torque-5.0.1-1_4fa836f5 maui-3.3.tar.gz
    The official tutorial at http://docs.adaptivecomputing.com/torque/3-0-5/3.4hostsecurity.php
    says:

    1. First configure torque with ./configure --with-pam

    2. In /etc/pam.d/sshd add
    account required pam_pbssimpleauth.so
    account required pam_access.so

    and
    3.
    In /etc/security/access.conf make sure all users who access the compute node are added to the configuration. This is an example which allows the users root, george, allen, and michael access.
    -:ALL EXCEPT root george allen michael torque:ALL


    However, I found this method too strict: none of root, george, or allen can log into the compute node.

    My solution:

    1. There is no need to reinstall torque with ./configure --with-pam

    2. Put
    account required     pam_access.so
     into /etc/pam.d/sshd

    which means pam_access is consulted for each ssh login.

    3. Put

    -:ALL EXCEPT root szhang czhang storres torque:ALL
    into /etc/security/access.conf
    Now only szhang, czhang and root can log into the compute nodes.

    I think this idea works and is understandable: at the moment all job submission is done by pbs_mom, which runs as root, so pam_pbssimpleauth.so does not need to take effect.
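
To check who a rule of this shape lets in, its logic can be mimicked in a few lines. This is only a sketch of the single-rule case used above: it ignores the origins field, groups, and any other rules in the file, and assumes access is granted when the rule does not match. `user_allowed` is a made-up name, not a PAM API.

```python
def user_allowed(rule_line, user):
    """Evaluate one access.conf rule of the form perm:users:origins."""
    perm, users, _origins = [part.strip() for part in rule_line.split(":", 2)]
    tokens = users.split()
    if "EXCEPT" in tokens:
        split_at = tokens.index("EXCEPT")
        listed, exceptions = tokens[:split_at], tokens[split_at + 1:]
    else:
        listed, exceptions = tokens, []
    matches = ("ALL" in listed or user in listed) and user not in exceptions
    if not matches:
        return True  # assumed default: no matching rule means access granted
    return perm == "+"


if __name__ == "__main__":
    rule = "-:ALL EXCEPT root szhang czhang storres torque:ALL"
    for name in ("root", "szhang", "someone_else"):
        print(name, user_allowed(rule, name))
```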







    reload maui

    Just restart it. It won't affect the queue.

    pkill maui && qterm -t quick && sleep 5&& /usr/local/maui/sbin/maui && pbs_server && ps aux |grep maui


    showres                 working
    showres -n
    checkjob 810            working
    checknode macondo01     % very good feedback
    showgrid AVGXFACTOR
    showstats
    mbal                    this will kill maui!!!!!!!!!!!!!!!
    mdiag                   same as diagnose

    I still don't get the idea of maxnode. Does it mean that all jobs of one person have to go to one particular node?

    mjobct
    ERROR:    corrupt command received


    mclient
    ERROR:    unknown command: 'mclient'

    mprof
    USAGE ERROR:  (tracefile not specified)

    mstat
    ERROR:  command 'mstat' args not handled
    ERROR:    service 36 not handled
    ERROR:    Service[36] 'mstat' not implemented

    showbf
    backfill window (user: 'czhang' group: 'useraid' partition: ALL) Sun Jan 18 15:25:07

    231 procs available for    7:11:35:38
    175 procs available for   21:18:13:37
    118 procs available for   40:14:55:01
     62 procs available for   40:21:06:15



    diagnose -j | grep -o -P '(?<=job \047).*(?=\047 utilizes more procs than)'
    # this line lists every job for which a warning is reported.

    diagnose -j
    Name                  State Par Proc QOS     WCLimit R  Min     User    Group  Account  QueuedTime  Network  Opsys   Arch    Mem   Disk  Procs       Class Features

    381                 Running DEF    1 DEF 10:00:00:00 1    1    cwang  useraid uq-Civil    00:49:21   [NONE] [NONE] [NONE]    >=0    >=0    NC0   [batch:1] [NONE]
    569                 Running DEF    1 DEF 25:00:00:00 1    1   pzhang  useraid uq-Civil    00:49:21   [NONE] [NONE] [NONE]    >=0    >=0    NC0   [batch:1] [NONE]
    WARNING:  job '569' utilizes more procs than dedicated (10.35 > 1)
    650                 Running DEF    1 DEF 41:16:00:00 1    1 mgholami  useraid uq-Civil    00:49:20   [NONE] [NONE] [NONE]    >=0    >=0    NC0   [batch:1] [NONE]
    WARNING:  job '650' utilizes more procs than dedicated (13.00 > 1)
    651                 Running DEF    1 DEF 41:16:00:00 1    1 mgholami  useraid uq-Civil    00:49:20   [NONE] [NONE] [NONE]    >=0    >=0    NC0   [batch:1] [NONE]
    WARNING:  job '651' utilizes more procs than dedicated (10.28 > 1)
    669                 Running DEF    1 DEF 41:16:00:00 1    1 mgholami  useraid uq-Civil    00:49:19   [NONE] [NONE] [NONE]    >=0    >=0    NC0   [batch:1] [NONE]
    WARNING:  job '669' utilizes more procs than dedicated (14.00 > 1)
    671                 Running DEF    1 DEF 25:00:00:00 1    1   pzhang  useraid uq-Civil    00:49:21   [NONE] [NONE] [NONE]    >=0    >=0    NC0   [batch:1] [NONE]
    WARNING:  job '671' utilizes more procs than dedicated (9.57 > 1)
    672                 Running DEF    1 DEF 25:00:00:00 1    1   pzhang  useraid uq-Civil    00:49:21   [NONE] [NONE] [NONE]    >=0    >=0    NC0   [batch:1] [NONE]
    WARNING:  job '672' utilizes more procs than dedicated (7.80 > 1)


    \047 is the octal ASCII code for the single quote

    diagnose -j | grep -o -P '(?<=than dedicated \050).*(?=>)'

    \050 is the octal ASCII code for the left parenthesis

    adse=$(diagnose -j | grep -o -P '(?<=than dedicated \050).*(?=>)')
    this stores the result in the variable adse
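
    The two grep commands above pull out the job id and the used-procs figure separately. A rough sketch that gets both in one pass with sed, assuming the warning lines always look like the ones in the diagnose -j output above (the function name overused_jobs is made up here):

```shell
#!/bin/bash
# overused_jobs: read `diagnose -j` style text on stdin and print
# "<job id> <procs used> <procs dedicated>" for every over-use warning.
# Lines that are not warnings are ignored.
overused_jobs() {
  sed -n "s/.*job '\([^']*\)' utilizes more procs than dedicated (\([0-9.]*\) > \([0-9.]*\)).*/\1 \2 \3/p"
}

# Demo on two warning lines copied from the output above; on the cluster this would be
#   diagnose -j | overused_jobs
overused_jobs <<'EOF'
WARNING:  job '569' utilizes more procs than dedicated (10.35 > 1)
WARNING:  job '650' utilizes more procs than dedicated (13.00 > 1)
EOF
```

    Keeping the job id and the numbers on the same line avoids having to match up two separate grep results afterwards.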

    if [ "$a" != "$b" ]
    then
      echo "$a is not equal to $b."
      echo "(string comparison)"
      #     "4"  != "5"
      # ASCII 52 != ASCII 53
    fi

    #!/bin/bash
    x=5.0
    y=3.0
    #ans= $(( $x + $y |bc  ))
    #ans=$(echo  $x + $y |bc )
    #ans=$(echo  $x / $y |bc -l )   # this ends up with good result
    #ans=$(echo  $x / $y |bc  )     # this does not give good result

    #ans=$(python -c "print $x / $y")    # this one is also ok but format is a problem

    #ans=$(python -c "print('%.2f' % ($x / $y))")  # the first attempt failed because the double quotes were nested; single quotes inside work
    #alpha=`echo "$a/100" | bc -l | awk '{printf("%06.2f", $1);}'`
    ans=`echo "$x/$y" | bc -l | awk '{printf("%6.4f", $1);}'`
    echo "$x / $y = $ans"
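
    The bc | awk pipeline above can probably be shortened, since awk does floating-point division and formatting by itself. A small sketch under that assumption; fdiv is a made-up helper name, and the default of 4 decimal places just mirrors the %6.4f used above:

```shell
#!/bin/bash
# fdiv X Y [DIGITS]: print X / Y with DIGITS decimal places (default 4).
# Hypothetical helper; awk does the floating-point arithmetic, bc is not needed.
fdiv() {
  awk -v x="$1" -v y="$2" -v d="${3:-4}" 'BEGIN {
    if (y == 0) { print "fdiv: division by zero" > "/dev/stderr"; exit 1 }
    fmt = "%." d "f\n"      # build the format string, e.g. "%.4f\n"
    printf fmt, x / y
  }'
}

x=5.0
y=3.0
ans=$(fdiv "$x" "$y")
echo "$x / $y = $ans"
```

    The function returns a non-zero status when the divisor is zero, so a calling script can test for that case instead of parsing an error message.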


    maui is on its way to being deprecated. use Sun Grid Engine (SGE, now Oracle Grid Engine, which Rocks Cluster uses) or Slurm instead.

    it seems to me that the soft and hard limits only work for groups, not for users
    /usr/local/maui
    http://www.physics.oregonstate.edu/cluster_install

    Problem 2016-01-12:
    when running trqauthd:
    trqauthd: symbol lookup error: trqauthd: undefined symbol: debug_mode
    this happens on the server, which had been running for a few days. once trqauthd is killed, it cannot be restarted properly.



    root@macondo03:/home/users/uqczhan2#  trqauthd
    trqauthd: symbol lookup error: trqauthd: undefined symbol: debug_mode
    root@macondo03:/home/users/uqczhan2# pbs_server
    pbs_server: symbol lookup error: pbs_server: undefined symbol: job_log_mutex
    root@macondo03:/home/users/uqczhan2# pbs_mom
    pbs_mom: symbol lookup error: pbs_mom: undefined symbol: log_mutex
    root@macondo03:/home/users/uqczhan2# which trqauthd
    /usr/local/sbin/trqauthd
    root@macondo03:/home/users/uqczhan2# pbs_
    pbs_demux    pbs_mom      pbs_restart  pbs_sched    pbs_server   pbs_track  
    root@macondo03:/home/users/uqczhan2# pbs_sched
    pbs_sched: symbol lookup error: pbs_sched: undefined symbol: log_mutex
    root@macondo03:/home/users/uqczhan2# pbs_restart
    Cannot connect to default server host 'macondo03' - check pbs_server daemon.
    qterm: could not connect to server '' (1) Operation not permitted


    from torque-package-server-linux-x86_64.sh
    we get pbs_sched  pbs_server  qschedd  qserverd

    ./torque-package-mom-linux-x86_64.sh --install

    Installing TORQUE archive... 

    Done.
    root@macondo03:/home/user/uqczhan2/czhang/Downloads/torque-5.0.1-1_4fa836f5# ls /usr/local/sbin
    momctl  pbs_demux  pbs_mom  pbs_sched  pbs_server  qnoded  qschedd  qserverd
    solution:
    ldd trqauthd
            linux-vdso.so.1 =>  (0x00007ffcf55e1000)
            libtorque.so.2 => /usr/local/lib/libtorque.so.2 (0x00007f365ed33000)
            libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f365eb16000)
            libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f365e816000)
            libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f365e458000)
            librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f365e250000)
            libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f365df54000)
            libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f365dd3e000)
            /lib64/ld-linux-x86-64.so.2 (0x00007f365f62a000)
    the problem was resolved again today:

    in fact it is the FDS model that makes the system hang: it changes the location libtorque.so.2 is loaded from, and so trqauthd stops working.
    solution: I removed everything associated with FDS from LD_LIBRARY_PATH in .bashrc, then checked ldd trqauthd. the correct output should match the listing above.
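
    The ldd check can be turned into a quick test of where libtorque.so.2 is being loaded from. A minimal sketch, assuming /usr/local/lib/libtorque.so.2 (the path in the ldd listing above) is the correct one; the function names are made up for this note:

```shell
#!/bin/bash
# libtorque_path: read `ldd` output on stdin and print the file that
# libtorque.so.2 resolves to (prints nothing if the library is not listed).
libtorque_path() {
  awk '$1 == "libtorque.so.2" && $2 == "=>" { print $3 }'
}

# check_libtorque EXPECTED: succeed only if the path found on stdin equals EXPECTED.
check_libtorque() {
  local expected=$1 actual
  actual=$(libtorque_path)
  if [ "$actual" = "$expected" ]; then
    echo "ok: libtorque.so.2 -> $actual"
  else
    echo "mismatch: libtorque.so.2 -> ${actual:-not found} (expected $expected)"
    return 1
  fi
}

# Demo on one line of the listing above; on the server this would be
#   ldd "$(which trqauthd)" | check_libtorque /usr/local/lib/libtorque.so.2
printf '\tlibtorque.so.2 => /usr/local/lib/libtorque.so.2 (0x00007f365ed33000)\n' |
  check_libtorque /usr/local/lib/libtorque.so.2
```

    Running this after any change to LD_LIBRARY_PATH shows straight away whether another package has put its own copy of the library in front of the TORQUE one.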

    also, after the restore, there was a small problem restarting pbs_mom, pbs_server and pbs_sched.
    solution:
    first, apt-get remove torque-mom torque-server torque-sched, to make sure the torque packages from apt are not installed.
    second, reinstall torque 5.0.1 with configure, make, make install.
    then start the daemons one by one.
    below are the errors that appear when running pbs_mom, pbs_server and pbs_sched.
     pbs_mom
    pbs_mom: LOG_ERROR::No such file or directory (2) in chk_file_sec, Security violation with "/var/spool/torque/checkpoint" - /var/spool/torque/checkpoint cannot be lstat'd - errno=2, No such file or directory


    for pbs_server and pbs_sched, after starting them, they do not show up as processes in the system.


    once torque 5.0.1 is reinstalled, the problem is resolved. 2016-01-12

    problem
    pbsnodes
    pbsnodes: Server has no node list MSG=node list is empty - check 'server_priv/nodes' file



    cd /var/spool/torque/server_priv



    Saturday 3 January 2015

    install environment modules in a cluster

    http://linuxcluster.wordpress.com/2012/11/05/installing-and-configuting-environment-modules/

    use ganglia to monitor the system

    1. follow the instructions at https://www.digitalocean.com/community/tutorials/introduction-to-ganglia-on-ubuntu-14-04 to finish the installation.

    2. install gexec on each client

    3. edit /etc/ganglia/gmond.conf
      set    gexec = yes

    4. restart with sudo service ganglia-monitor restart

    note that the server has to do this last so that all the clients can be found


    one incident: macondo03 went down. after it was rebooted, gstat could not see the other machines. the only way to get everything back to normal was to run "sudo service ganglia-monitor restart" on every client so that the host could find all the machines.
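
    The restart order (clients first, server last) can be written down as a small dry-run script. This is only a sketch: restart_plan is a made-up name, the host list in the demo is an assumption (macondo03 as the server, macondo01 as one client), and it only prints the commands rather than running them, since sudo over ssh may need a terminal or passwordless sudo on this cluster:

```shell
#!/bin/bash
# restart_plan SERVER CLIENT...: print the ganglia-monitor restart commands,
# one per line, with every client first and the server last.
restart_plan() {
  local server=$1
  shift
  local host
  for host in "$@"; do
    printf 'ssh %s sudo service ganglia-monitor restart\n' "$host"
  done
  printf 'ssh %s sudo service ganglia-monitor restart\n' "$server"
}

# Dry run: only prints the plan.
restart_plan macondo03 macondo01
```

    Once the printed plan looks right, it can be piped to sh to actually run it.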