Sunday, 15 February 2015

samba share

To access a Linux file system from Windows, a Samba share is usually the best option.
The configuration steps are as follows:

1. Uncomment the following lines in /etc/samba/smb.conf:

#======================= Share Definitions =======================

# Un-comment the following (and tweak the other settings below to suit)
# to enable the default home directory shares. This will share each 
# user's home director as \\server\username
[homes]
   comment = Home Directories
   browseable = no

2. Create a new Samba user with:

#smbpasswd -a chenming

3. Restart the Samba daemons:

#systemctl restart smbd
#systemctl restart nmbd
4. Test the connection with:
#smbclient -L localhost

Once, there was an error:
ae429-1105 chenming # smbclient -L 10.33.20.120
Enter root's password: 
Connection to 10.33.20.120 failed (Error NT_STATUS_CONNECTION_REFUSED)
The problem was solved by restarting Samba (step 3).



5. Now it is time to mount the Samba share on Windows: right-click My Computer -> Map Network Drive -> \\10.33.20.120\chenming, then enter the password.
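Step 1 can also be scripted instead of edited by hand. The sketch below is a hypothetical helper (enable_homes_share is not part of Samba). It assumes the stock Debian/Ubuntu smb.conf, where the [homes] block is commented out with a leading ';'; check your own file first, since other distributions may use '#'. The sed range only touches lines from ';[homes]' down to its 'browseable = no' line, because the stock file contains another commented 'browseable = no' in the [profiles] example that must stay commented.

```shell
# Hypothetical helper: uncomment the [homes] share in an smb.conf-style file.
# Assumption: the block is commented out with a leading ';' (stock Debian/Ubuntu).
enable_homes_share() {
    local conf="${1:-/etc/samba/smb.conf}"
    cp "$conf" "$conf.bak"     # keep a backup before editing in place
    # Strip the leading ';' only between ";[homes]" and its "browseable = no" line
    sed -i '/^;\[homes\]/,/^;[[:space:]]*browseable = no/ s/^;//' "$conf"
}
```

Usage: run `enable_homes_share /etc/samba/smb.conf`, check the result with `testparm -s`, then restart smbd and nmbd as in step 3.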


Tuesday, 13 January 2015

modelmuse study

Although I have been using SUTRA for a long time, I still find that a good GUI for preprocessing is lacking. A recent project gave me the chance to look further for an effective tool for creating quad meshes.


First, ModelMuse lacks an input prompt for entering exact coordinates for an object. The workaround is to enter them manually in the .gpt file.

Vertical exaggeration = vertical scale / horizontal scale
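A quick worked example of this ratio, using hypothetical scales of 1:100 vertically and 1:1000 horizontally (the numbers are illustrative only):

```latex
\mathrm{VE} = \frac{\text{vertical scale}}{\text{horizontal scale}} = \frac{1/100}{1/1000} = 10
```

So every vertical distance is drawn ten times larger, relative to horizontal distances, than it would be at true scale.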


Porosity: select the biggest object -> double-click -> Evaluated at nodes -> Data Sets tab -> initial head,
Show nodal numbers ->


At the moment, nature is not able to run.
Reason: "U-solution inferred from matrix equation a*u=0, solver not called"
So I set up another one.

nature_reset.UFluxBcs is always accepted, but the other ones are not working at the moment.

Initial head expression that works: 9800. * (240-Y)
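One possible reading of this expression, offered as an assumption rather than something stated in these notes: SUTRA works with fluid pressure, so the value may be a hydrostatic initial pressure, with 9800 standing for water density times gravity (assumed 1000 kg/m³ and 9.8 m/s²) and 240 for the elevation of the reference water level (in m):

```latex
p(Y) = \rho_w \, g \, (240 - Y) \approx (1000\ \mathrm{kg\,m^{-3}})(9.8\ \mathrm{m\,s^{-2}})(240 - Y) = 9800\,(240 - Y)\ \mathrm{Pa}
```

For example, at Y = 200 m this gives 9800 × 40 = 392 000 Pa.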

Monday, 5 January 2015

Set up a Torque/Maui system: debugging the system

This post follows my previous article on setting up a Torque system. However, I found that the Torque 2.6.1 package in Ubuntu is out of date and does not work properly. To circumvent this problem, I decided to move to Torque/Maui for better scheduling efficiency.
http://www.adaptivecomputing.com/support/download-center/torque-download/
I also noticed that Adaptive Computing is no longer maintaining Torque and Maui, which means bugs will not be fixed. The ultimate solution is really to move to Slurm or Sun Grid Engine.


First, download Torque and Maui from their websites.

Maui has to be installed after Torque.

error 1:
pbs_mom: symbol lookup error: pbs_mom: undefined symbol: log_mutex

solution:
echo '/usr/local/lib' > /etc/ld.so.conf.d/torque.conf
ldconfig


error 2:
socket_connect_unix failed: 15137
qstat: cannot connect to server (null) (errno=15137) could not connect to trqauthd

solution: make sure trqauthd is running alongside pbs_mom.

error 3 (on the client):
pbs_mom
pbs_mom: symbol lookup error: pbs_mom: undefined symbol: dis_getc


error 4 (on the client):
./torque-mom start
 * Starting Torque Mom torque-mom
/usr/sbin/pbs_mom: symbol lookup error: /usr/sbin/pbs_mom: undefined symbol: dis_getc
   ...fail!
pbs_mom: symbol lookup error: pbs_mom: undefined symbol: dis_getc

solution:
ldd /usr/local/sbin/pbs_mom
        linux-vdso.so.1 =>  (0x00007fff9f7ff000)
        libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f2abbbed000)
        libtorque.so.2 => /usr/local/lib/libtorque.so.2 (0x00007f2abb2f6000)
        libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00007f2abaf99000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f2abad7c000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f2abab74000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f2aba873000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f2aba577000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f2aba361000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2ab9fa1000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f2ab9d9d000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f2ab9b86000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f2abbe0b000)

The solution so far is to reinstall Torque 5.0.1 (2015-05-27; it took the whole morning to fix).
This happened again on 2015-09-28.
The correct pbs_mom is located in
/usr/local/sbin/pbs_mom
and running it directly should be OK.
The undefined dis_getc symbol comes from the old package installed by apt-get (/usr/sbin/pbs_mom).
First, remove the Torque package installed from the apt repository: apt-get remove torque-mom
Now if you run pbs_mom you will see:
./pbs_mom
pbs_mom: LOG_ERROR::No such file or directory (2) in chk_file_sec, Security violation with "/var/spool/torque/checkpoint" - /var/spool/torque/checkpoint cannot be lstat'd - errno=2, No such file or directory

Then reinstall torque-5.0.1-1_4fa836f5:
./torque-package-clients-linux-x86_64.sh --install
./torque-package-mom-linux-x86_64.sh --install
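Since error 4 came from /usr/sbin/pbs_mom (the apt package) while the source build lives in /usr/local/sbin, a quick check of which copy the shell actually runs can save a morning. This is a minimal sketch; check_torque_binary is a hypothetical helper, and it assumes the source build was installed under /usr/local/sbin as in these notes.

```shell
# Hypothetical helper: print which copy of a Torque binary the shell will run.
# Returns 0 for the /usr/local/sbin (source-built) copy, 1 for any other
# location (e.g. /usr/sbin from the apt package), 2 if it is not in PATH.
check_torque_binary() {
    local name="$1" path
    path=$(command -v "$name") || { echo "$name: not found in PATH" >&2; return 2; }
    echo "$path"
    case "$path" in
        /usr/local/sbin/*) return 0 ;;
        *) echo "warning: $name is not the /usr/local/sbin build" >&2; return 1 ;;
    esac
}
```

Usage: `for b in pbs_mom pbs_server trqauthd; do check_torque_binary "$b"; done` after removing the apt packages.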







question 1:
limit the maximum number of processes per user
http://docs.adaptivecomputing.com/maui/6.2throttlingpolicies.php


Install the Torque PAM module:
libtool --finish /lib64/security
/lib64/security/ is where the PAM modules are located.
/etc/security/access.conf grants access to whoever you wish to let in.



Set Maui to limit the jobs and processes per user (in maui.cfg):

USERCFG[DEFAULT] MAXPROC=64 MAXJOB=5 #working

#GROUPCFG[useraid] MAXJOB[USER]=5  # not working
#CLASSCFG[batch] MAXJOB[USER]=5   working


CLASSCFG[batch] MAXJOB[USER]=5 MAXPROC=64  # not working


A working solution that uses PAM to prevent users from logging into compute nodes:
some users are allowed onto the compute nodes while the others stay outside.

versions: torque-5.0.1-1_4fa836f5 maui-3.3.tar.gz
The official Torque documentation (http://docs.adaptivecomputing.com/torque/3-0-5/3.4hostsecurity.php)
says:

1. First, configure Torque with ./configure --with-pam

2. Add to /etc/pam.d/sshd:
account required pam_pbssimpleauth.so
account required pam_access.so

and
3.
In /etc/security/access.conf, make sure all users who access the compute nodes are added to the configuration. This is an example which allows the users root, george, allen, and michael access:
-:ALL EXCEPT root george allen michael torque:ALL


However, I found this method too strict: none of root, george or allen could log into the compute nodes.

my solution:

1. There is no need to reinstall Torque with ./configure --with-pam.

2. Put
account required     pam_access.so
into /etc/pam.d/sshd,

which means pam_access is checked for every ssh login.

3. Put

-:ALL EXCEPT root szhang czhang storres torque:ALL
into /etc/security/access.conf.
Now only the listed users (root, szhang, czhang, storres and torque) can log into the compute nodes.

I think this works and makes sense: at the moment all job launching is done by pbs_mom, which runs as root, so pam_pbssimpleauth.so does not need to take effect.
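To sanity-check who a rule like the one in step 3 lets in before pushing it to every node, here is a minimal sketch. access_allowed is a hypothetical helper that models only this single rule form ("-:ALL EXCEPT user1 user2 ...:ALL", deny everyone except the listed users, from all origins); the real pam_access supports many more forms (groups, hosts, '+' rules), so this is not a replacement for testing an actual ssh login.

```shell
# Hypothetical helper: succeed if $1 is exempted by a single pam_access rule of
# the form "-:ALL EXCEPT user1 user2 ...:ALL". Simplified model of that one form only.
access_allowed() {
    local user="$1" rule="$2" users_field who _perm _origins
    # Split "permission:users:origins" on the colons
    IFS=: read -r _perm users_field _origins <<< "$rule"
    # Keep only the user names after "EXCEPT "
    users_field=${users_field#*EXCEPT }
    for who in $users_field; do
        [ "$who" = "$user" ] && return 0
    done
    return 1
}
```

Usage: `access_allowed george '-:ALL EXCEPT root szhang czhang storres torque:ALL' || echo "george is locked out"`.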







reload maui

Just restart it; this won't affect the queue (qterm -t quick stops pbs_server but leaves running jobs alone).

pkill maui && qterm -t quick && sleep 5 && /usr/local/maui/sbin/maui && pbs_server && ps aux | grep maui


showres                  works
showres -n
checkjob 810             works
checknode macondo01      very good feedback
showgrid AVGXFACTOR
showstats
mbal                     careful: this will kill maui!
mdiag                    same as diagnose

I still don't get the idea of MAXNODE. Does it mean all jobs of one user have to go to one particular node?

mjobct
ERROR:    corrupt command received


mclient
ERROR:    unknown command: 'mclient'

mprof
USAGE ERROR:  (tracefile not specified)

mstat
ERROR:  command 'mstat' args not handled
ERROR:    service 36 not handled
ERROR:    Service[36] 'mstat' not implemented

showbf
backfill window (user: 'czhang' group: 'useraid' partition: ALL) Sun Jan 18 15:25:07

231 procs available for    7:11:35:38
175 procs available for   21:18:13:37
118 procs available for   40:14:55:01
 62 procs available for   40:21:06:15



diagnose -j | grep -o -P '(?<=job \047).*(?=\047 utilizes more procs than)'
# this line lists all the jobs for which warnings are reported.

diagnose -j
Name                  State Par Proc QOS     WCLimit R  Min     User    Group  Account  QueuedTime  Network  Opsys   Arch    Mem   Disk  Procs       Class Features

381                 Running DEF    1 DEF 10:00:00:00 1    1    cwang  useraid uq-Civil    00:49:21   [NONE] [NONE] [NONE]    >=0    >=0    NC0   [batch:1] [NONE]
569                 Running DEF    1 DEF 25:00:00:00 1    1   pzhang  useraid uq-Civil    00:49:21   [NONE] [NONE] [NONE]    >=0    >=0    NC0   [batch:1] [NONE]
WARNING:  job '569' utilizes more procs than dedicated (10.35 > 1)
650                 Running DEF    1 DEF 41:16:00:00 1    1 mgholami  useraid uq-Civil    00:49:20   [NONE] [NONE] [NONE]    >=0    >=0    NC0   [batch:1] [NONE]
WARNING:  job '650' utilizes more procs than dedicated (13.00 > 1)
651                 Running DEF    1 DEF 41:16:00:00 1    1 mgholami  useraid uq-Civil    00:49:20   [NONE] [NONE] [NONE]    >=0    >=0    NC0   [batch:1] [NONE]
WARNING:  job '651' utilizes more procs than dedicated (10.28 > 1)
669                 Running DEF    1 DEF 41:16:00:00 1    1 mgholami  useraid uq-Civil    00:49:19   [NONE] [NONE] [NONE]    >=0    >=0    NC0   [batch:1] [NONE]
WARNING:  job '669' utilizes more procs than dedicated (14.00 > 1)
671                 Running DEF    1 DEF 25:00:00:00 1    1   pzhang  useraid uq-Civil    00:49:21   [NONE] [NONE] [NONE]    >=0    >=0    NC0   [batch:1] [NONE]
WARNING:  job '671' utilizes more procs than dedicated (9.57 > 1)
672                 Running DEF    1 DEF 25:00:00:00 1    1   pzhang  useraid uq-Civil    00:49:21   [NONE] [NONE] [NONE]    >=0    >=0    NC0   [batch:1] [NONE]
WARNING:  job '672' utilizes more procs than dedicated (7.80 > 1)


\047 is the octal ASCII code for a single quote (').

diagnose -j | grep -o -P '(?<=than dedicated \050).*(?=>)'

\050 is the octal ASCII code for '(' (left parenthesis).

adse=$(diagnose -j | grep -o -P '(?<=than dedicated \050).*(?=>)')
Store the result in adse.
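The two grep extractions above can be combined into one pass that keeps the job id next to its usage and filters by a threshold. This is a minimal sketch; overusing_jobs is a hypothetical helper, and it assumes the warning lines look exactly like the diagnose -j output above ("WARNING:  job 'ID' utilizes more procs than dedicated (USAGE > 1)").

```shell
# Hypothetical helper: read `diagnose -j` output on stdin and print
# "<jobid> <usage>" for each warning whose usage exceeds the threshold.
overusing_jobs() {
    local threshold="${1:-1}"
    awk -v t="$threshold" '
        /^WARNING:  job .* utilizes more procs than dedicated/ {
            split($0, q, "\047")               # \047 is the single quote, q[2] is the job id
            usage = $0
            sub(/.*dedicated [(]/, "", usage)  # drop everything up to "("
            sub(/ >.*/, "", usage)             # drop " > 1)"
            if (usage + 0 > t + 0) print q[2], usage
        }'
}
```

Usage: `diagnose -j | overusing_jobs 10`. awk compares the numbers numerically, so no bc is needed for the threshold test.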

if [ "$a" != "$b" ]
then
  echo "$a is not equal to $b."
  echo "(string comparison)"
  #     "4"  != "5"
  # ASCII 52 != ASCII 53
fi

#!/bin/bash
x=5.0
y=3.0
#ans= $(( $x + $y |bc  ))      # fails: $(( )) is integer-only shell arithmetic
#ans=$(echo  $x + $y |bc )
#ans=$(echo  $x / $y |bc -l )   # this gives a good result (bc -l sets scale=20)
#ans=$(echo  $x / $y |bc  )     # this does not: plain bc uses scale=0 and prints 1

#ans=$(python -c "print $x / $y")    # also OK (Python 2 syntax), but the output format is a problem

#ans=$(python -c "print( "%.2f"     %($x / $y) ) ")  # failed: the inner double quotes end the shell string
#alpha=`echo "$a/100" | bc -l | awk '{printf("%06.2f", $1);}'`
ans=`echo "$x/$y" | bc -l | awk '{printf("%6.4f", $1);}'`
echo "$x / $y = $ans"
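Comparing such decimal values in a test is the next hurdle: `[ "$a" -gt "$b" ]` only accepts integers, and `[ "$a" \> "$b" ]` compares strings. A minimal sketch that delegates the comparison to awk (float_gt is a hypothetical helper name):

```shell
# Hypothetical helper: succeed if $1 > $2 when both are read as floating-point numbers.
float_gt() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 > b + 0) }'
}
```

Usage: `if float_gt "$ans" 1; then echo "$ans is greater than 1"; fi`.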


Maui is being deprecated. Use Sun Grid Engine (SGE, now Oracle Grid Engine, which Rocks clusters use) or Slurm instead.

It seems to me that the soft/hard limits only work for groups, not for users.
/usr/local/maui
http://www.physics.oregonstate.edu/cluster_install

Problem 2016-01-12:
When running trqauthd:
trqauthd: symbol lookup error: trqauthd: undefined symbol: debug_mode
This happened on the server, which had been running for a few days. Once trqauthd was killed, it could not be restarted properly.



root@macondo03:/home/users/uqczhan2#  trqauthd
trqauthd: symbol lookup error: trqauthd: undefined symbol: debug_mode
root@macondo03:/home/users/uqczhan2# pbs_server
pbs_server: symbol lookup error: pbs_server: undefined symbol: job_log_mutex
root@macondo03:/home/users/uqczhan2# pbs_mom
pbs_mom: symbol lookup error: pbs_mom: undefined symbol: log_mutex
root@macondo03:/home/users/uqczhan2# which trqauthd
/usr/local/sbin/trqauthd
root@macondo03:/home/users/uqczhan2# pbs_
pbs_demux    pbs_mom      pbs_restart  pbs_sched    pbs_server   pbs_track  
root@macondo03:/home/users/uqczhan2# pbs_sched
pbs_sched: symbol lookup error: pbs_sched: undefined symbol: log_mutex
root@macondo03:/home/users/uqczhan2# pbs_restart
Cannot connect to default server host 'macondo03' - check pbs_server daemon.
qterm: could not connect to server '' (1) Operation not permitted


Installing torque-package-server-linux-x86_64.sh gives us
pbs_sched  pbs_server  qschedd  qserverd

./torque-package-mom-linux-x86_64.sh --install

Installing TORQUE archive... 

Done.
root@macondo03:/home/user/uqczhan2/czhang/Downloads/torque-5.0.1-1_4fa836f5# ls /usr/local/sbin
momctl  pbs_demux  pbs_mom  pbs_sched  pbs_server  qnoded  qschedd  qserverd
Solution: check which libraries trqauthd loads. In a working setup the output looks like this:
ldd trqauthd
        linux-vdso.so.1 =>  (0x00007ffcf55e1000)
        libtorque.so.2 => /usr/local/lib/libtorque.so.2 (0x00007f365ed33000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f365eb16000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f365e816000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f365e458000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f365e250000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f365df54000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f365dd3e000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f365f62a000)
Today the problem was resolved again:

In fact, the FDS model setup is what made the system hang: it changes which libtorque.so.2 gets loaded, so trqauthd stops working.
Solution: I removed everything associated with FDS from LD_LIBRARY_PATH in .bashrc, then checked ldd trqauthd. The correct output should be the same as the one above.
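The "check ldd trqauthd" step can be automated. A minimal sketch, assuming the good location is /usr/local/lib/libtorque.so.2 as in the ldd listings above; libtorque_ok is a hypothetical helper that reads ldd output on stdin.

```shell
# Hypothetical helper: read `ldd` output on stdin and check that libtorque.so.2
# resolves to the expected file (default: /usr/local/lib/libtorque.so.2).
libtorque_ok() {
    local expected="${1:-/usr/local/lib/libtorque.so.2}" actual
    # ldd line: "libtorque.so.2 => /path/libtorque.so.2 (0x...)" or "=> not found"
    actual=$(awk '$1 == "libtorque.so.2" { print ($3 == "not" ? "" : $3) }')
    if [ "$actual" = "$expected" ]; then
        echo "libtorque OK: $actual"
    else
        echo "libtorque resolves to '${actual:-not found}', expected $expected" >&2
        echo "check LD_LIBRARY_PATH (e.g. entries added for FDS in ~/.bashrc)" >&2
        return 1
    fi
}
```

Usage: `ldd "$(command -v trqauthd)" | libtorque_ok`.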

Also, after the restore there was a small problem restarting pbs_mom, pbs_server and pbs_sched.
Solution:
First, apt-get remove torque-mom torque-server torque-sched, to make sure the Torque packages from apt are not installed.
Second, reinstall Torque 5.0.1 with configure, make, make install.
Then run the daemons one by one.
Below are the errors that appear when running pbs_mom, pbs_server and pbs_sched:
 pbs_mom
pbs_mom: LOG_ERROR::No such file or directory (2) in chk_file_sec, Security violation with "/var/spool/torque/checkpoint" - /var/spool/torque/checkpoint cannot be lstat'd - errno=2, No such file or directory


pbs_server and pbs_sched, once started, do not show up as processes in the system.


Reinstalling Torque 5.0.1 resolved the problem (2016-01-12).

Problem:
pbsnodes
pbsnodes: Server has no node list MSG=node list is empty - check 'server_priv/nodes' file



cd /var/spool/torque/server_priv
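Presumably the fix is to fill in the nodes file in that directory, one "hostname np=N" line per compute node, and then restart pbs_server. A minimal sketch; write_nodes_file is a hypothetical helper, and the host names and np value in the usage line are illustrative only.

```shell
# Hypothetical helper: (re)write a Torque nodes file with one
# "hostname np=N" line per compute node.
write_nodes_file() {
    local dir="$1" np="$2" host
    shift 2
    : > "$dir/nodes"                       # start from an empty file
    for host in "$@"; do
        printf '%s np=%s\n' "$host" "$np" >> "$dir/nodes"
    done
}
```

Usage (illustrative values): `write_nodes_file /var/spool/torque/server_priv 64 macondo01 macondo03`, then `qterm -t quick && pbs_server`, and confirm with `pbsnodes -a`.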



Saturday, 3 January 2015

install environment modules in a cluster

http://linuxcluster.wordpress.com/2012/11/05/installing-and-configuting-environment-modules/

use ganglia to monitor the system

1. Follow the instructions at https://www.digitalocean.com/community/tutorials/introduction-to-ganglia-on-ubuntu-14-04 to finish the installation.

2. Install gexec on each client.

3. In /etc/ganglia/gmond.conf, change
  gexec = yes

4. Restart the monitor with: sudo service ganglia-monitor restart

Note that the server has to be restarted last, so that all the clients can be found.


One incident: macondo03 went down. After it was rebooted, gstat could not see the other machines. The only way to bring everything back to normal was to run "sudo service ganglia-monitor restart" on every client so that the host could find all the machines.
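The restart order described above (all clients first, the server last) can be wrapped in a loop. A minimal sketch, assuming passwordless ssh and sudo on every node; restart_ganglia is a hypothetical helper, and setting SSH_CMD=echo turns it into a dry run that only prints what would be executed.

```shell
# Hypothetical helper: restart ganglia-monitor on every client, then on the server.
# Assumes passwordless ssh/sudo; SSH_CMD=echo gives a dry run.
restart_ganglia() {
    local server="$1" host
    local ssh_cmd="${SSH_CMD:-ssh}"
    shift
    for host in "$@"; do                   # clients first
        $ssh_cmd "$host" sudo service ganglia-monitor restart
    done
    $ssh_cmd "$server" sudo service ganglia-monitor restart   # server last
}
```

Usage: `restart_ganglia macondo03 macondo01` (server first on the command line, then the clients).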

Monday, 1 December 2014

Compiling MODPATH 6, MODFLOW-NWT and MODFLOW-2005 in Linux (makefile change)

The list of files is as follows:
MOD_GLOBAL.for  MOD_PARTICLEDATA.for    MP6Flowdata.for  MP6ParticleMgr.for     Writpts0.for    MOD_MPBAS.for   MOD_PrecisionCheck.for  MP6.for          MP6TrackParticles.for  MOD_MPDATA.for  MP6Budgetrd.for         MP6MPBAS1.for    MP6Util.for
Note that MP6PrecisionCheck.for has been renamed to MOD_PrecisionCheck.for, as it contains a module. Fortran itself does not depend on file names.
At first, I used:

gfortran *.for *.inc -o a.out
Errors were reported.

Then I decided to compile step by step:
gfortran -c MOD*.for
gfortran -c MP6*.for
gfortran *.o -o def
The binary was created, but it ran with errors.
So I decided to use ifort for compiling:
ifort -c MOD*.for
ifort -c MP6*.for
ifort *.o -o def
Everything works perfectly.

Learned:
  1. If direct compiling does not work, create the object files first and then link them. Compile the module files (MOD*.for) before the files that use them, because the compiler needs their .mod files.
  2. The *.inc files do not need to be listed in the compile command.
  3. Some programs depend on the compiler (either gfortran or ifort).
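The step-by-step build above can be kept in one small script so the order (modules, then the MP6 sources, then linking) is never forgotten. A minimal sketch; build_fortran is a hypothetical helper, FC picks the compiler (defaulting to ifort since the gfortran build ran with errors here), and DRY_RUN=1 only prints the commands. If one module uses another, the MOD files may need to be compiled in a specific order rather than the alphabetical glob order used here.

```shell
# Hypothetical build helper: modules first, then MP6 sources, then link.
# FC selects the compiler (default ifort); DRY_RUN=1 only prints the commands.
build_fortran() {
    local fc="${FC:-ifort}" out="${1:-def}" run="" f
    if [ "${DRY_RUN:-0}" = 1 ]; then run=echo; fi
    for f in MOD*.for; do                  # 1) modules first, so their .mod files exist
        [ -e "$f" ] || continue
        $run "$fc" -c "$f" || return 1
    done
    for f in MP6*.for; do                  # 2) then the sources that use the modules
        [ -e "$f" ] || continue
        $run "$fc" -c "$f" || return 1
    done
    $run "$fc" *.o -o "$out"               # 3) link all object files
}
```

Usage: run `build_fortran def` in the source directory, or `FC=gfortran build_fortran def` to try gfortran.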

The same trick may also need to be used for SUTRA-MS

MODFLOW-2005 almost works out of the box, except that one needs to change the .inc file.

For MODFLOW-NWT, I found that gfortran lacks the ieee_arithmetic module (gfortran only supports the IEEE intrinsic modules from GCC 5 onwards), while ifort has no problem finding it. I have raised the question on Stack Overflow. The other trick I made



Sunday, 5 October 2014

use matlab -nodesktop -nosplash and oop

>>commandhistory

 pops up the Command History window, but this window does not remember all the commands executed in the terminal emulator.

 >>setenv('LD_LIBRARY_PATH',[getenv('PATH') ':' getenv('LD_LIBRARY_PATH')])

 This puts the system paths in front of the MATLAB library path (note the ':' separator between the two lists). It is extremely useful when running system commands in MATLAB with "!".


Debugging in command line

dbstop in Datareading

where Datareading is the name of the file in which the script is supposed to stop

Datareading

Then the script starts to run, and the editor pops up at the breakpoint.

dbclear all

clears all the breakpoints.
2.  Use the editor to edit a file:
>>edit ~/Dropbox/Matlab/SutraLab/SutraLab/mfiles/slsetpath

methodsview: shows all the methods of a class

3.  Make sure TIFF files are exactly the same as the EPS file:
See my question: http://stackoverflow.com/questions/3600945/printing-a-matlab-plot-in-exact-dimensions-on-paper
An alternative solution is to use a bash command:
convert -colorspace RGB -density 300 coutour.eps -resize 1024x1024 image.jpg
Another, better command gives highly compressed files:
convert -colorspace RGB -density 1000 coutour.eps -resize 1000x1000 -compress ZIP image.tif
export_fig is a useful tool for figure output,
but I found it is not very friendly with Linux.
So the workaround is, again, to use Linux for the calculations and then use Windows to output the results.
4.  Font size does not change in MATLAB on Linux (15-07-14)
There are several things to look into to solve this problem:
(1) check if you have something missing with the font.
http://stackoverflow.com/questions/16218979/changing-figure-fonts-in-matlab-has-no-effect
sudo apt-get install xfonts-75dpi xfonts-100dpi
This works for my ubuntu machine.

In Gentoo Linux:
emerge -av media-fonts/font-adobe-100dpi media-fonts/font-adobe-75dpi liberation-fonts
I also have rebooted the computer.
here are some useful commands:

 eselect fontconfig list
 find /usr/share/ -iname 'helvet*'
fc-match Helvetica
(2) People also report that MATLAB is very slow when displaying figures over ssh. This is caused by the NVIDIA driver; to resolve it, put two lines in the 'Device' section of /etc/X11/xorg.conf:
http://ifixdit.blogspot.com.au/2011/08/how-to-fix-matlab-small-figures-and.html

Option "UseEdidDpi"   "false"
Option "Dpi"          "92 x 92"
I have also done this in Gentoo, but in Ubuntu (installed on a machine with an ATI card) I did not have to.
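When there are many figures, the ImageMagick command from item 3 can be applied to a whole directory at once. A minimal sketch; eps_to_tif is a hypothetical helper whose defaults are the density and size from that command, and CONVERT="echo convert" gives a dry run.

```shell
# Hypothetical helper: convert every .eps in a directory to a ZIP-compressed TIFF
# using the convert options from item 3. CONVERT="echo convert" prints instead.
eps_to_tif() {
    local dir="${1:-.}" density="${2:-1000}" size="${3:-1000x1000}" f
    local convert_cmd="${CONVERT:-convert}"
    for f in "$dir"/*.eps; do
        [ -e "$f" ] || continue            # no .eps files: nothing to do
        $convert_cmd -colorspace RGB -density "$density" "$f" \
            -resize "$size" -compress ZIP "${f%.eps}.tif"
    done
}
```

Usage: `eps_to_tif ~/figures` writes figure.tif next to each figure.eps.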