The questions below allow an appropriate infrastructure, operations and procedures to be established for a service. It is recommended that the questions be answered by a group of experts covering the full service stack (system administrators, storage administrators, application administrators).
Where an answer is not known, a guess based on current experience and tests should be entered, followed by a question mark (?). As more experience is gained, an improved figure can be entered. Naturally, some attributes of the service, such as performance, may not be attainable if the data provided is not precise. While an answer of 'do not know' is not ideal, it is better than an incorrect but confident answer.
This is intended for use before the solution is implemented. The ScFourQualityAssurance step ensures that the requested service has been implemented or highlights open activities.
Additional information can be found at ScFourServiceDefinition and ScFourServiceTechnicalFactors.
Question | Answer |
---|---|
Service | |
What service class is requested during which calendar periods ? An answer for AP, AS, OP and OS is required | OS=C |
Who is providing second level support for the application (e.g. when there is an application problem, which organisation is responsible for resolution) | CERN project-lcg-vo-dteam-admin@cern.ch |
By what mechanism should the second level support organisation be contacted | |
What is the agreed response time of the second level organisation | 3 hrs (9-5,Mon-Fri) |
Is the service level defined in ScFourServiceDefinition | Yes VOMS |
Configuration | |
What are the interfaces for this application | vomrs & voms-admin: HTTP-based user interface and SOAP API, implemented as a Java web application. voms core: Internet ports, configurable, 15000 by default. We already use 15000-15010. |
What machines does it/could it run on | Linux SL3 |
What are the configuration parameters | vomrs: See http://computing.fnal.gov/docs/products/vomrs/vomrs1_2/configfile.html for the config. parameters. voms-admin: See http://cern.ch/dimou/lcg/voms/voms-admin-config-parameters voms core: port, backlog, backend, socket timeout, etc. |
Hardware Sizing | |
How much CPU power does the application need | Dependent on the number of client connections but basically anything >=Dual CPU 1GHz (what we have now). We can expect ~60 simultaneous client connections. |
How much real memory does the application require | Total for our configuration should be >=3GB |
How much swap space does the application require | vomrs: needs about 60 MiB of virtual memory per VO. There is also one vomrs server per VO that requires about 20-30 MiB per VO. voms-admin: needs about 50-75 MiB of virtual memory per VO. voms core: Dependent on the number of client connections. |
What is the additional disk space required for the application (local logs, state data) | Local logs, size is configurable. vomrs: 60 MiB voms-admin: 100 MiB |
What are the database setup requirements | Database_Type=Oracle Database_Name=? (living on grid8/voms-pilot.cern.ch host) Database_Size=vomrs and voms-admin applications need at least one separate database account per VO. Normally one account capable of doing selects on all the DB, and another capable of doing updates on a single table (the table will contain just one record). Expected size for a VO with 5000 users, 10 groups and 10 roles is less than 2 MB, plus the data for the admin interface. |
Software Components | |
What software components make up the solution ? Web Servers, Databases, Code | vomrs (uses tomcat, needs database backend, now on grid8) foundation-view (uses CERN HR Oracle DB) voms-admin (uses tomcat,soap,java) voms http://cern.ch/dimou/lcg/voms/server.html |
Is there any licensed software which is part of the solution | Oracle client but OK for CERN |
Is there a diagram explaining the role of the application in the total deliverable | There are diagrams describing the links between the components but I don't know if there is a picture of the "total deliverable" where we could plug our part. |
Data | |
Is the application stateful ? Where is the state data stored ? | vomrs: The web UI retains session states for some configurable amount of time. vomrs and voms-admin: Yes. The state is stored in the configured database. The user interface is stateless, i.e., the service does not retain session states across user requests. voms core: Yes, the state is stored in DB, the size is fixed. |
Is there a replication procedure so that the state data can be copied to another system ? | vomrs: a script exists that copies the configuration. 2 or more vomrs web application/servers can share the same database. voms-admin: Yes. Two or more voms-admin servers can share the same database. |
Backup/Restore | |
What files and directories should be backed up daily | Everything under /opt and /var |
Is there any requirement for a backup more frequently than once a day | No |
What files need to be archived (i.e. kept for legal, security or accounting) ? | Everything in /var/log |
How long should the archive data be kept | Two years. See http://edms.cern.ch/document/428034 section 4 point 2 |
What databases need to be backed up | All under accounts voms_[VOname] and vomrs_[VOname] |
Is there a requirement for a coherent backup between files and databases | No |
Is off-site data storage required for any of the data being backed up | Not for now. Replication of the VO DB is still being discussed in the Security group. Normally it is allowed by policy but not yet required. |
Networking | |
Are IP aliases supported for the service rather than the hostname of the machines ? | Not now. |
What is the expected network bandwidth requirement for the machine | 100MB/sec sustained bandwidth at maximum. |
Is connectivity from the application to the outside of CERN required ? If so, for what purpose | TCP/IP connectivity in http://network.cern.ch is INCOMING |
Is connectivity from outside of CERN to the application required ? If so, for what purpose | Yes, on port 8443 for authorised registration and ports 15xyz for voms-proxy |
What external systems is the product dependent on for correct function | lxb2051 (where the configuration files are stored), the central Physics (?) db servers (grid8 for now) and the CERN HR db. |
Monitoring | |
What processes need to be running for the service to be up ? | tomcat5, edg-voms, sshd (from within CERN), crond, java (vomrs) |
What file systems need to be monitored and to what thresholds to avoid operation issues | local /opt and /var up to 80% |
Is there an application level check (such as a simulation of a user query) which can be used to check that the application is responding to user requests | Yes. The command 'service gLite status' and https://[VOMSserver].cern.ch:8443/vo/[VOname]/vomrs where [VOMSserver] is one of lcg-voms, voms or voms-slave and [VOname] is one of the 9 VOs listed in https://lcg-voms.cern.ch:8443/vomses (requires personal certificate). |
Automation | |
What automatic processes run when (cron, acron) | cron runs various processes, which differ on every [VOMSserver]. |
Testing | |
Is a test environment defined | Yes. On testbed004 == voms-test |
Procedures | |
Is there an administration guide which explains: | Installation: https://uimon.cern.ch/twiki/bin/view/LCG/VomsCernSetup Configuration & Update: https://uimon.cern.ch/twiki/bin/view/LCG/VomsConfiguration Problem Solving: https://uimon.cern.ch/twiki/bin/view/LCG/VomsProblemSolving |
Are there procedures defined for the operators to: | https://uimon.cern.ch/twiki/bin/view/LCG/VomsStartStopCheck |
Is there automatic monitoring of the service and a procedure to react in the event of a problem | r-gma configured, maybe not used (?). Periodic restart of tomcat5. |
In the event of an extended failure, what processes must be executed retroactively (such as accounting catchup) | None I know of. |
What regular tasks need to be performed by the operators (cleanup of file systems, reboot of servers,...) | None hopefully when the system and software is stable. So far, we have been doing continuous upgrades to install bug fixes. |
What regular tasks need to be performed by administrators (change configuration files, tuning) | Creation of new VOs, checking of logs and adjustment of voms-admin parameters have been necessary so far. |
For planned changes, how can the service be drained so that there are no new requests arriving ? What is the maximum lifetime of a request, after which the application can be stopped once drained ? | Stoppage of tomcat and voms (== edg-voms) at a scheduled time. Don't know about running jobs that might require proxy renewal during the intervention. Normally the default lifetime of a proxy is 12 hours. |
Users | |
Who are the users of the service ? | All grid registered users of all VOs. |
What declarations of users / groups / roles are required | This is a decision of the VO Admins and is declared via vomrs. |
How will the users access the service | Via the web https://lcg-voms.cern.ch:8443/vo/[VOname]/vomrs and the command line of their UI (voms-proxy-[init|info]) |
What super user / high access rights are required by the application administrators | root (the way we have installed now) and the passwd for the CERN HR db view and the Oracle db accounts where the data are stored (just for direct testing of the databases' content with sql commands). |
What technical users are required for the installation and administration | I don't understand this question |
Support | |
Who are the users of the service | All grid registered users of all VOs. |
What channel do the users have for reporting problems | email to project-lcg-vo-[VOname]-admin@cern.ch |
Escalation | |
Who should be informed when the service goes down | The LCG-ROLLOUT@listserv.cclrc.ac.uk list and the VO Admins project-lcg-vo-[VOname]-admin@cern.ch, via the EGEE BROADCAST <egee-broadcast@cern.ch> |
When the service window for recovery will not be achieved, who should comprise the crisis committee | project-lcg-vo-dteam-admin@cern.ch |
What other services should be stopped in order to reduce impact of outage | The cron jobs for running edg-mkgridmap.pl on every CE and RB but it is hard to arrange that at all sites. |
Changes | |
Who is authorized to request an update or change | The VO Admins, the developers, the Grid Deployment members responsible for LCG2 and/or gLite releases. |
What are the procedures for announcing the change to the community | Announcement via EGEE broadcast tool under https://cic.in2p3.fr |
What is the lifetime of the current product version (i.e. when should it be changed) | Between May and October 2005 we had to change 6 times. We sincerely wish to stabilise if the code is now robust. |
When are the maintenance windows for this product ? | I don't understand. We make changes during working hours but we announce them before. |
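
The Monitoring answers above ask for the local /opt and /var file systems to be watched up to 80%. A minimal sketch of such a check follows, in Python. It is an illustration only, not the monitoring actually deployed on the VOMS servers: the function names are hypothetical, and treating "up to 80%" as "alert once usage reaches 80% or more" is an assumption.

```python
import shutil

# Values taken from the Monitoring answers; the >= comparison is an assumption.
THRESHOLD_PERCENT = 80.0
MONITORED_PATHS = ["/opt", "/var"]


def percent_used(total, used):
    """Return used space as a percentage of total (0.0 if total is zero)."""
    return 100.0 * used / total if total else 0.0


def over_threshold(usage_by_path, threshold=THRESHOLD_PERCENT):
    """Return the sorted paths whose usage is at or above the threshold."""
    return sorted(p for p, pct in usage_by_path.items() if pct >= threshold)


def collect_usage(paths=MONITORED_PATHS):
    """Measure current usage for each path; paths that cannot be read are skipped."""
    usage = {}
    for path in paths:
        try:
            du = shutil.disk_usage(path)
        except OSError:
            continue
        usage[path] = percent_used(du.total, du.used)
    return usage


if __name__ == "__main__":
    current = collect_usage()
    for path in over_threshold(current):
        print(f"WARNING: {path} is {current[path]:.1f}% full")
```

Such a script could be run from cron on each [VOMSserver]; how the warning is delivered (mail, r-gma, or another channel) is left open here.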
-- Maria Dimou 1 Nov 2005