Notes from a second meeting, held on 2004-05-19, between Maria Dimou, Dave Kelsey (part-time), Wim van Leersum, Ian Neilson and Nick Ziogas. It follows up on the meeting of 2004-04-22 and on the meeting with the PIE specialists of 2004-04-30.
In this meeting we discussed with the people technically responsible for the new CCDB project, now called Computing Resources Admin (CRA), how our LCG User Registration tools could read in, or link to, the necessary personal information for new candidate LCG users from the CERN HR database (referred to generically in subsequent discussions as ORGDB, for ORGanisational DB). We explained that the LCG Virtual Organisation Database (VODB) must contain personal information for every user, i.e. Authentication (AuthN) data and Grid access Authorization (AuthZ) data.
The AuthZ data can only be entered and maintained in VODB. The AuthN data, on the other hand, may be present only in ORGDB. Open questions at the time of the meeting:
Straight after the meeting, Nick sent by email the following proposal, which keeps the VO manager as the main data validator and minimises automation of the process of joining the VO:
- After discussion with all VOs, a common interface is designed and agreed upon, with a minimum set of checks, which all VOs will have to use to enter personal data.
- At data entry, the interface sends a request to the "ORGDB"s of specific ORGs, or of all of them, and a best-effort match is done at each "ORG". The matches are sent back, and the data entry clerk or manager either chooses from the list(s) or rejects them and enters the person manually.
- This mechanism is also available in batch mode. It can run regularly, but then someone will have to go through the matched data and sort it out. It could also run once to 'clean' and validate the data.
This is not ideal, as it requires manual intervention, but then again, people who do not want to follow specific procedures won't automatically get the benefits. If they do not want to end up with unmanageable systems two years from now, they must be prepared to do the manual work.
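The best-effort match step of the proposal could look roughly like the minimal Python sketch below. It is only an illustration under stated assumptions: the record fields (`surname`, `given_name`, `email`), the scoring weights and the 0.6 threshold are all hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for whatever matching each ORG would actually perform. As in the proposal, the code only proposes candidates; the clerk or VO manager makes the final choice.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Person:
    # Hypothetical record layout; the real ORGDB schema is not described in these notes.
    surname: str
    given_name: str
    email: str


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def score(entered: Person, candidate: Person) -> float:
    """Weighted similarity; the weights are illustrative assumptions."""
    if entered.email and entered.email.lower() == candidate.email.lower():
        return 1.0  # identical e-mail address: treated as a near-certain match
    return (0.6 * similarity(entered.surname, candidate.surname)
            + 0.4 * similarity(entered.given_name, candidate.given_name))


def best_effort_match(entered, orgdb_records, threshold=0.6, limit=5):
    """Return up to `limit` candidates, best first, for the clerk to choose from.

    An empty list means the clerk enters the person manually.
    """
    scored = [(score(entered, c), c) for c in orgdb_records]
    scored = [pair for pair in scored if pair[0] >= threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:limit]]


if __name__ == "__main__":
    # Made-up example records, not real ORGDB data.
    orgdb = [
        Person("Dimou", "Maria", "maria.dimou@example.org"),
        Person("Dimos", "Mario", "m.dimos@example.org"),
        Person("Kelsey", "David", "d.kelsey@example.org"),
    ]
    typo = Person("Dimu", "Maria", "")  # surname mistyped at data entry
    for p in best_effort_match(typo, orgdb):
        print(p.surname, p.given_name)
```

Returning a ranked short list rather than a single "best" record reflects the proposal's choice to keep a human as the final validator; the same function could be called in a loop for the batch mode, with the resulting lists queued for someone to sort out.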
In response to Maria's question "How many records per LHC experiment can be retrieved from ORGDB?", Wim sent the numbers by email.
Wim's comment on these notes, sent on 2004-06-15:
I would like to stress that the choice of how to link the databases should depend on the degree of overlap. If 95% of the population already exists in HR, I would ask the experiment secretariats to enter the remaining 5% through PIE as well, thus ensuring coherency. If it's only 50%, it may not be worth forcing them to use PIE, and you could just take a copy of what we have and develop your own tools to add the other half. However, if a person is entered that way and later comes to CERN and is registered through PIE as well, you may end up with duplicates, because matching two persons, given all the possible data entry mistakes, is quite difficult. For that reason we have a duplicate checker, invoked every time a person is entered in PIE; it is based on statistics and is very difficult to maintain and tune.
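Wim's rule of thumb suggests measuring the overlap before choosing an approach. The Python sketch below is hypothetical: it matches VO users to HR entries on a naively normalised (surname, given name) pair, which is exactly the kind of simple matching Wim warns is unreliable, so the resulting figure is only a rough estimate. The default cutoff of 0.95 is taken from Wim's 95% example; where exactly between his 50% and 95% examples the line should be drawn is not stated, so `pie_cutoff` is an assumption.

```python
import unicodedata


def normalise(name: str) -> str:
    """Lower-case, strip accents and collapse whitespace (naive; an assumption)."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.lower().split())


def name_key(surname: str, given_name: str) -> tuple:
    """Hypothetical matching key built from the two name fields only."""
    return (normalise(surname), normalise(given_name))


def overlap_fraction(vo_users, hr_people) -> float:
    """Fraction of VO users whose normalised name key also appears in HR.

    Both arguments are iterables of (surname, given_name) pairs.
    """
    vo_users = list(vo_users)
    if not vo_users:
        return 0.0
    hr_keys = {name_key(s, g) for s, g in hr_people}
    found = sum(1 for s, g in vo_users if name_key(s, g) in hr_keys)
    return found / len(vo_users)


def suggested_approach(fraction: float, pie_cutoff: float = 0.95) -> str:
    """Map an overlap fraction to one of the two options Wim describes."""
    if fraction >= pie_cutoff:
        return "enter the remaining users through PIE"
    return "copy HR data and maintain local tools (watch for duplicates)"
```

Note that a mismatch such as "Dave" versus "David" is counted as "not in HR" by this sketch, so the estimate errs on the low side; a statistical duplicate checker like the one Wim describes exists precisely to catch such cases.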
Maria Dimou, IT/GD, Grid Infrastructure Services