KOBIC-DDBJ timeline

Written by Okubo.K, in collaboration with Mr.Aono and Dr.Osagawara, for Mr.Hanshimoto at JPO. Signed: Okubo 12:08, 18 May 2012 (JST)

=Data flow=

 * KIPO|--#1-(DVD)-->|KOBIC|--#2-(KOBIC/ftp site)-->|DDBJ|--->EMBL,GenBank

#1 KIPO data for KOBIC: we have not seen this data.

#2 JPO-format file of KIPO data, with the following fields:
 * PN: patent number -> application number
 * PD: patent date -> application date
 * PR: priority date and number
 * PT: title
 * PA: applicants
 * TY: sequence type (DNA, PRT)
 * OS: organism
 * CO: comments
 * SQ: sequence (starts on the following lines)
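The field list above suggests a simple tagged-line layout. The following is only a sketch, since the actual JPO file layout is not documented on this page. It assumes that each field starts with its two-letter tag at column 0, that indented lines continue the previous field, and that every line after SQ belongs to the sequence. The function name `parse_record` is hypothetical. It also shows one way to merge multi-line fields into a single line:

```python
import re

# Assumed layout (hypothetical): a field starts with a two-letter upper-case
# tag at column 0, optionally followed by a space and the value.
TAG_LINE = re.compile(r"^([A-Z]{2})(?: (.*))?$")


def parse_record(text: str) -> dict:
    """Parse one JPO-format record (layout assumed) into {tag: value}.

    Indented continuation lines are merged into the previous field with a
    single space. Every line after the SQ tag is treated as sequence: digits
    and whitespace are dropped and the pieces are concatenated.
    """
    fields = {}
    current = None
    seq_parts = []
    in_sequence = False
    for raw in text.splitlines():
        if in_sequence:
            # Keep letters only: drops position numbers and block spacing.
            seq_parts.append(re.sub(r"[^A-Za-z]", "", raw))
            continue
        match = TAG_LINE.match(raw)
        if match:
            current = match.group(1)
            if current == "SQ":
                in_sequence = True  # the sequence starts on the next line
                continue
            fields[current] = (match.group(2) or "").strip()
        elif current is not None and raw.strip():
            # Continuation line: merge it into the field it belongs to.
            fields[current] += " " + raw.strip()
    if in_sequence:
        fields["SQ"] = "".join(seq_parts)
    return fields


# A made-up record assembled from fragments of the examples on this page.
record = parse_record(
    "PN KR 1020067020883-A/1660\n"
    "PT METHODS AND COMPOSITIONS FOR THE\n"
    "   TREATMENT OF GASTROINTESTINAL DISORDERS\n"
    "TY DNA\n"
    "SQ\n"
    "    1 GATCAGCAGT CCCCGGAACA\n"
    "   21 TCGTAGCTGA\n"
)
print(record["PT"])  # METHODS AND COMPOSITIONS FOR THE TREATMENT OF GASTROINTESTINAL DISORDERS
print(record["SQ"])  # GATCAGCAGTCCCCGGAACATCGTAGCTGA
```

A continuation line that is not indented and happens to begin with two capitals and a space would be misread as a new tag. This is why the indentation assumption matters and should be checked against real KIPO files.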

 * 2007-03-08  Dr.Sugawara visited Dr.Lee at KOBIC regarding patent data.
 * 2007-08-03  KIPO DNA data sent to DDBJ (#1).
 * 2007-12-12  KIPO DNA data sent to DDBJ (#2).
 * 2008-02-21  KIPO DNA data (#1, #2) released from DDBJ: 282,117 entries.
 * 2008-03-26  EBI pointed out some "errors" in the KIPO-DDBJ-INSD data.
 * 2009-01-09  KIPO sent DNA and amino acid data to DDBJ (#3).
 * 2009-04-30  KIPO amino acid data (#3) released from DDBJ: 110,000 entries (number uncertain).
 * 2009-xxxx   KIPO data (#4) transferred to DDBJ but held due to errors.
 * 2010-03-15  KOBIC/KIPO/DDBJ/JPO wiki started (DDBJ Patent).
 * 2010-05-27  Dr.Okubo visited Dr.Lee at KOBIC regarding patent data.
 * 2012-03-08  Dr.Lee visited DDBJ as a board member of DDBJ.

Because INSDC defines no strict format beyond the table structure, an "error" here means one of two things: a violation of the common restrictions on character usage or on the usage of controlled codes, or a problem at the semantic level, which is in effect a matter of vocabulary control.
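The character-usage kind of error is the easiest to detect mechanically. The sketch below is a minimal illustration. It assumes that the allowed set for text fields is printable 7-bit ASCII (0x20 to 0x7E), which is an assumption made for this example rather than a rule quoted from an INSDC specification. The function name is hypothetical.

```python
# Assumption: text fields may contain only printable 7-bit ASCII (0x20-0x7E).
# Anything else, e.g. Hangul letters or a full-width space, is reported.
def find_illegal_chars(record: dict) -> list:
    """Return (field, position, code point) for each disallowed character."""
    problems = []
    for field, value in record.items():
        for pos, ch in enumerate(value):
            if not 0x20 <= ord(ch) <= 0x7E:
                problems.append((field, pos, f"U+{ord(ch):04X}"))
    return problems


print(find_illegal_chars({"PD": "2006-10-04", "PT": "PATENT\u3000DATA"}))
# [('PT', 6, 'U+3000')]
```

Under this assumption, tabs and other control characters are also flagged. This is deliberate, because they are not printable.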

=Summary of various small "errors" pointed out in KIPO/KOBIC/DDBJ data=

 1) Missing spaces at word boundaries in reference titles (e.g. "THETREATMENT" below). A few thousand such entries were found (e.g. DI001534).
    * KOBIC and DDBJ tried to correct these in an ad hoc manner, but some are still on the web.
    * Example: link to accession DI001534
 2) The same author names are repeated.
    * Example: data before correction, as sent from KOBIC to DDBJ:

 PN KR 1020067020883-A/1660
 PT METHODS AND COMPOSITIONS FOR THETREATMENT OF GASTROINTESTINAL DISORDERS
 PD 2006-10-04
 PA CURRIE, Mark G., MAHAJAN-MIKLOS, Shalina, FRETZEN, Angelika, SUN, Li Jing, MILNE, G. Todd, NORMAN, Thea, CURRIE, Mark G., MAHAJAN-MIKLOS, Shalina, FRETZEN, Angelika, SUN, Li Jing, MILNE, G. Todd, NORMAN, Thea and KURTZ, Caroline
 PR 2005-02-08 US 11/054,071

--->corrected by KOBIC as follows--->

 PA CURRIE,M.G., MAHAJAN-MIKLOS,S., FRETZEN,A., SUN,L.J., MILNE,G.T., NORMAN,T., KURTZ,C.

    * Example: link to accession DI598270
 3) A DNA entry was submitted as an amino acid (PRT) entry; not yet corrected (DI524890).

 LOCUS      DI524890                7285 aa    PRT              PAT 21-FEB-2008
 DEFINITION  KR 1020037014672-A/28: PEPTIDES AND RELATED MOLECULES THAT BIND TO
             TALL-1.

    1 GATCAGCAGT CCCCGGAACA TCGTAGCTGA CGCCTTCGCG TTGCTCAGTT GTCCAACCCC
   61 GGAAACGGGA AAAAGCAAGT TTTCCCCGCT CCCGGCGTTT CAATAACTGA AAACCATACT
  121 ATTTCACAGT TTAAATCACA TTAAACGACA GTAATCCCCG TTGATTTGTG CGCCAACACA
    :
 7201 ATGTCGTCGT CAACGACCCC CCATTCAAGA ACAGCAAGCA GCATTGAGAA CTTTGGAATC
 7261 CAGTCCCTCT TCCACCTGCT GACCG

 4) Some amino acid sequences are written in the three-letter amino acid code.
    * This one should be corrected to "RGSHHHHHHGS" using the single-letter code. DDBJ asked KOBIC whether we may correct it, but there has been no reply yet.
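The author-name and three-letter-code corrections above are mechanical enough to automate. The sketch below illustrates this and is not KOBIC's or DDBJ's actual procedure. It assumes that the PA field always follows the "SURNAME, Given names" pattern seen in the example record, with " and " before the last name. Applicant names that themselves contain " and " or commas, such as company names, would break it. The helper names are hypothetical. The code table is the standard IUPAC-IUB three-letter set, including Xaa and the ambiguity codes.

```python
import re

# Standard three-letter to one-letter amino acid codes (IUPAC-IUB).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Sec": "U", "Pyl": "O", "Asx": "B", "Glx": "Z", "Xaa": "X",
}


def normalize_applicants(pa: str) -> str:
    """Turn 'SURNAME, Given Names, ...' into 'SURNAME,G.N., ...' and drop repeats."""
    tokens = [t.strip() for t in pa.replace(" and ", ", ").split(",")]
    tokens = [t for t in tokens if t]
    if len(tokens) % 2:
        raise ValueError("expected SURNAME, Given-names pairs")
    names = []
    for surname, given in zip(tokens[0::2], tokens[1::2]):
        initials = "".join(word[0] + "." for word in given.split())
        name = f"{surname},{initials}"
        if name not in names:  # keep only the first occurrence
            names.append(name)
    return ", ".join(names)


def three_to_one(seq: str) -> str:
    """Convert e.g. 'Arg Gly Ser' or 'ARG-GLY-SER' to one-letter code ('RGS')."""
    letters = re.sub(r"[^A-Za-z]", "", seq)
    if len(letters) % 3:
        raise ValueError("length is not a multiple of three")
    out = []
    for i in range(0, len(letters), 3):
        code = letters[i:i + 3].capitalize()
        if code not in THREE_TO_ONE:
            raise ValueError(f"unknown residue code: {code}")
        out.append(THREE_TO_ONE[code])
    return "".join(out)


print(three_to_one("Arg Gly Ser His His His His His His Gly Ser"))  # RGSHHHHHHGS
```

When applied to the PA line before correction, `normalize_applicants` returns the same string as KOBIC's corrected PA line. The three-letter input above is a hypothetical rendering, because the original three-letter example is not preserved on this page.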

=Current Status as of 2012=

 * [2011-02-24] DDBJ received the following comment from Dr. Byungwook Lee (KOBIC):
   "Currently we have received Korean patent data from KIPO once a month, but as I mentioned, these data have some errors, for example, illegal DNA characters and misspelling in title. So, before the data transfer, we need to correct these errors with KIPO members."

 * [2011-02-28] Dr. Okubo (DDBJ) replied to Dr. Byungwook Lee (KOBIC):
   "If you could let us see the raw data you received from KIPO, that will help us understand your situation."

=PROPOSALS=


 * DDBJ does not have permission to correct even the trivial errors it detects, so a great deal of correspondence is required.
 * We ask KOBIC/KIPO for permission to correct trivial errors ourselves, reporting each correction to KOBIC.
 * To do this, we need access to the KIPO data (as sent from KIPO to KOBIC).


 * Automatic translation of contaminating Hangul letters into English
    * This step is impossible for us. We count on KOBIC to establish this difficult step.
 * A proper parser to merge multi-line fields into one line (DDBJ can do this with KOBIC)
    * This is a routine task for DDBJ; we can easily convert KIPO files to the DDBJ format.
 * A check program to detect amino acid sequences in the DNA field (DDBJ routine)
 * A system to convert common organism names to systematic names (DDBJ routine)
 * An environment for comparing DDBJ's input and output data, which is necessary to reduce mail correspondence (requires access to the KIPO data sent to KOBIC)
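The two DDBJ-routine checks in the list above can be sketched as follows. This is a minimal illustration, not DDBJ's actual tooling. It assumes that a sequence is "DNA-like" when at least 90% of its letters are A, C, G, T, U or N. Both the threshold and the letter set are assumptions. The name table is a tiny hypothetical sample. The function names are also hypothetical.

```python
from typing import Optional

NUCLEOTIDE_LETTERS = set("ACGTUN")


def looks_like_nucleotides(seq: str, threshold: float = 0.9) -> bool:
    """True if at least `threshold` of the letters are A, C, G, T, U or N."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        return False
    hits = sum(c in NUCLEOTIDE_LETTERS for c in letters)
    return hits / len(letters) >= threshold


def check_sequence_type(declared: str, seq: str) -> Optional[str]:
    """Return a warning when the declared type (DNA or PRT) disagrees with the content."""
    dna_like = looks_like_nucleotides(seq)
    if declared == "PRT" and dna_like:
        return "declared PRT but the sequence looks like DNA"
    if declared == "DNA" and not dna_like:
        return "declared DNA but the sequence looks like amino acids"
    return None


# Hypothetical sample table; a real system would consult a taxonomy database.
COMMON_TO_SYSTEMATIC = {
    "human": "Homo sapiens",
    "mouse": "Mus musculus",
    "rat": "Rattus norvegicus",
    "e. coli": "Escherichia coli",
}


def systematic_name(os_field: str) -> str:
    """Map a common organism name to its systematic name; unknown names pass through."""
    name = os_field.strip()
    return COMMON_TO_SYSTEMATIC.get(name.lower(), name)


print(check_sequence_type("PRT", "GATCAGCAGTCCCCGGAACA"))  # declared PRT but the sequence looks like DNA
print(check_sequence_type("PRT", "RGSHHHHHHGS"))  # None
print(systematic_name("Human"))  # Homo sapiens
```

The check counts only unambiguous bases rather than the full IUPAC nucleotide alphabet. Every letter of a short peptide such as RGSHHHHHHGS (R, G, S, H) is also an IUPAC nucleotide code, so an alphabet-membership test alone would misclassify it as DNA.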