USPTO reaction dataset

Each new weekly file (released on Tuesday) is cumulative and is provided in ASCII format. The Office Action Research Dataset for Patents contains detailed information derived from the Office actions issued by patent examiners to applicants during the patent examination process. The 2019 update to the Trademark Assignment Dataset contains detailed information on more than 1.06 million assignments and other transactions recorded at the USPTO between 1952 and 2019, involving 1.96 million unique trademark properties (a property is an individual application or registration). Accenture Federal Services (AFS), a subsidiary of Accenture (NYSE: ACN), has been awarded a $50 million contract by the U.S. Patent and Trademark Office (USPTO…

Chemical reactions can be described as the stepwise redistribution of electrons in molecules. We demonstrate that our model not only achieves impressive results but, surprisingly, also learns chemical properties it was not explicitly trained on.

Further differences between Pistachio and the public USPTO set arise because Pistachio also includes ChemDraw sketch data and text-mined European Patent Office (EPO) patents. A possible downside of the approach is the lack of transparency, as the link back to the original data is lost. The distribution is extremely unbalanced. … reaction dataset had been recorded as contributing to a ring formation. In the case of the standard model, the templates that correspond to ring-forming reactions in the reaction dataset cannot be prioritized by the model. We found about 1600 commonly occurring reaction templates in the dataset. The dataset contains 480K fully atom-mapped reactions in total. USPTO/data.zip includes the train/dev/test split of the USPTO dataset used in our paper.
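To make the phrase "fully atom mapped" concrete, here is a minimal sketch of how atom-map numbers appear in a reaction SMILES string and how they can be read or removed. It assumes the common convention that a mapped atom is written as a bracket atom with a `:<number>` suffix, such as `[CH3:1]`. The function names and the example reaction are illustrative only and are not taken from any of the datasets described above.

```python
import re

# Matches the ":<number>" atom-map suffix inside a bracket atom, e.g. "[CH3:1]".
_ATOM_MAP = re.compile(r":(\d+)\]")


def atom_map_numbers(smiles: str) -> set:
    """Return the set of atom-map numbers found in a (reaction) SMILES string."""
    return {int(n) for n in _ATOM_MAP.findall(smiles)}


def strip_atom_maps(smiles: str) -> str:
    """Remove atom-map numbers but keep the bracket atoms themselves."""
    return _ATOM_MAP.sub("]", smiles)


def product_atoms_are_traceable(rxn: str) -> bool:
    """Check that every mapped product atom also appears among the precursors.

    Assumes the 'reactants>reagents>products' layout; everything before the
    last '>' is treated as the precursor side.
    """
    precursors, _, products = rxn.rpartition(">")
    return atom_map_numbers(products) <= atom_map_numbers(precursors)


# Hypothetical example: substitution of bromide by hydroxide.
mapped = "[CH3:1][Br:2].[OH-:3]>>[CH3:1][OH:3].[Br-:2]"
print(sorted(atom_map_numbers(mapped)))
print(strip_atom_maps(mapped))
print(product_atoms_are_traceable(mapped))
```

A regular expression is enough for this bookkeeping, but it does not validate the chemistry. A cheminformatics toolkit would be needed to canonicalize the stripped SMILES or to confirm that every atom carries a map number.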
To advance research on matters relevant to intellectual property, entrepreneurship, and innovation, the Office of the Chief Economist (OCE) releases datasets to facilitate economic research on patents and trademarks, which is an element of the USPTO economics research agenda. Contains recorded maintenance fee events for patents granted from September 1, 1981 to present. … 450 main divisions of technology, called classifications/classes, broken into approx. … This folder contains 4 groups of USPTO patent images including ground truth information.

Most of the recent work in chemical reaction prediction, the task of predicting the most likely products given precursors (reactants and reagents), uses a … For this purpose, we have used the generated ReactionCodes of each reaction in the USPTO dataset. Listed chemical reaction datasets include USPTO-50K and, for reaction yield prediction (YIELDS), Buchwald-Hartwig and Suzuki-Miyaura. … 50 000 reactions (USPTO_50K) extracted from the United States patent literature, which was previously used by Liu et al. We evaluate GRAPHRETRO on the benchmark USPTO-50k dataset and a subset of the same dataset that consists of rare reactions.

Reactions in        train     valid    test     total     Notes
USPTO_MIT set [23]  409,035   30,000   40,000   479,035   No stereochemical information
USPTO_LEF [25]      *         *        29,360   349,898   Non-public subset of USPTO_MIT, without e.g. …
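The reaction prediction task described above maps precursors (reactants and reagents) to products. A minimal sketch of how one record could be separated into those roles is shown below, assuming the usual `reactants>reagents>products` reaction SMILES layout with molecules joined by `.`. The names `Reaction`, `parse_reaction_smiles`, and `precursors` are hypothetical helpers, not part of any dataset's tooling, and the example reaction is illustrative. The last lines re-check the USPTO_MIT split sizes quoted in the table.

```python
from typing import List, NamedTuple


class Reaction(NamedTuple):
    reactants: List[str]
    reagents: List[str]
    products: List[str]


def parse_reaction_smiles(line: str) -> Reaction:
    """Split 'reactants>reagents>products' into lists of molecule SMILES.

    Only the first whitespace-delimited token is parsed, on the assumption
    that any extra columns on the line are metadata.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty reaction line")
    parts = tokens[0].split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators: %r" % tokens[0])
    # Molecules within one role are separated by '.'; drop empty fields.
    reactants, reagents, products = (
        [mol for mol in part.split(".") if mol] for part in parts
    )
    return Reaction(reactants, reagents, products)


def precursors(rxn: Reaction) -> List[str]:
    """Model input for forward prediction: reactants plus reagents."""
    return rxn.reactants + rxn.reagents


# Illustrative example: acid-catalysed esterification.
rxn = parse_reaction_smiles("CC(=O)O.OCC>OS(=O)(=O)O>CC(=O)OCC.O")
print(precursors(rxn), "->", rxn.products)

# Split sizes as quoted for USPTO_MIT in the table above.
uspto_mit_split = {"train": 409_035, "valid": 30_000, "test": 40_000}
print(sum(uspto_mit_split.values()))
```

Keeping reagents as a separate field lets the same parser serve settings where reactants and reagents are distinguished as well as settings where they are merged into one precursor list.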
Since these data have not been commonly used in the research community, OCE provides supplementary documentation that comprehensively describes the data and presents initial findings. The patent image folder is divided into 4 groups: 'train1', 'train2', 'test', and 'evaluation'.

Reactions are often depicted using 'arrow-pushing' diagrams, which show the movement of electrons as a sequence of arrows. We propose an electron path prediction model (ELECTRO) to learn these sequences directly from raw reaction data. … a method to extract approximate reaction paths from any dataset of atom-mapped reaction SMILES strings. … performance on an important subset of the USPTO reaction dataset, comparing favorably to the …

The USPTO dataset has been used in many machine learning approaches for predicting reactions [32,33,34,35]. The USPTO-50k dataset contains 50 000 reactions classified into 10 reaction types. The USPTO dataset accounts for reactions up to September 2016, whereas Pistachio includes reactions until 17th Nov 2017. The USPTO MIT dataset mostly contains simple reactions and lacks complex transformations. For the datasets ending with _augm, the number of training datapoints was doubled. … a variety of datasets consisting of up to 17.5 million reactions.

Given the list of building blocks, we take each molecule that has appeared in USPTO reaction data and analyze if … We know of no previous analysis to evaluate the diversity of this … To evaluate diversity, we split the ReactionCodes by incremental layers taking …
USPTO reaction dataset 2021