Finding our way in information space : Finding our way in information space
Phil Ashworth
Phil Scordis
Slide2 : UCB: The Next Generation Biopharmaceutical Leader
R&D activities at 10 global sites (map label: Monheim (De))
R&D Headcount = 2,100 (August 2007)
Global biopharmaceutical company with specialist focus: Neurology, Inflammation and Oncology
Proven sales and marketing – creating global brands
Keppra®, Xyzal®, Zyrtec®
Revenues of €3.5 billion in 2006 (pro forma)
Successfully transformed with:
Celltech acquisition in 2004
Integration of SCHWARZ PHARMA in September 2007
Over 10,000 employees across more than 40 countries
Listed on EURONEXT (Brussels); current market cap of €7.5 bn
Apology : Apology Health Warning
We are still in the middle of all of this; I don’t have all of the answers
History : History Research and Development in UCB
Comes from integration of Schwarz Pharma, Celltech, OGS, Chiroscience, Darwin
Variety of data source issues
Silos, vendor systems, structured, un-structured etc.
Data integration
A mess of legacy approaches and many situations where no attempt has been made.
To warehouse or not to warehouse?
After a rollout of a research warehouse, at least two distinct examples of different working practice “break” the model
Warehouses are difficult to extend and rebuild: just another rigid system
Principles and Ideals of the Semantic Web : Principles and Ideals of the Semantic Web
“The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” [Tim Berners-Lee et al 2001]
Ideal environment
Starting from scratch, building connectivity
Start defining the problem space from a blank page
How applicable is this attractive approach to us?
Let’s find out…
The Dream : The Dream What did we want
Getting UCB’s pipeline to market faster
Better ROI, an environment in which investment in data generation can be exploited to the full.
Breaking down data boundaries
Major Areas for Improvement
Operational Orchestration
Data Integration
Knowledge discovery and creation
The fantasy
Legacy systems remain in place where appropriate
Data integration is seamless, facilitates aggregation, query based on the meaning of the data
Facilitated exploration of data and exploitation of connections
Starting the journey : Starting the journey Heard of others oscillating around the semantic vs warehouse question
Large investment in both technologies, building components, rolling out home built solutions
Our initial investment
Minimal resource
Limited to vendor applications (best of breed) rather than building our own
But not the all-or-nothing approach offered by some
Our learning curve has been steep
Made many mistakes
Visited many dead ends
Experienced limitations first hand
Had many frustrations
Data Integration was our key goal
Where to start : Where to start Principles of the Semantic Web
Understanding the concepts of semantics – so much reading.
Semantic Technologies
Differences between the semantic and OO mindsets
Academia
Some nice projects, but not enterprise oriented
Data Integration
RDF
Has desirable flexibility and inherent potential for integration
OWL
Builds on top of RDF; potential for a rich descriptive framework, plus the power of description logic (DL) to facilitate knowledge discovery through reasoning
Making connections
But our data is in relational systems!
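The flexibility attributed to RDF above can be illustrated with a minimal sketch. It assumes each source is reduced to a set of (subject, predicate, object) triples that share identifiers; every identifier below (`ex:compound42`, `ex:hasAssayResult` and so on) is invented for illustration and does not come from any UCB system.

```python
# Hypothetical example data: all identifiers are invented for illustration.
assay_source = {
    ("ex:compound42", "ex:hasAssayResult", "ex:assay7"),
    ("ex:assay7", "ex:ic50_nM", "12.5"),
}
registry_source = {
    ("ex:compound42", "ex:registeredName", "example-name"),
    ("ex:compound42", "ex:targets", "ex:proteinX"),
}


def merge_graphs(*graphs):
    """Merge graphs by taking the set union of their triples.

    No shared schema has to be designed up front: triples from
    different sources connect wherever they use the same identifier.
    """
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Return all (predicate, object) pairs for one subject, sorted."""
    return sorted((p, o) for s, p, o in graph if s == subject)


merged = merge_graphs(assay_source, registry_source)
for predicate, obj in describe(merged, "ex:compound42"):
    print(predicate, obj)
```

The hard part in practice is agreeing on shared identifiers across silos, which this sketch simply assumes.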
How to integrate: Getting RDF from RDB : How to integrate: Getting RDF from RDB RDF from RDB
D2RQ
Offered the ability to read/query relational databases as RDF
Limitations
Open source.
Didn’t work on real world databases in our hands
Concerns about query speed when using multiple data sources; we wanted an asynchronous, distributed environment
Reasoning was very slow across multiple data sources (forward chaining)
Cerebra server
Tantalising prospect. A dead-end? Recent changes within company meant that direction for tool was uncertain.
SDS – Interesting prospect (www.insilicodiscovery.com)
Integrated query environment across a variety of data sources (relational, excel, web services etc.)
Distributed asynchronous computing model
No RDF!
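As a rough illustration of what "RDF from RDB" means, the sketch below turns each row of a relational table into triples: one subject per row, one predicate per column. It is not D2RQ's API. D2RQ uses a declarative mapping and rewrites queries against the live database, whereas this sketch simply materialises triples. The namespace, table and column names are hypothetical.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace


def table_to_triples(conn, table, key_column):
    """Map every row of `table` to triples.

    Subject: one URI per row, built from the key column.
    Predicate: one URI per non-key column.
    NULL values produce no triple at all.
    """
    cur = conn.execute(f'SELECT * FROM "{table}" ORDER BY "{key_column}"')
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key_column]}"
        triples.append((subject, "rdf:type", f"{BASE}{table}"))
        for column, value in record.items():
            if column == key_column or value is None:
                continue
            triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples


# Hypothetical in-memory table standing in for a real research database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT, mol_weight REAL)"
)
conn.executemany(
    "INSERT INTO compound VALUES (?, ?, ?)",
    [(1, "cmpd-a", 180.2), (2, "cmpd-b", None)],
)

triples = table_to_triples(conn, "compound", "id")
for triple in triples:
    print(triple)
```

Real schemas add foreign keys, composite keys and vendor-specific types, which is where a naive mapping like this one tends to break down.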
How to integrate: RDF Stores / Warehouse : How to integrate: RDF Stores / Warehouse Triple stores
AllegroGraph (Franz)
Sesame
Problems
Immature technology
Supported data volumes are limited with respect to life science data volumes
Security and backup – primitive
Limited Integration with other tools.
Needed tighter integration: queries were not being carried out directly in the RDF stores. Again, slow queries and reasoning from tools due to forward chaining.
Still have data duplication issues and requirements for ETL processes
One step forward, two steps back!
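The slowness blamed on forward chaining can be made concrete with a small sketch. Forward chaining applies rules repeatedly and stores every inferred triple until nothing new appears, so the cost grows with the data before any query is asked. The two rules and the class names below are illustrative assumptions, not the rule set of any tool named in these slides.

```python
def forward_chain(triples):
    """Naive forward chaining over two RDFS-style rules.

    Rule 1: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    Rule 2: (x type A), (A subClassOf B)       => (x type B)

    Rules are re-applied until a full pass adds no new triple.
    """
    inferred = set(triples)
    while True:
        new = set()
        subclass_pairs = [(s, o) for s, p, o in inferred if p == "subClassOf"]
        # Rule 1: transitivity of subClassOf.
        for a, b in subclass_pairs:
            for b2, c in subclass_pairs:
                if b == b2:
                    new.add((a, "subClassOf", c))
        # Rule 2: instances inherit membership of superclasses.
        for x, p, a in inferred:
            if p == "type":
                for a2, b in subclass_pairs:
                    if a == a2:
                        new.add((x, "type", b))
        if new <= inferred:
            return inferred
        inferred |= new


# Hypothetical facts, invented for illustration.
facts = {
    ("Kinase", "subClassOf", "Enzyme"),
    ("Enzyme", "subClassOf", "Protein"),
    ("p38", "type", "Kinase"),
}
closure = forward_chain(facts)
for triple in sorted(closure - facts):
    print("inferred:", triple)
```

Here three asserted facts yield three inferred ones. The nested loops re-scan the whole triple set on every pass, which hints at why materialising inferences over several large sources is slow.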
How to integrate: Development Tools : How to integrate: Development Tools Few professional development and deployment environments
Roll your own vs the use of open source
Protégé
Great for model development but lacked integration with other tools (when we looked)
TopBraidComposer - TopQuadrant
Excellent functionality out of the box. Easy interface, File imports, navigation etc
Integrated with a variety of third party systems.
D2RQ, AllegroGraph, Sesame, Jena, Oracle
But still could not do everything we wanted it to.
TopQuadrant supported our limited resource to enhance our understanding and knowledge.
TopBraidLive is one of the first development-to-deployment applications
Reasoners
We looked at several; each had its quirks
None behaved as we expected or wanted with the data volumes we had.
Used Rules to achieve what we needed.
Isn’t this cheating?
Stop the journey – we are getting off : Stop the journey – we are getting off We have tried to achieve data integration chasing several avenues
RDF from RDB
RDF warehouse
Via RDB data -> txt -> RDF -> RDF Store
Semantic SOA, another approach
Pragmatic semantics
Now we understand the messages others have been trying to pass
Blowing hot and cold on the whole idea
Wavering over semantic vs conventional warehousing
Heavy investment in home brew technology or enterprise environment
Is this a dead end?
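The "RDB data -> txt -> RDF -> RDF Store" route listed above can be sketched as a small export step: a delimited text dump is converted into N-Triples lines ready for bulk loading. This is a minimal sketch under stated assumptions. The namespace, entity and column names are hypothetical, key values and column names are assumed to be safe inside a URI, and all values are emitted as plain string literals.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace


def escape_literal(value):
    """Escape the characters that N-Triples string literals require."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(text, key_column, entity):
    """Convert CSV text (with a header row) into N-Triples lines.

    One subject per row, one predicate per non-key column.
    Empty cells are skipped rather than emitted as empty literals.
    """
    reader = csv.DictReader(io.StringIO(text))
    lines = []
    for row in reader:
        subject = f"<{BASE}{entity}/{row[key_column]}>"
        for column, value in row.items():
            if column == key_column or value == "":
                continue
            predicate = f"<{BASE}{entity}#{column}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


# Hypothetical text export from a relational table.
export = "id,name,target\n1,cmpd-a,protein-x\n2,cmpd-b,\n"
ntriples = csv_to_ntriples(export, "id", "compound")
print("\n".join(ntriples))
```

Each hop in such a pipeline copies the data again, which matches the duplication and ETL concerns raised earlier for RDF stores.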
The end : The end Thanks for coming …
Hang on, we are not giving up yet : Hang on, we are not giving up yet We decided to persevere
But we still don’t have a large amount of resource to throw at this
We need to take a different path
Community action
Collaboration
There is a vibrant and active community out there
W3C …
Involved in direction and calling for standards
So where are we today? : So where are we today?
Driving change : Driving change TopBraidComposer - A semantic development environment using open source and limited data integration tools.
Help with SDS
Tighter Integration with RDF stores
TQ also had to drive other vendors to provide functionality for them
Many other changes as we pushed the boundaries of the tool
TopBraidLive looks very promising as an easy deployment environment
SDS - A data integration platform, enterprise ready, lacking a semantic direction
SPARQL integration (not just RDF from RDB, but RDF from RDB, Excel and web services)
We believe this is key to our future strategy
Changes to their interfaces, tools and capabilities
Integration with TBC
UCB is driving collaborative development
Helping bring companies together (A big thank you to TQ and ISD)
Helping drive the community
In Summary : In Summary The semantic wave is too large to surf alone
Too unpredictable to control
There are some big hurdles to overcome
Integration, tools, enterprise solutions, visualisation, orchestration
However we are committed to helping make things happen
Always on the lookout for open-minded enthusiasts
Committed to contribute to the community
Still believe that Semantic Technologies are part of the solution
But it is not just something we can adopt (at the moment)
It is still something we have to help forge so others can be adopters.
Thank you : Thank you Any advice? Questions?