Buzzwords kicks off on Monday and some of you might still be wondering what’s so exciting about open source, scalable search, data analysis and NoSQL databases. We sat down (in front of our inboxes) with Galina Hinova, who’ll be giving a talk at Buzzwords, to ask her more about life on the frontier of big data.
Hi Galina. You’re a Project Manager at IntraFind Software AG, working on Search applications. Can you tell us more about what you do there and what your everyday work looks like?
First of all I want to thank the Berlin Geekettes blog for the interest in me and my work. At IntraFind we are working in the area of enterprise search. For most people search is an input box on a webpage. But we at IntraFind do not think so narrowly about search. We are convinced that since the beginning of mankind people have been searching. Let me propose an experiment. Take a piece of paper and a pen and write down all the things you look for in a day. It starts early in the morning. Take me as an example: I always search for my front door key. Or you look outside your window and it is raining, so you have to search for your umbrella, etc. You will be surprised how often you search every day.
So search is more than just software?
Enterprise search is not only about a nicely working piece of software. It is about knowing the needs of our customers even if they are not spoken out loud, knowing their infrastructure and problems, and picking out the right strategy for each customer. You have to know the methods, the strategies, what is possible and what is not.
If you’re saying every search application is unique, how do you approach the problem?
I would describe my job as project manager for enterprise search as piecing together a puzzle. I have a set of different modules, strategies, methods, software, skills and nice colleagues, and for every customer I need to create a great picture which he can put on his wall, in my case behind the search box of his internet or intranet. So it starts with what the customer really wants, what the requirements are on the customer side, what the best way to get to the right solution is, and how we can best implement and deliver that solution to the customer. I am responsible for all this from the beginning to the end of a project. Of course there is a lot of routine but there is also a lot of innovation and creativity in my job, because no customer is like the others.
How did you end up helping people find what they are looking for? What’s your background?
Well, I had a really exotic start in the field of information retrieval and search. I studied German linguistics in Sofia. Then one day I took part in a course on computational linguistics for linguists and the topic just fascinated me. I was excited to learn more and more and one day, about two years later, I found myself as a student of computational linguistics in Munich, Germany. There, I wrote my master’s thesis on information retrieval and my professor helped me find my first job. It happened to be a firm which offered services in the field of enterprise and internet search. Over time, I transitioned from implementing search solutions to managing the implementation of search solutions. That’s the short version; the long one has much more about giving something up to get something else, downs and ups, trials and fails and so on. What I have learned from my own story is to keep at it, to work for it and one day it happens.
I imagine things have changed as you’ve been working in the field. Lately it seems like Open Source Software solutions, with projects like Lucene and Solr from the Apache Software Foundation, are key to the Search (and even more generally Big Data) industry. Why is that? Can you tell us more about the Open Source community and mentality in your field?
Open source solutions in the field of search like Lucene and Solr are very popular. In my opinion the reason for this is that Lucene and Solr do not cost a lot. You can download Lucene on your home PC, grab the Lucene in Action book and in a couple of days you have a running search engine. Maybe you don’t know everything about search, big data and Lucene, but it costs very little money and effort to reach your first success. The entry to the search world can be so easy, for everyone. This is the charm and the great thing about open source software.
So Open Source rules? Case closed?…
Well, on the other hand, there are the commercial search engines, for which you have to pay a license fee. And for this you get much more than a core search engine. For example, the iFinder, IntraFind’s search software, is Lucene based. We have added a lot of goodies which a Lucene engine doesn’t provide, and our customers can use them out of the box without implementing them themselves. We have, for example, integrated a Tagging Service for NER (named entity recognition), a TopicFinder for document classification and a lot of other great stuff.
Okay, so there are benefits to Open Source and Commercial Search. How do you choose?
The decision for a company is mostly: do I want to invest every year in an employee who can implement a search engine for my company, or should I get a software license? The truth is that a software license is much cheaper than a software developer and the licensed software can do much more out of the box than the open source software.
Where’s the cutting edge now? How are the cool kids doing things?
Fifteen to twenty years ago the search engine companies started from university projects with their own cores. Think about FAST, now a Microsoft subsidiary, or Autonomy, now owned by HP, or Endeca, now owned by Oracle. Younger search engine companies like IntraFind, Attivio, etc. do not invest in core implementation. Instead, they use the Lucene core and build their software on it. We at IntraFind have, for example, two Lucene committers. Even in the commercial search engine scene there is a shift from do-it-yourself to pump-it-up-yourself. I don’t mean it ironically, because Lucene has proven itself in recent years to be great open source software. So why reinvent the wheel if there is Lucene ;-).
How can a (beginner) developer get started learning Search and Information Retrieval? What advice do you have?
First of all, you have to want it and you have to keep at it, even if there are some difficulties or disappointments. Second, there are a lot of great books about the topic. I would start with the Information Retrieval Bible. It is called “Modern Information Retrieval” and almost everyone I know who works professionally in the search scene, from developers to managers, has read it. Then of course you can try to run your own search engine, for example with Lucene. A lot of things you learn just by doing them. So get to know the theory, there are some important rules and best practices you can read about, and then get to know the practice, there are several nice open source search engines which you can play with in your sandbox. One day you will be confident enough to break the rules and create your own best practices. But the most important thing is: Do not give up!
In a few days you will be at the Buzzwords conference. What are you looking forward to doing/seeing there (other than yourself ;)?
I am really excited about it. I hope to meet some interesting people there, some old friends. My calendar for a sit down with a Berliner beer is already pretty full. But I also hope to meet some new faces. I am curious about new ideas and use cases in the field of search, big data, and data analysis. I am pretty sure I will go home from Berlin Buzzwords with new inspiration for my own work.
Of course there is also my talk, which I will give at the conference. Edit Kiss, project manager at MAN Truck & Bus AG, and I will be happy to show a real-life project about a search-based application on which we are working right now. It is about the After Sales Portal at MAN Truck & Bus and we will speak about some design decisions for this application. The challenge of the project is to bring many different kinds of very specific data together in just one application. But I do not want to reveal everything. If you are curious I will be happy to meet you at Berlin Buzzwords.
If you’ve been inspired to dive headfirst into search, and if you’d like to hear more from Galina in person, send our Tech Ambassador Amélie an email (email@example.com) with a short explanation of why you should be the lucky Geekette to snag a free ticket to Buzzwords. Please send us your emails by Friday, May 31st at midnight!