Intro
Have you ever imagined looking at a photo of a room and then simply switching the light in that room, inside the photo? Or interactively rearranging the fruit on the table? Or changing the time shown on the wall clock? All inside the photo.

Idea in a Nutshell

This post proposes a (new?*) approach for editing and representing photos: no more editing photos only at the pixel level, but editing at the semantic level. Photos in photo editing applications can be much more than just a series of pixels. They can be represented as a collection of semantic objects, objects with meaning: objects with a specific 3D structure, physical characteristics and functional features. This way, photo editing can become more similar to editing a scene in 3D software. The focus in this post is on photos, but SEED could be applied to other types of media, like video or audio, as well. Furthermore, this post discusses additional opportunities opened up by semantic representation of media.

What do I mean by an 'approach'?

The suggested approach is called Semantic Editing, Encoding and Decoding (SEED). It is NOT a new object recognition algorithm, nor a scene rendering algorithm. It is simply a theoretical approach with roots in current technology; it discusses the 'What' rather than the 'How'. I tend to believe that this approach could be implemented in the future, building on big advances in certain key basic research areas, like object recognition and real-time scene rendering.

SEED explained

First, a good-enough semantic representation should be extracted from a photo and saved: after the raw photo is taken, an object recognition algorithm runs to extract the semantic information from it and save it in a semantic file format. An example of a photo's semantic representation would be: a book with a specific ISBN catalog number, opened on page 4, located at a specific position and angle in a room that is lit by a light bulb with specific lighting characteristics.
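To make the book example concrete, here is a minimal sketch in Python of what a semantic scene file might contain. Everything here is hypothetical and invented for illustration: the class names, fields and values are not an existing format, and a real semantic representation would need far richer geometry, material and lighting models.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical sketch of a semantic scene file. All names and fields
# below are invented for this example, not an existing standard.

@dataclass
class SceneObject:
    kind: str                    # e.g. "book"
    catalog_id: str              # e.g. an ISBN for a known product
    position: tuple              # (x, y, z) in scene coordinates
    rotation: tuple              # (yaw, pitch, roll) in degrees
    state: dict = field(default_factory=dict)  # object-specific state

@dataclass
class LightSource:
    kind: str                    # e.g. "bulb"
    intensity_lumens: float
    color_temp_kelvin: float
    on: bool = True

@dataclass
class SemanticScene:
    objects: list
    lights: list

    def to_json(self) -> str:
        """Serialize the scene for storage in a semantic file."""
        return json.dumps(asdict(self), indent=2)

# The book example from the text: a specific ISBN, opened on page 4,
# at a specific position and angle, in a room lit by one bulb.
scene = SemanticScene(
    objects=[SceneObject(kind="book",
                         catalog_id="ISBN 978-0-00-000000-0",
                         position=(1.2, 0.8, 0.0),
                         rotation=(0.0, 0.0, 35.0),
                         state={"open_page": 4})],
    lights=[LightSource(kind="bulb",
                        intensity_lumens=800.0,
                        color_temp_kelvin=2700.0)],
)
print(scene.to_json())
```

Under this sketch, semantic editing amounts to mutating fields and re-rendering: for instance, setting `scene.lights[0].on = False` would switch the light off in the photo.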
Then the saved semantic file can be re-rendered algorithmically by the photo editing application, and let the fun begin...

SEED Strengths

SEED Weaknesses
- Potential abuse - this should be handled by means of education, awareness, ethics and law.
- Confusion between reality and virtual reality - this should be handled as well via education and awareness.
- Feasibility - SEED depends on future advances in object recognition techniques. Some might even say this level of recognition will never be achieved, but I personally tend to believe it will.
- Processing power - SEED also depends on advances in efficient real-time rendering of complex scenes. This aspect might benefit from advances in cloud computing and bandwidth availability.
- Authentic representation of reality - is SEED lossy or lossless? Do we lose information when we use it, or not? I would say that the first SEED implementations would be very lossy, but as SEED advances it might even become a 'gainy' method. By 'gainy' I mean enabling additional freedom to manipulate images with a basis in reality, for example enhanced resolution or higher levels of zoom based on knowledge of the 'real' structure and colors of a photographed object. However, SEED has obvious inherent limitations as an authentic representation of reality (for instance, if we delete an object, what do we see instead?).
- Scalability - in order for the idea to be useful, the knowledge base (the semantic databases covering different kinds of objects, environments, etc.) would have to be enormous in scale. It would also have to span many different categories (such as commercial products, natural elements, and urban and natural landscapes). One option, especially for the first implementations of SEED, is a hybrid scene: a scene made partly of pixels and partly of semantic information. In addition, the initial database of identifiable objects can start small and specific to a certain domain, and then gradually grow to include more categories and more objects.
- Some photos are hard to represent semantically - for example, a one-of-a-kind product or a specific pattern of colors. The hybrid approach could support these use cases, and as encoding algorithms become more sophisticated, larger and larger parts of photos could move from pixel-based to semantic-based encoding. The number of objects that are hard to represent semantically might turn out to be smaller than expected, thanks to industrialization processes and improvements in data sharing and in modeling algorithms.
- Standardization
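The hybrid scene mentioned under scalability could look roughly like the sketch below: regions the encoder recognizes become references into a semantic object database, and everything it cannot yet identify stays as plain pixels. As before, all class and field names here are hypothetical, invented purely to illustrate the idea.

```python
from dataclasses import dataclass, field

# Hypothetical hybrid scene: part semantic, part raw pixels.
# All names below are invented for illustration.

@dataclass
class SemanticRegion:
    object_id: str   # reference into a semantic object database
    bbox: tuple      # (x, y, w, h) area of the image this object covers

@dataclass
class PixelRegion:
    bbox: tuple      # (x, y, w, h)
    pixels: bytes    # raw (or conventionally compressed) image data

@dataclass
class HybridScene:
    width: int
    height: int
    semantic: list = field(default_factory=list)   # recognized objects
    fallback: list = field(default_factory=list)   # unrecognized areas

    def semantic_coverage(self) -> float:
        """Fraction of the image area that is encoded semantically."""
        total = self.width * self.height
        covered = sum(w * h for (_, _, w, h) in
                      (r.bbox for r in self.semantic))
        return covered / total if total else 0.0

# A 100x100 scene where one 40x30 region was recognized as a book
# and the rest remains pixel data.
scene = HybridScene(width=100, height=100)
scene.semantic.append(SemanticRegion("book:demo-id", (10, 10, 40, 30)))
scene.fallback.append(PixelRegion((0, 0, 100, 100), b"\x00\x00\x00\x00"))
print(f"semantic coverage: {scene.semantic_coverage():.0%}")
```

A metric like `semantic_coverage` is one way the gradual transition described above could be tracked: early encoders would produce scenes that are almost entirely pixel fallback, with the semantic share growing as recognition improves.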
Acknowledgements
This post was written over the last couple of days as part of contemplating my personal wishlist for Adobe MAX Sneak Peeks, so thanks, Adobe, for the inspiration. It was also inspired by the video accompanying Arik Shamir and Shai Avidan's paper, "Seam Carving for Content-Aware Image Resizing", so thanks, Arik and Shai.
* Disclaimer: I do not have any background knowledge in the field of object recognition and the like. I consider this both a bad and a good thing. Bad, as the idea presented here might not be refreshing at all without my knowing it; and good, as it might be extremely refreshing precisely because I'm not constrained by existing paradigms. So I decided to let go, post it as-is in draft form, and hear your valuable feedback. Comment here, or find me at Adobe MAX 2010 next week, and let me know what you think.