Moving to DX11 or 12, as pointed out, would be yet another huge effort that would not show immediate benefits, though. We would then also have to move away from pre-rendering static objects, etc., and dramatically change the way things are rendered, as the newer DirectX versions put much more work and responsibility into the programmer's hands, which is both good and bad.
I've already been taking steps to replace old high-poly (static) parts with lower-poly parts that could be rendered dynamically. The problem is, we simply aren't going to cross that bridge until we come to it, so to speak. If we want people to move away from static rendering, the simplest way would be to take that leap and get rid of static rendering. As it is now, we suffer the drawbacks of not being fully dynamic, but the upside is that we get to use high-poly static items with no real impact on performance, and no one is going to stop exploiting that advantage while it's still there. When people start seeing the advantages of a dynamic environment, and the drawbacks of using all these high-poly items in it, attitudes will shift quickly, I can assure you.
The other thing is, if we make the leap to DX11 or 12, we'd have access to a lot more shaders and could likely use more map types, such as cavity, light, shadow, occlusion and displacement maps. The big one has been normal maps: they are absolutely key to making low-poly objects look highly detailed in a dynamic environment. Shaders combined with good maps can make a low-poly model look fantastic. Just look at any of my more recent models on Sketchfab and enable the lit wireframe mode to see how much the maps are actually cheating, giving the illusion of detailed geometry that just isn't there in the optimised model.
https://sketchfab.com/dark0verseer
See the X-Files models or the TWD model for what I mean. Sketchfab uses shaders similar to those in modern game engines, albeit optimised for the web.
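To illustrate how a normal map "cheats", here is a minimal Python sketch of per-pixel diffuse (Lambert) shading with a tangent-space normal map. It assumes the common 8-bit encoding where each channel maps [0, 255] to [-1, 1], and that the light direction is already expressed in the same tangent space; the helper names are hypothetical and not taken from VPX or Sketchfab. The polygon stays flat in both cases, yet the decoded per-texel normal changes how much light each pixel receives.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def decode_normal(rgb):
    """Decode an 8-bit RGB normal-map texel into a unit tangent-space normal.
    Assumes the common encoding: channel c in [0, 255] -> c / 255 * 2 - 1."""
    return normalize(tuple(c / 255.0 * 2.0 - 1.0 for c in rgb))

def lambert(normal, light_dir):
    """Diffuse term max(0, N.L); light_dir is assumed to be in tangent space."""
    l = normalize(light_dir)
    return max(0.0, sum(n * c for n, c in zip(normal, l)))

# The underlying polygon is flat in both cases; only the texel differs.
light = (1.0, 0.0, 1.0)                                   # light from +x, above
flat = lambert(decode_normal((128, 128, 255)), light)     # texel facing straight out
bumped = lambert(decode_normal((218, 128, 218)), light)   # texel tilted toward +x
print(f"flat: {flat:.2f}, bumped: {bumped:.2f}")
```

The tilted texel comes out brighter than the flat one, which is exactly the kind of detail the lit wireframe view reveals is not in the geometry.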
I agree with most of your arguments, but most of what you mention is already possible in DX9.
The engine would need to be optimized first anyway, no matter whether we target DX9, 10, 11, 12 or whatever. For example, similar objects are not yet combined into larger packets internally, which is bad for performance: the driver usually gets many small packages of work instead of large chunks that keep the GPU busier.
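The core of that batching idea can be sketched in Python as below. This is only an illustration under assumptions: `DrawItem` and its material/texture keys are hypothetical stand-ins, not VPX's internal structures, and the sketch stops at grouping. Actually merging each group's geometry (or drawing it with instancing) is where the real engine work would lie.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class DrawItem:
    """Hypothetical per-object record; a real engine keys on far more state."""
    name: str
    material: str
    texture: str

def batch_by_state(items):
    """Group items that share render state (here: material + texture).
    Each group could then be merged into one vertex buffer, or drawn with
    instancing, and submitted as one draw call instead of one per item."""
    batches = defaultdict(list)
    for item in items:
        batches[(item.material, item.texture)].append(item)
    return dict(batches)

# 20 identical posts and 10 identical rubbers, plus one unique ramp
items = (
    [DrawItem(f"post{i}", "metal", "post.png") for i in range(20)]
    + [DrawItem(f"rubber{i}", "rubber", "white.png") for i in range(10)]
    + [DrawItem("ramp", "plastic", "ramp.png")]
)
batches = batch_by_state(items)
print(f"{len(items)} draw calls unbatched, {len(batches)} batched")
```

Here 31 small submissions become 3 larger ones, which is the "huge chunks instead of small packages" trade-off described above.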
All the maps you mention are also possible in DX9, except for displacement. The reason we did not include a much more powerful material model and more/different input maps is that people were already confused as hell by what we offered with VPX, and it took a long time for most people to catch up. Even normal maps are rarely used in tables.
I also most likely won't have as much time on my hands next year as I had in the last two years, so another engine revamp will most likely have to wait, unless somebody else wants to tackle it.
But at least attempting real dynamic rendering, with a movable camera and dynamic lighting, would be ace IMHO. I'd especially love to do the dynamic lighting part.
If I'm understanding things right, for sounds/music we have two choices:
- have the music in separate files and use the playmusic command; advantage: can use mp3 for a smaller file size; disadvantage: very limited control beyond start, stop and setting the volume at the start
- import wav files into VP itself and use the playsound command; advantage: more control over the sound, including on-the-fly frequency (volume?) control; disadvantage: wav files would be huge for a selection of full songs.
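To put a number on "huge" for the two options, here is a quick back-of-the-envelope calculation in Python. It assumes CD-quality stereo wav (44.1 kHz, 16-bit) and a 192 kbps constant-bitrate mp3; both are assumptions, and real files add small headers and tags that are ignored here.

```python
def pcm_bytes(seconds, sample_rate=44100, bits=16, channels=2):
    """Size of uncompressed PCM audio (the wav payload, header ignored)."""
    return seconds * sample_rate * (bits // 8) * channels

def cbr_bytes(seconds, kbps=192):
    """Size of a constant-bitrate stream such as an mp3 (tags ignored)."""
    return seconds * kbps * 1000 // 8

song = 4 * 60  # a four-minute song, in seconds
wav, mp3 = pcm_bytes(song), cbr_bytes(song)
print(f"wav: {wav / 1e6:.1f} MB, mp3: {mp3 / 1e6:.1f} MB")
```

That works out to roughly 42 MB versus roughly 6 MB per song, so a table with ten full songs stored as wav would carry over 400 MB of music alone.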
Is there any way to combine those two? I think the simplest option from a user/builder perspective would be to be able to import mp3 files and have all the playsound controls available. My guess is there's a reason this hasn't been done, but if there were some way to upgrade this feature in a new VP, that would be great. I'm thinking of cases like Metallica etc., where music is playing and, when the game counts the end-of-ball bonus, it lowers the volume of the music and then raises it back up again. From what I understand, this can't currently be done in VP without importing wav files, which would make the tables huge. Along with this, it would be nice if we could also specify individually where each music track/sound plays from: backglass or mechanical/cab.
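The Metallica-style behaviour is essentially a gain envelope ("ducking") applied to the music while the bonus counts. A minimal Python sketch, under assumptions: `duck_gain` and `apply_duck` are hypothetical helpers, not VP script commands, and the 0.3 ducked level and 0.25 s ramps are made-up defaults. In practice an audio library would handle the fading (BASS, for example, can slide a channel's volume over time with `BASS_ChannelSlideAttribute`); the short ramps matter because an instant volume jump can produce an audible click.

```python
def duck_gain(t, duck_start, duck_end, ducked=0.3, ramp=0.25):
    """Music gain at time t (seconds) for a duck between duck_start and duck_end.
    Fades linearly from 1.0 down to `ducked` over `ramp` seconds, holds, then
    fades back up to 1.0 starting at duck_end. Assumes duck_end >= duck_start + ramp."""
    if t <= duck_start or t >= duck_end + ramp:
        return 1.0                                          # normal playback
    if t < duck_start + ramp:                               # fading down
        return 1.0 + (ducked - 1.0) * (t - duck_start) / ramp
    if t < duck_end:                                        # holding while bonus counts
        return ducked
    return ducked + (1.0 - ducked) * (t - duck_end) / ramp  # fading back up

def apply_duck(samples, sample_rate, duck_start, duck_end, **kwargs):
    """Scale a list of mono float samples by the ducking envelope."""
    return [s * duck_gain(i / sample_rate, duck_start, duck_end, **kwargs)
            for i, s in enumerate(samples)]

# Bonus count runs from t = 2.0 s to t = 5.0 s
for t in (1.0, 2.125, 3.0, 5.125, 6.0):
    print(t, round(duck_gain(t, 2.0, 5.0), 3))
```

The envelope is continuous at every boundary, so the music dips smoothly to 30% for the bonus count and comes back without a jump.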
Unified music/sound handling is (hopefully) one of the next big things on my to-do list.
The plan is to use the same code/library (e.g. BASS) for both music and sound effects, and to have all the files embedded in the table, in any sound format the library supports.
This way one also no longer needs to use wavs (i.e. smaller table sizes).
It's not that much work; it's more about fiddling with all the nasty details and extending the sound commands into something more useful for table authors.
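With audio embedded in the table in whatever format the library supports, the player has to tell the files apart somehow. One common approach is to sniff the leading bytes; the Python sketch below shows the idea for a few common formats. The function is hypothetical, covers only these containers, and is an illustration of the technique, not how BASS or VPX actually detect formats.

```python
def sniff_audio_format(data: bytes) -> str:
    """Guess an embedded audio file's format from its leading bytes."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"    # RIFF container with WAVE form type
    if data[:4] == b"OggS":
        return "ogg"    # Ogg page capture pattern (e.g. Vorbis)
    if data[:4] == b"fLaC":
        return "flac"
    if data[:3] == b"ID3":
        return "mp3"    # ID3v2 tag, usually in front of MPEG audio
    # Bare MPEG audio frame: 11 sync bits set, and layer bits not the reserved 00
    # (this also rules out AAC ADTS headers, whose layer field is 00).
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0 and (data[1] & 0x06) != 0:
        return "mp3"    # treated as mp3 here, although layers I/II also match
    return "unknown"

print(sniff_audio_format(b"OggS\x00\x02"))
```

Keying on content rather than file extension means a renamed or extension-less embedded file would still be recognised.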