Following the command structure is quite a task on its own, and not doable without a logic analyzer.
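To give a feel for what "following the command structure" involves, here is a minimal sketch of the 48-bit host command frame used on the SD/MMC command line (and in SPI mode): a start bit, a transmission bit, a 6-bit command index, a 32-bit argument, a CRC7 and an end bit. The function names are hypothetical, and this only covers a single frame, not responses, data tokens or the card state machine.

```python
def crc7(data: bytes) -> int:
    """CRC-7 with polynomial x^7 + x^3 + 1, as used on SD/MMC command frames."""
    crc = 0
    for byte in data:
        for bit in range(7, -1, -1):
            crc <<= 1
            # Data goes in MSB first; XOR it with the bit just shifted out of the CRC.
            if ((byte >> bit) & 1) ^ ((crc >> 7) & 1):
                crc ^= 0x09
            crc &= 0x7F
    return crc


def encode_command(index: int, argument: int) -> bytes:
    """Build the 6-byte frame a host sends for command `index`."""
    # 0x40 = start bit 0 followed by transmission bit 1 (host to card).
    body = bytes([0x40 | (index & 0x3F)]) + argument.to_bytes(4, "big")
    # CRC7 sits in the top 7 bits of the last byte; the end bit is always 1.
    return body + bytes([(crc7(body) << 1) | 0x01])


def decode_command(frame: bytes) -> tuple[int, int, bool]:
    """Split a captured frame into (command index, argument, crc_ok)."""
    if len(frame) != 6 or (frame[0] & 0xC0) != 0x40 or not (frame[5] & 1):
        raise ValueError("not a valid 48-bit host command frame")
    index = frame[0] & 0x3F
    argument = int.from_bytes(frame[1:5], "big")
    crc_ok = (frame[5] >> 1) == crc7(frame[:5])
    return index, argument, crc_ok


cmd0 = encode_command(0, 0)  # CMD0, GO_IDLE_STATE
print(cmd0.hex())            # 400000000095
```

An emulator has to recognise frames like this in real time and answer each one correctly, which is why a logic analyzer capture of the real player talking to a real card is the practical starting point.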
The real problem (after you've built the raw interface) is that you have dual-access media with no way to sync the contents for the player side. The device you plug the SD card into (emulated or not) will read the card and probably hold some small part of the file system metadata in RAM as it functions. In your case the player only reads from the SD card, but even so it has to manage streaming the file system segments into memory to play the music. You want to come in behind the scenes and dynamically add data to the file system, but the player does not know the file system is changing. If you were to dynamically load data on your dual-port SD card emulator, you'd have to fake an eject/insert to get the player to re-read the file system. That would be possible, but again you are left with the potential for misreads if you alter the FS content without the reader knowing. You could be incredibly lucky and find that the player holds almost no file system metadata in RAM, so that it re-reads and scans the FS directory for each item it plays. You could ascertain this by monitoring the reads to the SD card.
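The caching problem can be illustrated with a toy model. This is only a sketch: `SharedCard` and `CachingPlayer` are hypothetical stand-ins, and a real player's caching behaviour is unknown until you watch its reads.

```python
class SharedCard:
    """Stand-in for the dual-port card: one directory seen by both sides."""

    def __init__(self, files):
        self.directory = list(files)


class CachingPlayer:
    """Hypothetical player that scans the directory once per card insert."""

    def __init__(self, card):
        self.card = card
        self.cached_directory = []
        self.insert()

    def insert(self):
        # On insert the player reads the file system and keeps a copy in RAM.
        self.cached_directory = list(self.card.directory)

    def playlist(self):
        # Later lookups use the cached copy, not the card itself.
        return list(self.cached_directory)


card = SharedCard(["track01.mp3", "track02.mp3"])
player = CachingPlayer(card)

card.directory.append("track03.mp3")  # the emulator side adds a file behind the scenes
stale = player.playlist()             # player still sees the old directory

player.insert()                       # faked eject/insert forces a re-scan
fresh = player.playlist()

print(stale)
print(fresh)
```

The model leaves out the nastier case, where the emulator changes sectors while the player is partway through reading them; a faked eject/insert does not help with that.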
Emulating a raw SD/MMC interface is not the base problem for you. Creating a hardware interface is relatively easy and there are multiple FPGA implementations out there. If your player can read V1.0 SD cards, then you can run 1 lane at 12.5 MHz, which something like an Arduino SPI interface could easily handle. A Raspberry Pi would be a greater challenge, though you might be able to boot from USB and hack the SD card interface hardware (a huge low-level driver challenge).
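As a rough sanity check on why a single lane at 12.5 MHz is plenty for music, here is the back-of-envelope arithmetic. It assumes uncompressed CD-quality stereo PCM as the worst case (the player's actual format is not known) and ignores command, response and CRC overhead on the bus.

```python
CLOCK_HZ = 12_500_000  # one data lane, one bit per clock
LANES = 1

raw_bytes_per_s = CLOCK_HZ * LANES // 8  # 1_562_500 bytes/s before protocol overhead
cd_pcm_bytes_per_s = 44_100 * 2 * 2      # 44.1 kHz, 16-bit samples, stereo = 176_400 bytes/s

headroom = raw_bytes_per_s / cd_pcm_bytes_per_s
print(raw_bytes_per_s, cd_pcm_bytes_per_s, round(headroom, 1))  # 1562500 176400 8.9
```

Compressed formats need far less, so the raw transfer rate is unlikely to be the bottleneck.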